RcppEigen/0000755000176200001440000000000014567702434012142 5ustar liggesusersRcppEigen/NAMESPACE0000644000176200001440000000100114520536112013334 0ustar liggesusersuseDynLib("RcppEigen", .registration=TRUE) importFrom("Rcpp", "evalCpp") importFrom("utils", "packageDescription", "package.skeleton") importFrom("stats", "model.frame", "model.matrix", "model.response", "fitted", "coef", "printCoefmat", "pt") export("fastLm", "fastLmPure", "RcppEigen.package.skeleton" ) S3method("fastLm", "default") S3method("fastLm", "formula") S3method("predict", "fastLm") S3method("print", "fastLm") S3method("summary", "fastLm") S3method("print", "summary.fastLm") RcppEigen/LICENSE0000644000176200001440000000050512254315522013134 0ustar liggesusers Please see the included file COPYRIGHTS for per-file (or per-directory) details on licenses as well as copyrights. In short: - Eigen (as an upstream project) in licensed under the MPL-2 (a version of which is included in the file COPYRIGHT), and - the RcppEigen integration into R is licensed as GPL-2 or later. RcppEigen/ChangeLog0000644000176200001440000011606614567626734013736 0ustar liggesusers2024-02-28 Dirk Eddelbuettel * DESCRIPTION (Version, Date): Release 0.3.4.0.0 2024-02-12 Tomas Kalibera * inst/include/unsupported/Eigen/CXX11/src/ThreadPool/ThreadYield.h: Support clang on Windows by including 'sched.h' header 2024-01-17 Yixuan Qiu * DESCRIPTION (Version, Date): Release candidate 0.3.3.99.0 * inst/include/Eigen: Upgraded to Eigen 3.4.0 * inst/include/unsupported/Eigen: Idem * patches/eigen-3.4.0.diff: Carried local CRAN patches forward * patches/howToDiff.md: Idem 2023-11-01 Dirk Eddelbuettel * DESCRIPTION (Version, Date): CRAN Release 0.3.3.9.4 * inst/NEWS.Rd: Release 0.3.3.9.4 2023-10-05 Mikael Jagan * DESCRIPTION: Package 'Matrix' is now only a Suggests: * NAMESPACE: Remove unconditional imports from package Matrix * inst/include/Eigen/CholmodSupport: No longer need to include RcppEigenCholmod.h * inst/include/Eigen/src/CholmodSupport/CholmodSupport.h: Small wrapper adjustments conditional on Matrix use * inst/include/RcppEigenCholmod.h: Updated * inst/include/RcppEigenForward.h: Simplified * inst/include/RcppEigenWrap.h: Ditto * inst/include/RcppEigenStubs.cpp: New shorter helper * inst/include/RcppEigenStubs.h: Removed 2023-07-21 Dirk Eddelbuettel * DESCRIPTION (Date, Version): Roll micro version and date * README.md: Add r-universe badge 2023-07-20 Dirk Eddelbuettel * src/RcppEigen.cpp (EigenNbThreads): Add simple threads reporter * R/fastLm.R (fastLmPure): Simpler call to `fastLm_Impl()` * src/init.c: Replaced by auto-generated section in RcppExports.cpp * src/RcppExports.cpp: Regenerated * R/RcppExports.R: Idem * src/Makevars: Document possible use of '-fopenmp' * src/Makevars.win: Idem 2023-04-18 Dirk Eddelbuettel * README.md: Use app.codecov.io as base for codecov link 2023-03-10 Dirk Eddelbuettel * DESCRIPTION (Date, Version): Roll minor version * R/RcppEigen.package.skeleton.R: No longer set Imports: RcppEigen in DESCRIPTION and NAMESPACE 2023-02-12 Dirk Eddelbuettel * inst/CITATION: Convert to bibentry() style with person() 2022-11-04 Dirk Eddelbuettel * DESCRIPTION (Version, Date): CRAN Release 0.3.3.9.3 * inst/NEWS.Rd: Release 0.3.3.9.3 * src/init.c: Add 'void' for proper prototype pleasing clang-15 * R/fastLm.R (summary.fastLm,print.fastLm): Refer to correct and full variable name df.residual in the returned object * .github/workflows/ci.yaml (jobs): Update to actions/checkout@v3 2022-09-15 Jonah Gabry * inst/skeleton/rcppeigen_hello_world.cpp: Correct typo 2022-04-08 Dirk Eddelbuettel * DESCRIPTION (Depends): Add a versioned dependency on R 3.6.0 or later because of our use of FCONE to support USE_FC_LEN_T 2022-04-05 Dirk Eddelbuettel * DESCRIPTION (Version, Date): CRAN Release 0.3.3.9.2 * inst/NEWS.Rd: Release 0.3.3.9.2 2022-04-04 Dirk Eddelbuettel * src/fastLm.cpp: Add FCONE in two calls for improved Fortran and C character interface per Writing R Extensions, Section 6.6.1 2022-01-16 Dirk Eddelbuettel * inst/tinytest/test_wrap.R: Added (optional) large memory wrap tests * inst/tinytest/cpp/wrap.cpp: Added C++ part of test * .codecov.yml: Added to not trigger PR fail for small additions 2022-01-16 Mikael Jagan * inst/include/RcppEigenWrap.h: Refine use plain dense wrap() change 2022-01-15 Mikael Jagan * inst/include/RcppEigenWrap.h: Use R_xlen_t for vectors rows + cols 2021-12-29 Dirk Eddelbuettel * README.md: Add total downloads badge 2021-12-08 Dirk Eddelbuettel * README.md: Remove unused continuous integration artifact and badge 2021-10-13 Dirk Eddelbuettel * inst/CITATION: Refinment of doi use 2021-10-10 Dirk Eddelbuettel * inst/CITATION: Switch JSS url to doi form per JSS request * README.md: Idem 2021-07-19 Dirk Eddelbuettel * inst/include/RcppEigenWrap.h: Two more #nocov tags * src/fastLm.cpp: One more #nocov tag 2021-07-18 Dirk Eddelbuettel * inst/tinytest/test_fastLm.R: Add tests for summary * src/fastLm.cpp: Add a few #nocov tags 2021-07-17 Dirk Eddelbuettel * DESCRIPTION (Date, Version): Roll minor version * R/RcppEigen.package.skeleton.R (RcppEigen.package.skeleton): Also import RcppEigen in DESCRIPTION 2021-06-07 Dirk Eddelbuettel * DESCRIPTION (Date, Version): Roll minor version * inst/tinytest/test_misc.R: New test file * inst/tinytest/test_fastLm.R: Added tests * R/fastLm.R: Add single one-line nocov tag * .Rbuildignore: Add .covignore and .codecov.yml 2021-06-06 Dirk Eddelbuettel * README.md: Added coverage badge * .github/workflows/ci.yaml (jobs): Turn on coverage * .covrignore: Added * .codecov.yml (ignore): Idem * R/RcppEigen.package.skeleton.R: Set nocov 2021-05-09 Dirk Eddelbuettel * DESCRIPTION (URL): Add GitHub repo to URL field 2021-01-02 Dirk Eddelbuettel * R/RcppEigen.package.skeleton.R: Wrap any() around grepl() 2020-12-25 Dirk Eddelbuettel * .github/workflows/ci.yaml: Small tweaks to CI YAML file 2020-12-17 Dirk Eddelbuettel * DESCRIPTION (Version, Date): CRAN Release 0.3.3.9.1 (following coordinated update of reverse dependency StanHeaders) * inst/NEWS.Rd: Release 0.3.3.9.1 2020-12-14 Dirk Eddelbuettel * .github/workflows/ci.yaml: Add CI runner using r-ci * README.md: Add new badge 2020-12-05 Dirk Eddelbuettel * DESCRIPTION (Version, Date): Release 0.3.3.9.0 * inst/include/Eigen: Upgraded to Eigen 3.3.9 * inst/include/unsupported/Eigen: Idem * patches/eigen-3.3.9.diff: Carried local CRAN patches forward * .travis.yml: Switch to r-ci using focal and bspm * README.md: Updated URLs to https and/or redirect location * inst/CITATION: Idem * man/RcppEigen-package.Rd: Idem * man/fastLm.Rd: Idem 2020-08-16 Dirk Eddelbuettel * README.md: Add JSS badge 2020-03-29 Dirk Eddelbuettel * README.md: Added commit badge, edited 2020-01-22 Dirk Eddelbuettel * README.md: README.md: Add a Debian badge 2019-11-16 Dirk Eddelbuettel * DESCRIPTION (Version, Date): Release 0.3.3.7.0 * inst/include/Eigen: Upgraded to Eigen 3.3.7 * inst/include/unsupported/Eigen: Idem * patches/eigen-3.3.7.diff: Carried local CRAN patches forward 2019-11-01 Dirk Eddelbuettel * R/unit.test.R (compile_unit_tests): Removed as no longer needed 2019-10-31 Dirk Eddelbuettel * inst/tinytest/test_transform.R: Switch from RUnit to tinytest * inst/tinytest/test_wrap.R: Idem * inst/tinytest/cpp/transform.cpp: Added using Rcpp Attributes * inst/tinytest/cpp/wrap.cpp: Idem * inst/tinytest/cpp/sparse.cpp: More idiomatic Eigen code * inst/tinytest/test_sparse.R: Idem * .editorconfig: Added 2019-10-30 Dirk Eddelbuettel * DESCRIPTION (Suggests): Switch from RUnit to tinytest * .travis.yml (install): Ditto * tests/tinytest.R: Converted to tinytest * test_sparse.R: Converted to tinytest * cpp/sparse.cpp: Added using Rcpp Attributes 2019-10-29 Dirk Eddelbuettel * test_fastLm.R: Converted to tinytest * test_RcppEigen.R: Idem * test_solution.R: Idem * cpp/rcppeigen.cpp: Added using Rcpp Attributes * cpp/solution.cpp: Idem 2019-10-28 Dirk Eddelbuettel * tests/tinytest.R: Renamed from tests/doRUnit.R * inst/tinytest/test_fastLm.R: Renamed from inst/unitTests/runit* * inst/tinytest/test_RcppEigen.R: Idem * inst/tinytest/test_solution.R: Idem * inst/tinytest/test_sparse.R: Idem * inst/tinytest/test_transform.R: Idem * inst/tinytest/test_wrap.R: Idem 2019-10-13 Dirk Eddelbuettel * README.md: Added CRAN + BioConductor badges for reverse depends, add StackOverflow badge to recommend help searches under 'Rcpp' * DESCRIPTION (Version, Date): Roll minor version * R/RcppEigen.package.skeleton.R (RcppEigen.package.skeleton): Test for example_code outside of haveKitten test; remove force argument if unused 2019-05-24 Dirk Eddelbuettel * vignettes/RcppEigen-Introduction.Rnw: Update vignette to use RcppEigen:::eigen_version() instead of .Call() 2019-03-29 Dirk Eddelbuettel * inst/skeleton/rcppeigen_hello_world.cpp: Rework first example to not rely on Eigen RNG which R CMD check would complain about. 2019-02-16 James Joseph Balamuta * R/runit.RcppEigen.R: Removed listing RcppEigen in Imports during skeleton package creation. 2018-11-24 Dirk Eddelbuettel * DESCRIPTION (Version, Date): Release 0.3.3.5.0 2018-11-23 Dirk Eddelbuettel * inst/include/Eigen: Updated to release 3.3.5 * inst/include/unsupported: Idem * patches/eigen-3.3.5.diff: Diff of local patches 2018-09-02 Dirk Eddelbuettel * .travis.yml: Switch Travis CI to R 3.5 repo 2018-05-30 Michael Weylandt * inst/examples/lmBenchmark.R: Update benchmark script to use microbenchmark and to use exposed fastLm functions from Rcpp packages rather than invoking .Call directly 2018-05-25 Ralf Stubner * inst/include/RcppEigenWrap.h: Use Rf_xlength and R_xlen_t to support long vectors 2018-02-07 Dirk Eddelbuettel * patches/eigen-3.3.4.diff: Diff to upstream Eigen 3.3.4, cherry-picked in the 3.3.4 branch of Yixuan 2018-02-05 Dirk Eddelbuettel * DESCRIPTION (Version, Date): Release 0.3.3.4.0 * inst/include/Eigen/src/Core/arch/CUDA/Half.h (Eigen): Condition use of long long and unsigned long on C++11 * inst/include/Eigen/src/Core/arch/SSE/Complex.h (Eigen): Comment-out use of diagnostic-suppressing pragma for gcc + clang to satisfy CRAN * inst/include/Eigen/src/Core/util/DisableStupidWarnings.h: Idem 2018-02-04 Yixuan Qiu [ In RcppEigen 0.3.3.* branch ] * inst/include/Eigen: Updated to the upstream 3.3 branch of Eigen based on version 3.3.4 * inst/include/unsupported: Idem * DESCRIPTION: Idem * README.md: Idem * inst/include/Eigen/src/Core/util/DisableStupidWarnings.h: Patch from upstream Eigen that has not been ported to the 3.3 branch (cf GitHub issue #48) 2017-11-19 Dirk Eddelbuettel * DESCRIPTION (Version, Date): Release 0.3.3.3.1 * R/inline.R: Use '::' not ':::' for Rcpp.plugin.maker 2017-08-26 Dirk Eddelbuettel * .travis.yml (before_install): Use https for curl fetch 2017-06-06 Yu Gong * inst/include/unsupported/Eigen/src/SparseExtra/MatrixMarketIterator.h: Also permit compilation under Haiku-OS 2017-05-28 Dirk Eddelbuettel * inst/examples/lmBenchmark.R (do_bench): Remove spurious argument in call to RcppEigen:::Eigen_SSE() (cf github issue #44) 2017-04-29 Dirk Eddelbuettel * DESCRIPTION: Release 0.3.3.3.0 [ which again owes a very big thank you to Yixuan for doing the work! ] 2017-04-27 Dirk Eddelbuettel * .travis.yml: Switch to using run.sh for Travis CI 2017-04-27 Yixuan Qiu [ In RcppEigen 0.3.3.* branch ] * inst/include/Eigen/src/Core/arch/CUDA/Half.h: Fixed compiler warning on 'long long' type in C++ 98 mode 2017-03-14 Dirk Eddelbuettel * DESCRIPTION (Version, Date): Release 0.3.2.9.1 * src/init.c (R_init_RcppEigen): Call R_registerRoutines() and R_useDynamicSymbols() * NAMESPACE: Use .registration=TRUE on useDynLib * R/fastLm.R (fastLmPure): Remove erroneous fourth argument from .Call 2017-03-13 Martin Maechler * inst/include/RcppEigenCholmod.h: Synchronize with Matrix package 2017-02-21 Yixuan Qiu [ In RcppEigen 0.3.3.* branch ] * inst/include/Eigen: Updated to release 3.3.3 of Eigen * inst/include/unsupported: Idem * DESCRIPTION: Idem * README.md: Idem 2017-01-20 Yixuan Qiu [ In RcppEigen 0.3.3.* branch ] * inst/include/Eigen: Updated to release 3.3.2 of Eigen * inst/include/unsupported: Idem * DESCRIPTION: Idem * README.md: Idem * inst/unitTests/runit.RcppEigen.R, inst/unitTests/runit.sparse.R: Explicitly convert matrix size to `double` type such that Rcpp can properly return the value to R, thanks to ChingChuan and Dirk 2017-01-20 ChingChuan Chen [ In RcppEigen 0.3.3.* branch ] * inst/include/Eigen: Updated to release 3.3.1 of Eigen * inst/include/unsupported: Idem * DESCRIPTION: Idem * inst/examples/lmBenchmark.R: Fixed function names 2016-12-22 Dirk Eddelbuettel * DESCRIPTION (URL, BugReports): Added / updated 2016-11-12 Yixuan Qiu [ In RcppEigen 0.3.3.* branch ] * inst/include/Eigen: Updated to release 3.3.0 of Eigen * inst/include/unsupported: Idem * DESCRIPTION: Idem * README.md: Idem * inst/include/RcppEigenForward.h, inst/include/RcppEigenWrap.h: Added exporters for the new Map > type * inst/unitTests/runit.sparse.R: Unit tests for the new type 2016-08-20 Dirk Eddelbuettel * DESCRIPTION: Release 0.3.2.9.0 with big thanks to Yixuan for doing the work! 2016-08-19 Yixuan Qiu * inst/include/Eigen: Updated to release 3.2.9 of Eigen * README.md: Updated version number and fixed the NOTE from CRAN URL check 2016-04-30 Dirk Eddelbuettel * README.md: Expanded 2016-04-28 James Joseph Balamuta * inst/include/RcppEigenWrap.h: Added an exporter class for Map::RowVectorX per http://stackoverflow.com/questions/36920689/ * inst/include/unitTests/runit.RcppEigen.R: Added row exporter unit test. 2016-02-29 Dirk Eddelbuettel * DESCRIPTION (Version): Release 0.3.2.8.1 * inst/NEWS.Rd: Release 0.3.2.8.1 * debian/*: Changes for Debian release of 0.3.2.8.1 2016-02-28 Yixuan Qiu * inst/include/Eigen/src/SparseCore/CompressedStorage.h, inst/include/Eigen/src/SparseCore/SparseBlock.h, inst/include/Eigen/src/SparseCore/SparseMatrix.h, inst/include/Eigen/src/SparseCore/SparseRedux.h, inst/include/Eigen/src/SparseCore/SparseVector.h: Another patch from upstream to fix UBSAN null pointer errors 2016-02-23 Dirk Eddelbuettel * DESCRIPTION (Version): Release 0.3.2.8.0 * DESCRIPTION (Author): Added Yixuan Qiu * inst/NEWS.Rd: Release 0.3.2.8.0 2016-02-22 Yixuan Qiu * inst/include/Eigen: Updated to release 3.2.8 of Eigen * inst/include/unsupported: Idem * DESCRIPTION: Idem * README.md: Idem * inst/include/Eigen/src/SparseCore/CompressedStorage.h, inst/include/Eigen/src/Core/util/Memory.h: Applied patch from upstream to fix UBSAN errors 2016-01-20 Dirk Eddelbuettel * DESCRIPTION: Release 0.3.2.7.0 * debian/*: Changes for Debian release of 0.3.2.7.0 2016-01-19 Dirk Eddelbuettel * inst/unitTests/runit.sparse.R (test.sparseCholesky.R): Accomodate stricter as.vector() requirements in R-devel 2016-01-18 Yixuan Qiu * inst/include/Eigen: Updated to release 3.2.7 of Eigen * inst/include/unsupported: Idem * DESCRIPTION: Idem * README.md: Idem 2015-11-01 Alexey Stukalov * src/fastLm.cpp: Do not include R's LAPACK header if MKL defined 2015-09-23 Dirk Eddelbuettel * DESCRIPTION: Release 0.3.2.5.1 * debian/*: Changes for Debian release of 0.3.2.5.1 2015-09-21 Dirk Eddelbuettel * inst/include/Eigen/src/SparseCore/CompressedStorage.h: Added somewhat adapted code from upstream PR #118 to address UBSAN 2015-08-21 Dirk Eddelbuettel * src/RcppEigen.cpp: Updated code to use Rcpp Attributes * src/fastLm.cpp: Ditto * src/fastLm.h: Ditto * R/fastLm.R: Ditto * inst/unitTests/runit.fastLm.R: Ditto 2015-08-04 Dirk Eddelbuettel * R/RcppEigen.package.skeleton.R: Correct kitten() use based on patch kindly supplied by Grant Brown (cf GitHub issue #21) 2015-07-13 Dirk Eddelbuettel * DESCRIPTION: Version 0.3.2.5.0 2015-07-13 Yixuan Qiu * inst/include/Eigen/src/Core/Assign.h: Applied patch from upstream Eigen that will appear in the next release 2015-07-12 Dirk Eddelbuettel * DESCRIPTION: Added more explicitly quoting per R CMD check * NAMESPACE: Explicitly importFrom() stats and utils packages 2015-07-11 Yixuan Qiu * inst/include/Eigen: Updated to release 3.2.5 of Eigen * inst/include/unsupported: Idem * DESCRIPTION: Idem * README.md: Idem * inst/include/RcppEigenWrap.h: Fixed three incorrect error messages 2015-06-06 Dirk Eddelbuettel * R/RcppEigen.package.skeleton.R: Correct use of requireNamespace() 2015-02-26 Dirk Eddelbuettel * DESCRIPTION: Version 0.3.2.4.0 * debian/*: Updated 2015-02-23 Dirk Eddelbuettel * inst/include/Eigen: Updated to release 3.2.4 of Eigen * DESCRIPTION: Updated Version: and Date: * README.md: Idem * .travis.yml: Switch to ppa:edd/misc for binaries * R/RcppEigen.package.skeleton.R: Use pkgKitten::kitten, if installed * inst/skeleton/rcppeigen_hello_world.Rd: Added new page * inst/skeleton/Makevars: Minor edits * inst/skeleton/Makevars.win: Idem * DESCRIPTION: Added Suggests: for pkgKitten * .travis.yml: Also install pkgKitten 2014-12-22 Yixuan Qiu * inst/include/Eigen: Updated to release 3.2.3 of Eigen * DESCRIPTION: Idem * README.md: Idem 2014-11-23 Yixuan Qiu * inst/unitTests/runit.wrap.R: Added a number of unit tests * inst/unitTests/runit.transform.R: Idem * inst/unitTests/runit.sparse.R: Idem 2014-10-20 Dirk Eddelbuettel * inst/examples/lmBenchmark.R: Updated RcppArmadillo .Call in example 2014-08-21 Dirk Eddelbuettel * DESCRIPTION: New version 0.3.2.2.0 which was prepared expertly by Yixuan Qiu in a nice and clean pull request -- a big thank you for that! * DESCRIPTION: Adjusting Date: as well 2014-08-19 Yixuan Qiu * inst/include/Eigen: Updated to release 3.2.2 of Eigen * inst/unitTests/runit.wrap.R: More unit tests to reflect changes in as() exporters 2014-05-29 Yixuan Qiu * inst/include/RcppEigenWrap.h: Added as() conversion from R vector to Eigen::Array * inst/include/RcppEigenWrap.h: Fixed incorrect as() exporters * inst/include/RcppEigenWrap.h: Check matrix type of "dgCMatrix" instead of "CsparseMatrix" * inst/include/RcppEigenWrap.h, inst/include/RcppEigenForward.h: Added as() conversion from "dgRMatrix" to Eigen::MappedSparseMatrix 2014-05-01 Dirk Eddelbuettel * inst/include/Eigen/src/Core/util/Memory.h: Applied patch by Gael Guennebaud created in response to Doug's bug report (#803) which in turn is based on the UBSAN report at CRAN by Brian Ripley 2014-03-06 Dirk Eddelbuettel * DESCRIPTION: New version 0.3.2.1.1 2014-03-05 Dirk Eddelbuettel * inst/include/unsupported/Eigen/src/SparseExtra/MatrixMarketIterator.h: Better, more general test for DIRENT feature thanks to Brian Ripley 2014-03-04 Dirk Eddelbuettel * inst/include/unsupported/Eigen/src/SparseExtra/MatrixMarketIterator.h: (re-)add another #ifndef excluding a line on Solaris 2014-03-03 Dirk Eddelbuettel * DESCRIPTION: New release version 0.3.2.1.0 2014-03-03 Douglas Bates * inst/include/Eigen: Update to release 3.2.1 of Eigen 2014-03-01 Dirk Eddelbuettel * DESCRIPTION: New minor version 3.2.0.3 * DESCRIPTION: Switched to 'Imports: Rcpp (>= 0.11.0)' * src/Makevars: Updated for Rcpp 0.11.0 * src/Makevars.win: Ditto * R/RcppEigen.package.skeleton.R: Updated for Rcpp 0.11.0, also updated for current R versions by removing deprecated 'namespace' arg * man/RcppEigen.package.skeleton.Rd: Removed 'namespace' for current R * inst/skeleton/rcppeigen_hello_world.cpp: Rewritten / extended similar to the RcppArmadillo variant; now contains four functions exported by Rcpp Attributes * inst/skeleton/Makevars: Updated for Rcpp 0.11.0, also noted C++11 option via setting of 'USE_CXX1X=' with R 3.1.0 or later * inst/skeleton/Makevars.win: Ditto 2014-01-26 Dirk Eddelbuettel * DESCRIPTION: New minor version 3.2.0.2 2014-01-22 Dirk Eddelbuettel * NAMESPACE: Added importFrom(Rcpp, evalCpp) needed for next Rcpp 2014-01-12 Dirk Eddelbuettel * R/SHLIB.maker: Commented-out, file to be removed 2013-12-19 Dirk Eddelbuettel * DESCRIPTION: Updated Description to mention MPL-2 license for Eigen * inst/unitTests/runit.solutions.R: RUnit file converted from testthat * inst/unitTests/runit.transform.R: Idem * inst/unitTests/runit.wrap.R: Idem * DESCRIPTION: Remove Suggests: of testthat 2013-12-18 Dirk Eddelbuettel * DESCRIPTION: New minor version 3.2.0.1 * DESCRIPTION: New maintainer * DESCRIPTION: Added Copyright: reference to new COPYRIGHTS file * DESCRIPTION: Added reference to new LICENSE file * inst/COPYRIGHTS: Added, based on existing debian/copyright file * LICENSE: Added to state MPL-2 for Eigen, GPL (>=2) for RcppEigen * inst/include/Eigen/src/SparseLU/SparseLU.h: Applied Solaris patch * inst/include/unsupported/Eigen/src/SparseExtra/MatrixMarketIterator.h: Idem * inst/include/RcppEigenForward.h: Idem 2013-12-04 Dirk Eddelbuettel * debian/*: Added after emails with Doug * .Rbuildignore: Added debian/ 2013-11-13 Douglas Bates * DESCRIPTION: New version, release date and dependencies * inst/include/Eigen: Update to release 3.2.0 of Eigen 2013-02-03 Dirk Eddelbuettel * DESCRIPTION: New minor version 0.3.1.2.1 for upload to CRAN and JSS * NEWS.org: Updated 2013-01-26 Dirk Eddelbuettel * vignettes/jss835/: Regroup all related files in this directory * .Rbuildignore: Ignore vignettes/jss835 2013-01-14 Dirk Eddelbuettel * inst/CITATION: Added as provided by JSS editor * man/fastLm.Rd: Added reference to JSS paper * man/RcppEigen-package.Rd: Idem 2012-11-29 Douglas Bates * DESCRIPTION: New version * inst/include/Eigen, inst/include/unsupported: Update to version 3.1.2 of Eigen. 2012-11-29 Romain Francois * include/RcppEigenWrap.h: compatibility issue with Rcpp 0.10.1 2012-08-14 Dirk Eddelbuettel * R/flags.R: Add two (unexported) functions CxxFlags() and RcppEigenCxxFlags() for use in Makefiles etc 2012-07-30 Douglas Bates * inst/include/RcppEigenWrap.h, inst/unitTests/runit.RcppEigen.R: Another suggestion from Gael Guennebaud to allow row vectors to be wrapped. 2012-07-28 Douglas Bates * inst/include/RcppEigenWrap.h, inst/unitTests/runit.RcppEigen.R: More fixes to RcppEigenWrap.h and adjustment of tests accordingly. The changes allow RowMajor matrices to be wrapped (thanks to Gael Guennebaud) but cannot handle RowVector types. There will need to be more template metaprogramming done to redirect the case of RowVector, which cannot be changed to a ColMajor form. * src/Makevars: Because of changes in R, -DNDEBUG is automatic. One must override it with -UNDEBUG in the local ~/.R/Makevars to activate the debugging code. * inst/doc/code.R: New version of wrap code provides correct answer to the badtrans example * DESCRIPTION: Bump the date. 2012-07-27 Douglas Bates * inst/include/Eigen/*: Changed to released version of Eigen-3.1.0 using the MPL2 license. 2012-07-19 Dirk Eddelbuettel * R/fastLm.R: correct residual standard error display * R/fastLm.R: improved test for intercept once more with a tip of the hat to Doug 2012-07-17 Dirk Eddelbuettel * R/fastLm.R: Corrections for R^2 in no-intercept case 2012-06-26 Douglas Bates * DESCRIPTION, R/unit.test.R, inst/include/Eigen/*: Massive changes related to upgrade to Eigen-3.1.0 2012-03-13 Douglas Bates * inst/include/RcppEigenWrap.h: Change the wrap methods to avoid creating Rcpp::Dimension objects (which are implicitly created by the Rcpp::Matrix constructor). * inst/tests/test-solutions.R: Clean up. I defined the results then reevaluated them. 2012-03-08 Douglas Bates * inst/include/RcppEigenForward.h: Remove an include of RcppEigenConfig.h which is no longer needed. * ChangeLog, NEWS, NEWS.org: Update Changelog and NEWS which is generated from an org-mode file NEWS.org * inst/include/RcppEigenConfig.h, inst/include/unsupported/Eigen/MoreVectorization, inst/include/unsupported/Eigen/src/MoreVectorization/CMakeLists.txt: Delete the unsupported/Eigen/src/MoreVectorization module which we don't need * inst/include/unsupported/Eigen/src/MoreVectorization/MathFunctions.h: Delete the unsupported/Eigen/src/MoreVectorization module which we don't need * inst/tests/test-solutions.R, inst/tests/test-transform.R, inst/tests/test-wrap.R: Add testthat test files, temporarily disabled for R CMD check because of an unknown conflict * DESCRIPTION, R/unit.test.R, inst/include/Eigen/Cholesky, ..., src/Makevars: Massive changes in upgrade to Eigen-3.1.0-alpha2 2012-03-03 Douglas Bates * inst/include/RcppEigenWrap.h: Correct a typo. 2012-02-28 Douglas Bates * inst/include/RcppEigenForward.h, inst/include/RcppEigenWrap.h: Add as templates for ArrayXd and ArrayXXd 2012-02-03 Douglas Bates * inst/include/RcppEigenWrap.h, inst/unitTests/runit.sparse.R: Allow import of compressed row sparse matrices. Add test of same. 2012-01-18 Douglas Bates * inst/include/RcppEigenWrap.h: Allow for wrapping sparse row-major matrices. 2011-12-31 Douglas Bates * inst/include/RcppEigenCholmod.h: Minor typo in a comment 2011-12-23 Dirk Eddelbuettel * inst/unitTests/runTests.R: unit tests output 'fallback' directory changed to '..' and files are now in top-level of $pkg.Rcheck/ 2011-11-14 Douglas Bates * vignettes/RcppEigen-intro-nojss.Rnw: Many, many changes by Dirk and by Doug over the last two weeks to create a JSS paper and updated package. Too many changes to list. 2011-10-26 Douglas Bates * inst/doc/Rcpp.bib, inst/doc/RcppEigen-Intro.Rnw, inst/doc/RcppEigen-Intro.pdf: Add an introductory vignette. * src/fastLm.cpp, src/fastLm.h: Use a method for XtX but returning a MatrixXd, not a view. * inst/doc/RcppEigen-Intro.Rnw: Describe the SymmEig method correctly. * src/fastLm.cpp, src/fastLm.h: Use a macro for XtX temporarily. Clean up code. * inst/include/RcppEigenCholmod.h, inst/include/RcppEigenStubs.h: Update to Matrix_1.0.2 versions * inst/include/Eigen, inst/include/unsupported: update to eigen 3.0.3 * src/RcppEigen.cpp: Add externally callable functions for Eigen version and SSE instruction sets in use. * inst/examples/lmBenchmark.R: Suppress messages and provide additional information about Eigen version and SSE instructions in use. * R/fastLm.R: Allow optional arguments to be passed to printCoefmat; adapt to new returned structure. 2011-09-16 Douglas Bates * inst/include/unsupported/Eigen/src/SparseExtra/CholmodSupport.h: Avoid compiler warnings about comparing signed and unsigned values. 2011-09-13 Douglas Bates * src/Makevars: Force the -DNDEBUG flag to satisfy R CMD check in R-2.14.0 and higher 2011-09-12 Douglas Bates * inst/include/unsupported/Eigen/src/SparseExtra/CholmodSupport.h: Remove the solvetype specification for the solve method (but type is retained for solveInPlace). 2011-09-11 Douglas Bates * inst/include/RcppEigenForward.h: Remove forward declaration of non-existent templated function. 2011-09-02 Douglas Bates * DESCRIPTION, inst/include/Eigen/src/Core/util/Meta.h: New release. ifdef the use of "long long" using RCPP_HAS_LONG_LONG_TYPES (thanks, Dirk) 2011-08-31 Douglas Bates * inst/include/unsupported/Eigen/src/SparseExtra/CholmodSupport.h: Delete my addition of a CholmodAutoLLt mode, which is not needed (I misunderstood something previously). 2011-08-28 Douglas Bates * many files in inst/include/Eigen and inst/include/unsupported: Upgrade to Eigen release 3.0.2 2011-08-12 Douglas Bates * inst/include/unsupported/Eigen/src/SparseExtra/CholmodSupport.h: By-passing the const reference 2011-08-11 Douglas Bates * inst/include/unsupported/Eigen/src/SparseExtra/CholmodSupport.h: Add a solveInPlace method to try to avoid memory problems. 2011-08-10 Douglas Bates * inst/include/unsupported/Eigen/src/SparseExtra/CholmodSupport.h: Attempted fix of problem of freeing Map'ed memory in the Cholmod solve. 2011-07-31 Douglas Bates * inst/include/unsupported/Eigen/src/SparseExtra/CholmodSupport.h: Trying to fix the memory problems from the CholmodDecomposition solve methods - apparently unsuccessfully. 2011-07-29 Douglas Bates * inst/include/unsupported/Eigen/src/SparseExtra/CholmodSupport.h: Add LLtAuto method (which doesn't seem to work), more extractors and a factorize_p method for a const_CHM_SP argument. * inst/include/RcppEigenCholmod.h: Add a declaration of log determinant squared function. * inst/include/unsupported/Eigen/src/SparseExtra/CholmodSupport.h, inst/unitTests/runit.sparse.R: Allow rectangular matrix in CholmodDecomposition's factorize method. Test rectangular version of KNex example. 2011-07-28 Douglas Bates * inst/include/unsupported/Eigen/src/SparseExtra/CholmodSupport.h: Added a solveType member - there will be better ways of doing this * inst/include/RcppEigenForward.h: Include the stubs unconditionally * inst/include/RcppEigenStubs.cpp: * inst/include/RcppEigenStubs.h: Inlined the function definitions so it is a header file now * inst/include/unsupported/Eigen/src/SparseExtra/CholmodSupport.h: Extended CholmodDecomposition analyzePattern and factorize to support rectangular matrices. Added factorize_p method, primarily for the factorization of A'A + I. * DESCRIPTION, NAMESPACE, R/SHLIB.R, R/inline.R, inst/include/RcppEigen.h, inst/include/RcppEigenConfig.h, inst/include/RcppEigenForward.h, inst/include/RcppEigenWrap.h, inst/include/unsupported/Eigen/SparseExtra, inst/include/unsupported/Eigen/src/SparseExtra/CholmodSupport.h, inst/skeleton/Makevars, inst/unitTests/runit.sparse.R: Add support for Cholmod functions linked through the Matrix package. Tests for same. Wrap of an Eigen::SparseMatrix now returns an S4 dgCMatrix object. * inst/include/RcppEigenCholmod.h, inst/include/RcppEigenStubs.cpp: Create stubs and the Cholmod header. (To Do: make the stubs inline functions in a header file.) 2011-07-27 Douglas Bates * inst/unitTests/runit.sparse.R: Separate the tests for sparse matrices. 2011-07-26 Douglas Bates * inst/include/unsupported/Eigen/src/SparseExtra/SimplicialCholesky.h: Added matrixLDL method 2011-07-18 Douglas Bates * src/fastLm.cpp: clean up code, taking advantage of the more general wrap methods 2011-07-15 Douglas Bates * inst/include/RcppEigenForward.h, inst/include/RcppEigenWrap.h, inst/unitTests/runit.RcppEigen.R: added as methods for Eigen::SparseMatrix and Eigen::MappedSparseMatrix classes and tests of same 2011-07-13 Douglas Bates * DESCRIPTION, inst/include/RcppEigenWrap.h, inst/unitTests/runit.RcppEigen.R: Dispatch on wrap to vector according to T::ColsAtCompileTime; modify tests to avoid the .eval() methods; bump Rcpp version dependence 2011-07-13 Romain Francois * inst/include/RcppEigenWrap.h: dispatch sparse/dense and generalizes dealing with sparse objects * inst/include/RcppEigenWrap.h: comment non generic implementations * inst/include/RcppEigenForward.h, inst/include/RcppEigenWrap.h: comment non generic implementations * inst/include/RcppEigenWrap.h: deal with dimensions in eigen_wrap_is_plain * inst/include/RcppEigenForward.h, inst/include/RcppEigenWrap.h: first steps into dealing with Eigen expressions 2011-07-10 Douglas Bates * inst/include/PlainObjectBaseAddon.h: Added some begin and end methods to PlainObjectBase template through the extension mechanism. Should make these legitimate iterators to simplify some wrap methods (need a value_type member). * inst/include/RcppEigenForward.h, inst/include/RcppEigenWrap.h, inst/unitTests/runit.RcppEigen.R: Added as methods for mapped vectors and mapped matrices (still some code duplication) and tests of same. 2011-07-09 Dirk Eddelbuettel * inst/unitTests/runit.RcppEigen.R: s/Armadillo/Eigen/ in a few places 2011-07-09 Douglas Bates * inst/unitTests/runit.RcppEigen.R: Added tests of wrap and as. Need to create an as method for mapped arrays. 2011-07-08 Douglas Bates * DESCRIPTION: Prepare for a new release. * inst/include/RcppEigen.h, inst/include/RcppEigenForward.h, inst/include/RcppEigenWrap.h: Add wrap methods for mapped Eigen objects; Initial support for as<> with some Eigen classes. * inst/include/unsupported/Eigen/src/SparseExtra/SimplicialCholesky.h: Commit interim version of the SimplicialLLT and SimplicialLDLT classes that will appear in Eigen 3.0.2 2011-06-30 Douglas Bates * src/fastLm.cpp: Code simplification suggested by Gael Guennebaud 2011-06-29 Dirk Eddelbuettel * DESCRIPTION: make Maintainers equal to Authors (but keep our joint email address) 2011-06-29 Douglas Bates * inst/include/RcppEigenWrap.h: Syntax errors corrected. 2011-06-28 Douglas Bates * inst/examples/lmBenchmark.R: Print the results from do_bench() so echo=TRUE is not needed when sourcing; add a suppressSVD argument to do_bench(). 2011-06-27 Douglas Bates * inst/include/RcppEigenForward.h, inst/include/RcppEigenWrap.h: Add a wrap method (compiles but currently untested) for Eigen::SparseMatrix * src/fastLm.cpp: Include sample code for the Moore-Penrose inverse. There are better ways of doing this but I can't work out the syntax. 2011-06-25 Douglas Bates * inst/examples, inst/examples/lmBenchmark.R: Add lm benchmark example * src/fastLm.cpp: tighten code a bit 2011-06-23 Douglas Bates * src/fastLm.cpp: Don't try to extract names that aren't there. * man/fastLm.Rd: Add a simple example. * src/fastLm.cpp, src/fastLm.h: Add the SymmEig lm method. Preliminary support for setting a tolerance for the rank calculation. * src/RcppEigen.cpp: Use the correct macros for eigen_version. * man/fastLm.Rd, tests/simple.R: Add examples and test cases for simple crossprod and tcrossprod functions * inst/include/RcppEigenForward.h, inst/include/RcppEigenWrap.h: bypass the incomplete exporter functions to support as<> 2011-06-21 Douglas Bates * src/fastLm.cpp, src/fastLm.h: Add an SVD method for fastLm * DESCRIPTION: Minor fix in punctuation. * R/fastLm.R, inst/unitTests/runit.fastLm.R, man/fastLm.Rd, src/fastLm.cpp, src/fastLm.h: Refactoring of the fastLm code to allow algorithms to be added easily. Adjust docs and R code accordingly. 2011-06-17 Douglas Bates * inst/include/RcppEigenForward.h, inst/include/RcppEigenWrap.h: Add wrap instantiations for ArrayXd, ArrayXXd classes in Eigen 2011-06-15 Douglas Bates * DESCRIPTION, R/RcppEigen.package.skeleton.R, R/SHLIB.R, R/fastLm.R, R/inline.R, R/unit.test.R, inst/doc/RcppEigen-unitTests.Rnw, inst/include/RcppEigen.h, inst/include/RcppEigenConfig.h, inst/include/RcppEigenForward.h, inst/include/RcppEigenWrap.h, inst/unitTests/runit.fastLm.R, man/fastLm.Rd, src/RcppEigen.cpp, src/fastLm.cpp: Change references to Armadillo into Eigen; author order in copyright statements. 2011-06-15 Dirk Eddelbuettel * tests/doRUnit.R: oops: s/Armadillo/Eigen/ 2011-06-15 Douglas Bates * src/fastLm.cpp: Initial (not very good) implementation of "fastLmSVD", which isn't really that fast. * man/fastLm.Rd: Minor clarification. 2011-06-15 Douglas Bates * cleanup, inst/doc/*: Documentation based on unit tests. * R/SHLIB.R, R/inline.R, R/unit.test.R: Add support for inline package * inst/include/RcppEigen.h: Add support for sparse Cholesky and LU * inst/include/unsupported, inst/include/unsupported/*: Add support for sparse Cholesky and LU * man/RcppEigen-package.Rd: Add reference to web page * NAMESPACE, R/RcppEigen.package.skeleton.R, inst/skeleton/*, man/RcppEigen.package.skeleton.Rd: Add RcppEigen.package.skeleton and support files. * DESCRIPTION: Remove Conrad's text about Armadillo, as suggested by Dirk 2011-06-15 Dirk Eddelbuettel * inst/unitTests/runit.fastLm.R: better way to load trees dataset * man/fastLm.Rd: better way to load trees dataset * inst/unitTests/runit.fastLm.R, man/fastLm.Rd: added unit tests for fastLm{Bench,Chol1,Chol2} * ChangeLog, inst/unitTests, inst/unitTests/runTests.R, inst/unitTests/runit.fastLm.R, tests, tests/doRUnit.R: added initial unit tests 2011-06-14 Dirk Eddelbuettel * inst/unitTests/*: Added initial unit tests * tests/doRUnit.R: Added hook to run RUnit tests 2011-06-14 Douglas Bates * src/fastLm.cpp: Cosmetic fixes. * DESCRIPTION: New version. * man/fastLm.Rd: Change order of fastLmPure arguments in the example. Dirk said I would miss one of these. :-) * R/fastLm.R, man/fastLm.Rd, src/fastLm.cpp: Change order of fastLmPure arguments and the various fastLm* C++ functions. Add fastLmChol1 and fastLmChol2. 2011-06-13 Douglas Bates * ChangeLog: Add ChangeLog * src/fastLm.cpp: Handle the rank-deficient case. * inst/include/Eigen/src/LU/arch/Inverse_SSE.h: Use an _m128d type instead of long long int for the mask. * R/fastLm.R: Associate names with coefficients. Clean up fastLm. Forward the object through summary. * inst/include/Eigen/src/Core/util/Meta.h: Suppress use of long long RcppEigen/README.md0000644000176200001440000000773014456534622013427 0ustar liggesusers## RcppEigen: R and Eigen via Rcpp [![CI](https://github.com/RcppCore/RcppEigen/workflows/ci/badge.svg)](https://github.com/RcppCore/RcppEigen/actions?query=workflow%3Aci) [![License](https://img.shields.io/badge/license-GPL%20%28%3E=%202%29-brightgreen.svg?style=flat)](https://www.gnu.org/licenses/gpl-2.0.html) [![License](https://img.shields.io/badge/license-MPL2-brightgreen.svg?style=flat)](https://www.mozilla.org/MPL/2.0/) [![CRAN](https://www.r-pkg.org/badges/version/RcppEigen)](https://cran.r-project.org/package=RcppEigen) [![r-universe](https://rcppcore.r-universe.dev/badges/RcppEigen)](https://rcppcore.r-universe.dev/RcppEigen) [![Dependencies](https://tinyverse.netlify.com/badge/RcppEigen)](https://cran.r-project.org/package=RcppEigen) [![Coverage Status](https://codecov.io/gh/RcppCore/RcppEigen/graph/badge.svg)](https://app.codecov.io/github/RcppCore/RcppEigen?branch=master) [![Debian package](https://img.shields.io/debian/v/r-cran-rcppeigen/sid?color=brightgreen)](https://packages.debian.org/sid/r-cran-rcppeigen) [![Last Commit](https://img.shields.io/github/last-commit/RcppCore/RcppEigen)](https://github.com/RcppCore/RcppEigen) [![Downloads (monthly)](https://cranlogs.r-pkg.org/badges/RcppEigen?color=brightgreen)](https://www.r-pkg.org:443/pkg/RcppEigen) [![Downloads (total)](https://cranlogs.r-pkg.org/badges/grand-total/RcppEigen?color=brightgreen)](https://www.r-pkg.org:443/pkg/RcppEigen) [![CRAN use](https://jangorecki.gitlab.io/rdeps/RcppEigen/CRAN_usage.svg?sanitize=true)](https://cran.r-project.org/package=RcppEigen) [![BioConductor use](https://jangorecki.gitlab.io/rdeps/RcppEigen/BioC_usage.svg?sanitize=true)](https://cran.r-project.org/package=RcppEigen) [![StackOverflow](https://img.shields.io/badge/stackoverflow-rcpp-orange.svg)](https://stackoverflow.com/questions/tagged/rcpp) [![JSS](https://img.shields.io/badge/JSS-10.18637%2Fjss.v052.i05-brightgreen)](https://doi.org/10.18637/jss.v052.i05) ### Synopsis [Eigen](http://eigen.tuxfamily.org/index.php?title=Main_Page) is a C++ template library for linear algebra: matrices, vectors, numerical solvers and related algorithms. It supports dense and sparse matrices on integer, floating point and complex numbers, decompositions of such matrices, and solutions of linear systems. Its performance on many algorithms is comparable with some of the best implementations based on `Lapack` and level-3 `BLAS`. RcppEigen provides an interface from R to and from [Eigen](http://eigen.tuxfamily.org/index.php?title=Main_Page) by using the facilities offered by the [Rcpp](http://dirk.eddelbuettel.com/code/rcpp.html) package for seamless R and C++ integration. ### Examples A few examples are over at the [Rcpp Gallery](https://gallery.rcpp.org/tags/eigen/). A simple one is ```c++ #include // [[Rcpp::depends(RcppEigen)]] using Eigen::Map; // 'maps' rather than copies using Eigen::MatrixXd; // variable size matrix, double precision using Eigen::VectorXd; // variable size vector, double precision using Eigen::SelfAdjointEigenSolver; // one of the eigenvalue solvers // [[Rcpp::export]] VectorXd getEigenValues(Map M) { SelfAdjointEigenSolver es(M); return es.eigenvalues(); } ``` which can be turned into a function callable from R via a simple ``` sourceCpp("eigenExample.cpp") ``` due to the two Rcpp directives to use headers from the RcppEigen package, and to export the `getEigenValues()` function -- but read [the full post](https://gallery.rcpp.org/articles/eigen-eigenvalues/) for details. ### Status The package is mature and under active development, following the [Eigen](http://eigen.tuxfamily.org/index.php?title=Main_Page) release cycle. ### Documentation The package contains a pdf vignette which is a pre-print of the [paper by Bates and Eddelbuettel](https://doi.org/10.18637/jss.v052.i05) in JSS (2013, v52i05). ### Authors Douglas Bates, Dirk Eddelbuettel, Romain Francois, and Yixuan Qiu ### License GPL (>= 2) RcppEigen/man/0000755000176200001440000000000013763205127012707 5ustar liggesusersRcppEigen/man/fastLm.Rd0000644000176200001440000001132513763205127014426 0ustar liggesusers\name{fastLm} \alias{fastLm} \alias{fastLm.default} \alias{fastLm.formula} \alias{fastLmPure} \concept{regression} \title{Bare-bones linear model fitting function} \description{ \code{fastLm} estimates the linear model using one of several methods implemented using the \code{Eigen} linear algebra library. } \usage{ fastLmPure(X, y, method = 0L) fastLm(X, \dots) \method{fastLm}{default}(X, y, method = 0L, \dots) \method{fastLm}{formula}(formula, data = list(), method = 0L, \dots) } \arguments{ \item{y}{the response vector} \item{X}{a model matrix} \item{formula}{an object of class \code{"\link{formula}"} (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given in the \sQuote{Details} section of the documentation for \code{\link{lm}}.} \item{data}{an optional data frame, list or environment (or object coercible by \code{\link{as.data.frame}} to a data frame) containing the variables in the model. If not found in \code{data}, the variables are taken from \code{environment(formula)}, typically the environment from which \code{lm} is called.} \item{method}{an integer scalar with value 0 for the column-pivoted QR decomposition, 1 for the unpivoted QR decomposition, 2 for the LLT Cholesky, 3 for the LDLT Cholesky, 4 for the Jacobi singular value decomposition (SVD) and 5 for a method based on the eigenvalue-eigenvector decomposition of \eqn{\mathbf{X}^\prime\mathbf{X}}{X'X}. Default is zero.} \item{\dots}{not used} } \details{ Linear models should be estimated using the \code{\link{lm}} function. In some cases, \code{\link{lm.fit}} may be appropriate. The \code{fastLmPure} function provides a reference use case of the \code{Eigen} C++ template library via the wrapper functions in the \pkg{RcppEigen} package. The \code{fastLm} function provides a more standard implementation of a linear model fit, offering both a default and a formula interface as well as \code{print}, \code{summary} and \code{predict} methods. Internally the \code{fastLm} function, by default, uses a QR decomposition with column pivots, which is a rank-revealing decomposition, so that it can handle rank-deficient cases effectively. Other methods for determining least squares solutions are available according to the value of the \code{method} argument. An example of the type of situation requiring extra care in checking for rank deficiency is a two-way layout with missing cells (see the examples section). These cases require a special pivoting scheme of \dQuote{pivot only on (apparent) rank deficiency} which is not part of conventional linear algebra software. } \value{ \code{fastLmPure} returns a list with several components: \item{coefficients}{a vector of coefficients} \item{se}{a vector of the standard errors of the coefficient estimates} \item{rank}{a scalar denoting the computed rank of the model matrix} \item{df.residual}{a scalar denoting the degrees of freedom in the model} \item{residuals}{the vector of residuals} \item{s}{a numeric scalar - the root mean square for residuals} \item{fitted.values}{the vector of fitted value} \code{fastLm} returns a richer object which also includes the call argument similar to the \code{\link{lm}} or \code{\link[MASS]{rlm}} functions.. } \seealso{\code{\link{lm}}, \code{\link{lm.fit}}} \references{ Douglas Bates and Dirk Eddelbuettel (2013). Fast and Elegant Numerical Linear Algebra Using the \pkg{RcppEigen} Package. \emph{Journal of Statistical Software}, \bold{52(5)}, 1-24. URL http://www.jstatsoft.org/v52/i05/. } \author{ Eigen is described at \url{http://eigen.tuxfamily.org/index.php?title=Main_Page}. RcppEigen is written by Douglas Bates, Dirk Eddelbuettel and Romain Francois. } \examples{ data(trees, package="datasets") mm <- cbind(1, log(trees$Girth)) # model matrix y <- log(trees$Volume) # response ## bare-bones direct interface flm <- fastLmPure(mm, y) print(flm) ## standard R interface for formula or data returning object of class fastLm flmmod <- fastLm( log(Volume) ~ log(Girth), data=trees) summary(flmmod) ## case where non-rank-revealing methods break down dd <- data.frame(f1 = gl(4, 6, labels = LETTERS[1:4]), f2 = gl(3, 2, labels = letters[1:3]))[-(7:8), ] xtabs(~ f2 + f1, dd) # one missing cell mm <- model.matrix(~ f1 * f2, dd) kappa(mm) # large, indicating rank deficiency set.seed(1) dd$y <- mm \%*\% seq_len(ncol(mm)) + rnorm(nrow(mm), sd = 0.1) summary(lm(y ~ f1 * f2, dd)) # detects rank deficiency try(summary(fastLm(y ~ f1 * f2, dd))) # also detects rank deficiency } \keyword{regression} RcppEigen/man/RcppEigen.package.skeleton.Rd0000644000176200001440000000467713550713140020276 0ustar liggesusers\name{RcppEigen.package.skeleton} \alias{RcppEigen.package.skeleton} \title{ Create a skeleton for a new package that intends to use RcppEigen } \description{ \code{RcppEigen.package.skeleton} automates the creation of a new source package that intends to use features of RcppEigen. It is based on the \link[utils]{package.skeleton} function which it executes first. } \usage{ RcppEigen.package.skeleton(name = "anRpackage", list = character(), environment = .GlobalEnv, path = ".", force = FALSE, code_files = character(), example_code = TRUE) } \arguments{ \item{name}{See \link[utils]{package.skeleton}} \item{list}{See \link[utils]{package.skeleton}} \item{environment}{See \link[utils]{package.skeleton}} \item{path}{See \link[utils]{package.skeleton}} \item{force}{See \link[utils]{package.skeleton}} \item{code_files}{See \link[utils]{package.skeleton}} \item{example_code}{If TRUE, example C++ code using RcppEigen is added to the package} } \details{ In addition to \link[utils]{package.skeleton} : The \samp{DESCRIPTION} file gains a Depends line requesting that the package depends on Rcpp and RcppEigen and a LinkingTo line so that the package finds Rcpp and RcppEigen header files. The \samp{NAMESPACE} gains a \code{useDynLib} directive. The \samp{src} directory is created if it does not exists and a \samp{Makevars} file is added setting the environment variable \samp{PKG_LIBS} to accomodate the necessary flags to link with the Rcpp library. If the \code{example_code} argument is set to \code{TRUE}, example files \samp{rcppeigen_hello_world.h} and \samp{rcppeigen_hello_world.cpp} are also created in the \samp{src}. An R file \samp{rcppeigen_hello_world.R} is expanded in the \samp{R} directory, the \code{rcppeigen_hello_world} function defined in this files makes use of the C++ function \samp{rcppeigen_hello_world} defined in the C++ file. These files are given as an example and should eventually by removed from the generated package. } \value{ Nothing, used for its side effects } \seealso{ \link[utils]{package.skeleton} } \references{ Read the \emph{Writing R Extensions} manual for more details. Once you have created a \emph{source} package you need to install it: see the \emph{R Installation and Administration} manual, \code{\link{INSTALL}} and \code{\link{install.packages}}. } \examples{ \dontrun{ RcppEigen.package.skeleton("foobar") } } \keyword{ programming } RcppEigen/man/RcppEigen-package.Rd0000644000176200001440000000177313763205127016453 0ustar liggesusers\name{RcppEigen-package} \alias{RcppEigen-package} \alias{RcppEigen} \docType{package} \title{ Rcpp/Eigen bridge } \description{ The package eases the use of the Eigen C++ template library for linear algebra with Rcpp } \details{ This package contains the header files for the Eigen C++ template library. The typical usage is to install this package and list it in the \env{LinkingTo: } line in the \file{DESCRIPTION} file of other packages. The C++ source code and the R source code in this package are for illustration only. As described at the Eigen project's home page, \url{http://eigen.tuxfamily.org/index.php?title=Main_Page}, Eigen is a versatile, fast, reliable and elegant collection of C++ classes for linear algebra. } \references{ Douglas Bates and Dirk Eddelbuettel (2013). Fast and Elegant Numerical Linear Algebra Using the \pkg{RcppEigen} Package. \emph{Journal of Statistical Software}, \bold{52(5)}, 1-24. URL http://www.jstatsoft.org/v52/i05/. } \keyword{ package } RcppEigen/DESCRIPTION0000644000176200001440000000336314567702434013655 0ustar liggesusersPackage: RcppEigen Type: Package Title: 'Rcpp' Integration for the 'Eigen' Templated Linear Algebra Library Version: 0.3.4.0.0 Date: 2024-02-28 Author: Douglas Bates, Dirk Eddelbuettel, Romain Francois, and Yixuan Qiu; the authors of Eigen for the included version of Eigen Maintainer: Dirk Eddelbuettel Copyright: See the file COPYRIGHTS for various Eigen copyright details Description: R and 'Eigen' integration using 'Rcpp'. 'Eigen' is a C++ template library for linear algebra: matrices, vectors, numerical solvers and related algorithms. It supports dense and sparse matrices on integer, floating point and complex numbers, decompositions of such matrices, and solutions of linear systems. Its performance on many algorithms is comparable with some of the best implementations based on 'Lapack' and level-3 'BLAS'. The 'RcppEigen' package includes the header files from the 'Eigen' C++ template library. Thus users do not need to install 'Eigen' itself in order to use 'RcppEigen'. Since version 3.1.1, 'Eigen' is licensed under the Mozilla Public License (version 2); earlier version were licensed under the GNU LGPL version 3 or later. 'RcppEigen' (the 'Rcpp' bindings/bridge to 'Eigen') is licensed under the GNU GPL version 2 or later, as is the rest of 'Rcpp'. License: GPL (>= 2) | file LICENSE LazyLoad: yes Depends: R (>= 3.6.0) LinkingTo: Rcpp Imports: Rcpp (>= 0.11.0), stats, utils Suggests: Matrix, inline, tinytest, pkgKitten, microbenchmark URL: https://github.com/RcppCore/RcppEigen, https://dirk.eddelbuettel.com/code/rcpp.eigen.html BugReports: https://github.com/RcppCore/RcppEigen/issues NeedsCompilation: yes Packaged: 2024-02-28 13:13:53 UTC; edd Repository: CRAN Date/Publication: 2024-02-28 19:10:20 UTC RcppEigen/build/0000755000176200001440000000000014567630620013236 5ustar liggesusersRcppEigen/build/vignette.rds0000644000176200001440000000040714567630620015576 0ustar liggesusersQn0 MmNcHⰫiM)MT/sFFФY86b,)K9|BOzO |(JcEE9CVyyBE_-w“d鯼Q ʇ?^Jd[ !:u BU7X7)y'F1"qVێ寄np ]f穮_ldhuw[ s$WWJǫU y;/RcppEigen/tests/0000755000176200001440000000000013557014221013270 5ustar liggesusersRcppEigen/tests/tinytest.R0000644000176200001440000000103413557014221015274 0ustar liggesusers if (requireNamespace("tinytest", quietly=TRUE) && utils::packageVersion("tinytest") >= "1.0.0") { ## Set a seed to make the test deterministic set.seed(42) ## R makes us to this Sys.setenv("R_TESTS"="") ## there are several more granular ways to test files in a tinytest directory, ## see its package vignette; tests can also run once the package is installed ## using the same command `test_package(pkgName)`, or by director or file tinytest::test_package("RcppEigen", ncpu=getOption("Ncpus", 1)) } RcppEigen/src/0000755000176200001440000000000014567630621012727 5ustar liggesusersRcppEigen/src/fastLm.h0000644000176200001440000000727612565743640014344 0ustar liggesusers// -*- mode: C++; c-indent-level: 4; c-basic-offset: 4; tab-width: 8 -*- // // fastLm.h: Rcpp/Eigen example of a simple lm() alternative // // Copyright (C) 2011 - 2015 Douglas Bates, Dirk Eddelbuettel and Romain Francois // // This file is part of RcppEigen. // // RcppEigen is free software: you can redistribute it and/or modify it // under the terms of the GNU General Public License as published by // the Free Software Foundation, either version 2 of the License, or // (at your option) any later version. // // RcppEigen is distributed in the hope that it will be useful, but // WITHOUT ANY WARRANTY; without even the implied warranty of // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the // GNU General Public License for more details. // // You should have received a copy of the GNU General Public License // in file.path(R.home("share"), "licenses"). If not, see // . #ifndef RCPPEIGEN_FASTLM_H #define RCPPEIGEN_FASTLM_H #include namespace lmsol { using Eigen::ArrayXd; using Eigen::ColPivHouseholderQR; using Eigen::ComputeThinU; using Eigen::ComputeThinV; using Eigen::HouseholderQR; using Eigen::JacobiSVD; using Eigen::LDLT; using Eigen::LLT; using Eigen::Lower; using Eigen::Map; using Eigen::MatrixXd; using Eigen::SelfAdjointEigenSolver; using Eigen::SelfAdjointView; using Eigen::TriangularView; using Eigen::VectorXd; using Eigen::Upper; typedef MatrixXd::Index Index; typedef MatrixXd::Scalar Scalar; typedef MatrixXd::RealScalar RealScalar; typedef ColPivHouseholderQR::PermutationType Permutation; typedef Permutation::IndicesType Indices; class lm { protected: Map m_X; /**< model matrix */ Map m_y; /**< response vector */ Index m_n; /**< number of rows of X */ Index m_p; /**< number of columns of X */ VectorXd m_coef; /**< coefficient vector */ int m_r; /**< computed rank or NA_INTEGER */ VectorXd m_fitted; /**< vector of fitted values */ VectorXd m_se; /**< standard errors */ RealScalar m_prescribedThreshold; /**< user specified tolerance */ bool m_usePrescribedThreshold; public: lm(const Map&, const Map&); ArrayXd Dplus(const ArrayXd& D); MatrixXd I_p() const {return MatrixXd::Identity(m_p, m_p);} MatrixXd XtX() const; // setThreshold and threshold are based on ColPivHouseholderQR methods RealScalar threshold() const; const VectorXd& se() const {return m_se;} const VectorXd& coef() const {return m_coef;} const VectorXd& fitted() const {return m_fitted;} int rank() const {return m_r;} lm& setThreshold(const RealScalar&); }; class ColPivQR : public lm { public: ColPivQR(const Map&, const Map&); }; class Llt : public lm { public: Llt(const Map&, const Map&); }; class Ldlt : public lm { public: Ldlt(const Map&, const Map&); }; class QR : public lm { public: QR(const Map&, const Map&); }; class GESDD : public lm { public: GESDD(const Map&, const Map&); }; class SVD : public lm { public: SVD(const Map&, const Map&); }; class SymmEigen : public lm { public: SymmEigen(const Map&, const Map&); }; } // extern "C" SEXP fastLm(SEXP Xs, SEXP ys, SEXP types); #endif RcppEigen/src/fastLm.cpp0000644000176200001440000002171214562417221014656 0ustar liggesusers// -*- mode: C++; c-indent-level: 4; c-basic-offset: 4; tab-width: 8 -*- // // fastLm.cpp: Rcpp/Eigen example of a simple lm() alternative // // Copyright (C) 2011 - 2022 Douglas Bates, Dirk Eddelbuettel and Romain Francois // // This file is part of RcppEigen. // // RcppEigen is free software: you can redistribute it and/or modify it // under the terms of the GNU General Public License as published by // the Free Software Foundation, either version 2 of the License, or // (at your option) any later version. // // RcppEigen is distributed in the hope that it will be useful, but // WITHOUT ANY WARRANTY; without even the implied warranty of // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the // GNU General Public License for more details. // // You should have received a copy of the GNU General Public License // in file.path(R.home("share"), "licenses"). If not, see // . #include "fastLm.h" #if !defined(EIGEN_USE_MKL) // don't use R Lapack.h if MKL is enabled #include #endif #ifndef FCONE # define FCONE #endif namespace lmsol { using Rcpp::_; using Rcpp::as; using Rcpp::CharacterVector; using Rcpp::clone; using Rcpp::List; using Rcpp::NumericMatrix; using Rcpp::NumericVector; using Rcpp::RObject; using Rcpp::wrap; using std::invalid_argument; using std::numeric_limits; lm::lm(const Map &X, const Map &y) : m_X(X), m_y(y), m_n(X.rows()), m_p(X.cols()), m_coef(VectorXd::Constant(m_p, ::NA_REAL)), // #nocov m_r(::NA_INTEGER), m_fitted(m_n), m_se(VectorXd::Constant(m_p, ::NA_REAL)), // #nocov m_usePrescribedThreshold(false) { } lm& lm::setThreshold(const RealScalar& threshold) { // #nocov start m_usePrescribedThreshold = true; m_prescribedThreshold = threshold; return *this; // #nocov end } inline ArrayXd lm::Dplus(const ArrayXd& d) { ArrayXd di(d.size()); double comp(d.maxCoeff() * threshold()); for (int j = 0; j < d.size(); ++j) di[j] = (d[j] < comp) ? 0. : 1./d[j]; m_r = (di != 0.).count(); return di; } MatrixXd lm::XtX() const { return MatrixXd(m_p, m_p).setZero().selfadjointView(). rankUpdate(m_X.adjoint()); } /** Returns the threshold that will be used by certain methods such as rank(). * * The default value comes from experimenting (see "LU precision * tuning" thread on the Eigen list) and turns out to be * identical to Higham's formula used already in LDLt. * * @return The user-prescribed threshold or the default. */ RealScalar lm::threshold() const { return m_usePrescribedThreshold ? m_prescribedThreshold : numeric_limits::epsilon() * m_p; } ColPivQR::ColPivQR(const Map &X, const Map &y) : lm(X, y) { ColPivHouseholderQR PQR(X); // decompose the model matrix Permutation Pmat(PQR.colsPermutation()); m_r = PQR.rank(); if (m_r == m_p) { // full rank case m_coef = PQR.solve(y); m_fitted = X * m_coef; m_se = Pmat * PQR.matrixQR().topRows(m_p). triangularView().solve(I_p()).rowwise().norm(); return; } MatrixXd Rinv(PQR.matrixQR().topLeftCorner(m_r, m_r). // #nocov start triangularView(). solve(MatrixXd::Identity(m_r, m_r))); VectorXd effects(PQR.householderQ().adjoint() * y); m_coef.head(m_r) = Rinv * effects.head(m_r); m_coef = Pmat * m_coef; // create fitted values from effects // (can't use X*m_coef if X is rank-deficient) effects.tail(m_n - m_r).setZero(); m_fitted = PQR.householderQ() * effects; m_se.head(m_r) = Rinv.rowwise().norm(); m_se = Pmat * m_se; // #nocov end } QR::QR(const Map &X, const Map &y) : lm(X, y) { HouseholderQR QR(X); m_coef = QR.solve(y); m_fitted = X * m_coef; m_se = QR.matrixQR().topRows(m_p). triangularView().solve(I_p()).rowwise().norm(); } Llt::Llt(const Map &X, const Map &y) : lm(X, y) { LLT Ch(XtX().selfadjointView()); m_coef = Ch.solve(X.adjoint() * y); m_fitted = X * m_coef; m_se = Ch.matrixL().solve(I_p()).colwise().norm(); } Ldlt::Ldlt(const Map &X, const Map &y) : lm(X, y) { LDLT Ch(XtX().selfadjointView()); Dplus(Ch.vectorD()); // to set the rank //FIXME: Check on the permutation in the LDLT and incorporate it in //the coefficients and the standard error computation. // m_coef = Ch.matrixL().adjoint(). // solve(Dplus(D) * Ch.matrixL().solve(X.adjoint() * y)); m_coef = Ch.solve(X.adjoint() * y); m_fitted = X * m_coef; m_se = Ch.solve(I_p()).diagonal().array().sqrt(); } int gesdd(MatrixXd& A, ArrayXd& S, MatrixXd& Vt) { int info, mone = -1, m = A.rows(), n = A.cols(); std::vector iwork(8 * n); double wrk; if (m < n || S.size() != n || Vt.rows() != n || Vt.cols() != n) throw std::invalid_argument("dimension mismatch in gesvd"); // #nocov F77_CALL(dgesdd)("O", &m, &n, A.data(), &m, S.data(), A.data(), &m, Vt.data(), &n, &wrk, &mone, &iwork[0], &info FCONE); int lwork(wrk); std::vector work(lwork); F77_CALL(dgesdd)("O", &m, &n, A.data(), &m, S.data(), A.data(), &m, Vt.data(), &n, &work[0], &lwork, &iwork[0], &info FCONE); return info; } GESDD::GESDD(const Map& X, const Map &y) : lm(X, y) { MatrixXd U(X), Vt(m_p, m_p); ArrayXd S(m_p); if (gesdd(U, S, Vt)) throw std::runtime_error("error in gesdd"); MatrixXd VDi(Vt.adjoint() * Dplus(S).matrix().asDiagonal()); m_coef = VDi * U.adjoint() * y; m_fitted = X * m_coef; m_se = VDi.rowwise().norm(); } SVD::SVD(const Map &X, const Map &y) : lm(X, y) { JacobiSVD UDV(X.jacobiSvd(ComputeThinU|ComputeThinV)); MatrixXd VDi(UDV.matrixV() * Dplus(UDV.singularValues().array()).matrix().asDiagonal()); m_coef = VDi * UDV.matrixU().adjoint() * y; m_fitted = X * m_coef; m_se = VDi.rowwise().norm(); } SymmEigen::SymmEigen(const Map &X, const Map &y) : lm(X, y) { SelfAdjointEigenSolver eig(XtX().selfadjointView()); MatrixXd VDi(eig.eigenvectors() * Dplus(eig.eigenvalues().array()).sqrt().matrix().asDiagonal()); m_coef = VDi * VDi.adjoint() * X.adjoint() * y; m_fitted = X * m_coef; m_se = VDi.rowwise().norm(); } enum {ColPivQR_t = 0, QR_t, LLT_t, LDLT_t, SVD_t, SymmEigen_t, GESDD_t}; static inline lm do_lm(const Map &X, const Map &y, int type) { switch(type) { case ColPivQR_t: return ColPivQR(X, y); case QR_t: return QR(X, y); case LLT_t: return Llt(X, y); case LDLT_t: return Ldlt(X, y); case SVD_t: return SVD(X, y); case SymmEigen_t: return SymmEigen(X, y); case GESDD_t: return GESDD(X, y); } throw invalid_argument("invalid type"); return ColPivQR(X, y); // -Wall } List fastLm(Rcpp::NumericMatrix Xs, Rcpp::NumericVector ys, int type) { const Map X(as >(Xs)); const Map y(as >(ys)); Index n = X.rows(); if ((Index)y.size() != n) throw invalid_argument("size mismatch"); // Select and apply the least squares method lm ans(do_lm(X, y, type)); // Copy coefficients and install names, if any NumericVector coef(wrap(ans.coef())); List dimnames(NumericMatrix(Xs).attr("dimnames")); if (dimnames.size() > 1) { RObject colnames = dimnames[1]; if (!(colnames).isNULL()) coef.attr("names") = clone(CharacterVector(colnames)); } VectorXd resid = y - ans.fitted(); int rank = ans.rank(); int df = (rank == ::NA_INTEGER) ? n - X.cols() : n - rank; double s = resid.norm() / std::sqrt(double(df)); // Create the standard errors VectorXd se = s * ans.se(); return List::create(_["coefficients"] = coef, _["se"] = se, _["rank"] = rank, _["df.residual"] = df, _["residuals"] = resid, _["s"] = s, _["fitted.values"] = ans.fitted()); } } // This defines the R-callable function 'fastLm' // [[Rcpp::export]] Rcpp::List fastLm_Impl(Rcpp::NumericMatrix X, Rcpp::NumericVector y, int type) { return lmsol::fastLm(X, y, type); } RcppEigen/src/Makevars0000644000176200001440000000077414456323545014434 0ustar liggesusers## -*- mode: makefile; -*- ## One could add '-fopenmp' here (and below) for multithreaded operations PKG_CXXFLAGS = -I../inst/include ## One could add '-fopenmp' here (and above) for multithreaded operations PKG_LIBS = $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS) ## We are not enabling multithreaded operations by default because this would be ## a change in behaviour that would like create trouble for packages using RcppEigen ## as CRAN tests of those packages may seen use of more than two cores and protest RcppEigen/src/Makevars.win0000644000176200001440000000077414456323674015233 0ustar liggesusers## -*- mode: makefile; -*- ## One could add '-fopenmp' here (and below) for multithreaded operations PKG_CXXFLAGS = -I../inst/include ## One could add '-fopenmp' here (and above) for multithreaded operations PKG_LIBS = $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS) ## We are not enabling multithreaded operations by default because this would be ## a change in behaviour that would like create trouble for packages using RcppEigen ## as CRAN tests of those packages may seen use of more than two cores and protest RcppEigen/src/RcppEigen.cpp0000644000176200001440000000306414456320130015276 0ustar liggesusers// -*- mode: C++; c-indent-level: 4; c-basic-offset: 4; tab-width: 8 -*- // // RcppEigen.cpp: Rcpp/Eigen glue // // Copyright (C) 2011 - 2023 Douglas Bates, Dirk Eddelbuettel and Romain Francois // // This file is part of RcppEigen. // // RcppEigen is free software: you can redistribute it and/or modify it // under the terms of the GNU General Public License as published by // the Free Software Foundation, either version 2 of the License, or // (at your option) any later version. // // RcppEigen is distributed in the hope that it will be useful, but // WITHOUT ANY WARRANTY; without even the implied warranty of // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the // GNU General Public License for more details. // // You should have received a copy of the GNU General Public License // along with RcppEigen. If not, see . #include // [[Rcpp::export]] Rcpp::IntegerVector eigen_version(bool single) { using Rcpp::_; using Rcpp::IntegerVector; if (single) { return Rcpp::wrap( 10000 * EIGEN_WORLD_VERSION + 100 * EIGEN_MAJOR_VERSION + EIGEN_MINOR_VERSION ) ; } return IntegerVector::create(_["major"] = EIGEN_WORLD_VERSION, _["minor"] = EIGEN_MAJOR_VERSION, _["patch"] = EIGEN_MINOR_VERSION); } // [[Rcpp::export]] bool Eigen_SSE() { return Rcpp::wrap(Eigen::SimdInstructionSetsInUse()); } // [[Rcpp::export]] int EigenNbThreads() { return Eigen::nbThreads(); } RcppEigen/src/RcppExports.cpp0000644000176200001440000000441614456320632015724 0ustar liggesusers// Generated by using Rcpp::compileAttributes() -> do not edit by hand // Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393 #include "../inst/include/RcppEigen.h" #include using namespace Rcpp; #ifdef RCPP_USE_GLOBAL_ROSTREAM Rcpp::Rostream& Rcpp::Rcout = Rcpp::Rcpp_cout_get(); Rcpp::Rostream& Rcpp::Rcerr = Rcpp::Rcpp_cerr_get(); #endif // eigen_version Rcpp::IntegerVector eigen_version(bool single); RcppExport SEXP _RcppEigen_eigen_version(SEXP singleSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< bool >::type single(singleSEXP); rcpp_result_gen = Rcpp::wrap(eigen_version(single)); return rcpp_result_gen; END_RCPP } // Eigen_SSE bool Eigen_SSE(); RcppExport SEXP _RcppEigen_Eigen_SSE() { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; rcpp_result_gen = Rcpp::wrap(Eigen_SSE()); return rcpp_result_gen; END_RCPP } // EigenNbThreads int EigenNbThreads(); RcppExport SEXP _RcppEigen_EigenNbThreads() { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; rcpp_result_gen = Rcpp::wrap(EigenNbThreads()); return rcpp_result_gen; END_RCPP } // fastLm_Impl Rcpp::List fastLm_Impl(Rcpp::NumericMatrix X, Rcpp::NumericVector y, int type); RcppExport SEXP _RcppEigen_fastLm_Impl(SEXP XSEXP, SEXP ySEXP, SEXP typeSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< Rcpp::NumericMatrix >::type X(XSEXP); Rcpp::traits::input_parameter< Rcpp::NumericVector >::type y(ySEXP); Rcpp::traits::input_parameter< int >::type type(typeSEXP); rcpp_result_gen = Rcpp::wrap(fastLm_Impl(X, y, type)); return rcpp_result_gen; END_RCPP } static const R_CallMethodDef CallEntries[] = { {"_RcppEigen_eigen_version", (DL_FUNC) &_RcppEigen_eigen_version, 1}, {"_RcppEigen_Eigen_SSE", (DL_FUNC) &_RcppEigen_Eigen_SSE, 0}, {"_RcppEigen_EigenNbThreads", (DL_FUNC) &_RcppEigen_EigenNbThreads, 0}, {"_RcppEigen_fastLm_Impl", (DL_FUNC) &_RcppEigen_fastLm_Impl, 3}, {NULL, NULL, 0} }; RcppExport void R_init_RcppEigen(DllInfo *dll) { R_registerRoutines(dll, NULL, CallEntries, NULL, NULL); R_useDynamicSymbols(dll, FALSE); } RcppEigen/vignettes/0000755000176200001440000000000014567630621014150 5ustar liggesusersRcppEigen/vignettes/RcppEigen-Introduction.Rnw0000644000176200001440000024401313471777756021215 0ustar liggesusers\documentclass[shortnames,article,nojss]{jss} \usepackage{booktabs,bm,amsmath,thumbpdf} %\VignetteIndexEntry{RcppEigen-intro} %\VignetteKeywords{linear algebra, template programming, C++, R, Rcpp} %\VignettePackage{RcppEigen} %% VIGNETTE <>= pkgVersion <- packageDescription("RcppEigen")$Version pkgDate <- packageDescription("RcppEigen")$Date prettyDate <- format(Sys.Date(), "%B %e, %Y") #require("RcppEigen") #eigenVersion <- paste(unlist(.Call("eigen_version", FALSE)), collapse=".") @ \author{Douglas Bates\\University of Wisconsin-Madison \And Dirk Eddelbuettel\\Debian Project} \Plainauthor{Douglas Bates, Dirk Eddelbuettel} \title{Fast and Elegant Numerical Linear Algebra Using the \pkg{RcppEigen} Package} \Plaintitle{Fast and Elegant Numerical Linear Algebra Using the RcppEigen Package} \Shorttitle{Fast and Elegant Numerical Linear Algebra with \pkg{RcppEigen}} \Abstract{ The \pkg{RcppEigen} package provides access from \proglang{R} \citep{R:Main} to the \pkg{Eigen} \citep*{Eigen:Web} \proglang{C++} template library for numerical linear algebra. \pkg{Rcpp} \citep{CRAN:Rcpp,Eddelbuettel:2013:Rcpp} classes and specializations of the \proglang{C++} templated functions \code{as} and \code{wrap} from \pkg{Rcpp} provide the ``glue'' for passing objects from \proglang{R} to \proglang{C++} and back. Several introductory examples are presented. This is followed by an in-depth discussion of various available approaches for solving least-squares problems, including rank-revealing methods, concluding with an empirical run-time comparison. Last but not least, sparse matrix methods are discussed. } \Keywords{linear algebra, template programming, \proglang{R}, \proglang{C++}, \pkg{Rcpp}} \Plainkeywords{linear algebra, template programmig, R, C++, Rcpp} \Address{ Douglas Bates \\ Department of Statistics \\ University of Wisconsin-Madison \\ Madison, WI, United States of America \\ E-mail: \email{bates@stat.wisc.edu} \\ URL: \url{http://www.stat.wisc.edu/~bates/}\\ Dirk Eddelbuettel \\ Debian Project \\ River Forest, IL, United States of America\\ E-mail: \email{edd@debian.org}\\ URL: \url{http://dirk.eddelbuettel.com}\\ } \usepackage{Sweave} \newcommand{\argmin}{\operatorname{argmin}\displaylimits} \newcommand{\rank}{\operatorname{rank}} %% highlights macros %% Style definition file generated by highlight 2.7, http://www.andre-simon.de/ % Highlighting theme definition: \newcommand{\hlstd}[1]{\textcolor[rgb]{0,0,0}{#1}} \newcommand{\hlnum}[1]{\textcolor[rgb]{0,0,0}{#1}} \newcommand{\hlopt}[1]{\textcolor[rgb]{0,0,0}{#1}} \newcommand{\hlesc}[1]{\textcolor[rgb]{0.74,0.55,0.55}{#1}} \newcommand{\hlstr}[1]{\textcolor[rgb]{0.90,0.15,0.15}{#1}} \newcommand{\hldstr}[1]{\textcolor[rgb]{0.74,0.55,0.55}{#1}} \newcommand{\hlslc}[1]{\textcolor[rgb]{0.67,0.13,0.13}{\it{#1}}} \newcommand{\hlcom}[1]{\textcolor[rgb]{0.67,0.13,0.13}{\it{#1}}} \newcommand{\hldir}[1]{\textcolor[rgb]{0,0,0}{#1}} \newcommand{\hlsym}[1]{\textcolor[rgb]{0,0,0}{#1}} \newcommand{\hlline}[1]{\textcolor[rgb]{0.33,0.33,0.33}{#1}} \newcommand{\hlkwa}[1]{\textcolor[rgb]{0.61,0.13,0.93}{\bf{#1}}} \newcommand{\hlkwb}[1]{\textcolor[rgb]{0.13,0.54,0.13}{#1}} \newcommand{\hlkwc}[1]{\textcolor[rgb]{0,0,1}{#1}} \newcommand{\hlkwd}[1]{\textcolor[rgb]{0,0,0}{#1}} \definecolor{bgcolor}{rgb}{1,1,1} % ------------------------------------------------------------------------ \begin{document} \SweaveOpts{engine=R,eps=FALSE} \begin{quote} \footnotesize This vignette corresponds to a \href{http://www.jstatsoft.org/v52/i05/}{paper published} in the \textsl{Journal of Statistical Software}. Currently still identical to the paper, this vignette version may over time receive minor updates. For citations, please use \citet{JSS:RcppEigen} as provided by \code{citation("RcppEigen")}. This version corresponds to \pkg{RcppEigen} version \Sexpr{pkgVersion} and was typeset on \Sexpr{prettyDate}. \end{quote} \section{Introduction} \label{sec:intro} Linear algebra is an essential building block of statistical computing. Operations such as matrix decompositions, linear program solvers, and eigenvalue/eigenvector computations are used in many estimation and analysis routines. As such, libraries supporting linear algebra have long been provided by statistical programmers for different programming languages and environments. Because it is object-oriented, \proglang{C++}, one of the central modern languages for numerical and statistical computing, is particularly effective at representing matrices, vectors and decompositions, and numerous class libraries providing linear algebra routines have been written over the years. As both the \proglang{C++} language and standards have evolved \citep{Meyers:2005:EffectiveC++,Meyers:1995:MoreEffectiveC++,Cpp11}, so have the compilers implementing the language. Relatively modern language constructs such as template meta-programming are particularly useful because they provide overloading of operations (allowing expressive code in the compiled language similar to what can be done in scripting languages) and can shift some of the computational burden from the run-time to the compile-time. (A more detailed discussion of template meta-programming is, however, beyond the scope of this paper). \cite{Veldhuizen:1998:Blitz} provided an early and influential implementation of numerical linear algebra classes that already demonstrated key features of this approach. Its usage was held back at the time by the limited availability of compilers implementing all the necessary features of the \proglang{C++} language. This situation has greatly improved over the last decade, and many more libraries have been made available. One such \proglang{C++} library is \pkg{Eigen} by \citet*{Eigen:Web} which started as a sub-project to KDE (a popular Linux desktop environment), initially focussing on fixed-size matrices to represent projections in a visualization application. \pkg{Eigen} grew from there and has over the course of about a decade produced three major releases with \pkg{Eigen}3 being the current major version. To check the minor and patch version numbers, load the \pkg{RcppEigen} package and call this (internal) helper function: \begin{CodeInput} R> RcppEigen:::eigen_version(FALSE) \end{CodeInput} \begin{CodeOutput} major minor patch 3 3 5 \end{CodeOutput} \pkg{Eigen} is of interest as the \proglang{R} system for statistical computation and graphics \citep{R:Main} is itself easily extensible. This is particular true via the \proglang{C} language that most of \proglang{R}'s compiled core parts are written in, but also for the \proglang{C++} language which can interface with \proglang{C}-based systems rather easily. The manual ``Writing \proglang{R} Extensions'' \citep{R:Extensions} is the basic reference for extending \proglang{R} with either \proglang{C} or \proglang{C++}. The \pkg{Rcpp} package by \citet{JSS:Rcpp,CRAN:Rcpp} facilitates extending \proglang{R} with \proglang{C++} code by providing seamless object mapping between both languages. % As stated in the \pkg{Rcpp} \citep{CRAN:Rcpp} vignette, ``Extending \pkg{Rcpp}'' as well as in \citet{Eddelbuettel:2013:Rcpp} \begin{quote} \pkg{Rcpp} facilitates data interchange between \proglang{R} and \proglang{C++} through the templated functions \code{Rcpp::as} (for conversion of objects from \proglang{R} to \proglang{C++}) and \code{Rcpp::wrap} (for conversion from \proglang{C++} to \proglang{R}). \end{quote} The \pkg{RcppEigen} package provides the header files composing the \pkg{Eigen} \proglang{C++} template library, along with implementations of \code{Rcpp::as} and \code{Rcpp::wrap} for the \proglang{C++} classes defined in \pkg{Eigen}. The \pkg{Eigen} classes themselves provide high-performance, versatile and comprehensive representations of dense and sparse matrices and vectors, as well as decompositions and other functions to be applied to these objects. The next section introduces some of these classes and shows how to interface to them from \proglang{R}. \section[Eigen classes]{\pkg{Eigen} classes} \label{sec:eclasses} \pkg{Eigen} is a \proglang{C++} template library providing classes for many forms of matrices, vectors, arrays and decompositions. These classes are flexible and comprehensive allowing for both high performance and well structured code representing high-level operations. \proglang{C++} code based on \pkg{Eigen} is often more like \proglang{R} code, working on the ``whole object'', than like compiled code in other languages where operations often must be coded in loops. As in many \proglang{C++} template libraries using template meta-programming \citep{Abrahams+Gurtovoy:2004:TemplateMetaprogramming}, the templates themselves can be very complicated. However, \pkg{Eigen} provides \code{typedef}s for common classes that correspond to \proglang{R} matrices and vectors, as shown in Table~\ref{tab:REigen}, and this paper will use these \code{typedef}s. \begin{table}[t!] \centering \begin{tabular}{l l} \hline \multicolumn{1}{c}{\proglang{R} object type} & \multicolumn{1}{c}{\pkg{Eigen} class typedef}\\ \hline numeric matrix & \code{MatrixXd}\\ integer matrix & \code{MatrixXi}\\ complex matrix & \code{MatrixXcd}\\ numeric vector & \code{VectorXd}\\ integer vector & \code{VectorXi}\\ complex vector & \code{VectorXcd}\\ \code{Matrix::dgCMatrix} \phantom{XXX} & \code{SparseMatrix}\\ \hline \end{tabular} \caption{Correspondence between \proglang{R} matrix and vector types and classes in the \pkg{Eigen} namespace. \label{tab:REigen}} \end{table} Here, \code{Vector} and \code{Matrix} describe the dimension of the object. The \code{X} signals that these are dynamically-sized objects (as opposed to fixed-size matrices such as $3 \times 3$ matrices also available in \pkg{Eigen}). Lastly, the trailing characters \code{i}, \code{d} and \code{cd} denote, respectively, storage types \code{integer}, \code{double} and \code{complex double}. The \proglang{C++} classes shown in Table~\ref{tab:REigen} are in the \pkg{Eigen} namespace, which means that they must be written as \code{Eigen::MatrixXd}. However, if one prefaces the use of these class names with a declaration like \begin{quote} \noindent \ttfamily \hlstd{}\hlkwa{using\ }\hlstd{Eigen}\hlopt{::}\hlstd{MatrixXd}\hlopt{;}\hlstd{}\hspace*{\fill}\\ \mbox{} \normalfont \normalsize \end{quote} \vspace*{-0.4cm} then one can use these names without the namespace qualifier. \subsection[Mapped matrices in Eigen]{Mapped matrices in \pkg{Eigen}} \label{sec:mapped} Storage for the contents of matrices from the classes shown in Table~\ref{tab:REigen} is allocated and controlled by the class constructors and destructors. Creating an instance of such a class from an \proglang{R} object involves copying its contents. An alternative is to have the contents of the \proglang{R} matrix or vector mapped to the contents of the object from the \pkg{Eigen} class. For dense matrices one can use the \pkg{Eigen} templated class \code{Map}, and for sparse matrices one can deploy the \pkg{Eigen} templated class \code{MappedSparseMatrix}. One must, of course, be careful not to modify the contents of the \proglang{R} object in the \proglang{C++} code. A recommended practice is always to declare mapped objects as {\ttfamily\hlkwb{const}\normalfont}. \subsection[Arrays in Eigen]{Arrays in \pkg{Eigen}} \label{sec:arrays} For matrix and vector classes \pkg{Eigen} overloads the \code{`*'} operator as matrix multiplication. Occasionally component-wise operations instead of matrix operations are desired, for which the \code{Array} templated classes are used in \pkg{Eigen}. Switching back and forth between matrices or vectors and arrays is usually done via the \code{array()} method for matrix or vector objects and the \code{matrix()} method for arrays. \subsection[Structured matrices in Eigen]{Structured matrices in \pkg{Eigen}} \label{sec:structured} \pkg{Eigen} provides classes for matrices with special structure such as symmetric matrices, triangular matrices and banded matrices. For dense matrices, these special structures are described as ``views'', meaning that the full dense matrix is stored but only part of the matrix is used in operations. For a symmetric matrix one must specify whether the lower triangle or the upper triangle is to be used as the contents, with the other triangle defined by the implicit symmetry. \section{Some simple examples} \label{sec:simple} \proglang{C++} functions to perform simple operations on matrices or vectors can follow a pattern of: \begin{enumerate} \item Map the \proglang{R} objects passed as arguments into \pkg{Eigen} objects. \item Create the result. \item Return \code{Rcpp::wrap} applied to the result. \end{enumerate} An idiom for the first step is % using Eigen::Map; % using Eigen::MatrixXd; % using Rcpp::as; % % const Map A(as >(AA)); %\end{lstlisting} \begin{quote} \noindent \ttfamily \hlstd{}\hlkwa{using\ }\hlstd{Eigen}\hlsym{::}\hlstd{Map}\hlsym{;}\hspace*{\fill}\\ \hlstd{}\hlkwa{using\ }\hlstd{Eigen}\hlsym{::}\hlstd{MatrixXd}\hlsym{;}\hspace*{\fill}\\ \hlstd{}\hlkwa{using\ }\hlstd{Rcpp}\hlsym{::}\hlstd{as}\hlsym{;}\hspace*{\fill}\\ \hlstd{}\hspace*{\fill}\\ \hlkwb{const\ }\hlstd{Map}\hlsym{$<$}\hlstd{MatrixXd}\hlsym{$>$}\hlstd{\ \ }\hlsym{}\hlstd{}\hlkwd{A}\hlstd{}\hlsym{(}\hlstd{as}\hlsym{$<$}\hlstd{Map}\hlsym{$<$}\hlstd{MatrixXd}\hlsym{$>$\ $>$(}\hlstd{AA}\hlsym{));}\hlstd{}\hspace*{\fill}\\ \mbox{} \normalfont \end{quote} \vspace*{-0.4cm} where \code{AA} is the name of the \proglang{R} object (of type \code{SEXP} in \proglang{C} and \proglang{C++}) passed to the \proglang{C++} function. An alternative to the \code{using} declarations is to declare a \code{typedef} as in % typedef Eigen::Map MapMatd; % const MapMatd A(Rcpp::as(AA)); \begin{quote} \noindent \ttfamily \hlstd{}\hlkwc{typedef\ }\hlstd{Eigen}\hlopt{::}\hlstd{Map}\hlopt{$<$}\hlstd{Eigen}\hlopt{::}\hlstd{MatrixXd}\hlopt{$>$}\hlstd{\ \ \ }\hlopt{}\hlstd{MapMatd}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{MapMatd}\hlstd{\ \ \ \ \ \ \ \ \ \ }\hlstd{}\hlkwd{A}\hlstd{}\hlopt{(}\hlstd{Rcpp}\hlopt{::}\hlstd{as}\hlopt{$<$}\hlstd{MapMatd}\hlopt{$>$(}\hlstd{AA}\hlopt{));}\hlstd{}\hspace*{\fill}\\ \mbox{} \normalfont \normalsize \end{quote} \vspace*{-0.4cm} The \code{cxxfunction} function from the \pkg{inline} package \citep*{CRAN:inline} for \proglang{R} and its \pkg{RcppEigen} plugin provide a convenient method of developing and debugging the \proglang{C++} code. For actual production code one generally incorporates the \proglang{C++} source code files in a package and includes the line \code{LinkingTo: Rcpp, RcppEigen} in the package's \code{DESCRIPTION} file. The \code{RcppEigen.package.skeleton} function provides a quick way of generating the skeleton of a package that will use \pkg{RcppEigen}. The \code{cxxfunction} with the \code{"Rcpp"} or \code{"RcppEigen"} plugins has the \code{as} and \code{wrap} functions already defined as \code{Rcpp::as} and \code{Rcpp::wrap}. In the examples below these declarations are omitted. It is important to remember that they are needed in actual \proglang{C++} source code for a package. The first few examples are simply for illustration as the operations shown could be more effectively performed directly in \proglang{R}. Finally, the results from \pkg{Eigen} are compared to those from the direct \proglang{R} results. \subsection{Transpose of an integer matrix} \label{sec:transpose} The next \proglang{R} code snippet creates a simple matrix of integers \begin{CodeInput} R> (A <- matrix(1:6, ncol = 2)) \end{CodeInput} \begin{CodeOutput} [,1] [,2] [1,] 1 4 [2,] 2 5 [3,] 3 6 \end{CodeOutput} \begin{CodeInput} R> str(A) \end{CodeInput} \begin{CodeOutput} int [1:3, 1:2] 1 2 3 4 5 6 \end{CodeOutput} and, in Figure~\ref{trans}, the \code{transpose()} method for the \code{Eigen::MatrixXi} class is used to return the transpose of the supplied matrix. The \proglang{R} matrix in the \code{SEXP} \code{AA} is first mapped to an \code{Eigen::MatrixXi} object, and then the matrix \code{At} is constructed from its transpose and returned to \proglang{R}. \begin{figure}[t!] \hrule \smallskip \noindent \ttfamily \hlstd{}\hlkwa{using\ }\hlstd{Eigen}\hlsym{::}\hlstd{Map}\hlsym{;}\hspace*{\fill}\\ \hlstd{}\hlkwa{using\ }\hlstd{Eigen}\hlsym{::}\hlstd{MatrixXi}\hlsym{;}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{}\hlslc{//\ Map\ the\ integer\ matrix\ AA\ from\ R}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{Map}\hlsym{$<$}\hlstd{MatrixXi}\hlsym{$>$}\hlstd{\ \ }\hlsym{}\hlstd{}\hlkwd{A}\hlstd{}\hlsym{(}\hlstd{as}\hlsym{$<$}\hlstd{Map}\hlsym{$<$}\hlstd{MatrixXi}\hlsym{$>$\ $>$(}\hlstd{AA}\hlsym{));}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{}\hlslc{//\ evaluate\ and\ return\ the\ transpose\ of\ A}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{MatrixXi}\hlstd{\ \ \ \ \ \ }\hlstd{}\hlkwd{At}\hlstd{}\hlsym{(}\hlstd{A}\hlsym{.}\hlstd{}\hlkwd{transpose}\hlstd{}\hlsym{());}\hspace*{\fill}\\ \hlstd{}\hlkwa{return\ }\hlstd{}\hlkwd{wrap}\hlstd{}\hlsym{(}\hlstd{At}\hlsym{);}\hlstd{}\hspace*{\fill} \normalfont \hrule \caption{\code{transCpp}: Transpose a matrix of integers. \label{trans}} \end{figure} The \proglang{R} snippet below compiles and links the \proglang{C++} code segment. The actual work is done by the function \code{cxxfunction} from the \pkg{inline} package which compiles, links and loads code written in \proglang{C++} and supplied as a character variable. This frees the user from having to know about compiler and linker details and options, which makes ``exploratory programming'' much easier. Here the program piece to be compiled is stored as a character variable named \code{transCpp}, and \code{cxxfunction} creates an executable function which is assigned to \code{ftrans}. This new function is used on the matrix $\bm A$ shown above, and one can check that it works as intended by comparing the output to an explicit transpose of the matrix argument. \begin{CodeInput} R> ftrans <- cxxfunction(signature(AA = "matrix"), transCpp, + plugin = "RcppEigen") R> (At <- ftrans(A)) \end{CodeInput} \begin{CodeOutput} [,1] [,2] [,3] [1,] 1 2 3 [2,] 4 5 6 \end{CodeOutput} \begin{CodeInput} R> stopifnot(all.equal(At, t(A))) \end{CodeInput} For numeric or integer matrices the \code{adjoint()} method is equivalent to the \code{transpose()} method. For complex matrices, the adjoint is the conjugate of the transpose. In keeping with the conventions in the \pkg{Eigen} documentation the \code{adjoint()} method is used in what follows to create the transpose of numeric or integer matrices. \subsection{Products and cross-products} \label{sec:products} As mentioned in Section~\ref{sec:arrays}, the \code{`*'} operator is overloaded as matrix multiplication for the various matrix and vector classes in \pkg{Eigen}. The \proglang{C++} code in Figure~\ref{prod} produces a list containing both the product and cross-product (in the sense of the \proglang{R} function call \code{crossproduct(A, B)} evaluating $\bm A^\top\bm B$) of its two arguments % \begin{figure}[t!] \hrule \smallskip \noindent \ttfamily \hlstd{}\hlkwc{typedef\ }\hlstd{Eigen}\hlopt{::}\hlstd{Map}\hlopt{$<$}\hlstd{Eigen}\hlopt{::}\hlstd{MatrixXi}\hlopt{$>$}\hlstd{\ \ \ }\hlopt{}\hlstd{MapMati}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{MapMati}\hlstd{\ \ \ \ }\hlstd{}\hlkwd{B}\hlstd{}\hlopt{(}\hlstd{as}\hlopt{$<$}\hlstd{MapMati}\hlopt{$>$(}\hlstd{BB}\hlopt{));}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{MapMati}\hlstd{\ \ \ \ }\hlstd{}\hlkwd{C}\hlstd{}\hlopt{(}\hlstd{as}\hlopt{$<$}\hlstd{MapMati}\hlopt{$>$(}\hlstd{CC}\hlopt{));}\hspace*{\fill}\\ \hlstd{}\hlkwa{return\ }\hlstd{List}\hlopt{::}\hlstd{}\hlkwd{create}\hlstd{}\hlopt{(}\hlstd{Named}\hlopt{{(}}\hlstd{}\hlstr{"B\ \%{*}\%\ C"}\hlstd{}\hlopt{{)}}\hlstd{\ \ \ \ \ \ \ \ \ }\hlopt{=\ }\hlstd{B\ }\hlopt{{*}\ }\hlstd{C}\hlopt{,}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{Named}\hlopt{{(}}\hlstd{}\hlstr{"crossprod(B,\ C)"}\hlstd{}\hlopt{{)}\ =\ }\hlstd{B}\hlopt{.}\hlstd{}\hlkwd{adjoint}\hlstd{}\hlopt{()\ {*}\ }\hlstd{C}\hlopt{);}\hlstd{}\hspace*{\fill}\\ \mbox{} \normalfont \normalsize \hrule \caption{\code{prodCpp}: Product and cross-product of two matrices. \label{prod}} \end{figure} % \begin{CodeInput} R> fprod <- cxxfunction(signature(BB = "matrix", CC = "matrix"), + prodCpp, "RcppEigen") R> B <- matrix(1:4, ncol = 2) R> C <- matrix(6:1, nrow = 2) R> str(fp <- fprod(B, C)) \end{CodeInput} \begin{CodeOutput} List of 2 $ B %*% C : int [1:2, 1:3] 21 32 13 20 5 8 $ crossprod(B, C): int [1:2, 1:3] 16 38 10 24 4 10 \end{CodeOutput} \begin{CodeInput} R> stopifnot(all.equal(fp[[1]], B %*% C), all.equal(fp[[2]], crossprod(B, C))) \end{CodeInput} % Note that the \code{create} class method for \code{Rcpp::List} implicitly applies \code{Rcpp::wrap} to its arguments. \subsection{Crossproduct of a single matrix} \label{sec:crossproduct} As shown in the last example, the \proglang{R} function \code{crossprod} calculates the product of the transpose of its first argument with its second argument. The single argument form, \code{crossprod(X)}, evaluates $\bm X^\top\bm X$. One could, of course, calculate this product as \begin{verbatim} t(X) %*% X \end{verbatim} but \code{crossprod(X)} is roughly twice as fast because the result is known to be symmetric and only one triangle needs to be calculated. The function \code{tcrossprod} evaluates \code{crossprod(t(X))} without actually forming the transpose. To express these calculations in \pkg{Eigen}, a \code{SelfAdjointView}---which is a dense matrix of which only one triangle is used, the other triangle being inferred from the symmetry---is created. (The characterization ``self-adjoint'' is equivalent to symmetric for non-complex matrices.) The \pkg{Eigen} class name is \code{SelfAdjointView}. The method for general matrices that produces such a view is called \code{selfadjointView}. Both require specification of either the \code{Lower} or \code{Upper} triangle. For triangular matrices the class is \code{TriangularView} and the method is \code{triangularView}. The triangle can be specified as \code{Lower}, \code{UnitLower}, \code{StrictlyLower}, \code{Upper}, \code{UnitUpper} or \code{StrictlyUpper}. For self-adjoint views the \code{rankUpdate} method adds a scalar multiple of $\bm A\bm A^\top$ to the current symmetric matrix. The scalar multiple defaults to 1. The code in Figure~\ref{crossprod} produces both $\bm A^\top\bm A$ and $\bm A\bm A^\top$ from a matrix $\bm A$. \begin{figure}[t!] \hrule \smallskip \noindent \ttfamily \hlstd{}\hlkwa{using\ }\hlstd{Eigen}\hlopt{::}\hlstd{Map}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwa{using\ }\hlstd{Eigen}\hlopt{::}\hlstd{MatrixXi}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwa{using\ }\hlstd{Eigen}\hlopt{::}\hlstd{Lower}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hspace*{\fill}\\ \hlkwb{const\ }\hlstd{Map}\hlopt{$<$}\hlstd{MatrixXi}\hlopt{$>$\ }\hlstd{}\hlkwd{A}\hlstd{}\hlopt{(}\hlstd{as}\hlopt{$<$}\hlstd{Map}\hlopt{$<$}\hlstd{MatrixXi}\hlopt{$>$\ $>$(}\hlstd{AA}\hlopt{));}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ int}\hlstd{\ \ \ \ \ \ \ \ \ \ \ }\hlkwb{}\hlstd{}\hlkwd{m}\hlstd{}\hlopt{(}\hlstd{A}\hlopt{.}\hlstd{}\hlkwd{rows}\hlstd{}\hlopt{()),\ }\hlstd{}\hlkwd{n}\hlstd{}\hlopt{(}\hlstd{A}\hlopt{.}\hlstd{}\hlkwd{cols}\hlstd{}\hlopt{());}\hspace*{\fill}\\ \hlstd{MatrixXi}\hlstd{\ \ \ \ \ \ \ \ \ \ }\hlstd{}\hlkwd{AtA}\hlstd{}\hlopt{(}\hlstd{}\hlkwd{MatrixXi}\hlstd{}\hlopt{(}\hlstd{n}\hlopt{,\ }\hlstd{n}\hlopt{).}\hlstd{}\hlkwd{setZero}\hlstd{}\hlopt{().}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{selfadjointView}\hlopt{$<$}\hlstd{Lower}\hlopt{$>$().}\hlstd{}\hlkwd{rankUpdate}\hlstd{}\hlopt{(}\hlstd{A}\hlopt{.}\hlstd{}\hlkwd{adjoint}\hlstd{}\hlopt{()));}\hspace*{\fill}\\ \hlstd{MatrixXi}\hlstd{\ \ \ \ \ \ \ \ \ \ }\hlstd{}\hlkwd{AAt}\hlstd{}\hlopt{(}\hlstd{}\hlkwd{MatrixXi}\hlstd{}\hlopt{(}\hlstd{m}\hlopt{,\ }\hlstd{m}\hlopt{).}\hlstd{}\hlkwd{setZero}\hlstd{}\hlopt{().}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{selfadjointView}\hlopt{$<$}\hlstd{Lower}\hlopt{$>$().}\hlstd{}\hlkwd{rankUpdate}\hlstd{}\hlopt{(}\hlstd{A}\hlopt{));}\hspace*{\fill}\\ \hlstd{}\hspace*{\fill}\\ \hlkwa{return\ }\hlstd{List}\hlopt{::}\hlstd{}\hlkwd{create}\hlstd{}\hlopt{(}\hlstd{Named}\hlopt{{(}}\hlstd{}\hlstr{"crossprod(A)"}\hlstd{}\hlopt{{)}}\hlstd{\ \ }\hlopt{=\ }\hlstd{AtA}\hlopt{,}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{Named}\hlopt{{(}}\hlstd{}\hlstr{"tcrossprod(A)"}\hlstd{}\hlopt{{)}\ =\ }\hlstd{AAt}\hlopt{);}\hlstd{}\hspace*{\fill} \normalfont \normalsize \hrule \caption{\code{crossprodCpp}: Cross-product and transposed cross-product of a single matrix. \label{crossprod}} \end{figure} \begin{CodeInput} R> fcprd <- cxxfunction(signature(AA = "matrix"), crossprodCpp, "RcppEigen") R> str(crp <- fcprd(A)) \end{CodeInput} \begin{CodeOutput} List of 2 $ crossprod(A) : int [1:2, 1:2] 14 32 32 77 $ tcrossprod(A): int [1:3, 1:3] 17 22 27 22 29 36 27 36 45 \end{CodeOutput} \begin{CodeInput} R> stopifnot(all.equal(crp[[1]], crossprod(A)), + all.equal(crp[[2]], tcrossprod(A))) \end{CodeInput} To some, the expressions in Figure~\ref{crossprod} to construct \code{AtA} and \code{AAt} are compact and elegant. To others they are hopelessly confusing. If you find yourself in the latter group, you just need to read the expression left to right. So, for example, we construct \code{AAt} by creating a general integer matrix of size $m\times m$ (where $\bm A$ is $m\times n$), ensuring that all its elements are zero, regarding it as a self-adjoint (i.e., symmetric) matrix using the elements in the lower triangle, then adding $\bm A\bm A^\top$ to it and converting back to a general matrix form (i.e., the strict lower triangle is copied into the strict upper triangle). In more detail: \begin{enumerate} \item \code{MatrixXi(n, n)} creates an $n\times n$ integer matrix with arbitrary contents \item \code{.setZero()} zeros out the contents of the matrix \item \code{.selfAdjointView()} causes what follows to treat the matrix as a symmetric matrix in which only the lower triangle is used, the strict upper triangle being inferred by symmetry \item \code{.rankUpdate(A)} forms the sum $\bm B+\bm A\bm A^\top$ where $\bm B$ is the symmetric matrix of zeros created in the previous steps. \end{enumerate} The assignment of this symmetric matrix to the (general) \code{MatrixXi} object \code{AAt} causes the result to be symmetrized during the assignment. For these products one could define the symmetric matrix from either the lower triangle or the upper triangle as the result will be symmetrized before it is returned. To cut down on repetition of \code{using} statements we gather them in a character variable, \code{incl}, that will be given as the \code{includes} argument in the calls to \code{cxxfunction}. We also define a utility function, \code{AtA}, that returns the crossproduct matrix as shown in Figure~\ref{fig:incl} \begin{figure}[t!] \hrule \smallskip \noindent \ttfamily \hlstd{}\hlkwa{using}\hlstd{\ \ \ }\hlkwa{}\hlstd{Eigen}\hlopt{::}\hlstd{LLT}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwa{using}\hlstd{\ \ \ }\hlkwa{}\hlstd{Eigen}\hlopt{::}\hlstd{Lower}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwa{using}\hlstd{\ \ \ }\hlkwa{}\hlstd{Eigen}\hlopt{::}\hlstd{Map}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwa{using}\hlstd{\ \ \ }\hlkwa{}\hlstd{Eigen}\hlopt{::}\hlstd{MatrixXd}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwa{using}\hlstd{\ \ \ }\hlkwa{}\hlstd{Eigen}\hlopt{::}\hlstd{MatrixXi}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwa{using}\hlstd{\ \ \ }\hlkwa{}\hlstd{Eigen}\hlopt{::}\hlstd{Upper}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwa{using}\hlstd{\ \ \ }\hlkwa{}\hlstd{Eigen}\hlopt{::}\hlstd{VectorXd}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwc{typedef\ }\hlstd{Map}\hlopt{$<$}\hlstd{MatrixXd}\hlopt{$>$}\hlstd{\ \ }\hlopt{}\hlstd{MapMatd}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwc{typedef\ }\hlstd{Map}\hlopt{$<$}\hlstd{MatrixXi}\hlopt{$>$}\hlstd{\ \ }\hlopt{}\hlstd{MapMati}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwc{typedef\ }\hlstd{Map}\hlopt{$<$}\hlstd{VectorXd}\hlopt{$>$}\hlstd{\ \ }\hlopt{}\hlstd{MapVecd}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwc{inline\ }\hlstd{MatrixXd\ }\hlkwd{AtA}\hlstd{}\hlopt{(}\hlstd{}\hlkwb{const\ }\hlstd{MapMatd}\hlopt{\&\ }\hlstd{A}\hlopt{)\ \{}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ }\hlstd{}\hlkwb{int}\hlstd{\ \ \ \ }\hlkwb{}\hlstd{}\hlkwd{n}\hlstd{}\hlopt{(}\hlstd{A}\hlopt{.}\hlstd{}\hlkwd{cols}\hlstd{}\hlopt{());}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ }\hlstd{}\hlkwa{return}\hlstd{\ \ \ }\hlkwa{}\hlstd{}\hlkwd{MatrixXd}\hlstd{}\hlopt{(}\hlstd{n}\hlopt{,}\hlstd{n}\hlopt{).}\hlstd{}\hlkwd{setZero}\hlstd{}\hlopt{().}\hlstd{selfadjointView}\hlopt{$<$}\hlstd{Lower}\hlopt{$>$()}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{}\hlopt{.}\hlstd{}\hlkwd{rankUpdate}\hlstd{}\hlopt{(}\hlstd{A}\hlopt{.}\hlstd{}\hlkwd{adjoint}\hlstd{}\hlopt{());}\hspace*{\fill}\\ \hlstd{}\hlopt{\}}\hlstd{}\hspace*{\fill}\\ \mbox{} \normalfont \normalsize \hrule \caption{The contents of the character vector, \code{incl}, that will preface \proglang{C++} code segments that follow. \label{fig:incl}} \end{figure} \subsection{Cholesky decomposition of the crossprod} \label{sec:chol} The Cholesky decomposition of the positive-definite, symmetric matrix, $\bm A$, can be written in several forms. Numerical analysts define the ``LLt'' form as the lower triangular matrix, $\bm L$, such that $\bm A=\bm L\bm L^\top$ and the ``LDLt'' form as a unit lower triangular matrix $\bm L$ and a diagonal matrix $\bm D$ with positive diagonal elements such that $\bm A=\bm L\bm D\bm L^\top$. Statisticians often write the decomposition as $\bm A=\bm R^\top\bm R$ where $\bm R$ is an upper triangular matrix. Of course, this $\bm R$ is simply the transpose of $\bm L$ from the ``LLt'' form. The templated \pkg{Eigen} classes for the LLt and LDLt forms are called \code{LLT} and \code{LDLT} and the corresponding methods are \code{.llt()} and \code{.ldlt()}. Because the Cholesky decomposition involves taking square roots, we pass a numeric matrix, $\bm A$, not an integer matrix. % \begin{figure}[t!] \hrule \smallskip \noindent \ttfamily \hlstd{}\hlkwb{const}\hlstd{\ \ }\hlkwb{}\hlstd{LLT}\hlopt{$<$}\hlstd{MatrixXd}\hlopt{$>$\ }\hlstd{}\hlkwd{llt}\hlstd{}\hlopt{(}\hlstd{}\hlkwd{AtA}\hlstd{}\hlopt{(}\hlstd{as}\hlopt{$<$}\hlstd{MapMatd}\hlopt{$>$(}\hlstd{AA}\hlopt{)));}\hspace*{\fill}\\ \hlstd{}\hlkwa{return\ }\hlstd{List}\hlopt{::}\hlstd{}\hlkwd{create}\hlstd{}\hlopt{(}\hlstd{Named}\hlopt{{(}}\hlstd{}\hlstr{"L"}\hlstd{}\hlopt{{)}\ =\ }\hlstd{}\hlkwd{MatrixXd}\hlstd{}\hlopt{(}\hlstd{llt}\hlopt{.}\hlstd{}\hlkwd{matrixL}\hlstd{}\hlopt{()),}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{Named}\hlopt{{(}}\hlstd{}\hlstr{"R"}\hlstd{}\hlopt{{)}\ =\ }\hlstd{}\hlkwd{MatrixXd}\hlstd{}\hlopt{(}\hlstd{llt}\hlopt{.}\hlstd{}\hlkwd{matrixU}\hlstd{}\hlopt{()));}\hlstd{}\hspace*{\fill}\\ \mbox{} \normalfont \normalsize \hrule \caption{\code{cholCpp}: Cholesky decomposition of a cross-product. \label{chol}} \end{figure} % \begin{CodeInput} R> storage.mode(A) <- "double" R> fchol <- cxxfunction(signature(AA = "matrix"), cholCpp, "RcppEigen", incl) R> (ll <- fchol(A)) \end{CodeInput} \begin{CodeOutput} $L [,1] [,2] [1,] 3.74166 0.00000 [2,] 8.55236 1.96396 $R [,1] [,2] [1,] 3.74166 8.55236 [2,] 0.00000 1.96396 \end{CodeOutput} \begin{CodeInput} R> stopifnot(all.equal(ll[[2]], chol(crossprod(A)))) \end{CodeInput} \subsection{Determinant of the cross-product matrix} \label{sec:determinant} The ``D-optimal'' criterion for experimental design chooses the design that maximizes the determinant, $|\bm X^\top\bm X|$, for the $n\times p$ model matrix (or Jacobian matrix), $\bm X$. The determinant, $|\bm L|$, of the $p\times p$ lower Cholesky factor $\bm L$, defined so that $\bm L\bm L^\top=\bm X^\top\bm X$, is the product of its diagonal elements, as is the case for any triangular matrix. By the properties of determinants, \begin{displaymath} |\bm X^\top\bm X|=|\bm L\bm L^\top|=|\bm L|\,|\bm L^\top|=|\bm L|^2 \end{displaymath} Alternatively, if using the ``LDLt'' decomposition, $\bm L\bm D\bm L^\top=\bm X^\top\bm X$ where $\bm L$ is unit lower triangular and $\bm D$ is diagonal then $|\bm X^\top\bm X|$ is the product of the diagonal elements of $\bm D$. Because it is known that the diagonal elements of $\bm D$ must be non-negative, one often evaluates the logarithm of the determinant as the sum of the logarithms of the diagonal elements of $\bm D$. Several options are shown in Figure~\ref{cholDet}. % \begin{figure}[t!] \hrule \smallskip \noindent \ttfamily \hlstd{}\hlkwb{const\ }\hlstd{MatrixXd}\hlstd{\ \ }\hlstd{}\hlkwd{ata}\hlstd{}\hlopt{(}\hlstd{}\hlkwd{AtA}\hlstd{}\hlopt{(}\hlstd{as}\hlopt{$<$}\hlstd{MapMatd}\hlopt{$>$(}\hlstd{AA}\hlopt{)));}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ double}\hlstd{\ \ \ }\hlkwb{}\hlstd{}\hlkwd{detL}\hlstd{}\hlopt{(}\hlstd{}\hlkwd{MatrixXd}\hlstd{}\hlopt{(}\hlstd{ata}\hlopt{.}\hlstd{}\hlkwd{llt}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{matrixL}\hlstd{}\hlopt{()).}\hlstd{}\hlkwd{diagonal}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{prod}\hlstd{}\hlopt{());}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{VectorXd\ }\hlkwd{Dvec}\hlstd{}\hlopt{(}\hlstd{ata}\hlopt{.}\hlstd{}\hlkwd{ldlt}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{vectorD}\hlstd{}\hlopt{());}\hspace*{\fill}\\ \hlstd{}\hlkwa{return\ }\hlstd{List}\hlopt{::}\hlstd{}\hlkwd{create}\hlstd{}\hlopt{(}\hlstd{Named}\hlopt{{(}}\hlstd{}\hlstr{"d1"}\hlstd{}\hlopt{{)}\ =\ }\hlstd{detL\ }\hlopt{{*}\ }\hlstd{detL}\hlopt{,}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{Named}\hlopt{{(}}\hlstd{}\hlstr{"d2"}\hlstd{}\hlopt{{)}\ =\ }\hlstd{Dvec}\hlopt{.}\hlstd{}\hlkwd{prod}\hlstd{}\hlopt{(),}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{Named}\hlopt{{(}}\hlstd{}\hlstr{"ld"}\hlstd{}\hlopt{{)}\ =\ }\hlstd{Dvec}\hlopt{.}\hlstd{}\hlkwd{array}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{log}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{sum}\hlstd{}\hlopt{());}\hlstd{}\hspace*{\fill}\\ \mbox{} \normalfont \normalsize \hrule \caption{\code{cholDetCpp}: Determinant of a cross-product using the ``LLt'' and ``LDLt'' forms of the Cholesky decomposition.} \label{cholDet} \end{figure} % \begin{CodeInput} R> fdet <- cxxfunction(signature(AA = "matrix"), cholDetCpp, "RcppEigen", + incl) R> unlist(ll <- fdet(A)) \end{CodeInput} \begin{CodeOutput} d1 d2 ld 54.00000 54.00000 3.98898 \end{CodeOutput} % Note the use of the \code{array()} method in the calculation of the log-determinant. Because the \code{log()} method applies to arrays, not to vectors or matrices, an array must be created from \code{Dvec} before applying the \code{log()} method. \section{Least squares solutions} \label{sec:leastSquares} A common operation in statistical computing is calculating a least squares solution, $\widehat{\bm\beta}$, defined as \begin{displaymath} \widehat{\bm\beta}=\argmin_{\bm \beta}\|\bm y-\bm X\bm\beta\|^2 \end{displaymath} where the model matrix, $\bm X$, is $n\times p$ ($n\ge p$) and $\bm y$ is an $n$-dimensional response vector. There are several ways, based on matrix decompositions, to determine such a solution. Earlier, two forms of the Cholesky decomposition were discussed: ``LLt'' and ``LDLt'', which can both be used to solve for $\widehat{\bm\beta}$. Other decompositions that can be used are the QR decomposition, with or without column pivoting, the singular value decomposition and the eigendecomposition of a symmetric matrix. Determining a least squares solution is relatively straightforward. However, statistical computing often requires additional information, such as the standard errors of the coefficient estimates. Calculating these involves evaluating the diagonal elements of $\left(\bm X^\top\bm X\right)^{-1}$ and the residual sum of squares, $\|\bm y-\bm X\widehat{\bm\beta}\|^2$. \subsection{Least squares using the ``LLt'' Cholesky} \label{sec:LLtLeastSquares} \begin{figure}[t!] \hrule \smallskip \noindent \ttfamily \hlstd{}\hlkwb{const\ }\hlstd{MapMatd}\hlstd{\ \ \ \ \ \ \ \ \ }\hlstd{}\hlkwd{X}\hlstd{}\hlopt{(}\hlstd{as}\hlopt{$<$}\hlstd{MapMatd}\hlopt{$>$(}\hlstd{XX}\hlopt{));}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{MapVecd}\hlstd{\ \ \ \ \ \ \ \ \ }\hlstd{}\hlkwd{y}\hlstd{}\hlopt{(}\hlstd{as}\hlopt{$<$}\hlstd{MapVecd}\hlopt{$>$(}\hlstd{yy}\hlopt{));}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ int}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ }\hlkwb{}\hlstd{}\hlkwd{n}\hlstd{}\hlopt{(}\hlstd{X}\hlopt{.}\hlstd{}\hlkwd{rows}\hlstd{}\hlopt{()),\ }\hlstd{}\hlkwd{p}\hlstd{}\hlopt{(}\hlstd{X}\hlopt{.}\hlstd{}\hlkwd{cols}\hlstd{}\hlopt{());}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{LLT}\hlopt{$<$}\hlstd{MatrixXd}\hlopt{$>$\ }\hlstd{}\hlkwd{llt}\hlstd{}\hlopt{(}\hlstd{}\hlkwd{AtA}\hlstd{}\hlopt{(}\hlstd{X}\hlopt{));}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{VectorXd}\hlstd{\ \ }\hlstd{}\hlkwd{betahat}\hlstd{}\hlopt{(}\hlstd{llt}\hlopt{.}\hlstd{}\hlkwd{solve}\hlstd{}\hlopt{(}\hlstd{X}\hlopt{.}\hlstd{}\hlkwd{adjoint}\hlstd{}\hlopt{()\ {*}\ }\hlstd{y}\hlopt{));}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{VectorXd}\hlstd{\ \ \ }\hlstd{}\hlkwd{fitted}\hlstd{}\hlopt{(}\hlstd{X\ }\hlopt{{*}\ }\hlstd{betahat}\hlopt{);}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{VectorXd}\hlstd{\ \ \ \ }\hlstd{}\hlkwd{resid}\hlstd{}\hlopt{(}\hlstd{y\ }\hlopt{{-}\ }\hlstd{fitted}\hlopt{);}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ int}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ }\hlkwb{}\hlstd{}\hlkwd{df}\hlstd{}\hlopt{(}\hlstd{n\ }\hlopt{{-}\ }\hlstd{p}\hlopt{);}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ double}\hlstd{\ \ \ \ \ \ \ \ \ \ }\hlkwb{}\hlstd{}\hlkwd{s}\hlstd{}\hlopt{(}\hlstd{resid}\hlopt{.}\hlstd{}\hlkwd{norm}\hlstd{}\hlopt{()\ /\ }\hlstd{std}\hlopt{::}\hlstd{}\hlkwd{sqrt}\hlstd{}\hlopt{(}\hlstd{}\hlkwb{double}\hlstd{}\hlopt{(}\hlstd{df}\hlopt{)));}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{VectorXd}\hlstd{\ \ \ \ \ \ \ }\hlstd{}\hlkwd{se}\hlstd{}\hlopt{(}\hlstd{s\ }\hlopt{{*}\ }\hlstd{llt}\hlopt{.}\hlstd{}\hlkwd{matrixL}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{solve}\hlstd{}\hlopt{(}\hlstd{MatrixXd}\hlopt{::}\hlstd{}\hlkwd{Identity}\hlstd{}\hlopt{(}\hlstd{p}\hlopt{,\ }\hlstd{p}\hlopt{))}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{}\hlopt{.}\hlstd{}\hlkwd{colwise}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{norm}\hlstd{}\hlopt{());}\hspace*{\fill}\\ \hlstd{}\hlkwa{return}\hlstd{\ \ \ \ \ }\hlkwa{}\hlstd{List}\hlopt{::}\hlstd{}\hlkwd{create}\hlstd{}\hlopt{(}\hlstd{Named}\hlopt{{(}}\hlstd{}\hlstr{"coefficients"}\hlstd{}\hlopt{{)}}\hlstd{\ \ \ }\hlopt{=\ }\hlstd{betahat}\hlopt{,}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{Named}\hlopt{{(}}\hlstd{}\hlstr{"fitted.values"}\hlstd{}\hlopt{{)}}\hlstd{\ \ }\hlopt{=\ }\hlstd{fitted}\hlopt{,}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{Named}\hlopt{{(}}\hlstd{}\hlstr{"residuals"}\hlstd{}\hlopt{{)}}\hlstd{\ \ \ \ \ \ }\hlopt{=\ }\hlstd{resid}\hlopt{,}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{Named}\hlopt{{(}}\hlstd{}\hlstr{"s"}\hlstd{}\hlopt{{)}}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlopt{=\ }\hlstd{s}\hlopt{,}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{Named}\hlopt{{(}}\hlstd{}\hlstr{"df.residual"}\hlstd{}\hlopt{{)}}\hlstd{\ \ \ \ }\hlopt{=\ }\hlstd{df}\hlopt{,}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{Named}\hlopt{{(}}\hlstd{}\hlstr{"rank"}\hlstd{}\hlopt{{)}}\hlstd{\ \ \ \ \ \ \ \ \ \ \ }\hlopt{=\ }\hlstd{p}\hlopt{,}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{Named}\hlopt{{(}}\hlstd{}\hlstr{"Std.\ Error"}\hlstd{}\hlopt{{)}}\hlstd{\ \ \ \ \ }\hlopt{=\ }\hlstd{se}\hlopt{);}\hlstd{}\hspace*{\fill}\\ \mbox{} \normalfont \normalsize \hrule \caption{\code{lltLSCpp}: Least squares using the Cholesky decomposition. \label{lltLS}} \end{figure} Figure~\ref{lltLS} shows a calculation of the least squares coefficient estimates (\code{betahat}) and the standard errors (\code{se}) through an ``LLt'' Cholesky decomposition of the crossproduct of the model matrix, $\bm X$. Next, the results from this calculation are compared to those from the \code{lm.fit} function in \proglang{R} (\code{lm.fit} is the workhorse function called by \code{lm} once the model matrix and response have been evaluated). \begin{CodeInput} R> lltLS <- cxxfunction(signature(XX = "matrix", yy = "numeric"), lltLSCpp, + "RcppEigen", incl) R> data("trees", package = "datasets") R> str(lltFit <- with(trees, lltLS(cbind(1, log(Girth)), log(Volume)))) \end{CodeInput} \begin{CodeOutput} List of 7 $ coefficients : num [1:2] -2.35 2.2 $ fitted.values: num [1:31] 2.3 2.38 2.43 2.82 2.86 ... $ residuals : num [1:31] 0.0298 -0.0483 -0.1087 -0.0223 0.0727 ... $ s : num 0.115 $ df.residual : int 29 $ rank : int 2 $ Std. Error : num [1:2] 0.2307 0.0898 R> str(lmFit <- with(trees, lm.fit(cbind(1, log(Girth)), log(Volume)))) List of 8 $ coefficients : Named num [1:2] -2.35 2.2 ..- attr(*, "names")= chr [1:2] "x1" "x2" $ residuals : num [1:31] 0.0298 -0.0483 -0.1087 -0.0223 0.0727 ... $ effects : Named num [1:31] -18.2218 2.8152 -0.1029 -0.0223 0.0721 ... ..- attr(*, "names")= chr [1:31] "x1" "x2" "" "" ... $ rank : int 2 $ fitted.values: num [1:31] 2.3 2.38 2.43 2.82 2.86 ... $ assign : NULL $ qr :List of 5 ..$ qr : num [1:31, 1:2] -5.57 0.18 0.18 0.18 0.18 ... ..$ qraux: num [1:2] 1.18 1.26 ..$ pivot: int [1:2] 1 2 ..$ tol : num 1e-07 ..$ rank : int 2 ..- attr(*, "class")= chr "qr" $ df.residual : int 29 \end{CodeOutput} \begin{CodeInput} R> for(nm in c("coefficients", "residuals", "fitted.values", "rank")) + stopifnot(all.equal(lltFit[[nm]], unname(lmFit[[nm]]))) R> stopifnot(all.equal(lltFit[["Std. Error"]], + unname(coef(summary(lm(log(Volume) ~ log(Girth), trees)))[,2]))) \end{CodeInput} There are several aspects of the \proglang{C++} code in Figure~\ref{lltLS} worth mentioning. The \code{solve} method for the \code{LLT} object evaluates, in this case, $\left(\bm X^\top\bm X\right)^{-1}\bm X^\top\bm y$ but without actually evaluating the inverse. The calculation of the residuals, $\bm y-\widehat{\bm y}$, can be written, as in \proglang{R}, as \code{y - fitted}. (But note that \pkg{Eigen} classes do not have a ``recycling rule'' as in \proglang{R}. That is, the two vector operands must have the same length.) The \code{norm()} method evaluates the square root of the sum of squares of the elements of a vector. Although one does not explicitly evaluate $\left(\bm X^\top\bm X\right)^{-1}$ one does evaluate $\bm L^{-1}$ to obtain the standard errors. Note also the use of the \code{colwise()} method in the evaluation of the standard errors. It applies a method to the columns of a matrix, returning a vector. The \pkg{Eigen} \code{colwise()} and \code{rowwise()} methods are similar in effect to the \code{apply} function in \proglang{R}. In the descriptions of other methods for solving least squares problems, much of the code parallels that shown in Figure~\ref{lltLS}. The redundant parts are omitted, and only the evaluation of the coefficients, the rank and the standard errors is shown. Actually, the standard errors are calculated only up to the scalar multiple of $s$, the residual standard error, in these code fragments. The calculation of the residuals and $s$ and the scaling of the coefficient standard errors is the same for all methods. (See the files \code{fastLm.h} and \code{fastLm.cpp} in the \pkg{RcppEigen} source package for details.) \subsection{Least squares using the unpivoted QR decomposition} \label{sec:QR} A QR decomposition has the form \begin{displaymath} \bm X=\bm Q\bm R=\bm Q_1\bm R_1 \end{displaymath} where $\bm Q$ is an $n\times n$ orthogonal matrix, which means that $\bm Q^\top\bm Q=\bm Q\bm Q^\top=\bm I_n$, and the $n\times p$ matrix $\bm R$ is zero below the main diagonal. The $n\times p$ matrix $\bm Q_1$ is the first $p$ columns of $\bm Q$ and the $p\times p$ upper triangular matrix $\bm R_1$ is the top $p$ rows of $\bm R$. There are three \pkg{Eigen} classes for the QR decomposition: \code{HouseholderQR} provides the basic QR decomposition using Householder transformations, \code{ColPivHouseholderQR} incorporates column pivots and \code{FullPivHouseholderQR} incorporates both row and column pivots. Figure~\ref{QRLS} shows a least squares solution using the unpivoted QR decomposition. The calculations in Figure~\ref{QRLS} are quite similar to those in Figure~\ref{lltLS}. In fact, if one had extracted the upper triangular factor (the \code{matrixU()} method) from the \code{LLT} object in Figure~\ref{lltLS}, the rest of the code would be nearly identical. \begin{figure}[t!] \hrule \smallskip \noindent \ttfamily \hlstd{}\hlkwa{using\ }\hlstd{Eigen}\hlopt{::}\hlstd{HouseholderQR}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hspace*{\fill}\\ \hlkwb{const\ }\hlstd{HouseholderQR}\hlopt{$<$}\hlstd{MatrixXd}\hlopt{$>$\ }\hlstd{}\hlkwd{QR}\hlstd{}\hlopt{(}\hlstd{X}\hlopt{);}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{VectorXd\ }\hlkwd{betahat}\hlstd{}\hlopt{(}\hlstd{QR}\hlopt{.}\hlstd{}\hlkwd{solve}\hlstd{}\hlopt{(}\hlstd{y}\hlopt{));}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{VectorXd}\hlstd{\ \ }\hlstd{}\hlkwd{fitted}\hlstd{}\hlopt{(}\hlstd{X\ }\hlopt{{*}\ }\hlstd{betahat}\hlopt{);}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ int}\hlstd{\ \ \ \ \ \ \ \ \ \ \ }\hlkwb{}\hlstd{}\hlkwd{df}\hlstd{}\hlopt{(}\hlstd{n\ }\hlopt{{-}\ }\hlstd{p}\hlopt{);}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{VectorXd}\hlstd{\ \ \ \ \ \ }\hlstd{}\hlkwd{se}\hlstd{}\hlopt{(}\hlstd{QR}\hlopt{.}\hlstd{}\hlkwd{matrixQR}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{topRows}\hlstd{}\hlopt{(}\hlstd{p}\hlopt{).}\hlstd{triangularView}\hlopt{$<$}\hlstd{Upper}\hlopt{$>$()}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{}\hlopt{.}\hlstd{}\hlkwd{solve}\hlstd{}\hlopt{(}\hlstd{MatrixXd}\hlopt{::}\hlstd{}\hlkwd{Identity}\hlstd{}\hlopt{(}\hlstd{p}\hlopt{,}\hlstd{p}\hlopt{)).}\hlstd{}\hlkwd{rowwise}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{norm}\hlstd{}\hlopt{());}\hlstd{}\hspace*{\fill}\\ \mbox{} \normalfont \normalsize \hrule \caption{\code{QRLSCpp}: Least squares using the unpivoted QR decomposition. \label{QRLS}} \end{figure} \subsection{Handling the rank-deficient case} \label{sec:rankdeficient} One important consideration when determining least squares solutions is whether $\rank(\bm X)$ is $p$, a situation described by saying that $\bm X$ has ``full column rank''. When $\bm X$ does not have full column rank it is said to be ``rank deficient''. Although the theoretical rank of a matrix is well-defined, its evaluation in practice is not. At best one can compute an effective rank according to some tolerance. Decompositions that allow to estimation of the rank of the matrix in this way are said to be ``rank-revealing''. Because the \code{model.matrix} function in \proglang{R} does a considerable amount of symbolic analysis behind the scenes, one usually ends up with full-rank model matrices. The common cases of rank-deficiency, such as incorporating both a constant term and a full set of indicators columns for the levels of a factor, are eliminated. Other, more subtle, situations will not be detected at this stage, however. A simple example occurs when there is a ``missing cell'' in a two-way layout and the interaction of the two factors is included in the model. \begin{CodeInput} R> dd <- data.frame(f1 = gl(4, 6, labels = LETTERS[1:4]), + f2 = gl(3, 2, labels = letters[1:3]))[-(7:8), ] R> xtabs(~ f2 + f1, dd) \end{CodeInput} \begin{CodeOutput} f1 f2 A B C D a 2 0 2 2 b 2 2 2 2 c 2 2 2 2 \end{CodeOutput} \begin{CodeInput} R> mm <- model.matrix(~ f1 * f2, dd) R> kappa(mm) \end{CodeInput} \begin{CodeOutput} [1] 4.30923e+16 \end{CodeOutput} This yields a large condition number, indicating rank deficiency. Alternatively, the reciprocal condition number can be evaluated. \begin{CodeInput} R> rcond(mm) \end{CodeInput} \begin{CodeOutput} [1] 2.3206e-17 \end{CodeOutput} \begin{CodeInput} R> (c(rank = qr(mm)$rank, p = ncol(mm))) \end{CodeInput} \begin{CodeOutput} rank p 11 12 \end{CodeOutput} \begin{CodeInput} R> set.seed(1) R> dd$y <- mm %*% seq_len(ncol(mm)) + rnorm(nrow(mm), sd = 0.1) R> fm1 <- lm(y ~ f1 * f2, dd) R> writeLines(capture.output(print(summary(fm1), + signif.stars = FALSE))[9:22]) \end{CodeInput} \begin{CodeOutput} Coefficients: (1 not defined because of singularities) Estimate Std. Error t value Pr(>|t|) (Intercept) 0.9779 0.0582 16.8 3.4e-09 f1B 12.0381 0.0823 146.3 < 2e-16 f1C 3.1172 0.0823 37.9 5.2e-13 f1D 4.0685 0.0823 49.5 2.8e-14 f2b 5.0601 0.0823 61.5 2.6e-15 f2c 5.9976 0.0823 72.9 4.0e-16 f1B:f2b -3.0148 0.1163 -25.9 3.3e-11 f1C:f2b 7.7030 0.1163 66.2 1.2e-15 f1D:f2b 8.9643 0.1163 77.1 < 2e-16 f1B:f2c NA NA NA NA f1C:f2c 10.9613 0.1163 94.2 < 2e-16 f1D:f2c 12.0411 0.1163 103.5 < 2e-16 \end{CodeOutput} The \code{lm} function for fitting linear models in \proglang{R} uses a rank-revealing form of the QR decomposition. When the model matrix is determined to be rank deficient, according to the threshold used in \proglang{R}'s \code{qr} function, the model matrix is reduced to $\rank{(\bm X)}$ columns by pivoting selected columns (those that are apparently linearly dependent on columns to their left) to the right hand side of the matrix. A solution for this reduced model matrix is determined and the coefficients and standard errors for the redundant columns are flagged as missing. An alternative approach is to evaluate the ``pseudo-inverse'' of $\bm X$ from the singular value decomposition (SVD) of $\bm X$ or the eigendecomposition of $\bm X^\top\bm X$. The SVD is of the form \begin{displaymath} \bm X=\bm U\bm D\bm V^\top=\bm U_1\bm D_1\bm V^\top \end{displaymath} where $\bm U$ is an orthogonal $n\times n$ matrix and $\bm U_1$ is its leftmost $p$ columns, $\bm D$ is $n\times p$ and zero off the main diagonal so that $\bm D_1$ is a $p\times p$ diagonal matrix with non-increasing, non-negative diagonal elements, and $\bm V$ is a $p\times p$ orthogonal matrix. The pseudo-inverse of $\bm D_1$, written $\bm D_1^+$ is a $p\times p$ diagonal matrix whose first $r=\rank(\bm X)$ diagonal elements are the inverses of the corresponding diagonal elements of $\bm D_1$ and whose last $p-r$ diagonal elements are zero. The tolerance for determining if an element of the diagonal of $\bm D_1$ is considered to be (effectively) zero is a multiple of the largest singular value (i.e., the $(1,1)$ element of $\bm D$). The pseudo-inverse, $\bm X^+$, of $\bm X$ is defined as \begin{displaymath} \bm X^+=\bm V\bm D_1^+\bm U_1^\top . \end{displaymath} \begin{figure}[t!] \hrule \smallskip \noindent \ttfamily \hlstd{}\hlkwc{inline\ }\hlstd{ArrayXd\ }\hlkwd{Dplus}\hlstd{}\hlopt{(}\hlstd{}\hlkwb{const\ }\hlstd{ArrayXd}\hlopt{\&\ }\hlstd{d}\hlopt{)\ \{}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ }\hlstd{ArrayXd}\hlstd{\ \ \ }\hlstd{}\hlkwd{di}\hlstd{}\hlopt{(}\hlstd{d}\hlopt{.}\hlstd{}\hlkwd{size}\hlstd{}\hlopt{());}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ }\hlstd{}\hlkwb{double}\hlstd{\ \ }\hlkwb{}\hlstd{}\hlkwd{comp}\hlstd{}\hlopt{(}\hlstd{d}\hlopt{.}\hlstd{}\hlkwd{maxCoeff}\hlstd{}\hlopt{()\ {*}\ }\hlstd{}\hlkwd{threshold}\hlstd{}\hlopt{());}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ }\hlstd{}\hlkwa{for\ }\hlstd{}\hlopt{(}\hlstd{}\hlkwb{int\ }\hlstd{j\ }\hlopt{=\ }\hlstd{}\hlnum{0}\hlstd{}\hlopt{;\ }\hlstd{j\ }\hlopt{$<$\ }\hlstd{d}\hlopt{.}\hlstd{}\hlkwd{size}\hlstd{}\hlopt{();\ ++}\hlstd{j}\hlopt{)\ }\hlstd{di}\hlopt{{[}}\hlstd{j}\hlopt{{]}\ =\ (}\hlstd{d}\hlopt{{[}}\hlstd{j}\hlopt{{]}\ $<$\ }\hlstd{comp}\hlopt{)\ }\hlstd{?\ }\hlnum{0}\hlstd{}\hlopt{.\ :\ }\hlstd{}\hlnum{1}\hlstd{}\hlopt{./}\hlstd{d}\hlopt{{[}}\hlstd{j}\hlopt{{]};}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ }\hlstd{}\hlkwa{return\ }\hlstd{di}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlopt{\}}\hlstd{}\hspace*{\fill}\\ \mbox{} \normalfont \normalsize \hrule \caption{\code{DplusCpp}: Create the diagonal $\bm d^+$ of the pseudo-inverse, $\bm D_1^+$, from the array of singular values, $\bm d$. \label{Dplus}} \end{figure} In Figure~\ref{Dplus} a utility function, \code{Dplus}, is defined to return the diagonal of the pseudo-inverse, $\bm D_1^+$, as an array, given the singular values (the diagonal of $\bm D_1$) as an array. Calculation of the maximum element of $\bm d$ (the method is called \code{.maxCoeff()}) and the use of a \code{threshold()} function provides greater generality for the function. It can be used on the eigenvalues of $\bm X^\top\bm X$, as shown in Section~\ref{sec:eigendecomp}, even though these are returned in increasing order, as opposed to the decreasing order of the singular values. \subsection{Least squares using the SVD} \label{sec:SVDls} \begin{figure}[b!] \hrule \smallskip \noindent \ttfamily \hlkwb{const\ }\hlstd{Eigen}\hlopt{::}\hlstd{JacobiSVD}\hlopt{$<$}\hlstd{MatrixXd}\hlopt{$>$}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ }\hlstd{}\hlkwd{UDV}\hlstd{}\hlopt{(}\hlstd{X}\hlopt{.}\hlstd{}\hlkwd{jacobiSvd}\hlstd{}\hlopt{(}\hlstd{Eigen}\hlopt{::}\hlstd{ComputeThinU}\hlopt{\textbar }\hlstd{Eigen}\hlopt{::}\hlstd{ComputeThinV}\hlopt{));}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{ArrayXd}\hlstd{\ \ \ \ \ \ \ }\hlstd{}\hlkwd{Dp}\hlstd{}\hlopt{(}\hlstd{}\hlkwd{Dplus}\hlstd{}\hlopt{(}\hlstd{UDV}\hlopt{.}\hlstd{}\hlkwd{singularValues}\hlstd{}\hlopt{()));}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ int}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ }\hlkwb{}\hlstd{}\hlkwd{r}\hlstd{}\hlopt{((}\hlstd{Dp\ }\hlopt{$>$\ }\hlstd{}\hlnum{0}\hlstd{}\hlopt{).}\hlstd{}\hlkwd{count}\hlstd{}\hlopt{());}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{MatrixXd}\hlstd{\ \ \ \ \ }\hlstd{}\hlkwd{VDp}\hlstd{}\hlopt{(}\hlstd{UDV}\hlopt{.}\hlstd{}\hlkwd{matrixV}\hlstd{}\hlopt{()\ {*}\ }\hlstd{Dp}\hlopt{.}\hlstd{}\hlkwd{matrix}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{asDiagonal}\hlstd{}\hlopt{());}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{VectorXd\ }\hlkwd{betahat}\hlstd{}\hlopt{(}\hlstd{VDp\ }\hlopt{{*}\ }\hlstd{UDV}\hlopt{.}\hlstd{}\hlkwd{matrixU}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{adjoint}\hlstd{}\hlopt{()\ {*}\ }\hlstd{y}\hlopt{);}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{VectorXd}\hlstd{\ \ \ \ \ \ }\hlstd{}\hlkwd{se}\hlstd{}\hlopt{(}\hlstd{s\ }\hlopt{{*}\ }\hlstd{VDp}\hlopt{.}\hlstd{}\hlkwd{rowwise}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{norm}\hlstd{}\hlopt{());}\hlstd{}\hspace*{\fill}\\ \mbox{} \normalfont \normalsize \hrule \caption{\code{SVDLSCpp}: Least squares using the SVD. \label{SVDLS}} \end{figure} With these definitions the code for least squares using the singular value decomposition can be written as in Figure~\ref{SVDLS}. Even in the rank-deficient case this code will produce a complete set of coefficients and their standard errors. The user must be careful to check if the computed rank is less than $p$, the number of columns in $\bm X$, in which case the estimated coefficients are just one of an infinite number of coefficient vectors that produce the same fitted values. It happens that the solution from this pseudo-inverse is the minimum norm solution (in the sense that $\|\bm\beta\|^2$ is minimized among those $\bm\beta$ that minimize $\|\bm y-\bm X\bm\beta\|^2$). The standard errors of the coefficient estimates in the rank-deficient case must be interpreted carefully. The solution with one or more missing coefficients, as returned by the \code{lm.fit} function in \proglang{R} and by the column-pivoted QR decomposition described in Section~\ref{sec:colPivQR}, does not provide standard errors for the missing coefficients. That is, both the coefficient and its standard error are returned as \code{NA} because the least squares solution is performed on a reduced model matrix. It is also true that the solution returned by the SVD method is with respect to a reduced model matrix but the $p$ coefficient estimates and their $p$ standard errors don't show this. They are, in fact, linear combinations of a set of $r$ coefficient estimates and their standard errors. \pagebreak \subsection{Least squares using the eigendecomposition} \label{sec:eigendecomp} The eigendecomposition of $\bm X^\top\bm X$ is defined as \begin{displaymath} \bm X^\top\bm X=\bm V\bm\Lambda\bm V^\top \end{displaymath} where $\bm V$, the matrix of eigenvectors, is a $p\times p$ orthogonal matrix and $\bm\Lambda$ is a $p\times p$ diagonal matrix with non-decreasing, non-negative diagonal elements, called the eigenvalues of $\bm X^\top\bm X$. When the eigenvalues are distinct, this $\bm V$ is the same (up to reordering of the columns) as that in the SVD and the eigenvalues of $\bm X^\top\bm X$ are the (reordered) squares of the singular values of $\bm X$. With these definitions one can adapt much of the code from the SVD method for the eigendecomposition, as shown in Figure~\ref{SymmEigLS}. \begin{figure}[t!] \hrule \smallskip \noindent \ttfamily \hlkwb{const\ }\hlstd{Eigen}\hlopt{::}\hlstd{SelfAdjointEigenSolver}\hlopt{$<$}\hlstd{MatrixXd}\hlopt{$>$\ }\hlstd{}\hlkwd{VLV}\hlstd{}\hlopt{(}\hlstd{}\hlkwd{AtA}\hlstd{}\hlopt{(}\hlstd{X}\hlopt{));}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{ArrayXd}\hlstd{\ \ \ \ \ \ \ }\hlstd{}\hlkwd{Dp}\hlstd{}\hlopt{(}\hlstd{}\hlkwd{Dplus}\hlstd{}\hlopt{(}\hlstd{eig}\hlopt{.}\hlstd{}\hlkwd{eigenvalues}\hlstd{}\hlopt{()).}\hlstd{}\hlkwd{sqrt}\hlstd{}\hlopt{());}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ int}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ }\hlkwb{}\hlstd{}\hlkwd{r}\hlstd{}\hlopt{((}\hlstd{Dp\ }\hlopt{$>$\ }\hlstd{}\hlnum{0}\hlstd{}\hlopt{).}\hlstd{}\hlkwd{count}\hlstd{}\hlopt{());}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{MatrixXd}\hlstd{\ \ \ \ \ }\hlstd{}\hlkwd{VDp}\hlstd{}\hlopt{(}\hlstd{VLV}\hlopt{.}\hlstd{}\hlkwd{eigenvectors}\hlstd{}\hlopt{()\ {*}\ }\hlstd{Dp}\hlopt{.}\hlstd{}\hlkwd{matrix}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{asDiagonal}\hlstd{}\hlopt{());}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{VectorXd\ }\hlkwd{betahat}\hlstd{}\hlopt{(}\hlstd{VDp\ }\hlopt{{*}\ }\hlstd{VDp}\hlopt{.}\hlstd{}\hlkwd{adjoint}\hlstd{}\hlopt{()\ {*}\ }\hlstd{X}\hlopt{.}\hlstd{}\hlkwd{adjoint}\hlstd{}\hlopt{()\ {*}\ }\hlstd{y}\hlopt{);}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{VectorXd}\hlstd{\ \ \ \ \ \ }\hlstd{}\hlkwd{se}\hlstd{}\hlopt{(}\hlstd{s\ }\hlopt{{*}\ }\hlstd{VDp}\hlopt{.}\hlstd{}\hlkwd{rowwise}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{norm}\hlstd{}\hlopt{());}\hlstd{}\hspace*{\fill}\\ \mbox{} \normalfont \normalsize \hrule \caption{\code{SymmEigLSCpp}: Least squares using the eigendecomposition. \label{SymmEigLS}} \end{figure} \subsection{Least squares using the column-pivoted QR decomposition} \label{sec:colPivQR} The column-pivoted QR decomposition provides results similar to those from \proglang{R} in both the full-rank and the rank-deficient cases. The decomposition is of the form \begin{displaymath} \bm X\bm P=\bm Q\bm R=\bm Q_1\bm R_1 \end{displaymath} where, as before, $\bm Q$ is $n\times n$ and orthogonal and $\bm R$ is $n\times p$ and upper triangular. The $p\times p$ matrix $\bm P$ is a permutation matrix. That is, its columns are a permutation of the columns of $\bm I_p$. It serves to reorder the columns of $\bm X$ so that the diagonal elements of $\bm R$ are non-increasing in magnitude. An instance of the class \code{Eigen::ColPivHouseholderQR} has a \code{rank()} method returning the computational rank of the matrix. When $\bm X$ is of full rank one can use essentially the same code as in the unpivoted decomposition except that one must reorder the standard errors. When $\bm X$ is rank-deficient, the coefficients and standard errors are evaluated for the leading $r$ columns of $\bm X\bm P$ only. In the rank-deficient case the straightforward calculation of the fitted values, as $\bm X\widehat{\bm\beta}$, cannot be used because some of the estimated coefficients, $\widehat{\bm\beta}$, are \code{NA} so the product will be a vector of \code{NA}'s. One could do some complicated rearrangement of the columns of X and the coefficient estimates but it is conceptually (and computationally) easier to employ the relationship \begin{displaymath} \widehat{\bm y} = \bm Q_1\bm Q_1^\top\bm y=\bm Q \begin{bmatrix} \bm I_r & \bm 0\\ \bm 0 & \bm 0 \end{bmatrix} \bm Q^\top\bm y \end{displaymath} The vector $\bm Q^\top\bm y$ is called the ``effects'' vector in \proglang{R}. \begin{figure}[t!] \hrule \smallskip \noindent \ttfamily \hlstd{}\hlkwc{typedef\ }\hlstd{Eigen}\hlopt{::}\hlstd{ColPivHouseholderQR}\hlopt{$<$}\hlstd{MatrixXd}\hlopt{$>$}\hlstd{\ \ }\hlopt{}\hlstd{CPivQR}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwc{typedef\ }\hlstd{CPivQR}\hlopt{::}\hlstd{PermutationType}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{Permutation}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hspace*{\fill}\\ \hlkwb{const\ }\hlstd{CPivQR}\hlstd{\ \ \ \ \ \ \ }\hlstd{}\hlkwd{PQR}\hlstd{}\hlopt{(}\hlstd{X}\hlopt{);}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{Permutation\ }\hlkwd{Pmat}\hlstd{}\hlopt{(}\hlstd{PQR}\hlopt{.}\hlstd{}\hlkwd{colsPermutation}\hlstd{}\hlopt{());}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ int}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ }\hlkwb{}\hlstd{}\hlkwd{r}\hlstd{}\hlopt{(}\hlstd{PQR}\hlopt{.}\hlstd{}\hlkwd{rank}\hlstd{}\hlopt{());}\hspace*{\fill}\\ \hlstd{VectorXd}\hlstd{\ \ \ \ \ \ \ }\hlstd{betahat}\hlopt{,\ }\hlstd{fitted}\hlopt{,\ }\hlstd{se}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwa{if\ }\hlstd{}\hlopt{(}\hlstd{r\ }\hlopt{==\ }\hlstd{X}\hlopt{.}\hlstd{}\hlkwd{cols}\hlstd{}\hlopt{())\ \{\ }\hlstd{}\hlslc{//\ full\ rank\ case}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ }\hlstd{betahat\ }\hlopt{=\ }\hlstd{PQR}\hlopt{.}\hlstd{}\hlkwd{solve}\hlstd{}\hlopt{(}\hlstd{y}\hlopt{);}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ }\hlstd{fitted}\hlstd{\ \ }\hlstd{}\hlopt{=\ }\hlstd{X\ }\hlopt{{*}\ }\hlstd{betahat}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ }\hlstd{se}\hlstd{\ \ \ \ \ \ }\hlstd{}\hlopt{=\ }\hlstd{Pmat\ }\hlopt{{*}\ }\hlstd{PQR}\hlopt{.}\hlstd{}\hlkwd{matrixQR}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{topRows}\hlstd{}\hlopt{(}\hlstd{p}\hlopt{).}\hlstd{triangularView}\hlopt{$<$}\hlstd{Upper}\hlopt{$>$()}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{}\hlopt{.}\hlstd{}\hlkwd{solve}\hlstd{}\hlopt{(}\hlstd{MatrixXd}\hlopt{::}\hlstd{}\hlkwd{Identity}\hlstd{}\hlopt{(}\hlstd{p}\hlopt{,\ }\hlstd{p}\hlopt{)).}\hlstd{}\hlkwd{rowwise}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{norm}\hlstd{}\hlopt{();}\hspace*{\fill}\\ \hlstd{}\hlopt{\}\ }\hlstd{}\hlkwa{else\ }\hlstd{}\hlopt{\{}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ }\hlstd{MatrixXd}\hlstd{\ \ \ \ }\hlstd{}\hlkwd{Rinv}\hlstd{}\hlopt{(}\hlstd{PQR}\hlopt{.}\hlstd{}\hlkwd{matrixQR}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{topLeftCorner}\hlstd{}\hlopt{(}\hlstd{r}\hlopt{,\ }\hlstd{r}\hlopt{)}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ }\hlstd{}\hlopt{.}\hlstd{triangularView}\hlopt{$<$}\hlstd{Upper}\hlopt{$>$().}\hlstd{}\hlkwd{solve}\hlstd{}\hlopt{(}\hlstd{MatrixXd}\hlopt{::}\hlstd{}\hlkwd{Identity}\hlstd{}\hlopt{(}\hlstd{r}\hlopt{,\ }\hlstd{r}\hlopt{)));}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ }\hlstd{VectorXd\ }\hlkwd{effects}\hlstd{}\hlopt{(}\hlstd{PQR}\hlopt{.}\hlstd{}\hlkwd{householderQ}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{adjoint}\hlstd{}\hlopt{()\ {*}\ }\hlstd{y}\hlopt{);}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ }\hlstd{betahat}\hlopt{.}\hlstd{}\hlkwd{fill}\hlstd{}\hlopt{(::}\hlstd{NA\textunderscore REAL}\hlopt{);}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ }\hlstd{betahat}\hlopt{.}\hlstd{}\hlkwd{head}\hlstd{}\hlopt{(}\hlstd{r}\hlopt{)}\hlstd{\ \ }\hlopt{=\ }\hlstd{Rinv\ }\hlopt{{*}\ }\hlstd{effects}\hlopt{.}\hlstd{}\hlkwd{head}\hlstd{}\hlopt{(}\hlstd{r}\hlopt{);}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ }\hlstd{betahat}\hlstd{\ \ \ \ \ \ \ \ \ \ }\hlstd{}\hlopt{=\ }\hlstd{Pmat\ }\hlopt{{*}\ }\hlstd{betahat}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ }\hlstd{se}\hlopt{.}\hlstd{}\hlkwd{fill}\hlstd{}\hlopt{(::}\hlstd{NA\textunderscore REAL}\hlopt{);}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ }\hlstd{se}\hlopt{.}\hlstd{}\hlkwd{head}\hlstd{}\hlopt{(}\hlstd{r}\hlopt{)}\hlstd{\ \ \ \ \ \ \ }\hlopt{=\ }\hlstd{Rinv}\hlopt{.}\hlstd{}\hlkwd{rowwise}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{norm}\hlstd{}\hlopt{();}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ }\hlstd{se}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{}\hlopt{=\ }\hlstd{Pmat\ }\hlopt{{*}\ }\hlstd{se}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{}\hlslc{//\ create\ fitted\ values\ from\ effects}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ }\hlstd{effects}\hlopt{.}\hlstd{}\hlkwd{tail}\hlstd{}\hlopt{(}\hlstd{X}\hlopt{.}\hlstd{}\hlkwd{rows}\hlstd{}\hlopt{()\ {-}\ }\hlstd{r}\hlopt{).}\hlstd{}\hlkwd{setZero}\hlstd{}\hlopt{();}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ }\hlstd{fitted}\hlstd{\ \ \ \ \ \ \ \ \ \ \ }\hlstd{}\hlopt{=\ }\hlstd{PQR}\hlopt{.}\hlstd{}\hlkwd{householderQ}\hlstd{}\hlopt{()\ {*}\ }\hlstd{effects}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlopt{\}}\hlstd{}\hspace*{\fill}\\ \mbox{} \normalfont \normalsize \hrule \caption{\code{ColPivQRLSCpp}: Least squares using the pivoted QR decomposition. \label{ColPivQRLS}} \end{figure} Just to check that the code in Figure~\ref{ColPivQRLS} does indeed provide the desired answer \begin{CodeInput} R> print(summary(fmPQR <- fastLm(y ~ f1 * f2, dd)), signif.st = FALSE) \end{CodeInput} \begin{CodeOutput} Call: fastLm.formula(formula = y ~ f1 * f2, data = dd) Estimate Std. Error t value Pr(>|t|) (Intercept) 0.977859 0.058165 16.812 3.413e-09 f1B 12.038068 0.082258 146.346 < 2.2e-16 f1C 3.117222 0.082258 37.896 5.221e-13 f1D 4.068523 0.082258 49.461 2.833e-14 f2b 5.060123 0.082258 61.516 2.593e-15 f2c 5.997592 0.082258 72.912 4.015e-16 f1B:f2b -3.014763 0.116330 -25.916 3.266e-11 f1C:f2b 7.702999 0.116330 66.217 1.156e-15 f1D:f2b 8.964251 0.116330 77.059 < 2.2e-16 f1B:f2c NA NA NA NA f1C:f2c 10.961326 0.116330 94.226 < 2.2e-16 f1D:f2c 12.041081 0.116330 103.508 < 2.2e-16 Residual standard error: 0.2868 on 11 degrees of freedom Multiple R-squared: 0.9999, Adjusted R-squared: 0.9999 \end{CodeOutput} \begin{CodeInput} R> all.equal(coef(fm1), coef(fmPQR)) \end{CodeInput} \begin{CodeOutput} [1] TRUE \end{CodeOutput} \begin{CodeInput} R> all.equal(unname(fitted(fm1)), fitted(fmPQR)) \end{CodeInput} \begin{CodeOutput} [1] TRUE \end{CodeOutput} \begin{CodeInput} R> all.equal(unname(residuals(fm1)), residuals(fmPQR)) \end{CodeInput} \begin{CodeOutput} [1] TRUE \end{CodeOutput} The rank-revealing SVD method produces the same fitted values but not the same coefficients. \begin{CodeInput} R> print(summary(fmSVD <- fastLm(y ~ f1 * f2, dd, method = 4)), + signif.st = FALSE) \end{CodeInput} \begin{CodeOutput} Call: fastLm.formula(formula = y ~ f1 * f2, data = dd, method = 4) Estimate Std. Error t value Pr(>|t|) (Intercept) 0.977859 0.058165 16.812 3.413e-09 f1B 7.020458 0.038777 181.049 < 2.2e-16 f1C 3.117222 0.082258 37.896 5.221e-13 f1D 4.068523 0.082258 49.461 2.833e-14 f2b 5.060123 0.082258 61.516 2.593e-15 f2c 5.997592 0.082258 72.912 4.015e-16 f1B:f2b 2.002847 0.061311 32.667 2.638e-12 f1C:f2b 7.702999 0.116330 66.217 1.156e-15 f1D:f2b 8.964251 0.116330 77.059 < 2.2e-16 f1B:f2c 5.017610 0.061311 81.838 < 2.2e-16 f1C:f2c 10.961326 0.116330 94.226 < 2.2e-16 f1D:f2c 12.041081 0.116330 103.508 < 2.2e-16 Residual standard error: 0.2868 on 11 degrees of freedom Multiple R-squared: 0.9999, Adjusted R-squared: 0.9999 \end{CodeOutput} \begin{CodeInput} R> all.equal(coef(fm1), coef(fmSVD)) \end{CodeInput} \begin{CodeOutput} [1] "'is.NA' value mismatch: 0 in current 1 in target" \end{CodeOutput} \begin{CodeInput} R> all.equal(unname(fitted(fm1)), fitted(fmSVD)) \end{CodeInput} \begin{CodeOutput} [1] TRUE \end{CodeOutput} \begin{CodeInput} R> all.equal(unname(residuals(fm1)), residuals(fmSVD)) \end{CodeInput} \begin{CodeOutput} [1] TRUE \end{CodeOutput} The coefficients from the symmetric eigendecomposition method are the same as those from the SVD, hence the fitted values and residuals must be the same for these two methods. \begin{CodeInput} R> summary(fmVLV <- fastLm(y ~ f1 * f2, dd, method = 5)) R> all.equal(coef(fmSVD), coef(fmVLV)) \end{CodeInput} \begin{CodeOutput} [1] TRUE \end{CodeOutput} \subsection{Comparative speed} In the \pkg{RcppEigen} package the \proglang{R} function to fit linear models using the methods described above is called \code{fastLm}. It follows an earlier example in the \pkg{Rcpp} package which was carried over to both \pkg{RcppArmadillo} and \pkg{RcppGSL}. The natural question to ask is, ``Is it indeed fast to use these methods based on \pkg{Eigen}?''. To this end, the package provides benchmarking code for these methods, \proglang{R}'s \code{lm.fit} function and the \code{fastLm} implementations in the \pkg{RcppArmadillo} \citep{CRAN:RcppArmadillo} and \pkg{RcppGSL} \citep{CRAN:RcppGSL} packages, if they are installed. The benchmark code, which uses the \pkg{rbenchmark} \citep{CRAN:rbenchmark} package, is in a file named \code{lmBenchmark.R} in the \code{examples} subdirectory of the installed \pkg{RcppEigen} package. It can be run as \begin{CodeInput} R> source(system.file("examples", "lmBenchmark.R", package = "RcppEigen")) \end{CodeInput} Results will vary according to the speed of the processor and the implementation of the BLAS (Basic Linear Algebra Subroutines) used. (\pkg{Eigen} methods do not use the BLAS but the other methods do.) The \pkg{Eigen}3 template library does not use multi-threaded code for these operations but does use the graphics pipeline instructions (SSE and SSE2, in this case) in some calculations. \begin{table}[t!] \centering \begin{tabular}{r r r r r} \hline Method & Relative& Elapsed & User & Sys \\ \hline LDLt & 1.00 & 1.18 & 1.17 & 0.00 \\ LLt & 1.01 & 1.19 & 1.17 & 0.00 \\ SymmEig & 2.76 & 3.25 & 2.70 & 0.52 \\ QR & 6.35 & 7.47 & 6.93 & 0.53 \\ arma & 6.60 & 7.76 & 25.69 & 4.47 \\ PivQR & 7.15 & 8.41 & 7.78 & 0.62 \\ lm.fit & 11.68 & 13.74 & 21.56 & 16.79 \\ GESDD & 12.58 & 14.79 & 44.01 & 10.96 \\ SVD & 44.48 & 52.30 & 51.38 & 0.80 \\ GSL & 150.46 & 176.95 & 210.52 & 149.86 \\ \hline \end{tabular} \caption{\code{lmBenchmark} results on a desktop computer for the default size, $100,000\times 40$, full-rank model matrix running 20 repetitions for each method. Times (Elapsed, User, and Sys) are in seconds. The BLAS in use is a locally-rebuilt version of the OpenBLAS library included with Ubuntu 11.10. \label{tab:lmRes}} \end{table} Results obtained on a desktop computer, circa 2010, are shown in Table~\ref{tab:lmRes}. The processor used for these timings is a 4-core processor but almost all the methods are single-threaded and not affected by the number of cores. Only the \code{arma}, \code{lm.fit}, \code{GESDD} and \code{GSL} methods benefit from the multi-threaded BLAS implementation provided by OpenBLAS, and the relative speed increase is modest for this problem size and number of cores (at 7.76 seconds relative to 10.29 seconds for \code{arma}, 13.74 seconds relative to 16.91 seconds for \code{lm.fit}, and 176.95 seconds relative to 193.29 seconds for the \code{GSL}). Parallel computing approaches will always have to trade-off increased communication and overhead costs against the potential gains from running multiple execution threads. % Nonetheless, with the ongoing %shift to multi-core architectures and an ever increasing number of cores in %modern processing units, it is conceivable that \pkg{Eigen} may also switch %to a multi-threaded approach to further improve its performance. These results indicate that methods based on forming and decomposing $\bm X^\top\bm X$ (LDLt, LLt and SymmEig) are considerably faster than the others. The SymmEig method, using a rank-revealing decomposition, would be preferred, although the LDLt method could probably be modified to be rank-revealing. However, the dimensions of the problem will influence the comparative results. Because there are 100,000 rows in $\bm X$, methods that decompose the whole $\bm X$ matrix (all the methods except those named above) will be at a disadvantage. The pivoted QR method is 1.6 times faster than \proglang{R}'s \code{lm.fit} on this test and provides nearly the same information as \code{lm.fit}. Methods based on the singular value decomposition (SVD and GSL) are much slower but, as mentioned above, this is caused in part by $\bm X$ having many more rows than columns. The GSL method from the GNU Scientific Library uses an older algorithm for the SVD and is clearly out of contention. The \code{GESDD} method provides an interesting hybrid: It uses the \pkg{Eigen} classes, but then deploys the LAPACK routine \code{dgesdd} for the actual SVD calculation. The resulting time is much faster than using the SVD implementation of \pkg{Eigen} which is not a particularly fast SVD method. \section{Delayed evaluation} \label{sec:delayed} A form of delayed evaluation is used in \pkg{Eigen}. That is, many operators and methods do not evaluate the result but instead return an ``expression object'' that is evaluated when needed. As an example, even though one writes the $\bm X^\top\bm X$ evaluation as \code{.rankUpdate(X.adjoint())} the \code{X.adjoint()} part is not evaluated immediately. The \code{rankUpdate} method detects that it has been passed a matrix that is to be used in its transposed form and evaluates the update by taking inner products of columns of $\bm X$ instead of rows of $\bm X^\top$. As \pkg{Eigen} has developed some of these unevaluated expressions have been modified. In \pkg{Eigen 3.1}, which is incorporated in version 0.3.1 of \pkg{RcppEigen}, the \code{.adjoint()} method applied to a real dense matrix copies the contents of the original matrix, flags it as row-major and interchanges the number of rows and columns. This is indeed the adjoint of the original matrix but, at one time, the \code{wrap} method for the \pkg{Eigen} dense matrix classes would fail on row-major matrices. \begin{figure}[t!] \hrule \smallskip \noindent \ttfamily \hlstd{}\hlkwb{const\ }\hlstd{MapMati}\hlstd{\ \ }\hlstd{}\hlkwd{A}\hlstd{}\hlopt{(}\hlstd{as}\hlopt{$<$}\hlstd{MapMati}\hlopt{$>$(}\hlstd{AA}\hlopt{));}\hspace*{\fill}\\ \hlstd{}\hlkwa{return\ }\hlstd{}\hlkwd{wrap}\hlstd{}\hlopt{(}\hlstd{A}\hlopt{.}\hlstd{}\hlkwd{transpose}\hlstd{}\hlopt{());}\hlstd{}\hspace*{\fill}\\ \mbox{} \normalfont \normalsize \hrule \caption{\code{badtransCpp}: Transpose producing a run-time error in early versions of \pkg{RcppEigen}. \label{badtrans}} \end{figure} In the code for the transpose of an integer matrix shown in Figure~\ref{trans} the transpose is assigned to a \code{MatrixXi} object before applying \code{wrap} to it. The assignment forces the evaluation as a column-major matrix. In early versions of the \pkg{RcppEigen} package if this step is skipped, as in Figure~\ref{badtrans}, the result would have been incorrect. \begin{CodeInput} R> Ai <- matrix(1:6, ncol = 2L) R> ftrans2 <- cxxfunction(signature(AA = "matrix"), badtransCpp, + "RcppEigen", incl) R> (At <- ftrans2(Ai)) \end{CodeInput} \begin{CodeOutput} [,1] [,2] [,3] [1,] 1 2 3 [2,] 4 5 6 \end{CodeOutput} Although the problem no longer persists for this particular example, the recommended practice is to first assign objects before wrapping them for return to \proglang{R}. \section{Sparse matrices} \label{sec:sparse} \begin{figure}[b!] \hrule \smallskip \noindent \ttfamily \hlstd{}\hlkwa{using\ }\hlstd{Eigen}\hlopt{::}\hlstd{MappedSparseMatrix}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwa{using\ }\hlstd{Eigen}\hlopt{::}\hlstd{SparseMatrix}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hspace*{\fill}\\ \hlkwb{const\ }\hlstd{MappedSparseMatrix}\hlopt{$<$}\hlstd{}\hlkwb{double}\hlstd{}\hlopt{$>$\ }\hlstd{}\hlkwd{A}\hlstd{}\hlopt{(}\hlstd{as}\hlopt{$<$}\hlstd{MappedSparseMatrix}\hlopt{$<$}\hlstd{}\hlkwb{double}\hlstd{}\hlopt{$>$\ $>$(}\hlstd{AA}\hlopt{));}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{MapVecd}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{}\hlkwd{y}\hlstd{}\hlopt{(}\hlstd{as}\hlopt{$<$}\hlstd{MapVecd}\hlopt{$>$(}\hlstd{yy}\hlopt{));}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{SparseMatrix}\hlopt{$<$}\hlstd{}\hlkwb{double}\hlstd{}\hlopt{$>$}\hlstd{\ \ \ \ \ \ }\hlopt{}\hlstd{}\hlkwd{At}\hlstd{}\hlopt{(}\hlstd{A}\hlopt{.}\hlstd{}\hlkwd{adjoint}\hlstd{}\hlopt{());}\hspace*{\fill}\\ \hlstd{}\hlkwa{return\ }\hlstd{List}\hlopt{::}\hlstd{}\hlkwd{create}\hlstd{}\hlopt{(}\hlstd{Named}\hlopt{{(}}\hlstd{}\hlstr{"At"}\hlstd{}\hlopt{{)}}\hlstd{\ \ }\hlopt{=\ }\hlstd{At}\hlopt{,}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{Named}\hlopt{{(}}\hlstd{}\hlstr{"Aty"}\hlstd{}\hlopt{{)}\ =\ }\hlstd{At\ }\hlopt{{*}\ }\hlstd{y}\hlopt{);}\hlstd{}\hspace*{\fill}\\ \mbox{} \normalfont \normalsize \hrule \caption{\code{sparseProdCpp}: Transpose and product with sparse matrices. \label{sparseProd}} \end{figure} \pkg{Eigen} provides sparse matrix classes. An \proglang{R} object of class \code{dgCMatrix} (from the \pkg{Matrix} package by \citet{CRAN:Matrix}) can be mapped as in Figure~\ref{sparseProd}. \begin{CodeInput} R> sparse1 <- cxxfunction(signature(AA = "dgCMatrix", yy = "numeric"), + sparseProdCpp, "RcppEigen", incl) R> data("KNex", package = "Matrix") R> rr <- sparse1(KNex$mm, KNex$y) R> stopifnot(all.equal(rr$At, t(KNex$mm)), + all.equal(rr$Aty, as.vector(crossprod(KNex$mm, KNex$y)))) \end{CodeInput} % Sparse Cholesky decompositions are provided by the \code{SimplicialLLT} and \code{SimplicialLDLT} classes in the \pkg{RcppEigen} package for \proglang{R}. These are subclasses of the \code{SimplicialCholesky} templated. A sample usage is shown in Figure~\ref{fig:spLS}. % \begin{figure}[t!] \hrule \smallskip \noindent \ttfamily \hlstd{}\hlkwc{typedef\ }\hlstd{Eigen}\hlopt{::}\hlstd{MappedSparseMatrix}\hlopt{$<$}\hlstd{}\hlkwb{double}\hlstd{}\hlopt{$>$}\hlstd{\ \ }\hlopt{}\hlstd{MSpMat}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwc{typedef\ }\hlstd{Eigen}\hlopt{::}\hlstd{SparseMatrix}\hlopt{$<$}\hlstd{}\hlkwb{double}\hlstd{}\hlopt{$>$}\hlstd{\ \ \ \ \ \ \ \ \ }\hlopt{}\hlstd{SpMat}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwc{typedef\ }\hlstd{Eigen}\hlopt{::}\hlstd{SimplicialLDLT}\hlopt{$<$}\hlstd{SpMat}\hlopt{$>$}\hlstd{\ \ \ \ \ \ \ }\hlopt{}\hlstd{SpChol}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hspace*{\fill}\\ \hlkwb{const\ }\hlstd{SpMat}\hlstd{\ \ \ \ \ \ }\hlstd{}\hlkwd{At}\hlstd{}\hlopt{(}\hlstd{as}\hlopt{$<$}\hlstd{MSpMat}\hlopt{$>$(}\hlstd{AA}\hlopt{).}\hlstd{}\hlkwd{adjoint}\hlstd{}\hlopt{());}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{VectorXd}\hlstd{\ \ }\hlstd{}\hlkwd{Aty}\hlstd{}\hlopt{(}\hlstd{At\ }\hlopt{{*}\ }\hlstd{as}\hlopt{$<$}\hlstd{MapVecd}\hlopt{$>$(}\hlstd{yy}\hlopt{));}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{SpChol}\hlstd{\ \ \ \ \ }\hlstd{}\hlkwd{Ch}\hlstd{}\hlopt{(}\hlstd{At\ }\hlopt{{*}\ }\hlstd{At}\hlopt{.}\hlstd{}\hlkwd{adjoint}\hlstd{}\hlopt{());}\hspace*{\fill}\\ \hlstd{}\hlkwa{if\ }\hlstd{}\hlopt{(}\hlstd{Ch}\hlopt{.}\hlstd{}\hlkwd{info}\hlstd{}\hlopt{()\ !=\ }\hlstd{Eigen}\hlopt{::}\hlstd{Success}\hlopt{)\ }\hlstd{}\hlkwa{return\ }\hlstd{R\textunderscore NilValue}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwa{return\ }\hlstd{List}\hlopt{::}\hlstd{}\hlkwd{create}\hlstd{}\hlopt{(}\hlstd{}\hlkwd{Named}\hlstd{}\hlopt{(}\hlstd{}\hlstr{"betahat"}\hlstd{}\hlopt{)}\hlstd{\ \ }\hlopt{=\ }\hlstd{Ch}\hlopt{.}\hlstd{}\hlkwd{solve}\hlstd{}\hlopt{(}\hlstd{Aty}\hlopt{),}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{}\hlkwd{Named}\hlstd{}\hlopt{(}\hlstd{}\hlstr{"perm"}\hlstd{}\hlopt{)}\hlstd{\ \ \ \ \ }\hlopt{=\ }\hlstd{Ch}\hlopt{.}\hlstd{}\hlkwd{permutationP}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{indices}\hlstd{}\hlopt{());}\hlstd{}\hspace*{\fill}\\ \mbox{} \normalfont \normalsize \hrule \caption{\code{sparseLSCpp}: Solving a sparse least squares problem. \label{fig:spLS}} \end{figure} % \begin{CodeInput} R> sparse2 <- cxxfunction(signature(AA = "dgCMatrix", yy = "numeric"), + sparseLSCpp, "RcppEigen", incl) R> str(rr <- sparse2(KNex$mm, KNex$y)) \end{CodeInput} \begin{CodeOutput} List of 2 $ betahat: num [1:712] 823 340 473 349 188 ... $ perm : int [1:712] 572 410 414 580 420 425 417 426 431 445 ... \end{CodeOutput} \begin{CodeInput} R> res <- as.vector(solve(Ch <- Cholesky(crossprod(KNex$mm)), + crossprod(KNex$mm, KNex$y))) R> stopifnot(all.equal(rr$betahat, res)) R> all(rr$perm == Ch@perm) \end{CodeInput} \begin{CodeOutput} [1] FALSE \end{CodeOutput} The fill-reducing permutations are different. \section{Summary} This paper introduced the \pkg{RcppEigen} package which provides high-level linear algebra computations as an extension to the \proglang{R} system. \pkg{RcppEigen} is based on the modern \proglang{C++} library \pkg{Eigen} which combines extended functionality with excellent performance, and utilizes \pkg{Rcpp} to interface \proglang{R} with \proglang{C++}. Several illustrations covered common matrix operations and several approaches to solving a least squares problem---including an extended discussion of rank-revealing approaches. Sparse matrix computations were also discussed, and a short example provided an empirical demonstration of the excellent run-time performance of the \pkg{RcppEigen} package. \pagebreak \bibliography{\Sexpr{Rcpp:::bib()}} \end{document} %%% Local Variables: %%% mode: latex %%% TeX-master: t %%% End: RcppEigen/R/0000755000176200001440000000000014456322731012336 5ustar liggesusersRcppEigen/R/inline.R0000644000176200001440000000203413204322073013723 0ustar liggesusers## Copyright (C) 2011 - 2017 Douglas Bates, Dirk Eddelbuettel and Romain Francois ## ## This file is part of RcppEigen. ## ## RcppEigen is free software: you can redistribute it and/or modify it ## under the terms of the GNU General Public License as published by ## the Free Software Foundation, either version 2 of the License, or ## (at your option) any later version. ## ## RcppEigen is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ## GNU General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with RcppEigen. If not, see . inlineCxxPlugin <- Rcpp::Rcpp.plugin.maker(include.before = "#include ", package = "RcppEigen" # , LinkingTo = c("RcppEigen", "Rcpp") ) RcppEigen/R/RcppExports.R0000644000176200001440000000066514456320632014757 0ustar liggesusers# Generated by using Rcpp::compileAttributes() -> do not edit by hand # Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393 eigen_version <- function(single) { .Call(`_RcppEigen_eigen_version`, single) } Eigen_SSE <- function() { .Call(`_RcppEigen_Eigen_SSE`) } EigenNbThreads <- function() { .Call(`_RcppEigen_EigenNbThreads`) } fastLm_Impl <- function(X, y, type) { .Call(`_RcppEigen_fastLm_Impl`, X, y, type) } RcppEigen/R/fastLm.R0000644000176200001440000000716714456322731013722 0ustar liggesusers## fastLm.R: Rcpp/Eigen implementation of lm() ## ## Copyright (C) 2011 - 2023 Douglas Bates, Dirk Eddelbuettel and Romain Francois ## ## This file is part of RcppEigen. ## ## RcppEigen is free software: you can redistribute it and/or modify it ## under the terms of the GNU General Public License as published by ## the Free Software Foundation, either version 2 of the License, or ## (at your option) any later version. ## ## RcppEigen is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ## GNU General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with RcppEigen. If not, see . fastLmPure <- function(X, y, method = 0L) { stopifnot(is.matrix(X), is.numeric(y), NROW(y)==nrow(X)) fastLm_Impl(X, y, method) } fastLm <- function(X, ...) UseMethod("fastLm") fastLm.default <- function(X, y, method = 0L, ...) { X <- as.matrix(X) y <- as.numeric(y) res <- fastLmPure(X, y, method) res$call <- match.call() res$intercept <- any(apply(X, 2, function(x) all(x == x[1]))) class(res) <- "fastLm" res } print.fastLm <- function(x, ...) { cat("\nCall:\n") print(x$call) cat("\nCoefficients:\n") print(x$coefficients, digits=5) } summary.fastLm <- function(object, ...) { coef <- object$coefficients se <- object$se tval <- coef/se object$coefficients <- cbind(Estimate = coef, "Std. Error" = se, "t value" = tval, "Pr(>|t|)" = 2*pt(-abs(tval), df=object$df.residual)) ## cf src/stats/R/lm.R and case with no weights and an intercept f <- object$fitted.values r <- object$residuals #mss <- sum((f - mean(f))^2) mss <- if (object$intercept) sum((f - mean(f))^2) else sum(f^2) rss <- sum(r^2) object$r.squared <- mss/(mss + rss) df.int <- if (object$intercept) 1L else 0L n <- length(f) rdf <- object$df.residual object$adj.r.squared <- 1 - (1 - object$r.squared) * ((n - df.int)/rdf) class(object) <- "summary.fastLm" object } print.summary.fastLm <- function(x, ...) { cat("\nCall:\n") print(x$call) cat("\nResiduals:\n") digits <- max(3, getOption("digits") - 3) print(summary(x$residuals, digits=digits)[-4]) cat("\n") printCoefmat(x$coefficients, P.values=TRUE, has.Pvalue=TRUE, ...) cat("\nResidual standard error: ", formatC(x$s, digits=digits), " on ", formatC(x$df.residual), " degrees of freedom\n", sep="") cat("Multiple R-squared: ", formatC(x$r.squared, digits=digits), ",\tAdjusted R-squared: ",formatC(x$adj.r.squared, digits=digits), "\n", sep="") invisible(x) } fastLm.formula <- function(formula, data=list(), method = 0L, ...) { mf <- model.frame(formula=formula, data=data) X <- model.matrix(attr(mf, "terms"), data=mf) y <- model.response(mf) res <- fastLm.default(X, y, method=method, ...) res$call <- match.call() ## I think this is redundant. The formula is available as res$call$formula res$formula <- formula res$intercept <- attr(attr(mf, "terms"), "intercept") res } predict.fastLm <- function(object, newdata=NULL, ...) { if (is.null(newdata)) { y <- fitted(object) } else { if (!is.null(object$formula)) { x <- model.matrix(object$formula, newdata) } else { x <- newdata # #nocov } y <- as.vector(x %*% coef(object)) } y } RcppEigen/R/RcppEigen.package.skeleton.R0000644000176200001440000001125314402632114017542 0ustar liggesusers## RcppEigen.package.skeleton.R: makes a skeleton for a package that wants to use RcppEigen ## ## Copyright (C) 2011 - 2023 Douglas Bates, Dirk Eddelbuettel and Romain Francois ## ## This file is part of RcppEigen. ## ## RcppEigen is free software: you can redistribute it and/or modify it ## under the terms of the GNU General Public License as published by ## the Free Software Foundation, either version 2 of the License, or ## (at your option) any later version. ## ## RcppEigen is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ## GNU General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with RcppEigen. If not, see . RcppEigen.package.skeleton <- function(name= "anRpackage", list = character(), environment = .GlobalEnv, path = ".", force = FALSE, code_files = character(), example_code = TRUE) { env <- parent.frame(1) # #nocov start if (!length(list)) { fake <- TRUE assign("Rcpp.fake.fun", function() {}, envir = env) } else { fake <- FALSE } haveKitten <- requireNamespace("pkgKitten", quietly=TRUE) skelFunUsed <- ifelse(haveKitten, pkgKitten::kitten, package.skeleton) skelFunName <- ifelse(haveKitten, "kitten", "package.skeleton") message("\nCalling ", skelFunName, " to create basic package.") ## first let the traditional version do its business call <- match.call() call[[1]] <- skelFunUsed if ("example_code" %in% names(call)) { call[["example_code"]] <- NULL # remove the example_code argument } if (! haveKitten) { # in the package.skeleton() case if (fake) { call[["list"]] <- "Rcpp.fake.fun" } } else { if (force) { call[["force"]] <- NULL } } tryCatch(eval(call, envir = env), error = function(e) { cat(paste(e, "\n")) # print error stop(paste("error while calling `", skelFunName, "`", sep="")) }) message("\nAdding RcppEigen settings") ## now pick things up root <- file.path(path, name) ## Add Rcpp to the DESCRIPTION DESCRIPTION <- file.path(root, "DESCRIPTION") if (file.exists(DESCRIPTION)) { x <- cbind(read.dcf(DESCRIPTION), "Imports" = sprintf("Rcpp (>= %s)", packageDescription("Rcpp")[["Version"]]), "LinkingTo" = "Rcpp, RcppEigen") write.dcf(x, file = DESCRIPTION) message(" >> added Imports: Rcpp") message(" >> added LinkingTo: Rcpp, RcppEigen") } ## add a useDynLib to NAMESPACE, NAMESPACE <- file.path(root, "NAMESPACE") lines <- readLines(NAMESPACE) if (!any(grepl("useDynLib", lines))) { lines <- c(sprintf("useDynLib(%s)", name), "importFrom(Rcpp, evalCpp)", ## ensures Rcpp instantiation lines) writeLines(lines, con = NAMESPACE) message(" >> added useDynLib and importFrom directives to NAMESPACE") } ## lay things out in the src directory src <- file.path(root, "src") if (!file.exists(src)) { dir.create(src) } man <- file.path(root, "man") if (!file.exists(man)) { dir.create(man) } skeleton <- system.file("skeleton", package = "RcppEigen") Makevars <- file.path(src, "Makevars") if (!file.exists(Makevars)) { file.copy(file.path(skeleton, "Makevars"), Makevars) message(" >> added Makevars file with RcppEigen settings") } Makevars.win <- file.path(src, "Makevars.win") if (!file.exists(Makevars.win)) { file.copy(file.path(skeleton, "Makevars.win"), Makevars.win) message(" >> added Makevars.win file with RcppEigen settings") } if (example_code) { file.copy(file.path(skeleton, "rcppeigen_hello_world.cpp"), src) message(" >> added example src file using Eigen classes") file.copy(file.path(skeleton, "rcppeigen_hello_world.Rd"), man) message(" >> added example Rd file for using Eigen classes") Rcpp::compileAttributes(root) message(" >> invoked Rcpp::compileAttributes to create wrappers") } if (fake) { rm("Rcpp.fake.fun", envir = env) unlink(file.path(root, "R" , "Rcpp.fake.fun.R")) unlink(file.path(root, "man", "Rcpp.fake.fun.Rd")) } invisible(NULL) # #nocov end } RcppEigen/R/flags.R0000644000176200001440000000164112253717461013560 0ustar liggesusers## Copyright (C) 2012 Douglas Bates, Dirk Eddelbuettel and Romain Francois ## ## This file is part of RcppEigen. ## ## RcppEigen is free software: you can redistribute it and/or modify it ## under the terms of the GNU General Public License as published by ## the Free Software Foundation, either version 2 of the License, or ## (at your option) any later version. ## ## RcppEigen is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ## GNU General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with RcppEigen. If not, see . RcppEigenCxxFlags <- function() { paste('-I"', system.file("include", package="RcppEigen"), '"', sep="") } CxxFlags <- function() { cat(RcppEigenCxxFlags()) } RcppEigen/R/SHLIB.R0000644000176200001440000000153412264631730013322 0ustar liggesusers## Copyright (C) 2011 Douglas Bates, Dirk Eddelbuettel and Romain Francois ## ## This file is part of RcppEigen. ## ## RcppEigen is free software: you can redistribute it and/or modify it ## under the terms of the GNU General Public License as published by ## the Free Software Foundation, either version 2 of the License, or ## (at your option) any later version. ## ## RcppEigen is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ## GNU General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with RcppEigen. If not, see . #SHLIB <- Rcpp:::SHLIB.maker( # env = list( # PKG_LIBS = Rcpp:::RcppLdFlags(), # ) #) RcppEigen/MD50000644000176200001440000014120014567702434012450 0ustar liggesusersfc9f797212798f3d58a78c5d14fe2701 *ChangeLog 2e5f0d5cb56a9b23b56f8b358222836a *DESCRIPTION e42bf18a51bf55ebd814dbec277ee8a9 *LICENSE 742cc37f7265d5c5769876355e6dd8a8 *NAMESPACE 309b1ccf1c3c1dbd6cef99df62730204 *R/RcppEigen.package.skeleton.R e7f01094df303fd863645afc66135e22 *R/RcppExports.R c7aef4bfa80277e3de4994e273195d67 *R/SHLIB.R e1fac714cf161eb0e1ff6a2c8b7806d9 *R/fastLm.R 5e9a26fc37e3d6e32effd1906837a58f *R/flags.R cdd004c471ea6b4571a36553715a1310 *R/inline.R 2d97481a5230f795e6102af5eaa8873f *README.md caa25fd4c303fd80ef1a8ccf977fec9e *build/vignette.rds 122124e5c83aa22c1a1edd530dfe89d5 *cleanup 85ae8c4845d02f52b357eb46770fa686 *inst/CITATION 8b394fa87a687b21a763754e0891b7af *inst/COPYRIGHTS 1a924bfcfb137217d616fa93f34f9c63 *inst/NEWS.Rd be6e7b61cfe053492c3fd78dbb427f3b *inst/doc/RcppEigen-Introduction.R c52f521406b904dee163f3286ba491e8 *inst/doc/RcppEigen-Introduction.Rnw a42c6b19e469496606724c89e6c5bed7 *inst/doc/RcppEigen-Introduction.pdf a3f7e1e72bbe54ef5e0acaf6e0a90521 *inst/examples/lmBenchmark.R 145af623d4b14c143f728edefb421c90 *inst/include/Eigen/Cholesky f07ba76748be1d30bf6b5f780346ddc9 *inst/include/Eigen/CholmodSupport 81296a0f304f1f4c2d7683c0da4f41e8 *inst/include/Eigen/Core d83e7f96733d79a7a1a38630c4287e85 *inst/include/Eigen/Dense 97913f1cc1bc6793f4d308537075563a *inst/include/Eigen/Eigen 4a101a6893eb59648e608fb780793df4 *inst/include/Eigen/Eigenvalues 16b99e5b068f30d75b0a40ea3a51671c *inst/include/Eigen/Geometry 21f7d4b876b5e53d591f85566a1fc2a4 *inst/include/Eigen/Householder aaeeeb9a7377155d7ff520d71426c8d0 *inst/include/Eigen/IterativeLinearSolvers e57d3f4e9d7cba9138197a2a327d7290 *inst/include/Eigen/Jacobi 773905229dd2c59b07c0195014b15aad *inst/include/Eigen/KLUSupport a79c9abe1514e57f05658202a6e713fc *inst/include/Eigen/LU 003fe3d1e91833e14d42c3f51f540f55 *inst/include/Eigen/MetisSupport 8603fdfa8648b85a1799a7620b66eadd *inst/include/Eigen/OrderingMethods 6cea51382aa721a45cf1a62312072001 *inst/include/Eigen/PaStiXSupport 61734ff861c3a3996dcb7f397d00e7e1 *inst/include/Eigen/PardisoSupport fd275dac6ea9537884730a8c73387602 *inst/include/Eigen/QR 3ec0f5af1ec7173a413270ecfa454a62 *inst/include/Eigen/QtAlignedMalloc 0eecd23b6d4b5153e15ecbfd55fb802f *inst/include/Eigen/SPQRSupport 49589ccc9db3334105e1399803dba5df *inst/include/Eigen/SVD 76a6f277df269701446d49bc9b31f1e4 *inst/include/Eigen/Sparse 1e32650635cb6297b4578d30edddc59d *inst/include/Eigen/SparseCholesky 9e4037b36be07c843c2e58ba233a97a9 *inst/include/Eigen/SparseCore de0cc8dd6ed700c53b839b5e1f39b1c0 *inst/include/Eigen/SparseLU 52150dd6e9daa40ac034086650733e30 *inst/include/Eigen/SparseQR 4cce81671567f95d091ba804dabe5dea *inst/include/Eigen/StdDeque 5532886591f4b7af430ba6a959570f50 *inst/include/Eigen/StdList 8427ab22ff8235eaca7414fe91c28679 *inst/include/Eigen/StdVector e9debf276c69bf6c2069ca30c17aeea7 *inst/include/Eigen/SuperLUSupport 9dd93af950d3f4d6874967fbd71a7b33 *inst/include/Eigen/UmfPackSupport 8c58b327902c449a4c5b766f928d415d *inst/include/Eigen/src/Cholesky/LDLT.h c2ca41302cfd99797ab636105d4e4998 *inst/include/Eigen/src/Cholesky/LLT.h f59c7074c241180a9135878387652c64 *inst/include/Eigen/src/Cholesky/LLT_LAPACKE.h 75647444567fec25b0aa91ee708980d7 *inst/include/Eigen/src/CholmodSupport/CholmodSupport.h 3faa766a3f8465ee3f97763276ad3de0 *inst/include/Eigen/src/Core/ArithmeticSequence.h 1afdafdd1125921f96e01f55bfcce5f8 *inst/include/Eigen/src/Core/Array.h 73aeac050eca259eb57753f8a3d4dbdb *inst/include/Eigen/src/Core/ArrayBase.h fdccb4e49e0b7ab7e6ca96218b390573 *inst/include/Eigen/src/Core/ArrayWrapper.h ee87e13d58d537915bd0cbc466e59425 *inst/include/Eigen/src/Core/Assign.h 6f3db3cd818698c2c19e7334a25bad68 *inst/include/Eigen/src/Core/AssignEvaluator.h c67b385de56849e26fe94d7118fdbdce *inst/include/Eigen/src/Core/Assign_MKL.h 9951bfbd0ebb7388ff86ea488599a577 *inst/include/Eigen/src/Core/BandMatrix.h ca30591356b64fbcb37f72e4e0f4c8c0 *inst/include/Eigen/src/Core/Block.h ce3feecb621af9961e12507b93c78f64 *inst/include/Eigen/src/Core/BooleanRedux.h f12f1b11b61167132d75acdb4604fc85 *inst/include/Eigen/src/Core/CommaInitializer.h f8cd1f21019da8a5de7094b01055b5e2 *inst/include/Eigen/src/Core/ConditionEstimator.h cd264602b6e6d96fa79458364039e4dc *inst/include/Eigen/src/Core/CoreEvaluators.h 2170d1468014c409a911a8d75b08e101 *inst/include/Eigen/src/Core/CoreIterators.h ee3aaa93b6dafabb21b32970eda2c92b *inst/include/Eigen/src/Core/CwiseBinaryOp.h a61a85b707c1bfd91f7ec2aa73353eb9 *inst/include/Eigen/src/Core/CwiseNullaryOp.h d1b300f2db147c44e8efd3965b3e236a *inst/include/Eigen/src/Core/CwiseTernaryOp.h 7d3e4cdd6611853e42fcf7416414e677 *inst/include/Eigen/src/Core/CwiseUnaryOp.h 0a822e911e015093ae5e261a0b067e57 *inst/include/Eigen/src/Core/CwiseUnaryView.h e8eb1ff37805d18d6a26e4f821fe4099 *inst/include/Eigen/src/Core/DenseBase.h 6ce38277d5f5557b2e91d1f5059478a0 *inst/include/Eigen/src/Core/DenseCoeffsBase.h 332c4beb28938a85892ca249f9f71682 *inst/include/Eigen/src/Core/DenseStorage.h 4269cd5446dd3e1b75a7c1da1302b27c *inst/include/Eigen/src/Core/Diagonal.h d6df22ac21add88b37ce586795ddee59 *inst/include/Eigen/src/Core/DiagonalMatrix.h c98bc66c82e201fdf0d224b4b485454d *inst/include/Eigen/src/Core/DiagonalProduct.h bf80acc97cf60b9cf870554000522591 *inst/include/Eigen/src/Core/Dot.h b4c55dd945692ce504da3801a033e7d2 *inst/include/Eigen/src/Core/EigenBase.h 1991064bf2bfc62c08f74a77f7341488 *inst/include/Eigen/src/Core/ForceAlignedAccess.h aff434c5fb66db0698b41899c19c3cac *inst/include/Eigen/src/Core/Fuzzy.h 0f8d025a52ad45a6d95bad802bb1dc3c *inst/include/Eigen/src/Core/GeneralProduct.h 59e500a7ece6bb21359ee4480405f532 *inst/include/Eigen/src/Core/GenericPacketMath.h 179f3d915354a047342692cb3e5f045a *inst/include/Eigen/src/Core/GlobalFunctions.h a7c663a0183539b201679507000b5d71 *inst/include/Eigen/src/Core/IO.h dc24fe7822e71daf6606b8634d02d769 *inst/include/Eigen/src/Core/IndexedView.h d383b45610f5be5fe97085f7c94925c8 *inst/include/Eigen/src/Core/Inverse.h de260d365e311d39526b98cac59db29f *inst/include/Eigen/src/Core/Map.h cba37023ad729001bdc1f44aee8ddaf4 *inst/include/Eigen/src/Core/MapBase.h ad2bd2b246eed7dd0cc6afe0ef9caf80 *inst/include/Eigen/src/Core/MathFunctions.h 8e1afc569876c4a7c093e9c4c7078b44 *inst/include/Eigen/src/Core/MathFunctionsImpl.h 0622be8830768a011865cf92f28f815d *inst/include/Eigen/src/Core/Matrix.h 8c876a9c927aa3c0b86081d9ee1be2a6 *inst/include/Eigen/src/Core/MatrixBase.h 196d298b6e41f0cbe11615a7ae9c5c03 *inst/include/Eigen/src/Core/NestByValue.h 32bf81920e341e78e1a3a191f08d2911 *inst/include/Eigen/src/Core/NoAlias.h 4496eeb0b722906c6a7ae2fc2e6b52cd *inst/include/Eigen/src/Core/NumTraits.h 2e6f187c558a152de2ec80ba965f7952 *inst/include/Eigen/src/Core/PartialReduxEvaluator.h 0cd11cbf270eca32c38f912090b5f694 *inst/include/Eigen/src/Core/PermutationMatrix.h 69a4fcd14ddf17acbf51862df3a76248 *inst/include/Eigen/src/Core/PlainObjectBase.h 22564df60904e7991ebd0b8148a3b2cd *inst/include/Eigen/src/Core/Product.h 5c1e8be423fcfaf719cd0071bf0cada0 *inst/include/Eigen/src/Core/ProductEvaluators.h d98ef910498ce2862a36134065437180 *inst/include/Eigen/src/Core/Random.h 6805acca17aa2ee1a4415f99e1c5e2ad *inst/include/Eigen/src/Core/Redux.h 9fdd22aee2e1405dac11e7c89ef21ea2 *inst/include/Eigen/src/Core/Ref.h 1fe4a31f7d0ae07750f675fde8c50077 *inst/include/Eigen/src/Core/Replicate.h 666109da17a24240328399d3c83a52ad *inst/include/Eigen/src/Core/Reshaped.h 2aff31f4dd9ba727f663018b6fb64b15 *inst/include/Eigen/src/Core/ReturnByValue.h 6dd655897dd69f6dfe0c80a3f15d570d *inst/include/Eigen/src/Core/Reverse.h 380a770ebe58c88373bed39084352fa1 *inst/include/Eigen/src/Core/Select.h 92c84ac9f3d5f082087324f35fd43ca2 *inst/include/Eigen/src/Core/SelfAdjointView.h 897d2ce94b0417abf2123ce3733a8310 *inst/include/Eigen/src/Core/SelfCwiseBinaryOp.h a9138fe508e2c1d3ebde81ea10f346ff *inst/include/Eigen/src/Core/Solve.h 44fc8adb3344a7c66ec79740e6c73e42 *inst/include/Eigen/src/Core/SolveTriangular.h 1728669436f06d2217a1acbc25e0652d *inst/include/Eigen/src/Core/SolverBase.h 6ca07f4ff21a829a66d5f4961b563f41 *inst/include/Eigen/src/Core/StableNorm.h 8f84953a86a0c369dd30ab9ace48a4e4 *inst/include/Eigen/src/Core/StlIterators.h 434caa1c6d2cac678733bdcb56f87955 *inst/include/Eigen/src/Core/Stride.h 4e1eccdaee1bfcd6cc4a18fa83f79d07 *inst/include/Eigen/src/Core/Swap.h c5d3e32c95df0538050042ca5727e1c6 *inst/include/Eigen/src/Core/Transpose.h 7a707e09534882ef199ed88da23743c4 *inst/include/Eigen/src/Core/Transpositions.h 4b9d138e539834ecdddc8b86ce5eb3b2 *inst/include/Eigen/src/Core/TriangularMatrix.h c90e5017218d387169e1de632bafe989 *inst/include/Eigen/src/Core/VectorBlock.h c44a3977b15c671570a9c40f162c16ae *inst/include/Eigen/src/Core/VectorwiseOp.h e3949c304fd0f0b8d818f55d496592eb *inst/include/Eigen/src/Core/Visitor.h eccf708f1cc23919cba245c5f7b26730 *inst/include/Eigen/src/Core/arch/AVX/Complex.h bb62aaffbdaad0e79a35e1295649abeb *inst/include/Eigen/src/Core/arch/AVX/MathFunctions.h 1ef051a7272c6f962ddd46a06adf9b9a *inst/include/Eigen/src/Core/arch/AVX/PacketMath.h 26a07d215c4953e4f65df781c22ac6bd *inst/include/Eigen/src/Core/arch/AVX/TypeCasting.h d5f4d27595402694f2ad29417ccbf454 *inst/include/Eigen/src/Core/arch/AVX512/Complex.h 76adc7d1ac14e572b03170346f1b3a9e *inst/include/Eigen/src/Core/arch/AVX512/MathFunctions.h 3672c75c894e6d4972d01f6da6eca094 *inst/include/Eigen/src/Core/arch/AVX512/PacketMath.h 3b06eaf49c05a3674b06363001cb4ebf *inst/include/Eigen/src/Core/arch/AVX512/TypeCasting.h 57f01fb49d12aab575f3a1d9f9d27eef *inst/include/Eigen/src/Core/arch/AltiVec/Complex.h b1ff1c2373e425bbb5a1d74ca3603669 *inst/include/Eigen/src/Core/arch/AltiVec/MathFunctions.h 244915bdb5006bd4c323336b1f543954 *inst/include/Eigen/src/Core/arch/AltiVec/MatrixProduct.h 1ece1c8c380758be36bde88eefdf4d18 *inst/include/Eigen/src/Core/arch/AltiVec/MatrixProductCommon.h 2ffdf1cf62c4ef756f4ca2c5478ea916 *inst/include/Eigen/src/Core/arch/AltiVec/MatrixProductMMA.h 3f2b25fb5cc4cb83bc5287a78d8c7593 *inst/include/Eigen/src/Core/arch/AltiVec/PacketMath.h 48834752aafbf0c1eed78d591ea5ce43 *inst/include/Eigen/src/Core/arch/CUDA/Complex.h b125897b051b64f28a6ad4a24936d6b2 *inst/include/Eigen/src/Core/arch/Default/BFloat16.h e50eb2bf2d022bb3121deca2b32cecc8 *inst/include/Eigen/src/Core/arch/Default/ConjHelper.h 3973af46054bdab3588a455d07794c94 *inst/include/Eigen/src/Core/arch/Default/GenericPacketMathFunctions.h 0b153cd08b1e89c4872863687f915560 *inst/include/Eigen/src/Core/arch/Default/GenericPacketMathFunctionsFwd.h b768042c101bf7279ca54ccbe198ce74 *inst/include/Eigen/src/Core/arch/Default/Half.h 1f5048f878e6a99bb02a552b5d64acc6 *inst/include/Eigen/src/Core/arch/Default/Settings.h dbdb97c9e0db96cccc275a9e5d0cedaa *inst/include/Eigen/src/Core/arch/Default/TypeCasting.h 331f06b2a86d9a7e397638641b52f96e *inst/include/Eigen/src/Core/arch/GPU/MathFunctions.h 4a178bdb2141cf9df1e91482ebc62bfc *inst/include/Eigen/src/Core/arch/GPU/PacketMath.h 655f92633b865e13c43b3232d6dcbde1 *inst/include/Eigen/src/Core/arch/GPU/TypeCasting.h 465706ce6839f39d71ce56515627d14d *inst/include/Eigen/src/Core/arch/HIP/hcc/math_constants.h 837da7ecef71c2b87f92226aaa1ee0d6 *inst/include/Eigen/src/Core/arch/MSA/Complex.h 120f702e71eca10ac75ebb5995530a9f *inst/include/Eigen/src/Core/arch/MSA/MathFunctions.h 4ab06a761f8ac6a64ea71d700c859e7f *inst/include/Eigen/src/Core/arch/MSA/PacketMath.h 49542657bc09b2a1e702dec391a79aab *inst/include/Eigen/src/Core/arch/NEON/Complex.h 5f156a8c03925d154882b995722d6863 *inst/include/Eigen/src/Core/arch/NEON/GeneralBlockPanelKernel.h 272c6d3940697697b599a73cc2813533 *inst/include/Eigen/src/Core/arch/NEON/MathFunctions.h 906be4dd71b459ddbc93e1ab4a418c72 *inst/include/Eigen/src/Core/arch/NEON/PacketMath.h 67a19bf44a9e155ec745307c7a2ef537 *inst/include/Eigen/src/Core/arch/NEON/TypeCasting.h 8b516b2428b1cc51620837772ca4e06e *inst/include/Eigen/src/Core/arch/SSE/Complex.h ac0dfbd09583027880e951536f42e09b *inst/include/Eigen/src/Core/arch/SSE/MathFunctions.h 53fae523e6473b2b427ca3b06eff8588 *inst/include/Eigen/src/Core/arch/SSE/PacketMath.h 288defcacab8be5efb5de016099ce69c *inst/include/Eigen/src/Core/arch/SSE/TypeCasting.h 0abdec9e38825f2994a5b04cc658e050 *inst/include/Eigen/src/Core/arch/SVE/MathFunctions.h 3078718a350c99baa7266e53715076eb *inst/include/Eigen/src/Core/arch/SVE/PacketMath.h 73eab75c1c0a1c209bdf90b27b0d0367 *inst/include/Eigen/src/Core/arch/SVE/TypeCasting.h 9494c07dcfe0b91c52ff975314ef21db *inst/include/Eigen/src/Core/arch/SYCL/InteropHeaders.h 29d2c00fcc5424eeb836cecc258a06cc *inst/include/Eigen/src/Core/arch/SYCL/MathFunctions.h 55976f063f0bc6c03a9e4fcc47282fd1 *inst/include/Eigen/src/Core/arch/SYCL/PacketMath.h c9ae7e85fcb856c7a8d9f66cb2cce4fc *inst/include/Eigen/src/Core/arch/SYCL/SyclMemoryModel.h 2b42e5034c211881e453fdb5acabde12 *inst/include/Eigen/src/Core/arch/SYCL/TypeCasting.h 36c5dc8e86d6332dcc1cefa8224cdbf7 *inst/include/Eigen/src/Core/arch/ZVector/Complex.h 875f0c30c8ab7a39fbc03238929354ec *inst/include/Eigen/src/Core/arch/ZVector/MathFunctions.h 7bc05c9230c129ca8dda453d5f0a3170 *inst/include/Eigen/src/Core/arch/ZVector/PacketMath.h 3173559aeda799ea7110b0b395f020b4 *inst/include/Eigen/src/Core/functors/AssignmentFunctors.h 26fdd4b0e1f992a264af84429e92ad2d *inst/include/Eigen/src/Core/functors/BinaryFunctors.h 887548a06d7e5cc00d6874474eb80501 *inst/include/Eigen/src/Core/functors/NullaryFunctors.h 5707388b8989c2ef61207097280a4f6e *inst/include/Eigen/src/Core/functors/StlFunctors.h bcde622f6e8f3fa6a22feccb4907518f *inst/include/Eigen/src/Core/functors/TernaryFunctors.h 35eef13d14f9fb343215728840bc643a *inst/include/Eigen/src/Core/functors/UnaryFunctors.h 6f3af820c5dd67d16dfed3285be0fd57 *inst/include/Eigen/src/Core/products/GeneralBlockPanelKernel.h 462bfeb265eabe911224d5ec0506736d *inst/include/Eigen/src/Core/products/GeneralMatrixMatrix.h 65cebdb846d5c9af2d5bb38f1b0f2745 *inst/include/Eigen/src/Core/products/GeneralMatrixMatrixTriangular.h c05ed3f002a96a84639364e8b0163e92 *inst/include/Eigen/src/Core/products/GeneralMatrixMatrixTriangular_BLAS.h ba2f515e6a5b7b480d2b30fc6993668a *inst/include/Eigen/src/Core/products/GeneralMatrixMatrix_BLAS.h 517e666638d03f2c95a7a3cef9b623b8 *inst/include/Eigen/src/Core/products/GeneralMatrixVector.h cdf5c4721b372237ca45c6792d497fa4 *inst/include/Eigen/src/Core/products/GeneralMatrixVector_BLAS.h ef62740fc4b14576f4ddeecf7b89886e *inst/include/Eigen/src/Core/products/Parallelizer.h 48090a4f6107df8d103747624e7b4449 *inst/include/Eigen/src/Core/products/SelfadjointMatrixMatrix.h 4387b005442d80b4d6d6f8b97ed19a46 *inst/include/Eigen/src/Core/products/SelfadjointMatrixMatrix_BLAS.h f53417133b66fe42c4ed4f8fd04e84f8 *inst/include/Eigen/src/Core/products/SelfadjointMatrixVector.h b3c4de4cc9058b7d198c9573b1825112 *inst/include/Eigen/src/Core/products/SelfadjointMatrixVector_BLAS.h 72ab2c1e9288bf204fa2c38277df19ec *inst/include/Eigen/src/Core/products/SelfadjointProduct.h 2f82a286df83e9d56a58d27d7a175241 *inst/include/Eigen/src/Core/products/SelfadjointRank2Update.h 2b64023bcb0d4ba24e0545ddf23c05ea *inst/include/Eigen/src/Core/products/TriangularMatrixMatrix.h 16750354416672507c557117b18adc91 *inst/include/Eigen/src/Core/products/TriangularMatrixMatrix_BLAS.h 40e9483b9c1cd7614cce6fe433cf37f5 *inst/include/Eigen/src/Core/products/TriangularMatrixVector.h 2b3c7391c827c88987c76f199d84c839 *inst/include/Eigen/src/Core/products/TriangularMatrixVector_BLAS.h d84f957ec602fd796856a227eaab8196 *inst/include/Eigen/src/Core/products/TriangularSolverMatrix.h 8b85110f876cb655ce064e1d73040a97 *inst/include/Eigen/src/Core/products/TriangularSolverMatrix_BLAS.h d9efe059a2f494eeb743dd3b5dce326d *inst/include/Eigen/src/Core/products/TriangularSolverVector.h 070a882fde6ab2a7ccf944fbd53c07ba *inst/include/Eigen/src/Core/util/BlasUtil.h 54c8950e010de8b8e2e6d976ff327e71 *inst/include/Eigen/src/Core/util/ConfigureVectorization.h 524daabe8b0ad6b04157d460b45dc71c *inst/include/Eigen/src/Core/util/Constants.h d653b434e6e09d354cf77f131d38ba1c *inst/include/Eigen/src/Core/util/DisableStupidWarnings.h 5321f87825811b679ba086d119636a79 *inst/include/Eigen/src/Core/util/ForwardDeclarations.h e8a432ad83dd1e332510fe10e2091b33 *inst/include/Eigen/src/Core/util/IndexedViewHelper.h 7680c8bd0ee18e19577b1b0c7380eda4 *inst/include/Eigen/src/Core/util/IntegralConstant.h 18aae1e5c1824fdd615691e36584195f *inst/include/Eigen/src/Core/util/MKL_support.h 35ca1e12bda85d53a1622684f452cb0f *inst/include/Eigen/src/Core/util/Macros.h 8a980fda9472eee25b88fe8bdf3f47df *inst/include/Eigen/src/Core/util/Memory.h 5091d09f469f1105e8e0b90a03f1e1f3 *inst/include/Eigen/src/Core/util/Meta.h c4ca2e16c302c71df98732240109e7e0 *inst/include/Eigen/src/Core/util/NonMPL2.h be45859b38e024d8c0823bf6e293f22d *inst/include/Eigen/src/Core/util/ReenableStupidWarnings.h 58b03232b4fa1cc3f2538e9cf6290f6c *inst/include/Eigen/src/Core/util/ReshapedHelper.h 3b585e1b5e5da798cafae55fc6d4cb26 *inst/include/Eigen/src/Core/util/StaticAssert.h 90b5ebe2dd38c421f22f0e4346220128 *inst/include/Eigen/src/Core/util/SymbolicIndex.h 32cd41fd1ed3bd3fd615f9ed56b0e1e8 *inst/include/Eigen/src/Core/util/XprHelper.h 30055c308d945119bc025b0811b0f83a *inst/include/Eigen/src/Eigenvalues/ComplexEigenSolver.h af23a6147de0154039ea815d3ed9ac81 *inst/include/Eigen/src/Eigenvalues/ComplexSchur.h 2555626edfca98529090a0997c336abf *inst/include/Eigen/src/Eigenvalues/ComplexSchur_LAPACKE.h baa22346cdf27a59648055aa445edbfe *inst/include/Eigen/src/Eigenvalues/EigenSolver.h b81925277584de764f8d31b0b02fc584 *inst/include/Eigen/src/Eigenvalues/GeneralizedEigenSolver.h b5b7fdfc0837ab65092a36e9861289e9 *inst/include/Eigen/src/Eigenvalues/GeneralizedSelfAdjointEigenSolver.h b29bd73a0fdb3acaa55b90057846bbd0 *inst/include/Eigen/src/Eigenvalues/HessenbergDecomposition.h 89ee9085ead2ae20e35b181408377f83 *inst/include/Eigen/src/Eigenvalues/MatrixBaseEigenvalues.h 544b0af45362ef8848955ff6e6cd4118 *inst/include/Eigen/src/Eigenvalues/RealQZ.h 5624e33cbaa28ee63c46dd7ae764ca22 *inst/include/Eigen/src/Eigenvalues/RealSchur.h 10c5af47ddd5e16ccb7a320dcd600492 *inst/include/Eigen/src/Eigenvalues/RealSchur_LAPACKE.h f5a29f0450d35f396c9001818a6ed431 *inst/include/Eigen/src/Eigenvalues/SelfAdjointEigenSolver.h eaca77e68a4289a23230708222d4d39c *inst/include/Eigen/src/Eigenvalues/SelfAdjointEigenSolver_LAPACKE.h f279721c98dc0136651ee180cdc83864 *inst/include/Eigen/src/Eigenvalues/Tridiagonalization.h df9023c12bca691bf96ccd1f6aa46787 *inst/include/Eigen/src/Geometry/AlignedBox.h a68f17dafd98a9e53d09c2f2cf9a26c7 *inst/include/Eigen/src/Geometry/AngleAxis.h a6a60dec0043515f0fb9e01708a500cb *inst/include/Eigen/src/Geometry/EulerAngles.h 0aeaf85793e7a6d1bec92e755a2228c3 *inst/include/Eigen/src/Geometry/Homogeneous.h 8f73fb2f90452392f75687a82c50d6d3 *inst/include/Eigen/src/Geometry/Hyperplane.h 385c034ba5489035f82962e9900604a0 *inst/include/Eigen/src/Geometry/OrthoMethods.h 6a89a934a1907e21785445aefb6676f4 *inst/include/Eigen/src/Geometry/ParametrizedLine.h 6167d1c4fe02187fce8711029b4300e5 *inst/include/Eigen/src/Geometry/Quaternion.h 9761fa57781c85a08a91d5add80591a7 *inst/include/Eigen/src/Geometry/Rotation2D.h ac623d0fac2a116d9c06b23c8980d0c7 *inst/include/Eigen/src/Geometry/RotationBase.h a2ca635a91c609b7cc04b9f93fb63557 *inst/include/Eigen/src/Geometry/Scaling.h 945a726ad581c2094e9c2df50a0b35b0 *inst/include/Eigen/src/Geometry/Transform.h 1d901c1f895f7deeb3064e2fa40797d9 *inst/include/Eigen/src/Geometry/Translation.h 96e84c0d16e8ab0b86a30a5dc0170a8d *inst/include/Eigen/src/Geometry/Umeyama.h 4a1990f7e43fc51601dc5b45d3588aa9 *inst/include/Eigen/src/Geometry/arch/Geometry_SIMD.h e616b6c79f8585000c341130304fb505 *inst/include/Eigen/src/Householder/BlockHouseholder.h eafe0af2a8a3f6d31c7a9e9289db091c *inst/include/Eigen/src/Householder/Householder.h 9964e4c6105e11ea30800379a9c9c42c *inst/include/Eigen/src/Householder/HouseholderSequence.h bcecbad4f12eadff9c4a8c9e38598238 *inst/include/Eigen/src/IterativeLinearSolvers/BasicPreconditioners.h e17a6edbfcdad1e6dc347a1f07ca8378 *inst/include/Eigen/src/IterativeLinearSolvers/BiCGSTAB.h 6d26e8ed13906e3ee69a8a194630d795 *inst/include/Eigen/src/IterativeLinearSolvers/ConjugateGradient.h b8cbc38f82fea3b5f847c4e0ab5275ea *inst/include/Eigen/src/IterativeLinearSolvers/IncompleteCholesky.h 750ce867811ca64e5c99a9969b42755a *inst/include/Eigen/src/IterativeLinearSolvers/IncompleteLUT.h 03227d1d680fb5e1d36105aca9ac2433 *inst/include/Eigen/src/IterativeLinearSolvers/IterativeSolverBase.h ef29a7509ad32bdfb424e93f1f27c2ac *inst/include/Eigen/src/IterativeLinearSolvers/LeastSquareConjugateGradient.h 4c6cebdd36c8399f03fea7d9fedb268a *inst/include/Eigen/src/IterativeLinearSolvers/SolveWithGuess.h e6e33db6c0e9404873f5c4f0cf35affa *inst/include/Eigen/src/Jacobi/Jacobi.h 03f84ae4be7cf89985c79fb0bcc05fbe *inst/include/Eigen/src/KLUSupport/KLUSupport.h 7ffff43e112f6635eb0c455cf39822f1 *inst/include/Eigen/src/LU/Determinant.h 5bd14d969773d1d48178b80d986b2872 *inst/include/Eigen/src/LU/FullPivLU.h 3f2fc7624ca164d4f6884e300e35e37b *inst/include/Eigen/src/LU/InverseImpl.h 2db64e29d2ab949f9e91ee931050d96d *inst/include/Eigen/src/LU/PartialPivLU.h 9e5f31822442bfdb277a0f78b183306f *inst/include/Eigen/src/LU/PartialPivLU_LAPACKE.h c90df35abb13baa4d77cbc2d960b64ab *inst/include/Eigen/src/LU/arch/InverseSize4.h 5dbb0113de10441d8d1f996fb9cd085e *inst/include/Eigen/src/MetisSupport/MetisSupport.h 8de296e44450afd05c27d56e1b9601fd *inst/include/Eigen/src/OrderingMethods/Amd.h 665b2e5eddcb2cc874b9ebc373899d91 *inst/include/Eigen/src/OrderingMethods/Eigen_Colamd.h d72f08e34d076a79587d7ec20dcccb6d *inst/include/Eigen/src/OrderingMethods/Ordering.h 88e3776a970754fa5cc7f38160bd725a *inst/include/Eigen/src/PaStiXSupport/PaStiXSupport.h eccb80bd7c73941aa0ce3bf85736db46 *inst/include/Eigen/src/PardisoSupport/PardisoSupport.h 01312cbe66424f2377dc45fecb0b155b *inst/include/Eigen/src/QR/ColPivHouseholderQR.h 5549163935d111d6ea09c19bb848b696 *inst/include/Eigen/src/QR/ColPivHouseholderQR_LAPACKE.h c2bc6c679aa91f0d0e0e926bea74461c *inst/include/Eigen/src/QR/CompleteOrthogonalDecomposition.h 39d8d91a9197583b3bd08b9f907b66b9 *inst/include/Eigen/src/QR/FullPivHouseholderQR.h 5ee87087db6ee3b64ac05b54424e38c0 *inst/include/Eigen/src/QR/HouseholderQR.h 5a6fb56c5132fd49451bd9c2da752833 *inst/include/Eigen/src/QR/HouseholderQR_LAPACKE.h fca07660088d80bae414808ff49e8553 *inst/include/Eigen/src/SPQRSupport/SuiteSparseQRSupport.h 1f1de7a316f6deefc12ef268353c33a1 *inst/include/Eigen/src/SVD/BDCSVD.h 87c2dc197df35a2d43d76e5a11bbd967 *inst/include/Eigen/src/SVD/JacobiSVD.h 80266065e69cd87a54ab72cf61fd8c75 *inst/include/Eigen/src/SVD/JacobiSVD_LAPACKE.h fd337c9be2444b6da9d69dcaffbb7b14 *inst/include/Eigen/src/SVD/SVDBase.h 30588fee4028392fada31352f75aadfc *inst/include/Eigen/src/SVD/UpperBidiagonalization.h ee59e519e968229b1ccbabf532ca2cac *inst/include/Eigen/src/SparseCholesky/SimplicialCholesky.h c1876efaf959f6a7f66d0bf8bd572eee *inst/include/Eigen/src/SparseCholesky/SimplicialCholesky_impl.h c2a3e3e4dca66c99172c172cafcbeffc *inst/include/Eigen/src/SparseCore/AmbiVector.h f0b3947776d1f81fceb142e77f4cc96a *inst/include/Eigen/src/SparseCore/CompressedStorage.h eb0e2921618f7a0d03df1b8e56cfd6e9 *inst/include/Eigen/src/SparseCore/ConservativeSparseSparseProduct.h 0884d5b081e7f51dc171dd1800202384 *inst/include/Eigen/src/SparseCore/MappedSparseMatrix.h c97932af3160f16b0952d0a0b064e73a *inst/include/Eigen/src/SparseCore/SparseAssign.h d2517b9b4359dffcb5d72ad1f90e36b1 *inst/include/Eigen/src/SparseCore/SparseBlock.h b2f2001a8c8a897cffd4bdab6a633af0 *inst/include/Eigen/src/SparseCore/SparseColEtree.h 4f9df1f9f74d642abcae0fefb9d80c33 *inst/include/Eigen/src/SparseCore/SparseCompressedBase.h d02bdcfbacec3837b3b38702e6ee23ad *inst/include/Eigen/src/SparseCore/SparseCwiseBinaryOp.h 50b39b36aa74a4e8bb575769714a782d *inst/include/Eigen/src/SparseCore/SparseCwiseUnaryOp.h 51daae3ce550d9acbf4ac00a53c5f41a *inst/include/Eigen/src/SparseCore/SparseDenseProduct.h a5b0bea9c6fba7de90d0d9d13ccc1ba8 *inst/include/Eigen/src/SparseCore/SparseDiagonalProduct.h 70c41e25cddfc6f434c3b538f1408107 *inst/include/Eigen/src/SparseCore/SparseDot.h 3286f132a4926940c987390f47be3d7a *inst/include/Eigen/src/SparseCore/SparseFuzzy.h df51c9241f0b09618e91429df9152562 *inst/include/Eigen/src/SparseCore/SparseMap.h 44a3e99834d97901640edf78f7a2dd27 *inst/include/Eigen/src/SparseCore/SparseMatrix.h b969205a8ee249ff6630ed45bf98ab73 *inst/include/Eigen/src/SparseCore/SparseMatrixBase.h fbc26e89dfb89c64081ce2ce645d399c *inst/include/Eigen/src/SparseCore/SparsePermutation.h c9f504ba9d93a52833f8d33e499d4de0 *inst/include/Eigen/src/SparseCore/SparseProduct.h a158561b3c68f135e55a18df9a07e52c *inst/include/Eigen/src/SparseCore/SparseRedux.h cdc958c4134408cfba40648faa92b900 *inst/include/Eigen/src/SparseCore/SparseRef.h 97f7b4a6934f9c75bf782f4c849d7cc3 *inst/include/Eigen/src/SparseCore/SparseSelfAdjointView.h c97f6e2a1e4958ad9e0be90050bcdcfc *inst/include/Eigen/src/SparseCore/SparseSolverBase.h a755b2ac228e2eb0a0fa97773100df3f *inst/include/Eigen/src/SparseCore/SparseSparseProductWithPruning.h f79edcfc05ae044bd9c3ff584b0a8951 *inst/include/Eigen/src/SparseCore/SparseTranspose.h b0cd41339687152b78311edce045fece *inst/include/Eigen/src/SparseCore/SparseTriangularView.h 21ef1846ac0f10b18eb5b25a7853c996 *inst/include/Eigen/src/SparseCore/SparseUtil.h fa560763cd7a5bb20d5878698a4c6243 *inst/include/Eigen/src/SparseCore/SparseVector.h 2bd75163656dd29db5bab551f6890913 *inst/include/Eigen/src/SparseCore/SparseView.h 1ba1ec01dcd995c58e4b4c0f08c80c97 *inst/include/Eigen/src/SparseCore/TriangularSolver.h 0e066407863da1f2b319925190dbcf39 *inst/include/Eigen/src/SparseLU/SparseLU.h cbfff2f704bcf8ca688daf09ff50947c *inst/include/Eigen/src/SparseLU/SparseLUImpl.h 6d0afb3241a3d73846d164c342954b29 *inst/include/Eigen/src/SparseLU/SparseLU_Memory.h 3d4309877f3eb4325af2643aab1e23a9 *inst/include/Eigen/src/SparseLU/SparseLU_Structs.h 706ae5c3d9293e71ab54729ce6d998d8 *inst/include/Eigen/src/SparseLU/SparseLU_SupernodalMatrix.h 8aaa285321fd675fea2b4d31754e0f95 *inst/include/Eigen/src/SparseLU/SparseLU_Utils.h 5dff2c3c59cb15e6df2da1b0e66145a7 *inst/include/Eigen/src/SparseLU/SparseLU_column_bmod.h 30dab09f618e660469bfa3ee64adad41 *inst/include/Eigen/src/SparseLU/SparseLU_column_dfs.h 551299f37e2ee9640ff95769358f79b2 *inst/include/Eigen/src/SparseLU/SparseLU_copy_to_ucol.h 594d09438b2bada027b73e9bd734944a *inst/include/Eigen/src/SparseLU/SparseLU_gemm_kernel.h e3c46328e66d4f410da2474155a9874f *inst/include/Eigen/src/SparseLU/SparseLU_heap_relax_snode.h c1e457785aca12c411e4d4f1604982fb *inst/include/Eigen/src/SparseLU/SparseLU_kernel_bmod.h 28a13019f675d1221e1743af48382a10 *inst/include/Eigen/src/SparseLU/SparseLU_panel_bmod.h 76b774de3134b8136b8b869f2c18a9b5 *inst/include/Eigen/src/SparseLU/SparseLU_panel_dfs.h e40b8ca24a6f9a1cd81995818acff05a *inst/include/Eigen/src/SparseLU/SparseLU_pivotL.h 98a63ced95b43ba2ef87d654555d6a11 *inst/include/Eigen/src/SparseLU/SparseLU_pruneL.h cfba7ff37fee21d1a94d8b5e86d06be0 *inst/include/Eigen/src/SparseLU/SparseLU_relax_snode.h 37c8bddfc7afd91acfb2377195237725 *inst/include/Eigen/src/SparseQR/SparseQR.h 3ed7e7cc49cc30d11ddffe2140c35f11 *inst/include/Eigen/src/StlSupport/StdDeque.h 5f37a7a9ca8db822562aee07b9f0e91a *inst/include/Eigen/src/StlSupport/StdList.h a18c9208acd9db9ecc9643eebe051c74 *inst/include/Eigen/src/StlSupport/StdVector.h 4ca2b8ce970639b16e3607774bd0e0c1 *inst/include/Eigen/src/StlSupport/details.h 64f57bfac851f3a8f748a0283abc2688 *inst/include/Eigen/src/SuperLUSupport/SuperLUSupport.h 883329bb82305c48528a1084a43ff399 *inst/include/Eigen/src/UmfPackSupport/UmfPackSupport.h c850487641ece48cb76f0ed6eb162b55 *inst/include/Eigen/src/misc/Image.h 204aec7b6cbd46a40fecb67549751f32 *inst/include/Eigen/src/misc/Kernel.h 62b9105c4ceb13dd8f7747adb6019a3a *inst/include/Eigen/src/misc/RealSvd2x2.h 5c3a7f49587dc7962f1d6a3905ab75eb *inst/include/Eigen/src/misc/blas.h 6b21f68118e06a562cf64f3e385d5e0d *inst/include/Eigen/src/misc/lapack.h a048ab6772f93f28d18bfef8f98d0f7d *inst/include/Eigen/src/misc/lapacke.h 78a253bca660904aae9b45474fbc88bb *inst/include/Eigen/src/misc/lapacke_mangling.h 42429baaf47ae7b289ebe6889baefaa6 *inst/include/Eigen/src/plugins/ArrayCwiseBinaryOps.h 6deec1ad021076ea6ab50d8455f9305e *inst/include/Eigen/src/plugins/ArrayCwiseUnaryOps.h 85e04e86e084a6ab310701e08ef738db *inst/include/Eigen/src/plugins/BlockMethods.h 0a4e2987ce97a2988f02e202d8d721dd *inst/include/Eigen/src/plugins/CommonCwiseBinaryOps.h 31b1395d3465cb91f22f4abd59e10ff2 *inst/include/Eigen/src/plugins/CommonCwiseUnaryOps.h 04321efbca30d74a62e9f793ce76ee35 *inst/include/Eigen/src/plugins/IndexedViewMethods.h 57a3069a059090096d36532f1c7f5be1 *inst/include/Eigen/src/plugins/MatrixCwiseBinaryOps.h c2c96628d2be989f2809f8d90919c483 *inst/include/Eigen/src/plugins/MatrixCwiseUnaryOps.h 029331e21a26ca36f285500c6e7c0d2c *inst/include/Eigen/src/plugins/ReshapedMethods.h ce0bc155c278e78c32ae2ba83875dc8c *inst/include/RcppEigen.h 867e96baa1693665406ff7722386f2c1 *inst/include/RcppEigenCholmod.h af3eafe4bdce7a2a731af54f076d78b1 *inst/include/RcppEigenForward.h 2ff59c7f138b4e564e77c59f140761ca *inst/include/RcppEigenStubs.cpp 6fe9907c0c098413251d6aa4d0331d3e *inst/include/RcppEigenWrap.h 24ded1f60f4ce0ed3818f66cadf5adc8 *inst/include/unsupported/Eigen/AdolcForward e99e523bbb6c2dc85615f392a568627f *inst/include/unsupported/Eigen/AlignedVector3 e961d90876b34128336eb25e9344705a *inst/include/unsupported/Eigen/ArpackSupport f9b4acb68a0603f67222c3e48cd64b57 *inst/include/unsupported/Eigen/AutoDiff 2ab4f5b6ca6547bded20a5b7859e56f1 *inst/include/unsupported/Eigen/BVH 9898870dac5a80296ac469ea9ebfd3cb *inst/include/unsupported/Eigen/CXX11/Tensor 2e55a9ee72ff54c6b84b86d9c456a7ce *inst/include/unsupported/Eigen/CXX11/TensorSymmetry c8ca91a3a4623e4ef3d2770631826e61 *inst/include/unsupported/Eigen/CXX11/ThreadPool 72834521c8c429d3e59be510cef34c13 *inst/include/unsupported/Eigen/CXX11/src/Tensor/README.md 2ad2419cbab9f2db679d5c0e286ae4e4 *inst/include/unsupported/Eigen/CXX11/src/Tensor/Tensor.h b20dd2bcb79295ca469d46a6f8eae872 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorArgMax.h e9f80fbef1898da59f0dc9f8af0ae4b5 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h 50fb858dd166f85773bc84eaf4ce2a8d *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorBase.h 341d36c4e71b4cbaf70de16d11127738 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorBlock.h d22ed87f0d91893e4498a7a67bf5eb9e *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorBroadcasting.h 3059c9c6732e156926c22f1d42018cab *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorChipping.h ebe0fb5ae5f6116dd46e1786cc973f01 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorConcatenation.h baad1e7051fc5ff4d558a0273c860a43 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorContraction.h 9605c70ee69b8fe6dc9eee591ee34a77 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorContractionBlocking.h d8a47e6835de083022a09cb80b5ecf7c *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h c0981d2f7c139348facd4b84b4acbfb7 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorContractionGpu.h 1b50fa39db16999e88f4a42177082c75 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorContractionMapper.h c7a184b1ff0c2eea46b5ee4ba3a72570 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorContractionSycl.h f55836eb433eb9c52770af5b1e28d126 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorContractionThreadPool.h d9e9a66f6b5a3f582a584a2cbba463fb *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorConversion.h c43f7bf1924a1e49c73e67254d1e62ad *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorConvolution.h 92551b80060266683d82d9eaedca0a63 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorConvolutionSycl.h b84589099123afb4e557456325dc1876 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorCostModel.h fad761ed911a3cadd3b0d6f56be3ed2b *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorCustomOp.h 59a83feb1475732537ea3a14999e5b13 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorDevice.h e9dab2e49bec95a42e5de9fd439465bf *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorDeviceCuda.h 753de1d90165c4d2e20bc941a2b2b9a8 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorDeviceDefault.h b35cb44ff9d1391af5639974ba177d0b *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorDeviceGpu.h be0fc410f8644f0a958c4fb38ca4e10c *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorDeviceSycl.h cd1265f55d196f3752f2e2d1d294bd03 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorDeviceThreadPool.h 857b647f53809b09ad1c899a708d36b5 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorDimensionList.h 49b0a87a14f3f7b2133bef53a9c3b228 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorDimensions.h 59d5e9d341db08f4c239eab7c17e9612 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorEvalTo.h aa7891f16542853daa328070347199a6 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorEvaluator.h bf10253eca39f1226e1f870716af5881 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h b0d35804998980bbcae849f460908c6c *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorExpr.h fecc8bfce4334f9ed5ec3f074b58b869 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorFFT.h 5afe7b5622e1a44a0114854b07af2ed7 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorFixedSize.h acdb14779e7fe7d2554fa4eb5128e53a *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorForcedEval.h 0f1b2555a9eb2f112f02e7368a715273 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorForwardDeclarations.h b2a3f7c4b2b5c25a3f25fe36bb7a893e *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorFunctors.h ce1a927a7639b74066444665c31471a0 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorGenerator.h 54d3a782295491a6eccbe0eeb372ea7b *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorGlobalFunctions.h 283c86d12a861d4bf771bbacbd976044 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorGpuHipCudaDefines.h f683315262a2d02680801567ad4c4922 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorGpuHipCudaUndefines.h 758cdf0ded97dde946991d7defb652c4 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorIO.h 6acb57fbae8fcf48c79dea5ace827b87 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorImagePatch.h 34d78d2dbd9e9d21673f1ca14bfc53f2 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorIndexList.h 33bd3d5b7968992c565992a8e544be35 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorInflation.h cb90de121809bc1942168c1ceec0d27f *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorInitializer.h aa2e92edf6e9bed3c9c9b1e2bf5b1d2b *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorIntDiv.h 2c5c07bb910c8da488cf23017cfa59d4 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorLayoutSwap.h 2d517757ba82dc213b90aa035870983b *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorMacros.h 5ccfa8a1d9c554c164ba1ea19b5bc650 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorMap.h e63919a1c6e5e84190d020c01751a67c *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorMeta.h 3305a0b609ae2c46755182fbb1fa1caf *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorMorphing.h 5694ccb891d11c4bd3be9e6df6e69f03 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorPadding.h 29d15195b83f13e267daa115d1977648 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorPatch.h 2b43d06e57e732602ca8335da5331262 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorRandom.h cb63b27f17492e5b08679304b392597c *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h 4c282ab2f0237c96f1976535b6d22852 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorReductionCuda.h d0400712aaeede05b108a1f7ee67a63a *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorReductionGpu.h fb4e175446163f5df3d821979ca27f32 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorReductionSycl.h 1b2df14e0f7ff8c8a2f28a782c60ebc8 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorRef.h fda02c0b40acef943a0f9257e195a4cd *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorReverse.h 8bb7e61b5da97fe6e65f72cdf76b70cb *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorScan.h 72dc6d3c8694b68013ba6e19c2dec369 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorScanSycl.h d43b96ac8a1744dd1fae1b930fd2c135 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorShuffling.h c2eab785c9222f6a955377f8df858254 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorStorage.h 40a277a774380ff50e9058a69e4afca8 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorStriding.h 9c5ebca86272b2c3a05fb7153e4c023f *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorTrace.h 34d4213eb0a18e72b582286583f6e537 *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorTraits.h db02acae1da6d2d06c666d6c22b4740f *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorUInt128.h ccc96193cbb99ac0cdf13e9e36464b1a *inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorVolumePatch.h 6f54705b485291c4673e3a00071422c2 *inst/include/unsupported/Eigen/CXX11/src/TensorSymmetry/DynamicSymmetry.h 984e6e1291e9fe783dd226c8754288af *inst/include/unsupported/Eigen/CXX11/src/TensorSymmetry/StaticSymmetry.h 605ff85dc24af5c998a19d7bfe49b583 *inst/include/unsupported/Eigen/CXX11/src/TensorSymmetry/Symmetry.h 22b36025d9ecfd6e85038a4303a07f86 *inst/include/unsupported/Eigen/CXX11/src/TensorSymmetry/util/TemplateGroupTheory.h 1c6baa178d843ec8960212a46581f084 *inst/include/unsupported/Eigen/CXX11/src/ThreadPool/Barrier.h 34d22af36853a3c71ce3da3f3e0d565d *inst/include/unsupported/Eigen/CXX11/src/ThreadPool/EventCount.h e60ab3f4920ffdfe2497c337ae04a396 *inst/include/unsupported/Eigen/CXX11/src/ThreadPool/NonBlockingThreadPool.h 99bd359fc00e1e160eea19a121d840ef *inst/include/unsupported/Eigen/CXX11/src/ThreadPool/RunQueue.h 99c1519739630dc91690432451033ee8 *inst/include/unsupported/Eigen/CXX11/src/ThreadPool/ThreadCancel.h 77ba70869910de96c93fd4b535526834 *inst/include/unsupported/Eigen/CXX11/src/ThreadPool/ThreadEnvironment.h 4a673fd804c8ad4519c9b7676eed6b51 *inst/include/unsupported/Eigen/CXX11/src/ThreadPool/ThreadLocal.h be92216cff4a7978a175f0e7374bf03a *inst/include/unsupported/Eigen/CXX11/src/ThreadPool/ThreadPoolInterface.h 265bd1619a45db5e1aa7b41c66dcab53 *inst/include/unsupported/Eigen/CXX11/src/ThreadPool/ThreadYield.h eec720d2717d77abd9be8fe98f871d3d *inst/include/unsupported/Eigen/CXX11/src/util/CXX11Meta.h 9ec42bffe00578aceca970344a507864 *inst/include/unsupported/Eigen/CXX11/src/util/CXX11Workarounds.h 1e12f04d429aa7539da4e08d1b105753 *inst/include/unsupported/Eigen/CXX11/src/util/EmulateArray.h 2106c71ab7c12b1df8f21cd7de91900e *inst/include/unsupported/Eigen/CXX11/src/util/MaxSizeVector.h e1aad13f436794eb03abbf9cb9c8f329 *inst/include/unsupported/Eigen/EulerAngles 54eed7040f45334ac17e8ce52052657f *inst/include/unsupported/Eigen/FFT 900031a26ead4c754ca9a5ec4fb4fd60 *inst/include/unsupported/Eigen/IterativeSolvers 2ec3b8ae1f6f8cf0c06cdd26e4c92b19 *inst/include/unsupported/Eigen/KroneckerProduct 7db524ba7997a0e026986947a7681485 *inst/include/unsupported/Eigen/LevenbergMarquardt fe82f55590ef4e7e8746d42849b36c16 *inst/include/unsupported/Eigen/MPRealSupport 7d203b31fdc9e2a16ef3a84c857bb098 *inst/include/unsupported/Eigen/MatrixFunctions b6bb2d06c8ecf9e5ee1c1b1ffece2c64 *inst/include/unsupported/Eigen/MoreVectorization 21cce7ad666181669753c90327e01321 *inst/include/unsupported/Eigen/NonLinearOptimization 5ad7a0ec7dc670c56313da681c282242 *inst/include/unsupported/Eigen/NumericalDiff 3bbda3333cce7212781177633429242e *inst/include/unsupported/Eigen/OpenGLSupport fc6cade564d8203a15ded8d3230e8939 *inst/include/unsupported/Eigen/Polynomials 5eb76b0bf6d7158082b9f53ce8952c1f *inst/include/unsupported/Eigen/Skyline fa2b95a8a9d6d32a7620da61c209d187 *inst/include/unsupported/Eigen/SparseExtra 36ad389632e775ed0afc72b87caa2fcc *inst/include/unsupported/Eigen/SpecialFunctions a9e681a260f3e7f0229ad01c0c74b0cd *inst/include/unsupported/Eigen/Splines a87cd9c4876c33612bd474881d479bf6 *inst/include/unsupported/Eigen/src/AutoDiff/AutoDiffJacobian.h 93d82a06e0a1b8ddef50c5719c549dda *inst/include/unsupported/Eigen/src/AutoDiff/AutoDiffScalar.h 92ad88e993f5118143fd5e9a0a01d772 *inst/include/unsupported/Eigen/src/AutoDiff/AutoDiffVector.h 48837c998e8b3eab654adbd4fa663e84 *inst/include/unsupported/Eigen/src/BVH/BVAlgorithms.h 3e01f21ae682bb50741a0e045f348ba0 *inst/include/unsupported/Eigen/src/BVH/KdBVH.h be69b918915c2b18c89b0b5c4bded54f *inst/include/unsupported/Eigen/src/Eigenvalues/ArpackSelfAdjointEigenSolver.h 7641cc97dd35ba62dd0774321f68904e *inst/include/unsupported/Eigen/src/EulerAngles/EulerAngles.h 67c26fa8d2b4c0bc9e5435b9f93398a8 *inst/include/unsupported/Eigen/src/EulerAngles/EulerSystem.h 91c40c9d1a1c2f82a5bc928760f00809 *inst/include/unsupported/Eigen/src/FFT/ei_fftw_impl.h 883ba96410570d9432c3479ca7b38767 *inst/include/unsupported/Eigen/src/FFT/ei_kissfft_impl.h 816b889bab1493935c8b57a011788e08 *inst/include/unsupported/Eigen/src/IterativeSolvers/ConstrainedConjGrad.h 39398818ce951f1a7eab9588250e348f *inst/include/unsupported/Eigen/src/IterativeSolvers/DGMRES.h 14e0f3b7c5d4dc6c24a0c93bc9d8490a *inst/include/unsupported/Eigen/src/IterativeSolvers/GMRES.h a42bc8531a6fe28e8fe3b70166cc7d2d *inst/include/unsupported/Eigen/src/IterativeSolvers/IDRS.h 27ac1505b341316adbc72365b9367ae8 *inst/include/unsupported/Eigen/src/IterativeSolvers/IncompleteLU.h 4cfae3281bf13bc9ed1a952ecd3a34ba *inst/include/unsupported/Eigen/src/IterativeSolvers/IterationController.h 134d6eacba23e34a48e930bc2de55f20 *inst/include/unsupported/Eigen/src/IterativeSolvers/MINRES.h 700f58fae8ecc28726582df3ff76783d *inst/include/unsupported/Eigen/src/IterativeSolvers/Scaling.h fe3008f64d440042b992dfcbdb1c43fb *inst/include/unsupported/Eigen/src/KroneckerProduct/KroneckerTensorProduct.h d9ff86916e5e5a1e3d74273ff947dab7 *inst/include/unsupported/Eigen/src/LevenbergMarquardt/CopyrightMINPACK.txt d81eff550898b40c7d51e86be39a5eee *inst/include/unsupported/Eigen/src/LevenbergMarquardt/LMcovar.h eb5aab42e6b1755ad6ed6a3d64c85849 *inst/include/unsupported/Eigen/src/LevenbergMarquardt/LMonestep.h 370bfa601a00a753effd58dd2acab1eb *inst/include/unsupported/Eigen/src/LevenbergMarquardt/LMpar.h b335d907e4283aaa1fd76ea468e5b955 *inst/include/unsupported/Eigen/src/LevenbergMarquardt/LMqrsolv.h c87a5e2ae14f64cf56439d08236891ce *inst/include/unsupported/Eigen/src/LevenbergMarquardt/LevenbergMarquardt.h d0731f6733298683ba4303908de31e7b *inst/include/unsupported/Eigen/src/MatrixFunctions/MatrixExponential.h 946b65b5dabe6dd1cab9bcec99be0365 *inst/include/unsupported/Eigen/src/MatrixFunctions/MatrixFunction.h e39bf3d9ef17a3c5bf01100d5f74ce94 *inst/include/unsupported/Eigen/src/MatrixFunctions/MatrixLogarithm.h b0c03782cb14949ef52526a841ebef1e *inst/include/unsupported/Eigen/src/MatrixFunctions/MatrixPower.h c2233ea27daa28237dc585513f9e59be *inst/include/unsupported/Eigen/src/MatrixFunctions/MatrixSquareRoot.h ad137f2e6b96d77cce5fd3000dde38b6 *inst/include/unsupported/Eigen/src/MatrixFunctions/StemFunction.h e3e564572a4abd4270811bfd55eefc0b *inst/include/unsupported/Eigen/src/MoreVectorization/MathFunctions.h ef5bfb36ead3d59b0740262c42f35e9c *inst/include/unsupported/Eigen/src/NonLinearOptimization/HybridNonLinearSolver.h 916bb8c904d76c8f5221c354d380b1d3 *inst/include/unsupported/Eigen/src/NonLinearOptimization/LevenbergMarquardt.h 2dddb5bda1aa9c3ec00c14fe57d9b036 *inst/include/unsupported/Eigen/src/NonLinearOptimization/chkder.h 8fa8effaf7427df38efe187d79f38977 *inst/include/unsupported/Eigen/src/NonLinearOptimization/covar.h 2bc5fad8c5e8bc445a152cca73b55695 *inst/include/unsupported/Eigen/src/NonLinearOptimization/dogleg.h 7c9dbda9021e771f78218266f7bd7d36 *inst/include/unsupported/Eigen/src/NonLinearOptimization/fdjac1.h 39c65c9a6ea3e0a03b30f838f4a4e434 *inst/include/unsupported/Eigen/src/NonLinearOptimization/lmpar.h 0c31ca70c017a1caa330afb6e7f7d613 *inst/include/unsupported/Eigen/src/NonLinearOptimization/qrsolv.h 383cf032d01af4567d1f02d11897fb7d *inst/include/unsupported/Eigen/src/NonLinearOptimization/r1mpyq.h c930b6771d0c4a8e13c1c6b77b93d273 *inst/include/unsupported/Eigen/src/NonLinearOptimization/r1updt.h bc16831605dac24fd37a87e1cc1bce38 *inst/include/unsupported/Eigen/src/NonLinearOptimization/rwupdt.h 3d568e244f7e3e7fba0d70ed07e3263a *inst/include/unsupported/Eigen/src/NumericalDiff/NumericalDiff.h e112dec4dd76e520ba0bad302364fe79 *inst/include/unsupported/Eigen/src/Polynomials/Companion.h 12b0293d31e7c4b24f024b7a1603f07d *inst/include/unsupported/Eigen/src/Polynomials/PolynomialSolver.h de9aaaebae2081ceedd850ec2f439827 *inst/include/unsupported/Eigen/src/Polynomials/PolynomialUtils.h 79ca7405d8b8ad2fe36e10b8fee4148b *inst/include/unsupported/Eigen/src/Skyline/SkylineInplaceLU.h 824c9da2a9592016b537be34e54e4dfe *inst/include/unsupported/Eigen/src/Skyline/SkylineMatrix.h 9f5a3fae279cd008f68e22f1b3f68e0d *inst/include/unsupported/Eigen/src/Skyline/SkylineMatrixBase.h a2ef069d1b248a87813de8b901693684 *inst/include/unsupported/Eigen/src/Skyline/SkylineProduct.h bae2cb0cb4a531ae9b1f47c4c280cacd *inst/include/unsupported/Eigen/src/Skyline/SkylineStorage.h 3fde3a91d55eb2fafb974de05722a571 *inst/include/unsupported/Eigen/src/Skyline/SkylineUtil.h 442ac0f752eb3cc76178316070d1f627 *inst/include/unsupported/Eigen/src/SparseExtra/BlockOfDynamicSparseMatrix.h 66cda544a242ec51e4230b3caab4b96f *inst/include/unsupported/Eigen/src/SparseExtra/BlockSparseMatrix.h 9de57da7656f706385d421f9b7e2fcb2 *inst/include/unsupported/Eigen/src/SparseExtra/DynamicSparseMatrix.h dc752616501332f15020a721ec6d7ebf *inst/include/unsupported/Eigen/src/SparseExtra/MarketIO.h 5159861264eb780296f6ea62b1c885c5 *inst/include/unsupported/Eigen/src/SparseExtra/MatrixMarketIterator.h a59440ef2127a48d2d990c3e350c57aa *inst/include/unsupported/Eigen/src/SparseExtra/RandomSetter.h cbd20ac144f08c97ee471cb1f3ca2a54 *inst/include/unsupported/Eigen/src/SpecialFunctions/BesselFunctionsArrayAPI.h 4621520b36892f2903eb151b804a68ab *inst/include/unsupported/Eigen/src/SpecialFunctions/BesselFunctionsBFloat16.h a63cd8aafb6d61e8f4ac6035fdb93017 *inst/include/unsupported/Eigen/src/SpecialFunctions/BesselFunctionsFunctors.h 9bdad930bf90868e1ee9d09801c6f77b *inst/include/unsupported/Eigen/src/SpecialFunctions/BesselFunctionsHalf.h 54a0647aa83f10e267777acbabb84889 *inst/include/unsupported/Eigen/src/SpecialFunctions/BesselFunctionsImpl.h 95e2404ede0aa0be7028b928b767fdcb *inst/include/unsupported/Eigen/src/SpecialFunctions/BesselFunctionsPacketMath.h ce0ae07a1b2f7f5bedd9e31d5b8d0134 *inst/include/unsupported/Eigen/src/SpecialFunctions/HipVectorCompatibility.h 0fd1fab71470296959461ff0b0cc0e62 *inst/include/unsupported/Eigen/src/SpecialFunctions/SpecialFunctionsArrayAPI.h 80336ac8a4cf3b1372e1ca989ef2f5d5 *inst/include/unsupported/Eigen/src/SpecialFunctions/SpecialFunctionsBFloat16.h e126a9c2faab38c4d8552af22ffa9464 *inst/include/unsupported/Eigen/src/SpecialFunctions/SpecialFunctionsFunctors.h 870773cb741c1bfb74c626c668ca0976 *inst/include/unsupported/Eigen/src/SpecialFunctions/SpecialFunctionsHalf.h 764c502155970dfd44bc27230b8855b2 *inst/include/unsupported/Eigen/src/SpecialFunctions/SpecialFunctionsImpl.h 031cc02523f69e04c4321a42b6dc59a8 *inst/include/unsupported/Eigen/src/SpecialFunctions/SpecialFunctionsPacketMath.h a896db19e3f753e87edb0c855f6611a6 *inst/include/unsupported/Eigen/src/SpecialFunctions/arch/AVX/BesselFunctions.h 7d5c3e84f0af0555cdeb0a90395a161a *inst/include/unsupported/Eigen/src/SpecialFunctions/arch/AVX/SpecialFunctions.h 4183ece3dd27af3bf108c8aa1ca8947e *inst/include/unsupported/Eigen/src/SpecialFunctions/arch/AVX512/BesselFunctions.h 34efb0d9403f8e589cdda3cd1e73ad99 *inst/include/unsupported/Eigen/src/SpecialFunctions/arch/AVX512/SpecialFunctions.h 7e042ddaa7e463b95c487e9eef5d7093 *inst/include/unsupported/Eigen/src/SpecialFunctions/arch/GPU/SpecialFunctions.h e32b6e9a7992d683ed5e5258516e69e5 *inst/include/unsupported/Eigen/src/SpecialFunctions/arch/NEON/BesselFunctions.h 0ce08e0456e47a1640c1777f2cf38450 *inst/include/unsupported/Eigen/src/SpecialFunctions/arch/NEON/SpecialFunctions.h 6cf9e021f1bc3e964b85f3b436337d4a *inst/include/unsupported/Eigen/src/Splines/Spline.h a7a8b63bca04532ec7b30e6134764558 *inst/include/unsupported/Eigen/src/Splines/SplineFitting.h bc75ab5a8c3ca729f8dcb08c309b6f47 *inst/include/unsupported/Eigen/src/Splines/SplineFwd.h b4f0b014ec53404e843dbc718fabf57e *inst/skeleton/Makevars b4f0b014ec53404e843dbc718fabf57e *inst/skeleton/Makevars.win 094a35f66ff4b08c54a4fb8d6d1f4393 *inst/skeleton/rcppeigen_hello_world.Rd b83bee5d5ad946dbfa8f1e410c90d745 *inst/skeleton/rcppeigen_hello_world.cpp b81206a4c831ced7a37fd8f7a0066830 *inst/tinytest/cpp/rcppeigen.cpp be6cfb634100b1f08a3e08b30e929fd7 *inst/tinytest/cpp/solution.cpp a62e42e20e8381d578b0f3af53fb11c6 *inst/tinytest/cpp/sparse.cpp 2bba9269b536376c10e821c2e9dcadc7 *inst/tinytest/cpp/transform.cpp 1c06c0fac5f692d057b35d2584001afd *inst/tinytest/cpp/wrap.cpp 76f6f32151c7d47de981270fe5e97613 *inst/tinytest/test_RcppEigen.R a67648e012cd905a275ea48d5da0e6b0 *inst/tinytest/test_fastLm.R eb291ceff757f9e041cb865f09fc8428 *inst/tinytest/test_misc.R 637b2794a96520b25fe867d2a2974cc0 *inst/tinytest/test_solution.R 856cbc90058c8f2c24d05c4afc6e8174 *inst/tinytest/test_sparse.R 0c87d121b0a4a7499b01e07146a1dcf0 *inst/tinytest/test_transform.R 5971761ce901a48ca3651c1293aaf284 *inst/tinytest/test_wrap.R 9b1fd1fa838f03ed944e8b724dbafe98 *man/RcppEigen-package.Rd 7d22109cab99baae3f89a00a23fdd0bc *man/RcppEigen.package.skeleton.Rd 3f68ed0b145014b9e9e4bc769970b915 *man/fastLm.Rd 410976344986b045687d43424a05deb3 *src/Makevars 410976344986b045687d43424a05deb3 *src/Makevars.win d7bc0ddde486ae6dccc76d29f1a5f297 *src/RcppEigen.cpp 66c7c106ce076772af617e639df41042 *src/RcppExports.cpp 35b458bb1fd3d10375ae693f196d073b *src/fastLm.cpp 16915b4298cfa6d0ba90c23fd5e0c633 *src/fastLm.h 24d69df3f629ce3447ec7d7c81ab775c *tests/tinytest.R c52f521406b904dee163f3286ba491e8 *vignettes/RcppEigen-Introduction.Rnw RcppEigen/inst/0000755000176200001440000000000014567630620013114 5ustar liggesusersRcppEigen/inst/examples/0000755000176200001440000000000013303656150014723 5ustar liggesusersRcppEigen/inst/examples/lmBenchmark.R0000644000176200001440000000604213303656150017273 0ustar liggesusers## lmBenchmark.R: Benchmark different implementations of linear model solutions ## ## Copyright (C) 2011 - 2017 Douglas Bates, Dirk Eddelbuettel and Romain Francois ## ## This file is part of RcppEigen. require("stats", character=TRUE, quietly=TRUE) require("RcppEigen", character=TRUE, quietly=TRUE) if(require("microbenchmark", character=TRUE, quietly=TRUE)){ ## define different versions of lm exprs <- list() ## These versions use rank-revealing decompositions and thus can ## handle rank-deficient cases. # default version used in lm() exprs["lm.fit"] <- alist(stats::lm.fit(mm, y)) # versions from RcppEigen ## column-pivoted QR decomposition - similar to lm.fit exprs["PivQR"] <- alist(RcppEigen::fastLmPure(mm, y, 0L)) ## LDLt Cholesky decomposition with rank detection exprs["LDLt"] <- alist(RcppEigen::fastLmPure(mm, y, 2L)) ## SVD using the Lapack subroutine dgesdd and Eigen support exprs["GESDD"] <- alist(RcppEigen::fastLmPure(mm, y, 6L)) ## SVD (the JacobiSVD class from Eigen) exprs["SVD"] <- alist(RcppEigen::fastLmPure(mm, y, 4L)) ## eigenvalues and eigenvectors of X'X exprs["SymmEig"] <- alist(RcppEigen::fastLmPure(mm, y, 5L)) ## Non-rank-revealing decompositions. These work fine except when ## they don't. ## Unpivoted QR decomposition exprs["QR"] <- alist(RcppEigen::fastLmPure(mm, y, 1L)) ## LLt Cholesky decomposition exprs["LLt"] <- alist(RcppEigen::fastLmPure(mm, y, 3L)) if (suppressMessages(require("RcppArmadillo", character=TRUE, quietly=TRUE))) { exprs["arma"] <- alist(RcppArmadillo::fastLmPure(mm, y)) } if (suppressMessages(require("RcppGSL", character=TRUE, quietly=TRUE))) { exprs["GSL"] <- alist(RcppGSL::fastLmPure(mm, y)) } do_bench <- function(n=100000L, p=40L, nrep=20L, suppressSVD=(n > 100000L)) { mm <- cbind(1, matrix(rnorm(n * (p - 1L)), nc=p-1L)) y <- rnorm(n) if (suppressSVD) exprs <- exprs[!names(exprs) %in% c("SVD", "GSL")] cat("lm benchmark for n = ", n, " and p = ", p, ": nrep = ", nrep, "\n", sep='') cat("RcppEigen: Included Eigen version", paste(RcppEigen:::eigen_version(FALSE), collapse="."), "\n") cat("RcppEigen: Eigen SSE support", RcppEigen:::Eigen_SSE(), "\n") mb <- microbenchmark(list=exprs, times = nrep) op <- options(microbenchmark.unit="relative") on.exit(options(op)) mb_relative <- summary(mb) levels(mb_relative$expr) <- names(exprs) options(microbenchmark.unit=NULL) mb_absolute <- summary(mb) levels(mb_absolute$expr) <- names(exprs) mb_combined <- merge(mb_relative[, c("expr", "median")], mb_absolute[, c("expr", "median")], by="expr") colnames(mb_combined) <- c("Method", "Relative", paste0("Elapsed (", attr(mb_absolute, "unit"), ")")) mb_combined[order(mb_combined$Relative),] } print(do_bench()) } RcppEigen/inst/doc/0000755000176200001440000000000014567630620013661 5ustar liggesusersRcppEigen/inst/doc/RcppEigen-Introduction.Rnw0000644000176200001440000024401313471777756020727 0ustar liggesusers\documentclass[shortnames,article,nojss]{jss} \usepackage{booktabs,bm,amsmath,thumbpdf} %\VignetteIndexEntry{RcppEigen-intro} %\VignetteKeywords{linear algebra, template programming, C++, R, Rcpp} %\VignettePackage{RcppEigen} %% VIGNETTE <>= pkgVersion <- packageDescription("RcppEigen")$Version pkgDate <- packageDescription("RcppEigen")$Date prettyDate <- format(Sys.Date(), "%B %e, %Y") #require("RcppEigen") #eigenVersion <- paste(unlist(.Call("eigen_version", FALSE)), collapse=".") @ \author{Douglas Bates\\University of Wisconsin-Madison \And Dirk Eddelbuettel\\Debian Project} \Plainauthor{Douglas Bates, Dirk Eddelbuettel} \title{Fast and Elegant Numerical Linear Algebra Using the \pkg{RcppEigen} Package} \Plaintitle{Fast and Elegant Numerical Linear Algebra Using the RcppEigen Package} \Shorttitle{Fast and Elegant Numerical Linear Algebra with \pkg{RcppEigen}} \Abstract{ The \pkg{RcppEigen} package provides access from \proglang{R} \citep{R:Main} to the \pkg{Eigen} \citep*{Eigen:Web} \proglang{C++} template library for numerical linear algebra. \pkg{Rcpp} \citep{CRAN:Rcpp,Eddelbuettel:2013:Rcpp} classes and specializations of the \proglang{C++} templated functions \code{as} and \code{wrap} from \pkg{Rcpp} provide the ``glue'' for passing objects from \proglang{R} to \proglang{C++} and back. Several introductory examples are presented. This is followed by an in-depth discussion of various available approaches for solving least-squares problems, including rank-revealing methods, concluding with an empirical run-time comparison. Last but not least, sparse matrix methods are discussed. } \Keywords{linear algebra, template programming, \proglang{R}, \proglang{C++}, \pkg{Rcpp}} \Plainkeywords{linear algebra, template programmig, R, C++, Rcpp} \Address{ Douglas Bates \\ Department of Statistics \\ University of Wisconsin-Madison \\ Madison, WI, United States of America \\ E-mail: \email{bates@stat.wisc.edu} \\ URL: \url{http://www.stat.wisc.edu/~bates/}\\ Dirk Eddelbuettel \\ Debian Project \\ River Forest, IL, United States of America\\ E-mail: \email{edd@debian.org}\\ URL: \url{http://dirk.eddelbuettel.com}\\ } \usepackage{Sweave} \newcommand{\argmin}{\operatorname{argmin}\displaylimits} \newcommand{\rank}{\operatorname{rank}} %% highlights macros %% Style definition file generated by highlight 2.7, http://www.andre-simon.de/ % Highlighting theme definition: \newcommand{\hlstd}[1]{\textcolor[rgb]{0,0,0}{#1}} \newcommand{\hlnum}[1]{\textcolor[rgb]{0,0,0}{#1}} \newcommand{\hlopt}[1]{\textcolor[rgb]{0,0,0}{#1}} \newcommand{\hlesc}[1]{\textcolor[rgb]{0.74,0.55,0.55}{#1}} \newcommand{\hlstr}[1]{\textcolor[rgb]{0.90,0.15,0.15}{#1}} \newcommand{\hldstr}[1]{\textcolor[rgb]{0.74,0.55,0.55}{#1}} \newcommand{\hlslc}[1]{\textcolor[rgb]{0.67,0.13,0.13}{\it{#1}}} \newcommand{\hlcom}[1]{\textcolor[rgb]{0.67,0.13,0.13}{\it{#1}}} \newcommand{\hldir}[1]{\textcolor[rgb]{0,0,0}{#1}} \newcommand{\hlsym}[1]{\textcolor[rgb]{0,0,0}{#1}} \newcommand{\hlline}[1]{\textcolor[rgb]{0.33,0.33,0.33}{#1}} \newcommand{\hlkwa}[1]{\textcolor[rgb]{0.61,0.13,0.93}{\bf{#1}}} \newcommand{\hlkwb}[1]{\textcolor[rgb]{0.13,0.54,0.13}{#1}} \newcommand{\hlkwc}[1]{\textcolor[rgb]{0,0,1}{#1}} \newcommand{\hlkwd}[1]{\textcolor[rgb]{0,0,0}{#1}} \definecolor{bgcolor}{rgb}{1,1,1} % ------------------------------------------------------------------------ \begin{document} \SweaveOpts{engine=R,eps=FALSE} \begin{quote} \footnotesize This vignette corresponds to a \href{http://www.jstatsoft.org/v52/i05/}{paper published} in the \textsl{Journal of Statistical Software}. Currently still identical to the paper, this vignette version may over time receive minor updates. For citations, please use \citet{JSS:RcppEigen} as provided by \code{citation("RcppEigen")}. This version corresponds to \pkg{RcppEigen} version \Sexpr{pkgVersion} and was typeset on \Sexpr{prettyDate}. \end{quote} \section{Introduction} \label{sec:intro} Linear algebra is an essential building block of statistical computing. Operations such as matrix decompositions, linear program solvers, and eigenvalue/eigenvector computations are used in many estimation and analysis routines. As such, libraries supporting linear algebra have long been provided by statistical programmers for different programming languages and environments. Because it is object-oriented, \proglang{C++}, one of the central modern languages for numerical and statistical computing, is particularly effective at representing matrices, vectors and decompositions, and numerous class libraries providing linear algebra routines have been written over the years. As both the \proglang{C++} language and standards have evolved \citep{Meyers:2005:EffectiveC++,Meyers:1995:MoreEffectiveC++,Cpp11}, so have the compilers implementing the language. Relatively modern language constructs such as template meta-programming are particularly useful because they provide overloading of operations (allowing expressive code in the compiled language similar to what can be done in scripting languages) and can shift some of the computational burden from the run-time to the compile-time. (A more detailed discussion of template meta-programming is, however, beyond the scope of this paper). \cite{Veldhuizen:1998:Blitz} provided an early and influential implementation of numerical linear algebra classes that already demonstrated key features of this approach. Its usage was held back at the time by the limited availability of compilers implementing all the necessary features of the \proglang{C++} language. This situation has greatly improved over the last decade, and many more libraries have been made available. One such \proglang{C++} library is \pkg{Eigen} by \citet*{Eigen:Web} which started as a sub-project to KDE (a popular Linux desktop environment), initially focussing on fixed-size matrices to represent projections in a visualization application. \pkg{Eigen} grew from there and has over the course of about a decade produced three major releases with \pkg{Eigen}3 being the current major version. To check the minor and patch version numbers, load the \pkg{RcppEigen} package and call this (internal) helper function: \begin{CodeInput} R> RcppEigen:::eigen_version(FALSE) \end{CodeInput} \begin{CodeOutput} major minor patch 3 3 5 \end{CodeOutput} \pkg{Eigen} is of interest as the \proglang{R} system for statistical computation and graphics \citep{R:Main} is itself easily extensible. This is particular true via the \proglang{C} language that most of \proglang{R}'s compiled core parts are written in, but also for the \proglang{C++} language which can interface with \proglang{C}-based systems rather easily. The manual ``Writing \proglang{R} Extensions'' \citep{R:Extensions} is the basic reference for extending \proglang{R} with either \proglang{C} or \proglang{C++}. The \pkg{Rcpp} package by \citet{JSS:Rcpp,CRAN:Rcpp} facilitates extending \proglang{R} with \proglang{C++} code by providing seamless object mapping between both languages. % As stated in the \pkg{Rcpp} \citep{CRAN:Rcpp} vignette, ``Extending \pkg{Rcpp}'' as well as in \citet{Eddelbuettel:2013:Rcpp} \begin{quote} \pkg{Rcpp} facilitates data interchange between \proglang{R} and \proglang{C++} through the templated functions \code{Rcpp::as} (for conversion of objects from \proglang{R} to \proglang{C++}) and \code{Rcpp::wrap} (for conversion from \proglang{C++} to \proglang{R}). \end{quote} The \pkg{RcppEigen} package provides the header files composing the \pkg{Eigen} \proglang{C++} template library, along with implementations of \code{Rcpp::as} and \code{Rcpp::wrap} for the \proglang{C++} classes defined in \pkg{Eigen}. The \pkg{Eigen} classes themselves provide high-performance, versatile and comprehensive representations of dense and sparse matrices and vectors, as well as decompositions and other functions to be applied to these objects. The next section introduces some of these classes and shows how to interface to them from \proglang{R}. \section[Eigen classes]{\pkg{Eigen} classes} \label{sec:eclasses} \pkg{Eigen} is a \proglang{C++} template library providing classes for many forms of matrices, vectors, arrays and decompositions. These classes are flexible and comprehensive allowing for both high performance and well structured code representing high-level operations. \proglang{C++} code based on \pkg{Eigen} is often more like \proglang{R} code, working on the ``whole object'', than like compiled code in other languages where operations often must be coded in loops. As in many \proglang{C++} template libraries using template meta-programming \citep{Abrahams+Gurtovoy:2004:TemplateMetaprogramming}, the templates themselves can be very complicated. However, \pkg{Eigen} provides \code{typedef}s for common classes that correspond to \proglang{R} matrices and vectors, as shown in Table~\ref{tab:REigen}, and this paper will use these \code{typedef}s. \begin{table}[t!] \centering \begin{tabular}{l l} \hline \multicolumn{1}{c}{\proglang{R} object type} & \multicolumn{1}{c}{\pkg{Eigen} class typedef}\\ \hline numeric matrix & \code{MatrixXd}\\ integer matrix & \code{MatrixXi}\\ complex matrix & \code{MatrixXcd}\\ numeric vector & \code{VectorXd}\\ integer vector & \code{VectorXi}\\ complex vector & \code{VectorXcd}\\ \code{Matrix::dgCMatrix} \phantom{XXX} & \code{SparseMatrix}\\ \hline \end{tabular} \caption{Correspondence between \proglang{R} matrix and vector types and classes in the \pkg{Eigen} namespace. \label{tab:REigen}} \end{table} Here, \code{Vector} and \code{Matrix} describe the dimension of the object. The \code{X} signals that these are dynamically-sized objects (as opposed to fixed-size matrices such as $3 \times 3$ matrices also available in \pkg{Eigen}). Lastly, the trailing characters \code{i}, \code{d} and \code{cd} denote, respectively, storage types \code{integer}, \code{double} and \code{complex double}. The \proglang{C++} classes shown in Table~\ref{tab:REigen} are in the \pkg{Eigen} namespace, which means that they must be written as \code{Eigen::MatrixXd}. However, if one prefaces the use of these class names with a declaration like \begin{quote} \noindent \ttfamily \hlstd{}\hlkwa{using\ }\hlstd{Eigen}\hlopt{::}\hlstd{MatrixXd}\hlopt{;}\hlstd{}\hspace*{\fill}\\ \mbox{} \normalfont \normalsize \end{quote} \vspace*{-0.4cm} then one can use these names without the namespace qualifier. \subsection[Mapped matrices in Eigen]{Mapped matrices in \pkg{Eigen}} \label{sec:mapped} Storage for the contents of matrices from the classes shown in Table~\ref{tab:REigen} is allocated and controlled by the class constructors and destructors. Creating an instance of such a class from an \proglang{R} object involves copying its contents. An alternative is to have the contents of the \proglang{R} matrix or vector mapped to the contents of the object from the \pkg{Eigen} class. For dense matrices one can use the \pkg{Eigen} templated class \code{Map}, and for sparse matrices one can deploy the \pkg{Eigen} templated class \code{MappedSparseMatrix}. One must, of course, be careful not to modify the contents of the \proglang{R} object in the \proglang{C++} code. A recommended practice is always to declare mapped objects as {\ttfamily\hlkwb{const}\normalfont}. \subsection[Arrays in Eigen]{Arrays in \pkg{Eigen}} \label{sec:arrays} For matrix and vector classes \pkg{Eigen} overloads the \code{`*'} operator as matrix multiplication. Occasionally component-wise operations instead of matrix operations are desired, for which the \code{Array} templated classes are used in \pkg{Eigen}. Switching back and forth between matrices or vectors and arrays is usually done via the \code{array()} method for matrix or vector objects and the \code{matrix()} method for arrays. \subsection[Structured matrices in Eigen]{Structured matrices in \pkg{Eigen}} \label{sec:structured} \pkg{Eigen} provides classes for matrices with special structure such as symmetric matrices, triangular matrices and banded matrices. For dense matrices, these special structures are described as ``views'', meaning that the full dense matrix is stored but only part of the matrix is used in operations. For a symmetric matrix one must specify whether the lower triangle or the upper triangle is to be used as the contents, with the other triangle defined by the implicit symmetry. \section{Some simple examples} \label{sec:simple} \proglang{C++} functions to perform simple operations on matrices or vectors can follow a pattern of: \begin{enumerate} \item Map the \proglang{R} objects passed as arguments into \pkg{Eigen} objects. \item Create the result. \item Return \code{Rcpp::wrap} applied to the result. \end{enumerate} An idiom for the first step is % using Eigen::Map; % using Eigen::MatrixXd; % using Rcpp::as; % % const Map A(as >(AA)); %\end{lstlisting} \begin{quote} \noindent \ttfamily \hlstd{}\hlkwa{using\ }\hlstd{Eigen}\hlsym{::}\hlstd{Map}\hlsym{;}\hspace*{\fill}\\ \hlstd{}\hlkwa{using\ }\hlstd{Eigen}\hlsym{::}\hlstd{MatrixXd}\hlsym{;}\hspace*{\fill}\\ \hlstd{}\hlkwa{using\ }\hlstd{Rcpp}\hlsym{::}\hlstd{as}\hlsym{;}\hspace*{\fill}\\ \hlstd{}\hspace*{\fill}\\ \hlkwb{const\ }\hlstd{Map}\hlsym{$<$}\hlstd{MatrixXd}\hlsym{$>$}\hlstd{\ \ }\hlsym{}\hlstd{}\hlkwd{A}\hlstd{}\hlsym{(}\hlstd{as}\hlsym{$<$}\hlstd{Map}\hlsym{$<$}\hlstd{MatrixXd}\hlsym{$>$\ $>$(}\hlstd{AA}\hlsym{));}\hlstd{}\hspace*{\fill}\\ \mbox{} \normalfont \end{quote} \vspace*{-0.4cm} where \code{AA} is the name of the \proglang{R} object (of type \code{SEXP} in \proglang{C} and \proglang{C++}) passed to the \proglang{C++} function. An alternative to the \code{using} declarations is to declare a \code{typedef} as in % typedef Eigen::Map MapMatd; % const MapMatd A(Rcpp::as(AA)); \begin{quote} \noindent \ttfamily \hlstd{}\hlkwc{typedef\ }\hlstd{Eigen}\hlopt{::}\hlstd{Map}\hlopt{$<$}\hlstd{Eigen}\hlopt{::}\hlstd{MatrixXd}\hlopt{$>$}\hlstd{\ \ \ }\hlopt{}\hlstd{MapMatd}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{MapMatd}\hlstd{\ \ \ \ \ \ \ \ \ \ }\hlstd{}\hlkwd{A}\hlstd{}\hlopt{(}\hlstd{Rcpp}\hlopt{::}\hlstd{as}\hlopt{$<$}\hlstd{MapMatd}\hlopt{$>$(}\hlstd{AA}\hlopt{));}\hlstd{}\hspace*{\fill}\\ \mbox{} \normalfont \normalsize \end{quote} \vspace*{-0.4cm} The \code{cxxfunction} function from the \pkg{inline} package \citep*{CRAN:inline} for \proglang{R} and its \pkg{RcppEigen} plugin provide a convenient method of developing and debugging the \proglang{C++} code. For actual production code one generally incorporates the \proglang{C++} source code files in a package and includes the line \code{LinkingTo: Rcpp, RcppEigen} in the package's \code{DESCRIPTION} file. The \code{RcppEigen.package.skeleton} function provides a quick way of generating the skeleton of a package that will use \pkg{RcppEigen}. The \code{cxxfunction} with the \code{"Rcpp"} or \code{"RcppEigen"} plugins has the \code{as} and \code{wrap} functions already defined as \code{Rcpp::as} and \code{Rcpp::wrap}. In the examples below these declarations are omitted. It is important to remember that they are needed in actual \proglang{C++} source code for a package. The first few examples are simply for illustration as the operations shown could be more effectively performed directly in \proglang{R}. Finally, the results from \pkg{Eigen} are compared to those from the direct \proglang{R} results. \subsection{Transpose of an integer matrix} \label{sec:transpose} The next \proglang{R} code snippet creates a simple matrix of integers \begin{CodeInput} R> (A <- matrix(1:6, ncol = 2)) \end{CodeInput} \begin{CodeOutput} [,1] [,2] [1,] 1 4 [2,] 2 5 [3,] 3 6 \end{CodeOutput} \begin{CodeInput} R> str(A) \end{CodeInput} \begin{CodeOutput} int [1:3, 1:2] 1 2 3 4 5 6 \end{CodeOutput} and, in Figure~\ref{trans}, the \code{transpose()} method for the \code{Eigen::MatrixXi} class is used to return the transpose of the supplied matrix. The \proglang{R} matrix in the \code{SEXP} \code{AA} is first mapped to an \code{Eigen::MatrixXi} object, and then the matrix \code{At} is constructed from its transpose and returned to \proglang{R}. \begin{figure}[t!] \hrule \smallskip \noindent \ttfamily \hlstd{}\hlkwa{using\ }\hlstd{Eigen}\hlsym{::}\hlstd{Map}\hlsym{;}\hspace*{\fill}\\ \hlstd{}\hlkwa{using\ }\hlstd{Eigen}\hlsym{::}\hlstd{MatrixXi}\hlsym{;}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{}\hlslc{//\ Map\ the\ integer\ matrix\ AA\ from\ R}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{Map}\hlsym{$<$}\hlstd{MatrixXi}\hlsym{$>$}\hlstd{\ \ }\hlsym{}\hlstd{}\hlkwd{A}\hlstd{}\hlsym{(}\hlstd{as}\hlsym{$<$}\hlstd{Map}\hlsym{$<$}\hlstd{MatrixXi}\hlsym{$>$\ $>$(}\hlstd{AA}\hlsym{));}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{}\hlslc{//\ evaluate\ and\ return\ the\ transpose\ of\ A}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{MatrixXi}\hlstd{\ \ \ \ \ \ }\hlstd{}\hlkwd{At}\hlstd{}\hlsym{(}\hlstd{A}\hlsym{.}\hlstd{}\hlkwd{transpose}\hlstd{}\hlsym{());}\hspace*{\fill}\\ \hlstd{}\hlkwa{return\ }\hlstd{}\hlkwd{wrap}\hlstd{}\hlsym{(}\hlstd{At}\hlsym{);}\hlstd{}\hspace*{\fill} \normalfont \hrule \caption{\code{transCpp}: Transpose a matrix of integers. \label{trans}} \end{figure} The \proglang{R} snippet below compiles and links the \proglang{C++} code segment. The actual work is done by the function \code{cxxfunction} from the \pkg{inline} package which compiles, links and loads code written in \proglang{C++} and supplied as a character variable. This frees the user from having to know about compiler and linker details and options, which makes ``exploratory programming'' much easier. Here the program piece to be compiled is stored as a character variable named \code{transCpp}, and \code{cxxfunction} creates an executable function which is assigned to \code{ftrans}. This new function is used on the matrix $\bm A$ shown above, and one can check that it works as intended by comparing the output to an explicit transpose of the matrix argument. \begin{CodeInput} R> ftrans <- cxxfunction(signature(AA = "matrix"), transCpp, + plugin = "RcppEigen") R> (At <- ftrans(A)) \end{CodeInput} \begin{CodeOutput} [,1] [,2] [,3] [1,] 1 2 3 [2,] 4 5 6 \end{CodeOutput} \begin{CodeInput} R> stopifnot(all.equal(At, t(A))) \end{CodeInput} For numeric or integer matrices the \code{adjoint()} method is equivalent to the \code{transpose()} method. For complex matrices, the adjoint is the conjugate of the transpose. In keeping with the conventions in the \pkg{Eigen} documentation the \code{adjoint()} method is used in what follows to create the transpose of numeric or integer matrices. \subsection{Products and cross-products} \label{sec:products} As mentioned in Section~\ref{sec:arrays}, the \code{`*'} operator is overloaded as matrix multiplication for the various matrix and vector classes in \pkg{Eigen}. The \proglang{C++} code in Figure~\ref{prod} produces a list containing both the product and cross-product (in the sense of the \proglang{R} function call \code{crossproduct(A, B)} evaluating $\bm A^\top\bm B$) of its two arguments % \begin{figure}[t!] \hrule \smallskip \noindent \ttfamily \hlstd{}\hlkwc{typedef\ }\hlstd{Eigen}\hlopt{::}\hlstd{Map}\hlopt{$<$}\hlstd{Eigen}\hlopt{::}\hlstd{MatrixXi}\hlopt{$>$}\hlstd{\ \ \ }\hlopt{}\hlstd{MapMati}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{MapMati}\hlstd{\ \ \ \ }\hlstd{}\hlkwd{B}\hlstd{}\hlopt{(}\hlstd{as}\hlopt{$<$}\hlstd{MapMati}\hlopt{$>$(}\hlstd{BB}\hlopt{));}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{MapMati}\hlstd{\ \ \ \ }\hlstd{}\hlkwd{C}\hlstd{}\hlopt{(}\hlstd{as}\hlopt{$<$}\hlstd{MapMati}\hlopt{$>$(}\hlstd{CC}\hlopt{));}\hspace*{\fill}\\ \hlstd{}\hlkwa{return\ }\hlstd{List}\hlopt{::}\hlstd{}\hlkwd{create}\hlstd{}\hlopt{(}\hlstd{Named}\hlopt{{(}}\hlstd{}\hlstr{"B\ \%{*}\%\ C"}\hlstd{}\hlopt{{)}}\hlstd{\ \ \ \ \ \ \ \ \ }\hlopt{=\ }\hlstd{B\ }\hlopt{{*}\ }\hlstd{C}\hlopt{,}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{Named}\hlopt{{(}}\hlstd{}\hlstr{"crossprod(B,\ C)"}\hlstd{}\hlopt{{)}\ =\ }\hlstd{B}\hlopt{.}\hlstd{}\hlkwd{adjoint}\hlstd{}\hlopt{()\ {*}\ }\hlstd{C}\hlopt{);}\hlstd{}\hspace*{\fill}\\ \mbox{} \normalfont \normalsize \hrule \caption{\code{prodCpp}: Product and cross-product of two matrices. \label{prod}} \end{figure} % \begin{CodeInput} R> fprod <- cxxfunction(signature(BB = "matrix", CC = "matrix"), + prodCpp, "RcppEigen") R> B <- matrix(1:4, ncol = 2) R> C <- matrix(6:1, nrow = 2) R> str(fp <- fprod(B, C)) \end{CodeInput} \begin{CodeOutput} List of 2 $ B %*% C : int [1:2, 1:3] 21 32 13 20 5 8 $ crossprod(B, C): int [1:2, 1:3] 16 38 10 24 4 10 \end{CodeOutput} \begin{CodeInput} R> stopifnot(all.equal(fp[[1]], B %*% C), all.equal(fp[[2]], crossprod(B, C))) \end{CodeInput} % Note that the \code{create} class method for \code{Rcpp::List} implicitly applies \code{Rcpp::wrap} to its arguments. \subsection{Crossproduct of a single matrix} \label{sec:crossproduct} As shown in the last example, the \proglang{R} function \code{crossprod} calculates the product of the transpose of its first argument with its second argument. The single argument form, \code{crossprod(X)}, evaluates $\bm X^\top\bm X$. One could, of course, calculate this product as \begin{verbatim} t(X) %*% X \end{verbatim} but \code{crossprod(X)} is roughly twice as fast because the result is known to be symmetric and only one triangle needs to be calculated. The function \code{tcrossprod} evaluates \code{crossprod(t(X))} without actually forming the transpose. To express these calculations in \pkg{Eigen}, a \code{SelfAdjointView}---which is a dense matrix of which only one triangle is used, the other triangle being inferred from the symmetry---is created. (The characterization ``self-adjoint'' is equivalent to symmetric for non-complex matrices.) The \pkg{Eigen} class name is \code{SelfAdjointView}. The method for general matrices that produces such a view is called \code{selfadjointView}. Both require specification of either the \code{Lower} or \code{Upper} triangle. For triangular matrices the class is \code{TriangularView} and the method is \code{triangularView}. The triangle can be specified as \code{Lower}, \code{UnitLower}, \code{StrictlyLower}, \code{Upper}, \code{UnitUpper} or \code{StrictlyUpper}. For self-adjoint views the \code{rankUpdate} method adds a scalar multiple of $\bm A\bm A^\top$ to the current symmetric matrix. The scalar multiple defaults to 1. The code in Figure~\ref{crossprod} produces both $\bm A^\top\bm A$ and $\bm A\bm A^\top$ from a matrix $\bm A$. \begin{figure}[t!] \hrule \smallskip \noindent \ttfamily \hlstd{}\hlkwa{using\ }\hlstd{Eigen}\hlopt{::}\hlstd{Map}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwa{using\ }\hlstd{Eigen}\hlopt{::}\hlstd{MatrixXi}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwa{using\ }\hlstd{Eigen}\hlopt{::}\hlstd{Lower}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hspace*{\fill}\\ \hlkwb{const\ }\hlstd{Map}\hlopt{$<$}\hlstd{MatrixXi}\hlopt{$>$\ }\hlstd{}\hlkwd{A}\hlstd{}\hlopt{(}\hlstd{as}\hlopt{$<$}\hlstd{Map}\hlopt{$<$}\hlstd{MatrixXi}\hlopt{$>$\ $>$(}\hlstd{AA}\hlopt{));}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ int}\hlstd{\ \ \ \ \ \ \ \ \ \ \ }\hlkwb{}\hlstd{}\hlkwd{m}\hlstd{}\hlopt{(}\hlstd{A}\hlopt{.}\hlstd{}\hlkwd{rows}\hlstd{}\hlopt{()),\ }\hlstd{}\hlkwd{n}\hlstd{}\hlopt{(}\hlstd{A}\hlopt{.}\hlstd{}\hlkwd{cols}\hlstd{}\hlopt{());}\hspace*{\fill}\\ \hlstd{MatrixXi}\hlstd{\ \ \ \ \ \ \ \ \ \ }\hlstd{}\hlkwd{AtA}\hlstd{}\hlopt{(}\hlstd{}\hlkwd{MatrixXi}\hlstd{}\hlopt{(}\hlstd{n}\hlopt{,\ }\hlstd{n}\hlopt{).}\hlstd{}\hlkwd{setZero}\hlstd{}\hlopt{().}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{selfadjointView}\hlopt{$<$}\hlstd{Lower}\hlopt{$>$().}\hlstd{}\hlkwd{rankUpdate}\hlstd{}\hlopt{(}\hlstd{A}\hlopt{.}\hlstd{}\hlkwd{adjoint}\hlstd{}\hlopt{()));}\hspace*{\fill}\\ \hlstd{MatrixXi}\hlstd{\ \ \ \ \ \ \ \ \ \ }\hlstd{}\hlkwd{AAt}\hlstd{}\hlopt{(}\hlstd{}\hlkwd{MatrixXi}\hlstd{}\hlopt{(}\hlstd{m}\hlopt{,\ }\hlstd{m}\hlopt{).}\hlstd{}\hlkwd{setZero}\hlstd{}\hlopt{().}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{selfadjointView}\hlopt{$<$}\hlstd{Lower}\hlopt{$>$().}\hlstd{}\hlkwd{rankUpdate}\hlstd{}\hlopt{(}\hlstd{A}\hlopt{));}\hspace*{\fill}\\ \hlstd{}\hspace*{\fill}\\ \hlkwa{return\ }\hlstd{List}\hlopt{::}\hlstd{}\hlkwd{create}\hlstd{}\hlopt{(}\hlstd{Named}\hlopt{{(}}\hlstd{}\hlstr{"crossprod(A)"}\hlstd{}\hlopt{{)}}\hlstd{\ \ }\hlopt{=\ }\hlstd{AtA}\hlopt{,}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{Named}\hlopt{{(}}\hlstd{}\hlstr{"tcrossprod(A)"}\hlstd{}\hlopt{{)}\ =\ }\hlstd{AAt}\hlopt{);}\hlstd{}\hspace*{\fill} \normalfont \normalsize \hrule \caption{\code{crossprodCpp}: Cross-product and transposed cross-product of a single matrix. \label{crossprod}} \end{figure} \begin{CodeInput} R> fcprd <- cxxfunction(signature(AA = "matrix"), crossprodCpp, "RcppEigen") R> str(crp <- fcprd(A)) \end{CodeInput} \begin{CodeOutput} List of 2 $ crossprod(A) : int [1:2, 1:2] 14 32 32 77 $ tcrossprod(A): int [1:3, 1:3] 17 22 27 22 29 36 27 36 45 \end{CodeOutput} \begin{CodeInput} R> stopifnot(all.equal(crp[[1]], crossprod(A)), + all.equal(crp[[2]], tcrossprod(A))) \end{CodeInput} To some, the expressions in Figure~\ref{crossprod} to construct \code{AtA} and \code{AAt} are compact and elegant. To others they are hopelessly confusing. If you find yourself in the latter group, you just need to read the expression left to right. So, for example, we construct \code{AAt} by creating a general integer matrix of size $m\times m$ (where $\bm A$ is $m\times n$), ensuring that all its elements are zero, regarding it as a self-adjoint (i.e., symmetric) matrix using the elements in the lower triangle, then adding $\bm A\bm A^\top$ to it and converting back to a general matrix form (i.e., the strict lower triangle is copied into the strict upper triangle). In more detail: \begin{enumerate} \item \code{MatrixXi(n, n)} creates an $n\times n$ integer matrix with arbitrary contents \item \code{.setZero()} zeros out the contents of the matrix \item \code{.selfAdjointView()} causes what follows to treat the matrix as a symmetric matrix in which only the lower triangle is used, the strict upper triangle being inferred by symmetry \item \code{.rankUpdate(A)} forms the sum $\bm B+\bm A\bm A^\top$ where $\bm B$ is the symmetric matrix of zeros created in the previous steps. \end{enumerate} The assignment of this symmetric matrix to the (general) \code{MatrixXi} object \code{AAt} causes the result to be symmetrized during the assignment. For these products one could define the symmetric matrix from either the lower triangle or the upper triangle as the result will be symmetrized before it is returned. To cut down on repetition of \code{using} statements we gather them in a character variable, \code{incl}, that will be given as the \code{includes} argument in the calls to \code{cxxfunction}. We also define a utility function, \code{AtA}, that returns the crossproduct matrix as shown in Figure~\ref{fig:incl} \begin{figure}[t!] \hrule \smallskip \noindent \ttfamily \hlstd{}\hlkwa{using}\hlstd{\ \ \ }\hlkwa{}\hlstd{Eigen}\hlopt{::}\hlstd{LLT}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwa{using}\hlstd{\ \ \ }\hlkwa{}\hlstd{Eigen}\hlopt{::}\hlstd{Lower}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwa{using}\hlstd{\ \ \ }\hlkwa{}\hlstd{Eigen}\hlopt{::}\hlstd{Map}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwa{using}\hlstd{\ \ \ }\hlkwa{}\hlstd{Eigen}\hlopt{::}\hlstd{MatrixXd}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwa{using}\hlstd{\ \ \ }\hlkwa{}\hlstd{Eigen}\hlopt{::}\hlstd{MatrixXi}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwa{using}\hlstd{\ \ \ }\hlkwa{}\hlstd{Eigen}\hlopt{::}\hlstd{Upper}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwa{using}\hlstd{\ \ \ }\hlkwa{}\hlstd{Eigen}\hlopt{::}\hlstd{VectorXd}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwc{typedef\ }\hlstd{Map}\hlopt{$<$}\hlstd{MatrixXd}\hlopt{$>$}\hlstd{\ \ }\hlopt{}\hlstd{MapMatd}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwc{typedef\ }\hlstd{Map}\hlopt{$<$}\hlstd{MatrixXi}\hlopt{$>$}\hlstd{\ \ }\hlopt{}\hlstd{MapMati}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwc{typedef\ }\hlstd{Map}\hlopt{$<$}\hlstd{VectorXd}\hlopt{$>$}\hlstd{\ \ }\hlopt{}\hlstd{MapVecd}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwc{inline\ }\hlstd{MatrixXd\ }\hlkwd{AtA}\hlstd{}\hlopt{(}\hlstd{}\hlkwb{const\ }\hlstd{MapMatd}\hlopt{\&\ }\hlstd{A}\hlopt{)\ \{}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ }\hlstd{}\hlkwb{int}\hlstd{\ \ \ \ }\hlkwb{}\hlstd{}\hlkwd{n}\hlstd{}\hlopt{(}\hlstd{A}\hlopt{.}\hlstd{}\hlkwd{cols}\hlstd{}\hlopt{());}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ }\hlstd{}\hlkwa{return}\hlstd{\ \ \ }\hlkwa{}\hlstd{}\hlkwd{MatrixXd}\hlstd{}\hlopt{(}\hlstd{n}\hlopt{,}\hlstd{n}\hlopt{).}\hlstd{}\hlkwd{setZero}\hlstd{}\hlopt{().}\hlstd{selfadjointView}\hlopt{$<$}\hlstd{Lower}\hlopt{$>$()}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{}\hlopt{.}\hlstd{}\hlkwd{rankUpdate}\hlstd{}\hlopt{(}\hlstd{A}\hlopt{.}\hlstd{}\hlkwd{adjoint}\hlstd{}\hlopt{());}\hspace*{\fill}\\ \hlstd{}\hlopt{\}}\hlstd{}\hspace*{\fill}\\ \mbox{} \normalfont \normalsize \hrule \caption{The contents of the character vector, \code{incl}, that will preface \proglang{C++} code segments that follow. \label{fig:incl}} \end{figure} \subsection{Cholesky decomposition of the crossprod} \label{sec:chol} The Cholesky decomposition of the positive-definite, symmetric matrix, $\bm A$, can be written in several forms. Numerical analysts define the ``LLt'' form as the lower triangular matrix, $\bm L$, such that $\bm A=\bm L\bm L^\top$ and the ``LDLt'' form as a unit lower triangular matrix $\bm L$ and a diagonal matrix $\bm D$ with positive diagonal elements such that $\bm A=\bm L\bm D\bm L^\top$. Statisticians often write the decomposition as $\bm A=\bm R^\top\bm R$ where $\bm R$ is an upper triangular matrix. Of course, this $\bm R$ is simply the transpose of $\bm L$ from the ``LLt'' form. The templated \pkg{Eigen} classes for the LLt and LDLt forms are called \code{LLT} and \code{LDLT} and the corresponding methods are \code{.llt()} and \code{.ldlt()}. Because the Cholesky decomposition involves taking square roots, we pass a numeric matrix, $\bm A$, not an integer matrix. % \begin{figure}[t!] \hrule \smallskip \noindent \ttfamily \hlstd{}\hlkwb{const}\hlstd{\ \ }\hlkwb{}\hlstd{LLT}\hlopt{$<$}\hlstd{MatrixXd}\hlopt{$>$\ }\hlstd{}\hlkwd{llt}\hlstd{}\hlopt{(}\hlstd{}\hlkwd{AtA}\hlstd{}\hlopt{(}\hlstd{as}\hlopt{$<$}\hlstd{MapMatd}\hlopt{$>$(}\hlstd{AA}\hlopt{)));}\hspace*{\fill}\\ \hlstd{}\hlkwa{return\ }\hlstd{List}\hlopt{::}\hlstd{}\hlkwd{create}\hlstd{}\hlopt{(}\hlstd{Named}\hlopt{{(}}\hlstd{}\hlstr{"L"}\hlstd{}\hlopt{{)}\ =\ }\hlstd{}\hlkwd{MatrixXd}\hlstd{}\hlopt{(}\hlstd{llt}\hlopt{.}\hlstd{}\hlkwd{matrixL}\hlstd{}\hlopt{()),}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{Named}\hlopt{{(}}\hlstd{}\hlstr{"R"}\hlstd{}\hlopt{{)}\ =\ }\hlstd{}\hlkwd{MatrixXd}\hlstd{}\hlopt{(}\hlstd{llt}\hlopt{.}\hlstd{}\hlkwd{matrixU}\hlstd{}\hlopt{()));}\hlstd{}\hspace*{\fill}\\ \mbox{} \normalfont \normalsize \hrule \caption{\code{cholCpp}: Cholesky decomposition of a cross-product. \label{chol}} \end{figure} % \begin{CodeInput} R> storage.mode(A) <- "double" R> fchol <- cxxfunction(signature(AA = "matrix"), cholCpp, "RcppEigen", incl) R> (ll <- fchol(A)) \end{CodeInput} \begin{CodeOutput} $L [,1] [,2] [1,] 3.74166 0.00000 [2,] 8.55236 1.96396 $R [,1] [,2] [1,] 3.74166 8.55236 [2,] 0.00000 1.96396 \end{CodeOutput} \begin{CodeInput} R> stopifnot(all.equal(ll[[2]], chol(crossprod(A)))) \end{CodeInput} \subsection{Determinant of the cross-product matrix} \label{sec:determinant} The ``D-optimal'' criterion for experimental design chooses the design that maximizes the determinant, $|\bm X^\top\bm X|$, for the $n\times p$ model matrix (or Jacobian matrix), $\bm X$. The determinant, $|\bm L|$, of the $p\times p$ lower Cholesky factor $\bm L$, defined so that $\bm L\bm L^\top=\bm X^\top\bm X$, is the product of its diagonal elements, as is the case for any triangular matrix. By the properties of determinants, \begin{displaymath} |\bm X^\top\bm X|=|\bm L\bm L^\top|=|\bm L|\,|\bm L^\top|=|\bm L|^2 \end{displaymath} Alternatively, if using the ``LDLt'' decomposition, $\bm L\bm D\bm L^\top=\bm X^\top\bm X$ where $\bm L$ is unit lower triangular and $\bm D$ is diagonal then $|\bm X^\top\bm X|$ is the product of the diagonal elements of $\bm D$. Because it is known that the diagonal elements of $\bm D$ must be non-negative, one often evaluates the logarithm of the determinant as the sum of the logarithms of the diagonal elements of $\bm D$. Several options are shown in Figure~\ref{cholDet}. % \begin{figure}[t!] \hrule \smallskip \noindent \ttfamily \hlstd{}\hlkwb{const\ }\hlstd{MatrixXd}\hlstd{\ \ }\hlstd{}\hlkwd{ata}\hlstd{}\hlopt{(}\hlstd{}\hlkwd{AtA}\hlstd{}\hlopt{(}\hlstd{as}\hlopt{$<$}\hlstd{MapMatd}\hlopt{$>$(}\hlstd{AA}\hlopt{)));}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ double}\hlstd{\ \ \ }\hlkwb{}\hlstd{}\hlkwd{detL}\hlstd{}\hlopt{(}\hlstd{}\hlkwd{MatrixXd}\hlstd{}\hlopt{(}\hlstd{ata}\hlopt{.}\hlstd{}\hlkwd{llt}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{matrixL}\hlstd{}\hlopt{()).}\hlstd{}\hlkwd{diagonal}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{prod}\hlstd{}\hlopt{());}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{VectorXd\ }\hlkwd{Dvec}\hlstd{}\hlopt{(}\hlstd{ata}\hlopt{.}\hlstd{}\hlkwd{ldlt}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{vectorD}\hlstd{}\hlopt{());}\hspace*{\fill}\\ \hlstd{}\hlkwa{return\ }\hlstd{List}\hlopt{::}\hlstd{}\hlkwd{create}\hlstd{}\hlopt{(}\hlstd{Named}\hlopt{{(}}\hlstd{}\hlstr{"d1"}\hlstd{}\hlopt{{)}\ =\ }\hlstd{detL\ }\hlopt{{*}\ }\hlstd{detL}\hlopt{,}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{Named}\hlopt{{(}}\hlstd{}\hlstr{"d2"}\hlstd{}\hlopt{{)}\ =\ }\hlstd{Dvec}\hlopt{.}\hlstd{}\hlkwd{prod}\hlstd{}\hlopt{(),}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{Named}\hlopt{{(}}\hlstd{}\hlstr{"ld"}\hlstd{}\hlopt{{)}\ =\ }\hlstd{Dvec}\hlopt{.}\hlstd{}\hlkwd{array}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{log}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{sum}\hlstd{}\hlopt{());}\hlstd{}\hspace*{\fill}\\ \mbox{} \normalfont \normalsize \hrule \caption{\code{cholDetCpp}: Determinant of a cross-product using the ``LLt'' and ``LDLt'' forms of the Cholesky decomposition.} \label{cholDet} \end{figure} % \begin{CodeInput} R> fdet <- cxxfunction(signature(AA = "matrix"), cholDetCpp, "RcppEigen", + incl) R> unlist(ll <- fdet(A)) \end{CodeInput} \begin{CodeOutput} d1 d2 ld 54.00000 54.00000 3.98898 \end{CodeOutput} % Note the use of the \code{array()} method in the calculation of the log-determinant. Because the \code{log()} method applies to arrays, not to vectors or matrices, an array must be created from \code{Dvec} before applying the \code{log()} method. \section{Least squares solutions} \label{sec:leastSquares} A common operation in statistical computing is calculating a least squares solution, $\widehat{\bm\beta}$, defined as \begin{displaymath} \widehat{\bm\beta}=\argmin_{\bm \beta}\|\bm y-\bm X\bm\beta\|^2 \end{displaymath} where the model matrix, $\bm X$, is $n\times p$ ($n\ge p$) and $\bm y$ is an $n$-dimensional response vector. There are several ways, based on matrix decompositions, to determine such a solution. Earlier, two forms of the Cholesky decomposition were discussed: ``LLt'' and ``LDLt'', which can both be used to solve for $\widehat{\bm\beta}$. Other decompositions that can be used are the QR decomposition, with or without column pivoting, the singular value decomposition and the eigendecomposition of a symmetric matrix. Determining a least squares solution is relatively straightforward. However, statistical computing often requires additional information, such as the standard errors of the coefficient estimates. Calculating these involves evaluating the diagonal elements of $\left(\bm X^\top\bm X\right)^{-1}$ and the residual sum of squares, $\|\bm y-\bm X\widehat{\bm\beta}\|^2$. \subsection{Least squares using the ``LLt'' Cholesky} \label{sec:LLtLeastSquares} \begin{figure}[t!] \hrule \smallskip \noindent \ttfamily \hlstd{}\hlkwb{const\ }\hlstd{MapMatd}\hlstd{\ \ \ \ \ \ \ \ \ }\hlstd{}\hlkwd{X}\hlstd{}\hlopt{(}\hlstd{as}\hlopt{$<$}\hlstd{MapMatd}\hlopt{$>$(}\hlstd{XX}\hlopt{));}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{MapVecd}\hlstd{\ \ \ \ \ \ \ \ \ }\hlstd{}\hlkwd{y}\hlstd{}\hlopt{(}\hlstd{as}\hlopt{$<$}\hlstd{MapVecd}\hlopt{$>$(}\hlstd{yy}\hlopt{));}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ int}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ }\hlkwb{}\hlstd{}\hlkwd{n}\hlstd{}\hlopt{(}\hlstd{X}\hlopt{.}\hlstd{}\hlkwd{rows}\hlstd{}\hlopt{()),\ }\hlstd{}\hlkwd{p}\hlstd{}\hlopt{(}\hlstd{X}\hlopt{.}\hlstd{}\hlkwd{cols}\hlstd{}\hlopt{());}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{LLT}\hlopt{$<$}\hlstd{MatrixXd}\hlopt{$>$\ }\hlstd{}\hlkwd{llt}\hlstd{}\hlopt{(}\hlstd{}\hlkwd{AtA}\hlstd{}\hlopt{(}\hlstd{X}\hlopt{));}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{VectorXd}\hlstd{\ \ }\hlstd{}\hlkwd{betahat}\hlstd{}\hlopt{(}\hlstd{llt}\hlopt{.}\hlstd{}\hlkwd{solve}\hlstd{}\hlopt{(}\hlstd{X}\hlopt{.}\hlstd{}\hlkwd{adjoint}\hlstd{}\hlopt{()\ {*}\ }\hlstd{y}\hlopt{));}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{VectorXd}\hlstd{\ \ \ }\hlstd{}\hlkwd{fitted}\hlstd{}\hlopt{(}\hlstd{X\ }\hlopt{{*}\ }\hlstd{betahat}\hlopt{);}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{VectorXd}\hlstd{\ \ \ \ }\hlstd{}\hlkwd{resid}\hlstd{}\hlopt{(}\hlstd{y\ }\hlopt{{-}\ }\hlstd{fitted}\hlopt{);}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ int}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ }\hlkwb{}\hlstd{}\hlkwd{df}\hlstd{}\hlopt{(}\hlstd{n\ }\hlopt{{-}\ }\hlstd{p}\hlopt{);}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ double}\hlstd{\ \ \ \ \ \ \ \ \ \ }\hlkwb{}\hlstd{}\hlkwd{s}\hlstd{}\hlopt{(}\hlstd{resid}\hlopt{.}\hlstd{}\hlkwd{norm}\hlstd{}\hlopt{()\ /\ }\hlstd{std}\hlopt{::}\hlstd{}\hlkwd{sqrt}\hlstd{}\hlopt{(}\hlstd{}\hlkwb{double}\hlstd{}\hlopt{(}\hlstd{df}\hlopt{)));}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{VectorXd}\hlstd{\ \ \ \ \ \ \ }\hlstd{}\hlkwd{se}\hlstd{}\hlopt{(}\hlstd{s\ }\hlopt{{*}\ }\hlstd{llt}\hlopt{.}\hlstd{}\hlkwd{matrixL}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{solve}\hlstd{}\hlopt{(}\hlstd{MatrixXd}\hlopt{::}\hlstd{}\hlkwd{Identity}\hlstd{}\hlopt{(}\hlstd{p}\hlopt{,\ }\hlstd{p}\hlopt{))}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{}\hlopt{.}\hlstd{}\hlkwd{colwise}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{norm}\hlstd{}\hlopt{());}\hspace*{\fill}\\ \hlstd{}\hlkwa{return}\hlstd{\ \ \ \ \ }\hlkwa{}\hlstd{List}\hlopt{::}\hlstd{}\hlkwd{create}\hlstd{}\hlopt{(}\hlstd{Named}\hlopt{{(}}\hlstd{}\hlstr{"coefficients"}\hlstd{}\hlopt{{)}}\hlstd{\ \ \ }\hlopt{=\ }\hlstd{betahat}\hlopt{,}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{Named}\hlopt{{(}}\hlstd{}\hlstr{"fitted.values"}\hlstd{}\hlopt{{)}}\hlstd{\ \ }\hlopt{=\ }\hlstd{fitted}\hlopt{,}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{Named}\hlopt{{(}}\hlstd{}\hlstr{"residuals"}\hlstd{}\hlopt{{)}}\hlstd{\ \ \ \ \ \ }\hlopt{=\ }\hlstd{resid}\hlopt{,}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{Named}\hlopt{{(}}\hlstd{}\hlstr{"s"}\hlstd{}\hlopt{{)}}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlopt{=\ }\hlstd{s}\hlopt{,}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{Named}\hlopt{{(}}\hlstd{}\hlstr{"df.residual"}\hlstd{}\hlopt{{)}}\hlstd{\ \ \ \ }\hlopt{=\ }\hlstd{df}\hlopt{,}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{Named}\hlopt{{(}}\hlstd{}\hlstr{"rank"}\hlstd{}\hlopt{{)}}\hlstd{\ \ \ \ \ \ \ \ \ \ \ }\hlopt{=\ }\hlstd{p}\hlopt{,}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{Named}\hlopt{{(}}\hlstd{}\hlstr{"Std.\ Error"}\hlstd{}\hlopt{{)}}\hlstd{\ \ \ \ \ }\hlopt{=\ }\hlstd{se}\hlopt{);}\hlstd{}\hspace*{\fill}\\ \mbox{} \normalfont \normalsize \hrule \caption{\code{lltLSCpp}: Least squares using the Cholesky decomposition. \label{lltLS}} \end{figure} Figure~\ref{lltLS} shows a calculation of the least squares coefficient estimates (\code{betahat}) and the standard errors (\code{se}) through an ``LLt'' Cholesky decomposition of the crossproduct of the model matrix, $\bm X$. Next, the results from this calculation are compared to those from the \code{lm.fit} function in \proglang{R} (\code{lm.fit} is the workhorse function called by \code{lm} once the model matrix and response have been evaluated). \begin{CodeInput} R> lltLS <- cxxfunction(signature(XX = "matrix", yy = "numeric"), lltLSCpp, + "RcppEigen", incl) R> data("trees", package = "datasets") R> str(lltFit <- with(trees, lltLS(cbind(1, log(Girth)), log(Volume)))) \end{CodeInput} \begin{CodeOutput} List of 7 $ coefficients : num [1:2] -2.35 2.2 $ fitted.values: num [1:31] 2.3 2.38 2.43 2.82 2.86 ... $ residuals : num [1:31] 0.0298 -0.0483 -0.1087 -0.0223 0.0727 ... $ s : num 0.115 $ df.residual : int 29 $ rank : int 2 $ Std. Error : num [1:2] 0.2307 0.0898 R> str(lmFit <- with(trees, lm.fit(cbind(1, log(Girth)), log(Volume)))) List of 8 $ coefficients : Named num [1:2] -2.35 2.2 ..- attr(*, "names")= chr [1:2] "x1" "x2" $ residuals : num [1:31] 0.0298 -0.0483 -0.1087 -0.0223 0.0727 ... $ effects : Named num [1:31] -18.2218 2.8152 -0.1029 -0.0223 0.0721 ... ..- attr(*, "names")= chr [1:31] "x1" "x2" "" "" ... $ rank : int 2 $ fitted.values: num [1:31] 2.3 2.38 2.43 2.82 2.86 ... $ assign : NULL $ qr :List of 5 ..$ qr : num [1:31, 1:2] -5.57 0.18 0.18 0.18 0.18 ... ..$ qraux: num [1:2] 1.18 1.26 ..$ pivot: int [1:2] 1 2 ..$ tol : num 1e-07 ..$ rank : int 2 ..- attr(*, "class")= chr "qr" $ df.residual : int 29 \end{CodeOutput} \begin{CodeInput} R> for(nm in c("coefficients", "residuals", "fitted.values", "rank")) + stopifnot(all.equal(lltFit[[nm]], unname(lmFit[[nm]]))) R> stopifnot(all.equal(lltFit[["Std. Error"]], + unname(coef(summary(lm(log(Volume) ~ log(Girth), trees)))[,2]))) \end{CodeInput} There are several aspects of the \proglang{C++} code in Figure~\ref{lltLS} worth mentioning. The \code{solve} method for the \code{LLT} object evaluates, in this case, $\left(\bm X^\top\bm X\right)^{-1}\bm X^\top\bm y$ but without actually evaluating the inverse. The calculation of the residuals, $\bm y-\widehat{\bm y}$, can be written, as in \proglang{R}, as \code{y - fitted}. (But note that \pkg{Eigen} classes do not have a ``recycling rule'' as in \proglang{R}. That is, the two vector operands must have the same length.) The \code{norm()} method evaluates the square root of the sum of squares of the elements of a vector. Although one does not explicitly evaluate $\left(\bm X^\top\bm X\right)^{-1}$ one does evaluate $\bm L^{-1}$ to obtain the standard errors. Note also the use of the \code{colwise()} method in the evaluation of the standard errors. It applies a method to the columns of a matrix, returning a vector. The \pkg{Eigen} \code{colwise()} and \code{rowwise()} methods are similar in effect to the \code{apply} function in \proglang{R}. In the descriptions of other methods for solving least squares problems, much of the code parallels that shown in Figure~\ref{lltLS}. The redundant parts are omitted, and only the evaluation of the coefficients, the rank and the standard errors is shown. Actually, the standard errors are calculated only up to the scalar multiple of $s$, the residual standard error, in these code fragments. The calculation of the residuals and $s$ and the scaling of the coefficient standard errors is the same for all methods. (See the files \code{fastLm.h} and \code{fastLm.cpp} in the \pkg{RcppEigen} source package for details.) \subsection{Least squares using the unpivoted QR decomposition} \label{sec:QR} A QR decomposition has the form \begin{displaymath} \bm X=\bm Q\bm R=\bm Q_1\bm R_1 \end{displaymath} where $\bm Q$ is an $n\times n$ orthogonal matrix, which means that $\bm Q^\top\bm Q=\bm Q\bm Q^\top=\bm I_n$, and the $n\times p$ matrix $\bm R$ is zero below the main diagonal. The $n\times p$ matrix $\bm Q_1$ is the first $p$ columns of $\bm Q$ and the $p\times p$ upper triangular matrix $\bm R_1$ is the top $p$ rows of $\bm R$. There are three \pkg{Eigen} classes for the QR decomposition: \code{HouseholderQR} provides the basic QR decomposition using Householder transformations, \code{ColPivHouseholderQR} incorporates column pivots and \code{FullPivHouseholderQR} incorporates both row and column pivots. Figure~\ref{QRLS} shows a least squares solution using the unpivoted QR decomposition. The calculations in Figure~\ref{QRLS} are quite similar to those in Figure~\ref{lltLS}. In fact, if one had extracted the upper triangular factor (the \code{matrixU()} method) from the \code{LLT} object in Figure~\ref{lltLS}, the rest of the code would be nearly identical. \begin{figure}[t!] \hrule \smallskip \noindent \ttfamily \hlstd{}\hlkwa{using\ }\hlstd{Eigen}\hlopt{::}\hlstd{HouseholderQR}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hspace*{\fill}\\ \hlkwb{const\ }\hlstd{HouseholderQR}\hlopt{$<$}\hlstd{MatrixXd}\hlopt{$>$\ }\hlstd{}\hlkwd{QR}\hlstd{}\hlopt{(}\hlstd{X}\hlopt{);}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{VectorXd\ }\hlkwd{betahat}\hlstd{}\hlopt{(}\hlstd{QR}\hlopt{.}\hlstd{}\hlkwd{solve}\hlstd{}\hlopt{(}\hlstd{y}\hlopt{));}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{VectorXd}\hlstd{\ \ }\hlstd{}\hlkwd{fitted}\hlstd{}\hlopt{(}\hlstd{X\ }\hlopt{{*}\ }\hlstd{betahat}\hlopt{);}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ int}\hlstd{\ \ \ \ \ \ \ \ \ \ \ }\hlkwb{}\hlstd{}\hlkwd{df}\hlstd{}\hlopt{(}\hlstd{n\ }\hlopt{{-}\ }\hlstd{p}\hlopt{);}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{VectorXd}\hlstd{\ \ \ \ \ \ }\hlstd{}\hlkwd{se}\hlstd{}\hlopt{(}\hlstd{QR}\hlopt{.}\hlstd{}\hlkwd{matrixQR}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{topRows}\hlstd{}\hlopt{(}\hlstd{p}\hlopt{).}\hlstd{triangularView}\hlopt{$<$}\hlstd{Upper}\hlopt{$>$()}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{}\hlopt{.}\hlstd{}\hlkwd{solve}\hlstd{}\hlopt{(}\hlstd{MatrixXd}\hlopt{::}\hlstd{}\hlkwd{Identity}\hlstd{}\hlopt{(}\hlstd{p}\hlopt{,}\hlstd{p}\hlopt{)).}\hlstd{}\hlkwd{rowwise}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{norm}\hlstd{}\hlopt{());}\hlstd{}\hspace*{\fill}\\ \mbox{} \normalfont \normalsize \hrule \caption{\code{QRLSCpp}: Least squares using the unpivoted QR decomposition. \label{QRLS}} \end{figure} \subsection{Handling the rank-deficient case} \label{sec:rankdeficient} One important consideration when determining least squares solutions is whether $\rank(\bm X)$ is $p$, a situation described by saying that $\bm X$ has ``full column rank''. When $\bm X$ does not have full column rank it is said to be ``rank deficient''. Although the theoretical rank of a matrix is well-defined, its evaluation in practice is not. At best one can compute an effective rank according to some tolerance. Decompositions that allow to estimation of the rank of the matrix in this way are said to be ``rank-revealing''. Because the \code{model.matrix} function in \proglang{R} does a considerable amount of symbolic analysis behind the scenes, one usually ends up with full-rank model matrices. The common cases of rank-deficiency, such as incorporating both a constant term and a full set of indicators columns for the levels of a factor, are eliminated. Other, more subtle, situations will not be detected at this stage, however. A simple example occurs when there is a ``missing cell'' in a two-way layout and the interaction of the two factors is included in the model. \begin{CodeInput} R> dd <- data.frame(f1 = gl(4, 6, labels = LETTERS[1:4]), + f2 = gl(3, 2, labels = letters[1:3]))[-(7:8), ] R> xtabs(~ f2 + f1, dd) \end{CodeInput} \begin{CodeOutput} f1 f2 A B C D a 2 0 2 2 b 2 2 2 2 c 2 2 2 2 \end{CodeOutput} \begin{CodeInput} R> mm <- model.matrix(~ f1 * f2, dd) R> kappa(mm) \end{CodeInput} \begin{CodeOutput} [1] 4.30923e+16 \end{CodeOutput} This yields a large condition number, indicating rank deficiency. Alternatively, the reciprocal condition number can be evaluated. \begin{CodeInput} R> rcond(mm) \end{CodeInput} \begin{CodeOutput} [1] 2.3206e-17 \end{CodeOutput} \begin{CodeInput} R> (c(rank = qr(mm)$rank, p = ncol(mm))) \end{CodeInput} \begin{CodeOutput} rank p 11 12 \end{CodeOutput} \begin{CodeInput} R> set.seed(1) R> dd$y <- mm %*% seq_len(ncol(mm)) + rnorm(nrow(mm), sd = 0.1) R> fm1 <- lm(y ~ f1 * f2, dd) R> writeLines(capture.output(print(summary(fm1), + signif.stars = FALSE))[9:22]) \end{CodeInput} \begin{CodeOutput} Coefficients: (1 not defined because of singularities) Estimate Std. Error t value Pr(>|t|) (Intercept) 0.9779 0.0582 16.8 3.4e-09 f1B 12.0381 0.0823 146.3 < 2e-16 f1C 3.1172 0.0823 37.9 5.2e-13 f1D 4.0685 0.0823 49.5 2.8e-14 f2b 5.0601 0.0823 61.5 2.6e-15 f2c 5.9976 0.0823 72.9 4.0e-16 f1B:f2b -3.0148 0.1163 -25.9 3.3e-11 f1C:f2b 7.7030 0.1163 66.2 1.2e-15 f1D:f2b 8.9643 0.1163 77.1 < 2e-16 f1B:f2c NA NA NA NA f1C:f2c 10.9613 0.1163 94.2 < 2e-16 f1D:f2c 12.0411 0.1163 103.5 < 2e-16 \end{CodeOutput} The \code{lm} function for fitting linear models in \proglang{R} uses a rank-revealing form of the QR decomposition. When the model matrix is determined to be rank deficient, according to the threshold used in \proglang{R}'s \code{qr} function, the model matrix is reduced to $\rank{(\bm X)}$ columns by pivoting selected columns (those that are apparently linearly dependent on columns to their left) to the right hand side of the matrix. A solution for this reduced model matrix is determined and the coefficients and standard errors for the redundant columns are flagged as missing. An alternative approach is to evaluate the ``pseudo-inverse'' of $\bm X$ from the singular value decomposition (SVD) of $\bm X$ or the eigendecomposition of $\bm X^\top\bm X$. The SVD is of the form \begin{displaymath} \bm X=\bm U\bm D\bm V^\top=\bm U_1\bm D_1\bm V^\top \end{displaymath} where $\bm U$ is an orthogonal $n\times n$ matrix and $\bm U_1$ is its leftmost $p$ columns, $\bm D$ is $n\times p$ and zero off the main diagonal so that $\bm D_1$ is a $p\times p$ diagonal matrix with non-increasing, non-negative diagonal elements, and $\bm V$ is a $p\times p$ orthogonal matrix. The pseudo-inverse of $\bm D_1$, written $\bm D_1^+$ is a $p\times p$ diagonal matrix whose first $r=\rank(\bm X)$ diagonal elements are the inverses of the corresponding diagonal elements of $\bm D_1$ and whose last $p-r$ diagonal elements are zero. The tolerance for determining if an element of the diagonal of $\bm D_1$ is considered to be (effectively) zero is a multiple of the largest singular value (i.e., the $(1,1)$ element of $\bm D$). The pseudo-inverse, $\bm X^+$, of $\bm X$ is defined as \begin{displaymath} \bm X^+=\bm V\bm D_1^+\bm U_1^\top . \end{displaymath} \begin{figure}[t!] \hrule \smallskip \noindent \ttfamily \hlstd{}\hlkwc{inline\ }\hlstd{ArrayXd\ }\hlkwd{Dplus}\hlstd{}\hlopt{(}\hlstd{}\hlkwb{const\ }\hlstd{ArrayXd}\hlopt{\&\ }\hlstd{d}\hlopt{)\ \{}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ }\hlstd{ArrayXd}\hlstd{\ \ \ }\hlstd{}\hlkwd{di}\hlstd{}\hlopt{(}\hlstd{d}\hlopt{.}\hlstd{}\hlkwd{size}\hlstd{}\hlopt{());}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ }\hlstd{}\hlkwb{double}\hlstd{\ \ }\hlkwb{}\hlstd{}\hlkwd{comp}\hlstd{}\hlopt{(}\hlstd{d}\hlopt{.}\hlstd{}\hlkwd{maxCoeff}\hlstd{}\hlopt{()\ {*}\ }\hlstd{}\hlkwd{threshold}\hlstd{}\hlopt{());}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ }\hlstd{}\hlkwa{for\ }\hlstd{}\hlopt{(}\hlstd{}\hlkwb{int\ }\hlstd{j\ }\hlopt{=\ }\hlstd{}\hlnum{0}\hlstd{}\hlopt{;\ }\hlstd{j\ }\hlopt{$<$\ }\hlstd{d}\hlopt{.}\hlstd{}\hlkwd{size}\hlstd{}\hlopt{();\ ++}\hlstd{j}\hlopt{)\ }\hlstd{di}\hlopt{{[}}\hlstd{j}\hlopt{{]}\ =\ (}\hlstd{d}\hlopt{{[}}\hlstd{j}\hlopt{{]}\ $<$\ }\hlstd{comp}\hlopt{)\ }\hlstd{?\ }\hlnum{0}\hlstd{}\hlopt{.\ :\ }\hlstd{}\hlnum{1}\hlstd{}\hlopt{./}\hlstd{d}\hlopt{{[}}\hlstd{j}\hlopt{{]};}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ }\hlstd{}\hlkwa{return\ }\hlstd{di}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlopt{\}}\hlstd{}\hspace*{\fill}\\ \mbox{} \normalfont \normalsize \hrule \caption{\code{DplusCpp}: Create the diagonal $\bm d^+$ of the pseudo-inverse, $\bm D_1^+$, from the array of singular values, $\bm d$. \label{Dplus}} \end{figure} In Figure~\ref{Dplus} a utility function, \code{Dplus}, is defined to return the diagonal of the pseudo-inverse, $\bm D_1^+$, as an array, given the singular values (the diagonal of $\bm D_1$) as an array. Calculation of the maximum element of $\bm d$ (the method is called \code{.maxCoeff()}) and the use of a \code{threshold()} function provides greater generality for the function. It can be used on the eigenvalues of $\bm X^\top\bm X$, as shown in Section~\ref{sec:eigendecomp}, even though these are returned in increasing order, as opposed to the decreasing order of the singular values. \subsection{Least squares using the SVD} \label{sec:SVDls} \begin{figure}[b!] \hrule \smallskip \noindent \ttfamily \hlkwb{const\ }\hlstd{Eigen}\hlopt{::}\hlstd{JacobiSVD}\hlopt{$<$}\hlstd{MatrixXd}\hlopt{$>$}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ }\hlstd{}\hlkwd{UDV}\hlstd{}\hlopt{(}\hlstd{X}\hlopt{.}\hlstd{}\hlkwd{jacobiSvd}\hlstd{}\hlopt{(}\hlstd{Eigen}\hlopt{::}\hlstd{ComputeThinU}\hlopt{\textbar }\hlstd{Eigen}\hlopt{::}\hlstd{ComputeThinV}\hlopt{));}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{ArrayXd}\hlstd{\ \ \ \ \ \ \ }\hlstd{}\hlkwd{Dp}\hlstd{}\hlopt{(}\hlstd{}\hlkwd{Dplus}\hlstd{}\hlopt{(}\hlstd{UDV}\hlopt{.}\hlstd{}\hlkwd{singularValues}\hlstd{}\hlopt{()));}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ int}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ }\hlkwb{}\hlstd{}\hlkwd{r}\hlstd{}\hlopt{((}\hlstd{Dp\ }\hlopt{$>$\ }\hlstd{}\hlnum{0}\hlstd{}\hlopt{).}\hlstd{}\hlkwd{count}\hlstd{}\hlopt{());}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{MatrixXd}\hlstd{\ \ \ \ \ }\hlstd{}\hlkwd{VDp}\hlstd{}\hlopt{(}\hlstd{UDV}\hlopt{.}\hlstd{}\hlkwd{matrixV}\hlstd{}\hlopt{()\ {*}\ }\hlstd{Dp}\hlopt{.}\hlstd{}\hlkwd{matrix}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{asDiagonal}\hlstd{}\hlopt{());}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{VectorXd\ }\hlkwd{betahat}\hlstd{}\hlopt{(}\hlstd{VDp\ }\hlopt{{*}\ }\hlstd{UDV}\hlopt{.}\hlstd{}\hlkwd{matrixU}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{adjoint}\hlstd{}\hlopt{()\ {*}\ }\hlstd{y}\hlopt{);}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{VectorXd}\hlstd{\ \ \ \ \ \ }\hlstd{}\hlkwd{se}\hlstd{}\hlopt{(}\hlstd{s\ }\hlopt{{*}\ }\hlstd{VDp}\hlopt{.}\hlstd{}\hlkwd{rowwise}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{norm}\hlstd{}\hlopt{());}\hlstd{}\hspace*{\fill}\\ \mbox{} \normalfont \normalsize \hrule \caption{\code{SVDLSCpp}: Least squares using the SVD. \label{SVDLS}} \end{figure} With these definitions the code for least squares using the singular value decomposition can be written as in Figure~\ref{SVDLS}. Even in the rank-deficient case this code will produce a complete set of coefficients and their standard errors. The user must be careful to check if the computed rank is less than $p$, the number of columns in $\bm X$, in which case the estimated coefficients are just one of an infinite number of coefficient vectors that produce the same fitted values. It happens that the solution from this pseudo-inverse is the minimum norm solution (in the sense that $\|\bm\beta\|^2$ is minimized among those $\bm\beta$ that minimize $\|\bm y-\bm X\bm\beta\|^2$). The standard errors of the coefficient estimates in the rank-deficient case must be interpreted carefully. The solution with one or more missing coefficients, as returned by the \code{lm.fit} function in \proglang{R} and by the column-pivoted QR decomposition described in Section~\ref{sec:colPivQR}, does not provide standard errors for the missing coefficients. That is, both the coefficient and its standard error are returned as \code{NA} because the least squares solution is performed on a reduced model matrix. It is also true that the solution returned by the SVD method is with respect to a reduced model matrix but the $p$ coefficient estimates and their $p$ standard errors don't show this. They are, in fact, linear combinations of a set of $r$ coefficient estimates and their standard errors. \pagebreak \subsection{Least squares using the eigendecomposition} \label{sec:eigendecomp} The eigendecomposition of $\bm X^\top\bm X$ is defined as \begin{displaymath} \bm X^\top\bm X=\bm V\bm\Lambda\bm V^\top \end{displaymath} where $\bm V$, the matrix of eigenvectors, is a $p\times p$ orthogonal matrix and $\bm\Lambda$ is a $p\times p$ diagonal matrix with non-decreasing, non-negative diagonal elements, called the eigenvalues of $\bm X^\top\bm X$. When the eigenvalues are distinct, this $\bm V$ is the same (up to reordering of the columns) as that in the SVD and the eigenvalues of $\bm X^\top\bm X$ are the (reordered) squares of the singular values of $\bm X$. With these definitions one can adapt much of the code from the SVD method for the eigendecomposition, as shown in Figure~\ref{SymmEigLS}. \begin{figure}[t!] \hrule \smallskip \noindent \ttfamily \hlkwb{const\ }\hlstd{Eigen}\hlopt{::}\hlstd{SelfAdjointEigenSolver}\hlopt{$<$}\hlstd{MatrixXd}\hlopt{$>$\ }\hlstd{}\hlkwd{VLV}\hlstd{}\hlopt{(}\hlstd{}\hlkwd{AtA}\hlstd{}\hlopt{(}\hlstd{X}\hlopt{));}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{ArrayXd}\hlstd{\ \ \ \ \ \ \ }\hlstd{}\hlkwd{Dp}\hlstd{}\hlopt{(}\hlstd{}\hlkwd{Dplus}\hlstd{}\hlopt{(}\hlstd{eig}\hlopt{.}\hlstd{}\hlkwd{eigenvalues}\hlstd{}\hlopt{()).}\hlstd{}\hlkwd{sqrt}\hlstd{}\hlopt{());}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ int}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ }\hlkwb{}\hlstd{}\hlkwd{r}\hlstd{}\hlopt{((}\hlstd{Dp\ }\hlopt{$>$\ }\hlstd{}\hlnum{0}\hlstd{}\hlopt{).}\hlstd{}\hlkwd{count}\hlstd{}\hlopt{());}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{MatrixXd}\hlstd{\ \ \ \ \ }\hlstd{}\hlkwd{VDp}\hlstd{}\hlopt{(}\hlstd{VLV}\hlopt{.}\hlstd{}\hlkwd{eigenvectors}\hlstd{}\hlopt{()\ {*}\ }\hlstd{Dp}\hlopt{.}\hlstd{}\hlkwd{matrix}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{asDiagonal}\hlstd{}\hlopt{());}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{VectorXd\ }\hlkwd{betahat}\hlstd{}\hlopt{(}\hlstd{VDp\ }\hlopt{{*}\ }\hlstd{VDp}\hlopt{.}\hlstd{}\hlkwd{adjoint}\hlstd{}\hlopt{()\ {*}\ }\hlstd{X}\hlopt{.}\hlstd{}\hlkwd{adjoint}\hlstd{}\hlopt{()\ {*}\ }\hlstd{y}\hlopt{);}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{VectorXd}\hlstd{\ \ \ \ \ \ }\hlstd{}\hlkwd{se}\hlstd{}\hlopt{(}\hlstd{s\ }\hlopt{{*}\ }\hlstd{VDp}\hlopt{.}\hlstd{}\hlkwd{rowwise}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{norm}\hlstd{}\hlopt{());}\hlstd{}\hspace*{\fill}\\ \mbox{} \normalfont \normalsize \hrule \caption{\code{SymmEigLSCpp}: Least squares using the eigendecomposition. \label{SymmEigLS}} \end{figure} \subsection{Least squares using the column-pivoted QR decomposition} \label{sec:colPivQR} The column-pivoted QR decomposition provides results similar to those from \proglang{R} in both the full-rank and the rank-deficient cases. The decomposition is of the form \begin{displaymath} \bm X\bm P=\bm Q\bm R=\bm Q_1\bm R_1 \end{displaymath} where, as before, $\bm Q$ is $n\times n$ and orthogonal and $\bm R$ is $n\times p$ and upper triangular. The $p\times p$ matrix $\bm P$ is a permutation matrix. That is, its columns are a permutation of the columns of $\bm I_p$. It serves to reorder the columns of $\bm X$ so that the diagonal elements of $\bm R$ are non-increasing in magnitude. An instance of the class \code{Eigen::ColPivHouseholderQR} has a \code{rank()} method returning the computational rank of the matrix. When $\bm X$ is of full rank one can use essentially the same code as in the unpivoted decomposition except that one must reorder the standard errors. When $\bm X$ is rank-deficient, the coefficients and standard errors are evaluated for the leading $r$ columns of $\bm X\bm P$ only. In the rank-deficient case the straightforward calculation of the fitted values, as $\bm X\widehat{\bm\beta}$, cannot be used because some of the estimated coefficients, $\widehat{\bm\beta}$, are \code{NA} so the product will be a vector of \code{NA}'s. One could do some complicated rearrangement of the columns of X and the coefficient estimates but it is conceptually (and computationally) easier to employ the relationship \begin{displaymath} \widehat{\bm y} = \bm Q_1\bm Q_1^\top\bm y=\bm Q \begin{bmatrix} \bm I_r & \bm 0\\ \bm 0 & \bm 0 \end{bmatrix} \bm Q^\top\bm y \end{displaymath} The vector $\bm Q^\top\bm y$ is called the ``effects'' vector in \proglang{R}. \begin{figure}[t!] \hrule \smallskip \noindent \ttfamily \hlstd{}\hlkwc{typedef\ }\hlstd{Eigen}\hlopt{::}\hlstd{ColPivHouseholderQR}\hlopt{$<$}\hlstd{MatrixXd}\hlopt{$>$}\hlstd{\ \ }\hlopt{}\hlstd{CPivQR}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwc{typedef\ }\hlstd{CPivQR}\hlopt{::}\hlstd{PermutationType}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{Permutation}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hspace*{\fill}\\ \hlkwb{const\ }\hlstd{CPivQR}\hlstd{\ \ \ \ \ \ \ }\hlstd{}\hlkwd{PQR}\hlstd{}\hlopt{(}\hlstd{X}\hlopt{);}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{Permutation\ }\hlkwd{Pmat}\hlstd{}\hlopt{(}\hlstd{PQR}\hlopt{.}\hlstd{}\hlkwd{colsPermutation}\hlstd{}\hlopt{());}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ int}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ }\hlkwb{}\hlstd{}\hlkwd{r}\hlstd{}\hlopt{(}\hlstd{PQR}\hlopt{.}\hlstd{}\hlkwd{rank}\hlstd{}\hlopt{());}\hspace*{\fill}\\ \hlstd{VectorXd}\hlstd{\ \ \ \ \ \ \ }\hlstd{betahat}\hlopt{,\ }\hlstd{fitted}\hlopt{,\ }\hlstd{se}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwa{if\ }\hlstd{}\hlopt{(}\hlstd{r\ }\hlopt{==\ }\hlstd{X}\hlopt{.}\hlstd{}\hlkwd{cols}\hlstd{}\hlopt{())\ \{\ }\hlstd{}\hlslc{//\ full\ rank\ case}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ }\hlstd{betahat\ }\hlopt{=\ }\hlstd{PQR}\hlopt{.}\hlstd{}\hlkwd{solve}\hlstd{}\hlopt{(}\hlstd{y}\hlopt{);}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ }\hlstd{fitted}\hlstd{\ \ }\hlstd{}\hlopt{=\ }\hlstd{X\ }\hlopt{{*}\ }\hlstd{betahat}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ }\hlstd{se}\hlstd{\ \ \ \ \ \ }\hlstd{}\hlopt{=\ }\hlstd{Pmat\ }\hlopt{{*}\ }\hlstd{PQR}\hlopt{.}\hlstd{}\hlkwd{matrixQR}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{topRows}\hlstd{}\hlopt{(}\hlstd{p}\hlopt{).}\hlstd{triangularView}\hlopt{$<$}\hlstd{Upper}\hlopt{$>$()}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{}\hlopt{.}\hlstd{}\hlkwd{solve}\hlstd{}\hlopt{(}\hlstd{MatrixXd}\hlopt{::}\hlstd{}\hlkwd{Identity}\hlstd{}\hlopt{(}\hlstd{p}\hlopt{,\ }\hlstd{p}\hlopt{)).}\hlstd{}\hlkwd{rowwise}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{norm}\hlstd{}\hlopt{();}\hspace*{\fill}\\ \hlstd{}\hlopt{\}\ }\hlstd{}\hlkwa{else\ }\hlstd{}\hlopt{\{}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ }\hlstd{MatrixXd}\hlstd{\ \ \ \ }\hlstd{}\hlkwd{Rinv}\hlstd{}\hlopt{(}\hlstd{PQR}\hlopt{.}\hlstd{}\hlkwd{matrixQR}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{topLeftCorner}\hlstd{}\hlopt{(}\hlstd{r}\hlopt{,\ }\hlstd{r}\hlopt{)}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ }\hlstd{}\hlopt{.}\hlstd{triangularView}\hlopt{$<$}\hlstd{Upper}\hlopt{$>$().}\hlstd{}\hlkwd{solve}\hlstd{}\hlopt{(}\hlstd{MatrixXd}\hlopt{::}\hlstd{}\hlkwd{Identity}\hlstd{}\hlopt{(}\hlstd{r}\hlopt{,\ }\hlstd{r}\hlopt{)));}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ }\hlstd{VectorXd\ }\hlkwd{effects}\hlstd{}\hlopt{(}\hlstd{PQR}\hlopt{.}\hlstd{}\hlkwd{householderQ}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{adjoint}\hlstd{}\hlopt{()\ {*}\ }\hlstd{y}\hlopt{);}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ }\hlstd{betahat}\hlopt{.}\hlstd{}\hlkwd{fill}\hlstd{}\hlopt{(::}\hlstd{NA\textunderscore REAL}\hlopt{);}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ }\hlstd{betahat}\hlopt{.}\hlstd{}\hlkwd{head}\hlstd{}\hlopt{(}\hlstd{r}\hlopt{)}\hlstd{\ \ }\hlopt{=\ }\hlstd{Rinv\ }\hlopt{{*}\ }\hlstd{effects}\hlopt{.}\hlstd{}\hlkwd{head}\hlstd{}\hlopt{(}\hlstd{r}\hlopt{);}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ }\hlstd{betahat}\hlstd{\ \ \ \ \ \ \ \ \ \ }\hlstd{}\hlopt{=\ }\hlstd{Pmat\ }\hlopt{{*}\ }\hlstd{betahat}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ }\hlstd{se}\hlopt{.}\hlstd{}\hlkwd{fill}\hlstd{}\hlopt{(::}\hlstd{NA\textunderscore REAL}\hlopt{);}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ }\hlstd{se}\hlopt{.}\hlstd{}\hlkwd{head}\hlstd{}\hlopt{(}\hlstd{r}\hlopt{)}\hlstd{\ \ \ \ \ \ \ }\hlopt{=\ }\hlstd{Rinv}\hlopt{.}\hlstd{}\hlkwd{rowwise}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{norm}\hlstd{}\hlopt{();}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ }\hlstd{se}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{}\hlopt{=\ }\hlstd{Pmat\ }\hlopt{{*}\ }\hlstd{se}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{}\hlslc{//\ create\ fitted\ values\ from\ effects}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ }\hlstd{effects}\hlopt{.}\hlstd{}\hlkwd{tail}\hlstd{}\hlopt{(}\hlstd{X}\hlopt{.}\hlstd{}\hlkwd{rows}\hlstd{}\hlopt{()\ {-}\ }\hlstd{r}\hlopt{).}\hlstd{}\hlkwd{setZero}\hlstd{}\hlopt{();}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ }\hlstd{fitted}\hlstd{\ \ \ \ \ \ \ \ \ \ \ }\hlstd{}\hlopt{=\ }\hlstd{PQR}\hlopt{.}\hlstd{}\hlkwd{householderQ}\hlstd{}\hlopt{()\ {*}\ }\hlstd{effects}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlopt{\}}\hlstd{}\hspace*{\fill}\\ \mbox{} \normalfont \normalsize \hrule \caption{\code{ColPivQRLSCpp}: Least squares using the pivoted QR decomposition. \label{ColPivQRLS}} \end{figure} Just to check that the code in Figure~\ref{ColPivQRLS} does indeed provide the desired answer \begin{CodeInput} R> print(summary(fmPQR <- fastLm(y ~ f1 * f2, dd)), signif.st = FALSE) \end{CodeInput} \begin{CodeOutput} Call: fastLm.formula(formula = y ~ f1 * f2, data = dd) Estimate Std. Error t value Pr(>|t|) (Intercept) 0.977859 0.058165 16.812 3.413e-09 f1B 12.038068 0.082258 146.346 < 2.2e-16 f1C 3.117222 0.082258 37.896 5.221e-13 f1D 4.068523 0.082258 49.461 2.833e-14 f2b 5.060123 0.082258 61.516 2.593e-15 f2c 5.997592 0.082258 72.912 4.015e-16 f1B:f2b -3.014763 0.116330 -25.916 3.266e-11 f1C:f2b 7.702999 0.116330 66.217 1.156e-15 f1D:f2b 8.964251 0.116330 77.059 < 2.2e-16 f1B:f2c NA NA NA NA f1C:f2c 10.961326 0.116330 94.226 < 2.2e-16 f1D:f2c 12.041081 0.116330 103.508 < 2.2e-16 Residual standard error: 0.2868 on 11 degrees of freedom Multiple R-squared: 0.9999, Adjusted R-squared: 0.9999 \end{CodeOutput} \begin{CodeInput} R> all.equal(coef(fm1), coef(fmPQR)) \end{CodeInput} \begin{CodeOutput} [1] TRUE \end{CodeOutput} \begin{CodeInput} R> all.equal(unname(fitted(fm1)), fitted(fmPQR)) \end{CodeInput} \begin{CodeOutput} [1] TRUE \end{CodeOutput} \begin{CodeInput} R> all.equal(unname(residuals(fm1)), residuals(fmPQR)) \end{CodeInput} \begin{CodeOutput} [1] TRUE \end{CodeOutput} The rank-revealing SVD method produces the same fitted values but not the same coefficients. \begin{CodeInput} R> print(summary(fmSVD <- fastLm(y ~ f1 * f2, dd, method = 4)), + signif.st = FALSE) \end{CodeInput} \begin{CodeOutput} Call: fastLm.formula(formula = y ~ f1 * f2, data = dd, method = 4) Estimate Std. Error t value Pr(>|t|) (Intercept) 0.977859 0.058165 16.812 3.413e-09 f1B 7.020458 0.038777 181.049 < 2.2e-16 f1C 3.117222 0.082258 37.896 5.221e-13 f1D 4.068523 0.082258 49.461 2.833e-14 f2b 5.060123 0.082258 61.516 2.593e-15 f2c 5.997592 0.082258 72.912 4.015e-16 f1B:f2b 2.002847 0.061311 32.667 2.638e-12 f1C:f2b 7.702999 0.116330 66.217 1.156e-15 f1D:f2b 8.964251 0.116330 77.059 < 2.2e-16 f1B:f2c 5.017610 0.061311 81.838 < 2.2e-16 f1C:f2c 10.961326 0.116330 94.226 < 2.2e-16 f1D:f2c 12.041081 0.116330 103.508 < 2.2e-16 Residual standard error: 0.2868 on 11 degrees of freedom Multiple R-squared: 0.9999, Adjusted R-squared: 0.9999 \end{CodeOutput} \begin{CodeInput} R> all.equal(coef(fm1), coef(fmSVD)) \end{CodeInput} \begin{CodeOutput} [1] "'is.NA' value mismatch: 0 in current 1 in target" \end{CodeOutput} \begin{CodeInput} R> all.equal(unname(fitted(fm1)), fitted(fmSVD)) \end{CodeInput} \begin{CodeOutput} [1] TRUE \end{CodeOutput} \begin{CodeInput} R> all.equal(unname(residuals(fm1)), residuals(fmSVD)) \end{CodeInput} \begin{CodeOutput} [1] TRUE \end{CodeOutput} The coefficients from the symmetric eigendecomposition method are the same as those from the SVD, hence the fitted values and residuals must be the same for these two methods. \begin{CodeInput} R> summary(fmVLV <- fastLm(y ~ f1 * f2, dd, method = 5)) R> all.equal(coef(fmSVD), coef(fmVLV)) \end{CodeInput} \begin{CodeOutput} [1] TRUE \end{CodeOutput} \subsection{Comparative speed} In the \pkg{RcppEigen} package the \proglang{R} function to fit linear models using the methods described above is called \code{fastLm}. It follows an earlier example in the \pkg{Rcpp} package which was carried over to both \pkg{RcppArmadillo} and \pkg{RcppGSL}. The natural question to ask is, ``Is it indeed fast to use these methods based on \pkg{Eigen}?''. To this end, the package provides benchmarking code for these methods, \proglang{R}'s \code{lm.fit} function and the \code{fastLm} implementations in the \pkg{RcppArmadillo} \citep{CRAN:RcppArmadillo} and \pkg{RcppGSL} \citep{CRAN:RcppGSL} packages, if they are installed. The benchmark code, which uses the \pkg{rbenchmark} \citep{CRAN:rbenchmark} package, is in a file named \code{lmBenchmark.R} in the \code{examples} subdirectory of the installed \pkg{RcppEigen} package. It can be run as \begin{CodeInput} R> source(system.file("examples", "lmBenchmark.R", package = "RcppEigen")) \end{CodeInput} Results will vary according to the speed of the processor and the implementation of the BLAS (Basic Linear Algebra Subroutines) used. (\pkg{Eigen} methods do not use the BLAS but the other methods do.) The \pkg{Eigen}3 template library does not use multi-threaded code for these operations but does use the graphics pipeline instructions (SSE and SSE2, in this case) in some calculations. \begin{table}[t!] \centering \begin{tabular}{r r r r r} \hline Method & Relative& Elapsed & User & Sys \\ \hline LDLt & 1.00 & 1.18 & 1.17 & 0.00 \\ LLt & 1.01 & 1.19 & 1.17 & 0.00 \\ SymmEig & 2.76 & 3.25 & 2.70 & 0.52 \\ QR & 6.35 & 7.47 & 6.93 & 0.53 \\ arma & 6.60 & 7.76 & 25.69 & 4.47 \\ PivQR & 7.15 & 8.41 & 7.78 & 0.62 \\ lm.fit & 11.68 & 13.74 & 21.56 & 16.79 \\ GESDD & 12.58 & 14.79 & 44.01 & 10.96 \\ SVD & 44.48 & 52.30 & 51.38 & 0.80 \\ GSL & 150.46 & 176.95 & 210.52 & 149.86 \\ \hline \end{tabular} \caption{\code{lmBenchmark} results on a desktop computer for the default size, $100,000\times 40$, full-rank model matrix running 20 repetitions for each method. Times (Elapsed, User, and Sys) are in seconds. The BLAS in use is a locally-rebuilt version of the OpenBLAS library included with Ubuntu 11.10. \label{tab:lmRes}} \end{table} Results obtained on a desktop computer, circa 2010, are shown in Table~\ref{tab:lmRes}. The processor used for these timings is a 4-core processor but almost all the methods are single-threaded and not affected by the number of cores. Only the \code{arma}, \code{lm.fit}, \code{GESDD} and \code{GSL} methods benefit from the multi-threaded BLAS implementation provided by OpenBLAS, and the relative speed increase is modest for this problem size and number of cores (at 7.76 seconds relative to 10.29 seconds for \code{arma}, 13.74 seconds relative to 16.91 seconds for \code{lm.fit}, and 176.95 seconds relative to 193.29 seconds for the \code{GSL}). Parallel computing approaches will always have to trade-off increased communication and overhead costs against the potential gains from running multiple execution threads. % Nonetheless, with the ongoing %shift to multi-core architectures and an ever increasing number of cores in %modern processing units, it is conceivable that \pkg{Eigen} may also switch %to a multi-threaded approach to further improve its performance. These results indicate that methods based on forming and decomposing $\bm X^\top\bm X$ (LDLt, LLt and SymmEig) are considerably faster than the others. The SymmEig method, using a rank-revealing decomposition, would be preferred, although the LDLt method could probably be modified to be rank-revealing. However, the dimensions of the problem will influence the comparative results. Because there are 100,000 rows in $\bm X$, methods that decompose the whole $\bm X$ matrix (all the methods except those named above) will be at a disadvantage. The pivoted QR method is 1.6 times faster than \proglang{R}'s \code{lm.fit} on this test and provides nearly the same information as \code{lm.fit}. Methods based on the singular value decomposition (SVD and GSL) are much slower but, as mentioned above, this is caused in part by $\bm X$ having many more rows than columns. The GSL method from the GNU Scientific Library uses an older algorithm for the SVD and is clearly out of contention. The \code{GESDD} method provides an interesting hybrid: It uses the \pkg{Eigen} classes, but then deploys the LAPACK routine \code{dgesdd} for the actual SVD calculation. The resulting time is much faster than using the SVD implementation of \pkg{Eigen} which is not a particularly fast SVD method. \section{Delayed evaluation} \label{sec:delayed} A form of delayed evaluation is used in \pkg{Eigen}. That is, many operators and methods do not evaluate the result but instead return an ``expression object'' that is evaluated when needed. As an example, even though one writes the $\bm X^\top\bm X$ evaluation as \code{.rankUpdate(X.adjoint())} the \code{X.adjoint()} part is not evaluated immediately. The \code{rankUpdate} method detects that it has been passed a matrix that is to be used in its transposed form and evaluates the update by taking inner products of columns of $\bm X$ instead of rows of $\bm X^\top$. As \pkg{Eigen} has developed some of these unevaluated expressions have been modified. In \pkg{Eigen 3.1}, which is incorporated in version 0.3.1 of \pkg{RcppEigen}, the \code{.adjoint()} method applied to a real dense matrix copies the contents of the original matrix, flags it as row-major and interchanges the number of rows and columns. This is indeed the adjoint of the original matrix but, at one time, the \code{wrap} method for the \pkg{Eigen} dense matrix classes would fail on row-major matrices. \begin{figure}[t!] \hrule \smallskip \noindent \ttfamily \hlstd{}\hlkwb{const\ }\hlstd{MapMati}\hlstd{\ \ }\hlstd{}\hlkwd{A}\hlstd{}\hlopt{(}\hlstd{as}\hlopt{$<$}\hlstd{MapMati}\hlopt{$>$(}\hlstd{AA}\hlopt{));}\hspace*{\fill}\\ \hlstd{}\hlkwa{return\ }\hlstd{}\hlkwd{wrap}\hlstd{}\hlopt{(}\hlstd{A}\hlopt{.}\hlstd{}\hlkwd{transpose}\hlstd{}\hlopt{());}\hlstd{}\hspace*{\fill}\\ \mbox{} \normalfont \normalsize \hrule \caption{\code{badtransCpp}: Transpose producing a run-time error in early versions of \pkg{RcppEigen}. \label{badtrans}} \end{figure} In the code for the transpose of an integer matrix shown in Figure~\ref{trans} the transpose is assigned to a \code{MatrixXi} object before applying \code{wrap} to it. The assignment forces the evaluation as a column-major matrix. In early versions of the \pkg{RcppEigen} package if this step is skipped, as in Figure~\ref{badtrans}, the result would have been incorrect. \begin{CodeInput} R> Ai <- matrix(1:6, ncol = 2L) R> ftrans2 <- cxxfunction(signature(AA = "matrix"), badtransCpp, + "RcppEigen", incl) R> (At <- ftrans2(Ai)) \end{CodeInput} \begin{CodeOutput} [,1] [,2] [,3] [1,] 1 2 3 [2,] 4 5 6 \end{CodeOutput} Although the problem no longer persists for this particular example, the recommended practice is to first assign objects before wrapping them for return to \proglang{R}. \section{Sparse matrices} \label{sec:sparse} \begin{figure}[b!] \hrule \smallskip \noindent \ttfamily \hlstd{}\hlkwa{using\ }\hlstd{Eigen}\hlopt{::}\hlstd{MappedSparseMatrix}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwa{using\ }\hlstd{Eigen}\hlopt{::}\hlstd{SparseMatrix}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hspace*{\fill}\\ \hlkwb{const\ }\hlstd{MappedSparseMatrix}\hlopt{$<$}\hlstd{}\hlkwb{double}\hlstd{}\hlopt{$>$\ }\hlstd{}\hlkwd{A}\hlstd{}\hlopt{(}\hlstd{as}\hlopt{$<$}\hlstd{MappedSparseMatrix}\hlopt{$<$}\hlstd{}\hlkwb{double}\hlstd{}\hlopt{$>$\ $>$(}\hlstd{AA}\hlopt{));}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{MapVecd}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{}\hlkwd{y}\hlstd{}\hlopt{(}\hlstd{as}\hlopt{$<$}\hlstd{MapVecd}\hlopt{$>$(}\hlstd{yy}\hlopt{));}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{SparseMatrix}\hlopt{$<$}\hlstd{}\hlkwb{double}\hlstd{}\hlopt{$>$}\hlstd{\ \ \ \ \ \ }\hlopt{}\hlstd{}\hlkwd{At}\hlstd{}\hlopt{(}\hlstd{A}\hlopt{.}\hlstd{}\hlkwd{adjoint}\hlstd{}\hlopt{());}\hspace*{\fill}\\ \hlstd{}\hlkwa{return\ }\hlstd{List}\hlopt{::}\hlstd{}\hlkwd{create}\hlstd{}\hlopt{(}\hlstd{Named}\hlopt{{(}}\hlstd{}\hlstr{"At"}\hlstd{}\hlopt{{)}}\hlstd{\ \ }\hlopt{=\ }\hlstd{At}\hlopt{,}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{Named}\hlopt{{(}}\hlstd{}\hlstr{"Aty"}\hlstd{}\hlopt{{)}\ =\ }\hlstd{At\ }\hlopt{{*}\ }\hlstd{y}\hlopt{);}\hlstd{}\hspace*{\fill}\\ \mbox{} \normalfont \normalsize \hrule \caption{\code{sparseProdCpp}: Transpose and product with sparse matrices. \label{sparseProd}} \end{figure} \pkg{Eigen} provides sparse matrix classes. An \proglang{R} object of class \code{dgCMatrix} (from the \pkg{Matrix} package by \citet{CRAN:Matrix}) can be mapped as in Figure~\ref{sparseProd}. \begin{CodeInput} R> sparse1 <- cxxfunction(signature(AA = "dgCMatrix", yy = "numeric"), + sparseProdCpp, "RcppEigen", incl) R> data("KNex", package = "Matrix") R> rr <- sparse1(KNex$mm, KNex$y) R> stopifnot(all.equal(rr$At, t(KNex$mm)), + all.equal(rr$Aty, as.vector(crossprod(KNex$mm, KNex$y)))) \end{CodeInput} % Sparse Cholesky decompositions are provided by the \code{SimplicialLLT} and \code{SimplicialLDLT} classes in the \pkg{RcppEigen} package for \proglang{R}. These are subclasses of the \code{SimplicialCholesky} templated. A sample usage is shown in Figure~\ref{fig:spLS}. % \begin{figure}[t!] \hrule \smallskip \noindent \ttfamily \hlstd{}\hlkwc{typedef\ }\hlstd{Eigen}\hlopt{::}\hlstd{MappedSparseMatrix}\hlopt{$<$}\hlstd{}\hlkwb{double}\hlstd{}\hlopt{$>$}\hlstd{\ \ }\hlopt{}\hlstd{MSpMat}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwc{typedef\ }\hlstd{Eigen}\hlopt{::}\hlstd{SparseMatrix}\hlopt{$<$}\hlstd{}\hlkwb{double}\hlstd{}\hlopt{$>$}\hlstd{\ \ \ \ \ \ \ \ \ }\hlopt{}\hlstd{SpMat}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwc{typedef\ }\hlstd{Eigen}\hlopt{::}\hlstd{SimplicialLDLT}\hlopt{$<$}\hlstd{SpMat}\hlopt{$>$}\hlstd{\ \ \ \ \ \ \ }\hlopt{}\hlstd{SpChol}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hspace*{\fill}\\ \hlkwb{const\ }\hlstd{SpMat}\hlstd{\ \ \ \ \ \ }\hlstd{}\hlkwd{At}\hlstd{}\hlopt{(}\hlstd{as}\hlopt{$<$}\hlstd{MSpMat}\hlopt{$>$(}\hlstd{AA}\hlopt{).}\hlstd{}\hlkwd{adjoint}\hlstd{}\hlopt{());}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{VectorXd}\hlstd{\ \ }\hlstd{}\hlkwd{Aty}\hlstd{}\hlopt{(}\hlstd{At\ }\hlopt{{*}\ }\hlstd{as}\hlopt{$<$}\hlstd{MapVecd}\hlopt{$>$(}\hlstd{yy}\hlopt{));}\hspace*{\fill}\\ \hlstd{}\hlkwb{const\ }\hlstd{SpChol}\hlstd{\ \ \ \ \ }\hlstd{}\hlkwd{Ch}\hlstd{}\hlopt{(}\hlstd{At\ }\hlopt{{*}\ }\hlstd{At}\hlopt{.}\hlstd{}\hlkwd{adjoint}\hlstd{}\hlopt{());}\hspace*{\fill}\\ \hlstd{}\hlkwa{if\ }\hlstd{}\hlopt{(}\hlstd{Ch}\hlopt{.}\hlstd{}\hlkwd{info}\hlstd{}\hlopt{()\ !=\ }\hlstd{Eigen}\hlopt{::}\hlstd{Success}\hlopt{)\ }\hlstd{}\hlkwa{return\ }\hlstd{R\textunderscore NilValue}\hlopt{;}\hspace*{\fill}\\ \hlstd{}\hlkwa{return\ }\hlstd{List}\hlopt{::}\hlstd{}\hlkwd{create}\hlstd{}\hlopt{(}\hlstd{}\hlkwd{Named}\hlstd{}\hlopt{(}\hlstd{}\hlstr{"betahat"}\hlstd{}\hlopt{)}\hlstd{\ \ }\hlopt{=\ }\hlstd{Ch}\hlopt{.}\hlstd{}\hlkwd{solve}\hlstd{}\hlopt{(}\hlstd{Aty}\hlopt{),}\hspace*{\fill}\\ \hlstd{}\hlstd{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\hlstd{}\hlkwd{Named}\hlstd{}\hlopt{(}\hlstd{}\hlstr{"perm"}\hlstd{}\hlopt{)}\hlstd{\ \ \ \ \ }\hlopt{=\ }\hlstd{Ch}\hlopt{.}\hlstd{}\hlkwd{permutationP}\hlstd{}\hlopt{().}\hlstd{}\hlkwd{indices}\hlstd{}\hlopt{());}\hlstd{}\hspace*{\fill}\\ \mbox{} \normalfont \normalsize \hrule \caption{\code{sparseLSCpp}: Solving a sparse least squares problem. \label{fig:spLS}} \end{figure} % \begin{CodeInput} R> sparse2 <- cxxfunction(signature(AA = "dgCMatrix", yy = "numeric"), + sparseLSCpp, "RcppEigen", incl) R> str(rr <- sparse2(KNex$mm, KNex$y)) \end{CodeInput} \begin{CodeOutput} List of 2 $ betahat: num [1:712] 823 340 473 349 188 ... $ perm : int [1:712] 572 410 414 580 420 425 417 426 431 445 ... \end{CodeOutput} \begin{CodeInput} R> res <- as.vector(solve(Ch <- Cholesky(crossprod(KNex$mm)), + crossprod(KNex$mm, KNex$y))) R> stopifnot(all.equal(rr$betahat, res)) R> all(rr$perm == Ch@perm) \end{CodeInput} \begin{CodeOutput} [1] FALSE \end{CodeOutput} The fill-reducing permutations are different. \section{Summary} This paper introduced the \pkg{RcppEigen} package which provides high-level linear algebra computations as an extension to the \proglang{R} system. \pkg{RcppEigen} is based on the modern \proglang{C++} library \pkg{Eigen} which combines extended functionality with excellent performance, and utilizes \pkg{Rcpp} to interface \proglang{R} with \proglang{C++}. Several illustrations covered common matrix operations and several approaches to solving a least squares problem---including an extended discussion of rank-revealing approaches. Sparse matrix computations were also discussed, and a short example provided an empirical demonstration of the excellent run-time performance of the \pkg{RcppEigen} package. \pagebreak \bibliography{\Sexpr{Rcpp:::bib()}} \end{document} %%% Local Variables: %%% mode: latex %%% TeX-master: t %%% End: RcppEigen/inst/doc/RcppEigen-Introduction.pdf0000644000176200001440000050441514567630621020721 0ustar liggesusers%PDF-1.5 % 1 0 obj << /Type /ObjStm /Length 4649 /Filter /FlateDecode /N 81 /First 683 >> stream x\ksI~E[&n~mr``Xf6&6R K==YUDLܐGvVV>NfuKPu.S)PژPBx#\BBJ! IUBiL [hA#<Ba VkAD&p  .l)-LS~Q8}Dxv<1 cG/%|iҠk7-M#dGr<߅Ve9$דqWC.O_Ofڛ .vE)qu)f df©&u''X恫y;&DLVna|By t|^;N79Z+y45{ً_ dR>?A}> ^!n9L@:tkn)Y뼵yif*2IƟAkѸd>Dhi$ 2oU5yk6ߜ0D'2=[nGM=sRgOvX'KTL0e u~MpAjXi?5O2)ɘSnr2 hO S,mYh &SjD"O^j{˯֕楃Fnz8貽m8;MJD[4g*MN3Ӱt:Z'*ytD%OoK#o03QĆIlQzhZqmxH80jW{px4FB8!Qo$]=:&a81!g-D1wt۟}b9>E;dXn*x ā> kJ,euQoUg߃ :LPs(gYT X^=h~><C퍦 w44>mM0cMwMcjWq(q@ėӣGvG{ۣ:%fք[O̞F=' OMZM*Uw'3Eo0C_QD>lG>˂#k/DVa~^ܴc`'.Ş.m<V^⫱ ]>mu/ն{%-yUn++UkL,g®Ɗ)Z®*o(TJ%mW[[$çzq)ZO߯_ ҆u\_`Vv}e>S-,RbA[hU2 t\iק†踸g v.ݬ"qᛗ94̧ >m}:|F7ݯ:ow3 1ajξ"#UMx;*+Xt򍖿 -aa[nѱBXR. =Ql,1! WjŜR̙S\o$-Uik"PA͖:ǜ%!桔W)fP#K轟PҔd-:{C5ȁ-+xTg,8˪:8|>h_{(Pm$Q Yf8Kk|iY#9C NBG˧7ALRH:?(t,qתpY,t^Av/3--dЗ:ր){ޒ_B ")+)= OdZ"Ƹq֣tZ] i gYQY¹%@BEC3JjtwfǏ`:ǖ٬醴M5zÕ704@=(!J%G}f7I=_T77(BHf#.4Nޱu ;0U1R L`E}(E5B 8eGO͔tix[^Z+h\?B ʹg09 A&JttVzGmpzt{D+d[ t@t@~Un*6__/lF^01B"D!Y"2X_'(Q*|awCpED|[P}endstream endobj 83 0 obj << /Subtype /XML /Type /Metadata /Length 1538 >> stream GPL Ghostscript 10.01.2 linear algebra, template programmig, R, C++, Rcpp 2024-02-28T07:13:52-06:00 2024-02-28T07:13:52-06:00 LaTeX with hyperref Fast and Elegant Numerical Linear Algebra Using the RcppEigen PackageDouglas Bates, Dirk Eddelbuettel endstream endobj 84 0 obj << /Type /ObjStm /Length 2830 /Filter /FlateDecode /N 81 /First 724 >> stream xZ[o7~_pxxgQMۢ4uʒ!vw8#it'8"pԐs!JD/(Z0("&HDQ$C[6AV9A l)m) m)v.!d)bdq"0X:" ;$ <ɱ Y\,I̲b@^4yln=Qa(Kd&@9a84KT,X** 3.@b%f( OG{V>xޏ&h*yfX5 8tV-įYu8˚Dd6)joŋr4x%XNhIB~j0agbv; _jPY'g@V{5 k> ~ ݔU{|^ W=^ ދ}6 뱱5FMhqϾR2/ךO[C݋'.#Dy[OgWKUNt e^_}ŠzLCRʫ`Nxz=||Q=yb8T9Fg >ifl3 k mEfߴe)hS:%'/_dᑖ\y$JLJm$n.tA[ 'f8%PP_<6|vɆk{NKj Pruw'yqZ*~*Ί7⢸,A1*j? קw[ω-1{:ZvgZDkL12C[* C$!9RGgi^h{a ~K$U׶wϟ|d\WX۲^LXw{9jm֖UwW5TxXVzc I=|pLZh8W4HT=&@s'AZ:R`$dgUG9)Df1U+ݣNɮEC;n6[,@J!s/a␫`Ǹ:D`/#NOwBв NuہdۉPvLULMfr(\rss^n+oud=ȳ1Scl+zs`<|A`8ӓ2eߓ( J7ncoOQVd|uyl0V4rFf\ڏ 4ިnhű`L犰b>bcN(xWbnVn{f4${u=:O~d򩡝玻3zP֭fyo.ٰ[vUwWõ:x`KNU̬#$Nr4 h ț,=ATQ|%a^sY5Om$*IYje·4p&5|?C&o⼕:_2%3|%oAo4祣 yփ|*JGf})w32|bgIf3Zi!ac7/;ȽuF+^m+bUJzM@" 7XMຎ VQWLVśwl~9dzظlD0;p\}&7$bh/G?||BOZmMJOdlbtk2|.HlLkZqu˼#*u: { 6~FKHi]|YW4]K lp3f^Gۉf>΢9ś+VٴxR;. >arte7F#H|LJ斯KED5UuUQ|An[78lo$ND͠Q@'.ѷr?.&#Ő/ح˿j,W"eOS;Qtio7cO`,ȗ&MѬendstream endobj 166 0 obj << /Type /ObjStm /Length 3242 /Filter /FlateDecode /N 79 /First 732 >> stream x[is7 s˕*[sl$;Q@S#2TטsRTtT fn |`) L0Lɜq8+Y:- 8L;8(x4 i%q"Ơ吘v1^2pףaK4-3IScYeϬ&thZcd֛1H4I1GTq8O'Jy-|X,h$LZJ`h|;h Ez--AnX4<iX΢YI,&(JbIiTnPVP5TeoʲP$IMU-%5cI]jJ}k# 2jGv@ѿRP UgMW4J-}@ԌV h̄F5Ci(9E$T2h}ЇCFJz}k2)h~(X< d,2 |R~i՟,Vx:\]an0UϤ\q_$uǝء+*fTNwM Li3;Wj._ hk$R#ׁG>~I~dg\Ĝٜ{i]]3:ߓHǝ .Ajw!a'KյUYle2\6Dejsʰ!¼h_ ;.Pi~]dYɵMWfNte鶮 Ü3Ɍ xYCǢZYwENc8+=[X5:M)SU_ UcKkU;PAȣCwڍ ] TiF!\G^ְ: n^k(Qe+B4A9J&baj̫)G!ܕX̅ ݋2hWAo QWCGn?TnD WJsD}hrjU0 o9jgQ(KՑ BtRѨk,uDPNrש5ui7"zlbКjx7Su.Ĭ9󠬠`yt74kn{"Hw#k#t1ݻ:z=cn{_t\m7P"-h<s吧%:uJ7yeq=[v)(ad*D j=/6@kyj~R_>d˷@⟗5u^h;ʹ>6P ĐM82™eEaL}(Pb1<<ˬ`mGF_0ZiqNT:3_}u'<vsJk{YZ7 Jx6Żo 72X%mGtMeM}jg~kJ#T9j)II€$`6pսs}o^jY?nwJ}ddC&"9FT_] R0qSt !!5W 0O7[Oudz/TyQ :Їtão% _!*mH$2N`hyty6 vo 8GNfPv R+?%O8'Bi%G:;h@s^jx18y/*@}TpZ%nh@\ܵח,HCH͍pF|N[nhէ9'9ab,o}~?~ m٬ζ|>;-҉Osf2-)y %!Y{ ~瞕]&Jטu 8qnGZ x pZRaNqKi"D&+=5b` 8 Y`L!V;[f%yփ3BSj1<8Hj)rqGϥmӡlӎ3 -gpQ4i-b"7wheC[ڪv|&ʺL#.6μ{$ GV6lח/4yJ彝[رK ܾH2hn#丧vR$`@RBj3Qj;)Y_nFY[Ash,a@mKvaH^wӋ+W'U青#9>,s]j֭mC~z@4PəJ2|٤9 ۍj²΃0p2vz9p-oO*$sx\HGxv-PtN٩ :hWpp)=v 16@j#;p[wn0*ohwZ ?eZKkO_Nd)5v_, 2> yZ4093ύaa(xG pKkN_D~s(ZDvZsKJnnl­u#]*.E6J<Ѣ%Q_諼?|| !==/ $@^O-ټןżendstream endobj 246 0 obj << /Filter /FlateDecode /Length 5056 >> stream x[KsHrd7ѱ'A<3z%}|- nпޙYU@&F#eVMUM^r&9mTfJΔ\Si6+:]-utrEoN %aFEUWm}|8>Y;ԝ[\X(-뮎 obhn{CR,p(ߑ;r'D! z[pAˤ=-.u6I47Es{B[Ë;De2&QYvEEQl'>Ž kX|ZQ@+cZ7V~hn /GS1R4^-%4w d!Ks" 6ܖ*'@HE7Ų@og^:aiQVۢ 3i+&fzڷɯkb̕o2h@4/^a͸!<*S~o7 Ϳo'~Jk4L@bn>w‚Wjfxit8/ޜu^G:]noj[|{]}3pɴSZPl)mżP>xv0.ځ-$K1ܯ Lї: |n>kZVR1WMBɊ oB q l'sS^ ;:k&?/&H1v}nמ́'Z1:>%v!d'>MDv+ $(ׁSn(Y& ^qQOd:Y#H rm#Eڰe+h"ϛ?((JqQ%D)bOKVS$TfAשŧK׼ļ"(N(\k(gBqZܮҀUYA5+QRѡqddKagR9wlƐc?*d%cs$8IAd^vaNe@q ,*m PGH(tnE Ff?գbL- 6梁/kzVۅ&B&KiwWgLl>8cÐ} seK^N *eRG+|\Xʥ6 Zkyi+FJQ!aA|ac!8$:rY8;(˘ HUd33276]$IYK@(F԰:T6!Mc 4}2 1gK3X3"% e'{8 B%u/kcisѭߜ~u8pu-PXT9gV ̜.E,"!tk:d7߼FO5˭2YښŖ(b7Do3"2hGJϰY14Q#2,J!*uIV4iڝMl(p??]pgДTKCmT `J!k`K]f 4 6c}+FrkJ0s +el RlѷO 4(L(pAL30)}%eV)_;g8aeh|qosY  [XZw]&?vH{q 9Y,nrz,8F3ҮUAI]c 5}C݅T#K pXDs t鎻Ai^ˉ2AO3PZoRE0NJQ3Ϗ$7~(bW`Pcݡ6>6e5;;6w A i>d -j;B(DJ3+ P@C8xb}NX (§]F.$o܍ӇcNBV#6{I T< 5):FE~A-A@V$,9Y0@y8E@\9_KyqtB릋ĥtC!psG\i[j./]J񐰷C60a< [Wn-_Aʽ?jJ@ئ!5g[=> Pd<38甥Jm33rVV?<ܺsRezٲZzImZr(#2ѡJKta%ThdPO>A|^*-8ûoٝ-bEctͨX+Q/;yfV5'Hɒ-z" -ixׂvx x&"džHe~ &bRyo׎xI%ԡmE=K,<ż=[&$;9Mc(7% Lx;ؗ,1]">BlKF|#Jx%L="}aS~؝@)Rs &r.t:-z'0f_ pT_R߄*\}S>!Dnh BmF($zGǶ_>V2b趔3/3TRI0w[7?K+?=e ˷=y!TU26)S1kkM"a%YD5;X3OT54ȓV?'nO|HX{'=qIzQĝYY*ɒ݇BD>φNRvrvc(:2c:,am7|3L%Q|Y41<*&rKHO\p!Uy>vwa^6 Cz ^(KIދz_Btc#,Ibܡج<;L)+;կPDKCp<̾/f%im3Nz>3UL8l}Iov=6J֟(u_9q$j_QR/Bd.!_ǚ$Veޯ[[QhV٪o$&o~||>:%+&Nuл-~'08]*m^KT,[ӨǤ> +\+ ^4kl i i֣S@x}sGzLa$$iC~ZGzOq˫+a#gO*'i>a[Uh+#fuwVϔn~{$~!H/#㥥iک<ɡXn)`qv۰YߥY^4bCSgdyJebI8MYy|,O@g"{>U7q35MO&,)ro1jD/qmioW\ Waff,҈V9czzs1` ͕ qR8Tv[kg8ft'.¢,E>GfSyQQ;uݶ CG'TlLԒ>JjDk L FVB~Qk^gOׅ~Eñfg++o'Q`ێ'qj"Ϳl׃ìoEƴ@gdx) yX)321N.@p1cEEyW7o 镌ZYߵz?Ӻ-W8< N!|+=@N#OtOCOhOɠ|.!ϋ> stream x\K8ri>)xr#k+w#iJ-^L$$Zj}"&kuUwp+A^p{/t_q}UE\{qk2׷Fܾ+z`ۻ?xن~8lRm{ӵ^•;4:߱íVשf",@ R]6(Ƃ31}tY  2vU>Wv0HQa菟l'R|>ODy6ZPǨ2~ho^zƶHNh>ō N g-ŵf a/-^E@[F+AJ?è#2Yrj 2:pOĂ춥z=ed`;g9;Me'ne`QiZ:5> °D1ho6R%UF53R vtz\n3f@ Oŕ3h4XKeʙQ7Kß^*s}R0&:JhFPr-驶" NQuN.ْd&Cuj;͗2EBK4/1[[JNb/҃;$MC폥ro6Hp6 M! N4$댟%g+PRϛw0284o\$sPxh\<]P6f@bAJ/h5q (G.*O" ZRU=v%b|SCއPӾ7.YȆi 8sfHIF4סG ]3|_'E%36*t4 3>F Z=;~ 6NTM6Ҿண#2D)-123j;w) Ѝl ͖]#M7@ebU{`+jZI?]QL}*VMS@ns*3e% ^e6"58rt&{2=0Ѷ iyC4É3f,s!M*| Uh<|C bH8`2= Ҍd]W8[B]{ X;p[If-8.S#٢STL!oq~ek_chtH%0='=s:6! ϼDP: QZ(ء#-~OQHA;z5 KMC?Ҷ7G]J}c~D-,4+][WmFxTzR.I#XM1IūwhC={v= ?a)Ut6dгߠiPw ϯv1b%$C@°,zMw|$y2qW :]rAy%^աJm8z)^`$b)Z)ڧZY$  pZHD; 4ΥI&Myf7|r*v[ p.'FV*yk?2ũlayJ,1`B)U@S`h$& 40aۡ|簥D1f>*Z-)Mʎp:X }og搞g4+A+%hGtR$45> "xm?-S*g,mZCTYK*"B Jԟ%n9'M!2jX2Ϛ6y.CEw]oc pp+>'n HQu23"v! `(`lgG܉9:/-Ψy{;qV"=噤P np3lؔaJKb0 ~ѵVUkQ@-hc^2LT \Yxm?sK4*ʂC, "EdnrF-X|7J0o*ֲ8 ycƌ0j.  KL#mipZ/eG|p7kx8tt$`jyBtpE(NrVJdu t 9@[-4N#aS=H»[PE?O.c:va'ʂ՗H!GÑ㝉36Qu5Xk%6?l U0Y :  [bU},a<ܕiTV&i_UK_w@`vlFP4vc~DvQPi=5Ik/^(;_Ck|Ȃ-@W< ڴVka)^ Br)}o( Yq́b,s-F_Ee ,4L?jXSb*H!mU{zbiZV4)TmZJZ%ټ )^MpRoC юs.dzqS]Ч \{6'C"cvԫTFlQFƶ+oVpiz'ߔYosD{G_`G. b3hy Aj yxkżLSUUiv=X6 *뗒Qjj:/U@ .sջwϞkvw,PRT^(ǻMjX5M/6 ˊkkgiRMWc=V&aD9&;ER5)JHaH@}-F/scAT>kaZ֥V96]KZ<Ρ9Q*%Lk2UuY]{>sEN1 Jͺ EA֢n3z?*fU6YNLQz\RlB*]D׵ªC%#:923-<#MVi>dѰtѢ@ѵN[왘!46ow`S.k]>*4Re-iꕽcj? 3:C[{\䟕@{CRŅ$q[ٗi"!dօoͦhvs&67b3}qJqA+Pc"Ok[! 0"1U1qN iQ"^2N[nWg؄0@ P7qj(c?ưbIJ}%ML]UBz(-Nn-Y^j!ÐHri&N]3eNU#I!~%XxtVMrc?ϩ A\j+AڪN0%@ƈpsDi7/;n֦G W Ҏn stO~Kiaš#OvH p#ցv VN?%amd!AKOx\]8>vBa{bv u|HNo:S˯133sBBwHB@'$H<<"Pg@]JpHu)BO- AU/ Vxc@9Vpd۹=)"J]Ƅj6\X2[>qWf7J| $8;}u NtEJ0٩cv+~L[8ɼ:ca)_F~c>jGaq>U>wN%U2 r$,i<|NBX"q7v ^LRcGVic|kK|M[nb'o&Td9<҆EQ4FJS)\<$WI{|hl NN=b[9PUR|FLG$f@lv/Y(DYt_ :fYer?Fh9Q?,B@4> stream x[ߏܶC-īI(iAP t}w(ٽֹ\ II=)V$3oY65[6_n,o,]]_^,7k[^[!l9gKLZ^߯Q°U7MZ !U_n׫5O~R_V_FӍTYm/:A]v'xwijkjQO.w-o]ރ`ک-w )oWUFpQ3IG r!Q&ڿ]9꧹_uVmpg,dҵỆǵjoo5*%6LC8ՠ,vNK/<++mU"6[Y[[6PxsBƒQ4fiW3g߂F2.HHUx/j nl0jXpo`@W%_XOV`\6HeEM*;Z|i'Hs9^s#=I˝tEgO8z~ Sf8C^_‡1' Hs! kn>B\ |Kazu9tퟯnJDskM⛧ @ndm!\,ZOj䦑3A=_`^]MfH#%V% ɫ3Hl~1YԢ6 F zVadm(Wۄ)z X)ɯH!Fê vhfWĶx8:kM)FӢ/؛]*ݚ\}-&nH=!Q(qˀ]\TLJwO'ʆOŭB_/2ف{2Z[ 8~s 3 1 c&3? ժFGP%n~9{؉KFQΌ\Hǹt 6@|:گ\8.o)șjah4fLʷ9$D9Zs2r*MfJٚ0esS|vF$="!/4*0e;yGm@ ܍,ʱ8Fo&vB㘡+ӬFh$rmOb&5Tx$ Hg>& s- YȓtKB ۖ f&NLq~l%ċwRB`6+(6~?ͅǪkpSڿ ^PA֫ru1?X.>0%evMt]؏cSé[R[fj' UC2VC$Plۇu'Axҍ-AFNa.x}CKUߏ!ir'&öBC| ?/^[P%'IhJA ]9)ӀPQՆ$ԂKq|*06"L= =Z1UE$DN=zYٛWM?P~fGsɗ T"o1{Lյ:+mCs'.5c.zg<g;SWBK#ȱ0(+},g%~LJ6>G 5 '?/ S9;<먷PϗabDi uo ]j0Ψq?A;CA7'X ^ԔYa.A,;>8rLcB53 +U?LHYYMQԍC N`́M??U M042N{!Ry|uMV9U4Z . JI]cǼ \͔Mn^ՌRO?ƞi)ԩ"N9tX(};^\)׍61UKs)l3m34c` QҀU RiĺH{D]TMoV)D\ ɽO-2( Xpi24mP`V 8PY4}<=/(:'&~ٶR-\>T 1BSP SaA kJBG:f)lAذب>̐sh׼-jd-2m,gSHUkor'dſgJSklgd$,t%`3Vvb8Op @6F7OdK )B8Zتvb4k_ iދZt4_L%^NۂB3BN0Yro ``E|( ؈c4;cgBJzXMK8:q᱙>$teE2 ff9=mdG@,k%':^+@'0{ 1lrUßOdbēΗ:Q*5S&F&HÑ('\Vm : Q;+&%&<<@8.8&DA ,m!$RUah ('$I"ցHp^rti%n U47~ݏݤU'i =XS"3z'iܭOc H|d0_;.OoIzDbŽ^q~9O*`)"+`ZvZ\H(j?j7v'Ié#1M-]/iŰ{9݀PAsl97Oɴ=w#߶YqM8VG'&9]szxucRFq}Rv1=t/]XV!:P+ JWjM۱&X‰ѵ$Ž?|ƥJ4UʍNYSx>(^A۱P]''4C!Нq1JIaMR$`GĄd  t6&EFZCLA,8vS@9~Pҝr2_V!Ւ.^BToV{?nvf^h"p+EINzpl@6ڰ 7E8R<~r0aDhdb@dNi`F],?Pt"\k,yHZX!t7 pУk:$Efendstream endobj 249 0 obj << /Filter /FlateDecode /Length 4145 >> stream x[sܶԞۃ&r,K|n3̸CfhIV6*ud7@p%`K>iײ/6'/'.68? N:NߞWةeF uz9igxu鳖i|U9b)hN6͢k;%-_.>UwdUĺ-^V%j :d󊆮BnK`02A|)kPȜ|pïڲI-x*.jxugT }*!~uT|rgJydoNX(i~*a=]-ӅUu\PE>& ^-2kz# nCjw>.e3KHŘk[1~ߐmVf)ϐ.`T퐨B(tRv05/e~ڦEsZJ rC[NE '+e@%d+ِ vFv}x>TSvcY'8'JrG`FCj[O3ܽ:D)ٹF7pzI}z~Qq;D>Fv:eRsk3<&D(b@DO> 8_?&O`9eGpy>=)i# OA %o'c:ڏfwC/7(bԲ k(=Q1`ړ2t>]\v,ԉC>6up{~_>4dpc'2r)@z2đ +@G*Ak LVp-Qсm6?T.՞ KDي,)B"+[BpLw-<tV>3\ @? r-$N, 9܏6&GPS++3 n$;/r tT'Oʊ>i\`j'= @fD_PcĒqEzƛPpIT$(Ap"^vь!@*O~8 lbΣ! .k.ٖq*!(`iJɸNmcHֲtogO+;q|qñxϦl m#rK5E xq8ovހy}9x6:#HwL y o|Y`&U 'J&g]X#}nQOGRx^80+=H~C?8JA/CZ?mӭ7rq,LI8@gAJJKj'?%6cA5‗g?~?8v@)85Ϫ olXϲ㋃j0Y!*u^2[VAbSk%}P8}K fl_n=tz).T2 e!!ɳTL%w60 +`inhe,[I/"#urX+M |:z[;CYRi@cdٔ+,+IL>jQXd@cEBk5HCTZ8z ^c=J~BC^<ĴKFxAkT!8 3aԓ޾ /QKjr JoK-Հj@i`y;b6)zni,ްwU PIYօ&j^&[ߏ=~b146+Y6" {ұa2s\Eg[m1:Ę VZT *$$z!kW7լ V?_L*];RXS^â [=YE3uc~3m](,嚏wW!:Uktrk/L^$!t=iWM@-g/xWEFF%WlSCE=2b¬*&_#]uôgKTZV ɎC^|* }²%5U@}aFC4E4A*j7׳x\_0hШyTѮυ;"Pzkg;T#VD:EljhPפ6[ 4+c!~DwXaUFܢ^84zv']|O~\GlZ2E#Z+M"fVף<&"S,>.kǮL^Z;3"JRVF~͵Qjv!8Iq_6}F#qSǺÞx3'농ŨZ!cK%녖&sX`E@lp9 gӒ)ʝ.vVbE}ÿ:2L(vJY#(xY K"} Bp<̃e![*n4ga&-[}t9KdRQ([ d;$<^$`d=} 9/+eh(Yx:c*dRL(R#rZmFU7'v:nC?0f3 t"?5< ^K9x++Uj `3qRO7P٥kmݸs1_f}qR6G+lCui,! “Yk27JI?BWJ#tO.2Rfgr4p T_!!C_ Pe"K&jA6ytzCupRֈ> stream xX XS׶>1sDE1DO٪֡U+yPf@eAЋNնm굯VMv{dž;'g_:Ʀ#H/Z,"7|+fEn^ dӜ/[+6_Bg"fomng;}cnZ-^Ky/Y2l'2i/6eaܙ%PƋ,eb5}-oOÀ/e8uP{wW _v9>qJuTLUT:'ҥZ,8"nt:ZƲ$>X/ #AKOɻ@;U>jkP,F0Keq*Y~J-G.caolﰥDP܀'(;Z<=kA{oɌ$[{> 6+Ǟ*) אvXi[}k1aEҎXMJqXա=rL:w2vJ2<(,KgQ"䋂V /XD JzEäԉLT9L$S&%8S# #rv3N(k8QlnkQāX0 (5F[$Mԛ #/JNHNH$c|Dx&{_9(ך^0n!s F J g!c0sw$'~8EOd4d=L"gsxD}%G{kut&˷g3O2!p8ɗ8\I:٫Te͡fcm]." tPfs`k6d,'^B紞 NrI3$r w \??x"pd؎ewcJS{-mczsfD*~MV-ʩkkP|w2ko~l}|m,}+.KΡ [OL_o0+3,8X8͌c%}ej2EO;d4clf qoq`z, _hT2J.j9["+o]uhA{hIeU2vFQ i2#MY8 { {4;!345 4OJBdFWJL$6 zkg7jV1rơ2I?IKbA58R "GR-L~G2W,] -ܥ{P̹Am%w)d+fK kww iBh 8`TeU)e/4$6D/P^T!wJ²1\eKyLS*1w>kAhPk~4b.<,dzFL炩"enzu 4 zҎێ7 ׺B]nTI1n-z4nxoNK;ŴJU ܪd8mXD<j ΘVyF=Wi;'EOًSzJw( DtrZ66@;|hNlTl!zuڞ,T-!ޥ?xՠPUe@W&9DGlXd۵sa+RFs:1rp.lN_Y{/w(׶VCL4~?k'2yi"9 e:Oܼ]ت❹X%]i2M%ItrW?{t@ɯ?VYi6)*-7%6dߑ9>;J4k,*9w1Kp᠂-)Q&jɣFb?[5y40XAwpOX@ю3"~OrT|koQ0de+-JJVaV~ =݌`h Vw!! |bB8la_T٢+h B*T7(u> !0[qVrTXߤd]R8[d[DF#=Bc/Y$eV&L6aStV#J(^%Wa.#5~OxjzC\Hxd5yPϙJBA>M>}退Aofl/S-"N*ti aN*$l?JNY紩juYYZ6a3WϹ%dwqN<\g h<Ī!$$G)մwQ>n[Z6Z vyW7og2PƉg,Vv>٧L("ٗ5z+*#8B#z.$UKzs)YavGjO Iĕ 5t>\b:\3U p!'+=Yk=n4)T-;hkgмF浜TWu|truh ζ8&VL67^YoP[l8/Rл49XLAyQRvRm,P xO_3&f{Rx̓8ן'sqr<:IШ?^ZWR{B%Hm2I=tWycy"B8muO:SYHG M6S؝ܦ؃euMNtٌzle،nҎbw4slwq~߈ M_JQb*e81d}-F-8p8qm'cD8L^[SǪށl\Bo 3)w~*/?hRo_6|\ӿ:}㛥Ցyº0 "a#UYTɳ[~eua"Cz*8u~%2Z<' |63< WJE@2ٵVYMoDƺ`SIWpDD_tT'ʍP;NI )~˓"`~Wha p8$QdAH7*#-會Ķ䍿s ?$j5ʖ3#Ő$.sp73>78~<7cAD iSAu#->BWhSYevNՀ %OG3seMj 3a]VnN|Iȋ2btq:!uM:BGIRڒƄ.uO4+XU==Q_16@m|ٶ tq|kۺ2+I|n2y] pJ4/CU_0 ȉZGbq ˿O޴p#~݇ࢵyyc-~!7֚[c[\Հ T<>r 9Y)ssw}S]IIF.KHTk#KMp[d~OURm#O[W9 b3(kpx+gN٫)!7׸㬃ބˏ[9ooo`|NkɣY4I5ˌ(ɾÀ aô2F&K?Ȟb$gɯXpYi"+~J.~OFZ& t)/'x꧶iwWo`*ғRҨ"ᶢu''?PN—Vb c?2AJtks캣M+䃹:͋bOJ;>f5OK,:|)#p8OD(^EvAnD)iK:iiv(Y: ?OzJmSwZ$ Is,!M $bYv lz5",VmZxKXEou5]Re ~m?'Sny3/dμ0V-V^5֩ vU閝eC=N]þ!7S0ˬ>S|:T[~Q.!kЄ(ӡ3>Nddl[7~zz8*ɍ?$FsHMpi$IaRT|xP??S]+]\Ȱ\t4B-LB%`Dv2seDbqnËjEgRр!ԭv'$%πԶQAWZ{gs(cWZv+=i- +4l[ ܃gR0HWNq*(ѯp΁J^8Q'1ozn̾@f5w>Zr|iQh6mM9p44h?.i]RJQ,o$^TQS'Xd&]j]\dC!xHة$MT5r l!/M{c6 %] eJV3)G]gp^lq%e9NVX~9ƭ* t:~b^BfO_g@!1FCVG 7!?$OBxmP_3:]fn=~-i"`+PMu'/3H| HA?|J# #Og {Lpo;uGx}Dyj(R:x['CsBNdԉ7Djs8_Y0)3B=+ˋpa-7+ge !ͅ8)9y2KQ HAlA޹2Zr'[1Ly]dOx$lr=SMUԝl{ S˦a%/(GY5ϤEq+MKtĖ7:.juvLOnYTD/#n<'ء`g> stream x}Vy\uyWDH^Cw4ͤ4V<+FPCRg PAApF%|MD.-+3Qs{^m?Ϳs}/` ֤%K8!tqBR5Y'@Vsg:8| _|xhϹ/xZzFfxLÚyq IɋS]I ,`2%Xf)3Y`B0Q&`f1I fg|_ƏN{bTy_1EqcIrCUA=Q]>r~аA=`o{b= H,);Sɼ.j.CP>h=IXR`5s-(^ӎfm jmy j`%~F:C"8<0_CױVM=2Po2בŧ ޓK(g].Zed;[Zr6Bǡ}iLJT~ U_2I${Ї>vl.bSiaDd47 v `yaJa|! !ʖ8he΃M:+*S! V $RZ:r w^clJO9 wOʲ [ wӄs:G>GϟXŘD=ǻ-|` KtԐF I)I‡w8ܫѱ9?w (t4Zyttpwzi1f1N%v-t}WX{^wsNp 2W;@X(UDZ-ZB+Su[A8pq'Bކ.w8cp${Jc(f10$@D [1v^C℞ggoNzbSx^Ξ|>66V@-¨[Y+,QYl;CёğH>Fp=vj[dVwv@N-Jf2]`.$\{Y%OzkNrK7&Gù" %:ta*8M%̥s﷾K'W\;1 C&# {4P_XfP ^c4K5FuuWaȨH,4'@-,H.~dH2zS^*R_3J\lK B|ԫt,NωN*~fr'Kc/ET|Af.>uP(wTjw rpb%m1nߞ, |rAޚB?7 UU9 Hr& wK>s9nY"K!PAVĞLh,dUo9;LSˮ8M.'$@Nɖe4+jykK P^7? uow\W`t'0pDW=;Kr Yp+zQvK\ \ZGNd!툵'L!cG!W;XN|\$CbFtp"7o4TDcgWG!hgڇB:pӗW7F<.J6XJ6rh=Z}Wpf$z<8x! X!͎7?s7_)}1Ş3`ElޝLoY2#qb}8CMwcիGHW,N9X)O/ǵ@8{n$Sy 'j*qk4vCtRv }ޅ:C{-{W, ^)wkpDCWHp]~ /mr dzP͏[-Y6ۡNla?*6u)_SrGHp'-fMrEZF⨏>AmVbt*VĐ Ar[Y.//aendstream endobj 252 0 obj << /Filter /FlateDecode /Subtype /Type1C /Length 8055 >> stream xyX, ;cWƕEͬncER-{]"{/1Qc%M;h<':;>{?sH(k+J".\gn*M=]1KuMR4ӅΔakf4`vyC6-uYhnK,u_ar^+WM< 9jt1cǍ2abI?h`A=8C-&S}ԇT?j՟rP˩ jH VRCUPj55FfRé,jIͦFQs\j 5KͧQ BjH-&QT{ʛ@}NP)_ՙBRzRrBQ ʊS=CmHJQ j5FT[}l/'͵`a#Q<a6mڬoEO۾jk?}SMGwt)ӕNow%VfuеA_fKIWh{E={G`'RrI.sow ƣ15!DҸF/dujQg!t"lGE0#ZgkJJ4Iy ÍZkWa70+Jc Y6a5L/j-ܮرBe7^`ɠ4ޫ&"ף`̀ξh 񢯦ĭUBI sSY Jf+~vtGGsMs^R mƇG'}@ ]<:WXf{/z X(9)`ܽFfnfW>}}B+$a;!NB'4Djx2.С40A}aI~flnٌ< HQ-'^ Yy|\#A':%r#لJ'_DKoq8Ar4.5KYWBG1jy$y0 (P"@L (΅dV Qܚ[jV x bg&h1)liG,=Qb ʹcvCa)6B4Zfi[[M1J'78~ jS$WN;|:q[AQF(`P v=Zqhbq%u+U9h4~YLjCRnK׆9,̀> f=]+.MSDw:!%]wPPW[KNhQ.?"2}n˓Z¢sA ݒg*02o=03?@4jwWƭh") -1H{ty,omt  Fq$P"+/R4{O[|uD˿W_ݷc{ E^U<}sDSOd=p khzdLoK^IF%G 5 =ȣD@z>p}타ԷĎ~ɛ{ϋFsMQ{ŃfAf?bbPn70եUh/gI&4wQ"96:Fs+wdk:]5ǷL/|-3:e{.Ӑ L$ą)qFc~>g2)/ߦBϿ$v>H:EB@ qG2&; Czr1;nV5U~4,B<+O "^'1xބҳ=BJ͠?#{(RBP,z`w=Y~:`=m\x!q_q &{eT42Ǩk.q-vӹH"J(kL7b^U777[͌W1ȿYU"l7| ՄG@~5-JC<{55 çɇ^<6M[7gDcݣ 1J<аS/Tpi궞o$)ߍ?ֵVoN++Ե>nZN1/[:uD҂$_ B?B !Ɖ֐jq[JM&$ؗA`XWB0R$E3\MEjQF(#VY`޶Y/@ֿaY#5P,mF[C?ȭʦ5Wrl 3 X_%;K-bז"Jmw#|f5Av<--`ldXO.^>`>pM(tGr8l6.GN6A6ޡ_c#'xG?hJuG̔l#,$4)b <::6_g҈AVh.LI({‰6j $.y`o!XI.u0SP+v}>(Zca)H*ԧDw' ,Kޒ!;;Ȱd)]07 F%ϰ\~ ;[T N&X ?T`:7)pº~ՑIˠΰSoFU7ywk3,>,CW,-M?5|oQ[K|<a8{q`-Tfc `Lϩ;u2C[\͵!PcZG2!|LxG+=OƎj1M;]WA:Qc3LiaZq2CЕ'T菖m7b2 SӛE/38|]mJdI/kq(2l=G$AwdLS<:IX-Kme;M(~3+_i6Bc/ڧƹQ|tX٧]%y@ΰu뼝T=KwNC=ԚU/wWjh;o' {',&K2pQ_z?eTrb2 P'^l(0dtF2M\:1hHÚ(Rd` 9_)K- ьsa&Vm7a >A2 9>!}njG:G|z CcZ!3rE٢_}mj!L$ύQ㲉cj&Ljٳ?hq+ VFƇ, g+PGף7#.V/DԜ!C%WEu|`~rFmxm@fWMbm;5@2 jmQ9iE)9?z$D9Vo^0ku?fr%V c?ă82=3dPs y?O"?^ H}.A@MS R={}!ly[ b,{[o_]%\ސ)yaԥ/-y'஼bQ$e_=q&Ȃ'9%qF h[c6T 8nwK"҉c{O:>㦟2]{4q[ 2섂,(cJ |}BW=K8kcAD q0ZKjJPR"m.HUdfUh(A) 0eD\dpzWd]쬼tSHYx[,깿5.}TwGH!Tvh`O"1ڹGL}E/qL}7!B";* ~_Im_/ѻa9هn׆r͖* 7ƨx`ۡV۽9 hzܶ9pB8:и7b0+dRaUyV['RK-ai9xڤ $e76=&,m1C?r:^4+rjxxlϊ&Z7"94Q `JLB=EVmYbΖNwx_ v裏;2e)s!Xx5m@;vkM3Rz/7%n(Pb RC\xiH' 6#5GDx9;4BSV6MLl3M s9:& aSRˇ-2$\wzoZ]sVAZՋ`tѺG_XlO=.|$ 37`L-3c{NVQ$& ㏟n B#9͛ ': 4UӀl\.eؘie5 q,yE`O`Oq`EEqqE) ~#DHQ2uXlN?FjtS>tL862xdI&=g*I.{`GfyA()W*Y䇄@( WBHD7Y@~P%w޶!AxҌ_d'CAKIilnH_ zMﹰ&:sk|!(/*yvEj,쟝ԌpCHtQjӂ+ gFz׭a~~5YIick4hDmbƷ" "Dx iHR=|~бQyFLF yV-j՝PNk`0jMY.d0}dy_C&ft_qT(mx4CXʎ\+9RhI:ʓ *e4وH[*Uq_ICqb[}J K}1mtH'TQƤg%93w /, J#u9܊ ?OK`S#0~Mᾟ*$ބoIr2GaiIs%r":rX0{dGjK/ h]zR8C)[`,xG>Jl"TQ9O1|{w4:\:\*K/ߞׂ#A|wq,g\٫hΝH6ms3KJ>;Q{G{?d?|b^=j12--ԯBxL ߀›} oJ`=SLV#aYLI(2JutXC>b qfɐݮ3yZ٭x@ψ~$G/fQw^P__Ou٤kfljq:,cD7fSgr!ƦVdtƂ|Է .rsR/ff$'݄L2]e{x`%`w)'ﻀz Ey+khEpy__W״θi`N42ux{%OUةDpc9Ԩ5-ecpy<NfrjXϥGl_3~9 bz,1v;l^d+|Kav2?kvҪ\a\>}ڪg^sҽ4<7,nVw.c4dyQgyϏEݖ# bm3 13E'#nXѴˏۗRf½c6=rQG=#'m2c.U`amّݙ$TmubXLX|o:G|}/+_9z\1ݔLДb`zka\Xh3[W1DeKKDDk4A/#Y _^IgfՓKV畕??gUqrMsqd뉖hn4uB]2[5kGRn5R5'94`ɢ [PEei#I^uݴOD?`afOixH~p> y.D c@Y 1`oC6 .ȅUI"qu}b[}wK:uGT` 0qbc}aq@UFx^IJ; m,Fd|}{7 endstream endobj 253 0 obj << /Filter /FlateDecode /Subtype /Type1C /Length 2090 >> stream xeUiTgpYO8 '&qD"j]hv46fUYYnYZErhDƌySmIf~~_w߻(3J$Yzl9mǵpɯ&}`a fgmp7W(ǮOH<'lネܖ@Q3͔/GSۨʓZCQ eMlPʌJ Md~glX;SvJ_25zO{rkk_TQzީݦfݱg2!d]֨U7TuFrtMF(mKє "€9H;{2U٩W_c1r!P%ʉVBΉ"8Xus@ӸE.49!ƷKy Z.*uQ 1[F )Bםs標IdƣCw~V"N Ԩ'Cg|uB i*.i֨ s!:\㰤VX+\"</@GɕuizݕPD5F[mʀzmj-MIYoDʹ %bCq_#B@NhѦR((ҳsdBtvm' k'<ӣ<2h c,5βT]PḆͱ)NdJ=''c"U$H/Lo&pmןXS~aSA&<NeG"6aBA~pߑ^#oUh r@ŰSbkSk9@/+~rl TYD6bG,v \KN|zY>A7эHgû4 _sMZ,f'd}Ȑ]d_NǽǏyX,?D~+Ac%wٝs~'Jcz7]ًB%**$9shH0S_ ])?p۞; D_}hpn}B.v8 F@̀#iA b59{xBN٘4ACP_%?0> stream xiLMSans10-Regular8eV  CR+$3 ~~߸nwnxikhiJPb]uՋ֋ֹ֡ƴƮ̋|ymlklkk3:cPJMEk0..0FJc㋮vZҋ#J2PwDKP $$PD Mvͳwex}wwx}}xww}xuPW  To L;endstream endobj 255 0 obj << /Filter /FlateDecode /Subtype /Type1C /Length 6550 >> stream xYy|Se>!4AHx (*EZXJ6ݗ4m6ii[toҲeS(E #ܙ;wd/K[^}}=qL{f^; IEC/O}5-|EʌUed.Y{4j]^ q6'n*ޖ xrѳK$YD(1H,!b+1FAY.t :].~kc6ˎdzgs(A2V 0ZQ{Z&l=3JƽJ,+ppBn7wΙ{j^!DnȘ8 69߲CM67!yy( 3BgQV:T2 5 BW4 t=åq1XQ~wt P& qEPPg[k&I:hEZiV@us1]c*&CL{Ts%6R]5%"&aβD׿k|Rh %AQL[y~SE8(NpE(x(VCor}GNף?A衯7wq摒z:ܥF}IiЭ51-6 "a*yË[ܩ:r؋&T{ j Ui#tzMh#=@ tGmv1pCMQS5U6RiЫ 5U,r`lm#C Q2]_dsXRej*A WhPt4pOTh-J 6X/ᐠ//c0?z q&#bIQbmsD« 2CTd]P-Ɓ@7PE|/Cx߼#ҽn@K1qAF+Ur 2}.8g/HX]1f*Н̢@ L[)#b,*ٝ {_bL6mj 'oP8'g:b B$;-?b@c]hJ/7? F$~;6%ɉ H`#TB MJP jZB"%-2fkQ,_xТO ҍa/ go!{S,? ܼ[̀Ʋ{_@=& +rsSԩ*=*wPB4-5e۾m52,dO$p4|2P6ot:,eRGUmۯqC'ԗ{JORXiS|t-rNkI9uZ- qjwn(d5V{3zy9ܣJ*VV53@Z*=5XvMߧMvyS3 j`؆J) ^k@+,wXŹ|lIR/jϓ9g[ЮaUI $RhHK1^@NΑqeWm(HW & E^ߐ0吷~G=4K6YmAT qyzu ZųrQo.g79K3&Tu:FfNN3H0Ͽ >t36i¹hKljM;TB+tZuSw-C{pq+痁>oX5P ?5iNI?C UKvAc>17Y `-ux"F$:Z@6l%~LJx q[vUNI, Ql̘E=,쥻6ۊcK fZUth VK c g/ T gXlXp#Q{4c%. J6A0>71@E_K[b""-,xqISjc٤ori-%Hk 6 5僺D.0[-N?=<|']ܿ=0ôiF}CwHM\a7q7>Ւ_t~\{ G-cR2!MR;:MQ-Z^WqsJM:w= *Q\p|k\['B_i0c.0@F W=6IX9tzVpi5j(~qH{lR++w,n\n8Hd>̽ID=-NJdwmzb F?Ihk<4ӅjOTSn\ZFӧO(ZHD°xPUꠚ9d %7zP+\gJ Zb[٠FhYKT)+;jW:_V向ֱ;jTF"czR&Ћ[h쮄pi(4j[|]$cH#p7+0X0/&"oMI9v%#N([M8p㹻yËB>:p9ȃ~+#;_dNHHI>@ܼkATa/p׸*,-bMUխϼ?n1AxpyQ4_9cz?+Uo),4h*K6ilm`58o{g|@^znA"`Lmv"U-S'abꢎ~ӆ+ї ) Isk]{xO5ЛChr*B'5[oZSby~+}+4,0u9At++EѷeerĖVs]a7i єɃؘ<ܙSKYlٞF։}D4E@I¸X`é1,Va W綳G4zt ^Ie`K-a &o#%@EQ lI ,e2>pS .Z/{`8Xϖ] #X8B##j  M (c =f8)7zckbXjT>urI]ۥ!8kkMF3m#`hU:{mYMMɽtۊޭ&jJ{#ވHm~]&Ko~(_kDA:g%Ij1rPjP+2;xNU62c,?)X2'2/|bNS ~cn5{*1RM s?H]aѨ@bxߜ⣶k5e(uf*4w06*\('EUvSc?6o.o޹*|׮Ju>#)7STZ|xy {a{A:U_L\梟/Ŀc#y}c/u3pH`?`+]ih`,ke8.8ǧ[Riq36Fʤa \x;h LQǮNQre? 5q#*H7 QbawV| 8л/qKP^"^Qw.@߯FZ rid,)\QE2XOʃ8@.\7(X#7@';Ԏ2,PyrPMkq'#SGz3NsV f)N%\g5ASy`|ML\=N-Zsk+yZOQg" (Qy@4: B;o[eHYS-nmummqtfBVYS],1S/]T(ɰjkF;S =λ,|,o MJ°HAWh; (U:–Fn3B&MQmkV-O}A.57O:J)(c&fd;lN| p=E3 rT S3+|hܺ^1p.WZM;kq;[-hƨg5\oٟB6r/قD>&t=_W> stream xYXWמ̎%&#E3cb4bAA0*VTP",{GĈ+W&&ML4&?\}~yww{9y{H(=J"pN.{4'$Vh>jpp #x0W4m^ۖ{[u@ =2G됽69}en7{l\);N5}g}>K>;9s?7o.hKޣ'|jz@-&RI-5DGmާPSTjNSvJjZE}@YP3ԇjeIͦʚKPS먱`j5ZD ޢPè2jeNvRF.쩷)FQƔ GR52P4%xj2"(QRMu `.EVzYŤ17jP`!!g:=V[mvm)_4f #gv$SpOF?x&J1vUL͖9Ś5{5zh՘c9;w|`$d {dd\ظ#ǷM0ݸwU &ZÔabRKj@[b\遼PR8᷺M(%EEƳ*\ a.]9%yxV]\q؄jgN&jؤ6f \  uEkdA;(WՂ ^6[Ɯ| 610 P3>9+LG%VIK`aY%jl]%jPPIPMqa ͖ y6wrOa!wE@`=k' A*Wp`!;/(Vq,d{yD5W"[394W Bn9a$~]P?,O5,/UjV./6ٟxLn2ss{~~׬fd(Ve2 q(J;"f?` 菮+v-(RCSӂE3$>:6OJ@I~nuʶ!s 3lQ|Dc1S*15%N35˸!yE}R]_H9hB H~ XbUV>>ߑF/z^TI Yo`TZ*`~C1$eHG[L\*C4kA?Gq(t*5ݜ[C5 B<́)ؔǎ #b??0Xm \4v2vWJ2# nNOcF~0G8nh,TU.$,qn|s[OR}K Pڼ7VW$AKLne ((\ s18l$mwGԟ r~?QG<0] j1;S peGm[b? eԎ辕= &sxWW#g27͟Ɛbk~n7CFMc7b :}y JT+wO_g╶:}45Mo* M mI86ICP/ Чr=qvv6_M)0א]l]%Ցt)A0[TCj!XAðxH3Бn,X L_vjЌȚBP|p޵aobwhlǦ0⑆2!|U{J,bf0JP[rP(l1:1Kj0Oz=eu`wj`;^,`1krgaLǗ*cKg@Iϴh)B~G1l&dg ?=$7M 8ISK_t W¢cNs]qSu$MGҵT^;=^ 3q"p4WAb;` 0f+4 ΋ri&9|&kI+΋d[ِ]䈱zJ`I5r5lCjt*MiYN1yy(def|ye/B'Cv5xquuh//SLwH=(89t8@eMH@xC%E;&%U2H(>R6`]Q Gxl٠Ql}fȯC' %S<0|#Ea21Q<╉‰Y(. ΣeΩ eOQ ͵ކV=[6 ߼P?: #"-{eD$>Qݞ|Ls7:q*RT\}}Tf҇x0Wo> ɛaB{̐,u/ ;U7~7W4}|XI*k8+g;[Nʲ3x  1:{J}R 1!z]s9x'sqɂ=yN-$sm\8#O ̍+mLtZ|g ز9ϞpEsל_![XMGf[w_mVSp$$8Ľm&ھ")? FD qAGS]`rh:rGI0{:B^(2%<zb/ &xmNUO?}1~@< [&ͤ 1{s#!d[Z߇-0-ziZgUj h4vn;T}-(^Gz֭[?6HO4胗/=hH F!yp%bK`mj;VYjcTK_v?ܢ(d1&cEG1k{mxggi^^^^Y\!Tם1>u E xB4 e 9b):Ytٙzt?ych5\ ;:(<,)sNv=3<:&31L [itF MrXE[\7MDѼS~B2`b` wr?av%!7ŝơ$_f.egR& )545!72ŗUyvE!ܸX$1>26"& <<7y*[s3jήOTë;SS_R <՟H ғ6+K-V.k##} U$|K_ %iqEp:"Wr4F-(O :<&xiN/9վOx;5gd*S2pa щ!( / -nzu~p!m,5_ǖ}|`zMMz7FNճùPucP3 6QD?'s%q<_aŘUdg͟EJL5`^J`J~$@?-iunk. MF|(MBVzfOVç$^n"c=l0YDa+l z`{VxV0ރIwzONkzu+WTxL*:#EՉmT6+RQ5:UrqJ b22BVaIP!9mĠR]\·' &.8߳*9GIIbf/"OdHS)a*,xb˺) n%%%J K2̔+~iϥ>kY"9EX/QhjzzY74;fgW;N\ Nu*1b"/6yyqVYXٍ,;ԪKWxzzI(׸ԓ" yE:cexjPe1CJb+oIsӰy$Ywlqq>XT@.ʠ TTzNf/۴y[>vG 1Me+t, h_6xSUt T K]ʡvxb- EۈD ij˫Z9÷jF;I^ڥg` y08\jX~&#-' U2u ].-&wt ?$agG빍9[3ԩÝ9 ,66kވ8j1R &'_Ů|f'YxL\<iuM`0;aB.'"w_; j{Х{<^h@\U I(TVVZYCퟫrRQ?Ηow^on޺wy%fq2HYݱO=ݚ>lcyy>ZuCYQ0XyB0EbBx"."L3'N&dRJ*{ Rթ3:b:V(1w{^V,t%j__$/V }ݙBL̆I20Wa//[zrgۚ5EWިy!$x!A E^yKى+0Ӥ:Fq+ems(aPjj*Q oq[\刱Z{*7Y!3ԭ a}b <`뭇D'F\3"yA˝NlĿf47t#Z5 GCfELbs[OY@CfiH'K\hOR,/C FiL=8ȳ Ą"d,#cKLGjcPY.!FW sBٖ9TG| L@r3P!_wce;n=[(j˲aQ\XK 6yt(Cye JQ@endstream endobj 257 0 obj << /Filter /FlateDecode /Length 3882 >> stream xZݓ۶OM, d&Gqb|i;ZKٹ7)@'ىL&cow'e'%]lNNɃӹǿ|rvq>!xXm /ls{\Mˢҳպ3)eQr^9Ǻ{J+8tny2]r֯7WF}_g_Ow'˲pOӳppUHe ad&u?9MnNIAŖrXvZxƝ(0-+y~ؚr{VUT7A6.UML- 9Ϯ8?^2Q崈*&- xP᧟Ng Ζur|tì e_]S-` ANZz6vt&hY'*vUs͌k_&3.a^]T WKT2B9AށC؃&ItwKKuE?znNV -@h%8?,o ] RVx6n$~`swqTYj$Hq0od} ̄*T9|g^ a !H?6p@8<iaq8G \"zgs<.ߦ`]LUxBJOf%O=:[r#X?WW싶^?QhhF!b׫XղTSm m狒뀠oƒ :e=\&Gx̕stQ|M`Ղ@AoUbʥ'ip,`$_6W6&5kl5{\LA?hʂe `E:\bZĀO#@f/~9Ec1L j,vpkYCYi bˉpNahf q& /05IߴST#O]W-եH {{hRᾀpOV&8{ܿpO6FĞU٨+LƴL=&2,܅:.|8gUֿDd"C2Q|}q9 j;ǡؠy{܎q4U]!)#y E | *I-9}>_qsa\KPe0cqUa``/L݋a^S(:X6E?/3Ȅ,m~Z.Y-pGDX/ ҕ& )ly'AR"4>]p-1@P@3)]Zs8S,s5. èP;k*eԐl"E4Ycs6HE=\' G⭛q \͌3vf҆<N@(qBy]l]a'PN4Δ'J]9 &&p_mrZ 2@HHd4j /#,@D+H]J.9@sT 4݂( X }\?D{HXBp6q)YG`Erk ))"`|e@! 2i=~͂Ym\v\1|)蛶 t]6l(OyG.t^"?'~yjGX%1 pg$RZ2we2( c|{S }  )6I?ԯlؐ uUc_30fed8-_Mf즞8q_E|=Whdm\'Ib#8X@Ĵpy(0Pۃ)x>$|X6%HL CEÔ`KJ҉)?yڥ_AV~ꓙ_`R@bQMtwTN6d(i9!6O ?}Qq!0Rz*iqA%\y\[op 2ly!ex^g3grXazqX_Q%_88!^c _zNJ zQ)#텳, #k: 7(D JGiICdY6qyMm oBWW |?;qeTHG I;|uw6m;L13 n !/rJbkޮbAYFTG~@ΆǣeWkR#`\쇳xy..k^ Fx F!$ 4@ ;zo-?|RK_#v]q;>SkF}eam70H^  7⍆- 7pxÚ.6^},t( `=eϮŖ>֒Ԓ-R*g`K ,ŲM?1rXm:`m{ eҁIl 袧eOw5*8* E۷;l嶏be׹oh{I:>l6VSM\d#jA/cS߫ :ihiuNS 盜?<[Kl=n("X9QQ?)x+"-Fw{WK p zɌ k LxZCx^*]rLGi)0,/ ^ XJ娱+9Ή5x%0 -Þ; X:e~yz0,K C4V.g;{"zwmp0l> stream xXMSFzUA~0ә!3$$m@$CnFݕ̮c:Ӥt8X8,HX-t8:4tƖx1@Ld+lI(%e 0ޣgaZr9&DbL0 V;l cP5rfQAiLWWDH`$~-Ίe 湦ĢD{Td<Zasq^dxvr>[nHks&0a|sRқ(սˠ}B- :rAzq7^VY4dLa CN`^lKI$Ћ] b RYjT&X1EWeyk$Co67(R65Fk_ }Es_ VF?d8 *$F(;)R|eU\\i@Ak 慒`#ミF.bOL+`RhUu& ;T-WN%I]uvثmEZB2Lo#}Rўs˄2')@#:[ Gׅc$91&F&$,\"9eFKD{4z.,dCSxߣ˶iF/(wBDlU=߻)&.T' Wrư]0ΗjATZS~P/^\/g G\)hF'u*+si$CND"óVe[(pggkԿJtV=9yWqB?ZJzGr hS zmiet!.,7>*N;tba^-e]䪩PicUp5dxv )USqzNqZ> stream xWytS?iJcϱ."" ( Et4-NܙiZҁte(  U=ý'.*pZֺY#d'ood%ɂW.59&e*&%sγOOTb2|_^F+ @O|!=9&QCKSBS7S2nĤ]{Tɳ(j zZC=AMSORj# QOS˩Wg0*ZANޤVQ e׏ϑG(*2_1c>dgǮ'L8%pmH۳eޘmE UpBȃ<0u߸Aݲ Zy@̃Ra29t+~&h ! Htcx y\4Z>S) I6;uI&Andt󚺔 ;߬;^k +(-oP!ڂW8h?V*ض8s1}0Gۉ ;8ApF B77T7n~ eaW՟0l5)VNh.f0f">\W& E}c|,?ABMwn⍌ _SjX0L#At;Q3Ь2[J-=(h҅X x21X'X@obeJ}bqDʎ䨭)[ \8ov:B.Zg#ˆWIA7ъr +:,Cs+?e@tZ<`ں7-|4`&w&}>P.XcW' ts݃{1vclw0{*f<;)/!y5p7U릏zE]0/O59tFWiߠbt9R+Fݟ׿& sWahz0DEFȏcc"Q[UBtBD:+߲hvRJ{w3xv3EATk F*{93tupa4 n=JșNyCm?Q$;)񾔺-A/;"-ñfigET] l>*8?oSi%G7*h4j#LLfS\P8XT_[QzuŻCA)6ٜG~ԙZP Q[lvB>@^^TڤnMo_*h4~YV9&n0!+ZA!:!G9Nu`i&̺=1 8a0뾡nR6*AKQ3 ?fN^ȟÑ.HUdI2~OWޥwi0䭔*mui 1^,bsQ{ih6x[K/4!RMfb=}8ͨA2z/ XLm{sI5`ՠ騛;{чr[ZrowVwn{]c~{1M 3EkDouHqޛҳ7v,őfhW\M]pw?6 % wڤڅ?d2ZvI3kj2A%hWTVH*{3Bz?kD e(ZuM.F*b1 `s侜ScOW#?( U[>!-46ql%IOЬNJSE(s E&l}'N0Uמ8sC{z]|Cn. */dgKydh^mL}" OˁY0Ai0D ³ Nt^sN5Z$+E @1FA5;\#h^+?|XA\ZhHߟ Tg;ȰU.2?LeIY/#Cp+k3?Bص;ҁ{Ox"Vev7GY:p;kWp)BXZ˰}ԶRܷ9:{uxt.o*˅R!IF&*[!83[PiH㗏RJ4#HxmCX#6 &Q7wJ"^PYCy* ZR`-zИD5`rXK'8[K-`)مBɮ2$O૏^(8uH:ף>^9dfP;~X6l+Q \Drفx%."7w' 5,mBomBФi +OFt)tBwB6D2X#!֢ 3nGm++?n}Ao^#LeleLuZXAIeMVkSCCg! ??wȨƨӖjJM-TaibFL)i=E8`X5joZVl-v'∗̀iz%c=G-h2`swG5{vZȈ'{~egۯLyJ:d50d5u# `V`s%O!g~e|nw"~RZ[rad3 i VۗP̧ Ρ`TFK5gDn`zDfJP&)iK yb/L.3~$Lݭ|: 9/Ǹ6!2Kmo~D0 _ߒWA.!wX҈sxOqƛ^Vwo/BЊF9aRV^=IfL7Wu~oмjSN/`,ٌ6?w1LQZVR*퍴gGoH 2&Tnoffendstream endobj 260 0 obj << /Filter /FlateDecode /Subtype /Type1C /Length 2226 >> stream xVyp_a)(GJ )}0-ɖU} +90`HLC% p *C鼅3[;4?ow{py7oZ4#vS˜ 6&lMUwW/g>ޟQчS?}LN]-*z^l 8n[MVqӹG\77sZ֕44 =px kz~=] f u^@@*U>G'HW,! * tX@ {!j6]"VˬD¾1'%^>c {ƆӃjUnL8ȣh6m; s+c@ \]--ZTt\Ov/샲f7gBZ "?Yр3R[{~S4_*8jY 'Эohl= gaN:\]4X*kqEbg.uaZ/~$k( dI#|=9D2Kp8$h3GԛՖldY#q[|0ÊWN~5#ux@%!r :[Wzai =UxD'mF8Cy^~FyI֨CP6&/ ȸ$CQȕ Oӕg],m}"C{ W`@%f0* ͷIB=I`+}Uڮz0z0=e/irIRTݏ|NaP}1B#4W+<v8kCY:F&3(͇Ns?u@Lp!iI;ؿΩ F)V*aĀ¢o!Xeh19;m xQΧ6@g`xάw л*K>oa(\t^ = (Po 8$$YWS 1[NcU|' ;9"/ cVpk̫cEbKHK3AҾm"yێlR c 7.D,i+ߦ5 륄b/2 r`'ۭ٦9S!6טZ[N.\}Yʖcڑ*_ٕ1Zahfd:$ձ z;DWAv9q9pe\ )|5L&Xx 4FHJ:AOT< ۟'m1Xa4' dKK1CuOL&X@6CH᪹;v~ϓ]3ڸ +=nÒ);h5S+܂#6k6Dyly欄ӣ\5-($BuEGA.`?\4쯣 =x )t0Op~${' GOrZsmb&;ڵH}a]XVDצFvMUG[,6n37ʮG4"a3Sr`Qd)oʼhD?Z沊rWn0%92XVPg^3?Br|f:[Rz*]%6u1VaTDܔж!{uf9 qoFZ;F5Fǯdy$>?&yF%&959PVOƮ2 }-HxW VmGvdJf g/A_Y_$O^(~1 "-_bx.y6tV,냅SaتӼ|)mD\ÁGQ4F3dϤq@ h`u]AxGk_ zt %:?1B8Yy+:Y.k?2'\8) 'er+j3*MN]TP &YUyIїj2R7A ]srCm~lkb~f7섾RCm*ͼJVCAN*d7V-5vklGlEq;ӭGN3`YL{DmoKWQkx?S#K7~o拏}z5I.(*tTtendstream endobj 261 0 obj << /Filter /FlateDecode /Subtype /Type1C /Length 7252 >> stream xY \SW~y}.u5hںWmj-R7XT\Qq} K ! Y }ĪEKƵ"N;ծs_:_gFwͽ|ƅvļ#yvԯn#DO%nWq]Wj)+[2_Kss㆏+~^B Ƨ?9A1񭉖O7IF`ʿҊg7stJFs!dVqrMB2`~ˡƖD ~\8a/$6CQ œ2's<mcFCEcӆml_,6Ē @Zc໋c%lڗİVdċA,Y^#F# tDb5e"A%'vN>حVV;q|WF _w2%C\P9EtI߮Tamx t<ۈ&W?-Xa rB}Z=35m2QV naH0hcu "V 8Zvۏ"pN̆%ߔܽmd٠һ/A(P "ͶbCtH\ҫ fGdJu$N!wZK'v0T+Jq =Y2.,,Rm%e&G$Ɩ̂}<B;}ݿm`6['fC/٩|?L`e*S;Zd} [v\0MΌ]i"vKo|jJ } /74ضNO˝_5b5<v>DS8[oPG];1KO;\ܬ(ǣ_p4x[1w|sDvLVJvJ ړ>4rzWU $^{ksJh4MЙ*n2YY1@le6Js;{2,xoḠf[{V^J"n6t8vXkUԡ~&\|6S ?J龈vÒWhZ12uqa6Ep%0vÇ"7PR~:OvAyd{y0"0[G;Y68!xP#)aUP.F Gj)CIeAoͮG8GƯ膨#dS1o{}Pemk~D]dOqPQ+DNGc|\MC} م!N,OЪWԉL6 xe ԃfks:sc9aN) t41k&;c"$2Q8Wxex"IFTjy999E$i>pRLͲ \<.haX$Ȗxb]|tg2g;dH|!N4X]'wOTmAgŃBG!,Zl5BlTѯvOpRb?}T$ [%O$i.Gټy ;ĖG7[ A1h7 mqg==wӓ4M$ & j2>s2cw'tH }OnmrzY O4GFG}4|݊DmiF>sth}Ρn7D@)8Gd 5nh?ӯ|W"LH&bQɨ&!14Ċ,ebFZ%0޾\m.zUڐ2y-2cc6xlDcZ jCFEP_Xl0!N>\0Ʊǀ~_lwR֗V4T< %GIOM ] M W%?6a&f;֩.Y WJzKeY,FVoOAgLыF"2שhX :{&:\b(QC]^Hiv=x4Cts]!F-U|-9s<* 9Ym 2?s RNxvBխF\_)i!-*Es+;?C\D‡7-"B*A8<,#8CP)4]%(/7.ta˝=Ɨ;d^8(Y;BcC| G%0f}"$Kœ# ~v_-% =[r됤 S>{Kzn%d:i$0S9q%MtǑYfQ&cI6LIWiRM q"C4foOwo'O(¢)E})XZ43!wӋVrw-Esa?,.9?nBK >zRcOxe^{CkwHB ;ӧK@3>؝(T/Sr|s\Q^qcnϛ)lj PЄ!2;+ œj|Y.UTqx_Z%!<hUUjbl& Ԗc66|[߽o_兝\'\_"8 ?_fhDhƀ;oT0-NnHo'Dn(k۝5$֦POYMȵԄd)H "Lˆ&,DIe)dp#B"BaO4\'.!*B%熼2ER%fZRʈ@AѠgGGw[({Π+g ~%Gw!q8 ]hq^gL}AD6\\ őx!Yp-_+ΊY #τ՟cw{FgPCb؎֡hW x8)Ӳt|u#"#i1%΄K>": xqҨ$rŊ` Ake?D='R> .93h4W~nCGpۯ盟td4+MR`R!L#wu)T >| A{|brb bRyzImuCҠh5\Ib=؜,4b᳀_Ow^y &|k8nԚjs zl|ʴ%!R,1 7ZW 2tIꝜCudTFxRe5(46@y]9WS/~Vvͦ!-d4{XwnY ~p0-/P|#a-zyʦݜ-x yL:(N_<Ћ 1wO|x3 T}r`]wj-܅W3-§ •Rex7PFTg88uikyı}$>4Sm3ՏNY.OS BrJFI_&di >T0RMDS`Xt[W]$1%)M?`>Jjȏ+S7=\[f<2CХ⋤ OYLESg_:&â7H[z-<rjqo14J3󎼎h&ߕKM8VW D~b=}BCwjƭ(0c{&{A{MqC`A9NgE7#` Qx-nZwѳ ʺ/֗+EqV<Ӥj]Rn- yWp1'7qsa3lK[X|],QԲ+ 1=ӡ" ?]z/VE+H57佗^MT@;9'~+W(pem+WxjR,2<]L54ҬFqW6Pe1úmRx7RR|CHo A&5¿F 1DibI>Ai Uy8 "R""Ud 3 Ԩ{s2V<2rJ3BYȿY iP۰>@)Gg4`tendstream endobj 262 0 obj << /Filter /FlateDecode /Subtype /Type1C /Length 1711 >> stream x}Pg7"6En8VF;v<=|)K I# $pZRKkZm֞=-3ަw>=<>L.$*)9 /+?vႴbm.w3*Pyߓ)D_|]VN&oqFcX %`X b {Sc6%ݓd{俖ZAeQPo| I(;I'8uЁmp 8mc#`SfA2O sMZķ0HaCrV8I$ʹ-(.E}V$j'3!guB`(m>8>6+׵('}QV5PHtV8[=T2rGsm \uGs9]:avN&` XiMh;b; ٤e! %.Sd¶Ss:/}B /Is+r١hqC%a ]8jw!p]*w I _G;Y)0YY91pfKʊ]9}J&+Ca(qadC7Zw*"̄jԃ1P0ځ*޿_84̘j0凌F%056k*tUƢm@,{.簛&ǘ&Wo<[D#Lo{&q;Oب[H"rC!$_K54/SA;Km^3Gߩ t]ost)?BQ_5(ow@1dHjZ62tydi!8<ȁdBhqJU\!l6_^i+NuLw&!>w(ʢJMu=枺'n?$nx/&H)6L,oH>?7]bBYɿ+ !:HAϿ:{B 2KPUXşyAV$O .fb%C~Wd٢ A6P*z^P zh$Δ(u8F^4;'g;%7QMڏw~,H­1ڥ,Òa8u80!^lU2Xż\\Z0p.k 2{ts:C!Vsc-Pƿcٖ~"*8Ѐh+Qb0c)VMOeJWv䖠b2~3WÍt鍸sO,Ξw7A 1x8XZAmeu*j2Wa4BuBy> stream x]A EcƤaS7]hz CBJhƸ$̇un'}DžZMyZF:<@ˇʩ*֝UxTvz"\ͤ'sP#sZ+ zTmTf AnI(*YPW k%8otA@Uf]`ߗ5FKIYRG)dM"oV`endstream endobj 264 0 obj << /Filter /FlateDecode /Subtype /Type1C /Length 498 >> stream x]MoQ $™_` ׍u. PRT# hg.!30 #3IF0.tKܸ2%moؘ797yHaSVW[OV6s/ nߺ ]/:X{2&3C"qԚuWPص\8W([bvyucl(J_ǻCrdJiRU0Mt!ӻ0Yp(PJ*8%f XyWI˱TxwؓgoxAj$Z+G'Q9ytMač}N-s9t4ͤ(?g=u]'DbuQYQӈZt4͋4Kkр횦I4sq5۰A:n:OiTB]oȣ!kM2DQug`{ 9ہ=q(‡J ZSBsxa4sR<mm-m%[endstream endobj 265 0 obj << /Filter /FlateDecode /Length 3625 >> stream xZKo)B^;~wӎcKD + ^gФSՏY>tp_}U=eA% ףr|5887_OF~+,)lir_cCZr9>[2}ЂYo]w k-vd9/Rj3)R /xLA3ZRZ3[+(.J-b(IVC |eH?UX2f皖doU5>&@&VWN!)gP؃.qoϝgC.i6:(~\/|gaIW1\P5g1^$c>|i99tX\b8='KL`Lk߫k`U %7\)`H+wB·C2,$[~ "CInKg繆J.&SF˒ K'>xcC6T(>[fd6M~3c+0N}kӾUi0ޘ{  7#J~5zQ5Z/.f$ْ-Kf)dœWgb9yq2J&)!/&-tv) !tl2 X\NB䤻|]9%F8ͧR@Be2oy-zr(9\Q 9V@^W7 HKq| Z,|hPEUSX8'<xO VY)*OUʲ=^W4}ʊ^>0 e'K I2*dWԠRL6פdLyڊ%rҨ l7nYKWf'i۔ Hu{KRE5ѫ)c? xYF8HM>3Wꊜt}Rr; $%a+iNı{\lY څ#d n,e]a+&6_5A*SbZm]B"O6IEJ㫍6JZzp] :ݵw'ekg!k`Mo.wܻ`mO]мIq8GDp>,t, ٛs V@AF~@ܐy+ zQ{$ǚHak'<4!`Q5zk;{i{t\(;]#q" ^jY3i$o}/﫞%%T HXd@o `-P埆c"@lcΈBatg"UrY@j>n)ۃUNen 3;*M; lIz[:ؾ0Tr{*PQ v` R/ZF5gUv98~ˋn>F!"s\ "NW_ *l6 pI9pv33힇 Y$ݸ& 3p@P]JgH=1EslDqF#SB,As6U6j͡y^~ap乀X+gaAeiA_%ȇe1l;Z; kcdcϭO~ eף9r6 YA G2I\l{f*ދޥZ[0E cQbWGkŧdR6j$>K$tI BaUƜ!U >Хga2q֤#0H4}dh?tN;,q P/U}5TO[l6 RnN_o?'8of1#զjnN)èx\2WxK䋣d tڍM ,H,O:^__t7,ek=qLz3߮3`DGv@1M'I5 %ltx`zK7W"QsoKkb| W#vc؄*vYwMywZv64C<4oLv)xMJ45vLtעLYCk0pX|V ÇV=bDH4Xԡں[!zr&3a#Lz(ּ0]{mRI5.=9}`Nk)!- J 4˸M+gtNQxh_6O`j#:BAۺ:i&p\ SAE)k%.D@A85;@vqib@adqT71 <(wnl\O.cl 3ʹbJ-vxpmS]]GK8]`4NYٽ*$K0yuq>+nh LQ,}Bkﵶgi+Eצ5Q~;=O1'Eh͘1̨4vN+ 熝;[ ~c?@qC'&SV%`Q%KV~(o{%S\R l![]H`$*]{Cc}qlބ^ b+ə y?(StvVI`LPD^j7 x.J 삄y~:Pf[Hx) UwGn#-{+˞ X7 X&F8䫆Ta7rf ug)P~J 1vЊ g7\ՙR";&{'EI?r;-S V>@E뼰v;C7 sLAఇ32|P;/ 4z>YZ_$U@o$-h4s?g;@(6BrփpN4,c𦚴g+ێ5Da:P_Ed?C; -ֳq**LDz:pykqV~o> eA ,F n#1-V' \.n(|}TO+A& 398-{w4V{gw\aȐ3Ē. 7a&w0[Y7/N_ GytR~^8x怹j|[EAEs/AqDv<9%2{1n]?Ɔ/Z;ozjcw:xdq+Eױ;#6<\b& 5Lu]6wrZ]&gZ҈e di~+xv` 1 )F!*|zc< b!>Ѭ?? ; ԝ{&wQnaq.endstream endobj 266 0 obj << /Filter /FlateDecode /Length 3997 >> stream x[KssS =8l9U))˴ L.WA(:y3,)Yd强믻G?/,*r{T-G?rxyv/ Y])dB U&gۣB/~jRR#qyRS]]MǂЬ8O.FT'0\&]wn|hm*ߣh)vd;Fq,!W=.Tӥ8K)*8̇Yۿm{Uݯڼ +^r0=? 7hDA(ibj_S;WYFk 6| A&M ܩCqIEkXb)"Pu@*-!틥+L&=7ˍT T2k6/!I mrnI*@ΤU>˞[aMD'gCYq.>C&_m<̢]<(՗tlz]i׃U`E)iрb q%YWKlګK|Jb׷z_ܢQ8&SvK w2}pRw>X-bF\̿R᪤4oKu /]"d7x]W%%M`00e>ť>d\6ZƺmQQy%`؍Zt|ikUIKĂXơUΎ6 1«;kF !Y8JQǥ&$Xt_c ,knd8 Np.p{zp0fbة%KAqa\G:Iyưkk=Bi] ҅G*OpD(LdX^޹o}o:>yI!/>`nu>0C&Ɗ.xJ"(/="BK)C>dBJ;9c8[ 003 NZphyꂩ}&=;#_Z G8pnj%-cԶ7!`**0*!rmbu| p Cdi0‘y߼k ͟.]> @+6p69$P@Y(z#l@\څI칞Jw.̮g\ra@ZCFZoRbʂUFEUR?e=:{0A|PipP`34(gL13>X{j|p6$xX"M'#_Xd<Mȶ|P3izۖ`t;Rp4}6eq28xΒ&4iOTB $*e7a5M,cma dwvA_yAF)O:6H&'fu.gGFB}oZ ݘDɯ,< 1sCn 4A;Agf0Jǹу$$KfgH&ɰpiΤ#, [.[gVQ8l}ls_>[j<)lf xYtivUqs##T$wmI7}3&Pj'vl6N(٣SrU_.jU, ڀpni>&bbK"*)X'o6|ɢT7N2ogBϒ7mezAo'I6 ˴*?:`& fɇCpBj!LMHGw<4&|3JMR<ɷ8N)uOXxKz`= pi2F)/7( [Ds$`QWB xD*4!YB}SBL:ɴMk19ܰ3 V |JlA2npRoCUCmqb~2f4~3;afաzp4j!3d_Q B_ n]1 %k> aKK!? h&d 2Kt;tlM.y,Ij㚺Hz\&쪿.tyYoNO m%LR 8˝IIfQG#40Q-FU΁_s3ȕ?*PW``:8Ʃ-H$G.+]`F&Cu٪ 8y>{netv`1셄ϐ"38?VZgI@&?$  #@'cxlx#>d/_G X_?xu'ں&M;o'IځcIJU2"y5Aݮ>$ڒ?%ɳ/vs'0%I`_ .x+Axl}+`[jvHI7: 9+^wg<5ϦQbɇE-H+h}%JmEhۃG7xpypLs3&24n - _os0p96hqX orHR>2G5%>zYJ'oB3ᛊx֥i)Pяxendstream endobj 267 0 obj << /Filter /FlateDecode /Length 3251 >> stream xZ[oȧPVg$5@ (ym^rK}ϙrL.gϞ|;gqYqjTG:WWo^ZRBnjт2:RjtM.6(%״~9MŬ^L;ǟ$E}8iW&AF|8&~MeYm«F:I/wkdM3S0;^$c8Yc8xYr[*l}-M TdYF5/2!v^J_^^o^R{tA N/ɥBK=n{ Y@QQ0pUX\7y@6 dc @Jڡ,IUX=?ڈ\qYe0Зbq؃V*Ϟͷ ا7jq{ClYrtLA%k?< 6-s.>(&Tרkr3L%2|~ zUܷa"Ti|;@52Yp\&5y6 7X#:f ϷLy)ٽ4OlV)R)$']UW5`0p>sCfN\vMRl@AxDLB]YzcJ()␃+/̀1vv(V[>'Kn73q,_]ظnxknȮ_yC9HgY/xDsfmsj!TSbDJF.bֳ#(IjڧX" ܳޥ}I?Op;aRydXC6p]@]Bŗ ,{ YZJp7#_]L7)jgn<0lHw4Krl# F~ ȕVzP'!-2+0"|'!Ԓ7Vٖ!"wGD(> Y`2sWkjRW;pϤ!UT bJyށtOfQQ2F6ԛ:|@f8&UJ%tS hե8YPsOӥQdZiȸвt} SYj@}sZ[6*{4HU~O]m^"c eJ)n[v8~o4c^<[B]bsk%dț"G5,@XpTA6v;VjY-zʁsS\⃄2+L0$e*jW\6&b&"<:BXH%HCKUɢˍ[+5\CJ&ȸD^J=,ut]YNo!{s"յbE^ p9a E0g (ZɌ B"7`?r|U8WP]onmè4XDZL` $No޺0 C&HmOlb꙰;O9ٜ /~2H=SibKAԔiR^sn`ʑ=%p̬nT ؄snLSCm$cw6jBCm@@g]x+D+vZua]uE=C.~TJ^H0vTP&O19zyJN:^a5cĴx3[<]IIF5a;o'S궐Ng~%4?'GT"4`0z6 guS.!Y~lĠ*ce]wY!-*zkvWPSeuN0#g/^`G|s kDx :(8$&3? pnIΠċ؉]> 'hX]tR' `U(E,;q79hKrɽ,$>նvY(% ] Xv'c> :` Au}!P=y(h'_B'NϦ^o(<"';>2BʾyF[CHw2Hr*,Y pK`Sat;S@j,wpBut1l E]64JKK,G{Y Bˀ}<|ۛB h]4{q-fT̍Ɋhcf5l:sdluSLX@ptmofXn EejR|8 3! U. w{0 p~_WW\b}.%3 -om:(ஃU{t= ˓$ C ` (A?@nF#_f=~MtG€HO1(:+9ٽ! {rˀUsM3Ǵ K *kp6͆'S%p\+|lG@pA$J}w#/q}ぴtفޙ!;Kkq>Q8L\/]Zh3k`\A#T7 [amz2s#Wh)Qkz߾s =K)ҭ~yw36"ٟmOL{$700uOݠN9D7~~tfn9Oߠ9~xECZφDBOoJm^c*|d*)VV'׼7JzdLC;ওs|z(@nJf){Ye5@[ryP~r/DF|1Ǻ:ԄqǮǶfKmu!Ã> stream x[KorrH@ȁJ4 <Qlq۝v$ImDq>nOu祵AAc9ĩêكA#471 -㉯?=a,sT^/A7ɄLJ*NS5m.bund2 b7OĽ>ÿ?Ĭ/l l<t Bk|/7 kBYx w뺺mՖ^ݷ=1C2/zM|{PqxlEfxG^ ruQ K^jk۱ednh?y\$3_]ŪT:m@ˢ0 lIeNj:UN+ɐ_KTo/ReO_$J\PL{/KIv!ZxO|GZ>Xq`E4Jc`cv14#Է-vs8er°(chK}QT~#b_ c پ߻`J^xl){!y >E\W!`p-6zdV͍Y&mn YnRL6 &8y̋d*o=2FqPe:`W7lw#% {EޖHF<^1ཊrL8 g1Mɥww8P&O"UIw, gM@Ȏ![ VY ,!5'M'K۲ 5\Mb$%4 vqx\}@GD&lz lTH-;&,k/%Ln+dV&zx[yl;$O5Kg&uE3Ud8'V3Ya*3ɴ(Q +vDŽUְ&@ MyXs5 s W{ 29erMi:Gְ q~8L >i굔˙HCh>&hān}r> \Dd, *#WO86qžh)\2r}YBOIbg(X#Til%`OyeٛÁ@tNfs@$n0Ud@>ɸ[`[ hR1A*!t#`(ȌA3?7Er#ҞD9"T #wNR۪  iS'uf¥)5=/o^$.PΉCpiWm#*M)VR6 Bv^BzIHĕW瀘V,tCr#08u&"H 24ظ) C&,/Kn òWL'ȣO׷U}{GNU M`Mޡ%Ӫ%Uɰ ><\ɢVUΩ,ft>bebډnjj~0JN|GUyKHֱlM]5|9@&F 7Pv` c&bܐMeJa4 2 n'bt~MFՉpf61Z$`Z* 4.XNTF2>5@dƍ*\(Q"f0O}hP6R$iR"i98n\~ʁ9-T7<_ZڎDNJ]Ia v.gٝFb1?"Ieǐw&7XN&^>N@o>^Ώ4k줚wXg|8nؼ`+l}dNWTeK@["܏Tk@$ռ 4eCFcl0y }hݏQF6MDiv̿!;9oM??.=ZMXA;X)PXD<8}*6wuq􄅣f1jxi}(O.D8n8Lh9uNGZBF=ч /ru6GGɇM_Ua.F0^! S+JJ&瀫b(&Fq7٩`&$Ntd&흧KN͂IKV)ya9I7}uj̜J1ng:-88PӒ=P6 şSٌy[8!L=BM3)thMXԧ8)P*>8 #s9ke} F!sƓ+O2G,x* XczXv_r|aB]dv_:{16to7n蹯VUX9Cyu;f?Ǯoq_x07(3Zlޯ36Oftb˺"^i{&I 22C+r~AԫzǸxb^o}B)xs>'h5ft/FLۑ5naH*E|gyrǻ> stream x\sqwOsR.>ڷJ⪳K_JqE:uUbH.YVt74ff/l UuxYZU+_V4ܲ~^گ8a}x?<9Ojkv7w72ʃ >yuz]{aip/j61#%[]kU30a߫~'gӦ#05еR!\V?^._o\v{`HSO7err)-S:LS*DUև5"o@Tc$܀DY(F›h׸~wY $._N`ǫi v7$[RzE}ݿ=tHo Ww77 $.1MCz40n21 G-u3\7~KLڋ_և@7~F.~rN !SL_)> pHd}C%܋tVr~}`_?EbZR+^B}wڳ]bGu ݦɪT=]-{nc,[Z@J&VÆYMVZ+j_,zDև.AM _.ֈr3hUthlu`}t, V$/*x( #n 0M"q(z_6r Q4JҨt?S";I݅z%pwqoM9`jɛV-:c@̌>y:P-t>a3vt[!0t הTRHM5{tЃ&!-2uSQ|hϕqW,GgiH5ܠEwtd}!?0?Oɝ >΃\Zz*j7+L6q>!ҪHZK׋{o\AXZBl)ja hpt\~ .re\mל͊,een,~jcgizAH) # #Ռv薂*M%FP96xR-Z&C:6.plbelzcǽVSQa~j<4K7StBߤ%.4t*/1a"Ƥuiדkm, 겘m tJ,ܯiyD'譩  Yu] tׅTw8*WhdRp8N%z~@/ٷMoL` LtI[A6Il N$[!@Flͤ[sak&_`Lq0 .B|y9d{1f: -R]؞1ŠMe" K 6UB6`^6 K1ĘXC&x%^O\OJ>J-йn|n6v .r֍E(:zCg hV:rF|wK ?4 ܌hk{OҲmڇ3DMt6 VJq1dKsitK%,U| QMJGۣmo7qd"@0H FIwq֑Ec6>#zOx%|+A)AqZ<@H!myP:GsPdԁR8,R(hWƇwsX0Ƙk$yYx\?|_c`6WK-g!,Z!cZ2*| jg#,FG+x3&B6ʏ ׿S{tݾ#DgȓmQ]FL<_hhj2}+wP"O&咧x ne䙩tn=`Č}ӈ$ a2ܥtJ'^w$b$[ eD!nNn5WެK{ q_&s/E}iM=D#dw 9=*D8Go.~c9!(j0lG+caTF,8e44yfdmL]n!1"nЇyzIKR h5-L(GbfSbqs8q5H, {jC.:.e?hR-ܶL]Scе-}EF^!j80qfn)"D•O'eܐc,i [6j~zadm疊xV[U3n:A㠏h@P _ b^_oiSS!XB˘ǩTwSkB(U ;0sIA8 7k`jtV|X]fSnd!^(<@PXpF7i&S C֘>RPܳhOe^eCy ڹ拈SyWbHBW.{IYƑ׾E'=(u\(=r Ko*_EY80o; !Q]V?A;#aY;]D&|*Sj[lsuheѭ6-_j<!Cv$>kƛ/L%ʡO3-v,9ptn4ڑxQtxZo:,U S&(c>%׮kcfYYѾBUA RVbG];4!S}x3|xwCkq_\Q"l]4@Bd%gkߦ;/RS|p2XjĞ8l al*+DnRaƆׄpۙ.z`Y !PvE?&kp;z4<ΧK&?aTb׺"#]Qnly6e1. S350UlãO*P%'QetA1Bʯ|M~S.kQC ?MvGe-l B7}s C푀&/hh54m}{?xSx])9 ӥ֝/~ӊ&kՑG"0=o "~A͇1N2 Ral`X w.ךƗ||>PsIf*w{X̐Ù3(\uCi'JCamM8m0x*RեՖTK3AWäj-@ 1a1vj,R1st~D3st"k֟<:OC?#+MpO4XpZajZ|$KQvНTCiz93$'j-.pYg ,}g 8|v?B#z/0"g%{7Xv.(upSpkcy,q${(KN/pVhn_9f{&lqPRa r[Te3c%??T+QVh5v(JhQZń&個,ƞ!][:)r :,n2>DnaM`C{N38}cw[?iynF>ZOrde(êö5(|ONf2NF":X׹ 페3K;v og-Ui FCg- >URg@XA9Oo[f dd!)$#9PԱޅqW7&0ByT6v(5baB9ҭt^cX׍pucz`w8BV :0+1]F&9>7S].99.2nYOT4xBQц^AS&Ze #݌OXz q(HrNJ0bk4y,Rև\ׇ엾c}Sfܣ˯} fzgl #`ц$e/=r܊Oju]ZΚ߫a3tG01T#8 'bq +IG j9e*y(>Pf_g "G}#vEp 8xX۠sM$kfr饆I Vzm(td\QQؕ g3Pqeі~^ oz{y{:aRf'}^?mtendstream endobj 270 0 obj << /Filter /FlateDecode /Length 2894 >> stream xZM>LJ9@ðH"A <@53fDW$Wp%Wi&|?JWwW8] ILjj{{e<Ⴏr'FնaۇuZ<5.FJ*zQEkfzu R//F CfӾy7ˣ!MDH{\UC혭yt2Şq52,}xsg^qP9:$tg; %HMeD дXYbcj &WS@J 20Jq:[| vⳫܟNǵ4]̜}x3l.YTUʘ:`ش Xσ7>ĖCEhτ ځt]j0e]*1QT׸fL]kYK+M)LVts XŅMp+ pN0Xɑ!7]vW=cjt3!Gy&t]:2=9y!2"}jY.ڂEAE 2&6Ih=r"]LPomکk'%ھ4ط EhE܁u7I[0uՂڤd 8a,F6v|\qr!CE]RsBul{yO-YPV`u1MhH5^_f0=i/2ck<=;@Pٵ[N{f}z[_O"ŭ545l{3q= yf8 Ǯ}%[lv,@tc }P~wiЪU|4btحKa+T&>b}6TV PtiW_֛\Ϛ#v3ܔ3Qie a,6JVNEUrQ -ϝN0i*RX0|a{Y0|ޑyy"\I$^9%p buSjoju[5ΙZhyKd 0Hm*hךQ'f%tsZ0|3z$R&tlRȼP%oD޶W diǘdل/[JK 71$qQϺQB4utk/jHfG==ocہ*ا-kM$|ڷπǿkYpنGrMbf!Gp,w6|@}XL0w7?~O}> stream x\ݏ$mO`i:dP 9hwn㙝;?$?fAݳ,"d|[=\*nſe.,^T5azi9UզMUOG&V/e"eCٛl"{n)˧XKj&j-u 3}sؾz5Z +mEХWo_5aOۻq;pxF<><6m5%2f\m]1/]ta:y]hC`mvO/V{8ٺ͹0mcmo:" e\ CV{|v0iVZmB=Ð$5M4!.6esXخ70Z pnE+NAȫxE1$zW؆qzD퀞"zDDe"gD LĊ~et/(ɃD״Q]/~M&5◵j|Vꑬ bk$Y uZD9 _/$jo\&km#YSDdǯ/ʊB&/ _q< #%fxI}p&},nqϙM"~䛭2r# `rnۇ`5l_e2G軽9<Pݙd'Hu]| ݙDT?xa;m r.|;&jC20p~p>)ͱ7MD,SWu[|R8Xy7/(~>aJ M 5%yGtgftG܁c0\Mosd::ؐ) ApSvL #piUuAİ҃yYq<Dٶk|pTȱ }ϸğc'2FYjad1pC%Ru0[m ͕jl)vE Jbݠ[cr׍PSD:q%!nN[S4 1)$< |.I<79^^]̞D>=#0F`-D& &M,rCV{iFaB̌Jsx^s77R1ʝiȸ7Շ#/fa$b[ȷ*-BNV2Sp!,+ Ƴϭrn={@Mi}.}d n (}&\.M;,m!Nɸzwc R~d9ݙÁ+O$Q d'^$ĵxAM'\>h-/#]bqjmѮ"% ҇ͫnVGBR?&C1EQ %7xLR@!~V"A.9R^4-$yQA`+<첟1hgvvٚKH -O sH`8)lLwY46HX rP3Z7BE\/MXYPoB2?u>XIP}s< -WL,Lnj+72CdgU KH<㩡f#RYpR# },^@^-3Cv*-ǃ=&(ryW+\*sSҁGz.!7YIXif!scX3L fR2ޙYes1C;tD?y1[f D>:r(2$89.;y0svSh*Wrh:%'|Rd5mR '6JL:&r+!9O,O4;W SwuPY偈v<Kl榔~O|ah&e hZe4s݉rٝ)|$%ۦ k<|5XR,lR9jmk' ÀL*=*QJ,%KI]Jiq YzԄ0~ZdũWK@Kid DLioa 8kz c#|ēT>8H<Į:Y~d2M䑽X"10 ʞ1ZS+)_CJC1fa5cL奬 ~rf$̋tB4?"oe\YY=_P P? B"OP I`hk7K[ZZ}u4Fr;N(ӧK֟*|Ӫ菤Za?)%tNoi $PB)WdSϟ,̶%LC`ͫn|ބ*>O%{7cSʙ-]5x/,KMA2΄11XT3^0.emY\7Lz+|af}ː7θak5QQą<1VbI<{ @St,K߂M `BTQ'Z nFgi:e [檺PXMR軝9n+aiBm_+]#x=âUMBlf>/F%eԫ9OVaڍ^/>-`p >r܉Q:\fD9aȨK* &k3<]/m ɇ8Z?q@v-ZJ,+̛ FΟ*/GqV?VYZne_%f `SZ?2 (J$<x5x={S5qJ`2,j-<1 PNB6~Mq9)뿸"\ѫcp x3Rhrhq+3r78g9D 5e/ ~O˘溨}՛qN/LvebK+?۾%Z)J4B-v\.1 XVL3@MOrE&j+~n3_96O75ܨzC^nЄ!4\Fʿ|H-JL{?V4Y\x11JP"MfǍ~F zlo3K61WRt)0*,qo6)RI-!P]] 0(7m=3<^7?S[ǽft6/j@(X)_L&~cmve :ɢ  6ȸy~9% WxP3r&G|u͉ժF  7aa_vK쮥N`@3a0]&}dj L:wcMe0ڌzp'$W^,]n7[Ia";m*Erxc GlƤHq#/Ago;hjjhETNfBQf,QA~&Uqgh= €(6*DlJRRn1xs ZR->jsVE?M `byʚ IpgjlmLAQxI,C=!:RӒ`fXۨx >Ƭ [Du<ߖrTY#*Ty9GMo2W̥SǴw~8PXeӅ﹅&YE fI 9pӒ%^n `XeFo^ i!1-0Mxy;q%&IǗWWrYm4ݟgx5ƘI$|LF`=UvMl]VC m\ 4z?0||] ce.>8;CLuw {0<0F~| fJL%gf CbfLt@s1<`+@Ko FLN`{zPzXuƱ%kAY1c.Ɩgad#qp|OOh8n? OK驆 M*yf Z>9fendstream endobj 272 0 obj << /Filter /FlateDecode /Length 4687 >> stream x\[uv3N]0TM#fn6nBJ_A:į(t޼76{SդǹE<36$<_Avĩ8-Zְ}  ݧO;vtanTDh8D!j ~ӿާ8a ^zt A4:<|8@~{CWW(%\ !Sid"B5|H鳑CeIG H5)iFrt2P9Yɚnw]: r,HYSe-̼ [~̀3 ^B<-b`܃0z>C`fT{|wv'قQ"pLǬ DӋ_j%hy96|||t~DD{̬@ oKKʼnU]/C|3&鲘 p0b}YȬ#8{-Slg?Op٠`3A[];6!XԹc WtheD/`]IED/kނYzb<28kt\okQOrKz77RȿOM/Nq_9S8| )$;:[ PU8Ci9!2rz2oaM6͈H2;zhLc ԭFu}NSaZѱڟ"lU4hqY iiah5J!=B|i&Bz1n < Ղ +jU>~ #,3Z҂&s#H鳟Ext{sܜŔb(gkRKfӦیM?0or>o)kq`6i4cFN'(MHN'i]~_4_P_- ~@a,햲FɜJ%RN 6w~qiec_5}8|L+|l kL TV!@X$Ċ`@ 3~ƀ*-#mHZ(#}?k‡{Hp`ۛ^00L-XIseR$Q rJ1cGsauBqQԝ1!Jat2͉78d)8Mۻ1J܆;06/0!g50ZJXdhV[2O #4Pv. *Qi=.r%)Is~30]̬Jx1r&v%ʂOSyjr H [2T?>ϨIPZ+܂6i]X(R4%9AJr͚s9{81bi`U/p6Z;?YR\ WK pŕv@sU2U3d1%; aH->tyl}.g˜_*M+g(~eйF, ַDbQɰ*P<>lb Jd!$#*5gHB`Vt{JtœnHlu9dǴy?3{,!5˘7wɳe"%uxV91yqYvJ\P:JIDj-pq)xf$ ?nL?MHLI9 IU7AVx 7:Sip6{J] C@ Tp`ZQg1AQyA=Qv~06CubE}bc~ 穚) $mlmT`^g>8dNYnhq]# }AEX8ȉY "plky{K'\ ;wfBuR%Pc<;" o2A -O@S>i}۲si/c!?,K)lX!*A~3AØa $9a*Pc?oeuw7(#eT1n-4^sȦ ZZ&Nj|O^߸D )KD fmD}km a.MU B])WںRQQ$Yb1/ ~}7yۤO~DY;mye'|C324_ϧ~# `kWz\.rÉ*tT+;PLbH-?FqJJPEXcVs ř}f簣 WHwA U La)u>w+4p],FӹQQoHflJj>wqaw 9L;eP⎻Cqδ3/G6S9@UΌ^cg% 'j'K7DjK)긢㶎DQ{i pf9n ұnn5؆vIuυrTP-^%+G:GpB/Z]%È#SM@7śL6B%yY'puFE# Y}^_x If?mZ/LtptVuOg]|K+ŞQKSKMKXq곅=s? ݥA.@fȣiS}wcG}_tH;dZTV #mZ{*S%Eb8\}A_̞zv[{x.jgȘb&Bkb2#l_A(ax1++꟪s{ވO_ufii@jUfDn{ C "@M+apoTԋUx7)\oXQ(Cy^*q!{t^I /VŘ\{ɿeR8}\=RLV.Q:cZX _rF2yA ҏ4vakԯ"aڌIs׌͒I_£bja{#\xԊ]ɄJLYE?ޚsfhڷm7Ņ]% l^}g ~,endstream endobj 273 0 obj << /Filter /FlateDecode /Subtype /Type1C /Length 1117 >> stream xklSeO{ aXm CbD>H/(0nntkZvnmOO/+{ e.l,SX KF"h5F¼{YgD%~xߟ S1L;EE5ֆBk1}֗o9Vc^zQXZή%JԆU랸ۂ박gYM1'ġɆgѦ(_6bbz}a׸@ev8X_g ]U=pƴOwwPbԭ?тICgC0BM8l9@$F#*ai"z @̓aPUFGRɱl醒XQJf_qmajqrFg;`w`=M̦h${O:jX$CS+^I@ng%GF 2#Gk#0CdWne.zR-=s3պ絯}ӕs9ʉfvQi 9+/3 #PgH|MZ;CߓAR KUFC4M#\jݕ ٓ1ͧY3l}V|葿lp['SJ|=&we 46F>bX83%7!%/W*(MaU wW yiksut|y"`0@@zC@4f,w(O!+yG9U֒6-1mfԀ]n_kSaeԂt8 h 8Kb}Ԁ3|TP'> stream xZKsF{˗nD&rg8:JуŇ Bﷻg̀ eg 0~}˂Kjn{—oga .j[xn(RKKVjXJ)ofď+ ?)wusZ 縔85plTS>Ne ϯgL/^~qW,?1s&++ofZzӬ,Lըeᝓ7͛bJ,*a_lW$ 4,wj7p¦۰Wfz*]R RM:x/D*Z|8غB97^G?hRe ־ _G?հnng:r7$e˥3^kn#ݦmD^QH' 4ؖ|A׀T¨:XkM؂;.b~˂8 C:\MTG~={MOkX|C o.e 2o:!.˗gcsZO>B¨HǚHVvKX 8ld8qIB@Aq/Ky!dwpgQ,N_K:J ]*mYNV7 W5"Uz%sXp; DH~|á.Ac,T)jc K~dv\a%<8}8eXLBh`ZDZ ^Rpx,=ʃ0;h14tI5۷ hBx$<@2-NELevO$_9j A9L\2'> G"YK"JBy'AVrA. Q"XF";y9 + ctKj=nB4)QUzfh6[WNv ̦68D%~Ne5{{TB.lJpCyE?16 1MĶIgqJ$1]j>Zy,& l`;6 |D?@EQX o#qqVu0@@K?I sHdaqBB\D"u3}ZQmNyO:D>rV9 JD ^ΰT>7Ұ$ Ue(Xz4{ ! mי q~pu~&Sy 2Uq /R` =vA~']q,^G7( %@j)V2UWi$0̃q=R/ܪfnji)1b'Jp gJ|_Z:NϚ[m$ ![nW4mgYRtfKiSoN!{LLuyd@#ؐEE4'QGaR.y2("!D[z">UAɄ0Ȣ@1 (Q?6u~*|MLᓐ ux@ VU/`((qX0䞥VDlP_h6U^ hb M\&q, +z0Jɛު4$y:G^ >sUwl<<!]j]X #*!jmSD:'ǎGYh9bqGimUøqv{J/\y Q*Ӹ \wÇ,.vMߺ+ׇ㞔l{n\QZLЮ ńnoy I jWLrIxٰk85%;O\*lPRfG!+BhP.Z}DnbfSNdĴj$ YlVs.2"۾r痋BN#Bv4 03 }HZ!;nŘ0@E!RHFG5]wZ*Yƅ+`r@@Fv&P)PZt#2qA A+‰Ɯb#$g"IU=m88iE/&cJ^gF_BJ: 8uUMT8 JvBO5f"ƕZ[ _2}hnލwikY8Пb(ݯVt'y."XlT >7&n/9Մh.ɡǖTچv(tmw_nw%CF%%6*WK 'M|AdOKb(5!MZCԬƅRi!G{d(@ȆY![%єhНO.Q6IkndSJkyIŏuzľhendstream endobj 275 0 obj << /Filter /FlateDecode /Subtype /Type1C /Length 2308 >> stream xVyp]YֲgEv\ 2 GCZ:H eB/0>d[%Kdi[|ba&ǁ)a&M)!%Iä |Τ Ld&ggwg{VAQ /^WUBU^IιYVV+u ;ڦL%d'dL@{u\<,Y^2Z~-5MMP/S˩,Pƒ/RT–67er8*@7S|12A4{P.+>J[͠@pVQ=-sdqwZۢ:O`5.J!{aG' Lh$ZpAS5eP'݉jpf`&(mr5h ]N` 6TfC'TYBL=I`@3KyS ҠI/D6tGj=PI>Cef? 8ވkF> M gbTJU%7@1(ohFN7O`3`w]A&yTᇔz_# 2#nRhnМM=HBr'zB94*pkɛOтĹe'tJ8cmA0zY/V'9u3Vh_(s?>U/ߘpx} (EA A0h,7ԋs OL4h+)LO3h)<?>A~ЫDRzA<& ؎N";U]R6gi70S{o񅠅ywMO)zK"!CHN#:*[E} kn0[s(W.YY&#a?,my5L 4 i>F88orТ(J)fcf^kZ`lVxoc`-尺"(c>a(Wc;a S;y7?<}<>Qy{?z mRo/jَӇ|'9ߵ{-ҮMJ<|$6EYʡOc 7 )Š{փ\ ckŏx z۬P3*F/ (e8W t;dMm}d_ f oA33ǒLjtV KNAhEۗP AqR|?[v<-S\?CΡNYg->WfOMk_At/e8׋VB/6cxY:0b8tx߉eR yQ.eJIGTmϨܳnXv =$!(sxG+_Q7y\f7W0?rCө3/E|muʷ~efOqj=x]DsRZDד]IT KZ}h:NxvvK DIxpާ}dklG11Vl0N PHvv>\]?zYWĕ 5ѠZ> stream x]1 EwNhƈ%]2^1!Co@-zpo)2XM% FgBq?]8x_L~m=5R~"5U[=0gB*֝HUI?Ra:ה ^ -XRVendstream endobj 277 0 obj << /Filter /FlateDecode /Subtype /Type1C /Length 224 >> stream xcd`ab`ddM,M) JM/I,IItwӋeH sKJ2SK 13032d}_*{NƟ%DvV[]R{l3v/[PSf'@! T[mwE g}7;Dw~ -ߴm'Nn9.p}}&L3[endstream endobj 278 0 obj << /Filter /FlateDecode /Length 3567 >> stream xYo>,>̶^s(iD@l(*Vk:{KEv~}orKVl?~{ƯeA% rQv\o_]>{fMaKK3et.,|[E)%Mu\,9EIWUZ,r||䛺5,, vѯIF%r7mMMeYmËыWo/f?:IA\ Isf fۙd=*-̬1,ʘy3-kdm@̌unxqzpa0c(a:EC׫?>6W)RMUoS2֔MdIؑܖ >+t\8ݱQNd!ϞQ`1ݣ j( _M KG _`+m x <,uG} .d£}9dAY"Wŧ# t?!!޵VRX* cOd_sIAd &e^đؒ`=RuQWW*etR>6ͫ=0zu?F#Q7 fj !GPܚ 7)EcɠTYAyjj5a ͪkv( +c}lή_vusElYr, kK?<;2u?IɇrK;XR7J1rzujB cK 짒 aѡWJec,}OW>LvgͪjX51봘oVE&tHN^_BqK!)97EdyHfX2 *#)YM]{A KPXN-%n*G‹Re/AzQ1B?^I=*<I.Va0{D|-f, }UZu~x;L倄@7(L/>ږR1LE9).HIeMdkOR[tU F7|mlxԀj TRg(vX*!uX+6gE *]bE .J dRA`ݣz=~#AaMlבZ6~o#҇שjP]ZΤ[yJU[w+gUn_SV)ƛ.m:@FD%9A훱Z <>&@28Vz{!-YRZa΍ P| .Uݦr S }R!HSRҹ"}-SAjUHyNAy3cIF%/6ȆvN5mMvхPRuR"zGO&k\M~>hJ&rfP2{4p0OUB&>Q~OǼGF]Sn'AB ݴ9xy; QBHW=$$l$4G:'A;l4s|$p(p03s&溬n_gH*K I~:$PG DhB%Is@߆JM@OAjXoiXǕ$h"u=qϾ vH*ׇX߼]c/rEzWw Q ^2zyyMәO7^ɳM`-Gqvלc\C*&(mǧ=}ZEHmIe/9#/B*.#֐Kz"M׋?q!%?kK#h0<[ !<#!K6!VO 3NNMB'ylvop P^Ugŋ\ he7gA6uCMTM], P`8ϫ(^!I74v-gW ۢtĥ6|BS[?,1 ܌1VNmLbSxr Ntmi!tLx0ڷ3o>ab];v~Nm4eAߖβKWl| {HP\z ,5X/=K=%X/&ǤAf[!oZ:Aci^]͛Clz T(`|HT?EIaŗsq t(cM$b.)Sl |dXRc˾̼JA|^̰1H\ޭ1Ah8]I< Ta#v`2)) c^D&[,O| @,=ɡnW^R<,)<dL>ti-fȶpR)XaJReN'0WG`3$Sٍ!-S<fٻ%y}h| 52De0T(>P +}1C~K3qs$zO<I i>iq΀PE6>TV]:>Eg Mqf]j>Ɓ8ngFwsFw9c>^kQ%Z$3ZRA?#DO֦IB$ &h(@ƃ |?ně7V,1셈d88 -Fo=s4ȮY fU@3:g*o)Q6k.G UٝgF?xk*ɵO$Yt줶QECb\,'xs(4mS5o{: 8. --m4endstream endobj 279 0 obj << /Filter /FlateDecode /Subtype /Type1C /Length 419 >> stream xcd`ab`dd M̳ JM/I, Jt+Sen '00 PEjgJc(~?v2OtQiw]w^XR] daL@[mwE g}}̅݋3 ;0tzr`阾A~e=rIa:Y˺v]g*{ZG\Fۖ;QawvЁ/|{%.Gwwfnnlߥ͞u왒';;߂ryl 6\jFyr+[nse'\y cTY$ClYN^:KH>0w݁zendstream endobj 280 0 obj << /Filter /FlateDecode /Subtype /Type1C /Length 540 >> stream x]OHqJ ա0AI?(4*CF]gDk"[\Ƿ;H$Pk yKt[],Wlb#IQbCk19/%KjXmozApۅj >1vdN۰p߉; !s!FE =i`5.yl7 iSux0^wA w|jj: elg9xo>&L9]م ߜ -;qcRrzV8hLNZBFu4 S']u}&r0PO o+p^bNiX~r[,R E^nC'oP!X$5LȌ6ol,河icnqU H/endstream endobj 281 0 obj << /Filter /FlateDecode /Length 1959 >> stream xYI6WFt:f".@hh9(^&jd#IGJ2IM`;E#Z[ }b{o,6尿/|BQLK!?fb7$.).Ks E%j82/BՍy+~H7E󡻵-)EUP{{V] ov`%5h8]߽pQ]X;맞$g 3c^]|lg0-hQU!4)ǑOx~M&r8$(!a<q{#n/2 mn~p>pa†w-W<@,\G !ŻjR7_y(͗ ('QSc|./p&!%p]fEs%9KCĎ AE Ќ( W:y(8Ug6~պur(=\D9Wfu w 8M=;8:y8fTT~pzc 55fCvTJ1<\Q~TUb*[]j,&X(7aTx\ݗќ!)IFG)Ɩa.up,6 6K VwA͘p +tzwnp^G2Am<. `7"}z6.SnR _m{l3`HS"eQn 9 ~knW؞5Ok10G1F;YڝHەRU$ܡ~m?x8IUMLnB@ս/NR70oUio;b~u$BosxL)8EEi˃~iUk@sL9f kGXZ.=xO{68eG)`>É;؉FHD5$<8m=!F0Upf3FUw#BX@A+'qc#}3;uϊϝr8c>i>ipb)vNϽɖTUz۹PL}67.y"q9K7=0/u1X1z0}ɏ ~*р$,~m=GwŽY4 \[?,]պN>||:xىܗŨ(!QА02.@|ƽ\*^%kc6.$ޟC!m h ,ë{ l,޺Na6K-\0lXNU;!Gŀ4GnI#> stream xMLpmbKK4^4E 8Pn]16V۲9B ă1jX$r2&& zb|oy^i0ϗgiFӉͭ6ɲJUUKv^ z]fyvcvtY0+.Zt*@M+9*}ĴVUa\!CZO5Q_Bh' ֫REU5yT51[USrdn8.e<qvgm–QwFzP0(7- !].L./-"f*L (_\: dr?Lc.lh14ivY;?xtj'?SV瀈ITF96ΑA?,{* apG^ IYDٔL?l]:B7@˵ u@lH˜@=Ae!8x!xM=d־[j!e)/RŐ3RbGF>M|&8(ⱩiQ(FW-$lcGd.V g%rһ]E>;^1'1endstream endobj 283 0 obj << /Filter /FlateDecode /Length 3548 >> stream xZoC"O^p߻v "AZ "I2{()>H.ɓX\kg~3 7b~5e<9̿>} Ǘ s(lnNfvJvJV8r.RA F.%TNZKWU8E7k#,ڒޛC߁tEAq jFZ_Ͼ9_53A}sfAOFpKofz|T֫SzRiJI]Wۋ@Ὁ((|bX* ݬo3V ;kxGޛ^0ʍȄ;mƅZI :[t?e\2vT{zaH3M9[->X!pljwr4}ʧ ʵ){DPC02Iy[&ׅF{jpi]<;J 0K,xK!Pm?ؽbSrc{ፁ{T7dn{vCOwXN~~=I^>(;/eu{,h{dsd}V 9ś߱3ʫ)]jtγE׮iIBQRaNLGðQ]Sl IU8C i{ Ǫhebc+JE!5'[ Dt&ɳ:fn' {$šS>XʫfpGʔԙ䐫#~Z1`we]PS0ܭYzOAozMXn`ݲ.8~-,?˾sRRDVS c@\H ̗V7lJ7JIQ;=\mE- g`\RB^p` IF;n)/fT;{F` |YU u&`fn;V[a ZdL5ӑ.\A(e;&bQ?f?&UٻΧM\ФjƝhaTuf{qY"R| EpxkpgNѷ$ ƺ#'yD/\L;opжfd4A:b6>K1zSqK N6}?e !/$f!KHooQ;rml0v8?yMի:dдq6ʜ3 #Pp=n0LJuۺUub@6 va%lF06k9"8&ȋi܆qZxd=57KOJ0ny'cZTً! 4q1d!G^5ʨP̷Sc8W@Fk PM ?5;!PqL>g|8v8,|&Ȇq|3@aR^  GQK  tUGJd)#U1P[GXSˡzd ~ $G& 67E+ON/;ԣ/9Y@73EIj=`k-:GҔ?M I[ib D*2 gPm`qjbuw wÜf| ;|ՄWpLF Twvc%TD@x:y?CYF&˃',>*i-!S]3M58*!^jPY"ϋA0-Ħ%ڏ1( U6$a@byR}NG~,yS_£cnqͮn+ME8M`YL]F!7 ;TpbD➕ҳw]K}ʞQ!lQӮАbAzY2M\# K{O5DK.ŒYH1C 8yþEX wB権JŒaL.ʬ[ބBI*'VpTY J9٪@PAMF%c\A;En&c LeTnFK(@M%սf{.E`D:Aly bvT6(pBO~&ô*̋? _ɜ q| ZLFE|}Py/[IP—U%Q /pk?[?R2ӵ?sspn"v$B7^BCĤ}}?0ނy3q|קTH$zZ2 &Z ufO^`P' {݇]?tV\7  ɰ_+.x'⸙6 sj}? }|V-st!;ct,igqu4m%{ԗA)ߌv*ޔ\hjd*~2x7 o;0lȘBˌ7ɓI‘;>Nau╅I#/Vvߧ-3T"!䌼ׁ_WggP/_U۟;; X ZJ=½+B,]P(ne}Ro/pFxŋ(6^_Aj '!¦ˡk8[v>k(GnxEݣ8gв9ZF@^n>%']ž|UACԞB:TUO&Q)I+Z$ q'.ֻ7o[oGI1}](eE .×JL0+oBًJ ?&^H'YGc%ƵlX|pc/endstream endobj 284 0 obj << /Filter /FlateDecode /Subtype /Type1C /Length 341 >> stream xJLMRoman10-BoldN  Lambda0jӀʯx)({zS0W}.L}mgS~Ԃ‹Ӌ̓ѕx_f!9%wi5poz{{lyv{\‹17?@X+\狸݋P^   To ܔwendstream endobj 285 0 obj << /Filter /FlateDecode /Length 1938 >> stream xXms۸~зB7!BKfw{c+ꍧVH;ITH>]ۻAT$7/xAp  SA:7;?t|b 6!r~BL(*ab8]ЛvT0ERWY=Jc8%uľkgW)_TQXc/J-5Z[4F &)ʀ528.` z{뇁S3\q5x;ha*!\5@J&a/NV'K%Tb'7I:/ZMZ|Bڎ=4ߵki̵ /{ToҸp+_F ĕ!~ x1g,A7ym(7Sn(fu)efTځS[kJfǂ;C!6/&3t9}EdOoW|h ffMrUGKfh66 bYk &)Th T6V08!4u+ mb^ySV68#z7M<"R25LaFj-R%i*b_~1Tb`,^c C.BZY~-46#D `#0{?>P]l iNks#A``Q#)si% NJV;%!&^m..lq \%$u/0¬0,a䃯m{ J]M7N{p>TE]eE~ǠNp?cͫ,e5v9L-Id?.MS4`\[̸*:wljm-@<'Pf2!\o>UODbM }[M=Ц 4Ebs7;4g9)M,[EV!sv((Ë+.rXC54˄G3Pӛ_u[g_J(.'$d/;#'eZxӫON_٥q-ɳ9 PNy`Xa}>5=I]1'k4UCn7Ú6*8$ e 4$gW?ۃ-rF2%U|lXUhu)UKꓕEN{!d8zuc \? ۥQP Yt0sF>[)u:!y{&f*%vg=Ǩi,a_:>W;:` tGދۓrh:nιkzmTʪ"Dīb_tPsӿhShMw#jKT6^$G/PAR)BTօ;ف{c+n2r\߮Ya>uQ]o =tS7:LaBXӕ`뺸K\7a͋PuWa)^dHv?Ŝx $ LkU8Smʚ,o$Tֺ/5mr^Ӻ)sfӪ*TԑE>~~d]pЏ& T)- {L4?)n (SE"ȳ> stream x\KsHrd` 6zP:F<16#+!×`u5@s] gfU+ @15хzde~ӪڞTw'?zzڞ8wrĿ"N[:eN/'7UUVFYQnRD߬qi )t0jèdsMoF՟/N]z:UUecnu?04阵tL5P|^ԥ-_an-nN/O.~x*ͦ\o V\.Wga߽ ~V;uM{s2n#IQ7ґҦt XVkx5d:J60@ ^7mX~nFv˯׵ݱ?OWgNXx7tnwvn"dFUH Ga*>|<.vKhu\Xd7*Dhnp'9|3Z,>.'"qV+!ywHdS25`m8` lހ ϕ(zBw`KT]-وa'Nn3"&TpOx;ye]KxK{3-/dH $9ڨ-SG*0?)T m?"x#>5am;Ϸc|FފXK˳N]_'lowadUrxɌЗ)^nnGՎH`{Vͩ F;Z.K[[c@hivRv])j@, yJk:? D9D51,?UZe)2L9Qv֬`p7}Z1iEVM3 r֢TBFλmt(3nnql (>>E=K,nSKqh<]\M!3$6 }>,5ޅ,[jhx)(iТFTv}D o(Mv/6|@/1t>rE7_W3~&C|ZV8=$w, %SGF 刄;-ג퐓s\A|>*@R`Q9οJ HXa.V sC!uE,+䊍_79>zQ6Y J6߭%a  $⫈.~ ^ NN\wҕͷg,m%LYlKoh@̎$3!qJDD/pᲁxy @:Kg0G̯N; ȃ~hhD+ '"FRѳIZRt@R 4Af@>J*O݆cGTtO&]ʬVH#G8 F.ZSlUOH)yc.Ɓ^:LSY+0(SFFMs<)ᅡԺY\Й)AJL>g5|W4NM ~DPL]G=2U7',d0VU|d0'i0gA8;# 3Dm|Y룹kd*^Ǵf#}90`G2R)Y&fLlnCۈ }65'/p@+vay'Rai|= ]a2`kYؙhGt0,I͸1s_t/nZY&JƂ)N4;/-Lbx77ǫb)^1=}5+^__R(^*)pҞxdF,zj[Sإ0)x !r3Mq >KQ}s!7=I c٘Im0q5`1̈́S\BszI3߽G{ _ہjѸ5s3h H݆[YJpr0;h@z0dƴ \\[2y g־,-`Njvqt~dgSU!>_ c ,Os-A4;ȬY%C/gaɮě&('pzf=[T1aj]l:n(:+ɮp\.5ؾk- wkv=l"0;-mB)Ƅ]Jv?wJw H^|>Y5[AY8ZbwT"M% &0Aiy5hj^ L*Ͼ/zOPmi_X;y5䳤ѷ(JPxٌ~`8H TvnW\ٟܜ3M:Ⱥ#NOֵ.: rhKg2˱ivAj3[~^;i?5` Ҭ-QHFzr,dY&4]?u5ylIvZ"0-7d֥Ha]D 43V&Gм[{Ed`r pݖ%0/LL嬺"h+zq:3üQ=@kb޴dWsxv ϊzL![r>ťfNgp~)}b>8T-F64i#AGsrX_ФN"􂶾XbC( 䰖5KRfԇHcMܥFw@6mmR6R0lWtZa|rqG߿}D+Y;+39Tg [$"4ϩ(3k u:CIss+"]&j3>ä*ak*׾C|!?Iś< Z\ܻ̈́ҹH(21#hʼt< <;zIQ#]r%ƷNɬ1.bBo/vL^c{|1@94Nj:ߪ4%QfvIwe AY;T*#9W8Lŋ5R$^ on=*2Jhy ReߵhT[HA /+ 210{BW)hSaݹ1"Lg, nJ:|b"~@L}wC @49o8-&bL:v'E+i4*@I ͒\aWn!Ib7s#I>+yO 6v E>j& J:F1? hpQf\Mdc⸻ܰuϜ}'(;/\(֐(m}<;Ylu:TT`'0 Tb/7{~_w~Z{̈́E9qM;ں;ߟ]hfYnQ}?z)?-S9y@=Z SGLsZ؏Oo 53_N:+`=wA ~E~vhʵC4xߧ܏9 d)x\Q7^32xDXXdg|U_fmv),\!/KbςaW36m E ܓHud|Ag8xA(k٪zǰJ,Ec'endstream endobj 287 0 obj << /Filter /FlateDecode /Length 4797 >> stream x[ݓܸq>%y3uT"|R:֮Vfvdr=O$ZKrFnﻦ_//݅^w]_KM/vo.x9kM[{evNjJ6w07`g o.~wWx_*Uwoh'|^ksv9ᬫ`l*[EBd10?CXU+_e!ճ]ߒICov2LjkN2yq!!>H|?:F @_&UZܦ7wM-ARB+M*괿ơfV/K‡uߥHyds N*$&$I<¹9Oxtvhݮm|-|z~uF9Q`G8VZxjK.86HɉCzŪt<73/ֆT-L~hՌդմZǗDlL :a؂o*fb.cDJڲ| Ҹt6qIlxv@PDaY\ʰe iWaI7 uTBU" TuqX=3/PsZ$dm4RКjZ,zPJWa;@CGF֪Q+7',fRWxijR!obLt!(12o}|<0VJ8>{G1V -[x04W{ƥ*)J|w{s͗`Rň2?\5Qpqe3>)FTY:S1y>^֩ޤbYB$$ӛds/+t+=4RmtZV a Ѝmvp^VWt퀹8(Lji<ȓn,%|dHs8\Q2h閨1_Ccw_NWCBF0cEb-BMxFP!'[艸RܟSJzYY"" $JToe {S/5.W]tbU2x,76tun>,ѫ={耯.JE3dcmr7kyD9c*p]y^W]=)bTXђJmg3I#>;e 0"8|6}2{$DO `*>xpOaDbVNofπѢ`GJwhlprھ<>߼%B+0#;*E-E딄} ÕF^lr"a64y 9,]V:uAWD)@hA`݂;#@)MUb!fK5O{2Op3a{pmz!ut2UiJc u 80dq$&6IpJ5(eVXq6l6 tҞ]=^EF R7d}/.`<">##s K`e ye&(ٙc [",_PT0k H}>P9fi}mMD]pf@{ *27P^w{Ί4^*ă0qRfVq)wt,DrZ%{jmTz4XӘ0rbtaȚ|b·+mN^vK晙OO{9U@

nj$3zpbҋ)6*2WEA9<eKإsEI=$u] S=.h ɩQ}e6BZ "w3JF<=J ;ĝH@1|ǒS>4Ϋ~Ņ5-+]1%B JT+th"9si aC&oɥ l5f &efk-z! vc DHxt1EF4>fhƅ BRܽ/0[E<*%Ӳ$5mNJ[E卶,^qEཧ/坉aQIJ RƋ[ (sWBHhZ:#lad#/ kaE< ]M]^l櫵s@*\MyZIV[cX P8Ô" 9ZX0`˙rY,4Pdy+sAcgAܜ%&Kl"k_A(j;X7W958~x$T]1ҤI!SZLO5mƒ JBXc=\7zFCpʇӔii,Z] bUU'+8.KZh_ -eTگH<_'@ẍG@tdm}5v̗qbtrEǏܜ_%qi\ #:#r3N|,ni-Gf@nTQ]v e`NCi"؊_I%~]⋨C,kfyzR,"uSf``ݯ\ (p+IUNjY[{ +45J˫> zW< V<$d}O.=lm&iЅ)`mQfQZ-H;H,f(%HsEVo˟ab:>XpNeM9d8#>p>쭱taK]\ nVƌjZv`DJEoOBP9;۳҉2175FɄO]4j?7 w O_J dJO7Q 7WքR6*2i cLemNjpa\endstream endobj 288 0 obj << /Filter /FlateDecode /Length 2734 >> stream xZKoH7A0М'<3&Xʁd$*$~_d7E9Sdꫯax?dSox5+*Sxzu3)!8Osvr~n,8qzS4ɌRfXdrŤD͹~G?{70yҏZk'Jb)$ڼ նK_:Bh2W`rNM_M~Ö́JN?9ড়5N8! x())S }ǝ嘳%4hQ횶S j  F/ˢ-ΘS"ZWar2ΨZ\2ecxf@Q |vL!sty9O{7"S4(!僾!T]zh %=ukU*dB<2yGv`sp]_u15䈠 B"L/"[7(~oW-VF:8OBWԩN97ٛ}2#'$g|TaC+;4ЌEe8 (*R9×ᳶz=O꺪-90QE>dQMYE~蕬nn(of)gXgșc+ծ61^♿bNºPvJڱQyށA[w+ǂ'S+ɍ8* v%?Z 8~ȩtS"tPxҘ%=f,$/\٦ 8]ёAP5M`Q8L At gU[AX7uK;6=0!] r1# *ZnlQf®Nw:9N05d (!XR :neuxce V0'{0u c":BKALh ^J'R(X$ڇ .Ё=7&R{ }aV"mrS0O;fDB@N.(S#5Kt-/)lL ɄPS1 W 4K 'PƦuD'[_`#"Yzghaao&_ݻ?ͻFʆpo/i /5 _cX_#ht 3hKSg\ 0Wa: s"c {vh )%;DM8u?n3?@ݖQ2FQxumČɈ!NĔl%)ԸߡӖܠ."ѳ8`GҺ궟YN P~Zv̾:0l  5m%d 폵B+&]M90^bHoo]_ogaX cɹ2Hz>, Od*Pk1uc6T08l`x W44;uP$^š rU+O7 Q4aM Wsda Zއ@blQS~̩I R *~h0I U{yw,`ɵH:0bt`*\d1Ehݎ5̛hΟd,d w$sǦ,2l;N=II)tW0''4OSMu;Cm6vlpeU3<錫Ez`7bO-Ms#$#!D`Xcn<POηHoťVV 7N d鞺qdp+1{#p &#G0h (ȣ D wHtH{/1:nWP|Lе}1$VҴNmC h9:o ,ސЦSXs鹾n6ܹ->m5m/ovU;Gf>9Lex&i9?aѤWhQWMojwۿ==z`@8Ӛۦz|ˡXsqWKCV6,`MOyA㶑a6 @"/ݽ_Gn"1f/h"v[QL?cSپ| 6i|w%ù߉d9J}'!Mendstream endobj 289 0 obj << /Filter /FlateDecode /Length 3144 >> stream xZsЗᘘ(NI:U3i,M3Riݽ;w = {iib3ɧW~g~lnrЩS-ufm&ٯb2-) g9qfXf%E,)gyKa]8+)iYTۺUe_N:7}鳞z}|+/8F[bNU(A2C}Ç9'/O_*E]gQ<#Qz^,2qJOP><8sSyL7<) mQw7 -K? ִN+$8x-4POj(210Oo6uD咪1-;:de*N"`$*pE-GsY0SzNא#xᵦPא{Sf=|r` @HMJi\.UG]@ jB*gHd4Y9VL)Ûc_܊{uyh5԰텯n6 'x&AT%+iK ]M1nm }+1geYh;ƲٶjM:|Jw=b`Kdy(A_C_ݙ̆SiI,X;J/`sk0Uu+km (nT)9m>6~q4\7+Z]uss=pt$#Jqa֣GzpZ2m`!K3endstream endobj 290 0 obj << /Filter /FlateDecode /Subtype /Type1C /Length 3801 >> stream xW TS>1sTmOV:`)h+8L0*a I6ayQSQlmkkbu{{'m{&Y+k%}{ (@` A6xFH=eu%7ay Y>I^vA+Be.aQ{x$ 0}s^0Qv5ZOM6RoQj+NԻ3BVS\=}j(5,)+FQbʖBMݤSY;A_҄Å烵lM13!o z`3ؗ~t-,y 0~0}C wO5+7~0@Ajh zm03"LIOrKH9K9)ݮ)K;@ݡb j9IЕfs~\ׇq$ώA}B|&UOC9?f*޷oa>6?-{7S"6KFZ%H^V>*.\j be8$/Y2#0UTnpy:/YtGlD;U4;Sޗ Qr^+1hmXv82X>#Zpssewm ?ڢ6hK溳g4Wy)+C43Fط s!'6\q '3x];>yW-ĥ) ) %T&p .N[, sً l܄ 7qGl1{zXcN6E@nכd Y){$/_i]ha8ZK&szx¸r@!4;r_t")ѧ4p>ztЃޚo扃E22#ԉ,$_:W`xn:& ПhqWy&Рd |ޖfzE7d4 5(_&F|Hr*:h qm07^h=R+" A0_b4k;aحL>-Augo[@zÿs%ĞWljN>tUm'@#S8Al; iFJתJA($Ur;iCDSf Veޣ W?tv("D&+2r&l*67 +}V* ^KZՅ! e n3 kxHt[ARFL@:/6G^JE#p"t3lkIqt[)+*"(gx'r4#!C^Eˋ6W3*66}8[kɃaLV|1x-mpC'Wzf;3s⠮\ ݩZ`^#OoV-S p ^ +w|r B]Y:VUtGBVZew?pZgj"xqKʝ~!a^_)b?S^>MW _e/M ;SzEnk4yXNV8ѸACS{`ʿNښ5BoljS/<5!5iI!(/R[]=T`*`: eEl Z 28X0쏇袋ebB݀Tsփ ʊ/,)O<\W(9P&)#ώ*,έ+`?csFS_R-)*) dxLXWs?|':ꅒC'v ޔh;)oMY;%F!^"L6dx}IMEPS洒am6ׅUVUW7|^y /{}(E3kўZU%Y84'5 ^&Zj;>z<K4r3r55P A׈5i)1f? lv(Q6E~dځfSMPmT@a-#wu[?j6 8ytz(A#%.! 1-mEZLO% $BN\ފ@ %;D2VLM$Dyԣ2NEfFih>萐mQ1ߒ:q P0켯nPn.oyqm,!BgB'-1\W}XKUcߺ}1mt[9a=IDdNh!2BJtEMa.n2\su[h|w4?~NW"Ukd{kfߕ 1/ iet.Do;p|Ap9|'4Eg~W%Ť$Ӈ_gH;W.z̙Z9uōuRԩ\kGe;J]η$5D BpUkՊo5k.7%b=ASN;nKWFVTrTCX?kR~ūwyK~ PT46Q9wJvѽcdJ_hnIBz樬"HxsdA3uqJit&V&r䣗>fR+_EHE]M?nB/2 C/7 ") nB)!"1<^~JV,?T}y8¬fxVq%Y}x_~^gy7L vy , &ώ|fg(wv,z{ջ7t~o! mGȗC(LezJ6} pF_pLF&2lEFr,'"60p! d|`33%#"*N%U)2  )|HugkFesso9endstream endobj 291 0 obj << /Filter /FlateDecode /Subtype /Type1C /Length 1987 >> stream xmUyp_!K,xI@L9B 0 a|KeicuؖK%˒ 2%M&HI&i&0[Ϧ̴ig{~OLK[TJvɥE'RRetg.&NMd V1?Ǎ(fdzF5H<~Bfrw1, ہұ l7b F, Y0MSO qaILJ#Q/⏦>C=31C= W =;2A p7J'>aTK7(p"[5GZha3P%k2P]+#_䋠1c  &/ןq#InInh ,nIMd3뢣v{ڐt7QZ3<"ay$ŎÓO&6NV IU7?V=匿6XiqP-^ :G%΁^l&Mhsi;eB|LGb? D4GPYҮskP2=Lw(U&Yc@XƯH[խ\CtUzѶBG>p$d4UWtMtK)Ẇ HuS-84.%RSU=8bika9}&nj"+BE qU歜\6" (I̓%%F%X|*bj.ԏFXQD./̍Pt_n"- E˷oY*s_L/\.U/=Ggga-U"~$?08,E!ī&@Pٰm MM'˖ِRo v2a&TQr8z&OSz0:oc"n>݈՞d]<m#@xGΙGiIG\FNcE.}x"GCpjfZ_Ar*y^yZAq+/53}gXt^sԢz:vrBݷ#v?O\?ZL~?tz/\p]5vu{qJ6PГ +aT o. @'2繃 @8pબb^s$a]:OGڔϏ&~nqM=\>z>¹A K.}wAvuGt8o> {~$"`wkhޖTU!X&lha-{rxຩG2* E8阡IH$bĄqڭ+9y'}<~ M YQ;u5% 'q-N t`ԛIMr:KO)o9#8v_ӷ_F \̯\Kx}no(>|0$Et ¬(a%,8'fJ=(]PM% 6Etן0q1H>$>}9_BmycS5Z@4_h_#> stream x}TkTuQ)nwj7=ZMMr,BmY~ @` Yrmv) %T9M*^qḆ11+~h̟ow󼏊aT*8}C-; p5k.oߔ1'dlINMܙ c`r&a61[mL$c\fhrfTA5:s|j:KaaMaaF_ K~6ECO~[s4Def@&'ΞI ?Df/&Z-3)}PD` ޖSRٸO@4^AuT|5 ND7?1tKqk@pUf;'cr)So1]SթSAԋZ-fSQrY/01u-9/>K'_*k~'v+#,d3mA3P|:IpVN_d'`.U:ùmZ_j="9_;63.(%&)9x>TU֐RE-b1Fj}Uy I!UEizmiFrإe]xDB,?+U1>?JmPL0S{p~orhyWx: ܇.g"#I8'l|Knt#2X0}%{В/1+3P%tۜ2+=eH\x]Վ.s8OWe`H \K@ k8fC;+ r8vIbPFZFSW+p!.md;t2 iȅZ~vwR8e7zy$V2#) 1-@>Crzk`"GR<&r+CPlw-p8=>{pKW?{K^{j[ݕ:;"*!!)rB+cO&[B1QjS;1395:Ӷ CM}&ס:#o/ѹnemy_WL~h-=aQ\bϺ]ӛ{=IY !#v_rsiPBbGd M++ۤ{z̺ ')fV1}itnaӈH?ګDc\l.ryCWu9Kjx,\p{g>h,m|Y%⢵R*MڝbWYu9{}uw[܂ L uwQc|E8nj))˴TGZv.N47;ČW١E)).袛)H+ 8p M\cUWeݎ?'֗MuW ?7w5MMI> stream xWo6~0 F_Juo[q Y&^>쨳T;((&No;;3 cy9bqo^NG?^H;1rԩ(CPi9"`;v,k<:HV7I*ImF㛅G!eɤVNZKy' V[Bn(B=H(T<~ֽyљ#cKr^ ޽sΐ*z3DŽ1Pn1xvI.V ec2.ƩPjYZɒ9"),$)(h!X*TvV`*fSXrtL8; umY;yUmiyEnl_-en׾\]V勓~`iI*9"w!$_VKTA ܺCh戵6_ȲY8|u1KfIr[=kxWdz'p'&+sZ4xާ+9׍s{UU]y9˹!8*,,+.HLsٟisxɇ[,[\ ukT;3ngIIϯuUlweU}A_x8X[y9#s$y>PEPvݷ[H]6ŒAݜA?Ivv졵tgmU[)SGxdM=_T{B#T}MxR7E5Q+l4؛ָ 0MxYȓOͧXzvH77#l 1XgWZy٦4  C Cbi+ba95D;#ݝgpv, 8jY]YġI[1H+5HN mD!RUcH "y уnIwawҺ,4|$2?˩&vY iZ00~<ES0(<Z O.~B`ߋ X`x'=XrF>v5>58~*orݗ&y}Q5*z/N;P]۷+)[$xB^滪7QӃ嚬Z̓[}KI?yLWQendstream endobj 294 0 obj << /Filter /FlateDecode /Length 161 >> stream x]O10 ,. 2D! }I$.|D/J`H_#49U a:8 d >LyQMYU{%hy"*յvIG`n6S\ HJ$pLg9/S3endstream endobj 295 0 obj << /Filter /FlateDecode /Subtype /Type1C /Length 212 >> stream xcd`ab`dd74 JM/I, Jtw?ew+;7  KKR3sRJ YDzʅ>?+. +tstUUupOSܞcySمzpgr| ~8On; r\,!<gO '00M~endstream endobj 296 0 obj << /Type /XRef /Length 236 /Filter /FlateDecode /DecodeParms << /Columns 5 /Predictor 12 >> /W [ 1 3 1 ] /Info 3 0 R /Root 2 0 R /Size 297 /ID [<1a4741f6eb232fe9d869d179da931977>] >> stream xտPۋTD bH,$hfHl^@ ,^ FO`0D}ޢi_N=R} ׳:"\Ư^E{wEQ9n8~4+09@ y|U1.Oa8s+b}E&L/cO83 ng6#V u9[`aّ3Uփ{X,~-* endstream endobj startxref 165646 %%EOF RcppEigen/inst/doc/RcppEigen-Introduction.R0000644000176200001440000000072714567630620020345 0ustar liggesusers### R code from vignette source 'RcppEigen-Introduction.Rnw' ################################################### ### code chunk number 1: RcppEigen-Introduction.Rnw:8-13 ################################################### pkgVersion <- packageDescription("RcppEigen")$Version pkgDate <- packageDescription("RcppEigen")$Date prettyDate <- format(Sys.Date(), "%B %e, %Y") #require("RcppEigen") #eigenVersion <- paste(unlist(.Call("eigen_version", FALSE)), collapse=".") RcppEigen/inst/COPYRIGHTS0000644000176200001440000006535312254314420014532 0ustar liggesusersFiles: inst/include/Eigen/* inst/include/unsupported/Eigen/* Copyright: 2006 - 2012 Gael Guennebaud, Benoit Jacob License: MPL-2 Files: unsupported/Eigen/FFT unsupported/Eigen/src/FFT/ei_fftw_impl.h unsupported/Eigen/src/FFT/ei_kissfft_impl.h Copyright: 2003 - 2009 Mark Borgerding License: MPL-2 Files: inst/include/Eigen/src/SuperLUSupport/SuperLUSupport.h Copyright: 1994 Xerox Corporation License: THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY EXPRESSED OR IMPLIED. ANY USE IS AT YOUR OWN RISK. Files: inst/include/unsupported/Eigen/src/IterativeSolvers/IterationController.h Copyright: 1997 - 2001 Andrew Lumsdaine and Lie-Quan Lee License: This file is part of the Iterative Template Library You should have received a copy of the License Agreement for the Iterative Template Library along with the software; see the file LICENSE. Permission to modify the code and to distribute modified code is granted, provided the text of this NOTICE is retained, a notice that the code was modified is included with the above COPYRIGHT NOTICE and with the COPYRIGHT NOTICE in the LICENSE file, and that the LICENSE file is distributed with the modified code. LICENSOR MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. By way of example, but not limitation, Licensor MAKES NO REPRESENTATIONS OR WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF THE LICENSED SOFTWARE COMPONENTS OR DOCUMENTATION WILL NOT INFRINGE ANY PATENTS, COPYRIGHTS, TRADEMARKS OR OTHER RIGHTS. Files: inst/include/unsupported/Eigen/Splines inst/include/unsupported/Eigen/src/Splines/SplineFitting.h inst/include/unsupported/Eigen/src/Splines/SplineFwd.h inst/include/unsupported/Eigen/src/Splines/Spline.h inst/include/Eigen/src/Geometry/Umeyama.h inst/include/Eigen/src/StlSupport/details.h inst/include/Eigen/src/StlSupport/StdDeque.h inst/include/Eigen/src/StlSupport/StdList.h inst/include/Eigen/src/StlSupport/StdVector.h inst/include/Eigen/StdDeque inst/include/Eigen/StdList inst/include/Eigen/StdVector inst/include/Eigen/src/Core/DenseStorage.h inst/include/Eigen/src/Core/util/Memory.h inst/include/Eigen/src/Geometry/Transform.h Copyright: 2009 - 2011 Hauke Heibel License: MPL-2 Files: inst/include/Eigen/src/LU/arch/Inverse_SSE.h inst/include/Eigen/src/Cholesky/LLT_MKL.h inst/include/Eigen/src/Core/Assign_MKL.h inst/include/Eigen/src/Core/products/GeneralMatrixMatrix_MKL.h inst/include/Eigen/src/Core/products/GeneralMatrixMatrixTriangular_MKL.h inst/include/Eigen/src/Core/products/GeneralMatrixVector_MKL.h inst/include/Eigen/src/Core/products/SelfadjointMatrixMatrix_MKL.h inst/include/Eigen/src/Core/products/SelfadjointMatrixVector_MKL.h inst/include/Eigen/src/Core/products/TriangularMatrixMatrix_MKL.h inst/include/Eigen/src/Core/products/TriangularMatrixVector_MKL.h inst/include/Eigen/src/Core/products/TriangularSolverMatrix_MKL.h inst/include/Eigen/src/Core/util/MKL_support.h inst/include/Eigen/src/Eigenvalues/ComplexSchur_MKL.h inst/include/Eigen/src/Eigenvalues/RealSchur_MKL.h inst/include/Eigen/src/Eigenvalues/SelfAdjointEigenSolver_MKL.h inst/include/Eigen/src/LU/PartialPivLU_MKL.h inst/include/Eigen/src/PardisoSupport/PardisoSupport.h inst/include/Eigen/src/QR/ColPivHouseholderQR_MKL.h inst/include/Eigen/src/QR/HouseholderQR_MKL.h inst/include/Eigen/src/SVD/JacobiSVD_MKL.h Copyright: 2001 Intel Corporation License: Permition is granted to use, copy, distribute and prepare derivative works of this library for any purpose and without fee, provided, that the above copyright notice and this statement appear in all copies. Intel makes no representations about the suitability of this software for any purpose, and specifically disclaims all warranties. Files: inst/include/unsupported/Eigen/src/IterativeSolvers/ConstrainedConjGrad.h Copyright: 2002 - 2007 Yves Renard License: LPGL-2.1 Files: inst/include/Eigen/src/OrderingMethods/Amd.h Copyright: 2006 Timothy A. Davis License: LGPL-2.1 Files: inst/include/Eigen/src/Core/arch/SSE/MathFunctions.h Copyright: 2007 Julien Pommier License: MPL-2 Files: inst/include/Eigen/src/Core/Assign.h Copyright: 2007 Michael Olbrich License: MPL-2 Files: inst/include/unsupported/Eigen/src/Skyline/SkylineMatrixBase.h inst/include/unsupported/Eigen/src/Skyline/SkylineMatrix.h inst/include/unsupported/Eigen/src/Skyline/SkylineProduct.h inst/include/unsupported/Eigen/src/Skyline/SkylineStorage.h inst/include/unsupported/Eigen/src/Skyline/SkylineInplaceLU.h inst/include/unsupported/Eigen/src/Skyline/SkylineUtil.h Copyright: 2008 - 2009 Guillaume Saupin ; License: MPL-2 Files: inst/include/Eigen/src/Core/arch/AltiVec/PacketMath.h inst/include/Eigen/src/Core/arch/NEON/PacketMath.h Copyright: 2008 Konstantinos Margaritis License: MPL-2 Files: inst/include/unsupported/Eigen/src/MatrixFunctions/MatrixExponential.h inst/include/unsupported/Eigen/src/MatrixFunction.h inst/include/unsupported/Eigen/MatrixFunctions inst/include/unsupported/Eigen/src/MatrixFunctions/MatrixFunctionAtomic.h inst/include/Eigen/src/Eigenvalues/ComplexEigenSolver.h inst/include/Eigen/src/Eigenvalues/ComplexSchur.h inst/include/Eigen/src/Eigenvalues/EigenSolver.h inst/include/Eigen/src/Eigenvalues/GeneralizedSelfAdjointEigenSolver.h inst/include/Eigen/src/Eigenvalues/HessenbergDecomposition.h inst/include/Eigen/src/Eigenvalues/MatrixBaseEigenvalues.h inst/include/Eigen/src/Eigenvalues/RealSchur.h inst/include/Eigen/src/Eigenvalues/SelfAdjointEigenSolver.h inst/include/Eigen/src/Eigenvalues/Tridiagonalization.h inst/include/unsupported/Eigen/src/MatrixFunctions/StemFunction.h inst/include/unsupported/Eigen/src/MatrixFunctions/MatrixLogarithm.h inst/include/unsupported/Eigen/src/MatrixFunctions/MatrixSquareRoot.h Copyright: 2009 - 2010 Jitse Niesen License: MPL-2 Files: inst/include/Eigen/src/Eigenvalues/ComplexEigenSolver.h inst/include/Eigen/src/Eigenvalues/ComplexSchur.h Copyright: 2009 Claire Maurice License: MPL-2 Files: inst/include/unsupported/Eigen/BVH inst/include/unsupported/Eigen/src/BVH/BVAlgorithms.h inst/include/unsupported/Eigen/src/BVH/KdBVH.h Copyright: 2009 Ilya Baran License: MPL-2 Files: inst/include/Eigen/src/Cholesky/LDLT.h Copyright: 2009 Keir Mierle License: MPL-2 Files: inst/include/Eigen/src/Geometry/Quaternion.h Copyright: 2009 Mathieu Gautier License: MPL-2 Files: inst/include/Eigen/src/Geometry/arch/Geometry_SSE.h inst/include/unsupported/Eigen/src/MoreVectorization/MathFunctions.h Copyright: 2009 Rohit Garg License: MPL-2 Files: inst/include/unsupported/Eigen/NonLinearOptimization inst/include/unsupported/Eigen/NumericalDiff inst/include/unsupported/Eigen/src/NonLinearOptimization/HybridNonLinearSolver.h inst/include/unsupported/Eigen/src/NonLinearOptimization/LevenbergMarquardt.h inst/include/unsupported/Eigen/src/NumericalDiff/NumericalDiff.h inst/include/Eigen/src/Core/util/Memory.h Copyright: 2009 - 2010 Thomas Capricelli License: MPL-2 Files: inst/include/Eigen/src/SparseCore/SparseView.h inst/include/Eigen/src/Sparse/SparseView.h Copyright: 2010 Daniel Lowengrub License: MPL-2 Files: inst/include/unsupported/Eigen/src/Polynomials/Companion.h inst/include/unsupported/Eigen/src/Polynomials/PolynomialSolver.h inst/include/unsupported/Eigen/src/Polynomials/PolynomialUtils.h Copyright: 2010 Manuel Yguel License: MPL-2 Files: inst/include/Eigen/src/Householder/BlockHouseholder.h inst/include/Eigen/src/QR/HouseholderQR.h Copyright: 2010 Vincent Lejeune License: MPL-2 Files: inst/include/unsupported/Eigen/src/KroneckerProduct/KroneckerTensorProduct.h Copyright: 2011 Andreas Platen License: MPL-2 Files: inst/include/unsupported/Eigen/src/MatrixFunctions/MatrixExponential.h inst/include/unsupported/Eigen/src/MatrixFunctions/MatrixLogarithm.h Copyright: 2011 Chen-Pang He License: MPL-2 Files: inst/include/unsupported/Eigen/src/KroneckerProduct/KroneckerTensorProduct.h inst/include/unsupported/Eigen/src/IterativeSolvers/GMRES.h Copyright: 2012 Kolja Brix License: MPL-2 Files: inst/include/Eigen/src/Cholesky/LDLT.h Copyright: 2011 Timothy E. Holy License: MPL-2 Files: inst/include/Eigen/src/IterativeLinearSolvers/BiCGSTAB.h inst/include/Eigen/src/IterativeLinearSolvers/IncompleteLUT.h inst/include/Eigen/src/PaStiXSupport/PaStiXSupport.h inst/include/unsupported/Eigen/src/IterativeSolvers/Scaling.h inst/include/unsupported/Eigen/src/SparseExtra/MarketIO.h inst/include/unsupported/Eigen/src/SparseExtra/MatrixMarketIterator.h Copyright: 2012 Désiré Nuentsa-Wakam License: MPL-2 Files: * Copyright: 2011 - 2013 Douglas Bates, Romain Francois and Dirk Eddelbuettel License: GPL-2 Files: debian/* Copyright: 2013 Dirk Eddelbuettel License: GPL-2 GNU R ships with various open source licenses including - the GPL license (version 2), use RShowDoc("GPL-2") to display it - the LGPL license (version 2.1), use RShowDoc("LGPL-2.1") to display it The MPL-2 is included below. License: MPL-2 Mozilla Public License Version 2.0 ================================== . 1. Definitions -------------- . 1.1. "Contributor" means each individual or legal entity that creates, contributes to the creation of, or owns Covered Software. . 1.2. "Contributor Version" means the combination of the Contributions of others (if any) used by a Contributor and that particular Contributor's Contribution. . 1.3. "Contribution" means Covered Software of a particular Contributor. . 1.4. "Covered Software" means Source Code Form to which the initial Contributor has attached the notice in Exhibit A, the Executable Form of such Source Code Form, and Modifications of such Source Code Form, in each case including portions thereof. . 1.5. "Incompatible With Secondary Licenses" means . (a) that the initial Contributor has attached the notice described in Exhibit B to the Covered Software; or . (b) that the Covered Software was made available under the terms of version 1.1 or earlier of the License, but not also under the terms of a Secondary License. . 1.6. "Executable Form" means any form of the work other than Source Code Form. . 1.7. "Larger Work" means a work that combines Covered Software with other material, in a separate file or files, that is not Covered Software. . 1.8. "License" means this document. . 1.9. "Licensable" means having the right to grant, to the maximum extent possible, whether at the time of the initial grant or subsequently, any and all of the rights conveyed by this License. . 1.10. "Modifications" means any of the following: . (a) any file in Source Code Form that results from an addition to, deletion from, or modification of the contents of Covered Software; or . (b) any new file in Source Code Form that contains any Covered Software. . 1.11. "Patent Claims" of a Contributor means any patent claim(s), including without limitation, method, process, and apparatus claims, in any patent Licensable by such Contributor that would be infringed, but for the grant of the License, by the making, using, selling, offering for sale, having made, import, or transfer of either its Contributions or its Contributor Version. . 1.12. "Secondary License" means either the GNU General Public License, Version 2.0, the GNU Lesser General Public License, Version 2.1, the GNU Affero General Public License, Version 3.0, or any later versions of those licenses. . 1.13. "Source Code Form" means the form of the work preferred for making modifications. . 1.14. "You" (or "Your") means an individual or a legal entity exercising rights under this License. For legal entities, "You" includes any entity that controls, is controlled by, or is under common control with You. For purposes of this definition, "control" means (a) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (b) ownership of more than fifty percent (50%) of the outstanding shares or beneficial ownership of such entity. . 2. License Grants and Conditions -------------------------------- . 2.1. Grants . Each Contributor hereby grants You a world-wide, royalty-free, non-exclusive license: . (a) under intellectual property rights (other than patent or trademark) Licensable by such Contributor to use, reproduce, make available, modify, display, perform, distribute, and otherwise exploit its Contributions, either on an unmodified basis, with Modifications, or as part of a Larger Work; and . (b) under Patent Claims of such Contributor to make, use, sell, offer for sale, have made, import, and otherwise transfer either its Contributions or its Contributor Version. . 2.2. Effective Date . The licenses granted in Section 2.1 with respect to any Contribution become effective for each Contribution on the date the Contributor first distributes such Contribution. . 2.3. Limitations on Grant Scope . The licenses granted in this Section 2 are the only rights granted under this License. No additional rights or licenses will be implied from the distribution or licensing of Covered Software under this License. Notwithstanding Section 2.1(b) above, no patent license is granted by a Contributor: . (a) for any code that a Contributor has removed from Covered Software; or . (b) for infringements caused by: (i) Your and any other third party's modifications of Covered Software, or (ii) the combination of its Contributions with other software (except as part of its Contributor Version); or . (c) under Patent Claims infringed by Covered Software in the absence of its Contributions. . This License does not grant any rights in the trademarks, service marks, or logos of any Contributor (except as may be necessary to comply with the notice requirements in Section 3.4). . 2.4. Subsequent Licenses . No Contributor makes additional grants as a result of Your choice to distribute the Covered Software under a subsequent version of this License (see Section 10.2) or under the terms of a Secondary License (if permitted under the terms of Section 3.3). . 2.5. Representation . Each Contributor represents that the Contributor believes its Contributions are its original creation(s) or it has sufficient rights to grant the rights to its Contributions conveyed by this License. . 2.6. Fair Use . This License is not intended to limit any rights You have under applicable copyright doctrines of fair use, fair dealing, or other equivalents. . 2.7. Conditions . Sections 3.1, 3.2, 3.3, and 3.4 are conditions of the licenses granted in Section 2.1. . 3. Responsibilities ------------------- . 3.1. Distribution of Source Form . All distribution of Covered Software in Source Code Form, including any Modifications that You create or to which You contribute, must be under the terms of this License. You must inform recipients that the Source Code Form of the Covered Software is governed by the terms of this License, and how they can obtain a copy of this License. You may not attempt to alter or restrict the recipients' rights in the Source Code Form. . 3.2. Distribution of Executable Form . If You distribute Covered Software in Executable Form then: . (a) such Covered Software must also be made available in Source Code Form, as described in Section 3.1, and You must inform recipients of the Executable Form how they can obtain a copy of such Source Code Form by reasonable means in a timely manner, at a charge no more than the cost of distribution to the recipient; and . (b) You may distribute such Executable Form under the terms of this License, or sublicense it under different terms, provided that the license for the Executable Form does not attempt to limit or alter the recipients' rights in the Source Code Form under this License. . 3.3. Distribution of a Larger Work . You may create and distribute a Larger Work under terms of Your choice, provided that You also comply with the requirements of this License for the Covered Software. If the Larger Work is a combination of Covered Software with a work governed by one or more Secondary Licenses, and the Covered Software is not Incompatible With Secondary Licenses, this License permits You to additionally distribute such Covered Software under the terms of such Secondary License(s), so that the recipient of the Larger Work may, at their option, further distribute the Covered Software under the terms of either this License or such Secondary License(s). . 3.4. Notices . You may not remove or alter the substance of any license notices (including copyright notices, patent notices, disclaimers of warranty, or limitations of liability) contained within the Source Code Form of the Covered Software, except that You may alter any license notices to the extent required to remedy known factual inaccuracies. . 3.5. Application of Additional Terms . You may choose to offer, and to charge a fee for, warranty, support, indemnity or liability obligations to one or more recipients of Covered Software. However, You may do so only on Your own behalf, and not on behalf of any Contributor. You must make it absolutely clear that any such warranty, support, indemnity, or liability obligation is offered by You alone, and You hereby agree to indemnify every Contributor for any liability incurred by such Contributor as a result of warranty, support, indemnity or liability terms You offer. You may include additional disclaimers of warranty and limitations of liability specific to any jurisdiction. . 4. Inability to Comply Due to Statute or Regulation --------------------------------------------------- . If it is impossible for You to comply with any of the terms of this License with respect to some or all of the Covered Software due to statute, judicial order, or regulation then You must: (a) comply with the terms of this License to the maximum extent possible; and (b) describe the limitations and the code they affect. Such description must be placed in a text file included with all distributions of the Covered Software under this License. Except to the extent prohibited by statute or regulation, such description must be sufficiently detailed for a recipient of ordinary skill to be able to understand it. . 5. Termination -------------- . 5.1. The rights granted under this License will terminate automatically if You fail to comply with any of its terms. However, if You become compliant, then the rights granted under this License from a particular Contributor are reinstated (a) provisionally, unless and until such Contributor explicitly and finally terminates Your grants, and (b) on an ongoing basis, if such Contributor fails to notify You of the non-compliance by some reasonable means prior to 60 days after You have come back into compliance. Moreover, Your grants from a particular Contributor are reinstated on an ongoing basis if such Contributor notifies You of the non-compliance by some reasonable means, this is the first time You have received notice of non-compliance with this License from such Contributor, and You become compliant prior to 30 days after Your receipt of the notice. . 5.2. If You initiate litigation against any entity by asserting a patent infringement claim (excluding declaratory judgment actions, counter-claims, and cross-claims) alleging that a Contributor Version directly or indirectly infringes any patent, then the rights granted to You by any and all Contributors for the Covered Software under Section 2.1 of this License shall terminate. . 5.3. In the event of termination under Sections 5.1 or 5.2 above, all end user license agreements (excluding distributors and resellers) which have been validly granted by You or Your distributors under this License prior to termination shall survive termination. . ************************************************************************ * * * 6. Disclaimer of Warranty * * ------------------------- * * * * Covered Software is provided under this License on an "as is" * * basis, without warranty of any kind, either expressed, implied, or * * statutory, including, without limitation, warranties that the * * Covered Software is free of defects, merchantable, fit for a * * particular purpose or non-infringing. The entire risk as to the * * quality and performance of the Covered Software is with You. * * Should any Covered Software prove defective in any respect, You * * (not any Contributor) assume the cost of any necessary servicing, * * repair, or correction. This disclaimer of warranty constitutes an * * essential part of this License. No use of any Covered Software is * * authorized under this License except under this disclaimer. * * * ************************************************************************ . ************************************************************************ * * * 7. Limitation of Liability * * -------------------------- * * * * Under no circumstances and under no legal theory, whether tort * * (including negligence), contract, or otherwise, shall any * * Contributor, or anyone who distributes Covered Software as * * permitted above, be liable to You for any direct, indirect, * * special, incidental, or consequential damages of any character * * including, without limitation, damages for lost profits, loss of * * goodwill, work stoppage, computer failure or malfunction, or any * * and all other commercial damages or losses, even if such party * * shall have been informed of the possibility of such damages. This * * limitation of liability shall not apply to liability for death or * * personal injury resulting from such party's negligence to the * * extent applicable law prohibits such limitation. Some * * jurisdictions do not allow the exclusion or limitation of * * incidental or consequential damages, so this exclusion and * * limitation may not apply to You. * * * ************************************************************************ . 8. Litigation ------------- . Any litigation relating to this License may be brought only in the courts of a jurisdiction where the defendant maintains its principal place of business and such litigation shall be governed by laws of that jurisdiction, without reference to its conflict-of-law provisions. Nothing in this Section shall prevent a party's ability to bring cross-claims or counter-claims. . 9. Miscellaneous ---------------- . This License represents the complete agreement concerning the subject matter hereof. If any provision of this License is held to be unenforceable, such provision shall be reformed only to the extent necessary to make it enforceable. Any law or regulation which provides that the language of a contract shall be construed against the drafter shall not be used to construe this License against a Contributor. . 10. Versions of the License --------------------------- . 10.1. New Versions . Mozilla Foundation is the license steward. Except as provided in Section 10.3, no one other than the license steward has the right to modify or publish new versions of this License. Each version will be given a distinguishing version number. . 10.2. Effect of New Versions . You may distribute the Covered Software under the terms of the version of the License under which You originally received the Covered Software, or under the terms of any subsequent version published by the license steward. . 10.3. Modified Versions . If you create software not governed by this License, and you want to create a new license for such software, you may create and use a modified version of this License if you rename the license and remove any references to the name of the license steward (except to note that such modified license differs from this License). . 10.4. Distributing Source Code Form that is Incompatible With Secondary Licenses . If You choose to distribute Source Code Form that is Incompatible With Secondary Licenses under the terms of this version of the License, the notice described in Exhibit B of this License must be attached. . Exhibit A - Source Code Form License Notice ------------------------------------------- . This Source Code Form is subject to the terms of the Mozilla Public License, v. 2.0. If a copy of the MPL was not distributed with this file, You can obtain one at http://mozilla.org/MPL/2.0/. . If it is not possible or desirable to put the notice in a particular file, then You may include the notice in a location (such as a LICENSE file in a relevant directory) where a recipient would be likely to look for such a notice. . You may add additional accurate notices of copyright ownership. . Exhibit B - "Incompatible With Secondary Licenses" Notice --------------------------------------------------------- . This Source Code Form is "Incompatible With Secondary Licenses", as defined by the Mozilla Public License, v. 2.0. RcppEigen/inst/include/0000755000176200001440000000000014520536112014525 5ustar liggesusersRcppEigen/inst/include/RcppEigen.h0000644000176200001440000000213512253717461016564 0ustar liggesusers// -*- mode: C++; c-indent-level: 4; c-basic-offset: 4; tab-width: 8 -*- // // RcppEigen.h: Rcpp/Eigen glue // // Copyright (C) 2011 Douglas Bates, Dirk Eddelbuettel and Romain Francois // // This file is part of RcppEigen. // // RcppEigen is free software: you can redistribute it and/or modify it // under the terms of the GNU General Public License as published by // the Free Software Foundation, either version 2 of the License, or // (at your option) any later version. // // RcppEigen is distributed in the hope that it will be useful, but // WITHOUT ANY WARRANTY; without even the implied warranty of // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the // GNU General Public License for more details. // // You should have received a copy of the GNU General Public License // along with RcppEigen. If not, see . #ifndef RcppEigen__RcppEigen__h #define RcppEigen__RcppEigen__h #include #include #include //#include //#include // also includes Eigen/Sparse #endif RcppEigen/inst/include/RcppEigenCholmod.h0000644000176200001440000000252514520536112020064 0ustar liggesusers// -*- mode: C++; c-indent-level: 4; c-basic-offset: 4; tab-width: 8 -*- // // RcppEigenCholmod.h: Provide access to the Matrix API and in turn // to Eigen's CholmodSupport module. Use of this header relies on // compilation of ../../src/RcppEigenStubs.cpp and LinkingTo: Matrix. // // Copyright (C) 2011 Douglas Bates, Martin Maechler, Dirk Eddelbuettel and Romain Francois // // This file is part of RcppEigen. // // RcppEigen is free software: you can redistribute it and/or modify it // under the terms of the GNU General Public License as published by // the Free Software Foundation, either version 2 of the License, or // (at your option) any later version. // // RcppEigen is distributed in the hope that it will be useful, but // WITHOUT ANY WARRANTY; without even the implied warranty of // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the // GNU General Public License for more details. // // You should have received a copy of the GNU General Public License // along with RcppEigen. If not, see . #ifndef RcppEigen__RcppEigenCholmod__h #define RcppEigen__RcppEigenCholmod__h #include #ifndef R_MATRIX_CHOLMOD /* Matrix <= 1.6-1.1 */ # define R_MATRIX_CHOLMOD(_NAME_) M_cholmod_ ## _NAME_ # define M_cholmod_start M_R_cholmod_start /* sigh */ #endif #include #endif RcppEigen/inst/include/RcppEigenForward.h0000644000176200001440000000573014520536112020104 0ustar liggesusers// -*- mode: C++; c-indent-level: 4; c-basic-offset: 4; tab-width: 8 -*- // // RcppEigenForward.h: Rcpp/Eigen glue // // Copyright (C) 2011 Douglas Bates, Dirk Eddelbuettel and Romain Francois // // This file is part of RcppEigen. // // RcppEigen is free software: you can redistribute it and/or modify it // under the terms of the GNU General Public License as published by // the Free Software Foundation, either version 2 of the License, or // (at your option) any later version. // // RcppEigen is distributed in the hope that it will be useful, but // WITHOUT ANY WARRANTY; without even the implied warranty of // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the // GNU General Public License for more details. // // You should have received a copy of the GNU General Public License // along with RcppEigen. If not, see . #ifndef RcppEigen__RcppEigenForward__h #define RcppEigen__RcppEigenForward__h #include #include #include #include #include //#include // causes problems redefining sign #include #include #include #include #include #include #include // also includes Eigen/Sparse #include /* forward declarations */ namespace Rcpp { namespace traits { /* support for as */ template class Exporter< Eigen::Map > >; template class Exporter< Eigen::Map > >; template class Exporter< Eigen::Map > >; template class Exporter< Eigen::Map > >; template class Exporter< Eigen::Matrix >; template class Exporter< Eigen::Matrix >; template class Exporter< Eigen::Matrix >; template class Exporter< Eigen::Array >; template class Exporter< Eigen::Array >; template class Exporter< Eigen::Array >; template class Exporter< Eigen::Map > >; template class Exporter< Eigen::MappedSparseMatrix >; // Deprecated template class Exporter< Eigen::SparseMatrix >; template class Exporter< Eigen::Map > >; template class Exporter< Eigen::MappedSparseMatrix >; // Deprecated template class Exporter< Eigen::SparseMatrix >; } // namespace traits } #endif RcppEigen/inst/include/unsupported/0000755000176200001440000000000012253717461017126 5ustar liggesusersRcppEigen/inst/include/unsupported/Eigen/0000755000176200001440000000000014562417221020151 5ustar liggesusersRcppEigen/inst/include/unsupported/Eigen/KroneckerProduct0000644000176200001440000000166014107270226023360 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_KRONECKER_PRODUCT_MODULE_H #define EIGEN_KRONECKER_PRODUCT_MODULE_H #include "../../Eigen/Core" #include "../../Eigen/src/Core/util/DisableStupidWarnings.h" #include "../../Eigen/src/SparseCore/SparseUtil.h" namespace Eigen { /** * \defgroup KroneckerProduct_Module KroneckerProduct module * * This module contains an experimental Kronecker product implementation. * * \code * #include * \endcode */ } // namespace Eigen #include "src/KroneckerProduct/KroneckerTensorProduct.h" #include "../../Eigen/src/Core/util/ReenableStupidWarnings.h" #endif // EIGEN_KRONECKER_PRODUCT_MODULE_H RcppEigen/inst/include/unsupported/Eigen/AdolcForward0000644000176200001440000001050614562417221022445 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2009 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_ADLOC_FORWARD #define EIGEN_ADLOC_FORWARD //-------------------------------------------------------------------------------- // // This file provides support for adolc's adouble type in forward mode. // ADOL-C is a C++ automatic differentiation library, // see https://projects.coin-or.org/ADOL-C for more information. // // Note that the maximal number of directions is controlled by // the preprocessor token NUMBER_DIRECTIONS. The default is 2. // //-------------------------------------------------------------------------------- #define ADOLC_TAPELESS #ifndef NUMBER_DIRECTIONS # define NUMBER_DIRECTIONS 2 #endif #include // adolc defines some very stupid macros: #if defined(malloc) # undef malloc #endif #if defined(calloc) # undef calloc #endif #if defined(realloc) # undef realloc #endif #include "../../Eigen/Core" namespace Eigen { /** * \defgroup AdolcForward_Module Adolc forward module * This module provides support for adolc's adouble type in forward mode. * ADOL-C is a C++ automatic differentiation library, * see https://projects.coin-or.org/ADOL-C for more information. * It mainly consists in: * - a struct Eigen::NumTraits specialization * - overloads of internal::* math function for adtl::adouble type. * * Note that the maximal number of directions is controlled by * the preprocessor token NUMBER_DIRECTIONS. The default is 2. * * \code * #include * \endcode */ //@{ } // namespace Eigen // Eigen's require a few additional functions which must be defined in the same namespace // than the custom scalar type own namespace namespace adtl { inline const adouble& conj(const adouble& x) { return x; } inline const adouble& real(const adouble& x) { return x; } inline adouble imag(const adouble&) { return 0.; } inline adouble abs(const adouble& x) { return fabs(x); } inline adouble abs2(const adouble& x) { return x*x; } inline bool (isinf)(const adouble& x) { return (Eigen::numext::isinf)(x.getValue()); } inline bool (isnan)(const adouble& x) { return (Eigen::numext::isnan)(x.getValue()); } } namespace Eigen { template<> struct NumTraits : NumTraits { typedef adtl::adouble Real; typedef adtl::adouble NonInteger; typedef adtl::adouble Nested; enum { IsComplex = 0, IsInteger = 0, IsSigned = 1, RequireInitialization = 1, ReadCost = 1, AddCost = 1, MulCost = 1 }; }; template class AdolcForwardJacobian : public Functor { typedef adtl::adouble ActiveScalar; public: AdolcForwardJacobian() : Functor() {} AdolcForwardJacobian(const Functor& f) : Functor(f) {} // forward constructors template AdolcForwardJacobian(const T0& a0) : Functor(a0) {} template AdolcForwardJacobian(const T0& a0, const T1& a1) : Functor(a0, a1) {} template AdolcForwardJacobian(const T0& a0, const T1& a1, const T1& a2) : Functor(a0, a1, a2) {} typedef typename Functor::InputType InputType; typedef typename Functor::ValueType ValueType; typedef typename Functor::JacobianType JacobianType; typedef Matrix ActiveInput; typedef Matrix ActiveValue; void operator() (const InputType& x, ValueType* v, JacobianType* _jac) const { eigen_assert(v!=0); if (!_jac) { Functor::operator()(x, v); return; } JacobianType& jac = *_jac; ActiveInput ax = x.template cast(); ActiveValue av(jac.rows()); for (int j=0; j #include #include #include "../../Eigen/Core" /** * \defgroup FFT_Module Fast Fourier Transform module * * \code * #include * \endcode * * This module provides Fast Fourier transformation, with a configurable backend * implementation. * * The default implementation is based on kissfft. It is a small, free, and * reasonably efficient default. * * There are currently two implementation backend: * * - fftw (http://www.fftw.org) : faster, GPL -- incompatible with Eigen in LGPL form, bigger code size. * - MKL (http://en.wikipedia.org/wiki/Math_Kernel_Library) : fastest, commercial -- may be incompatible with Eigen in GPL form. * * \section FFTDesign Design * * The following design decisions were made concerning scaling and * half-spectrum for real FFT. * * The intent is to facilitate generic programming and ease migrating code * from Matlab/octave. * We think the default behavior of Eigen/FFT should favor correctness and * generality over speed. Of course, the caller should be able to "opt-out" from this * behavior and get the speed increase if they want it. * * 1) %Scaling: * Other libraries (FFTW,IMKL,KISSFFT) do not perform scaling, so there * is a constant gain incurred after the forward&inverse transforms , so * IFFT(FFT(x)) = Kx; this is done to avoid a vector-by-value multiply. * The downside is that algorithms that worked correctly in Matlab/octave * don't behave the same way once implemented in C++. * * How Eigen/FFT differs: invertible scaling is performed so IFFT( FFT(x) ) = x. * * 2) Real FFT half-spectrum * Other libraries use only half the frequency spectrum (plus one extra * sample for the Nyquist bin) for a real FFT, the other half is the * conjugate-symmetric of the first half. This saves them a copy and some * memory. The downside is the caller needs to have special logic for the * number of bins in complex vs real. * * How Eigen/FFT differs: The full spectrum is returned from the forward * transform. This facilitates generic template programming by obviating * separate specializations for real vs complex. On the inverse * transform, only half the spectrum is actually used if the output type is real. */ #include "../../Eigen/src/Core/util/DisableStupidWarnings.h" #ifdef EIGEN_FFTW_DEFAULT // FFTW: faster, GPL -- incompatible with Eigen in LGPL form, bigger code size # include # include "src/FFT/ei_fftw_impl.h" namespace Eigen { //template typedef struct internal::fftw_impl default_fft_impl; this does not work template struct default_fft_impl : public internal::fftw_impl {}; } #elif defined EIGEN_MKL_DEFAULT // TODO // intel Math Kernel Library: fastest, commercial -- may be incompatible with Eigen in GPL form # include "src/FFT/ei_imklfft_impl.h" namespace Eigen { template struct default_fft_impl : public internal::imklfft_impl {}; } #else // internal::kissfft_impl: small, free, reasonably efficient default, derived from kissfft // # include "src/FFT/ei_kissfft_impl.h" namespace Eigen { template struct default_fft_impl : public internal::kissfft_impl {}; } #endif namespace Eigen { // template struct fft_fwd_proxy; template struct fft_inv_proxy; namespace internal { template struct traits< fft_fwd_proxy > { typedef typename T_SrcMat::PlainObject ReturnType; }; template struct traits< fft_inv_proxy > { typedef typename T_SrcMat::PlainObject ReturnType; }; } template struct fft_fwd_proxy : public ReturnByValue > { typedef DenseIndex Index; fft_fwd_proxy(const T_SrcMat& src,T_FftIfc & fft, Index nfft) : m_src(src),m_ifc(fft), m_nfft(nfft) {} template void evalTo(T_DestMat& dst) const; Index rows() const { return m_src.rows(); } Index cols() const { return m_src.cols(); } protected: const T_SrcMat & m_src; T_FftIfc & m_ifc; Index m_nfft; }; template struct fft_inv_proxy : public ReturnByValue > { typedef DenseIndex Index; fft_inv_proxy(const T_SrcMat& src,T_FftIfc & fft, Index nfft) : m_src(src),m_ifc(fft), m_nfft(nfft) {} template void evalTo(T_DestMat& dst) const; Index rows() const { return m_src.rows(); } Index cols() const { return m_src.cols(); } protected: const T_SrcMat & m_src; T_FftIfc & m_ifc; Index m_nfft; }; template > class FFT { public: typedef T_Impl impl_type; typedef DenseIndex Index; typedef typename impl_type::Scalar Scalar; typedef typename impl_type::Complex Complex; enum Flag { Default=0, // goof proof Unscaled=1, HalfSpectrum=2, // SomeOtherSpeedOptimization=4 Speedy=32767 }; FFT( const impl_type & impl=impl_type() , Flag flags=Default ) :m_impl(impl),m_flag(flags) { } inline bool HasFlag(Flag f) const { return (m_flag & (int)f) == f;} inline void SetFlag(Flag f) { m_flag |= (int)f;} inline void ClearFlag(Flag f) { m_flag &= (~(int)f);} inline void fwd( Complex * dst, const Scalar * src, Index nfft) { m_impl.fwd(dst,src,static_cast(nfft)); if ( HasFlag(HalfSpectrum) == false) ReflectSpectrum(dst,nfft); } inline void fwd( Complex * dst, const Complex * src, Index nfft) { m_impl.fwd(dst,src,static_cast(nfft)); } /* inline void fwd2(Complex * dst, const Complex * src, int n0,int n1) { m_impl.fwd2(dst,src,n0,n1); } */ template inline void fwd( std::vector & dst, const std::vector<_Input> & src) { if ( NumTraits<_Input>::IsComplex == 0 && HasFlag(HalfSpectrum) ) dst.resize( (src.size()>>1)+1); // half the bins + Nyquist bin else dst.resize(src.size()); fwd(&dst[0],&src[0],src.size()); } template inline void fwd( MatrixBase & dst, const MatrixBase & src, Index nfft=-1) { typedef typename ComplexDerived::Scalar dst_type; typedef typename InputDerived::Scalar src_type; EIGEN_STATIC_ASSERT_VECTOR_ONLY(InputDerived) EIGEN_STATIC_ASSERT_VECTOR_ONLY(ComplexDerived) EIGEN_STATIC_ASSERT_SAME_VECTOR_SIZE(ComplexDerived,InputDerived) // size at compile-time EIGEN_STATIC_ASSERT((internal::is_same::value), YOU_MIXED_DIFFERENT_NUMERIC_TYPES__YOU_NEED_TO_USE_THE_CAST_METHOD_OF_MATRIXBASE_TO_CAST_NUMERIC_TYPES_EXPLICITLY) EIGEN_STATIC_ASSERT(int(InputDerived::Flags)&int(ComplexDerived::Flags)&DirectAccessBit, THIS_METHOD_IS_ONLY_FOR_EXPRESSIONS_WITH_DIRECT_MEMORY_ACCESS_SUCH_AS_MAP_OR_PLAIN_MATRICES) if (nfft<1) nfft = src.size(); if ( NumTraits< src_type >::IsComplex == 0 && HasFlag(HalfSpectrum) ) dst.derived().resize( (nfft>>1)+1); else dst.derived().resize(nfft); if ( src.innerStride() != 1 || src.size() < nfft ) { Matrix tmp; if (src.size() inline fft_fwd_proxy< MatrixBase, FFT > fwd( const MatrixBase & src, Index nfft=-1) { return fft_fwd_proxy< MatrixBase ,FFT >( src, *this,nfft ); } template inline fft_inv_proxy< MatrixBase, FFT > inv( const MatrixBase & src, Index nfft=-1) { return fft_inv_proxy< MatrixBase ,FFT >( src, *this,nfft ); } inline void inv( Complex * dst, const Complex * src, Index nfft) { m_impl.inv( dst,src,static_cast(nfft) ); if ( HasFlag( Unscaled ) == false) scale(dst,Scalar(1./nfft),nfft); // scale the time series } inline void inv( Scalar * dst, const Complex * src, Index nfft) { m_impl.inv( dst,src,static_cast(nfft) ); if ( HasFlag( Unscaled ) == false) scale(dst,Scalar(1./nfft),nfft); // scale the time series } template inline void inv( MatrixBase & dst, const MatrixBase & src, Index nfft=-1) { typedef typename ComplexDerived::Scalar src_type; typedef typename ComplexDerived::RealScalar real_type; typedef typename OutputDerived::Scalar dst_type; const bool realfft= (NumTraits::IsComplex == 0); EIGEN_STATIC_ASSERT_VECTOR_ONLY(OutputDerived) EIGEN_STATIC_ASSERT_VECTOR_ONLY(ComplexDerived) EIGEN_STATIC_ASSERT_SAME_VECTOR_SIZE(ComplexDerived,OutputDerived) // size at compile-time EIGEN_STATIC_ASSERT((internal::is_same::value), YOU_MIXED_DIFFERENT_NUMERIC_TYPES__YOU_NEED_TO_USE_THE_CAST_METHOD_OF_MATRIXBASE_TO_CAST_NUMERIC_TYPES_EXPLICITLY) EIGEN_STATIC_ASSERT(int(OutputDerived::Flags)&int(ComplexDerived::Flags)&DirectAccessBit, THIS_METHOD_IS_ONLY_FOR_EXPRESSIONS_WITH_DIRECT_MEMORY_ACCESS_SUCH_AS_MAP_OR_PLAIN_MATRICES) if (nfft<1) { //automatic FFT size determination if ( realfft && HasFlag(HalfSpectrum) ) nfft = 2*(src.size()-1); //assume even fft size else nfft = src.size(); } dst.derived().resize( nfft ); // check for nfft that does not fit the input data size Index resize_input= ( realfft && HasFlag(HalfSpectrum) ) ? ( (nfft/2+1) - src.size() ) : ( nfft - src.size() ); if ( src.innerStride() != 1 || resize_input ) { // if the vector is strided, then we need to copy it to a packed temporary Matrix tmp; if ( resize_input ) { size_t ncopy = (std::min)(src.size(),src.size() + resize_input); tmp.setZero(src.size() + resize_input); if ( realfft && HasFlag(HalfSpectrum) ) { // pad at the Nyquist bin tmp.head(ncopy) = src.head(ncopy); tmp(ncopy-1) = real(tmp(ncopy-1)); // enforce real-only Nyquist bin }else{ size_t nhead,ntail; nhead = 1+ncopy/2-1; // range [0:pi) ntail = ncopy/2-1; // range (-pi:0) tmp.head(nhead) = src.head(nhead); tmp.tail(ntail) = src.tail(ntail); if (resize_input<0) { //shrinking -- create the Nyquist bin as the average of the two bins that fold into it tmp(nhead) = ( src(nfft/2) + src( src.size() - nfft/2 ) )*real_type(.5); }else{ // expanding -- split the old Nyquist bin into two halves tmp(nhead) = src(nhead) * real_type(.5); tmp(tmp.size()-nhead) = tmp(nhead); } } }else{ tmp = src; } inv( &dst[0],&tmp[0], nfft); }else{ inv( &dst[0],&src[0], nfft); } } template inline void inv( std::vector<_Output> & dst, const std::vector & src,Index nfft=-1) { if (nfft<1) nfft = ( NumTraits<_Output>::IsComplex == 0 && HasFlag(HalfSpectrum) ) ? 2*(src.size()-1) : src.size(); dst.resize( nfft ); inv( &dst[0],&src[0],nfft); } /* // TODO: multi-dimensional FFTs inline void inv2(Complex * dst, const Complex * src, int n0,int n1) { m_impl.inv2(dst,src,n0,n1); if ( HasFlag( Unscaled ) == false) scale(dst,1./(n0*n1),n0*n1); } */ inline impl_type & impl() {return m_impl;} private: template inline void scale(T_Data * x,Scalar s,Index nx) { #if 1 for (int k=0;k::Map(x,nx) *= s; else Matrix::MapAligned(x,nx) *= s; //Matrix::Map(x,nx) * s; #endif } inline void ReflectSpectrum(Complex * freq, Index nfft) { // create the implicit right-half spectrum (conjugate-mirror of the left-half) Index nhbins=(nfft>>1)+1; for (Index k=nhbins;k < nfft; ++k ) freq[k] = conj(freq[nfft-k]); } impl_type m_impl; int m_flag; }; template template inline void fft_fwd_proxy::evalTo(T_DestMat& dst) const { m_ifc.fwd( dst, m_src, m_nfft); } template template inline void fft_inv_proxy::evalTo(T_DestMat& dst) const { m_ifc.inv( dst, m_src, m_nfft); } } #include "../../Eigen/src/Core/util/ReenableStupidWarnings.h" #endif RcppEigen/inst/include/unsupported/Eigen/SpecialFunctions0000644000176200001440000000560714562417221023355 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2016 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SPECIALFUNCTIONS_MODULE #define EIGEN_SPECIALFUNCTIONS_MODULE #include #include "../../Eigen/Core" #include "../../Eigen/src/Core/util/DisableStupidWarnings.h" namespace Eigen { /** * \defgroup SpecialFunctions_Module Special math functions module * * This module features additional coefficient-wise math functions available * within the numext:: namespace for the scalar version, and as method and/or free * functions of Array. Those include: * * - erf * - erfc * - lgamma * - igamma * - igamma_der_a * - gamma_sample_der_alpha * - igammac * - digamma * - ndtri * - polygamma * - zeta * - betainc * * Bessel Functions * - bessel_i0 * - bessel_i0e * - bessel_i1 * - bessel_i1e * - bessel_j0 * - bessel_j1 * - bessel_k0 * - bessel_k0e * - bessel_k1 * - bessel_k1e * - bessel_y0 * - bessel_y1 * * \code * #include * \endcode */ //@{ } #include "src/SpecialFunctions/BesselFunctionsImpl.h" #include "src/SpecialFunctions/BesselFunctionsBFloat16.h" #include "src/SpecialFunctions/BesselFunctionsHalf.h" #include "src/SpecialFunctions/BesselFunctionsPacketMath.h" #include "src/SpecialFunctions/BesselFunctionsFunctors.h" #include "src/SpecialFunctions/BesselFunctionsArrayAPI.h" #include "src/SpecialFunctions/SpecialFunctionsImpl.h" #if defined(EIGEN_HIPCC) #include "src/SpecialFunctions/HipVectorCompatibility.h" #endif #include "src/SpecialFunctions/SpecialFunctionsBFloat16.h" #include "src/SpecialFunctions/SpecialFunctionsHalf.h" #include "src/SpecialFunctions/SpecialFunctionsPacketMath.h" #include "src/SpecialFunctions/SpecialFunctionsFunctors.h" #include "src/SpecialFunctions/SpecialFunctionsArrayAPI.h" #if defined EIGEN_VECTORIZE_AVX512 #include "src/SpecialFunctions/arch/AVX/BesselFunctions.h" #include "src/SpecialFunctions/arch/AVX/SpecialFunctions.h" #include "src/SpecialFunctions/arch/AVX512/BesselFunctions.h" #include "src/SpecialFunctions/arch/AVX512/SpecialFunctions.h" #elif defined EIGEN_VECTORIZE_AVX #include "src/SpecialFunctions/arch/AVX/BesselFunctions.h" #include "src/SpecialFunctions/arch/AVX/SpecialFunctions.h" #elif defined EIGEN_VECTORIZE_NEON #include "src/SpecialFunctions/arch/NEON/BesselFunctions.h" #include "src/SpecialFunctions/arch/NEON/SpecialFunctions.h" #endif #if defined EIGEN_VECTORIZE_GPU #include "src/SpecialFunctions/arch/GPU/SpecialFunctions.h" #endif namespace Eigen { //@} } #include "../../Eigen/src/Core/util/ReenableStupidWarnings.h" #endif // EIGEN_SPECIALFUNCTIONS_MODULE RcppEigen/inst/include/unsupported/Eigen/MatrixFunctions0000644000176200001440000004277714562417221023252 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009 Jitse Niesen // Copyright (C) 2012 Chen-Pang He // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_MATRIX_FUNCTIONS #define EIGEN_MATRIX_FUNCTIONS #include #include #include "../../Eigen/Core" #include "../../Eigen/LU" #include "../../Eigen/Eigenvalues" /** * \defgroup MatrixFunctions_Module Matrix functions module * \brief This module aims to provide various methods for the computation of * matrix functions. * * To use this module, add * \code * #include * \endcode * at the start of your source file. * * This module defines the following MatrixBase methods. * - \ref matrixbase_cos "MatrixBase::cos()", for computing the matrix cosine * - \ref matrixbase_cosh "MatrixBase::cosh()", for computing the matrix hyperbolic cosine * - \ref matrixbase_exp "MatrixBase::exp()", for computing the matrix exponential * - \ref matrixbase_log "MatrixBase::log()", for computing the matrix logarithm * - \ref matrixbase_pow "MatrixBase::pow()", for computing the matrix power * - \ref matrixbase_matrixfunction "MatrixBase::matrixFunction()", for computing general matrix functions * - \ref matrixbase_sin "MatrixBase::sin()", for computing the matrix sine * - \ref matrixbase_sinh "MatrixBase::sinh()", for computing the matrix hyperbolic sine * - \ref matrixbase_sqrt "MatrixBase::sqrt()", for computing the matrix square root * * These methods are the main entry points to this module. * * %Matrix functions are defined as follows. Suppose that \f$ f \f$ * is an entire function (that is, a function on the complex plane * that is everywhere complex differentiable). Then its Taylor * series * \f[ f(0) + f'(0) x + \frac{f''(0)}{2} x^2 + \frac{f'''(0)}{3!} x^3 + \cdots \f] * converges to \f$ f(x) \f$. In this case, we can define the matrix * function by the same series: * \f[ f(M) = f(0) + f'(0) M + \frac{f''(0)}{2} M^2 + \frac{f'''(0)}{3!} M^3 + \cdots \f] * */ #include "../../Eigen/src/Core/util/DisableStupidWarnings.h" #include "src/MatrixFunctions/MatrixExponential.h" #include "src/MatrixFunctions/MatrixFunction.h" #include "src/MatrixFunctions/MatrixSquareRoot.h" #include "src/MatrixFunctions/MatrixLogarithm.h" #include "src/MatrixFunctions/MatrixPower.h" #include "../../Eigen/src/Core/util/ReenableStupidWarnings.h" /** \page matrixbaseextra_page \ingroup MatrixFunctions_Module \section matrixbaseextra MatrixBase methods defined in the MatrixFunctions module The remainder of the page documents the following MatrixBase methods which are defined in the MatrixFunctions module. \subsection matrixbase_cos MatrixBase::cos() Compute the matrix cosine. \code const MatrixFunctionReturnValue MatrixBase::cos() const \endcode \param[in] M a square matrix. \returns expression representing \f$ \cos(M) \f$. This function computes the matrix cosine. Use ArrayBase::cos() for computing the entry-wise cosine. The implementation calls \ref matrixbase_matrixfunction "matrixFunction()" with StdStemFunctions::cos(). \sa \ref matrixbase_sin "sin()" for an example. \subsection matrixbase_cosh MatrixBase::cosh() Compute the matrix hyberbolic cosine. \code const MatrixFunctionReturnValue MatrixBase::cosh() const \endcode \param[in] M a square matrix. \returns expression representing \f$ \cosh(M) \f$ This function calls \ref matrixbase_matrixfunction "matrixFunction()" with StdStemFunctions::cosh(). \sa \ref matrixbase_sinh "sinh()" for an example. \subsection matrixbase_exp MatrixBase::exp() Compute the matrix exponential. \code const MatrixExponentialReturnValue MatrixBase::exp() const \endcode \param[in] M matrix whose exponential is to be computed. \returns expression representing the matrix exponential of \p M. The matrix exponential of \f$ M \f$ is defined by \f[ \exp(M) = \sum_{k=0}^\infty \frac{M^k}{k!}. \f] The matrix exponential can be used to solve linear ordinary differential equations: the solution of \f$ y' = My \f$ with the initial condition \f$ y(0) = y_0 \f$ is given by \f$ y(t) = \exp(M) y_0 \f$. The matrix exponential is different from applying the exp function to all the entries in the matrix. Use ArrayBase::exp() if you want to do the latter. The cost of the computation is approximately \f$ 20 n^3 \f$ for matrices of size \f$ n \f$. The number 20 depends weakly on the norm of the matrix. The matrix exponential is computed using the scaling-and-squaring method combined with Padé approximation. The matrix is first rescaled, then the exponential of the reduced matrix is computed approximant, and then the rescaling is undone by repeated squaring. The degree of the Padé approximant is chosen such that the approximation error is less than the round-off error. However, errors may accumulate during the squaring phase. Details of the algorithm can be found in: Nicholas J. Higham, "The scaling and squaring method for the matrix exponential revisited," SIAM J. %Matrix Anal. Applic., 26:1179–1193, 2005. Example: The following program checks that \f[ \exp \left[ \begin{array}{ccc} 0 & \frac14\pi & 0 \\ -\frac14\pi & 0 & 0 \\ 0 & 0 & 0 \end{array} \right] = \left[ \begin{array}{ccc} \frac12\sqrt2 & -\frac12\sqrt2 & 0 \\ \frac12\sqrt2 & \frac12\sqrt2 & 0 \\ 0 & 0 & 1 \end{array} \right]. \f] This corresponds to a rotation of \f$ \frac14\pi \f$ radians around the z-axis. \include MatrixExponential.cpp Output: \verbinclude MatrixExponential.out \note \p M has to be a matrix of \c float, \c double, `long double` \c complex, \c complex, or `complex` . \subsection matrixbase_log MatrixBase::log() Compute the matrix logarithm. \code const MatrixLogarithmReturnValue MatrixBase::log() const \endcode \param[in] M invertible matrix whose logarithm is to be computed. \returns expression representing the matrix logarithm root of \p M. The matrix logarithm of \f$ M \f$ is a matrix \f$ X \f$ such that \f$ \exp(X) = M \f$ where exp denotes the matrix exponential. As for the scalar logarithm, the equation \f$ \exp(X) = M \f$ may have multiple solutions; this function returns a matrix whose eigenvalues have imaginary part in the interval \f$ (-\pi,\pi] \f$. The matrix logarithm is different from applying the log function to all the entries in the matrix. Use ArrayBase::log() if you want to do the latter. In the real case, the matrix \f$ M \f$ should be invertible and it should have no eigenvalues which are real and negative (pairs of complex conjugate eigenvalues are allowed). In the complex case, it only needs to be invertible. This function computes the matrix logarithm using the Schur-Parlett algorithm as implemented by MatrixBase::matrixFunction(). The logarithm of an atomic block is computed by MatrixLogarithmAtomic, which uses direct computation for 1-by-1 and 2-by-2 blocks and an inverse scaling-and-squaring algorithm for bigger blocks, with the square roots computed by MatrixBase::sqrt(). Details of the algorithm can be found in Section 11.6.2 of: Nicholas J. Higham, Functions of Matrices: Theory and Computation, SIAM 2008. ISBN 978-0-898716-46-7. Example: The following program checks that \f[ \log \left[ \begin{array}{ccc} \frac12\sqrt2 & -\frac12\sqrt2 & 0 \\ \frac12\sqrt2 & \frac12\sqrt2 & 0 \\ 0 & 0 & 1 \end{array} \right] = \left[ \begin{array}{ccc} 0 & \frac14\pi & 0 \\ -\frac14\pi & 0 & 0 \\ 0 & 0 & 0 \end{array} \right]. \f] This corresponds to a rotation of \f$ \frac14\pi \f$ radians around the z-axis. This is the inverse of the example used in the documentation of \ref matrixbase_exp "exp()". \include MatrixLogarithm.cpp Output: \verbinclude MatrixLogarithm.out \note \p M has to be a matrix of \c float, \c double, `long double`, \c complex, \c complex, or `complex`. \sa MatrixBase::exp(), MatrixBase::matrixFunction(), class MatrixLogarithmAtomic, MatrixBase::sqrt(). \subsection matrixbase_pow MatrixBase::pow() Compute the matrix raised to arbitrary real power. \code const MatrixPowerReturnValue MatrixBase::pow(RealScalar p) const \endcode \param[in] M base of the matrix power, should be a square matrix. \param[in] p exponent of the matrix power. The matrix power \f$ M^p \f$ is defined as \f$ \exp(p \log(M)) \f$, where exp denotes the matrix exponential, and log denotes the matrix logarithm. This is different from raising all the entries in the matrix to the p-th power. Use ArrayBase::pow() if you want to do the latter. If \p p is complex, the scalar type of \p M should be the type of \p p . \f$ M^p \f$ simply evaluates into \f$ \exp(p \log(M)) \f$. Therefore, the matrix \f$ M \f$ should meet the conditions to be an argument of matrix logarithm. If \p p is real, it is casted into the real scalar type of \p M. Then this function computes the matrix power using the Schur-Padé algorithm as implemented by class MatrixPower. The exponent is split into integral part and fractional part, where the fractional part is in the interval \f$ (-1, 1) \f$. The main diagonal and the first super-diagonal is directly computed. If \p M is singular with a semisimple zero eigenvalue and \p p is positive, the Schur factor \f$ T \f$ is reordered with Givens rotations, i.e. \f[ T = \left[ \begin{array}{cc} T_1 & T_2 \\ 0 & 0 \end{array} \right] \f] where \f$ T_1 \f$ is invertible. Then \f$ T^p \f$ is given by \f[ T^p = \left[ \begin{array}{cc} T_1^p & T_1^{-1} T_1^p T_2 \\ 0 & 0 \end{array}. \right] \f] \warning Fractional power of a matrix with a non-semisimple zero eigenvalue is not well-defined. We introduce an assertion failure against inaccurate result, e.g. \code #include #include int main() { Eigen::Matrix4d A; A << 0, 0, 2, 3, 0, 0, 4, 5, 0, 0, 6, 7, 0, 0, 8, 9; std::cout << A.pow(0.37) << std::endl; // The 1 makes eigenvalue 0 non-semisimple. A.coeffRef(0, 1) = 1; // This fails if EIGEN_NO_DEBUG is undefined. std::cout << A.pow(0.37) << std::endl; return 0; } \endcode Details of the algorithm can be found in: Nicholas J. Higham and Lijing Lin, "A Schur-Padé algorithm for fractional powers of a matrix," SIAM J. %Matrix Anal. Applic., 32(3):1056–1078, 2011. Example: The following program checks that \f[ \left[ \begin{array}{ccc} \cos1 & -\sin1 & 0 \\ \sin1 & \cos1 & 0 \\ 0 & 0 & 1 \end{array} \right]^{\frac14\pi} = \left[ \begin{array}{ccc} \frac12\sqrt2 & -\frac12\sqrt2 & 0 \\ \frac12\sqrt2 & \frac12\sqrt2 & 0 \\ 0 & 0 & 1 \end{array} \right]. \f] This corresponds to \f$ \frac14\pi \f$ rotations of 1 radian around the z-axis. \include MatrixPower.cpp Output: \verbinclude MatrixPower.out MatrixBase::pow() is user-friendly. However, there are some circumstances under which you should use class MatrixPower directly. MatrixPower can save the result of Schur decomposition, so it's better for computing various powers for the same matrix. Example: \include MatrixPower_optimal.cpp Output: \verbinclude MatrixPower_optimal.out \note \p M has to be a matrix of \c float, \c double, `long double`, \c complex, \c complex, or \c complex . \sa MatrixBase::exp(), MatrixBase::log(), class MatrixPower. \subsection matrixbase_matrixfunction MatrixBase::matrixFunction() Compute a matrix function. \code const MatrixFunctionReturnValue MatrixBase::matrixFunction(typename internal::stem_function::Scalar>::type f) const \endcode \param[in] M argument of matrix function, should be a square matrix. \param[in] f an entire function; \c f(x,n) should compute the n-th derivative of f at x. \returns expression representing \p f applied to \p M. Suppose that \p M is a matrix whose entries have type \c Scalar. Then, the second argument, \p f, should be a function with prototype \code ComplexScalar f(ComplexScalar, int) \endcode where \c ComplexScalar = \c std::complex if \c Scalar is real (e.g., \c float or \c double) and \c ComplexScalar = \c Scalar if \c Scalar is complex. The return value of \c f(x,n) should be \f$ f^{(n)}(x) \f$, the n-th derivative of f at x. This routine uses the algorithm described in: Philip Davies and Nicholas J. Higham, "A Schur-Parlett algorithm for computing matrix functions", SIAM J. %Matrix Anal. Applic., 25:464–485, 2003. The actual work is done by the MatrixFunction class. Example: The following program checks that \f[ \exp \left[ \begin{array}{ccc} 0 & \frac14\pi & 0 \\ -\frac14\pi & 0 & 0 \\ 0 & 0 & 0 \end{array} \right] = \left[ \begin{array}{ccc} \frac12\sqrt2 & -\frac12\sqrt2 & 0 \\ \frac12\sqrt2 & \frac12\sqrt2 & 0 \\ 0 & 0 & 1 \end{array} \right]. \f] This corresponds to a rotation of \f$ \frac14\pi \f$ radians around the z-axis. This is the same example as used in the documentation of \ref matrixbase_exp "exp()". \include MatrixFunction.cpp Output: \verbinclude MatrixFunction.out Note that the function \c expfn is defined for complex numbers \c x, even though the matrix \c A is over the reals. Instead of \c expfn, we could also have used StdStemFunctions::exp: \code A.matrixFunction(StdStemFunctions >::exp, &B); \endcode \subsection matrixbase_sin MatrixBase::sin() Compute the matrix sine. \code const MatrixFunctionReturnValue MatrixBase::sin() const \endcode \param[in] M a square matrix. \returns expression representing \f$ \sin(M) \f$. This function computes the matrix sine. Use ArrayBase::sin() for computing the entry-wise sine. The implementation calls \ref matrixbase_matrixfunction "matrixFunction()" with StdStemFunctions::sin(). Example: \include MatrixSine.cpp Output: \verbinclude MatrixSine.out \subsection matrixbase_sinh MatrixBase::sinh() Compute the matrix hyperbolic sine. \code MatrixFunctionReturnValue MatrixBase::sinh() const \endcode \param[in] M a square matrix. \returns expression representing \f$ \sinh(M) \f$ This function calls \ref matrixbase_matrixfunction "matrixFunction()" with StdStemFunctions::sinh(). Example: \include MatrixSinh.cpp Output: \verbinclude MatrixSinh.out \subsection matrixbase_sqrt MatrixBase::sqrt() Compute the matrix square root. \code const MatrixSquareRootReturnValue MatrixBase::sqrt() const \endcode \param[in] M invertible matrix whose square root is to be computed. \returns expression representing the matrix square root of \p M. The matrix square root of \f$ M \f$ is the matrix \f$ M^{1/2} \f$ whose square is the original matrix; so if \f$ S = M^{1/2} \f$ then \f$ S^2 = M \f$. This is different from taking the square root of all the entries in the matrix; use ArrayBase::sqrt() if you want to do the latter. In the real case, the matrix \f$ M \f$ should be invertible and it should have no eigenvalues which are real and negative (pairs of complex conjugate eigenvalues are allowed). In that case, the matrix has a square root which is also real, and this is the square root computed by this function. The matrix square root is computed by first reducing the matrix to quasi-triangular form with the real Schur decomposition. The square root of the quasi-triangular matrix can then be computed directly. The cost is approximately \f$ 25 n^3 \f$ real flops for the real Schur decomposition and \f$ 3\frac13 n^3 \f$ real flops for the remainder (though the computation time in practice is likely more than this indicates). Details of the algorithm can be found in: Nicholas J. Highan, "Computing real square roots of a real matrix", Linear Algebra Appl., 88/89:405–430, 1987. If the matrix is positive-definite symmetric, then the square root is also positive-definite symmetric. In this case, it is best to use SelfAdjointEigenSolver::operatorSqrt() to compute it. In the complex case, the matrix \f$ M \f$ should be invertible; this is a restriction of the algorithm. The square root computed by this algorithm is the one whose eigenvalues have an argument in the interval \f$ (-\frac12\pi, \frac12\pi] \f$. This is the usual branch cut. The computation is the same as in the real case, except that the complex Schur decomposition is used to reduce the matrix to a triangular matrix. The theoretical cost is the same. Details are in: Åke Björck and Sven Hammarling, "A Schur method for the square root of a matrix", Linear Algebra Appl., 52/53:127–140, 1983. Example: The following program checks that the square root of \f[ \left[ \begin{array}{cc} \cos(\frac13\pi) & -\sin(\frac13\pi) \\ \sin(\frac13\pi) & \cos(\frac13\pi) \end{array} \right], \f] corresponding to a rotation over 60 degrees, is a rotation over 30 degrees: \f[ \left[ \begin{array}{cc} \cos(\frac16\pi) & -\sin(\frac16\pi) \\ \sin(\frac16\pi) & \cos(\frac16\pi) \end{array} \right]. \f] \include MatrixSquareRoot.cpp Output: \verbinclude MatrixSquareRoot.out \sa class RealSchur, class ComplexSchur, class MatrixSquareRoot, SelfAdjointEigenSolver::operatorSqrt(). */ #endif // EIGEN_MATRIX_FUNCTIONS RcppEigen/inst/include/unsupported/Eigen/AutoDiff0000644000176200001440000000223514562417221021577 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2009 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_AUTODIFF_MODULE #define EIGEN_AUTODIFF_MODULE namespace Eigen { /** * \defgroup AutoDiff_Module Auto Diff module * * This module features forward automatic differentation via a simple * templated scalar type wrapper AutoDiffScalar. * * Warning : this should NOT be confused with numerical differentiation, which * is a different method and has its own module in Eigen : \ref NumericalDiff_Module. * * \code * #include * \endcode */ //@{ } #include "../../Eigen/src/Core/util/DisableStupidWarnings.h" #include "src/AutoDiff/AutoDiffScalar.h" // #include "src/AutoDiff/AutoDiffVector.h" #include "src/AutoDiff/AutoDiffJacobian.h" #include "../../Eigen/src/Core/util/ReenableStupidWarnings.h" namespace Eigen { //@} } #endif // EIGEN_AUTODIFF_MODULE RcppEigen/inst/include/unsupported/Eigen/Polynomials0000644000176200001440000001121514562417221022402 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_POLYNOMIALS_MODULE_H #define EIGEN_POLYNOMIALS_MODULE_H #include "../../Eigen/Core" #include "../../Eigen/Eigenvalues" #include "../../Eigen/src/Core/util/DisableStupidWarnings.h" // Note that EIGEN_HIDE_HEAVY_CODE has to be defined per module #if (defined EIGEN_EXTERN_INSTANTIATIONS) && (EIGEN_EXTERN_INSTANTIATIONS>=2) #ifndef EIGEN_HIDE_HEAVY_CODE #define EIGEN_HIDE_HEAVY_CODE #endif #elif defined EIGEN_HIDE_HEAVY_CODE #undef EIGEN_HIDE_HEAVY_CODE #endif /** * \defgroup Polynomials_Module Polynomials module * \brief This module provides a QR based polynomial solver. * * To use this module, add * \code * #include * \endcode * at the start of your source file. */ #include "src/Polynomials/PolynomialUtils.h" #include "src/Polynomials/Companion.h" #include "src/Polynomials/PolynomialSolver.h" /** \page polynomials Polynomials defines functions for dealing with polynomials and a QR based polynomial solver. \ingroup Polynomials_Module The remainder of the page documents first the functions for evaluating, computing polynomials, computing estimates about polynomials and next the QR based polynomial solver. \section polynomialUtils convenient functions to deal with polynomials \subsection roots_to_monicPolynomial The function \code void roots_to_monicPolynomial( const RootVector& rv, Polynomial& poly ) \endcode computes the coefficients \f$ a_i \f$ of \f$ p(x) = a_0 + a_{1}x + ... + a_{n-1}x^{n-1} + x^n \f$ where \f$ p \f$ is known through its roots i.e. \f$ p(x) = (x-r_1)(x-r_2)...(x-r_n) \f$. \subsection poly_eval The function \code T poly_eval( const Polynomials& poly, const T& x ) \endcode evaluates a polynomial at a given point using stabilized Hörner method. The following code: first computes the coefficients in the monomial basis of the monic polynomial that has the provided roots; then, it evaluates the computed polynomial, using a stabilized Hörner method. \include PolynomialUtils1.cpp Output: \verbinclude PolynomialUtils1.out \subsection Cauchy bounds The function \code Real cauchy_max_bound( const Polynomial& poly ) \endcode provides a maximum bound (the Cauchy one: \f$C(p)\f$) for the absolute value of a root of the given polynomial i.e. \f$ \forall r_i \f$ root of \f$ p(x) = \sum_{k=0}^d a_k x^k \f$, \f$ |r_i| \le C(p) = \sum_{k=0}^{d} \left | \frac{a_k}{a_d} \right | \f$ The leading coefficient \f$ p \f$: should be non zero \f$a_d \neq 0\f$. The function \code Real cauchy_min_bound( const Polynomial& poly ) \endcode provides a minimum bound (the Cauchy one: \f$c(p)\f$) for the absolute value of a non zero root of the given polynomial i.e. \f$ \forall r_i \neq 0 \f$ root of \f$ p(x) = \sum_{k=0}^d a_k x^k \f$, \f$ |r_i| \ge c(p) = \left( \sum_{k=0}^{d} \left | \frac{a_k}{a_0} \right | \right)^{-1} \f$ \section QR polynomial solver class Computes the complex roots of a polynomial by computing the eigenvalues of the associated companion matrix with the QR algorithm. The roots of \f$ p(x) = a_0 + a_1 x + a_2 x^2 + a_{3} x^3 + x^4 \f$ are the eigenvalues of \f$ \left [ \begin{array}{cccc} 0 & 0 & 0 & a_0 \\ 1 & 0 & 0 & a_1 \\ 0 & 1 & 0 & a_2 \\ 0 & 0 & 1 & a_3 \end{array} \right ] \f$ However, the QR algorithm is not guaranteed to converge when there are several eigenvalues with same modulus. Therefore the current polynomial solver is guaranteed to provide a correct result only when the complex roots \f$r_1,r_2,...,r_d\f$ have distinct moduli i.e. \f$ \forall i,j \in [1;d],~ \| r_i \| \neq \| r_j \| \f$. With 32bit (float) floating types this problem shows up frequently. However, almost always, correct accuracy is reached even in these cases for 64bit (double) floating types and small polynomial degree (<20). \include PolynomialSolver1.cpp In the above example: -# a simple use of the polynomial solver is shown; -# the accuracy problem with the QR algorithm is presented: a polynomial with almost conjugate roots is provided to the solver. Those roots have almost same module therefore the QR algorithm failed to converge: the accuracy of the last root is bad; -# a simple way to circumvent the problem is shown: use doubles instead of floats. Output: \verbinclude PolynomialSolver1.out */ #include "../../Eigen/src/Core/util/ReenableStupidWarnings.h" #endif // EIGEN_POLYNOMIALS_MODULE_H RcppEigen/inst/include/unsupported/Eigen/NonLinearOptimization0000644000176200001440000001351314562417221024373 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009 Thomas Capricelli // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_NONLINEAROPTIMIZATION_MODULE #define EIGEN_NONLINEAROPTIMIZATION_MODULE #include #include "../../Eigen/Core" #include "../../Eigen/Jacobi" #include "../../Eigen/QR" #include "NumericalDiff" /** * \defgroup NonLinearOptimization_Module Non linear optimization module * * \code * #include * \endcode * * This module provides implementation of two important algorithms in non linear * optimization. In both cases, we consider a system of non linear functions. Of * course, this should work, and even work very well if those functions are * actually linear. But if this is so, you should probably better use other * methods more fitted to this special case. * * One algorithm allows to find a least-squares solution of such a system * (Levenberg-Marquardt algorithm) and the second one is used to find * a zero for the system (Powell hybrid "dogleg" method). * * This code is a port of minpack (http://en.wikipedia.org/wiki/MINPACK). * Minpack is a very famous, old, robust and well renowned package, written in * fortran. Those implementations have been carefully tuned, tested, and used * for several decades. * * The original fortran code was automatically translated using f2c (http://en.wikipedia.org/wiki/F2c) in C, * then c++, and then cleaned by several different authors. * The last one of those cleanings being our starting point : * http://devernay.free.fr/hacks/cminpack.html * * Finally, we ported this code to Eigen, creating classes and API * coherent with Eigen. When possible, we switched to Eigen * implementation, such as most linear algebra (vectors, matrices, stable norms). * * Doing so, we were very careful to check the tests we setup at the very * beginning, which ensure that the same results are found. * * \section Tests Tests * * The tests are placed in the file unsupported/test/NonLinear.cpp. * * There are two kinds of tests : those that come from examples bundled with cminpack. * They guaranty we get the same results as the original algorithms (value for 'x', * for the number of evaluations of the function, and for the number of evaluations * of the Jacobian if ever). * * Other tests were added by myself at the very beginning of the * process and check the results for Levenberg-Marquardt using the reference data * on http://www.itl.nist.gov/div898/strd/nls/nls_main.shtml. Since then i've * carefully checked that the same results were obtained when modifying the * code. Please note that we do not always get the exact same decimals as they do, * but this is ok : they use 128bits float, and we do the tests using the C type 'double', * which is 64 bits on most platforms (x86 and amd64, at least). * I've performed those tests on several other implementations of Levenberg-Marquardt, and * (c)minpack performs VERY well compared to those, both in accuracy and speed. * * The documentation for running the tests is on the wiki * http://eigen.tuxfamily.org/index.php?title=Tests * * \section API API: overview of methods * * Both algorithms needs a functor computing the Jacobian. It can be computed by * hand, using auto-differentiation (see \ref AutoDiff_Module), or using numerical * differences (see \ref NumericalDiff_Module). For instance: *\code * MyFunc func; * NumericalDiff func_with_num_diff(func); * LevenbergMarquardt > lm(func_with_num_diff); * \endcode * For HybridNonLinearSolver, the method solveNumericalDiff() does the above wrapping for * you. * * The methods LevenbergMarquardt.lmder1()/lmdif1()/lmstr1() and * HybridNonLinearSolver.hybrj1()/hybrd1() are specific methods from the original * minpack package that you probably should NOT use until you are porting a code that * was previously using minpack. They just define a 'simple' API with default values * for some parameters. * * All algorithms are provided using two APIs : * - one where the user inits the algorithm, and uses '*OneStep()' as much as he wants : * this way the caller have control over the steps * - one where the user just calls a method (optimize() or solve()) which will * handle the loop: init + loop until a stop condition is met. Those are provided for * convenience. * * As an example, the method LevenbergMarquardt::minimize() is * implemented as follow: * \code * Status LevenbergMarquardt::minimize(FVectorType &x, const int mode) * { * Status status = minimizeInit(x, mode); * do { * status = minimizeOneStep(x, mode); * } while (status==Running); * return status; * } * \endcode * * \section examples Examples * * The easiest way to understand how to use this module is by looking at the many examples in the file * unsupported/test/NonLinearOptimization.cpp. */ #ifndef EIGEN_PARSED_BY_DOXYGEN #include "src/NonLinearOptimization/qrsolv.h" #include "src/NonLinearOptimization/r1updt.h" #include "src/NonLinearOptimization/r1mpyq.h" #include "src/NonLinearOptimization/rwupdt.h" #include "src/NonLinearOptimization/fdjac1.h" #include "src/NonLinearOptimization/lmpar.h" #include "src/NonLinearOptimization/dogleg.h" #include "src/NonLinearOptimization/covar.h" #include "src/NonLinearOptimization/chkder.h" #endif #include "src/NonLinearOptimization/HybridNonLinearSolver.h" #include "src/NonLinearOptimization/LevenbergMarquardt.h" #endif // EIGEN_NONLINEAROPTIMIZATION_MODULE RcppEigen/inst/include/unsupported/Eigen/IterativeSolvers0000644000176200001440000000303114562417221023403 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2009 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_ITERATIVE_SOLVERS_MODULE_H #define EIGEN_ITERATIVE_SOLVERS_MODULE_H #include "../../Eigen/Sparse" #include "../../Eigen/Jacobi" #include "../../Eigen/Householder" /** * \defgroup IterativeLinearSolvers_Module Iterative solvers module * This module aims to provide various iterative linear and non linear solver algorithms. * It currently provides: * - a constrained conjugate gradient * - a Householder GMRES implementation * - an IDR(s) implementation * - a DGMRES implementation * - a MINRES implementation * * \code * #include * \endcode */ #include "../../Eigen/src/Core/util/DisableStupidWarnings.h" #ifndef EIGEN_MPL2_ONLY #include "src/IterativeSolvers/IterationController.h" #include "src/IterativeSolvers/ConstrainedConjGrad.h" #endif #include "src/IterativeSolvers/IncompleteLU.h" #include "src/IterativeSolvers/GMRES.h" #include "src/IterativeSolvers/DGMRES.h" //#include "src/IterativeSolvers/SSORPreconditioner.h" #include "src/IterativeSolvers/MINRES.h" #include "src/IterativeSolvers/IDRS.h" #include "../../Eigen/src/Core/util/ReenableStupidWarnings.h" #endif // EIGEN_ITERATIVE_SOLVERS_MODULE_H RcppEigen/inst/include/unsupported/Eigen/LevenbergMarquardt0000644000176200001440000000232614562417221023671 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009 Thomas Capricelli // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_LEVENBERGMARQUARDT_MODULE #define EIGEN_LEVENBERGMARQUARDT_MODULE // #include #include "../../Eigen/Core" #include "../../Eigen/Jacobi" #include "../../Eigen/QR" #include "NumericalDiff" #include "../../Eigen/SparseQR" /** * \defgroup LevenbergMarquardt_Module Levenberg-Marquardt module * * \code * #include * \endcode * * */ #include "../../Eigen/SparseCore" #include "../../Eigen/src/Core/util/DisableStupidWarnings.h" #ifndef EIGEN_PARSED_BY_DOXYGEN #include "src/LevenbergMarquardt/LMqrsolv.h" #include "src/LevenbergMarquardt/LMcovar.h" #include "src/LevenbergMarquardt/LMpar.h" #endif #include "src/LevenbergMarquardt/LevenbergMarquardt.h" #include "src/LevenbergMarquardt/LMonestep.h" #include "../../Eigen/src/Core/util/ReenableStupidWarnings.h" #endif // EIGEN_LEVENBERGMARQUARDT_MODULE RcppEigen/inst/include/unsupported/Eigen/AlignedVector30000644000176200001440000001431514562417221022711 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_ALIGNED_VECTOR3 #define EIGEN_ALIGNED_VECTOR3 #include "../../Eigen/Geometry" #include "../../Eigen/src/Core/util/DisableStupidWarnings.h" namespace Eigen { /** * \defgroup AlignedVector3_Module Aligned vector3 module * * \code * #include * \endcode */ //@{ /** \class AlignedVector3 * * \brief A vectorization friendly 3D vector * * This class represents a 3D vector internally using a 4D vector * such that vectorization can be seamlessly enabled. Of course, * the same result can be achieved by directly using a 4D vector. * This class makes this process simpler. * */ // TODO specialize Cwise template class AlignedVector3; namespace internal { template struct traits > : traits > { }; } template class AlignedVector3 : public MatrixBase > { typedef Matrix<_Scalar,4,1> CoeffType; CoeffType m_coeffs; public: typedef MatrixBase > Base; EIGEN_DENSE_PUBLIC_INTERFACE(AlignedVector3) using Base::operator*; inline Index rows() const { return 3; } inline Index cols() const { return 1; } Scalar* data() { return m_coeffs.data(); } const Scalar* data() const { return m_coeffs.data(); } Index innerStride() const { return 1; } Index outerStride() const { return 3; } inline const Scalar& coeff(Index row, Index col) const { return m_coeffs.coeff(row, col); } inline Scalar& coeffRef(Index row, Index col) { return m_coeffs.coeffRef(row, col); } inline const Scalar& coeff(Index index) const { return m_coeffs.coeff(index); } inline Scalar& coeffRef(Index index) { return m_coeffs.coeffRef(index);} inline AlignedVector3() {} inline AlignedVector3(const Scalar& x, const Scalar& y, const Scalar& z) : m_coeffs(x, y, z, Scalar(0)) {} inline AlignedVector3(const AlignedVector3& other) : Base(), m_coeffs(other.m_coeffs) {} template struct generic_assign_selector {}; template struct generic_assign_selector { inline static void run(AlignedVector3& dest, const XprType& src) { dest.m_coeffs = src; } }; template struct generic_assign_selector { inline static void run(AlignedVector3& dest, const XprType& src) { dest.m_coeffs.template head<3>() = src; dest.m_coeffs.w() = Scalar(0); } }; template inline AlignedVector3(const MatrixBase& other) { generic_assign_selector::run(*this,other.derived()); } inline AlignedVector3& operator=(const AlignedVector3& other) { m_coeffs = other.m_coeffs; return *this; } template inline AlignedVector3& operator=(const MatrixBase& other) { generic_assign_selector::run(*this,other.derived()); return *this; } inline AlignedVector3 operator+(const AlignedVector3& other) const { return AlignedVector3(m_coeffs + other.m_coeffs); } inline AlignedVector3& operator+=(const AlignedVector3& other) { m_coeffs += other.m_coeffs; return *this; } inline AlignedVector3 operator-(const AlignedVector3& other) const { return AlignedVector3(m_coeffs - other.m_coeffs); } inline AlignedVector3 operator-() const { return AlignedVector3(-m_coeffs); } inline AlignedVector3 operator-=(const AlignedVector3& other) { m_coeffs -= other.m_coeffs; return *this; } inline AlignedVector3 operator*(const Scalar& s) const { return AlignedVector3(m_coeffs * s); } inline friend AlignedVector3 operator*(const Scalar& s,const AlignedVector3& vec) { return AlignedVector3(s * vec.m_coeffs); } inline AlignedVector3& operator*=(const Scalar& s) { m_coeffs *= s; return *this; } inline AlignedVector3 operator/(const Scalar& s) const { return AlignedVector3(m_coeffs / s); } inline AlignedVector3& operator/=(const Scalar& s) { m_coeffs /= s; return *this; } inline Scalar dot(const AlignedVector3& other) const { eigen_assert(m_coeffs.w()==Scalar(0)); eigen_assert(other.m_coeffs.w()==Scalar(0)); return m_coeffs.dot(other.m_coeffs); } inline void normalize() { m_coeffs /= norm(); } inline AlignedVector3 normalized() const { return AlignedVector3(m_coeffs / norm()); } inline Scalar sum() const { eigen_assert(m_coeffs.w()==Scalar(0)); return m_coeffs.sum(); } inline Scalar squaredNorm() const { eigen_assert(m_coeffs.w()==Scalar(0)); return m_coeffs.squaredNorm(); } inline Scalar norm() const { using std::sqrt; return sqrt(squaredNorm()); } inline AlignedVector3 cross(const AlignedVector3& other) const { return AlignedVector3(m_coeffs.cross3(other.m_coeffs)); } template inline bool isApprox(const MatrixBase& other, const RealScalar& eps=NumTraits::dummy_precision()) const { return m_coeffs.template head<3>().isApprox(other,eps); } CoeffType& coeffs() { return m_coeffs; } const CoeffType& coeffs() const { return m_coeffs; } }; namespace internal { template struct eval, Dense> { typedef const AlignedVector3<_Scalar>& type; }; template struct evaluator > : evaluator > { typedef AlignedVector3 XprType; typedef evaluator > Base; evaluator(const XprType &m) : Base(m.coeffs()) {} }; } //@} } #include "../../Eigen/src/Core/util/ReenableStupidWarnings.h" #endif // EIGEN_ALIGNED_VECTOR3 RcppEigen/inst/include/unsupported/Eigen/NumericalDiff0000644000176200001440000000336314562417221022611 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009 Thomas Capricelli // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_NUMERICALDIFF_MODULE #define EIGEN_NUMERICALDIFF_MODULE #include "../../Eigen/Core" namespace Eigen { /** * \defgroup NumericalDiff_Module Numerical differentiation module * * \code * #include * \endcode * * See http://en.wikipedia.org/wiki/Numerical_differentiation * * Warning : this should NOT be confused with automatic differentiation, which * is a different method and has its own module in Eigen : \ref * AutoDiff_Module. * * Currently only "Forward" and "Central" schemes are implemented. Those * are basic methods, and there exist some more elaborated way of * computing such approximates. They are implemented using both * proprietary and free software, and usually requires linking to an * external library. It is very easy for you to write a functor * using such software, and the purpose is quite orthogonal to what we * want to achieve with Eigen. * * This is why we will not provide wrappers for every great numerical * differentiation software that exist, but should rather stick with those * basic ones, that still are useful for testing. * * Also, the \ref NonLinearOptimization_Module needs this in order to * provide full features compatibility with the original (c)minpack * package. * */ } //@{ #include "src/NumericalDiff/NumericalDiff.h" //@} #endif // EIGEN_NUMERICALDIFF_MODULE RcppEigen/inst/include/unsupported/Eigen/ArpackSupport0000644000176200001440000000156414562417221022700 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_ARPACKSUPPORT_MODULE_H #define EIGEN_ARPACKSUPPORT_MODULE_H #include "../../Eigen/Core" /** \defgroup ArpackSupport_Module Arpack support module * * This module provides a wrapper to Arpack, a library for sparse eigenvalue decomposition. * * \code * #include * \endcode */ #include "../../Eigen/SparseCholesky" #include "../../Eigen/src/Core/util/DisableStupidWarnings.h" #include "src/Eigenvalues/ArpackSelfAdjointEigenSolver.h" #include "../../Eigen/src/Core/util/ReenableStupidWarnings.h" #endif // EIGEN_ARPACKSUPPORT_MODULE_H RcppEigen/inst/include/unsupported/Eigen/BVH0000644000176200001440000001262314562417221020517 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009 Ilya Baran // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_BVH_MODULE_H #define EIGEN_BVH_MODULE_H #include "../../Eigen/Core" #include "../../Eigen/Geometry" #include "../../Eigen/StdVector" #include #include namespace Eigen { /** * \defgroup BVH_Module BVH module * \brief This module provides generic bounding volume hierarchy algorithms * and reference tree implementations. * * * \code * #include * \endcode * * A bounding volume hierarchy (BVH) can accelerate many geometric queries. This module provides a generic implementation * of the two basic algorithms over a BVH: intersection of a query object against all objects in the hierarchy and minimization * of a function over the objects in the hierarchy. It also provides intersection and minimization over a cartesian product of * two BVH's. A BVH accelerates intersection by using the fact that if a query object does not intersect a volume, then it cannot * intersect any object contained in that volume. Similarly, a BVH accelerates minimization because the minimum of a function * over a volume is no greater than the minimum of a function over any object contained in it. * * Some sample queries that can be written in terms of intersection are: * - Determine all points where a ray intersects a triangle mesh * - Given a set of points, determine which are contained in a query sphere * - Given a set of spheres, determine which contain the query point * - Given a set of disks, determine if any is completely contained in a query rectangle (represent each 2D disk as a point \f$(x,y,r)\f$ * in 3D and represent the rectangle as a pyramid based on the original rectangle and shrinking in the \f$r\f$ direction) * - Given a set of points, count how many pairs are \f$d\pm\epsilon\f$ apart (done by looking at the cartesian product of the set * of points with itself) * * Some sample queries that can be written in terms of function minimization over a set of objects are: * - Find the intersection between a ray and a triangle mesh closest to the ray origin (function is infinite off the ray) * - Given a polyline and a query point, determine the closest point on the polyline to the query * - Find the diameter of a point cloud (done by looking at the cartesian product and using negative distance as the function) * - Determine how far two meshes are from colliding (this is also a cartesian product query) * * This implementation decouples the basic algorithms both from the type of hierarchy (and the types of the bounding volumes) and * from the particulars of the query. To enable abstraction from the BVH, the BVH is required to implement a generic mechanism * for traversal. To abstract from the query, the query is responsible for keeping track of results. * * To be used in the algorithms, a hierarchy must implement the following traversal mechanism (see KdBVH for a sample implementation): \code typedef Volume //the type of bounding volume typedef Object //the type of object in the hierarchy typedef Index //a reference to a node in the hierarchy--typically an int or a pointer typedef VolumeIterator //an iterator type over node children--returns Index typedef ObjectIterator //an iterator over object (leaf) children--returns const Object & Index getRootIndex() const //returns the index of the hierarchy root const Volume &getVolume(Index index) const //returns the bounding volume of the node at given index void getChildren(Index index, VolumeIterator &outVBegin, VolumeIterator &outVEnd, ObjectIterator &outOBegin, ObjectIterator &outOEnd) const //getChildren takes a node index and makes [outVBegin, outVEnd) range over its node children //and [outOBegin, outOEnd) range over its object children \endcode * * To use the hierarchy, call BVIntersect or BVMinimize, passing it a BVH (or two, for cartesian product) and a minimizer or intersector. * For an intersection query on a single BVH, the intersector encapsulates the query and must provide two functions: * \code bool intersectVolume(const Volume &volume) //returns true if the query intersects the volume bool intersectObject(const Object &object) //returns true if the intersection search should terminate immediately \endcode * The guarantee that BVIntersect provides is that intersectObject will be called on every object whose bounding volume * intersects the query (but possibly on other objects too) unless the search is terminated prematurely. It is the * responsibility of the intersectObject function to keep track of the results in whatever manner is appropriate. * The cartesian product intersection and the BVMinimize queries are similar--see their individual documentation. * * The following is a simple but complete example for how to use the BVH to accelerate the search for a closest red-blue point pair: * \include BVH_Example.cpp * Output: \verbinclude BVH_Example.out */ } //@{ #include "src/BVH/BVAlgorithms.h" #include "src/BVH/KdBVH.h" //@} #endif // EIGEN_BVH_MODULE_H RcppEigen/inst/include/unsupported/Eigen/MPRealSupport0000644000176200001440000001675014562417221022622 0ustar liggesusers// This file is part of a joint effort between Eigen, a lightweight C++ template library // for linear algebra, and MPFR C++, a C++ interface to MPFR library (http://www.holoborodko.com/pavel/) // // Copyright (C) 2010-2012 Pavel Holoborodko // Copyright (C) 2010 Konstantin Holoborodko // Copyright (C) 2010 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_MPREALSUPPORT_MODULE_H #define EIGEN_MPREALSUPPORT_MODULE_H #include "../../Eigen/Core" #include namespace Eigen { /** * \defgroup MPRealSupport_Module MPFRC++ Support module * \code * #include * \endcode * * This module provides support for multi precision floating point numbers * via the MPFR C++ * library which itself is built upon MPFR/GMP. * * \warning MPFR C++ is licensed under the GPL. * * You can find a copy of MPFR C++ that is known to be compatible in the unsupported/test/mpreal folder. * * Here is an example: * \code #include #include #include using namespace mpfr; using namespace Eigen; int main() { // set precision to 256 bits (double has only 53 bits) mpreal::set_default_prec(256); // Declare matrix and vector types with multi-precision scalar type typedef Matrix MatrixXmp; typedef Matrix VectorXmp; MatrixXmp A = MatrixXmp::Random(100,100); VectorXmp b = VectorXmp::Random(100); // Solve Ax=b using LU VectorXmp x = A.lu().solve(b); std::cout << "relative error: " << (A*x - b).norm() / b.norm() << std::endl; return 0; } \endcode * */ template<> struct NumTraits : GenericNumTraits { enum { IsInteger = 0, IsSigned = 1, IsComplex = 0, RequireInitialization = 1, ReadCost = HugeCost, AddCost = HugeCost, MulCost = HugeCost }; typedef mpfr::mpreal Real; typedef mpfr::mpreal NonInteger; static inline Real highest (long Precision = mpfr::mpreal::get_default_prec()) { return mpfr::maxval(Precision); } static inline Real lowest (long Precision = mpfr::mpreal::get_default_prec()) { return -mpfr::maxval(Precision); } // Constants static inline Real Pi (long Precision = mpfr::mpreal::get_default_prec()) { return mpfr::const_pi(Precision); } static inline Real Euler (long Precision = mpfr::mpreal::get_default_prec()) { return mpfr::const_euler(Precision); } static inline Real Log2 (long Precision = mpfr::mpreal::get_default_prec()) { return mpfr::const_log2(Precision); } static inline Real Catalan (long Precision = mpfr::mpreal::get_default_prec()) { return mpfr::const_catalan(Precision); } static inline Real epsilon (long Precision = mpfr::mpreal::get_default_prec()) { return mpfr::machine_epsilon(Precision); } static inline Real epsilon (const Real& x) { return mpfr::machine_epsilon(x); } #ifdef MPREAL_HAVE_DYNAMIC_STD_NUMERIC_LIMITS static inline int digits10 (long Precision = mpfr::mpreal::get_default_prec()) { return std::numeric_limits::digits10(Precision); } static inline int digits10 (const Real& x) { return std::numeric_limits::digits10(x); } static inline int digits () { return std::numeric_limits::digits(); } static inline int digits (const Real& x) { return std::numeric_limits::digits(x); } #endif static inline Real dummy_precision() { mpfr_prec_t weak_prec = ((mpfr::mpreal::get_default_prec()-1) * 90) / 100; return mpfr::machine_epsilon(weak_prec); } }; namespace internal { template<> inline mpfr::mpreal random() { return mpfr::random(); } template<> inline mpfr::mpreal random(const mpfr::mpreal& a, const mpfr::mpreal& b) { return a + (b-a) * random(); } inline bool isMuchSmallerThan(const mpfr::mpreal& a, const mpfr::mpreal& b, const mpfr::mpreal& eps) { return mpfr::abs(a) <= mpfr::abs(b) * eps; } inline bool isApprox(const mpfr::mpreal& a, const mpfr::mpreal& b, const mpfr::mpreal& eps) { return mpfr::isEqualFuzzy(a,b,eps); } inline bool isApproxOrLessThan(const mpfr::mpreal& a, const mpfr::mpreal& b, const mpfr::mpreal& eps) { return a <= b || mpfr::isEqualFuzzy(a,b,eps); } template<> inline long double cast(const mpfr::mpreal& x) { return x.toLDouble(); } template<> inline double cast(const mpfr::mpreal& x) { return x.toDouble(); } template<> inline long cast(const mpfr::mpreal& x) { return x.toLong(); } template<> inline int cast(const mpfr::mpreal& x) { return int(x.toLong()); } // Specialize GEBP kernel and traits for mpreal (no need for peeling, nor complicated stuff) // This also permits to directly call mpfr's routines and avoid many temporaries produced by mpreal template<> class gebp_traits { public: typedef mpfr::mpreal ResScalar; enum { Vectorizable = false, LhsPacketSize = 1, RhsPacketSize = 1, ResPacketSize = 1, NumberOfRegisters = 1, nr = 1, mr = 1, LhsProgress = 1, RhsProgress = 1 }; typedef ResScalar LhsPacket; typedef ResScalar RhsPacket; typedef ResScalar ResPacket; typedef LhsPacket LhsPacket4Packing; }; template struct gebp_kernel { typedef mpfr::mpreal mpreal; EIGEN_DONT_INLINE void operator()(const DataMapper& res, const mpreal* blockA, const mpreal* blockB, Index rows, Index depth, Index cols, const mpreal& alpha, Index strideA=-1, Index strideB=-1, Index offsetA=0, Index offsetB=0) { if(rows==0 || cols==0 || depth==0) return; mpreal acc1(0,mpfr_get_prec(blockA[0].mpfr_srcptr())), tmp (0,mpfr_get_prec(blockA[0].mpfr_srcptr())); if(strideA==-1) strideA = depth; if(strideB==-1) strideB = depth; for(Index i=0; i // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_OPENGL_MODULE #define EIGEN_OPENGL_MODULE #include "../../Eigen/Geometry" #if defined(__APPLE_CC__) #include #else #include #endif namespace Eigen { /** * \defgroup OpenGLSUpport_Module OpenGL Support module * * This module provides wrapper functions for a couple of OpenGL functions * which simplify the way to pass Eigen's object to openGL. * Here is an example: * * \code * // You need to add path_to_eigen/unsupported to your include path. * #include * // ... * Vector3f x, y; * Matrix3f rot; * * glVertex(y + x * rot); * * Quaternion q; * glRotate(q); * * // ... * \endcode * */ //@{ #define EIGEN_GL_FUNC_DECLARATION(FUNC) \ namespace internal { \ template< typename XprType, \ typename Scalar = typename XprType::Scalar, \ int Rows = XprType::RowsAtCompileTime, \ int Cols = XprType::ColsAtCompileTime, \ bool IsGLCompatible = bool(internal::evaluator::Flags&LinearAccessBit) \ && bool(XprType::Flags&DirectAccessBit) \ && (XprType::IsVectorAtCompileTime || (XprType::Flags&RowMajorBit)==0)> \ struct EIGEN_CAT(EIGEN_CAT(gl_,FUNC),_impl); \ \ template \ struct EIGEN_CAT(EIGEN_CAT(gl_,FUNC),_impl) { \ inline static void run(const XprType& p) { \ EIGEN_CAT(EIGEN_CAT(gl_,FUNC),_impl)::type>::run(p); } \ }; \ } \ \ template inline void FUNC(const Eigen::DenseBase& p) { \ EIGEN_CAT(EIGEN_CAT(internal::gl_,FUNC),_impl)::run(p.derived()); \ } #define EIGEN_GL_FUNC_SPECIALIZATION_MAT(FUNC,SCALAR,ROWS,COLS,SUFFIX) \ namespace internal { \ template< typename XprType> struct EIGEN_CAT(EIGEN_CAT(gl_,FUNC),_impl) { \ inline static void run(const XprType& p) { FUNC##SUFFIX(p.data()); } \ }; \ } #define EIGEN_GL_FUNC_SPECIALIZATION_VEC(FUNC,SCALAR,SIZE,SUFFIX) \ namespace internal { \ template< typename XprType> struct EIGEN_CAT(EIGEN_CAT(gl_,FUNC),_impl) { \ inline static void run(const XprType& p) { FUNC##SUFFIX(p.data()); } \ }; \ template< typename XprType> struct EIGEN_CAT(EIGEN_CAT(gl_,FUNC),_impl) { \ inline static void run(const XprType& p) { FUNC##SUFFIX(p.data()); } \ }; \ } EIGEN_GL_FUNC_DECLARATION (glVertex) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glVertex,int, 2,2iv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glVertex,short, 2,2sv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glVertex,float, 2,2fv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glVertex,double, 2,2dv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glVertex,int, 3,3iv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glVertex,short, 3,3sv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glVertex,float, 3,3fv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glVertex,double, 3,3dv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glVertex,int, 4,4iv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glVertex,short, 4,4sv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glVertex,float, 4,4fv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glVertex,double, 4,4dv) EIGEN_GL_FUNC_DECLARATION (glTexCoord) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glTexCoord,int, 2,2iv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glTexCoord,short, 2,2sv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glTexCoord,float, 2,2fv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glTexCoord,double, 2,2dv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glTexCoord,int, 3,3iv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glTexCoord,short, 3,3sv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glTexCoord,float, 3,3fv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glTexCoord,double, 3,3dv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glTexCoord,int, 4,4iv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glTexCoord,short, 4,4sv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glTexCoord,float, 4,4fv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glTexCoord,double, 4,4dv) EIGEN_GL_FUNC_DECLARATION (glColor) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glColor,int, 2,2iv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glColor,short, 2,2sv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glColor,float, 2,2fv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glColor,double, 2,2dv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glColor,int, 3,3iv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glColor,short, 3,3sv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glColor,float, 3,3fv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glColor,double, 3,3dv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glColor,int, 4,4iv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glColor,short, 4,4sv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glColor,float, 4,4fv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glColor,double, 4,4dv) EIGEN_GL_FUNC_DECLARATION (glNormal) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glNormal,int, 3,3iv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glNormal,short, 3,3sv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glNormal,float, 3,3fv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glNormal,double, 3,3dv) inline void glScale2fv(const float* v) { glScalef(v[0], v[1], 1.f); } inline void glScale2dv(const double* v) { glScaled(v[0], v[1], 1.0); } inline void glScale3fv(const float* v) { glScalef(v[0], v[1], v[2]); } inline void glScale3dv(const double* v) { glScaled(v[0], v[1], v[2]); } EIGEN_GL_FUNC_DECLARATION (glScale) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glScale,float, 2,2fv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glScale,double, 2,2dv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glScale,float, 3,3fv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glScale,double, 3,3dv) template void glScale(const UniformScaling& s) { glScale(Matrix::Constant(s.factor())); } inline void glTranslate2fv(const float* v) { glTranslatef(v[0], v[1], 0.f); } inline void glTranslate2dv(const double* v) { glTranslated(v[0], v[1], 0.0); } inline void glTranslate3fv(const float* v) { glTranslatef(v[0], v[1], v[2]); } inline void glTranslate3dv(const double* v) { glTranslated(v[0], v[1], v[2]); } EIGEN_GL_FUNC_DECLARATION (glTranslate) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glTranslate,float, 2,2fv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glTranslate,double, 2,2dv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glTranslate,float, 3,3fv) EIGEN_GL_FUNC_SPECIALIZATION_VEC(glTranslate,double, 3,3dv) template void glTranslate(const Translation& t) { glTranslate(t.vector()); } template void glTranslate(const Translation& t) { glTranslate(t.vector()); } EIGEN_GL_FUNC_DECLARATION (glMultMatrix) EIGEN_GL_FUNC_SPECIALIZATION_MAT(glMultMatrix,float, 4,4,f) EIGEN_GL_FUNC_SPECIALIZATION_MAT(glMultMatrix,double, 4,4,d) template void glMultMatrix(const Transform& t) { glMultMatrix(t.matrix()); } template void glMultMatrix(const Transform& t) { glMultMatrix(t.matrix()); } template void glMultMatrix(const Transform& t) { glMultMatrix(Transform(t).matrix()); } EIGEN_GL_FUNC_DECLARATION (glLoadMatrix) EIGEN_GL_FUNC_SPECIALIZATION_MAT(glLoadMatrix,float, 4,4,f) EIGEN_GL_FUNC_SPECIALIZATION_MAT(glLoadMatrix,double, 4,4,d) template void glLoadMatrix(const Transform& t) { glLoadMatrix(t.matrix()); } template void glLoadMatrix(const Transform& t) { glLoadMatrix(t.matrix()); } template void glLoadMatrix(const Transform& t) { glLoadMatrix(Transform(t).matrix()); } inline void glRotate(const Rotation2D& rot) { glRotatef(rot.angle()*180.f/float(EIGEN_PI), 0.f, 0.f, 1.f); } inline void glRotate(const Rotation2D& rot) { glRotated(rot.angle()*180.0/double(EIGEN_PI), 0.0, 0.0, 1.0); } template void glRotate(const RotationBase& rot) { Transform tr(rot); glMultMatrix(tr.matrix()); } #define EIGEN_GL_MAKE_CONST_const const #define EIGEN_GL_MAKE_CONST__ #define EIGEN_GL_EVAL(X) X #define EIGEN_GL_FUNC1_DECLARATION(FUNC,ARG1,CONST) \ namespace internal { \ template< typename XprType, \ typename Scalar = typename XprType::Scalar, \ int Rows = XprType::RowsAtCompileTime, \ int Cols = XprType::ColsAtCompileTime, \ bool IsGLCompatible = bool(internal::evaluator::Flags&LinearAccessBit) \ && bool(XprType::Flags&DirectAccessBit) \ && (XprType::IsVectorAtCompileTime || (XprType::Flags&RowMajorBit)==0)> \ struct EIGEN_CAT(EIGEN_CAT(gl_,FUNC),_impl); \ \ template \ struct EIGEN_CAT(EIGEN_CAT(gl_,FUNC),_impl) { \ inline static void run(ARG1 a,EIGEN_GL_EVAL(EIGEN_GL_MAKE_CONST_##CONST) XprType& p) { \ EIGEN_CAT(EIGEN_CAT(gl_,FUNC),_impl)::type>::run(a,p); } \ }; \ } \ \ template inline void FUNC(ARG1 a,EIGEN_GL_EVAL(EIGEN_GL_MAKE_CONST_##CONST) Eigen::DenseBase& p) { \ EIGEN_CAT(EIGEN_CAT(internal::gl_,FUNC),_impl)::run(a,p.derived()); \ } #define EIGEN_GL_FUNC1_SPECIALIZATION_MAT(FUNC,ARG1,CONST,SCALAR,ROWS,COLS,SUFFIX) \ namespace internal { \ template< typename XprType> struct EIGEN_CAT(EIGEN_CAT(gl_,FUNC),_impl) { \ inline static void run(ARG1 a, EIGEN_GL_EVAL(EIGEN_GL_MAKE_CONST_##CONST) XprType& p) { FUNC##SUFFIX(a,p.data()); } \ }; \ } #define EIGEN_GL_FUNC1_SPECIALIZATION_VEC(FUNC,ARG1,CONST,SCALAR,SIZE,SUFFIX) \ namespace internal { \ template< typename XprType> struct EIGEN_CAT(EIGEN_CAT(gl_,FUNC),_impl) { \ inline static void run(ARG1 a, EIGEN_GL_EVAL(EIGEN_GL_MAKE_CONST_##CONST) XprType& p) { FUNC##SUFFIX(a,p.data()); } \ }; \ template< typename XprType> struct EIGEN_CAT(EIGEN_CAT(gl_,FUNC),_impl) { \ inline static void run(ARG1 a, EIGEN_GL_EVAL(EIGEN_GL_MAKE_CONST_##CONST) XprType& p) { FUNC##SUFFIX(a,p.data()); } \ }; \ } EIGEN_GL_FUNC1_DECLARATION (glGet,GLenum,_) EIGEN_GL_FUNC1_SPECIALIZATION_MAT(glGet,GLenum,_,float, 4,4,Floatv) EIGEN_GL_FUNC1_SPECIALIZATION_MAT(glGet,GLenum,_,double, 4,4,Doublev) // glUniform API #ifdef GL_VERSION_2_0 inline void glUniform2fv_ei (GLint loc, const float* v) { glUniform2fv(loc,1,v); } inline void glUniform2iv_ei (GLint loc, const int* v) { glUniform2iv(loc,1,v); } inline void glUniform3fv_ei (GLint loc, const float* v) { glUniform3fv(loc,1,v); } inline void glUniform3iv_ei (GLint loc, const int* v) { glUniform3iv(loc,1,v); } inline void glUniform4fv_ei (GLint loc, const float* v) { glUniform4fv(loc,1,v); } inline void glUniform4iv_ei (GLint loc, const int* v) { glUniform4iv(loc,1,v); } inline void glUniformMatrix2fv_ei (GLint loc, const float* v) { glUniformMatrix2fv(loc,1,false,v); } inline void glUniformMatrix3fv_ei (GLint loc, const float* v) { glUniformMatrix3fv(loc,1,false,v); } inline void glUniformMatrix4fv_ei (GLint loc, const float* v) { glUniformMatrix4fv(loc,1,false,v); } EIGEN_GL_FUNC1_DECLARATION (glUniform,GLint,const) EIGEN_GL_FUNC1_SPECIALIZATION_VEC(glUniform,GLint,const,float, 2,2fv_ei) EIGEN_GL_FUNC1_SPECIALIZATION_VEC(glUniform,GLint,const,int, 2,2iv_ei) EIGEN_GL_FUNC1_SPECIALIZATION_VEC(glUniform,GLint,const,float, 3,3fv_ei) EIGEN_GL_FUNC1_SPECIALIZATION_VEC(glUniform,GLint,const,int, 3,3iv_ei) EIGEN_GL_FUNC1_SPECIALIZATION_VEC(glUniform,GLint,const,float, 4,4fv_ei) EIGEN_GL_FUNC1_SPECIALIZATION_VEC(glUniform,GLint,const,int, 4,4iv_ei) EIGEN_GL_FUNC1_SPECIALIZATION_MAT(glUniform,GLint,const,float, 2,2,Matrix2fv_ei) EIGEN_GL_FUNC1_SPECIALIZATION_MAT(glUniform,GLint,const,float, 3,3,Matrix3fv_ei) EIGEN_GL_FUNC1_SPECIALIZATION_MAT(glUniform,GLint,const,float, 4,4,Matrix4fv_ei) #endif #ifdef GL_VERSION_2_1 inline void glUniformMatrix2x3fv_ei(GLint loc, const float* v) { glUniformMatrix2x3fv(loc,1,false,v); } inline void glUniformMatrix3x2fv_ei(GLint loc, const float* v) { glUniformMatrix3x2fv(loc,1,false,v); } inline void glUniformMatrix2x4fv_ei(GLint loc, const float* v) { glUniformMatrix2x4fv(loc,1,false,v); } inline void glUniformMatrix4x2fv_ei(GLint loc, const float* v) { glUniformMatrix4x2fv(loc,1,false,v); } inline void glUniformMatrix3x4fv_ei(GLint loc, const float* v) { glUniformMatrix3x4fv(loc,1,false,v); } inline void glUniformMatrix4x3fv_ei(GLint loc, const float* v) { glUniformMatrix4x3fv(loc,1,false,v); } EIGEN_GL_FUNC1_SPECIALIZATION_MAT(glUniform,GLint,const,float, 2,3,Matrix2x3fv_ei) EIGEN_GL_FUNC1_SPECIALIZATION_MAT(glUniform,GLint,const,float, 3,2,Matrix3x2fv_ei) EIGEN_GL_FUNC1_SPECIALIZATION_MAT(glUniform,GLint,const,float, 2,4,Matrix2x4fv_ei) EIGEN_GL_FUNC1_SPECIALIZATION_MAT(glUniform,GLint,const,float, 4,2,Matrix4x2fv_ei) EIGEN_GL_FUNC1_SPECIALIZATION_MAT(glUniform,GLint,const,float, 3,4,Matrix3x4fv_ei) EIGEN_GL_FUNC1_SPECIALIZATION_MAT(glUniform,GLint,const,float, 4,3,Matrix4x3fv_ei) #endif #ifdef GL_VERSION_3_0 inline void glUniform2uiv_ei (GLint loc, const unsigned int* v) { glUniform2uiv(loc,1,v); } inline void glUniform3uiv_ei (GLint loc, const unsigned int* v) { glUniform3uiv(loc,1,v); } inline void glUniform4uiv_ei (GLint loc, const unsigned int* v) { glUniform4uiv(loc,1,v); } EIGEN_GL_FUNC1_SPECIALIZATION_VEC(glUniform,GLint,const,unsigned int, 2,2uiv_ei) EIGEN_GL_FUNC1_SPECIALIZATION_VEC(glUniform,GLint,const,unsigned int, 3,3uiv_ei) EIGEN_GL_FUNC1_SPECIALIZATION_VEC(glUniform,GLint,const,unsigned int, 4,4uiv_ei) #endif #ifdef GL_ARB_gpu_shader_fp64 inline void glUniform2dv_ei (GLint loc, const double* v) { glUniform2dv(loc,1,v); } inline void glUniform3dv_ei (GLint loc, const double* v) { glUniform3dv(loc,1,v); } inline void glUniform4dv_ei (GLint loc, const double* v) { glUniform4dv(loc,1,v); } EIGEN_GL_FUNC1_SPECIALIZATION_VEC(glUniform,GLint,const,double, 2,2dv_ei) EIGEN_GL_FUNC1_SPECIALIZATION_VEC(glUniform,GLint,const,double, 3,3dv_ei) EIGEN_GL_FUNC1_SPECIALIZATION_VEC(glUniform,GLint,const,double, 4,4dv_ei) #endif //@} } #endif // EIGEN_OPENGL_MODULE RcppEigen/inst/include/unsupported/Eigen/Splines0000644000176200001440000000174414562417221021517 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 20010-2011 Hauke Heibel // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SPLINES_MODULE_H #define EIGEN_SPLINES_MODULE_H namespace Eigen { /** * \defgroup Splines_Module Spline and spline fitting module * * This module provides a simple multi-dimensional spline class while * offering most basic functionality to fit a spline to point sets. * * \code * #include * \endcode */ } #include "../../Eigen/src/Core/util/DisableStupidWarnings.h" #include "src/Splines/SplineFwd.h" #include "src/Splines/Spline.h" #include "src/Splines/SplineFitting.h" #include "../../Eigen/src/Core/util/ReenableStupidWarnings.h" #endif // EIGEN_SPLINES_MODULE_H RcppEigen/inst/include/unsupported/Eigen/src/0000755000176200001440000000000014107270226020735 5ustar liggesusersRcppEigen/inst/include/unsupported/Eigen/src/KroneckerProduct/0000755000176200001440000000000014562417221024224 5ustar liggesusersRcppEigen/inst/include/unsupported/Eigen/src/KroneckerProduct/KroneckerTensorProduct.h0000644000176200001440000002401214562417221031053 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2011 Kolja Brix // Copyright (C) 2011 Andreas Platen // Copyright (C) 2012 Chen-Pang He // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef KRONECKER_TENSOR_PRODUCT_H #define KRONECKER_TENSOR_PRODUCT_H namespace Eigen { /*! * \ingroup KroneckerProduct_Module * * \brief The base class of dense and sparse Kronecker product. * * \tparam Derived is the derived type. */ template class KroneckerProductBase : public ReturnByValue { private: typedef typename internal::traits Traits; typedef typename Traits::Scalar Scalar; protected: typedef typename Traits::Lhs Lhs; typedef typename Traits::Rhs Rhs; public: /*! \brief Constructor. */ KroneckerProductBase(const Lhs& A, const Rhs& B) : m_A(A), m_B(B) {} inline Index rows() const { return m_A.rows() * m_B.rows(); } inline Index cols() const { return m_A.cols() * m_B.cols(); } /*! * This overrides ReturnByValue::coeff because this function is * efficient enough. */ Scalar coeff(Index row, Index col) const { return m_A.coeff(row / m_B.rows(), col / m_B.cols()) * m_B.coeff(row % m_B.rows(), col % m_B.cols()); } /*! * This overrides ReturnByValue::coeff because this function is * efficient enough. */ Scalar coeff(Index i) const { EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived); return m_A.coeff(i / m_A.size()) * m_B.coeff(i % m_A.size()); } protected: typename Lhs::Nested m_A; typename Rhs::Nested m_B; }; /*! * \ingroup KroneckerProduct_Module * * \brief Kronecker tensor product helper class for dense matrices * * This class is the return value of kroneckerProduct(MatrixBase, * MatrixBase). Use the function rather than construct this class * directly to avoid specifying template prarameters. * * \tparam Lhs Type of the left-hand side, a matrix expression. * \tparam Rhs Type of the rignt-hand side, a matrix expression. */ template class KroneckerProduct : public KroneckerProductBase > { private: typedef KroneckerProductBase Base; using Base::m_A; using Base::m_B; public: /*! \brief Constructor. */ KroneckerProduct(const Lhs& A, const Rhs& B) : Base(A, B) {} /*! \brief Evaluate the Kronecker tensor product. */ template void evalTo(Dest& dst) const; }; /*! * \ingroup KroneckerProduct_Module * * \brief Kronecker tensor product helper class for sparse matrices * * If at least one of the operands is a sparse matrix expression, * then this class is returned and evaluates into a sparse matrix. * * This class is the return value of kroneckerProduct(EigenBase, * EigenBase). Use the function rather than construct this class * directly to avoid specifying template prarameters. * * \tparam Lhs Type of the left-hand side, a matrix expression. * \tparam Rhs Type of the rignt-hand side, a matrix expression. */ template class KroneckerProductSparse : public KroneckerProductBase > { private: typedef KroneckerProductBase Base; using Base::m_A; using Base::m_B; public: /*! \brief Constructor. */ KroneckerProductSparse(const Lhs& A, const Rhs& B) : Base(A, B) {} /*! \brief Evaluate the Kronecker tensor product. */ template void evalTo(Dest& dst) const; }; template template void KroneckerProduct::evalTo(Dest& dst) const { const int BlockRows = Rhs::RowsAtCompileTime, BlockCols = Rhs::ColsAtCompileTime; const Index Br = m_B.rows(), Bc = m_B.cols(); for (Index i=0; i < m_A.rows(); ++i) for (Index j=0; j < m_A.cols(); ++j) Block(dst,i*Br,j*Bc,Br,Bc) = m_A.coeff(i,j) * m_B; } template template void KroneckerProductSparse::evalTo(Dest& dst) const { Index Br = m_B.rows(), Bc = m_B.cols(); dst.resize(this->rows(), this->cols()); dst.resizeNonZeros(0); // 1 - evaluate the operands if needed: typedef typename internal::nested_eval::type Lhs1; typedef typename internal::remove_all::type Lhs1Cleaned; const Lhs1 lhs1(m_A); typedef typename internal::nested_eval::type Rhs1; typedef typename internal::remove_all::type Rhs1Cleaned; const Rhs1 rhs1(m_B); // 2 - construct respective iterators typedef Eigen::InnerIterator LhsInnerIterator; typedef Eigen::InnerIterator RhsInnerIterator; // compute number of non-zeros per innervectors of dst { // TODO VectorXi is not necessarily big enough! VectorXi nnzA = VectorXi::Zero(Dest::IsRowMajor ? m_A.rows() : m_A.cols()); for (Index kA=0; kA < m_A.outerSize(); ++kA) for (LhsInnerIterator itA(lhs1,kA); itA; ++itA) nnzA(Dest::IsRowMajor ? itA.row() : itA.col())++; VectorXi nnzB = VectorXi::Zero(Dest::IsRowMajor ? m_B.rows() : m_B.cols()); for (Index kB=0; kB < m_B.outerSize(); ++kB) for (RhsInnerIterator itB(rhs1,kB); itB; ++itB) nnzB(Dest::IsRowMajor ? itB.row() : itB.col())++; Matrix nnzAB = nnzB * nnzA.transpose(); dst.reserve(VectorXi::Map(nnzAB.data(), nnzAB.size())); } for (Index kA=0; kA < m_A.outerSize(); ++kA) { for (Index kB=0; kB < m_B.outerSize(); ++kB) { for (LhsInnerIterator itA(lhs1,kA); itA; ++itA) { for (RhsInnerIterator itB(rhs1,kB); itB; ++itB) { Index i = itA.row() * Br + itB.row(), j = itA.col() * Bc + itB.col(); dst.insert(i,j) = itA.value() * itB.value(); } } } } } namespace internal { template struct traits > { typedef typename remove_all<_Lhs>::type Lhs; typedef typename remove_all<_Rhs>::type Rhs; typedef typename ScalarBinaryOpTraits::ReturnType Scalar; typedef typename promote_index_type::type StorageIndex; enum { Rows = size_at_compile_time::RowsAtCompileTime, traits::RowsAtCompileTime>::ret, Cols = size_at_compile_time::ColsAtCompileTime, traits::ColsAtCompileTime>::ret, MaxRows = size_at_compile_time::MaxRowsAtCompileTime, traits::MaxRowsAtCompileTime>::ret, MaxCols = size_at_compile_time::MaxColsAtCompileTime, traits::MaxColsAtCompileTime>::ret }; typedef Matrix ReturnType; }; template struct traits > { typedef MatrixXpr XprKind; typedef typename remove_all<_Lhs>::type Lhs; typedef typename remove_all<_Rhs>::type Rhs; typedef typename ScalarBinaryOpTraits::ReturnType Scalar; typedef typename cwise_promote_storage_type::StorageKind, typename traits::StorageKind, scalar_product_op >::ret StorageKind; typedef typename promote_index_type::type StorageIndex; enum { LhsFlags = Lhs::Flags, RhsFlags = Rhs::Flags, RowsAtCompileTime = size_at_compile_time::RowsAtCompileTime, traits::RowsAtCompileTime>::ret, ColsAtCompileTime = size_at_compile_time::ColsAtCompileTime, traits::ColsAtCompileTime>::ret, MaxRowsAtCompileTime = size_at_compile_time::MaxRowsAtCompileTime, traits::MaxRowsAtCompileTime>::ret, MaxColsAtCompileTime = size_at_compile_time::MaxColsAtCompileTime, traits::MaxColsAtCompileTime>::ret, EvalToRowMajor = (int(LhsFlags) & int(RhsFlags) & RowMajorBit), RemovedBits = ~(EvalToRowMajor ? 0 : RowMajorBit), Flags = ((int(LhsFlags) | int(RhsFlags)) & HereditaryBits & RemovedBits) | EvalBeforeNestingBit, CoeffReadCost = HugeCost }; typedef SparseMatrix ReturnType; }; } // end namespace internal /*! * \ingroup KroneckerProduct_Module * * Computes Kronecker tensor product of two dense matrices * * \warning If you want to replace a matrix by its Kronecker product * with some matrix, do \b NOT do this: * \code * A = kroneckerProduct(A,B); // bug!!! caused by aliasing effect * \endcode * instead, use eval() to work around this: * \code * A = kroneckerProduct(A,B).eval(); * \endcode * * \param a Dense matrix a * \param b Dense matrix b * \return Kronecker tensor product of a and b */ template KroneckerProduct kroneckerProduct(const MatrixBase& a, const MatrixBase& b) { return KroneckerProduct(a.derived(), b.derived()); } /*! * \ingroup KroneckerProduct_Module * * Computes Kronecker tensor product of two matrices, at least one of * which is sparse * * \warning If you want to replace a matrix by its Kronecker product * with some matrix, do \b NOT do this: * \code * A = kroneckerProduct(A,B); // bug!!! caused by aliasing effect * \endcode * instead, use eval() to work around this: * \code * A = kroneckerProduct(A,B).eval(); * \endcode * * \param a Dense/sparse matrix a * \param b Dense/sparse matrix b * \return Kronecker tensor product of a and b, stored in a sparse * matrix */ template KroneckerProductSparse kroneckerProduct(const EigenBase& a, const EigenBase& b) { return KroneckerProductSparse(a.derived(), b.derived()); } } // end namespace Eigen #endif // KRONECKER_TENSOR_PRODUCT_H RcppEigen/inst/include/unsupported/Eigen/src/FFT/0000755000176200001440000000000014562417221021357 5ustar liggesusersRcppEigen/inst/include/unsupported/Eigen/src/FFT/ei_fftw_impl.h0000644000176200001440000002200714562417221024175 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009 Mark Borgerding mark a borgerding net // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. namespace Eigen { namespace internal { // FFTW uses non-const arguments // so we must use ugly const_cast calls for all the args it uses // // This should be safe as long as // 1. we use FFTW_ESTIMATE for all our planning // see the FFTW docs section 4.3.2 "Planner Flags" // 2. fftw_complex is compatible with std::complex // This assumes std::complex layout is array of size 2 with real,imag template inline T * fftw_cast(const T* p) { return const_cast( p); } inline fftw_complex * fftw_cast( const std::complex * p) { return const_cast( reinterpret_cast(p) ); } inline fftwf_complex * fftw_cast( const std::complex * p) { return const_cast( reinterpret_cast(p) ); } inline fftwl_complex * fftw_cast( const std::complex * p) { return const_cast( reinterpret_cast(p) ); } template struct fftw_plan {}; template <> struct fftw_plan { typedef float scalar_type; typedef fftwf_complex complex_type; fftwf_plan m_plan; fftw_plan() :m_plan(NULL) {} ~fftw_plan() {if (m_plan) fftwf_destroy_plan(m_plan);} inline void fwd(complex_type * dst,complex_type * src,int nfft) { if (m_plan==NULL) m_plan = fftwf_plan_dft_1d(nfft,src,dst, FFTW_FORWARD, FFTW_ESTIMATE|FFTW_PRESERVE_INPUT); fftwf_execute_dft( m_plan, src,dst); } inline void inv(complex_type * dst,complex_type * src,int nfft) { if (m_plan==NULL) m_plan = fftwf_plan_dft_1d(nfft,src,dst, FFTW_BACKWARD , FFTW_ESTIMATE|FFTW_PRESERVE_INPUT); fftwf_execute_dft( m_plan, src,dst); } inline void fwd(complex_type * dst,scalar_type * src,int nfft) { if (m_plan==NULL) m_plan = fftwf_plan_dft_r2c_1d(nfft,src,dst,FFTW_ESTIMATE|FFTW_PRESERVE_INPUT); fftwf_execute_dft_r2c( m_plan,src,dst); } inline void inv(scalar_type * dst,complex_type * src,int nfft) { if (m_plan==NULL) m_plan = fftwf_plan_dft_c2r_1d(nfft,src,dst,FFTW_ESTIMATE|FFTW_PRESERVE_INPUT); fftwf_execute_dft_c2r( m_plan, src,dst); } inline void fwd2( complex_type * dst,complex_type * src,int n0,int n1) { if (m_plan==NULL) m_plan = fftwf_plan_dft_2d(n0,n1,src,dst,FFTW_FORWARD,FFTW_ESTIMATE|FFTW_PRESERVE_INPUT); fftwf_execute_dft( m_plan, src,dst); } inline void inv2( complex_type * dst,complex_type * src,int n0,int n1) { if (m_plan==NULL) m_plan = fftwf_plan_dft_2d(n0,n1,src,dst,FFTW_BACKWARD,FFTW_ESTIMATE|FFTW_PRESERVE_INPUT); fftwf_execute_dft( m_plan, src,dst); } }; template <> struct fftw_plan { typedef double scalar_type; typedef fftw_complex complex_type; ::fftw_plan m_plan; fftw_plan() :m_plan(NULL) {} ~fftw_plan() {if (m_plan) fftw_destroy_plan(m_plan);} inline void fwd(complex_type * dst,complex_type * src,int nfft) { if (m_plan==NULL) m_plan = fftw_plan_dft_1d(nfft,src,dst, FFTW_FORWARD, FFTW_ESTIMATE|FFTW_PRESERVE_INPUT); fftw_execute_dft( m_plan, src,dst); } inline void inv(complex_type * dst,complex_type * src,int nfft) { if (m_plan==NULL) m_plan = fftw_plan_dft_1d(nfft,src,dst, FFTW_BACKWARD , FFTW_ESTIMATE|FFTW_PRESERVE_INPUT); fftw_execute_dft( m_plan, src,dst); } inline void fwd(complex_type * dst,scalar_type * src,int nfft) { if (m_plan==NULL) m_plan = fftw_plan_dft_r2c_1d(nfft,src,dst,FFTW_ESTIMATE|FFTW_PRESERVE_INPUT); fftw_execute_dft_r2c( m_plan,src,dst); } inline void inv(scalar_type * dst,complex_type * src,int nfft) { if (m_plan==NULL) m_plan = fftw_plan_dft_c2r_1d(nfft,src,dst,FFTW_ESTIMATE|FFTW_PRESERVE_INPUT); fftw_execute_dft_c2r( m_plan, src,dst); } inline void fwd2( complex_type * dst,complex_type * src,int n0,int n1) { if (m_plan==NULL) m_plan = fftw_plan_dft_2d(n0,n1,src,dst,FFTW_FORWARD,FFTW_ESTIMATE|FFTW_PRESERVE_INPUT); fftw_execute_dft( m_plan, src,dst); } inline void inv2( complex_type * dst,complex_type * src,int n0,int n1) { if (m_plan==NULL) m_plan = fftw_plan_dft_2d(n0,n1,src,dst,FFTW_BACKWARD,FFTW_ESTIMATE|FFTW_PRESERVE_INPUT); fftw_execute_dft( m_plan, src,dst); } }; template <> struct fftw_plan { typedef long double scalar_type; typedef fftwl_complex complex_type; fftwl_plan m_plan; fftw_plan() :m_plan(NULL) {} ~fftw_plan() {if (m_plan) fftwl_destroy_plan(m_plan);} inline void fwd(complex_type * dst,complex_type * src,int nfft) { if (m_plan==NULL) m_plan = fftwl_plan_dft_1d(nfft,src,dst, FFTW_FORWARD, FFTW_ESTIMATE|FFTW_PRESERVE_INPUT); fftwl_execute_dft( m_plan, src,dst); } inline void inv(complex_type * dst,complex_type * src,int nfft) { if (m_plan==NULL) m_plan = fftwl_plan_dft_1d(nfft,src,dst, FFTW_BACKWARD , FFTW_ESTIMATE|FFTW_PRESERVE_INPUT); fftwl_execute_dft( m_plan, src,dst); } inline void fwd(complex_type * dst,scalar_type * src,int nfft) { if (m_plan==NULL) m_plan = fftwl_plan_dft_r2c_1d(nfft,src,dst,FFTW_ESTIMATE|FFTW_PRESERVE_INPUT); fftwl_execute_dft_r2c( m_plan,src,dst); } inline void inv(scalar_type * dst,complex_type * src,int nfft) { if (m_plan==NULL) m_plan = fftwl_plan_dft_c2r_1d(nfft,src,dst,FFTW_ESTIMATE|FFTW_PRESERVE_INPUT); fftwl_execute_dft_c2r( m_plan, src,dst); } inline void fwd2( complex_type * dst,complex_type * src,int n0,int n1) { if (m_plan==NULL) m_plan = fftwl_plan_dft_2d(n0,n1,src,dst,FFTW_FORWARD,FFTW_ESTIMATE|FFTW_PRESERVE_INPUT); fftwl_execute_dft( m_plan, src,dst); } inline void inv2( complex_type * dst,complex_type * src,int n0,int n1) { if (m_plan==NULL) m_plan = fftwl_plan_dft_2d(n0,n1,src,dst,FFTW_BACKWARD,FFTW_ESTIMATE|FFTW_PRESERVE_INPUT); fftwl_execute_dft( m_plan, src,dst); } }; template struct fftw_impl { typedef _Scalar Scalar; typedef std::complex Complex; inline void clear() { m_plans.clear(); } // complex-to-complex forward FFT inline void fwd( Complex * dst,const Complex *src,int nfft) { get_plan(nfft,false,dst,src).fwd(fftw_cast(dst), fftw_cast(src),nfft ); } // real-to-complex forward FFT inline void fwd( Complex * dst,const Scalar * src,int nfft) { get_plan(nfft,false,dst,src).fwd(fftw_cast(dst), fftw_cast(src) ,nfft); } // 2-d complex-to-complex inline void fwd2(Complex * dst, const Complex * src, int n0,int n1) { get_plan(n0,n1,false,dst,src).fwd2(fftw_cast(dst), fftw_cast(src) ,n0,n1); } // inverse complex-to-complex inline void inv(Complex * dst,const Complex *src,int nfft) { get_plan(nfft,true,dst,src).inv(fftw_cast(dst), fftw_cast(src),nfft ); } // half-complex to scalar inline void inv( Scalar * dst,const Complex * src,int nfft) { get_plan(nfft,true,dst,src).inv(fftw_cast(dst), fftw_cast(src),nfft ); } // 2-d complex-to-complex inline void inv2(Complex * dst, const Complex * src, int n0,int n1) { get_plan(n0,n1,true,dst,src).inv2(fftw_cast(dst), fftw_cast(src) ,n0,n1); } protected: typedef fftw_plan PlanData; typedef Eigen::numext::int64_t int64_t; typedef std::map PlanMap; PlanMap m_plans; inline PlanData & get_plan(int nfft,bool inverse,void * dst,const void * src) { bool inplace = (dst==src); bool aligned = ( (reinterpret_cast(src)&15) | (reinterpret_cast(dst)&15) ) == 0; int64_t key = ( (nfft<<3 ) | (inverse<<2) | (inplace<<1) | aligned ) << 1; return m_plans[key]; } inline PlanData & get_plan(int n0,int n1,bool inverse,void * dst,const void * src) { bool inplace = (dst==src); bool aligned = ( (reinterpret_cast(src)&15) | (reinterpret_cast(dst)&15) ) == 0; int64_t key = ( ( (((int64_t)n0) << 30)|(n1<<3 ) | (inverse<<2) | (inplace<<1) | aligned ) << 1 ) + 1; return m_plans[key]; } }; } // end namespace internal } // end namespace Eigen RcppEigen/inst/include/unsupported/Eigen/src/FFT/ei_kissfft_impl.h0000644000176200001440000003165714562417221024713 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009 Mark Borgerding mark a borgerding net // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. namespace Eigen { namespace internal { // This FFT implementation was derived from kissfft http:sourceforge.net/projects/kissfft // Copyright 2003-2009 Mark Borgerding template struct kiss_cpx_fft { typedef _Scalar Scalar; typedef std::complex Complex; std::vector m_twiddles; std::vector m_stageRadix; std::vector m_stageRemainder; std::vector m_scratchBuf; bool m_inverse; inline void make_twiddles(int nfft, bool inverse) { using numext::sin; using numext::cos; m_inverse = inverse; m_twiddles.resize(nfft); double phinc = 0.25 * double(EIGEN_PI) / nfft; Scalar flip = inverse ? Scalar(1) : Scalar(-1); m_twiddles[0] = Complex(Scalar(1), Scalar(0)); if ((nfft&1)==0) m_twiddles[nfft/2] = Complex(Scalar(-1), Scalar(0)); int i=1; for (;i*8n) p=n;// impossible to have a factor > sqrt(n) } n /= p; m_stageRadix.push_back(p); m_stageRemainder.push_back(n); if ( p > 5 ) m_scratchBuf.resize(p); // scratchbuf will be needed in bfly_generic }while(n>1); } template inline void work( int stage,Complex * xout, const _Src * xin, size_t fstride,size_t in_stride) { int p = m_stageRadix[stage]; int m = m_stageRemainder[stage]; Complex * Fout_beg = xout; Complex * Fout_end = xout + p*m; if (m>1) { do{ // recursive call: // DFT of size m*p performed by doing // p instances of smaller DFTs of size m, // each one takes a decimated version of the input work(stage+1, xout , xin, fstride*p,in_stride); xin += fstride*in_stride; }while( (xout += m) != Fout_end ); }else{ do{ *xout = *xin; xin += fstride*in_stride; }while(++xout != Fout_end ); } xout=Fout_beg; // recombine the p smaller DFTs switch (p) { case 2: bfly2(xout,fstride,m); break; case 3: bfly3(xout,fstride,m); break; case 4: bfly4(xout,fstride,m); break; case 5: bfly5(xout,fstride,m); break; default: bfly_generic(xout,fstride,m,p); break; } } inline void bfly2( Complex * Fout, const size_t fstride, int m) { for (int k=0;kreal() - Scalar(.5)*scratch[3].real() , Fout->imag() - Scalar(.5)*scratch[3].imag() ); scratch[0] *= epi3.imag(); *Fout += scratch[3]; Fout[m2] = Complex( Fout[m].real() + scratch[0].imag() , Fout[m].imag() - scratch[0].real() ); Fout[m] += Complex( -scratch[0].imag(),scratch[0].real() ); ++Fout; }while(--k); } inline void bfly5( Complex * Fout, const size_t fstride, const size_t m) { Complex *Fout0,*Fout1,*Fout2,*Fout3,*Fout4; size_t u; Complex scratch[13]; Complex * twiddles = &m_twiddles[0]; Complex *tw; Complex ya,yb; ya = twiddles[fstride*m]; yb = twiddles[fstride*2*m]; Fout0=Fout; Fout1=Fout0+m; Fout2=Fout0+2*m; Fout3=Fout0+3*m; Fout4=Fout0+4*m; tw=twiddles; for ( u=0; u(m_twiddles.size()); Complex * scratchbuf = &m_scratchBuf[0]; for ( u=0; u(fstride) * k; if (twidx>=Norig) twidx-=Norig; t=scratchbuf[q] * twiddles[twidx]; Fout[ k ] += t; } k += m; } } } }; template struct kissfft_impl { typedef _Scalar Scalar; typedef std::complex Complex; void clear() { m_plans.clear(); m_realTwiddles.clear(); } inline void fwd( Complex * dst,const Complex *src,int nfft) { get_plan(nfft,false).work(0, dst, src, 1,1); } inline void fwd2( Complex * dst,const Complex *src,int n0,int n1) { EIGEN_UNUSED_VARIABLE(dst); EIGEN_UNUSED_VARIABLE(src); EIGEN_UNUSED_VARIABLE(n0); EIGEN_UNUSED_VARIABLE(n1); } inline void inv2( Complex * dst,const Complex *src,int n0,int n1) { EIGEN_UNUSED_VARIABLE(dst); EIGEN_UNUSED_VARIABLE(src); EIGEN_UNUSED_VARIABLE(n0); EIGEN_UNUSED_VARIABLE(n1); } // real-to-complex forward FFT // perform two FFTs of src even and src odd // then twiddle to recombine them into the half-spectrum format // then fill in the conjugate symmetric half inline void fwd( Complex * dst,const Scalar * src,int nfft) { if ( nfft&3 ) { // use generic mode for odd m_tmpBuf1.resize(nfft); get_plan(nfft,false).work(0, &m_tmpBuf1[0], src, 1,1); std::copy(m_tmpBuf1.begin(),m_tmpBuf1.begin()+(nfft>>1)+1,dst ); }else{ int ncfft = nfft>>1; int ncfft2 = nfft>>2; Complex * rtw = real_twiddles(ncfft2); // use optimized mode for even real fwd( dst, reinterpret_cast (src), ncfft); Complex dc(dst[0].real() + dst[0].imag()); Complex nyquist(dst[0].real() - dst[0].imag()); int k; for ( k=1;k <= ncfft2 ; ++k ) { Complex fpk = dst[k]; Complex fpnk = conj(dst[ncfft-k]); Complex f1k = fpk + fpnk; Complex f2k = fpk - fpnk; Complex tw= f2k * rtw[k-1]; dst[k] = (f1k + tw) * Scalar(.5); dst[ncfft-k] = conj(f1k -tw)*Scalar(.5); } dst[0] = dc; dst[ncfft] = nyquist; } } // inverse complex-to-complex inline void inv(Complex * dst,const Complex *src,int nfft) { get_plan(nfft,true).work(0, dst, src, 1,1); } // half-complex to scalar inline void inv( Scalar * dst,const Complex * src,int nfft) { if (nfft&3) { m_tmpBuf1.resize(nfft); m_tmpBuf2.resize(nfft); std::copy(src,src+(nfft>>1)+1,m_tmpBuf1.begin() ); for (int k=1;k<(nfft>>1)+1;++k) m_tmpBuf1[nfft-k] = conj(m_tmpBuf1[k]); inv(&m_tmpBuf2[0],&m_tmpBuf1[0],nfft); for (int k=0;k>1; int ncfft2 = nfft>>2; Complex * rtw = real_twiddles(ncfft2); m_tmpBuf1.resize(ncfft); m_tmpBuf1[0] = Complex( src[0].real() + src[ncfft].real(), src[0].real() - src[ncfft].real() ); for (int k = 1; k <= ncfft / 2; ++k) { Complex fk = src[k]; Complex fnkc = conj(src[ncfft-k]); Complex fek = fk + fnkc; Complex tmp = fk - fnkc; Complex fok = tmp * conj(rtw[k-1]); m_tmpBuf1[k] = fek + fok; m_tmpBuf1[ncfft-k] = conj(fek - fok); } get_plan(ncfft,true).work(0, reinterpret_cast(dst), &m_tmpBuf1[0], 1,1); } } protected: typedef kiss_cpx_fft PlanData; typedef std::map PlanMap; PlanMap m_plans; std::map > m_realTwiddles; std::vector m_tmpBuf1; std::vector m_tmpBuf2; inline int PlanKey(int nfft, bool isinverse) const { return (nfft<<1) | int(isinverse); } inline PlanData & get_plan(int nfft, bool inverse) { // TODO look for PlanKey(nfft, ! inverse) and conjugate the twiddles PlanData & pd = m_plans[ PlanKey(nfft,inverse) ]; if ( pd.m_twiddles.size() == 0 ) { pd.make_twiddles(nfft,inverse); pd.factorize(nfft); } return pd; } inline Complex * real_twiddles(int ncfft2) { using std::acos; std::vector & twidref = m_realTwiddles[ncfft2];// creates new if not there if ( (int)twidref.size() != ncfft2 ) { twidref.resize(ncfft2); int ncfft= ncfft2<<1; Scalar pi = acos( Scalar(-1) ); for (int k=1;k<=ncfft2;++k) twidref[k-1] = exp( Complex(0,-pi * (Scalar(k) / ncfft + Scalar(.5)) ) ); } return &twidref[0]; } }; } // end namespace internal } // end namespace Eigen RcppEigen/inst/include/unsupported/Eigen/src/SpecialFunctions/0000755000176200001440000000000014562417221024211 5ustar liggesusersRcppEigen/inst/include/unsupported/Eigen/src/SpecialFunctions/BesselFunctionsHalf.h0000644000176200001440000000476014562417221030272 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_BESSELFUNCTIONS_HALF_H #define EIGEN_BESSELFUNCTIONS_HALF_H namespace Eigen { namespace numext { #if EIGEN_HAS_C99_MATH template <> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half bessel_i0(const Eigen::half& x) { return Eigen::half(Eigen::numext::bessel_i0(static_cast(x))); } template <> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half bessel_i0e(const Eigen::half& x) { return Eigen::half(Eigen::numext::bessel_i0e(static_cast(x))); } template <> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half bessel_i1(const Eigen::half& x) { return Eigen::half(Eigen::numext::bessel_i1(static_cast(x))); } template <> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half bessel_i1e(const Eigen::half& x) { return Eigen::half(Eigen::numext::bessel_i1e(static_cast(x))); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half bessel_j0(const Eigen::half& x) { return Eigen::half(Eigen::numext::bessel_j0(static_cast(x))); } template <> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half bessel_j1(const Eigen::half& x) { return Eigen::half(Eigen::numext::bessel_j1(static_cast(x))); } template <> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half bessel_y0(const Eigen::half& x) { return Eigen::half(Eigen::numext::bessel_y0(static_cast(x))); } template <> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half bessel_y1(const Eigen::half& x) { return Eigen::half(Eigen::numext::bessel_y1(static_cast(x))); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half bessel_k0(const Eigen::half& x) { return Eigen::half(Eigen::numext::bessel_k0(static_cast(x))); } template <> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half bessel_k0e(const Eigen::half& x) { return Eigen::half(Eigen::numext::bessel_k0e(static_cast(x))); } template <> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half bessel_k1(const Eigen::half& x) { return Eigen::half(Eigen::numext::bessel_k1(static_cast(x))); } template <> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half bessel_k1e(const Eigen::half& x) { return Eigen::half(Eigen::numext::bessel_k1e(static_cast(x))); } #endif } // end namespace numext } // end namespace Eigen #endif // EIGEN_BESSELFUNCTIONS_HALF_H RcppEigen/inst/include/unsupported/Eigen/src/SpecialFunctions/SpecialFunctionsPacketMath.h0000644000176200001440000000720114562417221031575 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2016 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SPECIALFUNCTIONS_PACKETMATH_H #define EIGEN_SPECIALFUNCTIONS_PACKETMATH_H namespace Eigen { namespace internal { /** \internal \returns the ln(|gamma(\a a)|) (coeff-wise) */ template EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet plgamma(const Packet& a) { using numext::lgamma; return lgamma(a); } /** \internal \returns the derivative of lgamma, psi(\a a) (coeff-wise) */ template EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet pdigamma(const Packet& a) { using numext::digamma; return digamma(a); } /** \internal \returns the zeta function of two arguments (coeff-wise) */ template EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet pzeta(const Packet& x, const Packet& q) { using numext::zeta; return zeta(x, q); } /** \internal \returns the polygamma function (coeff-wise) */ template EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet ppolygamma(const Packet& n, const Packet& x) { using numext::polygamma; return polygamma(n, x); } /** \internal \returns the erf(\a a) (coeff-wise) */ template EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet perf(const Packet& a) { using numext::erf; return erf(a); } /** \internal \returns the erfc(\a a) (coeff-wise) */ template EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet perfc(const Packet& a) { using numext::erfc; return erfc(a); } /** \internal \returns the ndtri(\a a) (coeff-wise) */ template EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet pndtri(const Packet& a) { typedef typename unpacket_traits::type ScalarType; using internal::generic_ndtri; return generic_ndtri(a); } /** \internal \returns the incomplete gamma function igamma(\a a, \a x) */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet pigamma(const Packet& a, const Packet& x) { using numext::igamma; return igamma(a, x); } /** \internal \returns the derivative of the incomplete gamma function * igamma_der_a(\a a, \a x) */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet pigamma_der_a(const Packet& a, const Packet& x) { using numext::igamma_der_a; return igamma_der_a(a, x); } /** \internal \returns compute the derivative of the sample * of Gamma(alpha, 1) random variable with respect to the parameter a * gamma_sample_der_alpha(\a alpha, \a sample) */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet pgamma_sample_der_alpha(const Packet& alpha, const Packet& sample) { using numext::gamma_sample_der_alpha; return gamma_sample_der_alpha(alpha, sample); } /** \internal \returns the complementary incomplete gamma function igammac(\a a, \a x) */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet pigammac(const Packet& a, const Packet& x) { using numext::igammac; return igammac(a, x); } /** \internal \returns the complementary incomplete gamma function betainc(\a a, \a b, \a x) */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet pbetainc(const Packet& a, const Packet& b,const Packet& x) { using numext::betainc; return betainc(a, b, x); } } // end namespace internal } // end namespace Eigen #endif // EIGEN_SPECIALFUNCTIONS_PACKETMATH_H RcppEigen/inst/include/unsupported/Eigen/src/SpecialFunctions/arch/0000755000176200001440000000000014562417221025126 5ustar liggesusersRcppEigen/inst/include/unsupported/Eigen/src/SpecialFunctions/arch/AVX/0000755000176200001440000000000014562417221025564 5ustar liggesusersRcppEigen/inst/include/unsupported/Eigen/src/SpecialFunctions/arch/AVX/SpecialFunctions.h0000644000176200001440000000061614562417221031211 0ustar liggesusers#ifndef EIGEN_AVX_SPECIALFUNCTIONS_H #define EIGEN_AVX_SPECIALFUNCTIONS_H namespace Eigen { namespace internal { F16_PACKET_FUNCTION(Packet8f, Packet8h, perf) BF16_PACKET_FUNCTION(Packet8f, Packet8bf, perf) F16_PACKET_FUNCTION(Packet8f, Packet8h, pndtri) BF16_PACKET_FUNCTION(Packet8f, Packet8bf, pndtri) } // namespace internal } // namespace Eigen #endif // EIGEN_AVX_SPECIAL_FUNCTIONS_H RcppEigen/inst/include/unsupported/Eigen/src/SpecialFunctions/arch/AVX/BesselFunctions.h0000644000176200001440000000272414562417221031050 0ustar liggesusers#ifndef EIGEN_AVX_BESSELFUNCTIONS_H #define EIGEN_AVX_BESSELFUNCTIONS_H namespace Eigen { namespace internal { F16_PACKET_FUNCTION(Packet8f, Packet8h, pbessel_i0) BF16_PACKET_FUNCTION(Packet8f, Packet8bf, pbessel_i0) F16_PACKET_FUNCTION(Packet8f, Packet8h, pbessel_i0e) BF16_PACKET_FUNCTION(Packet8f, Packet8bf, pbessel_i0e) F16_PACKET_FUNCTION(Packet8f, Packet8h, pbessel_i1) BF16_PACKET_FUNCTION(Packet8f, Packet8bf, pbessel_i1) F16_PACKET_FUNCTION(Packet8f, Packet8h, pbessel_i1e) BF16_PACKET_FUNCTION(Packet8f, Packet8bf, pbessel_i1e) F16_PACKET_FUNCTION(Packet8f, Packet8h, pbessel_j0) BF16_PACKET_FUNCTION(Packet8f, Packet8bf, pbessel_j0) F16_PACKET_FUNCTION(Packet8f, Packet8h, pbessel_j1) BF16_PACKET_FUNCTION(Packet8f, Packet8bf, pbessel_j1) F16_PACKET_FUNCTION(Packet8f, Packet8h, pbessel_k0) BF16_PACKET_FUNCTION(Packet8f, Packet8bf, pbessel_k0) F16_PACKET_FUNCTION(Packet8f, Packet8h, pbessel_k0e) BF16_PACKET_FUNCTION(Packet8f, Packet8bf, pbessel_k0e) F16_PACKET_FUNCTION(Packet8f, Packet8h, pbessel_k1) BF16_PACKET_FUNCTION(Packet8f, Packet8bf, pbessel_k1) F16_PACKET_FUNCTION(Packet8f, Packet8h, pbessel_k1e) BF16_PACKET_FUNCTION(Packet8f, Packet8bf, pbessel_k1e) F16_PACKET_FUNCTION(Packet8f, Packet8h, pbessel_y0) BF16_PACKET_FUNCTION(Packet8f, Packet8bf, pbessel_y0) F16_PACKET_FUNCTION(Packet8f, Packet8h, pbessel_y1) BF16_PACKET_FUNCTION(Packet8f, Packet8bf, pbessel_y1) } // namespace internal } // namespace Eigen #endif // EIGEN_AVX_BESSELFUNCTIONS_H RcppEigen/inst/include/unsupported/Eigen/src/SpecialFunctions/arch/GPU/0000755000176200001440000000000014562417221025561 5ustar liggesusersRcppEigen/inst/include/unsupported/Eigen/src/SpecialFunctions/arch/GPU/SpecialFunctions.h0000644000176200001440000002516014562417221031207 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_GPU_SPECIALFUNCTIONS_H #define EIGEN_GPU_SPECIALFUNCTIONS_H namespace Eigen { namespace internal { // Make sure this is only available when targeting a GPU: we don't want to // introduce conflicts between these packet_traits definitions and the ones // we'll use on the host side (SSE, AVX, ...) #if defined(EIGEN_GPUCC) && defined(EIGEN_USE_GPU) template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 plgamma(const float4& a) { return make_float4(lgammaf(a.x), lgammaf(a.y), lgammaf(a.z), lgammaf(a.w)); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 plgamma(const double2& a) { using numext::lgamma; return make_double2(lgamma(a.x), lgamma(a.y)); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pdigamma(const float4& a) { using numext::digamma; return make_float4(digamma(a.x), digamma(a.y), digamma(a.z), digamma(a.w)); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pdigamma(const double2& a) { using numext::digamma; return make_double2(digamma(a.x), digamma(a.y)); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pzeta(const float4& x, const float4& q) { using numext::zeta; return make_float4(zeta(x.x, q.x), zeta(x.y, q.y), zeta(x.z, q.z), zeta(x.w, q.w)); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pzeta(const double2& x, const double2& q) { using numext::zeta; return make_double2(zeta(x.x, q.x), zeta(x.y, q.y)); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 ppolygamma(const float4& n, const float4& x) { using numext::polygamma; return make_float4(polygamma(n.x, x.x), polygamma(n.y, x.y), polygamma(n.z, x.z), polygamma(n.w, x.w)); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 ppolygamma(const double2& n, const double2& x) { using numext::polygamma; return make_double2(polygamma(n.x, x.x), polygamma(n.y, x.y)); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 perf(const float4& a) { return make_float4(erff(a.x), erff(a.y), erff(a.z), erff(a.w)); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 perf(const double2& a) { using numext::erf; return make_double2(erf(a.x), erf(a.y)); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 perfc(const float4& a) { using numext::erfc; return make_float4(erfc(a.x), erfc(a.y), erfc(a.z), erfc(a.w)); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 perfc(const double2& a) { using numext::erfc; return make_double2(erfc(a.x), erfc(a.y)); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pndtri(const float4& a) { using numext::ndtri; return make_float4(ndtri(a.x), ndtri(a.y), ndtri(a.z), ndtri(a.w)); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pndtri(const double2& a) { using numext::ndtri; return make_double2(ndtri(a.x), ndtri(a.y)); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pigamma(const float4& a, const float4& x) { using numext::igamma; return make_float4( igamma(a.x, x.x), igamma(a.y, x.y), igamma(a.z, x.z), igamma(a.w, x.w)); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pigamma(const double2& a, const double2& x) { using numext::igamma; return make_double2(igamma(a.x, x.x), igamma(a.y, x.y)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pigamma_der_a( const float4& a, const float4& x) { using numext::igamma_der_a; return make_float4(igamma_der_a(a.x, x.x), igamma_der_a(a.y, x.y), igamma_der_a(a.z, x.z), igamma_der_a(a.w, x.w)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pigamma_der_a(const double2& a, const double2& x) { using numext::igamma_der_a; return make_double2(igamma_der_a(a.x, x.x), igamma_der_a(a.y, x.y)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pgamma_sample_der_alpha( const float4& alpha, const float4& sample) { using numext::gamma_sample_der_alpha; return make_float4( gamma_sample_der_alpha(alpha.x, sample.x), gamma_sample_der_alpha(alpha.y, sample.y), gamma_sample_der_alpha(alpha.z, sample.z), gamma_sample_der_alpha(alpha.w, sample.w)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pgamma_sample_der_alpha(const double2& alpha, const double2& sample) { using numext::gamma_sample_der_alpha; return make_double2( gamma_sample_der_alpha(alpha.x, sample.x), gamma_sample_der_alpha(alpha.y, sample.y)); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pigammac(const float4& a, const float4& x) { using numext::igammac; return make_float4( igammac(a.x, x.x), igammac(a.y, x.y), igammac(a.z, x.z), igammac(a.w, x.w)); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pigammac(const double2& a, const double2& x) { using numext::igammac; return make_double2(igammac(a.x, x.x), igammac(a.y, x.y)); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pbetainc(const float4& a, const float4& b, const float4& x) { using numext::betainc; return make_float4( betainc(a.x, b.x, x.x), betainc(a.y, b.y, x.y), betainc(a.z, b.z, x.z), betainc(a.w, b.w, x.w)); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pbetainc(const double2& a, const double2& b, const double2& x) { using numext::betainc; return make_double2(betainc(a.x, b.x, x.x), betainc(a.y, b.y, x.y)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pbessel_i0e(const float4& x) { using numext::bessel_i0e; return make_float4(bessel_i0e(x.x), bessel_i0e(x.y), bessel_i0e(x.z), bessel_i0e(x.w)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pbessel_i0e(const double2& x) { using numext::bessel_i0e; return make_double2(bessel_i0e(x.x), bessel_i0e(x.y)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pbessel_i0(const float4& x) { using numext::bessel_i0; return make_float4(bessel_i0(x.x), bessel_i0(x.y), bessel_i0(x.z), bessel_i0(x.w)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pbessel_i0(const double2& x) { using numext::bessel_i0; return make_double2(bessel_i0(x.x), bessel_i0(x.y)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pbessel_i1e(const float4& x) { using numext::bessel_i1e; return make_float4(bessel_i1e(x.x), bessel_i1e(x.y), bessel_i1e(x.z), bessel_i1e(x.w)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pbessel_i1e(const double2& x) { using numext::bessel_i1e; return make_double2(bessel_i1e(x.x), bessel_i1e(x.y)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pbessel_i1(const float4& x) { using numext::bessel_i1; return make_float4(bessel_i1(x.x), bessel_i1(x.y), bessel_i1(x.z), bessel_i1(x.w)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pbessel_i1(const double2& x) { using numext::bessel_i1; return make_double2(bessel_i1(x.x), bessel_i1(x.y)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pbessel_k0e(const float4& x) { using numext::bessel_k0e; return make_float4(bessel_k0e(x.x), bessel_k0e(x.y), bessel_k0e(x.z), bessel_k0e(x.w)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pbessel_k0e(const double2& x) { using numext::bessel_k0e; return make_double2(bessel_k0e(x.x), bessel_k0e(x.y)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pbessel_k0(const float4& x) { using numext::bessel_k0; return make_float4(bessel_k0(x.x), bessel_k0(x.y), bessel_k0(x.z), bessel_k0(x.w)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pbessel_k0(const double2& x) { using numext::bessel_k0; return make_double2(bessel_k0(x.x), bessel_k0(x.y)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pbessel_k1e(const float4& x) { using numext::bessel_k1e; return make_float4(bessel_k1e(x.x), bessel_k1e(x.y), bessel_k1e(x.z), bessel_k1e(x.w)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pbessel_k1e(const double2& x) { using numext::bessel_k1e; return make_double2(bessel_k1e(x.x), bessel_k1e(x.y)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pbessel_k1(const float4& x) { using numext::bessel_k1; return make_float4(bessel_k1(x.x), bessel_k1(x.y), bessel_k1(x.z), bessel_k1(x.w)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pbessel_k1(const double2& x) { using numext::bessel_k1; return make_double2(bessel_k1(x.x), bessel_k1(x.y)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pbessel_j0(const float4& x) { using numext::bessel_j0; return make_float4(bessel_j0(x.x), bessel_j0(x.y), bessel_j0(x.z), bessel_j0(x.w)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pbessel_j0(const double2& x) { using numext::bessel_j0; return make_double2(bessel_j0(x.x), bessel_j0(x.y)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pbessel_j1(const float4& x) { using numext::bessel_j1; return make_float4(bessel_j1(x.x), bessel_j1(x.y), bessel_j1(x.z), bessel_j1(x.w)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pbessel_j1(const double2& x) { using numext::bessel_j1; return make_double2(bessel_j1(x.x), bessel_j1(x.y)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pbessel_y0(const float4& x) { using numext::bessel_y0; return make_float4(bessel_y0(x.x), bessel_y0(x.y), bessel_y0(x.z), bessel_y0(x.w)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pbessel_y0(const double2& x) { using numext::bessel_y0; return make_double2(bessel_y0(x.x), bessel_y0(x.y)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pbessel_y1(const float4& x) { using numext::bessel_y1; return make_float4(bessel_y1(x.x), bessel_y1(x.y), bessel_y1(x.z), bessel_y1(x.w)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pbessel_y1(const double2& x) { using numext::bessel_y1; return make_double2(bessel_y1(x.x), bessel_y1(x.y)); } #endif } // end namespace internal } // end namespace Eigen #endif // EIGEN_GPU_SPECIALFUNCTIONS_H RcppEigen/inst/include/unsupported/Eigen/src/SpecialFunctions/arch/AVX512/0000755000176200001440000000000014562417221026014 5ustar liggesusersRcppEigen/inst/include/unsupported/Eigen/src/SpecialFunctions/arch/AVX512/SpecialFunctions.h0000644000176200001440000000063714562417221031444 0ustar liggesusers#ifndef EIGEN_AVX512_SPECIALFUNCTIONS_H #define EIGEN_AVX512_SPECIALFUNCTIONS_H namespace Eigen { namespace internal { F16_PACKET_FUNCTION(Packet16f, Packet16h, perf) BF16_PACKET_FUNCTION(Packet16f, Packet16bf, perf) F16_PACKET_FUNCTION(Packet16f, Packet16h, pndtri) BF16_PACKET_FUNCTION(Packet16f, Packet16bf, pndtri) } // namespace internal } // namespace Eigen #endif // EIGEN_AVX512_SPECIAL_FUNCTIONS_H RcppEigen/inst/include/unsupported/Eigen/src/SpecialFunctions/arch/AVX512/BesselFunctions.h0000644000176200001440000000301514562417221031272 0ustar liggesusers#ifndef EIGEN_AVX512_BESSELFUNCTIONS_H #define EIGEN_AVX512_BESSELFUNCTIONS_H namespace Eigen { namespace internal { F16_PACKET_FUNCTION(Packet16f, Packet16h, pbessel_i0) BF16_PACKET_FUNCTION(Packet16f, Packet16bf, pbessel_i0) F16_PACKET_FUNCTION(Packet16f, Packet16h, pbessel_i0e) BF16_PACKET_FUNCTION(Packet16f, Packet16bf, pbessel_i0e) F16_PACKET_FUNCTION(Packet16f, Packet16h, pbessel_i1) BF16_PACKET_FUNCTION(Packet16f, Packet16bf, pbessel_i1) F16_PACKET_FUNCTION(Packet16f, Packet16h, pbessel_i1e) BF16_PACKET_FUNCTION(Packet16f, Packet16bf, pbessel_i1e) F16_PACKET_FUNCTION(Packet16f, Packet16h, pbessel_j0) BF16_PACKET_FUNCTION(Packet16f, Packet16bf, pbessel_j0) F16_PACKET_FUNCTION(Packet16f, Packet16h, pbessel_j1) BF16_PACKET_FUNCTION(Packet16f, Packet16bf, pbessel_j1) F16_PACKET_FUNCTION(Packet16f, Packet16h, pbessel_k0) BF16_PACKET_FUNCTION(Packet16f, Packet16bf, pbessel_k0) F16_PACKET_FUNCTION(Packet16f, Packet16h, pbessel_k0e) BF16_PACKET_FUNCTION(Packet16f, Packet16bf, pbessel_k0e) F16_PACKET_FUNCTION(Packet16f, Packet16h, pbessel_k1) BF16_PACKET_FUNCTION(Packet16f, Packet16bf, pbessel_k1) F16_PACKET_FUNCTION(Packet16f, Packet16h, pbessel_k1e) BF16_PACKET_FUNCTION(Packet16f, Packet16bf, pbessel_k1e) F16_PACKET_FUNCTION(Packet16f, Packet16h, pbessel_y0) BF16_PACKET_FUNCTION(Packet16f, Packet16bf, pbessel_y0) F16_PACKET_FUNCTION(Packet16f, Packet16h, pbessel_y1) BF16_PACKET_FUNCTION(Packet16f, Packet16bf, pbessel_y1) } // namespace internal } // namespace Eigen #endif // EIGEN_AVX512_BESSELFUNCTIONS_H RcppEigen/inst/include/unsupported/Eigen/src/SpecialFunctions/arch/NEON/0000755000176200001440000000000014562417221025665 5ustar liggesusersRcppEigen/inst/include/unsupported/Eigen/src/SpecialFunctions/arch/NEON/SpecialFunctions.h0000644000176200001440000000240314562417221031306 0ustar liggesusers#ifndef EIGEN_NEON_SPECIALFUNCTIONS_H #define EIGEN_NEON_SPECIALFUNCTIONS_H namespace Eigen { namespace internal { #if EIGEN_HAS_ARM64_FP16_VECTOR_ARITHMETIC #define NEON_HALF_TO_FLOAT_FUNCTIONS(METHOD) \ template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \ Packet8hf METHOD(const Packet8hf& x) { \ const Packet4f lo = METHOD(vcvt_f32_f16(vget_low_f16(x))); \ const Packet4f hi = METHOD(vcvt_f32_f16(vget_high_f16(x))); \ return vcombine_f16(vcvt_f16_f32(lo), vcvt_f16_f32(hi)); \ } \ \ template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \ Packet4hf METHOD(const Packet4hf& x) { \ return vcvt_f16_f32(METHOD(vcvt_f32_f16(x))); \ } NEON_HALF_TO_FLOAT_FUNCTIONS(perf) NEON_HALF_TO_FLOAT_FUNCTIONS(pndtri) #undef NEON_HALF_TO_FLOAT_FUNCTIONS #endif BF16_PACKET_FUNCTION(Packet4f, Packet4bf, perf) BF16_PACKET_FUNCTION(Packet4f, Packet4bf, pndtri) } // namespace internal } // namespace Eigen #endif // EIGEN_NEON_SPECIALFUNCTIONS_H RcppEigen/inst/include/unsupported/Eigen/src/SpecialFunctions/arch/NEON/BesselFunctions.h0000644000176200001440000000432214562417221031145 0ustar liggesusers#ifndef EIGEN_NEON_BESSELFUNCTIONS_H #define EIGEN_NEON_BESSELFUNCTIONS_H namespace Eigen { namespace internal { #if EIGEN_HAS_ARM64_FP16_VECTOR_ARITHMETIC #define NEON_HALF_TO_FLOAT_FUNCTIONS(METHOD) \ template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \ Packet8hf METHOD(const Packet8hf& x) { \ const Packet4f lo = METHOD(vcvt_f32_f16(vget_low_f16(x))); \ const Packet4f hi = METHOD(vcvt_f32_f16(vget_high_f16(x))); \ return vcombine_f16(vcvt_f16_f32(lo), vcvt_f16_f32(hi)); \ } \ \ template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \ Packet4hf METHOD(const Packet4hf& x) { \ return vcvt_f16_f32(METHOD(vcvt_f32_f16(x))); \ } NEON_HALF_TO_FLOAT_FUNCTIONS(pbessel_i0) NEON_HALF_TO_FLOAT_FUNCTIONS(pbessel_i0e) NEON_HALF_TO_FLOAT_FUNCTIONS(pbessel_i1) NEON_HALF_TO_FLOAT_FUNCTIONS(pbessel_i1e) NEON_HALF_TO_FLOAT_FUNCTIONS(pbessel_j0) NEON_HALF_TO_FLOAT_FUNCTIONS(pbessel_j1) NEON_HALF_TO_FLOAT_FUNCTIONS(pbessel_k0) NEON_HALF_TO_FLOAT_FUNCTIONS(pbessel_k0e) NEON_HALF_TO_FLOAT_FUNCTIONS(pbessel_k1) NEON_HALF_TO_FLOAT_FUNCTIONS(pbessel_k1e) NEON_HALF_TO_FLOAT_FUNCTIONS(pbessel_y0) NEON_HALF_TO_FLOAT_FUNCTIONS(pbessel_y1) #undef NEON_HALF_TO_FLOAT_FUNCTIONS #endif BF16_PACKET_FUNCTION(Packet4f, Packet4bf, pbessel_i0) BF16_PACKET_FUNCTION(Packet4f, Packet4bf, pbessel_i0e) BF16_PACKET_FUNCTION(Packet4f, Packet4bf, pbessel_i1) BF16_PACKET_FUNCTION(Packet4f, Packet4bf, pbessel_i1e) BF16_PACKET_FUNCTION(Packet4f, Packet4bf, pbessel_j0) BF16_PACKET_FUNCTION(Packet4f, Packet4bf, pbessel_j1) BF16_PACKET_FUNCTION(Packet4f, Packet4bf, pbessel_k0) BF16_PACKET_FUNCTION(Packet4f, Packet4bf, pbessel_k0e) BF16_PACKET_FUNCTION(Packet4f, Packet4bf, pbessel_k1) BF16_PACKET_FUNCTION(Packet4f, Packet4bf, pbessel_k1e) BF16_PACKET_FUNCTION(Packet4f, Packet4bf, pbessel_y0) BF16_PACKET_FUNCTION(Packet4f, Packet4bf, pbessel_y1) } // namespace internal } // namespace Eigen #endif // EIGEN_NEON_BESSELFUNCTIONS_H RcppEigen/inst/include/unsupported/Eigen/src/SpecialFunctions/BesselFunctionsFunctors.h0000644000176200001440000003054114562417221031217 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2016 Eugene Brevdo // Copyright (C) 2016 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_BESSELFUNCTIONS_FUNCTORS_H #define EIGEN_BESSELFUNCTIONS_FUNCTORS_H namespace Eigen { namespace internal { /** \internal * \brief Template functor to compute the modified Bessel function of the first * kind of order zero. * \sa class CwiseUnaryOp, Cwise::bessel_i0() */ template struct scalar_bessel_i0_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_bessel_i0_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator()(const Scalar& x) const { using numext::bessel_i0; return bessel_i0(x); } typedef typename packet_traits::type Packet; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet packetOp(const Packet& x) const { return internal::pbessel_i0(x); } }; template struct functor_traits > { enum { // On average, a Chebyshev polynomial of order N=20 is computed. // The cost is N multiplications and 2N additions. We also add // the cost of an additional exp over i0e. Cost = 28 * NumTraits::MulCost + 48 * NumTraits::AddCost, PacketAccess = packet_traits::HasBessel }; }; /** \internal * \brief Template functor to compute the exponentially scaled modified Bessel * function of the first kind of order zero * \sa class CwiseUnaryOp, Cwise::bessel_i0e() */ template struct scalar_bessel_i0e_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_bessel_i0e_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator()(const Scalar& x) const { using numext::bessel_i0e; return bessel_i0e(x); } typedef typename packet_traits::type Packet; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet packetOp(const Packet& x) const { return internal::pbessel_i0e(x); } }; template struct functor_traits > { enum { // On average, a Chebyshev polynomial of order N=20 is computed. // The cost is N multiplications and 2N additions. Cost = 20 * NumTraits::MulCost + 40 * NumTraits::AddCost, PacketAccess = packet_traits::HasBessel }; }; /** \internal * \brief Template functor to compute the modified Bessel function of the first * kind of order one * \sa class CwiseUnaryOp, Cwise::bessel_i1() */ template struct scalar_bessel_i1_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_bessel_i1_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator()(const Scalar& x) const { using numext::bessel_i1; return bessel_i1(x); } typedef typename packet_traits::type Packet; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet packetOp(const Packet& x) const { return internal::pbessel_i1(x); } }; template struct functor_traits > { enum { // On average, a Chebyshev polynomial of order N=20 is computed. // The cost is N multiplications and 2N additions. We also add // the cost of an additional exp over i1e. Cost = 28 * NumTraits::MulCost + 48 * NumTraits::AddCost, PacketAccess = packet_traits::HasBessel }; }; /** \internal * \brief Template functor to compute the exponentially scaled modified Bessel * function of the first kind of order zero * \sa class CwiseUnaryOp, Cwise::bessel_i1e() */ template struct scalar_bessel_i1e_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_bessel_i1e_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator()(const Scalar& x) const { using numext::bessel_i1e; return bessel_i1e(x); } typedef typename packet_traits::type Packet; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet packetOp(const Packet& x) const { return internal::pbessel_i1e(x); } }; template struct functor_traits > { enum { // On average, a Chebyshev polynomial of order N=20 is computed. // The cost is N multiplications and 2N additions. Cost = 20 * NumTraits::MulCost + 40 * NumTraits::AddCost, PacketAccess = packet_traits::HasBessel }; }; /** \internal * \brief Template functor to compute the Bessel function of the second kind of * order zero * \sa class CwiseUnaryOp, Cwise::bessel_j0() */ template struct scalar_bessel_j0_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_bessel_j0_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator()(const Scalar& x) const { using numext::bessel_j0; return bessel_j0(x); } typedef typename packet_traits::type Packet; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet packetOp(const Packet& x) const { return internal::pbessel_j0(x); } }; template struct functor_traits > { enum { // 6 polynomial of order ~N=8 is computed. // The cost is N multiplications and N additions each, along with a // sine, cosine and rsqrt cost. Cost = 63 * NumTraits::MulCost + 48 * NumTraits::AddCost, PacketAccess = packet_traits::HasBessel }; }; /** \internal * \brief Template functor to compute the Bessel function of the second kind of * order zero * \sa class CwiseUnaryOp, Cwise::bessel_y0() */ template struct scalar_bessel_y0_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_bessel_y0_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator()(const Scalar& x) const { using numext::bessel_y0; return bessel_y0(x); } typedef typename packet_traits::type Packet; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet packetOp(const Packet& x) const { return internal::pbessel_y0(x); } }; template struct functor_traits > { enum { // 6 polynomial of order ~N=8 is computed. // The cost is N multiplications and N additions each, along with a // sine, cosine, rsqrt and j0 cost. Cost = 126 * NumTraits::MulCost + 96 * NumTraits::AddCost, PacketAccess = packet_traits::HasBessel }; }; /** \internal * \brief Template functor to compute the Bessel function of the first kind of * order one * \sa class CwiseUnaryOp, Cwise::bessel_j1() */ template struct scalar_bessel_j1_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_bessel_j1_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator()(const Scalar& x) const { using numext::bessel_j1; return bessel_j1(x); } typedef typename packet_traits::type Packet; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet packetOp(const Packet& x) const { return internal::pbessel_j1(x); } }; template struct functor_traits > { enum { // 6 polynomial of order ~N=8 is computed. // The cost is N multiplications and N additions each, along with a // sine, cosine and rsqrt cost. Cost = 63 * NumTraits::MulCost + 48 * NumTraits::AddCost, PacketAccess = packet_traits::HasBessel }; }; /** \internal * \brief Template functor to compute the Bessel function of the second kind of * order one * \sa class CwiseUnaryOp, Cwise::bessel_j1e() */ template struct scalar_bessel_y1_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_bessel_y1_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator()(const Scalar& x) const { using numext::bessel_y1; return bessel_y1(x); } typedef typename packet_traits::type Packet; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet packetOp(const Packet& x) const { return internal::pbessel_y1(x); } }; template struct functor_traits > { enum { // 6 polynomial of order ~N=8 is computed. // The cost is N multiplications and N additions each, along with a // sine, cosine, rsqrt and j1 cost. Cost = 126 * NumTraits::MulCost + 96 * NumTraits::AddCost, PacketAccess = packet_traits::HasBessel }; }; /** \internal * \brief Template functor to compute the modified Bessel function of the second * kind of order zero * \sa class CwiseUnaryOp, Cwise::bessel_k0() */ template struct scalar_bessel_k0_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_bessel_k0_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator()(const Scalar& x) const { using numext::bessel_k0; return bessel_k0(x); } typedef typename packet_traits::type Packet; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet packetOp(const Packet& x) const { return internal::pbessel_k0(x); } }; template struct functor_traits > { enum { // On average, a Chebyshev polynomial of order N=10 is computed. // The cost is N multiplications and 2N additions. In addition we compute // i0, a log, exp and prsqrt and sin and cos. Cost = 68 * NumTraits::MulCost + 88 * NumTraits::AddCost, PacketAccess = packet_traits::HasBessel }; }; /** \internal * \brief Template functor to compute the exponentially scaled modified Bessel * function of the second kind of order zero * \sa class CwiseUnaryOp, Cwise::bessel_k0e() */ template struct scalar_bessel_k0e_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_bessel_k0e_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator()(const Scalar& x) const { using numext::bessel_k0e; return bessel_k0e(x); } typedef typename packet_traits::type Packet; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet packetOp(const Packet& x) const { return internal::pbessel_k0e(x); } }; template struct functor_traits > { enum { // On average, a Chebyshev polynomial of order N=10 is computed. // The cost is N multiplications and 2N additions. In addition we compute // i0, a log, exp and prsqrt and sin and cos. Cost = 68 * NumTraits::MulCost + 88 * NumTraits::AddCost, PacketAccess = packet_traits::HasBessel }; }; /** \internal * \brief Template functor to compute the modified Bessel function of the * second kind of order one * \sa class CwiseUnaryOp, Cwise::bessel_k1() */ template struct scalar_bessel_k1_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_bessel_k1_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator()(const Scalar& x) const { using numext::bessel_k1; return bessel_k1(x); } typedef typename packet_traits::type Packet; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet packetOp(const Packet& x) const { return internal::pbessel_k1(x); } }; template struct functor_traits > { enum { // On average, a Chebyshev polynomial of order N=10 is computed. // The cost is N multiplications and 2N additions. In addition we compute // i1, a log, exp and prsqrt and sin and cos. Cost = 68 * NumTraits::MulCost + 88 * NumTraits::AddCost, PacketAccess = packet_traits::HasBessel }; }; /** \internal * \brief Template functor to compute the exponentially scaled modified Bessel * function of the second kind of order one * \sa class CwiseUnaryOp, Cwise::bessel_k1e() */ template struct scalar_bessel_k1e_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_bessel_k1e_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator()(const Scalar& x) const { using numext::bessel_k1e; return bessel_k1e(x); } typedef typename packet_traits::type Packet; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet packetOp(const Packet& x) const { return internal::pbessel_k1e(x); } }; template struct functor_traits > { enum { // On average, a Chebyshev polynomial of order N=10 is computed. // The cost is N multiplications and 2N additions. In addition we compute // i1, a log, exp and prsqrt and sin and cos. Cost = 68 * NumTraits::MulCost + 88 * NumTraits::AddCost, PacketAccess = packet_traits::HasBessel }; }; } // end namespace internal } // end namespace Eigen #endif // EIGEN_BESSELFUNCTIONS_FUNCTORS_H RcppEigen/inst/include/unsupported/Eigen/src/SpecialFunctions/HipVectorCompatibility.h0000644000176200001440000000467114562417221031027 0ustar liggesusers#ifndef HIP_VECTOR_COMPATIBILITY_H #define HIP_VECTOR_COMPATIBILITY_H namespace hip_impl { template struct Scalar_accessor; } // end namespace hip_impl namespace Eigen { namespace internal { #define HIP_SCALAR_ACCESSOR_BUILDER(NAME) \ template \ struct NAME > : NAME {}; #define HIP_SCALAR_ACCESSOR_BUILDER_RETVAL(NAME) \ template \ struct NAME##_impl > : NAME##_impl {}; \ template \ struct NAME##_retval > : NAME##_retval {}; #define HIP_SCALAR_ACCESSOR_BUILDER_IGAMMA(NAME) \ template \ struct NAME , mode> : NAME {}; #if EIGEN_HAS_C99_MATH HIP_SCALAR_ACCESSOR_BUILDER(betainc_helper) HIP_SCALAR_ACCESSOR_BUILDER(incbeta_cfe) HIP_SCALAR_ACCESSOR_BUILDER_RETVAL(erf) HIP_SCALAR_ACCESSOR_BUILDER_RETVAL(erfc) HIP_SCALAR_ACCESSOR_BUILDER_RETVAL(igammac) HIP_SCALAR_ACCESSOR_BUILDER_RETVAL(lgamma) HIP_SCALAR_ACCESSOR_BUILDER_RETVAL(ndtri) HIP_SCALAR_ACCESSOR_BUILDER_RETVAL(polygamma) HIP_SCALAR_ACCESSOR_BUILDER_IGAMMA(igamma_generic_impl) #endif HIP_SCALAR_ACCESSOR_BUILDER(digamma_impl_maybe_poly) HIP_SCALAR_ACCESSOR_BUILDER(zeta_impl_series) HIP_SCALAR_ACCESSOR_BUILDER_RETVAL(bessel_i0) HIP_SCALAR_ACCESSOR_BUILDER_RETVAL(bessel_i0e) HIP_SCALAR_ACCESSOR_BUILDER_RETVAL(bessel_i1) HIP_SCALAR_ACCESSOR_BUILDER_RETVAL(bessel_i1e) HIP_SCALAR_ACCESSOR_BUILDER_RETVAL(bessel_j0) HIP_SCALAR_ACCESSOR_BUILDER_RETVAL(bessel_j1) HIP_SCALAR_ACCESSOR_BUILDER_RETVAL(bessel_k0) HIP_SCALAR_ACCESSOR_BUILDER_RETVAL(bessel_k0e) HIP_SCALAR_ACCESSOR_BUILDER_RETVAL(bessel_k1) HIP_SCALAR_ACCESSOR_BUILDER_RETVAL(bessel_k1e) HIP_SCALAR_ACCESSOR_BUILDER_RETVAL(bessel_y0) HIP_SCALAR_ACCESSOR_BUILDER_RETVAL(bessel_y1) HIP_SCALAR_ACCESSOR_BUILDER_RETVAL(betainc) HIP_SCALAR_ACCESSOR_BUILDER_RETVAL(digamma) HIP_SCALAR_ACCESSOR_BUILDER_RETVAL(gamma_sample_der_alpha) HIP_SCALAR_ACCESSOR_BUILDER_RETVAL(igamma_der_a) HIP_SCALAR_ACCESSOR_BUILDER_RETVAL(igamma) HIP_SCALAR_ACCESSOR_BUILDER_RETVAL(zeta) HIP_SCALAR_ACCESSOR_BUILDER_IGAMMA(igamma_series_impl) HIP_SCALAR_ACCESSOR_BUILDER_IGAMMA(igammac_cf_impl) } // end namespace internal } // end namespace Eigen #endif // HIP_VECTOR_COMPATIBILITY_H RcppEigen/inst/include/unsupported/Eigen/src/SpecialFunctions/SpecialFunctionsHalf.h0000644000176200001440000000552314562417221030433 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SPECIALFUNCTIONS_HALF_H #define EIGEN_SPECIALFUNCTIONS_HALF_H namespace Eigen { namespace numext { #if EIGEN_HAS_C99_MATH template<> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half lgamma(const Eigen::half& a) { return Eigen::half(Eigen::numext::lgamma(static_cast(a))); } template<> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half digamma(const Eigen::half& a) { return Eigen::half(Eigen::numext::digamma(static_cast(a))); } template<> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half zeta(const Eigen::half& x, const Eigen::half& q) { return Eigen::half(Eigen::numext::zeta(static_cast(x), static_cast(q))); } template<> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half polygamma(const Eigen::half& n, const Eigen::half& x) { return Eigen::half(Eigen::numext::polygamma(static_cast(n), static_cast(x))); } template<> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half erf(const Eigen::half& a) { return Eigen::half(Eigen::numext::erf(static_cast(a))); } template<> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half erfc(const Eigen::half& a) { return Eigen::half(Eigen::numext::erfc(static_cast(a))); } template<> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half ndtri(const Eigen::half& a) { return Eigen::half(Eigen::numext::ndtri(static_cast(a))); } template<> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half igamma(const Eigen::half& a, const Eigen::half& x) { return Eigen::half(Eigen::numext::igamma(static_cast(a), static_cast(x))); } template <> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half igamma_der_a(const Eigen::half& a, const Eigen::half& x) { return Eigen::half(Eigen::numext::igamma_der_a(static_cast(a), static_cast(x))); } template <> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half gamma_sample_der_alpha(const Eigen::half& alpha, const Eigen::half& sample) { return Eigen::half(Eigen::numext::gamma_sample_der_alpha(static_cast(alpha), static_cast(sample))); } template<> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half igammac(const Eigen::half& a, const Eigen::half& x) { return Eigen::half(Eigen::numext::igammac(static_cast(a), static_cast(x))); } template<> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half betainc(const Eigen::half& a, const Eigen::half& b, const Eigen::half& x) { return Eigen::half(Eigen::numext::betainc(static_cast(a), static_cast(b), static_cast(x))); } #endif } // end namespace numext } // end namespace Eigen #endif // EIGEN_SPECIALFUNCTIONS_HALF_H RcppEigen/inst/include/unsupported/Eigen/src/SpecialFunctions/SpecialFunctionsArrayAPI.h0000644000176200001440000001701614562417221031171 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2016 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SPECIALFUNCTIONS_ARRAYAPI_H #define EIGEN_SPECIALFUNCTIONS_ARRAYAPI_H namespace Eigen { /** \cpp11 \returns an expression of the coefficient-wise igamma(\a a, \a x) to the given arrays. * * This function computes the coefficient-wise incomplete gamma function. * * \note This function supports only float and double scalar types in c++11 mode. To support other scalar types, * or float/double in non c++11 mode, the user has to provide implementations of igammac(T,T) for any scalar * type T to be supported. * * \sa Eigen::igammac(), Eigen::lgamma() */ template EIGEN_STRONG_INLINE const Eigen::CwiseBinaryOp, const Derived, const ExponentDerived> igamma(const Eigen::ArrayBase& a, const Eigen::ArrayBase& x) { return Eigen::CwiseBinaryOp, const Derived, const ExponentDerived>( a.derived(), x.derived() ); } /** \cpp11 \returns an expression of the coefficient-wise igamma_der_a(\a a, \a x) to the given arrays. * * This function computes the coefficient-wise derivative of the incomplete * gamma function with respect to the parameter a. * * \note This function supports only float and double scalar types in c++11 * mode. To support other scalar types, * or float/double in non c++11 mode, the user has to provide implementations * of igamma_der_a(T,T) for any scalar * type T to be supported. * * \sa Eigen::igamma(), Eigen::lgamma() */ template EIGEN_STRONG_INLINE const Eigen::CwiseBinaryOp, const Derived, const ExponentDerived> igamma_der_a(const Eigen::ArrayBase& a, const Eigen::ArrayBase& x) { return Eigen::CwiseBinaryOp, const Derived, const ExponentDerived>( a.derived(), x.derived()); } /** \cpp11 \returns an expression of the coefficient-wise gamma_sample_der_alpha(\a alpha, \a sample) to the given arrays. * * This function computes the coefficient-wise derivative of the sample * of a Gamma(alpha, 1) random variable with respect to the parameter alpha. * * \note This function supports only float and double scalar types in c++11 * mode. To support other scalar types, * or float/double in non c++11 mode, the user has to provide implementations * of gamma_sample_der_alpha(T,T) for any scalar * type T to be supported. * * \sa Eigen::igamma(), Eigen::lgamma() */ template EIGEN_STRONG_INLINE const Eigen::CwiseBinaryOp, const AlphaDerived, const SampleDerived> gamma_sample_der_alpha(const Eigen::ArrayBase& alpha, const Eigen::ArrayBase& sample) { return Eigen::CwiseBinaryOp, const AlphaDerived, const SampleDerived>( alpha.derived(), sample.derived()); } /** \cpp11 \returns an expression of the coefficient-wise igammac(\a a, \a x) to the given arrays. * * This function computes the coefficient-wise complementary incomplete gamma function. * * \note This function supports only float and double scalar types in c++11 mode. To support other scalar types, * or float/double in non c++11 mode, the user has to provide implementations of igammac(T,T) for any scalar * type T to be supported. * * \sa Eigen::igamma(), Eigen::lgamma() */ template EIGEN_STRONG_INLINE const Eigen::CwiseBinaryOp, const Derived, const ExponentDerived> igammac(const Eigen::ArrayBase& a, const Eigen::ArrayBase& x) { return Eigen::CwiseBinaryOp, const Derived, const ExponentDerived>( a.derived(), x.derived() ); } /** \cpp11 \returns an expression of the coefficient-wise polygamma(\a n, \a x) to the given arrays. * * It returns the \a n -th derivative of the digamma(psi) evaluated at \c x. * * \note This function supports only float and double scalar types in c++11 mode. To support other scalar types, * or float/double in non c++11 mode, the user has to provide implementations of polygamma(T,T) for any scalar * type T to be supported. * * \sa Eigen::digamma() */ // * \warning Be careful with the order of the parameters: x.polygamma(n) is equivalent to polygamma(n,x) // * \sa ArrayBase::polygamma() template EIGEN_STRONG_INLINE const Eigen::CwiseBinaryOp, const DerivedN, const DerivedX> polygamma(const Eigen::ArrayBase& n, const Eigen::ArrayBase& x) { return Eigen::CwiseBinaryOp, const DerivedN, const DerivedX>( n.derived(), x.derived() ); } /** \cpp11 \returns an expression of the coefficient-wise betainc(\a x, \a a, \a b) to the given arrays. * * This function computes the regularized incomplete beta function (integral). * * \note This function supports only float and double scalar types in c++11 mode. To support other scalar types, * or float/double in non c++11 mode, the user has to provide implementations of betainc(T,T,T) for any scalar * type T to be supported. * * \sa Eigen::betainc(), Eigen::lgamma() */ template EIGEN_STRONG_INLINE const Eigen::CwiseTernaryOp, const ArgADerived, const ArgBDerived, const ArgXDerived> betainc(const Eigen::ArrayBase& a, const Eigen::ArrayBase& b, const Eigen::ArrayBase& x) { return Eigen::CwiseTernaryOp, const ArgADerived, const ArgBDerived, const ArgXDerived>( a.derived(), b.derived(), x.derived() ); } /** \returns an expression of the coefficient-wise zeta(\a x, \a q) to the given arrays. * * It returns the Riemann zeta function of two arguments \a x and \a q: * * \param x is the exponent, it must be > 1 * \param q is the shift, it must be > 0 * * \note This function supports only float and double scalar types. To support other scalar types, the user has * to provide implementations of zeta(T,T) for any scalar type T to be supported. * * \sa ArrayBase::zeta() */ template EIGEN_STRONG_INLINE const Eigen::CwiseBinaryOp, const DerivedX, const DerivedQ> zeta(const Eigen::ArrayBase& x, const Eigen::ArrayBase& q) { return Eigen::CwiseBinaryOp, const DerivedX, const DerivedQ>( x.derived(), q.derived() ); } } // end namespace Eigen #endif // EIGEN_SPECIALFUNCTIONS_ARRAYAPI_H RcppEigen/inst/include/unsupported/Eigen/src/SpecialFunctions/BesselFunctionsBFloat16.h0000644000176200001440000000524414562417221030734 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_BESSELFUNCTIONS_BFLOAT16_H #define EIGEN_BESSELFUNCTIONS_BFLOAT16_H namespace Eigen { namespace numext { #if EIGEN_HAS_C99_MATH template <> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::bfloat16 bessel_i0(const Eigen::bfloat16& x) { return Eigen::bfloat16(Eigen::numext::bessel_i0(static_cast(x))); } template <> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::bfloat16 bessel_i0e(const Eigen::bfloat16& x) { return Eigen::bfloat16(Eigen::numext::bessel_i0e(static_cast(x))); } template <> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::bfloat16 bessel_i1(const Eigen::bfloat16& x) { return Eigen::bfloat16(Eigen::numext::bessel_i1(static_cast(x))); } template <> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::bfloat16 bessel_i1e(const Eigen::bfloat16& x) { return Eigen::bfloat16(Eigen::numext::bessel_i1e(static_cast(x))); } template <> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::bfloat16 bessel_j0(const Eigen::bfloat16& x) { return Eigen::bfloat16(Eigen::numext::bessel_j0(static_cast(x))); } template <> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::bfloat16 bessel_j1(const Eigen::bfloat16& x) { return Eigen::bfloat16(Eigen::numext::bessel_j1(static_cast(x))); } template <> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::bfloat16 bessel_y0(const Eigen::bfloat16& x) { return Eigen::bfloat16(Eigen::numext::bessel_y0(static_cast(x))); } template <> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::bfloat16 bessel_y1(const Eigen::bfloat16& x) { return Eigen::bfloat16(Eigen::numext::bessel_y1(static_cast(x))); } template <> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::bfloat16 bessel_k0(const Eigen::bfloat16& x) { return Eigen::bfloat16(Eigen::numext::bessel_k0(static_cast(x))); } template <> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::bfloat16 bessel_k0e(const Eigen::bfloat16& x) { return Eigen::bfloat16(Eigen::numext::bessel_k0e(static_cast(x))); } template <> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::bfloat16 bessel_k1(const Eigen::bfloat16& x) { return Eigen::bfloat16(Eigen::numext::bessel_k1(static_cast(x))); } template <> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::bfloat16 bessel_k1e(const Eigen::bfloat16& x) { return Eigen::bfloat16(Eigen::numext::bessel_k1e(static_cast(x))); } #endif } // end namespace numext } // end namespace Eigen #endif // EIGEN_BESSELFUNCTIONS_BFLOAT16_H RcppEigen/inst/include/unsupported/Eigen/src/SpecialFunctions/SpecialFunctionsImpl.h0000644000176200001440000016225314562417221030466 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2015 Eugene Brevdo // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SPECIAL_FUNCTIONS_H #define EIGEN_SPECIAL_FUNCTIONS_H namespace Eigen { namespace internal { // Parts of this code are based on the Cephes Math Library. // // Cephes Math Library Release 2.8: June, 2000 // Copyright 1984, 1987, 1992, 2000 by Stephen L. Moshier // // Permission has been kindly provided by the original author // to incorporate the Cephes software into the Eigen codebase: // // From: Stephen Moshier // To: Eugene Brevdo // Subject: Re: Permission to wrap several cephes functions in Eigen // // Hello Eugene, // // Thank you for writing. // // If your licensing is similar to BSD, the formal way that has been // handled is simply to add a statement to the effect that you are incorporating // the Cephes software by permission of the author. // // Good luck with your project, // Steve /**************************************************************************** * Implementation of lgamma, requires C++11/C99 * ****************************************************************************/ template struct lgamma_impl { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Scalar run(const Scalar) { EIGEN_STATIC_ASSERT((internal::is_same::value == false), THIS_TYPE_IS_NOT_SUPPORTED); return Scalar(0); } }; template struct lgamma_retval { typedef Scalar type; }; #if EIGEN_HAS_C99_MATH // Since glibc 2.19 #if defined(__GLIBC__) && ((__GLIBC__>=2 && __GLIBC_MINOR__ >= 19) || __GLIBC__>2) \ && (defined(_DEFAULT_SOURCE) || defined(_BSD_SOURCE) || defined(_SVID_SOURCE)) #define EIGEN_HAS_LGAMMA_R #endif // Glibc versions before 2.19 #if defined(__GLIBC__) && ((__GLIBC__==2 && __GLIBC_MINOR__ < 19) || __GLIBC__<2) \ && (defined(_BSD_SOURCE) || defined(_SVID_SOURCE)) #define EIGEN_HAS_LGAMMA_R #endif template <> struct lgamma_impl { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE float run(float x) { #if !defined(EIGEN_GPU_COMPILE_PHASE) && defined (EIGEN_HAS_LGAMMA_R) && !defined(__APPLE__) int dummy; return ::lgammaf_r(x, &dummy); #elif defined(SYCL_DEVICE_ONLY) return cl::sycl::lgamma(x); #else return ::lgammaf(x); #endif } }; template <> struct lgamma_impl { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE double run(double x) { #if !defined(EIGEN_GPU_COMPILE_PHASE) && defined(EIGEN_HAS_LGAMMA_R) && !defined(__APPLE__) int dummy; return ::lgamma_r(x, &dummy); #elif defined(SYCL_DEVICE_ONLY) return cl::sycl::lgamma(x); #else return ::lgamma(x); #endif } }; #undef EIGEN_HAS_LGAMMA_R #endif /**************************************************************************** * Implementation of digamma (psi), based on Cephes * ****************************************************************************/ template struct digamma_retval { typedef Scalar type; }; /* * * Polynomial evaluation helper for the Psi (digamma) function. * * digamma_impl_maybe_poly::run(s) evaluates the asymptotic Psi expansion for * input Scalar s, assuming s is above 10.0. * * If s is above a certain threshold for the given Scalar type, zero * is returned. Otherwise the polynomial is evaluated with enough * coefficients for results matching Scalar machine precision. * * */ template struct digamma_impl_maybe_poly { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Scalar run(const Scalar) { EIGEN_STATIC_ASSERT((internal::is_same::value == false), THIS_TYPE_IS_NOT_SUPPORTED); return Scalar(0); } }; template <> struct digamma_impl_maybe_poly { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE float run(const float s) { const float A[] = { -4.16666666666666666667E-3f, 3.96825396825396825397E-3f, -8.33333333333333333333E-3f, 8.33333333333333333333E-2f }; float z; if (s < 1.0e8f) { z = 1.0f / (s * s); return z * internal::ppolevl::run(z, A); } else return 0.0f; } }; template <> struct digamma_impl_maybe_poly { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE double run(const double s) { const double A[] = { 8.33333333333333333333E-2, -2.10927960927960927961E-2, 7.57575757575757575758E-3, -4.16666666666666666667E-3, 3.96825396825396825397E-3, -8.33333333333333333333E-3, 8.33333333333333333333E-2 }; double z; if (s < 1.0e17) { z = 1.0 / (s * s); return z * internal::ppolevl::run(z, A); } else return 0.0; } }; template struct digamma_impl { EIGEN_DEVICE_FUNC static Scalar run(Scalar x) { /* * * Psi (digamma) function (modified for Eigen) * * * SYNOPSIS: * * double x, y, psi(); * * y = psi( x ); * * * DESCRIPTION: * * d - * psi(x) = -- ln | (x) * dx * * is the logarithmic derivative of the gamma function. * For integer x, * n-1 * - * psi(n) = -EUL + > 1/k. * - * k=1 * * If x is negative, it is transformed to a positive argument by the * reflection formula psi(1-x) = psi(x) + pi cot(pi x). * For general positive x, the argument is made greater than 10 * using the recurrence psi(x+1) = psi(x) + 1/x. * Then the following asymptotic expansion is applied: * * inf. B * - 2k * psi(x) = log(x) - 1/2x - > ------- * - 2k * k=1 2k x * * where the B2k are Bernoulli numbers. * * ACCURACY (float): * Relative error (except absolute when |psi| < 1): * arithmetic domain # trials peak rms * IEEE 0,30 30000 1.3e-15 1.4e-16 * IEEE -30,0 40000 1.5e-15 2.2e-16 * * ACCURACY (double): * Absolute error, relative when |psi| > 1 : * arithmetic domain # trials peak rms * IEEE -33,0 30000 8.2e-7 1.2e-7 * IEEE 0,33 100000 7.3e-7 7.7e-8 * * ERROR MESSAGES: * message condition value returned * psi singularity x integer <=0 INFINITY */ Scalar p, q, nz, s, w, y; bool negative = false; const Scalar nan = NumTraits::quiet_NaN(); const Scalar m_pi = Scalar(EIGEN_PI); const Scalar zero = Scalar(0); const Scalar one = Scalar(1); const Scalar half = Scalar(0.5); nz = zero; if (x <= zero) { negative = true; q = x; p = numext::floor(q); if (p == q) { return nan; } /* Remove the zeros of tan(m_pi x) * by subtracting the nearest integer from x */ nz = q - p; if (nz != half) { if (nz > half) { p += one; nz = q - p; } nz = m_pi / numext::tan(m_pi * nz); } else { nz = zero; } x = one - x; } /* use the recurrence psi(x+1) = psi(x) + 1/x. */ s = x; w = zero; while (s < Scalar(10)) { w += one / s; s += one; } y = digamma_impl_maybe_poly::run(s); y = numext::log(s) - (half / s) - y - w; return (negative) ? y - nz : y; } }; /**************************************************************************** * Implementation of erf, requires C++11/C99 * ****************************************************************************/ /** \internal \returns the error function of \a a (coeff-wise) Doesn't do anything fancy, just a 13/8-degree rational interpolant which is accurate up to a couple of ulp in the range [-4, 4], outside of which fl(erf(x)) = +/-1. This implementation works on both scalars and Ts. */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T generic_fast_erf_float(const T& a_x) { // Clamp the inputs to the range [-4, 4] since anything outside // this range is +/-1.0f in single-precision. const T plus_4 = pset1(4.f); const T minus_4 = pset1(-4.f); const T x = pmax(pmin(a_x, plus_4), minus_4); // The monomial coefficients of the numerator polynomial (odd). const T alpha_1 = pset1(-1.60960333262415e-02f); const T alpha_3 = pset1(-2.95459980854025e-03f); const T alpha_5 = pset1(-7.34990630326855e-04f); const T alpha_7 = pset1(-5.69250639462346e-05f); const T alpha_9 = pset1(-2.10102402082508e-06f); const T alpha_11 = pset1(2.77068142495902e-08f); const T alpha_13 = pset1(-2.72614225801306e-10f); // The monomial coefficients of the denominator polynomial (even). const T beta_0 = pset1(-1.42647390514189e-02f); const T beta_2 = pset1(-7.37332916720468e-03f); const T beta_4 = pset1(-1.68282697438203e-03f); const T beta_6 = pset1(-2.13374055278905e-04f); const T beta_8 = pset1(-1.45660718464996e-05f); // Since the polynomials are odd/even, we need x^2. const T x2 = pmul(x, x); // Evaluate the numerator polynomial p. T p = pmadd(x2, alpha_13, alpha_11); p = pmadd(x2, p, alpha_9); p = pmadd(x2, p, alpha_7); p = pmadd(x2, p, alpha_5); p = pmadd(x2, p, alpha_3); p = pmadd(x2, p, alpha_1); p = pmul(x, p); // Evaluate the denominator polynomial p. T q = pmadd(x2, beta_8, beta_6); q = pmadd(x2, q, beta_4); q = pmadd(x2, q, beta_2); q = pmadd(x2, q, beta_0); // Divide the numerator by the denominator. return pdiv(p, q); } template struct erf_impl { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T& x) { return generic_fast_erf_float(x); } }; template struct erf_retval { typedef Scalar type; }; #if EIGEN_HAS_C99_MATH template <> struct erf_impl { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE float run(float x) { #if defined(SYCL_DEVICE_ONLY) return cl::sycl::erf(x); #else return generic_fast_erf_float(x); #endif } }; template <> struct erf_impl { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE double run(double x) { #if defined(SYCL_DEVICE_ONLY) return cl::sycl::erf(x); #else return ::erf(x); #endif } }; #endif // EIGEN_HAS_C99_MATH /*************************************************************************** * Implementation of erfc, requires C++11/C99 * ****************************************************************************/ template struct erfc_impl { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Scalar run(const Scalar) { EIGEN_STATIC_ASSERT((internal::is_same::value == false), THIS_TYPE_IS_NOT_SUPPORTED); return Scalar(0); } }; template struct erfc_retval { typedef Scalar type; }; #if EIGEN_HAS_C99_MATH template <> struct erfc_impl { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE float run(const float x) { #if defined(SYCL_DEVICE_ONLY) return cl::sycl::erfc(x); #else return ::erfcf(x); #endif } }; template <> struct erfc_impl { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE double run(const double x) { #if defined(SYCL_DEVICE_ONLY) return cl::sycl::erfc(x); #else return ::erfc(x); #endif } }; #endif // EIGEN_HAS_C99_MATH /*************************************************************************** * Implementation of ndtri. * ****************************************************************************/ /* Inverse of Normal distribution function (modified for Eigen). * * * SYNOPSIS: * * double x, y, ndtri(); * * x = ndtri( y ); * * * * DESCRIPTION: * * Returns the argument, x, for which the area under the * Gaussian probability density function (integrated from * minus infinity to x) is equal to y. * * * For small arguments 0 < y < exp(-2), the program computes * z = sqrt( -2.0 * log(y) ); then the approximation is * x = z - log(z)/z - (1/z) P(1/z) / Q(1/z). * There are two rational functions P/Q, one for 0 < y < exp(-32) * and the other for y up to exp(-2). For larger arguments, * w = y - 0.5, and x/sqrt(2pi) = w + w**3 R(w**2)/S(w**2)). * * * ACCURACY: * * Relative error: * arithmetic domain # trials peak rms * DEC 0.125, 1 5500 9.5e-17 2.1e-17 * DEC 6e-39, 0.135 3500 5.7e-17 1.3e-17 * IEEE 0.125, 1 20000 7.2e-16 1.3e-16 * IEEE 3e-308, 0.135 50000 4.6e-16 9.8e-17 * * * ERROR MESSAGES: * * message condition value returned * ndtri domain x <= 0 -MAXNUM * ndtri domain x >= 1 MAXNUM * */ /* Cephes Math Library Release 2.2: June, 1992 Copyright 1985, 1987, 1992 by Stephen L. Moshier Direct inquiries to 30 Frost Street, Cambridge, MA 02140 */ // TODO: Add a cheaper approximation for float. template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T flipsign( const T& should_flipsign, const T& x) { typedef typename unpacket_traits::type Scalar; const T sign_mask = pset1(Scalar(-0.0)); T sign_bit = pand(should_flipsign, sign_mask); return pxor(sign_bit, x); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double flipsign( const double& should_flipsign, const double& x) { return should_flipsign == 0 ? x : -x; } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float flipsign( const float& should_flipsign, const float& x) { return should_flipsign == 0 ? x : -x; } // We split this computation in to two so that in the scalar path // only one branch is evaluated (due to our template specialization of pselect // being an if statement.) template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T generic_ndtri_gt_exp_neg_two(const T& b) { const ScalarType p0[] = { ScalarType(-5.99633501014107895267e1), ScalarType(9.80010754185999661536e1), ScalarType(-5.66762857469070293439e1), ScalarType(1.39312609387279679503e1), ScalarType(-1.23916583867381258016e0) }; const ScalarType q0[] = { ScalarType(1.0), ScalarType(1.95448858338141759834e0), ScalarType(4.67627912898881538453e0), ScalarType(8.63602421390890590575e1), ScalarType(-2.25462687854119370527e2), ScalarType(2.00260212380060660359e2), ScalarType(-8.20372256168333339912e1), ScalarType(1.59056225126211695515e1), ScalarType(-1.18331621121330003142e0) }; const T sqrt2pi = pset1(ScalarType(2.50662827463100050242e0)); const T half = pset1(ScalarType(0.5)); T c, c2, ndtri_gt_exp_neg_two; c = psub(b, half); c2 = pmul(c, c); ndtri_gt_exp_neg_two = pmadd(c, pmul( c2, pdiv( internal::ppolevl::run(c2, p0), internal::ppolevl::run(c2, q0))), c); return pmul(ndtri_gt_exp_neg_two, sqrt2pi); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T generic_ndtri_lt_exp_neg_two( const T& b, const T& should_flipsign) { /* Approximation for interval z = sqrt(-2 log a ) between 2 and 8 * i.e., a between exp(-2) = .135 and exp(-32) = 1.27e-14. */ const ScalarType p1[] = { ScalarType(4.05544892305962419923e0), ScalarType(3.15251094599893866154e1), ScalarType(5.71628192246421288162e1), ScalarType(4.40805073893200834700e1), ScalarType(1.46849561928858024014e1), ScalarType(2.18663306850790267539e0), ScalarType(-1.40256079171354495875e-1), ScalarType(-3.50424626827848203418e-2), ScalarType(-8.57456785154685413611e-4) }; const ScalarType q1[] = { ScalarType(1.0), ScalarType(1.57799883256466749731e1), ScalarType(4.53907635128879210584e1), ScalarType(4.13172038254672030440e1), ScalarType(1.50425385692907503408e1), ScalarType(2.50464946208309415979e0), ScalarType(-1.42182922854787788574e-1), ScalarType(-3.80806407691578277194e-2), ScalarType(-9.33259480895457427372e-4) }; /* Approximation for interval z = sqrt(-2 log a ) between 8 and 64 * i.e., a between exp(-32) = 1.27e-14 and exp(-2048) = 3.67e-890. */ const ScalarType p2[] = { ScalarType(3.23774891776946035970e0), ScalarType(6.91522889068984211695e0), ScalarType(3.93881025292474443415e0), ScalarType(1.33303460815807542389e0), ScalarType(2.01485389549179081538e-1), ScalarType(1.23716634817820021358e-2), ScalarType(3.01581553508235416007e-4), ScalarType(2.65806974686737550832e-6), ScalarType(6.23974539184983293730e-9) }; const ScalarType q2[] = { ScalarType(1.0), ScalarType(6.02427039364742014255e0), ScalarType(3.67983563856160859403e0), ScalarType(1.37702099489081330271e0), ScalarType(2.16236993594496635890e-1), ScalarType(1.34204006088543189037e-2), ScalarType(3.28014464682127739104e-4), ScalarType(2.89247864745380683936e-6), ScalarType(6.79019408009981274425e-9) }; const T eight = pset1(ScalarType(8.0)); const T one = pset1(ScalarType(1)); const T neg_two = pset1(ScalarType(-2)); T x, x0, x1, z; x = psqrt(pmul(neg_two, plog(b))); x0 = psub(x, pdiv(plog(x), x)); z = pdiv(one, x); x1 = pmul( z, pselect( pcmp_lt(x, eight), pdiv(internal::ppolevl::run(z, p1), internal::ppolevl::run(z, q1)), pdiv(internal::ppolevl::run(z, p2), internal::ppolevl::run(z, q2)))); return flipsign(should_flipsign, psub(x0, x1)); } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE T generic_ndtri(const T& a) { const T maxnum = pset1(NumTraits::infinity()); const T neg_maxnum = pset1(-NumTraits::infinity()); const T zero = pset1(ScalarType(0)); const T one = pset1(ScalarType(1)); // exp(-2) const T exp_neg_two = pset1(ScalarType(0.13533528323661269189)); T b, ndtri, should_flipsign; should_flipsign = pcmp_le(a, psub(one, exp_neg_two)); b = pselect(should_flipsign, a, psub(one, a)); ndtri = pselect( pcmp_lt(exp_neg_two, b), generic_ndtri_gt_exp_neg_two(b), generic_ndtri_lt_exp_neg_two(b, should_flipsign)); return pselect( pcmp_le(a, zero), neg_maxnum, pselect(pcmp_le(one, a), maxnum, ndtri)); } template struct ndtri_retval { typedef Scalar type; }; #if !EIGEN_HAS_C99_MATH template struct ndtri_impl { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Scalar run(const Scalar) { EIGEN_STATIC_ASSERT((internal::is_same::value == false), THIS_TYPE_IS_NOT_SUPPORTED); return Scalar(0); } }; # else template struct ndtri_impl { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Scalar run(const Scalar x) { return generic_ndtri(x); } }; #endif // EIGEN_HAS_C99_MATH /************************************************************************************************************** * Implementation of igammac (complemented incomplete gamma integral), based on Cephes but requires C++11/C99 * **************************************************************************************************************/ template struct igammac_retval { typedef Scalar type; }; // NOTE: cephes_helper is also used to implement zeta template struct cephes_helper { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Scalar machep() { assert(false && "machep not supported for this type"); return 0.0; } EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Scalar big() { assert(false && "big not supported for this type"); return 0.0; } EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Scalar biginv() { assert(false && "biginv not supported for this type"); return 0.0; } }; template <> struct cephes_helper { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE float machep() { return NumTraits::epsilon() / 2; // 1.0 - machep == 1.0 } EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE float big() { // use epsneg (1.0 - epsneg == 1.0) return 1.0f / (NumTraits::epsilon() / 2); } EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE float biginv() { // epsneg return machep(); } }; template <> struct cephes_helper { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE double machep() { return NumTraits::epsilon() / 2; // 1.0 - machep == 1.0 } EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE double big() { return 1.0 / NumTraits::epsilon(); } EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE double biginv() { // inverse of eps return NumTraits::epsilon(); } }; enum IgammaComputationMode { VALUE, DERIVATIVE, SAMPLE_DERIVATIVE }; template EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Scalar main_igamma_term(Scalar a, Scalar x) { /* Compute x**a * exp(-x) / gamma(a) */ Scalar logax = a * numext::log(x) - x - lgamma_impl::run(a); if (logax < -numext::log(NumTraits::highest()) || // Assuming x and a aren't Nan. (numext::isnan)(logax)) { return Scalar(0); } return numext::exp(logax); } template EIGEN_DEVICE_FUNC int igamma_num_iterations() { /* Returns the maximum number of internal iterations for igamma computation. */ if (mode == VALUE) { return 2000; } if (internal::is_same::value) { return 200; } else if (internal::is_same::value) { return 500; } else { return 2000; } } template struct igammac_cf_impl { /* Computes igamc(a, x) or derivative (depending on the mode) * using the continued fraction expansion of the complementary * incomplete Gamma function. * * Preconditions: * a > 0 * x >= 1 * x >= a */ EIGEN_DEVICE_FUNC static Scalar run(Scalar a, Scalar x) { const Scalar zero = 0; const Scalar one = 1; const Scalar two = 2; const Scalar machep = cephes_helper::machep(); const Scalar big = cephes_helper::big(); const Scalar biginv = cephes_helper::biginv(); if ((numext::isinf)(x)) { return zero; } Scalar ax = main_igamma_term(a, x); // This is independent of mode. If this value is zero, // then the function value is zero. If the function value is zero, // then we are in a neighborhood where the function value evalutes to zero, // so the derivative is zero. if (ax == zero) { return zero; } // continued fraction Scalar y = one - a; Scalar z = x + y + one; Scalar c = zero; Scalar pkm2 = one; Scalar qkm2 = x; Scalar pkm1 = x + one; Scalar qkm1 = z * x; Scalar ans = pkm1 / qkm1; Scalar dpkm2_da = zero; Scalar dqkm2_da = zero; Scalar dpkm1_da = zero; Scalar dqkm1_da = -x; Scalar dans_da = (dpkm1_da - ans * dqkm1_da) / qkm1; for (int i = 0; i < igamma_num_iterations(); i++) { c += one; y += one; z += two; Scalar yc = y * c; Scalar pk = pkm1 * z - pkm2 * yc; Scalar qk = qkm1 * z - qkm2 * yc; Scalar dpk_da = dpkm1_da * z - pkm1 - dpkm2_da * yc + pkm2 * c; Scalar dqk_da = dqkm1_da * z - qkm1 - dqkm2_da * yc + qkm2 * c; if (qk != zero) { Scalar ans_prev = ans; ans = pk / qk; Scalar dans_da_prev = dans_da; dans_da = (dpk_da - ans * dqk_da) / qk; if (mode == VALUE) { if (numext::abs(ans_prev - ans) <= machep * numext::abs(ans)) { break; } } else { if (numext::abs(dans_da - dans_da_prev) <= machep) { break; } } } pkm2 = pkm1; pkm1 = pk; qkm2 = qkm1; qkm1 = qk; dpkm2_da = dpkm1_da; dpkm1_da = dpk_da; dqkm2_da = dqkm1_da; dqkm1_da = dqk_da; if (numext::abs(pk) > big) { pkm2 *= biginv; pkm1 *= biginv; qkm2 *= biginv; qkm1 *= biginv; dpkm2_da *= biginv; dpkm1_da *= biginv; dqkm2_da *= biginv; dqkm1_da *= biginv; } } /* Compute x**a * exp(-x) / gamma(a) */ Scalar dlogax_da = numext::log(x) - digamma_impl::run(a); Scalar dax_da = ax * dlogax_da; switch (mode) { case VALUE: return ans * ax; case DERIVATIVE: return ans * dax_da + dans_da * ax; case SAMPLE_DERIVATIVE: default: // this is needed to suppress clang warning return -(dans_da + ans * dlogax_da) * x; } } }; template struct igamma_series_impl { /* Computes igam(a, x) or its derivative (depending on the mode) * using the series expansion of the incomplete Gamma function. * * Preconditions: * x > 0 * a > 0 * !(x > 1 && x > a) */ EIGEN_DEVICE_FUNC static Scalar run(Scalar a, Scalar x) { const Scalar zero = 0; const Scalar one = 1; const Scalar machep = cephes_helper::machep(); Scalar ax = main_igamma_term(a, x); // This is independent of mode. If this value is zero, // then the function value is zero. If the function value is zero, // then we are in a neighborhood where the function value evalutes to zero, // so the derivative is zero. if (ax == zero) { return zero; } ax /= a; /* power series */ Scalar r = a; Scalar c = one; Scalar ans = one; Scalar dc_da = zero; Scalar dans_da = zero; for (int i = 0; i < igamma_num_iterations(); i++) { r += one; Scalar term = x / r; Scalar dterm_da = -x / (r * r); dc_da = term * dc_da + dterm_da * c; dans_da += dc_da; c *= term; ans += c; if (mode == VALUE) { if (c <= machep * ans) { break; } } else { if (numext::abs(dc_da) <= machep * numext::abs(dans_da)) { break; } } } Scalar dlogax_da = numext::log(x) - digamma_impl::run(a + one); Scalar dax_da = ax * dlogax_da; switch (mode) { case VALUE: return ans * ax; case DERIVATIVE: return ans * dax_da + dans_da * ax; case SAMPLE_DERIVATIVE: default: // this is needed to suppress clang warning return -(dans_da + ans * dlogax_da) * x / a; } } }; #if !EIGEN_HAS_C99_MATH template struct igammac_impl { EIGEN_DEVICE_FUNC static Scalar run(Scalar a, Scalar x) { EIGEN_STATIC_ASSERT((internal::is_same::value == false), THIS_TYPE_IS_NOT_SUPPORTED); return Scalar(0); } }; #else template struct igammac_impl { EIGEN_DEVICE_FUNC static Scalar run(Scalar a, Scalar x) { /* igamc() * * Incomplete gamma integral (modified for Eigen) * * * * SYNOPSIS: * * double a, x, y, igamc(); * * y = igamc( a, x ); * * DESCRIPTION: * * The function is defined by * * * igamc(a,x) = 1 - igam(a,x) * * inf. * - * 1 | | -t a-1 * = ----- | e t dt. * - | | * | (a) - * x * * * In this implementation both arguments must be positive. * The integral is evaluated by either a power series or * continued fraction expansion, depending on the relative * values of a and x. * * ACCURACY (float): * * Relative error: * arithmetic domain # trials peak rms * IEEE 0,30 30000 7.8e-6 5.9e-7 * * * ACCURACY (double): * * Tested at random a, x. * a x Relative error: * arithmetic domain domain # trials peak rms * IEEE 0.5,100 0,100 200000 1.9e-14 1.7e-15 * IEEE 0.01,0.5 0,100 200000 1.4e-13 1.6e-15 * */ /* Cephes Math Library Release 2.2: June, 1992 Copyright 1985, 1987, 1992 by Stephen L. Moshier Direct inquiries to 30 Frost Street, Cambridge, MA 02140 */ const Scalar zero = 0; const Scalar one = 1; const Scalar nan = NumTraits::quiet_NaN(); if ((x < zero) || (a <= zero)) { // domain error return nan; } if ((numext::isnan)(a) || (numext::isnan)(x)) { // propagate nans return nan; } if ((x < one) || (x < a)) { return (one - igamma_series_impl::run(a, x)); } return igammac_cf_impl::run(a, x); } }; #endif // EIGEN_HAS_C99_MATH /************************************************************************************************ * Implementation of igamma (incomplete gamma integral), based on Cephes but requires C++11/C99 * ************************************************************************************************/ #if !EIGEN_HAS_C99_MATH template struct igamma_generic_impl { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Scalar run(Scalar a, Scalar x) { EIGEN_STATIC_ASSERT((internal::is_same::value == false), THIS_TYPE_IS_NOT_SUPPORTED); return Scalar(0); } }; #else template struct igamma_generic_impl { EIGEN_DEVICE_FUNC static Scalar run(Scalar a, Scalar x) { /* Depending on the mode, returns * - VALUE: incomplete Gamma function igamma(a, x) * - DERIVATIVE: derivative of incomplete Gamma function d/da igamma(a, x) * - SAMPLE_DERIVATIVE: implicit derivative of a Gamma random variable * x ~ Gamma(x | a, 1), dx/da = -1 / Gamma(x | a, 1) * d igamma(a, x) / dx * * Derivatives are implemented by forward-mode differentiation. */ const Scalar zero = 0; const Scalar one = 1; const Scalar nan = NumTraits::quiet_NaN(); if (x == zero) return zero; if ((x < zero) || (a <= zero)) { // domain error return nan; } if ((numext::isnan)(a) || (numext::isnan)(x)) { // propagate nans return nan; } if ((x > one) && (x > a)) { Scalar ret = igammac_cf_impl::run(a, x); if (mode == VALUE) { return one - ret; } else { return -ret; } } return igamma_series_impl::run(a, x); } }; #endif // EIGEN_HAS_C99_MATH template struct igamma_retval { typedef Scalar type; }; template struct igamma_impl : igamma_generic_impl { /* igam() * Incomplete gamma integral. * * The CDF of Gamma(a, 1) random variable at the point x. * * Accuracy estimation. For each a in [10^-2, 10^-1...10^3] we sample * 50 Gamma random variables x ~ Gamma(x | a, 1), a total of 300 points. * The ground truth is computed by mpmath. Mean absolute error: * float: 1.26713e-05 * double: 2.33606e-12 * * Cephes documentation below. * * SYNOPSIS: * * double a, x, y, igam(); * * y = igam( a, x ); * * DESCRIPTION: * * The function is defined by * * x * - * 1 | | -t a-1 * igam(a,x) = ----- | e t dt. * - | | * | (a) - * 0 * * * In this implementation both arguments must be positive. * The integral is evaluated by either a power series or * continued fraction expansion, depending on the relative * values of a and x. * * ACCURACY (double): * * Relative error: * arithmetic domain # trials peak rms * IEEE 0,30 200000 3.6e-14 2.9e-15 * IEEE 0,100 300000 9.9e-14 1.5e-14 * * * ACCURACY (float): * * Relative error: * arithmetic domain # trials peak rms * IEEE 0,30 20000 7.8e-6 5.9e-7 * */ /* Cephes Math Library Release 2.2: June, 1992 Copyright 1985, 1987, 1992 by Stephen L. Moshier Direct inquiries to 30 Frost Street, Cambridge, MA 02140 */ /* left tail of incomplete gamma function: * * inf. k * a -x - x * x e > ---------- * - - * k=0 | (a+k+1) * */ }; template struct igamma_der_a_retval : igamma_retval {}; template struct igamma_der_a_impl : igamma_generic_impl { /* Derivative of the incomplete Gamma function with respect to a. * * Computes d/da igamma(a, x) by forward differentiation of the igamma code. * * Accuracy estimation. For each a in [10^-2, 10^-1...10^3] we sample * 50 Gamma random variables x ~ Gamma(x | a, 1), a total of 300 points. * The ground truth is computed by mpmath. Mean absolute error: * float: 6.17992e-07 * double: 4.60453e-12 * * Reference: * R. Moore. "Algorithm AS 187: Derivatives of the incomplete gamma * integral". Journal of the Royal Statistical Society. 1982 */ }; template struct gamma_sample_der_alpha_retval : igamma_retval {}; template struct gamma_sample_der_alpha_impl : igamma_generic_impl { /* Derivative of a Gamma random variable sample with respect to alpha. * * Consider a sample of a Gamma random variable with the concentration * parameter alpha: sample ~ Gamma(alpha, 1). The reparameterization * derivative that we want to compute is dsample / dalpha = * d igammainv(alpha, u) / dalpha, where u = igamma(alpha, sample). * However, this formula is numerically unstable and expensive, so instead * we use implicit differentiation: * * igamma(alpha, sample) = u, where u ~ Uniform(0, 1). * Apply d / dalpha to both sides: * d igamma(alpha, sample) / dalpha * + d igamma(alpha, sample) / dsample * dsample/dalpha = 0 * d igamma(alpha, sample) / dalpha * + Gamma(sample | alpha, 1) dsample / dalpha = 0 * dsample/dalpha = - (d igamma(alpha, sample) / dalpha) * / Gamma(sample | alpha, 1) * * Here Gamma(sample | alpha, 1) is the PDF of the Gamma distribution * (note that the derivative of the CDF w.r.t. sample is the PDF). * See the reference below for more details. * * The derivative of igamma(alpha, sample) is computed by forward * differentiation of the igamma code. Division by the Gamma PDF is performed * in the same code, increasing the accuracy and speed due to cancellation * of some terms. * * Accuracy estimation. For each alpha in [10^-2, 10^-1...10^3] we sample * 50 Gamma random variables sample ~ Gamma(sample | alpha, 1), a total of 300 * points. The ground truth is computed by mpmath. Mean absolute error: * float: 2.1686e-06 * double: 1.4774e-12 * * Reference: * M. Figurnov, S. Mohamed, A. Mnih "Implicit Reparameterization Gradients". * 2018 */ }; /***************************************************************************** * Implementation of Riemann zeta function of two arguments, based on Cephes * *****************************************************************************/ template struct zeta_retval { typedef Scalar type; }; template struct zeta_impl_series { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Scalar run(const Scalar) { EIGEN_STATIC_ASSERT((internal::is_same::value == false), THIS_TYPE_IS_NOT_SUPPORTED); return Scalar(0); } }; template <> struct zeta_impl_series { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE bool run(float& a, float& b, float& s, const float x, const float machep) { int i = 0; while(i < 9) { i += 1; a += 1.0f; b = numext::pow( a, -x ); s += b; if( numext::abs(b/s) < machep ) return true; } //Return whether we are done return false; } }; template <> struct zeta_impl_series { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE bool run(double& a, double& b, double& s, const double x, const double machep) { int i = 0; while( (i < 9) || (a <= 9.0) ) { i += 1; a += 1.0; b = numext::pow( a, -x ); s += b; if( numext::abs(b/s) < machep ) return true; } //Return whether we are done return false; } }; template struct zeta_impl { EIGEN_DEVICE_FUNC static Scalar run(Scalar x, Scalar q) { /* zeta.c * * Riemann zeta function of two arguments * * * * SYNOPSIS: * * double x, q, y, zeta(); * * y = zeta( x, q ); * * * * DESCRIPTION: * * * * inf. * - -x * zeta(x,q) = > (k+q) * - * k=0 * * where x > 1 and q is not a negative integer or zero. * The Euler-Maclaurin summation formula is used to obtain * the expansion * * n * - -x * zeta(x,q) = > (k+q) * - * k=1 * * 1-x inf. B x(x+1)...(x+2j) * (n+q) 1 - 2j * + --------- - ------- + > -------------------- * x-1 x - x+2j+1 * 2(n+q) j=1 (2j)! (n+q) * * where the B2j are Bernoulli numbers. Note that (see zetac.c) * zeta(x,1) = zetac(x) + 1. * * * * ACCURACY: * * Relative error for single precision: * arithmetic domain # trials peak rms * IEEE 0,25 10000 6.9e-7 1.0e-7 * * Large arguments may produce underflow in powf(), in which * case the results are inaccurate. * * REFERENCE: * * Gradshteyn, I. S., and I. M. Ryzhik, Tables of Integrals, * Series, and Products, p. 1073; Academic Press, 1980. * */ int i; Scalar p, r, a, b, k, s, t, w; const Scalar A[] = { Scalar(12.0), Scalar(-720.0), Scalar(30240.0), Scalar(-1209600.0), Scalar(47900160.0), Scalar(-1.8924375803183791606e9), /*1.307674368e12/691*/ Scalar(7.47242496e10), Scalar(-2.950130727918164224e12), /*1.067062284288e16/3617*/ Scalar(1.1646782814350067249e14), /*5.109094217170944e18/43867*/ Scalar(-4.5979787224074726105e15), /*8.028576626982912e20/174611*/ Scalar(1.8152105401943546773e17), /*1.5511210043330985984e23/854513*/ Scalar(-7.1661652561756670113e18) /*1.6938241367317436694528e27/236364091*/ }; const Scalar maxnum = NumTraits::infinity(); const Scalar zero = 0.0, half = 0.5, one = 1.0; const Scalar machep = cephes_helper::machep(); const Scalar nan = NumTraits::quiet_NaN(); if( x == one ) return maxnum; if( x < one ) { return nan; } if( q <= zero ) { if(q == numext::floor(q)) { if (x == numext::floor(x) && long(x) % 2 == 0) { return maxnum; } else { return nan; } } p = x; r = numext::floor(p); if (p != r) return nan; } /* Permit negative q but continue sum until n+q > +9 . * This case should be handled by a reflection formula. * If q<0 and x is an integer, there is a relation to * the polygamma function. */ s = numext::pow( q, -x ); a = q; b = zero; // Run the summation in a helper function that is specific to the floating precision if (zeta_impl_series::run(a, b, s, x, machep)) { return s; } w = a; s += b*w/(x-one); s -= half * b; a = one; k = zero; for( i=0; i<12; i++ ) { a *= x + k; b /= w; t = a*b/A[i]; s = s + t; t = numext::abs(t/s); if( t < machep ) { break; } k += one; a *= x + k; b /= w; k += one; } return s; } }; /**************************************************************************** * Implementation of polygamma function, requires C++11/C99 * ****************************************************************************/ template struct polygamma_retval { typedef Scalar type; }; #if !EIGEN_HAS_C99_MATH template struct polygamma_impl { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Scalar run(Scalar n, Scalar x) { EIGEN_STATIC_ASSERT((internal::is_same::value == false), THIS_TYPE_IS_NOT_SUPPORTED); return Scalar(0); } }; #else template struct polygamma_impl { EIGEN_DEVICE_FUNC static Scalar run(Scalar n, Scalar x) { Scalar zero = 0.0, one = 1.0; Scalar nplus = n + one; const Scalar nan = NumTraits::quiet_NaN(); // Check that n is a non-negative integer if (numext::floor(n) != n || n < zero) { return nan; } // Just return the digamma function for n = 0 else if (n == zero) { return digamma_impl::run(x); } // Use the same implementation as scipy else { Scalar factorial = numext::exp(lgamma_impl::run(nplus)); return numext::pow(-one, nplus) * factorial * zeta_impl::run(nplus, x); } } }; #endif // EIGEN_HAS_C99_MATH /************************************************************************************************ * Implementation of betainc (incomplete beta integral), based on Cephes but requires C++11/C99 * ************************************************************************************************/ template struct betainc_retval { typedef Scalar type; }; #if !EIGEN_HAS_C99_MATH template struct betainc_impl { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Scalar run(Scalar a, Scalar b, Scalar x) { EIGEN_STATIC_ASSERT((internal::is_same::value == false), THIS_TYPE_IS_NOT_SUPPORTED); return Scalar(0); } }; #else template struct betainc_impl { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Scalar run(Scalar, Scalar, Scalar) { /* betaincf.c * * Incomplete beta integral * * * SYNOPSIS: * * float a, b, x, y, betaincf(); * * y = betaincf( a, b, x ); * * * DESCRIPTION: * * Returns incomplete beta integral of the arguments, evaluated * from zero to x. The function is defined as * * x * - - * | (a+b) | | a-1 b-1 * ----------- | t (1-t) dt. * - - | | * | (a) | (b) - * 0 * * The domain of definition is 0 <= x <= 1. In this * implementation a and b are restricted to positive values. * The integral from x to 1 may be obtained by the symmetry * relation * * 1 - betainc( a, b, x ) = betainc( b, a, 1-x ). * * The integral is evaluated by a continued fraction expansion. * If a < 1, the function calls itself recursively after a * transformation to increase a to a+1. * * ACCURACY (float): * * Tested at random points (a,b,x) with a and b in the indicated * interval and x between 0 and 1. * * arithmetic domain # trials peak rms * Relative error: * IEEE 0,30 10000 3.7e-5 5.1e-6 * IEEE 0,100 10000 1.7e-4 2.5e-5 * The useful domain for relative error is limited by underflow * of the single precision exponential function. * Absolute error: * IEEE 0,30 100000 2.2e-5 9.6e-7 * IEEE 0,100 10000 6.5e-5 3.7e-6 * * Larger errors may occur for extreme ratios of a and b. * * ACCURACY (double): * arithmetic domain # trials peak rms * IEEE 0,5 10000 6.9e-15 4.5e-16 * IEEE 0,85 250000 2.2e-13 1.7e-14 * IEEE 0,1000 30000 5.3e-12 6.3e-13 * IEEE 0,10000 250000 9.3e-11 7.1e-12 * IEEE 0,100000 10000 8.7e-10 4.8e-11 * Outputs smaller than the IEEE gradual underflow threshold * were excluded from these statistics. * * ERROR MESSAGES: * message condition value returned * incbet domain x<0, x>1 nan * incbet underflow nan */ EIGEN_STATIC_ASSERT((internal::is_same::value == false), THIS_TYPE_IS_NOT_SUPPORTED); return Scalar(0); } }; /* Continued fraction expansion #1 for incomplete beta integral (small_branch = True) * Continued fraction expansion #2 for incomplete beta integral (small_branch = False) */ template struct incbeta_cfe { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Scalar run(Scalar a, Scalar b, Scalar x, bool small_branch) { EIGEN_STATIC_ASSERT((internal::is_same::value || internal::is_same::value), THIS_TYPE_IS_NOT_SUPPORTED); const Scalar big = cephes_helper::big(); const Scalar machep = cephes_helper::machep(); const Scalar biginv = cephes_helper::biginv(); const Scalar zero = 0; const Scalar one = 1; const Scalar two = 2; Scalar xk, pk, pkm1, pkm2, qk, qkm1, qkm2; Scalar k1, k2, k3, k4, k5, k6, k7, k8, k26update; Scalar ans; int n; const int num_iters = (internal::is_same::value) ? 100 : 300; const Scalar thresh = (internal::is_same::value) ? machep : Scalar(3) * machep; Scalar r = (internal::is_same::value) ? zero : one; if (small_branch) { k1 = a; k2 = a + b; k3 = a; k4 = a + one; k5 = one; k6 = b - one; k7 = k4; k8 = a + two; k26update = one; } else { k1 = a; k2 = b - one; k3 = a; k4 = a + one; k5 = one; k6 = a + b; k7 = a + one; k8 = a + two; k26update = -one; x = x / (one - x); } pkm2 = zero; qkm2 = one; pkm1 = one; qkm1 = one; ans = one; n = 0; do { xk = -(x * k1 * k2) / (k3 * k4); pk = pkm1 + pkm2 * xk; qk = qkm1 + qkm2 * xk; pkm2 = pkm1; pkm1 = pk; qkm2 = qkm1; qkm1 = qk; xk = (x * k5 * k6) / (k7 * k8); pk = pkm1 + pkm2 * xk; qk = qkm1 + qkm2 * xk; pkm2 = pkm1; pkm1 = pk; qkm2 = qkm1; qkm1 = qk; if (qk != zero) { r = pk / qk; if (numext::abs(ans - r) < numext::abs(r) * thresh) { return r; } ans = r; } k1 += one; k2 += k26update; k3 += two; k4 += two; k5 += one; k6 -= k26update; k7 += two; k8 += two; if ((numext::abs(qk) + numext::abs(pk)) > big) { pkm2 *= biginv; pkm1 *= biginv; qkm2 *= biginv; qkm1 *= biginv; } if ((numext::abs(qk) < biginv) || (numext::abs(pk) < biginv)) { pkm2 *= big; pkm1 *= big; qkm2 *= big; qkm1 *= big; } } while (++n < num_iters); return ans; } }; /* Helper functions depending on the Scalar type */ template struct betainc_helper {}; template <> struct betainc_helper { /* Core implementation, assumes a large (> 1.0) */ EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE float incbsa(float aa, float bb, float xx) { float ans, a, b, t, x, onemx; bool reversed_a_b = false; onemx = 1.0f - xx; /* see if x is greater than the mean */ if (xx > (aa / (aa + bb))) { reversed_a_b = true; a = bb; b = aa; t = xx; x = onemx; } else { a = aa; b = bb; t = onemx; x = xx; } /* Choose expansion for optimal convergence */ if (b > 10.0f) { if (numext::abs(b * x / a) < 0.3f) { t = betainc_helper::incbps(a, b, x); if (reversed_a_b) t = 1.0f - t; return t; } } ans = x * (a + b - 2.0f) / (a - 1.0f); if (ans < 1.0f) { ans = incbeta_cfe::run(a, b, x, true /* small_branch */); t = b * numext::log(t); } else { ans = incbeta_cfe::run(a, b, x, false /* small_branch */); t = (b - 1.0f) * numext::log(t); } t += a * numext::log(x) + lgamma_impl::run(a + b) - lgamma_impl::run(a) - lgamma_impl::run(b); t += numext::log(ans / a); t = numext::exp(t); if (reversed_a_b) t = 1.0f - t; return t; } EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE float incbps(float a, float b, float x) { float t, u, y, s; const float machep = cephes_helper::machep(); y = a * numext::log(x) + (b - 1.0f) * numext::log1p(-x) - numext::log(a); y -= lgamma_impl::run(a) + lgamma_impl::run(b); y += lgamma_impl::run(a + b); t = x / (1.0f - x); s = 0.0f; u = 1.0f; do { b -= 1.0f; if (b == 0.0f) { break; } a += 1.0f; u *= t * b / a; s += u; } while (numext::abs(u) > machep); return numext::exp(y) * (1.0f + s); } }; template <> struct betainc_impl { EIGEN_DEVICE_FUNC static float run(float a, float b, float x) { const float nan = NumTraits::quiet_NaN(); float ans, t; if (a <= 0.0f) return nan; if (b <= 0.0f) return nan; if ((x <= 0.0f) || (x >= 1.0f)) { if (x == 0.0f) return 0.0f; if (x == 1.0f) return 1.0f; // mtherr("betaincf", DOMAIN); return nan; } /* transformation for small aa */ if (a <= 1.0f) { ans = betainc_helper::incbsa(a + 1.0f, b, x); t = a * numext::log(x) + b * numext::log1p(-x) + lgamma_impl::run(a + b) - lgamma_impl::run(a + 1.0f) - lgamma_impl::run(b); return (ans + numext::exp(t)); } else { return betainc_helper::incbsa(a, b, x); } } }; template <> struct betainc_helper { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE double incbps(double a, double b, double x) { const double machep = cephes_helper::machep(); double s, t, u, v, n, t1, z, ai; ai = 1.0 / a; u = (1.0 - b) * x; v = u / (a + 1.0); t1 = v; t = u; n = 2.0; s = 0.0; z = machep * ai; while (numext::abs(v) > z) { u = (n - b) * x / n; t *= u; v = t / (a + n); s += v; n += 1.0; } s += t1; s += ai; u = a * numext::log(x); // TODO: gamma() is not directly implemented in Eigen. /* if ((a + b) < maxgam && numext::abs(u) < maxlog) { t = gamma(a + b) / (gamma(a) * gamma(b)); s = s * t * pow(x, a); } */ t = lgamma_impl::run(a + b) - lgamma_impl::run(a) - lgamma_impl::run(b) + u + numext::log(s); return s = numext::exp(t); } }; template <> struct betainc_impl { EIGEN_DEVICE_FUNC static double run(double aa, double bb, double xx) { const double nan = NumTraits::quiet_NaN(); const double machep = cephes_helper::machep(); // const double maxgam = 171.624376956302725; double a, b, t, x, xc, w, y; bool reversed_a_b = false; if (aa <= 0.0 || bb <= 0.0) { return nan; // goto domerr; } if ((xx <= 0.0) || (xx >= 1.0)) { if (xx == 0.0) return (0.0); if (xx == 1.0) return (1.0); // mtherr("incbet", DOMAIN); return nan; } if ((bb * xx) <= 1.0 && xx <= 0.95) { return betainc_helper::incbps(aa, bb, xx); } w = 1.0 - xx; /* Reverse a and b if x is greater than the mean. */ if (xx > (aa / (aa + bb))) { reversed_a_b = true; a = bb; b = aa; xc = xx; x = w; } else { a = aa; b = bb; xc = w; x = xx; } if (reversed_a_b && (b * x) <= 1.0 && x <= 0.95) { t = betainc_helper::incbps(a, b, x); if (t <= machep) { t = 1.0 - machep; } else { t = 1.0 - t; } return t; } /* Choose expansion for better convergence. */ y = x * (a + b - 2.0) - (a - 1.0); if (y < 0.0) { w = incbeta_cfe::run(a, b, x, true /* small_branch */); } else { w = incbeta_cfe::run(a, b, x, false /* small_branch */) / xc; } /* Multiply w by the factor a b _ _ _ x (1-x) | (a+b) / ( a | (a) | (b) ) . */ y = a * numext::log(x); t = b * numext::log(xc); // TODO: gamma is not directly implemented in Eigen. /* if ((a + b) < maxgam && numext::abs(y) < maxlog && numext::abs(t) < maxlog) { t = pow(xc, b); t *= pow(x, a); t /= a; t *= w; t *= gamma(a + b) / (gamma(a) * gamma(b)); } else { */ /* Resort to logarithms. */ y += t + lgamma_impl::run(a + b) - lgamma_impl::run(a) - lgamma_impl::run(b); y += numext::log(w / a); t = numext::exp(y); /* } */ // done: if (reversed_a_b) { if (t <= machep) { t = 1.0 - machep; } else { t = 1.0 - t; } } return t; } }; #endif // EIGEN_HAS_C99_MATH } // end namespace internal namespace numext { template EIGEN_DEVICE_FUNC inline EIGEN_MATHFUNC_RETVAL(lgamma, Scalar) lgamma(const Scalar& x) { return EIGEN_MATHFUNC_IMPL(lgamma, Scalar)::run(x); } template EIGEN_DEVICE_FUNC inline EIGEN_MATHFUNC_RETVAL(digamma, Scalar) digamma(const Scalar& x) { return EIGEN_MATHFUNC_IMPL(digamma, Scalar)::run(x); } template EIGEN_DEVICE_FUNC inline EIGEN_MATHFUNC_RETVAL(zeta, Scalar) zeta(const Scalar& x, const Scalar& q) { return EIGEN_MATHFUNC_IMPL(zeta, Scalar)::run(x, q); } template EIGEN_DEVICE_FUNC inline EIGEN_MATHFUNC_RETVAL(polygamma, Scalar) polygamma(const Scalar& n, const Scalar& x) { return EIGEN_MATHFUNC_IMPL(polygamma, Scalar)::run(n, x); } template EIGEN_DEVICE_FUNC inline EIGEN_MATHFUNC_RETVAL(erf, Scalar) erf(const Scalar& x) { return EIGEN_MATHFUNC_IMPL(erf, Scalar)::run(x); } template EIGEN_DEVICE_FUNC inline EIGEN_MATHFUNC_RETVAL(erfc, Scalar) erfc(const Scalar& x) { return EIGEN_MATHFUNC_IMPL(erfc, Scalar)::run(x); } template EIGEN_DEVICE_FUNC inline EIGEN_MATHFUNC_RETVAL(ndtri, Scalar) ndtri(const Scalar& x) { return EIGEN_MATHFUNC_IMPL(ndtri, Scalar)::run(x); } template EIGEN_DEVICE_FUNC inline EIGEN_MATHFUNC_RETVAL(igamma, Scalar) igamma(const Scalar& a, const Scalar& x) { return EIGEN_MATHFUNC_IMPL(igamma, Scalar)::run(a, x); } template EIGEN_DEVICE_FUNC inline EIGEN_MATHFUNC_RETVAL(igamma_der_a, Scalar) igamma_der_a(const Scalar& a, const Scalar& x) { return EIGEN_MATHFUNC_IMPL(igamma_der_a, Scalar)::run(a, x); } template EIGEN_DEVICE_FUNC inline EIGEN_MATHFUNC_RETVAL(gamma_sample_der_alpha, Scalar) gamma_sample_der_alpha(const Scalar& a, const Scalar& x) { return EIGEN_MATHFUNC_IMPL(gamma_sample_der_alpha, Scalar)::run(a, x); } template EIGEN_DEVICE_FUNC inline EIGEN_MATHFUNC_RETVAL(igammac, Scalar) igammac(const Scalar& a, const Scalar& x) { return EIGEN_MATHFUNC_IMPL(igammac, Scalar)::run(a, x); } template EIGEN_DEVICE_FUNC inline EIGEN_MATHFUNC_RETVAL(betainc, Scalar) betainc(const Scalar& a, const Scalar& b, const Scalar& x) { return EIGEN_MATHFUNC_IMPL(betainc, Scalar)::run(a, b, x); } } // end namespace numext } // end namespace Eigen #endif // EIGEN_SPECIAL_FUNCTIONS_H RcppEigen/inst/include/unsupported/Eigen/src/SpecialFunctions/SpecialFunctionsFunctors.h0000644000176200001440000002666414562417221031375 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2016 Eugene Brevdo // Copyright (C) 2016 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SPECIALFUNCTIONS_FUNCTORS_H #define EIGEN_SPECIALFUNCTIONS_FUNCTORS_H namespace Eigen { namespace internal { /** \internal * \brief Template functor to compute the incomplete gamma function igamma(a, x) * * \sa class CwiseBinaryOp, Cwise::igamma */ template struct scalar_igamma_op : binary_op_base { EIGEN_EMPTY_STRUCT_CTOR(scalar_igamma_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& a, const Scalar& x) const { using numext::igamma; return igamma(a, x); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& x) const { return internal::pigamma(a, x); } }; template struct functor_traits > { enum { // Guesstimate Cost = 20 * NumTraits::MulCost + 10 * NumTraits::AddCost, PacketAccess = packet_traits::HasIGamma }; }; /** \internal * \brief Template functor to compute the derivative of the incomplete gamma * function igamma_der_a(a, x) * * \sa class CwiseBinaryOp, Cwise::igamma_der_a */ template struct scalar_igamma_der_a_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_igamma_der_a_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator()(const Scalar& a, const Scalar& x) const { using numext::igamma_der_a; return igamma_der_a(a, x); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& x) const { return internal::pigamma_der_a(a, x); } }; template struct functor_traits > { enum { // 2x the cost of igamma Cost = 40 * NumTraits::MulCost + 20 * NumTraits::AddCost, PacketAccess = packet_traits::HasIGammaDerA }; }; /** \internal * \brief Template functor to compute the derivative of the sample * of a Gamma(alpha, 1) random variable with respect to the parameter alpha * gamma_sample_der_alpha(alpha, sample) * * \sa class CwiseBinaryOp, Cwise::gamma_sample_der_alpha */ template struct scalar_gamma_sample_der_alpha_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_gamma_sample_der_alpha_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator()(const Scalar& alpha, const Scalar& sample) const { using numext::gamma_sample_der_alpha; return gamma_sample_der_alpha(alpha, sample); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& alpha, const Packet& sample) const { return internal::pgamma_sample_der_alpha(alpha, sample); } }; template struct functor_traits > { enum { // 2x the cost of igamma, minus the lgamma cost (the lgamma cancels out) Cost = 30 * NumTraits::MulCost + 15 * NumTraits::AddCost, PacketAccess = packet_traits::HasGammaSampleDerAlpha }; }; /** \internal * \brief Template functor to compute the complementary incomplete gamma function igammac(a, x) * * \sa class CwiseBinaryOp, Cwise::igammac */ template struct scalar_igammac_op : binary_op_base { EIGEN_EMPTY_STRUCT_CTOR(scalar_igammac_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& a, const Scalar& x) const { using numext::igammac; return igammac(a, x); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& x) const { return internal::pigammac(a, x); } }; template struct functor_traits > { enum { // Guesstimate Cost = 20 * NumTraits::MulCost + 10 * NumTraits::AddCost, PacketAccess = packet_traits::HasIGammac }; }; /** \internal * \brief Template functor to compute the incomplete beta integral betainc(a, b, x) * */ template struct scalar_betainc_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_betainc_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& x, const Scalar& a, const Scalar& b) const { using numext::betainc; return betainc(x, a, b); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& x, const Packet& a, const Packet& b) const { return internal::pbetainc(x, a, b); } }; template struct functor_traits > { enum { // Guesstimate Cost = 400 * NumTraits::MulCost + 400 * NumTraits::AddCost, PacketAccess = packet_traits::HasBetaInc }; }; /** \internal * \brief Template functor to compute the natural log of the absolute * value of Gamma of a scalar * \sa class CwiseUnaryOp, Cwise::lgamma() */ template struct scalar_lgamma_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_lgamma_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& a) const { using numext::lgamma; return lgamma(a); } typedef typename packet_traits::type Packet; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet packetOp(const Packet& a) const { return internal::plgamma(a); } }; template struct functor_traits > { enum { // Guesstimate Cost = 10 * NumTraits::MulCost + 5 * NumTraits::AddCost, PacketAccess = packet_traits::HasLGamma }; }; /** \internal * \brief Template functor to compute psi, the derivative of lgamma of a scalar. * \sa class CwiseUnaryOp, Cwise::digamma() */ template struct scalar_digamma_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_digamma_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& a) const { using numext::digamma; return digamma(a); } typedef typename packet_traits::type Packet; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet packetOp(const Packet& a) const { return internal::pdigamma(a); } }; template struct functor_traits > { enum { // Guesstimate Cost = 10 * NumTraits::MulCost + 5 * NumTraits::AddCost, PacketAccess = packet_traits::HasDiGamma }; }; /** \internal * \brief Template functor to compute the Riemann Zeta function of two arguments. * \sa class CwiseUnaryOp, Cwise::zeta() */ template struct scalar_zeta_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_zeta_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& x, const Scalar& q) const { using numext::zeta; return zeta(x, q); } typedef typename packet_traits::type Packet; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet packetOp(const Packet& x, const Packet& q) const { return internal::pzeta(x, q); } }; template struct functor_traits > { enum { // Guesstimate Cost = 10 * NumTraits::MulCost + 5 * NumTraits::AddCost, PacketAccess = packet_traits::HasZeta }; }; /** \internal * \brief Template functor to compute the polygamma function. * \sa class CwiseUnaryOp, Cwise::polygamma() */ template struct scalar_polygamma_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_polygamma_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& n, const Scalar& x) const { using numext::polygamma; return polygamma(n, x); } typedef typename packet_traits::type Packet; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet packetOp(const Packet& n, const Packet& x) const { return internal::ppolygamma(n, x); } }; template struct functor_traits > { enum { // Guesstimate Cost = 10 * NumTraits::MulCost + 5 * NumTraits::AddCost, PacketAccess = packet_traits::HasPolygamma }; }; /** \internal * \brief Template functor to compute the error function of a scalar * \sa class CwiseUnaryOp, ArrayBase::erf() */ template struct scalar_erf_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_erf_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator()(const Scalar& a) const { return numext::erf(a); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet packetOp(const Packet& x) const { return perf(x); } }; template struct functor_traits > { enum { PacketAccess = packet_traits::HasErf, Cost = (PacketAccess #ifdef EIGEN_VECTORIZE_FMA // TODO(rmlarsen): Move the FMA cost model to a central location. // Haswell can issue 2 add/mul/madd per cycle. // 10 pmadd, 2 pmul, 1 div, 2 other ? (2 * NumTraits::AddCost + 7 * NumTraits::MulCost + scalar_div_cost::HasDiv>::value) #else ? (12 * NumTraits::AddCost + 12 * NumTraits::MulCost + scalar_div_cost::HasDiv>::value) #endif // Assume for simplicity that this is as expensive as an exp(). : (functor_traits >::Cost)) }; }; /** \internal * \brief Template functor to compute the Complementary Error Function * of a scalar * \sa class CwiseUnaryOp, Cwise::erfc() */ template struct scalar_erfc_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_erfc_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& a) const { using numext::erfc; return erfc(a); } typedef typename packet_traits::type Packet; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet packetOp(const Packet& a) const { return internal::perfc(a); } }; template struct functor_traits > { enum { // Guesstimate Cost = 10 * NumTraits::MulCost + 5 * NumTraits::AddCost, PacketAccess = packet_traits::HasErfc }; }; /** \internal * \brief Template functor to compute the Inverse of the normal distribution * function of a scalar * \sa class CwiseUnaryOp, Cwise::ndtri() */ template struct scalar_ndtri_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_ndtri_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& a) const { using numext::ndtri; return ndtri(a); } typedef typename packet_traits::type Packet; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet packetOp(const Packet& a) const { return internal::pndtri(a); } }; template struct functor_traits > { enum { // On average, We are evaluating rational functions with degree N=9 in the // numerator and denominator. This results in 2*N additions and 2*N // multiplications. Cost = 18 * NumTraits::MulCost + 18 * NumTraits::AddCost, PacketAccess = packet_traits::HasNdtri }; }; } // end namespace internal } // end namespace Eigen #endif // EIGEN_SPECIALFUNCTIONS_FUNCTORS_H RcppEigen/inst/include/unsupported/Eigen/src/SpecialFunctions/BesselFunctionsPacketMath.h0000644000176200001440000000764614562417221031447 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2016 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_BESSELFUNCTIONS_PACKETMATH_H #define EIGEN_BESSELFUNCTIONS_PACKETMATH_H namespace Eigen { namespace internal { /** \internal \returns the exponentially scaled modified Bessel function of * order zero i0(\a a) (coeff-wise) */ template EIGEN_DEVICE_FUNC EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet pbessel_i0(const Packet& x) { return numext::bessel_i0(x); } /** \internal \returns the exponentially scaled modified Bessel function of * order zero i0e(\a a) (coeff-wise) */ template EIGEN_DEVICE_FUNC EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet pbessel_i0e(const Packet& x) { return numext::bessel_i0e(x); } /** \internal \returns the exponentially scaled modified Bessel function of * order one i1(\a a) (coeff-wise) */ template EIGEN_DEVICE_FUNC EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet pbessel_i1(const Packet& x) { return numext::bessel_i1(x); } /** \internal \returns the exponentially scaled modified Bessel function of * order one i1e(\a a) (coeff-wise) */ template EIGEN_DEVICE_FUNC EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet pbessel_i1e(const Packet& x) { return numext::bessel_i1e(x); } /** \internal \returns the exponentially scaled modified Bessel function of * order zero j0(\a a) (coeff-wise) */ template EIGEN_DEVICE_FUNC EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet pbessel_j0(const Packet& x) { return numext::bessel_j0(x); } /** \internal \returns the exponentially scaled modified Bessel function of * order zero j1(\a a) (coeff-wise) */ template EIGEN_DEVICE_FUNC EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet pbessel_j1(const Packet& x) { return numext::bessel_j1(x); } /** \internal \returns the exponentially scaled modified Bessel function of * order one y0(\a a) (coeff-wise) */ template EIGEN_DEVICE_FUNC EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet pbessel_y0(const Packet& x) { return numext::bessel_y0(x); } /** \internal \returns the exponentially scaled modified Bessel function of * order one y1(\a a) (coeff-wise) */ template EIGEN_DEVICE_FUNC EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet pbessel_y1(const Packet& x) { return numext::bessel_y1(x); } /** \internal \returns the exponentially scaled modified Bessel function of * order zero k0(\a a) (coeff-wise) */ template EIGEN_DEVICE_FUNC EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet pbessel_k0(const Packet& x) { return numext::bessel_k0(x); } /** \internal \returns the exponentially scaled modified Bessel function of * order zero k0e(\a a) (coeff-wise) */ template EIGEN_DEVICE_FUNC EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet pbessel_k0e(const Packet& x) { return numext::bessel_k0e(x); } /** \internal \returns the exponentially scaled modified Bessel function of * order one k1e(\a a) (coeff-wise) */ template EIGEN_DEVICE_FUNC EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet pbessel_k1(const Packet& x) { return numext::bessel_k1(x); } /** \internal \returns the exponentially scaled modified Bessel function of * order one k1e(\a a) (coeff-wise) */ template EIGEN_DEVICE_FUNC EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet pbessel_k1e(const Packet& x) { return numext::bessel_k1e(x); } } // end namespace internal } // end namespace Eigen #endif // EIGEN_BESSELFUNCTIONS_PACKETMATH_H RcppEigen/inst/include/unsupported/Eigen/src/SpecialFunctions/BesselFunctionsImpl.h0000644000176200001440000021000014562417221030303 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2015 Eugene Brevdo // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_BESSEL_FUNCTIONS_H #define EIGEN_BESSEL_FUNCTIONS_H namespace Eigen { namespace internal { // Parts of this code are based on the Cephes Math Library. // // Cephes Math Library Release 2.8: June, 2000 // Copyright 1984, 1987, 1992, 2000 by Stephen L. Moshier // // Permission has been kindly provided by the original author // to incorporate the Cephes software into the Eigen codebase: // // From: Stephen Moshier // To: Eugene Brevdo // Subject: Re: Permission to wrap several cephes functions in Eigen // // Hello Eugene, // // Thank you for writing. // // If your licensing is similar to BSD, the formal way that has been // handled is simply to add a statement to the effect that you are incorporating // the Cephes software by permission of the author. // // Good luck with your project, // Steve /**************************************************************************** * Implementation of Bessel function, based on Cephes * ****************************************************************************/ template struct bessel_i0e_retval { typedef Scalar type; }; template ::type> struct generic_i0e { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T&) { EIGEN_STATIC_ASSERT((internal::is_same::value == false), THIS_TYPE_IS_NOT_SUPPORTED); return ScalarType(0); } }; template struct generic_i0e { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T& x) { /* i0ef.c * * Modified Bessel function of order zero, * exponentially scaled * * * * SYNOPSIS: * * float x, y, i0ef(); * * y = i0ef( x ); * * * * DESCRIPTION: * * Returns exponentially scaled modified Bessel function * of order zero of the argument. * * The function is defined as i0e(x) = exp(-|x|) j0( ix ). * * * * ACCURACY: * * Relative error: * arithmetic domain # trials peak rms * IEEE 0,30 100000 3.7e-7 7.0e-8 * See i0f(). * */ const float A[] = {-1.30002500998624804212E-8f, 6.04699502254191894932E-8f, -2.67079385394061173391E-7f, 1.11738753912010371815E-6f, -4.41673835845875056359E-6f, 1.64484480707288970893E-5f, -5.75419501008210370398E-5f, 1.88502885095841655729E-4f, -5.76375574538582365885E-4f, 1.63947561694133579842E-3f, -4.32430999505057594430E-3f, 1.05464603945949983183E-2f, -2.37374148058994688156E-2f, 4.93052842396707084878E-2f, -9.49010970480476444210E-2f, 1.71620901522208775349E-1f, -3.04682672343198398683E-1f, 6.76795274409476084995E-1f}; const float B[] = {3.39623202570838634515E-9f, 2.26666899049817806459E-8f, 2.04891858946906374183E-7f, 2.89137052083475648297E-6f, 6.88975834691682398426E-5f, 3.36911647825569408990E-3f, 8.04490411014108831608E-1f}; T y = pabs(x); T y_le_eight = internal::pchebevl::run( pmadd(pset1(0.5f), y, pset1(-2.0f)), A); T y_gt_eight = pmul( internal::pchebevl::run( psub(pdiv(pset1(32.0f), y), pset1(2.0f)), B), prsqrt(y)); // TODO: Perhaps instead check whether all packet elements are in // [-8, 8] and evaluate a branch based off of that. It's possible // in practice most elements are in this region. return pselect(pcmp_le(y, pset1(8.0f)), y_le_eight, y_gt_eight); } }; template struct generic_i0e { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T& x) { /* i0e.c * * Modified Bessel function of order zero, * exponentially scaled * * * * SYNOPSIS: * * double x, y, i0e(); * * y = i0e( x ); * * * * DESCRIPTION: * * Returns exponentially scaled modified Bessel function * of order zero of the argument. * * The function is defined as i0e(x) = exp(-|x|) j0( ix ). * * * * ACCURACY: * * Relative error: * arithmetic domain # trials peak rms * IEEE 0,30 30000 5.4e-16 1.2e-16 * See i0(). * */ const double A[] = {-4.41534164647933937950E-18, 3.33079451882223809783E-17, -2.43127984654795469359E-16, 1.71539128555513303061E-15, -1.16853328779934516808E-14, 7.67618549860493561688E-14, -4.85644678311192946090E-13, 2.95505266312963983461E-12, -1.72682629144155570723E-11, 9.67580903537323691224E-11, -5.18979560163526290666E-10, 2.65982372468238665035E-9, -1.30002500998624804212E-8, 6.04699502254191894932E-8, -2.67079385394061173391E-7, 1.11738753912010371815E-6, -4.41673835845875056359E-6, 1.64484480707288970893E-5, -5.75419501008210370398E-5, 1.88502885095841655729E-4, -5.76375574538582365885E-4, 1.63947561694133579842E-3, -4.32430999505057594430E-3, 1.05464603945949983183E-2, -2.37374148058994688156E-2, 4.93052842396707084878E-2, -9.49010970480476444210E-2, 1.71620901522208775349E-1, -3.04682672343198398683E-1, 6.76795274409476084995E-1}; const double B[] = { -7.23318048787475395456E-18, -4.83050448594418207126E-18, 4.46562142029675999901E-17, 3.46122286769746109310E-17, -2.82762398051658348494E-16, -3.42548561967721913462E-16, 1.77256013305652638360E-15, 3.81168066935262242075E-15, -9.55484669882830764870E-15, -4.15056934728722208663E-14, 1.54008621752140982691E-14, 3.85277838274214270114E-13, 7.18012445138366623367E-13, -1.79417853150680611778E-12, -1.32158118404477131188E-11, -3.14991652796324136454E-11, 1.18891471078464383424E-11, 4.94060238822496958910E-10, 3.39623202570838634515E-9, 2.26666899049817806459E-8, 2.04891858946906374183E-7, 2.89137052083475648297E-6, 6.88975834691682398426E-5, 3.36911647825569408990E-3, 8.04490411014108831608E-1}; T y = pabs(x); T y_le_eight = internal::pchebevl::run( pmadd(pset1(0.5), y, pset1(-2.0)), A); T y_gt_eight = pmul( internal::pchebevl::run( psub(pdiv(pset1(32.0), y), pset1(2.0)), B), prsqrt(y)); // TODO: Perhaps instead check whether all packet elements are in // [-8, 8] and evaluate a branch based off of that. It's possible // in practice most elements are in this region. return pselect(pcmp_le(y, pset1(8.0)), y_le_eight, y_gt_eight); } }; template struct bessel_i0e_impl { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T x) { return generic_i0e::run(x); } }; template struct bessel_i0_retval { typedef Scalar type; }; template ::type> struct generic_i0 { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T& x) { return pmul( pexp(pabs(x)), generic_i0e::run(x)); } }; template struct bessel_i0_impl { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T x) { return generic_i0::run(x); } }; template struct bessel_i1e_retval { typedef Scalar type; }; template ::type > struct generic_i1e { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T&) { EIGEN_STATIC_ASSERT((internal::is_same::value == false), THIS_TYPE_IS_NOT_SUPPORTED); return ScalarType(0); } }; template struct generic_i1e { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T& x) { /* i1ef.c * * Modified Bessel function of order one, * exponentially scaled * * * * SYNOPSIS: * * float x, y, i1ef(); * * y = i1ef( x ); * * * * DESCRIPTION: * * Returns exponentially scaled modified Bessel function * of order one of the argument. * * The function is defined as i1(x) = -i exp(-|x|) j1( ix ). * * * * ACCURACY: * * Relative error: * arithmetic domain # trials peak rms * IEEE 0, 30 30000 1.5e-6 1.5e-7 * See i1(). * */ const float A[] = {9.38153738649577178388E-9f, -4.44505912879632808065E-8f, 2.00329475355213526229E-7f, -8.56872026469545474066E-7f, 3.47025130813767847674E-6f, -1.32731636560394358279E-5f, 4.78156510755005422638E-5f, -1.61760815825896745588E-4f, 5.12285956168575772895E-4f, -1.51357245063125314899E-3f, 4.15642294431288815669E-3f, -1.05640848946261981558E-2f, 2.47264490306265168283E-2f, -5.29459812080949914269E-2f, 1.02643658689847095384E-1f, -1.76416518357834055153E-1f, 2.52587186443633654823E-1f}; const float B[] = {-3.83538038596423702205E-9f, -2.63146884688951950684E-8f, -2.51223623787020892529E-7f, -3.88256480887769039346E-6f, -1.10588938762623716291E-4f, -9.76109749136146840777E-3f, 7.78576235018280120474E-1f}; T y = pabs(x); T y_le_eight = pmul(y, internal::pchebevl::run( pmadd(pset1(0.5f), y, pset1(-2.0f)), A)); T y_gt_eight = pmul( internal::pchebevl::run( psub(pdiv(pset1(32.0f), y), pset1(2.0f)), B), prsqrt(y)); // TODO: Perhaps instead check whether all packet elements are in // [-8, 8] and evaluate a branch based off of that. It's possible // in practice most elements are in this region. y = pselect(pcmp_le(y, pset1(8.0f)), y_le_eight, y_gt_eight); return pselect(pcmp_lt(x, pset1(0.0f)), pnegate(y), y); } }; template struct generic_i1e { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T& x) { /* i1e.c * * Modified Bessel function of order one, * exponentially scaled * * * * SYNOPSIS: * * double x, y, i1e(); * * y = i1e( x ); * * * * DESCRIPTION: * * Returns exponentially scaled modified Bessel function * of order one of the argument. * * The function is defined as i1(x) = -i exp(-|x|) j1( ix ). * * * * ACCURACY: * * Relative error: * arithmetic domain # trials peak rms * IEEE 0, 30 30000 2.0e-15 2.0e-16 * See i1(). * */ const double A[] = {2.77791411276104639959E-18, -2.11142121435816608115E-17, 1.55363195773620046921E-16, -1.10559694773538630805E-15, 7.60068429473540693410E-15, -5.04218550472791168711E-14, 3.22379336594557470981E-13, -1.98397439776494371520E-12, 1.17361862988909016308E-11, -6.66348972350202774223E-11, 3.62559028155211703701E-10, -1.88724975172282928790E-9, 9.38153738649577178388E-9, -4.44505912879632808065E-8, 2.00329475355213526229E-7, -8.56872026469545474066E-7, 3.47025130813767847674E-6, -1.32731636560394358279E-5, 4.78156510755005422638E-5, -1.61760815825896745588E-4, 5.12285956168575772895E-4, -1.51357245063125314899E-3, 4.15642294431288815669E-3, -1.05640848946261981558E-2, 2.47264490306265168283E-2, -5.29459812080949914269E-2, 1.02643658689847095384E-1, -1.76416518357834055153E-1, 2.52587186443633654823E-1}; const double B[] = { 7.51729631084210481353E-18, 4.41434832307170791151E-18, -4.65030536848935832153E-17, -3.20952592199342395980E-17, 2.96262899764595013876E-16, 3.30820231092092828324E-16, -1.88035477551078244854E-15, -3.81440307243700780478E-15, 1.04202769841288027642E-14, 4.27244001671195135429E-14, -2.10154184277266431302E-14, -4.08355111109219731823E-13, -7.19855177624590851209E-13, 2.03562854414708950722E-12, 1.41258074366137813316E-11, 3.25260358301548823856E-11, -1.89749581235054123450E-11, -5.58974346219658380687E-10, -3.83538038596423702205E-9, -2.63146884688951950684E-8, -2.51223623787020892529E-7, -3.88256480887769039346E-6, -1.10588938762623716291E-4, -9.76109749136146840777E-3, 7.78576235018280120474E-1}; T y = pabs(x); T y_le_eight = pmul(y, internal::pchebevl::run( pmadd(pset1(0.5), y, pset1(-2.0)), A)); T y_gt_eight = pmul( internal::pchebevl::run( psub(pdiv(pset1(32.0), y), pset1(2.0)), B), prsqrt(y)); // TODO: Perhaps instead check whether all packet elements are in // [-8, 8] and evaluate a branch based off of that. It's possible // in practice most elements are in this region. y = pselect(pcmp_le(y, pset1(8.0)), y_le_eight, y_gt_eight); return pselect(pcmp_lt(x, pset1(0.0)), pnegate(y), y); } }; template struct bessel_i1e_impl { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T x) { return generic_i1e::run(x); } }; template struct bessel_i1_retval { typedef T type; }; template ::type> struct generic_i1 { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T& x) { return pmul( pexp(pabs(x)), generic_i1e::run(x)); } }; template struct bessel_i1_impl { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T x) { return generic_i1::run(x); } }; template struct bessel_k0e_retval { typedef T type; }; template ::type> struct generic_k0e { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T&) { EIGEN_STATIC_ASSERT((internal::is_same::value == false), THIS_TYPE_IS_NOT_SUPPORTED); return ScalarType(0); } }; template struct generic_k0e { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T& x) { /* k0ef.c * Modified Bessel function, third kind, order zero, * exponentially scaled * * * * SYNOPSIS: * * float x, y, k0ef(); * * y = k0ef( x ); * * * * DESCRIPTION: * * Returns exponentially scaled modified Bessel function * of the third kind of order zero of the argument. * * * * ACCURACY: * * Relative error: * arithmetic domain # trials peak rms * IEEE 0, 30 30000 8.1e-7 7.8e-8 * See k0(). * */ const float A[] = {1.90451637722020886025E-9f, 2.53479107902614945675E-7f, 2.28621210311945178607E-5f, 1.26461541144692592338E-3f, 3.59799365153615016266E-2f, 3.44289899924628486886E-1f, -5.35327393233902768720E-1f}; const float B[] = {-1.69753450938905987466E-9f, 8.57403401741422608519E-9f, -4.66048989768794782956E-8f, 2.76681363944501510342E-7f, -1.83175552271911948767E-6f, 1.39498137188764993662E-5f, -1.28495495816278026384E-4f, 1.56988388573005337491E-3f, -3.14481013119645005427E-2f, 2.44030308206595545468E0f}; const T MAXNUM = pset1(NumTraits::infinity()); const T two = pset1(2.0); T x_le_two = internal::pchebevl::run( pmadd(x, x, pset1(-2.0)), A); x_le_two = pmadd( generic_i0::run(x), pnegate( plog(pmul(pset1(0.5), x))), x_le_two); x_le_two = pmul(pexp(x), x_le_two); T x_gt_two = pmul( internal::pchebevl::run( psub(pdiv(pset1(8.0), x), two), B), prsqrt(x)); return pselect( pcmp_le(x, pset1(0.0)), MAXNUM, pselect(pcmp_le(x, two), x_le_two, x_gt_two)); } }; template struct generic_k0e { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T& x) { /* k0e.c * Modified Bessel function, third kind, order zero, * exponentially scaled * * * * SYNOPSIS: * * double x, y, k0e(); * * y = k0e( x ); * * * * DESCRIPTION: * * Returns exponentially scaled modified Bessel function * of the third kind of order zero of the argument. * * * * ACCURACY: * * Relative error: * arithmetic domain # trials peak rms * IEEE 0, 30 30000 1.4e-15 1.4e-16 * See k0(). * */ const double A[] = { 1.37446543561352307156E-16, 4.25981614279661018399E-14, 1.03496952576338420167E-11, 1.90451637722020886025E-9, 2.53479107902614945675E-7, 2.28621210311945178607E-5, 1.26461541144692592338E-3, 3.59799365153615016266E-2, 3.44289899924628486886E-1, -5.35327393233902768720E-1}; const double B[] = { 5.30043377268626276149E-18, -1.64758043015242134646E-17, 5.21039150503902756861E-17, -1.67823109680541210385E-16, 5.51205597852431940784E-16, -1.84859337734377901440E-15, 6.34007647740507060557E-15, -2.22751332699166985548E-14, 8.03289077536357521100E-14, -2.98009692317273043925E-13, 1.14034058820847496303E-12, -4.51459788337394416547E-12, 1.85594911495471785253E-11, -7.95748924447710747776E-11, 3.57739728140030116597E-10, -1.69753450938905987466E-9, 8.57403401741422608519E-9, -4.66048989768794782956E-8, 2.76681363944501510342E-7, -1.83175552271911948767E-6, 1.39498137188764993662E-5, -1.28495495816278026384E-4, 1.56988388573005337491E-3, -3.14481013119645005427E-2, 2.44030308206595545468E0 }; const T MAXNUM = pset1(NumTraits::infinity()); const T two = pset1(2.0); T x_le_two = internal::pchebevl::run( pmadd(x, x, pset1(-2.0)), A); x_le_two = pmadd( generic_i0::run(x), pmul( pset1(-1.0), plog(pmul(pset1(0.5), x))), x_le_two); x_le_two = pmul(pexp(x), x_le_two); x_le_two = pselect(pcmp_le(x, pset1(0.0)), MAXNUM, x_le_two); T x_gt_two = pmul( internal::pchebevl::run( psub(pdiv(pset1(8.0), x), two), B), prsqrt(x)); return pselect(pcmp_le(x, two), x_le_two, x_gt_two); } }; template struct bessel_k0e_impl { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T x) { return generic_k0e::run(x); } }; template struct bessel_k0_retval { typedef T type; }; template ::type> struct generic_k0 { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T&) { EIGEN_STATIC_ASSERT((internal::is_same::value == false), THIS_TYPE_IS_NOT_SUPPORTED); return ScalarType(0); } }; template struct generic_k0 { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T& x) { /* k0f.c * Modified Bessel function, third kind, order zero * * * * SYNOPSIS: * * float x, y, k0f(); * * y = k0f( x ); * * * * DESCRIPTION: * * Returns modified Bessel function of the third kind * of order zero of the argument. * * The range is partitioned into the two intervals [0,8] and * (8, infinity). Chebyshev polynomial expansions are employed * in each interval. * * * * ACCURACY: * * Tested at 2000 random points between 0 and 8. Peak absolute * error (relative when K0 > 1) was 1.46e-14; rms, 4.26e-15. * Relative error: * arithmetic domain # trials peak rms * IEEE 0, 30 30000 7.8e-7 8.5e-8 * * ERROR MESSAGES: * * message condition value returned * K0 domain x <= 0 MAXNUM * */ const float A[] = {1.90451637722020886025E-9f, 2.53479107902614945675E-7f, 2.28621210311945178607E-5f, 1.26461541144692592338E-3f, 3.59799365153615016266E-2f, 3.44289899924628486886E-1f, -5.35327393233902768720E-1f}; const float B[] = {-1.69753450938905987466E-9f, 8.57403401741422608519E-9f, -4.66048989768794782956E-8f, 2.76681363944501510342E-7f, -1.83175552271911948767E-6f, 1.39498137188764993662E-5f, -1.28495495816278026384E-4f, 1.56988388573005337491E-3f, -3.14481013119645005427E-2f, 2.44030308206595545468E0f}; const T MAXNUM = pset1(NumTraits::infinity()); const T two = pset1(2.0); T x_le_two = internal::pchebevl::run( pmadd(x, x, pset1(-2.0)), A); x_le_two = pmadd( generic_i0::run(x), pnegate( plog(pmul(pset1(0.5), x))), x_le_two); x_le_two = pselect(pcmp_le(x, pset1(0.0)), MAXNUM, x_le_two); T x_gt_two = pmul( pmul( pexp(pnegate(x)), internal::pchebevl::run( psub(pdiv(pset1(8.0), x), two), B)), prsqrt(x)); return pselect(pcmp_le(x, two), x_le_two, x_gt_two); } }; template struct generic_k0 { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T& x) { /* * * Modified Bessel function, third kind, order zero, * exponentially scaled * * * * SYNOPSIS: * * double x, y, k0(); * * y = k0( x ); * * * * DESCRIPTION: * * Returns exponentially scaled modified Bessel function * of the third kind of order zero of the argument. * * * * ACCURACY: * * Relative error: * arithmetic domain # trials peak rms * IEEE 0, 30 30000 1.4e-15 1.4e-16 * See k0(). * */ const double A[] = { 1.37446543561352307156E-16, 4.25981614279661018399E-14, 1.03496952576338420167E-11, 1.90451637722020886025E-9, 2.53479107902614945675E-7, 2.28621210311945178607E-5, 1.26461541144692592338E-3, 3.59799365153615016266E-2, 3.44289899924628486886E-1, -5.35327393233902768720E-1}; const double B[] = { 5.30043377268626276149E-18, -1.64758043015242134646E-17, 5.21039150503902756861E-17, -1.67823109680541210385E-16, 5.51205597852431940784E-16, -1.84859337734377901440E-15, 6.34007647740507060557E-15, -2.22751332699166985548E-14, 8.03289077536357521100E-14, -2.98009692317273043925E-13, 1.14034058820847496303E-12, -4.51459788337394416547E-12, 1.85594911495471785253E-11, -7.95748924447710747776E-11, 3.57739728140030116597E-10, -1.69753450938905987466E-9, 8.57403401741422608519E-9, -4.66048989768794782956E-8, 2.76681363944501510342E-7, -1.83175552271911948767E-6, 1.39498137188764993662E-5, -1.28495495816278026384E-4, 1.56988388573005337491E-3, -3.14481013119645005427E-2, 2.44030308206595545468E0 }; const T MAXNUM = pset1(NumTraits::infinity()); const T two = pset1(2.0); T x_le_two = internal::pchebevl::run( pmadd(x, x, pset1(-2.0)), A); x_le_two = pmadd( generic_i0::run(x), pnegate( plog(pmul(pset1(0.5), x))), x_le_two); x_le_two = pselect(pcmp_le(x, pset1(0.0)), MAXNUM, x_le_two); T x_gt_two = pmul( pmul( pexp(-x), internal::pchebevl::run( psub(pdiv(pset1(8.0), x), two), B)), prsqrt(x)); return pselect(pcmp_le(x, two), x_le_two, x_gt_two); } }; template struct bessel_k0_impl { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T x) { return generic_k0::run(x); } }; template struct bessel_k1e_retval { typedef T type; }; template ::type> struct generic_k1e { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T&) { EIGEN_STATIC_ASSERT((internal::is_same::value == false), THIS_TYPE_IS_NOT_SUPPORTED); return ScalarType(0); } }; template struct generic_k1e { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T& x) { /* k1ef.c * * Modified Bessel function, third kind, order one, * exponentially scaled * * * * SYNOPSIS: * * float x, y, k1ef(); * * y = k1ef( x ); * * * * DESCRIPTION: * * Returns exponentially scaled modified Bessel function * of the third kind of order one of the argument: * * k1e(x) = exp(x) * k1(x). * * * * ACCURACY: * * Relative error: * arithmetic domain # trials peak rms * IEEE 0, 30 30000 4.9e-7 6.7e-8 * See k1(). * */ const float A[] = {-2.21338763073472585583E-8f, -2.43340614156596823496E-6f, -1.73028895751305206302E-4f, -6.97572385963986435018E-3f, -1.22611180822657148235E-1f, -3.53155960776544875667E-1f, 1.52530022733894777053E0f}; const float B[] = {2.01504975519703286596E-9f, -1.03457624656780970260E-8f, 5.74108412545004946722E-8f, -3.50196060308781257119E-7f, 2.40648494783721712015E-6f, -1.93619797416608296024E-5f, 1.95215518471351631108E-4f, -2.85781685962277938680E-3f, 1.03923736576817238437E-1f, 2.72062619048444266945E0f}; const T MAXNUM = pset1(NumTraits::infinity()); const T two = pset1(2.0); T x_le_two = pdiv(internal::pchebevl::run( pmadd(x, x, pset1(-2.0)), A), x); x_le_two = pmadd( generic_i1::run(x), plog(pmul(pset1(0.5), x)), x_le_two); x_le_two = pmul(x_le_two, pexp(x)); x_le_two = pselect(pcmp_le(x, pset1(0.0)), MAXNUM, x_le_two); T x_gt_two = pmul( internal::pchebevl::run( psub(pdiv(pset1(8.0), x), two), B), prsqrt(x)); return pselect(pcmp_le(x, two), x_le_two, x_gt_two); } }; template struct generic_k1e { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T& x) { /* k1e.c * * Modified Bessel function, third kind, order one, * exponentially scaled * * * * SYNOPSIS: * * double x, y, k1e(); * * y = k1e( x ); * * * * DESCRIPTION: * * Returns exponentially scaled modified Bessel function * of the third kind of order one of the argument: * * k1e(x) = exp(x) * k1(x). * * * * ACCURACY: * * Relative error: * arithmetic domain # trials peak rms * IEEE 0, 30 30000 7.8e-16 1.2e-16 * See k1(). * */ const double A[] = {-7.02386347938628759343E-18, -2.42744985051936593393E-15, -6.66690169419932900609E-13, -1.41148839263352776110E-10, -2.21338763073472585583E-8, -2.43340614156596823496E-6, -1.73028895751305206302E-4, -6.97572385963986435018E-3, -1.22611180822657148235E-1, -3.53155960776544875667E-1, 1.52530022733894777053E0}; const double B[] = {-5.75674448366501715755E-18, 1.79405087314755922667E-17, -5.68946255844285935196E-17, 1.83809354436663880070E-16, -6.05704724837331885336E-16, 2.03870316562433424052E-15, -7.01983709041831346144E-15, 2.47715442448130437068E-14, -8.97670518232499435011E-14, 3.34841966607842919884E-13, -1.28917396095102890680E-12, 5.13963967348173025100E-12, -2.12996783842756842877E-11, 9.21831518760500529508E-11, -4.19035475934189648750E-10, 2.01504975519703286596E-9, -1.03457624656780970260E-8, 5.74108412545004946722E-8, -3.50196060308781257119E-7, 2.40648494783721712015E-6, -1.93619797416608296024E-5, 1.95215518471351631108E-4, -2.85781685962277938680E-3, 1.03923736576817238437E-1, 2.72062619048444266945E0}; const T MAXNUM = pset1(NumTraits::infinity()); const T two = pset1(2.0); T x_le_two = pdiv(internal::pchebevl::run( pmadd(x, x, pset1(-2.0)), A), x); x_le_two = pmadd( generic_i1::run(x), plog(pmul(pset1(0.5), x)), x_le_two); x_le_two = pmul(x_le_two, pexp(x)); x_le_two = pselect(pcmp_le(x, pset1(0.0)), MAXNUM, x_le_two); T x_gt_two = pmul( internal::pchebevl::run( psub(pdiv(pset1(8.0), x), two), B), prsqrt(x)); return pselect(pcmp_le(x, two), x_le_two, x_gt_two); } }; template struct bessel_k1e_impl { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T x) { return generic_k1e::run(x); } }; template struct bessel_k1_retval { typedef T type; }; template ::type> struct generic_k1 { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T&) { EIGEN_STATIC_ASSERT((internal::is_same::value == false), THIS_TYPE_IS_NOT_SUPPORTED); return ScalarType(0); } }; template struct generic_k1 { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T& x) { /* k1f.c * Modified Bessel function, third kind, order one * * * * SYNOPSIS: * * float x, y, k1f(); * * y = k1f( x ); * * * * DESCRIPTION: * * Computes the modified Bessel function of the third kind * of order one of the argument. * * The range is partitioned into the two intervals [0,2] and * (2, infinity). Chebyshev polynomial expansions are employed * in each interval. * * * * ACCURACY: * * Relative error: * arithmetic domain # trials peak rms * IEEE 0, 30 30000 4.6e-7 7.6e-8 * * ERROR MESSAGES: * * message condition value returned * k1 domain x <= 0 MAXNUM * */ const float A[] = {-2.21338763073472585583E-8f, -2.43340614156596823496E-6f, -1.73028895751305206302E-4f, -6.97572385963986435018E-3f, -1.22611180822657148235E-1f, -3.53155960776544875667E-1f, 1.52530022733894777053E0f}; const float B[] = {2.01504975519703286596E-9f, -1.03457624656780970260E-8f, 5.74108412545004946722E-8f, -3.50196060308781257119E-7f, 2.40648494783721712015E-6f, -1.93619797416608296024E-5f, 1.95215518471351631108E-4f, -2.85781685962277938680E-3f, 1.03923736576817238437E-1f, 2.72062619048444266945E0f}; const T MAXNUM = pset1(NumTraits::infinity()); const T two = pset1(2.0); T x_le_two = pdiv(internal::pchebevl::run( pmadd(x, x, pset1(-2.0)), A), x); x_le_two = pmadd( generic_i1::run(x), plog(pmul(pset1(0.5), x)), x_le_two); x_le_two = pselect(pcmp_le(x, pset1(0.0)), MAXNUM, x_le_two); T x_gt_two = pmul( pexp(pnegate(x)), pmul( internal::pchebevl::run( psub(pdiv(pset1(8.0), x), two), B), prsqrt(x))); return pselect(pcmp_le(x, two), x_le_two, x_gt_two); } }; template struct generic_k1 { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T& x) { /* k1.c * Modified Bessel function, third kind, order one * * * * SYNOPSIS: * * float x, y, k1f(); * * y = k1f( x ); * * * * DESCRIPTION: * * Computes the modified Bessel function of the third kind * of order one of the argument. * * The range is partitioned into the two intervals [0,2] and * (2, infinity). Chebyshev polynomial expansions are employed * in each interval. * * * * ACCURACY: * * Relative error: * arithmetic domain # trials peak rms * IEEE 0, 30 30000 4.6e-7 7.6e-8 * * ERROR MESSAGES: * * message condition value returned * k1 domain x <= 0 MAXNUM * */ const double A[] = {-7.02386347938628759343E-18, -2.42744985051936593393E-15, -6.66690169419932900609E-13, -1.41148839263352776110E-10, -2.21338763073472585583E-8, -2.43340614156596823496E-6, -1.73028895751305206302E-4, -6.97572385963986435018E-3, -1.22611180822657148235E-1, -3.53155960776544875667E-1, 1.52530022733894777053E0}; const double B[] = {-5.75674448366501715755E-18, 1.79405087314755922667E-17, -5.68946255844285935196E-17, 1.83809354436663880070E-16, -6.05704724837331885336E-16, 2.03870316562433424052E-15, -7.01983709041831346144E-15, 2.47715442448130437068E-14, -8.97670518232499435011E-14, 3.34841966607842919884E-13, -1.28917396095102890680E-12, 5.13963967348173025100E-12, -2.12996783842756842877E-11, 9.21831518760500529508E-11, -4.19035475934189648750E-10, 2.01504975519703286596E-9, -1.03457624656780970260E-8, 5.74108412545004946722E-8, -3.50196060308781257119E-7, 2.40648494783721712015E-6, -1.93619797416608296024E-5, 1.95215518471351631108E-4, -2.85781685962277938680E-3, 1.03923736576817238437E-1, 2.72062619048444266945E0}; const T MAXNUM = pset1(NumTraits::infinity()); const T two = pset1(2.0); T x_le_two = pdiv(internal::pchebevl::run( pmadd(x, x, pset1(-2.0)), A), x); x_le_two = pmadd( generic_i1::run(x), plog(pmul(pset1(0.5), x)), x_le_two); x_le_two = pselect(pcmp_le(x, pset1(0.0)), MAXNUM, x_le_two); T x_gt_two = pmul( pexp(-x), pmul( internal::pchebevl::run( psub(pdiv(pset1(8.0), x), two), B), prsqrt(x))); return pselect(pcmp_le(x, two), x_le_two, x_gt_two); } }; template struct bessel_k1_impl { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T x) { return generic_k1::run(x); } }; template struct bessel_j0_retval { typedef T type; }; template ::type> struct generic_j0 { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T&) { EIGEN_STATIC_ASSERT((internal::is_same::value == false), THIS_TYPE_IS_NOT_SUPPORTED); return ScalarType(0); } }; template struct generic_j0 { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T& x) { /* j0f.c * Bessel function of order zero * * * * SYNOPSIS: * * float x, y, j0f(); * * y = j0f( x ); * * * * DESCRIPTION: * * Returns Bessel function of order zero of the argument. * * The domain is divided into the intervals [0, 2] and * (2, infinity). In the first interval the following polynomial * approximation is used: * * * 2 2 2 * (w - r ) (w - r ) (w - r ) P(w) * 1 2 3 * * 2 * where w = x and the three r's are zeros of the function. * * In the second interval, the modulus and phase are approximated * by polynomials of the form Modulus(x) = sqrt(1/x) Q(1/x) * and Phase(x) = x + 1/x R(1/x^2) - pi/4. The function is * * j0(x) = Modulus(x) cos( Phase(x) ). * * * * ACCURACY: * * Absolute error: * arithmetic domain # trials peak rms * IEEE 0, 2 100000 1.3e-7 3.6e-8 * IEEE 2, 32 100000 1.9e-7 5.4e-8 * */ const float JP[] = {-6.068350350393235E-008f, 6.388945720783375E-006f, -3.969646342510940E-004f, 1.332913422519003E-002f, -1.729150680240724E-001f}; const float MO[] = {-6.838999669318810E-002f, 1.864949361379502E-001f, -2.145007480346739E-001f, 1.197549369473540E-001f, -3.560281861530129E-003f, -4.969382655296620E-002f, -3.355424622293709E-006f, 7.978845717621440E-001f}; const float PH[] = {3.242077816988247E+001f, -3.630592630518434E+001f, 1.756221482109099E+001f, -4.974978466280903E+000f, 1.001973420681837E+000f, -1.939906941791308E-001f, 6.490598792654666E-002f, -1.249992184872738E-001f}; const T DR1 = pset1(5.78318596294678452118f); const T NEG_PIO4F = pset1(-0.7853981633974483096f); /* -pi / 4 */ T y = pabs(x); T z = pmul(y, y); T y_le_two = pselect( pcmp_lt(y, pset1(1.0e-3f)), pmadd(z, pset1(-0.25f), pset1(1.0f)), pmul(psub(z, DR1), internal::ppolevl::run(z, JP))); T q = pdiv(pset1(1.0f), y); T w = prsqrt(y); T p = pmul(w, internal::ppolevl::run(q, MO)); w = pmul(q, q); T yn = pmadd(q, internal::ppolevl::run(w, PH), NEG_PIO4F); T y_gt_two = pmul(p, pcos(padd(yn, y))); return pselect(pcmp_le(y, pset1(2.0)), y_le_two, y_gt_two); } }; template struct generic_j0 { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T& x) { /* j0.c * Bessel function of order zero * * * * SYNOPSIS: * * double x, y, j0(); * * y = j0( x ); * * * * DESCRIPTION: * * Returns Bessel function of order zero of the argument. * * The domain is divided into the intervals [0, 5] and * (5, infinity). In the first interval the following rational * approximation is used: * * * 2 2 * (w - r ) (w - r ) P (w) / Q (w) * 1 2 3 8 * * 2 * where w = x and the two r's are zeros of the function. * * In the second interval, the Hankel asymptotic expansion * is employed with two rational functions of degree 6/6 * and 7/7. * * * * ACCURACY: * * Absolute error: * arithmetic domain # trials peak rms * DEC 0, 30 10000 4.4e-17 6.3e-18 * IEEE 0, 30 60000 4.2e-16 1.1e-16 * */ const double PP[] = {7.96936729297347051624E-4, 8.28352392107440799803E-2, 1.23953371646414299388E0, 5.44725003058768775090E0, 8.74716500199817011941E0, 5.30324038235394892183E0, 9.99999999999999997821E-1}; const double PQ[] = {9.24408810558863637013E-4, 8.56288474354474431428E-2, 1.25352743901058953537E0, 5.47097740330417105182E0, 8.76190883237069594232E0, 5.30605288235394617618E0, 1.00000000000000000218E0}; const double QP[] = {-1.13663838898469149931E-2, -1.28252718670509318512E0, -1.95539544257735972385E1, -9.32060152123768231369E1, -1.77681167980488050595E2, -1.47077505154951170175E2, -5.14105326766599330220E1, -6.05014350600728481186E0}; const double QQ[] = {1.00000000000000000000E0, 6.43178256118178023184E1, 8.56430025976980587198E2, 3.88240183605401609683E3, 7.24046774195652478189E3, 5.93072701187316984827E3, 2.06209331660327847417E3, 2.42005740240291393179E2}; const double RP[] = {-4.79443220978201773821E9, 1.95617491946556577543E12, -2.49248344360967716204E14, 9.70862251047306323952E15}; const double RQ[] = {1.00000000000000000000E0, 4.99563147152651017219E2, 1.73785401676374683123E5, 4.84409658339962045305E7, 1.11855537045356834862E10, 2.11277520115489217587E12, 3.10518229857422583814E14, 3.18121955943204943306E16, 1.71086294081043136091E18}; const T DR1 = pset1(5.78318596294678452118E0); const T DR2 = pset1(3.04712623436620863991E1); const T SQ2OPI = pset1(7.9788456080286535587989E-1); /* sqrt(2 / pi) */ const T NEG_PIO4 = pset1(-0.7853981633974483096); /* pi / 4 */ T y = pabs(x); T z = pmul(y, y); T y_le_five = pselect( pcmp_lt(y, pset1(1.0e-5)), pmadd(z, pset1(-0.25), pset1(1.0)), pmul(pmul(psub(z, DR1), psub(z, DR2)), pdiv(internal::ppolevl::run(z, RP), internal::ppolevl::run(z, RQ)))); T s = pdiv(pset1(25.0), z); T p = pdiv( internal::ppolevl::run(s, PP), internal::ppolevl::run(s, PQ)); T q = pdiv( internal::ppolevl::run(s, QP), internal::ppolevl::run(s, QQ)); T yn = padd(y, NEG_PIO4); T w = pdiv(pset1(-5.0), y); p = pmadd(p, pcos(yn), pmul(w, pmul(q, psin(yn)))); T y_gt_five = pmul(p, pmul(SQ2OPI, prsqrt(y))); return pselect(pcmp_le(y, pset1(5.0)), y_le_five, y_gt_five); } }; template struct bessel_j0_impl { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T x) { return generic_j0::run(x); } }; template struct bessel_y0_retval { typedef T type; }; template ::type> struct generic_y0 { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T&) { EIGEN_STATIC_ASSERT((internal::is_same::value == false), THIS_TYPE_IS_NOT_SUPPORTED); return ScalarType(0); } }; template struct generic_y0 { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T& x) { /* j0f.c * Bessel function of the second kind, order zero * * * * SYNOPSIS: * * float x, y, y0f(); * * y = y0f( x ); * * * * DESCRIPTION: * * Returns Bessel function of the second kind, of order * zero, of the argument. * * The domain is divided into the intervals [0, 2] and * (2, infinity). In the first interval a rational approximation * R(x) is employed to compute * * 2 2 2 * y0(x) = (w - r ) (w - r ) (w - r ) R(x) + 2/pi ln(x) j0(x). * 1 2 3 * * Thus a call to j0() is required. The three zeros are removed * from R(x) to improve its numerical stability. * * In the second interval, the modulus and phase are approximated * by polynomials of the form Modulus(x) = sqrt(1/x) Q(1/x) * and Phase(x) = x + 1/x S(1/x^2) - pi/4. Then the function is * * y0(x) = Modulus(x) sin( Phase(x) ). * * * * * ACCURACY: * * Absolute error, when y0(x) < 1; else relative error: * * arithmetic domain # trials peak rms * IEEE 0, 2 100000 2.4e-7 3.4e-8 * IEEE 2, 32 100000 1.8e-7 5.3e-8 * */ const float YP[] = {9.454583683980369E-008f, -9.413212653797057E-006f, 5.344486707214273E-004f, -1.584289289821316E-002f, 1.707584643733568E-001f}; const float MO[] = {-6.838999669318810E-002f, 1.864949361379502E-001f, -2.145007480346739E-001f, 1.197549369473540E-001f, -3.560281861530129E-003f, -4.969382655296620E-002f, -3.355424622293709E-006f, 7.978845717621440E-001f}; const float PH[] = {3.242077816988247E+001f, -3.630592630518434E+001f, 1.756221482109099E+001f, -4.974978466280903E+000f, 1.001973420681837E+000f, -1.939906941791308E-001f, 6.490598792654666E-002f, -1.249992184872738E-001f}; const T YZ1 = pset1(0.43221455686510834878f); const T TWOOPI = pset1(0.636619772367581343075535f); /* 2 / pi */ const T NEG_PIO4F = pset1(-0.7853981633974483096f); /* -pi / 4 */ const T NEG_MAXNUM = pset1(-NumTraits::infinity()); T z = pmul(x, x); T x_le_two = pmul(TWOOPI, pmul(plog(x), generic_j0::run(x))); x_le_two = pmadd( psub(z, YZ1), internal::ppolevl::run(z, YP), x_le_two); x_le_two = pselect(pcmp_le(x, pset1(0.0)), NEG_MAXNUM, x_le_two); T q = pdiv(pset1(1.0), x); T w = prsqrt(x); T p = pmul(w, internal::ppolevl::run(q, MO)); T u = pmul(q, q); T xn = pmadd(q, internal::ppolevl::run(u, PH), NEG_PIO4F); T x_gt_two = pmul(p, psin(padd(xn, x))); return pselect(pcmp_le(x, pset1(2.0)), x_le_two, x_gt_two); } }; template struct generic_y0 { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T& x) { /* j0.c * Bessel function of the second kind, order zero * * * * SYNOPSIS: * * double x, y, y0(); * * y = y0( x ); * * * * DESCRIPTION: * * Returns Bessel function of the second kind, of order * zero, of the argument. * * The domain is divided into the intervals [0, 5] and * (5, infinity). In the first interval a rational approximation * R(x) is employed to compute * y0(x) = R(x) + 2 * log(x) * j0(x) / PI. * Thus a call to j0() is required. * * In the second interval, the Hankel asymptotic expansion * is employed with two rational functions of degree 6/6 * and 7/7. * * * * ACCURACY: * * Absolute error, when y0(x) < 1; else relative error: * * arithmetic domain # trials peak rms * DEC 0, 30 9400 7.0e-17 7.9e-18 * IEEE 0, 30 30000 1.3e-15 1.6e-16 * */ const double PP[] = {7.96936729297347051624E-4, 8.28352392107440799803E-2, 1.23953371646414299388E0, 5.44725003058768775090E0, 8.74716500199817011941E0, 5.30324038235394892183E0, 9.99999999999999997821E-1}; const double PQ[] = {9.24408810558863637013E-4, 8.56288474354474431428E-2, 1.25352743901058953537E0, 5.47097740330417105182E0, 8.76190883237069594232E0, 5.30605288235394617618E0, 1.00000000000000000218E0}; const double QP[] = {-1.13663838898469149931E-2, -1.28252718670509318512E0, -1.95539544257735972385E1, -9.32060152123768231369E1, -1.77681167980488050595E2, -1.47077505154951170175E2, -5.14105326766599330220E1, -6.05014350600728481186E0}; const double QQ[] = {1.00000000000000000000E0, 6.43178256118178023184E1, 8.56430025976980587198E2, 3.88240183605401609683E3, 7.24046774195652478189E3, 5.93072701187316984827E3, 2.06209331660327847417E3, 2.42005740240291393179E2}; const double YP[] = {1.55924367855235737965E4, -1.46639295903971606143E7, 5.43526477051876500413E9, -9.82136065717911466409E11, 8.75906394395366999549E13, -3.46628303384729719441E15, 4.42733268572569800351E16, -1.84950800436986690637E16}; const double YQ[] = {1.00000000000000000000E0, 1.04128353664259848412E3, 6.26107330137134956842E5, 2.68919633393814121987E8, 8.64002487103935000337E10, 2.02979612750105546709E13, 3.17157752842975028269E15, 2.50596256172653059228E17}; const T SQ2OPI = pset1(7.9788456080286535587989E-1); /* sqrt(2 / pi) */ const T TWOOPI = pset1(0.636619772367581343075535); /* 2 / pi */ const T NEG_PIO4 = pset1(-0.7853981633974483096); /* -pi / 4 */ const T NEG_MAXNUM = pset1(-NumTraits::infinity()); T z = pmul(x, x); T x_le_five = pdiv(internal::ppolevl::run(z, YP), internal::ppolevl::run(z, YQ)); x_le_five = pmadd( pmul(TWOOPI, plog(x)), generic_j0::run(x), x_le_five); x_le_five = pselect(pcmp_le(x, pset1(0.0)), NEG_MAXNUM, x_le_five); T s = pdiv(pset1(25.0), z); T p = pdiv( internal::ppolevl::run(s, PP), internal::ppolevl::run(s, PQ)); T q = pdiv( internal::ppolevl::run(s, QP), internal::ppolevl::run(s, QQ)); T xn = padd(x, NEG_PIO4); T w = pdiv(pset1(5.0), x); p = pmadd(p, psin(xn), pmul(w, pmul(q, pcos(xn)))); T x_gt_five = pmul(p, pmul(SQ2OPI, prsqrt(x))); return pselect(pcmp_le(x, pset1(5.0)), x_le_five, x_gt_five); } }; template struct bessel_y0_impl { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T x) { return generic_y0::run(x); } }; template struct bessel_j1_retval { typedef T type; }; template ::type> struct generic_j1 { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T&) { EIGEN_STATIC_ASSERT((internal::is_same::value == false), THIS_TYPE_IS_NOT_SUPPORTED); return ScalarType(0); } }; template struct generic_j1 { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T& x) { /* j1f.c * Bessel function of order one * * * * SYNOPSIS: * * float x, y, j1f(); * * y = j1f( x ); * * * * DESCRIPTION: * * Returns Bessel function of order one of the argument. * * The domain is divided into the intervals [0, 2] and * (2, infinity). In the first interval a polynomial approximation * 2 * (w - r ) x P(w) * 1 * 2 * is used, where w = x and r is the first zero of the function. * * In the second interval, the modulus and phase are approximated * by polynomials of the form Modulus(x) = sqrt(1/x) Q(1/x) * and Phase(x) = x + 1/x R(1/x^2) - 3pi/4. The function is * * j0(x) = Modulus(x) cos( Phase(x) ). * * * * ACCURACY: * * Absolute error: * arithmetic domain # trials peak rms * IEEE 0, 2 100000 1.2e-7 2.5e-8 * IEEE 2, 32 100000 2.0e-7 5.3e-8 * * */ const float JP[] = {-4.878788132172128E-009f, 6.009061827883699E-007f, -4.541343896997497E-005f, 1.937383947804541E-003f, -3.405537384615824E-002f}; const float MO1[] = {6.913942741265801E-002f, -2.284801500053359E-001f, 3.138238455499697E-001f, -2.102302420403875E-001f, 5.435364690523026E-003f, 1.493389585089498E-001f, 4.976029650847191E-006f, 7.978845453073848E-001f}; const float PH1[] = {-4.497014141919556E+001f, 5.073465654089319E+001f, -2.485774108720340E+001f, 7.222973196770240E+000f, -1.544842782180211E+000f, 3.503787691653334E-001f, -1.637986776941202E-001f, 3.749989509080821E-001f}; const T Z1 = pset1(1.46819706421238932572E1f); const T NEG_THPIO4F = pset1(-2.35619449019234492885f); /* -3*pi/4 */ T y = pabs(x); T z = pmul(y, y); T y_le_two = pmul( psub(z, Z1), pmul(x, internal::ppolevl::run(z, JP))); T q = pdiv(pset1(1.0f), y); T w = prsqrt(y); T p = pmul(w, internal::ppolevl::run(q, MO1)); w = pmul(q, q); T yn = pmadd(q, internal::ppolevl::run(w, PH1), NEG_THPIO4F); T y_gt_two = pmul(p, pcos(padd(yn, y))); // j1 is an odd function. This implementation differs from cephes to // take this fact in to account. Cephes returns -j1(x) for y > 2 range. y_gt_two = pselect( pcmp_lt(x, pset1(0.0f)), pnegate(y_gt_two), y_gt_two); return pselect(pcmp_le(y, pset1(2.0f)), y_le_two, y_gt_two); } }; template struct generic_j1 { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T& x) { /* j1.c * Bessel function of order one * * * * SYNOPSIS: * * double x, y, j1(); * * y = j1( x ); * * * * DESCRIPTION: * * Returns Bessel function of order one of the argument. * * The domain is divided into the intervals [0, 8] and * (8, infinity). In the first interval a 24 term Chebyshev * expansion is used. In the second, the asymptotic * trigonometric representation is employed using two * rational functions of degree 5/5. * * * * ACCURACY: * * Absolute error: * arithmetic domain # trials peak rms * DEC 0, 30 10000 4.0e-17 1.1e-17 * IEEE 0, 30 30000 2.6e-16 1.1e-16 * */ const double PP[] = {7.62125616208173112003E-4, 7.31397056940917570436E-2, 1.12719608129684925192E0, 5.11207951146807644818E0, 8.42404590141772420927E0, 5.21451598682361504063E0, 1.00000000000000000254E0}; const double PQ[] = {5.71323128072548699714E-4, 6.88455908754495404082E-2, 1.10514232634061696926E0, 5.07386386128601488557E0, 8.39985554327604159757E0, 5.20982848682361821619E0, 9.99999999999999997461E-1}; const double QP[] = {5.10862594750176621635E-2, 4.98213872951233449420E0, 7.58238284132545283818E1, 3.66779609360150777800E2, 7.10856304998926107277E2, 5.97489612400613639965E2, 2.11688757100572135698E2, 2.52070205858023719784E1}; const double QQ[] = {1.00000000000000000000E0, 7.42373277035675149943E1, 1.05644886038262816351E3, 4.98641058337653607651E3, 9.56231892404756170795E3, 7.99704160447350683650E3, 2.82619278517639096600E3, 3.36093607810698293419E2}; const double RP[] = {-8.99971225705559398224E8, 4.52228297998194034323E11, -7.27494245221818276015E13, 3.68295732863852883286E15}; const double RQ[] = {1.00000000000000000000E0, 6.20836478118054335476E2, 2.56987256757748830383E5, 8.35146791431949253037E7, 2.21511595479792499675E10, 4.74914122079991414898E12, 7.84369607876235854894E14, 8.95222336184627338078E16, 5.32278620332680085395E18}; const T Z1 = pset1(1.46819706421238932572E1); const T Z2 = pset1(4.92184563216946036703E1); const T NEG_THPIO4 = pset1(-2.35619449019234492885); /* -3*pi/4 */ const T SQ2OPI = pset1(7.9788456080286535587989E-1); /* sqrt(2 / pi) */ T y = pabs(x); T z = pmul(y, y); T y_le_five = pdiv(internal::ppolevl::run(z, RP), internal::ppolevl::run(z, RQ)); y_le_five = pmul(pmul(pmul(y_le_five, x), psub(z, Z1)), psub(z, Z2)); T s = pdiv(pset1(25.0), z); T p = pdiv( internal::ppolevl::run(s, PP), internal::ppolevl::run(s, PQ)); T q = pdiv( internal::ppolevl::run(s, QP), internal::ppolevl::run(s, QQ)); T yn = padd(y, NEG_THPIO4); T w = pdiv(pset1(-5.0), y); p = pmadd(p, pcos(yn), pmul(w, pmul(q, psin(yn)))); T y_gt_five = pmul(p, pmul(SQ2OPI, prsqrt(y))); // j1 is an odd function. This implementation differs from cephes to // take this fact in to account. Cephes returns -j1(x) for y > 5 range. y_gt_five = pselect( pcmp_lt(x, pset1(0.0)), pnegate(y_gt_five), y_gt_five); return pselect(pcmp_le(y, pset1(5.0)), y_le_five, y_gt_five); } }; template struct bessel_j1_impl { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T x) { return generic_j1::run(x); } }; template struct bessel_y1_retval { typedef T type; }; template ::type> struct generic_y1 { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T&) { EIGEN_STATIC_ASSERT((internal::is_same::value == false), THIS_TYPE_IS_NOT_SUPPORTED); return ScalarType(0); } }; template struct generic_y1 { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T& x) { /* j1f.c * Bessel function of second kind of order one * * * * SYNOPSIS: * * double x, y, y1(); * * y = y1( x ); * * * * DESCRIPTION: * * Returns Bessel function of the second kind of order one * of the argument. * * The domain is divided into the intervals [0, 2] and * (2, infinity). In the first interval a rational approximation * R(x) is employed to compute * * 2 * y0(x) = (w - r ) x R(x^2) + 2/pi (ln(x) j1(x) - 1/x) . * 1 * * Thus a call to j1() is required. * * In the second interval, the modulus and phase are approximated * by polynomials of the form Modulus(x) = sqrt(1/x) Q(1/x) * and Phase(x) = x + 1/x S(1/x^2) - 3pi/4. Then the function is * * y0(x) = Modulus(x) sin( Phase(x) ). * * * * * ACCURACY: * * Absolute error: * arithmetic domain # trials peak rms * IEEE 0, 2 100000 2.2e-7 4.6e-8 * IEEE 2, 32 100000 1.9e-7 5.3e-8 * * (error criterion relative when |y1| > 1). * */ const float YP[] = {8.061978323326852E-009f, -9.496460629917016E-007f, 6.719543806674249E-005f, -2.641785726447862E-003f, 4.202369946500099E-002f}; const float MO1[] = {6.913942741265801E-002f, -2.284801500053359E-001f, 3.138238455499697E-001f, -2.102302420403875E-001f, 5.435364690523026E-003f, 1.493389585089498E-001f, 4.976029650847191E-006f, 7.978845453073848E-001f}; const float PH1[] = {-4.497014141919556E+001f, 5.073465654089319E+001f, -2.485774108720340E+001f, 7.222973196770240E+000f, -1.544842782180211E+000f, 3.503787691653334E-001f, -1.637986776941202E-001f, 3.749989509080821E-001f}; const T YO1 = pset1(4.66539330185668857532f); const T NEG_THPIO4F = pset1(-2.35619449019234492885f); /* -3*pi/4 */ const T TWOOPI = pset1(0.636619772367581343075535f); /* 2/pi */ const T NEG_MAXNUM = pset1(-NumTraits::infinity()); T z = pmul(x, x); T x_le_two = pmul(psub(z, YO1), internal::ppolevl::run(z, YP)); x_le_two = pmadd( x_le_two, x, pmul(TWOOPI, pmadd( generic_j1::run(x), plog(x), pdiv(pset1(-1.0f), x)))); x_le_two = pselect(pcmp_lt(x, pset1(0.0f)), NEG_MAXNUM, x_le_two); T q = pdiv(pset1(1.0), x); T w = prsqrt(x); T p = pmul(w, internal::ppolevl::run(q, MO1)); w = pmul(q, q); T xn = pmadd(q, internal::ppolevl::run(w, PH1), NEG_THPIO4F); T x_gt_two = pmul(p, psin(padd(xn, x))); return pselect(pcmp_le(x, pset1(2.0)), x_le_two, x_gt_two); } }; template struct generic_y1 { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T& x) { /* j1.c * Bessel function of second kind of order one * * * * SYNOPSIS: * * double x, y, y1(); * * y = y1( x ); * * * * DESCRIPTION: * * Returns Bessel function of the second kind of order one * of the argument. * * The domain is divided into the intervals [0, 8] and * (8, infinity). In the first interval a 25 term Chebyshev * expansion is used, and a call to j1() is required. * In the second, the asymptotic trigonometric representation * is employed using two rational functions of degree 5/5. * * * * ACCURACY: * * Absolute error: * arithmetic domain # trials peak rms * DEC 0, 30 10000 8.6e-17 1.3e-17 * IEEE 0, 30 30000 1.0e-15 1.3e-16 * * (error criterion relative when |y1| > 1). * */ const double PP[] = {7.62125616208173112003E-4, 7.31397056940917570436E-2, 1.12719608129684925192E0, 5.11207951146807644818E0, 8.42404590141772420927E0, 5.21451598682361504063E0, 1.00000000000000000254E0}; const double PQ[] = {5.71323128072548699714E-4, 6.88455908754495404082E-2, 1.10514232634061696926E0, 5.07386386128601488557E0, 8.39985554327604159757E0, 5.20982848682361821619E0, 9.99999999999999997461E-1}; const double QP[] = {5.10862594750176621635E-2, 4.98213872951233449420E0, 7.58238284132545283818E1, 3.66779609360150777800E2, 7.10856304998926107277E2, 5.97489612400613639965E2, 2.11688757100572135698E2, 2.52070205858023719784E1}; const double QQ[] = {1.00000000000000000000E0, 7.42373277035675149943E1, 1.05644886038262816351E3, 4.98641058337653607651E3, 9.56231892404756170795E3, 7.99704160447350683650E3, 2.82619278517639096600E3, 3.36093607810698293419E2}; const double YP[] = {1.26320474790178026440E9, -6.47355876379160291031E11, 1.14509511541823727583E14, -8.12770255501325109621E15, 2.02439475713594898196E17, -7.78877196265950026825E17}; const double YQ[] = {1.00000000000000000000E0, 5.94301592346128195359E2, 2.35564092943068577943E5, 7.34811944459721705660E7, 1.87601316108706159478E10, 3.88231277496238566008E12, 6.20557727146953693363E14, 6.87141087355300489866E16, 3.97270608116560655612E18}; const T SQ2OPI = pset1(.79788456080286535588); const T NEG_THPIO4 = pset1(-2.35619449019234492885); /* -3*pi/4 */ const T TWOOPI = pset1(0.636619772367581343075535); /* 2/pi */ const T NEG_MAXNUM = pset1(-NumTraits::infinity()); T z = pmul(x, x); T x_le_five = pdiv(internal::ppolevl::run(z, YP), internal::ppolevl::run(z, YQ)); x_le_five = pmadd( x_le_five, x, pmul( TWOOPI, pmadd(generic_j1::run(x), plog(x), pdiv(pset1(-1.0), x)))); x_le_five = pselect(pcmp_le(x, pset1(0.0)), NEG_MAXNUM, x_le_five); T s = pdiv(pset1(25.0), z); T p = pdiv( internal::ppolevl::run(s, PP), internal::ppolevl::run(s, PQ)); T q = pdiv( internal::ppolevl::run(s, QP), internal::ppolevl::run(s, QQ)); T xn = padd(x, NEG_THPIO4); T w = pdiv(pset1(5.0), x); p = pmadd(p, psin(xn), pmul(w, pmul(q, pcos(xn)))); T x_gt_five = pmul(p, pmul(SQ2OPI, prsqrt(x))); return pselect(pcmp_le(x, pset1(5.0)), x_le_five, x_gt_five); } }; template struct bessel_y1_impl { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T run(const T x) { return generic_y1::run(x); } }; } // end namespace internal namespace numext { template EIGEN_DEVICE_FUNC inline EIGEN_MATHFUNC_RETVAL(bessel_i0, Scalar) bessel_i0(const Scalar& x) { return EIGEN_MATHFUNC_IMPL(bessel_i0, Scalar)::run(x); } template EIGEN_DEVICE_FUNC inline EIGEN_MATHFUNC_RETVAL(bessel_i0e, Scalar) bessel_i0e(const Scalar& x) { return EIGEN_MATHFUNC_IMPL(bessel_i0e, Scalar)::run(x); } template EIGEN_DEVICE_FUNC inline EIGEN_MATHFUNC_RETVAL(bessel_i1, Scalar) bessel_i1(const Scalar& x) { return EIGEN_MATHFUNC_IMPL(bessel_i1, Scalar)::run(x); } template EIGEN_DEVICE_FUNC inline EIGEN_MATHFUNC_RETVAL(bessel_i1e, Scalar) bessel_i1e(const Scalar& x) { return EIGEN_MATHFUNC_IMPL(bessel_i1e, Scalar)::run(x); } template EIGEN_DEVICE_FUNC inline EIGEN_MATHFUNC_RETVAL(bessel_k0, Scalar) bessel_k0(const Scalar& x) { return EIGEN_MATHFUNC_IMPL(bessel_k0, Scalar)::run(x); } template EIGEN_DEVICE_FUNC inline EIGEN_MATHFUNC_RETVAL(bessel_k0e, Scalar) bessel_k0e(const Scalar& x) { return EIGEN_MATHFUNC_IMPL(bessel_k0e, Scalar)::run(x); } template EIGEN_DEVICE_FUNC inline EIGEN_MATHFUNC_RETVAL(bessel_k1, Scalar) bessel_k1(const Scalar& x) { return EIGEN_MATHFUNC_IMPL(bessel_k1, Scalar)::run(x); } template EIGEN_DEVICE_FUNC inline EIGEN_MATHFUNC_RETVAL(bessel_k1e, Scalar) bessel_k1e(const Scalar& x) { return EIGEN_MATHFUNC_IMPL(bessel_k1e, Scalar)::run(x); } template EIGEN_DEVICE_FUNC inline EIGEN_MATHFUNC_RETVAL(bessel_j0, Scalar) bessel_j0(const Scalar& x) { return EIGEN_MATHFUNC_IMPL(bessel_j0, Scalar)::run(x); } template EIGEN_DEVICE_FUNC inline EIGEN_MATHFUNC_RETVAL(bessel_y0, Scalar) bessel_y0(const Scalar& x) { return EIGEN_MATHFUNC_IMPL(bessel_y0, Scalar)::run(x); } template EIGEN_DEVICE_FUNC inline EIGEN_MATHFUNC_RETVAL(bessel_j1, Scalar) bessel_j1(const Scalar& x) { return EIGEN_MATHFUNC_IMPL(bessel_j1, Scalar)::run(x); } template EIGEN_DEVICE_FUNC inline EIGEN_MATHFUNC_RETVAL(bessel_y1, Scalar) bessel_y1(const Scalar& x) { return EIGEN_MATHFUNC_IMPL(bessel_y1, Scalar)::run(x); } } // end namespace numext } // end namespace Eigen #endif // EIGEN_BESSEL_FUNCTIONS_H RcppEigen/inst/include/unsupported/Eigen/src/SpecialFunctions/BesselFunctionsArrayAPI.h0000644000176200001440000002343714562417221031032 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2016 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_BESSELFUNCTIONS_ARRAYAPI_H #define EIGEN_BESSELFUNCTIONS_ARRAYAPI_H namespace Eigen { /** \returns an expression of the coefficient-wise i0(\a x) to the given * arrays. * * It returns the modified Bessel function of the first kind of order zero. * * \param x is the argument * * \note This function supports only float and double scalar types. To support * other scalar types, the user has to provide implementations of i0(T) for * any scalar type T to be supported. * * \sa ArrayBase::bessel_i0() */ template EIGEN_STRONG_INLINE const Eigen::CwiseUnaryOp< Eigen::internal::scalar_bessel_i0_op, const Derived> bessel_i0(const Eigen::ArrayBase& x) { return Eigen::CwiseUnaryOp< Eigen::internal::scalar_bessel_i0_op, const Derived>(x.derived()); } /** \returns an expression of the coefficient-wise i0e(\a x) to the given * arrays. * * It returns the exponentially scaled modified Bessel * function of the first kind of order zero. * * \param x is the argument * * \note This function supports only float and double scalar types. To support * other scalar types, the user has to provide implementations of i0e(T) for * any scalar type T to be supported. * * \sa ArrayBase::bessel_i0e() */ template EIGEN_STRONG_INLINE const Eigen::CwiseUnaryOp< Eigen::internal::scalar_bessel_i0e_op, const Derived> bessel_i0e(const Eigen::ArrayBase& x) { return Eigen::CwiseUnaryOp< Eigen::internal::scalar_bessel_i0e_op, const Derived>(x.derived()); } /** \returns an expression of the coefficient-wise i1(\a x) to the given * arrays. * * It returns the modified Bessel function of the first kind of order one. * * \param x is the argument * * \note This function supports only float and double scalar types. To support * other scalar types, the user has to provide implementations of i1(T) for * any scalar type T to be supported. * * \sa ArrayBase::bessel_i1() */ template EIGEN_STRONG_INLINE const Eigen::CwiseUnaryOp< Eigen::internal::scalar_bessel_i1_op, const Derived> bessel_i1(const Eigen::ArrayBase& x) { return Eigen::CwiseUnaryOp< Eigen::internal::scalar_bessel_i1_op, const Derived>(x.derived()); } /** \returns an expression of the coefficient-wise i1e(\a x) to the given * arrays. * * It returns the exponentially scaled modified Bessel * function of the first kind of order one. * * \param x is the argument * * \note This function supports only float and double scalar types. To support * other scalar types, the user has to provide implementations of i1e(T) for * any scalar type T to be supported. * * \sa ArrayBase::bessel_i1e() */ template EIGEN_STRONG_INLINE const Eigen::CwiseUnaryOp< Eigen::internal::scalar_bessel_i1e_op, const Derived> bessel_i1e(const Eigen::ArrayBase& x) { return Eigen::CwiseUnaryOp< Eigen::internal::scalar_bessel_i1e_op, const Derived>(x.derived()); } /** \returns an expression of the coefficient-wise k0(\a x) to the given * arrays. * * It returns the modified Bessel function of the second kind of order zero. * * \param x is the argument * * \note This function supports only float and double scalar types. To support * other scalar types, the user has to provide implementations of k0(T) for * any scalar type T to be supported. * * \sa ArrayBase::bessel_k0() */ template EIGEN_STRONG_INLINE const Eigen::CwiseUnaryOp< Eigen::internal::scalar_bessel_k0_op, const Derived> bessel_k0(const Eigen::ArrayBase& x) { return Eigen::CwiseUnaryOp< Eigen::internal::scalar_bessel_k0_op, const Derived>(x.derived()); } /** \returns an expression of the coefficient-wise k0e(\a x) to the given * arrays. * * It returns the exponentially scaled modified Bessel * function of the second kind of order zero. * * \param x is the argument * * \note This function supports only float and double scalar types. To support * other scalar types, the user has to provide implementations of k0e(T) for * any scalar type T to be supported. * * \sa ArrayBase::bessel_k0e() */ template EIGEN_STRONG_INLINE const Eigen::CwiseUnaryOp< Eigen::internal::scalar_bessel_k0e_op, const Derived> bessel_k0e(const Eigen::ArrayBase& x) { return Eigen::CwiseUnaryOp< Eigen::internal::scalar_bessel_k0e_op, const Derived>(x.derived()); } /** \returns an expression of the coefficient-wise k1(\a x) to the given * arrays. * * It returns the modified Bessel function of the second kind of order one. * * \param x is the argument * * \note This function supports only float and double scalar types. To support * other scalar types, the user has to provide implementations of k1(T) for * any scalar type T to be supported. * * \sa ArrayBase::bessel_k1() */ template EIGEN_STRONG_INLINE const Eigen::CwiseUnaryOp< Eigen::internal::scalar_bessel_k1_op, const Derived> bessel_k1(const Eigen::ArrayBase& x) { return Eigen::CwiseUnaryOp< Eigen::internal::scalar_bessel_k1_op, const Derived>(x.derived()); } /** \returns an expression of the coefficient-wise k1e(\a x) to the given * arrays. * * It returns the exponentially scaled modified Bessel * function of the second kind of order one. * * \param x is the argument * * \note This function supports only float and double scalar types. To support * other scalar types, the user has to provide implementations of k1e(T) for * any scalar type T to be supported. * * \sa ArrayBase::bessel_k1e() */ template EIGEN_STRONG_INLINE const Eigen::CwiseUnaryOp< Eigen::internal::scalar_bessel_k1e_op, const Derived> bessel_k1e(const Eigen::ArrayBase& x) { return Eigen::CwiseUnaryOp< Eigen::internal::scalar_bessel_k1e_op, const Derived>(x.derived()); } /** \returns an expression of the coefficient-wise j0(\a x) to the given * arrays. * * It returns the Bessel function of the first kind of order zero. * * \param x is the argument * * \note This function supports only float and double scalar types. To support * other scalar types, the user has to provide implementations of j0(T) for * any scalar type T to be supported. * * \sa ArrayBase::bessel_j0() */ template EIGEN_STRONG_INLINE const Eigen::CwiseUnaryOp< Eigen::internal::scalar_bessel_j0_op, const Derived> bessel_j0(const Eigen::ArrayBase& x) { return Eigen::CwiseUnaryOp< Eigen::internal::scalar_bessel_j0_op, const Derived>(x.derived()); } /** \returns an expression of the coefficient-wise y0(\a x) to the given * arrays. * * It returns the Bessel function of the second kind of order zero. * * \param x is the argument * * \note This function supports only float and double scalar types. To support * other scalar types, the user has to provide implementations of y0(T) for * any scalar type T to be supported. * * \sa ArrayBase::bessel_y0() */ template EIGEN_STRONG_INLINE const Eigen::CwiseUnaryOp< Eigen::internal::scalar_bessel_y0_op, const Derived> bessel_y0(const Eigen::ArrayBase& x) { return Eigen::CwiseUnaryOp< Eigen::internal::scalar_bessel_y0_op, const Derived>(x.derived()); } /** \returns an expression of the coefficient-wise j1(\a x) to the given * arrays. * * It returns the modified Bessel function of the first kind of order one. * * \param x is the argument * * \note This function supports only float and double scalar types. To support * other scalar types, the user has to provide implementations of j1(T) for * any scalar type T to be supported. * * \sa ArrayBase::bessel_j1() */ template EIGEN_STRONG_INLINE const Eigen::CwiseUnaryOp< Eigen::internal::scalar_bessel_j1_op, const Derived> bessel_j1(const Eigen::ArrayBase& x) { return Eigen::CwiseUnaryOp< Eigen::internal::scalar_bessel_j1_op, const Derived>(x.derived()); } /** \returns an expression of the coefficient-wise y1(\a x) to the given * arrays. * * It returns the Bessel function of the second kind of order one. * * \param x is the argument * * \note This function supports only float and double scalar types. To support * other scalar types, the user has to provide implementations of y1(T) for * any scalar type T to be supported. * * \sa ArrayBase::bessel_y1() */ template EIGEN_STRONG_INLINE const Eigen::CwiseUnaryOp< Eigen::internal::scalar_bessel_y1_op, const Derived> bessel_y1(const Eigen::ArrayBase& x) { return Eigen::CwiseUnaryOp< Eigen::internal::scalar_bessel_y1_op, const Derived>(x.derived()); } } // end namespace Eigen #endif // EIGEN_BESSELFUNCTIONS_ARRAYAPI_H RcppEigen/inst/include/unsupported/Eigen/src/SpecialFunctions/SpecialFunctionsBFloat16.h0000644000176200001440000000601714562417221031076 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SPECIALFUNCTIONS_BFLOAT16_H #define EIGEN_SPECIALFUNCTIONS_BFLOAT16_H namespace Eigen { namespace numext { #if EIGEN_HAS_C99_MATH template<> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::bfloat16 lgamma(const Eigen::bfloat16& a) { return Eigen::bfloat16(Eigen::numext::lgamma(static_cast(a))); } template<> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::bfloat16 digamma(const Eigen::bfloat16& a) { return Eigen::bfloat16(Eigen::numext::digamma(static_cast(a))); } template<> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::bfloat16 zeta(const Eigen::bfloat16& x, const Eigen::bfloat16& q) { return Eigen::bfloat16(Eigen::numext::zeta(static_cast(x), static_cast(q))); } template<> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::bfloat16 polygamma(const Eigen::bfloat16& n, const Eigen::bfloat16& x) { return Eigen::bfloat16(Eigen::numext::polygamma(static_cast(n), static_cast(x))); } template<> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::bfloat16 erf(const Eigen::bfloat16& a) { return Eigen::bfloat16(Eigen::numext::erf(static_cast(a))); } template<> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::bfloat16 erfc(const Eigen::bfloat16& a) { return Eigen::bfloat16(Eigen::numext::erfc(static_cast(a))); } template<> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::bfloat16 ndtri(const Eigen::bfloat16& a) { return Eigen::bfloat16(Eigen::numext::ndtri(static_cast(a))); } template<> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::bfloat16 igamma(const Eigen::bfloat16& a, const Eigen::bfloat16& x) { return Eigen::bfloat16(Eigen::numext::igamma(static_cast(a), static_cast(x))); } template <> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::bfloat16 igamma_der_a(const Eigen::bfloat16& a, const Eigen::bfloat16& x) { return Eigen::bfloat16(Eigen::numext::igamma_der_a(static_cast(a), static_cast(x))); } template <> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::bfloat16 gamma_sample_der_alpha(const Eigen::bfloat16& alpha, const Eigen::bfloat16& sample) { return Eigen::bfloat16(Eigen::numext::gamma_sample_der_alpha(static_cast(alpha), static_cast(sample))); } template<> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::bfloat16 igammac(const Eigen::bfloat16& a, const Eigen::bfloat16& x) { return Eigen::bfloat16(Eigen::numext::igammac(static_cast(a), static_cast(x))); } template<> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::bfloat16 betainc(const Eigen::bfloat16& a, const Eigen::bfloat16& b, const Eigen::bfloat16& x) { return Eigen::bfloat16(Eigen::numext::betainc(static_cast(a), static_cast(b), static_cast(x))); } #endif } // end namespace numext } // end namespace Eigen #endif // EIGEN_SPECIALFUNCTIONS_BFLOAT16_H RcppEigen/inst/include/unsupported/Eigen/src/MatrixFunctions/0000755000176200001440000000000014562417221024075 5ustar liggesusersRcppEigen/inst/include/unsupported/Eigen/src/MatrixFunctions/MatrixLogarithm.h0000644000176200001440000004222514562417221027366 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2011, 2013 Jitse Niesen // Copyright (C) 2011 Chen-Pang He // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_MATRIX_LOGARITHM #define EIGEN_MATRIX_LOGARITHM namespace Eigen { namespace internal { template struct matrix_log_min_pade_degree { static const int value = 3; }; template struct matrix_log_max_pade_degree { typedef typename NumTraits::Real RealScalar; static const int value = std::numeric_limits::digits<= 24? 5: // single precision std::numeric_limits::digits<= 53? 7: // double precision std::numeric_limits::digits<= 64? 8: // extended precision std::numeric_limits::digits<=106? 10: // double-double 11; // quadruple precision }; /** \brief Compute logarithm of 2x2 triangular matrix. */ template void matrix_log_compute_2x2(const MatrixType& A, MatrixType& result) { typedef typename MatrixType::Scalar Scalar; typedef typename MatrixType::RealScalar RealScalar; using std::abs; using std::ceil; using std::imag; using std::log; Scalar logA00 = log(A(0,0)); Scalar logA11 = log(A(1,1)); result(0,0) = logA00; result(1,0) = Scalar(0); result(1,1) = logA11; Scalar y = A(1,1) - A(0,0); if (y==Scalar(0)) { result(0,1) = A(0,1) / A(0,0); } else if ((abs(A(0,0)) < RealScalar(0.5)*abs(A(1,1))) || (abs(A(0,0)) > 2*abs(A(1,1)))) { result(0,1) = A(0,1) * (logA11 - logA00) / y; } else { // computation in previous branch is inaccurate if A(1,1) \approx A(0,0) RealScalar unwindingNumber = ceil((imag(logA11 - logA00) - RealScalar(EIGEN_PI)) / RealScalar(2*EIGEN_PI)); result(0,1) = A(0,1) * (numext::log1p(y/A(0,0)) + Scalar(0,RealScalar(2*EIGEN_PI)*unwindingNumber)) / y; } } /* \brief Get suitable degree for Pade approximation. (specialized for RealScalar = float) */ inline int matrix_log_get_pade_degree(float normTminusI) { const float maxNormForPade[] = { 2.5111573934555054e-1 /* degree = 3 */ , 4.0535837411880493e-1, 5.3149729967117310e-1 }; const int minPadeDegree = matrix_log_min_pade_degree::value; const int maxPadeDegree = matrix_log_max_pade_degree::value; int degree = minPadeDegree; for (; degree <= maxPadeDegree; ++degree) if (normTminusI <= maxNormForPade[degree - minPadeDegree]) break; return degree; } /* \brief Get suitable degree for Pade approximation. (specialized for RealScalar = double) */ inline int matrix_log_get_pade_degree(double normTminusI) { const double maxNormForPade[] = { 1.6206284795015624e-2 /* degree = 3 */ , 5.3873532631381171e-2, 1.1352802267628681e-1, 1.8662860613541288e-1, 2.642960831111435e-1 }; const int minPadeDegree = matrix_log_min_pade_degree::value; const int maxPadeDegree = matrix_log_max_pade_degree::value; int degree = minPadeDegree; for (; degree <= maxPadeDegree; ++degree) if (normTminusI <= maxNormForPade[degree - minPadeDegree]) break; return degree; } /* \brief Get suitable degree for Pade approximation. (specialized for RealScalar = long double) */ inline int matrix_log_get_pade_degree(long double normTminusI) { #if LDBL_MANT_DIG == 53 // double precision const long double maxNormForPade[] = { 1.6206284795015624e-2L /* degree = 3 */ , 5.3873532631381171e-2L, 1.1352802267628681e-1L, 1.8662860613541288e-1L, 2.642960831111435e-1L }; #elif LDBL_MANT_DIG <= 64 // extended precision const long double maxNormForPade[] = { 5.48256690357782863103e-3L /* degree = 3 */, 2.34559162387971167321e-2L, 5.84603923897347449857e-2L, 1.08486423756725170223e-1L, 1.68385767881294446649e-1L, 2.32777776523703892094e-1L }; #elif LDBL_MANT_DIG <= 106 // double-double const long double maxNormForPade[] = { 8.58970550342939562202529664318890e-5L /* degree = 3 */, 9.34074328446359654039446552677759e-4L, 4.26117194647672175773064114582860e-3L, 1.21546224740281848743149666560464e-2L, 2.61100544998339436713088248557444e-2L, 4.66170074627052749243018566390567e-2L, 7.32585144444135027565872014932387e-2L, 1.05026503471351080481093652651105e-1L }; #else // quadruple precision const long double maxNormForPade[] = { 4.7419931187193005048501568167858103e-5L /* degree = 3 */, 5.8853168473544560470387769480192666e-4L, 2.9216120366601315391789493628113520e-3L, 8.8415758124319434347116734705174308e-3L, 1.9850836029449446668518049562565291e-2L, 3.6688019729653446926585242192447447e-2L, 5.9290962294020186998954055264528393e-2L, 8.6998436081634343903250580992127677e-2L, 1.1880960220216759245467951592883642e-1L }; #endif const int minPadeDegree = matrix_log_min_pade_degree::value; const int maxPadeDegree = matrix_log_max_pade_degree::value; int degree = minPadeDegree; for (; degree <= maxPadeDegree; ++degree) if (normTminusI <= maxNormForPade[degree - minPadeDegree]) break; return degree; } /* \brief Compute Pade approximation to matrix logarithm */ template void matrix_log_compute_pade(MatrixType& result, const MatrixType& T, int degree) { typedef typename NumTraits::Real RealScalar; const int minPadeDegree = 3; const int maxPadeDegree = 11; assert(degree >= minPadeDegree && degree <= maxPadeDegree); // FIXME this creates float-conversion-warnings if these are enabled. // Either manually convert each value, or disable the warning locally const RealScalar nodes[][maxPadeDegree] = { { 0.1127016653792583114820734600217600L, 0.5000000000000000000000000000000000L, // degree 3 0.8872983346207416885179265399782400L }, { 0.0694318442029737123880267555535953L, 0.3300094782075718675986671204483777L, // degree 4 0.6699905217924281324013328795516223L, 0.9305681557970262876119732444464048L }, { 0.0469100770306680036011865608503035L, 0.2307653449471584544818427896498956L, // degree 5 0.5000000000000000000000000000000000L, 0.7692346550528415455181572103501044L, 0.9530899229693319963988134391496965L }, { 0.0337652428984239860938492227530027L, 0.1693953067668677431693002024900473L, // degree 6 0.3806904069584015456847491391596440L, 0.6193095930415984543152508608403560L, 0.8306046932331322568306997975099527L, 0.9662347571015760139061507772469973L }, { 0.0254460438286207377369051579760744L, 0.1292344072003027800680676133596058L, // degree 7 0.2970774243113014165466967939615193L, 0.5000000000000000000000000000000000L, 0.7029225756886985834533032060384807L, 0.8707655927996972199319323866403942L, 0.9745539561713792622630948420239256L }, { 0.0198550717512318841582195657152635L, 0.1016667612931866302042230317620848L, // degree 8 0.2372337950418355070911304754053768L, 0.4082826787521750975302619288199080L, 0.5917173212478249024697380711800920L, 0.7627662049581644929088695245946232L, 0.8983332387068133697957769682379152L, 0.9801449282487681158417804342847365L }, { 0.0159198802461869550822118985481636L, 0.0819844463366821028502851059651326L, // degree 9 0.1933142836497048013456489803292629L, 0.3378732882980955354807309926783317L, 0.5000000000000000000000000000000000L, 0.6621267117019044645192690073216683L, 0.8066857163502951986543510196707371L, 0.9180155536633178971497148940348674L, 0.9840801197538130449177881014518364L }, { 0.0130467357414141399610179939577740L, 0.0674683166555077446339516557882535L, // degree 10 0.1602952158504877968828363174425632L, 0.2833023029353764046003670284171079L, 0.4255628305091843945575869994351400L, 0.5744371694908156054424130005648600L, 0.7166976970646235953996329715828921L, 0.8397047841495122031171636825574368L, 0.9325316833444922553660483442117465L, 0.9869532642585858600389820060422260L }, { 0.0108856709269715035980309994385713L, 0.0564687001159523504624211153480364L, // degree 11 0.1349239972129753379532918739844233L, 0.2404519353965940920371371652706952L, 0.3652284220238275138342340072995692L, 0.5000000000000000000000000000000000L, 0.6347715779761724861657659927004308L, 0.7595480646034059079628628347293048L, 0.8650760027870246620467081260155767L, 0.9435312998840476495375788846519636L, 0.9891143290730284964019690005614287L } }; const RealScalar weights[][maxPadeDegree] = { { 0.2777777777777777777777777777777778L, 0.4444444444444444444444444444444444L, // degree 3 0.2777777777777777777777777777777778L }, { 0.1739274225687269286865319746109997L, 0.3260725774312730713134680253890003L, // degree 4 0.3260725774312730713134680253890003L, 0.1739274225687269286865319746109997L }, { 0.1184634425280945437571320203599587L, 0.2393143352496832340206457574178191L, // degree 5 0.2844444444444444444444444444444444L, 0.2393143352496832340206457574178191L, 0.1184634425280945437571320203599587L }, { 0.0856622461895851725201480710863665L, 0.1803807865240693037849167569188581L, // degree 6 0.2339569672863455236949351719947755L, 0.2339569672863455236949351719947755L, 0.1803807865240693037849167569188581L, 0.0856622461895851725201480710863665L }, { 0.0647424830844348466353057163395410L, 0.1398526957446383339507338857118898L, // degree 7 0.1909150252525594724751848877444876L, 0.2089795918367346938775510204081633L, 0.1909150252525594724751848877444876L, 0.1398526957446383339507338857118898L, 0.0647424830844348466353057163395410L }, { 0.0506142681451881295762656771549811L, 0.1111905172266872352721779972131204L, // degree 8 0.1568533229389436436689811009933007L, 0.1813418916891809914825752246385978L, 0.1813418916891809914825752246385978L, 0.1568533229389436436689811009933007L, 0.1111905172266872352721779972131204L, 0.0506142681451881295762656771549811L }, { 0.0406371941807872059859460790552618L, 0.0903240803474287020292360156214564L, // degree 9 0.1303053482014677311593714347093164L, 0.1561735385200014200343152032922218L, 0.1651196775006298815822625346434870L, 0.1561735385200014200343152032922218L, 0.1303053482014677311593714347093164L, 0.0903240803474287020292360156214564L, 0.0406371941807872059859460790552618L }, { 0.0333356721543440687967844049466659L, 0.0747256745752902965728881698288487L, // degree 10 0.1095431812579910219977674671140816L, 0.1346333596549981775456134607847347L, 0.1477621123573764350869464973256692L, 0.1477621123573764350869464973256692L, 0.1346333596549981775456134607847347L, 0.1095431812579910219977674671140816L, 0.0747256745752902965728881698288487L, 0.0333356721543440687967844049466659L }, { 0.0278342835580868332413768602212743L, 0.0627901847324523123173471496119701L, // degree 11 0.0931451054638671257130488207158280L, 0.1165968822959952399592618524215876L, 0.1314022722551233310903444349452546L, 0.1364625433889503153572417641681711L, 0.1314022722551233310903444349452546L, 0.1165968822959952399592618524215876L, 0.0931451054638671257130488207158280L, 0.0627901847324523123173471496119701L, 0.0278342835580868332413768602212743L } }; MatrixType TminusI = T - MatrixType::Identity(T.rows(), T.rows()); result.setZero(T.rows(), T.rows()); for (int k = 0; k < degree; ++k) { RealScalar weight = weights[degree-minPadeDegree][k]; RealScalar node = nodes[degree-minPadeDegree][k]; result += weight * (MatrixType::Identity(T.rows(), T.rows()) + node * TminusI) .template triangularView().solve(TminusI); } } /** \brief Compute logarithm of triangular matrices with size > 2. * \details This uses a inverse scale-and-square algorithm. */ template void matrix_log_compute_big(const MatrixType& A, MatrixType& result) { typedef typename MatrixType::Scalar Scalar; typedef typename NumTraits::Real RealScalar; using std::pow; int numberOfSquareRoots = 0; int numberOfExtraSquareRoots = 0; int degree; MatrixType T = A, sqrtT; const int maxPadeDegree = matrix_log_max_pade_degree::value; const RealScalar maxNormForPade = RealScalar( maxPadeDegree<= 5? 5.3149729967117310e-1L: // single precision maxPadeDegree<= 7? 2.6429608311114350e-1L: // double precision maxPadeDegree<= 8? 2.32777776523703892094e-1L: // extended precision maxPadeDegree<=10? 1.05026503471351080481093652651105e-1L: // double-double 1.1880960220216759245467951592883642e-1L); // quadruple precision while (true) { RealScalar normTminusI = (T - MatrixType::Identity(T.rows(), T.rows())).cwiseAbs().colwise().sum().maxCoeff(); if (normTminusI < maxNormForPade) { degree = matrix_log_get_pade_degree(normTminusI); int degree2 = matrix_log_get_pade_degree(normTminusI / RealScalar(2)); if ((degree - degree2 <= 1) || (numberOfExtraSquareRoots == 1)) break; ++numberOfExtraSquareRoots; } matrix_sqrt_triangular(T, sqrtT); T = sqrtT.template triangularView(); ++numberOfSquareRoots; } matrix_log_compute_pade(result, T, degree); result *= pow(RealScalar(2), RealScalar(numberOfSquareRoots)); // TODO replace by bitshift if possible } /** \ingroup MatrixFunctions_Module * \class MatrixLogarithmAtomic * \brief Helper class for computing matrix logarithm of atomic matrices. * * Here, an atomic matrix is a triangular matrix whose diagonal entries are close to each other. * * \sa class MatrixFunctionAtomic, MatrixBase::log() */ template class MatrixLogarithmAtomic { public: /** \brief Compute matrix logarithm of atomic matrix * \param[in] A argument of matrix logarithm, should be upper triangular and atomic * \returns The logarithm of \p A. */ MatrixType compute(const MatrixType& A); }; template MatrixType MatrixLogarithmAtomic::compute(const MatrixType& A) { using std::log; MatrixType result(A.rows(), A.rows()); if (A.rows() == 1) result(0,0) = log(A(0,0)); else if (A.rows() == 2) matrix_log_compute_2x2(A, result); else matrix_log_compute_big(A, result); return result; } } // end of namespace internal /** \ingroup MatrixFunctions_Module * * \brief Proxy for the matrix logarithm of some matrix (expression). * * \tparam Derived Type of the argument to the matrix function. * * This class holds the argument to the matrix function until it is * assigned or evaluated for some other reason (so the argument * should not be changed in the meantime). It is the return type of * MatrixBase::log() and most of the time this is the only way it * is used. */ template class MatrixLogarithmReturnValue : public ReturnByValue > { public: typedef typename Derived::Scalar Scalar; typedef typename Derived::Index Index; protected: typedef typename internal::ref_selector::type DerivedNested; public: /** \brief Constructor. * * \param[in] A %Matrix (expression) forming the argument of the matrix logarithm. */ explicit MatrixLogarithmReturnValue(const Derived& A) : m_A(A) { } /** \brief Compute the matrix logarithm. * * \param[out] result Logarithm of \c A, where \c A is as specified in the constructor. */ template inline void evalTo(ResultType& result) const { typedef typename internal::nested_eval::type DerivedEvalType; typedef typename internal::remove_all::type DerivedEvalTypeClean; typedef internal::traits Traits; typedef std::complex::Real> ComplexScalar; typedef Matrix DynMatrixType; typedef internal::MatrixLogarithmAtomic AtomicType; AtomicType atomic; internal::matrix_function_compute::run(m_A, atomic, result); } Index rows() const { return m_A.rows(); } Index cols() const { return m_A.cols(); } private: const DerivedNested m_A; }; namespace internal { template struct traits > { typedef typename Derived::PlainObject ReturnType; }; } /********** MatrixBase method **********/ template const MatrixLogarithmReturnValue MatrixBase::log() const { eigen_assert(rows() == cols()); return MatrixLogarithmReturnValue(derived()); } } // end namespace Eigen #endif // EIGEN_MATRIX_LOGARITHM RcppEigen/inst/include/unsupported/Eigen/src/MatrixFunctions/MatrixPower.h0000644000176200001440000005557614562417221026551 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2012, 2013 Chen-Pang He // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_MATRIX_POWER #define EIGEN_MATRIX_POWER namespace Eigen { template class MatrixPower; /** * \ingroup MatrixFunctions_Module * * \brief Proxy for the matrix power of some matrix. * * \tparam MatrixType type of the base, a matrix. * * This class holds the arguments to the matrix power until it is * assigned or evaluated for some other reason (so the argument * should not be changed in the meantime). It is the return type of * MatrixPower::operator() and related functions and most of the * time this is the only way it is used. */ /* TODO This class is only used by MatrixPower, so it should be nested * into MatrixPower, like MatrixPower::ReturnValue. However, my * compiler complained about unused template parameter in the * following declaration in namespace internal. * * template * struct traits::ReturnValue>; */ template class MatrixPowerParenthesesReturnValue : public ReturnByValue< MatrixPowerParenthesesReturnValue > { public: typedef typename MatrixType::RealScalar RealScalar; /** * \brief Constructor. * * \param[in] pow %MatrixPower storing the base. * \param[in] p scalar, the exponent of the matrix power. */ MatrixPowerParenthesesReturnValue(MatrixPower& pow, RealScalar p) : m_pow(pow), m_p(p) { } /** * \brief Compute the matrix power. * * \param[out] result */ template inline void evalTo(ResultType& result) const { m_pow.compute(result, m_p); } Index rows() const { return m_pow.rows(); } Index cols() const { return m_pow.cols(); } private: MatrixPower& m_pow; const RealScalar m_p; }; /** * \ingroup MatrixFunctions_Module * * \brief Class for computing matrix powers. * * \tparam MatrixType type of the base, expected to be an instantiation * of the Matrix class template. * * This class is capable of computing triangular real/complex matrices * raised to a power in the interval \f$ (-1, 1) \f$. * * \note Currently this class is only used by MatrixPower. One may * insist that this be nested into MatrixPower. This class is here to * facilitate future development of triangular matrix functions. */ template class MatrixPowerAtomic : internal::noncopyable { private: enum { RowsAtCompileTime = MatrixType::RowsAtCompileTime, MaxRowsAtCompileTime = MatrixType::MaxRowsAtCompileTime }; typedef typename MatrixType::Scalar Scalar; typedef typename MatrixType::RealScalar RealScalar; typedef std::complex ComplexScalar; typedef Block ResultType; const MatrixType& m_A; RealScalar m_p; void computePade(int degree, const MatrixType& IminusT, ResultType& res) const; void compute2x2(ResultType& res, RealScalar p) const; void computeBig(ResultType& res) const; static int getPadeDegree(float normIminusT); static int getPadeDegree(double normIminusT); static int getPadeDegree(long double normIminusT); static ComplexScalar computeSuperDiag(const ComplexScalar&, const ComplexScalar&, RealScalar p); static RealScalar computeSuperDiag(RealScalar, RealScalar, RealScalar p); public: /** * \brief Constructor. * * \param[in] T the base of the matrix power. * \param[in] p the exponent of the matrix power, should be in * \f$ (-1, 1) \f$. * * The class stores a reference to T, so it should not be changed * (or destroyed) before evaluation. Only the upper triangular * part of T is read. */ MatrixPowerAtomic(const MatrixType& T, RealScalar p); /** * \brief Compute the matrix power. * * \param[out] res \f$ A^p \f$ where A and p are specified in the * constructor. */ void compute(ResultType& res) const; }; template MatrixPowerAtomic::MatrixPowerAtomic(const MatrixType& T, RealScalar p) : m_A(T), m_p(p) { eigen_assert(T.rows() == T.cols()); eigen_assert(p > -1 && p < 1); } template void MatrixPowerAtomic::compute(ResultType& res) const { using std::pow; switch (m_A.rows()) { case 0: break; case 1: res(0,0) = pow(m_A(0,0), m_p); break; case 2: compute2x2(res, m_p); break; default: computeBig(res); } } template void MatrixPowerAtomic::computePade(int degree, const MatrixType& IminusT, ResultType& res) const { int i = 2*degree; res = (m_p-RealScalar(degree)) / RealScalar(2*i-2) * IminusT; for (--i; i; --i) { res = (MatrixType::Identity(IminusT.rows(), IminusT.cols()) + res).template triangularView() .solve((i==1 ? -m_p : i&1 ? (-m_p-RealScalar(i/2))/RealScalar(2*i) : (m_p-RealScalar(i/2))/RealScalar(2*i-2)) * IminusT).eval(); } res += MatrixType::Identity(IminusT.rows(), IminusT.cols()); } // This function assumes that res has the correct size (see bug 614) template void MatrixPowerAtomic::compute2x2(ResultType& res, RealScalar p) const { using std::abs; using std::pow; res.coeffRef(0,0) = pow(m_A.coeff(0,0), p); for (Index i=1; i < m_A.cols(); ++i) { res.coeffRef(i,i) = pow(m_A.coeff(i,i), p); if (m_A.coeff(i-1,i-1) == m_A.coeff(i,i)) res.coeffRef(i-1,i) = p * pow(m_A.coeff(i,i), p-1); else if (2*abs(m_A.coeff(i-1,i-1)) < abs(m_A.coeff(i,i)) || 2*abs(m_A.coeff(i,i)) < abs(m_A.coeff(i-1,i-1))) res.coeffRef(i-1,i) = (res.coeff(i,i)-res.coeff(i-1,i-1)) / (m_A.coeff(i,i)-m_A.coeff(i-1,i-1)); else res.coeffRef(i-1,i) = computeSuperDiag(m_A.coeff(i,i), m_A.coeff(i-1,i-1), p); res.coeffRef(i-1,i) *= m_A.coeff(i-1,i); } } template void MatrixPowerAtomic::computeBig(ResultType& res) const { using std::ldexp; const int digits = std::numeric_limits::digits; const RealScalar maxNormForPade = RealScalar( digits <= 24? 4.3386528e-1L // single precision : digits <= 53? 2.789358995219730e-1L // double precision : digits <= 64? 2.4471944416607995472e-1L // extended precision : digits <= 106? 1.1016843812851143391275867258512e-1L // double-double : 9.134603732914548552537150753385375e-2L); // quadruple precision MatrixType IminusT, sqrtT, T = m_A.template triangularView(); RealScalar normIminusT; int degree, degree2, numberOfSquareRoots = 0; bool hasExtraSquareRoot = false; for (Index i=0; i < m_A.cols(); ++i) eigen_assert(m_A(i,i) != RealScalar(0)); while (true) { IminusT = MatrixType::Identity(m_A.rows(), m_A.cols()) - T; normIminusT = IminusT.cwiseAbs().colwise().sum().maxCoeff(); if (normIminusT < maxNormForPade) { degree = getPadeDegree(normIminusT); degree2 = getPadeDegree(normIminusT/2); if (degree - degree2 <= 1 || hasExtraSquareRoot) break; hasExtraSquareRoot = true; } matrix_sqrt_triangular(T, sqrtT); T = sqrtT.template triangularView(); ++numberOfSquareRoots; } computePade(degree, IminusT, res); for (; numberOfSquareRoots; --numberOfSquareRoots) { compute2x2(res, ldexp(m_p, -numberOfSquareRoots)); res = res.template triangularView() * res; } compute2x2(res, m_p); } template inline int MatrixPowerAtomic::getPadeDegree(float normIminusT) { const float maxNormForPade[] = { 2.8064004e-1f /* degree = 3 */ , 4.3386528e-1f }; int degree = 3; for (; degree <= 4; ++degree) if (normIminusT <= maxNormForPade[degree - 3]) break; return degree; } template inline int MatrixPowerAtomic::getPadeDegree(double normIminusT) { const double maxNormForPade[] = { 1.884160592658218e-2 /* degree = 3 */ , 6.038881904059573e-2, 1.239917516308172e-1, 1.999045567181744e-1, 2.789358995219730e-1 }; int degree = 3; for (; degree <= 7; ++degree) if (normIminusT <= maxNormForPade[degree - 3]) break; return degree; } template inline int MatrixPowerAtomic::getPadeDegree(long double normIminusT) { #if LDBL_MANT_DIG == 53 const int maxPadeDegree = 7; const double maxNormForPade[] = { 1.884160592658218e-2L /* degree = 3 */ , 6.038881904059573e-2L, 1.239917516308172e-1L, 1.999045567181744e-1L, 2.789358995219730e-1L }; #elif LDBL_MANT_DIG <= 64 const int maxPadeDegree = 8; const long double maxNormForPade[] = { 6.3854693117491799460e-3L /* degree = 3 */ , 2.6394893435456973676e-2L, 6.4216043030404063729e-2L, 1.1701165502926694307e-1L, 1.7904284231268670284e-1L, 2.4471944416607995472e-1L }; #elif LDBL_MANT_DIG <= 106 const int maxPadeDegree = 10; const double maxNormForPade[] = { 1.0007161601787493236741409687186e-4L /* degree = 3 */ , 1.0007161601787493236741409687186e-3L, 4.7069769360887572939882574746264e-3L, 1.3220386624169159689406653101695e-2L, 2.8063482381631737920612944054906e-2L, 4.9625993951953473052385361085058e-2L, 7.7367040706027886224557538328171e-2L, 1.1016843812851143391275867258512e-1L }; #else const int maxPadeDegree = 10; const double maxNormForPade[] = { 5.524506147036624377378713555116378e-5L /* degree = 3 */ , 6.640600568157479679823602193345995e-4L, 3.227716520106894279249709728084626e-3L, 9.619593944683432960546978734646284e-3L, 2.134595382433742403911124458161147e-2L, 3.908166513900489428442993794761185e-2L, 6.266780814639442865832535460550138e-2L, 9.134603732914548552537150753385375e-2L }; #endif int degree = 3; for (; degree <= maxPadeDegree; ++degree) if (normIminusT <= maxNormForPade[degree - 3]) break; return degree; } template inline typename MatrixPowerAtomic::ComplexScalar MatrixPowerAtomic::computeSuperDiag(const ComplexScalar& curr, const ComplexScalar& prev, RealScalar p) { using std::ceil; using std::exp; using std::log; using std::sinh; ComplexScalar logCurr = log(curr); ComplexScalar logPrev = log(prev); RealScalar unwindingNumber = ceil((numext::imag(logCurr - logPrev) - RealScalar(EIGEN_PI)) / RealScalar(2*EIGEN_PI)); ComplexScalar w = numext::log1p((curr-prev)/prev)/RealScalar(2) + ComplexScalar(0, RealScalar(EIGEN_PI)*unwindingNumber); return RealScalar(2) * exp(RealScalar(0.5) * p * (logCurr + logPrev)) * sinh(p * w) / (curr - prev); } template inline typename MatrixPowerAtomic::RealScalar MatrixPowerAtomic::computeSuperDiag(RealScalar curr, RealScalar prev, RealScalar p) { using std::exp; using std::log; using std::sinh; RealScalar w = numext::log1p((curr-prev)/prev)/RealScalar(2); return 2 * exp(p * (log(curr) + log(prev)) / 2) * sinh(p * w) / (curr - prev); } /** * \ingroup MatrixFunctions_Module * * \brief Class for computing matrix powers. * * \tparam MatrixType type of the base, expected to be an instantiation * of the Matrix class template. * * This class is capable of computing real/complex matrices raised to * an arbitrary real power. Meanwhile, it saves the result of Schur * decomposition if an non-integral power has even been calculated. * Therefore, if you want to compute multiple (>= 2) matrix powers * for the same matrix, using the class directly is more efficient than * calling MatrixBase::pow(). * * Example: * \include MatrixPower_optimal.cpp * Output: \verbinclude MatrixPower_optimal.out */ template class MatrixPower : internal::noncopyable { private: typedef typename MatrixType::Scalar Scalar; typedef typename MatrixType::RealScalar RealScalar; public: /** * \brief Constructor. * * \param[in] A the base of the matrix power. * * The class stores a reference to A, so it should not be changed * (or destroyed) before evaluation. */ explicit MatrixPower(const MatrixType& A) : m_A(A), m_conditionNumber(0), m_rank(A.cols()), m_nulls(0) { eigen_assert(A.rows() == A.cols()); } /** * \brief Returns the matrix power. * * \param[in] p exponent, a real scalar. * \return The expression \f$ A^p \f$, where A is specified in the * constructor. */ const MatrixPowerParenthesesReturnValue operator()(RealScalar p) { return MatrixPowerParenthesesReturnValue(*this, p); } /** * \brief Compute the matrix power. * * \param[in] p exponent, a real scalar. * \param[out] res \f$ A^p \f$ where A is specified in the * constructor. */ template void compute(ResultType& res, RealScalar p); Index rows() const { return m_A.rows(); } Index cols() const { return m_A.cols(); } private: typedef std::complex ComplexScalar; typedef Matrix ComplexMatrix; /** \brief Reference to the base of matrix power. */ typename MatrixType::Nested m_A; /** \brief Temporary storage. */ MatrixType m_tmp; /** \brief Store the result of Schur decomposition. */ ComplexMatrix m_T, m_U; /** \brief Store fractional power of m_T. */ ComplexMatrix m_fT; /** * \brief Condition number of m_A. * * It is initialized as 0 to avoid performing unnecessary Schur * decomposition, which is the bottleneck. */ RealScalar m_conditionNumber; /** \brief Rank of m_A. */ Index m_rank; /** \brief Rank deficiency of m_A. */ Index m_nulls; /** * \brief Split p into integral part and fractional part. * * \param[in] p The exponent. * \param[out] p The fractional part ranging in \f$ (-1, 1) \f$. * \param[out] intpart The integral part. * * Only if the fractional part is nonzero, it calls initialize(). */ void split(RealScalar& p, RealScalar& intpart); /** \brief Perform Schur decomposition for fractional power. */ void initialize(); template void computeIntPower(ResultType& res, RealScalar p); template void computeFracPower(ResultType& res, RealScalar p); template static void revertSchur( Matrix& res, const ComplexMatrix& T, const ComplexMatrix& U); template static void revertSchur( Matrix& res, const ComplexMatrix& T, const ComplexMatrix& U); }; template template void MatrixPower::compute(ResultType& res, RealScalar p) { using std::pow; switch (cols()) { case 0: break; case 1: res(0,0) = pow(m_A.coeff(0,0), p); break; default: RealScalar intpart; split(p, intpart); res = MatrixType::Identity(rows(), cols()); computeIntPower(res, intpart); if (p) computeFracPower(res, p); } } template void MatrixPower::split(RealScalar& p, RealScalar& intpart) { using std::floor; using std::pow; intpart = floor(p); p -= intpart; // Perform Schur decomposition if it is not yet performed and the power is // not an integer. if (!m_conditionNumber && p) initialize(); // Choose the more stable of intpart = floor(p) and intpart = ceil(p). if (p > RealScalar(0.5) && p > (1-p) * pow(m_conditionNumber, p)) { --p; ++intpart; } } template void MatrixPower::initialize() { const ComplexSchur schurOfA(m_A); JacobiRotation rot; ComplexScalar eigenvalue; m_fT.resizeLike(m_A); m_T = schurOfA.matrixT(); m_U = schurOfA.matrixU(); m_conditionNumber = m_T.diagonal().array().abs().maxCoeff() / m_T.diagonal().array().abs().minCoeff(); // Move zero eigenvalues to the bottom right corner. for (Index i = cols()-1; i>=0; --i) { if (m_rank <= 2) return; if (m_T.coeff(i,i) == RealScalar(0)) { for (Index j=i+1; j < m_rank; ++j) { eigenvalue = m_T.coeff(j,j); rot.makeGivens(m_T.coeff(j-1,j), eigenvalue); m_T.applyOnTheRight(j-1, j, rot); m_T.applyOnTheLeft(j-1, j, rot.adjoint()); m_T.coeffRef(j-1,j-1) = eigenvalue; m_T.coeffRef(j,j) = RealScalar(0); m_U.applyOnTheRight(j-1, j, rot); } --m_rank; } } m_nulls = rows() - m_rank; if (m_nulls) { eigen_assert(m_T.bottomRightCorner(m_nulls, m_nulls).isZero() && "Base of matrix power should be invertible or with a semisimple zero eigenvalue."); m_fT.bottomRows(m_nulls).fill(RealScalar(0)); } } template template void MatrixPower::computeIntPower(ResultType& res, RealScalar p) { using std::abs; using std::fmod; RealScalar pp = abs(p); if (p<0) m_tmp = m_A.inverse(); else m_tmp = m_A; while (true) { if (fmod(pp, 2) >= 1) res = m_tmp * res; pp /= 2; if (pp < 1) break; m_tmp *= m_tmp; } } template template void MatrixPower::computeFracPower(ResultType& res, RealScalar p) { Block blockTp(m_fT, 0, 0, m_rank, m_rank); eigen_assert(m_conditionNumber); eigen_assert(m_rank + m_nulls == rows()); MatrixPowerAtomic(m_T.topLeftCorner(m_rank, m_rank), p).compute(blockTp); if (m_nulls) { m_fT.topRightCorner(m_rank, m_nulls) = m_T.topLeftCorner(m_rank, m_rank).template triangularView() .solve(blockTp * m_T.topRightCorner(m_rank, m_nulls)); } revertSchur(m_tmp, m_fT, m_U); res = m_tmp * res; } template template inline void MatrixPower::revertSchur( Matrix& res, const ComplexMatrix& T, const ComplexMatrix& U) { res.noalias() = U * (T.template triangularView() * U.adjoint()); } template template inline void MatrixPower::revertSchur( Matrix& res, const ComplexMatrix& T, const ComplexMatrix& U) { res.noalias() = (U * (T.template triangularView() * U.adjoint())).real(); } /** * \ingroup MatrixFunctions_Module * * \brief Proxy for the matrix power of some matrix (expression). * * \tparam Derived type of the base, a matrix (expression). * * This class holds the arguments to the matrix power until it is * assigned or evaluated for some other reason (so the argument * should not be changed in the meantime). It is the return type of * MatrixBase::pow() and related functions and most of the * time this is the only way it is used. */ template class MatrixPowerReturnValue : public ReturnByValue< MatrixPowerReturnValue > { public: typedef typename Derived::PlainObject PlainObject; typedef typename Derived::RealScalar RealScalar; /** * \brief Constructor. * * \param[in] A %Matrix (expression), the base of the matrix power. * \param[in] p real scalar, the exponent of the matrix power. */ MatrixPowerReturnValue(const Derived& A, RealScalar p) : m_A(A), m_p(p) { } /** * \brief Compute the matrix power. * * \param[out] result \f$ A^p \f$ where \p A and \p p are as in the * constructor. */ template inline void evalTo(ResultType& result) const { MatrixPower(m_A.eval()).compute(result, m_p); } Index rows() const { return m_A.rows(); } Index cols() const { return m_A.cols(); } private: const Derived& m_A; const RealScalar m_p; }; /** * \ingroup MatrixFunctions_Module * * \brief Proxy for the matrix power of some matrix (expression). * * \tparam Derived type of the base, a matrix (expression). * * This class holds the arguments to the matrix power until it is * assigned or evaluated for some other reason (so the argument * should not be changed in the meantime). It is the return type of * MatrixBase::pow() and related functions and most of the * time this is the only way it is used. */ template class MatrixComplexPowerReturnValue : public ReturnByValue< MatrixComplexPowerReturnValue > { public: typedef typename Derived::PlainObject PlainObject; typedef typename std::complex ComplexScalar; /** * \brief Constructor. * * \param[in] A %Matrix (expression), the base of the matrix power. * \param[in] p complex scalar, the exponent of the matrix power. */ MatrixComplexPowerReturnValue(const Derived& A, const ComplexScalar& p) : m_A(A), m_p(p) { } /** * \brief Compute the matrix power. * * Because \p p is complex, \f$ A^p \f$ is simply evaluated as \f$ * \exp(p \log(A)) \f$. * * \param[out] result \f$ A^p \f$ where \p A and \p p are as in the * constructor. */ template inline void evalTo(ResultType& result) const { result = (m_p * m_A.log()).exp(); } Index rows() const { return m_A.rows(); } Index cols() const { return m_A.cols(); } private: const Derived& m_A; const ComplexScalar m_p; }; namespace internal { template struct traits< MatrixPowerParenthesesReturnValue > { typedef typename MatrixPowerType::PlainObject ReturnType; }; template struct traits< MatrixPowerReturnValue > { typedef typename Derived::PlainObject ReturnType; }; template struct traits< MatrixComplexPowerReturnValue > { typedef typename Derived::PlainObject ReturnType; }; } template const MatrixPowerReturnValue MatrixBase::pow(const RealScalar& p) const { return MatrixPowerReturnValue(derived(), p); } template const MatrixComplexPowerReturnValue MatrixBase::pow(const std::complex& p) const { return MatrixComplexPowerReturnValue(derived(), p); } } // namespace Eigen #endif // EIGEN_MATRIX_POWER RcppEigen/inst/include/unsupported/Eigen/src/MatrixFunctions/MatrixFunction.h0000644000176200001440000005421714562417221027231 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009-2011, 2013 Jitse Niesen // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_MATRIX_FUNCTION_H #define EIGEN_MATRIX_FUNCTION_H #include "StemFunction.h" namespace Eigen { namespace internal { /** \brief Maximum distance allowed between eigenvalues to be considered "close". */ static const float matrix_function_separation = 0.1f; /** \ingroup MatrixFunctions_Module * \class MatrixFunctionAtomic * \brief Helper class for computing matrix functions of atomic matrices. * * Here, an atomic matrix is a triangular matrix whose diagonal entries are close to each other. */ template class MatrixFunctionAtomic { public: typedef typename MatrixType::Scalar Scalar; typedef typename stem_function::type StemFunction; /** \brief Constructor * \param[in] f matrix function to compute. */ MatrixFunctionAtomic(StemFunction f) : m_f(f) { } /** \brief Compute matrix function of atomic matrix * \param[in] A argument of matrix function, should be upper triangular and atomic * \returns f(A), the matrix function evaluated at the given matrix */ MatrixType compute(const MatrixType& A); private: StemFunction* m_f; }; template typename NumTraits::Real matrix_function_compute_mu(const MatrixType& A) { typedef typename plain_col_type::type VectorType; Index rows = A.rows(); const MatrixType N = MatrixType::Identity(rows, rows) - A; VectorType e = VectorType::Ones(rows); N.template triangularView().solveInPlace(e); return e.cwiseAbs().maxCoeff(); } template MatrixType MatrixFunctionAtomic::compute(const MatrixType& A) { // TODO: Use that A is upper triangular typedef typename NumTraits::Real RealScalar; Index rows = A.rows(); Scalar avgEival = A.trace() / Scalar(RealScalar(rows)); MatrixType Ashifted = A - avgEival * MatrixType::Identity(rows, rows); RealScalar mu = matrix_function_compute_mu(Ashifted); MatrixType F = m_f(avgEival, 0) * MatrixType::Identity(rows, rows); MatrixType P = Ashifted; MatrixType Fincr; for (Index s = 1; double(s) < 1.1 * double(rows) + 10.0; s++) { // upper limit is fairly arbitrary Fincr = m_f(avgEival, static_cast(s)) * P; F += Fincr; P = Scalar(RealScalar(1)/RealScalar(s + 1)) * P * Ashifted; // test whether Taylor series converged const RealScalar F_norm = F.cwiseAbs().rowwise().sum().maxCoeff(); const RealScalar Fincr_norm = Fincr.cwiseAbs().rowwise().sum().maxCoeff(); if (Fincr_norm < NumTraits::epsilon() * F_norm) { RealScalar delta = 0; RealScalar rfactorial = 1; for (Index r = 0; r < rows; r++) { RealScalar mx = 0; for (Index i = 0; i < rows; i++) mx = (std::max)(mx, std::abs(m_f(Ashifted(i, i) + avgEival, static_cast(s+r)))); if (r != 0) rfactorial *= RealScalar(r); delta = (std::max)(delta, mx / rfactorial); } const RealScalar P_norm = P.cwiseAbs().rowwise().sum().maxCoeff(); if (mu * delta * P_norm < NumTraits::epsilon() * F_norm) // series converged break; } } return F; } /** \brief Find cluster in \p clusters containing some value * \param[in] key Value to find * \returns Iterator to cluster containing \p key, or \c clusters.end() if no cluster in \p m_clusters * contains \p key. */ template typename ListOfClusters::iterator matrix_function_find_cluster(Index key, ListOfClusters& clusters) { typename std::list::iterator j; for (typename ListOfClusters::iterator i = clusters.begin(); i != clusters.end(); ++i) { j = std::find(i->begin(), i->end(), key); if (j != i->end()) return i; } return clusters.end(); } /** \brief Partition eigenvalues in clusters of ei'vals close to each other * * \param[in] eivals Eigenvalues * \param[out] clusters Resulting partition of eigenvalues * * The partition satisfies the following two properties: * # Any eigenvalue in a certain cluster is at most matrix_function_separation() away from another eigenvalue * in the same cluster. * # The distance between two eigenvalues in different clusters is more than matrix_function_separation(). * The implementation follows Algorithm 4.1 in the paper of Davies and Higham. */ template void matrix_function_partition_eigenvalues(const EivalsType& eivals, std::list& clusters) { typedef typename EivalsType::RealScalar RealScalar; for (Index i=0; i::iterator qi = matrix_function_find_cluster(i, clusters); if (qi == clusters.end()) { Cluster l; l.push_back(i); clusters.push_back(l); qi = clusters.end(); --qi; } // Look for other element to add to the set for (Index j=i+1; jbegin(), qi->end(), j) == qi->end()) { typename std::list::iterator qj = matrix_function_find_cluster(j, clusters); if (qj == clusters.end()) { qi->push_back(j); } else { qi->insert(qi->end(), qj->begin(), qj->end()); clusters.erase(qj); } } } } } /** \brief Compute size of each cluster given a partitioning */ template void matrix_function_compute_cluster_size(const ListOfClusters& clusters, Matrix& clusterSize) { const Index numClusters = static_cast(clusters.size()); clusterSize.setZero(numClusters); Index clusterIndex = 0; for (typename ListOfClusters::const_iterator cluster = clusters.begin(); cluster != clusters.end(); ++cluster) { clusterSize[clusterIndex] = cluster->size(); ++clusterIndex; } } /** \brief Compute start of each block using clusterSize */ template void matrix_function_compute_block_start(const VectorType& clusterSize, VectorType& blockStart) { blockStart.resize(clusterSize.rows()); blockStart(0) = 0; for (Index i = 1; i < clusterSize.rows(); i++) { blockStart(i) = blockStart(i-1) + clusterSize(i-1); } } /** \brief Compute mapping of eigenvalue indices to cluster indices */ template void matrix_function_compute_map(const EivalsType& eivals, const ListOfClusters& clusters, VectorType& eivalToCluster) { eivalToCluster.resize(eivals.rows()); Index clusterIndex = 0; for (typename ListOfClusters::const_iterator cluster = clusters.begin(); cluster != clusters.end(); ++cluster) { for (Index i = 0; i < eivals.rows(); ++i) { if (std::find(cluster->begin(), cluster->end(), i) != cluster->end()) { eivalToCluster[i] = clusterIndex; } } ++clusterIndex; } } /** \brief Compute permutation which groups ei'vals in same cluster together */ template void matrix_function_compute_permutation(const DynVectorType& blockStart, const DynVectorType& eivalToCluster, VectorType& permutation) { DynVectorType indexNextEntry = blockStart; permutation.resize(eivalToCluster.rows()); for (Index i = 0; i < eivalToCluster.rows(); i++) { Index cluster = eivalToCluster[i]; permutation[i] = indexNextEntry[cluster]; ++indexNextEntry[cluster]; } } /** \brief Permute Schur decomposition in U and T according to permutation */ template void matrix_function_permute_schur(VectorType& permutation, MatrixType& U, MatrixType& T) { for (Index i = 0; i < permutation.rows() - 1; i++) { Index j; for (j = i; j < permutation.rows(); j++) { if (permutation(j) == i) break; } eigen_assert(permutation(j) == i); for (Index k = j-1; k >= i; k--) { JacobiRotation rotation; rotation.makeGivens(T(k, k+1), T(k+1, k+1) - T(k, k)); T.applyOnTheLeft(k, k+1, rotation.adjoint()); T.applyOnTheRight(k, k+1, rotation); U.applyOnTheRight(k, k+1, rotation); std::swap(permutation.coeffRef(k), permutation.coeffRef(k+1)); } } } /** \brief Compute block diagonal part of matrix function. * * This routine computes the matrix function applied to the block diagonal part of \p T (which should be * upper triangular), with the blocking given by \p blockStart and \p clusterSize. The matrix function of * each diagonal block is computed by \p atomic. The off-diagonal parts of \p fT are set to zero. */ template void matrix_function_compute_block_atomic(const MatrixType& T, AtomicType& atomic, const VectorType& blockStart, const VectorType& clusterSize, MatrixType& fT) { fT.setZero(T.rows(), T.cols()); for (Index i = 0; i < clusterSize.rows(); ++i) { fT.block(blockStart(i), blockStart(i), clusterSize(i), clusterSize(i)) = atomic.compute(T.block(blockStart(i), blockStart(i), clusterSize(i), clusterSize(i))); } } /** \brief Solve a triangular Sylvester equation AX + XB = C * * \param[in] A the matrix A; should be square and upper triangular * \param[in] B the matrix B; should be square and upper triangular * \param[in] C the matrix C; should have correct size. * * \returns the solution X. * * If A is m-by-m and B is n-by-n, then both C and X are m-by-n. The (i,j)-th component of the Sylvester * equation is * \f[ * \sum_{k=i}^m A_{ik} X_{kj} + \sum_{k=1}^j X_{ik} B_{kj} = C_{ij}. * \f] * This can be re-arranged to yield: * \f[ * X_{ij} = \frac{1}{A_{ii} + B_{jj}} \Bigl( C_{ij} * - \sum_{k=i+1}^m A_{ik} X_{kj} - \sum_{k=1}^{j-1} X_{ik} B_{kj} \Bigr). * \f] * It is assumed that A and B are such that the numerator is never zero (otherwise the Sylvester equation * does not have a unique solution). In that case, these equations can be evaluated in the order * \f$ i=m,\ldots,1 \f$ and \f$ j=1,\ldots,n \f$. */ template MatrixType matrix_function_solve_triangular_sylvester(const MatrixType& A, const MatrixType& B, const MatrixType& C) { eigen_assert(A.rows() == A.cols()); eigen_assert(A.isUpperTriangular()); eigen_assert(B.rows() == B.cols()); eigen_assert(B.isUpperTriangular()); eigen_assert(C.rows() == A.rows()); eigen_assert(C.cols() == B.rows()); typedef typename MatrixType::Scalar Scalar; Index m = A.rows(); Index n = B.rows(); MatrixType X(m, n); for (Index i = m - 1; i >= 0; --i) { for (Index j = 0; j < n; ++j) { // Compute AX = \sum_{k=i+1}^m A_{ik} X_{kj} Scalar AX; if (i == m - 1) { AX = 0; } else { Matrix AXmatrix = A.row(i).tail(m-1-i) * X.col(j).tail(m-1-i); AX = AXmatrix(0,0); } // Compute XB = \sum_{k=1}^{j-1} X_{ik} B_{kj} Scalar XB; if (j == 0) { XB = 0; } else { Matrix XBmatrix = X.row(i).head(j) * B.col(j).head(j); XB = XBmatrix(0,0); } X(i,j) = (C(i,j) - AX - XB) / (A(i,i) + B(j,j)); } } return X; } /** \brief Compute part of matrix function above block diagonal. * * This routine completes the computation of \p fT, denoting a matrix function applied to the triangular * matrix \p T. It assumes that the block diagonal part of \p fT has already been computed. The part below * the diagonal is zero, because \p T is upper triangular. */ template void matrix_function_compute_above_diagonal(const MatrixType& T, const VectorType& blockStart, const VectorType& clusterSize, MatrixType& fT) { typedef internal::traits Traits; typedef typename MatrixType::Scalar Scalar; static const int Options = MatrixType::Options; typedef Matrix DynMatrixType; for (Index k = 1; k < clusterSize.rows(); k++) { for (Index i = 0; i < clusterSize.rows() - k; i++) { // compute (i, i+k) block DynMatrixType A = T.block(blockStart(i), blockStart(i), clusterSize(i), clusterSize(i)); DynMatrixType B = -T.block(blockStart(i+k), blockStart(i+k), clusterSize(i+k), clusterSize(i+k)); DynMatrixType C = fT.block(blockStart(i), blockStart(i), clusterSize(i), clusterSize(i)) * T.block(blockStart(i), blockStart(i+k), clusterSize(i), clusterSize(i+k)); C -= T.block(blockStart(i), blockStart(i+k), clusterSize(i), clusterSize(i+k)) * fT.block(blockStart(i+k), blockStart(i+k), clusterSize(i+k), clusterSize(i+k)); for (Index m = i + 1; m < i + k; m++) { C += fT.block(blockStart(i), blockStart(m), clusterSize(i), clusterSize(m)) * T.block(blockStart(m), blockStart(i+k), clusterSize(m), clusterSize(i+k)); C -= T.block(blockStart(i), blockStart(m), clusterSize(i), clusterSize(m)) * fT.block(blockStart(m), blockStart(i+k), clusterSize(m), clusterSize(i+k)); } fT.block(blockStart(i), blockStart(i+k), clusterSize(i), clusterSize(i+k)) = matrix_function_solve_triangular_sylvester(A, B, C); } } } /** \ingroup MatrixFunctions_Module * \brief Class for computing matrix functions. * \tparam MatrixType type of the argument of the matrix function, * expected to be an instantiation of the Matrix class template. * \tparam AtomicType type for computing matrix function of atomic blocks. * \tparam IsComplex used internally to select correct specialization. * * This class implements the Schur-Parlett algorithm for computing matrix functions. The spectrum of the * matrix is divided in clustered of eigenvalues that lies close together. This class delegates the * computation of the matrix function on every block corresponding to these clusters to an object of type * \p AtomicType and uses these results to compute the matrix function of the whole matrix. The class * \p AtomicType should have a \p compute() member function for computing the matrix function of a block. * * \sa class MatrixFunctionAtomic, class MatrixLogarithmAtomic */ template ::Scalar>::IsComplex> struct matrix_function_compute { /** \brief Compute the matrix function. * * \param[in] A argument of matrix function, should be a square matrix. * \param[in] atomic class for computing matrix function of atomic blocks. * \param[out] result the function \p f applied to \p A, as * specified in the constructor. * * See MatrixBase::matrixFunction() for details on how this computation * is implemented. */ template static void run(const MatrixType& A, AtomicType& atomic, ResultType &result); }; /** \internal \ingroup MatrixFunctions_Module * \brief Partial specialization of MatrixFunction for real matrices * * This converts the real matrix to a complex matrix, compute the matrix function of that matrix, and then * converts the result back to a real matrix. */ template struct matrix_function_compute { template static void run(const MatA& A, AtomicType& atomic, ResultType &result) { typedef internal::traits Traits; typedef typename Traits::Scalar Scalar; static const int Rows = Traits::RowsAtCompileTime, Cols = Traits::ColsAtCompileTime; static const int MaxRows = Traits::MaxRowsAtCompileTime, MaxCols = Traits::MaxColsAtCompileTime; typedef std::complex ComplexScalar; typedef Matrix ComplexMatrix; ComplexMatrix CA = A.template cast(); ComplexMatrix Cresult; matrix_function_compute::run(CA, atomic, Cresult); result = Cresult.real(); } }; /** \internal \ingroup MatrixFunctions_Module * \brief Partial specialization of MatrixFunction for complex matrices */ template struct matrix_function_compute { template static void run(const MatA& A, AtomicType& atomic, ResultType &result) { typedef internal::traits Traits; // compute Schur decomposition of A const ComplexSchur schurOfA(A); eigen_assert(schurOfA.info()==Success); MatrixType T = schurOfA.matrixT(); MatrixType U = schurOfA.matrixU(); // partition eigenvalues into clusters of ei'vals "close" to each other std::list > clusters; matrix_function_partition_eigenvalues(T.diagonal(), clusters); // compute size of each cluster Matrix clusterSize; matrix_function_compute_cluster_size(clusters, clusterSize); // blockStart[i] is row index at which block corresponding to i-th cluster starts Matrix blockStart; matrix_function_compute_block_start(clusterSize, blockStart); // compute map so that eivalToCluster[i] = j means that i-th ei'val is in j-th cluster Matrix eivalToCluster; matrix_function_compute_map(T.diagonal(), clusters, eivalToCluster); // compute permutation which groups ei'vals in same cluster together Matrix permutation; matrix_function_compute_permutation(blockStart, eivalToCluster, permutation); // permute Schur decomposition matrix_function_permute_schur(permutation, U, T); // compute result MatrixType fT; // matrix function applied to T matrix_function_compute_block_atomic(T, atomic, blockStart, clusterSize, fT); matrix_function_compute_above_diagonal(T, blockStart, clusterSize, fT); result = U * (fT.template triangularView() * U.adjoint()); } }; } // end of namespace internal /** \ingroup MatrixFunctions_Module * * \brief Proxy for the matrix function of some matrix (expression). * * \tparam Derived Type of the argument to the matrix function. * * This class holds the argument to the matrix function until it is assigned or evaluated for some other * reason (so the argument should not be changed in the meantime). It is the return type of * matrixBase::matrixFunction() and related functions and most of the time this is the only way it is used. */ template class MatrixFunctionReturnValue : public ReturnByValue > { public: typedef typename Derived::Scalar Scalar; typedef typename internal::stem_function::type StemFunction; protected: typedef typename internal::ref_selector::type DerivedNested; public: /** \brief Constructor. * * \param[in] A %Matrix (expression) forming the argument of the matrix function. * \param[in] f Stem function for matrix function under consideration. */ MatrixFunctionReturnValue(const Derived& A, StemFunction f) : m_A(A), m_f(f) { } /** \brief Compute the matrix function. * * \param[out] result \p f applied to \p A, where \p f and \p A are as in the constructor. */ template inline void evalTo(ResultType& result) const { typedef typename internal::nested_eval::type NestedEvalType; typedef typename internal::remove_all::type NestedEvalTypeClean; typedef internal::traits Traits; typedef std::complex::Real> ComplexScalar; typedef Matrix DynMatrixType; typedef internal::MatrixFunctionAtomic AtomicType; AtomicType atomic(m_f); internal::matrix_function_compute::run(m_A, atomic, result); } Index rows() const { return m_A.rows(); } Index cols() const { return m_A.cols(); } private: const DerivedNested m_A; StemFunction *m_f; }; namespace internal { template struct traits > { typedef typename Derived::PlainObject ReturnType; }; } /********** MatrixBase methods **********/ template const MatrixFunctionReturnValue MatrixBase::matrixFunction(typename internal::stem_function::Scalar>::type f) const { eigen_assert(rows() == cols()); return MatrixFunctionReturnValue(derived(), f); } template const MatrixFunctionReturnValue MatrixBase::sin() const { eigen_assert(rows() == cols()); typedef typename internal::stem_function::ComplexScalar ComplexScalar; return MatrixFunctionReturnValue(derived(), internal::stem_function_sin); } template const MatrixFunctionReturnValue MatrixBase::cos() const { eigen_assert(rows() == cols()); typedef typename internal::stem_function::ComplexScalar ComplexScalar; return MatrixFunctionReturnValue(derived(), internal::stem_function_cos); } template const MatrixFunctionReturnValue MatrixBase::sinh() const { eigen_assert(rows() == cols()); typedef typename internal::stem_function::ComplexScalar ComplexScalar; return MatrixFunctionReturnValue(derived(), internal::stem_function_sinh); } template const MatrixFunctionReturnValue MatrixBase::cosh() const { eigen_assert(rows() == cols()); typedef typename internal::stem_function::ComplexScalar ComplexScalar; return MatrixFunctionReturnValue(derived(), internal::stem_function_cosh); } } // end namespace Eigen #endif // EIGEN_MATRIX_FUNCTION_H RcppEigen/inst/include/unsupported/Eigen/src/MatrixFunctions/StemFunction.h0000644000176200001440000000407314107270226026665 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2010, 2013 Jitse Niesen // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_STEM_FUNCTION #define EIGEN_STEM_FUNCTION namespace Eigen { namespace internal { /** \brief The exponential function (and its derivatives). */ template Scalar stem_function_exp(Scalar x, int) { using std::exp; return exp(x); } /** \brief Cosine (and its derivatives). */ template Scalar stem_function_cos(Scalar x, int n) { using std::cos; using std::sin; Scalar res; switch (n % 4) { case 0: res = std::cos(x); break; case 1: res = -std::sin(x); break; case 2: res = -std::cos(x); break; case 3: res = std::sin(x); break; } return res; } /** \brief Sine (and its derivatives). */ template Scalar stem_function_sin(Scalar x, int n) { using std::cos; using std::sin; Scalar res; switch (n % 4) { case 0: res = std::sin(x); break; case 1: res = std::cos(x); break; case 2: res = -std::sin(x); break; case 3: res = -std::cos(x); break; } return res; } /** \brief Hyperbolic cosine (and its derivatives). */ template Scalar stem_function_cosh(Scalar x, int n) { using std::cosh; using std::sinh; Scalar res; switch (n % 2) { case 0: res = std::cosh(x); break; case 1: res = std::sinh(x); break; } return res; } /** \brief Hyperbolic sine (and its derivatives). */ template Scalar stem_function_sinh(Scalar x, int n) { using std::cosh; using std::sinh; Scalar res; switch (n % 2) { case 0: res = std::sinh(x); break; case 1: res = std::cosh(x); break; } return res; } } // end namespace internal } // end namespace Eigen #endif // EIGEN_STEM_FUNCTION RcppEigen/inst/include/unsupported/Eigen/src/MatrixFunctions/MatrixSquareRoot.h0000644000176200001440000003360414562417221027545 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2011, 2013 Jitse Niesen // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_MATRIX_SQUARE_ROOT #define EIGEN_MATRIX_SQUARE_ROOT namespace Eigen { namespace internal { // pre: T.block(i,i,2,2) has complex conjugate eigenvalues // post: sqrtT.block(i,i,2,2) is square root of T.block(i,i,2,2) template void matrix_sqrt_quasi_triangular_2x2_diagonal_block(const MatrixType& T, Index i, ResultType& sqrtT) { // TODO: This case (2-by-2 blocks with complex conjugate eigenvalues) is probably hidden somewhere // in EigenSolver. If we expose it, we could call it directly from here. typedef typename traits::Scalar Scalar; Matrix block = T.template block<2,2>(i,i); EigenSolver > es(block); sqrtT.template block<2,2>(i,i) = (es.eigenvectors() * es.eigenvalues().cwiseSqrt().asDiagonal() * es.eigenvectors().inverse()).real(); } // pre: block structure of T is such that (i,j) is a 1x1 block, // all blocks of sqrtT to left of and below (i,j) are correct // post: sqrtT(i,j) has the correct value template void matrix_sqrt_quasi_triangular_1x1_off_diagonal_block(const MatrixType& T, Index i, Index j, ResultType& sqrtT) { typedef typename traits::Scalar Scalar; Scalar tmp = (sqrtT.row(i).segment(i+1,j-i-1) * sqrtT.col(j).segment(i+1,j-i-1)).value(); sqrtT.coeffRef(i,j) = (T.coeff(i,j) - tmp) / (sqrtT.coeff(i,i) + sqrtT.coeff(j,j)); } // similar to compute1x1offDiagonalBlock() template void matrix_sqrt_quasi_triangular_1x2_off_diagonal_block(const MatrixType& T, Index i, Index j, ResultType& sqrtT) { typedef typename traits::Scalar Scalar; Matrix rhs = T.template block<1,2>(i,j); if (j-i > 1) rhs -= sqrtT.block(i, i+1, 1, j-i-1) * sqrtT.block(i+1, j, j-i-1, 2); Matrix A = sqrtT.coeff(i,i) * Matrix::Identity(); A += sqrtT.template block<2,2>(j,j).transpose(); sqrtT.template block<1,2>(i,j).transpose() = A.fullPivLu().solve(rhs.transpose()); } // similar to compute1x1offDiagonalBlock() template void matrix_sqrt_quasi_triangular_2x1_off_diagonal_block(const MatrixType& T, Index i, Index j, ResultType& sqrtT) { typedef typename traits::Scalar Scalar; Matrix rhs = T.template block<2,1>(i,j); if (j-i > 2) rhs -= sqrtT.block(i, i+2, 2, j-i-2) * sqrtT.block(i+2, j, j-i-2, 1); Matrix A = sqrtT.coeff(j,j) * Matrix::Identity(); A += sqrtT.template block<2,2>(i,i); sqrtT.template block<2,1>(i,j) = A.fullPivLu().solve(rhs); } // solves the equation A X + X B = C where all matrices are 2-by-2 template void matrix_sqrt_quasi_triangular_solve_auxiliary_equation(MatrixType& X, const MatrixType& A, const MatrixType& B, const MatrixType& C) { typedef typename traits::Scalar Scalar; Matrix coeffMatrix = Matrix::Zero(); coeffMatrix.coeffRef(0,0) = A.coeff(0,0) + B.coeff(0,0); coeffMatrix.coeffRef(1,1) = A.coeff(0,0) + B.coeff(1,1); coeffMatrix.coeffRef(2,2) = A.coeff(1,1) + B.coeff(0,0); coeffMatrix.coeffRef(3,3) = A.coeff(1,1) + B.coeff(1,1); coeffMatrix.coeffRef(0,1) = B.coeff(1,0); coeffMatrix.coeffRef(0,2) = A.coeff(0,1); coeffMatrix.coeffRef(1,0) = B.coeff(0,1); coeffMatrix.coeffRef(1,3) = A.coeff(0,1); coeffMatrix.coeffRef(2,0) = A.coeff(1,0); coeffMatrix.coeffRef(2,3) = B.coeff(1,0); coeffMatrix.coeffRef(3,1) = A.coeff(1,0); coeffMatrix.coeffRef(3,2) = B.coeff(0,1); Matrix rhs; rhs.coeffRef(0) = C.coeff(0,0); rhs.coeffRef(1) = C.coeff(0,1); rhs.coeffRef(2) = C.coeff(1,0); rhs.coeffRef(3) = C.coeff(1,1); Matrix result; result = coeffMatrix.fullPivLu().solve(rhs); X.coeffRef(0,0) = result.coeff(0); X.coeffRef(0,1) = result.coeff(1); X.coeffRef(1,0) = result.coeff(2); X.coeffRef(1,1) = result.coeff(3); } // similar to compute1x1offDiagonalBlock() template void matrix_sqrt_quasi_triangular_2x2_off_diagonal_block(const MatrixType& T, Index i, Index j, ResultType& sqrtT) { typedef typename traits::Scalar Scalar; Matrix A = sqrtT.template block<2,2>(i,i); Matrix B = sqrtT.template block<2,2>(j,j); Matrix C = T.template block<2,2>(i,j); if (j-i > 2) C -= sqrtT.block(i, i+2, 2, j-i-2) * sqrtT.block(i+2, j, j-i-2, 2); Matrix X; matrix_sqrt_quasi_triangular_solve_auxiliary_equation(X, A, B, C); sqrtT.template block<2,2>(i,j) = X; } // pre: T is quasi-upper-triangular and sqrtT is a zero matrix of the same size // post: the diagonal blocks of sqrtT are the square roots of the diagonal blocks of T template void matrix_sqrt_quasi_triangular_diagonal(const MatrixType& T, ResultType& sqrtT) { using std::sqrt; const Index size = T.rows(); for (Index i = 0; i < size; i++) { if (i == size - 1 || T.coeff(i+1, i) == 0) { eigen_assert(T(i,i) >= 0); sqrtT.coeffRef(i,i) = sqrt(T.coeff(i,i)); } else { matrix_sqrt_quasi_triangular_2x2_diagonal_block(T, i, sqrtT); ++i; } } } // pre: T is quasi-upper-triangular and diagonal blocks of sqrtT are square root of diagonal blocks of T. // post: sqrtT is the square root of T. template void matrix_sqrt_quasi_triangular_off_diagonal(const MatrixType& T, ResultType& sqrtT) { const Index size = T.rows(); for (Index j = 1; j < size; j++) { if (T.coeff(j, j-1) != 0) // if T(j-1:j, j-1:j) is a 2-by-2 block continue; for (Index i = j-1; i >= 0; i--) { if (i > 0 && T.coeff(i, i-1) != 0) // if T(i-1:i, i-1:i) is a 2-by-2 block continue; bool iBlockIs2x2 = (i < size - 1) && (T.coeff(i+1, i) != 0); bool jBlockIs2x2 = (j < size - 1) && (T.coeff(j+1, j) != 0); if (iBlockIs2x2 && jBlockIs2x2) matrix_sqrt_quasi_triangular_2x2_off_diagonal_block(T, i, j, sqrtT); else if (iBlockIs2x2 && !jBlockIs2x2) matrix_sqrt_quasi_triangular_2x1_off_diagonal_block(T, i, j, sqrtT); else if (!iBlockIs2x2 && jBlockIs2x2) matrix_sqrt_quasi_triangular_1x2_off_diagonal_block(T, i, j, sqrtT); else if (!iBlockIs2x2 && !jBlockIs2x2) matrix_sqrt_quasi_triangular_1x1_off_diagonal_block(T, i, j, sqrtT); } } } } // end of namespace internal /** \ingroup MatrixFunctions_Module * \brief Compute matrix square root of quasi-triangular matrix. * * \tparam MatrixType type of \p arg, the argument of matrix square root, * expected to be an instantiation of the Matrix class template. * \tparam ResultType type of \p result, where result is to be stored. * \param[in] arg argument of matrix square root. * \param[out] result matrix square root of upper Hessenberg part of \p arg. * * This function computes the square root of the upper quasi-triangular matrix stored in the upper * Hessenberg part of \p arg. Only the upper Hessenberg part of \p result is updated, the rest is * not touched. See MatrixBase::sqrt() for details on how this computation is implemented. * * \sa MatrixSquareRoot, MatrixSquareRootQuasiTriangular */ template void matrix_sqrt_quasi_triangular(const MatrixType &arg, ResultType &result) { eigen_assert(arg.rows() == arg.cols()); result.resize(arg.rows(), arg.cols()); internal::matrix_sqrt_quasi_triangular_diagonal(arg, result); internal::matrix_sqrt_quasi_triangular_off_diagonal(arg, result); } /** \ingroup MatrixFunctions_Module * \brief Compute matrix square root of triangular matrix. * * \tparam MatrixType type of \p arg, the argument of matrix square root, * expected to be an instantiation of the Matrix class template. * \tparam ResultType type of \p result, where result is to be stored. * \param[in] arg argument of matrix square root. * \param[out] result matrix square root of upper triangular part of \p arg. * * Only the upper triangular part (including the diagonal) of \p result is updated, the rest is not * touched. See MatrixBase::sqrt() for details on how this computation is implemented. * * \sa MatrixSquareRoot, MatrixSquareRootQuasiTriangular */ template void matrix_sqrt_triangular(const MatrixType &arg, ResultType &result) { using std::sqrt; typedef typename MatrixType::Scalar Scalar; eigen_assert(arg.rows() == arg.cols()); // Compute square root of arg and store it in upper triangular part of result // This uses that the square root of triangular matrices can be computed directly. result.resize(arg.rows(), arg.cols()); for (Index i = 0; i < arg.rows(); i++) { result.coeffRef(i,i) = sqrt(arg.coeff(i,i)); } for (Index j = 1; j < arg.cols(); j++) { for (Index i = j-1; i >= 0; i--) { // if i = j-1, then segment has length 0 so tmp = 0 Scalar tmp = (result.row(i).segment(i+1,j-i-1) * result.col(j).segment(i+1,j-i-1)).value(); // denominator may be zero if original matrix is singular result.coeffRef(i,j) = (arg.coeff(i,j) - tmp) / (result.coeff(i,i) + result.coeff(j,j)); } } } namespace internal { /** \ingroup MatrixFunctions_Module * \brief Helper struct for computing matrix square roots of general matrices. * \tparam MatrixType type of the argument of the matrix square root, * expected to be an instantiation of the Matrix class template. * * \sa MatrixSquareRootTriangular, MatrixSquareRootQuasiTriangular, MatrixBase::sqrt() */ template ::Scalar>::IsComplex> struct matrix_sqrt_compute { /** \brief Compute the matrix square root * * \param[in] arg matrix whose square root is to be computed. * \param[out] result square root of \p arg. * * See MatrixBase::sqrt() for details on how this computation is implemented. */ template static void run(const MatrixType &arg, ResultType &result); }; // ********** Partial specialization for real matrices ********** template struct matrix_sqrt_compute { typedef typename MatrixType::PlainObject PlainType; template static void run(const MatrixType &arg, ResultType &result) { eigen_assert(arg.rows() == arg.cols()); // Compute Schur decomposition of arg const RealSchur schurOfA(arg); const PlainType& T = schurOfA.matrixT(); const PlainType& U = schurOfA.matrixU(); // Compute square root of T PlainType sqrtT = PlainType::Zero(arg.rows(), arg.cols()); matrix_sqrt_quasi_triangular(T, sqrtT); // Compute square root of arg result = U * sqrtT * U.adjoint(); } }; // ********** Partial specialization for complex matrices ********** template struct matrix_sqrt_compute { typedef typename MatrixType::PlainObject PlainType; template static void run(const MatrixType &arg, ResultType &result) { eigen_assert(arg.rows() == arg.cols()); // Compute Schur decomposition of arg const ComplexSchur schurOfA(arg); const PlainType& T = schurOfA.matrixT(); const PlainType& U = schurOfA.matrixU(); // Compute square root of T PlainType sqrtT; matrix_sqrt_triangular(T, sqrtT); // Compute square root of arg result = U * (sqrtT.template triangularView() * U.adjoint()); } }; } // end namespace internal /** \ingroup MatrixFunctions_Module * * \brief Proxy for the matrix square root of some matrix (expression). * * \tparam Derived Type of the argument to the matrix square root. * * This class holds the argument to the matrix square root until it * is assigned or evaluated for some other reason (so the argument * should not be changed in the meantime). It is the return type of * MatrixBase::sqrt() and most of the time this is the only way it is * used. */ template class MatrixSquareRootReturnValue : public ReturnByValue > { protected: typedef typename internal::ref_selector::type DerivedNested; public: /** \brief Constructor. * * \param[in] src %Matrix (expression) forming the argument of the * matrix square root. */ explicit MatrixSquareRootReturnValue(const Derived& src) : m_src(src) { } /** \brief Compute the matrix square root. * * \param[out] result the matrix square root of \p src in the * constructor. */ template inline void evalTo(ResultType& result) const { typedef typename internal::nested_eval::type DerivedEvalType; typedef typename internal::remove_all::type DerivedEvalTypeClean; DerivedEvalType tmp(m_src); internal::matrix_sqrt_compute::run(tmp, result); } Index rows() const { return m_src.rows(); } Index cols() const { return m_src.cols(); } protected: const DerivedNested m_src; }; namespace internal { template struct traits > { typedef typename Derived::PlainObject ReturnType; }; } template const MatrixSquareRootReturnValue MatrixBase::sqrt() const { eigen_assert(rows() == cols()); return MatrixSquareRootReturnValue(derived()); } } // end namespace Eigen #endif // EIGEN_MATRIX_FUNCTION RcppEigen/inst/include/unsupported/Eigen/src/MatrixFunctions/MatrixExponential.h0000644000176200001440000004036014562417221027724 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009, 2010, 2013 Jitse Niesen // Copyright (C) 2011, 2013 Chen-Pang He // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_MATRIX_EXPONENTIAL #define EIGEN_MATRIX_EXPONENTIAL #include "StemFunction.h" namespace Eigen { namespace internal { /** \brief Scaling operator. * * This struct is used by CwiseUnaryOp to scale a matrix by \f$ 2^{-s} \f$. */ template struct MatrixExponentialScalingOp { /** \brief Constructor. * * \param[in] squarings The integer \f$ s \f$ in this document. */ MatrixExponentialScalingOp(int squarings) : m_squarings(squarings) { } /** \brief Scale a matrix coefficient. * * \param[in,out] x The scalar to be scaled, becoming \f$ 2^{-s} x \f$. */ inline const RealScalar operator() (const RealScalar& x) const { using std::ldexp; return ldexp(x, -m_squarings); } typedef std::complex ComplexScalar; /** \brief Scale a matrix coefficient. * * \param[in,out] x The scalar to be scaled, becoming \f$ 2^{-s} x \f$. */ inline const ComplexScalar operator() (const ComplexScalar& x) const { using std::ldexp; return ComplexScalar(ldexp(x.real(), -m_squarings), ldexp(x.imag(), -m_squarings)); } private: int m_squarings; }; /** \brief Compute the (3,3)-Padé approximant to the exponential. * * After exit, \f$ (V+U)(V-U)^{-1} \f$ is the Padé * approximant of \f$ \exp(A) \f$ around \f$ A = 0 \f$. */ template void matrix_exp_pade3(const MatA& A, MatU& U, MatV& V) { typedef typename MatA::PlainObject MatrixType; typedef typename NumTraits::Scalar>::Real RealScalar; const RealScalar b[] = {120.L, 60.L, 12.L, 1.L}; const MatrixType A2 = A * A; const MatrixType tmp = b[3] * A2 + b[1] * MatrixType::Identity(A.rows(), A.cols()); U.noalias() = A * tmp; V = b[2] * A2 + b[0] * MatrixType::Identity(A.rows(), A.cols()); } /** \brief Compute the (5,5)-Padé approximant to the exponential. * * After exit, \f$ (V+U)(V-U)^{-1} \f$ is the Padé * approximant of \f$ \exp(A) \f$ around \f$ A = 0 \f$. */ template void matrix_exp_pade5(const MatA& A, MatU& U, MatV& V) { typedef typename MatA::PlainObject MatrixType; typedef typename NumTraits::Scalar>::Real RealScalar; const RealScalar b[] = {30240.L, 15120.L, 3360.L, 420.L, 30.L, 1.L}; const MatrixType A2 = A * A; const MatrixType A4 = A2 * A2; const MatrixType tmp = b[5] * A4 + b[3] * A2 + b[1] * MatrixType::Identity(A.rows(), A.cols()); U.noalias() = A * tmp; V = b[4] * A4 + b[2] * A2 + b[0] * MatrixType::Identity(A.rows(), A.cols()); } /** \brief Compute the (7,7)-Padé approximant to the exponential. * * After exit, \f$ (V+U)(V-U)^{-1} \f$ is the Padé * approximant of \f$ \exp(A) \f$ around \f$ A = 0 \f$. */ template void matrix_exp_pade7(const MatA& A, MatU& U, MatV& V) { typedef typename MatA::PlainObject MatrixType; typedef typename NumTraits::Scalar>::Real RealScalar; const RealScalar b[] = {17297280.L, 8648640.L, 1995840.L, 277200.L, 25200.L, 1512.L, 56.L, 1.L}; const MatrixType A2 = A * A; const MatrixType A4 = A2 * A2; const MatrixType A6 = A4 * A2; const MatrixType tmp = b[7] * A6 + b[5] * A4 + b[3] * A2 + b[1] * MatrixType::Identity(A.rows(), A.cols()); U.noalias() = A * tmp; V = b[6] * A6 + b[4] * A4 + b[2] * A2 + b[0] * MatrixType::Identity(A.rows(), A.cols()); } /** \brief Compute the (9,9)-Padé approximant to the exponential. * * After exit, \f$ (V+U)(V-U)^{-1} \f$ is the Padé * approximant of \f$ \exp(A) \f$ around \f$ A = 0 \f$. */ template void matrix_exp_pade9(const MatA& A, MatU& U, MatV& V) { typedef typename MatA::PlainObject MatrixType; typedef typename NumTraits::Scalar>::Real RealScalar; const RealScalar b[] = {17643225600.L, 8821612800.L, 2075673600.L, 302702400.L, 30270240.L, 2162160.L, 110880.L, 3960.L, 90.L, 1.L}; const MatrixType A2 = A * A; const MatrixType A4 = A2 * A2; const MatrixType A6 = A4 * A2; const MatrixType A8 = A6 * A2; const MatrixType tmp = b[9] * A8 + b[7] * A6 + b[5] * A4 + b[3] * A2 + b[1] * MatrixType::Identity(A.rows(), A.cols()); U.noalias() = A * tmp; V = b[8] * A8 + b[6] * A6 + b[4] * A4 + b[2] * A2 + b[0] * MatrixType::Identity(A.rows(), A.cols()); } /** \brief Compute the (13,13)-Padé approximant to the exponential. * * After exit, \f$ (V+U)(V-U)^{-1} \f$ is the Padé * approximant of \f$ \exp(A) \f$ around \f$ A = 0 \f$. */ template void matrix_exp_pade13(const MatA& A, MatU& U, MatV& V) { typedef typename MatA::PlainObject MatrixType; typedef typename NumTraits::Scalar>::Real RealScalar; const RealScalar b[] = {64764752532480000.L, 32382376266240000.L, 7771770303897600.L, 1187353796428800.L, 129060195264000.L, 10559470521600.L, 670442572800.L, 33522128640.L, 1323241920.L, 40840800.L, 960960.L, 16380.L, 182.L, 1.L}; const MatrixType A2 = A * A; const MatrixType A4 = A2 * A2; const MatrixType A6 = A4 * A2; V = b[13] * A6 + b[11] * A4 + b[9] * A2; // used for temporary storage MatrixType tmp = A6 * V; tmp += b[7] * A6 + b[5] * A4 + b[3] * A2 + b[1] * MatrixType::Identity(A.rows(), A.cols()); U.noalias() = A * tmp; tmp = b[12] * A6 + b[10] * A4 + b[8] * A2; V.noalias() = A6 * tmp; V += b[6] * A6 + b[4] * A4 + b[2] * A2 + b[0] * MatrixType::Identity(A.rows(), A.cols()); } /** \brief Compute the (17,17)-Padé approximant to the exponential. * * After exit, \f$ (V+U)(V-U)^{-1} \f$ is the Padé * approximant of \f$ \exp(A) \f$ around \f$ A = 0 \f$. * * This function activates only if your long double is double-double or quadruple. */ #if LDBL_MANT_DIG > 64 template void matrix_exp_pade17(const MatA& A, MatU& U, MatV& V) { typedef typename MatA::PlainObject MatrixType; typedef typename NumTraits::Scalar>::Real RealScalar; const RealScalar b[] = {830034394580628357120000.L, 415017197290314178560000.L, 100610229646136770560000.L, 15720348382208870400000.L, 1774878043152614400000.L, 153822763739893248000.L, 10608466464820224000.L, 595373117923584000.L, 27563570274240000.L, 1060137318240000.L, 33924394183680.L, 899510451840.L, 19554575040.L, 341863200.L, 4651200.L, 46512.L, 306.L, 1.L}; const MatrixType A2 = A * A; const MatrixType A4 = A2 * A2; const MatrixType A6 = A4 * A2; const MatrixType A8 = A4 * A4; V = b[17] * A8 + b[15] * A6 + b[13] * A4 + b[11] * A2; // used for temporary storage MatrixType tmp = A8 * V; tmp += b[9] * A8 + b[7] * A6 + b[5] * A4 + b[3] * A2 + b[1] * MatrixType::Identity(A.rows(), A.cols()); U.noalias() = A * tmp; tmp = b[16] * A8 + b[14] * A6 + b[12] * A4 + b[10] * A2; V.noalias() = tmp * A8; V += b[8] * A8 + b[6] * A6 + b[4] * A4 + b[2] * A2 + b[0] * MatrixType::Identity(A.rows(), A.cols()); } #endif template ::Scalar>::Real> struct matrix_exp_computeUV { /** \brief Compute Padé approximant to the exponential. * * Computes \c U, \c V and \c squarings such that \f$ (V+U)(V-U)^{-1} \f$ is a Padé * approximant of \f$ \exp(2^{-\mbox{squarings}}M) \f$ around \f$ M = 0 \f$, where \f$ M \f$ * denotes the matrix \c arg. The degree of the Padé approximant and the value of squarings * are chosen such that the approximation error is no more than the round-off error. */ static void run(const MatrixType& arg, MatrixType& U, MatrixType& V, int& squarings); }; template struct matrix_exp_computeUV { template static void run(const ArgType& arg, MatrixType& U, MatrixType& V, int& squarings) { using std::frexp; using std::pow; const float l1norm = arg.cwiseAbs().colwise().sum().maxCoeff(); squarings = 0; if (l1norm < 4.258730016922831e-001f) { matrix_exp_pade3(arg, U, V); } else if (l1norm < 1.880152677804762e+000f) { matrix_exp_pade5(arg, U, V); } else { const float maxnorm = 3.925724783138660f; frexp(l1norm / maxnorm, &squarings); if (squarings < 0) squarings = 0; MatrixType A = arg.unaryExpr(MatrixExponentialScalingOp(squarings)); matrix_exp_pade7(A, U, V); } } }; template struct matrix_exp_computeUV { typedef typename NumTraits::Scalar>::Real RealScalar; template static void run(const ArgType& arg, MatrixType& U, MatrixType& V, int& squarings) { using std::frexp; using std::pow; const RealScalar l1norm = arg.cwiseAbs().colwise().sum().maxCoeff(); squarings = 0; if (l1norm < 1.495585217958292e-002) { matrix_exp_pade3(arg, U, V); } else if (l1norm < 2.539398330063230e-001) { matrix_exp_pade5(arg, U, V); } else if (l1norm < 9.504178996162932e-001) { matrix_exp_pade7(arg, U, V); } else if (l1norm < 2.097847961257068e+000) { matrix_exp_pade9(arg, U, V); } else { const RealScalar maxnorm = 5.371920351148152; frexp(l1norm / maxnorm, &squarings); if (squarings < 0) squarings = 0; MatrixType A = arg.unaryExpr(MatrixExponentialScalingOp(squarings)); matrix_exp_pade13(A, U, V); } } }; template struct matrix_exp_computeUV { template static void run(const ArgType& arg, MatrixType& U, MatrixType& V, int& squarings) { #if LDBL_MANT_DIG == 53 // double precision matrix_exp_computeUV::run(arg, U, V, squarings); #else using std::frexp; using std::pow; const long double l1norm = arg.cwiseAbs().colwise().sum().maxCoeff(); squarings = 0; #if LDBL_MANT_DIG <= 64 // extended precision if (l1norm < 4.1968497232266989671e-003L) { matrix_exp_pade3(arg, U, V); } else if (l1norm < 1.1848116734693823091e-001L) { matrix_exp_pade5(arg, U, V); } else if (l1norm < 5.5170388480686700274e-001L) { matrix_exp_pade7(arg, U, V); } else if (l1norm < 1.3759868875587845383e+000L) { matrix_exp_pade9(arg, U, V); } else { const long double maxnorm = 4.0246098906697353063L; frexp(l1norm / maxnorm, &squarings); if (squarings < 0) squarings = 0; MatrixType A = arg.unaryExpr(MatrixExponentialScalingOp(squarings)); matrix_exp_pade13(A, U, V); } #elif LDBL_MANT_DIG <= 106 // double-double if (l1norm < 3.2787892205607026992947488108213e-005L) { matrix_exp_pade3(arg, U, V); } else if (l1norm < 6.4467025060072760084130906076332e-003L) { matrix_exp_pade5(arg, U, V); } else if (l1norm < 6.8988028496595374751374122881143e-002L) { matrix_exp_pade7(arg, U, V); } else if (l1norm < 2.7339737518502231741495857201670e-001L) { matrix_exp_pade9(arg, U, V); } else if (l1norm < 1.3203382096514474905666448850278e+000L) { matrix_exp_pade13(arg, U, V); } else { const long double maxnorm = 3.2579440895405400856599663723517L; frexp(l1norm / maxnorm, &squarings); if (squarings < 0) squarings = 0; MatrixType A = arg.unaryExpr(MatrixExponentialScalingOp(squarings)); matrix_exp_pade17(A, U, V); } #elif LDBL_MANT_DIG <= 113 // quadruple precision if (l1norm < 1.639394610288918690547467954466970e-005L) { matrix_exp_pade3(arg, U, V); } else if (l1norm < 4.253237712165275566025884344433009e-003L) { matrix_exp_pade5(arg, U, V); } else if (l1norm < 5.125804063165764409885122032933142e-002L) { matrix_exp_pade7(arg, U, V); } else if (l1norm < 2.170000765161155195453205651889853e-001L) { matrix_exp_pade9(arg, U, V); } else if (l1norm < 1.125358383453143065081397882891878e+000L) { matrix_exp_pade13(arg, U, V); } else { const long double maxnorm = 2.884233277829519311757165057717815L; frexp(l1norm / maxnorm, &squarings); if (squarings < 0) squarings = 0; MatrixType A = arg.unaryExpr(MatrixExponentialScalingOp(squarings)); matrix_exp_pade17(A, U, V); } #else // this case should be handled in compute() eigen_assert(false && "Bug in MatrixExponential"); #endif #endif // LDBL_MANT_DIG } }; template struct is_exp_known_type : false_type {}; template<> struct is_exp_known_type : true_type {}; template<> struct is_exp_known_type : true_type {}; #if LDBL_MANT_DIG <= 113 template<> struct is_exp_known_type : true_type {}; #endif template void matrix_exp_compute(const ArgType& arg, ResultType &result, true_type) // natively supported scalar type { typedef typename ArgType::PlainObject MatrixType; MatrixType U, V; int squarings; matrix_exp_computeUV::run(arg, U, V, squarings); // Pade approximant is (U+V) / (-U+V) MatrixType numer = U + V; MatrixType denom = -U + V; result = denom.partialPivLu().solve(numer); for (int i=0; i void matrix_exp_compute(const ArgType& arg, ResultType &result, false_type) // default { typedef typename ArgType::PlainObject MatrixType; typedef typename traits::Scalar Scalar; typedef typename NumTraits::Real RealScalar; typedef typename std::complex ComplexScalar; result = arg.matrixFunction(internal::stem_function_exp); } } // end namespace Eigen::internal /** \ingroup MatrixFunctions_Module * * \brief Proxy for the matrix exponential of some matrix (expression). * * \tparam Derived Type of the argument to the matrix exponential. * * This class holds the argument to the matrix exponential until it is assigned or evaluated for * some other reason (so the argument should not be changed in the meantime). It is the return type * of MatrixBase::exp() and most of the time this is the only way it is used. */ template struct MatrixExponentialReturnValue : public ReturnByValue > { public: /** \brief Constructor. * * \param src %Matrix (expression) forming the argument of the matrix exponential. */ MatrixExponentialReturnValue(const Derived& src) : m_src(src) { } /** \brief Compute the matrix exponential. * * \param result the matrix exponential of \p src in the constructor. */ template inline void evalTo(ResultType& result) const { const typename internal::nested_eval::type tmp(m_src); internal::matrix_exp_compute(tmp, result, internal::is_exp_known_type()); } Index rows() const { return m_src.rows(); } Index cols() const { return m_src.cols(); } protected: const typename internal::ref_selector::type m_src; }; namespace internal { template struct traits > { typedef typename Derived::PlainObject ReturnType; }; } template const MatrixExponentialReturnValue MatrixBase::exp() const { eigen_assert(rows() == cols()); return MatrixExponentialReturnValue(derived()); } } // end namespace Eigen #endif // EIGEN_MATRIX_EXPONENTIAL RcppEigen/inst/include/unsupported/Eigen/src/Eigenvalues/0000755000176200001440000000000014562417221023207 5ustar liggesusersRcppEigen/inst/include/unsupported/Eigen/src/Eigenvalues/ArpackSelfAdjointEigenSolver.h0000644000176200001440000007062314562417221031057 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2012 David Harmon // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_ARPACKGENERALIZEDSELFADJOINTEIGENSOLVER_H #define EIGEN_ARPACKGENERALIZEDSELFADJOINTEIGENSOLVER_H #include "../../../../Eigen/Dense" namespace Eigen { namespace internal { template struct arpack_wrapper; template struct OP; } template, bool BisSPD=false> class ArpackGeneralizedSelfAdjointEigenSolver { public: //typedef typename MatrixSolver::MatrixType MatrixType; /** \brief Scalar type for matrices of type \p MatrixType. */ typedef typename MatrixType::Scalar Scalar; typedef typename MatrixType::Index Index; /** \brief Real scalar type for \p MatrixType. * * This is just \c Scalar if #Scalar is real (e.g., \c float or * \c Scalar), and the type of the real part of \c Scalar if #Scalar is * complex. */ typedef typename NumTraits::Real RealScalar; /** \brief Type for vector of eigenvalues as returned by eigenvalues(). * * This is a column vector with entries of type #RealScalar. * The length of the vector is the size of \p nbrEigenvalues. */ typedef typename internal::plain_col_type::type RealVectorType; /** \brief Default constructor. * * The default constructor is for cases in which the user intends to * perform decompositions via compute(). * */ ArpackGeneralizedSelfAdjointEigenSolver() : m_eivec(), m_eivalues(), m_isInitialized(false), m_eigenvectorsOk(false), m_nbrConverged(0), m_nbrIterations(0) { } /** \brief Constructor; computes generalized eigenvalues of given matrix with respect to another matrix. * * \param[in] A Self-adjoint matrix whose eigenvalues / eigenvectors will * computed. By default, the upper triangular part is used, but can be changed * through the template parameter. * \param[in] B Self-adjoint matrix for the generalized eigenvalue problem. * \param[in] nbrEigenvalues The number of eigenvalues / eigenvectors to compute. * Must be less than the size of the input matrix, or an error is returned. * \param[in] eigs_sigma String containing either "LM", "SM", "LA", or "SA", with * respective meanings to find the largest magnitude , smallest magnitude, * largest algebraic, or smallest algebraic eigenvalues. Alternatively, this * value can contain floating point value in string form, in which case the * eigenvalues closest to this value will be found. * \param[in] options Can be #ComputeEigenvectors (default) or #EigenvaluesOnly. * \param[in] tol What tolerance to find the eigenvalues to. Default is 0, which * means machine precision. * * This constructor calls compute(const MatrixType&, const MatrixType&, Index, string, int, RealScalar) * to compute the eigenvalues of the matrix \p A with respect to \p B. The eigenvectors are computed if * \p options equals #ComputeEigenvectors. * */ ArpackGeneralizedSelfAdjointEigenSolver(const MatrixType& A, const MatrixType& B, Index nbrEigenvalues, std::string eigs_sigma="LM", int options=ComputeEigenvectors, RealScalar tol=0.0) : m_eivec(), m_eivalues(), m_isInitialized(false), m_eigenvectorsOk(false), m_nbrConverged(0), m_nbrIterations(0) { compute(A, B, nbrEigenvalues, eigs_sigma, options, tol); } /** \brief Constructor; computes eigenvalues of given matrix. * * \param[in] A Self-adjoint matrix whose eigenvalues / eigenvectors will * computed. By default, the upper triangular part is used, but can be changed * through the template parameter. * \param[in] nbrEigenvalues The number of eigenvalues / eigenvectors to compute. * Must be less than the size of the input matrix, or an error is returned. * \param[in] eigs_sigma String containing either "LM", "SM", "LA", or "SA", with * respective meanings to find the largest magnitude , smallest magnitude, * largest algebraic, or smallest algebraic eigenvalues. Alternatively, this * value can contain floating point value in string form, in which case the * eigenvalues closest to this value will be found. * \param[in] options Can be #ComputeEigenvectors (default) or #EigenvaluesOnly. * \param[in] tol What tolerance to find the eigenvalues to. Default is 0, which * means machine precision. * * This constructor calls compute(const MatrixType&, Index, string, int, RealScalar) * to compute the eigenvalues of the matrix \p A. The eigenvectors are computed if * \p options equals #ComputeEigenvectors. * */ ArpackGeneralizedSelfAdjointEigenSolver(const MatrixType& A, Index nbrEigenvalues, std::string eigs_sigma="LM", int options=ComputeEigenvectors, RealScalar tol=0.0) : m_eivec(), m_eivalues(), m_isInitialized(false), m_eigenvectorsOk(false), m_nbrConverged(0), m_nbrIterations(0) { compute(A, nbrEigenvalues, eigs_sigma, options, tol); } /** \brief Computes generalized eigenvalues / eigenvectors of given matrix using the external ARPACK library. * * \param[in] A Selfadjoint matrix whose eigendecomposition is to be computed. * \param[in] B Selfadjoint matrix for generalized eigenvalues. * \param[in] nbrEigenvalues The number of eigenvalues / eigenvectors to compute. * Must be less than the size of the input matrix, or an error is returned. * \param[in] eigs_sigma String containing either "LM", "SM", "LA", or "SA", with * respective meanings to find the largest magnitude , smallest magnitude, * largest algebraic, or smallest algebraic eigenvalues. Alternatively, this * value can contain floating point value in string form, in which case the * eigenvalues closest to this value will be found. * \param[in] options Can be #ComputeEigenvectors (default) or #EigenvaluesOnly. * \param[in] tol What tolerance to find the eigenvalues to. Default is 0, which * means machine precision. * * \returns Reference to \c *this * * This function computes the generalized eigenvalues of \p A with respect to \p B using ARPACK. The eigenvalues() * function can be used to retrieve them. If \p options equals #ComputeEigenvectors, * then the eigenvectors are also computed and can be retrieved by * calling eigenvectors(). * */ ArpackGeneralizedSelfAdjointEigenSolver& compute(const MatrixType& A, const MatrixType& B, Index nbrEigenvalues, std::string eigs_sigma="LM", int options=ComputeEigenvectors, RealScalar tol=0.0); /** \brief Computes eigenvalues / eigenvectors of given matrix using the external ARPACK library. * * \param[in] A Selfadjoint matrix whose eigendecomposition is to be computed. * \param[in] nbrEigenvalues The number of eigenvalues / eigenvectors to compute. * Must be less than the size of the input matrix, or an error is returned. * \param[in] eigs_sigma String containing either "LM", "SM", "LA", or "SA", with * respective meanings to find the largest magnitude , smallest magnitude, * largest algebraic, or smallest algebraic eigenvalues. Alternatively, this * value can contain floating point value in string form, in which case the * eigenvalues closest to this value will be found. * \param[in] options Can be #ComputeEigenvectors (default) or #EigenvaluesOnly. * \param[in] tol What tolerance to find the eigenvalues to. Default is 0, which * means machine precision. * * \returns Reference to \c *this * * This function computes the eigenvalues of \p A using ARPACK. The eigenvalues() * function can be used to retrieve them. If \p options equals #ComputeEigenvectors, * then the eigenvectors are also computed and can be retrieved by * calling eigenvectors(). * */ ArpackGeneralizedSelfAdjointEigenSolver& compute(const MatrixType& A, Index nbrEigenvalues, std::string eigs_sigma="LM", int options=ComputeEigenvectors, RealScalar tol=0.0); /** \brief Returns the eigenvectors of given matrix. * * \returns A const reference to the matrix whose columns are the eigenvectors. * * \pre The eigenvectors have been computed before. * * Column \f$ k \f$ of the returned matrix is an eigenvector corresponding * to eigenvalue number \f$ k \f$ as returned by eigenvalues(). The * eigenvectors are normalized to have (Euclidean) norm equal to one. If * this object was used to solve the eigenproblem for the selfadjoint * matrix \f$ A \f$, then the matrix returned by this function is the * matrix \f$ V \f$ in the eigendecomposition \f$ A V = D V \f$. * For the generalized eigenproblem, the matrix returned is the solution \f$ A V = D B V \f$ * * Example: \include SelfAdjointEigenSolver_eigenvectors.cpp * Output: \verbinclude SelfAdjointEigenSolver_eigenvectors.out * * \sa eigenvalues() */ const Matrix& eigenvectors() const { eigen_assert(m_isInitialized && "ArpackGeneralizedSelfAdjointEigenSolver is not initialized."); eigen_assert(m_eigenvectorsOk && "The eigenvectors have not been computed together with the eigenvalues."); return m_eivec; } /** \brief Returns the eigenvalues of given matrix. * * \returns A const reference to the column vector containing the eigenvalues. * * \pre The eigenvalues have been computed before. * * The eigenvalues are repeated according to their algebraic multiplicity, * so there are as many eigenvalues as rows in the matrix. The eigenvalues * are sorted in increasing order. * * Example: \include SelfAdjointEigenSolver_eigenvalues.cpp * Output: \verbinclude SelfAdjointEigenSolver_eigenvalues.out * * \sa eigenvectors(), MatrixBase::eigenvalues() */ const Matrix& eigenvalues() const { eigen_assert(m_isInitialized && "ArpackGeneralizedSelfAdjointEigenSolver is not initialized."); return m_eivalues; } /** \brief Computes the positive-definite square root of the matrix. * * \returns the positive-definite square root of the matrix * * \pre The eigenvalues and eigenvectors of a positive-definite matrix * have been computed before. * * The square root of a positive-definite matrix \f$ A \f$ is the * positive-definite matrix whose square equals \f$ A \f$. This function * uses the eigendecomposition \f$ A = V D V^{-1} \f$ to compute the * square root as \f$ A^{1/2} = V D^{1/2} V^{-1} \f$. * * Example: \include SelfAdjointEigenSolver_operatorSqrt.cpp * Output: \verbinclude SelfAdjointEigenSolver_operatorSqrt.out * * \sa operatorInverseSqrt(), * \ref MatrixFunctions_Module "MatrixFunctions Module" */ Matrix operatorSqrt() const { eigen_assert(m_isInitialized && "SelfAdjointEigenSolver is not initialized."); eigen_assert(m_eigenvectorsOk && "The eigenvectors have not been computed together with the eigenvalues."); return m_eivec * m_eivalues.cwiseSqrt().asDiagonal() * m_eivec.adjoint(); } /** \brief Computes the inverse square root of the matrix. * * \returns the inverse positive-definite square root of the matrix * * \pre The eigenvalues and eigenvectors of a positive-definite matrix * have been computed before. * * This function uses the eigendecomposition \f$ A = V D V^{-1} \f$ to * compute the inverse square root as \f$ V D^{-1/2} V^{-1} \f$. This is * cheaper than first computing the square root with operatorSqrt() and * then its inverse with MatrixBase::inverse(). * * Example: \include SelfAdjointEigenSolver_operatorInverseSqrt.cpp * Output: \verbinclude SelfAdjointEigenSolver_operatorInverseSqrt.out * * \sa operatorSqrt(), MatrixBase::inverse(), * \ref MatrixFunctions_Module "MatrixFunctions Module" */ Matrix operatorInverseSqrt() const { eigen_assert(m_isInitialized && "SelfAdjointEigenSolver is not initialized."); eigen_assert(m_eigenvectorsOk && "The eigenvectors have not been computed together with the eigenvalues."); return m_eivec * m_eivalues.cwiseInverse().cwiseSqrt().asDiagonal() * m_eivec.adjoint(); } /** \brief Reports whether previous computation was successful. * * \returns \c Success if computation was successful, \c NoConvergence otherwise. */ ComputationInfo info() const { eigen_assert(m_isInitialized && "ArpackGeneralizedSelfAdjointEigenSolver is not initialized."); return m_info; } size_t getNbrConvergedEigenValues() const { return m_nbrConverged; } size_t getNbrIterations() const { return m_nbrIterations; } protected: Matrix m_eivec; Matrix m_eivalues; ComputationInfo m_info; bool m_isInitialized; bool m_eigenvectorsOk; size_t m_nbrConverged; size_t m_nbrIterations; }; template ArpackGeneralizedSelfAdjointEigenSolver& ArpackGeneralizedSelfAdjointEigenSolver ::compute(const MatrixType& A, Index nbrEigenvalues, std::string eigs_sigma, int options, RealScalar tol) { MatrixType B(0,0); compute(A, B, nbrEigenvalues, eigs_sigma, options, tol); return *this; } template ArpackGeneralizedSelfAdjointEigenSolver& ArpackGeneralizedSelfAdjointEigenSolver ::compute(const MatrixType& A, const MatrixType& B, Index nbrEigenvalues, std::string eigs_sigma, int options, RealScalar tol) { eigen_assert(A.cols() == A.rows()); eigen_assert(B.cols() == B.rows()); eigen_assert(B.rows() == 0 || A.cols() == B.rows()); eigen_assert((options &~ (EigVecMask | GenEigMask)) == 0 && (options & EigVecMask) != EigVecMask && "invalid option parameter"); bool isBempty = (B.rows() == 0) || (B.cols() == 0); // For clarity, all parameters match their ARPACK name // // Always 0 on the first call // int ido = 0; int n = (int)A.cols(); // User options: "LA", "SA", "SM", "LM", "BE" // char whch[3] = "LM"; // Specifies the shift if iparam[6] = { 3, 4, 5 }, not used if iparam[6] = { 1, 2 } // RealScalar sigma = 0.0; if (eigs_sigma.length() >= 2 && isalpha(eigs_sigma[0]) && isalpha(eigs_sigma[1])) { eigs_sigma[0] = toupper(eigs_sigma[0]); eigs_sigma[1] = toupper(eigs_sigma[1]); // In the following special case we're going to invert the problem, since solving // for larger magnitude is much much faster // i.e., if 'SM' is specified, we're going to really use 'LM', the default // if (eigs_sigma.substr(0,2) != "SM") { whch[0] = eigs_sigma[0]; whch[1] = eigs_sigma[1]; } } else { eigen_assert(false && "Specifying clustered eigenvalues is not yet supported!"); // If it's not scalar values, then the user may be explicitly // specifying the sigma value to cluster the evs around // sigma = atof(eigs_sigma.c_str()); // If atof fails, it returns 0.0, which is a fine default // } // "I" means normal eigenvalue problem, "G" means generalized // char bmat[2] = "I"; if (eigs_sigma.substr(0,2) == "SM" || !(isalpha(eigs_sigma[0]) && isalpha(eigs_sigma[1])) || (!isBempty && !BisSPD)) bmat[0] = 'G'; // Now we determine the mode to use // int mode = (bmat[0] == 'G') + 1; if (eigs_sigma.substr(0,2) == "SM" || !(isalpha(eigs_sigma[0]) && isalpha(eigs_sigma[1]))) { // We're going to use shift-and-invert mode, and basically find // the largest eigenvalues of the inverse operator // mode = 3; } // The user-specified number of eigenvalues/vectors to compute // int nev = (int)nbrEigenvalues; // Allocate space for ARPACK to store the residual // Scalar *resid = new Scalar[n]; // Number of Lanczos vectors, must satisfy nev < ncv <= n // Note that this indicates that nev != n, and we cannot compute // all eigenvalues of a mtrix // int ncv = std::min(std::max(2*nev, 20), n); // The working n x ncv matrix, also store the final eigenvectors (if computed) // Scalar *v = new Scalar[n*ncv]; int ldv = n; // Working space // Scalar *workd = new Scalar[3*n]; int lworkl = ncv*ncv+8*ncv; // Must be at least this length Scalar *workl = new Scalar[lworkl]; int *iparam= new int[11]; iparam[0] = 1; // 1 means we let ARPACK perform the shifts, 0 means we'd have to do it iparam[2] = std::max(300, (int)std::ceil(2*n/std::max(ncv,1))); iparam[6] = mode; // The mode, 1 is standard ev problem, 2 for generalized ev, 3 for shift-and-invert // Used during reverse communicate to notify where arrays start // int *ipntr = new int[11]; // Error codes are returned in here, initial value of 0 indicates a random initial // residual vector is used, any other values means resid contains the initial residual // vector, possibly from a previous run // int info = 0; Scalar scale = 1.0; //if (!isBempty) //{ //Scalar scale = B.norm() / std::sqrt(n); //scale = std::pow(2, std::floor(std::log(scale+1))); ////M /= scale; //for (size_t i=0; i<(size_t)B.outerSize(); i++) // for (typename MatrixType::InnerIterator it(B, i); it; ++it) // it.valueRef() /= scale; //} MatrixSolver OP; if (mode == 1 || mode == 2) { if (!isBempty) OP.compute(B); } else if (mode == 3) { if (sigma == 0.0) { OP.compute(A); } else { // Note: We will never enter here because sigma must be 0.0 // if (isBempty) { MatrixType AminusSigmaB(A); for (Index i=0; i::saupd(&ido, bmat, &n, whch, &nev, &tol, resid, &ncv, v, &ldv, iparam, ipntr, workd, workl, &lworkl, &info); if (ido == -1 || ido == 1) { Scalar *in = workd + ipntr[0] - 1; Scalar *out = workd + ipntr[1] - 1; if (ido == 1 && mode != 2) { Scalar *out2 = workd + ipntr[2] - 1; if (isBempty || mode == 1) Matrix::Map(out2, n) = Matrix::Map(in, n); else Matrix::Map(out2, n) = B * Matrix::Map(in, n); in = workd + ipntr[2] - 1; } if (mode == 1) { if (isBempty) { // OP = A // Matrix::Map(out, n) = A * Matrix::Map(in, n); } else { // OP = L^{-1}AL^{-T} // internal::OP::applyOP(OP, A, n, in, out); } } else if (mode == 2) { if (ido == 1) Matrix::Map(in, n) = A * Matrix::Map(in, n); // OP = B^{-1} A // Matrix::Map(out, n) = OP.solve(Matrix::Map(in, n)); } else if (mode == 3) { // OP = (A-\sigmaB)B (\sigma could be 0, and B could be I) // The B * in is already computed and stored at in if ido == 1 // if (ido == 1 || isBempty) Matrix::Map(out, n) = OP.solve(Matrix::Map(in, n)); else Matrix::Map(out, n) = OP.solve(B * Matrix::Map(in, n)); } } else if (ido == 2) { Scalar *in = workd + ipntr[0] - 1; Scalar *out = workd + ipntr[1] - 1; if (isBempty || mode == 1) Matrix::Map(out, n) = Matrix::Map(in, n); else Matrix::Map(out, n) = B * Matrix::Map(in, n); } } while (ido != 99); if (info == 1) m_info = NoConvergence; else if (info == 3) m_info = NumericalIssue; else if (info < 0) m_info = InvalidInput; else if (info != 0) eigen_assert(false && "Unknown ARPACK return value!"); else { // Do we compute eigenvectors or not? // int rvec = (options & ComputeEigenvectors) == ComputeEigenvectors; // "A" means "All", use "S" to choose specific eigenvalues (not yet supported in ARPACK)) // char howmny[2] = "A"; // if howmny == "S", specifies the eigenvalues to compute (not implemented in ARPACK) // int *select = new int[ncv]; // Final eigenvalues // m_eivalues.resize(nev, 1); internal::arpack_wrapper::seupd(&rvec, howmny, select, m_eivalues.data(), v, &ldv, &sigma, bmat, &n, whch, &nev, &tol, resid, &ncv, v, &ldv, iparam, ipntr, workd, workl, &lworkl, &info); if (info == -14) m_info = NoConvergence; else if (info != 0) m_info = InvalidInput; else { if (rvec) { m_eivec.resize(A.rows(), nev); for (int i=0; i::project(OP, n, nev, m_eivec.data()); m_eigenvectorsOk = true; } m_nbrIterations = iparam[2]; m_nbrConverged = iparam[4]; m_info = Success; } delete[] select; } delete[] v; delete[] iparam; delete[] ipntr; delete[] workd; delete[] workl; delete[] resid; m_isInitialized = true; return *this; } // Single precision // extern "C" void ssaupd_(int *ido, char *bmat, int *n, char *which, int *nev, float *tol, float *resid, int *ncv, float *v, int *ldv, int *iparam, int *ipntr, float *workd, float *workl, int *lworkl, int *info); extern "C" void sseupd_(int *rvec, char *All, int *select, float *d, float *z, int *ldz, float *sigma, char *bmat, int *n, char *which, int *nev, float *tol, float *resid, int *ncv, float *v, int *ldv, int *iparam, int *ipntr, float *workd, float *workl, int *lworkl, int *ierr); // Double precision // extern "C" void dsaupd_(int *ido, char *bmat, int *n, char *which, int *nev, double *tol, double *resid, int *ncv, double *v, int *ldv, int *iparam, int *ipntr, double *workd, double *workl, int *lworkl, int *info); extern "C" void dseupd_(int *rvec, char *All, int *select, double *d, double *z, int *ldz, double *sigma, char *bmat, int *n, char *which, int *nev, double *tol, double *resid, int *ncv, double *v, int *ldv, int *iparam, int *ipntr, double *workd, double *workl, int *lworkl, int *ierr); namespace internal { template struct arpack_wrapper { static inline void saupd(int *ido, char *bmat, int *n, char *which, int *nev, RealScalar *tol, Scalar *resid, int *ncv, Scalar *v, int *ldv, int *iparam, int *ipntr, Scalar *workd, Scalar *workl, int *lworkl, int *info) { EIGEN_STATIC_ASSERT(!NumTraits::IsComplex, NUMERIC_TYPE_MUST_BE_REAL) } static inline void seupd(int *rvec, char *All, int *select, Scalar *d, Scalar *z, int *ldz, RealScalar *sigma, char *bmat, int *n, char *which, int *nev, RealScalar *tol, Scalar *resid, int *ncv, Scalar *v, int *ldv, int *iparam, int *ipntr, Scalar *workd, Scalar *workl, int *lworkl, int *ierr) { EIGEN_STATIC_ASSERT(!NumTraits::IsComplex, NUMERIC_TYPE_MUST_BE_REAL) } }; template <> struct arpack_wrapper { static inline void saupd(int *ido, char *bmat, int *n, char *which, int *nev, float *tol, float *resid, int *ncv, float *v, int *ldv, int *iparam, int *ipntr, float *workd, float *workl, int *lworkl, int *info) { ssaupd_(ido, bmat, n, which, nev, tol, resid, ncv, v, ldv, iparam, ipntr, workd, workl, lworkl, info); } static inline void seupd(int *rvec, char *All, int *select, float *d, float *z, int *ldz, float *sigma, char *bmat, int *n, char *which, int *nev, float *tol, float *resid, int *ncv, float *v, int *ldv, int *iparam, int *ipntr, float *workd, float *workl, int *lworkl, int *ierr) { sseupd_(rvec, All, select, d, z, ldz, sigma, bmat, n, which, nev, tol, resid, ncv, v, ldv, iparam, ipntr, workd, workl, lworkl, ierr); } }; template <> struct arpack_wrapper { static inline void saupd(int *ido, char *bmat, int *n, char *which, int *nev, double *tol, double *resid, int *ncv, double *v, int *ldv, int *iparam, int *ipntr, double *workd, double *workl, int *lworkl, int *info) { dsaupd_(ido, bmat, n, which, nev, tol, resid, ncv, v, ldv, iparam, ipntr, workd, workl, lworkl, info); } static inline void seupd(int *rvec, char *All, int *select, double *d, double *z, int *ldz, double *sigma, char *bmat, int *n, char *which, int *nev, double *tol, double *resid, int *ncv, double *v, int *ldv, int *iparam, int *ipntr, double *workd, double *workl, int *lworkl, int *ierr) { dseupd_(rvec, All, select, d, v, ldv, sigma, bmat, n, which, nev, tol, resid, ncv, v, ldv, iparam, ipntr, workd, workl, lworkl, ierr); } }; template struct OP { static inline void applyOP(MatrixSolver &OP, const MatrixType &A, int n, Scalar *in, Scalar *out); static inline void project(MatrixSolver &OP, int n, int k, Scalar *vecs); }; template struct OP { static inline void applyOP(MatrixSolver &OP, const MatrixType &A, int n, Scalar *in, Scalar *out) { // OP = L^{-1} A L^{-T} (B = LL^T) // // First solve L^T out = in // Matrix::Map(out, n) = OP.matrixU().solve(Matrix::Map(in, n)); Matrix::Map(out, n) = OP.permutationPinv() * Matrix::Map(out, n); // Then compute out = A out // Matrix::Map(out, n) = A * Matrix::Map(out, n); // Then solve L out = out // Matrix::Map(out, n) = OP.permutationP() * Matrix::Map(out, n); Matrix::Map(out, n) = OP.matrixL().solve(Matrix::Map(out, n)); } static inline void project(MatrixSolver &OP, int n, int k, Scalar *vecs) { // Solve L^T out = in // Matrix::Map(vecs, n, k) = OP.matrixU().solve(Matrix::Map(vecs, n, k)); Matrix::Map(vecs, n, k) = OP.permutationPinv() * Matrix::Map(vecs, n, k); } }; template struct OP { static inline void applyOP(MatrixSolver &OP, const MatrixType &A, int n, Scalar *in, Scalar *out) { eigen_assert(false && "Should never be in here..."); } static inline void project(MatrixSolver &OP, int n, int k, Scalar *vecs) { eigen_assert(false && "Should never be in here..."); } }; } // end namespace internal } // end namespace Eigen #endif // EIGEN_ARPACKSELFADJOINTEIGENSOLVER_H RcppEigen/inst/include/unsupported/Eigen/src/AutoDiff/0000755000176200001440000000000014562417221022441 5ustar liggesusersRcppEigen/inst/include/unsupported/Eigen/src/AutoDiff/AutoDiffJacobian.h0000644000176200001440000000611614107270226025743 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_AUTODIFF_JACOBIAN_H #define EIGEN_AUTODIFF_JACOBIAN_H namespace Eigen { template class AutoDiffJacobian : public Functor { public: AutoDiffJacobian() : Functor() {} AutoDiffJacobian(const Functor& f) : Functor(f) {} // forward constructors #if EIGEN_HAS_VARIADIC_TEMPLATES template AutoDiffJacobian(const T& ...Values) : Functor(Values...) {} #else template AutoDiffJacobian(const T0& a0) : Functor(a0) {} template AutoDiffJacobian(const T0& a0, const T1& a1) : Functor(a0, a1) {} template AutoDiffJacobian(const T0& a0, const T1& a1, const T2& a2) : Functor(a0, a1, a2) {} #endif typedef typename Functor::InputType InputType; typedef typename Functor::ValueType ValueType; typedef typename ValueType::Scalar Scalar; enum { InputsAtCompileTime = InputType::RowsAtCompileTime, ValuesAtCompileTime = ValueType::RowsAtCompileTime }; typedef Matrix JacobianType; typedef typename JacobianType::Index Index; typedef Matrix DerivativeType; typedef AutoDiffScalar ActiveScalar; typedef Matrix ActiveInput; typedef Matrix ActiveValue; #if EIGEN_HAS_VARIADIC_TEMPLATES // Some compilers don't accept variadic parameters after a default parameter, // i.e., we can't just write _jac=0 but we need to overload operator(): EIGEN_STRONG_INLINE void operator() (const InputType& x, ValueType* v) const { this->operator()(x, v, 0); } template void operator() (const InputType& x, ValueType* v, JacobianType* _jac, const ParamsType&... Params) const #else void operator() (const InputType& x, ValueType* v, JacobianType* _jac=0) const #endif { eigen_assert(v!=0); if (!_jac) { #if EIGEN_HAS_VARIADIC_TEMPLATES Functor::operator()(x, v, Params...); #else Functor::operator()(x, v); #endif return; } JacobianType& jac = *_jac; ActiveInput ax = x.template cast(); ActiveValue av(jac.rows()); if(InputsAtCompileTime==Dynamic) for (Index j=0; j // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_AUTODIFF_VECTOR_H #define EIGEN_AUTODIFF_VECTOR_H namespace Eigen { /* \class AutoDiffScalar * \brief A scalar type replacement with automatic differentation capability * * \param DerType the vector type used to store/represent the derivatives (e.g. Vector3f) * * This class represents a scalar value while tracking its respective derivatives. * * It supports the following list of global math function: * - std::abs, std::sqrt, std::pow, std::exp, std::log, std::sin, std::cos, * - internal::abs, internal::sqrt, numext::pow, internal::exp, internal::log, internal::sin, internal::cos, * - internal::conj, internal::real, internal::imag, numext::abs2. * * AutoDiffScalar can be used as the scalar type of an Eigen::Matrix object. However, * in that case, the expression template mechanism only occurs at the top Matrix level, * while derivatives are computed right away. * */ template class AutoDiffVector { public: //typedef typename internal::traits::Scalar Scalar; typedef typename internal::traits::Scalar BaseScalar; typedef AutoDiffScalar > ActiveScalar; typedef ActiveScalar Scalar; typedef AutoDiffScalar CoeffType; typedef typename JacobianType::Index Index; inline AutoDiffVector() {} inline AutoDiffVector(const ValueType& values) : m_values(values) { m_jacobian.setZero(); } CoeffType operator[] (Index i) { return CoeffType(m_values[i], m_jacobian.col(i)); } const CoeffType operator[] (Index i) const { return CoeffType(m_values[i], m_jacobian.col(i)); } CoeffType operator() (Index i) { return CoeffType(m_values[i], m_jacobian.col(i)); } const CoeffType operator() (Index i) const { return CoeffType(m_values[i], m_jacobian.col(i)); } CoeffType coeffRef(Index i) { return CoeffType(m_values[i], m_jacobian.col(i)); } const CoeffType coeffRef(Index i) const { return CoeffType(m_values[i], m_jacobian.col(i)); } Index size() const { return m_values.size(); } // FIXME here we could return an expression of the sum Scalar sum() const { /*std::cerr << "sum \n\n";*/ /*std::cerr << m_jacobian.rowwise().sum() << "\n\n";*/ return Scalar(m_values.sum(), m_jacobian.rowwise().sum()); } inline AutoDiffVector(const ValueType& values, const JacobianType& jac) : m_values(values), m_jacobian(jac) {} template inline AutoDiffVector(const AutoDiffVector& other) : m_values(other.values()), m_jacobian(other.jacobian()) {} inline AutoDiffVector(const AutoDiffVector& other) : m_values(other.values()), m_jacobian(other.jacobian()) {} template inline AutoDiffVector& operator=(const AutoDiffVector& other) { m_values = other.values(); m_jacobian = other.jacobian(); return *this; } inline AutoDiffVector& operator=(const AutoDiffVector& other) { m_values = other.values(); m_jacobian = other.jacobian(); return *this; } inline const ValueType& values() const { return m_values; } inline ValueType& values() { return m_values; } inline const JacobianType& jacobian() const { return m_jacobian; } inline JacobianType& jacobian() { return m_jacobian; } template inline const AutoDiffVector< typename MakeCwiseBinaryOp,ValueType,OtherValueType>::Type, typename MakeCwiseBinaryOp,JacobianType,OtherJacobianType>::Type > operator+(const AutoDiffVector& other) const { return AutoDiffVector< typename MakeCwiseBinaryOp,ValueType,OtherValueType>::Type, typename MakeCwiseBinaryOp,JacobianType,OtherJacobianType>::Type >( m_values + other.values(), m_jacobian + other.jacobian()); } template inline AutoDiffVector& operator+=(const AutoDiffVector& other) { m_values += other.values(); m_jacobian += other.jacobian(); return *this; } template inline const AutoDiffVector< typename MakeCwiseBinaryOp,ValueType,OtherValueType>::Type, typename MakeCwiseBinaryOp,JacobianType,OtherJacobianType>::Type > operator-(const AutoDiffVector& other) const { return AutoDiffVector< typename MakeCwiseBinaryOp,ValueType,OtherValueType>::Type, typename MakeCwiseBinaryOp,JacobianType,OtherJacobianType>::Type >( m_values - other.values(), m_jacobian - other.jacobian()); } template inline AutoDiffVector& operator-=(const AutoDiffVector& other) { m_values -= other.values(); m_jacobian -= other.jacobian(); return *this; } inline const AutoDiffVector< typename MakeCwiseUnaryOp, ValueType>::Type, typename MakeCwiseUnaryOp, JacobianType>::Type > operator-() const { return AutoDiffVector< typename MakeCwiseUnaryOp, ValueType>::Type, typename MakeCwiseUnaryOp, JacobianType>::Type >( -m_values, -m_jacobian); } inline const AutoDiffVector< typename MakeCwiseUnaryOp, ValueType>::Type, typename MakeCwiseUnaryOp, JacobianType>::Type> operator*(const BaseScalar& other) const { return AutoDiffVector< typename MakeCwiseUnaryOp, ValueType>::Type, typename MakeCwiseUnaryOp, JacobianType>::Type >( m_values * other, m_jacobian * other); } friend inline const AutoDiffVector< typename MakeCwiseUnaryOp, ValueType>::Type, typename MakeCwiseUnaryOp, JacobianType>::Type > operator*(const Scalar& other, const AutoDiffVector& v) { return AutoDiffVector< typename MakeCwiseUnaryOp, ValueType>::Type, typename MakeCwiseUnaryOp, JacobianType>::Type >( v.values() * other, v.jacobian() * other); } // template // inline const AutoDiffVector< // CwiseBinaryOp, ValueType, OtherValueType> // CwiseBinaryOp, // CwiseUnaryOp, JacobianType>, // CwiseUnaryOp, OtherJacobianType> > > // operator*(const AutoDiffVector& other) const // { // return AutoDiffVector< // CwiseBinaryOp, ValueType, OtherValueType> // CwiseBinaryOp, // CwiseUnaryOp, JacobianType>, // CwiseUnaryOp, OtherJacobianType> > >( // m_values.cwise() * other.values(), // (m_jacobian * other.values()) + (m_values * other.jacobian())); // } inline AutoDiffVector& operator*=(const Scalar& other) { m_values *= other; m_jacobian *= other; return *this; } template inline AutoDiffVector& operator*=(const AutoDiffVector& other) { *this = *this * other; return *this; } protected: ValueType m_values; JacobianType m_jacobian; }; } #endif // EIGEN_AUTODIFF_VECTOR_H RcppEigen/inst/include/unsupported/Eigen/src/AutoDiff/AutoDiffScalar.h0000755000176200001440000007066314562417221025460 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_AUTODIFF_SCALAR_H #define EIGEN_AUTODIFF_SCALAR_H namespace Eigen { namespace internal { template struct make_coherent_impl { static void run(A&, B&) {} }; // resize a to match b is a.size()==0, and conversely. template void make_coherent(const A& a, const B&b) { make_coherent_impl::run(a.const_cast_derived(), b.const_cast_derived()); } template struct auto_diff_special_op; } // end namespace internal template class AutoDiffScalar; template inline AutoDiffScalar MakeAutoDiffScalar(const typename NewDerType::Scalar& value, const NewDerType &der) { return AutoDiffScalar(value,der); } /** \class AutoDiffScalar * \brief A scalar type replacement with automatic differentiation capability * * \param DerivativeType the vector type used to store/represent the derivatives. The base scalar type * as well as the number of derivatives to compute are determined from this type. * Typical choices include, e.g., \c Vector4f for 4 derivatives, or \c VectorXf * if the number of derivatives is not known at compile time, and/or, the number * of derivatives is large. * Note that DerivativeType can also be a reference (e.g., \c VectorXf&) to wrap a * existing vector into an AutoDiffScalar. * Finally, DerivativeType can also be any Eigen compatible expression. * * This class represents a scalar value while tracking its respective derivatives using Eigen's expression * template mechanism. * * It supports the following list of global math function: * - std::abs, std::sqrt, std::pow, std::exp, std::log, std::sin, std::cos, * - internal::abs, internal::sqrt, numext::pow, internal::exp, internal::log, internal::sin, internal::cos, * - internal::conj, internal::real, internal::imag, numext::abs2. * * AutoDiffScalar can be used as the scalar type of an Eigen::Matrix object. However, * in that case, the expression template mechanism only occurs at the top Matrix level, * while derivatives are computed right away. * */ template class AutoDiffScalar : public internal::auto_diff_special_op ::type>::Scalar, typename NumTraits::type>::Scalar>::Real>::value> { public: typedef internal::auto_diff_special_op ::type>::Scalar, typename NumTraits::type>::Scalar>::Real>::value> Base; typedef typename internal::remove_all::type DerType; typedef typename internal::traits::Scalar Scalar; typedef typename NumTraits::Real Real; using Base::operator+; using Base::operator*; /** Default constructor without any initialization. */ AutoDiffScalar() {} /** Constructs an active scalar from its \a value, and initializes the \a nbDer derivatives such that it corresponds to the \a derNumber -th variable */ AutoDiffScalar(const Scalar& value, int nbDer, int derNumber) : m_value(value), m_derivatives(DerType::Zero(nbDer)) { m_derivatives.coeffRef(derNumber) = Scalar(1); } /** Conversion from a scalar constant to an active scalar. * The derivatives are set to zero. */ /*explicit*/ AutoDiffScalar(const Real& value) : m_value(value) { if(m_derivatives.size()>0) m_derivatives.setZero(); } /** Constructs an active scalar from its \a value and derivatives \a der */ AutoDiffScalar(const Scalar& value, const DerType& der) : m_value(value), m_derivatives(der) {} template AutoDiffScalar(const AutoDiffScalar& other #ifndef EIGEN_PARSED_BY_DOXYGEN , typename internal::enable_if< internal::is_same::type>::Scalar>::value && internal::is_convertible::value , void*>::type = 0 #endif ) : m_value(other.value()), m_derivatives(other.derivatives()) {} friend std::ostream & operator << (std::ostream & s, const AutoDiffScalar& a) { return s << a.value(); } AutoDiffScalar(const AutoDiffScalar& other) : m_value(other.value()), m_derivatives(other.derivatives()) {} template inline AutoDiffScalar& operator=(const AutoDiffScalar& other) { m_value = other.value(); m_derivatives = other.derivatives(); return *this; } inline AutoDiffScalar& operator=(const AutoDiffScalar& other) { m_value = other.value(); m_derivatives = other.derivatives(); return *this; } inline AutoDiffScalar& operator=(const Scalar& other) { m_value = other; if(m_derivatives.size()>0) m_derivatives.setZero(); return *this; } // inline operator const Scalar& () const { return m_value; } // inline operator Scalar& () { return m_value; } inline const Scalar& value() const { return m_value; } inline Scalar& value() { return m_value; } inline const DerType& derivatives() const { return m_derivatives; } inline DerType& derivatives() { return m_derivatives; } inline bool operator< (const Scalar& other) const { return m_value < other; } inline bool operator<=(const Scalar& other) const { return m_value <= other; } inline bool operator> (const Scalar& other) const { return m_value > other; } inline bool operator>=(const Scalar& other) const { return m_value >= other; } inline bool operator==(const Scalar& other) const { return m_value == other; } inline bool operator!=(const Scalar& other) const { return m_value != other; } friend inline bool operator< (const Scalar& a, const AutoDiffScalar& b) { return a < b.value(); } friend inline bool operator<=(const Scalar& a, const AutoDiffScalar& b) { return a <= b.value(); } friend inline bool operator> (const Scalar& a, const AutoDiffScalar& b) { return a > b.value(); } friend inline bool operator>=(const Scalar& a, const AutoDiffScalar& b) { return a >= b.value(); } friend inline bool operator==(const Scalar& a, const AutoDiffScalar& b) { return a == b.value(); } friend inline bool operator!=(const Scalar& a, const AutoDiffScalar& b) { return a != b.value(); } template inline bool operator< (const AutoDiffScalar& b) const { return m_value < b.value(); } template inline bool operator<=(const AutoDiffScalar& b) const { return m_value <= b.value(); } template inline bool operator> (const AutoDiffScalar& b) const { return m_value > b.value(); } template inline bool operator>=(const AutoDiffScalar& b) const { return m_value >= b.value(); } template inline bool operator==(const AutoDiffScalar& b) const { return m_value == b.value(); } template inline bool operator!=(const AutoDiffScalar& b) const { return m_value != b.value(); } inline const AutoDiffScalar operator+(const Scalar& other) const { return AutoDiffScalar(m_value + other, m_derivatives); } friend inline const AutoDiffScalar operator+(const Scalar& a, const AutoDiffScalar& b) { return AutoDiffScalar(a + b.value(), b.derivatives()); } // inline const AutoDiffScalar operator+(const Real& other) const // { // return AutoDiffScalar(m_value + other, m_derivatives); // } // friend inline const AutoDiffScalar operator+(const Real& a, const AutoDiffScalar& b) // { // return AutoDiffScalar(a + b.value(), b.derivatives()); // } inline AutoDiffScalar& operator+=(const Scalar& other) { value() += other; return *this; } template inline const AutoDiffScalar,const DerType,const typename internal::remove_all::type> > operator+(const AutoDiffScalar& other) const { internal::make_coherent(m_derivatives, other.derivatives()); return AutoDiffScalar,const DerType,const typename internal::remove_all::type> >( m_value + other.value(), m_derivatives + other.derivatives()); } template inline AutoDiffScalar& operator+=(const AutoDiffScalar& other) { (*this) = (*this) + other; return *this; } inline const AutoDiffScalar operator-(const Scalar& b) const { return AutoDiffScalar(m_value - b, m_derivatives); } friend inline const AutoDiffScalar, const DerType> > operator-(const Scalar& a, const AutoDiffScalar& b) { return AutoDiffScalar, const DerType> > (a - b.value(), -b.derivatives()); } inline AutoDiffScalar& operator-=(const Scalar& other) { value() -= other; return *this; } template inline const AutoDiffScalar, const DerType,const typename internal::remove_all::type> > operator-(const AutoDiffScalar& other) const { internal::make_coherent(m_derivatives, other.derivatives()); return AutoDiffScalar, const DerType,const typename internal::remove_all::type> >( m_value - other.value(), m_derivatives - other.derivatives()); } template inline AutoDiffScalar& operator-=(const AutoDiffScalar& other) { *this = *this - other; return *this; } inline const AutoDiffScalar, const DerType> > operator-() const { return AutoDiffScalar, const DerType> >( -m_value, -m_derivatives); } inline const AutoDiffScalar operator*(const Scalar& other) const { return MakeAutoDiffScalar(m_value * other, m_derivatives * other); } friend inline const AutoDiffScalar operator*(const Scalar& other, const AutoDiffScalar& a) { return MakeAutoDiffScalar(a.value() * other, a.derivatives() * other); } // inline const AutoDiffScalar, DerType>::Type > // operator*(const Real& other) const // { // return AutoDiffScalar, DerType>::Type >( // m_value * other, // (m_derivatives * other)); // } // // friend inline const AutoDiffScalar, DerType>::Type > // operator*(const Real& other, const AutoDiffScalar& a) // { // return AutoDiffScalar, DerType>::Type >( // a.value() * other, // a.derivatives() * other); // } inline const AutoDiffScalar operator/(const Scalar& other) const { return MakeAutoDiffScalar(m_value / other, (m_derivatives * (Scalar(1)/other))); } friend inline const AutoDiffScalar operator/(const Scalar& other, const AutoDiffScalar& a) { return MakeAutoDiffScalar(other / a.value(), a.derivatives() * (Scalar(-other) / (a.value()*a.value()))); } // inline const AutoDiffScalar, DerType>::Type > // operator/(const Real& other) const // { // return AutoDiffScalar, DerType>::Type >( // m_value / other, // (m_derivatives * (Real(1)/other))); // } // // friend inline const AutoDiffScalar, DerType>::Type > // operator/(const Real& other, const AutoDiffScalar& a) // { // return AutoDiffScalar, DerType>::Type >( // other / a.value(), // a.derivatives() * (-Real(1)/other)); // } template inline const AutoDiffScalar EIGEN_COMMA const EIGEN_EXPR_BINARYOP_SCALAR_RETURN_TYPE(DerType,Scalar,product) EIGEN_COMMA const EIGEN_EXPR_BINARYOP_SCALAR_RETURN_TYPE(typename internal::remove_all::type,Scalar,product) >,Scalar,product) > operator/(const AutoDiffScalar& other) const { internal::make_coherent(m_derivatives, other.derivatives()); return MakeAutoDiffScalar( m_value / other.value(), ((m_derivatives * other.value()) - (other.derivatives() * m_value)) * (Scalar(1)/(other.value()*other.value()))); } template inline const AutoDiffScalar, const EIGEN_EXPR_BINARYOP_SCALAR_RETURN_TYPE(DerType,Scalar,product), const EIGEN_EXPR_BINARYOP_SCALAR_RETURN_TYPE(typename internal::remove_all::type,Scalar,product) > > operator*(const AutoDiffScalar& other) const { internal::make_coherent(m_derivatives, other.derivatives()); return MakeAutoDiffScalar( m_value * other.value(), (m_derivatives * other.value()) + (other.derivatives() * m_value)); } inline AutoDiffScalar& operator*=(const Scalar& other) { *this = *this * other; return *this; } template inline AutoDiffScalar& operator*=(const AutoDiffScalar& other) { *this = *this * other; return *this; } inline AutoDiffScalar& operator/=(const Scalar& other) { *this = *this / other; return *this; } template inline AutoDiffScalar& operator/=(const AutoDiffScalar& other) { *this = *this / other; return *this; } protected: Scalar m_value; DerType m_derivatives; }; namespace internal { template struct auto_diff_special_op // : auto_diff_scalar_op::Real, // is_same::Real>::value> { typedef typename remove_all::type DerType; typedef typename traits::Scalar Scalar; typedef typename NumTraits::Real Real; // typedef auto_diff_scalar_op::Real, // is_same::Real>::value> Base; // using Base::operator+; // using Base::operator+=; // using Base::operator-; // using Base::operator-=; // using Base::operator*; // using Base::operator*=; const AutoDiffScalar& derived() const { return *static_cast*>(this); } AutoDiffScalar& derived() { return *static_cast*>(this); } inline const AutoDiffScalar operator+(const Real& other) const { return AutoDiffScalar(derived().value() + other, derived().derivatives()); } friend inline const AutoDiffScalar operator+(const Real& a, const AutoDiffScalar& b) { return AutoDiffScalar(a + b.value(), b.derivatives()); } inline AutoDiffScalar& operator+=(const Real& other) { derived().value() += other; return derived(); } inline const AutoDiffScalar >, DerType>::Type > operator*(const Real& other) const { return AutoDiffScalar >, DerType>::Type >( derived().value() * other, derived().derivatives() * other); } friend inline const AutoDiffScalar >, DerType>::Type > operator*(const Real& other, const AutoDiffScalar& a) { return AutoDiffScalar >, DerType>::Type >( a.value() * other, a.derivatives() * other); } inline AutoDiffScalar& operator*=(const Scalar& other) { *this = *this * other; return derived(); } }; template struct auto_diff_special_op { void operator*() const; void operator-() const; void operator+() const; }; template void make_coherent_expression(CwiseBinaryOp xpr, const RefType &ref) { make_coherent(xpr.const_cast_derived().lhs(), ref); make_coherent(xpr.const_cast_derived().rhs(), ref); } template void make_coherent_expression(const CwiseUnaryOp &xpr, const RefType &ref) { make_coherent(xpr.nestedExpression().const_cast_derived(), ref); } // needed for compilation only template void make_coherent_expression(const CwiseNullaryOp &, const RefType &) {} template struct make_coherent_impl, B> { typedef Matrix A; static void run(A& a, B& b) { if((A_Rows==Dynamic || A_Cols==Dynamic) && (a.size()==0)) { a.resize(b.size()); a.setZero(); } else if (B::SizeAtCompileTime==Dynamic && a.size()!=0 && b.size()==0) { make_coherent_expression(b,a); } } }; template struct make_coherent_impl > { typedef Matrix B; static void run(A& a, B& b) { if((B_Rows==Dynamic || B_Cols==Dynamic) && (b.size()==0)) { b.resize(a.size()); b.setZero(); } else if (A::SizeAtCompileTime==Dynamic && b.size()!=0 && a.size()==0) { make_coherent_expression(a,b); } } }; template struct make_coherent_impl, Matrix > { typedef Matrix A; typedef Matrix B; static void run(A& a, B& b) { if((A_Rows==Dynamic || A_Cols==Dynamic) && (a.size()==0)) { a.resize(b.size()); a.setZero(); } else if((B_Rows==Dynamic || B_Cols==Dynamic) && (b.size()==0)) { b.resize(a.size()); b.setZero(); } } }; } // end namespace internal template struct ScalarBinaryOpTraits,typename DerType::Scalar,BinOp> { typedef AutoDiffScalar ReturnType; }; template struct ScalarBinaryOpTraits, BinOp> { typedef AutoDiffScalar ReturnType; }; // The following is an attempt to let Eigen's known about expression template, but that's more tricky! // template // struct ScalarBinaryOpTraits,AutoDiffScalar, BinOp> // { // enum { Defined = 1 }; // typedef AutoDiffScalar ReturnType; // }; // // template // struct ScalarBinaryOpTraits,AutoDiffScalar, BinOp> // { // enum { Defined = 1 };//internal::is_same::value }; // typedef AutoDiffScalar ReturnType; // }; #define EIGEN_AUTODIFF_DECLARE_GLOBAL_UNARY(FUNC,CODE) \ template \ inline const Eigen::AutoDiffScalar< \ EIGEN_EXPR_BINARYOP_SCALAR_RETURN_TYPE(typename Eigen::internal::remove_all::type, typename Eigen::internal::traits::type>::Scalar, product) > \ FUNC(const Eigen::AutoDiffScalar& x) { \ using namespace Eigen; \ typedef typename Eigen::internal::traits::type>::Scalar Scalar; \ EIGEN_UNUSED_VARIABLE(sizeof(Scalar)); \ CODE; \ } template struct CleanedUpDerType { typedef AutoDiffScalar::type::PlainObject> type; }; template inline const AutoDiffScalar& conj(const AutoDiffScalar& x) { return x; } template inline const AutoDiffScalar& real(const AutoDiffScalar& x) { return x; } template inline typename DerType::Scalar imag(const AutoDiffScalar&) { return 0.; } template inline typename CleanedUpDerType::type (min)(const AutoDiffScalar& x, const T& y) { typedef typename CleanedUpDerType::type ADS; return (x <= y ? ADS(x) : ADS(y)); } template inline typename CleanedUpDerType::type (max)(const AutoDiffScalar& x, const T& y) { typedef typename CleanedUpDerType::type ADS; return (x >= y ? ADS(x) : ADS(y)); } template inline typename CleanedUpDerType::type (min)(const T& x, const AutoDiffScalar& y) { typedef typename CleanedUpDerType::type ADS; return (x < y ? ADS(x) : ADS(y)); } template inline typename CleanedUpDerType::type (max)(const T& x, const AutoDiffScalar& y) { typedef typename CleanedUpDerType::type ADS; return (x > y ? ADS(x) : ADS(y)); } template inline typename CleanedUpDerType::type (min)(const AutoDiffScalar& x, const AutoDiffScalar& y) { return (x.value() < y.value() ? x : y); } template inline typename CleanedUpDerType::type (max)(const AutoDiffScalar& x, const AutoDiffScalar& y) { return (x.value() >= y.value() ? x : y); } EIGEN_AUTODIFF_DECLARE_GLOBAL_UNARY(abs, using std::abs; return Eigen::MakeAutoDiffScalar(abs(x.value()), x.derivatives() * (x.value()<0 ? -1 : 1) );) EIGEN_AUTODIFF_DECLARE_GLOBAL_UNARY(abs2, using numext::abs2; return Eigen::MakeAutoDiffScalar(abs2(x.value()), x.derivatives() * (Scalar(2)*x.value()));) EIGEN_AUTODIFF_DECLARE_GLOBAL_UNARY(sqrt, using std::sqrt; Scalar sqrtx = sqrt(x.value()); return Eigen::MakeAutoDiffScalar(sqrtx,x.derivatives() * (Scalar(0.5) / sqrtx));) EIGEN_AUTODIFF_DECLARE_GLOBAL_UNARY(cos, using std::cos; using std::sin; return Eigen::MakeAutoDiffScalar(cos(x.value()), x.derivatives() * (-sin(x.value())));) EIGEN_AUTODIFF_DECLARE_GLOBAL_UNARY(sin, using std::sin; using std::cos; return Eigen::MakeAutoDiffScalar(sin(x.value()),x.derivatives() * cos(x.value()));) EIGEN_AUTODIFF_DECLARE_GLOBAL_UNARY(exp, using std::exp; Scalar expx = exp(x.value()); return Eigen::MakeAutoDiffScalar(expx,x.derivatives() * expx);) EIGEN_AUTODIFF_DECLARE_GLOBAL_UNARY(log, using std::log; return Eigen::MakeAutoDiffScalar(log(x.value()),x.derivatives() * (Scalar(1)/x.value()));) template inline const Eigen::AutoDiffScalar< EIGEN_EXPR_BINARYOP_SCALAR_RETURN_TYPE(typename internal::remove_all::type,typename internal::traits::type>::Scalar,product) > pow(const Eigen::AutoDiffScalar &x, const typename internal::traits::type>::Scalar &y) { using namespace Eigen; using std::pow; return Eigen::MakeAutoDiffScalar(pow(x.value(),y), x.derivatives() * (y * pow(x.value(),y-1))); } template inline const AutoDiffScalar::type>::Scalar,Dynamic,1> > atan2(const AutoDiffScalar& a, const AutoDiffScalar& b) { using std::atan2; typedef typename internal::traits::type>::Scalar Scalar; typedef AutoDiffScalar > PlainADS; PlainADS ret; ret.value() = atan2(a.value(), b.value()); Scalar squared_hypot = a.value() * a.value() + b.value() * b.value(); // if (squared_hypot==0) the derivation is undefined and the following results in a NaN: ret.derivatives() = (a.derivatives() * b.value() - a.value() * b.derivatives()) / squared_hypot; return ret; } EIGEN_AUTODIFF_DECLARE_GLOBAL_UNARY(tan, using std::tan; using std::cos; return Eigen::MakeAutoDiffScalar(tan(x.value()),x.derivatives() * (Scalar(1)/numext::abs2(cos(x.value()))));) EIGEN_AUTODIFF_DECLARE_GLOBAL_UNARY(asin, using std::sqrt; using std::asin; return Eigen::MakeAutoDiffScalar(asin(x.value()),x.derivatives() * (Scalar(1)/sqrt(1-numext::abs2(x.value()))));) EIGEN_AUTODIFF_DECLARE_GLOBAL_UNARY(acos, using std::sqrt; using std::acos; return Eigen::MakeAutoDiffScalar(acos(x.value()),x.derivatives() * (Scalar(-1)/sqrt(1-numext::abs2(x.value()))));) EIGEN_AUTODIFF_DECLARE_GLOBAL_UNARY(tanh, using std::cosh; using std::tanh; return Eigen::MakeAutoDiffScalar(tanh(x.value()),x.derivatives() * (Scalar(1)/numext::abs2(cosh(x.value()))));) EIGEN_AUTODIFF_DECLARE_GLOBAL_UNARY(sinh, using std::sinh; using std::cosh; return Eigen::MakeAutoDiffScalar(sinh(x.value()),x.derivatives() * cosh(x.value()));) EIGEN_AUTODIFF_DECLARE_GLOBAL_UNARY(cosh, using std::sinh; using std::cosh; return Eigen::MakeAutoDiffScalar(cosh(x.value()),x.derivatives() * sinh(x.value()));) #undef EIGEN_AUTODIFF_DECLARE_GLOBAL_UNARY template struct NumTraits > : NumTraits< typename NumTraits::type::Scalar>::Real > { typedef typename internal::remove_all::type DerTypeCleaned; typedef AutoDiffScalar::Real,DerTypeCleaned::RowsAtCompileTime,DerTypeCleaned::ColsAtCompileTime, 0, DerTypeCleaned::MaxRowsAtCompileTime, DerTypeCleaned::MaxColsAtCompileTime> > Real; typedef AutoDiffScalar NonInteger; typedef AutoDiffScalar Nested; typedef typename NumTraits::Literal Literal; enum{ RequireInitialization = 1 }; }; } namespace std { template class numeric_limits > : public numeric_limits {}; template class numeric_limits > : public numeric_limits {}; } // namespace std #endif // EIGEN_AUTODIFF_SCALAR_H RcppEigen/inst/include/unsupported/Eigen/src/Polynomials/0000755000176200001440000000000014562417221023246 5ustar liggesusersRcppEigen/inst/include/unsupported/Eigen/src/Polynomials/Companion.h0000644000176200001440000001761414562417221025353 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2010 Manuel Yguel // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_COMPANION_H #define EIGEN_COMPANION_H // This file requires the user to include // * Eigen/Core // * Eigen/src/PolynomialSolver.h namespace Eigen { namespace internal { #ifndef EIGEN_PARSED_BY_DOXYGEN template struct decrement_if_fixed_size { enum { ret = (Size == Dynamic) ? Dynamic : Size-1 }; }; #endif template< typename _Scalar, int _Deg > class companion { public: EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF_VECTORIZABLE_FIXED_SIZE(_Scalar,_Deg==Dynamic ? Dynamic : _Deg) enum { Deg = _Deg, Deg_1=decrement_if_fixed_size::ret }; typedef _Scalar Scalar; typedef typename NumTraits::Real RealScalar; typedef Matrix RightColumn; //typedef DiagonalMatrix< Scalar, Deg_1, Deg_1 > BottomLeftDiagonal; typedef Matrix BottomLeftDiagonal; typedef Matrix DenseCompanionMatrixType; typedef Matrix< Scalar, _Deg, Deg_1 > LeftBlock; typedef Matrix< Scalar, Deg_1, Deg_1 > BottomLeftBlock; typedef Matrix< Scalar, 1, Deg_1 > LeftBlockFirstRow; typedef DenseIndex Index; public: EIGEN_STRONG_INLINE const _Scalar operator()(Index row, Index col ) const { if( m_bl_diag.rows() > col ) { if( 0 < row ){ return m_bl_diag[col]; } else{ return 0; } } else{ return m_monic[row]; } } public: template void setPolynomial( const VectorType& poly ) { const Index deg = poly.size()-1; m_monic = -poly.head(deg)/poly[deg]; m_bl_diag.setOnes(deg-1); } template companion( const VectorType& poly ){ setPolynomial( poly ); } public: DenseCompanionMatrixType denseMatrix() const { const Index deg = m_monic.size(); const Index deg_1 = deg-1; DenseCompanionMatrixType companMat(deg,deg); companMat << ( LeftBlock(deg,deg_1) << LeftBlockFirstRow::Zero(1,deg_1), BottomLeftBlock::Identity(deg-1,deg-1)*m_bl_diag.asDiagonal() ).finished() , m_monic; return companMat; } protected: /** Helper function for the balancing algorithm. * \returns true if the row and the column, having colNorm and rowNorm * as norms, are balanced, false otherwise. * colB and rowB are respectively the multipliers for * the column and the row in order to balance them. * */ bool balanced( RealScalar colNorm, RealScalar rowNorm, bool& isBalanced, RealScalar& colB, RealScalar& rowB ); /** Helper function for the balancing algorithm. * \returns true if the row and the column, having colNorm and rowNorm * as norms, are balanced, false otherwise. * colB and rowB are respectively the multipliers for * the column and the row in order to balance them. * */ bool balancedR( RealScalar colNorm, RealScalar rowNorm, bool& isBalanced, RealScalar& colB, RealScalar& rowB ); public: /** * Balancing algorithm from B. N. PARLETT and C. REINSCH (1969) * "Balancing a matrix for calculation of eigenvalues and eigenvectors" * adapted to the case of companion matrices. * A matrix with non zero row and non zero column is balanced * for a certain norm if the i-th row and the i-th column * have same norm for all i. */ void balance(); protected: RightColumn m_monic; BottomLeftDiagonal m_bl_diag; }; template< typename _Scalar, int _Deg > inline bool companion<_Scalar,_Deg>::balanced( RealScalar colNorm, RealScalar rowNorm, bool& isBalanced, RealScalar& colB, RealScalar& rowB ) { if( RealScalar(0) == colNorm || RealScalar(0) == rowNorm || !(numext::isfinite)(colNorm) || !(numext::isfinite)(rowNorm)){ return true; } else { //To find the balancing coefficients, if the radix is 2, //one finds \f$ \sigma \f$ such that // \f$ 2^{2\sigma-1} < rowNorm / colNorm \le 2^{2\sigma+1} \f$ // then the balancing coefficient for the row is \f$ 1/2^{\sigma} \f$ // and the balancing coefficient for the column is \f$ 2^{\sigma} \f$ const RealScalar radix = RealScalar(2); const RealScalar radix2 = RealScalar(4); rowB = rowNorm / radix; colB = RealScalar(1); const RealScalar s = colNorm + rowNorm; // Find sigma s.t. rowNorm / 2 <= 2^(2*sigma) * colNorm RealScalar scout = colNorm; while (scout < rowB) { colB *= radix; scout *= radix2; } // We now have an upper-bound for sigma, try to lower it. // Find sigma s.t. 2^(2*sigma) * colNorm / 2 < rowNorm scout = colNorm * (colB / radix) * colB; // Avoid overflow. while (scout >= rowNorm) { colB /= radix; scout /= radix2; } // This line is used to avoid insubstantial balancing. if ((rowNorm + radix * scout) < RealScalar(0.95) * s * colB) { isBalanced = false; rowB = RealScalar(1) / colB; return false; } else { return true; } } } template< typename _Scalar, int _Deg > inline bool companion<_Scalar,_Deg>::balancedR( RealScalar colNorm, RealScalar rowNorm, bool& isBalanced, RealScalar& colB, RealScalar& rowB ) { if( RealScalar(0) == colNorm || RealScalar(0) == rowNorm ){ return true; } else { /** * Set the norm of the column and the row to the geometric mean * of the row and column norm */ const RealScalar q = colNorm/rowNorm; if( !isApprox( q, _Scalar(1) ) ) { rowB = sqrt( colNorm/rowNorm ); colB = RealScalar(1)/rowB; isBalanced = false; return false; } else{ return true; } } } template< typename _Scalar, int _Deg > void companion<_Scalar,_Deg>::balance() { using std::abs; EIGEN_STATIC_ASSERT( Deg == Dynamic || 1 < Deg, YOU_MADE_A_PROGRAMMING_MISTAKE ); const Index deg = m_monic.size(); const Index deg_1 = deg-1; bool hasConverged=false; while( !hasConverged ) { hasConverged = true; RealScalar colNorm,rowNorm; RealScalar colB,rowB; //First row, first column excluding the diagonal //============================================== colNorm = abs(m_bl_diag[0]); rowNorm = abs(m_monic[0]); //Compute balancing of the row and the column if( !balanced( colNorm, rowNorm, hasConverged, colB, rowB ) ) { m_bl_diag[0] *= colB; m_monic[0] *= rowB; } //Middle rows and columns excluding the diagonal //============================================== for( Index i=1; i headMonic( m_monic, 0, deg_1 ); colNorm = headMonic.array().abs().sum(); rowNorm = abs( m_bl_diag[ebl] ); //Compute balancing of the row and the column if( !balanced( colNorm, rowNorm, hasConverged, colB, rowB ) ) { headMonic *= colB; m_bl_diag[ebl] *= rowB; } } } } // end namespace internal } // end namespace Eigen #endif // EIGEN_COMPANION_H RcppEigen/inst/include/unsupported/Eigen/src/Polynomials/PolynomialSolver.h0000644000176200001440000003650314107270226026741 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2010 Manuel Yguel // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_POLYNOMIAL_SOLVER_H #define EIGEN_POLYNOMIAL_SOLVER_H namespace Eigen { /** \ingroup Polynomials_Module * \class PolynomialSolverBase. * * \brief Defined to be inherited by polynomial solvers: it provides * convenient methods such as * - real roots, * - greatest, smallest complex roots, * - real roots with greatest, smallest absolute real value, * - greatest, smallest real roots. * * It stores the set of roots as a vector of complexes. * */ template< typename _Scalar, int _Deg > class PolynomialSolverBase { public: EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF_VECTORIZABLE_FIXED_SIZE(_Scalar,_Deg==Dynamic ? Dynamic : _Deg) typedef _Scalar Scalar; typedef typename NumTraits::Real RealScalar; typedef std::complex RootType; typedef Matrix RootsType; typedef DenseIndex Index; protected: template< typename OtherPolynomial > inline void setPolynomial( const OtherPolynomial& poly ){ m_roots.resize(poly.size()-1); } public: template< typename OtherPolynomial > inline PolynomialSolverBase( const OtherPolynomial& poly ){ setPolynomial( poly() ); } inline PolynomialSolverBase(){} public: /** \returns the complex roots of the polynomial */ inline const RootsType& roots() const { return m_roots; } public: /** Clear and fills the back insertion sequence with the real roots of the polynomial * i.e. the real part of the complex roots that have an imaginary part which * absolute value is smaller than absImaginaryThreshold. * absImaginaryThreshold takes the dummy_precision associated * with the _Scalar template parameter of the PolynomialSolver class as the default value. * * \param[out] bi_seq : the back insertion sequence (stl concept) * \param[in] absImaginaryThreshold : the maximum bound of the imaginary part of a complex * number that is considered as real. * */ template inline void realRoots( Stl_back_insertion_sequence& bi_seq, const RealScalar& absImaginaryThreshold = NumTraits::dummy_precision() ) const { using std::abs; bi_seq.clear(); for(Index i=0; i inline const RootType& selectComplexRoot_withRespectToNorm( squaredNormBinaryPredicate& pred ) const { Index res=0; RealScalar norm2 = numext::abs2( m_roots[0] ); for( Index i=1; i greater; return selectComplexRoot_withRespectToNorm( greater ); } /** * \returns the complex root with smallest norm. */ inline const RootType& smallestRoot() const { std::less less; return selectComplexRoot_withRespectToNorm( less ); } protected: template inline const RealScalar& selectRealRoot_withRespectToAbsRealPart( squaredRealPartBinaryPredicate& pred, bool& hasArealRoot, const RealScalar& absImaginaryThreshold = NumTraits::dummy_precision() ) const { using std::abs; hasArealRoot = false; Index res=0; RealScalar abs2(0); for( Index i=0; i inline const RealScalar& selectRealRoot_withRespectToRealPart( RealPartBinaryPredicate& pred, bool& hasArealRoot, const RealScalar& absImaginaryThreshold = NumTraits::dummy_precision() ) const { using std::abs; hasArealRoot = false; Index res=0; RealScalar val(0); for( Index i=0; i::dummy_precision() ) const { std::greater greater; return selectRealRoot_withRespectToAbsRealPart( greater, hasArealRoot, absImaginaryThreshold ); } /** * \returns a real root with smallest absolute magnitude. * A real root is defined as the real part of a complex root with absolute imaginary * part smallest than absImaginaryThreshold. * absImaginaryThreshold takes the dummy_precision associated * with the _Scalar template parameter of the PolynomialSolver class as the default value. * If no real root is found the boolean hasArealRoot is set to false and the real part of * the root with smallest absolute imaginary part is returned instead. * * \param[out] hasArealRoot : boolean true if a real root is found according to the * absImaginaryThreshold criterion, false otherwise. * \param[in] absImaginaryThreshold : threshold on the absolute imaginary part to decide * whether or not a root is real. */ inline const RealScalar& absSmallestRealRoot( bool& hasArealRoot, const RealScalar& absImaginaryThreshold = NumTraits::dummy_precision() ) const { std::less less; return selectRealRoot_withRespectToAbsRealPart( less, hasArealRoot, absImaginaryThreshold ); } /** * \returns the real root with greatest value. * A real root is defined as the real part of a complex root with absolute imaginary * part smallest than absImaginaryThreshold. * absImaginaryThreshold takes the dummy_precision associated * with the _Scalar template parameter of the PolynomialSolver class as the default value. * If no real root is found the boolean hasArealRoot is set to false and the real part of * the root with smallest absolute imaginary part is returned instead. * * \param[out] hasArealRoot : boolean true if a real root is found according to the * absImaginaryThreshold criterion, false otherwise. * \param[in] absImaginaryThreshold : threshold on the absolute imaginary part to decide * whether or not a root is real. */ inline const RealScalar& greatestRealRoot( bool& hasArealRoot, const RealScalar& absImaginaryThreshold = NumTraits::dummy_precision() ) const { std::greater greater; return selectRealRoot_withRespectToRealPart( greater, hasArealRoot, absImaginaryThreshold ); } /** * \returns the real root with smallest value. * A real root is defined as the real part of a complex root with absolute imaginary * part smallest than absImaginaryThreshold. * absImaginaryThreshold takes the dummy_precision associated * with the _Scalar template parameter of the PolynomialSolver class as the default value. * If no real root is found the boolean hasArealRoot is set to false and the real part of * the root with smallest absolute imaginary part is returned instead. * * \param[out] hasArealRoot : boolean true if a real root is found according to the * absImaginaryThreshold criterion, false otherwise. * \param[in] absImaginaryThreshold : threshold on the absolute imaginary part to decide * whether or not a root is real. */ inline const RealScalar& smallestRealRoot( bool& hasArealRoot, const RealScalar& absImaginaryThreshold = NumTraits::dummy_precision() ) const { std::less less; return selectRealRoot_withRespectToRealPart( less, hasArealRoot, absImaginaryThreshold ); } protected: RootsType m_roots; }; #define EIGEN_POLYNOMIAL_SOLVER_BASE_INHERITED_TYPES( BASE ) \ typedef typename BASE::Scalar Scalar; \ typedef typename BASE::RealScalar RealScalar; \ typedef typename BASE::RootType RootType; \ typedef typename BASE::RootsType RootsType; /** \ingroup Polynomials_Module * * \class PolynomialSolver * * \brief A polynomial solver * * Computes the complex roots of a real polynomial. * * \param _Scalar the scalar type, i.e., the type of the polynomial coefficients * \param _Deg the degree of the polynomial, can be a compile time value or Dynamic. * Notice that the number of polynomial coefficients is _Deg+1. * * This class implements a polynomial solver and provides convenient methods such as * - real roots, * - greatest, smallest complex roots, * - real roots with greatest, smallest absolute real value. * - greatest, smallest real roots. * * WARNING: this polynomial solver is experimental, part of the unsupported Eigen modules. * * * Currently a QR algorithm is used to compute the eigenvalues of the companion matrix of * the polynomial to compute its roots. * This supposes that the complex moduli of the roots are all distinct: e.g. there should * be no multiple roots or conjugate roots for instance. * With 32bit (float) floating types this problem shows up frequently. * However, almost always, correct accuracy is reached even in these cases for 64bit * (double) floating types and small polynomial degree (<20). */ template class PolynomialSolver : public PolynomialSolverBase<_Scalar,_Deg> { public: EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF_VECTORIZABLE_FIXED_SIZE(_Scalar,_Deg==Dynamic ? Dynamic : _Deg) typedef PolynomialSolverBase<_Scalar,_Deg> PS_Base; EIGEN_POLYNOMIAL_SOLVER_BASE_INHERITED_TYPES( PS_Base ) typedef Matrix CompanionMatrixType; typedef typename internal::conditional::IsComplex, ComplexEigenSolver, EigenSolver >::type EigenSolverType; typedef typename internal::conditional::IsComplex, Scalar, std::complex >::type ComplexScalar; public: /** Computes the complex roots of a new polynomial. */ template< typename OtherPolynomial > void compute( const OtherPolynomial& poly ) { eigen_assert( Scalar(0) != poly[poly.size()-1] ); eigen_assert( poly.size() > 1 ); if(poly.size() > 2 ) { internal::companion companion( poly ); companion.balance(); m_eigenSolver.compute( companion.denseMatrix() ); m_roots = m_eigenSolver.eigenvalues(); // cleanup noise in imaginary part of real roots: // if the imaginary part is rather small compared to the real part // and that cancelling the imaginary part yield a smaller evaluation, // then it's safe to keep the real part only. RealScalar coarse_prec = RealScalar(std::pow(4,poly.size()+1))*NumTraits::epsilon(); for(Index i = 0; i inline PolynomialSolver( const OtherPolynomial& poly ){ compute( poly ); } inline PolynomialSolver(){} protected: using PS_Base::m_roots; EigenSolverType m_eigenSolver; }; template< typename _Scalar > class PolynomialSolver<_Scalar,1> : public PolynomialSolverBase<_Scalar,1> { public: typedef PolynomialSolverBase<_Scalar,1> PS_Base; EIGEN_POLYNOMIAL_SOLVER_BASE_INHERITED_TYPES( PS_Base ) public: /** Computes the complex roots of a new polynomial. */ template< typename OtherPolynomial > void compute( const OtherPolynomial& poly ) { eigen_assert( poly.size() == 2 ); eigen_assert( Scalar(0) != poly[1] ); m_roots[0] = -poly[0]/poly[1]; } public: template< typename OtherPolynomial > inline PolynomialSolver( const OtherPolynomial& poly ){ compute( poly ); } inline PolynomialSolver(){} protected: using PS_Base::m_roots; }; } // end namespace Eigen #endif // EIGEN_POLYNOMIAL_SOLVER_H RcppEigen/inst/include/unsupported/Eigen/src/Polynomials/PolynomialUtils.h0000644000176200001440000001130614107270226026561 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2010 Manuel Yguel // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_POLYNOMIAL_UTILS_H #define EIGEN_POLYNOMIAL_UTILS_H namespace Eigen { /** \ingroup Polynomials_Module * \returns the evaluation of the polynomial at x using Horner algorithm. * * \param[in] poly : the vector of coefficients of the polynomial ordered * by degrees i.e. poly[i] is the coefficient of degree i of the polynomial * e.g. \f$ 1 + 3x^2 \f$ is stored as a vector \f$ [ 1, 0, 3 ] \f$. * \param[in] x : the value to evaluate the polynomial at. * * \note for stability: * \f$ |x| \le 1 \f$ */ template inline T poly_eval_horner( const Polynomials& poly, const T& x ) { T val=poly[poly.size()-1]; for(DenseIndex i=poly.size()-2; i>=0; --i ){ val = val*x + poly[i]; } return val; } /** \ingroup Polynomials_Module * \returns the evaluation of the polynomial at x using stabilized Horner algorithm. * * \param[in] poly : the vector of coefficients of the polynomial ordered * by degrees i.e. poly[i] is the coefficient of degree i of the polynomial * e.g. \f$ 1 + 3x^2 \f$ is stored as a vector \f$ [ 1, 0, 3 ] \f$. * \param[in] x : the value to evaluate the polynomial at. */ template inline T poly_eval( const Polynomials& poly, const T& x ) { typedef typename NumTraits::Real Real; if( numext::abs2( x ) <= Real(1) ){ return poly_eval_horner( poly, x ); } else { T val=poly[0]; T inv_x = T(1)/x; for( DenseIndex i=1; i inline typename NumTraits::Real cauchy_max_bound( const Polynomial& poly ) { using std::abs; typedef typename Polynomial::Scalar Scalar; typedef typename NumTraits::Real Real; eigen_assert( Scalar(0) != poly[poly.size()-1] ); const Scalar inv_leading_coeff = Scalar(1)/poly[poly.size()-1]; Real cb(0); for( DenseIndex i=0; i inline typename NumTraits::Real cauchy_min_bound( const Polynomial& poly ) { using std::abs; typedef typename Polynomial::Scalar Scalar; typedef typename NumTraits::Real Real; DenseIndex i=0; while( i void roots_to_monicPolynomial( const RootVector& rv, Polynomial& poly ) { typedef typename Polynomial::Scalar Scalar; poly.setZero( rv.size()+1 ); poly[0] = -rv[0]; poly[1] = Scalar(1); for( DenseIndex i=1; i< rv.size(); ++i ) { for( DenseIndex j=i+1; j>0; --j ){ poly[j] = poly[j-1] - rv[i]*poly[j]; } poly[0] = -rv[i]*poly[0]; } } } // end namespace Eigen #endif // EIGEN_POLYNOMIAL_UTILS_H RcppEigen/inst/include/unsupported/Eigen/src/NonLinearOptimization/0000755000176200001440000000000014562417221025234 5ustar liggesusersRcppEigen/inst/include/unsupported/Eigen/src/NonLinearOptimization/LevenbergMarquardt.h0000644000176200001440000005316714107270226031210 0ustar liggesusers// -*- coding: utf-8 // vim: set fileencoding=utf-8 // This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009 Thomas Capricelli // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_LEVENBERGMARQUARDT__H #define EIGEN_LEVENBERGMARQUARDT__H namespace Eigen { namespace LevenbergMarquardtSpace { enum Status { NotStarted = -2, Running = -1, ImproperInputParameters = 0, RelativeReductionTooSmall = 1, RelativeErrorTooSmall = 2, RelativeErrorAndReductionTooSmall = 3, CosinusTooSmall = 4, TooManyFunctionEvaluation = 5, FtolTooSmall = 6, XtolTooSmall = 7, GtolTooSmall = 8, UserAsked = 9 }; } /** * \ingroup NonLinearOptimization_Module * \brief Performs non linear optimization over a non-linear function, * using a variant of the Levenberg Marquardt algorithm. * * Check wikipedia for more information. * http://en.wikipedia.org/wiki/Levenberg%E2%80%93Marquardt_algorithm */ template class LevenbergMarquardt { static Scalar sqrt_epsilon() { using std::sqrt; return sqrt(NumTraits::epsilon()); } public: LevenbergMarquardt(FunctorType &_functor) : functor(_functor) { nfev = njev = iter = 0; fnorm = gnorm = 0.; useExternalScaling=false; } typedef DenseIndex Index; struct Parameters { Parameters() : factor(Scalar(100.)) , maxfev(400) , ftol(sqrt_epsilon()) , xtol(sqrt_epsilon()) , gtol(Scalar(0.)) , epsfcn(Scalar(0.)) {} Scalar factor; Index maxfev; // maximum number of function evaluation Scalar ftol; Scalar xtol; Scalar gtol; Scalar epsfcn; }; typedef Matrix< Scalar, Dynamic, 1 > FVectorType; typedef Matrix< Scalar, Dynamic, Dynamic > JacobianType; LevenbergMarquardtSpace::Status lmder1( FVectorType &x, const Scalar tol = sqrt_epsilon() ); LevenbergMarquardtSpace::Status minimize(FVectorType &x); LevenbergMarquardtSpace::Status minimizeInit(FVectorType &x); LevenbergMarquardtSpace::Status minimizeOneStep(FVectorType &x); static LevenbergMarquardtSpace::Status lmdif1( FunctorType &functor, FVectorType &x, Index *nfev, const Scalar tol = sqrt_epsilon() ); LevenbergMarquardtSpace::Status lmstr1( FVectorType &x, const Scalar tol = sqrt_epsilon() ); LevenbergMarquardtSpace::Status minimizeOptimumStorage(FVectorType &x); LevenbergMarquardtSpace::Status minimizeOptimumStorageInit(FVectorType &x); LevenbergMarquardtSpace::Status minimizeOptimumStorageOneStep(FVectorType &x); void resetParameters(void) { parameters = Parameters(); } Parameters parameters; FVectorType fvec, qtf, diag; JacobianType fjac; PermutationMatrix permutation; Index nfev; Index njev; Index iter; Scalar fnorm, gnorm; bool useExternalScaling; Scalar lm_param(void) { return par; } private: FunctorType &functor; Index n; Index m; FVectorType wa1, wa2, wa3, wa4; Scalar par, sum; Scalar temp, temp1, temp2; Scalar delta; Scalar ratio; Scalar pnorm, xnorm, fnorm1, actred, dirder, prered; LevenbergMarquardt& operator=(const LevenbergMarquardt&); }; template LevenbergMarquardtSpace::Status LevenbergMarquardt::lmder1( FVectorType &x, const Scalar tol ) { n = x.size(); m = functor.values(); /* check the input parameters for errors. */ if (n <= 0 || m < n || tol < 0.) return LevenbergMarquardtSpace::ImproperInputParameters; resetParameters(); parameters.ftol = tol; parameters.xtol = tol; parameters.maxfev = 100*(n+1); return minimize(x); } template LevenbergMarquardtSpace::Status LevenbergMarquardt::minimize(FVectorType &x) { LevenbergMarquardtSpace::Status status = minimizeInit(x); if (status==LevenbergMarquardtSpace::ImproperInputParameters) return status; do { status = minimizeOneStep(x); } while (status==LevenbergMarquardtSpace::Running); return status; } template LevenbergMarquardtSpace::Status LevenbergMarquardt::minimizeInit(FVectorType &x) { n = x.size(); m = functor.values(); wa1.resize(n); wa2.resize(n); wa3.resize(n); wa4.resize(m); fvec.resize(m); fjac.resize(m, n); if (!useExternalScaling) diag.resize(n); eigen_assert( (!useExternalScaling || diag.size()==n) && "When useExternalScaling is set, the caller must provide a valid 'diag'"); qtf.resize(n); /* Function Body */ nfev = 0; njev = 0; /* check the input parameters for errors. */ if (n <= 0 || m < n || parameters.ftol < 0. || parameters.xtol < 0. || parameters.gtol < 0. || parameters.maxfev <= 0 || parameters.factor <= 0.) return LevenbergMarquardtSpace::ImproperInputParameters; if (useExternalScaling) for (Index j = 0; j < n; ++j) if (diag[j] <= 0.) return LevenbergMarquardtSpace::ImproperInputParameters; /* evaluate the function at the starting point */ /* and calculate its norm. */ nfev = 1; if ( functor(x, fvec) < 0) return LevenbergMarquardtSpace::UserAsked; fnorm = fvec.stableNorm(); /* initialize levenberg-marquardt parameter and iteration counter. */ par = 0.; iter = 1; return LevenbergMarquardtSpace::NotStarted; } template LevenbergMarquardtSpace::Status LevenbergMarquardt::minimizeOneStep(FVectorType &x) { using std::abs; using std::sqrt; eigen_assert(x.size()==n); // check the caller is not cheating us /* calculate the jacobian matrix. */ Index df_ret = functor.df(x, fjac); if (df_ret<0) return LevenbergMarquardtSpace::UserAsked; if (df_ret>0) // numerical diff, we evaluated the function df_ret times nfev += df_ret; else njev++; /* compute the qr factorization of the jacobian. */ wa2 = fjac.colwise().blueNorm(); ColPivHouseholderQR qrfac(fjac); fjac = qrfac.matrixQR(); permutation = qrfac.colsPermutation(); /* on the first iteration and if external scaling is not used, scale according */ /* to the norms of the columns of the initial jacobian. */ if (iter == 1) { if (!useExternalScaling) for (Index j = 0; j < n; ++j) diag[j] = (wa2[j]==0.)? 1. : wa2[j]; /* on the first iteration, calculate the norm of the scaled x */ /* and initialize the step bound delta. */ xnorm = diag.cwiseProduct(x).stableNorm(); delta = parameters.factor * xnorm; if (delta == 0.) delta = parameters.factor; } /* form (q transpose)*fvec and store the first n components in */ /* qtf. */ wa4 = fvec; wa4.applyOnTheLeft(qrfac.householderQ().adjoint()); qtf = wa4.head(n); /* compute the norm of the scaled gradient. */ gnorm = 0.; if (fnorm != 0.) for (Index j = 0; j < n; ++j) if (wa2[permutation.indices()[j]] != 0.) gnorm = (std::max)(gnorm, abs( fjac.col(j).head(j+1).dot(qtf.head(j+1)/fnorm) / wa2[permutation.indices()[j]])); /* test for convergence of the gradient norm. */ if (gnorm <= parameters.gtol) return LevenbergMarquardtSpace::CosinusTooSmall; /* rescale if necessary. */ if (!useExternalScaling) diag = diag.cwiseMax(wa2); do { /* determine the levenberg-marquardt parameter. */ internal::lmpar2(qrfac, diag, qtf, delta, par, wa1); /* store the direction p and x + p. calculate the norm of p. */ wa1 = -wa1; wa2 = x + wa1; pnorm = diag.cwiseProduct(wa1).stableNorm(); /* on the first iteration, adjust the initial step bound. */ if (iter == 1) delta = (std::min)(delta,pnorm); /* evaluate the function at x + p and calculate its norm. */ if ( functor(wa2, wa4) < 0) return LevenbergMarquardtSpace::UserAsked; ++nfev; fnorm1 = wa4.stableNorm(); /* compute the scaled actual reduction. */ actred = -1.; if (Scalar(.1) * fnorm1 < fnorm) actred = 1. - numext::abs2(fnorm1 / fnorm); /* compute the scaled predicted reduction and */ /* the scaled directional derivative. */ wa3 = fjac.template triangularView() * (qrfac.colsPermutation().inverse() *wa1); temp1 = numext::abs2(wa3.stableNorm() / fnorm); temp2 = numext::abs2(sqrt(par) * pnorm / fnorm); prered = temp1 + temp2 / Scalar(.5); dirder = -(temp1 + temp2); /* compute the ratio of the actual to the predicted */ /* reduction. */ ratio = 0.; if (prered != 0.) ratio = actred / prered; /* update the step bound. */ if (ratio <= Scalar(.25)) { if (actred >= 0.) temp = Scalar(.5); if (actred < 0.) temp = Scalar(.5) * dirder / (dirder + Scalar(.5) * actred); if (Scalar(.1) * fnorm1 >= fnorm || temp < Scalar(.1)) temp = Scalar(.1); /* Computing MIN */ delta = temp * (std::min)(delta, pnorm / Scalar(.1)); par /= temp; } else if (!(par != 0. && ratio < Scalar(.75))) { delta = pnorm / Scalar(.5); par = Scalar(.5) * par; } /* test for successful iteration. */ if (ratio >= Scalar(1e-4)) { /* successful iteration. update x, fvec, and their norms. */ x = wa2; wa2 = diag.cwiseProduct(x); fvec = wa4; xnorm = wa2.stableNorm(); fnorm = fnorm1; ++iter; } /* tests for convergence. */ if (abs(actred) <= parameters.ftol && prered <= parameters.ftol && Scalar(.5) * ratio <= 1. && delta <= parameters.xtol * xnorm) return LevenbergMarquardtSpace::RelativeErrorAndReductionTooSmall; if (abs(actred) <= parameters.ftol && prered <= parameters.ftol && Scalar(.5) * ratio <= 1.) return LevenbergMarquardtSpace::RelativeReductionTooSmall; if (delta <= parameters.xtol * xnorm) return LevenbergMarquardtSpace::RelativeErrorTooSmall; /* tests for termination and stringent tolerances. */ if (nfev >= parameters.maxfev) return LevenbergMarquardtSpace::TooManyFunctionEvaluation; if (abs(actred) <= NumTraits::epsilon() && prered <= NumTraits::epsilon() && Scalar(.5) * ratio <= 1.) return LevenbergMarquardtSpace::FtolTooSmall; if (delta <= NumTraits::epsilon() * xnorm) return LevenbergMarquardtSpace::XtolTooSmall; if (gnorm <= NumTraits::epsilon()) return LevenbergMarquardtSpace::GtolTooSmall; } while (ratio < Scalar(1e-4)); return LevenbergMarquardtSpace::Running; } template LevenbergMarquardtSpace::Status LevenbergMarquardt::lmstr1( FVectorType &x, const Scalar tol ) { n = x.size(); m = functor.values(); /* check the input parameters for errors. */ if (n <= 0 || m < n || tol < 0.) return LevenbergMarquardtSpace::ImproperInputParameters; resetParameters(); parameters.ftol = tol; parameters.xtol = tol; parameters.maxfev = 100*(n+1); return minimizeOptimumStorage(x); } template LevenbergMarquardtSpace::Status LevenbergMarquardt::minimizeOptimumStorageInit(FVectorType &x) { n = x.size(); m = functor.values(); wa1.resize(n); wa2.resize(n); wa3.resize(n); wa4.resize(m); fvec.resize(m); // Only R is stored in fjac. Q is only used to compute 'qtf', which is // Q.transpose()*rhs. qtf will be updated using givens rotation, // instead of storing them in Q. // The purpose it to only use a nxn matrix, instead of mxn here, so // that we can handle cases where m>>n : fjac.resize(n, n); if (!useExternalScaling) diag.resize(n); eigen_assert( (!useExternalScaling || diag.size()==n) && "When useExternalScaling is set, the caller must provide a valid 'diag'"); qtf.resize(n); /* Function Body */ nfev = 0; njev = 0; /* check the input parameters for errors. */ if (n <= 0 || m < n || parameters.ftol < 0. || parameters.xtol < 0. || parameters.gtol < 0. || parameters.maxfev <= 0 || parameters.factor <= 0.) return LevenbergMarquardtSpace::ImproperInputParameters; if (useExternalScaling) for (Index j = 0; j < n; ++j) if (diag[j] <= 0.) return LevenbergMarquardtSpace::ImproperInputParameters; /* evaluate the function at the starting point */ /* and calculate its norm. */ nfev = 1; if ( functor(x, fvec) < 0) return LevenbergMarquardtSpace::UserAsked; fnorm = fvec.stableNorm(); /* initialize levenberg-marquardt parameter and iteration counter. */ par = 0.; iter = 1; return LevenbergMarquardtSpace::NotStarted; } template LevenbergMarquardtSpace::Status LevenbergMarquardt::minimizeOptimumStorageOneStep(FVectorType &x) { using std::abs; using std::sqrt; eigen_assert(x.size()==n); // check the caller is not cheating us Index i, j; bool sing; /* compute the qr factorization of the jacobian matrix */ /* calculated one row at a time, while simultaneously */ /* forming (q transpose)*fvec and storing the first */ /* n components in qtf. */ qtf.fill(0.); fjac.fill(0.); Index rownb = 2; for (i = 0; i < m; ++i) { if (functor.df(x, wa3, rownb) < 0) return LevenbergMarquardtSpace::UserAsked; internal::rwupdt(fjac, wa3, qtf, fvec[i]); ++rownb; } ++njev; /* if the jacobian is rank deficient, call qrfac to */ /* reorder its columns and update the components of qtf. */ sing = false; for (j = 0; j < n; ++j) { if (fjac(j,j) == 0.) sing = true; wa2[j] = fjac.col(j).head(j).stableNorm(); } permutation.setIdentity(n); if (sing) { wa2 = fjac.colwise().blueNorm(); // TODO We have no unit test covering this code path, do not modify // until it is carefully tested ColPivHouseholderQR qrfac(fjac); fjac = qrfac.matrixQR(); wa1 = fjac.diagonal(); fjac.diagonal() = qrfac.hCoeffs(); permutation = qrfac.colsPermutation(); // TODO : avoid this: for(Index ii=0; ii< fjac.cols(); ii++) fjac.col(ii).segment(ii+1, fjac.rows()-ii-1) *= fjac(ii,ii); // rescale vectors for (j = 0; j < n; ++j) { if (fjac(j,j) != 0.) { sum = 0.; for (i = j; i < n; ++i) sum += fjac(i,j) * qtf[i]; temp = -sum / fjac(j,j); for (i = j; i < n; ++i) qtf[i] += fjac(i,j) * temp; } fjac(j,j) = wa1[j]; } } /* on the first iteration and if external scaling is not used, scale according */ /* to the norms of the columns of the initial jacobian. */ if (iter == 1) { if (!useExternalScaling) for (j = 0; j < n; ++j) diag[j] = (wa2[j]==0.)? 1. : wa2[j]; /* on the first iteration, calculate the norm of the scaled x */ /* and initialize the step bound delta. */ xnorm = diag.cwiseProduct(x).stableNorm(); delta = parameters.factor * xnorm; if (delta == 0.) delta = parameters.factor; } /* compute the norm of the scaled gradient. */ gnorm = 0.; if (fnorm != 0.) for (j = 0; j < n; ++j) if (wa2[permutation.indices()[j]] != 0.) gnorm = (std::max)(gnorm, abs( fjac.col(j).head(j+1).dot(qtf.head(j+1)/fnorm) / wa2[permutation.indices()[j]])); /* test for convergence of the gradient norm. */ if (gnorm <= parameters.gtol) return LevenbergMarquardtSpace::CosinusTooSmall; /* rescale if necessary. */ if (!useExternalScaling) diag = diag.cwiseMax(wa2); do { /* determine the levenberg-marquardt parameter. */ internal::lmpar(fjac, permutation.indices(), diag, qtf, delta, par, wa1); /* store the direction p and x + p. calculate the norm of p. */ wa1 = -wa1; wa2 = x + wa1; pnorm = diag.cwiseProduct(wa1).stableNorm(); /* on the first iteration, adjust the initial step bound. */ if (iter == 1) delta = (std::min)(delta,pnorm); /* evaluate the function at x + p and calculate its norm. */ if ( functor(wa2, wa4) < 0) return LevenbergMarquardtSpace::UserAsked; ++nfev; fnorm1 = wa4.stableNorm(); /* compute the scaled actual reduction. */ actred = -1.; if (Scalar(.1) * fnorm1 < fnorm) actred = 1. - numext::abs2(fnorm1 / fnorm); /* compute the scaled predicted reduction and */ /* the scaled directional derivative. */ wa3 = fjac.topLeftCorner(n,n).template triangularView() * (permutation.inverse() * wa1); temp1 = numext::abs2(wa3.stableNorm() / fnorm); temp2 = numext::abs2(sqrt(par) * pnorm / fnorm); prered = temp1 + temp2 / Scalar(.5); dirder = -(temp1 + temp2); /* compute the ratio of the actual to the predicted */ /* reduction. */ ratio = 0.; if (prered != 0.) ratio = actred / prered; /* update the step bound. */ if (ratio <= Scalar(.25)) { if (actred >= 0.) temp = Scalar(.5); if (actred < 0.) temp = Scalar(.5) * dirder / (dirder + Scalar(.5) * actred); if (Scalar(.1) * fnorm1 >= fnorm || temp < Scalar(.1)) temp = Scalar(.1); /* Computing MIN */ delta = temp * (std::min)(delta, pnorm / Scalar(.1)); par /= temp; } else if (!(par != 0. && ratio < Scalar(.75))) { delta = pnorm / Scalar(.5); par = Scalar(.5) * par; } /* test for successful iteration. */ if (ratio >= Scalar(1e-4)) { /* successful iteration. update x, fvec, and their norms. */ x = wa2; wa2 = diag.cwiseProduct(x); fvec = wa4; xnorm = wa2.stableNorm(); fnorm = fnorm1; ++iter; } /* tests for convergence. */ if (abs(actred) <= parameters.ftol && prered <= parameters.ftol && Scalar(.5) * ratio <= 1. && delta <= parameters.xtol * xnorm) return LevenbergMarquardtSpace::RelativeErrorAndReductionTooSmall; if (abs(actred) <= parameters.ftol && prered <= parameters.ftol && Scalar(.5) * ratio <= 1.) return LevenbergMarquardtSpace::RelativeReductionTooSmall; if (delta <= parameters.xtol * xnorm) return LevenbergMarquardtSpace::RelativeErrorTooSmall; /* tests for termination and stringent tolerances. */ if (nfev >= parameters.maxfev) return LevenbergMarquardtSpace::TooManyFunctionEvaluation; if (abs(actred) <= NumTraits::epsilon() && prered <= NumTraits::epsilon() && Scalar(.5) * ratio <= 1.) return LevenbergMarquardtSpace::FtolTooSmall; if (delta <= NumTraits::epsilon() * xnorm) return LevenbergMarquardtSpace::XtolTooSmall; if (gnorm <= NumTraits::epsilon()) return LevenbergMarquardtSpace::GtolTooSmall; } while (ratio < Scalar(1e-4)); return LevenbergMarquardtSpace::Running; } template LevenbergMarquardtSpace::Status LevenbergMarquardt::minimizeOptimumStorage(FVectorType &x) { LevenbergMarquardtSpace::Status status = minimizeOptimumStorageInit(x); if (status==LevenbergMarquardtSpace::ImproperInputParameters) return status; do { status = minimizeOptimumStorageOneStep(x); } while (status==LevenbergMarquardtSpace::Running); return status; } template LevenbergMarquardtSpace::Status LevenbergMarquardt::lmdif1( FunctorType &functor, FVectorType &x, Index *nfev, const Scalar tol ) { Index n = x.size(); Index m = functor.values(); /* check the input parameters for errors. */ if (n <= 0 || m < n || tol < 0.) return LevenbergMarquardtSpace::ImproperInputParameters; NumericalDiff numDiff(functor); // embedded LevenbergMarquardt LevenbergMarquardt, Scalar > lm(numDiff); lm.parameters.ftol = tol; lm.parameters.xtol = tol; lm.parameters.maxfev = 200*(n+1); LevenbergMarquardtSpace::Status info = LevenbergMarquardtSpace::Status(lm.minimize(x)); if (nfev) * nfev = lm.nfev; return info; } } // end namespace Eigen #endif // EIGEN_LEVENBERGMARQUARDT__H //vim: ai ts=4 sts=4 et sw=4 RcppEigen/inst/include/unsupported/Eigen/src/NonLinearOptimization/fdjac1.h0000644000176200001440000000426114107270226026535 0ustar liggesusersnamespace Eigen { namespace internal { template DenseIndex fdjac1( const FunctorType &Functor, Matrix< Scalar, Dynamic, 1 > &x, Matrix< Scalar, Dynamic, 1 > &fvec, Matrix< Scalar, Dynamic, Dynamic > &fjac, DenseIndex ml, DenseIndex mu, Scalar epsfcn) { using std::sqrt; using std::abs; typedef DenseIndex Index; /* Local variables */ Scalar h; Index j, k; Scalar eps, temp; Index msum; int iflag; Index start, length; /* Function Body */ const Scalar epsmch = NumTraits::epsilon(); const Index n = x.size(); eigen_assert(fvec.size()==n); Matrix< Scalar, Dynamic, 1 > wa1(n); Matrix< Scalar, Dynamic, 1 > wa2(n); eps = sqrt((std::max)(epsfcn,epsmch)); msum = ml + mu + 1; if (msum >= n) { /* computation of dense approximate jacobian. */ for (j = 0; j < n; ++j) { temp = x[j]; h = eps * abs(temp); if (h == 0.) h = eps; x[j] = temp + h; iflag = Functor(x, wa1); if (iflag < 0) return iflag; x[j] = temp; fjac.col(j) = (wa1-fvec)/h; } }else { /* computation of banded approximate jacobian. */ for (k = 0; k < msum; ++k) { for (j = k; (msum<0) ? (j>n): (jn): (j(0,j-mu); length = (std::min)(n-1, j+ml) - start + 1; fjac.col(j).segment(start, length) = ( wa1.segment(start, length)-fvec.segment(start, length))/h; } } } return 0; } } // end namespace internal } // end namespace Eigen RcppEigen/inst/include/unsupported/Eigen/src/NonLinearOptimization/r1mpyq.h0000644000176200001440000000207114107270226026633 0ustar liggesusersnamespace Eigen { namespace internal { // TODO : move this to GivensQR once there's such a thing in Eigen template void r1mpyq(DenseIndex m, DenseIndex n, Scalar *a, const std::vector > &v_givens, const std::vector > &w_givens) { typedef DenseIndex Index; /* apply the first set of givens rotations to a. */ for (Index j = n-2; j>=0; --j) for (Index i = 0; i void rwupdt( Matrix< Scalar, Dynamic, Dynamic > &r, const Matrix< Scalar, Dynamic, 1> &w, Matrix< Scalar, Dynamic, 1> &b, Scalar alpha) { typedef DenseIndex Index; const Index n = r.cols(); eigen_assert(r.rows()>=n); std::vector > givens(n); /* Local variables */ Scalar temp, rowj; /* Function Body */ for (Index j = 0; j < n; ++j) { rowj = w[j]; /* apply the previous transformations to */ /* r(i,j), i=0,1,...,j-1, and to w(j). */ for (Index i = 0; i < j; ++i) { temp = givens[i].c() * r(i,j) + givens[i].s() * rowj; rowj = -givens[i].s() * r(i,j) + givens[i].c() * rowj; r(i,j) = temp; } /* determine a givens rotation which eliminates w(j). */ givens[j].makeGivens(-r(j,j), rowj); if (rowj == 0.) continue; // givens[j] is identity /* apply the current transformation to r(j,j), b(j), and alpha. */ r(j,j) = givens[j].c() * r(j,j) + givens[j].s() * rowj; temp = givens[j].c() * b[j] + givens[j].s() * alpha; alpha = -givens[j].s() * b[j] + givens[j].c() * alpha; b[j] = temp; } } } // end namespace internal } // end namespace Eigen RcppEigen/inst/include/unsupported/Eigen/src/NonLinearOptimization/HybridNonLinearSolver.h0000644000176200001440000004657514562417221031650 0ustar liggesusers// -*- coding: utf-8 // vim: set fileencoding=utf-8 // This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009 Thomas Capricelli // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_HYBRIDNONLINEARSOLVER_H #define EIGEN_HYBRIDNONLINEARSOLVER_H namespace Eigen { namespace HybridNonLinearSolverSpace { enum Status { Running = -1, ImproperInputParameters = 0, RelativeErrorTooSmall = 1, TooManyFunctionEvaluation = 2, TolTooSmall = 3, NotMakingProgressJacobian = 4, NotMakingProgressIterations = 5, UserAsked = 6 }; } /** * \ingroup NonLinearOptimization_Module * \brief Finds a zero of a system of n * nonlinear functions in n variables by a modification of the Powell * hybrid method ("dogleg"). * * The user must provide a subroutine which calculates the * functions. The Jacobian is either provided by the user, or approximated * using a forward-difference method. * */ template class HybridNonLinearSolver { public: typedef DenseIndex Index; HybridNonLinearSolver(FunctorType &_functor) : functor(_functor) { nfev=njev=iter = 0; fnorm= 0.; useExternalScaling=false;} struct Parameters { Parameters() : factor(Scalar(100.)) , maxfev(1000) , xtol(numext::sqrt(NumTraits::epsilon())) , nb_of_subdiagonals(-1) , nb_of_superdiagonals(-1) , epsfcn(Scalar(0.)) {} Scalar factor; Index maxfev; // maximum number of function evaluation Scalar xtol; Index nb_of_subdiagonals; Index nb_of_superdiagonals; Scalar epsfcn; }; typedef Matrix< Scalar, Dynamic, 1 > FVectorType; typedef Matrix< Scalar, Dynamic, Dynamic > JacobianType; /* TODO: if eigen provides a triangular storage, use it here */ typedef Matrix< Scalar, Dynamic, Dynamic > UpperTriangularType; HybridNonLinearSolverSpace::Status hybrj1( FVectorType &x, const Scalar tol = numext::sqrt(NumTraits::epsilon()) ); HybridNonLinearSolverSpace::Status solveInit(FVectorType &x); HybridNonLinearSolverSpace::Status solveOneStep(FVectorType &x); HybridNonLinearSolverSpace::Status solve(FVectorType &x); HybridNonLinearSolverSpace::Status hybrd1( FVectorType &x, const Scalar tol = numext::sqrt(NumTraits::epsilon()) ); HybridNonLinearSolverSpace::Status solveNumericalDiffInit(FVectorType &x); HybridNonLinearSolverSpace::Status solveNumericalDiffOneStep(FVectorType &x); HybridNonLinearSolverSpace::Status solveNumericalDiff(FVectorType &x); void resetParameters(void) { parameters = Parameters(); } Parameters parameters; FVectorType fvec, qtf, diag; JacobianType fjac; UpperTriangularType R; Index nfev; Index njev; Index iter; Scalar fnorm; bool useExternalScaling; private: FunctorType &functor; Index n; Scalar sum; bool sing; Scalar temp; Scalar delta; bool jeval; Index ncsuc; Scalar ratio; Scalar pnorm, xnorm, fnorm1; Index nslow1, nslow2; Index ncfail; Scalar actred, prered; FVectorType wa1, wa2, wa3, wa4; HybridNonLinearSolver& operator=(const HybridNonLinearSolver&); }; template HybridNonLinearSolverSpace::Status HybridNonLinearSolver::hybrj1( FVectorType &x, const Scalar tol ) { n = x.size(); /* check the input parameters for errors. */ if (n <= 0 || tol < 0.) return HybridNonLinearSolverSpace::ImproperInputParameters; resetParameters(); parameters.maxfev = 100*(n+1); parameters.xtol = tol; diag.setConstant(n, 1.); useExternalScaling = true; return solve(x); } template HybridNonLinearSolverSpace::Status HybridNonLinearSolver::solveInit(FVectorType &x) { n = x.size(); wa1.resize(n); wa2.resize(n); wa3.resize(n); wa4.resize(n); fvec.resize(n); qtf.resize(n); fjac.resize(n, n); if (!useExternalScaling) diag.resize(n); eigen_assert( (!useExternalScaling || diag.size()==n) && "When useExternalScaling is set, the caller must provide a valid 'diag'"); /* Function Body */ nfev = 0; njev = 0; /* check the input parameters for errors. */ if (n <= 0 || parameters.xtol < 0. || parameters.maxfev <= 0 || parameters.factor <= 0. ) return HybridNonLinearSolverSpace::ImproperInputParameters; if (useExternalScaling) for (Index j = 0; j < n; ++j) if (diag[j] <= 0.) return HybridNonLinearSolverSpace::ImproperInputParameters; /* evaluate the function at the starting point */ /* and calculate its norm. */ nfev = 1; if ( functor(x, fvec) < 0) return HybridNonLinearSolverSpace::UserAsked; fnorm = fvec.stableNorm(); /* initialize iteration counter and monitors. */ iter = 1; ncsuc = 0; ncfail = 0; nslow1 = 0; nslow2 = 0; return HybridNonLinearSolverSpace::Running; } template HybridNonLinearSolverSpace::Status HybridNonLinearSolver::solveOneStep(FVectorType &x) { using std::abs; eigen_assert(x.size()==n); // check the caller is not cheating us Index j; std::vector > v_givens(n), w_givens(n); jeval = true; /* calculate the jacobian matrix. */ if ( functor.df(x, fjac) < 0) return HybridNonLinearSolverSpace::UserAsked; ++njev; wa2 = fjac.colwise().blueNorm(); /* on the first iteration and if external scaling is not used, scale according */ /* to the norms of the columns of the initial jacobian. */ if (iter == 1) { if (!useExternalScaling) for (j = 0; j < n; ++j) diag[j] = (wa2[j]==0.) ? 1. : wa2[j]; /* on the first iteration, calculate the norm of the scaled x */ /* and initialize the step bound delta. */ xnorm = diag.cwiseProduct(x).stableNorm(); delta = parameters.factor * xnorm; if (delta == 0.) delta = parameters.factor; } /* compute the qr factorization of the jacobian. */ HouseholderQR qrfac(fjac); // no pivoting: /* copy the triangular factor of the qr factorization into r. */ R = qrfac.matrixQR(); /* accumulate the orthogonal factor in fjac. */ fjac = qrfac.householderQ(); /* form (q transpose)*fvec and store in qtf. */ qtf = fjac.transpose() * fvec; /* rescale if necessary. */ if (!useExternalScaling) diag = diag.cwiseMax(wa2); while (true) { /* determine the direction p. */ internal::dogleg(R, diag, qtf, delta, wa1); /* store the direction p and x + p. calculate the norm of p. */ wa1 = -wa1; wa2 = x + wa1; pnorm = diag.cwiseProduct(wa1).stableNorm(); /* on the first iteration, adjust the initial step bound. */ if (iter == 1) delta = (std::min)(delta,pnorm); /* evaluate the function at x + p and calculate its norm. */ if ( functor(wa2, wa4) < 0) return HybridNonLinearSolverSpace::UserAsked; ++nfev; fnorm1 = wa4.stableNorm(); /* compute the scaled actual reduction. */ actred = -1.; if (fnorm1 < fnorm) /* Computing 2nd power */ actred = 1. - numext::abs2(fnorm1 / fnorm); /* compute the scaled predicted reduction. */ wa3 = R.template triangularView()*wa1 + qtf; temp = wa3.stableNorm(); prered = 0.; if (temp < fnorm) /* Computing 2nd power */ prered = 1. - numext::abs2(temp / fnorm); /* compute the ratio of the actual to the predicted reduction. */ ratio = 0.; if (prered > 0.) ratio = actred / prered; /* update the step bound. */ if (ratio < Scalar(.1)) { ncsuc = 0; ++ncfail; delta = Scalar(.5) * delta; } else { ncfail = 0; ++ncsuc; if (ratio >= Scalar(.5) || ncsuc > 1) delta = (std::max)(delta, pnorm / Scalar(.5)); if (abs(ratio - 1.) <= Scalar(.1)) { delta = pnorm / Scalar(.5); } } /* test for successful iteration. */ if (ratio >= Scalar(1e-4)) { /* successful iteration. update x, fvec, and their norms. */ x = wa2; wa2 = diag.cwiseProduct(x); fvec = wa4; xnorm = wa2.stableNorm(); fnorm = fnorm1; ++iter; } /* determine the progress of the iteration. */ ++nslow1; if (actred >= Scalar(.001)) nslow1 = 0; if (jeval) ++nslow2; if (actred >= Scalar(.1)) nslow2 = 0; /* test for convergence. */ if (delta <= parameters.xtol * xnorm || fnorm == 0.) return HybridNonLinearSolverSpace::RelativeErrorTooSmall; /* tests for termination and stringent tolerances. */ if (nfev >= parameters.maxfev) return HybridNonLinearSolverSpace::TooManyFunctionEvaluation; if (Scalar(.1) * (std::max)(Scalar(.1) * delta, pnorm) <= NumTraits::epsilon() * xnorm) return HybridNonLinearSolverSpace::TolTooSmall; if (nslow2 == 5) return HybridNonLinearSolverSpace::NotMakingProgressJacobian; if (nslow1 == 10) return HybridNonLinearSolverSpace::NotMakingProgressIterations; /* criterion for recalculating jacobian. */ if (ncfail == 2) break; // leave inner loop and go for the next outer loop iteration /* calculate the rank one modification to the jacobian */ /* and update qtf if necessary. */ wa1 = diag.cwiseProduct( diag.cwiseProduct(wa1)/pnorm ); wa2 = fjac.transpose() * wa4; if (ratio >= Scalar(1e-4)) qtf = wa2; wa2 = (wa2-wa3)/pnorm; /* compute the qr factorization of the updated jacobian. */ internal::r1updt(R, wa1, v_givens, w_givens, wa2, wa3, &sing); internal::r1mpyq(n, n, fjac.data(), v_givens, w_givens); internal::r1mpyq(1, n, qtf.data(), v_givens, w_givens); jeval = false; } return HybridNonLinearSolverSpace::Running; } template HybridNonLinearSolverSpace::Status HybridNonLinearSolver::solve(FVectorType &x) { HybridNonLinearSolverSpace::Status status = solveInit(x); if (status==HybridNonLinearSolverSpace::ImproperInputParameters) return status; while (status==HybridNonLinearSolverSpace::Running) status = solveOneStep(x); return status; } template HybridNonLinearSolverSpace::Status HybridNonLinearSolver::hybrd1( FVectorType &x, const Scalar tol ) { n = x.size(); /* check the input parameters for errors. */ if (n <= 0 || tol < 0.) return HybridNonLinearSolverSpace::ImproperInputParameters; resetParameters(); parameters.maxfev = 200*(n+1); parameters.xtol = tol; diag.setConstant(n, 1.); useExternalScaling = true; return solveNumericalDiff(x); } template HybridNonLinearSolverSpace::Status HybridNonLinearSolver::solveNumericalDiffInit(FVectorType &x) { n = x.size(); if (parameters.nb_of_subdiagonals<0) parameters.nb_of_subdiagonals= n-1; if (parameters.nb_of_superdiagonals<0) parameters.nb_of_superdiagonals= n-1; wa1.resize(n); wa2.resize(n); wa3.resize(n); wa4.resize(n); qtf.resize(n); fjac.resize(n, n); fvec.resize(n); if (!useExternalScaling) diag.resize(n); eigen_assert( (!useExternalScaling || diag.size()==n) && "When useExternalScaling is set, the caller must provide a valid 'diag'"); /* Function Body */ nfev = 0; njev = 0; /* check the input parameters for errors. */ if (n <= 0 || parameters.xtol < 0. || parameters.maxfev <= 0 || parameters.nb_of_subdiagonals< 0 || parameters.nb_of_superdiagonals< 0 || parameters.factor <= 0. ) return HybridNonLinearSolverSpace::ImproperInputParameters; if (useExternalScaling) for (Index j = 0; j < n; ++j) if (diag[j] <= 0.) return HybridNonLinearSolverSpace::ImproperInputParameters; /* evaluate the function at the starting point */ /* and calculate its norm. */ nfev = 1; if ( functor(x, fvec) < 0) return HybridNonLinearSolverSpace::UserAsked; fnorm = fvec.stableNorm(); /* initialize iteration counter and monitors. */ iter = 1; ncsuc = 0; ncfail = 0; nslow1 = 0; nslow2 = 0; return HybridNonLinearSolverSpace::Running; } template HybridNonLinearSolverSpace::Status HybridNonLinearSolver::solveNumericalDiffOneStep(FVectorType &x) { using std::sqrt; using std::abs; assert(x.size()==n); // check the caller is not cheating us Index j; std::vector > v_givens(n), w_givens(n); jeval = true; if (parameters.nb_of_subdiagonals<0) parameters.nb_of_subdiagonals= n-1; if (parameters.nb_of_superdiagonals<0) parameters.nb_of_superdiagonals= n-1; /* calculate the jacobian matrix. */ if (internal::fdjac1(functor, x, fvec, fjac, parameters.nb_of_subdiagonals, parameters.nb_of_superdiagonals, parameters.epsfcn) <0) return HybridNonLinearSolverSpace::UserAsked; nfev += (std::min)(parameters.nb_of_subdiagonals+parameters.nb_of_superdiagonals+ 1, n); wa2 = fjac.colwise().blueNorm(); /* on the first iteration and if external scaling is not used, scale according */ /* to the norms of the columns of the initial jacobian. */ if (iter == 1) { if (!useExternalScaling) for (j = 0; j < n; ++j) diag[j] = (wa2[j]==0.) ? 1. : wa2[j]; /* on the first iteration, calculate the norm of the scaled x */ /* and initialize the step bound delta. */ xnorm = diag.cwiseProduct(x).stableNorm(); delta = parameters.factor * xnorm; if (delta == 0.) delta = parameters.factor; } /* compute the qr factorization of the jacobian. */ HouseholderQR qrfac(fjac); // no pivoting: /* copy the triangular factor of the qr factorization into r. */ R = qrfac.matrixQR(); /* accumulate the orthogonal factor in fjac. */ fjac = qrfac.householderQ(); /* form (q transpose)*fvec and store in qtf. */ qtf = fjac.transpose() * fvec; /* rescale if necessary. */ if (!useExternalScaling) diag = diag.cwiseMax(wa2); while (true) { /* determine the direction p. */ internal::dogleg(R, diag, qtf, delta, wa1); /* store the direction p and x + p. calculate the norm of p. */ wa1 = -wa1; wa2 = x + wa1; pnorm = diag.cwiseProduct(wa1).stableNorm(); /* on the first iteration, adjust the initial step bound. */ if (iter == 1) delta = (std::min)(delta,pnorm); /* evaluate the function at x + p and calculate its norm. */ if ( functor(wa2, wa4) < 0) return HybridNonLinearSolverSpace::UserAsked; ++nfev; fnorm1 = wa4.stableNorm(); /* compute the scaled actual reduction. */ actred = -1.; if (fnorm1 < fnorm) /* Computing 2nd power */ actred = 1. - numext::abs2(fnorm1 / fnorm); /* compute the scaled predicted reduction. */ wa3 = R.template triangularView()*wa1 + qtf; temp = wa3.stableNorm(); prered = 0.; if (temp < fnorm) /* Computing 2nd power */ prered = 1. - numext::abs2(temp / fnorm); /* compute the ratio of the actual to the predicted reduction. */ ratio = 0.; if (prered > 0.) ratio = actred / prered; /* update the step bound. */ if (ratio < Scalar(.1)) { ncsuc = 0; ++ncfail; delta = Scalar(.5) * delta; } else { ncfail = 0; ++ncsuc; if (ratio >= Scalar(.5) || ncsuc > 1) delta = (std::max)(delta, pnorm / Scalar(.5)); if (abs(ratio - 1.) <= Scalar(.1)) { delta = pnorm / Scalar(.5); } } /* test for successful iteration. */ if (ratio >= Scalar(1e-4)) { /* successful iteration. update x, fvec, and their norms. */ x = wa2; wa2 = diag.cwiseProduct(x); fvec = wa4; xnorm = wa2.stableNorm(); fnorm = fnorm1; ++iter; } /* determine the progress of the iteration. */ ++nslow1; if (actred >= Scalar(.001)) nslow1 = 0; if (jeval) ++nslow2; if (actred >= Scalar(.1)) nslow2 = 0; /* test for convergence. */ if (delta <= parameters.xtol * xnorm || fnorm == 0.) return HybridNonLinearSolverSpace::RelativeErrorTooSmall; /* tests for termination and stringent tolerances. */ if (nfev >= parameters.maxfev) return HybridNonLinearSolverSpace::TooManyFunctionEvaluation; if (Scalar(.1) * (std::max)(Scalar(.1) * delta, pnorm) <= NumTraits::epsilon() * xnorm) return HybridNonLinearSolverSpace::TolTooSmall; if (nslow2 == 5) return HybridNonLinearSolverSpace::NotMakingProgressJacobian; if (nslow1 == 10) return HybridNonLinearSolverSpace::NotMakingProgressIterations; /* criterion for recalculating jacobian. */ if (ncfail == 2) break; // leave inner loop and go for the next outer loop iteration /* calculate the rank one modification to the jacobian */ /* and update qtf if necessary. */ wa1 = diag.cwiseProduct( diag.cwiseProduct(wa1)/pnorm ); wa2 = fjac.transpose() * wa4; if (ratio >= Scalar(1e-4)) qtf = wa2; wa2 = (wa2-wa3)/pnorm; /* compute the qr factorization of the updated jacobian. */ internal::r1updt(R, wa1, v_givens, w_givens, wa2, wa3, &sing); internal::r1mpyq(n, n, fjac.data(), v_givens, w_givens); internal::r1mpyq(1, n, qtf.data(), v_givens, w_givens); jeval = false; } return HybridNonLinearSolverSpace::Running; } template HybridNonLinearSolverSpace::Status HybridNonLinearSolver::solveNumericalDiff(FVectorType &x) { HybridNonLinearSolverSpace::Status status = solveNumericalDiffInit(x); if (status==HybridNonLinearSolverSpace::ImproperInputParameters) return status; while (status==HybridNonLinearSolverSpace::Running) status = solveNumericalDiffOneStep(x); return status; } } // end namespace Eigen #endif // EIGEN_HYBRIDNONLINEARSOLVER_H //vim: ai ts=4 sts=4 et sw=4 RcppEigen/inst/include/unsupported/Eigen/src/NonLinearOptimization/chkder.h0000644000176200001440000000351014107270226026641 0ustar liggesusers#define chkder_log10e 0.43429448190325182765 #define chkder_factor 100. namespace Eigen { namespace internal { template void chkder( const Matrix< Scalar, Dynamic, 1 > &x, const Matrix< Scalar, Dynamic, 1 > &fvec, const Matrix< Scalar, Dynamic, Dynamic > &fjac, Matrix< Scalar, Dynamic, 1 > &xp, const Matrix< Scalar, Dynamic, 1 > &fvecp, int mode, Matrix< Scalar, Dynamic, 1 > &err ) { using std::sqrt; using std::abs; using std::log; typedef DenseIndex Index; const Scalar eps = sqrt(NumTraits::epsilon()); const Scalar epsf = chkder_factor * NumTraits::epsilon(); const Scalar epslog = chkder_log10e * log(eps); Scalar temp; const Index m = fvec.size(), n = x.size(); if (mode != 2) { /* mode = 1. */ xp.resize(n); for (Index j = 0; j < n; ++j) { temp = eps * abs(x[j]); if (temp == 0.) temp = eps; xp[j] = x[j] + temp; } } else { /* mode = 2. */ err.setZero(m); for (Index j = 0; j < n; ++j) { temp = abs(x[j]); if (temp == 0.) temp = 1.; err += temp * fjac.col(j); } for (Index i = 0; i < m; ++i) { temp = 1.; if (fvec[i] != 0. && fvecp[i] != 0. && abs(fvecp[i] - fvec[i]) >= epsf * abs(fvec[i])) temp = eps * abs((fvecp[i] - fvec[i]) / eps - err[i]) / (abs(fvec[i]) + abs(fvecp[i])); err[i] = 1.; if (temp > NumTraits::epsilon() && temp < eps) err[i] = (chkder_log10e * log(temp) - epslog) / epslog; if (temp >= eps) err[i] = 0.; } } } } // end namespace internal } // end namespace Eigen RcppEigen/inst/include/unsupported/Eigen/src/NonLinearOptimization/covar.h0000644000176200001440000000357314107270226026524 0ustar liggesusersnamespace Eigen { namespace internal { template void covar( Matrix< Scalar, Dynamic, Dynamic > &r, const VectorXi &ipvt, Scalar tol = std::sqrt(NumTraits::epsilon()) ) { using std::abs; typedef DenseIndex Index; /* Local variables */ Index i, j, k, l, ii, jj; bool sing; Scalar temp; /* Function Body */ const Index n = r.cols(); const Scalar tolr = tol * abs(r(0,0)); Matrix< Scalar, Dynamic, 1 > wa(n); eigen_assert(ipvt.size()==n); /* form the inverse of r in the full upper triangle of r. */ l = -1; for (k = 0; k < n; ++k) if (abs(r(k,k)) > tolr) { r(k,k) = 1. / r(k,k); for (j = 0; j <= k-1; ++j) { temp = r(k,k) * r(j,k); r(j,k) = 0.; r.col(k).head(j+1) -= r.col(j).head(j+1) * temp; } l = k; } /* form the full upper triangle of the inverse of (r transpose)*r */ /* in the full upper triangle of r. */ for (k = 0; k <= l; ++k) { for (j = 0; j <= k-1; ++j) r.col(j).head(j+1) += r.col(k).head(j+1) * r(j,k); r.col(k).head(k+1) *= r(k,k); } /* form the full lower triangle of the covariance matrix */ /* in the strict lower triangle of r and in wa. */ for (j = 0; j < n; ++j) { jj = ipvt[j]; sing = j > l; for (i = 0; i <= j; ++i) { if (sing) r(i,j) = 0.; ii = ipvt[i]; if (ii > jj) r(ii,jj) = r(i,j); if (ii < jj) r(jj,ii) = r(i,j); } wa[jj] = r(j,j); } /* symmetrize the covariance matrix in r. */ r.topLeftCorner(n,n).template triangularView() = r.topLeftCorner(n,n).transpose(); r.diagonal() = wa; } } // end namespace internal } // end namespace Eigen RcppEigen/inst/include/unsupported/Eigen/src/NonLinearOptimization/dogleg.h0000644000176200001440000000634114107270226026647 0ustar liggesusersnamespace Eigen { namespace internal { template void dogleg( const Matrix< Scalar, Dynamic, Dynamic > &qrfac, const Matrix< Scalar, Dynamic, 1 > &diag, const Matrix< Scalar, Dynamic, 1 > &qtb, Scalar delta, Matrix< Scalar, Dynamic, 1 > &x) { using std::abs; using std::sqrt; typedef DenseIndex Index; /* Local variables */ Index i, j; Scalar sum, temp, alpha, bnorm; Scalar gnorm, qnorm; Scalar sgnorm; /* Function Body */ const Scalar epsmch = NumTraits::epsilon(); const Index n = qrfac.cols(); eigen_assert(n==qtb.size()); eigen_assert(n==x.size()); eigen_assert(n==diag.size()); Matrix< Scalar, Dynamic, 1 > wa1(n), wa2(n); /* first, calculate the gauss-newton direction. */ for (j = n-1; j >=0; --j) { temp = qrfac(j,j); if (temp == 0.) { temp = epsmch * qrfac.col(j).head(j+1).maxCoeff(); if (temp == 0.) temp = epsmch; } if (j==n-1) x[j] = qtb[j] / temp; else x[j] = (qtb[j] - qrfac.row(j).tail(n-j-1).dot(x.tail(n-j-1))) / temp; } /* test whether the gauss-newton direction is acceptable. */ qnorm = diag.cwiseProduct(x).stableNorm(); if (qnorm <= delta) return; // TODO : this path is not tested by Eigen unit tests /* the gauss-newton direction is not acceptable. */ /* next, calculate the scaled gradient direction. */ wa1.fill(0.); for (j = 0; j < n; ++j) { wa1.tail(n-j) += qrfac.row(j).tail(n-j) * qtb[j]; wa1[j] /= diag[j]; } /* calculate the norm of the scaled gradient and test for */ /* the special case in which the scaled gradient is zero. */ gnorm = wa1.stableNorm(); sgnorm = 0.; alpha = delta / qnorm; if (gnorm == 0.) goto algo_end; /* calculate the point along the scaled gradient */ /* at which the quadratic is minimized. */ wa1.array() /= (diag*gnorm).array(); // TODO : once unit tests cover this part,: // wa2 = qrfac.template triangularView() * wa1; for (j = 0; j < n; ++j) { sum = 0.; for (i = j; i < n; ++i) { sum += qrfac(j,i) * wa1[i]; } wa2[j] = sum; } temp = wa2.stableNorm(); sgnorm = gnorm / temp / temp; /* test whether the scaled gradient direction is acceptable. */ alpha = 0.; if (sgnorm >= delta) goto algo_end; /* the scaled gradient direction is not acceptable. */ /* finally, calculate the point along the dogleg */ /* at which the quadratic is minimized. */ bnorm = qtb.stableNorm(); temp = bnorm / gnorm * (bnorm / qnorm) * (sgnorm / delta); temp = temp - delta / qnorm * numext::abs2(sgnorm / delta) + sqrt(numext::abs2(temp - delta / qnorm) + (1.-numext::abs2(delta / qnorm)) * (1.-numext::abs2(sgnorm / delta))); alpha = delta / qnorm * (1. - numext::abs2(sgnorm / delta)) / temp; algo_end: /* form appropriate convex combination of the gauss-newton */ /* direction and the scaled gradient direction. */ temp = (1.-alpha) * (std::min)(sgnorm,delta); x = temp * wa1 + alpha * x; } } // end namespace internal } // end namespace Eigen RcppEigen/inst/include/unsupported/Eigen/src/NonLinearOptimization/r1updt.h0000644000176200001440000000601314562417221026624 0ustar liggesusersnamespace Eigen { namespace internal { template void r1updt( Matrix< Scalar, Dynamic, Dynamic > &s, const Matrix< Scalar, Dynamic, 1> &u, std::vector > &v_givens, std::vector > &w_givens, Matrix< Scalar, Dynamic, 1> &v, Matrix< Scalar, Dynamic, 1> &w, bool *sing) { typedef DenseIndex Index; const JacobiRotation IdentityRotation = JacobiRotation(1,0); /* Local variables */ const Index m = s.rows(); const Index n = s.cols(); Index i, j=1; Scalar temp; JacobiRotation givens; // r1updt had a broader usecase, but we don't use it here. And, more // importantly, we can not test it. eigen_assert(m==n); eigen_assert(u.size()==m); eigen_assert(v.size()==n); eigen_assert(w.size()==n); /* move the nontrivial part of the last column of s into w. */ w[n-1] = s(n-1,n-1); /* rotate the vector v into a multiple of the n-th unit vector */ /* in such a way that a spike is introduced into w. */ for (j=n-2; j>=0; --j) { w[j] = 0.; if (v[j] != 0.) { /* determine a givens rotation which eliminates the */ /* j-th element of v. */ givens.makeGivens(-v[n-1], v[j]); /* apply the transformation to v and store the information */ /* necessary to recover the givens rotation. */ v[n-1] = givens.s() * v[j] + givens.c() * v[n-1]; v_givens[j] = givens; /* apply the transformation to s and extend the spike in w. */ for (i = j; i < m; ++i) { temp = givens.c() * s(j,i) - givens.s() * w[i]; w[i] = givens.s() * s(j,i) + givens.c() * w[i]; s(j,i) = temp; } } else v_givens[j] = IdentityRotation; } /* add the spike from the rank 1 update to w. */ w += v[n-1] * u; /* eliminate the spike. */ *sing = false; for (j = 0; j < n-1; ++j) { if (w[j] != 0.) { /* determine a givens rotation which eliminates the */ /* j-th element of the spike. */ givens.makeGivens(-s(j,j), w[j]); /* apply the transformation to s and reduce the spike in w. */ for (i = j; i < m; ++i) { temp = givens.c() * s(j,i) + givens.s() * w[i]; w[i] = -givens.s() * s(j,i) + givens.c() * w[i]; s(j,i) = temp; } /* store the information necessary to recover the */ /* givens rotation. */ w_givens[j] = givens; } else v_givens[j] = IdentityRotation; /* test for zero diagonal elements in the output s. */ if (s(j,j) == 0.) { *sing = true; } } /* move w back into the last column of the output s. */ s(n-1,n-1) = w[n-1]; if (s(j,j) == 0.) { *sing = true; } return; } } // end namespace internal } // end namespace Eigen RcppEigen/inst/include/unsupported/Eigen/src/NonLinearOptimization/qrsolv.h0000644000176200001440000000630014562417221026732 0ustar liggesusersnamespace Eigen { namespace internal { // TODO : once qrsolv2 is removed, use ColPivHouseholderQR or PermutationMatrix instead of ipvt template void qrsolv( Matrix< Scalar, Dynamic, Dynamic > &s, // TODO : use a PermutationMatrix once lmpar is no more: const VectorXi &ipvt, const Matrix< Scalar, Dynamic, 1 > &diag, const Matrix< Scalar, Dynamic, 1 > &qtb, Matrix< Scalar, Dynamic, 1 > &x, Matrix< Scalar, Dynamic, 1 > &sdiag) { typedef DenseIndex Index; /* Local variables */ Index i, j, k, l; Scalar temp; Index n = s.cols(); Matrix< Scalar, Dynamic, 1 > wa(n); JacobiRotation givens; /* Function Body */ // the following will only change the lower triangular part of s, including // the diagonal, though the diagonal is restored afterward /* copy r and (q transpose)*b to preserve input and initialize s. */ /* in particular, save the diagonal elements of r in x. */ x = s.diagonal(); wa = qtb; s.topLeftCorner(n,n).template triangularView() = s.topLeftCorner(n,n).transpose(); /* eliminate the diagonal matrix d using a givens rotation. */ for (j = 0; j < n; ++j) { /* prepare the row of d to be eliminated, locating the */ /* diagonal element using p from the qr factorization. */ l = ipvt[j]; if (diag[l] == 0.) break; sdiag.tail(n-j).setZero(); sdiag[j] = diag[l]; /* the transformations to eliminate the row of d */ /* modify only a single element of (q transpose)*b */ /* beyond the first n, which is initially zero. */ Scalar qtbpj = 0.; for (k = j; k < n; ++k) { /* determine a givens rotation which eliminates the */ /* appropriate element in the current row of d. */ givens.makeGivens(-s(k,k), sdiag[k]); /* compute the modified diagonal element of r and */ /* the modified element of ((q transpose)*b,0). */ s(k,k) = givens.c() * s(k,k) + givens.s() * sdiag[k]; temp = givens.c() * wa[k] + givens.s() * qtbpj; qtbpj = -givens.s() * wa[k] + givens.c() * qtbpj; wa[k] = temp; /* accumulate the transformation in the row of s. */ for (i = k+1; i().solveInPlace(wa.head(nsing)); // restore sdiag = s.diagonal(); s.diagonal() = x; /* permute the components of z back to components of x. */ for (j = 0; j < n; ++j) x[ipvt[j]] = wa[j]; } } // end namespace internal } // end namespace Eigen RcppEigen/inst/include/unsupported/Eigen/src/NonLinearOptimization/lmpar.h0000644000176200001440000002162714107270226026525 0ustar liggesusersnamespace Eigen { namespace internal { template void lmpar( Matrix< Scalar, Dynamic, Dynamic > &r, const VectorXi &ipvt, const Matrix< Scalar, Dynamic, 1 > &diag, const Matrix< Scalar, Dynamic, 1 > &qtb, Scalar delta, Scalar &par, Matrix< Scalar, Dynamic, 1 > &x) { using std::abs; using std::sqrt; typedef DenseIndex Index; /* Local variables */ Index i, j, l; Scalar fp; Scalar parc, parl; Index iter; Scalar temp, paru; Scalar gnorm; Scalar dxnorm; /* Function Body */ const Scalar dwarf = (std::numeric_limits::min)(); const Index n = r.cols(); eigen_assert(n==diag.size()); eigen_assert(n==qtb.size()); eigen_assert(n==x.size()); Matrix< Scalar, Dynamic, 1 > wa1, wa2; /* compute and store in x the gauss-newton direction. if the */ /* jacobian is rank-deficient, obtain a least squares solution. */ Index nsing = n-1; wa1 = qtb; for (j = 0; j < n; ++j) { if (r(j,j) == 0. && nsing == n-1) nsing = j - 1; if (nsing < n-1) wa1[j] = 0.; } for (j = nsing; j>=0; --j) { wa1[j] /= r(j,j); temp = wa1[j]; for (i = 0; i < j ; ++i) wa1[i] -= r(i,j) * temp; } for (j = 0; j < n; ++j) x[ipvt[j]] = wa1[j]; /* initialize the iteration counter. */ /* evaluate the function at the origin, and test */ /* for acceptance of the gauss-newton direction. */ iter = 0; wa2 = diag.cwiseProduct(x); dxnorm = wa2.blueNorm(); fp = dxnorm - delta; if (fp <= Scalar(0.1) * delta) { par = 0; return; } /* if the jacobian is not rank deficient, the newton */ /* step provides a lower bound, parl, for the zero of */ /* the function. otherwise set this bound to zero. */ parl = 0.; if (nsing >= n-1) { for (j = 0; j < n; ++j) { l = ipvt[j]; wa1[j] = diag[l] * (wa2[l] / dxnorm); } // it's actually a triangularView.solveInplace(), though in a weird // way: for (j = 0; j < n; ++j) { Scalar sum = 0.; for (i = 0; i < j; ++i) sum += r(i,j) * wa1[i]; wa1[j] = (wa1[j] - sum) / r(j,j); } temp = wa1.blueNorm(); parl = fp / delta / temp / temp; } /* calculate an upper bound, paru, for the zero of the function. */ for (j = 0; j < n; ++j) wa1[j] = r.col(j).head(j+1).dot(qtb.head(j+1)) / diag[ipvt[j]]; gnorm = wa1.stableNorm(); paru = gnorm / delta; if (paru == 0.) paru = dwarf / (std::min)(delta,Scalar(0.1)); /* if the input par lies outside of the interval (parl,paru), */ /* set par to the closer endpoint. */ par = (std::max)(par,parl); par = (std::min)(par,paru); if (par == 0.) par = gnorm / dxnorm; /* beginning of an iteration. */ while (true) { ++iter; /* evaluate the function at the current value of par. */ if (par == 0.) par = (std::max)(dwarf,Scalar(.001) * paru); /* Computing MAX */ wa1 = sqrt(par)* diag; Matrix< Scalar, Dynamic, 1 > sdiag(n); qrsolv(r, ipvt, wa1, qtb, x, sdiag); wa2 = diag.cwiseProduct(x); dxnorm = wa2.blueNorm(); temp = fp; fp = dxnorm - delta; /* if the function is small enough, accept the current value */ /* of par. also test for the exceptional cases where parl */ /* is zero or the number of iterations has reached 10. */ if (abs(fp) <= Scalar(0.1) * delta || (parl == 0. && fp <= temp && temp < 0.) || iter == 10) break; /* compute the newton correction. */ for (j = 0; j < n; ++j) { l = ipvt[j]; wa1[j] = diag[l] * (wa2[l] / dxnorm); } for (j = 0; j < n; ++j) { wa1[j] /= sdiag[j]; temp = wa1[j]; for (i = j+1; i < n; ++i) wa1[i] -= r(i,j) * temp; } temp = wa1.blueNorm(); parc = fp / delta / temp / temp; /* depending on the sign of the function, update parl or paru. */ if (fp > 0.) parl = (std::max)(parl,par); if (fp < 0.) paru = (std::min)(paru,par); /* compute an improved estimate for par. */ /* Computing MAX */ par = (std::max)(parl,par+parc); /* end of an iteration. */ } /* termination. */ if (iter == 0) par = 0.; return; } template void lmpar2( const ColPivHouseholderQR > &qr, const Matrix< Scalar, Dynamic, 1 > &diag, const Matrix< Scalar, Dynamic, 1 > &qtb, Scalar delta, Scalar &par, Matrix< Scalar, Dynamic, 1 > &x) { using std::sqrt; using std::abs; typedef DenseIndex Index; /* Local variables */ Index j; Scalar fp; Scalar parc, parl; Index iter; Scalar temp, paru; Scalar gnorm; Scalar dxnorm; /* Function Body */ const Scalar dwarf = (std::numeric_limits::min)(); const Index n = qr.matrixQR().cols(); eigen_assert(n==diag.size()); eigen_assert(n==qtb.size()); Matrix< Scalar, Dynamic, 1 > wa1, wa2; /* compute and store in x the gauss-newton direction. if the */ /* jacobian is rank-deficient, obtain a least squares solution. */ // const Index rank = qr.nonzeroPivots(); // exactly double(0.) const Index rank = qr.rank(); // use a threshold wa1 = qtb; wa1.tail(n-rank).setZero(); qr.matrixQR().topLeftCorner(rank, rank).template triangularView().solveInPlace(wa1.head(rank)); x = qr.colsPermutation()*wa1; /* initialize the iteration counter. */ /* evaluate the function at the origin, and test */ /* for acceptance of the gauss-newton direction. */ iter = 0; wa2 = diag.cwiseProduct(x); dxnorm = wa2.blueNorm(); fp = dxnorm - delta; if (fp <= Scalar(0.1) * delta) { par = 0; return; } /* if the jacobian is not rank deficient, the newton */ /* step provides a lower bound, parl, for the zero of */ /* the function. otherwise set this bound to zero. */ parl = 0.; if (rank==n) { wa1 = qr.colsPermutation().inverse() * diag.cwiseProduct(wa2)/dxnorm; qr.matrixQR().topLeftCorner(n, n).transpose().template triangularView().solveInPlace(wa1); temp = wa1.blueNorm(); parl = fp / delta / temp / temp; } /* calculate an upper bound, paru, for the zero of the function. */ for (j = 0; j < n; ++j) wa1[j] = qr.matrixQR().col(j).head(j+1).dot(qtb.head(j+1)) / diag[qr.colsPermutation().indices()(j)]; gnorm = wa1.stableNorm(); paru = gnorm / delta; if (paru == 0.) paru = dwarf / (std::min)(delta,Scalar(0.1)); /* if the input par lies outside of the interval (parl,paru), */ /* set par to the closer endpoint. */ par = (std::max)(par,parl); par = (std::min)(par,paru); if (par == 0.) par = gnorm / dxnorm; /* beginning of an iteration. */ Matrix< Scalar, Dynamic, Dynamic > s = qr.matrixQR(); while (true) { ++iter; /* evaluate the function at the current value of par. */ if (par == 0.) par = (std::max)(dwarf,Scalar(.001) * paru); /* Computing MAX */ wa1 = sqrt(par)* diag; Matrix< Scalar, Dynamic, 1 > sdiag(n); qrsolv(s, qr.colsPermutation().indices(), wa1, qtb, x, sdiag); wa2 = diag.cwiseProduct(x); dxnorm = wa2.blueNorm(); temp = fp; fp = dxnorm - delta; /* if the function is small enough, accept the current value */ /* of par. also test for the exceptional cases where parl */ /* is zero or the number of iterations has reached 10. */ if (abs(fp) <= Scalar(0.1) * delta || (parl == 0. && fp <= temp && temp < 0.) || iter == 10) break; /* compute the newton correction. */ wa1 = qr.colsPermutation().inverse() * diag.cwiseProduct(wa2/dxnorm); // we could almost use this here, but the diagonal is outside qr, in sdiag[] // qr.matrixQR().topLeftCorner(n, n).transpose().template triangularView().solveInPlace(wa1); for (j = 0; j < n; ++j) { wa1[j] /= sdiag[j]; temp = wa1[j]; for (Index i = j+1; i < n; ++i) wa1[i] -= s(i,j) * temp; } temp = wa1.blueNorm(); parc = fp / delta / temp / temp; /* depending on the sign of the function, update parl or paru. */ if (fp > 0.) parl = (std::max)(parl,par); if (fp < 0.) paru = (std::min)(paru,par); /* compute an improved estimate for par. */ par = (std::max)(parl,par+parc); } if (iter == 0) par = 0.; return; } } // end namespace internal } // end namespace Eigen RcppEigen/inst/include/unsupported/Eigen/src/IterativeSolvers/0000755000176200001440000000000014562417221024252 5ustar liggesusersRcppEigen/inst/include/unsupported/Eigen/src/IterativeSolvers/MINRES.h0000644000176200001440000003015514562417221025424 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2012 Giacomo Po // Copyright (C) 2011-2014 Gael Guennebaud // Copyright (C) 2018 David Hyde // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_MINRES_H_ #define EIGEN_MINRES_H_ namespace Eigen { namespace internal { /** \internal Low-level MINRES algorithm * \param mat The matrix A * \param rhs The right hand side vector b * \param x On input and initial solution, on output the computed solution. * \param precond A right preconditioner being able to efficiently solve for an * approximation of Ax=b (regardless of b) * \param iters On input the max number of iteration, on output the number of performed iterations. * \param tol_error On input the tolerance error, on output an estimation of the relative error. */ template EIGEN_DONT_INLINE void minres(const MatrixType& mat, const Rhs& rhs, Dest& x, const Preconditioner& precond, Index& iters, typename Dest::RealScalar& tol_error) { using std::sqrt; typedef typename Dest::RealScalar RealScalar; typedef typename Dest::Scalar Scalar; typedef Matrix VectorType; // Check for zero rhs const RealScalar rhsNorm2(rhs.squaredNorm()); if(rhsNorm2 == 0) { x.setZero(); iters = 0; tol_error = 0; return; } // initialize const Index maxIters(iters); // initialize maxIters to iters const Index N(mat.cols()); // the size of the matrix const RealScalar threshold2(tol_error*tol_error*rhsNorm2); // convergence threshold (compared to residualNorm2) // Initialize preconditioned Lanczos VectorType v_old(N); // will be initialized inside loop VectorType v( VectorType::Zero(N) ); //initialize v VectorType v_new(rhs-mat*x); //initialize v_new RealScalar residualNorm2(v_new.squaredNorm()); VectorType w(N); // will be initialized inside loop VectorType w_new(precond.solve(v_new)); // initialize w_new // RealScalar beta; // will be initialized inside loop RealScalar beta_new2(v_new.dot(w_new)); eigen_assert(beta_new2 >= 0.0 && "PRECONDITIONER IS NOT POSITIVE DEFINITE"); RealScalar beta_new(sqrt(beta_new2)); const RealScalar beta_one(beta_new); // Initialize other variables RealScalar c(1.0); // the cosine of the Givens rotation RealScalar c_old(1.0); RealScalar s(0.0); // the sine of the Givens rotation RealScalar s_old(0.0); // the sine of the Givens rotation VectorType p_oold(N); // will be initialized in loop VectorType p_old(VectorType::Zero(N)); // initialize p_old=0 VectorType p(p_old); // initialize p=0 RealScalar eta(1.0); iters = 0; // reset iters while ( iters < maxIters ) { // Preconditioned Lanczos /* Note that there are 4 variants on the Lanczos algorithm. These are * described in Paige, C. C. (1972). Computational variants of * the Lanczos method for the eigenproblem. IMA Journal of Applied * Mathematics, 10(3), 373-381. The current implementation corresponds * to the case A(2,7) in the paper. It also corresponds to * algorithm 6.14 in Y. Saad, Iterative Methods for Sparse Linear * Systems, 2003 p.173. For the preconditioned version see * A. Greenbaum, Iterative Methods for Solving Linear Systems, SIAM (1987). */ const RealScalar beta(beta_new); v_old = v; // update: at first time step, this makes v_old = 0 so value of beta doesn't matter v_new /= beta_new; // overwrite v_new for next iteration w_new /= beta_new; // overwrite w_new for next iteration v = v_new; // update w = w_new; // update v_new.noalias() = mat*w - beta*v_old; // compute v_new const RealScalar alpha = v_new.dot(w); v_new -= alpha*v; // overwrite v_new w_new = precond.solve(v_new); // overwrite w_new beta_new2 = v_new.dot(w_new); // compute beta_new eigen_assert(beta_new2 >= 0.0 && "PRECONDITIONER IS NOT POSITIVE DEFINITE"); beta_new = sqrt(beta_new2); // compute beta_new // Givens rotation const RealScalar r2 =s*alpha+c*c_old*beta; // s, s_old, c and c_old are still from previous iteration const RealScalar r3 =s_old*beta; // s, s_old, c and c_old are still from previous iteration const RealScalar r1_hat=c*alpha-c_old*s*beta; const RealScalar r1 =sqrt( std::pow(r1_hat,2) + std::pow(beta_new,2) ); c_old = c; // store for next iteration s_old = s; // store for next iteration c=r1_hat/r1; // new cosine s=beta_new/r1; // new sine // Update solution p_oold = p_old; p_old = p; p.noalias()=(w-r2*p_old-r3*p_oold) /r1; // IS NOALIAS REQUIRED? x += beta_one*c*eta*p; /* Update the squared residual. Note that this is the estimated residual. The real residual |Ax-b|^2 may be slightly larger */ residualNorm2 *= s*s; if ( residualNorm2 < threshold2) { break; } eta=-s*eta; // update eta iters++; // increment iteration number (for output purposes) } /* Compute error. Note that this is the estimated error. The real error |Ax-b|/|b| may be slightly larger */ tol_error = std::sqrt(residualNorm2 / rhsNorm2); } } template< typename _MatrixType, int _UpLo=Lower, typename _Preconditioner = IdentityPreconditioner> class MINRES; namespace internal { template< typename _MatrixType, int _UpLo, typename _Preconditioner> struct traits > { typedef _MatrixType MatrixType; typedef _Preconditioner Preconditioner; }; } /** \ingroup IterativeLinearSolvers_Module * \brief A minimal residual solver for sparse symmetric problems * * This class allows to solve for A.x = b sparse linear problems using the MINRES algorithm * of Paige and Saunders (1975). The sparse matrix A must be symmetric (possibly indefinite). * The vectors x and b can be either dense or sparse. * * \tparam _MatrixType the type of the sparse matrix A, can be a dense or a sparse matrix. * \tparam _UpLo the triangular part that will be used for the computations. It can be Lower, * Upper, or Lower|Upper in which the full matrix entries will be considered. Default is Lower. * \tparam _Preconditioner the type of the preconditioner. Default is DiagonalPreconditioner * * The maximal number of iterations and tolerance value can be controlled via the setMaxIterations() * and setTolerance() methods. The defaults are the size of the problem for the maximal number of iterations * and NumTraits::epsilon() for the tolerance. * * This class can be used as the direct solver classes. Here is a typical usage example: * \code * int n = 10000; * VectorXd x(n), b(n); * SparseMatrix A(n,n); * // fill A and b * MINRES > mr; * mr.compute(A); * x = mr.solve(b); * std::cout << "#iterations: " << mr.iterations() << std::endl; * std::cout << "estimated error: " << mr.error() << std::endl; * // update b, and solve again * x = mr.solve(b); * \endcode * * By default the iterations start with x=0 as an initial guess of the solution. * One can control the start using the solveWithGuess() method. * * MINRES can also be used in a matrix-free context, see the following \link MatrixfreeSolverExample example \endlink. * * \sa class ConjugateGradient, BiCGSTAB, SimplicialCholesky, DiagonalPreconditioner, IdentityPreconditioner */ template< typename _MatrixType, int _UpLo, typename _Preconditioner> class MINRES : public IterativeSolverBase > { typedef IterativeSolverBase Base; using Base::matrix; using Base::m_error; using Base::m_iterations; using Base::m_info; using Base::m_isInitialized; public: using Base::_solve_impl; typedef _MatrixType MatrixType; typedef typename MatrixType::Scalar Scalar; typedef typename MatrixType::RealScalar RealScalar; typedef _Preconditioner Preconditioner; enum {UpLo = _UpLo}; public: /** Default constructor. */ MINRES() : Base() {} /** Initialize the solver with matrix \a A for further \c Ax=b solving. * * This constructor is a shortcut for the default constructor followed * by a call to compute(). * * \warning this class stores a reference to the matrix A as well as some * precomputed values that depend on it. Therefore, if \a A is changed * this class becomes invalid. Call compute() to update it with the new * matrix A, or modify a copy of A. */ template explicit MINRES(const EigenBase& A) : Base(A.derived()) {} /** Destructor. */ ~MINRES(){} /** \internal */ template void _solve_vector_with_guess_impl(const Rhs& b, Dest& x) const { typedef typename Base::MatrixWrapper MatrixWrapper; typedef typename Base::ActualMatrixType ActualMatrixType; enum { TransposeInput = (!MatrixWrapper::MatrixFree) && (UpLo==(Lower|Upper)) && (!MatrixType::IsRowMajor) && (!NumTraits::IsComplex) }; typedef typename internal::conditional, ActualMatrixType const&>::type RowMajorWrapper; EIGEN_STATIC_ASSERT(EIGEN_IMPLIES(MatrixWrapper::MatrixFree,UpLo==(Lower|Upper)),MATRIX_FREE_CONJUGATE_GRADIENT_IS_COMPATIBLE_WITH_UPPER_UNION_LOWER_MODE_ONLY); typedef typename internal::conditional::Type >::type SelfAdjointWrapper; m_iterations = Base::maxIterations(); m_error = Base::m_tolerance; RowMajorWrapper row_mat(matrix()); internal::minres(SelfAdjointWrapper(row_mat), b, x, Base::m_preconditioner, m_iterations, m_error); m_info = m_error <= Base::m_tolerance ? Success : NoConvergence; } protected: }; } // end namespace Eigen #endif // EIGEN_MINRES_H RcppEigen/inst/include/unsupported/Eigen/src/IterativeSolvers/IterationController.h0000644000176200001440000001236014562417221030427 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2009 Gael Guennebaud /* NOTE The class IterationController has been adapted from the iteration * class of the GMM++ and ITL libraries. */ //======================================================================= // Copyright (C) 1997-2001 // Authors: Andrew Lumsdaine // Lie-Quan Lee // // This file is part of the Iterative Template Library // // You should have received a copy of the License Agreement for the // Iterative Template Library along with the software; see the // file LICENSE. // // Permission to modify the code and to distribute modified code is // granted, provided the text of this NOTICE is retained, a notice that // the code was modified is included with the above COPYRIGHT NOTICE and // with the COPYRIGHT NOTICE in the LICENSE file, and that the LICENSE // file is distributed with the modified code. // // LICENSOR MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. // By way of example, but not limitation, Licensor MAKES NO // REPRESENTATIONS OR WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY // PARTICULAR PURPOSE OR THAT THE USE OF THE LICENSED SOFTWARE COMPONENTS // OR DOCUMENTATION WILL NOT INFRINGE ANY PATENTS, COPYRIGHTS, TRADEMARKS // OR OTHER RIGHTS. //======================================================================= //======================================================================== // // Copyright (C) 2002-2007 Yves Renard // // This file is a part of GETFEM++ // // Getfem++ is free software; you can redistribute it and/or modify // it under the terms of the GNU Lesser General Public License as // published by the Free Software Foundation; version 2.1 of the License. // // This program is distributed in the hope that it will be useful, // but WITHOUT ANY WARRANTY; without even the implied warranty of // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the // GNU Lesser General Public License for more details. // You should have received a copy of the GNU Lesser General Public // License along with this program; if not, write to the Free Software // Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, // USA. // //======================================================================== #include "../../../../Eigen/src/Core/util/NonMPL2.h" #ifndef EIGEN_ITERATION_CONTROLLER_H #define EIGEN_ITERATION_CONTROLLER_H namespace Eigen { /** \ingroup IterativeLinearSolvers_Module * \class IterationController * * \brief Controls the iterations of the iterative solvers * * This class has been adapted from the iteration class of GMM++ and ITL libraries. * */ class IterationController { protected : double m_rhsn; ///< Right hand side norm size_t m_maxiter; ///< Max. number of iterations int m_noise; ///< if noise > 0 iterations are printed double m_resmax; ///< maximum residual double m_resminreach, m_resadd; size_t m_nit; ///< iteration number double m_res; ///< last computed residual bool m_written; void (*m_callback)(const IterationController&); public : void init() { m_nit = 0; m_res = 0.0; m_written = false; m_resminreach = 1E50; m_resadd = 0.0; m_callback = 0; } IterationController(double r = 1.0E-8, int noi = 0, size_t mit = size_t(-1)) : m_rhsn(1.0), m_maxiter(mit), m_noise(noi), m_resmax(r) { init(); } void operator ++(int) { m_nit++; m_written = false; m_resadd += m_res; } void operator ++() { (*this)++; } bool first() { return m_nit == 0; } /* get/set the "noisyness" (verbosity) of the solvers */ int noiseLevel() const { return m_noise; } void setNoiseLevel(int n) { m_noise = n; } void reduceNoiseLevel() { if (m_noise > 0) m_noise--; } double maxResidual() const { return m_resmax; } void setMaxResidual(double r) { m_resmax = r; } double residual() const { return m_res; } /* change the user-definable callback, called after each iteration */ void setCallback(void (*t)(const IterationController&)) { m_callback = t; } size_t iteration() const { return m_nit; } void setIteration(size_t i) { m_nit = i; } size_t maxIterarions() const { return m_maxiter; } void setMaxIterations(size_t i) { m_maxiter = i; } double rhsNorm() const { return m_rhsn; } void setRhsNorm(double r) { m_rhsn = r; } bool converged() const { return m_res <= m_rhsn * m_resmax; } bool converged(double nr) { using std::abs; m_res = abs(nr); m_resminreach = (std::min)(m_resminreach, m_res); return converged(); } template bool converged(const VectorType &v) { return converged(v.squaredNorm()); } bool finished(double nr) { if (m_callback) m_callback(*this); if (m_noise > 0 && !m_written) { converged(nr); m_written = true; } return (m_nit >= m_maxiter || converged(nr)); } template bool finished(const MatrixBase &v) { return finished(double(v.squaredNorm())); } }; } // end namespace Eigen #endif // EIGEN_ITERATION_CONTROLLER_H RcppEigen/inst/include/unsupported/Eigen/src/IterativeSolvers/ConstrainedConjGrad.h0000644000176200001440000001231414562417221030305 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008 Gael Guennebaud /* NOTE The functions of this file have been adapted from the GMM++ library */ //======================================================================== // // Copyright (C) 2002-2007 Yves Renard // // This file is a part of GETFEM++ // // Getfem++ is free software; you can redistribute it and/or modify // it under the terms of the GNU Lesser General Public License as // published by the Free Software Foundation; version 2.1 of the License. // // This program is distributed in the hope that it will be useful, // but WITHOUT ANY WARRANTY; without even the implied warranty of // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the // GNU Lesser General Public License for more details. // You should have received a copy of the GNU Lesser General Public // License along with this program; if not, write to the Free Software // Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, // USA. // //======================================================================== #include "../../../../Eigen/src/Core/util/NonMPL2.h" #ifndef EIGEN_CONSTRAINEDCG_H #define EIGEN_CONSTRAINEDCG_H #include "../../../../Eigen/Core" namespace Eigen { namespace internal { /** \ingroup IterativeLinearSolvers_Module * Compute the pseudo inverse of the non-square matrix C such that * \f$ CINV = (C * C^T)^{-1} * C \f$ based on a conjugate gradient method. * * This function is internally used by constrained_cg. */ template void pseudo_inverse(const CMatrix &C, CINVMatrix &CINV) { // optimisable : copie de la ligne, precalcul de C * trans(C). typedef typename CMatrix::Scalar Scalar; typedef typename CMatrix::Index Index; // FIXME use sparse vectors ? typedef Matrix TmpVec; Index rows = C.rows(), cols = C.cols(); TmpVec d(rows), e(rows), l(cols), p(rows), q(rows), r(rows); Scalar rho, rho_1, alpha; d.setZero(); typedef Triplet T; std::vector tripletList; for (Index i = 0; i < rows; ++i) { d[i] = 1.0; rho = 1.0; e.setZero(); r = d; p = d; while (rho >= 1e-38) { /* conjugate gradient to compute e */ /* which is the i-th row of inv(C * trans(C)) */ l = C.transpose() * p; q = C * l; alpha = rho / p.dot(q); e += alpha * p; r += -alpha * q; rho_1 = rho; rho = r.dot(r); p = (rho/rho_1) * p + r; } l = C.transpose() * e; // l is the i-th row of CINV // FIXME add a generic "prune/filter" expression for both dense and sparse object to sparse for (Index j=0; j void constrained_cg(const TMatrix& A, const CMatrix& C, VectorX& x, const VectorB& b, const VectorF& f, IterationController &iter) { using std::sqrt; typedef typename TMatrix::Scalar Scalar; typedef typename TMatrix::Index Index; typedef Matrix TmpVec; Scalar rho = 1.0, rho_1, lambda, gamma; Index xSize = x.size(); TmpVec p(xSize), q(xSize), q2(xSize), r(xSize), old_z(xSize), z(xSize), memox(xSize); std::vector satured(C.rows()); p.setZero(); iter.setRhsNorm(sqrt(b.dot(b))); // gael vect_sp(PS, b, b) if (iter.rhsNorm() == 0.0) iter.setRhsNorm(1.0); SparseMatrix CINV(C.rows(), C.cols()); pseudo_inverse(C, CINV); while(true) { // computation of residual old_z = z; memox = x; r = b; r += A * -x; z = r; bool transition = false; for (Index i = 0; i < C.rows(); ++i) { Scalar al = C.row(i).dot(x) - f.coeff(i); if (al >= -1.0E-15) { if (!satured[i]) { satured[i] = true; transition = true; } Scalar bb = CINV.row(i).dot(z); if (bb > 0.0) // FIXME: we should allow that: z += -bb * C.row(i); for (typename CMatrix::InnerIterator it(C,i); it; ++it) z.coeffRef(it.index()) -= bb*it.value(); } else satured[i] = false; } // descent direction rho_1 = rho; rho = r.dot(z); if (iter.finished(rho)) break; if (transition || iter.first()) gamma = 0.0; else gamma = (std::max)(0.0, (rho - old_z.dot(z)) / rho_1); p = z + gamma*p; ++iter; // one dimensionnal optimization q = A * p; lambda = rho / q.dot(p); for (Index i = 0; i < C.rows(); ++i) { if (!satured[i]) { Scalar bb = C.row(i).dot(p) - f[i]; if (bb > 0.0) lambda = (std::min)(lambda, (f.coeff(i)-C.row(i).dot(x)) / bb); } } x += lambda * p; memox -= x; } } } // end namespace internal } // end namespace Eigen #endif // EIGEN_CONSTRAINEDCG_H RcppEigen/inst/include/unsupported/Eigen/src/IterativeSolvers/IncompleteLU.h0000644000176200001440000000473014107270226026764 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2011 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_INCOMPLETE_LU_H #define EIGEN_INCOMPLETE_LU_H namespace Eigen { template class IncompleteLU : public SparseSolverBase > { protected: typedef SparseSolverBase > Base; using Base::m_isInitialized; typedef _Scalar Scalar; typedef Matrix Vector; typedef typename Vector::Index Index; typedef SparseMatrix FactorType; public: typedef Matrix MatrixType; IncompleteLU() {} template IncompleteLU(const MatrixType& mat) { compute(mat); } Index rows() const { return m_lu.rows(); } Index cols() const { return m_lu.cols(); } template IncompleteLU& compute(const MatrixType& mat) { m_lu = mat; int size = mat.cols(); Vector diag(size); for(int i=0; i void _solve_impl(const Rhs& b, Dest& x) const { x = m_lu.template triangularView().solve(b); x = m_lu.template triangularView().solve(x); } protected: FactorType m_lu; }; } // end namespace Eigen #endif // EIGEN_INCOMPLETE_LU_H RcppEigen/inst/include/unsupported/Eigen/src/IterativeSolvers/GMRES.h0000644000176200001440000002374114562417221025307 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2011 Gael Guennebaud // Copyright (C) 2012, 2014 Kolja Brix // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_GMRES_H #define EIGEN_GMRES_H namespace Eigen { namespace internal { /** * Generalized Minimal Residual Algorithm based on the * Arnoldi algorithm implemented with Householder reflections. * * Parameters: * \param mat matrix of linear system of equations * \param rhs right hand side vector of linear system of equations * \param x on input: initial guess, on output: solution * \param precond preconditioner used * \param iters on input: maximum number of iterations to perform * on output: number of iterations performed * \param restart number of iterations for a restart * \param tol_error on input: relative residual tolerance * on output: residuum achieved * * \sa IterativeMethods::bicgstab() * * * For references, please see: * * Saad, Y. and Schultz, M. H. * GMRES: A Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems. * SIAM J.Sci.Stat.Comp. 7, 1986, pp. 856 - 869. * * Saad, Y. * Iterative Methods for Sparse Linear Systems. * Society for Industrial and Applied Mathematics, Philadelphia, 2003. * * Walker, H. F. * Implementations of the GMRES method. * Comput.Phys.Comm. 53, 1989, pp. 311 - 320. * * Walker, H. F. * Implementation of the GMRES Method using Householder Transformations. * SIAM J.Sci.Stat.Comp. 9, 1988, pp. 152 - 163. * */ template bool gmres(const MatrixType & mat, const Rhs & rhs, Dest & x, const Preconditioner & precond, Index &iters, const Index &restart, typename Dest::RealScalar & tol_error) { using std::sqrt; using std::abs; typedef typename Dest::RealScalar RealScalar; typedef typename Dest::Scalar Scalar; typedef Matrix < Scalar, Dynamic, 1 > VectorType; typedef Matrix < Scalar, Dynamic, Dynamic, ColMajor> FMatrixType; const RealScalar considerAsZero = (std::numeric_limits::min)(); if(rhs.norm() <= considerAsZero) { x.setZero(); tol_error = 0; return true; } RealScalar tol = tol_error; const Index maxIters = iters; iters = 0; const Index m = mat.rows(); // residual and preconditioned residual VectorType p0 = rhs - mat*x; VectorType r0 = precond.solve(p0); const RealScalar r0Norm = r0.norm(); // is initial guess already good enough? if(r0Norm == 0) { tol_error = 0; return true; } // storage for Hessenberg matrix and Householder data FMatrixType H = FMatrixType::Zero(m, restart + 1); VectorType w = VectorType::Zero(restart + 1); VectorType tau = VectorType::Zero(restart + 1); // storage for Jacobi rotations std::vector < JacobiRotation < Scalar > > G(restart); // storage for temporaries VectorType t(m), v(m), workspace(m), x_new(m); // generate first Householder vector Ref H0_tail = H.col(0).tail(m - 1); RealScalar beta; r0.makeHouseholder(H0_tail, tau.coeffRef(0), beta); w(0) = Scalar(beta); for (Index k = 1; k <= restart; ++k) { ++iters; v = VectorType::Unit(m, k - 1); // apply Householder reflections H_{1} ... H_{k-1} to v // TODO: use a HouseholderSequence for (Index i = k - 1; i >= 0; --i) { v.tail(m - i).applyHouseholderOnTheLeft(H.col(i).tail(m - i - 1), tau.coeffRef(i), workspace.data()); } // apply matrix M to v: v = mat * v; t.noalias() = mat * v; v = precond.solve(t); // apply Householder reflections H_{k-1} ... H_{1} to v // TODO: use a HouseholderSequence for (Index i = 0; i < k; ++i) { v.tail(m - i).applyHouseholderOnTheLeft(H.col(i).tail(m - i - 1), tau.coeffRef(i), workspace.data()); } if (v.tail(m - k).norm() != 0.0) { if (k <= restart) { // generate new Householder vector Ref Hk_tail = H.col(k).tail(m - k - 1); v.tail(m - k).makeHouseholder(Hk_tail, tau.coeffRef(k), beta); // apply Householder reflection H_{k} to v v.tail(m - k).applyHouseholderOnTheLeft(Hk_tail, tau.coeffRef(k), workspace.data()); } } if (k > 1) { for (Index i = 0; i < k - 1; ++i) { // apply old Givens rotations to v v.applyOnTheLeft(i, i + 1, G[i].adjoint()); } } if (k y = w.head(k); H.topLeftCorner(k, k).template triangularView ().solveInPlace(y); // use Horner-like scheme to calculate solution vector x_new.setZero(); for (Index i = k - 1; i >= 0; --i) { x_new(i) += y(i); // apply Householder reflection H_{i} to x_new x_new.tail(m - i).applyHouseholderOnTheLeft(H.col(i).tail(m - i - 1), tau.coeffRef(i), workspace.data()); } x += x_new; if(stop) { return true; } else { k=0; // reset data for restart p0.noalias() = rhs - mat*x; r0 = precond.solve(p0); // clear Hessenberg matrix and Householder data H.setZero(); w.setZero(); tau.setZero(); // generate first Householder vector r0.makeHouseholder(H0_tail, tau.coeffRef(0), beta); w(0) = Scalar(beta); } } } return false; } } template< typename _MatrixType, typename _Preconditioner = DiagonalPreconditioner > class GMRES; namespace internal { template< typename _MatrixType, typename _Preconditioner> struct traits > { typedef _MatrixType MatrixType; typedef _Preconditioner Preconditioner; }; } /** \ingroup IterativeLinearSolvers_Module * \brief A GMRES solver for sparse square problems * * This class allows to solve for A.x = b sparse linear problems using a generalized minimal * residual method. The vectors x and b can be either dense or sparse. * * \tparam _MatrixType the type of the sparse matrix A, can be a dense or a sparse matrix. * \tparam _Preconditioner the type of the preconditioner. Default is DiagonalPreconditioner * * The maximal number of iterations and tolerance value can be controlled via the setMaxIterations() * and setTolerance() methods. The defaults are the size of the problem for the maximal number of iterations * and NumTraits::epsilon() for the tolerance. * * This class can be used as the direct solver classes. Here is a typical usage example: * \code * int n = 10000; * VectorXd x(n), b(n); * SparseMatrix A(n,n); * // fill A and b * GMRES > solver(A); * x = solver.solve(b); * std::cout << "#iterations: " << solver.iterations() << std::endl; * std::cout << "estimated error: " << solver.error() << std::endl; * // update b, and solve again * x = solver.solve(b); * \endcode * * By default the iterations start with x=0 as an initial guess of the solution. * One can control the start using the solveWithGuess() method. * * GMRES can also be used in a matrix-free context, see the following \link MatrixfreeSolverExample example \endlink. * * \sa class SimplicialCholesky, DiagonalPreconditioner, IdentityPreconditioner */ template< typename _MatrixType, typename _Preconditioner> class GMRES : public IterativeSolverBase > { typedef IterativeSolverBase Base; using Base::matrix; using Base::m_error; using Base::m_iterations; using Base::m_info; using Base::m_isInitialized; private: Index m_restart; public: using Base::_solve_impl; typedef _MatrixType MatrixType; typedef typename MatrixType::Scalar Scalar; typedef typename MatrixType::RealScalar RealScalar; typedef _Preconditioner Preconditioner; public: /** Default constructor. */ GMRES() : Base(), m_restart(30) {} /** Initialize the solver with matrix \a A for further \c Ax=b solving. * * This constructor is a shortcut for the default constructor followed * by a call to compute(). * * \warning this class stores a reference to the matrix A as well as some * precomputed values that depend on it. Therefore, if \a A is changed * this class becomes invalid. Call compute() to update it with the new * matrix A, or modify a copy of A. */ template explicit GMRES(const EigenBase& A) : Base(A.derived()), m_restart(30) {} ~GMRES() {} /** Get the number of iterations after that a restart is performed. */ Index get_restart() { return m_restart; } /** Set the number of iterations after that a restart is performed. * \param restart number of iterations for a restarti, default is 30. */ void set_restart(const Index restart) { m_restart=restart; } /** \internal */ template void _solve_vector_with_guess_impl(const Rhs& b, Dest& x) const { m_iterations = Base::maxIterations(); m_error = Base::m_tolerance; bool ret = internal::gmres(matrix(), b, x, Base::m_preconditioner, m_iterations, m_restart, m_error); m_info = (!ret) ? NumericalIssue : m_error <= Base::m_tolerance ? Success : NoConvergence; } protected: }; } // end namespace Eigen #endif // EIGEN_GMRES_H RcppEigen/inst/include/unsupported/Eigen/src/IterativeSolvers/Scaling.h0000644000176200001440000001333514562417221026010 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2012 Desire NUENTSA WAKAM A; * // fill A and b; * IterScaling > scal; * // Compute the left and right scaling vectors. The matrix is equilibrated at output * scal.computeRef(A); * // Scale the right hand side * b = scal.LeftScaling().cwiseProduct(b); * // Now, solve the equilibrated linear system with any available solver * * // Scale back the computed solution * x = scal.RightScaling().cwiseProduct(x); * \endcode * * \tparam _MatrixType the type of the matrix. It should be a real square sparsematrix * * References : D. Ruiz and B. Ucar, A Symmetry Preserving Algorithm for Matrix Scaling, INRIA Research report RR-7552 * * \sa \ref IncompleteLUT */ template class IterScaling { public: typedef _MatrixType MatrixType; typedef typename MatrixType::Scalar Scalar; typedef typename MatrixType::Index Index; public: IterScaling() { init(); } IterScaling(const MatrixType& matrix) { init(); compute(matrix); } ~IterScaling() { } /** * Compute the left and right diagonal matrices to scale the input matrix @p mat * * FIXME This algorithm will be modified such that the diagonal elements are permuted on the diagonal. * * \sa LeftScaling() RightScaling() */ void compute (const MatrixType& mat) { using std::abs; int m = mat.rows(); int n = mat.cols(); eigen_assert((m>0 && m == n) && "Please give a non - empty matrix"); m_left.resize(m); m_right.resize(n); m_left.setOnes(); m_right.setOnes(); m_matrix = mat; VectorXd Dr, Dc, DrRes, DcRes; // Temporary Left and right scaling vectors Dr.resize(m); Dc.resize(n); DrRes.resize(m); DcRes.resize(n); double EpsRow = 1.0, EpsCol = 1.0; int its = 0; do { // Iterate until the infinite norm of each row and column is approximately 1 // Get the maximum value in each row and column Dr.setZero(); Dc.setZero(); for (int k=0; km_tol || EpsCol > m_tol) && (its < m_maxits) ); m_isInitialized = true; } /** Compute the left and right vectors to scale the vectors * the input matrix is scaled with the computed vectors at output * * \sa compute() */ void computeRef (MatrixType& mat) { compute (mat); mat = m_matrix; } /** Get the vector to scale the rows of the matrix */ VectorXd& LeftScaling() { return m_left; } /** Get the vector to scale the columns of the matrix */ VectorXd& RightScaling() { return m_right; } /** Set the tolerance for the convergence of the iterative scaling algorithm */ void setTolerance(double tol) { m_tol = tol; } protected: void init() { m_tol = 1e-10; m_maxits = 5; m_isInitialized = false; } MatrixType m_matrix; mutable ComputationInfo m_info; bool m_isInitialized; VectorXd m_left; // Left scaling vector VectorXd m_right; // m_right scaling vector double m_tol; int m_maxits; // Maximum number of iterations allowed }; } #endif RcppEigen/inst/include/unsupported/Eigen/src/IterativeSolvers/DGMRES.h0000644000176200001440000004255114562417221025413 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2012 Désiré Nuentsa-Wakam // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_DGMRES_H #define EIGEN_DGMRES_H #include "../../../../Eigen/Eigenvalues" namespace Eigen { template< typename _MatrixType, typename _Preconditioner = DiagonalPreconditioner > class DGMRES; namespace internal { template< typename _MatrixType, typename _Preconditioner> struct traits > { typedef _MatrixType MatrixType; typedef _Preconditioner Preconditioner; }; /** \brief Computes a permutation vector to have a sorted sequence * \param vec The vector to reorder. * \param perm gives the sorted sequence on output. Must be initialized with 0..n-1 * \param ncut Put the ncut smallest elements at the end of the vector * WARNING This is an expensive sort, so should be used only * for small size vectors * TODO Use modified QuickSplit or std::nth_element to get the smallest values */ template void sortWithPermutation (VectorType& vec, IndexType& perm, typename IndexType::Scalar& ncut) { eigen_assert(vec.size() == perm.size()); bool flag; for (Index k = 0; k < ncut; k++) { flag = false; for (Index j = 0; j < vec.size()-1; j++) { if ( vec(perm(j)) < vec(perm(j+1)) ) { std::swap(perm(j),perm(j+1)); flag = true; } if (!flag) break; // The vector is in sorted order } } } } /** * \ingroup IterativeLinearSolvers_Module * \brief A Restarted GMRES with deflation. * This class implements a modification of the GMRES solver for * sparse linear systems. The basis is built with modified * Gram-Schmidt. At each restart, a few approximated eigenvectors * corresponding to the smallest eigenvalues are used to build a * preconditioner for the next cycle. This preconditioner * for deflation can be combined with any other preconditioner, * the IncompleteLUT for instance. The preconditioner is applied * at right of the matrix and the combination is multiplicative. * * \tparam _MatrixType the type of the sparse matrix A, can be a dense or a sparse matrix. * \tparam _Preconditioner the type of the preconditioner. Default is DiagonalPreconditioner * Typical usage : * \code * SparseMatrix A; * VectorXd x, b; * //Fill A and b ... * DGMRES > solver; * solver.set_restart(30); // Set restarting value * solver.setEigenv(1); // Set the number of eigenvalues to deflate * solver.compute(A); * x = solver.solve(b); * \endcode * * DGMRES can also be used in a matrix-free context, see the following \link MatrixfreeSolverExample example \endlink. * * References : * [1] D. NUENTSA WAKAM and F. PACULL, Memory Efficient Hybrid * Algebraic Solvers for Linear Systems Arising from Compressible * Flows, Computers and Fluids, In Press, * https://doi.org/10.1016/j.compfluid.2012.03.023 * [2] K. Burrage and J. Erhel, On the performance of various * adaptive preconditioned GMRES strategies, 5(1998), 101-121. * [3] J. Erhel, K. Burrage and B. Pohl, Restarted GMRES * preconditioned by deflation,J. Computational and Applied * Mathematics, 69(1996), 303-318. * */ template< typename _MatrixType, typename _Preconditioner> class DGMRES : public IterativeSolverBase > { typedef IterativeSolverBase Base; using Base::matrix; using Base::m_error; using Base::m_iterations; using Base::m_info; using Base::m_isInitialized; using Base::m_tolerance; public: using Base::_solve_impl; using Base::_solve_with_guess_impl; typedef _MatrixType MatrixType; typedef typename MatrixType::Scalar Scalar; typedef typename MatrixType::StorageIndex StorageIndex; typedef typename MatrixType::RealScalar RealScalar; typedef _Preconditioner Preconditioner; typedef Matrix DenseMatrix; typedef Matrix DenseRealMatrix; typedef Matrix DenseVector; typedef Matrix DenseRealVector; typedef Matrix, Dynamic, 1> ComplexVector; /** Default constructor. */ DGMRES() : Base(),m_restart(30),m_neig(0),m_r(0),m_maxNeig(5),m_isDeflAllocated(false),m_isDeflInitialized(false) {} /** Initialize the solver with matrix \a A for further \c Ax=b solving. * * This constructor is a shortcut for the default constructor followed * by a call to compute(). * * \warning this class stores a reference to the matrix A as well as some * precomputed values that depend on it. Therefore, if \a A is changed * this class becomes invalid. Call compute() to update it with the new * matrix A, or modify a copy of A. */ template explicit DGMRES(const EigenBase& A) : Base(A.derived()), m_restart(30),m_neig(0),m_r(0),m_maxNeig(5),m_isDeflAllocated(false),m_isDeflInitialized(false) {} ~DGMRES() {} /** \internal */ template void _solve_vector_with_guess_impl(const Rhs& b, Dest& x) const { EIGEN_STATIC_ASSERT(Rhs::ColsAtCompileTime==1 || Dest::ColsAtCompileTime==1, YOU_TRIED_CALLING_A_VECTOR_METHOD_ON_A_MATRIX); m_iterations = Base::maxIterations(); m_error = Base::m_tolerance; dgmres(matrix(), b, x, Base::m_preconditioner); } /** * Get the restart value */ Index restart() { return m_restart; } /** * Set the restart value (default is 30) */ void set_restart(const Index restart) { m_restart=restart; } /** * Set the number of eigenvalues to deflate at each restart */ void setEigenv(const Index neig) { m_neig = neig; if (neig+1 > m_maxNeig) m_maxNeig = neig+1; // To allow for complex conjugates } /** * Get the size of the deflation subspace size */ Index deflSize() {return m_r; } /** * Set the maximum size of the deflation subspace */ void setMaxEigenv(const Index maxNeig) { m_maxNeig = maxNeig; } protected: // DGMRES algorithm template void dgmres(const MatrixType& mat,const Rhs& rhs, Dest& x, const Preconditioner& precond) const; // Perform one cycle of GMRES template Index dgmresCycle(const MatrixType& mat, const Preconditioner& precond, Dest& x, DenseVector& r0, RealScalar& beta, const RealScalar& normRhs, Index& nbIts) const; // Compute data to use for deflation Index dgmresComputeDeflationData(const MatrixType& mat, const Preconditioner& precond, const Index& it, StorageIndex& neig) const; // Apply deflation to a vector template Index dgmresApplyDeflation(const RhsType& In, DestType& Out) const; ComplexVector schurValues(const ComplexSchur& schurofH) const; ComplexVector schurValues(const RealSchur& schurofH) const; // Init data for deflation void dgmresInitDeflation(Index& rows) const; mutable DenseMatrix m_V; // Krylov basis vectors mutable DenseMatrix m_H; // Hessenberg matrix mutable DenseMatrix m_Hes; // Initial hessenberg matrix without Givens rotations applied mutable Index m_restart; // Maximum size of the Krylov subspace mutable DenseMatrix m_U; // Vectors that form the basis of the invariant subspace mutable DenseMatrix m_MU; // matrix operator applied to m_U (for next cycles) mutable DenseMatrix m_T; /* T=U^T*M^{-1}*A*U */ mutable PartialPivLU m_luT; // LU factorization of m_T mutable StorageIndex m_neig; //Number of eigenvalues to extract at each restart mutable Index m_r; // Current number of deflated eigenvalues, size of m_U mutable Index m_maxNeig; // Maximum number of eigenvalues to deflate mutable RealScalar m_lambdaN; //Modulus of the largest eigenvalue of A mutable bool m_isDeflAllocated; mutable bool m_isDeflInitialized; //Adaptive strategy mutable RealScalar m_smv; // Smaller multiple of the remaining number of steps allowed mutable bool m_force; // Force the use of deflation at each restart }; /** * \brief Perform several cycles of restarted GMRES with modified Gram Schmidt, * * A right preconditioner is used combined with deflation. * */ template< typename _MatrixType, typename _Preconditioner> template void DGMRES<_MatrixType, _Preconditioner>::dgmres(const MatrixType& mat,const Rhs& rhs, Dest& x, const Preconditioner& precond) const { const RealScalar considerAsZero = (std::numeric_limits::min)(); RealScalar normRhs = rhs.norm(); if(normRhs <= considerAsZero) { x.setZero(); m_error = 0; return; } //Initialization m_isDeflInitialized = false; Index n = mat.rows(); DenseVector r0(n); Index nbIts = 0; m_H.resize(m_restart+1, m_restart); m_Hes.resize(m_restart, m_restart); m_V.resize(n,m_restart+1); //Initial residual vector and initial norm if(x.squaredNorm()==0) x = precond.solve(rhs); r0 = rhs - mat * x; RealScalar beta = r0.norm(); m_error = beta/normRhs; if(m_error < m_tolerance) m_info = Success; else m_info = NoConvergence; // Iterative process while (nbIts < m_iterations && m_info == NoConvergence) { dgmresCycle(mat, precond, x, r0, beta, normRhs, nbIts); // Compute the new residual vector for the restart if (nbIts < m_iterations && m_info == NoConvergence) { r0 = rhs - mat * x; beta = r0.norm(); } } } /** * \brief Perform one restart cycle of DGMRES * \param mat The coefficient matrix * \param precond The preconditioner * \param x the new approximated solution * \param r0 The initial residual vector * \param beta The norm of the residual computed so far * \param normRhs The norm of the right hand side vector * \param nbIts The number of iterations */ template< typename _MatrixType, typename _Preconditioner> template Index DGMRES<_MatrixType, _Preconditioner>::dgmresCycle(const MatrixType& mat, const Preconditioner& precond, Dest& x, DenseVector& r0, RealScalar& beta, const RealScalar& normRhs, Index& nbIts) const { //Initialization DenseVector g(m_restart+1); // Right hand side of the least square problem g.setZero(); g(0) = Scalar(beta); m_V.col(0) = r0/beta; m_info = NoConvergence; std::vector >gr(m_restart); // Givens rotations Index it = 0; // Number of inner iterations Index n = mat.rows(); DenseVector tv1(n), tv2(n); //Temporary vectors while (m_info == NoConvergence && it < m_restart && nbIts < m_iterations) { // Apply preconditioner(s) at right if (m_isDeflInitialized ) { dgmresApplyDeflation(m_V.col(it), tv1); // Deflation tv2 = precond.solve(tv1); } else { tv2 = precond.solve(m_V.col(it)); // User's selected preconditioner } tv1 = mat * tv2; // Orthogonalize it with the previous basis in the basis using modified Gram-Schmidt Scalar coef; for (Index i = 0; i <= it; ++i) { coef = tv1.dot(m_V.col(i)); tv1 = tv1 - coef * m_V.col(i); m_H(i,it) = coef; m_Hes(i,it) = coef; } // Normalize the vector coef = tv1.norm(); m_V.col(it+1) = tv1/coef; m_H(it+1, it) = coef; // m_Hes(it+1,it) = coef; // FIXME Check for happy breakdown // Update Hessenberg matrix with Givens rotations for (Index i = 1; i <= it; ++i) { m_H.col(it).applyOnTheLeft(i-1,i,gr[i-1].adjoint()); } // Compute the new plane rotation gr[it].makeGivens(m_H(it, it), m_H(it+1,it)); // Apply the new rotation m_H.col(it).applyOnTheLeft(it,it+1,gr[it].adjoint()); g.applyOnTheLeft(it,it+1, gr[it].adjoint()); beta = std::abs(g(it+1)); m_error = beta/normRhs; // std::cerr << nbIts << " Relative Residual Norm " << m_error << std::endl; it++; nbIts++; if (m_error < m_tolerance) { // The method has converged m_info = Success; break; } } // Compute the new coefficients by solving the least square problem // it++; //FIXME Check first if the matrix is singular ... zero diagonal DenseVector nrs(m_restart); nrs = m_H.topLeftCorner(it,it).template triangularView().solve(g.head(it)); // Form the new solution if (m_isDeflInitialized) { tv1 = m_V.leftCols(it) * nrs; dgmresApplyDeflation(tv1, tv2); x = x + precond.solve(tv2); } else x = x + precond.solve(m_V.leftCols(it) * nrs); // Go for a new cycle and compute data for deflation if(nbIts < m_iterations && m_info == NoConvergence && m_neig > 0 && (m_r+m_neig) < m_maxNeig) dgmresComputeDeflationData(mat, precond, it, m_neig); return 0; } template< typename _MatrixType, typename _Preconditioner> void DGMRES<_MatrixType, _Preconditioner>::dgmresInitDeflation(Index& rows) const { m_U.resize(rows, m_maxNeig); m_MU.resize(rows, m_maxNeig); m_T.resize(m_maxNeig, m_maxNeig); m_lambdaN = 0.0; m_isDeflAllocated = true; } template< typename _MatrixType, typename _Preconditioner> inline typename DGMRES<_MatrixType, _Preconditioner>::ComplexVector DGMRES<_MatrixType, _Preconditioner>::schurValues(const ComplexSchur& schurofH) const { return schurofH.matrixT().diagonal(); } template< typename _MatrixType, typename _Preconditioner> inline typename DGMRES<_MatrixType, _Preconditioner>::ComplexVector DGMRES<_MatrixType, _Preconditioner>::schurValues(const RealSchur& schurofH) const { const DenseMatrix& T = schurofH.matrixT(); Index it = T.rows(); ComplexVector eig(it); Index j = 0; while (j < it-1) { if (T(j+1,j) ==Scalar(0)) { eig(j) = std::complex(T(j,j),RealScalar(0)); j++; } else { eig(j) = std::complex(T(j,j),T(j+1,j)); eig(j+1) = std::complex(T(j,j+1),T(j+1,j+1)); j++; } } if (j < it-1) eig(j) = std::complex(T(j,j),RealScalar(0)); return eig; } template< typename _MatrixType, typename _Preconditioner> Index DGMRES<_MatrixType, _Preconditioner>::dgmresComputeDeflationData(const MatrixType& mat, const Preconditioner& precond, const Index& it, StorageIndex& neig) const { // First, find the Schur form of the Hessenberg matrix H typename internal::conditional::IsComplex, ComplexSchur, RealSchur >::type schurofH; bool computeU = true; DenseMatrix matrixQ(it,it); matrixQ.setIdentity(); schurofH.computeFromHessenberg(m_Hes.topLeftCorner(it,it), matrixQ, computeU); ComplexVector eig(it); Matrixperm(it); eig = this->schurValues(schurofH); // Reorder the absolute values of Schur values DenseRealVector modulEig(it); for (Index j=0; j(it-1)); internal::sortWithPermutation(modulEig, perm, neig); if (!m_lambdaN) { m_lambdaN = (std::max)(modulEig.maxCoeff(), m_lambdaN); } //Count the real number of extracted eigenvalues (with complex conjugates) Index nbrEig = 0; while (nbrEig < neig) { if(eig(perm(it-nbrEig-1)).imag() == RealScalar(0)) nbrEig++; else nbrEig += 2; } // Extract the Schur vectors corresponding to the smallest Ritz values DenseMatrix Sr(it, nbrEig); Sr.setZero(); for (Index j = 0; j < nbrEig; j++) { Sr.col(j) = schurofH.matrixU().col(perm(it-j-1)); } // Form the Schur vectors of the initial matrix using the Krylov basis DenseMatrix X; X = m_V.leftCols(it) * Sr; if (m_r) { // Orthogonalize X against m_U using modified Gram-Schmidt for (Index j = 0; j < nbrEig; j++) for (Index k =0; k < m_r; k++) X.col(j) = X.col(j) - (m_U.col(k).dot(X.col(j)))*m_U.col(k); } // Compute m_MX = A * M^-1 * X Index m = m_V.rows(); if (!m_isDeflAllocated) dgmresInitDeflation(m); DenseMatrix MX(m, nbrEig); DenseVector tv1(m); for (Index j = 0; j < nbrEig; j++) { tv1 = mat * X.col(j); MX.col(j) = precond.solve(tv1); } //Update m_T = [U'MU U'MX; X'MU X'MX] m_T.block(m_r, m_r, nbrEig, nbrEig) = X.transpose() * MX; if(m_r) { m_T.block(0, m_r, m_r, nbrEig) = m_U.leftCols(m_r).transpose() * MX; m_T.block(m_r, 0, nbrEig, m_r) = X.transpose() * m_MU.leftCols(m_r); } // Save X into m_U and m_MX in m_MU for (Index j = 0; j < nbrEig; j++) m_U.col(m_r+j) = X.col(j); for (Index j = 0; j < nbrEig; j++) m_MU.col(m_r+j) = MX.col(j); // Increase the size of the invariant subspace m_r += nbrEig; // Factorize m_T into m_luT m_luT.compute(m_T.topLeftCorner(m_r, m_r)); //FIXME CHeck if the factorization was correctly done (nonsingular matrix) m_isDeflInitialized = true; return 0; } template template Index DGMRES<_MatrixType, _Preconditioner>::dgmresApplyDeflation(const RhsType &x, DestType &y) const { DenseVector x1 = m_U.leftCols(m_r).transpose() * x; y = x + m_U.leftCols(m_r) * ( m_lambdaN * m_luT.solve(x1) - x1); return 0; } } // end namespace Eigen #endif RcppEigen/inst/include/unsupported/Eigen/src/IterativeSolvers/IDRS.h0000755000176200001440000003471214562417221025176 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2020 Chris Schoutrop // Copyright (C) 2020 Jens Wehner // Copyright (C) 2020 Jan van Dijk // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_IDRS_H #define EIGEN_IDRS_H namespace Eigen { namespace internal { /** \internal Low-level Induced Dimension Reduction algoritm \param A The matrix A \param b The right hand side vector b \param x On input and initial solution, on output the computed solution. \param precond A preconditioner being able to efficiently solve for an approximation of Ax=b (regardless of b) \param iter On input the max number of iteration, on output the number of performed iterations. \param relres On input the tolerance error, on output an estimation of the relative error. \param S On input Number of the dimension of the shadow space. \param smoothing switches residual smoothing on. \param angle small omega lead to faster convergence at the expense of numerical stability \param replacement switches on a residual replacement strategy to increase accuracy of residual at the expense of more Mat*vec products \return false in the case of numerical issue, for example a break down of IDRS. */ template typename Vector::Scalar omega(const Vector& t, const Vector& s, RealScalar angle) { using numext::abs; typedef typename Vector::Scalar Scalar; const RealScalar ns = s.norm(); const RealScalar nt = t.norm(); const Scalar ts = t.dot(s); const RealScalar rho = abs(ts / (nt * ns)); if (rho < angle) { if (ts == Scalar(0)) { return Scalar(0); } // Original relation for om is given by // om = om * angle / rho; // To alleviate potential (near) division by zero this can be rewritten as // om = angle * (ns / nt) * (ts / abs(ts)) = angle * (ns / nt) * sgn(ts) return angle * (ns / nt) * (ts / abs(ts)); } return ts / (nt * nt); } template bool idrs(const MatrixType& A, const Rhs& b, Dest& x, const Preconditioner& precond, Index& iter, typename Dest::RealScalar& relres, Index S, bool smoothing, typename Dest::RealScalar angle, bool replacement) { typedef typename Dest::RealScalar RealScalar; typedef typename Dest::Scalar Scalar; typedef Matrix VectorType; typedef Matrix DenseMatrixType; const Index N = b.size(); S = S < x.rows() ? S : x.rows(); const RealScalar tol = relres; const Index maxit = iter; Index replacements = 0; bool trueres = false; FullPivLU lu_solver; DenseMatrixType P; { HouseholderQR qr(DenseMatrixType::Random(N, S)); P = (qr.householderQ() * DenseMatrixType::Identity(N, S)); } const RealScalar normb = b.norm(); if (internal::isApprox(normb, RealScalar(0))) { //Solution is the zero vector x.setZero(); iter = 0; relres = 0; return true; } // from http://homepage.tudelft.nl/1w5b5/IDRS/manual.pdf // A peak in the residual is considered dangerously high if‖ri‖/‖b‖> C(tol/epsilon). // With epsilon the // relative machine precision. The factor tol/epsilon corresponds to the size of a // finite precision number that is so large that the absolute round-off error in // this number, when propagated through the process, makes it impossible to // achieve the required accuracy.The factor C accounts for the accumulation of // round-off errors. This parameter has beenset to 10−3. // mp is epsilon/C // 10^3 * eps is very conservative, so normally no residual replacements will take place. // It only happens if things go very wrong. Too many restarts may ruin the convergence. const RealScalar mp = RealScalar(1e3) * NumTraits::epsilon(); //Compute initial residual const RealScalar tolb = tol * normb; //Relative tolerance VectorType r = b - A * x; VectorType x_s, r_s; if (smoothing) { x_s = x; r_s = r; } RealScalar normr = r.norm(); if (normr <= tolb) { //Initial guess is a good enough solution iter = 0; relres = normr / normb; return true; } DenseMatrixType G = DenseMatrixType::Zero(N, S); DenseMatrixType U = DenseMatrixType::Zero(N, S); DenseMatrixType M = DenseMatrixType::Identity(S, S); VectorType t(N), v(N); Scalar om = 1.; //Main iteration loop, guild G-spaces: iter = 0; while (normr > tolb && iter < maxit) { //New right hand size for small system: VectorType f = (r.adjoint() * P).adjoint(); for (Index k = 0; k < S; ++k) { //Solve small system and make v orthogonal to P: //c = M(k:s,k:s)\f(k:s); lu_solver.compute(M.block(k , k , S -k, S - k )); VectorType c = lu_solver.solve(f.segment(k , S - k )); //v = r - G(:,k:s)*c; v = r - G.rightCols(S - k ) * c; //Preconditioning v = precond.solve(v); //Compute new U(:,k) and G(:,k), G(:,k) is in space G_j U.col(k) = U.rightCols(S - k ) * c + om * v; G.col(k) = A * U.col(k ); //Bi-Orthogonalise the new basis vectors: for (Index i = 0; i < k-1 ; ++i) { //alpha = ( P(:,i)'*G(:,k) )/M(i,i); Scalar alpha = P.col(i ).dot(G.col(k )) / M(i, i ); G.col(k ) = G.col(k ) - alpha * G.col(i ); U.col(k ) = U.col(k ) - alpha * U.col(i ); } //New column of M = P'*G (first k-1 entries are zero) //M(k:s,k) = (G(:,k)'*P(:,k:s))'; M.block(k , k , S - k , 1) = (G.col(k ).adjoint() * P.rightCols(S - k )).adjoint(); if (internal::isApprox(M(k,k), Scalar(0))) { return false; } //Make r orthogonal to q_i, i = 0..k-1 Scalar beta = f(k ) / M(k , k ); r = r - beta * G.col(k ); x = x + beta * U.col(k ); normr = r.norm(); if (replacement && normr > tolb / mp) { trueres = true; } //Smoothing: if (smoothing) { t = r_s - r; //gamma is a Scalar, but the conversion is not allowed Scalar gamma = t.dot(r_s) / t.norm(); r_s = r_s - gamma * t; x_s = x_s - gamma * (x_s - x); normr = r_s.norm(); } if (normr < tolb || iter == maxit) { break; } //New f = P'*r (first k components are zero) if (k < S-1) { f.segment(k + 1, S - (k + 1) ) = f.segment(k + 1 , S - (k + 1)) - beta * M.block(k + 1 , k , S - (k + 1), 1); } }//end for if (normr < tolb || iter == maxit) { break; } //Now we have sufficient vectors in G_j to compute residual in G_j+1 //Note: r is already perpendicular to P so v = r //Preconditioning v = r; v = precond.solve(v); //Matrix-vector multiplication: t = A * v; //Computation of a new omega om = internal::omega(t, r, angle); if (om == RealScalar(0.0)) { return false; } r = r - om * t; x = x + om * v; normr = r.norm(); if (replacement && normr > tolb / mp) { trueres = true; } //Residual replacement? if (trueres && normr < normb) { r = b - A * x; trueres = false; replacements++; } //Smoothing: if (smoothing) { t = r_s - r; Scalar gamma = t.dot(r_s) /t.norm(); r_s = r_s - gamma * t; x_s = x_s - gamma * (x_s - x); normr = r_s.norm(); } iter++; }//end while if (smoothing) { x = x_s; } relres=normr/normb; return true; } } // namespace internal template > class IDRS; namespace internal { template struct traits > { typedef _MatrixType MatrixType; typedef _Preconditioner Preconditioner; }; } // namespace internal /** \ingroup IterativeLinearSolvers_Module * \brief The Induced Dimension Reduction method (IDR(s)) is a short-recurrences Krylov method for sparse square problems. * * This class allows to solve for A.x = b sparse linear problems. The vectors x and b can be either dense or sparse. * he Induced Dimension Reduction method, IDR(), is a robust and efficient short-recurrence Krylov subspace method for * solving large nonsymmetric systems of linear equations. * * For indefinite systems IDR(S) outperforms both BiCGStab and BiCGStab(L). Additionally, IDR(S) can handle matrices * with complex eigenvalues more efficiently than BiCGStab. * * Many problems that do not converge for BiCGSTAB converge for IDR(s) (for larger values of s). And if both methods * converge the convergence for IDR(s) is typically much faster for difficult systems (for example indefinite problems). * * IDR(s) is a limited memory finite termination method. In exact arithmetic it converges in at most N+N/s iterations, * with N the system size. It uses a fixed number of 4+3s vector. In comparison, BiCGSTAB terminates in 2N iterations * and uses 7 vectors. GMRES terminates in at most N iterations, and uses I+3 vectors, with I the number of iterations. * Restarting GMRES limits the memory consumption, but destroys the finite termination property. * * \tparam _MatrixType the type of the sparse matrix A, can be a dense or a sparse matrix. * \tparam _Preconditioner the type of the preconditioner. Default is DiagonalPreconditioner * * \implsparsesolverconcept * * The maximal number of iterations and tolerance value can be controlled via the setMaxIterations() * and setTolerance() methods. The defaults are the size of the problem for the maximal number of iterations * and NumTraits::epsilon() for the tolerance. * * The tolerance corresponds to the relative residual error: |Ax-b|/|b| * * \b Performance: when using sparse matrices, best performance is achied for a row-major sparse matrix format. * Moreover, in this case multi-threading can be exploited if the user code is compiled with OpenMP enabled. * See \ref TopicMultiThreading for details. * * By default the iterations start with x=0 as an initial guess of the solution. * One can control the start using the solveWithGuess() method. * * IDR(s) can also be used in a matrix-free context, see the following \link MatrixfreeSolverExample example \endlink. * * \sa class SimplicialCholesky, DiagonalPreconditioner, IdentityPreconditioner */ template class IDRS : public IterativeSolverBase > { public: typedef _MatrixType MatrixType; typedef typename MatrixType::Scalar Scalar; typedef typename MatrixType::RealScalar RealScalar; typedef _Preconditioner Preconditioner; private: typedef IterativeSolverBase Base; using Base::m_error; using Base::m_info; using Base::m_isInitialized; using Base::m_iterations; using Base::matrix; Index m_S; bool m_smoothing; RealScalar m_angle; bool m_residual; public: /** Default constructor. */ IDRS(): m_S(4), m_smoothing(false), m_angle(RealScalar(0.7)), m_residual(false) {} /** Initialize the solver with matrix \a A for further \c Ax=b solving. This constructor is a shortcut for the default constructor followed by a call to compute(). \warning this class stores a reference to the matrix A as well as some precomputed values that depend on it. Therefore, if \a A is changed this class becomes invalid. Call compute() to update it with the new matrix A, or modify a copy of A. */ template explicit IDRS(const EigenBase& A) : Base(A.derived()), m_S(4), m_smoothing(false), m_angle(RealScalar(0.7)), m_residual(false) {} /** \internal */ /** Loops over the number of columns of b and does the following: 1. sets the tolerence and maxIterations 2. Calls the function that has the core solver routine */ template void _solve_vector_with_guess_impl(const Rhs& b, Dest& x) const { m_iterations = Base::maxIterations(); m_error = Base::m_tolerance; bool ret = internal::idrs(matrix(), b, x, Base::m_preconditioner, m_iterations, m_error, m_S,m_smoothing,m_angle,m_residual); m_info = (!ret) ? NumericalIssue : m_error <= Base::m_tolerance ? Success : NoConvergence; } /** Sets the parameter S, indicating the dimension of the shadow space. Default is 4*/ void setS(Index S) { if (S < 1) { S = 4; } m_S = S; } /** Switches off and on smoothing. Residual smoothing results in monotonically decreasing residual norms at the expense of two extra vectors of storage and a few extra vector operations. Although monotonic decrease of the residual norms is a desirable property, the rate of convergence of the unsmoothed process and the smoothed process is basically the same. Default is off */ void setSmoothing(bool smoothing) { m_smoothing=smoothing; } /** The angle must be a real scalar. In IDR(s), a value for the iteration parameter omega must be chosen in every s+1th step. The most natural choice is to select a value to minimize the norm of the next residual. This corresponds to the parameter omega = 0. In practice, this may lead to values of omega that are so small that the other iteration parameters cannot be computed with sufficient accuracy. In such cases it is better to increase the value of omega sufficiently such that a compromise is reached between accurate computations and reduction of the residual norm. The parameter angle =0.7 (”maintaining the convergence strategy”) results in such a compromise. */ void setAngle(RealScalar angle) { m_angle=angle; } /** The parameter replace is a logical that determines whether a residual replacement strategy is employed to increase the accuracy of the solution. */ void setResidualUpdate(bool update) { m_residual=update; } }; } // namespace Eigen #endif /* EIGEN_IDRS_H */ RcppEigen/inst/include/unsupported/Eigen/src/LevenbergMarquardt/0000755000176200001440000000000014562417221024532 5ustar liggesusersRcppEigen/inst/include/unsupported/Eigen/src/LevenbergMarquardt/LMqrsolv.h0000644000176200001440000001522514562417221026467 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009 Thomas Capricelli // Copyright (C) 2012 Desire Nuentsa // // This code initially comes from MINPACK whose original authors are: // Copyright Jorge More - Argonne National Laboratory // Copyright Burt Garbow - Argonne National Laboratory // Copyright Ken Hillstrom - Argonne National Laboratory // // This Source Code Form is subject to the terms of the Minpack license // (a BSD-like license) described in the campaigned CopyrightMINPACK.txt file. #ifndef EIGEN_LMQRSOLV_H #define EIGEN_LMQRSOLV_H namespace Eigen { namespace internal { template void lmqrsolv( Matrix &s, const PermutationMatrix &iPerm, const Matrix &diag, const Matrix &qtb, Matrix &x, Matrix &sdiag) { /* Local variables */ Index i, j, k; Scalar temp; Index n = s.cols(); Matrix wa(n); JacobiRotation givens; /* Function Body */ // the following will only change the lower triangular part of s, including // the diagonal, though the diagonal is restored afterward /* copy r and (q transpose)*b to preserve input and initialize s. */ /* in particular, save the diagonal elements of r in x. */ x = s.diagonal(); wa = qtb; s.topLeftCorner(n,n).template triangularView() = s.topLeftCorner(n,n).transpose(); /* eliminate the diagonal matrix d using a givens rotation. */ for (j = 0; j < n; ++j) { /* prepare the row of d to be eliminated, locating the */ /* diagonal element using p from the qr factorization. */ const PermIndex l = iPerm.indices()(j); if (diag[l] == 0.) break; sdiag.tail(n-j).setZero(); sdiag[j] = diag[l]; /* the transformations to eliminate the row of d */ /* modify only a single element of (q transpose)*b */ /* beyond the first n, which is initially zero. */ Scalar qtbpj = 0.; for (k = j; k < n; ++k) { /* determine a givens rotation which eliminates the */ /* appropriate element in the current row of d. */ givens.makeGivens(-s(k,k), sdiag[k]); /* compute the modified diagonal element of r and */ /* the modified element of ((q transpose)*b,0). */ s(k,k) = givens.c() * s(k,k) + givens.s() * sdiag[k]; temp = givens.c() * wa[k] + givens.s() * qtbpj; qtbpj = -givens.s() * wa[k] + givens.c() * qtbpj; wa[k] = temp; /* accumulate the transformation in the row of s. */ for (i = k+1; i().solveInPlace(wa.head(nsing)); // restore sdiag = s.diagonal(); s.diagonal() = x; /* permute the components of z back to components of x. */ x = iPerm * wa; } template void lmqrsolv( SparseMatrix &s, const PermutationMatrix &iPerm, const Matrix &diag, const Matrix &qtb, Matrix &x, Matrix &sdiag) { /* Local variables */ typedef SparseMatrix FactorType; Index i, j, k, l; Scalar temp; Index n = s.cols(); Matrix wa(n); JacobiRotation givens; /* Function Body */ // the following will only change the lower triangular part of s, including // the diagonal, though the diagonal is restored afterward /* copy r and (q transpose)*b to preserve input and initialize R. */ wa = qtb; FactorType R(s); // Eliminate the diagonal matrix d using a givens rotation for (j = 0; j < n; ++j) { // Prepare the row of d to be eliminated, locating the // diagonal element using p from the qr factorization l = iPerm.indices()(j); if (diag(l) == Scalar(0)) break; sdiag.tail(n-j).setZero(); sdiag[j] = diag[l]; // the transformations to eliminate the row of d // modify only a single element of (q transpose)*b // beyond the first n, which is initially zero. Scalar qtbpj = 0; // Browse the nonzero elements of row j of the upper triangular s for (k = j; k < n; ++k) { typename FactorType::InnerIterator itk(R,k); for (; itk; ++itk){ if (itk.index() < k) continue; else break; } //At this point, we have the diagonal element R(k,k) // Determine a givens rotation which eliminates // the appropriate element in the current row of d givens.makeGivens(-itk.value(), sdiag(k)); // Compute the modified diagonal element of r and // the modified element of ((q transpose)*b,0). itk.valueRef() = givens.c() * itk.value() + givens.s() * sdiag(k); temp = givens.c() * wa(k) + givens.s() * qtbpj; qtbpj = -givens.s() * wa(k) + givens.c() * qtbpj; wa(k) = temp; // Accumulate the transformation in the remaining k row/column of R for (++itk; itk; ++itk) { i = itk.index(); temp = givens.c() * itk.value() + givens.s() * sdiag(i); sdiag(i) = -givens.s() * itk.value() + givens.c() * sdiag(i); itk.valueRef() = temp; } } } // Solve the triangular system for z. If the system is // singular, then obtain a least squares solution Index nsing; for(nsing = 0; nsing().solve/*InPlace*/(wa.head(nsing)); sdiag = R.diagonal(); // Permute the components of z back to components of x x = iPerm * wa; } } // end namespace internal } // end namespace Eigen #endif // EIGEN_LMQRSOLV_H RcppEigen/inst/include/unsupported/Eigen/src/LevenbergMarquardt/LevenbergMarquardt.h0000644000176200001440000003176114562417221030505 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009 Thomas Capricelli // Copyright (C) 2012 Desire Nuentsa // // The algorithm of this class initially comes from MINPACK whose original authors are: // Copyright Jorge More - Argonne National Laboratory // Copyright Burt Garbow - Argonne National Laboratory // Copyright Ken Hillstrom - Argonne National Laboratory // // This Source Code Form is subject to the terms of the Minpack license // (a BSD-like license) described in the campaigned CopyrightMINPACK.txt file. // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_LEVENBERGMARQUARDT_H #define EIGEN_LEVENBERGMARQUARDT_H namespace Eigen { namespace LevenbergMarquardtSpace { enum Status { NotStarted = -2, Running = -1, ImproperInputParameters = 0, RelativeReductionTooSmall = 1, RelativeErrorTooSmall = 2, RelativeErrorAndReductionTooSmall = 3, CosinusTooSmall = 4, TooManyFunctionEvaluation = 5, FtolTooSmall = 6, XtolTooSmall = 7, GtolTooSmall = 8, UserAsked = 9 }; } template struct DenseFunctor { typedef _Scalar Scalar; enum { InputsAtCompileTime = NX, ValuesAtCompileTime = NY }; typedef Matrix InputType; typedef Matrix ValueType; typedef Matrix JacobianType; typedef ColPivHouseholderQR QRSolver; const int m_inputs, m_values; DenseFunctor() : m_inputs(InputsAtCompileTime), m_values(ValuesAtCompileTime) {} DenseFunctor(int inputs, int values) : m_inputs(inputs), m_values(values) {} int inputs() const { return m_inputs; } int values() const { return m_values; } //int operator()(const InputType &x, ValueType& fvec) { } // should be defined in derived classes //int df(const InputType &x, JacobianType& fjac) { } // should be defined in derived classes }; template struct SparseFunctor { typedef _Scalar Scalar; typedef _Index Index; typedef Matrix InputType; typedef Matrix ValueType; typedef SparseMatrix JacobianType; typedef SparseQR > QRSolver; enum { InputsAtCompileTime = Dynamic, ValuesAtCompileTime = Dynamic }; SparseFunctor(int inputs, int values) : m_inputs(inputs), m_values(values) {} int inputs() const { return m_inputs; } int values() const { return m_values; } const int m_inputs, m_values; //int operator()(const InputType &x, ValueType& fvec) { } // to be defined in the functor //int df(const InputType &x, JacobianType& fjac) { } // to be defined in the functor if no automatic differentiation }; namespace internal { template void lmpar2(const QRSolver &qr, const VectorType &diag, const VectorType &qtb, typename VectorType::Scalar m_delta, typename VectorType::Scalar &par, VectorType &x); } /** * \ingroup NonLinearOptimization_Module * \brief Performs non linear optimization over a non-linear function, * using a variant of the Levenberg Marquardt algorithm. * * Check wikipedia for more information. * http://en.wikipedia.org/wiki/Levenberg%E2%80%93Marquardt_algorithm */ template class LevenbergMarquardt : internal::no_assignment_operator { public: typedef _FunctorType FunctorType; typedef typename FunctorType::QRSolver QRSolver; typedef typename FunctorType::JacobianType JacobianType; typedef typename JacobianType::Scalar Scalar; typedef typename JacobianType::RealScalar RealScalar; typedef typename QRSolver::StorageIndex PermIndex; typedef Matrix FVectorType; typedef PermutationMatrix PermutationType; public: LevenbergMarquardt(FunctorType& functor) : m_functor(functor),m_nfev(0),m_njev(0),m_fnorm(0.0),m_gnorm(0), m_isInitialized(false),m_info(InvalidInput) { resetParameters(); m_useExternalScaling=false; } LevenbergMarquardtSpace::Status minimize(FVectorType &x); LevenbergMarquardtSpace::Status minimizeInit(FVectorType &x); LevenbergMarquardtSpace::Status minimizeOneStep(FVectorType &x); LevenbergMarquardtSpace::Status lmder1( FVectorType &x, const Scalar tol = std::sqrt(NumTraits::epsilon()) ); static LevenbergMarquardtSpace::Status lmdif1( FunctorType &functor, FVectorType &x, Index *nfev, const Scalar tol = std::sqrt(NumTraits::epsilon()) ); /** Sets the default parameters */ void resetParameters() { using std::sqrt; m_factor = 100.; m_maxfev = 400; m_ftol = sqrt(NumTraits::epsilon()); m_xtol = sqrt(NumTraits::epsilon()); m_gtol = 0. ; m_epsfcn = 0. ; } /** Sets the tolerance for the norm of the solution vector*/ void setXtol(RealScalar xtol) { m_xtol = xtol; } /** Sets the tolerance for the norm of the vector function*/ void setFtol(RealScalar ftol) { m_ftol = ftol; } /** Sets the tolerance for the norm of the gradient of the error vector*/ void setGtol(RealScalar gtol) { m_gtol = gtol; } /** Sets the step bound for the diagonal shift */ void setFactor(RealScalar factor) { m_factor = factor; } /** Sets the error precision */ void setEpsilon (RealScalar epsfcn) { m_epsfcn = epsfcn; } /** Sets the maximum number of function evaluation */ void setMaxfev(Index maxfev) {m_maxfev = maxfev; } /** Use an external Scaling. If set to true, pass a nonzero diagonal to diag() */ void setExternalScaling(bool value) {m_useExternalScaling = value; } /** \returns the tolerance for the norm of the solution vector */ RealScalar xtol() const {return m_xtol; } /** \returns the tolerance for the norm of the vector function */ RealScalar ftol() const {return m_ftol; } /** \returns the tolerance for the norm of the gradient of the error vector */ RealScalar gtol() const {return m_gtol; } /** \returns the step bound for the diagonal shift */ RealScalar factor() const {return m_factor; } /** \returns the error precision */ RealScalar epsilon() const {return m_epsfcn; } /** \returns the maximum number of function evaluation */ Index maxfev() const {return m_maxfev; } /** \returns a reference to the diagonal of the jacobian */ FVectorType& diag() {return m_diag; } /** \returns the number of iterations performed */ Index iterations() { return m_iter; } /** \returns the number of functions evaluation */ Index nfev() { return m_nfev; } /** \returns the number of jacobian evaluation */ Index njev() { return m_njev; } /** \returns the norm of current vector function */ RealScalar fnorm() {return m_fnorm; } /** \returns the norm of the gradient of the error */ RealScalar gnorm() {return m_gnorm; } /** \returns the LevenbergMarquardt parameter */ RealScalar lm_param(void) { return m_par; } /** \returns a reference to the current vector function */ FVectorType& fvec() {return m_fvec; } /** \returns a reference to the matrix where the current Jacobian matrix is stored */ JacobianType& jacobian() {return m_fjac; } /** \returns a reference to the triangular matrix R from the QR of the jacobian matrix. * \sa jacobian() */ JacobianType& matrixR() {return m_rfactor; } /** the permutation used in the QR factorization */ PermutationType permutation() {return m_permutation; } /** * \brief Reports whether the minimization was successful * \returns \c Success if the minimization was successful, * \c NumericalIssue if a numerical problem arises during the * minimization process, for example during the QR factorization * \c NoConvergence if the minimization did not converge after * the maximum number of function evaluation allowed * \c InvalidInput if the input matrix is invalid */ ComputationInfo info() const { return m_info; } private: JacobianType m_fjac; JacobianType m_rfactor; // The triangular matrix R from the QR of the jacobian matrix m_fjac FunctorType &m_functor; FVectorType m_fvec, m_qtf, m_diag; Index n; Index m; Index m_nfev; Index m_njev; RealScalar m_fnorm; // Norm of the current vector function RealScalar m_gnorm; //Norm of the gradient of the error RealScalar m_factor; // Index m_maxfev; // Maximum number of function evaluation RealScalar m_ftol; //Tolerance in the norm of the vector function RealScalar m_xtol; // RealScalar m_gtol; //tolerance of the norm of the error gradient RealScalar m_epsfcn; // Index m_iter; // Number of iterations performed RealScalar m_delta; bool m_useExternalScaling; PermutationType m_permutation; FVectorType m_wa1, m_wa2, m_wa3, m_wa4; //Temporary vectors RealScalar m_par; bool m_isInitialized; // Check whether the minimization step has been called ComputationInfo m_info; }; template LevenbergMarquardtSpace::Status LevenbergMarquardt::minimize(FVectorType &x) { LevenbergMarquardtSpace::Status status = minimizeInit(x); if (status==LevenbergMarquardtSpace::ImproperInputParameters) { m_isInitialized = true; return status; } do { // std::cout << " uv " << x.transpose() << "\n"; status = minimizeOneStep(x); } while (status==LevenbergMarquardtSpace::Running); m_isInitialized = true; return status; } template LevenbergMarquardtSpace::Status LevenbergMarquardt::minimizeInit(FVectorType &x) { n = x.size(); m = m_functor.values(); m_wa1.resize(n); m_wa2.resize(n); m_wa3.resize(n); m_wa4.resize(m); m_fvec.resize(m); //FIXME Sparse Case : Allocate space for the jacobian m_fjac.resize(m, n); // m_fjac.reserve(VectorXi::Constant(n,5)); // FIXME Find a better alternative if (!m_useExternalScaling) m_diag.resize(n); eigen_assert( (!m_useExternalScaling || m_diag.size()==n) && "When m_useExternalScaling is set, the caller must provide a valid 'm_diag'"); m_qtf.resize(n); /* Function Body */ m_nfev = 0; m_njev = 0; /* check the input parameters for errors. */ if (n <= 0 || m < n || m_ftol < 0. || m_xtol < 0. || m_gtol < 0. || m_maxfev <= 0 || m_factor <= 0.){ m_info = InvalidInput; return LevenbergMarquardtSpace::ImproperInputParameters; } if (m_useExternalScaling) for (Index j = 0; j < n; ++j) if (m_diag[j] <= 0.) { m_info = InvalidInput; return LevenbergMarquardtSpace::ImproperInputParameters; } /* evaluate the function at the starting point */ /* and calculate its norm. */ m_nfev = 1; if ( m_functor(x, m_fvec) < 0) return LevenbergMarquardtSpace::UserAsked; m_fnorm = m_fvec.stableNorm(); /* initialize levenberg-marquardt parameter and iteration counter. */ m_par = 0.; m_iter = 1; return LevenbergMarquardtSpace::NotStarted; } template LevenbergMarquardtSpace::Status LevenbergMarquardt::lmder1( FVectorType &x, const Scalar tol ) { n = x.size(); m = m_functor.values(); /* check the input parameters for errors. */ if (n <= 0 || m < n || tol < 0.) return LevenbergMarquardtSpace::ImproperInputParameters; resetParameters(); m_ftol = tol; m_xtol = tol; m_maxfev = 100*(n+1); return minimize(x); } template LevenbergMarquardtSpace::Status LevenbergMarquardt::lmdif1( FunctorType &functor, FVectorType &x, Index *nfev, const Scalar tol ) { Index n = x.size(); Index m = functor.values(); /* check the input parameters for errors. */ if (n <= 0 || m < n || tol < 0.) return LevenbergMarquardtSpace::ImproperInputParameters; NumericalDiff numDiff(functor); // embedded LevenbergMarquardt LevenbergMarquardt > lm(numDiff); lm.setFtol(tol); lm.setXtol(tol); lm.setMaxfev(200*(n+1)); LevenbergMarquardtSpace::Status info = LevenbergMarquardtSpace::Status(lm.minimize(x)); if (nfev) * nfev = lm.nfev(); return info; } } // end namespace Eigen #endif // EIGEN_LEVENBERGMARQUARDT_H RcppEigen/inst/include/unsupported/Eigen/src/LevenbergMarquardt/LMcovar.h0000644000176200001440000000461314107270226026247 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // This code initially comes from MINPACK whose original authors are: // Copyright Jorge More - Argonne National Laboratory // Copyright Burt Garbow - Argonne National Laboratory // Copyright Ken Hillstrom - Argonne National Laboratory // // This Source Code Form is subject to the terms of the Minpack license // (a BSD-like license) described in the campaigned CopyrightMINPACK.txt file. #ifndef EIGEN_LMCOVAR_H #define EIGEN_LMCOVAR_H namespace Eigen { namespace internal { template void covar( Matrix< Scalar, Dynamic, Dynamic > &r, const VectorXi& ipvt, Scalar tol = std::sqrt(NumTraits::epsilon()) ) { using std::abs; /* Local variables */ Index i, j, k, l, ii, jj; bool sing; Scalar temp; /* Function Body */ const Index n = r.cols(); const Scalar tolr = tol * abs(r(0,0)); Matrix< Scalar, Dynamic, 1 > wa(n); eigen_assert(ipvt.size()==n); /* form the inverse of r in the full upper triangle of r. */ l = -1; for (k = 0; k < n; ++k) if (abs(r(k,k)) > tolr) { r(k,k) = 1. / r(k,k); for (j = 0; j <= k-1; ++j) { temp = r(k,k) * r(j,k); r(j,k) = 0.; r.col(k).head(j+1) -= r.col(j).head(j+1) * temp; } l = k; } /* form the full upper triangle of the inverse of (r transpose)*r */ /* in the full upper triangle of r. */ for (k = 0; k <= l; ++k) { for (j = 0; j <= k-1; ++j) r.col(j).head(j+1) += r.col(k).head(j+1) * r(j,k); r.col(k).head(k+1) *= r(k,k); } /* form the full lower triangle of the covariance matrix */ /* in the strict lower triangle of r and in wa. */ for (j = 0; j < n; ++j) { jj = ipvt[j]; sing = j > l; for (i = 0; i <= j; ++i) { if (sing) r(i,j) = 0.; ii = ipvt[i]; if (ii > jj) r(ii,jj) = r(i,j); if (ii < jj) r(jj,ii) = r(i,j); } wa[jj] = r(j,j); } /* symmetrize the covariance matrix in r. */ r.topLeftCorner(n,n).template triangularView() = r.topLeftCorner(n,n).transpose(); r.diagonal() = wa; } } // end namespace internal } // end namespace Eigen #endif // EIGEN_LMCOVAR_H RcppEigen/inst/include/unsupported/Eigen/src/LevenbergMarquardt/CopyrightMINPACK.txt0000644000176200001440000000422214107270226030243 0ustar liggesusersMinpack Copyright Notice (1999) University of Chicago. All rights reserved Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. The end-user documentation included with the redistribution, if any, must include the following acknowledgment: "This product includes software developed by the University of Chicago, as Operator of Argonne National Laboratory. Alternately, this acknowledgment may appear in the software itself, if and wherever such third-party acknowledgments normally appear. 4. WARRANTY DISCLAIMER. THE SOFTWARE IS SUPPLIED "AS IS" WITHOUT WARRANTY OF ANY KIND. THE COPYRIGHT HOLDER, THE UNITED STATES, THE UNITED STATES DEPARTMENT OF ENERGY, AND THEIR EMPLOYEES: (1) DISCLAIM ANY WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE OR NON-INFRINGEMENT, (2) DO NOT ASSUME ANY LEGAL LIABILITY OR RESPONSIBILITY FOR THE ACCURACY, COMPLETENESS, OR USEFULNESS OF THE SOFTWARE, (3) DO NOT REPRESENT THAT USE OF THE SOFTWARE WOULD NOT INFRINGE PRIVATELY OWNED RIGHTS, (4) DO NOT WARRANT THAT THE SOFTWARE WILL FUNCTION UNINTERRUPTED, THAT IT IS ERROR-FREE OR THAT ANY ERRORS WILL BE CORRECTED. 5. LIMITATION OF LIABILITY. IN NO EVENT WILL THE COPYRIGHT HOLDER, THE UNITED STATES, THE UNITED STATES DEPARTMENT OF ENERGY, OR THEIR EMPLOYEES: BE LIABLE FOR ANY INDIRECT, INCIDENTAL, CONSEQUENTIAL, SPECIAL OR PUNITIVE DAMAGES OF ANY KIND OR NATURE, INCLUDING BUT NOT LIMITED TO LOSS OF PROFITS OR LOSS OF DATA, FOR ANY REASON WHATSOEVER, WHETHER SUCH LIABILITY IS ASSERTED ON THE BASIS OF CONTRACT, TORT (INCLUDING NEGLIGENCE OR STRICT LIABILITY), OR OTHERWISE, EVEN IF ANY OF SAID PARTIES HAS BEEN WARNED OF THE POSSIBILITY OF SUCH LOSS OR DAMAGES. RcppEigen/inst/include/unsupported/Eigen/src/LevenbergMarquardt/LMpar.h0000644000176200001440000001165714107270226025725 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // This code initially comes from MINPACK whose original authors are: // Copyright Jorge More - Argonne National Laboratory // Copyright Burt Garbow - Argonne National Laboratory // Copyright Ken Hillstrom - Argonne National Laboratory // // This Source Code Form is subject to the terms of the Minpack license // (a BSD-like license) described in the campaigned CopyrightMINPACK.txt file. #ifndef EIGEN_LMPAR_H #define EIGEN_LMPAR_H namespace Eigen { namespace internal { template void lmpar2( const QRSolver &qr, const VectorType &diag, const VectorType &qtb, typename VectorType::Scalar m_delta, typename VectorType::Scalar &par, VectorType &x) { using std::sqrt; using std::abs; typedef typename QRSolver::MatrixType MatrixType; typedef typename QRSolver::Scalar Scalar; // typedef typename QRSolver::StorageIndex StorageIndex; /* Local variables */ Index j; Scalar fp; Scalar parc, parl; Index iter; Scalar temp, paru; Scalar gnorm; Scalar dxnorm; // Make a copy of the triangular factor. // This copy is modified during call the qrsolv MatrixType s; s = qr.matrixR(); /* Function Body */ const Scalar dwarf = (std::numeric_limits::min)(); const Index n = qr.matrixR().cols(); eigen_assert(n==diag.size()); eigen_assert(n==qtb.size()); VectorType wa1, wa2; /* compute and store in x the gauss-newton direction. if the */ /* jacobian is rank-deficient, obtain a least squares solution. */ // const Index rank = qr.nonzeroPivots(); // exactly double(0.) const Index rank = qr.rank(); // use a threshold wa1 = qtb; wa1.tail(n-rank).setZero(); //FIXME There is no solve in place for sparse triangularView wa1.head(rank) = s.topLeftCorner(rank,rank).template triangularView().solve(qtb.head(rank)); x = qr.colsPermutation()*wa1; /* initialize the iteration counter. */ /* evaluate the function at the origin, and test */ /* for acceptance of the gauss-newton direction. */ iter = 0; wa2 = diag.cwiseProduct(x); dxnorm = wa2.blueNorm(); fp = dxnorm - m_delta; if (fp <= Scalar(0.1) * m_delta) { par = 0; return; } /* if the jacobian is not rank deficient, the newton */ /* step provides a lower bound, parl, for the zero of */ /* the function. otherwise set this bound to zero. */ parl = 0.; if (rank==n) { wa1 = qr.colsPermutation().inverse() * diag.cwiseProduct(wa2)/dxnorm; s.topLeftCorner(n,n).transpose().template triangularView().solveInPlace(wa1); temp = wa1.blueNorm(); parl = fp / m_delta / temp / temp; } /* calculate an upper bound, paru, for the zero of the function. */ for (j = 0; j < n; ++j) wa1[j] = s.col(j).head(j+1).dot(qtb.head(j+1)) / diag[qr.colsPermutation().indices()(j)]; gnorm = wa1.stableNorm(); paru = gnorm / m_delta; if (paru == 0.) paru = dwarf / (std::min)(m_delta,Scalar(0.1)); /* if the input par lies outside of the interval (parl,paru), */ /* set par to the closer endpoint. */ par = (std::max)(par,parl); par = (std::min)(par,paru); if (par == 0.) par = gnorm / dxnorm; /* beginning of an iteration. */ while (true) { ++iter; /* evaluate the function at the current value of par. */ if (par == 0.) par = (std::max)(dwarf,Scalar(.001) * paru); /* Computing MAX */ wa1 = sqrt(par)* diag; VectorType sdiag(n); lmqrsolv(s, qr.colsPermutation(), wa1, qtb, x, sdiag); wa2 = diag.cwiseProduct(x); dxnorm = wa2.blueNorm(); temp = fp; fp = dxnorm - m_delta; /* if the function is small enough, accept the current value */ /* of par. also test for the exceptional cases where parl */ /* is zero or the number of iterations has reached 10. */ if (abs(fp) <= Scalar(0.1) * m_delta || (parl == 0. && fp <= temp && temp < 0.) || iter == 10) break; /* compute the newton correction. */ wa1 = qr.colsPermutation().inverse() * diag.cwiseProduct(wa2/dxnorm); // we could almost use this here, but the diagonal is outside qr, in sdiag[] for (j = 0; j < n; ++j) { wa1[j] /= sdiag[j]; temp = wa1[j]; for (Index i = j+1; i < n; ++i) wa1[i] -= s.coeff(i,j) * temp; } temp = wa1.blueNorm(); parc = fp / m_delta / temp / temp; /* depending on the sign of the function, update parl or paru. */ if (fp > 0.) parl = (std::max)(parl,par); if (fp < 0.) paru = (std::min)(paru,par); /* compute an improved estimate for par. */ par = (std::max)(parl,par+parc); } if (iter == 0) par = 0.; return; } } // end namespace internal } // end namespace Eigen #endif // EIGEN_LMPAR_H RcppEigen/inst/include/unsupported/Eigen/src/LevenbergMarquardt/LMonestep.h0000644000176200001440000001477014107270226026617 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009 Thomas Capricelli // // This code initially comes from MINPACK whose original authors are: // Copyright Jorge More - Argonne National Laboratory // Copyright Burt Garbow - Argonne National Laboratory // Copyright Ken Hillstrom - Argonne National Laboratory // // This Source Code Form is subject to the terms of the Minpack license // (a BSD-like license) described in the campaigned CopyrightMINPACK.txt file. #ifndef EIGEN_LMONESTEP_H #define EIGEN_LMONESTEP_H namespace Eigen { template LevenbergMarquardtSpace::Status LevenbergMarquardt::minimizeOneStep(FVectorType &x) { using std::abs; using std::sqrt; RealScalar temp, temp1,temp2; RealScalar ratio; RealScalar pnorm, xnorm, fnorm1, actred, dirder, prered; eigen_assert(x.size()==n); // check the caller is not cheating us temp = 0.0; xnorm = 0.0; /* calculate the jacobian matrix. */ Index df_ret = m_functor.df(x, m_fjac); if (df_ret<0) return LevenbergMarquardtSpace::UserAsked; if (df_ret>0) // numerical diff, we evaluated the function df_ret times m_nfev += df_ret; else m_njev++; /* compute the qr factorization of the jacobian. */ for (int j = 0; j < x.size(); ++j) m_wa2(j) = m_fjac.col(j).blueNorm(); QRSolver qrfac(m_fjac); if(qrfac.info() != Success) { m_info = NumericalIssue; return LevenbergMarquardtSpace::ImproperInputParameters; } // Make a copy of the first factor with the associated permutation m_rfactor = qrfac.matrixR(); m_permutation = (qrfac.colsPermutation()); /* on the first iteration and if external scaling is not used, scale according */ /* to the norms of the columns of the initial jacobian. */ if (m_iter == 1) { if (!m_useExternalScaling) for (Index j = 0; j < n; ++j) m_diag[j] = (m_wa2[j]==0.)? 1. : m_wa2[j]; /* on the first iteration, calculate the norm of the scaled x */ /* and initialize the step bound m_delta. */ xnorm = m_diag.cwiseProduct(x).stableNorm(); m_delta = m_factor * xnorm; if (m_delta == 0.) m_delta = m_factor; } /* form (q transpose)*m_fvec and store the first n components in */ /* m_qtf. */ m_wa4 = m_fvec; m_wa4 = qrfac.matrixQ().adjoint() * m_fvec; m_qtf = m_wa4.head(n); /* compute the norm of the scaled gradient. */ m_gnorm = 0.; if (m_fnorm != 0.) for (Index j = 0; j < n; ++j) if (m_wa2[m_permutation.indices()[j]] != 0.) m_gnorm = (std::max)(m_gnorm, abs( m_rfactor.col(j).head(j+1).dot(m_qtf.head(j+1)/m_fnorm) / m_wa2[m_permutation.indices()[j]])); /* test for convergence of the gradient norm. */ if (m_gnorm <= m_gtol) { m_info = Success; return LevenbergMarquardtSpace::CosinusTooSmall; } /* rescale if necessary. */ if (!m_useExternalScaling) m_diag = m_diag.cwiseMax(m_wa2); do { /* determine the levenberg-marquardt parameter. */ internal::lmpar2(qrfac, m_diag, m_qtf, m_delta, m_par, m_wa1); /* store the direction p and x + p. calculate the norm of p. */ m_wa1 = -m_wa1; m_wa2 = x + m_wa1; pnorm = m_diag.cwiseProduct(m_wa1).stableNorm(); /* on the first iteration, adjust the initial step bound. */ if (m_iter == 1) m_delta = (std::min)(m_delta,pnorm); /* evaluate the function at x + p and calculate its norm. */ if ( m_functor(m_wa2, m_wa4) < 0) return LevenbergMarquardtSpace::UserAsked; ++m_nfev; fnorm1 = m_wa4.stableNorm(); /* compute the scaled actual reduction. */ actred = -1.; if (Scalar(.1) * fnorm1 < m_fnorm) actred = 1. - numext::abs2(fnorm1 / m_fnorm); /* compute the scaled predicted reduction and */ /* the scaled directional derivative. */ m_wa3 = m_rfactor.template triangularView() * (m_permutation.inverse() *m_wa1); temp1 = numext::abs2(m_wa3.stableNorm() / m_fnorm); temp2 = numext::abs2(sqrt(m_par) * pnorm / m_fnorm); prered = temp1 + temp2 / Scalar(.5); dirder = -(temp1 + temp2); /* compute the ratio of the actual to the predicted */ /* reduction. */ ratio = 0.; if (prered != 0.) ratio = actred / prered; /* update the step bound. */ if (ratio <= Scalar(.25)) { if (actred >= 0.) temp = RealScalar(.5); if (actred < 0.) temp = RealScalar(.5) * dirder / (dirder + RealScalar(.5) * actred); if (RealScalar(.1) * fnorm1 >= m_fnorm || temp < RealScalar(.1)) temp = Scalar(.1); /* Computing MIN */ m_delta = temp * (std::min)(m_delta, pnorm / RealScalar(.1)); m_par /= temp; } else if (!(m_par != 0. && ratio < RealScalar(.75))) { m_delta = pnorm / RealScalar(.5); m_par = RealScalar(.5) * m_par; } /* test for successful iteration. */ if (ratio >= RealScalar(1e-4)) { /* successful iteration. update x, m_fvec, and their norms. */ x = m_wa2; m_wa2 = m_diag.cwiseProduct(x); m_fvec = m_wa4; xnorm = m_wa2.stableNorm(); m_fnorm = fnorm1; ++m_iter; } /* tests for convergence. */ if (abs(actred) <= m_ftol && prered <= m_ftol && Scalar(.5) * ratio <= 1. && m_delta <= m_xtol * xnorm) { m_info = Success; return LevenbergMarquardtSpace::RelativeErrorAndReductionTooSmall; } if (abs(actred) <= m_ftol && prered <= m_ftol && Scalar(.5) * ratio <= 1.) { m_info = Success; return LevenbergMarquardtSpace::RelativeReductionTooSmall; } if (m_delta <= m_xtol * xnorm) { m_info = Success; return LevenbergMarquardtSpace::RelativeErrorTooSmall; } /* tests for termination and stringent tolerances. */ if (m_nfev >= m_maxfev) { m_info = NoConvergence; return LevenbergMarquardtSpace::TooManyFunctionEvaluation; } if (abs(actred) <= NumTraits::epsilon() && prered <= NumTraits::epsilon() && Scalar(.5) * ratio <= 1.) { m_info = Success; return LevenbergMarquardtSpace::FtolTooSmall; } if (m_delta <= NumTraits::epsilon() * xnorm) { m_info = Success; return LevenbergMarquardtSpace::XtolTooSmall; } if (m_gnorm <= NumTraits::epsilon()) { m_info = Success; return LevenbergMarquardtSpace::GtolTooSmall; } } while (ratio < Scalar(1e-4)); return LevenbergMarquardtSpace::Running; } } // end namespace Eigen #endif // EIGEN_LMONESTEP_H RcppEigen/inst/include/unsupported/Eigen/src/NumericalDiff/0000755000176200001440000000000014107270226023445 5ustar liggesusersRcppEigen/inst/include/unsupported/Eigen/src/NumericalDiff/NumericalDiff.h0000644000176200001440000000766414107270226026343 0ustar liggesusers// -*- coding: utf-8 // vim: set fileencoding=utf-8 // This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009 Thomas Capricelli // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_NUMERICAL_DIFF_H #define EIGEN_NUMERICAL_DIFF_H namespace Eigen { enum NumericalDiffMode { Forward, Central }; /** * This class allows you to add a method df() to your functor, which will * use numerical differentiation to compute an approximate of the * derivative for the functor. Of course, if you have an analytical form * for the derivative, you should rather implement df() by yourself. * * More information on * http://en.wikipedia.org/wiki/Numerical_differentiation * * Currently only "Forward" and "Central" scheme are implemented. */ template class NumericalDiff : public _Functor { public: typedef _Functor Functor; typedef typename Functor::Scalar Scalar; typedef typename Functor::InputType InputType; typedef typename Functor::ValueType ValueType; typedef typename Functor::JacobianType JacobianType; NumericalDiff(Scalar _epsfcn=0.) : Functor(), epsfcn(_epsfcn) {} NumericalDiff(const Functor& f, Scalar _epsfcn=0.) : Functor(f), epsfcn(_epsfcn) {} // forward constructors template NumericalDiff(const T0& a0) : Functor(a0), epsfcn(0) {} template NumericalDiff(const T0& a0, const T1& a1) : Functor(a0, a1), epsfcn(0) {} template NumericalDiff(const T0& a0, const T1& a1, const T2& a2) : Functor(a0, a1, a2), epsfcn(0) {} enum { InputsAtCompileTime = Functor::InputsAtCompileTime, ValuesAtCompileTime = Functor::ValuesAtCompileTime }; /** * return the number of evaluation of functor */ int df(const InputType& _x, JacobianType &jac) const { using std::sqrt; using std::abs; /* Local variables */ Scalar h; int nfev=0; const typename InputType::Index n = _x.size(); const Scalar eps = sqrt(((std::max)(epsfcn,NumTraits::epsilon() ))); ValueType val1, val2; InputType x = _x; // TODO : we should do this only if the size is not already known val1.resize(Functor::values()); val2.resize(Functor::values()); // initialization switch(mode) { case Forward: // compute f(x) Functor::operator()(x, val1); nfev++; break; case Central: // do nothing break; default: eigen_assert(false); }; // Function Body for (int j = 0; j < n; ++j) { h = eps * abs(x[j]); if (h == 0.) { h = eps; } switch(mode) { case Forward: x[j] += h; Functor::operator()(x, val2); nfev++; x[j] = _x[j]; jac.col(j) = (val2-val1)/h; break; case Central: x[j] += h; Functor::operator()(x, val2); nfev++; x[j] -= 2*h; Functor::operator()(x, val1); nfev++; x[j] = _x[j]; jac.col(j) = (val2-val1)/(2*h); break; default: eigen_assert(false); }; } return nfev; } private: Scalar epsfcn; NumericalDiff& operator=(const NumericalDiff&); }; } // end namespace Eigen //vim: ai ts=4 sts=4 et sw=4 #endif // EIGEN_NUMERICAL_DIFF_H RcppEigen/inst/include/unsupported/Eigen/src/BVH/0000755000176200001440000000000014562417221021357 5ustar liggesusersRcppEigen/inst/include/unsupported/Eigen/src/BVH/KdBVH.h0000644000176200001440000002171614562417221022435 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009 Ilya Baran // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef KDBVH_H_INCLUDED #define KDBVH_H_INCLUDED namespace Eigen { namespace internal { //internal pair class for the BVH--used instead of std::pair because of alignment template struct vector_int_pair { EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF_VECTORIZABLE_FIXED_SIZE(Scalar, Dim) typedef Matrix VectorType; vector_int_pair(const VectorType &v, int i) : first(v), second(i) {} VectorType first; int second; }; //these templates help the tree initializer get the bounding boxes either from a provided //iterator range or using bounding_box in a unified way template struct get_boxes_helper { void operator()(const ObjectList &objects, BoxIter boxBegin, BoxIter boxEnd, VolumeList &outBoxes) { outBoxes.insert(outBoxes.end(), boxBegin, boxEnd); eigen_assert(outBoxes.size() == objects.size()); EIGEN_ONLY_USED_FOR_DEBUG(objects); } }; template struct get_boxes_helper { void operator()(const ObjectList &objects, int, int, VolumeList &outBoxes) { outBoxes.reserve(objects.size()); for(int i = 0; i < (int)objects.size(); ++i) outBoxes.push_back(bounding_box(objects[i])); } }; } // end namespace internal /** \class KdBVH * \brief A simple bounding volume hierarchy based on AlignedBox * * \param _Scalar The underlying scalar type of the bounding boxes * \param _Dim The dimension of the space in which the hierarchy lives * \param _Object The object type that lives in the hierarchy. It must have value semantics. Either bounding_box(_Object) must * be defined and return an AlignedBox<_Scalar, _Dim> or bounding boxes must be provided to the tree initializer. * * This class provides a simple (as opposed to optimized) implementation of a bounding volume hierarchy analogous to a Kd-tree. * Given a sequence of objects, it computes their bounding boxes, constructs a Kd-tree of their centers * and builds a BVH with the structure of that Kd-tree. When the elements of the tree are too expensive to be copied around, * it is useful for _Object to be a pointer. */ template class KdBVH { public: enum { Dim = _Dim }; typedef _Object Object; typedef std::vector > ObjectList; typedef _Scalar Scalar; typedef AlignedBox Volume; typedef std::vector > VolumeList; typedef int Index; typedef const int *VolumeIterator; //the iterators are just pointers into the tree's vectors typedef const Object *ObjectIterator; KdBVH() {} /** Given an iterator range over \a Object references, constructs the BVH. Requires that bounding_box(Object) return a Volume. */ template KdBVH(Iter begin, Iter end) { init(begin, end, 0, 0); } //int is recognized by init as not being an iterator type /** Given an iterator range over \a Object references and an iterator range over their bounding boxes, constructs the BVH */ template KdBVH(OIter begin, OIter end, BIter boxBegin, BIter boxEnd) { init(begin, end, boxBegin, boxEnd); } /** Given an iterator range over \a Object references, constructs the BVH, overwriting whatever is in there currently. * Requires that bounding_box(Object) return a Volume. */ template void init(Iter begin, Iter end) { init(begin, end, 0, 0); } /** Given an iterator range over \a Object references and an iterator range over their bounding boxes, * constructs the BVH, overwriting whatever is in there currently. */ template void init(OIter begin, OIter end, BIter boxBegin, BIter boxEnd) { objects.clear(); boxes.clear(); children.clear(); objects.insert(objects.end(), begin, end); int n = static_cast(objects.size()); if(n < 2) return; //if we have at most one object, we don't need any internal nodes VolumeList objBoxes; VIPairList objCenters; //compute the bounding boxes depending on BIter type internal::get_boxes_helper()(objects, boxBegin, boxEnd, objBoxes); objCenters.reserve(n); boxes.reserve(n - 1); children.reserve(2 * n - 2); for(int i = 0; i < n; ++i) objCenters.push_back(VIPair(objBoxes[i].center(), i)); build(objCenters, 0, n, objBoxes, 0); //the recursive part of the algorithm ObjectList tmp(n); tmp.swap(objects); for(int i = 0; i < n; ++i) objects[i] = tmp[objCenters[i].second]; } /** \returns the index of the root of the hierarchy */ inline Index getRootIndex() const { return (int)boxes.size() - 1; } /** Given an \a index of a node, on exit, \a outVBegin and \a outVEnd range over the indices of the volume children of the node * and \a outOBegin and \a outOEnd range over the object children of the node */ EIGEN_STRONG_INLINE void getChildren(Index index, VolumeIterator &outVBegin, VolumeIterator &outVEnd, ObjectIterator &outOBegin, ObjectIterator &outOEnd) const { //inlining this function should open lots of optimization opportunities to the compiler if(index < 0) { outVBegin = outVEnd; if(!objects.empty()) outOBegin = &(objects[0]); outOEnd = outOBegin + objects.size(); //output all objects--necessary when the tree has only one object return; } int numBoxes = static_cast(boxes.size()); int idx = index * 2; if(children[idx + 1] < numBoxes) { //second index is always bigger outVBegin = &(children[idx]); outVEnd = outVBegin + 2; outOBegin = outOEnd; } else if(children[idx] >= numBoxes) { //if both children are objects outVBegin = outVEnd; outOBegin = &(objects[children[idx] - numBoxes]); outOEnd = outOBegin + 2; } else { //if the first child is a volume and the second is an object outVBegin = &(children[idx]); outVEnd = outVBegin + 1; outOBegin = &(objects[children[idx + 1] - numBoxes]); outOEnd = outOBegin + 1; } } /** \returns the bounding box of the node at \a index */ inline const Volume &getVolume(Index index) const { return boxes[index]; } private: typedef internal::vector_int_pair VIPair; typedef std::vector > VIPairList; typedef Matrix VectorType; struct VectorComparator //compares vectors, or more specifically, VIPairs along a particular dimension { VectorComparator(int inDim) : dim(inDim) {} inline bool operator()(const VIPair &v1, const VIPair &v2) const { return v1.first[dim] < v2.first[dim]; } int dim; }; //Build the part of the tree between objects[from] and objects[to] (not including objects[to]). //This routine partitions the objCenters in [from, to) along the dimension dim, recursively constructs //the two halves, and adds their parent node. TODO: a cache-friendlier layout void build(VIPairList &objCenters, int from, int to, const VolumeList &objBoxes, int dim) { eigen_assert(to - from > 1); if(to - from == 2) { boxes.push_back(objBoxes[objCenters[from].second].merged(objBoxes[objCenters[from + 1].second])); children.push_back(from + (int)objects.size() - 1); //there are objects.size() - 1 tree nodes children.push_back(from + (int)objects.size()); } else if(to - from == 3) { int mid = from + 2; std::nth_element(objCenters.begin() + from, objCenters.begin() + mid, objCenters.begin() + to, VectorComparator(dim)); //partition build(objCenters, from, mid, objBoxes, (dim + 1) % Dim); int idx1 = (int)boxes.size() - 1; boxes.push_back(boxes[idx1].merged(objBoxes[objCenters[mid].second])); children.push_back(idx1); children.push_back(mid + (int)objects.size() - 1); } else { int mid = from + (to - from) / 2; nth_element(objCenters.begin() + from, objCenters.begin() + mid, objCenters.begin() + to, VectorComparator(dim)); //partition build(objCenters, from, mid, objBoxes, (dim + 1) % Dim); int idx1 = (int)boxes.size() - 1; build(objCenters, mid, to, objBoxes, (dim + 1) % Dim); int idx2 = (int)boxes.size() - 1; boxes.push_back(boxes[idx1].merged(boxes[idx2])); children.push_back(idx1); children.push_back(idx2); } } std::vector children; //children of x are children[2x] and children[2x+1], indices bigger than boxes.size() index into objects. VolumeList boxes; ObjectList objects; }; } // end namespace Eigen #endif //KDBVH_H_INCLUDED RcppEigen/inst/include/unsupported/Eigen/src/BVH/BVAlgorithms.h0000644000176200001440000003126014107270226024070 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009 Ilya Baran // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_BVALGORITHMS_H #define EIGEN_BVALGORITHMS_H namespace Eigen { namespace internal { #ifndef EIGEN_PARSED_BY_DOXYGEN template bool intersect_helper(const BVH &tree, Intersector &intersector, typename BVH::Index root) { typedef typename BVH::Index Index; typedef typename BVH::VolumeIterator VolIter; typedef typename BVH::ObjectIterator ObjIter; VolIter vBegin = VolIter(), vEnd = VolIter(); ObjIter oBegin = ObjIter(), oEnd = ObjIter(); std::vector todo(1, root); while(!todo.empty()) { tree.getChildren(todo.back(), vBegin, vEnd, oBegin, oEnd); todo.pop_back(); for(; vBegin != vEnd; ++vBegin) //go through child volumes if(intersector.intersectVolume(tree.getVolume(*vBegin))) todo.push_back(*vBegin); for(; oBegin != oEnd; ++oBegin) //go through child objects if(intersector.intersectObject(*oBegin)) return true; //intersector said to stop query } return false; } #endif //not EIGEN_PARSED_BY_DOXYGEN template struct intersector_helper1 { intersector_helper1(const Object2 &inStored, Intersector &in) : stored(inStored), intersector(in) {} bool intersectVolume(const Volume1 &vol) { return intersector.intersectVolumeObject(vol, stored); } bool intersectObject(const Object1 &obj) { return intersector.intersectObjectObject(obj, stored); } Object2 stored; Intersector &intersector; private: intersector_helper1& operator=(const intersector_helper1&); }; template struct intersector_helper2 { intersector_helper2(const Object1 &inStored, Intersector &in) : stored(inStored), intersector(in) {} bool intersectVolume(const Volume2 &vol) { return intersector.intersectObjectVolume(stored, vol); } bool intersectObject(const Object2 &obj) { return intersector.intersectObjectObject(stored, obj); } Object1 stored; Intersector &intersector; private: intersector_helper2& operator=(const intersector_helper2&); }; } // end namespace internal /** Given a BVH, runs the query encapsulated by \a intersector. * The Intersector type must provide the following members: \code bool intersectVolume(const BVH::Volume &volume) //returns true if volume intersects the query bool intersectObject(const BVH::Object &object) //returns true if the search should terminate immediately \endcode */ template void BVIntersect(const BVH &tree, Intersector &intersector) { internal::intersect_helper(tree, intersector, tree.getRootIndex()); } /** Given two BVH's, runs the query on their Cartesian product encapsulated by \a intersector. * The Intersector type must provide the following members: \code bool intersectVolumeVolume(const BVH1::Volume &v1, const BVH2::Volume &v2) //returns true if product of volumes intersects the query bool intersectVolumeObject(const BVH1::Volume &v1, const BVH2::Object &o2) //returns true if the volume-object product intersects the query bool intersectObjectVolume(const BVH1::Object &o1, const BVH2::Volume &v2) //returns true if the volume-object product intersects the query bool intersectObjectObject(const BVH1::Object &o1, const BVH2::Object &o2) //returns true if the search should terminate immediately \endcode */ template void BVIntersect(const BVH1 &tree1, const BVH2 &tree2, Intersector &intersector) //TODO: tandem descent when it makes sense { typedef typename BVH1::Index Index1; typedef typename BVH2::Index Index2; typedef internal::intersector_helper1 Helper1; typedef internal::intersector_helper2 Helper2; typedef typename BVH1::VolumeIterator VolIter1; typedef typename BVH1::ObjectIterator ObjIter1; typedef typename BVH2::VolumeIterator VolIter2; typedef typename BVH2::ObjectIterator ObjIter2; VolIter1 vBegin1 = VolIter1(), vEnd1 = VolIter1(); ObjIter1 oBegin1 = ObjIter1(), oEnd1 = ObjIter1(); VolIter2 vBegin2 = VolIter2(), vEnd2 = VolIter2(), vCur2 = VolIter2(); ObjIter2 oBegin2 = ObjIter2(), oEnd2 = ObjIter2(), oCur2 = ObjIter2(); std::vector > todo(1, std::make_pair(tree1.getRootIndex(), tree2.getRootIndex())); while(!todo.empty()) { tree1.getChildren(todo.back().first, vBegin1, vEnd1, oBegin1, oEnd1); tree2.getChildren(todo.back().second, vBegin2, vEnd2, oBegin2, oEnd2); todo.pop_back(); for(; vBegin1 != vEnd1; ++vBegin1) { //go through child volumes of first tree const typename BVH1::Volume &vol1 = tree1.getVolume(*vBegin1); for(vCur2 = vBegin2; vCur2 != vEnd2; ++vCur2) { //go through child volumes of second tree if(intersector.intersectVolumeVolume(vol1, tree2.getVolume(*vCur2))) todo.push_back(std::make_pair(*vBegin1, *vCur2)); } for(oCur2 = oBegin2; oCur2 != oEnd2; ++oCur2) {//go through child objects of second tree Helper1 helper(*oCur2, intersector); if(internal::intersect_helper(tree1, helper, *vBegin1)) return; //intersector said to stop query } } for(; oBegin1 != oEnd1; ++oBegin1) { //go through child objects of first tree for(vCur2 = vBegin2; vCur2 != vEnd2; ++vCur2) { //go through child volumes of second tree Helper2 helper(*oBegin1, intersector); if(internal::intersect_helper(tree2, helper, *vCur2)) return; //intersector said to stop query } for(oCur2 = oBegin2; oCur2 != oEnd2; ++oCur2) {//go through child objects of second tree if(intersector.intersectObjectObject(*oBegin1, *oCur2)) return; //intersector said to stop query } } } } namespace internal { #ifndef EIGEN_PARSED_BY_DOXYGEN template typename Minimizer::Scalar minimize_helper(const BVH &tree, Minimizer &minimizer, typename BVH::Index root, typename Minimizer::Scalar minimum) { typedef typename Minimizer::Scalar Scalar; typedef typename BVH::Index Index; typedef std::pair QueueElement; //first element is priority typedef typename BVH::VolumeIterator VolIter; typedef typename BVH::ObjectIterator ObjIter; VolIter vBegin = VolIter(), vEnd = VolIter(); ObjIter oBegin = ObjIter(), oEnd = ObjIter(); std::priority_queue, std::greater > todo; //smallest is at the top todo.push(std::make_pair(Scalar(), root)); while(!todo.empty()) { tree.getChildren(todo.top().second, vBegin, vEnd, oBegin, oEnd); todo.pop(); for(; oBegin != oEnd; ++oBegin) //go through child objects minimum = (std::min)(minimum, minimizer.minimumOnObject(*oBegin)); for(; vBegin != vEnd; ++vBegin) { //go through child volumes Scalar val = minimizer.minimumOnVolume(tree.getVolume(*vBegin)); if(val < minimum) todo.push(std::make_pair(val, *vBegin)); } } return minimum; } #endif //not EIGEN_PARSED_BY_DOXYGEN template struct minimizer_helper1 { typedef typename Minimizer::Scalar Scalar; minimizer_helper1(const Object2 &inStored, Minimizer &m) : stored(inStored), minimizer(m) {} Scalar minimumOnVolume(const Volume1 &vol) { return minimizer.minimumOnVolumeObject(vol, stored); } Scalar minimumOnObject(const Object1 &obj) { return minimizer.minimumOnObjectObject(obj, stored); } Object2 stored; Minimizer &minimizer; private: minimizer_helper1& operator=(const minimizer_helper1&); }; template struct minimizer_helper2 { typedef typename Minimizer::Scalar Scalar; minimizer_helper2(const Object1 &inStored, Minimizer &m) : stored(inStored), minimizer(m) {} Scalar minimumOnVolume(const Volume2 &vol) { return minimizer.minimumOnObjectVolume(stored, vol); } Scalar minimumOnObject(const Object2 &obj) { return minimizer.minimumOnObjectObject(stored, obj); } Object1 stored; Minimizer &minimizer; private: minimizer_helper2& operator=(const minimizer_helper2&); }; } // end namespace internal /** Given a BVH, runs the query encapsulated by \a minimizer. * \returns the minimum value. * The Minimizer type must provide the following members: \code typedef Scalar //the numeric type of what is being minimized--not necessarily the Scalar type of the BVH (if it has one) Scalar minimumOnVolume(const BVH::Volume &volume) Scalar minimumOnObject(const BVH::Object &object) \endcode */ template typename Minimizer::Scalar BVMinimize(const BVH &tree, Minimizer &minimizer) { return internal::minimize_helper(tree, minimizer, tree.getRootIndex(), (std::numeric_limits::max)()); } /** Given two BVH's, runs the query on their cartesian product encapsulated by \a minimizer. * \returns the minimum value. * The Minimizer type must provide the following members: \code typedef Scalar //the numeric type of what is being minimized--not necessarily the Scalar type of the BVH (if it has one) Scalar minimumOnVolumeVolume(const BVH1::Volume &v1, const BVH2::Volume &v2) Scalar minimumOnVolumeObject(const BVH1::Volume &v1, const BVH2::Object &o2) Scalar minimumOnObjectVolume(const BVH1::Object &o1, const BVH2::Volume &v2) Scalar minimumOnObjectObject(const BVH1::Object &o1, const BVH2::Object &o2) \endcode */ template typename Minimizer::Scalar BVMinimize(const BVH1 &tree1, const BVH2 &tree2, Minimizer &minimizer) { typedef typename Minimizer::Scalar Scalar; typedef typename BVH1::Index Index1; typedef typename BVH2::Index Index2; typedef internal::minimizer_helper1 Helper1; typedef internal::minimizer_helper2 Helper2; typedef std::pair > QueueElement; //first element is priority typedef typename BVH1::VolumeIterator VolIter1; typedef typename BVH1::ObjectIterator ObjIter1; typedef typename BVH2::VolumeIterator VolIter2; typedef typename BVH2::ObjectIterator ObjIter2; VolIter1 vBegin1 = VolIter1(), vEnd1 = VolIter1(); ObjIter1 oBegin1 = ObjIter1(), oEnd1 = ObjIter1(); VolIter2 vBegin2 = VolIter2(), vEnd2 = VolIter2(), vCur2 = VolIter2(); ObjIter2 oBegin2 = ObjIter2(), oEnd2 = ObjIter2(), oCur2 = ObjIter2(); std::priority_queue, std::greater > todo; //smallest is at the top Scalar minimum = (std::numeric_limits::max)(); todo.push(std::make_pair(Scalar(), std::make_pair(tree1.getRootIndex(), tree2.getRootIndex()))); while(!todo.empty()) { tree1.getChildren(todo.top().second.first, vBegin1, vEnd1, oBegin1, oEnd1); tree2.getChildren(todo.top().second.second, vBegin2, vEnd2, oBegin2, oEnd2); todo.pop(); for(; oBegin1 != oEnd1; ++oBegin1) { //go through child objects of first tree for(oCur2 = oBegin2; oCur2 != oEnd2; ++oCur2) {//go through child objects of second tree minimum = (std::min)(minimum, minimizer.minimumOnObjectObject(*oBegin1, *oCur2)); } for(vCur2 = vBegin2; vCur2 != vEnd2; ++vCur2) { //go through child volumes of second tree Helper2 helper(*oBegin1, minimizer); minimum = (std::min)(minimum, internal::minimize_helper(tree2, helper, *vCur2, minimum)); } } for(; vBegin1 != vEnd1; ++vBegin1) { //go through child volumes of first tree const typename BVH1::Volume &vol1 = tree1.getVolume(*vBegin1); for(oCur2 = oBegin2; oCur2 != oEnd2; ++oCur2) {//go through child objects of second tree Helper1 helper(*oCur2, minimizer); minimum = (std::min)(minimum, internal::minimize_helper(tree1, helper, *vBegin1, minimum)); } for(vCur2 = vBegin2; vCur2 != vEnd2; ++vCur2) { //go through child volumes of second tree Scalar val = minimizer.minimumOnVolumeVolume(vol1, tree2.getVolume(*vCur2)); if(val < minimum) todo.push(std::make_pair(val, std::make_pair(*vBegin1, *vCur2))); } } } return minimum; } } // end namespace Eigen #endif // EIGEN_BVALGORITHMS_H RcppEigen/inst/include/unsupported/Eigen/src/Splines/0000755000176200001440000000000014562417221022355 5ustar liggesusersRcppEigen/inst/include/unsupported/Eigen/src/Splines/SplineFitting.h0000644000176200001440000004017114562417221025310 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 20010-2011 Hauke Heibel // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SPLINE_FITTING_H #define EIGEN_SPLINE_FITTING_H #include #include #include #include #include "SplineFwd.h" #include "../../../../Eigen/LU" #include "../../../../Eigen/QR" namespace Eigen { /** * \brief Computes knot averages. * \ingroup Splines_Module * * The knots are computed as * \f{align*} * u_0 & = \hdots = u_p = 0 \\ * u_{m-p} & = \hdots = u_{m} = 1 \\ * u_{j+p} & = \frac{1}{p}\sum_{i=j}^{j+p-1}\bar{u}_i \quad\quad j=1,\hdots,n-p * \f} * where \f$p\f$ is the degree and \f$m+1\f$ the number knots * of the desired interpolating spline. * * \param[in] parameters The input parameters. During interpolation one for each data point. * \param[in] degree The spline degree which is used during the interpolation. * \param[out] knots The output knot vector. * * \sa Les Piegl and Wayne Tiller, The NURBS book (2nd ed.), 1997, 9.2.1 Global Curve Interpolation to Point Data **/ template void KnotAveraging(const KnotVectorType& parameters, DenseIndex degree, KnotVectorType& knots) { knots.resize(parameters.size()+degree+1); for (DenseIndex j=1; j void KnotAveragingWithDerivatives(const ParameterVectorType& parameters, const unsigned int degree, const IndexArray& derivativeIndices, KnotVectorType& knots) { typedef typename ParameterVectorType::Scalar Scalar; DenseIndex numParameters = parameters.size(); DenseIndex numDerivatives = derivativeIndices.size(); if (numDerivatives < 1) { KnotAveraging(parameters, degree, knots); return; } DenseIndex startIndex; DenseIndex endIndex; DenseIndex numInternalDerivatives = numDerivatives; if (derivativeIndices[0] == 0) { startIndex = 0; --numInternalDerivatives; } else { startIndex = 1; } if (derivativeIndices[numDerivatives - 1] == numParameters - 1) { endIndex = numParameters - degree; --numInternalDerivatives; } else { endIndex = numParameters - degree - 1; } // There are (endIndex - startIndex + 1) knots obtained from the averaging // and 2 for the first and last parameters. DenseIndex numAverageKnots = endIndex - startIndex + 3; KnotVectorType averageKnots(numAverageKnots); averageKnots[0] = parameters[0]; int newKnotIndex = 0; for (DenseIndex i = startIndex; i <= endIndex; ++i) averageKnots[++newKnotIndex] = parameters.segment(i, degree).mean(); averageKnots[++newKnotIndex] = parameters[numParameters - 1]; newKnotIndex = -1; ParameterVectorType temporaryParameters(numParameters + 1); KnotVectorType derivativeKnots(numInternalDerivatives); for (DenseIndex i = 0; i < numAverageKnots - 1; ++i) { temporaryParameters[0] = averageKnots[i]; ParameterVectorType parameterIndices(numParameters); int temporaryParameterIndex = 1; for (DenseIndex j = 0; j < numParameters; ++j) { Scalar parameter = parameters[j]; if (parameter >= averageKnots[i] && parameter < averageKnots[i + 1]) { parameterIndices[temporaryParameterIndex] = j; temporaryParameters[temporaryParameterIndex++] = parameter; } } temporaryParameters[temporaryParameterIndex] = averageKnots[i + 1]; for (int j = 0; j <= temporaryParameterIndex - 2; ++j) { for (DenseIndex k = 0; k < derivativeIndices.size(); ++k) { if (parameterIndices[j + 1] == derivativeIndices[k] && parameterIndices[j + 1] != 0 && parameterIndices[j + 1] != numParameters - 1) { derivativeKnots[++newKnotIndex] = temporaryParameters.segment(j, 3).mean(); break; } } } } KnotVectorType temporaryKnots(averageKnots.size() + derivativeKnots.size()); std::merge(averageKnots.data(), averageKnots.data() + averageKnots.size(), derivativeKnots.data(), derivativeKnots.data() + derivativeKnots.size(), temporaryKnots.data()); // Number of knots (one for each point and derivative) plus spline order. DenseIndex numKnots = numParameters + numDerivatives + degree + 1; knots.resize(numKnots); knots.head(degree).fill(temporaryKnots[0]); knots.tail(degree).fill(temporaryKnots.template tail<1>()[0]); knots.segment(degree, temporaryKnots.size()) = temporaryKnots; } /** * \brief Computes chord length parameters which are required for spline interpolation. * \ingroup Splines_Module * * \param[in] pts The data points to which a spline should be fit. * \param[out] chord_lengths The resulting chord length vector. * * \sa Les Piegl and Wayne Tiller, The NURBS book (2nd ed.), 1997, 9.2.1 Global Curve Interpolation to Point Data **/ template void ChordLengths(const PointArrayType& pts, KnotVectorType& chord_lengths) { typedef typename KnotVectorType::Scalar Scalar; const DenseIndex n = pts.cols(); // 1. compute the column-wise norms chord_lengths.resize(pts.cols()); chord_lengths[0] = 0; chord_lengths.rightCols(n-1) = (pts.array().leftCols(n-1) - pts.array().rightCols(n-1)).matrix().colwise().norm(); // 2. compute the partial sums std::partial_sum(chord_lengths.data(), chord_lengths.data()+n, chord_lengths.data()); // 3. normalize the data chord_lengths /= chord_lengths(n-1); chord_lengths(n-1) = Scalar(1); } /** * \brief Spline fitting methods. * \ingroup Splines_Module **/ template struct SplineFitting { typedef typename SplineType::KnotVectorType KnotVectorType; typedef typename SplineType::ParameterVectorType ParameterVectorType; /** * \brief Fits an interpolating Spline to the given data points. * * \param pts The points for which an interpolating spline will be computed. * \param degree The degree of the interpolating spline. * * \returns A spline interpolating the initially provided points. **/ template static SplineType Interpolate(const PointArrayType& pts, DenseIndex degree); /** * \brief Fits an interpolating Spline to the given data points. * * \param pts The points for which an interpolating spline will be computed. * \param degree The degree of the interpolating spline. * \param knot_parameters The knot parameters for the interpolation. * * \returns A spline interpolating the initially provided points. **/ template static SplineType Interpolate(const PointArrayType& pts, DenseIndex degree, const KnotVectorType& knot_parameters); /** * \brief Fits an interpolating spline to the given data points and * derivatives. * * \param points The points for which an interpolating spline will be computed. * \param derivatives The desired derivatives of the interpolating spline at interpolation * points. * \param derivativeIndices An array indicating which point each derivative belongs to. This * must be the same size as @a derivatives. * \param degree The degree of the interpolating spline. * * \returns A spline interpolating @a points with @a derivatives at those points. * * \sa Les A. Piegl, Khairan Rajab, Volha Smarodzinana. 2008. * Curve interpolation with directional constraints for engineering design. * Engineering with Computers **/ template static SplineType InterpolateWithDerivatives(const PointArrayType& points, const PointArrayType& derivatives, const IndexArray& derivativeIndices, const unsigned int degree); /** * \brief Fits an interpolating spline to the given data points and derivatives. * * \param points The points for which an interpolating spline will be computed. * \param derivatives The desired derivatives of the interpolating spline at interpolation points. * \param derivativeIndices An array indicating which point each derivative belongs to. This * must be the same size as @a derivatives. * \param degree The degree of the interpolating spline. * \param parameters The parameters corresponding to the interpolation points. * * \returns A spline interpolating @a points with @a derivatives at those points. * * \sa Les A. Piegl, Khairan Rajab, Volha Smarodzinana. 2008. * Curve interpolation with directional constraints for engineering design. * Engineering with Computers */ template static SplineType InterpolateWithDerivatives(const PointArrayType& points, const PointArrayType& derivatives, const IndexArray& derivativeIndices, const unsigned int degree, const ParameterVectorType& parameters); }; template template SplineType SplineFitting::Interpolate(const PointArrayType& pts, DenseIndex degree, const KnotVectorType& knot_parameters) { typedef typename SplineType::KnotVectorType::Scalar Scalar; typedef typename SplineType::ControlPointVectorType ControlPointVectorType; typedef Matrix MatrixType; KnotVectorType knots; KnotAveraging(knot_parameters, degree, knots); DenseIndex n = pts.cols(); MatrixType A = MatrixType::Zero(n,n); for (DenseIndex i=1; i qr(A); // Here, we are creating a temporary due to an Eigen issue. ControlPointVectorType ctrls = qr.solve(MatrixType(pts.transpose())).transpose(); return SplineType(knots, ctrls); } template template SplineType SplineFitting::Interpolate(const PointArrayType& pts, DenseIndex degree) { KnotVectorType chord_lengths; // knot parameters ChordLengths(pts, chord_lengths); return Interpolate(pts, degree, chord_lengths); } template template SplineType SplineFitting::InterpolateWithDerivatives(const PointArrayType& points, const PointArrayType& derivatives, const IndexArray& derivativeIndices, const unsigned int degree, const ParameterVectorType& parameters) { typedef typename SplineType::KnotVectorType::Scalar Scalar; typedef typename SplineType::ControlPointVectorType ControlPointVectorType; typedef Matrix MatrixType; const DenseIndex n = points.cols() + derivatives.cols(); KnotVectorType knots; KnotAveragingWithDerivatives(parameters, degree, derivativeIndices, knots); // fill matrix MatrixType A = MatrixType::Zero(n, n); // Use these dimensions for quicker populating, then transpose for solving. MatrixType b(points.rows(), n); DenseIndex startRow; DenseIndex derivativeStart; // End derivatives. if (derivativeIndices[0] == 0) { A.template block<1, 2>(1, 0) << -1, 1; Scalar y = (knots(degree + 1) - knots(0)) / degree; b.col(1) = y*derivatives.col(0); startRow = 2; derivativeStart = 1; } else { startRow = 1; derivativeStart = 0; } if (derivativeIndices[derivatives.cols() - 1] == points.cols() - 1) { A.template block<1, 2>(n - 2, n - 2) << -1, 1; Scalar y = (knots(knots.size() - 1) - knots(knots.size() - (degree + 2))) / degree; b.col(b.cols() - 2) = y*derivatives.col(derivatives.cols() - 1); } DenseIndex row = startRow; DenseIndex derivativeIndex = derivativeStart; for (DenseIndex i = 1; i < parameters.size() - 1; ++i) { const DenseIndex span = SplineType::Span(parameters[i], degree, knots); if (derivativeIndex < derivativeIndices.size() && derivativeIndices[derivativeIndex] == i) { A.block(row, span - degree, 2, degree + 1) = SplineType::BasisFunctionDerivatives(parameters[i], 1, degree, knots); b.col(row++) = points.col(i); b.col(row++) = derivatives.col(derivativeIndex++); } else { A.row(row).segment(span - degree, degree + 1) = SplineType::BasisFunctions(parameters[i], degree, knots); b.col(row++) = points.col(i); } } b.col(0) = points.col(0); b.col(b.cols() - 1) = points.col(points.cols() - 1); A(0,0) = 1; A(n - 1, n - 1) = 1; // Solve FullPivLU lu(A); ControlPointVectorType controlPoints = lu.solve(MatrixType(b.transpose())).transpose(); SplineType spline(knots, controlPoints); return spline; } template template SplineType SplineFitting::InterpolateWithDerivatives(const PointArrayType& points, const PointArrayType& derivatives, const IndexArray& derivativeIndices, const unsigned int degree) { ParameterVectorType parameters; ChordLengths(points, parameters); return InterpolateWithDerivatives(points, derivatives, derivativeIndices, degree, parameters); } } #endif // EIGEN_SPLINE_FITTING_H RcppEigen/inst/include/unsupported/Eigen/src/Splines/Spline.h0000644000176200001440000004360314562417221023766 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 20010-2011 Hauke Heibel // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SPLINE_H #define EIGEN_SPLINE_H #include "SplineFwd.h" namespace Eigen { /** * \ingroup Splines_Module * \class Spline * \brief A class representing multi-dimensional spline curves. * * The class represents B-splines with non-uniform knot vectors. Each control * point of the B-spline is associated with a basis function * \f{align*} * C(u) & = \sum_{i=0}^{n}N_{i,p}(u)P_i * \f} * * \tparam _Scalar The underlying data type (typically float or double) * \tparam _Dim The curve dimension (e.g. 2 or 3) * \tparam _Degree Per default set to Dynamic; could be set to the actual desired * degree for optimization purposes (would result in stack allocation * of several temporary variables). **/ template class Spline { public: typedef _Scalar Scalar; /*!< The spline curve's scalar type. */ enum { Dimension = _Dim /*!< The spline curve's dimension. */ }; enum { Degree = _Degree /*!< The spline curve's degree. */ }; /** \brief The point type the spline is representing. */ typedef typename SplineTraits::PointType PointType; /** \brief The data type used to store knot vectors. */ typedef typename SplineTraits::KnotVectorType KnotVectorType; /** \brief The data type used to store parameter vectors. */ typedef typename SplineTraits::ParameterVectorType ParameterVectorType; /** \brief The data type used to store non-zero basis functions. */ typedef typename SplineTraits::BasisVectorType BasisVectorType; /** \brief The data type used to store the values of the basis function derivatives. */ typedef typename SplineTraits::BasisDerivativeType BasisDerivativeType; /** \brief The data type representing the spline's control points. */ typedef typename SplineTraits::ControlPointVectorType ControlPointVectorType; /** * \brief Creates a (constant) zero spline. * For Splines with dynamic degree, the resulting degree will be 0. **/ Spline() : m_knots(1, (Degree==Dynamic ? 2 : 2*Degree+2)) , m_ctrls(ControlPointVectorType::Zero(Dimension,(Degree==Dynamic ? 1 : Degree+1))) { // in theory this code can go to the initializer list but it will get pretty // much unreadable ... enum { MinDegree = (Degree==Dynamic ? 0 : Degree) }; m_knots.template segment(0) = Array::Zero(); m_knots.template segment(MinDegree+1) = Array::Ones(); } /** * \brief Creates a spline from a knot vector and control points. * \param knots The spline's knot vector. * \param ctrls The spline's control point vector. **/ template Spline(const OtherVectorType& knots, const OtherArrayType& ctrls) : m_knots(knots), m_ctrls(ctrls) {} /** * \brief Copy constructor for splines. * \param spline The input spline. **/ template Spline(const Spline& spline) : m_knots(spline.knots()), m_ctrls(spline.ctrls()) {} /** * \brief Returns the knots of the underlying spline. **/ const KnotVectorType& knots() const { return m_knots; } /** * \brief Returns the ctrls of the underlying spline. **/ const ControlPointVectorType& ctrls() const { return m_ctrls; } /** * \brief Returns the spline value at a given site \f$u\f$. * * The function returns * \f{align*} * C(u) & = \sum_{i=0}^{n}N_{i,p}P_i * \f} * * \param u Parameter \f$u \in [0;1]\f$ at which the spline is evaluated. * \return The spline value at the given location \f$u\f$. **/ PointType operator()(Scalar u) const; /** * \brief Evaluation of spline derivatives of up-to given order. * * The function returns * \f{align*} * \frac{d^i}{du^i}C(u) & = \sum_{i=0}^{n} \frac{d^i}{du^i} N_{i,p}(u)P_i * \f} * for i ranging between 0 and order. * * \param u Parameter \f$u \in [0;1]\f$ at which the spline derivative is evaluated. * \param order The order up to which the derivatives are computed. **/ typename SplineTraits::DerivativeType derivatives(Scalar u, DenseIndex order) const; /** * \copydoc Spline::derivatives * Using the template version of this function is more efficieent since * temporary objects are allocated on the stack whenever this is possible. **/ template typename SplineTraits::DerivativeType derivatives(Scalar u, DenseIndex order = DerivativeOrder) const; /** * \brief Computes the non-zero basis functions at the given site. * * Splines have local support and a point from their image is defined * by exactly \f$p+1\f$ control points \f$P_i\f$ where \f$p\f$ is the * spline degree. * * This function computes the \f$p+1\f$ non-zero basis function values * for a given parameter value \f$u\f$. It returns * \f{align*}{ * N_{i,p}(u), \hdots, N_{i+p+1,p}(u) * \f} * * \param u Parameter \f$u \in [0;1]\f$ at which the non-zero basis functions * are computed. **/ typename SplineTraits::BasisVectorType basisFunctions(Scalar u) const; /** * \brief Computes the non-zero spline basis function derivatives up to given order. * * The function computes * \f{align*}{ * \frac{d^i}{du^i} N_{i,p}(u), \hdots, \frac{d^i}{du^i} N_{i+p+1,p}(u) * \f} * with i ranging from 0 up to the specified order. * * \param u Parameter \f$u \in [0;1]\f$ at which the non-zero basis function * derivatives are computed. * \param order The order up to which the basis function derivatives are computes. **/ typename SplineTraits::BasisDerivativeType basisFunctionDerivatives(Scalar u, DenseIndex order) const; /** * \copydoc Spline::basisFunctionDerivatives * Using the template version of this function is more efficieent since * temporary objects are allocated on the stack whenever this is possible. **/ template typename SplineTraits::BasisDerivativeType basisFunctionDerivatives(Scalar u, DenseIndex order = DerivativeOrder) const; /** * \brief Returns the spline degree. **/ DenseIndex degree() const; /** * \brief Returns the span within the knot vector in which u is falling. * \param u The site for which the span is determined. **/ DenseIndex span(Scalar u) const; /** * \brief Computes the span within the provided knot vector in which u is falling. **/ static DenseIndex Span(typename SplineTraits::Scalar u, DenseIndex degree, const typename SplineTraits::KnotVectorType& knots); /** * \brief Returns the spline's non-zero basis functions. * * The function computes and returns * \f{align*}{ * N_{i,p}(u), \hdots, N_{i+p+1,p}(u) * \f} * * \param u The site at which the basis functions are computed. * \param degree The degree of the underlying spline. * \param knots The underlying spline's knot vector. **/ static BasisVectorType BasisFunctions(Scalar u, DenseIndex degree, const KnotVectorType& knots); /** * \copydoc Spline::basisFunctionDerivatives * \param degree The degree of the underlying spline * \param knots The underlying spline's knot vector. **/ static BasisDerivativeType BasisFunctionDerivatives( const Scalar u, const DenseIndex order, const DenseIndex degree, const KnotVectorType& knots); private: KnotVectorType m_knots; /*!< Knot vector. */ ControlPointVectorType m_ctrls; /*!< Control points. */ template static void BasisFunctionDerivativesImpl( const typename Spline<_Scalar, _Dim, _Degree>::Scalar u, const DenseIndex order, const DenseIndex p, const typename Spline<_Scalar, _Dim, _Degree>::KnotVectorType& U, DerivativeType& N_); }; template DenseIndex Spline<_Scalar, _Dim, _Degree>::Span( typename SplineTraits< Spline<_Scalar, _Dim, _Degree> >::Scalar u, DenseIndex degree, const typename SplineTraits< Spline<_Scalar, _Dim, _Degree> >::KnotVectorType& knots) { // Piegl & Tiller, "The NURBS Book", A2.1 (p. 68) if (u <= knots(0)) return degree; const Scalar* pos = std::upper_bound(knots.data()+degree-1, knots.data()+knots.size()-degree-1, u); return static_cast( std::distance(knots.data(), pos) - 1 ); } template typename Spline<_Scalar, _Dim, _Degree>::BasisVectorType Spline<_Scalar, _Dim, _Degree>::BasisFunctions( typename Spline<_Scalar, _Dim, _Degree>::Scalar u, DenseIndex degree, const typename Spline<_Scalar, _Dim, _Degree>::KnotVectorType& knots) { const DenseIndex p = degree; const DenseIndex i = Spline::Span(u, degree, knots); const KnotVectorType& U = knots; BasisVectorType left(p+1); left(0) = Scalar(0); BasisVectorType right(p+1); right(0) = Scalar(0); VectorBlock(left,1,p) = u - VectorBlock(U,i+1-p,p).reverse(); VectorBlock(right,1,p) = VectorBlock(U,i+1,p) - u; BasisVectorType N(1,p+1); N(0) = Scalar(1); for (DenseIndex j=1; j<=p; ++j) { Scalar saved = Scalar(0); for (DenseIndex r=0; r DenseIndex Spline<_Scalar, _Dim, _Degree>::degree() const { if (_Degree == Dynamic) return m_knots.size() - m_ctrls.cols() - 1; else return _Degree; } template DenseIndex Spline<_Scalar, _Dim, _Degree>::span(Scalar u) const { return Spline::Span(u, degree(), knots()); } template typename Spline<_Scalar, _Dim, _Degree>::PointType Spline<_Scalar, _Dim, _Degree>::operator()(Scalar u) const { enum { Order = SplineTraits::OrderAtCompileTime }; const DenseIndex span = this->span(u); const DenseIndex p = degree(); const BasisVectorType basis_funcs = basisFunctions(u); const Replicate ctrl_weights(basis_funcs); const Block ctrl_pts(ctrls(),0,span-p,Dimension,p+1); return (ctrl_weights * ctrl_pts).rowwise().sum(); } /* --------------------------------------------------------------------------------------------- */ template void derivativesImpl(const SplineType& spline, typename SplineType::Scalar u, DenseIndex order, DerivativeType& der) { enum { Dimension = SplineTraits::Dimension }; enum { Order = SplineTraits::OrderAtCompileTime }; enum { DerivativeOrder = DerivativeType::ColsAtCompileTime }; typedef typename SplineTraits::ControlPointVectorType ControlPointVectorType; typedef typename SplineTraits::BasisDerivativeType BasisDerivativeType; typedef typename BasisDerivativeType::ConstRowXpr BasisDerivativeRowXpr; const DenseIndex p = spline.degree(); const DenseIndex span = spline.span(u); const DenseIndex n = (std::min)(p, order); der.resize(Dimension,n+1); // Retrieve the basis function derivatives up to the desired order... const BasisDerivativeType basis_func_ders = spline.template basisFunctionDerivatives(u, n+1); // ... and perform the linear combinations of the control points. for (DenseIndex der_order=0; der_order ctrl_weights( basis_func_ders.row(der_order) ); const Block ctrl_pts(spline.ctrls(),0,span-p,Dimension,p+1); der.col(der_order) = (ctrl_weights * ctrl_pts).rowwise().sum(); } } template typename SplineTraits< Spline<_Scalar, _Dim, _Degree> >::DerivativeType Spline<_Scalar, _Dim, _Degree>::derivatives(Scalar u, DenseIndex order) const { typename SplineTraits< Spline >::DerivativeType res; derivativesImpl(*this, u, order, res); return res; } template template typename SplineTraits< Spline<_Scalar, _Dim, _Degree>, DerivativeOrder >::DerivativeType Spline<_Scalar, _Dim, _Degree>::derivatives(Scalar u, DenseIndex order) const { typename SplineTraits< Spline, DerivativeOrder >::DerivativeType res; derivativesImpl(*this, u, order, res); return res; } template typename SplineTraits< Spline<_Scalar, _Dim, _Degree> >::BasisVectorType Spline<_Scalar, _Dim, _Degree>::basisFunctions(Scalar u) const { return Spline::BasisFunctions(u, degree(), knots()); } /* --------------------------------------------------------------------------------------------- */ template template void Spline<_Scalar, _Dim, _Degree>::BasisFunctionDerivativesImpl( const typename Spline<_Scalar, _Dim, _Degree>::Scalar u, const DenseIndex order, const DenseIndex p, const typename Spline<_Scalar, _Dim, _Degree>::KnotVectorType& U, DerivativeType& N_) { typedef Spline<_Scalar, _Dim, _Degree> SplineType; enum { Order = SplineTraits::OrderAtCompileTime }; const DenseIndex span = SplineType::Span(u, p, U); const DenseIndex n = (std::min)(p, order); N_.resize(n+1, p+1); BasisVectorType left = BasisVectorType::Zero(p+1); BasisVectorType right = BasisVectorType::Zero(p+1); Matrix ndu(p+1,p+1); Scalar saved, temp; // FIXME These were double instead of Scalar. Was there a reason for that? ndu(0,0) = 1.0; DenseIndex j; for (j=1; j<=p; ++j) { left[j] = u-U[span+1-j]; right[j] = U[span+j]-u; saved = 0.0; for (DenseIndex r=0; r(saved+right[r+1] * temp); saved = left[j-r] * temp; } ndu(j,j) = static_cast(saved); } for (j = p; j>=0; --j) N_(0,j) = ndu(j,p); // Compute the derivatives DerivativeType a(n+1,p+1); DenseIndex r=0; for (; r<=p; ++r) { DenseIndex s1,s2; s1 = 0; s2 = 1; // alternate rows in array a a(0,0) = 1.0; // Compute the k-th derivative for (DenseIndex k=1; k<=static_cast(n); ++k) { Scalar d = 0.0; DenseIndex rk,pk,j1,j2; rk = r-k; pk = p-k; if (r>=k) { a(s2,0) = a(s1,0)/ndu(pk+1,rk); d = a(s2,0)*ndu(rk,pk); } if (rk>=-1) j1 = 1; else j1 = -rk; if (r-1 <= pk) j2 = k-1; else j2 = p-r; for (j=j1; j<=j2; ++j) { a(s2,j) = (a(s1,j)-a(s1,j-1))/ndu(pk+1,rk+j); d += a(s2,j)*ndu(rk+j,pk); } if (r<=pk) { a(s2,k) = -a(s1,k-1)/ndu(pk+1,r); d += a(s2,k)*ndu(r,pk); } N_(k,r) = static_cast(d); j = s1; s1 = s2; s2 = j; // Switch rows } } /* Multiply through by the correct factors */ /* (Eq. [2.9]) */ r = p; for (DenseIndex k=1; k<=static_cast(n); ++k) { for (j=p; j>=0; --j) N_(k,j) *= r; r *= p-k; } } template typename SplineTraits< Spline<_Scalar, _Dim, _Degree> >::BasisDerivativeType Spline<_Scalar, _Dim, _Degree>::basisFunctionDerivatives(Scalar u, DenseIndex order) const { typename SplineTraits >::BasisDerivativeType der; BasisFunctionDerivativesImpl(u, order, degree(), knots(), der); return der; } template template typename SplineTraits< Spline<_Scalar, _Dim, _Degree>, DerivativeOrder >::BasisDerivativeType Spline<_Scalar, _Dim, _Degree>::basisFunctionDerivatives(Scalar u, DenseIndex order) const { typename SplineTraits< Spline<_Scalar, _Dim, _Degree>, DerivativeOrder >::BasisDerivativeType der; BasisFunctionDerivativesImpl(u, order, degree(), knots(), der); return der; } template typename SplineTraits >::BasisDerivativeType Spline<_Scalar, _Dim, _Degree>::BasisFunctionDerivatives( const typename Spline<_Scalar, _Dim, _Degree>::Scalar u, const DenseIndex order, const DenseIndex degree, const typename Spline<_Scalar, _Dim, _Degree>::KnotVectorType& knots) { typename SplineTraits::BasisDerivativeType der; BasisFunctionDerivativesImpl(u, order, degree, knots, der); return der; } } #endif // EIGEN_SPLINE_H RcppEigen/inst/include/unsupported/Eigen/src/Splines/SplineFwd.h0000644000176200001440000001033014562417221024416 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 20010-2011 Hauke Heibel // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SPLINES_FWD_H #define EIGEN_SPLINES_FWD_H #include "../../../../Eigen/Core" namespace Eigen { template class Spline; template < typename SplineType, int DerivativeOrder = Dynamic > struct SplineTraits {}; /** * \ingroup Splines_Module * \brief Compile-time attributes of the Spline class for Dynamic degree. **/ template struct SplineTraits< Spline<_Scalar, _Dim, _Degree>, Dynamic > { typedef _Scalar Scalar; /*!< The spline curve's scalar type. */ enum { Dimension = _Dim /*!< The spline curve's dimension. */ }; enum { Degree = _Degree /*!< The spline curve's degree. */ }; enum { OrderAtCompileTime = _Degree==Dynamic ? Dynamic : _Degree+1 /*!< The spline curve's order at compile-time. */ }; enum { NumOfDerivativesAtCompileTime = OrderAtCompileTime /*!< The number of derivatives defined for the current spline. */ }; enum { DerivativeMemoryLayout = Dimension==1 ? RowMajor : ColMajor /*!< The derivative type's memory layout. */ }; /** \brief The data type used to store non-zero basis functions. */ typedef Array BasisVectorType; /** \brief The data type used to store the values of the basis function derivatives. */ typedef Array BasisDerivativeType; /** \brief The data type used to store the spline's derivative values. */ typedef Array DerivativeType; /** \brief The point type the spline is representing. */ typedef Array PointType; /** \brief The data type used to store knot vectors. */ typedef Array KnotVectorType; /** \brief The data type used to store parameter vectors. */ typedef Array ParameterVectorType; /** \brief The data type representing the spline's control points. */ typedef Array ControlPointVectorType; }; /** * \ingroup Splines_Module * \brief Compile-time attributes of the Spline class for fixed degree. * * The traits class inherits all attributes from the SplineTraits of Dynamic degree. **/ template < typename _Scalar, int _Dim, int _Degree, int _DerivativeOrder > struct SplineTraits< Spline<_Scalar, _Dim, _Degree>, _DerivativeOrder > : public SplineTraits< Spline<_Scalar, _Dim, _Degree> > { enum { OrderAtCompileTime = _Degree==Dynamic ? Dynamic : _Degree+1 /*!< The spline curve's order at compile-time. */ }; enum { NumOfDerivativesAtCompileTime = _DerivativeOrder==Dynamic ? Dynamic : _DerivativeOrder+1 /*!< The number of derivatives defined for the current spline. */ }; enum { DerivativeMemoryLayout = _Dim==1 ? RowMajor : ColMajor /*!< The derivative type's memory layout. */ }; /** \brief The data type used to store the values of the basis function derivatives. */ typedef Array<_Scalar,Dynamic,Dynamic,RowMajor,NumOfDerivativesAtCompileTime,OrderAtCompileTime> BasisDerivativeType; /** \brief The data type used to store the spline's derivative values. */ typedef Array<_Scalar,_Dim,Dynamic,DerivativeMemoryLayout,_Dim,NumOfDerivativesAtCompileTime> DerivativeType; }; /** \brief 2D float B-spline with dynamic degree. */ typedef Spline Spline2f; /** \brief 3D float B-spline with dynamic degree. */ typedef Spline Spline3f; /** \brief 2D double B-spline with dynamic degree. */ typedef Spline Spline2d; /** \brief 3D double B-spline with dynamic degree. */ typedef Spline Spline3d; } #endif // EIGEN_SPLINES_FWD_H RcppEigen/inst/include/unsupported/Eigen/src/MoreVectorization/0000755000176200001440000000000014107270226024420 5ustar liggesusersRcppEigen/inst/include/unsupported/Eigen/src/MoreVectorization/MathFunctions.h0000644000176200001440000000573314107270226027363 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009 Rohit Garg // Copyright (C) 2009 Benoit Jacob // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_MOREVECTORIZATION_MATHFUNCTIONS_H #define EIGEN_MOREVECTORIZATION_MATHFUNCTIONS_H namespace Eigen { namespace internal { /** \internal \returns the arcsin of \a a (coeff-wise) */ template inline static Packet pasin(Packet a) { return std::asin(a); } #ifdef EIGEN_VECTORIZE_SSE template<> EIGEN_DONT_INLINE Packet4f pasin(Packet4f x) { _EIGEN_DECLARE_CONST_Packet4f(half, 0.5); _EIGEN_DECLARE_CONST_Packet4f(minus_half, -0.5); _EIGEN_DECLARE_CONST_Packet4f(3half, 1.5); _EIGEN_DECLARE_CONST_Packet4f_FROM_INT(sign_mask, 0x80000000); _EIGEN_DECLARE_CONST_Packet4f(pi, 3.141592654); _EIGEN_DECLARE_CONST_Packet4f(pi_over_2, 3.141592654*0.5); _EIGEN_DECLARE_CONST_Packet4f(asin1, 4.2163199048E-2); _EIGEN_DECLARE_CONST_Packet4f(asin2, 2.4181311049E-2); _EIGEN_DECLARE_CONST_Packet4f(asin3, 4.5470025998E-2); _EIGEN_DECLARE_CONST_Packet4f(asin4, 7.4953002686E-2); _EIGEN_DECLARE_CONST_Packet4f(asin5, 1.6666752422E-1); Packet4f a = pabs(x);//got the absolute value Packet4f sign_bit= _mm_and_ps(x, p4f_sign_mask);//extracted the sign bit Packet4f z1,z2;//will need them during computation //will compute the two branches for asin //so first compare with half Packet4f branch_mask= _mm_cmpgt_ps(a, p4f_half);//this is to select which branch to take //both will be taken, and finally results will be merged //the branch for values >0.5 { //the core series expansion z1=pmadd(p4f_minus_half,a,p4f_half); Packet4f x1=psqrt(z1); Packet4f s1=pmadd(p4f_asin1, z1, p4f_asin2); Packet4f s2=pmadd(s1, z1, p4f_asin3); Packet4f s3=pmadd(s2,z1, p4f_asin4); Packet4f s4=pmadd(s3,z1, p4f_asin5); Packet4f temp=pmul(s4,z1);//not really a madd but a mul by z so that the next term can be a madd z1=pmadd(temp,x1,x1); z1=padd(z1,z1); z1=psub(p4f_pi_over_2,z1); } { //the core series expansion Packet4f x2=a; z2=pmul(x2,x2); Packet4f s1=pmadd(p4f_asin1, z2, p4f_asin2); Packet4f s2=pmadd(s1, z2, p4f_asin3); Packet4f s3=pmadd(s2,z2, p4f_asin4); Packet4f s4=pmadd(s3,z2, p4f_asin5); Packet4f temp=pmul(s4,z2);//not really a madd but a mul by z so that the next term can be a madd z2=pmadd(temp,x2,x2); } /* select the correct result from the two branch evaluations */ z1 = _mm_and_ps(branch_mask, z1); z2 = _mm_andnot_ps(branch_mask, z2); Packet4f z = _mm_or_ps(z1,z2); /* update the sign */ return _mm_xor_ps(z, sign_bit); } #endif // EIGEN_VECTORIZE_SSE } // end namespace internal } // end namespace Eigen #endif // EIGEN_MOREVECTORIZATION_MATHFUNCTIONS_H RcppEigen/inst/include/unsupported/Eigen/src/SparseExtra/0000755000176200001440000000000014562417221023201 5ustar liggesusersRcppEigen/inst/include/unsupported/Eigen/src/SparseExtra/MarketIO.h0000644000176200001440000002034014562417221025024 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2011 Gael Guennebaud // Copyright (C) 2012 Desire NUENTSA WAKAM // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SPARSE_MARKET_IO_H #define EIGEN_SPARSE_MARKET_IO_H #include #include namespace Eigen { namespace internal { template inline void GetMarketLine (const char* line, StorageIndex& i, StorageIndex& j, Scalar& value) { std::stringstream sline(line); sline >> i >> j >> value; } template<> inline void GetMarketLine (const char* line, int& i, int& j, float& value) { std::sscanf(line, "%d %d %g", &i, &j, &value); } template<> inline void GetMarketLine (const char* line, int& i, int& j, double& value) { std::sscanf(line, "%d %d %lg", &i, &j, &value); } template<> inline void GetMarketLine (const char* line, int& i, int& j, std::complex& value) { std::sscanf(line, "%d %d %g %g", &i, &j, &numext::real_ref(value), &numext::imag_ref(value)); } template<> inline void GetMarketLine (const char* line, int& i, int& j, std::complex& value) { std::sscanf(line, "%d %d %lg %lg", &i, &j, &numext::real_ref(value), &numext::imag_ref(value)); } template inline void GetMarketLine (const char* line, StorageIndex& i, StorageIndex& j, std::complex& value) { std::stringstream sline(line); Scalar valR, valI; sline >> i >> j >> valR >> valI; value = std::complex(valR,valI); } template inline void GetVectorElt (const std::string& line, RealScalar& val) { std::istringstream newline(line); newline >> val; } template inline void GetVectorElt (const std::string& line, std::complex& val) { RealScalar valR, valI; std::istringstream newline(line); newline >> valR >> valI; val = std::complex(valR, valI); } template inline void putMarketHeader(std::string& header,int sym) { header= "%%MatrixMarket matrix coordinate "; if(internal::is_same >::value || internal::is_same >::value) { header += " complex"; if(sym == Symmetric) header += " symmetric"; else if (sym == SelfAdjoint) header += " Hermitian"; else header += " general"; } else { header += " real"; if(sym == Symmetric) header += " symmetric"; else header += " general"; } } template inline void PutMatrixElt(Scalar value, StorageIndex row, StorageIndex col, std::ofstream& out) { out << row << " "<< col << " " << value << "\n"; } template inline void PutMatrixElt(std::complex value, StorageIndex row, StorageIndex col, std::ofstream& out) { out << row << " " << col << " " << value.real() << " " << value.imag() << "\n"; } template inline void putVectorElt(Scalar value, std::ofstream& out) { out << value << "\n"; } template inline void putVectorElt(std::complex value, std::ofstream& out) { out << value.real() << " " << value.imag()<< "\n"; } } // end namespace internal inline bool getMarketHeader(const std::string& filename, int& sym, bool& iscomplex, bool& isvector) { sym = 0; iscomplex = false; isvector = false; std::ifstream in(filename.c_str(),std::ios::in); if(!in) return false; std::string line; // The matrix header is always the first line in the file std::getline(in, line); eigen_assert(in.good()); std::stringstream fmtline(line); std::string substr[5]; fmtline>> substr[0] >> substr[1] >> substr[2] >> substr[3] >> substr[4]; if(substr[2].compare("array") == 0) isvector = true; if(substr[3].compare("complex") == 0) iscomplex = true; if(substr[4].compare("symmetric") == 0) sym = Symmetric; else if (substr[4].compare("Hermitian") == 0) sym = SelfAdjoint; return true; } template bool loadMarket(SparseMatrixType& mat, const std::string& filename) { typedef typename SparseMatrixType::Scalar Scalar; typedef typename SparseMatrixType::StorageIndex StorageIndex; std::ifstream input(filename.c_str(),std::ios::in); if(!input) return false; char rdbuffer[4096]; input.rdbuf()->pubsetbuf(rdbuffer, 4096); const int maxBuffersize = 2048; char buffer[maxBuffersize]; bool readsizes = false; typedef Triplet T; std::vector elements; Index M(-1), N(-1), NNZ(-1); Index count = 0; while(input.getline(buffer, maxBuffersize)) { // skip comments //NOTE An appropriate test should be done on the header to get the symmetry if(buffer[0]=='%') continue; if(!readsizes) { std::stringstream line(buffer); line >> M >> N >> NNZ; if(M > 0 && N > 0) { readsizes = true; mat.resize(M,N); mat.reserve(NNZ); } } else { StorageIndex i(-1), j(-1); Scalar value; internal::GetMarketLine(buffer, i, j, value); i--; j--; if(i>=0 && j>=0 && i bool loadMarketVector(VectorType& vec, const std::string& filename) { typedef typename VectorType::Scalar Scalar; std::ifstream in(filename.c_str(), std::ios::in); if(!in) return false; std::string line; int n(0), col(0); do { // Skip comments std::getline(in, line); eigen_assert(in.good()); } while (line[0] == '%'); std::istringstream newline(line); newline >> n >> col; eigen_assert(n>0 && col>0); vec.resize(n); int i = 0; Scalar value; while ( std::getline(in, line) && (i < n) ){ internal::GetVectorElt(line, value); vec(i++) = value; } in.close(); if (i!=n){ std::cerr<< "Unable to read all elements from file " << filename << "\n"; return false; } return true; } template bool saveMarket(const SparseMatrixType& mat, const std::string& filename, int sym = 0) { typedef typename SparseMatrixType::Scalar Scalar; typedef typename SparseMatrixType::RealScalar RealScalar; std::ofstream out(filename.c_str(),std::ios::out); if(!out) return false; out.flags(std::ios_base::scientific); out.precision(std::numeric_limits::digits10 + 2); std::string header; internal::putMarketHeader(header, sym); out << header << std::endl; out << mat.rows() << " " << mat.cols() << " " << mat.nonZeros() << "\n"; int count = 0; for(int j=0; j bool saveMarketVector (const VectorType& vec, const std::string& filename) { typedef typename VectorType::Scalar Scalar; typedef typename VectorType::RealScalar RealScalar; std::ofstream out(filename.c_str(),std::ios::out); if(!out) return false; out.flags(std::ios_base::scientific); out.precision(std::numeric_limits::digits10 + 2); if(internal::is_same >::value || internal::is_same >::value) out << "%%MatrixMarket matrix array complex general\n"; else out << "%%MatrixMarket matrix array real general\n"; out << vec.size() << " "<< 1 << "\n"; for (int i=0; i < vec.size(); i++){ internal::putVectorElt(vec(i), out); } out.close(); return true; } } // end namespace Eigen #endif // EIGEN_SPARSE_MARKET_IO_H RcppEigen/inst/include/unsupported/Eigen/src/SparseExtra/MatrixMarketIterator.h0000644000176200001440000001677114562417221027510 0ustar liggesusers // This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2012 Desire NUENTSA WAKAM // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_BROWSE_MATRICES_H #define EIGEN_BROWSE_MATRICES_H namespace Eigen { enum { SPD = 0x100, NonSymmetric = 0x0 }; /** * @brief Iterator to browse matrices from a specified folder * * This is used to load all the matrices from a folder. * The matrices should be in Matrix Market format * It is assumed that the matrices are named as matname.mtx * and matname_SPD.mtx if the matrix is Symmetric and positive definite (or Hermitian) * The right hand side vectors are loaded as well, if they exist. * They should be named as matname_b.mtx. * Note that the right hand side for a SPD matrix is named as matname_SPD_b.mtx * * Sometimes a reference solution is available. In this case, it should be named as matname_x.mtx * * Sample code * \code * * \endcode * * \tparam Scalar The scalar type */ template class MatrixMarketIterator { typedef typename NumTraits::Real RealScalar; public: typedef Matrix VectorType; typedef SparseMatrix MatrixType; public: MatrixMarketIterator(const std::string &folder) : m_sym(0), m_isvalid(false), m_matIsLoaded(false), m_hasRhs(false), m_hasrefX(false), m_folder(folder) { m_folder_id = opendir(folder.c_str()); if(m_folder_id) Getnextvalidmatrix(); } ~MatrixMarketIterator() { if (m_folder_id) closedir(m_folder_id); } inline MatrixMarketIterator& operator++() { m_matIsLoaded = false; m_hasrefX = false; m_hasRhs = false; Getnextvalidmatrix(); return *this; } inline operator bool() const { return m_isvalid;} /** Return the sparse matrix corresponding to the current file */ inline MatrixType& matrix() { // Read the matrix if (m_matIsLoaded) return m_mat; std::string matrix_file = m_folder + "/" + m_matname + ".mtx"; if ( !loadMarket(m_mat, matrix_file)) { std::cerr << "Warning loadMarket failed when loading \"" << matrix_file << "\"" << std::endl; m_matIsLoaded = false; return m_mat; } m_matIsLoaded = true; if (m_sym != NonSymmetric) { // Check whether we need to restore a full matrix: RealScalar diag_norm = m_mat.diagonal().norm(); RealScalar lower_norm = m_mat.template triangularView().norm(); RealScalar upper_norm = m_mat.template triangularView().norm(); if(lower_norm>diag_norm && upper_norm==diag_norm) { // only the lower part is stored MatrixType tmp(m_mat); m_mat = tmp.template selfadjointView(); } else if(upper_norm>diag_norm && lower_norm==diag_norm) { // only the upper part is stored MatrixType tmp(m_mat); m_mat = tmp.template selfadjointView(); } } return m_mat; } /** Return the right hand side corresponding to the current matrix. * If the rhs file is not provided, a random rhs is generated */ inline VectorType& rhs() { // Get the right hand side if (m_hasRhs) return m_rhs; std::string rhs_file; rhs_file = m_folder + "/" + m_matname + "_b.mtx"; // The pattern is matname_b.mtx m_hasRhs = Fileexists(rhs_file); if (m_hasRhs) { m_rhs.resize(m_mat.cols()); m_hasRhs = loadMarketVector(m_rhs, rhs_file); } if (!m_hasRhs) { // Generate a random right hand side if (!m_matIsLoaded) this->matrix(); m_refX.resize(m_mat.cols()); m_refX.setRandom(); m_rhs = m_mat * m_refX; m_hasrefX = true; m_hasRhs = true; } return m_rhs; } /** Return a reference solution * If it is not provided and if the right hand side is not available * then refX is randomly generated such that A*refX = b * where A and b are the matrix and the rhs. * Note that when a rhs is provided, refX is not available */ inline VectorType& refX() { // Check if a reference solution is provided if (m_hasrefX) return m_refX; std::string lhs_file; lhs_file = m_folder + "/" + m_matname + "_x.mtx"; m_hasrefX = Fileexists(lhs_file); if (m_hasrefX) { m_refX.resize(m_mat.cols()); m_hasrefX = loadMarketVector(m_refX, lhs_file); } else m_refX.resize(0); return m_refX; } inline std::string& matname() { return m_matname; } inline int sym() { return m_sym; } bool hasRhs() {return m_hasRhs; } bool hasrefX() {return m_hasrefX; } bool isFolderValid() { return bool(m_folder_id); } protected: inline bool Fileexists(std::string file) { std::ifstream file_id(file.c_str()); if (!file_id.good() ) { return false; } else { file_id.close(); return true; } } void Getnextvalidmatrix( ) { m_isvalid = false; // Here, we return with the next valid matrix in the folder while ( (m_curs_id = readdir(m_folder_id)) != NULL) { m_isvalid = false; std::string curfile; curfile = m_folder + "/" + m_curs_id->d_name; // Discard if it is a folder #if !(defined(__sun) || defined(_AIX) || defined(__hpux) || defined(__sgi) || defined(__HAIKU__)) if (m_curs_id->d_type == DT_DIR) continue; //FIXME This may not be available on non BSD systems #endif // struct stat st_buf; // stat (curfile.c_str(), &st_buf); // if (S_ISDIR(st_buf.st_mode)) continue; // Determine from the header if it is a matrix or a right hand side bool isvector,iscomplex=false; if(!getMarketHeader(curfile,m_sym,iscomplex,isvector)) continue; if(isvector) continue; if (!iscomplex) { if(internal::is_same >::value || internal::is_same >::value) continue; } if (iscomplex) { if(internal::is_same::value || internal::is_same::value) continue; } // Get the matrix name std::string filename = m_curs_id->d_name; m_matname = filename.substr(0, filename.length()-4); // Find if the matrix is SPD size_t found = m_matname.find("SPD"); if( (found!=std::string::npos) && (m_sym != NonSymmetric) ) m_sym = SPD; m_isvalid = true; break; } } int m_sym; // Symmetry of the matrix MatrixType m_mat; // Current matrix VectorType m_rhs; // Current vector VectorType m_refX; // The reference solution, if exists std::string m_matname; // Matrix Name bool m_isvalid; bool m_matIsLoaded; // Determine if the matrix has already been loaded from the file bool m_hasRhs; // The right hand side exists bool m_hasrefX; // A reference solution is provided std::string m_folder; DIR * m_folder_id; struct dirent *m_curs_id; }; } // end namespace Eigen #endif RcppEigen/inst/include/unsupported/Eigen/src/SparseExtra/RandomSetter.h0000644000176200001440000003020714562417221025763 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_RANDOMSETTER_H #define EIGEN_RANDOMSETTER_H #if defined(EIGEN_GOOGLEHASH_SUPPORT) // Ensure the ::google namespace exists, required for checking existence of // ::google::dense_hash_map and ::google::sparse_hash_map. namespace google {} #endif namespace Eigen { /** Represents a std::map * * \see RandomSetter */ template struct StdMapTraits { typedef int KeyType; typedef std::map Type; enum { IsSorted = 1 }; static void setInvalidKey(Type&, const KeyType&) {} }; #ifdef EIGEN_UNORDERED_MAP_SUPPORT /** Represents a std::unordered_map * * To use it you need to both define EIGEN_UNORDERED_MAP_SUPPORT and include the unordered_map header file * yourself making sure that unordered_map is defined in the std namespace. * * For instance, with current version of gcc you can either enable C++0x standard (-std=c++0x) or do: * \code * #include * #define EIGEN_UNORDERED_MAP_SUPPORT * namespace std { * using std::tr1::unordered_map; * } * \endcode * * \see RandomSetter */ template struct StdUnorderedMapTraits { typedef int KeyType; typedef std::unordered_map Type; enum { IsSorted = 0 }; static void setInvalidKey(Type&, const KeyType&) {} }; #endif // EIGEN_UNORDERED_MAP_SUPPORT #if defined(EIGEN_GOOGLEHASH_SUPPORT) namespace google { // Namespace work-around, since sometimes dense_hash_map and sparse_hash_map // are in the global namespace, and other times they are under ::google. using namespace ::google; template struct DenseHashMap { typedef dense_hash_map type; }; template struct SparseHashMap { typedef sparse_hash_map type; }; } // namespace google /** Represents a google::dense_hash_map * * \see RandomSetter */ template struct GoogleDenseHashMapTraits { typedef int KeyType; typedef typename google::DenseHashMap::type Type; enum { IsSorted = 0 }; static void setInvalidKey(Type& map, const KeyType& k) { map.set_empty_key(k); } }; /** Represents a google::sparse_hash_map * * \see RandomSetter */ template struct GoogleSparseHashMapTraits { typedef int KeyType; typedef typename google::SparseHashMap::type Type; enum { IsSorted = 0 }; static void setInvalidKey(Type&, const KeyType&) {} }; #endif /** \class RandomSetter * * \brief The RandomSetter is a wrapper object allowing to set/update a sparse matrix with random access * * \tparam SparseMatrixType the type of the sparse matrix we are updating * \tparam MapTraits a traits class representing the map implementation used for the temporary sparse storage. * Its default value depends on the system. * \tparam OuterPacketBits defines the number of rows (or columns) manage by a single map object * as a power of two exponent. * * This class temporarily represents a sparse matrix object using a generic map implementation allowing for * efficient random access. The conversion from the compressed representation to a hash_map object is performed * in the RandomSetter constructor, while the sparse matrix is updated back at destruction time. This strategy * suggest the use of nested blocks as in this example: * * \code * SparseMatrix m(rows,cols); * { * RandomSetter > w(m); * // don't use m but w instead with read/write random access to the coefficients: * for(;;) * w(rand(),rand()) = rand; * } * // when w is deleted, the data are copied back to m * // and m is ready to use. * \endcode * * Since hash_map objects are not fully sorted, representing a full matrix as a single hash_map would * involve a big and costly sort to update the compressed matrix back. To overcome this issue, a RandomSetter * use multiple hash_map, each representing 2^OuterPacketBits columns or rows according to the storage order. * To reach optimal performance, this value should be adjusted according to the average number of nonzeros * per rows/columns. * * The possible values for the template parameter MapTraits are: * - \b StdMapTraits: corresponds to std::map. (does not perform very well) * - \b GnuHashMapTraits: corresponds to __gnu_cxx::hash_map (available only with GCC) * - \b GoogleDenseHashMapTraits: corresponds to google::dense_hash_map (best efficiency, reasonable memory consumption) * - \b GoogleSparseHashMapTraits: corresponds to google::sparse_hash_map (best memory consumption, relatively good performance) * * The default map implementation depends on the availability, and the preferred order is: * GoogleSparseHashMapTraits, GnuHashMapTraits, and finally StdMapTraits. * * For performance and memory consumption reasons it is highly recommended to use one of * Google's hash_map implementations. To enable the support for them, you must define * EIGEN_GOOGLEHASH_SUPPORT. This will include both and * for you. * * \see https://github.com/sparsehash/sparsehash */ template class MapTraits = #if defined(EIGEN_GOOGLEHASH_SUPPORT) GoogleDenseHashMapTraits #elif defined(_HASH_MAP) GnuHashMapTraits #else StdMapTraits #endif ,int OuterPacketBits = 6> class RandomSetter { typedef typename SparseMatrixType::Scalar Scalar; typedef typename SparseMatrixType::StorageIndex StorageIndex; struct ScalarWrapper { ScalarWrapper() : value(0) {} Scalar value; }; typedef typename MapTraits::KeyType KeyType; typedef typename MapTraits::Type HashMapType; static const int OuterPacketMask = (1 << OuterPacketBits) - 1; enum { SwapStorage = 1 - MapTraits::IsSorted, TargetRowMajor = (SparseMatrixType::Flags & RowMajorBit) ? 1 : 0, SetterRowMajor = SwapStorage ? 1-TargetRowMajor : TargetRowMajor }; public: /** Constructs a random setter object from the sparse matrix \a target * * Note that the initial value of \a target are imported. If you want to re-set * a sparse matrix from scratch, then you must set it to zero first using the * setZero() function. */ inline RandomSetter(SparseMatrixType& target) : mp_target(&target) { const Index outerSize = SwapStorage ? target.innerSize() : target.outerSize(); const Index innerSize = SwapStorage ? target.outerSize() : target.innerSize(); m_outerPackets = outerSize >> OuterPacketBits; if (outerSize&OuterPacketMask) m_outerPackets += 1; m_hashmaps = new HashMapType[m_outerPackets]; // compute number of bits needed to store inner indices Index aux = innerSize - 1; m_keyBitsOffset = 0; while (aux) { ++m_keyBitsOffset; aux = aux >> 1; } KeyType ik = (1<<(OuterPacketBits+m_keyBitsOffset)); for (Index k=0; k::setInvalidKey(m_hashmaps[k],ik); // insert current coeffs for (Index j=0; jouterSize(); ++j) for (typename SparseMatrixType::InnerIterator it(*mp_target,j); it; ++it) (*this)(TargetRowMajor?j:it.index(), TargetRowMajor?it.index():j) = it.value(); } /** Destructor updating back the sparse matrix target */ ~RandomSetter() { KeyType keyBitsMask = (1<setZero(); mp_target->makeCompressed(); mp_target->reserve(nonZeros()); Index prevOuter = -1; for (Index k=0; kfirst >> m_keyBitsOffset) + outerOffset; const Index inner = it->first & keyBitsMask; if (prevOuter!=outer) { for (Index j=prevOuter+1;j<=outer;++j) mp_target->startVec(j); prevOuter = outer; } mp_target->insertBackByOuterInner(outer, inner) = it->second.value; } } mp_target->finalize(); } else { VectorXi positions(mp_target->outerSize()); positions.setZero(); // pass 1 for (Index k=0; kfirst & keyBitsMask; ++positions[outer]; } } // prefix sum StorageIndex count = 0; for (Index j=0; jouterSize(); ++j) { StorageIndex tmp = positions[j]; mp_target->outerIndexPtr()[j] = count; positions[j] = count; count += tmp; } mp_target->makeCompressed(); mp_target->outerIndexPtr()[mp_target->outerSize()] = count; mp_target->resizeNonZeros(count); // pass 2 for (Index k=0; kfirst >> m_keyBitsOffset) + outerOffset; const Index outer = it->first & keyBitsMask; // sorted insertion // Note that we have to deal with at most 2^OuterPacketBits unsorted coefficients, // moreover those 2^OuterPacketBits coeffs are likely to be sparse, an so only a // small fraction of them have to be sorted, whence the following simple procedure: Index posStart = mp_target->outerIndexPtr()[outer]; Index i = (positions[outer]++) - 1; while ( (i >= posStart) && (mp_target->innerIndexPtr()[i] > inner) ) { mp_target->valuePtr()[i+1] = mp_target->valuePtr()[i]; mp_target->innerIndexPtr()[i+1] = mp_target->innerIndexPtr()[i]; --i; } mp_target->innerIndexPtr()[i+1] = internal::convert_index(inner); mp_target->valuePtr()[i+1] = it->second.value; } } } delete[] m_hashmaps; } /** \returns a reference to the coefficient at given coordinates \a row, \a col */ Scalar& operator() (Index row, Index col) { const Index outer = SetterRowMajor ? row : col; const Index inner = SetterRowMajor ? col : row; const Index outerMajor = outer >> OuterPacketBits; // index of the packet/map const Index outerMinor = outer & OuterPacketMask; // index of the inner vector in the packet const KeyType key = internal::convert_index((outerMinor<(m_hashmaps[k].size()); return nz; } protected: HashMapType* m_hashmaps; SparseMatrixType* mp_target; Index m_outerPackets; unsigned char m_keyBitsOffset; }; } // end namespace Eigen #endif // EIGEN_RANDOMSETTER_H RcppEigen/inst/include/unsupported/Eigen/src/SparseExtra/BlockSparseMatrix.h0000644000176200001440000011657414107270226026762 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2013 Desire Nuentsa // Copyright (C) 2013 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SPARSEBLOCKMATRIX_H #define EIGEN_SPARSEBLOCKMATRIX_H namespace Eigen { /** \ingroup SparseCore_Module * * \class BlockSparseMatrix * * \brief A versatile sparse matrix representation where each element is a block * * This class provides routines to manipulate block sparse matrices stored in a * BSR-like representation. There are two main types : * * 1. All blocks have the same number of rows and columns, called block size * in the following. In this case, if this block size is known at compile time, * it can be given as a template parameter like * \code * BlockSparseMatrix bmat(b_rows, b_cols); * \endcode * Here, bmat is a b_rows x b_cols block sparse matrix * where each coefficient is a 3x3 dense matrix. * If the block size is fixed but will be given at runtime, * \code * BlockSparseMatrix bmat(b_rows, b_cols); * bmat.setBlockSize(block_size); * \endcode * * 2. The second case is for variable-block sparse matrices. * Here each block has its own dimensions. The only restriction is that all the blocks * in a row (resp. a column) should have the same number of rows (resp. of columns). * It is thus required in this case to describe the layout of the matrix by calling * setBlockLayout(rowBlocks, colBlocks). * * In any of the previous case, the matrix can be filled by calling setFromTriplets(). * A regular sparse matrix can be converted to a block sparse matrix and vice versa. * It is obviously required to describe the block layout beforehand by calling either * setBlockSize() for fixed-size blocks or setBlockLayout for variable-size blocks. * * \tparam _Scalar The Scalar type * \tparam _BlockAtCompileTime The block layout option. It takes the following values * Dynamic : block size known at runtime * a numeric number : fixed-size block known at compile time */ template class BlockSparseMatrix; template class BlockSparseMatrixView; namespace internal { template struct traits > { typedef _Scalar Scalar; typedef _Index Index; typedef Sparse StorageKind; // FIXME Where is it used ?? typedef MatrixXpr XprKind; enum { RowsAtCompileTime = Dynamic, ColsAtCompileTime = Dynamic, MaxRowsAtCompileTime = Dynamic, MaxColsAtCompileTime = Dynamic, BlockSize = _BlockAtCompileTime, Flags = _Options | NestByRefBit | LvalueBit, CoeffReadCost = NumTraits::ReadCost, SupportedAccessPatterns = InnerRandomAccessPattern }; }; template struct traits > { typedef Ref > Scalar; typedef Ref > RealScalar; }; // Function object to sort a triplet list template struct TripletComp { typedef typename Iterator::value_type Triplet; bool operator()(const Triplet& a, const Triplet& b) { if(IsColMajor) return ((a.col() == b.col() && a.row() < b.row()) || (a.col() < b.col())); else return ((a.row() == b.row() && a.col() < b.col()) || (a.row() < b.row())); } }; } // end namespace internal /* Proxy to view the block sparse matrix as a regular sparse matrix */ template class BlockSparseMatrixView : public SparseMatrixBase { public: typedef Ref Scalar; typedef Ref RealScalar; typedef typename BlockSparseMatrixT::Index Index; typedef BlockSparseMatrixT Nested; enum { Flags = BlockSparseMatrixT::Options, Options = BlockSparseMatrixT::Options, RowsAtCompileTime = BlockSparseMatrixT::RowsAtCompileTime, ColsAtCompileTime = BlockSparseMatrixT::ColsAtCompileTime, MaxColsAtCompileTime = BlockSparseMatrixT::MaxColsAtCompileTime, MaxRowsAtCompileTime = BlockSparseMatrixT::MaxRowsAtCompileTime }; public: BlockSparseMatrixView(const BlockSparseMatrixT& spblockmat) : m_spblockmat(spblockmat) {} Index outerSize() const { return (Flags&RowMajorBit) == 1 ? this->rows() : this->cols(); } Index cols() const { return m_spblockmat.blockCols(); } Index rows() const { return m_spblockmat.blockRows(); } Scalar coeff(Index row, Index col) { return m_spblockmat.coeff(row, col); } Scalar coeffRef(Index row, Index col) { return m_spblockmat.coeffRef(row, col); } // Wrapper to iterate over all blocks class InnerIterator : public BlockSparseMatrixT::BlockInnerIterator { public: InnerIterator(const BlockSparseMatrixView& mat, Index outer) : BlockSparseMatrixT::BlockInnerIterator(mat.m_spblockmat, outer) {} }; protected: const BlockSparseMatrixT& m_spblockmat; }; // Proxy to view a regular vector as a block vector template class BlockVectorView { public: enum { BlockSize = BlockSparseMatrixT::BlockSize, ColsAtCompileTime = VectorType::ColsAtCompileTime, RowsAtCompileTime = VectorType::RowsAtCompileTime, Flags = VectorType::Flags }; typedef Ref >Scalar; typedef typename BlockSparseMatrixT::Index Index; public: BlockVectorView(const BlockSparseMatrixT& spblockmat, const VectorType& vec) : m_spblockmat(spblockmat),m_vec(vec) { } inline Index cols() const { return m_vec.cols(); } inline Index size() const { return m_spblockmat.blockRows(); } inline Scalar coeff(Index bi) const { Index startRow = m_spblockmat.blockRowsIndex(bi); Index rowSize = m_spblockmat.blockRowsIndex(bi+1) - startRow; return m_vec.middleRows(startRow, rowSize); } inline Scalar coeff(Index bi, Index j) const { Index startRow = m_spblockmat.blockRowsIndex(bi); Index rowSize = m_spblockmat.blockRowsIndex(bi+1) - startRow; return m_vec.block(startRow, j, rowSize, 1); } protected: const BlockSparseMatrixT& m_spblockmat; const VectorType& m_vec; }; template class BlockVectorReturn; // Proxy to view a regular vector as a block vector template class BlockVectorReturn { public: enum { ColsAtCompileTime = VectorType::ColsAtCompileTime, RowsAtCompileTime = VectorType::RowsAtCompileTime, Flags = VectorType::Flags }; typedef Ref > Scalar; typedef typename BlockSparseMatrixT::Index Index; public: BlockVectorReturn(const BlockSparseMatrixT& spblockmat, VectorType& vec) : m_spblockmat(spblockmat),m_vec(vec) { } inline Index size() const { return m_spblockmat.blockRows(); } inline Scalar coeffRef(Index bi) { Index startRow = m_spblockmat.blockRowsIndex(bi); Index rowSize = m_spblockmat.blockRowsIndex(bi+1) - startRow; return m_vec.middleRows(startRow, rowSize); } inline Scalar coeffRef(Index bi, Index j) { Index startRow = m_spblockmat.blockRowsIndex(bi); Index rowSize = m_spblockmat.blockRowsIndex(bi+1) - startRow; return m_vec.block(startRow, j, rowSize, 1); } protected: const BlockSparseMatrixT& m_spblockmat; VectorType& m_vec; }; // Block version of the sparse dense product template class BlockSparseTimeDenseProduct; namespace internal { template struct traits > { typedef Dense StorageKind; typedef MatrixXpr XprKind; typedef typename BlockSparseMatrixT::Scalar Scalar; typedef typename BlockSparseMatrixT::Index Index; enum { RowsAtCompileTime = Dynamic, ColsAtCompileTime = Dynamic, MaxRowsAtCompileTime = Dynamic, MaxColsAtCompileTime = Dynamic, Flags = 0, CoeffReadCost = internal::traits::CoeffReadCost }; }; } // end namespace internal template class BlockSparseTimeDenseProduct : public ProductBase, Lhs, Rhs> { public: EIGEN_PRODUCT_PUBLIC_INTERFACE(BlockSparseTimeDenseProduct) BlockSparseTimeDenseProduct(const Lhs& lhs, const Rhs& rhs) : Base(lhs,rhs) {} template void scaleAndAddTo(Dest& dest, const typename Rhs::Scalar& alpha) const { BlockVectorReturn tmpDest(m_lhs, dest); internal::sparse_time_dense_product( BlockSparseMatrixView(m_lhs), BlockVectorView(m_lhs, m_rhs), tmpDest, alpha); } private: BlockSparseTimeDenseProduct& operator=(const BlockSparseTimeDenseProduct&); }; template class BlockSparseMatrix : public SparseMatrixBase > { public: typedef _Scalar Scalar; typedef typename NumTraits::Real RealScalar; typedef _StorageIndex StorageIndex; typedef typename internal::ref_selector >::type Nested; enum { Options = _Options, Flags = Options, BlockSize=_BlockAtCompileTime, RowsAtCompileTime = Dynamic, ColsAtCompileTime = Dynamic, MaxRowsAtCompileTime = Dynamic, MaxColsAtCompileTime = Dynamic, IsVectorAtCompileTime = 0, IsColMajor = Flags&RowMajorBit ? 0 : 1 }; typedef Matrix BlockScalar; typedef Matrix BlockRealScalar; typedef typename internal::conditional<_BlockAtCompileTime==Dynamic, Scalar, BlockScalar>::type BlockScalarReturnType; typedef BlockSparseMatrix PlainObject; public: // Default constructor BlockSparseMatrix() : m_innerBSize(0),m_outerBSize(0),m_innerOffset(0),m_outerOffset(0), m_nonzerosblocks(0),m_values(0),m_blockPtr(0),m_indices(0), m_outerIndex(0),m_blockSize(BlockSize) { } /** * \brief Construct and resize * */ BlockSparseMatrix(Index brow, Index bcol) : m_innerBSize(IsColMajor ? brow : bcol), m_outerBSize(IsColMajor ? bcol : brow), m_innerOffset(0),m_outerOffset(0),m_nonzerosblocks(0), m_values(0),m_blockPtr(0),m_indices(0), m_outerIndex(0),m_blockSize(BlockSize) { } /** * \brief Copy-constructor */ BlockSparseMatrix(const BlockSparseMatrix& other) : m_innerBSize(other.m_innerBSize),m_outerBSize(other.m_outerBSize), m_nonzerosblocks(other.m_nonzerosblocks),m_nonzeros(other.m_nonzeros), m_blockPtr(0),m_blockSize(other.m_blockSize) { // should we allow copying between variable-size blocks and fixed-size blocks ?? eigen_assert(m_blockSize == BlockSize && " CAN NOT COPY BETWEEN FIXED-SIZE AND VARIABLE-SIZE BLOCKS"); std::copy(other.m_innerOffset, other.m_innerOffset+m_innerBSize+1, m_innerOffset); std::copy(other.m_outerOffset, other.m_outerOffset+m_outerBSize+1, m_outerOffset); std::copy(other.m_values, other.m_values+m_nonzeros, m_values); if(m_blockSize != Dynamic) std::copy(other.m_blockPtr, other.m_blockPtr+m_nonzerosblocks, m_blockPtr); std::copy(other.m_indices, other.m_indices+m_nonzerosblocks, m_indices); std::copy(other.m_outerIndex, other.m_outerIndex+m_outerBSize, m_outerIndex); } friend void swap(BlockSparseMatrix& first, BlockSparseMatrix& second) { std::swap(first.m_innerBSize, second.m_innerBSize); std::swap(first.m_outerBSize, second.m_outerBSize); std::swap(first.m_innerOffset, second.m_innerOffset); std::swap(first.m_outerOffset, second.m_outerOffset); std::swap(first.m_nonzerosblocks, second.m_nonzerosblocks); std::swap(first.m_nonzeros, second.m_nonzeros); std::swap(first.m_values, second.m_values); std::swap(first.m_blockPtr, second.m_blockPtr); std::swap(first.m_indices, second.m_indices); std::swap(first.m_outerIndex, second.m_outerIndex); std::swap(first.m_BlockSize, second.m_blockSize); } BlockSparseMatrix& operator=(BlockSparseMatrix other) { //Copy-and-swap paradigm ... avoid leaked data if thrown swap(*this, other); return *this; } // Destructor ~BlockSparseMatrix() { delete[] m_outerIndex; delete[] m_innerOffset; delete[] m_outerOffset; delete[] m_indices; delete[] m_blockPtr; delete[] m_values; } /** * \brief Constructor from a sparse matrix * */ template inline BlockSparseMatrix(const MatrixType& spmat) : m_blockSize(BlockSize) { EIGEN_STATIC_ASSERT((m_blockSize != Dynamic), THIS_METHOD_IS_ONLY_FOR_FIXED_SIZE); *this = spmat; } /** * \brief Assignment from a sparse matrix with the same storage order * * Convert from a sparse matrix to block sparse matrix. * \warning Before calling this function, tt is necessary to call * either setBlockLayout() (matrices with variable-size blocks) * or setBlockSize() (for fixed-size blocks). */ template inline BlockSparseMatrix& operator=(const MatrixType& spmat) { eigen_assert((m_innerBSize != 0 && m_outerBSize != 0) && "Trying to assign to a zero-size matrix, call resize() first"); eigen_assert(((MatrixType::Options&RowMajorBit) != IsColMajor) && "Wrong storage order"); typedef SparseMatrix MatrixPatternType; MatrixPatternType blockPattern(blockRows(), blockCols()); m_nonzeros = 0; // First, compute the number of nonzero blocks and their locations for(StorageIndex bj = 0; bj < m_outerBSize; ++bj) { // Browse each outer block and compute the structure std::vector nzblocksFlag(m_innerBSize,false); // Record the existing blocks blockPattern.startVec(bj); for(StorageIndex j = blockOuterIndex(bj); j < blockOuterIndex(bj+1); ++j) { typename MatrixType::InnerIterator it_spmat(spmat, j); for(; it_spmat; ++it_spmat) { StorageIndex bi = innerToBlock(it_spmat.index()); // Index of the current nonzero block if(!nzblocksFlag[bi]) { // Save the index of this nonzero block nzblocksFlag[bi] = true; blockPattern.insertBackByOuterInnerUnordered(bj, bi) = true; // Compute the total number of nonzeros (including explicit zeros in blocks) m_nonzeros += blockOuterSize(bj) * blockInnerSize(bi); } } } // end current outer block } blockPattern.finalize(); // Allocate the internal arrays setBlockStructure(blockPattern); for(StorageIndex nz = 0; nz < m_nonzeros; ++nz) m_values[nz] = Scalar(0); for(StorageIndex bj = 0; bj < m_outerBSize; ++bj) { // Now copy the values for(StorageIndex j = blockOuterIndex(bj); j < blockOuterIndex(bj+1); ++j) { // Browse the outer block column by column (for column-major matrices) typename MatrixType::InnerIterator it_spmat(spmat, j); for(; it_spmat; ++it_spmat) { StorageIndex idx = 0; // Position of this block in the column block StorageIndex bi = innerToBlock(it_spmat.index()); // Index of the current nonzero block // Go to the inner block where this element belongs to while(bi > m_indices[m_outerIndex[bj]+idx]) ++idx; // Not expensive for ordered blocks StorageIndex idxVal;// Get the right position in the array of values for this element if(m_blockSize == Dynamic) { // Offset from all blocks before ... idxVal = m_blockPtr[m_outerIndex[bj]+idx]; // ... and offset inside the block idxVal += (j - blockOuterIndex(bj)) * blockOuterSize(bj) + it_spmat.index() - m_innerOffset[bi]; } else { // All blocks before idxVal = (m_outerIndex[bj] + idx) * m_blockSize * m_blockSize; // inside the block idxVal += (j - blockOuterIndex(bj)) * m_blockSize + (it_spmat.index()%m_blockSize); } // Insert the value m_values[idxVal] = it_spmat.value(); } // end of this column } // end of this block } // end of this outer block return *this; } /** * \brief Set the nonzero block pattern of the matrix * * Given a sparse matrix describing the nonzero block pattern, * this function prepares the internal pointers for values. * After calling this function, any *nonzero* block (bi, bj) can be set * with a simple call to coeffRef(bi,bj). * * * \warning Before calling this function, tt is necessary to call * either setBlockLayout() (matrices with variable-size blocks) * or setBlockSize() (for fixed-size blocks). * * \param blockPattern Sparse matrix of boolean elements describing the block structure * * \sa setBlockLayout() \sa setBlockSize() */ template void setBlockStructure(const MatrixType& blockPattern) { resize(blockPattern.rows(), blockPattern.cols()); reserve(blockPattern.nonZeros()); // Browse the block pattern and set up the various pointers m_outerIndex[0] = 0; if(m_blockSize == Dynamic) m_blockPtr[0] = 0; for(StorageIndex nz = 0; nz < m_nonzeros; ++nz) m_values[nz] = Scalar(0); for(StorageIndex bj = 0; bj < m_outerBSize; ++bj) { //Browse each outer block //First, copy and save the indices of nonzero blocks //FIXME : find a way to avoid this ... std::vector nzBlockIdx; typename MatrixType::InnerIterator it(blockPattern, bj); for(; it; ++it) { nzBlockIdx.push_back(it.index()); } std::sort(nzBlockIdx.begin(), nzBlockIdx.end()); // Now, fill block indices and (eventually) pointers to blocks for(StorageIndex idx = 0; idx < nzBlockIdx.size(); ++idx) { StorageIndex offset = m_outerIndex[bj]+idx; // offset in m_indices m_indices[offset] = nzBlockIdx[idx]; if(m_blockSize == Dynamic) m_blockPtr[offset] = m_blockPtr[offset-1] + blockInnerSize(nzBlockIdx[idx]) * blockOuterSize(bj); // There is no blockPtr for fixed-size blocks... not needed !??? } // Save the pointer to the next outer block m_outerIndex[bj+1] = m_outerIndex[bj] + nzBlockIdx.size(); } } /** * \brief Set the number of rows and columns blocks */ inline void resize(Index brow, Index bcol) { m_innerBSize = IsColMajor ? brow : bcol; m_outerBSize = IsColMajor ? bcol : brow; } /** * \brief set the block size at runtime for fixed-size block layout * * Call this only for fixed-size blocks */ inline void setBlockSize(Index blockSize) { m_blockSize = blockSize; } /** * \brief Set the row and column block layouts, * * This function set the size of each row and column block. * So this function should be used only for blocks with variable size. * \param rowBlocks : Number of rows per row block * \param colBlocks : Number of columns per column block * \sa resize(), setBlockSize() */ inline void setBlockLayout(const VectorXi& rowBlocks, const VectorXi& colBlocks) { const VectorXi& innerBlocks = IsColMajor ? rowBlocks : colBlocks; const VectorXi& outerBlocks = IsColMajor ? colBlocks : rowBlocks; eigen_assert(m_innerBSize == innerBlocks.size() && "CHECK THE NUMBER OF ROW OR COLUMN BLOCKS"); eigen_assert(m_outerBSize == outerBlocks.size() && "CHECK THE NUMBER OF ROW OR COLUMN BLOCKS"); m_outerBSize = outerBlocks.size(); // starting index of blocks... cumulative sums m_innerOffset = new StorageIndex[m_innerBSize+1]; m_outerOffset = new StorageIndex[m_outerBSize+1]; m_innerOffset[0] = 0; m_outerOffset[0] = 0; std::partial_sum(&innerBlocks[0], &innerBlocks[m_innerBSize-1]+1, &m_innerOffset[1]); std::partial_sum(&outerBlocks[0], &outerBlocks[m_outerBSize-1]+1, &m_outerOffset[1]); // Compute the total number of nonzeros m_nonzeros = 0; for(StorageIndex bj = 0; bj < m_outerBSize; ++bj) for(StorageIndex bi = 0; bi < m_innerBSize; ++bi) m_nonzeros += outerBlocks[bj] * innerBlocks[bi]; } /** * \brief Allocate the internal array of pointers to blocks and their inner indices * * \note For fixed-size blocks, call setBlockSize() to set the block. * And For variable-size blocks, call setBlockLayout() before using this function * * \param nonzerosblocks Number of nonzero blocks. The total number of nonzeros is * is computed in setBlockLayout() for variable-size blocks * \sa setBlockSize() */ inline void reserve(const Index nonzerosblocks) { eigen_assert((m_innerBSize != 0 && m_outerBSize != 0) && "TRYING TO RESERVE ZERO-SIZE MATRICES, CALL resize() first"); //FIXME Should free if already allocated m_outerIndex = new StorageIndex[m_outerBSize+1]; m_nonzerosblocks = nonzerosblocks; if(m_blockSize != Dynamic) { m_nonzeros = nonzerosblocks * (m_blockSize * m_blockSize); m_blockPtr = 0; } else { // m_nonzeros is already computed in setBlockLayout() m_blockPtr = new StorageIndex[m_nonzerosblocks+1]; } m_indices = new StorageIndex[m_nonzerosblocks+1]; m_values = new Scalar[m_nonzeros]; } /** * \brief Fill values in a matrix from a triplet list. * * Each triplet item has a block stored in an Eigen dense matrix. * The InputIterator class should provide the functions row(), col() and value() * * \note For fixed-size blocks, call setBlockSize() before this function. * * FIXME Do not accept duplicates */ template void setFromTriplets(const InputIterator& begin, const InputIterator& end) { eigen_assert((m_innerBSize!=0 && m_outerBSize !=0) && "ZERO BLOCKS, PLEASE CALL resize() before"); /* First, sort the triplet list * FIXME This can be unnecessarily expensive since only the inner indices have to be sorted * The best approach is like in SparseMatrix::setFromTriplets() */ internal::TripletComp tripletcomp; std::sort(begin, end, tripletcomp); /* Count the number of rows and column blocks, * and the number of nonzero blocks per outer dimension */ VectorXi rowBlocks(m_innerBSize); // Size of each block row VectorXi colBlocks(m_outerBSize); // Size of each block column rowBlocks.setZero(); colBlocks.setZero(); VectorXi nzblock_outer(m_outerBSize); // Number of nz blocks per outer vector VectorXi nz_outer(m_outerBSize); // Number of nz per outer vector...for variable-size blocks nzblock_outer.setZero(); nz_outer.setZero(); for(InputIterator it(begin); it !=end; ++it) { eigen_assert(it->row() >= 0 && it->row() < this->blockRows() && it->col() >= 0 && it->col() < this->blockCols()); eigen_assert((it->value().rows() == it->value().cols() && (it->value().rows() == m_blockSize)) || (m_blockSize == Dynamic)); if(m_blockSize == Dynamic) { eigen_assert((rowBlocks[it->row()] == 0 || rowBlocks[it->row()] == it->value().rows()) && "NON CORRESPONDING SIZES FOR ROW BLOCKS"); eigen_assert((colBlocks[it->col()] == 0 || colBlocks[it->col()] == it->value().cols()) && "NON CORRESPONDING SIZES FOR COLUMN BLOCKS"); rowBlocks[it->row()] =it->value().rows(); colBlocks[it->col()] = it->value().cols(); } nz_outer(IsColMajor ? it->col() : it->row()) += it->value().rows() * it->value().cols(); nzblock_outer(IsColMajor ? it->col() : it->row())++; } // Allocate member arrays if(m_blockSize == Dynamic) setBlockLayout(rowBlocks, colBlocks); StorageIndex nzblocks = nzblock_outer.sum(); reserve(nzblocks); // Temporary markers VectorXi block_id(m_outerBSize); // To be used as a block marker during insertion // Setup outer index pointers and markers m_outerIndex[0] = 0; if (m_blockSize == Dynamic) m_blockPtr[0] = 0; for(StorageIndex bj = 0; bj < m_outerBSize; ++bj) { m_outerIndex[bj+1] = m_outerIndex[bj] + nzblock_outer(bj); block_id(bj) = m_outerIndex[bj]; if(m_blockSize==Dynamic) { m_blockPtr[m_outerIndex[bj+1]] = m_blockPtr[m_outerIndex[bj]] + nz_outer(bj); } } // Fill the matrix for(InputIterator it(begin); it!=end; ++it) { StorageIndex outer = IsColMajor ? it->col() : it->row(); StorageIndex inner = IsColMajor ? it->row() : it->col(); m_indices[block_id(outer)] = inner; StorageIndex block_size = it->value().rows()*it->value().cols(); StorageIndex nz_marker = blockPtr(block_id[outer]); memcpy(&(m_values[nz_marker]), it->value().data(), block_size * sizeof(Scalar)); if(m_blockSize == Dynamic) { m_blockPtr[block_id(outer)+1] = m_blockPtr[block_id(outer)] + block_size; } block_id(outer)++; } // An alternative when the outer indices are sorted...no need to use an array of markers // for(Index bcol = 0; bcol < m_outerBSize; ++bcol) // { // Index id = 0, id_nz = 0, id_nzblock = 0; // for(InputIterator it(begin); it!=end; ++it) // { // while (idvalue().rows()*it->value().cols(); // m_blockPtr[id_nzblock+1] = m_blockPtr[id_nzblock] + block_size; // id_nzblock++; // memcpy(&(m_values[id_nz]),it->value().data(), block_size*sizeof(Scalar)); // id_nz += block_size; // } // while(id < m_outerBSize-1) // Empty columns at the end // { // id++; // m_outerIndex[id+1]=m_outerIndex[id]; // } // } } /** * \returns the number of rows */ inline Index rows() const { // return blockRows(); return (IsColMajor ? innerSize() : outerSize()); } /** * \returns the number of cols */ inline Index cols() const { // return blockCols(); return (IsColMajor ? outerSize() : innerSize()); } inline Index innerSize() const { if(m_blockSize == Dynamic) return m_innerOffset[m_innerBSize]; else return (m_innerBSize * m_blockSize) ; } inline Index outerSize() const { if(m_blockSize == Dynamic) return m_outerOffset[m_outerBSize]; else return (m_outerBSize * m_blockSize) ; } /** \returns the number of rows grouped by blocks */ inline Index blockRows() const { return (IsColMajor ? m_innerBSize : m_outerBSize); } /** \returns the number of columns grouped by blocks */ inline Index blockCols() const { return (IsColMajor ? m_outerBSize : m_innerBSize); } inline Index outerBlocks() const { return m_outerBSize; } inline Index innerBlocks() const { return m_innerBSize; } /** \returns the block index where outer belongs to */ inline Index outerToBlock(Index outer) const { eigen_assert(outer < outerSize() && "OUTER INDEX OUT OF BOUNDS"); if(m_blockSize != Dynamic) return (outer / m_blockSize); // Integer division StorageIndex b_outer = 0; while(m_outerOffset[b_outer] <= outer) ++b_outer; return b_outer - 1; } /** \returns the block index where inner belongs to */ inline Index innerToBlock(Index inner) const { eigen_assert(inner < innerSize() && "OUTER INDEX OUT OF BOUNDS"); if(m_blockSize != Dynamic) return (inner / m_blockSize); // Integer division StorageIndex b_inner = 0; while(m_innerOffset[b_inner] <= inner) ++b_inner; return b_inner - 1; } /** *\returns a reference to the (i,j) block as an Eigen Dense Matrix */ Ref coeffRef(Index brow, Index bcol) { eigen_assert(brow < blockRows() && "BLOCK ROW INDEX OUT OF BOUNDS"); eigen_assert(bcol < blockCols() && "BLOCK nzblocksFlagCOLUMN OUT OF BOUNDS"); StorageIndex rsize = IsColMajor ? blockInnerSize(brow): blockOuterSize(bcol); StorageIndex csize = IsColMajor ? blockOuterSize(bcol) : blockInnerSize(brow); StorageIndex inner = IsColMajor ? brow : bcol; StorageIndex outer = IsColMajor ? bcol : brow; StorageIndex offset = m_outerIndex[outer]; while(offset < m_outerIndex[outer+1] && m_indices[offset] != inner) offset++; if(m_indices[offset] == inner) { return Map(&(m_values[blockPtr(offset)]), rsize, csize); } else { //FIXME the block does not exist, Insert it !!!!!!!!! eigen_assert("DYNAMIC INSERTION IS NOT YET SUPPORTED"); } } /** * \returns the value of the (i,j) block as an Eigen Dense Matrix */ Map coeff(Index brow, Index bcol) const { eigen_assert(brow < blockRows() && "BLOCK ROW INDEX OUT OF BOUNDS"); eigen_assert(bcol < blockCols() && "BLOCK COLUMN OUT OF BOUNDS"); StorageIndex rsize = IsColMajor ? blockInnerSize(brow): blockOuterSize(bcol); StorageIndex csize = IsColMajor ? blockOuterSize(bcol) : blockInnerSize(brow); StorageIndex inner = IsColMajor ? brow : bcol; StorageIndex outer = IsColMajor ? bcol : brow; StorageIndex offset = m_outerIndex[outer]; while(offset < m_outerIndex[outer+1] && m_indices[offset] != inner) offset++; if(m_indices[offset] == inner) { return Map (&(m_values[blockPtr(offset)]), rsize, csize); } else // return BlockScalar::Zero(rsize, csize); eigen_assert("NOT YET SUPPORTED"); } // Block Matrix times vector product template BlockSparseTimeDenseProduct operator*(const VecType& lhs) const { return BlockSparseTimeDenseProduct(*this, lhs); } /** \returns the number of nonzero blocks */ inline Index nonZerosBlocks() const { return m_nonzerosblocks; } /** \returns the total number of nonzero elements, including eventual explicit zeros in blocks */ inline Index nonZeros() const { return m_nonzeros; } inline BlockScalarReturnType *valuePtr() {return static_cast(m_values);} // inline Scalar *valuePtr(){ return m_values; } inline StorageIndex *innerIndexPtr() {return m_indices; } inline const StorageIndex *innerIndexPtr() const {return m_indices; } inline StorageIndex *outerIndexPtr() {return m_outerIndex; } inline const StorageIndex* outerIndexPtr() const {return m_outerIndex; } /** \brief for compatibility purposes with the SparseMatrix class */ inline bool isCompressed() const {return true;} /** * \returns the starting index of the bi row block */ inline Index blockRowsIndex(Index bi) const { return IsColMajor ? blockInnerIndex(bi) : blockOuterIndex(bi); } /** * \returns the starting index of the bj col block */ inline Index blockColsIndex(Index bj) const { return IsColMajor ? blockOuterIndex(bj) : blockInnerIndex(bj); } inline Index blockOuterIndex(Index bj) const { return (m_blockSize == Dynamic) ? m_outerOffset[bj] : (bj * m_blockSize); } inline Index blockInnerIndex(Index bi) const { return (m_blockSize == Dynamic) ? m_innerOffset[bi] : (bi * m_blockSize); } // Not needed ??? inline Index blockInnerSize(Index bi) const { return (m_blockSize == Dynamic) ? (m_innerOffset[bi+1] - m_innerOffset[bi]) : m_blockSize; } inline Index blockOuterSize(Index bj) const { return (m_blockSize == Dynamic) ? (m_outerOffset[bj+1]- m_outerOffset[bj]) : m_blockSize; } /** * \brief Browse the matrix by outer index */ class InnerIterator; // Browse column by column /** * \brief Browse the matrix by block outer index */ class BlockInnerIterator; // Browse block by block friend std::ostream & operator << (std::ostream & s, const BlockSparseMatrix& m) { for (StorageIndex j = 0; j < m.outerBlocks(); ++j) { BlockInnerIterator itb(m, j); for(; itb; ++itb) { s << "("<::type()); } protected: // inline Index blockDynIdx(Index id, internal::true_type) const // { // return m_blockPtr[id]; // } // inline Index blockDynIdx(Index id, internal::false_type) const // { // return id * BlockSize * BlockSize; // } // To be implemented // Insert a block at a particular location... need to make a room for that Map insert(Index brow, Index bcol); Index m_innerBSize; // Number of block rows Index m_outerBSize; // Number of block columns StorageIndex *m_innerOffset; // Starting index of each inner block (size m_innerBSize+1) StorageIndex *m_outerOffset; // Starting index of each outer block (size m_outerBSize+1) Index m_nonzerosblocks; // Total nonzeros blocks (lower than m_innerBSize x m_outerBSize) Index m_nonzeros; // Total nonzeros elements Scalar *m_values; //Values stored block column after block column (size m_nonzeros) StorageIndex *m_blockPtr; // Pointer to the beginning of each block in m_values, size m_nonzeroblocks ... null for fixed-size blocks StorageIndex *m_indices; //Inner block indices, size m_nonzerosblocks ... OK StorageIndex *m_outerIndex; // Starting pointer of each block column in m_indices (size m_outerBSize)... OK Index m_blockSize; // Size of a block for fixed-size blocks, otherwise -1 }; template class BlockSparseMatrix<_Scalar, _BlockAtCompileTime, _Options, _StorageIndex>::BlockInnerIterator { public: enum{ Flags = _Options }; BlockInnerIterator(const BlockSparseMatrix& mat, const Index outer) : m_mat(mat),m_outer(outer), m_id(mat.m_outerIndex[outer]), m_end(mat.m_outerIndex[outer+1]) { } inline BlockInnerIterator& operator++() {m_id++; return *this; } inline const Map value() const { return Map(&(m_mat.m_values[m_mat.blockPtr(m_id)]), rows(),cols()); } inline Map valueRef() { return Map(&(m_mat.m_values[m_mat.blockPtr(m_id)]), rows(),cols()); } // Block inner index inline Index index() const {return m_mat.m_indices[m_id]; } inline Index outer() const { return m_outer; } // block row index inline Index row() const {return index(); } // block column index inline Index col() const {return outer(); } // FIXME Number of rows in the current block inline Index rows() const { return (m_mat.m_blockSize==Dynamic) ? (m_mat.m_innerOffset[index()+1] - m_mat.m_innerOffset[index()]) : m_mat.m_blockSize; } // Number of columns in the current block ... inline Index cols() const { return (m_mat.m_blockSize==Dynamic) ? (m_mat.m_outerOffset[m_outer+1]-m_mat.m_outerOffset[m_outer]) : m_mat.m_blockSize;} inline operator bool() const { return (m_id < m_end); } protected: const BlockSparseMatrix<_Scalar, _BlockAtCompileTime, _Options, StorageIndex>& m_mat; const Index m_outer; Index m_id; Index m_end; }; template class BlockSparseMatrix<_Scalar, _BlockAtCompileTime, _Options, _StorageIndex>::InnerIterator { public: InnerIterator(const BlockSparseMatrix& mat, Index outer) : m_mat(mat),m_outerB(mat.outerToBlock(outer)),m_outer(outer), itb(mat, mat.outerToBlock(outer)), m_offset(outer - mat.blockOuterIndex(m_outerB)) { if (itb) { m_id = m_mat.blockInnerIndex(itb.index()); m_start = m_id; m_end = m_mat.blockInnerIndex(itb.index()+1); } } inline InnerIterator& operator++() { m_id++; if (m_id >= m_end) { ++itb; if (itb) { m_id = m_mat.blockInnerIndex(itb.index()); m_start = m_id; m_end = m_mat.blockInnerIndex(itb.index()+1); } } return *this; } inline const Scalar& value() const { return itb.value().coeff(m_id - m_start, m_offset); } inline Scalar& valueRef() { return itb.valueRef().coeff(m_id - m_start, m_offset); } inline Index index() const { return m_id; } inline Index outer() const {return m_outer; } inline Index col() const {return outer(); } inline Index row() const { return index();} inline operator bool() const { return itb; } protected: const BlockSparseMatrix& m_mat; const Index m_outer; const Index m_outerB; BlockInnerIterator itb; // Iterator through the blocks const Index m_offset; // Position of this column in the block Index m_start; // starting inner index of this block Index m_id; // current inner index in the block Index m_end; // starting inner index of the next block }; } // end namespace Eigen #endif // EIGEN_SPARSEBLOCKMATRIX_H RcppEigen/inst/include/unsupported/Eigen/src/SparseExtra/DynamicSparseMatrix.h0000644000176200001440000003266014562417221027310 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2009 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_DYNAMIC_SPARSEMATRIX_H #define EIGEN_DYNAMIC_SPARSEMATRIX_H namespace Eigen { /** \deprecated use a SparseMatrix in an uncompressed mode * * \class DynamicSparseMatrix * * \brief A sparse matrix class designed for matrix assembly purpose * * \param _Scalar the scalar type, i.e. the type of the coefficients * * Unlike SparseMatrix, this class provides a much higher degree of flexibility. In particular, it allows * random read/write accesses in log(rho*outer_size) where \c rho is the probability that a coefficient is * nonzero and outer_size is the number of columns if the matrix is column-major and the number of rows * otherwise. * * Internally, the data are stored as a std::vector of compressed vector. The performances of random writes might * decrease as the number of nonzeros per inner-vector increase. In practice, we observed very good performance * till about 100 nonzeros/vector, and the performance remains relatively good till 500 nonzeros/vectors. * * \see SparseMatrix */ namespace internal { template struct traits > { typedef _Scalar Scalar; typedef _StorageIndex StorageIndex; typedef Sparse StorageKind; typedef MatrixXpr XprKind; enum { RowsAtCompileTime = Dynamic, ColsAtCompileTime = Dynamic, MaxRowsAtCompileTime = Dynamic, MaxColsAtCompileTime = Dynamic, Flags = _Options | NestByRefBit | LvalueBit, CoeffReadCost = NumTraits::ReadCost, SupportedAccessPatterns = OuterRandomAccessPattern }; }; } template class DynamicSparseMatrix : public SparseMatrixBase > { typedef SparseMatrixBase Base; using Base::convert_index; public: EIGEN_SPARSE_PUBLIC_INTERFACE(DynamicSparseMatrix) // FIXME: why are these operator already alvailable ??? // EIGEN_SPARSE_INHERIT_ASSIGNMENT_OPERATOR(DynamicSparseMatrix, +=) // EIGEN_SPARSE_INHERIT_ASSIGNMENT_OPERATOR(DynamicSparseMatrix, -=) typedef MappedSparseMatrix Map; using Base::IsRowMajor; using Base::operator=; enum { Options = _Options }; protected: typedef DynamicSparseMatrix TransposedSparseMatrix; Index m_innerSize; std::vector > m_data; public: inline Index rows() const { return IsRowMajor ? outerSize() : m_innerSize; } inline Index cols() const { return IsRowMajor ? m_innerSize : outerSize(); } inline Index innerSize() const { return m_innerSize; } inline Index outerSize() const { return convert_index(m_data.size()); } inline Index innerNonZeros(Index j) const { return m_data[j].size(); } std::vector >& _data() { return m_data; } const std::vector >& _data() const { return m_data; } /** \returns the coefficient value at given position \a row, \a col * This operation involes a log(rho*outer_size) binary search. */ inline Scalar coeff(Index row, Index col) const { const Index outer = IsRowMajor ? row : col; const Index inner = IsRowMajor ? col : row; return m_data[outer].at(inner); } /** \returns a reference to the coefficient value at given position \a row, \a col * This operation involes a log(rho*outer_size) binary search. If the coefficient does not * exist yet, then a sorted insertion into a sequential buffer is performed. */ inline Scalar& coeffRef(Index row, Index col) { const Index outer = IsRowMajor ? row : col; const Index inner = IsRowMajor ? col : row; return m_data[outer].atWithInsertion(inner); } class InnerIterator; class ReverseInnerIterator; void setZero() { for (Index j=0; j0) { Index reserveSizePerVector = (std::max)(reserveSize/outerSize(),Index(4)); for (Index j=0; j(m_data[outer].size()) - 1; m_data[outer].resize(id+2,1); while ( (id >= startId) && (m_data[outer].index(id) > inner) ) { m_data[outer].index(id+1) = m_data[outer].index(id); m_data[outer].value(id+1) = m_data[outer].value(id); --id; } m_data[outer].index(id+1) = inner; m_data[outer].value(id+1) = 0; return m_data[outer].value(id+1); } /** Does nothing: provided for compatibility with SparseMatrix */ inline void finalize() {} /** Suppress all nonzeros which are smaller than \a reference under the tolerance \a epsilon */ void prune(Scalar reference, RealScalar epsilon = NumTraits::dummy_precision()) { for (Index j=0; jinnerSize) { // remove all coefficients with innerCoord>=innerSize // TODO //std::cerr << "not implemented yet\n"; exit(2); } if (m_data.size() != outerSize) { m_data.resize(outerSize); } } /** The class DynamicSparseMatrix is deprecated */ EIGEN_DEPRECATED inline DynamicSparseMatrix() : m_innerSize(0), m_data(0) { #ifdef EIGEN_SPARSE_CREATE_TEMPORARY_PLUGIN EIGEN_SPARSE_CREATE_TEMPORARY_PLUGIN #endif eigen_assert(innerSize()==0 && outerSize()==0); } /** The class DynamicSparseMatrix is deprecated */ EIGEN_DEPRECATED inline DynamicSparseMatrix(Index rows, Index cols) : m_innerSize(0) { #ifdef EIGEN_SPARSE_CREATE_TEMPORARY_PLUGIN EIGEN_SPARSE_CREATE_TEMPORARY_PLUGIN #endif resize(rows, cols); } /** The class DynamicSparseMatrix is deprecated */ template EIGEN_DEPRECATED explicit inline DynamicSparseMatrix(const SparseMatrixBase& other) : m_innerSize(0) { #ifdef EIGEN_SPARSE_CREATE_TEMPORARY_PLUGIN EIGEN_SPARSE_CREATE_TEMPORARY_PLUGIN #endif Base::operator=(other.derived()); } inline DynamicSparseMatrix(const DynamicSparseMatrix& other) : Base(), m_innerSize(0) { #ifdef EIGEN_SPARSE_CREATE_TEMPORARY_PLUGIN EIGEN_SPARSE_CREATE_TEMPORARY_PLUGIN #endif *this = other.derived(); } inline void swap(DynamicSparseMatrix& other) { //EIGEN_DBG_SPARSE(std::cout << "SparseMatrix:: swap\n"); std::swap(m_innerSize, other.m_innerSize); //std::swap(m_outerSize, other.m_outerSize); m_data.swap(other.m_data); } inline DynamicSparseMatrix& operator=(const DynamicSparseMatrix& other) { if (other.isRValue()) { swap(other.const_cast_derived()); } else { resize(other.rows(), other.cols()); m_data = other.m_data; } return *this; } /** Destructor */ inline ~DynamicSparseMatrix() {} public: /** \deprecated * Set the matrix to zero and reserve the memory for \a reserveSize nonzero coefficients. */ EIGEN_DEPRECATED void startFill(Index reserveSize = 1000) { setZero(); reserve(reserveSize); } /** \deprecated use insert() * inserts a nonzero coefficient at given coordinates \a row, \a col and returns its reference assuming that: * 1 - the coefficient does not exist yet * 2 - this the coefficient with greater inner coordinate for the given outer coordinate. * In other words, assuming \c *this is column-major, then there must not exists any nonzero coefficient of coordinates * \c i \c x \a col such that \c i >= \a row. Otherwise the matrix is invalid. * * \see fillrand(), coeffRef() */ EIGEN_DEPRECATED Scalar& fill(Index row, Index col) { const Index outer = IsRowMajor ? row : col; const Index inner = IsRowMajor ? col : row; return insertBack(outer,inner); } /** \deprecated use insert() * Like fill() but with random inner coordinates. * Compared to the generic coeffRef(), the unique limitation is that we assume * the coefficient does not exist yet. */ EIGEN_DEPRECATED Scalar& fillrand(Index row, Index col) { return insert(row,col); } /** \deprecated use finalize() * Does nothing. Provided for compatibility with SparseMatrix. */ EIGEN_DEPRECATED void endFill() {} # ifdef EIGEN_DYNAMICSPARSEMATRIX_PLUGIN # include EIGEN_DYNAMICSPARSEMATRIX_PLUGIN # endif }; template class DynamicSparseMatrix::InnerIterator : public SparseVector::InnerIterator { typedef typename SparseVector::InnerIterator Base; public: InnerIterator(const DynamicSparseMatrix& mat, Index outer) : Base(mat.m_data[outer]), m_outer(outer) {} inline Index row() const { return IsRowMajor ? m_outer : Base::index(); } inline Index col() const { return IsRowMajor ? Base::index() : m_outer; } inline Index outer() const { return m_outer; } protected: const Index m_outer; }; template class DynamicSparseMatrix::ReverseInnerIterator : public SparseVector::ReverseInnerIterator { typedef typename SparseVector::ReverseInnerIterator Base; public: ReverseInnerIterator(const DynamicSparseMatrix& mat, Index outer) : Base(mat.m_data[outer]), m_outer(outer) {} inline Index row() const { return IsRowMajor ? m_outer : Base::index(); } inline Index col() const { return IsRowMajor ? Base::index() : m_outer; } inline Index outer() const { return m_outer; } protected: const Index m_outer; }; namespace internal { template struct evaluator > : evaluator_base > { typedef _Scalar Scalar; typedef DynamicSparseMatrix<_Scalar,_Options,_StorageIndex> SparseMatrixType; typedef typename SparseMatrixType::InnerIterator InnerIterator; typedef typename SparseMatrixType::ReverseInnerIterator ReverseInnerIterator; enum { CoeffReadCost = NumTraits<_Scalar>::ReadCost, Flags = SparseMatrixType::Flags }; evaluator() : m_matrix(0) {} evaluator(const SparseMatrixType &mat) : m_matrix(&mat) {} operator SparseMatrixType&() { return m_matrix->const_cast_derived(); } operator const SparseMatrixType&() const { return *m_matrix; } Scalar coeff(Index row, Index col) const { return m_matrix->coeff(row,col); } Index nonZerosEstimate() const { return m_matrix->nonZeros(); } const SparseMatrixType *m_matrix; }; } } // end namespace Eigen #endif // EIGEN_DYNAMIC_SPARSEMATRIX_H RcppEigen/inst/include/unsupported/Eigen/src/SparseExtra/BlockOfDynamicSparseMatrix.h0000644000176200001440000001024414107270226030537 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2009 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SPARSE_BLOCKFORDYNAMICMATRIX_H #define EIGEN_SPARSE_BLOCKFORDYNAMICMATRIX_H namespace Eigen { #if 0 // NOTE Have to be reimplemented as a specialization of BlockImpl< DynamicSparseMatrix<_Scalar, _Options, _Index>, ... > // See SparseBlock.h for an example /*************************************************************************** * specialisation for DynamicSparseMatrix ***************************************************************************/ template class SparseInnerVectorSet, Size> : public SparseMatrixBase, Size> > { typedef DynamicSparseMatrix<_Scalar, _Options, _Index> MatrixType; public: enum { IsRowMajor = internal::traits::IsRowMajor }; EIGEN_SPARSE_PUBLIC_INTERFACE(SparseInnerVectorSet) class InnerIterator: public MatrixType::InnerIterator { public: inline InnerIterator(const SparseInnerVectorSet& xpr, Index outer) : MatrixType::InnerIterator(xpr.m_matrix, xpr.m_outerStart + outer), m_outer(outer) {} inline Index row() const { return IsRowMajor ? m_outer : this->index(); } inline Index col() const { return IsRowMajor ? this->index() : m_outer; } protected: Index m_outer; }; inline SparseInnerVectorSet(const MatrixType& matrix, Index outerStart, Index outerSize) : m_matrix(matrix), m_outerStart(outerStart), m_outerSize(outerSize) { eigen_assert( (outerStart>=0) && ((outerStart+outerSize)<=matrix.outerSize()) ); } inline SparseInnerVectorSet(const MatrixType& matrix, Index outer) : m_matrix(matrix), m_outerStart(outer), m_outerSize(Size) { eigen_assert(Size!=Dynamic); eigen_assert( (outer>=0) && (outer inline SparseInnerVectorSet& operator=(const SparseMatrixBase& other) { if (IsRowMajor != ((OtherDerived::Flags&RowMajorBit)==RowMajorBit)) { // need to transpose => perform a block evaluation followed by a big swap DynamicSparseMatrix aux(other); *this = aux.markAsRValue(); } else { // evaluate/copy vector per vector for (Index j=0; j aux(other.innerVector(j)); m_matrix.const_cast_derived()._data()[m_outerStart+j].swap(aux._data()); } } return *this; } inline SparseInnerVectorSet& operator=(const SparseInnerVectorSet& other) { return operator=(other); } Index nonZeros() const { Index count = 0; for (Index j=0; j0); return m_matrix.data()[m_outerStart].vale(m_matrix.data()[m_outerStart].size()-1); } // template // inline SparseInnerVectorSet& operator=(const SparseMatrixBase& other) // { // return *this; // } EIGEN_STRONG_INLINE Index rows() const { return IsRowMajor ? m_outerSize.value() : m_matrix.rows(); } EIGEN_STRONG_INLINE Index cols() const { return IsRowMajor ? m_matrix.cols() : m_outerSize.value(); } protected: const typename MatrixType::Nested m_matrix; Index m_outerStart; const internal::variable_if_dynamic m_outerSize; }; #endif } // end namespace Eigen #endif // EIGEN_SPARSE_BLOCKFORDYNAMICMATRIX_H RcppEigen/inst/include/unsupported/Eigen/src/EulerAngles/0000755000176200001440000000000014562417221023146 5ustar liggesusersRcppEigen/inst/include/unsupported/Eigen/src/EulerAngles/EulerSystem.h0000644000176200001440000002654414562417221025613 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2015 Tal Hadad // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_EULERSYSTEM_H #define EIGEN_EULERSYSTEM_H namespace Eigen { // Forward declarations template class EulerAngles; namespace internal { // TODO: Add this trait to the Eigen internal API? template 0)> struct Abs { enum { value = Num }; }; template struct Abs { enum { value = -Num }; }; template struct IsValidAxis { enum { value = Axis != 0 && Abs::value <= 3 }; }; template struct eulerangles_assign_impl; } #define EIGEN_EULER_ANGLES_CLASS_STATIC_ASSERT(COND,MSG) typedef char static_assertion_##MSG[(COND)?1:-1] /** \brief Representation of a fixed signed rotation axis for EulerSystem. * * \ingroup EulerAngles_Module * * Values here represent: * - The axis of the rotation: X, Y or Z. * - The sign (i.e. direction of the rotation along the axis): positive(+) or negative(-) * * Therefore, this could express all the axes {+X,+Y,+Z,-X,-Y,-Z} * * For positive axis, use +EULER_{axis}, and for negative axis use -EULER_{axis}. */ enum EulerAxis { EULER_X = 1, /*!< the X axis */ EULER_Y = 2, /*!< the Y axis */ EULER_Z = 3 /*!< the Z axis */ }; /** \class EulerSystem * * \ingroup EulerAngles_Module * * \brief Represents a fixed Euler rotation system. * * This meta-class goal is to represent the Euler system in compilation time, for EulerAngles. * * You can use this class to get two things: * - Build an Euler system, and then pass it as a template parameter to EulerAngles. * - Query some compile time data about an Euler system. (e.g. Whether it's Tait-Bryan) * * Euler rotation is a set of three rotation on fixed axes. (see \ref EulerAngles) * This meta-class store constantly those signed axes. (see \ref EulerAxis) * * ### Types of Euler systems ### * * All and only valid 3 dimension Euler rotation over standard * signed axes{+X,+Y,+Z,-X,-Y,-Z} are supported: * - all axes X, Y, Z in each valid order (see below what order is valid) * - rotation over the axis is supported both over the positive and negative directions. * - both Tait-Bryan and proper/classic Euler angles (i.e. the opposite). * * Since EulerSystem support both positive and negative directions, * you may call this rotation distinction in other names: * - _right handed_ or _left handed_ * - _counterclockwise_ or _clockwise_ * * Notice all axed combination are valid, and would trigger a static assertion. * Same unsigned axes can't be neighbors, e.g. {X,X,Y} is invalid. * This yield two and only two classes: * - _Tait-Bryan_ - all unsigned axes are distinct, e.g. {X,Y,Z} * - _proper/classic Euler angles_ - The first and the third unsigned axes is equal, * and the second is different, e.g. {X,Y,X} * * ### Intrinsic vs extrinsic Euler systems ### * * Only intrinsic Euler systems are supported for simplicity. * If you want to use extrinsic Euler systems, * just use the equal intrinsic opposite order for axes and angles. * I.e axes (A,B,C) becomes (C,B,A), and angles (a,b,c) becomes (c,b,a). * * ### Convenient user typedefs ### * * Convenient typedefs for EulerSystem exist (only for positive axes Euler systems), * in a form of EulerSystem{A}{B}{C}, e.g. \ref EulerSystemXYZ. * * ### Additional reading ### * * More information about Euler angles: https://en.wikipedia.org/wiki/Euler_angles * * \tparam _AlphaAxis the first fixed EulerAxis * * \tparam _BetaAxis the second fixed EulerAxis * * \tparam _GammaAxis the third fixed EulerAxis */ template class EulerSystem { public: // It's defined this way and not as enum, because I think // that enum is not guerantee to support negative numbers /** The first rotation axis */ static const int AlphaAxis = _AlphaAxis; /** The second rotation axis */ static const int BetaAxis = _BetaAxis; /** The third rotation axis */ static const int GammaAxis = _GammaAxis; enum { AlphaAxisAbs = internal::Abs::value, /*!< the first rotation axis unsigned */ BetaAxisAbs = internal::Abs::value, /*!< the second rotation axis unsigned */ GammaAxisAbs = internal::Abs::value, /*!< the third rotation axis unsigned */ IsAlphaOpposite = (AlphaAxis < 0) ? 1 : 0, /*!< whether alpha axis is negative */ IsBetaOpposite = (BetaAxis < 0) ? 1 : 0, /*!< whether beta axis is negative */ IsGammaOpposite = (GammaAxis < 0) ? 1 : 0, /*!< whether gamma axis is negative */ // Parity is even if alpha axis X is followed by beta axis Y, or Y is followed // by Z, or Z is followed by X; otherwise it is odd. IsOdd = ((AlphaAxisAbs)%3 == (BetaAxisAbs - 1)%3) ? 0 : 1, /*!< whether the Euler system is odd */ IsEven = IsOdd ? 0 : 1, /*!< whether the Euler system is even */ IsTaitBryan = ((unsigned)AlphaAxisAbs != (unsigned)GammaAxisAbs) ? 1 : 0 /*!< whether the Euler system is Tait-Bryan */ }; private: EIGEN_EULER_ANGLES_CLASS_STATIC_ASSERT(internal::IsValidAxis::value, ALPHA_AXIS_IS_INVALID); EIGEN_EULER_ANGLES_CLASS_STATIC_ASSERT(internal::IsValidAxis::value, BETA_AXIS_IS_INVALID); EIGEN_EULER_ANGLES_CLASS_STATIC_ASSERT(internal::IsValidAxis::value, GAMMA_AXIS_IS_INVALID); EIGEN_EULER_ANGLES_CLASS_STATIC_ASSERT((unsigned)AlphaAxisAbs != (unsigned)BetaAxisAbs, ALPHA_AXIS_CANT_BE_EQUAL_TO_BETA_AXIS); EIGEN_EULER_ANGLES_CLASS_STATIC_ASSERT((unsigned)BetaAxisAbs != (unsigned)GammaAxisAbs, BETA_AXIS_CANT_BE_EQUAL_TO_GAMMA_AXIS); static const int // I, J, K are the pivot indexes permutation for the rotation matrix, that match this Euler system. // They are used in this class converters. // They are always different from each other, and their possible values are: 0, 1, or 2. I_ = AlphaAxisAbs - 1, J_ = (AlphaAxisAbs - 1 + 1 + IsOdd)%3, K_ = (AlphaAxisAbs - 1 + 2 - IsOdd)%3 ; // TODO: Get @mat parameter in form that avoids double evaluation. template static void CalcEulerAngles_imp(Matrix::Scalar, 3, 1>& res, const MatrixBase& mat, internal::true_type /*isTaitBryan*/) { using std::atan2; using std::sqrt; typedef typename Derived::Scalar Scalar; const Scalar plusMinus = IsEven? 1 : -1; const Scalar minusPlus = IsOdd? 1 : -1; const Scalar Rsum = sqrt((mat(I_,I_) * mat(I_,I_) + mat(I_,J_) * mat(I_,J_) + mat(J_,K_) * mat(J_,K_) + mat(K_,K_) * mat(K_,K_))/2); res[1] = atan2(plusMinus * mat(I_,K_), Rsum); // There is a singularity when cos(beta) == 0 if(Rsum > 4 * NumTraits::epsilon()) {// cos(beta) != 0 res[0] = atan2(minusPlus * mat(J_, K_), mat(K_, K_)); res[2] = atan2(minusPlus * mat(I_, J_), mat(I_, I_)); } else if(plusMinus * mat(I_, K_) > 0) {// cos(beta) == 0 and sin(beta) == 1 Scalar spos = mat(J_, I_) + plusMinus * mat(K_, J_); // 2*sin(alpha + plusMinus * gamma Scalar cpos = mat(J_, J_) + minusPlus * mat(K_, I_); // 2*cos(alpha + plusMinus * gamma) Scalar alphaPlusMinusGamma = atan2(spos, cpos); res[0] = alphaPlusMinusGamma; res[2] = 0; } else {// cos(beta) == 0 and sin(beta) == -1 Scalar sneg = plusMinus * (mat(K_, J_) + minusPlus * mat(J_, I_)); // 2*sin(alpha + minusPlus*gamma) Scalar cneg = mat(J_, J_) + plusMinus * mat(K_, I_); // 2*cos(alpha + minusPlus*gamma) Scalar alphaMinusPlusBeta = atan2(sneg, cneg); res[0] = alphaMinusPlusBeta; res[2] = 0; } } template static void CalcEulerAngles_imp(Matrix::Scalar,3,1>& res, const MatrixBase& mat, internal::false_type /*isTaitBryan*/) { using std::atan2; using std::sqrt; typedef typename Derived::Scalar Scalar; const Scalar plusMinus = IsEven? 1 : -1; const Scalar minusPlus = IsOdd? 1 : -1; const Scalar Rsum = sqrt((mat(I_, J_) * mat(I_, J_) + mat(I_, K_) * mat(I_, K_) + mat(J_, I_) * mat(J_, I_) + mat(K_, I_) * mat(K_, I_)) / 2); res[1] = atan2(Rsum, mat(I_, I_)); // There is a singularity when sin(beta) == 0 if(Rsum > 4 * NumTraits::epsilon()) {// sin(beta) != 0 res[0] = atan2(mat(J_, I_), minusPlus * mat(K_, I_)); res[2] = atan2(mat(I_, J_), plusMinus * mat(I_, K_)); } else if(mat(I_, I_) > 0) {// sin(beta) == 0 and cos(beta) == 1 Scalar spos = plusMinus * mat(K_, J_) + minusPlus * mat(J_, K_); // 2*sin(alpha + gamma) Scalar cpos = mat(J_, J_) + mat(K_, K_); // 2*cos(alpha + gamma) res[0] = atan2(spos, cpos); res[2] = 0; } else {// sin(beta) == 0 and cos(beta) == -1 Scalar sneg = plusMinus * mat(K_, J_) + plusMinus * mat(J_, K_); // 2*sin(alpha - gamma) Scalar cneg = mat(J_, J_) - mat(K_, K_); // 2*cos(alpha - gamma) res[0] = atan2(sneg, cneg); res[2] = 0; } } template static void CalcEulerAngles( EulerAngles& res, const typename EulerAngles::Matrix3& mat) { CalcEulerAngles_imp( res.angles(), mat, typename internal::conditional::type()); if (IsAlphaOpposite) res.alpha() = -res.alpha(); if (IsBetaOpposite) res.beta() = -res.beta(); if (IsGammaOpposite) res.gamma() = -res.gamma(); } template friend class Eigen::EulerAngles; template friend struct internal::eulerangles_assign_impl; }; #define EIGEN_EULER_SYSTEM_TYPEDEF(A, B, C) \ /** \ingroup EulerAngles_Module */ \ typedef EulerSystem EulerSystem##A##B##C; EIGEN_EULER_SYSTEM_TYPEDEF(X,Y,Z) EIGEN_EULER_SYSTEM_TYPEDEF(X,Y,X) EIGEN_EULER_SYSTEM_TYPEDEF(X,Z,Y) EIGEN_EULER_SYSTEM_TYPEDEF(X,Z,X) EIGEN_EULER_SYSTEM_TYPEDEF(Y,Z,X) EIGEN_EULER_SYSTEM_TYPEDEF(Y,Z,Y) EIGEN_EULER_SYSTEM_TYPEDEF(Y,X,Z) EIGEN_EULER_SYSTEM_TYPEDEF(Y,X,Y) EIGEN_EULER_SYSTEM_TYPEDEF(Z,X,Y) EIGEN_EULER_SYSTEM_TYPEDEF(Z,X,Z) EIGEN_EULER_SYSTEM_TYPEDEF(Z,Y,X) EIGEN_EULER_SYSTEM_TYPEDEF(Z,Y,Z) } #endif // EIGEN_EULERSYSTEM_H RcppEigen/inst/include/unsupported/Eigen/src/EulerAngles/EulerAngles.h0000644000176200001440000003600714562417221025533 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2015 Tal Hadad // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_EULERANGLESCLASS_H// TODO: Fix previous "EIGEN_EULERANGLES_H" definition? #define EIGEN_EULERANGLESCLASS_H namespace Eigen { /** \class EulerAngles * * \ingroup EulerAngles_Module * * \brief Represents a rotation in a 3 dimensional space as three Euler angles. * * Euler rotation is a set of three rotation of three angles over three fixed axes, defined by the EulerSystem given as a template parameter. * * Here is how intrinsic Euler angles works: * - first, rotate the axes system over the alpha axis in angle alpha * - then, rotate the axes system over the beta axis(which was rotated in the first stage) in angle beta * - then, rotate the axes system over the gamma axis(which was rotated in the two stages above) in angle gamma * * \note This class support only intrinsic Euler angles for simplicity, * see EulerSystem how to easily overcome this for extrinsic systems. * * ### Rotation representation and conversions ### * * It has been proved(see Wikipedia link below) that every rotation can be represented * by Euler angles, but there is no single representation (e.g. unlike rotation matrices). * Therefore, you can convert from Eigen rotation and to them * (including rotation matrices, which is not called "rotations" by Eigen design). * * Euler angles usually used for: * - convenient human representation of rotation, especially in interactive GUI. * - gimbal systems and robotics * - efficient encoding(i.e. 3 floats only) of rotation for network protocols. * * However, Euler angles are slow comparing to quaternion or matrices, * because their unnatural math definition, although it's simple for human. * To overcome this, this class provide easy movement from the math friendly representation * to the human friendly representation, and vise-versa. * * All the user need to do is a safe simple C++ type conversion, * and this class take care for the math. * Additionally, some axes related computation is done in compile time. * * #### Euler angles ranges in conversions #### * Rotations representation as EulerAngles are not single (unlike matrices), * and even have infinite EulerAngles representations.
* For example, add or subtract 2*PI from either angle of EulerAngles * and you'll get the same rotation. * This is the general reason for infinite representation, * but it's not the only general reason for not having a single representation. * * When converting rotation to EulerAngles, this class convert it to specific ranges * When converting some rotation to EulerAngles, the rules for ranges are as follow: * - If the rotation we converting from is an EulerAngles * (even when it represented as RotationBase explicitly), angles ranges are __undefined__. * - otherwise, alpha and gamma angles will be in the range [-PI, PI].
* As for Beta angle: * - If the system is Tait-Bryan, the beta angle will be in the range [-PI/2, PI/2]. * - otherwise: * - If the beta axis is positive, the beta angle will be in the range [0, PI] * - If the beta axis is negative, the beta angle will be in the range [-PI, 0] * * \sa EulerAngles(const MatrixBase&) * \sa EulerAngles(const RotationBase&) * * ### Convenient user typedefs ### * * Convenient typedefs for EulerAngles exist for float and double scalar, * in a form of EulerAngles{A}{B}{C}{scalar}, * e.g. \ref EulerAnglesXYZd, \ref EulerAnglesZYZf. * * Only for positive axes{+x,+y,+z} Euler systems are have convenient typedef. * If you need negative axes{-x,-y,-z}, it is recommended to create you own typedef with * a word that represent what you need. * * ### Example ### * * \include EulerAngles.cpp * Output: \verbinclude EulerAngles.out * * ### Additional reading ### * * If you're want to get more idea about how Euler system work in Eigen see EulerSystem. * * More information about Euler angles: https://en.wikipedia.org/wiki/Euler_angles * * \tparam _Scalar the scalar type, i.e. the type of the angles. * * \tparam _System the EulerSystem to use, which represents the axes of rotation. */ template class EulerAngles : public RotationBase, 3> { public: typedef RotationBase, 3> Base; /** the scalar type of the angles */ typedef _Scalar Scalar; typedef typename NumTraits::Real RealScalar; /** the EulerSystem to use, which represents the axes of rotation. */ typedef _System System; typedef Matrix Matrix3; /*!< the equivalent rotation matrix type */ typedef Matrix Vector3; /*!< the equivalent 3 dimension vector type */ typedef Quaternion QuaternionType; /*!< the equivalent quaternion type */ typedef AngleAxis AngleAxisType; /*!< the equivalent angle-axis type */ /** \returns the axis vector of the first (alpha) rotation */ static Vector3 AlphaAxisVector() { const Vector3& u = Vector3::Unit(System::AlphaAxisAbs - 1); return System::IsAlphaOpposite ? -u : u; } /** \returns the axis vector of the second (beta) rotation */ static Vector3 BetaAxisVector() { const Vector3& u = Vector3::Unit(System::BetaAxisAbs - 1); return System::IsBetaOpposite ? -u : u; } /** \returns the axis vector of the third (gamma) rotation */ static Vector3 GammaAxisVector() { const Vector3& u = Vector3::Unit(System::GammaAxisAbs - 1); return System::IsGammaOpposite ? -u : u; } private: Vector3 m_angles; public: /** Default constructor without initialization. */ EulerAngles() {} /** Constructs and initialize an EulerAngles (\p alpha, \p beta, \p gamma). */ EulerAngles(const Scalar& alpha, const Scalar& beta, const Scalar& gamma) : m_angles(alpha, beta, gamma) {} // TODO: Test this constructor /** Constructs and initialize an EulerAngles from the array data {alpha, beta, gamma} */ explicit EulerAngles(const Scalar* data) : m_angles(data) {} /** Constructs and initializes an EulerAngles from either: * - a 3x3 rotation matrix expression(i.e. pure orthogonal matrix with determinant of +1), * - a 3D vector expression representing Euler angles. * * \note If \p other is a 3x3 rotation matrix, the angles range rules will be as follow:
* Alpha and gamma angles will be in the range [-PI, PI].
* As for Beta angle: * - If the system is Tait-Bryan, the beta angle will be in the range [-PI/2, PI/2]. * - otherwise: * - If the beta axis is positive, the beta angle will be in the range [0, PI] * - If the beta axis is negative, the beta angle will be in the range [-PI, 0] */ template explicit EulerAngles(const MatrixBase& other) { *this = other; } /** Constructs and initialize Euler angles from a rotation \p rot. * * \note If \p rot is an EulerAngles (even when it represented as RotationBase explicitly), * angles ranges are __undefined__. * Otherwise, alpha and gamma angles will be in the range [-PI, PI].
* As for Beta angle: * - If the system is Tait-Bryan, the beta angle will be in the range [-PI/2, PI/2]. * - otherwise: * - If the beta axis is positive, the beta angle will be in the range [0, PI] * - If the beta axis is negative, the beta angle will be in the range [-PI, 0] */ template EulerAngles(const RotationBase& rot) { System::CalcEulerAngles(*this, rot.toRotationMatrix()); } /*EulerAngles(const QuaternionType& q) { // TODO: Implement it in a faster way for quaternions // According to http://www.euclideanspace.com/maths/geometry/rotations/conversions/quaternionToEuler/ // we can compute only the needed matrix cells and then convert to euler angles. (see ZYX example below) // Currently we compute all matrix cells from quaternion. // Special case only for ZYX //Scalar y2 = q.y() * q.y(); //m_angles[0] = std::atan2(2*(q.w()*q.z() + q.x()*q.y()), (1 - 2*(y2 + q.z()*q.z()))); //m_angles[1] = std::asin( 2*(q.w()*q.y() - q.z()*q.x())); //m_angles[2] = std::atan2(2*(q.w()*q.x() + q.y()*q.z()), (1 - 2*(q.x()*q.x() + y2))); }*/ /** \returns The angle values stored in a vector (alpha, beta, gamma). */ const Vector3& angles() const { return m_angles; } /** \returns A read-write reference to the angle values stored in a vector (alpha, beta, gamma). */ Vector3& angles() { return m_angles; } /** \returns The value of the first angle. */ Scalar alpha() const { return m_angles[0]; } /** \returns A read-write reference to the angle of the first angle. */ Scalar& alpha() { return m_angles[0]; } /** \returns The value of the second angle. */ Scalar beta() const { return m_angles[1]; } /** \returns A read-write reference to the angle of the second angle. */ Scalar& beta() { return m_angles[1]; } /** \returns The value of the third angle. */ Scalar gamma() const { return m_angles[2]; } /** \returns A read-write reference to the angle of the third angle. */ Scalar& gamma() { return m_angles[2]; } /** \returns The Euler angles rotation inverse (which is as same as the negative), * (-alpha, -beta, -gamma). */ EulerAngles inverse() const { EulerAngles res; res.m_angles = -m_angles; return res; } /** \returns The Euler angles rotation negative (which is as same as the inverse), * (-alpha, -beta, -gamma). */ EulerAngles operator -() const { return inverse(); } /** Set \c *this from either: * - a 3x3 rotation matrix expression(i.e. pure orthogonal matrix with determinant of +1), * - a 3D vector expression representing Euler angles. * * See EulerAngles(const MatrixBase&) for more information about * angles ranges output. */ template EulerAngles& operator=(const MatrixBase& other) { EIGEN_STATIC_ASSERT((internal::is_same::value), YOU_MIXED_DIFFERENT_NUMERIC_TYPES__YOU_NEED_TO_USE_THE_CAST_METHOD_OF_MATRIXBASE_TO_CAST_NUMERIC_TYPES_EXPLICITLY) internal::eulerangles_assign_impl::run(*this, other.derived()); return *this; } // TODO: Assign and construct from another EulerAngles (with different system) /** Set \c *this from a rotation. * * See EulerAngles(const RotationBase&) for more information about * angles ranges output. */ template EulerAngles& operator=(const RotationBase& rot) { System::CalcEulerAngles(*this, rot.toRotationMatrix()); return *this; } /** \returns \c true if \c *this is approximately equal to \a other, within the precision * determined by \a prec. * * \sa MatrixBase::isApprox() */ bool isApprox(const EulerAngles& other, const RealScalar& prec = NumTraits::dummy_precision()) const { return angles().isApprox(other.angles(), prec); } /** \returns an equivalent 3x3 rotation matrix. */ Matrix3 toRotationMatrix() const { // TODO: Calc it faster return static_cast(*this).toRotationMatrix(); } /** Convert the Euler angles to quaternion. */ operator QuaternionType() const { return AngleAxisType(alpha(), AlphaAxisVector()) * AngleAxisType(beta(), BetaAxisVector()) * AngleAxisType(gamma(), GammaAxisVector()); } friend std::ostream& operator<<(std::ostream& s, const EulerAngles& eulerAngles) { s << eulerAngles.angles().transpose(); return s; } /** \returns \c *this with scalar type casted to \a NewScalarType */ template EulerAngles cast() const { EulerAngles e; e.angles() = angles().template cast(); return e; } }; #define EIGEN_EULER_ANGLES_SINGLE_TYPEDEF(AXES, SCALAR_TYPE, SCALAR_POSTFIX) \ /** \ingroup EulerAngles_Module */ \ typedef EulerAngles EulerAngles##AXES##SCALAR_POSTFIX; #define EIGEN_EULER_ANGLES_TYPEDEFS(SCALAR_TYPE, SCALAR_POSTFIX) \ EIGEN_EULER_ANGLES_SINGLE_TYPEDEF(XYZ, SCALAR_TYPE, SCALAR_POSTFIX) \ EIGEN_EULER_ANGLES_SINGLE_TYPEDEF(XYX, SCALAR_TYPE, SCALAR_POSTFIX) \ EIGEN_EULER_ANGLES_SINGLE_TYPEDEF(XZY, SCALAR_TYPE, SCALAR_POSTFIX) \ EIGEN_EULER_ANGLES_SINGLE_TYPEDEF(XZX, SCALAR_TYPE, SCALAR_POSTFIX) \ \ EIGEN_EULER_ANGLES_SINGLE_TYPEDEF(YZX, SCALAR_TYPE, SCALAR_POSTFIX) \ EIGEN_EULER_ANGLES_SINGLE_TYPEDEF(YZY, SCALAR_TYPE, SCALAR_POSTFIX) \ EIGEN_EULER_ANGLES_SINGLE_TYPEDEF(YXZ, SCALAR_TYPE, SCALAR_POSTFIX) \ EIGEN_EULER_ANGLES_SINGLE_TYPEDEF(YXY, SCALAR_TYPE, SCALAR_POSTFIX) \ \ EIGEN_EULER_ANGLES_SINGLE_TYPEDEF(ZXY, SCALAR_TYPE, SCALAR_POSTFIX) \ EIGEN_EULER_ANGLES_SINGLE_TYPEDEF(ZXZ, SCALAR_TYPE, SCALAR_POSTFIX) \ EIGEN_EULER_ANGLES_SINGLE_TYPEDEF(ZYX, SCALAR_TYPE, SCALAR_POSTFIX) \ EIGEN_EULER_ANGLES_SINGLE_TYPEDEF(ZYZ, SCALAR_TYPE, SCALAR_POSTFIX) EIGEN_EULER_ANGLES_TYPEDEFS(float, f) EIGEN_EULER_ANGLES_TYPEDEFS(double, d) namespace internal { template struct traits > { typedef _Scalar Scalar; }; // set from a rotation matrix template struct eulerangles_assign_impl { typedef typename Other::Scalar Scalar; static void run(EulerAngles& e, const Other& m) { System::CalcEulerAngles(e, m); } }; // set from a vector of Euler angles template struct eulerangles_assign_impl { typedef typename Other::Scalar Scalar; static void run(EulerAngles& e, const Other& vec) { e.angles() = vec; } }; } } #endif // EIGEN_EULERANGLESCLASS_H RcppEigen/inst/include/unsupported/Eigen/src/Skyline/0000755000176200001440000000000014562417221022356 5ustar liggesusersRcppEigen/inst/include/unsupported/Eigen/src/Skyline/SkylineProduct.h0000644000176200001440000002514514107270226025512 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2009 Guillaume Saupin // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SKYLINEPRODUCT_H #define EIGEN_SKYLINEPRODUCT_H namespace Eigen { template struct SkylineProductReturnType { typedef const typename internal::nested_eval::type LhsNested; typedef const typename internal::nested_eval::type RhsNested; typedef SkylineProduct Type; }; template struct internal::traits > { // clean the nested types: typedef typename internal::remove_all::type _LhsNested; typedef typename internal::remove_all::type _RhsNested; typedef typename _LhsNested::Scalar Scalar; enum { LhsCoeffReadCost = _LhsNested::CoeffReadCost, RhsCoeffReadCost = _RhsNested::CoeffReadCost, LhsFlags = _LhsNested::Flags, RhsFlags = _RhsNested::Flags, RowsAtCompileTime = _LhsNested::RowsAtCompileTime, ColsAtCompileTime = _RhsNested::ColsAtCompileTime, InnerSize = EIGEN_SIZE_MIN_PREFER_FIXED(_LhsNested::ColsAtCompileTime, _RhsNested::RowsAtCompileTime), MaxRowsAtCompileTime = _LhsNested::MaxRowsAtCompileTime, MaxColsAtCompileTime = _RhsNested::MaxColsAtCompileTime, EvalToRowMajor = (RhsFlags & LhsFlags & RowMajorBit), ResultIsSkyline = ProductMode == SkylineTimeSkylineProduct, RemovedBits = ~((EvalToRowMajor ? 0 : RowMajorBit) | (ResultIsSkyline ? 0 : SkylineBit)), Flags = (int(LhsFlags | RhsFlags) & HereditaryBits & RemovedBits) | EvalBeforeAssigningBit | EvalBeforeNestingBit, CoeffReadCost = HugeCost }; typedef typename internal::conditional >, MatrixBase > >::type Base; }; namespace internal { template class SkylineProduct : no_assignment_operator, public traits >::Base { public: EIGEN_GENERIC_PUBLIC_INTERFACE(SkylineProduct) private: typedef typename traits::_LhsNested _LhsNested; typedef typename traits::_RhsNested _RhsNested; public: template EIGEN_STRONG_INLINE SkylineProduct(const Lhs& lhs, const Rhs& rhs) : m_lhs(lhs), m_rhs(rhs) { eigen_assert(lhs.cols() == rhs.rows()); enum { ProductIsValid = _LhsNested::ColsAtCompileTime == Dynamic || _RhsNested::RowsAtCompileTime == Dynamic || int(_LhsNested::ColsAtCompileTime) == int(_RhsNested::RowsAtCompileTime), AreVectors = _LhsNested::IsVectorAtCompileTime && _RhsNested::IsVectorAtCompileTime, SameSizes = EIGEN_PREDICATE_SAME_MATRIX_SIZE(_LhsNested, _RhsNested) }; // note to the lost user: // * for a dot product use: v1.dot(v2) // * for a coeff-wise product use: v1.cwise()*v2 EIGEN_STATIC_ASSERT(ProductIsValid || !(AreVectors && SameSizes), INVALID_VECTOR_VECTOR_PRODUCT__IF_YOU_WANTED_A_DOT_OR_COEFF_WISE_PRODUCT_YOU_MUST_USE_THE_EXPLICIT_FUNCTIONS) EIGEN_STATIC_ASSERT(ProductIsValid || !(SameSizes && !AreVectors), INVALID_MATRIX_PRODUCT__IF_YOU_WANTED_A_COEFF_WISE_PRODUCT_YOU_MUST_USE_THE_EXPLICIT_FUNCTION) EIGEN_STATIC_ASSERT(ProductIsValid || SameSizes, INVALID_MATRIX_PRODUCT) } EIGEN_STRONG_INLINE Index rows() const { return m_lhs.rows(); } EIGEN_STRONG_INLINE Index cols() const { return m_rhs.cols(); } EIGEN_STRONG_INLINE const _LhsNested& lhs() const { return m_lhs; } EIGEN_STRONG_INLINE const _RhsNested& rhs() const { return m_rhs; } protected: LhsNested m_lhs; RhsNested m_rhs; }; // dense = skyline * dense // Note that here we force no inlining and separate the setZero() because GCC messes up otherwise template EIGEN_DONT_INLINE void skyline_row_major_time_dense_product(const Lhs& lhs, const Rhs& rhs, Dest& dst) { typedef typename remove_all::type _Lhs; typedef typename remove_all::type _Rhs; typedef typename traits::Scalar Scalar; enum { LhsIsRowMajor = (_Lhs::Flags & RowMajorBit) == RowMajorBit, LhsIsSelfAdjoint = (_Lhs::Flags & SelfAdjointBit) == SelfAdjointBit, ProcessFirstHalf = LhsIsSelfAdjoint && (((_Lhs::Flags & (UpperTriangularBit | LowerTriangularBit)) == 0) || ((_Lhs::Flags & UpperTriangularBit) && !LhsIsRowMajor) || ((_Lhs::Flags & LowerTriangularBit) && LhsIsRowMajor)), ProcessSecondHalf = LhsIsSelfAdjoint && (!ProcessFirstHalf) }; //Use matrix diagonal part <- Improvement : use inner iterator on dense matrix. for (Index col = 0; col < rhs.cols(); col++) { for (Index row = 0; row < lhs.rows(); row++) { dst(row, col) = lhs.coeffDiag(row) * rhs(row, col); } } //Use matrix lower triangular part for (Index row = 0; row < lhs.rows(); row++) { typename _Lhs::InnerLowerIterator lIt(lhs, row); const Index stop = lIt.col() + lIt.size(); for (Index col = 0; col < rhs.cols(); col++) { Index k = lIt.col(); Scalar tmp = 0; while (k < stop) { tmp += lIt.value() * rhs(k++, col); ++lIt; } dst(row, col) += tmp; lIt += -lIt.size(); } } //Use matrix upper triangular part for (Index lhscol = 0; lhscol < lhs.cols(); lhscol++) { typename _Lhs::InnerUpperIterator uIt(lhs, lhscol); const Index stop = uIt.size() + uIt.row(); for (Index rhscol = 0; rhscol < rhs.cols(); rhscol++) { const Scalar rhsCoeff = rhs.coeff(lhscol, rhscol); Index k = uIt.row(); while (k < stop) { dst(k++, rhscol) += uIt.value() * rhsCoeff; ++uIt; } uIt += -uIt.size(); } } } template EIGEN_DONT_INLINE void skyline_col_major_time_dense_product(const Lhs& lhs, const Rhs& rhs, Dest& dst) { typedef typename remove_all::type _Lhs; typedef typename remove_all::type _Rhs; typedef typename traits::Scalar Scalar; enum { LhsIsRowMajor = (_Lhs::Flags & RowMajorBit) == RowMajorBit, LhsIsSelfAdjoint = (_Lhs::Flags & SelfAdjointBit) == SelfAdjointBit, ProcessFirstHalf = LhsIsSelfAdjoint && (((_Lhs::Flags & (UpperTriangularBit | LowerTriangularBit)) == 0) || ((_Lhs::Flags & UpperTriangularBit) && !LhsIsRowMajor) || ((_Lhs::Flags & LowerTriangularBit) && LhsIsRowMajor)), ProcessSecondHalf = LhsIsSelfAdjoint && (!ProcessFirstHalf) }; //Use matrix diagonal part <- Improvement : use inner iterator on dense matrix. for (Index col = 0; col < rhs.cols(); col++) { for (Index row = 0; row < lhs.rows(); row++) { dst(row, col) = lhs.coeffDiag(row) * rhs(row, col); } } //Use matrix upper triangular part for (Index row = 0; row < lhs.rows(); row++) { typename _Lhs::InnerUpperIterator uIt(lhs, row); const Index stop = uIt.col() + uIt.size(); for (Index col = 0; col < rhs.cols(); col++) { Index k = uIt.col(); Scalar tmp = 0; while (k < stop) { tmp += uIt.value() * rhs(k++, col); ++uIt; } dst(row, col) += tmp; uIt += -uIt.size(); } } //Use matrix lower triangular part for (Index lhscol = 0; lhscol < lhs.cols(); lhscol++) { typename _Lhs::InnerLowerIterator lIt(lhs, lhscol); const Index stop = lIt.size() + lIt.row(); for (Index rhscol = 0; rhscol < rhs.cols(); rhscol++) { const Scalar rhsCoeff = rhs.coeff(lhscol, rhscol); Index k = lIt.row(); while (k < stop) { dst(k++, rhscol) += lIt.value() * rhsCoeff; ++lIt; } lIt += -lIt.size(); } } } template::Flags&RowMajorBit> struct skyline_product_selector; template struct skyline_product_selector { typedef typename traits::type>::Scalar Scalar; static void run(const Lhs& lhs, const Rhs& rhs, ResultType & res) { skyline_row_major_time_dense_product (lhs, rhs, res); } }; template struct skyline_product_selector { typedef typename traits::type>::Scalar Scalar; static void run(const Lhs& lhs, const Rhs& rhs, ResultType & res) { skyline_col_major_time_dense_product (lhs, rhs, res); } }; } // end namespace internal // template // template // Derived & MatrixBase::lazyAssign(const SkylineProduct& product) { // typedef typename internal::remove_all::type _Lhs; // internal::skyline_product_selector::type, // typename internal::remove_all::type, // Derived>::run(product.lhs(), product.rhs(), derived()); // // return derived(); // } // skyline * dense template template EIGEN_STRONG_INLINE const typename SkylineProductReturnType::Type SkylineMatrixBase::operator*(const MatrixBase &other) const { return typename SkylineProductReturnType::Type(derived(), other.derived()); } } // end namespace Eigen #endif // EIGEN_SKYLINEPRODUCT_H RcppEigen/inst/include/unsupported/Eigen/src/Skyline/SkylineMatrixBase.h0000644000176200001440000001723514562417221026135 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2009 Guillaume Saupin // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SKYLINEMATRIXBASE_H #define EIGEN_SKYLINEMATRIXBASE_H #include "SkylineUtil.h" namespace Eigen { /** \ingroup Skyline_Module * * \class SkylineMatrixBase * * \brief Base class of any skyline matrices or skyline expressions * * \param Derived * */ template class SkylineMatrixBase : public EigenBase { public: typedef typename internal::traits::Scalar Scalar; typedef typename internal::traits::StorageKind StorageKind; typedef typename internal::index::type Index; enum { RowsAtCompileTime = internal::traits::RowsAtCompileTime, /**< The number of rows at compile-time. This is just a copy of the value provided * by the \a Derived type. If a value is not known at compile-time, * it is set to the \a Dynamic constant. * \sa MatrixBase::rows(), MatrixBase::cols(), ColsAtCompileTime, SizeAtCompileTime */ ColsAtCompileTime = internal::traits::ColsAtCompileTime, /**< The number of columns at compile-time. This is just a copy of the value provided * by the \a Derived type. If a value is not known at compile-time, * it is set to the \a Dynamic constant. * \sa MatrixBase::rows(), MatrixBase::cols(), RowsAtCompileTime, SizeAtCompileTime */ SizeAtCompileTime = (internal::size_at_compile_time::RowsAtCompileTime, internal::traits::ColsAtCompileTime>::ret), /**< This is equal to the number of coefficients, i.e. the number of * rows times the number of columns, or to \a Dynamic if this is not * known at compile-time. \sa RowsAtCompileTime, ColsAtCompileTime */ MaxRowsAtCompileTime = RowsAtCompileTime, MaxColsAtCompileTime = ColsAtCompileTime, MaxSizeAtCompileTime = (internal::size_at_compile_time::ret), IsVectorAtCompileTime = RowsAtCompileTime == 1 || ColsAtCompileTime == 1, /**< This is set to true if either the number of rows or the number of * columns is known at compile-time to be equal to 1. Indeed, in that case, * we are dealing with a column-vector (if there is only one column) or with * a row-vector (if there is only one row). */ Flags = internal::traits::Flags, /**< This stores expression \ref flags flags which may or may not be inherited by new expressions * constructed from this one. See the \ref flags "list of flags". */ CoeffReadCost = internal::traits::CoeffReadCost, /**< This is a rough measure of how expensive it is to read one coefficient from * this expression. */ IsRowMajor = Flags & RowMajorBit ? 1 : 0 }; #ifndef EIGEN_PARSED_BY_DOXYGEN /** This is the "real scalar" type; if the \a Scalar type is already real numbers * (e.g. int, float or double) then \a RealScalar is just the same as \a Scalar. If * \a Scalar is \a std::complex then RealScalar is \a T. * * \sa class NumTraits */ typedef typename NumTraits::Real RealScalar; /** type of the equivalent square matrix */ typedef Matrix SquareMatrixType; inline const Derived& derived() const { return *static_cast (this); } inline Derived& derived() { return *static_cast (this); } inline Derived& const_cast_derived() const { return *static_cast (const_cast (this)); } #endif // not EIGEN_PARSED_BY_DOXYGEN /** \returns the number of rows. \sa cols(), RowsAtCompileTime */ inline EIGEN_CONSTEXPR Index rows() const EIGEN_NOEXCEPT { return derived().rows(); } /** \returns the number of columns. \sa rows(), ColsAtCompileTime*/ inline EIGEN_CONSTEXPR Index cols() const EIGEN_NOEXCEPT { return derived().cols(); } /** \returns the number of coefficients, which is \a rows()*cols(). * \sa rows(), cols(), SizeAtCompileTime. */ inline EIGEN_CONSTEXPR Index size() const EIGEN_NOEXCEPT { return rows() * cols(); } /** \returns the number of nonzero coefficients which is in practice the number * of stored coefficients. */ inline Index nonZeros() const { return derived().nonZeros(); } /** \returns the size of the storage major dimension, * i.e., the number of columns for a columns major matrix, and the number of rows otherwise */ Index outerSize() const { return (int(Flags) & RowMajorBit) ? this->rows() : this->cols(); } /** \returns the size of the inner dimension according to the storage order, * i.e., the number of rows for a columns major matrix, and the number of cols otherwise */ Index innerSize() const { return (int(Flags) & RowMajorBit) ? this->cols() : this->rows(); } bool isRValue() const { return m_isRValue; } Derived& markAsRValue() { m_isRValue = true; return derived(); } SkylineMatrixBase() : m_isRValue(false) { /* TODO check flags */ } inline Derived & operator=(const Derived& other) { this->operator= (other); return derived(); } template inline void assignGeneric(const OtherDerived& other) { derived().resize(other.rows(), other.cols()); for (Index row = 0; row < rows(); row++) for (Index col = 0; col < cols(); col++) { if (other.coeff(row, col) != Scalar(0)) derived().insert(row, col) = other.coeff(row, col); } derived().finalize(); } template inline Derived & operator=(const SkylineMatrixBase& other) { //TODO } template inline Derived & operator=(const SkylineProduct& product); friend std::ostream & operator <<(std::ostream & s, const SkylineMatrixBase& m) { s << m.derived(); return s; } template const typename SkylineProductReturnType::Type operator*(const MatrixBase &other) const; /** \internal use operator= */ template void evalTo(MatrixBase& dst) const { dst.setZero(); for (Index i = 0; i < rows(); i++) for (Index j = 0; j < rows(); j++) dst(i, j) = derived().coeff(i, j); } Matrix toDense() const { return derived(); } /** \returns the matrix or vector obtained by evaluating this expression. * * Notice that in the case of a plain matrix or vector (not an expression) this function just returns * a const reference, in order to avoid a useless copy. */ EIGEN_STRONG_INLINE const typename internal::eval::type eval() const { return typename internal::eval::type(derived()); } protected: bool m_isRValue; }; } // end namespace Eigen #endif // EIGEN_SKYLINEMATRIXBASE_H RcppEigen/inst/include/unsupported/Eigen/src/Skyline/SkylineStorage.h0000644000176200001440000001743614562417221025505 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2009 Guillaume Saupin // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SKYLINE_STORAGE_H #define EIGEN_SKYLINE_STORAGE_H namespace Eigen { /** Stores a skyline set of values in three structures : * The diagonal elements * The upper elements * The lower elements * */ template class SkylineStorage { typedef typename NumTraits::Real RealScalar; typedef SparseIndex Index; public: SkylineStorage() : m_diag(0), m_lower(0), m_upper(0), m_lowerProfile(0), m_upperProfile(0), m_diagSize(0), m_upperSize(0), m_lowerSize(0), m_upperProfileSize(0), m_lowerProfileSize(0), m_allocatedSize(0) { } SkylineStorage(const SkylineStorage& other) : m_diag(0), m_lower(0), m_upper(0), m_lowerProfile(0), m_upperProfile(0), m_diagSize(0), m_upperSize(0), m_lowerSize(0), m_upperProfileSize(0), m_lowerProfileSize(0), m_allocatedSize(0) { *this = other; } SkylineStorage & operator=(const SkylineStorage& other) { resize(other.diagSize(), other.m_upperProfileSize, other.m_lowerProfileSize, other.upperSize(), other.lowerSize()); memcpy(m_diag, other.m_diag, m_diagSize * sizeof (Scalar)); memcpy(m_upper, other.m_upper, other.upperSize() * sizeof (Scalar)); memcpy(m_lower, other.m_lower, other.lowerSize() * sizeof (Scalar)); memcpy(m_upperProfile, other.m_upperProfile, m_upperProfileSize * sizeof (Index)); memcpy(m_lowerProfile, other.m_lowerProfile, m_lowerProfileSize * sizeof (Index)); return *this; } void swap(SkylineStorage& other) { std::swap(m_diag, other.m_diag); std::swap(m_upper, other.m_upper); std::swap(m_lower, other.m_lower); std::swap(m_upperProfile, other.m_upperProfile); std::swap(m_lowerProfile, other.m_lowerProfile); std::swap(m_diagSize, other.m_diagSize); std::swap(m_upperSize, other.m_upperSize); std::swap(m_lowerSize, other.m_lowerSize); std::swap(m_allocatedSize, other.m_allocatedSize); } ~SkylineStorage() { delete[] m_diag; delete[] m_upper; if (m_upper != m_lower) delete[] m_lower; delete[] m_upperProfile; delete[] m_lowerProfile; } void reserve(Index size, Index upperProfileSize, Index lowerProfileSize, Index upperSize, Index lowerSize) { Index newAllocatedSize = size + upperSize + lowerSize; if (newAllocatedSize > m_allocatedSize) reallocate(size, upperProfileSize, lowerProfileSize, upperSize, lowerSize); } void squeeze() { if (m_allocatedSize > m_diagSize + m_upperSize + m_lowerSize) reallocate(m_diagSize, m_upperProfileSize, m_lowerProfileSize, m_upperSize, m_lowerSize); } void resize(Index diagSize, Index upperProfileSize, Index lowerProfileSize, Index upperSize, Index lowerSize, float reserveSizeFactor = 0) { if (m_allocatedSize < diagSize + upperSize + lowerSize) reallocate(diagSize, upperProfileSize, lowerProfileSize, upperSize + Index(reserveSizeFactor * upperSize), lowerSize + Index(reserveSizeFactor * lowerSize)); m_diagSize = diagSize; m_upperSize = upperSize; m_lowerSize = lowerSize; m_upperProfileSize = upperProfileSize; m_lowerProfileSize = lowerProfileSize; } inline Index diagSize() const { return m_diagSize; } inline Index upperSize() const { return m_upperSize; } inline Index lowerSize() const { return m_lowerSize; } inline Index upperProfileSize() const { return m_upperProfileSize; } inline Index lowerProfileSize() const { return m_lowerProfileSize; } inline Index allocatedSize() const { return m_allocatedSize; } inline void clear() { m_diagSize = 0; } inline Scalar& diag(Index i) { return m_diag[i]; } inline const Scalar& diag(Index i) const { return m_diag[i]; } inline Scalar& upper(Index i) { return m_upper[i]; } inline const Scalar& upper(Index i) const { return m_upper[i]; } inline Scalar& lower(Index i) { return m_lower[i]; } inline const Scalar& lower(Index i) const { return m_lower[i]; } inline Index& upperProfile(Index i) { return m_upperProfile[i]; } inline const Index& upperProfile(Index i) const { return m_upperProfile[i]; } inline Index& lowerProfile(Index i) { return m_lowerProfile[i]; } inline const Index& lowerProfile(Index i) const { return m_lowerProfile[i]; } static SkylineStorage Map(Index* upperProfile, Index* lowerProfile, Scalar* diag, Scalar* upper, Scalar* lower, Index size, Index upperSize, Index lowerSize) { SkylineStorage res; res.m_upperProfile = upperProfile; res.m_lowerProfile = lowerProfile; res.m_diag = diag; res.m_upper = upper; res.m_lower = lower; res.m_allocatedSize = res.m_diagSize = size; res.m_upperSize = upperSize; res.m_lowerSize = lowerSize; return res; } inline void reset() { memset(m_diag, 0, m_diagSize * sizeof (Scalar)); memset(m_upper, 0, m_upperSize * sizeof (Scalar)); memset(m_lower, 0, m_lowerSize * sizeof (Scalar)); memset(m_upperProfile, 0, m_diagSize * sizeof (Index)); memset(m_lowerProfile, 0, m_diagSize * sizeof (Index)); } void prune(Scalar reference, RealScalar epsilon = dummy_precision()) { //TODO } protected: inline void reallocate(Index diagSize, Index upperProfileSize, Index lowerProfileSize, Index upperSize, Index lowerSize) { Scalar* diag = new Scalar[diagSize]; Scalar* upper = new Scalar[upperSize]; Scalar* lower = new Scalar[lowerSize]; Index* upperProfile = new Index[upperProfileSize]; Index* lowerProfile = new Index[lowerProfileSize]; Index copyDiagSize = (std::min)(diagSize, m_diagSize); Index copyUpperSize = (std::min)(upperSize, m_upperSize); Index copyLowerSize = (std::min)(lowerSize, m_lowerSize); Index copyUpperProfileSize = (std::min)(upperProfileSize, m_upperProfileSize); Index copyLowerProfileSize = (std::min)(lowerProfileSize, m_lowerProfileSize); // copy memcpy(diag, m_diag, copyDiagSize * sizeof (Scalar)); memcpy(upper, m_upper, copyUpperSize * sizeof (Scalar)); memcpy(lower, m_lower, copyLowerSize * sizeof (Scalar)); memcpy(upperProfile, m_upperProfile, copyUpperProfileSize * sizeof (Index)); memcpy(lowerProfile, m_lowerProfile, copyLowerProfileSize * sizeof (Index)); // delete old stuff delete[] m_diag; delete[] m_upper; delete[] m_lower; delete[] m_upperProfile; delete[] m_lowerProfile; m_diag = diag; m_upper = upper; m_lower = lower; m_upperProfile = upperProfile; m_lowerProfile = lowerProfile; m_allocatedSize = diagSize + upperSize + lowerSize; m_upperSize = upperSize; m_lowerSize = lowerSize; } public: Scalar* m_diag; Scalar* m_upper; Scalar* m_lower; Index* m_upperProfile; Index* m_lowerProfile; Index m_diagSize; Index m_upperSize; Index m_lowerSize; Index m_upperProfileSize; Index m_lowerProfileSize; Index m_allocatedSize; }; } // end namespace Eigen #endif // EIGEN_SKYLINE_STORAGE_H RcppEigen/inst/include/unsupported/Eigen/src/Skyline/SkylineInplaceLU.h0000644000176200001440000002614514562417221025712 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008 Guillaume Saupin // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SKYLINEINPLACELU_H #define EIGEN_SKYLINEINPLACELU_H namespace Eigen { /** \ingroup Skyline_Module * * \class SkylineInplaceLU * * \brief Inplace LU decomposition of a skyline matrix and associated features * * \param MatrixType the type of the matrix of which we are computing the LU factorization * */ template class SkylineInplaceLU { protected: typedef typename MatrixType::Scalar Scalar; typedef typename MatrixType::Index Index; typedef typename NumTraits::Real RealScalar; public: /** Creates a LU object and compute the respective factorization of \a matrix using * flags \a flags. */ SkylineInplaceLU(MatrixType& matrix, int flags = 0) : /*m_matrix(matrix.rows(), matrix.cols()),*/ m_flags(flags), m_status(0), m_lu(matrix) { m_precision = RealScalar(0.1) * Eigen::dummy_precision (); m_lu.IsRowMajor ? computeRowMajor() : compute(); } /** Sets the relative threshold value used to prune zero coefficients during the decomposition. * * Setting a value greater than zero speeds up computation, and yields to an incomplete * factorization with fewer non zero coefficients. Such approximate factors are especially * useful to initialize an iterative solver. * * Note that the exact meaning of this parameter might depends on the actual * backend. Moreover, not all backends support this feature. * * \sa precision() */ void setPrecision(RealScalar v) { m_precision = v; } /** \returns the current precision. * * \sa setPrecision() */ RealScalar precision() const { return m_precision; } /** Sets the flags. Possible values are: * - CompleteFactorization * - IncompleteFactorization * - MemoryEfficient * - one of the ordering methods * - etc... * * \sa flags() */ void setFlags(int f) { m_flags = f; } /** \returns the current flags */ int flags() const { return m_flags; } void setOrderingMethod(int m) { m_flags = m; } int orderingMethod() const { return m_flags; } /** Computes/re-computes the LU factorization */ void compute(); void computeRowMajor(); /** \returns the lower triangular matrix L */ //inline const MatrixType& matrixL() const { return m_matrixL; } /** \returns the upper triangular matrix U */ //inline const MatrixType& matrixU() const { return m_matrixU; } template bool solve(const MatrixBase &b, MatrixBase* x, const int transposed = 0) const; /** \returns true if the factorization succeeded */ inline bool succeeded(void) const { return m_succeeded; } protected: RealScalar m_precision; int m_flags; mutable int m_status; bool m_succeeded; MatrixType& m_lu; }; /** Computes / recomputes the in place LU decomposition of the SkylineInplaceLU. * using the default algorithm. */ template //template void SkylineInplaceLU::compute() { const size_t rows = m_lu.rows(); const size_t cols = m_lu.cols(); eigen_assert(rows == cols && "We do not (yet) support rectangular LU."); eigen_assert(!m_lu.IsRowMajor && "LU decomposition does not work with rowMajor Storage"); for (Index row = 0; row < rows; row++) { const double pivot = m_lu.coeffDiag(row); //Lower matrix Columns update const Index& col = row; for (typename MatrixType::InnerLowerIterator lIt(m_lu, col); lIt; ++lIt) { lIt.valueRef() /= pivot; } //Upper matrix update -> contiguous memory access typename MatrixType::InnerLowerIterator lIt(m_lu, col); for (Index rrow = row + 1; rrow < m_lu.rows(); rrow++) { typename MatrixType::InnerUpperIterator uItPivot(m_lu, row); typename MatrixType::InnerUpperIterator uIt(m_lu, rrow); const double coef = lIt.value(); uItPivot += (rrow - row - 1); //update upper part -> contiguous memory access for (++uItPivot; uIt && uItPivot;) { uIt.valueRef() -= uItPivot.value() * coef; ++uIt; ++uItPivot; } ++lIt; } //Upper matrix update -> non contiguous memory access typename MatrixType::InnerLowerIterator lIt3(m_lu, col); for (Index rrow = row + 1; rrow < m_lu.rows(); rrow++) { typename MatrixType::InnerUpperIterator uItPivot(m_lu, row); const double coef = lIt3.value(); //update lower part -> non contiguous memory access for (Index i = 0; i < rrow - row - 1; i++) { m_lu.coeffRefLower(rrow, row + i + 1) -= uItPivot.value() * coef; ++uItPivot; } ++lIt3; } //update diag -> contiguous typename MatrixType::InnerLowerIterator lIt2(m_lu, col); for (Index rrow = row + 1; rrow < m_lu.rows(); rrow++) { typename MatrixType::InnerUpperIterator uItPivot(m_lu, row); typename MatrixType::InnerUpperIterator uIt(m_lu, rrow); const double coef = lIt2.value(); uItPivot += (rrow - row - 1); m_lu.coeffRefDiag(rrow) -= uItPivot.value() * coef; ++lIt2; } } } template void SkylineInplaceLU::computeRowMajor() { const size_t rows = m_lu.rows(); const size_t cols = m_lu.cols(); eigen_assert(rows == cols && "We do not (yet) support rectangular LU."); eigen_assert(m_lu.IsRowMajor && "You're trying to apply rowMajor decomposition on a ColMajor matrix !"); for (Index row = 0; row < rows; row++) { typename MatrixType::InnerLowerIterator llIt(m_lu, row); for (Index col = llIt.col(); col < row; col++) { if (m_lu.coeffExistLower(row, col)) { const double diag = m_lu.coeffDiag(col); typename MatrixType::InnerLowerIterator lIt(m_lu, row); typename MatrixType::InnerUpperIterator uIt(m_lu, col); const Index offset = lIt.col() - uIt.row(); Index stop = offset > 0 ? col - lIt.col() : col - uIt.row(); //#define VECTORIZE #ifdef VECTORIZE Map rowVal(lIt.valuePtr() + (offset > 0 ? 0 : -offset), stop); Map colVal(uIt.valuePtr() + (offset > 0 ? offset : 0), stop); Scalar newCoeff = m_lu.coeffLower(row, col) - rowVal.dot(colVal); #else if (offset > 0) //Skip zero value of lIt uIt += offset; else //Skip zero values of uIt lIt += -offset; Scalar newCoeff = m_lu.coeffLower(row, col); for (Index k = 0; k < stop; ++k) { const Scalar tmp = newCoeff; newCoeff = tmp - lIt.value() * uIt.value(); ++lIt; ++uIt; } #endif m_lu.coeffRefLower(row, col) = newCoeff / diag; } } //Upper matrix update const Index col = row; typename MatrixType::InnerUpperIterator uuIt(m_lu, col); for (Index rrow = uuIt.row(); rrow < col; rrow++) { typename MatrixType::InnerLowerIterator lIt(m_lu, rrow); typename MatrixType::InnerUpperIterator uIt(m_lu, col); const Index offset = lIt.col() - uIt.row(); Index stop = offset > 0 ? rrow - lIt.col() : rrow - uIt.row(); #ifdef VECTORIZE Map rowVal(lIt.valuePtr() + (offset > 0 ? 0 : -offset), stop); Map colVal(uIt.valuePtr() + (offset > 0 ? offset : 0), stop); Scalar newCoeff = m_lu.coeffUpper(rrow, col) - rowVal.dot(colVal); #else if (offset > 0) //Skip zero value of lIt uIt += offset; else //Skip zero values of uIt lIt += -offset; Scalar newCoeff = m_lu.coeffUpper(rrow, col); for (Index k = 0; k < stop; ++k) { const Scalar tmp = newCoeff; newCoeff = tmp - lIt.value() * uIt.value(); ++lIt; ++uIt; } #endif m_lu.coeffRefUpper(rrow, col) = newCoeff; } //Diag matrix update typename MatrixType::InnerLowerIterator lIt(m_lu, row); typename MatrixType::InnerUpperIterator uIt(m_lu, row); const Index offset = lIt.col() - uIt.row(); Index stop = offset > 0 ? lIt.size() : uIt.size(); #ifdef VECTORIZE Map rowVal(lIt.valuePtr() + (offset > 0 ? 0 : -offset), stop); Map colVal(uIt.valuePtr() + (offset > 0 ? offset : 0), stop); Scalar newCoeff = m_lu.coeffDiag(row) - rowVal.dot(colVal); #else if (offset > 0) //Skip zero value of lIt uIt += offset; else //Skip zero values of uIt lIt += -offset; Scalar newCoeff = m_lu.coeffDiag(row); for (Index k = 0; k < stop; ++k) { const Scalar tmp = newCoeff; newCoeff = tmp - lIt.value() * uIt.value(); ++lIt; ++uIt; } #endif m_lu.coeffRefDiag(row) = newCoeff; } } /** Computes *x = U^-1 L^-1 b * * If \a transpose is set to SvTranspose or SvAdjoint, the solution * of the transposed/adjoint system is computed instead. * * Not all backends implement the solution of the transposed or * adjoint system. */ template template bool SkylineInplaceLU::solve(const MatrixBase &b, MatrixBase* x, const int transposed) const { const size_t rows = m_lu.rows(); const size_t cols = m_lu.cols(); for (Index row = 0; row < rows; row++) { x->coeffRef(row) = b.coeff(row); Scalar newVal = x->coeff(row); typename MatrixType::InnerLowerIterator lIt(m_lu, row); Index col = lIt.col(); while (lIt.col() < row) { newVal -= x->coeff(col++) * lIt.value(); ++lIt; } x->coeffRef(row) = newVal; } for (Index col = rows - 1; col > 0; col--) { x->coeffRef(col) = x->coeff(col) / m_lu.coeffDiag(col); const Scalar x_col = x->coeff(col); typename MatrixType::InnerUpperIterator uIt(m_lu, col); uIt += uIt.size()-1; while (uIt) { x->coeffRef(uIt.row()) -= x_col * uIt.value(); //TODO : introduce --operator uIt += -1; } } x->coeffRef(0) = x->coeff(0) / m_lu.coeffDiag(0); return true; } } // end namespace Eigen #endif // EIGEN_SKYLINEINPLACELU_H RcppEigen/inst/include/unsupported/Eigen/src/Skyline/SkylineMatrix.h0000644000176200001440000007460114562417221025342 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2009 Guillaume Saupin // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SKYLINEMATRIX_H #define EIGEN_SKYLINEMATRIX_H #include "SkylineStorage.h" #include "SkylineMatrixBase.h" namespace Eigen { /** \ingroup Skyline_Module * * \class SkylineMatrix * * \brief The main skyline matrix class * * This class implements a skyline matrix using the very uncommon storage * scheme. * * \param _Scalar the scalar type, i.e. the type of the coefficients * \param _Options Union of bit flags controlling the storage scheme. Currently the only possibility * is RowMajor. The default is 0 which means column-major. * * */ namespace internal { template struct traits > { typedef _Scalar Scalar; typedef Sparse StorageKind; enum { RowsAtCompileTime = Dynamic, ColsAtCompileTime = Dynamic, MaxRowsAtCompileTime = Dynamic, MaxColsAtCompileTime = Dynamic, Flags = SkylineBit | _Options, CoeffReadCost = NumTraits::ReadCost, }; }; } template class SkylineMatrix : public SkylineMatrixBase > { public: EIGEN_SKYLINE_GENERIC_PUBLIC_INTERFACE(SkylineMatrix) EIGEN_SKYLINE_INHERIT_ASSIGNMENT_OPERATOR(SkylineMatrix, +=) EIGEN_SKYLINE_INHERIT_ASSIGNMENT_OPERATOR(SkylineMatrix, -=) using Base::IsRowMajor; protected: typedef SkylineMatrix TransposedSkylineMatrix; Index m_outerSize; Index m_innerSize; public: Index* m_colStartIndex; Index* m_rowStartIndex; SkylineStorage m_data; public: inline Index rows() const { return IsRowMajor ? m_outerSize : m_innerSize; } inline Index cols() const { return IsRowMajor ? m_innerSize : m_outerSize; } inline Index innerSize() const { return m_innerSize; } inline Index outerSize() const { return m_outerSize; } inline Index upperNonZeros() const { return m_data.upperSize(); } inline Index lowerNonZeros() const { return m_data.lowerSize(); } inline Index upperNonZeros(Index j) const { return m_colStartIndex[j + 1] - m_colStartIndex[j]; } inline Index lowerNonZeros(Index j) const { return m_rowStartIndex[j + 1] - m_rowStartIndex[j]; } inline const Scalar* _diagPtr() const { return &m_data.diag(0); } inline Scalar* _diagPtr() { return &m_data.diag(0); } inline const Scalar* _upperPtr() const { return &m_data.upper(0); } inline Scalar* _upperPtr() { return &m_data.upper(0); } inline const Scalar* _lowerPtr() const { return &m_data.lower(0); } inline Scalar* _lowerPtr() { return &m_data.lower(0); } inline const Index* _upperProfilePtr() const { return &m_data.upperProfile(0); } inline Index* _upperProfilePtr() { return &m_data.upperProfile(0); } inline const Index* _lowerProfilePtr() const { return &m_data.lowerProfile(0); } inline Index* _lowerProfilePtr() { return &m_data.lowerProfile(0); } inline Scalar coeff(Index row, Index col) const { const Index outer = IsRowMajor ? row : col; const Index inner = IsRowMajor ? col : row; eigen_assert(outer < outerSize()); eigen_assert(inner < innerSize()); if (outer == inner) return this->m_data.diag(outer); if (IsRowMajor) { if (inner > outer) //upper matrix { const Index minOuterIndex = inner - m_data.upperProfile(inner); if (outer >= minOuterIndex) return this->m_data.upper(m_colStartIndex[inner] + outer - (inner - m_data.upperProfile(inner))); else return Scalar(0); } if (inner < outer) //lower matrix { const Index minInnerIndex = outer - m_data.lowerProfile(outer); if (inner >= minInnerIndex) return this->m_data.lower(m_rowStartIndex[outer] + inner - (outer - m_data.lowerProfile(outer))); else return Scalar(0); } return m_data.upper(m_colStartIndex[inner] + outer - inner); } else { if (outer > inner) //upper matrix { const Index maxOuterIndex = inner + m_data.upperProfile(inner); if (outer <= maxOuterIndex) return this->m_data.upper(m_colStartIndex[inner] + (outer - inner)); else return Scalar(0); } if (outer < inner) //lower matrix { const Index maxInnerIndex = outer + m_data.lowerProfile(outer); if (inner <= maxInnerIndex) return this->m_data.lower(m_rowStartIndex[outer] + (inner - outer)); else return Scalar(0); } } } inline Scalar& coeffRef(Index row, Index col) { const Index outer = IsRowMajor ? row : col; const Index inner = IsRowMajor ? col : row; eigen_assert(outer < outerSize()); eigen_assert(inner < innerSize()); if (outer == inner) return this->m_data.diag(outer); if (IsRowMajor) { if (col > row) //upper matrix { const Index minOuterIndex = inner - m_data.upperProfile(inner); eigen_assert(outer >= minOuterIndex && "You tried to access a coeff that does not exist in the storage"); return this->m_data.upper(m_colStartIndex[inner] + outer - (inner - m_data.upperProfile(inner))); } if (col < row) //lower matrix { const Index minInnerIndex = outer - m_data.lowerProfile(outer); eigen_assert(inner >= minInnerIndex && "You tried to access a coeff that does not exist in the storage"); return this->m_data.lower(m_rowStartIndex[outer] + inner - (outer - m_data.lowerProfile(outer))); } } else { if (outer > inner) //upper matrix { const Index maxOuterIndex = inner + m_data.upperProfile(inner); eigen_assert(outer <= maxOuterIndex && "You tried to access a coeff that does not exist in the storage"); return this->m_data.upper(m_colStartIndex[inner] + (outer - inner)); } if (outer < inner) //lower matrix { const Index maxInnerIndex = outer + m_data.lowerProfile(outer); eigen_assert(inner <= maxInnerIndex && "You tried to access a coeff that does not exist in the storage"); return this->m_data.lower(m_rowStartIndex[outer] + (inner - outer)); } } } inline Scalar coeffDiag(Index idx) const { eigen_assert(idx < outerSize()); eigen_assert(idx < innerSize()); return this->m_data.diag(idx); } inline Scalar coeffLower(Index row, Index col) const { const Index outer = IsRowMajor ? row : col; const Index inner = IsRowMajor ? col : row; eigen_assert(outer < outerSize()); eigen_assert(inner < innerSize()); eigen_assert(inner != outer); if (IsRowMajor) { const Index minInnerIndex = outer - m_data.lowerProfile(outer); if (inner >= minInnerIndex) return this->m_data.lower(m_rowStartIndex[outer] + inner - (outer - m_data.lowerProfile(outer))); else return Scalar(0); } else { const Index maxInnerIndex = outer + m_data.lowerProfile(outer); if (inner <= maxInnerIndex) return this->m_data.lower(m_rowStartIndex[outer] + (inner - outer)); else return Scalar(0); } } inline Scalar coeffUpper(Index row, Index col) const { const Index outer = IsRowMajor ? row : col; const Index inner = IsRowMajor ? col : row; eigen_assert(outer < outerSize()); eigen_assert(inner < innerSize()); eigen_assert(inner != outer); if (IsRowMajor) { const Index minOuterIndex = inner - m_data.upperProfile(inner); if (outer >= minOuterIndex) return this->m_data.upper(m_colStartIndex[inner] + outer - (inner - m_data.upperProfile(inner))); else return Scalar(0); } else { const Index maxOuterIndex = inner + m_data.upperProfile(inner); if (outer <= maxOuterIndex) return this->m_data.upper(m_colStartIndex[inner] + (outer - inner)); else return Scalar(0); } } inline Scalar& coeffRefDiag(Index idx) { eigen_assert(idx < outerSize()); eigen_assert(idx < innerSize()); return this->m_data.diag(idx); } inline Scalar& coeffRefLower(Index row, Index col) { const Index outer = IsRowMajor ? row : col; const Index inner = IsRowMajor ? col : row; eigen_assert(outer < outerSize()); eigen_assert(inner < innerSize()); eigen_assert(inner != outer); if (IsRowMajor) { const Index minInnerIndex = outer - m_data.lowerProfile(outer); eigen_assert(inner >= minInnerIndex && "You tried to access a coeff that does not exist in the storage"); return this->m_data.lower(m_rowStartIndex[outer] + inner - (outer - m_data.lowerProfile(outer))); } else { const Index maxInnerIndex = outer + m_data.lowerProfile(outer); eigen_assert(inner <= maxInnerIndex && "You tried to access a coeff that does not exist in the storage"); return this->m_data.lower(m_rowStartIndex[outer] + (inner - outer)); } } inline bool coeffExistLower(Index row, Index col) { const Index outer = IsRowMajor ? row : col; const Index inner = IsRowMajor ? col : row; eigen_assert(outer < outerSize()); eigen_assert(inner < innerSize()); eigen_assert(inner != outer); if (IsRowMajor) { const Index minInnerIndex = outer - m_data.lowerProfile(outer); return inner >= minInnerIndex; } else { const Index maxInnerIndex = outer + m_data.lowerProfile(outer); return inner <= maxInnerIndex; } } inline Scalar& coeffRefUpper(Index row, Index col) { const Index outer = IsRowMajor ? row : col; const Index inner = IsRowMajor ? col : row; eigen_assert(outer < outerSize()); eigen_assert(inner < innerSize()); eigen_assert(inner != outer); if (IsRowMajor) { const Index minOuterIndex = inner - m_data.upperProfile(inner); eigen_assert(outer >= minOuterIndex && "You tried to access a coeff that does not exist in the storage"); return this->m_data.upper(m_colStartIndex[inner] + outer - (inner - m_data.upperProfile(inner))); } else { const Index maxOuterIndex = inner + m_data.upperProfile(inner); eigen_assert(outer <= maxOuterIndex && "You tried to access a coeff that does not exist in the storage"); return this->m_data.upper(m_colStartIndex[inner] + (outer - inner)); } } inline bool coeffExistUpper(Index row, Index col) { const Index outer = IsRowMajor ? row : col; const Index inner = IsRowMajor ? col : row; eigen_assert(outer < outerSize()); eigen_assert(inner < innerSize()); eigen_assert(inner != outer); if (IsRowMajor) { const Index minOuterIndex = inner - m_data.upperProfile(inner); return outer >= minOuterIndex; } else { const Index maxOuterIndex = inner + m_data.upperProfile(inner); return outer <= maxOuterIndex; } } protected: public: class InnerUpperIterator; class InnerLowerIterator; class OuterUpperIterator; class OuterLowerIterator; /** Removes all non zeros */ inline void setZero() { m_data.clear(); memset(m_colStartIndex, 0, (m_outerSize + 1) * sizeof (Index)); memset(m_rowStartIndex, 0, (m_outerSize + 1) * sizeof (Index)); } /** \returns the number of non zero coefficients */ inline Index nonZeros() const { return m_data.diagSize() + m_data.upperSize() + m_data.lowerSize(); } /** Preallocates \a reserveSize non zeros */ inline void reserve(Index reserveSize, Index reserveUpperSize, Index reserveLowerSize) { m_data.reserve(reserveSize, reserveUpperSize, reserveLowerSize); } /** \returns a reference to a novel non zero coefficient with coordinates \a row x \a col. * * \warning This function can be extremely slow if the non zero coefficients * are not inserted in a coherent order. * * After an insertion session, you should call the finalize() function. */ EIGEN_DONT_INLINE Scalar & insert(Index row, Index col) { const Index outer = IsRowMajor ? row : col; const Index inner = IsRowMajor ? col : row; eigen_assert(outer < outerSize()); eigen_assert(inner < innerSize()); if (outer == inner) return m_data.diag(col); if (IsRowMajor) { if (outer < inner) //upper matrix { Index minOuterIndex = 0; minOuterIndex = inner - m_data.upperProfile(inner); if (outer < minOuterIndex) //The value does not yet exist { const Index previousProfile = m_data.upperProfile(inner); m_data.upperProfile(inner) = inner - outer; const Index bandIncrement = m_data.upperProfile(inner) - previousProfile; //shift data stored after this new one const Index stop = m_colStartIndex[cols()]; const Index start = m_colStartIndex[inner]; for (Index innerIdx = stop; innerIdx >= start; innerIdx--) { m_data.upper(innerIdx + bandIncrement) = m_data.upper(innerIdx); } for (Index innerIdx = cols(); innerIdx > inner; innerIdx--) { m_colStartIndex[innerIdx] += bandIncrement; } //zeros new data memset(this->_upperPtr() + start, 0, (bandIncrement - 1) * sizeof (Scalar)); return m_data.upper(m_colStartIndex[inner]); } else { return m_data.upper(m_colStartIndex[inner] + outer - (inner - m_data.upperProfile(inner))); } } if (outer > inner) //lower matrix { const Index minInnerIndex = outer - m_data.lowerProfile(outer); if (inner < minInnerIndex) //The value does not yet exist { const Index previousProfile = m_data.lowerProfile(outer); m_data.lowerProfile(outer) = outer - inner; const Index bandIncrement = m_data.lowerProfile(outer) - previousProfile; //shift data stored after this new one const Index stop = m_rowStartIndex[rows()]; const Index start = m_rowStartIndex[outer]; for (Index innerIdx = stop; innerIdx >= start; innerIdx--) { m_data.lower(innerIdx + bandIncrement) = m_data.lower(innerIdx); } for (Index innerIdx = rows(); innerIdx > outer; innerIdx--) { m_rowStartIndex[innerIdx] += bandIncrement; } //zeros new data memset(this->_lowerPtr() + start, 0, (bandIncrement - 1) * sizeof (Scalar)); return m_data.lower(m_rowStartIndex[outer]); } else { return m_data.lower(m_rowStartIndex[outer] + inner - (outer - m_data.lowerProfile(outer))); } } } else { if (outer > inner) //upper matrix { const Index maxOuterIndex = inner + m_data.upperProfile(inner); if (outer > maxOuterIndex) //The value does not yet exist { const Index previousProfile = m_data.upperProfile(inner); m_data.upperProfile(inner) = outer - inner; const Index bandIncrement = m_data.upperProfile(inner) - previousProfile; //shift data stored after this new one const Index stop = m_rowStartIndex[rows()]; const Index start = m_rowStartIndex[inner + 1]; for (Index innerIdx = stop; innerIdx >= start; innerIdx--) { m_data.upper(innerIdx + bandIncrement) = m_data.upper(innerIdx); } for (Index innerIdx = inner + 1; innerIdx < outerSize() + 1; innerIdx++) { m_rowStartIndex[innerIdx] += bandIncrement; } memset(this->_upperPtr() + m_rowStartIndex[inner] + previousProfile + 1, 0, (bandIncrement - 1) * sizeof (Scalar)); return m_data.upper(m_rowStartIndex[inner] + m_data.upperProfile(inner)); } else { return m_data.upper(m_rowStartIndex[inner] + (outer - inner)); } } if (outer < inner) //lower matrix { const Index maxInnerIndex = outer + m_data.lowerProfile(outer); if (inner > maxInnerIndex) //The value does not yet exist { const Index previousProfile = m_data.lowerProfile(outer); m_data.lowerProfile(outer) = inner - outer; const Index bandIncrement = m_data.lowerProfile(outer) - previousProfile; //shift data stored after this new one const Index stop = m_colStartIndex[cols()]; const Index start = m_colStartIndex[outer + 1]; for (Index innerIdx = stop; innerIdx >= start; innerIdx--) { m_data.lower(innerIdx + bandIncrement) = m_data.lower(innerIdx); } for (Index innerIdx = outer + 1; innerIdx < outerSize() + 1; innerIdx++) { m_colStartIndex[innerIdx] += bandIncrement; } memset(this->_lowerPtr() + m_colStartIndex[outer] + previousProfile + 1, 0, (bandIncrement - 1) * sizeof (Scalar)); return m_data.lower(m_colStartIndex[outer] + m_data.lowerProfile(outer)); } else { return m_data.lower(m_colStartIndex[outer] + (inner - outer)); } } } } /** Must be called after inserting a set of non zero entries. */ inline void finalize() { if (IsRowMajor) { if (rows() > cols()) m_data.resize(cols(), cols(), rows(), m_colStartIndex[cols()] + 1, m_rowStartIndex[rows()] + 1); else m_data.resize(rows(), cols(), rows(), m_colStartIndex[cols()] + 1, m_rowStartIndex[rows()] + 1); // eigen_assert(rows() == cols() && "memory reorganisatrion only works with suare matrix"); // // Scalar* newArray = new Scalar[m_colStartIndex[cols()] + 1 + m_rowStartIndex[rows()] + 1]; // Index dataIdx = 0; // for (Index row = 0; row < rows(); row++) { // // const Index nbLowerElts = m_rowStartIndex[row + 1] - m_rowStartIndex[row]; // // std::cout << "nbLowerElts" << nbLowerElts << std::endl; // memcpy(newArray + dataIdx, m_data.m_lower + m_rowStartIndex[row], nbLowerElts * sizeof (Scalar)); // m_rowStartIndex[row] = dataIdx; // dataIdx += nbLowerElts; // // const Index nbUpperElts = m_colStartIndex[row + 1] - m_colStartIndex[row]; // memcpy(newArray + dataIdx, m_data.m_upper + m_colStartIndex[row], nbUpperElts * sizeof (Scalar)); // m_colStartIndex[row] = dataIdx; // dataIdx += nbUpperElts; // // // } // //todo : don't access m_data profile directly : add an accessor from SkylineMatrix // m_rowStartIndex[rows()] = m_rowStartIndex[rows()-1] + m_data.lowerProfile(rows()-1); // m_colStartIndex[cols()] = m_colStartIndex[cols()-1] + m_data.upperProfile(cols()-1); // // delete[] m_data.m_lower; // delete[] m_data.m_upper; // // m_data.m_lower = newArray; // m_data.m_upper = newArray; } else { if (rows() > cols()) m_data.resize(cols(), rows(), cols(), m_rowStartIndex[cols()] + 1, m_colStartIndex[cols()] + 1); else m_data.resize(rows(), rows(), cols(), m_rowStartIndex[rows()] + 1, m_colStartIndex[rows()] + 1); } } inline void squeeze() { finalize(); m_data.squeeze(); } void prune(Scalar reference, RealScalar epsilon = dummy_precision ()) { //TODO } /** Resizes the matrix to a \a rows x \a cols matrix and initializes it to zero * \sa resizeNonZeros(Index), reserve(), setZero() */ void resize(size_t rows, size_t cols) { const Index diagSize = rows > cols ? cols : rows; m_innerSize = IsRowMajor ? cols : rows; eigen_assert(rows == cols && "Skyline matrix must be square matrix"); if (diagSize % 2) { // diagSize is odd const Index k = (diagSize - 1) / 2; m_data.resize(diagSize, IsRowMajor ? cols : rows, IsRowMajor ? rows : cols, 2 * k * k + k + 1, 2 * k * k + k + 1); } else // diagSize is even { const Index k = diagSize / 2; m_data.resize(diagSize, IsRowMajor ? cols : rows, IsRowMajor ? rows : cols, 2 * k * k - k + 1, 2 * k * k - k + 1); } if (m_colStartIndex && m_rowStartIndex) { delete[] m_colStartIndex; delete[] m_rowStartIndex; } m_colStartIndex = new Index [cols + 1]; m_rowStartIndex = new Index [rows + 1]; m_outerSize = diagSize; m_data.reset(); m_data.clear(); m_outerSize = diagSize; memset(m_colStartIndex, 0, (cols + 1) * sizeof (Index)); memset(m_rowStartIndex, 0, (rows + 1) * sizeof (Index)); } void resizeNonZeros(Index size) { m_data.resize(size); } inline SkylineMatrix() : m_outerSize(-1), m_innerSize(0), m_colStartIndex(0), m_rowStartIndex(0) { resize(0, 0); } inline SkylineMatrix(size_t rows, size_t cols) : m_outerSize(0), m_innerSize(0), m_colStartIndex(0), m_rowStartIndex(0) { resize(rows, cols); } template inline SkylineMatrix(const SkylineMatrixBase& other) : m_outerSize(0), m_innerSize(0), m_colStartIndex(0), m_rowStartIndex(0) { *this = other.derived(); } inline SkylineMatrix(const SkylineMatrix & other) : Base(), m_outerSize(0), m_innerSize(0), m_colStartIndex(0), m_rowStartIndex(0) { *this = other.derived(); } inline void swap(SkylineMatrix & other) { //EIGEN_DBG_SKYLINE(std::cout << "SkylineMatrix:: swap\n"); std::swap(m_colStartIndex, other.m_colStartIndex); std::swap(m_rowStartIndex, other.m_rowStartIndex); std::swap(m_innerSize, other.m_innerSize); std::swap(m_outerSize, other.m_outerSize); m_data.swap(other.m_data); } inline SkylineMatrix & operator=(const SkylineMatrix & other) { std::cout << "SkylineMatrix& operator=(const SkylineMatrix& other)\n"; if (other.isRValue()) { swap(other.const_cast_derived()); } else { resize(other.rows(), other.cols()); memcpy(m_colStartIndex, other.m_colStartIndex, (m_outerSize + 1) * sizeof (Index)); memcpy(m_rowStartIndex, other.m_rowStartIndex, (m_outerSize + 1) * sizeof (Index)); m_data = other.m_data; } return *this; } template inline SkylineMatrix & operator=(const SkylineMatrixBase& other) { const bool needToTranspose = (Flags & RowMajorBit) != (OtherDerived::Flags & RowMajorBit); if (needToTranspose) { // TODO // return *this; } else { // there is no special optimization return SkylineMatrixBase::operator=(other.derived()); } } friend std::ostream & operator <<(std::ostream & s, const SkylineMatrix & m) { EIGEN_DBG_SKYLINE( std::cout << "upper elements : " << std::endl; for (Index i = 0; i < m.m_data.upperSize(); i++) std::cout << m.m_data.upper(i) << "\t"; std::cout << std::endl; std::cout << "upper profile : " << std::endl; for (Index i = 0; i < m.m_data.upperProfileSize(); i++) std::cout << m.m_data.upperProfile(i) << "\t"; std::cout << std::endl; std::cout << "lower startIdx : " << std::endl; for (Index i = 0; i < m.m_data.upperProfileSize(); i++) std::cout << (IsRowMajor ? m.m_colStartIndex[i] : m.m_rowStartIndex[i]) << "\t"; std::cout << std::endl; std::cout << "lower elements : " << std::endl; for (Index i = 0; i < m.m_data.lowerSize(); i++) std::cout << m.m_data.lower(i) << "\t"; std::cout << std::endl; std::cout << "lower profile : " << std::endl; for (Index i = 0; i < m.m_data.lowerProfileSize(); i++) std::cout << m.m_data.lowerProfile(i) << "\t"; std::cout << std::endl; std::cout << "lower startIdx : " << std::endl; for (Index i = 0; i < m.m_data.lowerProfileSize(); i++) std::cout << (IsRowMajor ? m.m_rowStartIndex[i] : m.m_colStartIndex[i]) << "\t"; std::cout << std::endl; ); for (Index rowIdx = 0; rowIdx < m.rows(); rowIdx++) { for (Index colIdx = 0; colIdx < m.cols(); colIdx++) { s << m.coeff(rowIdx, colIdx) << "\t"; } s << std::endl; } return s; } /** Destructor */ inline ~SkylineMatrix() { delete[] m_colStartIndex; delete[] m_rowStartIndex; } /** Overloaded for performance */ Scalar sum() const; }; template class SkylineMatrix::InnerUpperIterator { public: InnerUpperIterator(const SkylineMatrix& mat, Index outer) : m_matrix(mat), m_outer(outer), m_id(_Options == RowMajor ? mat.m_colStartIndex[outer] : mat.m_rowStartIndex[outer] + 1), m_start(m_id), m_end(_Options == RowMajor ? mat.m_colStartIndex[outer + 1] : mat.m_rowStartIndex[outer + 1] + 1) { } inline InnerUpperIterator & operator++() { m_id++; return *this; } inline InnerUpperIterator & operator+=(Index shift) { m_id += shift; return *this; } inline Scalar value() const { return m_matrix.m_data.upper(m_id); } inline Scalar* valuePtr() { return const_cast (&(m_matrix.m_data.upper(m_id))); } inline Scalar& valueRef() { return const_cast (m_matrix.m_data.upper(m_id)); } inline Index index() const { return IsRowMajor ? m_outer - m_matrix.m_data.upperProfile(m_outer) + (m_id - m_start) : m_outer + (m_id - m_start) + 1; } inline Index row() const { return IsRowMajor ? index() : m_outer; } inline Index col() const { return IsRowMajor ? m_outer : index(); } inline size_t size() const { return m_matrix.m_data.upperProfile(m_outer); } inline operator bool() const { return (m_id < m_end) && (m_id >= m_start); } protected: const SkylineMatrix& m_matrix; const Index m_outer; Index m_id; const Index m_start; const Index m_end; }; template class SkylineMatrix::InnerLowerIterator { public: InnerLowerIterator(const SkylineMatrix& mat, Index outer) : m_matrix(mat), m_outer(outer), m_id(_Options == RowMajor ? mat.m_rowStartIndex[outer] : mat.m_colStartIndex[outer] + 1), m_start(m_id), m_end(_Options == RowMajor ? mat.m_rowStartIndex[outer + 1] : mat.m_colStartIndex[outer + 1] + 1) { } inline InnerLowerIterator & operator++() { m_id++; return *this; } inline InnerLowerIterator & operator+=(Index shift) { m_id += shift; return *this; } inline Scalar value() const { return m_matrix.m_data.lower(m_id); } inline Scalar* valuePtr() { return const_cast (&(m_matrix.m_data.lower(m_id))); } inline Scalar& valueRef() { return const_cast (m_matrix.m_data.lower(m_id)); } inline Index index() const { return IsRowMajor ? m_outer - m_matrix.m_data.lowerProfile(m_outer) + (m_id - m_start) : m_outer + (m_id - m_start) + 1; ; } inline Index row() const { return IsRowMajor ? m_outer : index(); } inline Index col() const { return IsRowMajor ? index() : m_outer; } inline size_t size() const { return m_matrix.m_data.lowerProfile(m_outer); } inline operator bool() const { return (m_id < m_end) && (m_id >= m_start); } protected: const SkylineMatrix& m_matrix; const Index m_outer; Index m_id; const Index m_start; const Index m_end; }; } // end namespace Eigen #endif // EIGEN_SKYLINEMATRIX_H RcppEigen/inst/include/unsupported/Eigen/src/Skyline/SkylineUtil.h0000644000176200001440000000612114107270226025000 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009 Guillaume Saupin // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SKYLINEUTIL_H #define EIGEN_SKYLINEUTIL_H namespace Eigen { #ifdef NDEBUG #define EIGEN_DBG_SKYLINE(X) #else #define EIGEN_DBG_SKYLINE(X) X #endif const unsigned int SkylineBit = 0x1200; template class SkylineProduct; enum AdditionalProductEvaluationMode {SkylineTimeDenseProduct, SkylineTimeSkylineProduct, DenseTimeSkylineProduct}; enum {IsSkyline = SkylineBit}; #define EIGEN_SKYLINE_INHERIT_ASSIGNMENT_OPERATOR(Derived, Op) \ template \ EIGEN_STRONG_INLINE Derived& operator Op(const Eigen::SkylineMatrixBase& other) \ { \ return Base::operator Op(other.derived()); \ } \ EIGEN_STRONG_INLINE Derived& operator Op(const Derived& other) \ { \ return Base::operator Op(other); \ } #define EIGEN_SKYLINE_INHERIT_SCALAR_ASSIGNMENT_OPERATOR(Derived, Op) \ template \ EIGEN_STRONG_INLINE Derived& operator Op(const Other& scalar) \ { \ return Base::operator Op(scalar); \ } #define EIGEN_SKYLINE_INHERIT_ASSIGNMENT_OPERATORS(Derived) \ EIGEN_SKYLINE_INHERIT_ASSIGNMENT_OPERATOR(Derived, =) \ EIGEN_SKYLINE_INHERIT_ASSIGNMENT_OPERATOR(Derived, +=) \ EIGEN_SKYLINE_INHERIT_ASSIGNMENT_OPERATOR(Derived, -=) \ EIGEN_SKYLINE_INHERIT_SCALAR_ASSIGNMENT_OPERATOR(Derived, *=) \ EIGEN_SKYLINE_INHERIT_SCALAR_ASSIGNMENT_OPERATOR(Derived, /=) #define _EIGEN_SKYLINE_GENERIC_PUBLIC_INTERFACE(Derived, BaseClass) \ typedef BaseClass Base; \ typedef typename Eigen::internal::traits::Scalar Scalar; \ typedef typename Eigen::NumTraits::Real RealScalar; \ typedef typename Eigen::internal::traits::StorageKind StorageKind; \ typedef typename Eigen::internal::index::type Index; \ enum { Flags = Eigen::internal::traits::Flags, }; #define EIGEN_SKYLINE_GENERIC_PUBLIC_INTERFACE(Derived) \ _EIGEN_SKYLINE_GENERIC_PUBLIC_INTERFACE(Derived, Eigen::SkylineMatrixBase) template class SkylineMatrixBase; template class SkylineMatrix; template class DynamicSkylineMatrix; template class SkylineVector; template class MappedSkylineMatrix; namespace internal { template struct skyline_product_mode; template::value> struct SkylineProductReturnType; template class eval { typedef typename traits::Scalar _Scalar; enum { _Flags = traits::Flags }; public: typedef SkylineMatrix<_Scalar, _Flags> type; }; } // end namespace internal } // end namespace Eigen #endif // EIGEN_SKYLINEUTIL_H RcppEigen/inst/include/unsupported/Eigen/CXX11/0000755000176200001440000000000014562417221020755 5ustar liggesusersRcppEigen/inst/include/unsupported/Eigen/CXX11/TensorSymmetry0000644000176200001440000000236314562417221023730 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2013 Christian Seiler // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSORSYMMETRY_MODULE #define EIGEN_CXX11_TENSORSYMMETRY_MODULE #include "Tensor" #include "../../../Eigen/src/Core/util/DisableStupidWarnings.h" #include "src/util/CXX11Meta.h" /** \defgroup CXX11_TensorSymmetry_Module Tensor Symmetry Module * * This module provides a classes that allow for the definition of * symmetries w.r.t. tensor indices. * * Including this module will implicitly include the Tensor module. * * \code * #include * \endcode */ #include "src/TensorSymmetry/util/TemplateGroupTheory.h" #include "src/TensorSymmetry/Symmetry.h" #include "src/TensorSymmetry/StaticSymmetry.h" #include "src/TensorSymmetry/DynamicSymmetry.h" #include "../../../Eigen/src/Core/util/ReenableStupidWarnings.h" #endif // EIGEN_CXX11_TENSORSYMMETRY_MODULE /* * kate: space-indent on; indent-width 2; mixedindent off; indent-mode cstyle; */ RcppEigen/inst/include/unsupported/Eigen/CXX11/ThreadPool0000644000176200001440000000404714562417221022746 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2016 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_THREADPOOL_MODULE #define EIGEN_CXX11_THREADPOOL_MODULE #include "../../../Eigen/Core" #include "../../../Eigen/src/Core/util/DisableStupidWarnings.h" /** \defgroup CXX11_ThreadPool_Module C++11 ThreadPool Module * * This module provides 2 threadpool implementations * - a simple reference implementation * - a faster non blocking implementation * * This module requires C++11. * * \code * #include * \endcode */ // The code depends on CXX11, so only include the module if the // compiler supports it. #if (EIGEN_COMP_CXXVER >= 11) #include #include #include #include #include #include #include #include #include #include #include #include // There are non-parenthesized calls to "max" in the header, // which trigger a check in test/main.h causing compilation to fail. // We work around the check here by removing the check for max in // the case where we have to emulate thread_local. #ifdef max #undef max #endif #include #include "src/util/CXX11Meta.h" #include "src/util/MaxSizeVector.h" #include "src/ThreadPool/ThreadLocal.h" #include "src/ThreadPool/ThreadYield.h" #include "src/ThreadPool/ThreadCancel.h" #include "src/ThreadPool/EventCount.h" #include "src/ThreadPool/RunQueue.h" #include "src/ThreadPool/ThreadPoolInterface.h" #include "src/ThreadPool/ThreadEnvironment.h" #include "src/ThreadPool/Barrier.h" #include "src/ThreadPool/NonBlockingThreadPool.h" #endif #include "../../../Eigen/src/Core/util/ReenableStupidWarnings.h" #endif // EIGEN_CXX11_THREADPOOL_MODULE RcppEigen/inst/include/unsupported/Eigen/CXX11/Tensor0000644000176200001440000001013314562417221022150 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // Copyright (C) 2013 Christian Seiler // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. //#ifndef EIGEN_CXX11_TENSOR_MODULE //#define EIGEN_CXX11_TENSOR_MODULE #include "../../../Eigen/Core" #if EIGEN_HAS_CXX11 #include "../SpecialFunctions" #include "../../../Eigen/src/Core/util/DisableStupidWarnings.h" #include "src/util/CXX11Meta.h" #include "src/util/MaxSizeVector.h" /** \defgroup CXX11_Tensor_Module Tensor Module * * This module provides a Tensor class for storing arbitrarily indexed * objects. * * \code * #include * \endcode * * Much of the documentation can be found \ref eigen_tensors "here". */ #include #include #include #include #include #include #include #if defined(EIGEN_USE_THREADS) || defined(EIGEN_USE_SYCL) #include "ThreadPool" #endif #ifdef EIGEN_USE_GPU #include #if defined(EIGEN_USE_HIP) #include #else #include #endif #endif #include "src/Tensor/TensorMacros.h" #include "src/Tensor/TensorForwardDeclarations.h" #include "src/Tensor/TensorMeta.h" #include "src/Tensor/TensorFunctors.h" #include "src/Tensor/TensorCostModel.h" #include "src/Tensor/TensorDeviceDefault.h" #include "src/Tensor/TensorDeviceThreadPool.h" #include "src/Tensor/TensorDeviceGpu.h" #ifndef gpu_assert #define gpu_assert(x) #endif #include "src/Tensor/TensorDeviceSycl.h" #include "src/Tensor/TensorIndexList.h" #include "src/Tensor/TensorDimensionList.h" #include "src/Tensor/TensorDimensions.h" #include "src/Tensor/TensorInitializer.h" #include "src/Tensor/TensorTraits.h" #include "src/Tensor/TensorRandom.h" #include "src/Tensor/TensorUInt128.h" #include "src/Tensor/TensorIntDiv.h" #include "src/Tensor/TensorGlobalFunctions.h" #include "src/Tensor/TensorBase.h" #include "src/Tensor/TensorBlock.h" #include "src/Tensor/TensorEvaluator.h" #include "src/Tensor/TensorExpr.h" #include "src/Tensor/TensorReduction.h" #include "src/Tensor/TensorReductionGpu.h" #include "src/Tensor/TensorArgMax.h" #include "src/Tensor/TensorConcatenation.h" #include "src/Tensor/TensorContractionMapper.h" #include "src/Tensor/TensorContractionBlocking.h" #include "src/Tensor/TensorContraction.h" #include "src/Tensor/TensorContractionThreadPool.h" #include "src/Tensor/TensorContractionGpu.h" #include "src/Tensor/TensorConversion.h" #include "src/Tensor/TensorConvolution.h" #include "src/Tensor/TensorFFT.h" #include "src/Tensor/TensorPatch.h" #include "src/Tensor/TensorImagePatch.h" #include "src/Tensor/TensorVolumePatch.h" #include "src/Tensor/TensorBroadcasting.h" #include "src/Tensor/TensorChipping.h" #include "src/Tensor/TensorInflation.h" #include "src/Tensor/TensorLayoutSwap.h" #include "src/Tensor/TensorMorphing.h" #include "src/Tensor/TensorPadding.h" #include "src/Tensor/TensorReverse.h" #include "src/Tensor/TensorShuffling.h" #include "src/Tensor/TensorStriding.h" #include "src/Tensor/TensorCustomOp.h" #include "src/Tensor/TensorEvalTo.h" #include "src/Tensor/TensorForcedEval.h" #include "src/Tensor/TensorGenerator.h" #include "src/Tensor/TensorAssign.h" #include "src/Tensor/TensorScan.h" #include "src/Tensor/TensorTrace.h" #ifdef EIGEN_USE_SYCL #include "src/Tensor/TensorReductionSycl.h" #include "src/Tensor/TensorConvolutionSycl.h" #include "src/Tensor/TensorContractionSycl.h" #include "src/Tensor/TensorScanSycl.h" #endif #include "src/Tensor/TensorExecutor.h" #include "src/Tensor/TensorDevice.h" #include "src/Tensor/TensorStorage.h" #include "src/Tensor/Tensor.h" #include "src/Tensor/TensorFixedSize.h" #include "src/Tensor/TensorMap.h" #include "src/Tensor/TensorRef.h" #include "src/Tensor/TensorIO.h" #include "../../../Eigen/src/Core/util/ReenableStupidWarnings.h" #endif // EIGEN_HAS_CXX11 //#endif // EIGEN_CXX11_TENSOR_MODULE RcppEigen/inst/include/unsupported/Eigen/CXX11/src/0000755000176200001440000000000014107270226021541 5ustar liggesusersRcppEigen/inst/include/unsupported/Eigen/CXX11/src/TensorSymmetry/0000755000176200001440000000000014107270226024565 5ustar liggesusersRcppEigen/inst/include/unsupported/Eigen/CXX11/src/TensorSymmetry/StaticSymmetry.h0000644000176200001440000002157614107270226027752 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2013 Christian Seiler // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSORSYMMETRY_STATICSYMMETRY_H #define EIGEN_CXX11_TENSORSYMMETRY_STATICSYMMETRY_H namespace Eigen { namespace internal { template struct tensor_static_symgroup_permutate; template struct tensor_static_symgroup_permutate> { constexpr static std::size_t N = sizeof...(nn); template constexpr static inline std::array run(const std::array& indices) { return {{indices[nn]...}}; } }; template struct tensor_static_symgroup_element { typedef indices_ indices; constexpr static int flags = flags_; }; template struct tensor_static_symgroup_element_ctor { typedef tensor_static_symgroup_element< typename gen_numeric_list_swapped_pair::type, Gen::Flags > type; }; template struct tensor_static_symgroup_identity_ctor { typedef tensor_static_symgroup_element< typename gen_numeric_list::type, 0 > type; }; template struct tensor_static_symgroup_multiply_helper { template constexpr static inline numeric_list::value...> helper(numeric_list) { return numeric_list::value...>(); } }; template struct tensor_static_symgroup_multiply { private: typedef typename A::indices iia; typedef typename B::indices iib; constexpr static int ffa = A::flags; constexpr static int ffb = B::flags; public: static_assert(iia::count == iib::count, "Cannot multiply symmetry elements with different number of indices."); typedef tensor_static_symgroup_element< decltype(tensor_static_symgroup_multiply_helper::helper(iia())), ffa ^ ffb > type; }; template struct tensor_static_symgroup_equality { typedef typename A::indices iia; typedef typename B::indices iib; constexpr static int ffa = A::flags; constexpr static int ffb = B::flags; static_assert(iia::count == iib::count, "Cannot compare symmetry elements with different number of indices."); constexpr static bool value = is_same::value; private: /* this should be zero if they are identical, or else the tensor * will be forced to be pure real, pure imaginary or even pure zero */ constexpr static int flags_cmp_ = ffa ^ ffb; /* either they are not equal, then we don't care whether the flags * match, or they are equal, and then we have to check */ constexpr static bool is_zero = value && flags_cmp_ == NegationFlag; constexpr static bool is_real = value && flags_cmp_ == ConjugationFlag; constexpr static bool is_imag = value && flags_cmp_ == (NegationFlag | ConjugationFlag); public: constexpr static int global_flags = (is_real ? GlobalRealFlag : 0) | (is_imag ? GlobalImagFlag : 0) | (is_zero ? GlobalZeroFlag : 0); }; template struct tensor_static_symgroup { typedef StaticSGroup type; constexpr static std::size_t size = type::static_size; }; template constexpr static inline std::array tensor_static_symgroup_index_permute(std::array idx, internal::numeric_list, internal::numeric_list) { return {{ idx[ii]..., idx[jj]... }}; } template static inline std::vector tensor_static_symgroup_index_permute(std::vector idx, internal::numeric_list) { std::vector result{{ idx[ii]... }}; std::size_t target_size = idx.size(); for (std::size_t i = result.size(); i < target_size; i++) result.push_back(idx[i]); return result; } template struct tensor_static_symgroup_do_apply; template struct tensor_static_symgroup_do_apply> { template static inline RV run(const std::array& idx, RV initial, Args&&... args) { static_assert(NumIndices >= SGNumIndices, "Can only apply symmetry group to objects that have at least the required amount of indices."); typedef typename internal::gen_numeric_list::type remaining_indices; initial = Op::run(tensor_static_symgroup_index_permute(idx, typename first::indices(), remaining_indices()), first::flags, initial, std::forward(args)...); return tensor_static_symgroup_do_apply>::template run(idx, initial, args...); } template static inline RV run(const std::vector& idx, RV initial, Args&&... args) { eigen_assert(idx.size() >= SGNumIndices && "Can only apply symmetry group to objects that have at least the required amount of indices."); initial = Op::run(tensor_static_symgroup_index_permute(idx, typename first::indices()), first::flags, initial, std::forward(args)...); return tensor_static_symgroup_do_apply>::template run(idx, initial, args...); } }; template struct tensor_static_symgroup_do_apply> { template static inline RV run(const std::array&, RV initial, Args&&...) { // do nothing return initial; } template static inline RV run(const std::vector&, RV initial, Args&&...) { // do nothing return initial; } }; } // end namespace internal template class StaticSGroup { constexpr static std::size_t NumIndices = internal::tensor_symmetry_num_indices::value; typedef internal::group_theory::enumerate_group_elements< internal::tensor_static_symgroup_multiply, internal::tensor_static_symgroup_equality, typename internal::tensor_static_symgroup_identity_ctor::type, internal::type_list::type...> > group_elements; typedef typename group_elements::type ge; public: constexpr inline StaticSGroup() {} constexpr inline StaticSGroup(const StaticSGroup&) {} constexpr inline StaticSGroup(StaticSGroup&&) {} template static inline RV apply(const std::array& idx, RV initial, Args&&... args) { return internal::tensor_static_symgroup_do_apply::template run(idx, initial, args...); } template static inline RV apply(const std::vector& idx, RV initial, Args&&... args) { eigen_assert(idx.size() == NumIndices); return internal::tensor_static_symgroup_do_apply::template run(idx, initial, args...); } constexpr static std::size_t static_size = ge::count; constexpr static inline std::size_t size() { return ge::count; } constexpr static inline int globalFlags() { return group_elements::global_flags; } template inline internal::tensor_symmetry_value_setter> operator()(Tensor_& tensor, typename Tensor_::Index firstIndex, IndexTypes... otherIndices) const { static_assert(sizeof...(otherIndices) + 1 == Tensor_::NumIndices, "Number of indices used to access a tensor coefficient must be equal to the rank of the tensor."); return operator()(tensor, std::array{{firstIndex, otherIndices...}}); } template inline internal::tensor_symmetry_value_setter> operator()(Tensor_& tensor, std::array const& indices) const { return internal::tensor_symmetry_value_setter>(tensor, *this, indices); } }; } // end namespace Eigen #endif // EIGEN_CXX11_TENSORSYMMETRY_STATICSYMMETRY_H /* * kate: space-indent on; indent-width 2; mixedindent off; indent-mode cstyle; */ RcppEigen/inst/include/unsupported/Eigen/CXX11/src/TensorSymmetry/DynamicSymmetry.h0000644000176200001440000002515114107270226030100 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2013 Christian Seiler // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSORSYMMETRY_DYNAMICSYMMETRY_H #define EIGEN_CXX11_TENSORSYMMETRY_DYNAMICSYMMETRY_H namespace Eigen { class DynamicSGroup { public: inline explicit DynamicSGroup() : m_numIndices(1), m_elements(), m_generators(), m_globalFlags(0) { m_elements.push_back(ge(Generator(0, 0, 0))); } inline DynamicSGroup(const DynamicSGroup& o) : m_numIndices(o.m_numIndices), m_elements(o.m_elements), m_generators(o.m_generators), m_globalFlags(o.m_globalFlags) { } inline DynamicSGroup(DynamicSGroup&& o) : m_numIndices(o.m_numIndices), m_elements(), m_generators(o.m_generators), m_globalFlags(o.m_globalFlags) { std::swap(m_elements, o.m_elements); } inline DynamicSGroup& operator=(const DynamicSGroup& o) { m_numIndices = o.m_numIndices; m_elements = o.m_elements; m_generators = o.m_generators; m_globalFlags = o.m_globalFlags; return *this; } inline DynamicSGroup& operator=(DynamicSGroup&& o) { m_numIndices = o.m_numIndices; std::swap(m_elements, o.m_elements); m_generators = o.m_generators; m_globalFlags = o.m_globalFlags; return *this; } void add(int one, int two, int flags = 0); template inline void add(Gen_) { add(Gen_::One, Gen_::Two, Gen_::Flags); } inline void addSymmetry(int one, int two) { add(one, two, 0); } inline void addAntiSymmetry(int one, int two) { add(one, two, NegationFlag); } inline void addHermiticity(int one, int two) { add(one, two, ConjugationFlag); } inline void addAntiHermiticity(int one, int two) { add(one, two, NegationFlag | ConjugationFlag); } template inline RV apply(const std::array& idx, RV initial, Args&&... args) const { eigen_assert(N >= m_numIndices && "Can only apply symmetry group to objects that have at least the required amount of indices."); for (std::size_t i = 0; i < size(); i++) initial = Op::run(h_permute(i, idx, typename internal::gen_numeric_list::type()), m_elements[i].flags, initial, std::forward(args)...); return initial; } template inline RV apply(const std::vector& idx, RV initial, Args&&... args) const { eigen_assert(idx.size() >= m_numIndices && "Can only apply symmetry group to objects that have at least the required amount of indices."); for (std::size_t i = 0; i < size(); i++) initial = Op::run(h_permute(i, idx), m_elements[i].flags, initial, std::forward(args)...); return initial; } inline int globalFlags() const { return m_globalFlags; } inline std::size_t size() const { return m_elements.size(); } template inline internal::tensor_symmetry_value_setter operator()(Tensor_& tensor, typename Tensor_::Index firstIndex, IndexTypes... otherIndices) const { static_assert(sizeof...(otherIndices) + 1 == Tensor_::NumIndices, "Number of indices used to access a tensor coefficient must be equal to the rank of the tensor."); return operator()(tensor, std::array{{firstIndex, otherIndices...}}); } template inline internal::tensor_symmetry_value_setter operator()(Tensor_& tensor, std::array const& indices) const { return internal::tensor_symmetry_value_setter(tensor, *this, indices); } private: struct GroupElement { std::vector representation; int flags; bool isId() const { for (std::size_t i = 0; i < representation.size(); i++) if (i != (size_t)representation[i]) return false; return true; } }; struct Generator { int one; int two; int flags; constexpr inline Generator(int one_, int two_, int flags_) : one(one_), two(two_), flags(flags_) {} }; std::size_t m_numIndices; std::vector m_elements; std::vector m_generators; int m_globalFlags; template inline std::array h_permute(std::size_t which, const std::array& idx, internal::numeric_list) const { return std::array{{ idx[n >= m_numIndices ? n : m_elements[which].representation[n]]... }}; } template inline std::vector h_permute(std::size_t which, std::vector idx) const { std::vector result; result.reserve(idx.size()); for (auto k : m_elements[which].representation) result.push_back(idx[k]); for (std::size_t i = m_numIndices; i < idx.size(); i++) result.push_back(idx[i]); return result; } inline GroupElement ge(Generator const& g) const { GroupElement result; result.representation.reserve(m_numIndices); result.flags = g.flags; for (std::size_t k = 0; k < m_numIndices; k++) { if (k == (std::size_t)g.one) result.representation.push_back(g.two); else if (k == (std::size_t)g.two) result.representation.push_back(g.one); else result.representation.push_back(int(k)); } return result; } GroupElement mul(GroupElement, GroupElement) const; inline GroupElement mul(Generator g1, GroupElement g2) const { return mul(ge(g1), g2); } inline GroupElement mul(GroupElement g1, Generator g2) const { return mul(g1, ge(g2)); } inline GroupElement mul(Generator g1, Generator g2) const { return mul(ge(g1), ge(g2)); } inline int findElement(GroupElement e) const { for (auto ee : m_elements) { if (ee.representation == e.representation) return ee.flags ^ e.flags; } return -1; } void updateGlobalFlags(int flagDiffOfSameGenerator); }; // dynamic symmetry group that auto-adds the template parameters in the constructor template class DynamicSGroupFromTemplateArgs : public DynamicSGroup { public: inline DynamicSGroupFromTemplateArgs() : DynamicSGroup() { add_all(internal::type_list()); } inline DynamicSGroupFromTemplateArgs(DynamicSGroupFromTemplateArgs const& other) : DynamicSGroup(other) { } inline DynamicSGroupFromTemplateArgs(DynamicSGroupFromTemplateArgs&& other) : DynamicSGroup(other) { } inline DynamicSGroupFromTemplateArgs& operator=(const DynamicSGroupFromTemplateArgs& o) { DynamicSGroup::operator=(o); return *this; } inline DynamicSGroupFromTemplateArgs& operator=(DynamicSGroupFromTemplateArgs&& o) { DynamicSGroup::operator=(o); return *this; } private: template inline void add_all(internal::type_list) { add(Gen1()); add_all(internal::type_list()); } inline void add_all(internal::type_list<>) { } }; inline DynamicSGroup::GroupElement DynamicSGroup::mul(GroupElement g1, GroupElement g2) const { eigen_internal_assert(g1.representation.size() == m_numIndices); eigen_internal_assert(g2.representation.size() == m_numIndices); GroupElement result; result.representation.reserve(m_numIndices); for (std::size_t i = 0; i < m_numIndices; i++) { int v = g2.representation[g1.representation[i]]; eigen_assert(v >= 0); result.representation.push_back(v); } result.flags = g1.flags ^ g2.flags; return result; } inline void DynamicSGroup::add(int one, int two, int flags) { eigen_assert(one >= 0); eigen_assert(two >= 0); eigen_assert(one != two); if ((std::size_t)one >= m_numIndices || (std::size_t)two >= m_numIndices) { std::size_t newNumIndices = (one > two) ? one : two + 1; for (auto& gelem : m_elements) { gelem.representation.reserve(newNumIndices); for (std::size_t i = m_numIndices; i < newNumIndices; i++) gelem.representation.push_back(i); } m_numIndices = newNumIndices; } Generator g{one, two, flags}; GroupElement e = ge(g); /* special case for first generator */ if (m_elements.size() == 1) { while (!e.isId()) { m_elements.push_back(e); e = mul(e, g); } if (e.flags > 0) updateGlobalFlags(e.flags); // only add in case we didn't have identity if (m_elements.size() > 1) m_generators.push_back(g); return; } int p = findElement(e); if (p >= 0) { updateGlobalFlags(p); return; } std::size_t coset_order = m_elements.size(); m_elements.push_back(e); for (std::size_t i = 1; i < coset_order; i++) m_elements.push_back(mul(m_elements[i], e)); m_generators.push_back(g); std::size_t coset_rep = coset_order; do { for (auto g : m_generators) { e = mul(m_elements[coset_rep], g); p = findElement(e); if (p < 0) { // element not yet in group m_elements.push_back(e); for (std::size_t i = 1; i < coset_order; i++) m_elements.push_back(mul(m_elements[i], e)); } else if (p > 0) { updateGlobalFlags(p); } } coset_rep += coset_order; } while (coset_rep < m_elements.size()); } inline void DynamicSGroup::updateGlobalFlags(int flagDiffOfSameGenerator) { switch (flagDiffOfSameGenerator) { case 0: default: // nothing happened break; case NegationFlag: // every element is it's own negative => whole tensor is zero m_globalFlags |= GlobalZeroFlag; break; case ConjugationFlag: // every element is it's own conjugate => whole tensor is real m_globalFlags |= GlobalRealFlag; break; case (NegationFlag | ConjugationFlag): // every element is it's own negative conjugate => whole tensor is imaginary m_globalFlags |= GlobalImagFlag; break; /* NOTE: * since GlobalZeroFlag == GlobalRealFlag | GlobalImagFlag, if one generator * causes the tensor to be real and the next one to be imaginary, this will * trivially give the correct result */ } } } // end namespace Eigen #endif // EIGEN_CXX11_TENSORSYMMETRY_DYNAMICSYMMETRY_H /* * kate: space-indent on; indent-width 2; mixedindent off; indent-mode cstyle; */ RcppEigen/inst/include/unsupported/Eigen/CXX11/src/TensorSymmetry/Symmetry.h0000644000176200001440000003133514107270226026574 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2013 Christian Seiler // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSORSYMMETRY_SYMMETRY_H #define EIGEN_CXX11_TENSORSYMMETRY_SYMMETRY_H namespace Eigen { enum { NegationFlag = 0x01, ConjugationFlag = 0x02 }; enum { GlobalRealFlag = 0x01, GlobalImagFlag = 0x02, GlobalZeroFlag = 0x03 }; namespace internal { template struct tensor_symmetry_pre_analysis; template struct tensor_static_symgroup; template struct tensor_static_symgroup_if; template struct tensor_symmetry_calculate_flags; template struct tensor_symmetry_assign_value; template struct tensor_symmetry_num_indices; } // end namespace internal template struct Symmetry { static_assert(One_ != Two_, "Symmetries must cover distinct indices."); constexpr static int One = One_; constexpr static int Two = Two_; constexpr static int Flags = 0; }; template struct AntiSymmetry { static_assert(One_ != Two_, "Symmetries must cover distinct indices."); constexpr static int One = One_; constexpr static int Two = Two_; constexpr static int Flags = NegationFlag; }; template struct Hermiticity { static_assert(One_ != Two_, "Symmetries must cover distinct indices."); constexpr static int One = One_; constexpr static int Two = Two_; constexpr static int Flags = ConjugationFlag; }; template struct AntiHermiticity { static_assert(One_ != Two_, "Symmetries must cover distinct indices."); constexpr static int One = One_; constexpr static int Two = Two_; constexpr static int Flags = ConjugationFlag | NegationFlag; }; /** \class DynamicSGroup * \ingroup TensorSymmetry_Module * * \brief Dynamic symmetry group * * The %DynamicSGroup class represents a symmetry group that need not be known at * compile time. It is useful if one wants to support arbitrary run-time defineable * symmetries for tensors, but it is also instantiated if a symmetry group is defined * at compile time that would be either too large for the compiler to reasonably * generate (using templates to calculate this at compile time is very inefficient) * or that the compiler could generate the group but that it wouldn't make sense to * unroll the loop for setting coefficients anymore. */ class DynamicSGroup; /** \internal * * \class DynamicSGroupFromTemplateArgs * \ingroup TensorSymmetry_Module * * \brief Dynamic symmetry group, initialized from template arguments * * This class is a child class of DynamicSGroup. It uses the template arguments * specified to initialize itself. */ template class DynamicSGroupFromTemplateArgs; /** \class StaticSGroup * \ingroup TensorSymmetry_Module * * \brief Static symmetry group * * This class represents a symmetry group that is known and resolved completely * at compile time. Ideally, no run-time penalty is incurred compared to the * manual unrolling of the symmetry. * * CAUTION: * * Do not use this class directly for large symmetry groups. The compiler * may run into a limit, or segfault or in the very least will take a very, * very, very long time to compile the code. Use the SGroup class instead * if you want a static group. That class contains logic that will * automatically select the DynamicSGroup class instead if the symmetry * group becomes too large. (In that case, unrolling may not even be * beneficial.) */ template class StaticSGroup; /** \class SGroup * \ingroup TensorSymmetry_Module * * \brief Symmetry group, initialized from template arguments * * This class represents a symmetry group whose generators are already * known at compile time. It may or may not be resolved at compile time, * depending on the estimated size of the group. * * \sa StaticSGroup * \sa DynamicSGroup */ template class SGroup : public internal::tensor_symmetry_pre_analysis::value, Gen...>::root_type { public: constexpr static std::size_t NumIndices = internal::tensor_symmetry_num_indices::value; typedef typename internal::tensor_symmetry_pre_analysis::root_type Base; // make standard constructors + assignment operators public inline SGroup() : Base() { } inline SGroup(const SGroup& other) : Base(other) { } inline SGroup(SGroup&& other) : Base(other) { } inline SGroup& operator=(const SGroup& other) { Base::operator=(other); return *this; } inline SGroup& operator=(SGroup&& other) { Base::operator=(other); return *this; } // all else is defined in the base class }; namespace internal { template struct tensor_symmetry_num_indices { constexpr static std::size_t value = 1; }; template struct tensor_symmetry_num_indices, Sym...> { private: constexpr static std::size_t One = static_cast(One_); constexpr static std::size_t Two = static_cast(Two_); constexpr static std::size_t Three = tensor_symmetry_num_indices::value; // don't use std::max, since it's not constexpr until C++14... constexpr static std::size_t maxOneTwoPlusOne = ((One > Two) ? One : Two) + 1; public: constexpr static std::size_t value = (maxOneTwoPlusOne > Three) ? maxOneTwoPlusOne : Three; }; template struct tensor_symmetry_num_indices, Sym...> : public tensor_symmetry_num_indices, Sym...> {}; template struct tensor_symmetry_num_indices, Sym...> : public tensor_symmetry_num_indices, Sym...> {}; template struct tensor_symmetry_num_indices, Sym...> : public tensor_symmetry_num_indices, Sym...> {}; /** \internal * * \class tensor_symmetry_pre_analysis * \ingroup TensorSymmetry_Module * * \brief Pre-select whether to use a static or dynamic symmetry group * * When a symmetry group could in principle be determined at compile time, * this template implements the logic whether to actually do that or whether * to rather defer that to runtime. * * The logic is as follows: *

*
No generators (trivial symmetry):
*
Use a trivial static group. Ideally, this has no performance impact * compared to not using symmetry at all. In practice, this might not * be the case.
*
More than 4 generators:
*
Calculate the group at run time, it is likely far too large for the * compiler to be able to properly generate it in a realistic time.
*
Up to and including 4 generators:
*
Actually enumerate all group elements, but then check how many there * are. If there are more than 16, it is unlikely that unrolling the * loop (as is done in the static compile-time case) is sensible, so * use a dynamic group instead. If there are at most 16 elements, actually * use that static group. Note that the largest group with 4 generators * still compiles with reasonable resources.
*
* * Note: Example compile time performance with g++-4.6 on an Intenl Core i5-3470 * with 16 GiB RAM (all generators non-redundant and the subgroups don't * factorize): * * # Generators -O0 -ggdb -O2 * ------------------------------------------------------------------- * 1 0.5 s / 250 MiB 0.45s / 230 MiB * 2 0.5 s / 260 MiB 0.5 s / 250 MiB * 3 0.65s / 310 MiB 0.62s / 310 MiB * 4 2.2 s / 860 MiB 1.7 s / 770 MiB * 5 130 s / 13000 MiB 120 s / 11000 MiB * * It is clear that everything is still very efficient up to 4 generators, then * the memory and CPU requirements become unreasonable. Thus we only instantiate * the template group theory logic if the number of generators supplied is 4 or * lower, otherwise this will be forced to be done during runtime, where the * algorithm is reasonably fast. */ template struct tensor_symmetry_pre_analysis { typedef StaticSGroup<> root_type; }; template struct tensor_symmetry_pre_analysis { constexpr static std::size_t max_static_generators = 4; constexpr static std::size_t max_static_elements = 16; typedef tensor_static_symgroup_if<(sizeof...(Gens_) + 1 <= max_static_generators), NumIndices, Gen_, Gens_...> helper; constexpr static std::size_t possible_size = helper::size; typedef typename conditional< possible_size == 0 || possible_size >= max_static_elements, DynamicSGroupFromTemplateArgs, typename helper::type >::type root_type; }; template struct tensor_static_symgroup_if { constexpr static std::size_t size = 0; typedef void type; }; template struct tensor_static_symgroup_if : tensor_static_symgroup {}; template struct tensor_symmetry_assign_value { typedef typename Tensor_::Index Index; typedef typename Tensor_::Scalar Scalar; constexpr static std::size_t NumIndices = Tensor_::NumIndices; static inline int run(const std::array& transformed_indices, int transformation_flags, int dummy, Tensor_& tensor, const Scalar& value_) { Scalar value(value_); if (transformation_flags & ConjugationFlag) value = numext::conj(value); if (transformation_flags & NegationFlag) value = -value; tensor.coeffRef(transformed_indices) = value; return dummy; } }; template struct tensor_symmetry_calculate_flags { typedef typename Tensor_::Index Index; constexpr static std::size_t NumIndices = Tensor_::NumIndices; static inline int run(const std::array& transformed_indices, int transform_flags, int current_flags, const std::array& orig_indices) { if (transformed_indices == orig_indices) { if (transform_flags & (ConjugationFlag | NegationFlag)) return current_flags | GlobalImagFlag; // anti-hermitian diagonal else if (transform_flags & ConjugationFlag) return current_flags | GlobalRealFlag; // hermitian diagonal else if (transform_flags & NegationFlag) return current_flags | GlobalZeroFlag; // anti-symmetric diagonal } return current_flags; } }; template class tensor_symmetry_value_setter { public: typedef typename Tensor_::Index Index; typedef typename Tensor_::Scalar Scalar; constexpr static std::size_t NumIndices = Tensor_::NumIndices; inline tensor_symmetry_value_setter(Tensor_& tensor, Symmetry_ const& symmetry, std::array const& indices) : m_tensor(tensor), m_symmetry(symmetry), m_indices(indices) { } inline tensor_symmetry_value_setter& operator=(Scalar const& value) { doAssign(value); return *this; } private: Tensor_& m_tensor; Symmetry_ m_symmetry; std::array m_indices; inline void doAssign(Scalar const& value) { #ifdef EIGEN_TENSOR_SYMMETRY_CHECK_VALUES int value_flags = m_symmetry.template apply, int>(m_indices, m_symmetry.globalFlags(), m_indices); if (value_flags & GlobalRealFlag) eigen_assert(numext::imag(value) == 0); if (value_flags & GlobalImagFlag) eigen_assert(numext::real(value) == 0); #endif m_symmetry.template apply, int>(m_indices, 0, m_tensor, value); } }; } // end namespace internal } // end namespace Eigen #endif // EIGEN_CXX11_TENSORSYMMETRY_SYMMETRY_H /* * kate: space-indent on; indent-width 2; mixedindent off; indent-mode cstyle; */ RcppEigen/inst/include/unsupported/Eigen/CXX11/src/TensorSymmetry/util/0000755000176200001440000000000014562417221025545 5ustar liggesusersRcppEigen/inst/include/unsupported/Eigen/CXX11/src/TensorSymmetry/util/TemplateGroupTheory.h0000644000176200001440000005106614562417221031711 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2013 Christian Seiler // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSORSYMMETRY_TEMPLATEGROUPTHEORY_H #define EIGEN_CXX11_TENSORSYMMETRY_TEMPLATEGROUPTHEORY_H namespace Eigen { namespace internal { namespace group_theory { /** \internal * \file CXX11/src/TensorSymmetry/util/TemplateGroupTheory.h * This file contains C++ templates that implement group theory algorithms. * * The algorithms allow for a compile-time analysis of finite groups. * * Currently only Dimino's algorithm is implemented, which returns a list * of all elements in a group given a set of (possibly redundant) generators. * (One could also do that with the so-called orbital algorithm, but that * is much more expensive and usually has no advantages.) */ /********************************************************************** * "Ok kid, here is where it gets complicated." * - Amelia Pond in the "Doctor Who" episode * "The Big Bang" * * Dimino's algorithm * ================== * * The following is Dimino's algorithm in sequential form: * * Input: identity element, list of generators, equality check, * multiplication operation * Output: list of group elements * * 1. add identity element * 2. remove identities from list of generators * 3. add all powers of first generator that aren't the * identity element * 4. go through all remaining generators: * a. if generator is already in the list of elements * -> do nothing * b. otherwise * i. remember current # of elements * (i.e. the size of the current subgroup) * ii. add all current elements (which includes * the identity) each multiplied from right * with the current generator to the group * iii. add all remaining cosets that are generated * by products of the new generator with itself * and all other generators seen so far * * In functional form, this is implemented as a long set of recursive * templates that have a complicated relationship. * * The main interface for Dimino's algorithm is the template * enumerate_group_elements. All lists are implemented as variadic * type_list and numeric_list * templates. * * 'Calling' templates is usually done via typedefs. * * This algorithm is an extended version of the basic version. The * extension consists in the fact that each group element has a set * of flags associated with it. Multiplication of two group elements * with each other results in a group element whose flags are the * XOR of the flags of the previous elements. Each time the algorithm * notices that a group element it just calculated is already in the * list of current elements, the flags of both will be compared and * added to the so-called 'global flags' of the group. * * The rationale behind this extension is that this allows not only * for the description of symmetries between tensor indices, but * also allows for the description of hermiticity, antisymmetry and * antihermiticity. Negation and conjugation each are specific bit * in the flags value and if two different ways to reach a group * element lead to two different flags, this poses a constraint on * the allowed values of the resulting tensor. For example, if a * group element is reach both with and without the conjugation * flags, it is clear that the resulting tensor has to be real. * * Note that this flag mechanism is quite generic and may have other * uses beyond tensor properties. * * IMPORTANT: * This algorithm assumes the group to be finite. If you try to * run it with a group that's infinite, the algorithm will only * terminate once you hit a compiler limit (max template depth). * Also note that trying to use this implementation to create a * very large group will probably either make you hit the same * limit, cause the compiler to segfault or at the very least * take a *really* long time (hours, days, weeks - sic!) to * compile. It is not recommended to plug in more than 4 * generators, unless they are independent of each other. */ /** \internal * * \class strip_identities * \ingroup CXX11_TensorSymmetry_Module * * \brief Cleanse a list of group elements of the identity element * * This template is used to make a first pass through all initial * generators of Dimino's algorithm and remove the identity * elements. * * \sa enumerate_group_elements */ template class Equality, typename id, typename L> struct strip_identities; template< template class Equality, typename id, typename t, typename... ts > struct strip_identities> { typedef typename conditional< Equality::value, typename strip_identities>::type, typename concat, typename strip_identities>::type>::type >::type type; constexpr static int global_flags = Equality::global_flags | strip_identities>::global_flags; }; template< template class Equality, typename id EIGEN_TPL_PP_SPEC_HACK_DEFC(typename, ts) > struct strip_identities> { typedef type_list<> type; constexpr static int global_flags = 0; }; /** \internal * * \class dimino_first_step_elements_helper * \ingroup CXX11_TensorSymmetry_Module * * \brief Recursive template that adds powers of the first generator to the list of group elements * * This template calls itself recursively to add powers of the first * generator to the list of group elements. It stops if it reaches * the identity element again. * * \sa enumerate_group_elements, dimino_first_step_elements */ template< template class Multiply, template class Equality, typename id, typename g, typename current_element, typename elements, bool dont_add_current_element // = false > struct dimino_first_step_elements_helper #ifndef EIGEN_PARSED_BY_DOXYGEN : // recursive inheritance is too difficult for Doxygen public dimino_first_step_elements_helper< Multiply, Equality, id, g, typename Multiply::type, typename concat>::type, Equality::type, id>::value > {}; template< template class Multiply, template class Equality, typename id, typename g, typename current_element, typename elements > struct dimino_first_step_elements_helper #endif // EIGEN_PARSED_BY_DOXYGEN { typedef elements type; constexpr static int global_flags = Equality::global_flags; }; /** \internal * * \class dimino_first_step_elements * \ingroup CXX11_TensorSymmetry_Module * * \brief Add all powers of the first generator to the list of group elements * * This template takes the first non-identity generator and generates the initial * list of elements which consists of all powers of that generator. For a group * with just one generated, it would be enumerated after this. * * \sa enumerate_group_elements */ template< template class Multiply, template class Equality, typename id, typename generators > struct dimino_first_step_elements { typedef typename get<0, generators>::type first_generator; typedef typename skip<1, generators>::type next_generators; typedef type_list generators_done; typedef dimino_first_step_elements_helper< Multiply, Equality, id, first_generator, first_generator, type_list, false > helper; typedef typename helper::type type; constexpr static int global_flags = helper::global_flags; }; /** \internal * * \class dimino_get_coset_elements * \ingroup CXX11_TensorSymmetry_Module * * \brief Generate all elements of a specific coset * * This template generates all the elements of a specific coset by * multiplying all elements in the given subgroup with the new * coset representative. Note that the first element of the * subgroup is always the identity element, so the first element of * the result of this template is going to be the coset * representative itself. * * Note that this template accepts an additional boolean parameter * that specifies whether to actually generate the coset (true) or * just return an empty list (false). * * \sa enumerate_group_elements, dimino_add_cosets_for_rep */ template< template class Multiply, typename sub_group_elements, typename new_coset_rep, bool generate_coset // = true > struct dimino_get_coset_elements { typedef typename apply_op_from_right::type type; }; template< template class Multiply, typename sub_group_elements, typename new_coset_rep > struct dimino_get_coset_elements { typedef type_list<> type; }; /** \internal * * \class dimino_add_cosets_for_rep * \ingroup CXX11_TensorSymmetry_Module * * \brief Recursive template for adding coset spaces * * This template multiplies the coset representative with a generator * from the list of previous generators. If the new element is not in * the group already, it adds the corresponding coset. Finally it * proceeds to call itself with the next generator from the list. * * \sa enumerate_group_elements, dimino_add_all_coset_spaces */ template< template class Multiply, template class Equality, typename id, typename sub_group_elements, typename elements, typename generators, typename rep_element, int sub_group_size > struct dimino_add_cosets_for_rep; template< template class Multiply, template class Equality, typename id, typename sub_group_elements, typename elements, typename g, typename... gs, typename rep_element, int sub_group_size > struct dimino_add_cosets_for_rep, rep_element, sub_group_size> { typedef typename Multiply::type new_coset_rep; typedef contained_in_list_gf _cil; constexpr static bool add_coset = !_cil::value; typedef typename dimino_get_coset_elements< Multiply, sub_group_elements, new_coset_rep, add_coset >::type coset_elements; typedef dimino_add_cosets_for_rep< Multiply, Equality, id, sub_group_elements, typename concat::type, type_list, rep_element, sub_group_size > _helper; typedef typename _helper::type type; constexpr static int global_flags = _cil::global_flags | _helper::global_flags; /* Note that we don't have to update global flags here, since * we will only add these elements if they are not part of * the group already. But that only happens if the coset rep * is not already in the group, so the check for the coset rep * will catch this. */ }; template< template class Multiply, template class Equality, typename id, typename sub_group_elements, typename elements EIGEN_TPL_PP_SPEC_HACK_DEFC(typename, empty), typename rep_element, int sub_group_size > struct dimino_add_cosets_for_rep, rep_element, sub_group_size> { typedef elements type; constexpr static int global_flags = 0; }; /** \internal * * \class dimino_add_all_coset_spaces * \ingroup CXX11_TensorSymmetry_Module * * \brief Recursive template for adding all coset spaces for a new generator * * This template tries to go through the list of generators (with * the help of the dimino_add_cosets_for_rep template) as long as * it still finds elements that are not part of the group and add * the corresponding cosets. * * \sa enumerate_group_elements, dimino_add_cosets_for_rep */ template< template class Multiply, template class Equality, typename id, typename sub_group_elements, typename elements, typename generators, int sub_group_size, int rep_pos, bool stop_condition // = false > struct dimino_add_all_coset_spaces { typedef typename get::type rep_element; typedef dimino_add_cosets_for_rep< Multiply, Equality, id, sub_group_elements, elements, generators, rep_element, sub_group_elements::count > _ac4r; typedef typename _ac4r::type new_elements; constexpr static int new_rep_pos = rep_pos + sub_group_elements::count; constexpr static bool new_stop_condition = new_rep_pos >= new_elements::count; typedef dimino_add_all_coset_spaces< Multiply, Equality, id, sub_group_elements, new_elements, generators, sub_group_size, new_rep_pos, new_stop_condition > _helper; typedef typename _helper::type type; constexpr static int global_flags = _helper::global_flags | _ac4r::global_flags; }; template< template class Multiply, template class Equality, typename id, typename sub_group_elements, typename elements, typename generators, int sub_group_size, int rep_pos > struct dimino_add_all_coset_spaces { typedef elements type; constexpr static int global_flags = 0; }; /** \internal * * \class dimino_add_generator * \ingroup CXX11_TensorSymmetry_Module * * \brief Enlarge the group by adding a new generator. * * It accepts a boolean parameter that determines if the generator is redundant, * i.e. was already seen in the group. In that case, it reduces to a no-op. * * \sa enumerate_group_elements, dimino_add_all_coset_spaces */ template< template class Multiply, template class Equality, typename id, typename elements, typename generators_done, typename current_generator, bool redundant // = false > struct dimino_add_generator { /* this template is only called if the generator is not redundant * => all elements of the group multiplied with the new generator * are going to be new elements of the most trivial coset space */ typedef typename apply_op_from_right::type multiplied_elements; typedef typename concat::type new_elements; constexpr static int rep_pos = elements::count; typedef dimino_add_all_coset_spaces< Multiply, Equality, id, elements, // elements of previous subgroup new_elements, typename concat>::type, elements::count, // size of previous subgroup rep_pos, false // don't stop (because rep_pos >= new_elements::count is always false at this point) > _helper; typedef typename _helper::type type; constexpr static int global_flags = _helper::global_flags; }; template< template class Multiply, template class Equality, typename id, typename elements, typename generators_done, typename current_generator > struct dimino_add_generator { // redundant case typedef elements type; constexpr static int global_flags = 0; }; /** \internal * * \class dimino_add_remaining_generators * \ingroup CXX11_TensorSymmetry_Module * * \brief Recursive template that adds all remaining generators to a group * * Loop through the list of generators that remain and successively * add them to the group. * * \sa enumerate_group_elements, dimino_add_generator */ template< template class Multiply, template class Equality, typename id, typename generators_done, typename remaining_generators, typename elements > struct dimino_add_remaining_generators { typedef typename get<0, remaining_generators>::type first_generator; typedef typename skip<1, remaining_generators>::type next_generators; typedef contained_in_list_gf _cil; typedef dimino_add_generator< Multiply, Equality, id, elements, generators_done, first_generator, _cil::value > _helper; typedef typename _helper::type new_elements; typedef dimino_add_remaining_generators< Multiply, Equality, id, typename concat>::type, next_generators, new_elements > _next_iter; typedef typename _next_iter::type type; constexpr static int global_flags = _cil::global_flags | _helper::global_flags | _next_iter::global_flags; }; template< template class Multiply, template class Equality, typename id, typename generators_done, typename elements > struct dimino_add_remaining_generators, elements> { typedef elements type; constexpr static int global_flags = 0; }; /** \internal * * \class enumerate_group_elements_noid * \ingroup CXX11_TensorSymmetry_Module * * \brief Helper template that implements group element enumeration * * This is a helper template that implements the actual enumeration * of group elements. This has been split so that the list of * generators can be cleansed of the identity element before * performing the actual operation. * * \sa enumerate_group_elements */ template< template class Multiply, template class Equality, typename id, typename generators, int initial_global_flags = 0 > struct enumerate_group_elements_noid { typedef dimino_first_step_elements first_step; typedef typename first_step::type first_step_elements; typedef dimino_add_remaining_generators< Multiply, Equality, id, typename first_step::generators_done, typename first_step::next_generators, // remaining_generators typename first_step::type // first_step elements > _helper; typedef typename _helper::type type; constexpr static int global_flags = initial_global_flags | first_step::global_flags | _helper::global_flags; }; // in case when no generators are specified template< template class Multiply, template class Equality, typename id, int initial_global_flags > struct enumerate_group_elements_noid, initial_global_flags> { typedef type_list type; constexpr static int global_flags = initial_global_flags; }; /** \internal * * \class enumerate_group_elements * \ingroup CXX11_TensorSymmetry_Module * * \brief Enumerate all elements in a finite group * * This template enumerates all elements in a finite group. It accepts * the following template parameters: * * \tparam Multiply The multiplication operation that multiplies two group elements * with each other. * \tparam Equality The equality check operation that checks if two group elements * are equal to another. * \tparam id The identity element * \tparam _generators A list of (possibly redundant) generators of the group */ template< template class Multiply, template class Equality, typename id, typename _generators > struct enumerate_group_elements : public enumerate_group_elements_noid< Multiply, Equality, id, typename strip_identities::type, strip_identities::global_flags > { }; } // end namespace group_theory } // end namespace internal } // end namespace Eigen #endif // EIGEN_CXX11_TENSORSYMMETRY_TEMPLATEGROUPTHEORY_H /* * kate: space-indent on; indent-width 2; mixedindent off; indent-mode cstyle; */ RcppEigen/inst/include/unsupported/Eigen/CXX11/src/ThreadPool/0000755000176200001440000000000014562417221023605 5ustar liggesusersRcppEigen/inst/include/unsupported/Eigen/CXX11/src/ThreadPool/EventCount.h0000644000176200001440000002164114562417221026054 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2016 Dmitry Vyukov // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_THREADPOOL_EVENTCOUNT_H_ #define EIGEN_CXX11_THREADPOOL_EVENTCOUNT_H_ namespace Eigen { // EventCount allows to wait for arbitrary predicates in non-blocking // algorithms. Think of condition variable, but wait predicate does not need to // be protected by a mutex. Usage: // Waiting thread does: // // if (predicate) // return act(); // EventCount::Waiter& w = waiters[my_index]; // ec.Prewait(&w); // if (predicate) { // ec.CancelWait(&w); // return act(); // } // ec.CommitWait(&w); // // Notifying thread does: // // predicate = true; // ec.Notify(true); // // Notify is cheap if there are no waiting threads. Prewait/CommitWait are not // cheap, but they are executed only if the preceding predicate check has // failed. // // Algorithm outline: // There are two main variables: predicate (managed by user) and state_. // Operation closely resembles Dekker mutual algorithm: // https://en.wikipedia.org/wiki/Dekker%27s_algorithm // Waiting thread sets state_ then checks predicate, Notifying thread sets // predicate then checks state_. Due to seq_cst fences in between these // operations it is guaranteed than either waiter will see predicate change // and won't block, or notifying thread will see state_ change and will unblock // the waiter, or both. But it can't happen that both threads don't see each // other changes, which would lead to deadlock. class EventCount { public: class Waiter; EventCount(MaxSizeVector& waiters) : state_(kStackMask), waiters_(waiters) { eigen_plain_assert(waiters.size() < (1 << kWaiterBits) - 1); } ~EventCount() { // Ensure there are no waiters. eigen_plain_assert(state_.load() == kStackMask); } // Prewait prepares for waiting. // After calling Prewait, the thread must re-check the wait predicate // and then call either CancelWait or CommitWait. void Prewait() { uint64_t state = state_.load(std::memory_order_relaxed); for (;;) { CheckState(state); uint64_t newstate = state + kWaiterInc; CheckState(newstate); if (state_.compare_exchange_weak(state, newstate, std::memory_order_seq_cst)) return; } } // CommitWait commits waiting after Prewait. void CommitWait(Waiter* w) { eigen_plain_assert((w->epoch & ~kEpochMask) == 0); w->state = Waiter::kNotSignaled; const uint64_t me = (w - &waiters_[0]) | w->epoch; uint64_t state = state_.load(std::memory_order_seq_cst); for (;;) { CheckState(state, true); uint64_t newstate; if ((state & kSignalMask) != 0) { // Consume the signal and return immidiately. newstate = state - kWaiterInc - kSignalInc; } else { // Remove this thread from pre-wait counter and add to the waiter stack. newstate = ((state & kWaiterMask) - kWaiterInc) | me; w->next.store(state & (kStackMask | kEpochMask), std::memory_order_relaxed); } CheckState(newstate); if (state_.compare_exchange_weak(state, newstate, std::memory_order_acq_rel)) { if ((state & kSignalMask) == 0) { w->epoch += kEpochInc; Park(w); } return; } } } // CancelWait cancels effects of the previous Prewait call. void CancelWait() { uint64_t state = state_.load(std::memory_order_relaxed); for (;;) { CheckState(state, true); uint64_t newstate = state - kWaiterInc; // We don't know if the thread was also notified or not, // so we should not consume a signal unconditionaly. // Only if number of waiters is equal to number of signals, // we know that the thread was notified and we must take away the signal. if (((state & kWaiterMask) >> kWaiterShift) == ((state & kSignalMask) >> kSignalShift)) newstate -= kSignalInc; CheckState(newstate); if (state_.compare_exchange_weak(state, newstate, std::memory_order_acq_rel)) return; } } // Notify wakes one or all waiting threads. // Must be called after changing the associated wait predicate. void Notify(bool notifyAll) { std::atomic_thread_fence(std::memory_order_seq_cst); uint64_t state = state_.load(std::memory_order_acquire); for (;;) { CheckState(state); const uint64_t waiters = (state & kWaiterMask) >> kWaiterShift; const uint64_t signals = (state & kSignalMask) >> kSignalShift; // Easy case: no waiters. if ((state & kStackMask) == kStackMask && waiters == signals) return; uint64_t newstate; if (notifyAll) { // Empty wait stack and set signal to number of pre-wait threads. newstate = (state & kWaiterMask) | (waiters << kSignalShift) | kStackMask; } else if (signals < waiters) { // There is a thread in pre-wait state, unblock it. newstate = state + kSignalInc; } else { // Pop a waiter from list and unpark it. Waiter* w = &waiters_[state & kStackMask]; uint64_t next = w->next.load(std::memory_order_relaxed); newstate = (state & (kWaiterMask | kSignalMask)) | next; } CheckState(newstate); if (state_.compare_exchange_weak(state, newstate, std::memory_order_acq_rel)) { if (!notifyAll && (signals < waiters)) return; // unblocked pre-wait thread if ((state & kStackMask) == kStackMask) return; Waiter* w = &waiters_[state & kStackMask]; if (!notifyAll) w->next.store(kStackMask, std::memory_order_relaxed); Unpark(w); return; } } } class Waiter { friend class EventCount; // Align to 128 byte boundary to prevent false sharing with other Waiter // objects in the same vector. EIGEN_ALIGN_TO_BOUNDARY(128) std::atomic next; std::mutex mu; std::condition_variable cv; uint64_t epoch = 0; unsigned state = kNotSignaled; enum { kNotSignaled, kWaiting, kSignaled, }; }; private: // State_ layout: // - low kWaiterBits is a stack of waiters committed wait // (indexes in waiters_ array are used as stack elements, // kStackMask means empty stack). // - next kWaiterBits is count of waiters in prewait state. // - next kWaiterBits is count of pending signals. // - remaining bits are ABA counter for the stack. // (stored in Waiter node and incremented on push). static const uint64_t kWaiterBits = 14; static const uint64_t kStackMask = (1ull << kWaiterBits) - 1; static const uint64_t kWaiterShift = kWaiterBits; static const uint64_t kWaiterMask = ((1ull << kWaiterBits) - 1) << kWaiterShift; static const uint64_t kWaiterInc = 1ull << kWaiterShift; static const uint64_t kSignalShift = 2 * kWaiterBits; static const uint64_t kSignalMask = ((1ull << kWaiterBits) - 1) << kSignalShift; static const uint64_t kSignalInc = 1ull << kSignalShift; static const uint64_t kEpochShift = 3 * kWaiterBits; static const uint64_t kEpochBits = 64 - kEpochShift; static const uint64_t kEpochMask = ((1ull << kEpochBits) - 1) << kEpochShift; static const uint64_t kEpochInc = 1ull << kEpochShift; std::atomic state_; MaxSizeVector& waiters_; static void CheckState(uint64_t state, bool waiter = false) { static_assert(kEpochBits >= 20, "not enough bits to prevent ABA problem"); const uint64_t waiters = (state & kWaiterMask) >> kWaiterShift; const uint64_t signals = (state & kSignalMask) >> kSignalShift; eigen_plain_assert(waiters >= signals); eigen_plain_assert(waiters < (1 << kWaiterBits) - 1); eigen_plain_assert(!waiter || waiters > 0); (void)waiters; (void)signals; } void Park(Waiter* w) { std::unique_lock lock(w->mu); while (w->state != Waiter::kSignaled) { w->state = Waiter::kWaiting; w->cv.wait(lock); } } void Unpark(Waiter* w) { for (Waiter* next; w; w = next) { uint64_t wnext = w->next.load(std::memory_order_relaxed) & kStackMask; next = wnext == kStackMask ? nullptr : &waiters_[wnext]; unsigned state; { std::unique_lock lock(w->mu); state = w->state; w->state = Waiter::kSignaled; } // Avoid notifying if it wasn't waiting. if (state == Waiter::kWaiting) w->cv.notify_one(); } } EventCount(const EventCount&) = delete; void operator=(const EventCount&) = delete; }; } // namespace Eigen #endif // EIGEN_CXX11_THREADPOOL_EVENTCOUNT_H_ RcppEigen/inst/include/unsupported/Eigen/CXX11/src/ThreadPool/RunQueue.h0000644000176200001440000002222614562417221025533 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2016 Dmitry Vyukov // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_THREADPOOL_RUNQUEUE_H_ #define EIGEN_CXX11_THREADPOOL_RUNQUEUE_H_ namespace Eigen { // RunQueue is a fixed-size, partially non-blocking deque or Work items. // Operations on front of the queue must be done by a single thread (owner), // operations on back of the queue can be done by multiple threads concurrently. // // Algorithm outline: // All remote threads operating on the queue back are serialized by a mutex. // This ensures that at most two threads access state: owner and one remote // thread (Size aside). The algorithm ensures that the occupied region of the // underlying array is logically continuous (can wraparound, but no stray // occupied elements). Owner operates on one end of this region, remote thread // operates on the other end. Synchronization between these threads // (potential consumption of the last element and take up of the last empty // element) happens by means of state variable in each element. States are: // empty, busy (in process of insertion of removal) and ready. Threads claim // elements (empty->busy and ready->busy transitions) by means of a CAS // operation. The finishing transition (busy->empty and busy->ready) are done // with plain store as the element is exclusively owned by the current thread. // // Note: we could permit only pointers as elements, then we would not need // separate state variable as null/non-null pointer value would serve as state, // but that would require malloc/free per operation for large, complex values // (and this is designed to store std::function<()>). template class RunQueue { public: RunQueue() : front_(0), back_(0) { // require power-of-two for fast masking eigen_plain_assert((kSize & (kSize - 1)) == 0); eigen_plain_assert(kSize > 2); // why would you do this? eigen_plain_assert(kSize <= (64 << 10)); // leave enough space for counter for (unsigned i = 0; i < kSize; i++) array_[i].state.store(kEmpty, std::memory_order_relaxed); } ~RunQueue() { eigen_plain_assert(Size() == 0); } // PushFront inserts w at the beginning of the queue. // If queue is full returns w, otherwise returns default-constructed Work. Work PushFront(Work w) { unsigned front = front_.load(std::memory_order_relaxed); Elem* e = &array_[front & kMask]; uint8_t s = e->state.load(std::memory_order_relaxed); if (s != kEmpty || !e->state.compare_exchange_strong(s, kBusy, std::memory_order_acquire)) return w; front_.store(front + 1 + (kSize << 1), std::memory_order_relaxed); e->w = std::move(w); e->state.store(kReady, std::memory_order_release); return Work(); } // PopFront removes and returns the first element in the queue. // If the queue was empty returns default-constructed Work. Work PopFront() { unsigned front = front_.load(std::memory_order_relaxed); Elem* e = &array_[(front - 1) & kMask]; uint8_t s = e->state.load(std::memory_order_relaxed); if (s != kReady || !e->state.compare_exchange_strong(s, kBusy, std::memory_order_acquire)) return Work(); Work w = std::move(e->w); e->state.store(kEmpty, std::memory_order_release); front = ((front - 1) & kMask2) | (front & ~kMask2); front_.store(front, std::memory_order_relaxed); return w; } // PushBack adds w at the end of the queue. // If queue is full returns w, otherwise returns default-constructed Work. Work PushBack(Work w) { std::unique_lock lock(mutex_); unsigned back = back_.load(std::memory_order_relaxed); Elem* e = &array_[(back - 1) & kMask]; uint8_t s = e->state.load(std::memory_order_relaxed); if (s != kEmpty || !e->state.compare_exchange_strong(s, kBusy, std::memory_order_acquire)) return w; back = ((back - 1) & kMask2) | (back & ~kMask2); back_.store(back, std::memory_order_relaxed); e->w = std::move(w); e->state.store(kReady, std::memory_order_release); return Work(); } // PopBack removes and returns the last elements in the queue. Work PopBack() { if (Empty()) return Work(); std::unique_lock lock(mutex_); unsigned back = back_.load(std::memory_order_relaxed); Elem* e = &array_[back & kMask]; uint8_t s = e->state.load(std::memory_order_relaxed); if (s != kReady || !e->state.compare_exchange_strong(s, kBusy, std::memory_order_acquire)) return Work(); Work w = std::move(e->w); e->state.store(kEmpty, std::memory_order_release); back_.store(back + 1 + (kSize << 1), std::memory_order_relaxed); return w; } // PopBackHalf removes and returns half last elements in the queue. // Returns number of elements removed. unsigned PopBackHalf(std::vector* result) { if (Empty()) return 0; std::unique_lock lock(mutex_); unsigned back = back_.load(std::memory_order_relaxed); unsigned size = Size(); unsigned mid = back; if (size > 1) mid = back + (size - 1) / 2; unsigned n = 0; unsigned start = 0; for (; static_cast(mid - back) >= 0; mid--) { Elem* e = &array_[mid & kMask]; uint8_t s = e->state.load(std::memory_order_relaxed); if (n == 0) { if (s != kReady || !e->state.compare_exchange_strong( s, kBusy, std::memory_order_acquire)) continue; start = mid; } else { // Note: no need to store temporal kBusy, we exclusively own these // elements. eigen_plain_assert(s == kReady); } result->push_back(std::move(e->w)); e->state.store(kEmpty, std::memory_order_release); n++; } if (n != 0) back_.store(start + 1 + (kSize << 1), std::memory_order_relaxed); return n; } // Size returns current queue size. // Can be called by any thread at any time. unsigned Size() const { return SizeOrNotEmpty(); } // Empty tests whether container is empty. // Can be called by any thread at any time. bool Empty() const { return SizeOrNotEmpty() == 0; } // Delete all the elements from the queue. void Flush() { while (!Empty()) { PopFront(); } } private: static const unsigned kMask = kSize - 1; static const unsigned kMask2 = (kSize << 1) - 1; struct Elem { std::atomic state; Work w; }; enum { kEmpty, kBusy, kReady, }; std::mutex mutex_; // Low log(kSize) + 1 bits in front_ and back_ contain rolling index of // front/back, respectively. The remaining bits contain modification counters // that are incremented on Push operations. This allows us to (1) distinguish // between empty and full conditions (if we would use log(kSize) bits for // position, these conditions would be indistinguishable); (2) obtain // consistent snapshot of front_/back_ for Size operation using the // modification counters. std::atomic front_; std::atomic back_; Elem array_[kSize]; // SizeOrNotEmpty returns current queue size; if NeedSizeEstimate is false, // only whether the size is 0 is guaranteed to be correct. // Can be called by any thread at any time. template unsigned SizeOrNotEmpty() const { // Emptiness plays critical role in thread pool blocking. So we go to great // effort to not produce false positives (claim non-empty queue as empty). unsigned front = front_.load(std::memory_order_acquire); for (;;) { // Capture a consistent snapshot of front/tail. unsigned back = back_.load(std::memory_order_acquire); unsigned front1 = front_.load(std::memory_order_relaxed); if (front != front1) { front = front1; std::atomic_thread_fence(std::memory_order_acquire); continue; } if (NeedSizeEstimate) { return CalculateSize(front, back); } else { // This value will be 0 if the queue is empty, and undefined otherwise. unsigned maybe_zero = ((front ^ back) & kMask2); // Queue size estimate must agree with maybe zero check on the queue // empty/non-empty state. eigen_assert((CalculateSize(front, back) == 0) == (maybe_zero == 0)); return maybe_zero; } } } EIGEN_ALWAYS_INLINE unsigned CalculateSize(unsigned front, unsigned back) const { int size = (front & kMask2) - (back & kMask2); // Fix overflow. if (size < 0) size += 2 * kSize; // Order of modification in push/pop is crafted to make the queue look // larger than it is during concurrent modifications. E.g. push can // increment size before the corresponding pop has decremented it. // So the computed size can be up to kSize + 1, fix it. if (size > static_cast(kSize)) size = kSize; return static_cast(size); } RunQueue(const RunQueue&) = delete; void operator=(const RunQueue&) = delete; }; } // namespace Eigen #endif // EIGEN_CXX11_THREADPOOL_RUNQUEUE_H_ RcppEigen/inst/include/unsupported/Eigen/CXX11/src/ThreadPool/ThreadPoolInterface.h0000644000176200001440000000322014562417221027635 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_THREADPOOL_THREAD_POOL_INTERFACE_H #define EIGEN_CXX11_THREADPOOL_THREAD_POOL_INTERFACE_H namespace Eigen { // This defines an interface that ThreadPoolDevice can take to use // custom thread pools underneath. class ThreadPoolInterface { public: // Submits a closure to be run by a thread in the pool. virtual void Schedule(std::function fn) = 0; // Submits a closure to be run by threads in the range [start, end) in the // pool. virtual void ScheduleWithHint(std::function fn, int /*start*/, int /*end*/) { // Just defer to Schedule in case sub-classes aren't interested in // overriding this functionality. Schedule(fn); } // If implemented, stop processing the closures that have been enqueued. // Currently running closures may still be processed. // If not implemented, does nothing. virtual void Cancel() {} // Returns the number of threads in the pool. virtual int NumThreads() const = 0; // Returns a logical thread index between 0 and NumThreads() - 1 if called // from one of the threads in the pool. Returns -1 otherwise. virtual int CurrentThreadId() const = 0; virtual ~ThreadPoolInterface() {} }; } // namespace Eigen #endif // EIGEN_CXX11_THREADPOOL_THREAD_POOL_INTERFACE_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/ThreadPool/Barrier.h0000644000176200001440000000410114562417221025340 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2018 Rasmus Munk Larsen // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. // Barrier is an object that allows one or more threads to wait until // Notify has been called a specified number of times. #ifndef EIGEN_CXX11_THREADPOOL_BARRIER_H #define EIGEN_CXX11_THREADPOOL_BARRIER_H namespace Eigen { class Barrier { public: Barrier(unsigned int count) : state_(count << 1), notified_(false) { eigen_plain_assert(((count << 1) >> 1) == count); } ~Barrier() { eigen_plain_assert((state_ >> 1) == 0); } void Notify() { unsigned int v = state_.fetch_sub(2, std::memory_order_acq_rel) - 2; if (v != 1) { // Clear the lowest bit (waiter flag) and check that the original state // value was not zero. If it was zero, it means that notify was called // more times than the original count. eigen_plain_assert(((v + 2) & ~1) != 0); return; // either count has not dropped to 0, or waiter is not waiting } std::unique_lock l(mu_); eigen_plain_assert(!notified_); notified_ = true; cv_.notify_all(); } void Wait() { unsigned int v = state_.fetch_or(1, std::memory_order_acq_rel); if ((v >> 1) == 0) return; std::unique_lock l(mu_); while (!notified_) { cv_.wait(l); } } private: std::mutex mu_; std::condition_variable cv_; std::atomic state_; // low bit is waiter flag bool notified_; }; // Notification is an object that allows a user to to wait for another // thread to signal a notification that an event has occurred. // // Multiple threads can wait on the same Notification object, // but only one caller must call Notify() on the object. struct Notification : Barrier { Notification() : Barrier(1){}; }; } // namespace Eigen #endif // EIGEN_CXX11_THREADPOOL_BARRIER_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/ThreadPool/ThreadCancel.h0000644000176200001440000000140614562417221026274 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2016 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_THREADPOOL_THREAD_CANCEL_H #define EIGEN_CXX11_THREADPOOL_THREAD_CANCEL_H // Try to come up with a portable way to cancel a thread #if EIGEN_OS_GNULINUX #define EIGEN_THREAD_CANCEL(t) \ pthread_cancel(t.native_handle()); #define EIGEN_SUPPORTS_THREAD_CANCELLATION 1 #else #define EIGEN_THREAD_CANCEL(t) #endif #endif // EIGEN_CXX11_THREADPOOL_THREAD_CANCEL_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/ThreadPool/ThreadEnvironment.h0000644000176200001440000000227114562417221027414 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_THREADPOOL_THREAD_ENVIRONMENT_H #define EIGEN_CXX11_THREADPOOL_THREAD_ENVIRONMENT_H namespace Eigen { struct StlThreadEnvironment { struct Task { std::function f; }; // EnvThread constructor must start the thread, // destructor must join the thread. class EnvThread { public: EnvThread(std::function f) : thr_(std::move(f)) {} ~EnvThread() { thr_.join(); } // This function is called when the threadpool is cancelled. void OnCancel() { } private: std::thread thr_; }; EnvThread* CreateThread(std::function f) { return new EnvThread(std::move(f)); } Task CreateTask(std::function f) { return Task{std::move(f)}; } void ExecuteTask(const Task& t) { t.f(); } }; } // namespace Eigen #endif // EIGEN_CXX11_THREADPOOL_THREAD_ENVIRONMENT_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/ThreadPool/NonBlockingThreadPool.h0000644000176200001440000004126314562417221030151 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2016 Dmitry Vyukov // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_THREADPOOL_NONBLOCKING_THREAD_POOL_H #define EIGEN_CXX11_THREADPOOL_NONBLOCKING_THREAD_POOL_H namespace Eigen { template class ThreadPoolTempl : public Eigen::ThreadPoolInterface { public: typedef typename Environment::Task Task; typedef RunQueue Queue; ThreadPoolTempl(int num_threads, Environment env = Environment()) : ThreadPoolTempl(num_threads, true, env) {} ThreadPoolTempl(int num_threads, bool allow_spinning, Environment env = Environment()) : env_(env), num_threads_(num_threads), allow_spinning_(allow_spinning), thread_data_(num_threads), all_coprimes_(num_threads), waiters_(num_threads), global_steal_partition_(EncodePartition(0, num_threads_)), blocked_(0), spinning_(0), done_(false), cancelled_(false), ec_(waiters_) { waiters_.resize(num_threads_); // Calculate coprimes of all numbers [1, num_threads]. // Coprimes are used for random walks over all threads in Steal // and NonEmptyQueueIndex. Iteration is based on the fact that if we take // a random starting thread index t and calculate num_threads - 1 subsequent // indices as (t + coprime) % num_threads, we will cover all threads without // repetitions (effectively getting a presudo-random permutation of thread // indices). eigen_plain_assert(num_threads_ < kMaxThreads); for (int i = 1; i <= num_threads_; ++i) { all_coprimes_.emplace_back(i); ComputeCoprimes(i, &all_coprimes_.back()); } #ifndef EIGEN_THREAD_LOCAL init_barrier_.reset(new Barrier(num_threads_)); #endif thread_data_.resize(num_threads_); for (int i = 0; i < num_threads_; i++) { SetStealPartition(i, EncodePartition(0, num_threads_)); thread_data_[i].thread.reset( env_.CreateThread([this, i]() { WorkerLoop(i); })); } #ifndef EIGEN_THREAD_LOCAL // Wait for workers to initialize per_thread_map_. Otherwise we might race // with them in Schedule or CurrentThreadId. init_barrier_->Wait(); #endif } ~ThreadPoolTempl() { done_ = true; // Now if all threads block without work, they will start exiting. // But note that threads can continue to work arbitrary long, // block, submit new work, unblock and otherwise live full life. if (!cancelled_) { ec_.Notify(true); } else { // Since we were cancelled, there might be entries in the queues. // Empty them to prevent their destructor from asserting. for (size_t i = 0; i < thread_data_.size(); i++) { thread_data_[i].queue.Flush(); } } // Join threads explicitly (by destroying) to avoid destruction order within // this class. for (size_t i = 0; i < thread_data_.size(); ++i) thread_data_[i].thread.reset(); } void SetStealPartitions(const std::vector>& partitions) { eigen_plain_assert(partitions.size() == static_cast(num_threads_)); // Pass this information to each thread queue. for (int i = 0; i < num_threads_; i++) { const auto& pair = partitions[i]; unsigned start = pair.first, end = pair.second; AssertBounds(start, end); unsigned val = EncodePartition(start, end); SetStealPartition(i, val); } } void Schedule(std::function fn) EIGEN_OVERRIDE { ScheduleWithHint(std::move(fn), 0, num_threads_); } void ScheduleWithHint(std::function fn, int start, int limit) override { Task t = env_.CreateTask(std::move(fn)); PerThread* pt = GetPerThread(); if (pt->pool == this) { // Worker thread of this pool, push onto the thread's queue. Queue& q = thread_data_[pt->thread_id].queue; t = q.PushFront(std::move(t)); } else { // A free-standing thread (or worker of another pool), push onto a random // queue. eigen_plain_assert(start < limit); eigen_plain_assert(limit <= num_threads_); int num_queues = limit - start; int rnd = Rand(&pt->rand) % num_queues; eigen_plain_assert(start + rnd < limit); Queue& q = thread_data_[start + rnd].queue; t = q.PushBack(std::move(t)); } // Note: below we touch this after making w available to worker threads. // Strictly speaking, this can lead to a racy-use-after-free. Consider that // Schedule is called from a thread that is neither main thread nor a worker // thread of this pool. Then, execution of w directly or indirectly // completes overall computations, which in turn leads to destruction of // this. We expect that such scenario is prevented by program, that is, // this is kept alive while any threads can potentially be in Schedule. if (!t.f) { ec_.Notify(false); } else { env_.ExecuteTask(t); // Push failed, execute directly. } } void Cancel() EIGEN_OVERRIDE { cancelled_ = true; done_ = true; // Let each thread know it's been cancelled. #ifdef EIGEN_THREAD_ENV_SUPPORTS_CANCELLATION for (size_t i = 0; i < thread_data_.size(); i++) { thread_data_[i].thread->OnCancel(); } #endif // Wake up the threads without work to let them exit on their own. ec_.Notify(true); } int NumThreads() const EIGEN_FINAL { return num_threads_; } int CurrentThreadId() const EIGEN_FINAL { const PerThread* pt = const_cast(this)->GetPerThread(); if (pt->pool == this) { return pt->thread_id; } else { return -1; } } private: // Create a single atomic that encodes start and limit information for // each thread. // We expect num_threads_ < 65536, so we can store them in a single // std::atomic. // Exposed publicly as static functions so that external callers can reuse // this encode/decode logic for maintaining their own thread-safe copies of // scheduling and steal domain(s). static const int kMaxPartitionBits = 16; static const int kMaxThreads = 1 << kMaxPartitionBits; inline unsigned EncodePartition(unsigned start, unsigned limit) { return (start << kMaxPartitionBits) | limit; } inline void DecodePartition(unsigned val, unsigned* start, unsigned* limit) { *limit = val & (kMaxThreads - 1); val >>= kMaxPartitionBits; *start = val; } void AssertBounds(int start, int end) { eigen_plain_assert(start >= 0); eigen_plain_assert(start < end); // non-zero sized partition eigen_plain_assert(end <= num_threads_); } inline void SetStealPartition(size_t i, unsigned val) { thread_data_[i].steal_partition.store(val, std::memory_order_relaxed); } inline unsigned GetStealPartition(int i) { return thread_data_[i].steal_partition.load(std::memory_order_relaxed); } void ComputeCoprimes(int N, MaxSizeVector* coprimes) { for (int i = 1; i <= N; i++) { unsigned a = i; unsigned b = N; // If GCD(a, b) == 1, then a and b are coprimes. while (b != 0) { unsigned tmp = a; a = b; b = tmp % b; } if (a == 1) { coprimes->push_back(i); } } } typedef typename Environment::EnvThread Thread; struct PerThread { constexpr PerThread() : pool(NULL), rand(0), thread_id(-1) {} ThreadPoolTempl* pool; // Parent pool, or null for normal threads. uint64_t rand; // Random generator state. int thread_id; // Worker thread index in pool. #ifndef EIGEN_THREAD_LOCAL // Prevent false sharing. char pad_[128]; #endif }; struct ThreadData { constexpr ThreadData() : thread(), steal_partition(0), queue() {} std::unique_ptr thread; std::atomic steal_partition; Queue queue; }; Environment env_; const int num_threads_; const bool allow_spinning_; MaxSizeVector thread_data_; MaxSizeVector> all_coprimes_; MaxSizeVector waiters_; unsigned global_steal_partition_; std::atomic blocked_; std::atomic spinning_; std::atomic done_; std::atomic cancelled_; EventCount ec_; #ifndef EIGEN_THREAD_LOCAL std::unique_ptr init_barrier_; std::mutex per_thread_map_mutex_; // Protects per_thread_map_. std::unordered_map> per_thread_map_; #endif // Main worker thread loop. void WorkerLoop(int thread_id) { #ifndef EIGEN_THREAD_LOCAL std::unique_ptr new_pt(new PerThread()); per_thread_map_mutex_.lock(); bool insertOK = per_thread_map_.emplace(GlobalThreadIdHash(), std::move(new_pt)).second; eigen_plain_assert(insertOK); EIGEN_UNUSED_VARIABLE(insertOK); per_thread_map_mutex_.unlock(); init_barrier_->Notify(); init_barrier_->Wait(); #endif PerThread* pt = GetPerThread(); pt->pool = this; pt->rand = GlobalThreadIdHash(); pt->thread_id = thread_id; Queue& q = thread_data_[thread_id].queue; EventCount::Waiter* waiter = &waiters_[thread_id]; // TODO(dvyukov,rmlarsen): The time spent in NonEmptyQueueIndex() is // proportional to num_threads_ and we assume that new work is scheduled at // a constant rate, so we set spin_count to 5000 / num_threads_. The // constant was picked based on a fair dice roll, tune it. const int spin_count = allow_spinning_ && num_threads_ > 0 ? 5000 / num_threads_ : 0; if (num_threads_ == 1) { // For num_threads_ == 1 there is no point in going through the expensive // steal loop. Moreover, since NonEmptyQueueIndex() calls PopBack() on the // victim queues it might reverse the order in which ops are executed // compared to the order in which they are scheduled, which tends to be // counter-productive for the types of I/O workloads the single thread // pools tend to be used for. while (!cancelled_) { Task t = q.PopFront(); for (int i = 0; i < spin_count && !t.f; i++) { if (!cancelled_.load(std::memory_order_relaxed)) { t = q.PopFront(); } } if (!t.f) { if (!WaitForWork(waiter, &t)) { return; } } if (t.f) { env_.ExecuteTask(t); } } } else { while (!cancelled_) { Task t = q.PopFront(); if (!t.f) { t = LocalSteal(); if (!t.f) { t = GlobalSteal(); if (!t.f) { // Leave one thread spinning. This reduces latency. if (allow_spinning_ && !spinning_ && !spinning_.exchange(true)) { for (int i = 0; i < spin_count && !t.f; i++) { if (!cancelled_.load(std::memory_order_relaxed)) { t = GlobalSteal(); } else { return; } } spinning_ = false; } if (!t.f) { if (!WaitForWork(waiter, &t)) { return; } } } } } if (t.f) { env_.ExecuteTask(t); } } } } // Steal tries to steal work from other worker threads in the range [start, // limit) in best-effort manner. Task Steal(unsigned start, unsigned limit) { PerThread* pt = GetPerThread(); const size_t size = limit - start; unsigned r = Rand(&pt->rand); // Reduce r into [0, size) range, this utilizes trick from // https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/ eigen_plain_assert(all_coprimes_[size - 1].size() < (1<<30)); unsigned victim = ((uint64_t)r * (uint64_t)size) >> 32; unsigned index = ((uint64_t) all_coprimes_[size - 1].size() * (uint64_t)r) >> 32; unsigned inc = all_coprimes_[size - 1][index]; for (unsigned i = 0; i < size; i++) { eigen_plain_assert(start + victim < limit); Task t = thread_data_[start + victim].queue.PopBack(); if (t.f) { return t; } victim += inc; if (victim >= size) { victim -= size; } } return Task(); } // Steals work within threads belonging to the partition. Task LocalSteal() { PerThread* pt = GetPerThread(); unsigned partition = GetStealPartition(pt->thread_id); // If thread steal partition is the same as global partition, there is no // need to go through the steal loop twice. if (global_steal_partition_ == partition) return Task(); unsigned start, limit; DecodePartition(partition, &start, &limit); AssertBounds(start, limit); return Steal(start, limit); } // Steals work from any other thread in the pool. Task GlobalSteal() { return Steal(0, num_threads_); } // WaitForWork blocks until new work is available (returns true), or if it is // time to exit (returns false). Can optionally return a task to execute in t // (in such case t.f != nullptr on return). bool WaitForWork(EventCount::Waiter* waiter, Task* t) { eigen_plain_assert(!t->f); // We already did best-effort emptiness check in Steal, so prepare for // blocking. ec_.Prewait(); // Now do a reliable emptiness check. int victim = NonEmptyQueueIndex(); if (victim != -1) { ec_.CancelWait(); if (cancelled_) { return false; } else { *t = thread_data_[victim].queue.PopBack(); return true; } } // Number of blocked threads is used as termination condition. // If we are shutting down and all worker threads blocked without work, // that's we are done. blocked_++; // TODO is blocked_ required to be unsigned? if (done_ && blocked_ == static_cast(num_threads_)) { ec_.CancelWait(); // Almost done, but need to re-check queues. // Consider that all queues are empty and all worker threads are preempted // right after incrementing blocked_ above. Now a free-standing thread // submits work and calls destructor (which sets done_). If we don't // re-check queues, we will exit leaving the work unexecuted. if (NonEmptyQueueIndex() != -1) { // Note: we must not pop from queues before we decrement blocked_, // otherwise the following scenario is possible. Consider that instead // of checking for emptiness we popped the only element from queues. // Now other worker threads can start exiting, which is bad if the // work item submits other work. So we just check emptiness here, // which ensures that all worker threads exit at the same time. blocked_--; return true; } // Reached stable termination state. ec_.Notify(true); return false; } ec_.CommitWait(waiter); blocked_--; return true; } int NonEmptyQueueIndex() { PerThread* pt = GetPerThread(); // We intentionally design NonEmptyQueueIndex to steal work from // anywhere in the queue so threads don't block in WaitForWork() forever // when all threads in their partition go to sleep. Steal is still local. const size_t size = thread_data_.size(); unsigned r = Rand(&pt->rand); unsigned inc = all_coprimes_[size - 1][r % all_coprimes_[size - 1].size()]; unsigned victim = r % size; for (unsigned i = 0; i < size; i++) { if (!thread_data_[victim].queue.Empty()) { return victim; } victim += inc; if (victim >= size) { victim -= size; } } return -1; } static EIGEN_STRONG_INLINE uint64_t GlobalThreadIdHash() { return std::hash()(std::this_thread::get_id()); } EIGEN_STRONG_INLINE PerThread* GetPerThread() { #ifndef EIGEN_THREAD_LOCAL static PerThread dummy; auto it = per_thread_map_.find(GlobalThreadIdHash()); if (it == per_thread_map_.end()) { return &dummy; } else { return it->second.get(); } #else EIGEN_THREAD_LOCAL PerThread per_thread_; PerThread* pt = &per_thread_; return pt; #endif } static EIGEN_STRONG_INLINE unsigned Rand(uint64_t* state) { uint64_t current = *state; // Update the internal state *state = current * 6364136223846793005ULL + 0xda3e39cb94b95bdbULL; // Generate the random output (using the PCG-XSH-RS scheme) return static_cast((current ^ (current >> 22)) >> (22 + (current >> 61))); } }; typedef ThreadPoolTempl ThreadPool; } // namespace Eigen #endif // EIGEN_CXX11_THREADPOOL_NONBLOCKING_THREAD_POOL_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/ThreadPool/ThreadYield.h0000644000176200001440000000133614562414502026156 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2016 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_THREADPOOL_THREAD_YIELD_H #define EIGEN_CXX11_THREADPOOL_THREAD_YIELD_H // Try to come up with a portable way to yield #if EIGEN_COMP_GNUC && EIGEN_GNUC_AT_MOST(4, 7) #include #define EIGEN_THREAD_YIELD() sched_yield() #else #define EIGEN_THREAD_YIELD() std::this_thread::yield() #endif #endif // EIGEN_CXX11_THREADPOOL_THREAD_YIELD_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/ThreadPool/ThreadLocal.h0000644000176200001440000002633214562417221026146 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2016 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_THREADPOOL_THREAD_LOCAL_H #define EIGEN_CXX11_THREADPOOL_THREAD_LOCAL_H #ifdef EIGEN_AVOID_THREAD_LOCAL #ifdef EIGEN_THREAD_LOCAL #undef EIGEN_THREAD_LOCAL #endif #else #if EIGEN_MAX_CPP_VER >= 11 && \ ((EIGEN_COMP_GNUC && EIGEN_GNUC_AT_LEAST(4, 8)) || \ __has_feature(cxx_thread_local) || \ (EIGEN_COMP_MSVC >= 1900) ) #define EIGEN_THREAD_LOCAL static thread_local #endif // Disable TLS for Apple and Android builds with older toolchains. #if defined(__APPLE__) // Included for TARGET_OS_IPHONE, __IPHONE_OS_VERSION_MIN_REQUIRED, // __IPHONE_8_0. #include #include #endif // Checks whether C++11's `thread_local` storage duration specifier is // supported. #if defined(__apple_build_version__) && \ ((__apple_build_version__ < 8000042) || \ (TARGET_OS_IPHONE && __IPHONE_OS_VERSION_MIN_REQUIRED < __IPHONE_9_0)) // Notes: Xcode's clang did not support `thread_local` until version // 8, and even then not for all iOS < 9.0. #undef EIGEN_THREAD_LOCAL #elif defined(__ANDROID__) && EIGEN_COMP_CLANG // There are platforms for which TLS should not be used even though the compiler // makes it seem like it's supported (Android NDK < r12b for example). // This is primarily because of linker problems and toolchain misconfiguration: // TLS isn't supported until NDK r12b per // https://developer.android.com/ndk/downloads/revision_history.html // Since NDK r16, `__NDK_MAJOR__` and `__NDK_MINOR__` are defined in // . For NDK < r16, users should define these macros, // e.g. `-D__NDK_MAJOR__=11 -D__NKD_MINOR__=0` for NDK r11. #if __has_include() #include #endif // __has_include() #if defined(__ANDROID__) && defined(__clang__) && defined(__NDK_MAJOR__) && \ defined(__NDK_MINOR__) && \ ((__NDK_MAJOR__ < 12) || ((__NDK_MAJOR__ == 12) && (__NDK_MINOR__ < 1))) #undef EIGEN_THREAD_LOCAL #endif #endif // defined(__ANDROID__) && defined(__clang__) #endif // EIGEN_AVOID_THREAD_LOCAL namespace Eigen { namespace internal { template struct ThreadLocalNoOpInitialize { void operator()(T&) const {} }; template struct ThreadLocalNoOpRelease { void operator()(T&) const {} }; } // namespace internal // Thread local container for elements of type T, that does not use thread local // storage. As long as the number of unique threads accessing this storage // is smaller than `capacity_`, it is lock-free and wait-free. Otherwise it will // use a mutex for synchronization. // // Type `T` has to be default constructible, and by default each thread will get // a default constructed value. It is possible to specify custom `initialize` // callable, that will be called lazily from each thread accessing this object, // and will be passed a default initialized object of type `T`. Also it's // possible to pass a custom `release` callable, that will be invoked before // calling ~T(). // // Example: // // struct Counter { // int value = 0; // } // // Eigen::ThreadLocal counter(10); // // // Each thread will have access to it's own counter object. // Counter& cnt = counter.local(); // cnt++; // // WARNING: Eigen::ThreadLocal uses the OS-specific value returned by // std::this_thread::get_id() to identify threads. This value is not guaranteed // to be unique except for the life of the thread. A newly created thread may // get an OS-specific ID equal to that of an already destroyed thread. // // Somewhat similar to TBB thread local storage, with similar restrictions: // https://www.threadingbuildingblocks.org/docs/help/reference/thread_local_storage/enumerable_thread_specific_cls.html // template , typename Release = internal::ThreadLocalNoOpRelease> class ThreadLocal { // We preallocate default constructed elements in MaxSizedVector. static_assert(std::is_default_constructible::value, "ThreadLocal data type must be default constructible"); public: explicit ThreadLocal(int capacity) : ThreadLocal(capacity, internal::ThreadLocalNoOpInitialize(), internal::ThreadLocalNoOpRelease()) {} ThreadLocal(int capacity, Initialize initialize) : ThreadLocal(capacity, std::move(initialize), internal::ThreadLocalNoOpRelease()) {} ThreadLocal(int capacity, Initialize initialize, Release release) : initialize_(std::move(initialize)), release_(std::move(release)), capacity_(capacity), data_(capacity_), ptr_(capacity_), filled_records_(0) { eigen_assert(capacity_ >= 0); data_.resize(capacity_); for (int i = 0; i < capacity_; ++i) { ptr_.emplace_back(nullptr); } } T& local() { std::thread::id this_thread = std::this_thread::get_id(); if (capacity_ == 0) return SpilledLocal(this_thread); std::size_t h = std::hash()(this_thread); const int start_idx = h % capacity_; // NOTE: From the definition of `std::this_thread::get_id()` it is // guaranteed that we never can have concurrent insertions with the same key // to our hash-map like data structure. If we didn't find an element during // the initial traversal, it's guaranteed that no one else could have // inserted it while we are in this function. This allows to massively // simplify out lock-free insert-only hash map. // Check if we already have an element for `this_thread`. int idx = start_idx; while (ptr_[idx].load() != nullptr) { ThreadIdAndValue& record = *(ptr_[idx].load()); if (record.thread_id == this_thread) return record.value; idx += 1; if (idx >= capacity_) idx -= capacity_; if (idx == start_idx) break; } // If we are here, it means that we found an insertion point in lookup // table at `idx`, or we did a full traversal and table is full. // If lock-free storage is full, fallback on mutex. if (filled_records_.load() >= capacity_) return SpilledLocal(this_thread); // We double check that we still have space to insert an element into a lock // free storage. If old value in `filled_records_` is larger than the // records capacity, it means that some other thread added an element while // we were traversing lookup table. int insertion_index = filled_records_.fetch_add(1, std::memory_order_relaxed); if (insertion_index >= capacity_) return SpilledLocal(this_thread); // At this point it's guaranteed that we can access to // data_[insertion_index_] without a data race. data_[insertion_index].thread_id = this_thread; initialize_(data_[insertion_index].value); // That's the pointer we'll put into the lookup table. ThreadIdAndValue* inserted = &data_[insertion_index]; // We'll use nullptr pointer to ThreadIdAndValue in a compare-and-swap loop. ThreadIdAndValue* empty = nullptr; // Now we have to find an insertion point into the lookup table. We start // from the `idx` that was identified as an insertion point above, it's // guaranteed that we will have an empty record somewhere in a lookup table // (because we created a record in the `data_`). const int insertion_idx = idx; do { // Always start search from the original insertion candidate. idx = insertion_idx; while (ptr_[idx].load() != nullptr) { idx += 1; if (idx >= capacity_) idx -= capacity_; // If we did a full loop, it means that we don't have any free entries // in the lookup table, and this means that something is terribly wrong. eigen_assert(idx != insertion_idx); } // Atomic CAS of the pointer guarantees that any other thread, that will // follow this pointer will see all the mutations in the `data_`. } while (!ptr_[idx].compare_exchange_weak(empty, inserted)); return inserted->value; } // WARN: It's not thread safe to call it concurrently with `local()`. void ForEach(std::function f) { // Reading directly from `data_` is unsafe, because only CAS to the // record in `ptr_` makes all changes visible to other threads. for (auto& ptr : ptr_) { ThreadIdAndValue* record = ptr.load(); if (record == nullptr) continue; f(record->thread_id, record->value); } // We did not spill into the map based storage. if (filled_records_.load(std::memory_order_relaxed) < capacity_) return; // Adds a happens before edge from the last call to SpilledLocal(). std::unique_lock lock(mu_); for (auto& kv : per_thread_map_) { f(kv.first, kv.second); } } // WARN: It's not thread safe to call it concurrently with `local()`. ~ThreadLocal() { // Reading directly from `data_` is unsafe, because only CAS to the record // in `ptr_` makes all changes visible to other threads. for (auto& ptr : ptr_) { ThreadIdAndValue* record = ptr.load(); if (record == nullptr) continue; release_(record->value); } // We did not spill into the map based storage. if (filled_records_.load(std::memory_order_relaxed) < capacity_) return; // Adds a happens before edge from the last call to SpilledLocal(). std::unique_lock lock(mu_); for (auto& kv : per_thread_map_) { release_(kv.second); } } private: struct ThreadIdAndValue { std::thread::id thread_id; T value; }; // Use unordered map guarded by a mutex when lock free storage is full. T& SpilledLocal(std::thread::id this_thread) { std::unique_lock lock(mu_); auto it = per_thread_map_.find(this_thread); if (it == per_thread_map_.end()) { auto result = per_thread_map_.emplace(this_thread, T()); eigen_assert(result.second); initialize_((*result.first).second); return (*result.first).second; } else { return it->second; } } Initialize initialize_; Release release_; const int capacity_; // Storage that backs lock-free lookup table `ptr_`. Records stored in this // storage contiguously starting from index 0. MaxSizeVector data_; // Atomic pointers to the data stored in `data_`. Used as a lookup table for // linear probing hash map (https://en.wikipedia.org/wiki/Linear_probing). MaxSizeVector> ptr_; // Number of records stored in the `data_`. std::atomic filled_records_; // We fallback on per thread map if lock-free storage is full. In practice // this should never happen, if `capacity_` is a reasonable estimate of the // number of threads running in a system. std::mutex mu_; // Protects per_thread_map_. std::unordered_map per_thread_map_; }; } // namespace Eigen #endif // EIGEN_CXX11_THREADPOOL_THREAD_LOCAL_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/0000755000176200001440000000000014562417221023016 5ustar liggesusersRcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorArgMax.h0000644000176200001440000003024014562417221025540 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2015 Eugene Brevdo // Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_ARG_MAX_H #define EIGEN_CXX11_TENSOR_TENSOR_ARG_MAX_H namespace Eigen { namespace internal { /** \class TensorIndexTuple * \ingroup CXX11_Tensor_Module * * \brief Tensor + Index Tuple class. * * */ template struct traits > : public traits { typedef traits XprTraits; typedef typename XprTraits::StorageKind StorageKind; typedef typename XprTraits::Index Index; typedef Tuple Scalar; typedef typename XprType::Nested Nested; typedef typename remove_reference::type _Nested; static const int NumDimensions = XprTraits::NumDimensions; static const int Layout = XprTraits::Layout; }; template struct eval, Eigen::Dense> { typedef const TensorIndexTupleOpEIGEN_DEVICE_REF type; }; template struct nested, 1, typename eval >::type> { typedef TensorIndexTupleOp type; }; } // end namespace internal template class TensorIndexTupleOp : public TensorBase, ReadOnlyAccessors> { public: typedef typename Eigen::internal::traits::Scalar Scalar; typedef typename Eigen::NumTraits::Real RealScalar; typedef typename Eigen::internal::nested::type Nested; typedef typename Eigen::internal::traits::StorageKind StorageKind; typedef typename Eigen::internal::traits::Index Index; typedef Tuple CoeffReturnType; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorIndexTupleOp(const XprType& expr) : m_xpr(expr) {} EIGEN_DEVICE_FUNC const typename internal::remove_all::type& expression() const { return m_xpr; } protected: typename XprType::Nested m_xpr; }; // Eval as rvalue template struct TensorEvaluator, Device> { typedef TensorIndexTupleOp XprType; typedef typename XprType::Index Index; typedef typename XprType::Scalar Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename TensorEvaluator::Dimensions Dimensions; static const int NumDims = internal::array_size::value; typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; enum { IsAligned = /*TensorEvaluator::IsAligned*/ false, PacketAccess = /*TensorEvaluator::PacketAccess*/ false, BlockAccess = false, PreferBlockAccess = TensorEvaluator::PreferBlockAccess, Layout = TensorEvaluator::Layout, CoordAccess = false, // to be implemented RawAccess = false }; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockNotImplemented TensorBlock; //===--------------------------------------------------------------------===// EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device) : m_impl(op.expression(), device) { } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_impl.dimensions(); } EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(EvaluatorPointerType /*data*/) { m_impl.evalSubExprsIfNeeded(NULL); return true; } EIGEN_STRONG_INLINE void cleanup() { m_impl.cleanup(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { return CoeffReturnType(index, m_impl.coeff(index)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost costPerCoeff(bool vectorized) const { return m_impl.costPerCoeff(vectorized) + TensorOpCost(0, 0, 1); } EIGEN_DEVICE_FUNC EvaluatorPointerType data() const { return NULL; } #ifdef EIGEN_USE_SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler &cgh) const { m_impl.bind(cgh); } #endif protected: TensorEvaluator m_impl; }; namespace internal { /** \class TensorTupleIndex * \ingroup CXX11_Tensor_Module * * \brief Converts to Tensor > and reduces to Tensor. * */ template struct traits > : public traits { typedef traits XprTraits; typedef typename XprTraits::StorageKind StorageKind; typedef typename XprTraits::Index Index; typedef Index Scalar; typedef typename XprType::Nested Nested; typedef typename remove_reference::type _Nested; static const int NumDimensions = XprTraits::NumDimensions - array_size::value; static const int Layout = XprTraits::Layout; }; template struct eval, Eigen::Dense> { typedef const TensorTupleReducerOpEIGEN_DEVICE_REF type; }; template struct nested, 1, typename eval >::type> { typedef TensorTupleReducerOp type; }; } // end namespace internal template class TensorTupleReducerOp : public TensorBase, ReadOnlyAccessors> { public: typedef typename Eigen::internal::traits::Scalar Scalar; typedef typename Eigen::NumTraits::Real RealScalar; typedef typename Eigen::internal::nested::type Nested; typedef typename Eigen::internal::traits::StorageKind StorageKind; typedef typename Eigen::internal::traits::Index Index; typedef Index CoeffReturnType; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorTupleReducerOp(const XprType& expr, const ReduceOp& reduce_op, const Index return_dim, const Dims& reduce_dims) : m_xpr(expr), m_reduce_op(reduce_op), m_return_dim(return_dim), m_reduce_dims(reduce_dims) {} EIGEN_DEVICE_FUNC const typename internal::remove_all::type& expression() const { return m_xpr; } EIGEN_DEVICE_FUNC const ReduceOp& reduce_op() const { return m_reduce_op; } EIGEN_DEVICE_FUNC const Dims& reduce_dims() const { return m_reduce_dims; } EIGEN_DEVICE_FUNC Index return_dim() const { return m_return_dim; } protected: typename XprType::Nested m_xpr; const ReduceOp m_reduce_op; const Index m_return_dim; const Dims m_reduce_dims; }; // Eval as rvalue template struct TensorEvaluator, Device> { typedef TensorTupleReducerOp XprType; typedef typename XprType::Index Index; typedef typename XprType::Scalar Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename TensorIndexTupleOp::CoeffReturnType TupleType; typedef typename TensorEvaluator >, Device>::Dimensions Dimensions; typedef typename TensorEvaluator , Device>::Dimensions InputDimensions; static const int NumDims = internal::array_size::value; typedef array StrideDims; typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; typedef StorageMemory TupleStorageMem; enum { IsAligned = /*TensorEvaluator::IsAligned*/ false, PacketAccess = /*TensorEvaluator::PacketAccess*/ false, BlockAccess = false, PreferBlockAccess = TensorEvaluator::PreferBlockAccess, Layout = TensorEvaluator >, Device>::Layout, CoordAccess = false, // to be implemented RawAccess = false }; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockNotImplemented TensorBlock; //===--------------------------------------------------------------------===// EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device) : m_orig_impl(op.expression(), device), m_impl(op.expression().index_tuples().reduce(op.reduce_dims(), op.reduce_op()), device), m_return_dim(op.return_dim()) { gen_strides(m_orig_impl.dimensions(), m_strides); if (Layout == static_cast(ColMajor)) { const Index total_size = internal::array_prod(m_orig_impl.dimensions()); m_stride_mod = (m_return_dim < NumDims - 1) ? m_strides[m_return_dim + 1] : total_size; } else { const Index total_size = internal::array_prod(m_orig_impl.dimensions()); m_stride_mod = (m_return_dim > 0) ? m_strides[m_return_dim - 1] : total_size; } // If m_return_dim is not a valid index, returns 1 or this can crash on Windows. m_stride_div = ((m_return_dim >= 0) && (m_return_dim < static_cast(m_strides.size()))) ? m_strides[m_return_dim] : 1; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_impl.dimensions(); } EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(EvaluatorPointerType /*data*/) { m_impl.evalSubExprsIfNeeded(NULL); return true; } EIGEN_STRONG_INLINE void cleanup() { m_impl.cleanup(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { const TupleType v = m_impl.coeff(index); return (m_return_dim < 0) ? v.first : (v.first % m_stride_mod) / m_stride_div; } EIGEN_DEVICE_FUNC EvaluatorPointerType data() const { return NULL; } #ifdef EIGEN_USE_SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler &cgh) const { m_impl.bind(cgh); m_orig_impl.bind(cgh); } #endif EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost costPerCoeff(bool vectorized) const { const double compute_cost = 1.0 + (m_return_dim < 0 ? 0.0 : (TensorOpCost::ModCost() + TensorOpCost::DivCost())); return m_orig_impl.costPerCoeff(vectorized) + m_impl.costPerCoeff(vectorized) + TensorOpCost(0, 0, compute_cost); } private: EIGEN_DEVICE_FUNC void gen_strides(const InputDimensions& dims, StrideDims& strides) { if (m_return_dim < 0) { return; // Won't be using the strides. } eigen_assert(m_return_dim < NumDims && "Asking to convert index to a dimension outside of the rank"); // Calculate m_stride_div and m_stride_mod, which are used to // calculate the value of an index w.r.t. the m_return_dim. if (Layout == static_cast(ColMajor)) { strides[0] = 1; for (int i = 1; i < NumDims; ++i) { strides[i] = strides[i-1] * dims[i-1]; } } else { strides[NumDims-1] = 1; for (int i = NumDims - 2; i >= 0; --i) { strides[i] = strides[i+1] * dims[i+1]; } } } protected: TensorEvaluator, Device> m_orig_impl; TensorEvaluator >, Device> m_impl; const Index m_return_dim; StrideDims m_strides; Index m_stride_mod; Index m_stride_div; }; } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_ARG_MAX_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorRandom.h0000644000176200001440000003014114562417221025601 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2016 Benoit Steiner // Copyright (C) 2018 Mehdi Goli Codeplay Software Ltd. // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_RANDOM_H #define EIGEN_CXX11_TENSOR_TENSOR_RANDOM_H namespace Eigen { namespace internal { namespace { EIGEN_DEVICE_FUNC uint64_t get_random_seed() { #if defined(EIGEN_GPU_COMPILE_PHASE) // We don't support 3d kernels since we currently only use 1 and // 2d kernels. gpu_assert(threadIdx.z == 0); return blockIdx.x * blockDim.x + threadIdx.x + gridDim.x * blockDim.x * (blockIdx.y * blockDim.y + threadIdx.y); #else // Rely on Eigen's random implementation. return random(); #endif } static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE unsigned PCG_XSH_RS_generator(uint64_t* state, uint64_t stream) { // TODO: Unify with the implementation in the non blocking thread pool. uint64_t current = *state; // Update the internal state *state = current * 6364136223846793005ULL + (stream << 1 | 1); // Generate the random output (using the PCG-XSH-RS scheme) return static_cast((current ^ (current >> 22)) >> (22 + (current >> 61))); } static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE uint64_t PCG_XSH_RS_state(uint64_t seed) { seed = seed ? seed : get_random_seed(); return seed * 6364136223846793005ULL + 0xda3e39cb94b95bdbULL; } } // namespace template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T RandomToTypeUniform(uint64_t* state, uint64_t stream) { unsigned rnd = PCG_XSH_RS_generator(state, stream); return static_cast(rnd); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Eigen::half RandomToTypeUniform(uint64_t* state, uint64_t stream) { // Generate 10 random bits for the mantissa, merge with exponent. unsigned rnd = PCG_XSH_RS_generator(state, stream); const uint16_t half_bits = static_cast(rnd & 0x3ffu) | (static_cast(15) << 10); Eigen::half result = Eigen::numext::bit_cast(half_bits); // Return the final result return result - Eigen::half(1.0f); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Eigen::bfloat16 RandomToTypeUniform(uint64_t* state, uint64_t stream) { // Generate 7 random bits for the mantissa, merge with exponent. unsigned rnd = PCG_XSH_RS_generator(state, stream); const uint16_t half_bits = static_cast(rnd & 0x7fu) | (static_cast(127) << 7); Eigen::bfloat16 result = Eigen::numext::bit_cast(half_bits); // Return the final result return result - Eigen::bfloat16(1.0f); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float RandomToTypeUniform(uint64_t* state, uint64_t stream) { typedef union { uint32_t raw; float fp; } internal; internal result; // Generate 23 random bits for the mantissa mantissa const unsigned rnd = PCG_XSH_RS_generator(state, stream); result.raw = rnd & 0x7fffffu; // Set the exponent result.raw |= (static_cast(127) << 23); // Return the final result return result.fp - 1.0f; } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double RandomToTypeUniform(uint64_t* state, uint64_t stream) { typedef union { uint64_t raw; double dp; } internal; internal result; result.raw = 0; // Generate 52 random bits for the mantissa // First generate the upper 20 bits unsigned rnd1 = PCG_XSH_RS_generator(state, stream) & 0xfffffu; // The generate the lower 32 bits unsigned rnd2 = PCG_XSH_RS_generator(state, stream); result.raw = (static_cast(rnd1) << 32) | rnd2; // Set the exponent result.raw |= (static_cast(1023) << 52); // Return the final result return result.dp - 1.0; } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::complex RandomToTypeUniform >(uint64_t* state, uint64_t stream) { return std::complex(RandomToTypeUniform(state, stream), RandomToTypeUniform(state, stream)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::complex RandomToTypeUniform >(uint64_t* state, uint64_t stream) { return std::complex(RandomToTypeUniform(state, stream), RandomToTypeUniform(state, stream)); } template class UniformRandomGenerator { public: static const bool PacketAccess = true; // Uses the given "seed" if non-zero, otherwise uses a random seed. EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE UniformRandomGenerator( uint64_t seed = 0) { m_state = PCG_XSH_RS_state(seed); #ifdef EIGEN_USE_SYCL // In SYCL it is not possible to build PCG_XSH_RS_state in one step. // Therefor, we need two step to initializate the m_state. // IN SYCL, the constructor of the functor is s called on the CPU // and we get the clock seed here from the CPU. However, This seed is //the same for all the thread. As unlike CUDA, the thread.ID, BlockID, etc is not a global function. // and only available on the Operator() function (which is called on the GPU). // Thus for CUDA (((CLOCK + global_thread_id)* 6364136223846793005ULL) + 0xda3e39cb94b95bdbULL) is passed to each thread // but for SYCL ((CLOCK * 6364136223846793005ULL) + 0xda3e39cb94b95bdbULL) is passed to each thread and each thread adds // the (global_thread_id* 6364136223846793005ULL) for itself only once, in order to complete the construction // similar to CUDA Therefore, the thread Id injection is not available at this stage. //However when the operator() is called the thread ID will be avilable. So inside the opeator, // we add the thrreadID, BlockId,... (which is equivalent of i) //to the seed and construct the unique m_state per thead similar to cuda. m_exec_once =false; #endif } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE UniformRandomGenerator( const UniformRandomGenerator& other) { m_state = other.m_state; #ifdef EIGEN_USE_SYCL m_exec_once =other.m_exec_once; #endif } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T operator()(Index i) const { #ifdef EIGEN_USE_SYCL if(!m_exec_once) { // This is the second stage of adding thread Id to the CPU clock seed and build unique seed per thread // The (i * 6364136223846793005ULL) is the remaining part of the PCG_XSH_RS_state on the GPU side m_state += (i * 6364136223846793005ULL); m_exec_once =true; } #endif T result = RandomToTypeUniform(&m_state, i); return result; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet packetOp(Index i) const { const int packetSize = internal::unpacket_traits::size; EIGEN_ALIGN_MAX T values[packetSize]; #ifdef EIGEN_USE_SYCL if(!m_exec_once) { // This is the second stage of adding thread Id to the CPU clock seed and build unique seed per thread m_state += (i * 6364136223846793005ULL); m_exec_once =true; } #endif EIGEN_UNROLL_LOOP for (int j = 0; j < packetSize; ++j) { values[j] = RandomToTypeUniform(&m_state, i); } return internal::pload(values); } private: mutable uint64_t m_state; #ifdef EIGEN_USE_SYCL mutable bool m_exec_once; #endif }; template struct functor_traits > { enum { // Rough estimate for floating point, multiplied by ceil(sizeof(T) / sizeof(float)). Cost = 12 * NumTraits::AddCost * ((sizeof(Scalar) + sizeof(float) - 1) / sizeof(float)), PacketAccess = UniformRandomGenerator::PacketAccess }; }; template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T RandomToTypeNormal(uint64_t* state, uint64_t stream) { // Use the ratio of uniform method to generate numbers following a normal // distribution. See for example Numerical Recipes chapter 7.3.9 for the // details. T u, v, q; do { u = RandomToTypeUniform(state, stream); v = T(1.7156) * (RandomToTypeUniform(state, stream) - T(0.5)); const T x = u - T(0.449871); const T y = numext::abs(v) + T(0.386595); q = x*x + y * (T(0.196)*y - T(0.25472)*x); } while (q > T(0.27597) && (q > T(0.27846) || v*v > T(-4) * numext::log(u) * u*u)); return v/u; } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::complex RandomToTypeNormal >(uint64_t* state, uint64_t stream) { return std::complex(RandomToTypeNormal(state, stream), RandomToTypeNormal(state, stream)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::complex RandomToTypeNormal >(uint64_t* state, uint64_t stream) { return std::complex(RandomToTypeNormal(state, stream), RandomToTypeNormal(state, stream)); } template class NormalRandomGenerator { public: static const bool PacketAccess = true; // Uses the given "seed" if non-zero, otherwise uses a random seed. EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE NormalRandomGenerator(uint64_t seed = 0) { m_state = PCG_XSH_RS_state(seed); #ifdef EIGEN_USE_SYCL // In SYCL it is not possible to build PCG_XSH_RS_state in one step. // Therefor, we need two steps to initializate the m_state. // IN SYCL, the constructor of the functor is s called on the CPU // and we get the clock seed here from the CPU. However, This seed is //the same for all the thread. As unlike CUDA, the thread.ID, BlockID, etc is not a global function. // and only available on the Operator() function (which is called on the GPU). // Therefore, the thread Id injection is not available at this stage. However when the operator() //is called the thread ID will be avilable. So inside the opeator, // we add the thrreadID, BlockId,... (which is equivalent of i) //to the seed and construct the unique m_state per thead similar to cuda. m_exec_once =false; #endif } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE NormalRandomGenerator( const NormalRandomGenerator& other) { m_state = other.m_state; #ifdef EIGEN_USE_SYCL m_exec_once=other.m_exec_once; #endif } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T operator()(Index i) const { #ifdef EIGEN_USE_SYCL if(!m_exec_once) { // This is the second stage of adding thread Id to the CPU clock seed and build unique seed per thread m_state += (i * 6364136223846793005ULL); m_exec_once =true; } #endif T result = RandomToTypeNormal(&m_state, i); return result; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet packetOp(Index i) const { const int packetSize = internal::unpacket_traits::size; EIGEN_ALIGN_MAX T values[packetSize]; #ifdef EIGEN_USE_SYCL if(!m_exec_once) { // This is the second stage of adding thread Id to the CPU clock seed and build unique seed per thread m_state += (i * 6364136223846793005ULL); m_exec_once =true; } #endif EIGEN_UNROLL_LOOP for (int j = 0; j < packetSize; ++j) { values[j] = RandomToTypeNormal(&m_state, i); } return internal::pload(values); } private: mutable uint64_t m_state; #ifdef EIGEN_USE_SYCL mutable bool m_exec_once; #endif }; template struct functor_traits > { enum { // On average, we need to generate about 3 random numbers // 15 mul, 8 add, 1.5 logs Cost = 3 * functor_traits >::Cost + 15 * NumTraits::AddCost + 8 * NumTraits::AddCost + 3 * functor_traits >::Cost / 2, PacketAccess = NormalRandomGenerator::PacketAccess }; }; } // end namespace internal } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_RANDOM_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorDeviceSycl.h0000644000176200001440000011665714562417221026434 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Mehdi Goli Codeplay Software Ltd. // Ralph Potter Codeplay Software Ltd. // Luke Iwanski Codeplay Software Ltd. // Contact: // Copyright (C) 2016 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #if defined(EIGEN_USE_SYCL) && !defined(EIGEN_CXX11_TENSOR_TENSOR_DEVICE_SYCL_H) #define EIGEN_CXX11_TENSOR_TENSOR_DEVICE_SYCL_H #include namespace Eigen { namespace TensorSycl { namespace internal { /// Cache all the device information needed struct SyclDeviceInfo { SyclDeviceInfo(cl::sycl::queue queue) : local_mem_type( queue.get_device() .template get_info()), max_work_item_sizes( queue.get_device() .template get_info< cl::sycl::info::device::max_work_item_sizes>()), max_mem_alloc_size( queue.get_device() .template get_info< cl::sycl::info::device::max_mem_alloc_size>()), max_compute_units(queue.get_device() .template get_info< cl::sycl::info::device::max_compute_units>()), max_work_group_size( queue.get_device() .template get_info< cl::sycl::info::device::max_work_group_size>()), local_mem_size( queue.get_device() .template get_info()), platform_name(queue.get_device() .get_platform() .template get_info()), device_name(queue.get_device() .template get_info()), device_vendor( queue.get_device() .template get_info()) {} cl::sycl::info::local_mem_type local_mem_type; cl::sycl::id<3> max_work_item_sizes; unsigned long max_mem_alloc_size; unsigned long max_compute_units; unsigned long max_work_group_size; size_t local_mem_size; std::string platform_name; std::string device_name; std::string device_vendor; }; } // end namespace internal } // end namespace TensorSycl typedef TensorSycl::internal::buffer_data_type_t buffer_scalar_t; // All devices (even AMD CPU with intel OpenCL runtime) that support OpenCL and // can consume SPIR or SPIRV can use the Eigen SYCL backend and consequently // TensorFlow via the Eigen SYCL Backend. EIGEN_STRONG_INLINE auto get_sycl_supported_devices() -> decltype(cl::sycl::device::get_devices()) { #ifdef EIGEN_SYCL_USE_DEFAULT_SELECTOR return {cl::sycl::device(cl::sycl::default_selector())}; #else std::vector supported_devices; auto platform_list = cl::sycl::platform::get_platforms(); for (const auto &platform : platform_list) { auto device_list = platform.get_devices(); auto platform_name = platform.template get_info(); std::transform(platform_name.begin(), platform_name.end(), platform_name.begin(), ::tolower); for (const auto &device : device_list) { auto vendor = device.template get_info(); std::transform(vendor.begin(), vendor.end(), vendor.begin(), ::tolower); bool unsupported_condition = (device.is_cpu() && platform_name.find("amd") != std::string::npos && vendor.find("apu") == std::string::npos) || (platform_name.find("experimental") != std::string::npos) || device.is_host(); if (!unsupported_condition) { supported_devices.push_back(device); } } } return supported_devices; #endif } class QueueInterface { public: /// Creating device by using cl::sycl::selector or cl::sycl::device. template explicit QueueInterface( const DeviceOrSelector &dev_or_sel, cl::sycl::async_handler handler, unsigned num_threads = std::thread::hardware_concurrency()) : m_queue(dev_or_sel, handler), #ifdef EIGEN_SYCL_USE_PROGRAM_CLASS m_prog(m_queue.get_context(), get_sycl_supported_devices()), #endif m_thread_pool(num_threads), m_device_info(m_queue) { #ifdef EIGEN_SYCL_USE_PROGRAM_CLASS m_prog.build_with_kernel_type(); auto f = [&](cl::sycl::handler &cgh) { cgh.single_task(m_prog.get_kernel(), [=]() {}) }; EIGEN_SYCL_TRY_CATCH(m_queue.submit(f)); #endif } template explicit QueueInterface( const DeviceOrSelector &dev_or_sel, unsigned num_threads = std::thread::hardware_concurrency()) : QueueInterface(dev_or_sel, [this](cl::sycl::exception_list l) { this->exception_caught_ = this->sycl_async_handler(l); }, num_threads) {} #ifdef EIGEN_SYCL_USE_PROGRAM_CLASS EIGEN_STRONG_INLINE cl::sycl::program &program() const { return m_prog; } #endif /// Attach an existing buffer to the pointer map, Eigen will not reuse it EIGEN_STRONG_INLINE void *attach_buffer( cl::sycl::buffer &buf) const { std::lock_guard lock(pmapper_mutex_); return static_cast(pMapper.add_pointer(buf)); } /// Detach previously attached buffer EIGEN_STRONG_INLINE void detach_buffer(void *p) const { std::lock_guard lock(pmapper_mutex_); TensorSycl::internal::SYCLfree(p, pMapper); } /// Allocating device pointer. This pointer is actually an 8 bytes host /// pointer used as key to access the sycl device buffer. The reason is that /// we cannot use device buffer as a pointer as a m_data in Eigen leafNode /// expressions. So we create a key pointer to be used in Eigen expression /// construction. When we convert the Eigen construction into the sycl /// construction we use this pointer as a key in our buffer_map and we make /// sure that we dedicate only one buffer only for this pointer. The device /// pointer would be deleted by calling deallocate function. EIGEN_STRONG_INLINE void *allocate(size_t num_bytes) const { #if EIGEN_MAX_ALIGN_BYTES > 0 size_t align = num_bytes % EIGEN_MAX_ALIGN_BYTES; if (align > 0) { num_bytes += EIGEN_MAX_ALIGN_BYTES - align; } #endif std::lock_guard lock(pmapper_mutex_); return TensorSycl::internal::SYCLmalloc(num_bytes, pMapper); } EIGEN_STRONG_INLINE void *allocate_temp(size_t num_bytes) const { #if EIGEN_MAX_ALIGN_BYTES > 0 size_t align = num_bytes % EIGEN_MAX_ALIGN_BYTES; if (align > 0) { num_bytes += EIGEN_MAX_ALIGN_BYTES - align; } #endif std::lock_guard lock(pmapper_mutex_); #ifndef EIGEN_SYCL_NO_REUSE_BUFFERS if (scratch_buffers.empty()) { return TensorSycl::internal::SYCLmalloc(num_bytes, pMapper); ; } else { for (auto it = scratch_buffers.begin(); it != scratch_buffers.end();) { auto buff = pMapper.get_buffer(*it); if (buff.get_size() >= num_bytes) { auto ptr = *it; scratch_buffers.erase(it); return ptr; } else { ++it; } } return TensorSycl::internal::SYCLmalloc(num_bytes, pMapper); } #else return TensorSycl::internal::SYCLmalloc(num_bytes, pMapper); #endif } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorSycl::internal::RangeAccess< cl::sycl::access::mode::read_write, data_t> get(data_t *data) const { return get_range_accessor(data); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE data_t *get( TensorSycl::internal::RangeAccess data) const { return static_cast(data.get_virtual_pointer()); } EIGEN_STRONG_INLINE void deallocate_temp(void *p) const { std::lock_guard lock(pmapper_mutex_); #ifndef EIGEN_SYCL_NO_REUSE_BUFFERS scratch_buffers.insert(p); #else TensorSycl::internal::SYCLfree(p, pMapper); #endif } template EIGEN_STRONG_INLINE void deallocate_temp( const TensorSycl::internal::RangeAccess &p) const { deallocate_temp(p.get_virtual_pointer()); } /// This is used to deallocate the device pointer. p is used as a key inside /// the map to find the device buffer and delete it. EIGEN_STRONG_INLINE void deallocate(void *p) const { std::lock_guard lock(pmapper_mutex_); TensorSycl::internal::SYCLfree(p, pMapper); } EIGEN_STRONG_INLINE void deallocate_all() const { std::lock_guard lock(pmapper_mutex_); TensorSycl::internal::SYCLfreeAll(pMapper); #ifndef EIGEN_SYCL_NO_REUSE_BUFFERS scratch_buffers.clear(); #endif } /// The memcpyHostToDevice is used to copy the data from host to device /// The destination pointer could be deleted before the copy happend which is /// why a callback function is needed. By default if none is provided, the /// function is blocking. EIGEN_STRONG_INLINE void memcpyHostToDevice( void *dst, const void *src, size_t n, std::function callback) const { static const auto write_mode = cl::sycl::access::mode::discard_write; static const auto global_access = cl::sycl::access::target::global_buffer; typedef cl::sycl::accessor write_accessor; if (n == 0) { if (callback) callback(); return; } n /= sizeof(buffer_scalar_t); auto f = [&](cl::sycl::handler &cgh) { write_accessor dst_acc = get_range_accessor(cgh, dst, n); buffer_scalar_t const *ptr = static_cast(src); auto non_deleter = [](buffer_scalar_t const *) {}; std::shared_ptr s_ptr(ptr, non_deleter); cgh.copy(s_ptr, dst_acc); }; cl::sycl::event e; EIGEN_SYCL_TRY_CATCH(e = m_queue.submit(f)); synchronize_and_callback(e, callback); } /// The memcpyDeviceToHost is used to copy the data from device to host. /// The source pointer could be deleted before the copy happend which is /// why a callback function is needed. By default if none is provided, the /// function is blocking. EIGEN_STRONG_INLINE void memcpyDeviceToHost( void *dst, const void *src, size_t n, std::function callback) const { static const auto read_mode = cl::sycl::access::mode::read; static const auto global_access = cl::sycl::access::target::global_buffer; typedef cl::sycl::accessor read_accessor; if (n == 0) { if (callback) callback(); return; } n /= sizeof(buffer_scalar_t); auto f = [&](cl::sycl::handler &cgh) { read_accessor src_acc = get_range_accessor(cgh, src, n); buffer_scalar_t *ptr = static_cast(dst); auto non_deleter = [](buffer_scalar_t *) {}; std::shared_ptr s_ptr(ptr, non_deleter); cgh.copy(src_acc, s_ptr); }; cl::sycl::event e; EIGEN_SYCL_TRY_CATCH(e = m_queue.submit(f)); synchronize_and_callback(e, callback); } /// The memcpy function. /// No callback is required here as both arguments are on the device /// and SYCL can handle the dependency. EIGEN_STRONG_INLINE void memcpy(void *dst, const void *src, size_t n) const { static const auto read_mode = cl::sycl::access::mode::read; static const auto write_mode = cl::sycl::access::mode::discard_write; if (n == 0) { return; } n /= sizeof(buffer_scalar_t); auto f = [&](cl::sycl::handler &cgh) { auto src_acc = get_range_accessor(cgh, src, n); auto dst_acc = get_range_accessor(cgh, dst, n); cgh.copy(src_acc, dst_acc); }; cl::sycl::event e; EIGEN_SYCL_TRY_CATCH(e = m_queue.submit(f)); async_synchronize(e); } /// the memset function. /// No callback is required here as both arguments are on the device /// and SYCL can handle the dependency. EIGEN_STRONG_INLINE void memset(void *data, int c, size_t n) const { static const auto write_mode = cl::sycl::access::mode::discard_write; if (n == 0) { return; } n /= sizeof(buffer_scalar_t); auto f = [&](cl::sycl::handler &cgh) { auto dst_acc = get_range_accessor(cgh, data, n); // The cast to uint8_t is here to match the behaviour of the standard // memset. The cast to buffer_scalar_t is needed to match the type of the // accessor (in case buffer_scalar_t is not uint8_t) cgh.fill(dst_acc, static_cast(static_cast(c))); }; cl::sycl::event e; EIGEN_SYCL_TRY_CATCH(e = m_queue.submit(f)); async_synchronize(e); } /// Get a range accessor to the virtual pointer's device memory. This range /// accessor will allow access to the memory from the pointer to the end of /// the buffer. /// /// NOTE: Inside a kernel the range accessor will always be indexed from the /// start of the buffer, so the offset in the accessor is only used by /// methods like handler::copy and will not be available inside a kernel. template EIGEN_STRONG_INLINE TensorSycl::internal::RangeAccess get_range_accessor(const void *ptr) const { static const auto global_access = cl::sycl::access::target::global_buffer; static const auto is_place_holder = cl::sycl::access::placeholder::true_t; typedef TensorSycl::internal::RangeAccess ret_type; typedef const TensorSycl::internal::buffer_data_type_t *internal_ptr_t; std::lock_guard lock(pmapper_mutex_); auto original_buffer = pMapper.get_buffer(ptr); const ptrdiff_t offset = pMapper.get_offset(ptr); const ptrdiff_t typed_offset = offset / sizeof(T); eigen_assert(typed_offset >= 0); const auto typed_size = original_buffer.get_size() / sizeof(T); auto buffer = original_buffer.template reinterpret< typename Eigen::internal::remove_const::type>( cl::sycl::range<1>(typed_size)); const ptrdiff_t size = buffer.get_count() - typed_offset; eigen_assert(size >= 0); typedef cl::sycl::accessor::type, 1, AcMd, global_access, is_place_holder> placeholder_accessor_t; const auto start_ptr = static_cast(ptr) - offset; return ret_type(placeholder_accessor_t(buffer, cl::sycl::range<1>(size), cl::sycl::id<1>(typed_offset)), static_cast(typed_offset), reinterpret_cast(start_ptr)); } /// Get a range accessor to the virtual pointer's device memory with a /// specified size. template EIGEN_STRONG_INLINE cl::sycl::accessor< buffer_scalar_t, 1, AcMd, cl::sycl::access::target::global_buffer> get_range_accessor(cl::sycl::handler &cgh, const void *ptr, const Index n_bytes) const { static const auto global_access = cl::sycl::access::target::global_buffer; eigen_assert(n_bytes >= 0); std::lock_guard lock(pmapper_mutex_); auto buffer = pMapper.get_buffer(ptr); const ptrdiff_t offset = pMapper.get_offset(ptr); eigen_assert(offset >= 0); eigen_assert(offset + n_bytes <= buffer.get_size()); return buffer.template get_access( cgh, cl::sycl::range<1>(n_bytes), cl::sycl::id<1>(offset)); } /// Creation of sycl accessor for a buffer. This function first tries to find /// the buffer in the buffer_map. If found it gets the accessor from it, if /// not, the function then adds an entry by creating a sycl buffer for that /// particular pointer. template EIGEN_STRONG_INLINE cl::sycl::accessor< buffer_scalar_t, 1, AcMd, cl::sycl::access::target::global_buffer> get_sycl_accessor(cl::sycl::handler &cgh, const void *ptr) const { std::lock_guard lock(pmapper_mutex_); return pMapper.get_buffer(ptr) .template get_access( cgh); } EIGEN_STRONG_INLINE cl::sycl::buffer get_sycl_buffer( const void *ptr) const { std::lock_guard lock(pmapper_mutex_); return pMapper.get_buffer(ptr); } EIGEN_STRONG_INLINE ptrdiff_t get_offset(const void *ptr) const { std::lock_guard lock(pmapper_mutex_); return pMapper.get_offset(ptr); } template EIGEN_ALWAYS_INLINE void binary_kernel_launcher(const Lhs &lhs, const Rhs &rhs, OutPtr outptr, Range thread_range, Index scratchSize, T... var) const { auto kernel_functor = [=](cl::sycl::handler &cgh) { // binding the placeholder accessors to a commandgroup handler lhs.bind(cgh); rhs.bind(cgh); outptr.bind(cgh); typedef cl::sycl::accessor LocalAccessor; LocalAccessor scratch(cl::sycl::range<1>(scratchSize), cgh); cgh.parallel_for( #ifdef EIGEN_SYCL_USE_PROGRAM_CLASS program().template get_kernel(), #endif thread_range, sycl_kernel(scratch, lhs, rhs, outptr, var...)); }; cl::sycl::event e; EIGEN_SYCL_TRY_CATCH(e = m_queue.submit(kernel_functor)); async_synchronize(e); } template EIGEN_ALWAYS_INLINE void unary_kernel_launcher(const InPtr &inptr, OutPtr &outptr, Range thread_range, Index scratchSize, T... var) const { auto kernel_functor = [=](cl::sycl::handler &cgh) { // binding the placeholder accessors to a commandgroup handler inptr.bind(cgh); outptr.bind(cgh); typedef cl::sycl::accessor LocalAccessor; LocalAccessor scratch(cl::sycl::range<1>(scratchSize), cgh); cgh.parallel_for( #ifdef EIGEN_SYCL_USE_PROGRAM_CLASS program().template get_kernel(), #endif thread_range, sycl_kernel(scratch, inptr, outptr, var...)); }; cl::sycl::event e; EIGEN_SYCL_TRY_CATCH(e = m_queue.submit(kernel_functor)); async_synchronize(e); } template EIGEN_ALWAYS_INLINE void nullary_kernel_launcher(const InPtr &inptr, Range thread_range, Index scratchSize, T... var) const { auto kernel_functor = [=](cl::sycl::handler &cgh) { // binding the placeholder accessors to a commandgroup handler inptr.bind(cgh); typedef cl::sycl::accessor LocalAccessor; LocalAccessor scratch(cl::sycl::range<1>(scratchSize), cgh); cgh.parallel_for( #ifdef EIGEN_SYCL_USE_PROGRAM_CLASS program().template get_kernel(), #endif thread_range, sycl_kernel(scratch, inptr, var...)); }; cl::sycl::event e; EIGEN_SYCL_TRY_CATCH(e = m_queue.submit(kernel_functor)); async_synchronize(e); } EIGEN_STRONG_INLINE void synchronize() const { #ifdef EIGEN_EXCEPTIONS m_queue.wait_and_throw(); #else m_queue.wait(); #endif } EIGEN_STRONG_INLINE void async_synchronize(cl::sycl::event e) const { set_latest_event(e); #ifndef EIGEN_SYCL_ASYNC_EXECUTION synchronize(); #endif } template EIGEN_STRONG_INLINE void parallel_for_setup(Index n, Index &tileSize, Index &rng, Index &GRange) const { tileSize = static_cast(getNearestPowerOfTwoWorkGroupSize()); tileSize = std::min(static_cast(EIGEN_SYCL_LOCAL_THREAD_DIM0 * EIGEN_SYCL_LOCAL_THREAD_DIM1), static_cast(tileSize)); rng = n; if (rng == 0) rng = static_cast(1); GRange = rng; if (tileSize > GRange) tileSize = GRange; else if (GRange > tileSize) { Index xMode = static_cast(GRange % tileSize); if (xMode != 0) GRange += static_cast(tileSize - xMode); } } /// This is used to prepare the number of threads and also the number of /// threads per block for sycl kernels template EIGEN_STRONG_INLINE void parallel_for_setup( const std::array &input_dim, cl::sycl::range<2> &global_range, cl::sycl::range<2> &local_range) const { std::array input_range = input_dim; Index max_workgroup_Size = static_cast(getNearestPowerOfTwoWorkGroupSize()); max_workgroup_Size = std::min(static_cast(EIGEN_SYCL_LOCAL_THREAD_DIM0 * EIGEN_SYCL_LOCAL_THREAD_DIM1), static_cast(max_workgroup_Size)); Index pow_of_2 = static_cast(std::log2(max_workgroup_Size)); local_range[1] = static_cast(std::pow(2, static_cast(pow_of_2 / 2))); input_range[1] = input_dim[1]; if (input_range[1] == 0) input_range[1] = static_cast(1); global_range[1] = input_range[1]; if (local_range[1] > global_range[1]) local_range[1] = global_range[1]; else if (global_range[1] > local_range[1]) { Index xMode = static_cast(global_range[1] % local_range[1]); if (xMode != 0) global_range[1] += static_cast(local_range[1] - xMode); } local_range[0] = static_cast(max_workgroup_Size / local_range[1]); input_range[0] = input_dim[0]; if (input_range[0] == 0) input_range[0] = static_cast(1); global_range[0] = input_range[0]; if (local_range[0] > global_range[0]) local_range[0] = global_range[0]; else if (global_range[0] > local_range[0]) { Index xMode = static_cast(global_range[0] % local_range[0]); if (xMode != 0) global_range[0] += static_cast(local_range[0] - xMode); } } /// This is used to prepare the number of threads and also the number of /// threads per block for sycl kernels template EIGEN_STRONG_INLINE void parallel_for_setup( const std::array &input_dim, cl::sycl::range<3> &global_range, cl::sycl::range<3> &local_range) const { std::array input_range = input_dim; Index max_workgroup_Size = static_cast(getNearestPowerOfTwoWorkGroupSize()); max_workgroup_Size = std::min(static_cast(EIGEN_SYCL_LOCAL_THREAD_DIM0 * EIGEN_SYCL_LOCAL_THREAD_DIM1), static_cast(max_workgroup_Size)); Index pow_of_2 = static_cast(std::log2(max_workgroup_Size)); local_range[2] = static_cast(std::pow(2, static_cast(pow_of_2 / 3))); input_range[2] = input_dim[2]; if (input_range[2] == 0) input_range[1] = static_cast(1); global_range[2] = input_range[2]; if (local_range[2] > global_range[2]) local_range[2] = global_range[2]; else if (global_range[2] > local_range[2]) { Index xMode = static_cast(global_range[2] % local_range[2]); if (xMode != 0) global_range[2] += static_cast(local_range[2] - xMode); } pow_of_2 = static_cast( std::log2(static_cast(max_workgroup_Size / local_range[2]))); local_range[1] = static_cast(std::pow(2, static_cast(pow_of_2 / 2))); input_range[1] = input_dim[1]; if (input_range[1] == 0) input_range[1] = static_cast(1); global_range[1] = input_range[1]; if (local_range[1] > global_range[1]) local_range[1] = global_range[1]; else if (global_range[1] > local_range[1]) { Index xMode = static_cast(global_range[1] % local_range[1]); if (xMode != 0) global_range[1] += static_cast(local_range[1] - xMode); } local_range[0] = static_cast(max_workgroup_Size / (local_range[1] * local_range[2])); input_range[0] = input_dim[0]; if (input_range[0] == 0) input_range[0] = static_cast(1); global_range[0] = input_range[0]; if (local_range[0] > global_range[0]) local_range[0] = global_range[0]; else if (global_range[0] > local_range[0]) { Index xMode = static_cast(global_range[0] % local_range[0]); if (xMode != 0) global_range[0] += static_cast(local_range[0] - xMode); } } EIGEN_STRONG_INLINE bool has_local_memory() const { #if !defined(EIGEN_SYCL_LOCAL_MEM) && defined(EIGEN_SYCL_NO_LOCAL_MEM) return false; #elif defined(EIGEN_SYCL_LOCAL_MEM) && !defined(EIGEN_SYCL_NO_LOCAL_MEM) return true; #else return m_device_info.local_mem_type == cl::sycl::info::local_mem_type::local; #endif } EIGEN_STRONG_INLINE unsigned long max_buffer_size() const { return m_device_info.max_mem_alloc_size; } EIGEN_STRONG_INLINE unsigned long getNumSyclMultiProcessors() const { return m_device_info.max_compute_units; } EIGEN_STRONG_INLINE unsigned long maxSyclThreadsPerBlock() const { return m_device_info.max_work_group_size; } EIGEN_STRONG_INLINE cl::sycl::id<3> maxWorkItemSizes() const { return m_device_info.max_work_item_sizes; } /// No need for sycl it should act the same as CPU version EIGEN_STRONG_INLINE int majorDeviceVersion() const { return 1; } EIGEN_STRONG_INLINE unsigned long maxSyclThreadsPerMultiProcessor() const { // OpenCL doesnot have such concept return 2; } EIGEN_STRONG_INLINE size_t sharedMemPerBlock() const { return m_device_info.local_mem_size; } // This function returns the nearest power of 2 Work-group size which is <= // maximum device workgroup size. EIGEN_STRONG_INLINE size_t getNearestPowerOfTwoWorkGroupSize() const { return getPowerOfTwo(m_device_info.max_work_group_size, false); } EIGEN_STRONG_INLINE std::string getPlatformName() const { return m_device_info.platform_name; } EIGEN_STRONG_INLINE std::string getDeviceName() const { return m_device_info.device_name; } EIGEN_STRONG_INLINE std::string getDeviceVendor() const { return m_device_info.device_vendor; } // This function returns the nearest power of 2 // if roundup is true returns result>=wgsize // else it return result <= wgsize EIGEN_STRONG_INLINE size_t getPowerOfTwo(size_t wGSize, bool roundUp) const { if (roundUp) --wGSize; wGSize |= (wGSize >> 1); wGSize |= (wGSize >> 2); wGSize |= (wGSize >> 4); wGSize |= (wGSize >> 8); wGSize |= (wGSize >> 16); #if EIGEN_ARCH_x86_64 || EIGEN_ARCH_ARM64 || EIGEN_OS_WIN64 wGSize |= (wGSize >> 32); #endif return ((!roundUp) ? (wGSize - (wGSize >> 1)) : ++wGSize); } EIGEN_STRONG_INLINE cl::sycl::queue &sycl_queue() const { return m_queue; } // This function checks if the runtime recorded an error for the // underlying stream device. EIGEN_STRONG_INLINE bool ok() const { if (!exception_caught_) { synchronize(); } return !exception_caught_; } EIGEN_STRONG_INLINE cl::sycl::event get_latest_event() const { #ifdef EIGEN_SYCL_STORE_LATEST_EVENT std::lock_guard lock(event_mutex_); return latest_events_[std::this_thread::get_id()]; #else eigen_assert(false); return cl::sycl::event(); #endif } // destructor ~QueueInterface() { pMapper.clear(); #ifndef EIGEN_SYCL_NO_REUSE_BUFFERS scratch_buffers.clear(); #endif } protected: EIGEN_STRONG_INLINE void set_latest_event(cl::sycl::event e) const { #ifdef EIGEN_SYCL_STORE_LATEST_EVENT std::lock_guard lock(event_mutex_); latest_events_[std::this_thread::get_id()] = e; #else EIGEN_UNUSED_VARIABLE(e); #endif } void synchronize_and_callback(cl::sycl::event e, const std::function &callback) const { set_latest_event(e); if (callback) { auto callback_ = [=]() { #ifdef EIGEN_EXCEPTIONS cl::sycl::event(e).wait_and_throw(); #else cl::sycl::event(e).wait(); #endif callback(); }; m_thread_pool.Schedule(std::move(callback_)); } else { #ifdef EIGEN_EXCEPTIONS m_queue.wait_and_throw(); #else m_queue.wait(); #endif } } bool sycl_async_handler(cl::sycl::exception_list exceptions) const { bool exception_caught = false; for (const auto &e : exceptions) { if (e) { exception_caught = true; EIGEN_THROW_X(e); } } return exception_caught; } /// class members: bool exception_caught_ = false; mutable std::mutex pmapper_mutex_; #ifdef EIGEN_SYCL_STORE_LATEST_EVENT mutable std::mutex event_mutex_; mutable std::unordered_map latest_events_; #endif /// std::map is the container used to make sure that we create only one buffer /// per pointer. The lifespan of the buffer now depends on the lifespan of /// SyclDevice. If a non-read-only pointer is needed to be accessed on the /// host we should manually deallocate it. mutable TensorSycl::internal::PointerMapper pMapper; #ifndef EIGEN_SYCL_NO_REUSE_BUFFERS mutable std::unordered_set scratch_buffers; #endif /// sycl queue mutable cl::sycl::queue m_queue; #ifdef EIGEN_SYCL_USE_PROGRAM_CLASS mutable cl::sycl::program m_prog; #endif /// The thread pool is used to wait on events and call callbacks /// asynchronously mutable Eigen::ThreadPool m_thread_pool; const TensorSycl::internal::SyclDeviceInfo m_device_info; }; struct SyclDeviceBase { /// QueueInterface is not owned. it is the caller's responsibility to destroy /// it const QueueInterface *m_queue_stream; explicit SyclDeviceBase(const QueueInterface *queue_stream) : m_queue_stream(queue_stream) {} EIGEN_STRONG_INLINE const QueueInterface *queue_stream() const { return m_queue_stream; } }; // Here is a sycl device struct which accept the sycl queue interface // as an input struct SyclDevice : public SyclDeviceBase { explicit SyclDevice(const QueueInterface *queue_stream) : SyclDeviceBase(queue_stream) {} // this is the accessor used to construct the evaluator template EIGEN_STRONG_INLINE TensorSycl::internal::RangeAccess get_range_accessor(const void *ptr) const { return queue_stream()->template get_range_accessor(ptr); } // get sycl accessor template EIGEN_STRONG_INLINE cl::sycl::accessor< buffer_scalar_t, 1, AcMd, cl::sycl::access::target::global_buffer> get_sycl_accessor(cl::sycl::handler &cgh, const void *ptr) const { return queue_stream()->template get_sycl_accessor(cgh, ptr); } /// Accessing the created sycl device buffer for the device pointer EIGEN_STRONG_INLINE cl::sycl::buffer get_sycl_buffer( const void *ptr) const { return queue_stream()->get_sycl_buffer(ptr); } /// This is used to prepare the number of threads and also the number of /// threads per block for sycl kernels template EIGEN_STRONG_INLINE void parallel_for_setup(Index n, Index &tileSize, Index &rng, Index &GRange) const { queue_stream()->parallel_for_setup(n, tileSize, rng, GRange); } /// This is used to prepare the number of threads and also the number of /// threads per block for sycl kernels template EIGEN_STRONG_INLINE void parallel_for_setup( const std::array &input_dim, cl::sycl::range<2> &global_range, cl::sycl::range<2> &local_range) const { queue_stream()->parallel_for_setup(input_dim, global_range, local_range); } /// This is used to prepare the number of threads and also the number of /// threads per block for sycl kernels template EIGEN_STRONG_INLINE void parallel_for_setup( const std::array &input_dim, cl::sycl::range<3> &global_range, cl::sycl::range<3> &local_range) const { queue_stream()->parallel_for_setup(input_dim, global_range, local_range); } /// allocate device memory EIGEN_STRONG_INLINE void *allocate(size_t num_bytes) const { return queue_stream()->allocate(num_bytes); } EIGEN_STRONG_INLINE void *allocate_temp(size_t num_bytes) const { return queue_stream()->allocate_temp(num_bytes); } /// deallocate device memory EIGEN_STRONG_INLINE void deallocate(void *p) const { queue_stream()->deallocate(p); } EIGEN_STRONG_INLINE void deallocate_temp(void *buffer) const { queue_stream()->deallocate_temp(buffer); } template EIGEN_STRONG_INLINE void deallocate_temp( const TensorSycl::internal::RangeAccess &buffer) const { queue_stream()->deallocate_temp(buffer); } EIGEN_STRONG_INLINE void deallocate_all() const { queue_stream()->deallocate_all(); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorSycl::internal::RangeAccess< cl::sycl::access::mode::read_write, data_t> get(data_t *data) const { return queue_stream()->get(data); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE data_t *get( TensorSycl::internal::RangeAccess data) const { return queue_stream()->get(data); } /// attach existing buffer EIGEN_STRONG_INLINE void *attach_buffer( cl::sycl::buffer &buf) const { return queue_stream()->attach_buffer(buf); } /// detach buffer EIGEN_STRONG_INLINE void detach_buffer(void *p) const { queue_stream()->detach_buffer(p); } EIGEN_STRONG_INLINE ptrdiff_t get_offset(const void *ptr) const { return queue_stream()->get_offset(ptr); } // some runtime conditions that can be applied here EIGEN_STRONG_INLINE bool isDeviceSuitable() const { return true; } /// memcpyHostToDevice template EIGEN_STRONG_INLINE void memcpyHostToDevice( Index *dst, const Index *src, size_t n, std::function callback = {}) const { queue_stream()->memcpyHostToDevice(dst, src, n, callback); } /// memcpyDeviceToHost template EIGEN_STRONG_INLINE void memcpyDeviceToHost( void *dst, const Index *src, size_t n, std::function callback = {}) const { queue_stream()->memcpyDeviceToHost(dst, src, n, callback); } /// the memcpy function template EIGEN_STRONG_INLINE void memcpy(void *dst, const Index *src, size_t n) const { queue_stream()->memcpy(dst, src, n); } /// the memset function EIGEN_STRONG_INLINE void memset(void *data, int c, size_t n) const { queue_stream()->memset(data, c, n); } /// returning the sycl queue EIGEN_STRONG_INLINE cl::sycl::queue &sycl_queue() const { return queue_stream()->sycl_queue(); } #ifdef EIGEN_SYCL_USE_PROGRAM_CLASS EIGEN_STRONG_INLINE cl::sycl::program &program() const { return queue_stream()->program(); } #endif EIGEN_STRONG_INLINE size_t firstLevelCacheSize() const { return 48 * 1024; } EIGEN_STRONG_INLINE size_t lastLevelCacheSize() const { // We won't try to take advantage of the l2 cache for the time being, and // there is no l3 cache on sycl devices. return firstLevelCacheSize(); } EIGEN_STRONG_INLINE unsigned long getNumSyclMultiProcessors() const { return queue_stream()->getNumSyclMultiProcessors(); } EIGEN_STRONG_INLINE unsigned long maxSyclThreadsPerBlock() const { return queue_stream()->maxSyclThreadsPerBlock(); } EIGEN_STRONG_INLINE cl::sycl::id<3> maxWorkItemSizes() const { return queue_stream()->maxWorkItemSizes(); } EIGEN_STRONG_INLINE unsigned long maxSyclThreadsPerMultiProcessor() const { // OpenCL doesnot have such concept return queue_stream()->maxSyclThreadsPerMultiProcessor(); } EIGEN_STRONG_INLINE size_t sharedMemPerBlock() const { return queue_stream()->sharedMemPerBlock(); } EIGEN_STRONG_INLINE size_t getNearestPowerOfTwoWorkGroupSize() const { return queue_stream()->getNearestPowerOfTwoWorkGroupSize(); } EIGEN_STRONG_INLINE size_t getPowerOfTwo(size_t val, bool roundUp) const { return queue_stream()->getPowerOfTwo(val, roundUp); } /// No need for sycl it should act the same as CPU version EIGEN_STRONG_INLINE int majorDeviceVersion() const { return queue_stream()->majorDeviceVersion(); } EIGEN_STRONG_INLINE void synchronize() const { queue_stream()->synchronize(); } EIGEN_STRONG_INLINE void async_synchronize( cl::sycl::event e = cl::sycl::event()) const { queue_stream()->async_synchronize(e); } EIGEN_STRONG_INLINE cl::sycl::event get_latest_event() const { return queue_stream()->get_latest_event(); } // This function checks if the runtime recorded an error for the // underlying stream device. EIGEN_STRONG_INLINE bool ok() const { return queue_stream()->ok(); } EIGEN_STRONG_INLINE bool has_local_memory() const { return queue_stream()->has_local_memory(); } EIGEN_STRONG_INLINE long max_buffer_size() const { return queue_stream()->max_buffer_size(); } EIGEN_STRONG_INLINE std::string getPlatformName() const { return queue_stream()->getPlatformName(); } EIGEN_STRONG_INLINE std::string getDeviceName() const { return queue_stream()->getDeviceName(); } EIGEN_STRONG_INLINE std::string getDeviceVendor() const { return queue_stream()->getDeviceVendor(); } template EIGEN_ALWAYS_INLINE void binary_kernel_launcher(T... var) const { queue_stream()->template binary_kernel_launcher( var...); } template EIGEN_ALWAYS_INLINE void unary_kernel_launcher(T... var) const { queue_stream()->template unary_kernel_launcher( var...); } template EIGEN_ALWAYS_INLINE void nullary_kernel_launcher(T... var) const { queue_stream()->template nullary_kernel_launcher( var...); } }; } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_DEVICE_SYCL_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorImagePatch.h0000644000176200001440000006664214562417221026402 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_IMAGE_PATCH_H #define EIGEN_CXX11_TENSOR_TENSOR_IMAGE_PATCH_H namespace Eigen { /** \class TensorImagePatch * \ingroup CXX11_Tensor_Module * * \brief Patch extraction specialized for image processing. * This assumes that the input has a least 3 dimensions ordered as follow: * 1st dimension: channels (of size d) * 2nd dimension: rows (of size r) * 3rd dimension: columns (of size c) * There can be additional dimensions such as time (for video) or batch (for * bulk processing after the first 3. * Calling the image patch code with patch_rows and patch_cols is equivalent * to calling the regular patch extraction code with parameters d, patch_rows, * patch_cols, and 1 for all the additional dimensions. */ namespace internal { template struct traits > : public traits { typedef typename internal::remove_const::type Scalar; typedef traits XprTraits; typedef typename XprTraits::StorageKind StorageKind; typedef typename XprTraits::Index Index; typedef typename XprType::Nested Nested; typedef typename remove_reference::type _Nested; static const int NumDimensions = XprTraits::NumDimensions + 1; static const int Layout = XprTraits::Layout; typedef typename XprTraits::PointerType PointerType; }; template struct eval, Eigen::Dense> { typedef const TensorImagePatchOp& type; }; template struct nested, 1, typename eval >::type> { typedef TensorImagePatchOp type; }; template struct ImagePatchCopyOp { typedef typename Self::Index Index; typedef typename Self::Scalar Scalar; typedef typename Self::Impl Impl; static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void Run( const Self& self, const Index num_coeff_to_copy, const Index dst_index, Scalar* dst_data, const Index src_index) { const Impl& impl = self.impl(); for (Index i = 0; i < num_coeff_to_copy; ++i) { dst_data[dst_index + i] = impl.coeff(src_index + i); } } }; template struct ImagePatchCopyOp { typedef typename Self::Index Index; typedef typename Self::Scalar Scalar; typedef typename Self::Impl Impl; typedef typename packet_traits::type Packet; static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void Run( const Self& self, const Index num_coeff_to_copy, const Index dst_index, Scalar* dst_data, const Index src_index) { const Impl& impl = self.impl(); const Index packet_size = internal::unpacket_traits::size; const Index vectorized_size = (num_coeff_to_copy / packet_size) * packet_size; for (Index i = 0; i < vectorized_size; i += packet_size) { Packet p = impl.template packet(src_index + i); internal::pstoret(dst_data + dst_index + i, p); } for (Index i = vectorized_size; i < num_coeff_to_copy; ++i) { dst_data[dst_index + i] = impl.coeff(src_index + i); } } }; template struct ImagePatchPaddingOp { typedef typename Self::Index Index; typedef typename Self::Scalar Scalar; typedef typename packet_traits::type Packet; static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void Run( const Index num_coeff_to_pad, const Scalar padding_value, const Index dst_index, Scalar* dst_data) { const Index packet_size = internal::unpacket_traits::size; const Packet padded_packet = internal::pset1(padding_value); const Index vectorized_size = (num_coeff_to_pad / packet_size) * packet_size; for (Index i = 0; i < vectorized_size; i += packet_size) { internal::pstoret(dst_data + dst_index + i, padded_packet); } for (Index i = vectorized_size; i < num_coeff_to_pad; ++i) { dst_data[dst_index + i] = padding_value; } } }; } // end namespace internal template class TensorImagePatchOp : public TensorBase, ReadOnlyAccessors> { public: typedef typename Eigen::internal::traits::Scalar Scalar; typedef typename Eigen::NumTraits::Real RealScalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename Eigen::internal::nested::type Nested; typedef typename Eigen::internal::traits::StorageKind StorageKind; typedef typename Eigen::internal::traits::Index Index; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorImagePatchOp(const XprType& expr, DenseIndex patch_rows, DenseIndex patch_cols, DenseIndex row_strides, DenseIndex col_strides, DenseIndex in_row_strides, DenseIndex in_col_strides, DenseIndex row_inflate_strides, DenseIndex col_inflate_strides, PaddingType padding_type, Scalar padding_value) : m_xpr(expr), m_patch_rows(patch_rows), m_patch_cols(patch_cols), m_row_strides(row_strides), m_col_strides(col_strides), m_in_row_strides(in_row_strides), m_in_col_strides(in_col_strides), m_row_inflate_strides(row_inflate_strides), m_col_inflate_strides(col_inflate_strides), m_padding_explicit(false), m_padding_top(0), m_padding_bottom(0), m_padding_left(0), m_padding_right(0), m_padding_type(padding_type), m_padding_value(padding_value) {} EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorImagePatchOp(const XprType& expr, DenseIndex patch_rows, DenseIndex patch_cols, DenseIndex row_strides, DenseIndex col_strides, DenseIndex in_row_strides, DenseIndex in_col_strides, DenseIndex row_inflate_strides, DenseIndex col_inflate_strides, DenseIndex padding_top, DenseIndex padding_bottom, DenseIndex padding_left, DenseIndex padding_right, Scalar padding_value) : m_xpr(expr), m_patch_rows(patch_rows), m_patch_cols(patch_cols), m_row_strides(row_strides), m_col_strides(col_strides), m_in_row_strides(in_row_strides), m_in_col_strides(in_col_strides), m_row_inflate_strides(row_inflate_strides), m_col_inflate_strides(col_inflate_strides), m_padding_explicit(true), m_padding_top(padding_top), m_padding_bottom(padding_bottom), m_padding_left(padding_left), m_padding_right(padding_right), m_padding_type(PADDING_VALID), m_padding_value(padding_value) {} EIGEN_DEVICE_FUNC DenseIndex patch_rows() const { return m_patch_rows; } EIGEN_DEVICE_FUNC DenseIndex patch_cols() const { return m_patch_cols; } EIGEN_DEVICE_FUNC DenseIndex row_strides() const { return m_row_strides; } EIGEN_DEVICE_FUNC DenseIndex col_strides() const { return m_col_strides; } EIGEN_DEVICE_FUNC DenseIndex in_row_strides() const { return m_in_row_strides; } EIGEN_DEVICE_FUNC DenseIndex in_col_strides() const { return m_in_col_strides; } EIGEN_DEVICE_FUNC DenseIndex row_inflate_strides() const { return m_row_inflate_strides; } EIGEN_DEVICE_FUNC DenseIndex col_inflate_strides() const { return m_col_inflate_strides; } EIGEN_DEVICE_FUNC bool padding_explicit() const { return m_padding_explicit; } EIGEN_DEVICE_FUNC DenseIndex padding_top() const { return m_padding_top; } EIGEN_DEVICE_FUNC DenseIndex padding_bottom() const { return m_padding_bottom; } EIGEN_DEVICE_FUNC DenseIndex padding_left() const { return m_padding_left; } EIGEN_DEVICE_FUNC DenseIndex padding_right() const { return m_padding_right; } EIGEN_DEVICE_FUNC PaddingType padding_type() const { return m_padding_type; } EIGEN_DEVICE_FUNC Scalar padding_value() const { return m_padding_value; } EIGEN_DEVICE_FUNC const typename internal::remove_all::type& expression() const { return m_xpr; } protected: typename XprType::Nested m_xpr; const DenseIndex m_patch_rows; const DenseIndex m_patch_cols; const DenseIndex m_row_strides; const DenseIndex m_col_strides; const DenseIndex m_in_row_strides; const DenseIndex m_in_col_strides; const DenseIndex m_row_inflate_strides; const DenseIndex m_col_inflate_strides; const bool m_padding_explicit; const DenseIndex m_padding_top; const DenseIndex m_padding_bottom; const DenseIndex m_padding_left; const DenseIndex m_padding_right; const PaddingType m_padding_type; const Scalar m_padding_value; }; // Eval as rvalue template struct TensorEvaluator, Device> { typedef TensorImagePatchOp XprType; typedef typename XprType::Index Index; static const int NumInputDims = internal::array_size::Dimensions>::value; static const int NumDims = NumInputDims + 1; typedef DSizes Dimensions; typedef typename internal::remove_const::type Scalar; typedef TensorEvaluator, Device> Self; typedef TensorEvaluator Impl; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename PacketType::type PacketReturnType; static const int PacketSize = PacketType::size; typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; enum { IsAligned = false, PacketAccess = TensorEvaluator::PacketAccess, BlockAccess = false, PreferBlockAccess = true, Layout = TensorEvaluator::Layout, CoordAccess = false, RawAccess = false }; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockNotImplemented TensorBlock; //===--------------------------------------------------------------------===// EIGEN_STRONG_INLINE TensorEvaluator( const XprType& op, const Device& device) : m_device(device), m_impl(op.expression(), device) { EIGEN_STATIC_ASSERT((NumDims >= 4), YOU_MADE_A_PROGRAMMING_MISTAKE); m_paddingValue = op.padding_value(); const typename TensorEvaluator::Dimensions& input_dims = m_impl.dimensions(); // Caches a few variables. if (static_cast(Layout) == static_cast(ColMajor)) { m_inputDepth = input_dims[0]; m_inputRows = input_dims[1]; m_inputCols = input_dims[2]; } else { m_inputDepth = input_dims[NumInputDims-1]; m_inputRows = input_dims[NumInputDims-2]; m_inputCols = input_dims[NumInputDims-3]; } m_row_strides = op.row_strides(); m_col_strides = op.col_strides(); // Input strides and effective input/patch size m_in_row_strides = op.in_row_strides(); m_in_col_strides = op.in_col_strides(); m_row_inflate_strides = op.row_inflate_strides(); m_col_inflate_strides = op.col_inflate_strides(); // The "effective" input rows and input cols are the input rows and cols // after inflating them with zeros. // For examples, a 2x3 matrix with row_inflate_strides and // col_inflate_strides of 2 comes from: // A B C // D E F // // to a matrix is 3 x 5: // // A . B . C // . . . . . // D . E . F m_input_rows_eff = (m_inputRows - 1) * m_row_inflate_strides + 1; m_input_cols_eff = (m_inputCols - 1) * m_col_inflate_strides + 1; m_patch_rows_eff = op.patch_rows() + (op.patch_rows() - 1) * (m_in_row_strides - 1); m_patch_cols_eff = op.patch_cols() + (op.patch_cols() - 1) * (m_in_col_strides - 1); if (op.padding_explicit()) { m_outputRows = numext::ceil((m_input_rows_eff + op.padding_top() + op.padding_bottom() - m_patch_rows_eff + 1.f) / static_cast(m_row_strides)); m_outputCols = numext::ceil((m_input_cols_eff + op.padding_left() + op.padding_right() - m_patch_cols_eff + 1.f) / static_cast(m_col_strides)); m_rowPaddingTop = op.padding_top(); m_colPaddingLeft = op.padding_left(); } else { // Computing padding from the type switch (op.padding_type()) { case PADDING_VALID: m_outputRows = numext::ceil((m_input_rows_eff - m_patch_rows_eff + 1.f) / static_cast(m_row_strides)); m_outputCols = numext::ceil((m_input_cols_eff - m_patch_cols_eff + 1.f) / static_cast(m_col_strides)); // Calculate the padding m_rowPaddingTop = numext::maxi(0, ((m_outputRows - 1) * m_row_strides + m_patch_rows_eff - m_input_rows_eff) / 2); m_colPaddingLeft = numext::maxi(0, ((m_outputCols - 1) * m_col_strides + m_patch_cols_eff - m_input_cols_eff) / 2); break; case PADDING_SAME: m_outputRows = numext::ceil(m_input_rows_eff / static_cast(m_row_strides)); m_outputCols = numext::ceil(m_input_cols_eff / static_cast(m_col_strides)); // Calculate the padding m_rowPaddingTop = ((m_outputRows - 1) * m_row_strides + m_patch_rows_eff - m_input_rows_eff) / 2; m_colPaddingLeft = ((m_outputCols - 1) * m_col_strides + m_patch_cols_eff - m_input_cols_eff) / 2; // The padding size calculation for PADDING_SAME has been updated to // be consistent with how TensorFlow extracts its paddings. m_rowPaddingTop = numext::maxi(0, m_rowPaddingTop); m_colPaddingLeft = numext::maxi(0, m_colPaddingLeft); break; default: eigen_assert(false && "unexpected padding"); m_outputCols=0; // silence the uninitialised warning; m_outputRows=0; //// silence the uninitialised warning; } } eigen_assert(m_outputRows > 0); eigen_assert(m_outputCols > 0); // Dimensions for result of extraction. if (static_cast(Layout) == static_cast(ColMajor)) { // ColMajor // 0: depth // 1: patch_rows // 2: patch_cols // 3: number of patches // 4 and beyond: anything else (such as batch). m_dimensions[0] = input_dims[0]; m_dimensions[1] = op.patch_rows(); m_dimensions[2] = op.patch_cols(); m_dimensions[3] = m_outputRows * m_outputCols; for (int i = 4; i < NumDims; ++i) { m_dimensions[i] = input_dims[i-1]; } } else { // RowMajor // NumDims-1: depth // NumDims-2: patch_rows // NumDims-3: patch_cols // NumDims-4: number of patches // NumDims-5 and beyond: anything else (such as batch). m_dimensions[NumDims-1] = input_dims[NumInputDims-1]; m_dimensions[NumDims-2] = op.patch_rows(); m_dimensions[NumDims-3] = op.patch_cols(); m_dimensions[NumDims-4] = m_outputRows * m_outputCols; for (int i = NumDims-5; i >= 0; --i) { m_dimensions[i] = input_dims[i]; } } // Strides for moving the patch in various dimensions. if (static_cast(Layout) == static_cast(ColMajor)) { m_colStride = m_dimensions[1]; m_patchStride = m_colStride * m_dimensions[2] * m_dimensions[0]; m_otherStride = m_patchStride * m_dimensions[3]; } else { m_colStride = m_dimensions[NumDims-2]; m_patchStride = m_colStride * m_dimensions[NumDims-3] * m_dimensions[NumDims-1]; m_otherStride = m_patchStride * m_dimensions[NumDims-4]; } // Strides for navigating through the input tensor. m_rowInputStride = m_inputDepth; m_colInputStride = m_inputDepth * m_inputRows; m_patchInputStride = m_inputDepth * m_inputRows * m_inputCols; // Fast representations of different variables. m_fastOtherStride = internal::TensorIntDivisor(m_otherStride); m_fastPatchStride = internal::TensorIntDivisor(m_patchStride); m_fastColStride = internal::TensorIntDivisor(m_colStride); m_fastInflateRowStride = internal::TensorIntDivisor(m_row_inflate_strides); m_fastInflateColStride = internal::TensorIntDivisor(m_col_inflate_strides); m_fastInputColsEff = internal::TensorIntDivisor(m_input_cols_eff); // Number of patches in the width dimension. m_fastOutputRows = internal::TensorIntDivisor(m_outputRows); if (static_cast(Layout) == static_cast(ColMajor)) { m_fastOutputDepth = internal::TensorIntDivisor(m_dimensions[0]); } else { m_fastOutputDepth = internal::TensorIntDivisor(m_dimensions[NumDims-1]); } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; } EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(EvaluatorPointerType /*data*/) { m_impl.evalSubExprsIfNeeded(NULL); return true; } #ifdef EIGEN_USE_THREADS template EIGEN_STRONG_INLINE void evalSubExprsIfNeededAsync( EvaluatorPointerType, EvalSubExprsCallback done) { m_impl.evalSubExprsIfNeededAsync(nullptr, [done](bool) { done(true); }); } #endif // EIGEN_USE_THREADS EIGEN_STRONG_INLINE void cleanup() { m_impl.cleanup(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { // Patch index corresponding to the passed in index. const Index patchIndex = index / m_fastPatchStride; // Find the offset of the element wrt the location of the first element. const Index patchOffset = (index - patchIndex * m_patchStride) / m_fastOutputDepth; // Other ways to index this element. const Index otherIndex = (NumDims == 4) ? 0 : index / m_fastOtherStride; const Index patch2DIndex = (NumDims == 4) ? patchIndex : (index - otherIndex * m_otherStride) / m_fastPatchStride; // Calculate col index in the input original tensor. const Index colIndex = patch2DIndex / m_fastOutputRows; const Index colOffset = patchOffset / m_fastColStride; const Index inputCol = colIndex * m_col_strides + colOffset * m_in_col_strides - m_colPaddingLeft; const Index origInputCol = (m_col_inflate_strides == 1) ? inputCol : ((inputCol >= 0) ? (inputCol / m_fastInflateColStride) : 0); if (inputCol < 0 || inputCol >= m_input_cols_eff || ((m_col_inflate_strides != 1) && (inputCol != origInputCol * m_col_inflate_strides))) { return Scalar(m_paddingValue); } // Calculate row index in the original input tensor. const Index rowIndex = patch2DIndex - colIndex * m_outputRows; const Index rowOffset = patchOffset - colOffset * m_colStride; const Index inputRow = rowIndex * m_row_strides + rowOffset * m_in_row_strides - m_rowPaddingTop; const Index origInputRow = (m_row_inflate_strides == 1) ? inputRow : ((inputRow >= 0) ? (inputRow / m_fastInflateRowStride) : 0); if (inputRow < 0 || inputRow >= m_input_rows_eff || ((m_row_inflate_strides != 1) && (inputRow != origInputRow * m_row_inflate_strides))) { return Scalar(m_paddingValue); } const int depth_index = static_cast(Layout) == static_cast(ColMajor) ? 0 : NumDims - 1; const Index depth = index - (index / m_fastOutputDepth) * m_dimensions[depth_index]; const Index inputIndex = depth + origInputRow * m_rowInputStride + origInputCol * m_colInputStride + otherIndex * m_patchInputStride; return m_impl.coeff(inputIndex); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const { EIGEN_STATIC_ASSERT((PacketSize > 1), YOU_MADE_A_PROGRAMMING_MISTAKE) eigen_assert(index+PacketSize-1 < dimensions().TotalSize()); if (m_in_row_strides != 1 || m_in_col_strides != 1 || m_row_inflate_strides != 1 || m_col_inflate_strides != 1) { return packetWithPossibleZero(index); } const Index indices[2] = {index, index + PacketSize - 1}; const Index patchIndex = indices[0] / m_fastPatchStride; if (patchIndex != indices[1] / m_fastPatchStride) { return packetWithPossibleZero(index); } const Index otherIndex = (NumDims == 4) ? 0 : indices[0] / m_fastOtherStride; eigen_assert(otherIndex == indices[1] / m_fastOtherStride); // Find the offset of the element wrt the location of the first element. const Index patchOffsets[2] = {(indices[0] - patchIndex * m_patchStride) / m_fastOutputDepth, (indices[1] - patchIndex * m_patchStride) / m_fastOutputDepth}; const Index patch2DIndex = (NumDims == 4) ? patchIndex : (indices[0] - otherIndex * m_otherStride) / m_fastPatchStride; eigen_assert(patch2DIndex == (indices[1] - otherIndex * m_otherStride) / m_fastPatchStride); const Index colIndex = patch2DIndex / m_fastOutputRows; const Index colOffsets[2] = {patchOffsets[0] / m_fastColStride, patchOffsets[1] / m_fastColStride}; // Calculate col indices in the original input tensor. const Index inputCols[2] = {colIndex * m_col_strides + colOffsets[0] - m_colPaddingLeft, colIndex * m_col_strides + colOffsets[1] - m_colPaddingLeft}; if (inputCols[1] < 0 || inputCols[0] >= m_inputCols) { return internal::pset1(Scalar(m_paddingValue)); } if (inputCols[0] == inputCols[1]) { const Index rowIndex = patch2DIndex - colIndex * m_outputRows; const Index rowOffsets[2] = {patchOffsets[0] - colOffsets[0]*m_colStride, patchOffsets[1] - colOffsets[1]*m_colStride}; eigen_assert(rowOffsets[0] <= rowOffsets[1]); // Calculate col indices in the original input tensor. const Index inputRows[2] = {rowIndex * m_row_strides + rowOffsets[0] - m_rowPaddingTop, rowIndex * m_row_strides + rowOffsets[1] - m_rowPaddingTop}; if (inputRows[1] < 0 || inputRows[0] >= m_inputRows) { return internal::pset1(Scalar(m_paddingValue)); } if (inputRows[0] >= 0 && inputRows[1] < m_inputRows) { // no padding const int depth_index = static_cast(Layout) == static_cast(ColMajor) ? 0 : NumDims - 1; const Index depth = index - (index / m_fastOutputDepth) * m_dimensions[depth_index]; const Index inputIndex = depth + inputRows[0] * m_rowInputStride + inputCols[0] * m_colInputStride + otherIndex * m_patchInputStride; return m_impl.template packet(inputIndex); } } return packetWithPossibleZero(index); } EIGEN_DEVICE_FUNC EvaluatorPointerType data() const { return NULL; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorEvaluator& impl() const { return m_impl; } #ifdef EIGEN_USE_SYCL // binding placeholder accessors to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler &cgh) const { m_impl.bind(cgh); } #endif EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index rowPaddingTop() const { return m_rowPaddingTop; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index colPaddingLeft() const { return m_colPaddingLeft; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index outputRows() const { return m_outputRows; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index outputCols() const { return m_outputCols; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index userRowStride() const { return m_row_strides; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index userColStride() const { return m_col_strides; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index userInRowStride() const { return m_in_row_strides; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index userInColStride() const { return m_in_col_strides; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index rowInflateStride() const { return m_row_inflate_strides; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index colInflateStride() const { return m_col_inflate_strides; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost costPerCoeff(bool vectorized) const { // We conservatively estimate the cost for the code path where the computed // index is inside the original image and // TensorEvaluator::CoordAccess is false. const double compute_cost = 3 * TensorOpCost::DivCost() + 6 * TensorOpCost::MulCost() + 8 * TensorOpCost::MulCost(); return m_impl.costPerCoeff(vectorized) + TensorOpCost(0, 0, compute_cost, vectorized, PacketSize); } protected: EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packetWithPossibleZero(Index index) const { EIGEN_ALIGN_MAX typename internal::remove_const::type values[PacketSize]; EIGEN_UNROLL_LOOP for (int i = 0; i < PacketSize; ++i) { values[i] = coeff(index+i); } PacketReturnType rslt = internal::pload(values); return rslt; } Dimensions m_dimensions; Index m_otherStride; Index m_patchStride; Index m_colStride; Index m_row_strides; Index m_col_strides; Index m_in_row_strides; Index m_in_col_strides; Index m_row_inflate_strides; Index m_col_inflate_strides; Index m_input_rows_eff; Index m_input_cols_eff; Index m_patch_rows_eff; Index m_patch_cols_eff; internal::TensorIntDivisor m_fastOtherStride; internal::TensorIntDivisor m_fastPatchStride; internal::TensorIntDivisor m_fastColStride; internal::TensorIntDivisor m_fastInflateRowStride; internal::TensorIntDivisor m_fastInflateColStride; internal::TensorIntDivisor m_fastInputColsEff; Index m_rowInputStride; Index m_colInputStride; Index m_patchInputStride; Index m_inputDepth; Index m_inputRows; Index m_inputCols; Index m_outputRows; Index m_outputCols; Index m_rowPaddingTop; Index m_colPaddingLeft; internal::TensorIntDivisor m_fastOutputRows; internal::TensorIntDivisor m_fastOutputDepth; Scalar m_paddingValue; const Device EIGEN_DEVICE_REF m_device; TensorEvaluator m_impl; }; } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_IMAGE_PATCH_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorContractionMapper.h0000644000176200001440000005604214562417221030021 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_CONTRACTION_MAPPER_H #define EIGEN_CXX11_TENSOR_TENSOR_CONTRACTION_MAPPER_H namespace Eigen { namespace internal { enum { Rhs = 0, Lhs = 1 }; /* * Implementation of the Eigen blas_data_mapper class for tensors. */ /// The make pointer class is used by sycl in order to build the mapper class on the device. For other platform the default make pointer is used which /// is scalar * for CoeffLoader. template class MakePointer_ = MakePointer> struct CoeffLoader; template class MakePointer_ = MakePointer> class BaseTensorContractionMapper; template class MakePointer_> struct CoeffLoader { enum { DirectOffsets = false }; EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE CoeffLoader(const Tensor& tensor) : m_tensor(tensor) { } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void offsetBuffer(typename Tensor::Index) { eigen_assert(false && "unsupported"); } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE const typename MakePointer_::Type data() const { eigen_assert(false && "unsupported"); return NULL; } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE typename Tensor::Scalar coeff(typename Tensor::Index index) const { return m_tensor.coeff(index); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename Tensor::PacketReturnType packet(typename Tensor::Index index) const { return m_tensor.template packet(index); } #ifdef EIGEN_USE_SYCL // The placeholder accessors require to be bound to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler &cgh) const { m_tensor.bind(cgh); } #endif private: const Tensor m_tensor; }; template class MakePointer_> struct CoeffLoader { enum { DirectOffsets = true }; EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE CoeffLoader(const Tensor& tensor) : m_data(tensor.data()) {} EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void offsetBuffer(typename Tensor::Index offset) { m_data += offset; } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE const typename MakePointer_::Type data() const { return m_data; } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE typename Tensor::Scalar coeff(typename Tensor::Index index) const { return loadConstant(m_data+index); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename Tensor::PacketReturnType packet(typename Tensor::Index index) const { return internal::ploadt_ro(m_data + index); } #ifdef EIGEN_USE_SYCL // The placeholder accessors require to be bound to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler &cgh) const { m_data.bind(cgh); } #endif private: typedef typename Tensor::Scalar Scalar; typename MakePointer_::Type m_data; }; template class MakePointer_ = MakePointer> class SimpleTensorContractionMapper { public: EIGEN_DEVICE_FUNC SimpleTensorContractionMapper(const Tensor& tensor, const nocontract_t& nocontract_strides, const nocontract_t& ij_strides, const contract_t& contract_strides, const contract_t& k_strides) : m_tensor(tensor), m_nocontract_strides(nocontract_strides), m_ij_strides(ij_strides), m_contract_strides(contract_strides), m_k_strides(k_strides) { } enum { DirectOffsets = CoeffLoader::DirectOffsets }; EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void offsetBuffer(typename Tensor::Index offset) { m_tensor.offsetBuffer(offset); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void prefetch(Index /*i*/) { } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar operator()(Index row) const { // column major assumption return operator()(row, 0); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar operator()(Index row, Index col) const { return m_tensor.coeff(computeIndex(row, col)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index computeIndex(Index row, Index col) const { const bool left = (side == Lhs); EIGEN_UNUSED_VARIABLE(left); // annoying bug in g++8.1: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85963 Index nocontract_val = left ? row : col; Index linidx = 0; EIGEN_UNROLL_LOOP for (int i = static_cast(array_size::value) - 1; i > 0; i--) { const Index idx = nocontract_val / m_ij_strides[i]; linidx += idx * m_nocontract_strides[i]; nocontract_val -= idx * m_ij_strides[i]; } if (array_size::value > array_size::value) { if (side == Lhs && inner_dim_contiguous) { eigen_assert(m_nocontract_strides[0] == 1); linidx += nocontract_val; } else { linidx += nocontract_val * m_nocontract_strides[0]; } } Index contract_val = left ? col : row; if(array_size::value > 0) { EIGEN_UNROLL_LOOP for (int i = static_cast(array_size::value) - 1; i > 0; i--) { const Index idx = contract_val / m_k_strides[i]; linidx += idx * m_contract_strides[i]; contract_val -= idx * m_k_strides[i]; } if (side == Rhs && inner_dim_contiguous) { eigen_assert(m_contract_strides[0] == 1); linidx += contract_val; } else { linidx += contract_val * m_contract_strides[0]; } } return linidx; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE IndexPair computeIndexPair(Index row, Index col, const Index distance) const { const bool left = (side == Lhs); EIGEN_UNUSED_VARIABLE(left); // annoying bug in g++8.1: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85963 Index nocontract_val[2] = {left ? row : col, left ? row + distance : col}; Index linidx[2] = {0, 0}; if (array_size::value > array_size::value) { EIGEN_UNROLL_LOOP for (int i = static_cast(array_size::value) - 1; i > 0; i--) { const Index idx0 = nocontract_val[0] / m_ij_strides[i]; const Index idx1 = nocontract_val[1] / m_ij_strides[i]; linidx[0] += idx0 * m_nocontract_strides[i]; linidx[1] += idx1 * m_nocontract_strides[i]; nocontract_val[0] -= idx0 * m_ij_strides[i]; nocontract_val[1] -= idx1 * m_ij_strides[i]; } if (side == Lhs && inner_dim_contiguous) { eigen_assert(m_nocontract_strides[0] == 1); linidx[0] += nocontract_val[0]; linidx[1] += nocontract_val[1]; } else { linidx[0] += nocontract_val[0] * m_nocontract_strides[0]; linidx[1] += nocontract_val[1] * m_nocontract_strides[0]; } } Index contract_val[2] = {left ? col : row, left ? col : row + distance}; if (array_size::value> 0) { EIGEN_UNROLL_LOOP for (int i = static_cast(array_size::value) - 1; i > 0; i--) { const Index idx0 = contract_val[0] / m_k_strides[i]; const Index idx1 = contract_val[1] / m_k_strides[i]; linidx[0] += idx0 * m_contract_strides[i]; linidx[1] += idx1 * m_contract_strides[i]; contract_val[0] -= idx0 * m_k_strides[i]; contract_val[1] -= idx1 * m_k_strides[i]; } if (side == Rhs && inner_dim_contiguous) { eigen_assert(m_contract_strides[0] == 1); linidx[0] += contract_val[0]; linidx[1] += contract_val[1]; } else { linidx[0] += contract_val[0] * m_contract_strides[0]; linidx[1] += contract_val[1] * m_contract_strides[0]; } } return IndexPair(linidx[0], linidx[1]); } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE Index firstAligned(Index size) const { // Only claim alignment when we can compute the actual stride (ie when we're // dealing with the lhs with inner_dim_contiguous. This is because the // matrix-vector product relies on the stride when dealing with aligned inputs. return (Alignment == Aligned) && (side == Lhs) && inner_dim_contiguous ? 0 : size; } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE Index stride() const { return ((side == Lhs) && inner_dim_contiguous && array_size::value > 0) ? m_contract_strides[0] : 1; } #ifdef EIGEN_USE_SYCL // The placeholder accessors require to be bound to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler &cgh) const { m_tensor.bind(cgh); } #endif const CoeffLoader& tensor() const { return m_tensor; } const nocontract_t& nocontract_strides() const { return m_nocontract_strides; } const nocontract_t& ij_strides() const { return m_ij_strides; } const contract_t& contract_strides() const { return m_contract_strides; } const contract_t& k_strides() const { return m_k_strides; } protected: CoeffLoader m_tensor; const nocontract_t m_nocontract_strides; const nocontract_t m_ij_strides; const contract_t m_contract_strides; const contract_t m_k_strides; }; template class MakePointer_> class BaseTensorContractionMapper : public SimpleTensorContractionMapper { public: typedef SimpleTensorContractionMapper ParentMapper; EIGEN_DEVICE_FUNC BaseTensorContractionMapper(const Tensor& tensor, const nocontract_t& nocontract_strides, const nocontract_t& ij_strides, const contract_t& contract_strides, const contract_t& k_strides) : ParentMapper(tensor, nocontract_strides, ij_strides, contract_strides, k_strides) { } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename internal::enable_if::size==packet_size,PacketT>::type load(Index i, Index j) const { // whole method makes column major assumption // don't need to add offsets for now (because operator handles that) // current code assumes packet size must be a multiple of 2 EIGEN_STATIC_ASSERT(packet_size % 2 == 0, YOU_MADE_A_PROGRAMMING_MISTAKE); if (Tensor::PacketAccess && inner_dim_contiguous && !inner_dim_reordered) { const Index index = this->computeIndex(i, j); eigen_assert(this->computeIndex(i+packet_size-1, j) == index + packet_size-1); return this->m_tensor.template packet(index); } const IndexPair indexPair = this->computeIndexPair(i, j, packet_size - 1); const Index first = indexPair.first; const Index lastIdx = indexPair.second; // We can always do optimized packet reads from left hand side right now, because // the vertical matrix dimension on the left hand side is never contracting. // On the right hand side we need to check if the contracting dimensions may have // been shuffled first. if (Tensor::PacketAccess && (side == Lhs || internal::array_size::value <= 1 || !inner_dim_reordered) && (lastIdx - first) == (packet_size - 1)) { return this->m_tensor.template packet(first); } EIGEN_ALIGN_MAX Scalar data[packet_size]; data[0] = this->m_tensor.coeff(first); EIGEN_UNROLL_LOOP for (Index k = 1; k < packet_size - 1; k += 2) { const IndexPair internal_pair = this->computeIndexPair(i + k, j, 1); data[k] = this->m_tensor.coeff(internal_pair.first); data[k + 1] = this->m_tensor.coeff(internal_pair.second); } data[packet_size - 1] = this->m_tensor.coeff(lastIdx); return pload(data); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename internal::enable_if::size!=packet_size,PacketT>::type load(Index i, Index j) const { const Index requested_packet_size = internal::unpacket_traits::size; EIGEN_ALIGN_MAX Scalar data[requested_packet_size]; const IndexPair indexPair = this->computeIndexPair(i, j, requested_packet_size - 1); const Index first = indexPair.first; const Index lastIdx = indexPair.second; data[0] = this->m_tensor.coeff(first); for (Index k = 1; k < requested_packet_size - 1; k += 2) { const IndexPair internal_pair = this->computeIndexPair(i + k, j, 1); data[k] = this->m_tensor.coeff(internal_pair.first); data[k + 1] = this->m_tensor.coeff(internal_pair.second); } data[requested_packet_size - 1] = this->m_tensor.coeff(lastIdx); return pload(data); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketT loadPacket(Index i, Index j) const { return this->load(i,j); } }; template class MakePointer_> class BaseTensorContractionMapper : public SimpleTensorContractionMapper { public: typedef SimpleTensorContractionMapper ParentMapper; EIGEN_DEVICE_FUNC BaseTensorContractionMapper(const Tensor& tensor, const nocontract_t& nocontract_strides, const nocontract_t& ij_strides, const contract_t& contract_strides, const contract_t& k_strides) : ParentMapper(tensor, nocontract_strides, ij_strides, contract_strides, k_strides) { } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketT loadPacket(Index i, Index j) const { EIGEN_ALIGN_MAX Scalar data[1]; data[0] = this->m_tensor.coeff(this->computeIndex(i, j)); return pload(data); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketT load(Index i, Index j) const { EIGEN_ALIGN_MAX Scalar data[1]; data[0] = this->m_tensor.coeff(this->computeIndex(i, j)); return pload(data); } }; template class MakePointer_=MakePointer> class TensorContractionSubMapper { public: typedef BaseTensorContractionMapper ParentMapper; typedef TensorContractionSubMapper Self; typedef Self LinearMapper; enum { // We can use direct offsets iff the parent mapper supports then and we can compute the strides. // TODO: we should also enable direct offsets for the Rhs case. UseDirectOffsets = ParentMapper::DirectOffsets && (side == Lhs) && inner_dim_contiguous && (array_size::value > 0) }; EIGEN_DEVICE_FUNC TensorContractionSubMapper(const ParentMapper& base_mapper, Index vert_offset, Index horiz_offset) : m_base_mapper(base_mapper), m_vert_offset(vert_offset), m_horiz_offset(horiz_offset) { // Bake the offsets into the buffer used by the base mapper whenever possible. This avoids the need to recompute // this offset every time we attempt to access a coefficient. if (UseDirectOffsets) { Index stride = m_base_mapper.stride(); m_base_mapper.offsetBuffer(vert_offset + horiz_offset * stride); } } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE Scalar operator()(Index i) const { if (UseDirectOffsets) { return m_base_mapper(i, 0); } return m_base_mapper(i + m_vert_offset, m_horiz_offset); } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE Scalar operator()(Index i, Index j) const { if (UseDirectOffsets) { return m_base_mapper(i, j); } return m_base_mapper(i + m_vert_offset, j + m_horiz_offset); } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE PacketT loadPacket(Index i) const { if (UseDirectOffsets) { return m_base_mapper.template loadPacket(i, 0); } return m_base_mapper.template loadPacket(i + m_vert_offset, m_horiz_offset); } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE PacketT loadPacket(Index i, Index j) const { if (UseDirectOffsets) { return m_base_mapper.template loadPacket(i, j); } return m_base_mapper.template loadPacket(i + m_vert_offset, j + m_horiz_offset); } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE PacketT loadPacket(Index i, Index j) const { if (UseDirectOffsets) { return m_base_mapper.template load(i, j); } return m_base_mapper.template loadPacket(i + m_vert_offset, j + m_horiz_offset); } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void storePacket(Index i, const PacketT& p) const { if (UseDirectOffsets) { m_base_mapper.storePacket(i, 0, p); } m_base_mapper.storePacket(i + m_vert_offset, m_horiz_offset, p); } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE LinearMapper getLinearMapper(Index i, Index j) const { if (UseDirectOffsets) { return LinearMapper(m_base_mapper, i, j); } return LinearMapper(m_base_mapper, i + m_vert_offset, j + m_horiz_offset); } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE PacketT load(Index i) const { EIGEN_STATIC_ASSERT((internal::is_same::value), YOU_MADE_A_PROGRAMMING_MISTAKE); const int ActualAlignment = (AlignmentType == Aligned) && (Alignment == Aligned) ? Aligned : Unaligned; if (UseDirectOffsets) { return m_base_mapper.template loadPacket(i, 0); } return m_base_mapper.template loadPacket(i + m_vert_offset, m_horiz_offset); } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool aligned(Index) const { return false; } #ifdef EIGEN_USE_SYCL // The placeholder accessors require to be bound to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler &cgh) const { m_base_mapper.bind(cgh); } #endif const ParentMapper& base_mapper() const { return m_base_mapper; } Index vert_offset() const { return m_vert_offset; } Index horiz_offset() const { return m_horiz_offset; } private: ParentMapper m_base_mapper; const Index m_vert_offset; const Index m_horiz_offset; }; template class MakePointer_=MakePointer> class TensorContractionInputMapper : public BaseTensorContractionMapper { public: typedef Scalar_ Scalar; typedef BaseTensorContractionMapper Base; typedef TensorContractionSubMapper SubMapper; typedef SubMapper VectorMapper; EIGEN_DEVICE_FUNC TensorContractionInputMapper(const Tensor& tensor, const nocontract_t& nocontract_strides, const nocontract_t& ij_strides, const contract_t& contract_strides, const contract_t& k_strides) : Base(tensor, nocontract_strides, ij_strides, contract_strides, k_strides) { } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE SubMapper getSubMapper(Index i, Index j) const { return SubMapper(*this, i, j); } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE VectorMapper getVectorMapper(Index i, Index j) const { return VectorMapper(*this, i, j); } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE const CoeffLoader& get_tensor() const { return Base::m_tensor; } }; template struct TensorContractionInputMapperTrait; template class MakePointer_> struct TensorContractionInputMapperTrait > { typedef Tensor_ XprType; static const bool inner_dim_contiguous = inner_dim_contiguous_; static const bool inner_dim_reordered = inner_dim_reordered_; }; } // end namespace internal } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_CONTRACTION_MAPPER_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorExpr.h0000644000176200001440000003736314562417221025314 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_EXPR_H #define EIGEN_CXX11_TENSOR_TENSOR_EXPR_H namespace Eigen { /** \class TensorExpr * \ingroup CXX11_Tensor_Module * * \brief Tensor expression classes. * * The TensorCwiseNullaryOp class applies a nullary operators to an expression. * This is typically used to generate constants. * * The TensorCwiseUnaryOp class represents an expression where a unary operator * (e.g. cwiseSqrt) is applied to an expression. * * The TensorCwiseBinaryOp class represents an expression where a binary * operator (e.g. addition) is applied to a lhs and a rhs expression. * */ namespace internal { template struct traits > : traits { typedef traits XprTraits; typedef typename XprType::Scalar Scalar; typedef typename XprType::Nested XprTypeNested; typedef typename remove_reference::type _XprTypeNested; static const int NumDimensions = XprTraits::NumDimensions; static const int Layout = XprTraits::Layout; typedef typename XprTraits::PointerType PointerType; enum { Flags = 0 }; }; } // end namespace internal template class TensorCwiseNullaryOp : public TensorBase, ReadOnlyAccessors> { public: typedef typename Eigen::internal::traits::Scalar Scalar; typedef typename Eigen::NumTraits::Real RealScalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef TensorCwiseNullaryOp Nested; typedef typename Eigen::internal::traits::StorageKind StorageKind; typedef typename Eigen::internal::traits::Index Index; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorCwiseNullaryOp(const XprType& xpr, const NullaryOp& func = NullaryOp()) : m_xpr(xpr), m_functor(func) {} EIGEN_DEVICE_FUNC const typename internal::remove_all::type& nestedExpression() const { return m_xpr; } EIGEN_DEVICE_FUNC const NullaryOp& functor() const { return m_functor; } protected: typename XprType::Nested m_xpr; const NullaryOp m_functor; }; namespace internal { template struct traits > : traits { // TODO(phli): Add InputScalar, InputPacket. Check references to // current Scalar/Packet to see if the intent is Input or Output. typedef typename result_of::type Scalar; typedef traits XprTraits; typedef typename XprType::Nested XprTypeNested; typedef typename remove_reference::type _XprTypeNested; static const int NumDimensions = XprTraits::NumDimensions; static const int Layout = XprTraits::Layout; typedef typename TypeConversion::type PointerType; }; template struct eval, Eigen::Dense> { typedef const TensorCwiseUnaryOp& type; }; template struct nested, 1, typename eval >::type> { typedef TensorCwiseUnaryOp type; }; } // end namespace internal template class TensorCwiseUnaryOp : public TensorBase, ReadOnlyAccessors> { public: // TODO(phli): Add InputScalar, InputPacket. Check references to // current Scalar/Packet to see if the intent is Input or Output. typedef typename Eigen::internal::traits::Scalar Scalar; typedef typename Eigen::NumTraits::Real RealScalar; typedef Scalar CoeffReturnType; typedef typename Eigen::internal::nested::type Nested; typedef typename Eigen::internal::traits::StorageKind StorageKind; typedef typename Eigen::internal::traits::Index Index; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorCwiseUnaryOp(const XprType& xpr, const UnaryOp& func = UnaryOp()) : m_xpr(xpr), m_functor(func) {} EIGEN_DEVICE_FUNC const UnaryOp& functor() const { return m_functor; } /** \returns the nested expression */ EIGEN_DEVICE_FUNC const typename internal::remove_all::type& nestedExpression() const { return m_xpr; } protected: typename XprType::Nested m_xpr; const UnaryOp m_functor; }; namespace internal { template struct traits > { // Type promotion to handle the case where the types of the lhs and the rhs // are different. // TODO(phli): Add Lhs/RhsScalar, Lhs/RhsPacket. Check references to // current Scalar/Packet to see if the intent is Inputs or Output. typedef typename result_of< BinaryOp(typename LhsXprType::Scalar, typename RhsXprType::Scalar)>::type Scalar; typedef traits XprTraits; typedef typename promote_storage_type< typename traits::StorageKind, typename traits::StorageKind>::ret StorageKind; typedef typename promote_index_type< typename traits::Index, typename traits::Index>::type Index; typedef typename LhsXprType::Nested LhsNested; typedef typename RhsXprType::Nested RhsNested; typedef typename remove_reference::type _LhsNested; typedef typename remove_reference::type _RhsNested; static const int NumDimensions = XprTraits::NumDimensions; static const int Layout = XprTraits::Layout; typedef typename TypeConversion::val, typename traits::PointerType, typename traits::PointerType>::type >::type PointerType; enum { Flags = 0 }; }; template struct eval, Eigen::Dense> { typedef const TensorCwiseBinaryOp& type; }; template struct nested, 1, typename eval >::type> { typedef TensorCwiseBinaryOp type; }; } // end namespace internal template class TensorCwiseBinaryOp : public TensorBase, ReadOnlyAccessors> { public: // TODO(phli): Add Lhs/RhsScalar, Lhs/RhsPacket. Check references to // current Scalar/Packet to see if the intent is Inputs or Output. typedef typename Eigen::internal::traits::Scalar Scalar; typedef typename Eigen::NumTraits::Real RealScalar; typedef Scalar CoeffReturnType; typedef typename Eigen::internal::nested::type Nested; typedef typename Eigen::internal::traits::StorageKind StorageKind; typedef typename Eigen::internal::traits::Index Index; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorCwiseBinaryOp(const LhsXprType& lhs, const RhsXprType& rhs, const BinaryOp& func = BinaryOp()) : m_lhs_xpr(lhs), m_rhs_xpr(rhs), m_functor(func) {} EIGEN_DEVICE_FUNC const BinaryOp& functor() const { return m_functor; } /** \returns the nested expressions */ EIGEN_DEVICE_FUNC const typename internal::remove_all::type& lhsExpression() const { return m_lhs_xpr; } EIGEN_DEVICE_FUNC const typename internal::remove_all::type& rhsExpression() const { return m_rhs_xpr; } protected: typename LhsXprType::Nested m_lhs_xpr; typename RhsXprType::Nested m_rhs_xpr; const BinaryOp m_functor; }; namespace internal { template struct traits > { // Type promotion to handle the case where the types of the args are different. typedef typename result_of< TernaryOp(typename Arg1XprType::Scalar, typename Arg2XprType::Scalar, typename Arg3XprType::Scalar)>::type Scalar; typedef traits XprTraits; typedef typename traits::StorageKind StorageKind; typedef typename traits::Index Index; typedef typename Arg1XprType::Nested Arg1Nested; typedef typename Arg2XprType::Nested Arg2Nested; typedef typename Arg3XprType::Nested Arg3Nested; typedef typename remove_reference::type _Arg1Nested; typedef typename remove_reference::type _Arg2Nested; typedef typename remove_reference::type _Arg3Nested; static const int NumDimensions = XprTraits::NumDimensions; static const int Layout = XprTraits::Layout; typedef typename TypeConversion::val, typename traits::PointerType, typename traits::PointerType>::type >::type PointerType; enum { Flags = 0 }; }; template struct eval, Eigen::Dense> { typedef const TensorCwiseTernaryOp& type; }; template struct nested, 1, typename eval >::type> { typedef TensorCwiseTernaryOp type; }; } // end namespace internal template class TensorCwiseTernaryOp : public TensorBase, ReadOnlyAccessors> { public: typedef typename Eigen::internal::traits::Scalar Scalar; typedef typename Eigen::NumTraits::Real RealScalar; typedef Scalar CoeffReturnType; typedef typename Eigen::internal::nested::type Nested; typedef typename Eigen::internal::traits::StorageKind StorageKind; typedef typename Eigen::internal::traits::Index Index; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorCwiseTernaryOp(const Arg1XprType& arg1, const Arg2XprType& arg2, const Arg3XprType& arg3, const TernaryOp& func = TernaryOp()) : m_arg1_xpr(arg1), m_arg2_xpr(arg2), m_arg3_xpr(arg3), m_functor(func) {} EIGEN_DEVICE_FUNC const TernaryOp& functor() const { return m_functor; } /** \returns the nested expressions */ EIGEN_DEVICE_FUNC const typename internal::remove_all::type& arg1Expression() const { return m_arg1_xpr; } EIGEN_DEVICE_FUNC const typename internal::remove_all::type& arg2Expression() const { return m_arg2_xpr; } EIGEN_DEVICE_FUNC const typename internal::remove_all::type& arg3Expression() const { return m_arg3_xpr; } protected: typename Arg1XprType::Nested m_arg1_xpr; typename Arg2XprType::Nested m_arg2_xpr; typename Arg3XprType::Nested m_arg3_xpr; const TernaryOp m_functor; }; namespace internal { template struct traits > : traits { typedef typename traits::Scalar Scalar; typedef traits XprTraits; typedef typename promote_storage_type::StorageKind, typename traits::StorageKind>::ret StorageKind; typedef typename promote_index_type::Index, typename traits::Index>::type Index; typedef typename IfXprType::Nested IfNested; typedef typename ThenXprType::Nested ThenNested; typedef typename ElseXprType::Nested ElseNested; static const int NumDimensions = XprTraits::NumDimensions; static const int Layout = XprTraits::Layout; typedef typename conditional::val, typename traits::PointerType, typename traits::PointerType>::type PointerType; }; template struct eval, Eigen::Dense> { typedef const TensorSelectOp& type; }; template struct nested, 1, typename eval >::type> { typedef TensorSelectOp type; }; } // end namespace internal template class TensorSelectOp : public TensorBase, ReadOnlyAccessors> { public: typedef typename Eigen::internal::traits::Scalar Scalar; typedef typename Eigen::NumTraits::Real RealScalar; typedef typename internal::promote_storage_type::ret CoeffReturnType; typedef typename Eigen::internal::nested::type Nested; typedef typename Eigen::internal::traits::StorageKind StorageKind; typedef typename Eigen::internal::traits::Index Index; EIGEN_DEVICE_FUNC TensorSelectOp(const IfXprType& a_condition, const ThenXprType& a_then, const ElseXprType& a_else) : m_condition(a_condition), m_then(a_then), m_else(a_else) { } EIGEN_DEVICE_FUNC const IfXprType& ifExpression() const { return m_condition; } EIGEN_DEVICE_FUNC const ThenXprType& thenExpression() const { return m_then; } EIGEN_DEVICE_FUNC const ElseXprType& elseExpression() const { return m_else; } protected: typename IfXprType::Nested m_condition; typename ThenXprType::Nested m_then; typename ElseXprType::Nested m_else; }; } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_EXPR_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorReductionGpu.h0000644000176200001440000011741714562417221027005 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_REDUCTION_GPU_H #define EIGEN_CXX11_TENSOR_TENSOR_REDUCTION_GPU_H namespace Eigen { namespace internal { #if defined(EIGEN_USE_GPU) && defined(EIGEN_GPUCC) // Full reducers for GPU, don't vectorize for now // Reducer function that enables multiple gpu thread to safely accumulate at the same // output address. It basically reads the current value of the output variable, and // attempts to update it with the new value. If in the meantime another gpu thread // updated the content of the output address it will try again. template __device__ EIGEN_ALWAYS_INLINE void atomicReduce(T* output, T accum, R& reducer) { #if (defined(EIGEN_HIP_DEVICE_COMPILE) && defined(__HIP_ARCH_HAS_WARP_SHUFFLE__)) || (EIGEN_CUDA_ARCH >= 300) if (sizeof(T) == 4) { unsigned int oldval = *reinterpret_cast(output); unsigned int newval = oldval; reducer.reduce(accum, reinterpret_cast(&newval)); if (newval == oldval) { return; } unsigned int readback; while ((readback = atomicCAS((unsigned int*)output, oldval, newval)) != oldval) { oldval = readback; newval = oldval; reducer.reduce(accum, reinterpret_cast(&newval)); if (newval == oldval) { return; } } } else if (sizeof(T) == 8) { unsigned long long oldval = *reinterpret_cast(output); unsigned long long newval = oldval; reducer.reduce(accum, reinterpret_cast(&newval)); if (newval == oldval) { return; } unsigned long long readback; while ((readback = atomicCAS((unsigned long long*)output, oldval, newval)) != oldval) { oldval = readback; newval = oldval; reducer.reduce(accum, reinterpret_cast(&newval)); if (newval == oldval) { return; } } } else { gpu_assert(0 && "Wordsize not supported"); } #else // EIGEN_CUDA_ARCH >= 300 gpu_assert(0 && "Shouldn't be called on unsupported device"); #endif // EIGEN_CUDA_ARCH >= 300 } // We extend atomicExch to support extra data types template __device__ inline Type atomicExchCustom(Type* address, Type val) { return atomicExch(address, val); } template <> __device__ inline double atomicExchCustom(double* address, double val) { unsigned long long int* address_as_ull = reinterpret_cast(address); return __longlong_as_double(atomicExch(address_as_ull, __double_as_longlong(val))); } #ifdef EIGEN_HAS_GPU_FP16 template __device__ inline void atomicReduce(half2* output, half2 accum, R& reducer) { unsigned int oldval = *reinterpret_cast(output); unsigned int newval = oldval; reducer.reducePacket(accum, reinterpret_cast(&newval)); if (newval == oldval) { return; } unsigned int readback; while ((readback = atomicCAS((unsigned int*)output, oldval, newval)) != oldval) { oldval = readback; newval = oldval; reducer.reducePacket(accum, reinterpret_cast(&newval)); if (newval == oldval) { return; } } } // reduction should be associative since reduction is not atomic in wide vector but atomic in half2 operations template __device__ inline void atomicReduce(Packet4h2* output, Packet4h2 accum, R& reducer) { half2* houtput=reinterpret_cast(output); half2* haccum=reinterpret_cast(&accum); for(int i=0;i<4;++i){ atomicReduce(houtput+i,*(haccum+i),reducer); } } #endif // EIGEN_HAS_GPU_FP16 template <> __device__ inline void atomicReduce(float* output, float accum, SumReducer&) { #if (defined(EIGEN_HIP_DEVICE_COMPILE) && defined(__HIP_ARCH_HAS_WARP_SHUFFLE__)) || (EIGEN_CUDA_ARCH >= 300) atomicAdd(output, accum); #else // EIGEN_CUDA_ARCH >= 300 gpu_assert(0 && "Shouldn't be called on unsupported device"); #endif // EIGEN_CUDA_ARCH >= 300 } template __global__ EIGEN_HIP_LAUNCH_BOUNDS_1024 void ReductionInitKernel(const CoeffType val, Index num_preserved_coeffs, CoeffType* output) { const Index thread_id = blockIdx.x * blockDim.x + threadIdx.x; const Index num_threads = blockDim.x * gridDim.x; for (Index i = thread_id; i < num_preserved_coeffs; i += num_threads) { output[i] = val; } } template __global__ EIGEN_HIP_LAUNCH_BOUNDS_1024 void FullReductionKernel(Reducer reducer, const Self input, Index num_coeffs, typename Self::CoeffReturnType* output, unsigned int* semaphore) { #if (defined(EIGEN_HIP_DEVICE_COMPILE) && defined(__HIP_ARCH_HAS_WARP_SHUFFLE__)) || (EIGEN_CUDA_ARCH >= 300) // Initialize the output value const Index first_index = blockIdx.x * BlockSize * NumPerThread + threadIdx.x; if (gridDim.x == 1) { if (first_index == 0) { *output = reducer.initialize(); } } else { if (threadIdx.x == 0) { unsigned int block = atomicCAS(semaphore, 0u, 1u); if (block == 0) { // We're the first block to run, initialize the output value atomicExchCustom(output, reducer.initialize()); __threadfence(); atomicExch(semaphore, 2u); } else { // Wait for the first block to initialize the output value. // Use atomicCAS here to ensure that the reads aren't cached unsigned int val; do { val = atomicCAS(semaphore, 2u, 2u); } while (val < 2u); } } } __syncthreads(); eigen_assert(gridDim.x == 1 || *semaphore >= 2u); typename Self::CoeffReturnType accum = reducer.initialize(); Index max_iter = numext::mini(num_coeffs - first_index, NumPerThread*BlockSize); for (Index i = 0; i < max_iter; i+=BlockSize) { const Index index = first_index + i; eigen_assert(index < num_coeffs); typename Self::CoeffReturnType val = input.m_impl.coeff(index); reducer.reduce(val, &accum); } #pragma unroll for (int offset = warpSize/2; offset > 0; offset /= 2) { #if defined(EIGEN_HIPCC) // use std::is_floating_point to determine the type of reduced_val // This is needed because when Type == double, hipcc will give a "call to __shfl_down is ambguous" error // and list the float and int versions of __shfl_down as the candidate functions. if (std::is_floating_point::value) { reducer.reduce(__shfl_down(static_cast(accum), offset, warpSize), &accum); } else { reducer.reduce(__shfl_down(static_cast(accum), offset, warpSize), &accum); } #elif defined(EIGEN_CUDA_SDK_VER) && EIGEN_CUDA_SDK_VER < 90000 reducer.reduce(__shfl_down(accum, offset, warpSize), &accum); #else reducer.reduce(__shfl_down_sync(0xFFFFFFFF, accum, offset, warpSize), &accum); #endif } if ((threadIdx.x & (warpSize - 1)) == 0) { atomicReduce(output, accum, reducer); } if (gridDim.x > 1 && threadIdx.x == 0) { // Let the last block reset the semaphore atomicInc(semaphore, gridDim.x + 1); #if defined(EIGEN_HIPCC) __threadfence_system(); #endif } #else // EIGEN_CUDA_ARCH >= 300 gpu_assert(0 && "Shouldn't be called on unsupported device"); #endif // EIGEN_CUDA_ARCH >= 300 } #ifdef EIGEN_HAS_GPU_FP16 template __global__ EIGEN_HIP_LAUNCH_BOUNDS_1024 void ReductionInitFullReduxKernelHalfFloat(Reducer reducer, const Self input, Index num_coeffs, packet_traits::type* scratch) { eigen_assert(blockDim.x == 1); eigen_assert(gridDim.x == 1); typedef packet_traits::type packet_type; Index packet_remainder = num_coeffs % Index(unpacket_traits::size); if (packet_remainder != 0) { half2* h2scratch = reinterpret_cast(scratch); for (Index i = num_coeffs - packet_remainder; i + 2 <= num_coeffs; i += 2) { *h2scratch = __halves2half2(input.m_impl.coeff(i), input.m_impl.coeff(i + 1)); h2scratch++; } if ((num_coeffs & 1) != 0) { half lastCoeff = input.m_impl.coeff(num_coeffs - 1); *h2scratch = __halves2half2(lastCoeff, reducer.initialize()); } } else { *scratch = reducer.template initializePacket(); } } template __global__ EIGEN_HIP_LAUNCH_BOUNDS_1024 void ReductionInitKernelHalfFloat(Reducer reducer, const Self input, Index num_coeffs, half* output) { const Index thread_id = blockIdx.x * blockDim.x + threadIdx.x; const Index num_threads = blockDim.x * gridDim.x; typedef typename packet_traits::type PacketType; const Index num_packets = num_coeffs / Index(unpacket_traits::size); PacketType* p_output = reinterpret_cast(output); for (Index i = thread_id; i < num_packets; i += num_threads) { p_output[i] = reducer.template initializePacket(); } Index packet_remainder = num_coeffs % Index(unpacket_traits::size); if (thread_id < packet_remainder) { output[num_coeffs - packet_remainder + thread_id] = reducer.initialize(); } } template __global__ EIGEN_HIP_LAUNCH_BOUNDS_1024 void FullReductionKernelHalfFloat(Reducer reducer, const Self input, Index num_coeffs, half* output, packet_traits::type* scratch) { typedef typename packet_traits::type PacketType; const int packet_width = unpacket_traits::size; eigen_assert(NumPerThread % packet_width == 0); const Index first_index = blockIdx.x * BlockSize * NumPerThread + packet_width * threadIdx.x; // Initialize the output value if it wasn't initialized by the ReductionInitKernel if (gridDim.x == 1) { if (first_index == 0) { int rem = num_coeffs % packet_width; if (rem != 0) { half2* p_scratch = reinterpret_cast(scratch); *scratch = reducer.template initializePacket(); for (int i = 0; i < rem / 2; i++) { *p_scratch = __halves2half2( input.m_impl.coeff(num_coeffs - packet_width + 2 * i), input.m_impl.coeff(num_coeffs - packet_width + 2 * i + 1)); p_scratch++; } if ((num_coeffs & 1) != 0) { half last = input.m_impl.coeff(num_coeffs - 1); *p_scratch = __halves2half2(last, reducer.initialize()); } } else { *scratch = reducer.template initializePacket(); } } __syncthreads(); } PacketType accum = reducer.template initializePacket(); const Index max_iter = numext::mini((num_coeffs - first_index) / packet_width, NumPerThread * BlockSize / packet_width); for (Index i = 0; i < max_iter; i += BlockSize) { const Index index = first_index + packet_width * i; eigen_assert(index + packet_width < num_coeffs); PacketType val = input.m_impl.template packet(index); reducer.reducePacket(val, &accum); } #pragma unroll for (int offset = warpSize/2; offset > 0; offset /= 2) { #if defined(EIGEN_HIPCC) PacketType r1; half2* hr = reinterpret_cast(&r1); half2* hacc = reinterpret_cast(&accum); for (int i = 0; i < packet_width / 2; i++) { // FIXME : remove this workaround once we have native half/half2 support for __shfl_down union { int i; half2 h; } wka_in, wka_out; wka_in.h = hacc[i]; wka_out.i = __shfl_down(wka_in.i, offset, warpSize); hr[i] = wka_out.h; } reducer.reducePacket(r1, &accum); #elif defined(EIGEN_CUDA_SDK_VER) && EIGEN_CUDA_SDK_VER < 90000 PacketType r1; half2* hr = reinterpret_cast(&r1); half2* hacc = reinterpret_cast(&accum); for (int i = 0; i < packet_width / 2; i++) { hr[i] = __shfl_down(hacc[i], offset, warpSize); } reducer.reducePacket(r1, &accum); #else PacketType r1; half2* hr = reinterpret_cast(&r1); half2* hacc = reinterpret_cast(&accum); for (int i = 0; i < packet_width / 2; i++) { hr[i] = __shfl_down_sync(0xFFFFFFFF, hacc[i], (unsigned)offset, warpSize); } reducer.reducePacket(r1, &accum); #endif } if ((threadIdx.x & (warpSize - 1)) == 0) { atomicReduce(scratch, accum, reducer); } __syncthreads(); half2* rv1 = reinterpret_cast(scratch); if (packet_width > 2) { reducer.reducePacket(rv1[2], rv1); reducer.reducePacket(rv1[3], rv1 + 1); reducer.reducePacket(rv1[1], rv1); } if (gridDim.x == 1) { if (first_index == 0) { half tmp = __low2half(*rv1); reducer.reduce(__high2half(*rv1), &tmp); *output = tmp; } } } template __global__ EIGEN_HIP_LAUNCH_BOUNDS_1024 void ReductionCleanupKernelHalfFloat(Op reducer, half* output, packet_traits::type* scratch) { eigen_assert(threadIdx.x == 1); half2* pscratch = reinterpret_cast(scratch); half tmp = __float2half(0.f); typedef packet_traits::type packet_type; for (int i = 0; i < unpacket_traits::size; i += 2) { reducer.reduce(__low2half(*pscratch), &tmp); reducer.reduce(__high2half(*pscratch), &tmp); pscratch++; } *output = tmp; } #endif // EIGEN_HAS_GPU_FP16 template struct FullReductionLauncher { static void run(const Self&, Op&, const GpuDevice&, OutputType*, typename Self::Index) { gpu_assert(false && "Should only be called on doubles, floats and half floats"); } }; // Specialization for float and double template struct FullReductionLauncher< Self, Op, OutputType, PacketAccess, typename internal::enable_if< internal::is_same::value || internal::is_same::value, void>::type> { static void run(const Self& self, Op& reducer, const GpuDevice& device, OutputType* output, typename Self::Index num_coeffs) { typedef typename Self::Index Index; const int block_size = 256; const int num_per_thread = 128; const int num_blocks = divup(num_coeffs, block_size * num_per_thread); unsigned int* semaphore = NULL; if (num_blocks > 1) { semaphore = device.semaphore(); } LAUNCH_GPU_KERNEL((FullReductionKernel), num_blocks, block_size, 0, device, reducer, self, num_coeffs, output, semaphore); } }; #ifdef EIGEN_HAS_GPU_FP16 template struct FullReductionLauncher { static void run(const Self&, Op&, const GpuDevice&, half*, typename Self::Index) { gpu_assert(false && "Should not be called since there is no packet accessor"); } }; template struct FullReductionLauncher { static void run(const Self& self, Op& reducer, const GpuDevice& device, half* output, typename Self::Index num_coeffs) { typedef typename Self::Index Index; typedef typename packet_traits::type PacketType; const int block_size = 256; const int num_per_thread = 128; const int num_blocks = divup(num_coeffs, block_size * num_per_thread); PacketType* scratch = static_cast(device.scratchpad()); // half2* scratch = static_cast(device.scratchpad()); if (num_blocks > 1) { // We initialize the output and the scrathpad outside the reduction kernel when we can't be sure that there // won't be a race conditions between multiple thread blocks. LAUNCH_GPU_KERNEL((ReductionInitFullReduxKernelHalfFloat), 1, 1, 0, device, reducer, self, num_coeffs, scratch); } LAUNCH_GPU_KERNEL((FullReductionKernelHalfFloat), num_blocks, block_size, 0, device, reducer, self, num_coeffs, output, scratch); if (num_blocks > 1) { LAUNCH_GPU_KERNEL((ReductionCleanupKernelHalfFloat), 1, 1, 0, device, reducer, output, scratch); } } }; #endif // EIGEN_HAS_GPU_FP16 template struct FullReducer { // Unfortunately nvidia doesn't support well exotic types such as complex, // so reduce the scope of the optimized version of the code to the simple cases // of doubles, floats and half floats #ifdef EIGEN_HAS_GPU_FP16 static const bool HasOptimizedImplementation = !Self::ReducerTraits::IsStateful && (internal::is_same::value || internal::is_same::value || (internal::is_same::value && reducer_traits::PacketAccess)); #else // EIGEN_HAS_GPU_FP16 static const bool HasOptimizedImplementation = !Self::ReducerTraits::IsStateful && (internal::is_same::value || internal::is_same::value); #endif // EIGEN_HAS_GPU_FP16 template static void run(const Self& self, Op& reducer, const GpuDevice& device, OutputType* output) { gpu_assert(HasOptimizedImplementation && "Should only be called on doubles, floats or half floats"); const Index num_coeffs = array_prod(self.m_impl.dimensions()); // Don't crash when we're called with an input tensor of size 0. if (num_coeffs == 0) { return; } FullReductionLauncher::PacketAccess>::run(self, reducer, device, output, num_coeffs); } }; template __global__ EIGEN_HIP_LAUNCH_BOUNDS_1024 void InnerReductionKernel(Reducer reducer, const Self input, Index num_coeffs_to_reduce, Index num_preserved_coeffs, typename Self::CoeffReturnType* output) { #if (defined(EIGEN_HIP_DEVICE_COMPILE) && defined(__HIP_ARCH_HAS_WARP_SHUFFLE__)) || (EIGEN_CUDA_ARCH >= 300) typedef typename Self::CoeffReturnType Type; eigen_assert(blockDim.y == 1); eigen_assert(blockDim.z == 1); eigen_assert(gridDim.y == 1); eigen_assert(gridDim.z == 1); const int unroll_times = 16; eigen_assert(NumPerThread % unroll_times == 0); const Index input_col_blocks = divup(num_coeffs_to_reduce, blockDim.x * NumPerThread); const Index num_input_blocks = input_col_blocks * num_preserved_coeffs; const Index num_threads = blockDim.x * gridDim.x; const Index thread_id = blockIdx.x * blockDim.x + threadIdx.x; // Initialize the output values if they weren't initialized by the ReductionInitKernel if (gridDim.x == 1) { for (Index i = thread_id; i < num_preserved_coeffs; i += num_threads) { output[i] = reducer.initialize(); } __syncthreads(); } for (Index i = blockIdx.x; i < num_input_blocks; i += gridDim.x) { const Index row = i / input_col_blocks; if (row < num_preserved_coeffs) { const Index col_block = i % input_col_blocks; const Index col_begin = col_block * blockDim.x * NumPerThread + threadIdx.x; Type reduced_val = reducer.initialize(); for (Index j = 0; j < NumPerThread; j += unroll_times) { const Index last_col = col_begin + blockDim.x * (j + unroll_times - 1); if (last_col >= num_coeffs_to_reduce) { for (Index col = col_begin + blockDim.x * j; col < num_coeffs_to_reduce; col += blockDim.x) { const Type val = input.m_impl.coeff(row * num_coeffs_to_reduce + col); reducer.reduce(val, &reduced_val); } break; } else { // Faster version of the loop with no branches after unrolling. #pragma unroll for (int k = 0; k < unroll_times; ++k) { const Index col = col_begin + blockDim.x * (j + k); reducer.reduce(input.m_impl.coeff(row * num_coeffs_to_reduce + col), &reduced_val); } } } #pragma unroll for (int offset = warpSize/2; offset > 0; offset /= 2) { #if defined(EIGEN_HIPCC) // use std::is_floating_point to determine the type of reduced_val // This is needed because when Type == double, hipcc will give a "call to __shfl_down is ambguous" error // and list the float and int versions of __shfl_down as the candidate functions. if (std::is_floating_point::value) { reducer.reduce(__shfl_down(static_cast(reduced_val), offset), &reduced_val); } else { reducer.reduce(__shfl_down(static_cast(reduced_val), offset), &reduced_val); } #elif defined(EIGEN_CUDA_SDK_VER) && EIGEN_CUDA_SDK_VER < 90000 reducer.reduce(__shfl_down(reduced_val, offset), &reduced_val); #else reducer.reduce(__shfl_down_sync(0xFFFFFFFF, reduced_val, offset), &reduced_val); #endif } if ((threadIdx.x & (warpSize - 1)) == 0) { atomicReduce(&(output[row]), reduced_val, reducer); } } } #else // EIGEN_CUDA_ARCH >= 300 gpu_assert(0 && "Shouldn't be called on unsupported device"); #endif // EIGEN_CUDA_ARCH >= 300 } #ifdef EIGEN_HAS_GPU_FP16 template __global__ EIGEN_HIP_LAUNCH_BOUNDS_1024 void InnerReductionKernelHalfFloat(Reducer reducer, const Self input, Index num_coeffs_to_reduce, Index num_preserved_coeffs, half* output) { eigen_assert(blockDim.y == 1); eigen_assert(blockDim.z == 1); eigen_assert(gridDim.y == 1); eigen_assert(gridDim.z == 1); typedef typename packet_traits::type PacketType; const int packet_width = unpacket_traits::size; const int unroll_times = 16 / packet_width; eigen_assert(NumPerThread % unroll_times == 0); eigen_assert(unroll_times % 2 == 0); const Index input_col_blocks = divup(num_coeffs_to_reduce, blockDim.x * NumPerThread * 2); const Index num_input_blocks = divup(input_col_blocks * num_preserved_coeffs, 2); const Index num_threads = blockDim.x * gridDim.x; const Index thread_id = blockIdx.x * blockDim.x + threadIdx.x; // Initialize the output values if they weren't initialized by the ReductionInitKernel if (gridDim.x == 1) { Index i = packet_width * thread_id; for (; i + packet_width <= num_preserved_coeffs; i += packet_width * num_threads) { PacketType* poutput = reinterpret_cast(output + i); *poutput = reducer.template initializePacket(); } if (i < num_preserved_coeffs) { output[i] = reducer.initialize(); } __syncthreads(); } for (Index i = blockIdx.x; i < num_input_blocks; i += gridDim.x) { const Index row = 2 * (i / input_col_blocks); // everybody takes 2 rows if (row + 1 < num_preserved_coeffs) { const Index col_block = i % input_col_blocks; const Index col_begin = packet_width * (col_block * blockDim.x * NumPerThread + threadIdx.x); PacketType reduced_val1 = reducer.template initializePacket(); PacketType reduced_val2 = reducer.template initializePacket(); for (Index j = 0; j < NumPerThread; j += unroll_times) { const Index last_col = col_begin + blockDim.x * (j + unroll_times - 1) * packet_width; if (last_col >= num_coeffs_to_reduce) { Index col = col_begin + blockDim.x * j; for (; col + packet_width <= num_coeffs_to_reduce; col += blockDim.x) { const PacketType val1 = input.m_impl.template packet( row * num_coeffs_to_reduce + col); reducer.reducePacket(val1, &reduced_val1); const PacketType val2 = input.m_impl.template packet( (row + 1) * num_coeffs_to_reduce + col); reducer.reducePacket(val2, &reduced_val2); } if (col < num_coeffs_to_reduce) { PacketType r1 = reducer.template initializePacket(); PacketType r2 = reducer.template initializePacket(); half2* hr1 = reinterpret_cast(&r1); half2* hr2 = reinterpret_cast(&r2); while (col + 1 < num_coeffs_to_reduce) { *hr1 = __halves2half2( input.m_impl.coeff(row * num_coeffs_to_reduce + col), input.m_impl.coeff(row * num_coeffs_to_reduce + col + 1)); *hr2 = __halves2half2( input.m_impl.coeff((row + 1) * num_coeffs_to_reduce + col), input.m_impl.coeff((row + 1) * num_coeffs_to_reduce + col + 1)); hr1++; hr2++; col += 2; } if (col < num_coeffs_to_reduce) { // Peel; const half last1 = input.m_impl.coeff(row * num_coeffs_to_reduce + col); *hr1 = __halves2half2(last1, reducer.initialize()); const half last2 = input.m_impl.coeff((row + 1) * num_coeffs_to_reduce + col); *hr2 = __halves2half2(last2, reducer.initialize()); } reducer.reducePacket(r1, &reduced_val1); reducer.reducePacket(r2, &reduced_val2); } break; } else { // Faster version of the loop with no branches after unrolling. #pragma unroll for (int k = 0; k < unroll_times; ++k) { const Index col = col_begin + blockDim.x * (j + k) * packet_width; reducer.reducePacket(input.m_impl.template packet( row * num_coeffs_to_reduce + col), &reduced_val1); reducer.reducePacket(input.m_impl.template packet( (row + 1) * num_coeffs_to_reduce + col), &reduced_val2); } } } #pragma unroll for (int offset = warpSize/2; offset > 0; offset /= 2) { #if defined(EIGEN_HIPCC) PacketType r1; PacketType r2; half2* hr1 = reinterpret_cast(&r1); half2* hr2 = reinterpret_cast(&r2); half2* rv1 = reinterpret_cast(&reduced_val1); half2* rv2 = reinterpret_cast(&reduced_val2); for (int i = 0; i < packet_width / 2; i++) { // FIXME : remove this workaround once we have native half/half2 support for __shfl_down union { int i; half2 h; } wka_in1, wka_out1; wka_in1.h = rv1[i]; wka_out1.i = __shfl_down(wka_in1.i, offset, warpSize); hr1[i] = wka_out1.h; union { int i; half2 h; } wka_in2, wka_out2; wka_in2.h = rv2[i]; wka_out2.i = __shfl_down(wka_in2.i, offset, warpSize); hr2[i] = wka_out2.h; } reducer.reducePacket(r1, &reduced_val1); reducer.reducePacket(r2, &reduced_val2); #elif defined(EIGEN_CUDA_SDK_VER) && EIGEN_CUDA_SDK_VER < 90000 PacketType r1; PacketType r2; half2* hr1 = reinterpret_cast(&r1); half2* hr2 = reinterpret_cast(&r2); half2* rv1 = reinterpret_cast(&reduced_val1); half2* rv2 = reinterpret_cast(&reduced_val2); for (int i = 0; i < packet_width / 2; i++) { hr1[i] = __shfl_down(rv1[i], offset, warpSize); hr2[i] = __shfl_down(rv2[i], offset, warpSize); } reducer.reducePacket(r1, &reduced_val1); reducer.reducePacket(r2, &reduced_val2); #else PacketType r1; PacketType r2; half2* hr1 = reinterpret_cast(&r1); half2* hr2 = reinterpret_cast(&r2); half2* rr1 = reinterpret_cast(&reduced_val1); half2* rr2 = reinterpret_cast(&reduced_val2); for (int i = 0; i < packet_width / 2; i++) { hr1[i] = __shfl_down_sync(0xFFFFFFFF, rr1[i], (unsigned)offset, warpSize); hr2[i] = __shfl_down_sync(0xFFFFFFFF, rr2[i], (unsigned)offset, warpSize); } reducer.reducePacket(r1, &reduced_val1); reducer.reducePacket(r2, &reduced_val2); #endif } half2* rv1 = reinterpret_cast(&reduced_val1); half2* rv2 = reinterpret_cast(&reduced_val2); half2 val; if (packet_width > 2) { reducer.reducePacket(rv1[2], rv1); reducer.reducePacket(rv1[3], rv1 + 1); reducer.reducePacket(rv1[1], rv1); reducer.reducePacket(rv2[2], rv2); reducer.reducePacket(rv2[3], rv2 + 1); reducer.reducePacket(rv2[1], rv2); } half val1 = __low2half(*rv1); reducer.reduce(__high2half(*rv1), &val1); half val2 = __low2half(*rv2); reducer.reduce(__high2half(*rv2), &val2); val = __halves2half2(val1, val2); if ((threadIdx.x & (warpSize - 1)) == 0) { half* loc = output + row; atomicReduce((half2*)loc, val, reducer); } } } } #endif // EIGEN_HAS_GPU_FP16 template struct InnerReductionLauncher { static EIGEN_DEVICE_FUNC bool run(const Self&, Op&, const GpuDevice&, OutputType*, typename Self::Index, typename Self::Index) { gpu_assert(false && "Should only be called to reduce doubles, floats and half floats on a gpu device"); return true; } }; // Specialization for float and double template struct InnerReductionLauncher< Self, Op, OutputType, PacketAccess, typename internal::enable_if< internal::is_same::value || internal::is_same::value, void>::type> { static bool run(const Self& self, Op& reducer, const GpuDevice& device, OutputType* output, typename Self::Index num_coeffs_to_reduce, typename Self::Index num_preserved_vals) { typedef typename Self::Index Index; const Index num_coeffs = num_coeffs_to_reduce * num_preserved_vals; const int block_size = 256; const int num_per_thread = 128; const int dyn_blocks = divup(num_coeffs, block_size * num_per_thread); const int max_blocks = device.getNumGpuMultiProcessors() * device.maxGpuThreadsPerMultiProcessor() / block_size; const int num_blocks = numext::mini(max_blocks, dyn_blocks); if (num_blocks > 1) { // We initialize the outputs outside the reduction kernel when we can't be sure that there // won't be a race conditions between multiple thread blocks. const int dyn_blocks = divup(num_preserved_vals, 1024); const int max_blocks = device.getNumGpuMultiProcessors() * device.maxGpuThreadsPerMultiProcessor() / 1024; const int num_blocks = numext::mini(max_blocks, dyn_blocks); LAUNCH_GPU_KERNEL((ReductionInitKernel), num_blocks, 1024, 0, device, reducer.initialize(), num_preserved_vals, output); } LAUNCH_GPU_KERNEL((InnerReductionKernel), num_blocks, block_size, 0, device, reducer, self, num_coeffs_to_reduce, num_preserved_vals, output); return false; } }; #ifdef EIGEN_HAS_GPU_FP16 template struct InnerReductionLauncher { static bool run(const Self&, Op&, const GpuDevice&, half*, typename Self::Index, typename Self::Index) { gpu_assert(false && "Should not be called since there is no packet accessor"); return true; } }; template struct InnerReductionLauncher { static bool run(const Self& self, Op& reducer, const GpuDevice& device, half* output, typename Self::Index num_coeffs_to_reduce, typename Self::Index num_preserved_vals) { typedef typename Self::Index Index; if (num_preserved_vals % 2 != 0) { // Not supported yet, revert to the slower code path return true; } const Index num_coeffs = num_coeffs_to_reduce * num_preserved_vals; const int block_size = /*256*/128; const int num_per_thread = /*128*/64; const int dyn_blocks = divup(num_coeffs, block_size * num_per_thread); const int max_blocks = device.getNumGpuMultiProcessors() * device.maxGpuThreadsPerMultiProcessor() / block_size; const int num_blocks = numext::mini(max_blocks, dyn_blocks); if (num_blocks > 1) { // We initialize the outputs outside the reduction kernel when we can't be sure that there // won't be a race conditions between multiple thread blocks. LAUNCH_GPU_KERNEL((ReductionInitKernelHalfFloat), 1, 1, 0, device, reducer, self, num_preserved_vals, output); } LAUNCH_GPU_KERNEL((InnerReductionKernelHalfFloat), num_blocks, block_size, 0, device, reducer, self, num_coeffs_to_reduce, num_preserved_vals, output); return false; } }; #endif // EIGEN_HAS_GPU_FP16 template struct InnerReducer { // Unfortunately nvidia doesn't support well exotic types such as complex, // so reduce the scope of the optimized version of the code to the simple case // of floats and half floats. #ifdef EIGEN_HAS_GPU_FP16 static const bool HasOptimizedImplementation = !Self::ReducerTraits::IsStateful && (internal::is_same::value || internal::is_same::value || (internal::is_same::value && reducer_traits::PacketAccess)); #else // EIGEN_HAS_GPU_FP16 static const bool HasOptimizedImplementation = !Self::ReducerTraits::IsStateful && (internal::is_same::value || internal::is_same::value); #endif // EIGEN_HAS_GPU_FP16 template static bool run(const Self& self, Op& reducer, const GpuDevice& device, OutputType* output, typename Self::Index num_coeffs_to_reduce, typename Self::Index num_preserved_vals) { gpu_assert(HasOptimizedImplementation && "Should only be called on doubles, floats or half floats"); const Index num_coeffs = array_prod(self.m_impl.dimensions()); // Don't crash when we're called with an input tensor of size 0. if (num_coeffs == 0) { return true; } // It's faster to use the usual code. if (num_coeffs_to_reduce <= 128) { return true; } return InnerReductionLauncher::PacketAccess>::run(self, reducer, device, output, num_coeffs_to_reduce, num_preserved_vals); } }; template __global__ EIGEN_HIP_LAUNCH_BOUNDS_1024 void OuterReductionKernel(Reducer reducer, const Self input, Index num_coeffs_to_reduce, Index num_preserved_coeffs, typename Self::CoeffReturnType* output) { const Index num_threads = blockDim.x * gridDim.x; const Index thread_id = blockIdx.x * blockDim.x + threadIdx.x; // Initialize the output values if they weren't initialized by the ReductionInitKernel if (gridDim.x == 1) { for (Index i = thread_id; i < num_preserved_coeffs; i += num_threads) { output[i] = reducer.initialize(); } __syncthreads(); } // Do the reduction. const Index max_iter = num_preserved_coeffs * divup(num_coeffs_to_reduce, NumPerThread); for (Index i = thread_id; i < max_iter; i += num_threads) { const Index input_col = i % num_preserved_coeffs; const Index input_row = (i / num_preserved_coeffs) * NumPerThread; typename Self::CoeffReturnType reduced_val = reducer.initialize(); const Index max_row = numext::mini(input_row + NumPerThread, num_coeffs_to_reduce); for (Index j = input_row; j < max_row; j++) { typename Self::CoeffReturnType val = input.m_impl.coeff(j * num_preserved_coeffs + input_col); reducer.reduce(val, &reduced_val); } atomicReduce(&(output[input_col]), reduced_val, reducer); } } template struct OuterReducer { // Unfortunately nvidia doesn't support well exotic types such as complex, // so reduce the scope of the optimized version of the code to the simple case // of floats. static const bool HasOptimizedImplementation = !Self::ReducerTraits::IsStateful && (internal::is_same::value || internal::is_same::value); template static #if !defined(EIGEN_HIPCC) // FIXME : leaving this EIGEN_DEVICE_FUNC in, results in the following runtime error // (in the cxx11_tensor_reduction_gpu test) // // terminate called after throwing an instance of 'std::runtime_error' // what(): No device code available for function: _ZN5Eigen8internal20OuterReductionKernelIL... // // don't know why this happens (and why is it a runtime error instead of a compile time error) // // this will be fixed by HIP PR#457 EIGEN_DEVICE_FUNC #endif bool run(const Self&, Op&, const Device&, OutputType*, typename Self::Index, typename Self::Index) { gpu_assert(false && "Should only be called to reduce doubles or floats on a gpu device"); return true; } static bool run(const Self& self, Op& reducer, const GpuDevice& device, float* output, typename Self::Index num_coeffs_to_reduce, typename Self::Index num_preserved_vals) { typedef typename Self::Index Index; // It's faster to use the usual code. if (num_coeffs_to_reduce <= 32) { return true; } const Index num_coeffs = num_coeffs_to_reduce * num_preserved_vals; const int block_size = 256; const int num_per_thread = 16; const int dyn_blocks = divup(num_coeffs, block_size * num_per_thread); const int max_blocks = device.getNumGpuMultiProcessors() * device.maxGpuThreadsPerMultiProcessor() / block_size; const int num_blocks = numext::mini(max_blocks, dyn_blocks); if (num_blocks > 1) { // We initialize the outputs in the reduction kernel itself when we don't have to worry // about race conditions between multiple thread blocks. const int dyn_blocks = divup(num_preserved_vals, 1024); const int max_blocks = device.getNumGpuMultiProcessors() * device.maxGpuThreadsPerMultiProcessor() / 1024; const int num_blocks = numext::mini(max_blocks, dyn_blocks); LAUNCH_GPU_KERNEL((ReductionInitKernel), num_blocks, 1024, 0, device, reducer.initialize(), num_preserved_vals, output); } LAUNCH_GPU_KERNEL((OuterReductionKernel), num_blocks, block_size, 0, device, reducer, self, num_coeffs_to_reduce, num_preserved_vals, output); return false; } }; #endif // defined(EIGEN_USE_GPU) && defined(EIGEN_GPUCC) } // end namespace internal } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_REDUCTION_GPU_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorConvolution.h0000644000176200001440000013705614562417221026715 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_CONVOLUTION_H #define EIGEN_CXX11_TENSOR_TENSOR_CONVOLUTION_H namespace Eigen { /** \class TensorConvolution * \ingroup CXX11_Tensor_Module * * \brief Tensor convolution class. * * */ namespace internal { template class IndexMapper { public: IndexMapper(const InputDims& input_dims, const array& kernel_dims, const array& indices) { array dimensions = input_dims; for (int i = 0; i < NumKernelDims; ++i) { const Index index = indices[i]; const Index input_dim = input_dims[index]; const Index kernel_dim = kernel_dims[i]; const Index result_dim = input_dim - kernel_dim + 1; dimensions[index] = result_dim; } array inputStrides; array outputStrides; if (static_cast(Layout) == static_cast(ColMajor)) { inputStrides[0] = 1; outputStrides[0] = 1; for (int i = 1; i < NumDims; ++i) { inputStrides[i] = inputStrides[i-1] * input_dims[i-1]; outputStrides[i] = outputStrides[i-1] * dimensions[i-1]; } } else { inputStrides[NumDims - 1] = 1; outputStrides[NumDims - 1] = 1; for (int i = static_cast(NumDims) - 2; i >= 0; --i) { inputStrides[i] = inputStrides[i + 1] * input_dims[i + 1]; outputStrides[i] = outputStrides[i + 1] * dimensions[i + 1]; } } array gpuInputDimensions; array gpuOutputDimensions; array tmp = dimensions; array ordering; const size_t offset = static_cast(Layout) == static_cast(ColMajor) ? 0 : NumDims - NumKernelDims; for (int i = 0; i < NumKernelDims; ++i) { const Index index = i + offset; ordering[index] = indices[i]; tmp[indices[i]] = -1; gpuInputDimensions[index] = input_dims[indices[i]]; gpuOutputDimensions[index] = dimensions[indices[i]]; } int written = static_cast(Layout) == static_cast(ColMajor) ? NumKernelDims : 0; for (int i = 0; i < NumDims; ++i) { if (tmp[i] >= 0) { ordering[written] = i; gpuInputDimensions[written] = input_dims[i]; gpuOutputDimensions[written] = dimensions[i]; ++written; } } for (int i = 0; i < NumDims; ++i) { m_inputStrides[i] = inputStrides[ordering[i]]; m_outputStrides[i] = outputStrides[ordering[i]]; } if (static_cast(Layout) == static_cast(ColMajor)) { for (int i = 0; i < NumDims; ++i) { if (i > NumKernelDims) { m_gpuInputStrides[i] = m_gpuInputStrides[i - 1] * gpuInputDimensions[i - 1]; m_gpuOutputStrides[i] = m_gpuOutputStrides[i - 1] * gpuOutputDimensions[i - 1]; } else { m_gpuInputStrides[i] = 1; m_gpuOutputStrides[i] = 1; } } } else { for (int i = NumDims - 1; i >= 0; --i) { if (static_cast(i + 1) < offset) { m_gpuInputStrides[i] = m_gpuInputStrides[i + 1] * gpuInputDimensions[i + 1]; m_gpuOutputStrides[i] = m_gpuOutputStrides[i + 1] * gpuOutputDimensions[i + 1]; } else { m_gpuInputStrides[i] = 1; m_gpuOutputStrides[i] = 1; } } } } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Index mapGpuInputPlaneToTensorInputOffset(Index p) const { Index inputIndex = 0; if (static_cast(Layout) == static_cast(ColMajor)) { for (int d = NumDims - 1; d > NumKernelDims; --d) { const Index idx = p / m_gpuInputStrides[d]; inputIndex += idx * m_inputStrides[d]; p -= idx * m_gpuInputStrides[d]; } inputIndex += p * m_inputStrides[NumKernelDims]; } else { std::ptrdiff_t limit = 0; if (NumKernelDims < NumDims) { limit = NumDims - NumKernelDims - 1; } for (int d = 0; d < limit; ++d) { const Index idx = p / m_gpuInputStrides[d]; inputIndex += idx * m_inputStrides[d]; p -= idx * m_gpuInputStrides[d]; } inputIndex += p * m_inputStrides[limit]; } return inputIndex; } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Index mapGpuOutputPlaneToTensorOutputOffset(Index p) const { Index outputIndex = 0; if (static_cast(Layout) == static_cast(ColMajor)) { for (int d = NumDims - 1; d > NumKernelDims; --d) { const Index idx = p / m_gpuOutputStrides[d]; outputIndex += idx * m_outputStrides[d]; p -= idx * m_gpuOutputStrides[d]; } outputIndex += p * m_outputStrides[NumKernelDims]; } else { std::ptrdiff_t limit = 0; if (NumKernelDims < NumDims) { limit = NumDims - NumKernelDims - 1; } for (int d = 0; d < limit; ++d) { const Index idx = p / m_gpuOutputStrides[d]; outputIndex += idx * m_outputStrides[d]; p -= idx * m_gpuOutputStrides[d]; } outputIndex += p * m_outputStrides[limit]; } return outputIndex; } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Index mapGpuInputKernelToTensorInputOffset(Index i) const { const size_t offset = static_cast(Layout) == static_cast(ColMajor) ? 0 : NumDims - NumKernelDims; return i * m_inputStrides[offset]; } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Index mapGpuOutputKernelToTensorOutputOffset(Index i) const { const size_t offset = static_cast(Layout) == static_cast(ColMajor) ? 0 : NumDims - NumKernelDims; return i * m_outputStrides[offset]; } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Index mapGpuInputKernelToTensorInputOffset(Index i, Index j) const { const size_t offset = static_cast(Layout) == static_cast(ColMajor) ? 0 : NumDims - NumKernelDims; return i * m_inputStrides[offset] + j * m_inputStrides[offset + 1]; } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Index mapGpuOutputKernelToTensorOutputOffset(Index i, Index j) const { const size_t offset = static_cast(Layout) == static_cast(ColMajor) ? 0 : NumDims - NumKernelDims; return i * m_outputStrides[offset] + j * m_outputStrides[offset + 1]; } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Index mapGpuInputKernelToTensorInputOffset(Index i, Index j, Index k) const { const size_t offset = static_cast(Layout) == static_cast(ColMajor) ? 0 : NumDims - NumKernelDims; return i * m_inputStrides[offset] + j * m_inputStrides[offset + 1] + k * m_inputStrides[offset + 2]; } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Index mapGpuOutputKernelToTensorOutputOffset(Index i, Index j, Index k) const { const size_t offset = static_cast(Layout) == static_cast(ColMajor) ? 0 : NumDims - NumKernelDims; return i * m_outputStrides[offset] + j * m_outputStrides[offset + 1] + k * m_outputStrides[offset + 2]; } private: static const int NumDims = internal::array_size::value; array m_inputStrides; array m_outputStrides; array m_gpuInputStrides; array m_gpuOutputStrides; }; template struct traits > { // Type promotion to handle the case where the types of the lhs and the rhs are different. typedef typename promote_storage_type::ret Scalar; typedef typename promote_storage_type::StorageKind, typename traits::StorageKind>::ret StorageKind; typedef typename promote_index_type::Index, typename traits::Index>::type Index; typedef typename InputXprType::Nested LhsNested; typedef typename KernelXprType::Nested RhsNested; typedef typename remove_reference::type _LhsNested; typedef typename remove_reference::type _RhsNested; static const int NumDimensions = traits::NumDimensions; static const int Layout = traits::Layout; typedef typename conditional::val, typename traits::PointerType, typename traits::PointerType>::type PointerType; enum { Flags = 0 }; }; template struct eval, Eigen::Dense> { typedef const TensorConvolutionOp& type; }; template struct nested, 1, typename eval >::type> { typedef TensorConvolutionOp type; }; } // end namespace internal template class TensorConvolutionOp : public TensorBase, ReadOnlyAccessors> { public: typedef typename Eigen::internal::traits::Scalar Scalar; typedef typename Eigen::NumTraits::Real RealScalar; typedef typename internal::promote_storage_type::ret CoeffReturnType; typedef typename Eigen::internal::nested::type Nested; typedef typename Eigen::internal::traits::StorageKind StorageKind; typedef typename Eigen::internal::traits::Index Index; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorConvolutionOp(const InputXprType& input, const KernelXprType& kernel, const Indices& dims) : m_input_xpr(input), m_kernel_xpr(kernel), m_indices(dims) {} EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Indices& indices() const { return m_indices; } /** \returns the nested expressions */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename internal::remove_all::type& inputExpression() const { return m_input_xpr; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename internal::remove_all::type& kernelExpression() const { return m_kernel_xpr; } protected: typename InputXprType::Nested m_input_xpr; typename KernelXprType::Nested m_kernel_xpr; const Indices m_indices; }; template struct TensorEvaluator, Device> { typedef TensorConvolutionOp XprType; static const int NumDims = internal::array_size::Dimensions>::value; static const int NumKernelDims = internal::array_size::value; typedef typename XprType::Index Index; typedef DSizes Dimensions; typedef typename XprType::Scalar Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename PacketType::type PacketReturnType; static const int PacketSize = PacketType::size; typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; enum { IsAligned = int(TensorEvaluator::IsAligned) & int(TensorEvaluator::IsAligned), PacketAccess = int(TensorEvaluator::PacketAccess) & int(TensorEvaluator::PacketAccess), BlockAccess = false, PreferBlockAccess = false, Layout = TensorEvaluator::Layout, CoordAccess = false, // to be implemented RawAccess = false }; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockNotImplemented TensorBlock; //===--------------------------------------------------------------------===// EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device) : m_inputImpl(op.inputExpression(), device), m_kernelImpl(op.kernelExpression(), device), m_kernelArg(op.kernelExpression()), m_kernel(NULL), m_local_kernel(false), m_device(device) { EIGEN_STATIC_ASSERT((static_cast(TensorEvaluator::Layout) == static_cast(TensorEvaluator::Layout)), YOU_MADE_A_PROGRAMMING_MISTAKE); const typename TensorEvaluator::Dimensions& input_dims = m_inputImpl.dimensions(); const typename TensorEvaluator::Dimensions& kernel_dims = m_kernelImpl.dimensions(); if (static_cast(Layout) == static_cast(ColMajor)) { m_inputStride[0] = 1; for (int i = 1; i < NumDims; ++i) { m_inputStride[i] = m_inputStride[i - 1] * input_dims[i - 1]; } } else { m_inputStride[NumDims - 1] = 1; for (int i = NumDims - 2; i >= 0; --i) { m_inputStride[i] = m_inputStride[i + 1] * input_dims[i + 1]; } } m_dimensions = m_inputImpl.dimensions(); if (static_cast(Layout) == static_cast(ColMajor)) { for (int i = 0; i < NumKernelDims; ++i) { const Index index = op.indices()[i]; const Index input_dim = input_dims[index]; const Index kernel_dim = kernel_dims[i]; const Index result_dim = input_dim - kernel_dim + 1; m_dimensions[index] = result_dim; if (i > 0) { m_kernelStride[i] = m_kernelStride[i - 1] * kernel_dims[i - 1]; } else { m_kernelStride[0] = 1; } m_indexStride[i] = m_inputStride[index]; } m_outputStride[0] = 1; for (int i = 1; i < NumDims; ++i) { m_outputStride[i] = m_outputStride[i - 1] * m_dimensions[i - 1]; } } else { for (int i = NumKernelDims - 1; i >= 0; --i) { const Index index = op.indices()[i]; const Index input_dim = input_dims[index]; const Index kernel_dim = kernel_dims[i]; const Index result_dim = input_dim - kernel_dim + 1; m_dimensions[index] = result_dim; if (i < NumKernelDims - 1) { m_kernelStride[i] = m_kernelStride[i + 1] * kernel_dims[i + 1]; } else { m_kernelStride[NumKernelDims - 1] = 1; } m_indexStride[i] = m_inputStride[index]; } m_outputStride[NumDims - 1] = 1; for (int i = NumDims - 2; i >= 0; --i) { m_outputStride[i] = m_outputStride[i + 1] * m_dimensions[i + 1]; } } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; } EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(Scalar*) { m_inputImpl.evalSubExprsIfNeeded(NULL); preloadKernel(); return true; } EIGEN_STRONG_INLINE void cleanup() { m_inputImpl.cleanup(); if (m_local_kernel) { m_device.deallocate((void*)m_kernel); m_local_kernel = false; } m_kernel = NULL; } void evalTo(typename XprType::Scalar* buffer) { evalSubExprsIfNeeded(NULL); for (int i = 0; i < dimensions().TotalSize(); ++i) { buffer[i] += coeff(i); } cleanup(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { CoeffReturnType result = CoeffReturnType(0); convolve(firstInput(index), 0, NumKernelDims-1, result); return result; } template EIGEN_DEVICE_FUNC PacketReturnType packet(const Index index) const { Index indices[2] = {index, index+PacketSize-1}; Index startInputs[2] = {0, 0}; if (static_cast(Layout) == static_cast(ColMajor)) { for (int i = NumDims - 1; i > 0; --i) { const Index idx0 = indices[0] / m_outputStride[i]; const Index idx1 = indices[1] / m_outputStride[i]; startInputs[0] += idx0 * m_inputStride[i]; startInputs[1] += idx1 * m_inputStride[i]; indices[0] -= idx0 * m_outputStride[i]; indices[1] -= idx1 * m_outputStride[i]; } } else { for (int i = 0; i < NumDims - 1; ++i) { const Index idx0 = indices[0] / m_outputStride[i]; const Index idx1 = indices[1] / m_outputStride[i]; startInputs[0] += idx0 * m_inputStride[i]; startInputs[1] += idx1 * m_inputStride[i]; indices[0] -= idx0 * m_outputStride[i]; indices[1] -= idx1 * m_outputStride[i]; } } startInputs[0] += indices[0]; startInputs[1] += indices[1]; if (startInputs[1]-startInputs[0] == PacketSize-1) { PacketReturnType result = internal::pset1(0); convolvePacket(startInputs[0], 0, NumKernelDims-1, result); return result; } else { EIGEN_ALIGN_MAX Scalar data[PacketSize]; data[0] = Scalar(0); convolve(startInputs[0], 0, NumKernelDims-1, data[0]); for (int i = 1; i < PacketSize-1; ++i) { data[i] = Scalar(0); convolve(firstInput(index+i), 0, NumKernelDims-1, data[i]); } data[PacketSize-1] = Scalar(0); convolve(startInputs[1], 0, NumKernelDims-1, data[PacketSize-1]); return internal::pload(data); } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost costPerCoeff(bool vectorized) const { const double kernel_size = m_kernelImpl.dimensions().TotalSize(); // We ignore the use of fused multiply-add. const double convolve_compute_cost = TensorOpCost::AddCost() + TensorOpCost::MulCost(); const double firstIndex_compute_cost = NumDims * (2 * TensorOpCost::AddCost() + 2 * TensorOpCost::MulCost() + TensorOpCost::DivCost()); return TensorOpCost(0, 0, firstIndex_compute_cost, vectorized, PacketSize) + kernel_size * (m_inputImpl.costPerCoeff(vectorized) + m_kernelImpl.costPerCoeff(vectorized) + TensorOpCost(0, 0, convolve_compute_cost, vectorized, PacketSize)); } EIGEN_DEVICE_FUNC EvaluatorPointerType data() const { return NULL; } private: EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index firstInput(Index index) const { Index startInput = 0; if (static_cast(Layout) == static_cast(ColMajor)) { for (int i = NumDims - 1; i > 0; --i) { const Index idx = index / m_outputStride[i]; startInput += idx * m_inputStride[i]; index -= idx * m_outputStride[i]; } } else { for (int i = 0; i < NumDims - 1; ++i) { const Index idx = index / m_outputStride[i]; startInput += idx * m_inputStride[i]; index -= idx * m_outputStride[i]; } } startInput += index; return startInput; } EIGEN_DEVICE_FUNC void convolve(Index firstIndex, Index firstKernel, int DimIndex, CoeffReturnType& accum) const { for (int j = 0; j < m_kernelImpl.dimensions()[DimIndex]; ++j) { const Index input = firstIndex + j * m_indexStride[DimIndex]; const Index kernel = firstKernel + j * m_kernelStride[DimIndex]; if (DimIndex > 0) { convolve(input, kernel, DimIndex-1, accum); } else { accum += m_inputImpl.coeff(input) * m_kernel[kernel]; } } } template EIGEN_DEVICE_FUNC void convolvePacket(Index firstIndex, Index firstKernel, int DimIndex, Packet& accum) const { for (int j = 0; j < m_kernelImpl.dimensions()[DimIndex]; ++j) { const Index input = firstIndex + j * m_indexStride[DimIndex]; const Index kernel = firstKernel + j * m_kernelStride[DimIndex]; if (DimIndex > 0) { convolvePacket(input, kernel, DimIndex-1, accum); } else { accum = internal::pmadd(m_inputImpl.template packet(input), internal::pset1(m_kernel[kernel]), accum); } } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void preloadKernel() { // Don't make a local copy of the kernel unless we have to (i.e. it's an // expression that needs to be evaluated) const Scalar* in_place = m_kernelImpl.data(); if (in_place) { m_kernel = in_place; m_local_kernel = false; } else { size_t kernel_sz = m_kernelImpl.dimensions().TotalSize() * sizeof(Scalar); Scalar* local = (Scalar*)m_device.allocate_temp(kernel_sz); typedef TensorEvalToOp EvalTo; EvalTo evalToTmp(local, m_kernelArg); const bool Vectorize = internal::IsVectorizable::value; internal::TensorExecutor::run(evalToTmp, m_device); m_kernel = local; m_local_kernel = true; } } array m_inputStride; array m_outputStride; array m_indexStride; array m_kernelStride; TensorEvaluator m_inputImpl; TensorEvaluator m_kernelImpl; Dimensions m_dimensions; KernelArgType m_kernelArg; const Scalar* m_kernel; bool m_local_kernel; const Device EIGEN_DEVICE_REF m_device; }; // Use an optimized implementation of the evaluation code for GPUs whenever possible. #if defined(EIGEN_USE_GPU) && defined(EIGEN_GPUCC) template struct GetKernelSize { EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE int operator() (const int /*kernelSize*/) const { return StaticKernelSize; } }; template <> struct GetKernelSize { EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE int operator() (const int kernelSize) const { return kernelSize; } }; template __global__ EIGEN_HIP_LAUNCH_BOUNDS_1024 void EigenConvolutionKernel1D( InputEvaluator eval, const internal::IndexMapper indexMapper, const float* __restrict kernel, const int numPlanes, const int numX, const int maxX, const int kernelSize, float* buffer) { #if defined(EIGEN_HIPCC) HIP_DYNAMIC_SHARED(float, s) #else extern __shared__ float s[]; #endif const int first_x = blockIdx.x * maxX; const int last_x = (first_x + maxX < numX ? first_x + maxX : numX) - 1; const int num_x_input = last_x - first_x + GetKernelSize()(kernelSize); const int num_x_output = last_x - first_x + 1; const int first_plane = blockIdx.y * blockDim.y; const int plane_stride = blockDim.y * gridDim.y; for (int p = first_plane + threadIdx.y; p < numPlanes; p += plane_stride) { // Load inputs to shared memory const int plane_input_offset = indexMapper.mapGpuInputPlaneToTensorInputOffset(p); const int plane_kernel_offset = threadIdx.y * num_x_input; #pragma unroll for (int i = threadIdx.x; i < num_x_input; i += blockDim.x) { const int tensor_index = plane_input_offset + indexMapper.mapGpuInputKernelToTensorInputOffset(i+first_x); s[i + plane_kernel_offset] = eval.coeff(tensor_index); } __syncthreads(); // Compute the convolution const int plane_output_offset = indexMapper.mapGpuOutputPlaneToTensorOutputOffset(p); #pragma unroll for (int i = threadIdx.x; i < num_x_output; i += blockDim.x) { const int kernel_offset = plane_kernel_offset + i; float result = 0.0f; #pragma unroll for (int k = 0; k < GetKernelSize()(kernelSize); ++k) { result += s[k + kernel_offset] * kernel[k]; } const int tensor_index = plane_output_offset + indexMapper.mapGpuOutputKernelToTensorOutputOffset(i+first_x); buffer[tensor_index] = result; } __syncthreads(); } }; template __global__ EIGEN_HIP_LAUNCH_BOUNDS_1024 void EigenConvolutionKernel2D( InputEvaluator eval, const internal::IndexMapper indexMapper, const float* __restrict kernel, const int numPlanes, const int numX, const int maxX, const int numY, const int maxY, const int kernelSizeX, const int kernelSizeY, float* buffer) { #if defined(EIGEN_HIPCC) HIP_DYNAMIC_SHARED(float, s) #else extern __shared__ float s[]; #endif const int first_x = blockIdx.x * maxX; const int last_x = (first_x + maxX < numX ? first_x + maxX : numX) - 1; const int num_x_input = last_x - first_x + GetKernelSize()(kernelSizeX); const int num_x_output = last_x - first_x + 1; const int first_y = blockIdx.y * maxY; const int last_y = (first_y + maxY < numY ? first_y + maxY : numY) - 1; const int num_y_input = last_y - first_y + GetKernelSize()(kernelSizeY); const int num_y_output = last_y - first_y + 1; const int first_plane = blockIdx.z * blockDim.z; const int plane_stride = blockDim.z * gridDim.z; for (int p = first_plane + threadIdx.z; p < numPlanes; p += plane_stride) { const int plane_input_offset = indexMapper.mapGpuInputPlaneToTensorInputOffset(p); const int plane_kernel_offset = threadIdx.z * num_y_input; // Load inputs to shared memory #pragma unroll for (int j = threadIdx.y; j < num_y_input; j += blockDim.y) { const int input_offset = num_x_input * (j + plane_kernel_offset); #pragma unroll for (int i = threadIdx.x; i < num_x_input; i += blockDim.x) { const int tensor_index = plane_input_offset + indexMapper.mapGpuInputKernelToTensorInputOffset(i+first_x, j+first_y); s[i + input_offset] = eval.coeff(tensor_index); } } __syncthreads(); // Convolution const int plane_output_offset = indexMapper.mapGpuOutputPlaneToTensorOutputOffset(p); #pragma unroll for (int j = threadIdx.y; j < num_y_output; j += blockDim.y) { #pragma unroll for (int i = threadIdx.x; i < num_x_output; i += blockDim.x) { float result = 0.0f; #pragma unroll for (int l = 0; l < GetKernelSize()(kernelSizeY); ++l) { const int kernel_offset = kernelSizeX * l; const int input_offset = i + num_x_input * (j + l + plane_kernel_offset); #pragma unroll for (int k = 0; k < GetKernelSize()(kernelSizeX); ++k) { result += s[k + input_offset] * kernel[k + kernel_offset]; } } const int tensor_index = plane_output_offset + indexMapper.mapGpuOutputKernelToTensorOutputOffset(i+first_x, j+first_y); buffer[tensor_index] = result; } } __syncthreads(); } }; template __global__ EIGEN_HIP_LAUNCH_BOUNDS_1024 void EigenConvolutionKernel3D( InputEvaluator eval, const internal::IndexMapper indexMapper, const float* __restrict kernel, const size_t numPlanes, const size_t numX, const size_t maxX, const size_t numY, const size_t maxY, const size_t numZ, const size_t maxZ, const size_t kernelSizeX, const size_t kernelSizeY, const size_t kernelSizeZ, float* buffer) { #if defined(EIGEN_HIPCC) HIP_DYNAMIC_SHARED(float, s) #else extern __shared__ float s[]; #endif // Load inputs to shared memory const int first_x = blockIdx.x * maxX; const int last_x = (first_x + maxX < numX ? first_x + maxX : numX) - 1; const int num_x_input = last_x - first_x + kernelSizeX; const int first_y = blockIdx.y * maxY; const int last_y = (first_y + maxY < numY ? first_y + maxY : numY) - 1; const int num_y_input = last_y - first_y + kernelSizeY; const int first_z = blockIdx.z * maxZ; const int last_z = (first_z + maxZ < numZ ? first_z + maxZ : numZ) - 1; const int num_z_input = last_z - first_z + kernelSizeZ; for (int p = 0; p < numPlanes; ++p) { const int plane_input_offset = indexMapper.mapGpuInputPlaneToTensorInputOffset(p); const int plane_kernel_offset = 0; for (int k = threadIdx.z; k < num_z_input; k += blockDim.z) { for (int j = threadIdx.y; j < num_y_input; j += blockDim.y) { for (int i = threadIdx.x; i < num_x_input; i += blockDim.x) { const int tensor_index = plane_input_offset + indexMapper.mapGpuInputKernelToTensorInputOffset(i+first_x, j+first_y, k+first_z); s[i + num_x_input * (j + num_y_input * (k + plane_kernel_offset))] = eval.coeff(tensor_index); } } } __syncthreads(); // Convolution const int num_z_output = last_z - first_z + 1; const int num_y_output = last_y - first_y + 1; const int num_x_output = last_x - first_x + 1; const int plane_output_offset = indexMapper.mapGpuOutputPlaneToTensorOutputOffset(p); for (int k = threadIdx.z; k < num_z_output; k += blockDim.z) { for (int j = threadIdx.y; j < num_y_output; j += blockDim.y) { for (int i = threadIdx.x; i < num_x_output; i += blockDim.x) { float result = 0.0f; for (int n = 0; n < kernelSizeZ; ++n) { for (int m = 0; m < kernelSizeY; ++m) { for (int l = 0; l < kernelSizeX; ++l) { result += s[i + l + num_x_input * (j + m + num_y_input * (k + n + plane_kernel_offset))] * kernel[l + kernelSizeX * (m + kernelSizeY * n)]; } } } const int tensor_index = plane_output_offset + indexMapper.mapGpuOutputKernelToTensorOutputOffset(i+first_x, j+first_y, k+first_z); buffer[tensor_index] = result; } } } __syncthreads(); } }; template struct TensorEvaluator, GpuDevice> { typedef TensorConvolutionOp XprType; static const int NumDims = internal::array_size::Dimensions>::value; static const int NumKernelDims = internal::array_size::value; typedef typename XprType::Index Index; typedef DSizes Dimensions; typedef typename TensorEvaluator::Dimensions KernelDimensions; enum { IsAligned = TensorEvaluator::IsAligned & TensorEvaluator::IsAligned, PacketAccess = false, BlockAccess = false, PreferBlockAccess = false, Layout = TensorEvaluator::Layout, CoordAccess = false, // to be implemented RawAccess = false }; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockNotImplemented TensorBlock; //===--------------------------------------------------------------------===// TensorEvaluator(const XprType& op, const GpuDevice& device) : m_inputImpl(op.inputExpression(), device), m_kernelImpl(op.kernelExpression(), device), m_kernelArg(op.kernelExpression()), m_indices(op.indices()), m_buf(NULL), m_kernel(NULL), m_local_kernel(false), m_device(device) { EIGEN_STATIC_ASSERT((static_cast(TensorEvaluator::Layout) == static_cast(TensorEvaluator::Layout)), YOU_MADE_A_PROGRAMMING_MISTAKE); const typename TensorEvaluator::Dimensions& input_dims = m_inputImpl.dimensions(); const typename TensorEvaluator::Dimensions& kernel_dims = m_kernelImpl.dimensions(); m_dimensions = m_inputImpl.dimensions(); for (int i = 0; i < NumKernelDims; ++i) { const Index index = op.indices()[i]; const Index input_dim = input_dims[index]; const Index kernel_dim = kernel_dims[i]; const Index result_dim = input_dim - kernel_dim + 1; m_dimensions[index] = result_dim; } } typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename PacketType::type PacketReturnType; typedef typename InputArgType::Scalar Scalar; static const int PacketSize = internal::unpacket_traits::size; EIGEN_DEVICE_FUNC const Dimensions& dimensions() const { return m_dimensions; } EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(Scalar* data) { preloadKernel(); m_inputImpl.evalSubExprsIfNeeded(NULL); if (data) { executeEval(data); return false; } else { m_buf = (Scalar*)m_device.allocate(dimensions().TotalSize() * sizeof(Scalar)); executeEval(m_buf); return true; } } EIGEN_STRONG_INLINE void cleanup() { m_inputImpl.cleanup(); if (m_buf) { m_device.deallocate(m_buf); m_buf = NULL; } if (m_local_kernel) { m_device.deallocate((void*)m_kernel); m_local_kernel = false; } m_kernel = NULL; } EIGEN_STRONG_INLINE void preloadKernel() { // Don't make a local copy of the kernel unless we have to (i.e. it's an // expression that needs to be evaluated) const Scalar* in_place = m_kernelImpl.data(); if (in_place) { m_kernel = in_place; m_local_kernel = false; } else { size_t kernel_sz = m_kernelImpl.dimensions().TotalSize() * sizeof(Scalar); Scalar* local = (Scalar*)m_device.allocate(kernel_sz); typedef TensorEvalToOp EvalTo; EvalTo evalToTmp(local, m_kernelArg); const bool PacketAccess = internal::IsVectorizable::value; internal::TensorExecutor::run(evalToTmp, m_device); m_kernel = local; m_local_kernel = true; } } static unsigned int ceil(unsigned int num, unsigned int denom) { const unsigned int rounded_toward_zero = num / denom; if (num > rounded_toward_zero * denom) { return rounded_toward_zero + 1; } return rounded_toward_zero; } void executeEval(Scalar* data) const { typedef typename TensorEvaluator::Dimensions InputDims; const int maxSharedMem = m_device.sharedMemPerBlock(); const int maxThreadsPerBlock = m_device.maxGpuThreadsPerBlock(); const int maxBlocksPerProcessor = m_device.maxGpuThreadsPerMultiProcessor() / maxThreadsPerBlock; const int numMultiProcessors = m_device.getNumGpuMultiProcessors(); const int warpSize = 32; switch (NumKernelDims) { case 1: { const int kernel_size = m_kernelImpl.dimensions().TotalSize(); const int numX = dimensions()[m_indices[0]]; const int numP = dimensions().TotalSize() / numX; int maxX; dim3 block_size; const int single_stride_dim = static_cast(Layout) == static_cast(ColMajor) ? 0 : m_inputImpl.dimensions().rank() - 1; if (m_indices[0] == single_stride_dim) { // Maximum the reuse const int inner_dim = ((maxSharedMem / (sizeof(Scalar)) - kernel_size + 1 + 31) / 32) * 32; maxX = numext::mini(inner_dim, numX); const int maxP = numext::mini(maxSharedMem / ((kernel_size - 1 + maxX) * sizeof(Scalar)), numP); block_size.x = numext::mini(maxThreadsPerBlock, maxX); block_size.y = numext::mini(maxThreadsPerBlock / block_size.x, maxP); } else { // Read as much as possible alongside the inner most dimension, that is the plane const int inner_dim = maxSharedMem / ((warpSize + kernel_size) * sizeof(Scalar)); const int maxP = numext::mini(inner_dim, numP); maxX = numext::mini(maxSharedMem / (inner_dim * sizeof(Scalar)) - kernel_size + 1, numX); block_size.x = numext::mini(warpSize, maxX); block_size.y = numext::mini(maxThreadsPerBlock/block_size.x, maxP); } const int shared_mem = block_size.y * (maxX + kernel_size - 1) * sizeof(Scalar); gpu_assert(shared_mem <= maxSharedMem); const int num_x_blocks = ceil(numX, maxX); const int blocksPerProcessor = numext::mini(maxBlocksPerProcessor, maxSharedMem / shared_mem); const int num_y_blocks = ceil(numMultiProcessors * blocksPerProcessor, num_x_blocks); dim3 num_blocks(num_x_blocks, numext::mini(num_y_blocks, ceil(numP, block_size.y))); //cout << "launching 1D kernel with block_size.x: " << block_size.x << " block_size.y: " << block_size.y << " num_blocks.x: " << num_blocks.x << " num_blocks.y: " << num_blocks.y << " maxX: " << maxX << " shared_mem: " << shared_mem << " in stream " << m_device.stream() << endl; const array indices(m_indices[0]); const array kernel_dims(m_kernelImpl.dimensions()[0]); internal::IndexMapper indexMapper( m_inputImpl.dimensions(), kernel_dims, indices); switch(kernel_size) { case 4: { LAUNCH_GPU_KERNEL((EigenConvolutionKernel1D, Index, InputDims, 4>), num_blocks, block_size, shared_mem, m_device, m_inputImpl, indexMapper, m_kernel, numP, numX, maxX, 4, data); break; } case 7: { LAUNCH_GPU_KERNEL((EigenConvolutionKernel1D, Index, InputDims, 7>), num_blocks, block_size, shared_mem, m_device, m_inputImpl, indexMapper, m_kernel, numP, numX, maxX, 7, data); break; } default: { LAUNCH_GPU_KERNEL((EigenConvolutionKernel1D, Index, InputDims, Dynamic>), num_blocks, block_size, shared_mem, m_device, m_inputImpl, indexMapper, m_kernel, numP, numX, maxX, kernel_size, data); } } break; } case 2: { const int idxX = static_cast(Layout) == static_cast(ColMajor) ? 0 : 1; const int idxY = static_cast(Layout) == static_cast(ColMajor) ? 1 : 0; const int kernel_size_x = m_kernelImpl.dimensions()[idxX]; const int kernel_size_y = m_kernelImpl.dimensions()[idxY]; const int numX = dimensions()[m_indices[idxX]]; const int numY = dimensions()[m_indices[idxY]]; const int numP = dimensions().TotalSize() / (numX*numY); const float scaling_factor = sqrtf(static_cast(maxSharedMem) / (sizeof(Scalar) * kernel_size_y * kernel_size_x)); // Snap maxX to warp size int inner_dim = ((static_cast(scaling_factor * kernel_size_x) - kernel_size_x + 1 + 32) / 32) * 32; const int maxX = numext::mini(inner_dim, numX); const int maxY = numext::mini(maxSharedMem / (sizeof(Scalar) * (maxX + kernel_size_x - 1)) - kernel_size_y + 1, numY); const int maxP = numext::mini(maxSharedMem / ((kernel_size_x - 1 + maxX) * (kernel_size_y - 1 + maxY) * sizeof(Scalar)), numP); dim3 block_size; block_size.x = numext::mini(1024, maxX); block_size.y = numext::mini(1024/block_size.x, maxY); block_size.z = numext::mini(1024/(block_size.x*block_size.y), maxP); const int shared_mem = block_size.z * (maxX + kernel_size_x - 1) * (maxY + kernel_size_y - 1) * sizeof(Scalar); gpu_assert(shared_mem <= maxSharedMem); const int num_x_blocks = ceil(numX, maxX); const int num_y_blocks = ceil(numY, maxY); const int blocksPerProcessor = numext::mini(maxBlocksPerProcessor, maxSharedMem / shared_mem); const int num_z_blocks = ceil(numMultiProcessors * blocksPerProcessor, num_x_blocks * num_y_blocks); dim3 num_blocks(num_x_blocks, num_y_blocks, numext::mini(num_z_blocks, ceil(numP, block_size.z))); //cout << "launching 2D kernel with block_size.x: " << block_size.x << " block_size.y: " << block_size.y << " block_size.z: " << block_size.z << " num_blocks.x: " << num_blocks.x << " num_blocks.y: " << num_blocks.y << " num_blocks.z: " << num_blocks.z << " maxX: " << maxX << " maxY: " << maxY << " maxP: " << maxP << " shared_mem: " << shared_mem << " in stream " << m_device.stream() << endl; const array indices(m_indices[idxX], m_indices[idxY]); const array kernel_dims(m_kernelImpl.dimensions()[idxX], m_kernelImpl.dimensions()[idxY]); internal::IndexMapper indexMapper( m_inputImpl.dimensions(), kernel_dims, indices); switch (kernel_size_x) { case 4: { switch (kernel_size_y) { case 7: { LAUNCH_GPU_KERNEL((EigenConvolutionKernel2D, Index, InputDims, 4, 7>), num_blocks, block_size, shared_mem, m_device, m_inputImpl, indexMapper, m_kernel, numP, numX, maxX, numY, maxY, 4, 7, data); break; } default: { LAUNCH_GPU_KERNEL((EigenConvolutionKernel2D, Index, InputDims, 4, Dynamic>), num_blocks, block_size, shared_mem, m_device, m_inputImpl, indexMapper, m_kernel, numP, numX, maxX, numY, maxY, 4, kernel_size_y, data); break; } } break; } case 7: { switch (kernel_size_y) { case 4: { LAUNCH_GPU_KERNEL((EigenConvolutionKernel2D, Index, InputDims, 7, 4>), num_blocks, block_size, shared_mem, m_device, m_inputImpl, indexMapper, m_kernel, numP, numX, maxX, numY, maxY, 7, 4, data); break; } default: { LAUNCH_GPU_KERNEL((EigenConvolutionKernel2D, Index, InputDims, 7, Dynamic>), num_blocks, block_size, shared_mem, m_device, m_inputImpl, indexMapper, m_kernel, numP, numX, maxX, numY, maxY, 7, kernel_size_y, data); break; } } break; } default: { LAUNCH_GPU_KERNEL((EigenConvolutionKernel2D, Index, InputDims, Dynamic, Dynamic>), num_blocks, block_size, shared_mem, m_device, m_inputImpl, indexMapper, m_kernel, numP, numX, maxX, numY, maxY, kernel_size_x, kernel_size_y, data); break; } } break; } case 3: { const int idxX = static_cast(Layout) == static_cast(ColMajor) ? 0 : 2; const int idxY = static_cast(Layout) == static_cast(ColMajor) ? 1 : 1; const int idxZ = static_cast(Layout) == static_cast(ColMajor) ? 2 : 0; const int kernel_size_x = m_kernelImpl.dimensions()[idxX]; const int kernel_size_y = m_kernelImpl.dimensions()[idxY]; const int kernel_size_z = m_kernelImpl.dimensions()[idxZ]; const int numX = dimensions()[m_indices[idxX]]; const int numY = dimensions()[m_indices[idxY]]; const int numZ = dimensions()[m_indices[idxZ]]; const int numP = dimensions().TotalSize() / (numX*numY*numZ); const int maxX = numext::mini(128, numext::mini(maxSharedMem / (sizeof(Scalar) * kernel_size_y * kernel_size_z) - kernel_size_x + 1, numX)); const int maxY = numext::mini(128, numext::mini(maxSharedMem / (sizeof(Scalar) * (maxX + kernel_size_x - 1) * kernel_size_z) - kernel_size_y + 1, numY)); const int maxZ = numext::mini(128, numext::mini(maxSharedMem / (sizeof(Scalar) * (maxX + kernel_size_x - 1) * (maxY + kernel_size_y - 1)) - kernel_size_z + 1, numZ)); dim3 block_size; block_size.x = numext::mini(32, maxX); block_size.y = numext::mini(32, maxY); block_size.z = numext::mini(1024/(block_size.x*block_size.y), maxZ); dim3 num_blocks(ceil(numX, maxX), ceil(numY, maxY), ceil(numZ, maxZ)); const int shared_mem = (maxX + kernel_size_x - 1) * (maxY + kernel_size_y - 1) * (maxZ + kernel_size_z - 1) * sizeof(Scalar); gpu_assert(shared_mem <= maxSharedMem); //cout << "launching 3D kernel with block_size.x: " << block_size.x << " block_size.y: " << block_size.y << " block_size.z: " << block_size.z << " num_blocks.x: " << num_blocks.x << " num_blocks.y: " << num_blocks.y << " num_blocks.z: " << num_blocks.z << " shared_mem: " << shared_mem << " in stream " << m_device.stream() << endl; const array indices(m_indices[idxX], m_indices[idxY], m_indices[idxZ]); const array kernel_dims(m_kernelImpl.dimensions()[idxX], m_kernelImpl.dimensions()[idxY], m_kernelImpl.dimensions()[idxZ]); internal::IndexMapper indexMapper( m_inputImpl.dimensions(), kernel_dims, indices); LAUNCH_GPU_KERNEL((EigenConvolutionKernel3D, Index, InputDims>), num_blocks, block_size, shared_mem, m_device, m_inputImpl, indexMapper, m_kernel, numP, numX, maxX, numY, maxY, numZ, maxZ, kernel_size_x, kernel_size_y, kernel_size_z, data); break; } default: { EIGEN_STATIC_ASSERT((NumKernelDims >= 1 && NumKernelDims <= 3), THIS_METHOD_IS_ONLY_FOR_OBJECTS_OF_A_SPECIFIC_SIZE); } } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { eigen_assert(m_buf); eigen_assert(index < m_dimensions.TotalSize()); return m_buf[index]; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(const Index index) const { eigen_assert(m_buf); eigen_assert(index < m_dimensions.TotalSize()); return internal::ploadt(m_buf+index); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost costPerCoeff(bool vectorized) const { // TODO(rmlarsen): FIXME: For now, this is just a copy of the CPU cost // model. const double kernel_size = m_kernelImpl.dimensions().TotalSize(); // We ignore the use of fused multiply-add. const double convolve_compute_cost = TensorOpCost::AddCost() + TensorOpCost::MulCost(); const double firstIndex_compute_cost = NumDims * (2 * TensorOpCost::AddCost() + 2 * TensorOpCost::MulCost() + TensorOpCost::DivCost()); return TensorOpCost(0, 0, firstIndex_compute_cost, vectorized, PacketSize) + kernel_size * (m_inputImpl.costPerCoeff(vectorized) + m_kernelImpl.costPerCoeff(vectorized) + TensorOpCost(0, 0, convolve_compute_cost, vectorized, PacketSize)); } private: // No assignment (copies are needed by the kernels) TensorEvaluator& operator = (const TensorEvaluator&); TensorEvaluator m_inputImpl; TensorEvaluator m_kernelImpl; KernelArgType m_kernelArg; Indices m_indices; Dimensions m_dimensions; Scalar* m_buf; const Scalar* m_kernel; bool m_local_kernel; const GpuDevice& m_device; }; #endif } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_CONVOLUTION_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorLayoutSwap.h0000644000176200001440000001713114562417221026475 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_LAYOUT_SWAP_H #define EIGEN_CXX11_TENSOR_TENSOR_LAYOUT_SWAP_H namespace Eigen { /** \class TensorLayoutSwap * \ingroup CXX11_Tensor_Module * * \brief Swap the layout from col-major to row-major, or row-major * to col-major, and invert the order of the dimensions. * * Beware: the dimensions are reversed by this operation. If you want to * preserve the ordering of the dimensions, you need to combine this * operation with a shuffle. * * \example: * Tensor input(2, 4); * Tensor output = input.swap_layout(); * eigen_assert(output.dimension(0) == 4); * eigen_assert(output.dimension(1) == 2); * * array shuffle(1, 0); * output = input.swap_layout().shuffle(shuffle); * eigen_assert(output.dimension(0) == 2); * eigen_assert(output.dimension(1) == 4); * */ namespace internal { template struct traits > : public traits { typedef typename XprType::Scalar Scalar; typedef traits XprTraits; typedef typename XprTraits::StorageKind StorageKind; typedef typename XprTraits::Index Index; typedef typename XprType::Nested Nested; typedef typename remove_reference::type _Nested; static const int NumDimensions = traits::NumDimensions; static const int Layout = (traits::Layout == ColMajor) ? RowMajor : ColMajor; typedef typename XprTraits::PointerType PointerType; }; template struct eval, Eigen::Dense> { typedef const TensorLayoutSwapOp& type; }; template struct nested, 1, typename eval >::type> { typedef TensorLayoutSwapOp type; }; } // end namespace internal template class TensorLayoutSwapOp : public TensorBase, WriteAccessors> { public: typedef TensorBase, WriteAccessors> Base; typedef typename Eigen::internal::traits::Scalar Scalar; typedef typename Eigen::NumTraits::Real RealScalar; typedef typename internal::remove_const::type CoeffReturnType; typedef typename Eigen::internal::nested::type Nested; typedef typename Eigen::internal::traits::StorageKind StorageKind; typedef typename Eigen::internal::traits::Index Index; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorLayoutSwapOp(const XprType& expr) : m_xpr(expr) {} EIGEN_DEVICE_FUNC const typename internal::remove_all::type& expression() const { return m_xpr; } EIGEN_TENSOR_INHERIT_ASSIGNMENT_OPERATORS(TensorLayoutSwapOp) protected: typename XprType::Nested m_xpr; }; // Eval as rvalue template struct TensorEvaluator, Device> { typedef TensorLayoutSwapOp XprType; typedef typename XprType::Index Index; static const int NumDims = internal::array_size::Dimensions>::value; typedef DSizes Dimensions; enum { IsAligned = TensorEvaluator::IsAligned, PacketAccess = TensorEvaluator::PacketAccess, BlockAccess = false, PreferBlockAccess = TensorEvaluator::PreferBlockAccess, Layout = (static_cast(TensorEvaluator::Layout) == static_cast(ColMajor)) ? RowMajor : ColMajor, CoordAccess = false, // to be implemented RawAccess = TensorEvaluator::RawAccess }; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockNotImplemented TensorBlock; //===--------------------------------------------------------------------===// EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device) : m_impl(op.expression(), device) { for(int i = 0; i < NumDims; ++i) { m_dimensions[i] = m_impl.dimensions()[NumDims-1-i]; } } #ifdef EIGEN_USE_SYCL // binding placeholder accessors to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler &cgh) const { m_impl.bind(cgh); } #endif typedef typename XprType::Scalar Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename PacketType::type PacketReturnType; typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; } EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(EvaluatorPointerType data) { return m_impl.evalSubExprsIfNeeded(data); } EIGEN_STRONG_INLINE void cleanup() { m_impl.cleanup(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { return m_impl.coeff(index); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const { return m_impl.template packet(index); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost costPerCoeff(bool vectorized) const { return m_impl.costPerCoeff(vectorized); } EIGEN_DEVICE_FUNC typename Storage::Type data() const { return constCast(m_impl.data()); } const TensorEvaluator& impl() const { return m_impl; } protected: TensorEvaluator m_impl; Dimensions m_dimensions; }; // Eval as lvalue template struct TensorEvaluator, Device> : public TensorEvaluator, Device> { typedef TensorEvaluator, Device> Base; typedef TensorLayoutSwapOp XprType; enum { IsAligned = TensorEvaluator::IsAligned, PacketAccess = TensorEvaluator::PacketAccess, BlockAccess = false, PreferBlockAccess = TensorEvaluator::PreferBlockAccess, Layout = (static_cast(TensorEvaluator::Layout) == static_cast(ColMajor)) ? RowMajor : ColMajor, CoordAccess = false // to be implemented }; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockNotImplemented TensorBlock; //===--------------------------------------------------------------------===// EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device) : Base(op, device) { } typedef typename XprType::Index Index; typedef typename XprType::Scalar Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename PacketType::type PacketReturnType; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType& coeffRef(Index index) { return this->m_impl.coeffRef(index); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void writePacket(Index index, const PacketReturnType& x) { this->m_impl.template writePacket(index, x); } }; } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_LAYOUT_SWAP_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorReverse.h0000644000176200001440000004105214562417221025777 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Navdeep Jaitly // Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_REVERSE_H #define EIGEN_CXX11_TENSOR_TENSOR_REVERSE_H namespace Eigen { /** \class TensorReverse * \ingroup CXX11_Tensor_Module * * \brief Tensor reverse elements class. * */ namespace internal { template struct traits > : public traits { typedef typename XprType::Scalar Scalar; typedef traits XprTraits; typedef typename XprTraits::StorageKind StorageKind; typedef typename XprTraits::Index Index; typedef typename XprType::Nested Nested; typedef typename remove_reference::type _Nested; static const int NumDimensions = XprTraits::NumDimensions; static const int Layout = XprTraits::Layout; typedef typename XprTraits::PointerType PointerType; }; template struct eval, Eigen::Dense> { typedef const TensorReverseOp& type; }; template struct nested, 1, typename eval >::type> { typedef TensorReverseOp type; }; } // end namespace internal template class TensorReverseOp : public TensorBase, WriteAccessors> { public: typedef TensorBase, WriteAccessors>Base; typedef typename Eigen::internal::traits::Scalar Scalar; typedef typename Eigen::NumTraits::Real RealScalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename Eigen::internal::nested::type Nested; typedef typename Eigen::internal::traits::StorageKind StorageKind; typedef typename Eigen::internal::traits::Index Index; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorReverseOp( const XprType& expr, const ReverseDimensions& reverse_dims) : m_xpr(expr), m_reverse_dims(reverse_dims) { } EIGEN_DEVICE_FUNC const ReverseDimensions& reverse() const { return m_reverse_dims; } EIGEN_DEVICE_FUNC const typename internal::remove_all::type& expression() const { return m_xpr; } EIGEN_TENSOR_INHERIT_ASSIGNMENT_OPERATORS(TensorReverseOp) protected: typename XprType::Nested m_xpr; const ReverseDimensions m_reverse_dims; }; // Eval as rvalue template struct TensorEvaluator, Device> { typedef TensorReverseOp XprType; typedef typename XprType::Index Index; static const int NumDims = internal::array_size::value; typedef DSizes Dimensions; typedef typename XprType::Scalar Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename PacketType::type PacketReturnType; static const int PacketSize = PacketType::size; typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; enum { IsAligned = false, PacketAccess = TensorEvaluator::PacketAccess, BlockAccess = NumDims > 0, PreferBlockAccess = true, Layout = TensorEvaluator::Layout, CoordAccess = false, // to be implemented RawAccess = false }; typedef internal::TensorIntDivisor IndexDivisor; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockDescriptor TensorBlockDesc; typedef internal::TensorBlockScratchAllocator TensorBlockScratch; typedef typename TensorEvaluator::TensorBlock ArgTensorBlock; typedef typename internal::TensorMaterializedBlock TensorBlock; //===--------------------------------------------------------------------===// EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device) : m_impl(op.expression(), device), m_reverse(op.reverse()), m_device(device) { // Reversing a scalar isn't supported yet. It would be a no-op anyway. EIGEN_STATIC_ASSERT((NumDims > 0), YOU_MADE_A_PROGRAMMING_MISTAKE); // Compute strides m_dimensions = m_impl.dimensions(); if (static_cast(Layout) == static_cast(ColMajor)) { m_strides[0] = 1; for (int i = 1; i < NumDims; ++i) { m_strides[i] = m_strides[i-1] * m_dimensions[i-1]; if (m_strides[i] > 0) m_fastStrides[i] = IndexDivisor(m_strides[i]); } } else { m_strides[NumDims-1] = 1; for (int i = NumDims - 2; i >= 0; --i) { m_strides[i] = m_strides[i+1] * m_dimensions[i+1]; if (m_strides[i] > 0) m_fastStrides[i] = IndexDivisor(m_strides[i]); } } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; } EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(EvaluatorPointerType) { m_impl.evalSubExprsIfNeeded(NULL); return true; } #ifdef EIGEN_USE_THREADS template EIGEN_STRONG_INLINE void evalSubExprsIfNeededAsync( EvaluatorPointerType, EvalSubExprsCallback done) { m_impl.evalSubExprsIfNeededAsync(nullptr, [done](bool) { done(true); }); } #endif // EIGEN_USE_THREADS EIGEN_STRONG_INLINE void cleanup() { m_impl.cleanup(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index reverseIndex( Index index) const { eigen_assert(index < dimensions().TotalSize()); Index inputIndex = 0; if (static_cast(Layout) == static_cast(ColMajor)) { EIGEN_UNROLL_LOOP for (int i = NumDims - 1; i > 0; --i) { Index idx = index / m_fastStrides[i]; index -= idx * m_strides[i]; if (m_reverse[i]) { idx = m_dimensions[i] - idx - 1; } inputIndex += idx * m_strides[i] ; } if (m_reverse[0]) { inputIndex += (m_dimensions[0] - index - 1); } else { inputIndex += index; } } else { EIGEN_UNROLL_LOOP for (int i = 0; i < NumDims - 1; ++i) { Index idx = index / m_fastStrides[i]; index -= idx * m_strides[i]; if (m_reverse[i]) { idx = m_dimensions[i] - idx - 1; } inputIndex += idx * m_strides[i] ; } if (m_reverse[NumDims-1]) { inputIndex += (m_dimensions[NumDims-1] - index - 1); } else { inputIndex += index; } } return inputIndex; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff( Index index) const { return m_impl.coeff(reverseIndex(index)); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const { EIGEN_STATIC_ASSERT((PacketSize > 1), YOU_MADE_A_PROGRAMMING_MISTAKE) eigen_assert(index+PacketSize-1 < dimensions().TotalSize()); // TODO(ndjaitly): write a better packing routine that uses // local structure. EIGEN_ALIGN_MAX typename internal::remove_const::type values[PacketSize]; EIGEN_UNROLL_LOOP for (int i = 0; i < PacketSize; ++i) { values[i] = coeff(index+i); } PacketReturnType rslt = internal::pload(values); return rslt; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE internal::TensorBlockResourceRequirements getResourceRequirements() const { const size_t target_size = m_device.lastLevelCacheSize(); // Block evaluation reads underlying memory in reverse order, and default // cost model does not properly catch this in bytes stored/loaded. return internal::TensorBlockResourceRequirements::skewed( target_size) .addCostPerCoeff({0, 0, 24}); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorBlock block(TensorBlockDesc& desc, TensorBlockScratch& scratch, bool /*root_of_expr_ast*/ = false) const { // TODO(ezhulenev): If underlying tensor expression supports and prefers // block evaluation we must use it. Currently we use coeff and packet // access into the underlying tensor expression. // static const bool useBlockAccessForArgType = // TensorEvaluator::BlockAccess && // TensorEvaluator::PreferBlockAccess; static const bool isColMajor = static_cast(Layout) == static_cast(ColMajor); static const Index inner_dim_idx = isColMajor ? 0 : NumDims - 1; const bool inner_dim_reversed = m_reverse[inner_dim_idx]; // Offset in the output block. Index block_offset = 0; // Offset in the input Tensor. Index input_offset = reverseIndex(desc.offset()); // Initialize output block iterator state. Dimension in this array are // always in inner_most -> outer_most order (col major layout). array it; for (int i = 0; i < NumDims; ++i) { const int dim = isColMajor ? i : NumDims - 1 - i; it[i].size = desc.dimension(dim); it[i].count = 0; it[i].reverse = m_reverse[dim]; it[i].block_stride = i == 0 ? 1 : (it[i - 1].size * it[i - 1].block_stride); it[i].block_span = it[i].block_stride * (it[i].size - 1); it[i].input_stride = m_strides[dim]; it[i].input_span = it[i].input_stride * (it[i].size - 1); if (it[i].reverse) { it[i].input_stride = -1 * it[i].input_stride; it[i].input_span = -1 * it[i].input_span; } } // If multiple inner dimensions have the same reverse flag, check if we can // merge them into a single virtual inner dimension. int effective_inner_dim = 0; for (int i = 1; i < NumDims; ++i) { if (it[i].reverse != it[effective_inner_dim].reverse) break; if (it[i].block_stride != it[effective_inner_dim].size) break; if (it[i].block_stride != numext::abs(it[i].input_stride)) break; it[i].size = it[effective_inner_dim].size * it[i].size; it[i].block_stride = 1; it[i].input_stride = (inner_dim_reversed ? -1 : 1); it[i].block_span = it[i].block_stride * (it[i].size - 1); it[i].input_span = it[i].input_stride * (it[i].size - 1); effective_inner_dim = i; } eigen_assert(it[effective_inner_dim].block_stride == 1); eigen_assert(it[effective_inner_dim].input_stride == (inner_dim_reversed ? -1 : 1)); const Index inner_dim_size = it[effective_inner_dim].size; // Prepare storage for the materialized reverse result. const typename TensorBlock::Storage block_storage = TensorBlock::prepareStorage(desc, scratch); CoeffReturnType* block_buffer = block_storage.data(); while (it[NumDims - 1].count < it[NumDims - 1].size) { // Copy inner-most dimension data from reversed location in input. Index dst = block_offset; Index src = input_offset; // NOTE(ezhulenev): Adding vectorized path with internal::preverse showed // worse results in benchmarks than a simple coefficient loop. if (inner_dim_reversed) { for (Index i = 0; i < inner_dim_size; ++i) { block_buffer[dst] = m_impl.coeff(src); ++dst; --src; } } else { for (Index i = 0; i < inner_dim_size; ++i) { block_buffer[dst] = m_impl.coeff(src); ++dst; ++src; } } // For the 1d tensor we need to generate only one inner-most dimension. if ((NumDims - effective_inner_dim) == 1) break; // Update offset. for (Index i = effective_inner_dim + 1; i < NumDims; ++i) { if (++it[i].count < it[i].size) { block_offset += it[i].block_stride; input_offset += it[i].input_stride; break; } if (i != NumDims - 1) it[i].count = 0; block_offset -= it[i].block_span; input_offset -= it[i].input_span; } } return block_storage.AsTensorMaterializedBlock(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost costPerCoeff(bool vectorized) const { double compute_cost = NumDims * (2 * TensorOpCost::AddCost() + 2 * TensorOpCost::MulCost() + TensorOpCost::DivCost()); for (int i = 0; i < NumDims; ++i) { if (m_reverse[i]) { compute_cost += 2 * TensorOpCost::AddCost(); } } return m_impl.costPerCoeff(vectorized) + TensorOpCost(0, 0, compute_cost, false /* vectorized */, PacketSize); } EIGEN_DEVICE_FUNC typename Storage::Type data() const { return NULL; } #ifdef EIGEN_USE_SYCL // binding placeholder accessors to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler &cgh) const { m_impl.bind(cgh); } #endif protected: Dimensions m_dimensions; array m_strides; array m_fastStrides; TensorEvaluator m_impl; ReverseDimensions m_reverse; const Device EIGEN_DEVICE_REF m_device; private: struct BlockIteratorState { BlockIteratorState() : size(0), count(0), reverse(false), block_stride(0), block_span(0), input_stride(0), input_span(0) {} Index size; Index count; bool reverse; Index block_stride; Index block_span; Index input_stride; Index input_span; }; }; // Eval as lvalue template struct TensorEvaluator, Device> : public TensorEvaluator, Device> { typedef TensorEvaluator, Device> Base; typedef TensorReverseOp XprType; typedef typename XprType::Index Index; static const int NumDims = internal::array_size::value; typedef DSizes Dimensions; enum { IsAligned = false, PacketAccess = TensorEvaluator::PacketAccess, BlockAccess = false, PreferBlockAccess = false, Layout = TensorEvaluator::Layout, CoordAccess = false, // to be implemented RawAccess = false }; EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device) : Base(op, device) {} typedef typename XprType::Scalar Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename PacketType::type PacketReturnType; static const int PacketSize = PacketType::size; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockNotImplemented TensorBlock; //===--------------------------------------------------------------------===// EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return this->m_dimensions; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index index) { return this->m_impl.coeffRef(this->reverseIndex(index)); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void writePacket(Index index, const PacketReturnType& x) { EIGEN_STATIC_ASSERT((PacketSize > 1), YOU_MADE_A_PROGRAMMING_MISTAKE) eigen_assert(index+PacketSize-1 < dimensions().TotalSize()); // This code is pilfered from TensorMorphing.h EIGEN_ALIGN_MAX CoeffReturnType values[PacketSize]; internal::pstore(values, x); EIGEN_UNROLL_LOOP for (int i = 0; i < PacketSize; ++i) { this->coeffRef(index+i) = values[i]; } } }; } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_REVERSE_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorInflation.h0000644000176200001440000002160614562417221026312 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2015 Ke Yang // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_INFLATION_H #define EIGEN_CXX11_TENSOR_TENSOR_INFLATION_H namespace Eigen { /** \class TensorInflation * \ingroup CXX11_Tensor_Module * * \brief Tensor inflation class. * * */ namespace internal { template struct traits > : public traits { typedef typename XprType::Scalar Scalar; typedef traits XprTraits; typedef typename XprTraits::StorageKind StorageKind; typedef typename XprTraits::Index Index; typedef typename XprType::Nested Nested; typedef typename remove_reference::type _Nested; static const int NumDimensions = XprTraits::NumDimensions; static const int Layout = XprTraits::Layout; typedef typename XprTraits::PointerType PointerType; }; template struct eval, Eigen::Dense> { typedef const TensorInflationOp& type; }; template struct nested, 1, typename eval >::type> { typedef TensorInflationOp type; }; } // end namespace internal template class TensorInflationOp : public TensorBase, ReadOnlyAccessors> { public: typedef typename Eigen::internal::traits::Scalar Scalar; typedef typename Eigen::NumTraits::Real RealScalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename Eigen::internal::nested::type Nested; typedef typename Eigen::internal::traits::StorageKind StorageKind; typedef typename Eigen::internal::traits::Index Index; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorInflationOp(const XprType& expr, const Strides& strides) : m_xpr(expr), m_strides(strides) {} EIGEN_DEVICE_FUNC const Strides& strides() const { return m_strides; } EIGEN_DEVICE_FUNC const typename internal::remove_all::type& expression() const { return m_xpr; } protected: typename XprType::Nested m_xpr; const Strides m_strides; }; // Eval as rvalue template struct TensorEvaluator, Device> { typedef TensorInflationOp XprType; typedef typename XprType::Index Index; static const int NumDims = internal::array_size::Dimensions>::value; typedef DSizes Dimensions; typedef typename XprType::Scalar Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename PacketType::type PacketReturnType; static const int PacketSize = PacketType::size; typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; enum { IsAligned = /*TensorEvaluator::IsAligned*/ false, PacketAccess = TensorEvaluator::PacketAccess, BlockAccess = false, PreferBlockAccess = false, Layout = TensorEvaluator::Layout, CoordAccess = false, // to be implemented RawAccess = false }; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockNotImplemented TensorBlock; //===--------------------------------------------------------------------===// EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device) : m_impl(op.expression(), device), m_strides(op.strides()) { m_dimensions = m_impl.dimensions(); // Expand each dimension to the inflated dimension. for (int i = 0; i < NumDims; ++i) { m_dimensions[i] = (m_dimensions[i] - 1) * op.strides()[i] + 1; } // Remember the strides for fast division. for (int i = 0; i < NumDims; ++i) { m_fastStrides[i] = internal::TensorIntDivisor(m_strides[i]); } const typename TensorEvaluator::Dimensions& input_dims = m_impl.dimensions(); if (static_cast(Layout) == static_cast(ColMajor)) { m_outputStrides[0] = 1; m_inputStrides[0] = 1; for (int i = 1; i < NumDims; ++i) { m_outputStrides[i] = m_outputStrides[i-1] * m_dimensions[i-1]; m_inputStrides[i] = m_inputStrides[i-1] * input_dims[i-1]; } } else { // RowMajor m_outputStrides[NumDims-1] = 1; m_inputStrides[NumDims-1] = 1; for (int i = NumDims - 2; i >= 0; --i) { m_outputStrides[i] = m_outputStrides[i+1] * m_dimensions[i+1]; m_inputStrides[i] = m_inputStrides[i+1] * input_dims[i+1]; } } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; } EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(EvaluatorPointerType /*data*/) { m_impl.evalSubExprsIfNeeded(NULL); return true; } EIGEN_STRONG_INLINE void cleanup() { m_impl.cleanup(); } // Computes the input index given the output index. Returns true if the output // index doesn't fall into a hole. EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool getInputIndex(Index index, Index* inputIndex) const { eigen_assert(index < dimensions().TotalSize()); *inputIndex = 0; if (static_cast(Layout) == static_cast(ColMajor)) { EIGEN_UNROLL_LOOP for (int i = NumDims - 1; i > 0; --i) { const Index idx = index / m_outputStrides[i]; if (idx != idx / m_fastStrides[i] * m_strides[i]) { return false; } *inputIndex += idx / m_strides[i] * m_inputStrides[i]; index -= idx * m_outputStrides[i]; } if (index != index / m_fastStrides[0] * m_strides[0]) { return false; } *inputIndex += index / m_strides[0]; return true; } else { EIGEN_UNROLL_LOOP for (int i = 0; i < NumDims - 1; ++i) { const Index idx = index / m_outputStrides[i]; if (idx != idx / m_fastStrides[i] * m_strides[i]) { return false; } *inputIndex += idx / m_strides[i] * m_inputStrides[i]; index -= idx * m_outputStrides[i]; } if (index != index / m_fastStrides[NumDims-1] * m_strides[NumDims-1]) { return false; } *inputIndex += index / m_strides[NumDims - 1]; } return true; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { Index inputIndex = 0; if (getInputIndex(index, &inputIndex)) { return m_impl.coeff(inputIndex); } else { return Scalar(0); } } // TODO(yangke): optimize this function so that we can detect and produce // all-zero packets template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const { EIGEN_STATIC_ASSERT((PacketSize > 1), YOU_MADE_A_PROGRAMMING_MISTAKE) eigen_assert(index+PacketSize-1 < dimensions().TotalSize()); EIGEN_ALIGN_MAX typename internal::remove_const::type values[PacketSize]; EIGEN_UNROLL_LOOP for (int i = 0; i < PacketSize; ++i) { values[i] = coeff(index+i); } PacketReturnType rslt = internal::pload(values); return rslt; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost costPerCoeff(bool vectorized) const { const double compute_cost = NumDims * (3 * TensorOpCost::DivCost() + 3 * TensorOpCost::MulCost() + 2 * TensorOpCost::AddCost()); const double input_size = m_impl.dimensions().TotalSize(); const double output_size = m_dimensions.TotalSize(); if (output_size == 0) return TensorOpCost(); return m_impl.costPerCoeff(vectorized) + TensorOpCost(sizeof(CoeffReturnType) * input_size / output_size, 0, compute_cost, vectorized, PacketSize); } EIGEN_DEVICE_FUNC EvaluatorPointerType data() const { return NULL; } #ifdef EIGEN_USE_SYCL // binding placeholder accessors to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler &cgh) const { m_impl.bind(cgh); } #endif protected: Dimensions m_dimensions; array m_outputStrides; array m_inputStrides; TensorEvaluator m_impl; const Strides m_strides; array, NumDims> m_fastStrides; }; } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_INFLATION_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorContraction.h0000644000176200001440000013041014562417221026644 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_CONTRACTION_H #define EIGEN_CXX11_TENSOR_TENSOR_CONTRACTION_H namespace Eigen { /** \class TensorContraction * \ingroup CXX11_Tensor_Module * * \brief Tensor contraction class. * * */ namespace internal { template struct traits > { // Type promotion to handle the case where the types of the lhs and the rhs are different. typedef typename gebp_traits::type, typename remove_const::type>::ResScalar Scalar; typedef typename promote_storage_type::StorageKind, typename traits::StorageKind>::ret StorageKind; typedef typename promote_index_type::Index, typename traits::Index>::type Index; typedef typename LhsXprType::Nested LhsNested; typedef typename RhsXprType::Nested RhsNested; typedef typename remove_reference::type _LhsNested; typedef typename remove_reference::type _RhsNested; // From NumDims below. static const int NumDimensions = traits::NumDimensions + traits::NumDimensions - 2 * array_size::value; static const int Layout = traits::Layout; typedef typename conditional::val, typename traits::PointerType, typename traits::PointerType>::type PointerType; enum { Flags = 0 }; }; template struct eval, Eigen::Dense> { typedef const TensorContractionOp& type; }; template struct nested, 1, typename eval >::type> { typedef TensorContractionOp type; }; template struct traits, Device_> > { typedef Indices_ Indices; typedef LeftArgType_ LeftArgType; typedef RightArgType_ RightArgType; typedef OutputKernelType_ OutputKernelType; typedef Device_ Device; // From NumDims below. static const int NumDimensions = traits::NumDimensions + traits::NumDimensions - 2 * array_size::value; }; // Helper class to allocate and deallocate temporary memory for packed buffers. template struct TensorContractionBlockMemAllocator { typedef void* BlockMemHandle; template EIGEN_DEVICE_FUNC static BlockMemHandle allocate(Device& d, const Index bm, const Index bk, const Index bn, LhsScalar** lhs_block, RhsScalar** rhs_block) { eigen_assert(lhs_block); eigen_assert(rhs_block); BlockSizes sz = ComputeLhsRhsBlockSizes(bm, bk, bn); char* block_mem = static_cast(d.allocate(sz.lhs_size + sz.rhs_size)); eigen_assert(block_mem); *lhs_block = reinterpret_cast(block_mem); *rhs_block = reinterpret_cast(block_mem + sz.lhs_size); return block_mem; } template EIGEN_DEVICE_FUNC static BlockMemHandle allocateSlices( Device& d, const Index bm, const Index bk, const Index bn, const Index num_lhs, const Index num_rhs, const Index num_slices, std::vector* lhs_blocks, std::vector* rhs_blocks) { eigen_assert(num_slices > 0); eigen_assert(num_lhs >= 0 && num_rhs >= 0); eigen_assert(num_lhs == 0 || lhs_blocks); eigen_assert(num_rhs == 0 || rhs_blocks); BlockSizes sz = ComputeLhsRhsBlockSizes(bm, bk, bn); void* block_mem = d.allocate( (num_lhs * sz.lhs_size + num_rhs * sz.rhs_size) * num_slices); eigen_assert(block_mem); char* mem = static_cast(block_mem); for (Index x = 0; x < num_slices; x++) { if (num_lhs > 0) lhs_blocks[x].resize(num_lhs); for (Index m = 0; m < num_lhs; m++) { lhs_blocks[x][m] = reinterpret_cast(mem); mem += sz.lhs_size; } if (num_rhs > 0) rhs_blocks[x].resize(num_rhs); for (Index n = 0; n < num_rhs; n++) { rhs_blocks[x][n] = reinterpret_cast(mem); mem += sz.rhs_size; } } return block_mem; } template EIGEN_DEVICE_FUNC static void deallocate(Device& d, BlockMemHandle handle) { d.deallocate(handle); } private: struct BlockSizes { Index lhs_size; Index rhs_size; }; EIGEN_DEVICE_FUNC static BlockSizes ComputeLhsRhsBlockSizes(const Index bm, const Index bk, const Index bn) { Index align = numext::maxi(EIGEN_MAX_ALIGN_BYTES, 1); BlockSizes sz; sz.lhs_size = divup(bm * bk * sizeof(LhsScalar), align) * align; sz.rhs_size = divup(bn * bk * sizeof(RhsScalar), align) * align; return sz; } }; // WARNING: In this code we assume that Lhs and Rhs tensor expressions are in // ColMajor storage order. This property is guaranteed by the // TensorContractionOp evaluator. TensorContractionKernel specifies how we pack // blocks of Lhs and Rhs tensor expressions, and how we invoke matrix // multiplication for these blocks. Default tensor contraction uses // gemm_pack_rhs, gemm_pack_lhs and gebp_kernel from Eigen Core (see // GeneralBlocPanelKernel.h for details). // // By specializing contraction kernels we can use other low level libraries to // perform matrix multiplication, and still rely on Eigen contraction evaluator. // This also includes full support in TensorContractionThreadPool, assuming that // underlying gemm do not use it's own threading. // // - ResScalar/LhsScalar/RhsScalar - scalar type for the result of // multiplication, lhs tensor and rhs tensor respectively. // // - StorageIndex - index type for the tensor expressions. In practice almost // always is Eigen::Index. // // - OutputMapper provides access to the memory of the output matrix. In // practice it's always column major blas_data_mapper (it must be of ResScalar // type). // // - LhsMapper/RhsMapper similarly to blas_data_mapper provide a two dimensional // view into the Lhs/Rhs tensor expressions. In practice it's // TensorContractionInputMapper, or some specialization of it based on the // type of tensor expression (e.g. TensorImagePatchOp has optimized input // mapper). template struct TensorContractionKernel { // True if `invoke()` supports `beta` in `C <- alpha * A * B + beta * C` // (otherwise beta should be always equal to 1). enum { HasBeta = false }; EIGEN_DEVICE_FUNC TensorContractionKernel(StorageIndex m_, StorageIndex k_, StorageIndex n_, StorageIndex bm_, StorageIndex bk_, StorageIndex bn_) : m(m_), k(k_), n(n_), bm(bm_), bk(bk_), bn(bn_) {} // Pack blocks of Lhs and Rhs into contiguous blocks in memory. typedef LhsScalar* LhsBlock; typedef RhsScalar* RhsBlock; // Packed Lhs/Rhs block memory allocator. typedef TensorContractionBlockMemAllocator BlockMemAllocator; typedef typename BlockMemAllocator::BlockMemHandle BlockMemHandle; typedef typename internal::gebp_traits Traits; typedef internal::gemm_pack_lhs< LhsScalar, StorageIndex, typename LhsMapper::SubMapper, Traits::mr, Traits::LhsProgress, typename Traits::LhsPacket4Packing, ColMajor> LhsPacker; typedef internal::gemm_pack_rhs RhsPacker; typedef internal::gebp_kernel GebpKernel; template EIGEN_DEVICE_FUNC BlockMemHandle allocate(Device& d, LhsBlock* lhs_block, RhsBlock* rhs_block) { return BlockMemAllocator::allocate(d, bm, bk, bn, lhs_block, rhs_block); } template EIGEN_DEVICE_FUNC BlockMemHandle allocateSlices( Device& d, const StorageIndex num_lhs, const StorageIndex num_rhs, const StorageIndex num_slices, std::vector* lhs_blocks, std::vector* rhs_blocks) { return BlockMemAllocator::allocateSlices( d, bm, bk, bn, num_lhs, num_rhs, num_slices, lhs_blocks, rhs_blocks); } template EIGEN_DEVICE_FUNC static void deallocate(Device& d, BlockMemHandle handle) { BlockMemAllocator::deallocate(d, handle); } EIGEN_DEVICE_FUNC EIGEN_DONT_INLINE void packLhs( LhsBlock* lhsBlock, const typename LhsMapper::SubMapper& data_mapper, const StorageIndex depth, const StorageIndex rows) { LhsPacker()(*lhsBlock, data_mapper, depth, rows, /*stride*/ 0, /*offset*/ 0); } EIGEN_DEVICE_FUNC EIGEN_DONT_INLINE void packRhs( RhsBlock* rhsBlock, const typename RhsMapper::SubMapper& data_mapper, const StorageIndex depth, const StorageIndex cols) { RhsPacker()(*rhsBlock, data_mapper, depth, cols); } EIGEN_DEVICE_FUNC EIGEN_DONT_INLINE void invoke( const OutputMapper& output_mapper, const LhsBlock& lhsBlock, const RhsBlock& rhsBlock, const StorageIndex rows, const StorageIndex depth, const StorageIndex cols, const ResScalar alpha, const ResScalar beta) { // Default GEBP kernel does not support beta. eigen_assert(beta == ResScalar(1)); static const int kComputeStrideFromBlockDimensions = -1; GebpKernel()(output_mapper, lhsBlock, rhsBlock, rows, depth, cols, alpha, /*strideA*/ kComputeStrideFromBlockDimensions, /*strideB*/ kComputeStrideFromBlockDimensions, /*offsetA*/ 0, /*offsetB*/ 0); } private: // These are dimensions of the original Tensors, and selected block sizes. The // actual block sizes passed to all function above might be smaller because of // the partial blocks at the end. const StorageIndex m; const StorageIndex k; const StorageIndex n; const StorageIndex bm; const StorageIndex bk; const StorageIndex bn; }; } // end namespace internal // Tensor contraction params that should enable to get from output matrix // 2-dimensional coordinates to the output tensor dimensions. struct TensorContractionParams { // TensorContraction evaluator assumes that both tensors are in ColMajor // layout, if tensors are in RowMajor evaluator swap lhs with rhs. bool swapped_arguments; }; // Output kernel allows to fuse operations into the tensor contraction. // // Examples: // 1. Elementwise Relu transformation following Conv2D. // 2. AddBias to the Conv2D output channels dimension. // // The NoOpOutputKernel implements an output kernel that does absolutely nothing. struct NoOpOutputKernel { /** * Tensor contraction evaluator calls this kernel after finishing each block * of output matrix. Output blocks belong to the 2-dimensional output tensor. * * TensorContractionParams contains contraction dimensions information * required to map output 2-d space into the expected output tensor space * (potentially higher dimensional). * * \param[in] output_mapper Access to output tensor memory * \param[in] params Tensor contraction parameters * \param[in] i Index of a first row available through output_mapper * \param[in] j Index of a first column available through output_mapper * \param[in] num_rows Number of available rows * \param[in] num_cols Number of available columns */ template EIGEN_ALWAYS_INLINE void operator()( const internal::blas_data_mapper& output_mapper, const TensorContractionParams& params, Index i, Index j, Index num_rows, Index num_cols) const { EIGEN_UNUSED_VARIABLE(output_mapper); EIGEN_UNUSED_VARIABLE(params); EIGEN_UNUSED_VARIABLE(i); EIGEN_UNUSED_VARIABLE(j); EIGEN_UNUSED_VARIABLE(num_rows); EIGEN_UNUSED_VARIABLE(num_cols); } }; template class TensorContractionOp : public TensorBase, ReadOnlyAccessors> { public: typedef typename Eigen::internal::traits::Scalar Scalar; typedef typename internal::gebp_traits::ResScalar CoeffReturnType; typedef typename Eigen::internal::nested::type Nested; typedef typename Eigen::internal::traits::StorageKind StorageKind; typedef typename Eigen::internal::traits::Index Index; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorContractionOp( const LhsXprType& lhs, const RhsXprType& rhs, const Indices& dims, const OutputKernelType& output_kernel = OutputKernelType()) : m_lhs_xpr(lhs), m_rhs_xpr(rhs), m_indices(dims), m_output_kernel(output_kernel) {} EIGEN_DEVICE_FUNC const Indices& indices() const { return m_indices; } /** \returns the nested expressions */ EIGEN_DEVICE_FUNC const typename internal::remove_all::type& lhsExpression() const { return m_lhs_xpr; } EIGEN_DEVICE_FUNC const typename internal::remove_all::type& rhsExpression() const { return m_rhs_xpr; } EIGEN_DEVICE_FUNC const OutputKernelType& outputKernel() const { return m_output_kernel; } protected: typename LhsXprType::Nested m_lhs_xpr; typename RhsXprType::Nested m_rhs_xpr; const Indices m_indices; const OutputKernelType m_output_kernel; }; template struct TensorContractionEvaluatorBase : internal::no_assignment_operator { typedef typename internal::traits::Indices Indices; typedef typename internal::traits::LeftArgType LeftArgType; typedef typename internal::traits::RightArgType RightArgType; typedef typename internal::traits::OutputKernelType OutputKernelType; typedef typename internal::traits::Device Device; typedef TensorContractionOp XprType; typedef typename internal::remove_const::type Scalar; typedef typename XprType::Index Index; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename PacketType::type PacketReturnType; typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; enum { IsAligned = true, PacketAccess = (PacketType::size > 1), BlockAccess = false, PreferBlockAccess = false, Layout = TensorEvaluator::Layout, CoordAccess = false, // to be implemented RawAccess = true }; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockNotImplemented TensorBlock; //===--------------------------------------------------------------------===// // Most of the code is assuming that both input tensors are ColMajor. If the // inputs are RowMajor, we will "cheat" by swapping the LHS and RHS: // If we want to compute A * B = C, where A is LHS and B is RHS, the code // will pretend B is LHS and A is RHS. typedef typename internal::conditional< static_cast(Layout) == static_cast(ColMajor), LeftArgType, RightArgType>::type EvalLeftArgType; typedef typename internal::conditional< static_cast(Layout) == static_cast(ColMajor), RightArgType, LeftArgType>::type EvalRightArgType; typedef TensorEvaluator LeftEvaluatorType; typedef TensorEvaluator RightEvaluatorType; static const int LDims = internal::array_size::Dimensions>::value; static const int RDims = internal::array_size::Dimensions>::value; static const int ContractDims = internal::array_size::value; static const int NumDims = LDims + RDims - 2 * ContractDims; typedef array contract_t; typedef array left_nocontract_t; typedef array right_nocontract_t; typedef DSizes Dimensions; EIGEN_STRONG_INLINE TensorContractionEvaluatorBase(const XprType& op, const Device& device) : m_leftImpl(choose(Cond(Layout) == static_cast(ColMajor)>(), op.lhsExpression(), op.rhsExpression()), device), m_rightImpl(choose(Cond(Layout) == static_cast(ColMajor)>(), op.rhsExpression(), op.lhsExpression()), device), m_device(device), m_output_kernel(op.outputKernel()), m_result(NULL) { EIGEN_STATIC_ASSERT((static_cast(TensorEvaluator::Layout) == static_cast(TensorEvaluator::Layout)), YOU_MADE_A_PROGRAMMING_MISTAKE); DSizes eval_left_dims; DSizes eval_right_dims; array, ContractDims> eval_op_indices; if (static_cast(Layout) == static_cast(ColMajor)) { // For ColMajor, we keep using the existing dimensions for (int i = 0; i < LDims; i++) { eval_left_dims[i] = m_leftImpl.dimensions()[i]; } for (int i = 0; i < RDims; i++) { eval_right_dims[i] = m_rightImpl.dimensions()[i]; } // We keep the pairs of contracting indices. for (int i = 0; i < ContractDims; i++) { eval_op_indices[i].first = op.indices()[i].first; eval_op_indices[i].second = op.indices()[i].second; } } else { // For RowMajor, we need to reverse the existing dimensions for (int i = 0; i < LDims; i++) { eval_left_dims[i] = m_leftImpl.dimensions()[LDims - i - 1]; } for (int i = 0; i < RDims; i++) { eval_right_dims[i] = m_rightImpl.dimensions()[RDims - i - 1]; } // We need to flip all the pairs of contracting indices as well as // reversing the dimensions. for (int i = 0; i < ContractDims; i++) { eval_op_indices[i].first = LDims - 1 - op.indices()[ContractDims - 1 - i].second; eval_op_indices[i].second = RDims - 1 - op.indices()[ContractDims - 1 - i].first; } } // Check for duplicate axes and make sure the first index in eval_op_indices // is increasing. Using O(n^2) sorting is OK since ContractDims is small for (int i = 0; i < ContractDims; i++) { for (int j = i + 1; j < ContractDims; j++) { eigen_assert(eval_op_indices[j].first != eval_op_indices[i].first && eval_op_indices[j].second != eval_op_indices[i].second && "contraction axes should be unique"); if (eval_op_indices[j].first < eval_op_indices[i].first) { numext::swap(eval_op_indices[j], eval_op_indices[i]); } } } array lhs_strides; lhs_strides[0] = 1; for (int i = 0; i < LDims-1; ++i) { lhs_strides[i+1] = lhs_strides[i] * eval_left_dims[i]; } array rhs_strides; rhs_strides[0] = 1; for (int i = 0; i < RDims-1; ++i) { rhs_strides[i+1] = rhs_strides[i] * eval_right_dims[i]; } if (m_i_strides.size() > 0) m_i_strides[0] = 1; if (m_j_strides.size() > 0) m_j_strides[0] = 1; if (m_k_strides.size() > 0) m_k_strides[0] = 1; m_i_size = 1; m_j_size = 1; m_k_size = 1; // To compute the dimension, we simply concatenate the non-contracting // dimensions of the left and then the right tensor. Additionally, we also // compute the strides corresponding to the left non-contracting // dimensions and right non-contracting dimensions. m_lhs_inner_dim_contiguous = true; int dim_idx = 0; Index nocontract_idx = 0; for (int i = 0; i < LDims; i++) { // find if we are contracting on index i of left tensor bool contracting = false; for (int j = 0; j < ContractDims; j++) { if (eval_op_indices[j].first == i) { contracting = true; break; } } if (!contracting) { // add dimension size to output dimensions m_dimensions[dim_idx] = eval_left_dims[i]; m_left_nocontract_strides[nocontract_idx] = lhs_strides[i]; if (dim_idx != i) { m_lhs_inner_dim_contiguous = false; } if (nocontract_idx+1 < internal::array_size::value) { m_i_strides[nocontract_idx+1] = m_i_strides[nocontract_idx] * eval_left_dims[i]; } else { m_i_size = m_i_strides[nocontract_idx] * eval_left_dims[i]; } dim_idx++; nocontract_idx++; } } nocontract_idx = 0; for (int i = 0; i < RDims; i++) { bool contracting = false; // find if we are contracting on index i of right tensor for (int j = 0; j < ContractDims; j++) { if (eval_op_indices[j].second == i) { contracting = true; break; } } if (!contracting) { m_dimensions[dim_idx] = eval_right_dims[i]; if (nocontract_idx+1 < internal::array_size::value) { m_j_strides[nocontract_idx+1] = m_j_strides[nocontract_idx] * eval_right_dims[i]; } else { m_j_size = m_j_strides[nocontract_idx] * eval_right_dims[i]; } m_right_nocontract_strides[nocontract_idx] = rhs_strides[i]; dim_idx++; nocontract_idx++; } } // Now compute the strides corresponding to the contracting dimensions. We // assumed above that non-contracting axes are represented in the same order // in the matrix as they are in the tensor. This is not the case for // contracting axes. As the contracting axes must be of the same size in // each tensor, we'll only look at the first tensor here. m_rhs_inner_dim_contiguous = true; m_rhs_inner_dim_reordered = false; for (int i = 0; i < ContractDims; i++) { Index left = eval_op_indices[i].first; Index right = eval_op_indices[i].second; Index size = eval_left_dims[left]; eigen_assert(size == eval_right_dims[right] && "Contraction axes must be same size"); if (i+1 < static_cast(internal::array_size::value)) { m_k_strides[i+1] = m_k_strides[i] * size; } else { m_k_size = m_k_strides[i] * size; } m_left_contracting_strides[i] = lhs_strides[left]; m_right_contracting_strides[i] = rhs_strides[right]; if (i > 0 && right < eval_op_indices[i-1].second) { m_rhs_inner_dim_reordered = true; } if (right != i) { m_rhs_inner_dim_contiguous = false; } } // If the layout is RowMajor, we need to reverse the m_dimensions if (static_cast(Layout) == static_cast(RowMajor)) { for (int i = 0, j = NumDims - 1; i < j; i++, j--) { numext::swap(m_dimensions[i], m_dimensions[j]); } } // A set of parameters that will allow output kernel to get from output // tensor dimensions (i, j) into the original tensor dimensions. // TODO(ezhulenev): Add parameters required to infer output tensor index for // more complex contractions than 2x2 on internal dimension. m_tensor_contraction_params.swapped_arguments = static_cast(Layout) == RowMajor; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; } EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(EvaluatorPointerType data) { m_leftImpl.evalSubExprsIfNeeded(NULL); m_rightImpl.evalSubExprsIfNeeded(NULL); if (data) { evalTo(data); return false; } else { m_result = static_cast(m_device.allocate(dimensions().TotalSize() * sizeof(Scalar))); evalTo(m_result); return true; } } #ifdef EIGEN_USE_THREADS template EIGEN_STRONG_INLINE void evalSubExprsIfNeededAsync( EvaluatorPointerType dest, EvalSubExprsCallback done) { m_leftImpl.evalSubExprsIfNeededAsync(nullptr, [this, done, dest](bool) { m_rightImpl.evalSubExprsIfNeededAsync(nullptr, [this, done, dest](bool) { if (dest) { evalToAsync(dest, [done]() { done(false); }); } else { m_result = static_cast( m_device.allocate(dimensions().TotalSize() * sizeof(Scalar))); evalToAsync(m_result, [done]() { done(true); }); } }); }); } #endif // EIGEN_USE_THREADS #ifndef TENSOR_CONTRACTION_DISPATCH #define TENSOR_CONTRACTION_DISPATCH(METHOD, ALIGNMENT, ARGS) \ if (this->m_lhs_inner_dim_contiguous) { \ if (this->m_rhs_inner_dim_contiguous) { \ if (this->m_rhs_inner_dim_reordered) { \ METHOD ARGS; \ } else { \ METHOD ARGS; \ } \ } else { \ if (this->m_rhs_inner_dim_reordered) { \ METHOD ARGS; \ } else { \ METHOD ARGS; \ } \ } \ } else { \ if (this->m_rhs_inner_dim_contiguous) { \ if (this->m_rhs_inner_dim_reordered) { \ METHOD ARGS; \ } else { \ METHOD ARGS; \ } \ } else { \ if (this->m_rhs_inner_dim_reordered) { \ METHOD ARGS; \ } else { \ METHOD ARGS; \ } \ } \ } #endif #ifndef TENSOR_CONTRACTION_ASYNC_DISPATCH #define TENSOR_CONTRACTION_ASYNC_DISPATCH(METHOD, DONE, ALIGNMENT, ARGS, FN) \ if (this->m_lhs_inner_dim_contiguous) { \ if (this->m_rhs_inner_dim_contiguous) { \ if (this->m_rhs_inner_dim_reordered) { \ (new METHOD ARGS)->FN; \ } else { \ (new METHOD ARGS)->FN; \ } \ } else { \ if (this->m_rhs_inner_dim_reordered) { \ (new METHOD ARGS)->FN; \ } else { \ (new METHOD ARGS)->FN; \ } \ } \ } else { \ if (this->m_rhs_inner_dim_contiguous) { \ if (this->m_rhs_inner_dim_reordered) { \ (new METHOD ARGS)->FN; \ } else { \ (new METHOD ARGS)->FN; \ } \ } else { \ if (this->m_rhs_inner_dim_reordered) { \ (new METHOD ARGS)->FN; \ } else { \ (new METHOD ARGS)->FN; \ } \ } \ } #endif EIGEN_DEVICE_FUNC void evalTo(Scalar* buffer) const { static_cast(this)->template evalProduct(buffer); } #ifdef EIGEN_USE_THREADS template void evalToAsync(Scalar* buffer, EvalToCallback done) const { static_cast(this) ->template evalProductAsync(buffer, std::move(done)); } #endif // EIGEN_USE_THREADS template void evalProductSequential(Scalar* buffer) const { if (this->m_j_size == 1) { this->template evalGemv(buffer); } else { this->template evalGemm(buffer); } } template #if !defined(EIGEN_HIPCC) EIGEN_DEVICE_FUNC #endif void evalGemv(Scalar* buffer) const { const Index rows = m_i_size; const Index cols = m_k_size; typedef typename internal::remove_const::type LhsScalar; typedef typename internal::remove_const::type RhsScalar; typedef TensorEvaluator LeftEvaluator; typedef TensorEvaluator RightEvaluator; const Index lhs_packet_size = internal::unpacket_traits::size; const Index rhs_packet_size = internal::unpacket_traits::size; const int lhs_alignment = LeftEvaluator::IsAligned ? Aligned : Unaligned; const int rhs_alignment = RightEvaluator::IsAligned ? Aligned : Unaligned; typedef internal::TensorContractionInputMapper LhsMapper; typedef internal::TensorContractionInputMapper RhsMapper; LhsMapper lhs(m_leftImpl, m_left_nocontract_strides, m_i_strides, m_left_contracting_strides, m_k_strides); RhsMapper rhs(m_rightImpl, m_right_nocontract_strides, m_j_strides, m_right_contracting_strides, m_k_strides); const Scalar alpha(1); const Index resIncr(1); // zero out the result buffer (which must be of size at least rows * sizeof(Scalar) m_device.memset(buffer, 0, rows * sizeof(Scalar)); internal::general_matrix_vector_product::run( rows, cols, lhs, rhs, buffer, resIncr, alpha); typedef internal::blas_data_mapper OutputMapper; m_output_kernel(OutputMapper(buffer, rows), m_tensor_contraction_params, static_cast(0), static_cast(0), rows, static_cast(1)); } template #if !defined(EIGEN_HIPCC) EIGEN_DEVICE_FUNC #endif void evalGemm(Scalar* buffer) const { // columns in left side, rows in right side const Index k = this->m_k_size; this->template evalGemmPartial(buffer, 0, k, 1); } template EIGEN_DEVICE_FUNC void evalGemmPartialWithoutOutputKernel( Scalar* buffer, Index k_start, Index k_end, int num_threads) const { evalGemmPartial(buffer, k_start, k_end, num_threads); } template EIGEN_DEVICE_FUNC void evalGemmPartial(Scalar* buffer, Index k_start, Index k_end, int num_threads) const { eigen_assert(k_end >= k_start && k_start >= 0 && k_end <= this->m_k_size); // columns in slice on left side, rows on right side const Index k_slice = k_end - k_start; // rows in left side const Index m = this->m_i_size; // columns in right side const Index n = this->m_j_size; // define data mappers for Lhs and Rhs typedef typename internal::remove_const::type LhsScalar; typedef typename internal::remove_const::type RhsScalar; typedef TensorEvaluator LeftEvaluator; typedef TensorEvaluator RightEvaluator; const Index lhs_packet_size = internal::unpacket_traits::size; const Index rhs_packet_size = internal::unpacket_traits::size; typedef internal::TensorContractionInputMapper LhsMapper; typedef internal::TensorContractionInputMapper RhsMapper; typedef internal::blas_data_mapper OutputMapper; typedef internal::TensorContractionKernel< Scalar, LhsScalar, RhsScalar, Index, OutputMapper, LhsMapper, RhsMapper> TensorContractionKernel; // initialize data mappers LhsMapper lhs(this->m_leftImpl, this->m_left_nocontract_strides, this->m_i_strides, this->m_left_contracting_strides, this->m_k_strides); RhsMapper rhs(this->m_rightImpl, this->m_right_nocontract_strides, this->m_j_strides, this->m_right_contracting_strides, this->m_k_strides); OutputMapper output(buffer, m); // Sizes of the blocks to load in cache. See the Goto paper for details. internal::TensorContractionBlocking blocking(k_slice, m, n, num_threads); const Index kc = blocking.kc(); const Index mc = numext::mini(m, blocking.mc()); const Index nc = numext::mini(n, blocking.nc()); typedef typename TensorContractionKernel::LhsBlock LhsBlock; typedef typename TensorContractionKernel::RhsBlock RhsBlock; LhsBlock blockA; RhsBlock blockB; TensorContractionKernel kernel(m, k_slice, n, mc, kc, nc); typedef typename TensorContractionKernel::BlockMemHandle BlockMemHandle; const BlockMemHandle packed_mem = kernel.allocate(this->m_device, &blockA, &blockB); // If a contraction kernel does not support beta, explicitly initialize // output buffer with zeroes. if (!TensorContractionKernel::HasBeta) { this->m_device.memset(buffer, 0, m * n * sizeof(Scalar)); } for(Index i2=0; i2= k_end) { m_output_kernel(output_mapper, m_tensor_contraction_params, i2, j2, actual_mc, actual_nc); } } } } kernel.deallocate(this->m_device, packed_mem); } EIGEN_STRONG_INLINE void cleanup() { m_leftImpl.cleanup(); m_rightImpl.cleanup(); if (m_result != NULL) { m_device.deallocate(m_result); m_result = NULL; } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { return m_result[index]; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost costPerCoeff(bool) const { return TensorOpCost(sizeof(CoeffReturnType), 0, 0); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const { return internal::ploadt(m_result + index); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EvaluatorPointerType data() const { return m_result; } protected: Dimensions m_dimensions; contract_t m_k_strides; contract_t m_left_contracting_strides; contract_t m_right_contracting_strides; bool m_lhs_inner_dim_contiguous; bool m_rhs_inner_dim_contiguous; bool m_rhs_inner_dim_reordered; left_nocontract_t m_i_strides; right_nocontract_t m_j_strides; left_nocontract_t m_left_nocontract_strides; right_nocontract_t m_right_nocontract_strides; Index m_i_size; Index m_j_size; Index m_k_size; TensorContractionParams m_tensor_contraction_params; TensorEvaluator m_leftImpl; TensorEvaluator m_rightImpl; const Device EIGEN_DEVICE_REF m_device; OutputKernelType m_output_kernel; EvaluatorPointerType m_result; }; // evaluator for default device template struct TensorEvaluator, Device> : public TensorContractionEvaluatorBase< TensorEvaluator, Device> > { typedef TensorEvaluator, Device> Self; typedef TensorContractionEvaluatorBase Base; typedef TensorContractionOp XprType; typedef typename internal::remove_const::type Scalar; typedef typename XprType::Index Index; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename PacketType::type PacketReturnType; enum { Layout = TensorEvaluator::Layout }; // Most of the code is assuming that both input tensors are ColMajor. If the // inputs are RowMajor, we will "cheat" by swapping the LHS and RHS: // If we want to compute A * B = C, where A is LHS and B is RHS, the code // will pretend B is LHS and A is RHS. typedef typename internal::conditional< static_cast(Layout) == static_cast(ColMajor), LeftArgType, RightArgType>::type EvalLeftArgType; typedef typename internal::conditional< static_cast(Layout) == static_cast(ColMajor), RightArgType, LeftArgType>::type EvalRightArgType; static const int LDims = internal::array_size::Dimensions>::value; static const int RDims = internal::array_size::Dimensions>::value; static const int ContractDims = internal::array_size::value; typedef array contract_t; typedef array left_nocontract_t; typedef array right_nocontract_t; static const int NumDims = LDims + RDims - 2 * ContractDims; // Could we use NumDimensions here? typedef DSizes Dimensions; TensorEvaluator(const XprType& op, const Device& device) : Base(op, device) { } template void evalProduct(Scalar* buffer) const { TENSOR_CONTRACTION_DISPATCH(this->template evalProductSequential, Alignment, (buffer)); } }; } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_CONTRACTION_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorForcedEval.h0000644000176200001440000002111614562417221026375 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_FORCED_EVAL_H #define EIGEN_CXX11_TENSOR_TENSOR_FORCED_EVAL_H namespace Eigen { /** \class TensorForcedEval * \ingroup CXX11_Tensor_Module * * \brief Tensor reshaping class. * * */ namespace internal { template struct traits > { // Type promotion to handle the case where the types of the lhs and the rhs are different. typedef typename XprType::Scalar Scalar; typedef traits XprTraits; typedef typename traits::StorageKind StorageKind; typedef typename traits::Index Index; typedef typename XprType::Nested Nested; typedef typename remove_reference::type _Nested; static const int NumDimensions = XprTraits::NumDimensions; static const int Layout = XprTraits::Layout; typedef typename XprTraits::PointerType PointerType; enum { Flags = 0 }; }; template struct eval, Eigen::Dense> { typedef const TensorForcedEvalOp& type; }; template struct nested, 1, typename eval >::type> { typedef TensorForcedEvalOp type; }; } // end namespace internal template class TensorForcedEvalOp : public TensorBase, ReadOnlyAccessors> { public: typedef typename Eigen::internal::traits::Scalar Scalar; typedef typename Eigen::NumTraits::Real RealScalar; typedef typename internal::remove_const::type CoeffReturnType; typedef typename Eigen::internal::nested::type Nested; typedef typename Eigen::internal::traits::StorageKind StorageKind; typedef typename Eigen::internal::traits::Index Index; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorForcedEvalOp(const XprType& expr) : m_xpr(expr) {} EIGEN_DEVICE_FUNC const typename internal::remove_all::type& expression() const { return m_xpr; } protected: typename XprType::Nested m_xpr; }; namespace internal { template struct non_integral_type_placement_new{ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void operator()(Index numValues, StorageType m_buffer) { // Initialize non-trivially constructible types. if (!internal::is_arithmetic::value) { for (Index i = 0; i < numValues; ++i) new (m_buffer + i) CoeffReturnType(); } } }; // SYCL does not support non-integral types // having new (m_buffer + i) CoeffReturnType() causes the following compiler error for SYCL Devices // no matching function for call to 'operator new' template struct non_integral_type_placement_new { template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void operator()(Index, StorageType) { } }; } // end namespace internal template struct TensorEvaluator, Device> { typedef const typename internal::remove_all::type ArgType; typedef TensorForcedEvalOp XprType; typedef typename ArgType::Scalar Scalar; typedef typename TensorEvaluator::Dimensions Dimensions; typedef typename XprType::Index Index; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename PacketType::type PacketReturnType; static const int PacketSize = PacketType::size; typedef typename Eigen::internal::traits::PointerType TensorPointerType; typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; enum { IsAligned = true, PacketAccess = (PacketType::size > 1), BlockAccess = internal::is_arithmetic::value, PreferBlockAccess = false, Layout = TensorEvaluator::Layout, RawAccess = true }; static const int NumDims = internal::traits::NumDimensions; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockDescriptor TensorBlockDesc; typedef internal::TensorBlockScratchAllocator TensorBlockScratch; typedef typename internal::TensorMaterializedBlock TensorBlock; //===--------------------------------------------------------------------===// TensorEvaluator(const XprType& op, const Device& device) : m_impl(op.expression(), device), m_op(op.expression()), m_device(device), m_buffer(NULL) { } EIGEN_DEVICE_FUNC const Dimensions& dimensions() const { return m_impl.dimensions(); } EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(EvaluatorPointerType) { const Index numValues = internal::array_prod(m_impl.dimensions()); m_buffer = m_device.get((CoeffReturnType*)m_device.allocate_temp(numValues * sizeof(CoeffReturnType))); internal::non_integral_type_placement_new()(numValues, m_buffer); typedef TensorEvalToOp< const typename internal::remove_const::type > EvalTo; EvalTo evalToTmp(m_device.get(m_buffer), m_op); internal::TensorExecutor< const EvalTo, typename internal::remove_const::type, /*Vectorizable=*/internal::IsVectorizable::value, /*Tiling=*/internal::IsTileable::value>:: run(evalToTmp, m_device); return true; } #ifdef EIGEN_USE_THREADS template EIGEN_STRONG_INLINE void evalSubExprsIfNeededAsync( EvaluatorPointerType, EvalSubExprsCallback done) { const Index numValues = internal::array_prod(m_impl.dimensions()); m_buffer = m_device.get((CoeffReturnType*)m_device.allocate_temp( numValues * sizeof(CoeffReturnType))); typedef TensorEvalToOp::type> EvalTo; EvalTo evalToTmp(m_device.get(m_buffer), m_op); auto on_done = std::bind([](EvalSubExprsCallback done_) { done_(true); }, std::move(done)); internal::TensorAsyncExecutor< const EvalTo, typename internal::remove_const::type, decltype(on_done), /*Vectorizable=*/internal::IsVectorizable::value, /*Tiling=*/internal::IsTileable::value>:: runAsync(evalToTmp, m_device, std::move(on_done)); } #endif EIGEN_STRONG_INLINE void cleanup() { m_device.deallocate_temp(m_buffer); m_buffer = NULL; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { return m_buffer[index]; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const { return internal::ploadt(m_buffer + index); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE internal::TensorBlockResourceRequirements getResourceRequirements() const { return internal::TensorBlockResourceRequirements::any(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorBlock block(TensorBlockDesc& desc, TensorBlockScratch& scratch, bool /*root_of_expr_ast*/ = false) const { assert(m_buffer != NULL); return TensorBlock::materialize(m_buffer, m_impl.dimensions(), desc, scratch); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost costPerCoeff(bool vectorized) const { return TensorOpCost(sizeof(CoeffReturnType), 0, 0, vectorized, PacketSize); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EvaluatorPointerType data() const { return m_buffer; } #ifdef EIGEN_USE_SYCL // binding placeholder accessors to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler &cgh) const { m_buffer.bind(cgh); m_impl.bind(cgh); } #endif private: TensorEvaluator m_impl; const ArgType m_op; const Device EIGEN_DEVICE_REF m_device; EvaluatorPointerType m_buffer; }; } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_FORCED_EVAL_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h0000644000176200001440000012655314562417221026332 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // Copyright (C) 2016 Mehdi Goli, Codeplay Software Ltd // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_REDUCTION_H #define EIGEN_CXX11_TENSOR_TENSOR_REDUCTION_H // clang is incompatible with the CUDA syntax wrt making a kernel a class friend, // so we'll use a macro to make clang happy. #ifndef KERNEL_FRIEND #if defined(__clang__) && (defined(__CUDA__) || defined(__HIP__)) #define KERNEL_FRIEND friend __global__ EIGEN_HIP_LAUNCH_BOUNDS_1024 #else #define KERNEL_FRIEND friend #endif #endif namespace Eigen { /** \class TensorReduction * \ingroup CXX11_Tensor_Module * * \brief Tensor reduction class. * */ namespace internal { template class MakePointer_ > struct traits > : traits { typedef traits XprTraits; typedef typename XprTraits::Scalar Scalar; typedef typename XprTraits::StorageKind StorageKind; typedef typename XprTraits::Index Index; typedef typename XprType::Nested Nested; static const int NumDimensions = XprTraits::NumDimensions - array_size::value; static const int Layout = XprTraits::Layout; typedef typename XprTraits::PointerType PointerType; template struct MakePointer { // Intermediate typedef to workaround MSVC issue. typedef MakePointer_ MakePointerT; typedef typename MakePointerT::Type Type; }; }; template class MakePointer_> struct eval, Eigen::Dense> { typedef const TensorReductionOp& type; }; template class MakePointer_> struct nested, 1, typename eval >::type> { typedef TensorReductionOp type; }; template struct DimInitializer { template EIGEN_DEVICE_FUNC static void run(const InputDims& input_dims, const array::value>& reduced, OutputDims* output_dims, ReducedDims* reduced_dims) { const int NumInputDims = internal::array_size::value; int outputIndex = 0; int reduceIndex = 0; for (int i = 0; i < NumInputDims; ++i) { if (reduced[i]) { (*reduced_dims)[reduceIndex] = input_dims[i]; ++reduceIndex; } else { (*output_dims)[outputIndex] = input_dims[i]; ++outputIndex; } } } }; template <> struct DimInitializer > { template EIGEN_DEVICE_FUNC static void run(const InputDims& input_dims, const array&, Sizes<>*, array* reduced_dims) { const int NumInputDims = internal::array_size::value; for (int i = 0; i < NumInputDims; ++i) { (*reduced_dims)[i] = input_dims[i]; } } }; template struct are_inner_most_dims { static const bool value = false; }; template struct preserve_inner_most_dims { static const bool value = false; }; #if EIGEN_HAS_CONSTEXPR && EIGEN_HAS_VARIADIC_TEMPLATES template struct are_inner_most_dims{ static const bool tmp1 = indices_statically_known_to_increase(); static const bool tmp2 = index_statically_eq(0, 0); static const bool tmp3 = index_statically_eq(array_size::value-1, array_size::value-1); static const bool value = tmp1 & tmp2 & tmp3; }; template struct are_inner_most_dims{ static const bool tmp1 = indices_statically_known_to_increase(); static const bool tmp2 = index_statically_eq(0, NumTensorDims - array_size::value); static const bool tmp3 = index_statically_eq(array_size::value - 1, NumTensorDims - 1); static const bool value = tmp1 & tmp2 & tmp3; }; template struct preserve_inner_most_dims{ static const bool tmp1 = indices_statically_known_to_increase(); static const bool tmp2 = index_statically_gt(0, 0); static const bool value = tmp1 & tmp2; }; template struct preserve_inner_most_dims{ static const bool tmp1 = indices_statically_known_to_increase(); static const bool tmp2 = index_statically_lt(array_size::value - 1, NumTensorDims - 1); static const bool value = tmp1 & tmp2; }; #endif template struct GenericDimReducer { static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reduce(const Self& self, typename Self::Index firstIndex, Op& reducer, typename Self::CoeffReturnType* accum) { EIGEN_STATIC_ASSERT((DimIndex > 0), YOU_MADE_A_PROGRAMMING_MISTAKE); for (int j = 0; j < self.m_reducedDims[DimIndex]; ++j) { const typename Self::Index input = firstIndex + j * self.m_reducedStrides[DimIndex]; GenericDimReducer::reduce(self, input, reducer, accum); } } }; template struct GenericDimReducer<0, Self, Op> { static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reduce(const Self& self, typename Self::Index firstIndex, Op& reducer, typename Self::CoeffReturnType* accum) { for (int j = 0; j < self.m_reducedDims[0]; ++j) { const typename Self::Index input = firstIndex + j * self.m_reducedStrides[0]; reducer.reduce(self.m_impl.coeff(input), accum); } } }; template struct GenericDimReducer<-1, Self, Op> { static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reduce(const Self& self, typename Self::Index index, Op& reducer, typename Self::CoeffReturnType* accum) { reducer.reduce(self.m_impl.coeff(index), accum); } }; template struct InnerMostDimReducer { static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename Self::CoeffReturnType reduce(const Self& self, typename Self::Index firstIndex, typename Self::Index numValuesToReduce, Op& reducer) { typename Self::CoeffReturnType accum = reducer.initialize(); for (typename Self::Index j = 0; j < numValuesToReduce; ++j) { reducer.reduce(self.m_impl.coeff(firstIndex + j), &accum); } return reducer.finalize(accum); } }; template struct InnerMostDimReducer { static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename Self::CoeffReturnType reduce(const Self& self, typename Self::Index firstIndex, typename Self::Index numValuesToReduce, Op& reducer) { const typename Self::Index packetSize = internal::unpacket_traits::size; const typename Self::Index VectorizedSize = (numValuesToReduce / packetSize) * packetSize; typename Self::PacketReturnType paccum = reducer.template initializePacket(); for (typename Self::Index j = 0; j < VectorizedSize; j += packetSize) { reducer.reducePacket(self.m_impl.template packet(firstIndex + j), &paccum); } typename Self::CoeffReturnType accum = reducer.initialize(); for (typename Self::Index j = VectorizedSize; j < numValuesToReduce; ++j) { reducer.reduce(self.m_impl.coeff(firstIndex + j), &accum); } return reducer.finalizeBoth(accum, paccum); } }; #if !defined(EIGEN_HIPCC) static const int kLeafSize = 1024; template struct InnerMostDimReducer { static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename Self::CoeffReturnType reduce(const Self& self, typename Self::Index firstIndex, typename Self::Index numValuesToReduce, Op& reducer) { typename Self::CoeffReturnType accum = reducer.initialize(); if (numValuesToReduce > kLeafSize) { const typename Self::Index half = numValuesToReduce / 2; reducer.reduce(reduce(self, firstIndex, half, reducer), &accum); reducer.reduce( reduce(self, firstIndex + half, numValuesToReduce - half, reducer), &accum); } else { for (typename Self::Index j = 0; j < numValuesToReduce; ++j) { reducer.reduce(self.m_impl.coeff(firstIndex + j), &accum); } } return reducer.finalize(accum); } }; template struct InnerMostDimReducer { static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename Self::CoeffReturnType reduce(const Self& self, typename Self::Index firstIndex, typename Self::Index numValuesToReduce, Op& reducer) { const typename Self::Index packetSize = internal::unpacket_traits::size; typename Self::CoeffReturnType accum = reducer.initialize(); if (numValuesToReduce > packetSize * kLeafSize) { // Make sure the split point is aligned on a packet boundary. const typename Self::Index split = packetSize * divup(firstIndex + divup(numValuesToReduce, typename Self::Index(2)), packetSize); const typename Self::Index num_left = numext::mini(split - firstIndex, numValuesToReduce); reducer.reduce(reduce(self, firstIndex, num_left, reducer), &accum); if (num_left < numValuesToReduce) { reducer.reduce( reduce(self, split, numValuesToReduce - num_left, reducer), &accum); } return reducer.finalize(accum); } else { const typename Self::Index UnrollSize = (numValuesToReduce / (2*packetSize)) * 2*packetSize; const typename Self::Index VectorizedSize = (numValuesToReduce / packetSize) * packetSize; typename Self::PacketReturnType paccum = reducer.template initializePacket(); typename Self::PacketReturnType paccum2 = reducer.template initializePacket(); for (typename Self::Index j = 0; j < UnrollSize; j += packetSize * 2) { reducer.reducePacket( self.m_impl.template packet(firstIndex + j), &paccum); reducer.reducePacket( self.m_impl.template packet(firstIndex + j + packetSize), &paccum2); } for (typename Self::Index j = UnrollSize; j < VectorizedSize; j+= packetSize) { reducer.reducePacket(self.m_impl.template packet( firstIndex + j), &paccum); } reducer.reducePacket(paccum2, &paccum); for (typename Self::Index j = VectorizedSize; j < numValuesToReduce; ++j) { reducer.reduce(self.m_impl.coeff(firstIndex + j), &accum); } return reducer.finalizeBoth(accum, paccum); } } }; #endif template struct InnerMostDimPreserver { static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reduce(const Self&, typename Self::Index, Op&, typename Self::PacketReturnType*) { eigen_assert(false && "should never be called"); } }; template struct InnerMostDimPreserver { static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reduce(const Self& self, typename Self::Index firstIndex, Op& reducer, typename Self::PacketReturnType* accum) { EIGEN_STATIC_ASSERT((DimIndex > 0), YOU_MADE_A_PROGRAMMING_MISTAKE); for (typename Self::Index j = 0; j < self.m_reducedDims[DimIndex]; ++j) { const typename Self::Index input = firstIndex + j * self.m_reducedStrides[DimIndex]; InnerMostDimPreserver::reduce(self, input, reducer, accum); } } }; template struct InnerMostDimPreserver<0, Self, Op, true> { static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reduce(const Self& self, typename Self::Index firstIndex, Op& reducer, typename Self::PacketReturnType* accum) { for (typename Self::Index j = 0; j < self.m_reducedDims[0]; ++j) { const typename Self::Index input = firstIndex + j * self.m_reducedStrides[0]; reducer.reducePacket(self.m_impl.template packet(input), accum); } } }; template struct InnerMostDimPreserver<-1, Self, Op, true> { static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reduce(const Self&, typename Self::Index, Op&, typename Self::PacketReturnType*) { eigen_assert(false && "should never be called"); } }; // Default full reducer template struct FullReducer { static const bool HasOptimizedImplementation = false; static EIGEN_DEVICE_FUNC void run(const Self& self, Op& reducer, const Device&, typename Self::EvaluatorPointerType output) { const typename Self::Index num_coeffs = array_prod(self.m_impl.dimensions()); *output = InnerMostDimReducer::reduce(self, 0, num_coeffs, reducer); } }; #ifdef EIGEN_USE_THREADS // Multithreaded full reducers template struct FullReducerShard { static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(const Self& self, typename Self::Index firstIndex, typename Self::Index numValuesToReduce, Op& reducer, typename Self::CoeffReturnType* output) { *output = InnerMostDimReducer::reduce( self, firstIndex, numValuesToReduce, reducer); } }; // Multithreaded full reducer template struct FullReducer { static const bool HasOptimizedImplementation = !Self::ReducerTraits::IsStateful; static const Index PacketSize = unpacket_traits::size; // launch one reducer per thread and accumulate the result. static void run(const Self& self, Op& reducer, const ThreadPoolDevice& device, typename Self::CoeffReturnType* output) { typedef typename Self::Index Index; const Index num_coeffs = array_prod(self.m_impl.dimensions()); if (num_coeffs == 0) { *output = reducer.finalize(reducer.initialize()); return; } const TensorOpCost cost = self.m_impl.costPerCoeff(Vectorizable) + TensorOpCost(0, 0, internal::functor_traits::Cost, Vectorizable, PacketSize); const int num_threads = TensorCostModel::numThreads( num_coeffs, cost, device.numThreads()); if (num_threads == 1) { *output = InnerMostDimReducer::reduce(self, 0, num_coeffs, reducer); return; } const Index blocksize = std::floor(static_cast(num_coeffs) / num_threads); const Index numblocks = blocksize > 0 ? num_coeffs / blocksize : 0; eigen_assert(num_coeffs >= numblocks * blocksize); Barrier barrier(internal::convert_index(numblocks)); MaxSizeVector shards(numblocks, reducer.initialize()); for (Index i = 0; i < numblocks; ++i) { device.enqueue_with_barrier(&barrier, &FullReducerShard::run, self, i * blocksize, blocksize, reducer, &shards[i]); } typename Self::CoeffReturnType finalShard; if (numblocks * blocksize < num_coeffs) { finalShard = InnerMostDimReducer::reduce( self, numblocks * blocksize, num_coeffs - numblocks * blocksize, reducer); } else { finalShard = reducer.initialize(); } barrier.Wait(); for (Index i = 0; i < numblocks; ++i) { reducer.reduce(shards[i], &finalShard); } *output = reducer.finalize(finalShard); } }; #endif // Default inner reducer template struct InnerReducer { static const bool HasOptimizedImplementation = false; EIGEN_DEVICE_FUNC static bool run(const Self&, Op&, const Device&, typename Self::CoeffReturnType*, typename Self::Index, typename Self::Index) { eigen_assert(false && "Not implemented"); return true; } }; // Default outer reducer template struct OuterReducer { static const bool HasOptimizedImplementation = false; EIGEN_DEVICE_FUNC static bool run(const Self&, Op&, const Device&, typename Self::CoeffReturnType*, typename Self::Index, typename Self::Index) { eigen_assert(false && "Not implemented"); return true; } }; #ifdef EIGEN_USE_SYCL // Default Generic reducer template struct GenericReducer { static const bool HasOptimizedImplementation = false; EIGEN_DEVICE_FUNC static bool run(const Self&, Op&, const Device&, typename Self::CoeffReturnType*, typename Self::Index, typename Self::Index) { eigen_assert(false && "Not implemented"); return true; } }; #endif #if defined(EIGEN_USE_GPU) && (defined(EIGEN_GPUCC)) template __global__ EIGEN_HIP_LAUNCH_BOUNDS_1024 void FullReductionKernel(R, const S, I_, typename S::CoeffReturnType*, unsigned int*); #if defined(EIGEN_HAS_GPU_FP16) template __global__ EIGEN_HIP_LAUNCH_BOUNDS_1024 void ReductionInitFullReduxKernelHalfFloat(R, const S, I_, internal::packet_traits::type*); template __global__ EIGEN_HIP_LAUNCH_BOUNDS_1024 void FullReductionKernelHalfFloat(R, const S, I_, half*, internal::packet_traits::type*); template __global__ EIGEN_HIP_LAUNCH_BOUNDS_1024 void InnerReductionKernelHalfFloat(R, const S, I_, I_, half*); #endif template __global__ EIGEN_HIP_LAUNCH_BOUNDS_1024 void InnerReductionKernel(R, const S, I_, I_, typename S::CoeffReturnType*); template __global__ EIGEN_HIP_LAUNCH_BOUNDS_1024 void OuterReductionKernel(R, const S, I_, I_, typename S::CoeffReturnType*); #endif /** * For SYCL, the return type of the reduction is deduced from the initialize method of the given Op. * This allows the reduction to have a different type for the accumulator than the input data type. * If this is the case, the functor needs to have two reduce method: one for reducing an element of the input * with the accumulator and the other for reducing two accumulators. * Such a reducer can be useful for instance when the accumulator is a boolean or a bitset that checks for * some properties of the input. */ template struct ReductionReturnType { #if defined(EIGEN_USE_SYCL) typedef typename remove_const().initialize())>::type type; #else typedef typename remove_const::type type; #endif }; } // end namespace internal template class MakePointer_> class TensorReductionOp : public TensorBase, ReadOnlyAccessors> { public: typedef typename Eigen::internal::traits::Scalar Scalar; typedef typename Eigen::NumTraits::Real RealScalar; typedef typename internal::remove_const::type CoeffReturnType; typedef typename Eigen::internal::nested::type Nested; typedef typename Eigen::internal::traits::StorageKind StorageKind; typedef typename Eigen::internal::traits::Index Index; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorReductionOp(const XprType& expr, const Dims& dims) : m_expr(expr), m_dims(dims) { } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorReductionOp(const XprType& expr, const Dims& dims, const Op& reducer) : m_expr(expr), m_dims(dims), m_reducer(reducer) { } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const XprType& expression() const { return m_expr; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dims& dims() const { return m_dims; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Op& reducer() const { return m_reducer; } protected: typename XprType::Nested m_expr; const Dims m_dims; const Op m_reducer; }; template struct TensorReductionEvaluatorBase; // Eval as rvalue template class MakePointer_, typename Device> struct TensorReductionEvaluatorBase, Device> { typedef internal::reducer_traits ReducerTraits; typedef Dims ReducedDims; typedef TensorReductionOp XprType; typedef typename XprType::Index Index; typedef ArgType ChildType; typedef typename TensorEvaluator::Dimensions InputDimensions; static const int NumInputDims = internal::array_size::value; static const int NumReducedDims = internal::array_size::value; static const int NumOutputDims = NumInputDims - NumReducedDims; typedef typename internal::conditional, DSizes >::type Dimensions; typedef typename XprType::Scalar Scalar; typedef TensorReductionEvaluatorBase, Device> Self; static const bool InputPacketAccess = TensorEvaluator::PacketAccess; typedef typename internal::ReductionReturnType::type CoeffReturnType; typedef typename PacketType::type PacketReturnType; static const Index PacketSize = PacketType::size; typedef typename Eigen::internal::traits::PointerType TensorPointerType; typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; // Subset of strides of the input tensor for the non-reduced dimensions. // Indexed by output dimensions. static const int NumPreservedStrides = max_n_1::size; enum { IsAligned = false, PacketAccess = Self::InputPacketAccess && ReducerTraits::PacketAccess, BlockAccess = false, PreferBlockAccess = true, Layout = TensorEvaluator::Layout, CoordAccess = false, // to be implemented RawAccess = false }; typedef typename internal::remove_const::type ScalarNoConst; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockNotImplemented TensorBlock; //===--------------------------------------------------------------------===// static const bool ReducingInnerMostDims = internal::are_inner_most_dims::value; static const bool PreservingInnerMostDims = internal::preserve_inner_most_dims::value; static const bool RunningFullReduction = (NumOutputDims==0); EIGEN_STRONG_INLINE TensorReductionEvaluatorBase(const XprType& op, const Device& device) : m_impl(op.expression(), device), m_reducer(op.reducer()), m_result(NULL), m_device(device) { EIGEN_STATIC_ASSERT((NumInputDims >= NumReducedDims), YOU_MADE_A_PROGRAMMING_MISTAKE); EIGEN_STATIC_ASSERT((!ReducingInnerMostDims | !PreservingInnerMostDims | (NumReducedDims == NumInputDims)), YOU_MADE_A_PROGRAMMING_MISTAKE); // Build the bitmap indicating if an input dimension is reduced or not. for (int i = 0; i < NumInputDims; ++i) { m_reduced[i] = false; } for (int i = 0; i < NumReducedDims; ++i) { eigen_assert(op.dims()[i] >= 0); eigen_assert(op.dims()[i] < NumInputDims); m_reduced[op.dims()[i]] = true; } const typename TensorEvaluator::Dimensions& input_dims = m_impl.dimensions(); internal::DimInitializer::run(input_dims, m_reduced, &m_dimensions, &m_reducedDims); // Precompute output strides. if (NumOutputDims > 0) { if (static_cast(Layout) == static_cast(ColMajor)) { m_outputStrides[0] = 1; for (int i = 1; i < NumOutputDims; ++i) { m_outputStrides[i] = m_outputStrides[i - 1] * m_dimensions[i - 1]; m_fastOutputStrides[i] = internal::TensorIntDivisor(m_outputStrides[i]); } } else { m_outputStrides[NumOutputDims - 1] = 1; for (int i = NumOutputDims - 2; i >= 0; --i) { m_outputStrides[i] = m_outputStrides[i + 1] * m_dimensions[i + 1]; m_fastOutputStrides[i] = internal::TensorIntDivisor(m_outputStrides[i]); } } } // Precompute input strides. if (NumInputDims > 0) { array input_strides; if (static_cast(Layout) == static_cast(ColMajor)) { input_strides[0] = 1; for (int i = 1; i < NumInputDims; ++i) { input_strides[i] = input_strides[i-1] * input_dims[i-1]; } } else { input_strides.back() = 1; for (int i = NumInputDims - 2; i >= 0; --i) { input_strides[i] = input_strides[i + 1] * input_dims[i + 1]; } } int outputIndex = 0; int reduceIndex = 0; for (int i = 0; i < NumInputDims; ++i) { if (m_reduced[i]) { m_reducedStrides[reduceIndex] = input_strides[i]; ++reduceIndex; } else { m_preservedStrides[outputIndex] = input_strides[i]; m_output_to_input_dim_map[outputIndex] = i; ++outputIndex; } } } // Special case for full reductions if (NumOutputDims == 0) { m_preservedStrides[0] = internal::array_prod(input_dims); } m_numValuesToReduce = NumOutputDims == 0 ? internal::array_prod(input_dims) : (static_cast(Layout) == static_cast(ColMajor)) ? m_preservedStrides[0] : m_preservedStrides[NumOutputDims - 1]; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; } EIGEN_STRONG_INLINE bool evalSubExprsIfNeededCommon(EvaluatorPointerType data) { // Use the FullReducer if possible. if ((RunningFullReduction && RunningOnSycl) ||(RunningFullReduction && internal::FullReducer::HasOptimizedImplementation && ((RunningOnGPU && (m_device.majorDeviceVersion() >= 3)) || !RunningOnGPU))) { bool need_assign = false; if (!data) { m_result = static_cast(m_device.get((CoeffReturnType*)m_device.allocate_temp(sizeof(CoeffReturnType)))); data = m_result; need_assign = true; } Op reducer(m_reducer); internal::FullReducer::run(*this, reducer, m_device, data); return need_assign; } // Attempt to use an optimized reduction. else if ((RunningOnGPU && (m_device.majorDeviceVersion() >= 3)) || (RunningOnSycl)) { bool reducing_inner_dims = true; for (int i = 0; i < NumReducedDims; ++i) { if (static_cast(Layout) == static_cast(ColMajor)) { reducing_inner_dims &= m_reduced[i]; } else { reducing_inner_dims &= m_reduced[NumInputDims - 1 - i]; } } if (internal::InnerReducer::HasOptimizedImplementation && (reducing_inner_dims || ReducingInnerMostDims)) { const Index num_values_to_reduce = internal::array_prod(m_reducedDims); const Index num_coeffs_to_preserve = internal::array_prod(m_dimensions); if (!data) { if ((num_coeffs_to_preserve < 1024 && num_values_to_reduce > num_coeffs_to_preserve && num_values_to_reduce > 128) || (RunningOnSycl)) { data = static_cast(m_device.get((CoeffReturnType*)m_device.allocate_temp(sizeof(CoeffReturnType) * num_coeffs_to_preserve))); m_result = data; } else { return true; } } Op reducer(m_reducer); // For SYCL this if always return false if (internal::InnerReducer::run(*this, reducer, m_device, data, num_values_to_reduce, num_coeffs_to_preserve)) { if (m_result) { m_device.deallocate_temp(m_result); m_result = NULL; } return true; } else { return (m_result != NULL); } } bool preserving_inner_dims = true; for (int i = 0; i < NumReducedDims; ++i) { if (static_cast(Layout) == static_cast(ColMajor)) { preserving_inner_dims &= m_reduced[NumInputDims - 1 - i]; } else { preserving_inner_dims &= m_reduced[i]; } } if (internal::OuterReducer::HasOptimizedImplementation && preserving_inner_dims) { const Index num_values_to_reduce = internal::array_prod(m_reducedDims); const Index num_coeffs_to_preserve = internal::array_prod(m_dimensions); if (!data) { if ((num_coeffs_to_preserve < 1024 && num_values_to_reduce > num_coeffs_to_preserve && num_values_to_reduce > 32) || (RunningOnSycl)) { data = static_cast(m_device.get((CoeffReturnType*)m_device.allocate_temp(sizeof(CoeffReturnType) * num_coeffs_to_preserve))); m_result = data; } else { return true; } } Op reducer(m_reducer); // For SYCL this if always return false if (internal::OuterReducer::run(*this, reducer, m_device, data, num_values_to_reduce, num_coeffs_to_preserve)) { if (m_result) { m_device.deallocate_temp(m_result); m_result = NULL; } return true; } else { return (m_result != NULL); } } #if defined(EIGEN_USE_SYCL) // If there is no Optimised version for SYCL, the reduction expression // must break into two subexpression and use the SYCL generic Reducer on the device. if(RunningOnSycl) { const Index num_values_to_reduce = internal::array_prod(m_reducedDims); const Index num_coeffs_to_preserve = internal::array_prod(m_dimensions); if (!data) { data = static_cast(m_device.get((CoeffReturnType*)m_device.allocate_temp(sizeof(CoeffReturnType) * num_coeffs_to_preserve))); m_result = data; } Op reducer(m_reducer); internal::GenericReducer::run(*this, reducer, m_device, data, num_values_to_reduce, num_coeffs_to_preserve); return (m_result != NULL); } #endif } return true; } #ifdef EIGEN_USE_THREADS template EIGEN_STRONG_INLINE void evalSubExprsIfNeededAsync(EvaluatorPointerType data, EvalSubExprsCallback done) { m_impl.evalSubExprsIfNeededAsync(NULL, [this, data, done](bool) { done(evalSubExprsIfNeededCommon(data)); }); } #endif EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(EvaluatorPointerType data) { m_impl.evalSubExprsIfNeeded(NULL); return evalSubExprsIfNeededCommon(data); } EIGEN_STRONG_INLINE void cleanup() { m_impl.cleanup(); if (m_result) { m_device.deallocate_temp(m_result); m_result = NULL; } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { if (( RunningFullReduction || RunningOnGPU) && m_result ) { return *(m_result + index); } Op reducer(m_reducer); if (ReducingInnerMostDims || RunningFullReduction) { const Index num_values_to_reduce = (static_cast(Layout) == static_cast(ColMajor)) ? m_preservedStrides[0] : m_preservedStrides[NumPreservedStrides - 1]; return internal::InnerMostDimReducer::reduce(*this, firstInput(index), num_values_to_reduce, reducer); } else { typename Self::CoeffReturnType accum = reducer.initialize(); internal::GenericDimReducer::reduce(*this, firstInput(index), reducer, &accum); return reducer.finalize(accum); } } // TODO(bsteiner): provide a more efficient implementation. template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const { EIGEN_STATIC_ASSERT((PacketSize > 1), YOU_MADE_A_PROGRAMMING_MISTAKE) eigen_assert(index + PacketSize - 1 < Index(internal::array_prod(dimensions()))); if (RunningOnGPU && m_result) { return internal::pload(m_result + index); } EIGEN_ALIGN_MAX typename internal::remove_const::type values[PacketSize]; if (ReducingInnerMostDims) { const Index num_values_to_reduce = (static_cast(Layout) == static_cast(ColMajor)) ? m_preservedStrides[0] : m_preservedStrides[NumPreservedStrides - 1]; const Index firstIndex = firstInput(index); for (Index i = 0; i < PacketSize; ++i) { Op reducer(m_reducer); values[i] = internal::InnerMostDimReducer::reduce(*this, firstIndex + i * num_values_to_reduce, num_values_to_reduce, reducer); } } else if (PreservingInnerMostDims) { const Index firstIndex = firstInput(index); const int innermost_dim = (static_cast(Layout) == static_cast(ColMajor)) ? 0 : NumOutputDims - 1; // TBD: extend this the the n innermost dimensions that we preserve. if (((firstIndex % m_dimensions[innermost_dim]) + PacketSize - 1) < m_dimensions[innermost_dim]) { Op reducer(m_reducer); typename Self::PacketReturnType accum = reducer.template initializePacket(); internal::InnerMostDimPreserver::reduce(*this, firstIndex, reducer, &accum); return reducer.finalizePacket(accum); } else { for (int i = 0; i < PacketSize; ++i) { values[i] = coeff(index + i); } } } else { for (int i = 0; i < PacketSize; ++i) { values[i] = coeff(index + i); } } PacketReturnType rslt = internal::pload(values); return rslt; } // Must be called after evalSubExprsIfNeeded(). EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost costPerCoeff(bool vectorized) const { if (RunningFullReduction && m_result) { return TensorOpCost(sizeof(CoeffReturnType), 0, 0, vectorized, PacketSize); } else { const Index num_values_to_reduce = internal::array_prod(m_reducedDims); const double compute_cost = num_values_to_reduce * internal::functor_traits::Cost; return m_impl.costPerCoeff(vectorized) * num_values_to_reduce + TensorOpCost(0, 0, compute_cost, vectorized, PacketSize); } } EIGEN_DEVICE_FUNC EvaluatorPointerType data() const { return m_result; } EIGEN_DEVICE_FUNC const TensorEvaluator& impl() const { return m_impl; } EIGEN_DEVICE_FUNC const Device& device() const { return m_device; } #ifdef EIGEN_USE_SYCL // binding placeholder accessors to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler &cgh) const { m_impl.bind(cgh); m_result.bind(cgh); } #endif private: template friend struct internal::GenericDimReducer; template friend struct internal::InnerMostDimReducer; template friend struct internal::InnerMostDimPreserver; template friend struct internal::FullReducer; #ifdef EIGEN_USE_THREADS template friend struct internal::FullReducerShard; #endif #if defined(EIGEN_USE_GPU) && (defined(EIGEN_GPUCC)) template KERNEL_FRIEND void internal::FullReductionKernel(R, const S, I_, typename S::CoeffReturnType*, unsigned int*); #if defined(EIGEN_HAS_GPU_FP16) template KERNEL_FRIEND void internal::ReductionInitFullReduxKernelHalfFloat(R, const S, I_, internal::packet_traits::type*); template KERNEL_FRIEND void internal::FullReductionKernelHalfFloat(R, const S, I_, half*, internal::packet_traits::type*); template KERNEL_FRIEND void internal::InnerReductionKernelHalfFloat(R, const S, I_, I_, half*); #endif template KERNEL_FRIEND void internal::InnerReductionKernel(R, const S, I_, I_, typename S::CoeffReturnType*); template KERNEL_FRIEND void internal::OuterReductionKernel(R, const S, I_, I_, typename S::CoeffReturnType*); #endif #if defined(EIGEN_USE_SYCL) template < typename Evaluator_, typename Op__> friend class TensorSycl::internal::GenericNondeterministicReducer; // SYCL need the Generic reducer for the case the recution algorithm is neither inner, outer, and full reducer template friend struct internal::GenericReducer; #endif template friend struct internal::InnerReducer; struct BlockIteratorState { Index input_dim; Index output_size; Index output_count; }; // Returns the Index in the input tensor of the first value that needs to be // used to compute the reduction at output index "index". EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index firstInput(Index index) const { if (ReducingInnerMostDims) { if (static_cast(Layout) == static_cast(ColMajor)) { return index * m_preservedStrides[0]; } else { return index * m_preservedStrides[NumPreservedStrides - 1]; } } // TBD: optimize the case where we preserve the innermost dimensions. Index startInput = 0; if (static_cast(Layout) == static_cast(ColMajor)) { for (int i = NumOutputDims - 1; i > 0; --i) { // This is index_i in the output tensor. const Index idx = index / m_outputStrides[i]; startInput += idx * m_preservedStrides[i]; index -= idx * m_outputStrides[i]; } if (PreservingInnerMostDims) { eigen_assert(m_preservedStrides[0] == 1); startInput += index; } else { startInput += index * m_preservedStrides[0]; } } else { for (int i = 0; i < NumOutputDims - 1; ++i) { // This is index_i in the output tensor. const Index idx = index / m_outputStrides[i]; startInput += idx * m_preservedStrides[i]; index -= idx * m_outputStrides[i]; } if (PreservingInnerMostDims) { eigen_assert(m_preservedStrides[NumPreservedStrides - 1] == 1); startInput += index; } else { startInput += index * m_preservedStrides[NumPreservedStrides - 1]; } } return startInput; } // Bitmap indicating if an input dimension is reduced or not. array m_reduced; // Dimensions of the output of the operation. Dimensions m_dimensions; // Precomputed strides for the output tensor. array m_outputStrides; array, NumOutputDims> m_fastOutputStrides; array m_preservedStrides; // Map from output to input dimension index. array m_output_to_input_dim_map; // How many values go into each reduction Index m_numValuesToReduce; // Subset of strides of the input tensor for the reduced dimensions. // Indexed by reduced dimensions. array m_reducedStrides; // Size of the input dimensions that are reduced. // Indexed by reduced dimensions. array m_reducedDims; // Evaluator for the input expression. TensorEvaluator m_impl; // Operation to apply for computing the reduction. Op m_reducer; // For full reductions #if defined(EIGEN_USE_GPU) && (defined(EIGEN_GPUCC)) static const bool RunningOnGPU = internal::is_same::value; static const bool RunningOnSycl = false; #elif defined(EIGEN_USE_SYCL) static const bool RunningOnSycl = internal::is_same::type, Eigen::SyclDevice>::value; static const bool RunningOnGPU = false; #else static const bool RunningOnGPU = false; static const bool RunningOnSycl = false; #endif EvaluatorPointerType m_result; const Device EIGEN_DEVICE_REF m_device; }; template class MakePointer_, typename Device> struct TensorEvaluator, Device> : public TensorReductionEvaluatorBase, Device> { typedef TensorReductionEvaluatorBase, Device> Base; EIGEN_STRONG_INLINE TensorEvaluator(const typename Base::XprType& op, const Device& device) : Base(op, device){} }; template class MakePointer_> struct TensorEvaluator, Eigen::SyclDevice> : public TensorReductionEvaluatorBase, Eigen::SyclDevice> { typedef TensorReductionEvaluatorBase, Eigen::SyclDevice> Base; EIGEN_STRONG_INLINE TensorEvaluator(const typename Base::XprType& op, const Eigen::SyclDevice& device) : Base(op, device){} // The coeff function in the base the recursive method which is not an standard layout and cannot be used in the SYCL kernel //Therefore the coeff function should be overridden by for SYCL kernel EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename Base::CoeffReturnType coeff(typename Base::Index index) const { return *(this->data() + index); } // The packet function in the base the recursive method which is not an standard layout and cannot be used in the SYCL kernel //Therefore the packet function should be overridden by for SYCL kernel template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename Base::PacketReturnType packet(typename Base::Index index) const { return internal::pload(this->data() + index); } }; } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_REDUCTION_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorDeviceDefault.h0000644000176200001440000000654314562417221027076 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_DEVICE_DEFAULT_H #define EIGEN_CXX11_TENSOR_TENSOR_DEVICE_DEFAULT_H namespace Eigen { // Default device for the machine (typically a single cpu core) struct DefaultDevice { EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void* allocate(size_t num_bytes) const { return internal::aligned_malloc(num_bytes); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void deallocate(void* buffer) const { internal::aligned_free(buffer); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void* allocate_temp(size_t num_bytes) const { return allocate(num_bytes); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void deallocate_temp(void* buffer) const { deallocate(buffer); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void memcpy(void* dst, const void* src, size_t n) const { ::memcpy(dst, src, n); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void memcpyHostToDevice(void* dst, const void* src, size_t n) const { memcpy(dst, src, n); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void memcpyDeviceToHost(void* dst, const void* src, size_t n) const { memcpy(dst, src, n); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void memset(void* buffer, int c, size_t n) const { ::memset(buffer, c, n); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Type get(Type data) const { return data; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE size_t numThreads() const { #if !defined(EIGEN_GPU_COMPILE_PHASE) // Running on the host CPU return 1; #elif defined(EIGEN_HIP_DEVICE_COMPILE) // Running on a HIP device return 64; #else // Running on a CUDA device return 32; #endif } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE size_t firstLevelCacheSize() const { #if !defined(EIGEN_GPU_COMPILE_PHASE) && !defined(SYCL_DEVICE_ONLY) // Running on the host CPU return l1CacheSize(); #elif defined(EIGEN_HIP_DEVICE_COMPILE) // Running on a HIP device return 48*1024; // FIXME : update this number for HIP #else // Running on a CUDA device, return the amount of shared memory available. return 48*1024; #endif } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE size_t lastLevelCacheSize() const { #if !defined(EIGEN_GPU_COMPILE_PHASE) && !defined(SYCL_DEVICE_ONLY) // Running single threaded on the host CPU return l3CacheSize(); #elif defined(EIGEN_HIP_DEVICE_COMPILE) // Running on a HIP device return firstLevelCacheSize(); // FIXME : update this number for HIP #else // Running on a CUDA device return firstLevelCacheSize(); #endif } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE int majorDeviceVersion() const { #if !defined(EIGEN_GPU_COMPILE_PHASE) // Running single threaded on the host CPU // Should return an enum that encodes the ISA supported by the CPU return 1; #elif defined(EIGEN_HIP_DEVICE_COMPILE) // Running on a HIP device // return 1 as major for HIP return 1; #else // Running on a CUDA device return EIGEN_CUDA_ARCH / 100; #endif } }; } // namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_DEVICE_DEFAULT_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorStorage.h0000644000176200001440000001255114562417221025772 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2013 Christian Seiler // Copyright (C) 2014-2015 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSORSTORAGE_H #define EIGEN_CXX11_TENSOR_TENSORSTORAGE_H #ifdef EIGEN_TENSOR_STORAGE_CTOR_PLUGIN #define EIGEN_INTERNAL_TENSOR_STORAGE_CTOR_PLUGIN EIGEN_TENSOR_STORAGE_CTOR_PLUGIN; #else #define EIGEN_INTERNAL_TENSOR_STORAGE_CTOR_PLUGIN #endif namespace Eigen { /** \internal * * \class TensorStorage * \ingroup CXX11_Tensor_Module * * \brief Stores the data of a tensor * * This class stores the data of fixed-size, dynamic-size or mixed tensors * in a way as compact as possible. * * \sa Tensor */ template class TensorStorage; // Pure fixed-size storage template class TensorStorage { private: static const std::size_t Size = FixedDimensions::total_size; // Allocate an array of size at least one to prevent compiler warnings. static const std::size_t MinSize = max_n_1::size; EIGEN_ALIGN_MAX T m_data[MinSize]; public: EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorStorage() { } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T *data() { return m_data; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const T *data() const { return m_data; } static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const FixedDimensions& dimensions() { static const FixedDimensions* singleton_dimensions = new FixedDimensions(); return *singleton_dimensions; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE DenseIndex size() const { return Size; } }; // pure dynamic template class TensorStorage, Options_> { public: typedef IndexType Index; typedef DSizes Dimensions; typedef TensorStorage, Options_> Self; EIGEN_DEVICE_FUNC TensorStorage() : m_data(0), m_dimensions() { if (NumIndices_ == 0) { m_data = internal::conditional_aligned_new_auto(1); } } EIGEN_DEVICE_FUNC TensorStorage(internal::constructor_without_unaligned_array_assert) : m_data(0), m_dimensions(internal::template repeat(0)) {} EIGEN_DEVICE_FUNC TensorStorage(Index size, const array& dimensions) : m_data(internal::conditional_aligned_new_auto(size)), m_dimensions(dimensions) { EIGEN_INTERNAL_TENSOR_STORAGE_CTOR_PLUGIN } #if EIGEN_HAS_VARIADIC_TEMPLATES template EIGEN_DEVICE_FUNC TensorStorage(DenseIndex... indices) : m_dimensions(indices...) { m_data = internal::conditional_aligned_new_auto(internal::array_prod(m_dimensions)); } #endif EIGEN_DEVICE_FUNC TensorStorage(const Self& other) : m_data(internal::conditional_aligned_new_auto(internal::array_prod(other.m_dimensions))) , m_dimensions(other.m_dimensions) { internal::smart_copy(other.m_data, other.m_data+internal::array_prod(other.m_dimensions), m_data); } EIGEN_DEVICE_FUNC Self& operator=(const Self& other) { if (this != &other) { Self tmp(other); this->swap(tmp); } return *this; } #if EIGEN_HAS_RVALUE_REFERENCES EIGEN_DEVICE_FUNC TensorStorage(Self&& other) : TensorStorage() { *this = std::move(other); } EIGEN_DEVICE_FUNC Self& operator=(Self&& other) { numext::swap(m_data, other.m_data); numext::swap(m_dimensions, other.m_dimensions); return *this; } #endif EIGEN_DEVICE_FUNC ~TensorStorage() { internal::conditional_aligned_delete_auto(m_data, internal::array_prod(m_dimensions)); } EIGEN_DEVICE_FUNC void swap(Self& other) { numext::swap(m_data,other.m_data); numext::swap(m_dimensions,other.m_dimensions); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const {return m_dimensions;} EIGEN_DEVICE_FUNC void resize(Index size, const array& nbDimensions) { const Index currentSz = internal::array_prod(m_dimensions); if(size != currentSz) { internal::conditional_aligned_delete_auto(m_data, currentSz); if (size) m_data = internal::conditional_aligned_new_auto(size); else if (NumIndices_ == 0) { m_data = internal::conditional_aligned_new_auto(1); } else m_data = 0; EIGEN_INTERNAL_DENSE_STORAGE_CTOR_PLUGIN({}) } m_dimensions = nbDimensions; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T *data() { return m_data; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const T *data() const { return m_data; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index size() const { return m_dimensions.TotalSize(); } private: T *m_data; Dimensions m_dimensions; }; } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSORSTORAGE_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorDeviceThreadPool.h0000644000176200001440000003554314562417221027555 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #if defined(EIGEN_USE_THREADS) && !defined(EIGEN_CXX11_TENSOR_TENSOR_DEVICE_THREAD_POOL_H) #define EIGEN_CXX11_TENSOR_TENSOR_DEVICE_THREAD_POOL_H namespace Eigen { // Runs an arbitrary function and then calls Notify() on the passed in // Notification. template struct FunctionWrapperWithNotification { static void run(Notification* n, Function f, Args... args) { f(args...); if (n) { n->Notify(); } } }; template struct FunctionWrapperWithBarrier { static void run(Barrier* b, Function f, Args... args) { f(args...); if (b) { b->Notify(); } } }; template static EIGEN_STRONG_INLINE void wait_until_ready(SyncType* n) { if (n) { n->Wait(); } } // An abstract interface to a device specific memory allocator. class Allocator { public: virtual ~Allocator() {} virtual void* allocate(size_t num_bytes) const = 0; virtual void deallocate(void* buffer) const = 0; }; // Build a thread pool device on top the an existing pool of threads. struct ThreadPoolDevice { // The ownership of the thread pool remains with the caller. ThreadPoolDevice(ThreadPoolInterface* pool, int num_cores, Allocator* allocator = nullptr) : pool_(pool), num_threads_(num_cores), allocator_(allocator) { } EIGEN_STRONG_INLINE void* allocate(size_t num_bytes) const { return allocator_ ? allocator_->allocate(num_bytes) : internal::aligned_malloc(num_bytes); } EIGEN_STRONG_INLINE void deallocate(void* buffer) const { if (allocator_) { allocator_->deallocate(buffer); } else { internal::aligned_free(buffer); } } EIGEN_STRONG_INLINE void* allocate_temp(size_t num_bytes) const { return allocate(num_bytes); } EIGEN_STRONG_INLINE void deallocate_temp(void* buffer) const { deallocate(buffer); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Type get(Type data) const { return data; } EIGEN_STRONG_INLINE void memcpy(void* dst, const void* src, size_t n) const { #ifdef __ANDROID__ ::memcpy(dst, src, n); #else // TODO(rmlarsen): Align blocks on cache lines. // We have observed that going beyond 4 threads usually just wastes // CPU cycles due to the threads competing for memory bandwidth, so we // statically schedule at most 4 block copies here. const size_t kMinBlockSize = 32768; const size_t num_threads = CostModel::numThreads(n, TensorOpCost(1.0, 1.0, 0), 4); if (n <= kMinBlockSize || num_threads < 2) { ::memcpy(dst, src, n); } else { const char* src_ptr = static_cast(src); char* dst_ptr = static_cast(dst); const size_t blocksize = (n + (num_threads - 1)) / num_threads; Barrier barrier(static_cast(num_threads - 1)); // Launch the last 3 blocks on worker threads. for (size_t i = 1; i < num_threads; ++i) { enqueue_with_barrier(&barrier, [n, i, src_ptr, dst_ptr, blocksize] { ::memcpy(dst_ptr + i * blocksize, src_ptr + i * blocksize, numext::mini(blocksize, n - (i * blocksize))); }); } // Launch the first block on the main thread. ::memcpy(dst_ptr, src_ptr, blocksize); barrier.Wait(); } #endif } EIGEN_STRONG_INLINE void memcpyHostToDevice(void* dst, const void* src, size_t n) const { memcpy(dst, src, n); } EIGEN_STRONG_INLINE void memcpyDeviceToHost(void* dst, const void* src, size_t n) const { memcpy(dst, src, n); } EIGEN_STRONG_INLINE void memset(void* buffer, int c, size_t n) const { ::memset(buffer, c, n); } EIGEN_STRONG_INLINE int numThreads() const { return num_threads_; } // Number of theads available in the underlying thread pool. This number can // be different from the value returned by numThreads(). EIGEN_STRONG_INLINE int numThreadsInPool() const { return pool_->NumThreads(); } EIGEN_STRONG_INLINE size_t firstLevelCacheSize() const { return l1CacheSize(); } EIGEN_STRONG_INLINE size_t lastLevelCacheSize() const { // The l3 cache size is shared between all the cores. return l3CacheSize() / num_threads_; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE int majorDeviceVersion() const { // Should return an enum that encodes the ISA supported by the CPU return 1; } template EIGEN_STRONG_INLINE Notification* enqueue(Function&& f, Args&&... args) const { Notification* n = new Notification(); pool_->Schedule( std::bind(&FunctionWrapperWithNotification::run, n, std::move(f), args...)); return n; } template EIGEN_STRONG_INLINE void enqueue_with_barrier(Barrier* b, Function&& f, Args&&... args) const { pool_->Schedule( std::bind(&FunctionWrapperWithBarrier::run, b, std::move(f), args...)); } template EIGEN_STRONG_INLINE void enqueueNoNotification(Function&& f, Args&&... args) const { if (sizeof...(args) > 0) { pool_->Schedule(std::bind(std::move(f), args...)); } else { pool_->Schedule(std::move(f)); } } // Returns a logical thread index between 0 and pool_->NumThreads() - 1 if // called from one of the threads in pool_. Returns -1 otherwise. EIGEN_STRONG_INLINE int currentThreadId() const { return pool_->CurrentThreadId(); } // WARNING: This function is synchronous and will block the calling thread. // // Synchronous parallelFor executes f with [0, n) arguments in parallel and // waits for completion. F accepts a half-open interval [first, last). Block // size is chosen based on the iteration cost and resulting parallel // efficiency. If block_align is not nullptr, it is called to round up the // block size. void parallelFor(Index n, const TensorOpCost& cost, std::function block_align, std::function f) const { if (EIGEN_PREDICT_FALSE(n <= 0)){ return; // Compute small problems directly in the caller thread. } else if (n == 1 || numThreads() == 1 || CostModel::numThreads(n, cost, static_cast(numThreads())) == 1) { f(0, n); return; } // Compute block size and total count of blocks. ParallelForBlock block = CalculateParallelForBlock(n, cost, block_align); // Recursively divide size into halves until we reach block_size. // Division code rounds mid to block_size, so we are guaranteed to get // block_count leaves that do actual computations. Barrier barrier(static_cast(block.count)); std::function handleRange; handleRange = [=, &handleRange, &barrier, &f](Index firstIdx, Index lastIdx) { while (lastIdx - firstIdx > block.size) { // Split into halves and schedule the second half on a different thread. const Index midIdx = firstIdx + divup((lastIdx - firstIdx) / 2, block.size) * block.size; pool_->Schedule([=, &handleRange]() { handleRange(midIdx, lastIdx); }); lastIdx = midIdx; } // Single block or less, execute directly. f(firstIdx, lastIdx); barrier.Notify(); }; if (block.count <= numThreads()) { // Avoid a thread hop by running the root of the tree and one block on the // main thread. handleRange(0, n); } else { // Execute the root in the thread pool to avoid running work on more than // numThreads() threads. pool_->Schedule([=, &handleRange]() { handleRange(0, n); }); } barrier.Wait(); } // Convenience wrapper for parallelFor that does not align blocks. void parallelFor(Index n, const TensorOpCost& cost, std::function f) const { parallelFor(n, cost, nullptr, std::move(f)); } // WARNING: This function is asynchronous and will not block the calling thread. // // Asynchronous parallelFor executes f with [0, n) arguments in parallel // without waiting for completion. When the last block finished, it will call // 'done' callback. F accepts a half-open interval [first, last). Block size // is chosen based on the iteration cost and resulting parallel efficiency. If // block_align is not nullptr, it is called to round up the block size. void parallelForAsync(Index n, const TensorOpCost& cost, std::function block_align, std::function f, std::function done) const { // Compute small problems directly in the caller thread. if (n <= 1 || numThreads() == 1 || CostModel::numThreads(n, cost, static_cast(numThreads())) == 1) { f(0, n); done(); return; } // Compute block size and total count of blocks. ParallelForBlock block = CalculateParallelForBlock(n, cost, block_align); ParallelForAsyncContext* const ctx = new ParallelForAsyncContext(block.count, std::move(f), std::move(done)); // Recursively divide size into halves until we reach block_size. // Division code rounds mid to block_size, so we are guaranteed to get // block_count leaves that do actual computations. ctx->handle_range = [this, ctx, block](Index firstIdx, Index lastIdx) { while (lastIdx - firstIdx > block.size) { // Split into halves and schedule the second half on a different thread. const Index midIdx = firstIdx + divup((lastIdx - firstIdx) / 2, block.size) * block.size; pool_->Schedule( [ctx, midIdx, lastIdx]() { ctx->handle_range(midIdx, lastIdx); }); lastIdx = midIdx; } // Single block or less, execute directly. ctx->f(firstIdx, lastIdx); // Delete async context if it was the last block. if (ctx->count.fetch_sub(1) == 1) delete ctx; }; if (block.count <= numThreads()) { // Avoid a thread hop by running the root of the tree and one block on the // main thread. ctx->handle_range(0, n); } else { // Execute the root in the thread pool to avoid running work on more than // numThreads() threads. pool_->Schedule([ctx, n]() { ctx->handle_range(0, n); }); } } // Convenience wrapper for parallelForAsync that does not align blocks. void parallelForAsync(Index n, const TensorOpCost& cost, std::function f, std::function done) const { parallelForAsync(n, cost, nullptr, std::move(f), std::move(done)); } // Thread pool accessor. ThreadPoolInterface* getPool() const { return pool_; } // Allocator accessor. Allocator* allocator() const { return allocator_; } private: typedef TensorCostModel CostModel; // For parallelForAsync we must keep passed in closures on the heap, and // delete them only after `done` callback finished. struct ParallelForAsyncContext { ParallelForAsyncContext(Index block_count, std::function block_f, std::function done_callback) : count(block_count), f(std::move(block_f)), done(std::move(done_callback)) {} ~ParallelForAsyncContext() { done(); } std::atomic count; std::function f; std::function done; std::function handle_range; }; struct ParallelForBlock { Index size; // block size Index count; // number of blocks }; // Calculates block size based on (1) the iteration cost and (2) parallel // efficiency. We want blocks to be not too small to mitigate parallelization // overheads; not too large to mitigate tail effect and potential load // imbalance and we also want number of blocks to be evenly dividable across // threads. ParallelForBlock CalculateParallelForBlock( const Index n, const TensorOpCost& cost, std::function block_align) const { const double block_size_f = 1.0 / CostModel::taskSize(1, cost); const Index max_oversharding_factor = 4; Index block_size = numext::mini( n, numext::maxi( divup(n, max_oversharding_factor * numThreads()), block_size_f)); const Index max_block_size = numext::mini(n, 2 * block_size); if (block_align) { Index new_block_size = block_align(block_size); eigen_assert(new_block_size >= block_size); block_size = numext::mini(n, new_block_size); } Index block_count = divup(n, block_size); // Calculate parallel efficiency as fraction of total CPU time used for // computations: double max_efficiency = static_cast(block_count) / (divup(block_count, numThreads()) * numThreads()); // Now try to increase block size up to max_block_size as long as it // doesn't decrease parallel efficiency. for (Index prev_block_count = block_count; max_efficiency < 1.0 && prev_block_count > 1;) { // This is the next block size that divides size into a smaller number // of blocks than the current block_size. Index coarser_block_size = divup(n, prev_block_count - 1); if (block_align) { Index new_block_size = block_align(coarser_block_size); eigen_assert(new_block_size >= coarser_block_size); coarser_block_size = numext::mini(n, new_block_size); } if (coarser_block_size > max_block_size) { break; // Reached max block size. Stop. } // Recalculate parallel efficiency. const Index coarser_block_count = divup(n, coarser_block_size); eigen_assert(coarser_block_count < prev_block_count); prev_block_count = coarser_block_count; const double coarser_efficiency = static_cast(coarser_block_count) / (divup(coarser_block_count, numThreads()) * numThreads()); if (coarser_efficiency + 0.01 >= max_efficiency) { // Taking it. block_size = coarser_block_size; block_count = coarser_block_count; if (max_efficiency < coarser_efficiency) { max_efficiency = coarser_efficiency; } } } return {block_size, block_count}; } ThreadPoolInterface* pool_; int num_threads_; Allocator* allocator_; }; } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_DEVICE_THREAD_POOL_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorChipping.h0000644000176200001440000004637314562417221026140 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_CHIPPING_H #define EIGEN_CXX11_TENSOR_TENSOR_CHIPPING_H namespace Eigen { /** \class TensorKChippingReshaping * \ingroup CXX11_Tensor_Module * * \brief A chip is a thin slice, corresponding to a column or a row in a 2-d tensor. * * */ namespace internal { template struct traits > : public traits { typedef typename XprType::Scalar Scalar; typedef traits XprTraits; typedef typename XprTraits::StorageKind StorageKind; typedef typename XprTraits::Index Index; typedef typename XprType::Nested Nested; typedef typename remove_reference::type _Nested; static const int NumDimensions = XprTraits::NumDimensions - 1; static const int Layout = XprTraits::Layout; typedef typename XprTraits::PointerType PointerType; }; template struct eval, Eigen::Dense> { typedef const TensorChippingOp EIGEN_DEVICE_REF type; }; template struct nested, 1, typename eval >::type> { typedef TensorChippingOp type; }; template struct DimensionId { EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE DimensionId(DenseIndex dim) { EIGEN_UNUSED_VARIABLE(dim); eigen_assert(dim == DimId); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE DenseIndex actualDim() const { return DimId; } }; template <> struct DimensionId { EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE DimensionId(DenseIndex dim) : actual_dim(dim) { eigen_assert(dim >= 0); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE DenseIndex actualDim() const { return actual_dim; } private: const DenseIndex actual_dim; }; } // end namespace internal template class TensorChippingOp : public TensorBase > { public: typedef TensorBase > Base; typedef typename Eigen::internal::traits::Scalar Scalar; typedef typename Eigen::NumTraits::Real RealScalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename Eigen::internal::nested::type Nested; typedef typename Eigen::internal::traits::StorageKind StorageKind; typedef typename Eigen::internal::traits::Index Index; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorChippingOp(const XprType& expr, const Index offset, const Index dim) : m_xpr(expr), m_offset(offset), m_dim(dim) { } EIGEN_DEVICE_FUNC const Index offset() const { return m_offset; } EIGEN_DEVICE_FUNC const Index dim() const { return m_dim.actualDim(); } EIGEN_DEVICE_FUNC const typename internal::remove_all::type& expression() const { return m_xpr; } EIGEN_TENSOR_INHERIT_ASSIGNMENT_OPERATORS(TensorChippingOp) protected: typename XprType::Nested m_xpr; const Index m_offset; const internal::DimensionId m_dim; }; // Eval as rvalue template struct TensorEvaluator, Device> { typedef TensorChippingOp XprType; static const int NumInputDims = internal::array_size::Dimensions>::value; static const int NumDims = NumInputDims-1; typedef typename XprType::Index Index; typedef DSizes Dimensions; typedef typename XprType::Scalar Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename PacketType::type PacketReturnType; static const int PacketSize = PacketType::size; typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; enum { // Alignment can't be guaranteed at compile time since it depends on the // slice offsets. IsAligned = false, Layout = TensorEvaluator::Layout, PacketAccess = TensorEvaluator::PacketAccess, BlockAccess = TensorEvaluator::BlockAccess, // Chipping of outer-most dimension is a trivial operation, because we can // read and write directly from the underlying tensor using single offset. IsOuterChipping = (static_cast(Layout) == ColMajor && DimId == NumInputDims - 1) || (static_cast(Layout) == RowMajor && DimId == 0), // Chipping inner-most dimension. IsInnerChipping = (static_cast(Layout) == ColMajor && DimId == 0) || (static_cast(Layout) == RowMajor && DimId == NumInputDims - 1), // Prefer block access if the underlying expression prefers it, otherwise // only if chipping is not trivial. PreferBlockAccess = TensorEvaluator::PreferBlockAccess || !IsOuterChipping, CoordAccess = false, // to be implemented RawAccess = false }; typedef typename internal::remove_const::type ScalarNoConst; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockDescriptor TensorBlockDesc; typedef internal::TensorBlockScratchAllocator TensorBlockScratch; typedef internal::TensorBlockDescriptor ArgTensorBlockDesc; typedef typename TensorEvaluator::TensorBlock ArgTensorBlock; typedef typename internal::TensorMaterializedBlock TensorBlock; //===--------------------------------------------------------------------===// EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device) : m_impl(op.expression(), device), m_dim(op.dim()), m_device(device) { EIGEN_STATIC_ASSERT((NumInputDims >= 1), YOU_MADE_A_PROGRAMMING_MISTAKE); eigen_assert(NumInputDims > m_dim.actualDim()); const typename TensorEvaluator::Dimensions& input_dims = m_impl.dimensions(); eigen_assert(op.offset() < input_dims[m_dim.actualDim()]); int j = 0; for (int i = 0; i < NumInputDims; ++i) { if (i != m_dim.actualDim()) { m_dimensions[j] = input_dims[i]; ++j; } } m_stride = 1; m_inputStride = 1; if (static_cast(Layout) == static_cast(ColMajor)) { for (int i = 0; i < m_dim.actualDim(); ++i) { m_stride *= input_dims[i]; m_inputStride *= input_dims[i]; } } else { for (int i = NumInputDims-1; i > m_dim.actualDim(); --i) { m_stride *= input_dims[i]; m_inputStride *= input_dims[i]; } } m_inputStride *= input_dims[m_dim.actualDim()]; m_inputOffset = m_stride * op.offset(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; } EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(EvaluatorPointerType) { m_impl.evalSubExprsIfNeeded(NULL); return true; } EIGEN_STRONG_INLINE void cleanup() { m_impl.cleanup(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { return m_impl.coeff(srcCoeff(index)); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const { EIGEN_STATIC_ASSERT((PacketSize > 1), YOU_MADE_A_PROGRAMMING_MISTAKE) eigen_assert(index+PacketSize-1 < dimensions().TotalSize()); if (isInnerChipping()) { // m_stride is equal to 1, so let's avoid the integer division. eigen_assert(m_stride == 1); Index inputIndex = index * m_inputStride + m_inputOffset; EIGEN_ALIGN_MAX typename internal::remove_const::type values[PacketSize]; EIGEN_UNROLL_LOOP for (int i = 0; i < PacketSize; ++i) { values[i] = m_impl.coeff(inputIndex); inputIndex += m_inputStride; } PacketReturnType rslt = internal::pload(values); return rslt; } else if (isOuterChipping()) { // m_stride is always greater than index, so let's avoid the integer division. eigen_assert(m_stride > index); return m_impl.template packet(index + m_inputOffset); } else { const Index idx = index / m_stride; const Index rem = index - idx * m_stride; if (rem + PacketSize <= m_stride) { Index inputIndex = idx * m_inputStride + m_inputOffset + rem; return m_impl.template packet(inputIndex); } else { // Cross the stride boundary. Fallback to slow path. EIGEN_ALIGN_MAX typename internal::remove_const::type values[PacketSize]; EIGEN_UNROLL_LOOP for (int i = 0; i < PacketSize; ++i) { values[i] = coeff(index); ++index; } PacketReturnType rslt = internal::pload(values); return rslt; } } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost costPerCoeff(bool vectorized) const { double cost = 0; if ((static_cast(Layout) == static_cast(ColMajor) && m_dim.actualDim() == 0) || (static_cast(Layout) == static_cast(RowMajor) && m_dim.actualDim() == NumInputDims - 1)) { cost += TensorOpCost::MulCost() + TensorOpCost::AddCost(); } else if ((static_cast(Layout) == static_cast(ColMajor) && m_dim.actualDim() == NumInputDims - 1) || (static_cast(Layout) == static_cast(RowMajor) && m_dim.actualDim() == 0)) { cost += TensorOpCost::AddCost(); } else { cost += 3 * TensorOpCost::MulCost() + TensorOpCost::DivCost() + 3 * TensorOpCost::AddCost(); } return m_impl.costPerCoeff(vectorized) + TensorOpCost(0, 0, cost, vectorized, PacketSize); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE internal::TensorBlockResourceRequirements getResourceRequirements() const { const size_t target_size = m_device.lastLevelCacheSize(); return internal::TensorBlockResourceRequirements::merge( internal::TensorBlockResourceRequirements::skewed(target_size), m_impl.getResourceRequirements()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorBlock block(TensorBlockDesc& desc, TensorBlockScratch& scratch, bool root_of_expr_ast = false) const { const Index chip_dim = m_dim.actualDim(); DSizes input_block_dims; for (int i = 0; i < NumInputDims; ++i) { input_block_dims[i] = i < chip_dim ? desc.dimension(i) : i > chip_dim ? desc.dimension(i - 1) : 1; } ArgTensorBlockDesc arg_desc(srcCoeff(desc.offset()), input_block_dims); // Try to reuse destination buffer for materializing argument block. if (desc.HasDestinationBuffer()) { DSizes arg_destination_strides; for (int i = 0; i < NumInputDims; ++i) { arg_destination_strides[i] = i < chip_dim ? desc.destination().strides()[i] : i > chip_dim ? desc.destination().strides()[i - 1] : 0; // for dimensions of size `1` stride should never be used. } arg_desc.template AddDestinationBuffer( desc.destination().template data(), arg_destination_strides); } ArgTensorBlock arg_block = m_impl.block(arg_desc, scratch, root_of_expr_ast); if (!arg_desc.HasDestinationBuffer()) desc.DropDestinationBuffer(); if (arg_block.data() != NULL) { // Forward argument block buffer if possible. return TensorBlock(arg_block.kind(), arg_block.data(), desc.dimensions()); } else { // Assign argument block expression to a buffer. // Prepare storage for the materialized chipping result. const typename TensorBlock::Storage block_storage = TensorBlock::prepareStorage(desc, scratch); typedef internal::TensorBlockAssignment< ScalarNoConst, NumInputDims, typename ArgTensorBlock::XprType, Index> TensorBlockAssignment; TensorBlockAssignment::Run( TensorBlockAssignment::target( arg_desc.dimensions(), internal::strides(arg_desc.dimensions()), block_storage.data()), arg_block.expr()); return block_storage.AsTensorMaterializedBlock(); } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename Storage::Type data() const { typename Storage::Type result = constCast(m_impl.data()); if (isOuterChipping() && result) { return result + m_inputOffset; } else { return NULL; } } #ifdef EIGEN_USE_SYCL // binding placeholder accessors to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler &cgh) const { m_impl.bind(cgh); } #endif protected: EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index srcCoeff(Index index) const { Index inputIndex; if (isInnerChipping()) { // m_stride is equal to 1, so let's avoid the integer division. eigen_assert(m_stride == 1); inputIndex = index * m_inputStride + m_inputOffset; } else if (isOuterChipping()) { // m_stride is always greater than index, so let's avoid the integer // division. eigen_assert(m_stride > index); inputIndex = index + m_inputOffset; } else { const Index idx = index / m_stride; inputIndex = idx * m_inputStride + m_inputOffset; index -= idx * m_stride; inputIndex += index; } return inputIndex; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool isInnerChipping() const { return IsInnerChipping || (static_cast(Layout) == ColMajor && m_dim.actualDim() == 0) || (static_cast(Layout) == RowMajor && m_dim.actualDim() == NumInputDims - 1); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool isOuterChipping() const { return IsOuterChipping || (static_cast(Layout) == ColMajor && m_dim.actualDim() == NumInputDims-1) || (static_cast(Layout) == RowMajor && m_dim.actualDim() == 0); } Dimensions m_dimensions; Index m_stride; Index m_inputOffset; Index m_inputStride; TensorEvaluator m_impl; const internal::DimensionId m_dim; const Device EIGEN_DEVICE_REF m_device; }; // Eval as lvalue template struct TensorEvaluator, Device> : public TensorEvaluator, Device> { typedef TensorEvaluator, Device> Base; typedef TensorChippingOp XprType; static const int NumInputDims = internal::array_size::Dimensions>::value; static const int NumDims = NumInputDims-1; typedef typename XprType::Index Index; typedef DSizes Dimensions; typedef typename XprType::Scalar Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename PacketType::type PacketReturnType; static const int PacketSize = PacketType::size; enum { IsAligned = false, PacketAccess = TensorEvaluator::PacketAccess, BlockAccess = TensorEvaluator::RawAccess, Layout = TensorEvaluator::Layout, RawAccess = false }; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockDescriptor TensorBlockDesc; //===--------------------------------------------------------------------===// EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device) : Base(op, device) { } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType& coeffRef(Index index) { return this->m_impl.coeffRef(this->srcCoeff(index)); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void writePacket(Index index, const PacketReturnType& x) { EIGEN_STATIC_ASSERT((PacketSize > 1), YOU_MADE_A_PROGRAMMING_MISTAKE) if (this->isInnerChipping()) { // m_stride is equal to 1, so let's avoid the integer division. eigen_assert(this->m_stride == 1); EIGEN_ALIGN_MAX typename internal::remove_const::type values[PacketSize]; internal::pstore(values, x); Index inputIndex = index * this->m_inputStride + this->m_inputOffset; EIGEN_UNROLL_LOOP for (int i = 0; i < PacketSize; ++i) { this->m_impl.coeffRef(inputIndex) = values[i]; inputIndex += this->m_inputStride; } } else if (this->isOuterChipping()) { // m_stride is always greater than index, so let's avoid the integer division. eigen_assert(this->m_stride > index); this->m_impl.template writePacket(index + this->m_inputOffset, x); } else { const Index idx = index / this->m_stride; const Index rem = index - idx * this->m_stride; if (rem + PacketSize <= this->m_stride) { const Index inputIndex = idx * this->m_inputStride + this->m_inputOffset + rem; this->m_impl.template writePacket(inputIndex, x); } else { // Cross stride boundary. Fallback to slow path. EIGEN_ALIGN_MAX typename internal::remove_const::type values[PacketSize]; internal::pstore(values, x); EIGEN_UNROLL_LOOP for (int i = 0; i < PacketSize; ++i) { this->coeffRef(index) = values[i]; ++index; } } } } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void writeBlock( const TensorBlockDesc& desc, const TensorBlock& block) { assert(this->m_impl.data() != NULL); const Index chip_dim = this->m_dim.actualDim(); DSizes input_block_dims; for (int i = 0; i < NumInputDims; ++i) { input_block_dims[i] = i < chip_dim ? desc.dimension(i) : i > chip_dim ? desc.dimension(i - 1) : 1; } typedef TensorReshapingOp, const typename TensorBlock::XprType> TensorBlockExpr; typedef internal::TensorBlockAssignment TensorBlockAssign; TensorBlockAssign::Run( TensorBlockAssign::target( input_block_dims, internal::strides(this->m_impl.dimensions()), this->m_impl.data(), this->srcCoeff(desc.offset())), block.expr().reshape(input_block_dims)); } }; } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_CHIPPING_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorDimensions.h0000644000176200001440000004252714562417221026504 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_DIMENSIONS_H #define EIGEN_CXX11_TENSOR_TENSOR_DIMENSIONS_H namespace Eigen { /** \internal * * \class TensorDimensions * \ingroup CXX11_Tensor_Module * * \brief Set of classes used to encode and store the dimensions of a Tensor. * * The Sizes class encodes as part of the type the number of dimensions and the * sizes corresponding to each dimension. It uses no storage space since it is * entirely known at compile time. * The DSizes class is its dynamic sibling: the number of dimensions is known * at compile time but the sizes are set during execution. * * \sa Tensor */ // Boilerplate code namespace internal { template struct dget { static const std::ptrdiff_t value = get::value; }; template struct fixed_size_tensor_index_linearization_helper { template EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Index run(array const& indices, const Dimensions& dimensions) { return array_get(indices) + dget::value * fixed_size_tensor_index_linearization_helper::run(indices, dimensions); } }; template struct fixed_size_tensor_index_linearization_helper { template EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Index run(array const&, const Dimensions&) { return 0; } }; template struct fixed_size_tensor_index_extraction_helper { template EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Index run(const Index index, const Dimensions& dimensions) { const Index mult = (index == n-1) ? 1 : 0; return array_get(dimensions) * mult + fixed_size_tensor_index_extraction_helper::run(index, dimensions); } }; template struct fixed_size_tensor_index_extraction_helper { template EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Index run(const Index, const Dimensions&) { return 0; } }; } // end namespace internal // Fixed size #ifndef EIGEN_EMULATE_CXX11_META_H template struct Sizes { typedef internal::numeric_list Base; const Base t = Base(); static const std::ptrdiff_t total_size = internal::arg_prod(Indices...); static const ptrdiff_t count = Base::count; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::ptrdiff_t rank() const { return Base::count; } static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::ptrdiff_t TotalSize() { return internal::arg_prod(Indices...); } EIGEN_DEVICE_FUNC Sizes() { } template explicit EIGEN_DEVICE_FUNC Sizes(const array& /*indices*/) { // todo: add assertion } #if EIGEN_HAS_VARIADIC_TEMPLATES template EIGEN_DEVICE_FUNC Sizes(DenseIndex...) { } explicit EIGEN_DEVICE_FUNC Sizes(std::initializer_list /*l*/) { // todo: add assertion } #endif template Sizes& operator = (const T& /*other*/) { // add assertion failure if the size of other is different return *this; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::ptrdiff_t operator[] (const std::ptrdiff_t index) const { return internal::fixed_size_tensor_index_extraction_helper::run(index, t); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ptrdiff_t IndexOfColMajor(const array& indices) const { return internal::fixed_size_tensor_index_linearization_helper::run(indices, t); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ptrdiff_t IndexOfRowMajor(const array& indices) const { return internal::fixed_size_tensor_index_linearization_helper::run(indices, t); } }; namespace internal { template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::ptrdiff_t array_prod(const Sizes&) { return Sizes::total_size; } } #else template struct non_zero_size { typedef internal::type2val type; }; template <> struct non_zero_size<0> { typedef internal::null_type type; }; template struct Sizes { typedef typename internal::make_type_list::type, typename non_zero_size::type, typename non_zero_size::type, typename non_zero_size::type, typename non_zero_size::type >::type Base; static const std::ptrdiff_t count = Base::count; static const std::ptrdiff_t total_size = internal::arg_prod::value; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ptrdiff_t rank() const { return count; } static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ptrdiff_t TotalSize() { return internal::arg_prod::value; } Sizes() { } template explicit Sizes(const array& /*indices*/) { // todo: add assertion } template Sizes& operator = (const T& /*other*/) { // add assertion failure if the size of other is different return *this; } #if EIGEN_HAS_VARIADIC_TEMPLATES template Sizes(DenseIndex... /*indices*/) { } explicit Sizes(std::initializer_list) { // todo: add assertion } #else EIGEN_DEVICE_FUNC explicit Sizes(const DenseIndex) { } EIGEN_DEVICE_FUNC Sizes(const DenseIndex, const DenseIndex) { } EIGEN_DEVICE_FUNC Sizes(const DenseIndex, const DenseIndex, const DenseIndex) { } EIGEN_DEVICE_FUNC Sizes(const DenseIndex, const DenseIndex, const DenseIndex, const DenseIndex) { } EIGEN_DEVICE_FUNC Sizes(const DenseIndex, const DenseIndex, const DenseIndex, const DenseIndex, const DenseIndex) { } #endif EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index operator[] (const Index index) const { switch (index) { case 0: return internal::get<0, Base>::value; case 1: return internal::get<1, Base>::value; case 2: return internal::get<2, Base>::value; case 3: return internal::get<3, Base>::value; case 4: return internal::get<4, Base>::value; default: eigen_assert(false && "index overflow"); return static_cast(-1); } } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ptrdiff_t IndexOfColMajor(const array& indices) const { return internal::fixed_size_tensor_index_linearization_helper::run(indices, *reinterpret_cast(this)); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ptrdiff_t IndexOfRowMajor(const array& indices) const { return internal::fixed_size_tensor_index_linearization_helper::run(indices, *reinterpret_cast(this)); } }; namespace internal { template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::ptrdiff_t array_prod(const Sizes&) { return Sizes::total_size; } } #endif // Boilerplate namespace internal { template struct tensor_index_linearization_helper { static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index run(array const& indices, array const& dimensions) { return array_get(indices) + array_get(dimensions) * tensor_index_linearization_helper::run(indices, dimensions); } }; template struct tensor_index_linearization_helper { static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index run(array const& indices, array const&) { return array_get(indices); } }; } // end namespace internal // Dynamic size template struct DSizes : array { typedef array Base; static const int count = NumDims; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index rank() const { return NumDims; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE DenseIndex TotalSize() const { return (NumDims == 0) ? 1 : internal::array_prod(*static_cast(this)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE DSizes() { for (int i = 0 ; i < NumDims; ++i) { (*this)[i] = 0; } } EIGEN_DEVICE_FUNC explicit DSizes(const array& a) : Base(a) { } EIGEN_DEVICE_FUNC explicit DSizes(const DenseIndex i0) { eigen_assert(NumDims == 1); (*this)[0] = i0; } EIGEN_DEVICE_FUNC DSizes(const DimensionList& a) { for (int i = 0 ; i < NumDims; ++i) { (*this)[i] = a[i]; } } // Enable DSizes index type promotion only if we are promoting to the // larger type, e.g. allow to promote dimensions of type int to long. template EIGEN_DEVICE_FUNC explicit DSizes(const array& other, // Default template parameters require c++11. typename internal::enable_if< internal::is_same< DenseIndex, typename internal::promote_index_type< DenseIndex, OtherIndex >::type >::value, void*>::type = 0) { for (int i = 0; i < NumDims; ++i) { (*this)[i] = static_cast(other[i]); } } #ifdef EIGEN_HAS_INDEX_LIST template EIGEN_DEVICE_FUNC explicit DSizes(const Eigen::IndexList& dimensions) { for (int i = 0; i < dimensions.count; ++i) { (*this)[i] = dimensions[i]; } } #endif #ifndef EIGEN_EMULATE_CXX11_META_H template EIGEN_DEVICE_FUNC DSizes(const Sizes& a) { for (int i = 0 ; i < NumDims; ++i) { (*this)[i] = a[i]; } } #else template EIGEN_DEVICE_FUNC DSizes(const Sizes& a) { for (int i = 0 ; i < NumDims; ++i) { (*this)[i] = a[i]; } } #endif #if EIGEN_HAS_VARIADIC_TEMPLATES template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit DSizes(DenseIndex firstDimension, DenseIndex secondDimension, IndexTypes... otherDimensions) : Base({{firstDimension, secondDimension, otherDimensions...}}) { EIGEN_STATIC_ASSERT(sizeof...(otherDimensions) + 2 == NumDims, YOU_MADE_A_PROGRAMMING_MISTAKE) } #else EIGEN_DEVICE_FUNC DSizes(const DenseIndex i0, const DenseIndex i1) { eigen_assert(NumDims == 2); (*this)[0] = i0; (*this)[1] = i1; } EIGEN_DEVICE_FUNC DSizes(const DenseIndex i0, const DenseIndex i1, const DenseIndex i2) { eigen_assert(NumDims == 3); (*this)[0] = i0; (*this)[1] = i1; (*this)[2] = i2; } EIGEN_DEVICE_FUNC DSizes(const DenseIndex i0, const DenseIndex i1, const DenseIndex i2, const DenseIndex i3) { eigen_assert(NumDims == 4); (*this)[0] = i0; (*this)[1] = i1; (*this)[2] = i2; (*this)[3] = i3; } EIGEN_DEVICE_FUNC DSizes(const DenseIndex i0, const DenseIndex i1, const DenseIndex i2, const DenseIndex i3, const DenseIndex i4) { eigen_assert(NumDims == 5); (*this)[0] = i0; (*this)[1] = i1; (*this)[2] = i2; (*this)[3] = i3; (*this)[4] = i4; } #endif EIGEN_DEVICE_FUNC DSizes& operator = (const array& other) { *static_cast(this) = other; return *this; } // A constexpr would be so much better here EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE DenseIndex IndexOfColMajor(const array& indices) const { return internal::tensor_index_linearization_helper::run(indices, *static_cast(this)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE DenseIndex IndexOfRowMajor(const array& indices) const { return internal::tensor_index_linearization_helper::run(indices, *static_cast(this)); } }; template std::ostream& operator<<(std::ostream& os, const DSizes& dims) { os << "["; for (int i = 0; i < NumDims; ++i) { if (i > 0) os << ", "; os << dims[i]; } os << "]"; return os; } // Boilerplate namespace internal { template struct tensor_vsize_index_linearization_helper { static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index run(array const& indices, std::vector const& dimensions) { return array_get(indices) + array_get(dimensions) * tensor_vsize_index_linearization_helper::run(indices, dimensions); } }; template struct tensor_vsize_index_linearization_helper { static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index run(array const& indices, std::vector const&) { return array_get(indices); } }; } // end namespace internal namespace internal { template struct array_size > { static const ptrdiff_t value = NumDims; }; template struct array_size > { static const ptrdiff_t value = NumDims; }; #ifndef EIGEN_EMULATE_CXX11_META_H template struct array_size > { static const std::ptrdiff_t value = Sizes::count; }; template struct array_size > { static const std::ptrdiff_t value = Sizes::count; }; template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::ptrdiff_t array_get(const Sizes&) { return get >::value; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::ptrdiff_t array_get(const Sizes<>&) { eigen_assert(false && "should never be called"); return -1; } #else template struct array_size > { static const ptrdiff_t value = Sizes::count; }; template struct array_size > { static const ptrdiff_t value = Sizes::count; }; template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::ptrdiff_t array_get(const Sizes&) { return get::Base>::value; } #endif template struct sizes_match_below_dim { static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool run(Dims1&, Dims2&) { return false; } }; template struct sizes_match_below_dim { static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool run(Dims1& dims1, Dims2& dims2) { return (array_get(dims1) == array_get(dims2)) && sizes_match_below_dim::run(dims1, dims2); } }; template struct sizes_match_below_dim { static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool run(Dims1&, Dims2&) { return true; } }; } // end namespace internal template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool dimensions_match(Dims1 dims1, Dims2 dims2) { return internal::sizes_match_below_dim::value, internal::array_size::value>::run(dims1, dims2); } } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_DIMENSIONS_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorIntDiv.h0000644000176200001440000002152114562417221025560 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_INTDIV_H #define EIGEN_CXX11_TENSOR_TENSOR_INTDIV_H namespace Eigen { /** \internal * * \class TensorIntDiv * \ingroup CXX11_Tensor_Module * * \brief Fast integer division by a constant. * * See the paper from Granlund and Montgomery for explanation. * (at https://doi.org/10.1145/773473.178249) * * \sa Tensor */ namespace internal { namespace { // Note: result is undefined if val == 0 template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE typename internal::enable_if::type count_leading_zeros(const T val) { #ifdef EIGEN_GPU_COMPILE_PHASE return __clz(val); #elif defined(SYCL_DEVICE_ONLY) return cl::sycl::clz(val); #elif EIGEN_COMP_MSVC unsigned long index; _BitScanReverse(&index, val); return 31 - index; #else EIGEN_STATIC_ASSERT(sizeof(unsigned long long) == 8, YOU_MADE_A_PROGRAMMING_MISTAKE); return __builtin_clz(static_cast(val)); #endif } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE typename internal::enable_if::type count_leading_zeros(const T val) { #ifdef EIGEN_GPU_COMPILE_PHASE return __clzll(val); #elif defined(SYCL_DEVICE_ONLY) return static_cast(cl::sycl::clz(val)); #elif EIGEN_COMP_MSVC && EIGEN_ARCH_x86_64 unsigned long index; _BitScanReverse64(&index, val); return 63 - index; #elif EIGEN_COMP_MSVC // MSVC's _BitScanReverse64 is not available for 32bits builds. unsigned int lo = (unsigned int)(val&0xffffffff); unsigned int hi = (unsigned int)((val>>32)&0xffffffff); int n; if(hi==0) n = 32 + count_leading_zeros(lo); else n = count_leading_zeros(hi); return n; #else EIGEN_STATIC_ASSERT(sizeof(unsigned long long) == 8, YOU_MADE_A_PROGRAMMING_MISTAKE); return __builtin_clzll(static_cast(val)); #endif } template struct UnsignedTraits { typedef typename conditional::type type; }; template struct DividerTraits { typedef typename UnsignedTraits::type type; static const int N = sizeof(T) * 8; }; template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE uint32_t muluh(const uint32_t a, const T b) { #if defined(EIGEN_GPU_COMPILE_PHASE) return __umulhi(a, b); #elif defined(SYCL_DEVICE_ONLY) return cl::sycl::mul_hi(a, static_cast(b)); #else return (static_cast(a) * b) >> 32; #endif } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE uint64_t muluh(const uint64_t a, const T b) { #if defined(EIGEN_GPU_COMPILE_PHASE) return __umul64hi(a, b); #elif defined(SYCL_DEVICE_ONLY) return cl::sycl::mul_hi(a, static_cast(b)); #elif EIGEN_HAS_BUILTIN_INT128 __uint128_t v = static_cast<__uint128_t>(a) * static_cast<__uint128_t>(b); return static_cast(v >> 64); #else return (TensorUInt128, uint64_t>(a) * TensorUInt128, uint64_t>(b)).upper(); #endif } template struct DividerHelper { static EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE uint32_t computeMultiplier(const int log_div, const T divider) { EIGEN_STATIC_ASSERT(N == 32, YOU_MADE_A_PROGRAMMING_MISTAKE); return static_cast((static_cast(1) << (N+log_div)) / divider - (static_cast(1) << N) + 1); } }; template struct DividerHelper<64, T> { static EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE uint64_t computeMultiplier(const int log_div, const T divider) { #if EIGEN_HAS_BUILTIN_INT128 && !defined(EIGEN_GPU_COMPILE_PHASE) && !defined(SYCL_DEVICE_ONLY) return static_cast((static_cast<__uint128_t>(1) << (64+log_div)) / static_cast<__uint128_t>(divider) - (static_cast<__uint128_t>(1) << 64) + 1); #else const uint64_t shift = 1ULL << log_div; TensorUInt128 result = TensorUInt128 >(shift, 0) / TensorUInt128, uint64_t>(divider) - TensorUInt128, static_val<0> >(1, 0) + TensorUInt128, static_val<1> >(1); return static_cast(result); #endif } }; } template struct TensorIntDivisor { public: EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorIntDivisor() { multiplier = 0; shift1 = 0; shift2 = 0; } // Must have 0 < divider < 2^31. This is relaxed to // 0 < divider < 2^63 when using 64-bit indices on platforms that support // the __uint128_t type. EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorIntDivisor(const T divider) { const int N = DividerTraits::N; eigen_assert(static_cast::type>(divider) < NumTraits::highest()/2); eigen_assert(divider > 0); // fast ln2 const int leading_zeros = count_leading_zeros(static_cast(divider)); int log_div = N - leading_zeros; // if divider is a power of two then log_div is 1 more than it should be. if ((static_cast::type>(1) << (log_div-1)) == static_cast::type>(divider)) log_div--; multiplier = DividerHelper::computeMultiplier(log_div, divider); shift1 = log_div > 1 ? 1 : log_div; shift2 = log_div > 1 ? log_div-1 : 0; } // Must have 0 <= numerator. On platforms that don't support the __uint128_t // type numerator should also be less than 2^32-1. EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T divide(const T numerator) const { eigen_assert(static_cast::type>(numerator) < NumTraits::highest()/2); //eigen_assert(numerator >= 0); // this is implicitly asserted by the line above UnsignedType t1 = muluh(multiplier, numerator); UnsignedType t = (static_cast(numerator) - t1) >> shift1; return (t1 + t) >> shift2; } private: typedef typename DividerTraits::type UnsignedType; UnsignedType multiplier; int32_t shift1; int32_t shift2; }; // Optimized version for signed 32 bit integers. // Derived from Hacker's Delight. // Only works for divisors strictly greater than one template <> class TensorIntDivisor { public: EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorIntDivisor() { magic = 0; shift = 0; } // Must have 2 <= divider EIGEN_DEVICE_FUNC TensorIntDivisor(int32_t divider) { eigen_assert(divider >= 2); calcMagic(divider); } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE int divide(const int32_t n) const { #ifdef EIGEN_GPU_COMPILE_PHASE return (__umulhi(magic, n) >> shift); #elif defined(SYCL_DEVICE_ONLY) return (cl::sycl::mul_hi(magic, static_cast(n)) >> shift); #else uint64_t v = static_cast(magic) * static_cast(n); return (static_cast(v >> 32) >> shift); #endif } private: // Compute the magic numbers. See Hacker's Delight section 10 for an in // depth explanation. EIGEN_DEVICE_FUNC void calcMagic(int32_t d) { const unsigned two31 = 0x80000000; // 2**31. unsigned ad = d; unsigned t = two31 + (ad >> 31); unsigned anc = t - 1 - t%ad; // Absolute value of nc. int p = 31; // Init. p. unsigned q1 = two31/anc; // Init. q1 = 2**p/|nc|. unsigned r1 = two31 - q1*anc; // Init. r1 = rem(2**p, |nc|). unsigned q2 = two31/ad; // Init. q2 = 2**p/|d|. unsigned r2 = two31 - q2*ad; // Init. r2 = rem(2**p, |d|). unsigned delta = 0; do { p = p + 1; q1 = 2*q1; // Update q1 = 2**p/|nc|. r1 = 2*r1; // Update r1 = rem(2**p, |nc|). if (r1 >= anc) { // (Must be an unsigned q1 = q1 + 1; // comparison here). r1 = r1 - anc;} q2 = 2*q2; // Update q2 = 2**p/|d|. r2 = 2*r2; // Update r2 = rem(2**p, |d|). if (r2 >= ad) { // (Must be an unsigned q2 = q2 + 1; // comparison here). r2 = r2 - ad;} delta = ad - r2; } while (q1 < delta || (q1 == delta && r1 == 0)); magic = (unsigned)(q2 + 1); shift = p - 32; } uint32_t magic; int32_t shift; }; template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T operator / (const T& numerator, const TensorIntDivisor& divisor) { return divisor.divide(numerator); } } // end namespace internal } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_INTDIV_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorCustomOp.h0000644000176200001440000003153214562417221026137 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_CUSTOM_OP_H #define EIGEN_CXX11_TENSOR_TENSOR_CUSTOM_OP_H namespace Eigen { /** \class TensorCustomUnaryOp * \ingroup CXX11_Tensor_Module * * \brief Tensor custom class. * * */ namespace internal { template struct traits > { typedef typename XprType::Scalar Scalar; typedef typename XprType::StorageKind StorageKind; typedef typename XprType::Index Index; typedef typename XprType::Nested Nested; typedef typename remove_reference::type _Nested; static const int NumDimensions = traits::NumDimensions; static const int Layout = traits::Layout; typedef typename traits::PointerType PointerType; }; template struct eval, Eigen::Dense> { typedef const TensorCustomUnaryOpEIGEN_DEVICE_REF type; }; template struct nested > { typedef TensorCustomUnaryOp type; }; } // end namespace internal template class TensorCustomUnaryOp : public TensorBase, ReadOnlyAccessors> { public: typedef typename internal::traits::Scalar Scalar; typedef typename Eigen::NumTraits::Real RealScalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename internal::nested::type Nested; typedef typename internal::traits::StorageKind StorageKind; typedef typename internal::traits::Index Index; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorCustomUnaryOp(const XprType& expr, const CustomUnaryFunc& func) : m_expr(expr), m_func(func) {} EIGEN_DEVICE_FUNC const CustomUnaryFunc& func() const { return m_func; } EIGEN_DEVICE_FUNC const typename internal::remove_all::type& expression() const { return m_expr; } protected: typename XprType::Nested m_expr; const CustomUnaryFunc m_func; }; // Eval as rvalue template struct TensorEvaluator, Device> { typedef TensorCustomUnaryOp ArgType; typedef typename internal::traits::Index Index; static const int NumDims = internal::traits::NumDimensions; typedef DSizes Dimensions; typedef typename internal::remove_const::type Scalar; typedef typename internal::remove_const::type CoeffReturnType; typedef typename PacketType::type PacketReturnType; static const int PacketSize = PacketType::size; typedef typename Eigen::internal::traits::PointerType TensorPointerType; typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; enum { IsAligned = false, PacketAccess = (PacketType::size > 1), BlockAccess = false, PreferBlockAccess = false, Layout = TensorEvaluator::Layout, CoordAccess = false, // to be implemented RawAccess = false }; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockNotImplemented TensorBlock; //===--------------------------------------------------------------------===// EIGEN_STRONG_INLINE TensorEvaluator(const ArgType& op, const Device& device) : m_op(op), m_device(device), m_result(NULL) { m_dimensions = op.func().dimensions(op.expression()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; } EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(EvaluatorPointerType data) { if (data) { evalTo(data); return false; } else { m_result = static_cast(m_device.get( (CoeffReturnType*) m_device.allocate_temp(dimensions().TotalSize() * sizeof(Scalar)))); evalTo(m_result); return true; } } EIGEN_STRONG_INLINE void cleanup() { if (m_result) { m_device.deallocate_temp(m_result); m_result = NULL; } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { return m_result[index]; } template EIGEN_DEVICE_FUNC PacketReturnType packet(Index index) const { return internal::ploadt(m_result + index); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost costPerCoeff(bool vectorized) const { // TODO(rmlarsen): Extend CustomOp API to return its cost estimate. return TensorOpCost(sizeof(CoeffReturnType), 0, 0, vectorized, PacketSize); } EIGEN_DEVICE_FUNC EvaluatorPointerType data() const { return m_result; } #ifdef EIGEN_USE_SYCL // binding placeholder accessors to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler &cgh) const { m_result.bind(cgh); } #endif protected: void evalTo(EvaluatorPointerType data) { TensorMap > result(m_device.get(data), m_dimensions); m_op.func().eval(m_op.expression(), result, m_device); } Dimensions m_dimensions; const ArgType m_op; const Device EIGEN_DEVICE_REF m_device; EvaluatorPointerType m_result; }; /** \class TensorCustomBinaryOp * \ingroup CXX11_Tensor_Module * * \brief Tensor custom class. * * */ namespace internal { template struct traits > { typedef typename internal::promote_storage_type::ret Scalar; typedef typename internal::promote_storage_type::ret CoeffReturnType; typedef typename promote_storage_type::StorageKind, typename traits::StorageKind>::ret StorageKind; typedef typename promote_index_type::Index, typename traits::Index>::type Index; typedef typename LhsXprType::Nested LhsNested; typedef typename RhsXprType::Nested RhsNested; typedef typename remove_reference::type _LhsNested; typedef typename remove_reference::type _RhsNested; static const int NumDimensions = traits::NumDimensions; static const int Layout = traits::Layout; typedef typename conditional::val, typename traits::PointerType, typename traits::PointerType>::type PointerType; }; template struct eval, Eigen::Dense> { typedef const TensorCustomBinaryOp& type; }; template struct nested > { typedef TensorCustomBinaryOp type; }; } // end namespace internal template class TensorCustomBinaryOp : public TensorBase, ReadOnlyAccessors> { public: typedef typename internal::traits::Scalar Scalar; typedef typename Eigen::NumTraits::Real RealScalar; typedef typename internal::traits::CoeffReturnType CoeffReturnType; typedef typename internal::nested::type Nested; typedef typename internal::traits::StorageKind StorageKind; typedef typename internal::traits::Index Index; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorCustomBinaryOp(const LhsXprType& lhs, const RhsXprType& rhs, const CustomBinaryFunc& func) : m_lhs_xpr(lhs), m_rhs_xpr(rhs), m_func(func) {} EIGEN_DEVICE_FUNC const CustomBinaryFunc& func() const { return m_func; } EIGEN_DEVICE_FUNC const typename internal::remove_all::type& lhsExpression() const { return m_lhs_xpr; } EIGEN_DEVICE_FUNC const typename internal::remove_all::type& rhsExpression() const { return m_rhs_xpr; } protected: typename LhsXprType::Nested m_lhs_xpr; typename RhsXprType::Nested m_rhs_xpr; const CustomBinaryFunc m_func; }; // Eval as rvalue template struct TensorEvaluator, Device> { typedef TensorCustomBinaryOp XprType; typedef typename internal::traits::Index Index; static const int NumDims = internal::traits::NumDimensions; typedef DSizes Dimensions; typedef typename XprType::Scalar Scalar; typedef typename internal::remove_const::type CoeffReturnType; typedef typename PacketType::type PacketReturnType; static const int PacketSize = PacketType::size; typedef typename Eigen::internal::traits::PointerType TensorPointerType; typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; enum { IsAligned = false, PacketAccess = (PacketType::size > 1), BlockAccess = false, PreferBlockAccess = false, Layout = TensorEvaluator::Layout, CoordAccess = false, // to be implemented RawAccess = false }; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockNotImplemented TensorBlock; //===--------------------------------------------------------------------===// EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device) : m_op(op), m_device(device), m_result(NULL) { m_dimensions = op.func().dimensions(op.lhsExpression(), op.rhsExpression()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; } EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(EvaluatorPointerType data) { if (data) { evalTo(data); return false; } else { m_result = static_cast(m_device.get( (CoeffReturnType*) m_device.allocate_temp(dimensions().TotalSize() * sizeof(CoeffReturnType)))); evalTo(m_result); return true; } } EIGEN_STRONG_INLINE void cleanup() { if (m_result != NULL) { m_device.deallocate_temp(m_result); m_result = NULL; } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { return m_result[index]; } template EIGEN_DEVICE_FUNC PacketReturnType packet(Index index) const { return internal::ploadt(m_result + index); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost costPerCoeff(bool vectorized) const { // TODO(rmlarsen): Extend CustomOp API to return its cost estimate. return TensorOpCost(sizeof(CoeffReturnType), 0, 0, vectorized, PacketSize); } EIGEN_DEVICE_FUNC EvaluatorPointerType data() const { return m_result; } #ifdef EIGEN_USE_SYCL // binding placeholder accessors to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler &cgh) const { m_result.bind(cgh); } #endif protected: void evalTo(EvaluatorPointerType data) { TensorMap > result(m_device.get(data), m_dimensions); m_op.func().eval(m_op.lhsExpression(), m_op.rhsExpression(), result, m_device); } Dimensions m_dimensions; const XprType m_op; const Device EIGEN_DEVICE_REF m_device; EvaluatorPointerType m_result; }; } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_CUSTOM_OP_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/README.md0000644000176200001440000017163514562417221024312 0ustar liggesusers# Eigen Tensors {#eigen_tensors} Tensors are multidimensional arrays of elements. Elements are typically scalars, but more complex types such as strings are also supported. ## Tensor Classes You can manipulate a tensor with one of the following classes. They all are in the namespace `::Eigen.` ### Class Tensor This is the class to use to create a tensor and allocate memory for it. The class is templatized with the tensor datatype, such as float or int, and the tensor rank. The rank is the number of dimensions, for example rank 2 is a matrix. Tensors of this class are resizable. For example, if you assign a tensor of a different size to a Tensor, that tensor is resized to match its new value. #### Constructor Tensor(size0, size1, ...) Constructor for a Tensor. The constructor must be passed `rank` integers indicating the sizes of the instance along each of the the `rank` dimensions. // Create a tensor of rank 3 of sizes 2, 3, 4. This tensor owns // memory to hold 24 floating point values (24 = 2 x 3 x 4). Tensor t_3d(2, 3, 4); // Resize t_3d by assigning a tensor of different sizes, but same rank. t_3d = Tensor(3, 4, 3); #### Constructor Tensor(size_array) Constructor where the sizes for the constructor are specified as an array of values instead of an explicitly list of parameters. The array type to use is `Eigen::array`. The array can be constructed automatically from an initializer list. // Create a tensor of strings of rank 2 with sizes 5, 7. Tensor t_2d({5, 7}); ### Class TensorFixedSize> Class to use for tensors of fixed size, where the size is known at compile time. Fixed sized tensors can provide very fast computations because all their dimensions are known by the compiler. FixedSize tensors are not resizable. If the total number of elements in a fixed size tensor is small enough the tensor data is held onto the stack and does not cause heap allocation and free. // Create a 4 x 3 tensor of floats. TensorFixedSize> t_4x3; ### Class TensorMap> This is the class to use to create a tensor on top of memory allocated and owned by another part of your code. It allows to view any piece of allocated memory as a Tensor. Instances of this class do not own the memory where the data are stored. A TensorMap is not resizable because it does not own the memory where its data are stored. #### Constructor TensorMap>(data, size0, size1, ...) Constructor for a Tensor. The constructor must be passed a pointer to the storage for the data, and "rank" size attributes. The storage has to be large enough to hold all the data. // Map a tensor of ints on top of stack-allocated storage. int storage[128]; // 2 x 4 x 2 x 8 = 128 TensorMap> t_4d(storage, 2, 4, 2, 8); // The same storage can be viewed as a different tensor. // You can also pass the sizes as an array. TensorMap> t_2d(storage, 16, 8); // You can also map fixed-size tensors. Here we get a 1d view of // the 2d fixed-size tensor. TensorFixedSize> t_4x3; TensorMap> t_12(t_4x3.data(), 12); #### Class TensorRef See Assigning to a TensorRef below. ## Accessing Tensor Elements #### tensor(index0, index1...) Return the element at position `(index0, index1...)` in tensor `tensor`. You must pass as many parameters as the rank of `tensor`. The expression can be used as an l-value to set the value of the element at the specified position. The value returned is of the datatype of the tensor. // Set the value of the element at position (0, 1, 0); Tensor t_3d(2, 3, 4); t_3d(0, 1, 0) = 12.0f; // Initialize all elements to random values. for (int i = 0; i < 2; ++i) { for (int j = 0; j < 3; ++j) { for (int k = 0; k < 4; ++k) { t_3d(i, j, k) = ...some random value...; } } } // Print elements of a tensor. for (int i = 0; i < 2; ++i) { LOG(INFO) << t_3d(i, 0, 0); } ## TensorLayout The tensor library supports 2 layouts: `ColMajor` (the default) and `RowMajor`. Only the default column major layout is currently fully supported, and it is therefore not recommended to attempt to use the row major layout at the moment. The layout of a tensor is optionally specified as part of its type. If not specified explicitly column major is assumed. Tensor col_major; // equivalent to Tensor TensorMap > row_major(data, ...); All the arguments to an expression must use the same layout. Attempting to mix different layouts will result in a compilation error. It is possible to change the layout of a tensor or an expression using the `swap_layout()` method. Note that this will also reverse the order of the dimensions. Tensor col_major(2, 4); Tensor row_major(2, 4); Tensor col_major_result = col_major; // ok, layouts match Tensor col_major_result = row_major; // will not compile // Simple layout swap col_major_result = row_major.swap_layout(); eigen_assert(col_major_result.dimension(0) == 4); eigen_assert(col_major_result.dimension(1) == 2); // Swap the layout and preserve the order of the dimensions array shuffle(1, 0); col_major_result = row_major.swap_layout().shuffle(shuffle); eigen_assert(col_major_result.dimension(0) == 2); eigen_assert(col_major_result.dimension(1) == 4); ## Tensor Operations The Eigen Tensor library provides a vast library of operations on Tensors: numerical operations such as addition and multiplication, geometry operations such as slicing and shuffling, etc. These operations are available as methods of the Tensor classes, and in some cases as operator overloads. For example the following code computes the elementwise addition of two tensors: Tensor t1(2, 3, 4); ...set some values in t1... Tensor t2(2, 3, 4); ...set some values in t2... // Set t3 to the element wise sum of t1 and t2 Tensor t3 = t1 + t2; While the code above looks easy enough, it is important to understand that the expression `t1 + t2` is not actually adding the values of the tensors. The expression instead constructs a "tensor operator" object of the class TensorCwiseBinaryOp, which has references to the tensors `t1` and `t2`. This is a small C++ object that knows how to add `t1` and `t2`. It is only when the value of the expression is assigned to the tensor `t3` that the addition is actually performed. Technically, this happens through the overloading of `operator=()` in the Tensor class. This mechanism for computing tensor expressions allows for lazy evaluation and optimizations which are what make the tensor library very fast. Of course, the tensor operators do nest, and the expression `t1 + t2 * 0.3f` is actually represented with the (approximate) tree of operators: TensorCwiseBinaryOp(t1, TensorCwiseUnaryOp(t2, 0.3f)) ### Tensor Operations and C++ "auto" Because Tensor operations create tensor operators, the C++ `auto` keyword does not have its intuitive meaning. Consider these 2 lines of code: Tensor t3 = t1 + t2; auto t4 = t1 + t2; In the first line we allocate the tensor `t3` and it will contain the result of the addition of `t1` and `t2`. In the second line, `t4` is actually the tree of tensor operators that will compute the addition of `t1` and `t2`. In fact, `t4` is *not* a tensor and you cannot get the values of its elements: Tensor t3 = t1 + t2; cout << t3(0, 0, 0); // OK prints the value of t1(0, 0, 0) + t2(0, 0, 0) auto t4 = t1 + t2; cout << t4(0, 0, 0); // Compilation error! When you use `auto` you do not get a Tensor as a result but instead a non-evaluated expression. So only use `auto` to delay evaluation. Unfortunately, there is no single underlying concrete type for holding non-evaluated expressions, hence you have to use auto in the case when you do want to hold non-evaluated expressions. When you need the results of set of tensor computations you have to assign the result to a Tensor that will be capable of holding onto them. This can be either a normal Tensor, a fixed size Tensor, or a TensorMap on an existing piece of memory. All the following will work: auto t4 = t1 + t2; Tensor result = t4; // Could also be: result(t4); cout << result(0, 0, 0); TensorMap result(, , ...) = t4; cout << result(0, 0, 0); TensorFixedSize> result = t4; cout << result(0, 0, 0); Until you need the results, you can keep the operation around, and even reuse it for additional operations. As long as you keep the expression as an operation, no computation is performed. // One way to compute exp((t1 + t2) * 0.2f); auto t3 = t1 + t2; auto t4 = t3 * 0.2f; auto t5 = t4.exp(); Tensor result = t5; // Another way, exactly as efficient as the previous one: Tensor result = ((t1 + t2) * 0.2f).exp(); ### Controlling When Expression are Evaluated There are several ways to control when expressions are evaluated: * Assignment to a Tensor, TensorFixedSize, or TensorMap. * Use of the eval() method. * Assignment to a TensorRef. #### Assigning to a Tensor, TensorFixedSize, or TensorMap. The most common way to evaluate an expression is to assign it to a Tensor. In the example below, the `auto` declarations make the intermediate values "Operations", not Tensors, and do not cause the expressions to be evaluated. The assignment to the Tensor `result` causes the evaluation of all the operations. auto t3 = t1 + t2; // t3 is an Operation. auto t4 = t3 * 0.2f; // t4 is an Operation. auto t5 = t4.exp(); // t5 is an Operation. Tensor result = t5; // The operations are evaluated. If you know the ranks and sizes of the Operation value you can assign the Operation to a TensorFixedSize instead of a Tensor, which is a bit more efficient. // We know that the result is a 4x4x2 tensor! TensorFixedSize> result = t5; Simiarly, assigning an expression to a TensorMap causes its evaluation. Like tensors of type TensorFixedSize, TensorMaps cannot be resized so they have to have the rank and sizes of the expression that are assigned to them. #### Calling eval(). When you compute large composite expressions, you sometimes want to tell Eigen that an intermediate value in the expression tree is worth evaluating ahead of time. This is done by inserting a call to the `eval()` method of the expression Operation. // The previous example could have been written: Tensor result = ((t1 + t2) * 0.2f).exp(); // If you want to compute (t1 + t2) once ahead of time you can write: Tensor result = ((t1 + t2).eval() * 0.2f).exp(); Semantically, calling `eval()` is equivalent to materializing the value of the expression in a temporary Tensor of the right size. The code above in effect does: // .eval() knows the size! TensorFixedSize> tmp = t1 + t2; Tensor result = (tmp * 0.2f).exp(); Note that the return value of `eval()` is itself an Operation, so the following code does not do what you may think: // Here t3 is an evaluation Operation. t3 has not been evaluated yet. auto t3 = (t1 + t2).eval(); // You can use t3 in another expression. Still no evaluation. auto t4 = (t3 * 0.2f).exp(); // The value is evaluated when you assign the Operation to a Tensor, using // an intermediate tensor to represent t3.x Tensor result = t4; While in the examples above calling `eval()` does not make a difference in performance, in other cases it can make a huge difference. In the expression below the `broadcast()` expression causes the `X.maximum()` expression to be evaluated many times: Tensor<...> X ...; Tensor<...> Y = ((X - X.maximum(depth_dim).reshape(dims2d).broadcast(bcast)) * beta).exp(); Inserting a call to `eval()` between the `maximum()` and `reshape()` calls guarantees that maximum() is only computed once and greatly speeds-up execution: Tensor<...> Y = ((X - X.maximum(depth_dim).eval().reshape(dims2d).broadcast(bcast)) * beta).exp(); In the other example below, the tensor `Y` is both used in the expression and its assignment. This is an aliasing problem and if the evaluation is not done in the right order Y will be updated incrementally during the evaluation resulting in bogus results: Tensor<...> Y ...; Y = Y / (Y.sum(depth_dim).reshape(dims2d).broadcast(bcast)); Inserting a call to `eval()` between the `sum()` and `reshape()` expressions ensures that the sum is computed before any updates to `Y` are done. Y = Y / (Y.sum(depth_dim).eval().reshape(dims2d).broadcast(bcast)); Note that an eval around the full right hand side expression is not needed because the generated has to compute the i-th value of the right hand side before assigning it to the left hand side. However, if you were assigning the expression value to a shuffle of `Y` then you would need to force an eval for correctness by adding an `eval()` call for the right hand side: Y.shuffle(...) = (Y / (Y.sum(depth_dim).eval().reshape(dims2d).broadcast(bcast))).eval(); #### Assigning to a TensorRef. If you need to access only a few elements from the value of an expression you can avoid materializing the value in a full tensor by using a TensorRef. A TensorRef is a small wrapper class for any Eigen Operation. It provides overloads for the `()` operator that let you access individual values in the expression. TensorRef is convenient, because the Operation themselves do not provide a way to access individual elements. // Create a TensorRef for the expression. The expression is not // evaluated yet. TensorRef > ref = ((t1 + t2) * 0.2f).exp(); // Use "ref" to access individual elements. The expression is evaluated // on the fly. float at_0 = ref(0, 0, 0); cout << ref(0, 1, 0); Only use TensorRef when you need a subset of the values of the expression. TensorRef only computes the values you access. However note that if you are going to access all the values it will be much faster to materialize the results in a Tensor first. In some cases, if the full Tensor result would be very large, you may save memory by accessing it as a TensorRef. But not always. So don't count on it. ### Controlling How Expressions Are Evaluated The tensor library provides several implementations of the various operations such as contractions and convolutions. The implementations are optimized for different environments: single threaded on CPU, multi threaded on CPU, or on a GPU using cuda. Additional implementations may be added later. You can choose which implementation to use with the `device()` call. If you do not choose an implementation explicitly the default implementation that uses a single thread on the CPU is used. The default implementation has been optimized for recent Intel CPUs, taking advantage of SSE, AVX, and FMA instructions. Work is ongoing to tune the library on ARM CPUs. Note that you need to pass compiler-dependent flags to enable the use of SSE, AVX, and other instructions. For example, the following code adds two tensors using the default single-threaded CPU implementation: Tensor a(30, 40); Tensor b(30, 40); Tensor c = a + b; To choose a different implementation you have to insert a `device()` call before the assignment of the result. For technical C++ reasons this requires that the Tensor for the result be declared on its own. This means that you have to know the size of the result. Eigen::Tensor c(30, 40); c.device(...) = a + b; The call to `device()` must be the last call on the left of the operator=. You must pass to the `device()` call an Eigen device object. There are presently three devices you can use: DefaultDevice, ThreadPoolDevice and GpuDevice. #### Evaluating With the DefaultDevice This is exactly the same as not inserting a `device()` call. DefaultDevice my_device; c.device(my_device) = a + b; #### Evaluating with a Thread Pool // Create the Eigen ThreadPool Eigen::ThreadPool pool(8 /* number of threads in pool */) // Create the Eigen ThreadPoolDevice. Eigen::ThreadPoolDevice my_device(&pool, 4 /* number of threads to use */); // Now just use the device when evaluating expressions. Eigen::Tensor c(30, 50); c.device(my_device) = a.contract(b, dot_product_dims); #### Evaluating On GPU This is presently a bit more complicated than just using a thread pool device. You need to create a GPU device but you also need to explicitly allocate the memory for tensors with cuda. ## API Reference ### Datatypes In the documentation of the tensor methods and Operation we mention datatypes that are tensor-type specific: #### ::Dimensions Acts like an array of ints. Has an `int size` attribute, and can be indexed like an array to access individual values. Used to represent the dimensions of a tensor. See `dimensions()`. #### ::Index Acts like an `int`. Used for indexing tensors along their dimensions. See `operator()`, `dimension()`, and `size()`. #### ::Scalar Represents the datatype of individual tensor elements. For example, for a `Tensor`, `Scalar` is the type `float`. See `setConstant()`. #### We use this pseudo type to indicate that a tensor Operation is returned by a method. We indicate in the text the type and dimensions of the tensor that the Operation returns after evaluation. The Operation will have to be evaluated, for example by assigning it to a tensor, before you can access the values of the resulting tensor. You can also access the values through a TensorRef. ## Built-in Tensor Methods These are usual C++ methods that act on tensors immediately. They are not Operations which provide delayed evaluation of their results. Unless specified otherwise, all the methods listed below are available on all tensor classes: Tensor, TensorFixedSize, and TensorMap. ## Metadata ### int NumDimensions Constant value indicating the number of dimensions of a Tensor. This is also known as the tensor "rank". Eigen::Tensor a(3, 4); cout << "Dims " << a.NumDimensions; => Dims 2 ### Dimensions dimensions() Returns an array-like object representing the dimensions of the tensor. The actual type of the `dimensions()` result is `::``Dimensions`. Eigen::Tensor a(3, 4); const Eigen::Tensor::Dimensions& d = a.dimensions(); cout << "Dim size: " << d.size << ", dim 0: " << d[0] << ", dim 1: " << d[1]; => Dim size: 2, dim 0: 3, dim 1: 4 If you use a C++11 compiler, you can use `auto` to simplify the code: const auto& d = a.dimensions(); cout << "Dim size: " << d.size << ", dim 0: " << d[0] << ", dim 1: " << d[1]; => Dim size: 2, dim 0: 3, dim 1: 4 ### Index dimension(Index n) Returns the n-th dimension of the tensor. The actual type of the `dimension()` result is `::``Index`, but you can always use it like an int. Eigen::Tensor a(3, 4); int dim1 = a.dimension(1); cout << "Dim 1: " << dim1; => Dim 1: 4 ### Index size() Returns the total number of elements in the tensor. This is the product of all the tensor dimensions. The actual type of the `size()` result is `::``Index`, but you can always use it like an int. Eigen::Tensor a(3, 4); cout << "Size: " << a.size(); => Size: 12 ### Getting Dimensions From An Operation A few operations provide `dimensions()` directly, e.g. `TensorReslicingOp`. Most operations defer calculating dimensions until the operation is being evaluated. If you need access to the dimensions of a deferred operation, you can wrap it in a TensorRef (see Assigning to a TensorRef above), which provides `dimensions()` and `dimension()` as above. TensorRef can also wrap the plain Tensor types, so this is a useful idiom in templated contexts where the underlying object could be either a raw Tensor or some deferred operation (e.g. a slice of a Tensor). In this case, the template code can wrap the object in a TensorRef and reason about its dimensionality while remaining agnostic to the underlying type. ## Constructors ### Tensor Creates a tensor of the specified size. The number of arguments must be equal to the rank of the tensor. The content of the tensor is not initialized. Eigen::Tensor a(3, 4); cout << "NumRows: " << a.dimension(0) << " NumCols: " << a.dimension(1) << endl; => NumRows: 3 NumCols: 4 ### TensorFixedSize Creates a tensor of the specified size. The number of arguments in the Sizes<> template parameter determines the rank of the tensor. The content of the tensor is not initialized. Eigen::TensorFixedSize> a; cout << "Rank: " << a.rank() << endl; => Rank: 2 cout << "NumRows: " << a.dimension(0) << " NumCols: " << a.dimension(1) << endl; => NumRows: 3 NumCols: 4 ### TensorMap Creates a tensor mapping an existing array of data. The data must not be freed until the TensorMap is discarded, and the size of the data must be large enough to accommodate the coefficients of the tensor. float data[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}; Eigen::TensorMap> a(data, 3, 4); cout << "NumRows: " << a.dimension(0) << " NumCols: " << a.dimension(1) << endl; => NumRows: 3 NumCols: 4 cout << "a(1, 2): " << a(1, 2) << endl; => a(1, 2): 7 ## Contents Initialization When a new Tensor or a new TensorFixedSize are created, memory is allocated to hold all the tensor elements, but the memory is not initialized. Similarly, when a new TensorMap is created on top of non-initialized memory the memory its contents are not initialized. You can use one of the methods below to initialize the tensor memory. These have an immediate effect on the tensor and return the tensor itself as a result. These are not tensor Operations which delay evaluation. ### setConstant(const Scalar& val) Sets all elements of the tensor to the constant value `val`. `Scalar` is the type of data stored in the tensor. You can pass any value that is convertible to that type. Returns the tensor itself in case you want to chain another call. a.setConstant(12.3f); cout << "Constant: " << endl << a << endl << endl; => Constant: 12.3 12.3 12.3 12.3 12.3 12.3 12.3 12.3 12.3 12.3 12.3 12.3 Note that `setConstant()` can be used on any tensor where the element type has a copy constructor and an `operator=()`: Eigen::Tensor a(2, 3); a.setConstant("yolo"); cout << "String tensor: " << endl << a << endl << endl; => String tensor: yolo yolo yolo yolo yolo yolo ### setZero() Fills the tensor with zeros. Equivalent to `setConstant(Scalar(0))`. Returns the tensor itself in case you want to chain another call. a.setZero(); cout << "Zeros: " << endl << a << endl << endl; => Zeros: 0 0 0 0 0 0 0 0 0 0 0 0 ### setValues({..initializer_list}) Fills the tensor with explicit values specified in a std::initializer_list. The type of the initializer list depends on the type and rank of the tensor. If the tensor has rank N, the initializer list must be nested N times. The most deeply nested lists must contains P scalars of the Tensor type where P is the size of the last dimension of the Tensor. For example, for a `TensorFixedSize` the initializer list must contains 2 lists of 3 floats each. `setValues()` returns the tensor itself in case you want to chain another call. Eigen::Tensor a(2, 3); a.setValues({{0.0f, 1.0f, 2.0f}, {3.0f, 4.0f, 5.0f}}); cout << "a" << endl << a << endl << endl; => a 0 1 2 3 4 5 If a list is too short, the corresponding elements of the tensor will not be changed. This is valid at each level of nesting. For example the following code only sets the values of the first row of the tensor. Eigen::Tensor a(2, 3); a.setConstant(1000); a.setValues({{10, 20, 30}}); cout << "a" << endl << a << endl << endl; => a 10 20 30 1000 1000 1000 ### setRandom() Fills the tensor with random values. Returns the tensor itself in case you want to chain another call. a.setRandom(); cout << "Random: " << endl << a << endl << endl; => Random: 0.680375 0.59688 -0.329554 0.10794 -0.211234 0.823295 0.536459 -0.0452059 0.566198 -0.604897 -0.444451 0.257742 You can customize `setRandom()` by providing your own random number generator as a template argument: a.setRandom(); Here, `MyRandomGenerator` must be a struct with the following member functions, where Scalar and Index are the same as `::``Scalar` and `::``Index`. See `struct UniformRandomGenerator` in TensorFunctors.h for an example. // Custom number generator for use with setRandom(). struct MyRandomGenerator { // Default and copy constructors. Both are needed MyRandomGenerator() { } MyRandomGenerator(const MyRandomGenerator& ) { } // Return a random value to be used. "element_location" is the // location of the entry to set in the tensor, it can typically // be ignored. Scalar operator()(Eigen::DenseIndex element_location, Eigen::DenseIndex /*unused*/ = 0) const { return ; } // Same as above but generates several numbers at a time. typename internal::packet_traits::type packetOp( Eigen::DenseIndex packet_location, Eigen::DenseIndex /*unused*/ = 0) const { return ; } }; You can also use one of the 2 random number generators that are part of the tensor library: * UniformRandomGenerator * NormalRandomGenerator ## Data Access The Tensor, TensorFixedSize, and TensorRef classes provide the following accessors to access the tensor coefficients: const Scalar& operator()(const array& indices) const Scalar& operator()(Index firstIndex, IndexTypes... otherIndices) Scalar& operator()(const array& indices) Scalar& operator()(Index firstIndex, IndexTypes... otherIndices) The number of indices must be equal to the rank of the tensor. Moreover, these accessors are not available on tensor expressions. In order to access the values of a tensor expression, the expression must either be evaluated or wrapped in a TensorRef. ### Scalar* data() and const Scalar* data() const Returns a pointer to the storage for the tensor. The pointer is const if the tensor was const. This allows direct access to the data. The layout of the data depends on the tensor layout: RowMajor or ColMajor. This access is usually only needed for special cases, for example when mixing Eigen Tensor code with other libraries. Scalar is the type of data stored in the tensor. Eigen::Tensor a(3, 4); float* a_data = a.data(); a_data[0] = 123.45f; cout << "a(0, 0): " << a(0, 0); => a(0, 0): 123.45 ## Tensor Operations All the methods documented below return non evaluated tensor `Operations`. These can be chained: you can apply another Tensor Operation to the value returned by the method. The chain of Operation is evaluated lazily, typically when it is assigned to a tensor. See "Controlling when Expression are Evaluated" for more details about their evaluation. ### constant(const Scalar& val) Returns a tensor of the same type and dimensions as the original tensor but where all elements have the value `val`. This is useful, for example, when you want to add or subtract a constant from a tensor, or multiply every element of a tensor by a scalar. Eigen::Tensor a(2, 3); a.setConstant(1.0f); Eigen::Tensor b = a + a.constant(2.0f); Eigen::Tensor c = b * b.constant(0.2f); cout << "a" << endl << a << endl << endl; cout << "b" << endl << b << endl << endl; cout << "c" << endl << c << endl << endl; => a 1 1 1 1 1 1 b 3 3 3 3 3 3 c 0.6 0.6 0.6 0.6 0.6 0.6 ### random() Returns a tensor of the same type and dimensions as the current tensor but where all elements have random values. This is for example useful to add random values to an existing tensor. The generation of random values can be customized in the same manner as for `setRandom()`. Eigen::Tensor a(2, 3); a.setConstant(1.0f); Eigen::Tensor b = a + a.random(); cout << "a" << endl << a << endl << endl; cout << "b" << endl << b << endl << endl; => a 1 1 1 1 1 1 b 1.68038 1.5662 1.82329 0.788766 1.59688 0.395103 ## Unary Element Wise Operations All these operations take a single input tensor as argument and return a tensor of the same type and dimensions as the tensor to which they are applied. The requested operations are applied to each element independently. ### operator-() Returns a tensor of the same type and dimensions as the original tensor containing the opposite values of the original tensor. Eigen::Tensor a(2, 3); a.setConstant(1.0f); Eigen::Tensor b = -a; cout << "a" << endl << a << endl << endl; cout << "b" << endl << b << endl << endl; => a 1 1 1 1 1 1 b -1 -1 -1 -1 -1 -1 ### sqrt() Returns a tensor of the same type and dimensions as the original tensor containing the square roots of the original tensor. ### rsqrt() Returns a tensor of the same type and dimensions as the original tensor containing the inverse square roots of the original tensor. ### square() Returns a tensor of the same type and dimensions as the original tensor containing the squares of the original tensor values. ### inverse() Returns a tensor of the same type and dimensions as the original tensor containing the inverse of the original tensor values. ### exp() Returns a tensor of the same type and dimensions as the original tensor containing the exponential of the original tensor. ### log() Returns a tensor of the same type and dimensions as the original tensor containing the natural logarithms of the original tensor. ### abs() Returns a tensor of the same type and dimensions as the original tensor containing the absolute values of the original tensor. ### pow(Scalar exponent) Returns a tensor of the same type and dimensions as the original tensor containing the coefficients of the original tensor to the power of the exponent. The type of the exponent, Scalar, is always the same as the type of the tensor coefficients. For example, only integer exponents can be used in conjuntion with tensors of integer values. You can use cast() to lift this restriction. For example this computes cubic roots of an int Tensor: Eigen::Tensor a(2, 3); a.setValues({{0, 1, 8}, {27, 64, 125}}); Eigen::Tensor b = a.cast().pow(1.0 / 3.0); cout << "a" << endl << a << endl << endl; cout << "b" << endl << b << endl << endl; => a 0 1 8 27 64 125 b 0 1 2 3 4 5 ### operator * (Scalar scale) Multiplies all the coefficients of the input tensor by the provided scale. ### cwiseMax(Scalar threshold) TODO ### cwiseMin(Scalar threshold) TODO ### unaryExpr(const CustomUnaryOp& func) TODO ## Binary Element Wise Operations These operations take two input tensors as arguments. The 2 input tensors should be of the same type and dimensions. The result is a tensor of the same dimensions as the tensors to which they are applied, and unless otherwise specified it is also of the same type. The requested operations are applied to each pair of elements independently. ### operator+(const OtherDerived& other) Returns a tensor of the same type and dimensions as the input tensors containing the coefficient wise sums of the inputs. ### operator-(const OtherDerived& other) Returns a tensor of the same type and dimensions as the input tensors containing the coefficient wise differences of the inputs. ### operator*(const OtherDerived& other) Returns a tensor of the same type and dimensions as the input tensors containing the coefficient wise products of the inputs. ### operator/(const OtherDerived& other) Returns a tensor of the same type and dimensions as the input tensors containing the coefficient wise quotients of the inputs. This operator is not supported for integer types. ### cwiseMax(const OtherDerived& other) Returns a tensor of the same type and dimensions as the input tensors containing the coefficient wise maximums of the inputs. ### cwiseMin(const OtherDerived& other) Returns a tensor of the same type and dimensions as the input tensors containing the coefficient wise mimimums of the inputs. ### Logical operators The following logical operators are supported as well: * operator&&(const OtherDerived& other) * operator||(const OtherDerived& other) * operator<(const OtherDerived& other) * operator<=(const OtherDerived& other) * operator>(const OtherDerived& other) * operator>=(const OtherDerived& other) * operator==(const OtherDerived& other) * operator!=(const OtherDerived& other) They all return a tensor of boolean values. ## Selection (select(const ThenDerived& thenTensor, const ElseDerived& elseTensor) Selection is a coefficient-wise ternary operator that is the tensor equivalent to the if-then-else operation. Tensor if = ...; Tensor then = ...; Tensor else = ...; Tensor result = if.select(then, else); The 3 arguments must be of the same dimensions, which will also be the dimension of the result. The 'if' tensor must be of type boolean, the 'then' and the 'else' tensor must be of the same type, which will also be the type of the result. Each coefficient in the result is equal to the corresponding coefficient in the 'then' tensor if the corresponding value in the 'if' tensor is true. If not, the resulting coefficient will come from the 'else' tensor. ## Contraction Tensor *contractions* are a generalization of the matrix product to the multidimensional case. // Create 2 matrices using tensors of rank 2 Eigen::Tensor a(2, 3); a.setValues({{1, 2, 3}, {6, 5, 4}}); Eigen::Tensor b(3, 2); b.setValues({{1, 2}, {4, 5}, {5, 6}}); // Compute the traditional matrix product Eigen::array, 1> product_dims = { Eigen::IndexPair(1, 0) }; Eigen::Tensor AB = a.contract(b, product_dims); // Compute the product of the transpose of the matrices Eigen::array, 1> transposed_product_dims = { Eigen::IndexPair(0, 1) }; Eigen::Tensor AtBt = a.contract(b, transposed_product_dims); // Contraction to scalar value using a double contraction. // First coordinate of both tensors are contracted as well as both second coordinates, i.e., this computes the sum of the squares of the elements. Eigen::array, 2> double_contraction_product_dims = { Eigen::IndexPair(0, 0), Eigen::IndexPair(1, 1) }; Eigen::Tensor AdoubleContractedA = a.contract(a, double_contraction_product_dims); // Extracting the scalar value of the tensor contraction for further usage int value = AdoubleContractedA(0); ## Reduction Operations A *Reduction* operation returns a tensor with fewer dimensions than the original tensor. The values in the returned tensor are computed by applying a *reduction operator* to slices of values from the original tensor. You specify the dimensions along which the slices are made. The Eigen Tensor library provides a set of predefined reduction operators such as `maximum()` and `sum()` and lets you define additional operators by implementing a few methods from a reductor template. ### Reduction Dimensions All reduction operations take a single parameter of type `::``Dimensions` which can always be specified as an array of ints. These are called the "reduction dimensions." The values are the indices of the dimensions of the input tensor over which the reduction is done. The parameter can have at most as many element as the rank of the input tensor; each element must be less than the tensor rank, as it indicates one of the dimensions to reduce. Each dimension of the input tensor should occur at most once in the reduction dimensions as the implementation does not remove duplicates. The order of the values in the reduction dimensions does not affect the results, but the code may execute faster if you list the dimensions in increasing order. Example: Reduction along one dimension. // Create a tensor of 2 dimensions Eigen::Tensor a(2, 3); a.setValues({{1, 2, 3}, {6, 5, 4}}); // Reduce it along the second dimension (1)... Eigen::array dims({1 /* dimension to reduce */}); // ...using the "maximum" operator. // The result is a tensor with one dimension. The size of // that dimension is the same as the first (non-reduced) dimension of a. Eigen::Tensor b = a.maximum(dims); cout << "a" << endl << a << endl << endl; cout << "b" << endl << b << endl << endl; => a 1 2 3 6 5 4 b 3 6 Example: Reduction along two dimensions. Eigen::Tensor a(2, 3, 4); a.setValues({{{0.0f, 1.0f, 2.0f, 3.0f}, {7.0f, 6.0f, 5.0f, 4.0f}, {8.0f, 9.0f, 10.0f, 11.0f}}, {{12.0f, 13.0f, 14.0f, 15.0f}, {19.0f, 18.0f, 17.0f, 16.0f}, {20.0f, 21.0f, 22.0f, 23.0f}}}); // The tensor a has 3 dimensions. We reduce along the // first 2, resulting in a tensor with a single dimension // of size 4 (the last dimension of a.) // Note that we pass the array of reduction dimensions // directly to the maximum() call. Eigen::Tensor b = a.maximum(Eigen::array({0, 1})); cout << "b" << endl << b << endl << endl; => b 20 21 22 23 #### Reduction along all dimensions As a special case, if you pass no parameter to a reduction operation the original tensor is reduced along *all* its dimensions. The result is a scalar, represented as a zero-dimension tensor. Eigen::Tensor a(2, 3, 4); a.setValues({{{0.0f, 1.0f, 2.0f, 3.0f}, {7.0f, 6.0f, 5.0f, 4.0f}, {8.0f, 9.0f, 10.0f, 11.0f}}, {{12.0f, 13.0f, 14.0f, 15.0f}, {19.0f, 18.0f, 17.0f, 16.0f}, {20.0f, 21.0f, 22.0f, 23.0f}}}); // Reduce along all dimensions using the sum() operator. Eigen::Tensor b = a.sum(); cout << "b" << endl << b << endl << endl; => b 276 ### sum(const Dimensions& new_dims) ### sum() Reduce a tensor using the sum() operator. The resulting values are the sum of the reduced values. ### mean(const Dimensions& new_dims) ### mean() Reduce a tensor using the mean() operator. The resulting values are the mean of the reduced values. ### maximum(const Dimensions& new_dims) ### maximum() Reduce a tensor using the maximum() operator. The resulting values are the largest of the reduced values. ### minimum(const Dimensions& new_dims) ### minimum() Reduce a tensor using the minimum() operator. The resulting values are the smallest of the reduced values. ### prod(const Dimensions& new_dims) ### prod() Reduce a tensor using the prod() operator. The resulting values are the product of the reduced values. ### all(const Dimensions& new_dims) ### all() Reduce a tensor using the all() operator. Casts tensor to bool and then checks whether all elements are true. Runs through all elements rather than short-circuiting, so may be significantly inefficient. ### any(const Dimensions& new_dims) ### any() Reduce a tensor using the any() operator. Casts tensor to bool and then checks whether any element is true. Runs through all elements rather than short-circuiting, so may be significantly inefficient. ### reduce(const Dimensions& new_dims, const Reducer& reducer) Reduce a tensor using a user-defined reduction operator. See `SumReducer` in TensorFunctors.h for information on how to implement a reduction operator. ## Trace A *Trace* operation returns a tensor with fewer dimensions than the original tensor. It returns a tensor whose elements are the sum of the elements of the original tensor along the main diagonal for a list of specified dimensions, the "trace dimensions". Similar to the `Reduction Dimensions`, the trace dimensions are passed as an input parameter to the operation, are of type `::``Dimensions` , and have the same requirements when passed as an input parameter. In addition, the trace dimensions must have the same size. Example: Trace along 2 dimensions. // Create a tensor of 3 dimensions Eigen::Tensor a(2, 2, 3); a.setValues({{{1, 2, 3}, {4, 5, 6}}, {{7, 8, 9}, {10, 11, 12}}}); // Specify the dimensions along which the trace will be computed. // In this example, the trace can only be computed along the dimensions // with indices 0 and 1 Eigen::array dims({0, 1}); // The output tensor contains all but the trace dimensions. Tensor a_trace = a.trace(dims); cout << "a_trace:" << endl; cout << a_trace << endl; => a_trace: 11 13 15 ### trace(const Dimensions& new_dims) ### trace() As a special case, if no parameter is passed to the operation, trace is computed along *all* dimensions of the input tensor. Example: Trace along all dimensions. // Create a tensor of 3 dimensions, with all dimensions having the same size. Eigen::Tensor a(3, 3, 3); a.setValues({{{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}, {{10, 11, 12}, {13, 14, 15}, {16, 17, 18}}, {{19, 20, 21}, {22, 23, 24}, {25, 26, 27}}}); // Result is a zero dimension tensor Tensor a_trace = a.trace(); cout<<"a_trace:"< a_trace: 42 ## Scan Operations A *Scan* operation returns a tensor with the same dimensions as the original tensor. The operation performs an inclusive scan along the specified axis, which means it computes a running total along the axis for a given reduction operation. If the reduction operation corresponds to summation, then this computes the prefix sum of the tensor along the given axis. Example: dd a comment to this line // Create a tensor of 2 dimensions Eigen::Tensor a(2, 3); a.setValues({{1, 2, 3}, {4, 5, 6}}); // Scan it along the second dimension (1) using summation Eigen::Tensor b = a.cumsum(1); // The result is a tensor with the same size as the input cout << "a" << endl << a << endl << endl; cout << "b" << endl << b << endl << endl; => a 1 2 3 4 5 6 b 1 3 6 4 9 15 ### cumsum(const Index& axis) Perform a scan by summing consecutive entries. ### cumprod(const Index& axis) Perform a scan by multiplying consecutive entries. ## Convolutions ### convolve(const Kernel& kernel, const Dimensions& dims) Returns a tensor that is the output of the convolution of the input tensor with the kernel, along the specified dimensions of the input tensor. The dimension size for dimensions of the output tensor which were part of the convolution will be reduced by the formula: output_dim_size = input_dim_size - kernel_dim_size + 1 (requires: input_dim_size >= kernel_dim_size). The dimension sizes for dimensions that were not part of the convolution will remain the same. Performance of the convolution can depend on the length of the stride(s) of the input tensor dimension(s) along which the convolution is computed (the first dimension has the shortest stride for ColMajor, whereas RowMajor's shortest stride is for the last dimension). // Compute convolution along the second and third dimension. Tensor input(3, 3, 7, 11); Tensor kernel(2, 2); Tensor output(3, 2, 6, 11); input.setRandom(); kernel.setRandom(); Eigen::array dims({1, 2}); // Specify second and third dimension for convolution. output = input.convolve(kernel, dims); for (int i = 0; i < 3; ++i) { for (int j = 0; j < 2; ++j) { for (int k = 0; k < 6; ++k) { for (int l = 0; l < 11; ++l) { const float result = output(i,j,k,l); const float expected = input(i,j+0,k+0,l) * kernel(0,0) + input(i,j+1,k+0,l) * kernel(1,0) + input(i,j+0,k+1,l) * kernel(0,1) + input(i,j+1,k+1,l) * kernel(1,1); VERIFY_IS_APPROX(result, expected); } } } } ## Geometrical Operations These operations return a Tensor with different dimensions than the original Tensor. They can be used to access slices of tensors, see them with different dimensions, or pad tensors with additional data. ### reshape(const Dimensions& new_dims) Returns a view of the input tensor that has been reshaped to the specified new dimensions. The argument new_dims is an array of Index values. The rank of the resulting tensor is equal to the number of elements in new_dims. The product of all the sizes in the new dimension array must be equal to the number of elements in the input tensor. // Increase the rank of the input tensor by introducing a new dimension // of size 1. Tensor input(7, 11); array three_dims{{7, 11, 1}}; Tensor result = input.reshape(three_dims); // Decrease the rank of the input tensor by merging 2 dimensions; array one_dim{{7 * 11}}; Tensor result = input.reshape(one_dim); This operation does not move any data in the input tensor, so the resulting contents of a reshaped Tensor depend on the data layout of the original Tensor. For example this is what happens when you `reshape()` a 2D ColMajor tensor to one dimension: Eigen::Tensor a(2, 3); a.setValues({{0.0f, 100.0f, 200.0f}, {300.0f, 400.0f, 500.0f}}); Eigen::array one_dim({3 * 2}); Eigen::Tensor b = a.reshape(one_dim); cout << "b" << endl << b << endl; => b 0 300 100 400 200 500 This is what happens when the 2D Tensor is RowMajor: Eigen::Tensor a(2, 3); a.setValues({{0.0f, 100.0f, 200.0f}, {300.0f, 400.0f, 500.0f}}); Eigen::array one_dim({3 * 2}); Eigen::Tensor b = a.reshape(one_dim); cout << "b" << endl << b << endl; => b 0 100 200 300 400 500 The reshape operation is a lvalue. In other words, it can be used on the left side of the assignment operator. The previous example can be rewritten as follow: Eigen::Tensor a(2, 3); a.setValues({{0.0f, 100.0f, 200.0f}, {300.0f, 400.0f, 500.0f}}); Eigen::array two_dim({2, 3}); Eigen::Tensor b(6); b.reshape(two_dim) = a; cout << "b" << endl << b << endl; => b 0 300 100 400 200 500 Note that "b" itself was not reshaped but that instead the assignment is done to the reshape view of b. ### shuffle(const Shuffle& shuffle) Returns a copy of the input tensor whose dimensions have been reordered according to the specified permutation. The argument shuffle is an array of Index values. Its size is the rank of the input tensor. It must contain a permutation of 0, 1, ..., rank - 1. The i-th dimension of the output tensor equals to the size of the shuffle[i]-th dimension of the input tensor. For example: // Shuffle all dimensions to the left by 1. Tensor input(20, 30, 50); // ... set some values in input. Tensor output = input.shuffle({1, 2, 0}) eigen_assert(output.dimension(0) == 30); eigen_assert(output.dimension(1) == 50); eigen_assert(output.dimension(2) == 20); Indices into the output tensor are shuffled accordingly to formulate indices into the input tensor. For example, one can assert in the above code snippet that: eigen_assert(output(3, 7, 11) == input(11, 3, 7)); In general, one can assert that eigen_assert(output(..., indices[shuffle[i]], ...) == input(..., indices[i], ...)) The shuffle operation results in a lvalue, which means that it can be assigned to. In other words, it can be used on the left side of the assignment operator. Let's rewrite the previous example to take advantage of this feature: // Shuffle all dimensions to the left by 1. Tensor input(20, 30, 50); // ... set some values in input. Tensor output(30, 50, 20); output.shuffle({2, 0, 1}) = input; ### stride(const Strides& strides) Returns a view of the input tensor that strides (skips stride-1 elements) along each of the dimensions. The argument strides is an array of Index values. The dimensions of the resulting tensor are ceil(input_dimensions[i] / strides[i]). For example this is what happens when you `stride()` a 2D tensor: Eigen::Tensor a(4, 3); a.setValues({{0, 100, 200}, {300, 400, 500}, {600, 700, 800}, {900, 1000, 1100}}); Eigen::array strides({3, 2}); Eigen::Tensor b = a.stride(strides); cout << "b" << endl << b << endl; => b 0 200 900 1100 It is possible to assign a tensor to a stride: Tensor input(20, 30, 50); // ... set some values in input. Tensor output(40, 90, 200); output.stride({2, 3, 4}) = input; ### slice(const StartIndices& offsets, const Sizes& extents) Returns a sub-tensor of the given tensor. For each dimension i, the slice is made of the coefficients stored between offset[i] and offset[i] + extents[i] in the input tensor. Eigen::Tensor a(4, 3); a.setValues({{0, 100, 200}, {300, 400, 500}, {600, 700, 800}, {900, 1000, 1100}}); Eigen::array offsets = {1, 0}; Eigen::array extents = {2, 2}; Eigen::Tensor slice = a.slice(offsets, extents); cout << "a" << endl << a << endl; => a 0 100 200 300 400 500 600 700 800 900 1000 1100 cout << "slice" << endl << slice << endl; => slice 300 400 600 700 ### chip(const Index offset, const Index dim) A chip is a special kind of slice. It is the subtensor at the given offset in the dimension dim. The returned tensor has one fewer dimension than the input tensor: the dimension dim is removed. For example, a matrix chip would be either a row or a column of the input matrix. Eigen::Tensor a(4, 3); a.setValues({{0, 100, 200}, {300, 400, 500}, {600, 700, 800}, {900, 1000, 1100}}); Eigen::Tensor row_3 = a.chip(2, 0); Eigen::Tensor col_2 = a.chip(1, 1); cout << "a" << endl << a << endl; => a 0 100 200 300 400 500 600 700 800 900 1000 1100 cout << "row_3" << endl << row_3 << endl; => row_3 600 700 800 cout << "col_2" << endl << col_2 << endl; => col_2 100 400 700 1000 It is possible to assign values to a tensor chip since the chip operation is a lvalue. For example: Eigen::Tensor a(3); a.setValues({{100, 200, 300}}); Eigen::Tensor b(2, 3); b.setZero(); b.chip(0, 0) = a; cout << "a" << endl << a << endl; => a 100 200 300 cout << "b" << endl << b << endl; => b 100 200 300 0 0 0 ### reverse(const ReverseDimensions& reverse) Returns a view of the input tensor that reverses the order of the coefficients along a subset of the dimensions. The argument reverse is an array of boolean values that indicates whether or not the order of the coefficients should be reversed along each of the dimensions. This operation preserves the dimensions of the input tensor. For example this is what happens when you `reverse()` the first dimension of a 2D tensor: Eigen::Tensor a(4, 3); a.setValues({{0, 100, 200}, {300, 400, 500}, {600, 700, 800}, {900, 1000, 1100}}); Eigen::array reverse({true, false}); Eigen::Tensor b = a.reverse(reverse); cout << "a" << endl << a << endl << "b" << endl << b << endl; => a 0 100 200 300 400 500 600 700 800 900 1000 1100 b 900 1000 1100 600 700 800 300 400 500 0 100 200 ### broadcast(const Broadcast& broadcast) Returns a view of the input tensor in which the input is replicated one to many times. The broadcast argument specifies how many copies of the input tensor need to be made in each of the dimensions. Eigen::Tensor a(2, 3); a.setValues({{0, 100, 200}, {300, 400, 500}}); Eigen::array bcast({3, 2}); Eigen::Tensor b = a.broadcast(bcast); cout << "a" << endl << a << endl << "b" << endl << b << endl; => a 0 100 200 300 400 500 b 0 100 200 0 100 200 300 400 500 300 400 500 0 100 200 0 100 200 300 400 500 300 400 500 0 100 200 0 100 200 300 400 500 300 400 500 ### concatenate(const OtherDerived& other, Axis axis) TODO ### pad(const PaddingDimensions& padding) Returns a view of the input tensor in which the input is padded with zeros. Eigen::Tensor a(2, 3); a.setValues({{0, 100, 200}, {300, 400, 500}}); Eigen::array, 2> paddings; paddings[0] = make_pair(0, 1); paddings[1] = make_pair(2, 3); Eigen::Tensor b = a.pad(paddings); cout << "a" << endl << a << endl << "b" << endl << b << endl; => a 0 100 200 300 400 500 b 0 0 0 0 0 0 0 0 0 100 200 0 300 400 500 0 0 0 0 0 0 0 0 0 0 0 0 0 ### extract_patches(const PatchDims& patch_dims) Returns a tensor of coefficient patches extracted from the input tensor, where each patch is of dimension specified by 'patch_dims'. The returned tensor has one greater dimension than the input tensor, which is used to index each patch. The patch index in the output tensor depends on the data layout of the input tensor: the patch index is the last dimension ColMajor layout, and the first dimension in RowMajor layout. For example, given the following input tensor: Eigen::Tensor tensor(3,4); tensor.setValues({{0.0f, 1.0f, 2.0f, 3.0f}, {4.0f, 5.0f, 6.0f, 7.0f}, {8.0f, 9.0f, 10.0f, 11.0f}}); cout << "tensor: " << endl << tensor << endl; => tensor: 0 1 2 3 4 5 6 7 8 9 10 11 Six 2x2 patches can be extracted and indexed using the following code: Eigen::Tensor patch; Eigen::array patch_dims; patch_dims[0] = 2; patch_dims[1] = 2; patch = tensor.extract_patches(patch_dims); for (int k = 0; k < 6; ++k) { cout << "patch index: " << k << endl; for (int i = 0; i < 2; ++i) { for (int j = 0; j < 2; ++j) { if (DataLayout == ColMajor) { cout << patch(i, j, k) << " "; } else { cout << patch(k, i, j) << " "; } } cout << endl; } } This code results in the following output when the data layout is ColMajor: patch index: 0 0 1 4 5 patch index: 1 4 5 8 9 patch index: 2 1 2 5 6 patch index: 3 5 6 9 10 patch index: 4 2 3 6 7 patch index: 5 6 7 10 11 This code results in the following output when the data layout is RowMajor: (NOTE: the set of patches is the same as in ColMajor, but are indexed differently). patch index: 0 0 1 4 5 patch index: 1 1 2 5 6 patch index: 2 2 3 6 7 patch index: 3 4 5 8 9 patch index: 4 5 6 9 10 patch index: 5 6 7 10 11 ### extract_image_patches(const Index patch_rows, const Index patch_cols, const Index row_stride, const Index col_stride, const PaddingType padding_type) Returns a tensor of coefficient image patches extracted from the input tensor, which is expected to have dimensions ordered as follows (depending on the data layout of the input tensor, and the number of additional dimensions 'N'): *) ColMajor 1st dimension: channels (of size d) 2nd dimension: rows (of size r) 3rd dimension: columns (of size c) 4th-Nth dimension: time (for video) or batch (for bulk processing). *) RowMajor (reverse order of ColMajor) 1st-Nth dimension: time (for video) or batch (for bulk processing). N+1'th dimension: columns (of size c) N+2'th dimension: rows (of size r) N+3'th dimension: channels (of size d) The returned tensor has one greater dimension than the input tensor, which is used to index each patch. The patch index in the output tensor depends on the data layout of the input tensor: the patch index is the 4'th dimension in ColMajor layout, and the 4'th from the last dimension in RowMajor layout. For example, given the following input tensor with the following dimension sizes: *) depth: 2 *) rows: 3 *) columns: 5 *) batch: 7 Tensor tensor(2,3,5,7); Tensor tensor_row_major = tensor.swap_layout(); 2x2 image patches can be extracted and indexed using the following code: *) 2D patch: ColMajor (patch indexed by second-to-last dimension) Tensor twod_patch; twod_patch = tensor.extract_image_patches<2, 2>(); // twod_patch.dimension(0) == 2 // twod_patch.dimension(1) == 2 // twod_patch.dimension(2) == 2 // twod_patch.dimension(3) == 3*5 // twod_patch.dimension(4) == 7 *) 2D patch: RowMajor (patch indexed by the second dimension) Tensor twod_patch_row_major; twod_patch_row_major = tensor_row_major.extract_image_patches<2, 2>(); // twod_patch_row_major.dimension(0) == 7 // twod_patch_row_major.dimension(1) == 3*5 // twod_patch_row_major.dimension(2) == 2 // twod_patch_row_major.dimension(3) == 2 // twod_patch_row_major.dimension(4) == 2 ## Special Operations ### cast() Returns a tensor of type T with the same dimensions as the original tensor. The returned tensor contains the values of the original tensor converted to type T. Eigen::Tensor a(2, 3); Eigen::Tensor b = a.cast(); This can be useful for example if you need to do element-wise division of Tensors of integers. This is not currently supported by the Tensor library but you can easily cast the tensors to floats to do the division: Eigen::Tensor a(2, 3); a.setValues({{0, 1, 2}, {3, 4, 5}}); Eigen::Tensor b = (a.cast() / a.constant(2).cast()).cast(); cout << "a" << endl << a << endl << endl; cout << "b" << endl << b << endl << endl; => a 0 1 2 3 4 5 b 0 0 1 1 2 2 ### eval() TODO ## Representation of scalar values Scalar values are often represented by tensors of size 1 and rank 0.For example Tensor::maximum() currently returns a Tensor. Similarly, the inner product of 2 1d tensors (through contractions) returns a 0d tensor. ## Limitations * The number of tensor dimensions is currently limited to 250 when using a compiler that supports cxx11. It is limited to only 5 for older compilers. * The IndexList class requires a cxx11 compliant compiler. You can use an array of indices instead if you don't have access to a modern compiler. * On GPUs only floating point values are properly tested and optimized for. * Complex and integer values are known to be broken on GPUs. If you try to use them you'll most likely end up triggering a static assertion failure such as EIGEN_STATIC_ASSERT(packetSize > 1, YOU_MADE_A_PROGRAMMING_MISTAKE) RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorDeviceGpu.h0000644000176200001440000003104514562417221026240 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #if defined(EIGEN_USE_GPU) && !defined(EIGEN_CXX11_TENSOR_TENSOR_DEVICE_GPU_H) #define EIGEN_CXX11_TENSOR_TENSOR_DEVICE_GPU_H // This header file container defines fo gpu* macros which will resolve to // their equivalent hip* or cuda* versions depending on the compiler in use // A separate header (included at the end of this file) will undefine all #include "TensorGpuHipCudaDefines.h" namespace Eigen { static const int kGpuScratchSize = 1024; // This defines an interface that GPUDevice can take to use // HIP / CUDA streams underneath. class StreamInterface { public: virtual ~StreamInterface() {} virtual const gpuStream_t& stream() const = 0; virtual const gpuDeviceProp_t& deviceProperties() const = 0; // Allocate memory on the actual device where the computation will run virtual void* allocate(size_t num_bytes) const = 0; virtual void deallocate(void* buffer) const = 0; // Return a scratchpad buffer of size 1k virtual void* scratchpad() const = 0; // Return a semaphore. The semaphore is initially initialized to 0, and // each kernel using it is responsible for resetting to 0 upon completion // to maintain the invariant that the semaphore is always equal to 0 upon // each kernel start. virtual unsigned int* semaphore() const = 0; }; class GpuDeviceProperties { public: GpuDeviceProperties() : initialized_(false), first_(true), device_properties_(nullptr) {} ~GpuDeviceProperties() { if (device_properties_) { delete[] device_properties_; } } EIGEN_STRONG_INLINE const gpuDeviceProp_t& get(int device) const { return device_properties_[device]; } EIGEN_STRONG_INLINE bool isInitialized() const { return initialized_; } void initialize() { if (!initialized_) { // Attempts to ensure proper behavior in the case of multiple threads // calling this function simultaneously. This would be trivial to // implement if we could use std::mutex, but unfortunately mutex don't // compile with nvcc, so we resort to atomics and thread fences instead. // Note that if the caller uses a compiler that doesn't support c++11 we // can't ensure that the initialization is thread safe. if (first_.exchange(false)) { // We're the first thread to reach this point. int num_devices; gpuError_t status = gpuGetDeviceCount(&num_devices); if (status != gpuSuccess) { std::cerr << "Failed to get the number of GPU devices: " << gpuGetErrorString(status) << std::endl; gpu_assert(status == gpuSuccess); } device_properties_ = new gpuDeviceProp_t[num_devices]; for (int i = 0; i < num_devices; ++i) { status = gpuGetDeviceProperties(&device_properties_[i], i); if (status != gpuSuccess) { std::cerr << "Failed to initialize GPU device #" << i << ": " << gpuGetErrorString(status) << std::endl; gpu_assert(status == gpuSuccess); } } std::atomic_thread_fence(std::memory_order_release); initialized_ = true; } else { // Wait for the other thread to inititialize the properties. while (!initialized_) { std::atomic_thread_fence(std::memory_order_acquire); std::this_thread::sleep_for(std::chrono::milliseconds(1000)); } } } } private: volatile bool initialized_; std::atomic first_; gpuDeviceProp_t* device_properties_; }; EIGEN_ALWAYS_INLINE const GpuDeviceProperties& GetGpuDeviceProperties() { static GpuDeviceProperties* deviceProperties = new GpuDeviceProperties(); if (!deviceProperties->isInitialized()) { deviceProperties->initialize(); } return *deviceProperties; } EIGEN_ALWAYS_INLINE const gpuDeviceProp_t& GetGpuDeviceProperties(int device) { return GetGpuDeviceProperties().get(device); } static const gpuStream_t default_stream = gpuStreamDefault; class GpuStreamDevice : public StreamInterface { public: // Use the default stream on the current device GpuStreamDevice() : stream_(&default_stream), scratch_(NULL), semaphore_(NULL) { gpuGetDevice(&device_); } // Use the default stream on the specified device GpuStreamDevice(int device) : stream_(&default_stream), device_(device), scratch_(NULL), semaphore_(NULL) {} // Use the specified stream. Note that it's the // caller responsibility to ensure that the stream can run on // the specified device. If no device is specified the code // assumes that the stream is associated to the current gpu device. GpuStreamDevice(const gpuStream_t* stream, int device = -1) : stream_(stream), device_(device), scratch_(NULL), semaphore_(NULL) { if (device < 0) { gpuGetDevice(&device_); } else { int num_devices; gpuError_t err = gpuGetDeviceCount(&num_devices); EIGEN_UNUSED_VARIABLE(err) gpu_assert(err == gpuSuccess); gpu_assert(device < num_devices); device_ = device; } } virtual ~GpuStreamDevice() { if (scratch_) { deallocate(scratch_); } } const gpuStream_t& stream() const { return *stream_; } const gpuDeviceProp_t& deviceProperties() const { return GetGpuDeviceProperties(device_); } virtual void* allocate(size_t num_bytes) const { gpuError_t err = gpuSetDevice(device_); EIGEN_UNUSED_VARIABLE(err) gpu_assert(err == gpuSuccess); void* result; err = gpuMalloc(&result, num_bytes); gpu_assert(err == gpuSuccess); gpu_assert(result != NULL); return result; } virtual void deallocate(void* buffer) const { gpuError_t err = gpuSetDevice(device_); EIGEN_UNUSED_VARIABLE(err) gpu_assert(err == gpuSuccess); gpu_assert(buffer != NULL); err = gpuFree(buffer); gpu_assert(err == gpuSuccess); } virtual void* scratchpad() const { if (scratch_ == NULL) { scratch_ = allocate(kGpuScratchSize + sizeof(unsigned int)); } return scratch_; } virtual unsigned int* semaphore() const { if (semaphore_ == NULL) { char* scratch = static_cast(scratchpad()) + kGpuScratchSize; semaphore_ = reinterpret_cast(scratch); gpuError_t err = gpuMemsetAsync(semaphore_, 0, sizeof(unsigned int), *stream_); EIGEN_UNUSED_VARIABLE(err) gpu_assert(err == gpuSuccess); } return semaphore_; } private: const gpuStream_t* stream_; int device_; mutable void* scratch_; mutable unsigned int* semaphore_; }; struct GpuDevice { // The StreamInterface is not owned: the caller is // responsible for its initialization and eventual destruction. explicit GpuDevice(const StreamInterface* stream) : stream_(stream), max_blocks_(INT_MAX) { eigen_assert(stream); } explicit GpuDevice(const StreamInterface* stream, int num_blocks) : stream_(stream), max_blocks_(num_blocks) { eigen_assert(stream); } // TODO(bsteiner): This is an internal API, we should not expose it. EIGEN_STRONG_INLINE const gpuStream_t& stream() const { return stream_->stream(); } EIGEN_STRONG_INLINE void* allocate(size_t num_bytes) const { return stream_->allocate(num_bytes); } EIGEN_STRONG_INLINE void deallocate(void* buffer) const { stream_->deallocate(buffer); } EIGEN_STRONG_INLINE void* allocate_temp(size_t num_bytes) const { return stream_->allocate(num_bytes); } EIGEN_STRONG_INLINE void deallocate_temp(void* buffer) const { stream_->deallocate(buffer); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Type get(Type data) const { return data; } EIGEN_STRONG_INLINE void* scratchpad() const { return stream_->scratchpad(); } EIGEN_STRONG_INLINE unsigned int* semaphore() const { return stream_->semaphore(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void memcpy(void* dst, const void* src, size_t n) const { #ifndef EIGEN_GPU_COMPILE_PHASE gpuError_t err = gpuMemcpyAsync(dst, src, n, gpuMemcpyDeviceToDevice, stream_->stream()); EIGEN_UNUSED_VARIABLE(err) gpu_assert(err == gpuSuccess); #else EIGEN_UNUSED_VARIABLE(dst); EIGEN_UNUSED_VARIABLE(src); EIGEN_UNUSED_VARIABLE(n); eigen_assert(false && "The default device should be used instead to generate kernel code"); #endif } EIGEN_STRONG_INLINE void memcpyHostToDevice(void* dst, const void* src, size_t n) const { gpuError_t err = gpuMemcpyAsync(dst, src, n, gpuMemcpyHostToDevice, stream_->stream()); EIGEN_UNUSED_VARIABLE(err) gpu_assert(err == gpuSuccess); } EIGEN_STRONG_INLINE void memcpyDeviceToHost(void* dst, const void* src, size_t n) const { gpuError_t err = gpuMemcpyAsync(dst, src, n, gpuMemcpyDeviceToHost, stream_->stream()); EIGEN_UNUSED_VARIABLE(err) gpu_assert(err == gpuSuccess); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void memset(void* buffer, int c, size_t n) const { #ifndef EIGEN_GPU_COMPILE_PHASE gpuError_t err = gpuMemsetAsync(buffer, c, n, stream_->stream()); EIGEN_UNUSED_VARIABLE(err) gpu_assert(err == gpuSuccess); #else eigen_assert(false && "The default device should be used instead to generate kernel code"); #endif } EIGEN_STRONG_INLINE size_t numThreads() const { // FIXME return 32; } EIGEN_STRONG_INLINE size_t firstLevelCacheSize() const { // FIXME return 48*1024; } EIGEN_STRONG_INLINE size_t lastLevelCacheSize() const { // We won't try to take advantage of the l2 cache for the time being, and // there is no l3 cache on hip/cuda devices. return firstLevelCacheSize(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void synchronize() const { #ifndef EIGEN_GPU_COMPILE_PHASE gpuError_t err = gpuStreamSynchronize(stream_->stream()); if (err != gpuSuccess) { std::cerr << "Error detected in GPU stream: " << gpuGetErrorString(err) << std::endl; gpu_assert(err == gpuSuccess); } #else gpu_assert(false && "The default device should be used instead to generate kernel code"); #endif } EIGEN_STRONG_INLINE int getNumGpuMultiProcessors() const { return stream_->deviceProperties().multiProcessorCount; } EIGEN_STRONG_INLINE int maxGpuThreadsPerBlock() const { return stream_->deviceProperties().maxThreadsPerBlock; } EIGEN_STRONG_INLINE int maxGpuThreadsPerMultiProcessor() const { return stream_->deviceProperties().maxThreadsPerMultiProcessor; } EIGEN_STRONG_INLINE int sharedMemPerBlock() const { return stream_->deviceProperties().sharedMemPerBlock; } EIGEN_STRONG_INLINE int majorDeviceVersion() const { return stream_->deviceProperties().major; } EIGEN_STRONG_INLINE int minorDeviceVersion() const { return stream_->deviceProperties().minor; } EIGEN_STRONG_INLINE int maxBlocks() const { return max_blocks_; } // This function checks if the GPU runtime recorded an error for the // underlying stream device. inline bool ok() const { #ifdef EIGEN_GPUCC gpuError_t error = gpuStreamQuery(stream_->stream()); return (error == gpuSuccess) || (error == gpuErrorNotReady); #else return false; #endif } private: const StreamInterface* stream_; int max_blocks_; }; #if defined(EIGEN_HIPCC) #define LAUNCH_GPU_KERNEL(kernel, gridsize, blocksize, sharedmem, device, ...) \ hipLaunchKernelGGL(kernel, dim3(gridsize), dim3(blocksize), (sharedmem), (device).stream(), __VA_ARGS__); \ gpu_assert(hipGetLastError() == hipSuccess); #else #define LAUNCH_GPU_KERNEL(kernel, gridsize, blocksize, sharedmem, device, ...) \ (kernel) <<< (gridsize), (blocksize), (sharedmem), (device).stream() >>> (__VA_ARGS__); \ gpu_assert(cudaGetLastError() == cudaSuccess); #endif // FIXME: Should be device and kernel specific. #ifdef EIGEN_GPUCC static EIGEN_DEVICE_FUNC inline void setGpuSharedMemConfig(gpuSharedMemConfig config) { #ifndef EIGEN_GPU_COMPILE_PHASE gpuError_t status = gpuDeviceSetSharedMemConfig(config); EIGEN_UNUSED_VARIABLE(status) gpu_assert(status == gpuSuccess); #else EIGEN_UNUSED_VARIABLE(config) #endif } #endif } // end namespace Eigen // undefine all the gpu* macros we defined at the beginning of the file #include "TensorGpuHipCudaUndefines.h" #endif // EIGEN_CXX11_TENSOR_TENSOR_DEVICE_GPU_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/Tensor.h0000644000176200001440000005142514562417221024450 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // Copyright (C) 2013 Christian Seiler // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_H #define EIGEN_CXX11_TENSOR_TENSOR_H namespace Eigen { /** \class Tensor * \ingroup CXX11_Tensor_Module * * \brief The tensor class. * * The %Tensor class is the work-horse for all \em dense tensors within Eigen. * * The %Tensor class encompasses only dynamic-size objects so far. * * The first two template parameters are required: * \tparam Scalar_ Numeric type, e.g. float, double, int or `std::complex`. * User defined scalar types are supported as well (see \ref user_defined_scalars "here"). * \tparam NumIndices_ Number of indices (i.e. rank of the tensor) * * The remaining template parameters are optional -- in most cases you don't have to worry about them. * \tparam Options_ A combination of either \b #RowMajor or \b #ColMajor, and of either * \b #AutoAlign or \b #DontAlign. * The former controls \ref TopicStorageOrders "storage order", and defaults to column-major. The latter controls alignment, which is required * for vectorization. It defaults to aligning tensors. Note that tensors currently do not support any operations that profit from vectorization. * Support for such operations (i.e. adding two tensors etc.) is planned. * * You can access elements of tensors using normal subscripting: * * \code * Eigen::Tensor t(10, 10, 10, 10); * t(0, 1, 2, 3) = 42.0; * \endcode * * This class can be extended with the help of the plugin mechanism described on the page * \ref TopicCustomizing_Plugins by defining the preprocessor symbol \c EIGEN_TENSOR_PLUGIN. * * Some notes: * *
*
Relation to other parts of Eigen:
*
The midterm development goal for this class is to have a similar hierarchy as Eigen uses for matrices, so that * taking blocks or using tensors in expressions is easily possible, including an interface with the vector/matrix code * by providing .asMatrix() and .asVector() (or similar) methods for rank 2 and 1 tensors. However, currently, the %Tensor * class does not provide any of these features and is only available as a stand-alone class that just allows for * coefficient access. Also, when fixed-size tensors are implemented, the number of template arguments is likely to * change dramatically.
*
* * \ref TopicStorageOrders */ template class Tensor : public TensorBase > { public: typedef Tensor Self; typedef TensorBase > Base; typedef typename Eigen::internal::nested::type Nested; typedef typename internal::traits::StorageKind StorageKind; typedef typename internal::traits::Index Index; typedef Scalar_ Scalar; typedef typename NumTraits::Real RealScalar; typedef typename Base::CoeffReturnType CoeffReturnType; enum { IsAligned = bool(EIGEN_MAX_ALIGN_BYTES>0) & !(Options_&DontAlign), Layout = Options_ & RowMajor ? RowMajor : ColMajor, CoordAccess = true, RawAccess = true }; static const int Options = Options_; static const int NumIndices = NumIndices_; typedef DSizes Dimensions; protected: TensorStorage m_storage; #ifdef EIGEN_HAS_SFINAE template struct isOfNormalIndex{ static const bool is_array = internal::is_base_of, CustomIndices>::value; static const bool is_int = NumTraits::IsInteger; static const bool value = is_array | is_int; }; #endif public: // Metadata EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index rank() const { return NumIndices; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index dimension(std::size_t n) const { return m_storage.dimensions()[n]; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_storage.dimensions(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index size() const { return m_storage.size(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar *data() { return m_storage.data(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar *data() const { return m_storage.data(); } // This makes EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED // work, because that uses base().coeffRef() - and we don't yet // implement a similar class hierarchy inline Self& base() { return *this; } inline const Self& base() const { return *this; } #if EIGEN_HAS_VARIADIC_TEMPLATES template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& coeff(Index firstIndex, Index secondIndex, IndexTypes... otherIndices) const { // The number of indices used to access a tensor coefficient must be equal to the rank of the tensor. EIGEN_STATIC_ASSERT(sizeof...(otherIndices) + 2 == NumIndices, YOU_MADE_A_PROGRAMMING_MISTAKE) return coeff(array{{firstIndex, secondIndex, otherIndices...}}); } #endif // normal indices EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& coeff(const array& indices) const { eigen_internal_assert(checkIndexRange(indices)); return m_storage.data()[linearizedIndex(indices)]; } // custom indices #ifdef EIGEN_HAS_SFINAE template::value) ) > EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& coeff(CustomIndices& indices) const { return coeff(internal::customIndices2Array(indices)); } #endif EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& coeff() const { EIGEN_STATIC_ASSERT(NumIndices == 0, YOU_MADE_A_PROGRAMMING_MISTAKE); return m_storage.data()[0]; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& coeff(Index index) const { eigen_internal_assert(index >= 0 && index < size()); return m_storage.data()[index]; } #if EIGEN_HAS_VARIADIC_TEMPLATES template inline Scalar& coeffRef(Index firstIndex, Index secondIndex, IndexTypes... otherIndices) { // The number of indices used to access a tensor coefficient must be equal to the rank of the tensor. EIGEN_STATIC_ASSERT(sizeof...(otherIndices) + 2 == NumIndices, YOU_MADE_A_PROGRAMMING_MISTAKE) return coeffRef(array{{firstIndex, secondIndex, otherIndices...}}); } #endif // normal indices EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(const array& indices) { eigen_internal_assert(checkIndexRange(indices)); return m_storage.data()[linearizedIndex(indices)]; } // custom indices #ifdef EIGEN_HAS_SFINAE template::value) ) > EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(CustomIndices& indices) { return coeffRef(internal::customIndices2Array(indices)); } #endif EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef() { EIGEN_STATIC_ASSERT(NumIndices == 0, YOU_MADE_A_PROGRAMMING_MISTAKE); return m_storage.data()[0]; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index index) { eigen_internal_assert(index >= 0 && index < size()); return m_storage.data()[index]; } #if EIGEN_HAS_VARIADIC_TEMPLATES template inline const Scalar& operator()(Index firstIndex, Index secondIndex, IndexTypes... otherIndices) const { // The number of indices used to access a tensor coefficient must be equal to the rank of the tensor. EIGEN_STATIC_ASSERT(sizeof...(otherIndices) + 2 == NumIndices, YOU_MADE_A_PROGRAMMING_MISTAKE) return this->operator()(array{{firstIndex, secondIndex, otherIndices...}}); } #else EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& operator()(Index i0, Index i1) const { return coeff(array(i0, i1)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& operator()(Index i0, Index i1, Index i2) const { return coeff(array(i0, i1, i2)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& operator()(Index i0, Index i1, Index i2, Index i3) const { return coeff(array(i0, i1, i2, i3)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& operator()(Index i0, Index i1, Index i2, Index i3, Index i4) const { return coeff(array(i0, i1, i2, i3, i4)); } #endif // custom indices #ifdef EIGEN_HAS_SFINAE template::value) ) > EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& operator()(CustomIndices& indices) const { return coeff(internal::customIndices2Array(indices)); } #endif // normal indices EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& operator()(const array& indices) const { return coeff(indices); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& operator()(Index index) const { eigen_internal_assert(index >= 0 && index < size()); return coeff(index); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& operator()() const { EIGEN_STATIC_ASSERT(NumIndices == 0, YOU_MADE_A_PROGRAMMING_MISTAKE); return coeff(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& operator[](Index index) const { // The bracket operator is only for vectors, use the parenthesis operator instead. EIGEN_STATIC_ASSERT(NumIndices == 1, YOU_MADE_A_PROGRAMMING_MISTAKE); return coeff(index); } #if EIGEN_HAS_VARIADIC_TEMPLATES template inline Scalar& operator()(Index firstIndex, Index secondIndex, IndexTypes... otherIndices) { // The number of indices used to access a tensor coefficient must be equal to the rank of the tensor. EIGEN_STATIC_ASSERT(sizeof...(otherIndices) + 2 == NumIndices, YOU_MADE_A_PROGRAMMING_MISTAKE) return operator()(array{{firstIndex, secondIndex, otherIndices...}}); } #else EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& operator()(Index i0, Index i1) { return coeffRef(array(i0, i1)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& operator()(Index i0, Index i1, Index i2) { return coeffRef(array(i0, i1, i2)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& operator()(Index i0, Index i1, Index i2, Index i3) { return coeffRef(array(i0, i1, i2, i3)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& operator()(Index i0, Index i1, Index i2, Index i3, Index i4) { return coeffRef(array(i0, i1, i2, i3, i4)); } #endif // normal indices EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& operator()(const array& indices) { return coeffRef(indices); } // custom indices #ifdef EIGEN_HAS_SFINAE template::value) ) > EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& operator()(CustomIndices& indices) { return coeffRef(internal::customIndices2Array(indices)); } #endif EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& operator()(Index index) { eigen_assert(index >= 0 && index < size()); return coeffRef(index); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& operator()() { EIGEN_STATIC_ASSERT(NumIndices == 0, YOU_MADE_A_PROGRAMMING_MISTAKE); return coeffRef(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& operator[](Index index) { // The bracket operator is only for vectors, use the parenthesis operator instead EIGEN_STATIC_ASSERT(NumIndices == 1, YOU_MADE_A_PROGRAMMING_MISTAKE) return coeffRef(index); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Tensor() : m_storage() { } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Tensor(const Self& other) : m_storage(other.m_storage) { } #if EIGEN_HAS_VARIADIC_TEMPLATES template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Tensor(Index firstDimension, IndexTypes... otherDimensions) : m_storage(firstDimension, otherDimensions...) { // The number of dimensions used to construct a tensor must be equal to the rank of the tensor. EIGEN_STATIC_ASSERT(sizeof...(otherDimensions) + 1 == NumIndices, YOU_MADE_A_PROGRAMMING_MISTAKE) } #else EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit Tensor(Index dim1) : m_storage(dim1, array(dim1)) { EIGEN_STATIC_ASSERT(1 == NumIndices, YOU_MADE_A_PROGRAMMING_MISTAKE) } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Tensor(Index dim1, Index dim2) : m_storage(dim1*dim2, array(dim1, dim2)) { EIGEN_STATIC_ASSERT(2 == NumIndices, YOU_MADE_A_PROGRAMMING_MISTAKE) } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Tensor(Index dim1, Index dim2, Index dim3) : m_storage(dim1*dim2*dim3, array(dim1, dim2, dim3)) { EIGEN_STATIC_ASSERT(3 == NumIndices, YOU_MADE_A_PROGRAMMING_MISTAKE) } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Tensor(Index dim1, Index dim2, Index dim3, Index dim4) : m_storage(dim1*dim2*dim3*dim4, array(dim1, dim2, dim3, dim4)) { EIGEN_STATIC_ASSERT(4 == NumIndices, YOU_MADE_A_PROGRAMMING_MISTAKE) } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Tensor(Index dim1, Index dim2, Index dim3, Index dim4, Index dim5) : m_storage(dim1*dim2*dim3*dim4*dim5, array(dim1, dim2, dim3, dim4, dim5)) { EIGEN_STATIC_ASSERT(5 == NumIndices, YOU_MADE_A_PROGRAMMING_MISTAKE) } #endif /** Normal Dimension */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit Tensor(const array& dimensions) : m_storage(internal::array_prod(dimensions), dimensions) { EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Tensor(const TensorBase& other) { typedef TensorAssignOp Assign; Assign assign(*this, other.derived()); resize(TensorEvaluator(assign, DefaultDevice()).dimensions()); internal::TensorExecutor::run(assign, DefaultDevice()); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Tensor(const TensorBase& other) { typedef TensorAssignOp Assign; Assign assign(*this, other.derived()); resize(TensorEvaluator(assign, DefaultDevice()).dimensions()); internal::TensorExecutor::run(assign, DefaultDevice()); } #if EIGEN_HAS_RVALUE_REFERENCES EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Tensor(Self&& other) : m_storage(std::move(other.m_storage)) { } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Tensor& operator=(Self&& other) { m_storage = std::move(other.m_storage); return *this; } #endif EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Tensor& operator=(const Tensor& other) { typedef TensorAssignOp Assign; Assign assign(*this, other); resize(TensorEvaluator(assign, DefaultDevice()).dimensions()); internal::TensorExecutor::run(assign, DefaultDevice()); return *this; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Tensor& operator=(const OtherDerived& other) { typedef TensorAssignOp Assign; Assign assign(*this, other); resize(TensorEvaluator(assign, DefaultDevice()).dimensions()); internal::TensorExecutor::run(assign, DefaultDevice()); return *this; } #if EIGEN_HAS_VARIADIC_TEMPLATES template EIGEN_DEVICE_FUNC void resize(Index firstDimension, IndexTypes... otherDimensions) { // The number of dimensions used to resize a tensor must be equal to the rank of the tensor. EIGEN_STATIC_ASSERT(sizeof...(otherDimensions) + 1 == NumIndices, YOU_MADE_A_PROGRAMMING_MISTAKE) resize(array{{firstDimension, otherDimensions...}}); } #endif /** Normal Dimension */ EIGEN_DEVICE_FUNC void resize(const array& dimensions) { int i; Index size = Index(1); for (i = 0; i < NumIndices; i++) { internal::check_rows_cols_for_overflow::run(size, dimensions[i]); size *= dimensions[i]; } #ifdef EIGEN_INITIALIZE_COEFFS bool size_changed = size != this->size(); m_storage.resize(size, dimensions); if(size_changed) EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED #else m_storage.resize(size, dimensions); #endif } // Why this overload, DSizes is derived from array ??? // EIGEN_DEVICE_FUNC void resize(const DSizes& dimensions) { array dims; for (int i = 0; i < NumIndices; ++i) { dims[i] = dimensions[i]; } resize(dims); } EIGEN_DEVICE_FUNC void resize() { EIGEN_STATIC_ASSERT(NumIndices == 0, YOU_MADE_A_PROGRAMMING_MISTAKE); // Nothing to do: rank 0 tensors have fixed size } #ifdef EIGEN_HAS_INDEX_LIST template EIGEN_DEVICE_FUNC void resize(const Eigen::IndexList& dimensions) { array dims; for (int i = 0; i < NumIndices; ++i) { dims[i] = static_cast(dimensions[i]); } resize(dims); } #endif /** Custom Dimension */ #ifdef EIGEN_HAS_SFINAE template::value) ) > EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void resize(CustomDimension& dimensions) { resize(internal::customIndices2Array(dimensions)); } #endif #ifndef EIGEN_EMULATE_CXX11_META_H template EIGEN_DEVICE_FUNC void resize(const Sizes& dimensions) { array dims; for (int i = 0; i < NumIndices; ++i) { dims[i] = static_cast(dimensions[i]); } resize(dims); } #else template EIGEN_DEVICE_FUNC void resize(const Sizes& dimensions) { array dims; for (int i = 0; i < NumIndices; ++i) { dims[i] = static_cast(dimensions[i]); } resize(dims); } #endif protected: bool checkIndexRange(const array& indices) const { using internal::array_apply_and_reduce; using internal::array_zip_and_reduce; using internal::greater_equal_zero_op; using internal::logical_and_op; using internal::lesser_op; return // check whether the indices are all >= 0 array_apply_and_reduce(indices) && // check whether the indices fit in the dimensions array_zip_and_reduce(indices, m_storage.dimensions()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index linearizedIndex(const array& indices) const { if (Options&RowMajor) { return m_storage.dimensions().IndexOfRowMajor(indices); } else { return m_storage.dimensions().IndexOfColMajor(indices); } } }; } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorShuffling.h0000644000176200001440000004352014562417221026313 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_SHUFFLING_H #define EIGEN_CXX11_TENSOR_TENSOR_SHUFFLING_H namespace Eigen { /** \class TensorShuffling * \ingroup CXX11_Tensor_Module * * \brief Tensor shuffling class. * * */ namespace internal { template struct traits > : public traits { typedef typename XprType::Scalar Scalar; typedef traits XprTraits; typedef typename XprTraits::StorageKind StorageKind; typedef typename XprTraits::Index Index; typedef typename XprType::Nested Nested; typedef typename remove_reference::type _Nested; static const int NumDimensions = XprTraits::NumDimensions; static const int Layout = XprTraits::Layout; typedef typename XprTraits::PointerType PointerType; }; template struct eval, Eigen::Dense> { typedef const TensorShufflingOp& type; }; template struct nested, 1, typename eval >::type> { typedef TensorShufflingOp type; }; } // end namespace internal template class TensorShufflingOp : public TensorBase > { public: typedef TensorBase > Base; typedef typename Eigen::internal::traits::Scalar Scalar; typedef typename Eigen::NumTraits::Real RealScalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename Eigen::internal::nested::type Nested; typedef typename Eigen::internal::traits::StorageKind StorageKind; typedef typename Eigen::internal::traits::Index Index; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorShufflingOp(const XprType& expr, const Shuffle& shfl) : m_xpr(expr), m_shuffle(shfl) {} EIGEN_DEVICE_FUNC const Shuffle& shufflePermutation() const { return m_shuffle; } EIGEN_DEVICE_FUNC const typename internal::remove_all::type& expression() const { return m_xpr; } EIGEN_TENSOR_INHERIT_ASSIGNMENT_OPERATORS(TensorShufflingOp) protected: typename XprType::Nested m_xpr; const Shuffle m_shuffle; }; // Eval as rvalue template struct TensorEvaluator, Device> { typedef TensorEvaluator, Device> Self; typedef TensorShufflingOp XprType; typedef typename XprType::Index Index; static const int NumDims = internal::array_size::Dimensions>::value; typedef DSizes Dimensions; typedef typename XprType::Scalar Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename PacketType::type PacketReturnType; static const int PacketSize = PacketType::size; typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; enum { IsAligned = false, PacketAccess = (PacketType::size > 1), BlockAccess = TensorEvaluator::RawAccess, PreferBlockAccess = true, Layout = TensorEvaluator::Layout, CoordAccess = false, // to be implemented RawAccess = false }; typedef typename internal::remove_const::type ScalarNoConst; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockDescriptor TensorBlockDesc; typedef internal::TensorBlockScratchAllocator TensorBlockScratch; typedef typename internal::TensorMaterializedBlock TensorBlock; //===--------------------------------------------------------------------===// EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device) : m_device(device), m_impl(op.expression(), device) { const typename TensorEvaluator::Dimensions& input_dims = m_impl.dimensions(); const Shuffle& shuffle = op.shufflePermutation(); m_is_identity = true; for (int i = 0; i < NumDims; ++i) { m_shuffle[i] = static_cast(shuffle[i]); m_dimensions[i] = input_dims[shuffle[i]]; m_inverseShuffle[shuffle[i]] = i; if (m_is_identity && shuffle[i] != i) { m_is_identity = false; } } if (static_cast(Layout) == static_cast(ColMajor)) { m_unshuffledInputStrides[0] = 1; m_outputStrides[0] = 1; for (int i = 1; i < NumDims; ++i) { m_unshuffledInputStrides[i] = m_unshuffledInputStrides[i - 1] * input_dims[i - 1]; m_outputStrides[i] = m_outputStrides[i - 1] * m_dimensions[i - 1]; m_fastOutputStrides[i] = internal::TensorIntDivisor( m_outputStrides[i] > 0 ? m_outputStrides[i] : Index(1)); } } else { m_unshuffledInputStrides[NumDims - 1] = 1; m_outputStrides[NumDims - 1] = 1; for (int i = NumDims - 2; i >= 0; --i) { m_unshuffledInputStrides[i] = m_unshuffledInputStrides[i + 1] * input_dims[i + 1]; m_outputStrides[i] = m_outputStrides[i + 1] * m_dimensions[i + 1]; m_fastOutputStrides[i] = internal::TensorIntDivisor( m_outputStrides[i] > 0 ? m_outputStrides[i] : Index(1)); } } for (int i = 0; i < NumDims; ++i) { m_inputStrides[i] = m_unshuffledInputStrides[shuffle[i]]; } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; } EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(EvaluatorPointerType /*data*/) { m_impl.evalSubExprsIfNeeded(NULL); return true; } #ifdef EIGEN_USE_THREADS template EIGEN_STRONG_INLINE void evalSubExprsIfNeededAsync( EvaluatorPointerType, EvalSubExprsCallback done) { m_impl.evalSubExprsIfNeededAsync(nullptr, [done](bool) { done(true); }); } #endif // EIGEN_USE_THREADS EIGEN_STRONG_INLINE void cleanup() { m_impl.cleanup(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { if (m_is_identity) { return m_impl.coeff(index); } else { return m_impl.coeff(srcCoeff(index)); } } template struct PacketLoader { EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE static PacketReturnType Run(const Self& self, Index index) { EIGEN_ALIGN_MAX typename internal::remove_const::type values[PacketSize]; EIGEN_UNROLL_LOOP for (int i = 0; i < PacketSize; ++i) { values[i] = self.coeff(index + i); } PacketReturnType rslt = internal::pload(values); return rslt; } }; template struct PacketLoader { EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE static PacketReturnType Run(const Self& self, Index index) { if (self.m_is_identity) { return self.m_impl.template packet(index); } else { EIGEN_ALIGN_MAX typename internal::remove_const::type values[PacketSize]; EIGEN_UNROLL_LOOP for (int i = 0; i < PacketSize; ++i) { values[i] = self.coeff(index + i); } PacketReturnType rslt = internal::pload(values); return rslt; } } }; template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const { EIGEN_STATIC_ASSERT((PacketSize > 1), YOU_MADE_A_PROGRAMMING_MISTAKE) eigen_assert(index + PacketSize - 1 < dimensions().TotalSize()); return PacketLoader::PacketAccess>::Run(*this, index); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE internal::TensorBlockResourceRequirements getResourceRequirements() const { static const int inner_dim = Layout == static_cast(ColMajor) ? 0 : NumDims - 1; const size_t target_size = m_device.firstLevelCacheSize(); const bool inner_dim_shuffled = m_shuffle[inner_dim] != inner_dim; // Shuffled inner dimensions leads to a random memory access, which is not // captured by default cost model bytes loaded/stored. We add this cost // explicitly. The number of cycles picked based on the benchmarks. // TODO(ezhulenev): This number was picked based on a very questionable // benchmarks, add benchmarks that are representative of real workloads. using BlockRequirements = internal::TensorBlockResourceRequirements; if (inner_dim_shuffled) { return BlockRequirements::uniform(target_size) .addCostPerCoeff({0, 0, NumDims * 28}); } else { return BlockRequirements::skewed(target_size); } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorBlock block(TensorBlockDesc& desc, TensorBlockScratch& scratch, bool root_of_expr_ast = false) const { assert(m_impl.data() != NULL); typedef internal::TensorBlockIO TensorBlockIO; typedef typename TensorBlockIO::Dst TensorBlockIODst; typedef typename TensorBlockIO::Src TensorBlockIOSrc; const typename TensorBlock::Storage block_storage = TensorBlock::prepareStorage( desc, scratch, /*allow_strided_storage=*/root_of_expr_ast); typename TensorBlockIO::Dimensions input_strides(m_unshuffledInputStrides); TensorBlockIOSrc src(input_strides, m_impl.data(), srcCoeff(desc.offset())); TensorBlockIODst dst(block_storage.dimensions(), block_storage.strides(), block_storage.data()); typename TensorBlockIO::DimensionsMap dst_to_src_dim_map(m_shuffle); TensorBlockIO::Copy(dst, src, dst_to_src_dim_map); return block_storage.AsTensorMaterializedBlock(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost costPerCoeff(bool vectorized) const { const double compute_cost = m_is_identity ? TensorOpCost::AddCost() : NumDims * (2 * TensorOpCost::AddCost() + 2 * TensorOpCost::MulCost() + TensorOpCost::DivCost()); return m_impl.costPerCoeff(vectorized) + TensorOpCost(0, 0, compute_cost, m_is_identity /* vectorized */, PacketSize); } EIGEN_DEVICE_FUNC typename Storage::Type data() const { return NULL; } #ifdef EIGEN_USE_SYCL // binding placeholder accessors to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler &cgh) const { m_impl.bind(cgh); } #endif protected: EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index GetBlockOutputIndex( Index input_index, const DSizes& input_block_strides, const DSizes& output_block_strides, const DSizes, NumDims>& fast_input_block_strides) const { Index output_index = 0; if (static_cast(Layout) == static_cast(ColMajor)) { for (int i = NumDims - 1; i > 0; --i) { const Index idx = input_index / fast_input_block_strides[i]; output_index += idx * output_block_strides[m_inverseShuffle[i]]; input_index -= idx * input_block_strides[i]; } return output_index + input_index * output_block_strides[m_inverseShuffle[0]]; } else { for (int i = 0; i < NumDims - 1; ++i) { const Index idx = input_index / fast_input_block_strides[i]; output_index += idx * output_block_strides[m_inverseShuffle[i]]; input_index -= idx * input_block_strides[i]; } return output_index + input_index * output_block_strides[m_inverseShuffle[NumDims - 1]]; } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index srcCoeff(Index index) const { Index inputIndex = 0; if (static_cast(Layout) == static_cast(ColMajor)) { for (int i = NumDims - 1; i > 0; --i) { const Index idx = index / m_fastOutputStrides[i]; inputIndex += idx * m_inputStrides[i]; index -= idx * m_outputStrides[i]; } return inputIndex + index * m_inputStrides[0]; } else { for (int i = 0; i < NumDims - 1; ++i) { const Index idx = index / m_fastOutputStrides[i]; inputIndex += idx * m_inputStrides[i]; index -= idx * m_outputStrides[i]; } return inputIndex + index * m_inputStrides[NumDims - 1]; } } Dimensions m_dimensions; bool m_is_identity; array m_shuffle; array m_inverseShuffle; // TODO(ezhulenev): Make it int type. array m_outputStrides; array, NumDims> m_fastOutputStrides; array m_inputStrides; array m_unshuffledInputStrides; const Device EIGEN_DEVICE_REF m_device; TensorEvaluator m_impl; }; // Eval as lvalue template struct TensorEvaluator, Device> : public TensorEvaluator, Device> { typedef TensorEvaluator, Device> Base; typedef TensorShufflingOp XprType; typedef typename XprType::Index Index; static const int NumDims = internal::array_size::Dimensions>::value; typedef DSizes Dimensions; typedef typename XprType::Scalar Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename PacketType::type PacketReturnType; static const int PacketSize = PacketType::size; enum { IsAligned = false, PacketAccess = (PacketType::size > 1), BlockAccess = TensorEvaluator::RawAccess, PreferBlockAccess = true, Layout = TensorEvaluator::Layout, RawAccess = false }; typedef typename internal::remove_const::type ScalarNoConst; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockDescriptor TensorBlockDesc; //===--------------------------------------------------------------------===// EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device) : Base(op, device) { } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType& coeffRef(Index index) { return this->m_impl.coeffRef(this->srcCoeff(index)); } template EIGEN_STRONG_INLINE void writePacket(Index index, const PacketReturnType& x) { EIGEN_STATIC_ASSERT((PacketSize > 1), YOU_MADE_A_PROGRAMMING_MISTAKE) EIGEN_ALIGN_MAX typename internal::remove_const::type values[PacketSize]; internal::pstore(values, x); EIGEN_UNROLL_LOOP for (int i = 0; i < PacketSize; ++i) { this->coeffRef(index+i) = values[i]; } } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void writeBlock( const TensorBlockDesc& desc, const TensorBlock& block) { eigen_assert(this->m_impl.data() != NULL); typedef internal::TensorBlockIO TensorBlockIO; typedef typename TensorBlockIO::Dst TensorBlockIODst; typedef typename TensorBlockIO::Src TensorBlockIOSrc; const Scalar* block_buffer = block.data(); // TODO(ezhulenev): TensorBlockIO should be able to read from any Eigen // expression with coefficient and packet access as `src`. void* mem = NULL; if (block_buffer == NULL) { mem = this->m_device.allocate(desc.size() * sizeof(Scalar)); ScalarNoConst* buf = static_cast(mem); typedef internal::TensorBlockAssignment< ScalarNoConst, NumDims, typename TensorBlock::XprType, Index> TensorBlockAssignment; TensorBlockAssignment::Run( TensorBlockAssignment::target( desc.dimensions(), internal::strides(desc.dimensions()), buf), block.expr()); block_buffer = buf; } // Read from block. TensorBlockIOSrc src(internal::strides(desc.dimensions()), block_buffer); // Write to the output buffer. typename TensorBlockIO::Dimensions output_strides( this->m_unshuffledInputStrides); typename TensorBlockIO::Dimensions output_dimensions; for (int i = 0; i < NumDims; ++i) { output_dimensions[this->m_shuffle[i]] = desc.dimension(i); } TensorBlockIODst dst(output_dimensions, output_strides, this->m_impl.data(), this->srcCoeff(desc.offset())); // Reorder dimensions according to the shuffle. typename TensorBlockIO::DimensionsMap dst_to_src_dim_map; for (int i = 0; i < NumDims; ++i) { dst_to_src_dim_map[i] = static_cast(this->m_inverseShuffle[i]); } TensorBlockIO::Copy(dst, src, dst_to_src_dim_map); // Deallocate temporary buffer used for the block materialization. if (mem != NULL) this->m_device.deallocate(mem); } }; } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_SHUFFLING_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorStriding.h0000644000176200001440000003231114562417221026145 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_STRIDING_H #define EIGEN_CXX11_TENSOR_TENSOR_STRIDING_H namespace Eigen { /** \class TensorStriding * \ingroup CXX11_Tensor_Module * * \brief Tensor striding class. * * */ namespace internal { template struct traits > : public traits { typedef typename XprType::Scalar Scalar; typedef traits XprTraits; typedef typename XprTraits::StorageKind StorageKind; typedef typename XprTraits::Index Index; typedef typename XprType::Nested Nested; typedef typename remove_reference::type _Nested; static const int NumDimensions = XprTraits::NumDimensions; static const int Layout = XprTraits::Layout; typedef typename XprTraits::PointerType PointerType; }; template struct eval, Eigen::Dense> { typedef const TensorStridingOpEIGEN_DEVICE_REF type; }; template struct nested, 1, typename eval >::type> { typedef TensorStridingOp type; }; } // end namespace internal template class TensorStridingOp : public TensorBase > { public: typedef TensorBase > Base; typedef typename Eigen::internal::traits::Scalar Scalar; typedef typename Eigen::NumTraits::Real RealScalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename Eigen::internal::nested::type Nested; typedef typename Eigen::internal::traits::StorageKind StorageKind; typedef typename Eigen::internal::traits::Index Index; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorStridingOp(const XprType& expr, const Strides& dims) : m_xpr(expr), m_dims(dims) {} EIGEN_DEVICE_FUNC const Strides& strides() const { return m_dims; } EIGEN_DEVICE_FUNC const typename internal::remove_all::type& expression() const { return m_xpr; } EIGEN_TENSOR_INHERIT_ASSIGNMENT_OPERATORS(TensorStridingOp) protected: typename XprType::Nested m_xpr; const Strides m_dims; }; // Eval as rvalue template struct TensorEvaluator, Device> { typedef TensorStridingOp XprType; typedef typename XprType::Index Index; static const int NumDims = internal::array_size::Dimensions>::value; typedef DSizes Dimensions; typedef typename XprType::Scalar Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename PacketType::type PacketReturnType; static const int PacketSize = PacketType::size; typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; enum { IsAligned = /*TensorEvaluator::IsAligned*/false, PacketAccess = TensorEvaluator::PacketAccess, BlockAccess = false, PreferBlockAccess = TensorEvaluator::PreferBlockAccess, Layout = TensorEvaluator::Layout, CoordAccess = false, // to be implemented RawAccess = false }; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockNotImplemented TensorBlock; //===--------------------------------------------------------------------===// EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device) : m_impl(op.expression(), device) { m_dimensions = m_impl.dimensions(); for (int i = 0; i < NumDims; ++i) { m_dimensions[i] =Eigen::numext::ceil(static_cast(m_dimensions[i]) / op.strides()[i]); } const typename TensorEvaluator::Dimensions& input_dims = m_impl.dimensions(); if (static_cast(Layout) == static_cast(ColMajor)) { m_outputStrides[0] = 1; m_inputStrides[0] = 1; for (int i = 1; i < NumDims; ++i) { m_outputStrides[i] = m_outputStrides[i-1] * m_dimensions[i-1]; m_inputStrides[i] = m_inputStrides[i-1] * input_dims[i-1]; m_inputStrides[i-1] *= op.strides()[i-1]; } m_inputStrides[NumDims-1] *= op.strides()[NumDims-1]; } else { // RowMajor m_outputStrides[NumDims-1] = 1; m_inputStrides[NumDims-1] = 1; for (int i = NumDims - 2; i >= 0; --i) { m_outputStrides[i] = m_outputStrides[i+1] * m_dimensions[i+1]; m_inputStrides[i] = m_inputStrides[i+1] * input_dims[i+1]; m_inputStrides[i+1] *= op.strides()[i+1]; } m_inputStrides[0] *= op.strides()[0]; } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; } EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(EvaluatorPointerType/*data*/) { m_impl.evalSubExprsIfNeeded(NULL); return true; } EIGEN_STRONG_INLINE void cleanup() { m_impl.cleanup(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { return m_impl.coeff(srcCoeff(index)); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const { EIGEN_STATIC_ASSERT((PacketSize > 1), YOU_MADE_A_PROGRAMMING_MISTAKE) eigen_assert(index+PacketSize-1 < dimensions().TotalSize()); Index inputIndices[] = {0, 0}; Index indices[] = {index, index + PacketSize - 1}; if (static_cast(Layout) == static_cast(ColMajor)) { EIGEN_UNROLL_LOOP for (int i = NumDims - 1; i > 0; --i) { const Index idx0 = indices[0] / m_outputStrides[i]; const Index idx1 = indices[1] / m_outputStrides[i]; inputIndices[0] += idx0 * m_inputStrides[i]; inputIndices[1] += idx1 * m_inputStrides[i]; indices[0] -= idx0 * m_outputStrides[i]; indices[1] -= idx1 * m_outputStrides[i]; } inputIndices[0] += indices[0] * m_inputStrides[0]; inputIndices[1] += indices[1] * m_inputStrides[0]; } else { // RowMajor EIGEN_UNROLL_LOOP for (int i = 0; i < NumDims - 1; ++i) { const Index idx0 = indices[0] / m_outputStrides[i]; const Index idx1 = indices[1] / m_outputStrides[i]; inputIndices[0] += idx0 * m_inputStrides[i]; inputIndices[1] += idx1 * m_inputStrides[i]; indices[0] -= idx0 * m_outputStrides[i]; indices[1] -= idx1 * m_outputStrides[i]; } inputIndices[0] += indices[0] * m_inputStrides[NumDims-1]; inputIndices[1] += indices[1] * m_inputStrides[NumDims-1]; } if (inputIndices[1] - inputIndices[0] == PacketSize - 1) { PacketReturnType rslt = m_impl.template packet(inputIndices[0]); return rslt; } else { EIGEN_ALIGN_MAX typename internal::remove_const::type values[PacketSize]; values[0] = m_impl.coeff(inputIndices[0]); values[PacketSize-1] = m_impl.coeff(inputIndices[1]); EIGEN_UNROLL_LOOP for (int i = 1; i < PacketSize-1; ++i) { values[i] = coeff(index+i); } PacketReturnType rslt = internal::pload(values); return rslt; } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost costPerCoeff(bool vectorized) const { double compute_cost = (NumDims - 1) * (TensorOpCost::AddCost() + TensorOpCost::MulCost() + TensorOpCost::DivCost()) + TensorOpCost::MulCost(); if (vectorized) { compute_cost *= 2; // packet() computes two indices } const int innerDim = (static_cast(Layout) == static_cast(ColMajor)) ? 0 : (NumDims - 1); return m_impl.costPerCoeff(vectorized && m_inputStrides[innerDim] == 1) + // Computation is not vectorized per se, but it is done once per packet. TensorOpCost(0, 0, compute_cost, vectorized, PacketSize); } EIGEN_DEVICE_FUNC typename Storage::Type data() const { return NULL; } #ifdef EIGEN_USE_SYCL // binding placeholder accessors to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler &cgh) const { m_impl.bind(cgh); } #endif protected: EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index srcCoeff(Index index) const { Index inputIndex = 0; if (static_cast(Layout) == static_cast(ColMajor)) { EIGEN_UNROLL_LOOP for (int i = NumDims - 1; i > 0; --i) { const Index idx = index / m_outputStrides[i]; inputIndex += idx * m_inputStrides[i]; index -= idx * m_outputStrides[i]; } inputIndex += index * m_inputStrides[0]; } else { // RowMajor EIGEN_UNROLL_LOOP for (int i = 0; i < NumDims - 1; ++i) { const Index idx = index / m_outputStrides[i]; inputIndex += idx * m_inputStrides[i]; index -= idx * m_outputStrides[i]; } inputIndex += index * m_inputStrides[NumDims-1]; } return inputIndex; } Dimensions m_dimensions; array m_outputStrides; array m_inputStrides; TensorEvaluator m_impl; }; // Eval as lvalue template struct TensorEvaluator, Device> : public TensorEvaluator, Device> { typedef TensorStridingOp XprType; typedef TensorEvaluator Base; // typedef typename XprType::Index Index; static const int NumDims = internal::array_size::Dimensions>::value; // typedef DSizes Dimensions; enum { IsAligned = /*TensorEvaluator::IsAligned*/false, PacketAccess = TensorEvaluator::PacketAccess, PreferBlockAccess = false, Layout = TensorEvaluator::Layout, CoordAccess = false, // to be implemented RawAccess = false }; EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device) : Base(op, device) { } typedef typename XprType::Index Index; typedef typename XprType::Scalar Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename PacketType::type PacketReturnType; static const int PacketSize = PacketType::size; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index index) { return this->m_impl.coeffRef(this->srcCoeff(index)); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void writePacket(Index index, const PacketReturnType& x) { EIGEN_STATIC_ASSERT((PacketSize > 1), YOU_MADE_A_PROGRAMMING_MISTAKE) eigen_assert(index+PacketSize-1 < this->dimensions().TotalSize()); Index inputIndices[] = {0, 0}; Index indices[] = {index, index + PacketSize - 1}; if (static_cast(Layout) == static_cast(ColMajor)) { EIGEN_UNROLL_LOOP for (int i = NumDims - 1; i > 0; --i) { const Index idx0 = indices[0] / this->m_outputStrides[i]; const Index idx1 = indices[1] / this->m_outputStrides[i]; inputIndices[0] += idx0 * this->m_inputStrides[i]; inputIndices[1] += idx1 * this->m_inputStrides[i]; indices[0] -= idx0 * this->m_outputStrides[i]; indices[1] -= idx1 * this->m_outputStrides[i]; } inputIndices[0] += indices[0] * this->m_inputStrides[0]; inputIndices[1] += indices[1] * this->m_inputStrides[0]; } else { // RowMajor EIGEN_UNROLL_LOOP for (int i = 0; i < NumDims - 1; ++i) { const Index idx0 = indices[0] / this->m_outputStrides[i]; const Index idx1 = indices[1] / this->m_outputStrides[i]; inputIndices[0] += idx0 * this->m_inputStrides[i]; inputIndices[1] += idx1 * this->m_inputStrides[i]; indices[0] -= idx0 * this->m_outputStrides[i]; indices[1] -= idx1 * this->m_outputStrides[i]; } inputIndices[0] += indices[0] * this->m_inputStrides[NumDims-1]; inputIndices[1] += indices[1] * this->m_inputStrides[NumDims-1]; } if (inputIndices[1] - inputIndices[0] == PacketSize - 1) { this->m_impl.template writePacket(inputIndices[0], x); } else { EIGEN_ALIGN_MAX Scalar values[PacketSize]; internal::pstore(values, x); this->m_impl.coeffRef(inputIndices[0]) = values[0]; this->m_impl.coeffRef(inputIndices[1]) = values[PacketSize-1]; EIGEN_UNROLL_LOOP for (int i = 1; i < PacketSize-1; ++i) { this->coeffRef(index+i) = values[i]; } } } }; } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_STRIDING_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorGenerator.h0000644000176200001440000002525014562417221026314 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2015 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_GENERATOR_H #define EIGEN_CXX11_TENSOR_TENSOR_GENERATOR_H namespace Eigen { /** \class TensorGeneratorOp * \ingroup CXX11_Tensor_Module * * \brief Tensor generator class. * * */ namespace internal { template struct traits > : public traits { typedef typename XprType::Scalar Scalar; typedef traits XprTraits; typedef typename XprTraits::StorageKind StorageKind; typedef typename XprTraits::Index Index; typedef typename XprType::Nested Nested; typedef typename remove_reference::type _Nested; static const int NumDimensions = XprTraits::NumDimensions; static const int Layout = XprTraits::Layout; typedef typename XprTraits::PointerType PointerType; }; template struct eval, Eigen::Dense> { typedef const TensorGeneratorOp& type; }; template struct nested, 1, typename eval >::type> { typedef TensorGeneratorOp type; }; } // end namespace internal template class TensorGeneratorOp : public TensorBase, ReadOnlyAccessors> { public: typedef typename Eigen::internal::traits::Scalar Scalar; typedef typename Eigen::NumTraits::Real RealScalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename Eigen::internal::nested::type Nested; typedef typename Eigen::internal::traits::StorageKind StorageKind; typedef typename Eigen::internal::traits::Index Index; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorGeneratorOp(const XprType& expr, const Generator& generator) : m_xpr(expr), m_generator(generator) {} EIGEN_DEVICE_FUNC const Generator& generator() const { return m_generator; } EIGEN_DEVICE_FUNC const typename internal::remove_all::type& expression() const { return m_xpr; } protected: typename XprType::Nested m_xpr; const Generator m_generator; }; // Eval as rvalue template struct TensorEvaluator, Device> { typedef TensorGeneratorOp XprType; typedef typename XprType::Index Index; typedef typename TensorEvaluator::Dimensions Dimensions; static const int NumDims = internal::array_size::value; typedef typename XprType::Scalar Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename PacketType::type PacketReturnType; typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; enum { IsAligned = false, PacketAccess = (PacketType::size > 1), BlockAccess = true, PreferBlockAccess = true, Layout = TensorEvaluator::Layout, CoordAccess = false, // to be implemented RawAccess = false }; typedef internal::TensorIntDivisor IndexDivisor; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockDescriptor TensorBlockDesc; typedef internal::TensorBlockScratchAllocator TensorBlockScratch; typedef typename internal::TensorMaterializedBlock TensorBlock; //===--------------------------------------------------------------------===// EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device) : m_device(device), m_generator(op.generator()) { TensorEvaluator argImpl(op.expression(), device); m_dimensions = argImpl.dimensions(); if (static_cast(Layout) == static_cast(ColMajor)) { m_strides[0] = 1; EIGEN_UNROLL_LOOP for (int i = 1; i < NumDims; ++i) { m_strides[i] = m_strides[i - 1] * m_dimensions[i - 1]; if (m_strides[i] != 0) m_fast_strides[i] = IndexDivisor(m_strides[i]); } } else { m_strides[NumDims - 1] = 1; EIGEN_UNROLL_LOOP for (int i = NumDims - 2; i >= 0; --i) { m_strides[i] = m_strides[i + 1] * m_dimensions[i + 1]; if (m_strides[i] != 0) m_fast_strides[i] = IndexDivisor(m_strides[i]); } } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; } EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(EvaluatorPointerType /*data*/) { return true; } EIGEN_STRONG_INLINE void cleanup() { } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { array coords; extract_coordinates(index, coords); return m_generator(coords); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const { const int packetSize = PacketType::size; EIGEN_STATIC_ASSERT((packetSize > 1), YOU_MADE_A_PROGRAMMING_MISTAKE) eigen_assert(index+packetSize-1 < dimensions().TotalSize()); EIGEN_ALIGN_MAX typename internal::remove_const::type values[packetSize]; for (int i = 0; i < packetSize; ++i) { values[i] = coeff(index+i); } PacketReturnType rslt = internal::pload(values); return rslt; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE internal::TensorBlockResourceRequirements getResourceRequirements() const { const size_t target_size = m_device.firstLevelCacheSize(); // TODO(ezhulenev): Generator should have a cost. return internal::TensorBlockResourceRequirements::skewed( target_size); } struct BlockIteratorState { Index stride; Index span; Index size; Index count; }; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorBlock block(TensorBlockDesc& desc, TensorBlockScratch& scratch, bool /*root_of_expr_ast*/ = false) const { static const bool is_col_major = static_cast(Layout) == static_cast(ColMajor); // Compute spatial coordinates for the first block element. array coords; extract_coordinates(desc.offset(), coords); array initial_coords = coords; // Offset in the output block buffer. Index offset = 0; // Initialize output block iterator state. Dimension in this array are // always in inner_most -> outer_most order (col major layout). array it; for (int i = 0; i < NumDims; ++i) { const int dim = is_col_major ? i : NumDims - 1 - i; it[i].size = desc.dimension(dim); it[i].stride = i == 0 ? 1 : (it[i - 1].size * it[i - 1].stride); it[i].span = it[i].stride * (it[i].size - 1); it[i].count = 0; } eigen_assert(it[0].stride == 1); // Prepare storage for the materialized generator result. const typename TensorBlock::Storage block_storage = TensorBlock::prepareStorage(desc, scratch); CoeffReturnType* block_buffer = block_storage.data(); static const int packet_size = PacketType::size; static const int inner_dim = is_col_major ? 0 : NumDims - 1; const Index inner_dim_size = it[0].size; const Index inner_dim_vectorized = inner_dim_size - packet_size; while (it[NumDims - 1].count < it[NumDims - 1].size) { Index i = 0; // Generate data for the vectorized part of the inner-most dimension. for (; i <= inner_dim_vectorized; i += packet_size) { for (Index j = 0; j < packet_size; ++j) { array j_coords = coords; // Break loop dependence. j_coords[inner_dim] += j; *(block_buffer + offset + i + j) = m_generator(j_coords); } coords[inner_dim] += packet_size; } // Finalize non-vectorized part of the inner-most dimension. for (; i < inner_dim_size; ++i) { *(block_buffer + offset + i) = m_generator(coords); coords[inner_dim]++; } coords[inner_dim] = initial_coords[inner_dim]; // For the 1d tensor we need to generate only one inner-most dimension. if (NumDims == 1) break; // Update offset. for (i = 1; i < NumDims; ++i) { if (++it[i].count < it[i].size) { offset += it[i].stride; coords[is_col_major ? i : NumDims - 1 - i]++; break; } if (i != NumDims - 1) it[i].count = 0; coords[is_col_major ? i : NumDims - 1 - i] = initial_coords[is_col_major ? i : NumDims - 1 - i]; offset -= it[i].span; } } return block_storage.AsTensorMaterializedBlock(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost costPerCoeff(bool) const { // TODO(rmlarsen): This is just a placeholder. Define interface to make // generators return their cost. return TensorOpCost(0, 0, TensorOpCost::AddCost() + TensorOpCost::MulCost()); } EIGEN_DEVICE_FUNC EvaluatorPointerType data() const { return NULL; } #ifdef EIGEN_USE_SYCL // binding placeholder accessors to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler&) const {} #endif protected: EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void extract_coordinates(Index index, array& coords) const { if (static_cast(Layout) == static_cast(ColMajor)) { for (int i = NumDims - 1; i > 0; --i) { const Index idx = index / m_fast_strides[i]; index -= idx * m_strides[i]; coords[i] = idx; } coords[0] = index; } else { for (int i = 0; i < NumDims - 1; ++i) { const Index idx = index / m_fast_strides[i]; index -= idx * m_strides[i]; coords[i] = idx; } coords[NumDims-1] = index; } } const Device EIGEN_DEVICE_REF m_device; Dimensions m_dimensions; array m_strides; array m_fast_strides; Generator m_generator; }; } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_GENERATOR_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorGlobalFunctions.h0000644000176200001440000000244414107270226027454 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2016 Eugene Brevdo // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_GLOBAL_FUNCTIONS_H #define EIGEN_CXX11_TENSOR_TENSOR_GLOBAL_FUNCTIONS_H namespace Eigen { /** \cpp11 \returns an expression of the coefficient-wise betainc(\a x, \a a, \a b) to the given tensors. * * This function computes the regularized incomplete beta function (integral). * */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseTernaryOp, const ADerived, const BDerived, const XDerived> betainc(const ADerived& a, const BDerived& b, const XDerived& x) { return TensorCwiseTernaryOp< internal::scalar_betainc_op, const ADerived, const BDerived, const XDerived>( a, b, x, internal::scalar_betainc_op()); } } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_GLOBAL_FUNCTIONS_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorMeta.h0000644000176200001440000001765014562417221025261 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2015 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_META_H #define EIGEN_CXX11_TENSOR_TENSOR_META_H namespace Eigen { template struct Cond {}; template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE const T1& choose(Cond, const T1& first, const T2&) { return first; } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE const T2& choose(Cond, const T1&, const T2& second) { return second; } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE T divup(const X x, const Y y) { return static_cast((x + y - 1) / y); } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE T divup(const T x, const T y) { return static_cast((x + y - 1) / y); } template struct max_n_1 { static const size_t size = n; }; template <> struct max_n_1<0> { static const size_t size = 1; }; // Default packet types template struct PacketType : internal::packet_traits { typedef typename internal::packet_traits::type type; }; // For CUDA packet types when using a GpuDevice #if defined(EIGEN_USE_GPU) && defined(EIGEN_HAS_GPU_FP16) typedef ulonglong2 Packet4h2; template<> struct PacketType { typedef Packet4h2 type; static const int size = 8; enum { HasAdd = 1, HasSub = 1, HasMul = 1, HasNegate = 1, HasAbs = 1, HasArg = 0, HasAbs2 = 0, HasMin = 1, HasMax = 1, HasConj = 0, HasSetLinear = 0, HasBlend = 0, HasDiv = 1, HasSqrt = 1, HasRsqrt = 1, HasExp = 1, HasExpm1 = 0, HasLog = 1, HasLog1p = 0, HasLog10 = 0, HasPow = 1, }; }; #endif #if defined(EIGEN_USE_SYCL) namespace TensorSycl { namespace internal { template struct PlusOp { static constexpr Index Value = A + B; }; template struct DivOp { static constexpr Index Value = A / B; }; template class StepOp> struct static_for { template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void loop(UnaryOperator op) { op(start); static_for::Value, end, step, StepOp>::loop(op); } }; template class StepOp> struct static_for { template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void loop(UnaryOperator) {} }; template struct Vectorise { static const int PacketSize = 1; typedef OutScalar PacketReturnType; }; template struct Vectorise { static const int PacketSize = Eigen::PacketType::size; typedef typename Eigen::PacketType::type PacketReturnType; }; static EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE Index roundUp(Index x, Index y) { return ((((x) + (y)-1) / (y)) * (y)); } } // namespace internal } // namespace TensorSycl template <> struct PacketType { typedef half type; static const int size = 1; enum { HasAdd = 0, HasSub = 0, HasMul = 0, HasNegate = 0, HasAbs = 0, HasArg = 0, HasAbs2 = 0, HasMin = 0, HasMax = 0, HasConj = 0, HasSetLinear = 0, HasBlend = 0 }; }; template struct PacketType : internal::default_packet_traits { typedef Scalar type; typedef Scalar half; enum { Vectorizable = 0, size = 1, AlignedOnScalar = 0, HasHalfPacket = 0 }; enum { HasAdd = 0, HasSub = 0, HasMul = 0, HasNegate = 0, HasAbs = 0, HasAbs2 = 0, HasMin = 0, HasMax = 0, HasConj = 0, HasSetLinear = 0 }; }; template struct PacketType : PacketType{}; #ifndef EIGEN_DONT_VECTORIZE_SYCL #define PACKET_TYPE(CVQual, Type, val, lengths, DEV)\ template<> struct PacketType : internal::sycl_packet_traits \ {\ typedef typename internal::packet_traits::type type;\ typedef typename internal::packet_traits::half half;\ }; PACKET_TYPE(const, float, 1, 4, SyclDevice) PACKET_TYPE(, float, 1, 4, SyclDevice) PACKET_TYPE(const, float, 1, 4, const SyclDevice) PACKET_TYPE(, float, 1, 4, const SyclDevice) PACKET_TYPE(const, double, 0, 2, SyclDevice) PACKET_TYPE(, double, 0, 2, SyclDevice) PACKET_TYPE(const, double, 0, 2, const SyclDevice) PACKET_TYPE(, double, 0, 2, const SyclDevice) #undef PACKET_TYPE template<> struct PacketType: PacketType{}; template<> struct PacketType: PacketType{}; #endif #endif // Tuple mimics std::pair but works on e.g. nvcc. template struct Tuple { public: U first; V second; typedef U first_type; typedef V second_type; EIGEN_CONSTEXPR EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Tuple() : first(), second() {} EIGEN_CONSTEXPR EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Tuple(const U& f, const V& s) : first(f), second(s) {} EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void swap(Tuple& rhs) { using numext::swap; swap(first, rhs.first); swap(second, rhs.second); } }; template EIGEN_CONSTEXPR EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool operator==(const Tuple& x, const Tuple& y) { return (x.first == y.first && x.second == y.second); } template EIGEN_CONSTEXPR EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool operator!=(const Tuple& x, const Tuple& y) { return !(x == y); } // Can't use std::pairs on cuda devices template struct IndexPair { EIGEN_CONSTEXPR EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE IndexPair() : first(0), second(0) {} EIGEN_CONSTEXPR EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE IndexPair(Idx f, Idx s) : first(f), second(s) {} EIGEN_DEVICE_FUNC void set(IndexPair val) { first = val.first; second = val.second; } Idx first; Idx second; }; #ifdef EIGEN_HAS_SFINAE namespace internal { template EIGEN_CONSTEXPR EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE array customIndices2Array(IndexType& idx, numeric_list) { return { idx[Is]... }; } template EIGEN_CONSTEXPR EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE array customIndices2Array(IndexType&, numeric_list) { return array(); } /** Make an array (for index/dimensions) out of a custom index */ template EIGEN_CONSTEXPR EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE array customIndices2Array(IndexType& idx) { return customIndices2Array(idx, typename gen_numeric_list::type{}); } template struct is_base_of { typedef char (&yes)[1]; typedef char (&no)[2]; template struct Host { operator BB*() const; operator DD*(); }; template static yes check(D*, T); static no check(B*, int); static const bool value = sizeof(check(Host(), int())) == sizeof(yes); }; } #endif } // namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_META_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h0000644000176200001440000002412314562417221025610 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_ASSIGN_H #define EIGEN_CXX11_TENSOR_TENSOR_ASSIGN_H namespace Eigen { /** \class TensorAssign * \ingroup CXX11_Tensor_Module * * \brief The tensor assignment class. * * This class is represents the assignment of the values resulting from the evaluation of * the rhs expression to the memory locations denoted by the lhs expression. */ namespace internal { template struct traits > { typedef typename LhsXprType::Scalar Scalar; typedef typename traits::StorageKind StorageKind; typedef typename promote_index_type::Index, typename traits::Index>::type Index; typedef typename LhsXprType::Nested LhsNested; typedef typename RhsXprType::Nested RhsNested; typedef typename remove_reference::type _LhsNested; typedef typename remove_reference::type _RhsNested; static const std::size_t NumDimensions = internal::traits::NumDimensions; static const int Layout = internal::traits::Layout; typedef typename traits::PointerType PointerType; enum { Flags = 0 }; }; template struct eval, Eigen::Dense> { typedef const TensorAssignOp& type; }; template struct nested, 1, typename eval >::type> { typedef TensorAssignOp type; }; } // end namespace internal template class TensorAssignOp : public TensorBase > { public: typedef typename Eigen::internal::traits::Scalar Scalar; typedef typename Eigen::NumTraits::Real RealScalar; typedef typename LhsXprType::CoeffReturnType CoeffReturnType; typedef typename Eigen::internal::nested::type Nested; typedef typename Eigen::internal::traits::StorageKind StorageKind; typedef typename Eigen::internal::traits::Index Index; static const int NumDims = Eigen::internal::traits::NumDimensions; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorAssignOp(LhsXprType& lhs, const RhsXprType& rhs) : m_lhs_xpr(lhs), m_rhs_xpr(rhs) {} /** \returns the nested expressions */ EIGEN_DEVICE_FUNC typename internal::remove_all::type& lhsExpression() const { return *((typename internal::remove_all::type*)&m_lhs_xpr); } EIGEN_DEVICE_FUNC const typename internal::remove_all::type& rhsExpression() const { return m_rhs_xpr; } protected: typename internal::remove_all::type& m_lhs_xpr; const typename internal::remove_all::type& m_rhs_xpr; }; template struct TensorEvaluator, Device> { typedef TensorAssignOp XprType; typedef typename XprType::Index Index; typedef typename XprType::Scalar Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename PacketType::type PacketReturnType; typedef typename TensorEvaluator::Dimensions Dimensions; typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; static const int PacketSize = PacketType::size; static const int NumDims = XprType::NumDims; enum { IsAligned = int(TensorEvaluator::IsAligned) & int(TensorEvaluator::IsAligned), PacketAccess = int(TensorEvaluator::PacketAccess) & int(TensorEvaluator::PacketAccess), BlockAccess = int(TensorEvaluator::BlockAccess) & int(TensorEvaluator::BlockAccess), PreferBlockAccess = int(TensorEvaluator::PreferBlockAccess) | int(TensorEvaluator::PreferBlockAccess), Layout = TensorEvaluator::Layout, RawAccess = TensorEvaluator::RawAccess }; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockDescriptor TensorBlockDesc; typedef internal::TensorBlockScratchAllocator TensorBlockScratch; typedef typename TensorEvaluator::TensorBlock RightTensorBlock; //===--------------------------------------------------------------------===// TensorEvaluator(const XprType& op, const Device& device) : m_leftImpl(op.lhsExpression(), device), m_rightImpl(op.rhsExpression(), device) { EIGEN_STATIC_ASSERT( (static_cast(TensorEvaluator::Layout) == static_cast(TensorEvaluator::Layout)), YOU_MADE_A_PROGRAMMING_MISTAKE); } EIGEN_DEVICE_FUNC const Dimensions& dimensions() const { // The dimensions of the lhs and the rhs tensors should be equal to prevent // overflows and ensure the result is fully initialized. // TODO: use left impl instead if right impl dimensions are known at compile time. return m_rightImpl.dimensions(); } EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(EvaluatorPointerType) { eigen_assert(dimensions_match(m_leftImpl.dimensions(), m_rightImpl.dimensions())); m_leftImpl.evalSubExprsIfNeeded(NULL); // If the lhs provides raw access to its storage area (i.e. if m_leftImpl.data() returns a non // null value), attempt to evaluate the rhs expression in place. Returns true iff in place // evaluation isn't supported and the caller still needs to manually assign the values generated // by the rhs to the lhs. return m_rightImpl.evalSubExprsIfNeeded(m_leftImpl.data()); } #ifdef EIGEN_USE_THREADS template EIGEN_STRONG_INLINE void evalSubExprsIfNeededAsync( EvaluatorPointerType, EvalSubExprsCallback done) { m_leftImpl.evalSubExprsIfNeededAsync(nullptr, [this, done](bool) { m_rightImpl.evalSubExprsIfNeededAsync( m_leftImpl.data(), [done](bool need_assign) { done(need_assign); }); }); } #endif // EIGEN_USE_THREADS EIGEN_STRONG_INLINE void cleanup() { m_leftImpl.cleanup(); m_rightImpl.cleanup(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalScalar(Index i) { m_leftImpl.coeffRef(i) = m_rightImpl.coeff(i); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalPacket(Index i) { const int LhsStoreMode = TensorEvaluator::IsAligned ? Aligned : Unaligned; const int RhsLoadMode = TensorEvaluator::IsAligned ? Aligned : Unaligned; m_leftImpl.template writePacket(i, m_rightImpl.template packet(i)); } EIGEN_DEVICE_FUNC CoeffReturnType coeff(Index index) const { return m_leftImpl.coeff(index); } template EIGEN_DEVICE_FUNC PacketReturnType packet(Index index) const { return m_leftImpl.template packet(index); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost costPerCoeff(bool vectorized) const { // We assume that evalPacket or evalScalar is called to perform the // assignment and account for the cost of the write here, but reduce left // cost by one load because we are using m_leftImpl.coeffRef. TensorOpCost left = m_leftImpl.costPerCoeff(vectorized); return m_rightImpl.costPerCoeff(vectorized) + TensorOpCost( numext::maxi(0.0, left.bytes_loaded() - sizeof(CoeffReturnType)), left.bytes_stored(), left.compute_cycles()) + TensorOpCost(0, sizeof(CoeffReturnType), 0, vectorized, PacketSize); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE internal::TensorBlockResourceRequirements getResourceRequirements() const { return internal::TensorBlockResourceRequirements::merge( m_leftImpl.getResourceRequirements(), m_rightImpl.getResourceRequirements()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalBlock( TensorBlockDesc& desc, TensorBlockScratch& scratch) { if (TensorEvaluator::RawAccess && m_leftImpl.data() != NULL) { // If destination has raw data access, we pass it as a potential // destination for a block descriptor evaluation. desc.template AddDestinationBuffer( /*dst_base=*/m_leftImpl.data() + desc.offset(), /*dst_strides=*/internal::strides(m_leftImpl.dimensions())); } RightTensorBlock block = m_rightImpl.block(desc, scratch, /*root_of_expr_ast=*/true); // If block was evaluated into a destination, there is no need to do assignment. if (block.kind() != internal::TensorBlockKind::kMaterializedInOutput) { m_leftImpl.writeBlock(desc, block); } block.cleanup(); } #ifdef EIGEN_USE_SYCL // binding placeholder accessors to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler &cgh) const { m_leftImpl.bind(cgh); m_rightImpl.bind(cgh); } #endif EIGEN_DEVICE_FUNC EvaluatorPointerType data() const { return m_leftImpl.data(); } private: TensorEvaluator m_leftImpl; TensorEvaluator m_rightImpl; }; } #endif // EIGEN_CXX11_TENSOR_TENSOR_ASSIGN_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorFixedSize.h0000644000176200001440000003422614562417221026263 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_FIXED_SIZE_H #define EIGEN_CXX11_TENSOR_TENSOR_FIXED_SIZE_H namespace Eigen { /** \class TensorFixedSize * \ingroup CXX11_Tensor_Module * * \brief The fixed sized version of the tensor class. * * The fixed sized equivalent of * Eigen::Tensor t(3, 5, 7); * is * Eigen::TensorFixedSize> t; */ template class TensorFixedSize : public TensorBase > { public: typedef TensorFixedSize Self; typedef TensorBase > Base; typedef typename Eigen::internal::nested::type Nested; typedef typename internal::traits::StorageKind StorageKind; typedef typename internal::traits::Index Index; typedef Scalar_ Scalar; typedef typename NumTraits::Real RealScalar; typedef typename Base::CoeffReturnType CoeffReturnType; static const int Options = Options_; enum { IsAligned = bool(EIGEN_MAX_ALIGN_BYTES>0), PacketAccess = (internal::packet_traits::size > 1), BlockAccess = false, PreferBlockAccess = false, Layout = Options_ & RowMajor ? RowMajor : ColMajor, CoordAccess = true, RawAccess = true }; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockNotImplemented TensorBlock; //===--------------------------------------------------------------------===// typedef Dimensions_ Dimensions; static const std::size_t NumIndices = Dimensions::count; protected: TensorStorage m_storage; public: EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index rank() const { return NumIndices; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index dimension(std::size_t n) const { return m_storage.dimensions()[n]; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_storage.dimensions(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index size() const { return m_storage.size(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar *data() { return m_storage.data(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar *data() const { return m_storage.data(); } // This makes EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED // work, because that uses base().coeffRef() - and we don't yet // implement a similar class hierarchy inline Self& base() { return *this; } inline const Self& base() const { return *this; } #if EIGEN_HAS_VARIADIC_TEMPLATES template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& coeff(Index firstIndex, IndexTypes... otherIndices) const { // The number of indices used to access a tensor coefficient must be equal to the rank of the tensor. EIGEN_STATIC_ASSERT(sizeof...(otherIndices) + 1 == NumIndices, YOU_MADE_A_PROGRAMMING_MISTAKE) return coeff(array{{firstIndex, otherIndices...}}); } #endif EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& coeff(const array& indices) const { eigen_internal_assert(checkIndexRange(indices)); return m_storage.data()[linearizedIndex(indices)]; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& coeff(Index index) const { eigen_internal_assert(index >= 0 && index < size()); return m_storage.data()[index]; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& coeff() const { EIGEN_STATIC_ASSERT(NumIndices == 0, YOU_MADE_A_PROGRAMMING_MISTAKE); return m_storage.data()[0]; } #if EIGEN_HAS_VARIADIC_TEMPLATES template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index firstIndex, IndexTypes... otherIndices) { // The number of indices used to access a tensor coefficient must be equal to the rank of the tensor. EIGEN_STATIC_ASSERT(sizeof...(otherIndices) + 1 == NumIndices, YOU_MADE_A_PROGRAMMING_MISTAKE) return coeffRef(array{{firstIndex, otherIndices...}}); } #endif EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(const array& indices) { eigen_internal_assert(checkIndexRange(indices)); return m_storage.data()[linearizedIndex(indices)]; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index index) { eigen_internal_assert(index >= 0 && index < size()); return m_storage.data()[index]; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef() { EIGEN_STATIC_ASSERT(NumIndices == 0, YOU_MADE_A_PROGRAMMING_MISTAKE); return m_storage.data()[0]; } #if EIGEN_HAS_VARIADIC_TEMPLATES template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& operator()(Index firstIndex, IndexTypes... otherIndices) const { // The number of indices used to access a tensor coefficient must be equal to the rank of the tensor. EIGEN_STATIC_ASSERT(sizeof...(otherIndices) + 1 == NumIndices, YOU_MADE_A_PROGRAMMING_MISTAKE) return this->operator()(array{{firstIndex, otherIndices...}}); } #else EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& operator()(Index i0, Index i1) const { if (Options&RowMajor) { const Index index = i1 + i0 * m_storage.dimensions()[1]; return m_storage.data()[index]; } else { const Index index = i0 + i1 * m_storage.dimensions()[0]; return m_storage.data()[index]; } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& operator()(Index i0, Index i1, Index i2) const { if (Options&RowMajor) { const Index index = i2 + m_storage.dimensions()[2] * (i1 + m_storage.dimensions()[1] * i0); return m_storage.data()[index]; } else { const Index index = i0 + m_storage.dimensions()[0] * (i1 + m_storage.dimensions()[1] * i2); return m_storage.data()[index]; } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& operator()(Index i0, Index i1, Index i2, Index i3) const { if (Options&RowMajor) { const Index index = i3 + m_storage.dimensions()[3] * (i2 + m_storage.dimensions()[2] * (i1 + m_storage.dimensions()[1] * i0)); return m_storage.data()[index]; } else { const Index index = i0 + m_storage.dimensions()[0] * (i1 + m_storage.dimensions()[1] * (i2 + m_storage.dimensions()[2] * i3)); return m_storage.data()[index]; } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& operator()(Index i0, Index i1, Index i2, Index i3, Index i4) const { if (Options&RowMajor) { const Index index = i4 + m_storage.dimensions()[4] * (i3 + m_storage.dimensions()[3] * (i2 + m_storage.dimensions()[2] * (i1 + m_storage.dimensions()[1] * i0))); return m_storage.data()[index]; } else { const Index index = i0 + m_storage.dimensions()[0] * (i1 + m_storage.dimensions()[1] * (i2 + m_storage.dimensions()[2] * (i3 + m_storage.dimensions()[3] * i4))); return m_storage.data()[index]; } } #endif EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& operator()(const array& indices) const { eigen_assert(checkIndexRange(indices)); return coeff(indices); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& operator()(Index index) const { eigen_internal_assert(index >= 0 && index < size()); return coeff(index); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& operator()() const { EIGEN_STATIC_ASSERT(NumIndices == 0, YOU_MADE_A_PROGRAMMING_MISTAKE); return coeff(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& operator[](Index index) const { // The bracket operator is only for vectors, use the parenthesis operator instead. EIGEN_STATIC_ASSERT(NumIndices == 1, YOU_MADE_A_PROGRAMMING_MISTAKE); return coeff(index); } #if EIGEN_HAS_VARIADIC_TEMPLATES template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& operator()(Index firstIndex, IndexTypes... otherIndices) { // The number of indices used to access a tensor coefficient must be equal to the rank of the tensor. EIGEN_STATIC_ASSERT(sizeof...(otherIndices) + 1 == NumIndices, YOU_MADE_A_PROGRAMMING_MISTAKE) return operator()(array{{firstIndex, otherIndices...}}); } #else EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& operator()(Index i0, Index i1) { if (Options&RowMajor) { const Index index = i1 + i0 * m_storage.dimensions()[1]; return m_storage.data()[index]; } else { const Index index = i0 + i1 * m_storage.dimensions()[0]; return m_storage.data()[index]; } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& operator()(Index i0, Index i1, Index i2) { if (Options&RowMajor) { const Index index = i2 + m_storage.dimensions()[2] * (i1 + m_storage.dimensions()[1] * i0); return m_storage.data()[index]; } else { const Index index = i0 + m_storage.dimensions()[0] * (i1 + m_storage.dimensions()[1] * i2); return m_storage.data()[index]; } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& operator()(Index i0, Index i1, Index i2, Index i3) { if (Options&RowMajor) { const Index index = i3 + m_storage.dimensions()[3] * (i2 + m_storage.dimensions()[2] * (i1 + m_storage.dimensions()[1] * i0)); return m_storage.data()[index]; } else { const Index index = i0 + m_storage.dimensions()[0] * (i1 + m_storage.dimensions()[1] * (i2 + m_storage.dimensions()[2] * i3)); return m_storage.data()[index]; } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& operator()(Index i0, Index i1, Index i2, Index i3, Index i4) { if (Options&RowMajor) { const Index index = i4 + m_storage.dimensions()[4] * (i3 + m_storage.dimensions()[3] * (i2 + m_storage.dimensions()[2] * (i1 + m_storage.dimensions()[1] * i0))); return m_storage.data()[index]; } else { const Index index = i0 + m_storage.dimensions()[0] * (i1 + m_storage.dimensions()[1] * (i2 + m_storage.dimensions()[2] * (i3 + m_storage.dimensions()[3] * i4))); return m_storage.data()[index]; } } #endif EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& operator()(const array& indices) { eigen_assert(checkIndexRange(indices)); return coeffRef(indices); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& operator()(Index index) { eigen_assert(index >= 0 && index < size()); return coeffRef(index); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& operator()() { EIGEN_STATIC_ASSERT(NumIndices == 0, YOU_MADE_A_PROGRAMMING_MISTAKE); return coeffRef(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& operator[](Index index) { // The bracket operator is only for vectors, use the parenthesis operator instead EIGEN_STATIC_ASSERT(NumIndices == 1, YOU_MADE_A_PROGRAMMING_MISTAKE) return coeffRef(index); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorFixedSize() : m_storage() { } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorFixedSize(const Self& other) : m_storage(other.m_storage) { } #if EIGEN_HAS_RVALUE_REFERENCES EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorFixedSize(Self&& other) : m_storage(other.m_storage) { } #endif template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorFixedSize(const TensorBase& other) { typedef TensorAssignOp Assign; Assign assign(*this, other.derived()); internal::TensorExecutor::run(assign, DefaultDevice()); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorFixedSize(const TensorBase& other) { typedef TensorAssignOp Assign; Assign assign(*this, other.derived()); internal::TensorExecutor::run(assign, DefaultDevice()); } // FIXME: check that the dimensions of other match the dimensions of *this. // Unfortunately this isn't possible yet when the rhs is an expression. EIGEN_TENSOR_INHERIT_ASSIGNMENT_EQUAL_OPERATOR(TensorFixedSize) protected: EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool checkIndexRange(const array& /*indices*/) const { using internal::array_apply_and_reduce; using internal::array_zip_and_reduce; using internal::greater_equal_zero_op; using internal::logical_and_op; using internal::lesser_op; return true; // check whether the indices are all >= 0 /* array_apply_and_reduce(indices) && // check whether the indices fit in the dimensions array_zip_and_reduce(indices, m_storage.dimensions());*/ } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index linearizedIndex(const array& indices) const { if (Options&RowMajor) { return m_storage.dimensions().IndexOfRowMajor(indices); } else { return m_storage.dimensions().IndexOfColMajor(indices); } } }; } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_FIXED_SIZE_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorContractionSycl.h0000755000176200001440000025572214562417221027520 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library for linear algebra. // // Mehdi Goli Codeplay Software Ltd. // Ralph Potter Codeplay Software Ltd. // Luke Iwanski Codeplay Software Ltd. // Contact: // // This Source Code Form is subject to the terms of the Mozilla Public License v. 2.0. If a copy of the MPL was not // distributed with this file, You can obtain one at http://mozilla.org/MPL/2.0/. /***************************************************************** * TensorContractionSycl.h * * \brief: * TensorContractionSycl.h, provides various tensor contraction kernel for SYCL backend * *****************************************************************/ #ifndef EIGEN_CXX11_TENSOR_TENSOR_CONTRACTION_SYCL_H #define EIGEN_CXX11_TENSOR_TENSOR_CONTRACTION_SYCL_H namespace Eigen { namespace TensorSycl { namespace internal { #ifndef EIGEN_SYCL_DISABLE_GEMV /*! * \brief TVPanelSize, a template class used for setting the panel size required for launching General TensorVector * contraction kernel on various hardware devices. * * \tparam Scalar: determines the element type of the tensor/vector * * \tparam StorageIndex determines the Index type. * * \tparam NCWindow: determines the number of non-contracting element to be process by each work-group * * \tparam CFactor: determines the number of contracting element to be process by each thread * * \tparam NCFactor: determines the number of non-contracting element to be process by each thread */ template struct TVPanelSize { // LocalThreadSizeC: determines total number of thread per workgroup for the contracting dimension static EIGEN_CONSTEXPR StorageIndex LocalThreadSizeC = EIGEN_SYCL_LOCAL_THREAD_DIM0; // LocalThreadSizeNC: determines total number of thread per workgroup for the non-contracting dimension static EIGEN_CONSTEXPR StorageIndex LocalThreadSizeNC = EIGEN_SYCL_LOCAL_THREAD_DIM1; // TileSizeDimNC: determines the tile size for the non-contracting dimension static EIGEN_CONSTEXPR StorageIndex TileSizeDimNC = NCWindow / NCFactor; // TileSizeDimC: determines the tile size for the contracting dimension static EIGEN_CONSTEXPR StorageIndex TileSizeDimC = CFactor * LocalThreadSizeNC * LocalThreadSizeC; // WorkLoadPerThreadNC : determines workload per thread for loading the non-contracting dimension static EIGEN_CONSTEXPR StorageIndex WorkLoadPerThreadNC = TileSizeDimNC / LocalThreadSizeNC; // WorkLoadPerThreadC: determines workload per thread for loading the non-contracting dimension static EIGEN_CONSTEXPR StorageIndex WorkLoadPerThreadC = TileSizeDimC / LocalThreadSizeC; // BC : determines if supporting bank conflict is required static EIGEN_CONSTEXPR bool BC = false; }; #endif /*! * \brief TTPanelSize, a template class used for setting the panel size required for launching General Tensor Tensor contraction kernel on various hardware devices. * * \tparam Scalar: determines the element type of the tensor * * \tparam StorageIndex: determines the Index type. * * \tparam REG_SIZE_M: determines workload per thread for loading the M dimension This can be varied based on the available register on a chosen device(can be controlled by EIGEN_SYCL_REG_M macro). * * \tparam REG_SIZE_N: determines workload per thread for loading the N dimension This can be varied based on the available register on a chosen device(can be controlled by EIGEN_SYCL_REG_N macro). * * \tparam TSDK: determines Tile size for dimension K. The packet size is assumed to be considered */ template struct TTPanelSize { // TileSizeDimK: determines Tile size for dimension K. The packet size is assumed to be considered static EIGEN_CONSTEXPR StorageIndex TileSizeDimK = TSDK; // WorkLoadPerThreadM : determines workload per thread for loading the M dimension This can be varied based on the // available register on a chosen device(can be controlled by EIGEN_SYCL_REG_M macro// #ifndef EIGEN_SYCL_REG_M static EIGEN_CONSTEXPR StorageIndex WorkLoadPerThreadM = REG_SIZE_M; #else static EIGEN_CONSTEXPR StorageIndex WorkLoadPerThreadM = EIGEN_SYCL_REG_M; #endif // WorkLoadPerThreadN : determines workload per thread for loading the N dimension This can be varied based on the // available register on a chosen device(can be controlled by EIGEN_SYCL_REG_N macro #ifndef EIGEN_SYCL_REG_N static EIGEN_CONSTEXPR StorageIndex WorkLoadPerThreadN = REG_SIZE_N; #else static EIGEN_CONSTEXPR StorageIndex WorkLoadPerThreadN = EIGEN_SYCL_REG_N; #endif // LocalThreadSizeM: determines total number of thread per workgroup for the m dimension static EIGEN_CONSTEXPR StorageIndex LocalThreadSizeM = EIGEN_SYCL_LOCAL_THREAD_DIM0; // LocalThreadSizeN: determines total number of thread per workgroup for the n dimension static EIGEN_CONSTEXPR StorageIndex LocalThreadSizeN = EIGEN_SYCL_LOCAL_THREAD_DIM1; // TileSizeDimM: determines the tile size for the m dimension static EIGEN_CONSTEXPR StorageIndex TileSizeDimM = LocalThreadSizeM * WorkLoadPerThreadM; // TileSizeDimN: determines the tile size for the n dimension static EIGEN_CONSTEXPR StorageIndex TileSizeDimN = LocalThreadSizeN * WorkLoadPerThreadN; // LoadPerThreadLhs: determines workload per thread for loading Lhs Tensor. This must be divisable by packetsize static EIGEN_CONSTEXPR StorageIndex LoadPerThreadLhs = ((TileSizeDimK * WorkLoadPerThreadM * WorkLoadPerThreadN) / (TileSizeDimN)); // LoadPerThreadRhs: determines workload per thread for loading Rhs Tensor. This must be divisable by packetsize static EIGEN_CONSTEXPR StorageIndex LoadPerThreadRhs = ((TileSizeDimK * WorkLoadPerThreadM * WorkLoadPerThreadN) / (TileSizeDimM)); // BC : determines if supporting bank conflict is required static EIGEN_CONSTEXPR bool BC = true; // DoubleBuffer: determines if double buffering technique should be used (This can be disabled by // EIGEN_SYCL_DISABLE_DOUBLE_BUFFER macro when the device doesnot have sufficient local memory) static EIGEN_CONSTEXPR bool DoubleBuffer = #ifdef EIGEN_SYCL_DISABLE_DOUBLE_BUFFER false; #else true; #endif }; /* ! * \brief contraction_type: an enum class representing the Tensor Contraction implementation algorithm. This is used to * specialize the contraction algorithm based on device support for dedicated local memory. */ enum class contraction_type { local, no_local }; /* ! * \brief data_source an enum class determining the location of the data in a memory hierarchy (global, local, private). */ enum class data_source { global_mem, local_mem, private_mem }; /*! * \brief read, a template function used for loading the data from global memory. This function is used to guarantee coalesced and vectorized load whenever possible * * \tparam PacketLoad: determines if the each element of this tensor block should be loaded in a packet mode * * \param is_coalesced_layout: determines whether or not the Tensor data in a memory can be access coalesced and vectorized when possible. Coalesced memory access is a key factor in Kernel performance. When a tensor is 2d and the contracting dimension is 1, it is always possible to accessed tensor data coalesced and vectorized. This is the case when RHS(right hand side) Tensor is transposed or when LHS(left hand side) Tensor is not transposed. * * \tparam PacketType: determines the type of packet * * \tparam TensorMapper: determines the input tensor mapper type * * \tparam StorageIndex: determines the Index type * \param tensorMapper: is the input tensor * * \param NCIndex: is the non-contracting dim index * * \param CIndex is the contracting dim index * * \param ld: is the leading dimension of the flattened tensor */ template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename ::Eigen::internal::enable_if::type read( const TensorMapper &tensorMapper, const StorageIndex &NCIndex, const StorageIndex &CIndex, const StorageIndex &ld) { const StorageIndex row = (is_coalesced_layout) ? NCIndex : CIndex; const StorageIndex col = (is_coalesced_layout) ? CIndex : NCIndex; return tensorMapper.get_tensor().template packet(row + (col * ld)); } /*! * \brief read, special overload of read function, when the read access is not vectorized * * \tparam PacketLoad: determines if the each element of this tensor block should be loaded in a packet mode * * \param is_coalesced_layout: determines whether or not the Tensor data in a memory can be access coalesced and vectorized when possible. Coalesced memory access is a key factor in Kernel performance. When a tensor is 2d and the contracting dimension is 1, it is always possible to accessed tensor data coalesced and vectorized. This is the case when RHS(right hand side) Tensor is transposed or when LHS(left hand side) Tensor is not transposed. * * \tparam PacketType: determines the type of packet * * \tparam TensorMapper: determines the input tensor mapper type * * \tparam StorageIndex: determines the Index type * \param tensorMapper: is the input tensor * * \param NCIndex: is the non-contracting dim index * * \param CIndex: is the contracting dim index */ template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename ::Eigen::internal::enable_if::type read( const TensorMapper &tensorMapper, const StorageIndex &NCIndex, const StorageIndex &CIndex, const StorageIndex &) { const StorageIndex row = (IsRhs) ? CIndex : NCIndex; const StorageIndex col = (IsRhs) ? NCIndex : CIndex; return tensorMapper(row, col); } /*! * \brief write, a template function used for storing the data to local memory. This function is used to guarantee * coalesced and vectorized store whenever possible. * * \tparam StorageIndex: determines the Index type * * \param ld is the leading dimension of the local memory. ld is a compile time value for the local memory * * \tparam data_source: an enum value representing if the location of the data in a memory hierarchy. * * \tparam PacketType: determines the type of packet * * \tparam DataScalar: determines the output data type * * \param packet_data: the data to be written in the local memory * * \param ptr: a pointer to the local memory * * \param CIndex is the contracting dim index */ template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename ::Eigen::internal::enable_if
::type write(PacketType &packet_data, DataScalar ptr) { EIGEN_CONSTEXPR int PacketSize = Eigen::internal::unpacket_traits::size; EIGEN_UNROLL_LOOP for (int i = 0; i < PacketSize; i++) { *ptr = PacketWrapper::scalarize(i, packet_data); ptr += ld; } } /*! * \brief Overloading the write function for storing the data to global memory, when vectorization enabled This function * is used to guarantee coalesced and vectorized store whenever possible. * * \tparam data_source: an enum value representing if the location of the data in a memory hierarchy. * * \tparam PacketType: determines the type of packet * * \tparam DataScalar: determines the output data type * * \param packet_data: the data to be written in the local memory * * \param ptr: a pointer to the local memory */ template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename ::Eigen::internal::enable_if< Eigen::internal::unpacket_traits::size != 1 && dt == data_source::global_mem, void>::type write(PacketType &packet_data, DataScalar *ptr) { ::Eigen::internal::pstoreu(ptr, packet_data); } /*! * \brief Overloading the write function for storing the data to global memory, when vectorization is disabled. * * \tparam data_source: an enum value representing if the location of the data in a memory hierarchy. * * \tparam PacketType: determines the type of packet * * \tparam DataScalar: determines the output data type * * \param packet_data: the data to be written in the local memory * * \param ptr: a pointer to the local memory */ template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename ::Eigen::internal::enable_if< Eigen::internal::unpacket_traits::size == 1 && dt == data_source::global_mem, void>::type write(PacketType &packet_data, DataScalar *ptr) { *ptr = packet_data; } /*! * \brief check_boundary: is used to check the edge condition for non-internal blocks. * * \tparam is_internal: determines if the block is internal */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool check_boundary(bool) { return true; } /*! * \brief check_boundary: specialization of the check_boundary for non-internal blocks. * * \param cond: true when the data is in range. Otherwise false */ template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool check_boundary(bool cond) { return cond; } /*! * \brief BlockProperties is a template class that provides different characteristic of a block of each Tensor processed * by each workgroup. * * \tparam is_transposed: iff true, determines whether or not the block of the Tensor is transposed * * \tparam packet_load_: determines if the each element of this tensor block should be loaded in a packet mode * * \tparam PacketType: determines the type of packet * * \tparam OutType: determines the type of each element for this block of tensor. If packet load is true, it will be * packetType; Otherwise it will be scalar Type * * \param elements_per_access determines the size of each element based on OutType * * \param is_coalesced_layout determines whether or not the Tensor data in a memory can be access coalesced and * vectorized when possible. Coalesced memory access is a key factor in Kernel performance. When a tensor is 2d and the * contracting dimension is 1, it is always possible to accessed tensor data coalesced and vectorized. This is the case * when RHS(right hand side) Tensor is transposed or when LHS(left hand side) Tensor is not transposed. * * \param nc_stride determines the stride of non-contracting dimension to access the next adjustment element within the * Tensor Block for each workgroup * * \param c_stride determines the stride of contracting dimension to access the next adjustment element within the * Tensor Block for each workgroup */ template struct BlockProperties { static EIGEN_CONSTEXPR bool packet_load = packet_load_; typedef typename Eigen::internal::unpacket_traits::type OutScalar; static EIGEN_CONSTEXPR bool is_rhs = is_rhs_; typedef typename Eigen::internal::conditional::type OutType; static EIGEN_CONSTEXPR int elements_per_access = Eigen::internal::unpacket_traits::size; static EIGEN_CONSTEXPR bool is_coalesced_layout = !(is_transposed ^ is_rhs); static EIGEN_CONSTEXPR int nc_stride = (is_coalesced_layout ? elements_per_access : 1); static EIGEN_CONSTEXPR int c_stride = (is_coalesced_layout ? 1 : elements_per_access); }; /*! * \brief ThreadProperties is a template class that provides each thread's properties within a workgroup. Please see * the sycl-1.2.1 specification (https://www.khronos.org/registry/SYCL/specs/sycl-1.2.1.pdf) for the workgroup, * work-items * * \tparam StorageIndex: determines the StorageIndex Type * * \param linearLocalThreadId: determines the linearized location of a thread within a work-group * * \param kGroupId: determines the logical group id in a k dimension of the flattened tensor. It will be > 1 when * tall/skinny algorithm is used * * \param mGroupOffset: determines the logical start position of all thread within a workgroup for the m dimension of * the flattened tensor. * * \param kGroupOffset determines the logical start position of all thread within a workgroup for the k dimension of the * flattened tensor. It will be > 1 when tall/skinny algorithm is used. * * \param mLocalOffset: determines the logical start position of each thread within a workgroup for the m dimension of a * flattened tensor. The position determines the distance of each thread within the workgroup from each other * independent from their global position. * * \param nLocalOffset: determines the logical start position of each thread within a workgroup for the n dimension of a * flattened tensor. The position determines the distance of each thread within the workgroup from each other * independent from their global position. * * \param mGlobalOffset: determines the logical start position of each thread a thread for the m dimension on a * flattened tensor * * \param nGlobalOffset: determines the logical start position of each thread a thread for the n dimension on a * flattened tensor * * \param kSize : determine the number of the k elements of the flattened Tensor to be processed by each thread for the * given tensor block. This is !=K dimension of Flattened Tensor when Tall/Skinny matrix is used. * * \param is_internal : this will determined if the thread within the work-group computes an internal block of tensor or * the edge blocks. When it is internal, there is no need to check the boundaries and all the if stantement can be * resolve by compiler. */ template struct ThreadProperties { const StorageIndex linearLocalThreadId; const StorageIndex kGroupId; const StorageIndex mGroupOffset; const StorageIndex nGroupOffset; const StorageIndex kGroupOffset; const StorageIndex mLocalOffset; const StorageIndex nLocalOffset; const StorageIndex mGlobalOffset; const StorageIndex nGlobalOffset; StorageIndex kSize; const bool is_internal; // this is used to adjust the last block EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ThreadProperties( const StorageIndex linearLocalThreadId_, const StorageIndex kGroupId_, const StorageIndex mGroupOffset_, const StorageIndex nGroupOffset_, const StorageIndex kGroupOffset_, const StorageIndex mLocalOffset_, const StorageIndex nLocalOffset_, const StorageIndex mGlobalOffset_, const StorageIndex nGlobalOffset_, StorageIndex kSize_, const bool is_internal_) : linearLocalThreadId(linearLocalThreadId_), kGroupId(kGroupId_), mGroupOffset(mGroupOffset_), nGroupOffset(nGroupOffset_), kGroupOffset(kGroupOffset_), mLocalOffset(mLocalOffset_), nLocalOffset(nLocalOffset_), mGlobalOffset(mGlobalOffset_), nGlobalOffset(nGlobalOffset_), kSize(kSize_), is_internal(is_internal_) {} }; /*! * \brief TensorContractionKernel is a template class that provides Tensor -Tensor contraction operation. * * \tparam OutScalar: determines the output scalar type * * \tparam LhsScalar: determines the left-hand-side scalar type * * \tparam RhsScalar: determines the right-hand-side scalar type * * \tparam OutAccessor: determines the sycl accessor type for out put (please see the sycl-1.2.1 specification (https://www.khronos.org/registry/SYCL/specs/sycl-1.2.1.pdf) for accessor definition) * * \tparam LhsMapper determines the tensor contraction mapper type for left-hand-side matrix * * \tparam RhsMapper determines the tensor contraction mapper type for right-hand-side matrix * * \tparam StorageIndex: determines the StorageIndex Type * * \tparam Properties: determines the Contraction Panel properties * * \tparam TripleDim: determines the M, K, N dimensions for the flatten tensors in order to treat them as a matrix * * \tparam Vectorizable: determines whether or not the vectorization is enabled for the Eigen expression. * * \tparam input_mapper_properties : determine if the input tensors are matrix. If they are matrix, special memory access is used to guarantee that always the memory access are coalesced. * * \tptaram IsFinal : determine if this is the final kernel. If so, the result will be written in a final output. Otherwise, the result of contraction will be written iin a temporary buffer. This is the case when Tall/Skinny contraction is used. So in this case, a final reduction step is required to compute final output. * \tparam contraction_tp: it is an enum value representing whether the local memroy/no local memory implementation of the algorithm to be used * * \param scratch: local memory containing tiles of LHS and RHS tensors for each work-group * * \param lhs: determines the left-hand-side flattened tensor (tensor mapper) * * \param rhs: determines the right-hand-side flattened tensor (tensor mapper) * * \param out_res: determines the output tensor containing the contraction result * * \param groupSizeM: a logical number determining the number of work-group for m dimension * * \param groupSizeN: a logical number determining the number of work-group for n dimension * * \param numTiles: determines total number of tiles on the k dimension * * \param TripleDim: determines the M, K, N dimensions for the flatten tensors in order to treat them as a matrix */ template class TensorContractionKernel { public: typedef typename Eigen::TensorSycl::internal::Vectorise::PacketReturnType PacketReturnType; static EIGEN_CONSTEXPR int PacketSize = Eigen::TensorSycl::internal::Vectorise::PacketSize; static EIGEN_CONSTEXPR bool is_lhs_transposed = !::Eigen::internal::TensorContractionInputMapperTrait::inner_dim_contiguous; static EIGEN_CONSTEXPR bool is_rhs_transposed = !::Eigen::internal::TensorContractionInputMapperTrait::inner_dim_contiguous; typedef BlockProperties LHSBlockProperties; typedef BlockProperties RHSBlockProperties; static EIGEN_CONSTEXPR StorageIndex NStride = contraction_tp == contraction_type::local ? Properties::WorkLoadPerThreadN : RHSBlockProperties::nc_stride; typedef cl::sycl::accessor Scratch; typedef cl::sycl::multi_ptr local_ptr; typedef OutScalar * /*cl::sycl::multi_ptr*/ private_ptr; typedef typename ::Eigen::internal::conditional::type tile_ptr; static EIGEN_CONSTEXPR StorageIndex LSDL = contraction_tp == contraction_type::local ? Properties::TileSizeDimM + Properties::BC : Properties::WorkLoadPerThreadM; static EIGEN_CONSTEXPR StorageIndex LSDR = contraction_tp == contraction_type::local ? Properties::TileSizeDimN + Properties::BC : Properties::WorkLoadPerThreadN; static EIGEN_CONSTEXPR StorageIndex LocalOffset = Properties::LocalThreadSizeM * Properties::LocalThreadSizeN; /** * \brief MemHolder this is a place holder struct for creating memory hierarchy in SYCL. Inside SYCL kernel it is not * allowed to have dynamic memory allocation. While the local memory is created outside of the kernel and passed to * the kernel as an accessor, the private memory can only allowed to be allocated statically. Since we are abstracting * the TiledMemory for both local and private memory, the MemHolder structs is used as a helper to abstract out * different type of memory needed when local/no_local memory computation is called. * * \tparam contraction_type: it is an enum value representing whether the local memroy/no local memory implementation of the algorithm to be used * \tparam the private memory size * \param ptr the tile memory pointer type */ template struct MemHolder { tile_ptr ptr; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE MemHolder(local_ptr block_start_ptr) : ptr(block_start_ptr) {} }; /** * \brief specialization of memHolder class when no local memory kernel is used. */ template struct MemHolder { OutScalar ptr[MemSize] = {OutScalar{0}}; }; /** * \brief TiledMemory: contains required memory pointer for loading each tile of the TensorContraction panel from * global memory to local/private memory when local/no_local algorithm used. * * \param lhs_scratch_extract : determines the LHS tile memory. It is either private or local memory based on the * selected contraction_type. * * \param rhs_scratch_extract : determines the RHS tile memory. It is either private or local memory based on the * selected contraction_type. * * \param lhs_extract_index: determins the position of each thread on a local memory for lhs input. When private * memory is used this is set to zero as this is not applicable in case of private memory. * * \param rhs_extract_index: determins the position of each thread on a local memory for rhs input. When private * memory is used this is set to zero as this is not applicable in case of private memory. * * \param lhs_scratch_compute : determines the location to load for computation for lhs_local memory. This is the * same as lhs_scratch_extract for private memory. * * \param rhs_scratch_compute : determines the location to load for computation for rhs_local memory. This is the * same as rhs_scratch_extract for private memory. */ struct TiledMemory { MemHolder lhs_scratch_extract; MemHolder rhs_scratch_extract; tile_ptr lhs_scratch_ptr_compute; tile_ptr rhs_scratch_ptr_compute; const std::pair lhs_extract_index; const std::pair rhs_extract_index; template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TiledMemory(const ThreadProperties &, local_ptr, typename ::Eigen::internal::enable_if::type * = 0) : lhs_scratch_extract{}, rhs_scratch_extract{}, lhs_scratch_ptr_compute(lhs_scratch_extract.ptr), rhs_scratch_ptr_compute(rhs_scratch_extract.ptr), lhs_extract_index(std::pair(StorageIndex{0}, StorageIndex{0})), rhs_extract_index(std::pair(StorageIndex{0}, StorageIndex{0})) {} template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TiledMemory(const ThreadProperties &thread_properties, local_ptr block_start_ptr, typename ::Eigen::internal::enable_if::type * = 0) : lhs_scratch_extract{block_start_ptr}, rhs_scratch_extract{lhs_scratch_extract.ptr + ((Properties::DoubleBuffer + 1) * LSDL * Properties::TileSizeDimK)}, lhs_scratch_ptr_compute(lhs_scratch_extract.ptr + thread_properties.mLocalOffset), rhs_scratch_ptr_compute(rhs_scratch_extract.ptr + thread_properties.nLocalOffset), lhs_extract_index( local_id_extract(thread_properties.linearLocalThreadId)), rhs_extract_index( local_id_extract(thread_properties.linearLocalThreadId)) {} }; Scratch scratch; const LhsMapper lhs; const RhsMapper rhs; OutAccessor out_res; const StorageIndex groupSizeM; const StorageIndex groupSizeN; const StorageIndex numTiles; const TripleDim triple_dim; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorContractionKernel(Scratch scratch_, const LhsMapper lhs_, const RhsMapper rhs_, OutAccessor out_res_, const StorageIndex groupSizeM_, const StorageIndex groupSizeN_, const StorageIndex numTiles_, const TripleDim triple_dim_) : scratch(scratch_), lhs(lhs_), rhs(rhs_), out_res(out_res_), groupSizeM(groupSizeM_), groupSizeN(groupSizeN_), numTiles(numTiles_), triple_dim(triple_dim_) {} EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorContractionKernel(Scratch scratch_, const LhsMapper lhs_, const RhsMapper rhs_, OutAccessor out_res_, const StorageIndex groupSizeM_, const StorageIndex numTiles_, const TripleDim triple_dim_) : TensorContractionKernel(scratch_, lhs_, rhs_, out_res_, groupSizeM_, 1, numTiles_, triple_dim_) {} EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void operator()(cl::sycl::nd_item<1> itemID) { const StorageIndex linearLocalThreadId = itemID.get_local_id(0); const StorageIndex nLocalThreadId = linearLocalThreadId / Properties::LocalThreadSizeM; const StorageIndex mLocalThreadId = linearLocalThreadId % Properties::LocalThreadSizeM; const StorageIndex mGroupId = itemID.get_group(0) % groupSizeM; const StorageIndex tmp = itemID.get_group(0) / groupSizeM; const StorageIndex nGroupId = IsFinal ? tmp : tmp % groupSizeN; const StorageIndex kGroupId = IsFinal ? 0 : tmp / groupSizeN; const StorageIndex mGroupOffset = mGroupId * Properties::TileSizeDimM; const StorageIndex nGroupOffset = nGroupId * Properties::TileSizeDimN; const StorageIndex mLocalOffset = PacketSize * mLocalThreadId; const StorageIndex nLocalOffset = NStride * nLocalThreadId; const StorageIndex mGlobalOffset = mGroupOffset + mLocalOffset; const StorageIndex nGlobalOffset = nGroupOffset + nLocalOffset; const StorageIndex kSizePerWG = IsFinal ? triple_dim.K : numTiles * Properties::TileSizeDimK; StorageIndex kGroupOffset = kGroupId * kSizePerWG; const bool is_internal = triple_dim.M - mGroupOffset >= Properties::TileSizeDimM && triple_dim.N - nGroupOffset >= Properties::TileSizeDimN && triple_dim.K - kGroupOffset >= kSizePerWG; // this is used to adjust the last block StorageIndex kSize = IsFinal ? triple_dim.K : std::min(kSizePerWG, triple_dim.K - kGroupOffset); // This is used to find out the lats K offset so that kGroupOffset -kSize can compute the coffset for loading to // tile kGroupOffset += kSize; auto thread_properties = ThreadProperties(linearLocalThreadId, kGroupId, mGroupOffset, nGroupOffset, kGroupOffset, mLocalOffset, nLocalOffset, mGlobalOffset, nGlobalOffset, kSize, is_internal); auto out_ptr = out_res.get_pointer() + (IsFinal ? 0 : thread_properties.kGroupId * triple_dim.M * triple_dim.N); (thread_properties.is_internal) ? compute_panel(itemID, thread_properties, out_ptr) : compute_panel(itemID, thread_properties, out_ptr); } // The compute block computes the contraction operation private block for each thread and store the resutl in the // privateRes memory of Each computation the compute block function is independent of local and no local concepts as // it only compute the block on each thread's private memory space EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void compute_block_per_tile(OutScalar *lhs_block_ptr, OutScalar *rhs_block_ptr, PacketReturnType *privateRes) { StorageIndex idx = 0; EIGEN_CONSTEXPR StorageIndex lhs_stride = contraction_tp == contraction_type::local ? (PacketSize * Properties::LocalThreadSizeM) : 1; EIGEN_UNROLL_LOOP for (StorageIndex wLPTN = 0; wLPTN < Properties::WorkLoadPerThreadN; wLPTN++) { auto rhsPacket = PacketReturnType{*(rhs_block_ptr + wLPTN)}; StorageIndex lhs_index = 0; EIGEN_UNROLL_LOOP for (StorageIndex wLPTM = 0; wLPTM < Properties::WorkLoadPerThreadM / PacketSize; wLPTM++) { PacketReturnType lhsPack{}; Eigen::TensorSycl::internal::PacketWrapper::set_packet(lhsPack, lhs_block_ptr + lhs_index); privateRes[idx] = ::Eigen::internal::pmadd(lhsPack, rhsPacket, privateRes[idx]); lhs_index += lhs_stride; idx++; } } } // The store function write the computed contraction operation in the private memory of each thread to the global // memory. The store function is independent of local and no local concepts s that it can be abstract out in the base // class. template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void store(OutPtr *out_ptr, PacketReturnType *privateRes, StorageIndex mGlobalOffset, StorageIndex nGlobalOffset) { auto chk_bound = [&](const StorageIndex &mIndex, const StorageIndex &nIndex) EIGEN_DEVICE_FUNC { return (mIndex + PacketSize - 1 < triple_dim.M && nGlobalOffset + nIndex < triple_dim.N); }; // when local memory is not used M and N are both accessed in a coalesced way. However, when local memory is // available the k*N is transposed in the local to N*K therefore, each blocks operates on blockId* // WorkLoadPerThreadN slice of N EIGEN_CONSTEXPR StorageIndex GlobalNStride = contraction_tp == contraction_type::local ? 1 : Properties::LocalThreadSizeN; EIGEN_UNROLL_LOOP for (StorageIndex wLPTN = 0; wLPTN < Properties::WorkLoadPerThreadN / PrivateNStride; wLPTN++) { // output leading dimension StorageIndex outputLD = 0; // When local memory is used the PrivateNstride is always 1 because the coalesed access on N is loaded into Local // memory and extracting from local to global is the same as no transposed version. However, when local memory is // not used and RHS is transposed we packetize the load for RHS. EIGEN_UNROLL_LOOP for (StorageIndex nId = 0; nId < PrivateNStride; nId++) { StorageIndex globalRow = mGlobalOffset; EIGEN_UNROLL_LOOP for (StorageIndex wLPTM = 0; wLPTM < Properties::WorkLoadPerThreadM / PacketSize; wLPTM++) { PacketReturnType privetOut = privateRes[wLPTM]; if (check_boundary(chk_bound(globalRow, nId))) { // Store the final results in C. The C matrix has always M as a first StorageIndex and N as a second // StorageIndex Therefore it is always coalesced layout write(privetOut, out_ptr + outputLD + globalRow); } else { EIGEN_UNROLL_LOOP for (StorageIndex mId = 0; mId < PacketSize; mId++) { StorageIndex mOffset = globalRow + mId; if (mOffset < triple_dim.M && (nGlobalOffset + nId < triple_dim.N)) { out_ptr[mOffset + outputLD] = Eigen::TensorSycl::internal::PacketWrapper::scalarize(mId, privetOut); } } } globalRow += (PacketSize * Properties::LocalThreadSizeM); } outputLD += triple_dim.M; privateRes += Properties::WorkLoadPerThreadM / PacketSize; } out_ptr += (GlobalNStride * outputLD); nGlobalOffset += (PrivateNStride * GlobalNStride); } } // when no local memory is used the following extract_block will be enabled template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename ::Eigen::internal::enable_if::type extract_block(const Input &inpt, PrivateReg private_ptr, const std::pair &, const StorageIndex &ncOffset, const StorageIndex cOffset) { EIGEN_CONSTEXPR StorageIndex LocalThreadSizeNC = InputBlockProperties::is_rhs ? Properties::LocalThreadSizeN : Properties::LocalThreadSizeM; EIGEN_CONSTEXPR StorageIndex WorkLoadPerThreadNC = InputBlockProperties::is_rhs ? Properties::WorkLoadPerThreadN : Properties::WorkLoadPerThreadM; const StorageIndex &NC = InputBlockProperties::is_rhs ? triple_dim.N : triple_dim.M; auto chk_bound = [&](const StorageIndex &CIndex, const StorageIndex &NCIndex) EIGEN_DEVICE_FUNC { return ((CIndex + InputBlockProperties::c_stride - 1 < triple_dim.K) && (NCIndex + InputBlockProperties::nc_stride - 1 < NC)); }; const StorageIndex ld = InputBlockProperties::is_coalesced_layout ? NC : triple_dim.K; StorageIndex cIndex = cOffset; EIGEN_UNROLL_LOOP for (StorageIndex cId = 0; cId < Properties::TileSizeDimK / InputBlockProperties::c_stride; cId++) { StorageIndex ncIndex = ncOffset; EIGEN_UNROLL_LOOP for (StorageIndex ncId = 0; ncId < WorkLoadPerThreadNC / InputBlockProperties::nc_stride; ncId++) { if (check_boundary(chk_bound(cIndex, ncIndex))) { auto val = read(inpt, ncIndex, cIndex, ld); write(val, private_ptr); } else { EIGEN_UNROLL_LOOP for (StorageIndex i = 0; i < InputBlockProperties::elements_per_access; i++) { const StorageIndex ncInd = ncIndex + (InputBlockProperties::is_coalesced_layout ? i : 0); const StorageIndex cInd = cIndex + (InputBlockProperties::is_coalesced_layout ? 0 : i); OutScalar val = (ncInd < NC && cInd < triple_dim.K) ? read( inpt, ncInd, cInd, ld) : OutScalar(0); write( val, private_ptr + (InputBlockProperties::is_coalesced_layout ? i : 0) + ((InputBlockProperties::is_coalesced_layout ? 0 : i) * WorkLoadPerThreadNC)); } } // if it is lhs we have to load it packetised when the packet size is > 1, because the output is coalesced. So // even if M is not accessed in a coalesced mode, we have to load packet_size number of m per thread. ncIndex = (!InputBlockProperties::is_rhs && InputBlockProperties::nc_stride == 1 && PacketSize != 1) ? ncOffset + (ncId + 1) % PacketSize + ((ncId + 1) / PacketSize) * LocalThreadSizeNC : (ncIndex + InputBlockProperties::nc_stride * LocalThreadSizeNC); private_ptr += InputBlockProperties::nc_stride; } // the previous for loop ( private_ptr += (ncId * nc_stride)) has already moved ptr with one WorkLoadPerThreadNC private_ptr += (InputBlockProperties::c_stride - 1) * WorkLoadPerThreadNC; cIndex += InputBlockProperties::c_stride; } } template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::pair local_id_extract( const StorageIndex &linearLocalThreadId) { const StorageIndex localThreadNC = (InputBlockProperties::is_coalesced_layout) ? linearLocalThreadId % (TileSizeDimNC / InputBlockProperties::nc_stride) : linearLocalThreadId / (Properties::TileSizeDimK / InputBlockProperties::c_stride); const StorageIndex localThreadC = (InputBlockProperties::is_coalesced_layout) ? linearLocalThreadId / (TileSizeDimNC / InputBlockProperties::nc_stride) : linearLocalThreadId % (Properties::TileSizeDimK / InputBlockProperties::c_stride); return std::pair(localThreadNC, localThreadC); } template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename ::Eigen::internal::enable_if::type sync_mem(const cl::sycl::nd_item<1> &, bool &db_offset) noexcept { db_offset = !db_offset; } template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename ::Eigen::internal::enable_if::type sync_mem(const cl::sycl::nd_item<1> &itemID, bool &) noexcept { itemID.barrier(cl::sycl::access::fence_space::local_space); } template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename ::Eigen::internal::enable_if::type sync_mem(const cl::sycl::nd_item<1> &, bool &) noexcept { return; } template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename ::Eigen::internal::enable_if::type sync_thread(const cl::sycl::nd_item<1> & #ifdef EIGEN_SYCL_ARM_GPU_CACHE_OPTIMISATION itemID #endif ) noexcept { #ifdef EIGEN_SYCL_ARM_GPU_CACHE_OPTIMISATION itemID.barrier(cl::sycl::access::fence_spacce::local_space); #else return; #endif } template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename ::Eigen::internal::enable_if::type sync_thread(const cl::sycl::nd_item<1> &itemID) { itemID.barrier(cl::sycl::access::fence_space::local_space); } template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename ::Eigen::internal::enable_if::type sync_thread( const cl::sycl::nd_item<1> &) { return; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void compute_tile_per_panel(const cl::sycl::nd_item<1> &itemID, ThreadProperties &thread_properties, TiledMemory &tiled_input_block, PacketReturnType *privateRes, bool &db_offset) { // Tiling the Rhs block from global to local memory extract_block( rhs, tiled_input_block.rhs_scratch_extract.ptr + (db_offset * Properties::TileSizeDimK * LSDR), tiled_input_block.rhs_extract_index, contraction_tp == contraction_type::local ? thread_properties.nGroupOffset : thread_properties.nGlobalOffset, thread_properties.kGroupOffset - thread_properties.kSize); sync_thread(itemID); // Tiling the Lhs block from global to local memory extract_block( lhs, tiled_input_block.lhs_scratch_extract.ptr + (db_offset * LSDL * Properties::TileSizeDimK), tiled_input_block.lhs_extract_index, contraction_tp == contraction_type::local ? thread_properties.mGroupOffset : thread_properties.mGlobalOffset, thread_properties.kGroupOffset - thread_properties.kSize); // itemID.barrier(cl::sycl::access::fence_space::local_space); sync_thread(itemID); // switch to compute mede StorageIndex lhs_offset = (db_offset * LSDL * Properties::TileSizeDimK); StorageIndex rhs_offset = (db_offset * Properties::TileSizeDimK * LSDR); // Loop over the values of a single tile for (StorageIndex k = 0; k < Properties::TileSizeDimK; k++) { compute_block_per_tile(tiled_input_block.lhs_scratch_ptr_compute + lhs_offset, tiled_input_block.rhs_scratch_ptr_compute + rhs_offset, privateRes); lhs_offset += LSDL; rhs_offset += LSDR; } // computing the K index for the next tile thread_properties.kSize -= Properties::TileSizeDimK; sync_mem(itemID, db_offset); } // when local memory is available the following compute_panel will be enabled template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void compute_panel(const cl::sycl::nd_item<1> &itemID, ThreadProperties &thread_properties, OutPtr out_ptr) { auto tiled_input_block = TiledMemory{thread_properties, scratch.get_pointer()}; // Allocate register space PacketReturnType privateRes[Properties::WorkLoadPerThreadM * Properties::WorkLoadPerThreadN / PacketSize] = { PacketReturnType{0}}; bool db_offset = 0; while (thread_properties.kSize >= Properties::TileSizeDimK) { compute_tile_per_panel(itemID, thread_properties, tiled_input_block, privateRes, db_offset); } if (thread_properties.kSize > 0) { compute_tile_per_panel(itemID, thread_properties, tiled_input_block, privateRes, db_offset); } // Storing the final results in the output store(1) : RHSBlockProperties::nc_stride>( out_ptr + thread_properties.nGlobalOffset * triple_dim.M, privateRes, thread_properties.mGlobalOffset, thread_properties.nGlobalOffset); } // When local memory is available the following extract_block will be enabled template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename ::Eigen::internal::enable_if::type extract_block(const Input &inpt, Local local_ptr, const std::pair& local_index, const StorageIndex &ncOffset, const StorageIndex cOffset) { EIGEN_CONSTEXPR StorageIndex TileSizeDimNC = InputBlockProperties::is_rhs ? Properties::TileSizeDimN : Properties::TileSizeDimM; EIGEN_CONSTEXPR StorageIndex LoadPerThread = InputBlockProperties::is_rhs ? Properties::LoadPerThreadRhs : Properties::LoadPerThreadLhs; EIGEN_CONSTEXPR StorageIndex LSD = InputBlockProperties::is_rhs ? LSDR : LSDL; static_assert(((LocalOffset % (TileSizeDimNC / InputBlockProperties::nc_stride) == 0) && (LocalOffset % (Properties::TileSizeDimK / InputBlockProperties::c_stride) == 0)), " LocalOffset must be divisable by stride"); const StorageIndex &NC = InputBlockProperties::is_rhs ? triple_dim.N : triple_dim.M; StorageIndex localThreadNC = local_index.first; StorageIndex localThreadC = local_index.second; auto chk_bound = [&](const StorageIndex &CIndex, const StorageIndex &NCIndex) EIGEN_DEVICE_FUNC { return ((CIndex + InputBlockProperties::c_stride - 1 < triple_dim.K) && (NCIndex + InputBlockProperties::nc_stride - 1 < NC)); }; EIGEN_UNROLL_LOOP for (StorageIndex lPT = 0; lPT < LoadPerThread / InputBlockProperties::elements_per_access; lPT++) { const StorageIndex CIndex = cOffset + (InputBlockProperties::c_stride * localThreadC); const StorageIndex NCIndex = ncOffset + (InputBlockProperties::nc_stride * localThreadNC); const StorageIndex ld = InputBlockProperties::is_coalesced_layout ? NC : triple_dim.K; if (check_boundary(chk_bound(CIndex, NCIndex))) { auto val = read(inpt, NCIndex, CIndex, ld); write( val, local_ptr + (InputBlockProperties::nc_stride * localThreadNC) + (InputBlockProperties::c_stride * localThreadC * LSD)); } else { EIGEN_UNROLL_LOOP for (StorageIndex i = 0; i < InputBlockProperties::elements_per_access; i++) { const StorageIndex nCInd = NCIndex + (InputBlockProperties::is_coalesced_layout ? i : 0); const StorageIndex cInd = CIndex + (InputBlockProperties::is_coalesced_layout ? 0 : i); OutScalar val = (nCInd < NC && cInd < triple_dim.K) ? read( inpt, nCInd, cInd, ld) : OutScalar(0); write( val, local_ptr + (InputBlockProperties::nc_stride * localThreadNC) + (InputBlockProperties::is_coalesced_layout ? i : 0) + ((InputBlockProperties::c_stride * localThreadC + (InputBlockProperties::is_coalesced_layout ? 0 : i)) * LSD)); } } localThreadNC += (InputBlockProperties::is_coalesced_layout) ? LocalOffset % (TileSizeDimNC / InputBlockProperties::nc_stride) : LocalOffset / (Properties::TileSizeDimK / InputBlockProperties::c_stride); localThreadC += (InputBlockProperties::is_coalesced_layout) ? LocalOffset / (TileSizeDimNC / InputBlockProperties::nc_stride) : LocalOffset % (Properties::TileSizeDimK / InputBlockProperties::c_stride); } } }; #ifndef EIGEN_SYCL_DISABLE_GEMV /*! * \brief GeneralVectorTensor is a template class that provides Tensor -vector contraction operation, which is a special * case of Tensor Tensor contraction. * * \tparam OutScalar: determines the output scalar type * * \tparam OutAccessor: determines the sycl accessor type for out put (please see the sycl-1.2.1 specification * (https://www.khronos.org/registry/SYCL/specs/sycl-1.2.1.pdf) for accessor definition) * * \tparam VectorMapper: determines the tensor contraction mapper for the vector input (can be lhs or rhs) * * \tparam TensorMapper: determines the tensor contraction mapper for the tensor input (can be lhs or rhs) * * \tparam StorageIndex: determines the StorageIndex Type * * \tparam Properties: determines the Contraction Panel properties * * \tparam KFactor: determines the number of elements in K dimension in a Tile * * \tparam Vectorizable: determines whether or not the vectorization is enabled for the Eigen expression. * * \tparam is_lhs_vec: determines whether lhs is a vector or rhs is a vector * * \tparam IsFinal: determine if this is the final kernel. If so, the result will be written in a final output. * Otherwise, the result of contraction will be written iin a temporary buffer. * * \param scratch: determines the local memory containing the vector block for each work-group * * \param vec: determines the vector input (tensor mapper) * * \param mat: determines the tensor input (tensor mapper) * * \param out_res: determines the output vector containing the contraction result * * \param nonContractGroupSize: a logical number determining the number of work-group for non-contracting dimension * * \param nonContractDim: determines the size of non contracting dimension for the flattened tensor * * \param contractDim: determines the size of non contracting dimension for the flattened tensor * */ template struct GeneralVectorTensor { typedef typename Eigen::TensorSycl::internal::Vectorise::PacketReturnType PacketReturnType; static EIGEN_CONSTEXPR int PacketSize = Eigen::TensorSycl::internal::Vectorise::PacketSize; typedef cl::sycl::accessor Scratch; static EIGEN_CONSTEXPR StorageIndex OutScratchOffset = KFactor * Properties::LocalThreadSizeC * Properties::LocalThreadSizeNC; // Since the access layout for a vector can always be coalesced, when LHS is a vector, we pass false and false to make // sure that the !^ is true When RHS is a vector, we pass true and true to make sure that the !^ is true. typedef BlockProperties VecBlockProperties; Scratch scratch; const VectorMapper vec; const TensorMapper mat; OutAccessor out_res; const StorageIndex nonContractGroupSize; const StorageIndex nonContractDim; const StorageIndex contractDim; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE GeneralVectorTensor(Scratch scratch_, const VectorMapper vec_, const TensorMapper mat_, OutAccessor out_res_, const StorageIndex nonContractGroupSize_, const StorageIndex nonContractDim_, const StorageIndex contractDim_) : scratch(scratch_), vec(vec_), mat(mat_), out_res(out_res_), nonContractGroupSize(nonContractGroupSize_), nonContractDim(nonContractDim_), contractDim(contractDim_) {} EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void operator()(cl::sycl::nd_item<1> itemID) { auto scratch_ptr = scratch.get_pointer(); const StorageIndex linearLocalThreadId = itemID.get_local_id(0); StorageIndex nonContractId = is_lhs_vec ? linearLocalThreadId / Properties::LocalThreadSizeC : linearLocalThreadId % Properties::LocalThreadSizeNC; StorageIndex contractId = is_lhs_vec ? linearLocalThreadId % Properties::LocalThreadSizeC : linearLocalThreadId / Properties::LocalThreadSizeNC; const StorageIndex cGroupSize = itemID.get_group_range(0) / nonContractGroupSize; const StorageIndex nonContractGroupId = is_lhs_vec ? itemID.get_group(0) / cGroupSize : itemID.get_group(0) % nonContractGroupSize; const StorageIndex contractGroupId = is_lhs_vec ? itemID.get_group(0) % cGroupSize : itemID.get_group(0) / nonContractGroupSize; auto out_ptr = out_res.get_pointer() + (IsFinal ? 0 : contractGroupId * nonContractDim); const StorageIndex nonContractGroupOffset = nonContractGroupId * Properties::TileSizeDimNC; const StorageIndex contractGroupOffset = contractGroupId * Properties::TileSizeDimC; auto outScratchIndex = nonContractId + contractId * Properties::LocalThreadSizeNC; const StorageIndex globalNonContractDimOffset = nonContractGroupOffset + nonContractId; const StorageIndex globalContractDimOffset = contractGroupOffset + contractId; auto local_output = scratch_ptr + OutScratchOffset; const bool is_internal = nonContractDim - nonContractGroupOffset >= Properties::TileSizeDimNC && contractDim - contractGroupOffset >= Properties::TileSizeDimC; is_internal ? compute_panel(itemID, vec, mat, local_output, out_ptr, #ifdef EIGEN_SYCL_LOCAL_MEM_UNSET_OR_ON scratch_ptr, contractGroupOffset, #endif nonContractGroupOffset, linearLocalThreadId, contractDim, nonContractDim, contractId, nonContractId, globalContractDimOffset, globalNonContractDimOffset, outScratchIndex) : compute_panel(itemID, vec, mat, local_output, out_ptr, #ifdef EIGEN_SYCL_LOCAL_MEM_UNSET_OR_ON scratch_ptr, contractGroupOffset, #endif nonContractGroupOffset, linearLocalThreadId, contractDim, nonContractDim, contractId, nonContractId, globalContractDimOffset, globalNonContractDimOffset, outScratchIndex); } template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void compute_panel( const cl::sycl::nd_item<1> &itemID, const VectorMapper &vec, const TensorMapper &mat, OutScalar *local_output, OutPtr out_ptr, #ifdef EIGEN_SYCL_LOCAL_MEM_UNSET_OR_ON OutScalar *scratch_ptr, const StorageIndex contractGroupOffset, #endif const StorageIndex nonContractGroupOffset, const StorageIndex linearLocalThreadId, StorageIndex contractDim, StorageIndex nonContractDim, StorageIndex contractId, StorageIndex nonContractId, StorageIndex globalContractDimOffset, StorageIndex globalNonContractDimOffset, StorageIndex outScratchIndex) { OutScalar outScalar[Properties::WorkLoadPerThreadNC] = {OutScalar(0)}; // Reading the vector #ifdef EIGEN_SYCL_LOCAL_MEM_UNSET_OR_ON const StorageIndex vectorOffset = contractGroupOffset + linearLocalThreadId; extract_block(vec, scratch_ptr, linearLocalThreadId, vectorOffset, contractDim); itemID.barrier(cl::sycl::access::fence_space::local_space); auto in_scratch_ptr = scratch_ptr + contractId; #endif StorageIndex privateOffsetC = 0; EIGEN_UNROLL_LOOP for (StorageIndex i = 0; i < Properties::WorkLoadPerThreadC; i++) { StorageIndex privateOffsetNC = 0; bool contract_conds = ((globalContractDimOffset + privateOffsetC) < contractDim); #ifdef EIGEN_SYCL_LOCAL_MEM_UNSET_OR_ON auto vecScalar = *in_scratch_ptr; #else auto vecScalar = (check_boundary(contract_conds)) ? vec(is_lhs_vec ? StorageIndex(0) : globalContractDimOffset + privateOffsetC, is_lhs_vec ? globalContractDimOffset + privateOffsetC : StorageIndex(0)) : OutScalar(0); #endif EIGEN_UNROLL_LOOP for (StorageIndex j = 0; j < Properties::WorkLoadPerThreadNC; j++) { auto matScalar = (check_boundary( contract_conds && ((globalNonContractDimOffset + privateOffsetNC) < nonContractDim))) ? mat(is_lhs_vec ? globalContractDimOffset + privateOffsetC : globalNonContractDimOffset + privateOffsetNC, is_lhs_vec ? globalNonContractDimOffset + privateOffsetNC : globalContractDimOffset + privateOffsetC) : OutScalar(0); outScalar[j] = cl::sycl::mad(matScalar, vecScalar, outScalar[j]); privateOffsetNC += Properties::LocalThreadSizeNC; } privateOffsetC += Properties::LocalThreadSizeC; #ifdef EIGEN_SYCL_LOCAL_MEM_UNSET_OR_ON in_scratch_ptr += Properties::LocalThreadSizeC; #endif } auto out_scratch_ptr = local_output + outScratchIndex; // Each block of 16*16 element in shared memory should reduce to 16*1 EIGEN_UNROLL_LOOP for (StorageIndex j = 0; j < Properties::WorkLoadPerThreadNC; j++) { *out_scratch_ptr = outScalar[j]; out_scratch_ptr += (Properties::LocalThreadSizeNC * Properties::LocalThreadSizeC); } if (is_lhs_vec) { nonContractId = linearLocalThreadId % Properties::LocalThreadSizeNC; contractId = linearLocalThreadId / Properties::LocalThreadSizeNC; outScratchIndex = nonContractId + contractId * Properties::LocalThreadSizeNC; } out_scratch_ptr = local_output + outScratchIndex; EIGEN_UNROLL_LOOP for (StorageIndex j = 0; j < Properties::WorkLoadPerThreadNC; j++) { EIGEN_UNROLL_LOOP for (StorageIndex offset = Properties::LocalThreadSizeC >> 1; offset > 0; offset >>= 1) { itemID.barrier(cl::sycl::access::fence_space::local_space); if (contractId < offset) { StorageIndex myNeigbourId = (Properties::LocalThreadSizeNC * offset); *out_scratch_ptr += out_scratch_ptr[myNeigbourId]; } } // moving to next 16 by 16 block out_scratch_ptr += (Properties::LocalThreadSizeNC * Properties::LocalThreadSizeC); } if (contractId == 0) { out_scratch_ptr = local_output + nonContractId; StorageIndex global_final_offset = nonContractGroupOffset + nonContractId; out_ptr += global_final_offset; EIGEN_UNROLL_LOOP for (StorageIndex j = 0; j < Properties::WorkLoadPerThreadNC; j++) { if (check_boundary(global_final_offset < nonContractDim)) { auto res = *out_scratch_ptr; *out_ptr = res; out_ptr += Properties::LocalThreadSizeNC; } // moving to next 16 by 16 block to ge the next 16 reduced elements out_scratch_ptr += (Properties::LocalThreadSizeNC * Properties::LocalThreadSizeC); if (!(is_internal_block)) global_final_offset += Properties::LocalThreadSizeNC; } } } template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void extract_block(const Input &inpt, Local *local_ptr, const StorageIndex &linearLocalThreadId, const StorageIndex &cOffset, const StorageIndex &C) { local_ptr += InputBlockProperties::c_stride * linearLocalThreadId; StorageIndex cIndex = cOffset; for (StorageIndex cId = 0; cId < CFactor / InputBlockProperties::c_stride; cId++) { if (check_boundary(cIndex + InputBlockProperties::c_stride - 1 < C)) { auto val = read(inpt, StorageIndex(0), cIndex, StorageIndex(1)); write(val, local_ptr); } else { EIGEN_UNROLL_LOOP for (StorageIndex i = 0; i < InputBlockProperties::elements_per_access; i++) { OutScalar val = (cIndex + i < C) ? read( inpt, StorageIndex(0), cIndex + i, StorageIndex(1)) : OutScalar(0); write(val, local_ptr + i); } } local_ptr += InputBlockProperties::c_stride * GroupSize; cIndex += InputBlockProperties::c_stride * GroupSize; } } }; #endif #ifndef EIGEN_SYCL_DISABLE_SCALAR /*! * \brief GeneralScalarContraction is a template class that provides the scalar value of Tensor -Tensor contraction * operation, when all the dimensions are contracting dimensions. This Kernel reduces two tensors to an scalar * * \tparam OutScalar: determines the output scalar type * * \tparam LhsScalar: determines the left-hand-side scalar type * * \tparam RhsScalar: determines the right-hand-side scalar type * * \tparam OutAccessor: determines the sycl accessor type for out put (please see the sycl-1.2.1 specification * (https://www.khronos.org/registry/SYCL/specs/sycl-1.2.1.pdf) for accessor definition) * * \tparam LhsMapper: determines the tensor contraction mapper type for left-hand-side matrix * * \tparam RhsMapper: determines the tensor contraction mapper type for right-hand-side matrix * * \tparam StorageIndex: determines the StorageIndex Type * * \tparam Vectorizable: determines whether or not the vectorization is enabled for the Eigen expression. * * \param scratch: local memory containing tiles of LHS and RHS tensors for each work-group * * \param lhs: determines the left-hand-side flattened tensor (tensor mapper) * * \param rhs: determines the right-hand-side flattened tensor (tensor mapper) * * \param out_res: determines the output tensor containing the contraction result * * \param rng: determins the total input data size */ template struct GeneralScalarContraction { typedef cl::sycl::accessor Scratch; Scratch scratch; const LhsMapper lhs; const RhsMapper rhs; OutAccessor out_res; const StorageIndex rng; EIGEN_DEVICE_FUNC GeneralScalarContraction(Scratch scratch_, const LhsMapper lhs_, const RhsMapper rhs_, OutAccessor out_res_, const StorageIndex rng_) : scratch(scratch_), lhs(lhs_), rhs(rhs_), out_res(out_res_), rng(rng_) {} EIGEN_DEVICE_FUNC void operator()(cl::sycl::nd_item<1> itemID) { auto out_ptr = out_res.get_pointer(); auto scratch_ptr = scratch.get_pointer().get(); StorageIndex globalid = itemID.get_global_id(0); StorageIndex localid = itemID.get_local_id(0); OutScalar accumulator = OutScalar(0); for (StorageIndex i = globalid; i < rng; i += itemID.get_global_range(0)) { accumulator = cl::sycl::mad(lhs(0, i), rhs(i, 0), accumulator); } auto out_scratch_ptr = scratch_ptr + localid; *out_scratch_ptr = accumulator; for (StorageIndex offset = itemID.get_local_range(0) >> 1; offset > 0; offset >>= 1) { itemID.barrier(cl::sycl::access::fence_space::local_space); if (localid < offset) { *out_scratch_ptr = (accumulator += out_scratch_ptr[offset]); } } if (localid == 0) { out_ptr[itemID.get_group(0)] = accumulator; } } }; #endif } // namespace internal } // namespace TensorSycl template struct TensorEvaluator, Eigen::SyclDevice> : public TensorContractionEvaluatorBase, Eigen::SyclDevice>> { static_assert(std::is_same::value, "SYCL tensor contraction does not support output kernels."); typedef Eigen::SyclDevice Device; typedef TensorEvaluator, Device> Self; typedef TensorContractionEvaluatorBase Base; typedef TensorContractionOp XprType; typedef typename internal::remove_const::type Scalar; typedef typename XprType::Index StorageIndex; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename PacketType::type PacketReturnType; typedef typename Base::Storage Storage; typedef typename Base::EvaluatorPointerType EvaluatorPointerType; struct TripleDim { const StorageIndex M; const StorageIndex N; const StorageIndex K; TripleDim(const StorageIndex M_, const StorageIndex N_, const StorageIndex K_) : M(M_), N(N_), K(K_) {} }; enum { Layout = TensorEvaluator::Layout, PacketAccess = (PacketType::size > 1), BlockAccess = false, }; static EIGEN_CONSTEXPR int LDims = Base::LDims; static EIGEN_CONSTEXPR int RDims = Base::RDims; static EIGEN_CONSTEXPR int ContractDims = Base::ContractDims; typedef array left_dim_mapper_t; typedef array right_dim_mapper_t; typedef array contract_t; typedef array left_nocontract_t; typedef array right_nocontract_t; static const int NumDims = LDims + RDims - 2 * ContractDims; typedef DSizes Dimensions; typedef TensorEvaluator LeftEvaluator; typedef TensorEvaluator RightEvaluator; typedef typename Eigen::internal::remove_const::type LhsScalar; typedef typename Eigen::internal::remove_const::type RhsScalar; typedef typename LeftEvaluator::Dimensions LeftDimensions; typedef typename RightEvaluator::Dimensions RightDimensions; template struct input_mapper_propertis { static EIGEN_CONSTEXPR bool is_lhs_matrix = (LDims == 2 && ContractDims == 1) || lhs_inner_dim_contiguous; static EIGEN_CONSTEXPR bool is_rhs_matrix = (RDims == 2 && ContractDims == 1) || (rhs_inner_dim_contiguous && !rhs_inner_dim_reordered); }; TensorEvaluator(const XprType &op, const Device &device) : Base(op, device) {} // We need to redefine this method to make nvcc happy EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(typename Base::EvaluatorPointerType data) { this->m_leftImpl.evalSubExprsIfNeeded(NULL); this->m_rightImpl.evalSubExprsIfNeeded(NULL); if (!data) { this->m_result = this->m_device.get( static_cast(this->m_device.allocate_temp(this->dimensions().TotalSize() * sizeof(Scalar)))); data = this->m_result; } evalToSycl(data); return (this->m_result != NULL); } const Eigen::SyclDevice &device() const { return this->m_device; } void evalToSycl(typename Base::EvaluatorPointerType buffer) const { if (this->m_lhs_inner_dim_contiguous) { if (this->m_rhs_inner_dim_contiguous) { if (this->m_rhs_inner_dim_reordered) { evalTyped(buffer); } else { evalTyped(buffer); } } else { if (this->m_rhs_inner_dim_reordered) { evalTyped(buffer); } else { evalTyped(buffer); } } } else { if (this->m_rhs_inner_dim_contiguous) { if (this->m_rhs_inner_dim_reordered) { evalTyped(buffer); } else { evalTyped(buffer); } } else { if (this->m_rhs_inner_dim_reordered) { evalTyped(buffer); } else { evalTyped(buffer); } } } } template void evalTyped(typename Base::EvaluatorPointerType buffer) const { const auto triple_dim = TripleDim{this->m_i_size, this->m_j_size, this->m_k_size}; typedef internal::TensorContractionInputMapper< LhsScalar, StorageIndex, internal::Lhs, LeftEvaluator, left_nocontract_t, contract_t, PacketType::size, lhs_inner_dim_contiguous, false, Unaligned, MakeSYCLPointer> LhsMapper; typedef internal::TensorContractionInputMapper::size, rhs_inner_dim_contiguous, rhs_inner_dim_reordered, Unaligned, MakeSYCLPointer> RhsMapper; // initialize data mappers LhsMapper lhs(this->m_leftImpl, this->m_left_nocontract_strides, this->m_i_strides, this->m_left_contracting_strides, this->m_k_strides); RhsMapper rhs(this->m_rightImpl, this->m_right_nocontract_strides, this->m_j_strides, this->m_right_contracting_strides, this->m_k_strides); #ifndef EIGEN_SYCL_DISABLE_SCALAR if (triple_dim.M == 1 && triple_dim.N == 1) { launchSC(buffer, lhs, rhs, triple_dim.K); } else #endif #ifndef EIGEN_SYCL_DISABLE_GEMV if (triple_dim.M != 1 && triple_dim.N == 1) { LaunchVT(buffer, rhs, lhs, triple_dim.M, triple_dim.K); } else if (triple_dim.M == 1 && triple_dim.N != 1) { LaunchVT(buffer, lhs, rhs, triple_dim.N, triple_dim.K); } else // This is equivalent of if (m!=1 && n!=1) #endif { typedef input_mapper_propertis inpt_mapper_properties; #ifndef EIGEN_SYCL_DISABLE_SKINNY bool skinny = false; auto platform_name = this->device().getPlatformName(); // This is based on empirical calculation for AMD r9-nano and Fiji if (platform_name.find("AMD") == 0) { skinny = (triple_dim.M < triple_dim.K || triple_dim.N < triple_dim.K) && ((triple_dim.M < 1024 && triple_dim.N < 1024) || (uint64_t(triple_dim.M * triple_dim.N) < uint64_t(triple_dim.K))); } else { skinny = (((std::max(triple_dim.K, triple_dim.N) / std::min(triple_dim.K, triple_dim.N)) > 100) || ((std::max(triple_dim.K, triple_dim.M) / std::min(triple_dim.K, triple_dim.M)) > 100) || ((std::max(triple_dim.N, triple_dim.M) / std::min(triple_dim.N, triple_dim.M)) > 100)); } if (skinny) adjustTT(buffer, lhs, rhs, triple_dim); else #endif // EIGEN_SYCL_DISABLE_SKINNY adjustTT(buffer, lhs, rhs, triple_dim); } } template void EIGEN_ALWAYS_INLINE adjustTT(EvaluatorPointerType buffer, const LhsMapper &lhs, const RhsMapper &rhs, const TripleDim &triple_dim) const { #ifdef EIGEN_SYCL_LOCAL_MEM_UNSET_OR_ON if (device().has_local_memory()) { typedef TensorSycl::internal::TTPanelSize PanelParameters; launchTT( buffer, lhs, rhs, triple_dim); } #endif #ifdef EIGEN_SYCL_LOCAL_MEM_UNSET_OR_OFF if (!(device().has_local_memory())) { typedef TensorSycl::internal::TTPanelSize PanelParameters; launchTT( buffer, lhs, rhs, triple_dim); } #endif } template void launchTT(EvaluatorPointerType buffer, const LhsMapper &lhs, const RhsMapper &rhs, const TripleDim &triple_dim) const { const StorageIndex roundUpM = Eigen::TensorSycl::internal::roundUp(triple_dim.M, Properties::TileSizeDimM); const StorageIndex roundUpN = Eigen::TensorSycl::internal::roundUp(triple_dim.N, Properties::TileSizeDimN); const StorageIndex groupSizeM = roundUpM / Properties::TileSizeDimM; const StorageIndex groupSizeN = roundUpN / Properties::TileSizeDimN; const StorageIndex roundUpK = Eigen::TensorSycl::internal::roundUp(triple_dim.K, Properties::TileSizeDimK); StorageIndex totalTilesK = roundUpK / Properties::TileSizeDimK; StorageIndex groupSizeK = skinny ? std::max(std::min(totalTilesK, (StorageIndex)(device().getPowerOfTwo(device().getNumSyclMultiProcessors(), true) * 4) / (groupSizeM * groupSizeN)), StorageIndex(1)) : StorageIndex(1); const StorageIndex numTilesPerGroup = Eigen::TensorSycl::internal::roundUp(totalTilesK, groupSizeK) / groupSizeK; const StorageIndex totalGroupSize = groupSizeM * groupSizeN * groupSizeK; const StorageIndex localRange = Properties::LocalThreadSizeM * Properties::LocalThreadSizeN; const StorageIndex globalRange = totalGroupSize * localRange; const StorageIndex scratchSize = (ct == TensorSycl::internal::contraction_type::local) ? ((Properties::DoubleBuffer + 1) * (Properties::TileSizeDimM + Properties::BC) * (Properties::TileSizeDimK)) + ((Properties::DoubleBuffer + 1) * (Properties::TileSizeDimK) * (Properties::TileSizeDimN + Properties::BC)) : StorageIndex(1); auto thread_range = cl::sycl::nd_range<1>(cl::sycl::range<1>(globalRange), cl::sycl::range<1>(localRange)); if (groupSizeK == 1) { typedef TensorSycl::internal::TensorContractionKernel ContractKernelName; device().template binary_kernel_launcher( lhs, rhs, buffer, thread_range, scratchSize, groupSizeM, groupSizeN, numTilesPerGroup, triple_dim); } else { typedef TensorSycl::internal::TensorContractionKernel ContractKernelName; CoeffReturnType *temp_pointer = static_cast( device().allocate_temp(triple_dim.M * triple_dim.N * groupSizeK * sizeof(CoeffReturnType))); EvaluatorPointerType tmp_global_accessor = device().get(temp_pointer); device().template binary_kernel_launcher( lhs, rhs, tmp_global_accessor, thread_range, scratchSize, groupSizeM, groupSizeN, numTilesPerGroup, triple_dim); typedef Eigen::internal::SumReducer Op; auto op = Op(); typedef TensorSycl::internal::SecondStepPartialReduction ReductionKernel; device().template unary_kernel_launcher( tmp_global_accessor, buffer, cl::sycl::nd_range<1>(cl::sycl::range<1>(StorageIndex( Eigen::TensorSycl::internal::roundUp(triple_dim.M * triple_dim.N, localRange))), cl::sycl::range<1>(localRange)), StorageIndex(1), op, StorageIndex(triple_dim.M * triple_dim.N), groupSizeK); device().deallocate_temp(temp_pointer); } } #ifndef EIGEN_SYCL_DISABLE_GEMV template void EIGEN_ALWAYS_INLINE LaunchVT(EvaluatorPointerType buffer, const VectorMapper &vec, const TensorMapper &mat, StorageIndex NC, StorageIndex C) const { const StorageIndex nonContractDim = NC; EIGEN_CONSTEXPR StorageIndex NCFactor = 1; EIGEN_CONSTEXPR StorageIndex CFactor = 1; EIGEN_CONSTEXPR StorageIndex NCWindow = 16; typedef Eigen::TensorSycl::internal::TVPanelSize Properties; const StorageIndex roundUpC = Eigen::TensorSycl::internal::roundUp(C, Properties::TileSizeDimC); const StorageIndex cNumGroups = roundUpC / (Properties::LocalThreadSizeC * Properties::WorkLoadPerThreadC); const StorageIndex roundUpNC = Eigen::TensorSycl::internal::roundUp(nonContractDim, Properties::TileSizeDimNC); const StorageIndex nCNumGroups = roundUpNC / (Properties::LocalThreadSizeNC * Properties::WorkLoadPerThreadNC); const StorageIndex globalRange = (roundUpNC / (Properties::WorkLoadPerThreadNC)) * (roundUpC / (Properties::WorkLoadPerThreadC)); const StorageIndex localRange = Properties::LocalThreadSizeNC * Properties::LocalThreadSizeC; const StorageIndex scratchSize = (Properties::WorkLoadPerThreadNC + CFactor) * Properties::LocalThreadSizeC * Properties::LocalThreadSizeNC; auto thread_range = cl::sycl::nd_range<1>(cl::sycl::range<1>(globalRange), cl::sycl::range<1>(localRange)); if (cNumGroups > 1) { typedef Eigen::TensorSycl::internal::GeneralVectorTensor ContractKernelName; CoeffReturnType *temp_pointer = static_cast(device().allocate_temp(nonContractDim * cNumGroups * sizeof(CoeffReturnType))); EvaluatorPointerType tmp_global_accessor = device().get(temp_pointer); device().template binary_kernel_launcher( vec, mat, tmp_global_accessor, thread_range, scratchSize, nCNumGroups, nonContractDim, C); typedef Eigen::internal::SumReducer Op; typedef TensorSycl::internal::SecondStepPartialReduction ReductionKernel; device().template unary_kernel_launcher( tmp_global_accessor, buffer, cl::sycl::nd_range<1>(cl::sycl::range<1>(Eigen::TensorSycl::internal::roundUp(nonContractDim, localRange)), cl::sycl::range<1>(localRange)), StorageIndex(1), Op(), nonContractDim, cNumGroups); device().deallocate_temp(temp_pointer); } else { typedef Eigen::TensorSycl::internal::GeneralVectorTensor ContractKernelName; device().template binary_kernel_launcher( vec, mat, buffer, thread_range, scratchSize, nCNumGroups, nonContractDim, C); } } #endif #ifndef EIGEN_SYCL_DISABLE_SCALAR template EIGEN_ALWAYS_INLINE void launchSC(EvaluatorPointerType buffer, const LhsMapper &lhs, const RhsMapper &rhs, StorageIndex K) const { EIGEN_STATIC_ASSERT(!((EIGEN_SYCL_LOCAL_THREAD_DIM0 * EIGEN_SYCL_LOCAL_THREAD_DIM1) & (EIGEN_SYCL_LOCAL_THREAD_DIM0 * EIGEN_SYCL_LOCAL_THREAD_DIM1 - 1)), "The Local thread size must be a power of 2 for the reduction " "operation"); EIGEN_CONSTEXPR StorageIndex local_range = EIGEN_SYCL_LOCAL_THREAD_DIM0 * EIGEN_SYCL_LOCAL_THREAD_DIM1; // Here we force the code not to be more than 2-step reduction: Our empirical research shows that if each thread // reduces at least 512 elementss individually, we get better performance. const StorageIndex num_work_group = ((K + (512 * local_range - 1)) / (512 * local_range) > 1 ? local_range : 1); const StorageIndex global_range = num_work_group * local_range; typedef Eigen::TensorSycl::internal::GeneralScalarContraction< CoeffReturnType, LhsScalar, RhsScalar, EvaluatorPointerType, LhsMapper, RhsMapper, StorageIndex, false> ContractKernelName; auto thread_range = cl::sycl::nd_range<1>(cl::sycl::range<1>(global_range), cl::sycl::range<1>(local_range)); if (num_work_group > 1) { CoeffReturnType *temp_pointer = static_cast(device().allocate_temp(num_work_group * sizeof(CoeffReturnType))); EvaluatorPointerType tmp_global_accessor = device().get(temp_pointer); device().template binary_kernel_launcher(lhs, rhs, tmp_global_accessor, thread_range, local_range, K); typedef Eigen::internal::SumReducer Op; typedef TensorSycl::internal::SecondStepFullReducer GenericRKernel; device().template unary_kernel_launcher( tmp_global_accessor, buffer, cl::sycl::nd_range<1>(cl::sycl::range<1>(local_range), cl::sycl::range<1>(local_range)), local_range, Op()); device().deallocate_temp(temp_pointer); } else { device().template binary_kernel_launcher(lhs, rhs, buffer, thread_range, local_range, K); } } #endif EIGEN_STRONG_INLINE void cleanup() { this->m_leftImpl.cleanup(); this->m_rightImpl.cleanup(); if (this->m_result) { this->m_device.deallocate_temp(this->m_result); this->m_result = NULL; } } // The placeholder accessors must bound to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler &cgh) const { this->m_leftImpl.bind(cgh); this->m_rightImpl.bind(cgh); this->m_result.bind(cgh); } }; } // namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_CONTRACTION_SYCL_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorPadding.h0000644000176200001440000007013414562417221025735 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_PADDING_H #define EIGEN_CXX11_TENSOR_TENSOR_PADDING_H namespace Eigen { /** \class TensorPadding * \ingroup CXX11_Tensor_Module * * \brief Tensor padding class. * At the moment only padding with a constant value is supported. * */ namespace internal { template struct traits > : public traits { typedef typename XprType::Scalar Scalar; typedef traits XprTraits; typedef typename XprTraits::StorageKind StorageKind; typedef typename XprTraits::Index Index; typedef typename XprType::Nested Nested; typedef typename remove_reference::type _Nested; static const int NumDimensions = XprTraits::NumDimensions; static const int Layout = XprTraits::Layout; typedef typename XprTraits::PointerType PointerType; }; template struct eval, Eigen::Dense> { typedef const TensorPaddingOp& type; }; template struct nested, 1, typename eval >::type> { typedef TensorPaddingOp type; }; } // end namespace internal template class TensorPaddingOp : public TensorBase, ReadOnlyAccessors> { public: typedef typename Eigen::internal::traits::Scalar Scalar; typedef typename Eigen::NumTraits::Real RealScalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename Eigen::internal::nested::type Nested; typedef typename Eigen::internal::traits::StorageKind StorageKind; typedef typename Eigen::internal::traits::Index Index; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorPaddingOp(const XprType& expr, const PaddingDimensions& padding_dims, const Scalar padding_value) : m_xpr(expr), m_padding_dims(padding_dims), m_padding_value(padding_value) {} EIGEN_DEVICE_FUNC const PaddingDimensions& padding() const { return m_padding_dims; } EIGEN_DEVICE_FUNC Scalar padding_value() const { return m_padding_value; } EIGEN_DEVICE_FUNC const typename internal::remove_all::type& expression() const { return m_xpr; } protected: typename XprType::Nested m_xpr; const PaddingDimensions m_padding_dims; const Scalar m_padding_value; }; // Eval as rvalue template struct TensorEvaluator, Device> { typedef TensorPaddingOp XprType; typedef typename XprType::Index Index; static const int NumDims = internal::array_size::value; typedef DSizes Dimensions; typedef typename XprType::Scalar Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename PacketType::type PacketReturnType; static const int PacketSize = PacketType::size; typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; enum { IsAligned = true, PacketAccess = TensorEvaluator::PacketAccess, BlockAccess = TensorEvaluator::RawAccess, PreferBlockAccess = true, Layout = TensorEvaluator::Layout, CoordAccess = true, RawAccess = false }; typedef typename internal::remove_const::type ScalarNoConst; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockDescriptor TensorBlockDesc; typedef internal::TensorBlockScratchAllocator TensorBlockScratch; typedef typename internal::TensorMaterializedBlock TensorBlock; //===--------------------------------------------------------------------===// EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device) : m_impl(op.expression(), device), m_padding(op.padding()), m_paddingValue(op.padding_value()), m_device(device) { // The padding op doesn't change the rank of the tensor. Directly padding a scalar would lead // to a vector, which doesn't make sense. Instead one should reshape the scalar into a vector // of 1 element first and then pad. EIGEN_STATIC_ASSERT((NumDims > 0), YOU_MADE_A_PROGRAMMING_MISTAKE); // Compute dimensions m_dimensions = m_impl.dimensions(); for (int i = 0; i < NumDims; ++i) { m_dimensions[i] += m_padding[i].first + m_padding[i].second; } const typename TensorEvaluator::Dimensions& input_dims = m_impl.dimensions(); if (static_cast(Layout) == static_cast(ColMajor)) { m_inputStrides[0] = 1; m_outputStrides[0] = 1; for (int i = 1; i < NumDims; ++i) { m_inputStrides[i] = m_inputStrides[i-1] * input_dims[i-1]; m_outputStrides[i] = m_outputStrides[i-1] * m_dimensions[i-1]; } m_outputStrides[NumDims] = m_outputStrides[NumDims-1] * m_dimensions[NumDims-1]; } else { m_inputStrides[NumDims - 1] = 1; m_outputStrides[NumDims] = 1; for (int i = NumDims - 2; i >= 0; --i) { m_inputStrides[i] = m_inputStrides[i+1] * input_dims[i+1]; m_outputStrides[i+1] = m_outputStrides[i+2] * m_dimensions[i+1]; } m_outputStrides[0] = m_outputStrides[1] * m_dimensions[0]; } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; } EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(EvaluatorPointerType) { m_impl.evalSubExprsIfNeeded(NULL); return true; } #ifdef EIGEN_USE_THREADS template EIGEN_STRONG_INLINE void evalSubExprsIfNeededAsync( EvaluatorPointerType, EvalSubExprsCallback done) { m_impl.evalSubExprsIfNeededAsync(nullptr, [done](bool) { done(true); }); } #endif // EIGEN_USE_THREADS EIGEN_STRONG_INLINE void cleanup() { m_impl.cleanup(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { eigen_assert(index < dimensions().TotalSize()); Index inputIndex = 0; if (static_cast(Layout) == static_cast(ColMajor)) { EIGEN_UNROLL_LOOP for (int i = NumDims - 1; i > 0; --i) { const Index idx = index / m_outputStrides[i]; if (isPaddingAtIndexForDim(idx, i)) { return m_paddingValue; } inputIndex += (idx - m_padding[i].first) * m_inputStrides[i]; index -= idx * m_outputStrides[i]; } if (isPaddingAtIndexForDim(index, 0)) { return m_paddingValue; } inputIndex += (index - m_padding[0].first); } else { EIGEN_UNROLL_LOOP for (int i = 0; i < NumDims - 1; ++i) { const Index idx = index / m_outputStrides[i+1]; if (isPaddingAtIndexForDim(idx, i)) { return m_paddingValue; } inputIndex += (idx - m_padding[i].first) * m_inputStrides[i]; index -= idx * m_outputStrides[i+1]; } if (isPaddingAtIndexForDim(index, NumDims-1)) { return m_paddingValue; } inputIndex += (index - m_padding[NumDims-1].first); } return m_impl.coeff(inputIndex); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const { if (static_cast(Layout) == static_cast(ColMajor)) { return packetColMajor(index); } return packetRowMajor(index); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost costPerCoeff(bool vectorized) const { TensorOpCost cost = m_impl.costPerCoeff(vectorized); if (static_cast(Layout) == static_cast(ColMajor)) { EIGEN_UNROLL_LOOP for (int i = 0; i < NumDims; ++i) updateCostPerDimension(cost, i, i == 0); } else { EIGEN_UNROLL_LOOP for (int i = NumDims - 1; i >= 0; --i) updateCostPerDimension(cost, i, i == NumDims - 1); } return cost; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE internal::TensorBlockResourceRequirements getResourceRequirements() const { const size_t target_size = m_device.lastLevelCacheSize(); return internal::TensorBlockResourceRequirements::merge( internal::TensorBlockResourceRequirements::skewed(target_size), m_impl.getResourceRequirements()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorBlock block(TensorBlockDesc& desc, TensorBlockScratch& scratch, bool /*root_of_expr_ast*/ = false) const { // If one of the dimensions is zero, return empty block view. if (desc.size() == 0) { return TensorBlock(internal::TensorBlockKind::kView, NULL, desc.dimensions()); } static const bool IsColMajor = Layout == static_cast(ColMajor); const int inner_dim_idx = IsColMajor ? 0 : NumDims - 1; Index offset = desc.offset(); // Compute offsets in the output tensor corresponding to the desc.offset(). DSizes output_offsets; for (int i = NumDims - 1; i > 0; --i) { const int dim = IsColMajor ? i : NumDims - i - 1; const int stride_dim = IsColMajor ? dim : dim + 1; output_offsets[dim] = offset / m_outputStrides[stride_dim]; offset -= output_offsets[dim] * m_outputStrides[stride_dim]; } output_offsets[inner_dim_idx] = offset; // Offsets in the input corresponding to output offsets. DSizes input_offsets = output_offsets; for (int i = 0; i < NumDims; ++i) { const int dim = IsColMajor ? i : NumDims - i - 1; input_offsets[dim] = input_offsets[dim] - m_padding[dim].first; } // Compute offset in the input buffer (at this point it might be illegal and // point outside of the input buffer, because we don't check for negative // offsets, it will be autocorrected in the block iteration loop below). Index input_offset = 0; for (int i = 0; i < NumDims; ++i) { const int dim = IsColMajor ? i : NumDims - i - 1; input_offset += input_offsets[dim] * m_inputStrides[dim]; } // Destination buffer and scratch buffer both indexed from 0 and have the // same dimensions as the requested block (for destination buffer this // property is guaranteed by `desc.destination()`). Index output_offset = 0; const DSizes output_strides = internal::strides(desc.dimensions()); // NOTE(ezhulenev): We initialize bock iteration state for `NumDims - 1` // dimensions, skipping innermost dimension. In theory it should be possible // to squeeze matching innermost dimensions, however in practice that did // not show any improvements in benchmarks. Also in practice first outer // dimension usually has padding, and will prevent squeezing. // Initialize output block iterator state. Dimension in this array are // always in inner_most -> outer_most order (col major layout). array it; for (int i = 0; i < NumDims - 1; ++i) { const int dim = IsColMajor ? i + 1 : NumDims - i - 2; it[i].count = 0; it[i].size = desc.dimension(dim); it[i].input_stride = m_inputStrides[dim]; it[i].input_span = it[i].input_stride * (it[i].size - 1); it[i].output_stride = output_strides[dim]; it[i].output_span = it[i].output_stride * (it[i].size - 1); } const Index input_inner_dim_size = static_cast(m_impl.dimensions()[inner_dim_idx]); // Total output size. const Index output_size = desc.size(); // We will fill inner dimension of this size in the output. It might be // larger than the inner dimension in the input, so we might have to pad // before/after we copy values from the input inner dimension. const Index output_inner_dim_size = desc.dimension(inner_dim_idx); // How many values to fill with padding BEFORE reading from the input inner // dimension. const Index output_inner_pad_before_size = input_offsets[inner_dim_idx] < 0 ? numext::mini(numext::abs(input_offsets[inner_dim_idx]), output_inner_dim_size) : 0; // How many values we can actually copy from the input inner dimension. const Index output_inner_copy_size = numext::mini( // Want to copy from input. (output_inner_dim_size - output_inner_pad_before_size), // Can copy from input. numext::maxi(input_inner_dim_size - (input_offsets[inner_dim_idx] + output_inner_pad_before_size), Index(0))); eigen_assert(output_inner_copy_size >= 0); // How many values to fill with padding AFTER reading from the input inner // dimension. const Index output_inner_pad_after_size = (output_inner_dim_size - output_inner_copy_size - output_inner_pad_before_size); // Sanity check, sum of all sizes must be equal to the output size. eigen_assert(output_inner_dim_size == (output_inner_pad_before_size + output_inner_copy_size + output_inner_pad_after_size)); // Keep track of current coordinates and padding in the output. DSizes output_coord = output_offsets; DSizes output_padded; for (int i = 0; i < NumDims; ++i) { const int dim = IsColMajor ? i : NumDims - i - 1; output_padded[dim] = isPaddingAtIndexForDim(output_coord[dim], dim); } typedef internal::StridedLinearBufferCopy LinCopy; // Prepare storage for the materialized padding result. const typename TensorBlock::Storage block_storage = TensorBlock::prepareStorage(desc, scratch); // TODO(ezhulenev): Squeeze multiple non-padded inner dimensions into a // single logical inner dimension. // When possible we squeeze writes for the innermost (only if non-padded) // dimension with the first padded dimension. This allows to reduce the // number of calls to LinCopy and better utilize vector instructions. const bool squeeze_writes = NumDims > 1 && // inner dimension is not padded (input_inner_dim_size == m_dimensions[inner_dim_idx]) && // and equal to the block inner dimension (input_inner_dim_size == output_inner_dim_size); const int squeeze_dim = IsColMajor ? inner_dim_idx + 1 : inner_dim_idx - 1; // Maximum coordinate on a squeeze dimension that we can write to. const Index squeeze_max_coord = squeeze_writes ? numext::mini( // max non-padded element in the input static_cast(m_dimensions[squeeze_dim] - m_padding[squeeze_dim].second), // max element in the output buffer static_cast(output_offsets[squeeze_dim] + desc.dimension(squeeze_dim))) : static_cast(0); // Iterate copying data from `m_impl.data()` to the output buffer. for (Index size = 0; size < output_size;) { // Detect if we are in the padded region (exclude innermost dimension). bool is_padded = false; for (int j = 1; j < NumDims; ++j) { const int dim = IsColMajor ? j : NumDims - j - 1; is_padded = output_padded[dim]; if (is_padded) break; } if (is_padded) { // Fill single innermost dimension with padding value. size += output_inner_dim_size; LinCopy::template Run( typename LinCopy::Dst(output_offset, 1, block_storage.data()), typename LinCopy::Src(0, 0, &m_paddingValue), output_inner_dim_size); } else if (squeeze_writes) { // Squeeze multiple reads from innermost dimensions. const Index squeeze_num = squeeze_max_coord - output_coord[squeeze_dim]; size += output_inner_dim_size * squeeze_num; // Copy `squeeze_num` inner dimensions from input to output. LinCopy::template Run( typename LinCopy::Dst(output_offset, 1, block_storage.data()), typename LinCopy::Src(input_offset, 1, m_impl.data()), output_inner_dim_size * squeeze_num); // Update iteration state for only `squeeze_num - 1` processed inner // dimensions, because we have another iteration state update at the end // of the loop that will update iteration state for the last inner // processed dimension. it[0].count += (squeeze_num - 1); input_offset += it[0].input_stride * (squeeze_num - 1); output_offset += it[0].output_stride * (squeeze_num - 1); output_coord[squeeze_dim] += (squeeze_num - 1); } else { // Single read from innermost dimension. size += output_inner_dim_size; { // Fill with padding before copying from input inner dimension. const Index out = output_offset; LinCopy::template Run( typename LinCopy::Dst(out, 1, block_storage.data()), typename LinCopy::Src(0, 0, &m_paddingValue), output_inner_pad_before_size); } { // Copy data from input inner dimension. const Index out = output_offset + output_inner_pad_before_size; const Index in = input_offset + output_inner_pad_before_size; eigen_assert(output_inner_copy_size == 0 || m_impl.data() != NULL); LinCopy::template Run( typename LinCopy::Dst(out, 1, block_storage.data()), typename LinCopy::Src(in, 1, m_impl.data()), output_inner_copy_size); } { // Fill with padding after copying from input inner dimension. const Index out = output_offset + output_inner_pad_before_size + output_inner_copy_size; LinCopy::template Run( typename LinCopy::Dst(out, 1, block_storage.data()), typename LinCopy::Src(0, 0, &m_paddingValue), output_inner_pad_after_size); } } for (int j = 0; j < NumDims - 1; ++j) { const int dim = IsColMajor ? j + 1 : NumDims - j - 2; if (++it[j].count < it[j].size) { input_offset += it[j].input_stride; output_offset += it[j].output_stride; output_coord[dim] += 1; output_padded[dim] = isPaddingAtIndexForDim(output_coord[dim], dim); break; } it[j].count = 0; input_offset -= it[j].input_span; output_offset -= it[j].output_span; output_coord[dim] -= it[j].size - 1; output_padded[dim] = isPaddingAtIndexForDim(output_coord[dim], dim); } } return block_storage.AsTensorMaterializedBlock(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EvaluatorPointerType data() const { return NULL; } #ifdef EIGEN_USE_SYCL // binding placeholder accessors to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler &cgh) const { m_impl.bind(cgh); } #endif private: struct BlockIteratorState { BlockIteratorState() : count(0), size(0), input_stride(0), input_span(0), output_stride(0), output_span(0) {} Index count; Index size; Index input_stride; Index input_span; Index output_stride; Index output_span; }; EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool isPaddingAtIndexForDim( Index index, int dim_index) const { #if defined(EIGEN_HAS_INDEX_LIST) return (!internal::index_pair_first_statically_eq(dim_index, 0) && index < m_padding[dim_index].first) || (!internal::index_pair_second_statically_eq(dim_index, 0) && index >= m_dimensions[dim_index] - m_padding[dim_index].second); #else return (index < m_padding[dim_index].first) || (index >= m_dimensions[dim_index] - m_padding[dim_index].second); #endif } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool isLeftPaddingCompileTimeZero( int dim_index) const { #if defined(EIGEN_HAS_INDEX_LIST) return internal::index_pair_first_statically_eq(dim_index, 0); #else EIGEN_UNUSED_VARIABLE(dim_index); return false; #endif } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool isRightPaddingCompileTimeZero( int dim_index) const { #if defined(EIGEN_HAS_INDEX_LIST) return internal::index_pair_second_statically_eq(dim_index, 0); #else EIGEN_UNUSED_VARIABLE(dim_index); return false; #endif } void updateCostPerDimension(TensorOpCost& cost, int i, bool first) const { const double in = static_cast(m_impl.dimensions()[i]); const double out = in + m_padding[i].first + m_padding[i].second; if (out == 0) return; const double reduction = in / out; cost *= reduction; if (first) { cost += TensorOpCost(0, 0, 2 * TensorOpCost::AddCost() + reduction * (1 * TensorOpCost::AddCost())); } else { cost += TensorOpCost(0, 0, 2 * TensorOpCost::AddCost() + 2 * TensorOpCost::MulCost() + reduction * (2 * TensorOpCost::MulCost() + 1 * TensorOpCost::DivCost())); } } protected: EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packetColMajor(Index index) const { EIGEN_STATIC_ASSERT((PacketSize > 1), YOU_MADE_A_PROGRAMMING_MISTAKE) eigen_assert(index+PacketSize-1 < dimensions().TotalSize()); const Index initialIndex = index; Index inputIndex = 0; EIGEN_UNROLL_LOOP for (int i = NumDims - 1; i > 0; --i) { const Index firstIdx = index; const Index lastIdx = index + PacketSize - 1; const Index lastPaddedLeft = m_padding[i].first * m_outputStrides[i]; const Index firstPaddedRight = (m_dimensions[i] - m_padding[i].second) * m_outputStrides[i]; const Index lastPaddedRight = m_outputStrides[i+1]; if (!isLeftPaddingCompileTimeZero(i) && lastIdx < lastPaddedLeft) { // all the coefficient are in the padding zone. return internal::pset1(m_paddingValue); } else if (!isRightPaddingCompileTimeZero(i) && firstIdx >= firstPaddedRight && lastIdx < lastPaddedRight) { // all the coefficient are in the padding zone. return internal::pset1(m_paddingValue); } else if ((isLeftPaddingCompileTimeZero(i) && isRightPaddingCompileTimeZero(i)) || (firstIdx >= lastPaddedLeft && lastIdx < firstPaddedRight)) { // all the coefficient are between the 2 padding zones. const Index idx = index / m_outputStrides[i]; inputIndex += (idx - m_padding[i].first) * m_inputStrides[i]; index -= idx * m_outputStrides[i]; } else { // Every other case return packetWithPossibleZero(initialIndex); } } const Index lastIdx = index + PacketSize - 1; const Index firstIdx = index; const Index lastPaddedLeft = m_padding[0].first; const Index firstPaddedRight = (m_dimensions[0] - m_padding[0].second); const Index lastPaddedRight = m_outputStrides[1]; if (!isLeftPaddingCompileTimeZero(0) && lastIdx < lastPaddedLeft) { // all the coefficient are in the padding zone. return internal::pset1(m_paddingValue); } else if (!isRightPaddingCompileTimeZero(0) && firstIdx >= firstPaddedRight && lastIdx < lastPaddedRight) { // all the coefficient are in the padding zone. return internal::pset1(m_paddingValue); } else if ((isLeftPaddingCompileTimeZero(0) && isRightPaddingCompileTimeZero(0)) || (firstIdx >= lastPaddedLeft && lastIdx < firstPaddedRight)) { // all the coefficient are between the 2 padding zones. inputIndex += (index - m_padding[0].first); return m_impl.template packet(inputIndex); } // Every other case return packetWithPossibleZero(initialIndex); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packetRowMajor(Index index) const { EIGEN_STATIC_ASSERT((PacketSize > 1), YOU_MADE_A_PROGRAMMING_MISTAKE) eigen_assert(index+PacketSize-1 < dimensions().TotalSize()); const Index initialIndex = index; Index inputIndex = 0; EIGEN_UNROLL_LOOP for (int i = 0; i < NumDims - 1; ++i) { const Index firstIdx = index; const Index lastIdx = index + PacketSize - 1; const Index lastPaddedLeft = m_padding[i].first * m_outputStrides[i+1]; const Index firstPaddedRight = (m_dimensions[i] - m_padding[i].second) * m_outputStrides[i+1]; const Index lastPaddedRight = m_outputStrides[i]; if (!isLeftPaddingCompileTimeZero(i) && lastIdx < lastPaddedLeft) { // all the coefficient are in the padding zone. return internal::pset1(m_paddingValue); } else if (!isRightPaddingCompileTimeZero(i) && firstIdx >= firstPaddedRight && lastIdx < lastPaddedRight) { // all the coefficient are in the padding zone. return internal::pset1(m_paddingValue); } else if ((isLeftPaddingCompileTimeZero(i) && isRightPaddingCompileTimeZero(i)) || (firstIdx >= lastPaddedLeft && lastIdx < firstPaddedRight)) { // all the coefficient are between the 2 padding zones. const Index idx = index / m_outputStrides[i+1]; inputIndex += (idx - m_padding[i].first) * m_inputStrides[i]; index -= idx * m_outputStrides[i+1]; } else { // Every other case return packetWithPossibleZero(initialIndex); } } const Index lastIdx = index + PacketSize - 1; const Index firstIdx = index; const Index lastPaddedLeft = m_padding[NumDims-1].first; const Index firstPaddedRight = (m_dimensions[NumDims-1] - m_padding[NumDims-1].second); const Index lastPaddedRight = m_outputStrides[NumDims-1]; if (!isLeftPaddingCompileTimeZero(NumDims-1) && lastIdx < lastPaddedLeft) { // all the coefficient are in the padding zone. return internal::pset1(m_paddingValue); } else if (!isRightPaddingCompileTimeZero(NumDims-1) && firstIdx >= firstPaddedRight && lastIdx < lastPaddedRight) { // all the coefficient are in the padding zone. return internal::pset1(m_paddingValue); } else if ((isLeftPaddingCompileTimeZero(NumDims-1) && isRightPaddingCompileTimeZero(NumDims-1)) || (firstIdx >= lastPaddedLeft && lastIdx < firstPaddedRight)) { // all the coefficient are between the 2 padding zones. inputIndex += (index - m_padding[NumDims-1].first); return m_impl.template packet(inputIndex); } // Every other case return packetWithPossibleZero(initialIndex); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packetWithPossibleZero(Index index) const { EIGEN_ALIGN_MAX typename internal::remove_const::type values[PacketSize]; EIGEN_UNROLL_LOOP for (int i = 0; i < PacketSize; ++i) { values[i] = coeff(index+i); } PacketReturnType rslt = internal::pload(values); return rslt; } Dimensions m_dimensions; array m_outputStrides; array m_inputStrides; TensorEvaluator m_impl; PaddingDimensions m_padding; Scalar m_paddingValue; const Device EIGEN_DEVICE_REF m_device; }; } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_PADDING_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorFFT.h0000644000176200001440000005743114562417221025013 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2015 Jianwei Cui // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_FFT_H #define EIGEN_CXX11_TENSOR_TENSOR_FFT_H namespace Eigen { /** \class TensorFFT * \ingroup CXX11_Tensor_Module * * \brief Tensor FFT class. * * TODO: * Vectorize the Cooley Tukey and the Bluestein algorithm * Add support for multithreaded evaluation * Improve the performance on GPU */ template struct MakeComplex { template EIGEN_DEVICE_FUNC T operator() (const T& val) const { return val; } }; template <> struct MakeComplex { template EIGEN_DEVICE_FUNC std::complex operator() (const T& val) const { return std::complex(val, 0); } }; template <> struct MakeComplex { template EIGEN_DEVICE_FUNC std::complex operator() (const std::complex& val) const { return val; } }; template struct PartOf { template T operator() (const T& val) const { return val; } }; template <> struct PartOf { template T operator() (const std::complex& val) const { return val.real(); } }; template <> struct PartOf { template T operator() (const std::complex& val) const { return val.imag(); } }; namespace internal { template struct traits > : public traits { typedef traits XprTraits; typedef typename NumTraits::Real RealScalar; typedef typename std::complex ComplexScalar; typedef typename XprTraits::Scalar InputScalar; typedef typename conditional::type OutputScalar; typedef typename XprTraits::StorageKind StorageKind; typedef typename XprTraits::Index Index; typedef typename XprType::Nested Nested; typedef typename remove_reference::type _Nested; static const int NumDimensions = XprTraits::NumDimensions; static const int Layout = XprTraits::Layout; typedef typename traits::PointerType PointerType; }; template struct eval, Eigen::Dense> { typedef const TensorFFTOp& type; }; template struct nested, 1, typename eval >::type> { typedef TensorFFTOp type; }; } // end namespace internal template class TensorFFTOp : public TensorBase, ReadOnlyAccessors> { public: typedef typename Eigen::internal::traits::Scalar Scalar; typedef typename Eigen::NumTraits::Real RealScalar; typedef typename std::complex ComplexScalar; typedef typename internal::conditional::type OutputScalar; typedef OutputScalar CoeffReturnType; typedef typename Eigen::internal::nested::type Nested; typedef typename Eigen::internal::traits::StorageKind StorageKind; typedef typename Eigen::internal::traits::Index Index; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorFFTOp(const XprType& expr, const FFT& fft) : m_xpr(expr), m_fft(fft) {} EIGEN_DEVICE_FUNC const FFT& fft() const { return m_fft; } EIGEN_DEVICE_FUNC const typename internal::remove_all::type& expression() const { return m_xpr; } protected: typename XprType::Nested m_xpr; const FFT m_fft; }; // Eval as rvalue template struct TensorEvaluator, Device> { typedef TensorFFTOp XprType; typedef typename XprType::Index Index; static const int NumDims = internal::array_size::Dimensions>::value; typedef DSizes Dimensions; typedef typename XprType::Scalar Scalar; typedef typename Eigen::NumTraits::Real RealScalar; typedef typename std::complex ComplexScalar; typedef typename TensorEvaluator::Dimensions InputDimensions; typedef internal::traits XprTraits; typedef typename XprTraits::Scalar InputScalar; typedef typename internal::conditional::type OutputScalar; typedef OutputScalar CoeffReturnType; typedef typename PacketType::type PacketReturnType; static const int PacketSize = internal::unpacket_traits::size; typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; enum { IsAligned = false, PacketAccess = true, BlockAccess = false, PreferBlockAccess = false, Layout = TensorEvaluator::Layout, CoordAccess = false, RawAccess = false }; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockNotImplemented TensorBlock; //===--------------------------------------------------------------------===// EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device) : m_fft(op.fft()), m_impl(op.expression(), device), m_data(NULL), m_device(device) { const typename TensorEvaluator::Dimensions& input_dims = m_impl.dimensions(); for (int i = 0; i < NumDims; ++i) { eigen_assert(input_dims[i] > 0); m_dimensions[i] = input_dims[i]; } if (static_cast(Layout) == static_cast(ColMajor)) { m_strides[0] = 1; for (int i = 1; i < NumDims; ++i) { m_strides[i] = m_strides[i - 1] * m_dimensions[i - 1]; } } else { m_strides[NumDims - 1] = 1; for (int i = NumDims - 2; i >= 0; --i) { m_strides[i] = m_strides[i + 1] * m_dimensions[i + 1]; } } m_size = m_dimensions.TotalSize(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; } EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(EvaluatorPointerType data) { m_impl.evalSubExprsIfNeeded(NULL); if (data) { evalToBuf(data); return false; } else { m_data = (EvaluatorPointerType)m_device.get((CoeffReturnType*)(m_device.allocate_temp(sizeof(CoeffReturnType) * m_size))); evalToBuf(m_data); return true; } } EIGEN_STRONG_INLINE void cleanup() { if (m_data) { m_device.deallocate(m_data); m_data = NULL; } m_impl.cleanup(); } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE CoeffReturnType coeff(Index index) const { return m_data[index]; } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE PacketReturnType packet(Index index) const { return internal::ploadt(m_data + index); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost costPerCoeff(bool vectorized) const { return TensorOpCost(sizeof(CoeffReturnType), 0, 0, vectorized, PacketSize); } EIGEN_DEVICE_FUNC EvaluatorPointerType data() const { return m_data; } #ifdef EIGEN_USE_SYCL // binding placeholder accessors to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler &cgh) const { m_data.bind(cgh); } #endif private: EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalToBuf(EvaluatorPointerType data) { const bool write_to_out = internal::is_same::value; ComplexScalar* buf = write_to_out ? (ComplexScalar*)data : (ComplexScalar*)m_device.allocate(sizeof(ComplexScalar) * m_size); for (Index i = 0; i < m_size; ++i) { buf[i] = MakeComplex::value>()(m_impl.coeff(i)); } for (size_t i = 0; i < m_fft.size(); ++i) { Index dim = m_fft[i]; eigen_assert(dim >= 0 && dim < NumDims); Index line_len = m_dimensions[dim]; eigen_assert(line_len >= 1); ComplexScalar* line_buf = (ComplexScalar*)m_device.allocate(sizeof(ComplexScalar) * line_len); const bool is_power_of_two = isPowerOfTwo(line_len); const Index good_composite = is_power_of_two ? 0 : findGoodComposite(line_len); const Index log_len = is_power_of_two ? getLog2(line_len) : getLog2(good_composite); ComplexScalar* a = is_power_of_two ? NULL : (ComplexScalar*)m_device.allocate(sizeof(ComplexScalar) * good_composite); ComplexScalar* b = is_power_of_two ? NULL : (ComplexScalar*)m_device.allocate(sizeof(ComplexScalar) * good_composite); ComplexScalar* pos_j_base_powered = is_power_of_two ? NULL : (ComplexScalar*)m_device.allocate(sizeof(ComplexScalar) * (line_len + 1)); if (!is_power_of_two) { // Compute twiddle factors // t_n = exp(sqrt(-1) * pi * n^2 / line_len) // for n = 0, 1,..., line_len-1. // For n > 2 we use the recurrence t_n = t_{n-1}^2 / t_{n-2} * t_1^2 // The recurrence is correct in exact arithmetic, but causes // numerical issues for large transforms, especially in // single-precision floating point. // // pos_j_base_powered[0] = ComplexScalar(1, 0); // if (line_len > 1) { // const ComplexScalar pos_j_base = ComplexScalar( // numext::cos(M_PI / line_len), numext::sin(M_PI / line_len)); // pos_j_base_powered[1] = pos_j_base; // if (line_len > 2) { // const ComplexScalar pos_j_base_sq = pos_j_base * pos_j_base; // for (int i = 2; i < line_len + 1; ++i) { // pos_j_base_powered[i] = pos_j_base_powered[i - 1] * // pos_j_base_powered[i - 1] / // pos_j_base_powered[i - 2] * // pos_j_base_sq; // } // } // } // TODO(rmlarsen): Find a way to use Eigen's vectorized sin // and cosine functions here. for (int j = 0; j < line_len + 1; ++j) { double arg = ((EIGEN_PI * j) * j) / line_len; std::complex tmp(numext::cos(arg), numext::sin(arg)); pos_j_base_powered[j] = static_cast(tmp); } } for (Index partial_index = 0; partial_index < m_size / line_len; ++partial_index) { const Index base_offset = getBaseOffsetFromIndex(partial_index, dim); // get data into line_buf const Index stride = m_strides[dim]; if (stride == 1) { m_device.memcpy(line_buf, &buf[base_offset], line_len*sizeof(ComplexScalar)); } else { Index offset = base_offset; for (int j = 0; j < line_len; ++j, offset += stride) { line_buf[j] = buf[offset]; } } // process the line if (is_power_of_two) { processDataLineCooleyTukey(line_buf, line_len, log_len); } else { processDataLineBluestein(line_buf, line_len, good_composite, log_len, a, b, pos_j_base_powered); } // write back if (FFTDir == FFT_FORWARD && stride == 1) { m_device.memcpy(&buf[base_offset], line_buf, line_len*sizeof(ComplexScalar)); } else { Index offset = base_offset; const ComplexScalar div_factor = ComplexScalar(1.0 / line_len, 0); for (int j = 0; j < line_len; ++j, offset += stride) { buf[offset] = (FFTDir == FFT_FORWARD) ? line_buf[j] : line_buf[j] * div_factor; } } } m_device.deallocate(line_buf); if (!is_power_of_two) { m_device.deallocate(a); m_device.deallocate(b); m_device.deallocate(pos_j_base_powered); } } if(!write_to_out) { for (Index i = 0; i < m_size; ++i) { data[i] = PartOf()(buf[i]); } m_device.deallocate(buf); } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE static bool isPowerOfTwo(Index x) { eigen_assert(x > 0); return !(x & (x - 1)); } // The composite number for padding, used in Bluestein's FFT algorithm EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE static Index findGoodComposite(Index n) { Index i = 2; while (i < 2 * n - 1) i *= 2; return i; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE static Index getLog2(Index m) { Index log2m = 0; while (m >>= 1) log2m++; return log2m; } // Call Cooley Tukey algorithm directly, data length must be power of 2 EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void processDataLineCooleyTukey(ComplexScalar* line_buf, Index line_len, Index log_len) { eigen_assert(isPowerOfTwo(line_len)); scramble_FFT(line_buf, line_len); compute_1D_Butterfly(line_buf, line_len, log_len); } // Call Bluestein's FFT algorithm, m is a good composite number greater than (2 * n - 1), used as the padding length EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void processDataLineBluestein(ComplexScalar* line_buf, Index line_len, Index good_composite, Index log_len, ComplexScalar* a, ComplexScalar* b, const ComplexScalar* pos_j_base_powered) { Index n = line_len; Index m = good_composite; ComplexScalar* data = line_buf; for (Index i = 0; i < n; ++i) { if(FFTDir == FFT_FORWARD) { a[i] = data[i] * numext::conj(pos_j_base_powered[i]); } else { a[i] = data[i] * pos_j_base_powered[i]; } } for (Index i = n; i < m; ++i) { a[i] = ComplexScalar(0, 0); } for (Index i = 0; i < n; ++i) { if(FFTDir == FFT_FORWARD) { b[i] = pos_j_base_powered[i]; } else { b[i] = numext::conj(pos_j_base_powered[i]); } } for (Index i = n; i < m - n; ++i) { b[i] = ComplexScalar(0, 0); } for (Index i = m - n; i < m; ++i) { if(FFTDir == FFT_FORWARD) { b[i] = pos_j_base_powered[m-i]; } else { b[i] = numext::conj(pos_j_base_powered[m-i]); } } scramble_FFT(a, m); compute_1D_Butterfly(a, m, log_len); scramble_FFT(b, m); compute_1D_Butterfly(b, m, log_len); for (Index i = 0; i < m; ++i) { a[i] *= b[i]; } scramble_FFT(a, m); compute_1D_Butterfly(a, m, log_len); //Do the scaling after ifft for (Index i = 0; i < m; ++i) { a[i] /= m; } for (Index i = 0; i < n; ++i) { if(FFTDir == FFT_FORWARD) { data[i] = a[i] * numext::conj(pos_j_base_powered[i]); } else { data[i] = a[i] * pos_j_base_powered[i]; } } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE static void scramble_FFT(ComplexScalar* data, Index n) { eigen_assert(isPowerOfTwo(n)); Index j = 1; for (Index i = 1; i < n; ++i){ if (j > i) { std::swap(data[j-1], data[i-1]); } Index m = n >> 1; while (m >= 2 && j > m) { j -= m; m >>= 1; } j += m; } } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void butterfly_2(ComplexScalar* data) { ComplexScalar tmp = data[1]; data[1] = data[0] - data[1]; data[0] += tmp; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void butterfly_4(ComplexScalar* data) { ComplexScalar tmp[4]; tmp[0] = data[0] + data[1]; tmp[1] = data[0] - data[1]; tmp[2] = data[2] + data[3]; if (Dir == FFT_FORWARD) { tmp[3] = ComplexScalar(0.0, -1.0) * (data[2] - data[3]); } else { tmp[3] = ComplexScalar(0.0, 1.0) * (data[2] - data[3]); } data[0] = tmp[0] + tmp[2]; data[1] = tmp[1] + tmp[3]; data[2] = tmp[0] - tmp[2]; data[3] = tmp[1] - tmp[3]; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void butterfly_8(ComplexScalar* data) { ComplexScalar tmp_1[8]; ComplexScalar tmp_2[8]; tmp_1[0] = data[0] + data[1]; tmp_1[1] = data[0] - data[1]; tmp_1[2] = data[2] + data[3]; if (Dir == FFT_FORWARD) { tmp_1[3] = (data[2] - data[3]) * ComplexScalar(0, -1); } else { tmp_1[3] = (data[2] - data[3]) * ComplexScalar(0, 1); } tmp_1[4] = data[4] + data[5]; tmp_1[5] = data[4] - data[5]; tmp_1[6] = data[6] + data[7]; if (Dir == FFT_FORWARD) { tmp_1[7] = (data[6] - data[7]) * ComplexScalar(0, -1); } else { tmp_1[7] = (data[6] - data[7]) * ComplexScalar(0, 1); } tmp_2[0] = tmp_1[0] + tmp_1[2]; tmp_2[1] = tmp_1[1] + tmp_1[3]; tmp_2[2] = tmp_1[0] - tmp_1[2]; tmp_2[3] = tmp_1[1] - tmp_1[3]; tmp_2[4] = tmp_1[4] + tmp_1[6]; // SQRT2DIV2 = sqrt(2)/2 #define SQRT2DIV2 0.7071067811865476 if (Dir == FFT_FORWARD) { tmp_2[5] = (tmp_1[5] + tmp_1[7]) * ComplexScalar(SQRT2DIV2, -SQRT2DIV2); tmp_2[6] = (tmp_1[4] - tmp_1[6]) * ComplexScalar(0, -1); tmp_2[7] = (tmp_1[5] - tmp_1[7]) * ComplexScalar(-SQRT2DIV2, -SQRT2DIV2); } else { tmp_2[5] = (tmp_1[5] + tmp_1[7]) * ComplexScalar(SQRT2DIV2, SQRT2DIV2); tmp_2[6] = (tmp_1[4] - tmp_1[6]) * ComplexScalar(0, 1); tmp_2[7] = (tmp_1[5] - tmp_1[7]) * ComplexScalar(-SQRT2DIV2, SQRT2DIV2); } data[0] = tmp_2[0] + tmp_2[4]; data[1] = tmp_2[1] + tmp_2[5]; data[2] = tmp_2[2] + tmp_2[6]; data[3] = tmp_2[3] + tmp_2[7]; data[4] = tmp_2[0] - tmp_2[4]; data[5] = tmp_2[1] - tmp_2[5]; data[6] = tmp_2[2] - tmp_2[6]; data[7] = tmp_2[3] - tmp_2[7]; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void butterfly_1D_merge( ComplexScalar* data, Index n, Index n_power_of_2) { // Original code: // RealScalar wtemp = std::sin(M_PI/n); // RealScalar wpi = -std::sin(2 * M_PI/n); const RealScalar wtemp = m_sin_PI_div_n_LUT[n_power_of_2]; const RealScalar wpi = (Dir == FFT_FORWARD) ? m_minus_sin_2_PI_div_n_LUT[n_power_of_2] : -m_minus_sin_2_PI_div_n_LUT[n_power_of_2]; const ComplexScalar wp(wtemp, wpi); const ComplexScalar wp_one = wp + ComplexScalar(1, 0); const ComplexScalar wp_one_2 = wp_one * wp_one; const ComplexScalar wp_one_3 = wp_one_2 * wp_one; const ComplexScalar wp_one_4 = wp_one_3 * wp_one; const Index n2 = n / 2; ComplexScalar w(1.0, 0.0); for (Index i = 0; i < n2; i += 4) { ComplexScalar temp0(data[i + n2] * w); ComplexScalar temp1(data[i + 1 + n2] * w * wp_one); ComplexScalar temp2(data[i + 2 + n2] * w * wp_one_2); ComplexScalar temp3(data[i + 3 + n2] * w * wp_one_3); w = w * wp_one_4; data[i + n2] = data[i] - temp0; data[i] += temp0; data[i + 1 + n2] = data[i + 1] - temp1; data[i + 1] += temp1; data[i + 2 + n2] = data[i + 2] - temp2; data[i + 2] += temp2; data[i + 3 + n2] = data[i + 3] - temp3; data[i + 3] += temp3; } } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void compute_1D_Butterfly( ComplexScalar* data, Index n, Index n_power_of_2) { eigen_assert(isPowerOfTwo(n)); if (n > 8) { compute_1D_Butterfly(data, n / 2, n_power_of_2 - 1); compute_1D_Butterfly(data + n / 2, n / 2, n_power_of_2 - 1); butterfly_1D_merge(data, n, n_power_of_2); } else if (n == 8) { butterfly_8(data); } else if (n == 4) { butterfly_4(data); } else if (n == 2) { butterfly_2(data); } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index getBaseOffsetFromIndex(Index index, Index omitted_dim) const { Index result = 0; if (static_cast(Layout) == static_cast(ColMajor)) { for (int i = NumDims - 1; i > omitted_dim; --i) { const Index partial_m_stride = m_strides[i] / m_dimensions[omitted_dim]; const Index idx = index / partial_m_stride; index -= idx * partial_m_stride; result += idx * m_strides[i]; } result += index; } else { for (Index i = 0; i < omitted_dim; ++i) { const Index partial_m_stride = m_strides[i] / m_dimensions[omitted_dim]; const Index idx = index / partial_m_stride; index -= idx * partial_m_stride; result += idx * m_strides[i]; } result += index; } // Value of index_coords[omitted_dim] is not determined to this step return result; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index getIndexFromOffset(Index base, Index omitted_dim, Index offset) const { Index result = base + offset * m_strides[omitted_dim] ; return result; } protected: Index m_size; const FFT EIGEN_DEVICE_REF m_fft; Dimensions m_dimensions; array m_strides; TensorEvaluator m_impl; EvaluatorPointerType m_data; const Device EIGEN_DEVICE_REF m_device; // This will support a maximum FFT size of 2^32 for each dimension // m_sin_PI_div_n_LUT[i] = (-2) * std::sin(M_PI / std::pow(2,i)) ^ 2; const RealScalar m_sin_PI_div_n_LUT[32] = { RealScalar(0.0), RealScalar(-2), RealScalar(-0.999999999999999), RealScalar(-0.292893218813453), RealScalar(-0.0761204674887130), RealScalar(-0.0192147195967696), RealScalar(-0.00481527332780311), RealScalar(-0.00120454379482761), RealScalar(-3.01181303795779e-04), RealScalar(-7.52981608554592e-05), RealScalar(-1.88247173988574e-05), RealScalar(-4.70619042382852e-06), RealScalar(-1.17654829809007e-06), RealScalar(-2.94137117780840e-07), RealScalar(-7.35342821488550e-08), RealScalar(-1.83835707061916e-08), RealScalar(-4.59589268710903e-09), RealScalar(-1.14897317243732e-09), RealScalar(-2.87243293150586e-10), RealScalar( -7.18108232902250e-11), RealScalar(-1.79527058227174e-11), RealScalar(-4.48817645568941e-12), RealScalar(-1.12204411392298e-12), RealScalar(-2.80511028480785e-13), RealScalar(-7.01277571201985e-14), RealScalar(-1.75319392800498e-14), RealScalar(-4.38298482001247e-15), RealScalar(-1.09574620500312e-15), RealScalar(-2.73936551250781e-16), RealScalar(-6.84841378126949e-17), RealScalar(-1.71210344531737e-17), RealScalar(-4.28025861329343e-18) }; // m_minus_sin_2_PI_div_n_LUT[i] = -std::sin(2 * M_PI / std::pow(2,i)); const RealScalar m_minus_sin_2_PI_div_n_LUT[32] = { RealScalar(0.0), RealScalar(0.0), RealScalar(-1.00000000000000e+00), RealScalar(-7.07106781186547e-01), RealScalar(-3.82683432365090e-01), RealScalar(-1.95090322016128e-01), RealScalar(-9.80171403295606e-02), RealScalar(-4.90676743274180e-02), RealScalar(-2.45412285229123e-02), RealScalar(-1.22715382857199e-02), RealScalar(-6.13588464915448e-03), RealScalar(-3.06795676296598e-03), RealScalar(-1.53398018628477e-03), RealScalar(-7.66990318742704e-04), RealScalar(-3.83495187571396e-04), RealScalar(-1.91747597310703e-04), RealScalar(-9.58737990959773e-05), RealScalar(-4.79368996030669e-05), RealScalar(-2.39684498084182e-05), RealScalar(-1.19842249050697e-05), RealScalar(-5.99211245264243e-06), RealScalar(-2.99605622633466e-06), RealScalar(-1.49802811316901e-06), RealScalar(-7.49014056584716e-07), RealScalar(-3.74507028292384e-07), RealScalar(-1.87253514146195e-07), RealScalar(-9.36267570730981e-08), RealScalar(-4.68133785365491e-08), RealScalar(-2.34066892682746e-08), RealScalar(-1.17033446341373e-08), RealScalar(-5.85167231706864e-09), RealScalar(-2.92583615853432e-09) }; }; } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_FFT_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorDeviceCuda.h0000644000176200001440000000032714562417221026360 0ustar liggesusers #if defined(__clang__) || defined(__GNUC__) #warning "Deprecated header file, please either include the main Eigen/CXX11/Tensor header or the respective TensorDeviceGpu.h file" #endif #include "TensorDeviceGpu.h" RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorEvaluator.h0000644000176200001440000011610514562417221026330 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_EVALUATOR_H #define EIGEN_CXX11_TENSOR_TENSOR_EVALUATOR_H namespace Eigen { /** \class TensorEvaluator * \ingroup CXX11_Tensor_Module * * \brief The tensor evaluator classes. * * These classes are responsible for the evaluation of the tensor expression. * * TODO: add support for more types of expressions, in particular expressions * leading to lvalues (slicing, reshaping, etc...) */ // Generic evaluator template struct TensorEvaluator { typedef typename Derived::Index Index; typedef typename Derived::Scalar Scalar; typedef typename Derived::Scalar CoeffReturnType; typedef typename PacketType::type PacketReturnType; typedef typename Derived::Dimensions Dimensions; typedef Derived XprType; static const int PacketSize = PacketType::size; typedef typename internal::traits::template MakePointer::Type TensorPointerType; typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; // NumDimensions is -1 for variable dim tensors static const int NumCoords = internal::traits::NumDimensions > 0 ? internal::traits::NumDimensions : 0; enum { IsAligned = Derived::IsAligned, PacketAccess = (PacketType::size > 1), BlockAccess = internal::is_arithmetic::type>::value, PreferBlockAccess = false, Layout = Derived::Layout, CoordAccess = NumCoords > 0, RawAccess = true }; typedef typename internal::remove_const::type ScalarNoConst; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockDescriptor TensorBlockDesc; typedef internal::TensorBlockScratchAllocator TensorBlockScratch; typedef typename internal::TensorMaterializedBlock TensorBlock; //===--------------------------------------------------------------------===// EIGEN_STRONG_INLINE TensorEvaluator(const Derived& m, const Device& device) : m_data(device.get((const_cast(m.data())))), m_dims(m.dimensions()), m_device(device) { } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dims; } EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(EvaluatorPointerType dest) { if (!NumTraits::type>::RequireInitialization && dest) { m_device.memcpy((void*)(m_device.get(dest)), m_device.get(m_data), m_dims.TotalSize() * sizeof(Scalar)); return false; } return true; } #ifdef EIGEN_USE_THREADS template EIGEN_STRONG_INLINE void evalSubExprsIfNeededAsync( EvaluatorPointerType dest, EvalSubExprsCallback done) { // TODO(ezhulenev): ThreadPoolDevice memcpy is blockign operation. done(evalSubExprsIfNeeded(dest)); } #endif // EIGEN_USE_THREADS EIGEN_STRONG_INLINE void cleanup() {} EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { eigen_assert(m_data != NULL); return m_data[index]; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType& coeffRef(Index index) { eigen_assert(m_data != NULL); return m_data[index]; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const { return internal::ploadt(m_data + index); } // Return a packet starting at `index` where `umask` specifies which elements // have to be loaded. Type/size of mask depends on PacketReturnType, e.g. for // Packet16f, `umask` is of type uint16_t and if a bit is 1, corresponding // float element will be loaded, otherwise 0 will be loaded. // Function has been templatized to enable Sfinae. template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename internal::enable_if::masked_load_available, PacketReturnTypeT>::type partialPacket(Index index, typename internal::unpacket_traits::mask_t umask) const { return internal::ploadu(m_data + index, umask); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void writePacket(Index index, const PacketReturnType& x) { return internal::pstoret(m_data + index, x); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(const array& coords) const { eigen_assert(m_data != NULL); if (static_cast(Layout) == static_cast(ColMajor)) { return m_data[m_dims.IndexOfColMajor(coords)]; } else { return m_data[m_dims.IndexOfRowMajor(coords)]; } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType& coeffRef(const array& coords) { eigen_assert(m_data != NULL); if (static_cast(Layout) == static_cast(ColMajor)) { return m_data[m_dims.IndexOfColMajor(coords)]; } else { return m_data[m_dims.IndexOfRowMajor(coords)]; } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost costPerCoeff(bool vectorized) const { return TensorOpCost(sizeof(CoeffReturnType), 0, 0, vectorized, PacketType::size); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE internal::TensorBlockResourceRequirements getResourceRequirements() const { return internal::TensorBlockResourceRequirements::any(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorBlock block(TensorBlockDesc& desc, TensorBlockScratch& scratch, bool /*root_of_expr_ast*/ = false) const { assert(m_data != NULL); return TensorBlock::materialize(m_data, m_dims, desc, scratch); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void writeBlock( const TensorBlockDesc& desc, const TensorBlock& block) { assert(m_data != NULL); typedef typename TensorBlock::XprType TensorBlockExpr; typedef internal::TensorBlockAssignment TensorBlockAssign; TensorBlockAssign::Run( TensorBlockAssign::target(desc.dimensions(), internal::strides(m_dims), m_data, desc.offset()), block.expr()); } EIGEN_DEVICE_FUNC EvaluatorPointerType data() const { return m_data; } #ifdef EIGEN_USE_SYCL // binding placeholder accessors to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler &cgh) const { m_data.bind(cgh); } #endif protected: EvaluatorPointerType m_data; Dimensions m_dims; const Device EIGEN_DEVICE_REF m_device; }; namespace { template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE T loadConstant(const T* address) { return *address; } // Use the texture cache on CUDA devices whenever possible #if defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 350 template <> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE float loadConstant(const float* address) { return __ldg(address); } template <> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE double loadConstant(const double* address) { return __ldg(address); } template <> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE Eigen::half loadConstant(const Eigen::half* address) { return Eigen::half(half_impl::raw_uint16_to_half(__ldg(&address->x))); } #endif #ifdef EIGEN_USE_SYCL // overload of load constant should be implemented here based on range access template T &loadConstant(const Eigen::TensorSycl::internal::RangeAccess &address) { return *address; } #endif } // Default evaluator for rvalues template struct TensorEvaluator { typedef typename Derived::Index Index; typedef typename Derived::Scalar Scalar; typedef typename Derived::Scalar CoeffReturnType; typedef typename PacketType::type PacketReturnType; typedef typename Derived::Dimensions Dimensions; typedef const Derived XprType; typedef typename internal::traits::template MakePointer::Type TensorPointerType; typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; typedef typename internal::remove_const::type ScalarNoConst; // NumDimensions is -1 for variable dim tensors static const int NumCoords = internal::traits::NumDimensions > 0 ? internal::traits::NumDimensions : 0; static const int PacketSize = PacketType::size; enum { IsAligned = Derived::IsAligned, PacketAccess = (PacketType::size > 1), BlockAccess = internal::is_arithmetic::value, PreferBlockAccess = false, Layout = Derived::Layout, CoordAccess = NumCoords > 0, RawAccess = true }; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockDescriptor TensorBlockDesc; typedef internal::TensorBlockScratchAllocator TensorBlockScratch; typedef typename internal::TensorMaterializedBlock TensorBlock; //===--------------------------------------------------------------------===// EIGEN_STRONG_INLINE TensorEvaluator(const Derived& m, const Device& device) : m_data(device.get(m.data())), m_dims(m.dimensions()), m_device(device) { } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dims; } EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(EvaluatorPointerType data) { if (!NumTraits::type>::RequireInitialization && data) { m_device.memcpy((void*)(m_device.get(data)),m_device.get(m_data), m_dims.TotalSize() * sizeof(Scalar)); return false; } return true; } #ifdef EIGEN_USE_THREADS template EIGEN_STRONG_INLINE void evalSubExprsIfNeededAsync( EvaluatorPointerType dest, EvalSubExprsCallback done) { // TODO(ezhulenev): ThreadPoolDevice memcpy is a blockign operation. done(evalSubExprsIfNeeded(dest)); } #endif // EIGEN_USE_THREADS EIGEN_STRONG_INLINE void cleanup() { } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { eigen_assert(m_data != NULL); return loadConstant(m_data+index); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const { return internal::ploadt_ro(m_data + index); } // Return a packet starting at `index` where `umask` specifies which elements // have to be loaded. Type/size of mask depends on PacketReturnType, e.g. for // Packet16f, `umask` is of type uint16_t and if a bit is 1, corresponding // float element will be loaded, otherwise 0 will be loaded. // Function has been templatized to enable Sfinae. template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename internal::enable_if::masked_load_available, PacketReturnTypeT>::type partialPacket(Index index, typename internal::unpacket_traits::mask_t umask) const { return internal::ploadu(m_data + index, umask); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(const array& coords) const { eigen_assert(m_data != NULL); const Index index = (static_cast(Layout) == static_cast(ColMajor)) ? m_dims.IndexOfColMajor(coords) : m_dims.IndexOfRowMajor(coords); return loadConstant(m_data+index); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost costPerCoeff(bool vectorized) const { return TensorOpCost(sizeof(CoeffReturnType), 0, 0, vectorized, PacketType::size); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE internal::TensorBlockResourceRequirements getResourceRequirements() const { return internal::TensorBlockResourceRequirements::any(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorBlock block(TensorBlockDesc& desc, TensorBlockScratch& scratch, bool /*root_of_expr_ast*/ = false) const { assert(m_data != NULL); return TensorBlock::materialize(m_data, m_dims, desc, scratch); } EIGEN_DEVICE_FUNC EvaluatorPointerType data() const { return m_data; } #ifdef EIGEN_USE_SYCL // binding placeholder accessors to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler &cgh) const { m_data.bind(cgh); } #endif protected: EvaluatorPointerType m_data; Dimensions m_dims; const Device EIGEN_DEVICE_REF m_device; }; // -------------------- CwiseNullaryOp -------------------- template struct TensorEvaluator, Device> { typedef TensorCwiseNullaryOp XprType; TensorEvaluator(const XprType& op, const Device& device) : m_functor(op.functor()), m_argImpl(op.nestedExpression(), device), m_wrapper() { } typedef typename XprType::Index Index; typedef typename XprType::Scalar Scalar; typedef typename internal::traits::Scalar CoeffReturnType; typedef typename PacketType::type PacketReturnType; static const int PacketSize = PacketType::size; typedef typename TensorEvaluator::Dimensions Dimensions; typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; enum { IsAligned = true, PacketAccess = internal::functor_traits::PacketAccess #ifdef EIGEN_USE_SYCL && (PacketType::size >1) #endif , BlockAccess = false, PreferBlockAccess = false, Layout = TensorEvaluator::Layout, CoordAccess = false, // to be implemented RawAccess = false }; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockNotImplemented TensorBlock; //===--------------------------------------------------------------------===// EIGEN_DEVICE_FUNC const Dimensions& dimensions() const { return m_argImpl.dimensions(); } EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(EvaluatorPointerType) { return true; } #ifdef EIGEN_USE_THREADS template EIGEN_STRONG_INLINE void evalSubExprsIfNeededAsync( EvaluatorPointerType, EvalSubExprsCallback done) { done(true); } #endif // EIGEN_USE_THREADS EIGEN_STRONG_INLINE void cleanup() { } EIGEN_DEVICE_FUNC CoeffReturnType coeff(Index index) const { return m_wrapper(m_functor, index); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const { return m_wrapper.template packetOp(m_functor, index); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost costPerCoeff(bool vectorized) const { return TensorOpCost(sizeof(CoeffReturnType), 0, 0, vectorized, PacketType::size); } EIGEN_DEVICE_FUNC EvaluatorPointerType data() const { return NULL; } #ifdef EIGEN_USE_SYCL // binding placeholder accessors to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler &cgh) const { m_argImpl.bind(cgh); } #endif private: const NullaryOp m_functor; TensorEvaluator m_argImpl; const internal::nullary_wrapper m_wrapper; }; // -------------------- CwiseUnaryOp -------------------- template struct TensorEvaluator, Device> { typedef TensorCwiseUnaryOp XprType; enum { IsAligned = TensorEvaluator::IsAligned, PacketAccess = int(TensorEvaluator::PacketAccess) & int(internal::functor_traits::PacketAccess), BlockAccess = TensorEvaluator::BlockAccess, PreferBlockAccess = TensorEvaluator::PreferBlockAccess, Layout = TensorEvaluator::Layout, CoordAccess = false, // to be implemented RawAccess = false }; TensorEvaluator(const XprType& op, const Device& device) : m_device(device), m_functor(op.functor()), m_argImpl(op.nestedExpression(), device) { } typedef typename XprType::Index Index; typedef typename XprType::Scalar Scalar; typedef typename internal::remove_const::type ScalarNoConst; typedef typename internal::traits::Scalar CoeffReturnType; typedef typename PacketType::type PacketReturnType; static const int PacketSize = PacketType::size; typedef typename TensorEvaluator::Dimensions Dimensions; typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; static const int NumDims = internal::array_size::value; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockDescriptor TensorBlockDesc; typedef internal::TensorBlockScratchAllocator TensorBlockScratch; typedef typename TensorEvaluator::TensorBlock ArgTensorBlock; typedef internal::TensorCwiseUnaryBlock TensorBlock; //===--------------------------------------------------------------------===// EIGEN_DEVICE_FUNC const Dimensions& dimensions() const { return m_argImpl.dimensions(); } EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(EvaluatorPointerType) { m_argImpl.evalSubExprsIfNeeded(NULL); return true; } #ifdef EIGEN_USE_THREADS template EIGEN_STRONG_INLINE void evalSubExprsIfNeededAsync( EvaluatorPointerType, EvalSubExprsCallback done) { m_argImpl.evalSubExprsIfNeededAsync(nullptr, [done](bool) { done(true); }); } #endif // EIGEN_USE_THREADS EIGEN_STRONG_INLINE void cleanup() { m_argImpl.cleanup(); } EIGEN_DEVICE_FUNC CoeffReturnType coeff(Index index) const { return m_functor(m_argImpl.coeff(index)); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const { return m_functor.packetOp(m_argImpl.template packet(index)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost costPerCoeff(bool vectorized) const { const double functor_cost = internal::functor_traits::Cost; return m_argImpl.costPerCoeff(vectorized) + TensorOpCost(0, 0, functor_cost, vectorized, PacketSize); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE internal::TensorBlockResourceRequirements getResourceRequirements() const { static const double functor_cost = internal::functor_traits::Cost; return m_argImpl.getResourceRequirements().addCostPerCoeff( {0, 0, functor_cost / PacketSize}); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorBlock block(TensorBlockDesc& desc, TensorBlockScratch& scratch, bool /*root_of_expr_ast*/ = false) const { return TensorBlock(m_argImpl.block(desc, scratch), m_functor); } EIGEN_DEVICE_FUNC EvaluatorPointerType data() const { return NULL; } #ifdef EIGEN_USE_SYCL // binding placeholder accessors to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler &cgh) const{ m_argImpl.bind(cgh); } #endif private: const Device EIGEN_DEVICE_REF m_device; const UnaryOp m_functor; TensorEvaluator m_argImpl; }; // -------------------- CwiseBinaryOp -------------------- template struct TensorEvaluator, Device> { typedef TensorCwiseBinaryOp XprType; enum { IsAligned = int(TensorEvaluator::IsAligned) & int(TensorEvaluator::IsAligned), PacketAccess = int(TensorEvaluator::PacketAccess) & int(TensorEvaluator::PacketAccess) & int(internal::functor_traits::PacketAccess), BlockAccess = int(TensorEvaluator::BlockAccess) & int(TensorEvaluator::BlockAccess), PreferBlockAccess = int(TensorEvaluator::PreferBlockAccess) | int(TensorEvaluator::PreferBlockAccess), Layout = TensorEvaluator::Layout, CoordAccess = false, // to be implemented RawAccess = false }; TensorEvaluator(const XprType& op, const Device& device) : m_device(device), m_functor(op.functor()), m_leftImpl(op.lhsExpression(), device), m_rightImpl(op.rhsExpression(), device) { EIGEN_STATIC_ASSERT((static_cast(TensorEvaluator::Layout) == static_cast(TensorEvaluator::Layout) || internal::traits::NumDimensions <= 1), YOU_MADE_A_PROGRAMMING_MISTAKE); eigen_assert(dimensions_match(m_leftImpl.dimensions(), m_rightImpl.dimensions())); } typedef typename XprType::Index Index; typedef typename XprType::Scalar Scalar; typedef typename internal::traits::Scalar CoeffReturnType; typedef typename PacketType::type PacketReturnType; static const int PacketSize = PacketType::size; typedef typename TensorEvaluator::Dimensions Dimensions; typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; static const int NumDims = internal::array_size< typename TensorEvaluator::Dimensions>::value; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockDescriptor TensorBlockDesc; typedef internal::TensorBlockScratchAllocator TensorBlockScratch; typedef typename TensorEvaluator::TensorBlock LeftTensorBlock; typedef typename TensorEvaluator::TensorBlock RightTensorBlock; typedef internal::TensorCwiseBinaryBlock TensorBlock; //===--------------------------------------------------------------------===// EIGEN_DEVICE_FUNC const Dimensions& dimensions() const { // TODO: use right impl instead if right impl dimensions are known at compile time. return m_leftImpl.dimensions(); } EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(EvaluatorPointerType) { m_leftImpl.evalSubExprsIfNeeded(NULL); m_rightImpl.evalSubExprsIfNeeded(NULL); return true; } #ifdef EIGEN_USE_THREADS template EIGEN_STRONG_INLINE void evalSubExprsIfNeededAsync( EvaluatorPointerType, EvalSubExprsCallback done) { // TODO(ezhulenev): Evaluate two expression in parallel? m_leftImpl.evalSubExprsIfNeededAsync(nullptr, [this, done](bool) { m_rightImpl.evalSubExprsIfNeededAsync(nullptr, [done](bool) { done(true); }); }); } #endif // EIGEN_USE_THREADS EIGEN_STRONG_INLINE void cleanup() { m_leftImpl.cleanup(); m_rightImpl.cleanup(); } EIGEN_DEVICE_FUNC CoeffReturnType coeff(Index index) const { return m_functor(m_leftImpl.coeff(index), m_rightImpl.coeff(index)); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const { return m_functor.packetOp(m_leftImpl.template packet(index), m_rightImpl.template packet(index)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost costPerCoeff(bool vectorized) const { const double functor_cost = internal::functor_traits::Cost; return m_leftImpl.costPerCoeff(vectorized) + m_rightImpl.costPerCoeff(vectorized) + TensorOpCost(0, 0, functor_cost, vectorized, PacketSize); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE internal::TensorBlockResourceRequirements getResourceRequirements() const { static const double functor_cost = internal::functor_traits::Cost; return internal::TensorBlockResourceRequirements::merge( m_leftImpl.getResourceRequirements(), m_rightImpl.getResourceRequirements()) .addCostPerCoeff({0, 0, functor_cost / PacketSize}); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorBlock block(TensorBlockDesc& desc, TensorBlockScratch& scratch, bool /*root_of_expr_ast*/ = false) const { desc.DropDestinationBuffer(); return TensorBlock(m_leftImpl.block(desc, scratch), m_rightImpl.block(desc, scratch), m_functor); } EIGEN_DEVICE_FUNC EvaluatorPointerType data() const { return NULL; } #ifdef EIGEN_USE_SYCL // binding placeholder accessors to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler &cgh) const { m_leftImpl.bind(cgh); m_rightImpl.bind(cgh); } #endif private: const Device EIGEN_DEVICE_REF m_device; const BinaryOp m_functor; TensorEvaluator m_leftImpl; TensorEvaluator m_rightImpl; }; // -------------------- CwiseTernaryOp -------------------- template struct TensorEvaluator, Device> { typedef TensorCwiseTernaryOp XprType; enum { IsAligned = TensorEvaluator::IsAligned & TensorEvaluator::IsAligned & TensorEvaluator::IsAligned, PacketAccess = TensorEvaluator::PacketAccess && TensorEvaluator::PacketAccess && TensorEvaluator::PacketAccess && internal::functor_traits::PacketAccess, BlockAccess = false, PreferBlockAccess = TensorEvaluator::PreferBlockAccess || TensorEvaluator::PreferBlockAccess || TensorEvaluator::PreferBlockAccess, Layout = TensorEvaluator::Layout, CoordAccess = false, // to be implemented RawAccess = false }; TensorEvaluator(const XprType& op, const Device& device) : m_functor(op.functor()), m_arg1Impl(op.arg1Expression(), device), m_arg2Impl(op.arg2Expression(), device), m_arg3Impl(op.arg3Expression(), device) { EIGEN_STATIC_ASSERT((static_cast(TensorEvaluator::Layout) == static_cast(TensorEvaluator::Layout) || internal::traits::NumDimensions <= 1), YOU_MADE_A_PROGRAMMING_MISTAKE); EIGEN_STATIC_ASSERT((internal::is_same::StorageKind, typename internal::traits::StorageKind>::value), STORAGE_KIND_MUST_MATCH) EIGEN_STATIC_ASSERT((internal::is_same::StorageKind, typename internal::traits::StorageKind>::value), STORAGE_KIND_MUST_MATCH) EIGEN_STATIC_ASSERT((internal::is_same::Index, typename internal::traits::Index>::value), STORAGE_INDEX_MUST_MATCH) EIGEN_STATIC_ASSERT((internal::is_same::Index, typename internal::traits::Index>::value), STORAGE_INDEX_MUST_MATCH) eigen_assert(dimensions_match(m_arg1Impl.dimensions(), m_arg2Impl.dimensions()) && dimensions_match(m_arg1Impl.dimensions(), m_arg3Impl.dimensions())); } typedef typename XprType::Index Index; typedef typename XprType::Scalar Scalar; typedef typename internal::traits::Scalar CoeffReturnType; typedef typename PacketType::type PacketReturnType; static const int PacketSize = PacketType::size; typedef typename TensorEvaluator::Dimensions Dimensions; typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockNotImplemented TensorBlock; //===--------------------------------------------------------------------===// EIGEN_DEVICE_FUNC const Dimensions& dimensions() const { // TODO: use arg2 or arg3 dimensions if they are known at compile time. return m_arg1Impl.dimensions(); } EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(EvaluatorPointerType) { m_arg1Impl.evalSubExprsIfNeeded(NULL); m_arg2Impl.evalSubExprsIfNeeded(NULL); m_arg3Impl.evalSubExprsIfNeeded(NULL); return true; } EIGEN_STRONG_INLINE void cleanup() { m_arg1Impl.cleanup(); m_arg2Impl.cleanup(); m_arg3Impl.cleanup(); } EIGEN_DEVICE_FUNC CoeffReturnType coeff(Index index) const { return m_functor(m_arg1Impl.coeff(index), m_arg2Impl.coeff(index), m_arg3Impl.coeff(index)); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const { return m_functor.packetOp(m_arg1Impl.template packet(index), m_arg2Impl.template packet(index), m_arg3Impl.template packet(index)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost costPerCoeff(bool vectorized) const { const double functor_cost = internal::functor_traits::Cost; return m_arg1Impl.costPerCoeff(vectorized) + m_arg2Impl.costPerCoeff(vectorized) + m_arg3Impl.costPerCoeff(vectorized) + TensorOpCost(0, 0, functor_cost, vectorized, PacketSize); } EIGEN_DEVICE_FUNC EvaluatorPointerType data() const { return NULL; } #ifdef EIGEN_USE_SYCL // binding placeholder accessors to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler &cgh) const { m_arg1Impl.bind(cgh); m_arg2Impl.bind(cgh); m_arg3Impl.bind(cgh); } #endif private: const TernaryOp m_functor; TensorEvaluator m_arg1Impl; TensorEvaluator m_arg2Impl; TensorEvaluator m_arg3Impl; }; // -------------------- SelectOp -------------------- template struct TensorEvaluator, Device> { typedef TensorSelectOp XprType; typedef typename XprType::Scalar Scalar; enum { IsAligned = TensorEvaluator::IsAligned & TensorEvaluator::IsAligned, PacketAccess = TensorEvaluator::PacketAccess & TensorEvaluator::PacketAccess & PacketType::HasBlend, BlockAccess = TensorEvaluator::BlockAccess && TensorEvaluator::BlockAccess && TensorEvaluator::BlockAccess, PreferBlockAccess = TensorEvaluator::PreferBlockAccess || TensorEvaluator::PreferBlockAccess || TensorEvaluator::PreferBlockAccess, Layout = TensorEvaluator::Layout, CoordAccess = false, // to be implemented RawAccess = false }; TensorEvaluator(const XprType& op, const Device& device) : m_condImpl(op.ifExpression(), device), m_thenImpl(op.thenExpression(), device), m_elseImpl(op.elseExpression(), device) { EIGEN_STATIC_ASSERT((static_cast(TensorEvaluator::Layout) == static_cast(TensorEvaluator::Layout)), YOU_MADE_A_PROGRAMMING_MISTAKE); EIGEN_STATIC_ASSERT((static_cast(TensorEvaluator::Layout) == static_cast(TensorEvaluator::Layout)), YOU_MADE_A_PROGRAMMING_MISTAKE); eigen_assert(dimensions_match(m_condImpl.dimensions(), m_thenImpl.dimensions())); eigen_assert(dimensions_match(m_thenImpl.dimensions(), m_elseImpl.dimensions())); } typedef typename XprType::Index Index; typedef typename internal::traits::Scalar CoeffReturnType; typedef typename PacketType::type PacketReturnType; static const int PacketSize = PacketType::size; typedef typename TensorEvaluator::Dimensions Dimensions; typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; static const int NumDims = internal::array_size::value; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockDescriptor TensorBlockDesc; typedef internal::TensorBlockScratchAllocator TensorBlockScratch; typedef typename TensorEvaluator::TensorBlock IfArgTensorBlock; typedef typename TensorEvaluator::TensorBlock ThenArgTensorBlock; typedef typename TensorEvaluator::TensorBlock ElseArgTensorBlock; struct TensorSelectOpBlockFactory { template struct XprType { typedef TensorSelectOp type; }; template typename XprType::type expr( const IfArgXprType& if_expr, const ThenArgXprType& then_expr, const ElseArgXprType& else_expr) const { return typename XprType::type(if_expr, then_expr, else_expr); } }; typedef internal::TensorTernaryExprBlock TensorBlock; //===--------------------------------------------------------------------===// EIGEN_DEVICE_FUNC const Dimensions& dimensions() const { // TODO: use then or else impl instead if they happen to be known at compile time. return m_condImpl.dimensions(); } EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(EvaluatorPointerType) { m_condImpl.evalSubExprsIfNeeded(NULL); m_thenImpl.evalSubExprsIfNeeded(NULL); m_elseImpl.evalSubExprsIfNeeded(NULL); return true; } #ifdef EIGEN_USE_THREADS template EIGEN_STRONG_INLINE void evalSubExprsIfNeededAsync( EvaluatorPointerType, EvalSubExprsCallback done) { m_condImpl.evalSubExprsIfNeeded(nullptr, [this, done](bool) { m_thenImpl.evalSubExprsIfNeeded(nullptr, [this, done](bool) { m_elseImpl.evalSubExprsIfNeeded(nullptr, [done](bool) { done(true); }); }); }); } #endif // EIGEN_USE_THREADS EIGEN_STRONG_INLINE void cleanup() { m_condImpl.cleanup(); m_thenImpl.cleanup(); m_elseImpl.cleanup(); } EIGEN_DEVICE_FUNC CoeffReturnType coeff(Index index) const { return m_condImpl.coeff(index) ? m_thenImpl.coeff(index) : m_elseImpl.coeff(index); } template EIGEN_DEVICE_FUNC PacketReturnType packet(Index index) const { internal::Selector select; EIGEN_UNROLL_LOOP for (Index i = 0; i < PacketSize; ++i) { select.select[i] = m_condImpl.coeff(index+i); } return internal::pblend(select, m_thenImpl.template packet(index), m_elseImpl.template packet(index)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost costPerCoeff(bool vectorized) const { return m_condImpl.costPerCoeff(vectorized) + m_thenImpl.costPerCoeff(vectorized) .cwiseMax(m_elseImpl.costPerCoeff(vectorized)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE internal::TensorBlockResourceRequirements getResourceRequirements() const { auto then_req = m_thenImpl.getResourceRequirements(); auto else_req = m_elseImpl.getResourceRequirements(); auto merged_req = internal::TensorBlockResourceRequirements::merge(then_req, else_req); merged_req.cost_per_coeff = then_req.cost_per_coeff.cwiseMax(else_req.cost_per_coeff); return internal::TensorBlockResourceRequirements::merge( m_condImpl.getResourceRequirements(), merged_req); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorBlock block(TensorBlockDesc& desc, TensorBlockScratch& scratch, bool /*root_of_expr_ast*/ = false) const { // It's unsafe to pass destination buffer to underlying expressions, because // output might be aliased with one of the inputs. desc.DropDestinationBuffer(); return TensorBlock( m_condImpl.block(desc, scratch), m_thenImpl.block(desc, scratch), m_elseImpl.block(desc, scratch), TensorSelectOpBlockFactory()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EvaluatorPointerType data() const { return NULL; } #ifdef EIGEN_USE_SYCL // binding placeholder accessors to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler &cgh) const { m_condImpl.bind(cgh); m_thenImpl.bind(cgh); m_elseImpl.bind(cgh); } #endif private: TensorEvaluator m_condImpl; TensorEvaluator m_thenImpl; TensorEvaluator m_elseImpl; }; } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_EVALUATOR_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorGpuHipCudaDefines.h0000644000176200001440000000774414562417221027665 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // Copyright (C) 2018 Deven Desai // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #if defined(EIGEN_USE_GPU) && !defined(EIGEN_CXX11_TENSOR_GPU_HIP_CUDA_DEFINES_H) #define EIGEN_CXX11_TENSOR_GPU_HIP_CUDA_DEFINES_H // Note that we are using EIGEN_USE_HIP here instead of EIGEN_HIPCC...this is by design // There is code in the Tensorflow codebase that will define EIGEN_USE_GPU, but // for some reason gets sent to the gcc/host compiler instead of the gpu/nvcc/hipcc compiler // When compiling such files, gcc will end up trying to pick up the CUDA headers by // default (see the code within "unsupported/Eigen/CXX11/Tensor" that is guarded by EIGEN_USE_GPU) // This will obviously not work when trying to compile tensorflow on a system with no CUDA // To work around this issue for HIP systems (and leave the default behaviour intact), the // HIP tensorflow build defines EIGEN_USE_HIP when compiling all source files, and // "unsupported/Eigen/CXX11/Tensor" has been updated to use HIP header when EIGEN_USE_HIP is // defined. In continuation of that requirement, the guard here needs to be EIGEN_USE_HIP as well #if defined(EIGEN_USE_HIP) #define gpuStream_t hipStream_t #define gpuDeviceProp_t hipDeviceProp_t #define gpuError_t hipError_t #define gpuSuccess hipSuccess #define gpuErrorNotReady hipErrorNotReady #define gpuGetDeviceCount hipGetDeviceCount #define gpuGetLastError hipGetLastError #define gpuPeekAtLastError hipPeekAtLastError #define gpuGetErrorName hipGetErrorName #define gpuGetErrorString hipGetErrorString #define gpuGetDeviceProperties hipGetDeviceProperties #define gpuStreamDefault hipStreamDefault #define gpuGetDevice hipGetDevice #define gpuSetDevice hipSetDevice #define gpuMalloc hipMalloc #define gpuFree hipFree #define gpuMemsetAsync hipMemsetAsync #define gpuMemcpyAsync hipMemcpyAsync #define gpuMemcpyDeviceToDevice hipMemcpyDeviceToDevice #define gpuMemcpyDeviceToHost hipMemcpyDeviceToHost #define gpuMemcpyHostToDevice hipMemcpyHostToDevice #define gpuStreamQuery hipStreamQuery #define gpuSharedMemConfig hipSharedMemConfig #define gpuDeviceSetSharedMemConfig hipDeviceSetSharedMemConfig #define gpuStreamSynchronize hipStreamSynchronize #define gpuDeviceSynchronize hipDeviceSynchronize #define gpuMemcpy hipMemcpy #else #define gpuStream_t cudaStream_t #define gpuDeviceProp_t cudaDeviceProp #define gpuError_t cudaError_t #define gpuSuccess cudaSuccess #define gpuErrorNotReady cudaErrorNotReady #define gpuGetDeviceCount cudaGetDeviceCount #define gpuGetLastError cudaGetLastError #define gpuPeekAtLastError cudaPeekAtLastError #define gpuGetErrorName cudaGetErrorName #define gpuGetErrorString cudaGetErrorString #define gpuGetDeviceProperties cudaGetDeviceProperties #define gpuStreamDefault cudaStreamDefault #define gpuGetDevice cudaGetDevice #define gpuSetDevice cudaSetDevice #define gpuMalloc cudaMalloc #define gpuFree cudaFree #define gpuMemsetAsync cudaMemsetAsync #define gpuMemcpyAsync cudaMemcpyAsync #define gpuMemcpyDeviceToDevice cudaMemcpyDeviceToDevice #define gpuMemcpyDeviceToHost cudaMemcpyDeviceToHost #define gpuMemcpyHostToDevice cudaMemcpyHostToDevice #define gpuStreamQuery cudaStreamQuery #define gpuSharedMemConfig cudaSharedMemConfig #define gpuDeviceSetSharedMemConfig cudaDeviceSetSharedMemConfig #define gpuStreamSynchronize cudaStreamSynchronize #define gpuDeviceSynchronize cudaDeviceSynchronize #define gpuMemcpy cudaMemcpy #endif // gpu_assert can be overridden #ifndef gpu_assert #if defined(EIGEN_HIP_DEVICE_COMPILE) // HIPCC do not support the use of assert on the GPU side. #define gpu_assert(COND) #else #define gpu_assert(COND) assert(COND) #endif #endif // gpu_assert #endif // EIGEN_CXX11_TENSOR_GPU_HIP_CUDA_DEFINES_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorFunctors.h0000644000176200001440000003564514562417221026202 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_FUNCTORS_H #define EIGEN_CXX11_TENSOR_TENSOR_FUNCTORS_H namespace Eigen { namespace internal { /** \internal * \brief Template functor to compute the modulo between an array and a scalar. */ template struct scalar_mod_op { EIGEN_DEVICE_FUNC scalar_mod_op(const Scalar& divisor) : m_divisor(divisor) {} EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar operator() (const Scalar& a) const { return a % m_divisor; } const Scalar m_divisor; }; template struct functor_traits > { enum { Cost = scalar_div_cost::value, PacketAccess = false }; }; /** \internal * \brief Template functor to compute the modulo between 2 arrays. */ template struct scalar_mod2_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_mod2_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar operator() (const Scalar& a, const Scalar& b) const { return a % b; } }; template struct functor_traits > { enum { Cost = scalar_div_cost::value, PacketAccess = false }; }; template struct scalar_fmod_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_fmod_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar operator()(const Scalar& a, const Scalar& b) const { return numext::fmod(a, b); } }; template struct functor_traits > { enum { Cost = 13, // Reciprocal throughput of FPREM on Haswell. PacketAccess = false }; }; template struct reducer_traits { enum { Cost = 1, PacketAccess = false, IsStateful = false, IsExactlyAssociative = true }; }; // Standard reduction functors template struct SumReducer { EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reduce(const T t, T* accum) const { internal::scalar_sum_op sum_op; *accum = sum_op(*accum, t); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reducePacket(const Packet& p, Packet* accum) const { (*accum) = padd(*accum, p); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T initialize() const { internal::scalar_cast_op conv; return conv(0); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet initializePacket() const { return pset1(initialize()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T finalize(const T accum) const { return accum; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet finalizePacket(const Packet& vaccum) const { return vaccum; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T finalizeBoth(const T saccum, const Packet& vaccum) const { internal::scalar_sum_op sum_op; return sum_op(saccum, predux(vaccum)); } }; template struct reducer_traits, Device> { enum { Cost = NumTraits::AddCost, PacketAccess = PacketType::HasAdd, IsStateful = false, IsExactlyAssociative = NumTraits::IsInteger }; }; template struct MeanReducer { EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE MeanReducer() : scalarCount_(0), packetCount_(0) { } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reduce(const T t, T* accum) { internal::scalar_sum_op sum_op; *accum = sum_op(*accum, t); scalarCount_++; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reducePacket(const Packet& p, Packet* accum) { (*accum) = padd(*accum, p); packetCount_++; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T initialize() const { internal::scalar_cast_op conv; return conv(0); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet initializePacket() const { return pset1(initialize()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T finalize(const T accum) const { internal::scalar_quotient_op quotient_op; return quotient_op(accum, T(scalarCount_)); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet finalizePacket(const Packet& vaccum) const { return pdiv(vaccum, pset1(T(packetCount_))); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T finalizeBoth(const T saccum, const Packet& vaccum) const { internal::scalar_sum_op sum_op; internal::scalar_quotient_op quotient_op; return quotient_op( sum_op(saccum, predux(vaccum)), T(scalarCount_ + packetCount_ * unpacket_traits::size)); } protected: DenseIndex scalarCount_; DenseIndex packetCount_; }; template struct reducer_traits, Device> { enum { Cost = NumTraits::AddCost, PacketAccess = PacketType::HasAdd && PacketType::HasDiv && !NumTraits::IsInteger, IsStateful = true, IsExactlyAssociative = NumTraits::IsInteger }; }; template struct MinMaxBottomValue { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T bottom_value() { return Eigen::NumTraits::lowest(); } }; template struct MinMaxBottomValue { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T bottom_value() { return -Eigen::NumTraits::infinity(); } }; template struct MinMaxBottomValue { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T bottom_value() { return Eigen::NumTraits::highest(); } }; template struct MinMaxBottomValue { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE T bottom_value() { return Eigen::NumTraits::infinity(); } }; template struct MaxReducer { EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reduce(const T t, T* accum) const { scalar_max_op op; *accum = op(t, *accum); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reducePacket(const Packet& p, Packet* accum) const { scalar_max_op op; (*accum) = op.packetOp(*accum, p); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T initialize() const { return MinMaxBottomValue::IsInteger>::bottom_value(); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet initializePacket() const { return pset1(initialize()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T finalize(const T accum) const { return accum; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet finalizePacket(const Packet& vaccum) const { return vaccum; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T finalizeBoth(const T saccum, const Packet& vaccum) const { scalar_max_op op; return op(saccum, op.predux(vaccum)); } }; template struct reducer_traits, Device> { enum { Cost = NumTraits::AddCost, PacketAccess = PacketType::HasMax, IsStateful = false, IsExactlyAssociative = (NaNPropagation!=PropagateFast) }; }; template struct MinReducer { EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reduce(const T t, T* accum) const { scalar_min_op op; *accum = op(t, *accum); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reducePacket(const Packet& p, Packet* accum) const { scalar_min_op op; (*accum) = op.packetOp(*accum, p); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T initialize() const { return MinMaxBottomValue::IsInteger>::bottom_value(); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet initializePacket() const { return pset1(initialize()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T finalize(const T accum) const { return accum; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet finalizePacket(const Packet& vaccum) const { return vaccum; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T finalizeBoth(const T saccum, const Packet& vaccum) const { scalar_min_op op; return op(saccum, op.predux(vaccum)); } }; template struct reducer_traits, Device> { enum { Cost = NumTraits::AddCost, PacketAccess = PacketType::HasMin, IsStateful = false, IsExactlyAssociative = (NaNPropagation!=PropagateFast) }; }; template struct ProdReducer { EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reduce(const T t, T* accum) const { internal::scalar_product_op prod_op; (*accum) = prod_op(*accum, t); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reducePacket(const Packet& p, Packet* accum) const { (*accum) = pmul(*accum, p); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T initialize() const { internal::scalar_cast_op conv; return conv(1); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet initializePacket() const { return pset1(initialize()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T finalize(const T accum) const { return accum; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet finalizePacket(const Packet& vaccum) const { return vaccum; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T finalizeBoth(const T saccum, const Packet& vaccum) const { internal::scalar_product_op prod_op; return prod_op(saccum, predux_mul(vaccum)); } }; template struct reducer_traits, Device> { enum { Cost = NumTraits::MulCost, PacketAccess = PacketType::HasMul, IsStateful = false, IsExactlyAssociative = true }; }; struct AndReducer { EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reduce(bool t, bool* accum) const { *accum = *accum && t; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool initialize() const { return true; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool finalize(bool accum) const { return accum; } }; template struct reducer_traits { enum { Cost = 1, PacketAccess = false, IsStateful = false, IsExactlyAssociative = true }; }; struct OrReducer { EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reduce(bool t, bool* accum) const { *accum = *accum || t; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool initialize() const { return false; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool finalize(bool accum) const { return accum; } }; template struct reducer_traits { enum { Cost = 1, PacketAccess = false, IsStateful = false, IsExactlyAssociative = true }; }; // Argmin/Argmax reducers. Returns the first occurrence if multiple locations // contain the same min/max value. template struct ArgMaxTupleReducer { EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reduce(const T t, T* accum) const { if (t.second < accum->second) { return; } else if (t.second > accum->second || accum->first > t.first ) { *accum = t; } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T initialize() const { return T(0, NumTraits::lowest()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T finalize(const T& accum) const { return accum; } }; template struct reducer_traits, Device> { enum { Cost = NumTraits::AddCost, PacketAccess = false, IsStateful = false, IsExactlyAssociative = true }; }; template struct ArgMinTupleReducer { EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reduce(const T& t, T* accum) const { if (t.second > accum->second) { return; } else if (t.second < accum->second || accum->first > t.first) { *accum = t; } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T initialize() const { return T(0, NumTraits::highest()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T finalize(const T& accum) const { return accum; } }; template struct reducer_traits, Device> { enum { Cost = NumTraits::AddCost, PacketAccess = false, IsStateful = false, IsExactlyAssociative = true }; }; template class GaussianGenerator { public: static const bool PacketAccess = false; EIGEN_DEVICE_FUNC GaussianGenerator(const array& means, const array& std_devs) : m_means(means) { EIGEN_UNROLL_LOOP for (size_t i = 0; i < NumDims; ++i) { m_two_sigmas[i] = std_devs[i] * std_devs[i] * 2; } } EIGEN_DEVICE_FUNC T operator()(const array& coordinates) const { T tmp = T(0); EIGEN_UNROLL_LOOP for (size_t i = 0; i < NumDims; ++i) { T offset = coordinates[i] - m_means[i]; tmp += offset * offset / m_two_sigmas[i]; } return numext::exp(-tmp); } private: array m_means; array m_two_sigmas; }; template struct functor_traits > { enum { Cost = NumDims * (2 * NumTraits::AddCost + NumTraits::MulCost + functor_traits >::Cost) + functor_traits >::Cost, PacketAccess = GaussianGenerator::PacketAccess }; }; template struct scalar_clamp_op { EIGEN_DEVICE_FUNC inline scalar_clamp_op(const Scalar& _min, const Scalar& _max) : m_min(_min), m_max(_max) {} EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator()(const Scalar& x) const { return numext::mini(numext::maxi(x, m_min), m_max); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& x) const { return internal::pmin(internal::pmax(x, pset1(m_min)), pset1(m_max)); } const Scalar m_min; const Scalar m_max; }; template struct functor_traits > { enum { Cost = 2 * NumTraits::AddCost, PacketAccess = (packet_traits::HasMin && packet_traits::HasMax)}; }; } // end namespace internal } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_FUNCTORS_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorConcatenation.h0000644000176200001440000003646114562417221027161 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_CONCATENATION_H #define EIGEN_CXX11_TENSOR_TENSOR_CONCATENATION_H namespace Eigen { /** \class TensorConcatenationOp * \ingroup CXX11_Tensor_Module * * \brief Tensor concatenation class. * * */ namespace internal { template struct traits > { // Type promotion to handle the case where the types of the lhs and the rhs are different. typedef typename promote_storage_type::ret Scalar; typedef typename promote_storage_type::StorageKind, typename traits::StorageKind>::ret StorageKind; typedef typename promote_index_type::Index, typename traits::Index>::type Index; typedef typename LhsXprType::Nested LhsNested; typedef typename RhsXprType::Nested RhsNested; typedef typename remove_reference::type _LhsNested; typedef typename remove_reference::type _RhsNested; static const int NumDimensions = traits::NumDimensions; static const int Layout = traits::Layout; enum { Flags = 0 }; typedef typename conditional::val, typename traits::PointerType, typename traits::PointerType>::type PointerType; }; template struct eval, Eigen::Dense> { typedef const TensorConcatenationOp& type; }; template struct nested, 1, typename eval >::type> { typedef TensorConcatenationOp type; }; } // end namespace internal template class TensorConcatenationOp : public TensorBase, WriteAccessors> { public: typedef TensorBase, WriteAccessors> Base; typedef typename internal::traits::Scalar Scalar; typedef typename internal::traits::StorageKind StorageKind; typedef typename internal::traits::Index Index; typedef typename internal::nested::type Nested; typedef typename internal::promote_storage_type::ret CoeffReturnType; typedef typename NumTraits::Real RealScalar; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorConcatenationOp(const LhsXprType& lhs, const RhsXprType& rhs, Axis axis) : m_lhs_xpr(lhs), m_rhs_xpr(rhs), m_axis(axis) {} EIGEN_DEVICE_FUNC const typename internal::remove_all::type& lhsExpression() const { return m_lhs_xpr; } EIGEN_DEVICE_FUNC const typename internal::remove_all::type& rhsExpression() const { return m_rhs_xpr; } EIGEN_DEVICE_FUNC const Axis& axis() const { return m_axis; } EIGEN_TENSOR_INHERIT_ASSIGNMENT_OPERATORS(TensorConcatenationOp) protected: typename LhsXprType::Nested m_lhs_xpr; typename RhsXprType::Nested m_rhs_xpr; const Axis m_axis; }; // Eval as rvalue template struct TensorEvaluator, Device> { typedef TensorConcatenationOp XprType; typedef typename XprType::Index Index; static const int NumDims = internal::array_size::Dimensions>::value; static const int RightNumDims = internal::array_size::Dimensions>::value; typedef DSizes Dimensions; typedef typename XprType::Scalar Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename PacketType::type PacketReturnType; typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; enum { IsAligned = false, PacketAccess = TensorEvaluator::PacketAccess && TensorEvaluator::PacketAccess, BlockAccess = false, PreferBlockAccess = TensorEvaluator::PreferBlockAccess || TensorEvaluator::PreferBlockAccess, Layout = TensorEvaluator::Layout, RawAccess = false }; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockNotImplemented TensorBlock; //===--------------------------------------------------------------------===// EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device) : m_leftImpl(op.lhsExpression(), device), m_rightImpl(op.rhsExpression(), device), m_axis(op.axis()) { EIGEN_STATIC_ASSERT((static_cast(TensorEvaluator::Layout) == static_cast(TensorEvaluator::Layout) || NumDims == 1), YOU_MADE_A_PROGRAMMING_MISTAKE); EIGEN_STATIC_ASSERT((NumDims == RightNumDims), YOU_MADE_A_PROGRAMMING_MISTAKE); EIGEN_STATIC_ASSERT((NumDims > 0), YOU_MADE_A_PROGRAMMING_MISTAKE); eigen_assert(0 <= m_axis && m_axis < NumDims); const Dimensions& lhs_dims = m_leftImpl.dimensions(); const Dimensions& rhs_dims = m_rightImpl.dimensions(); { int i = 0; for (; i < m_axis; ++i) { eigen_assert(lhs_dims[i] > 0); eigen_assert(lhs_dims[i] == rhs_dims[i]); m_dimensions[i] = lhs_dims[i]; } eigen_assert(lhs_dims[i] > 0); // Now i == m_axis. eigen_assert(rhs_dims[i] > 0); m_dimensions[i] = lhs_dims[i] + rhs_dims[i]; for (++i; i < NumDims; ++i) { eigen_assert(lhs_dims[i] > 0); eigen_assert(lhs_dims[i] == rhs_dims[i]); m_dimensions[i] = lhs_dims[i]; } } if (static_cast(Layout) == static_cast(ColMajor)) { m_leftStrides[0] = 1; m_rightStrides[0] = 1; m_outputStrides[0] = 1; for (int j = 1; j < NumDims; ++j) { m_leftStrides[j] = m_leftStrides[j-1] * lhs_dims[j-1]; m_rightStrides[j] = m_rightStrides[j-1] * rhs_dims[j-1]; m_outputStrides[j] = m_outputStrides[j-1] * m_dimensions[j-1]; } } else { m_leftStrides[NumDims - 1] = 1; m_rightStrides[NumDims - 1] = 1; m_outputStrides[NumDims - 1] = 1; for (int j = NumDims - 2; j >= 0; --j) { m_leftStrides[j] = m_leftStrides[j+1] * lhs_dims[j+1]; m_rightStrides[j] = m_rightStrides[j+1] * rhs_dims[j+1]; m_outputStrides[j] = m_outputStrides[j+1] * m_dimensions[j+1]; } } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; } // TODO(phli): Add short-circuit memcpy evaluation if underlying data are linear? EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(EvaluatorPointerType) { m_leftImpl.evalSubExprsIfNeeded(NULL); m_rightImpl.evalSubExprsIfNeeded(NULL); return true; } EIGEN_STRONG_INLINE void cleanup() { m_leftImpl.cleanup(); m_rightImpl.cleanup(); } // TODO(phli): attempt to speed this up. The integer divisions and modulo are slow. // See CL/76180724 comments for more ideas. EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { // Collect dimension-wise indices (subs). array subs; if (static_cast(Layout) == static_cast(ColMajor)) { for (int i = NumDims - 1; i > 0; --i) { subs[i] = index / m_outputStrides[i]; index -= subs[i] * m_outputStrides[i]; } subs[0] = index; } else { for (int i = 0; i < NumDims - 1; ++i) { subs[i] = index / m_outputStrides[i]; index -= subs[i] * m_outputStrides[i]; } subs[NumDims - 1] = index; } const Dimensions& left_dims = m_leftImpl.dimensions(); if (subs[m_axis] < left_dims[m_axis]) { Index left_index; if (static_cast(Layout) == static_cast(ColMajor)) { left_index = subs[0]; EIGEN_UNROLL_LOOP for (int i = 1; i < NumDims; ++i) { left_index += (subs[i] % left_dims[i]) * m_leftStrides[i]; } } else { left_index = subs[NumDims - 1]; EIGEN_UNROLL_LOOP for (int i = NumDims - 2; i >= 0; --i) { left_index += (subs[i] % left_dims[i]) * m_leftStrides[i]; } } return m_leftImpl.coeff(left_index); } else { subs[m_axis] -= left_dims[m_axis]; const Dimensions& right_dims = m_rightImpl.dimensions(); Index right_index; if (static_cast(Layout) == static_cast(ColMajor)) { right_index = subs[0]; EIGEN_UNROLL_LOOP for (int i = 1; i < NumDims; ++i) { right_index += (subs[i] % right_dims[i]) * m_rightStrides[i]; } } else { right_index = subs[NumDims - 1]; EIGEN_UNROLL_LOOP for (int i = NumDims - 2; i >= 0; --i) { right_index += (subs[i] % right_dims[i]) * m_rightStrides[i]; } } return m_rightImpl.coeff(right_index); } } // TODO(phli): Add a real vectorization. template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const { const int packetSize = PacketType::size; EIGEN_STATIC_ASSERT((packetSize > 1), YOU_MADE_A_PROGRAMMING_MISTAKE) eigen_assert(index + packetSize - 1 < dimensions().TotalSize()); EIGEN_ALIGN_MAX CoeffReturnType values[packetSize]; EIGEN_UNROLL_LOOP for (int i = 0; i < packetSize; ++i) { values[i] = coeff(index+i); } PacketReturnType rslt = internal::pload(values); return rslt; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost costPerCoeff(bool vectorized) const { const double compute_cost = NumDims * (2 * TensorOpCost::AddCost() + 2 * TensorOpCost::MulCost() + TensorOpCost::DivCost() + TensorOpCost::ModCost()); const double lhs_size = m_leftImpl.dimensions().TotalSize(); const double rhs_size = m_rightImpl.dimensions().TotalSize(); return (lhs_size / (lhs_size + rhs_size)) * m_leftImpl.costPerCoeff(vectorized) + (rhs_size / (lhs_size + rhs_size)) * m_rightImpl.costPerCoeff(vectorized) + TensorOpCost(0, 0, compute_cost); } EIGEN_DEVICE_FUNC EvaluatorPointerType data() const { return NULL; } #ifdef EIGEN_USE_SYCL // binding placeholder accessors to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler &cgh) const { m_leftImpl.bind(cgh); m_rightImpl.bind(cgh); } #endif protected: Dimensions m_dimensions; array m_outputStrides; array m_leftStrides; array m_rightStrides; TensorEvaluator m_leftImpl; TensorEvaluator m_rightImpl; const Axis m_axis; }; // Eval as lvalue template struct TensorEvaluator, Device> : public TensorEvaluator, Device> { typedef TensorEvaluator, Device> Base; typedef TensorConcatenationOp XprType; typedef typename Base::Dimensions Dimensions; enum { IsAligned = false, PacketAccess = TensorEvaluator::PacketAccess && TensorEvaluator::PacketAccess, BlockAccess = false, PreferBlockAccess = TensorEvaluator::PreferBlockAccess || TensorEvaluator::PreferBlockAccess, Layout = TensorEvaluator::Layout, RawAccess = false }; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockNotImplemented TensorBlock; //===--------------------------------------------------------------------===// EIGEN_STRONG_INLINE TensorEvaluator(XprType& op, const Device& device) : Base(op, device) { EIGEN_STATIC_ASSERT((static_cast(Layout) == static_cast(ColMajor)), YOU_MADE_A_PROGRAMMING_MISTAKE); } typedef typename XprType::Index Index; typedef typename XprType::Scalar Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename PacketType::type PacketReturnType; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType& coeffRef(Index index) { // Collect dimension-wise indices (subs). array subs; for (int i = Base::NumDims - 1; i > 0; --i) { subs[i] = index / this->m_outputStrides[i]; index -= subs[i] * this->m_outputStrides[i]; } subs[0] = index; const Dimensions& left_dims = this->m_leftImpl.dimensions(); if (subs[this->m_axis] < left_dims[this->m_axis]) { Index left_index = subs[0]; for (int i = 1; i < Base::NumDims; ++i) { left_index += (subs[i] % left_dims[i]) * this->m_leftStrides[i]; } return this->m_leftImpl.coeffRef(left_index); } else { subs[this->m_axis] -= left_dims[this->m_axis]; const Dimensions& right_dims = this->m_rightImpl.dimensions(); Index right_index = subs[0]; for (int i = 1; i < Base::NumDims; ++i) { right_index += (subs[i] % right_dims[i]) * this->m_rightStrides[i]; } return this->m_rightImpl.coeffRef(right_index); } } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void writePacket(Index index, const PacketReturnType& x) { const int packetSize = PacketType::size; EIGEN_STATIC_ASSERT((packetSize > 1), YOU_MADE_A_PROGRAMMING_MISTAKE) eigen_assert(index + packetSize - 1 < this->dimensions().TotalSize()); EIGEN_ALIGN_MAX CoeffReturnType values[packetSize]; internal::pstore(values, x); for (int i = 0; i < packetSize; ++i) { coeffRef(index+i) = values[i]; } } }; } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_CONCATENATION_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorUInt128.h0000644000176200001440000001660014562417221025477 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2015 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_UINT128_H #define EIGEN_CXX11_TENSOR_TENSOR_UINT128_H namespace Eigen { namespace internal { template struct static_val { static const uint64_t value = n; EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE operator uint64_t() const { return n; } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE static_val() { } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE static_val(const T& v) { EIGEN_UNUSED_VARIABLE(v); eigen_assert(v == n); } }; template struct TensorUInt128 { HIGH high; LOW low; template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE TensorUInt128(const TensorUInt128& other) : high(other.high), low(other.low) { EIGEN_STATIC_ASSERT(sizeof(OTHER_HIGH) <= sizeof(HIGH), YOU_MADE_A_PROGRAMMING_MISTAKE); EIGEN_STATIC_ASSERT(sizeof(OTHER_LOW) <= sizeof(LOW), YOU_MADE_A_PROGRAMMING_MISTAKE); } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE TensorUInt128& operator = (const TensorUInt128& other) { EIGEN_STATIC_ASSERT(sizeof(OTHER_HIGH) <= sizeof(HIGH), YOU_MADE_A_PROGRAMMING_MISTAKE); EIGEN_STATIC_ASSERT(sizeof(OTHER_LOW) <= sizeof(LOW), YOU_MADE_A_PROGRAMMING_MISTAKE); high = other.high; low = other.low; return *this; } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE explicit TensorUInt128(const T& x) : high(0), low(x) { eigen_assert((static_cast::type>(x) <= NumTraits::highest())); eigen_assert(x >= 0); } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE TensorUInt128(HIGH y, LOW x) : high(y), low(x) { } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE operator LOW() const { return low; } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE LOW lower() const { return low; } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE HIGH upper() const { return high; } }; template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool operator == (const TensorUInt128& lhs, const TensorUInt128& rhs) { return (lhs.high == rhs.high) & (lhs.low == rhs.low); } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool operator != (const TensorUInt128& lhs, const TensorUInt128& rhs) { return (lhs.high != rhs.high) | (lhs.low != rhs.low); } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool operator >= (const TensorUInt128& lhs, const TensorUInt128& rhs) { if (lhs.high != rhs.high) { return lhs.high > rhs.high; } return lhs.low >= rhs.low; } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool operator < (const TensorUInt128& lhs, const TensorUInt128& rhs) { if (lhs.high != rhs.high) { return lhs.high < rhs.high; } return lhs.low < rhs.low; } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE TensorUInt128 operator + (const TensorUInt128& lhs, const TensorUInt128& rhs) { TensorUInt128 result(lhs.high + rhs.high, lhs.low + rhs.low); if (result.low < rhs.low) { result.high += 1; } return result; } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE TensorUInt128 operator - (const TensorUInt128& lhs, const TensorUInt128& rhs) { TensorUInt128 result(lhs.high - rhs.high, lhs.low - rhs.low); if (result.low > lhs.low) { result.high -= 1; } return result; } template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorUInt128 operator * (const TensorUInt128& lhs, const TensorUInt128& rhs) { // Split each 128-bit integer into 4 32-bit integers, and then do the // multiplications by hand as follow: // lhs a b c d // rhs e f g h // ----------- // ah bh ch dh // bg cg dg // cf df // de // The result is stored in 2 64bit integers, high and low. const uint64_t LOW = 0x00000000FFFFFFFFLL; const uint64_t HIGH = 0xFFFFFFFF00000000LL; uint64_t d = lhs.low & LOW; uint64_t c = (lhs.low & HIGH) >> 32LL; uint64_t b = lhs.high & LOW; uint64_t a = (lhs.high & HIGH) >> 32LL; uint64_t h = rhs.low & LOW; uint64_t g = (rhs.low & HIGH) >> 32LL; uint64_t f = rhs.high & LOW; uint64_t e = (rhs.high & HIGH) >> 32LL; // Compute the low 32 bits of low uint64_t acc = d * h; uint64_t low = acc & LOW; // Compute the high 32 bits of low. Add a carry every time we wrap around acc >>= 32LL; uint64_t carry = 0; uint64_t acc2 = acc + c * h; if (acc2 < acc) { carry++; } acc = acc2 + d * g; if (acc < acc2) { carry++; } low |= (acc << 32LL); // Carry forward the high bits of acc to initiate the computation of the // low 32 bits of high acc2 = (acc >> 32LL) | (carry << 32LL); carry = 0; acc = acc2 + b * h; if (acc < acc2) { carry++; } acc2 = acc + c * g; if (acc2 < acc) { carry++; } acc = acc2 + d * f; if (acc < acc2) { carry++; } uint64_t high = acc & LOW; // Start to compute the high 32 bits of high. acc2 = (acc >> 32LL) | (carry << 32LL); acc = acc2 + a * h; acc2 = acc + b * g; acc = acc2 + c * f; acc2 = acc + d * e; high |= (acc2 << 32LL); return TensorUInt128(high, low); } template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorUInt128 operator / (const TensorUInt128& lhs, const TensorUInt128& rhs) { if (rhs == TensorUInt128, static_val<1> >(1)) { return TensorUInt128(lhs.high, lhs.low); } else if (lhs < rhs) { return TensorUInt128(0); } else { // calculate the biggest power of 2 times rhs that's less than or equal to lhs TensorUInt128 power2(1); TensorUInt128 d(rhs); TensorUInt128 tmp(lhs - d); while (lhs >= d) { tmp = tmp - d; d = d + d; power2 = power2 + power2; } tmp = TensorUInt128(lhs.high, lhs.low); TensorUInt128 result(0); while (power2 != TensorUInt128, static_val<0> >(0)) { if (tmp >= d) { tmp = tmp - d; result = result + power2; } // Shift right power2 = TensorUInt128(power2.high >> 1, (power2.low >> 1) | (power2.high << 63)); d = TensorUInt128(d.high >> 1, (d.low >> 1) | (d.high << 63)); } return result; } } } // namespace internal } // namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_UINT128_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h0000644000176200001440000000034114562417221027440 0ustar liggesusers #if defined(__clang__) || defined(__GNUC__) #warning "Deprecated header file, please either include the main Eigen/CXX11/Tensor header or the respective TensorContractionGpu.h file" #endif #include "TensorContractionGpu.h" RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorDevice.h0000644000176200001440000001144014562417221025561 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_DEVICE_H #define EIGEN_CXX11_TENSOR_TENSOR_DEVICE_H namespace Eigen { /** \class TensorDevice * \ingroup CXX11_Tensor_Module * * \brief Pseudo expression providing an operator = that will evaluate its argument * on the specified computing 'device' (GPU, thread pool, ...) * * Example: * C.device(EIGEN_GPU) = A + B; * * Todo: operator *= and /=. */ template class TensorDevice { public: TensorDevice(const DeviceType& device, ExpressionType& expression) : m_device(device), m_expression(expression) {} EIGEN_DEFAULT_COPY_CONSTRUCTOR(TensorDevice) template EIGEN_STRONG_INLINE TensorDevice& operator=(const OtherDerived& other) { typedef TensorAssignOp Assign; Assign assign(m_expression, other); internal::TensorExecutor::run(assign, m_device); return *this; } template EIGEN_STRONG_INLINE TensorDevice& operator+=(const OtherDerived& other) { typedef typename OtherDerived::Scalar Scalar; typedef TensorCwiseBinaryOp, const ExpressionType, const OtherDerived> Sum; Sum sum(m_expression, other); typedef TensorAssignOp Assign; Assign assign(m_expression, sum); internal::TensorExecutor::run(assign, m_device); return *this; } template EIGEN_STRONG_INLINE TensorDevice& operator-=(const OtherDerived& other) { typedef typename OtherDerived::Scalar Scalar; typedef TensorCwiseBinaryOp, const ExpressionType, const OtherDerived> Difference; Difference difference(m_expression, other); typedef TensorAssignOp Assign; Assign assign(m_expression, difference); internal::TensorExecutor::run(assign, m_device); return *this; } protected: const DeviceType& m_device; ExpressionType& m_expression; }; /** \class TensorAsyncDevice * \ingroup CXX11_Tensor_Module * * \brief Pseudo expression providing an operator = that will evaluate its * argument asynchronously on the specified device. Currently only * ThreadPoolDevice implements proper asynchronous execution, while the default * and GPU devices just run the expression synchronously and call m_done() on * completion.. * * Example: * auto done = []() { ... expression evaluation done ... }; * C.device(thread_pool_device, std::move(done)) = A + B; */ template class TensorAsyncDevice { public: TensorAsyncDevice(const DeviceType& device, ExpressionType& expression, DoneCallback done) : m_device(device), m_expression(expression), m_done(std::move(done)) {} template EIGEN_STRONG_INLINE TensorAsyncDevice& operator=(const OtherDerived& other) { typedef TensorAssignOp Assign; typedef internal::TensorExecutor Executor; Assign assign(m_expression, other); Executor::run(assign, m_device); m_done(); return *this; } protected: const DeviceType& m_device; ExpressionType& m_expression; DoneCallback m_done; }; #ifdef EIGEN_USE_THREADS template class TensorAsyncDevice { public: TensorAsyncDevice(const ThreadPoolDevice& device, ExpressionType& expression, DoneCallback done) : m_device(device), m_expression(expression), m_done(std::move(done)) {} template EIGEN_STRONG_INLINE TensorAsyncDevice& operator=(const OtherDerived& other) { typedef TensorAssignOp Assign; typedef internal::TensorAsyncExecutor Executor; // WARNING: After assignment 'm_done' callback will be in undefined state. Assign assign(m_expression, other); Executor::runAsync(assign, m_device, std::move(m_done)); return *this; } protected: const ThreadPoolDevice& m_device; ExpressionType& m_expression; DoneCallback m_done; }; #endif } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_DEVICE_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorMorphing.h0000644000176200001440000012442414562417221026154 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_MORPHING_H #define EIGEN_CXX11_TENSOR_TENSOR_MORPHING_H namespace Eigen { /** \class TensorReshaping * \ingroup CXX11_Tensor_Module * * \brief Tensor reshaping class. * * */ namespace internal { template struct traits > : public traits { typedef typename XprType::Scalar Scalar; typedef traits XprTraits; typedef typename XprTraits::StorageKind StorageKind; typedef typename XprTraits::Index Index; typedef typename XprType::Nested Nested; typedef typename remove_reference::type _Nested; static const int NumDimensions = array_size::value; static const int Layout = XprTraits::Layout; typedef typename XprTraits::PointerType PointerType; }; template struct eval, Eigen::Dense> { typedef const TensorReshapingOpEIGEN_DEVICE_REF type; }; template struct nested, 1, typename eval >::type> { typedef TensorReshapingOp type; }; } // end namespace internal template class TensorReshapingOp : public TensorBase, WriteAccessors> { public: typedef TensorBase, WriteAccessors> Base; typedef typename Eigen::internal::traits::Scalar Scalar; typedef typename internal::remove_const::type CoeffReturnType; typedef typename Eigen::internal::nested::type Nested; typedef typename Eigen::internal::traits::StorageKind StorageKind; typedef typename Eigen::internal::traits::Index Index; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorReshapingOp(const XprType& expr, const NewDimensions& dims) : m_xpr(expr), m_dims(dims) {} EIGEN_DEVICE_FUNC const NewDimensions& dimensions() const { return m_dims; } EIGEN_DEVICE_FUNC const typename internal::remove_all::type& expression() const { return m_xpr; } EIGEN_TENSOR_INHERIT_ASSIGNMENT_OPERATORS(TensorReshapingOp) protected: typename XprType::Nested m_xpr; const NewDimensions m_dims; }; // Eval as rvalue template struct TensorEvaluator, Device> { typedef TensorReshapingOp XprType; typedef NewDimensions Dimensions; typedef typename XprType::Index Index; typedef typename XprType::Scalar Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename PacketType::type PacketReturnType; typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; typedef StorageMemory::type, Device> ConstCastStorage; static const int NumOutputDims = internal::array_size::value; static const int NumInputDims = internal::array_size::Dimensions>::value; enum ReshapingKind { // We do not use layout information to determine reshaping kind. // Depending on the layout `N` can be inner or outer dimension. OneByN = 0, // expr.reshape(1, N) NByOne = 1, // expr.reshape(N, 1) Runtime = 2 // Reshape dimensions are dynamic (specified at runtime). }; // clang-format off static const ReshapingKind kind = #if defined(EIGEN_HAS_INDEX_LIST) (NumOutputDims == 2 && internal::index_statically_eq(/*index=*/0, /*value=*/1)) ? OneByN : (NumOutputDims == 2 && internal::index_statically_eq(/*index=*/1, /*value=*/1)) ? NByOne : Runtime; #else Runtime; #endif // clang-format on enum { IsAligned = TensorEvaluator::IsAligned, PacketAccess = TensorEvaluator::PacketAccess, // For trivial reshapes with raw access to underlying data we will provide // zero overhead block access. // TODO(ezhulenev): Consider adding block access without raw access? BlockAccess = TensorEvaluator::RawAccess && NumInputDims > 0 && NumOutputDims > 0, PreferBlockAccess = false, Layout = TensorEvaluator::Layout, CoordAccess = false, // to be implemented RawAccess = TensorEvaluator::RawAccess }; typedef typename internal::remove_const::type ScalarNoConst; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockDescriptor TensorBlockDesc; typedef internal::TensorBlockScratchAllocator TensorBlockScratch; typedef typename internal::TensorMaterializedBlock TensorBlock; //===--------------------------------------------------------------------===// EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device) : m_impl(op.expression(), device), m_dimensions(op.dimensions()) { // The total size of the reshaped tensor must be equal to the total size // of the input tensor. eigen_assert(internal::array_prod(m_impl.dimensions()) == internal::array_prod(op.dimensions())); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; } #ifdef EIGEN_USE_THREADS template EIGEN_STRONG_INLINE void evalSubExprsIfNeededAsync( EvaluatorPointerType data, EvalSubExprsCallback done) { m_impl.evalSubExprsIfNeededAsync(data, std::move(done)); } #endif EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(EvaluatorPointerType data) { return m_impl.evalSubExprsIfNeeded(data); } EIGEN_STRONG_INLINE void cleanup() { m_impl.cleanup(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { return m_impl.coeff(index); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const { return m_impl.template packet(index); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost costPerCoeff(bool vectorized) const { return m_impl.costPerCoeff(vectorized); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE internal::TensorBlockResourceRequirements getResourceRequirements() const { return internal::TensorBlockResourceRequirements::any(); } // required in block(OutputTensorBlock* output_block) const // For C++03 compatibility this must be defined outside the method struct BlockIteratorState { Index stride; Index span; Index size; Index count; }; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorBlock block(TensorBlockDesc& desc, TensorBlockScratch& scratch, bool /*root_of_expr_ast*/ = false) const { eigen_assert(m_impl.data() != NULL); eigen_assert((kind == Runtime) || (kind == OneByN && desc.dimensions()[0] == 1) || (kind == NByOne && desc.dimensions()[1] == 1)); if (kind == OneByN || kind == NByOne) { // We can guarantee at compile time that block is just a contiguous slice // of the underlying expression memory buffer. return TensorBlock(internal::TensorBlockKind::kView, m_impl.data() + desc.offset(), desc.dimensions()); } else { // This will do additional runtime checks, and in the end it might be also // a view, or it might be a block materialized in the temporary buffer. return TensorBlock::materialize(m_impl.data(), m_dimensions, desc, scratch); } } EIGEN_DEVICE_FUNC typename Storage::Type data() const { return constCast(m_impl.data()); } EIGEN_DEVICE_FUNC const TensorEvaluator& impl() const { return m_impl; } #ifdef EIGEN_USE_SYCL // binding placeholder accessors to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler &cgh) const { m_impl.bind(cgh); } #endif protected: TensorEvaluator m_impl; NewDimensions m_dimensions; }; // Eval as lvalue template struct TensorEvaluator, Device> : public TensorEvaluator, Device> { typedef TensorEvaluator, Device> Base; typedef TensorReshapingOp XprType; typedef NewDimensions Dimensions; enum { IsAligned = TensorEvaluator::IsAligned, PacketAccess = TensorEvaluator::PacketAccess, BlockAccess = TensorEvaluator::RawAccess, PreferBlockAccess = false, Layout = TensorEvaluator::Layout, CoordAccess = false, // to be implemented RawAccess = TensorEvaluator::RawAccess }; EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device) : Base(op, device) { } typedef typename XprType::Index Index; typedef typename XprType::Scalar Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename PacketType::type PacketReturnType; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockDescriptor TensorBlockDesc; //===--------------------------------------------------------------------===// EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType& coeffRef(Index index) { return this->m_impl.coeffRef(index); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void writePacket(Index index, const PacketReturnType& x) { this->m_impl.template writePacket(index, x); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void writeBlock( const TensorBlockDesc& desc, const TensorBlock& block) { assert(this->m_impl.data() != NULL); typedef typename TensorBlock::XprType TensorBlockExpr; typedef internal::TensorBlockAssignment< Scalar, TensorEvaluator::NumOutputDims, TensorBlockExpr, Index> TensorBlockAssign; TensorBlockAssign::Run( TensorBlockAssign::target(desc.dimensions(), internal::strides(this->dimensions()), this->m_impl.data(), desc.offset()), block.expr()); } }; /** \class TensorSlicing * \ingroup CXX11_Tensor_Module * * \brief Tensor slicing class. * * */ namespace internal { template struct traits > : public traits { typedef typename XprType::Scalar Scalar; typedef traits XprTraits; typedef typename XprTraits::StorageKind StorageKind; typedef typename XprTraits::Index Index; typedef typename XprType::Nested Nested; typedef typename remove_reference::type _Nested; static const int NumDimensions = array_size::value; static const int Layout = XprTraits::Layout; typedef typename XprTraits::PointerType PointerType; }; template struct eval, Eigen::Dense> { typedef const TensorSlicingOpEIGEN_DEVICE_REF type; }; template struct nested, 1, typename eval >::type> { typedef TensorSlicingOp type; }; } // end namespace internal template class TensorSlicingOp : public TensorBase > { public: typedef TensorBase > Base; typedef typename Eigen::internal::traits::Scalar Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename Eigen::internal::nested::type Nested; typedef typename Eigen::internal::traits::StorageKind StorageKind; typedef typename Eigen::internal::traits::Index Index; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorSlicingOp(const XprType& expr, const StartIndices& indices, const Sizes& sizes) : m_xpr(expr), m_indices(indices), m_sizes(sizes) {} EIGEN_DEVICE_FUNC const StartIndices& startIndices() const { return m_indices; } EIGEN_DEVICE_FUNC const Sizes& sizes() const { return m_sizes; } EIGEN_DEVICE_FUNC const typename internal::remove_all::type& expression() const { return m_xpr; } EIGEN_TENSOR_INHERIT_ASSIGNMENT_OPERATORS(TensorSlicingOp) protected: typename XprType::Nested m_xpr; const StartIndices m_indices; const Sizes m_sizes; }; // Fixme: figure out the exact threshold namespace { template struct MemcpyTriggerForSlicing { EIGEN_DEVICE_FUNC MemcpyTriggerForSlicing(const Device& device) : threshold_(2 * device.numThreads()) { } EIGEN_DEVICE_FUNC bool operator ()(Index total, Index contiguous) const { const bool prefer_block_evaluation = BlockAccess && total > 32*1024; return !prefer_block_evaluation && contiguous > threshold_; } private: Index threshold_; }; // It is very expensive to start the memcpy kernel on GPU: we therefore only // use it for large copies. #ifdef EIGEN_USE_GPU template struct MemcpyTriggerForSlicing { EIGEN_DEVICE_FUNC MemcpyTriggerForSlicing(const GpuDevice&) { } EIGEN_DEVICE_FUNC bool operator ()(Index, Index contiguous) const { return contiguous > 4*1024*1024; } }; #endif // It is very expensive to start the memcpy kernel on GPU: we therefore only // use it for large copies. #ifdef EIGEN_USE_SYCL template struct MemcpyTriggerForSlicing { EIGEN_DEVICE_FUNC MemcpyTriggerForSlicing(const SyclDevice&) { } EIGEN_DEVICE_FUNC bool operator ()(Index, Index contiguous) const { return contiguous > 4*1024*1024; } }; #endif } // Eval as rvalue template struct TensorEvaluator, Device> { typedef TensorSlicingOp XprType; static const int NumDims = internal::array_size::value; typedef typename XprType::Index Index; typedef typename XprType::Scalar Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename PacketType::type PacketReturnType; typedef Sizes Dimensions; typedef StorageMemory Storage; typedef StorageMemory::type, Device> ConstCastStorage; typedef typename Storage::Type EvaluatorPointerType; enum { // Alignment can't be guaranteed at compile time since it depends on the // slice offsets and sizes. IsAligned = false, PacketAccess = TensorEvaluator::PacketAccess, BlockAccess = TensorEvaluator::BlockAccess && // FIXME: Temporary workaround for bug in slicing of bool tensors. !internal::is_same::type, bool>::value, PreferBlockAccess = true, Layout = TensorEvaluator::Layout, CoordAccess = false, RawAccess = false }; typedef typename internal::remove_const::type ScalarNoConst; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockDescriptor TensorBlockDesc; typedef internal::TensorBlockScratchAllocator TensorBlockScratch; // Tensor slicing does not change the block type. typedef typename TensorEvaluator::TensorBlock TensorBlock; //===--------------------------------------------------------------------===// EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device) : m_impl(op.expression(), device), m_device(device), m_dimensions(op.sizes()), m_offsets(op.startIndices()) { m_is_identity = true; for (int i = 0; i < internal::array_size::value; ++i) { eigen_assert(m_impl.dimensions()[i] >= op.sizes()[i] + op.startIndices()[i]); if (m_impl.dimensions()[i] != op.sizes()[i] || op.startIndices()[i] != 0) { m_is_identity = false; } } // No strides for scalars. if (NumDims == 0) return; const typename TensorEvaluator::Dimensions& input_dims = m_impl.dimensions(); const Sizes& output_dims = op.sizes(); if (static_cast(Layout) == static_cast(ColMajor)) { m_inputStrides[0] = 1; for (int i = 1; i < NumDims; ++i) { m_inputStrides[i] = m_inputStrides[i-1] * input_dims[i-1]; } // Don't initialize m_fastOutputStrides[0] since it won't ever be accessed. m_outputStrides[0] = 1; for (int i = 1; i < NumDims; ++i) { m_outputStrides[i] = m_outputStrides[i-1] * output_dims[i-1]; m_fastOutputStrides[i] = internal::TensorIntDivisor(m_outputStrides[i] > 0 ? m_outputStrides[i] : 1); } } else { m_inputStrides[NumDims-1] = 1; for (int i = NumDims - 2; i >= 0; --i) { m_inputStrides[i] = m_inputStrides[i+1] * input_dims[i+1]; } // Don't initialize m_fastOutputStrides[NumDims-1] since it won't ever be accessed. m_outputStrides[NumDims-1] = 1; for (int i = NumDims - 2; i >= 0; --i) { m_outputStrides[i] = m_outputStrides[i+1] * output_dims[i+1]; m_fastOutputStrides[i] = internal::TensorIntDivisor(m_outputStrides[i] > 0 ? m_outputStrides[i] : 1); } } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; } EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(EvaluatorPointerType data) { m_impl.evalSubExprsIfNeeded(NULL); if (!NumTraits::type>::RequireInitialization && data && m_impl.data()) { Index contiguous_values = 1; if (static_cast(Layout) == static_cast(ColMajor)) { for (int i = 0; i < NumDims; ++i) { contiguous_values *= dimensions()[i]; if (dimensions()[i] != m_impl.dimensions()[i]) { break; } } } else { for (int i = NumDims-1; i >= 0; --i) { contiguous_values *= dimensions()[i]; if (dimensions()[i] != m_impl.dimensions()[i]) { break; } } } // Use memcpy if it's going to be faster than using the regular evaluation. const MemcpyTriggerForSlicing trigger(m_device); if (trigger(internal::array_prod(dimensions()), contiguous_values)) { EvaluatorPointerType src = (EvaluatorPointerType)m_impl.data(); for (Index i = 0; i < internal::array_prod(dimensions()); i += contiguous_values) { Index offset = srcCoeff(i); m_device.memcpy((void*)(m_device.get(data + i)), m_device.get(src+offset), contiguous_values * sizeof(Scalar)); } return false; } } return true; } #ifdef EIGEN_USE_THREADS template EIGEN_STRONG_INLINE void evalSubExprsIfNeededAsync( EvaluatorPointerType /*data*/, EvalSubExprsCallback done) { m_impl.evalSubExprsIfNeededAsync(nullptr, [done](bool) { done(true); }); } #endif // EIGEN_USE_THREADS EIGEN_STRONG_INLINE void cleanup() { m_impl.cleanup(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { if (m_is_identity) { return m_impl.coeff(index); } else { return m_impl.coeff(srcCoeff(index)); } } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const { const int packetSize = PacketType::size; EIGEN_STATIC_ASSERT((packetSize > 1), YOU_MADE_A_PROGRAMMING_MISTAKE) eigen_assert(index+packetSize-1 < internal::array_prod(dimensions())); if (m_is_identity) { return m_impl.template packet(index); } Index inputIndices[] = {0, 0}; Index indices[] = {index, index + packetSize - 1}; if (static_cast(Layout) == static_cast(ColMajor)) { EIGEN_UNROLL_LOOP for (int i = NumDims - 1; i > 0; --i) { const Index idx0 = indices[0] / m_fastOutputStrides[i]; const Index idx1 = indices[1] / m_fastOutputStrides[i]; inputIndices[0] += (idx0 + m_offsets[i]) * m_inputStrides[i]; inputIndices[1] += (idx1 + m_offsets[i]) * m_inputStrides[i]; indices[0] -= idx0 * m_outputStrides[i]; indices[1] -= idx1 * m_outputStrides[i]; } inputIndices[0] += (indices[0] + m_offsets[0]); inputIndices[1] += (indices[1] + m_offsets[0]); } else { EIGEN_UNROLL_LOOP for (int i = 0; i < NumDims - 1; ++i) { const Index idx0 = indices[0] / m_fastOutputStrides[i]; const Index idx1 = indices[1] / m_fastOutputStrides[i]; inputIndices[0] += (idx0 + m_offsets[i]) * m_inputStrides[i]; inputIndices[1] += (idx1 + m_offsets[i]) * m_inputStrides[i]; indices[0] -= idx0 * m_outputStrides[i]; indices[1] -= idx1 * m_outputStrides[i]; } inputIndices[0] += (indices[0] + m_offsets[NumDims-1]); inputIndices[1] += (indices[1] + m_offsets[NumDims-1]); } if (inputIndices[1] - inputIndices[0] == packetSize - 1) { PacketReturnType rslt = m_impl.template packet(inputIndices[0]); return rslt; } else { EIGEN_ALIGN_MAX typename internal::remove_const::type values[packetSize]; values[0] = m_impl.coeff(inputIndices[0]); values[packetSize-1] = m_impl.coeff(inputIndices[1]); EIGEN_UNROLL_LOOP for (int i = 1; i < packetSize-1; ++i) { values[i] = coeff(index+i); } PacketReturnType rslt = internal::pload(values); return rslt; } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost costPerCoeff(bool vectorized) const { return m_impl.costPerCoeff(vectorized) + TensorOpCost(0, 0, m_is_identity ? 1 : NumDims); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE internal::TensorBlockResourceRequirements getResourceRequirements() const { const size_t target_size = m_device.lastLevelCacheSize(); return internal::TensorBlockResourceRequirements::merge( internal::TensorBlockResourceRequirements::skewed(target_size), m_impl.getResourceRequirements()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorBlock block(TensorBlockDesc& desc, TensorBlockScratch& scratch, bool /*root_of_expr_ast*/ = false) const { TensorBlockDesc arg_desc = desc.WithOffset(srcCoeff(desc.offset())); TensorBlock block = m_impl.block(arg_desc, scratch); if (!arg_desc.HasDestinationBuffer()) desc.DropDestinationBuffer(); return block; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename Storage::Type data() const { typename Storage::Type result = constCast(m_impl.data()); if (result) { Index offset = 0; if (static_cast(Layout) == static_cast(ColMajor)) { for (int i = 0; i < NumDims; ++i) { if (m_dimensions[i] != m_impl.dimensions()[i]) { offset += m_offsets[i] * m_inputStrides[i]; for (int j = i+1; j < NumDims; ++j) { if (m_dimensions[j] > 1) { return NULL; } offset += m_offsets[j] * m_inputStrides[j]; } break; } } } else { for (int i = NumDims - 1; i >= 0; --i) { if (m_dimensions[i] != m_impl.dimensions()[i]) { offset += m_offsets[i] * m_inputStrides[i]; for (int j = i-1; j >= 0; --j) { if (m_dimensions[j] > 1) { return NULL; } offset += m_offsets[j] * m_inputStrides[j]; } break; } } } return result + offset; } return NULL; } #ifdef EIGEN_USE_SYCL // binding placeholder accessors to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler &cgh) const { m_impl.bind(cgh); } #endif protected: EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index srcCoeff(Index index) const { Index inputIndex = 0; if (static_cast(Layout) == static_cast(ColMajor)) { EIGEN_UNROLL_LOOP for (int i = NumDims - 1; i > 0; --i) { const Index idx = index / m_fastOutputStrides[i]; inputIndex += (idx + m_offsets[i]) * m_inputStrides[i]; index -= idx * m_outputStrides[i]; } inputIndex += (index + m_offsets[0]); } else { EIGEN_UNROLL_LOOP for (int i = 0; i < NumDims - 1; ++i) { const Index idx = index / m_fastOutputStrides[i]; inputIndex += (idx + m_offsets[i]) * m_inputStrides[i]; index -= idx * m_outputStrides[i]; } inputIndex += (index + m_offsets[NumDims-1]); } return inputIndex; } array m_outputStrides; array, NumDims> m_fastOutputStrides; array m_inputStrides; TensorEvaluator m_impl; const Device EIGEN_DEVICE_REF m_device; Dimensions m_dimensions; bool m_is_identity; const StartIndices m_offsets; }; // Eval as lvalue template struct TensorEvaluator, Device> : public TensorEvaluator, Device> { typedef TensorEvaluator, Device> Base; typedef TensorSlicingOp XprType; static const int NumDims = internal::array_size::value; typedef typename XprType::Index Index; typedef typename XprType::Scalar Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename PacketType::type PacketReturnType; typedef Sizes Dimensions; enum { IsAligned = false, PacketAccess = TensorEvaluator::PacketAccess, BlockAccess = TensorEvaluator::BlockAccess, PreferBlockAccess = true, Layout = TensorEvaluator::Layout, CoordAccess = false, RawAccess = (NumDims == 1) & TensorEvaluator::RawAccess }; typedef typename internal::remove_const::type ScalarNoConst; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockDescriptor TensorBlockDesc; typedef internal::TensorBlockScratchAllocator TensorBlockScratch; //===--------------------------------------------------------------------===// EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device) : Base(op, device) { } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType& coeffRef(Index index) { if (this->m_is_identity) { return this->m_impl.coeffRef(index); } else { return this->m_impl.coeffRef(this->srcCoeff(index)); } } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void writePacket(Index index, const PacketReturnType& x) { if (this->m_is_identity) { this->m_impl.template writePacket(index, x); return; } const int packetSize = PacketType::size; Index inputIndices[] = {0, 0}; Index indices[] = {index, index + packetSize - 1}; if (static_cast(Layout) == static_cast(ColMajor)) { EIGEN_UNROLL_LOOP for (int i = NumDims - 1; i > 0; --i) { const Index idx0 = indices[0] / this->m_fastOutputStrides[i]; const Index idx1 = indices[1] / this->m_fastOutputStrides[i]; inputIndices[0] += (idx0 + this->m_offsets[i]) * this->m_inputStrides[i]; inputIndices[1] += (idx1 + this->m_offsets[i]) * this->m_inputStrides[i]; indices[0] -= idx0 * this->m_outputStrides[i]; indices[1] -= idx1 * this->m_outputStrides[i]; } inputIndices[0] += (indices[0] + this->m_offsets[0]); inputIndices[1] += (indices[1] + this->m_offsets[0]); } else { EIGEN_UNROLL_LOOP for (int i = 0; i < NumDims - 1; ++i) { const Index idx0 = indices[0] / this->m_fastOutputStrides[i]; const Index idx1 = indices[1] / this->m_fastOutputStrides[i]; inputIndices[0] += (idx0 + this->m_offsets[i]) * this->m_inputStrides[i]; inputIndices[1] += (idx1 + this->m_offsets[i]) * this->m_inputStrides[i]; indices[0] -= idx0 * this->m_outputStrides[i]; indices[1] -= idx1 * this->m_outputStrides[i]; } inputIndices[0] += (indices[0] + this->m_offsets[NumDims-1]); inputIndices[1] += (indices[1] + this->m_offsets[NumDims-1]); } if (inputIndices[1] - inputIndices[0] == packetSize - 1) { this->m_impl.template writePacket(inputIndices[0], x); } else { EIGEN_ALIGN_MAX CoeffReturnType values[packetSize]; internal::pstore(values, x); this->m_impl.coeffRef(inputIndices[0]) = values[0]; this->m_impl.coeffRef(inputIndices[1]) = values[packetSize-1]; EIGEN_UNROLL_LOOP for (int i = 1; i < packetSize-1; ++i) { this->coeffRef(index+i) = values[i]; } } } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void writeBlock( const TensorBlockDesc& desc, const TensorBlock& block) { TensorBlockDesc arg_desc = desc.WithOffset(this->srcCoeff(desc.offset())); this->m_impl.writeBlock(arg_desc, block); } }; namespace internal { template struct traits > : public traits { typedef typename XprType::Scalar Scalar; typedef traits XprTraits; typedef typename XprTraits::StorageKind StorageKind; typedef typename XprTraits::Index Index; typedef typename XprType::Nested Nested; typedef typename remove_reference::type _Nested; static const int NumDimensions = array_size::value; static const int Layout = XprTraits::Layout; typedef typename XprTraits::PointerType PointerType; }; template struct eval, Eigen::Dense> { typedef const TensorStridingSlicingOpEIGEN_DEVICE_REF type; }; template struct nested, 1, typename eval >::type> { typedef TensorStridingSlicingOp type; }; } // end namespace internal template class TensorStridingSlicingOp : public TensorBase > { public: typedef TensorBase > Base; typedef typename internal::traits::Scalar Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename internal::nested::type Nested; typedef typename internal::traits::StorageKind StorageKind; typedef typename internal::traits::Index Index; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorStridingSlicingOp( const XprType& expr, const StartIndices& startIndices, const StopIndices& stopIndices, const Strides& strides) : m_xpr(expr), m_startIndices(startIndices), m_stopIndices(stopIndices), m_strides(strides) {} EIGEN_DEVICE_FUNC const StartIndices& startIndices() const { return m_startIndices; } EIGEN_DEVICE_FUNC const StartIndices& stopIndices() const { return m_stopIndices; } EIGEN_DEVICE_FUNC const StartIndices& strides() const { return m_strides; } EIGEN_DEVICE_FUNC const typename internal::remove_all::type& expression() const { return m_xpr; } EIGEN_TENSOR_INHERIT_ASSIGNMENT_OPERATORS(TensorStridingSlicingOp) protected: typename XprType::Nested m_xpr; const StartIndices m_startIndices; const StopIndices m_stopIndices; const Strides m_strides; }; // Eval as rvalue template struct TensorEvaluator, Device> { typedef TensorStridingSlicingOp XprType; static const int NumDims = internal::array_size::value; typedef typename XprType::Index Index; typedef typename XprType::Scalar Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename PacketType::type PacketReturnType; typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; typedef Strides Dimensions; enum { // Alignment can't be guaranteed at compile time since it depends on the // slice offsets and sizes. IsAligned = false, PacketAccess = false, BlockAccess = false, PreferBlockAccess = TensorEvaluator::PreferBlockAccess, Layout = TensorEvaluator::Layout, RawAccess = false }; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockNotImplemented TensorBlock; //===--------------------------------------------------------------------===// EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device) : m_impl(op.expression(), device), m_device(device), m_strides(op.strides()) { // Handle degenerate intervals by gracefully clamping and allowing m_dimensions to be zero DSizes startIndicesClamped, stopIndicesClamped; for (ptrdiff_t i = 0; i < internal::array_size::value; ++i) { eigen_assert(m_strides[i] != 0 && "0 stride is invalid"); if (m_strides[i] > 0) { startIndicesClamped[i] = clamp(op.startIndices()[i], 0, m_impl.dimensions()[i]); stopIndicesClamped[i] = clamp(op.stopIndices()[i], 0, m_impl.dimensions()[i]); } else { /* implies m_strides[i] < 0 by assert */ startIndicesClamped[i] = clamp(op.startIndices()[i], -1, m_impl.dimensions()[i] - 1); stopIndicesClamped[i] = clamp(op.stopIndices()[i], -1, m_impl.dimensions()[i] - 1); } m_startIndices[i] = startIndicesClamped[i]; } typedef typename TensorEvaluator::Dimensions InputDimensions; const InputDimensions& input_dims = m_impl.dimensions(); // compute output tensor shape m_is_identity = true; for (int i = 0; i < NumDims; i++) { Index interval = stopIndicesClamped[i] - startIndicesClamped[i]; if (interval == 0 || ((interval < 0) != (m_strides[i] < 0))) { m_dimensions[i] = 0; } else { m_dimensions[i] = (interval / m_strides[i]) + (interval % m_strides[i] != 0 ? 1 : 0); eigen_assert(m_dimensions[i] >= 0); } if (m_strides[i] != 1 || interval != m_impl.dimensions()[i]) { m_is_identity = false; } } Strides output_dims = m_dimensions; if (static_cast(Layout) == static_cast(ColMajor)) { m_inputStrides[0] = m_strides[0]; m_offsets[0] = startIndicesClamped[0]; Index previousDimProduct = 1; for (int i = 1; i < NumDims; ++i) { previousDimProduct *= input_dims[i-1]; m_inputStrides[i] = previousDimProduct * m_strides[i]; m_offsets[i] = startIndicesClamped[i] * previousDimProduct; } // Don't initialize m_fastOutputStrides[0] since it won't ever be accessed. m_outputStrides[0] = 1; for (int i = 1; i < NumDims; ++i) { m_outputStrides[i] = m_outputStrides[i-1] * output_dims[i-1]; m_fastOutputStrides[i] = internal::TensorIntDivisor(m_outputStrides[i] > 0 ? m_outputStrides[i] : 1); } } else { m_inputStrides[NumDims-1] = m_strides[NumDims-1]; m_offsets[NumDims-1] = startIndicesClamped[NumDims-1]; Index previousDimProduct = 1; for (int i = NumDims - 2; i >= 0; --i) { previousDimProduct *= input_dims[i+1]; m_inputStrides[i] = previousDimProduct * m_strides[i]; m_offsets[i] = startIndicesClamped[i] * previousDimProduct; } m_outputStrides[NumDims-1] = 1; for (int i = NumDims - 2; i >= 0; --i) { m_outputStrides[i] = m_outputStrides[i+1] * output_dims[i+1]; m_fastOutputStrides[i] = internal::TensorIntDivisor(m_outputStrides[i] > 0 ? m_outputStrides[i] : 1); } } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; } EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(EvaluatorPointerType) { m_impl.evalSubExprsIfNeeded(NULL); return true; } EIGEN_STRONG_INLINE void cleanup() { m_impl.cleanup(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { if (m_is_identity) { return m_impl.coeff(index); } else { return m_impl.coeff(srcCoeff(index)); } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost costPerCoeff(bool vectorized) const { return m_impl.costPerCoeff(vectorized) + TensorOpCost(0, 0, m_is_identity ? 1 : NumDims); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename Storage::Type data() const { return NULL; } #ifdef EIGEN_USE_SYCL // binding placeholder accessors to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler &cgh) const { m_impl.bind(cgh); } #endif protected: EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index srcCoeff(Index index) const { Index inputIndex = 0; if (static_cast(Layout) == static_cast(ColMajor)) { EIGEN_UNROLL_LOOP for (int i = NumDims - 1; i >= 0; --i) { const Index idx = index / m_fastOutputStrides[i]; inputIndex += idx * m_inputStrides[i] + m_offsets[i]; index -= idx * m_outputStrides[i]; } } else { EIGEN_UNROLL_LOOP for (int i = 0; i < NumDims; ++i) { const Index idx = index / m_fastOutputStrides[i]; inputIndex += idx * m_inputStrides[i] + m_offsets[i]; index -= idx * m_outputStrides[i]; } } return inputIndex; } static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index clamp(Index value, Index min, Index max) { #ifndef SYCL_DEVICE_ONLY return numext::maxi(min, numext::mini(max,value)); #else return cl::sycl::clamp(value, min, max); #endif } array m_outputStrides; array, NumDims> m_fastOutputStrides; array m_inputStrides; bool m_is_identity; TensorEvaluator m_impl; const Device EIGEN_DEVICE_REF m_device; DSizes m_startIndices; // clamped startIndices DSizes m_dimensions; DSizes m_offsets; // offset in a flattened shape const Strides m_strides; }; // Eval as lvalue template struct TensorEvaluator, Device> : public TensorEvaluator, Device> { typedef TensorEvaluator, Device> Base; typedef TensorStridingSlicingOp XprType; static const int NumDims = internal::array_size::value; enum { IsAligned = false, PacketAccess = false, BlockAccess = false, PreferBlockAccess = TensorEvaluator::PreferBlockAccess, Layout = TensorEvaluator::Layout, CoordAccess = TensorEvaluator::CoordAccess, RawAccess = false }; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockNotImplemented TensorBlock; //===--------------------------------------------------------------------===// EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device) : Base(op, device) { } typedef typename XprType::Index Index; typedef typename XprType::Scalar Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename PacketType::type PacketReturnType; typedef Strides Dimensions; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType& coeffRef(Index index) { if (this->m_is_identity) { return this->m_impl.coeffRef(index); } else { return this->m_impl.coeffRef(this->srcCoeff(index)); } } }; } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_MORPHING_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorMap.h0000644000176200001440000003355714562417221025114 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_MAP_H #define EIGEN_CXX11_TENSOR_TENSOR_MAP_H namespace Eigen { // FIXME use proper doxygen documentation (e.g. \tparam MakePointer_) /** \class TensorMap * \ingroup CXX11_Tensor_Module * * \brief A tensor expression mapping an existing array of data. * */ /// `template class MakePointer_` is added to convert the host pointer to the device pointer. /// It is added due to the fact that for our device compiler `T*` is not allowed. /// If we wanted to use the same Evaluator functions we have to convert that type to our pointer `T`. /// This is done through our `MakePointer_` class. By default the Type in the `MakePointer_` is `T*` . /// Therefore, by adding the default value, we managed to convert the type and it does not break any /// existing code as its default value is `T*`. template class MakePointer_> class TensorMap : public TensorBase > { public: typedef TensorMap Self; typedef TensorBase > Base; #ifdef EIGEN_USE_SYCL typedef typename Eigen::internal::remove_reference::type>::type Nested; #else typedef typename Eigen::internal::nested::type Nested; #endif typedef typename internal::traits::StorageKind StorageKind; typedef typename internal::traits::Index Index; typedef typename internal::traits::Scalar Scalar; typedef typename NumTraits::Real RealScalar; typedef typename PlainObjectType::Base::CoeffReturnType CoeffReturnType; typedef typename MakePointer_::Type PointerType; typedef typename MakePointer_::ConstType PointerConstType; // WARN: PointerType still can be a pointer to const (const Scalar*), for // example in TensorMap> expression. This type of // expression should be illegal, but adding this restriction is not possible // in practice (see https://bitbucket.org/eigen/eigen/pull-requests/488). typedef typename internal::conditional< bool(internal::is_lvalue::value), PointerType, // use simple pointer in lvalue expressions PointerConstType // use const pointer in rvalue expressions >::type StoragePointerType; // If TensorMap was constructed over rvalue expression (e.g. const Tensor), // we should return a reference to const from operator() (and others), even // if TensorMap itself is not const. typedef typename internal::conditional< bool(internal::is_lvalue::value), Scalar&, const Scalar& >::type StorageRefType; static const int Options = Options_; static const Index NumIndices = PlainObjectType::NumIndices; typedef typename PlainObjectType::Dimensions Dimensions; enum { IsAligned = ((int(Options_)&Aligned)==Aligned), Layout = PlainObjectType::Layout, CoordAccess = true, RawAccess = true }; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorMap(StoragePointerType dataPtr) : m_data(dataPtr), m_dimensions() { // The number of dimensions used to construct a tensor must be equal to the rank of the tensor. EIGEN_STATIC_ASSERT((0 == NumIndices || NumIndices == Dynamic), YOU_MADE_A_PROGRAMMING_MISTAKE) } #if EIGEN_HAS_VARIADIC_TEMPLATES template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorMap(StoragePointerType dataPtr, Index firstDimension, IndexTypes... otherDimensions) : m_data(dataPtr), m_dimensions(firstDimension, otherDimensions...) { // The number of dimensions used to construct a tensor must be equal to the rank of the tensor. EIGEN_STATIC_ASSERT((sizeof...(otherDimensions) + 1 == NumIndices || NumIndices == Dynamic), YOU_MADE_A_PROGRAMMING_MISTAKE) } #else EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorMap(StoragePointerType dataPtr, Index firstDimension) : m_data(dataPtr), m_dimensions(firstDimension) { // The number of dimensions used to construct a tensor must be equal to the rank of the tensor. EIGEN_STATIC_ASSERT((1 == NumIndices || NumIndices == Dynamic), YOU_MADE_A_PROGRAMMING_MISTAKE) } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorMap(StoragePointerType dataPtr, Index dim1, Index dim2) : m_data(dataPtr), m_dimensions(dim1, dim2) { EIGEN_STATIC_ASSERT(2 == NumIndices || NumIndices == Dynamic, YOU_MADE_A_PROGRAMMING_MISTAKE) } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorMap(StoragePointerType dataPtr, Index dim1, Index dim2, Index dim3) : m_data(dataPtr), m_dimensions(dim1, dim2, dim3) { EIGEN_STATIC_ASSERT(3 == NumIndices || NumIndices == Dynamic, YOU_MADE_A_PROGRAMMING_MISTAKE) } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorMap(StoragePointerType dataPtr, Index dim1, Index dim2, Index dim3, Index dim4) : m_data(dataPtr), m_dimensions(dim1, dim2, dim3, dim4) { EIGEN_STATIC_ASSERT(4 == NumIndices || NumIndices == Dynamic, YOU_MADE_A_PROGRAMMING_MISTAKE) } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorMap(StoragePointerType dataPtr, Index dim1, Index dim2, Index dim3, Index dim4, Index dim5) : m_data(dataPtr), m_dimensions(dim1, dim2, dim3, dim4, dim5) { EIGEN_STATIC_ASSERT(5 == NumIndices || NumIndices == Dynamic, YOU_MADE_A_PROGRAMMING_MISTAKE) } #endif EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorMap(StoragePointerType dataPtr, const array& dimensions) : m_data(dataPtr), m_dimensions(dimensions) { } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorMap(StoragePointerType dataPtr, const Dimensions& dimensions) : m_data(dataPtr), m_dimensions(dimensions) { } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorMap(PlainObjectType& tensor) : m_data(tensor.data()), m_dimensions(tensor.dimensions()) { } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index rank() const { return m_dimensions.rank(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index dimension(Index n) const { return m_dimensions[n]; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index size() const { return m_dimensions.TotalSize(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE StoragePointerType data() { return m_data; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE StoragePointerType data() const { return m_data; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE StorageRefType operator()(const array& indices) const { // eigen_assert(checkIndexRange(indices)); if (PlainObjectType::Options&RowMajor) { const Index index = m_dimensions.IndexOfRowMajor(indices); return m_data[index]; } else { const Index index = m_dimensions.IndexOfColMajor(indices); return m_data[index]; } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE StorageRefType operator()() const { EIGEN_STATIC_ASSERT(NumIndices == 0, YOU_MADE_A_PROGRAMMING_MISTAKE) return m_data[0]; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE StorageRefType operator()(Index index) const { eigen_internal_assert(index >= 0 && index < size()); return m_data[index]; } #if EIGEN_HAS_VARIADIC_TEMPLATES template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE StorageRefType operator()(Index firstIndex, Index secondIndex, IndexTypes... otherIndices) const { EIGEN_STATIC_ASSERT(sizeof...(otherIndices) + 2 == NumIndices, YOU_MADE_A_PROGRAMMING_MISTAKE) eigen_assert(internal::all((Eigen::NumTraits::highest() >= otherIndices)...)); if (PlainObjectType::Options&RowMajor) { const Index index = m_dimensions.IndexOfRowMajor(array{{firstIndex, secondIndex, otherIndices...}}); return m_data[index]; } else { const Index index = m_dimensions.IndexOfColMajor(array{{firstIndex, secondIndex, otherIndices...}}); return m_data[index]; } } #else EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE StorageRefType operator()(Index i0, Index i1) const { if (PlainObjectType::Options&RowMajor) { const Index index = i1 + i0 * m_dimensions[1]; return m_data[index]; } else { const Index index = i0 + i1 * m_dimensions[0]; return m_data[index]; } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE StorageRefType operator()(Index i0, Index i1, Index i2) const { if (PlainObjectType::Options&RowMajor) { const Index index = i2 + m_dimensions[2] * (i1 + m_dimensions[1] * i0); return m_data[index]; } else { const Index index = i0 + m_dimensions[0] * (i1 + m_dimensions[1] * i2); return m_data[index]; } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE StorageRefType operator()(Index i0, Index i1, Index i2, Index i3) const { if (PlainObjectType::Options&RowMajor) { const Index index = i3 + m_dimensions[3] * (i2 + m_dimensions[2] * (i1 + m_dimensions[1] * i0)); return m_data[index]; } else { const Index index = i0 + m_dimensions[0] * (i1 + m_dimensions[1] * (i2 + m_dimensions[2] * i3)); return m_data[index]; } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE StorageRefType operator()(Index i0, Index i1, Index i2, Index i3, Index i4) const { if (PlainObjectType::Options&RowMajor) { const Index index = i4 + m_dimensions[4] * (i3 + m_dimensions[3] * (i2 + m_dimensions[2] * (i1 + m_dimensions[1] * i0))); return m_data[index]; } else { const Index index = i0 + m_dimensions[0] * (i1 + m_dimensions[1] * (i2 + m_dimensions[2] * (i3 + m_dimensions[3] * i4))); return m_data[index]; } } #endif EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE StorageRefType operator()(const array& indices) { // eigen_assert(checkIndexRange(indices)); if (PlainObjectType::Options&RowMajor) { const Index index = m_dimensions.IndexOfRowMajor(indices); return m_data[index]; } else { const Index index = m_dimensions.IndexOfColMajor(indices); return m_data[index]; } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE StorageRefType operator()() { EIGEN_STATIC_ASSERT(NumIndices == 0, YOU_MADE_A_PROGRAMMING_MISTAKE) return m_data[0]; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE StorageRefType operator()(Index index) { eigen_internal_assert(index >= 0 && index < size()); return m_data[index]; } #if EIGEN_HAS_VARIADIC_TEMPLATES template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE StorageRefType operator()(Index firstIndex, Index secondIndex, IndexTypes... otherIndices) { static_assert(sizeof...(otherIndices) + 2 == NumIndices || NumIndices == Dynamic, "Number of indices used to access a tensor coefficient must be equal to the rank of the tensor."); eigen_assert(internal::all((Eigen::NumTraits::highest() >= otherIndices)...)); const std::size_t NumDims = sizeof...(otherIndices) + 2; if (PlainObjectType::Options&RowMajor) { const Index index = m_dimensions.IndexOfRowMajor(array{{firstIndex, secondIndex, otherIndices...}}); return m_data[index]; } else { const Index index = m_dimensions.IndexOfColMajor(array{{firstIndex, secondIndex, otherIndices...}}); return m_data[index]; } } #else EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE StorageRefType operator()(Index i0, Index i1) { if (PlainObjectType::Options&RowMajor) { const Index index = i1 + i0 * m_dimensions[1]; return m_data[index]; } else { const Index index = i0 + i1 * m_dimensions[0]; return m_data[index]; } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE StorageRefType operator()(Index i0, Index i1, Index i2) { if (PlainObjectType::Options&RowMajor) { const Index index = i2 + m_dimensions[2] * (i1 + m_dimensions[1] * i0); return m_data[index]; } else { const Index index = i0 + m_dimensions[0] * (i1 + m_dimensions[1] * i2); return m_data[index]; } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE StorageRefType operator()(Index i0, Index i1, Index i2, Index i3) { if (PlainObjectType::Options&RowMajor) { const Index index = i3 + m_dimensions[3] * (i2 + m_dimensions[2] * (i1 + m_dimensions[1] * i0)); return m_data[index]; } else { const Index index = i0 + m_dimensions[0] * (i1 + m_dimensions[1] * (i2 + m_dimensions[2] * i3)); return m_data[index]; } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE StorageRefType operator()(Index i0, Index i1, Index i2, Index i3, Index i4) { if (PlainObjectType::Options&RowMajor) { const Index index = i4 + m_dimensions[4] * (i3 + m_dimensions[3] * (i2 + m_dimensions[2] * (i1 + m_dimensions[1] * i0))); return m_data[index]; } else { const Index index = i0 + m_dimensions[0] * (i1 + m_dimensions[1] * (i2 + m_dimensions[2] * (i3 + m_dimensions[3] * i4))); return m_data[index]; } } #endif EIGEN_TENSOR_INHERIT_ASSIGNMENT_OPERATORS(TensorMap) private: StoragePointerType m_data; Dimensions m_dimensions; }; } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_MAP_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorIndexList.h0000644000176200001440000006213414562417221026273 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_INDEX_LIST_H #define EIGEN_CXX11_TENSOR_TENSOR_INDEX_LIST_H #if EIGEN_HAS_CONSTEXPR && EIGEN_HAS_VARIADIC_TEMPLATES #define EIGEN_HAS_INDEX_LIST namespace Eigen { /** \internal * * \class TensorIndexList * \ingroup CXX11_Tensor_Module * * \brief Set of classes used to encode a set of Tensor dimensions/indices. * * The indices in the list can be known at compile time or at runtime. A mix * of static and dynamic indices can also be provided if needed. The tensor * code will attempt to take advantage of the indices that are known at * compile time to optimize the code it generates. * * This functionality requires a c++11 compliant compiler. If your compiler * is older you need to use arrays of indices instead. * * Several examples are provided in the cxx11_tensor_index_list.cpp file. * * \sa Tensor */ template struct type2index { static const Index value = n; EIGEN_DEVICE_FUNC constexpr operator Index() const { return n; } EIGEN_DEVICE_FUNC void set(Index val) { eigen_assert(val == n); } }; // This can be used with IndexPairList to get compile-time constant pairs, // such as IndexPairList, type2indexpair<3,4>>(). template struct type2indexpair { static const Index first = f; static const Index second = s; constexpr EIGEN_DEVICE_FUNC operator IndexPair() const { return IndexPair(f, s); } EIGEN_DEVICE_FUNC void set(const IndexPair& val) { eigen_assert(val.first == f); eigen_assert(val.second == s); } }; template struct NumTraits > { typedef Index Real; enum { IsComplex = 0, RequireInitialization = false, ReadCost = 1, AddCost = 1, MulCost = 1 }; EIGEN_DEVICE_FUNC static EIGEN_CONSTEXPR EIGEN_STRONG_INLINE Real epsilon() { return 0; } EIGEN_DEVICE_FUNC static EIGEN_CONSTEXPR EIGEN_STRONG_INLINE Real dummy_precision() { return 0; } EIGEN_DEVICE_FUNC static EIGEN_CONSTEXPR EIGEN_STRONG_INLINE Real highest() { return n; } EIGEN_DEVICE_FUNC static EIGEN_CONSTEXPR EIGEN_STRONG_INLINE Real lowest() { return n; } }; namespace internal { template EIGEN_DEVICE_FUNC void update_value(T& val, Index new_val) { val = internal::convert_index(new_val); } template EIGEN_DEVICE_FUNC void update_value(type2index& val, Index new_val) { val.set(new_val); } template EIGEN_DEVICE_FUNC void update_value(T& val, IndexPair new_val) { val = new_val; } template EIGEN_DEVICE_FUNC void update_value(type2indexpair& val, IndexPair new_val) { val.set(new_val); } template struct is_compile_time_constant { static constexpr bool value = false; }; template struct is_compile_time_constant > { static constexpr bool value = true; }; template struct is_compile_time_constant > { static constexpr bool value = true; }; template struct is_compile_time_constant& > { static constexpr bool value = true; }; template struct is_compile_time_constant& > { static constexpr bool value = true; }; template struct is_compile_time_constant > { static constexpr bool value = true; }; template struct is_compile_time_constant > { static constexpr bool value = true; }; template struct is_compile_time_constant& > { static constexpr bool value = true; }; template struct is_compile_time_constant& > { static constexpr bool value = true; }; template struct IndexTuple; template struct IndexTuple { EIGEN_DEVICE_FUNC constexpr IndexTuple() : head(), others() { } EIGEN_DEVICE_FUNC constexpr IndexTuple(const T& v, const O... o) : head(v), others(o...) { } constexpr static int count = 1 + sizeof...(O); T head; IndexTuple others; typedef T Head; typedef IndexTuple Other; }; template struct IndexTuple { EIGEN_DEVICE_FUNC constexpr IndexTuple() : head() { } EIGEN_DEVICE_FUNC constexpr IndexTuple(const T& v) : head(v) { } constexpr static int count = 1; T head; typedef T Head; }; template struct IndexTupleExtractor; template struct IndexTupleExtractor { typedef typename IndexTupleExtractor::ValType ValType; EIGEN_DEVICE_FUNC static constexpr ValType& get_val(IndexTuple& val) { return IndexTupleExtractor::get_val(val.others); } EIGEN_DEVICE_FUNC static constexpr const ValType& get_val(const IndexTuple& val) { return IndexTupleExtractor::get_val(val.others); } template EIGEN_DEVICE_FUNC static void set_val(IndexTuple& val, V& new_val) { IndexTupleExtractor::set_val(val.others, new_val); } }; template struct IndexTupleExtractor<0, T, O...> { typedef T ValType; EIGEN_DEVICE_FUNC static constexpr ValType& get_val(IndexTuple& val) { return val.head; } EIGEN_DEVICE_FUNC static constexpr const ValType& get_val(const IndexTuple& val) { return val.head; } template EIGEN_DEVICE_FUNC static void set_val(IndexTuple& val, V& new_val) { val.head = new_val; } }; template EIGEN_DEVICE_FUNC constexpr typename IndexTupleExtractor::ValType& array_get(IndexTuple& tuple) { return IndexTupleExtractor::get_val(tuple); } template EIGEN_DEVICE_FUNC constexpr const typename IndexTupleExtractor::ValType& array_get(const IndexTuple& tuple) { return IndexTupleExtractor::get_val(tuple); } template struct array_size > { static const size_t value = IndexTuple::count; }; template struct array_size > { static const size_t value = IndexTuple::count; }; template struct tuple_coeff { template EIGEN_DEVICE_FUNC static constexpr ValueT get(const Index i, const IndexTuple& t) { // return array_get(t) * (i == Idx) + tuple_coeff::get(i, t) * (i != Idx); return (i == Idx ? array_get(t) : tuple_coeff::get(i, t)); } template EIGEN_DEVICE_FUNC static void set(const Index i, IndexTuple& t, const ValueT& value) { if (i == Idx) { update_value(array_get(t), value); } else { tuple_coeff::set(i, t, value); } } template EIGEN_DEVICE_FUNC static constexpr bool value_known_statically(const Index i, const IndexTuple& t) { return ((i == Idx) & is_compile_time_constant::ValType>::value) || tuple_coeff::value_known_statically(i, t); } template EIGEN_DEVICE_FUNC static constexpr bool values_up_to_known_statically(const IndexTuple& t) { return is_compile_time_constant::ValType>::value && tuple_coeff::values_up_to_known_statically(t); } template EIGEN_DEVICE_FUNC static constexpr bool values_up_to_statically_known_to_increase(const IndexTuple& t) { return is_compile_time_constant::ValType>::value && is_compile_time_constant::ValType>::value && array_get(t) > array_get(t) && tuple_coeff::values_up_to_statically_known_to_increase(t); } }; template struct tuple_coeff<0, ValueT> { template EIGEN_DEVICE_FUNC static constexpr ValueT get(const Index /*i*/, const IndexTuple& t) { // eigen_assert (i == 0); // gcc fails to compile assertions in constexpr return array_get<0>(t)/* * (i == 0)*/; } template EIGEN_DEVICE_FUNC static void set(const Index i, IndexTuple& t, const ValueT value) { eigen_assert (i == 0); update_value(array_get<0>(t), value); } template EIGEN_DEVICE_FUNC static constexpr bool value_known_statically(const Index i, const IndexTuple&) { return is_compile_time_constant::ValType>::value && (i == 0); } template EIGEN_DEVICE_FUNC static constexpr bool values_up_to_known_statically(const IndexTuple&) { return is_compile_time_constant::ValType>::value; } template EIGEN_DEVICE_FUNC static constexpr bool values_up_to_statically_known_to_increase(const IndexTuple&) { return true; } }; } // namespace internal template struct IndexList : internal::IndexTuple { EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC constexpr Index operator[] (const Index i) const { return internal::tuple_coeff >::value-1, Index>::get(i, *this); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC constexpr Index get(const Index i) const { return internal::tuple_coeff >::value-1, Index>::get(i, *this); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC void set(const Index i, const Index value) { return internal::tuple_coeff >::value-1, Index>::set(i, *this, value); } EIGEN_DEVICE_FUNC constexpr IndexList(const internal::IndexTuple& other) : internal::IndexTuple(other) { } EIGEN_DEVICE_FUNC constexpr IndexList(FirstType& first, OtherTypes... other) : internal::IndexTuple(first, other...) { } EIGEN_DEVICE_FUNC constexpr IndexList() : internal::IndexTuple() { } EIGEN_DEVICE_FUNC constexpr bool value_known_statically(const Index i) const { return internal::tuple_coeff >::value-1, Index>::value_known_statically(i, *this); } EIGEN_DEVICE_FUNC constexpr bool all_values_known_statically() const { return internal::tuple_coeff >::value-1, Index>::values_up_to_known_statically(*this); } EIGEN_DEVICE_FUNC constexpr bool values_statically_known_to_increase() const { return internal::tuple_coeff >::value-1, Index>::values_up_to_statically_known_to_increase(*this); } }; template std::ostream& operator<<(std::ostream& os, const IndexList& dims) { os << "["; for (size_t i = 0; i < 1 + sizeof...(OtherTypes); ++i) { if (i > 0) os << ", "; os << dims[i]; } os << "]"; return os; } template constexpr IndexList make_index_list(FirstType val1, OtherTypes... other_vals) { return IndexList(val1, other_vals...); } template struct IndexPairList : internal::IndexTuple { EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC constexpr IndexPair operator[] (const Index i) const { return internal::tuple_coeff >::value-1, IndexPair>::get(i, *this); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC void set(const Index i, const IndexPair value) { return internal::tuple_coeff>::value-1, IndexPair >::set(i, *this, value); } EIGEN_DEVICE_FUNC constexpr IndexPairList(const internal::IndexTuple& other) : internal::IndexTuple(other) { } EIGEN_DEVICE_FUNC constexpr IndexPairList() : internal::IndexTuple() { } EIGEN_DEVICE_FUNC constexpr bool value_known_statically(const Index i) const { return internal::tuple_coeff >::value-1, Index>::value_known_statically(i, *this); } }; namespace internal { template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index array_prod(const IndexList& sizes) { Index result = 1; EIGEN_UNROLL_LOOP for (size_t i = 0; i < array_size >::value; ++i) { result *= sizes[i]; } return result; } template struct array_size > { static const size_t value = array_size >::value; }; template struct array_size > { static const size_t value = array_size >::value; }; template struct array_size > { static const size_t value = std::tuple_size >::value; }; template struct array_size > { static const size_t value = std::tuple_size >::value; }; template EIGEN_DEVICE_FUNC constexpr Index array_get(IndexList& a) { return IndexTupleExtractor::get_val(a); } template EIGEN_DEVICE_FUNC constexpr Index array_get(const IndexList& a) { return IndexTupleExtractor::get_val(a); } template struct index_known_statically_impl { EIGEN_DEVICE_FUNC static constexpr bool run(const Index) { return false; } }; template struct index_known_statically_impl > { EIGEN_DEVICE_FUNC static constexpr bool run(const Index i) { return IndexList().value_known_statically(i); } }; template struct index_known_statically_impl > { EIGEN_DEVICE_FUNC static constexpr bool run(const Index i) { return IndexList().value_known_statically(i); } }; template struct all_indices_known_statically_impl { static constexpr bool run() { return false; } }; template struct all_indices_known_statically_impl > { EIGEN_DEVICE_FUNC static constexpr bool run() { return IndexList().all_values_known_statically(); } }; template struct all_indices_known_statically_impl > { EIGEN_DEVICE_FUNC static constexpr bool run() { return IndexList().all_values_known_statically(); } }; template struct indices_statically_known_to_increase_impl { EIGEN_DEVICE_FUNC static constexpr bool run() { return false; } }; template struct indices_statically_known_to_increase_impl > { EIGEN_DEVICE_FUNC static constexpr bool run() { return Eigen::IndexList().values_statically_known_to_increase(); } }; template struct indices_statically_known_to_increase_impl > { EIGEN_DEVICE_FUNC static constexpr bool run() { return Eigen::IndexList().values_statically_known_to_increase(); } }; template struct index_statically_eq_impl { EIGEN_DEVICE_FUNC static constexpr bool run(Index, Index) { return false; } }; template struct index_statically_eq_impl > { EIGEN_DEVICE_FUNC static constexpr bool run(const Index i, const Index value) { return IndexList().value_known_statically(i) & (IndexList().get(i) == value); } }; template struct index_statically_eq_impl > { EIGEN_DEVICE_FUNC static constexpr bool run(const Index i, const Index value) { return IndexList().value_known_statically(i) & (IndexList().get(i) == value); } }; template struct index_statically_ne_impl { EIGEN_DEVICE_FUNC static constexpr bool run(Index, Index) { return false; } }; template struct index_statically_ne_impl > { EIGEN_DEVICE_FUNC static constexpr bool run(const Index i, const Index value) { return IndexList().value_known_statically(i) & (IndexList().get(i) != value); } }; template struct index_statically_ne_impl > { EIGEN_DEVICE_FUNC static constexpr bool run(const Index i, const Index value) { return IndexList().value_known_statically(i) & (IndexList().get(i) != value); } }; template struct index_statically_gt_impl { EIGEN_DEVICE_FUNC static constexpr bool run(Index, Index) { return false; } }; template struct index_statically_gt_impl > { EIGEN_DEVICE_FUNC static constexpr bool run(const Index i, const Index value) { return IndexList().value_known_statically(i) & (IndexList().get(i) > value); } }; template struct index_statically_gt_impl > { EIGEN_DEVICE_FUNC static constexpr bool run(const Index i, const Index value) { return IndexList().value_known_statically(i) & (IndexList().get(i) > value); } }; template struct index_statically_lt_impl { EIGEN_DEVICE_FUNC static constexpr bool run(Index, Index) { return false; } }; template struct index_statically_lt_impl > { EIGEN_DEVICE_FUNC static constexpr bool run(const Index i, const Index value) { return IndexList().value_known_statically(i) & (IndexList().get(i) < value); } }; template struct index_statically_lt_impl > { EIGEN_DEVICE_FUNC static constexpr bool run(const Index i, const Index value) { return IndexList().value_known_statically(i) & (IndexList().get(i) < value); } }; template struct index_pair_first_statically_eq_impl { EIGEN_DEVICE_FUNC static constexpr bool run(Index, Index) { return false; } }; template struct index_pair_first_statically_eq_impl > { EIGEN_DEVICE_FUNC static constexpr bool run(const Index i, const Index value) { return IndexPairList().value_known_statically(i) & (IndexPairList().operator[](i).first == value); } }; template struct index_pair_first_statically_eq_impl > { EIGEN_DEVICE_FUNC static constexpr bool run(const Index i, const Index value) { return IndexPairList().value_known_statically(i) & (IndexPairList().operator[](i).first == value); } }; template struct index_pair_second_statically_eq_impl { EIGEN_DEVICE_FUNC static constexpr bool run(Index, Index) { return false; } }; template struct index_pair_second_statically_eq_impl > { EIGEN_DEVICE_FUNC static constexpr bool run(const Index i, const Index value) { return IndexPairList().value_known_statically(i) & (IndexPairList().operator[](i).second == value); } }; template struct index_pair_second_statically_eq_impl > { EIGEN_DEVICE_FUNC static constexpr bool run(const Index i, const Index value) { return IndexPairList().value_known_statically(i) & (IndexPairList().operator[](i).second == value); } }; } // end namespace internal } // end namespace Eigen #else namespace Eigen { namespace internal { template struct index_known_statically_impl { static EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool run(const Index) { return false; } }; template struct all_indices_known_statically_impl { static EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool run() { return false; } }; template struct indices_statically_known_to_increase_impl { static EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool run() { return false; } }; template struct index_statically_eq_impl { static EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool run(Index, Index) { return false; } }; template struct index_statically_ne_impl { static EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool run(Index, Index) { return false; } }; template struct index_statically_gt_impl { static EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool run(Index, Index) { return false; } }; template struct index_statically_lt_impl { static EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool run(Index, Index) { return false; } }; template struct index_pair_first_statically_eq_impl { static EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool run(Index, Index) { return false; } }; template struct index_pair_second_statically_eq_impl { static EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool run(Index, Index) { return false; } }; } // end namespace internal } // end namespace Eigen #endif namespace Eigen { namespace internal { template static EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR bool index_known_statically(Index i) { return index_known_statically_impl::run(i); } template static EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR bool all_indices_known_statically() { return all_indices_known_statically_impl::run(); } template static EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR bool indices_statically_known_to_increase() { return indices_statically_known_to_increase_impl::run(); } template static EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR bool index_statically_eq(Index i, Index value) { return index_statically_eq_impl::run(i, value); } template static EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR bool index_statically_ne(Index i, Index value) { return index_statically_ne_impl::run(i, value); } template static EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR bool index_statically_gt(Index i, Index value) { return index_statically_gt_impl::run(i, value); } template static EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR bool index_statically_lt(Index i, Index value) { return index_statically_lt_impl::run(i, value); } template static EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR bool index_pair_first_statically_eq(Index i, Index value) { return index_pair_first_statically_eq_impl::run(i, value); } template static EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR bool index_pair_second_statically_eq(Index i, Index value) { return index_pair_second_statically_eq_impl::run(i, value); } } // end namespace internal } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_INDEX_LIST_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorDimensionList.h0000644000176200001440000001677214107270226027155 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2015 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_DIMENSION_LIST_H #define EIGEN_CXX11_TENSOR_TENSOR_DIMENSION_LIST_H namespace Eigen { /** \internal * * \class TensorDimensionList * \ingroup CXX11_Tensor_Module * * \brief Special case of tensor index list used to list all the dimensions of a tensor of rank n. * * \sa Tensor */ template struct DimensionList { EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE const Index operator[] (const Index i) const { return i; } }; namespace internal { template struct array_size > { static const size_t value = Rank; }; template struct array_size > { static const size_t value = Rank; }; template const Index array_get(DimensionList&) { return n; } template const Index array_get(const DimensionList&) { return n; } #if EIGEN_HAS_CONSTEXPR template struct index_known_statically_impl > { EIGEN_DEVICE_FUNC static constexpr bool run(const DenseIndex) { return true; } }; template struct index_known_statically_impl > { EIGEN_DEVICE_FUNC static constexpr bool run(const DenseIndex) { return true; } }; template struct all_indices_known_statically_impl > { EIGEN_DEVICE_FUNC static constexpr bool run() { return true; } }; template struct all_indices_known_statically_impl > { EIGEN_DEVICE_FUNC static constexpr bool run() { return true; } }; template struct indices_statically_known_to_increase_impl > { EIGEN_DEVICE_FUNC static constexpr bool run() { return true; } }; template struct indices_statically_known_to_increase_impl > { EIGEN_DEVICE_FUNC static constexpr bool run() { return true; } }; template struct index_statically_eq_impl > { static constexpr bool run(const DenseIndex i, const DenseIndex value) { return i == value; } }; template struct index_statically_eq_impl > { EIGEN_DEVICE_FUNC static constexpr bool run(const DenseIndex i, const DenseIndex value) { return i == value; } }; template struct index_statically_ne_impl > { EIGEN_DEVICE_FUNC static constexpr bool run(const DenseIndex i, const DenseIndex value) { return i != value; } }; template struct index_statically_ne_impl > { static constexpr bool run(const DenseIndex i, const DenseIndex value) { return i != value; } }; template struct index_statically_gt_impl > { EIGEN_DEVICE_FUNC static constexpr bool run(const DenseIndex i, const DenseIndex value) { return i > value; } }; template struct index_statically_gt_impl > { EIGEN_DEVICE_FUNC static constexpr bool run(const DenseIndex i, const DenseIndex value) { return i > value; } }; template struct index_statically_lt_impl > { EIGEN_DEVICE_FUNC static constexpr bool run(const DenseIndex i, const DenseIndex value) { return i < value; } }; template struct index_statically_lt_impl > { EIGEN_DEVICE_FUNC static constexpr bool run(const DenseIndex i, const DenseIndex value) { return i < value; } }; #else template struct index_known_statically_impl > { EIGEN_DEVICE_FUNC static EIGEN_ALWAYS_INLINE bool run(const DenseIndex) { return true; } }; template struct index_known_statically_impl > { EIGEN_DEVICE_FUNC static EIGEN_ALWAYS_INLINE bool run(const DenseIndex) { return true; } }; template struct all_indices_known_statically_impl > { EIGEN_DEVICE_FUNC static EIGEN_ALWAYS_INLINE bool run() { return true; } }; template struct all_indices_known_statically_impl > { EIGEN_DEVICE_FUNC static EIGEN_ALWAYS_INLINE bool run() { return true; } }; template struct indices_statically_known_to_increase_impl > { static EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool run() { return true; } }; template struct indices_statically_known_to_increase_impl > { static EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool run() { return true; } }; template struct index_statically_eq_impl > { static EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool run(const DenseIndex, const DenseIndex) { return false; } }; template struct index_statically_eq_impl > { static EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool run(const DenseIndex, const DenseIndex) { return false; } }; template struct index_statically_ne_impl > { static EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool run(const DenseIndex, const DenseIndex){ return false; } }; template struct index_statically_ne_impl > { static EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool run(const DenseIndex, const DenseIndex) { return false; } }; template struct index_statically_gt_impl > { static EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool run(const DenseIndex, const DenseIndex) { return false; } }; template struct index_statically_gt_impl > { static EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool run(const DenseIndex, const DenseIndex) { return false; } }; template struct index_statically_lt_impl > { static EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool run(const DenseIndex, const DenseIndex) { return false; } }; template struct index_statically_lt_impl > { static EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool run(const DenseIndex, const DenseIndex) { return false; } }; #endif } // end namespace internal } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_DIMENSION_LIST_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorContractionThreadPool.h0000644000176200001440000021203714562417221030634 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_CONTRACTION_THREAD_POOL_H #define EIGEN_CXX11_TENSOR_TENSOR_CONTRACTION_THREAD_POOL_H // evaluator for thread pool device #ifdef EIGEN_USE_THREADS namespace Eigen { template struct TensorEvaluator, ThreadPoolDevice> : public TensorContractionEvaluatorBase, ThreadPoolDevice> > { typedef ThreadPoolDevice Device; typedef TensorEvaluator, Device> Self; typedef TensorContractionEvaluatorBase Base; typedef TensorContractionOp XprType; typedef typename internal::remove_const::type Scalar; typedef typename XprType::Index Index; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename PacketType::type PacketReturnType; enum { Layout = TensorEvaluator::Layout, }; // Most of the code is assuming that both input tensors are ColMajor. If the // inputs are RowMajor, we will "cheat" by swapping the LHS and RHS: // If we want to compute A * B = C, where A is LHS and B is RHS, the code // will pretend B is LHS and A is RHS. typedef typename internal::conditional< static_cast(Layout) == static_cast(ColMajor), LeftArgType, RightArgType>::type EvalLeftArgType; typedef typename internal::conditional< static_cast(Layout) == static_cast(ColMajor), RightArgType, LeftArgType>::type EvalRightArgType; static const int LDims = internal::array_size::Dimensions>::value; static const int RDims = internal::array_size::Dimensions>::value; static const int ContractDims = internal::array_size::value; typedef array left_dim_mapper_t; typedef array right_dim_mapper_t; typedef array contract_t; typedef array left_nocontract_t; typedef array right_nocontract_t; static const int NumDims = LDims + RDims - 2 * ContractDims; typedef DSizes Dimensions; // typedefs needed in evalTo typedef typename internal::remove_const::type LhsScalar; typedef typename internal::remove_const::type RhsScalar; typedef typename internal::gebp_traits Traits; typedef TensorEvaluator LeftEvaluator; typedef TensorEvaluator RightEvaluator; TensorEvaluator(const XprType& op, const Device& device) : Base(op, device) {} template void evalProduct(Scalar* buffer) const { evalProductImpl(buffer, NoCallback()); } template void evalProductAsync(Scalar* buffer, EvalToCallback done) const { evalProductImpl(buffer, std::move(done)); } template void evalProductImpl(Scalar* buffer, DoneCallback done) const { // This function computes a lot of heuristics in multiple steps, and it // also has multiple exit points. To keep it sane, readable and all in one // place, sync/async execution decision is made at runtime at the very end. // // (1) In sync mode we allocate Context on the stack, submit computations // to the device thread pool, and block on a barrier until it is // completed. // // (2) In async mode we allocate Context on the heap, and after all tasks // are finished, we call provided the done callback, and delete a // context from the heap. // // (*) EvalParallelContext & EvalShardedByInnerDimContext owns all the state // and temporary buffers, requried for executing the tensor contraction. // They are responsible for cleaning it up after contraction is done. static const bool IsEvalInSyncMode = std::is_same::value; const Index m = this->m_i_size; const Index n = this->m_j_size; const Index k = this->m_k_size; if (m == 0 || n == 0 || k == 0) return; // Compute a set of algorithm parameters: // - kernel block sizes (bm, bn, bk) // - task grain sizes (number of kernels executed per task: gm, gn) // - number of threads // - sharding by row/column // - parallel packing or first lhs then rhs // and some derived parameters: // - number of tasks (nm, nn, nk) // - number of kernels (nm0, nn0) // Unfortunately, all these parameters are tightly interdependent. // So in some cases we first compute approximate values, then compute other // values based on these approximations and then refine the approximations. // There are lots of heuristics here. There is some reasoning behind them, // but ultimately they are just tuned on contraction benchmarks for // different input configurations, thread counts and instruction sets. // So feel free to question any of them. // Compute whether we want to shard by row or by column. // This is a first approximation, it will be refined later. Since we don't // know number of threads yet we use 2, because what's we are most // interested in at this point is whether it makes sense to use // parallelization at all or not. bool shard_by_col = shardByCol(m, n, 2); // First approximation of kernel blocking sizes. // Again, we don't know number of threads yet, so we use 2. Index bm, bn, bk; if (shard_by_col) { internal::TensorContractionBlocking blocking(k, m, n, 2); bm = blocking.mc(); bn = blocking.nc(); bk = blocking.kc(); } else { internal::TensorContractionBlocking blocking(k, m, n, 2); bm = blocking.mc(); bn = blocking.nc(); bk = blocking.kc(); } // Compute optimal number of threads. // Note: we use bk instead of k here because we are interested in amount of // _parallelizable_ computations, and computations are not parallelizable // across k dimension. const TensorOpCost cost = contractionCost(m, n, bm, bn, bk, shard_by_col, false); int num_threads = TensorCostModel::numThreads( static_cast(n) * m, cost, this->m_device.numThreads()); int num_threads_by_k = numThreadsInnerDim(m, n, k); if (shardByInnerDim(m, n, k, num_threads, num_threads_by_k)) { // We are in the scenario where it is more effective to shard by the // inner dimension. if (IsEvalInSyncMode) { EvalShardedByInnerDimContext ctx( this, num_threads_by_k, buffer, m, n, k, std::move(done)); ctx.template run(); } else { auto* ctx = new EvalShardedByInnerDimContext( this, num_threads_by_k, buffer, m, n, k, std::move(done)); ctx->template runAsync(); } return; } // TODO(dvyukov): this is a stop-gap to prevent regressions while the cost // model is not tuned. Remove this when the cost model is tuned. if (n == 1) num_threads = 1; if (num_threads == 1) { TENSOR_CONTRACTION_DISPATCH(this->template evalProductSequential, Unaligned, (buffer)); if (!IsEvalInSyncMode) done(); return; } // Now that we know number of threads, recalculate sharding and blocking. shard_by_col = shardByCol(m, n, num_threads); if (shard_by_col) { internal::TensorContractionBlocking blocking(k, m, n, num_threads); bm = blocking.mc(); bn = blocking.nc(); bk = blocking.kc(); } else { internal::TensorContractionBlocking blocking(k, m, n, num_threads); bm = blocking.mc(); bn = blocking.nc(); bk = blocking.kc(); } // Number of kernels for each dimension. Index nm0 = divup(m, bm); Index nn0 = divup(n, bn); Index nk = divup(k, bk); // Calculate task grain size (number of kernels executed per task). // This task size coarsening serves two purposes: // 1. It reduces per-task overheads including synchronization overheads. // 2. It allows to use caches better (reuse the same packed rhs in several // consecutive kernels). Index gm = 1; Index gn = 1; // If we are sharding by column, then we prefer to reduce rows first. if (shard_by_col) { gm = coarsenM(m, n, bm, bn, bk, gn, num_threads, shard_by_col); gn = coarsenN(m, n, bm, bn, bk, gm, num_threads, shard_by_col); } else { gn = coarsenN(m, n, bm, bn, bk, gm, num_threads, shard_by_col); gm = coarsenM(m, n, bm, bn, bk, gn, num_threads, shard_by_col); } // Number of tasks in each dimension. Index nm = divup(nm0, gm); Index nn = divup(nn0, gn); // If there is enough concurrency in the sharding dimension, we choose not // to paralellize by the other dimension, and execute all kernels in sync // mode. This reduces parallelism from the nm x nn down to nn // (shard_by_col==true) or nm (shard_by_col==false). const Index sharding_dim_tasks = shard_by_col ? nn : nm; const int num_worker_threads = this->m_device.numThreadsInPool(); // With small number of threads we want to make sure that we do not reduce // parallelism too much. With large number of threads we trade maximum // parallelism for better memory locality. const float oversharding_factor = num_worker_threads <= 4 ? 8.0 : num_worker_threads <= 8 ? 4.0 : num_worker_threads <= 16 ? 2.0 : num_worker_threads <= 32 ? 1.0 : num_worker_threads <= 64 ? 0.8 : /* num_worker_threads > 64 */ 0.6; const bool parallelize_by_sharding_dim_only = sharding_dim_tasks >= oversharding_factor * num_worker_threads; // Last by not least, decide whether we want to issue both lhs and rhs // packing in parallel; or issue lhs packing first, and then issue rhs // packing when lhs packing completes (for !shard_by_col lhs and rhs are // swapped). Parallel packing allows more parallelism (for both packing and // kernels), while sequential packing provides better locality (once // a thread finishes rhs packing it proceed to kernels with that rhs). // First, we are interested in parallel packing if there are few tasks. bool parallel_pack = num_threads >= nm * nn; // Also do parallel packing if all data fits into L2$. if (m * bk * Index(sizeof(LhsScalar)) + n * bk * Index(sizeof(RhsScalar)) <= l2CacheSize() * num_threads) parallel_pack = true; // But don't do it if we will use each rhs only once. Locality seems to be // more important in this case. if ((shard_by_col ? nm : nn) == 1) parallel_pack = false; // Also don't get in the way of parallelize_by_sharding_dim_only // optimization. if (parallelize_by_sharding_dim_only) parallel_pack = false; // TODO(ezhulnev): With if contexpr we don't need SyncEvalParallelContext. if (IsEvalInSyncMode) { #define CONTEXT_ARGS \ (this, num_threads, buffer, m, n, k, bm, bn, bk, nm, nn, nk, gm, gn, nm0, \ nn0, shard_by_col, parallel_pack, parallelize_by_sharding_dim_only, \ NoCallback()) \ .run() TENSOR_CONTRACTION_DISPATCH(SyncEvalParallelContext, Alignment, CONTEXT_ARGS); #undef CONTEXT_ARGS } else { #define CONTEXT_ARGS \ (this, num_threads, buffer, m, n, k, bm, bn, bk, nm, nn, nk, gm, gn, nm0, \ nn0, shard_by_col, parallel_pack, parallelize_by_sharding_dim_only, \ std::move(done)) TENSOR_CONTRACTION_ASYNC_DISPATCH(EvalParallelContext, DoneCallback, Alignment, CONTEXT_ARGS, run()); #undef CONTEXT_ARGS } } // ------------------------------------------------------------------------ // // Dummy struct to represent an empty DoneCallback. struct NoCallback { void operator()() { eigen_assert(false && "NoCallback should never be called"); } }; // ------------------------------------------------------------------------ // template class EvalParallelNotification; // Synchronous evaluation notification that blocks caller thread in Wait(). template class EvalParallelNotification { public: EvalParallelNotification(Context*, NoCallback) {} void Notify() { done_.Notify(); } void Wait() { done_.Wait(); } private: Eigen::Notification done_; }; // Asynchronous evaluation notification that does not block in Wait(). template class EvalParallelNotification { public: EvalParallelNotification(Context* ctx, DoneCallback done) : ctx_(ctx), done_(std::move(done)) {} void Notify() { // Make a copy of done callback, because it will be destructed when we // will delete context in the next line (EvalParallelNotification is a // data member of EvalParallelContext class). DoneCallback done_copy = std::move(done_); // Delete parallel evaluation context. delete ctx_; // Now safely call the done callback. done_copy(); } void Wait() {} private: Context* ctx_; DoneCallback done_; }; // Context orchestrates sync/async parallel contraction evaluation. When it is // executed in asynchronous mode, it owns all the shared state that might be // accessible by block packing and kernel tasks. template class EvalParallelContext { public: typedef internal::TensorContractionInputMapper< LhsScalar, Index, internal::Lhs, LeftEvaluator, left_nocontract_t, contract_t, internal::packet_traits::size, lhs_inner_dim_contiguous, false, Unaligned> LhsMapper; typedef internal::TensorContractionInputMapper< RhsScalar, Index, internal::Rhs, RightEvaluator, right_nocontract_t, contract_t, internal::packet_traits::size, rhs_inner_dim_contiguous, rhs_inner_dim_reordered, Unaligned> RhsMapper; typedef internal::blas_data_mapper OutputMapper; typedef internal::TensorContractionKernel< Scalar, LhsScalar, RhsScalar, Index, OutputMapper, LhsMapper, RhsMapper> TensorContractionKernel; typedef typename TensorContractionKernel::LhsBlock LhsBlock; typedef typename TensorContractionKernel::RhsBlock RhsBlock; typedef typename TensorContractionKernel::BlockMemHandle BlockMemHandle; EvalParallelContext(const Self* self, int num_threads, Scalar* buffer, Index tm, Index tn, Index tk, Index bm, Index bn, Index bk, Index nm, Index nn, Index nk, Index gm, Index gn, Index nm0, Index nn0, bool shard_by_col, bool parallel_pack, bool parallelize_by_sharding_dim_only, DoneCallback done) : created_by_thread_id_(std::this_thread::get_id()), done_(this, std::move(done)), device_(self->m_device), lhs_(self->m_leftImpl, self->m_left_nocontract_strides, self->m_i_strides, self->m_left_contracting_strides, self->m_k_strides), rhs_(self->m_rightImpl, self->m_right_nocontract_strides, self->m_j_strides, self->m_right_contracting_strides, self->m_k_strides), buffer_(buffer), output_(buffer, tm), output_kernel_(self->m_output_kernel), tensor_contraction_params_(self->m_tensor_contraction_params), num_threads_(num_threads), shard_by_col_(shard_by_col), parallel_pack_(parallel_pack), parallelize_by_sharding_dim_only_(parallelize_by_sharding_dim_only), m_(tm), n_(tn), k_(tk), bm_(bm), bn_(bn), bk_(bk), nm_(nm), nn_(nn), nk_(nk), gm_(gm), gn_(gn), nm0_(nm0), nn0_(nn0), kernel_(m_, k_, n_, bm_, bk_, bn_), num_thread_local_allocations_(0), // We reserve 2X more capacity for a thread local values, than the // number of threads in the pool to efficiently handle task stealing // by threads that are not managed by the pool. thread_local_capacity(2 * (parallelize_by_sharding_dim_only_ ? device_.numThreadsInPool() : 0)), // We will use only one of the Lhs/Rhs thread local storage depending // on the shard_by_col value and we parallelize by sharding dim ONLY. lhs_thread_local_blocks_(shard_by_col_ ? 0 : thread_local_capacity, {*this}, {*this}), rhs_thread_local_blocks_(shard_by_col_ ? thread_local_capacity : 0, {*this}, {*this}) { // These two options are mutually exclusive. eigen_assert(!(parallel_pack && parallelize_by_sharding_dim_only)); for (Index x = 0; x < P; x++) { // Normal number of notifications for k slice switch is // nm_ + nn_ + nm_ * nn_. However, first P - 1 slices will receive only // nm_ + nn_ notifications, because they will not receive notifications // from preceding kernels. state_switch_[x] = x == 0 ? 1 : (parallel_pack_ ? nn_ + nm_ : (shard_by_col_ ? nn_ : nm_)) + (x == P - 1 ? nm_ * nn_ : 0); state_packing_ready_[x] = parallel_pack_ ? 0 : (shard_by_col_ ? nm_ : nn_); state_kernel_[x] = new std::atomic*[nm_]; for (Index m = 0; m < nm_; m++) { state_kernel_[x][m] = new std::atomic[nn_]; // Kernels generally receive 3 notifications (previous kernel + 2 // packing), but the first slice won't get notifications from previous // kernels. for (Index n = 0; n < nn_; n++) state_kernel_[x][m][n].store( (x == 0 ? 0 : 1) + (parallel_pack_ ? 2 : 1), std::memory_order_relaxed); } } // Allocate memory for packed rhs/lhs matrices. packed_mem_ = kernel_.allocateSlices( // device_, // /*num_lhs=*/nm0_, // /*num_rhs=*/nn0_, // /*num_slices=*/std::min(nk_, P - 1), // packed_lhs_, packed_rhs_); if (parallelize_by_sharding_dim_only_) { const int num_worker_threads = device_.numThreadsInPool(); if (shard_by_col) { can_use_thread_local_packed_ = new std::atomic[nn_]; for (int i = 0; i < nn_; ++i) can_use_thread_local_packed_[i].store(true, std::memory_order_relaxed); Index num_blocks = num_worker_threads * gn_; thread_local_pre_alocated_mem_ = kernel_.allocateSlices( // device_, // /*num_lhs=*/0, // /*num_rhs=*/num_blocks, // /*num_slices=*/1, // /*lhs_blocks=*/nullptr, &rhs_thread_local_pre_allocated_); } else { can_use_thread_local_packed_ = new std::atomic[nm_]; for (int i = 0; i < nm_; ++i) can_use_thread_local_packed_[i].store(true, std::memory_order_relaxed); Index num_blocks = num_worker_threads * gm_; thread_local_pre_alocated_mem_ = kernel_.allocateSlices( // device_, // /*num_lhs=*/num_blocks, // /*num_rhs=*/0, // /*num_slices=*/1, &lhs_thread_local_pre_allocated_, // /*rhs_blocks=*/nullptr); } } } ~EvalParallelContext() { for (Index x = 0; x < P; x++) { for (Index m = 0; m < nm_; m++) delete[] state_kernel_[x][m]; delete[] state_kernel_[x]; } kernel_.deallocate(device_, packed_mem_); if (parallelize_by_sharding_dim_only_) { kernel_.deallocate(device_, thread_local_pre_alocated_mem_); delete[] can_use_thread_local_packed_; } } void run() { // Kick off packing of the first slice. signal_switch(0, 1); // Wait for overall completion. // // If parallel evaluation is executed in async mode, this is a no-op, and // Wait() will return immediately. In synchronous mode it will block the // caller thread until it will receive notification from last task. // // In async mode, last task when completed will call done callback from // the same thread, and will delete this context. // // TODO(dvyukov): This wait can lead to deadlock if contraction is // evaluated in synchronous mode. If nthreads contractions are // concurrently submitted from worker threads, this wait will block all // worker threads and the system will deadlock. done_.Wait(); } private: std::thread::id created_by_thread_id_; // This notification is specialized on the type of DoneCallback and can be // blocking or non-blocking. EvalParallelNotification done_; const Device& device_; LhsMapper lhs_; RhsMapper rhs_; Scalar* const buffer_; OutputMapper output_; OutputKernelType output_kernel_; TensorContractionParams tensor_contraction_params_; const int num_threads_; const bool shard_by_col_; const bool parallel_pack_; const bool parallelize_by_sharding_dim_only_; // Matrix sizes. const Index m_; const Index n_; const Index k_; // Block sizes. const Index bm_; const Index bn_; const Index bk_; // Number of tasks. const Index nm_; const Index nn_; const Index nk_; // Task grain sizes (number of kernels executed per task). const Index gm_; const Index gn_; // Number of blocks (this is different from ni_/nn_ because of task size // coarsening). const Index nm0_; const Index nn0_; // Tensor contraction kernel. TensorContractionKernel kernel_; // Parallelization strategy. // // Blocks related to the same k block can run in parallel because they write // to different output blocks. So we parallelize within k slices, this // gives us parallelism level of m x n. Before we can start any kernels // related to k-th slice, we need to issue m lhs packing tasks and n rhs // packing tasks. // // However, there is a bottleneck when we are finishing kernels for k-th // slice (at the very end there is only 1 runnable kernel). To mitigate this // bottleneck we allow kernels from k-th and k+1-th slices to run in // parallel. Note that (m, n, k) and (m, n, k+1) kernels write to the same // output block, so they must not run in parallel. // // This gives us the following dependency graph. // On each k slice we have m x n kernel tasks, m lhs paking tasks and n rhs // packing tasks. // Kernel (m, n, k) can start when: // - kernel (m, n, k-1) has finished // - lhs packing (m, k) has finished // - rhs packing (n, k) has finished // Lhs/rhs packing can start when: // - all k-1 packing has finished (artificially imposed to limit amount of // parallel packing) // // On top of that we limit runnable tasks to two consecutive k slices. // This is done to limit amount of memory we need for packed lhs/rhs // (for each k slice we need m*bk + n*bk memory in packed_lhs_/packed_rhs_). // // state_switch_ tracks when we are ready to switch to the next k slice. // state_kernel_[m][n] tracks when we are ready to kick off kernel (m, n). // These variable are rolling over 3 consecutive k slices: first two we are // actively executing + one to track completion of kernels in the second // slice. static const Index P = 3; // Handle to the allocated temporary storage for Lhs/Rhs blocks. BlockMemHandle packed_mem_; std::vector packed_lhs_[P - 1]; std::vector packed_rhs_[P - 1]; // If we choose to parallelize only by the sharding dimension, each thread // will have it's own "thead local" (not a c++ thread local storage) memory // for packed_lhs or packed_rhs (shard_by_col = false of true). This memory // can't be passed to a kernel that might execute on a different thread. // // In practice when we are ready to pack memory for the sharding dimension // (rhs if shard_by_col==true) of the K-th slice, all kernels for K-1 slice // already computed (99% of the time), and we can pack data into the thread // local storage, and guarantee that all the kernels will be executed // immediately in the same thread. This significantly increases L1 cache hit // ratio and reduces pressure on the memory bus. // // It's still possible that kernel for the K-th slice will be ready before // completion of the K-1 kernel, so we have to allocate "global" packed_lhs_ // and packed_rhs_ to allow kernels to be executed later on a thread // different from the thread that was used for packing. // Handle for pre-allocated thread local memory buffers. BlockMemHandle thread_local_pre_alocated_mem_; // Only one of these will be initialized depending on shard_by_col value // (the size will be `num_worker_threads * num_grains_in_the_sharding_dim`). std::vector lhs_thread_local_pre_allocated_; std::vector rhs_thread_local_pre_allocated_; // How many thread local blocks were already allocated. std::atomic num_thread_local_allocations_; const int thread_local_capacity; // We will use pre-allocated Lhs/Rhs blocks defined above, if the number of // unique threads in a system is below or equal to the number of threads in // a thread pool. We will fallback on dynamic memory allocation after that. // ThreadLocalBlocks is a container for Lhs or Rhs thread local buffers. Its // size is equal to the grain size in Lhs/Rhs sharding dimension. template class ThreadLocalBlocks { public: ThreadLocalBlocks() = default; ThreadLocalBlocks(BlockType* base, size_t grain_size) : is_pre_allocated_(true), thread_local_pre_allocated_base_(base), grain_size_(grain_size) {} ThreadLocalBlocks(BlockMemHandle mem_handle, std::vector blocks) : is_pre_allocated_(false), mem_handle_(std::move(mem_handle)), blocks_(std::move(blocks)) {} BlockType& block(int grain_index) { eigen_assert(grain_index >= 0); eigen_assert(static_cast(grain_index) < size()); return is_pre_allocated_ ? thread_local_pre_allocated_base_[grain_index] : blocks_[grain_index]; } void Release(EvalParallelContext& ctx) const { if (!is_pre_allocated_) { ctx.kernel_.deallocate(ctx.device_, mem_handle_); } } size_t size() const { return is_pre_allocated_ ? grain_size_ : blocks_.size(); } private: bool is_pre_allocated_; // Reuse pre-allocated thread local buffers. BlockType* thread_local_pre_allocated_base_ = nullptr; size_t grain_size_ = 0; // These will be initialized only if `is_pre_allocated == false`. BlockMemHandle mem_handle_{}; std::vector blocks_; }; // ThreadLocalBlocksInitialize callable does custom thread local blocks // initialization, and will reuse pre-allocated buffers if possible, or will // dynamically allocate new memory. // // Lhs/Rhs blocks might be of the same type, so we have to pass explicitly // for what side do we plan to do block allocation. template class ThreadLocalBlocksInitialize { static constexpr bool kIsLhs = !is_rhs && std::is_same::value; static const bool kIsRhs = is_rhs && std::is_same::value; static_assert(kIsLhs || kIsRhs, "Unkown block type"); using Blocks = ThreadLocalBlocks; public: ThreadLocalBlocksInitialize(EvalParallelContext& ctx) : ctx_(ctx), num_worker_threads_(ctx_.device_.numThreadsInPool()) {} void operator()(Blocks& blocks) { const int n = ctx_.num_thread_local_allocations_.fetch_add( 1, std::memory_order_relaxed); if (n >= num_worker_threads_) { ThreadLocalBlocksAllocator::allocate(ctx_, blocks); } else { ThreadLocalBlocksAllocator::reuse(ctx_, n, blocks); } } private: // NOTE(ezhulenev): Without 'if constexpr' we have to put calls to // TensorContractionKernel::allocateSlices into template specializations. // Also explicit specializations are not allowed at class scope in C++03, // EvalCtx type parameter is just a workaround for that limitation. template struct ThreadLocalBlocksAllocator; template struct ThreadLocalBlocksAllocator { static void allocate(EvalCtx& ctx, Blocks& blocks) { std::vector rhs_blocks; BlockMemHandle mem_handle = ctx.kernel_.allocateSlices( ctx.device_, /*num_lhs=*/0, /*num_rhs=*/ctx.gn_, /*num_slices=*/1, /*lhs_blocks=*/nullptr, /*rhs_blocks=*/&rhs_blocks); blocks = ThreadLocalBlocks(std::move(mem_handle), std::move(rhs_blocks)); } static void reuse(EvalCtx& ctx, int index, Blocks& blocks) { RhsBlock* ptr = &ctx.rhs_thread_local_pre_allocated_[ctx.gn_ * index]; blocks = ThreadLocalBlocks(ptr, ctx.gn_); } }; template struct ThreadLocalBlocksAllocator { static void allocate(EvalCtx& ctx, Blocks& blocks) { std::vector lhs_blocks; BlockMemHandle mem_handle = ctx.kernel_.allocateSlices( ctx.device_, /*num_lhs=*/ctx.gm_, /*num_rhs=*/0, /*num_slices=*/1, /*lhs_blocks=*/&lhs_blocks, /*rhs_blocks=*/nullptr); blocks = ThreadLocalBlocks(std::move(mem_handle), std::move(lhs_blocks)); } static void reuse(EvalCtx& ctx, int index, Blocks& blocks) { LhsBlock* ptr = &ctx.lhs_thread_local_pre_allocated_[ctx.gm_ * index]; blocks = ThreadLocalBlocks(ptr, ctx.gm_); } }; EvalParallelContext& ctx_; const int num_worker_threads_; }; template class ThreadLocalBlocksRelease { public: using Blocks = ThreadLocalBlocks; ThreadLocalBlocksRelease(EvalParallelContext& ctx) : ctx_(ctx) {} void operator()(Blocks& blocks) { blocks.Release(ctx_); } private: EvalParallelContext& ctx_; }; // ThreadLocalBlocks initialization callables. using ThreadLocalLhsInit = ThreadLocalBlocksInitialize; using ThreadLocalRhsInit = ThreadLocalBlocksInitialize; // ThreadLocalBlocks release callables. using ThreadLocalLhsRelease = ThreadLocalBlocksRelease; using ThreadLocalRhsRelease = ThreadLocalBlocksRelease; // Thread local containers for Lhs/Rhs block packs. In practice only one of // them will be used, depending on the shard_by_col value. Eigen::ThreadLocal, ThreadLocalLhsInit, ThreadLocalLhsRelease> lhs_thread_local_blocks_; Eigen::ThreadLocal, ThreadLocalRhsInit, ThreadLocalRhsRelease> rhs_thread_local_blocks_; // After a particular shard for Kth slice missed thread local execution // opportunity (K-1 slice didn't complete kernels execution), we can no // longer schedule K+1 and following slices in thread local mode, because // there is no more guarantee that previous kernels were executed // sequentially in the same thread (size is nn_ or nm_). std::atomic* can_use_thread_local_packed_; std::atomic** state_kernel_[P]; // state_switch_ is frequently modified by worker threads, while other // fields are read-only after constructor. Let's move it to a separate cache // line to reduce cache-coherency traffic. char pad_[128]; std::atomic state_packing_ready_[P]; std::atomic state_switch_[P]; LhsBlock& packed_lhs(Index m, Index k, Index m1, bool use_thread_local) { if (use_thread_local) { eigen_assert(!shard_by_col_); ThreadLocalBlocks& blocks = lhs_thread_local_blocks_.local(); Index grain_index = m1 - m * gm_; return blocks.block(internal::convert_index(grain_index)); // FIXME better make ThreadLocalBlocks use Eigen::Index? } else { return packed_lhs_[k % (P - 1)][m1]; } } RhsBlock& packed_rhs(Index n, Index k, Index n1, bool use_thread_local) { if (use_thread_local) { eigen_assert(shard_by_col_); ThreadLocalBlocks& blocks = rhs_thread_local_blocks_.local(); Index grain_index = n1 - n * gn_; return blocks.block(internal::convert_index(grain_index)); // FIXME better make ThreadLocalBlocks use Eigen::Index? } else { return packed_rhs_[k % (P - 1)][n1]; } } // In following two methods (pack_lhs and pack_rhs), if we know for sure // that we'll be able to immediately call a kernel with packed data, and do // not submit it to the thread pool, we can use thread local memory for // packed data. // // We can only reliably check it if we are running all kernels in sync mode // (parallelize only by sharding dim). If kernel for m==0 (n==0) is ready to // run, it's guaranteed that all kernels with larger values of m (n) are // also ready, because we execute them in the same order for all K slices. void pack_lhs(Index m, Index k) { bool use_thread_local = false; if (parallelize_by_sharding_dim_only_ && !shard_by_col_ && can_use_thread_local_packed_[m].load(std::memory_order_relaxed)) { if (state_kernel_[k % P][m][0].load(std::memory_order_relaxed) == 1) { use_thread_local = true; } else { // If we can't guarantee that all kernels in `k` slice will be // executed sequentially in current thread, it's no longer safe to use // thread local memory in following slices along the k dimensions. eigen_assert(k > 0); can_use_thread_local_packed_[m].store(false, std::memory_order_relaxed); } } const Index mend = m * gm_ + gm(m); for (Index m1 = m * gm_; m1 < mend; m1++) kernel_.packLhs(&packed_lhs(m, k, m1, use_thread_local), lhs_.getSubMapper(m1 * bm_, k * bk_), bk(k), bm(m1)); if (!parallel_pack_ && shard_by_col_) { assert(!use_thread_local); signal_packing(k); } else { signal_switch(k + 1); for (Index n = nn_ - 1; n >= 0; n--) { bool sync = parallelize_by_sharding_dim_only_ || n == 0; signal_kernel(m, n, k, sync, use_thread_local); } } } void pack_rhs(Index n, Index k) { bool use_thread_local = false; if (parallelize_by_sharding_dim_only_ && shard_by_col_ && can_use_thread_local_packed_[n].load(std::memory_order_relaxed)) { if (state_kernel_[k % P][0][n].load(std::memory_order_relaxed) == 1) { use_thread_local = true; } else { // If we can't guarantee that all kernels in `k` slice will be // executed sequentially in current thread, it's no longer safe to use // thread local memory in followig slices along the k dimensions. eigen_assert(k > 0); can_use_thread_local_packed_[n].store(false, std::memory_order_relaxed); } } const Index nend = n * gn_ + gn(n); for (Index n1 = n * gn_; n1 < nend; n1++) { if (!TensorContractionKernel::HasBeta && k == 0) { // Zero the output memory in parallel, only if contraction kernel does // not support `beta`. Otherwise we will pass beta 0.0 to the first // call to the `TensorContractionKernel::invoke()`. // // On 10000x2x10000 mm zeroing can easily take half of time. Zero (bn // x m) row. Safe to do here because all kernels that will write to // this memory depend on completion of this task. Note: don't call // device_.memset() here. device_.memset() blocks on thread pool // worker thread, which can lead to underutilization and deadlocks. memset(buffer_ + n1 * bn_ * m_, 0, bn(n1) * m_ * sizeof(Scalar)); } kernel_.packRhs(&packed_rhs(n, k, n1, use_thread_local), rhs_.getSubMapper(k * bk_, n1 * bn_), bk(k), bn(n1)); } if (parallel_pack_ || shard_by_col_) { signal_switch(k + 1); for (Index m = nm_ - 1; m >= 0; m--) { bool sync = parallelize_by_sharding_dim_only_ || m == 0; signal_kernel(m, n, k, sync, use_thread_local); } } else { assert(!use_thread_local); signal_packing(k); } } void kernel(Index m, Index n, Index k, bool use_thread_local) { // Note: order of iteration matters here. Iteration over m is innermost // because we want to reuse the same packed rhs in consecutive tasks // (rhs fits into L2$ while lhs only into L3$). const Index nend = n * gn_ + gn(n); const Index mend = m * gm_ + gm(m); // NOTE: output = alpha * LHS * RHS + beta * output. const Scalar alpha = Scalar(1); const Scalar beta = (TensorContractionKernel::HasBeta && k == 0) ? Scalar(0) : Scalar(1); if (shard_by_col_) { for (Index n1 = n * gn_; n1 < nend; n1++) { for (Index m1 = m * gm_; m1 < mend; m1++) { const auto output_mapper = output_.getSubMapper(m1 * bm_, n1 * bn_); kernel_.invoke( output_mapper, packed_lhs(m, k, m1, !shard_by_col_ && use_thread_local), packed_rhs(n, k, n1, shard_by_col_ && use_thread_local), bm(m1), bk(k), bn(n1), alpha, beta); // We are done with the last task for the [m1, n1] block. if (k + 1 == nk_) { output_kernel_(output_mapper, tensor_contraction_params_, m1 * bm_, n1 * bn_, bm(m1), bn(n1)); } } } } else { for (Index m1 = m * gm_; m1 < mend; m1++) for (Index n1 = n * gn_; n1 < nend; n1++) { const auto output_mapper = output_.getSubMapper(m1 * bm_, n1 * bn_); kernel_.invoke( output_mapper, packed_lhs(m, k, m1, !shard_by_col_ && use_thread_local), packed_rhs(n, k, n1, shard_by_col_ && use_thread_local), bm(m1), bk(k), bn(n1), alpha, beta); // We are done with the last task for the [m1, n1] block. if (k + 1 == nk_) { output_kernel_(output_mapper, tensor_contraction_params_, m1 * bm_, n1 * bn_, bm(m1), bn(n1)); } } } signal_kernel(m, n, k + 1, /*sync=*/false, /*use_thread_local=*/false); signal_switch(k + 2); } void signal_packing(Index k) { eigen_assert(!parallel_pack_); Index s = state_packing_ready_[k % P].fetch_sub(1); eigen_assert(s > 0); if (s != 1) return; state_packing_ready_[k % P] = shard_by_col_ ? nm_ : nn_; enqueue_packing(k, shard_by_col_); } void signal_kernel(Index m, Index n, Index k, bool sync, bool use_thread_local) { std::atomic* state = &state_kernel_[k % P][m][n]; Index s = state->load(); eigen_assert(s > 0); if (s != 1 && state->fetch_sub(1) != 1) { eigen_assert(!use_thread_local); return; } state->store(parallel_pack_ ? 3 : 2, std::memory_order_relaxed); if (sync) { kernel(m, n, k, use_thread_local); } else { eigen_assert(!use_thread_local); device_.enqueueNoNotification( [=]() { kernel(m, n, k, use_thread_local); }); } } void signal_switch(Index k, Index v = 1) { Index s = state_switch_[k % P].fetch_sub(v); eigen_assert(s >= v); if (s != v) return; // Ready to switch to the next k slice. // Reset counter for the next iteration. state_switch_[k % P] = (parallel_pack_ ? nm_ + nn_ : (shard_by_col_ ? nn_ : nm_)) + nm_ * nn_; if (k < nk_) { // Issue lhs/rhs packing. Their completion will in turn kick off // kernels. if (parallel_pack_) { enqueue_packing(k, !shard_by_col_); enqueue_packing(k, shard_by_col_); } else if (shard_by_col_) { enqueue_packing(k, false); } else { enqueue_packing(k, true); } // Termination handling. // Because kernel completion signals k + 2 switch, we need to finish nk // + 2 slices without issuing any tasks on nk + 1 slice. So here we // pretend that all nk + 1 packing tasks just finish instantly; so that // nk + 2 switch only waits for completion of nk kernels. } else if (k == nk_) { signal_switch(k + 1, parallel_pack_ ? nm_ + nn_ : (shard_by_col_ ? nn_ : nm_)); } else { done_.Notify(); } } // Enqueue all rhs/lhs packing for k-th slice. void enqueue_packing(Index k, bool rhs) { enqueue_packing_helper(0, rhs ? nn_ : nm_, k, rhs); } void enqueue_packing_helper(Index start, Index end, Index k, bool rhs) { if (end - start == 1) { if (rhs) pack_rhs(start, k); else pack_lhs(start, k); } else { while (end - start > 1) { Index mid = (start + end) / 2; device_.enqueueNoNotification( [=]() { enqueue_packing_helper(mid, end, k, rhs); }); end = mid; } // Decide if we want to run first packing task (start == 0) in // async mode if we parallelize only by sharding dim: // (1) pack_lhs and pack_rhs call signal_switch before completing // all calls to signal_kernel, which in sync mode might lead // to the execution of the first kernel of the k+1 slice, before // completing a call to the last kernel of the k slice. // (2) all pack tasks for sharded dim must be executed in a thread // pool to get pre-allocated thead local buffers. bool pack_async = (start == 0) && (parallelize_by_sharding_dim_only_&& shard_by_col_ == rhs) && (k > 0 || std::this_thread::get_id() == created_by_thread_id_); if (pack_async) { device_.enqueueNoNotification( [=]() { enqueue_packing_helper(start, end, k, rhs); }); } else { enqueue_packing_helper(start, end, k, rhs); } } } // Block sizes with accounting for potentially incomplete last block. Index bm(Index m) const { return m + 1 < nm0_ ? bm_ : m_ + bm_ - bm_ * nm0_; } Index bn(Index n) const { return n + 1 < nn0_ ? bn_ : n_ + bn_ - bn_ * nn0_; } Index bk(Index k) const { return k + 1 < nk_ ? bk_ : k_ + bk_ - bk_ * nk_; } // Task grain sizes accounting for potentially incomplete last task. Index gm(Index m) const { return m + 1 < nm_ ? gm_ : nm0_ + gm_ - gm_ * nm_; } Index gn(Index n) const { return n + 1 < nn_ ? gn_ : nn0_ + gn_ - gn_ * nn_; } EvalParallelContext(const EvalParallelContext&) = delete; void operator=(const EvalParallelContext&) = delete; }; template using SyncEvalParallelContext = EvalParallelContext; // ------------------------------------------------------------------------ // // EvalShardedByInnerDimContext orchestrates sync/async contraction // evaluation, when we shard by inner dimension. When it is executed in // asynchronous mode, it owns all the shared state that might be accessible by // block processing tasks. template struct EvalShardedByInnerDimContext { EvalShardedByInnerDimContext(const Self* self, int num_threads, Scalar* result_buffer, Index m_size, Index n_size, Index k_size, DoneCallback done_callback) : evaluator(self), m_lhs_inner_dim_contiguous(evaluator->m_lhs_inner_dim_contiguous), m_rhs_inner_dim_contiguous(evaluator->m_rhs_inner_dim_contiguous), m_rhs_inner_dim_reordered(evaluator->m_rhs_inner_dim_reordered), result(result_buffer), m(m_size), n(n_size), k(k_size), done(std::move(done_callback)), buffer_size_bytes(m * n * sizeof(Scalar)), block_size(blockSize(k, num_threads)), num_blocks(divup(k, block_size)), num_pending_blocks(internal::convert_index(num_blocks)), l0_ranges(divup(num_blocks, l0_size)), l0_state(l0_ranges), block_buffers(num_blocks) { // Keep count of pending gemm tasks for each l0 range. for (int i = 0; i < l0_ranges; ++i) { const Index num_pending_tasks = actualRangeSize(l0_ranges, l0_size, i); l0_state.emplace_back(internal::convert_index(num_pending_tasks)); } // Allocate temporary buffers for each block. for (Index block_idx = 0; block_idx < num_blocks; ++block_idx) { Scalar* buf = block_idx == 0 ? result : static_cast(evaluator->m_device.allocate( buffer_size_bytes)); block_buffers.emplace_back(buf); } } ~EvalShardedByInnerDimContext() { for (Index i = 1; i < num_blocks; ++i) { evaluator->m_device.deallocate(block_buffers[i]); } } template void run() { Barrier barrier(internal::convert_index(num_blocks)); eval(barrier, 0, num_blocks); barrier.Wait(); // Aggregate partial sums from l0 ranges. aggregateL0Blocks(); // Apply output kernel. applyOutputKernel(); } template void runAsync() { evalAsync(0, num_blocks); } private: // The underlying GEMM kernel assumes that k is a multiple of // the packet size and subtle breakage occurs if this is violated. static const Index packet_size = internal::packet_traits::size; const Self* evaluator; // TensorContraction evaluator // These fields required fromTENSOR_CONTRACTION_DISPATCH macro. bool m_lhs_inner_dim_contiguous; bool m_rhs_inner_dim_contiguous; bool m_rhs_inner_dim_reordered; Scalar* result; Index m; Index n; Index k; DoneCallback done; // ----------------------------------------------------------------------// // Algorithm parameters. // We will compute partial results into the buffers of this size. Index buffer_size_bytes; Index block_size; Index num_blocks; // Keep track of pending tasks when evaluate in async mode. std::atomic num_pending_blocks; // We compute partial gemm results in parallel, and to get the final result // we need to add them all together. For the large number of threads (>= 48) // this adds a very expensive sequential step at the end. // // We split the [0, num_blocks) into small ranges, and when a task for the // block finishes its partial gemm computation, it checks if it was the last // gemm in the range, and if so, it will add all blocks of the range. // // After all tasks done, we need to add only these pre-aggregated blocks. // For now we use just a single level of ranges to compute pre-aggregated // partial sums, but in general we can use more layers to compute tree // aggregation in parallel and reduce the size of the sequential step. // // TODO(ezhulenev): Add multilevel tree aggregation? Probably will make // sense only if number of threads >= ~128? static const Index l0_size = 4; Index l0_ranges; // Keep count of pending gemm tasks for each l0 range. MaxSizeVector> l0_state; // [0, l0_ranges) // Buffers allocated for each temporary block computation. MaxSizeVector block_buffers; // [0, num_blocks) template void processBlock(Index block_idx, Index begin, Index end) { Scalar* buf = block_buffers[block_idx]; TENSOR_CONTRACTION_DISPATCH( evaluator->template evalGemmPartialWithoutOutputKernel, Alignment, (buf, begin, end, /*num_threads=*/internal::convert_index(num_blocks))); // Check if it was the last task in l0 range. const Index l0_index = block_idx / l0_size; const int v = l0_state[l0_index].fetch_sub(1); eigen_assert(v >= 1); // If we processed the last block of the range, we can aggregate all // partial results into the first block of the range. if (v == 1) { const Index rng_size = actualRangeSize(l0_ranges, l0_size, l0_index); const Index dst_block_idx = l0_index * l0_size; if (rng_size == l0_size) { addAllToBuffer( m * n, /*src_buf0=*/block_buffers[dst_block_idx + 1], /*src_buf1=*/block_buffers[dst_block_idx + 2], /*src_buf2=*/block_buffers[dst_block_idx + 3], /*dst_buf= */ block_buffers[dst_block_idx]); } else { // Aggregate blocks of potentially incomplete last range. for (int i = 1; i < rng_size; ++i) { addToBuffer(m * n, /*src_buf=*/block_buffers[dst_block_idx + i], /*dst_buf=*/block_buffers[dst_block_idx]); } } } } // Aggregate partial sums from l0 ranges. template void aggregateL0Blocks() const { Index l0_index = 1; for (; l0_index + 2 < l0_ranges; l0_index += 3) { addAllToBuffer( m * n, /*src_buf0=*/block_buffers[(l0_index + 0) * l0_size], /*src_buf1=*/block_buffers[(l0_index + 1) * l0_size], /*src_buf2=*/block_buffers[(l0_index + 2) * l0_size], /*dst_buf= */ block_buffers[0]); } for (; l0_index < l0_ranges; ++l0_index) { addToBuffer(m * n, block_buffers[l0_index * l0_size], block_buffers[0]); } } void applyOutputKernel() const { typedef internal::blas_data_mapper OutputMapper; evaluator->m_output_kernel( OutputMapper(result, m), evaluator->m_tensor_contraction_params, static_cast(0), static_cast(0), m, n); } // Compute block size with accounting for potentially incomplete last block. Index actualBlockSize(Index block_idx) const { return block_idx + 1 < num_blocks ? block_size : k + block_size - block_size * num_blocks; }; // Compute range size with accounting for potentially incomplete last range. Index actualRangeSize(Index num_ranges, Index range_size, Index range_idx) const { eigen_assert(range_idx < num_ranges); return range_idx + 1 < num_ranges ? range_size : num_blocks + range_size - range_size * num_ranges; }; template EIGEN_STRONG_INLINE static void addToBuffer(size_t n, const Scalar* src_buf, Scalar* tgt_buf) { const int output_packet_size = internal::unpacket_traits::size; size_t i = 0; const size_t num_packets = n / output_packet_size; for (; i < output_packet_size * num_packets; i += output_packet_size) { const PacketReturnType src_val = internal::pload(src_buf + i); const PacketReturnType tgt_val = internal::ploadt(tgt_buf + i); const PacketReturnType sum = internal::padd(src_val, tgt_val); internal::pstoret(tgt_buf + i, sum); } for (; i < n; ++i) { tgt_buf[i] += src_buf[i]; } } template EIGEN_STRONG_INLINE static void addAllToBuffer(size_t n, const Scalar* src_buf0, const Scalar* src_buf1, const Scalar* src_buf2, Scalar* dst_buf) { using ::Eigen::internal::padd; using ::Eigen::internal::pload; using ::Eigen::internal::ploadt; using ::Eigen::internal::pstoret; const int output_packet_size = internal::unpacket_traits::size; size_t i = 0; const size_t num_packets = n / output_packet_size; for (; i < output_packet_size * num_packets; i += output_packet_size) { const auto src_val0 = pload(src_buf0 + i); const auto src_val1 = pload(src_buf1 + i); const auto src_val2 = pload(src_buf2 + i); const auto dst_val = ploadt(dst_buf + i); const auto sum = padd(padd(dst_val, src_val0), padd(src_val1, src_val2)); pstoret(dst_buf + i, sum); } for (; i < n; ++i) { dst_buf[i] += src_buf0[i] + src_buf1[i] + src_buf2[i]; } } template void eval(Barrier& barrier, Index start_block_idx, Index end_block_idx) { while (end_block_idx - start_block_idx > 1) { Index mid_block_idx = (start_block_idx + end_block_idx) / 2; evaluator->m_device.enqueueNoNotification( [this, &barrier, mid_block_idx, end_block_idx]() { eval(barrier, mid_block_idx, end_block_idx); }); end_block_idx = mid_block_idx; } Index block_idx = start_block_idx; Index block_start = block_idx * block_size; Index block_end = block_start + actualBlockSize(block_idx); processBlock(block_idx, block_start, block_end); barrier.Notify(); } template void evalAsync(Index start_block_idx, Index end_block_idx) { while (end_block_idx - start_block_idx > 1) { Index mid_block_idx = (start_block_idx + end_block_idx) / 2; evaluator->m_device.enqueueNoNotification( [this, mid_block_idx, end_block_idx]() { evalAsync(mid_block_idx, end_block_idx); }); end_block_idx = mid_block_idx; } Index block_idx = start_block_idx; Index block_start = block_idx * block_size; Index block_end = block_start + actualBlockSize(block_idx); processBlock(block_idx, block_start, block_end); int v = num_pending_blocks.fetch_sub(1); eigen_assert(v >= 1); if (v == 1) { // Aggregate partial sums from l0 ranges. aggregateL0Blocks(); // Apply output kernel. applyOutputKernel(); // NOTE: If we call `done` callback before deleting this (context), // it might deallocate Self* pointer captured by context, and we'll // fail in destructor trying to deallocate temporary buffers. // Move done call back from context before it will be destructed. DoneCallback done_copy = std::move(done); // We are confident that we are the last one who touches context. delete this; // Now safely call the done callback. done_copy(); } } // Cost model doesn't capture well the cost associated with constructing // tensor contraction mappers and computing loop bounds in gemm_pack_lhs // and gemm_pack_rhs, so we specify minimum desired block size. static Index blockSize(Index k, int num_threads) { const auto round_up = [=](Index index) -> Index { const Index kmultiple = packet_size <= 8 ? 8 : packet_size; return divup(index, kmultiple) * kmultiple; }; const Index target_block_size = round_up(divup(k, num_threads)); const Index desired_min_block_size = 12 * packet_size; return numext::mini( k, numext::maxi(desired_min_block_size, target_block_size)); } EvalShardedByInnerDimContext(const EvalShardedByInnerDimContext&) = delete; void operator=(const EvalShardedByInnerDimContext&) = delete; }; // ------------------------------------------------------------------------ // // Below are the function used by evalProductImpl heuristics, trying to select // optimcal parameters for parallelization algorithm. // Decide whether we want to shard m x n contraction by columns or by rows. static bool shardByCol(Index m, Index n, Index num_threads) { // Note: we are comparing both n and m against Traits::nr, it is not // a mistake. We are trying to figure out how both n and m will fit into // the main sharding dimension. // Sharding by column is the default // ... unless there is enough data for vectorization over rows if (m / num_threads >= Traits::nr && // and not enough data for vectorization over columns (n / num_threads < Traits::nr || // ... or barely enough data for vectorization over columns, // but it is not evenly dividable across threads (n / num_threads < 4 * Traits::nr && (n % (num_threads * Traits::nr)) != 0 && // ... and it is evenly dividable across threads for rows ((m % (num_threads * Traits::nr)) == 0 || // .. or it is not evenly dividable for both dimensions but // there is much more data over rows so that corner effects are // mitigated. (m / n >= 6))))) return false; // Wait, or if matrices are just substantially prolonged over the other // dimension. if (n / num_threads < 16 * Traits::nr && m > n * 32) return false; return true; } Index coarsenM(Index m, Index n, Index bm, Index bn, Index bk, Index gn, int num_threads, bool shard_by_col) const { Index gm = 1; Index gm1 = 1; Index nm0 = divup(m, bm); Index nm1 = nm0; for (;;) { // Find the next candidate for m grain size. It needs to result in // different number of blocks. E.g. if we have 10 kernels, we want to try // 5 and 10, but not 6, 7, 8 and 9. while (gm1 <= nm0 && nm1 == divup(nm0, gm1)) gm1++; if (gm1 > nm0) break; // Check the candidate. int res = checkGrain(m, n, bm, bn, bk, gm1, gn, gm, gn, num_threads, shard_by_col); if (res < 0) break; nm1 = divup(nm0, gm1); if (res == 0) continue; // Commit new grain size. gm = gm1; } return gm; } Index coarsenN(Index m, Index n, Index bm, Index bn, Index bk, Index gm, int num_threads, bool shard_by_col) const { Index gn = 1; Index gn1 = 1; Index nn0 = divup(n, bn); Index nn1 = nn0; for (;;) { while (gn1 <= nn0 && nn1 == divup(nn0, gn1)) gn1++; if (gn1 > nn0) break; int res = checkGrain(m, n, bm, bn, bk, gm, gn1, gm, gn, num_threads, shard_by_col); if (res < 0) break; nn1 = divup(nn0, gn1); if (res == 0) continue; gn = gn1; } return gn; } // checkGrain checks whether grain (gm, gn) is suitable and is better than // (oldgm, oldgn). int checkGrain(Index m, Index n, Index bm, Index bn, Index bk, Index gm, Index gn, Index oldgm, Index oldgn, int num_threads, bool shard_by_col) const { const TensorOpCost cost = contractionCost(bm * gm, bn * gn, bm, bn, bk, shard_by_col, true); double taskSize = TensorCostModel::taskSize( static_cast(bm) * gm * bn * gn, cost); // If the task is too small, then we agree on it regardless of anything // else. Otherwise synchronization overheads will dominate. if (taskSize < 1) return 1; // If it is too large, then we reject it and all larger tasks. if (taskSize > 2) return -1; // Now we are in presumably good task size range. // The main deciding factor here is parallelism. Consider that we have 12 // kernels and 4 threads. Grains of 2, 3 and 4 all yield good task sizes. // But 2/4 yield 6/3 tasks, which gives us parallelism of 0.75 (at most 3/4 // of cores will be busy). While grain size 3 gives us 4 tasks, which gives // us parallelism of 1 (we can load all cores). Index nm0 = divup(m, bm); Index nn0 = divup(n, bn); Index new_tasks = divup(nm0, gm) * divup(nn0, gn); double new_parallelism = static_cast(new_tasks) / (divup(new_tasks, num_threads) * num_threads); Index old_tasks = divup(nm0, oldgm) * divup(nn0, oldgn); double old_parallelism = static_cast(old_tasks) / (divup(old_tasks, num_threads) * num_threads); if (new_parallelism > old_parallelism || new_parallelism == 1) return 1; return 0; } TensorOpCost contractionCost(Index m, Index n, Index bm, Index bn, Index bk, bool shard_by_col, bool prepacked) const { const int packed_size = std::min(PacketType::size, PacketType::size); const int output_packet_size = internal::unpacket_traits::size; const double kd = static_cast(bk); double compute_bandwidth = computeBandwidth(false, bm, bn, bk); // Computations. TensorOpCost cost = TensorOpCost(0, 0, kd * compute_bandwidth, true, packed_size); // Output stores. cost += TensorOpCost(0, sizeof(CoeffReturnType), 0, true, output_packet_size); if (prepacked) { // Packing and kernels are executed in different tasks. When we calculate // task grain size we look only at kernel cost assuming that kernel // is more expensive than packing. return cost; } // Lhs/rhs loads + computations. TensorOpCost lhsCost = this->m_leftImpl.costPerCoeff(true) * (kd / n); TensorOpCost rhsCost = this->m_rightImpl.costPerCoeff(true) * (kd / m); // Lhs packing memory cost does not contribute considerably to overall // execution time because lhs is prefetched early and accessed sequentially. if (shard_by_col) lhsCost.dropMemoryCost(); else rhsCost.dropMemoryCost(); return cost + lhsCost + rhsCost; } // Decide whether we want to shard m x k x n contraction over the inner // (contraction) dimension (k). static bool shardByInnerDim(Index m, Index n, Index k, int num_threads, int num_threads_by_k) { std::ptrdiff_t bufsize = m * n * sizeof(Scalar); bool shard_by_k = false; if (n == 1 || // If mat*vec or... num_threads_by_k < 2 || // running single threaded or... num_threads_by_k < num_threads || // sharding by k gives less parallelism or... bufsize > l3CacheSize() / num_threads_by_k || // need more buffer space // than L3 cache or... k / num_threads_by_k < 2 * Traits::nr) { // k per thread is tiny. shard_by_k = false; } else if (numext::maxi(m, n) / num_threads < Traits::nr || // both other dimensions are tiny or... // k per thread is not small and... (k / num_threads_by_k > 8 * Traits::nr && // one of the outer dimensions is tiny or sharding by k offers // more parallelism. (numext::mini(m, n) < 2 * Traits::nr || num_threads_by_k > num_threads))) { shard_by_k = true; } return shard_by_k; } TensorOpCost contractionCostPerInnerDim(Index m, Index n, Index k) const { // Compute cost. const int output_packet_size = internal::unpacket_traits::size; TensorOpCost cost(0, 0, (computeBandwidth(true, m, n, k) * m) * n, true, output_packet_size); // Output stores. cost += TensorOpCost(0, sizeof(CoeffReturnType), 0, true, output_packet_size); TensorOpCost lhsCost = this->m_leftImpl.costPerCoeff(true) * m; TensorOpCost rhsCost = this->m_rightImpl.costPerCoeff(true) * n; // Since the inner gemm kernel is always sharded by column, the lhs // load cost is negligible. lhsCost.dropMemoryCost(); return cost + lhsCost + rhsCost; } int numThreadsInnerDim(Index m, Index n, Index k) const { const int output_packet_size = internal::unpacket_traits::size; TensorOpCost cost = contractionCostPerInnerDim(m, n, k); double total_parallel_cost = TensorCostModel::totalCost(k, cost); // Cost of reduction step accumulating the m*n per-thread buffers into the // result. double reduction_cost = TensorCostModel::totalCost( m * n, TensorOpCost(2, 1, 1, true, output_packet_size)); int num_threads = 1; double min_cost = total_parallel_cost; double kPerThreadOverHead = 3000; double kFixedOverHead = 100000; for (int nt = 2; nt <= this->m_device.numThreads(); nt += 2) { double sequential_cost = kFixedOverHead + nt * (reduction_cost + kPerThreadOverHead); double parallel_cost = total_parallel_cost / nt + sequential_cost; if (parallel_cost < min_cost) { num_threads = nt; min_cost = parallel_cost; } } return num_threads; } double computeBandwidth(bool shard_by_col, Index bm, Index bn, Index bk) const { // Peak VFMA bandwidth is 0.5. However if we have not enough data for // vectorization bandwidth drops. The 4.0 and 2.0 bandwidth is determined // experimentally. double computeBandwidth = bk == 1 ? 4.0 : (shard_by_col ? bn : bm) < Traits::nr || (shard_by_col ? bm : bn) < Traits::mr ? 2.0 : 0.5; #ifndef EIGEN_VECTORIZE_FMA // Bandwidth of all of VFMA/MULPS/ADDPS is 0.5 on latest Intel processors. // However for MULPS/ADDPS we have dependent sequence of 2 such // instructions, // so overall bandwidth is 1.0. if (computeBandwidth == 0.5) computeBandwidth = 1.0; #endif return computeBandwidth; } }; } // end namespace Eigen #endif // EIGEN_USE_THREADS #endif // EIGEN_CXX11_TENSOR_TENSOR_CONTRACTION_THREAD_POOL_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorPatch.h0000644000176200001440000002632214562417221025426 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_PATCH_H #define EIGEN_CXX11_TENSOR_TENSOR_PATCH_H namespace Eigen { /** \class TensorPatch * \ingroup CXX11_Tensor_Module * * \brief Tensor patch class. * * */ namespace internal { template struct traits > : public traits { typedef typename XprType::Scalar Scalar; typedef traits XprTraits; typedef typename XprTraits::StorageKind StorageKind; typedef typename XprTraits::Index Index; typedef typename XprType::Nested Nested; typedef typename remove_reference::type _Nested; static const int NumDimensions = XprTraits::NumDimensions + 1; static const int Layout = XprTraits::Layout; typedef typename XprTraits::PointerType PointerType; }; template struct eval, Eigen::Dense> { typedef const TensorPatchOp& type; }; template struct nested, 1, typename eval >::type> { typedef TensorPatchOp type; }; } // end namespace internal template class TensorPatchOp : public TensorBase, ReadOnlyAccessors> { public: typedef typename Eigen::internal::traits::Scalar Scalar; typedef typename Eigen::NumTraits::Real RealScalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename Eigen::internal::nested::type Nested; typedef typename Eigen::internal::traits::StorageKind StorageKind; typedef typename Eigen::internal::traits::Index Index; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorPatchOp(const XprType& expr, const PatchDim& patch_dims) : m_xpr(expr), m_patch_dims(patch_dims) {} EIGEN_DEVICE_FUNC const PatchDim& patch_dims() const { return m_patch_dims; } EIGEN_DEVICE_FUNC const typename internal::remove_all::type& expression() const { return m_xpr; } protected: typename XprType::Nested m_xpr; const PatchDim m_patch_dims; }; // Eval as rvalue template struct TensorEvaluator, Device> { typedef TensorPatchOp XprType; typedef typename XprType::Index Index; static const int NumDims = internal::array_size::Dimensions>::value + 1; typedef DSizes Dimensions; typedef typename XprType::Scalar Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename PacketType::type PacketReturnType; static const int PacketSize = PacketType::size; typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; enum { IsAligned = false, PacketAccess = TensorEvaluator::PacketAccess, BlockAccess = false, PreferBlockAccess = TensorEvaluator::PreferBlockAccess, Layout = TensorEvaluator::Layout, CoordAccess = false, RawAccess = false }; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockNotImplemented TensorBlock; //===--------------------------------------------------------------------===// EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device) : m_impl(op.expression(), device) { Index num_patches = 1; const typename TensorEvaluator::Dimensions& input_dims = m_impl.dimensions(); const PatchDim& patch_dims = op.patch_dims(); if (static_cast(Layout) == static_cast(ColMajor)) { for (int i = 0; i < NumDims-1; ++i) { m_dimensions[i] = patch_dims[i]; num_patches *= (input_dims[i] - patch_dims[i] + 1); } m_dimensions[NumDims-1] = num_patches; m_inputStrides[0] = 1; m_patchStrides[0] = 1; for (int i = 1; i < NumDims-1; ++i) { m_inputStrides[i] = m_inputStrides[i-1] * input_dims[i-1]; m_patchStrides[i] = m_patchStrides[i-1] * (input_dims[i-1] - patch_dims[i-1] + 1); } m_outputStrides[0] = 1; for (int i = 1; i < NumDims; ++i) { m_outputStrides[i] = m_outputStrides[i-1] * m_dimensions[i-1]; } } else { for (int i = 0; i < NumDims-1; ++i) { m_dimensions[i+1] = patch_dims[i]; num_patches *= (input_dims[i] - patch_dims[i] + 1); } m_dimensions[0] = num_patches; m_inputStrides[NumDims-2] = 1; m_patchStrides[NumDims-2] = 1; for (int i = NumDims-3; i >= 0; --i) { m_inputStrides[i] = m_inputStrides[i+1] * input_dims[i+1]; m_patchStrides[i] = m_patchStrides[i+1] * (input_dims[i+1] - patch_dims[i+1] + 1); } m_outputStrides[NumDims-1] = 1; for (int i = NumDims-2; i >= 0; --i) { m_outputStrides[i] = m_outputStrides[i+1] * m_dimensions[i+1]; } } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; } EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(EvaluatorPointerType /*data*/) { m_impl.evalSubExprsIfNeeded(NULL); return true; } EIGEN_STRONG_INLINE void cleanup() { m_impl.cleanup(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { Index output_stride_index = (static_cast(Layout) == static_cast(ColMajor)) ? NumDims - 1 : 0; // Find the location of the first element of the patch. Index patchIndex = index / m_outputStrides[output_stride_index]; // Find the offset of the element wrt the location of the first element. Index patchOffset = index - patchIndex * m_outputStrides[output_stride_index]; Index inputIndex = 0; if (static_cast(Layout) == static_cast(ColMajor)) { EIGEN_UNROLL_LOOP for (int i = NumDims - 2; i > 0; --i) { const Index patchIdx = patchIndex / m_patchStrides[i]; patchIndex -= patchIdx * m_patchStrides[i]; const Index offsetIdx = patchOffset / m_outputStrides[i]; patchOffset -= offsetIdx * m_outputStrides[i]; inputIndex += (patchIdx + offsetIdx) * m_inputStrides[i]; } } else { EIGEN_UNROLL_LOOP for (int i = 0; i < NumDims - 2; ++i) { const Index patchIdx = patchIndex / m_patchStrides[i]; patchIndex -= patchIdx * m_patchStrides[i]; const Index offsetIdx = patchOffset / m_outputStrides[i+1]; patchOffset -= offsetIdx * m_outputStrides[i+1]; inputIndex += (patchIdx + offsetIdx) * m_inputStrides[i]; } } inputIndex += (patchIndex + patchOffset); return m_impl.coeff(inputIndex); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const { EIGEN_STATIC_ASSERT((PacketSize > 1), YOU_MADE_A_PROGRAMMING_MISTAKE) eigen_assert(index+PacketSize-1 < dimensions().TotalSize()); Index output_stride_index = (static_cast(Layout) == static_cast(ColMajor)) ? NumDims - 1 : 0; Index indices[2] = {index, index + PacketSize - 1}; Index patchIndices[2] = {indices[0] / m_outputStrides[output_stride_index], indices[1] / m_outputStrides[output_stride_index]}; Index patchOffsets[2] = {indices[0] - patchIndices[0] * m_outputStrides[output_stride_index], indices[1] - patchIndices[1] * m_outputStrides[output_stride_index]}; Index inputIndices[2] = {0, 0}; if (static_cast(Layout) == static_cast(ColMajor)) { EIGEN_UNROLL_LOOP for (int i = NumDims - 2; i > 0; --i) { const Index patchIdx[2] = {patchIndices[0] / m_patchStrides[i], patchIndices[1] / m_patchStrides[i]}; patchIndices[0] -= patchIdx[0] * m_patchStrides[i]; patchIndices[1] -= patchIdx[1] * m_patchStrides[i]; const Index offsetIdx[2] = {patchOffsets[0] / m_outputStrides[i], patchOffsets[1] / m_outputStrides[i]}; patchOffsets[0] -= offsetIdx[0] * m_outputStrides[i]; patchOffsets[1] -= offsetIdx[1] * m_outputStrides[i]; inputIndices[0] += (patchIdx[0] + offsetIdx[0]) * m_inputStrides[i]; inputIndices[1] += (patchIdx[1] + offsetIdx[1]) * m_inputStrides[i]; } } else { EIGEN_UNROLL_LOOP for (int i = 0; i < NumDims - 2; ++i) { const Index patchIdx[2] = {patchIndices[0] / m_patchStrides[i], patchIndices[1] / m_patchStrides[i]}; patchIndices[0] -= patchIdx[0] * m_patchStrides[i]; patchIndices[1] -= patchIdx[1] * m_patchStrides[i]; const Index offsetIdx[2] = {patchOffsets[0] / m_outputStrides[i+1], patchOffsets[1] / m_outputStrides[i+1]}; patchOffsets[0] -= offsetIdx[0] * m_outputStrides[i+1]; patchOffsets[1] -= offsetIdx[1] * m_outputStrides[i+1]; inputIndices[0] += (patchIdx[0] + offsetIdx[0]) * m_inputStrides[i]; inputIndices[1] += (patchIdx[1] + offsetIdx[1]) * m_inputStrides[i]; } } inputIndices[0] += (patchIndices[0] + patchOffsets[0]); inputIndices[1] += (patchIndices[1] + patchOffsets[1]); if (inputIndices[1] - inputIndices[0] == PacketSize - 1) { PacketReturnType rslt = m_impl.template packet(inputIndices[0]); return rslt; } else { EIGEN_ALIGN_MAX CoeffReturnType values[PacketSize]; values[0] = m_impl.coeff(inputIndices[0]); values[PacketSize-1] = m_impl.coeff(inputIndices[1]); EIGEN_UNROLL_LOOP for (int i = 1; i < PacketSize-1; ++i) { values[i] = coeff(index+i); } PacketReturnType rslt = internal::pload(values); return rslt; } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost costPerCoeff(bool vectorized) const { const double compute_cost = NumDims * (TensorOpCost::DivCost() + TensorOpCost::MulCost() + 2 * TensorOpCost::AddCost()); return m_impl.costPerCoeff(vectorized) + TensorOpCost(0, 0, compute_cost, vectorized, PacketSize); } EIGEN_DEVICE_FUNC EvaluatorPointerType data() const { return NULL; } #ifdef EIGEN_USE_SYCL // binding placeholder accessors to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler &cgh) const { m_impl.bind(cgh); } #endif protected: Dimensions m_dimensions; array m_outputStrides; array m_inputStrides; array m_patchStrides; TensorEvaluator m_impl; }; } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_PATCH_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorRef.h0000644000176200001440000003471114562417221025104 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_REF_H #define EIGEN_CXX11_TENSOR_TENSOR_REF_H namespace Eigen { namespace internal { template class TensorLazyBaseEvaluator { public: TensorLazyBaseEvaluator() : m_refcount(0) { } virtual ~TensorLazyBaseEvaluator() { } EIGEN_DEVICE_FUNC virtual const Dimensions& dimensions() const = 0; EIGEN_DEVICE_FUNC virtual const Scalar* data() const = 0; EIGEN_DEVICE_FUNC virtual const Scalar coeff(DenseIndex index) const = 0; EIGEN_DEVICE_FUNC virtual Scalar& coeffRef(DenseIndex index) = 0; void incrRefCount() { ++m_refcount; } void decrRefCount() { --m_refcount; } int refCount() const { return m_refcount; } private: // No copy, no assignment; TensorLazyBaseEvaluator(const TensorLazyBaseEvaluator& other); TensorLazyBaseEvaluator& operator = (const TensorLazyBaseEvaluator& other); int m_refcount; }; template class TensorLazyEvaluatorReadOnly : public TensorLazyBaseEvaluator::Scalar> { public: // typedef typename TensorEvaluator::Dimensions Dimensions; typedef typename TensorEvaluator::Scalar Scalar; typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; typedef TensorEvaluator EvalType; TensorLazyEvaluatorReadOnly(const Expr& expr, const Device& device) : m_impl(expr, device), m_dummy(Scalar(0)) { m_dims = m_impl.dimensions(); m_impl.evalSubExprsIfNeeded(NULL); } virtual ~TensorLazyEvaluatorReadOnly() { m_impl.cleanup(); } EIGEN_DEVICE_FUNC virtual const Dimensions& dimensions() const { return m_dims; } EIGEN_DEVICE_FUNC virtual const Scalar* data() const { return m_impl.data(); } EIGEN_DEVICE_FUNC virtual const Scalar coeff(DenseIndex index) const { return m_impl.coeff(index); } EIGEN_DEVICE_FUNC virtual Scalar& coeffRef(DenseIndex /*index*/) { eigen_assert(false && "can't reference the coefficient of a rvalue"); return m_dummy; }; protected: TensorEvaluator m_impl; Dimensions m_dims; Scalar m_dummy; }; template class TensorLazyEvaluatorWritable : public TensorLazyEvaluatorReadOnly { public: typedef TensorLazyEvaluatorReadOnly Base; typedef typename Base::Scalar Scalar; typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; TensorLazyEvaluatorWritable(const Expr& expr, const Device& device) : Base(expr, device) { } virtual ~TensorLazyEvaluatorWritable() { } EIGEN_DEVICE_FUNC virtual Scalar& coeffRef(DenseIndex index) { return this->m_impl.coeffRef(index); } }; template class TensorLazyEvaluator : public internal::conditional::value), TensorLazyEvaluatorWritable, TensorLazyEvaluatorReadOnly >::type { public: typedef typename internal::conditional::value), TensorLazyEvaluatorWritable, TensorLazyEvaluatorReadOnly >::type Base; typedef typename Base::Scalar Scalar; TensorLazyEvaluator(const Expr& expr, const Device& device) : Base(expr, device) { } virtual ~TensorLazyEvaluator() { } }; } // namespace internal /** \class TensorRef * \ingroup CXX11_Tensor_Module * * \brief A reference to a tensor expression * The expression will be evaluated lazily (as much as possible). * */ template class TensorRef : public TensorBase > { public: typedef TensorRef Self; typedef typename PlainObjectType::Base Base; typedef typename Eigen::internal::nested::type Nested; typedef typename internal::traits::StorageKind StorageKind; typedef typename internal::traits::Index Index; typedef typename internal::traits::Scalar Scalar; typedef typename NumTraits::Real RealScalar; typedef typename Base::CoeffReturnType CoeffReturnType; typedef Scalar* PointerType; typedef PointerType PointerArgType; static const Index NumIndices = PlainObjectType::NumIndices; typedef typename PlainObjectType::Dimensions Dimensions; enum { IsAligned = false, PacketAccess = false, BlockAccess = false, PreferBlockAccess = false, Layout = PlainObjectType::Layout, CoordAccess = false, // to be implemented RawAccess = false }; //===- Tensor block evaluation strategy (see TensorBlock.h) -----------===// typedef internal::TensorBlockNotImplemented TensorBlock; //===------------------------------------------------------------------===// EIGEN_STRONG_INLINE TensorRef() : m_evaluator(NULL) { } template EIGEN_STRONG_INLINE TensorRef(const Expression& expr) : m_evaluator(new internal::TensorLazyEvaluator(expr, DefaultDevice())) { m_evaluator->incrRefCount(); } template EIGEN_STRONG_INLINE TensorRef& operator = (const Expression& expr) { unrefEvaluator(); m_evaluator = new internal::TensorLazyEvaluator(expr, DefaultDevice()); m_evaluator->incrRefCount(); return *this; } ~TensorRef() { unrefEvaluator(); } TensorRef(const TensorRef& other) : m_evaluator(other.m_evaluator) { eigen_assert(m_evaluator->refCount() > 0); m_evaluator->incrRefCount(); } TensorRef& operator = (const TensorRef& other) { if (this != &other) { unrefEvaluator(); m_evaluator = other.m_evaluator; eigen_assert(m_evaluator->refCount() > 0); m_evaluator->incrRefCount(); } return *this; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index rank() const { return m_evaluator->dimensions().size(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index dimension(Index n) const { return m_evaluator->dimensions()[n]; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_evaluator->dimensions(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index size() const { return m_evaluator->dimensions().TotalSize(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar* data() const { return m_evaluator->data(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator()(Index index) const { return m_evaluator->coeff(index); } #if EIGEN_HAS_VARIADIC_TEMPLATES template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator()(Index firstIndex, IndexTypes... otherIndices) const { const std::size_t num_indices = (sizeof...(otherIndices) + 1); const array indices{{firstIndex, otherIndices...}}; return coeff(indices); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index firstIndex, IndexTypes... otherIndices) { const std::size_t num_indices = (sizeof...(otherIndices) + 1); const array indices{{firstIndex, otherIndices...}}; return coeffRef(indices); } #else EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator()(Index i0, Index i1) const { array indices; indices[0] = i0; indices[1] = i1; return coeff(indices); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator()(Index i0, Index i1, Index i2) const { array indices; indices[0] = i0; indices[1] = i1; indices[2] = i2; return coeff(indices); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator()(Index i0, Index i1, Index i2, Index i3) const { array indices; indices[0] = i0; indices[1] = i1; indices[2] = i2; indices[3] = i3; return coeff(indices); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator()(Index i0, Index i1, Index i2, Index i3, Index i4) const { array indices; indices[0] = i0; indices[1] = i1; indices[2] = i2; indices[3] = i3; indices[4] = i4; return coeff(indices); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index i0, Index i1) { array indices; indices[0] = i0; indices[1] = i1; return coeffRef(indices); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index i0, Index i1, Index i2) { array indices; indices[0] = i0; indices[1] = i1; indices[2] = i2; return coeffRef(indices); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& operator()(Index i0, Index i1, Index i2, Index i3) { array indices; indices[0] = i0; indices[1] = i1; indices[2] = i2; indices[3] = i3; return coeffRef(indices); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index i0, Index i1, Index i2, Index i3, Index i4) { array indices; indices[0] = i0; indices[1] = i1; indices[2] = i2; indices[3] = i3; indices[4] = i4; return coeffRef(indices); } #endif template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar coeff(const array& indices) const { const Dimensions& dims = this->dimensions(); Index index = 0; if (PlainObjectType::Options & RowMajor) { index += indices[0]; for (size_t i = 1; i < NumIndices; ++i) { index = index * dims[i] + indices[i]; } } else { index += indices[NumIndices-1]; for (int i = NumIndices-2; i >= 0; --i) { index = index * dims[i] + indices[i]; } } return m_evaluator->coeff(index); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(const array& indices) { const Dimensions& dims = this->dimensions(); Index index = 0; if (PlainObjectType::Options & RowMajor) { index += indices[0]; for (size_t i = 1; i < NumIndices; ++i) { index = index * dims[i] + indices[i]; } } else { index += indices[NumIndices-1]; for (int i = NumIndices-2; i >= 0; --i) { index = index * dims[i] + indices[i]; } } return m_evaluator->coeffRef(index); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar coeff(Index index) const { return m_evaluator->coeff(index); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index index) { return m_evaluator->coeffRef(index); } private: EIGEN_STRONG_INLINE void unrefEvaluator() { if (m_evaluator) { m_evaluator->decrRefCount(); if (m_evaluator->refCount() == 0) { delete m_evaluator; } } } internal::TensorLazyBaseEvaluator* m_evaluator; }; // evaluator for rvalues template struct TensorEvaluator, Device> { typedef typename Derived::Index Index; typedef typename Derived::Scalar Scalar; typedef typename Derived::Scalar CoeffReturnType; typedef typename PacketType::type PacketReturnType; typedef typename Derived::Dimensions Dimensions; typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; enum { IsAligned = false, PacketAccess = false, BlockAccess = false, PreferBlockAccess = false, Layout = TensorRef::Layout, CoordAccess = false, // to be implemented RawAccess = false }; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockNotImplemented TensorBlock; //===--------------------------------------------------------------------===// EIGEN_STRONG_INLINE TensorEvaluator(const TensorRef& m, const Device&) : m_ref(m) { } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_ref.dimensions(); } EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(EvaluatorPointerType) { return true; } EIGEN_STRONG_INLINE void cleanup() { } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { return m_ref.coeff(index); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index index) { return m_ref.coeffRef(index); } EIGEN_DEVICE_FUNC const Scalar* data() const { return m_ref.data(); } protected: TensorRef m_ref; }; // evaluator for lvalues template struct TensorEvaluator, Device> : public TensorEvaluator, Device> { typedef typename Derived::Index Index; typedef typename Derived::Scalar Scalar; typedef typename Derived::Scalar CoeffReturnType; typedef typename PacketType::type PacketReturnType; typedef typename Derived::Dimensions Dimensions; typedef TensorEvaluator, Device> Base; enum { IsAligned = false, PacketAccess = false, BlockAccess = false, PreferBlockAccess = false, RawAccess = false }; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockNotImplemented TensorBlock; //===--------------------------------------------------------------------===// EIGEN_STRONG_INLINE TensorEvaluator(TensorRef& m, const Device& d) : Base(m, d) { } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index index) { return this->m_ref.coeffRef(index); } }; } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_REF_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h0000644000176200001440000006403714562417221026172 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_EXECUTOR_H #define EIGEN_CXX11_TENSOR_TENSOR_EXECUTOR_H namespace Eigen { /** * \class TensorExecutor * \ingroup CXX11_Tensor_Module * * \brief The tensor executor class. * * This class is responsible for launch the evaluation of the expression on * the specified computing device. * * @tparam Vectorizable can use packet math (SSE/AVX/etc... registers and * instructions) * @tparam Tiling can use block based tensor evaluation * (see TensorBlock.h) */ namespace internal { /** * Evaluating TensorBroadcastingOp via coefficient of packet path is extremely * expensive. If expression has at least one broadcast op in it, and it supports * block based evaluation, we always prefer it, even for the small tensors. For * all other tileable ops, block evaluation overhead for small tensors (fits * into L1) is too large, and we fallback on vectorized evaluation. */ // TODO(ezhulenev): Add specializations for all other types of Tensor ops. template struct ExpressionHasTensorBroadcastingOp { enum { value = false }; }; template struct ExpressionHasTensorBroadcastingOp< const TensorAssignOp > { enum { value = ExpressionHasTensorBroadcastingOp::value }; }; template struct ExpressionHasTensorBroadcastingOp< const TensorCwiseUnaryOp > { enum { value = ExpressionHasTensorBroadcastingOp::value }; }; template struct ExpressionHasTensorBroadcastingOp< const TensorCwiseBinaryOp > { enum { value = ExpressionHasTensorBroadcastingOp::value || ExpressionHasTensorBroadcastingOp::value }; }; template struct ExpressionHasTensorBroadcastingOp< const TensorBroadcastingOp > { enum { value = true }; }; // -------------------------------------------------------------------------- // /** * Default strategy: the expression is evaluated sequentially with a single cpu * thread, without vectorization and block evaluation. */ template class TensorExecutor { public: typedef typename Expression::Index StorageIndex; // Including `unsupported/Eigen/CXX11/Tensor` in different translation units // with/without `EIGEN_USE_THREADS` or `EIGEN_USE_GPU` is a potential ODR // violation. If this template is instantiated with a non-default device, it // means that this header file was included without defining // `EIGEN_USE_THREADS`, `EIGEN_USE_GPU` or `EIGEN_USE_SYCL`. static_assert(std::is_same::value, "Default executor instantiated with non-default device. " "You must #define EIGEN_USE_THREADS, EIGEN_USE_GPU or " "EIGEN_USE_SYCL before including Eigen headers."); EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE void run(const Expression& expr, const Device& device = Device()) { TensorEvaluator evaluator(expr, device); const bool needs_assign = evaluator.evalSubExprsIfNeeded(NULL); if (needs_assign) { const StorageIndex size = array_prod(evaluator.dimensions()); for (StorageIndex i = 0; i < size; ++i) { evaluator.evalScalar(i); } } evaluator.cleanup(); } }; /** * Default async execution strategy is not implemented. Currently it's only * available for ThreadPoolDevice (see definition below). */ template class TensorAsyncExecutor {}; /** * Process all the data with a single cpu thread, using vectorized instructions. */ template class TensorExecutor { public: typedef typename Expression::Index StorageIndex; EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE void run( const Expression& expr, const DefaultDevice& device = DefaultDevice()) { TensorEvaluator evaluator(expr, device); const bool needs_assign = evaluator.evalSubExprsIfNeeded(NULL); if (needs_assign) { const StorageIndex size = array_prod(evaluator.dimensions()); const int PacketSize = unpacket_traits::PacketReturnType>::size; // Give compiler a strong possibility to unroll the loop. But don't insist // on unrolling, because if the function is expensive compiler should not // unroll the loop at the expense of inlining. const StorageIndex UnrolledSize = (size / (4 * PacketSize)) * 4 * PacketSize; for (StorageIndex i = 0; i < UnrolledSize; i += 4 * PacketSize) { for (StorageIndex j = 0; j < 4; j++) { evaluator.evalPacket(i + j * PacketSize); } } const StorageIndex VectorizedSize = (size / PacketSize) * PacketSize; for (StorageIndex i = UnrolledSize; i < VectorizedSize; i += PacketSize) { evaluator.evalPacket(i); } for (StorageIndex i = VectorizedSize; i < size; ++i) { evaluator.evalScalar(i); } } evaluator.cleanup(); } }; /** * Process all the data with a single cpu thread, using blocks of data. By * sizing a block to fit L1 cache we get better cache performance. */ template class TensorExecutor { public: typedef typename traits::Scalar Scalar; typedef typename remove_const::type ScalarNoConst; typedef TensorEvaluator Evaluator; typedef typename traits::Index StorageIndex; static const int NumDims = traits::NumDimensions; EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE void run(const Expression& expr, const DefaultDevice& device = DefaultDevice()) { typedef TensorBlockMapper TensorBlockMapper; typedef internal::TensorBlockDescriptor TensorBlockDesc; typedef internal::TensorBlockScratchAllocator TensorBlockScratch; Evaluator evaluator(expr, device); // TODO(ezhulenev): Do not use tiling for small tensors? const bool needs_assign = evaluator.evalSubExprsIfNeeded(NULL); if (needs_assign) { // Query expression tree for desired block size/shape. const TensorBlockResourceRequirements requirements = evaluator.getResourceRequirements(); const TensorBlockMapper block_mapper( typename TensorBlockDesc::Dimensions(evaluator.dimensions()), requirements); // Share scratch memory allocator between all blocks. TensorBlockScratch scratch(device); const StorageIndex total_block_count = block_mapper.blockCount(); for (StorageIndex i = 0; i < total_block_count; ++i) { TensorBlockDesc desc = block_mapper.blockDescriptor(i); evaluator.evalBlock(desc, scratch); scratch.reset(); } } evaluator.cleanup(); } }; /** * Multicore strategy: the index space is partitioned and each partition is * executed on a single core. * * (1) TensorExecutor will submit work to the ThreadPoolDevice managed thread * pool, and will block the caller thread until all tasks are finished. * * (2) TensorAsyncExecutor is a non-blocking version, that will submit work to * the ThreadPoolDevice managed thread pool, and will return immediately. * It will call 'done' callback after all tasks are finished. */ #ifdef EIGEN_USE_THREADS template struct TensorExecutorTilingContext { TensorExecutorTilingContext() = default; TensorExecutorTilingContext(const TensorBlockMapper& b_mapper, const TensorOpCost& b_cost, size_t b_aligned_size) : block_mapper(b_mapper), cost(b_cost), aligned_blocksize(b_aligned_size) {} TensorBlockMapper block_mapper; // navigate through blocks TensorOpCost cost; // cost of computing a single block size_t aligned_blocksize; // block size after memory alignment }; // Computes a block evaluation parameters, and allocates temporary memory buffer // for blocks. See TensorExecutor/TensorAsyncExecutor (Tiling=On) below. template TensorExecutorTilingContext GetTensorExecutorTilingContext( const Evaluator& evaluator) { // Query expression tree for desired block size/shape. TensorBlockResourceRequirements requirements = evaluator.getResourceRequirements(); // Update target block size based on cost model. double taskSize = TensorCostModel::taskSize( 1, requirements.cost_per_coeff); requirements.size = static_cast(1.0 / taskSize); TensorBlockMapper block_mapper( typename TensorBlockMapper::Dimensions(evaluator.dimensions()), requirements); size_t block_size = block_mapper.blockTotalSize(); const size_t align = numext::maxi(EIGEN_MAX_ALIGN_BYTES, 1); const size_t aligned_blocksize = align * divup(block_size * sizeof(typename Evaluator::Scalar), align); return {block_mapper, requirements.cost_per_coeff * block_size, aligned_blocksize}; } template struct EvalRange { static void run(Evaluator* evaluator_in, const StorageIndex firstIdx, const StorageIndex lastIdx) { Evaluator evaluator = *evaluator_in; eigen_assert(lastIdx >= firstIdx); for (StorageIndex i = firstIdx; i < lastIdx; ++i) { evaluator.evalScalar(i); } } static StorageIndex alignBlockSize(StorageIndex size) { return size; } }; template struct EvalRange { static const int PacketSize = unpacket_traits::size; static void run(Evaluator* evaluator_in, const StorageIndex firstIdx, const StorageIndex lastIdx) { Evaluator evaluator = *evaluator_in; eigen_assert(lastIdx >= firstIdx); StorageIndex i = firstIdx; if (lastIdx - firstIdx >= PacketSize) { eigen_assert(firstIdx % PacketSize == 0); StorageIndex last_chunk_offset = lastIdx - 4 * PacketSize; // Give compiler a strong possibility to unroll the loop. But don't insist // on unrolling, because if the function is expensive compiler should not // unroll the loop at the expense of inlining. for (; i <= last_chunk_offset; i += 4 * PacketSize) { for (StorageIndex j = 0; j < 4; j++) { evaluator.evalPacket(i + j * PacketSize); } } last_chunk_offset = lastIdx - PacketSize; for (; i <= last_chunk_offset; i += PacketSize) { evaluator.evalPacket(i); } } for (; i < lastIdx; ++i) { evaluator.evalScalar(i); } } static StorageIndex alignBlockSize(StorageIndex size) { // Align block size to packet size and account for unrolling in run above. if (size >= 16 * PacketSize) { return (size + 4 * PacketSize - 1) & ~(4 * PacketSize - 1); } // Aligning to 4 * PacketSize would increase block size by more than 25%. return (size + PacketSize - 1) & ~(PacketSize - 1); } }; template class TensorExecutor { public: typedef typename Expression::Index StorageIndex; static EIGEN_STRONG_INLINE void run(const Expression& expr, const ThreadPoolDevice& device) { typedef TensorEvaluator Evaluator; typedef EvalRange EvalRange; Evaluator evaluator(expr, device); const bool needs_assign = evaluator.evalSubExprsIfNeeded(nullptr); if (needs_assign) { const StorageIndex size = array_prod(evaluator.dimensions()); device.parallelFor(size, evaluator.costPerCoeff(Vectorizable), EvalRange::alignBlockSize, [&evaluator](StorageIndex firstIdx, StorageIndex lastIdx) { EvalRange::run(&evaluator, firstIdx, lastIdx); }); } evaluator.cleanup(); } }; template class TensorExecutor { public: typedef typename traits::Index IndexType; typedef typename traits::Scalar Scalar; typedef typename remove_const::type ScalarNoConst; static const int NumDims = traits::NumDimensions; typedef TensorEvaluator Evaluator; typedef TensorBlockMapper BlockMapper; typedef TensorExecutorTilingContext TilingContext; typedef internal::TensorBlockDescriptor TensorBlockDesc; typedef internal::TensorBlockScratchAllocator TensorBlockScratch; static EIGEN_STRONG_INLINE void run(const Expression& expr, const ThreadPoolDevice& device) { Evaluator evaluator(expr, device); const bool needs_assign = evaluator.evalSubExprsIfNeeded(nullptr); if (needs_assign) { const TilingContext tiling = internal::GetTensorExecutorTilingContext(evaluator); auto eval_block = [&device, &evaluator, &tiling](IndexType firstBlockIdx, IndexType lastBlockIdx) { TensorBlockScratch scratch(device); for (IndexType block_idx = firstBlockIdx; block_idx < lastBlockIdx; ++block_idx) { TensorBlockDesc desc = tiling.block_mapper.blockDescriptor(block_idx); evaluator.evalBlock(desc, scratch); scratch.reset(); } }; // Evaluate small expressions directly as a single block. if (tiling.block_mapper.blockCount() == 1) { TensorBlockScratch scratch(device); TensorBlockDesc desc(0, tiling.block_mapper.blockDimensions()); evaluator.evalBlock(desc, scratch); } else { device.parallelFor(tiling.block_mapper.blockCount(), tiling.cost, eval_block); } } evaluator.cleanup(); } }; template class TensorAsyncExecutor { public: typedef typename Expression::Index StorageIndex; typedef TensorEvaluator Evaluator; static EIGEN_STRONG_INLINE void runAsync(const Expression& expr, const ThreadPoolDevice& device, DoneCallback done) { TensorAsyncExecutorContext* const ctx = new TensorAsyncExecutorContext(expr, device, std::move(done)); const auto on_eval_subexprs = [ctx, &device](bool need_assign) -> void { if (!need_assign) { delete ctx; return; } typedef EvalRange EvalRange; const StorageIndex size = array_prod(ctx->evaluator.dimensions()); device.parallelForAsync( size, ctx->evaluator.costPerCoeff(Vectorizable), EvalRange::alignBlockSize, [ctx](StorageIndex firstIdx, StorageIndex lastIdx) { EvalRange::run(&ctx->evaluator, firstIdx, lastIdx); }, [ctx]() { delete ctx; }); }; ctx->evaluator.evalSubExprsIfNeededAsync(nullptr, on_eval_subexprs); } private: struct TensorAsyncExecutorContext { TensorAsyncExecutorContext(const Expression& expr, const ThreadPoolDevice& thread_pool, DoneCallback done) : evaluator(expr, thread_pool), on_done(std::move(done)) {} ~TensorAsyncExecutorContext() { evaluator.cleanup(); on_done(); } Evaluator evaluator; private: DoneCallback on_done; }; }; template class TensorAsyncExecutor { public: typedef typename traits::Index IndexType; typedef typename traits::Scalar Scalar; typedef typename remove_const::type ScalarNoConst; static const int NumDims = traits::NumDimensions; typedef TensorEvaluator Evaluator; typedef TensorBlockMapper BlockMapper; typedef TensorExecutorTilingContext TilingContext; typedef internal::TensorBlockDescriptor TensorBlockDesc; typedef internal::TensorBlockScratchAllocator TensorBlockScratch; static EIGEN_STRONG_INLINE void runAsync(const Expression& expr, const ThreadPoolDevice& device, DoneCallback done) { TensorAsyncExecutorContext* const ctx = new TensorAsyncExecutorContext(expr, device, std::move(done)); const auto on_eval_subexprs = [ctx](bool need_assign) -> void { if (!need_assign) { delete ctx; return; } ctx->tiling = internal::GetTensorExecutorTilingContext< Evaluator, BlockMapper, Vectorizable>(ctx->evaluator); auto eval_block = [ctx](IndexType firstBlockIdx, IndexType lastBlockIdx) { TensorBlockScratch scratch(ctx->device); for (IndexType block_idx = firstBlockIdx; block_idx < lastBlockIdx; ++block_idx) { TensorBlockDesc desc = ctx->tiling.block_mapper.blockDescriptor(block_idx); ctx->evaluator.evalBlock(desc, scratch); scratch.reset(); } }; // Evaluate small expressions directly as a single block. if (ctx->tiling.block_mapper.blockCount() == 1) { TensorBlockScratch scratch(ctx->device); TensorBlockDesc desc(0, ctx->tiling.block_mapper.blockDimensions()); ctx->evaluator.evalBlock(desc, scratch); delete ctx; } else { ctx->device.parallelForAsync(ctx->tiling.block_mapper.blockCount(), ctx->tiling.cost, eval_block, [ctx]() { delete ctx; }); } }; ctx->evaluator.evalSubExprsIfNeededAsync(nullptr, on_eval_subexprs); } private: struct TensorAsyncExecutorContext { TensorAsyncExecutorContext(const Expression& expr, const ThreadPoolDevice& thread_pool, DoneCallback done) : device(thread_pool), evaluator(expr, thread_pool), on_done(std::move(done)) {} ~TensorAsyncExecutorContext() { evaluator.cleanup(); on_done(); } const ThreadPoolDevice& device; Evaluator evaluator; TilingContext tiling; private: DoneCallback on_done; }; }; #endif // EIGEN_USE_THREADS // GPU: the evaluation of the expression is offloaded to a GPU. #if defined(EIGEN_USE_GPU) template class TensorExecutor { public: typedef typename Expression::Index StorageIndex; static void run(const Expression& expr, const GpuDevice& device); }; #if defined(EIGEN_GPUCC) template struct EigenMetaKernelEval { static EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void run(Evaluator& eval, StorageIndex firstIdx, StorageIndex lastIdx, StorageIndex step_size) { for (StorageIndex i = firstIdx; i < lastIdx; i += step_size) { eval.evalScalar(i); } } }; template struct EigenMetaKernelEval { static EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void run(Evaluator& eval, StorageIndex firstIdx, StorageIndex lastIdx, StorageIndex step_size) { const StorageIndex PacketSize = unpacket_traits::size; const StorageIndex vectorized_size = (lastIdx / PacketSize) * PacketSize; const StorageIndex vectorized_step_size = step_size * PacketSize; // Use the vector path for (StorageIndex i = firstIdx * PacketSize; i < vectorized_size; i += vectorized_step_size) { eval.evalPacket(i); } for (StorageIndex i = vectorized_size + firstIdx; i < lastIdx; i += step_size) { eval.evalScalar(i); } } }; template __global__ void __launch_bounds__(1024) EigenMetaKernel(Evaluator eval, StorageIndex size) { const StorageIndex first_index = blockIdx.x * blockDim.x + threadIdx.x; const StorageIndex step_size = blockDim.x * gridDim.x; const bool vectorizable = Evaluator::PacketAccess & Evaluator::IsAligned; EigenMetaKernelEval::run(eval, first_index, size, step_size); } /*static*/ template EIGEN_STRONG_INLINE void TensorExecutor::run( const Expression& expr, const GpuDevice& device) { TensorEvaluator evaluator(expr, device); const bool needs_assign = evaluator.evalSubExprsIfNeeded(nullptr); if (needs_assign) { const int block_size = device.maxGpuThreadsPerBlock(); const int max_blocks = device.getNumGpuMultiProcessors() * device.maxGpuThreadsPerMultiProcessor() / block_size; const StorageIndex size = array_prod(evaluator.dimensions()); // Create a least one block to ensure we won't crash when tensorflow calls with tensors of size 0. const int num_blocks = numext::maxi(numext::mini(max_blocks, divup(size, block_size)), 1); LAUNCH_GPU_KERNEL( (EigenMetaKernel, StorageIndex>), num_blocks, block_size, 0, device, evaluator, size); } evaluator.cleanup(); } #endif // EIGEN_GPUCC #endif // EIGEN_USE_GPU // SYCL Executor policy #ifdef EIGEN_USE_SYCL template struct ExecExprFunctorKernel { typedef typename Evaluator::Index Index; Evaluator evaluator; const Index range; template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE ExecExprFunctorKernel( const Scratch, Evaluator evaluator_, const Index range_) : evaluator(evaluator_), range(range_) {} EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void operator()( cl::sycl::nd_item<1> itemID) { compute(itemID); } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE typename std::enable_if::type compute(const cl::sycl::nd_item<1>& itemID) { Index gId = static_cast(itemID.get_global_linear_id()); Index total_threads = itemID.get_global_range(0); for (Index i = gId; i < range; i += total_threads) { evaluator.evalScalar(i); } } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE typename std::enable_if::type compute(const cl::sycl::nd_item<1>& itemID) { const Index vectorizedRange = (range / Evaluator::PacketSize) * Evaluator::PacketSize; Index gId = static_cast(itemID.get_global_linear_id()); const Index step = Evaluator::PacketSize * itemID.get_global_range(0); const Index start = Evaluator::PacketSize * gId; for (Index i = start; i < vectorizedRange; i += step) { evaluator.evalPacket(i); } gId += vectorizedRange; for (Index i = gId; i < range; i += itemID.get_global_range(0)) { evaluator.evalScalar(i); } } }; template class TensorExecutor { public: typedef typename Expression::Index Index; static EIGEN_STRONG_INLINE void run(const Expression& expr, const Eigen::SyclDevice& dev) { typedef Eigen::TensorEvaluator Evaluator; Evaluator evaluator(expr, dev); const bool needs_assign = evaluator.evalSubExprsIfNeeded(NULL); if (needs_assign) { Index range, GRange, tileSize; Index total_size = ::Eigen::internal::array_prod(evaluator.dimensions()); total_size = (total_size == 0) ? 1 : total_size; const int PacketSize = Eigen::PacketType::size; Index vectorizable_threads = static_cast(total_size / PacketSize); dev.parallel_for_setup(vectorizable_threads, tileSize, range, GRange); range = total_size; dev.template nullary_kernel_launcher< typename Evaluator::CoeffReturnType, ExecExprFunctorKernel >( evaluator, cl::sycl::nd_range<1>(cl::sycl::range<1>(GRange), cl::sycl::range<1>(tileSize)), Index(1), range); } evaluator.cleanup(); } }; #endif } // end namespace internal } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_EXECUTOR_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorMacros.h0000644000176200001440000000707214562417221025614 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2015 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_META_MACROS_H #define EIGEN_CXX11_TENSOR_TENSOR_META_MACROS_H /** use this macro in sfinae selection in templated functions * * template::value , int >::type = 0 * > * void foo(){} * * becomes => * * template::value ) * > * void foo(){} */ // SFINAE requires variadic templates #if !defined(EIGEN_GPUCC) #if EIGEN_HAS_VARIADIC_TEMPLATES // SFINAE doesn't work for gcc <= 4.7 #ifdef EIGEN_COMP_GNUC #if EIGEN_GNUC_AT_LEAST(4,8) #define EIGEN_HAS_SFINAE #endif #else #define EIGEN_HAS_SFINAE #endif #endif #endif #define EIGEN_SFINAE_ENABLE_IF( __condition__ ) \ typename internal::enable_if< ( __condition__ ) , int >::type = 0 // Define a macro to use a reference on the host but a value on the device #if defined(SYCL_DEVICE_ONLY) #define EIGEN_DEVICE_REF #else #define EIGEN_DEVICE_REF & #endif // Define a macro for catching SYCL exceptions if exceptions are enabled #define EIGEN_SYCL_TRY_CATCH(X) \ do { \ EIGEN_TRY {X;} \ EIGEN_CATCH(const cl::sycl::exception& e) { \ EIGEN_THROW_X(std::runtime_error("SYCL exception at " + \ std::string(__FILE__) + ":" + \ std::to_string(__LINE__) + "\n" + \ e.what())); \ } \ } while (false) // Define a macro if local memory flags are unset or one of them is set // Setting both flags is the same as unsetting them #if (!defined(EIGEN_SYCL_LOCAL_MEM) && !defined(EIGEN_SYCL_NO_LOCAL_MEM)) || \ (defined(EIGEN_SYCL_LOCAL_MEM) && defined(EIGEN_SYCL_NO_LOCAL_MEM)) #define EIGEN_SYCL_LOCAL_MEM_UNSET_OR_ON 1 #define EIGEN_SYCL_LOCAL_MEM_UNSET_OR_OFF 1 #elif defined(EIGEN_SYCL_LOCAL_MEM) && !defined(EIGEN_SYCL_NO_LOCAL_MEM) #define EIGEN_SYCL_LOCAL_MEM_UNSET_OR_ON 1 #elif !defined(EIGEN_SYCL_LOCAL_MEM) && defined(EIGEN_SYCL_NO_LOCAL_MEM) #define EIGEN_SYCL_LOCAL_MEM_UNSET_OR_OFF 1 #endif #if EIGEN_COMP_CLANG // workaround clang bug (see http://forum.kde.org/viewtopic.php?f=74&t=102653) #define EIGEN_TENSOR_INHERIT_ASSIGNMENT_EQUAL_OPERATOR(Derived) \ using Base::operator =; \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& operator=(const Derived& other) { Base::operator=(other); return *this; } \ template \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& operator=(const OtherDerived& other) { Base::operator=(other); return *this; } #else #define EIGEN_TENSOR_INHERIT_ASSIGNMENT_EQUAL_OPERATOR(Derived) \ EIGEN_INHERIT_ASSIGNMENT_EQUAL_OPERATOR(Derived) #endif /** \internal * \brief Macro to manually inherit assignment operators. * This is necessary, because the implicitly defined assignment operator gets deleted when a custom operator= is defined. * This also inherits template operator=(const OtherDerived&) assignments. * With C++11 or later this also default-implements the copy-constructor */ #define EIGEN_TENSOR_INHERIT_ASSIGNMENT_OPERATORS(Derived) \ EIGEN_TENSOR_INHERIT_ASSIGNMENT_EQUAL_OPERATOR(Derived) \ EIGEN_DEFAULT_COPY_CONSTRUCTOR(Derived) #endif RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorBase.h0000644000176200001440000016111414562417221025240 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_BASE_H #define EIGEN_CXX11_TENSOR_TENSOR_BASE_H // clang-format off namespace Eigen { /** \class TensorBase * \ingroup CXX11_Tensor_Module * * \brief The tensor base class. * * This class is the common parent of the Tensor and TensorMap class, thus * making it possible to use either class interchangeably in expressions. */ #ifndef EIGEN_PARSED_BY_DOXYGEN // FIXME Doxygen does not like the inheritance with different template parameters // Since there is no doxygen documentation inside, we disable it for now template class TensorBase { public: typedef internal::traits DerivedTraits; typedef typename DerivedTraits::Scalar Scalar; typedef typename DerivedTraits::Index Index; typedef typename internal::remove_const::type CoeffReturnType; static const int NumDimensions = DerivedTraits::NumDimensions; // Generic nullary operation support. template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseNullaryOp nullaryExpr(const CustomNullaryOp& func) const { return TensorCwiseNullaryOp(derived(), func); } // Coefficient-wise nullary operators EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseNullaryOp, const Derived> constant(const Scalar& value) const { return nullaryExpr(internal::scalar_constant_op(value)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseNullaryOp, const Derived> random() const { return nullaryExpr(internal::UniformRandomGenerator()); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseNullaryOp random(const RandomGenerator& gen = RandomGenerator()) const { return nullaryExpr(gen); } // Tensor generation template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorGeneratorOp generate(const Generator& generator) const { return TensorGeneratorOp(derived(), generator); } // Generic unary operation support. template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp unaryExpr(const CustomUnaryOp& func) const { return TensorCwiseUnaryOp(derived(), func); } // Coefficient-wise unary operators EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> operator-() const { return unaryExpr(internal::scalar_opposite_op()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> sqrt() const { return unaryExpr(internal::scalar_sqrt_op()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> sign() const { return unaryExpr(internal::scalar_sign_op()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> rsqrt() const { return unaryExpr(internal::scalar_rsqrt_op()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> square() const { return unaryExpr(internal::scalar_square_op()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> cube() const { return unaryExpr(internal::scalar_cube_op()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> inverse() const { return unaryExpr(internal::scalar_inverse_op()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> tanh() const { return unaryExpr(internal::scalar_tanh_op()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> lgamma() const { return unaryExpr(internal::scalar_lgamma_op()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> digamma() const { return unaryExpr(internal::scalar_digamma_op()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> bessel_i0() const { return unaryExpr(internal::scalar_bessel_i0_op()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> bessel_i0e() const { return unaryExpr(internal::scalar_bessel_i0e_op()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> bessel_i1() const { return unaryExpr(internal::scalar_bessel_i1_op()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> bessel_i1e() const { return unaryExpr(internal::scalar_bessel_i1e_op()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> bessel_j0() const { return unaryExpr(internal::scalar_bessel_j0_op()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> bessel_y0() const { return unaryExpr(internal::scalar_bessel_y0_op()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> bessel_j1() const { return unaryExpr(internal::scalar_bessel_j1_op()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> bessel_y1() const { return unaryExpr(internal::scalar_bessel_y1_op()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> bessel_k0() const { return unaryExpr(internal::scalar_bessel_k0_op()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> bessel_k0e() const { return unaryExpr(internal::scalar_bessel_k0e_op()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> bessel_k1() const { return unaryExpr(internal::scalar_bessel_k1_op()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> bessel_k1e() const { return unaryExpr(internal::scalar_bessel_k1e_op()); } // igamma(a = this, x = other) template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseBinaryOp, const Derived, const OtherDerived> igamma(const OtherDerived& other) const { return binaryExpr(other.derived(), internal::scalar_igamma_op()); } // igamma_der_a(a = this, x = other) template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseBinaryOp, const Derived, const OtherDerived> igamma_der_a(const OtherDerived& other) const { return binaryExpr(other.derived(), internal::scalar_igamma_der_a_op()); } // gamma_sample_der_alpha(alpha = this, sample = other) template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseBinaryOp, const Derived, const OtherDerived> gamma_sample_der_alpha(const OtherDerived& other) const { return binaryExpr(other.derived(), internal::scalar_gamma_sample_der_alpha_op()); } // igammac(a = this, x = other) template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseBinaryOp, const Derived, const OtherDerived> igammac(const OtherDerived& other) const { return binaryExpr(other.derived(), internal::scalar_igammac_op()); } // zeta(x = this, q = other) template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseBinaryOp, const Derived, const OtherDerived> zeta(const OtherDerived& other) const { return binaryExpr(other.derived(), internal::scalar_zeta_op()); } // polygamma(n = this, x = other) template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseBinaryOp, const Derived, const OtherDerived> polygamma(const OtherDerived& other) const { return binaryExpr(other.derived(), internal::scalar_polygamma_op()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> erf() const { return unaryExpr(internal::scalar_erf_op()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> erfc() const { return unaryExpr(internal::scalar_erfc_op()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> ndtri() const { return unaryExpr(internal::scalar_ndtri_op()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> sigmoid() const { return unaryExpr(internal::scalar_logistic_op()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> exp() const { return unaryExpr(internal::scalar_exp_op()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> expm1() const { return unaryExpr(internal::scalar_expm1_op()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> log() const { return unaryExpr(internal::scalar_log_op()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> log1p() const { return unaryExpr(internal::scalar_log1p_op()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> log2() const { return unaryExpr(internal::scalar_log2_op()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> abs() const { return unaryExpr(internal::scalar_abs_op()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> clip(Scalar min, Scalar max) const { return unaryExpr(internal::scalar_clamp_op(min, max)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename internal::conditional::IsComplex, TensorCwiseUnaryOp, const Derived>, Derived>::type conjugate() const { return choose(Cond::IsComplex>(), unaryExpr(internal::scalar_conjugate_op()), derived()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp >, const Derived> pow(Scalar exponent) const { return unaryExpr(internal::bind2nd_op >(exponent)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> real() const { return unaryExpr(internal::scalar_real_op()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> imag() const { return unaryExpr(internal::scalar_imag_op()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp >, const Derived> operator+ (Scalar rhs) const { return unaryExpr(internal::bind2nd_op >(rhs)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE friend const TensorCwiseUnaryOp >, const Derived> operator+ (Scalar lhs, const Derived& rhs) { return rhs.unaryExpr(internal::bind1st_op >(lhs)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp >, const Derived> operator- (Scalar rhs) const { EIGEN_STATIC_ASSERT((NumTraits::IsSigned || internal::is_same >::value), YOU_MADE_A_PROGRAMMING_MISTAKE); return unaryExpr(internal::bind2nd_op >(rhs)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE friend const TensorCwiseUnaryOp >, const Derived> operator- (Scalar lhs, const Derived& rhs) { return rhs.unaryExpr(internal::bind1st_op >(lhs)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp >, const Derived> operator* (Scalar rhs) const { return unaryExpr(internal::bind2nd_op >(rhs)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE friend const TensorCwiseUnaryOp >, const Derived> operator* (Scalar lhs, const Derived& rhs) { return rhs.unaryExpr(internal::bind1st_op >(lhs)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp >, const Derived> operator/ (Scalar rhs) const { return unaryExpr(internal::bind2nd_op >(rhs)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE friend const TensorCwiseUnaryOp >, const Derived> operator/ (Scalar lhs, const Derived& rhs) { return rhs.unaryExpr(internal::bind1st_op >(lhs)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> operator% (Scalar rhs) const { EIGEN_STATIC_ASSERT(NumTraits::IsInteger, YOU_MADE_A_PROGRAMMING_MISTAKE_TRY_MOD); return unaryExpr(internal::scalar_mod_op(rhs)); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseBinaryOp, const Derived, const TensorCwiseNullaryOp, const Derived> > cwiseMax(Scalar threshold) const { return cwiseMax(constant(threshold)); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseBinaryOp, const Derived, const TensorCwiseNullaryOp, const Derived> > cwiseMin(Scalar threshold) const { return cwiseMin(constant(threshold)); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename internal::conditional::value, Derived, TensorConversionOp >::type cast() const { return choose(Cond::value>(), derived(), TensorConversionOp(derived())); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> round() const { return unaryExpr(internal::scalar_round_op()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> rint() const { return unaryExpr(internal::scalar_rint_op()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> ceil() const { return unaryExpr(internal::scalar_ceil_op()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> floor() const { return unaryExpr(internal::scalar_floor_op()); } // Generic binary operation support. template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseBinaryOp binaryExpr(const OtherDerived& other, const CustomBinaryOp& func) const { return TensorCwiseBinaryOp(derived(), other, func); } // Coefficient-wise binary operators. template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseBinaryOp, const Derived, const OtherDerived> operator+(const OtherDerived& other) const { return binaryExpr(other.derived(), internal::scalar_sum_op()); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseBinaryOp, const Derived, const OtherDerived> operator-(const OtherDerived& other) const { return binaryExpr(other.derived(), internal::scalar_difference_op()); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseBinaryOp, const Derived, const OtherDerived> operator*(const OtherDerived& other) const { return binaryExpr(other.derived(), internal::scalar_product_op()); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseBinaryOp, const Derived, const OtherDerived> operator/(const OtherDerived& other) const { return binaryExpr(other.derived(), internal::scalar_quotient_op()); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseBinaryOp, const Derived, const OtherDerived> cwiseMax(const OtherDerived& other) const { return binaryExpr(other.derived(), internal::scalar_max_op()); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseBinaryOp, const Derived, const OtherDerived> cwiseMin(const OtherDerived& other) const { return binaryExpr(other.derived(), internal::scalar_min_op()); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseBinaryOp operator&&(const OtherDerived& other) const { return binaryExpr(other.derived(), internal::scalar_boolean_and_op()); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseBinaryOp operator||(const OtherDerived& other) const { return binaryExpr(other.derived(), internal::scalar_boolean_or_op()); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseBinaryOp operator^(const OtherDerived& other) const { return binaryExpr(other.derived(), internal::scalar_boolean_xor_op()); } // Comparisons and tests. template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseBinaryOp, const Derived, const OtherDerived> operator<(const OtherDerived& other) const { return binaryExpr(other.derived(), internal::scalar_cmp_op()); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseBinaryOp, const Derived, const OtherDerived> operator<=(const OtherDerived& other) const { return binaryExpr(other.derived(), internal::scalar_cmp_op()); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseBinaryOp, const Derived, const OtherDerived> operator>(const OtherDerived& other) const { return binaryExpr(other.derived(), internal::scalar_cmp_op()); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseBinaryOp, const Derived, const OtherDerived> operator>=(const OtherDerived& other) const { return binaryExpr(other.derived(), internal::scalar_cmp_op()); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseBinaryOp, const Derived, const OtherDerived> operator==(const OtherDerived& other) const { return binaryExpr(other.derived(), internal::scalar_cmp_op()); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseBinaryOp, const Derived, const OtherDerived> operator!=(const OtherDerived& other) const { return binaryExpr(other.derived(), internal::scalar_cmp_op()); } // comparisons and tests for Scalars EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseBinaryOp, const Derived, const TensorCwiseNullaryOp, const Derived> > operator<(Scalar threshold) const { return operator<(constant(threshold)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseBinaryOp, const Derived, const TensorCwiseNullaryOp, const Derived> > operator<=(Scalar threshold) const { return operator<=(constant(threshold)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseBinaryOp, const Derived, const TensorCwiseNullaryOp, const Derived> > operator>(Scalar threshold) const { return operator>(constant(threshold)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseBinaryOp, const Derived, const TensorCwiseNullaryOp, const Derived> > operator>=(Scalar threshold) const { return operator>=(constant(threshold)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseBinaryOp, const Derived, const TensorCwiseNullaryOp, const Derived> > operator==(Scalar threshold) const { return operator==(constant(threshold)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseBinaryOp, const Derived, const TensorCwiseNullaryOp, const Derived> > operator!=(Scalar threshold) const { return operator!=(constant(threshold)); } // Checks EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> (isnan)() const { return unaryExpr(internal::scalar_isnan_op()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> (isinf)() const { return unaryExpr(internal::scalar_isinf_op()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCwiseUnaryOp, const Derived> (isfinite)() const { return unaryExpr(internal::scalar_isfinite_op()); } // Coefficient-wise ternary operators. template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorSelectOp select(const ThenDerived& thenTensor, const ElseDerived& elseTensor) const { return TensorSelectOp(derived(), thenTensor.derived(), elseTensor.derived()); } // Contractions. typedef Eigen::IndexPair DimensionPair; template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorContractionOp contract(const OtherDerived& other, const Dimensions& dims) const { return TensorContractionOp(derived(), other.derived(), dims); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorContractionOp contract(const OtherDerived& other, const Dimensions& dims, const OutputKernel& output_kernel) const { return TensorContractionOp(derived(), other.derived(), dims, output_kernel); } // Convolutions. template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorConvolutionOp convolve(const KernelDerived& kernel, const Dimensions& dims) const { return TensorConvolutionOp(derived(), kernel.derived(), dims); } // Fourier transforms template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorFFTOp fft(const FFT& dims) const { return TensorFFTOp(derived(), dims); } // Scan. typedef TensorScanOp, const Derived> TensorScanSumOp; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorScanSumOp cumsum(const Index& axis, bool exclusive = false) const { return TensorScanSumOp(derived(), axis, exclusive); } typedef TensorScanOp, const Derived> TensorScanProdOp; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorScanProdOp cumprod(const Index& axis, bool exclusive = false) const { return TensorScanProdOp(derived(), axis, exclusive); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorScanOp scan(const Index& axis, const Reducer& reducer, bool exclusive = false) const { return TensorScanOp(derived(), axis, exclusive, reducer); } // Reductions. template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorReductionOp, const Dims, const Derived> sum(const Dims& dims) const { return TensorReductionOp, const Dims, const Derived>(derived(), dims, internal::SumReducer()); } const TensorReductionOp, const DimensionList, const Derived> sum() const { DimensionList in_dims; return TensorReductionOp, const DimensionList, const Derived>(derived(), in_dims, internal::SumReducer()); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorReductionOp, const Dims, const Derived> mean(const Dims& dims) const { return TensorReductionOp, const Dims, const Derived>(derived(), dims, internal::MeanReducer()); } const TensorReductionOp, const DimensionList, const Derived> mean() const { DimensionList in_dims; return TensorReductionOp, const DimensionList, const Derived>(derived(), in_dims, internal::MeanReducer()); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorReductionOp, const Dims, const Derived> prod(const Dims& dims) const { return TensorReductionOp, const Dims, const Derived>(derived(), dims, internal::ProdReducer()); } const TensorReductionOp, const DimensionList, const Derived> prod() const { DimensionList in_dims; return TensorReductionOp, const DimensionList, const Derived>(derived(), in_dims, internal::ProdReducer()); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorReductionOp, const Dims, const Derived> maximum(const Dims& dims) const { return TensorReductionOp, const Dims, const Derived>(derived(), dims, internal::MaxReducer()); } template const TensorReductionOp, const DimensionList, const Derived> maximum() const { DimensionList in_dims; return TensorReductionOp, const DimensionList, const Derived>(derived(), in_dims, internal::MaxReducer()); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorReductionOp, const Dims, const Derived> minimum(const Dims& dims) const { return TensorReductionOp, const Dims, const Derived>(derived(), dims, internal::MinReducer()); } template const TensorReductionOp, const DimensionList, const Derived> minimum() const { DimensionList in_dims; return TensorReductionOp, const DimensionList, const Derived>(derived(), in_dims, internal::MinReducer()); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorReductionOp::value, Derived, TensorConversionOp >::type > all(const Dims& dims) const { return cast().reduce(dims, internal::AndReducer()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorReductionOp, const typename internal::conditional::value, Derived, TensorConversionOp >::type > all() const { DimensionList in_dims; return cast().reduce(in_dims, internal::AndReducer()); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorReductionOp::value, Derived, TensorConversionOp >::type > any(const Dims& dims) const { return cast().reduce(dims, internal::OrReducer()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorReductionOp, const typename internal::conditional::value, Derived, TensorConversionOp >::type > any() const { DimensionList in_dims; return cast().reduce(in_dims, internal::OrReducer()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorTupleReducerOp< internal::ArgMaxTupleReducer >, const array, const Derived> argmax() const { array in_dims; for (Index d = 0; d < NumDimensions; ++d) in_dims[d] = d; return TensorTupleReducerOp< internal::ArgMaxTupleReducer >, const array, const Derived>(derived(), internal::ArgMaxTupleReducer >(), -1, in_dims); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorTupleReducerOp< internal::ArgMinTupleReducer >, const array, const Derived> argmin() const { array in_dims; for (Index d = 0; d < NumDimensions; ++d) in_dims[d] = d; return TensorTupleReducerOp< internal::ArgMinTupleReducer >, const array, const Derived>(derived(), internal::ArgMinTupleReducer >(), -1, in_dims); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorTupleReducerOp< internal::ArgMaxTupleReducer >, const array, const Derived> argmax(const Index return_dim) const { array in_dims; in_dims[0] = return_dim; return TensorTupleReducerOp< internal::ArgMaxTupleReducer >, const array, const Derived>(derived(), internal::ArgMaxTupleReducer >(), return_dim, in_dims); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorTupleReducerOp< internal::ArgMinTupleReducer >, const array, const Derived> argmin(const Index return_dim) const { array in_dims; in_dims[0] = return_dim; return TensorTupleReducerOp< internal::ArgMinTupleReducer >, const array, const Derived>(derived(), internal::ArgMinTupleReducer >(), return_dim, in_dims); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorReductionOp reduce(const Dims& dims, const Reducer& reducer) const { return TensorReductionOp(derived(), dims, reducer); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorTraceOp trace(const Dims& dims) const { return TensorTraceOp(derived(), dims); } const TensorTraceOp, const Derived> trace() const { DimensionList in_dims; return TensorTraceOp, const Derived>(derived(), in_dims); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorBroadcastingOp broadcast(const Broadcast& bcast) const { return TensorBroadcastingOp(derived(), bcast); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorConcatenationOp concatenate(const OtherDerived& other, Axis axis) const { return TensorConcatenationOp(derived(), other.derived(), axis); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorPatchOp extract_patches(const PatchDims& patch_dims) const { return TensorPatchOp(derived(), patch_dims); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorImagePatchOp extract_image_patches(const Index patch_rows = 1, const Index patch_cols = 1, const Index row_stride = 1, const Index col_stride = 1, const Index in_row_stride = 1, const Index in_col_stride = 1, const PaddingType padding_type = PADDING_SAME, const Scalar padding_value = Scalar(0)) const { return TensorImagePatchOp(derived(), patch_rows, patch_cols, row_stride, col_stride, in_row_stride, in_col_stride, 1, 1, padding_type, padding_value); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorImagePatchOp extract_image_patches(const Index patch_rows, const Index patch_cols, const Index row_stride, const Index col_stride, const Index in_row_stride, const Index in_col_stride, const Index row_inflate_stride, const Index col_inflate_stride, const Index padding_top, const Index padding_bottom, const Index padding_left,const Index padding_right, const Scalar padding_value) const { return TensorImagePatchOp(derived(), patch_rows, patch_cols, row_stride, col_stride, in_row_stride, in_col_stride, row_inflate_stride, col_inflate_stride, padding_top, padding_bottom, padding_left, padding_right, padding_value); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorVolumePatchOp extract_volume_patches(const Index patch_planes, const Index patch_rows, const Index patch_cols, const Index plane_stride = 1, const Index row_stride = 1, const Index col_stride = 1, const PaddingType padding_type = PADDING_SAME, const Scalar padding_value = Scalar(0)) const { return TensorVolumePatchOp(derived(), patch_planes, patch_rows, patch_cols, plane_stride, row_stride, col_stride, 1, 1, 1, 1, 1, 1, padding_type, padding_value); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorVolumePatchOp extract_volume_patches(const Index patch_planes, const Index patch_rows, const Index patch_cols, const Index plane_stride, const Index row_stride, const Index col_stride, const Index plane_inflate_stride, const Index row_inflate_stride, const Index col_inflate_stride, const Index padding_top_z, const Index padding_bottom_z, const Index padding_top, const Index padding_bottom, const Index padding_left, const Index padding_right, const Scalar padding_value = Scalar(0)) const { return TensorVolumePatchOp(derived(), patch_planes, patch_rows, patch_cols, plane_stride, row_stride, col_stride, 1, 1, 1, plane_inflate_stride, row_inflate_stride, col_inflate_stride, padding_top_z, padding_bottom_z, padding_top, padding_bottom, padding_left, padding_right, padding_value); } // Morphing operators. EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorLayoutSwapOp swap_layout() const { return TensorLayoutSwapOp(derived()); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorReshapingOp reshape(const NewDimensions& newDimensions) const { return TensorReshapingOp(derived(), newDimensions); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorSlicingOp slice(const StartIndices& startIndices, const Sizes& sizes) const { return TensorSlicingOp(derived(), startIndices, sizes); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorStridingSlicingOp stridedSlice(const StartIndices& startIndices, const StopIndices& stopIndices, const Strides& strides) const { return TensorStridingSlicingOp(derived(), startIndices, stopIndices, strides); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorChippingOp chip(const Index offset) const { return TensorChippingOp(derived(), offset, DimId); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorChippingOp chip(const Index offset, const Index dim) const { return TensorChippingOp(derived(), offset, dim); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorReverseOp reverse(const ReverseDimensions& rev) const { return TensorReverseOp(derived(), rev); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorPaddingOp pad(const PaddingDimensions& padding) const { return TensorPaddingOp(derived(), padding, internal::scalar_cast_op()(0)); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorPaddingOp pad(const PaddingDimensions& padding, const Scalar padding_value) const { return TensorPaddingOp(derived(), padding, padding_value); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorShufflingOp shuffle(const Shuffle& shfl) const { return TensorShufflingOp(derived(), shfl); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorStridingOp stride(const Strides& strides) const { return TensorStridingOp(derived(), strides); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorInflationOp inflate(const Strides& strides) const { return TensorInflationOp(derived(), strides); } // Returns a tensor containing index/value tuples EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorIndexTupleOp index_tuples() const { return TensorIndexTupleOp(derived()); } // Support for custom unary and binary operations template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCustomUnaryOp customOp(const CustomUnaryFunc& op) const { return TensorCustomUnaryOp(derived(), op); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorCustomBinaryOp customOp(const OtherDerived& other, const CustomBinaryFunc& op) const { return TensorCustomBinaryOp(derived(), other, op); } // Force the evaluation of the expression. EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorForcedEvalOp eval() const { return TensorForcedEvalOp(derived()); } protected: template friend class Tensor; template friend class TensorFixedSize; // the Eigen:: prefix is required to workaround a compilation issue with nvcc 9.0 template friend class Eigen::TensorBase; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Derived& derived() const { return *static_cast(this); } }; template::value> class TensorBase : public TensorBase { public: typedef TensorBase Base; typedef internal::traits DerivedTraits; typedef typename DerivedTraits::Scalar Scalar; typedef typename DerivedTraits::Index Index; typedef Scalar CoeffReturnType; static const int NumDimensions = DerivedTraits::NumDimensions; template friend class Tensor; template friend class TensorFixedSize; // the Eigen:: prefix is required to workaround a compilation issue with nvcc 9.0 template friend class Eigen::TensorBase; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& setZero() { return setConstant(Scalar(0)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& setConstant(const Scalar& val) { return derived() = this->constant(val); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& setRandom() { return derived() = this->random(); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& setRandom() { return derived() = this->template random(); } #if EIGEN_HAS_VARIADIC_TEMPLATES EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& setValues( const typename internal::Initializer::InitList& vals) { TensorEvaluator eval(derived(), DefaultDevice()); internal::initialize_tensor(eval, vals); return derived(); } #endif // EIGEN_HAS_VARIADIC_TEMPLATES template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& operator+=(const OtherDerived& other) { return derived() = derived() + other.derived(); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& operator-=(const OtherDerived& other) { return derived() = derived() - other.derived(); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& operator*=(const OtherDerived& other) { return derived() = derived() * other.derived(); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& operator/=(const OtherDerived& other) { return derived() = derived() / other.derived(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorLayoutSwapOp swap_layout() const { return TensorLayoutSwapOp(derived()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorLayoutSwapOp swap_layout() { return TensorLayoutSwapOp(derived()); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorConcatenationOp concatenate(const OtherDerived& other, const Axis& axis) const { return TensorConcatenationOp(derived(), other, axis); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorConcatenationOp concatenate(const OtherDerived& other, const Axis& axis) { return TensorConcatenationOp(derived(), other, axis); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorReshapingOp reshape(const NewDimensions& newDimensions) const { return TensorReshapingOp(derived(), newDimensions); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorReshapingOp reshape(const NewDimensions& newDimensions) { return TensorReshapingOp(derived(), newDimensions); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorSlicingOp slice(const StartIndices& startIndices, const Sizes& sizes) const { return TensorSlicingOp(derived(), startIndices, sizes); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorSlicingOp slice(const StartIndices& startIndices, const Sizes& sizes) { return TensorSlicingOp(derived(), startIndices, sizes); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorStridingSlicingOp stridedSlice(const StartIndices& startIndices, const StopIndices& stopIndices, const Strides& strides) const { return TensorStridingSlicingOp(derived(), startIndices, stopIndices, strides); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorStridingSlicingOp stridedSlice(const StartIndices& startIndices, const StopIndices& stopIndices, const Strides& strides) { return TensorStridingSlicingOp(derived(), startIndices, stopIndices, strides); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorChippingOp chip(const Index offset) const { return TensorChippingOp(derived(), offset, DimId); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorChippingOp chip(const Index offset) { return TensorChippingOp(derived(), offset, DimId); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorChippingOp chip(const Index offset, const Index dim) const { return TensorChippingOp(derived(), offset, dim); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorChippingOp chip(const Index offset, const Index dim) { return TensorChippingOp(derived(), offset, dim); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorReverseOp reverse(const ReverseDimensions& rev) const { return TensorReverseOp(derived(), rev); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorReverseOp reverse(const ReverseDimensions& rev) { return TensorReverseOp(derived(), rev); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorShufflingOp shuffle(const Shuffle& shfl) const { return TensorShufflingOp(derived(), shfl); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorShufflingOp shuffle(const Shuffle& shfl) { return TensorShufflingOp(derived(), shfl); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorStridingOp stride(const Strides& strides) const { return TensorStridingOp(derived(), strides); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorStridingOp stride(const Strides& strides) { return TensorStridingOp(derived(), strides); } // Select the device on which to evaluate the expression. template TensorDevice device(const DeviceType& dev) { return TensorDevice(dev, derived()); } // Select the async device on which to evaluate the expression. template TensorAsyncDevice device(const DeviceType& dev, DoneCallback done) { return TensorAsyncDevice(dev, derived(), std::move(done)); } protected: EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(TensorBase) EIGEN_DEFAULT_COPY_CONSTRUCTOR(TensorBase) template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& operator=(const OtherDerived& other) { typedef TensorAssignOp Assign; Assign assign(derived(), other.derived()); internal::TensorExecutor::run(assign, DefaultDevice()); return derived(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& derived() { return *static_cast(this); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Derived& derived() const { return *static_cast(this); } }; #endif // EIGEN_PARSED_BY_DOXYGEN } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_BASE_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorInitializer.h0000644000176200001440000000525214562417221026651 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_INITIALIZER_H #define EIGEN_CXX11_TENSOR_TENSOR_INITIALIZER_H #if EIGEN_HAS_VARIADIC_TEMPLATES #include namespace Eigen { /** \class TensorInitializer * \ingroup CXX11_Tensor_Module * * \brief Helper template to initialize Tensors from std::initializer_lists. */ namespace internal { template struct Initializer { typedef std::initializer_list< typename Initializer::InitList> InitList; static void run(TensorEvaluator& tensor, Eigen::array::Index, traits::NumDimensions>* indices, const InitList& vals) { int i = 0; for (const auto& v : vals) { (*indices)[traits::NumDimensions - N] = i++; Initializer::run(tensor, indices, v); } } }; template struct Initializer { typedef std::initializer_list::Scalar> InitList; static void run(TensorEvaluator& tensor, Eigen::array::Index, traits::NumDimensions>* indices, const InitList& vals) { int i = 0; // There is likely a faster way to do that than iterating. for (const auto& v : vals) { (*indices)[traits::NumDimensions - 1] = i++; tensor.coeffRef(*indices) = v; } } }; template struct Initializer { typedef typename traits::Scalar InitList; static void run(TensorEvaluator& tensor, Eigen::array::Index, traits::NumDimensions>*, const InitList& v) { tensor.coeffRef(0) = v; } }; template void initialize_tensor(TensorEvaluator& tensor, const typename Initializer::NumDimensions>::InitList& vals) { Eigen::array::Index, traits::NumDimensions> indices; Initializer::NumDimensions>::run(tensor, &indices, vals); } } // namespace internal } // namespace Eigen #endif // EIGEN_HAS_VARIADIC_TEMPLATES #endif // EIGEN_CXX11_TENSOR_TENSOR_INITIALIZER_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorTrace.h0000644000176200001440000002365014562417221025426 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2017 Gagan Goel // Copyright (C) 2017 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_TRACE_H #define EIGEN_CXX11_TENSOR_TENSOR_TRACE_H namespace Eigen { /** \class TensorTrace * \ingroup CXX11_Tensor_Module * * \brief Tensor Trace class. * * */ namespace internal { template struct traits > : public traits { typedef typename XprType::Scalar Scalar; typedef traits XprTraits; typedef typename XprTraits::StorageKind StorageKind; typedef typename XprTraits::Index Index; typedef typename XprType::Nested Nested; typedef typename remove_reference::type _Nested; static const int NumDimensions = XprTraits::NumDimensions - array_size::value; static const int Layout = XprTraits::Layout; }; template struct eval, Eigen::Dense> { typedef const TensorTraceOp& type; }; template struct nested, 1, typename eval >::type> { typedef TensorTraceOp type; }; } // end namespace internal template class TensorTraceOp : public TensorBase > { public: typedef typename Eigen::internal::traits::Scalar Scalar; typedef typename Eigen::NumTraits::Real RealScalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename Eigen::internal::nested::type Nested; typedef typename Eigen::internal::traits::StorageKind StorageKind; typedef typename Eigen::internal::traits::Index Index; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorTraceOp(const XprType& expr, const Dims& dims) : m_xpr(expr), m_dims(dims) { } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dims& dims() const { return m_dims; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename internal::remove_all::type& expression() const { return m_xpr; } protected: typename XprType::Nested m_xpr; const Dims m_dims; }; // Eval as rvalue template struct TensorEvaluator, Device> { typedef TensorTraceOp XprType; static const int NumInputDims = internal::array_size::Dimensions>::value; static const int NumReducedDims = internal::array_size::value; static const int NumOutputDims = NumInputDims - NumReducedDims; typedef typename XprType::Index Index; typedef DSizes Dimensions; typedef typename XprType::Scalar Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename PacketType::type PacketReturnType; static const int PacketSize = internal::unpacket_traits::size; typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; enum { IsAligned = false, PacketAccess = TensorEvaluator::PacketAccess, BlockAccess = false, PreferBlockAccess = TensorEvaluator::PreferBlockAccess, Layout = TensorEvaluator::Layout, CoordAccess = false, RawAccess = false }; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockNotImplemented TensorBlock; //===--------------------------------------------------------------------===// EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device) : m_impl(op.expression(), device), m_traceDim(1), m_device(device) { EIGEN_STATIC_ASSERT((NumOutputDims >= 0), YOU_MADE_A_PROGRAMMING_MISTAKE); EIGEN_STATIC_ASSERT((NumReducedDims >= 2) || ((NumReducedDims == 0) && (NumInputDims == 0)), YOU_MADE_A_PROGRAMMING_MISTAKE); for (int i = 0; i < NumInputDims; ++i) { m_reduced[i] = false; } const Dims& op_dims = op.dims(); for (int i = 0; i < NumReducedDims; ++i) { eigen_assert(op_dims[i] >= 0); eigen_assert(op_dims[i] < NumInputDims); m_reduced[op_dims[i]] = true; } // All the dimensions should be distinct to compute the trace int num_distinct_reduce_dims = 0; for (int i = 0; i < NumInputDims; ++i) { if (m_reduced[i]) { ++num_distinct_reduce_dims; } } eigen_assert(num_distinct_reduce_dims == NumReducedDims); // Compute the dimensions of the result. const typename TensorEvaluator::Dimensions& input_dims = m_impl.dimensions(); int output_index = 0; int reduced_index = 0; for (int i = 0; i < NumInputDims; ++i) { if (m_reduced[i]) { m_reducedDims[reduced_index] = input_dims[i]; if (reduced_index > 0) { // All the trace dimensions must have the same size eigen_assert(m_reducedDims[0] == m_reducedDims[reduced_index]); } ++reduced_index; } else { m_dimensions[output_index] = input_dims[i]; ++output_index; } } if (NumReducedDims != 0) { m_traceDim = m_reducedDims[0]; } // Compute the output strides if (NumOutputDims > 0) { if (static_cast(Layout) == static_cast(ColMajor)) { m_outputStrides[0] = 1; for (int i = 1; i < NumOutputDims; ++i) { m_outputStrides[i] = m_outputStrides[i - 1] * m_dimensions[i - 1]; } } else { m_outputStrides.back() = 1; for (int i = NumOutputDims - 2; i >= 0; --i) { m_outputStrides[i] = m_outputStrides[i + 1] * m_dimensions[i + 1]; } } } // Compute the input strides if (NumInputDims > 0) { array input_strides; if (static_cast(Layout) == static_cast(ColMajor)) { input_strides[0] = 1; for (int i = 1; i < NumInputDims; ++i) { input_strides[i] = input_strides[i - 1] * input_dims[i - 1]; } } else { input_strides.back() = 1; for (int i = NumInputDims - 2; i >= 0; --i) { input_strides[i] = input_strides[i + 1] * input_dims[i + 1]; } } output_index = 0; reduced_index = 0; for (int i = 0; i < NumInputDims; ++i) { if(m_reduced[i]) { m_reducedStrides[reduced_index] = input_strides[i]; ++reduced_index; } else { m_preservedStrides[output_index] = input_strides[i]; ++output_index; } } } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; } EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(EvaluatorPointerType /*data*/) { m_impl.evalSubExprsIfNeeded(NULL); return true; } EIGEN_STRONG_INLINE void cleanup() { m_impl.cleanup(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { // Initialize the result CoeffReturnType result = internal::cast(0); Index index_stride = 0; for (int i = 0; i < NumReducedDims; ++i) { index_stride += m_reducedStrides[i]; } // If trace is requested along all dimensions, starting index would be 0 Index cur_index = 0; if (NumOutputDims != 0) cur_index = firstInput(index); for (Index i = 0; i < m_traceDim; ++i) { result += m_impl.coeff(cur_index); cur_index += index_stride; } return result; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const { EIGEN_STATIC_ASSERT((PacketSize > 1), YOU_MADE_A_PROGRAMMING_MISTAKE); eigen_assert(index + PacketSize - 1 < dimensions().TotalSize()); EIGEN_ALIGN_MAX typename internal::remove_const::type values[PacketSize]; for (int i = 0; i < PacketSize; ++i) { values[i] = coeff(index + i); } PacketReturnType result = internal::ploadt(values); return result; } #ifdef EIGEN_USE_SYCL // binding placeholder accessors to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler &cgh) const { m_impl.bind(cgh); } #endif protected: // Given the output index, finds the first index in the input tensor used to compute the trace EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index firstInput(Index index) const { Index startInput = 0; if (static_cast(Layout) == static_cast(ColMajor)) { for (int i = NumOutputDims - 1; i > 0; --i) { const Index idx = index / m_outputStrides[i]; startInput += idx * m_preservedStrides[i]; index -= idx * m_outputStrides[i]; } startInput += index * m_preservedStrides[0]; } else { for (int i = 0; i < NumOutputDims - 1; ++i) { const Index idx = index / m_outputStrides[i]; startInput += idx * m_preservedStrides[i]; index -= idx * m_outputStrides[i]; } startInput += index * m_preservedStrides[NumOutputDims - 1]; } return startInput; } Dimensions m_dimensions; TensorEvaluator m_impl; // Initialize the size of the trace dimension Index m_traceDim; const Device EIGEN_DEVICE_REF m_device; array m_reduced; array m_reducedDims; array m_outputStrides; array m_reducedStrides; array m_preservedStrides; }; } // End namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_TRACE_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorScanSycl.h0000644000176200001440000006127714562417221026116 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Mehdi Goli Codeplay Software Ltd. // Ralph Potter Codeplay Software Ltd. // Luke Iwanski Codeplay Software Ltd. // Contact: // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. /***************************************************************** * TensorScanSycl.h * * \brief: * Tensor Scan Sycl implement the extend version of * "Efficient parallel scan algorithms for GPUs." .for Tensor operations. * The algorithm requires up to 3 stage (consequently 3 kernels) depending on * the size of the tensor. In the first kernel (ScanKernelFunctor), each * threads within the work-group individually reduces the allocated elements per * thread in order to reduces the total number of blocks. In the next step all * thread within the work-group will reduce the associated blocks into the * temporary buffers. In the next kernel(ScanBlockKernelFunctor), the temporary * buffer is given as an input and all the threads within a work-group scan and * reduces the boundaries between the blocks (generated from the previous * kernel). and write the data on the temporary buffer. If the second kernel is * required, the third and final kerenl (ScanAdjustmentKernelFunctor) will * adjust the final result into the output buffer. * The original algorithm for the parallel prefix sum can be found here: * * Sengupta, Shubhabrata, Mark Harris, and Michael Garland. "Efficient parallel * scan algorithms for GPUs." NVIDIA, Santa Clara, CA, Tech. Rep. NVR-2008-003 *1, no. 1 (2008): 1-17. *****************************************************************/ #ifndef UNSUPPORTED_EIGEN_CXX11_SRC_TENSOR_TENSOR_SYCL_SYCL_HPP #define UNSUPPORTED_EIGEN_CXX11_SRC_TENSOR_TENSOR_SYCL_SYCL_HPP namespace Eigen { namespace TensorSycl { namespace internal { #ifndef EIGEN_SYCL_MAX_GLOBAL_RANGE #define EIGEN_SYCL_MAX_GLOBAL_RANGE (EIGEN_SYCL_LOCAL_THREAD_DIM0 * EIGEN_SYCL_LOCAL_THREAD_DIM1 * 4) #endif template struct ScanParameters { // must be power of 2 static EIGEN_CONSTEXPR index_t ScanPerThread = 8; const index_t total_size; const index_t non_scan_size; const index_t scan_size; const index_t non_scan_stride; const index_t scan_stride; const index_t panel_threads; const index_t group_threads; const index_t block_threads; const index_t elements_per_group; const index_t elements_per_block; const index_t loop_range; ScanParameters(index_t total_size_, index_t non_scan_size_, index_t scan_size_, index_t non_scan_stride_, index_t scan_stride_, index_t panel_threads_, index_t group_threads_, index_t block_threads_, index_t elements_per_group_, index_t elements_per_block_, index_t loop_range_) : total_size(total_size_), non_scan_size(non_scan_size_), scan_size(scan_size_), non_scan_stride(non_scan_stride_), scan_stride(scan_stride_), panel_threads(panel_threads_), group_threads(group_threads_), block_threads(block_threads_), elements_per_group(elements_per_group_), elements_per_block(elements_per_block_), loop_range(loop_range_) {} }; enum class scan_step { first, second }; template struct ScanKernelFunctor { typedef cl::sycl::accessor LocalAccessor; static EIGEN_CONSTEXPR int PacketSize = ScanParameters::ScanPerThread / 2; LocalAccessor scratch; Evaluator dev_eval; OutAccessor out_accessor; OutAccessor temp_accessor; const ScanParameters scanParameters; Op accumulator; const bool inclusive; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ScanKernelFunctor(LocalAccessor scratch_, const Evaluator dev_eval_, OutAccessor out_accessor_, OutAccessor temp_accessor_, const ScanParameters scanParameters_, Op accumulator_, const bool inclusive_) : scratch(scratch_), dev_eval(dev_eval_), out_accessor(out_accessor_), temp_accessor(temp_accessor_), scanParameters(scanParameters_), accumulator(accumulator_), inclusive(inclusive_) {} template typename ::Eigen::internal::enable_if::type EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE read(const Input &inpt, Index global_id) { return inpt.coeff(global_id); } template typename ::Eigen::internal::enable_if::type EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE read(const Input &inpt, Index global_id) { return inpt[global_id]; } template typename ::Eigen::internal::enable_if::type EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE first_step_inclusive_Operation(InclusiveOp inclusive_op) { inclusive_op(); } template typename ::Eigen::internal::enable_if::type EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE first_step_inclusive_Operation(InclusiveOp) {} EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void operator()(cl::sycl::nd_item<1> itemID) { auto out_ptr = out_accessor.get_pointer(); auto tmp_ptr = temp_accessor.get_pointer(); auto scratch_ptr = scratch.get_pointer().get(); for (Index loop_offset = 0; loop_offset < scanParameters.loop_range; loop_offset++) { Index data_offset = (itemID.get_global_id(0) + (itemID.get_global_range(0) * loop_offset)); Index tmp = data_offset % scanParameters.panel_threads; const Index panel_id = data_offset / scanParameters.panel_threads; const Index group_id = tmp / scanParameters.group_threads; tmp = tmp % scanParameters.group_threads; const Index block_id = tmp / scanParameters.block_threads; const Index local_id = tmp % scanParameters.block_threads; // we put one element per packet in scratch_mem const Index scratch_stride = scanParameters.elements_per_block / PacketSize; const Index scratch_offset = (itemID.get_local_id(0) / scanParameters.block_threads) * scratch_stride; CoeffReturnType private_scan[ScanParameters::ScanPerThread]; CoeffReturnType inclusive_scan; // the actual panel size is scan_size * non_scan_size. // elements_per_panel is roundup to power of 2 for binary tree const Index panel_offset = panel_id * scanParameters.scan_size * scanParameters.non_scan_size; const Index group_offset = group_id * scanParameters.non_scan_stride; // This will be effective when the size is bigger than elements_per_block const Index block_offset = block_id * scanParameters.elements_per_block * scanParameters.scan_stride; const Index thread_offset = (ScanParameters::ScanPerThread * local_id * scanParameters.scan_stride); const Index global_offset = panel_offset + group_offset + block_offset + thread_offset; Index next_elements = 0; EIGEN_UNROLL_LOOP for (int i = 0; i < ScanParameters::ScanPerThread; i++) { Index global_id = global_offset + next_elements; private_scan[i] = ((((block_id * scanParameters.elements_per_block) + (ScanParameters::ScanPerThread * local_id) + i) < scanParameters.scan_size) && (global_id < scanParameters.total_size)) ? read(dev_eval, global_id) : accumulator.initialize(); next_elements += scanParameters.scan_stride; } first_step_inclusive_Operation([&]() EIGEN_DEVICE_FUNC { if (inclusive) { inclusive_scan = private_scan[ScanParameters::ScanPerThread - 1]; } }); // This for loop must be 2 EIGEN_UNROLL_LOOP for (int packetIndex = 0; packetIndex < ScanParameters::ScanPerThread; packetIndex += PacketSize) { Index private_offset = 1; // build sum in place up the tree EIGEN_UNROLL_LOOP for (Index d = PacketSize >> 1; d > 0; d >>= 1) { EIGEN_UNROLL_LOOP for (Index l = 0; l < d; l++) { Index ai = private_offset * (2 * l + 1) - 1 + packetIndex; Index bi = private_offset * (2 * l + 2) - 1 + packetIndex; CoeffReturnType accum = accumulator.initialize(); accumulator.reduce(private_scan[ai], &accum); accumulator.reduce(private_scan[bi], &accum); private_scan[bi] = accumulator.finalize(accum); } private_offset *= 2; } scratch_ptr[2 * local_id + (packetIndex / PacketSize) + scratch_offset] = private_scan[PacketSize - 1 + packetIndex]; private_scan[PacketSize - 1 + packetIndex] = accumulator.initialize(); // traverse down tree & build scan EIGEN_UNROLL_LOOP for (Index d = 1; d < PacketSize; d *= 2) { private_offset >>= 1; EIGEN_UNROLL_LOOP for (Index l = 0; l < d; l++) { Index ai = private_offset * (2 * l + 1) - 1 + packetIndex; Index bi = private_offset * (2 * l + 2) - 1 + packetIndex; CoeffReturnType accum = accumulator.initialize(); accumulator.reduce(private_scan[ai], &accum); accumulator.reduce(private_scan[bi], &accum); private_scan[ai] = private_scan[bi]; private_scan[bi] = accumulator.finalize(accum); } } } Index offset = 1; // build sum in place up the tree for (Index d = scratch_stride >> 1; d > 0; d >>= 1) { // Synchronise itemID.barrier(cl::sycl::access::fence_space::local_space); if (local_id < d) { Index ai = offset * (2 * local_id + 1) - 1 + scratch_offset; Index bi = offset * (2 * local_id + 2) - 1 + scratch_offset; CoeffReturnType accum = accumulator.initialize(); accumulator.reduce(scratch_ptr[ai], &accum); accumulator.reduce(scratch_ptr[bi], &accum); scratch_ptr[bi] = accumulator.finalize(accum); } offset *= 2; } // Synchronise itemID.barrier(cl::sycl::access::fence_space::local_space); // next step optimisation if (local_id == 0) { if (((scanParameters.elements_per_group / scanParameters.elements_per_block) > 1)) { const Index temp_id = panel_id * (scanParameters.elements_per_group / scanParameters.elements_per_block) * scanParameters.non_scan_size + group_id * (scanParameters.elements_per_group / scanParameters.elements_per_block) + block_id; tmp_ptr[temp_id] = scratch_ptr[scratch_stride - 1 + scratch_offset]; } // clear the last element scratch_ptr[scratch_stride - 1 + scratch_offset] = accumulator.initialize(); } // traverse down tree & build scan for (Index d = 1; d < scratch_stride; d *= 2) { offset >>= 1; // Synchronise itemID.barrier(cl::sycl::access::fence_space::local_space); if (local_id < d) { Index ai = offset * (2 * local_id + 1) - 1 + scratch_offset; Index bi = offset * (2 * local_id + 2) - 1 + scratch_offset; CoeffReturnType accum = accumulator.initialize(); accumulator.reduce(scratch_ptr[ai], &accum); accumulator.reduce(scratch_ptr[bi], &accum); scratch_ptr[ai] = scratch_ptr[bi]; scratch_ptr[bi] = accumulator.finalize(accum); } } // Synchronise itemID.barrier(cl::sycl::access::fence_space::local_space); // This for loop must be 2 EIGEN_UNROLL_LOOP for (int packetIndex = 0; packetIndex < ScanParameters::ScanPerThread; packetIndex += PacketSize) { EIGEN_UNROLL_LOOP for (Index i = 0; i < PacketSize; i++) { CoeffReturnType accum = private_scan[packetIndex + i]; accumulator.reduce(scratch_ptr[2 * local_id + (packetIndex / PacketSize) + scratch_offset], &accum); private_scan[packetIndex + i] = accumulator.finalize(accum); } } first_step_inclusive_Operation([&]() EIGEN_DEVICE_FUNC { if (inclusive) { accumulator.reduce(private_scan[ScanParameters::ScanPerThread - 1], &inclusive_scan); private_scan[0] = accumulator.finalize(inclusive_scan); } }); next_elements = 0; // right the first set of private param EIGEN_UNROLL_LOOP for (Index i = 0; i < ScanParameters::ScanPerThread; i++) { Index global_id = global_offset + next_elements; if ((((block_id * scanParameters.elements_per_block) + (ScanParameters::ScanPerThread * local_id) + i) < scanParameters.scan_size) && (global_id < scanParameters.total_size)) { Index private_id = (i * !inclusive) + (((i + 1) % ScanParameters::ScanPerThread) * (inclusive)); out_ptr[global_id] = private_scan[private_id]; } next_elements += scanParameters.scan_stride; } } // end for loop } }; template struct ScanAdjustmentKernelFunctor { typedef cl::sycl::accessor LocalAccessor; static EIGEN_CONSTEXPR int PacketSize = ScanParameters::ScanPerThread / 2; InAccessor in_accessor; OutAccessor out_accessor; const ScanParameters scanParameters; Op accumulator; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ScanAdjustmentKernelFunctor(LocalAccessor, InAccessor in_accessor_, OutAccessor out_accessor_, const ScanParameters scanParameters_, Op accumulator_) : in_accessor(in_accessor_), out_accessor(out_accessor_), scanParameters(scanParameters_), accumulator(accumulator_) {} EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void operator()(cl::sycl::nd_item<1> itemID) { auto in_ptr = in_accessor.get_pointer(); auto out_ptr = out_accessor.get_pointer(); for (Index loop_offset = 0; loop_offset < scanParameters.loop_range; loop_offset++) { Index data_offset = (itemID.get_global_id(0) + (itemID.get_global_range(0) * loop_offset)); Index tmp = data_offset % scanParameters.panel_threads; const Index panel_id = data_offset / scanParameters.panel_threads; const Index group_id = tmp / scanParameters.group_threads; tmp = tmp % scanParameters.group_threads; const Index block_id = tmp / scanParameters.block_threads; const Index local_id = tmp % scanParameters.block_threads; // the actual panel size is scan_size * non_scan_size. // elements_per_panel is roundup to power of 2 for binary tree const Index panel_offset = panel_id * scanParameters.scan_size * scanParameters.non_scan_size; const Index group_offset = group_id * scanParameters.non_scan_stride; // This will be effective when the size is bigger than elements_per_block const Index block_offset = block_id * scanParameters.elements_per_block * scanParameters.scan_stride; const Index thread_offset = ScanParameters::ScanPerThread * local_id * scanParameters.scan_stride; const Index global_offset = panel_offset + group_offset + block_offset + thread_offset; const Index block_size = scanParameters.elements_per_group / scanParameters.elements_per_block; const Index in_id = (panel_id * block_size * scanParameters.non_scan_size) + (group_id * block_size) + block_id; CoeffReturnType adjust_val = in_ptr[in_id]; Index next_elements = 0; EIGEN_UNROLL_LOOP for (Index i = 0; i < ScanParameters::ScanPerThread; i++) { Index global_id = global_offset + next_elements; if ((((block_id * scanParameters.elements_per_block) + (ScanParameters::ScanPerThread * local_id) + i) < scanParameters.scan_size) && (global_id < scanParameters.total_size)) { CoeffReturnType accum = adjust_val; accumulator.reduce(out_ptr[global_id], &accum); out_ptr[global_id] = accumulator.finalize(accum); } next_elements += scanParameters.scan_stride; } } } }; template struct ScanInfo { const Index &total_size; const Index &scan_size; const Index &panel_size; const Index &non_scan_size; const Index &scan_stride; const Index &non_scan_stride; Index max_elements_per_block; Index block_size; Index panel_threads; Index group_threads; Index block_threads; Index elements_per_group; Index elements_per_block; Index loop_range; Index global_range; Index local_range; const Eigen::SyclDevice &dev; EIGEN_STRONG_INLINE ScanInfo(const Index &total_size_, const Index &scan_size_, const Index &panel_size_, const Index &non_scan_size_, const Index &scan_stride_, const Index &non_scan_stride_, const Eigen::SyclDevice &dev_) : total_size(total_size_), scan_size(scan_size_), panel_size(panel_size_), non_scan_size(non_scan_size_), scan_stride(scan_stride_), non_scan_stride(non_scan_stride_), dev(dev_) { // must be power of 2 local_range = std::min(Index(dev.getNearestPowerOfTwoWorkGroupSize()), Index(EIGEN_SYCL_LOCAL_THREAD_DIM0 * EIGEN_SYCL_LOCAL_THREAD_DIM1)); max_elements_per_block = local_range * ScanParameters::ScanPerThread; elements_per_group = dev.getPowerOfTwo(Index(roundUp(Index(scan_size), ScanParameters::ScanPerThread)), true); const Index elements_per_panel = elements_per_group * non_scan_size; elements_per_block = std::min(Index(elements_per_group), Index(max_elements_per_block)); panel_threads = elements_per_panel / ScanParameters::ScanPerThread; group_threads = elements_per_group / ScanParameters::ScanPerThread; block_threads = elements_per_block / ScanParameters::ScanPerThread; block_size = elements_per_group / elements_per_block; #ifdef EIGEN_SYCL_MAX_GLOBAL_RANGE const Index max_threads = std::min(Index(panel_threads * panel_size), Index(EIGEN_SYCL_MAX_GLOBAL_RANGE)); #else const Index max_threads = panel_threads * panel_size; #endif global_range = roundUp(max_threads, local_range); loop_range = Index( std::ceil(double(elements_per_panel * panel_size) / (global_range * ScanParameters::ScanPerThread))); } inline ScanParameters get_scan_parameter() { return ScanParameters(total_size, non_scan_size, scan_size, non_scan_stride, scan_stride, panel_threads, group_threads, block_threads, elements_per_group, elements_per_block, loop_range); } inline cl::sycl::nd_range<1> get_thread_range() { return cl::sycl::nd_range<1>(cl::sycl::range<1>(global_range), cl::sycl::range<1>(local_range)); } }; template struct SYCLAdjustBlockOffset { EIGEN_STRONG_INLINE static void adjust_scan_block_offset(EvaluatorPointerType in_ptr, EvaluatorPointerType out_ptr, Reducer &accumulator, const Index total_size, const Index scan_size, const Index panel_size, const Index non_scan_size, const Index scan_stride, const Index non_scan_stride, const Eigen::SyclDevice &dev) { auto scan_info = ScanInfo(total_size, scan_size, panel_size, non_scan_size, scan_stride, non_scan_stride, dev); typedef ScanAdjustmentKernelFunctor AdjustFuctor; dev.template unary_kernel_launcher(in_ptr, out_ptr, scan_info.get_thread_range(), scan_info.max_elements_per_block, scan_info.get_scan_parameter(), accumulator); } }; template struct ScanLauncher_impl { template EIGEN_STRONG_INLINE static void scan_block(Input in_ptr, EvaluatorPointerType out_ptr, Reducer &accumulator, const Index total_size, const Index scan_size, const Index panel_size, const Index non_scan_size, const Index scan_stride, const Index non_scan_stride, const bool inclusive, const Eigen::SyclDevice &dev) { auto scan_info = ScanInfo(total_size, scan_size, panel_size, non_scan_size, scan_stride, non_scan_stride, dev); const Index temp_pointer_size = scan_info.block_size * non_scan_size * panel_size; const Index scratch_size = scan_info.max_elements_per_block / (ScanParameters::ScanPerThread / 2); CoeffReturnType *temp_pointer = static_cast(dev.allocate_temp(temp_pointer_size * sizeof(CoeffReturnType))); EvaluatorPointerType tmp_global_accessor = dev.get(temp_pointer); typedef ScanKernelFunctor ScanFunctor; dev.template binary_kernel_launcher( in_ptr, out_ptr, tmp_global_accessor, scan_info.get_thread_range(), scratch_size, scan_info.get_scan_parameter(), accumulator, inclusive); if (scan_info.block_size > 1) { ScanLauncher_impl::scan_block( tmp_global_accessor, tmp_global_accessor, accumulator, temp_pointer_size, scan_info.block_size, panel_size, non_scan_size, Index(1), scan_info.block_size, false, dev); SYCLAdjustBlockOffset::adjust_scan_block_offset( tmp_global_accessor, out_ptr, accumulator, total_size, scan_size, panel_size, non_scan_size, scan_stride, non_scan_stride, dev); } dev.deallocate_temp(temp_pointer); } }; } // namespace internal } // namespace TensorSycl namespace internal { template struct ScanLauncher { typedef typename Self::Index Index; typedef typename Self::CoeffReturnType CoeffReturnType; typedef typename Self::Storage Storage; typedef typename Self::EvaluatorPointerType EvaluatorPointerType; void operator()(Self &self, EvaluatorPointerType data) { const Index total_size = internal::array_prod(self.dimensions()); const Index scan_size = self.size(); const Index scan_stride = self.stride(); // this is the scan op (can be sum or ...) auto accumulator = self.accumulator(); auto inclusive = !self.exclusive(); auto consume_dim = self.consume_dim(); auto dev = self.device(); auto dims = self.inner().dimensions(); Index non_scan_size = 1; Index panel_size = 1; if (static_cast(Self::Layout) == static_cast(ColMajor)) { for (int i = 0; i < consume_dim; i++) { non_scan_size *= dims[i]; } for (int i = consume_dim + 1; i < Self::NumDims; i++) { panel_size *= dims[i]; } } else { for (int i = Self::NumDims - 1; i > consume_dim; i--) { non_scan_size *= dims[i]; } for (int i = consume_dim - 1; i >= 0; i--) { panel_size *= dims[i]; } } const Index non_scan_stride = (scan_stride > 1) ? 1 : scan_size; auto eval_impl = self.inner(); TensorSycl::internal::ScanLauncher_impl::scan_block( eval_impl, data, accumulator, total_size, scan_size, panel_size, non_scan_size, scan_stride, non_scan_stride, inclusive, dev); } }; } // namespace internal } // namespace Eigen #endif // UNSUPPORTED_EIGEN_CXX11_SRC_TENSOR_TENSOR_SYCL_SYCL_HPP RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorContractionBlocking.h0000644000176200001440000000516314562417221030323 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_CONTRACTION_BLOCKING_H #define EIGEN_CXX11_TENSOR_TENSOR_CONTRACTION_BLOCKING_H namespace Eigen { namespace internal { enum { ShardByRow = 0, ShardByCol = 1 }; // Default Blocking Strategy template class TensorContractionBlocking { public: /* adding EIGEN_DEVICE_FUNC unconditionally to 'TensorContractionBlocking' constructor in `TensorContractionBlocking.h` requires adding EIGEN_DEVICE_FUNC to `computeProductBlockingSizes` in `GeneralBlockPanelKernel.h` which in turn, requires adding EIGEN_DEVICE_FUNC to `evaluateProductBlockingSizesHeuristic` in `GeneralBlockPanelKernel.h` which in turn, requires adding EIGEN_DEVICE_FUNC to `manage_caching_sizes` in `GeneralBlockPanelKernel.h` (else HIPCC will error out) However adding EIGEN_DEVICE_FUNC to `manage_caching_sizes` in `GeneralBlockPanelKernel.h` results in NVCC erroring out with the following error ../Eigen/src/Core/products/GeneralBlockPanelKernel.h(57): error #2901: dynamic initialization is not supported for function-scope static variables within a __device__/__global__ function */ #if !defined(EIGEN_HIPCC) EIGEN_DEVICE_FUNC #endif TensorContractionBlocking(StorageIndex k, StorageIndex m, StorageIndex n, StorageIndex num_threads = 1) : kc_(k), mc_(m), nc_(n) { if (ShardingType == ShardByCol) { computeProductBlockingSizes(kc_, mc_, nc_, num_threads); } else { computeProductBlockingSizes(kc_, nc_, mc_, num_threads); } const int rhs_packet_size = internal::packet_traits::size; kc_ = (rhs_packet_size <= 8 || kc_ <= rhs_packet_size) ? kc_ : (kc_ / rhs_packet_size) * rhs_packet_size; } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE StorageIndex kc() const { return kc_; } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE StorageIndex mc() const { return mc_; } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE StorageIndex nc() const { return nc_; } private: StorageIndex kc_; StorageIndex mc_; StorageIndex nc_; }; } // end namespace internal } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_CONTRACTION_BLOCKING_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorConversion.h0000644000176200001440000004456314562417221026523 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2015 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_CONVERSION_H #define EIGEN_CXX11_TENSOR_TENSOR_CONVERSION_H namespace Eigen { /** \class TensorConversionOp * \ingroup CXX11_Tensor_Module * * \brief Tensor conversion class. This class makes it possible to vectorize * type casting operations when the number of scalars per packet in the source * and the destination type differ */ namespace internal { template struct traits > { // Type promotion to handle the case where the types of the lhs and the rhs are different. typedef TargetType Scalar; typedef typename traits::StorageKind StorageKind; typedef typename traits::Index Index; typedef typename XprType::Nested Nested; typedef typename remove_reference::type _Nested; static const int NumDimensions = traits::NumDimensions; static const int Layout = traits::Layout; enum { Flags = 0 }; typedef typename TypeConversion::PointerType>::type PointerType; }; template struct eval, Eigen::Dense> { typedef const TensorConversionOp& type; }; template struct nested, 1, typename eval >::type> { typedef TensorConversionOp type; }; } // end namespace internal template struct PacketConverter; template struct PacketConverter { EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketConverter(const TensorEvaluator& impl) : m_impl(impl) {} template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TgtPacket packet(Index index) const { return internal::pcast(m_impl.template packet(index)); } private: const TensorEvaluator& m_impl; }; template struct PacketConverter { EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketConverter(const TensorEvaluator& impl) : m_impl(impl) {} template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TgtPacket packet(Index index) const { const int SrcPacketSize = internal::unpacket_traits::size; SrcPacket src1 = m_impl.template packet(index); SrcPacket src2 = m_impl.template packet(index + SrcPacketSize); TgtPacket result = internal::pcast(src1, src2); return result; } private: const TensorEvaluator& m_impl; }; template struct PacketConverter { EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketConverter(const TensorEvaluator& impl) : m_impl(impl) {} template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TgtPacket packet(Index index) const { const int SrcPacketSize = internal::unpacket_traits::size; SrcPacket src1 = m_impl.template packet(index); SrcPacket src2 = m_impl.template packet(index + SrcPacketSize); SrcPacket src3 = m_impl.template packet(index + 2 * SrcPacketSize); SrcPacket src4 = m_impl.template packet(index + 3 * SrcPacketSize); TgtPacket result = internal::pcast(src1, src2, src3, src4); return result; } private: const TensorEvaluator& m_impl; }; template struct PacketConverter { EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketConverter(const TensorEvaluator& impl) : m_impl(impl) {} template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TgtPacket packet(Index index) const { const int SrcPacketSize = internal::unpacket_traits::size; SrcPacket src1 = m_impl.template packet(index); SrcPacket src2 = m_impl.template packet(index + 1 * SrcPacketSize); SrcPacket src3 = m_impl.template packet(index + 2 * SrcPacketSize); SrcPacket src4 = m_impl.template packet(index + 3 * SrcPacketSize); SrcPacket src5 = m_impl.template packet(index + 4 * SrcPacketSize); SrcPacket src6 = m_impl.template packet(index + 5 * SrcPacketSize); SrcPacket src7 = m_impl.template packet(index + 6 * SrcPacketSize); SrcPacket src8 = m_impl.template packet(index + 7 * SrcPacketSize); TgtPacket result = internal::pcast(src1, src2, src3, src4, src5, src6, src7, src8); return result; } private: const TensorEvaluator& m_impl; }; template struct PacketConverter { EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketConverter(const TensorEvaluator& impl) : m_impl(impl), m_maxIndex(impl.dimensions().TotalSize()) {} template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TgtPacket packet(Index index) const { const int SrcPacketSize = internal::unpacket_traits::size; // Only call m_impl.packet() when we have direct access to the underlying data. This // ensures that we don't compute the subexpression twice. We may however load some // coefficients twice, but in practice this doesn't negatively impact performance. if (m_impl.data() && (index + SrcPacketSize < m_maxIndex)) { // Force unaligned memory loads since we can't ensure alignment anymore return internal::pcast(m_impl.template packet(index)); } else { const int TgtPacketSize = internal::unpacket_traits::size; typedef typename internal::unpacket_traits::type SrcType; typedef typename internal::unpacket_traits::type TgtType; internal::scalar_cast_op converter; EIGEN_ALIGN_MAX typename internal::unpacket_traits::type values[TgtPacketSize]; EIGEN_UNROLL_LOOP for (int i = 0; i < TgtPacketSize; ++i) { values[i] = converter(m_impl.coeff(index+i)); } TgtPacket rslt = internal::pload(values); return rslt; } } private: const TensorEvaluator& m_impl; const typename TensorEvaluator::Index m_maxIndex; }; template class TensorConversionOp : public TensorBase, ReadOnlyAccessors> { public: typedef typename internal::traits::Scalar Scalar; typedef typename internal::traits::StorageKind StorageKind; typedef typename internal::traits::Index Index; typedef typename internal::nested::type Nested; typedef Scalar CoeffReturnType; typedef typename NumTraits::Real RealScalar; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorConversionOp(const XprType& xpr) : m_xpr(xpr) {} EIGEN_DEVICE_FUNC const typename internal::remove_all::type& expression() const { return m_xpr; } protected: typename XprType::Nested m_xpr; }; template struct ConversionSubExprEval { static EIGEN_STRONG_INLINE bool run(Eval& impl, EvalPointerType) { impl.evalSubExprsIfNeeded(NULL); return true; } }; template struct ConversionSubExprEval { static EIGEN_STRONG_INLINE bool run(Eval& impl, EvalPointerType data) { return impl.evalSubExprsIfNeeded(data); } }; #ifdef EIGEN_USE_THREADS template struct ConversionSubExprEvalAsync { static EIGEN_STRONG_INLINE void run(Eval& impl, EvalPointerType, EvalSubExprsCallback done) { impl.evalSubExprsIfNeededAsync(nullptr, std::move(done)); } }; template struct ConversionSubExprEvalAsync { static EIGEN_STRONG_INLINE void run(Eval& impl, EvalPointerType data, EvalSubExprsCallback done) { impl.evalSubExprsIfNeededAsync(data, std::move(done)); } }; #endif namespace internal { template struct CoeffConv { template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TargetType run(const TensorEvaluator& impl, Index index) { internal::scalar_cast_op converter; return converter(impl.coeff(index)); } }; template struct CoeffConv { template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TargetType run(const TensorEvaluator& impl, Index index) { return impl.coeff(index); } }; template struct PacketConv { typedef typename internal::unpacket_traits::type SrcType; typedef typename internal::unpacket_traits::type TargetType; static const int PacketSize = internal::unpacket_traits::size; template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TargetPacket run(const TensorEvaluator& impl, Index index) { internal::scalar_cast_op converter; EIGEN_ALIGN_MAX typename internal::remove_const::type values[PacketSize]; EIGEN_UNROLL_LOOP for (int i = 0; i < PacketSize; ++i) { values[i] = converter(impl.coeff(index+i)); } TargetPacket rslt = internal::pload(values); return rslt; } }; template struct PacketConv { typedef typename internal::unpacket_traits::type SrcType; typedef typename internal::unpacket_traits::type TargetType; template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TargetPacket run(const TensorEvaluator& impl, Index index) { const int SrcCoeffRatio = internal::type_casting_traits::SrcCoeffRatio; const int TgtCoeffRatio = internal::type_casting_traits::TgtCoeffRatio; PacketConverter, SrcPacket, TargetPacket, SrcCoeffRatio, TgtCoeffRatio> converter(impl); return converter.template packet(index); } }; template struct PacketConv { typedef typename internal::unpacket_traits::type TargetType; static const int PacketSize = internal::unpacket_traits::size; template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TargetPacket run(const TensorEvaluator& impl, Index index) { EIGEN_ALIGN_MAX typename internal::remove_const::type values[PacketSize]; for (int i = 0; i < PacketSize; ++i) values[i] = impl.coeff(index+i); return internal::pload(values); } }; template struct PacketConv { template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TargetPacket run(const TensorEvaluator& impl, Index index) { return impl.template packet(index); } }; } // namespace internal // Eval as rvalue template struct TensorEvaluator, Device> { typedef TensorConversionOp XprType; typedef typename XprType::Index Index; typedef typename TensorEvaluator::Dimensions Dimensions; typedef TargetType Scalar; typedef TargetType CoeffReturnType; typedef typename internal::remove_all::Scalar>::type SrcType; typedef typename PacketType::type PacketReturnType; typedef typename PacketType::type PacketSourceType; static const int PacketSize = PacketType::size; static const bool IsSameType = internal::is_same::value; typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; enum { IsAligned = false, PacketAccess = #ifndef EIGEN_USE_SYCL true, #else TensorEvaluator::PacketAccess & internal::type_casting_traits::VectorizedCast, #endif BlockAccess = TensorEvaluator::BlockAccess, PreferBlockAccess = TensorEvaluator::PreferBlockAccess, Layout = TensorEvaluator::Layout, RawAccess = false }; static const int NumDims = internal::array_size::value; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockDescriptor TensorBlockDesc; typedef internal::TensorBlockScratchAllocator TensorBlockScratch; typedef typename TensorEvaluator::TensorBlock ArgTensorBlock; struct TensorConversionOpBlockFactory { template struct XprType { typedef TensorConversionOp type; }; template typename XprType::type expr(const ArgXprType& expr) const { return typename XprType::type(expr); } }; typedef internal::TensorUnaryExprBlock TensorBlock; //===--------------------------------------------------------------------===// EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device) : m_impl(op.expression(), device) { } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_impl.dimensions(); } EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(EvaluatorPointerType data) { return ConversionSubExprEval, EvaluatorPointerType>::run(m_impl, data); } #ifdef EIGEN_USE_THREADS template EIGEN_STRONG_INLINE void evalSubExprsIfNeededAsync( EvaluatorPointerType data, EvalSubExprsCallback done) { ConversionSubExprEvalAsync, EvaluatorPointerType, EvalSubExprsCallback>::run(m_impl, data, std::move(done)); } #endif EIGEN_STRONG_INLINE void cleanup() { m_impl.cleanup(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { return internal::CoeffConv::run(m_impl,index); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const { // If we are not going to do the cast, we just need to check that base // TensorEvaluator has packet access. Otherwise we also need to make sure, // that we have an implementation of vectorized cast. const bool Vectorizable = IsSameType ? TensorEvaluator::PacketAccess : int(TensorEvaluator::PacketAccess) & int(internal::type_casting_traits::VectorizedCast); return internal::PacketConv::run(m_impl, index); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost costPerCoeff(bool vectorized) const { const double cast_cost = TensorOpCost::CastCost(); if (vectorized) { const double SrcCoeffRatio = internal::type_casting_traits::SrcCoeffRatio; const double TgtCoeffRatio = internal::type_casting_traits::TgtCoeffRatio; return m_impl.costPerCoeff(vectorized) * (SrcCoeffRatio / PacketSize) + TensorOpCost(0, 0, TgtCoeffRatio * (cast_cost / PacketSize)); } else { return m_impl.costPerCoeff(vectorized) + TensorOpCost(0, 0, cast_cost); } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE internal::TensorBlockResourceRequirements getResourceRequirements() const { return m_impl.getResourceRequirements(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorBlock block(TensorBlockDesc& desc, TensorBlockScratch& scratch, bool /*root_of_expr_ast*/ = false) const { return TensorBlock(m_impl.block(desc, scratch), TensorConversionOpBlockFactory()); } EIGEN_DEVICE_FUNC EvaluatorPointerType data() const { return NULL; } /// required by sycl in order to extract the sycl accessor const TensorEvaluator& impl() const { return m_impl; } #ifdef EIGEN_USE_SYCL // binding placeholder accessors to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler &cgh) const { m_impl.bind(cgh); } #endif protected: TensorEvaluator m_impl; }; } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_CONVERSION_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorBlock.h0000644000176200001440000016666314562417221025436 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_BLOCK_H #define EIGEN_CXX11_TENSOR_TENSOR_BLOCK_H namespace Eigen { namespace internal { // -------------------------------------------------------------------------- // // Forward declarations for templates defined below. template class TensorBlockIO; // -------------------------------------------------------------------------- // // Helper function to compute strides for densely stored buffer of given // dimensions. // TODO(ezhulenev): We compute strides 1000 times in different evaluators, use // this function instead everywhere. template EIGEN_ALWAYS_INLINE DSizes strides( const DSizes& dimensions) { DSizes strides; if (NumDims == 0) return strides; // TODO(ezhulenev): Use templates to unroll this loop (similar to // h_array_reduce in CXX11meta.h)? Benchmark it. if (static_cast(Layout) == static_cast(ColMajor)) { strides[0] = 1; for (int i = 1; i < NumDims; ++i) { strides[i] = strides[i - 1] * dimensions[i - 1]; } } else { strides[NumDims - 1] = 1; for (int i = NumDims - 2; i >= 0; --i) { strides[i] = strides[i + 1] * dimensions[i + 1]; } } return strides; } template EIGEN_ALWAYS_INLINE DSizes strides( const Eigen::array& dimensions) { return strides(DSizes(dimensions)); } template EIGEN_STRONG_INLINE DSizes strides( const Sizes& sizes) { return strides(DSizes(sizes)); } // -------------------------------------------------------------------------- // // Tensor block shape type defines what are the shape preference for the blocks // extracted from the larger tensor. // // Example: blocks of 100 elements from the large 100x100 tensor: // - tensor: 100x100 // - target_block_size: 100 // // TensorBlockShapeType: // - kUniformAllDims: 100 blocks of size 10x10 // - kSkewedInnerDims: 100 blocks of size 100x1 (or 1x100 depending on a column // or row major layout) enum class TensorBlockShapeType { kUniformAllDims, kSkewedInnerDims }; struct TensorBlockResourceRequirements { TensorBlockShapeType shape_type; // target block shape size_t size; // target block size TensorOpCost cost_per_coeff; // cost of computing a single block element #ifdef EIGEN_HIPCC // For HIPCC, we need to explicitly declare as a "device fun", the constructor // which is implicitly invoked in the "merge" / "any" routines. else HIPCC // errors out complaining about the lack of a matching constructor EIGEN_DEVICE_FUNC TensorBlockResourceRequirements(TensorBlockShapeType shape_type_, size_t size_, TensorOpCost cost_) : shape_type(shape_type_), size(size_), cost_per_coeff(cost_) {} #endif template EIGEN_DEVICE_FUNC static TensorBlockResourceRequirements withShapeAndSize( TensorBlockShapeType shape_type, size_t size_in_bytes, TensorOpCost cost) { const size_t size = numext::maxi(size_t(1), size_in_bytes / sizeof(Scalar)); return {shape_type, size, cost}; } template EIGEN_DEVICE_FUNC static TensorBlockResourceRequirements withShapeAndSize( TensorBlockShapeType shape_type, size_t size_in_bytes) { // This default cost per coefficient is valid for most materialized tensor // block evaluation implementations, because they typically just read // coefficients from the underlying tensor storage, and write to the tensor // block buffer (scratch or destination memory, reads and writes have linear // access pattern). We ignore the fixed cost of block evaluation, because in // practice it should negligible. // // Lazy block evaluation adds the cost of calling a functor for each // coefficient. // // All non-trivial block evaluation implementations must provide their own // cost approximation (e.g. shuffling inner dimension has a much higher cost // because it reads memory randomly, although the total number of moved // bytes is the same). return withShapeAndSize(shape_type, size_in_bytes, {/*bytes_loaded=*/sizeof(Scalar), /*bytes_stored=*/sizeof(Scalar), /*compute_cycles=*/0}); } template EIGEN_DEVICE_FUNC static TensorBlockResourceRequirements skewed( size_t size_in_bytes) { return withShapeAndSize(TensorBlockShapeType::kSkewedInnerDims, size_in_bytes); } template EIGEN_DEVICE_FUNC static TensorBlockResourceRequirements uniform( size_t size_in_bytes) { return withShapeAndSize(TensorBlockShapeType::kUniformAllDims, size_in_bytes); } EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE TensorBlockResourceRequirements merge(const TensorBlockResourceRequirements& lhs, const TensorBlockResourceRequirements& rhs) { return {merge(lhs.shape_type, rhs.shape_type), // shape_type merge(lhs.size, rhs.size), // size merge(lhs.cost_per_coeff, rhs.cost_per_coeff)}; // cost_per_coeff } EIGEN_DEVICE_FUNC TensorBlockResourceRequirements& addCostPerCoeff( TensorOpCost cost) { cost_per_coeff += cost; return *this; } // This is a resource requirement that should be returned from expressions // that do not have any block evaluation preference (e.g. default tensor // expression with raw buffer access). EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE TensorBlockResourceRequirements any() { return {TensorBlockShapeType::kUniformAllDims, 1, {0, 0, 0}}; } private: using Requirements = TensorBlockResourceRequirements; EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE size_t merge(size_t lhs_size, size_t rhs_size) { return numext::maxi(lhs_size, rhs_size); } EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE TensorBlockShapeType merge(TensorBlockShapeType lhs, TensorBlockShapeType rhs) { return (lhs == TensorBlockShapeType::kSkewedInnerDims || rhs == TensorBlockShapeType::kSkewedInnerDims) ? TensorBlockShapeType::kSkewedInnerDims : TensorBlockShapeType::kUniformAllDims; } EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE TensorOpCost merge(TensorOpCost lhs_cost, TensorOpCost rhs_cost) { return lhs_cost + rhs_cost; } }; // -------------------------------------------------------------------------- // // TensorBlockDescriptor specifies a block offset within a tensor and the block // sizes along each of the tensor dimensions. template class TensorBlockDescriptor { public: typedef DSizes Dimensions; // If we evaluate a Tensor assignment, and expression on the left, already has // a memory buffer, then we might do performance optimization, and evaluate // the root expression directly into the final output memory. Some time it's // possible to reuse it for materializing subexpressions inside an expression // tree, to to avoid dynamic memory allocation. // // The pointer type of the underlying storage is erased, because passing // Scalar type through all the expression evaluation layers is way too many // templates. In practice destination buffer type should always match the // evaluated expression scalar type. class DestinationBuffer { public: enum DestinationBufferKind : int { // The above explicit specification of "int" as the enum basetype is // needed to get around a HIPCC link error ("the field type is not // amp-compatible") // which is issued for class members with the enum type. // TODO(rocm): // remove the "int" basetype once HIPCC has been fixed to not error out // in the above scenario. // Destination buffer is not defined (`m_data` == nullptr). kEmpty, // Tensor block defined by an owning tensor block descriptor can fit // contiguously into the destination buffer. In this case it's safe to // materialize tensor block in the destination buffer, wrap it in a // TensorMap, and use to build Eigen expression on top of it. kContiguous, // Destination buffer strides do not match strides of the contiguously // stored block, and it's impossible to define a TensorMap over this // buffer. However if we are evaluating a root of an expression tree, we // still can materialize an output into this destination, because we can // guarantee that no one will ever access it through block API. // // In theory it is possible to build valid TensorStriding // expression on top of this destination buffer, however it has // inefficient coeff/packet access, and defeats the purpose of fast block // evaluation API. kStrided }; template Scalar* data() const { eigen_assert(m_data_type_size == sizeof(Scalar)); return static_cast(m_data); } const Dimensions& strides() const { return m_strides; } const DestinationBufferKind& kind() const { return m_kind; } private: friend class TensorBlockDescriptor; DestinationBuffer() : m_data(NULL), m_data_type_size(0), m_kind(kEmpty) {} template DestinationBuffer(Scalar* data, const Dimensions& strides, DestinationBufferKind kind) : m_data(static_cast(data)), m_data_type_size(sizeof(Scalar)), m_strides(strides), m_kind(kind) {} template static DestinationBuffer make(const TensorBlockDescriptor& desc, Scalar* data, const Dimensions& strides) { return DestinationBuffer(data, strides, kind(desc, strides)); } template static DestinationBufferKind kind(const TensorBlockDescriptor& desc, const Dimensions& strides) { const Dimensions& desc_dims = desc.dimensions(); const Dimensions& desc_strides = internal::strides(desc_dims); for (int i = 0; i < NumDims; ++i) { if (desc_dims[i] == 1) continue; if (desc_strides[i] != strides[i]) return kStrided; } return kContiguous; } // Storage pointer is type erased, to reduce template bloat, but we still // keep the size of the underlying element type for error checking. void* m_data; size_t m_data_type_size; // Destination buffer dimensions always match the dimensions of a tensor // block descriptor it belongs to, however strides might be different. Dimensions m_strides; DestinationBufferKind m_kind; }; TensorBlockDescriptor(const IndexType offset, const Dimensions& dimensions, const DestinationBuffer& destination) : m_offset(offset), m_dimensions(dimensions), m_destination(destination) {} TensorBlockDescriptor(const IndexType offset, const Dimensions& dimensions) : m_offset(offset), m_dimensions(dimensions), m_destination(DestinationBuffer()) {} IndexType offset() const { return m_offset; } const Dimensions& dimensions() const { return m_dimensions; } IndexType dimension(int index) const { return m_dimensions[index]; } IndexType size() const { return array_prod(m_dimensions); } const DestinationBuffer& destination() const { return m_destination; } template void AddDestinationBuffer(Scalar* dst_base, const Dimensions& dst_strides) { eigen_assert(dst_base != NULL); m_destination = DestinationBuffer::template make(*this, dst_base, dst_strides); } template void AddDestinationBuffer( Scalar* dst_base, const DSizes& dst_strides) { // DSizes constructor will do index type promotion if it's safe. AddDestinationBuffer(dst_base, Dimensions(dst_strides)); } TensorBlockDescriptor& DropDestinationBuffer() { m_destination.m_data = NULL; m_destination.m_kind = DestinationBuffer::kEmpty; return *this; } bool HasDestinationBuffer() const { return m_destination.kind() != DestinationBuffer::kEmpty; } // Returns a copy of `*this` with updated offset. TensorBlockDescriptor WithOffset(IndexType offset) const { return TensorBlockDescriptor(offset, m_dimensions, m_destination); } private: // Offset and dimensions are immutable after construction. Block descriptor // can only be mutated by adding or dropping destination. const IndexType m_offset; const Dimensions m_dimensions; DestinationBuffer m_destination; }; // -------------------------------------------------------------------------- // // TensorBlockMapper is responsible for iterating over the blocks of a tensor. template class TensorBlockMapper { typedef TensorBlockDescriptor BlockDescriptor; public: typedef DSizes Dimensions; TensorBlockMapper() = default; TensorBlockMapper(const DSizes& dimensions, const TensorBlockResourceRequirements& requirements) : m_tensor_dimensions(dimensions), m_requirements(requirements) { // Compute block dimensions and the total number of blocks. InitializeBlockDimensions(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE IndexType blockCount() const { return m_total_block_count; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE IndexType blockTotalSize() const { return m_block_dimensions.TotalSize(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const DSizes& blockDimensions() const { return m_block_dimensions; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE BlockDescriptor blockDescriptor(IndexType block_index) const { static const bool isColMajor = Layout == static_cast(ColMajor); IndexType offset = 0; DSizes dimensions; if (NumDims == 0) return BlockDescriptor(offset, dimensions); // Iterate outer -> inner dimensions. for (int i = NumDims - 1; i >= 0; --i) { const int dim = isColMajor ? i : NumDims - i - 1; const IndexType idx = block_index / m_block_strides[dim]; block_index -= idx * m_block_strides[dim]; const IndexType coord = idx * m_block_dimensions[dim]; dimensions[dim] = numext::mini(m_tensor_dimensions[dim] - coord, m_block_dimensions[dim]); offset += coord * m_tensor_strides[dim]; } return {offset, dimensions}; } private: void InitializeBlockDimensions() { // Requested block shape and size. const TensorBlockShapeType shape_type = m_requirements.shape_type; IndexType target_block_size = numext::maxi(1, static_cast(m_requirements.size)); IndexType tensor_size = m_tensor_dimensions.TotalSize(); // Corner case: one of the dimensions is zero. Logic below is too complex // to handle this case on a general basis, just use unit block size. // Note: we must not yield blocks with zero dimensions (recipe for // overflows/underflows, divisions by zero and NaNs later). if (tensor_size == 0) { for (int i = 0; i < NumDims; ++i) { m_block_dimensions[i] = 1; } m_total_block_count = 0; return; } // If tensor fits into a target block size, evaluate it as a single block. if (tensor_size <= target_block_size) { m_block_dimensions = m_tensor_dimensions; m_total_block_count = 1; // The only valid block index is `0`, and in this case we do not need // to compute real strides for tensor or blocks (see blockDescriptor). for (int i = 0; i < NumDims; ++i) { m_tensor_strides[i] = 0; m_block_strides[i] = 1; } return; } static const bool isColMajor = Layout == static_cast(ColMajor); // Block shape skewed towards inner dimension. if (shape_type == TensorBlockShapeType::kSkewedInnerDims) { IndexType coeff_to_allocate = target_block_size; for (int i = 0; i < NumDims; ++i) { const int dim = isColMajor ? i : NumDims - i - 1; m_block_dimensions[dim] = numext::mini(coeff_to_allocate, m_tensor_dimensions[dim]); coeff_to_allocate = divup( coeff_to_allocate, numext::maxi(static_cast(1), m_block_dimensions[dim])); } eigen_assert(coeff_to_allocate == 1); } else if (shape_type == TensorBlockShapeType::kUniformAllDims) { // Tensor will not fit within 'target_block_size' budget: calculate tensor // block dimension sizes based on "square" dimension size target. const IndexType dim_size_target = convert_index( std::pow(static_cast(target_block_size), 1.0f / static_cast(m_block_dimensions.rank()))); for (int i = 0; i < NumDims; ++i) { // TODO(andydavis) Adjust the inner most 'block_dim_size' to make it // a multiple of the packet size. Note that reducing // 'block_dim_size' in this manner can increase the number of // blocks, and so will amplify any per-block overhead. m_block_dimensions[i] = numext::mini(dim_size_target, m_tensor_dimensions[i]); } // Add any un-allocated coefficients to inner dimension(s). IndexType total_size = m_block_dimensions.TotalSize(); for (int i = 0; i < NumDims; ++i) { const int dim = isColMajor ? i : NumDims - i - 1; if (m_block_dimensions[dim] < m_tensor_dimensions[dim]) { const IndexType total_size_other_dims = total_size / m_block_dimensions[dim]; const IndexType alloc_avail = divup(target_block_size, total_size_other_dims); if (alloc_avail == m_block_dimensions[dim]) { // Insufficient excess coefficients to allocate. break; } m_block_dimensions[dim] = numext::mini(m_tensor_dimensions[dim], alloc_avail); total_size = total_size_other_dims * m_block_dimensions[dim]; } } } else { eigen_assert(false); // unknown block shape } eigen_assert(m_block_dimensions.TotalSize() >= numext::mini(target_block_size, m_tensor_dimensions.TotalSize())); // Calculate block counts by dimension and total block count. DSizes block_count; for (int i = 0; i < NumDims; ++i) { block_count[i] = divup(m_tensor_dimensions[i], m_block_dimensions[i]); } m_total_block_count = array_prod(block_count); // Calculate block strides (used for enumerating blocks). m_tensor_strides = strides(m_tensor_dimensions); m_block_strides = strides(block_count); } DSizes m_tensor_dimensions; TensorBlockResourceRequirements m_requirements; DSizes m_block_dimensions; IndexType m_total_block_count; DSizes m_tensor_strides; DSizes m_block_strides; }; // -------------------------------------------------------------------------- // // TensorBlockScratchAllocator is responsible for allocating temporary buffers // for block evaluation (output or input block materialization). Given that // Eigen expression traversal order is deterministic, all temporary allocations // are happening in the same order, and usually have exactly the same size. // Scratch allocator keeps a trace of all dynamic allocations, and after the // first block evaluation is completed, we should be able to reuse all the // temporary buffers for the next block evaluation. template class TensorBlockScratchAllocator { public: explicit TensorBlockScratchAllocator(const Device& device) : m_device(device), m_allocation_index(0) {} ~TensorBlockScratchAllocator() { for (size_t i = 0; i < m_allocations.size(); ++i) { m_device.deallocate(m_allocations[i].ptr); } } void* allocate(size_t size) { // TODO(ezhulenev): Remove when replaced with inlined vector. if (m_allocations.capacity() == 0) m_allocations.reserve(8); // Check if we already have an existing allocation att current index. const int num_allocations = static_cast(m_allocations.size()); const bool has_allocation = m_allocation_index < num_allocations; // Allocation index can't be larger than the number of allocations. eigen_assert(m_allocation_index <= num_allocations); // If we have existing allocation, and its size is larger or equal to // requested size, we do nothing. // If current allocation can't fit requested size, we deallocate it, and // replace with a larger allocation. if (has_allocation && m_allocations[m_allocation_index].size < size) { m_device.deallocate(m_allocations[m_allocation_index].ptr); m_allocations[m_allocation_index].ptr = m_device.allocate(size); m_allocations[m_allocation_index].size = size; } // Make a new allocation if we don't have and existing one. if (!has_allocation) { Allocation allocation; allocation.ptr = m_device.allocate(size); allocation.size = size; m_allocations.push_back(allocation); } eigen_assert(m_allocations[m_allocation_index].ptr != NULL); eigen_assert(m_allocations[m_allocation_index].size >= size); return m_allocations[m_allocation_index++].ptr; } void reset() { m_allocation_index = 0; } private: struct Allocation { void* ptr; size_t size; }; const Device& m_device; int m_allocation_index; // TODO(ezhulenev): This should be an inlined vector. std::vector m_allocations; }; // -------------------------------------------------------------------------- // // TensorBlockKind represents all possible block kinds, that can be produced by // TensorEvaluator::evalBlock function. enum TensorBlockKind { // Tensor block that is a lazy expression that must be assigned to a // destination using TensorBlockAssign. kExpr, // Tensor block that is a view into a memory buffer owned by an underlying // Tensor expression (e.g. it can be a view into a Tensor buffer). kView, // Tensor block that was materialized in a scratch memory buffer, allocated // with TensorBlockScratchAllocator. This block must be copied to a // destination, similar to a block of `kExpr` type. kMaterializedInScratch, // Tensor block that was materialized directly into the final output memory // buffer. For example if the left side of an assignment is a Tensor, we can // directly materialize the block in the destination memory. // // If strides in the output buffer do not match tensor block strides, the // Tensor expression will be invalid, and should not be used by // TensorBlockAssign or for constructing another block expression. kMaterializedInOutput }; // -------------------------------------------------------------------------- // // TensorBlockNotImplemented should be used to defined TensorBlock typedef in // TensorEvaluators that do not support block evaluation. class TensorBlockNotImplemented { public: typedef void XprType; }; // -------------------------------------------------------------------------- // // XprScalar extracts Scalar type from the Eigen expressions (if expression type // is not void). It's required to be able to define lazy block expression for // argument types, that do not support block evaluation. template struct XprScalar { typedef typename XprType::Scalar type; }; template <> struct XprScalar { typedef void type; }; // -------------------------------------------------------------------------- // // TensorMaterializedBlock is a fully evaluated block of the original tensor, // and XprType is just a TensorMap over the data. This block type is typically // used to materialize blocks of tensor expressions, that can't be efficiently // represented as lazy Tensor expressions with fast coeff/packet operations, // e.g. we materialize all broadcasts into evaluated blocks. // // TensorMaterializedBlock does not own its memory buffer, it's either a memory // buffer that backs the original expression (e.g. block is just a view into a // Tensor), or a memory buffer allocated with scratch allocator, and in this // case the scratch allocator will deallocate it at the end of block based // expression execution. // // If the block was evaluated directly into the output buffer, and strides in // the output buffer do not match block strides, the TensorMap expression will // be invalid, and should never be used in block assignment or any other tensor // expression. template class TensorMaterializedBlock { public: typedef DSizes Dimensions; typedef TensorMap > XprType; TensorMaterializedBlock(TensorBlockKind kind, const Scalar* data, const Dimensions& dimensions, bool valid_expr = true) : m_kind(kind), m_data(data), m_dimensions(dimensions), m_expr(m_data, m_dimensions), m_valid_expr(valid_expr) { eigen_assert(m_kind == internal::TensorBlockKind::kView || m_kind == internal::TensorBlockKind::kMaterializedInScratch || m_kind == internal::TensorBlockKind::kMaterializedInOutput); } TensorBlockKind kind() const { return m_kind; } // NOTE(ezhulenev): Returning XprType by value like in other block types // causes asan failures. The theory is that XprType::Nested doesn't work // properly for TensorMap. const XprType& expr() const { eigen_assert(m_valid_expr); return m_expr; } const Scalar* data() const { return m_data; } void cleanup() {} typedef internal::TensorBlockDescriptor TensorBlockDesc; // TensorMaterializedBlock can be backed by different types of storage: // // (1) Contiguous block of memory allocated with scratch allocator. // (2) Contiguous block of memory reused from tensor block descriptor // destination buffer. // (3) Strided block of memory reused from tensor block descriptor // destination buffer. // class Storage { public: Scalar* data() const { return m_data; } const Dimensions& dimensions() const { return m_dimensions; } const Dimensions& strides() const { return m_strides; } TensorMaterializedBlock AsTensorMaterializedBlock() const { return TensorMaterializedBlock( m_materialized_in_output ? internal::TensorBlockKind::kMaterializedInOutput : internal::TensorBlockKind::kMaterializedInScratch, m_data, m_dimensions, !m_strided_storage); } private: friend class TensorMaterializedBlock; Storage(Scalar* data, const Dimensions& dimensions, const Dimensions& strides, bool materialized_in_output, bool strided_storage) : m_data(data), m_dimensions(dimensions), m_strides(strides), m_materialized_in_output(materialized_in_output), m_strided_storage(strided_storage) {} Scalar* m_data; Dimensions m_dimensions; Dimensions m_strides; bool m_materialized_in_output; bool m_strided_storage; }; // Creates a storage for materialized block either from the block descriptor // destination buffer, or allocates a new buffer with scratch allocator. template EIGEN_STRONG_INLINE static Storage prepareStorage( TensorBlockDesc& desc, TensorBlockScratch& scratch, bool allow_strided_storage = false) { // Try to reuse destination as an output block buffer. typedef typename TensorBlockDesc::DestinationBuffer DestinationBuffer; if (desc.destination().kind() == DestinationBuffer::kContiguous) { Scalar* buffer = desc.destination().template data(); desc.DropDestinationBuffer(); return Storage(buffer, desc.dimensions(), internal::strides(desc.dimensions()), /*materialized_in_output=*/true, /*strided_storage=*/false); } else if (desc.destination().kind() == DestinationBuffer::kStrided && allow_strided_storage) { Scalar* buffer = desc.destination().template data(); desc.DropDestinationBuffer(); return Storage(buffer, desc.dimensions(), desc.destination().strides(), /*materialized_in_output=*/true, /*strided_storage=*/true); } else { void* mem = scratch.allocate(desc.size() * sizeof(Scalar)); return Storage(static_cast(mem), desc.dimensions(), internal::strides(desc.dimensions()), /*materialized_in_output=*/false, /*strided_storage=*/false); } } // Creates a materialized block for the given descriptor from a memory buffer. template EIGEN_STRONG_INLINE static TensorMaterializedBlock materialize( const Scalar* data, const DataDimensions& data_dims, TensorBlockDesc& desc, TensorBlockScratch& scratch) { eigen_assert(array_size::value == desc.dimensions().size()); // If a tensor block dimensions covers a contiguous block of the underlying // memory, we can skip block buffer memory allocation, and construct a block // from existing `data` memory buffer. // // Example: (RowMajor layout) // data_dims: [11, 12, 13, 14] // desc.dimensions(): [1, 1, 3, 14] // // In this case we can construct a TensorBlock starting at // `data + desc.offset()`, with a `desc.dimensions()` block sizes. static const bool is_col_major = Layout == ColMajor; // Find out how many inner dimensions have a matching size. int num_matching_inner_dims = 0; for (int i = 0; i < NumDims; ++i) { int dim = is_col_major ? i : NumDims - i - 1; if (data_dims[dim] != desc.dimensions()[dim]) break; ++num_matching_inner_dims; } // All the outer dimensions must be of size `1`, except a single dimension // before the matching inner dimension (`3` in the example above). bool can_use_direct_access = true; for (int i = num_matching_inner_dims + 1; i < NumDims; ++i) { int dim = is_col_major ? i : NumDims - i - 1; if (desc.dimension(dim) != 1) { can_use_direct_access = false; break; } } if (can_use_direct_access) { const Scalar* block_start = data + desc.offset(); return TensorMaterializedBlock(internal::TensorBlockKind::kView, block_start, desc.dimensions()); } else { // Reuse destination buffer or allocate new buffer with scratch allocator. const Storage storage = prepareStorage(desc, scratch); typedef internal::TensorBlockIO TensorBlockIO; typedef typename TensorBlockIO::Dst TensorBlockIODst; typedef typename TensorBlockIO::Src TensorBlockIOSrc; TensorBlockIOSrc src(internal::strides(Dimensions(data_dims)), data, desc.offset()); TensorBlockIODst dst(storage.dimensions(), storage.strides(), storage.data()); TensorBlockIO::Copy(dst, src); return storage.AsTensorMaterializedBlock(); } } private: TensorBlockKind m_kind; const Scalar* m_data; Dimensions m_dimensions; XprType m_expr; bool m_valid_expr; }; // -------------------------------------------------------------------------- // // TensorCwiseUnaryBlock is a lazy tensor expression block that applies UnaryOp // functor to the blocks produced by the underlying Tensor expression. template class TensorCwiseUnaryBlock { static const bool NoArgBlockAccess = internal::is_void::value; public: typedef typename conditional< NoArgBlockAccess, void, TensorCwiseUnaryOp >:: type XprType; typedef typename XprScalar::type Scalar; TensorCwiseUnaryBlock(const ArgTensorBlock& arg_block, const UnaryOp& functor) : m_arg_block(arg_block), m_functor(functor) {} TensorBlockKind kind() const { return internal::TensorBlockKind::kExpr; } XprType expr() const { return XprType(m_arg_block.expr(), m_functor); } const Scalar* data() const { return NULL; } void cleanup() { m_arg_block.cleanup(); } private: ArgTensorBlock m_arg_block; UnaryOp m_functor; }; // -------------------------------------------------------------------------- // // TensorCwiseUnaryBlock is a lazy tensor expression block that applies BinaryOp // functor to the blocks produced by the underlying Tensor expression. template class TensorCwiseBinaryBlock { static const bool NoArgBlockAccess = internal::is_void::value || internal::is_void::value; public: typedef typename conditional< NoArgBlockAccess, void, TensorCwiseBinaryOp >::type XprType; typedef typename XprScalar::type Scalar; TensorCwiseBinaryBlock(const LhsTensorBlock& left_block, const RhsTensorBlock& right_block, const BinaryOp& functor) : m_left_block(left_block), m_right_block(right_block), m_functor(functor) {} TensorBlockKind kind() const { return internal::TensorBlockKind::kExpr; } XprType expr() const { return XprType(m_left_block.expr(), m_right_block.expr(), m_functor); } const Scalar* data() const { return NULL; } void cleanup() { m_left_block.cleanup(); m_right_block.cleanup(); } private: LhsTensorBlock m_left_block; RhsTensorBlock m_right_block; BinaryOp m_functor; }; // -------------------------------------------------------------------------- // // TensorUnaryExprBlock is a lazy tensor expression block that can construct // an arbitrary tensor expression from a block of the underlying type (this is a // generalization of the TensorCwiseUnaryBlock for arbitrary expressions). template class TensorUnaryExprBlock { typedef typename ArgTensorBlock::XprType ArgXprType; static const bool NoArgBlockAccess = internal::is_void::value; public: typedef typename conditional< NoArgBlockAccess, void, typename BlockFactory::template XprType::type>::type XprType; typedef typename XprScalar::type Scalar; TensorUnaryExprBlock(const ArgTensorBlock& arg_block, const BlockFactory& factory) : m_arg_block(arg_block), m_factory(factory) {} TensorBlockKind kind() const { return internal::TensorBlockKind::kExpr; } XprType expr() const { return m_factory.expr(m_arg_block.expr()); } const Scalar* data() const { return NULL; } void cleanup() { m_arg_block.cleanup(); } private: ArgTensorBlock m_arg_block; BlockFactory m_factory; }; // -------------------------------------------------------------------------- // // TensorTernaryExprBlock is a lazy tensor expression block that can construct // an arbitrary tensor expression from three blocks of the underlying type. template class TensorTernaryExprBlock { typedef typename Arg1TensorBlock::XprType Arg1XprType; typedef typename Arg2TensorBlock::XprType Arg2XprType; typedef typename Arg3TensorBlock::XprType Arg3XprType; static const bool NoArgBlockAccess = internal::is_void::value || internal::is_void::value || internal::is_void::value; public: typedef typename conditional< NoArgBlockAccess, void, typename BlockFactory::template XprType::type>::type XprType; typedef typename XprScalar::type Scalar; TensorTernaryExprBlock(const Arg1TensorBlock& arg1_block, const Arg2TensorBlock& arg2_block, const Arg3TensorBlock& arg3_block, const BlockFactory& factory) : m_arg1_block(arg1_block), m_arg2_block(arg2_block), m_arg3_block(arg3_block), m_factory(factory) {} TensorBlockKind kind() const { return internal::TensorBlockKind::kExpr; } XprType expr() const { return m_factory.expr(m_arg1_block.expr(), m_arg2_block.expr(), m_arg3_block.expr()); } const Scalar* data() const { return NULL; } void cleanup() { m_arg1_block.cleanup(); m_arg2_block.cleanup(); m_arg3_block.cleanup(); } private: Arg1TensorBlock m_arg1_block; Arg2TensorBlock m_arg2_block; Arg3TensorBlock m_arg3_block; BlockFactory m_factory; }; // -------------------------------------------------------------------------- // // StridedLinearBufferCopy provides a method to copy data between two linear // buffers with different strides, with optimized paths for scatter/gather. template class StridedLinearBufferCopy { typedef typename packet_traits::type Packet; enum { Vectorizable = packet_traits::Vectorizable, PacketSize = packet_traits::size }; public: // Specifying linear copy kind statically gives ~30% speedup for small sizes. enum class Kind { Linear = 0, // src_stride == 1 && dst_stride == 1 Scatter = 1, // src_stride == 1 && dst_stride != 1 FillLinear = 2, // src_stride == 0 && dst_stride == 1 FillScatter = 3, // src_stride == 0 && dst_stride != 1 Gather = 4, // dst_stride == 1 Random = 5 // everything else }; struct Dst { Dst(IndexType o, IndexType s, Scalar* d) : offset(o), stride(s), data(d) {} IndexType offset; IndexType stride; Scalar* data; }; struct Src { Src(IndexType o, IndexType s, const Scalar* d) : offset(o), stride(s), data(d) {} IndexType offset; IndexType stride; const Scalar* data; }; template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void Run(const Dst& dst, const Src& src, const size_t count) { Run(count, dst.offset, dst.stride, dst.data, src.offset, src.stride, src.data); } private: template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void Run( const IndexType count, const IndexType dst_offset, const IndexType dst_stride, Scalar* EIGEN_RESTRICT dst_data, const IndexType src_offset, const IndexType src_stride, const Scalar* EIGEN_RESTRICT src_data) { const Scalar* src = &src_data[src_offset]; Scalar* dst = &dst_data[dst_offset]; if (!Vectorizable) { for (Index i = 0; i < count; ++i) { dst[i * dst_stride] = src[i * src_stride]; } return; } const IndexType vectorized_size = count - PacketSize; IndexType i = 0; if (kind == StridedLinearBufferCopy::Kind::Linear) { // ******************************************************************** // // Linear copy from `src` to `dst`. const IndexType unrolled_size = count - 4 * PacketSize; eigen_assert(src_stride == 1 && dst_stride == 1); for (; i <= unrolled_size; i += 4 * PacketSize) { for (int j = 0; j < 4; ++j) { Packet p = ploadu(src + i + j * PacketSize); pstoreu(dst + i + j * PacketSize, p); } } for (; i <= vectorized_size; i += PacketSize) { Packet p = ploadu(src + i); pstoreu(dst + i, p); } for (; i < count; ++i) { dst[i] = src[i]; } // ******************************************************************** // } else if (kind == StridedLinearBufferCopy::Kind::Scatter) { // Scatter from `src` to `dst`. eigen_assert(src_stride == 1 && dst_stride != 1); for (; i <= vectorized_size; i += PacketSize) { Packet p = ploadu(src + i); pscatter(dst + i * dst_stride, p, dst_stride); } for (; i < count; ++i) { dst[i * dst_stride] = src[i]; } // ******************************************************************** // } else if (kind == StridedLinearBufferCopy::Kind::FillLinear) { // Fill `dst` with value at `*src`. eigen_assert(src_stride == 0 && dst_stride == 1); const IndexType unrolled_size = count - 4 * PacketSize; Packet p = pload1(src); for (; i <= unrolled_size; i += 4 * PacketSize) { for (int j = 0; j < 4; ++j) { pstoreu(dst + i + j * PacketSize, p); } } for (; i <= vectorized_size; i += PacketSize) { pstoreu(dst + i, p); } for (; i < count; ++i) { dst[i] = *src; } // ******************************************************************** // } else if (kind == StridedLinearBufferCopy::Kind::FillScatter) { // Scatter `*src` into `dst`. eigen_assert(src_stride == 0 && dst_stride != 1); Packet p = pload1(src); for (; i <= vectorized_size; i += PacketSize) { pscatter(dst + i * dst_stride, p, dst_stride); } for (; i < count; ++i) { dst[i * dst_stride] = *src; } // ******************************************************************** // } else if (kind == StridedLinearBufferCopy::Kind::Gather) { // Gather from `src` into `dst`. eigen_assert(dst_stride == 1); for (; i <= vectorized_size; i += PacketSize) { Packet p = pgather(src + i * src_stride, src_stride); pstoreu(dst + i, p); } for (; i < count; ++i) { dst[i] = src[i * src_stride]; } // ******************************************************************** // } else if (kind == StridedLinearBufferCopy::Kind::Random) { // Random. for (; i < count; ++i) { dst[i * dst_stride] = src[i * src_stride]; } } else { eigen_assert(false); } } }; // -------------------------------------------------------------------------- // // TensorBlockIO copies data from `src` tensor block, to the `dst` tensor block. // It's possible to specify src->dst dimension mapping for the copy operation. // Dimensions of `dst` specify how many elements have to be copied, for the // `src` we need to know only stride to navigate through source memory buffer. template class TensorBlockIO { static const bool IsColMajor = (Layout == ColMajor); typedef StridedLinearBufferCopy LinCopy; public: typedef DSizes Dimensions; typedef DSizes DimensionsMap; struct Dst { Dst(const Dimensions& dst_dims, const Dimensions& dst_strides, Scalar* dst, IndexType dst_offset = 0) : dims(dst_dims), strides(dst_strides), data(dst), offset(dst_offset) {} Dimensions dims; Dimensions strides; Scalar* data; IndexType offset; }; struct Src { Src(const Dimensions& src_strides, const Scalar* src, IndexType src_offset = 0) : strides(src_strides), data(src), offset(src_offset) {} Dimensions strides; const Scalar* data; IndexType offset; }; // Copies data to `dst` from `src`, using provided dimensions mapping: // // src_dimension_index = dst_to_src_dim_map[dst_dimension_index] // // Returns the number of copied elements. static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE IndexType Copy( const Dst& dst, const Src& src, const DimensionsMap& dst_to_src_dim_map) { // Copy single scalar value from `src` to `dst`. if (NumDims == 0) { *(dst.data + dst.offset) = *(src.data + src.offset); return 1; } // Both `dst` and `src` must have contiguous innermost dimension. We also // accept the special case with stride '0', because it's used as a trick to // implement broadcasting. { int inner_dim = IsColMajor ? 0 : NumDims - 1; EIGEN_UNUSED_VARIABLE(inner_dim); eigen_assert(dst.strides[inner_dim] == 1 || dst.strides[inner_dim] == 0); eigen_assert(src.strides[inner_dim] == 1 || src.strides[inner_dim] == 0); } // Give a shorter name to `dst_to_src_dim_map`. const DimensionsMap& dim_map = dst_to_src_dim_map; // Do not squeeze reordered inner dimensions. int num_squeezable_dims = NumSqueezableInnerDims(dim_map); // NOTE: We find the innermost dimension (contiguous in memory) in the dst // block, and we write data linearly into that dimension, reading it from // the src. If dimensions are reordered, we might end up reading data from // the src with `stride != 1`. // // NOTE: Random-Read/Linear-Write can be up to ~2X faster than // Linear-Read/Random-Write: https://stackoverflow.com/a/54935680 // Find the innermost dimension in the dst whose size is not 1. This is the // effective inner dim. int num_size_one_inner_dims = 0; for (int i = 0; i < num_squeezable_dims; ++i) { const int dst_dim = IsColMajor ? i : NumDims - i - 1; if (dst.dims[dst_dim] != 1) break; num_size_one_inner_dims++; } // If all dimensions are of size 1, just copy a scalar from `src` to `dst`. if (num_size_one_inner_dims == NumDims) { *(dst.data + dst.offset) = *(src.data + src.offset); return 1; } // Outermost dimension in the dst with `stride == 1` (contiguous in memory). const int dst_stride1_dim = IsColMajor ? num_size_one_inner_dims : NumDims - num_size_one_inner_dims - 1; // Dimension in the src that corresponds to the dst innermost dimension. const int src_dim_for_dst_stride1_dim = NumDims == 0 ? 1 : dim_map[dst_stride1_dim]; // Size of the innermost dimension (length of contiguous blocks of memory). IndexType dst_inner_dim_size = NumDims == 0 ? 1 : dst.dims[dst_stride1_dim]; // Squeeze multiple inner dims into one if they are contiguous in `dst` and // `src` memory, so we can do less linear copy calls. for (int i = num_size_one_inner_dims + 1; i < num_squeezable_dims; ++i) { const int dst_dim = IsColMajor ? i : NumDims - i - 1; const IndexType dst_stride = dst.strides[dst_dim]; const IndexType src_stride = src.strides[dim_map[dst_dim]]; if (dst_inner_dim_size == dst_stride && dst_stride == src_stride) { dst_inner_dim_size *= dst.dims[dst_dim]; ++num_size_one_inner_dims; } else { break; } } // Setup strides to read data from `src` and write to `dst`. IndexType input_offset = src.offset; IndexType output_offset = dst.offset; IndexType input_stride = NumDims == 0 ? 1 : src.strides[src_dim_for_dst_stride1_dim]; IndexType output_stride = NumDims == 0 ? 1 : dst.strides[dst_stride1_dim]; const int at_least_1_dim = NumDims <= 1 ? 1 : NumDims - 1; array it; // Initialize block iterator state. Squeeze away any dimension of size 1. int idx = 0; // currently initialized iterator state index for (int i = num_size_one_inner_dims; i < NumDims - 1; ++i) { const int dst_dim = IsColMajor ? i + 1 : NumDims - i - 2; if (dst.dims[dst_dim] == 1) continue; it[idx].size = dst.dims[dst_dim]; it[idx].input_stride = src.strides[dim_map[dst_dim]]; it[idx].output_stride = dst.strides[dst_dim]; it[idx].input_span = it[idx].input_stride * (it[idx].size - 1); it[idx].output_span = it[idx].output_stride * (it[idx].size - 1); idx++; } // Iterate copying data from src to dst. const IndexType block_total_size = NumDims == 0 ? 1 : dst.dims.TotalSize(); #define COPY_INNER_DIM(KIND) \ IndexType num_copied = 0; \ for (num_copied = 0; num_copied < block_total_size; \ num_copied += dst_inner_dim_size) { \ LinCopy::template Run( \ typename LinCopy::Dst(output_offset, output_stride, dst.data), \ typename LinCopy::Src(input_offset, input_stride, src.data), \ dst_inner_dim_size); \ \ for (int j = 0; j < idx; ++j) { \ if (++it[j].count < it[j].size) { \ input_offset += it[j].input_stride; \ output_offset += it[j].output_stride; \ break; \ } \ it[j].count = 0; \ input_offset -= it[j].input_span; \ output_offset -= it[j].output_span; \ } \ } \ return num_copied; if (input_stride == 1 && output_stride == 1) { COPY_INNER_DIM(LinCopy::Kind::Linear); } else if (input_stride == 1 && output_stride != 1) { COPY_INNER_DIM(LinCopy::Kind::Scatter); } else if (input_stride == 0 && output_stride == 1) { COPY_INNER_DIM(LinCopy::Kind::FillLinear); } else if (input_stride == 0 && output_stride != 1) { COPY_INNER_DIM(LinCopy::Kind::FillScatter); } else if (output_stride == 1) { COPY_INNER_DIM(LinCopy::Kind::Gather); } else { COPY_INNER_DIM(LinCopy::Kind::Random); } #undef COPY_INNER_DIM } // Copy from `src` to `dst` with an identity src->dst dimension map. Returns // the number of copied elements. static EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE IndexType Copy(const Dst& dst, const Src& src) { DimensionsMap dst_to_src_map; for (int i = 0; i < NumDims; ++i) dst_to_src_map[i] = i; return Copy(dst, src, dst_to_src_map); } private: struct BlockIteratorState { BlockIteratorState() : size(0), count(0), input_stride(0), output_stride(0), input_span(0), output_span(0) {} IndexType size; IndexType count; IndexType input_stride; IndexType output_stride; IndexType input_span; IndexType output_span; }; // Compute how many inner dimensions it's allowed to squeeze when doing IO // between two tensor blocks. It's safe to squeeze inner dimensions, only // if they are not reordered. static int NumSqueezableInnerDims(const DimensionsMap& dim_map) { int num_squeezable_dims = 0; for (int i = 0; i < NumDims; ++i) { const int dim = IsColMajor ? i : NumDims - i - 1; if (dim_map[dim] != dim) break; num_squeezable_dims++; } return num_squeezable_dims; } }; // -------------------------------------------------------------------------- // // TensorBlockAssignment assigns a block expression of type `TensorBlockExpr` to // a Tensor block defined by `desc`, backed by a memory buffer at `target`. // // Currently there is no way to write from a Tensor expression to a block of // memory, if dimensions are reordered. If you need to do that, you should // materialize a Tensor block expression into a memory buffer, and then use // TensorBlockIO to copy data between two memory buffers with a custom // `target->src` dimension map (see definition above). // // Also currently the innermost dimension of `target` must have a stride '1' // (contiguous in memory). This restriction could be lifted with a `pscatter`, // but in practice it's never needed, and there is a similar TensorBlockIO // workaround for that. // // TODO(ezhulenev): TensorBlockAssignment is a special case of TensorBlockIO // where `src` is a tensor expression. Explore if it is possible to rewrite IO // to use expressions instead of pointers, and after that TensorBlockAssignment // will become an alias to IO. template class TensorBlockAssignment { // We will use coeff/packet path to evaluate block expressions. typedef TensorEvaluator TensorBlockEvaluator; typedef DSizes Dimensions; enum { Vectorizable = packet_traits::Vectorizable, PacketSize = packet_traits::size }; template struct InnerDimAssign { EIGEN_ALWAYS_INLINE static void Run(Scalar* target, IndexType count, const Evaluator& eval, IndexType eval_offset) { for (IndexType i = 0; i < count; ++i) { target[i] = eval.coeff(eval_offset + i); } } }; template struct InnerDimAssign { EIGEN_ALWAYS_INLINE static void Run(Scalar* target, IndexType count, const Evaluator& eval, IndexType eval_offset) { typedef typename packet_traits::type Packet; const IndexType unrolled_size = count - 4 * PacketSize; const IndexType vectorized_size = count - PacketSize; IndexType i = 0; for (; i <= unrolled_size; i += 4 * PacketSize) { for (int j = 0; j < 4; ++j) { const IndexType idx = eval_offset + i + j * PacketSize; Packet p = eval.template packet(idx); pstoreu(target + i + j * PacketSize, p); } } for (; i <= vectorized_size; i += PacketSize) { Packet p = eval.template packet(eval_offset + i); pstoreu(target + i, p); } for (; i < count; ++i) { target[i] = eval.coeff(eval_offset + i); } } }; public: struct Target { Target(const Dimensions& target_dims, const Dimensions& target_strides, Scalar* target_data, IndexType target_offset = 0) : dims(target_dims), strides(target_strides), data(target_data), offset(target_offset) {} Dimensions dims; Dimensions strides; Scalar* data; IndexType offset; }; static Target target(const Dimensions& target_dims, const Dimensions& target_strides, Scalar* target_data, IndexType target_offset = 0) { return Target(target_dims, target_strides, target_data, target_offset); } template static Target target( const DSizes& target_dims, const DSizes& target_strides, Scalar* target_data, IndexType target_offset = 0) { // DSizes constructor will do index type promotion if it's safe. return Target(Dimensions(target_dims), Dimensions(target_strides), target_data, target_offset); } static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void Run( const Target& target, const TensorBlockExpr& expr) { // Prepare evaluator for block expression. DefaultDevice default_device; TensorBlockEvaluator eval(expr, default_device); // Tensor block expression dimension should match destination dimensions. eigen_assert(dimensions_match(target.dims, eval.dimensions())); static const int Layout = TensorBlockEvaluator::Layout; static const bool is_col_major = Layout == ColMajor; // Initialize output inner dimension size based on a layout. const IndexType output_size = NumDims == 0 ? 1 : target.dims.TotalSize(); const int inner_dim_idx = is_col_major ? 0 : NumDims - 1; IndexType output_inner_dim_size = target.dims[inner_dim_idx]; // Target inner dimension stride must be '1'. eigen_assert(target.strides[inner_dim_idx] == 1); // Squeeze multiple inner dims into one if they are contiguous in `target`. IndexType num_squeezed_dims = 0; for (Index i = 1; i < NumDims; ++i) { const Index dim = is_col_major ? i : NumDims - i - 1; const IndexType target_stride = target.strides[dim]; if (output_inner_dim_size == target_stride) { output_inner_dim_size *= target.dims[dim]; num_squeezed_dims++; } else { break; } } // Initialize output block iterator state. Dimension in this array are // always in inner_most -> outer_most order (col major layout). array it; int idx = 0; // currently initialized iterator state index for (Index i = num_squeezed_dims; i < NumDims - 1; ++i) { const Index dim = is_col_major ? i + 1 : NumDims - i - 2; it[idx].count = 0; it[idx].size = target.dims[dim]; it[idx].output_stride = target.strides[dim]; it[idx].output_span = it[idx].output_stride * (it[idx].size - 1); idx++; } // We read block expression from the beginning, and start writing data to // `target` at given offset. IndexType input_offset = 0; IndexType output_offset = target.offset; // Iterate copying data from `eval` to `target`. for (IndexType i = 0; i < output_size; i += output_inner_dim_size) { // Assign to `target` at current offset. InnerDimAssign::Run(target.data + output_offset, output_inner_dim_size, eval, input_offset); // Move input offset forward by the number of assigned coefficients. input_offset += output_inner_dim_size; // Update index. for (int j = 0; j < idx; ++j) { if (++it[j].count < it[j].size) { output_offset += it[j].output_stride; break; } it[j].count = 0; output_offset -= it[j].output_span; } } } private: struct BlockIteratorState { BlockIteratorState() : count(0), size(0), output_stride(0), output_span(0) {} IndexType count; IndexType size; IndexType output_stride; IndexType output_span; }; }; // -------------------------------------------------------------------------- // } // namespace internal } // namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_BLOCK_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorIO.h0000644000176200001440000000500014107270226024661 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_IO_H #define EIGEN_CXX11_TENSOR_TENSOR_IO_H namespace Eigen { namespace internal { // Print the tensor as a 2d matrix template struct TensorPrinter { static void run (std::ostream& os, const Tensor& tensor) { typedef typename internal::remove_const::type Scalar; typedef typename Tensor::Index Index; const Index total_size = internal::array_prod(tensor.dimensions()); if (total_size > 0) { const Index first_dim = Eigen::internal::array_get<0>(tensor.dimensions()); static const int layout = Tensor::Layout; Map > matrix(const_cast(tensor.data()), first_dim, total_size/first_dim); os << matrix; } } }; // Print the tensor as a vector template struct TensorPrinter { static void run (std::ostream& os, const Tensor& tensor) { typedef typename internal::remove_const::type Scalar; typedef typename Tensor::Index Index; const Index total_size = internal::array_prod(tensor.dimensions()); if (total_size > 0) { Map > array(const_cast(tensor.data()), total_size); os << array; } } }; // Print the tensor as a scalar template struct TensorPrinter { static void run (std::ostream& os, const Tensor& tensor) { os << tensor.coeff(0); } }; } template std::ostream& operator << (std::ostream& os, const TensorBase& expr) { typedef TensorEvaluator, DefaultDevice> Evaluator; typedef typename Evaluator::Dimensions Dimensions; // Evaluate the expression if needed TensorForcedEvalOp eval = expr.eval(); Evaluator tensor(eval, DefaultDevice()); tensor.evalSubExprsIfNeeded(NULL); // Print the result static const int rank = internal::array_size::value; internal::TensorPrinter::run(os, tensor); // Cleanup. tensor.cleanup(); return os; } } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_IO_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorTraits.h0000644000176200001440000002233014562417221025630 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_TRAITS_H #define EIGEN_CXX11_TENSOR_TENSOR_TRAITS_H namespace Eigen { namespace internal { template class compute_tensor_flags { enum { is_dynamic_size_storage = 1, is_aligned = ( ((Options&DontAlign)==0) && ( #if EIGEN_MAX_STATIC_ALIGN_BYTES>0 (!is_dynamic_size_storage) #else 0 #endif | #if EIGEN_MAX_ALIGN_BYTES>0 is_dynamic_size_storage #else 0 #endif ) ), packet_access_bit = packet_traits::Vectorizable && is_aligned ? PacketAccessBit : 0 }; public: enum { ret = packet_access_bit }; }; template struct traits > { typedef Scalar_ Scalar; typedef Dense StorageKind; typedef IndexType_ Index; static const int NumDimensions = NumIndices_; static const int Layout = Options_ & RowMajor ? RowMajor : ColMajor; enum { Options = Options_, Flags = compute_tensor_flags::ret | (is_const::value ? 0 : LvalueBit) }; template struct MakePointer { typedef T* Type; }; typedef typename MakePointer::Type PointerType; }; template struct traits > { typedef Scalar_ Scalar; typedef Dense StorageKind; typedef IndexType_ Index; static const int NumDimensions = array_size::value; static const int Layout = Options_ & RowMajor ? RowMajor : ColMajor; enum { Options = Options_, Flags = compute_tensor_flags::ret | (is_const::value ? 0: LvalueBit) }; template struct MakePointer { typedef T* Type; }; typedef typename MakePointer::Type PointerType; }; template class MakePointer_> struct traits > : public traits { typedef traits BaseTraits; typedef typename BaseTraits::Scalar Scalar; typedef typename BaseTraits::StorageKind StorageKind; typedef typename BaseTraits::Index Index; static const int NumDimensions = BaseTraits::NumDimensions; static const int Layout = BaseTraits::Layout; enum { Options = Options_, Flags = BaseTraits::Flags }; template struct MakePointer { // Intermediate typedef to workaround MSVC issue. typedef MakePointer_ MakePointerT; typedef typename MakePointerT::Type Type; }; typedef typename MakePointer::Type PointerType; }; template struct traits > : public traits { typedef traits BaseTraits; typedef typename BaseTraits::Scalar Scalar; typedef typename BaseTraits::StorageKind StorageKind; typedef typename BaseTraits::Index Index; static const int NumDimensions = BaseTraits::NumDimensions; static const int Layout = BaseTraits::Layout; enum { Options = BaseTraits::Options, Flags = BaseTraits::Flags }; typedef typename BaseTraits::PointerType PointerType; }; template struct eval, Eigen::Dense> { typedef const Tensor<_Scalar, NumIndices_, Options, IndexType_>EIGEN_DEVICE_REF type; }; template struct eval, Eigen::Dense> { typedef const Tensor<_Scalar, NumIndices_, Options, IndexType_>EIGEN_DEVICE_REF type; }; template struct eval, Eigen::Dense> { typedef const TensorFixedSizeEIGEN_DEVICE_REF type; }; template struct eval, Eigen::Dense> { typedef const TensorFixedSizeEIGEN_DEVICE_REF type; }; template class MakePointer> struct eval, Eigen::Dense> { typedef const TensorMapEIGEN_DEVICE_REF type; }; template class MakePointer> struct eval, Eigen::Dense> { typedef const TensorMapEIGEN_DEVICE_REF type; }; template struct eval, Eigen::Dense> { typedef const TensorRefEIGEN_DEVICE_REF type; }; template struct eval, Eigen::Dense> { typedef const TensorRefEIGEN_DEVICE_REF type; }; // TODO nested<> does not exist anymore in Eigen/Core, and it thus has to be removed in favor of ref_selector. template struct nested { typedef typename ref_selector::type type; }; template struct nested > { typedef const TensorEIGEN_DEVICE_REF type; }; template struct nested > { typedef const TensorEIGEN_DEVICE_REF type; }; template struct nested > { typedef const TensorFixedSizeEIGEN_DEVICE_REF type; }; template struct nested > { typedef const TensorFixedSizeEIGEN_DEVICE_REF type; }; template struct nested > { typedef const TensorRefEIGEN_DEVICE_REF type; }; template struct nested > { typedef const TensorRefEIGEN_DEVICE_REF type; }; } // end namespace internal // Convolutional layers take in an input tensor of shape (D, R, C, B), or (D, C, // R, B), and convolve it with a set of filters, which can also be presented as // a tensor (D, K, K, M), where M is the number of filters, K is the filter // size, and each 3-dimensional tensor of size (D, K, K) is a filter. For // simplicity we assume that we always use square filters (which is usually the // case in images), hence the two Ks in the tensor dimension. It also takes in // a few additional parameters: // Stride (S): The convolution stride is the offset between locations where we // apply the filters. A larger stride means that the output will be // spatially smaller. // Padding (P): The padding we apply to the input tensor along the R and C // dimensions. This is usually used to make sure that the spatial // dimensions of the output matches our intention. // // Two types of padding are often used: // SAME: The pad value is computed so that the output will have size // R/S and C/S. // VALID: no padding is carried out. // When we do padding, the padded values at the padded locations are usually // zero. // // The output dimensions for convolution, when given all the parameters above, // are as follows: // When Padding = SAME: the output size is (B, R', C', M), where // R' = ceil(float(R) / float(S)) // C' = ceil(float(C) / float(S)) // where ceil is the ceiling function. The input tensor is padded with 0 as // needed. The number of padded rows and columns are computed as: // Pr = ((R' - 1) * S + K - R) / 2 // Pc = ((C' - 1) * S + K - C) / 2 // when the stride is 1, we have the simplified case R'=R, C'=C, Pr=Pc=(K-1)/2. // This is where SAME comes from - the output has the same size as the input has. // When Padding = VALID: the output size is computed as // R' = ceil(float(R - K + 1) / float(S)) // C' = ceil(float(C - K + 1) / float(S)) // and the number of padded rows and columns are computed in the same way as in // the SAME case. // When the stride is 1, we have the simplified case R'=R-K+1, C'=C-K+1, Pr=0, // Pc=0. typedef enum { PADDING_VALID = 1, PADDING_SAME = 2 } PaddingType; } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_TRAITS_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorEvalTo.h0000644000176200001440000002055414562417221025562 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_EVAL_TO_H #define EIGEN_CXX11_TENSOR_TENSOR_EVAL_TO_H namespace Eigen { /** \class TensorForcedEval * \ingroup CXX11_Tensor_Module * * \brief Tensor reshaping class. * * */ namespace internal { template class MakePointer_> struct traits > { // Type promotion to handle the case where the types of the lhs and the rhs are different. typedef typename XprType::Scalar Scalar; typedef traits XprTraits; typedef typename XprTraits::StorageKind StorageKind; typedef typename XprTraits::Index Index; typedef typename XprType::Nested Nested; typedef typename remove_reference::type _Nested; static const int NumDimensions = XprTraits::NumDimensions; static const int Layout = XprTraits::Layout; typedef typename MakePointer_::Type PointerType; enum { Flags = 0 }; template struct MakePointer { // Intermediate typedef to workaround MSVC issue. typedef MakePointer_ MakePointerT; typedef typename MakePointerT::Type Type; }; }; template class MakePointer_> struct eval, Eigen::Dense> { typedef const TensorEvalToOp& type; }; template class MakePointer_> struct nested, 1, typename eval >::type> { typedef TensorEvalToOp type; }; } // end namespace internal template class MakePointer_> class TensorEvalToOp : public TensorBase, ReadOnlyAccessors> { public: typedef typename Eigen::internal::traits::Scalar Scalar; typedef typename Eigen::NumTraits::Real RealScalar; typedef typename internal::remove_const::type CoeffReturnType; typedef typename MakePointer_::Type PointerType; typedef typename Eigen::internal::nested::type Nested; typedef typename Eigen::internal::traits::StorageKind StorageKind; typedef typename Eigen::internal::traits::Index Index; static const int NumDims = Eigen::internal::traits::NumDimensions; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvalToOp(PointerType buffer, const XprType& expr) : m_xpr(expr), m_buffer(buffer) {} EIGEN_DEVICE_FUNC const typename internal::remove_all::type& expression() const { return m_xpr; } EIGEN_DEVICE_FUNC PointerType buffer() const { return m_buffer; } protected: typename XprType::Nested m_xpr; PointerType m_buffer; }; template class MakePointer_> struct TensorEvaluator, Device> { typedef TensorEvalToOp XprType; typedef typename ArgType::Scalar Scalar; typedef typename TensorEvaluator::Dimensions Dimensions; typedef typename XprType::Index Index; typedef typename internal::remove_const::type CoeffReturnType; typedef typename PacketType::type PacketReturnType; static const int PacketSize = PacketType::size; typedef typename Eigen::internal::traits::PointerType TensorPointerType; typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; enum { IsAligned = TensorEvaluator::IsAligned, PacketAccess = TensorEvaluator::PacketAccess, BlockAccess = true, PreferBlockAccess = false, Layout = TensorEvaluator::Layout, CoordAccess = false, // to be implemented RawAccess = true }; static const int NumDims = internal::traits::NumDimensions; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockDescriptor TensorBlockDesc; typedef internal::TensorBlockScratchAllocator TensorBlockScratch; typedef typename TensorEvaluator::TensorBlock ArgTensorBlock; typedef internal::TensorBlockAssignment< CoeffReturnType, NumDims, typename ArgTensorBlock::XprType, Index> TensorBlockAssignment; //===--------------------------------------------------------------------===// EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device) : m_impl(op.expression(), device), m_buffer(device.get(op.buffer())), m_expression(op.expression()){} EIGEN_STRONG_INLINE ~TensorEvaluator() { } EIGEN_DEVICE_FUNC const Dimensions& dimensions() const { return m_impl.dimensions(); } EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(EvaluatorPointerType scalar) { EIGEN_UNUSED_VARIABLE(scalar); eigen_assert(scalar == NULL); return m_impl.evalSubExprsIfNeeded(m_buffer); } #ifdef EIGEN_USE_THREADS template EIGEN_STRONG_INLINE void evalSubExprsIfNeededAsync( EvaluatorPointerType scalar, EvalSubExprsCallback done) { EIGEN_UNUSED_VARIABLE(scalar); eigen_assert(scalar == NULL); m_impl.evalSubExprsIfNeededAsync(m_buffer, std::move(done)); } #endif EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalScalar(Index i) { m_buffer[i] = m_impl.coeff(i); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalPacket(Index i) { internal::pstoret(m_buffer + i, m_impl.template packet::IsAligned ? Aligned : Unaligned>(i)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE internal::TensorBlockResourceRequirements getResourceRequirements() const { return m_impl.getResourceRequirements(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalBlock( TensorBlockDesc& desc, TensorBlockScratch& scratch) { // Add `m_buffer` as destination buffer to the block descriptor. desc.template AddDestinationBuffer( /*dst_base=*/m_buffer + desc.offset(), /*dst_strides=*/internal::strides(m_impl.dimensions())); ArgTensorBlock block = m_impl.block(desc, scratch, /*root_of_expr_ast=*/true); // If block was evaluated into a destination buffer, there is no need to do // an assignment. if (block.kind() != internal::TensorBlockKind::kMaterializedInOutput) { TensorBlockAssignment::Run( TensorBlockAssignment::target( desc.dimensions(), internal::strides(m_impl.dimensions()), m_buffer, desc.offset()), block.expr()); } block.cleanup(); } EIGEN_STRONG_INLINE void cleanup() { m_impl.cleanup(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { return m_buffer[index]; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const { return internal::ploadt(m_buffer + index); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost costPerCoeff(bool vectorized) const { // We assume that evalPacket or evalScalar is called to perform the // assignment and account for the cost of the write here. return m_impl.costPerCoeff(vectorized) + TensorOpCost(0, sizeof(CoeffReturnType), 0, vectorized, PacketSize); } EIGEN_DEVICE_FUNC EvaluatorPointerType data() const { return m_buffer; } ArgType expression() const { return m_expression; } #ifdef EIGEN_USE_SYCL // binding placeholder accessors to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler &cgh) const { m_impl.bind(cgh); m_buffer.bind(cgh); } #endif private: TensorEvaluator m_impl; EvaluatorPointerType m_buffer; const ArgType m_expression; }; } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_EVAL_TO_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorContractionGpu.h0000644000176200001440000017365214562417221027337 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014-2015 Benoit Steiner // Copyright (C) 2015 Navdeep Jaitly // Copyright (C) 2014 Eric Martin // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_CONTRACTION_GPU_H #define EIGEN_CXX11_TENSOR_TENSOR_CONTRACTION_GPU_H #if defined(EIGEN_USE_GPU) && defined(EIGEN_GPUCC) namespace Eigen { template __device__ EIGEN_STRONG_INLINE void EigenContractionKernelInternal(const LhsMapper lhs, const RhsMapper rhs, const OutputMapper output, Scalar* lhs_shmem, Scalar* rhs_shmem, const Index m_size, const Index n_size, const Index k_size) { const Index m_block_idx = blockIdx.x; const Index n_block_idx = blockIdx.y; const Index base_m = 64 * m_block_idx; const Index base_n = 64 * n_block_idx; // declare and initialize 64 registers for output 8x8 block // prefetch registers Scalar lhs_pf0; Scalar lhs_pf1; Scalar lhs_pf2; Scalar lhs_pf3; Scalar lhs_pf4; Scalar lhs_pf5; Scalar lhs_pf6; Scalar lhs_pf7; Scalar rhs_pf0; Scalar rhs_pf1; Scalar rhs_pf2; Scalar rhs_pf3; Scalar rhs_pf4; Scalar rhs_pf5; Scalar rhs_pf6; Scalar rhs_pf7; // shared memory is formatted // (contract idx in block, nocontract idx in block, block idx) // where block idx is column major. This transposition limits the number of // bank conflicts when reading the LHS. The core idea is that since the contracting // index is shared by both sides, then the contracting index should be in threadIdx.x. // On the LHS, we pad each row inside of each block with an extra element. This makes // each block 8 rows of 9 elements, which is 72 elements. This gives no bank conflicts // on writes and very few 2-way conflicts on reads. There is an 8x8 grid of these blocks. // On the RHS we just add 8 padding elements to the end of each block. This gives no bank // conflicts on writes and also none on reads. // storage indices const Index lhs_store_idx_base = threadIdx.y * 72 + threadIdx.x * 9 + threadIdx.z; const Index rhs_store_idx_base = threadIdx.y * 72 + threadIdx.z * 8 + threadIdx.x; const Index lhs_store_idx_0 = lhs_store_idx_base + 576 * 0; const Index lhs_store_idx_1 = lhs_store_idx_base + 576 * 1; const Index lhs_store_idx_2 = lhs_store_idx_base + 576 * 2; const Index lhs_store_idx_3 = lhs_store_idx_base + 576 * 3; const Index lhs_store_idx_4 = lhs_store_idx_base + 576 * 4; const Index lhs_store_idx_5 = lhs_store_idx_base + 576 * 5; const Index lhs_store_idx_6 = lhs_store_idx_base + 576 * 6; const Index lhs_store_idx_7 = lhs_store_idx_base + 576 * 7; const Index rhs_store_idx_0 = rhs_store_idx_base + 576 * 0; const Index rhs_store_idx_1 = rhs_store_idx_base + 576 * 1; const Index rhs_store_idx_2 = rhs_store_idx_base + 576 * 2; const Index rhs_store_idx_3 = rhs_store_idx_base + 576 * 3; const Index rhs_store_idx_4 = rhs_store_idx_base + 576 * 4; const Index rhs_store_idx_5 = rhs_store_idx_base + 576 * 5; const Index rhs_store_idx_6 = rhs_store_idx_base + 576 * 6; const Index rhs_store_idx_7 = rhs_store_idx_base + 576 * 7; // in the loading code, the following variables are important: // threadIdx.x: the vertical position in an 8x8 block // threadIdx.y: the vertical index of the 8x8 block in the grid // threadIdx.z: the horizontal position in an 8x8 block // k: the horizontal index of the 8x8 block in the grid // // The k parameter is implicit (it was the loop counter for a loop that went // from 0 to <8, but now that loop is unrolled in the below code. const Index load_idx_vert = threadIdx.x + 8 * threadIdx.y; const Index lhs_vert = base_m + load_idx_vert; #define prefetchIntoRegisters(base_k) \ { \ lhs_pf0 = conv(0); \ lhs_pf1 = conv(0); \ lhs_pf2 = conv(0); \ lhs_pf3 = conv(0); \ lhs_pf4 = conv(0); \ lhs_pf5 = conv(0); \ lhs_pf6 = conv(0); \ lhs_pf7 = conv(0); \ \ rhs_pf0 = conv(0); \ rhs_pf1 = conv(0); \ rhs_pf2 = conv(0); \ rhs_pf3 = conv(0); \ rhs_pf4 = conv(0); \ rhs_pf5 = conv(0); \ rhs_pf6 = conv(0); \ rhs_pf7 = conv(0); \ \ if (!needs_edge_check || lhs_vert < m_size) { \ const Index lhs_horiz_0 = base_k + threadIdx.z + 0 * 8; \ const Index lhs_horiz_1 = base_k + threadIdx.z + 1 * 8; \ const Index lhs_horiz_2 = base_k + threadIdx.z + 2 * 8; \ const Index lhs_horiz_3 = base_k + threadIdx.z + 3 * 8; \ const Index lhs_horiz_4 = base_k + threadIdx.z + 4 * 8; \ const Index lhs_horiz_5 = base_k + threadIdx.z + 5 * 8; \ const Index lhs_horiz_6 = base_k + threadIdx.z + 6 * 8; \ const Index lhs_horiz_7 = base_k + threadIdx.z + 7 * 8; \ \ if (!needs_edge_check || lhs_horiz_7 < k_size) { \ lhs_pf0 = lhs(lhs_vert, lhs_horiz_0); \ lhs_pf1 = lhs(lhs_vert, lhs_horiz_1); \ lhs_pf2 = lhs(lhs_vert, lhs_horiz_2); \ lhs_pf3 = lhs(lhs_vert, lhs_horiz_3); \ lhs_pf4 = lhs(lhs_vert, lhs_horiz_4); \ lhs_pf5 = lhs(lhs_vert, lhs_horiz_5); \ lhs_pf6 = lhs(lhs_vert, lhs_horiz_6); \ lhs_pf7 = lhs(lhs_vert, lhs_horiz_7); \ } else if (lhs_horiz_6 < k_size) { \ lhs_pf0 = lhs(lhs_vert, lhs_horiz_0); \ lhs_pf1 = lhs(lhs_vert, lhs_horiz_1); \ lhs_pf2 = lhs(lhs_vert, lhs_horiz_2); \ lhs_pf3 = lhs(lhs_vert, lhs_horiz_3); \ lhs_pf4 = lhs(lhs_vert, lhs_horiz_4); \ lhs_pf5 = lhs(lhs_vert, lhs_horiz_5); \ lhs_pf6 = lhs(lhs_vert, lhs_horiz_6); \ } else if (lhs_horiz_5 < k_size) { \ lhs_pf0 = lhs(lhs_vert, lhs_horiz_0); \ lhs_pf1 = lhs(lhs_vert, lhs_horiz_1); \ lhs_pf2 = lhs(lhs_vert, lhs_horiz_2); \ lhs_pf3 = lhs(lhs_vert, lhs_horiz_3); \ lhs_pf4 = lhs(lhs_vert, lhs_horiz_4); \ lhs_pf5 = lhs(lhs_vert, lhs_horiz_5); \ } else if (lhs_horiz_4 < k_size) { \ lhs_pf0 = lhs(lhs_vert, lhs_horiz_0); \ lhs_pf1 = lhs(lhs_vert, lhs_horiz_1); \ lhs_pf2 = lhs(lhs_vert, lhs_horiz_2); \ lhs_pf3 = lhs(lhs_vert, lhs_horiz_3); \ lhs_pf4 = lhs(lhs_vert, lhs_horiz_4); \ } else if (lhs_horiz_3 < k_size) { \ lhs_pf0 = lhs(lhs_vert, lhs_horiz_0); \ lhs_pf1 = lhs(lhs_vert, lhs_horiz_1); \ lhs_pf2 = lhs(lhs_vert, lhs_horiz_2); \ lhs_pf3 = lhs(lhs_vert, lhs_horiz_3); \ } else if (lhs_horiz_2 < k_size) { \ lhs_pf0 = lhs(lhs_vert, lhs_horiz_0); \ lhs_pf1 = lhs(lhs_vert, lhs_horiz_1); \ lhs_pf2 = lhs(lhs_vert, lhs_horiz_2); \ } else if (lhs_horiz_1 < k_size) { \ lhs_pf0 = lhs(lhs_vert, lhs_horiz_0); \ lhs_pf1 = lhs(lhs_vert, lhs_horiz_1); \ } else if (lhs_horiz_0 < k_size) { \ lhs_pf0 = lhs(lhs_vert, lhs_horiz_0); \ } \ } \ \ const Index rhs_vert = base_k + load_idx_vert; \ if (!needs_edge_check || rhs_vert < k_size) { \ const Index rhs_horiz_0 = base_n + threadIdx.z + 0 * 8; \ const Index rhs_horiz_1 = base_n + threadIdx.z + 1 * 8; \ const Index rhs_horiz_2 = base_n + threadIdx.z + 2 * 8; \ const Index rhs_horiz_3 = base_n + threadIdx.z + 3 * 8; \ const Index rhs_horiz_4 = base_n + threadIdx.z + 4 * 8; \ const Index rhs_horiz_5 = base_n + threadIdx.z + 5 * 8; \ const Index rhs_horiz_6 = base_n + threadIdx.z + 6 * 8; \ const Index rhs_horiz_7 = base_n + threadIdx.z + 7 * 8; \ \ if (rhs_horiz_7 < n_size) { \ rhs_pf0 = rhs(rhs_vert, rhs_horiz_0); \ rhs_pf1 = rhs(rhs_vert, rhs_horiz_1); \ rhs_pf2 = rhs(rhs_vert, rhs_horiz_2); \ rhs_pf3 = rhs(rhs_vert, rhs_horiz_3); \ rhs_pf4 = rhs(rhs_vert, rhs_horiz_4); \ rhs_pf5 = rhs(rhs_vert, rhs_horiz_5); \ rhs_pf6 = rhs(rhs_vert, rhs_horiz_6); \ rhs_pf7 = rhs(rhs_vert, rhs_horiz_7); \ } else if (rhs_horiz_6 < n_size) { \ rhs_pf0 = rhs(rhs_vert, rhs_horiz_0); \ rhs_pf1 = rhs(rhs_vert, rhs_horiz_1); \ rhs_pf2 = rhs(rhs_vert, rhs_horiz_2); \ rhs_pf3 = rhs(rhs_vert, rhs_horiz_3); \ rhs_pf4 = rhs(rhs_vert, rhs_horiz_4); \ rhs_pf5 = rhs(rhs_vert, rhs_horiz_5); \ rhs_pf6 = rhs(rhs_vert, rhs_horiz_6); \ } else if (rhs_horiz_5 < n_size) { \ rhs_pf0 = rhs(rhs_vert, rhs_horiz_0); \ rhs_pf1 = rhs(rhs_vert, rhs_horiz_1); \ rhs_pf2 = rhs(rhs_vert, rhs_horiz_2); \ rhs_pf3 = rhs(rhs_vert, rhs_horiz_3); \ rhs_pf4 = rhs(rhs_vert, rhs_horiz_4); \ rhs_pf5 = rhs(rhs_vert, rhs_horiz_5); \ } else if (rhs_horiz_4 < n_size) { \ rhs_pf0 = rhs(rhs_vert, rhs_horiz_0); \ rhs_pf1 = rhs(rhs_vert, rhs_horiz_1); \ rhs_pf2 = rhs(rhs_vert, rhs_horiz_2); \ rhs_pf3 = rhs(rhs_vert, rhs_horiz_3); \ rhs_pf4 = rhs(rhs_vert, rhs_horiz_4); \ } else if (rhs_horiz_3 < n_size) { \ rhs_pf0 = rhs(rhs_vert, rhs_horiz_0); \ rhs_pf1 = rhs(rhs_vert, rhs_horiz_1); \ rhs_pf2 = rhs(rhs_vert, rhs_horiz_2); \ rhs_pf3 = rhs(rhs_vert, rhs_horiz_3); \ } else if (rhs_horiz_2 < n_size) { \ rhs_pf0 = rhs(rhs_vert, rhs_horiz_0); \ rhs_pf1 = rhs(rhs_vert, rhs_horiz_1); \ rhs_pf2 = rhs(rhs_vert, rhs_horiz_2); \ } else if (rhs_horiz_1 < n_size) { \ rhs_pf0 = rhs(rhs_vert, rhs_horiz_0); \ rhs_pf1 = rhs(rhs_vert, rhs_horiz_1); \ } else if (rhs_horiz_0 < n_size) { \ rhs_pf0 = rhs(rhs_vert, rhs_horiz_0); \ } \ } \ } \ #define writeRegToShmem(_) \ lhs_shmem[lhs_store_idx_0] = lhs_pf0; \ rhs_shmem[rhs_store_idx_0] = rhs_pf0; \ \ lhs_shmem[lhs_store_idx_1] = lhs_pf1; \ rhs_shmem[rhs_store_idx_1] = rhs_pf1; \ \ lhs_shmem[lhs_store_idx_2] = lhs_pf2; \ rhs_shmem[rhs_store_idx_2] = rhs_pf2; \ \ lhs_shmem[lhs_store_idx_3] = lhs_pf3; \ rhs_shmem[rhs_store_idx_3] = rhs_pf3; \ \ lhs_shmem[lhs_store_idx_4] = lhs_pf4; \ rhs_shmem[rhs_store_idx_4] = rhs_pf4; \ \ lhs_shmem[lhs_store_idx_5] = lhs_pf5; \ rhs_shmem[rhs_store_idx_5] = rhs_pf5; \ \ lhs_shmem[lhs_store_idx_6] = lhs_pf6; \ rhs_shmem[rhs_store_idx_6] = rhs_pf6; \ \ lhs_shmem[lhs_store_idx_7] = lhs_pf7; \ rhs_shmem[rhs_store_idx_7] = rhs_pf7; \ // declare and initialize result array #define res(i, j) _res_##i##j #define initResultRow(i) \ Scalar res(i, 0) = conv(0); \ Scalar res(i, 1) = conv(0); \ Scalar res(i, 2) = conv(0); \ Scalar res(i, 3) = conv(0); \ Scalar res(i, 4) = conv(0); \ Scalar res(i, 5) = conv(0); \ Scalar res(i, 6) = conv(0); \ Scalar res(i, 7) = conv(0); \ internal::scalar_cast_op conv; initResultRow(0); initResultRow(1); initResultRow(2); initResultRow(3); initResultRow(4); initResultRow(5); initResultRow(6); initResultRow(7); #undef initResultRow for (Index base_k = 0; base_k < k_size; base_k += 64) { // wait for previous iteration to finish with shmem. Despite common sense, // the code is a bit faster with this here then at bottom of loop __syncthreads(); prefetchIntoRegisters(base_k); writeRegToShmem(); #undef prefetchIntoRegisters #undef writeRegToShmem // wait for shared mem packing to be done before starting computation __syncthreads(); // compute 8x8 matrix product by outer product. This involves packing one column // of LHS and one row of RHS into registers (takes 16 registers). #define lcol(i) _lcol##i Scalar lcol(0); Scalar lcol(1); Scalar lcol(2); Scalar lcol(3); Scalar lcol(4); Scalar lcol(5); Scalar lcol(6); Scalar lcol(7); #define rrow(j) _rrow##j Scalar rrow(0); Scalar rrow(1); Scalar rrow(2); Scalar rrow(3); Scalar rrow(4); Scalar rrow(5); Scalar rrow(6); Scalar rrow(7); // Now x corresponds to k, y to m, and z to n const Scalar* lhs_block = &lhs_shmem[threadIdx.x + 9 * threadIdx.y]; const Scalar* rhs_block = &rhs_shmem[threadIdx.x + 8 * threadIdx.z]; #define lhs_element(i, j) lhs_block[72 * ((i) + 8 * (j))] #define rhs_element(i, j) rhs_block[72 * ((i) + 8 * (j))] #define loadData(i, j) \ lcol(0) = lhs_element(0, j); \ rrow(0) = rhs_element(i, 0); \ lcol(1) = lhs_element(1, j); \ rrow(1) = rhs_element(i, 1); \ lcol(2) = lhs_element(2, j); \ rrow(2) = rhs_element(i, 2); \ lcol(3) = lhs_element(3, j); \ rrow(3) = rhs_element(i, 3); \ lcol(4) = lhs_element(4, j); \ rrow(4) = rhs_element(i, 4); \ lcol(5) = lhs_element(5, j); \ rrow(5) = rhs_element(i, 5); \ lcol(6) = lhs_element(6, j); \ rrow(6) = rhs_element(i, 6); \ lcol(7) = lhs_element(7, j); \ rrow(7) = rhs_element(i, 7); \ #define computeCol(j) \ res(0, j) += lcol(0) * rrow(j); \ res(1, j) += lcol(1) * rrow(j); \ res(2, j) += lcol(2) * rrow(j); \ res(3, j) += lcol(3) * rrow(j); \ res(4, j) += lcol(4) * rrow(j); \ res(5, j) += lcol(5) * rrow(j); \ res(6, j) += lcol(6) * rrow(j); \ res(7, j) += lcol(7) * rrow(j); \ #define computePass(i) \ loadData(i, i); \ \ computeCol(0); \ computeCol(1); \ computeCol(2); \ computeCol(3); \ computeCol(4); \ computeCol(5); \ computeCol(6); \ computeCol(7); \ computePass(0); computePass(1); computePass(2); computePass(3); computePass(4); computePass(5); computePass(6); computePass(7); #undef lcol #undef rrow #undef lhs_element #undef rhs_element #undef loadData #undef computeCol #undef computePass } // end loop over k // we've now iterated over all of the large (ie width 64) k blocks and // accumulated results in registers. At this point thread (x, y, z) contains // the sum across all big k blocks of the product of little k block of index (x, y) // with block of index (y, z). To compute the final output, we need to reduce // the 8 threads over y by summation. #if defined(EIGEN_HIPCC) || (defined(EIGEN_CUDA_SDK_VER) && EIGEN_CUDA_SDK_VER < 90000) #define shuffleInc(i, j, mask) res(i, j) += __shfl_xor(res(i, j), mask) #else #define shuffleInc(i, j, mask) res(i, j) += __shfl_xor_sync(0xFFFFFFFF, res(i, j), mask) #endif #define reduceRow(i, mask) \ shuffleInc(i, 0, mask); \ shuffleInc(i, 1, mask); \ shuffleInc(i, 2, mask); \ shuffleInc(i, 3, mask); \ shuffleInc(i, 4, mask); \ shuffleInc(i, 5, mask); \ shuffleInc(i, 6, mask); \ shuffleInc(i, 7, mask); \ #define reduceMatrix(mask) \ reduceRow(0, mask); \ reduceRow(1, mask); \ reduceRow(2, mask); \ reduceRow(3, mask); \ reduceRow(4, mask); \ reduceRow(5, mask); \ reduceRow(6, mask); \ reduceRow(7, mask); \ // actually perform the reduction, now each thread of index (_, y, z) // contains the correct values in its registers that belong in the output // block reduceMatrix(1); reduceMatrix(2); reduceMatrix(4); #undef shuffleInc #undef reduceRow #undef reduceMatrix // now we need to copy the 64 values into main memory. We can't split work // among threads because all variables are in registers. There's 2 ways // to do this: // (1) have 1 thread do 64 writes from registers into global memory // (2) have 1 thread do 64 writes into shared memory, and then 8 threads // each do 8 writes into global memory. We can just overwrite the shared // memory from the problem we just solved. // (2) is slightly faster than (1) due to less branching and more ILP // TODO: won't yield much gain, but could just use currently unused shared mem // and then we won't have to sync // wait for shared mem to be out of use __syncthreads(); #define writeResultShmem(i, j) \ lhs_shmem[i + 8 * threadIdx.y + 64 * threadIdx.z + 512 * j] = res(i, j); \ #define writeRow(i) \ writeResultShmem(i, 0); \ writeResultShmem(i, 1); \ writeResultShmem(i, 2); \ writeResultShmem(i, 3); \ writeResultShmem(i, 4); \ writeResultShmem(i, 5); \ writeResultShmem(i, 6); \ writeResultShmem(i, 7); \ if (threadIdx.x == 0) { writeRow(0); writeRow(1); writeRow(2); writeRow(3); writeRow(4); writeRow(5); writeRow(6); writeRow(7); } #undef writeResultShmem #undef writeRow const int max_i_write = numext::mini((int)((m_size - base_m - threadIdx.y + 7) / 8), 8); const int max_j_write = numext::mini((int)((n_size - base_n - threadIdx.z + 7) / 8), 8); if (threadIdx.x < max_i_write) { if (max_j_write == 8) { // TODO: can i trade bank conflicts for coalesced writes? Scalar val0 = lhs_shmem[threadIdx.x + 8 * threadIdx.y + 64 * threadIdx.z + 512 * 0]; Scalar val1 = lhs_shmem[threadIdx.x + 8 * threadIdx.y + 64 * threadIdx.z + 512 * 1]; Scalar val2 = lhs_shmem[threadIdx.x + 8 * threadIdx.y + 64 * threadIdx.z + 512 * 2]; Scalar val3 = lhs_shmem[threadIdx.x + 8 * threadIdx.y + 64 * threadIdx.z + 512 * 3]; Scalar val4 = lhs_shmem[threadIdx.x + 8 * threadIdx.y + 64 * threadIdx.z + 512 * 4]; Scalar val5 = lhs_shmem[threadIdx.x + 8 * threadIdx.y + 64 * threadIdx.z + 512 * 5]; Scalar val6 = lhs_shmem[threadIdx.x + 8 * threadIdx.y + 64 * threadIdx.z + 512 * 6]; Scalar val7 = lhs_shmem[threadIdx.x + 8 * threadIdx.y + 64 * threadIdx.z + 512 * 7]; output(base_m + threadIdx.y + 8 * threadIdx.x, base_n + threadIdx.z + 8 * 0) = val0; output(base_m + threadIdx.y + 8 * threadIdx.x, base_n + threadIdx.z + 8 * 1) = val1; output(base_m + threadIdx.y + 8 * threadIdx.x, base_n + threadIdx.z + 8 * 2) = val2; output(base_m + threadIdx.y + 8 * threadIdx.x, base_n + threadIdx.z + 8 * 3) = val3; output(base_m + threadIdx.y + 8 * threadIdx.x, base_n + threadIdx.z + 8 * 4) = val4; output(base_m + threadIdx.y + 8 * threadIdx.x, base_n + threadIdx.z + 8 * 5) = val5; output(base_m + threadIdx.y + 8 * threadIdx.x, base_n + threadIdx.z + 8 * 6) = val6; output(base_m + threadIdx.y + 8 * threadIdx.x, base_n + threadIdx.z + 8 * 7) = val7; } else { #pragma unroll 7 for (int j = 0; j < max_j_write; j++) { Scalar val = lhs_shmem[threadIdx.x + 8 * threadIdx.y + 64 * threadIdx.z + 512 * j]; output(base_m + threadIdx.y + 8 * threadIdx.x, base_n + threadIdx.z + 8 * j) = val; } } } #undef res } template __global__ void #if defined(EIGEN_HIPCC) __launch_bounds__(512, 1) #else __launch_bounds__(512) #endif EigenContractionKernel(const LhsMapper lhs, const RhsMapper rhs, const OutputMapper output, const Index m_size, const Index n_size, const Index k_size) { __shared__ Scalar lhs_shmem[72 * 64]; __shared__ Scalar rhs_shmem[72 * 64]; const Index m_block_idx = blockIdx.x; const Index n_block_idx = blockIdx.y; const Index base_m = 64 * m_block_idx; const Index base_n = 64 * n_block_idx; if (base_m + 63 < m_size && base_n + 63 < n_size) { EigenContractionKernelInternal(lhs, rhs, output, lhs_shmem, rhs_shmem, m_size, n_size, k_size); } else { EigenContractionKernelInternal(lhs, rhs, output, lhs_shmem, rhs_shmem, m_size, n_size, k_size); } } template __device__ __forceinline__ void EigenFloatContractionKernelInternal16x16(const LhsMapper lhs, const RhsMapper rhs, const OutputMapper output, float2 lhs_shmem2[][16], float2 rhs_shmem2[][8], const Index m_size, const Index n_size, const Index k_size, const Index base_m, const Index base_n) { // prefetch registers float4 lhs_pf0, rhs_pf0; float4 results[4]; for (int i=0; i < 4; i++) { results[i].x = results[i].y = results[i].z = results[i].w = 0; } #define prefetch_lhs(reg, row, col) \ if (!CHECK_LHS_BOUNDARY) { \ if (col < k_size) { \ reg =lhs.template loadPacket(row, col); \ } \ } else { \ if (col < k_size) { \ if (row + 3 < m_size) { \ reg =lhs.template loadPacket(row, col); \ } else if (row + 2 < m_size) { \ reg.x =lhs(row + 0, col); \ reg.y =lhs(row + 1, col); \ reg.z =lhs(row + 2, col); \ } else if (row + 1 < m_size) { \ reg.x =lhs(row + 0, col); \ reg.y =lhs(row + 1, col); \ } else if (row < m_size) { \ reg.x =lhs(row + 0, col); \ } \ } \ } \ Index lhs_vert = base_m+threadIdx.x*4; for (Index k = 0; k < k_size; k += 16) { lhs_pf0 = internal::pset1(0); rhs_pf0 = internal::pset1(0); Index lhs_horiz = threadIdx.y+k; prefetch_lhs(lhs_pf0, lhs_vert, lhs_horiz) Index rhs_vert = k+(threadIdx.x%4)*4; Index rhs_horiz0 = (threadIdx.x>>2)+threadIdx.y*4+base_n; if (!CHECK_RHS_BOUNDARY) { if ((rhs_vert + 3) < k_size) { // just CHECK_RHS_BOUNDARY rhs_pf0 = rhs.template loadPacket(rhs_vert, rhs_horiz0); } else if (rhs_vert + 2 < k_size) { // just CHECK_RHS_BOUNDARY rhs_pf0.x = rhs(rhs_vert, rhs_horiz0); rhs_pf0.y = rhs(rhs_vert + 1, rhs_horiz0); rhs_pf0.z = rhs(rhs_vert + 2, rhs_horiz0); } else if (rhs_vert + 1 < k_size) { rhs_pf0.x = rhs(rhs_vert, rhs_horiz0); rhs_pf0.y = rhs(rhs_vert + 1, rhs_horiz0); } else if (rhs_vert < k_size) { rhs_pf0.x = rhs(rhs_vert, rhs_horiz0); } } else { if (rhs_horiz0 < n_size) { if ((rhs_vert + 3) < k_size) { rhs_pf0 = rhs.template loadPacket(rhs_vert, rhs_horiz0); } else if ((rhs_vert + 2) < k_size) { rhs_pf0.x = rhs(rhs_vert, rhs_horiz0); rhs_pf0.y = rhs(rhs_vert + 1, rhs_horiz0); rhs_pf0.z = rhs(rhs_vert + 2, rhs_horiz0); } else if ((rhs_vert + 1) < k_size) { rhs_pf0.x = rhs(rhs_vert, rhs_horiz0); rhs_pf0.y = rhs(rhs_vert + 1, rhs_horiz0); } else if (rhs_vert < k_size) { rhs_pf0.x = rhs(rhs_vert, rhs_horiz0); } } } float x1, x2 ; // the following can be a bitwise operation..... some day. if((threadIdx.x%8) < 4) { x1 = rhs_pf0.y; x2 = rhs_pf0.w; } else { x1 = rhs_pf0.x; x2 = rhs_pf0.z; } #if defined(EIGEN_HIPCC) || (defined(EIGEN_CUDA_SDK_VER) && EIGEN_CUDA_SDK_VER < 90000) x1 = __shfl_xor(x1, 4); x2 = __shfl_xor(x2, 4); #else x1 = __shfl_xor_sync(0xFFFFFFFF, x1, 4); x2 = __shfl_xor_sync(0xFFFFFFFF, x2, 4); #endif if((threadIdx.x%8) < 4) { rhs_pf0.y = x1; rhs_pf0.w = x2; } else { rhs_pf0.x = x1; rhs_pf0.z = x2; } // We have 64 features. // Row 0 -> times (0, 4, 8, 12, 1, 5, 9, 13) for features 0, 1. // Row 1 -> times (0, 4, 8, 12, 1, 5, 9, 13) for features 2, 3. // ... // Row 31 -> times (0, 4, 8, 12, 1, 5, 9, 13) for features 62, 63 // Row 32 -> times (2, 6, 10, 14, 3, 7, 11, 15) for features 0, 1 // ... rhs_shmem2[(threadIdx.x>>3)+ threadIdx.y*2][threadIdx.x%8] = make_float2(rhs_pf0.x, rhs_pf0.y); rhs_shmem2[(threadIdx.x>>3)+ threadIdx.y*2+32][threadIdx.x%8] = make_float2(rhs_pf0.z, rhs_pf0.w); // Row 0 (time 0) -> features (0, 1), (4, 5), .. (28, 29), (32, 33), .. (60, 61) // Row 1 (time 1) -> features (0, 1), (4, 5), .. (28, 29), (32, 33), .. (60, 61) // ... // Row 15 (time 15) -> features (0, 1), (4, 5), .. (28, 29), (32, 33), .. (60, 61) // Row 16 (time 0) -> features (2, 3), (6, 7), .. (30, 31), (34, 35), .. (62, 63) // ... lhs_shmem2[threadIdx.y][threadIdx.x] = make_float2(lhs_pf0.x, lhs_pf0.y); lhs_shmem2[threadIdx.y+16][threadIdx.x] = make_float2(lhs_pf0.z, lhs_pf0.w); #define add_vals(fl1, fl2, fr1, fr2)\ results[0].x += fl1.x * fr1.x;\ results[0].y += fl1.y * fr1.x;\ results[0].z += fl2.x * fr1.x;\ results[0].w += fl2.y * fr1.x;\ \ results[1].x += fl1.x * fr1.y;\ results[1].y += fl1.y * fr1.y;\ results[1].z += fl2.x * fr1.y;\ results[1].w += fl2.y * fr1.y;\ \ results[2].x += fl1.x * fr2.x;\ results[2].y += fl1.y * fr2.x;\ results[2].z += fl2.x * fr2.x;\ results[2].w += fl2.y * fr2.x;\ \ results[3].x += fl1.x * fr2.y;\ results[3].y += fl1.y * fr2.y;\ results[3].z += fl2.x * fr2.y;\ results[3].w += fl2.y * fr2.y;\ __syncthreads(); // Do the multiplies. #pragma unroll for (int koff = 0; koff < 16; koff ++) { // 32 x threads. float2 fl1 = lhs_shmem2[koff][threadIdx.x]; float2 fl2 = lhs_shmem2[koff + 16][threadIdx.x]; int start_feature = threadIdx.y * 4; float2 fr1 = rhs_shmem2[(start_feature>>1) + 32*((koff%4)/2)][koff/4 + (koff%2)*4]; float2 fr2 = rhs_shmem2[(start_feature>>1) + 1 + 32*((koff%4)/2)][koff/4 + (koff%2)*4]; add_vals(fl1, fl2, fr1, fr2) } __syncthreads(); } #undef prefetch_lhs #undef add_vals Index horiz_base = threadIdx.y*4+base_n; if (!CHECK_LHS_BOUNDARY && !CHECK_RHS_BOUNDARY) { for (int i = 0; i < 4; i++) { output(lhs_vert, horiz_base + i) = results[i].x; output(lhs_vert + 1, horiz_base + i) = results[i].y; output(lhs_vert + 2, horiz_base + i) = results[i].z; output(lhs_vert + 3, horiz_base + i) = results[i].w; } } else if (!CHECK_RHS_BOUNDARY) { // CHECK LHS if (lhs_vert + 3 < m_size) { for (int i = 0; i < 4; i++) { output(lhs_vert, horiz_base + i) = results[i].x; output(lhs_vert + 1, horiz_base + i) = results[i].y; output(lhs_vert + 2, horiz_base + i) = results[i].z; output(lhs_vert + 3, horiz_base + i) = results[i].w; } } else if (lhs_vert + 2 < m_size) { for (int i = 0; i < 4; i++) { output(lhs_vert, horiz_base + i) = results[i].x; output(lhs_vert + 1, horiz_base + i) = results[i].y; output(lhs_vert + 2, horiz_base + i) = results[i].z; } } else if (lhs_vert + 1 < m_size) { for (int i = 0; i < 4; i++) { output(lhs_vert, horiz_base + i) = results[i].x; output(lhs_vert + 1, horiz_base + i) = results[i].y; } } else if (lhs_vert < m_size) { for (int i = 0; i < 4; i++) { output(lhs_vert, horiz_base + i) = results[i].x; } } } else if (!CHECK_LHS_BOUNDARY) { // CHECK RHS /* int ncols_rem = fminf(n_size- horiz_base, 4); for (int i = 0; i < ncols_rem; i++) { output(lhs_vert, horiz_base + i) = results[i].x; output(lhs_vert + 1, horiz_base + i) = results[i].y; output(lhs_vert + 2, horiz_base + i) = results[i].z; output(lhs_vert + 3, horiz_base + i) = results[i].w; }*/ for (int i = 0; i < 4; i++) { if (horiz_base+i < n_size) { output(lhs_vert, horiz_base + i) = results[i].x; output(lhs_vert + 1, horiz_base + i) = results[i].y; output(lhs_vert + 2, horiz_base + i) = results[i].z; output(lhs_vert + 3, horiz_base + i) = results[i].w; } } } else { // CHECK both boundaries. for (int i = 0; i < 4; i++) { if (horiz_base+i < n_size) { if (lhs_vert < m_size) output(lhs_vert, horiz_base + i) = results[i].x; if (lhs_vert + 1 < m_size) output(lhs_vert + 1, horiz_base + i) = results[i].y; if (lhs_vert + 2 < m_size) output(lhs_vert + 2, horiz_base + i) = results[i].z; if (lhs_vert + 3 < m_size) output(lhs_vert + 3, horiz_base + i) = results[i].w; } } } } template __device__ __forceinline__ void EigenFloatContractionKernelInternal(const LhsMapper lhs, const RhsMapper rhs, const OutputMapper output, float2 lhs_shmem2[][32], float2 rhs_shmem2[][8], const Index m_size, const Index n_size, const Index k_size, const Index base_m, const Index base_n) { // prefetch registers float4 lhs_pf0, lhs_pf1, lhs_pf2, lhs_pf3; float4 rhs_pf0, rhs_pf1; float4 results[8]; for (int i=0; i < 8; i++) { results[i].x = results[i].y = results[i].z = results[i].w = 0; } Index lhs_vert = base_m+threadIdx.x*4+(threadIdx.y%4)*32; for (Index k = 0; k < k_size; k += 32) { lhs_pf0 = internal::pset1(0); lhs_pf1 = internal::pset1(0); lhs_pf2 = internal::pset1(0); lhs_pf3 = internal::pset1(0); rhs_pf0 = internal::pset1(0); rhs_pf1 = internal::pset1(0); if (!CHECK_LHS_BOUNDARY) { if ((threadIdx.y/4+k+24) < k_size) { lhs_pf0 =lhs.template loadPacket(lhs_vert, (threadIdx.y/4+k)); lhs_pf1 =lhs.template loadPacket(lhs_vert, (threadIdx.y/4+k+8)); lhs_pf2 =lhs.template loadPacket(lhs_vert, (threadIdx.y/4+k+16)); lhs_pf3 =lhs.template loadPacket(lhs_vert, (threadIdx.y/4+k+24)); } else if ((threadIdx.y/4+k+16) < k_size) { lhs_pf0 =lhs.template loadPacket(lhs_vert, (threadIdx.y/4+k)); lhs_pf1 =lhs.template loadPacket(lhs_vert, (threadIdx.y/4+k+8)); lhs_pf2 =lhs.template loadPacket(lhs_vert, (threadIdx.y/4+k+16)); } else if ((threadIdx.y/4+k+8) < k_size) { lhs_pf0 =lhs.template loadPacket(lhs_vert, (threadIdx.y/4+k)); lhs_pf1 =lhs.template loadPacket(lhs_vert, (threadIdx.y/4+k+8)); } else if ((threadIdx.y/4+k) < k_size) { lhs_pf0 =lhs.template loadPacket(lhs_vert, (threadIdx.y/4+k)); } } else { // just CHECK_LHS_BOUNDARY if (lhs_vert + 3 < m_size) { if ((threadIdx.y/4+k+24) < k_size) { lhs_pf0 =lhs.template loadPacket(lhs_vert, (threadIdx.y/4+k)); lhs_pf1 =lhs.template loadPacket(lhs_vert, (threadIdx.y/4+k+8)); lhs_pf2 =lhs.template loadPacket(lhs_vert, (threadIdx.y/4+k+16)); lhs_pf3 =lhs.template loadPacket(lhs_vert, (threadIdx.y/4+k+24)); } else if ((threadIdx.y/4+k+16) < k_size) { lhs_pf0 =lhs.template loadPacket(lhs_vert, (threadIdx.y/4+k)); lhs_pf1 =lhs.template loadPacket(lhs_vert, (threadIdx.y/4+k+8)); lhs_pf2 =lhs.template loadPacket(lhs_vert, (threadIdx.y/4+k+16)); } else if ((threadIdx.y/4+k+8) < k_size) { lhs_pf0 =lhs.template loadPacket(lhs_vert, (threadIdx.y/4+k)); lhs_pf1 =lhs.template loadPacket(lhs_vert, (threadIdx.y/4+k+8)); } else if ((threadIdx.y/4+k) < k_size) { lhs_pf0 =lhs.template loadPacket(lhs_vert, (threadIdx.y/4+k)); } } else if (lhs_vert + 2 < m_size) { if ((threadIdx.y/4+k+24) < k_size) { lhs_pf0.x =lhs(lhs_vert + 0, (threadIdx.y/4+k)); lhs_pf0.y =lhs(lhs_vert + 1, (threadIdx.y/4+k)); lhs_pf0.z =lhs(lhs_vert + 2, (threadIdx.y/4+k)); lhs_pf1.x =lhs(lhs_vert + 0, (threadIdx.y/4+k+8)); lhs_pf1.y =lhs(lhs_vert + 1, (threadIdx.y/4+k+8)); lhs_pf1.z =lhs(lhs_vert + 2, (threadIdx.y/4+k+8)); lhs_pf2.x =lhs(lhs_vert + 0, (threadIdx.y/4+k+16)); lhs_pf2.y =lhs(lhs_vert + 1, (threadIdx.y/4+k+16)); lhs_pf2.z =lhs(lhs_vert + 2, (threadIdx.y/4+k+16)); lhs_pf3.x =lhs(lhs_vert + 0, (threadIdx.y/4+k+24)); lhs_pf3.y =lhs(lhs_vert + 1, (threadIdx.y/4+k+24)); lhs_pf3.z =lhs(lhs_vert + 2, (threadIdx.y/4+k+24)); } else if ((threadIdx.y/4+k+16) < k_size) { lhs_pf0.x =lhs(lhs_vert + 0, (threadIdx.y/4+k)); lhs_pf0.y =lhs(lhs_vert + 1, (threadIdx.y/4+k)); lhs_pf0.z =lhs(lhs_vert + 2, (threadIdx.y/4+k)); lhs_pf1.x =lhs(lhs_vert + 0, (threadIdx.y/4+k+8)); lhs_pf1.y =lhs(lhs_vert + 1, (threadIdx.y/4+k+8)); lhs_pf1.z =lhs(lhs_vert + 2, (threadIdx.y/4+k+8)); lhs_pf2.x =lhs(lhs_vert + 0, (threadIdx.y/4+k+16)); lhs_pf2.y =lhs(lhs_vert + 1, (threadIdx.y/4+k+16)); lhs_pf2.z =lhs(lhs_vert + 2, (threadIdx.y/4+k+16)); } else if ((threadIdx.y/4+k+8) < k_size) { lhs_pf0.x =lhs(lhs_vert + 0, (threadIdx.y/4+k)); lhs_pf0.y =lhs(lhs_vert + 1, (threadIdx.y/4+k)); lhs_pf0.z =lhs(lhs_vert + 2, (threadIdx.y/4+k)); lhs_pf1.x =lhs(lhs_vert + 0, (threadIdx.y/4+k+8)); lhs_pf1.y =lhs(lhs_vert + 1, (threadIdx.y/4+k+8)); lhs_pf1.z =lhs(lhs_vert + 2, (threadIdx.y/4+k+8)); } else if ((threadIdx.y/4+k) < k_size) { lhs_pf0.x =lhs(lhs_vert + 0, (threadIdx.y/4+k)); lhs_pf0.y =lhs(lhs_vert + 1, (threadIdx.y/4+k)); lhs_pf0.z =lhs(lhs_vert + 2, (threadIdx.y/4+k)); } } else if (lhs_vert + 1 < m_size) { if ((threadIdx.y/4+k+24) < k_size) { lhs_pf0.x =lhs(lhs_vert + 0, (threadIdx.y/4+k)); lhs_pf0.y =lhs(lhs_vert + 1, (threadIdx.y/4+k)); lhs_pf1.x =lhs(lhs_vert + 0, (threadIdx.y/4+k+8)); lhs_pf1.y =lhs(lhs_vert + 1, (threadIdx.y/4+k+8)); lhs_pf2.x =lhs(lhs_vert + 0, (threadIdx.y/4+k+16)); lhs_pf2.y =lhs(lhs_vert + 1, (threadIdx.y/4+k+16)); lhs_pf3.x =lhs(lhs_vert + 0, (threadIdx.y/4+k+24)); lhs_pf3.y =lhs(lhs_vert + 1, (threadIdx.y/4+k+24)); } else if ((threadIdx.y/4+k+16) < k_size) { lhs_pf0.x =lhs(lhs_vert + 0, (threadIdx.y/4+k)); lhs_pf0.y =lhs(lhs_vert + 1, (threadIdx.y/4+k)); lhs_pf1.x =lhs(lhs_vert + 0, (threadIdx.y/4+k+8)); lhs_pf1.y =lhs(lhs_vert + 1, (threadIdx.y/4+k+8)); lhs_pf2.x =lhs(lhs_vert + 0, (threadIdx.y/4+k+16)); lhs_pf2.y =lhs(lhs_vert + 1, (threadIdx.y/4+k+16)); } else if ((threadIdx.y/4+k+8) < k_size) { lhs_pf0.x =lhs(lhs_vert + 0, (threadIdx.y/4+k)); lhs_pf0.y =lhs(lhs_vert + 1, (threadIdx.y/4+k)); lhs_pf1.x =lhs(lhs_vert + 0, (threadIdx.y/4+k+8)); lhs_pf1.y =lhs(lhs_vert + 1, (threadIdx.y/4+k+8)); } else if ((threadIdx.y/4+k) < k_size) { lhs_pf0.x =lhs(lhs_vert + 0, (threadIdx.y/4+k)); lhs_pf0.y =lhs(lhs_vert + 1, (threadIdx.y/4+k)); } } else if (lhs_vert < m_size) { if ((threadIdx.y/4+k+24) < k_size) { lhs_pf0.x =lhs(lhs_vert + 0, (threadIdx.y/4+k)); lhs_pf1.x =lhs(lhs_vert + 0, (threadIdx.y/4+k+8)); lhs_pf2.x =lhs(lhs_vert + 0, (threadIdx.y/4+k+16)); lhs_pf3.x =lhs(lhs_vert + 0, (threadIdx.y/4+k+24)); } else if ((threadIdx.y/4+k+16) < k_size) { lhs_pf0.x =lhs(lhs_vert + 0, (threadIdx.y/4+k)); lhs_pf1.x =lhs(lhs_vert + 0, (threadIdx.y/4+k+8)); lhs_pf2.x =lhs(lhs_vert + 0, (threadIdx.y/4+k+16)); } else if ((threadIdx.y/4+k+8) < k_size) { lhs_pf0.x =lhs(lhs_vert + 0, (threadIdx.y/4+k)); lhs_pf1.x =lhs(lhs_vert + 0, (threadIdx.y/4+k+8)); } else if ((threadIdx.y/4+k) < k_size) { lhs_pf0.x =lhs(lhs_vert + 0, (threadIdx.y/4+k)); } } } __syncthreads(); Index rhs_vert = k+threadIdx.x*4; Index rhs_horiz0 = threadIdx.y*2+base_n; Index rhs_horiz1 = threadIdx.y*2+1+base_n; if (!CHECK_RHS_BOUNDARY) { if ((rhs_vert + 3) < k_size) { // just CHECK_RHS_BOUNDARY rhs_pf0 = rhs.template loadPacket(rhs_vert, rhs_horiz0); rhs_pf1 = rhs.template loadPacket(rhs_vert, rhs_horiz1); } else if (rhs_vert + 2 < k_size) { // just CHECK_RHS_BOUNDARY rhs_pf0.x = rhs(rhs_vert, rhs_horiz0); rhs_pf0.y = rhs(rhs_vert + 1, rhs_horiz0); rhs_pf0.z = rhs(rhs_vert + 2, rhs_horiz0); rhs_pf1.x = rhs(rhs_vert, rhs_horiz1); rhs_pf1.y = rhs(rhs_vert + 1, rhs_horiz1); rhs_pf1.z = rhs(rhs_vert + 2, rhs_horiz1); } else if (rhs_vert + 1 < k_size) { rhs_pf0.x = rhs(rhs_vert, rhs_horiz0); rhs_pf0.y = rhs(rhs_vert + 1, rhs_horiz0); rhs_pf1.x = rhs(rhs_vert, rhs_horiz1); rhs_pf1.y = rhs(rhs_vert + 1, rhs_horiz1); } else if (rhs_vert < k_size) { rhs_pf0.x = rhs(rhs_vert, rhs_horiz0); rhs_pf1.x = rhs(rhs_vert, rhs_horiz1); } } else { if (rhs_horiz1 < n_size) { if ((rhs_vert + 3) < k_size) { // just CHECK_RHS_BOUNDARY rhs_pf0 = rhs.template loadPacket(rhs_vert, rhs_horiz0); rhs_pf1 = rhs.template loadPacket(rhs_vert, rhs_horiz1); } else if (rhs_vert + 2 < k_size) { // just CHECK_RHS_BOUNDARY rhs_pf0.x = rhs(rhs_vert, rhs_horiz0); rhs_pf0.y = rhs(rhs_vert + 1, rhs_horiz0); rhs_pf0.z = rhs(rhs_vert + 2, rhs_horiz0); rhs_pf1.x = rhs(rhs_vert, rhs_horiz1); rhs_pf1.y = rhs(rhs_vert + 1, rhs_horiz1); rhs_pf1.z = rhs(rhs_vert + 2, rhs_horiz1); } else if (k+threadIdx.x*4 + 1 < k_size) { rhs_pf0.x = rhs(rhs_vert, rhs_horiz0); rhs_pf0.y = rhs(rhs_vert + 1, rhs_horiz0); rhs_pf1.x = rhs(rhs_vert, rhs_horiz1); rhs_pf1.y = rhs(rhs_vert + 1, rhs_horiz1); } else if (k+threadIdx.x*4 < k_size) { rhs_pf0.x = rhs(rhs_vert, rhs_horiz0); rhs_pf1.x = rhs(rhs_vert, rhs_horiz1); } } else if (rhs_horiz0 < n_size) { if ((rhs_vert + 3) < k_size) { // just CHECK_RHS_BOUNDARY rhs_pf0 = rhs.template loadPacket(rhs_vert, rhs_horiz0); } else if ((rhs_vert + 2) < k_size) { // just CHECK_RHS_BOUNDARY rhs_pf0.x = rhs(rhs_vert, rhs_horiz0); rhs_pf0.y = rhs(rhs_vert + 1, rhs_horiz0); rhs_pf0.z = rhs(rhs_vert + 2, rhs_horiz0); } else if ((rhs_vert + 1) < k_size) { rhs_pf0.x = rhs(rhs_vert, rhs_horiz0); rhs_pf0.y = rhs(rhs_vert + 1, rhs_horiz0); } else if (rhs_vert < k_size) { rhs_pf0.x = rhs(rhs_vert, rhs_horiz0); } } } __syncthreads(); // Loaded. Do computation // Row 0 -> times (0, 4, 8, .. 28) for features 0, 1. // Row 1 -> times (0, 4, 8, .. 28) for features 2, 3. // .. // Row 31 -> times (0, 4, 8, .. 28) for features 62, 63 rhs_shmem2[threadIdx.y][threadIdx.x] = make_float2(rhs_pf0.x, rhs_pf1.x); // Row 32 -> times (1, 5, 9, .. 29) for features 0, 1. // Row 33 -> times (1, 5, 9, .. 29) for features 2, 3. // .. rhs_shmem2[threadIdx.y+32][threadIdx.x] = make_float2(rhs_pf0.y, rhs_pf1.y); // Row 64 -> times (2, 6, 10, .. 30) for features 0, 1. // Row 65 -> times (2, 6, 10, .. 30) for features 2, 3. rhs_shmem2[threadIdx.y+64][threadIdx.x] = make_float2(rhs_pf0.z, rhs_pf1.z); // Row 96 -> times (3, 7, 11, .. 31) for features 0, 1. // Row 97 -> times (3, 7, 11, .. 31) for features 2, 3. rhs_shmem2[threadIdx.y+96][threadIdx.x] = make_float2(rhs_pf0.w, rhs_pf1.w); // LHS. // Row 0 (time 0) -> features (0, 1), (4, 5), .. (28, 29), (32, 33), .. (60, 61) .. (124, 125) // Row 1 (time 1) -> features (0, 1), (4, 5), .. (28, 29), (32, 33), .. (60, 61) .. (124, 125) // ... // Row 8 (time 0) -> features (2, 3), (6, 7), .. (30, 31), (34, 35), .. (62, 63) .. (126, 127) // Row 15 (time 7) -> features (2, 3), (6, 7), .. (30, 31), (34, 35), .. (62, 63) .. (126, 127) #define add_vals(a_feat1, a_feat2, f1, f2, f3, f4)\ results[0].x += a_feat1.x * f1.x;\ results[1].x += a_feat1.x * f1.y;\ results[2].x += a_feat1.x * f2.x;\ results[3].x += a_feat1.x * f2.y;\ results[4].x += a_feat1.x * f3.x;\ results[5].x += a_feat1.x * f3.y;\ results[6].x += a_feat1.x * f4.x;\ results[7].x += a_feat1.x * f4.y;\ \ results[0].y += a_feat1.y * f1.x;\ results[1].y += a_feat1.y * f1.y;\ results[2].y += a_feat1.y * f2.x;\ results[3].y += a_feat1.y * f2.y;\ results[4].y += a_feat1.y * f3.x;\ results[5].y += a_feat1.y * f3.y;\ results[6].y += a_feat1.y * f4.x;\ results[7].y += a_feat1.y * f4.y;\ \ results[0].z += a_feat2.x * f1.x;\ results[1].z += a_feat2.x * f1.y;\ results[2].z += a_feat2.x * f2.x;\ results[3].z += a_feat2.x * f2.y;\ results[4].z += a_feat2.x * f3.x;\ results[5].z += a_feat2.x * f3.y;\ results[6].z += a_feat2.x * f4.x;\ results[7].z += a_feat2.x * f4.y;\ \ results[0].w += a_feat2.y * f1.x;\ results[1].w += a_feat2.y * f1.y;\ results[2].w += a_feat2.y * f2.x;\ results[3].w += a_feat2.y * f2.y;\ results[4].w += a_feat2.y * f3.x;\ results[5].w += a_feat2.y * f3.y;\ results[6].w += a_feat2.y * f4.x;\ results[7].w += a_feat2.y * f4.y;\ lhs_shmem2[threadIdx.y/4][threadIdx.x+(threadIdx.y%4)*8] = make_float2(lhs_pf0.x, lhs_pf0.y); lhs_shmem2[threadIdx.y/4+8][threadIdx.x+(threadIdx.y%4)*8] = make_float2(lhs_pf1.x, lhs_pf1.y); lhs_shmem2[threadIdx.y/4+16][threadIdx.x+(threadIdx.y%4)*8] = make_float2(lhs_pf2.x, lhs_pf2.y); lhs_shmem2[threadIdx.y/4+24][threadIdx.x+(threadIdx.y%4)*8] = make_float2(lhs_pf3.x, lhs_pf3.y); lhs_shmem2[threadIdx.y/4 + 32][threadIdx.x+(threadIdx.y%4)*8] = make_float2(lhs_pf0.z, lhs_pf0.w); lhs_shmem2[threadIdx.y/4 + 40][threadIdx.x+(threadIdx.y%4)*8] = make_float2(lhs_pf1.z, lhs_pf1.w); lhs_shmem2[threadIdx.y/4 + 48][threadIdx.x+(threadIdx.y%4)*8] = make_float2(lhs_pf2.z, lhs_pf2.w); lhs_shmem2[threadIdx.y/4 + 56][threadIdx.x+(threadIdx.y%4)*8] = make_float2(lhs_pf3.z, lhs_pf3.w); __syncthreads(); // Do the multiplies. #pragma unroll for (int koff = 0; koff < 32; koff ++) { float2 a3 = lhs_shmem2[koff][threadIdx.x + (threadIdx.y % 4) * 8]; float2 a4 = lhs_shmem2[koff + 32][threadIdx.x + (threadIdx.y % 4) * 8]; // first feature is at (threadIdx.y/4) * 8 last is at start + 8. int start_feature = (threadIdx.y / 4) * 8; float2 br1 = rhs_shmem2[start_feature/2 + (koff % 4) * 32][koff/4]; float2 br2 = rhs_shmem2[start_feature/2 + 1 + (koff % 4) * 32][koff/4]; float2 br3 = rhs_shmem2[start_feature/2 + 2 + (koff % 4) * 32][koff/4]; float2 br4 = rhs_shmem2[start_feature/2 + 3 + (koff % 4) * 32][koff/4]; add_vals(a3, a4, br1, br2, br3, br4) } __syncthreads(); } // end loop over k __syncthreads(); Index horiz_base = (threadIdx.y/4)*8+base_n; if (!CHECK_LHS_BOUNDARY && !CHECK_RHS_BOUNDARY) { for (int i = 0; i < 8; i++) { output(lhs_vert, horiz_base + i) = results[i].x; output(lhs_vert + 1, horiz_base + i) = results[i].y; output(lhs_vert + 2, horiz_base + i) = results[i].z; output(lhs_vert + 3, horiz_base + i) = results[i].w; } } else if (!CHECK_RHS_BOUNDARY) { if (lhs_vert + 3 < m_size) { for (int i = 0; i < 8; i++) { output(lhs_vert, horiz_base + i) = results[i].x; output(lhs_vert + 1, horiz_base + i) = results[i].y; output(lhs_vert + 2, horiz_base + i) = results[i].z; output(lhs_vert + 3, horiz_base + i) = results[i].w; } } else if (lhs_vert + 2 < m_size) { for (int i = 0; i < 8; i++) { output(lhs_vert, horiz_base + i) = results[i].x; output(lhs_vert + 1, horiz_base + i) = results[i].y; output(lhs_vert + 2, horiz_base + i) = results[i].z; } } else if (lhs_vert + 1 < m_size) { for (int i = 0; i < 8; i++) { output(lhs_vert, horiz_base + i) = results[i].x; output(lhs_vert + 1, horiz_base + i) = results[i].y; } } else if (lhs_vert < m_size) { for (int i = 0; i < 8; i++) { output(lhs_vert, horiz_base + i) = results[i].x; } } } else if (!CHECK_LHS_BOUNDARY) { // CHECK BOUNDARY_B for (int i = 0; i < 8; i++) { if (horiz_base + i < n_size) { output(lhs_vert, horiz_base + i) = results[i].x; output(lhs_vert + 1, horiz_base + i) = results[i].y; output(lhs_vert + 2, horiz_base + i) = results[i].z; output(lhs_vert + 3, horiz_base + i) = results[i].w; } } } else { // CHECK both boundaries. for (int i = 0; i < 8; i++) { if (horiz_base + i < n_size) { if (lhs_vert < m_size) output(lhs_vert, horiz_base + i) = results[i].x; if (lhs_vert + 1 < m_size) output(lhs_vert + 1, horiz_base + i) = results[i].y; if (lhs_vert + 2 < m_size) output(lhs_vert + 2, horiz_base + i) = results[i].z; if (lhs_vert + 3 < m_size) output(lhs_vert + 3, horiz_base + i) = results[i].w; } } } } template __global__ void #if defined(EIGEN_HIPCC) __launch_bounds__(256, 1) #else __launch_bounds__(256) #endif EigenFloatContractionKernel(const LhsMapper lhs, const RhsMapper rhs, const OutputMapper output, const Index m_size, const Index n_size, const Index k_size) { __shared__ float2 lhs_shmem[64*32]; __shared__ float2 rhs_shmem[128*8]; typedef float2 LHS_MEM[64][32]; typedef float2 RHS_MEM[128][8]; const Index m_block_idx = blockIdx.x; const Index n_block_idx = blockIdx.y; const Index base_m = 128 * m_block_idx; const Index base_n = 64 * n_block_idx; bool check_rhs = (base_n + 63) >= n_size; bool check_lhs128 = (base_m + 127) >= m_size; if (!check_rhs) { if (!check_lhs128) { // >= 128 rows left EigenFloatContractionKernelInternal( lhs, rhs, output, *((LHS_MEM *) lhs_shmem), *((RHS_MEM *) rhs_shmem), m_size, n_size, k_size, base_m, base_n); } else { EigenFloatContractionKernelInternal( lhs, rhs, output, *((LHS_MEM *) lhs_shmem), *((RHS_MEM *) rhs_shmem), m_size, n_size, k_size, base_m, base_n); } } else { if (!check_lhs128) { // >= 128 rows left EigenFloatContractionKernelInternal( lhs, rhs, output, *((LHS_MEM *) lhs_shmem), *((RHS_MEM *) rhs_shmem), m_size, n_size, k_size, base_m, base_n); } else { EigenFloatContractionKernelInternal( lhs, rhs, output, *((LHS_MEM *) lhs_shmem), *((RHS_MEM *) rhs_shmem), m_size, n_size, k_size, base_m, base_n); } } } template __global__ void #if defined(EIGEN_HIPCC) __launch_bounds__(256, 1) #else __launch_bounds__(256) #endif EigenFloatContractionKernel16x16(const LhsMapper lhs, const RhsMapper rhs, const OutputMapper output, const Index m_size, const Index n_size, const Index k_size) { __shared__ float2 lhs_shmem[32][16]; __shared__ float2 rhs_shmem[64][8]; const Index m_block_idx = blockIdx.x; const Index n_block_idx = blockIdx.y; const Index base_m = 64 * m_block_idx; const Index base_n = 64 * n_block_idx; if (base_m + 63 < m_size) { if (base_n + 63 < n_size) { EigenFloatContractionKernelInternal16x16(lhs, rhs, output, lhs_shmem, rhs_shmem, m_size, n_size, k_size, base_m, base_n); } else { EigenFloatContractionKernelInternal16x16(lhs, rhs, output, lhs_shmem, rhs_shmem, m_size, n_size, k_size, base_m, base_n); } } else { if (base_n + 63 < n_size) { EigenFloatContractionKernelInternal16x16(lhs, rhs, output, lhs_shmem, rhs_shmem, m_size, n_size, k_size, base_m, base_n); } else { EigenFloatContractionKernelInternal16x16(lhs, rhs, output, lhs_shmem, rhs_shmem, m_size, n_size, k_size, base_m, base_n); } } } template struct TensorEvaluator, GpuDevice> : public TensorContractionEvaluatorBase, GpuDevice> > { typedef GpuDevice Device; typedef TensorEvaluator, Device> Self; typedef TensorContractionEvaluatorBase Base; typedef TensorContractionOp XprType; typedef typename internal::remove_const::type Scalar; typedef typename XprType::Index Index; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename PacketType::type PacketReturnType; enum { Layout = TensorEvaluator::Layout, }; // Most of the code is assuming that both input tensors are ColMajor. If the // inputs are RowMajor, we will "cheat" by swapping the LHS and RHS: // If we want to compute A * B = C, where A is LHS and B is RHS, the code // will pretend B is LHS and A is RHS. typedef typename internal::conditional< static_cast(Layout) == static_cast(ColMajor), LeftArgType, RightArgType>::type EvalLeftArgType; typedef typename internal::conditional< static_cast(Layout) == static_cast(ColMajor), RightArgType, LeftArgType>::type EvalRightArgType; static const int LDims = internal::array_size::Dimensions>::value; static const int RDims = internal::array_size::Dimensions>::value; static const int ContractDims = internal::array_size::value; typedef array left_dim_mapper_t; typedef array right_dim_mapper_t; typedef array contract_t; typedef array left_nocontract_t; typedef array right_nocontract_t; static const int NumDims = LDims + RDims - 2 * ContractDims; typedef DSizes Dimensions; // typedefs needed in evalTo typedef typename internal::remove_const::type LhsScalar; typedef typename internal::remove_const::type RhsScalar; typedef TensorEvaluator LeftEvaluator; typedef TensorEvaluator RightEvaluator; typedef typename LeftEvaluator::Dimensions LeftDimensions; typedef typename RightEvaluator::Dimensions RightDimensions; TensorEvaluator(const XprType& op, const Device& device) : Base(op, device) { EIGEN_STATIC_ASSERT( (internal::is_same::value), GPU_TENSOR_CONTRACTION_DOES_NOT_SUPPORT_OUTPUT_KERNELS); } // We need to redefine this method to make nvcc happy EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(Scalar* data) { this->m_leftImpl.evalSubExprsIfNeeded(NULL); this->m_rightImpl.evalSubExprsIfNeeded(NULL); if (data) { evalTo(data); return false; } else { this->m_result = static_cast(this->m_device.allocate(this->dimensions().TotalSize() * sizeof(Scalar))); evalTo(this->m_result); return true; } } void evalTo(Scalar* buffer) const { if (this->m_lhs_inner_dim_contiguous) { if (this->m_rhs_inner_dim_contiguous) { if (this->m_rhs_inner_dim_reordered) { evalTyped(buffer); } else { evalTyped(buffer); } } else { if (this->m_rhs_inner_dim_reordered) { evalTyped(buffer); } else { evalTyped(buffer); } } } else { if (this->m_rhs_inner_dim_contiguous) { if (this->m_rhs_inner_dim_reordered) { evalTyped(buffer); } else { evalTyped(buffer); } } else { if (this->m_rhs_inner_dim_reordered) { evalTyped(buffer); } else { evalTyped(buffer); } } } } template struct LaunchKernels { static void Run(const LhsMapper& lhs, const RhsMapper& rhs, const OutputMapper& output, Index m, Index n, Index k, const GpuDevice& device) { const Index m_blocks = (m + 63) / 64; const Index n_blocks = (n + 63) / 64; const dim3 num_blocks(m_blocks, n_blocks, 1); const dim3 block_size(8, 8, 8); LAUNCH_GPU_KERNEL((EigenContractionKernel), num_blocks, block_size, 0, device, lhs, rhs, output, m, n, k); } }; template struct LaunchKernels { static void Run(const LhsMapper& lhs, const RhsMapper& rhs, const OutputMapper& output, Index m, Index n, Index k, const GpuDevice& device) { if (m < 768 || n < 768) { const Index m_blocks = (m + 63) / 64; const Index n_blocks = (n + 63) / 64; const dim3 num_blocks(m_blocks, n_blocks, 1); const dim3 block_size(16, 16, 1); LAUNCH_GPU_KERNEL((EigenFloatContractionKernel16x16), num_blocks, block_size, 0, device, lhs, rhs, output, m, n, k); } else { const Index m_blocks = (m + 127) / 128; const Index n_blocks = (n + 63) / 64; const dim3 num_blocks(m_blocks, n_blocks, 1); const dim3 block_size(8, 32, 1); LAUNCH_GPU_KERNEL((EigenFloatContractionKernel), num_blocks, block_size, 0, device, lhs, rhs, output, m, n, k); } } }; template void evalTyped(Scalar* buffer) const { // columns in left side, rows in right side const Index k = this->m_k_size; EIGEN_UNUSED_VARIABLE(k) // rows in left side const Index m = this->m_i_size; // columns in right side const Index n = this->m_j_size; // zero out the result buffer (which must be of size at least m * n * sizeof(Scalar) this->m_device.memset(buffer, 0, m * n * sizeof(Scalar)); typedef internal::TensorContractionInputMapper LhsMapper; typedef internal::TensorContractionInputMapper RhsMapper; typedef internal::blas_data_mapper OutputMapper; // initialize data mappers LhsMapper lhs(this->m_leftImpl, this->m_left_nocontract_strides, this->m_i_strides, this->m_left_contracting_strides, this->m_k_strides); RhsMapper rhs(this->m_rightImpl, this->m_right_nocontract_strides, this->m_j_strides, this->m_right_contracting_strides, this->m_k_strides); OutputMapper output(buffer, m); #if defined(EIGEN_USE_HIP) setGpuSharedMemConfig(hipSharedMemBankSizeEightByte); #else setGpuSharedMemConfig(cudaSharedMemBankSizeEightByte); #endif LaunchKernels::Run(lhs, rhs, output, m, n, k, this->m_device); } }; } // end namespace Eigen #endif // EIGEN_USE_GPU and EIGEN_GPUCC #endif // EIGEN_CXX11_TENSOR_TENSOR_CONTRACTION_GPU_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorConvolutionSycl.h0000644000176200001440000006560714562417221027552 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Mehdi Goli Codeplay Software Ltd. // Ralph Potter Codeplay Software Ltd. // Luke Iwanski Codeplay Software Ltd. // Contact: // Copyright (C) 2016 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_CONVOLUTION_SYCL_H #define EIGEN_CXX11_TENSOR_TENSOR_CONVOLUTION_SYCL_H namespace Eigen { /** \class TensorConvolution * \ingroup CXX11_Tensor_Module * * \brief Tensor convolution class. * * */ enum class convolution_type { CONV1D, CONV2D, CONV3D }; template struct EigenConvolutionKernel; template struct EigenConvolutionKernel { typedef cl::sycl::accessor Local_accessor; Local_accessor local_acc; Evaluator device_evaluator; Kernel_accessor kernel_filter; Buffer_accessor buffer_acc; internal::IndexMapper indexMapper; const size_t kernelSize; const cl::sycl::range<2> input_range; EigenConvolutionKernel(Local_accessor local_acc_, Evaluator device_evaluator_, Kernel_accessor kernel_filter_, Buffer_accessor buffer_acc_, internal::IndexMapper indexMapper_, const size_t kernelSize_, const cl::sycl::range<2> input_range_) : local_acc(local_acc_), device_evaluator(device_evaluator_), kernel_filter(kernel_filter_), buffer_acc(buffer_acc_), indexMapper(indexMapper_), kernelSize(kernelSize_), input_range(input_range_) {} template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool boundary_check(const BooleanDim2 boolean_check) { return (boolean_check[0] && boolean_check[1]); } void operator()(cl::sycl::nd_item<2> itemID) { auto buffer_ptr = buffer_acc.get_pointer(); auto kernel_ptr = kernel_filter.get_pointer(); // the required row to be calculated for the for each plane in shered memory const size_t num_input = (itemID.get_local_range()[0] + kernelSize - 1); const size_t plane_kernel_offset = itemID.get_local_id(1) * num_input; const size_t input_offset = itemID.get_group(0) * itemID.get_local_range()[0]; const size_t plane_tensor_offset = indexMapper.mapGpuInputPlaneToTensorInputOffset(itemID.get_global_id(1)); /// fill the shared memory for (size_t i = itemID.get_local_id(0); i < num_input; i += itemID.get_local_range()[0]) { const size_t local_index = i + plane_kernel_offset; const size_t tensor_index = plane_tensor_offset + indexMapper.mapGpuInputKernelToTensorInputOffset(i + input_offset); local_acc[local_index] = (((i + input_offset) < (input_range[0] + kernelSize - 1)) && itemID.get_global_id(1) < input_range[1]) ? device_evaluator.coeff(tensor_index) : CoeffReturnType(0); } itemID.barrier(cl::sycl::access::fence_space::local_space); // calculate the convolution // output start x const size_t first_output_start = itemID.get_group(0) * (itemID.get_local_range()[0]); if (boundary_check(itemID.get_global_id() < input_range)) { CoeffReturnType result = static_cast(0); const size_t index = plane_kernel_offset + itemID.get_local_id(0); for (size_t k = 0; k < kernelSize; ++k) { result += (local_acc[k + index] * kernel_ptr[k]); } const size_t tensor_index = indexMapper.mapGpuOutputPlaneToTensorOutputOffset(itemID.get_global_id(1)) + indexMapper.mapGpuOutputKernelToTensorOutputOffset(itemID.get_local_id(0) + first_output_start); buffer_ptr[tensor_index] = result; } } }; template struct EigenConvolutionKernel { typedef cl::sycl::accessor Local_accessor; Local_accessor local_acc; Evaluator device_evaluator; Kernel_accessor kernel_filter; Buffer_accessor buffer_acc; internal::IndexMapper indexMapper; const cl::sycl::range<2> kernel_size; const cl::sycl::range<3> input_range; EigenConvolutionKernel(Local_accessor local_acc_, Evaluator device_evaluator_, Kernel_accessor kernel_filter_, Buffer_accessor buffer_acc_, internal::IndexMapper indexMapper_, const cl::sycl::range<2> kernel_size_, const cl::sycl::range<3> input_range_) : local_acc(local_acc_), device_evaluator(device_evaluator_), kernel_filter(kernel_filter_), buffer_acc(buffer_acc_), indexMapper(indexMapper_), kernel_size(kernel_size_), input_range(input_range_) {} template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool boundary_check(const BooleanDim3 boolean_check) { return (boolean_check[0] && boolean_check[1] && boolean_check[2]); } void operator()(cl::sycl::nd_item<3> itemID) { auto buffer_ptr = buffer_acc.get_pointer(); auto kernel_ptr = kernel_filter.get_pointer(); // the required row to be calculated for the for each plane in shered memory const auto num_input = cl::sycl::range<2>{ (cl::sycl::range<2>(itemID.get_local_range()[0], itemID.get_local_range()[1]) + kernel_size - 1)}; const size_t plane_input_offset = indexMapper.mapGpuInputPlaneToTensorInputOffset(itemID.get_global_id(2)); const size_t plane_kernel_offset = itemID.get_local_id(2) * num_input[1]; const auto input_offset = cl::sycl::range<2>{itemID.get_group(0) * itemID.get_local_range()[0], itemID.get_group(1) * itemID.get_local_range()[1]}; // fill the local memory bool in_range_dim2 = itemID.get_global_id(2) < input_range[2]; for (size_t j = itemID.get_local_id(1); j < num_input[1]; j += itemID.get_local_range()[1]) { const size_t local_input_offset = num_input[0] * (j + plane_kernel_offset); bool in_range_dim1 = ((j + input_offset[1]) < (input_range[1] + kernel_size[1] - 1)); for (size_t i = itemID.get_local_id(0); i < num_input[0]; i += itemID.get_local_range()[0]) { const size_t local_index = i + local_input_offset; const size_t tensor_index = plane_input_offset + indexMapper.mapGpuInputKernelToTensorInputOffset( i + input_offset[0], j + input_offset[1]); local_acc[local_index] = (((i + input_offset[0]) < (input_range[0] + kernel_size[0] - 1)) && in_range_dim1 && in_range_dim2) ? device_evaluator.coeff(tensor_index) : CoeffReturnType(0); } } itemID.barrier(cl::sycl::access::fence_space::local_space); // output offset start for each thread const auto output_offset = cl::sycl::range<2>{itemID.get_group(0) * itemID.get_local_range()[0], itemID.get_group(1) * itemID.get_local_range()[1]}; if (boundary_check(itemID.get_global_id() < input_range)) { CoeffReturnType result = static_cast(0); for (size_t j = 0; j < kernel_size[1]; j++) { size_t kernel_offset = kernel_size[0] * j; const size_t index = (num_input[0] * (plane_kernel_offset + j + itemID.get_local_id(1))) + itemID.get_local_id(0); for (size_t i = 0; i < kernel_size[0]; i++) { result += (local_acc[i + index] * kernel_ptr[i + kernel_offset]); } } const size_t tensor_index = indexMapper.mapGpuOutputPlaneToTensorOutputOffset(itemID.get_global_id(2)) + indexMapper.mapGpuOutputKernelToTensorOutputOffset(itemID.get_local_id(0) + output_offset[0], itemID.get_local_id(1) + output_offset[1]); buffer_ptr[tensor_index] = result; } } }; template struct EigenConvolutionKernel { typedef cl::sycl::accessor Local_accessor; Local_accessor local_acc; Evaluator device_evaluator; Kernel_accessor kernel_filter; Buffer_accessor buffer_acc; internal::IndexMapper indexMapper; const cl::sycl::range<3> kernel_size; const cl::sycl::range<3> input_range; const size_t numP; EigenConvolutionKernel(Local_accessor local_acc_, Evaluator device_evaluator_, Kernel_accessor kernel_filter_, Buffer_accessor buffer_acc_, internal::IndexMapper indexMapper_, const cl::sycl::range<3> kernel_size_, const cl::sycl::range<3> input_range_, const size_t numP_) : local_acc(local_acc_), device_evaluator(device_evaluator_), kernel_filter(kernel_filter_), buffer_acc(buffer_acc_), indexMapper(indexMapper_), kernel_size(kernel_size_), input_range(input_range_), numP(numP_) {} template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool boundary_check(const BooleanDim3 boolean_check) { return (boolean_check[0] && boolean_check[1] && boolean_check[2]); } void operator()(cl::sycl::nd_item<3> itemID) { auto buffer_ptr = buffer_acc.get_pointer(); auto kernel_ptr = kernel_filter.get_pointer(); const auto num_input = cl::sycl::range<3>{itemID.get_local_range() + kernel_size - 1}; const auto input_offset = cl::sycl::range<3>{itemID.get_group().get_id() * itemID.get_local_range()}; const auto output_offset = cl::sycl::range<3>{itemID.get_group().get_id() * itemID.get_local_range() + itemID.get_local_id()}; for (size_t p = 0; p < numP; p++) { /// fill the shared memory const size_t plane_input_offset = indexMapper.mapGpuInputPlaneToTensorInputOffset(p); for (size_t k = itemID.get_local_id(2); k < num_input[2]; k += itemID.get_local_range()[2]) { size_t local_index_dim2 = num_input[0] * num_input[1] * k; bool cond_k_dim = (k + input_offset[2] < (input_range[2] + kernel_size[2] - 1)); for (size_t j = itemID.get_local_id(1); j < num_input[1]; j += itemID.get_local_range()[1]) { bool cond_j_dim = cond_k_dim && (j + input_offset[1] < (input_range[1] + kernel_size[1] - 1)); size_t local_index_dim1 = (num_input[0] * j) + local_index_dim2; for (size_t i = itemID.get_local_id(0); i < num_input[0]; i += itemID.get_local_range()[0]) { bool conds = cond_j_dim && (i + input_offset[0] < (input_range[0] + kernel_size[0] - 1)); const size_t local_index = local_index_dim1 + i; const size_t tensor_index = plane_input_offset + indexMapper.mapGpuInputKernelToTensorInputOffset( i + input_offset[0], j + input_offset[1], k + input_offset[2]); local_acc[local_index] = conds ? device_evaluator.coeff(tensor_index) : CoeffReturnType(0); } } } itemID.barrier(cl::sycl::access::fence_space::local_space); // calculate the convolution if (boundary_check(itemID.get_global_id() < input_range)) { CoeffReturnType result = static_cast(0); for (size_t k = 0; k < kernel_size[2]; k++) { for (size_t j = 0; j < kernel_size[1]; j++) { for (size_t i = 0; i < kernel_size[0]; i++) { const size_t kernel_index = i + kernel_size[0] * (j + kernel_size[1] * k); const size_t local_index = ((i + itemID.get_local_id(0)) + num_input[0] * ((j + itemID.get_local_id(1)) + num_input[1] * (k + itemID.get_local_id(2)))); result += (local_acc[local_index] * kernel_ptr[kernel_index]); } } } const size_t tensor_index = indexMapper.mapGpuOutputPlaneToTensorOutputOffset(p) + indexMapper.mapGpuOutputKernelToTensorOutputOffset(output_offset[0], output_offset[1], output_offset[2]); buffer_ptr[tensor_index] = result; } itemID.barrier(cl::sycl::access::fence_space::local_space); } } }; template struct TensorEvaluator, Eigen::SyclDevice> { typedef TensorConvolutionOp XprType; static const int NumDims = internal::array_size::Dimensions>::value; static const int NumKernelDims = internal::array_size::value; typedef typename XprType::Index Index; typedef DSizes Dimensions; typedef typename TensorEvaluator::Dimensions KernelDimensions; typedef const Eigen::SyclDevice Device; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename PacketType::type PacketReturnType; typedef typename InputArgType::Scalar Scalar; static const int PacketSize = PacketType::size; typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; typedef StorageMemory KernelStorage; enum { IsAligned = TensorEvaluator::IsAligned & TensorEvaluator::IsAligned, PacketAccess = false, BlockAccess = false, PreferBlockAccess = false, Layout = TensorEvaluator::Layout, CoordAccess = false, // to be implemented RawAccess = false }; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockNotImplemented TensorBlock; //===--------------------------------------------------------------------===// TensorEvaluator(const XprType &op, const Eigen::SyclDevice &device) : m_inputImpl(op.inputExpression(), device), m_kernelArg(op.kernelExpression()), m_kernelImpl(op.kernelExpression(), device), m_indices(op.indices()), m_buf(NULL), m_kernel(NULL), m_local_kernel(false), m_device(device) { EIGEN_STATIC_ASSERT((static_cast(TensorEvaluator::Layout) == static_cast(TensorEvaluator::Layout)), YOU_MADE_A_PROGRAMMING_MISTAKE); const typename TensorEvaluator::Dimensions &input_dims = m_inputImpl.dimensions(); const typename TensorEvaluator::Dimensions &kernel_dims = m_kernelImpl.dimensions(); m_dimensions = m_inputImpl.dimensions(); for (int i = 0; i < NumKernelDims; ++i) { const Index index = op.indices()[i]; const Index input_dim = input_dims[index]; const Index kernel_dim = kernel_dims[i]; const Index result_dim = input_dim - kernel_dim + 1; m_dimensions[index] = result_dim; } } EIGEN_DEVICE_FUNC const Dimensions &dimensions() const { return m_dimensions; } EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(EvaluatorPointerType data) { preloadKernel(); m_inputImpl.evalSubExprsIfNeeded(NULL); if (data) { executeEval(data); return false; } else { m_buf = (EvaluatorPointerType)m_device.get( (Scalar *)m_device.allocate_temp(dimensions().TotalSize() * sizeof(Scalar))); executeEval(m_buf); return true; } } EIGEN_STRONG_INLINE void cleanup() { m_inputImpl.cleanup(); if (m_buf) { m_device.deallocate_temp(m_buf); m_buf = NULL; } if (m_local_kernel) { m_device.deallocate_temp(m_kernel); m_local_kernel = false; } m_kernel = NULL; } /// used by sycl in order to build the sycl buffer EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Device &device() const { return m_device; } /// used by sycl in order to build the sycl buffer EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EvaluatorPointerType data() const { return m_buf; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void preloadKernel() { // Don't make a local copy of the kernel unless we have to (i.e. it's an // expression that needs to be evaluated) typename KernelStorage::Type in_place = m_kernelImpl.data(); if (in_place) { m_kernel = in_place; m_local_kernel = false; } else { ptrdiff_t kernel_sz = m_kernelImpl.dimensions().TotalSize() * sizeof(Scalar); EvaluatorPointerType local = (EvaluatorPointerType)m_device.get((Scalar *)m_device.allocate_temp(kernel_sz)); typedef TensorEvalToOp EvalTo; EvalTo evalToTmp(m_device.get(local), m_kernelArg); const bool PacketAccess = internal::IsVectorizable::value; internal::TensorExecutor::run(evalToTmp, m_device); m_kernel = local; m_local_kernel = true; } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void executeEval(EvaluatorPointerType data) const { typedef TensorEvaluator InputEvaluator; typedef typename InputEvaluator::Dimensions InputDims; switch (NumKernelDims) { case 1: { const size_t numX = dimensions()[m_indices[0]]; const size_t numP = dimensions().TotalSize() / numX; const auto input_dim = std::array{numX, numP}; auto global_range = cl::sycl::range<2>{}; auto local_range = cl::sycl::range<2>{}; const size_t kernel_size = m_kernelImpl.dimensions().TotalSize(); m_device.parallel_for_setup(input_dim, global_range, local_range); const size_t local_memory_size = (local_range[0] + kernel_size - 1) * (local_range[1]); gpu_assert(static_cast(local_memory_size) <= m_device.sharedMemPerBlock()); const array indices{{m_indices[0]}}; const array kernel_dims{{m_kernelImpl.dimensions()[0]}}; internal::IndexMapper indexMapper(m_inputImpl.dimensions(), kernel_dims, indices); typedef EigenConvolutionKernel ConvKernel; m_device.template binary_kernel_launcher( m_inputImpl, m_kernel, data, cl::sycl::nd_range<2>(global_range, local_range), local_memory_size, indexMapper, kernel_size, cl::sycl::range<2>(input_dim[0], input_dim[1])); break; } case 2: { auto kernel_index = std::array{static_cast(Layout) == static_cast(ColMajor) ? 0 : 1, static_cast(Layout) == static_cast(ColMajor) ? 1 : 0}; auto kernel_size = cl::sycl::range<2>{(size_t)m_kernelImpl.dimensions()[kernel_index[0]], (size_t)m_kernelImpl.dimensions()[kernel_index[1]]}; const size_t numX = dimensions()[m_indices[kernel_index[0]]]; const size_t numY = dimensions()[m_indices[kernel_index[1]]]; const size_t numP = dimensions().TotalSize() / (numX * numY); auto input_dim = std::array{numX, numY, numP}; auto global_range = cl::sycl::range<3>{}; auto local_range = cl::sycl::range<3>{}; m_device.parallel_for_setup(input_dim, global_range, local_range); const size_t local_memory_size = (local_range[0] + kernel_size[0] - 1) * (local_range[1] + kernel_size[1] - 1) * local_range[2]; gpu_assert(static_cast(local_memory_size) <= m_device.sharedMemPerBlock()); const array indices{{m_indices[kernel_index[0]], m_indices[kernel_index[1]]}}; const array kernel_dims{ {m_kernelImpl.dimensions()[kernel_index[0]], m_kernelImpl.dimensions()[kernel_index[1]]}}; internal::IndexMapper indexMapper(m_inputImpl.dimensions(), kernel_dims, indices); typedef EigenConvolutionKernel ConvKernel; m_device.template binary_kernel_launcher( m_inputImpl, m_kernel, data, cl::sycl::nd_range<3>(global_range, local_range), local_memory_size, indexMapper, kernel_size, cl::sycl::range<3>{input_dim[0], input_dim[1], input_dim[2]}); break; } case 3: { auto kernel_index = std::array{static_cast(Layout) == static_cast(ColMajor) ? 0 : 2, static_cast(Layout) == static_cast(ColMajor) ? 1 : 1, static_cast(Layout) == static_cast(ColMajor) ? 2 : 0}; auto kernel_size = cl::sycl::range<3>{(size_t)m_kernelImpl.dimensions()[kernel_index[0]], (size_t)m_kernelImpl.dimensions()[kernel_index[1]], (size_t)m_kernelImpl.dimensions()[kernel_index[2]]}; const size_t numX = dimensions()[m_indices[kernel_index[0]]]; const size_t numY = dimensions()[m_indices[kernel_index[1]]]; const size_t numZ = dimensions()[m_indices[kernel_index[2]]]; auto input_dim = std::array{numX, numY, numZ}; const size_t numP = dimensions().TotalSize() / (numX * numY * numZ); const array indices{ {m_indices[kernel_index[0]], m_indices[kernel_index[1]], m_indices[kernel_index[2]]}}; const array kernel_dims{{m_kernelImpl.dimensions()[kernel_index[0]], m_kernelImpl.dimensions()[kernel_index[1]], m_kernelImpl.dimensions()[kernel_index[2]]}}; internal::IndexMapper indexMapper(m_inputImpl.dimensions(), kernel_dims, indices); auto global_range = cl::sycl::range<3>{}; auto local_range = cl::sycl::range<3>{}; m_device.parallel_for_setup(input_dim, global_range, local_range); auto local_memory_range = (local_range + kernel_size - 1); const size_t local_memory_size = local_memory_range[0] * local_memory_range[1] * local_memory_range[2]; gpu_assert(static_cast(local_memory_size) <= m_device.sharedMemPerBlock()); typedef EigenConvolutionKernel ConvKernel; m_device.template binary_kernel_launcher( m_inputImpl, m_kernel, data, cl::sycl::nd_range<3>(global_range, local_range), local_memory_size, indexMapper, kernel_size, cl::sycl::range<3>(input_dim[0], input_dim[1], input_dim[2]), numP); break; } default: { EIGEN_STATIC_ASSERT((NumKernelDims >= 1 && NumKernelDims <= 3), THIS_METHOD_IS_ONLY_FOR_OBJECTS_OF_A_SPECIFIC_SIZE); } } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { eigen_assert(m_buf != NULL); eigen_assert(index < m_dimensions.TotalSize()); return m_buf[index]; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(const Index index) const { eigen_assert(m_buf != NULL); eigen_assert(index < m_dimensions.TotalSize()); return internal::ploadt(m_buf + index); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost costPerCoeff(bool vectorized) const { // TODO(rmlarsen): FIXME: For now, this is just a copy of the CPU cost // model. const double kernel_size = m_kernelImpl.dimensions().TotalSize(); // We ignore the use of fused multiply-add. const double convolve_compute_cost = TensorOpCost::AddCost() + TensorOpCost::MulCost(); const double firstIndex_compute_cost = NumDims * (2 * TensorOpCost::AddCost() + 2 * TensorOpCost::MulCost() + TensorOpCost::DivCost()); return TensorOpCost(0, 0, firstIndex_compute_cost, vectorized, PacketSize) + kernel_size * (m_inputImpl.costPerCoeff(vectorized) + m_kernelImpl.costPerCoeff(vectorized) + TensorOpCost(0, 0, convolve_compute_cost, vectorized, PacketSize)); } // binding placeholder accessors to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler &cgh) const { m_kernelImpl.bind(cgh); m_inputImpl.bind(cgh); m_buf.bind(cgh); m_kernel.bind(cgh); } private: // No assignment (copies are needed by the kernels) TensorEvaluator &operator=(const TensorEvaluator &); TensorEvaluator m_inputImpl; KernelArgType m_kernelArg; TensorEvaluator m_kernelImpl; Indices m_indices; Dimensions m_dimensions; EvaluatorPointerType m_buf; typename KernelStorage::Type m_kernel; bool m_local_kernel; const Eigen::SyclDevice EIGEN_DEVICE_REF m_device; }; // namespace Eigen } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_CONVOLUTION_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorReductionCuda.h0000644000176200001440000000033514562417221027114 0ustar liggesusers #if defined(__clang__) || defined(__GNUC__) #warning "Deprecated header file, please either include the main Eigen/CXX11/Tensor header or the respective TensorReductionGpu.h file" #endif #include "TensorReductionGpu.h" RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorGpuHipCudaUndefines.h0000644000176200001440000000236314562417221030220 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // Copyright (C) 2018 Deven Desai // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #if defined(EIGEN_CXX11_TENSOR_GPU_HIP_CUDA_DEFINES_H) #ifndef EIGEN_PERMANENTLY_ENABLE_GPU_HIP_CUDA_DEFINES #undef gpuStream_t #undef gpuDeviceProp_t #undef gpuError_t #undef gpuSuccess #undef gpuErrorNotReady #undef gpuGetDeviceCount #undef gpuGetErrorString #undef gpuGetDeviceProperties #undef gpuStreamDefault #undef gpuGetDevice #undef gpuSetDevice #undef gpuMalloc #undef gpuFree #undef gpuMemsetAsync #undef gpuMemcpyAsync #undef gpuMemcpyDeviceToDevice #undef gpuMemcpyDeviceToHost #undef gpuMemcpyHostToDevice #undef gpuStreamQuery #undef gpuSharedMemConfig #undef gpuDeviceSetSharedMemConfig #undef gpuStreamSynchronize #undef gpuDeviceSynchronize #undef gpuMemcpy #endif // EIGEN_PERMANENTLY_ENABLE_GPU_HIP_CUDA_DEFINES #undef EIGEN_CXX11_TENSOR_GPU_HIP_CUDA_DEFINES_H #endif // EIGEN_CXX11_TENSOR_GPU_HIP_CUDA_DEFINES_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorBroadcasting.h0000644000176200001440000012224614562417221026771 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_BROADCASTING_H #define EIGEN_CXX11_TENSOR_TENSOR_BROADCASTING_H namespace Eigen { /** \class TensorBroadcasting * \ingroup CXX11_Tensor_Module * * \brief Tensor broadcasting class. * * */ namespace internal { template struct traits > : public traits { typedef typename XprType::Scalar Scalar; typedef traits XprTraits; typedef typename XprTraits::StorageKind StorageKind; typedef typename XprTraits::Index Index; typedef typename XprType::Nested Nested; typedef typename remove_reference::type _Nested; static const int NumDimensions = XprTraits::NumDimensions; static const int Layout = XprTraits::Layout; typedef typename XprTraits::PointerType PointerType; }; template struct eval, Eigen::Dense> { typedef const TensorBroadcastingOp EIGEN_DEVICE_REF type; }; template struct nested, 1, typename eval >::type> { typedef TensorBroadcastingOp type; }; template struct is_input_scalar { static const bool value = false; }; template <> struct is_input_scalar > { static const bool value = true; }; #ifndef EIGEN_EMULATE_CXX11_META_H template struct is_input_scalar > { static const bool value = (Sizes::total_size == 1); }; #endif } // end namespace internal template class TensorBroadcastingOp : public TensorBase, ReadOnlyAccessors> { public: typedef typename Eigen::internal::traits::Scalar Scalar; typedef typename Eigen::NumTraits::Real RealScalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename Eigen::internal::nested::type Nested; typedef typename Eigen::internal::traits::StorageKind StorageKind; typedef typename Eigen::internal::traits::Index Index; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorBroadcastingOp(const XprType& expr, const Broadcast& broadcast) : m_xpr(expr), m_broadcast(broadcast) {} EIGEN_DEVICE_FUNC const Broadcast& broadcast() const { return m_broadcast; } EIGEN_DEVICE_FUNC const typename internal::remove_all::type& expression() const { return m_xpr; } protected: typename XprType::Nested m_xpr; const Broadcast m_broadcast; }; // Eval as rvalue template struct TensorEvaluator, Device> { typedef TensorBroadcastingOp XprType; typedef typename XprType::Index Index; static const int NumDims = internal::array_size::Dimensions>::value; typedef DSizes Dimensions; typedef typename XprType::Scalar Scalar; typedef typename TensorEvaluator::Dimensions InputDimensions; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename PacketType::type PacketReturnType; static const int PacketSize = PacketType::size; protected: // all the non-static fields must have the same access control, otherwise the TensorEvaluator wont be standard layout; bool isCopy, nByOne, oneByN; public: typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; enum { IsAligned = TensorEvaluator::IsAligned, PacketAccess = TensorEvaluator::PacketAccess, BlockAccess = TensorEvaluator::BlockAccess, PreferBlockAccess = true, Layout = TensorEvaluator::Layout, RawAccess = false }; typedef typename internal::remove_const::type ScalarNoConst; // We do block based broadcasting using a trick with 2x tensor rank and 0 // strides. See block method implementation for details. typedef DSizes BroadcastDimensions; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockDescriptor TensorBlockDesc; typedef internal::TensorBlockScratchAllocator TensorBlockScratch; typedef typename TensorEvaluator::TensorBlock ArgTensorBlock; typedef typename internal::TensorMaterializedBlock TensorBlock; //===--------------------------------------------------------------------===// EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device) : isCopy(false), nByOne(false), oneByN(false), m_device(device), m_broadcast(op.broadcast()), m_impl(op.expression(), device) { // The broadcasting op doesn't change the rank of the tensor. One can't broadcast a scalar // and store the result in a scalar. Instead one should reshape the scalar into a a N-D // tensor with N >= 1 of 1 element first and then broadcast. EIGEN_STATIC_ASSERT((NumDims > 0), YOU_MADE_A_PROGRAMMING_MISTAKE); const InputDimensions& input_dims = m_impl.dimensions(); isCopy = true; for (int i = 0; i < NumDims; ++i) { eigen_assert(input_dims[i] > 0); m_dimensions[i] = input_dims[i] * m_broadcast[i]; if (m_broadcast[i] != 1) { isCopy = false; } } if (static_cast(Layout) == static_cast(ColMajor)) { m_inputStrides[0] = 1; m_outputStrides[0] = 1; for (int i = 1; i < NumDims; ++i) { m_inputStrides[i] = m_inputStrides[i-1] * input_dims[i-1]; m_outputStrides[i] = m_outputStrides[i-1] * m_dimensions[i-1]; } } else { m_inputStrides[NumDims-1] = 1; m_outputStrides[NumDims-1] = 1; for (int i = NumDims-2; i >= 0; --i) { m_inputStrides[i] = m_inputStrides[i+1] * input_dims[i+1]; m_outputStrides[i] = m_outputStrides[i+1] * m_dimensions[i+1]; } } if (input_dims[0] == 1) { oneByN = true; for (int i = 1; i < NumDims; ++i) { if (m_broadcast[i] != 1) { oneByN = false; break; } } } else if (input_dims[NumDims-1] == 1) { nByOne = true; for (int i = 0; i < NumDims-1; ++i) { if (m_broadcast[i] != 1) { nByOne = false; break; } } } // Handle special format like NCHW, its input shape is '[1, N..., 1]' and // broadcast shape is '[N, 1..., N]' if (!oneByN && !nByOne) { if (input_dims[0] == 1 && input_dims[NumDims-1] == 1 && NumDims > 2) { nByOne = true; oneByN = true; for (int i = 1; i < NumDims-1; ++i) { if (m_broadcast[i] != 1) { nByOne = false; oneByN = false; break; } } } } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; } EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(EvaluatorPointerType) { m_impl.evalSubExprsIfNeeded(NULL); return true; } #ifdef EIGEN_USE_THREADS template EIGEN_STRONG_INLINE void evalSubExprsIfNeededAsync( EvaluatorPointerType, EvalSubExprsCallback done) { m_impl.evalSubExprsIfNeededAsync(nullptr, [done](bool) { done(true); }); } #endif // EIGEN_USE_THREADS EIGEN_STRONG_INLINE void cleanup() { m_impl.cleanup(); } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE CoeffReturnType coeff(Index index) const { if (internal::is_input_scalar::type>::value) { return m_impl.coeff(0); } if (static_cast(Layout) == static_cast(ColMajor)) { if (isCopy) { return m_impl.coeff(index); } else { return coeffColMajor(index); } } else { if (isCopy) { return m_impl.coeff(index); } else { return coeffRowMajor(index); } } } // TODO: attempt to speed this up. The integer divisions and modulo are slow EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index indexColMajor(Index index) const { Index inputIndex = 0; EIGEN_UNROLL_LOOP for (int i = NumDims - 1; i > 0; --i) { const Index idx = index / m_outputStrides[i]; if (internal::index_statically_eq(i, 1)) { eigen_assert(idx < m_impl.dimensions()[i]); inputIndex += idx * m_inputStrides[i]; } else { if (internal::index_statically_eq(i, 1)) { eigen_assert(idx % m_impl.dimensions()[i] == 0); } else { inputIndex += (idx % m_impl.dimensions()[i]) * m_inputStrides[i]; } } index -= idx * m_outputStrides[i]; } if (internal::index_statically_eq(0, 1)) { eigen_assert(index < m_impl.dimensions()[0]); inputIndex += index; } else { if (internal::index_statically_eq(0, 1)) { eigen_assert(index % m_impl.dimensions()[0] == 0); } else { inputIndex += (index % m_impl.dimensions()[0]); } } return inputIndex; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeffColMajor(Index index) const { return m_impl.coeff(indexColMajor(index)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index indexRowMajor(Index index) const { Index inputIndex = 0; EIGEN_UNROLL_LOOP for (int i = 0; i < NumDims - 1; ++i) { const Index idx = index / m_outputStrides[i]; if (internal::index_statically_eq(i, 1)) { eigen_assert(idx < m_impl.dimensions()[i]); inputIndex += idx * m_inputStrides[i]; } else { if (internal::index_statically_eq(i, 1)) { eigen_assert(idx % m_impl.dimensions()[i] == 0); } else { inputIndex += (idx % m_impl.dimensions()[i]) * m_inputStrides[i]; } } index -= idx * m_outputStrides[i]; } if (internal::index_statically_eq(NumDims - 1, 1)) { eigen_assert(index < m_impl.dimensions()[NumDims - 1]); inputIndex += index; } else { if (internal::index_statically_eq(NumDims - 1, 1)) { eigen_assert(index % m_impl.dimensions()[NumDims - 1] == 0); } else { inputIndex += (index % m_impl.dimensions()[NumDims - 1]); } } return inputIndex; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeffRowMajor(Index index) const { return m_impl.coeff(indexRowMajor(index)); } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE PacketReturnType packet(Index index) const { if (internal::is_input_scalar::type>::value) { return internal::pset1(m_impl.coeff(0)); } if (static_cast(Layout) == static_cast(ColMajor)) { if (isCopy) { #ifdef EIGEN_GPU_COMPILE_PHASE // See PR 437: on NVIDIA P100 and K20m we observed a x3-4 speed up by enforcing // unaligned loads here. The reason is unclear though. return m_impl.template packet(index); #else return m_impl.template packet(index); #endif } else if (oneByN && !nByOne) { return packetNByOne(index); } else if (!oneByN && nByOne) { return packetOneByN(index); } else if (oneByN && nByOne) { return packetOneByNByOne(index); } else { return packetColMajor(index); } } else { if (isCopy) { #ifdef EIGEN_GPU_COMPILE_PHASE // See above. return m_impl.template packet(index); #else return m_impl.template packet(index); #endif } else if (oneByN && !nByOne) { return packetOneByN(index); } else if (!oneByN && nByOne) { return packetNByOne(index); } else if (oneByN && nByOne) { return packetOneByNByOne(index); } else { return packetRowMajor(index); } } } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packetOneByNByOne (Index index) const { EIGEN_STATIC_ASSERT((PacketSize > 1), YOU_MADE_A_PROGRAMMING_MISTAKE) eigen_assert(index+PacketSize-1 < dimensions().TotalSize()); EIGEN_ALIGN_MAX typename internal::remove_const::type values[PacketSize]; Index startDim, endDim; Index inputIndex, outputOffset, batchedIndex; if (static_cast(Layout) == static_cast(ColMajor)) { startDim = NumDims - 1; endDim = 1; } else { startDim = 0; endDim = NumDims - 2; } batchedIndex = index % m_outputStrides[startDim]; inputIndex = batchedIndex / m_outputStrides[endDim]; outputOffset = batchedIndex % m_outputStrides[endDim]; if (outputOffset + PacketSize <= m_outputStrides[endDim]) { values[0] = m_impl.coeff(inputIndex); return internal::pload1(values); } else { EIGEN_UNROLL_LOOP for (int i = 0, cur = 0; i < PacketSize; ++i, ++cur) { if (outputOffset + cur < m_outputStrides[endDim]) { values[i] = m_impl.coeff(inputIndex); } else { ++inputIndex; inputIndex = (inputIndex == m_inputStrides[startDim] ? 0 : inputIndex); values[i] = m_impl.coeff(inputIndex); outputOffset = 0; cur = 0; } } return internal::pload(values); } } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packetOneByN(Index index) const { EIGEN_STATIC_ASSERT((PacketSize > 1), YOU_MADE_A_PROGRAMMING_MISTAKE) eigen_assert(index+PacketSize-1 < dimensions().TotalSize()); Index dim, inputIndex; if (static_cast(Layout) == static_cast(ColMajor)) { dim = NumDims - 1; } else { dim = 0; } inputIndex = index % m_inputStrides[dim]; if (inputIndex + PacketSize <= m_inputStrides[dim]) { return m_impl.template packet(inputIndex); } else { EIGEN_ALIGN_MAX typename internal::remove_const::type values[PacketSize]; EIGEN_UNROLL_LOOP for (int i = 0; i < PacketSize; ++i) { if (inputIndex > m_inputStrides[dim]-1) { inputIndex = 0; } values[i] = m_impl.coeff(inputIndex++); } return internal::pload(values); } } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packetNByOne(Index index) const { EIGEN_STATIC_ASSERT((PacketSize > 1), YOU_MADE_A_PROGRAMMING_MISTAKE) eigen_assert(index+PacketSize-1 < dimensions().TotalSize()); EIGEN_ALIGN_MAX typename internal::remove_const::type values[PacketSize]; Index dim, inputIndex, outputOffset; if (static_cast(Layout) == static_cast(ColMajor)) { dim = 1; } else { dim = NumDims - 2; } inputIndex = index / m_outputStrides[dim]; outputOffset = index % m_outputStrides[dim]; if (outputOffset + PacketSize <= m_outputStrides[dim]) { values[0] = m_impl.coeff(inputIndex); return internal::pload1(values); } else { EIGEN_UNROLL_LOOP for (int i = 0, cur = 0; i < PacketSize; ++i, ++cur) { if (outputOffset + cur < m_outputStrides[dim]) { values[i] = m_impl.coeff(inputIndex); } else { values[i] = m_impl.coeff(++inputIndex); outputOffset = 0; cur = 0; } } return internal::pload(values); } } // Ignore the LoadMode and always use unaligned loads since we can't guarantee // the alignment at compile time. template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packetColMajor(Index index) const { EIGEN_STATIC_ASSERT((PacketSize > 1), YOU_MADE_A_PROGRAMMING_MISTAKE) eigen_assert(index+PacketSize-1 < dimensions().TotalSize()); const Index originalIndex = index; Index inputIndex = 0; EIGEN_UNROLL_LOOP for (int i = NumDims - 1; i > 0; --i) { const Index idx = index / m_outputStrides[i]; if (internal::index_statically_eq(i, 1)) { eigen_assert(idx < m_impl.dimensions()[i]); inputIndex += idx * m_inputStrides[i]; } else { if (internal::index_statically_eq(i, 1)) { eigen_assert(idx % m_impl.dimensions()[i] == 0); } else { inputIndex += (idx % m_impl.dimensions()[i]) * m_inputStrides[i]; } } index -= idx * m_outputStrides[i]; } Index innermostLoc; if (internal::index_statically_eq(0, 1)) { eigen_assert(index < m_impl.dimensions()[0]); innermostLoc = index; } else { if (internal::index_statically_eq(0, 1)) { eigen_assert(index % m_impl.dimensions()[0] == 0); innermostLoc = 0; } else { innermostLoc = index % m_impl.dimensions()[0]; } } inputIndex += innermostLoc; // Todo: this could be extended to the second dimension if we're not // broadcasting alongside the first dimension, and so on. if (innermostLoc + PacketSize <= m_impl.dimensions()[0]) { return m_impl.template packet(inputIndex); } else { EIGEN_ALIGN_MAX typename internal::remove_const::type values[PacketSize]; values[0] = m_impl.coeff(inputIndex); EIGEN_UNROLL_LOOP for (int i = 1; i < PacketSize; ++i) { if (innermostLoc + i < m_impl.dimensions()[0]) { values[i] = m_impl.coeff(inputIndex+i); } else { values[i] = coeffColMajor(originalIndex+i); } } PacketReturnType rslt = internal::pload(values); return rslt; } } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packetRowMajor(Index index) const { EIGEN_STATIC_ASSERT((PacketSize > 1), YOU_MADE_A_PROGRAMMING_MISTAKE) eigen_assert(index+PacketSize-1 < dimensions().TotalSize()); const Index originalIndex = index; Index inputIndex = 0; EIGEN_UNROLL_LOOP for (int i = 0; i < NumDims - 1; ++i) { const Index idx = index / m_outputStrides[i]; if (internal::index_statically_eq(i, 1)) { eigen_assert(idx < m_impl.dimensions()[i]); inputIndex += idx * m_inputStrides[i]; } else { if (internal::index_statically_eq(i, 1)) { eigen_assert(idx % m_impl.dimensions()[i] == 0); } else { inputIndex += (idx % m_impl.dimensions()[i]) * m_inputStrides[i]; } } index -= idx * m_outputStrides[i]; } Index innermostLoc; if (internal::index_statically_eq(NumDims-1, 1)) { eigen_assert(index < m_impl.dimensions()[NumDims-1]); innermostLoc = index; } else { if (internal::index_statically_eq(NumDims-1, 1)) { eigen_assert(index % m_impl.dimensions()[NumDims-1] == 0); innermostLoc = 0; } else { innermostLoc = index % m_impl.dimensions()[NumDims-1]; } } inputIndex += innermostLoc; // Todo: this could be extended to the second dimension if we're not // broadcasting alongside the first dimension, and so on. if (innermostLoc + PacketSize <= m_impl.dimensions()[NumDims-1]) { return m_impl.template packet(inputIndex); } else { EIGEN_ALIGN_MAX typename internal::remove_const::type values[PacketSize]; values[0] = m_impl.coeff(inputIndex); EIGEN_UNROLL_LOOP for (int i = 1; i < PacketSize; ++i) { if (innermostLoc + i < m_impl.dimensions()[NumDims-1]) { values[i] = m_impl.coeff(inputIndex+i); } else { values[i] = coeffRowMajor(originalIndex+i); } } PacketReturnType rslt = internal::pload(values); return rslt; } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost costPerCoeff(bool vectorized) const { double compute_cost = TensorOpCost::AddCost(); if (!isCopy && NumDims > 0) { EIGEN_UNROLL_LOOP for (int i = NumDims - 1; i > 0; --i) { compute_cost += TensorOpCost::DivCost(); if (internal::index_statically_eq(i, 1)) { compute_cost += TensorOpCost::MulCost() + TensorOpCost::AddCost(); } else { if (!internal::index_statically_eq(i, 1)) { compute_cost += TensorOpCost::MulCost() + TensorOpCost::ModCost() + TensorOpCost::AddCost(); } } compute_cost += TensorOpCost::MulCost() + TensorOpCost::AddCost(); } } return m_impl.costPerCoeff(vectorized) + TensorOpCost(0, 0, compute_cost, vectorized, PacketSize); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE internal::TensorBlockResourceRequirements getResourceRequirements() const { // TODO(wuke): Targeting L1 size is 30% faster than targeting L{-1} on large // tensors. But this might need further tuning. const size_t target_size = m_device.firstLevelCacheSize(); return internal::TensorBlockResourceRequirements::merge( m_impl.getResourceRequirements(), internal::TensorBlockResourceRequirements::skewed(target_size)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorBlock block(TensorBlockDesc& desc, TensorBlockScratch& scratch, bool /*root_of_expr_ast*/ = false) const { BlockBroadcastingParams params = blockBroadcastingParams(desc); if (params.inner_dim_size == 0 || params.bcast_dim_size == 0) { return emptyBlock(); } // Prepare storage for the materialized broadcasting result. const typename TensorBlock::Storage block_storage = TensorBlock::prepareStorage(desc, scratch); ScalarNoConst* materialized_output = block_storage.data(); // We potentially will need to materialize input blocks. size_t materialized_input_size = 0; ScalarNoConst* materialized_input = NULL; // Initialize block broadcating iterator state for outer dimensions (outer // with regard to bcast dimension). Dimension in this array are always in // inner_most -> outer_most order (col major layout). array it; int idx = 0; for (int i = params.inner_dim_count + 1; i < NumDims; ++i) { const Index dim = IsColMajor ? i : NumDims - 1 - i; it[idx].size = params.output_dims[dim]; it[idx].count = 0; it[idx].output_stride = m_outputStrides[dim]; it[idx].output_span = it[idx].output_stride * (it[idx].size - 1); idx++; } // Write output into the beginning of `materialized_output`. Index output_offset = 0; // We will fill output block by broadcasting along the bcast dim, and // iterating over outer dimension. const Index output_size = NumDims == 0 ? 1 : params.output_dims.TotalSize(); for (Index num_output_coeffs = 0; num_output_coeffs < output_size;) { ScalarNoConst* bcast_output = materialized_output + num_output_coeffs; Index bcast_offset = desc.offset() + output_offset; // Broadcast along the bcast dimension. num_output_coeffs += BroadcastBlockAlongBcastDim( params, bcast_offset, scratch, bcast_output, &materialized_input, &materialized_input_size); // Switch to the next outer dimension. for (int j = 0; j < idx; ++j) { if (++it[j].count < it[j].size) { output_offset += it[j].output_stride; break; } it[j].count = 0; output_offset -= it[j].output_span; } } return block_storage.AsTensorMaterializedBlock(); } EIGEN_DEVICE_FUNC EvaluatorPointerType data() const { return NULL; } const TensorEvaluator& impl() const { return m_impl; } Broadcast functor() const { return m_broadcast; } #ifdef EIGEN_USE_SYCL // binding placeholder accessors to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind( cl::sycl::handler& cgh) const { m_impl.bind(cgh); } #endif private: static const bool IsColMajor = static_cast(Layout) == static_cast(ColMajor); // We will build a general case block broadcasting on top of broadcasting // primitive that will do broadcasting only for the inner dimension(s) along // the first dimension smaller than the input size (it's called `bcast_dim`). // // Example: // dim: 0 1 2 (ColMajor) // input size: [9, 3, 6] // block size: [9, 2, 6] // // We will compute broadcasted block by iterating over the outer dimensions // before `bcast_dim` (only dimension `2` in this example) and computing // broadcasts along the `bcast_dim` (dimension `1` in this example). // BlockBroadcastingParams holds precomputed parameters for broadcasting a // single block along the broadcasting dimension. Sizes and strides along the // `bcast_dim` might be invalid, they will be adjusted later in // `BroadcastBlockAlongBcastDim`. struct BlockBroadcastingParams { Dimensions input_dims; // input expression dimensions Dimensions output_dims; // output block sizes Dimensions output_strides; // output block strides int inner_dim_count; // count inner dimensions matching in size int bcast_dim; // broadcasting dimension index Index bcast_dim_size; // broadcasting dimension size Index inner_dim_size; // inner dimensions size // Block sizes and strides for the input block where all dimensions before // `bcast_dim` are equal to `1`. Dimensions input_block_sizes; Dimensions input_block_strides; // Block sizes and strides for blocks with extra dimensions and strides `0`. BroadcastDimensions bcast_block_sizes; BroadcastDimensions bcast_block_strides; BroadcastDimensions bcast_input_strides; }; struct BlockBroadcastingIteratorState { Index size; Index count; Index output_stride; Index output_span; }; EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE BlockBroadcastingParams blockBroadcastingParams(TensorBlockDesc& desc) const { BlockBroadcastingParams params; params.input_dims = Dimensions(m_impl.dimensions()); // Output block sizes and strides. params.output_dims = desc.dimensions(); params.output_strides = internal::strides(params.output_dims); // Find the broadcasting dimension (first dimension with output size smaller // that the input size). params.bcast_dim = 0; params.bcast_dim_size = 1; params.inner_dim_size = 1; // Count the number of inner dimensions that have the same size in the block // and in the broadcast expression. params.inner_dim_count = 0; for (int i = 0; i < NumDims; ++i) { const int dim = IsColMajor ? i : NumDims - i - 1; if (params.output_dims[dim] == m_dimensions[dim]) { params.inner_dim_size *= params.output_dims[dim]; ++params.inner_dim_count; continue; } // First non-matching dimension is the broadcasting dimension. eigen_assert(params.output_dims[dim] < m_dimensions[dim]); params.bcast_dim = dim; params.bcast_dim_size = params.output_dims[dim]; break; } // Calculate the input block size for looking into the input. for (int i = 0; i < params.inner_dim_count; ++i) { const int dim = IsColMajor ? i : NumDims - i - 1; params.input_block_sizes[dim] = params.input_dims[dim]; } for (int i = params.inner_dim_count; i < NumDims; ++i) { const int dim = IsColMajor ? i : NumDims - i - 1; params.input_block_sizes[dim] = 1; } params.input_block_strides = internal::strides(params.input_block_sizes); // Broadcast with the 0-stride trick: Create 1 extra dim for each // broadcast, set the input stride to 0. // // When ColMajor: // // - bcast_block_sizes: // [d_0, b_0, d_1, b_1, ...] // // - bcast_block_strides: // [output_block_strides[0], output_block_strides[0] * d_0, // output_block_strides[1], output_block_strides[1] * d_1, // ...] // // - bcast_input_strides: // [input_block_strides[0], 0, // input_block_strides[1], 0, // ...]. // for (int i = 0; i < params.inner_dim_count; ++i) { const int dim = IsColMajor ? i : NumDims - i - 1; const int copy_dim = IsColMajor ? 2 * i : 2 * NumDims - 2 * i - 1; const int broadcast_dim = IsColMajor ? copy_dim + 1 : copy_dim - 1; params.bcast_block_sizes[copy_dim] = params.input_dims[dim]; params.bcast_block_sizes[broadcast_dim] = m_broadcast[dim]; params.bcast_block_strides[copy_dim] = params.output_strides[dim]; params.bcast_block_strides[broadcast_dim] = params.output_strides[dim] * params.input_dims[dim]; params.bcast_input_strides[copy_dim] = params.input_block_strides[dim]; params.bcast_input_strides[broadcast_dim] = 0; } for (int i = 2 * params.inner_dim_count; i < 2 * NumDims; ++i) { const int dim = IsColMajor ? i : 2 * NumDims - i - 1; params.bcast_block_sizes[dim] = 1; params.bcast_block_strides[dim] = 0; params.bcast_input_strides[dim] = 0; } return params; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorBlock emptyBlock() const { DSizes dimensions; for (int i = 0; i < NumDims; ++i) dimensions[i] = 0; return TensorBlock(internal::TensorBlockKind::kView, NULL, dimensions); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index BroadcastBlockAlongBcastDim( BlockBroadcastingParams params, Index bcast_offset, TensorBlockScratch& scratch, ScalarNoConst* materialized_output, ScalarNoConst** materialized_input, size_t* materialized_input_size) const { if (params.bcast_dim_size == 1) { // We just need one block read using the ready-set values above. return BroadcastBlock( params.input_block_sizes, params.input_block_strides, params.bcast_block_sizes, params.bcast_block_strides, params.bcast_input_strides, bcast_offset, 0, scratch, materialized_output, materialized_input, materialized_input_size); } else if (params.input_dims[params.bcast_dim] == 1) { // Broadcast bcast dimension (< NumDims) by bcast_dim_size. const int broadcast_bcast_dim = IsColMajor ? 2 * params.inner_dim_count + 1 : 2 * NumDims - 2 * params.inner_dim_count - 2; params.bcast_block_sizes[broadcast_bcast_dim] = params.bcast_dim_size; params.bcast_input_strides[broadcast_bcast_dim] = 0; params.bcast_block_strides[broadcast_bcast_dim] = params.output_strides[params.bcast_dim]; return BroadcastBlock( params.input_block_sizes, params.input_block_strides, params.bcast_block_sizes, params.bcast_block_strides, params.bcast_input_strides, bcast_offset, 0, scratch, materialized_output, materialized_input, materialized_input_size); } else { // Keep track of the total number of the coefficients written to the // output block. Index num_output_coeffs = 0; // The general case. Let's denote the output block as // // x[..., a:a+bcast_dim_size, :, ..., :] // // where a:a+bcast_dim_size is a slice on the bcast_dim dimension // (< NumDims). We need to split the a:a+bcast_dim_size into possibly 3 // sub-blocks: // // (1) a:b, where b is the smallest multiple of // input_dims[bcast_dim_start] in [a, a+bcast_dim_size]. // // (2) b:c, where c is the largest multiple of input_dims[bcast_dim_start] // in [a, a+bcast_dim_size]. // // (3) c:a+bcast_dim_size . // // Or, when b and c do not exist, we just need to process the whole block // together. // Find a. const Index bcast_dim_left_index = bcast_offset / m_outputStrides[params.bcast_dim]; // Find b and c. const Index input_bcast_dim_size = params.input_dims[params.bcast_dim]; // First multiple after a. This is b when <= bcast_dim_left_index + // bcast_dim_size. const Index first_multiple = divup(bcast_dim_left_index, input_bcast_dim_size) * input_bcast_dim_size; if (first_multiple <= bcast_dim_left_index + params.bcast_dim_size) { // b exists, so does c. Find it. const Index last_multiple = (bcast_dim_left_index + params.bcast_dim_size) / input_bcast_dim_size * input_bcast_dim_size; const int copy_bcast_dim = IsColMajor ? 2 * params.inner_dim_count : 2 * NumDims - 2 * params.inner_dim_count - 1; const int broadcast_bcast_dim = IsColMajor ? 2 * params.inner_dim_count + 1 : 2 * NumDims - 2 * params.inner_dim_count - 2; if (first_multiple > bcast_dim_left_index) { const Index head_size = first_multiple - bcast_dim_left_index; params.input_block_sizes[params.bcast_dim] = head_size; params.bcast_block_sizes[copy_bcast_dim] = head_size; params.bcast_input_strides[copy_bcast_dim] = params.input_block_strides[params.bcast_dim]; params.bcast_block_strides[copy_bcast_dim] = params.output_strides[params.bcast_dim]; params.bcast_block_sizes[broadcast_bcast_dim] = 1; params.bcast_input_strides[broadcast_bcast_dim] = 0; params.bcast_block_strides[broadcast_bcast_dim] = params.output_strides[params.bcast_dim] * params.input_dims[params.bcast_dim]; num_output_coeffs += BroadcastBlock( params.input_block_sizes, params.input_block_strides, params.bcast_block_sizes, params.bcast_block_strides, params.bcast_input_strides, bcast_offset, 0, scratch, materialized_output, materialized_input, materialized_input_size); } if (first_multiple < last_multiple) { params.input_block_sizes[params.bcast_dim] = input_bcast_dim_size; params.bcast_block_sizes[copy_bcast_dim] = input_bcast_dim_size; params.bcast_input_strides[copy_bcast_dim] = params.input_block_strides[params.bcast_dim]; params.bcast_block_strides[copy_bcast_dim] = params.output_strides[params.bcast_dim]; params.bcast_block_sizes[broadcast_bcast_dim] = (last_multiple - first_multiple) / input_bcast_dim_size; params.bcast_input_strides[broadcast_bcast_dim] = 0; params.bcast_block_strides[broadcast_bcast_dim] = params.output_strides[params.bcast_dim] * params.input_dims[params.bcast_dim]; const Index offset = (first_multiple - bcast_dim_left_index) * m_outputStrides[params.bcast_dim]; num_output_coeffs += BroadcastBlock( params.input_block_sizes, params.input_block_strides, params.bcast_block_sizes, params.bcast_block_strides, params.bcast_input_strides, bcast_offset, offset, scratch, materialized_output, materialized_input, materialized_input_size); } if (last_multiple < bcast_dim_left_index + params.bcast_dim_size) { const Index tail_size = bcast_dim_left_index + params.bcast_dim_size - last_multiple; params.input_block_sizes[params.bcast_dim] = tail_size; params.bcast_block_sizes[copy_bcast_dim] = tail_size; params.bcast_input_strides[copy_bcast_dim] = params.input_block_strides[params.bcast_dim]; params.bcast_block_strides[copy_bcast_dim] = params.output_strides[params.bcast_dim]; params.bcast_block_sizes[broadcast_bcast_dim] = 1; params.bcast_input_strides[broadcast_bcast_dim] = 0; params.bcast_block_strides[broadcast_bcast_dim] = params.output_strides[params.bcast_dim] * params.input_dims[params.bcast_dim]; const Index offset = (last_multiple - bcast_dim_left_index) * m_outputStrides[params.bcast_dim]; num_output_coeffs += BroadcastBlock( params.input_block_sizes, params.input_block_strides, params.bcast_block_sizes, params.bcast_block_strides, params.bcast_input_strides, bcast_offset, offset, scratch, materialized_output, materialized_input, materialized_input_size); } } else { // b and c do not exist. const int copy_bcast_dim = IsColMajor ? 2 * params.inner_dim_count : 2 * NumDims - 2 * params.inner_dim_count - 1; params.input_block_sizes[params.bcast_dim] = params.bcast_dim_size; params.bcast_block_sizes[copy_bcast_dim] = params.bcast_dim_size; params.bcast_input_strides[copy_bcast_dim] = params.input_block_strides[params.bcast_dim]; params.bcast_block_strides[copy_bcast_dim] = params.output_strides[params.bcast_dim]; num_output_coeffs += BroadcastBlock( params.input_block_sizes, params.input_block_strides, params.bcast_block_sizes, params.bcast_block_strides, params.bcast_input_strides, bcast_offset, 0, scratch, materialized_output, materialized_input, materialized_input_size); } return num_output_coeffs; } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index BroadcastBlock( const Dimensions& input_block_sizes, const Dimensions& input_block_strides, const BroadcastDimensions& bcast_block_sizes, const BroadcastDimensions& bcast_block_strides, const BroadcastDimensions& bcast_input_strides, Index bcast_offset, Index offset, TensorBlockScratch& scratch, ScalarNoConst* materialized_output, ScalarNoConst** materialized_input, size_t* materialized_input_size) const { // ---------------------------------------------------------------------- // // Tensor block descriptor for reading block from the input. const Index input_offset = bcast_offset + offset; TensorBlockDesc input_desc( IsColMajor ? indexColMajor(input_offset) : indexRowMajor(input_offset), input_block_sizes); ArgTensorBlock input_block = m_impl.block(input_desc, scratch); // ---------------------------------------------------------------------- // // Materialize input block into a temporary memory buffer only if it's not // already available in the arg block. const ScalarNoConst* input_buffer = NULL; if (input_block.data() != NULL) { // Input block already has raw data, there is no need to materialize it. input_buffer = input_block.data(); } else { // Otherwise we have to do block assignment into a temporary buffer. // Maybe reuse previously allocated buffer, or allocate a new one with a // scratch allocator. const size_t input_total_size = input_block_sizes.TotalSize(); if (*materialized_input == NULL || *materialized_input_size < input_total_size) { *materialized_input_size = input_total_size; void* mem = scratch.allocate(*materialized_input_size * sizeof(Scalar)); *materialized_input = static_cast(mem); } typedef internal::TensorBlockAssignment< ScalarNoConst, NumDims, typename ArgTensorBlock::XprType, Index> TensorBlockAssignment; TensorBlockAssignment::Run( TensorBlockAssignment::target(input_block_sizes, input_block_strides, *materialized_input), input_block.expr()); input_buffer = *materialized_input; } // ---------------------------------------------------------------------- // // Copy data from materialized input block to the materialized output, using // given broadcast strides (strides with zeroes). typedef internal::TensorBlockIO TensorBlockIO; typename TensorBlockIO::Src src(bcast_input_strides, input_buffer); typename TensorBlockIO::Dst dst(bcast_block_sizes, bcast_block_strides, materialized_output + offset); return TensorBlockIO::Copy(dst, src); } protected: const Device EIGEN_DEVICE_REF m_device; const typename internal::remove_reference::type m_broadcast; Dimensions m_dimensions; array m_outputStrides; array m_inputStrides; TensorEvaluator m_impl; }; } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_BROADCASTING_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorScan.h0000644000176200001440000004717314562417221025262 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2016 Igor Babuschkin // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_SCAN_H #define EIGEN_CXX11_TENSOR_TENSOR_SCAN_H namespace Eigen { namespace internal { template struct traits > : public traits { typedef typename XprType::Scalar Scalar; typedef traits XprTraits; typedef typename XprTraits::StorageKind StorageKind; typedef typename XprType::Nested Nested; typedef typename remove_reference::type _Nested; static const int NumDimensions = XprTraits::NumDimensions; static const int Layout = XprTraits::Layout; typedef typename XprTraits::PointerType PointerType; }; template struct eval, Eigen::Dense> { typedef const TensorScanOp& type; }; template struct nested, 1, typename eval >::type> { typedef TensorScanOp type; }; } // end namespace internal /** \class TensorScan * \ingroup CXX11_Tensor_Module * * \brief Tensor scan class. */ template class TensorScanOp : public TensorBase, ReadOnlyAccessors> { public: typedef typename Eigen::internal::traits::Scalar Scalar; typedef typename Eigen::NumTraits::Real RealScalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename Eigen::internal::nested::type Nested; typedef typename Eigen::internal::traits::StorageKind StorageKind; typedef typename Eigen::internal::traits::Index Index; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorScanOp( const XprType& expr, const Index& axis, bool exclusive = false, const Op& op = Op()) : m_expr(expr), m_axis(axis), m_accumulator(op), m_exclusive(exclusive) {} EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Index axis() const { return m_axis; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const XprType& expression() const { return m_expr; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Op accumulator() const { return m_accumulator; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool exclusive() const { return m_exclusive; } protected: typename XprType::Nested m_expr; const Index m_axis; const Op m_accumulator; const bool m_exclusive; }; namespace internal { template EIGEN_STRONG_INLINE void ReduceScalar(Self& self, Index offset, typename Self::CoeffReturnType* data) { // Compute the scan along the axis, starting at the given offset typename Self::CoeffReturnType accum = self.accumulator().initialize(); if (self.stride() == 1) { if (self.exclusive()) { for (Index curr = offset; curr < offset + self.size(); ++curr) { data[curr] = self.accumulator().finalize(accum); self.accumulator().reduce(self.inner().coeff(curr), &accum); } } else { for (Index curr = offset; curr < offset + self.size(); ++curr) { self.accumulator().reduce(self.inner().coeff(curr), &accum); data[curr] = self.accumulator().finalize(accum); } } } else { if (self.exclusive()) { for (Index idx3 = 0; idx3 < self.size(); idx3++) { Index curr = offset + idx3 * self.stride(); data[curr] = self.accumulator().finalize(accum); self.accumulator().reduce(self.inner().coeff(curr), &accum); } } else { for (Index idx3 = 0; idx3 < self.size(); idx3++) { Index curr = offset + idx3 * self.stride(); self.accumulator().reduce(self.inner().coeff(curr), &accum); data[curr] = self.accumulator().finalize(accum); } } } } template EIGEN_STRONG_INLINE void ReducePacket(Self& self, Index offset, typename Self::CoeffReturnType* data) { using Scalar = typename Self::CoeffReturnType; using Packet = typename Self::PacketReturnType; // Compute the scan along the axis, starting at the calculated offset Packet accum = self.accumulator().template initializePacket(); if (self.stride() == 1) { if (self.exclusive()) { for (Index curr = offset; curr < offset + self.size(); ++curr) { internal::pstoreu(data + curr, self.accumulator().finalizePacket(accum)); self.accumulator().reducePacket(self.inner().template packet(curr), &accum); } } else { for (Index curr = offset; curr < offset + self.size(); ++curr) { self.accumulator().reducePacket(self.inner().template packet(curr), &accum); internal::pstoreu(data + curr, self.accumulator().finalizePacket(accum)); } } } else { if (self.exclusive()) { for (Index idx3 = 0; idx3 < self.size(); idx3++) { const Index curr = offset + idx3 * self.stride(); internal::pstoreu(data + curr, self.accumulator().finalizePacket(accum)); self.accumulator().reducePacket(self.inner().template packet(curr), &accum); } } else { for (Index idx3 = 0; idx3 < self.size(); idx3++) { const Index curr = offset + idx3 * self.stride(); self.accumulator().reducePacket(self.inner().template packet(curr), &accum); internal::pstoreu(data + curr, self.accumulator().finalizePacket(accum)); } } } } template struct ReduceBlock { EIGEN_STRONG_INLINE void operator()(Self& self, Index idx1, typename Self::CoeffReturnType* data) { for (Index idx2 = 0; idx2 < self.stride(); idx2++) { // Calculate the starting offset for the scan Index offset = idx1 + idx2; ReduceScalar(self, offset, data); } } }; // Specialization for vectorized reduction. template struct ReduceBlock { EIGEN_STRONG_INLINE void operator()(Self& self, Index idx1, typename Self::CoeffReturnType* data) { using Packet = typename Self::PacketReturnType; const int PacketSize = internal::unpacket_traits::size; Index idx2 = 0; for (; idx2 + PacketSize <= self.stride(); idx2 += PacketSize) { // Calculate the starting offset for the packet scan Index offset = idx1 + idx2; ReducePacket(self, offset, data); } for (; idx2 < self.stride(); idx2++) { // Calculate the starting offset for the scan Index offset = idx1 + idx2; ReduceScalar(self, offset, data); } } }; // Single-threaded CPU implementation of scan template ::PacketAccess && internal::reducer_traits::PacketAccess)> struct ScanLauncher { void operator()(Self& self, typename Self::CoeffReturnType* data) { Index total_size = internal::array_prod(self.dimensions()); // We fix the index along the scan axis to 0 and perform a // scan per remaining entry. The iteration is split into two nested // loops to avoid an integer division by keeping track of each idx1 and // idx2. for (Index idx1 = 0; idx1 < total_size; idx1 += self.stride() * self.size()) { ReduceBlock block_reducer; block_reducer(self, idx1, data); } } }; #ifdef EIGEN_USE_THREADS // Adjust block_size to avoid false sharing of cachelines among // threads. Currently set to twice the cache line size on Intel and ARM // processors. EIGEN_STRONG_INLINE Index AdjustBlockSize(Index item_size, Index block_size) { EIGEN_CONSTEXPR Index kBlockAlignment = 128; const Index items_per_cacheline = numext::maxi(1, kBlockAlignment / item_size); return items_per_cacheline * divup(block_size, items_per_cacheline); } template struct ReduceBlock { EIGEN_STRONG_INLINE void operator()(Self& self, Index idx1, typename Self::CoeffReturnType* data) { using Scalar = typename Self::CoeffReturnType; using Packet = typename Self::PacketReturnType; const int PacketSize = internal::unpacket_traits::size; Index num_scalars = self.stride(); Index num_packets = 0; if (self.stride() >= PacketSize) { num_packets = self.stride() / PacketSize; self.device().parallelFor( num_packets, TensorOpCost(PacketSize * self.size(), PacketSize * self.size(), 16 * PacketSize * self.size(), true, PacketSize), // Make the shard size large enough that two neighboring threads // won't write to the same cacheline of `data`. [=](Index blk_size) { return AdjustBlockSize(PacketSize * sizeof(Scalar), blk_size); }, [&](Index first, Index last) { for (Index packet = first; packet < last; ++packet) { const Index idx2 = packet * PacketSize; ReducePacket(self, idx1 + idx2, data); } }); num_scalars -= num_packets * PacketSize; } self.device().parallelFor( num_scalars, TensorOpCost(self.size(), self.size(), 16 * self.size()), // Make the shard size large enough that two neighboring threads // won't write to the same cacheline of `data`. [=](Index blk_size) { return AdjustBlockSize(sizeof(Scalar), blk_size); }, [&](Index first, Index last) { for (Index scalar = first; scalar < last; ++scalar) { const Index idx2 = num_packets * PacketSize + scalar; ReduceScalar(self, idx1 + idx2, data); } }); } }; template struct ReduceBlock { EIGEN_STRONG_INLINE void operator()(Self& self, Index idx1, typename Self::CoeffReturnType* data) { using Scalar = typename Self::CoeffReturnType; self.device().parallelFor( self.stride(), TensorOpCost(self.size(), self.size(), 16 * self.size()), // Make the shard size large enough that two neighboring threads // won't write to the same cacheline of `data`. [=](Index blk_size) { return AdjustBlockSize(sizeof(Scalar), blk_size); }, [&](Index first, Index last) { for (Index idx2 = first; idx2 < last; ++idx2) { ReduceScalar(self, idx1 + idx2, data); } }); } }; // Specialization for multi-threaded execution. template struct ScanLauncher { void operator()(Self& self, typename Self::CoeffReturnType* data) { using Scalar = typename Self::CoeffReturnType; using Packet = typename Self::PacketReturnType; const int PacketSize = internal::unpacket_traits::size; const Index total_size = internal::array_prod(self.dimensions()); const Index inner_block_size = self.stride() * self.size(); bool parallelize_by_outer_blocks = (total_size >= (self.stride() * inner_block_size)); if ((parallelize_by_outer_blocks && total_size <= 4096) || (!parallelize_by_outer_blocks && self.stride() < PacketSize)) { ScanLauncher launcher; launcher(self, data); return; } if (parallelize_by_outer_blocks) { // Parallelize over outer blocks. const Index num_outer_blocks = total_size / inner_block_size; self.device().parallelFor( num_outer_blocks, TensorOpCost(inner_block_size, inner_block_size, 16 * PacketSize * inner_block_size, Vectorize, PacketSize), [=](Index blk_size) { return AdjustBlockSize(inner_block_size * sizeof(Scalar), blk_size); }, [&](Index first, Index last) { for (Index idx1 = first; idx1 < last; ++idx1) { ReduceBlock block_reducer; block_reducer(self, idx1 * inner_block_size, data); } }); } else { // Parallelize over inner packets/scalars dimensions when the reduction // axis is not an inner dimension. ReduceBlock block_reducer; for (Index idx1 = 0; idx1 < total_size; idx1 += self.stride() * self.size()) { block_reducer(self, idx1, data); } } } }; #endif // EIGEN_USE_THREADS #if defined(EIGEN_USE_GPU) && (defined(EIGEN_GPUCC)) // GPU implementation of scan // TODO(ibab) This placeholder implementation performs multiple scans in // parallel, but it would be better to use a parallel scan algorithm and // optimize memory access. template __global__ EIGEN_HIP_LAUNCH_BOUNDS_1024 void ScanKernel(Self self, Index total_size, typename Self::CoeffReturnType* data) { // Compute offset as in the CPU version Index val = threadIdx.x + blockIdx.x * blockDim.x; Index offset = (val / self.stride()) * self.stride() * self.size() + val % self.stride(); if (offset + (self.size() - 1) * self.stride() < total_size) { // Compute the scan along the axis, starting at the calculated offset typename Self::CoeffReturnType accum = self.accumulator().initialize(); for (Index idx = 0; idx < self.size(); idx++) { Index curr = offset + idx * self.stride(); if (self.exclusive()) { data[curr] = self.accumulator().finalize(accum); self.accumulator().reduce(self.inner().coeff(curr), &accum); } else { self.accumulator().reduce(self.inner().coeff(curr), &accum); data[curr] = self.accumulator().finalize(accum); } } } __syncthreads(); } template struct ScanLauncher { void operator()(const Self& self, typename Self::CoeffReturnType* data) { Index total_size = internal::array_prod(self.dimensions()); Index num_blocks = (total_size / self.size() + 63) / 64; Index block_size = 64; LAUNCH_GPU_KERNEL((ScanKernel), num_blocks, block_size, 0, self.device(), self, total_size, data); } }; #endif // EIGEN_USE_GPU && (EIGEN_GPUCC) } // namespace internal // Eval as rvalue template struct TensorEvaluator, Device> { typedef TensorScanOp XprType; typedef typename XprType::Index Index; typedef const ArgType ChildTypeNoConst; typedef const ArgType ChildType; static const int NumDims = internal::array_size::Dimensions>::value; typedef DSizes Dimensions; typedef typename internal::remove_const::type Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename PacketType::type PacketReturnType; typedef TensorEvaluator, Device> Self; typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; enum { IsAligned = false, PacketAccess = (PacketType::size > 1), BlockAccess = false, PreferBlockAccess = false, Layout = TensorEvaluator::Layout, CoordAccess = false, RawAccess = true }; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockNotImplemented TensorBlock; //===--------------------------------------------------------------------===// EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device) : m_impl(op.expression(), device), m_device(device), m_exclusive(op.exclusive()), m_accumulator(op.accumulator()), m_size(m_impl.dimensions()[op.axis()]), m_stride(1), m_consume_dim(op.axis()), m_output(NULL) { // Accumulating a scalar isn't supported. EIGEN_STATIC_ASSERT((NumDims > 0), YOU_MADE_A_PROGRAMMING_MISTAKE); eigen_assert(op.axis() >= 0 && op.axis() < NumDims); // Compute stride of scan axis const Dimensions& dims = m_impl.dimensions(); if (static_cast(Layout) == static_cast(ColMajor)) { for (int i = 0; i < op.axis(); ++i) { m_stride = m_stride * dims[i]; } } else { // dims can only be indexed through unsigned integers, // so let's use an unsigned type to let the compiler knows. // This prevents stupid warnings: ""'*((void*)(& evaluator)+64)[18446744073709551615]' may be used uninitialized in this function" unsigned int axis = internal::convert_index(op.axis()); for (unsigned int i = NumDims - 1; i > axis; --i) { m_stride = m_stride * dims[i]; } } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_impl.dimensions(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Index& stride() const { return m_stride; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Index& consume_dim() const { return m_consume_dim; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Index& size() const { return m_size; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Op& accumulator() const { return m_accumulator; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool exclusive() const { return m_exclusive; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TensorEvaluator& inner() const { return m_impl; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Device& device() const { return m_device; } EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(EvaluatorPointerType data) { m_impl.evalSubExprsIfNeeded(NULL); internal::ScanLauncher launcher; if (data) { launcher(*this, data); return false; } const Index total_size = internal::array_prod(dimensions()); m_output = static_cast(m_device.get((Scalar*) m_device.allocate_temp(total_size * sizeof(Scalar)))); launcher(*this, m_output); return true; } template EIGEN_DEVICE_FUNC PacketReturnType packet(Index index) const { return internal::ploadt(m_output + index); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EvaluatorPointerType data() const { return m_output; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { return m_output[index]; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost costPerCoeff(bool) const { return TensorOpCost(sizeof(CoeffReturnType), 0, 0); } EIGEN_STRONG_INLINE void cleanup() { if (m_output) { m_device.deallocate_temp(m_output); m_output = NULL; } m_impl.cleanup(); } #ifdef EIGEN_USE_SYCL // binding placeholder accessors to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler &cgh) const { m_impl.bind(cgh); m_output.bind(cgh); } #endif protected: TensorEvaluator m_impl; const Device EIGEN_DEVICE_REF m_device; const bool m_exclusive; Op m_accumulator; const Index m_size; Index m_stride; Index m_consume_dim; EvaluatorPointerType m_output; }; } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_SCAN_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorVolumePatch.h0000644000176200001440000007261114562417221026620 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. #ifndef EIGEN_CXX11_TENSOR_TENSOR_VOLUME_PATCH_H #define EIGEN_CXX11_TENSOR_TENSOR_VOLUME_PATCH_H namespace Eigen { /** \class TensorVolumePatch * \ingroup CXX11_Tensor_Module * * \brief Patch extraction specialized for processing of volumetric data. * This assumes that the input has a least 4 dimensions ordered as follows: * - channels * - planes * - rows * - columns * - (optional) additional dimensions such as time or batch size. * Calling the volume patch code with patch_planes, patch_rows, and patch_cols * is equivalent to calling the regular patch extraction code with parameters * d, patch_planes, patch_rows, patch_cols, and 1 for all the additional * dimensions. */ namespace internal { template struct traits > : public traits { typedef typename internal::remove_const::type Scalar; typedef traits XprTraits; typedef typename XprTraits::StorageKind StorageKind; typedef typename XprTraits::Index Index; typedef typename XprType::Nested Nested; typedef typename remove_reference::type _Nested; static const int NumDimensions = XprTraits::NumDimensions + 1; static const int Layout = XprTraits::Layout; typedef typename XprTraits::PointerType PointerType; }; template struct eval, Eigen::Dense> { typedef const TensorVolumePatchOp& type; }; template struct nested, 1, typename eval >::type> { typedef TensorVolumePatchOp type; }; } // end namespace internal template class TensorVolumePatchOp : public TensorBase, ReadOnlyAccessors> { public: typedef typename Eigen::internal::traits::Scalar Scalar; typedef typename Eigen::NumTraits::Real RealScalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename Eigen::internal::nested::type Nested; typedef typename Eigen::internal::traits::StorageKind StorageKind; typedef typename Eigen::internal::traits::Index Index; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorVolumePatchOp(const XprType& expr, DenseIndex patch_planes, DenseIndex patch_rows, DenseIndex patch_cols, DenseIndex plane_strides, DenseIndex row_strides, DenseIndex col_strides, DenseIndex in_plane_strides, DenseIndex in_row_strides, DenseIndex in_col_strides, DenseIndex plane_inflate_strides, DenseIndex row_inflate_strides, DenseIndex col_inflate_strides, PaddingType padding_type, Scalar padding_value) : m_xpr(expr), m_patch_planes(patch_planes), m_patch_rows(patch_rows), m_patch_cols(patch_cols), m_plane_strides(plane_strides), m_row_strides(row_strides), m_col_strides(col_strides), m_in_plane_strides(in_plane_strides), m_in_row_strides(in_row_strides), m_in_col_strides(in_col_strides), m_plane_inflate_strides(plane_inflate_strides), m_row_inflate_strides(row_inflate_strides), m_col_inflate_strides(col_inflate_strides), m_padding_explicit(false), m_padding_top_z(0), m_padding_bottom_z(0), m_padding_top(0), m_padding_bottom(0), m_padding_left(0), m_padding_right(0), m_padding_type(padding_type), m_padding_value(padding_value) {} EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorVolumePatchOp(const XprType& expr, DenseIndex patch_planes, DenseIndex patch_rows, DenseIndex patch_cols, DenseIndex plane_strides, DenseIndex row_strides, DenseIndex col_strides, DenseIndex in_plane_strides, DenseIndex in_row_strides, DenseIndex in_col_strides, DenseIndex plane_inflate_strides, DenseIndex row_inflate_strides, DenseIndex col_inflate_strides, DenseIndex padding_top_z, DenseIndex padding_bottom_z, DenseIndex padding_top, DenseIndex padding_bottom, DenseIndex padding_left, DenseIndex padding_right, Scalar padding_value) : m_xpr(expr), m_patch_planes(patch_planes), m_patch_rows(patch_rows), m_patch_cols(patch_cols), m_plane_strides(plane_strides), m_row_strides(row_strides), m_col_strides(col_strides), m_in_plane_strides(in_plane_strides), m_in_row_strides(in_row_strides), m_in_col_strides(in_col_strides), m_plane_inflate_strides(plane_inflate_strides), m_row_inflate_strides(row_inflate_strides), m_col_inflate_strides(col_inflate_strides), m_padding_explicit(true), m_padding_top_z(padding_top_z), m_padding_bottom_z(padding_bottom_z), m_padding_top(padding_top), m_padding_bottom(padding_bottom), m_padding_left(padding_left), m_padding_right(padding_right), m_padding_type(PADDING_VALID), m_padding_value(padding_value) {} EIGEN_DEVICE_FUNC DenseIndex patch_planes() const { return m_patch_planes; } EIGEN_DEVICE_FUNC DenseIndex patch_rows() const { return m_patch_rows; } EIGEN_DEVICE_FUNC DenseIndex patch_cols() const { return m_patch_cols; } EIGEN_DEVICE_FUNC DenseIndex plane_strides() const { return m_plane_strides; } EIGEN_DEVICE_FUNC DenseIndex row_strides() const { return m_row_strides; } EIGEN_DEVICE_FUNC DenseIndex col_strides() const { return m_col_strides; } EIGEN_DEVICE_FUNC DenseIndex in_plane_strides() const { return m_in_plane_strides; } EIGEN_DEVICE_FUNC DenseIndex in_row_strides() const { return m_in_row_strides; } EIGEN_DEVICE_FUNC DenseIndex in_col_strides() const { return m_in_col_strides; } EIGEN_DEVICE_FUNC DenseIndex plane_inflate_strides() const { return m_plane_inflate_strides; } EIGEN_DEVICE_FUNC DenseIndex row_inflate_strides() const { return m_row_inflate_strides; } EIGEN_DEVICE_FUNC DenseIndex col_inflate_strides() const { return m_col_inflate_strides; } EIGEN_DEVICE_FUNC bool padding_explicit() const { return m_padding_explicit; } EIGEN_DEVICE_FUNC DenseIndex padding_top_z() const { return m_padding_top_z; } EIGEN_DEVICE_FUNC DenseIndex padding_bottom_z() const { return m_padding_bottom_z; } EIGEN_DEVICE_FUNC DenseIndex padding_top() const { return m_padding_top; } EIGEN_DEVICE_FUNC DenseIndex padding_bottom() const { return m_padding_bottom; } EIGEN_DEVICE_FUNC DenseIndex padding_left() const { return m_padding_left; } EIGEN_DEVICE_FUNC DenseIndex padding_right() const { return m_padding_right; } EIGEN_DEVICE_FUNC PaddingType padding_type() const { return m_padding_type; } EIGEN_DEVICE_FUNC Scalar padding_value() const { return m_padding_value; } EIGEN_DEVICE_FUNC const typename internal::remove_all::type& expression() const { return m_xpr; } protected: typename XprType::Nested m_xpr; const DenseIndex m_patch_planes; const DenseIndex m_patch_rows; const DenseIndex m_patch_cols; const DenseIndex m_plane_strides; const DenseIndex m_row_strides; const DenseIndex m_col_strides; const DenseIndex m_in_plane_strides; const DenseIndex m_in_row_strides; const DenseIndex m_in_col_strides; const DenseIndex m_plane_inflate_strides; const DenseIndex m_row_inflate_strides; const DenseIndex m_col_inflate_strides; const bool m_padding_explicit; const DenseIndex m_padding_top_z; const DenseIndex m_padding_bottom_z; const DenseIndex m_padding_top; const DenseIndex m_padding_bottom; const DenseIndex m_padding_left; const DenseIndex m_padding_right; const PaddingType m_padding_type; const Scalar m_padding_value; }; // Eval as rvalue template struct TensorEvaluator, Device> { typedef TensorVolumePatchOp XprType; typedef typename XprType::Index Index; static const int NumInputDims = internal::array_size::Dimensions>::value; static const int NumDims = NumInputDims + 1; typedef DSizes Dimensions; typedef typename internal::remove_const::type Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename PacketType::type PacketReturnType; static const int PacketSize = PacketType::size; typedef StorageMemory Storage; typedef typename Storage::Type EvaluatorPointerType; enum { IsAligned = false, PacketAccess = TensorEvaluator::PacketAccess, BlockAccess = false, PreferBlockAccess = TensorEvaluator::PreferBlockAccess, Layout = TensorEvaluator::Layout, CoordAccess = false, RawAccess = false }; //===- Tensor block evaluation strategy (see TensorBlock.h) -------------===// typedef internal::TensorBlockNotImplemented TensorBlock; //===--------------------------------------------------------------------===// EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device) : m_impl(op.expression(), device) { EIGEN_STATIC_ASSERT((NumDims >= 5), YOU_MADE_A_PROGRAMMING_MISTAKE); m_paddingValue = op.padding_value(); const typename TensorEvaluator::Dimensions& input_dims = m_impl.dimensions(); // Cache a few variables. if (static_cast(Layout) == static_cast(ColMajor)) { m_inputDepth = input_dims[0]; m_inputPlanes = input_dims[1]; m_inputRows = input_dims[2]; m_inputCols = input_dims[3]; } else { m_inputDepth = input_dims[NumInputDims-1]; m_inputPlanes = input_dims[NumInputDims-2]; m_inputRows = input_dims[NumInputDims-3]; m_inputCols = input_dims[NumInputDims-4]; } m_plane_strides = op.plane_strides(); m_row_strides = op.row_strides(); m_col_strides = op.col_strides(); // Input strides and effective input/patch size m_in_plane_strides = op.in_plane_strides(); m_in_row_strides = op.in_row_strides(); m_in_col_strides = op.in_col_strides(); m_plane_inflate_strides = op.plane_inflate_strides(); m_row_inflate_strides = op.row_inflate_strides(); m_col_inflate_strides = op.col_inflate_strides(); // The "effective" spatial size after inflating data with zeros. m_input_planes_eff = (m_inputPlanes - 1) * m_plane_inflate_strides + 1; m_input_rows_eff = (m_inputRows - 1) * m_row_inflate_strides + 1; m_input_cols_eff = (m_inputCols - 1) * m_col_inflate_strides + 1; m_patch_planes_eff = op.patch_planes() + (op.patch_planes() - 1) * (m_in_plane_strides - 1); m_patch_rows_eff = op.patch_rows() + (op.patch_rows() - 1) * (m_in_row_strides - 1); m_patch_cols_eff = op.patch_cols() + (op.patch_cols() - 1) * (m_in_col_strides - 1); if (op.padding_explicit()) { m_outputPlanes = numext::ceil((m_input_planes_eff + op.padding_top_z() + op.padding_bottom_z() - m_patch_planes_eff + 1.f) / static_cast(m_plane_strides)); m_outputRows = numext::ceil((m_input_rows_eff + op.padding_top() + op.padding_bottom() - m_patch_rows_eff + 1.f) / static_cast(m_row_strides)); m_outputCols = numext::ceil((m_input_cols_eff + op.padding_left() + op.padding_right() - m_patch_cols_eff + 1.f) / static_cast(m_col_strides)); m_planePaddingTop = op.padding_top_z(); m_rowPaddingTop = op.padding_top(); m_colPaddingLeft = op.padding_left(); } else { // Computing padding from the type switch (op.padding_type()) { case PADDING_VALID: m_outputPlanes = numext::ceil((m_input_planes_eff - m_patch_planes_eff + 1.f) / static_cast(m_plane_strides)); m_outputRows = numext::ceil((m_input_rows_eff - m_patch_rows_eff + 1.f) / static_cast(m_row_strides)); m_outputCols = numext::ceil((m_input_cols_eff - m_patch_cols_eff + 1.f) / static_cast(m_col_strides)); m_planePaddingTop = 0; m_rowPaddingTop = 0; m_colPaddingLeft = 0; break; case PADDING_SAME: { m_outputPlanes = numext::ceil(m_input_planes_eff / static_cast(m_plane_strides)); m_outputRows = numext::ceil(m_input_rows_eff / static_cast(m_row_strides)); m_outputCols = numext::ceil(m_input_cols_eff / static_cast(m_col_strides)); const Index dz = (m_outputPlanes - 1) * m_plane_strides + m_patch_planes_eff - m_input_planes_eff; const Index dy = (m_outputRows - 1) * m_row_strides + m_patch_rows_eff - m_input_rows_eff; const Index dx = (m_outputCols - 1) * m_col_strides + m_patch_cols_eff - m_input_cols_eff; m_planePaddingTop = dz / 2; m_rowPaddingTop = dy / 2; m_colPaddingLeft = dx / 2; break; } default: eigen_assert(false && "unexpected padding"); } } eigen_assert(m_outputRows > 0); eigen_assert(m_outputCols > 0); eigen_assert(m_outputPlanes > 0); // Dimensions for result of extraction. if (static_cast(Layout) == static_cast(ColMajor)) { // ColMajor // 0: depth // 1: patch_planes // 2: patch_rows // 3: patch_cols // 4: number of patches // 5 and beyond: anything else (such as batch). m_dimensions[0] = input_dims[0]; m_dimensions[1] = op.patch_planes(); m_dimensions[2] = op.patch_rows(); m_dimensions[3] = op.patch_cols(); m_dimensions[4] = m_outputPlanes * m_outputRows * m_outputCols; for (int i = 5; i < NumDims; ++i) { m_dimensions[i] = input_dims[i-1]; } } else { // RowMajor // NumDims-1: depth // NumDims-2: patch_planes // NumDims-3: patch_rows // NumDims-4: patch_cols // NumDims-5: number of patches // NumDims-6 and beyond: anything else (such as batch). m_dimensions[NumDims-1] = input_dims[NumInputDims-1]; m_dimensions[NumDims-2] = op.patch_planes(); m_dimensions[NumDims-3] = op.patch_rows(); m_dimensions[NumDims-4] = op.patch_cols(); m_dimensions[NumDims-5] = m_outputPlanes * m_outputRows * m_outputCols; for (int i = NumDims-6; i >= 0; --i) { m_dimensions[i] = input_dims[i]; } } // Strides for the output tensor. if (static_cast(Layout) == static_cast(ColMajor)) { m_rowStride = m_dimensions[1]; m_colStride = m_dimensions[2] * m_rowStride; m_patchStride = m_colStride * m_dimensions[3] * m_dimensions[0]; m_otherStride = m_patchStride * m_dimensions[4]; } else { m_rowStride = m_dimensions[NumDims-2]; m_colStride = m_dimensions[NumDims-3] * m_rowStride; m_patchStride = m_colStride * m_dimensions[NumDims-4] * m_dimensions[NumDims-1]; m_otherStride = m_patchStride * m_dimensions[NumDims-5]; } // Strides for navigating through the input tensor. m_planeInputStride = m_inputDepth; m_rowInputStride = m_inputDepth * m_inputPlanes; m_colInputStride = m_inputDepth * m_inputRows * m_inputPlanes; m_otherInputStride = m_inputDepth * m_inputRows * m_inputCols * m_inputPlanes; m_outputPlanesRows = m_outputPlanes * m_outputRows; // Fast representations of different variables. m_fastOtherStride = internal::TensorIntDivisor(m_otherStride); m_fastPatchStride = internal::TensorIntDivisor(m_patchStride); m_fastColStride = internal::TensorIntDivisor(m_colStride); m_fastRowStride = internal::TensorIntDivisor(m_rowStride); m_fastInputRowStride = internal::TensorIntDivisor(m_row_inflate_strides); m_fastInputColStride = internal::TensorIntDivisor(m_col_inflate_strides); m_fastInputPlaneStride = internal::TensorIntDivisor(m_plane_inflate_strides); m_fastInputColsEff = internal::TensorIntDivisor(m_input_cols_eff); m_fastOutputPlanes = internal::TensorIntDivisor(m_outputPlanes); m_fastOutputPlanesRows = internal::TensorIntDivisor(m_outputPlanesRows); if (static_cast(Layout) == static_cast(ColMajor)) { m_fastOutputDepth = internal::TensorIntDivisor(m_dimensions[0]); } else { m_fastOutputDepth = internal::TensorIntDivisor(m_dimensions[NumDims-1]); } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; } EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(EvaluatorPointerType /*data*/) { m_impl.evalSubExprsIfNeeded(NULL); return true; } EIGEN_STRONG_INLINE void cleanup() { m_impl.cleanup(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { // Patch index corresponding to the passed in index. const Index patchIndex = index / m_fastPatchStride; // Spatial offset within the patch. This has to be translated into 3D // coordinates within the patch. const Index patchOffset = (index - patchIndex * m_patchStride) / m_fastOutputDepth; // Batch, etc. const Index otherIndex = (NumDims == 5) ? 0 : index / m_fastOtherStride; const Index patch3DIndex = (NumDims == 5) ? patchIndex : (index - otherIndex * m_otherStride) / m_fastPatchStride; // Calculate column index in the input original tensor. const Index colIndex = patch3DIndex / m_fastOutputPlanesRows; const Index colOffset = patchOffset / m_fastColStride; const Index inputCol = colIndex * m_col_strides + colOffset * m_in_col_strides - m_colPaddingLeft; const Index origInputCol = (m_col_inflate_strides == 1) ? inputCol : ((inputCol >= 0) ? (inputCol / m_fastInputColStride) : 0); if (inputCol < 0 || inputCol >= m_input_cols_eff || ((m_col_inflate_strides != 1) && (inputCol != origInputCol * m_col_inflate_strides))) { return Scalar(m_paddingValue); } // Calculate row index in the original input tensor. const Index rowIndex = (patch3DIndex - colIndex * m_outputPlanesRows) / m_fastOutputPlanes; const Index rowOffset = (patchOffset - colOffset * m_colStride) / m_fastRowStride; const Index inputRow = rowIndex * m_row_strides + rowOffset * m_in_row_strides - m_rowPaddingTop; const Index origInputRow = (m_row_inflate_strides == 1) ? inputRow : ((inputRow >= 0) ? (inputRow / m_fastInputRowStride) : 0); if (inputRow < 0 || inputRow >= m_input_rows_eff || ((m_row_inflate_strides != 1) && (inputRow != origInputRow * m_row_inflate_strides))) { return Scalar(m_paddingValue); } // Calculate plane index in the original input tensor. const Index planeIndex = (patch3DIndex - m_outputPlanes * (colIndex * m_outputRows + rowIndex)); const Index planeOffset = patchOffset - colOffset * m_colStride - rowOffset * m_rowStride; const Index inputPlane = planeIndex * m_plane_strides + planeOffset * m_in_plane_strides - m_planePaddingTop; const Index origInputPlane = (m_plane_inflate_strides == 1) ? inputPlane : ((inputPlane >= 0) ? (inputPlane / m_fastInputPlaneStride) : 0); if (inputPlane < 0 || inputPlane >= m_input_planes_eff || ((m_plane_inflate_strides != 1) && (inputPlane != origInputPlane * m_plane_inflate_strides))) { return Scalar(m_paddingValue); } const int depth_index = static_cast(Layout) == static_cast(ColMajor) ? 0 : NumDims - 1; const Index depth = index - (index / m_fastOutputDepth) * m_dimensions[depth_index]; const Index inputIndex = depth + origInputRow * m_rowInputStride + origInputCol * m_colInputStride + origInputPlane * m_planeInputStride + otherIndex * m_otherInputStride; return m_impl.coeff(inputIndex); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const { EIGEN_STATIC_ASSERT((PacketSize > 1), YOU_MADE_A_PROGRAMMING_MISTAKE) eigen_assert(index+PacketSize-1 < dimensions().TotalSize()); if (m_in_row_strides != 1 || m_in_col_strides != 1 || m_row_inflate_strides != 1 || m_col_inflate_strides != 1 || m_in_plane_strides != 1 || m_plane_inflate_strides != 1) { return packetWithPossibleZero(index); } const Index indices[2] = {index, index + PacketSize - 1}; const Index patchIndex = indices[0] / m_fastPatchStride; if (patchIndex != indices[1] / m_fastPatchStride) { return packetWithPossibleZero(index); } const Index otherIndex = (NumDims == 5) ? 0 : indices[0] / m_fastOtherStride; eigen_assert(otherIndex == indices[1] / m_fastOtherStride); // Find the offset of the element wrt the location of the first element. const Index patchOffsets[2] = {(indices[0] - patchIndex * m_patchStride) / m_fastOutputDepth, (indices[1] - patchIndex * m_patchStride) / m_fastOutputDepth}; const Index patch3DIndex = (NumDims == 5) ? patchIndex : (indices[0] - otherIndex * m_otherStride) / m_fastPatchStride; eigen_assert(patch3DIndex == (indices[1] - otherIndex * m_otherStride) / m_fastPatchStride); const Index colIndex = patch3DIndex / m_fastOutputPlanesRows; const Index colOffsets[2] = { patchOffsets[0] / m_fastColStride, patchOffsets[1] / m_fastColStride}; // Calculate col indices in the original input tensor. const Index inputCols[2] = { colIndex * m_col_strides + colOffsets[0] - m_colPaddingLeft, colIndex * m_col_strides + colOffsets[1] - m_colPaddingLeft}; if (inputCols[1] < 0 || inputCols[0] >= m_inputCols) { return internal::pset1(Scalar(m_paddingValue)); } if (inputCols[0] != inputCols[1]) { return packetWithPossibleZero(index); } const Index rowIndex = (patch3DIndex - colIndex * m_outputPlanesRows) / m_fastOutputPlanes; const Index rowOffsets[2] = { (patchOffsets[0] - colOffsets[0] * m_colStride) / m_fastRowStride, (patchOffsets[1] - colOffsets[1] * m_colStride) / m_fastRowStride}; eigen_assert(rowOffsets[0] <= rowOffsets[1]); // Calculate col indices in the original input tensor. const Index inputRows[2] = { rowIndex * m_row_strides + rowOffsets[0] - m_rowPaddingTop, rowIndex * m_row_strides + rowOffsets[1] - m_rowPaddingTop}; if (inputRows[1] < 0 || inputRows[0] >= m_inputRows) { return internal::pset1(Scalar(m_paddingValue)); } if (inputRows[0] != inputRows[1]) { return packetWithPossibleZero(index); } const Index planeIndex = (patch3DIndex - m_outputPlanes * (colIndex * m_outputRows + rowIndex)); const Index planeOffsets[2] = { patchOffsets[0] - colOffsets[0] * m_colStride - rowOffsets[0] * m_rowStride, patchOffsets[1] - colOffsets[1] * m_colStride - rowOffsets[1] * m_rowStride}; eigen_assert(planeOffsets[0] <= planeOffsets[1]); const Index inputPlanes[2] = { planeIndex * m_plane_strides + planeOffsets[0] - m_planePaddingTop, planeIndex * m_plane_strides + planeOffsets[1] - m_planePaddingTop}; if (inputPlanes[1] < 0 || inputPlanes[0] >= m_inputPlanes) { return internal::pset1(Scalar(m_paddingValue)); } if (inputPlanes[0] >= 0 && inputPlanes[1] < m_inputPlanes) { // no padding const int depth_index = static_cast(Layout) == static_cast(ColMajor) ? 0 : NumDims - 1; const Index depth = index - (index / m_fastOutputDepth) * m_dimensions[depth_index]; const Index inputIndex = depth + inputRows[0] * m_rowInputStride + inputCols[0] * m_colInputStride + m_planeInputStride * inputPlanes[0] + otherIndex * m_otherInputStride; return m_impl.template packet(inputIndex); } return packetWithPossibleZero(index); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost costPerCoeff(bool vectorized) const { const double compute_cost = 10 * TensorOpCost::DivCost() + 21 * TensorOpCost::MulCost() + 8 * TensorOpCost::AddCost(); return TensorOpCost(0, 0, compute_cost, vectorized, PacketSize); } EIGEN_DEVICE_FUNC EvaluatorPointerType data() const { return NULL; } const TensorEvaluator& impl() const { return m_impl; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index planePaddingTop() const { return m_planePaddingTop; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index rowPaddingTop() const { return m_rowPaddingTop; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index colPaddingLeft() const { return m_colPaddingLeft; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index outputPlanes() const { return m_outputPlanes; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index outputRows() const { return m_outputRows; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index outputCols() const { return m_outputCols; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index userPlaneStride() const { return m_plane_strides; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index userRowStride() const { return m_row_strides; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index userColStride() const { return m_col_strides; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index userInPlaneStride() const { return m_in_plane_strides; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index userInRowStride() const { return m_in_row_strides; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index userInColStride() const { return m_in_col_strides; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index planeInflateStride() const { return m_plane_inflate_strides; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index rowInflateStride() const { return m_row_inflate_strides; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index colInflateStride() const { return m_col_inflate_strides; } #ifdef EIGEN_USE_SYCL // binding placeholder accessors to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind(cl::sycl::handler &cgh) const { m_impl.bind(cgh); } #endif protected: EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packetWithPossibleZero(Index index) const { EIGEN_ALIGN_MAX typename internal::remove_const::type values[PacketSize]; EIGEN_UNROLL_LOOP for (int i = 0; i < PacketSize; ++i) { values[i] = coeff(index+i); } PacketReturnType rslt = internal::pload(values); return rslt; } Dimensions m_dimensions; // Parameters passed to the constructor. Index m_plane_strides; Index m_row_strides; Index m_col_strides; Index m_outputPlanes; Index m_outputRows; Index m_outputCols; Index m_planePaddingTop; Index m_rowPaddingTop; Index m_colPaddingLeft; Index m_in_plane_strides; Index m_in_row_strides; Index m_in_col_strides; Index m_plane_inflate_strides; Index m_row_inflate_strides; Index m_col_inflate_strides; // Cached input size. Index m_inputDepth; Index m_inputPlanes; Index m_inputRows; Index m_inputCols; // Other cached variables. Index m_outputPlanesRows; // Effective input/patch post-inflation size. Index m_input_planes_eff; Index m_input_rows_eff; Index m_input_cols_eff; Index m_patch_planes_eff; Index m_patch_rows_eff; Index m_patch_cols_eff; // Strides for the output tensor. Index m_otherStride; Index m_patchStride; Index m_rowStride; Index m_colStride; // Strides for the input tensor. Index m_planeInputStride; Index m_rowInputStride; Index m_colInputStride; Index m_otherInputStride; internal::TensorIntDivisor m_fastOtherStride; internal::TensorIntDivisor m_fastPatchStride; internal::TensorIntDivisor m_fastColStride; internal::TensorIntDivisor m_fastRowStride; internal::TensorIntDivisor m_fastInputPlaneStride; internal::TensorIntDivisor m_fastInputRowStride; internal::TensorIntDivisor m_fastInputColStride; internal::TensorIntDivisor m_fastInputColsEff; internal::TensorIntDivisor m_fastOutputPlanesRows; internal::TensorIntDivisor m_fastOutputPlanes; internal::TensorIntDivisor m_fastOutputDepth; Scalar m_paddingValue; TensorEvaluator m_impl; }; } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_VOLUME_PATCH_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorCostModel.h0000644000176200001440000002070214562417221026254 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2016 Rasmus Munk Larsen // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_COST_MODEL_H #define EIGEN_CXX11_TENSOR_TENSOR_COST_MODEL_H namespace Eigen { /** \class TensorEvaluator * \ingroup CXX11_Tensor_Module * * \brief A cost model used to limit the number of threads used for evaluating * tensor expression. * */ // Class storing the cost of evaluating a tensor expression in terms of the // estimated number of operand bytes loads, bytes stored, and compute cycles. class TensorOpCost { public: // TODO(rmlarsen): Fix the scalar op costs in Eigen proper. Even a simple // model based on minimal reciprocal throughput numbers from Intel or // Agner Fog's tables would be better than what is there now. template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE int MulCost() { return internal::functor_traits< internal::scalar_product_op >::Cost; } template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE int AddCost() { return internal::functor_traits >::Cost; } template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE int DivCost() { return internal::functor_traits< internal::scalar_quotient_op >::Cost; } template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE int ModCost() { return internal::functor_traits >::Cost; } template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE int CastCost() { return internal::functor_traits< internal::scalar_cast_op >::Cost; } EIGEN_DEVICE_FUNC TensorOpCost() : bytes_loaded_(0), bytes_stored_(0), compute_cycles_(0) {} EIGEN_DEVICE_FUNC TensorOpCost(double bytes_loaded, double bytes_stored, double compute_cycles) : bytes_loaded_(bytes_loaded), bytes_stored_(bytes_stored), compute_cycles_(compute_cycles) {} EIGEN_DEVICE_FUNC TensorOpCost(double bytes_loaded, double bytes_stored, double compute_cycles, bool vectorized, double packet_size) : bytes_loaded_(bytes_loaded), bytes_stored_(bytes_stored), compute_cycles_(vectorized ? compute_cycles / packet_size : compute_cycles) { eigen_assert(bytes_loaded >= 0 && (numext::isfinite)(bytes_loaded)); eigen_assert(bytes_stored >= 0 && (numext::isfinite)(bytes_stored)); eigen_assert(compute_cycles >= 0 && (numext::isfinite)(compute_cycles)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double bytes_loaded() const { return bytes_loaded_; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double bytes_stored() const { return bytes_stored_; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double compute_cycles() const { return compute_cycles_; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double total_cost( double load_cost, double store_cost, double compute_cost) const { return load_cost * bytes_loaded_ + store_cost * bytes_stored_ + compute_cost * compute_cycles_; } // Drop memory access component. Intended for cases when memory accesses are // sequential or are completely masked by computations. EIGEN_DEVICE_FUNC void dropMemoryCost() { bytes_loaded_ = 0; bytes_stored_ = 0; } // TODO(rmlarsen): Define min in terms of total cost, not elementwise. EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost cwiseMin( const TensorOpCost& rhs) const { double bytes_loaded = numext::mini(bytes_loaded_, rhs.bytes_loaded()); double bytes_stored = numext::mini(bytes_stored_, rhs.bytes_stored()); double compute_cycles = numext::mini(compute_cycles_, rhs.compute_cycles()); return TensorOpCost(bytes_loaded, bytes_stored, compute_cycles); } // TODO(rmlarsen): Define max in terms of total cost, not elementwise. EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost cwiseMax( const TensorOpCost& rhs) const { double bytes_loaded = numext::maxi(bytes_loaded_, rhs.bytes_loaded()); double bytes_stored = numext::maxi(bytes_stored_, rhs.bytes_stored()); double compute_cycles = numext::maxi(compute_cycles_, rhs.compute_cycles()); return TensorOpCost(bytes_loaded, bytes_stored, compute_cycles); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost& operator+=( const TensorOpCost& rhs) { bytes_loaded_ += rhs.bytes_loaded(); bytes_stored_ += rhs.bytes_stored(); compute_cycles_ += rhs.compute_cycles(); return *this; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorOpCost& operator*=(double rhs) { bytes_loaded_ *= rhs; bytes_stored_ *= rhs; compute_cycles_ *= rhs; return *this; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE friend TensorOpCost operator+( TensorOpCost lhs, const TensorOpCost& rhs) { lhs += rhs; return lhs; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE friend TensorOpCost operator*( TensorOpCost lhs, double rhs) { lhs *= rhs; return lhs; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE friend TensorOpCost operator*( double lhs, TensorOpCost rhs) { rhs *= lhs; return rhs; } friend std::ostream& operator<<(std::ostream& os, const TensorOpCost& tc) { return os << "[bytes_loaded = " << tc.bytes_loaded() << ", bytes_stored = " << tc.bytes_stored() << ", compute_cycles = " << tc.compute_cycles() << "]"; } private: double bytes_loaded_; double bytes_stored_; double compute_cycles_; }; // TODO(rmlarsen): Implement a policy that chooses an "optimal" number of theads // in [1:max_threads] instead of just switching multi-threading off for small // work units. template class TensorCostModel { public: // Scaling from Eigen compute cost to device cycles. static const int kDeviceCyclesPerComputeCycle = 1; // Costs in device cycles. static const int kStartupCycles = 100000; static const int kPerThreadCycles = 100000; static const int kTaskSize = 40000; // Returns the number of threads in [1:max_threads] to use for // evaluating an expression with the given output size and cost per // coefficient. static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE int numThreads( double output_size, const TensorOpCost& cost_per_coeff, int max_threads) { double cost = totalCost(output_size, cost_per_coeff); double threads = (cost - kStartupCycles) / kPerThreadCycles + 0.9; // Make sure we don't invoke undefined behavior when we convert to an int. threads = numext::mini(threads, GenericNumTraits::highest()); return numext::mini(max_threads, numext::maxi(1, static_cast(threads))); } // taskSize assesses parallel task size. // Value of 1.0 means ideal parallel task size. Values < 1.0 mean that task // granularity needs to be increased to mitigate parallelization overheads. static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double taskSize( double output_size, const TensorOpCost& cost_per_coeff) { return totalCost(output_size, cost_per_coeff) / kTaskSize; } static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double totalCost( double output_size, const TensorOpCost& cost_per_coeff) { // Cost of memory fetches from L2 cache. 64 is typical cache line size. // 11 is L2 cache latency on Haswell. // We don't know whether data is in L1, L2 or L3. But we are most interested // in single-threaded computational time around 100us-10ms (smaller time // is too small for parallelization, larger time is not interesting // either because we are probably using all available threads already). // And for the target time range, L2 seems to be what matters. Data set // fitting into L1 is too small to take noticeable time. Data set fitting // only into L3 presumably will take more than 10ms to load and process. const double kLoadCycles = 1.0 / 64 * 11; const double kStoreCycles = 1.0 / 64 * 11; // Scaling from Eigen compute cost to device cycles. return output_size * cost_per_coeff.total_cost(kLoadCycles, kStoreCycles, kDeviceCyclesPerComputeCycle); } }; } // namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_COST_MODEL_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorForwardDeclarations.h0000644000176200001440000002020014562417221030311 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11_TENSOR_TENSOR_FORWARD_DECLARATIONS_H #define EIGEN_CXX11_TENSOR_TENSOR_FORWARD_DECLARATIONS_H namespace Eigen { // MakePointer class is used as a container of the address space of the pointer // on the host and on the device. From the host side it generates the T* pointer // and when EIGEN_USE_SYCL is used it construct a buffer with a map_allocator to // T* m_data on the host. It is always called on the device. // Specialisation of MakePointer class for creating the sycl buffer with // map_allocator. template struct MakePointer { typedef T* Type; typedef const T* ConstType; }; template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T* constCast(const T* data) { return const_cast(data); } // The StorageMemory class is a container of the device specific pointer // used for refering to a Pointer on TensorEvaluator class. While the TensorExpression // is a device-agnostic type and need MakePointer class for type conversion, // the TensorEvaluator class can be specialized for a device, hence it is possible // to construct different types of temproray storage memory in TensorEvaluator // for different devices by specializing the following StorageMemory class. template struct StorageMemory: MakePointer {}; namespace internal{ template struct Pointer_type_promotion { static const bool val=false; }; template struct Pointer_type_promotion { static const bool val = true; }; template struct TypeConversion { typedef A* type; }; } template class MakePointer_ = MakePointer> class TensorMap; template class Tensor; template class TensorFixedSize; template class TensorRef; template class TensorBase; template class TensorCwiseNullaryOp; template class TensorCwiseUnaryOp; template class TensorCwiseBinaryOp; template class TensorCwiseTernaryOp; template class TensorSelectOp; template class MakePointer_ = MakePointer > class TensorReductionOp; template class TensorIndexTupleOp; template class TensorTupleReducerOp; template class TensorConcatenationOp; template class TensorContractionOp; template class TensorConversionOp; template class TensorConvolutionOp; template class TensorFFTOp; template class TensorPatchOp; template class TensorImagePatchOp; template class TensorVolumePatchOp; template class TensorBroadcastingOp; template class TensorChippingOp; template class TensorReshapingOp; template class TensorLayoutSwapOp; template class TensorSlicingOp; template class TensorReverseOp; template class TensorPaddingOp; template class TensorShufflingOp; template class TensorStridingOp; template class TensorStridingSlicingOp; template class TensorInflationOp; template class TensorGeneratorOp; template class TensorAssignOp; template class TensorScanOp; template class TensorTraceOp; template class TensorCustomUnaryOp; template class TensorCustomBinaryOp; template class MakePointer_ = MakePointer> class TensorEvalToOp; template class TensorForcedEvalOp; template class TensorDevice; template class TensorAsyncDevice; template struct TensorEvaluator; struct NoOpOutputKernel; struct DefaultDevice; struct ThreadPoolDevice; struct GpuDevice; struct SyclDevice; #ifdef EIGEN_USE_SYCL template struct MakeSYCLPointer { typedef Eigen::TensorSycl::internal::RangeAccess Type; }; template EIGEN_STRONG_INLINE const Eigen::TensorSycl::internal::RangeAccess& constCast(const Eigen::TensorSycl::internal::RangeAccess& data) { return data; } template struct StorageMemory : MakeSYCLPointer {}; template struct StorageMemory : StorageMemory {}; namespace TensorSycl { namespace internal{ template class GenericNondeterministicReducer; } } #endif enum FFTResultType { RealPart = 0, ImagPart = 1, BothParts = 2 }; enum FFTDirection { FFT_FORWARD = 0, FFT_REVERSE = 1 }; namespace internal { template struct IsVectorizable { static const bool value = TensorEvaluator::PacketAccess; }; template struct IsVectorizable { static const bool value = TensorEvaluator::PacketAccess && TensorEvaluator::IsAligned; }; // Tiled evaluation strategy. enum TiledEvaluation { Off = 0, // tiled evaluation is not supported On = 1, // still work in progress (see TensorBlock.h) }; template struct IsTileable { // Check that block evaluation is supported and it's a preferred option (at // least one sub-expression has much faster block evaluation, e.g. // broadcasting). static const bool BlockAccess = TensorEvaluator::BlockAccess && TensorEvaluator::PreferBlockAccess; static const TiledEvaluation value = BlockAccess ? TiledEvaluation::On : TiledEvaluation::Off; }; template ::value, TiledEvaluation Tiling = IsTileable::value> class TensorExecutor; template ::value, TiledEvaluation Tiling = IsTileable::value> class TensorAsyncExecutor; } // end namespace internal } // end namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_FORWARD_DECLARATIONS_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/Tensor/TensorReductionSycl.h0000644000176200001440000007257214562417221027166 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Mehdi Goli Codeplay Software Ltd. // Ralph Potter Codeplay Software Ltd. // Luke Iwanski Codeplay Software Ltd. // Contact: // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. /***************************************************************** * TensorReductionSycl.h * * \brief: * This is the specialization of the reduction operation. Two phase reduction approach * is used since the GPU does not have Global Synchronization for global memory among * different work-group/thread block. To solve the problem, we need to create two kernels * to reduce the data, where the first kernel reduce the data locally and each local * workgroup/thread-block save the input data into global memory. In the second phase (global reduction) * one work-group uses one work-group/thread-block to reduces the intermediate data into one single element. * Here is an NVIDIA presentation explaining the optimized two phase reduction algorithm on GPU: * https://developer.download.nvidia.com/assets/cuda/files/reduction.pdf * *****************************************************************/ #ifndef UNSUPPORTED_EIGEN_CXX11_SRC_TENSOR_TENSOR_REDUCTION_SYCL_HPP #define UNSUPPORTED_EIGEN_CXX11_SRC_TENSOR_TENSOR_REDUCTION_SYCL_HPP namespace Eigen { namespace TensorSycl { namespace internal { template struct OpDefiner { typedef typename Vectorise::PacketReturnType PacketReturnType; typedef Op type; static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE type get_op(Op &op) { return op; } static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType finalise_op(const PacketReturnType &accumulator, const Index &) { return accumulator; } }; template struct OpDefiner, CoeffReturnType, Index, false> { typedef Eigen::internal::SumReducer type; static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE type get_op(Eigen::internal::MeanReducer &) { return type(); } static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType finalise_op(const CoeffReturnType &accumulator, const Index &scale) { ::Eigen::internal::scalar_quotient_op quotient_op; return quotient_op(accumulator, CoeffReturnType(scale)); } }; template struct OpDefiner, CoeffReturnType, Index, true> { typedef typename Vectorise::PacketReturnType PacketReturnType; typedef Eigen::internal::SumReducer type; static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE type get_op(Eigen::internal::MeanReducer &) { return type(); } static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType finalise_op(const PacketReturnType &accumulator, const Index &scale) { return ::Eigen::internal::pdiv(accumulator, ::Eigen::internal::pset1(CoeffReturnType(scale))); } }; template struct SecondStepFullReducer { typedef cl::sycl::accessor LocalAccessor; typedef OpDefiner OpDef; typedef typename OpDef::type Op; LocalAccessor scratch; InputAccessor aI; OutputAccessor outAcc; Op op; SecondStepFullReducer(LocalAccessor scratch_, InputAccessor aI_, OutputAccessor outAcc_, OpType op_) : scratch(scratch_), aI(aI_), outAcc(outAcc_), op(OpDef::get_op(op_)) {} void operator()(cl::sycl::nd_item<1> itemID) { // Our empirical research shows that the best performance will be achieved // when there is only one element per thread to reduce in the second step. // in this step the second step reduction time is almost negligible. // Hence, in the second step of reduction the input size is fixed to the // local size, thus, there is only one element read per thread. The // algorithm must be changed if the number of reduce per thread in the // second step is greater than 1. Otherwise, the result will be wrong. const Index localid = itemID.get_local_id(0); auto aInPtr = aI.get_pointer() + localid; auto aOutPtr = outAcc.get_pointer(); CoeffReturnType *scratchptr = scratch.get_pointer(); CoeffReturnType accumulator = *aInPtr; scratchptr[localid] = op.finalize(accumulator); for (Index offset = itemID.get_local_range(0) / 2; offset > 0; offset /= 2) { itemID.barrier(cl::sycl::access::fence_space::local_space); if (localid < offset) { op.reduce(scratchptr[localid + offset], &accumulator); scratchptr[localid] = op.finalize(accumulator); } } if (localid == 0) *aOutPtr = op.finalize(accumulator); } }; // Full reduction first phase. In this version the vectorization is true and the reduction accept // any generic reducerOp e.g( max, min, sum, mean, iamax, iamin, etc ). template class FullReductionKernelFunctor { public: typedef typename Evaluator::CoeffReturnType CoeffReturnType; typedef typename Evaluator::Index Index; typedef OpDefiner OpDef; typedef typename OpDef::type Op; typedef typename Evaluator::EvaluatorPointerType EvaluatorPointerType; typedef typename Evaluator::PacketReturnType PacketReturnType; typedef typename ::Eigen::internal::conditional<(Evaluator::ReducerTraits::PacketAccess & Evaluator::InputPacketAccess), PacketReturnType, CoeffReturnType>::type OutType; typedef cl::sycl::accessor LocalAccessor; LocalAccessor scratch; Evaluator evaluator; EvaluatorPointerType final_output; Index rng; Op op; FullReductionKernelFunctor(LocalAccessor scratch_, Evaluator evaluator_, EvaluatorPointerType final_output_, Index rng_, OpType op_) : scratch(scratch_), evaluator(evaluator_), final_output(final_output_), rng(rng_), op(OpDef::get_op(op_)) {} void operator()(cl::sycl::nd_item<1> itemID) { compute_reduction(itemID); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename ::Eigen::internal::enable_if::type compute_reduction( const cl::sycl::nd_item<1> &itemID) { auto output_ptr = final_output.get_pointer(); Index VectorizedRange = (rng / Evaluator::PacketSize) * Evaluator::PacketSize; Index globalid = itemID.get_global_id(0); Index localid = itemID.get_local_id(0); Index step = Evaluator::PacketSize * itemID.get_global_range(0); Index start = Evaluator::PacketSize * globalid; // vectorizable parts PacketReturnType packetAccumulator = op.template initializePacket(); for (Index i = start; i < VectorizedRange; i += step) { op.template reducePacket(evaluator.impl().template packet(i), &packetAccumulator); } globalid += VectorizedRange; // non vectorizable parts for (Index i = globalid; i < rng; i += itemID.get_global_range(0)) { op.template reducePacket( ::Eigen::TensorSycl::internal::PacketWrapper::convert_to_packet_type( evaluator.impl().coeff(i), op.initialize()), &packetAccumulator); } scratch[localid] = packetAccumulator = OpDef::finalise_op(op.template finalizePacket(packetAccumulator), rng); // reduction parts // Local size is always power of 2 EIGEN_UNROLL_LOOP for (Index offset = local_range / 2; offset > 0; offset /= 2) { itemID.barrier(cl::sycl::access::fence_space::local_space); if (localid < offset) { op.template reducePacket(scratch[localid + offset], &packetAccumulator); scratch[localid] = op.template finalizePacket(packetAccumulator); } } if (localid == 0) { output_ptr[itemID.get_group(0)] = op.finalizeBoth(op.initialize(), op.template finalizePacket(packetAccumulator)); } } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename ::Eigen::internal::enable_if::type compute_reduction( const cl::sycl::nd_item<1> &itemID) { auto output_ptr = final_output.get_pointer(); Index globalid = itemID.get_global_id(0); Index localid = itemID.get_local_id(0); // vectorizable parts CoeffReturnType accumulator = op.initialize(); // non vectorizable parts for (Index i = globalid; i < rng; i += itemID.get_global_range(0)) { op.reduce(evaluator.impl().coeff(i), &accumulator); } scratch[localid] = accumulator = OpDef::finalise_op(op.finalize(accumulator), rng); // reduction parts. the local size is always power of 2 EIGEN_UNROLL_LOOP for (Index offset = local_range / 2; offset > 0; offset /= 2) { itemID.barrier(cl::sycl::access::fence_space::local_space); if (localid < offset) { op.reduce(scratch[localid + offset], &accumulator); scratch[localid] = op.finalize(accumulator); } } if (localid == 0) { output_ptr[itemID.get_group(0)] = op.finalize(accumulator); } } }; template class GenericNondeterministicReducer { public: typedef typename Evaluator::CoeffReturnType CoeffReturnType; typedef typename Evaluator::EvaluatorPointerType EvaluatorPointerType; typedef typename Evaluator::Index Index; typedef OpDefiner OpDef; typedef typename OpDef::type Op; template GenericNondeterministicReducer(Scratch, Evaluator evaluator_, EvaluatorPointerType output_accessor_, OpType functor_, Index range_, Index num_values_to_reduce_) : evaluator(evaluator_), output_accessor(output_accessor_), functor(OpDef::get_op(functor_)), range(range_), num_values_to_reduce(num_values_to_reduce_) {} void operator()(cl::sycl::nd_item<1> itemID) { auto output_accessor_ptr = output_accessor.get_pointer(); /// const cast added as a naive solution to solve the qualifier drop error Index globalid = static_cast(itemID.get_global_linear_id()); if (globalid < range) { CoeffReturnType accum = functor.initialize(); Eigen::internal::GenericDimReducer::reduce( evaluator, evaluator.firstInput(globalid), functor, &accum); output_accessor_ptr[globalid] = OpDef::finalise_op(functor.finalize(accum), num_values_to_reduce); } } private: Evaluator evaluator; EvaluatorPointerType output_accessor; Op functor; Index range; Index num_values_to_reduce; }; enum class reduction_dim { inner_most, outer_most }; // default is preserver template struct PartialReductionKernel { typedef typename Evaluator::CoeffReturnType CoeffReturnType; typedef typename Evaluator::EvaluatorPointerType EvaluatorPointerType; typedef typename Evaluator::Index Index; typedef OpDefiner OpDef; typedef typename OpDef::type Op; typedef cl::sycl::accessor ScratchAcc; ScratchAcc scratch; Evaluator evaluator; EvaluatorPointerType output_accessor; Op op; const Index preserve_elements_num_groups; const Index reduce_elements_num_groups; const Index num_coeffs_to_preserve; const Index num_coeffs_to_reduce; PartialReductionKernel(ScratchAcc scratch_, Evaluator evaluator_, EvaluatorPointerType output_accessor_, OpType op_, const Index preserve_elements_num_groups_, const Index reduce_elements_num_groups_, const Index num_coeffs_to_preserve_, const Index num_coeffs_to_reduce_) : scratch(scratch_), evaluator(evaluator_), output_accessor(output_accessor_), op(OpDef::get_op(op_)), preserve_elements_num_groups(preserve_elements_num_groups_), reduce_elements_num_groups(reduce_elements_num_groups_), num_coeffs_to_preserve(num_coeffs_to_preserve_), num_coeffs_to_reduce(num_coeffs_to_reduce_) {} EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void element_wise_reduce(Index globalRId, Index globalPId, CoeffReturnType &accumulator) { if (globalPId >= num_coeffs_to_preserve) { return; } Index global_offset = rt == reduction_dim::outer_most ? globalPId + (globalRId * num_coeffs_to_preserve) : globalRId + (globalPId * num_coeffs_to_reduce); Index localOffset = globalRId; const Index per_thread_local_stride = PannelParameters::LocalThreadSizeR * reduce_elements_num_groups; const Index per_thread_global_stride = rt == reduction_dim::outer_most ? num_coeffs_to_preserve * per_thread_local_stride : per_thread_local_stride; for (Index i = globalRId; i < num_coeffs_to_reduce; i += per_thread_local_stride) { op.reduce(evaluator.impl().coeff(global_offset), &accumulator); localOffset += per_thread_local_stride; global_offset += per_thread_global_stride; } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void operator()(cl::sycl::nd_item<1> itemID) { const Index linearLocalThreadId = itemID.get_local_id(0); Index pLocalThreadId = rt == reduction_dim::outer_most ? linearLocalThreadId % PannelParameters::LocalThreadSizeP : linearLocalThreadId / PannelParameters::LocalThreadSizeR; Index rLocalThreadId = rt == reduction_dim::outer_most ? linearLocalThreadId / PannelParameters::LocalThreadSizeP : linearLocalThreadId % PannelParameters::LocalThreadSizeR; const Index pGroupId = rt == reduction_dim::outer_most ? itemID.get_group(0) % preserve_elements_num_groups : itemID.get_group(0) / reduce_elements_num_groups; const Index rGroupId = rt == reduction_dim::outer_most ? itemID.get_group(0) / preserve_elements_num_groups : itemID.get_group(0) % reduce_elements_num_groups; Index globalPId = pGroupId * PannelParameters::LocalThreadSizeP + pLocalThreadId; const Index globalRId = rGroupId * PannelParameters::LocalThreadSizeR + rLocalThreadId; auto scratchPtr = scratch.get_pointer().get(); auto outPtr = output_accessor.get_pointer() + (reduce_elements_num_groups > 1 ? rGroupId * num_coeffs_to_preserve : 0); CoeffReturnType accumulator = op.initialize(); element_wise_reduce(globalRId, globalPId, accumulator); accumulator = OpDef::finalise_op(op.finalize(accumulator), num_coeffs_to_reduce); scratchPtr[pLocalThreadId + rLocalThreadId * (PannelParameters::LocalThreadSizeP + PannelParameters::BC)] = accumulator; if (rt == reduction_dim::inner_most) { pLocalThreadId = linearLocalThreadId % PannelParameters::LocalThreadSizeP; rLocalThreadId = linearLocalThreadId / PannelParameters::LocalThreadSizeP; globalPId = pGroupId * PannelParameters::LocalThreadSizeP + pLocalThreadId; } /* Apply the reduction operation between the current local * id and the one on the other half of the vector. */ auto out_scratch_ptr = scratchPtr + (pLocalThreadId + (rLocalThreadId * (PannelParameters::LocalThreadSizeP + PannelParameters::BC))); itemID.barrier(cl::sycl::access::fence_space::local_space); if (rt == reduction_dim::inner_most) { accumulator = *out_scratch_ptr; } // The Local LocalThreadSizeR is always power of 2 EIGEN_UNROLL_LOOP for (Index offset = PannelParameters::LocalThreadSizeR >> 1; offset > 0; offset >>= 1) { if (rLocalThreadId < offset) { op.reduce(out_scratch_ptr[(PannelParameters::LocalThreadSizeP + PannelParameters::BC) * offset], &accumulator); // The result has already been divided for mean reducer in the // previous reduction so no need to divide furthermore *out_scratch_ptr = op.finalize(accumulator); } /* All threads collectively read from global memory into local. * The barrier ensures all threads' IO is resolved before * execution continues (strictly speaking, all threads within * a single work-group - there is no co-ordination between * work-groups, only work-items). */ itemID.barrier(cl::sycl::access::fence_space::local_space); } if (rLocalThreadId == 0 && (globalPId < num_coeffs_to_preserve)) { outPtr[globalPId] = op.finalize(accumulator); } } }; template struct SecondStepPartialReduction { typedef OpDefiner OpDef; typedef typename OpDef::type Op; typedef cl::sycl::accessor ScratchAccessor; InputAccessor input_accessor; OutputAccessor output_accessor; Op op; const Index num_coeffs_to_preserve; const Index num_coeffs_to_reduce; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE SecondStepPartialReduction(ScratchAccessor, InputAccessor input_accessor_, OutputAccessor output_accessor_, OpType op_, const Index num_coeffs_to_preserve_, const Index num_coeffs_to_reduce_) : input_accessor(input_accessor_), output_accessor(output_accessor_), op(OpDef::get_op(op_)), num_coeffs_to_preserve(num_coeffs_to_preserve_), num_coeffs_to_reduce(num_coeffs_to_reduce_) {} EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void operator()(cl::sycl::nd_item<1> itemID) { const Index globalId = itemID.get_global_id(0); if (globalId >= num_coeffs_to_preserve) return; auto in_ptr = input_accessor.get_pointer() + globalId; OutScalar accumulator = op.initialize(); // num_coeffs_to_reduce is not bigger that 256 for (Index i = 0; i < num_coeffs_to_reduce; i++) { op.reduce(*in_ptr, &accumulator); in_ptr += num_coeffs_to_preserve; } output_accessor.get_pointer()[globalId] = op.finalize(accumulator); } }; // namespace internal template struct ReductionPannel { static EIGEN_CONSTEXPR Index LocalThreadSizeP = LTP; static EIGEN_CONSTEXPR Index LocalThreadSizeR = LTR; static EIGEN_CONSTEXPR bool BC = BC_; }; template struct PartialReducerLauncher { typedef typename Self::EvaluatorPointerType EvaluatorPointerType; typedef typename Self::CoeffReturnType CoeffReturnType; typedef typename Self::Storage Storage; typedef typename Self::Index Index; typedef ReductionPannel PannelParameters; typedef PartialReductionKernel SyclReducerKerneType; static bool run(const Self &self, const Op &reducer, const Eigen::SyclDevice &dev, EvaluatorPointerType output, Index num_coeffs_to_reduce, Index num_coeffs_to_preserve) { Index roundUpP = roundUp(num_coeffs_to_preserve, PannelParameters::LocalThreadSizeP); // getPowerOfTwo makes sure local range is power of 2 and <= // maxSyclThreadPerBlock this will help us to avoid extra check on the // kernel static_assert(!((PannelParameters::LocalThreadSizeP * PannelParameters::LocalThreadSizeR) & (PannelParameters::LocalThreadSizeP * PannelParameters::LocalThreadSizeR - 1)), "The Local thread size must be a power of 2 for the reduction " "operation"); EIGEN_CONSTEXPR Index localRange = PannelParameters::LocalThreadSizeP * PannelParameters::LocalThreadSizeR; // In this step, we force the code not to be more than 2-step reduction: // Our empirical research shows that if each thread reduces at least 64 // elemnts individually, we get better performance. However, this can change // on different platforms. In this step we force the code not to be // morthan step reduction: Our empirical research shows that for inner_most // dim reducer, it is better to have 8 group in a reduce dimension for sizes // > 1024 to achieve the best performance. const Index reductionPerThread = 64; Index cu = dev.getPowerOfTwo(dev.getNumSyclMultiProcessors(), true); const Index pNumGroups = roundUpP / PannelParameters::LocalThreadSizeP; Index rGroups = (cu + pNumGroups - 1) / pNumGroups; const Index rNumGroups = num_coeffs_to_reduce > reductionPerThread * localRange ? std::min(rGroups, localRange) : 1; const Index globalRange = pNumGroups * rNumGroups * localRange; EIGEN_CONSTEXPR Index scratchSize = PannelParameters::LocalThreadSizeR * (PannelParameters::LocalThreadSizeP + PannelParameters::BC); auto thread_range = cl::sycl::nd_range<1>(cl::sycl::range<1>(globalRange), cl::sycl::range<1>(localRange)); if (rNumGroups > 1) { CoeffReturnType *temp_pointer = static_cast( dev.allocate_temp(num_coeffs_to_preserve * rNumGroups * sizeof(CoeffReturnType))); EvaluatorPointerType temp_accessor = dev.get(temp_pointer); dev.template unary_kernel_launcher( self, temp_accessor, thread_range, scratchSize, reducer, pNumGroups, rNumGroups, num_coeffs_to_preserve, num_coeffs_to_reduce); typedef SecondStepPartialReduction SecondStepPartialReductionKernel; dev.template unary_kernel_launcher( temp_accessor, output, cl::sycl::nd_range<1>(cl::sycl::range<1>(pNumGroups * localRange), cl::sycl::range<1>(localRange)), Index(1), reducer, num_coeffs_to_preserve, rNumGroups); self.device().deallocate_temp(temp_pointer); } else { dev.template unary_kernel_launcher( self, output, thread_range, scratchSize, reducer, pNumGroups, rNumGroups, num_coeffs_to_preserve, num_coeffs_to_reduce); } return false; } }; } // namespace internal } // namespace TensorSycl namespace internal { template struct FullReducer { typedef typename Self::CoeffReturnType CoeffReturnType; typedef typename Self::EvaluatorPointerType EvaluatorPointerType; static EIGEN_CONSTEXPR bool HasOptimizedImplementation = true; static EIGEN_CONSTEXPR int PacketSize = Self::PacketAccess ? Self::PacketSize : 1; static void run(const Self &self, Op &reducer, const Eigen::SyclDevice &dev, EvaluatorPointerType data) { typedef typename conditional::type OutType; static_assert(!((EIGEN_SYCL_LOCAL_THREAD_DIM0 * EIGEN_SYCL_LOCAL_THREAD_DIM1) & (EIGEN_SYCL_LOCAL_THREAD_DIM0 * EIGEN_SYCL_LOCAL_THREAD_DIM1 - 1)), "The Local thread size must be a power of 2 for the reduction " "operation"); EIGEN_CONSTEXPR Index local_range = EIGEN_SYCL_LOCAL_THREAD_DIM0 * EIGEN_SYCL_LOCAL_THREAD_DIM1; typename Self::Index inputSize = self.impl().dimensions().TotalSize(); // In this step we force the code not to be more than 2-step reduction: // Our empirical research shows that if each thread reduces at least 512 // elemnts individually, we get better performance. const Index reductionPerThread = 2048; // const Index num_work_group = Index reductionGroup = dev.getPowerOfTwo( (inputSize + (reductionPerThread * local_range - 1)) / (reductionPerThread * local_range), true); const Index num_work_group = std::min(reductionGroup, local_range); // 1 // ? local_range // : 1); const Index global_range = num_work_group * local_range; auto thread_range = cl::sycl::nd_range<1>(cl::sycl::range<1>(global_range), cl::sycl::range<1>(local_range)); typedef TensorSycl::internal::FullReductionKernelFunctor reduction_kernel_t; if (num_work_group > 1) { CoeffReturnType *temp_pointer = static_cast(dev.allocate_temp(num_work_group * sizeof(CoeffReturnType))); typename Self::EvaluatorPointerType tmp_global_accessor = dev.get(temp_pointer); dev.template unary_kernel_launcher(self, tmp_global_accessor, thread_range, local_range, inputSize, reducer); typedef TensorSycl::internal::SecondStepFullReducer GenericRKernel; dev.template unary_kernel_launcher( tmp_global_accessor, data, cl::sycl::nd_range<1>(cl::sycl::range<1>(num_work_group), cl::sycl::range<1>(num_work_group)), num_work_group, reducer); dev.deallocate_temp(temp_pointer); } else { dev.template unary_kernel_launcher(self, data, thread_range, local_range, inputSize, reducer); } } }; // vectorizable inner_most most dim preserver // col reduction template struct OuterReducer { static EIGEN_CONSTEXPR bool HasOptimizedImplementation = true; static bool run(const Self &self, const Op &reducer, const Eigen::SyclDevice &dev, typename Self::EvaluatorPointerType output, typename Self::Index num_coeffs_to_reduce, typename Self::Index num_coeffs_to_preserve) { return ::Eigen::TensorSycl::internal::PartialReducerLauncher< Self, Op, ::Eigen::TensorSycl::internal::reduction_dim::outer_most>::run(self, reducer, dev, output, num_coeffs_to_reduce, num_coeffs_to_preserve); } }; // row reduction template struct InnerReducer { static EIGEN_CONSTEXPR bool HasOptimizedImplementation = true; static bool run(const Self &self, const Op &reducer, const Eigen::SyclDevice &dev, typename Self::EvaluatorPointerType output, typename Self::Index num_coeffs_to_reduce, typename Self::Index num_coeffs_to_preserve) { return ::Eigen::TensorSycl::internal::PartialReducerLauncher< Self, Op, ::Eigen::TensorSycl::internal::reduction_dim::inner_most>::run(self, reducer, dev, output, num_coeffs_to_reduce, num_coeffs_to_preserve); } }; // ArmgMax uses this kernel for partial reduction// // TODO(@mehdi.goli) come up with a better kernel // generic partial reduction template struct GenericReducer { static EIGEN_CONSTEXPR bool HasOptimizedImplementation = false; static bool run(const Self &self, const Op &reducer, const Eigen::SyclDevice &dev, typename Self::EvaluatorPointerType output, typename Self::Index num_values_to_reduce, typename Self::Index num_coeffs_to_preserve) { typename Self::Index range, GRange, tileSize; dev.parallel_for_setup(num_coeffs_to_preserve, tileSize, range, GRange); dev.template unary_kernel_launcher>( self, output, cl::sycl::nd_range<1>(cl::sycl::range<1>(GRange), cl::sycl::range<1>(tileSize)), Index(1), reducer, range, (num_values_to_reduce != 0) ? num_values_to_reduce : static_cast(1)); return false; } }; } // namespace internal } // namespace Eigen #endif // UNSUPPORTED_EIGEN_CXX11_SRC_TENSOR_TENSOR_REDUCTION_SYCL_HPP RcppEigen/inst/include/unsupported/Eigen/CXX11/src/util/0000755000176200001440000000000014562417221022521 5ustar liggesusersRcppEigen/inst/include/unsupported/Eigen/CXX11/src/util/MaxSizeVector.h0000644000176200001440000001011614562417221025434 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_FIXEDSIZEVECTOR_H #define EIGEN_FIXEDSIZEVECTOR_H namespace Eigen { /** \class MaxSizeVector * \ingroup Core * * \brief The MaxSizeVector class. * * The %MaxSizeVector provides a subset of std::vector functionality. * * The goal is to provide basic std::vector operations when using * std::vector is not an option (e.g. on GPU or when compiling using * FMA/AVX, as this can cause either compilation failures or illegal * instruction failures). * * Beware: The constructors are not API compatible with these of * std::vector. */ template class MaxSizeVector { static const size_t alignment = EIGEN_PLAIN_ENUM_MAX(EIGEN_ALIGNOF(T), sizeof(void*)); public: // Construct a new MaxSizeVector, reserve n elements. EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit MaxSizeVector(size_t n) : reserve_(n), size_(0), data_(static_cast(internal::handmade_aligned_malloc(n * sizeof(T), alignment))) { } // Construct a new MaxSizeVector, reserve and resize to n. // Copy the init value to all elements. EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE MaxSizeVector(size_t n, const T& init) : reserve_(n), size_(n), data_(static_cast(internal::handmade_aligned_malloc(n * sizeof(T), alignment))) { size_t i = 0; EIGEN_TRY { for(; i < size_; ++i) { new (&data_[i]) T(init); } } EIGEN_CATCH(...) { // Construction failed, destruct in reverse order: for(; (i+1) > 0; --i) { data_[i-1].~T(); } internal::handmade_aligned_free(data_); EIGEN_THROW; } } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ~MaxSizeVector() { for (size_t i = size_; i > 0; --i) { data_[i-1].~T(); } internal::handmade_aligned_free(data_); } void resize(size_t n) { eigen_assert(n <= reserve_); for (; size_ < n; ++size_) { new (&data_[size_]) T; } for (; size_ > n; --size_) { data_[size_-1].~T(); } eigen_assert(size_ == n); } // Append new elements (up to reserved size). EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void push_back(const T& t) { eigen_assert(size_ < reserve_); new (&data_[size_++]) T(t); } // For C++03 compatibility this only takes one argument template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void emplace_back(const X& x) { eigen_assert(size_ < reserve_); new (&data_[size_++]) T(x); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const T& operator[] (size_t i) const { eigen_assert(i < size_); return data_[i]; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T& operator[] (size_t i) { eigen_assert(i < size_); return data_[i]; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T& back() { eigen_assert(size_ > 0); return data_[size_ - 1]; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const T& back() const { eigen_assert(size_ > 0); return data_[size_ - 1]; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pop_back() { eigen_assert(size_ > 0); data_[--size_].~T(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE size_t size() const { return size_; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool empty() const { return size_ == 0; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T* data() { return data_; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const T* data() const { return data_; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T* begin() { return data_; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T* end() { return data_ + size_; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const T* begin() const { return data_; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const T* end() const { return data_ + size_; } private: size_t reserve_; size_t size_; T* data_; }; } // namespace Eigen #endif // EIGEN_FIXEDSIZEVECTOR_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/util/CXX11Meta.h0000644000176200001440000005434014562417221024313 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2013 Christian Seiler // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11META_H #define EIGEN_CXX11META_H #include #include "EmulateArray.h" #include "CXX11Workarounds.h" namespace Eigen { namespace internal { /** \internal * \file CXX11/util/CXX11Meta.h * This file contains generic metaprogramming classes which are not specifically related to Eigen. * This file expands upon Core/util/Meta.h and adds support for C++11 specific features. */ template struct type_list { constexpr static int count = sizeof...(tt); }; template struct type_list { constexpr static int count = sizeof...(tt) + 1; typedef t first_type; }; template struct numeric_list { constexpr static std::size_t count = sizeof...(nn); }; template struct numeric_list { static const std::size_t count = sizeof...(nn) + 1; const static T first_value = n; }; #ifndef EIGEN_PARSED_BY_DOXYGEN /* numeric list constructors * * equivalencies: * constructor result * typename gen_numeric_list::type numeric_list * typename gen_numeric_list_reversed::type numeric_list * typename gen_numeric_list_swapped_pair::type numeric_list * typename gen_numeric_list_repeated::type numeric_list */ template struct gen_numeric_list : gen_numeric_list {}; template struct gen_numeric_list { typedef numeric_list type; }; template struct gen_numeric_list_reversed : gen_numeric_list_reversed {}; template struct gen_numeric_list_reversed { typedef numeric_list type; }; template struct gen_numeric_list_swapped_pair : gen_numeric_list_swapped_pair {}; template struct gen_numeric_list_swapped_pair { typedef numeric_list type; }; template struct gen_numeric_list_repeated : gen_numeric_list_repeated {}; template struct gen_numeric_list_repeated { typedef numeric_list type; }; /* list manipulation: concatenate */ template struct concat; template struct concat, type_list> { typedef type_list type; }; template struct concat, numeric_list > { typedef numeric_list type; }; template struct mconcat; template struct mconcat { typedef a type; }; template struct mconcat : concat {}; template struct mconcat : concat::type> {}; /* list manipulation: extract slices */ template struct take; template struct take> : concat, typename take>::type> {}; template struct take> { typedef type_list<> type; }; template struct take<0, type_list> { typedef type_list<> type; }; template<> struct take<0, type_list<>> { typedef type_list<> type; }; template struct take> : concat, typename take>::type> {}; template struct take> { typedef numeric_list type; }; template struct take<0, numeric_list> { typedef numeric_list type; }; template struct take<0, numeric_list> { typedef numeric_list type; }; template struct h_skip_helper_numeric; template struct h_skip_helper_numeric : h_skip_helper_numeric {}; template struct h_skip_helper_numeric { typedef numeric_list type; }; template struct h_skip_helper_numeric { typedef numeric_list type; }; template struct h_skip_helper_numeric { typedef numeric_list type; }; template struct h_skip_helper_type; template struct h_skip_helper_type : h_skip_helper_type {}; template struct h_skip_helper_type<0, t, tt...> { typedef type_list type; }; template struct h_skip_helper_type { typedef type_list<> type; }; template<> struct h_skip_helper_type<0> { typedef type_list<> type; }; #endif //not EIGEN_PARSED_BY_DOXYGEN template struct h_skip { template constexpr static EIGEN_STRONG_INLINE typename h_skip_helper_numeric::type helper(numeric_list) { return typename h_skip_helper_numeric::type(); } template constexpr static EIGEN_STRONG_INLINE typename h_skip_helper_type::type helper(type_list) { return typename h_skip_helper_type::type(); } }; template struct skip { typedef decltype(h_skip::helper(a())) type; }; template struct slice : take::type> {}; /* list manipulation: retrieve single element from list */ template struct get; template struct get> : get> {}; template struct get<0, type_list> { typedef a type; }; template struct get> : get> {}; template struct get<0, numeric_list> { constexpr static T value = a; }; template constexpr T array_get(const numeric_list&) { return get<(int)n, numeric_list>::value; } /* always get type, regardless of dummy; good for parameter pack expansion */ template struct id_numeric { typedef t type; }; template struct id_type { typedef t type; }; /* equality checking, flagged version */ template struct is_same_gf : is_same { constexpr static int global_flags = 0; }; /* apply_op to list */ template< bool from_left, // false template class op, typename additional_param, typename... values > struct h_apply_op_helper { typedef type_list::type...> type; }; template< template class op, typename additional_param, typename... values > struct h_apply_op_helper { typedef type_list::type...> type; }; template< bool from_left, template class op, typename additional_param > struct h_apply_op { template constexpr static typename h_apply_op_helper::type helper(type_list) { return typename h_apply_op_helper::type(); } }; template< template class op, typename additional_param, typename a > struct apply_op_from_left { typedef decltype(h_apply_op::helper(a())) type; }; template< template class op, typename additional_param, typename a > struct apply_op_from_right { typedef decltype(h_apply_op::helper(a())) type; }; /* see if an element is in a list */ template< template class test, typename check_against, typename h_list, bool last_check_positive = false > struct contained_in_list; template< template class test, typename check_against, typename h_list > struct contained_in_list { constexpr static bool value = true; }; template< template class test, typename check_against, typename a, typename... as > struct contained_in_list, false> : contained_in_list, test::value> {}; template< template class test, typename check_against EIGEN_TPL_PP_SPEC_HACK_DEFC(typename, empty) > struct contained_in_list, false> { constexpr static bool value = false; }; /* see if an element is in a list and check for global flags */ template< template class test, typename check_against, typename h_list, int default_flags = 0, bool last_check_positive = false, int last_check_flags = default_flags > struct contained_in_list_gf; template< template class test, typename check_against, typename h_list, int default_flags, int last_check_flags > struct contained_in_list_gf { constexpr static bool value = true; constexpr static int global_flags = last_check_flags; }; template< template class test, typename check_against, typename a, typename... as, int default_flags, int last_check_flags > struct contained_in_list_gf, default_flags, false, last_check_flags> : contained_in_list_gf, default_flags, test::value, test::global_flags> {}; template< template class test, typename check_against EIGEN_TPL_PP_SPEC_HACK_DEFC(typename, empty), int default_flags, int last_check_flags > struct contained_in_list_gf, default_flags, false, last_check_flags> { constexpr static bool value = false; constexpr static int global_flags = default_flags; }; /* generic reductions */ template< typename Reducer, typename... Ts > struct reduce; template< typename Reducer > struct reduce { EIGEN_DEVICE_FUNC constexpr static EIGEN_STRONG_INLINE int run() { return Reducer::Identity; } }; template< typename Reducer, typename A > struct reduce { EIGEN_DEVICE_FUNC constexpr static EIGEN_STRONG_INLINE A run(A a) { return a; } }; template< typename Reducer, typename A, typename... Ts > struct reduce { EIGEN_DEVICE_FUNC constexpr static EIGEN_STRONG_INLINE auto run(A a, Ts... ts) -> decltype(Reducer::run(a, reduce::run(ts...))) { return Reducer::run(a, reduce::run(ts...)); } }; /* generic binary operations */ struct sum_op { template EIGEN_DEVICE_FUNC constexpr static EIGEN_STRONG_INLINE auto run(A a, B b) -> decltype(a + b) { return a + b; } static constexpr int Identity = 0; }; struct product_op { template EIGEN_DEVICE_FUNC constexpr static EIGEN_STRONG_INLINE auto run(A a, B b) -> decltype(a * b) { return a * b; } static constexpr int Identity = 1; }; struct logical_and_op { template constexpr static EIGEN_STRONG_INLINE auto run(A a, B b) -> decltype(a && b) { return a && b; } }; struct logical_or_op { template constexpr static EIGEN_STRONG_INLINE auto run(A a, B b) -> decltype(a || b) { return a || b; } }; struct equal_op { template constexpr static EIGEN_STRONG_INLINE auto run(A a, B b) -> decltype(a == b) { return a == b; } }; struct not_equal_op { template constexpr static EIGEN_STRONG_INLINE auto run(A a, B b) -> decltype(a != b) { return a != b; } }; struct lesser_op { template constexpr static EIGEN_STRONG_INLINE auto run(A a, B b) -> decltype(a < b) { return a < b; } }; struct lesser_equal_op { template constexpr static EIGEN_STRONG_INLINE auto run(A a, B b) -> decltype(a <= b) { return a <= b; } }; struct greater_op { template constexpr static EIGEN_STRONG_INLINE auto run(A a, B b) -> decltype(a > b) { return a > b; } }; struct greater_equal_op { template constexpr static EIGEN_STRONG_INLINE auto run(A a, B b) -> decltype(a >= b) { return a >= b; } }; /* generic unary operations */ struct not_op { template constexpr static EIGEN_STRONG_INLINE auto run(A a) -> decltype(!a) { return !a; } }; struct negation_op { template constexpr static EIGEN_STRONG_INLINE auto run(A a) -> decltype(-a) { return -a; } }; struct greater_equal_zero_op { template constexpr static EIGEN_STRONG_INLINE auto run(A a) -> decltype(a >= 0) { return a >= 0; } }; /* reductions for lists */ // using auto -> return value spec makes ICC 13.0 and 13.1 crash here, so we have to hack it // together in front... (13.0 doesn't work with array_prod/array_reduce/... anyway, but 13.1 // does... template EIGEN_DEVICE_FUNC constexpr EIGEN_STRONG_INLINE decltype(reduce::run((*((Ts*)0))...)) arg_prod(Ts... ts) { return reduce::run(ts...); } template constexpr EIGEN_STRONG_INLINE decltype(reduce::run((*((Ts*)0))...)) arg_sum(Ts... ts) { return reduce::run(ts...); } /* reverse arrays */ template constexpr EIGEN_STRONG_INLINE Array h_array_reverse(Array arr, numeric_list) { return {{array_get(arr)...}}; } template constexpr EIGEN_STRONG_INLINE array array_reverse(array arr) { return h_array_reverse(arr, typename gen_numeric_list::type()); } /* generic array reductions */ // can't reuse standard reduce() interface above because Intel's Compiler // *really* doesn't like it, so we just reimplement the stuff // (start from N - 1 and work down to 0 because specialization for // n == N - 1 also doesn't work in Intel's compiler, so it goes into // an infinite loop) template struct h_array_reduce { EIGEN_DEVICE_FUNC constexpr static EIGEN_STRONG_INLINE auto run(array arr, T identity) -> decltype(Reducer::run(h_array_reduce::run(arr, identity), array_get(arr))) { return Reducer::run(h_array_reduce::run(arr, identity), array_get(arr)); } }; template struct h_array_reduce { EIGEN_DEVICE_FUNC constexpr static EIGEN_STRONG_INLINE T run(const array& arr, T) { return array_get<0>(arr); } }; template struct h_array_reduce { EIGEN_DEVICE_FUNC constexpr static EIGEN_STRONG_INLINE T run(const array&, T identity) { return identity; } }; template EIGEN_DEVICE_FUNC constexpr EIGEN_STRONG_INLINE auto array_reduce(const array& arr, T identity) -> decltype(h_array_reduce::run(arr, identity)) { return h_array_reduce::run(arr, identity); } /* standard array reductions */ template EIGEN_DEVICE_FUNC constexpr EIGEN_STRONG_INLINE auto array_sum(const array& arr) -> decltype(array_reduce(arr, static_cast(0))) { return array_reduce(arr, static_cast(0)); } template EIGEN_DEVICE_FUNC constexpr EIGEN_STRONG_INLINE auto array_prod(const array& arr) -> decltype(array_reduce(arr, static_cast(1))) { return array_reduce(arr, static_cast(1)); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE t array_prod(const std::vector& a) { eigen_assert(a.size() > 0); t prod = 1; for (size_t i = 0; i < a.size(); ++i) { prod *= a[i]; } return prod; } /* zip an array */ template constexpr EIGEN_STRONG_INLINE array h_array_zip(array a, array b, numeric_list) { return array{{ Op::run(array_get(a), array_get(b))... }}; } template constexpr EIGEN_STRONG_INLINE array array_zip(array a, array b) { return h_array_zip(a, b, typename gen_numeric_list::type()); } /* zip an array and reduce the result */ template constexpr EIGEN_STRONG_INLINE auto h_array_zip_and_reduce(array a, array b, numeric_list) -> decltype(reduce::type...>::run(Op::run(array_get(a), array_get(b))...)) { return reduce::type...>::run(Op::run(array_get(a), array_get(b))...); } template constexpr EIGEN_STRONG_INLINE auto array_zip_and_reduce(array a, array b) -> decltype(h_array_zip_and_reduce(a, b, typename gen_numeric_list::type())) { return h_array_zip_and_reduce(a, b, typename gen_numeric_list::type()); } /* apply stuff to an array */ template constexpr EIGEN_STRONG_INLINE array h_array_apply(array a, numeric_list) { return array{{ Op::run(array_get(a))... }}; } template constexpr EIGEN_STRONG_INLINE array array_apply(array a) { return h_array_apply(a, typename gen_numeric_list::type()); } /* apply stuff to an array and reduce */ template constexpr EIGEN_STRONG_INLINE auto h_array_apply_and_reduce(array arr, numeric_list) -> decltype(reduce::type...>::run(Op::run(array_get(arr))...)) { return reduce::type...>::run(Op::run(array_get(arr))...); } template constexpr EIGEN_STRONG_INLINE auto array_apply_and_reduce(array a) -> decltype(h_array_apply_and_reduce(a, typename gen_numeric_list::type())) { return h_array_apply_and_reduce(a, typename gen_numeric_list::type()); } /* repeat a value n times (and make an array out of it * usage: * array = repeat<16>(42); */ template struct h_repeat { template constexpr static EIGEN_STRONG_INLINE array run(t v, numeric_list) { return {{ typename id_numeric::type(v)... }}; } }; template constexpr array repeat(t v) { return h_repeat::run(v, typename gen_numeric_list::type()); } /* instantiate a class by a C-style array */ template struct h_instantiate_by_c_array; template struct h_instantiate_by_c_array { static InstType run(ArrType* arr, Ps... args) { return h_instantiate_by_c_array::run(arr + 1, args..., arr[0]); } }; template struct h_instantiate_by_c_array { static InstType run(ArrType* arr, Ps... args) { return h_instantiate_by_c_array::run(arr + 1, arr[0], args...); } }; template struct h_instantiate_by_c_array { static InstType run(ArrType* arr, Ps... args) { (void)arr; return InstType(args...); } }; template struct h_instantiate_by_c_array { static InstType run(ArrType* arr, Ps... args) { (void)arr; return InstType(args...); } }; template InstType instantiate_by_c_array(ArrType* arr) { return h_instantiate_by_c_array::run(arr); } } // end namespace internal } // end namespace Eigen #endif // EIGEN_CXX11META_H RcppEigen/inst/include/unsupported/Eigen/CXX11/src/util/CXX11Workarounds.h0000644000176200001440000001002314562417221025731 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2013 Christian Seiler // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CXX11WORKAROUNDS_H #define EIGEN_CXX11WORKAROUNDS_H /* COMPATIBILITY CHECKS * (so users of compilers that are too old get some realistic error messages) */ #if defined(__INTEL_COMPILER) && (__INTEL_COMPILER < 1310) #error Intel Compiler only supports required C++ features since version 13.1. // note that most stuff in principle works with 13.0 but when combining // some features, at some point 13.0 will just fail with an internal assertion #elif defined(__GNUC__) && !defined(__clang__) && !defined(__INTEL_COMPILER) && (__GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 6)) // G++ < 4.6 by default will continue processing the source files - even if we use #error to make // it error out. For this reason, we use the pragma to make sure G++ aborts at the first error // it sees. Unfortunately, that is still not our #error directive, but at least the output is // short enough the user has a chance to see that the compiler version is not sufficient for // the funky template mojo we use. #pragma GCC diagnostic error "-Wfatal-errors" #error GNU C++ Compiler (g++) only supports required C++ features since version 4.6. #endif /* Check that the compiler at least claims to support C++11. It might not be sufficient * because the compiler may not implement it correctly, but at least we'll know. * On the other hand, visual studio still doesn't claim to support C++11 although it's * compliant enugh for our purpose. */ #if (EIGEN_COMP_CXXVER < 11) #if defined(__GNUC__) && !defined(__clang__) && !defined(__INTEL_COMPILER) #pragma GCC diagnostic error "-Wfatal-errors" #endif #error This library needs at least a C++11 compliant compiler. If you use g++/clang, please enable the -std=c++11 compiler flag. (-std=c++0x on older versions.) #endif namespace Eigen { namespace internal { /* std::get is only constexpr in C++14, not yet in C++11 */ template constexpr inline T& array_get(std::vector& a) { return a[I_]; } template constexpr inline T&& array_get(std::vector&& a) { return a[I_]; } template constexpr inline T const& array_get(std::vector const& a) { return a[I_]; } /* Suppose you have a template of the form * template struct X; * And you want to specialize it in such a way: * template struct X> { ::: }; * template<> struct X> { ::: }; * This will work in Intel's compiler 13.0, but only to some extent in g++ 4.6, since * g++ can only match templates called with parameter packs if the number of template * arguments is not a fixed size (so inside the first specialization, referencing * X> will fail in g++). On the other hand, g++ will accept the following: * template struct X> { ::: }: * as an additional (!) specialization, which will then only match the empty case. * But Intel's compiler 13.0 won't accept that, it will only accept the empty syntax, * so we have to create a workaround for this. */ #if defined(__GNUC__) && !defined(__INTEL_COMPILER) #define EIGEN_TPL_PP_SPEC_HACK_DEF(mt, n) mt... n #define EIGEN_TPL_PP_SPEC_HACK_DEFC(mt, n) , EIGEN_TPL_PP_SPEC_HACK_DEF(mt, n) #define EIGEN_TPL_PP_SPEC_HACK_USE(n) n... #define EIGEN_TPL_PP_SPEC_HACK_USEC(n) , n... #else #define EIGEN_TPL_PP_SPEC_HACK_DEF(mt, n) #define EIGEN_TPL_PP_SPEC_HACK_DEFC(mt, n) #define EIGEN_TPL_PP_SPEC_HACK_USE(n) #define EIGEN_TPL_PP_SPEC_HACK_USEC(n) #endif } // end namespace internal } // end namespace Eigen #endif // EIGEN_CXX11WORKAROUNDS_H /* * kate: space-indent on; indent-width 2; mixedindent off; indent-mode cstyle; */ RcppEigen/inst/include/unsupported/Eigen/CXX11/src/util/EmulateArray.h0000644000176200001440000001773314562417221025300 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_EMULATE_ARRAY_H #define EIGEN_EMULATE_ARRAY_H // The array class is only available starting with cxx11. Emulate our own here // if needed. Beware, msvc still doesn't advertise itself as a c++11 compiler! // Moreover, CUDA doesn't support the STL containers, so we use our own instead. #if (__cplusplus <= 199711L && EIGEN_COMP_MSVC < 1900) || defined(EIGEN_GPUCC) || defined(EIGEN_AVOID_STL_ARRAY) namespace Eigen { template class array { public: EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T& operator[] (size_t index) { eigen_internal_assert(index < size()); return values[index]; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const T& operator[] (size_t index) const { eigen_internal_assert(index < size()); return values[index]; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T& at(size_t index) { eigen_assert(index < size()); return values[index]; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const T& at(size_t index) const { eigen_assert(index < size()); return values[index]; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T& front() { return values[0]; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const T& front() const { return values[0]; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T& back() { return values[n-1]; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const T& back() const { return values[n-1]; } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE static std::size_t size() { return n; } T values[n]; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE array() { } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE array(const T& v) { EIGEN_STATIC_ASSERT(n==1, YOU_MADE_A_PROGRAMMING_MISTAKE) values[0] = v; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE array(const T& v1, const T& v2) { EIGEN_STATIC_ASSERT(n==2, YOU_MADE_A_PROGRAMMING_MISTAKE) values[0] = v1; values[1] = v2; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE array(const T& v1, const T& v2, const T& v3) { EIGEN_STATIC_ASSERT(n==3, YOU_MADE_A_PROGRAMMING_MISTAKE) values[0] = v1; values[1] = v2; values[2] = v3; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE array(const T& v1, const T& v2, const T& v3, const T& v4) { EIGEN_STATIC_ASSERT(n==4, YOU_MADE_A_PROGRAMMING_MISTAKE) values[0] = v1; values[1] = v2; values[2] = v3; values[3] = v4; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE array(const T& v1, const T& v2, const T& v3, const T& v4, const T& v5) { EIGEN_STATIC_ASSERT(n==5, YOU_MADE_A_PROGRAMMING_MISTAKE) values[0] = v1; values[1] = v2; values[2] = v3; values[3] = v4; values[4] = v5; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE array(const T& v1, const T& v2, const T& v3, const T& v4, const T& v5, const T& v6) { EIGEN_STATIC_ASSERT(n==6, YOU_MADE_A_PROGRAMMING_MISTAKE) values[0] = v1; values[1] = v2; values[2] = v3; values[3] = v4; values[4] = v5; values[5] = v6; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE array(const T& v1, const T& v2, const T& v3, const T& v4, const T& v5, const T& v6, const T& v7) { EIGEN_STATIC_ASSERT(n==7, YOU_MADE_A_PROGRAMMING_MISTAKE) values[0] = v1; values[1] = v2; values[2] = v3; values[3] = v4; values[4] = v5; values[5] = v6; values[6] = v7; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE array( const T& v1, const T& v2, const T& v3, const T& v4, const T& v5, const T& v6, const T& v7, const T& v8) { EIGEN_STATIC_ASSERT(n==8, YOU_MADE_A_PROGRAMMING_MISTAKE) values[0] = v1; values[1] = v2; values[2] = v3; values[3] = v4; values[4] = v5; values[5] = v6; values[6] = v7; values[7] = v8; } #if EIGEN_HAS_VARIADIC_TEMPLATES EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE array(std::initializer_list l) { eigen_assert(l.size() == n); internal::smart_copy(l.begin(), l.end(), values); } #endif }; // Specialize array for zero size template class array { public: EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T& operator[] (size_t) { eigen_assert(false && "Can't index a zero size array"); return dummy; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const T& operator[] (size_t) const { eigen_assert(false && "Can't index a zero size array"); return dummy; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T& front() { eigen_assert(false && "Can't index a zero size array"); return dummy; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const T& front() const { eigen_assert(false && "Can't index a zero size array"); return dummy; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T& back() { eigen_assert(false && "Can't index a zero size array"); return dummy; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const T& back() const { eigen_assert(false && "Can't index a zero size array"); return dummy; } static EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE std::size_t size() { return 0; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE array() : dummy() { } #if EIGEN_HAS_VARIADIC_TEMPLATES EIGEN_DEVICE_FUNC array(std::initializer_list l) : dummy() { EIGEN_UNUSED_VARIABLE(l); eigen_assert(l.size() == 0); } #endif private: T dummy; }; // Comparison operator // Todo: implement !=, <, <=, >, and >= template EIGEN_DEVICE_FUNC bool operator==(const array& lhs, const array& rhs) { for (std::size_t i = 0; i < N; ++i) { if (lhs[i] != rhs[i]) { return false; } } return true; } namespace internal { template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T& array_get(array& a) { return a[I_]; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const T& array_get(const array& a) { return a[I_]; } template struct array_size > { enum { value = N }; }; template struct array_size& > { enum { value = N }; }; template struct array_size > { enum { value = N }; }; template struct array_size& > { enum { value = N }; }; } // end namespace internal } // end namespace Eigen #else // The compiler supports c++11, and we're not targeting cuda: use std::array as Eigen::array #include namespace Eigen { template using array = std::array; namespace internal { /* std::get is only constexpr in C++14, not yet in C++11 * - libstdc++ from version 4.7 onwards has it nevertheless, * so use that * - libstdc++ older versions: use _M_instance directly * - libc++ all versions so far: use __elems_ directly * - all other libs: use std::get to be portable, but * this may not be constexpr */ #if defined(__GLIBCXX__) && __GLIBCXX__ < 20120322 #define STD_GET_ARR_HACK a._M_instance[I_] #elif defined(_LIBCPP_VERSION) #define STD_GET_ARR_HACK a.__elems_[I_] #else #define STD_GET_ARR_HACK std::template get(a) #endif template constexpr inline T& array_get(std::array& a) { return (T&) STD_GET_ARR_HACK; } template constexpr inline T&& array_get(std::array&& a) { return (T&&) STD_GET_ARR_HACK; } template constexpr inline T const& array_get(std::array const& a) { return (T const&) STD_GET_ARR_HACK; } #undef STD_GET_ARR_HACK } // end namespace internal } // end namespace Eigen #endif #endif // EIGEN_EMULATE_ARRAY_H RcppEigen/inst/include/unsupported/Eigen/MoreVectorization0000644000176200001440000000112014562417221023551 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_MOREVECTORIZATION_MODULE_H #define EIGEN_MOREVECTORIZATION_MODULE_H #include "../../Eigen/Core" namespace Eigen { /** * \defgroup MoreVectorization More vectorization module */ } #include "src/MoreVectorization/MathFunctions.h" #endif // EIGEN_MOREVECTORIZATION_MODULE_H RcppEigen/inst/include/unsupported/Eigen/SparseExtra0000644000176200001440000000252014562417221022334 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2009 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SPARSE_EXTRA_MODULE_H #define EIGEN_SPARSE_EXTRA_MODULE_H #include "../../Eigen/Sparse" #include "../../Eigen/src/Core/util/DisableStupidWarnings.h" #include #include #include #include #include #include #include #ifdef EIGEN_GOOGLEHASH_SUPPORT #include #include #endif /** * \defgroup SparseExtra_Module SparseExtra module * * This module contains some experimental features extending the sparse module. * * \code * #include * \endcode */ #include "src/SparseExtra/DynamicSparseMatrix.h" #include "src/SparseExtra/BlockOfDynamicSparseMatrix.h" #include "src/SparseExtra/RandomSetter.h" #include "src/SparseExtra/MarketIO.h" #if !defined(_WIN32) #include #include "src/SparseExtra/MatrixMarketIterator.h" #endif #include "../../Eigen/src/Core/util/ReenableStupidWarnings.h" #endif // EIGEN_SPARSE_EXTRA_MODULE_H RcppEigen/inst/include/unsupported/Eigen/EulerAngles0000644000176200001440000000214614562417221022305 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2015 Tal Hadad // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_EULERANGLES_MODULE_H #define EIGEN_EULERANGLES_MODULE_H #include "../../Eigen/Core" #include "../../Eigen/Geometry" #include "../../Eigen/src/Core/util/DisableStupidWarnings.h" namespace Eigen { /** * \defgroup EulerAngles_Module EulerAngles module * \brief This module provides generic euler angles rotation. * * Euler angles are a way to represent 3D rotation. * * In order to use this module in your code, include this header: * \code * #include * \endcode * * See \ref EulerAngles for more information. * */ } #include "src/EulerAngles/EulerSystem.h" #include "src/EulerAngles/EulerAngles.h" #include "../../Eigen/src/Core/util/ReenableStupidWarnings.h" #endif // EIGEN_EULERANGLES_MODULE_H RcppEigen/inst/include/unsupported/Eigen/Skyline0000644000176200001440000000164214562417221021515 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SKYLINE_MODULE_H #define EIGEN_SKYLINE_MODULE_H #include "../../Eigen/Core" #include "../../Eigen/src/Core/util/DisableStupidWarnings.h" #include #include #include #include /** * \defgroup Skyline_Module Skyline module * * * * */ #include "src/Skyline/SkylineUtil.h" #include "src/Skyline/SkylineMatrixBase.h" #include "src/Skyline/SkylineStorage.h" #include "src/Skyline/SkylineMatrix.h" #include "src/Skyline/SkylineInplaceLU.h" #include "src/Skyline/SkylineProduct.h" #include "../../Eigen/src/Core/util/ReenableStupidWarnings.h" #endif // EIGEN_SKYLINE_MODULE_H RcppEigen/inst/include/Eigen/0000755000176200001440000000000014562417221015561 5ustar liggesusersRcppEigen/inst/include/Eigen/Geometry0000644000176200001440000000362414562417221017304 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_GEOMETRY_MODULE_H #define EIGEN_GEOMETRY_MODULE_H #include "Core" #include "SVD" #include "LU" #include #include "src/Core/util/DisableStupidWarnings.h" /** \defgroup Geometry_Module Geometry module * * This module provides support for: * - fixed-size homogeneous transformations * - translation, scaling, 2D and 3D rotations * - \link Quaternion quaternions \endlink * - cross products (\ref MatrixBase::cross, \ref MatrixBase::cross3) * - orthognal vector generation (\ref MatrixBase::unitOrthogonal) * - some linear components: \link ParametrizedLine parametrized-lines \endlink and \link Hyperplane hyperplanes \endlink * - \link AlignedBox axis aligned bounding boxes \endlink * - \link umeyama least-square transformation fitting \endlink * * \code * #include * \endcode */ #include "src/Geometry/OrthoMethods.h" #include "src/Geometry/EulerAngles.h" #include "src/Geometry/Homogeneous.h" #include "src/Geometry/RotationBase.h" #include "src/Geometry/Rotation2D.h" #include "src/Geometry/Quaternion.h" #include "src/Geometry/AngleAxis.h" #include "src/Geometry/Transform.h" #include "src/Geometry/Translation.h" #include "src/Geometry/Scaling.h" #include "src/Geometry/Hyperplane.h" #include "src/Geometry/ParametrizedLine.h" #include "src/Geometry/AlignedBox.h" #include "src/Geometry/Umeyama.h" // Use the SSE optimized version whenever possible. #if (defined EIGEN_VECTORIZE_SSE) || (defined EIGEN_VECTORIZE_NEON) #include "src/Geometry/arch/Geometry_SIMD.h" #endif #include "src/Core/util/ReenableStupidWarnings.h" #endif // EIGEN_GEOMETRY_MODULE_H RcppEigen/inst/include/Eigen/LU0000644000176200001440000000236414562417221016031 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_LU_MODULE_H #define EIGEN_LU_MODULE_H #include "Core" #include "src/Core/util/DisableStupidWarnings.h" /** \defgroup LU_Module LU module * This module includes %LU decomposition and related notions such as matrix inversion and determinant. * This module defines the following MatrixBase methods: * - MatrixBase::inverse() * - MatrixBase::determinant() * * \code * #include * \endcode */ #include "src/misc/Kernel.h" #include "src/misc/Image.h" #include "src/LU/FullPivLU.h" #include "src/LU/PartialPivLU.h" #ifdef EIGEN_USE_LAPACKE #ifdef EIGEN_USE_MKL #include "mkl_lapacke.h" #else #include "src/misc/lapacke.h" #endif #include "src/LU/PartialPivLU_LAPACKE.h" #endif #include "src/LU/Determinant.h" #include "src/LU/InverseImpl.h" #if defined EIGEN_VECTORIZE_SSE || defined EIGEN_VECTORIZE_NEON #include "src/LU/arch/InverseSize4.h" #endif #include "src/Core/util/ReenableStupidWarnings.h" #endif // EIGEN_LU_MODULE_H RcppEigen/inst/include/Eigen/IterativeLinearSolvers0000644000176200001440000000404314107270226022147 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_ITERATIVELINEARSOLVERS_MODULE_H #define EIGEN_ITERATIVELINEARSOLVERS_MODULE_H #include "SparseCore" #include "OrderingMethods" #include "src/Core/util/DisableStupidWarnings.h" /** * \defgroup IterativeLinearSolvers_Module IterativeLinearSolvers module * * This module currently provides iterative methods to solve problems of the form \c A \c x = \c b, where \c A is a squared matrix, usually very large and sparse. * Those solvers are accessible via the following classes: * - ConjugateGradient for selfadjoint (hermitian) matrices, * - LeastSquaresConjugateGradient for rectangular least-square problems, * - BiCGSTAB for general square matrices. * * These iterative solvers are associated with some preconditioners: * - IdentityPreconditioner - not really useful * - DiagonalPreconditioner - also called Jacobi preconditioner, work very well on diagonal dominant matrices. * - IncompleteLUT - incomplete LU factorization with dual thresholding * * Such problems can also be solved using the direct sparse decomposition modules: SparseCholesky, CholmodSupport, UmfPackSupport, SuperLUSupport. * \code #include \endcode */ #include "src/IterativeLinearSolvers/SolveWithGuess.h" #include "src/IterativeLinearSolvers/IterativeSolverBase.h" #include "src/IterativeLinearSolvers/BasicPreconditioners.h" #include "src/IterativeLinearSolvers/ConjugateGradient.h" #include "src/IterativeLinearSolvers/LeastSquareConjugateGradient.h" #include "src/IterativeLinearSolvers/BiCGSTAB.h" #include "src/IterativeLinearSolvers/IncompleteLUT.h" #include "src/IterativeLinearSolvers/IncompleteCholesky.h" #include "src/Core/util/ReenableStupidWarnings.h" #endif // EIGEN_ITERATIVELINEARSOLVERS_MODULE_H RcppEigen/inst/include/Eigen/CholmodSupport0000644000176200001440000000350514562417221020471 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CHOLMODSUPPORT_MODULE_H #define EIGEN_CHOLMODSUPPORT_MODULE_H #include "SparseCore" #include "src/Core/util/DisableStupidWarnings.h" /** \ingroup Support_modules * \defgroup CholmodSupport_Module CholmodSupport module * * This module provides an interface to the Cholmod library which is part of the suitesparse package. * It provides the two following main factorization classes: * - class CholmodSupernodalLLT: a supernodal LLT Cholesky factorization. * - class CholmodDecomposiiton: a general L(D)LT Cholesky factorization with automatic or explicit runtime selection of the underlying factorization method (supernodal or simplicial). * * For the sake of completeness, this module also propose the two following classes: * - class CholmodSimplicialLLT * - class CholmodSimplicialLDLT * Note that these classes does not bring any particular advantage compared to the built-in * SimplicialLLT and SimplicialLDLT factorization classes. * * \code * #include * \endcode * * In order to use this module, the cholmod headers must be accessible from the include paths, and your binary must be linked to the cholmod library and its dependencies. * The dependencies depend on how cholmod has been compiled. * For a cmake based project, you can use our FindCholmod.cmake module to help you in this task. * */ #include "src/CholmodSupport/CholmodSupport.h" #include "src/Core/util/ReenableStupidWarnings.h" #endif // EIGEN_CHOLMODSUPPORT_MODULE_H RcppEigen/inst/include/Eigen/SparseCholesky0000644000176200001440000000232314562417221020443 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2013 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SPARSECHOLESKY_MODULE_H #define EIGEN_SPARSECHOLESKY_MODULE_H #include "SparseCore" #include "OrderingMethods" #include "src/Core/util/DisableStupidWarnings.h" /** * \defgroup SparseCholesky_Module SparseCholesky module * * This module currently provides two variants of the direct sparse Cholesky decomposition for selfadjoint (hermitian) matrices. * Those decompositions are accessible via the following classes: * - SimplicialLLt, * - SimplicialLDLt * * Such problems can also be solved using the ConjugateGradient solver from the IterativeLinearSolvers module. * * \code * #include * \endcode */ #include "src/SparseCholesky/SimplicialCholesky.h" #include "src/SparseCholesky/SimplicialCholesky_impl.h" #include "src/Core/util/ReenableStupidWarnings.h" #endif // EIGEN_SPARSECHOLESKY_MODULE_H RcppEigen/inst/include/Eigen/Core0000644000176200001440000003077714562417221016412 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008 Gael Guennebaud // Copyright (C) 2007-2011 Benoit Jacob // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CORE_H #define EIGEN_CORE_H // first thing Eigen does: stop the compiler from reporting useless warnings. #include "src/Core/util/DisableStupidWarnings.h" // then include this file where all our macros are defined. It's really important to do it first because // it's where we do all the compiler/OS/arch detections and define most defaults. #include "src/Core/util/Macros.h" // This detects SSE/AVX/NEON/etc. and configure alignment settings #include "src/Core/util/ConfigureVectorization.h" // We need cuda_runtime.h/hip_runtime.h to ensure that // the EIGEN_USING_STD macro works properly on the device side #if defined(EIGEN_CUDACC) #include #elif defined(EIGEN_HIPCC) #include #endif #ifdef EIGEN_EXCEPTIONS #include #endif // Disable the ipa-cp-clone optimization flag with MinGW 6.x or newer (enabled by default with -O3) // See http://eigen.tuxfamily.org/bz/show_bug.cgi?id=556 for details. #if EIGEN_COMP_MINGW && EIGEN_GNUC_AT_LEAST(4,6) && EIGEN_GNUC_AT_MOST(5,5) #pragma GCC optimize ("-fno-ipa-cp-clone") #endif // Prevent ICC from specializing std::complex operators that silently fail // on device. This allows us to use our own device-compatible specializations // instead. #if defined(EIGEN_COMP_ICC) && defined(EIGEN_GPU_COMPILE_PHASE) \ && !defined(_OVERRIDE_COMPLEX_SPECIALIZATION_) #define _OVERRIDE_COMPLEX_SPECIALIZATION_ 1 #endif #include // this include file manages BLAS and MKL related macros // and inclusion of their respective header files #include "src/Core/util/MKL_support.h" #if defined(EIGEN_HAS_CUDA_FP16) || defined(EIGEN_HAS_HIP_FP16) #define EIGEN_HAS_GPU_FP16 #endif #if defined(EIGEN_HAS_CUDA_BF16) || defined(EIGEN_HAS_HIP_BF16) #define EIGEN_HAS_GPU_BF16 #endif #if (defined _OPENMP) && (!defined EIGEN_DONT_PARALLELIZE) #define EIGEN_HAS_OPENMP #endif #ifdef EIGEN_HAS_OPENMP #include #endif // MSVC for windows mobile does not have the errno.h file #if !(EIGEN_COMP_MSVC && EIGEN_OS_WINCE) && !EIGEN_COMP_ARM #define EIGEN_HAS_ERRNO #endif #ifdef EIGEN_HAS_ERRNO #include #endif #include #include #include #include #include #include #ifndef EIGEN_NO_IO #include #endif #include #include #include #include // for CHAR_BIT // for min/max: #include #if EIGEN_HAS_CXX11 #include #endif // for std::is_nothrow_move_assignable #ifdef EIGEN_INCLUDE_TYPE_TRAITS #include #endif // for outputting debug info #ifdef EIGEN_DEBUG_ASSIGN #include #endif // required for __cpuid, needs to be included after cmath #if EIGEN_COMP_MSVC && EIGEN_ARCH_i386_OR_x86_64 && !EIGEN_OS_WINCE #include #endif #if defined(EIGEN_USE_SYCL) #undef min #undef max #undef isnan #undef isinf #undef isfinite #include #include #include #include #include #ifndef EIGEN_SYCL_LOCAL_THREAD_DIM0 #define EIGEN_SYCL_LOCAL_THREAD_DIM0 16 #endif #ifndef EIGEN_SYCL_LOCAL_THREAD_DIM1 #define EIGEN_SYCL_LOCAL_THREAD_DIM1 16 #endif #endif #if defined EIGEN2_SUPPORT_STAGE40_FULL_EIGEN3_STRICTNESS || defined EIGEN2_SUPPORT_STAGE30_FULL_EIGEN3_API || defined EIGEN2_SUPPORT_STAGE20_RESOLVE_API_CONFLICTS || defined EIGEN2_SUPPORT_STAGE10_FULL_EIGEN2_API || defined EIGEN2_SUPPORT // This will generate an error message: #error Eigen2-support is only available up to version 3.2. Please go to "http://eigen.tuxfamily.org/index.php?title=Eigen2" for further information #endif namespace Eigen { // we use size_t frequently and we'll never remember to prepend it with std:: every time just to // ensure QNX/QCC support using std::size_t; // gcc 4.6.0 wants std:: for ptrdiff_t using std::ptrdiff_t; } /** \defgroup Core_Module Core module * This is the main module of Eigen providing dense matrix and vector support * (both fixed and dynamic size) with all the features corresponding to a BLAS library * and much more... * * \code * #include * \endcode */ #include "src/Core/util/Constants.h" #include "src/Core/util/Meta.h" #include "src/Core/util/ForwardDeclarations.h" #include "src/Core/util/StaticAssert.h" #include "src/Core/util/XprHelper.h" #include "src/Core/util/Memory.h" #include "src/Core/util/IntegralConstant.h" #include "src/Core/util/SymbolicIndex.h" #include "src/Core/NumTraits.h" #include "src/Core/MathFunctions.h" #include "src/Core/GenericPacketMath.h" #include "src/Core/MathFunctionsImpl.h" #include "src/Core/arch/Default/ConjHelper.h" // Generic half float support #include "src/Core/arch/Default/Half.h" #include "src/Core/arch/Default/BFloat16.h" #include "src/Core/arch/Default/TypeCasting.h" #include "src/Core/arch/Default/GenericPacketMathFunctionsFwd.h" #if defined EIGEN_VECTORIZE_AVX512 #include "src/Core/arch/SSE/PacketMath.h" #include "src/Core/arch/SSE/TypeCasting.h" #include "src/Core/arch/SSE/Complex.h" #include "src/Core/arch/AVX/PacketMath.h" #include "src/Core/arch/AVX/TypeCasting.h" #include "src/Core/arch/AVX/Complex.h" #include "src/Core/arch/AVX512/PacketMath.h" #include "src/Core/arch/AVX512/TypeCasting.h" #include "src/Core/arch/AVX512/Complex.h" #include "src/Core/arch/SSE/MathFunctions.h" #include "src/Core/arch/AVX/MathFunctions.h" #include "src/Core/arch/AVX512/MathFunctions.h" #elif defined EIGEN_VECTORIZE_AVX // Use AVX for floats and doubles, SSE for integers #include "src/Core/arch/SSE/PacketMath.h" #include "src/Core/arch/SSE/TypeCasting.h" #include "src/Core/arch/SSE/Complex.h" #include "src/Core/arch/AVX/PacketMath.h" #include "src/Core/arch/AVX/TypeCasting.h" #include "src/Core/arch/AVX/Complex.h" #include "src/Core/arch/SSE/MathFunctions.h" #include "src/Core/arch/AVX/MathFunctions.h" #elif defined EIGEN_VECTORIZE_SSE #include "src/Core/arch/SSE/PacketMath.h" #include "src/Core/arch/SSE/TypeCasting.h" #include "src/Core/arch/SSE/MathFunctions.h" #include "src/Core/arch/SSE/Complex.h" #elif defined(EIGEN_VECTORIZE_ALTIVEC) || defined(EIGEN_VECTORIZE_VSX) #include "src/Core/arch/AltiVec/PacketMath.h" #include "src/Core/arch/AltiVec/MathFunctions.h" #include "src/Core/arch/AltiVec/Complex.h" #elif defined EIGEN_VECTORIZE_NEON #include "src/Core/arch/NEON/PacketMath.h" #include "src/Core/arch/NEON/TypeCasting.h" #include "src/Core/arch/NEON/MathFunctions.h" #include "src/Core/arch/NEON/Complex.h" #elif defined EIGEN_VECTORIZE_SVE #include "src/Core/arch/SVE/PacketMath.h" #include "src/Core/arch/SVE/TypeCasting.h" #include "src/Core/arch/SVE/MathFunctions.h" #elif defined EIGEN_VECTORIZE_ZVECTOR #include "src/Core/arch/ZVector/PacketMath.h" #include "src/Core/arch/ZVector/MathFunctions.h" #include "src/Core/arch/ZVector/Complex.h" #elif defined EIGEN_VECTORIZE_MSA #include "src/Core/arch/MSA/PacketMath.h" #include "src/Core/arch/MSA/MathFunctions.h" #include "src/Core/arch/MSA/Complex.h" #endif #if defined EIGEN_VECTORIZE_GPU #include "src/Core/arch/GPU/PacketMath.h" #include "src/Core/arch/GPU/MathFunctions.h" #include "src/Core/arch/GPU/TypeCasting.h" #endif #if defined(EIGEN_USE_SYCL) #include "src/Core/arch/SYCL/SyclMemoryModel.h" #include "src/Core/arch/SYCL/InteropHeaders.h" #if !defined(EIGEN_DONT_VECTORIZE_SYCL) #include "src/Core/arch/SYCL/PacketMath.h" #include "src/Core/arch/SYCL/MathFunctions.h" #include "src/Core/arch/SYCL/TypeCasting.h" #endif #endif #include "src/Core/arch/Default/Settings.h" // This file provides generic implementations valid for scalar as well #include "src/Core/arch/Default/GenericPacketMathFunctions.h" #include "src/Core/functors/TernaryFunctors.h" #include "src/Core/functors/BinaryFunctors.h" #include "src/Core/functors/UnaryFunctors.h" #include "src/Core/functors/NullaryFunctors.h" #include "src/Core/functors/StlFunctors.h" #include "src/Core/functors/AssignmentFunctors.h" // Specialized functors to enable the processing of complex numbers // on CUDA devices #ifdef EIGEN_CUDACC #include "src/Core/arch/CUDA/Complex.h" #endif #include "src/Core/util/IndexedViewHelper.h" #include "src/Core/util/ReshapedHelper.h" #include "src/Core/ArithmeticSequence.h" #ifndef EIGEN_NO_IO #include "src/Core/IO.h" #endif #include "src/Core/DenseCoeffsBase.h" #include "src/Core/DenseBase.h" #include "src/Core/MatrixBase.h" #include "src/Core/EigenBase.h" #include "src/Core/Product.h" #include "src/Core/CoreEvaluators.h" #include "src/Core/AssignEvaluator.h" #ifndef EIGEN_PARSED_BY_DOXYGEN // work around Doxygen bug triggered by Assign.h r814874 // at least confirmed with Doxygen 1.5.5 and 1.5.6 #include "src/Core/Assign.h" #endif #include "src/Core/ArrayBase.h" #include "src/Core/util/BlasUtil.h" #include "src/Core/DenseStorage.h" #include "src/Core/NestByValue.h" // #include "src/Core/ForceAlignedAccess.h" #include "src/Core/ReturnByValue.h" #include "src/Core/NoAlias.h" #include "src/Core/PlainObjectBase.h" #include "src/Core/Matrix.h" #include "src/Core/Array.h" #include "src/Core/CwiseTernaryOp.h" #include "src/Core/CwiseBinaryOp.h" #include "src/Core/CwiseUnaryOp.h" #include "src/Core/CwiseNullaryOp.h" #include "src/Core/CwiseUnaryView.h" #include "src/Core/SelfCwiseBinaryOp.h" #include "src/Core/Dot.h" #include "src/Core/StableNorm.h" #include "src/Core/Stride.h" #include "src/Core/MapBase.h" #include "src/Core/Map.h" #include "src/Core/Ref.h" #include "src/Core/Block.h" #include "src/Core/VectorBlock.h" #include "src/Core/IndexedView.h" #include "src/Core/Reshaped.h" #include "src/Core/Transpose.h" #include "src/Core/DiagonalMatrix.h" #include "src/Core/Diagonal.h" #include "src/Core/DiagonalProduct.h" #include "src/Core/Redux.h" #include "src/Core/Visitor.h" #include "src/Core/Fuzzy.h" #include "src/Core/Swap.h" #include "src/Core/CommaInitializer.h" #include "src/Core/GeneralProduct.h" #include "src/Core/Solve.h" #include "src/Core/Inverse.h" #include "src/Core/SolverBase.h" #include "src/Core/PermutationMatrix.h" #include "src/Core/Transpositions.h" #include "src/Core/TriangularMatrix.h" #include "src/Core/SelfAdjointView.h" #include "src/Core/products/GeneralBlockPanelKernel.h" #include "src/Core/products/Parallelizer.h" #include "src/Core/ProductEvaluators.h" #include "src/Core/products/GeneralMatrixVector.h" #include "src/Core/products/GeneralMatrixMatrix.h" #include "src/Core/SolveTriangular.h" #include "src/Core/products/GeneralMatrixMatrixTriangular.h" #include "src/Core/products/SelfadjointMatrixVector.h" #include "src/Core/products/SelfadjointMatrixMatrix.h" #include "src/Core/products/SelfadjointProduct.h" #include "src/Core/products/SelfadjointRank2Update.h" #include "src/Core/products/TriangularMatrixVector.h" #include "src/Core/products/TriangularMatrixMatrix.h" #include "src/Core/products/TriangularSolverMatrix.h" #include "src/Core/products/TriangularSolverVector.h" #include "src/Core/BandMatrix.h" #include "src/Core/CoreIterators.h" #include "src/Core/ConditionEstimator.h" #if defined(EIGEN_VECTORIZE_ALTIVEC) || defined(EIGEN_VECTORIZE_VSX) #include "src/Core/arch/AltiVec/MatrixProduct.h" #elif defined EIGEN_VECTORIZE_NEON #include "src/Core/arch/NEON/GeneralBlockPanelKernel.h" #endif #include "src/Core/BooleanRedux.h" #include "src/Core/Select.h" #include "src/Core/VectorwiseOp.h" #include "src/Core/PartialReduxEvaluator.h" #include "src/Core/Random.h" #include "src/Core/Replicate.h" #include "src/Core/Reverse.h" #include "src/Core/ArrayWrapper.h" #include "src/Core/StlIterators.h" #ifdef EIGEN_USE_BLAS #include "src/Core/products/GeneralMatrixMatrix_BLAS.h" #include "src/Core/products/GeneralMatrixVector_BLAS.h" #include "src/Core/products/GeneralMatrixMatrixTriangular_BLAS.h" #include "src/Core/products/SelfadjointMatrixMatrix_BLAS.h" #include "src/Core/products/SelfadjointMatrixVector_BLAS.h" #include "src/Core/products/TriangularMatrixMatrix_BLAS.h" #include "src/Core/products/TriangularMatrixVector_BLAS.h" #include "src/Core/products/TriangularSolverMatrix_BLAS.h" #endif // EIGEN_USE_BLAS #ifdef EIGEN_USE_MKL_VML #include "src/Core/Assign_MKL.h" #endif #include "src/Core/GlobalFunctions.h" #include "src/Core/util/ReenableStupidWarnings.h" #endif // EIGEN_CORE_H RcppEigen/inst/include/Eigen/OrderingMethods0000644000176200001440000000462314562417221020606 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_ORDERINGMETHODS_MODULE_H #define EIGEN_ORDERINGMETHODS_MODULE_H #include "SparseCore" #include "src/Core/util/DisableStupidWarnings.h" /** * \defgroup OrderingMethods_Module OrderingMethods module * * This module is currently for internal use only * * It defines various built-in and external ordering methods for sparse matrices. * They are typically used to reduce the number of elements during * the sparse matrix decomposition (LLT, LU, QR). * Precisely, in a preprocessing step, a permutation matrix P is computed using * those ordering methods and applied to the columns of the matrix. * Using for instance the sparse Cholesky decomposition, it is expected that * the nonzeros elements in LLT(A*P) will be much smaller than that in LLT(A). * * * Usage : * \code * #include * \endcode * * A simple usage is as a template parameter in the sparse decomposition classes : * * \code * SparseLU > solver; * \endcode * * \code * SparseQR > solver; * \endcode * * It is possible as well to call directly a particular ordering method for your own purpose, * \code * AMDOrdering ordering; * PermutationMatrix perm; * SparseMatrix A; * //Fill the matrix ... * * ordering(A, perm); // Call AMD * \endcode * * \note Some of these methods (like AMD or METIS), need the sparsity pattern * of the input matrix to be symmetric. When the matrix is structurally unsymmetric, * Eigen computes internally the pattern of \f$A^T*A\f$ before calling the method. * If your matrix is already symmetric (at leat in structure), you can avoid that * by calling the method with a SelfAdjointView type. * * \code * // Call the ordering on the pattern of the lower triangular matrix A * ordering(A.selfadjointView(), perm); * \endcode */ #include "src/OrderingMethods/Amd.h" #include "src/OrderingMethods/Ordering.h" #include "src/Core/util/ReenableStupidWarnings.h" #endif // EIGEN_ORDERINGMETHODS_MODULE_H RcppEigen/inst/include/Eigen/Householder0000644000176200001440000000147514562417221017774 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_HOUSEHOLDER_MODULE_H #define EIGEN_HOUSEHOLDER_MODULE_H #include "Core" #include "src/Core/util/DisableStupidWarnings.h" /** \defgroup Householder_Module Householder module * This module provides Householder transformations. * * \code * #include * \endcode */ #include "src/Householder/Householder.h" #include "src/Householder/HouseholderSequence.h" #include "src/Householder/BlockHouseholder.h" #include "src/Core/util/ReenableStupidWarnings.h" #endif // EIGEN_HOUSEHOLDER_MODULE_H RcppEigen/inst/include/Eigen/SparseLU0000644000176200001440000000342614562417221017207 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2012 Désiré Nuentsa-Wakam // Copyright (C) 2012 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SPARSELU_MODULE_H #define EIGEN_SPARSELU_MODULE_H #include "SparseCore" /** * \defgroup SparseLU_Module SparseLU module * This module defines a supernodal factorization of general sparse matrices. * The code is fully optimized for supernode-panel updates with specialized kernels. * Please, see the documentation of the SparseLU class for more details. */ // Ordering interface #include "OrderingMethods" #include "src/Core/util/DisableStupidWarnings.h" #include "src/SparseLU/SparseLU_gemm_kernel.h" #include "src/SparseLU/SparseLU_Structs.h" #include "src/SparseLU/SparseLU_SupernodalMatrix.h" #include "src/SparseLU/SparseLUImpl.h" #include "src/SparseCore/SparseColEtree.h" #include "src/SparseLU/SparseLU_Memory.h" #include "src/SparseLU/SparseLU_heap_relax_snode.h" #include "src/SparseLU/SparseLU_relax_snode.h" #include "src/SparseLU/SparseLU_pivotL.h" #include "src/SparseLU/SparseLU_panel_dfs.h" #include "src/SparseLU/SparseLU_kernel_bmod.h" #include "src/SparseLU/SparseLU_panel_bmod.h" #include "src/SparseLU/SparseLU_column_dfs.h" #include "src/SparseLU/SparseLU_column_bmod.h" #include "src/SparseLU/SparseLU_copy_to_ucol.h" #include "src/SparseLU/SparseLU_pruneL.h" #include "src/SparseLU/SparseLU_Utils.h" #include "src/SparseLU/SparseLU.h" #include "src/Core/util/ReenableStupidWarnings.h" #endif // EIGEN_SPARSELU_MODULE_H RcppEigen/inst/include/Eigen/Eigenvalues0000644000176200001440000000336114562417221017756 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_EIGENVALUES_MODULE_H #define EIGEN_EIGENVALUES_MODULE_H #include "Core" #include "Cholesky" #include "Jacobi" #include "Householder" #include "LU" #include "Geometry" #include "src/Core/util/DisableStupidWarnings.h" /** \defgroup Eigenvalues_Module Eigenvalues module * * * * This module mainly provides various eigenvalue solvers. * This module also provides some MatrixBase methods, including: * - MatrixBase::eigenvalues(), * - MatrixBase::operatorNorm() * * \code * #include * \endcode */ #include "src/misc/RealSvd2x2.h" #include "src/Eigenvalues/Tridiagonalization.h" #include "src/Eigenvalues/RealSchur.h" #include "src/Eigenvalues/EigenSolver.h" #include "src/Eigenvalues/SelfAdjointEigenSolver.h" #include "src/Eigenvalues/GeneralizedSelfAdjointEigenSolver.h" #include "src/Eigenvalues/HessenbergDecomposition.h" #include "src/Eigenvalues/ComplexSchur.h" #include "src/Eigenvalues/ComplexEigenSolver.h" #include "src/Eigenvalues/RealQZ.h" #include "src/Eigenvalues/GeneralizedEigenSolver.h" #include "src/Eigenvalues/MatrixBaseEigenvalues.h" #ifdef EIGEN_USE_LAPACKE #ifdef EIGEN_USE_MKL #include "mkl_lapacke.h" #else #include "src/misc/lapacke.h" #endif #include "src/Eigenvalues/RealSchur_LAPACKE.h" #include "src/Eigenvalues/ComplexSchur_LAPACKE.h" #include "src/Eigenvalues/SelfAdjointEigenSolver_LAPACKE.h" #endif #include "src/Core/util/ReenableStupidWarnings.h" #endif // EIGEN_EIGENVALUES_MODULE_H RcppEigen/inst/include/Eigen/QR0000644000176200001440000000237014562417221016030 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_QR_MODULE_H #define EIGEN_QR_MODULE_H #include "Core" #include "Cholesky" #include "Jacobi" #include "Householder" #include "src/Core/util/DisableStupidWarnings.h" /** \defgroup QR_Module QR module * * * * This module provides various QR decompositions * This module also provides some MatrixBase methods, including: * - MatrixBase::householderQr() * - MatrixBase::colPivHouseholderQr() * - MatrixBase::fullPivHouseholderQr() * * \code * #include * \endcode */ #include "src/QR/HouseholderQR.h" #include "src/QR/FullPivHouseholderQR.h" #include "src/QR/ColPivHouseholderQR.h" #include "src/QR/CompleteOrthogonalDecomposition.h" #ifdef EIGEN_USE_LAPACKE #ifdef EIGEN_USE_MKL #include "mkl_lapacke.h" #else #include "src/misc/lapacke.h" #endif #include "src/QR/HouseholderQR_LAPACKE.h" #include "src/QR/ColPivHouseholderQR_LAPACKE.h" #endif #include "src/Core/util/ReenableStupidWarnings.h" #endif // EIGEN_QR_MODULE_H RcppEigen/inst/include/Eigen/SparseCore0000644000176200001440000000430014107270226017544 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SPARSECORE_MODULE_H #define EIGEN_SPARSECORE_MODULE_H #include "Core" #include "src/Core/util/DisableStupidWarnings.h" #include #include #include #include #include /** * \defgroup SparseCore_Module SparseCore module * * This module provides a sparse matrix representation, and basic associated matrix manipulations * and operations. * * See the \ref TutorialSparse "Sparse tutorial" * * \code * #include * \endcode * * This module depends on: Core. */ #include "src/SparseCore/SparseUtil.h" #include "src/SparseCore/SparseMatrixBase.h" #include "src/SparseCore/SparseAssign.h" #include "src/SparseCore/CompressedStorage.h" #include "src/SparseCore/AmbiVector.h" #include "src/SparseCore/SparseCompressedBase.h" #include "src/SparseCore/SparseMatrix.h" #include "src/SparseCore/SparseMap.h" #include "src/SparseCore/MappedSparseMatrix.h" #include "src/SparseCore/SparseVector.h" #include "src/SparseCore/SparseRef.h" #include "src/SparseCore/SparseCwiseUnaryOp.h" #include "src/SparseCore/SparseCwiseBinaryOp.h" #include "src/SparseCore/SparseTranspose.h" #include "src/SparseCore/SparseBlock.h" #include "src/SparseCore/SparseDot.h" #include "src/SparseCore/SparseRedux.h" #include "src/SparseCore/SparseView.h" #include "src/SparseCore/SparseDiagonalProduct.h" #include "src/SparseCore/ConservativeSparseSparseProduct.h" #include "src/SparseCore/SparseSparseProductWithPruning.h" #include "src/SparseCore/SparseProduct.h" #include "src/SparseCore/SparseDenseProduct.h" #include "src/SparseCore/SparseSelfAdjointView.h" #include "src/SparseCore/SparseTriangularView.h" #include "src/SparseCore/TriangularSolver.h" #include "src/SparseCore/SparsePermutation.h" #include "src/SparseCore/SparseFuzzy.h" #include "src/SparseCore/SparseSolverBase.h" #include "src/Core/util/ReenableStupidWarnings.h" #endif // EIGEN_SPARSECORE_MODULE_H RcppEigen/inst/include/Eigen/PardisoSupport0000644000176200001440000000213414562417221020502 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_PARDISOSUPPORT_MODULE_H #define EIGEN_PARDISOSUPPORT_MODULE_H #include "SparseCore" #include "src/Core/util/DisableStupidWarnings.h" #include /** \ingroup Support_modules * \defgroup PardisoSupport_Module PardisoSupport module * * This module brings support for the Intel(R) MKL PARDISO direct sparse solvers. * * \code * #include * \endcode * * In order to use this module, the MKL headers must be accessible from the include paths, and your binary must be linked to the MKL library and its dependencies. * See this \ref TopicUsingIntelMKL "page" for more information on MKL-Eigen integration. * */ #include "src/PardisoSupport/PardisoSupport.h" #include "src/Core/util/ReenableStupidWarnings.h" #endif // EIGEN_PARDISOSUPPORT_MODULE_H RcppEigen/inst/include/Eigen/Cholesky0000644000176200001440000000221114562417221017261 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CHOLESKY_MODULE_H #define EIGEN_CHOLESKY_MODULE_H #include "Core" #include "Jacobi" #include "src/Core/util/DisableStupidWarnings.h" /** \defgroup Cholesky_Module Cholesky module * * * * This module provides two variants of the Cholesky decomposition for selfadjoint (hermitian) matrices. * Those decompositions are also accessible via the following methods: * - MatrixBase::llt() * - MatrixBase::ldlt() * - SelfAdjointView::llt() * - SelfAdjointView::ldlt() * * \code * #include * \endcode */ #include "src/Cholesky/LLT.h" #include "src/Cholesky/LDLT.h" #ifdef EIGEN_USE_LAPACKE #ifdef EIGEN_USE_MKL #include "mkl_lapacke.h" #else #include "src/misc/lapacke.h" #endif #include "src/Cholesky/LLT_LAPACKE.h" #endif #include "src/Core/util/ReenableStupidWarnings.h" #endif // EIGEN_CHOLESKY_MODULE_H RcppEigen/inst/include/Eigen/MetisSupport0000644000176200001440000000173714107270226020167 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_METISSUPPORT_MODULE_H #define EIGEN_METISSUPPORT_MODULE_H #include "SparseCore" #include "src/Core/util/DisableStupidWarnings.h" extern "C" { #include } /** \ingroup Support_modules * \defgroup MetisSupport_Module MetisSupport module * * \code * #include * \endcode * This module defines an interface to the METIS reordering package (http://glaros.dtc.umn.edu/gkhome/views/metis). * It can be used just as any other built-in method as explained in \link OrderingMethods_Module here. \endlink */ #include "src/MetisSupport/MetisSupport.h" #include "src/Core/util/ReenableStupidWarnings.h" #endif // EIGEN_METISSUPPORT_MODULE_H RcppEigen/inst/include/Eigen/SuperLUSupport0000644000176200001440000000430314107270226020435 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SUPERLUSUPPORT_MODULE_H #define EIGEN_SUPERLUSUPPORT_MODULE_H #include "SparseCore" #include "src/Core/util/DisableStupidWarnings.h" #ifdef EMPTY #define EIGEN_EMPTY_WAS_ALREADY_DEFINED #endif typedef int int_t; #include #include #include // slu_util.h defines a preprocessor token named EMPTY which is really polluting, // so we remove it in favor of a SUPERLU_EMPTY token. // If EMPTY was already defined then we don't undef it. #if defined(EIGEN_EMPTY_WAS_ALREADY_DEFINED) # undef EIGEN_EMPTY_WAS_ALREADY_DEFINED #elif defined(EMPTY) # undef EMPTY #endif #define SUPERLU_EMPTY (-1) namespace Eigen { struct SluMatrix; } /** \ingroup Support_modules * \defgroup SuperLUSupport_Module SuperLUSupport module * * This module provides an interface to the SuperLU library. * It provides the following factorization class: * - class SuperLU: a supernodal sequential LU factorization. * - class SuperILU: a supernodal sequential incomplete LU factorization (to be used as a preconditioner for iterative methods). * * \warning This wrapper requires at least versions 4.0 of SuperLU. The 3.x versions are not supported. * * \warning When including this module, you have to use SUPERLU_EMPTY instead of EMPTY which is no longer defined because it is too polluting. * * \code * #include * \endcode * * In order to use this module, the superlu headers must be accessible from the include paths, and your binary must be linked to the superlu library and its dependencies. * The dependencies depend on how superlu has been compiled. * For a cmake based project, you can use our FindSuperLU.cmake module to help you in this task. * */ #include "src/SuperLUSupport/SuperLUSupport.h" #include "src/Core/util/ReenableStupidWarnings.h" #endif // EIGEN_SUPERLUSUPPORT_MODULE_H RcppEigen/inst/include/Eigen/Sparse0000644000176200001440000000157014562417221016744 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SPARSE_MODULE_H #define EIGEN_SPARSE_MODULE_H /** \defgroup Sparse_Module Sparse meta-module * * Meta-module including all related modules: * - \ref SparseCore_Module * - \ref OrderingMethods_Module * - \ref SparseCholesky_Module * - \ref SparseLU_Module * - \ref SparseQR_Module * - \ref IterativeLinearSolvers_Module * \code #include \endcode */ #include "SparseCore" #include "OrderingMethods" #include "SparseCholesky" #include "SparseLU" #include "SparseQR" #include "IterativeLinearSolvers" #endif // EIGEN_SPARSE_MODULE_H RcppEigen/inst/include/Eigen/KLUSupport0000644000176200001440000000255514562417221017543 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_KLUSUPPORT_MODULE_H #define EIGEN_KLUSUPPORT_MODULE_H #include #include extern "C" { #include #include } /** \ingroup Support_modules * \defgroup KLUSupport_Module KLUSupport module * * This module provides an interface to the KLU library which is part of the suitesparse package. * It provides the following factorization class: * - class KLU: a sparse LU factorization, well-suited for circuit simulation. * * \code * #include * \endcode * * In order to use this module, the klu and btf headers must be accessible from the include paths, and your binary must be linked to the klu library and its dependencies. * The dependencies depend on how umfpack has been compiled. * For a cmake based project, you can use our FindKLU.cmake module to help you in this task. * */ #include "src/KLUSupport/KLUSupport.h" #include #endif // EIGEN_KLUSUPPORT_MODULE_H RcppEigen/inst/include/Eigen/PaStiXSupport0000644000176200001440000000332714562417221020256 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_PASTIXSUPPORT_MODULE_H #define EIGEN_PASTIXSUPPORT_MODULE_H #include "SparseCore" #include "src/Core/util/DisableStupidWarnings.h" extern "C" { #include #include } #ifdef complex #undef complex #endif /** \ingroup Support_modules * \defgroup PaStiXSupport_Module PaStiXSupport module * * This module provides an interface to the PaSTiX library. * PaSTiX is a general \b supernodal, \b parallel and \b opensource sparse solver. * It provides the two following main factorization classes: * - class PastixLLT : a supernodal, parallel LLt Cholesky factorization. * - class PastixLDLT: a supernodal, parallel LDLt Cholesky factorization. * - class PastixLU : a supernodal, parallel LU factorization (optimized for a symmetric pattern). * * \code * #include * \endcode * * In order to use this module, the PaSTiX headers must be accessible from the include paths, and your binary must be linked to the PaSTiX library and its dependencies. * This wrapper resuires PaStiX version 5.x compiled without MPI support. * The dependencies depend on how PaSTiX has been compiled. * For a cmake based project, you can use our FindPaSTiX.cmake module to help you in this task. * */ #include "src/PaStiXSupport/PaStiXSupport.h" #include "src/Core/util/ReenableStupidWarnings.h" #endif // EIGEN_PASTIXSUPPORT_MODULE_H RcppEigen/inst/include/Eigen/Dense0000644000176200001440000000017214107270226016537 0ustar liggesusers#include "Core" #include "LU" #include "Cholesky" #include "QR" #include "SVD" #include "Geometry" #include "Eigenvalues" RcppEigen/inst/include/Eigen/SparseQR0000644000176200001440000000225314107270226017203 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SPARSEQR_MODULE_H #define EIGEN_SPARSEQR_MODULE_H #include "SparseCore" #include "OrderingMethods" #include "src/Core/util/DisableStupidWarnings.h" /** \defgroup SparseQR_Module SparseQR module * \brief Provides QR decomposition for sparse matrices * * This module provides a simplicial version of the left-looking Sparse QR decomposition. * The columns of the input matrix should be reordered to limit the fill-in during the * decomposition. Built-in methods (COLAMD, AMD) or external methods (METIS) can be used to this end. * See the \link OrderingMethods_Module OrderingMethods\endlink module for the list * of built-in and external ordering methods. * * \code * #include * \endcode * * */ #include "src/SparseCore/SparseColEtree.h" #include "src/SparseQR/SparseQR.h" #include "src/Core/util/ReenableStupidWarnings.h" #endif RcppEigen/inst/include/Eigen/Eigen0000644000176200001440000000004314107270226016525 0ustar liggesusers#include "Dense" #include "Sparse" RcppEigen/inst/include/Eigen/Jacobi0000644000176200001440000000157614562417221016704 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_JACOBI_MODULE_H #define EIGEN_JACOBI_MODULE_H #include "Core" #include "src/Core/util/DisableStupidWarnings.h" /** \defgroup Jacobi_Module Jacobi module * This module provides Jacobi and Givens rotations. * * \code * #include * \endcode * * In addition to listed classes, it defines the two following MatrixBase methods to apply a Jacobi or Givens rotation: * - MatrixBase::applyOnTheLeft() * - MatrixBase::applyOnTheRight(). */ #include "src/Jacobi/Jacobi.h" #include "src/Core/util/ReenableStupidWarnings.h" #endif // EIGEN_JACOBI_MODULE_H RcppEigen/inst/include/Eigen/UmfPackSupport0000644000176200001440000000254614107270226020433 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_UMFPACKSUPPORT_MODULE_H #define EIGEN_UMFPACKSUPPORT_MODULE_H #include "SparseCore" #include "src/Core/util/DisableStupidWarnings.h" extern "C" { #include } /** \ingroup Support_modules * \defgroup UmfPackSupport_Module UmfPackSupport module * * This module provides an interface to the UmfPack library which is part of the suitesparse package. * It provides the following factorization class: * - class UmfPackLU: a multifrontal sequential LU factorization. * * \code * #include * \endcode * * In order to use this module, the umfpack headers must be accessible from the include paths, and your binary must be linked to the umfpack library and its dependencies. * The dependencies depend on how umfpack has been compiled. * For a cmake based project, you can use our FindUmfPack.cmake module to help you in this task. * */ #include "src/UmfPackSupport/UmfPackSupport.h" #include "src/Core/util/ReenableStupidWarnings.h" #endif // EIGEN_UMFPACKSUPPORT_MODULE_H RcppEigen/inst/include/Eigen/StdList0000644000176200001440000000132614107270226017071 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009 Hauke Heibel // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_STDLIST_MODULE_H #define EIGEN_STDLIST_MODULE_H #include "Core" #include #if EIGEN_COMP_MSVC && EIGEN_OS_WIN64 && (EIGEN_MAX_STATIC_ALIGN_BYTES<=16) /* MSVC auto aligns up to 16 bytes in 64 bit builds */ #define EIGEN_DEFINE_STL_LIST_SPECIALIZATION(...) #else #include "src/StlSupport/StdList.h" #endif #endif // EIGEN_STDLIST_MODULE_H RcppEigen/inst/include/Eigen/src/0000755000176200001440000000000014562417221016350 5ustar liggesusersRcppEigen/inst/include/Eigen/src/Geometry/0000755000176200001440000000000014562417221020143 5ustar liggesusersRcppEigen/inst/include/Eigen/src/Geometry/Rotation2D.h0000644000176200001440000001531614562417221022307 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_ROTATION2D_H #define EIGEN_ROTATION2D_H namespace Eigen { /** \geometry_module \ingroup Geometry_Module * * \class Rotation2D * * \brief Represents a rotation/orientation in a 2 dimensional space. * * \tparam _Scalar the scalar type, i.e., the type of the coefficients * * This class is equivalent to a single scalar representing a counter clock wise rotation * as a single angle in radian. It provides some additional features such as the automatic * conversion from/to a 2x2 rotation matrix. Moreover this class aims to provide a similar * interface to Quaternion in order to facilitate the writing of generic algorithms * dealing with rotations. * * \sa class Quaternion, class Transform */ namespace internal { template struct traits > { typedef _Scalar Scalar; }; } // end namespace internal template class Rotation2D : public RotationBase,2> { typedef RotationBase,2> Base; public: using Base::operator*; enum { Dim = 2 }; /** the scalar type of the coefficients */ typedef _Scalar Scalar; typedef Matrix Vector2; typedef Matrix Matrix2; protected: Scalar m_angle; public: /** Construct a 2D counter clock wise rotation from the angle \a a in radian. */ EIGEN_DEVICE_FUNC explicit inline Rotation2D(const Scalar& a) : m_angle(a) {} /** Default constructor wihtout initialization. The represented rotation is undefined. */ EIGEN_DEVICE_FUNC Rotation2D() {} /** Construct a 2D rotation from a 2x2 rotation matrix \a mat. * * \sa fromRotationMatrix() */ template EIGEN_DEVICE_FUNC explicit Rotation2D(const MatrixBase& m) { fromRotationMatrix(m.derived()); } /** \returns the rotation angle */ EIGEN_DEVICE_FUNC inline Scalar angle() const { return m_angle; } /** \returns a read-write reference to the rotation angle */ EIGEN_DEVICE_FUNC inline Scalar& angle() { return m_angle; } /** \returns the rotation angle in [0,2pi] */ EIGEN_DEVICE_FUNC inline Scalar smallestPositiveAngle() const { Scalar tmp = numext::fmod(m_angle,Scalar(2*EIGEN_PI)); return tmpScalar(EIGEN_PI)) tmp -= Scalar(2*EIGEN_PI); else if(tmp<-Scalar(EIGEN_PI)) tmp += Scalar(2*EIGEN_PI); return tmp; } /** \returns the inverse rotation */ EIGEN_DEVICE_FUNC inline Rotation2D inverse() const { return Rotation2D(-m_angle); } /** Concatenates two rotations */ EIGEN_DEVICE_FUNC inline Rotation2D operator*(const Rotation2D& other) const { return Rotation2D(m_angle + other.m_angle); } /** Concatenates two rotations */ EIGEN_DEVICE_FUNC inline Rotation2D& operator*=(const Rotation2D& other) { m_angle += other.m_angle; return *this; } /** Applies the rotation to a 2D vector */ EIGEN_DEVICE_FUNC Vector2 operator* (const Vector2& vec) const { return toRotationMatrix() * vec; } template EIGEN_DEVICE_FUNC Rotation2D& fromRotationMatrix(const MatrixBase& m); EIGEN_DEVICE_FUNC Matrix2 toRotationMatrix() const; /** Set \c *this from a 2x2 rotation matrix \a mat. * In other words, this function extract the rotation angle from the rotation matrix. * * This method is an alias for fromRotationMatrix() * * \sa fromRotationMatrix() */ template EIGEN_DEVICE_FUNC Rotation2D& operator=(const MatrixBase& m) { return fromRotationMatrix(m.derived()); } /** \returns the spherical interpolation between \c *this and \a other using * parameter \a t. It is in fact equivalent to a linear interpolation. */ EIGEN_DEVICE_FUNC inline Rotation2D slerp(const Scalar& t, const Rotation2D& other) const { Scalar dist = Rotation2D(other.m_angle-m_angle).smallestAngle(); return Rotation2D(m_angle + dist*t); } /** \returns \c *this with scalar type casted to \a NewScalarType * * Note that if \a NewScalarType is equal to the current scalar type of \c *this * then this function smartly returns a const reference to \c *this. */ template EIGEN_DEVICE_FUNC inline typename internal::cast_return_type >::type cast() const { return typename internal::cast_return_type >::type(*this); } /** Copy constructor with scalar type conversion */ template EIGEN_DEVICE_FUNC inline explicit Rotation2D(const Rotation2D& other) { m_angle = Scalar(other.angle()); } EIGEN_DEVICE_FUNC static inline Rotation2D Identity() { return Rotation2D(0); } /** \returns \c true if \c *this is approximately equal to \a other, within the precision * determined by \a prec. * * \sa MatrixBase::isApprox() */ EIGEN_DEVICE_FUNC bool isApprox(const Rotation2D& other, const typename NumTraits::Real& prec = NumTraits::dummy_precision()) const { return internal::isApprox(m_angle,other.m_angle, prec); } }; /** \ingroup Geometry_Module * single precision 2D rotation type */ typedef Rotation2D Rotation2Df; /** \ingroup Geometry_Module * double precision 2D rotation type */ typedef Rotation2D Rotation2Dd; /** Set \c *this from a 2x2 rotation matrix \a mat. * In other words, this function extract the rotation angle * from the rotation matrix. */ template template EIGEN_DEVICE_FUNC Rotation2D& Rotation2D::fromRotationMatrix(const MatrixBase& mat) { EIGEN_USING_STD(atan2) EIGEN_STATIC_ASSERT(Derived::RowsAtCompileTime==2 && Derived::ColsAtCompileTime==2,YOU_MADE_A_PROGRAMMING_MISTAKE) m_angle = atan2(mat.coeff(1,0), mat.coeff(0,0)); return *this; } /** Constructs and \returns an equivalent 2x2 rotation matrix. */ template typename Rotation2D::Matrix2 EIGEN_DEVICE_FUNC Rotation2D::toRotationMatrix(void) const { EIGEN_USING_STD(sin) EIGEN_USING_STD(cos) Scalar sinA = sin(m_angle); Scalar cosA = cos(m_angle); return (Matrix2() << cosA, -sinA, sinA, cosA).finished(); } } // end namespace Eigen #endif // EIGEN_ROTATION2D_H RcppEigen/inst/include/Eigen/src/Geometry/Hyperplane.h0000644000176200001440000002727214562417221022435 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008 Gael Guennebaud // Copyright (C) 2008 Benoit Jacob // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_HYPERPLANE_H #define EIGEN_HYPERPLANE_H namespace Eigen { /** \geometry_module \ingroup Geometry_Module * * \class Hyperplane * * \brief A hyperplane * * A hyperplane is an affine subspace of dimension n-1 in a space of dimension n. * For example, a hyperplane in a plane is a line; a hyperplane in 3-space is a plane. * * \tparam _Scalar the scalar type, i.e., the type of the coefficients * \tparam _AmbientDim the dimension of the ambient space, can be a compile time value or Dynamic. * Notice that the dimension of the hyperplane is _AmbientDim-1. * * This class represents an hyperplane as the zero set of the implicit equation * \f$ n \cdot x + d = 0 \f$ where \f$ n \f$ is a unit normal vector of the plane (linear part) * and \f$ d \f$ is the distance (offset) to the origin. */ template class Hyperplane { public: EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF_VECTORIZABLE_FIXED_SIZE(_Scalar,_AmbientDim==Dynamic ? Dynamic : _AmbientDim+1) enum { AmbientDimAtCompileTime = _AmbientDim, Options = _Options }; typedef _Scalar Scalar; typedef typename NumTraits::Real RealScalar; typedef Eigen::Index Index; ///< \deprecated since Eigen 3.3 typedef Matrix VectorType; typedef Matrix Coefficients; typedef Block NormalReturnType; typedef const Block ConstNormalReturnType; /** Default constructor without initialization */ EIGEN_DEVICE_FUNC inline Hyperplane() {} template EIGEN_DEVICE_FUNC Hyperplane(const Hyperplane& other) : m_coeffs(other.coeffs()) {} /** Constructs a dynamic-size hyperplane with \a _dim the dimension * of the ambient space */ EIGEN_DEVICE_FUNC inline explicit Hyperplane(Index _dim) : m_coeffs(_dim+1) {} /** Construct a plane from its normal \a n and a point \a e onto the plane. * \warning the vector normal is assumed to be normalized. */ EIGEN_DEVICE_FUNC inline Hyperplane(const VectorType& n, const VectorType& e) : m_coeffs(n.size()+1) { normal() = n; offset() = -n.dot(e); } /** Constructs a plane from its normal \a n and distance to the origin \a d * such that the algebraic equation of the plane is \f$ n \cdot x + d = 0 \f$. * \warning the vector normal is assumed to be normalized. */ EIGEN_DEVICE_FUNC inline Hyperplane(const VectorType& n, const Scalar& d) : m_coeffs(n.size()+1) { normal() = n; offset() = d; } /** Constructs a hyperplane passing through the two points. If the dimension of the ambient space * is greater than 2, then there isn't uniqueness, so an arbitrary choice is made. */ EIGEN_DEVICE_FUNC static inline Hyperplane Through(const VectorType& p0, const VectorType& p1) { Hyperplane result(p0.size()); result.normal() = (p1 - p0).unitOrthogonal(); result.offset() = -p0.dot(result.normal()); return result; } /** Constructs a hyperplane passing through the three points. The dimension of the ambient space * is required to be exactly 3. */ EIGEN_DEVICE_FUNC static inline Hyperplane Through(const VectorType& p0, const VectorType& p1, const VectorType& p2) { EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(VectorType, 3) Hyperplane result(p0.size()); VectorType v0(p2 - p0), v1(p1 - p0); result.normal() = v0.cross(v1); RealScalar norm = result.normal().norm(); if(norm <= v0.norm() * v1.norm() * NumTraits::epsilon()) { Matrix m; m << v0.transpose(), v1.transpose(); JacobiSVD > svd(m, ComputeFullV); result.normal() = svd.matrixV().col(2); } else result.normal() /= norm; result.offset() = -p0.dot(result.normal()); return result; } /** Constructs a hyperplane passing through the parametrized line \a parametrized. * If the dimension of the ambient space is greater than 2, then there isn't uniqueness, * so an arbitrary choice is made. */ // FIXME to be consistent with the rest this could be implemented as a static Through function ?? EIGEN_DEVICE_FUNC explicit Hyperplane(const ParametrizedLine& parametrized) { normal() = parametrized.direction().unitOrthogonal(); offset() = -parametrized.origin().dot(normal()); } EIGEN_DEVICE_FUNC ~Hyperplane() {} /** \returns the dimension in which the plane holds */ EIGEN_DEVICE_FUNC inline Index dim() const { return AmbientDimAtCompileTime==Dynamic ? m_coeffs.size()-1 : Index(AmbientDimAtCompileTime); } /** normalizes \c *this */ EIGEN_DEVICE_FUNC void normalize(void) { m_coeffs /= normal().norm(); } /** \returns the signed distance between the plane \c *this and a point \a p. * \sa absDistance() */ EIGEN_DEVICE_FUNC inline Scalar signedDistance(const VectorType& p) const { return normal().dot(p) + offset(); } /** \returns the absolute distance between the plane \c *this and a point \a p. * \sa signedDistance() */ EIGEN_DEVICE_FUNC inline Scalar absDistance(const VectorType& p) const { return numext::abs(signedDistance(p)); } /** \returns the projection of a point \a p onto the plane \c *this. */ EIGEN_DEVICE_FUNC inline VectorType projection(const VectorType& p) const { return p - signedDistance(p) * normal(); } /** \returns a constant reference to the unit normal vector of the plane, which corresponds * to the linear part of the implicit equation. */ EIGEN_DEVICE_FUNC inline ConstNormalReturnType normal() const { return ConstNormalReturnType(m_coeffs,0,0,dim(),1); } /** \returns a non-constant reference to the unit normal vector of the plane, which corresponds * to the linear part of the implicit equation. */ EIGEN_DEVICE_FUNC inline NormalReturnType normal() { return NormalReturnType(m_coeffs,0,0,dim(),1); } /** \returns the distance to the origin, which is also the "constant term" of the implicit equation * \warning the vector normal is assumed to be normalized. */ EIGEN_DEVICE_FUNC inline const Scalar& offset() const { return m_coeffs.coeff(dim()); } /** \returns a non-constant reference to the distance to the origin, which is also the constant part * of the implicit equation */ EIGEN_DEVICE_FUNC inline Scalar& offset() { return m_coeffs(dim()); } /** \returns a constant reference to the coefficients c_i of the plane equation: * \f$ c_0*x_0 + ... + c_{d-1}*x_{d-1} + c_d = 0 \f$ */ EIGEN_DEVICE_FUNC inline const Coefficients& coeffs() const { return m_coeffs; } /** \returns a non-constant reference to the coefficients c_i of the plane equation: * \f$ c_0*x_0 + ... + c_{d-1}*x_{d-1} + c_d = 0 \f$ */ EIGEN_DEVICE_FUNC inline Coefficients& coeffs() { return m_coeffs; } /** \returns the intersection of *this with \a other. * * \warning The ambient space must be a plane, i.e. have dimension 2, so that \c *this and \a other are lines. * * \note If \a other is approximately parallel to *this, this method will return any point on *this. */ EIGEN_DEVICE_FUNC VectorType intersection(const Hyperplane& other) const { EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(VectorType, 2) Scalar det = coeffs().coeff(0) * other.coeffs().coeff(1) - coeffs().coeff(1) * other.coeffs().coeff(0); // since the line equations ax+by=c are normalized with a^2+b^2=1, the following tests // whether the two lines are approximately parallel. if(internal::isMuchSmallerThan(det, Scalar(1))) { // special case where the two lines are approximately parallel. Pick any point on the first line. if(numext::abs(coeffs().coeff(1))>numext::abs(coeffs().coeff(0))) return VectorType(coeffs().coeff(1), -coeffs().coeff(2)/coeffs().coeff(1)-coeffs().coeff(0)); else return VectorType(-coeffs().coeff(2)/coeffs().coeff(0)-coeffs().coeff(1), coeffs().coeff(0)); } else { // general case Scalar invdet = Scalar(1) / det; return VectorType(invdet*(coeffs().coeff(1)*other.coeffs().coeff(2)-other.coeffs().coeff(1)*coeffs().coeff(2)), invdet*(other.coeffs().coeff(0)*coeffs().coeff(2)-coeffs().coeff(0)*other.coeffs().coeff(2))); } } /** Applies the transformation matrix \a mat to \c *this and returns a reference to \c *this. * * \param mat the Dim x Dim transformation matrix * \param traits specifies whether the matrix \a mat represents an #Isometry * or a more generic #Affine transformation. The default is #Affine. */ template EIGEN_DEVICE_FUNC inline Hyperplane& transform(const MatrixBase& mat, TransformTraits traits = Affine) { if (traits==Affine) { normal() = mat.inverse().transpose() * normal(); m_coeffs /= normal().norm(); } else if (traits==Isometry) normal() = mat * normal(); else { eigen_assert(0 && "invalid traits value in Hyperplane::transform()"); } return *this; } /** Applies the transformation \a t to \c *this and returns a reference to \c *this. * * \param t the transformation of dimension Dim * \param traits specifies whether the transformation \a t represents an #Isometry * or a more generic #Affine transformation. The default is #Affine. * Other kind of transformations are not supported. */ template EIGEN_DEVICE_FUNC inline Hyperplane& transform(const Transform& t, TransformTraits traits = Affine) { transform(t.linear(), traits); offset() -= normal().dot(t.translation()); return *this; } /** \returns \c *this with scalar type casted to \a NewScalarType * * Note that if \a NewScalarType is equal to the current scalar type of \c *this * then this function smartly returns a const reference to \c *this. */ template EIGEN_DEVICE_FUNC inline typename internal::cast_return_type >::type cast() const { return typename internal::cast_return_type >::type(*this); } /** Copy constructor with scalar type conversion */ template EIGEN_DEVICE_FUNC inline explicit Hyperplane(const Hyperplane& other) { m_coeffs = other.coeffs().template cast(); } /** \returns \c true if \c *this is approximately equal to \a other, within the precision * determined by \a prec. * * \sa MatrixBase::isApprox() */ template EIGEN_DEVICE_FUNC bool isApprox(const Hyperplane& other, const typename NumTraits::Real& prec = NumTraits::dummy_precision()) const { return m_coeffs.isApprox(other.m_coeffs, prec); } protected: Coefficients m_coeffs; }; } // end namespace Eigen #endif // EIGEN_HYPERPLANE_H RcppEigen/inst/include/Eigen/src/Geometry/Umeyama.h0000644000176200001440000001405614107270226021715 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009 Hauke Heibel // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_UMEYAMA_H #define EIGEN_UMEYAMA_H // This file requires the user to include // * Eigen/Core // * Eigen/LU // * Eigen/SVD // * Eigen/Array namespace Eigen { #ifndef EIGEN_PARSED_BY_DOXYGEN // These helpers are required since it allows to use mixed types as parameters // for the Umeyama. The problem with mixed parameters is that the return type // cannot trivially be deduced when float and double types are mixed. namespace internal { // Compile time return type deduction for different MatrixBase types. // Different means here different alignment and parameters but the same underlying // real scalar type. template struct umeyama_transform_matrix_type { enum { MinRowsAtCompileTime = EIGEN_SIZE_MIN_PREFER_DYNAMIC(MatrixType::RowsAtCompileTime, OtherMatrixType::RowsAtCompileTime), // When possible we want to choose some small fixed size value since the result // is likely to fit on the stack. So here, EIGEN_SIZE_MIN_PREFER_DYNAMIC is not what we want. HomogeneousDimension = int(MinRowsAtCompileTime) == Dynamic ? Dynamic : int(MinRowsAtCompileTime)+1 }; typedef Matrix::Scalar, HomogeneousDimension, HomogeneousDimension, AutoAlign | (traits::Flags & RowMajorBit ? RowMajor : ColMajor), HomogeneousDimension, HomogeneousDimension > type; }; } #endif /** * \geometry_module \ingroup Geometry_Module * * \brief Returns the transformation between two point sets. * * The algorithm is based on: * "Least-squares estimation of transformation parameters between two point patterns", * Shinji Umeyama, PAMI 1991, DOI: 10.1109/34.88573 * * It estimates parameters \f$ c, \mathbf{R}, \f$ and \f$ \mathbf{t} \f$ such that * \f{align*} * \frac{1}{n} \sum_{i=1}^n \vert\vert y_i - (c\mathbf{R}x_i + \mathbf{t}) \vert\vert_2^2 * \f} * is minimized. * * The algorithm is based on the analysis of the covariance matrix * \f$ \Sigma_{\mathbf{x}\mathbf{y}} \in \mathbb{R}^{d \times d} \f$ * of the input point sets \f$ \mathbf{x} \f$ and \f$ \mathbf{y} \f$ where * \f$d\f$ is corresponding to the dimension (which is typically small). * The analysis is involving the SVD having a complexity of \f$O(d^3)\f$ * though the actual computational effort lies in the covariance * matrix computation which has an asymptotic lower bound of \f$O(dm)\f$ when * the input point sets have dimension \f$d \times m\f$. * * Currently the method is working only for floating point matrices. * * \todo Should the return type of umeyama() become a Transform? * * \param src Source points \f$ \mathbf{x} = \left( x_1, \hdots, x_n \right) \f$. * \param dst Destination points \f$ \mathbf{y} = \left( y_1, \hdots, y_n \right) \f$. * \param with_scaling Sets \f$ c=1 \f$ when false is passed. * \return The homogeneous transformation * \f{align*} * T = \begin{bmatrix} c\mathbf{R} & \mathbf{t} \\ \mathbf{0} & 1 \end{bmatrix} * \f} * minimizing the residual above. This transformation is always returned as an * Eigen::Matrix. */ template typename internal::umeyama_transform_matrix_type::type umeyama(const MatrixBase& src, const MatrixBase& dst, bool with_scaling = true) { typedef typename internal::umeyama_transform_matrix_type::type TransformationMatrixType; typedef typename internal::traits::Scalar Scalar; typedef typename NumTraits::Real RealScalar; EIGEN_STATIC_ASSERT(!NumTraits::IsComplex, NUMERIC_TYPE_MUST_BE_REAL) EIGEN_STATIC_ASSERT((internal::is_same::Scalar>::value), YOU_MIXED_DIFFERENT_NUMERIC_TYPES__YOU_NEED_TO_USE_THE_CAST_METHOD_OF_MATRIXBASE_TO_CAST_NUMERIC_TYPES_EXPLICITLY) enum { Dimension = EIGEN_SIZE_MIN_PREFER_DYNAMIC(Derived::RowsAtCompileTime, OtherDerived::RowsAtCompileTime) }; typedef Matrix VectorType; typedef Matrix MatrixType; typedef typename internal::plain_matrix_type_row_major::type RowMajorMatrixType; const Index m = src.rows(); // dimension const Index n = src.cols(); // number of measurements // required for demeaning ... const RealScalar one_over_n = RealScalar(1) / static_cast(n); // computation of mean const VectorType src_mean = src.rowwise().sum() * one_over_n; const VectorType dst_mean = dst.rowwise().sum() * one_over_n; // demeaning of src and dst points const RowMajorMatrixType src_demean = src.colwise() - src_mean; const RowMajorMatrixType dst_demean = dst.colwise() - dst_mean; // Eq. (36)-(37) const Scalar src_var = src_demean.rowwise().squaredNorm().sum() * one_over_n; // Eq. (38) const MatrixType sigma = one_over_n * dst_demean * src_demean.transpose(); JacobiSVD svd(sigma, ComputeFullU | ComputeFullV); // Initialize the resulting transformation with an identity matrix... TransformationMatrixType Rt = TransformationMatrixType::Identity(m+1,m+1); // Eq. (39) VectorType S = VectorType::Ones(m); if ( svd.matrixU().determinant() * svd.matrixV().determinant() < 0 ) S(m-1) = -1; // Eq. (40) and (43) Rt.block(0,0,m,m).noalias() = svd.matrixU() * S.asDiagonal() * svd.matrixV().transpose(); if (with_scaling) { // Eq. (42) const Scalar c = Scalar(1)/src_var * svd.singularValues().dot(S); // Eq. (41) Rt.col(m).head(m) = dst_mean; Rt.col(m).head(m).noalias() -= c*Rt.topLeftCorner(m,m)*src_mean; Rt.block(0,0,m,m) *= c; } else { Rt.col(m).head(m) = dst_mean; Rt.col(m).head(m).noalias() -= Rt.topLeftCorner(m,m)*src_mean; } return Rt; } } // end namespace Eigen #endif // EIGEN_UMEYAMA_H RcppEigen/inst/include/Eigen/src/Geometry/Transform.h0000644000176200001440000017075214562417221022303 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008 Gael Guennebaud // Copyright (C) 2009 Benoit Jacob // Copyright (C) 2010 Hauke Heibel // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_TRANSFORM_H #define EIGEN_TRANSFORM_H namespace Eigen { namespace internal { template struct transform_traits { enum { Dim = Transform::Dim, HDim = Transform::HDim, Mode = Transform::Mode, IsProjective = (int(Mode)==int(Projective)) }; }; template< typename TransformType, typename MatrixType, int Case = transform_traits::IsProjective ? 0 : int(MatrixType::RowsAtCompileTime) == int(transform_traits::HDim) ? 1 : 2, int RhsCols = MatrixType::ColsAtCompileTime> struct transform_right_product_impl; template< typename Other, int Mode, int Options, int Dim, int HDim, int OtherRows=Other::RowsAtCompileTime, int OtherCols=Other::ColsAtCompileTime> struct transform_left_product_impl; template< typename Lhs, typename Rhs, bool AnyProjective = transform_traits::IsProjective || transform_traits::IsProjective> struct transform_transform_product_impl; template< typename Other, int Mode, int Options, int Dim, int HDim, int OtherRows=Other::RowsAtCompileTime, int OtherCols=Other::ColsAtCompileTime> struct transform_construct_from_matrix; template struct transform_take_affine_part; template struct traits > { typedef _Scalar Scalar; typedef Eigen::Index StorageIndex; typedef Dense StorageKind; enum { Dim1 = _Dim==Dynamic ? _Dim : _Dim + 1, RowsAtCompileTime = _Mode==Projective ? Dim1 : _Dim, ColsAtCompileTime = Dim1, MaxRowsAtCompileTime = RowsAtCompileTime, MaxColsAtCompileTime = ColsAtCompileTime, Flags = 0 }; }; template struct transform_make_affine; } // end namespace internal /** \geometry_module \ingroup Geometry_Module * * \class Transform * * \brief Represents an homogeneous transformation in a N dimensional space * * \tparam _Scalar the scalar type, i.e., the type of the coefficients * \tparam _Dim the dimension of the space * \tparam _Mode the type of the transformation. Can be: * - #Affine: the transformation is stored as a (Dim+1)^2 matrix, * where the last row is assumed to be [0 ... 0 1]. * - #AffineCompact: the transformation is stored as a (Dim)x(Dim+1) matrix. * - #Projective: the transformation is stored as a (Dim+1)^2 matrix * without any assumption. * - #Isometry: same as #Affine with the additional assumption that * the linear part represents a rotation. This assumption is exploited * to speed up some functions such as inverse() and rotation(). * \tparam _Options has the same meaning as in class Matrix. It allows to specify DontAlign and/or RowMajor. * These Options are passed directly to the underlying matrix type. * * The homography is internally represented and stored by a matrix which * is available through the matrix() method. To understand the behavior of * this class you have to think a Transform object as its internal * matrix representation. The chosen convention is right multiply: * * \code v' = T * v \endcode * * Therefore, an affine transformation matrix M is shaped like this: * * \f$ \left( \begin{array}{cc} * linear & translation\\ * 0 ... 0 & 1 * \end{array} \right) \f$ * * Note that for a projective transformation the last row can be anything, * and then the interpretation of different parts might be slightly different. * * However, unlike a plain matrix, the Transform class provides many features * simplifying both its assembly and usage. In particular, it can be composed * with any other transformations (Transform,Translation,RotationBase,DiagonalMatrix) * and can be directly used to transform implicit homogeneous vectors. All these * operations are handled via the operator*. For the composition of transformations, * its principle consists to first convert the right/left hand sides of the product * to a compatible (Dim+1)^2 matrix and then perform a pure matrix product. * Of course, internally, operator* tries to perform the minimal number of operations * according to the nature of each terms. Likewise, when applying the transform * to points, the latters are automatically promoted to homogeneous vectors * before doing the matrix product. The conventions to homogeneous representations * are performed as follow: * * \b Translation t (Dim)x(1): * \f$ \left( \begin{array}{cc} * I & t \\ * 0\,...\,0 & 1 * \end{array} \right) \f$ * * \b Rotation R (Dim)x(Dim): * \f$ \left( \begin{array}{cc} * R & 0\\ * 0\,...\,0 & 1 * \end{array} \right) \f$ * * \b Scaling \b DiagonalMatrix S (Dim)x(Dim): * \f$ \left( \begin{array}{cc} * S & 0\\ * 0\,...\,0 & 1 * \end{array} \right) \f$ * * \b Column \b point v (Dim)x(1): * \f$ \left( \begin{array}{c} * v\\ * 1 * \end{array} \right) \f$ * * \b Set \b of \b column \b points V1...Vn (Dim)x(n): * \f$ \left( \begin{array}{ccc} * v_1 & ... & v_n\\ * 1 & ... & 1 * \end{array} \right) \f$ * * The concatenation of a Transform object with any kind of other transformation * always returns a Transform object. * * A little exception to the "as pure matrix product" rule is the case of the * transformation of non homogeneous vectors by an affine transformation. In * that case the last matrix row can be ignored, and the product returns non * homogeneous vectors. * * Since, for instance, a Dim x Dim matrix is interpreted as a linear transformation, * it is not possible to directly transform Dim vectors stored in a Dim x Dim matrix. * The solution is either to use a Dim x Dynamic matrix or explicitly request a * vector transformation by making the vector homogeneous: * \code * m' = T * m.colwise().homogeneous(); * \endcode * Note that there is zero overhead. * * Conversion methods from/to Qt's QMatrix and QTransform are available if the * preprocessor token EIGEN_QT_SUPPORT is defined. * * This class can be extended with the help of the plugin mechanism described on the page * \ref TopicCustomizing_Plugins by defining the preprocessor symbol \c EIGEN_TRANSFORM_PLUGIN. * * \sa class Matrix, class Quaternion */ template class Transform { public: EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF_VECTORIZABLE_FIXED_SIZE(_Scalar,_Dim==Dynamic ? Dynamic : (_Dim+1)*(_Dim+1)) enum { Mode = _Mode, Options = _Options, Dim = _Dim, ///< space dimension in which the transformation holds HDim = _Dim+1, ///< size of a respective homogeneous vector Rows = int(Mode)==(AffineCompact) ? Dim : HDim }; /** the scalar type of the coefficients */ typedef _Scalar Scalar; typedef Eigen::Index StorageIndex; typedef Eigen::Index Index; ///< \deprecated since Eigen 3.3 /** type of the matrix used to represent the transformation */ typedef typename internal::make_proper_matrix_type::type MatrixType; /** constified MatrixType */ typedef const MatrixType ConstMatrixType; /** type of the matrix used to represent the linear part of the transformation */ typedef Matrix LinearMatrixType; /** type of read/write reference to the linear part of the transformation */ typedef Block LinearPart; /** type of read reference to the linear part of the transformation */ typedef const Block ConstLinearPart; /** type of read/write reference to the affine part of the transformation */ typedef typename internal::conditional >::type AffinePart; /** type of read reference to the affine part of the transformation */ typedef typename internal::conditional >::type ConstAffinePart; /** type of a vector */ typedef Matrix VectorType; /** type of a read/write reference to the translation part of the rotation */ typedef Block::Flags & RowMajorBit)> TranslationPart; /** type of a read reference to the translation part of the rotation */ typedef const Block::Flags & RowMajorBit)> ConstTranslationPart; /** corresponding translation type */ typedef Translation TranslationType; // this intermediate enum is needed to avoid an ICE with gcc 3.4 and 4.0 enum { TransformTimeDiagonalMode = ((Mode==int(Isometry))?Affine:int(Mode)) }; /** The return type of the product between a diagonal matrix and a transform */ typedef Transform TransformTimeDiagonalReturnType; protected: MatrixType m_matrix; public: /** Default constructor without initialization of the meaningful coefficients. * If Mode==Affine or Mode==Isometry, then the last row is set to [0 ... 0 1] */ EIGEN_DEVICE_FUNC inline Transform() { check_template_params(); internal::transform_make_affine<(int(Mode)==Affine || int(Mode)==Isometry) ? Affine : AffineCompact>::run(m_matrix); } EIGEN_DEVICE_FUNC inline explicit Transform(const TranslationType& t) { check_template_params(); *this = t; } EIGEN_DEVICE_FUNC inline explicit Transform(const UniformScaling& s) { check_template_params(); *this = s; } template EIGEN_DEVICE_FUNC inline explicit Transform(const RotationBase& r) { check_template_params(); *this = r; } typedef internal::transform_take_affine_part take_affine_part; /** Constructs and initializes a transformation from a Dim^2 or a (Dim+1)^2 matrix. */ template EIGEN_DEVICE_FUNC inline explicit Transform(const EigenBase& other) { EIGEN_STATIC_ASSERT((internal::is_same::value), YOU_MIXED_DIFFERENT_NUMERIC_TYPES__YOU_NEED_TO_USE_THE_CAST_METHOD_OF_MATRIXBASE_TO_CAST_NUMERIC_TYPES_EXPLICITLY); check_template_params(); internal::transform_construct_from_matrix::run(this, other.derived()); } /** Set \c *this from a Dim^2 or (Dim+1)^2 matrix. */ template EIGEN_DEVICE_FUNC inline Transform& operator=(const EigenBase& other) { EIGEN_STATIC_ASSERT((internal::is_same::value), YOU_MIXED_DIFFERENT_NUMERIC_TYPES__YOU_NEED_TO_USE_THE_CAST_METHOD_OF_MATRIXBASE_TO_CAST_NUMERIC_TYPES_EXPLICITLY); internal::transform_construct_from_matrix::run(this, other.derived()); return *this; } template EIGEN_DEVICE_FUNC inline Transform(const Transform& other) { check_template_params(); // only the options change, we can directly copy the matrices m_matrix = other.matrix(); } template EIGEN_DEVICE_FUNC inline Transform(const Transform& other) { check_template_params(); // prevent conversions as: // Affine | AffineCompact | Isometry = Projective EIGEN_STATIC_ASSERT(EIGEN_IMPLIES(OtherMode==int(Projective), Mode==int(Projective)), YOU_PERFORMED_AN_INVALID_TRANSFORMATION_CONVERSION) // prevent conversions as: // Isometry = Affine | AffineCompact EIGEN_STATIC_ASSERT(EIGEN_IMPLIES(OtherMode==int(Affine)||OtherMode==int(AffineCompact), Mode!=int(Isometry)), YOU_PERFORMED_AN_INVALID_TRANSFORMATION_CONVERSION) enum { ModeIsAffineCompact = Mode == int(AffineCompact), OtherModeIsAffineCompact = OtherMode == int(AffineCompact) }; if(EIGEN_CONST_CONDITIONAL(ModeIsAffineCompact == OtherModeIsAffineCompact)) { // We need the block expression because the code is compiled for all // combinations of transformations and will trigger a compile time error // if one tries to assign the matrices directly m_matrix.template block(0,0) = other.matrix().template block(0,0); makeAffine(); } else if(EIGEN_CONST_CONDITIONAL(OtherModeIsAffineCompact)) { typedef typename Transform::MatrixType OtherMatrixType; internal::transform_construct_from_matrix::run(this, other.matrix()); } else { // here we know that Mode == AffineCompact and OtherMode != AffineCompact. // if OtherMode were Projective, the static assert above would already have caught it. // So the only possibility is that OtherMode == Affine linear() = other.linear(); translation() = other.translation(); } } template EIGEN_DEVICE_FUNC Transform(const ReturnByValue& other) { check_template_params(); other.evalTo(*this); } template EIGEN_DEVICE_FUNC Transform& operator=(const ReturnByValue& other) { other.evalTo(*this); return *this; } #ifdef EIGEN_QT_SUPPORT inline Transform(const QMatrix& other); inline Transform& operator=(const QMatrix& other); inline QMatrix toQMatrix(void) const; inline Transform(const QTransform& other); inline Transform& operator=(const QTransform& other); inline QTransform toQTransform(void) const; #endif EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index rows() const EIGEN_NOEXCEPT { return int(Mode)==int(Projective) ? m_matrix.cols() : (m_matrix.cols()-1); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index cols() const EIGEN_NOEXCEPT { return m_matrix.cols(); } /** shortcut for m_matrix(row,col); * \sa MatrixBase::operator(Index,Index) const */ EIGEN_DEVICE_FUNC inline Scalar operator() (Index row, Index col) const { return m_matrix(row,col); } /** shortcut for m_matrix(row,col); * \sa MatrixBase::operator(Index,Index) */ EIGEN_DEVICE_FUNC inline Scalar& operator() (Index row, Index col) { return m_matrix(row,col); } /** \returns a read-only expression of the transformation matrix */ EIGEN_DEVICE_FUNC inline const MatrixType& matrix() const { return m_matrix; } /** \returns a writable expression of the transformation matrix */ EIGEN_DEVICE_FUNC inline MatrixType& matrix() { return m_matrix; } /** \returns a read-only expression of the linear part of the transformation */ EIGEN_DEVICE_FUNC inline ConstLinearPart linear() const { return ConstLinearPart(m_matrix,0,0); } /** \returns a writable expression of the linear part of the transformation */ EIGEN_DEVICE_FUNC inline LinearPart linear() { return LinearPart(m_matrix,0,0); } /** \returns a read-only expression of the Dim x HDim affine part of the transformation */ EIGEN_DEVICE_FUNC inline ConstAffinePart affine() const { return take_affine_part::run(m_matrix); } /** \returns a writable expression of the Dim x HDim affine part of the transformation */ EIGEN_DEVICE_FUNC inline AffinePart affine() { return take_affine_part::run(m_matrix); } /** \returns a read-only expression of the translation vector of the transformation */ EIGEN_DEVICE_FUNC inline ConstTranslationPart translation() const { return ConstTranslationPart(m_matrix,0,Dim); } /** \returns a writable expression of the translation vector of the transformation */ EIGEN_DEVICE_FUNC inline TranslationPart translation() { return TranslationPart(m_matrix,0,Dim); } /** \returns an expression of the product between the transform \c *this and a matrix expression \a other. * * The right-hand-side \a other can be either: * \li an homogeneous vector of size Dim+1, * \li a set of homogeneous vectors of size Dim+1 x N, * \li a transformation matrix of size Dim+1 x Dim+1. * * Moreover, if \c *this represents an affine transformation (i.e., Mode!=Projective), then \a other can also be: * \li a point of size Dim (computes: \code this->linear() * other + this->translation()\endcode), * \li a set of N points as a Dim x N matrix (computes: \code (this->linear() * other).colwise() + this->translation()\endcode), * * In all cases, the return type is a matrix or vector of same sizes as the right-hand-side \a other. * * If you want to interpret \a other as a linear or affine transformation, then first convert it to a Transform<> type, * or do your own cooking. * * Finally, if you want to apply Affine transformations to vectors, then explicitly apply the linear part only: * \code * Affine3f A; * Vector3f v1, v2; * v2 = A.linear() * v1; * \endcode * */ // note: this function is defined here because some compilers cannot find the respective declaration template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename internal::transform_right_product_impl::ResultType operator * (const EigenBase &other) const { return internal::transform_right_product_impl::run(*this,other.derived()); } /** \returns the product expression of a transformation matrix \a a times a transform \a b * * The left hand side \a other can be either: * \li a linear transformation matrix of size Dim x Dim, * \li an affine transformation matrix of size Dim x Dim+1, * \li a general transformation matrix of size Dim+1 x Dim+1. */ template friend EIGEN_DEVICE_FUNC inline const typename internal::transform_left_product_impl::ResultType operator * (const EigenBase &a, const Transform &b) { return internal::transform_left_product_impl::run(a.derived(),b); } /** \returns The product expression of a transform \a a times a diagonal matrix \a b * * The rhs diagonal matrix is interpreted as an affine scaling transformation. The * product results in a Transform of the same type (mode) as the lhs only if the lhs * mode is no isometry. In that case, the returned transform is an affinity. */ template EIGEN_DEVICE_FUNC inline const TransformTimeDiagonalReturnType operator * (const DiagonalBase &b) const { TransformTimeDiagonalReturnType res(*this); res.linearExt() *= b; return res; } /** \returns The product expression of a diagonal matrix \a a times a transform \a b * * The lhs diagonal matrix is interpreted as an affine scaling transformation. The * product results in a Transform of the same type (mode) as the lhs only if the lhs * mode is no isometry. In that case, the returned transform is an affinity. */ template EIGEN_DEVICE_FUNC friend inline TransformTimeDiagonalReturnType operator * (const DiagonalBase &a, const Transform &b) { TransformTimeDiagonalReturnType res; res.linear().noalias() = a*b.linear(); res.translation().noalias() = a*b.translation(); if (EIGEN_CONST_CONDITIONAL(Mode!=int(AffineCompact))) res.matrix().row(Dim) = b.matrix().row(Dim); return res; } template EIGEN_DEVICE_FUNC inline Transform& operator*=(const EigenBase& other) { return *this = *this * other; } /** Concatenates two transformations */ EIGEN_DEVICE_FUNC inline const Transform operator * (const Transform& other) const { return internal::transform_transform_product_impl::run(*this,other); } #if EIGEN_COMP_ICC private: // this intermediate structure permits to workaround a bug in ICC 11: // error: template instantiation resulted in unexpected function type of "Eigen::Transform // (const Eigen::Transform &) const" // (the meaning of a name may have changed since the template declaration -- the type of the template is: // "Eigen::internal::transform_transform_product_impl, // Eigen::Transform, >::ResultType (const Eigen::Transform &) const") // template struct icc_11_workaround { typedef internal::transform_transform_product_impl > ProductType; typedef typename ProductType::ResultType ResultType; }; public: /** Concatenates two different transformations */ template inline typename icc_11_workaround::ResultType operator * (const Transform& other) const { typedef typename icc_11_workaround::ProductType ProductType; return ProductType::run(*this,other); } #else /** Concatenates two different transformations */ template EIGEN_DEVICE_FUNC inline typename internal::transform_transform_product_impl >::ResultType operator * (const Transform& other) const { return internal::transform_transform_product_impl >::run(*this,other); } #endif /** \sa MatrixBase::setIdentity() */ EIGEN_DEVICE_FUNC void setIdentity() { m_matrix.setIdentity(); } /** * \brief Returns an identity transformation. * \todo In the future this function should be returning a Transform expression. */ EIGEN_DEVICE_FUNC static const Transform Identity() { return Transform(MatrixType::Identity()); } template EIGEN_DEVICE_FUNC inline Transform& scale(const MatrixBase &other); template EIGEN_DEVICE_FUNC inline Transform& prescale(const MatrixBase &other); EIGEN_DEVICE_FUNC inline Transform& scale(const Scalar& s); EIGEN_DEVICE_FUNC inline Transform& prescale(const Scalar& s); template EIGEN_DEVICE_FUNC inline Transform& translate(const MatrixBase &other); template EIGEN_DEVICE_FUNC inline Transform& pretranslate(const MatrixBase &other); template EIGEN_DEVICE_FUNC inline Transform& rotate(const RotationType& rotation); template EIGEN_DEVICE_FUNC inline Transform& prerotate(const RotationType& rotation); EIGEN_DEVICE_FUNC Transform& shear(const Scalar& sx, const Scalar& sy); EIGEN_DEVICE_FUNC Transform& preshear(const Scalar& sx, const Scalar& sy); EIGEN_DEVICE_FUNC inline Transform& operator=(const TranslationType& t); EIGEN_DEVICE_FUNC inline Transform& operator*=(const TranslationType& t) { return translate(t.vector()); } EIGEN_DEVICE_FUNC inline Transform operator*(const TranslationType& t) const; EIGEN_DEVICE_FUNC inline Transform& operator=(const UniformScaling& t); EIGEN_DEVICE_FUNC inline Transform& operator*=(const UniformScaling& s) { return scale(s.factor()); } EIGEN_DEVICE_FUNC inline TransformTimeDiagonalReturnType operator*(const UniformScaling& s) const { TransformTimeDiagonalReturnType res = *this; res.scale(s.factor()); return res; } EIGEN_DEVICE_FUNC inline Transform& operator*=(const DiagonalMatrix& s) { linearExt() *= s; return *this; } template EIGEN_DEVICE_FUNC inline Transform& operator=(const RotationBase& r); template EIGEN_DEVICE_FUNC inline Transform& operator*=(const RotationBase& r) { return rotate(r.toRotationMatrix()); } template EIGEN_DEVICE_FUNC inline Transform operator*(const RotationBase& r) const; typedef typename internal::conditional::type RotationReturnType; EIGEN_DEVICE_FUNC RotationReturnType rotation() const; template EIGEN_DEVICE_FUNC void computeRotationScaling(RotationMatrixType *rotation, ScalingMatrixType *scaling) const; template EIGEN_DEVICE_FUNC void computeScalingRotation(ScalingMatrixType *scaling, RotationMatrixType *rotation) const; template EIGEN_DEVICE_FUNC Transform& fromPositionOrientationScale(const MatrixBase &position, const OrientationType& orientation, const MatrixBase &scale); EIGEN_DEVICE_FUNC inline Transform inverse(TransformTraits traits = (TransformTraits)Mode) const; /** \returns a const pointer to the column major internal matrix */ EIGEN_DEVICE_FUNC const Scalar* data() const { return m_matrix.data(); } /** \returns a non-const pointer to the column major internal matrix */ EIGEN_DEVICE_FUNC Scalar* data() { return m_matrix.data(); } /** \returns \c *this with scalar type casted to \a NewScalarType * * Note that if \a NewScalarType is equal to the current scalar type of \c *this * then this function smartly returns a const reference to \c *this. */ template EIGEN_DEVICE_FUNC inline typename internal::cast_return_type >::type cast() const { return typename internal::cast_return_type >::type(*this); } /** Copy constructor with scalar type conversion */ template EIGEN_DEVICE_FUNC inline explicit Transform(const Transform& other) { check_template_params(); m_matrix = other.matrix().template cast(); } /** \returns \c true if \c *this is approximately equal to \a other, within the precision * determined by \a prec. * * \sa MatrixBase::isApprox() */ EIGEN_DEVICE_FUNC bool isApprox(const Transform& other, const typename NumTraits::Real& prec = NumTraits::dummy_precision()) const { return m_matrix.isApprox(other.m_matrix, prec); } /** Sets the last row to [0 ... 0 1] */ EIGEN_DEVICE_FUNC void makeAffine() { internal::transform_make_affine::run(m_matrix); } /** \internal * \returns the Dim x Dim linear part if the transformation is affine, * and the HDim x Dim part for projective transformations. */ EIGEN_DEVICE_FUNC inline Block linearExt() { return m_matrix.template block(0,0); } /** \internal * \returns the Dim x Dim linear part if the transformation is affine, * and the HDim x Dim part for projective transformations. */ EIGEN_DEVICE_FUNC inline const Block linearExt() const { return m_matrix.template block(0,0); } /** \internal * \returns the translation part if the transformation is affine, * and the last column for projective transformations. */ EIGEN_DEVICE_FUNC inline Block translationExt() { return m_matrix.template block(0,Dim); } /** \internal * \returns the translation part if the transformation is affine, * and the last column for projective transformations. */ EIGEN_DEVICE_FUNC inline const Block translationExt() const { return m_matrix.template block(0,Dim); } #ifdef EIGEN_TRANSFORM_PLUGIN #include EIGEN_TRANSFORM_PLUGIN #endif protected: #ifndef EIGEN_PARSED_BY_DOXYGEN EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE void check_template_params() { EIGEN_STATIC_ASSERT((Options & (DontAlign|RowMajor)) == Options, INVALID_MATRIX_TEMPLATE_PARAMETERS) } #endif }; /** \ingroup Geometry_Module */ typedef Transform Isometry2f; /** \ingroup Geometry_Module */ typedef Transform Isometry3f; /** \ingroup Geometry_Module */ typedef Transform Isometry2d; /** \ingroup Geometry_Module */ typedef Transform Isometry3d; /** \ingroup Geometry_Module */ typedef Transform Affine2f; /** \ingroup Geometry_Module */ typedef Transform Affine3f; /** \ingroup Geometry_Module */ typedef Transform Affine2d; /** \ingroup Geometry_Module */ typedef Transform Affine3d; /** \ingroup Geometry_Module */ typedef Transform AffineCompact2f; /** \ingroup Geometry_Module */ typedef Transform AffineCompact3f; /** \ingroup Geometry_Module */ typedef Transform AffineCompact2d; /** \ingroup Geometry_Module */ typedef Transform AffineCompact3d; /** \ingroup Geometry_Module */ typedef Transform Projective2f; /** \ingroup Geometry_Module */ typedef Transform Projective3f; /** \ingroup Geometry_Module */ typedef Transform Projective2d; /** \ingroup Geometry_Module */ typedef Transform Projective3d; /************************** *** Optional QT support *** **************************/ #ifdef EIGEN_QT_SUPPORT /** Initializes \c *this from a QMatrix assuming the dimension is 2. * * This function is available only if the token EIGEN_QT_SUPPORT is defined. */ template Transform::Transform(const QMatrix& other) { check_template_params(); *this = other; } /** Set \c *this from a QMatrix assuming the dimension is 2. * * This function is available only if the token EIGEN_QT_SUPPORT is defined. */ template Transform& Transform::operator=(const QMatrix& other) { EIGEN_STATIC_ASSERT(Dim==2, YOU_MADE_A_PROGRAMMING_MISTAKE) if (EIGEN_CONST_CONDITIONAL(Mode == int(AffineCompact))) m_matrix << other.m11(), other.m21(), other.dx(), other.m12(), other.m22(), other.dy(); else m_matrix << other.m11(), other.m21(), other.dx(), other.m12(), other.m22(), other.dy(), 0, 0, 1; return *this; } /** \returns a QMatrix from \c *this assuming the dimension is 2. * * \warning this conversion might loss data if \c *this is not affine * * This function is available only if the token EIGEN_QT_SUPPORT is defined. */ template QMatrix Transform::toQMatrix(void) const { check_template_params(); EIGEN_STATIC_ASSERT(Dim==2, YOU_MADE_A_PROGRAMMING_MISTAKE) return QMatrix(m_matrix.coeff(0,0), m_matrix.coeff(1,0), m_matrix.coeff(0,1), m_matrix.coeff(1,1), m_matrix.coeff(0,2), m_matrix.coeff(1,2)); } /** Initializes \c *this from a QTransform assuming the dimension is 2. * * This function is available only if the token EIGEN_QT_SUPPORT is defined. */ template Transform::Transform(const QTransform& other) { check_template_params(); *this = other; } /** Set \c *this from a QTransform assuming the dimension is 2. * * This function is available only if the token EIGEN_QT_SUPPORT is defined. */ template Transform& Transform::operator=(const QTransform& other) { check_template_params(); EIGEN_STATIC_ASSERT(Dim==2, YOU_MADE_A_PROGRAMMING_MISTAKE) if (EIGEN_CONST_CONDITIONAL(Mode == int(AffineCompact))) m_matrix << other.m11(), other.m21(), other.dx(), other.m12(), other.m22(), other.dy(); else m_matrix << other.m11(), other.m21(), other.dx(), other.m12(), other.m22(), other.dy(), other.m13(), other.m23(), other.m33(); return *this; } /** \returns a QTransform from \c *this assuming the dimension is 2. * * This function is available only if the token EIGEN_QT_SUPPORT is defined. */ template QTransform Transform::toQTransform(void) const { EIGEN_STATIC_ASSERT(Dim==2, YOU_MADE_A_PROGRAMMING_MISTAKE) if (EIGEN_CONST_CONDITIONAL(Mode == int(AffineCompact))) return QTransform(m_matrix.coeff(0,0), m_matrix.coeff(1,0), m_matrix.coeff(0,1), m_matrix.coeff(1,1), m_matrix.coeff(0,2), m_matrix.coeff(1,2)); else return QTransform(m_matrix.coeff(0,0), m_matrix.coeff(1,0), m_matrix.coeff(2,0), m_matrix.coeff(0,1), m_matrix.coeff(1,1), m_matrix.coeff(2,1), m_matrix.coeff(0,2), m_matrix.coeff(1,2), m_matrix.coeff(2,2)); } #endif /********************* *** Procedural API *** *********************/ /** Applies on the right the non uniform scale transformation represented * by the vector \a other to \c *this and returns a reference to \c *this. * \sa prescale() */ template template EIGEN_DEVICE_FUNC Transform& Transform::scale(const MatrixBase &other) { EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(OtherDerived,int(Dim)) EIGEN_STATIC_ASSERT(Mode!=int(Isometry), THIS_METHOD_IS_ONLY_FOR_SPECIFIC_TRANSFORMATIONS) linearExt().noalias() = (linearExt() * other.asDiagonal()); return *this; } /** Applies on the right a uniform scale of a factor \a c to \c *this * and returns a reference to \c *this. * \sa prescale(Scalar) */ template EIGEN_DEVICE_FUNC inline Transform& Transform::scale(const Scalar& s) { EIGEN_STATIC_ASSERT(Mode!=int(Isometry), THIS_METHOD_IS_ONLY_FOR_SPECIFIC_TRANSFORMATIONS) linearExt() *= s; return *this; } /** Applies on the left the non uniform scale transformation represented * by the vector \a other to \c *this and returns a reference to \c *this. * \sa scale() */ template template EIGEN_DEVICE_FUNC Transform& Transform::prescale(const MatrixBase &other) { EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(OtherDerived,int(Dim)) EIGEN_STATIC_ASSERT(Mode!=int(Isometry), THIS_METHOD_IS_ONLY_FOR_SPECIFIC_TRANSFORMATIONS) affine().noalias() = (other.asDiagonal() * affine()); return *this; } /** Applies on the left a uniform scale of a factor \a c to \c *this * and returns a reference to \c *this. * \sa scale(Scalar) */ template EIGEN_DEVICE_FUNC inline Transform& Transform::prescale(const Scalar& s) { EIGEN_STATIC_ASSERT(Mode!=int(Isometry), THIS_METHOD_IS_ONLY_FOR_SPECIFIC_TRANSFORMATIONS) m_matrix.template topRows() *= s; return *this; } /** Applies on the right the translation matrix represented by the vector \a other * to \c *this and returns a reference to \c *this. * \sa pretranslate() */ template template EIGEN_DEVICE_FUNC Transform& Transform::translate(const MatrixBase &other) { EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(OtherDerived,int(Dim)) translationExt() += linearExt() * other; return *this; } /** Applies on the left the translation matrix represented by the vector \a other * to \c *this and returns a reference to \c *this. * \sa translate() */ template template EIGEN_DEVICE_FUNC Transform& Transform::pretranslate(const MatrixBase &other) { EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(OtherDerived,int(Dim)) if(EIGEN_CONST_CONDITIONAL(int(Mode)==int(Projective))) affine() += other * m_matrix.row(Dim); else translation() += other; return *this; } /** Applies on the right the rotation represented by the rotation \a rotation * to \c *this and returns a reference to \c *this. * * The template parameter \a RotationType is the type of the rotation which * must be known by internal::toRotationMatrix<>. * * Natively supported types includes: * - any scalar (2D), * - a Dim x Dim matrix expression, * - a Quaternion (3D), * - a AngleAxis (3D) * * This mechanism is easily extendable to support user types such as Euler angles, * or a pair of Quaternion for 4D rotations. * * \sa rotate(Scalar), class Quaternion, class AngleAxis, prerotate(RotationType) */ template template EIGEN_DEVICE_FUNC Transform& Transform::rotate(const RotationType& rotation) { linearExt() *= internal::toRotationMatrix(rotation); return *this; } /** Applies on the left the rotation represented by the rotation \a rotation * to \c *this and returns a reference to \c *this. * * See rotate() for further details. * * \sa rotate() */ template template EIGEN_DEVICE_FUNC Transform& Transform::prerotate(const RotationType& rotation) { m_matrix.template block(0,0) = internal::toRotationMatrix(rotation) * m_matrix.template block(0,0); return *this; } /** Applies on the right the shear transformation represented * by the vector \a other to \c *this and returns a reference to \c *this. * \warning 2D only. * \sa preshear() */ template EIGEN_DEVICE_FUNC Transform& Transform::shear(const Scalar& sx, const Scalar& sy) { EIGEN_STATIC_ASSERT(int(Dim)==2, YOU_MADE_A_PROGRAMMING_MISTAKE) EIGEN_STATIC_ASSERT(Mode!=int(Isometry), THIS_METHOD_IS_ONLY_FOR_SPECIFIC_TRANSFORMATIONS) VectorType tmp = linear().col(0)*sy + linear().col(1); linear() << linear().col(0) + linear().col(1)*sx, tmp; return *this; } /** Applies on the left the shear transformation represented * by the vector \a other to \c *this and returns a reference to \c *this. * \warning 2D only. * \sa shear() */ template EIGEN_DEVICE_FUNC Transform& Transform::preshear(const Scalar& sx, const Scalar& sy) { EIGEN_STATIC_ASSERT(int(Dim)==2, YOU_MADE_A_PROGRAMMING_MISTAKE) EIGEN_STATIC_ASSERT(Mode!=int(Isometry), THIS_METHOD_IS_ONLY_FOR_SPECIFIC_TRANSFORMATIONS) m_matrix.template block(0,0) = LinearMatrixType(1, sx, sy, 1) * m_matrix.template block(0,0); return *this; } /****************************************************** *** Scaling, Translation and Rotation compatibility *** ******************************************************/ template EIGEN_DEVICE_FUNC inline Transform& Transform::operator=(const TranslationType& t) { linear().setIdentity(); translation() = t.vector(); makeAffine(); return *this; } template EIGEN_DEVICE_FUNC inline Transform Transform::operator*(const TranslationType& t) const { Transform res = *this; res.translate(t.vector()); return res; } template EIGEN_DEVICE_FUNC inline Transform& Transform::operator=(const UniformScaling& s) { m_matrix.setZero(); linear().diagonal().fill(s.factor()); makeAffine(); return *this; } template template EIGEN_DEVICE_FUNC inline Transform& Transform::operator=(const RotationBase& r) { linear() = internal::toRotationMatrix(r); translation().setZero(); makeAffine(); return *this; } template template EIGEN_DEVICE_FUNC inline Transform Transform::operator*(const RotationBase& r) const { Transform res = *this; res.rotate(r.derived()); return res; } /************************ *** Special functions *** ************************/ namespace internal { template struct transform_rotation_impl { template EIGEN_DEVICE_FUNC static inline const typename TransformType::LinearMatrixType run(const TransformType& t) { typedef typename TransformType::LinearMatrixType LinearMatrixType; LinearMatrixType result; t.computeRotationScaling(&result, (LinearMatrixType*)0); return result; } }; template<> struct transform_rotation_impl { template EIGEN_DEVICE_FUNC static inline typename TransformType::ConstLinearPart run(const TransformType& t) { return t.linear(); } }; } /** \returns the rotation part of the transformation * * If Mode==Isometry, then this method is an alias for linear(), * otherwise it calls computeRotationScaling() to extract the rotation * through a SVD decomposition. * * \svd_module * * \sa computeRotationScaling(), computeScalingRotation(), class SVD */ template EIGEN_DEVICE_FUNC typename Transform::RotationReturnType Transform::rotation() const { return internal::transform_rotation_impl::run(*this); } /** decomposes the linear part of the transformation as a product rotation x scaling, the scaling being * not necessarily positive. * * If either pointer is zero, the corresponding computation is skipped. * * * * \svd_module * * \sa computeScalingRotation(), rotation(), class SVD */ template template EIGEN_DEVICE_FUNC void Transform::computeRotationScaling(RotationMatrixType *rotation, ScalingMatrixType *scaling) const { // Note that JacobiSVD is faster than BDCSVD for small matrices. JacobiSVD svd(linear(), ComputeFullU | ComputeFullV); Scalar x = (svd.matrixU() * svd.matrixV().adjoint()).determinant() < Scalar(0) ? Scalar(-1) : Scalar(1); // so x has absolute value 1 VectorType sv(svd.singularValues()); sv.coeffRef(Dim-1) *= x; if(scaling) *scaling = svd.matrixV() * sv.asDiagonal() * svd.matrixV().adjoint(); if(rotation) { LinearMatrixType m(svd.matrixU()); m.col(Dim-1) *= x; *rotation = m * svd.matrixV().adjoint(); } } /** decomposes the linear part of the transformation as a product scaling x rotation, the scaling being * not necessarily positive. * * If either pointer is zero, the corresponding computation is skipped. * * * * \svd_module * * \sa computeRotationScaling(), rotation(), class SVD */ template template EIGEN_DEVICE_FUNC void Transform::computeScalingRotation(ScalingMatrixType *scaling, RotationMatrixType *rotation) const { // Note that JacobiSVD is faster than BDCSVD for small matrices. JacobiSVD svd(linear(), ComputeFullU | ComputeFullV); Scalar x = (svd.matrixU() * svd.matrixV().adjoint()).determinant() < Scalar(0) ? Scalar(-1) : Scalar(1); // so x has absolute value 1 VectorType sv(svd.singularValues()); sv.coeffRef(Dim-1) *= x; if(scaling) *scaling = svd.matrixU() * sv.asDiagonal() * svd.matrixU().adjoint(); if(rotation) { LinearMatrixType m(svd.matrixU()); m.col(Dim-1) *= x; *rotation = m * svd.matrixV().adjoint(); } } /** Convenient method to set \c *this from a position, orientation and scale * of a 3D object. */ template template EIGEN_DEVICE_FUNC Transform& Transform::fromPositionOrientationScale(const MatrixBase &position, const OrientationType& orientation, const MatrixBase &scale) { linear() = internal::toRotationMatrix(orientation); linear() *= scale.asDiagonal(); translation() = position; makeAffine(); return *this; } namespace internal { template struct transform_make_affine { template EIGEN_DEVICE_FUNC static void run(MatrixType &mat) { static const int Dim = MatrixType::ColsAtCompileTime-1; mat.template block<1,Dim>(Dim,0).setZero(); mat.coeffRef(Dim,Dim) = typename MatrixType::Scalar(1); } }; template<> struct transform_make_affine { template EIGEN_DEVICE_FUNC static void run(MatrixType &) { } }; // selector needed to avoid taking the inverse of a 3x4 matrix template struct projective_transform_inverse { EIGEN_DEVICE_FUNC static inline void run(const TransformType&, TransformType&) {} }; template struct projective_transform_inverse { EIGEN_DEVICE_FUNC static inline void run(const TransformType& m, TransformType& res) { res.matrix() = m.matrix().inverse(); } }; } // end namespace internal /** * * \returns the inverse transformation according to some given knowledge * on \c *this. * * \param hint allows to optimize the inversion process when the transformation * is known to be not a general transformation (optional). The possible values are: * - #Projective if the transformation is not necessarily affine, i.e., if the * last row is not guaranteed to be [0 ... 0 1] * - #Affine if the last row can be assumed to be [0 ... 0 1] * - #Isometry if the transformation is only a concatenations of translations * and rotations. * The default is the template class parameter \c Mode. * * \warning unless \a traits is always set to NoShear or NoScaling, this function * requires the generic inverse method of MatrixBase defined in the LU module. If * you forget to include this module, then you will get hard to debug linking errors. * * \sa MatrixBase::inverse() */ template EIGEN_DEVICE_FUNC Transform Transform::inverse(TransformTraits hint) const { Transform res; if (hint == Projective) { internal::projective_transform_inverse::run(*this, res); } else { if (hint == Isometry) { res.matrix().template topLeftCorner() = linear().transpose(); } else if(hint&Affine) { res.matrix().template topLeftCorner() = linear().inverse(); } else { eigen_assert(false && "Invalid transform traits in Transform::Inverse"); } // translation and remaining parts res.matrix().template topRightCorner() = - res.matrix().template topLeftCorner() * translation(); res.makeAffine(); // we do need this, because in the beginning res is uninitialized } return res; } namespace internal { /***************************************************** *** Specializations of take affine part *** *****************************************************/ template struct transform_take_affine_part { typedef typename TransformType::MatrixType MatrixType; typedef typename TransformType::AffinePart AffinePart; typedef typename TransformType::ConstAffinePart ConstAffinePart; static inline AffinePart run(MatrixType& m) { return m.template block(0,0); } static inline ConstAffinePart run(const MatrixType& m) { return m.template block(0,0); } }; template struct transform_take_affine_part > { typedef typename Transform::MatrixType MatrixType; static inline MatrixType& run(MatrixType& m) { return m; } static inline const MatrixType& run(const MatrixType& m) { return m; } }; /***************************************************** *** Specializations of construct from matrix *** *****************************************************/ template struct transform_construct_from_matrix { static inline void run(Transform *transform, const Other& other) { transform->linear() = other; transform->translation().setZero(); transform->makeAffine(); } }; template struct transform_construct_from_matrix { static inline void run(Transform *transform, const Other& other) { transform->affine() = other; transform->makeAffine(); } }; template struct transform_construct_from_matrix { static inline void run(Transform *transform, const Other& other) { transform->matrix() = other; } }; template struct transform_construct_from_matrix { static inline void run(Transform *transform, const Other& other) { transform->matrix() = other.template block(0,0); } }; /********************************************************** *** Specializations of operator* with rhs EigenBase *** **********************************************************/ template struct transform_product_result { enum { Mode = (LhsMode == (int)Projective || RhsMode == (int)Projective ) ? Projective : (LhsMode == (int)Affine || RhsMode == (int)Affine ) ? Affine : (LhsMode == (int)AffineCompact || RhsMode == (int)AffineCompact ) ? AffineCompact : (LhsMode == (int)Isometry || RhsMode == (int)Isometry ) ? Isometry : Projective }; }; template< typename TransformType, typename MatrixType, int RhsCols> struct transform_right_product_impl< TransformType, MatrixType, 0, RhsCols> { typedef typename MatrixType::PlainObject ResultType; static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ResultType run(const TransformType& T, const MatrixType& other) { return T.matrix() * other; } }; template< typename TransformType, typename MatrixType, int RhsCols> struct transform_right_product_impl< TransformType, MatrixType, 1, RhsCols> { enum { Dim = TransformType::Dim, HDim = TransformType::HDim, OtherRows = MatrixType::RowsAtCompileTime, OtherCols = MatrixType::ColsAtCompileTime }; typedef typename MatrixType::PlainObject ResultType; static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ResultType run(const TransformType& T, const MatrixType& other) { EIGEN_STATIC_ASSERT(OtherRows==HDim, YOU_MIXED_MATRICES_OF_DIFFERENT_SIZES); typedef Block TopLeftLhs; ResultType res(other.rows(),other.cols()); TopLeftLhs(res, 0, 0, Dim, other.cols()).noalias() = T.affine() * other; res.row(OtherRows-1) = other.row(OtherRows-1); return res; } }; template< typename TransformType, typename MatrixType, int RhsCols> struct transform_right_product_impl< TransformType, MatrixType, 2, RhsCols> { enum { Dim = TransformType::Dim, HDim = TransformType::HDim, OtherRows = MatrixType::RowsAtCompileTime, OtherCols = MatrixType::ColsAtCompileTime }; typedef typename MatrixType::PlainObject ResultType; static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ResultType run(const TransformType& T, const MatrixType& other) { EIGEN_STATIC_ASSERT(OtherRows==Dim, YOU_MIXED_MATRICES_OF_DIFFERENT_SIZES); typedef Block TopLeftLhs; ResultType res(Replicate(T.translation(),1,other.cols())); TopLeftLhs(res, 0, 0, Dim, other.cols()).noalias() += T.linear() * other; return res; } }; template< typename TransformType, typename MatrixType > struct transform_right_product_impl< TransformType, MatrixType, 2, 1> // rhs is a vector of size Dim { typedef typename TransformType::MatrixType TransformMatrix; enum { Dim = TransformType::Dim, HDim = TransformType::HDim, OtherRows = MatrixType::RowsAtCompileTime, WorkingRows = EIGEN_PLAIN_ENUM_MIN(TransformMatrix::RowsAtCompileTime,HDim) }; typedef typename MatrixType::PlainObject ResultType; static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ResultType run(const TransformType& T, const MatrixType& other) { EIGEN_STATIC_ASSERT(OtherRows==Dim, YOU_MIXED_MATRICES_OF_DIFFERENT_SIZES); Matrix rhs; rhs.template head() = other; rhs[Dim] = typename ResultType::Scalar(1); Matrix res(T.matrix() * rhs); return res.template head(); } }; /********************************************************** *** Specializations of operator* with lhs EigenBase *** **********************************************************/ // generic HDim x HDim matrix * T => Projective template struct transform_left_product_impl { typedef Transform TransformType; typedef typename TransformType::MatrixType MatrixType; typedef Transform ResultType; static ResultType run(const Other& other,const TransformType& tr) { return ResultType(other * tr.matrix()); } }; // generic HDim x HDim matrix * AffineCompact => Projective template struct transform_left_product_impl { typedef Transform TransformType; typedef typename TransformType::MatrixType MatrixType; typedef Transform ResultType; static ResultType run(const Other& other,const TransformType& tr) { ResultType res; res.matrix().noalias() = other.template block(0,0) * tr.matrix(); res.matrix().col(Dim) += other.col(Dim); return res; } }; // affine matrix * T template struct transform_left_product_impl { typedef Transform TransformType; typedef typename TransformType::MatrixType MatrixType; typedef TransformType ResultType; static ResultType run(const Other& other,const TransformType& tr) { ResultType res; res.affine().noalias() = other * tr.matrix(); res.matrix().row(Dim) = tr.matrix().row(Dim); return res; } }; // affine matrix * AffineCompact template struct transform_left_product_impl { typedef Transform TransformType; typedef typename TransformType::MatrixType MatrixType; typedef TransformType ResultType; static ResultType run(const Other& other,const TransformType& tr) { ResultType res; res.matrix().noalias() = other.template block(0,0) * tr.matrix(); res.translation() += other.col(Dim); return res; } }; // linear matrix * T template struct transform_left_product_impl { typedef Transform TransformType; typedef typename TransformType::MatrixType MatrixType; typedef TransformType ResultType; static ResultType run(const Other& other, const TransformType& tr) { TransformType res; if(Mode!=int(AffineCompact)) res.matrix().row(Dim) = tr.matrix().row(Dim); res.matrix().template topRows().noalias() = other * tr.matrix().template topRows(); return res; } }; /********************************************************** *** Specializations of operator* with another Transform *** **********************************************************/ template struct transform_transform_product_impl,Transform,false > { enum { ResultMode = transform_product_result::Mode }; typedef Transform Lhs; typedef Transform Rhs; typedef Transform ResultType; static ResultType run(const Lhs& lhs, const Rhs& rhs) { ResultType res; res.linear() = lhs.linear() * rhs.linear(); res.translation() = lhs.linear() * rhs.translation() + lhs.translation(); res.makeAffine(); return res; } }; template struct transform_transform_product_impl,Transform,true > { typedef Transform Lhs; typedef Transform Rhs; typedef Transform ResultType; static ResultType run(const Lhs& lhs, const Rhs& rhs) { return ResultType( lhs.matrix() * rhs.matrix() ); } }; template struct transform_transform_product_impl,Transform,true > { typedef Transform Lhs; typedef Transform Rhs; typedef Transform ResultType; static ResultType run(const Lhs& lhs, const Rhs& rhs) { ResultType res; res.matrix().template topRows() = lhs.matrix() * rhs.matrix(); res.matrix().row(Dim) = rhs.matrix().row(Dim); return res; } }; template struct transform_transform_product_impl,Transform,true > { typedef Transform Lhs; typedef Transform Rhs; typedef Transform ResultType; static ResultType run(const Lhs& lhs, const Rhs& rhs) { ResultType res(lhs.matrix().template leftCols() * rhs.matrix()); res.matrix().col(Dim) += lhs.matrix().col(Dim); return res; } }; } // end namespace internal } // end namespace Eigen #endif // EIGEN_TRANSFORM_H RcppEigen/inst/include/Eigen/src/Geometry/Translation.h0000644000176200001440000001676014562417221022624 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_TRANSLATION_H #define EIGEN_TRANSLATION_H namespace Eigen { /** \geometry_module \ingroup Geometry_Module * * \class Translation * * \brief Represents a translation transformation * * \tparam _Scalar the scalar type, i.e., the type of the coefficients. * \tparam _Dim the dimension of the space, can be a compile time value or Dynamic * * \note This class is not aimed to be used to store a translation transformation, * but rather to make easier the constructions and updates of Transform objects. * * \sa class Scaling, class Transform */ template class Translation { public: EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF_VECTORIZABLE_FIXED_SIZE(_Scalar,_Dim) /** dimension of the space */ enum { Dim = _Dim }; /** the scalar type of the coefficients */ typedef _Scalar Scalar; /** corresponding vector type */ typedef Matrix VectorType; /** corresponding linear transformation matrix type */ typedef Matrix LinearMatrixType; /** corresponding affine transformation type */ typedef Transform AffineTransformType; /** corresponding isometric transformation type */ typedef Transform IsometryTransformType; protected: VectorType m_coeffs; public: /** Default constructor without initialization. */ EIGEN_DEVICE_FUNC Translation() {} /** */ EIGEN_DEVICE_FUNC inline Translation(const Scalar& sx, const Scalar& sy) { eigen_assert(Dim==2); m_coeffs.x() = sx; m_coeffs.y() = sy; } /** */ EIGEN_DEVICE_FUNC inline Translation(const Scalar& sx, const Scalar& sy, const Scalar& sz) { eigen_assert(Dim==3); m_coeffs.x() = sx; m_coeffs.y() = sy; m_coeffs.z() = sz; } /** Constructs and initialize the translation transformation from a vector of translation coefficients */ EIGEN_DEVICE_FUNC explicit inline Translation(const VectorType& vector) : m_coeffs(vector) {} /** \brief Returns the x-translation by value. **/ EIGEN_DEVICE_FUNC inline Scalar x() const { return m_coeffs.x(); } /** \brief Returns the y-translation by value. **/ EIGEN_DEVICE_FUNC inline Scalar y() const { return m_coeffs.y(); } /** \brief Returns the z-translation by value. **/ EIGEN_DEVICE_FUNC inline Scalar z() const { return m_coeffs.z(); } /** \brief Returns the x-translation as a reference. **/ EIGEN_DEVICE_FUNC inline Scalar& x() { return m_coeffs.x(); } /** \brief Returns the y-translation as a reference. **/ EIGEN_DEVICE_FUNC inline Scalar& y() { return m_coeffs.y(); } /** \brief Returns the z-translation as a reference. **/ EIGEN_DEVICE_FUNC inline Scalar& z() { return m_coeffs.z(); } EIGEN_DEVICE_FUNC const VectorType& vector() const { return m_coeffs; } EIGEN_DEVICE_FUNC VectorType& vector() { return m_coeffs; } EIGEN_DEVICE_FUNC const VectorType& translation() const { return m_coeffs; } EIGEN_DEVICE_FUNC VectorType& translation() { return m_coeffs; } /** Concatenates two translation */ EIGEN_DEVICE_FUNC inline Translation operator* (const Translation& other) const { return Translation(m_coeffs + other.m_coeffs); } /** Concatenates a translation and a uniform scaling */ EIGEN_DEVICE_FUNC inline AffineTransformType operator* (const UniformScaling& other) const; /** Concatenates a translation and a linear transformation */ template EIGEN_DEVICE_FUNC inline AffineTransformType operator* (const EigenBase& linear) const; /** Concatenates a translation and a rotation */ template EIGEN_DEVICE_FUNC inline IsometryTransformType operator*(const RotationBase& r) const { return *this * IsometryTransformType(r); } /** \returns the concatenation of a linear transformation \a l with the translation \a t */ // its a nightmare to define a templated friend function outside its declaration template friend EIGEN_DEVICE_FUNC inline AffineTransformType operator*(const EigenBase& linear, const Translation& t) { AffineTransformType res; res.matrix().setZero(); res.linear() = linear.derived(); res.translation() = linear.derived() * t.m_coeffs; res.matrix().row(Dim).setZero(); res(Dim,Dim) = Scalar(1); return res; } /** Concatenates a translation and a transformation */ template EIGEN_DEVICE_FUNC inline Transform operator* (const Transform& t) const { Transform res = t; res.pretranslate(m_coeffs); return res; } /** Applies translation to vector */ template inline typename internal::enable_if::type operator* (const MatrixBase& vec) const { return m_coeffs + vec.derived(); } /** \returns the inverse translation (opposite) */ Translation inverse() const { return Translation(-m_coeffs); } static const Translation Identity() { return Translation(VectorType::Zero()); } /** \returns \c *this with scalar type casted to \a NewScalarType * * Note that if \a NewScalarType is equal to the current scalar type of \c *this * then this function smartly returns a const reference to \c *this. */ template EIGEN_DEVICE_FUNC inline typename internal::cast_return_type >::type cast() const { return typename internal::cast_return_type >::type(*this); } /** Copy constructor with scalar type conversion */ template EIGEN_DEVICE_FUNC inline explicit Translation(const Translation& other) { m_coeffs = other.vector().template cast(); } /** \returns \c true if \c *this is approximately equal to \a other, within the precision * determined by \a prec. * * \sa MatrixBase::isApprox() */ EIGEN_DEVICE_FUNC bool isApprox(const Translation& other, const typename NumTraits::Real& prec = NumTraits::dummy_precision()) const { return m_coeffs.isApprox(other.m_coeffs, prec); } }; /** \addtogroup Geometry_Module */ //@{ typedef Translation Translation2f; typedef Translation Translation2d; typedef Translation Translation3f; typedef Translation Translation3d; //@} template EIGEN_DEVICE_FUNC inline typename Translation::AffineTransformType Translation::operator* (const UniformScaling& other) const { AffineTransformType res; res.matrix().setZero(); res.linear().diagonal().fill(other.factor()); res.translation() = m_coeffs; res(Dim,Dim) = Scalar(1); return res; } template template EIGEN_DEVICE_FUNC inline typename Translation::AffineTransformType Translation::operator* (const EigenBase& linear) const { AffineTransformType res; res.matrix().setZero(); res.linear() = linear.derived(); res.translation() = m_coeffs; res.matrix().row(Dim).setZero(); res(Dim,Dim) = Scalar(1); return res; } } // end namespace Eigen #endif // EIGEN_TRANSLATION_H RcppEigen/inst/include/Eigen/src/Geometry/arch/0000755000176200001440000000000014562417221021060 5ustar liggesusersRcppEigen/inst/include/Eigen/src/Geometry/arch/Geometry_SIMD.h0000644000176200001440000001347114562417221023646 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009 Rohit Garg // Copyright (C) 2009-2010 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_GEOMETRY_SIMD_H #define EIGEN_GEOMETRY_SIMD_H namespace Eigen { namespace internal { template struct quat_product { enum { AAlignment = traits::Alignment, BAlignment = traits::Alignment, ResAlignment = traits >::Alignment }; static inline Quaternion run(const QuaternionBase& _a, const QuaternionBase& _b) { evaluator ae(_a.coeffs()); evaluator be(_b.coeffs()); Quaternion res; const float neg_zero = numext::bit_cast(0x80000000u); const float arr[4] = {0.f, 0.f, 0.f, neg_zero}; const Packet4f mask = ploadu(arr); Packet4f a = ae.template packet(0); Packet4f b = be.template packet(0); Packet4f s1 = pmul(vec4f_swizzle1(a,1,2,0,2),vec4f_swizzle1(b,2,0,1,2)); Packet4f s2 = pmul(vec4f_swizzle1(a,3,3,3,1),vec4f_swizzle1(b,0,1,2,1)); pstoret( &res.x(), padd(psub(pmul(a,vec4f_swizzle1(b,3,3,3,3)), pmul(vec4f_swizzle1(a,2,0,1,0), vec4f_swizzle1(b,1,2,0,0))), pxor(mask,padd(s1,s2)))); return res; } }; template struct quat_conj { enum { ResAlignment = traits >::Alignment }; static inline Quaternion run(const QuaternionBase& q) { evaluator qe(q.coeffs()); Quaternion res; const float neg_zero = numext::bit_cast(0x80000000u); const float arr[4] = {neg_zero, neg_zero, neg_zero,0.f}; const Packet4f mask = ploadu(arr); pstoret(&res.x(), pxor(mask, qe.template packet::Alignment,Packet4f>(0))); return res; } }; template struct cross3_impl { enum { ResAlignment = traits::type>::Alignment }; static inline typename plain_matrix_type::type run(const VectorLhs& lhs, const VectorRhs& rhs) { evaluator lhs_eval(lhs); evaluator rhs_eval(rhs); Packet4f a = lhs_eval.template packet::Alignment,Packet4f>(0); Packet4f b = rhs_eval.template packet::Alignment,Packet4f>(0); Packet4f mul1 = pmul(vec4f_swizzle1(a,1,2,0,3),vec4f_swizzle1(b,2,0,1,3)); Packet4f mul2 = pmul(vec4f_swizzle1(a,2,0,1,3),vec4f_swizzle1(b,1,2,0,3)); typename plain_matrix_type::type res; pstoret(&res.x(),psub(mul1,mul2)); return res; } }; #if (defined EIGEN_VECTORIZE_SSE) || (EIGEN_ARCH_ARM64) template struct quat_product { enum { BAlignment = traits::Alignment, ResAlignment = traits >::Alignment }; static inline Quaternion run(const QuaternionBase& _a, const QuaternionBase& _b) { Quaternion res; evaluator ae(_a.coeffs()); evaluator be(_b.coeffs()); const double* a = _a.coeffs().data(); Packet2d b_xy = be.template packet(0); Packet2d b_zw = be.template packet(2); Packet2d a_xx = pset1(a[0]); Packet2d a_yy = pset1(a[1]); Packet2d a_zz = pset1(a[2]); Packet2d a_ww = pset1(a[3]); // two temporaries: Packet2d t1, t2; /* * t1 = ww*xy + yy*zw * t2 = zz*xy - xx*zw * res.xy = t1 +/- swap(t2) */ t1 = padd(pmul(a_ww, b_xy), pmul(a_yy, b_zw)); t2 = psub(pmul(a_zz, b_xy), pmul(a_xx, b_zw)); pstoret(&res.x(), paddsub(t1, preverse(t2))); /* * t1 = ww*zw - yy*xy * t2 = zz*zw + xx*xy * res.zw = t1 -/+ swap(t2) = swap( swap(t1) +/- t2) */ t1 = psub(pmul(a_ww, b_zw), pmul(a_yy, b_xy)); t2 = padd(pmul(a_zz, b_zw), pmul(a_xx, b_xy)); pstoret(&res.z(), preverse(paddsub(preverse(t1), t2))); return res; } }; template struct quat_conj { enum { ResAlignment = traits >::Alignment }; static inline Quaternion run(const QuaternionBase& q) { evaluator qe(q.coeffs()); Quaternion res; const double neg_zero = numext::bit_cast(0x8000000000000000ull); const double arr1[2] = {neg_zero, neg_zero}; const double arr2[2] = {neg_zero, 0.0}; const Packet2d mask0 = ploadu(arr1); const Packet2d mask2 = ploadu(arr2); pstoret(&res.x(), pxor(mask0, qe.template packet::Alignment,Packet2d>(0))); pstoret(&res.z(), pxor(mask2, qe.template packet::Alignment,Packet2d>(2))); return res; } }; #endif // end EIGEN_VECTORIZE_SSE_OR_EIGEN_ARCH_ARM64 } // end namespace internal } // end namespace Eigen #endif // EIGEN_GEOMETRY_SIMD_H RcppEigen/inst/include/Eigen/src/Geometry/Homogeneous.h0000644000176200001440000005036614562417221022616 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009-2010 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_HOMOGENEOUS_H #define EIGEN_HOMOGENEOUS_H namespace Eigen { /** \geometry_module \ingroup Geometry_Module * * \class Homogeneous * * \brief Expression of one (or a set of) homogeneous vector(s) * * \param MatrixType the type of the object in which we are making homogeneous * * This class represents an expression of one (or a set of) homogeneous vector(s). * It is the return type of MatrixBase::homogeneous() and most of the time * this is the only way it is used. * * \sa MatrixBase::homogeneous() */ namespace internal { template struct traits > : traits { typedef typename traits::StorageKind StorageKind; typedef typename ref_selector::type MatrixTypeNested; typedef typename remove_reference::type _MatrixTypeNested; enum { RowsPlusOne = (MatrixType::RowsAtCompileTime != Dynamic) ? int(MatrixType::RowsAtCompileTime) + 1 : Dynamic, ColsPlusOne = (MatrixType::ColsAtCompileTime != Dynamic) ? int(MatrixType::ColsAtCompileTime) + 1 : Dynamic, RowsAtCompileTime = Direction==Vertical ? RowsPlusOne : MatrixType::RowsAtCompileTime, ColsAtCompileTime = Direction==Horizontal ? ColsPlusOne : MatrixType::ColsAtCompileTime, MaxRowsAtCompileTime = RowsAtCompileTime, MaxColsAtCompileTime = ColsAtCompileTime, TmpFlags = _MatrixTypeNested::Flags & HereditaryBits, Flags = ColsAtCompileTime==1 ? (TmpFlags & ~RowMajorBit) : RowsAtCompileTime==1 ? (TmpFlags | RowMajorBit) : TmpFlags }; }; template struct homogeneous_left_product_impl; template struct homogeneous_right_product_impl; } // end namespace internal template class Homogeneous : public MatrixBase >, internal::no_assignment_operator { public: typedef MatrixType NestedExpression; enum { Direction = _Direction }; typedef MatrixBase Base; EIGEN_DENSE_PUBLIC_INTERFACE(Homogeneous) EIGEN_DEVICE_FUNC explicit inline Homogeneous(const MatrixType& matrix) : m_matrix(matrix) {} EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index rows() const EIGEN_NOEXCEPT { return m_matrix.rows() + (int(Direction)==Vertical ? 1 : 0); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index cols() const EIGEN_NOEXCEPT { return m_matrix.cols() + (int(Direction)==Horizontal ? 1 : 0); } EIGEN_DEVICE_FUNC const NestedExpression& nestedExpression() const { return m_matrix; } template EIGEN_DEVICE_FUNC inline const Product operator* (const MatrixBase& rhs) const { eigen_assert(int(Direction)==Horizontal); return Product(*this,rhs.derived()); } template friend EIGEN_DEVICE_FUNC inline const Product operator* (const MatrixBase& lhs, const Homogeneous& rhs) { eigen_assert(int(Direction)==Vertical); return Product(lhs.derived(),rhs); } template friend EIGEN_DEVICE_FUNC inline const Product, Homogeneous > operator* (const Transform& lhs, const Homogeneous& rhs) { eigen_assert(int(Direction)==Vertical); return Product, Homogeneous>(lhs,rhs); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename internal::result_of::type redux(const Func& func) const { return func(m_matrix.redux(func), Scalar(1)); } protected: typename MatrixType::Nested m_matrix; }; /** \geometry_module \ingroup Geometry_Module * * \returns a vector expression that is one longer than the vector argument, with the value 1 symbolically appended as the last coefficient. * * This can be used to convert affine coordinates to homogeneous coordinates. * * \only_for_vectors * * Example: \include MatrixBase_homogeneous.cpp * Output: \verbinclude MatrixBase_homogeneous.out * * \sa VectorwiseOp::homogeneous(), class Homogeneous */ template EIGEN_DEVICE_FUNC inline typename MatrixBase::HomogeneousReturnType MatrixBase::homogeneous() const { EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived); return HomogeneousReturnType(derived()); } /** \geometry_module \ingroup Geometry_Module * * \returns an expression where the value 1 is symbolically appended as the final coefficient to each column (or row) of the matrix. * * This can be used to convert affine coordinates to homogeneous coordinates. * * Example: \include VectorwiseOp_homogeneous.cpp * Output: \verbinclude VectorwiseOp_homogeneous.out * * \sa MatrixBase::homogeneous(), class Homogeneous */ template EIGEN_DEVICE_FUNC inline Homogeneous VectorwiseOp::homogeneous() const { return HomogeneousReturnType(_expression()); } /** \geometry_module \ingroup Geometry_Module * * \brief homogeneous normalization * * \returns a vector expression of the N-1 first coefficients of \c *this divided by that last coefficient. * * This can be used to convert homogeneous coordinates to affine coordinates. * * It is essentially a shortcut for: * \code this->head(this->size()-1)/this->coeff(this->size()-1); \endcode * * Example: \include MatrixBase_hnormalized.cpp * Output: \verbinclude MatrixBase_hnormalized.out * * \sa VectorwiseOp::hnormalized() */ template EIGEN_DEVICE_FUNC inline const typename MatrixBase::HNormalizedReturnType MatrixBase::hnormalized() const { EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived); return ConstStartMinusOne(derived(),0,0, ColsAtCompileTime==1?size()-1:1, ColsAtCompileTime==1?1:size()-1) / coeff(size()-1); } /** \geometry_module \ingroup Geometry_Module * * \brief column or row-wise homogeneous normalization * * \returns an expression of the first N-1 coefficients of each column (or row) of \c *this divided by the last coefficient of each column (or row). * * This can be used to convert homogeneous coordinates to affine coordinates. * * It is conceptually equivalent to calling MatrixBase::hnormalized() to each column (or row) of \c *this. * * Example: \include DirectionWise_hnormalized.cpp * Output: \verbinclude DirectionWise_hnormalized.out * * \sa MatrixBase::hnormalized() */ template EIGEN_DEVICE_FUNC inline const typename VectorwiseOp::HNormalizedReturnType VectorwiseOp::hnormalized() const { return HNormalized_Block(_expression(),0,0, Direction==Vertical ? _expression().rows()-1 : _expression().rows(), Direction==Horizontal ? _expression().cols()-1 : _expression().cols()).cwiseQuotient( Replicate (HNormalized_Factors(_expression(), Direction==Vertical ? _expression().rows()-1:0, Direction==Horizontal ? _expression().cols()-1:0, Direction==Vertical ? 1 : _expression().rows(), Direction==Horizontal ? 1 : _expression().cols()), Direction==Vertical ? _expression().rows()-1 : 1, Direction==Horizontal ? _expression().cols()-1 : 1)); } namespace internal { template struct take_matrix_for_product { typedef MatrixOrTransformType type; EIGEN_DEVICE_FUNC static const type& run(const type &x) { return x; } }; template struct take_matrix_for_product > { typedef Transform TransformType; typedef typename internal::add_const::type type; EIGEN_DEVICE_FUNC static type run (const TransformType& x) { return x.affine(); } }; template struct take_matrix_for_product > { typedef Transform TransformType; typedef typename TransformType::MatrixType type; EIGEN_DEVICE_FUNC static const type& run (const TransformType& x) { return x.matrix(); } }; template struct traits,Lhs> > { typedef typename take_matrix_for_product::type LhsMatrixType; typedef typename remove_all::type MatrixTypeCleaned; typedef typename remove_all::type LhsMatrixTypeCleaned; typedef typename make_proper_matrix_type< typename traits::Scalar, LhsMatrixTypeCleaned::RowsAtCompileTime, MatrixTypeCleaned::ColsAtCompileTime, MatrixTypeCleaned::PlainObject::Options, LhsMatrixTypeCleaned::MaxRowsAtCompileTime, MatrixTypeCleaned::MaxColsAtCompileTime>::type ReturnType; }; template struct homogeneous_left_product_impl,Lhs> : public ReturnByValue,Lhs> > { typedef typename traits::LhsMatrixType LhsMatrixType; typedef typename remove_all::type LhsMatrixTypeCleaned; typedef typename remove_all::type LhsMatrixTypeNested; EIGEN_DEVICE_FUNC homogeneous_left_product_impl(const Lhs& lhs, const MatrixType& rhs) : m_lhs(take_matrix_for_product::run(lhs)), m_rhs(rhs) {} EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index rows() const EIGEN_NOEXCEPT { return m_lhs.rows(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index cols() const EIGEN_NOEXCEPT { return m_rhs.cols(); } template EIGEN_DEVICE_FUNC void evalTo(Dest& dst) const { // FIXME investigate how to allow lazy evaluation of this product when possible dst = Block (m_lhs,0,0,m_lhs.rows(),m_lhs.cols()-1) * m_rhs; dst += m_lhs.col(m_lhs.cols()-1).rowwise() .template replicate(m_rhs.cols()); } typename LhsMatrixTypeCleaned::Nested m_lhs; typename MatrixType::Nested m_rhs; }; template struct traits,Rhs> > { typedef typename make_proper_matrix_type::Scalar, MatrixType::RowsAtCompileTime, Rhs::ColsAtCompileTime, MatrixType::PlainObject::Options, MatrixType::MaxRowsAtCompileTime, Rhs::MaxColsAtCompileTime>::type ReturnType; }; template struct homogeneous_right_product_impl,Rhs> : public ReturnByValue,Rhs> > { typedef typename remove_all::type RhsNested; EIGEN_DEVICE_FUNC homogeneous_right_product_impl(const MatrixType& lhs, const Rhs& rhs) : m_lhs(lhs), m_rhs(rhs) {} EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index rows() const EIGEN_NOEXCEPT { return m_lhs.rows(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index cols() const EIGEN_NOEXCEPT { return m_rhs.cols(); } template EIGEN_DEVICE_FUNC void evalTo(Dest& dst) const { // FIXME investigate how to allow lazy evaluation of this product when possible dst = m_lhs * Block (m_rhs,0,0,m_rhs.rows()-1,m_rhs.cols()); dst += m_rhs.row(m_rhs.rows()-1).colwise() .template replicate(m_lhs.rows()); } typename MatrixType::Nested m_lhs; typename Rhs::Nested m_rhs; }; template struct evaluator_traits > { typedef typename storage_kind_to_evaluator_kind::Kind Kind; typedef HomogeneousShape Shape; }; template<> struct AssignmentKind { typedef Dense2Dense Kind; }; template struct unary_evaluator, IndexBased> : evaluator::PlainObject > { typedef Homogeneous XprType; typedef typename XprType::PlainObject PlainObject; typedef evaluator Base; EIGEN_DEVICE_FUNC explicit unary_evaluator(const XprType& op) : Base(), m_temp(op) { ::new (static_cast(this)) Base(m_temp); } protected: PlainObject m_temp; }; // dense = homogeneous template< typename DstXprType, typename ArgType, typename Scalar> struct Assignment, internal::assign_op, Dense2Dense> { typedef Homogeneous SrcXprType; EIGEN_DEVICE_FUNC static void run(DstXprType &dst, const SrcXprType &src, const internal::assign_op &) { Index dstRows = src.rows(); Index dstCols = src.cols(); if((dst.rows()!=dstRows) || (dst.cols()!=dstCols)) dst.resize(dstRows, dstCols); dst.template topRows(src.nestedExpression().rows()) = src.nestedExpression(); dst.row(dst.rows()-1).setOnes(); } }; // dense = homogeneous template< typename DstXprType, typename ArgType, typename Scalar> struct Assignment, internal::assign_op, Dense2Dense> { typedef Homogeneous SrcXprType; EIGEN_DEVICE_FUNC static void run(DstXprType &dst, const SrcXprType &src, const internal::assign_op &) { Index dstRows = src.rows(); Index dstCols = src.cols(); if((dst.rows()!=dstRows) || (dst.cols()!=dstCols)) dst.resize(dstRows, dstCols); dst.template leftCols(src.nestedExpression().cols()) = src.nestedExpression(); dst.col(dst.cols()-1).setOnes(); } }; template struct generic_product_impl, Rhs, HomogeneousShape, DenseShape, ProductTag> { template EIGEN_DEVICE_FUNC static void evalTo(Dest& dst, const Homogeneous& lhs, const Rhs& rhs) { homogeneous_right_product_impl, Rhs>(lhs.nestedExpression(), rhs).evalTo(dst); } }; template struct homogeneous_right_product_refactoring_helper { enum { Dim = Lhs::ColsAtCompileTime, Rows = Lhs::RowsAtCompileTime }; typedef typename Rhs::template ConstNRowsBlockXpr::Type LinearBlockConst; typedef typename remove_const::type LinearBlock; typedef typename Rhs::ConstRowXpr ConstantColumn; typedef Replicate ConstantBlock; typedef Product LinearProduct; typedef CwiseBinaryOp, const LinearProduct, const ConstantBlock> Xpr; }; template struct product_evaluator, ProductTag, HomogeneousShape, DenseShape> : public evaluator::Xpr> { typedef Product XprType; typedef homogeneous_right_product_refactoring_helper helper; typedef typename helper::ConstantBlock ConstantBlock; typedef typename helper::Xpr RefactoredXpr; typedef evaluator Base; EIGEN_DEVICE_FUNC explicit product_evaluator(const XprType& xpr) : Base( xpr.lhs().nestedExpression() .lazyProduct( xpr.rhs().template topRows(xpr.lhs().nestedExpression().cols()) ) + ConstantBlock(xpr.rhs().row(xpr.rhs().rows()-1),xpr.lhs().rows(), 1) ) {} }; template struct generic_product_impl, DenseShape, HomogeneousShape, ProductTag> { template EIGEN_DEVICE_FUNC static void evalTo(Dest& dst, const Lhs& lhs, const Homogeneous& rhs) { homogeneous_left_product_impl, Lhs>(lhs, rhs.nestedExpression()).evalTo(dst); } }; // TODO: the following specialization is to address a regression from 3.2 to 3.3 // In the future, this path should be optimized. template struct generic_product_impl, TriangularShape, HomogeneousShape, ProductTag> { template static void evalTo(Dest& dst, const Lhs& lhs, const Homogeneous& rhs) { dst.noalias() = lhs * rhs.eval(); } }; template struct homogeneous_left_product_refactoring_helper { enum { Dim = Rhs::RowsAtCompileTime, Cols = Rhs::ColsAtCompileTime }; typedef typename Lhs::template ConstNColsBlockXpr::Type LinearBlockConst; typedef typename remove_const::type LinearBlock; typedef typename Lhs::ConstColXpr ConstantColumn; typedef Replicate ConstantBlock; typedef Product LinearProduct; typedef CwiseBinaryOp, const LinearProduct, const ConstantBlock> Xpr; }; template struct product_evaluator, ProductTag, DenseShape, HomogeneousShape> : public evaluator::Xpr> { typedef Product XprType; typedef homogeneous_left_product_refactoring_helper helper; typedef typename helper::ConstantBlock ConstantBlock; typedef typename helper::Xpr RefactoredXpr; typedef evaluator Base; EIGEN_DEVICE_FUNC explicit product_evaluator(const XprType& xpr) : Base( xpr.lhs().template leftCols(xpr.rhs().nestedExpression().rows()) .lazyProduct( xpr.rhs().nestedExpression() ) + ConstantBlock(xpr.lhs().col(xpr.lhs().cols()-1),1,xpr.rhs().cols()) ) {} }; template struct generic_product_impl, Homogeneous, DenseShape, HomogeneousShape, ProductTag> { typedef Transform TransformType; template EIGEN_DEVICE_FUNC static void evalTo(Dest& dst, const TransformType& lhs, const Homogeneous& rhs) { homogeneous_left_product_impl, TransformType>(lhs, rhs.nestedExpression()).evalTo(dst); } }; template struct permutation_matrix_product : public permutation_matrix_product {}; } // end namespace internal } // end namespace Eigen #endif // EIGEN_HOMOGENEOUS_H RcppEigen/inst/include/Eigen/src/Geometry/EulerAngles.h0000644000176200001440000000705014562417221022524 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_EULERANGLES_H #define EIGEN_EULERANGLES_H namespace Eigen { /** \geometry_module \ingroup Geometry_Module * * * \returns the Euler-angles of the rotation matrix \c *this using the convention defined by the triplet (\a a0,\a a1,\a a2) * * Each of the three parameters \a a0,\a a1,\a a2 represents the respective rotation axis as an integer in {0,1,2}. * For instance, in: * \code Vector3f ea = mat.eulerAngles(2, 0, 2); \endcode * "2" represents the z axis and "0" the x axis, etc. The returned angles are such that * we have the following equality: * \code * mat == AngleAxisf(ea[0], Vector3f::UnitZ()) * * AngleAxisf(ea[1], Vector3f::UnitX()) * * AngleAxisf(ea[2], Vector3f::UnitZ()); \endcode * This corresponds to the right-multiply conventions (with right hand side frames). * * The returned angles are in the ranges [0:pi]x[-pi:pi]x[-pi:pi]. * * \sa class AngleAxis */ template EIGEN_DEVICE_FUNC inline Matrix::Scalar,3,1> MatrixBase::eulerAngles(Index a0, Index a1, Index a2) const { EIGEN_USING_STD(atan2) EIGEN_USING_STD(sin) EIGEN_USING_STD(cos) /* Implemented from Graphics Gems IV */ EIGEN_STATIC_ASSERT_MATRIX_SPECIFIC_SIZE(Derived,3,3) Matrix res; typedef Matrix Vector2; const Index odd = ((a0+1)%3 == a1) ? 0 : 1; const Index i = a0; const Index j = (a0 + 1 + odd)%3; const Index k = (a0 + 2 - odd)%3; if (a0==a2) { res[0] = atan2(coeff(j,i), coeff(k,i)); if((odd && res[0]Scalar(0))) { if(res[0] > Scalar(0)) { res[0] -= Scalar(EIGEN_PI); } else { res[0] += Scalar(EIGEN_PI); } Scalar s2 = Vector2(coeff(j,i), coeff(k,i)).norm(); res[1] = -atan2(s2, coeff(i,i)); } else { Scalar s2 = Vector2(coeff(j,i), coeff(k,i)).norm(); res[1] = atan2(s2, coeff(i,i)); } // With a=(0,1,0), we have i=0; j=1; k=2, and after computing the first two angles, // we can compute their respective rotation, and apply its inverse to M. Since the result must // be a rotation around x, we have: // // c2 s1.s2 c1.s2 1 0 0 // 0 c1 -s1 * M = 0 c3 s3 // -s2 s1.c2 c1.c2 0 -s3 c3 // // Thus: m11.c1 - m21.s1 = c3 & m12.c1 - m22.s1 = s3 Scalar s1 = sin(res[0]); Scalar c1 = cos(res[0]); res[2] = atan2(c1*coeff(j,k)-s1*coeff(k,k), c1*coeff(j,j) - s1 * coeff(k,j)); } else { res[0] = atan2(coeff(j,k), coeff(k,k)); Scalar c2 = Vector2(coeff(i,i), coeff(i,j)).norm(); if((odd && res[0]Scalar(0))) { if(res[0] > Scalar(0)) { res[0] -= Scalar(EIGEN_PI); } else { res[0] += Scalar(EIGEN_PI); } res[1] = atan2(-coeff(i,k), -c2); } else res[1] = atan2(-coeff(i,k), c2); Scalar s1 = sin(res[0]); Scalar c1 = cos(res[0]); res[2] = atan2(s1*coeff(k,i)-c1*coeff(j,i), c1*coeff(j,j) - s1 * coeff(k,j)); } if (!odd) res = -res; return res; } } // end namespace Eigen #endif // EIGEN_EULERANGLES_H RcppEigen/inst/include/Eigen/src/Geometry/AlignedBox.h0000644000176200001440000004477314562417221022347 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. // Function void Eigen::AlignedBox::transform(const Transform& transform) // is provided under the following license agreement: // // Software License Agreement (BSD License) // // Copyright (c) 2011-2014, Willow Garage, Inc. // Copyright (c) 2014-2015, Open Source Robotics Foundation // All rights reserved. // // Redistribution and use in source and binary forms, with or without // modification, are permitted provided that the following conditions // are met: // // * Redistributions of source code must retain the above copyright // notice, this list of conditions and the following disclaimer. // * Redistributions in binary form must reproduce the above // copyright notice, this list of conditions and the following // disclaimer in the documentation and/or other materials provided // with the distribution. // * Neither the name of Open Source Robotics Foundation nor the names of its // contributors may be used to endorse or promote products derived // from this software without specific prior written permission. // // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS // FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE // COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, // INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, // BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; // LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER // CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT // LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN // ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE // POSSIBILITY OF SUCH DAMAGE. #ifndef EIGEN_ALIGNEDBOX_H #define EIGEN_ALIGNEDBOX_H namespace Eigen { /** \geometry_module \ingroup Geometry_Module * * * \class AlignedBox * * \brief An axis aligned box * * \tparam _Scalar the type of the scalar coefficients * \tparam _AmbientDim the dimension of the ambient space, can be a compile time value or Dynamic. * * This class represents an axis aligned box as a pair of the minimal and maximal corners. * \warning The result of most methods is undefined when applied to an empty box. You can check for empty boxes using isEmpty(). * \sa alignedboxtypedefs */ template class AlignedBox { public: EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF_VECTORIZABLE_FIXED_SIZE(_Scalar,_AmbientDim) enum { AmbientDimAtCompileTime = _AmbientDim }; typedef _Scalar Scalar; typedef NumTraits ScalarTraits; typedef Eigen::Index Index; ///< \deprecated since Eigen 3.3 typedef typename ScalarTraits::Real RealScalar; typedef typename ScalarTraits::NonInteger NonInteger; typedef Matrix VectorType; typedef CwiseBinaryOp, const VectorType, const VectorType> VectorTypeSum; /** Define constants to name the corners of a 1D, 2D or 3D axis aligned bounding box */ enum CornerType { /** 1D names @{ */ Min=0, Max=1, /** @} */ /** Identifier for 2D corner @{ */ BottomLeft=0, BottomRight=1, TopLeft=2, TopRight=3, /** @} */ /** Identifier for 3D corner @{ */ BottomLeftFloor=0, BottomRightFloor=1, TopLeftFloor=2, TopRightFloor=3, BottomLeftCeil=4, BottomRightCeil=5, TopLeftCeil=6, TopRightCeil=7 /** @} */ }; /** Default constructor initializing a null box. */ EIGEN_DEVICE_FUNC inline AlignedBox() { if (EIGEN_CONST_CONDITIONAL(AmbientDimAtCompileTime!=Dynamic)) setEmpty(); } /** Constructs a null box with \a _dim the dimension of the ambient space. */ EIGEN_DEVICE_FUNC inline explicit AlignedBox(Index _dim) : m_min(_dim), m_max(_dim) { setEmpty(); } /** Constructs a box with extremities \a _min and \a _max. * \warning If either component of \a _min is larger than the same component of \a _max, the constructed box is empty. */ template EIGEN_DEVICE_FUNC inline AlignedBox(const OtherVectorType1& _min, const OtherVectorType2& _max) : m_min(_min), m_max(_max) {} /** Constructs a box containing a single point \a p. */ template EIGEN_DEVICE_FUNC inline explicit AlignedBox(const MatrixBase& p) : m_min(p), m_max(m_min) { } EIGEN_DEVICE_FUNC ~AlignedBox() {} /** \returns the dimension in which the box holds */ EIGEN_DEVICE_FUNC inline Index dim() const { return AmbientDimAtCompileTime==Dynamic ? m_min.size() : Index(AmbientDimAtCompileTime); } /** \deprecated use isEmpty() */ EIGEN_DEVICE_FUNC inline bool isNull() const { return isEmpty(); } /** \deprecated use setEmpty() */ EIGEN_DEVICE_FUNC inline void setNull() { setEmpty(); } /** \returns true if the box is empty. * \sa setEmpty */ EIGEN_DEVICE_FUNC inline bool isEmpty() const { return (m_min.array() > m_max.array()).any(); } /** Makes \c *this an empty box. * \sa isEmpty */ EIGEN_DEVICE_FUNC inline void setEmpty() { m_min.setConstant( ScalarTraits::highest() ); m_max.setConstant( ScalarTraits::lowest() ); } /** \returns the minimal corner */ EIGEN_DEVICE_FUNC inline const VectorType& (min)() const { return m_min; } /** \returns a non const reference to the minimal corner */ EIGEN_DEVICE_FUNC inline VectorType& (min)() { return m_min; } /** \returns the maximal corner */ EIGEN_DEVICE_FUNC inline const VectorType& (max)() const { return m_max; } /** \returns a non const reference to the maximal corner */ EIGEN_DEVICE_FUNC inline VectorType& (max)() { return m_max; } /** \returns the center of the box */ EIGEN_DEVICE_FUNC inline const EIGEN_EXPR_BINARYOP_SCALAR_RETURN_TYPE(VectorTypeSum, RealScalar, quotient) center() const { return (m_min+m_max)/RealScalar(2); } /** \returns the lengths of the sides of the bounding box. * Note that this function does not get the same * result for integral or floating scalar types: see */ EIGEN_DEVICE_FUNC inline const CwiseBinaryOp< internal::scalar_difference_op, const VectorType, const VectorType> sizes() const { return m_max - m_min; } /** \returns the volume of the bounding box */ EIGEN_DEVICE_FUNC inline Scalar volume() const { return sizes().prod(); } /** \returns an expression for the bounding box diagonal vector * if the length of the diagonal is needed: diagonal().norm() * will provide it. */ EIGEN_DEVICE_FUNC inline CwiseBinaryOp< internal::scalar_difference_op, const VectorType, const VectorType> diagonal() const { return sizes(); } /** \returns the vertex of the bounding box at the corner defined by * the corner-id corner. It works only for a 1D, 2D or 3D bounding box. * For 1D bounding boxes corners are named by 2 enum constants: * BottomLeft and BottomRight. * For 2D bounding boxes, corners are named by 4 enum constants: * BottomLeft, BottomRight, TopLeft, TopRight. * For 3D bounding boxes, the following names are added: * BottomLeftCeil, BottomRightCeil, TopLeftCeil, TopRightCeil. */ EIGEN_DEVICE_FUNC inline VectorType corner(CornerType corner) const { EIGEN_STATIC_ASSERT(_AmbientDim <= 3, THIS_METHOD_IS_ONLY_FOR_VECTORS_OF_A_SPECIFIC_SIZE); VectorType res; Index mult = 1; for(Index d=0; d(Scalar(0), Scalar(1)); } else r[d] = internal::random(m_min[d], m_max[d]); } return r; } /** \returns true if the point \a p is inside the box \c *this. */ template EIGEN_DEVICE_FUNC inline bool contains(const MatrixBase& p) const { typename internal::nested_eval::type p_n(p.derived()); return (m_min.array()<=p_n.array()).all() && (p_n.array()<=m_max.array()).all(); } /** \returns true if the box \a b is entirely inside the box \c *this. */ EIGEN_DEVICE_FUNC inline bool contains(const AlignedBox& b) const { return (m_min.array()<=(b.min)().array()).all() && ((b.max)().array()<=m_max.array()).all(); } /** \returns true if the box \a b is intersecting the box \c *this. * \sa intersection, clamp */ EIGEN_DEVICE_FUNC inline bool intersects(const AlignedBox& b) const { return (m_min.array()<=(b.max)().array()).all() && ((b.min)().array()<=m_max.array()).all(); } /** Extends \c *this such that it contains the point \a p and returns a reference to \c *this. * \sa extend(const AlignedBox&) */ template EIGEN_DEVICE_FUNC inline AlignedBox& extend(const MatrixBase& p) { typename internal::nested_eval::type p_n(p.derived()); m_min = m_min.cwiseMin(p_n); m_max = m_max.cwiseMax(p_n); return *this; } /** Extends \c *this such that it contains the box \a b and returns a reference to \c *this. * \sa merged, extend(const MatrixBase&) */ EIGEN_DEVICE_FUNC inline AlignedBox& extend(const AlignedBox& b) { m_min = m_min.cwiseMin(b.m_min); m_max = m_max.cwiseMax(b.m_max); return *this; } /** Clamps \c *this by the box \a b and returns a reference to \c *this. * \note If the boxes don't intersect, the resulting box is empty. * \sa intersection(), intersects() */ EIGEN_DEVICE_FUNC inline AlignedBox& clamp(const AlignedBox& b) { m_min = m_min.cwiseMax(b.m_min); m_max = m_max.cwiseMin(b.m_max); return *this; } /** Returns an AlignedBox that is the intersection of \a b and \c *this * \note If the boxes don't intersect, the resulting box is empty. * \sa intersects(), clamp, contains() */ EIGEN_DEVICE_FUNC inline AlignedBox intersection(const AlignedBox& b) const {return AlignedBox(m_min.cwiseMax(b.m_min), m_max.cwiseMin(b.m_max)); } /** Returns an AlignedBox that is the union of \a b and \c *this. * \note Merging with an empty box may result in a box bigger than \c *this. * \sa extend(const AlignedBox&) */ EIGEN_DEVICE_FUNC inline AlignedBox merged(const AlignedBox& b) const { return AlignedBox(m_min.cwiseMin(b.m_min), m_max.cwiseMax(b.m_max)); } /** Translate \c *this by the vector \a t and returns a reference to \c *this. */ template EIGEN_DEVICE_FUNC inline AlignedBox& translate(const MatrixBase& a_t) { const typename internal::nested_eval::type t(a_t.derived()); m_min += t; m_max += t; return *this; } /** \returns a copy of \c *this translated by the vector \a t. */ template EIGEN_DEVICE_FUNC inline AlignedBox translated(const MatrixBase& a_t) const { AlignedBox result(m_min, m_max); result.translate(a_t); return result; } /** \returns the squared distance between the point \a p and the box \c *this, * and zero if \a p is inside the box. * \sa exteriorDistance(const MatrixBase&), squaredExteriorDistance(const AlignedBox&) */ template EIGEN_DEVICE_FUNC inline Scalar squaredExteriorDistance(const MatrixBase& p) const; /** \returns the squared distance between the boxes \a b and \c *this, * and zero if the boxes intersect. * \sa exteriorDistance(const AlignedBox&), squaredExteriorDistance(const MatrixBase&) */ EIGEN_DEVICE_FUNC inline Scalar squaredExteriorDistance(const AlignedBox& b) const; /** \returns the distance between the point \a p and the box \c *this, * and zero if \a p is inside the box. * \sa squaredExteriorDistance(const MatrixBase&), exteriorDistance(const AlignedBox&) */ template EIGEN_DEVICE_FUNC inline NonInteger exteriorDistance(const MatrixBase& p) const { EIGEN_USING_STD(sqrt) return sqrt(NonInteger(squaredExteriorDistance(p))); } /** \returns the distance between the boxes \a b and \c *this, * and zero if the boxes intersect. * \sa squaredExteriorDistance(const AlignedBox&), exteriorDistance(const MatrixBase&) */ EIGEN_DEVICE_FUNC inline NonInteger exteriorDistance(const AlignedBox& b) const { EIGEN_USING_STD(sqrt) return sqrt(NonInteger(squaredExteriorDistance(b))); } /** * Specialization of transform for pure translation. */ template EIGEN_DEVICE_FUNC inline void transform( const typename Transform::TranslationType& translation) { this->translate(translation); } /** * Transforms this box by \a transform and recomputes it to * still be an axis-aligned box. * * \note This method is provided under BSD license (see the top of this file). */ template EIGEN_DEVICE_FUNC inline void transform(const Transform& transform) { // Only Affine and Isometry transforms are currently supported. EIGEN_STATIC_ASSERT(Mode == Affine || Mode == AffineCompact || Mode == Isometry, THIS_METHOD_IS_ONLY_FOR_SPECIFIC_TRANSFORMATIONS); // Method adapted from FCL src/shape/geometric_shapes_utility.cpp#computeBV(...) // https://github.com/flexible-collision-library/fcl/blob/fcl-0.4/src/shape/geometric_shapes_utility.cpp#L292 // // Here's a nice explanation why it works: https://zeuxcg.org/2010/10/17/aabb-from-obb-with-component-wise-abs/ // two times rotated extent const VectorType rotated_extent_2 = transform.linear().cwiseAbs() * sizes(); // two times new center const VectorType rotated_center_2 = transform.linear() * (this->m_max + this->m_min) + Scalar(2) * transform.translation(); this->m_max = (rotated_center_2 + rotated_extent_2) / Scalar(2); this->m_min = (rotated_center_2 - rotated_extent_2) / Scalar(2); } /** * \returns a copy of \c *this transformed by \a transform and recomputed to * still be an axis-aligned box. */ template EIGEN_DEVICE_FUNC AlignedBox transformed(const Transform& transform) const { AlignedBox result(m_min, m_max); result.transform(transform); return result; } /** \returns \c *this with scalar type casted to \a NewScalarType * * Note that if \a NewScalarType is equal to the current scalar type of \c *this * then this function smartly returns a const reference to \c *this. */ template EIGEN_DEVICE_FUNC inline typename internal::cast_return_type >::type cast() const { return typename internal::cast_return_type >::type(*this); } /** Copy constructor with scalar type conversion */ template EIGEN_DEVICE_FUNC inline explicit AlignedBox(const AlignedBox& other) { m_min = (other.min)().template cast(); m_max = (other.max)().template cast(); } /** \returns \c true if \c *this is approximately equal to \a other, within the precision * determined by \a prec. * * \sa MatrixBase::isApprox() */ EIGEN_DEVICE_FUNC bool isApprox(const AlignedBox& other, const RealScalar& prec = ScalarTraits::dummy_precision()) const { return m_min.isApprox(other.m_min, prec) && m_max.isApprox(other.m_max, prec); } protected: VectorType m_min, m_max; }; template template EIGEN_DEVICE_FUNC inline Scalar AlignedBox::squaredExteriorDistance(const MatrixBase& a_p) const { typename internal::nested_eval::type p(a_p.derived()); Scalar dist2(0); Scalar aux; for (Index k=0; k p[k] ) { aux = m_min[k] - p[k]; dist2 += aux*aux; } else if( p[k] > m_max[k] ) { aux = p[k] - m_max[k]; dist2 += aux*aux; } } return dist2; } template EIGEN_DEVICE_FUNC inline Scalar AlignedBox::squaredExteriorDistance(const AlignedBox& b) const { Scalar dist2(0); Scalar aux; for (Index k=0; k b.m_max[k] ) { aux = m_min[k] - b.m_max[k]; dist2 += aux*aux; } else if( b.m_min[k] > m_max[k] ) { aux = b.m_min[k] - m_max[k]; dist2 += aux*aux; } } return dist2; } /** \defgroup alignedboxtypedefs Global aligned box typedefs * * \ingroup Geometry_Module * * Eigen defines several typedef shortcuts for most common aligned box types. * * The general patterns are the following: * * \c AlignedBoxSizeType where \c Size can be \c 1, \c 2,\c 3,\c 4 for fixed size boxes or \c X for dynamic size, * and where \c Type can be \c i for integer, \c f for float, \c d for double. * * For example, \c AlignedBox3d is a fixed-size 3x3 aligned box type of doubles, and \c AlignedBoxXf is a dynamic-size aligned box of floats. * * \sa class AlignedBox */ #define EIGEN_MAKE_TYPEDEFS(Type, TypeSuffix, Size, SizeSuffix) \ /** \ingroup alignedboxtypedefs */ \ typedef AlignedBox AlignedBox##SizeSuffix##TypeSuffix; #define EIGEN_MAKE_TYPEDEFS_ALL_SIZES(Type, TypeSuffix) \ EIGEN_MAKE_TYPEDEFS(Type, TypeSuffix, 1, 1) \ EIGEN_MAKE_TYPEDEFS(Type, TypeSuffix, 2, 2) \ EIGEN_MAKE_TYPEDEFS(Type, TypeSuffix, 3, 3) \ EIGEN_MAKE_TYPEDEFS(Type, TypeSuffix, 4, 4) \ EIGEN_MAKE_TYPEDEFS(Type, TypeSuffix, Dynamic, X) EIGEN_MAKE_TYPEDEFS_ALL_SIZES(int, i) EIGEN_MAKE_TYPEDEFS_ALL_SIZES(float, f) EIGEN_MAKE_TYPEDEFS_ALL_SIZES(double, d) #undef EIGEN_MAKE_TYPEDEFS_ALL_SIZES #undef EIGEN_MAKE_TYPEDEFS } // end namespace Eigen #endif // EIGEN_ALIGNEDBOX_H RcppEigen/inst/include/Eigen/src/Geometry/AngleAxis.h0000644000176200001440000002032314562417221022167 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_ANGLEAXIS_H #define EIGEN_ANGLEAXIS_H namespace Eigen { /** \geometry_module \ingroup Geometry_Module * * \class AngleAxis * * \brief Represents a 3D rotation as a rotation angle around an arbitrary 3D axis * * \param _Scalar the scalar type, i.e., the type of the coefficients. * * \warning When setting up an AngleAxis object, the axis vector \b must \b be \b normalized. * * The following two typedefs are provided for convenience: * \li \c AngleAxisf for \c float * \li \c AngleAxisd for \c double * * Combined with MatrixBase::Unit{X,Y,Z}, AngleAxis can be used to easily * mimic Euler-angles. Here is an example: * \include AngleAxis_mimic_euler.cpp * Output: \verbinclude AngleAxis_mimic_euler.out * * \note This class is not aimed to be used to store a rotation transformation, * but rather to make easier the creation of other rotation (Quaternion, rotation Matrix) * and transformation objects. * * \sa class Quaternion, class Transform, MatrixBase::UnitX() */ namespace internal { template struct traits > { typedef _Scalar Scalar; }; } template class AngleAxis : public RotationBase,3> { typedef RotationBase,3> Base; public: using Base::operator*; enum { Dim = 3 }; /** the scalar type of the coefficients */ typedef _Scalar Scalar; typedef Matrix Matrix3; typedef Matrix Vector3; typedef Quaternion QuaternionType; protected: Vector3 m_axis; Scalar m_angle; public: /** Default constructor without initialization. */ EIGEN_DEVICE_FUNC AngleAxis() {} /** Constructs and initialize the angle-axis rotation from an \a angle in radian * and an \a axis which \b must \b be \b normalized. * * \warning If the \a axis vector is not normalized, then the angle-axis object * represents an invalid rotation. */ template EIGEN_DEVICE_FUNC inline AngleAxis(const Scalar& angle, const MatrixBase& axis) : m_axis(axis), m_angle(angle) {} /** Constructs and initialize the angle-axis rotation from a quaternion \a q. * This function implicitly normalizes the quaternion \a q. */ template EIGEN_DEVICE_FUNC inline explicit AngleAxis(const QuaternionBase& q) { *this = q; } /** Constructs and initialize the angle-axis rotation from a 3x3 rotation matrix. */ template EIGEN_DEVICE_FUNC inline explicit AngleAxis(const MatrixBase& m) { *this = m; } /** \returns the value of the rotation angle in radian */ EIGEN_DEVICE_FUNC Scalar angle() const { return m_angle; } /** \returns a read-write reference to the stored angle in radian */ EIGEN_DEVICE_FUNC Scalar& angle() { return m_angle; } /** \returns the rotation axis */ EIGEN_DEVICE_FUNC const Vector3& axis() const { return m_axis; } /** \returns a read-write reference to the stored rotation axis. * * \warning The rotation axis must remain a \b unit vector. */ EIGEN_DEVICE_FUNC Vector3& axis() { return m_axis; } /** Concatenates two rotations */ EIGEN_DEVICE_FUNC inline QuaternionType operator* (const AngleAxis& other) const { return QuaternionType(*this) * QuaternionType(other); } /** Concatenates two rotations */ EIGEN_DEVICE_FUNC inline QuaternionType operator* (const QuaternionType& other) const { return QuaternionType(*this) * other; } /** Concatenates two rotations */ friend EIGEN_DEVICE_FUNC inline QuaternionType operator* (const QuaternionType& a, const AngleAxis& b) { return a * QuaternionType(b); } /** \returns the inverse rotation, i.e., an angle-axis with opposite rotation angle */ EIGEN_DEVICE_FUNC AngleAxis inverse() const { return AngleAxis(-m_angle, m_axis); } template EIGEN_DEVICE_FUNC AngleAxis& operator=(const QuaternionBase& q); template EIGEN_DEVICE_FUNC AngleAxis& operator=(const MatrixBase& m); template EIGEN_DEVICE_FUNC AngleAxis& fromRotationMatrix(const MatrixBase& m); EIGEN_DEVICE_FUNC Matrix3 toRotationMatrix(void) const; /** \returns \c *this with scalar type casted to \a NewScalarType * * Note that if \a NewScalarType is equal to the current scalar type of \c *this * then this function smartly returns a const reference to \c *this. */ template EIGEN_DEVICE_FUNC inline typename internal::cast_return_type >::type cast() const { return typename internal::cast_return_type >::type(*this); } /** Copy constructor with scalar type conversion */ template EIGEN_DEVICE_FUNC inline explicit AngleAxis(const AngleAxis& other) { m_axis = other.axis().template cast(); m_angle = Scalar(other.angle()); } EIGEN_DEVICE_FUNC static inline const AngleAxis Identity() { return AngleAxis(Scalar(0), Vector3::UnitX()); } /** \returns \c true if \c *this is approximately equal to \a other, within the precision * determined by \a prec. * * \sa MatrixBase::isApprox() */ EIGEN_DEVICE_FUNC bool isApprox(const AngleAxis& other, const typename NumTraits::Real& prec = NumTraits::dummy_precision()) const { return m_axis.isApprox(other.m_axis, prec) && internal::isApprox(m_angle,other.m_angle, prec); } }; /** \ingroup Geometry_Module * single precision angle-axis type */ typedef AngleAxis AngleAxisf; /** \ingroup Geometry_Module * double precision angle-axis type */ typedef AngleAxis AngleAxisd; /** Set \c *this from a \b unit quaternion. * * The resulting axis is normalized, and the computed angle is in the [0,pi] range. * * This function implicitly normalizes the quaternion \a q. */ template template EIGEN_DEVICE_FUNC AngleAxis& AngleAxis::operator=(const QuaternionBase& q) { EIGEN_USING_STD(atan2) EIGEN_USING_STD(abs) Scalar n = q.vec().norm(); if(n::epsilon()) n = q.vec().stableNorm(); if (n != Scalar(0)) { m_angle = Scalar(2)*atan2(n, abs(q.w())); if(q.w() < Scalar(0)) n = -n; m_axis = q.vec() / n; } else { m_angle = Scalar(0); m_axis << Scalar(1), Scalar(0), Scalar(0); } return *this; } /** Set \c *this from a 3x3 rotation matrix \a mat. */ template template EIGEN_DEVICE_FUNC AngleAxis& AngleAxis::operator=(const MatrixBase& mat) { // Since a direct conversion would not be really faster, // let's use the robust Quaternion implementation: return *this = QuaternionType(mat); } /** * \brief Sets \c *this from a 3x3 rotation matrix. **/ template template EIGEN_DEVICE_FUNC AngleAxis& AngleAxis::fromRotationMatrix(const MatrixBase& mat) { return *this = QuaternionType(mat); } /** Constructs and \returns an equivalent 3x3 rotation matrix. */ template typename AngleAxis::Matrix3 EIGEN_DEVICE_FUNC AngleAxis::toRotationMatrix(void) const { EIGEN_USING_STD(sin) EIGEN_USING_STD(cos) Matrix3 res; Vector3 sin_axis = sin(m_angle) * m_axis; Scalar c = cos(m_angle); Vector3 cos1_axis = (Scalar(1)-c) * m_axis; Scalar tmp; tmp = cos1_axis.x() * m_axis.y(); res.coeffRef(0,1) = tmp - sin_axis.z(); res.coeffRef(1,0) = tmp + sin_axis.z(); tmp = cos1_axis.x() * m_axis.z(); res.coeffRef(0,2) = tmp + sin_axis.y(); res.coeffRef(2,0) = tmp - sin_axis.y(); tmp = cos1_axis.y() * m_axis.z(); res.coeffRef(1,2) = tmp - sin_axis.x(); res.coeffRef(2,1) = tmp + sin_axis.x(); res.diagonal() = (cos1_axis.cwiseProduct(m_axis)).array() + c; return res; } } // end namespace Eigen #endif // EIGEN_ANGLEAXIS_H RcppEigen/inst/include/Eigen/src/Geometry/ParametrizedLine.h0000644000176200001440000002312414562417221023555 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008 Gael Guennebaud // Copyright (C) 2008 Benoit Jacob // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_PARAMETRIZEDLINE_H #define EIGEN_PARAMETRIZEDLINE_H namespace Eigen { /** \geometry_module \ingroup Geometry_Module * * \class ParametrizedLine * * \brief A parametrized line * * A parametrized line is defined by an origin point \f$ \mathbf{o} \f$ and a unit * direction vector \f$ \mathbf{d} \f$ such that the line corresponds to * the set \f$ l(t) = \mathbf{o} + t \mathbf{d} \f$, \f$ t \in \mathbf{R} \f$. * * \tparam _Scalar the scalar type, i.e., the type of the coefficients * \tparam _AmbientDim the dimension of the ambient space, can be a compile time value or Dynamic. */ template class ParametrizedLine { public: EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF_VECTORIZABLE_FIXED_SIZE(_Scalar,_AmbientDim) enum { AmbientDimAtCompileTime = _AmbientDim, Options = _Options }; typedef _Scalar Scalar; typedef typename NumTraits::Real RealScalar; typedef Eigen::Index Index; ///< \deprecated since Eigen 3.3 typedef Matrix VectorType; /** Default constructor without initialization */ EIGEN_DEVICE_FUNC inline ParametrizedLine() {} template EIGEN_DEVICE_FUNC ParametrizedLine(const ParametrizedLine& other) : m_origin(other.origin()), m_direction(other.direction()) {} /** Constructs a dynamic-size line with \a _dim the dimension * of the ambient space */ EIGEN_DEVICE_FUNC inline explicit ParametrizedLine(Index _dim) : m_origin(_dim), m_direction(_dim) {} /** Initializes a parametrized line of direction \a direction and origin \a origin. * \warning the vector direction is assumed to be normalized. */ EIGEN_DEVICE_FUNC ParametrizedLine(const VectorType& origin, const VectorType& direction) : m_origin(origin), m_direction(direction) {} template EIGEN_DEVICE_FUNC explicit ParametrizedLine(const Hyperplane<_Scalar, _AmbientDim, OtherOptions>& hyperplane); /** Constructs a parametrized line going from \a p0 to \a p1. */ EIGEN_DEVICE_FUNC static inline ParametrizedLine Through(const VectorType& p0, const VectorType& p1) { return ParametrizedLine(p0, (p1-p0).normalized()); } EIGEN_DEVICE_FUNC ~ParametrizedLine() {} /** \returns the dimension in which the line holds */ EIGEN_DEVICE_FUNC inline Index dim() const { return m_direction.size(); } EIGEN_DEVICE_FUNC const VectorType& origin() const { return m_origin; } EIGEN_DEVICE_FUNC VectorType& origin() { return m_origin; } EIGEN_DEVICE_FUNC const VectorType& direction() const { return m_direction; } EIGEN_DEVICE_FUNC VectorType& direction() { return m_direction; } /** \returns the squared distance of a point \a p to its projection onto the line \c *this. * \sa distance() */ EIGEN_DEVICE_FUNC RealScalar squaredDistance(const VectorType& p) const { VectorType diff = p - origin(); return (diff - direction().dot(diff) * direction()).squaredNorm(); } /** \returns the distance of a point \a p to its projection onto the line \c *this. * \sa squaredDistance() */ EIGEN_DEVICE_FUNC RealScalar distance(const VectorType& p) const { EIGEN_USING_STD(sqrt) return sqrt(squaredDistance(p)); } /** \returns the projection of a point \a p onto the line \c *this. */ EIGEN_DEVICE_FUNC VectorType projection(const VectorType& p) const { return origin() + direction().dot(p-origin()) * direction(); } EIGEN_DEVICE_FUNC VectorType pointAt(const Scalar& t) const; template EIGEN_DEVICE_FUNC Scalar intersectionParameter(const Hyperplane<_Scalar, _AmbientDim, OtherOptions>& hyperplane) const; template EIGEN_DEVICE_FUNC Scalar intersection(const Hyperplane<_Scalar, _AmbientDim, OtherOptions>& hyperplane) const; template EIGEN_DEVICE_FUNC VectorType intersectionPoint(const Hyperplane<_Scalar, _AmbientDim, OtherOptions>& hyperplane) const; /** Applies the transformation matrix \a mat to \c *this and returns a reference to \c *this. * * \param mat the Dim x Dim transformation matrix * \param traits specifies whether the matrix \a mat represents an #Isometry * or a more generic #Affine transformation. The default is #Affine. */ template EIGEN_DEVICE_FUNC inline ParametrizedLine& transform(const MatrixBase& mat, TransformTraits traits = Affine) { if (traits==Affine) direction() = (mat * direction()).normalized(); else if (traits==Isometry) direction() = mat * direction(); else { eigen_assert(0 && "invalid traits value in ParametrizedLine::transform()"); } origin() = mat * origin(); return *this; } /** Applies the transformation \a t to \c *this and returns a reference to \c *this. * * \param t the transformation of dimension Dim * \param traits specifies whether the transformation \a t represents an #Isometry * or a more generic #Affine transformation. The default is #Affine. * Other kind of transformations are not supported. */ template EIGEN_DEVICE_FUNC inline ParametrizedLine& transform(const Transform& t, TransformTraits traits = Affine) { transform(t.linear(), traits); origin() += t.translation(); return *this; } /** \returns \c *this with scalar type casted to \a NewScalarType * * Note that if \a NewScalarType is equal to the current scalar type of \c *this * then this function smartly returns a const reference to \c *this. */ template EIGEN_DEVICE_FUNC inline typename internal::cast_return_type >::type cast() const { return typename internal::cast_return_type >::type(*this); } /** Copy constructor with scalar type conversion */ template EIGEN_DEVICE_FUNC inline explicit ParametrizedLine(const ParametrizedLine& other) { m_origin = other.origin().template cast(); m_direction = other.direction().template cast(); } /** \returns \c true if \c *this is approximately equal to \a other, within the precision * determined by \a prec. * * \sa MatrixBase::isApprox() */ EIGEN_DEVICE_FUNC bool isApprox(const ParametrizedLine& other, const typename NumTraits::Real& prec = NumTraits::dummy_precision()) const { return m_origin.isApprox(other.m_origin, prec) && m_direction.isApprox(other.m_direction, prec); } protected: VectorType m_origin, m_direction; }; /** Constructs a parametrized line from a 2D hyperplane * * \warning the ambient space must have dimension 2 such that the hyperplane actually describes a line */ template template EIGEN_DEVICE_FUNC inline ParametrizedLine<_Scalar, _AmbientDim,_Options>::ParametrizedLine(const Hyperplane<_Scalar, _AmbientDim,OtherOptions>& hyperplane) { EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(VectorType, 2) direction() = hyperplane.normal().unitOrthogonal(); origin() = -hyperplane.normal()*hyperplane.offset(); } /** \returns the point at \a t along this line */ template EIGEN_DEVICE_FUNC inline typename ParametrizedLine<_Scalar, _AmbientDim,_Options>::VectorType ParametrizedLine<_Scalar, _AmbientDim,_Options>::pointAt(const _Scalar& t) const { return origin() + (direction()*t); } /** \returns the parameter value of the intersection between \c *this and the given \a hyperplane */ template template EIGEN_DEVICE_FUNC inline _Scalar ParametrizedLine<_Scalar, _AmbientDim,_Options>::intersectionParameter(const Hyperplane<_Scalar, _AmbientDim, OtherOptions>& hyperplane) const { return -(hyperplane.offset()+hyperplane.normal().dot(origin())) / hyperplane.normal().dot(direction()); } /** \deprecated use intersectionParameter() * \returns the parameter value of the intersection between \c *this and the given \a hyperplane */ template template EIGEN_DEVICE_FUNC inline _Scalar ParametrizedLine<_Scalar, _AmbientDim,_Options>::intersection(const Hyperplane<_Scalar, _AmbientDim, OtherOptions>& hyperplane) const { return intersectionParameter(hyperplane); } /** \returns the point of the intersection between \c *this and the given hyperplane */ template template EIGEN_DEVICE_FUNC inline typename ParametrizedLine<_Scalar, _AmbientDim,_Options>::VectorType ParametrizedLine<_Scalar, _AmbientDim,_Options>::intersectionPoint(const Hyperplane<_Scalar, _AmbientDim, OtherOptions>& hyperplane) const { return pointAt(intersectionParameter(hyperplane)); } } // end namespace Eigen #endif // EIGEN_PARAMETRIZEDLINE_H RcppEigen/inst/include/Eigen/src/Geometry/Scaling.h0000644000176200001440000001510414562417221021675 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SCALING_H #define EIGEN_SCALING_H namespace Eigen { /** \geometry_module \ingroup Geometry_Module * * \class UniformScaling * * \brief Represents a generic uniform scaling transformation * * \tparam _Scalar the scalar type, i.e., the type of the coefficients. * * This class represent a uniform scaling transformation. It is the return * type of Scaling(Scalar), and most of the time this is the only way it * is used. In particular, this class is not aimed to be used to store a scaling transformation, * but rather to make easier the constructions and updates of Transform objects. * * To represent an axis aligned scaling, use the DiagonalMatrix class. * * \sa Scaling(), class DiagonalMatrix, MatrixBase::asDiagonal(), class Translation, class Transform */ namespace internal { // This helper helps nvcc+MSVC to properly parse this file. // See bug 1412. template struct uniformscaling_times_affine_returntype { enum { NewMode = int(Mode) == int(Isometry) ? Affine : Mode }; typedef Transform type; }; } template class UniformScaling { public: /** the scalar type of the coefficients */ typedef _Scalar Scalar; protected: Scalar m_factor; public: /** Default constructor without initialization. */ UniformScaling() {} /** Constructs and initialize a uniform scaling transformation */ explicit inline UniformScaling(const Scalar& s) : m_factor(s) {} inline const Scalar& factor() const { return m_factor; } inline Scalar& factor() { return m_factor; } /** Concatenates two uniform scaling */ inline UniformScaling operator* (const UniformScaling& other) const { return UniformScaling(m_factor * other.factor()); } /** Concatenates a uniform scaling and a translation */ template inline Transform operator* (const Translation& t) const; /** Concatenates a uniform scaling and an affine transformation */ template inline typename internal::uniformscaling_times_affine_returntype::type operator* (const Transform& t) const { typename internal::uniformscaling_times_affine_returntype::type res = t; res.prescale(factor()); return res; } /** Concatenates a uniform scaling and a linear transformation matrix */ // TODO returns an expression template inline typename Eigen::internal::plain_matrix_type::type operator* (const MatrixBase& other) const { return other * m_factor; } template inline Matrix operator*(const RotationBase& r) const { return r.toRotationMatrix() * m_factor; } /** \returns the inverse scaling */ inline UniformScaling inverse() const { return UniformScaling(Scalar(1)/m_factor); } /** \returns \c *this with scalar type casted to \a NewScalarType * * Note that if \a NewScalarType is equal to the current scalar type of \c *this * then this function smartly returns a const reference to \c *this. */ template inline UniformScaling cast() const { return UniformScaling(NewScalarType(m_factor)); } /** Copy constructor with scalar type conversion */ template inline explicit UniformScaling(const UniformScaling& other) { m_factor = Scalar(other.factor()); } /** \returns \c true if \c *this is approximately equal to \a other, within the precision * determined by \a prec. * * \sa MatrixBase::isApprox() */ bool isApprox(const UniformScaling& other, const typename NumTraits::Real& prec = NumTraits::dummy_precision()) const { return internal::isApprox(m_factor, other.factor(), prec); } }; /** \addtogroup Geometry_Module */ //@{ /** Concatenates a linear transformation matrix and a uniform scaling * \relates UniformScaling */ // NOTE this operator is defined in MatrixBase and not as a friend function // of UniformScaling to fix an internal crash of Intel's ICC template EIGEN_EXPR_BINARYOP_SCALAR_RETURN_TYPE(Derived,Scalar,product) operator*(const MatrixBase& matrix, const UniformScaling& s) { return matrix.derived() * s.factor(); } /** Constructs a uniform scaling from scale factor \a s */ inline UniformScaling Scaling(float s) { return UniformScaling(s); } /** Constructs a uniform scaling from scale factor \a s */ inline UniformScaling Scaling(double s) { return UniformScaling(s); } /** Constructs a uniform scaling from scale factor \a s */ template inline UniformScaling > Scaling(const std::complex& s) { return UniformScaling >(s); } /** Constructs a 2D axis aligned scaling */ template inline DiagonalMatrix Scaling(const Scalar& sx, const Scalar& sy) { return DiagonalMatrix(sx, sy); } /** Constructs a 3D axis aligned scaling */ template inline DiagonalMatrix Scaling(const Scalar& sx, const Scalar& sy, const Scalar& sz) { return DiagonalMatrix(sx, sy, sz); } /** Constructs an axis aligned scaling expression from vector expression \a coeffs * This is an alias for coeffs.asDiagonal() */ template inline const DiagonalWrapper Scaling(const MatrixBase& coeffs) { return coeffs.asDiagonal(); } /** \deprecated */ typedef DiagonalMatrix AlignedScaling2f; /** \deprecated */ typedef DiagonalMatrix AlignedScaling2d; /** \deprecated */ typedef DiagonalMatrix AlignedScaling3f; /** \deprecated */ typedef DiagonalMatrix AlignedScaling3d; //@} template template inline Transform UniformScaling::operator* (const Translation& t) const { Transform res; res.matrix().setZero(); res.linear().diagonal().fill(factor()); res.translation() = factor() * t.vector(); res(Dim,Dim) = Scalar(1); return res; } } // end namespace Eigen #endif // EIGEN_SCALING_H RcppEigen/inst/include/Eigen/src/Geometry/Quaternion.h0000644000176200001440000010307714562417221022451 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2010 Gael Guennebaud // Copyright (C) 2009 Mathieu Gautier // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_QUATERNION_H #define EIGEN_QUATERNION_H namespace Eigen { /*************************************************************************** * Definition of QuaternionBase * The implementation is at the end of the file ***************************************************************************/ namespace internal { template struct quaternionbase_assign_impl; } /** \geometry_module \ingroup Geometry_Module * \class QuaternionBase * \brief Base class for quaternion expressions * \tparam Derived derived type (CRTP) * \sa class Quaternion */ template class QuaternionBase : public RotationBase { public: typedef RotationBase Base; using Base::operator*; using Base::derived; typedef typename internal::traits::Scalar Scalar; typedef typename NumTraits::Real RealScalar; typedef typename internal::traits::Coefficients Coefficients; typedef typename Coefficients::CoeffReturnType CoeffReturnType; typedef typename internal::conditional::Flags&LvalueBit), Scalar&, CoeffReturnType>::type NonConstCoeffReturnType; enum { Flags = Eigen::internal::traits::Flags }; // typedef typename Matrix Coefficients; /** the type of a 3D vector */ typedef Matrix Vector3; /** the equivalent rotation matrix type */ typedef Matrix Matrix3; /** the equivalent angle-axis type */ typedef AngleAxis AngleAxisType; /** \returns the \c x coefficient */ EIGEN_DEVICE_FUNC inline CoeffReturnType x() const { return this->derived().coeffs().coeff(0); } /** \returns the \c y coefficient */ EIGEN_DEVICE_FUNC inline CoeffReturnType y() const { return this->derived().coeffs().coeff(1); } /** \returns the \c z coefficient */ EIGEN_DEVICE_FUNC inline CoeffReturnType z() const { return this->derived().coeffs().coeff(2); } /** \returns the \c w coefficient */ EIGEN_DEVICE_FUNC inline CoeffReturnType w() const { return this->derived().coeffs().coeff(3); } /** \returns a reference to the \c x coefficient (if Derived is a non-const lvalue) */ EIGEN_DEVICE_FUNC inline NonConstCoeffReturnType x() { return this->derived().coeffs().x(); } /** \returns a reference to the \c y coefficient (if Derived is a non-const lvalue) */ EIGEN_DEVICE_FUNC inline NonConstCoeffReturnType y() { return this->derived().coeffs().y(); } /** \returns a reference to the \c z coefficient (if Derived is a non-const lvalue) */ EIGEN_DEVICE_FUNC inline NonConstCoeffReturnType z() { return this->derived().coeffs().z(); } /** \returns a reference to the \c w coefficient (if Derived is a non-const lvalue) */ EIGEN_DEVICE_FUNC inline NonConstCoeffReturnType w() { return this->derived().coeffs().w(); } /** \returns a read-only vector expression of the imaginary part (x,y,z) */ EIGEN_DEVICE_FUNC inline const VectorBlock vec() const { return coeffs().template head<3>(); } /** \returns a vector expression of the imaginary part (x,y,z) */ EIGEN_DEVICE_FUNC inline VectorBlock vec() { return coeffs().template head<3>(); } /** \returns a read-only vector expression of the coefficients (x,y,z,w) */ EIGEN_DEVICE_FUNC inline const typename internal::traits::Coefficients& coeffs() const { return derived().coeffs(); } /** \returns a vector expression of the coefficients (x,y,z,w) */ EIGEN_DEVICE_FUNC inline typename internal::traits::Coefficients& coeffs() { return derived().coeffs(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE QuaternionBase& operator=(const QuaternionBase& other); template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& operator=(const QuaternionBase& other); // disabled this copy operator as it is giving very strange compilation errors when compiling // test_stdvector with GCC 4.4.2. This looks like a GCC bug though, so feel free to re-enable it if it's // useful; however notice that we already have the templated operator= above and e.g. in MatrixBase // we didn't have to add, in addition to templated operator=, such a non-templated copy operator. // Derived& operator=(const QuaternionBase& other) // { return operator=(other); } EIGEN_DEVICE_FUNC Derived& operator=(const AngleAxisType& aa); template EIGEN_DEVICE_FUNC Derived& operator=(const MatrixBase& m); /** \returns a quaternion representing an identity rotation * \sa MatrixBase::Identity() */ EIGEN_DEVICE_FUNC static inline Quaternion Identity() { return Quaternion(Scalar(1), Scalar(0), Scalar(0), Scalar(0)); } /** \sa QuaternionBase::Identity(), MatrixBase::setIdentity() */ EIGEN_DEVICE_FUNC inline QuaternionBase& setIdentity() { coeffs() << Scalar(0), Scalar(0), Scalar(0), Scalar(1); return *this; } /** \returns the squared norm of the quaternion's coefficients * \sa QuaternionBase::norm(), MatrixBase::squaredNorm() */ EIGEN_DEVICE_FUNC inline Scalar squaredNorm() const { return coeffs().squaredNorm(); } /** \returns the norm of the quaternion's coefficients * \sa QuaternionBase::squaredNorm(), MatrixBase::norm() */ EIGEN_DEVICE_FUNC inline Scalar norm() const { return coeffs().norm(); } /** Normalizes the quaternion \c *this * \sa normalized(), MatrixBase::normalize() */ EIGEN_DEVICE_FUNC inline void normalize() { coeffs().normalize(); } /** \returns a normalized copy of \c *this * \sa normalize(), MatrixBase::normalized() */ EIGEN_DEVICE_FUNC inline Quaternion normalized() const { return Quaternion(coeffs().normalized()); } /** \returns the dot product of \c *this and \a other * Geometrically speaking, the dot product of two unit quaternions * corresponds to the cosine of half the angle between the two rotations. * \sa angularDistance() */ template EIGEN_DEVICE_FUNC inline Scalar dot(const QuaternionBase& other) const { return coeffs().dot(other.coeffs()); } template EIGEN_DEVICE_FUNC Scalar angularDistance(const QuaternionBase& other) const; /** \returns an equivalent 3x3 rotation matrix */ EIGEN_DEVICE_FUNC inline Matrix3 toRotationMatrix() const; /** \returns the quaternion which transform \a a into \a b through a rotation */ template EIGEN_DEVICE_FUNC Derived& setFromTwoVectors(const MatrixBase& a, const MatrixBase& b); template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Quaternion operator* (const QuaternionBase& q) const; template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& operator*= (const QuaternionBase& q); /** \returns the quaternion describing the inverse rotation */ EIGEN_DEVICE_FUNC Quaternion inverse() const; /** \returns the conjugated quaternion */ EIGEN_DEVICE_FUNC Quaternion conjugate() const; template EIGEN_DEVICE_FUNC Quaternion slerp(const Scalar& t, const QuaternionBase& other) const; /** \returns true if each coefficients of \c *this and \a other are all exactly equal. * \warning When using floating point scalar values you probably should rather use a * fuzzy comparison such as isApprox() * \sa isApprox(), operator!= */ template EIGEN_DEVICE_FUNC inline bool operator==(const QuaternionBase& other) const { return coeffs() == other.coeffs(); } /** \returns true if at least one pair of coefficients of \c *this and \a other are not exactly equal to each other. * \warning When using floating point scalar values you probably should rather use a * fuzzy comparison such as isApprox() * \sa isApprox(), operator== */ template EIGEN_DEVICE_FUNC inline bool operator!=(const QuaternionBase& other) const { return coeffs() != other.coeffs(); } /** \returns \c true if \c *this is approximately equal to \a other, within the precision * determined by \a prec. * * \sa MatrixBase::isApprox() */ template EIGEN_DEVICE_FUNC bool isApprox(const QuaternionBase& other, const RealScalar& prec = NumTraits::dummy_precision()) const { return coeffs().isApprox(other.coeffs(), prec); } /** return the result vector of \a v through the rotation*/ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Vector3 _transformVector(const Vector3& v) const; #ifdef EIGEN_PARSED_BY_DOXYGEN /** \returns \c *this with scalar type casted to \a NewScalarType * * Note that if \a NewScalarType is equal to the current scalar type of \c *this * then this function smartly returns a const reference to \c *this. */ template EIGEN_DEVICE_FUNC inline typename internal::cast_return_type >::type cast() const; #else template EIGEN_DEVICE_FUNC inline typename internal::enable_if::value,const Derived&>::type cast() const { return derived(); } template EIGEN_DEVICE_FUNC inline typename internal::enable_if::value,Quaternion >::type cast() const { return Quaternion(coeffs().template cast()); } #endif #ifndef EIGEN_NO_IO friend std::ostream& operator<<(std::ostream& s, const QuaternionBase& q) { s << q.x() << "i + " << q.y() << "j + " << q.z() << "k" << " + " << q.w(); return s; } #endif #ifdef EIGEN_QUATERNIONBASE_PLUGIN # include EIGEN_QUATERNIONBASE_PLUGIN #endif protected: EIGEN_DEFAULT_COPY_CONSTRUCTOR(QuaternionBase) EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(QuaternionBase) }; /*************************************************************************** * Definition/implementation of Quaternion ***************************************************************************/ /** \geometry_module \ingroup Geometry_Module * * \class Quaternion * * \brief The quaternion class used to represent 3D orientations and rotations * * \tparam _Scalar the scalar type, i.e., the type of the coefficients * \tparam _Options controls the memory alignment of the coefficients. Can be \# AutoAlign or \# DontAlign. Default is AutoAlign. * * This class represents a quaternion \f$ w+xi+yj+zk \f$ that is a convenient representation of * orientations and rotations of objects in three dimensions. Compared to other representations * like Euler angles or 3x3 matrices, quaternions offer the following advantages: * \li \b compact storage (4 scalars) * \li \b efficient to compose (28 flops), * \li \b stable spherical interpolation * * The following two typedefs are provided for convenience: * \li \c Quaternionf for \c float * \li \c Quaterniond for \c double * * \warning Operations interpreting the quaternion as rotation have undefined behavior if the quaternion is not normalized. * * \sa class AngleAxis, class Transform */ namespace internal { template struct traits > { typedef Quaternion<_Scalar,_Options> PlainObject; typedef _Scalar Scalar; typedef Matrix<_Scalar,4,1,_Options> Coefficients; enum{ Alignment = internal::traits::Alignment, Flags = LvalueBit }; }; } template class Quaternion : public QuaternionBase > { public: typedef QuaternionBase > Base; enum { NeedsAlignment = internal::traits::Alignment>0 }; typedef _Scalar Scalar; EIGEN_INHERIT_ASSIGNMENT_OPERATORS(Quaternion) using Base::operator*=; typedef typename internal::traits::Coefficients Coefficients; typedef typename Base::AngleAxisType AngleAxisType; /** Default constructor leaving the quaternion uninitialized. */ EIGEN_DEVICE_FUNC inline Quaternion() {} /** Constructs and initializes the quaternion \f$ w+xi+yj+zk \f$ from * its four coefficients \a w, \a x, \a y and \a z. * * \warning Note the order of the arguments: the real \a w coefficient first, * while internally the coefficients are stored in the following order: * [\c x, \c y, \c z, \c w] */ EIGEN_DEVICE_FUNC inline Quaternion(const Scalar& w, const Scalar& x, const Scalar& y, const Scalar& z) : m_coeffs(x, y, z, w){} /** Constructs and initialize a quaternion from the array data */ EIGEN_DEVICE_FUNC explicit inline Quaternion(const Scalar* data) : m_coeffs(data) {} /** Copy constructor */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Quaternion(const QuaternionBase& other) { this->Base::operator=(other); } /** Constructs and initializes a quaternion from the angle-axis \a aa */ EIGEN_DEVICE_FUNC explicit inline Quaternion(const AngleAxisType& aa) { *this = aa; } /** Constructs and initializes a quaternion from either: * - a rotation matrix expression, * - a 4D vector expression representing quaternion coefficients. */ template EIGEN_DEVICE_FUNC explicit inline Quaternion(const MatrixBase& other) { *this = other; } /** Explicit copy constructor with scalar conversion */ template EIGEN_DEVICE_FUNC explicit inline Quaternion(const Quaternion& other) { m_coeffs = other.coeffs().template cast(); } #if EIGEN_HAS_RVALUE_REFERENCES // We define a copy constructor, which means we don't get an implicit move constructor or assignment operator. /** Default move constructor */ EIGEN_DEVICE_FUNC inline Quaternion(Quaternion&& other) EIGEN_NOEXCEPT_IF(std::is_nothrow_move_constructible::value) : m_coeffs(std::move(other.coeffs())) {} /** Default move assignment operator */ EIGEN_DEVICE_FUNC Quaternion& operator=(Quaternion&& other) EIGEN_NOEXCEPT_IF(std::is_nothrow_move_assignable::value) { m_coeffs = std::move(other.coeffs()); return *this; } #endif EIGEN_DEVICE_FUNC static Quaternion UnitRandom(); template EIGEN_DEVICE_FUNC static Quaternion FromTwoVectors(const MatrixBase& a, const MatrixBase& b); EIGEN_DEVICE_FUNC inline Coefficients& coeffs() { return m_coeffs;} EIGEN_DEVICE_FUNC inline const Coefficients& coeffs() const { return m_coeffs;} EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF(bool(NeedsAlignment)) #ifdef EIGEN_QUATERNION_PLUGIN # include EIGEN_QUATERNION_PLUGIN #endif protected: Coefficients m_coeffs; #ifndef EIGEN_PARSED_BY_DOXYGEN static EIGEN_STRONG_INLINE void _check_template_params() { EIGEN_STATIC_ASSERT( (_Options & DontAlign) == _Options, INVALID_MATRIX_TEMPLATE_PARAMETERS) } #endif }; /** \ingroup Geometry_Module * single precision quaternion type */ typedef Quaternion Quaternionf; /** \ingroup Geometry_Module * double precision quaternion type */ typedef Quaternion Quaterniond; /*************************************************************************** * Specialization of Map> ***************************************************************************/ namespace internal { template struct traits, _Options> > : traits > { typedef Map, _Options> Coefficients; }; } namespace internal { template struct traits, _Options> > : traits > { typedef Map, _Options> Coefficients; typedef traits > TraitsBase; enum { Flags = TraitsBase::Flags & ~LvalueBit }; }; } /** \ingroup Geometry_Module * \brief Quaternion expression mapping a constant memory buffer * * \tparam _Scalar the type of the Quaternion coefficients * \tparam _Options see class Map * * This is a specialization of class Map for Quaternion. This class allows to view * a 4 scalar memory buffer as an Eigen's Quaternion object. * * \sa class Map, class Quaternion, class QuaternionBase */ template class Map, _Options > : public QuaternionBase, _Options> > { public: typedef QuaternionBase, _Options> > Base; typedef _Scalar Scalar; typedef typename internal::traits::Coefficients Coefficients; EIGEN_INHERIT_ASSIGNMENT_OPERATORS(Map) using Base::operator*=; /** Constructs a Mapped Quaternion object from the pointer \a coeffs * * The pointer \a coeffs must reference the four coefficients of Quaternion in the following order: * \code *coeffs == {x, y, z, w} \endcode * * If the template parameter _Options is set to #Aligned, then the pointer coeffs must be aligned. */ EIGEN_DEVICE_FUNC explicit EIGEN_STRONG_INLINE Map(const Scalar* coeffs) : m_coeffs(coeffs) {} EIGEN_DEVICE_FUNC inline const Coefficients& coeffs() const { return m_coeffs;} protected: const Coefficients m_coeffs; }; /** \ingroup Geometry_Module * \brief Expression of a quaternion from a memory buffer * * \tparam _Scalar the type of the Quaternion coefficients * \tparam _Options see class Map * * This is a specialization of class Map for Quaternion. This class allows to view * a 4 scalar memory buffer as an Eigen's Quaternion object. * * \sa class Map, class Quaternion, class QuaternionBase */ template class Map, _Options > : public QuaternionBase, _Options> > { public: typedef QuaternionBase, _Options> > Base; typedef _Scalar Scalar; typedef typename internal::traits::Coefficients Coefficients; EIGEN_INHERIT_ASSIGNMENT_OPERATORS(Map) using Base::operator*=; /** Constructs a Mapped Quaternion object from the pointer \a coeffs * * The pointer \a coeffs must reference the four coefficients of Quaternion in the following order: * \code *coeffs == {x, y, z, w} \endcode * * If the template parameter _Options is set to #Aligned, then the pointer coeffs must be aligned. */ EIGEN_DEVICE_FUNC explicit EIGEN_STRONG_INLINE Map(Scalar* coeffs) : m_coeffs(coeffs) {} EIGEN_DEVICE_FUNC inline Coefficients& coeffs() { return m_coeffs; } EIGEN_DEVICE_FUNC inline const Coefficients& coeffs() const { return m_coeffs; } protected: Coefficients m_coeffs; }; /** \ingroup Geometry_Module * Map an unaligned array of single precision scalars as a quaternion */ typedef Map, 0> QuaternionMapf; /** \ingroup Geometry_Module * Map an unaligned array of double precision scalars as a quaternion */ typedef Map, 0> QuaternionMapd; /** \ingroup Geometry_Module * Map a 16-byte aligned array of single precision scalars as a quaternion */ typedef Map, Aligned> QuaternionMapAlignedf; /** \ingroup Geometry_Module * Map a 16-byte aligned array of double precision scalars as a quaternion */ typedef Map, Aligned> QuaternionMapAlignedd; /*************************************************************************** * Implementation of QuaternionBase methods ***************************************************************************/ // Generic Quaternion * Quaternion product // This product can be specialized for a given architecture via the Arch template argument. namespace internal { template struct quat_product { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Quaternion run(const QuaternionBase& a, const QuaternionBase& b){ return Quaternion ( a.w() * b.w() - a.x() * b.x() - a.y() * b.y() - a.z() * b.z(), a.w() * b.x() + a.x() * b.w() + a.y() * b.z() - a.z() * b.y(), a.w() * b.y() + a.y() * b.w() + a.z() * b.x() - a.x() * b.z(), a.w() * b.z() + a.z() * b.w() + a.x() * b.y() - a.y() * b.x() ); } }; } /** \returns the concatenation of two rotations as a quaternion-quaternion product */ template template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Quaternion::Scalar> QuaternionBase::operator* (const QuaternionBase& other) const { EIGEN_STATIC_ASSERT((internal::is_same::value), YOU_MIXED_DIFFERENT_NUMERIC_TYPES__YOU_NEED_TO_USE_THE_CAST_METHOD_OF_MATRIXBASE_TO_CAST_NUMERIC_TYPES_EXPLICITLY) return internal::quat_product::Scalar>::run(*this, other); } /** \sa operator*(Quaternion) */ template template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& QuaternionBase::operator*= (const QuaternionBase& other) { derived() = derived() * other.derived(); return derived(); } /** Rotation of a vector by a quaternion. * \remarks If the quaternion is used to rotate several points (>1) * then it is much more efficient to first convert it to a 3x3 Matrix. * Comparison of the operation cost for n transformations: * - Quaternion2: 30n * - Via a Matrix3: 24 + 15n */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename QuaternionBase::Vector3 QuaternionBase::_transformVector(const Vector3& v) const { // Note that this algorithm comes from the optimization by hand // of the conversion to a Matrix followed by a Matrix/Vector product. // It appears to be much faster than the common algorithm found // in the literature (30 versus 39 flops). It also requires two // Vector3 as temporaries. Vector3 uv = this->vec().cross(v); uv += uv; return v + this->w() * uv + this->vec().cross(uv); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE QuaternionBase& QuaternionBase::operator=(const QuaternionBase& other) { coeffs() = other.coeffs(); return derived(); } template template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& QuaternionBase::operator=(const QuaternionBase& other) { coeffs() = other.coeffs(); return derived(); } /** Set \c *this from an angle-axis \a aa and returns a reference to \c *this */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& QuaternionBase::operator=(const AngleAxisType& aa) { EIGEN_USING_STD(cos) EIGEN_USING_STD(sin) Scalar ha = Scalar(0.5)*aa.angle(); // Scalar(0.5) to suppress precision loss warnings this->w() = cos(ha); this->vec() = sin(ha) * aa.axis(); return derived(); } /** Set \c *this from the expression \a xpr: * - if \a xpr is a 4x1 vector, then \a xpr is assumed to be a quaternion * - if \a xpr is a 3x3 matrix, then \a xpr is assumed to be rotation matrix * and \a xpr is converted to a quaternion */ template template EIGEN_DEVICE_FUNC inline Derived& QuaternionBase::operator=(const MatrixBase& xpr) { EIGEN_STATIC_ASSERT((internal::is_same::value), YOU_MIXED_DIFFERENT_NUMERIC_TYPES__YOU_NEED_TO_USE_THE_CAST_METHOD_OF_MATRIXBASE_TO_CAST_NUMERIC_TYPES_EXPLICITLY) internal::quaternionbase_assign_impl::run(*this, xpr.derived()); return derived(); } /** Convert the quaternion to a 3x3 rotation matrix. The quaternion is required to * be normalized, otherwise the result is undefined. */ template EIGEN_DEVICE_FUNC inline typename QuaternionBase::Matrix3 QuaternionBase::toRotationMatrix(void) const { // NOTE if inlined, then gcc 4.2 and 4.4 get rid of the temporary (not gcc 4.3 !!) // if not inlined then the cost of the return by value is huge ~ +35%, // however, not inlining this function is an order of magnitude slower, so // it has to be inlined, and so the return by value is not an issue Matrix3 res; const Scalar tx = Scalar(2)*this->x(); const Scalar ty = Scalar(2)*this->y(); const Scalar tz = Scalar(2)*this->z(); const Scalar twx = tx*this->w(); const Scalar twy = ty*this->w(); const Scalar twz = tz*this->w(); const Scalar txx = tx*this->x(); const Scalar txy = ty*this->x(); const Scalar txz = tz*this->x(); const Scalar tyy = ty*this->y(); const Scalar tyz = tz*this->y(); const Scalar tzz = tz*this->z(); res.coeffRef(0,0) = Scalar(1)-(tyy+tzz); res.coeffRef(0,1) = txy-twz; res.coeffRef(0,2) = txz+twy; res.coeffRef(1,0) = txy+twz; res.coeffRef(1,1) = Scalar(1)-(txx+tzz); res.coeffRef(1,2) = tyz-twx; res.coeffRef(2,0) = txz-twy; res.coeffRef(2,1) = tyz+twx; res.coeffRef(2,2) = Scalar(1)-(txx+tyy); return res; } /** Sets \c *this to be a quaternion representing a rotation between * the two arbitrary vectors \a a and \a b. In other words, the built * rotation represent a rotation sending the line of direction \a a * to the line of direction \a b, both lines passing through the origin. * * \returns a reference to \c *this. * * Note that the two input vectors do \b not have to be normalized, and * do not need to have the same norm. */ template template EIGEN_DEVICE_FUNC inline Derived& QuaternionBase::setFromTwoVectors(const MatrixBase& a, const MatrixBase& b) { EIGEN_USING_STD(sqrt) Vector3 v0 = a.normalized(); Vector3 v1 = b.normalized(); Scalar c = v1.dot(v0); // if dot == -1, vectors are nearly opposites // => accurately compute the rotation axis by computing the // intersection of the two planes. This is done by solving: // x^T v0 = 0 // x^T v1 = 0 // under the constraint: // ||x|| = 1 // which yields a singular value problem if (c < Scalar(-1)+NumTraits::dummy_precision()) { c = numext::maxi(c,Scalar(-1)); Matrix m; m << v0.transpose(), v1.transpose(); JacobiSVD > svd(m, ComputeFullV); Vector3 axis = svd.matrixV().col(2); Scalar w2 = (Scalar(1)+c)*Scalar(0.5); this->w() = sqrt(w2); this->vec() = axis * sqrt(Scalar(1) - w2); return derived(); } Vector3 axis = v0.cross(v1); Scalar s = sqrt((Scalar(1)+c)*Scalar(2)); Scalar invs = Scalar(1)/s; this->vec() = axis * invs; this->w() = s * Scalar(0.5); return derived(); } /** \returns a random unit quaternion following a uniform distribution law on SO(3) * * \note The implementation is based on http://planning.cs.uiuc.edu/node198.html */ template EIGEN_DEVICE_FUNC Quaternion Quaternion::UnitRandom() { EIGEN_USING_STD(sqrt) EIGEN_USING_STD(sin) EIGEN_USING_STD(cos) const Scalar u1 = internal::random(0, 1), u2 = internal::random(0, 2*EIGEN_PI), u3 = internal::random(0, 2*EIGEN_PI); const Scalar a = sqrt(Scalar(1) - u1), b = sqrt(u1); return Quaternion (a * sin(u2), a * cos(u2), b * sin(u3), b * cos(u3)); } /** Returns a quaternion representing a rotation between * the two arbitrary vectors \a a and \a b. In other words, the built * rotation represent a rotation sending the line of direction \a a * to the line of direction \a b, both lines passing through the origin. * * \returns resulting quaternion * * Note that the two input vectors do \b not have to be normalized, and * do not need to have the same norm. */ template template EIGEN_DEVICE_FUNC Quaternion Quaternion::FromTwoVectors(const MatrixBase& a, const MatrixBase& b) { Quaternion quat; quat.setFromTwoVectors(a, b); return quat; } /** \returns the multiplicative inverse of \c *this * Note that in most cases, i.e., if you simply want the opposite rotation, * and/or the quaternion is normalized, then it is enough to use the conjugate. * * \sa QuaternionBase::conjugate() */ template EIGEN_DEVICE_FUNC inline Quaternion::Scalar> QuaternionBase::inverse() const { // FIXME should this function be called multiplicativeInverse and conjugate() be called inverse() or opposite() ?? Scalar n2 = this->squaredNorm(); if (n2 > Scalar(0)) return Quaternion(conjugate().coeffs() / n2); else { // return an invalid result to flag the error return Quaternion(Coefficients::Zero()); } } // Generic conjugate of a Quaternion namespace internal { template struct quat_conj { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Quaternion run(const QuaternionBase& q){ return Quaternion(q.w(),-q.x(),-q.y(),-q.z()); } }; } /** \returns the conjugate of the \c *this which is equal to the multiplicative inverse * if the quaternion is normalized. * The conjugate of a quaternion represents the opposite rotation. * * \sa Quaternion2::inverse() */ template EIGEN_DEVICE_FUNC inline Quaternion::Scalar> QuaternionBase::conjugate() const { return internal::quat_conj::Scalar>::run(*this); } /** \returns the angle (in radian) between two rotations * \sa dot() */ template template EIGEN_DEVICE_FUNC inline typename internal::traits::Scalar QuaternionBase::angularDistance(const QuaternionBase& other) const { EIGEN_USING_STD(atan2) Quaternion d = (*this) * other.conjugate(); return Scalar(2) * atan2( d.vec().norm(), numext::abs(d.w()) ); } /** \returns the spherical linear interpolation between the two quaternions * \c *this and \a other at the parameter \a t in [0;1]. * * This represents an interpolation for a constant motion between \c *this and \a other, * see also http://en.wikipedia.org/wiki/Slerp. */ template template EIGEN_DEVICE_FUNC Quaternion::Scalar> QuaternionBase::slerp(const Scalar& t, const QuaternionBase& other) const { EIGEN_USING_STD(acos) EIGEN_USING_STD(sin) const Scalar one = Scalar(1) - NumTraits::epsilon(); Scalar d = this->dot(other); Scalar absD = numext::abs(d); Scalar scale0; Scalar scale1; if(absD>=one) { scale0 = Scalar(1) - t; scale1 = t; } else { // theta is the angle between the 2 quaternions Scalar theta = acos(absD); Scalar sinTheta = sin(theta); scale0 = sin( ( Scalar(1) - t ) * theta) / sinTheta; scale1 = sin( ( t * theta) ) / sinTheta; } if(d(scale0 * coeffs() + scale1 * other.coeffs()); } namespace internal { // set from a rotation matrix template struct quaternionbase_assign_impl { typedef typename Other::Scalar Scalar; template EIGEN_DEVICE_FUNC static inline void run(QuaternionBase& q, const Other& a_mat) { const typename internal::nested_eval::type mat(a_mat); EIGEN_USING_STD(sqrt) // This algorithm comes from "Quaternion Calculus and Fast Animation", // Ken Shoemake, 1987 SIGGRAPH course notes Scalar t = mat.trace(); if (t > Scalar(0)) { t = sqrt(t + Scalar(1.0)); q.w() = Scalar(0.5)*t; t = Scalar(0.5)/t; q.x() = (mat.coeff(2,1) - mat.coeff(1,2)) * t; q.y() = (mat.coeff(0,2) - mat.coeff(2,0)) * t; q.z() = (mat.coeff(1,0) - mat.coeff(0,1)) * t; } else { Index i = 0; if (mat.coeff(1,1) > mat.coeff(0,0)) i = 1; if (mat.coeff(2,2) > mat.coeff(i,i)) i = 2; Index j = (i+1)%3; Index k = (j+1)%3; t = sqrt(mat.coeff(i,i)-mat.coeff(j,j)-mat.coeff(k,k) + Scalar(1.0)); q.coeffs().coeffRef(i) = Scalar(0.5) * t; t = Scalar(0.5)/t; q.w() = (mat.coeff(k,j)-mat.coeff(j,k))*t; q.coeffs().coeffRef(j) = (mat.coeff(j,i)+mat.coeff(i,j))*t; q.coeffs().coeffRef(k) = (mat.coeff(k,i)+mat.coeff(i,k))*t; } } }; // set from a vector of coefficients assumed to be a quaternion template struct quaternionbase_assign_impl { typedef typename Other::Scalar Scalar; template EIGEN_DEVICE_FUNC static inline void run(QuaternionBase& q, const Other& vec) { q.coeffs() = vec; } }; } // end namespace internal } // end namespace Eigen #endif // EIGEN_QUATERNION_H RcppEigen/inst/include/Eigen/src/Geometry/OrthoMethods.h0000644000176200001440000002137314562417221022741 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2009 Gael Guennebaud // Copyright (C) 2006-2008 Benoit Jacob // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_ORTHOMETHODS_H #define EIGEN_ORTHOMETHODS_H namespace Eigen { /** \geometry_module \ingroup Geometry_Module * * \returns the cross product of \c *this and \a other * * Here is a very good explanation of cross-product: http://xkcd.com/199/ * * With complex numbers, the cross product is implemented as * \f$ (\mathbf{a}+i\mathbf{b}) \times (\mathbf{c}+i\mathbf{d}) = (\mathbf{a} \times \mathbf{c} - \mathbf{b} \times \mathbf{d}) - i(\mathbf{a} \times \mathbf{d} - \mathbf{b} \times \mathbf{c})\f$ * * \sa MatrixBase::cross3() */ template template #ifndef EIGEN_PARSED_BY_DOXYGEN EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename MatrixBase::template cross_product_return_type::type #else typename MatrixBase::PlainObject #endif MatrixBase::cross(const MatrixBase& other) const { EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(Derived,3) EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(OtherDerived,3) // Note that there is no need for an expression here since the compiler // optimize such a small temporary very well (even within a complex expression) typename internal::nested_eval::type lhs(derived()); typename internal::nested_eval::type rhs(other.derived()); return typename cross_product_return_type::type( numext::conj(lhs.coeff(1) * rhs.coeff(2) - lhs.coeff(2) * rhs.coeff(1)), numext::conj(lhs.coeff(2) * rhs.coeff(0) - lhs.coeff(0) * rhs.coeff(2)), numext::conj(lhs.coeff(0) * rhs.coeff(1) - lhs.coeff(1) * rhs.coeff(0)) ); } namespace internal { template< int Arch,typename VectorLhs,typename VectorRhs, typename Scalar = typename VectorLhs::Scalar, bool Vectorizable = bool((VectorLhs::Flags&VectorRhs::Flags)&PacketAccessBit)> struct cross3_impl { EIGEN_DEVICE_FUNC static inline typename internal::plain_matrix_type::type run(const VectorLhs& lhs, const VectorRhs& rhs) { return typename internal::plain_matrix_type::type( numext::conj(lhs.coeff(1) * rhs.coeff(2) - lhs.coeff(2) * rhs.coeff(1)), numext::conj(lhs.coeff(2) * rhs.coeff(0) - lhs.coeff(0) * rhs.coeff(2)), numext::conj(lhs.coeff(0) * rhs.coeff(1) - lhs.coeff(1) * rhs.coeff(0)), 0 ); } }; } /** \geometry_module \ingroup Geometry_Module * * \returns the cross product of \c *this and \a other using only the x, y, and z coefficients * * The size of \c *this and \a other must be four. This function is especially useful * when using 4D vectors instead of 3D ones to get advantage of SSE/AltiVec vectorization. * * \sa MatrixBase::cross() */ template template EIGEN_DEVICE_FUNC inline typename MatrixBase::PlainObject MatrixBase::cross3(const MatrixBase& other) const { EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(Derived,4) EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(OtherDerived,4) typedef typename internal::nested_eval::type DerivedNested; typedef typename internal::nested_eval::type OtherDerivedNested; DerivedNested lhs(derived()); OtherDerivedNested rhs(other.derived()); return internal::cross3_impl::type, typename internal::remove_all::type>::run(lhs,rhs); } /** \geometry_module \ingroup Geometry_Module * * \returns a matrix expression of the cross product of each column or row * of the referenced expression with the \a other vector. * * The referenced matrix must have one dimension equal to 3. * The result matrix has the same dimensions than the referenced one. * * \sa MatrixBase::cross() */ template template EIGEN_DEVICE_FUNC const typename VectorwiseOp::CrossReturnType VectorwiseOp::cross(const MatrixBase& other) const { EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(OtherDerived,3) EIGEN_STATIC_ASSERT((internal::is_same::value), YOU_MIXED_DIFFERENT_NUMERIC_TYPES__YOU_NEED_TO_USE_THE_CAST_METHOD_OF_MATRIXBASE_TO_CAST_NUMERIC_TYPES_EXPLICITLY) typename internal::nested_eval::type mat(_expression()); typename internal::nested_eval::type vec(other.derived()); CrossReturnType res(_expression().rows(),_expression().cols()); if(Direction==Vertical) { eigen_assert(CrossReturnType::RowsAtCompileTime==3 && "the matrix must have exactly 3 rows"); res.row(0) = (mat.row(1) * vec.coeff(2) - mat.row(2) * vec.coeff(1)).conjugate(); res.row(1) = (mat.row(2) * vec.coeff(0) - mat.row(0) * vec.coeff(2)).conjugate(); res.row(2) = (mat.row(0) * vec.coeff(1) - mat.row(1) * vec.coeff(0)).conjugate(); } else { eigen_assert(CrossReturnType::ColsAtCompileTime==3 && "the matrix must have exactly 3 columns"); res.col(0) = (mat.col(1) * vec.coeff(2) - mat.col(2) * vec.coeff(1)).conjugate(); res.col(1) = (mat.col(2) * vec.coeff(0) - mat.col(0) * vec.coeff(2)).conjugate(); res.col(2) = (mat.col(0) * vec.coeff(1) - mat.col(1) * vec.coeff(0)).conjugate(); } return res; } namespace internal { template struct unitOrthogonal_selector { typedef typename plain_matrix_type::type VectorType; typedef typename traits::Scalar Scalar; typedef typename NumTraits::Real RealScalar; typedef Matrix Vector2; EIGEN_DEVICE_FUNC static inline VectorType run(const Derived& src) { VectorType perp = VectorType::Zero(src.size()); Index maxi = 0; Index sndi = 0; src.cwiseAbs().maxCoeff(&maxi); if (maxi==0) sndi = 1; RealScalar invnm = RealScalar(1)/(Vector2() << src.coeff(sndi),src.coeff(maxi)).finished().norm(); perp.coeffRef(maxi) = -numext::conj(src.coeff(sndi)) * invnm; perp.coeffRef(sndi) = numext::conj(src.coeff(maxi)) * invnm; return perp; } }; template struct unitOrthogonal_selector { typedef typename plain_matrix_type::type VectorType; typedef typename traits::Scalar Scalar; typedef typename NumTraits::Real RealScalar; EIGEN_DEVICE_FUNC static inline VectorType run(const Derived& src) { VectorType perp; /* Let us compute the crossed product of *this with a vector * that is not too close to being colinear to *this. */ /* unless the x and y coords are both close to zero, we can * simply take ( -y, x, 0 ) and normalize it. */ if((!isMuchSmallerThan(src.x(), src.z())) || (!isMuchSmallerThan(src.y(), src.z()))) { RealScalar invnm = RealScalar(1)/src.template head<2>().norm(); perp.coeffRef(0) = -numext::conj(src.y())*invnm; perp.coeffRef(1) = numext::conj(src.x())*invnm; perp.coeffRef(2) = 0; } /* if both x and y are close to zero, then the vector is close * to the z-axis, so it's far from colinear to the x-axis for instance. * So we take the crossed product with (1,0,0) and normalize it. */ else { RealScalar invnm = RealScalar(1)/src.template tail<2>().norm(); perp.coeffRef(0) = 0; perp.coeffRef(1) = -numext::conj(src.z())*invnm; perp.coeffRef(2) = numext::conj(src.y())*invnm; } return perp; } }; template struct unitOrthogonal_selector { typedef typename plain_matrix_type::type VectorType; EIGEN_DEVICE_FUNC static inline VectorType run(const Derived& src) { return VectorType(-numext::conj(src.y()), numext::conj(src.x())).normalized(); } }; } // end namespace internal /** \geometry_module \ingroup Geometry_Module * * \returns a unit vector which is orthogonal to \c *this * * The size of \c *this must be at least 2. If the size is exactly 2, * then the returned vector is a counter clock wise rotation of \c *this, i.e., (-y,x).normalized(). * * \sa cross() */ template EIGEN_DEVICE_FUNC typename MatrixBase::PlainObject MatrixBase::unitOrthogonal() const { EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived) return internal::unitOrthogonal_selector::run(derived()); } } // end namespace Eigen #endif // EIGEN_ORTHOMETHODS_H RcppEigen/inst/include/Eigen/src/Geometry/RotationBase.h0000644000176200001440000001757714107270226022724 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_ROTATIONBASE_H #define EIGEN_ROTATIONBASE_H namespace Eigen { // forward declaration namespace internal { template struct rotation_base_generic_product_selector; } /** \class RotationBase * * \brief Common base class for compact rotation representations * * \tparam Derived is the derived type, i.e., a rotation type * \tparam _Dim the dimension of the space */ template class RotationBase { public: enum { Dim = _Dim }; /** the scalar type of the coefficients */ typedef typename internal::traits::Scalar Scalar; /** corresponding linear transformation matrix type */ typedef Matrix RotationMatrixType; typedef Matrix VectorType; public: EIGEN_DEVICE_FUNC inline const Derived& derived() const { return *static_cast(this); } EIGEN_DEVICE_FUNC inline Derived& derived() { return *static_cast(this); } /** \returns an equivalent rotation matrix */ EIGEN_DEVICE_FUNC inline RotationMatrixType toRotationMatrix() const { return derived().toRotationMatrix(); } /** \returns an equivalent rotation matrix * This function is added to be conform with the Transform class' naming scheme. */ EIGEN_DEVICE_FUNC inline RotationMatrixType matrix() const { return derived().toRotationMatrix(); } /** \returns the inverse rotation */ EIGEN_DEVICE_FUNC inline Derived inverse() const { return derived().inverse(); } /** \returns the concatenation of the rotation \c *this with a translation \a t */ EIGEN_DEVICE_FUNC inline Transform operator*(const Translation& t) const { return Transform(*this) * t; } /** \returns the concatenation of the rotation \c *this with a uniform scaling \a s */ EIGEN_DEVICE_FUNC inline RotationMatrixType operator*(const UniformScaling& s) const { return toRotationMatrix() * s.factor(); } /** \returns the concatenation of the rotation \c *this with a generic expression \a e * \a e can be: * - a DimxDim linear transformation matrix * - a DimxDim diagonal matrix (axis aligned scaling) * - a vector of size Dim */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename internal::rotation_base_generic_product_selector::ReturnType operator*(const EigenBase& e) const { return internal::rotation_base_generic_product_selector::run(derived(), e.derived()); } /** \returns the concatenation of a linear transformation \a l with the rotation \a r */ template friend EIGEN_DEVICE_FUNC inline RotationMatrixType operator*(const EigenBase& l, const Derived& r) { return l.derived() * r.toRotationMatrix(); } /** \returns the concatenation of a scaling \a l with the rotation \a r */ EIGEN_DEVICE_FUNC friend inline Transform operator*(const DiagonalMatrix& l, const Derived& r) { Transform res(r); res.linear().applyOnTheLeft(l); return res; } /** \returns the concatenation of the rotation \c *this with a transformation \a t */ template EIGEN_DEVICE_FUNC inline Transform operator*(const Transform& t) const { return toRotationMatrix() * t; } template EIGEN_DEVICE_FUNC inline VectorType _transformVector(const OtherVectorType& v) const { return toRotationMatrix() * v; } }; namespace internal { // implementation of the generic product rotation * matrix template struct rotation_base_generic_product_selector { enum { Dim = RotationDerived::Dim }; typedef Matrix ReturnType; EIGEN_DEVICE_FUNC static inline ReturnType run(const RotationDerived& r, const MatrixType& m) { return r.toRotationMatrix() * m; } }; template struct rotation_base_generic_product_selector< RotationDerived, DiagonalMatrix, false > { typedef Transform ReturnType; EIGEN_DEVICE_FUNC static inline ReturnType run(const RotationDerived& r, const DiagonalMatrix& m) { ReturnType res(r); res.linear() *= m; return res; } }; template struct rotation_base_generic_product_selector { enum { Dim = RotationDerived::Dim }; typedef Matrix ReturnType; EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE ReturnType run(const RotationDerived& r, const OtherVectorType& v) { return r._transformVector(v); } }; } // end namespace internal /** \geometry_module * * \brief Constructs a Dim x Dim rotation matrix from the rotation \a r */ template template EIGEN_DEVICE_FUNC Matrix<_Scalar, _Rows, _Cols, _Storage, _MaxRows, _MaxCols> ::Matrix(const RotationBase& r) { EIGEN_STATIC_ASSERT_MATRIX_SPECIFIC_SIZE(Matrix,int(OtherDerived::Dim),int(OtherDerived::Dim)) *this = r.toRotationMatrix(); } /** \geometry_module * * \brief Set a Dim x Dim rotation matrix from the rotation \a r */ template template EIGEN_DEVICE_FUNC Matrix<_Scalar, _Rows, _Cols, _Storage, _MaxRows, _MaxCols>& Matrix<_Scalar, _Rows, _Cols, _Storage, _MaxRows, _MaxCols> ::operator=(const RotationBase& r) { EIGEN_STATIC_ASSERT_MATRIX_SPECIFIC_SIZE(Matrix,int(OtherDerived::Dim),int(OtherDerived::Dim)) return *this = r.toRotationMatrix(); } namespace internal { /** \internal * * Helper function to return an arbitrary rotation object to a rotation matrix. * * \tparam Scalar the numeric type of the matrix coefficients * \tparam Dim the dimension of the current space * * It returns a Dim x Dim fixed size matrix. * * Default specializations are provided for: * - any scalar type (2D), * - any matrix expression, * - any type based on RotationBase (e.g., Quaternion, AngleAxis, Rotation2D) * * Currently toRotationMatrix is only used by Transform. * * \sa class Transform, class Rotation2D, class Quaternion, class AngleAxis */ template EIGEN_DEVICE_FUNC static inline Matrix toRotationMatrix(const Scalar& s) { EIGEN_STATIC_ASSERT(Dim==2,YOU_MADE_A_PROGRAMMING_MISTAKE) return Rotation2D(s).toRotationMatrix(); } template EIGEN_DEVICE_FUNC static inline Matrix toRotationMatrix(const RotationBase& r) { return r.toRotationMatrix(); } template EIGEN_DEVICE_FUNC static inline const MatrixBase& toRotationMatrix(const MatrixBase& mat) { EIGEN_STATIC_ASSERT(OtherDerived::RowsAtCompileTime==Dim && OtherDerived::ColsAtCompileTime==Dim, YOU_MADE_A_PROGRAMMING_MISTAKE) return mat; } } // end namespace internal } // end namespace Eigen #endif // EIGEN_ROTATIONBASE_H RcppEigen/inst/include/Eigen/src/LU/0000755000176200001440000000000014562417221016670 5ustar liggesusersRcppEigen/inst/include/Eigen/src/LU/arch/0000755000176200001440000000000014562417221017605 5ustar liggesusersRcppEigen/inst/include/Eigen/src/LU/arch/InverseSize4.h0000644000176200001440000003257514562417221022324 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2001 Intel Corporation // Copyright (C) 2010 Gael Guennebaud // Copyright (C) 2009 Benoit Jacob // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. // // The algorithm below is a reimplementation of former \src\LU\Inverse_SSE.h using PacketMath. // inv(M) = M#/|M|, where inv(M), M# and |M| denote the inverse of M, // adjugate of M and determinant of M respectively. M# is computed block-wise // using specific formulae. For proof, see: // https://lxjk.github.io/2017/09/03/Fast-4x4-Matrix-Inverse-with-SSE-SIMD-Explained.html // Variable names are adopted from \src\LU\Inverse_SSE.h. // // The SSE code for the 4x4 float and double matrix inverse in former (deprecated) \src\LU\Inverse_SSE.h // comes from the following Intel's library: // http://software.intel.com/en-us/articles/optimized-matrix-library-for-use-with-the-intel-pentiumr-4-processors-sse2-instructions/ // // Here is the respective copyright and license statement: // // Copyright (c) 2001 Intel Corporation. // // Permition is granted to use, copy, distribute and prepare derivative works // of this library for any purpose and without fee, provided, that the above // copyright notice and this statement appear in all copies. // Intel makes no representations about the suitability of this software for // any purpose, and specifically disclaims all warranties. // See LEGAL.TXT for all the legal information. // // TODO: Unify implementations of different data types (i.e. float and double). #ifndef EIGEN_INVERSE_SIZE_4_H #define EIGEN_INVERSE_SIZE_4_H namespace Eigen { namespace internal { template struct compute_inverse_size4 { enum { MatrixAlignment = traits::Alignment, ResultAlignment = traits::Alignment, StorageOrdersMatch = (MatrixType::Flags & RowMajorBit) == (ResultType::Flags & RowMajorBit) }; typedef typename conditional<(MatrixType::Flags & LinearAccessBit), MatrixType const &, typename MatrixType::PlainObject>::type ActualMatrixType; static void run(const MatrixType &mat, ResultType &result) { ActualMatrixType matrix(mat); const float* data = matrix.data(); const Index stride = matrix.innerStride(); Packet4f _L1 = ploadt(data); Packet4f _L2 = ploadt(data + stride*4); Packet4f _L3 = ploadt(data + stride*8); Packet4f _L4 = ploadt(data + stride*12); // Four 2x2 sub-matrices of the input matrix // input = [[A, B], // [C, D]] Packet4f A, B, C, D; if (!StorageOrdersMatch) { A = vec4f_unpacklo(_L1, _L2); B = vec4f_unpacklo(_L3, _L4); C = vec4f_unpackhi(_L1, _L2); D = vec4f_unpackhi(_L3, _L4); } else { A = vec4f_movelh(_L1, _L2); B = vec4f_movehl(_L2, _L1); C = vec4f_movelh(_L3, _L4); D = vec4f_movehl(_L4, _L3); } Packet4f AB, DC; // AB = A# * B, where A# denotes the adjugate of A, and * denotes matrix product. AB = pmul(vec4f_swizzle2(A, A, 3, 3, 0, 0), B); AB = psub(AB, pmul(vec4f_swizzle2(A, A, 1, 1, 2, 2), vec4f_swizzle2(B, B, 2, 3, 0, 1))); // DC = D#*C DC = pmul(vec4f_swizzle2(D, D, 3, 3, 0, 0), C); DC = psub(DC, pmul(vec4f_swizzle2(D, D, 1, 1, 2, 2), vec4f_swizzle2(C, C, 2, 3, 0, 1))); // determinants of the sub-matrices Packet4f dA, dB, dC, dD; dA = pmul(vec4f_swizzle2(A, A, 3, 3, 1, 1), A); dA = psub(dA, vec4f_movehl(dA, dA)); dB = pmul(vec4f_swizzle2(B, B, 3, 3, 1, 1), B); dB = psub(dB, vec4f_movehl(dB, dB)); dC = pmul(vec4f_swizzle2(C, C, 3, 3, 1, 1), C); dC = psub(dC, vec4f_movehl(dC, dC)); dD = pmul(vec4f_swizzle2(D, D, 3, 3, 1, 1), D); dD = psub(dD, vec4f_movehl(dD, dD)); Packet4f d, d1, d2; d = pmul(vec4f_swizzle2(DC, DC, 0, 2, 1, 3), AB); d = padd(d, vec4f_movehl(d, d)); d = padd(d, vec4f_swizzle2(d, d, 1, 0, 0, 0)); d1 = pmul(dA, dD); d2 = pmul(dB, dC); // determinant of the input matrix, det = |A||D| + |B||C| - trace(A#*B*D#*C) Packet4f det = vec4f_duplane(psub(padd(d1, d2), d), 0); // reciprocal of the determinant of the input matrix, rd = 1/det Packet4f rd = pdiv(pset1(1.0f), det); // Four sub-matrices of the inverse Packet4f iA, iB, iC, iD; // iD = D*|A| - C*A#*B iD = pmul(vec4f_swizzle2(C, C, 0, 0, 2, 2), vec4f_movelh(AB, AB)); iD = padd(iD, pmul(vec4f_swizzle2(C, C, 1, 1, 3, 3), vec4f_movehl(AB, AB))); iD = psub(pmul(D, vec4f_duplane(dA, 0)), iD); // iA = A*|D| - B*D#*C iA = pmul(vec4f_swizzle2(B, B, 0, 0, 2, 2), vec4f_movelh(DC, DC)); iA = padd(iA, pmul(vec4f_swizzle2(B, B, 1, 1, 3, 3), vec4f_movehl(DC, DC))); iA = psub(pmul(A, vec4f_duplane(dD, 0)), iA); // iB = C*|B| - D * (A#B)# = C*|B| - D*B#*A iB = pmul(D, vec4f_swizzle2(AB, AB, 3, 0, 3, 0)); iB = psub(iB, pmul(vec4f_swizzle2(D, D, 1, 0, 3, 2), vec4f_swizzle2(AB, AB, 2, 1, 2, 1))); iB = psub(pmul(C, vec4f_duplane(dB, 0)), iB); // iC = B*|C| - A * (D#C)# = B*|C| - A*C#*D iC = pmul(A, vec4f_swizzle2(DC, DC, 3, 0, 3, 0)); iC = psub(iC, pmul(vec4f_swizzle2(A, A, 1, 0, 3, 2), vec4f_swizzle2(DC, DC, 2, 1, 2, 1))); iC = psub(pmul(B, vec4f_duplane(dC, 0)), iC); const float sign_mask[4] = {0.0f, numext::bit_cast(0x80000000u), numext::bit_cast(0x80000000u), 0.0f}; const Packet4f p4f_sign_PNNP = ploadu(sign_mask); rd = pxor(rd, p4f_sign_PNNP); iA = pmul(iA, rd); iB = pmul(iB, rd); iC = pmul(iC, rd); iD = pmul(iD, rd); Index res_stride = result.outerStride(); float *res = result.data(); pstoret(res + 0, vec4f_swizzle2(iA, iB, 3, 1, 3, 1)); pstoret(res + res_stride, vec4f_swizzle2(iA, iB, 2, 0, 2, 0)); pstoret(res + 2 * res_stride, vec4f_swizzle2(iC, iD, 3, 1, 3, 1)); pstoret(res + 3 * res_stride, vec4f_swizzle2(iC, iD, 2, 0, 2, 0)); } }; #if !(defined EIGEN_VECTORIZE_NEON && !(EIGEN_ARCH_ARM64 && !EIGEN_APPLE_DOUBLE_NEON_BUG)) // same algorithm as above, except that each operand is split into // halves for two registers to hold. template struct compute_inverse_size4 { enum { MatrixAlignment = traits::Alignment, ResultAlignment = traits::Alignment, StorageOrdersMatch = (MatrixType::Flags & RowMajorBit) == (ResultType::Flags & RowMajorBit) }; typedef typename conditional<(MatrixType::Flags & LinearAccessBit), MatrixType const &, typename MatrixType::PlainObject>::type ActualMatrixType; static void run(const MatrixType &mat, ResultType &result) { ActualMatrixType matrix(mat); // Four 2x2 sub-matrices of the input matrix, each is further divided into upper and lower // row e.g. A1, upper row of A, A2, lower row of A // input = [[A, B], = [[[A1, [B1, // [C, D]] A2], B2]], // [[C1, [D1, // C2], D2]]] Packet2d A1, A2, B1, B2, C1, C2, D1, D2; const double* data = matrix.data(); const Index stride = matrix.innerStride(); if (StorageOrdersMatch) { A1 = ploadt(data + stride*0); B1 = ploadt(data + stride*2); A2 = ploadt(data + stride*4); B2 = ploadt(data + stride*6); C1 = ploadt(data + stride*8); D1 = ploadt(data + stride*10); C2 = ploadt(data + stride*12); D2 = ploadt(data + stride*14); } else { Packet2d temp; A1 = ploadt(data + stride*0); C1 = ploadt(data + stride*2); A2 = ploadt(data + stride*4); C2 = ploadt(data + stride*6); temp = A1; A1 = vec2d_unpacklo(A1, A2); A2 = vec2d_unpackhi(temp, A2); temp = C1; C1 = vec2d_unpacklo(C1, C2); C2 = vec2d_unpackhi(temp, C2); B1 = ploadt(data + stride*8); D1 = ploadt(data + stride*10); B2 = ploadt(data + stride*12); D2 = ploadt(data + stride*14); temp = B1; B1 = vec2d_unpacklo(B1, B2); B2 = vec2d_unpackhi(temp, B2); temp = D1; D1 = vec2d_unpacklo(D1, D2); D2 = vec2d_unpackhi(temp, D2); } // determinants of the sub-matrices Packet2d dA, dB, dC, dD; dA = vec2d_swizzle2(A2, A2, 1); dA = pmul(A1, dA); dA = psub(dA, vec2d_duplane(dA, 1)); dB = vec2d_swizzle2(B2, B2, 1); dB = pmul(B1, dB); dB = psub(dB, vec2d_duplane(dB, 1)); dC = vec2d_swizzle2(C2, C2, 1); dC = pmul(C1, dC); dC = psub(dC, vec2d_duplane(dC, 1)); dD = vec2d_swizzle2(D2, D2, 1); dD = pmul(D1, dD); dD = psub(dD, vec2d_duplane(dD, 1)); Packet2d DC1, DC2, AB1, AB2; // AB = A# * B, where A# denotes the adjugate of A, and * denotes matrix product. AB1 = pmul(B1, vec2d_duplane(A2, 1)); AB2 = pmul(B2, vec2d_duplane(A1, 0)); AB1 = psub(AB1, pmul(B2, vec2d_duplane(A1, 1))); AB2 = psub(AB2, pmul(B1, vec2d_duplane(A2, 0))); // DC = D#*C DC1 = pmul(C1, vec2d_duplane(D2, 1)); DC2 = pmul(C2, vec2d_duplane(D1, 0)); DC1 = psub(DC1, pmul(C2, vec2d_duplane(D1, 1))); DC2 = psub(DC2, pmul(C1, vec2d_duplane(D2, 0))); Packet2d d1, d2; // determinant of the input matrix, det = |A||D| + |B||C| - trace(A#*B*D#*C) Packet2d det; // reciprocal of the determinant of the input matrix, rd = 1/det Packet2d rd; d1 = pmul(AB1, vec2d_swizzle2(DC1, DC2, 0)); d2 = pmul(AB2, vec2d_swizzle2(DC1, DC2, 3)); rd = padd(d1, d2); rd = padd(rd, vec2d_duplane(rd, 1)); d1 = pmul(dA, dD); d2 = pmul(dB, dC); det = padd(d1, d2); det = psub(det, rd); det = vec2d_duplane(det, 0); rd = pdiv(pset1(1.0), det); // rows of four sub-matrices of the inverse Packet2d iA1, iA2, iB1, iB2, iC1, iC2, iD1, iD2; // iD = D*|A| - C*A#*B iD1 = pmul(AB1, vec2d_duplane(C1, 0)); iD2 = pmul(AB1, vec2d_duplane(C2, 0)); iD1 = padd(iD1, pmul(AB2, vec2d_duplane(C1, 1))); iD2 = padd(iD2, pmul(AB2, vec2d_duplane(C2, 1))); dA = vec2d_duplane(dA, 0); iD1 = psub(pmul(D1, dA), iD1); iD2 = psub(pmul(D2, dA), iD2); // iA = A*|D| - B*D#*C iA1 = pmul(DC1, vec2d_duplane(B1, 0)); iA2 = pmul(DC1, vec2d_duplane(B2, 0)); iA1 = padd(iA1, pmul(DC2, vec2d_duplane(B1, 1))); iA2 = padd(iA2, pmul(DC2, vec2d_duplane(B2, 1))); dD = vec2d_duplane(dD, 0); iA1 = psub(pmul(A1, dD), iA1); iA2 = psub(pmul(A2, dD), iA2); // iB = C*|B| - D * (A#B)# = C*|B| - D*B#*A iB1 = pmul(D1, vec2d_swizzle2(AB2, AB1, 1)); iB2 = pmul(D2, vec2d_swizzle2(AB2, AB1, 1)); iB1 = psub(iB1, pmul(vec2d_swizzle2(D1, D1, 1), vec2d_swizzle2(AB2, AB1, 2))); iB2 = psub(iB2, pmul(vec2d_swizzle2(D2, D2, 1), vec2d_swizzle2(AB2, AB1, 2))); dB = vec2d_duplane(dB, 0); iB1 = psub(pmul(C1, dB), iB1); iB2 = psub(pmul(C2, dB), iB2); // iC = B*|C| - A * (D#C)# = B*|C| - A*C#*D iC1 = pmul(A1, vec2d_swizzle2(DC2, DC1, 1)); iC2 = pmul(A2, vec2d_swizzle2(DC2, DC1, 1)); iC1 = psub(iC1, pmul(vec2d_swizzle2(A1, A1, 1), vec2d_swizzle2(DC2, DC1, 2))); iC2 = psub(iC2, pmul(vec2d_swizzle2(A2, A2, 1), vec2d_swizzle2(DC2, DC1, 2))); dC = vec2d_duplane(dC, 0); iC1 = psub(pmul(B1, dC), iC1); iC2 = psub(pmul(B2, dC), iC2); const double sign_mask1[2] = {0.0, numext::bit_cast(0x8000000000000000ull)}; const double sign_mask2[2] = {numext::bit_cast(0x8000000000000000ull), 0.0}; const Packet2d sign_PN = ploadu(sign_mask1); const Packet2d sign_NP = ploadu(sign_mask2); d1 = pxor(rd, sign_PN); d2 = pxor(rd, sign_NP); Index res_stride = result.outerStride(); double *res = result.data(); pstoret(res + 0, pmul(vec2d_swizzle2(iA2, iA1, 3), d1)); pstoret(res + res_stride, pmul(vec2d_swizzle2(iA2, iA1, 0), d2)); pstoret(res + 2, pmul(vec2d_swizzle2(iB2, iB1, 3), d1)); pstoret(res + res_stride + 2, pmul(vec2d_swizzle2(iB2, iB1, 0), d2)); pstoret(res + 2 * res_stride, pmul(vec2d_swizzle2(iC2, iC1, 3), d1)); pstoret(res + 3 * res_stride, pmul(vec2d_swizzle2(iC2, iC1, 0), d2)); pstoret(res + 2 * res_stride + 2, pmul(vec2d_swizzle2(iD2, iD1, 3), d1)); pstoret(res + 3 * res_stride + 2, pmul(vec2d_swizzle2(iD2, iD1, 0), d2)); } }; #endif } // namespace internal } // namespace Eigen #endif RcppEigen/inst/include/Eigen/src/LU/PartialPivLU.h0000644000176200001440000005306514562417221021366 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2006-2009 Benoit Jacob // Copyright (C) 2009 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_PARTIALLU_H #define EIGEN_PARTIALLU_H namespace Eigen { namespace internal { template struct traits > : traits<_MatrixType> { typedef MatrixXpr XprKind; typedef SolverStorage StorageKind; typedef int StorageIndex; typedef traits<_MatrixType> BaseTraits; enum { Flags = BaseTraits::Flags & RowMajorBit, CoeffReadCost = Dynamic }; }; template struct enable_if_ref; // { // typedef Derived type; // }; template struct enable_if_ref,Derived> { typedef Derived type; }; } // end namespace internal /** \ingroup LU_Module * * \class PartialPivLU * * \brief LU decomposition of a matrix with partial pivoting, and related features * * \tparam _MatrixType the type of the matrix of which we are computing the LU decomposition * * This class represents a LU decomposition of a \b square \b invertible matrix, with partial pivoting: the matrix A * is decomposed as A = PLU where L is unit-lower-triangular, U is upper-triangular, and P * is a permutation matrix. * * Typically, partial pivoting LU decomposition is only considered numerically stable for square invertible * matrices. Thus LAPACK's dgesv and dgesvx require the matrix to be square and invertible. The present class * does the same. It will assert that the matrix is square, but it won't (actually it can't) check that the * matrix is invertible: it is your task to check that you only use this decomposition on invertible matrices. * * The guaranteed safe alternative, working for all matrices, is the full pivoting LU decomposition, provided * by class FullPivLU. * * This is \b not a rank-revealing LU decomposition. Many features are intentionally absent from this class, * such as rank computation. If you need these features, use class FullPivLU. * * This LU decomposition is suitable to invert invertible matrices. It is what MatrixBase::inverse() uses * in the general case. * On the other hand, it is \b not suitable to determine whether a given matrix is invertible. * * The data of the LU decomposition can be directly accessed through the methods matrixLU(), permutationP(). * * This class supports the \link InplaceDecomposition inplace decomposition \endlink mechanism. * * \sa MatrixBase::partialPivLu(), MatrixBase::determinant(), MatrixBase::inverse(), MatrixBase::computeInverse(), class FullPivLU */ template class PartialPivLU : public SolverBase > { public: typedef _MatrixType MatrixType; typedef SolverBase Base; friend class SolverBase; EIGEN_GENERIC_PUBLIC_INTERFACE(PartialPivLU) enum { MaxRowsAtCompileTime = MatrixType::MaxRowsAtCompileTime, MaxColsAtCompileTime = MatrixType::MaxColsAtCompileTime }; typedef PermutationMatrix PermutationType; typedef Transpositions TranspositionType; typedef typename MatrixType::PlainObject PlainObject; /** * \brief Default Constructor. * * The default constructor is useful in cases in which the user intends to * perform decompositions via PartialPivLU::compute(const MatrixType&). */ PartialPivLU(); /** \brief Default Constructor with memory preallocation * * Like the default constructor but with preallocation of the internal data * according to the specified problem \a size. * \sa PartialPivLU() */ explicit PartialPivLU(Index size); /** Constructor. * * \param matrix the matrix of which to compute the LU decomposition. * * \warning The matrix should have full rank (e.g. if it's square, it should be invertible). * If you need to deal with non-full rank, use class FullPivLU instead. */ template explicit PartialPivLU(const EigenBase& matrix); /** Constructor for \link InplaceDecomposition inplace decomposition \endlink * * \param matrix the matrix of which to compute the LU decomposition. * * \warning The matrix should have full rank (e.g. if it's square, it should be invertible). * If you need to deal with non-full rank, use class FullPivLU instead. */ template explicit PartialPivLU(EigenBase& matrix); template PartialPivLU& compute(const EigenBase& matrix) { m_lu = matrix.derived(); compute(); return *this; } /** \returns the LU decomposition matrix: the upper-triangular part is U, the * unit-lower-triangular part is L (at least for square matrices; in the non-square * case, special care is needed, see the documentation of class FullPivLU). * * \sa matrixL(), matrixU() */ inline const MatrixType& matrixLU() const { eigen_assert(m_isInitialized && "PartialPivLU is not initialized."); return m_lu; } /** \returns the permutation matrix P. */ inline const PermutationType& permutationP() const { eigen_assert(m_isInitialized && "PartialPivLU is not initialized."); return m_p; } #ifdef EIGEN_PARSED_BY_DOXYGEN /** This method returns the solution x to the equation Ax=b, where A is the matrix of which * *this is the LU decomposition. * * \param b the right-hand-side of the equation to solve. Can be a vector or a matrix, * the only requirement in order for the equation to make sense is that * b.rows()==A.rows(), where A is the matrix of which *this is the LU decomposition. * * \returns the solution. * * Example: \include PartialPivLU_solve.cpp * Output: \verbinclude PartialPivLU_solve.out * * Since this PartialPivLU class assumes anyway that the matrix A is invertible, the solution * theoretically exists and is unique regardless of b. * * \sa TriangularView::solve(), inverse(), computeInverse() */ template inline const Solve solve(const MatrixBase& b) const; #endif /** \returns an estimate of the reciprocal condition number of the matrix of which \c *this is the LU decomposition. */ inline RealScalar rcond() const { eigen_assert(m_isInitialized && "PartialPivLU is not initialized."); return internal::rcond_estimate_helper(m_l1_norm, *this); } /** \returns the inverse of the matrix of which *this is the LU decomposition. * * \warning The matrix being decomposed here is assumed to be invertible. If you need to check for * invertibility, use class FullPivLU instead. * * \sa MatrixBase::inverse(), LU::inverse() */ inline const Inverse inverse() const { eigen_assert(m_isInitialized && "PartialPivLU is not initialized."); return Inverse(*this); } /** \returns the determinant of the matrix of which * *this is the LU decomposition. It has only linear complexity * (that is, O(n) where n is the dimension of the square matrix) * as the LU decomposition has already been computed. * * \note For fixed-size matrices of size up to 4, MatrixBase::determinant() offers * optimized paths. * * \warning a determinant can be very big or small, so for matrices * of large enough dimension, there is a risk of overflow/underflow. * * \sa MatrixBase::determinant() */ Scalar determinant() const; MatrixType reconstructedMatrix() const; EIGEN_CONSTEXPR inline Index rows() const EIGEN_NOEXCEPT { return m_lu.rows(); } EIGEN_CONSTEXPR inline Index cols() const EIGEN_NOEXCEPT { return m_lu.cols(); } #ifndef EIGEN_PARSED_BY_DOXYGEN template EIGEN_DEVICE_FUNC void _solve_impl(const RhsType &rhs, DstType &dst) const { /* The decomposition PA = LU can be rewritten as A = P^{-1} L U. * So we proceed as follows: * Step 1: compute c = Pb. * Step 2: replace c by the solution x to Lx = c. * Step 3: replace c by the solution x to Ux = c. */ // Step 1 dst = permutationP() * rhs; // Step 2 m_lu.template triangularView().solveInPlace(dst); // Step 3 m_lu.template triangularView().solveInPlace(dst); } template EIGEN_DEVICE_FUNC void _solve_impl_transposed(const RhsType &rhs, DstType &dst) const { /* The decomposition PA = LU can be rewritten as A^T = U^T L^T P. * So we proceed as follows: * Step 1: compute c as the solution to L^T c = b * Step 2: replace c by the solution x to U^T x = c. * Step 3: update c = P^-1 c. */ eigen_assert(rhs.rows() == m_lu.cols()); // Step 1 dst = m_lu.template triangularView().transpose() .template conjugateIf().solve(rhs); // Step 2 m_lu.template triangularView().transpose() .template conjugateIf().solveInPlace(dst); // Step 3 dst = permutationP().transpose() * dst; } #endif protected: static void check_template_parameters() { EIGEN_STATIC_ASSERT_NON_INTEGER(Scalar); } void compute(); MatrixType m_lu; PermutationType m_p; TranspositionType m_rowsTranspositions; RealScalar m_l1_norm; signed char m_det_p; bool m_isInitialized; }; template PartialPivLU::PartialPivLU() : m_lu(), m_p(), m_rowsTranspositions(), m_l1_norm(0), m_det_p(0), m_isInitialized(false) { } template PartialPivLU::PartialPivLU(Index size) : m_lu(size, size), m_p(size), m_rowsTranspositions(size), m_l1_norm(0), m_det_p(0), m_isInitialized(false) { } template template PartialPivLU::PartialPivLU(const EigenBase& matrix) : m_lu(matrix.rows(),matrix.cols()), m_p(matrix.rows()), m_rowsTranspositions(matrix.rows()), m_l1_norm(0), m_det_p(0), m_isInitialized(false) { compute(matrix.derived()); } template template PartialPivLU::PartialPivLU(EigenBase& matrix) : m_lu(matrix.derived()), m_p(matrix.rows()), m_rowsTranspositions(matrix.rows()), m_l1_norm(0), m_det_p(0), m_isInitialized(false) { compute(); } namespace internal { /** \internal This is the blocked version of fullpivlu_unblocked() */ template struct partial_lu_impl { static const int UnBlockedBound = 16; static const bool UnBlockedAtCompileTime = SizeAtCompileTime!=Dynamic && SizeAtCompileTime<=UnBlockedBound; static const int ActualSizeAtCompileTime = UnBlockedAtCompileTime ? SizeAtCompileTime : Dynamic; // Remaining rows and columns at compile-time: static const int RRows = SizeAtCompileTime==2 ? 1 : Dynamic; static const int RCols = SizeAtCompileTime==2 ? 1 : Dynamic; typedef Matrix MatrixType; typedef Ref MatrixTypeRef; typedef Ref > BlockType; typedef typename MatrixType::RealScalar RealScalar; /** \internal performs the LU decomposition in-place of the matrix \a lu * using an unblocked algorithm. * * In addition, this function returns the row transpositions in the * vector \a row_transpositions which must have a size equal to the number * of columns of the matrix \a lu, and an integer \a nb_transpositions * which returns the actual number of transpositions. * * \returns The index of the first pivot which is exactly zero if any, or a negative number otherwise. */ static Index unblocked_lu(MatrixTypeRef& lu, PivIndex* row_transpositions, PivIndex& nb_transpositions) { typedef scalar_score_coeff_op Scoring; typedef typename Scoring::result_type Score; const Index rows = lu.rows(); const Index cols = lu.cols(); const Index size = (std::min)(rows,cols); // For small compile-time matrices it is worth processing the last row separately: // speedup: +100% for 2x2, +10% for others. const Index endk = UnBlockedAtCompileTime ? size-1 : size; nb_transpositions = 0; Index first_zero_pivot = -1; for(Index k = 0; k < endk; ++k) { int rrows = internal::convert_index(rows-k-1); int rcols = internal::convert_index(cols-k-1); Index row_of_biggest_in_col; Score biggest_in_corner = lu.col(k).tail(rows-k).unaryExpr(Scoring()).maxCoeff(&row_of_biggest_in_col); row_of_biggest_in_col += k; row_transpositions[k] = PivIndex(row_of_biggest_in_col); if(biggest_in_corner != Score(0)) { if(k != row_of_biggest_in_col) { lu.row(k).swap(lu.row(row_of_biggest_in_col)); ++nb_transpositions; } lu.col(k).tail(fix(rrows)) /= lu.coeff(k,k); } else if(first_zero_pivot==-1) { // the pivot is exactly zero, we record the index of the first pivot which is exactly 0, // and continue the factorization such we still have A = PLU first_zero_pivot = k; } if(k(rrows),fix(rcols)).noalias() -= lu.col(k).tail(fix(rrows)) * lu.row(k).tail(fix(rcols)); } // special handling of the last entry if(UnBlockedAtCompileTime) { Index k = endk; row_transpositions[k] = PivIndex(k); if (Scoring()(lu(k, k)) == Score(0) && first_zero_pivot == -1) first_zero_pivot = k; } return first_zero_pivot; } /** \internal performs the LU decomposition in-place of the matrix represented * by the variables \a rows, \a cols, \a lu_data, and \a lu_stride using a * recursive, blocked algorithm. * * In addition, this function returns the row transpositions in the * vector \a row_transpositions which must have a size equal to the number * of columns of the matrix \a lu, and an integer \a nb_transpositions * which returns the actual number of transpositions. * * \returns The index of the first pivot which is exactly zero if any, or a negative number otherwise. * * \note This very low level interface using pointers, etc. is to: * 1 - reduce the number of instantiations to the strict minimum * 2 - avoid infinite recursion of the instantiations with Block > > */ static Index blocked_lu(Index rows, Index cols, Scalar* lu_data, Index luStride, PivIndex* row_transpositions, PivIndex& nb_transpositions, Index maxBlockSize=256) { MatrixTypeRef lu = MatrixType::Map(lu_data,rows, cols, OuterStride<>(luStride)); const Index size = (std::min)(rows,cols); // if the matrix is too small, no blocking: if(UnBlockedAtCompileTime || size<=UnBlockedBound) { return unblocked_lu(lu, row_transpositions, nb_transpositions); } // automatically adjust the number of subdivisions to the size // of the matrix so that there is enough sub blocks: Index blockSize; { blockSize = size/8; blockSize = (blockSize/16)*16; blockSize = (std::min)((std::max)(blockSize,Index(8)), maxBlockSize); } nb_transpositions = 0; Index first_zero_pivot = -1; for(Index k = 0; k < size; k+=blockSize) { Index bs = (std::min)(size-k,blockSize); // actual size of the block Index trows = rows - k - bs; // trailing rows Index tsize = size - k - bs; // trailing size // partition the matrix: // A00 | A01 | A02 // lu = A_0 | A_1 | A_2 = A10 | A11 | A12 // A20 | A21 | A22 BlockType A_0 = lu.block(0,0,rows,k); BlockType A_2 = lu.block(0,k+bs,rows,tsize); BlockType A11 = lu.block(k,k,bs,bs); BlockType A12 = lu.block(k,k+bs,bs,tsize); BlockType A21 = lu.block(k+bs,k,trows,bs); BlockType A22 = lu.block(k+bs,k+bs,trows,tsize); PivIndex nb_transpositions_in_panel; // recursively call the blocked LU algorithm on [A11^T A21^T]^T // with a very small blocking size: Index ret = blocked_lu(trows+bs, bs, &lu.coeffRef(k,k), luStride, row_transpositions+k, nb_transpositions_in_panel, 16); if(ret>=0 && first_zero_pivot==-1) first_zero_pivot = k+ret; nb_transpositions += nb_transpositions_in_panel; // update permutations and apply them to A_0 for(Index i=k; i(k)); A_0.row(i).swap(A_0.row(piv)); } if(trows) { // apply permutations to A_2 for(Index i=k;i().solveInPlace(A12); A22.noalias() -= A21 * A12; } } return first_zero_pivot; } }; /** \internal performs the LU decomposition with partial pivoting in-place. */ template void partial_lu_inplace(MatrixType& lu, TranspositionType& row_transpositions, typename TranspositionType::StorageIndex& nb_transpositions) { // Special-case of zero matrix. if (lu.rows() == 0 || lu.cols() == 0) { nb_transpositions = 0; return; } eigen_assert(lu.cols() == row_transpositions.size()); eigen_assert(row_transpositions.size() < 2 || (&row_transpositions.coeffRef(1)-&row_transpositions.coeffRef(0)) == 1); partial_lu_impl < typename MatrixType::Scalar, MatrixType::Flags&RowMajorBit?RowMajor:ColMajor, typename TranspositionType::StorageIndex, EIGEN_SIZE_MIN_PREFER_FIXED(MatrixType::RowsAtCompileTime,MatrixType::ColsAtCompileTime)> ::blocked_lu(lu.rows(), lu.cols(), &lu.coeffRef(0,0), lu.outerStride(), &row_transpositions.coeffRef(0), nb_transpositions); } } // end namespace internal template void PartialPivLU::compute() { check_template_parameters(); // the row permutation is stored as int indices, so just to be sure: eigen_assert(m_lu.rows()::highest()); if(m_lu.cols()>0) m_l1_norm = m_lu.cwiseAbs().colwise().sum().maxCoeff(); else m_l1_norm = RealScalar(0); eigen_assert(m_lu.rows() == m_lu.cols() && "PartialPivLU is only for square (and moreover invertible) matrices"); const Index size = m_lu.rows(); m_rowsTranspositions.resize(size); typename TranspositionType::StorageIndex nb_transpositions; internal::partial_lu_inplace(m_lu, m_rowsTranspositions, nb_transpositions); m_det_p = (nb_transpositions%2) ? -1 : 1; m_p = m_rowsTranspositions; m_isInitialized = true; } template typename PartialPivLU::Scalar PartialPivLU::determinant() const { eigen_assert(m_isInitialized && "PartialPivLU is not initialized."); return Scalar(m_det_p) * m_lu.diagonal().prod(); } /** \returns the matrix represented by the decomposition, * i.e., it returns the product: P^{-1} L U. * This function is provided for debug purpose. */ template MatrixType PartialPivLU::reconstructedMatrix() const { eigen_assert(m_isInitialized && "LU is not initialized."); // LU MatrixType res = m_lu.template triangularView().toDenseMatrix() * m_lu.template triangularView(); // P^{-1}(LU) res = m_p.inverse() * res; return res; } /***** Implementation details *****************************************************/ namespace internal { /***** Implementation of inverse() *****************************************************/ template struct Assignment >, internal::assign_op::Scalar>, Dense2Dense> { typedef PartialPivLU LuType; typedef Inverse SrcXprType; static void run(DstXprType &dst, const SrcXprType &src, const internal::assign_op &) { dst = src.nestedExpression().solve(MatrixType::Identity(src.rows(), src.cols())); } }; } // end namespace internal /******** MatrixBase methods *******/ /** \lu_module * * \return the partial-pivoting LU decomposition of \c *this. * * \sa class PartialPivLU */ template inline const PartialPivLU::PlainObject> MatrixBase::partialPivLu() const { return PartialPivLU(eval()); } /** \lu_module * * Synonym of partialPivLu(). * * \return the partial-pivoting LU decomposition of \c *this. * * \sa class PartialPivLU */ template inline const PartialPivLU::PlainObject> MatrixBase::lu() const { return PartialPivLU(eval()); } } // end namespace Eigen #endif // EIGEN_PARTIALLU_H RcppEigen/inst/include/Eigen/src/LU/InverseImpl.h0000644000176200001440000003655714562417221021316 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2010 Benoit Jacob // Copyright (C) 2014 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_INVERSE_IMPL_H #define EIGEN_INVERSE_IMPL_H namespace Eigen { namespace internal { /********************************** *** General case implementation *** **********************************/ template struct compute_inverse { EIGEN_DEVICE_FUNC static inline void run(const MatrixType& matrix, ResultType& result) { result = matrix.partialPivLu().inverse(); } }; template struct compute_inverse_and_det_with_check { /* nothing! general case not supported. */ }; /**************************** *** Size 1 implementation *** ****************************/ template struct compute_inverse { EIGEN_DEVICE_FUNC static inline void run(const MatrixType& matrix, ResultType& result) { typedef typename MatrixType::Scalar Scalar; internal::evaluator matrixEval(matrix); result.coeffRef(0,0) = Scalar(1) / matrixEval.coeff(0,0); } }; template struct compute_inverse_and_det_with_check { EIGEN_DEVICE_FUNC static inline void run( const MatrixType& matrix, const typename MatrixType::RealScalar& absDeterminantThreshold, ResultType& result, typename ResultType::Scalar& determinant, bool& invertible ) { using std::abs; determinant = matrix.coeff(0,0); invertible = abs(determinant) > absDeterminantThreshold; if(invertible) result.coeffRef(0,0) = typename ResultType::Scalar(1) / determinant; } }; /**************************** *** Size 2 implementation *** ****************************/ template EIGEN_DEVICE_FUNC inline void compute_inverse_size2_helper( const MatrixType& matrix, const typename ResultType::Scalar& invdet, ResultType& result) { typename ResultType::Scalar temp = matrix.coeff(0,0); result.coeffRef(0,0) = matrix.coeff(1,1) * invdet; result.coeffRef(1,0) = -matrix.coeff(1,0) * invdet; result.coeffRef(0,1) = -matrix.coeff(0,1) * invdet; result.coeffRef(1,1) = temp * invdet; } template struct compute_inverse { EIGEN_DEVICE_FUNC static inline void run(const MatrixType& matrix, ResultType& result) { typedef typename ResultType::Scalar Scalar; const Scalar invdet = typename MatrixType::Scalar(1) / matrix.determinant(); compute_inverse_size2_helper(matrix, invdet, result); } }; template struct compute_inverse_and_det_with_check { EIGEN_DEVICE_FUNC static inline void run( const MatrixType& matrix, const typename MatrixType::RealScalar& absDeterminantThreshold, ResultType& inverse, typename ResultType::Scalar& determinant, bool& invertible ) { using std::abs; typedef typename ResultType::Scalar Scalar; determinant = matrix.determinant(); invertible = abs(determinant) > absDeterminantThreshold; if(!invertible) return; const Scalar invdet = Scalar(1) / determinant; compute_inverse_size2_helper(matrix, invdet, inverse); } }; /**************************** *** Size 3 implementation *** ****************************/ template EIGEN_DEVICE_FUNC inline typename MatrixType::Scalar cofactor_3x3(const MatrixType& m) { enum { i1 = (i+1) % 3, i2 = (i+2) % 3, j1 = (j+1) % 3, j2 = (j+2) % 3 }; return m.coeff(i1, j1) * m.coeff(i2, j2) - m.coeff(i1, j2) * m.coeff(i2, j1); } template EIGEN_DEVICE_FUNC inline void compute_inverse_size3_helper( const MatrixType& matrix, const typename ResultType::Scalar& invdet, const Matrix& cofactors_col0, ResultType& result) { // Compute cofactors in a way that avoids aliasing issues. typedef typename ResultType::Scalar Scalar; const Scalar c01 = cofactor_3x3(matrix) * invdet; const Scalar c11 = cofactor_3x3(matrix) * invdet; const Scalar c02 = cofactor_3x3(matrix) * invdet; result.coeffRef(1,2) = cofactor_3x3(matrix) * invdet; result.coeffRef(2,1) = cofactor_3x3(matrix) * invdet; result.coeffRef(2,2) = cofactor_3x3(matrix) * invdet; result.coeffRef(1,0) = c01; result.coeffRef(1,1) = c11; result.coeffRef(2,0) = c02; result.row(0) = cofactors_col0 * invdet; } template struct compute_inverse { EIGEN_DEVICE_FUNC static inline void run(const MatrixType& matrix, ResultType& result) { typedef typename ResultType::Scalar Scalar; Matrix cofactors_col0; cofactors_col0.coeffRef(0) = cofactor_3x3(matrix); cofactors_col0.coeffRef(1) = cofactor_3x3(matrix); cofactors_col0.coeffRef(2) = cofactor_3x3(matrix); const Scalar det = (cofactors_col0.cwiseProduct(matrix.col(0))).sum(); const Scalar invdet = Scalar(1) / det; compute_inverse_size3_helper(matrix, invdet, cofactors_col0, result); } }; template struct compute_inverse_and_det_with_check { EIGEN_DEVICE_FUNC static inline void run( const MatrixType& matrix, const typename MatrixType::RealScalar& absDeterminantThreshold, ResultType& inverse, typename ResultType::Scalar& determinant, bool& invertible ) { typedef typename ResultType::Scalar Scalar; Matrix cofactors_col0; cofactors_col0.coeffRef(0) = cofactor_3x3(matrix); cofactors_col0.coeffRef(1) = cofactor_3x3(matrix); cofactors_col0.coeffRef(2) = cofactor_3x3(matrix); determinant = (cofactors_col0.cwiseProduct(matrix.col(0))).sum(); invertible = Eigen::numext::abs(determinant) > absDeterminantThreshold; if(!invertible) return; const Scalar invdet = Scalar(1) / determinant; compute_inverse_size3_helper(matrix, invdet, cofactors_col0, inverse); } }; /**************************** *** Size 4 implementation *** ****************************/ template EIGEN_DEVICE_FUNC inline const typename Derived::Scalar general_det3_helper (const MatrixBase& matrix, int i1, int i2, int i3, int j1, int j2, int j3) { return matrix.coeff(i1,j1) * (matrix.coeff(i2,j2) * matrix.coeff(i3,j3) - matrix.coeff(i2,j3) * matrix.coeff(i3,j2)); } template EIGEN_DEVICE_FUNC inline typename MatrixType::Scalar cofactor_4x4(const MatrixType& matrix) { enum { i1 = (i+1) % 4, i2 = (i+2) % 4, i3 = (i+3) % 4, j1 = (j+1) % 4, j2 = (j+2) % 4, j3 = (j+3) % 4 }; return general_det3_helper(matrix, i1, i2, i3, j1, j2, j3) + general_det3_helper(matrix, i2, i3, i1, j1, j2, j3) + general_det3_helper(matrix, i3, i1, i2, j1, j2, j3); } template struct compute_inverse_size4 { EIGEN_DEVICE_FUNC static void run(const MatrixType& matrix, ResultType& result) { result.coeffRef(0,0) = cofactor_4x4(matrix); result.coeffRef(1,0) = -cofactor_4x4(matrix); result.coeffRef(2,0) = cofactor_4x4(matrix); result.coeffRef(3,0) = -cofactor_4x4(matrix); result.coeffRef(0,2) = cofactor_4x4(matrix); result.coeffRef(1,2) = -cofactor_4x4(matrix); result.coeffRef(2,2) = cofactor_4x4(matrix); result.coeffRef(3,2) = -cofactor_4x4(matrix); result.coeffRef(0,1) = -cofactor_4x4(matrix); result.coeffRef(1,1) = cofactor_4x4(matrix); result.coeffRef(2,1) = -cofactor_4x4(matrix); result.coeffRef(3,1) = cofactor_4x4(matrix); result.coeffRef(0,3) = -cofactor_4x4(matrix); result.coeffRef(1,3) = cofactor_4x4(matrix); result.coeffRef(2,3) = -cofactor_4x4(matrix); result.coeffRef(3,3) = cofactor_4x4(matrix); result /= (matrix.col(0).cwiseProduct(result.row(0).transpose())).sum(); } }; template struct compute_inverse : compute_inverse_size4 { }; template struct compute_inverse_and_det_with_check { EIGEN_DEVICE_FUNC static inline void run( const MatrixType& matrix, const typename MatrixType::RealScalar& absDeterminantThreshold, ResultType& inverse, typename ResultType::Scalar& determinant, bool& invertible ) { using std::abs; determinant = matrix.determinant(); invertible = abs(determinant) > absDeterminantThreshold; if(invertible && extract_data(matrix) != extract_data(inverse)) { compute_inverse::run(matrix, inverse); } else if(invertible) { MatrixType matrix_t = matrix; compute_inverse::run(matrix_t, inverse); } } }; /************************* *** MatrixBase methods *** *************************/ } // end namespace internal namespace internal { // Specialization for "dense = dense_xpr.inverse()" template struct Assignment, internal::assign_op, Dense2Dense> { typedef Inverse SrcXprType; EIGEN_DEVICE_FUNC static void run(DstXprType &dst, const SrcXprType &src, const internal::assign_op &) { Index dstRows = src.rows(); Index dstCols = src.cols(); if((dst.rows()!=dstRows) || (dst.cols()!=dstCols)) dst.resize(dstRows, dstCols); const int Size = EIGEN_PLAIN_ENUM_MIN(XprType::ColsAtCompileTime,DstXprType::ColsAtCompileTime); EIGEN_ONLY_USED_FOR_DEBUG(Size); eigen_assert(( (Size<=1) || (Size>4) || (extract_data(src.nestedExpression())!=extract_data(dst))) && "Aliasing problem detected in inverse(), you need to do inverse().eval() here."); typedef typename internal::nested_eval::type ActualXprType; typedef typename internal::remove_all::type ActualXprTypeCleanded; ActualXprType actual_xpr(src.nestedExpression()); compute_inverse::run(actual_xpr, dst); } }; } // end namespace internal /** \lu_module * * \returns the matrix inverse of this matrix. * * For small fixed sizes up to 4x4, this method uses cofactors. * In the general case, this method uses class PartialPivLU. * * \note This matrix must be invertible, otherwise the result is undefined. If you need an * invertibility check, do the following: * \li for fixed sizes up to 4x4, use computeInverseAndDetWithCheck(). * \li for the general case, use class FullPivLU. * * Example: \include MatrixBase_inverse.cpp * Output: \verbinclude MatrixBase_inverse.out * * \sa computeInverseAndDetWithCheck() */ template EIGEN_DEVICE_FUNC inline const Inverse MatrixBase::inverse() const { EIGEN_STATIC_ASSERT(!NumTraits::IsInteger,THIS_FUNCTION_IS_NOT_FOR_INTEGER_NUMERIC_TYPES) eigen_assert(rows() == cols()); return Inverse(derived()); } /** \lu_module * * Computation of matrix inverse and determinant, with invertibility check. * * This is only for fixed-size square matrices of size up to 4x4. * * Notice that it will trigger a copy of input matrix when trying to do the inverse in place. * * \param inverse Reference to the matrix in which to store the inverse. * \param determinant Reference to the variable in which to store the determinant. * \param invertible Reference to the bool variable in which to store whether the matrix is invertible. * \param absDeterminantThreshold Optional parameter controlling the invertibility check. * The matrix will be declared invertible if the absolute value of its * determinant is greater than this threshold. * * Example: \include MatrixBase_computeInverseAndDetWithCheck.cpp * Output: \verbinclude MatrixBase_computeInverseAndDetWithCheck.out * * \sa inverse(), computeInverseWithCheck() */ template template inline void MatrixBase::computeInverseAndDetWithCheck( ResultType& inverse, typename ResultType::Scalar& determinant, bool& invertible, const RealScalar& absDeterminantThreshold ) const { // i'd love to put some static assertions there, but SFINAE means that they have no effect... eigen_assert(rows() == cols()); // for 2x2, it's worth giving a chance to avoid evaluating. // for larger sizes, evaluating has negligible cost and limits code size. typedef typename internal::conditional< RowsAtCompileTime == 2, typename internal::remove_all::type>::type, PlainObject >::type MatrixType; internal::compute_inverse_and_det_with_check::run (derived(), absDeterminantThreshold, inverse, determinant, invertible); } /** \lu_module * * Computation of matrix inverse, with invertibility check. * * This is only for fixed-size square matrices of size up to 4x4. * * Notice that it will trigger a copy of input matrix when trying to do the inverse in place. * * \param inverse Reference to the matrix in which to store the inverse. * \param invertible Reference to the bool variable in which to store whether the matrix is invertible. * \param absDeterminantThreshold Optional parameter controlling the invertibility check. * The matrix will be declared invertible if the absolute value of its * determinant is greater than this threshold. * * Example: \include MatrixBase_computeInverseWithCheck.cpp * Output: \verbinclude MatrixBase_computeInverseWithCheck.out * * \sa inverse(), computeInverseAndDetWithCheck() */ template template inline void MatrixBase::computeInverseWithCheck( ResultType& inverse, bool& invertible, const RealScalar& absDeterminantThreshold ) const { Scalar determinant; // i'd love to put some static assertions there, but SFINAE means that they have no effect... eigen_assert(rows() == cols()); computeInverseAndDetWithCheck(inverse,determinant,invertible,absDeterminantThreshold); } } // end namespace Eigen #endif // EIGEN_INVERSE_IMPL_H RcppEigen/inst/include/Eigen/src/LU/Determinant.h0000644000176200001440000000655714562417221021330 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008 Benoit Jacob // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_DETERMINANT_H #define EIGEN_DETERMINANT_H namespace Eigen { namespace internal { template EIGEN_DEVICE_FUNC inline const typename Derived::Scalar bruteforce_det3_helper (const MatrixBase& matrix, int a, int b, int c) { return matrix.coeff(0,a) * (matrix.coeff(1,b) * matrix.coeff(2,c) - matrix.coeff(1,c) * matrix.coeff(2,b)); } template struct determinant_impl { static inline typename traits::Scalar run(const Derived& m) { if(Derived::ColsAtCompileTime==Dynamic && m.rows()==0) return typename traits::Scalar(1); return m.partialPivLu().determinant(); } }; template struct determinant_impl { static inline EIGEN_DEVICE_FUNC typename traits::Scalar run(const Derived& m) { return m.coeff(0,0); } }; template struct determinant_impl { static inline EIGEN_DEVICE_FUNC typename traits::Scalar run(const Derived& m) { return m.coeff(0,0) * m.coeff(1,1) - m.coeff(1,0) * m.coeff(0,1); } }; template struct determinant_impl { static inline EIGEN_DEVICE_FUNC typename traits::Scalar run(const Derived& m) { return bruteforce_det3_helper(m,0,1,2) - bruteforce_det3_helper(m,1,0,2) + bruteforce_det3_helper(m,2,0,1); } }; template struct determinant_impl { typedef typename traits::Scalar Scalar; static EIGEN_DEVICE_FUNC Scalar run(const Derived& m) { Scalar d2_01 = det2(m, 0, 1); Scalar d2_02 = det2(m, 0, 2); Scalar d2_03 = det2(m, 0, 3); Scalar d2_12 = det2(m, 1, 2); Scalar d2_13 = det2(m, 1, 3); Scalar d2_23 = det2(m, 2, 3); Scalar d3_0 = det3(m, 1,d2_23, 2,d2_13, 3,d2_12); Scalar d3_1 = det3(m, 0,d2_23, 2,d2_03, 3,d2_02); Scalar d3_2 = det3(m, 0,d2_13, 1,d2_03, 3,d2_01); Scalar d3_3 = det3(m, 0,d2_12, 1,d2_02, 2,d2_01); return internal::pmadd(-m(0,3),d3_0, m(1,3)*d3_1) + internal::pmadd(-m(2,3),d3_2, m(3,3)*d3_3); } protected: static EIGEN_DEVICE_FUNC Scalar det2(const Derived& m, Index i0, Index i1) { return m(i0,0) * m(i1,1) - m(i1,0) * m(i0,1); } static EIGEN_DEVICE_FUNC Scalar det3(const Derived& m, Index i0, const Scalar& d0, Index i1, const Scalar& d1, Index i2, const Scalar& d2) { return internal::pmadd(m(i0,2), d0, internal::pmadd(-m(i1,2), d1, m(i2,2)*d2)); } }; } // end namespace internal /** \lu_module * * \returns the determinant of this matrix */ template EIGEN_DEVICE_FUNC inline typename internal::traits::Scalar MatrixBase::determinant() const { eigen_assert(rows() == cols()); typedef typename internal::nested_eval::type Nested; return internal::determinant_impl::type>::run(derived()); } } // end namespace Eigen #endif // EIGEN_DETERMINANT_H RcppEigen/inst/include/Eigen/src/LU/PartialPivLU_LAPACKE.h0000644000176200001440000000674314107270226022504 0ustar liggesusers/* Copyright (c) 2011, Intel Corporation. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ******************************************************************************** * Content : Eigen bindings to LAPACKe * LU decomposition with partial pivoting based on LAPACKE_?getrf function. ******************************************************************************** */ #ifndef EIGEN_PARTIALLU_LAPACK_H #define EIGEN_PARTIALLU_LAPACK_H namespace Eigen { namespace internal { /** \internal Specialization for the data types supported by LAPACKe */ #define EIGEN_LAPACKE_LU_PARTPIV(EIGTYPE, LAPACKE_TYPE, LAPACKE_PREFIX) \ template \ struct partial_lu_impl \ { \ /* \internal performs the LU decomposition in-place of the matrix represented */ \ static lapack_int blocked_lu(Index rows, Index cols, EIGTYPE* lu_data, Index luStride, lapack_int* row_transpositions, lapack_int& nb_transpositions, lapack_int maxBlockSize=256) \ { \ EIGEN_UNUSED_VARIABLE(maxBlockSize);\ lapack_int matrix_order, first_zero_pivot; \ lapack_int m, n, lda, *ipiv, info; \ EIGTYPE* a; \ /* Set up parameters for ?getrf */ \ matrix_order = StorageOrder==RowMajor ? LAPACK_ROW_MAJOR : LAPACK_COL_MAJOR; \ lda = convert_index(luStride); \ a = lu_data; \ ipiv = row_transpositions; \ m = convert_index(rows); \ n = convert_index(cols); \ nb_transpositions = 0; \ \ info = LAPACKE_##LAPACKE_PREFIX##getrf( matrix_order, m, n, (LAPACKE_TYPE*)a, lda, ipiv ); \ \ for(int i=0;i= 0); \ /* something should be done with nb_transpositions */ \ \ first_zero_pivot = info; \ return first_zero_pivot; \ } \ }; EIGEN_LAPACKE_LU_PARTPIV(double, double, d) EIGEN_LAPACKE_LU_PARTPIV(float, float, s) EIGEN_LAPACKE_LU_PARTPIV(dcomplex, lapack_complex_double, z) EIGEN_LAPACKE_LU_PARTPIV(scomplex, lapack_complex_float, c) } // end namespace internal } // end namespace Eigen #endif // EIGEN_PARTIALLU_LAPACK_H RcppEigen/inst/include/Eigen/src/LU/FullPivLU.h0000644000176200001440000007717714562417221020706 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2006-2009 Benoit Jacob // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_LU_H #define EIGEN_LU_H namespace Eigen { namespace internal { template struct traits > : traits<_MatrixType> { typedef MatrixXpr XprKind; typedef SolverStorage StorageKind; typedef int StorageIndex; enum { Flags = 0 }; }; } // end namespace internal /** \ingroup LU_Module * * \class FullPivLU * * \brief LU decomposition of a matrix with complete pivoting, and related features * * \tparam _MatrixType the type of the matrix of which we are computing the LU decomposition * * This class represents a LU decomposition of any matrix, with complete pivoting: the matrix A is * decomposed as \f$ A = P^{-1} L U Q^{-1} \f$ where L is unit-lower-triangular, U is * upper-triangular, and P and Q are permutation matrices. This is a rank-revealing LU * decomposition. The eigenvalues (diagonal coefficients) of U are sorted in such a way that any * zeros are at the end. * * This decomposition provides the generic approach to solving systems of linear equations, computing * the rank, invertibility, inverse, kernel, and determinant. * * This LU decomposition is very stable and well tested with large matrices. However there are use cases where the SVD * decomposition is inherently more stable and/or flexible. For example, when computing the kernel of a matrix, * working with the SVD allows to select the smallest singular values of the matrix, something that * the LU decomposition doesn't see. * * The data of the LU decomposition can be directly accessed through the methods matrixLU(), * permutationP(), permutationQ(). * * As an example, here is how the original matrix can be retrieved: * \include class_FullPivLU.cpp * Output: \verbinclude class_FullPivLU.out * * This class supports the \link InplaceDecomposition inplace decomposition \endlink mechanism. * * \sa MatrixBase::fullPivLu(), MatrixBase::determinant(), MatrixBase::inverse() */ template class FullPivLU : public SolverBase > { public: typedef _MatrixType MatrixType; typedef SolverBase Base; friend class SolverBase; EIGEN_GENERIC_PUBLIC_INTERFACE(FullPivLU) enum { MaxRowsAtCompileTime = MatrixType::MaxRowsAtCompileTime, MaxColsAtCompileTime = MatrixType::MaxColsAtCompileTime }; typedef typename internal::plain_row_type::type IntRowVectorType; typedef typename internal::plain_col_type::type IntColVectorType; typedef PermutationMatrix PermutationQType; typedef PermutationMatrix PermutationPType; typedef typename MatrixType::PlainObject PlainObject; /** * \brief Default Constructor. * * The default constructor is useful in cases in which the user intends to * perform decompositions via LU::compute(const MatrixType&). */ FullPivLU(); /** \brief Default Constructor with memory preallocation * * Like the default constructor but with preallocation of the internal data * according to the specified problem \a size. * \sa FullPivLU() */ FullPivLU(Index rows, Index cols); /** Constructor. * * \param matrix the matrix of which to compute the LU decomposition. * It is required to be nonzero. */ template explicit FullPivLU(const EigenBase& matrix); /** \brief Constructs a LU factorization from a given matrix * * This overloaded constructor is provided for \link InplaceDecomposition inplace decomposition \endlink when \c MatrixType is a Eigen::Ref. * * \sa FullPivLU(const EigenBase&) */ template explicit FullPivLU(EigenBase& matrix); /** Computes the LU decomposition of the given matrix. * * \param matrix the matrix of which to compute the LU decomposition. * It is required to be nonzero. * * \returns a reference to *this */ template FullPivLU& compute(const EigenBase& matrix) { m_lu = matrix.derived(); computeInPlace(); return *this; } /** \returns the LU decomposition matrix: the upper-triangular part is U, the * unit-lower-triangular part is L (at least for square matrices; in the non-square * case, special care is needed, see the documentation of class FullPivLU). * * \sa matrixL(), matrixU() */ inline const MatrixType& matrixLU() const { eigen_assert(m_isInitialized && "LU is not initialized."); return m_lu; } /** \returns the number of nonzero pivots in the LU decomposition. * Here nonzero is meant in the exact sense, not in a fuzzy sense. * So that notion isn't really intrinsically interesting, but it is * still useful when implementing algorithms. * * \sa rank() */ inline Index nonzeroPivots() const { eigen_assert(m_isInitialized && "LU is not initialized."); return m_nonzero_pivots; } /** \returns the absolute value of the biggest pivot, i.e. the biggest * diagonal coefficient of U. */ RealScalar maxPivot() const { return m_maxpivot; } /** \returns the permutation matrix P * * \sa permutationQ() */ EIGEN_DEVICE_FUNC inline const PermutationPType& permutationP() const { eigen_assert(m_isInitialized && "LU is not initialized."); return m_p; } /** \returns the permutation matrix Q * * \sa permutationP() */ inline const PermutationQType& permutationQ() const { eigen_assert(m_isInitialized && "LU is not initialized."); return m_q; } /** \returns the kernel of the matrix, also called its null-space. The columns of the returned matrix * will form a basis of the kernel. * * \note If the kernel has dimension zero, then the returned matrix is a column-vector filled with zeros. * * \note This method has to determine which pivots should be considered nonzero. * For that, it uses the threshold value that you can control by calling * setThreshold(const RealScalar&). * * Example: \include FullPivLU_kernel.cpp * Output: \verbinclude FullPivLU_kernel.out * * \sa image() */ inline const internal::kernel_retval kernel() const { eigen_assert(m_isInitialized && "LU is not initialized."); return internal::kernel_retval(*this); } /** \returns the image of the matrix, also called its column-space. The columns of the returned matrix * will form a basis of the image (column-space). * * \param originalMatrix the original matrix, of which *this is the LU decomposition. * The reason why it is needed to pass it here, is that this allows * a large optimization, as otherwise this method would need to reconstruct it * from the LU decomposition. * * \note If the image has dimension zero, then the returned matrix is a column-vector filled with zeros. * * \note This method has to determine which pivots should be considered nonzero. * For that, it uses the threshold value that you can control by calling * setThreshold(const RealScalar&). * * Example: \include FullPivLU_image.cpp * Output: \verbinclude FullPivLU_image.out * * \sa kernel() */ inline const internal::image_retval image(const MatrixType& originalMatrix) const { eigen_assert(m_isInitialized && "LU is not initialized."); return internal::image_retval(*this, originalMatrix); } #ifdef EIGEN_PARSED_BY_DOXYGEN /** \return a solution x to the equation Ax=b, where A is the matrix of which * *this is the LU decomposition. * * \param b the right-hand-side of the equation to solve. Can be a vector or a matrix, * the only requirement in order for the equation to make sense is that * b.rows()==A.rows(), where A is the matrix of which *this is the LU decomposition. * * \returns a solution. * * \note_about_checking_solutions * * \note_about_arbitrary_choice_of_solution * \note_about_using_kernel_to_study_multiple_solutions * * Example: \include FullPivLU_solve.cpp * Output: \verbinclude FullPivLU_solve.out * * \sa TriangularView::solve(), kernel(), inverse() */ template inline const Solve solve(const MatrixBase& b) const; #endif /** \returns an estimate of the reciprocal condition number of the matrix of which \c *this is the LU decomposition. */ inline RealScalar rcond() const { eigen_assert(m_isInitialized && "PartialPivLU is not initialized."); return internal::rcond_estimate_helper(m_l1_norm, *this); } /** \returns the determinant of the matrix of which * *this is the LU decomposition. It has only linear complexity * (that is, O(n) where n is the dimension of the square matrix) * as the LU decomposition has already been computed. * * \note This is only for square matrices. * * \note For fixed-size matrices of size up to 4, MatrixBase::determinant() offers * optimized paths. * * \warning a determinant can be very big or small, so for matrices * of large enough dimension, there is a risk of overflow/underflow. * * \sa MatrixBase::determinant() */ typename internal::traits::Scalar determinant() const; /** Allows to prescribe a threshold to be used by certain methods, such as rank(), * who need to determine when pivots are to be considered nonzero. This is not used for the * LU decomposition itself. * * When it needs to get the threshold value, Eigen calls threshold(). By default, this * uses a formula to automatically determine a reasonable threshold. * Once you have called the present method setThreshold(const RealScalar&), * your value is used instead. * * \param threshold The new value to use as the threshold. * * A pivot will be considered nonzero if its absolute value is strictly greater than * \f$ \vert pivot \vert \leqslant threshold \times \vert maxpivot \vert \f$ * where maxpivot is the biggest pivot. * * If you want to come back to the default behavior, call setThreshold(Default_t) */ FullPivLU& setThreshold(const RealScalar& threshold) { m_usePrescribedThreshold = true; m_prescribedThreshold = threshold; return *this; } /** Allows to come back to the default behavior, letting Eigen use its default formula for * determining the threshold. * * You should pass the special object Eigen::Default as parameter here. * \code lu.setThreshold(Eigen::Default); \endcode * * See the documentation of setThreshold(const RealScalar&). */ FullPivLU& setThreshold(Default_t) { m_usePrescribedThreshold = false; return *this; } /** Returns the threshold that will be used by certain methods such as rank(). * * See the documentation of setThreshold(const RealScalar&). */ RealScalar threshold() const { eigen_assert(m_isInitialized || m_usePrescribedThreshold); return m_usePrescribedThreshold ? m_prescribedThreshold // this formula comes from experimenting (see "LU precision tuning" thread on the list) // and turns out to be identical to Higham's formula used already in LDLt. : NumTraits::epsilon() * RealScalar(m_lu.diagonalSize()); } /** \returns the rank of the matrix of which *this is the LU decomposition. * * \note This method has to determine which pivots should be considered nonzero. * For that, it uses the threshold value that you can control by calling * setThreshold(const RealScalar&). */ inline Index rank() const { using std::abs; eigen_assert(m_isInitialized && "LU is not initialized."); RealScalar premultiplied_threshold = abs(m_maxpivot) * threshold(); Index result = 0; for(Index i = 0; i < m_nonzero_pivots; ++i) result += (abs(m_lu.coeff(i,i)) > premultiplied_threshold); return result; } /** \returns the dimension of the kernel of the matrix of which *this is the LU decomposition. * * \note This method has to determine which pivots should be considered nonzero. * For that, it uses the threshold value that you can control by calling * setThreshold(const RealScalar&). */ inline Index dimensionOfKernel() const { eigen_assert(m_isInitialized && "LU is not initialized."); return cols() - rank(); } /** \returns true if the matrix of which *this is the LU decomposition represents an injective * linear map, i.e. has trivial kernel; false otherwise. * * \note This method has to determine which pivots should be considered nonzero. * For that, it uses the threshold value that you can control by calling * setThreshold(const RealScalar&). */ inline bool isInjective() const { eigen_assert(m_isInitialized && "LU is not initialized."); return rank() == cols(); } /** \returns true if the matrix of which *this is the LU decomposition represents a surjective * linear map; false otherwise. * * \note This method has to determine which pivots should be considered nonzero. * For that, it uses the threshold value that you can control by calling * setThreshold(const RealScalar&). */ inline bool isSurjective() const { eigen_assert(m_isInitialized && "LU is not initialized."); return rank() == rows(); } /** \returns true if the matrix of which *this is the LU decomposition is invertible. * * \note This method has to determine which pivots should be considered nonzero. * For that, it uses the threshold value that you can control by calling * setThreshold(const RealScalar&). */ inline bool isInvertible() const { eigen_assert(m_isInitialized && "LU is not initialized."); return isInjective() && (m_lu.rows() == m_lu.cols()); } /** \returns the inverse of the matrix of which *this is the LU decomposition. * * \note If this matrix is not invertible, the returned matrix has undefined coefficients. * Use isInvertible() to first determine whether this matrix is invertible. * * \sa MatrixBase::inverse() */ inline const Inverse inverse() const { eigen_assert(m_isInitialized && "LU is not initialized."); eigen_assert(m_lu.rows() == m_lu.cols() && "You can't take the inverse of a non-square matrix!"); return Inverse(*this); } MatrixType reconstructedMatrix() const; EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index rows() const EIGEN_NOEXCEPT { return m_lu.rows(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index cols() const EIGEN_NOEXCEPT { return m_lu.cols(); } #ifndef EIGEN_PARSED_BY_DOXYGEN template void _solve_impl(const RhsType &rhs, DstType &dst) const; template void _solve_impl_transposed(const RhsType &rhs, DstType &dst) const; #endif protected: static void check_template_parameters() { EIGEN_STATIC_ASSERT_NON_INTEGER(Scalar); } void computeInPlace(); MatrixType m_lu; PermutationPType m_p; PermutationQType m_q; IntColVectorType m_rowsTranspositions; IntRowVectorType m_colsTranspositions; Index m_nonzero_pivots; RealScalar m_l1_norm; RealScalar m_maxpivot, m_prescribedThreshold; signed char m_det_pq; bool m_isInitialized, m_usePrescribedThreshold; }; template FullPivLU::FullPivLU() : m_isInitialized(false), m_usePrescribedThreshold(false) { } template FullPivLU::FullPivLU(Index rows, Index cols) : m_lu(rows, cols), m_p(rows), m_q(cols), m_rowsTranspositions(rows), m_colsTranspositions(cols), m_isInitialized(false), m_usePrescribedThreshold(false) { } template template FullPivLU::FullPivLU(const EigenBase& matrix) : m_lu(matrix.rows(), matrix.cols()), m_p(matrix.rows()), m_q(matrix.cols()), m_rowsTranspositions(matrix.rows()), m_colsTranspositions(matrix.cols()), m_isInitialized(false), m_usePrescribedThreshold(false) { compute(matrix.derived()); } template template FullPivLU::FullPivLU(EigenBase& matrix) : m_lu(matrix.derived()), m_p(matrix.rows()), m_q(matrix.cols()), m_rowsTranspositions(matrix.rows()), m_colsTranspositions(matrix.cols()), m_isInitialized(false), m_usePrescribedThreshold(false) { computeInPlace(); } template void FullPivLU::computeInPlace() { check_template_parameters(); // the permutations are stored as int indices, so just to be sure: eigen_assert(m_lu.rows()<=NumTraits::highest() && m_lu.cols()<=NumTraits::highest()); m_l1_norm = m_lu.cwiseAbs().colwise().sum().maxCoeff(); const Index size = m_lu.diagonalSize(); const Index rows = m_lu.rows(); const Index cols = m_lu.cols(); // will store the transpositions, before we accumulate them at the end. // can't accumulate on-the-fly because that will be done in reverse order for the rows. m_rowsTranspositions.resize(m_lu.rows()); m_colsTranspositions.resize(m_lu.cols()); Index number_of_transpositions = 0; // number of NONTRIVIAL transpositions, i.e. m_rowsTranspositions[i]!=i m_nonzero_pivots = size; // the generic case is that in which all pivots are nonzero (invertible case) m_maxpivot = RealScalar(0); for(Index k = 0; k < size; ++k) { // First, we need to find the pivot. // biggest coefficient in the remaining bottom-right corner (starting at row k, col k) Index row_of_biggest_in_corner, col_of_biggest_in_corner; typedef internal::scalar_score_coeff_op Scoring; typedef typename Scoring::result_type Score; Score biggest_in_corner; biggest_in_corner = m_lu.bottomRightCorner(rows-k, cols-k) .unaryExpr(Scoring()) .maxCoeff(&row_of_biggest_in_corner, &col_of_biggest_in_corner); row_of_biggest_in_corner += k; // correct the values! since they were computed in the corner, col_of_biggest_in_corner += k; // need to add k to them. if(biggest_in_corner==Score(0)) { // before exiting, make sure to initialize the still uninitialized transpositions // in a sane state without destroying what we already have. m_nonzero_pivots = k; for(Index i = k; i < size; ++i) { m_rowsTranspositions.coeffRef(i) = internal::convert_index(i); m_colsTranspositions.coeffRef(i) = internal::convert_index(i); } break; } RealScalar abs_pivot = internal::abs_knowing_score()(m_lu(row_of_biggest_in_corner, col_of_biggest_in_corner), biggest_in_corner); if(abs_pivot > m_maxpivot) m_maxpivot = abs_pivot; // Now that we've found the pivot, we need to apply the row/col swaps to // bring it to the location (k,k). m_rowsTranspositions.coeffRef(k) = internal::convert_index(row_of_biggest_in_corner); m_colsTranspositions.coeffRef(k) = internal::convert_index(col_of_biggest_in_corner); if(k != row_of_biggest_in_corner) { m_lu.row(k).swap(m_lu.row(row_of_biggest_in_corner)); ++number_of_transpositions; } if(k != col_of_biggest_in_corner) { m_lu.col(k).swap(m_lu.col(col_of_biggest_in_corner)); ++number_of_transpositions; } // Now that the pivot is at the right location, we update the remaining // bottom-right corner by Gaussian elimination. if(k= 0; --k) m_p.applyTranspositionOnTheRight(k, m_rowsTranspositions.coeff(k)); m_q.setIdentity(cols); for(Index k = 0; k < size; ++k) m_q.applyTranspositionOnTheRight(k, m_colsTranspositions.coeff(k)); m_det_pq = (number_of_transpositions%2) ? -1 : 1; m_isInitialized = true; } template typename internal::traits::Scalar FullPivLU::determinant() const { eigen_assert(m_isInitialized && "LU is not initialized."); eigen_assert(m_lu.rows() == m_lu.cols() && "You can't take the determinant of a non-square matrix!"); return Scalar(m_det_pq) * Scalar(m_lu.diagonal().prod()); } /** \returns the matrix represented by the decomposition, * i.e., it returns the product: \f$ P^{-1} L U Q^{-1} \f$. * This function is provided for debug purposes. */ template MatrixType FullPivLU::reconstructedMatrix() const { eigen_assert(m_isInitialized && "LU is not initialized."); const Index smalldim = (std::min)(m_lu.rows(), m_lu.cols()); // LU MatrixType res(m_lu.rows(),m_lu.cols()); // FIXME the .toDenseMatrix() should not be needed... res = m_lu.leftCols(smalldim) .template triangularView().toDenseMatrix() * m_lu.topRows(smalldim) .template triangularView().toDenseMatrix(); // P^{-1}(LU) res = m_p.inverse() * res; // (P^{-1}LU)Q^{-1} res = res * m_q.inverse(); return res; } /********* Implementation of kernel() **************************************************/ namespace internal { template struct kernel_retval > : kernel_retval_base > { EIGEN_MAKE_KERNEL_HELPERS(FullPivLU<_MatrixType>) enum { MaxSmallDimAtCompileTime = EIGEN_SIZE_MIN_PREFER_FIXED( MatrixType::MaxColsAtCompileTime, MatrixType::MaxRowsAtCompileTime) }; template void evalTo(Dest& dst) const { using std::abs; const Index cols = dec().matrixLU().cols(), dimker = cols - rank(); if(dimker == 0) { // The Kernel is just {0}, so it doesn't have a basis properly speaking, but let's // avoid crashing/asserting as that depends on floating point calculations. Let's // just return a single column vector filled with zeros. dst.setZero(); return; } /* Let us use the following lemma: * * Lemma: If the matrix A has the LU decomposition PAQ = LU, * then Ker A = Q(Ker U). * * Proof: trivial: just keep in mind that P, Q, L are invertible. */ /* Thus, all we need to do is to compute Ker U, and then apply Q. * * U is upper triangular, with eigenvalues sorted so that any zeros appear at the end. * Thus, the diagonal of U ends with exactly * dimKer zero's. Let us use that to construct dimKer linearly * independent vectors in Ker U. */ Matrix pivots(rank()); RealScalar premultiplied_threshold = dec().maxPivot() * dec().threshold(); Index p = 0; for(Index i = 0; i < dec().nonzeroPivots(); ++i) if(abs(dec().matrixLU().coeff(i,i)) > premultiplied_threshold) pivots.coeffRef(p++) = i; eigen_internal_assert(p == rank()); // we construct a temporaty trapezoid matrix m, by taking the U matrix and // permuting the rows and cols to bring the nonnegligible pivots to the top of // the main diagonal. We need that to be able to apply our triangular solvers. // FIXME when we get triangularView-for-rectangular-matrices, this can be simplified Matrix m(dec().matrixLU().block(0, 0, rank(), cols)); for(Index i = 0; i < rank(); ++i) { if(i) m.row(i).head(i).setZero(); m.row(i).tail(cols-i) = dec().matrixLU().row(pivots.coeff(i)).tail(cols-i); } m.block(0, 0, rank(), rank()); m.block(0, 0, rank(), rank()).template triangularView().setZero(); for(Index i = 0; i < rank(); ++i) m.col(i).swap(m.col(pivots.coeff(i))); // ok, we have our trapezoid matrix, we can apply the triangular solver. // notice that the math behind this suggests that we should apply this to the // negative of the RHS, but for performance we just put the negative sign elsewhere, see below. m.topLeftCorner(rank(), rank()) .template triangularView().solveInPlace( m.topRightCorner(rank(), dimker) ); // now we must undo the column permutation that we had applied! for(Index i = rank()-1; i >= 0; --i) m.col(i).swap(m.col(pivots.coeff(i))); // see the negative sign in the next line, that's what we were talking about above. for(Index i = 0; i < rank(); ++i) dst.row(dec().permutationQ().indices().coeff(i)) = -m.row(i).tail(dimker); for(Index i = rank(); i < cols; ++i) dst.row(dec().permutationQ().indices().coeff(i)).setZero(); for(Index k = 0; k < dimker; ++k) dst.coeffRef(dec().permutationQ().indices().coeff(rank()+k), k) = Scalar(1); } }; /***** Implementation of image() *****************************************************/ template struct image_retval > : image_retval_base > { EIGEN_MAKE_IMAGE_HELPERS(FullPivLU<_MatrixType>) enum { MaxSmallDimAtCompileTime = EIGEN_SIZE_MIN_PREFER_FIXED( MatrixType::MaxColsAtCompileTime, MatrixType::MaxRowsAtCompileTime) }; template void evalTo(Dest& dst) const { using std::abs; if(rank() == 0) { // The Image is just {0}, so it doesn't have a basis properly speaking, but let's // avoid crashing/asserting as that depends on floating point calculations. Let's // just return a single column vector filled with zeros. dst.setZero(); return; } Matrix pivots(rank()); RealScalar premultiplied_threshold = dec().maxPivot() * dec().threshold(); Index p = 0; for(Index i = 0; i < dec().nonzeroPivots(); ++i) if(abs(dec().matrixLU().coeff(i,i)) > premultiplied_threshold) pivots.coeffRef(p++) = i; eigen_internal_assert(p == rank()); for(Index i = 0; i < rank(); ++i) dst.col(i) = originalMatrix().col(dec().permutationQ().indices().coeff(pivots.coeff(i))); } }; /***** Implementation of solve() *****************************************************/ } // end namespace internal #ifndef EIGEN_PARSED_BY_DOXYGEN template template void FullPivLU<_MatrixType>::_solve_impl(const RhsType &rhs, DstType &dst) const { /* The decomposition PAQ = LU can be rewritten as A = P^{-1} L U Q^{-1}. * So we proceed as follows: * Step 1: compute c = P * rhs. * Step 2: replace c by the solution x to Lx = c. Exists because L is invertible. * Step 3: replace c by the solution x to Ux = c. May or may not exist. * Step 4: result = Q * c; */ const Index rows = this->rows(), cols = this->cols(), nonzero_pivots = this->rank(); const Index smalldim = (std::min)(rows, cols); if(nonzero_pivots == 0) { dst.setZero(); return; } typename RhsType::PlainObject c(rhs.rows(), rhs.cols()); // Step 1 c = permutationP() * rhs; // Step 2 m_lu.topLeftCorner(smalldim,smalldim) .template triangularView() .solveInPlace(c.topRows(smalldim)); if(rows>cols) c.bottomRows(rows-cols) -= m_lu.bottomRows(rows-cols) * c.topRows(cols); // Step 3 m_lu.topLeftCorner(nonzero_pivots, nonzero_pivots) .template triangularView() .solveInPlace(c.topRows(nonzero_pivots)); // Step 4 for(Index i = 0; i < nonzero_pivots; ++i) dst.row(permutationQ().indices().coeff(i)) = c.row(i); for(Index i = nonzero_pivots; i < m_lu.cols(); ++i) dst.row(permutationQ().indices().coeff(i)).setZero(); } template template void FullPivLU<_MatrixType>::_solve_impl_transposed(const RhsType &rhs, DstType &dst) const { /* The decomposition PAQ = LU can be rewritten as A = P^{-1} L U Q^{-1}, * and since permutations are real and unitary, we can write this * as A^T = Q U^T L^T P, * So we proceed as follows: * Step 1: compute c = Q^T rhs. * Step 2: replace c by the solution x to U^T x = c. May or may not exist. * Step 3: replace c by the solution x to L^T x = c. * Step 4: result = P^T c. * If Conjugate is true, replace "^T" by "^*" above. */ const Index rows = this->rows(), cols = this->cols(), nonzero_pivots = this->rank(); const Index smalldim = (std::min)(rows, cols); if(nonzero_pivots == 0) { dst.setZero(); return; } typename RhsType::PlainObject c(rhs.rows(), rhs.cols()); // Step 1 c = permutationQ().inverse() * rhs; // Step 2 m_lu.topLeftCorner(nonzero_pivots, nonzero_pivots) .template triangularView() .transpose() .template conjugateIf() .solveInPlace(c.topRows(nonzero_pivots)); // Step 3 m_lu.topLeftCorner(smalldim, smalldim) .template triangularView() .transpose() .template conjugateIf() .solveInPlace(c.topRows(smalldim)); // Step 4 PermutationPType invp = permutationP().inverse().eval(); for(Index i = 0; i < smalldim; ++i) dst.row(invp.indices().coeff(i)) = c.row(i); for(Index i = smalldim; i < rows; ++i) dst.row(invp.indices().coeff(i)).setZero(); } #endif namespace internal { /***** Implementation of inverse() *****************************************************/ template struct Assignment >, internal::assign_op::Scalar>, Dense2Dense> { typedef FullPivLU LuType; typedef Inverse SrcXprType; static void run(DstXprType &dst, const SrcXprType &src, const internal::assign_op &) { dst = src.nestedExpression().solve(MatrixType::Identity(src.rows(), src.cols())); } }; } // end namespace internal /******* MatrixBase methods *****************************************************************/ /** \lu_module * * \return the full-pivoting LU decomposition of \c *this. * * \sa class FullPivLU */ template inline const FullPivLU::PlainObject> MatrixBase::fullPivLu() const { return FullPivLU(eval()); } } // end namespace Eigen #endif // EIGEN_LU_H RcppEigen/inst/include/Eigen/src/IterativeLinearSolvers/0000755000176200001440000000000014562417221023015 5ustar liggesusersRcppEigen/inst/include/Eigen/src/IterativeLinearSolvers/IncompleteCholesky.h0000644000176200001440000003527414562417221027002 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2012 Désiré Nuentsa-Wakam // Copyright (C) 2015 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_INCOMPLETE_CHOlESKY_H #define EIGEN_INCOMPLETE_CHOlESKY_H #include #include namespace Eigen { /** * \brief Modified Incomplete Cholesky with dual threshold * * References : C-J. Lin and J. J. Moré, Incomplete Cholesky Factorizations with * Limited memory, SIAM J. Sci. Comput. 21(1), pp. 24-45, 1999 * * \tparam Scalar the scalar type of the input matrices * \tparam _UpLo The triangular part that will be used for the computations. It can be Lower * or Upper. Default is Lower. * \tparam _OrderingType The ordering method to use, either AMDOrdering<> or NaturalOrdering<>. Default is AMDOrdering, * unless EIGEN_MPL2_ONLY is defined, in which case the default is NaturalOrdering. * * \implsparsesolverconcept * * It performs the following incomplete factorization: \f$ S P A P' S \approx L L' \f$ * where L is a lower triangular factor, S is a diagonal scaling matrix, and P is a * fill-in reducing permutation as computed by the ordering method. * * \b Shifting \b strategy: Let \f$ B = S P A P' S \f$ be the scaled matrix on which the factorization is carried out, * and \f$ \beta \f$ be the minimum value of the diagonal. If \f$ \beta > 0 \f$ then, the factorization is directly performed * on the matrix B. Otherwise, the factorization is performed on the shifted matrix \f$ B + (\sigma+|\beta| I \f$ where * \f$ \sigma \f$ is the initial shift value as returned and set by setInitialShift() method. The default value is \f$ \sigma = 10^{-3} \f$. * If the factorization fails, then the shift in doubled until it succeed or a maximum of ten attempts. If it still fails, as returned by * the info() method, then you can either increase the initial shift, or better use another preconditioning technique. * */ template > class IncompleteCholesky : public SparseSolverBase > { protected: typedef SparseSolverBase > Base; using Base::m_isInitialized; public: typedef typename NumTraits::Real RealScalar; typedef _OrderingType OrderingType; typedef typename OrderingType::PermutationType PermutationType; typedef typename PermutationType::StorageIndex StorageIndex; typedef SparseMatrix FactorType; typedef Matrix VectorSx; typedef Matrix VectorRx; typedef Matrix VectorIx; typedef std::vector > VectorList; enum { UpLo = _UpLo }; enum { ColsAtCompileTime = Dynamic, MaxColsAtCompileTime = Dynamic }; public: /** Default constructor leaving the object in a partly non-initialized stage. * * You must call compute() or the pair analyzePattern()/factorize() to make it valid. * * \sa IncompleteCholesky(const MatrixType&) */ IncompleteCholesky() : m_initialShift(1e-3),m_analysisIsOk(false),m_factorizationIsOk(false) {} /** Constructor computing the incomplete factorization for the given matrix \a matrix. */ template IncompleteCholesky(const MatrixType& matrix) : m_initialShift(1e-3),m_analysisIsOk(false),m_factorizationIsOk(false) { compute(matrix); } /** \returns number of rows of the factored matrix */ EIGEN_CONSTEXPR Index rows() const EIGEN_NOEXCEPT { return m_L.rows(); } /** \returns number of columns of the factored matrix */ EIGEN_CONSTEXPR Index cols() const EIGEN_NOEXCEPT { return m_L.cols(); } /** \brief Reports whether previous computation was successful. * * It triggers an assertion if \c *this has not been initialized through the respective constructor, * or a call to compute() or analyzePattern(). * * \returns \c Success if computation was successful, * \c NumericalIssue if the matrix appears to be negative. */ ComputationInfo info() const { eigen_assert(m_isInitialized && "IncompleteCholesky is not initialized."); return m_info; } /** \brief Set the initial shift parameter \f$ \sigma \f$. */ void setInitialShift(RealScalar shift) { m_initialShift = shift; } /** \brief Computes the fill reducing permutation vector using the sparsity pattern of \a mat */ template void analyzePattern(const MatrixType& mat) { OrderingType ord; PermutationType pinv; ord(mat.template selfadjointView(), pinv); if(pinv.size()>0) m_perm = pinv.inverse(); else m_perm.resize(0); m_L.resize(mat.rows(), mat.cols()); m_analysisIsOk = true; m_isInitialized = true; m_info = Success; } /** \brief Performs the numerical factorization of the input matrix \a mat * * The method analyzePattern() or compute() must have been called beforehand * with a matrix having the same pattern. * * \sa compute(), analyzePattern() */ template void factorize(const MatrixType& mat); /** Computes or re-computes the incomplete Cholesky factorization of the input matrix \a mat * * It is a shortcut for a sequential call to the analyzePattern() and factorize() methods. * * \sa analyzePattern(), factorize() */ template void compute(const MatrixType& mat) { analyzePattern(mat); factorize(mat); } // internal template void _solve_impl(const Rhs& b, Dest& x) const { eigen_assert(m_factorizationIsOk && "factorize() should be called first"); if (m_perm.rows() == b.rows()) x = m_perm * b; else x = b; x = m_scale.asDiagonal() * x; x = m_L.template triangularView().solve(x); x = m_L.adjoint().template triangularView().solve(x); x = m_scale.asDiagonal() * x; if (m_perm.rows() == b.rows()) x = m_perm.inverse() * x; } /** \returns the sparse lower triangular factor L */ const FactorType& matrixL() const { eigen_assert("m_factorizationIsOk"); return m_L; } /** \returns a vector representing the scaling factor S */ const VectorRx& scalingS() const { eigen_assert("m_factorizationIsOk"); return m_scale; } /** \returns the fill-in reducing permutation P (can be empty for a natural ordering) */ const PermutationType& permutationP() const { eigen_assert("m_analysisIsOk"); return m_perm; } protected: FactorType m_L; // The lower part stored in CSC VectorRx m_scale; // The vector for scaling the matrix RealScalar m_initialShift; // The initial shift parameter bool m_analysisIsOk; bool m_factorizationIsOk; ComputationInfo m_info; PermutationType m_perm; private: inline void updateList(Ref colPtr, Ref rowIdx, Ref vals, const Index& col, const Index& jk, VectorIx& firstElt, VectorList& listCol); }; // Based on the following paper: // C-J. Lin and J. J. Moré, Incomplete Cholesky Factorizations with // Limited memory, SIAM J. Sci. Comput. 21(1), pp. 24-45, 1999 // http://ftp.mcs.anl.gov/pub/tech_reports/reports/P682.pdf template template void IncompleteCholesky::factorize(const _MatrixType& mat) { using std::sqrt; eigen_assert(m_analysisIsOk && "analyzePattern() should be called first"); // Dropping strategy : Keep only the p largest elements per column, where p is the number of elements in the column of the original matrix. Other strategies will be added // Apply the fill-reducing permutation computed in analyzePattern() if (m_perm.rows() == mat.rows() ) // To detect the null permutation { // The temporary is needed to make sure that the diagonal entry is properly sorted FactorType tmp(mat.rows(), mat.cols()); tmp = mat.template selfadjointView<_UpLo>().twistedBy(m_perm); m_L.template selfadjointView() = tmp.template selfadjointView(); } else { m_L.template selfadjointView() = mat.template selfadjointView<_UpLo>(); } Index n = m_L.cols(); Index nnz = m_L.nonZeros(); Map vals(m_L.valuePtr(), nnz); //values Map rowIdx(m_L.innerIndexPtr(), nnz); //Row indices Map colPtr( m_L.outerIndexPtr(), n+1); // Pointer to the beginning of each row VectorIx firstElt(n-1); // for each j, points to the next entry in vals that will be used in the factorization VectorList listCol(n); // listCol(j) is a linked list of columns to update column j VectorSx col_vals(n); // Store a nonzero values in each column VectorIx col_irow(n); // Row indices of nonzero elements in each column VectorIx col_pattern(n); col_pattern.fill(-1); StorageIndex col_nnz; // Computes the scaling factors m_scale.resize(n); m_scale.setZero(); for (Index j = 0; j < n; j++) for (Index k = colPtr[j]; k < colPtr[j+1]; k++) { m_scale(j) += numext::abs2(vals(k)); if(rowIdx[k]!=j) m_scale(rowIdx[k]) += numext::abs2(vals(k)); } m_scale = m_scale.cwiseSqrt().cwiseSqrt(); for (Index j = 0; j < n; ++j) if(m_scale(j)>(std::numeric_limits::min)()) m_scale(j) = RealScalar(1)/m_scale(j); else m_scale(j) = 1; // TODO disable scaling if not needed, i.e., if it is roughly uniform? (this will make solve() faster) // Scale and compute the shift for the matrix RealScalar mindiag = NumTraits::highest(); for (Index j = 0; j < n; j++) { for (Index k = colPtr[j]; k < colPtr[j+1]; k++) vals[k] *= (m_scale(j)*m_scale(rowIdx[k])); eigen_internal_assert(rowIdx[colPtr[j]]==j && "IncompleteCholesky: only the lower triangular part must be stored"); mindiag = numext::mini(numext::real(vals[colPtr[j]]), mindiag); } FactorType L_save = m_L; RealScalar shift = 0; if(mindiag <= RealScalar(0.)) shift = m_initialShift - mindiag; m_info = NumericalIssue; // Try to perform the incomplete factorization using the current shift int iter = 0; do { // Apply the shift to the diagonal elements of the matrix for (Index j = 0; j < n; j++) vals[colPtr[j]] += shift; // jki version of the Cholesky factorization Index j=0; for (; j < n; ++j) { // Left-looking factorization of the j-th column // First, load the j-th column into col_vals Scalar diag = vals[colPtr[j]]; // It is assumed that only the lower part is stored col_nnz = 0; for (Index i = colPtr[j] + 1; i < colPtr[j+1]; i++) { StorageIndex l = rowIdx[i]; col_vals(col_nnz) = vals[i]; col_irow(col_nnz) = l; col_pattern(l) = col_nnz; col_nnz++; } { typename std::list::iterator k; // Browse all previous columns that will update column j for(k = listCol[j].begin(); k != listCol[j].end(); k++) { Index jk = firstElt(*k); // First element to use in the column eigen_internal_assert(rowIdx[jk]==j); Scalar v_j_jk = numext::conj(vals[jk]); jk += 1; for (Index i = jk; i < colPtr[*k+1]; i++) { StorageIndex l = rowIdx[i]; if(col_pattern[l]<0) { col_vals(col_nnz) = vals[i] * v_j_jk; col_irow[col_nnz] = l; col_pattern(l) = col_nnz; col_nnz++; } else col_vals(col_pattern[l]) -= vals[i] * v_j_jk; } updateList(colPtr,rowIdx,vals, *k, jk, firstElt, listCol); } } // Scale the current column if(numext::real(diag) <= 0) { if(++iter>=10) return; // increase shift shift = numext::maxi(m_initialShift,RealScalar(2)*shift); // restore m_L, col_pattern, and listCol vals = Map(L_save.valuePtr(), nnz); rowIdx = Map(L_save.innerIndexPtr(), nnz); colPtr = Map(L_save.outerIndexPtr(), n+1); col_pattern.fill(-1); for(Index i=0; i cvals = col_vals.head(col_nnz); Ref cirow = col_irow.head(col_nnz); internal::QuickSplit(cvals,cirow, p); // Insert the largest p elements in the matrix Index cpt = 0; for (Index i = colPtr[j]+1; i < colPtr[j+1]; i++) { vals[i] = col_vals(cpt); rowIdx[i] = col_irow(cpt); // restore col_pattern: col_pattern(col_irow(cpt)) = -1; cpt++; } // Get the first smallest row index and put it after the diagonal element Index jk = colPtr(j)+1; updateList(colPtr,rowIdx,vals,j,jk,firstElt,listCol); } if(j==n) { m_factorizationIsOk = true; m_info = Success; } } while(m_info!=Success); } template inline void IncompleteCholesky::updateList(Ref colPtr, Ref rowIdx, Ref vals, const Index& col, const Index& jk, VectorIx& firstElt, VectorList& listCol) { if (jk < colPtr(col+1) ) { Index p = colPtr(col+1) - jk; Index minpos; rowIdx.segment(jk,p).minCoeff(&minpos); minpos += jk; if (rowIdx(minpos) != rowIdx(jk)) { //Swap std::swap(rowIdx(jk),rowIdx(minpos)); std::swap(vals(jk),vals(minpos)); } firstElt(col) = internal::convert_index(jk); listCol[rowIdx(jk)].push_back(internal::convert_index(col)); } } } // end namespace Eigen #endif RcppEigen/inst/include/Eigen/src/IterativeLinearSolvers/SolveWithGuess.h0000644000176200001440000001016414562417221026123 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SOLVEWITHGUESS_H #define EIGEN_SOLVEWITHGUESS_H namespace Eigen { template class SolveWithGuess; /** \class SolveWithGuess * \ingroup IterativeLinearSolvers_Module * * \brief Pseudo expression representing a solving operation * * \tparam Decomposition the type of the matrix or decomposion object * \tparam Rhstype the type of the right-hand side * * This class represents an expression of A.solve(B) * and most of the time this is the only way it is used. * */ namespace internal { template struct traits > : traits > {}; } template class SolveWithGuess : public internal::generic_xpr_base, MatrixXpr, typename internal::traits::StorageKind>::type { public: typedef typename internal::traits::Scalar Scalar; typedef typename internal::traits::PlainObject PlainObject; typedef typename internal::generic_xpr_base, MatrixXpr, typename internal::traits::StorageKind>::type Base; typedef typename internal::ref_selector::type Nested; SolveWithGuess(const Decomposition &dec, const RhsType &rhs, const GuessType &guess) : m_dec(dec), m_rhs(rhs), m_guess(guess) {} EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index rows() const EIGEN_NOEXCEPT { return m_dec.cols(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index cols() const EIGEN_NOEXCEPT { return m_rhs.cols(); } EIGEN_DEVICE_FUNC const Decomposition& dec() const { return m_dec; } EIGEN_DEVICE_FUNC const RhsType& rhs() const { return m_rhs; } EIGEN_DEVICE_FUNC const GuessType& guess() const { return m_guess; } protected: const Decomposition &m_dec; const RhsType &m_rhs; const GuessType &m_guess; private: Scalar coeff(Index row, Index col) const; Scalar coeff(Index i) const; }; namespace internal { // Evaluator of SolveWithGuess -> eval into a temporary template struct evaluator > : public evaluator::PlainObject> { typedef SolveWithGuess SolveType; typedef typename SolveType::PlainObject PlainObject; typedef evaluator Base; evaluator(const SolveType& solve) : m_result(solve.rows(), solve.cols()) { ::new (static_cast(this)) Base(m_result); m_result = solve.guess(); solve.dec()._solve_with_guess_impl(solve.rhs(), m_result); } protected: PlainObject m_result; }; // Specialization for "dst = dec.solveWithGuess(rhs)" // NOTE we need to specialize it for Dense2Dense to avoid ambiguous specialization error and a Sparse2Sparse specialization must exist somewhere template struct Assignment, internal::assign_op, Dense2Dense> { typedef SolveWithGuess SrcXprType; static void run(DstXprType &dst, const SrcXprType &src, const internal::assign_op &) { Index dstRows = src.rows(); Index dstCols = src.cols(); if((dst.rows()!=dstRows) || (dst.cols()!=dstCols)) dst.resize(dstRows, dstCols); dst = src.guess(); src.dec()._solve_with_guess_impl(src.rhs(), dst/*, src.guess()*/); } }; } // end namespace internal } // end namespace Eigen #endif // EIGEN_SOLVEWITHGUESS_H RcppEigen/inst/include/Eigen/src/IterativeLinearSolvers/BasicPreconditioners.h0000644000176200001440000001516314562417221027305 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2011-2014 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_BASIC_PRECONDITIONERS_H #define EIGEN_BASIC_PRECONDITIONERS_H namespace Eigen { /** \ingroup IterativeLinearSolvers_Module * \brief A preconditioner based on the digonal entries * * This class allows to approximately solve for A.x = b problems assuming A is a diagonal matrix. * In other words, this preconditioner neglects all off diagonal entries and, in Eigen's language, solves for: \code A.diagonal().asDiagonal() . x = b \endcode * * \tparam _Scalar the type of the scalar. * * \implsparsesolverconcept * * This preconditioner is suitable for both selfadjoint and general problems. * The diagonal entries are pre-inverted and stored into a dense vector. * * \note A variant that has yet to be implemented would attempt to preserve the norm of each column. * * \sa class LeastSquareDiagonalPreconditioner, class ConjugateGradient */ template class DiagonalPreconditioner { typedef _Scalar Scalar; typedef Matrix Vector; public: typedef typename Vector::StorageIndex StorageIndex; enum { ColsAtCompileTime = Dynamic, MaxColsAtCompileTime = Dynamic }; DiagonalPreconditioner() : m_isInitialized(false) {} template explicit DiagonalPreconditioner(const MatType& mat) : m_invdiag(mat.cols()) { compute(mat); } EIGEN_CONSTEXPR Index rows() const EIGEN_NOEXCEPT { return m_invdiag.size(); } EIGEN_CONSTEXPR Index cols() const EIGEN_NOEXCEPT { return m_invdiag.size(); } template DiagonalPreconditioner& analyzePattern(const MatType& ) { return *this; } template DiagonalPreconditioner& factorize(const MatType& mat) { m_invdiag.resize(mat.cols()); for(int j=0; j DiagonalPreconditioner& compute(const MatType& mat) { return factorize(mat); } /** \internal */ template void _solve_impl(const Rhs& b, Dest& x) const { x = m_invdiag.array() * b.array() ; } template inline const Solve solve(const MatrixBase& b) const { eigen_assert(m_isInitialized && "DiagonalPreconditioner is not initialized."); eigen_assert(m_invdiag.size()==b.rows() && "DiagonalPreconditioner::solve(): invalid number of rows of the right hand side matrix b"); return Solve(*this, b.derived()); } ComputationInfo info() { return Success; } protected: Vector m_invdiag; bool m_isInitialized; }; /** \ingroup IterativeLinearSolvers_Module * \brief Jacobi preconditioner for LeastSquaresConjugateGradient * * This class allows to approximately solve for A' A x = A' b problems assuming A' A is a diagonal matrix. * In other words, this preconditioner neglects all off diagonal entries and, in Eigen's language, solves for: \code (A.adjoint() * A).diagonal().asDiagonal() * x = b \endcode * * \tparam _Scalar the type of the scalar. * * \implsparsesolverconcept * * The diagonal entries are pre-inverted and stored into a dense vector. * * \sa class LeastSquaresConjugateGradient, class DiagonalPreconditioner */ template class LeastSquareDiagonalPreconditioner : public DiagonalPreconditioner<_Scalar> { typedef _Scalar Scalar; typedef typename NumTraits::Real RealScalar; typedef DiagonalPreconditioner<_Scalar> Base; using Base::m_invdiag; public: LeastSquareDiagonalPreconditioner() : Base() {} template explicit LeastSquareDiagonalPreconditioner(const MatType& mat) : Base() { compute(mat); } template LeastSquareDiagonalPreconditioner& analyzePattern(const MatType& ) { return *this; } template LeastSquareDiagonalPreconditioner& factorize(const MatType& mat) { // Compute the inverse squared-norm of each column of mat m_invdiag.resize(mat.cols()); if(MatType::IsRowMajor) { m_invdiag.setZero(); for(Index j=0; jRealScalar(0)) m_invdiag(j) = RealScalar(1)/numext::real(m_invdiag(j)); } else { for(Index j=0; jRealScalar(0)) m_invdiag(j) = RealScalar(1)/sum; else m_invdiag(j) = RealScalar(1); } } Base::m_isInitialized = true; return *this; } template LeastSquareDiagonalPreconditioner& compute(const MatType& mat) { return factorize(mat); } ComputationInfo info() { return Success; } protected: }; /** \ingroup IterativeLinearSolvers_Module * \brief A naive preconditioner which approximates any matrix as the identity matrix * * \implsparsesolverconcept * * \sa class DiagonalPreconditioner */ class IdentityPreconditioner { public: IdentityPreconditioner() {} template explicit IdentityPreconditioner(const MatrixType& ) {} template IdentityPreconditioner& analyzePattern(const MatrixType& ) { return *this; } template IdentityPreconditioner& factorize(const MatrixType& ) { return *this; } template IdentityPreconditioner& compute(const MatrixType& ) { return *this; } template inline const Rhs& solve(const Rhs& b) const { return b; } ComputationInfo info() { return Success; } }; } // end namespace Eigen #endif // EIGEN_BASIC_PRECONDITIONERS_H RcppEigen/inst/include/Eigen/src/IterativeLinearSolvers/ConjugateGradient.h0000644000176200001440000002126714562417221026573 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2011-2014 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CONJUGATE_GRADIENT_H #define EIGEN_CONJUGATE_GRADIENT_H namespace Eigen { namespace internal { /** \internal Low-level conjugate gradient algorithm * \param mat The matrix A * \param rhs The right hand side vector b * \param x On input and initial solution, on output the computed solution. * \param precond A preconditioner being able to efficiently solve for an * approximation of Ax=b (regardless of b) * \param iters On input the max number of iteration, on output the number of performed iterations. * \param tol_error On input the tolerance error, on output an estimation of the relative error. */ template EIGEN_DONT_INLINE void conjugate_gradient(const MatrixType& mat, const Rhs& rhs, Dest& x, const Preconditioner& precond, Index& iters, typename Dest::RealScalar& tol_error) { using std::sqrt; using std::abs; typedef typename Dest::RealScalar RealScalar; typedef typename Dest::Scalar Scalar; typedef Matrix VectorType; RealScalar tol = tol_error; Index maxIters = iters; Index n = mat.cols(); VectorType residual = rhs - mat * x; //initial residual RealScalar rhsNorm2 = rhs.squaredNorm(); if(rhsNorm2 == 0) { x.setZero(); iters = 0; tol_error = 0; return; } const RealScalar considerAsZero = (std::numeric_limits::min)(); RealScalar threshold = numext::maxi(RealScalar(tol*tol*rhsNorm2),considerAsZero); RealScalar residualNorm2 = residual.squaredNorm(); if (residualNorm2 < threshold) { iters = 0; tol_error = sqrt(residualNorm2 / rhsNorm2); return; } VectorType p(n); p = precond.solve(residual); // initial search direction VectorType z(n), tmp(n); RealScalar absNew = numext::real(residual.dot(p)); // the square of the absolute value of r scaled by invM Index i = 0; while(i < maxIters) { tmp.noalias() = mat * p; // the bottleneck of the algorithm Scalar alpha = absNew / p.dot(tmp); // the amount we travel on dir x += alpha * p; // update solution residual -= alpha * tmp; // update residual residualNorm2 = residual.squaredNorm(); if(residualNorm2 < threshold) break; z = precond.solve(residual); // approximately solve for "A z = residual" RealScalar absOld = absNew; absNew = numext::real(residual.dot(z)); // update the absolute value of r RealScalar beta = absNew / absOld; // calculate the Gram-Schmidt value used to create the new search direction p = z + beta * p; // update search direction i++; } tol_error = sqrt(residualNorm2 / rhsNorm2); iters = i; } } template< typename _MatrixType, int _UpLo=Lower, typename _Preconditioner = DiagonalPreconditioner > class ConjugateGradient; namespace internal { template< typename _MatrixType, int _UpLo, typename _Preconditioner> struct traits > { typedef _MatrixType MatrixType; typedef _Preconditioner Preconditioner; }; } /** \ingroup IterativeLinearSolvers_Module * \brief A conjugate gradient solver for sparse (or dense) self-adjoint problems * * This class allows to solve for A.x = b linear problems using an iterative conjugate gradient algorithm. * The matrix A must be selfadjoint. The matrix A and the vectors x and b can be either dense or sparse. * * \tparam _MatrixType the type of the matrix A, can be a dense or a sparse matrix. * \tparam _UpLo the triangular part that will be used for the computations. It can be Lower, * \c Upper, or \c Lower|Upper in which the full matrix entries will be considered. * Default is \c Lower, best performance is \c Lower|Upper. * \tparam _Preconditioner the type of the preconditioner. Default is DiagonalPreconditioner * * \implsparsesolverconcept * * The maximal number of iterations and tolerance value can be controlled via the setMaxIterations() * and setTolerance() methods. The defaults are the size of the problem for the maximal number of iterations * and NumTraits::epsilon() for the tolerance. * * The tolerance corresponds to the relative residual error: |Ax-b|/|b| * * \b Performance: Even though the default value of \c _UpLo is \c Lower, significantly higher performance is * achieved when using a complete matrix and \b Lower|Upper as the \a _UpLo template parameter. Moreover, in this * case multi-threading can be exploited if the user code is compiled with OpenMP enabled. * See \ref TopicMultiThreading for details. * * This class can be used as the direct solver classes. Here is a typical usage example: \code int n = 10000; VectorXd x(n), b(n); SparseMatrix A(n,n); // fill A and b ConjugateGradient, Lower|Upper> cg; cg.compute(A); x = cg.solve(b); std::cout << "#iterations: " << cg.iterations() << std::endl; std::cout << "estimated error: " << cg.error() << std::endl; // update b, and solve again x = cg.solve(b); \endcode * * By default the iterations start with x=0 as an initial guess of the solution. * One can control the start using the solveWithGuess() method. * * ConjugateGradient can also be used in a matrix-free context, see the following \link MatrixfreeSolverExample example \endlink. * * \sa class LeastSquaresConjugateGradient, class SimplicialCholesky, DiagonalPreconditioner, IdentityPreconditioner */ template< typename _MatrixType, int _UpLo, typename _Preconditioner> class ConjugateGradient : public IterativeSolverBase > { typedef IterativeSolverBase Base; using Base::matrix; using Base::m_error; using Base::m_iterations; using Base::m_info; using Base::m_isInitialized; public: typedef _MatrixType MatrixType; typedef typename MatrixType::Scalar Scalar; typedef typename MatrixType::RealScalar RealScalar; typedef _Preconditioner Preconditioner; enum { UpLo = _UpLo }; public: /** Default constructor. */ ConjugateGradient() : Base() {} /** Initialize the solver with matrix \a A for further \c Ax=b solving. * * This constructor is a shortcut for the default constructor followed * by a call to compute(). * * \warning this class stores a reference to the matrix A as well as some * precomputed values that depend on it. Therefore, if \a A is changed * this class becomes invalid. Call compute() to update it with the new * matrix A, or modify a copy of A. */ template explicit ConjugateGradient(const EigenBase& A) : Base(A.derived()) {} ~ConjugateGradient() {} /** \internal */ template void _solve_vector_with_guess_impl(const Rhs& b, Dest& x) const { typedef typename Base::MatrixWrapper MatrixWrapper; typedef typename Base::ActualMatrixType ActualMatrixType; enum { TransposeInput = (!MatrixWrapper::MatrixFree) && (UpLo==(Lower|Upper)) && (!MatrixType::IsRowMajor) && (!NumTraits::IsComplex) }; typedef typename internal::conditional, ActualMatrixType const&>::type RowMajorWrapper; EIGEN_STATIC_ASSERT(EIGEN_IMPLIES(MatrixWrapper::MatrixFree,UpLo==(Lower|Upper)),MATRIX_FREE_CONJUGATE_GRADIENT_IS_COMPATIBLE_WITH_UPPER_UNION_LOWER_MODE_ONLY); typedef typename internal::conditional::Type >::type SelfAdjointWrapper; m_iterations = Base::maxIterations(); m_error = Base::m_tolerance; RowMajorWrapper row_mat(matrix()); internal::conjugate_gradient(SelfAdjointWrapper(row_mat), b, x, Base::m_preconditioner, m_iterations, m_error); m_info = m_error <= Base::m_tolerance ? Success : NoConvergence; } protected: }; } // end namespace Eigen #endif // EIGEN_CONJUGATE_GRADIENT_H RcppEigen/inst/include/Eigen/src/IterativeLinearSolvers/IterativeSolverBase.h0000644000176200001440000003210314562417221027107 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2011-2014 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_ITERATIVE_SOLVER_BASE_H #define EIGEN_ITERATIVE_SOLVER_BASE_H namespace Eigen { namespace internal { template struct is_ref_compatible_impl { private: template struct any_conversion { template any_conversion(const volatile T&); template any_conversion(T&); }; struct yes {int a[1];}; struct no {int a[2];}; template static yes test(const Ref&, int); template static no test(any_conversion, ...); public: static MatrixType ms_from; enum { value = sizeof(test(ms_from, 0))==sizeof(yes) }; }; template struct is_ref_compatible { enum { value = is_ref_compatible_impl::type>::value }; }; template::value> class generic_matrix_wrapper; // We have an explicit matrix at hand, compatible with Ref<> template class generic_matrix_wrapper { public: typedef Ref ActualMatrixType; template struct ConstSelfAdjointViewReturnType { typedef typename ActualMatrixType::template ConstSelfAdjointViewReturnType::Type Type; }; enum { MatrixFree = false }; generic_matrix_wrapper() : m_dummy(0,0), m_matrix(m_dummy) {} template generic_matrix_wrapper(const InputType &mat) : m_matrix(mat) {} const ActualMatrixType& matrix() const { return m_matrix; } template void grab(const EigenBase &mat) { m_matrix.~Ref(); ::new (&m_matrix) Ref(mat.derived()); } void grab(const Ref &mat) { if(&(mat.derived()) != &m_matrix) { m_matrix.~Ref(); ::new (&m_matrix) Ref(mat); } } protected: MatrixType m_dummy; // used to default initialize the Ref<> object ActualMatrixType m_matrix; }; // MatrixType is not compatible with Ref<> -> matrix-free wrapper template class generic_matrix_wrapper { public: typedef MatrixType ActualMatrixType; template struct ConstSelfAdjointViewReturnType { typedef ActualMatrixType Type; }; enum { MatrixFree = true }; generic_matrix_wrapper() : mp_matrix(0) {} generic_matrix_wrapper(const MatrixType &mat) : mp_matrix(&mat) {} const ActualMatrixType& matrix() const { return *mp_matrix; } void grab(const MatrixType &mat) { mp_matrix = &mat; } protected: const ActualMatrixType *mp_matrix; }; } /** \ingroup IterativeLinearSolvers_Module * \brief Base class for linear iterative solvers * * \sa class SimplicialCholesky, DiagonalPreconditioner, IdentityPreconditioner */ template< typename Derived> class IterativeSolverBase : public SparseSolverBase { protected: typedef SparseSolverBase Base; using Base::m_isInitialized; public: typedef typename internal::traits::MatrixType MatrixType; typedef typename internal::traits::Preconditioner Preconditioner; typedef typename MatrixType::Scalar Scalar; typedef typename MatrixType::StorageIndex StorageIndex; typedef typename MatrixType::RealScalar RealScalar; enum { ColsAtCompileTime = MatrixType::ColsAtCompileTime, MaxColsAtCompileTime = MatrixType::MaxColsAtCompileTime }; public: using Base::derived; /** Default constructor. */ IterativeSolverBase() { init(); } /** Initialize the solver with matrix \a A for further \c Ax=b solving. * * This constructor is a shortcut for the default constructor followed * by a call to compute(). * * \warning this class stores a reference to the matrix A as well as some * precomputed values that depend on it. Therefore, if \a A is changed * this class becomes invalid. Call compute() to update it with the new * matrix A, or modify a copy of A. */ template explicit IterativeSolverBase(const EigenBase& A) : m_matrixWrapper(A.derived()) { init(); compute(matrix()); } ~IterativeSolverBase() {} /** Initializes the iterative solver for the sparsity pattern of the matrix \a A for further solving \c Ax=b problems. * * Currently, this function mostly calls analyzePattern on the preconditioner. In the future * we might, for instance, implement column reordering for faster matrix vector products. */ template Derived& analyzePattern(const EigenBase& A) { grab(A.derived()); m_preconditioner.analyzePattern(matrix()); m_isInitialized = true; m_analysisIsOk = true; m_info = m_preconditioner.info(); return derived(); } /** Initializes the iterative solver with the numerical values of the matrix \a A for further solving \c Ax=b problems. * * Currently, this function mostly calls factorize on the preconditioner. * * \warning this class stores a reference to the matrix A as well as some * precomputed values that depend on it. Therefore, if \a A is changed * this class becomes invalid. Call compute() to update it with the new * matrix A, or modify a copy of A. */ template Derived& factorize(const EigenBase& A) { eigen_assert(m_analysisIsOk && "You must first call analyzePattern()"); grab(A.derived()); m_preconditioner.factorize(matrix()); m_factorizationIsOk = true; m_info = m_preconditioner.info(); return derived(); } /** Initializes the iterative solver with the matrix \a A for further solving \c Ax=b problems. * * Currently, this function mostly initializes/computes the preconditioner. In the future * we might, for instance, implement column reordering for faster matrix vector products. * * \warning this class stores a reference to the matrix A as well as some * precomputed values that depend on it. Therefore, if \a A is changed * this class becomes invalid. Call compute() to update it with the new * matrix A, or modify a copy of A. */ template Derived& compute(const EigenBase& A) { grab(A.derived()); m_preconditioner.compute(matrix()); m_isInitialized = true; m_analysisIsOk = true; m_factorizationIsOk = true; m_info = m_preconditioner.info(); return derived(); } /** \internal */ EIGEN_CONSTEXPR Index rows() const EIGEN_NOEXCEPT { return matrix().rows(); } /** \internal */ EIGEN_CONSTEXPR Index cols() const EIGEN_NOEXCEPT { return matrix().cols(); } /** \returns the tolerance threshold used by the stopping criteria. * \sa setTolerance() */ RealScalar tolerance() const { return m_tolerance; } /** Sets the tolerance threshold used by the stopping criteria. * * This value is used as an upper bound to the relative residual error: |Ax-b|/|b|. * The default value is the machine precision given by NumTraits::epsilon() */ Derived& setTolerance(const RealScalar& tolerance) { m_tolerance = tolerance; return derived(); } /** \returns a read-write reference to the preconditioner for custom configuration. */ Preconditioner& preconditioner() { return m_preconditioner; } /** \returns a read-only reference to the preconditioner. */ const Preconditioner& preconditioner() const { return m_preconditioner; } /** \returns the max number of iterations. * It is either the value set by setMaxIterations or, by default, * twice the number of columns of the matrix. */ Index maxIterations() const { return (m_maxIterations<0) ? 2*matrix().cols() : m_maxIterations; } /** Sets the max number of iterations. * Default is twice the number of columns of the matrix. */ Derived& setMaxIterations(Index maxIters) { m_maxIterations = maxIters; return derived(); } /** \returns the number of iterations performed during the last solve */ Index iterations() const { eigen_assert(m_isInitialized && "ConjugateGradient is not initialized."); return m_iterations; } /** \returns the tolerance error reached during the last solve. * It is a close approximation of the true relative residual error |Ax-b|/|b|. */ RealScalar error() const { eigen_assert(m_isInitialized && "ConjugateGradient is not initialized."); return m_error; } /** \returns the solution x of \f$ A x = b \f$ using the current decomposition of A * and \a x0 as an initial solution. * * \sa solve(), compute() */ template inline const SolveWithGuess solveWithGuess(const MatrixBase& b, const Guess& x0) const { eigen_assert(m_isInitialized && "Solver is not initialized."); eigen_assert(derived().rows()==b.rows() && "solve(): invalid number of rows of the right hand side matrix b"); return SolveWithGuess(derived(), b.derived(), x0); } /** \returns Success if the iterations converged, and NoConvergence otherwise. */ ComputationInfo info() const { eigen_assert(m_isInitialized && "IterativeSolverBase is not initialized."); return m_info; } /** \internal */ template void _solve_with_guess_impl(const Rhs& b, SparseMatrixBase &aDest) const { eigen_assert(rows()==b.rows()); Index rhsCols = b.cols(); Index size = b.rows(); DestDerived& dest(aDest.derived()); typedef typename DestDerived::Scalar DestScalar; Eigen::Matrix tb(size); Eigen::Matrix tx(cols()); // We do not directly fill dest because sparse expressions have to be free of aliasing issue. // For non square least-square problems, b and dest might not have the same size whereas they might alias each-other. typename DestDerived::PlainObject tmp(cols(),rhsCols); ComputationInfo global_info = Success; for(Index k=0; k typename internal::enable_if::type _solve_with_guess_impl(const Rhs& b, MatrixBase &aDest) const { eigen_assert(rows()==b.rows()); Index rhsCols = b.cols(); DestDerived& dest(aDest.derived()); ComputationInfo global_info = Success; for(Index k=0; k typename internal::enable_if::type _solve_with_guess_impl(const Rhs& b, MatrixBase &dest) const { derived()._solve_vector_with_guess_impl(b,dest.derived()); } /** \internal default initial guess = 0 */ template void _solve_impl(const Rhs& b, Dest& x) const { x.setZero(); derived()._solve_with_guess_impl(b,x); } protected: void init() { m_isInitialized = false; m_analysisIsOk = false; m_factorizationIsOk = false; m_maxIterations = -1; m_tolerance = NumTraits::epsilon(); } typedef internal::generic_matrix_wrapper MatrixWrapper; typedef typename MatrixWrapper::ActualMatrixType ActualMatrixType; const ActualMatrixType& matrix() const { return m_matrixWrapper.matrix(); } template void grab(const InputType &A) { m_matrixWrapper.grab(A); } MatrixWrapper m_matrixWrapper; Preconditioner m_preconditioner; Index m_maxIterations; RealScalar m_tolerance; mutable RealScalar m_error; mutable Index m_iterations; mutable ComputationInfo m_info; mutable bool m_analysisIsOk, m_factorizationIsOk; }; } // end namespace Eigen #endif // EIGEN_ITERATIVE_SOLVER_BASE_H RcppEigen/inst/include/Eigen/src/IterativeLinearSolvers/IncompleteLUT.h0000644000176200001440000003513414562417221025660 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2012 Désiré Nuentsa-Wakam // Copyright (C) 2014 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_INCOMPLETE_LUT_H #define EIGEN_INCOMPLETE_LUT_H namespace Eigen { namespace internal { /** \internal * Compute a quick-sort split of a vector * On output, the vector row is permuted such that its elements satisfy * abs(row(i)) >= abs(row(ncut)) if incut * \param row The vector of values * \param ind The array of index for the elements in @p row * \param ncut The number of largest elements to keep **/ template Index QuickSplit(VectorV &row, VectorI &ind, Index ncut) { typedef typename VectorV::RealScalar RealScalar; using std::swap; using std::abs; Index mid; Index n = row.size(); /* length of the vector */ Index first, last ; ncut--; /* to fit the zero-based indices */ first = 0; last = n-1; if (ncut < first || ncut > last ) return 0; do { mid = first; RealScalar abskey = abs(row(mid)); for (Index j = first + 1; j <= last; j++) { if ( abs(row(j)) > abskey) { ++mid; swap(row(mid), row(j)); swap(ind(mid), ind(j)); } } /* Interchange for the pivot element */ swap(row(mid), row(first)); swap(ind(mid), ind(first)); if (mid > ncut) last = mid - 1; else if (mid < ncut ) first = mid + 1; } while (mid != ncut ); return 0; /* mid is equal to ncut */ } }// end namespace internal /** \ingroup IterativeLinearSolvers_Module * \class IncompleteLUT * \brief Incomplete LU factorization with dual-threshold strategy * * \implsparsesolverconcept * * During the numerical factorization, two dropping rules are used : * 1) any element whose magnitude is less than some tolerance is dropped. * This tolerance is obtained by multiplying the input tolerance @p droptol * by the average magnitude of all the original elements in the current row. * 2) After the elimination of the row, only the @p fill largest elements in * the L part and the @p fill largest elements in the U part are kept * (in addition to the diagonal element ). Note that @p fill is computed from * the input parameter @p fillfactor which is used the ratio to control the fill_in * relatively to the initial number of nonzero elements. * * The two extreme cases are when @p droptol=0 (to keep all the @p fill*2 largest elements) * and when @p fill=n/2 with @p droptol being different to zero. * * References : Yousef Saad, ILUT: A dual threshold incomplete LU factorization, * Numerical Linear Algebra with Applications, 1(4), pp 387-402, 1994. * * NOTE : The following implementation is derived from the ILUT implementation * in the SPARSKIT package, Copyright (C) 2005, the Regents of the University of Minnesota * released under the terms of the GNU LGPL: * http://www-users.cs.umn.edu/~saad/software/SPARSKIT/README * However, Yousef Saad gave us permission to relicense his ILUT code to MPL2. * See the Eigen mailing list archive, thread: ILUT, date: July 8, 2012: * http://listengine.tuxfamily.org/lists.tuxfamily.org/eigen/2012/07/msg00064.html * alternatively, on GMANE: * http://comments.gmane.org/gmane.comp.lib.eigen/3302 */ template class IncompleteLUT : public SparseSolverBase > { protected: typedef SparseSolverBase Base; using Base::m_isInitialized; public: typedef _Scalar Scalar; typedef _StorageIndex StorageIndex; typedef typename NumTraits::Real RealScalar; typedef Matrix Vector; typedef Matrix VectorI; typedef SparseMatrix FactorType; enum { ColsAtCompileTime = Dynamic, MaxColsAtCompileTime = Dynamic }; public: IncompleteLUT() : m_droptol(NumTraits::dummy_precision()), m_fillfactor(10), m_analysisIsOk(false), m_factorizationIsOk(false) {} template explicit IncompleteLUT(const MatrixType& mat, const RealScalar& droptol=NumTraits::dummy_precision(), int fillfactor = 10) : m_droptol(droptol),m_fillfactor(fillfactor), m_analysisIsOk(false),m_factorizationIsOk(false) { eigen_assert(fillfactor != 0); compute(mat); } EIGEN_CONSTEXPR Index rows() const EIGEN_NOEXCEPT { return m_lu.rows(); } EIGEN_CONSTEXPR Index cols() const EIGEN_NOEXCEPT { return m_lu.cols(); } /** \brief Reports whether previous computation was successful. * * \returns \c Success if computation was successful, * \c NumericalIssue if the matrix.appears to be negative. */ ComputationInfo info() const { eigen_assert(m_isInitialized && "IncompleteLUT is not initialized."); return m_info; } template void analyzePattern(const MatrixType& amat); template void factorize(const MatrixType& amat); /** * Compute an incomplete LU factorization with dual threshold on the matrix mat * No pivoting is done in this version * **/ template IncompleteLUT& compute(const MatrixType& amat) { analyzePattern(amat); factorize(amat); return *this; } void setDroptol(const RealScalar& droptol); void setFillfactor(int fillfactor); template void _solve_impl(const Rhs& b, Dest& x) const { x = m_Pinv * b; x = m_lu.template triangularView().solve(x); x = m_lu.template triangularView().solve(x); x = m_P * x; } protected: /** keeps off-diagonal entries; drops diagonal entries */ struct keep_diag { inline bool operator() (const Index& row, const Index& col, const Scalar&) const { return row!=col; } }; protected: FactorType m_lu; RealScalar m_droptol; int m_fillfactor; bool m_analysisIsOk; bool m_factorizationIsOk; ComputationInfo m_info; PermutationMatrix m_P; // Fill-reducing permutation PermutationMatrix m_Pinv; // Inverse permutation }; /** * Set control parameter droptol * \param droptol Drop any element whose magnitude is less than this tolerance **/ template void IncompleteLUT::setDroptol(const RealScalar& droptol) { this->m_droptol = droptol; } /** * Set control parameter fillfactor * \param fillfactor This is used to compute the number @p fill_in of largest elements to keep on each row. **/ template void IncompleteLUT::setFillfactor(int fillfactor) { this->m_fillfactor = fillfactor; } template template void IncompleteLUT::analyzePattern(const _MatrixType& amat) { // Compute the Fill-reducing permutation // Since ILUT does not perform any numerical pivoting, // it is highly preferable to keep the diagonal through symmetric permutations. // To this end, let's symmetrize the pattern and perform AMD on it. SparseMatrix mat1 = amat; SparseMatrix mat2 = amat.transpose(); // FIXME for a matrix with nearly symmetric pattern, mat2+mat1 is the appropriate choice. // on the other hand for a really non-symmetric pattern, mat2*mat1 should be preferred... SparseMatrix AtA = mat2 + mat1; AMDOrdering ordering; ordering(AtA,m_P); m_Pinv = m_P.inverse(); // cache the inverse permutation m_analysisIsOk = true; m_factorizationIsOk = false; m_isInitialized = true; } template template void IncompleteLUT::factorize(const _MatrixType& amat) { using std::sqrt; using std::swap; using std::abs; using internal::convert_index; eigen_assert((amat.rows() == amat.cols()) && "The factorization should be done on a square matrix"); Index n = amat.cols(); // Size of the matrix m_lu.resize(n,n); // Declare Working vectors and variables Vector u(n) ; // real values of the row -- maximum size is n -- VectorI ju(n); // column position of the values in u -- maximum size is n VectorI jr(n); // Indicate the position of the nonzero elements in the vector u -- A zero location is indicated by -1 // Apply the fill-reducing permutation eigen_assert(m_analysisIsOk && "You must first call analyzePattern()"); SparseMatrix mat; mat = amat.twistedBy(m_Pinv); // Initialization jr.fill(-1); ju.fill(0); u.fill(0); // number of largest elements to keep in each row: Index fill_in = (amat.nonZeros()*m_fillfactor)/n + 1; if (fill_in > n) fill_in = n; // number of largest nonzero elements to keep in the L and the U part of the current row: Index nnzL = fill_in/2; Index nnzU = nnzL; m_lu.reserve(n * (nnzL + nnzU + 1)); // global loop over the rows of the sparse matrix for (Index ii = 0; ii < n; ii++) { // 1 - copy the lower and the upper part of the row i of mat in the working vector u Index sizeu = 1; // number of nonzero elements in the upper part of the current row Index sizel = 0; // number of nonzero elements in the lower part of the current row ju(ii) = convert_index(ii); u(ii) = 0; jr(ii) = convert_index(ii); RealScalar rownorm = 0; typename FactorType::InnerIterator j_it(mat, ii); // Iterate through the current row ii for (; j_it; ++j_it) { Index k = j_it.index(); if (k < ii) { // copy the lower part ju(sizel) = convert_index(k); u(sizel) = j_it.value(); jr(k) = convert_index(sizel); ++sizel; } else if (k == ii) { u(ii) = j_it.value(); } else { // copy the upper part Index jpos = ii + sizeu; ju(jpos) = convert_index(k); u(jpos) = j_it.value(); jr(k) = convert_index(jpos); ++sizeu; } rownorm += numext::abs2(j_it.value()); } // 2 - detect possible zero row if(rownorm==0) { m_info = NumericalIssue; return; } // Take the 2-norm of the current row as a relative tolerance rownorm = sqrt(rownorm); // 3 - eliminate the previous nonzero rows Index jj = 0; Index len = 0; while (jj < sizel) { // In order to eliminate in the correct order, // we must select first the smallest column index among ju(jj:sizel) Index k; Index minrow = ju.segment(jj,sizel-jj).minCoeff(&k); // k is relative to the segment k += jj; if (minrow != ju(jj)) { // swap the two locations Index j = ju(jj); swap(ju(jj), ju(k)); jr(minrow) = convert_index(jj); jr(j) = convert_index(k); swap(u(jj), u(k)); } // Reset this location jr(minrow) = -1; // Start elimination typename FactorType::InnerIterator ki_it(m_lu, minrow); while (ki_it && ki_it.index() < minrow) ++ki_it; eigen_internal_assert(ki_it && ki_it.col()==minrow); Scalar fact = u(jj) / ki_it.value(); // drop too small elements if(abs(fact) <= m_droptol) { jj++; continue; } // linear combination of the current row ii and the row minrow ++ki_it; for (; ki_it; ++ki_it) { Scalar prod = fact * ki_it.value(); Index j = ki_it.index(); Index jpos = jr(j); if (jpos == -1) // fill-in element { Index newpos; if (j >= ii) // dealing with the upper part { newpos = ii + sizeu; sizeu++; eigen_internal_assert(sizeu<=n); } else // dealing with the lower part { newpos = sizel; sizel++; eigen_internal_assert(sizel<=ii); } ju(newpos) = convert_index(j); u(newpos) = -prod; jr(j) = convert_index(newpos); } else u(jpos) -= prod; } // store the pivot element u(len) = fact; ju(len) = convert_index(minrow); ++len; jj++; } // end of the elimination on the row ii // reset the upper part of the pointer jr to zero for(Index k = 0; k m_droptol * rownorm ) { ++len; u(ii + len) = u(ii + k); ju(ii + len) = ju(ii + k); } } sizeu = len + 1; // +1 to take into account the diagonal element len = (std::min)(sizeu, nnzU); typename Vector::SegmentReturnType uu(u.segment(ii+1, sizeu-1)); typename VectorI::SegmentReturnType juu(ju.segment(ii+1, sizeu-1)); internal::QuickSplit(uu, juu, len); // store the largest elements of the U part for(Index k = ii + 1; k < ii + len; k++) m_lu.insertBackByOuterInnerUnordered(ii,ju(k)) = u(k); } m_lu.finalize(); m_lu.makeCompressed(); m_factorizationIsOk = true; m_info = Success; } } // end namespace Eigen #endif // EIGEN_INCOMPLETE_LUT_H RcppEigen/inst/include/Eigen/src/IterativeLinearSolvers/BiCGSTAB.h0000644000176200001440000001530214562417221024405 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2011-2014 Gael Guennebaud // Copyright (C) 2012 Désiré Nuentsa-Wakam // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_BICGSTAB_H #define EIGEN_BICGSTAB_H namespace Eigen { namespace internal { /** \internal Low-level bi conjugate gradient stabilized algorithm * \param mat The matrix A * \param rhs The right hand side vector b * \param x On input and initial solution, on output the computed solution. * \param precond A preconditioner being able to efficiently solve for an * approximation of Ax=b (regardless of b) * \param iters On input the max number of iteration, on output the number of performed iterations. * \param tol_error On input the tolerance error, on output an estimation of the relative error. * \return false in the case of numerical issue, for example a break down of BiCGSTAB. */ template bool bicgstab(const MatrixType& mat, const Rhs& rhs, Dest& x, const Preconditioner& precond, Index& iters, typename Dest::RealScalar& tol_error) { using std::sqrt; using std::abs; typedef typename Dest::RealScalar RealScalar; typedef typename Dest::Scalar Scalar; typedef Matrix VectorType; RealScalar tol = tol_error; Index maxIters = iters; Index n = mat.cols(); VectorType r = rhs - mat * x; VectorType r0 = r; RealScalar r0_sqnorm = r0.squaredNorm(); RealScalar rhs_sqnorm = rhs.squaredNorm(); if(rhs_sqnorm == 0) { x.setZero(); return true; } Scalar rho = 1; Scalar alpha = 1; Scalar w = 1; VectorType v = VectorType::Zero(n), p = VectorType::Zero(n); VectorType y(n), z(n); VectorType kt(n), ks(n); VectorType s(n), t(n); RealScalar tol2 = tol*tol*rhs_sqnorm; RealScalar eps2 = NumTraits::epsilon()*NumTraits::epsilon(); Index i = 0; Index restarts = 0; while ( r.squaredNorm() > tol2 && iRealScalar(0)) w = t.dot(s) / tmp; else w = Scalar(0); x += alpha * y + w * z; r = s - w * t; ++i; } tol_error = sqrt(r.squaredNorm()/rhs_sqnorm); iters = i; return true; } } template< typename _MatrixType, typename _Preconditioner = DiagonalPreconditioner > class BiCGSTAB; namespace internal { template< typename _MatrixType, typename _Preconditioner> struct traits > { typedef _MatrixType MatrixType; typedef _Preconditioner Preconditioner; }; } /** \ingroup IterativeLinearSolvers_Module * \brief A bi conjugate gradient stabilized solver for sparse square problems * * This class allows to solve for A.x = b sparse linear problems using a bi conjugate gradient * stabilized algorithm. The vectors x and b can be either dense or sparse. * * \tparam _MatrixType the type of the sparse matrix A, can be a dense or a sparse matrix. * \tparam _Preconditioner the type of the preconditioner. Default is DiagonalPreconditioner * * \implsparsesolverconcept * * The maximal number of iterations and tolerance value can be controlled via the setMaxIterations() * and setTolerance() methods. The defaults are the size of the problem for the maximal number of iterations * and NumTraits::epsilon() for the tolerance. * * The tolerance corresponds to the relative residual error: |Ax-b|/|b| * * \b Performance: when using sparse matrices, best performance is achied for a row-major sparse matrix format. * Moreover, in this case multi-threading can be exploited if the user code is compiled with OpenMP enabled. * See \ref TopicMultiThreading for details. * * This class can be used as the direct solver classes. Here is a typical usage example: * \include BiCGSTAB_simple.cpp * * By default the iterations start with x=0 as an initial guess of the solution. * One can control the start using the solveWithGuess() method. * * BiCGSTAB can also be used in a matrix-free context, see the following \link MatrixfreeSolverExample example \endlink. * * \sa class SimplicialCholesky, DiagonalPreconditioner, IdentityPreconditioner */ template< typename _MatrixType, typename _Preconditioner> class BiCGSTAB : public IterativeSolverBase > { typedef IterativeSolverBase Base; using Base::matrix; using Base::m_error; using Base::m_iterations; using Base::m_info; using Base::m_isInitialized; public: typedef _MatrixType MatrixType; typedef typename MatrixType::Scalar Scalar; typedef typename MatrixType::RealScalar RealScalar; typedef _Preconditioner Preconditioner; public: /** Default constructor. */ BiCGSTAB() : Base() {} /** Initialize the solver with matrix \a A for further \c Ax=b solving. * * This constructor is a shortcut for the default constructor followed * by a call to compute(). * * \warning this class stores a reference to the matrix A as well as some * precomputed values that depend on it. Therefore, if \a A is changed * this class becomes invalid. Call compute() to update it with the new * matrix A, or modify a copy of A. */ template explicit BiCGSTAB(const EigenBase& A) : Base(A.derived()) {} ~BiCGSTAB() {} /** \internal */ template void _solve_vector_with_guess_impl(const Rhs& b, Dest& x) const { m_iterations = Base::maxIterations(); m_error = Base::m_tolerance; bool ret = internal::bicgstab(matrix(), b, x, Base::m_preconditioner, m_iterations, m_error); m_info = (!ret) ? NumericalIssue : m_error <= Base::m_tolerance ? Success : NoConvergence; } protected: }; } // end namespace Eigen #endif // EIGEN_BICGSTAB_H RcppEigen/inst/include/Eigen/src/IterativeLinearSolvers/LeastSquareConjugateGradient.h0000644000176200001440000001626514562417221030747 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2015 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_LEAST_SQUARE_CONJUGATE_GRADIENT_H #define EIGEN_LEAST_SQUARE_CONJUGATE_GRADIENT_H namespace Eigen { namespace internal { /** \internal Low-level conjugate gradient algorithm for least-square problems * \param mat The matrix A * \param rhs The right hand side vector b * \param x On input and initial solution, on output the computed solution. * \param precond A preconditioner being able to efficiently solve for an * approximation of A'Ax=b (regardless of b) * \param iters On input the max number of iteration, on output the number of performed iterations. * \param tol_error On input the tolerance error, on output an estimation of the relative error. */ template EIGEN_DONT_INLINE void least_square_conjugate_gradient(const MatrixType& mat, const Rhs& rhs, Dest& x, const Preconditioner& precond, Index& iters, typename Dest::RealScalar& tol_error) { using std::sqrt; using std::abs; typedef typename Dest::RealScalar RealScalar; typedef typename Dest::Scalar Scalar; typedef Matrix VectorType; RealScalar tol = tol_error; Index maxIters = iters; Index m = mat.rows(), n = mat.cols(); VectorType residual = rhs - mat * x; VectorType normal_residual = mat.adjoint() * residual; RealScalar rhsNorm2 = (mat.adjoint()*rhs).squaredNorm(); if(rhsNorm2 == 0) { x.setZero(); iters = 0; tol_error = 0; return; } RealScalar threshold = tol*tol*rhsNorm2; RealScalar residualNorm2 = normal_residual.squaredNorm(); if (residualNorm2 < threshold) { iters = 0; tol_error = sqrt(residualNorm2 / rhsNorm2); return; } VectorType p(n); p = precond.solve(normal_residual); // initial search direction VectorType z(n), tmp(m); RealScalar absNew = numext::real(normal_residual.dot(p)); // the square of the absolute value of r scaled by invM Index i = 0; while(i < maxIters) { tmp.noalias() = mat * p; Scalar alpha = absNew / tmp.squaredNorm(); // the amount we travel on dir x += alpha * p; // update solution residual -= alpha * tmp; // update residual normal_residual = mat.adjoint() * residual; // update residual of the normal equation residualNorm2 = normal_residual.squaredNorm(); if(residualNorm2 < threshold) break; z = precond.solve(normal_residual); // approximately solve for "A'A z = normal_residual" RealScalar absOld = absNew; absNew = numext::real(normal_residual.dot(z)); // update the absolute value of r RealScalar beta = absNew / absOld; // calculate the Gram-Schmidt value used to create the new search direction p = z + beta * p; // update search direction i++; } tol_error = sqrt(residualNorm2 / rhsNorm2); iters = i; } } template< typename _MatrixType, typename _Preconditioner = LeastSquareDiagonalPreconditioner > class LeastSquaresConjugateGradient; namespace internal { template< typename _MatrixType, typename _Preconditioner> struct traits > { typedef _MatrixType MatrixType; typedef _Preconditioner Preconditioner; }; } /** \ingroup IterativeLinearSolvers_Module * \brief A conjugate gradient solver for sparse (or dense) least-square problems * * This class allows to solve for A x = b linear problems using an iterative conjugate gradient algorithm. * The matrix A can be non symmetric and rectangular, but the matrix A' A should be positive-definite to guaranty stability. * Otherwise, the SparseLU or SparseQR classes might be preferable. * The matrix A and the vectors x and b can be either dense or sparse. * * \tparam _MatrixType the type of the matrix A, can be a dense or a sparse matrix. * \tparam _Preconditioner the type of the preconditioner. Default is LeastSquareDiagonalPreconditioner * * \implsparsesolverconcept * * The maximal number of iterations and tolerance value can be controlled via the setMaxIterations() * and setTolerance() methods. The defaults are the size of the problem for the maximal number of iterations * and NumTraits::epsilon() for the tolerance. * * This class can be used as the direct solver classes. Here is a typical usage example: \code int m=1000000, n = 10000; VectorXd x(n), b(m); SparseMatrix A(m,n); // fill A and b LeastSquaresConjugateGradient > lscg; lscg.compute(A); x = lscg.solve(b); std::cout << "#iterations: " << lscg.iterations() << std::endl; std::cout << "estimated error: " << lscg.error() << std::endl; // update b, and solve again x = lscg.solve(b); \endcode * * By default the iterations start with x=0 as an initial guess of the solution. * One can control the start using the solveWithGuess() method. * * \sa class ConjugateGradient, SparseLU, SparseQR */ template< typename _MatrixType, typename _Preconditioner> class LeastSquaresConjugateGradient : public IterativeSolverBase > { typedef IterativeSolverBase Base; using Base::matrix; using Base::m_error; using Base::m_iterations; using Base::m_info; using Base::m_isInitialized; public: typedef _MatrixType MatrixType; typedef typename MatrixType::Scalar Scalar; typedef typename MatrixType::RealScalar RealScalar; typedef _Preconditioner Preconditioner; public: /** Default constructor. */ LeastSquaresConjugateGradient() : Base() {} /** Initialize the solver with matrix \a A for further \c Ax=b solving. * * This constructor is a shortcut for the default constructor followed * by a call to compute(). * * \warning this class stores a reference to the matrix A as well as some * precomputed values that depend on it. Therefore, if \a A is changed * this class becomes invalid. Call compute() to update it with the new * matrix A, or modify a copy of A. */ template explicit LeastSquaresConjugateGradient(const EigenBase& A) : Base(A.derived()) {} ~LeastSquaresConjugateGradient() {} /** \internal */ template void _solve_vector_with_guess_impl(const Rhs& b, Dest& x) const { m_iterations = Base::maxIterations(); m_error = Base::m_tolerance; internal::least_square_conjugate_gradient(matrix(), b, x, Base::m_preconditioner, m_iterations, m_error); m_info = m_error <= Base::m_tolerance ? Success : NoConvergence; } }; } // end namespace Eigen #endif // EIGEN_LEAST_SQUARE_CONJUGATE_GRADIENT_H RcppEigen/inst/include/Eigen/src/CholmodSupport/0000755000176200001440000000000014562417221021332 5ustar liggesusersRcppEigen/inst/include/Eigen/src/CholmodSupport/CholmodSupport.h0000644000176200001440000006134614562417221024477 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2010 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CHOLMODSUPPORT_H #define EIGEN_CHOLMODSUPPORT_H #ifndef R_MATRIX_CHOLMOD # define R_MATRIX_CHOLMOD(_NAME_) cholmod_ ## _NAME_ #endif namespace Eigen { namespace internal { template struct cholmod_configure_matrix; template<> struct cholmod_configure_matrix { template static void run(CholmodType& mat) { mat.xtype = CHOLMOD_REAL; mat.dtype = CHOLMOD_DOUBLE; } }; template<> struct cholmod_configure_matrix > { template static void run(CholmodType& mat) { mat.xtype = CHOLMOD_COMPLEX; mat.dtype = CHOLMOD_DOUBLE; } }; // Other scalar types are not yet supported by Cholmod // template<> struct cholmod_configure_matrix { // template // static void run(CholmodType& mat) { // mat.xtype = CHOLMOD_REAL; // mat.dtype = CHOLMOD_SINGLE; // } // }; // // template<> struct cholmod_configure_matrix > { // template // static void run(CholmodType& mat) { // mat.xtype = CHOLMOD_COMPLEX; // mat.dtype = CHOLMOD_SINGLE; // } // }; } // namespace internal /** Wraps the Eigen sparse matrix \a mat into a Cholmod sparse matrix object. * Note that the data are shared. */ template cholmod_sparse viewAsCholmod(Ref > mat) { cholmod_sparse res; res.nzmax = mat.nonZeros(); res.nrow = mat.rows(); res.ncol = mat.cols(); res.p = mat.outerIndexPtr(); res.i = mat.innerIndexPtr(); res.x = mat.valuePtr(); res.z = 0; res.sorted = 1; if(mat.isCompressed()) { res.packed = 1; res.nz = 0; } else { res.packed = 0; res.nz = mat.innerNonZeroPtr(); } res.dtype = 0; res.stype = -1; if (internal::is_same<_StorageIndex,int>::value) { res.itype = CHOLMOD_INT; } // else if (internal::is_same<_StorageIndex,SuiteSparse_long>::value) // { // res.itype = CHOLMOD_LONG; // } else { eigen_assert(false && "Index type not supported yet"); } // setup res.xtype internal::cholmod_configure_matrix<_Scalar>::run(res); res.stype = 0; return res; } template const cholmod_sparse viewAsCholmod(const SparseMatrix<_Scalar,_Options,_Index>& mat) { cholmod_sparse res = viewAsCholmod(Ref >(mat.const_cast_derived())); return res; } template const cholmod_sparse viewAsCholmod(const SparseVector<_Scalar,_Options,_Index>& mat) { cholmod_sparse res = viewAsCholmod(Ref >(mat.const_cast_derived())); return res; } /** Returns a view of the Eigen sparse matrix \a mat as Cholmod sparse matrix. * The data are not copied but shared. */ template cholmod_sparse viewAsCholmod(const SparseSelfAdjointView, UpLo>& mat) { cholmod_sparse res = viewAsCholmod(Ref >(mat.matrix().const_cast_derived())); if(UpLo==Upper) res.stype = 1; if(UpLo==Lower) res.stype = -1; // swap stype for rowmajor matrices (only works for real matrices) EIGEN_STATIC_ASSERT((_Options & RowMajorBit) == 0 || NumTraits<_Scalar>::IsComplex == 0, THIS_METHOD_IS_ONLY_FOR_COLUMN_MAJOR_MATRICES); if(_Options & RowMajorBit) res.stype *=-1; return res; } /** Returns a view of the Eigen \b dense matrix \a mat as Cholmod dense matrix. * The data are not copied but shared. */ template cholmod_dense viewAsCholmod(MatrixBase& mat) { EIGEN_STATIC_ASSERT((internal::traits::Flags&RowMajorBit)==0,THIS_METHOD_IS_ONLY_FOR_COLUMN_MAJOR_MATRICES); typedef typename Derived::Scalar Scalar; cholmod_dense res; res.nrow = mat.rows(); res.ncol = mat.cols(); res.nzmax = res.nrow * res.ncol; res.d = Derived::IsVectorAtCompileTime ? mat.derived().size() : mat.derived().outerStride(); res.x = (void*)(mat.derived().data()); res.z = 0; internal::cholmod_configure_matrix::run(res); return res; } /** Returns a view of the Cholmod sparse matrix \a cm as an Eigen sparse matrix. * The data are not copied but shared. */ template MappedSparseMatrix viewAsEigen(cholmod_sparse& cm) { return MappedSparseMatrix (cm.nrow, cm.ncol, static_cast(cm.p)[cm.ncol], static_cast(cm.p), static_cast(cm.i),static_cast(cm.x) ); } namespace internal { // template specializations for int and long that call the correct cholmod method #define EIGEN_CHOLMOD_SPECIALIZE0(ret, name) \ template inline ret cm_ ## name (cholmod_common &Common) { return R_MATRIX_CHOLMOD(name) (&Common); } #define EIGEN_CHOLMOD_SPECIALIZE1(ret, name, t1, a1) \ template inline ret cm_ ## name (t1& a1, cholmod_common &Common) { return R_MATRIX_CHOLMOD(name) (&a1, &Common); } EIGEN_CHOLMOD_SPECIALIZE0(int, start) EIGEN_CHOLMOD_SPECIALIZE0(int, finish) EIGEN_CHOLMOD_SPECIALIZE1(int, free_factor, cholmod_factor*, L) EIGEN_CHOLMOD_SPECIALIZE1(int, free_dense, cholmod_dense*, X) EIGEN_CHOLMOD_SPECIALIZE1(int, free_sparse, cholmod_sparse*, A) EIGEN_CHOLMOD_SPECIALIZE1(cholmod_factor*, analyze, cholmod_sparse, A) template inline cholmod_dense* cm_solve (int sys, cholmod_factor& L, cholmod_dense& B, cholmod_common &Common) { return R_MATRIX_CHOLMOD(solve) (sys, &L, &B, &Common); } // template<> inline cholmod_dense* cm_solve (int sys, cholmod_factor& L, cholmod_dense& B, cholmod_common &Common) { return R_MATRIX_CHOLMOD(l_solve) (sys, &L, &B, &Common); } template inline cholmod_sparse* cm_spsolve (int sys, cholmod_factor& L, cholmod_sparse& B, cholmod_common &Common) { return R_MATRIX_CHOLMOD(spsolve) (sys, &L, &B, &Common); } // template<> inline cholmod_sparse* cm_spsolve (int sys, cholmod_factor& L, cholmod_sparse& B, cholmod_common &Common) { return R_MATRIX_CHOLMOD(l_spsolve) (sys, &L, &B, &Common); } template inline int cm_factorize_p (cholmod_sparse* A, double beta[2], _StorageIndex* fset, std::size_t fsize, cholmod_factor* L, cholmod_common &Common) { return R_MATRIX_CHOLMOD(factorize_p) (A, beta, fset, fsize, L, &Common); } // template<> // inline int cm_factorize_p (cholmod_sparse* A, double beta[2], SuiteSparse_long* fset, std::size_t fsize, cholmod_factor* L, cholmod_common &Common) { return R_MATRIX_CHOLMOD(l_factorize_p) (A, beta, fset, fsize, L, &Common); } #undef EIGEN_CHOLMOD_SPECIALIZE0 #undef EIGEN_CHOLMOD_SPECIALIZE1 } // namespace internal enum CholmodMode { CholmodAuto, CholmodSimplicialLLt, CholmodSupernodalLLt, CholmodLDLt }; /** \ingroup CholmodSupport_Module * \class CholmodBase * \brief The base class for the direct Cholesky factorization of Cholmod * \sa class CholmodSupernodalLLT, class CholmodSimplicialLDLT, class CholmodSimplicialLLT */ template class CholmodBase : public SparseSolverBase { protected: typedef SparseSolverBase Base; using Base::derived; using Base::m_isInitialized; public: typedef _MatrixType MatrixType; enum { UpLo = _UpLo }; typedef typename MatrixType::Scalar Scalar; typedef typename MatrixType::RealScalar RealScalar; typedef MatrixType CholMatrixType; typedef typename MatrixType::StorageIndex StorageIndex; enum { ColsAtCompileTime = MatrixType::ColsAtCompileTime, MaxColsAtCompileTime = MatrixType::MaxColsAtCompileTime }; public: CholmodBase() : m_cholmodFactor(0), m_info(Success), m_factorizationIsOk(false), m_analysisIsOk(false) { EIGEN_STATIC_ASSERT((internal::is_same::value), CHOLMOD_SUPPORTS_DOUBLE_PRECISION_ONLY); m_shiftOffset[0] = m_shiftOffset[1] = 0.0; internal::cm_start(m_cholmod); } explicit CholmodBase(const MatrixType& matrix) : m_cholmodFactor(0), m_info(Success), m_factorizationIsOk(false), m_analysisIsOk(false) { EIGEN_STATIC_ASSERT((internal::is_same::value), CHOLMOD_SUPPORTS_DOUBLE_PRECISION_ONLY); m_shiftOffset[0] = m_shiftOffset[1] = 0.0; internal::cm_start(m_cholmod); compute(matrix); } ~CholmodBase() { if(m_cholmodFactor) internal::cm_free_factor(m_cholmodFactor, m_cholmod); internal::cm_finish(m_cholmod); } inline StorageIndex cols() const { return internal::convert_index(m_cholmodFactor->n); } inline StorageIndex rows() const { return internal::convert_index(m_cholmodFactor->n); } /** \brief Reports whether previous computation was successful. * * \returns \c Success if computation was successful, * \c NumericalIssue if the matrix.appears to be negative. */ ComputationInfo info() const { eigen_assert(m_isInitialized && "Decomposition is not initialized."); return m_info; } /** Computes the sparse Cholesky decomposition of \a matrix */ Derived& compute(const MatrixType& matrix) { analyzePattern(matrix); factorize(matrix); return derived(); } /** Performs a symbolic decomposition on the sparsity pattern of \a matrix. * * This function is particularly useful when solving for several problems having the same structure. * * \sa factorize() */ void analyzePattern(const MatrixType& matrix) { if(m_cholmodFactor) { internal::cm_free_factor(m_cholmodFactor, m_cholmod); m_cholmodFactor = 0; } cholmod_sparse A = viewAsCholmod(matrix.template selfadjointView()); m_cholmodFactor = internal::cm_analyze(A, m_cholmod); this->m_isInitialized = true; this->m_info = Success; m_analysisIsOk = true; m_factorizationIsOk = false; } /** Performs a numeric decomposition of \a matrix * * The given matrix must have the same sparsity pattern as the matrix on which the symbolic decomposition has been performed. * * \sa analyzePattern() */ void factorize(const MatrixType& matrix) { eigen_assert(m_analysisIsOk && "You must first call analyzePattern()"); cholmod_sparse A = viewAsCholmod(matrix.template selfadjointView()); internal::cm_factorize_p(&A, m_shiftOffset, 0, 0, m_cholmodFactor, m_cholmod); // If the factorization failed, minor is the column at which it did. On success minor == n. this->m_info = (m_cholmodFactor->minor == m_cholmodFactor->n ? Success : NumericalIssue); m_factorizationIsOk = true; } /** Returns a reference to the Cholmod's configuration structure to get a full control over the performed operations. * See the Cholmod user guide for details. */ cholmod_common& cholmod() { return m_cholmod; } #ifndef EIGEN_PARSED_BY_DOXYGEN /** \internal */ template void _solve_impl(const MatrixBase &b, MatrixBase &dest) const { eigen_assert(m_factorizationIsOk && "The decomposition is not in a valid state for solving, you must first call either compute() or symbolic()/numeric()"); const Index size = m_cholmodFactor->n; EIGEN_UNUSED_VARIABLE(size); eigen_assert(size==b.rows()); // Cholmod needs column-major storage without inner-stride, which corresponds to the default behavior of Ref. Ref > b_ref(b.derived()); cholmod_dense b_cd = viewAsCholmod(b_ref); cholmod_dense* x_cd = internal::cm_solve(CHOLMOD_A, *m_cholmodFactor, b_cd, m_cholmod); if(!x_cd) { this->m_info = NumericalIssue; return; } // TODO optimize this copy by swapping when possible (be careful with alignment, etc.) // NOTE Actually, the copy can be avoided by calling cholmod_solve2 instead of cholmod_solve dest = Matrix::Map(reinterpret_cast(x_cd->x),b.rows(),b.cols()); internal::cm_free_dense(x_cd, m_cholmod); } /** \internal */ template void _solve_impl(const SparseMatrixBase &b, SparseMatrixBase &dest) const { eigen_assert(m_factorizationIsOk && "The decomposition is not in a valid state for solving, you must first call either compute() or symbolic()/numeric()"); const Index size = m_cholmodFactor->n; EIGEN_UNUSED_VARIABLE(size); eigen_assert(size==b.rows()); // note: cs stands for Cholmod Sparse Ref > b_ref(b.const_cast_derived()); cholmod_sparse b_cs = viewAsCholmod(b_ref); cholmod_sparse* x_cs = internal::cm_spsolve(CHOLMOD_A, *m_cholmodFactor, b_cs, m_cholmod); if(!x_cs) { this->m_info = NumericalIssue; return; } // TODO optimize this copy by swapping when possible (be careful with alignment, etc.) // NOTE cholmod_spsolve in fact just calls the dense solver for blocks of 4 columns at a time (similar to Eigen's sparse solver) dest.derived() = viewAsEigen(*x_cs); internal::cm_free_sparse(x_cs, m_cholmod); } #endif // EIGEN_PARSED_BY_DOXYGEN /** Sets the shift parameter that will be used to adjust the diagonal coefficients during the numerical factorization. * * During the numerical factorization, an offset term is added to the diagonal coefficients:\n * \c d_ii = \a offset + \c d_ii * * The default is \a offset=0. * * \returns a reference to \c *this. */ Derived& setShift(const RealScalar& offset) { m_shiftOffset[0] = double(offset); return derived(); } /** \returns the determinant of the underlying matrix from the current factorization */ Scalar determinant() const { using std::exp; return exp(logDeterminant()); } /** \returns the log determinant of the underlying matrix from the current factorization */ Scalar logDeterminant() const { using std::log; using numext::real; eigen_assert(m_factorizationIsOk && "The decomposition is not in a valid state for solving, you must first call either compute() or symbolic()/numeric()"); RealScalar logDet = 0; Scalar *x = static_cast(m_cholmodFactor->x); if (m_cholmodFactor->is_super) { // Supernodal factorization stored as a packed list of dense column-major blocs, // as described by the following structure: // super[k] == index of the first column of the j-th super node StorageIndex *super = static_cast(m_cholmodFactor->super); // pi[k] == offset to the description of row indices StorageIndex *pi = static_cast(m_cholmodFactor->pi); // px[k] == offset to the respective dense block StorageIndex *px = static_cast(m_cholmodFactor->px); Index nb_super_nodes = m_cholmodFactor->nsuper; for (Index k=0; k < nb_super_nodes; ++k) { StorageIndex ncols = super[k + 1] - super[k]; StorageIndex nrows = pi[k + 1] - pi[k]; Map, 0, InnerStride<> > sk(x + px[k], ncols, InnerStride<>(nrows+1)); logDet += sk.real().log().sum(); } } else { // Simplicial factorization stored as standard CSC matrix. StorageIndex *p = static_cast(m_cholmodFactor->p); Index size = m_cholmodFactor->n; for (Index k=0; kis_ll) logDet *= 2.0; return logDet; }; template void dumpMemory(Stream& /*s*/) {} protected: mutable cholmod_common m_cholmod; cholmod_factor* m_cholmodFactor; double m_shiftOffset[2]; mutable ComputationInfo m_info; int m_factorizationIsOk; int m_analysisIsOk; }; /** \ingroup CholmodSupport_Module * \class CholmodSimplicialLLT * \brief A simplicial direct Cholesky (LLT) factorization and solver based on Cholmod * * This class allows to solve for A.X = B sparse linear problems via a simplicial LL^T Cholesky factorization * using the Cholmod library. * This simplicial variant is equivalent to Eigen's built-in SimplicialLLT class. Therefore, it has little practical interest. * The sparse matrix A must be selfadjoint and positive definite. The vectors or matrices * X and B can be either dense or sparse. * * \tparam _MatrixType the type of the sparse matrix A, it must be a SparseMatrix<> * \tparam _UpLo the triangular part that will be used for the computations. It can be Lower * or Upper. Default is Lower. * * \implsparsesolverconcept * * This class supports all kind of SparseMatrix<>: row or column major; upper, lower, or both; compressed or non compressed. * * \warning Only double precision real and complex scalar types are supported by Cholmod. * * \sa \ref TutorialSparseSolverConcept, class CholmodSupernodalLLT, class SimplicialLLT */ template class CholmodSimplicialLLT : public CholmodBase<_MatrixType, _UpLo, CholmodSimplicialLLT<_MatrixType, _UpLo> > { typedef CholmodBase<_MatrixType, _UpLo, CholmodSimplicialLLT> Base; using Base::m_cholmod; public: typedef _MatrixType MatrixType; CholmodSimplicialLLT() : Base() { init(); } CholmodSimplicialLLT(const MatrixType& matrix) : Base() { init(); this->compute(matrix); } ~CholmodSimplicialLLT() {} protected: void init() { m_cholmod.final_asis = 0; m_cholmod.supernodal = CHOLMOD_SIMPLICIAL; m_cholmod.final_ll = 1; } }; /** \ingroup CholmodSupport_Module * \class CholmodSimplicialLDLT * \brief A simplicial direct Cholesky (LDLT) factorization and solver based on Cholmod * * This class allows to solve for A.X = B sparse linear problems via a simplicial LDL^T Cholesky factorization * using the Cholmod library. * This simplicial variant is equivalent to Eigen's built-in SimplicialLDLT class. Therefore, it has little practical interest. * The sparse matrix A must be selfadjoint and positive definite. The vectors or matrices * X and B can be either dense or sparse. * * \tparam _MatrixType the type of the sparse matrix A, it must be a SparseMatrix<> * \tparam _UpLo the triangular part that will be used for the computations. It can be Lower * or Upper. Default is Lower. * * \implsparsesolverconcept * * This class supports all kind of SparseMatrix<>: row or column major; upper, lower, or both; compressed or non compressed. * * \warning Only double precision real and complex scalar types are supported by Cholmod. * * \sa \ref TutorialSparseSolverConcept, class CholmodSupernodalLLT, class SimplicialLDLT */ template class CholmodSimplicialLDLT : public CholmodBase<_MatrixType, _UpLo, CholmodSimplicialLDLT<_MatrixType, _UpLo> > { typedef CholmodBase<_MatrixType, _UpLo, CholmodSimplicialLDLT> Base; using Base::m_cholmod; public: typedef _MatrixType MatrixType; CholmodSimplicialLDLT() : Base() { init(); } CholmodSimplicialLDLT(const MatrixType& matrix) : Base() { init(); this->compute(matrix); } ~CholmodSimplicialLDLT() {} protected: void init() { m_cholmod.final_asis = 1; m_cholmod.supernodal = CHOLMOD_SIMPLICIAL; } }; /** \ingroup CholmodSupport_Module * \class CholmodSupernodalLLT * \brief A supernodal Cholesky (LLT) factorization and solver based on Cholmod * * This class allows to solve for A.X = B sparse linear problems via a supernodal LL^T Cholesky factorization * using the Cholmod library. * This supernodal variant performs best on dense enough problems, e.g., 3D FEM, or very high order 2D FEM. * The sparse matrix A must be selfadjoint and positive definite. The vectors or matrices * X and B can be either dense or sparse. * * \tparam _MatrixType the type of the sparse matrix A, it must be a SparseMatrix<> * \tparam _UpLo the triangular part that will be used for the computations. It can be Lower * or Upper. Default is Lower. * * \implsparsesolverconcept * * This class supports all kind of SparseMatrix<>: row or column major; upper, lower, or both; compressed or non compressed. * * \warning Only double precision real and complex scalar types are supported by Cholmod. * * \sa \ref TutorialSparseSolverConcept */ template class CholmodSupernodalLLT : public CholmodBase<_MatrixType, _UpLo, CholmodSupernodalLLT<_MatrixType, _UpLo> > { typedef CholmodBase<_MatrixType, _UpLo, CholmodSupernodalLLT> Base; using Base::m_cholmod; public: typedef _MatrixType MatrixType; CholmodSupernodalLLT() : Base() { init(); } CholmodSupernodalLLT(const MatrixType& matrix) : Base() { init(); this->compute(matrix); } ~CholmodSupernodalLLT() {} protected: void init() { m_cholmod.final_asis = 1; m_cholmod.supernodal = CHOLMOD_SUPERNODAL; } }; /** \ingroup CholmodSupport_Module * \class CholmodDecomposition * \brief A general Cholesky factorization and solver based on Cholmod * * This class allows to solve for A.X = B sparse linear problems via a LL^T or LDL^T Cholesky factorization * using the Cholmod library. The sparse matrix A must be selfadjoint and positive definite. The vectors or matrices * X and B can be either dense or sparse. * * This variant permits to change the underlying Cholesky method at runtime. * On the other hand, it does not provide access to the result of the factorization. * The default is to let Cholmod automatically choose between a simplicial and supernodal factorization. * * \tparam _MatrixType the type of the sparse matrix A, it must be a SparseMatrix<> * \tparam _UpLo the triangular part that will be used for the computations. It can be Lower * or Upper. Default is Lower. * * \implsparsesolverconcept * * This class supports all kind of SparseMatrix<>: row or column major; upper, lower, or both; compressed or non compressed. * * \warning Only double precision real and complex scalar types are supported by Cholmod. * * \sa \ref TutorialSparseSolverConcept */ template class CholmodDecomposition : public CholmodBase<_MatrixType, _UpLo, CholmodDecomposition<_MatrixType, _UpLo> > { typedef CholmodBase<_MatrixType, _UpLo, CholmodDecomposition> Base; using Base::m_cholmod; public: typedef _MatrixType MatrixType; CholmodDecomposition() : Base() { init(); } CholmodDecomposition(const MatrixType& matrix) : Base() { init(); this->compute(matrix); } ~CholmodDecomposition() {} void setMode(CholmodMode mode) { switch(mode) { case CholmodAuto: m_cholmod.final_asis = 1; m_cholmod.supernodal = CHOLMOD_AUTO; break; case CholmodSimplicialLLt: m_cholmod.final_asis = 0; m_cholmod.supernodal = CHOLMOD_SIMPLICIAL; m_cholmod.final_ll = 1; break; case CholmodSupernodalLLt: m_cholmod.final_asis = 1; m_cholmod.supernodal = CHOLMOD_SUPERNODAL; break; case CholmodLDLt: m_cholmod.final_asis = 1; m_cholmod.supernodal = CHOLMOD_SIMPLICIAL; break; default: break; } } protected: void init() { m_cholmod.final_asis = 1; m_cholmod.supernodal = CHOLMOD_AUTO; } }; } // end namespace Eigen #endif // EIGEN_CHOLMODSUPPORT_H RcppEigen/inst/include/Eigen/src/SparseCholesky/0000755000176200001440000000000014562417221021307 5ustar liggesusersRcppEigen/inst/include/Eigen/src/SparseCholesky/SimplicialCholesky_impl.h0000644000176200001440000001330614562417221026274 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2012 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. /* NOTE: these functions have been adapted from the LDL library: LDL Copyright (c) 2005 by Timothy A. Davis. All Rights Reserved. The author of LDL, Timothy A. Davis., has executed a license with Google LLC to permit distribution of this code and derivative works as part of Eigen under the Mozilla Public License v. 2.0, as stated at the top of this file. */ #ifndef EIGEN_SIMPLICIAL_CHOLESKY_IMPL_H #define EIGEN_SIMPLICIAL_CHOLESKY_IMPL_H namespace Eigen { template void SimplicialCholeskyBase::analyzePattern_preordered(const CholMatrixType& ap, bool doLDLT) { const StorageIndex size = StorageIndex(ap.rows()); m_matrix.resize(size, size); m_parent.resize(size); m_nonZerosPerCol.resize(size); ei_declare_aligned_stack_constructed_variable(StorageIndex, tags, size, 0); for(StorageIndex k = 0; k < size; ++k) { /* L(k,:) pattern: all nodes reachable in etree from nz in A(0:k-1,k) */ m_parent[k] = -1; /* parent of k is not yet known */ tags[k] = k; /* mark node k as visited */ m_nonZerosPerCol[k] = 0; /* count of nonzeros in column k of L */ for(typename CholMatrixType::InnerIterator it(ap,k); it; ++it) { StorageIndex i = it.index(); if(i < k) { /* follow path from i to root of etree, stop at flagged node */ for(; tags[i] != k; i = m_parent[i]) { /* find parent of i if not yet determined */ if (m_parent[i] == -1) m_parent[i] = k; m_nonZerosPerCol[i]++; /* L (k,i) is nonzero */ tags[i] = k; /* mark i as visited */ } } } } /* construct Lp index array from m_nonZerosPerCol column counts */ StorageIndex* Lp = m_matrix.outerIndexPtr(); Lp[0] = 0; for(StorageIndex k = 0; k < size; ++k) Lp[k+1] = Lp[k] + m_nonZerosPerCol[k] + (doLDLT ? 0 : 1); m_matrix.resizeNonZeros(Lp[size]); m_isInitialized = true; m_info = Success; m_analysisIsOk = true; m_factorizationIsOk = false; } template template void SimplicialCholeskyBase::factorize_preordered(const CholMatrixType& ap) { using std::sqrt; eigen_assert(m_analysisIsOk && "You must first call analyzePattern()"); eigen_assert(ap.rows()==ap.cols()); eigen_assert(m_parent.size()==ap.rows()); eigen_assert(m_nonZerosPerCol.size()==ap.rows()); const StorageIndex size = StorageIndex(ap.rows()); const StorageIndex* Lp = m_matrix.outerIndexPtr(); StorageIndex* Li = m_matrix.innerIndexPtr(); Scalar* Lx = m_matrix.valuePtr(); ei_declare_aligned_stack_constructed_variable(Scalar, y, size, 0); ei_declare_aligned_stack_constructed_variable(StorageIndex, pattern, size, 0); ei_declare_aligned_stack_constructed_variable(StorageIndex, tags, size, 0); bool ok = true; m_diag.resize(DoLDLT ? size : 0); for(StorageIndex k = 0; k < size; ++k) { // compute nonzero pattern of kth row of L, in topological order y[k] = Scalar(0); // Y(0:k) is now all zero StorageIndex top = size; // stack for pattern is empty tags[k] = k; // mark node k as visited m_nonZerosPerCol[k] = 0; // count of nonzeros in column k of L for(typename CholMatrixType::InnerIterator it(ap,k); it; ++it) { StorageIndex i = it.index(); if(i <= k) { y[i] += numext::conj(it.value()); /* scatter A(i,k) into Y (sum duplicates) */ Index len; for(len = 0; tags[i] != k; i = m_parent[i]) { pattern[len++] = i; /* L(k,i) is nonzero */ tags[i] = k; /* mark i as visited */ } while(len > 0) pattern[--top] = pattern[--len]; } } /* compute numerical values kth row of L (a sparse triangular solve) */ RealScalar d = numext::real(y[k]) * m_shiftScale + m_shiftOffset; // get D(k,k), apply the shift function, and clear Y(k) y[k] = Scalar(0); for(; top < size; ++top) { Index i = pattern[top]; /* pattern[top:n-1] is pattern of L(:,k) */ Scalar yi = y[i]; /* get and clear Y(i) */ y[i] = Scalar(0); /* the nonzero entry L(k,i) */ Scalar l_ki; if(DoLDLT) l_ki = yi / numext::real(m_diag[i]); else yi = l_ki = yi / Lx[Lp[i]]; Index p2 = Lp[i] + m_nonZerosPerCol[i]; Index p; for(p = Lp[i] + (DoLDLT ? 0 : 1); p < p2; ++p) y[Li[p]] -= numext::conj(Lx[p]) * yi; d -= numext::real(l_ki * numext::conj(yi)); Li[p] = k; /* store L(k,i) in column form of L */ Lx[p] = l_ki; ++m_nonZerosPerCol[i]; /* increment count of nonzeros in col i */ } if(DoLDLT) { m_diag[k] = d; if(d == RealScalar(0)) { ok = false; /* failure, D(k,k) is zero */ break; } } else { Index p = Lp[k] + m_nonZerosPerCol[k]++; Li[p] = k ; /* store L(k,k) = sqrt (d) in column k */ if(d <= RealScalar(0)) { ok = false; /* failure, matrix is not positive definite */ break; } Lx[p] = sqrt(d) ; } } m_info = ok ? Success : NumericalIssue; m_factorizationIsOk = true; } } // end namespace Eigen #endif // EIGEN_SIMPLICIAL_CHOLESKY_IMPL_H RcppEigen/inst/include/Eigen/src/SparseCholesky/SimplicialCholesky.h0000644000176200001440000005723014562417221025257 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2012 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SIMPLICIAL_CHOLESKY_H #define EIGEN_SIMPLICIAL_CHOLESKY_H namespace Eigen { enum SimplicialCholeskyMode { SimplicialCholeskyLLT, SimplicialCholeskyLDLT }; namespace internal { template struct simplicial_cholesky_grab_input { typedef CholMatrixType const * ConstCholMatrixPtr; static void run(const InputMatrixType& input, ConstCholMatrixPtr &pmat, CholMatrixType &tmp) { tmp = input; pmat = &tmp; } }; template struct simplicial_cholesky_grab_input { typedef MatrixType const * ConstMatrixPtr; static void run(const MatrixType& input, ConstMatrixPtr &pmat, MatrixType &/*tmp*/) { pmat = &input; } }; } // end namespace internal /** \ingroup SparseCholesky_Module * \brief A base class for direct sparse Cholesky factorizations * * This is a base class for LL^T and LDL^T Cholesky factorizations of sparse matrices that are * selfadjoint and positive definite. These factorizations allow for solving A.X = B where * X and B can be either dense or sparse. * * In order to reduce the fill-in, a symmetric permutation P is applied prior to the factorization * such that the factorized matrix is P A P^-1. * * \tparam Derived the type of the derived class, that is the actual factorization type. * */ template class SimplicialCholeskyBase : public SparseSolverBase { typedef SparseSolverBase Base; using Base::m_isInitialized; public: typedef typename internal::traits::MatrixType MatrixType; typedef typename internal::traits::OrderingType OrderingType; enum { UpLo = internal::traits::UpLo }; typedef typename MatrixType::Scalar Scalar; typedef typename MatrixType::RealScalar RealScalar; typedef typename MatrixType::StorageIndex StorageIndex; typedef SparseMatrix CholMatrixType; typedef CholMatrixType const * ConstCholMatrixPtr; typedef Matrix VectorType; typedef Matrix VectorI; enum { ColsAtCompileTime = MatrixType::ColsAtCompileTime, MaxColsAtCompileTime = MatrixType::MaxColsAtCompileTime }; public: using Base::derived; /** Default constructor */ SimplicialCholeskyBase() : m_info(Success), m_factorizationIsOk(false), m_analysisIsOk(false), m_shiftOffset(0), m_shiftScale(1) {} explicit SimplicialCholeskyBase(const MatrixType& matrix) : m_info(Success), m_factorizationIsOk(false), m_analysisIsOk(false), m_shiftOffset(0), m_shiftScale(1) { derived().compute(matrix); } ~SimplicialCholeskyBase() { } Derived& derived() { return *static_cast(this); } const Derived& derived() const { return *static_cast(this); } inline Index cols() const { return m_matrix.cols(); } inline Index rows() const { return m_matrix.rows(); } /** \brief Reports whether previous computation was successful. * * \returns \c Success if computation was successful, * \c NumericalIssue if the matrix.appears to be negative. */ ComputationInfo info() const { eigen_assert(m_isInitialized && "Decomposition is not initialized."); return m_info; } /** \returns the permutation P * \sa permutationPinv() */ const PermutationMatrix& permutationP() const { return m_P; } /** \returns the inverse P^-1 of the permutation P * \sa permutationP() */ const PermutationMatrix& permutationPinv() const { return m_Pinv; } /** Sets the shift parameters that will be used to adjust the diagonal coefficients during the numerical factorization. * * During the numerical factorization, the diagonal coefficients are transformed by the following linear model:\n * \c d_ii = \a offset + \a scale * \c d_ii * * The default is the identity transformation with \a offset=0, and \a scale=1. * * \returns a reference to \c *this. */ Derived& setShift(const RealScalar& offset, const RealScalar& scale = 1) { m_shiftOffset = offset; m_shiftScale = scale; return derived(); } #ifndef EIGEN_PARSED_BY_DOXYGEN /** \internal */ template void dumpMemory(Stream& s) { int total = 0; s << " L: " << ((total+=(m_matrix.cols()+1) * sizeof(int) + m_matrix.nonZeros()*(sizeof(int)+sizeof(Scalar))) >> 20) << "Mb" << "\n"; s << " diag: " << ((total+=m_diag.size() * sizeof(Scalar)) >> 20) << "Mb" << "\n"; s << " tree: " << ((total+=m_parent.size() * sizeof(int)) >> 20) << "Mb" << "\n"; s << " nonzeros: " << ((total+=m_nonZerosPerCol.size() * sizeof(int)) >> 20) << "Mb" << "\n"; s << " perm: " << ((total+=m_P.size() * sizeof(int)) >> 20) << "Mb" << "\n"; s << " perm^-1: " << ((total+=m_Pinv.size() * sizeof(int)) >> 20) << "Mb" << "\n"; s << " TOTAL: " << (total>> 20) << "Mb" << "\n"; } /** \internal */ template void _solve_impl(const MatrixBase &b, MatrixBase &dest) const { eigen_assert(m_factorizationIsOk && "The decomposition is not in a valid state for solving, you must first call either compute() or symbolic()/numeric()"); eigen_assert(m_matrix.rows()==b.rows()); if(m_info!=Success) return; if(m_P.size()>0) dest = m_P * b; else dest = b; if(m_matrix.nonZeros()>0) // otherwise L==I derived().matrixL().solveInPlace(dest); if(m_diag.size()>0) dest = m_diag.asDiagonal().inverse() * dest; if (m_matrix.nonZeros()>0) // otherwise U==I derived().matrixU().solveInPlace(dest); if(m_P.size()>0) dest = m_Pinv * dest; } template void _solve_impl(const SparseMatrixBase &b, SparseMatrixBase &dest) const { internal::solve_sparse_through_dense_panels(derived(), b, dest); } #endif // EIGEN_PARSED_BY_DOXYGEN protected: /** Computes the sparse Cholesky decomposition of \a matrix */ template void compute(const MatrixType& matrix) { eigen_assert(matrix.rows()==matrix.cols()); Index size = matrix.cols(); CholMatrixType tmp(size,size); ConstCholMatrixPtr pmat; ordering(matrix, pmat, tmp); analyzePattern_preordered(*pmat, DoLDLT); factorize_preordered(*pmat); } template void factorize(const MatrixType& a) { eigen_assert(a.rows()==a.cols()); Index size = a.cols(); CholMatrixType tmp(size,size); ConstCholMatrixPtr pmat; if(m_P.size() == 0 && (int(UpLo) & int(Upper)) == Upper) { // If there is no ordering, try to directly use the input matrix without any copy internal::simplicial_cholesky_grab_input::run(a, pmat, tmp); } else { tmp.template selfadjointView() = a.template selfadjointView().twistedBy(m_P); pmat = &tmp; } factorize_preordered(*pmat); } template void factorize_preordered(const CholMatrixType& a); void analyzePattern(const MatrixType& a, bool doLDLT) { eigen_assert(a.rows()==a.cols()); Index size = a.cols(); CholMatrixType tmp(size,size); ConstCholMatrixPtr pmat; ordering(a, pmat, tmp); analyzePattern_preordered(*pmat,doLDLT); } void analyzePattern_preordered(const CholMatrixType& a, bool doLDLT); void ordering(const MatrixType& a, ConstCholMatrixPtr &pmat, CholMatrixType& ap); /** keeps off-diagonal entries; drops diagonal entries */ struct keep_diag { inline bool operator() (const Index& row, const Index& col, const Scalar&) const { return row!=col; } }; mutable ComputationInfo m_info; bool m_factorizationIsOk; bool m_analysisIsOk; CholMatrixType m_matrix; VectorType m_diag; // the diagonal coefficients (LDLT mode) VectorI m_parent; // elimination tree VectorI m_nonZerosPerCol; PermutationMatrix m_P; // the permutation PermutationMatrix m_Pinv; // the inverse permutation RealScalar m_shiftOffset; RealScalar m_shiftScale; }; template > class SimplicialLLT; template > class SimplicialLDLT; template > class SimplicialCholesky; namespace internal { template struct traits > { typedef _MatrixType MatrixType; typedef _Ordering OrderingType; enum { UpLo = _UpLo }; typedef typename MatrixType::Scalar Scalar; typedef typename MatrixType::StorageIndex StorageIndex; typedef SparseMatrix CholMatrixType; typedef TriangularView MatrixL; typedef TriangularView MatrixU; static inline MatrixL getL(const CholMatrixType& m) { return MatrixL(m); } static inline MatrixU getU(const CholMatrixType& m) { return MatrixU(m.adjoint()); } }; template struct traits > { typedef _MatrixType MatrixType; typedef _Ordering OrderingType; enum { UpLo = _UpLo }; typedef typename MatrixType::Scalar Scalar; typedef typename MatrixType::StorageIndex StorageIndex; typedef SparseMatrix CholMatrixType; typedef TriangularView MatrixL; typedef TriangularView MatrixU; static inline MatrixL getL(const CholMatrixType& m) { return MatrixL(m); } static inline MatrixU getU(const CholMatrixType& m) { return MatrixU(m.adjoint()); } }; template struct traits > { typedef _MatrixType MatrixType; typedef _Ordering OrderingType; enum { UpLo = _UpLo }; }; } /** \ingroup SparseCholesky_Module * \class SimplicialLLT * \brief A direct sparse LLT Cholesky factorizations * * This class provides a LL^T Cholesky factorizations of sparse matrices that are * selfadjoint and positive definite. The factorization allows for solving A.X = B where * X and B can be either dense or sparse. * * In order to reduce the fill-in, a symmetric permutation P is applied prior to the factorization * such that the factorized matrix is P A P^-1. * * \tparam _MatrixType the type of the sparse matrix A, it must be a SparseMatrix<> * \tparam _UpLo the triangular part that will be used for the computations. It can be Lower * or Upper. Default is Lower. * \tparam _Ordering The ordering method to use, either AMDOrdering<> or NaturalOrdering<>. Default is AMDOrdering<> * * \implsparsesolverconcept * * \sa class SimplicialLDLT, class AMDOrdering, class NaturalOrdering */ template class SimplicialLLT : public SimplicialCholeskyBase > { public: typedef _MatrixType MatrixType; enum { UpLo = _UpLo }; typedef SimplicialCholeskyBase Base; typedef typename MatrixType::Scalar Scalar; typedef typename MatrixType::RealScalar RealScalar; typedef typename MatrixType::StorageIndex StorageIndex; typedef SparseMatrix CholMatrixType; typedef Matrix VectorType; typedef internal::traits Traits; typedef typename Traits::MatrixL MatrixL; typedef typename Traits::MatrixU MatrixU; public: /** Default constructor */ SimplicialLLT() : Base() {} /** Constructs and performs the LLT factorization of \a matrix */ explicit SimplicialLLT(const MatrixType& matrix) : Base(matrix) {} /** \returns an expression of the factor L */ inline const MatrixL matrixL() const { eigen_assert(Base::m_factorizationIsOk && "Simplicial LLT not factorized"); return Traits::getL(Base::m_matrix); } /** \returns an expression of the factor U (= L^*) */ inline const MatrixU matrixU() const { eigen_assert(Base::m_factorizationIsOk && "Simplicial LLT not factorized"); return Traits::getU(Base::m_matrix); } /** Computes the sparse Cholesky decomposition of \a matrix */ SimplicialLLT& compute(const MatrixType& matrix) { Base::template compute(matrix); return *this; } /** Performs a symbolic decomposition on the sparcity of \a matrix. * * This function is particularly useful when solving for several problems having the same structure. * * \sa factorize() */ void analyzePattern(const MatrixType& a) { Base::analyzePattern(a, false); } /** Performs a numeric decomposition of \a matrix * * The given matrix must has the same sparcity than the matrix on which the symbolic decomposition has been performed. * * \sa analyzePattern() */ void factorize(const MatrixType& a) { Base::template factorize(a); } /** \returns the determinant of the underlying matrix from the current factorization */ Scalar determinant() const { Scalar detL = Base::m_matrix.diagonal().prod(); return numext::abs2(detL); } }; /** \ingroup SparseCholesky_Module * \class SimplicialLDLT * \brief A direct sparse LDLT Cholesky factorizations without square root. * * This class provides a LDL^T Cholesky factorizations without square root of sparse matrices that are * selfadjoint and positive definite. The factorization allows for solving A.X = B where * X and B can be either dense or sparse. * * In order to reduce the fill-in, a symmetric permutation P is applied prior to the factorization * such that the factorized matrix is P A P^-1. * * \tparam _MatrixType the type of the sparse matrix A, it must be a SparseMatrix<> * \tparam _UpLo the triangular part that will be used for the computations. It can be Lower * or Upper. Default is Lower. * \tparam _Ordering The ordering method to use, either AMDOrdering<> or NaturalOrdering<>. Default is AMDOrdering<> * * \implsparsesolverconcept * * \sa class SimplicialLLT, class AMDOrdering, class NaturalOrdering */ template class SimplicialLDLT : public SimplicialCholeskyBase > { public: typedef _MatrixType MatrixType; enum { UpLo = _UpLo }; typedef SimplicialCholeskyBase Base; typedef typename MatrixType::Scalar Scalar; typedef typename MatrixType::RealScalar RealScalar; typedef typename MatrixType::StorageIndex StorageIndex; typedef SparseMatrix CholMatrixType; typedef Matrix VectorType; typedef internal::traits Traits; typedef typename Traits::MatrixL MatrixL; typedef typename Traits::MatrixU MatrixU; public: /** Default constructor */ SimplicialLDLT() : Base() {} /** Constructs and performs the LLT factorization of \a matrix */ explicit SimplicialLDLT(const MatrixType& matrix) : Base(matrix) {} /** \returns a vector expression of the diagonal D */ inline const VectorType vectorD() const { eigen_assert(Base::m_factorizationIsOk && "Simplicial LDLT not factorized"); return Base::m_diag; } /** \returns an expression of the factor L */ inline const MatrixL matrixL() const { eigen_assert(Base::m_factorizationIsOk && "Simplicial LDLT not factorized"); return Traits::getL(Base::m_matrix); } /** \returns an expression of the factor U (= L^*) */ inline const MatrixU matrixU() const { eigen_assert(Base::m_factorizationIsOk && "Simplicial LDLT not factorized"); return Traits::getU(Base::m_matrix); } /** Computes the sparse Cholesky decomposition of \a matrix */ SimplicialLDLT& compute(const MatrixType& matrix) { Base::template compute(matrix); return *this; } /** Performs a symbolic decomposition on the sparcity of \a matrix. * * This function is particularly useful when solving for several problems having the same structure. * * \sa factorize() */ void analyzePattern(const MatrixType& a) { Base::analyzePattern(a, true); } /** Performs a numeric decomposition of \a matrix * * The given matrix must has the same sparcity than the matrix on which the symbolic decomposition has been performed. * * \sa analyzePattern() */ void factorize(const MatrixType& a) { Base::template factorize(a); } /** \returns the determinant of the underlying matrix from the current factorization */ Scalar determinant() const { return Base::m_diag.prod(); } }; /** \deprecated use SimplicialLDLT or class SimplicialLLT * \ingroup SparseCholesky_Module * \class SimplicialCholesky * * \sa class SimplicialLDLT, class SimplicialLLT */ template class SimplicialCholesky : public SimplicialCholeskyBase > { public: typedef _MatrixType MatrixType; enum { UpLo = _UpLo }; typedef SimplicialCholeskyBase Base; typedef typename MatrixType::Scalar Scalar; typedef typename MatrixType::RealScalar RealScalar; typedef typename MatrixType::StorageIndex StorageIndex; typedef SparseMatrix CholMatrixType; typedef Matrix VectorType; typedef internal::traits Traits; typedef internal::traits > LDLTTraits; typedef internal::traits > LLTTraits; public: SimplicialCholesky() : Base(), m_LDLT(true) {} explicit SimplicialCholesky(const MatrixType& matrix) : Base(), m_LDLT(true) { compute(matrix); } SimplicialCholesky& setMode(SimplicialCholeskyMode mode) { switch(mode) { case SimplicialCholeskyLLT: m_LDLT = false; break; case SimplicialCholeskyLDLT: m_LDLT = true; break; default: break; } return *this; } inline const VectorType vectorD() const { eigen_assert(Base::m_factorizationIsOk && "Simplicial Cholesky not factorized"); return Base::m_diag; } inline const CholMatrixType rawMatrix() const { eigen_assert(Base::m_factorizationIsOk && "Simplicial Cholesky not factorized"); return Base::m_matrix; } /** Computes the sparse Cholesky decomposition of \a matrix */ SimplicialCholesky& compute(const MatrixType& matrix) { if(m_LDLT) Base::template compute(matrix); else Base::template compute(matrix); return *this; } /** Performs a symbolic decomposition on the sparcity of \a matrix. * * This function is particularly useful when solving for several problems having the same structure. * * \sa factorize() */ void analyzePattern(const MatrixType& a) { Base::analyzePattern(a, m_LDLT); } /** Performs a numeric decomposition of \a matrix * * The given matrix must has the same sparcity than the matrix on which the symbolic decomposition has been performed. * * \sa analyzePattern() */ void factorize(const MatrixType& a) { if(m_LDLT) Base::template factorize(a); else Base::template factorize(a); } /** \internal */ template void _solve_impl(const MatrixBase &b, MatrixBase &dest) const { eigen_assert(Base::m_factorizationIsOk && "The decomposition is not in a valid state for solving, you must first call either compute() or symbolic()/numeric()"); eigen_assert(Base::m_matrix.rows()==b.rows()); if(Base::m_info!=Success) return; if(Base::m_P.size()>0) dest = Base::m_P * b; else dest = b; if(Base::m_matrix.nonZeros()>0) // otherwise L==I { if(m_LDLT) LDLTTraits::getL(Base::m_matrix).solveInPlace(dest); else LLTTraits::getL(Base::m_matrix).solveInPlace(dest); } if(Base::m_diag.size()>0) dest = Base::m_diag.real().asDiagonal().inverse() * dest; if (Base::m_matrix.nonZeros()>0) // otherwise I==I { if(m_LDLT) LDLTTraits::getU(Base::m_matrix).solveInPlace(dest); else LLTTraits::getU(Base::m_matrix).solveInPlace(dest); } if(Base::m_P.size()>0) dest = Base::m_Pinv * dest; } /** \internal */ template void _solve_impl(const SparseMatrixBase &b, SparseMatrixBase &dest) const { internal::solve_sparse_through_dense_panels(*this, b, dest); } Scalar determinant() const { if(m_LDLT) { return Base::m_diag.prod(); } else { Scalar detL = Diagonal(Base::m_matrix).prod(); return numext::abs2(detL); } } protected: bool m_LDLT; }; template void SimplicialCholeskyBase::ordering(const MatrixType& a, ConstCholMatrixPtr &pmat, CholMatrixType& ap) { eigen_assert(a.rows()==a.cols()); const Index size = a.rows(); pmat = ≈ // Note that ordering methods compute the inverse permutation if(!internal::is_same >::value) { { CholMatrixType C; C = a.template selfadjointView(); OrderingType ordering; ordering(C,m_Pinv); } if(m_Pinv.size()>0) m_P = m_Pinv.inverse(); else m_P.resize(0); ap.resize(size,size); ap.template selfadjointView() = a.template selfadjointView().twistedBy(m_P); } else { m_Pinv.resize(0); m_P.resize(0); if(int(UpLo)==int(Lower) || MatrixType::IsRowMajor) { // we have to transpose the lower part to to the upper one ap.resize(size,size); ap.template selfadjointView() = a.template selfadjointView(); } else internal::simplicial_cholesky_grab_input::run(a, pmat, ap); } } } // end namespace Eigen #endif // EIGEN_SIMPLICIAL_CHOLESKY_H RcppEigen/inst/include/Eigen/src/Core/0000755000176200001440000000000014562417221017240 5ustar liggesusersRcppEigen/inst/include/Eigen/src/Core/Solve.h0000644000176200001440000001526514562417221020512 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SOLVE_H #define EIGEN_SOLVE_H namespace Eigen { template class SolveImpl; /** \class Solve * \ingroup Core_Module * * \brief Pseudo expression representing a solving operation * * \tparam Decomposition the type of the matrix or decomposition object * \tparam Rhstype the type of the right-hand side * * This class represents an expression of A.solve(B) * and most of the time this is the only way it is used. * */ namespace internal { // this solve_traits class permits to determine the evaluation type with respect to storage kind (Dense vs Sparse) template struct solve_traits; template struct solve_traits { typedef typename make_proper_matrix_type::type PlainObject; }; template struct traits > : traits::StorageKind>::PlainObject> { typedef typename solve_traits::StorageKind>::PlainObject PlainObject; typedef typename promote_index_type::type StorageIndex; typedef traits BaseTraits; enum { Flags = BaseTraits::Flags & RowMajorBit, CoeffReadCost = HugeCost }; }; } template class Solve : public SolveImpl::StorageKind> { public: typedef typename internal::traits::PlainObject PlainObject; typedef typename internal::traits::StorageIndex StorageIndex; Solve(const Decomposition &dec, const RhsType &rhs) : m_dec(dec), m_rhs(rhs) {} EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index rows() const EIGEN_NOEXCEPT { return m_dec.cols(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index cols() const EIGEN_NOEXCEPT { return m_rhs.cols(); } EIGEN_DEVICE_FUNC const Decomposition& dec() const { return m_dec; } EIGEN_DEVICE_FUNC const RhsType& rhs() const { return m_rhs; } protected: const Decomposition &m_dec; const RhsType &m_rhs; }; // Specialization of the Solve expression for dense results template class SolveImpl : public MatrixBase > { typedef Solve Derived; public: typedef MatrixBase > Base; EIGEN_DENSE_PUBLIC_INTERFACE(Derived) private: Scalar coeff(Index row, Index col) const; Scalar coeff(Index i) const; }; // Generic API dispatcher template class SolveImpl : public internal::generic_xpr_base, MatrixXpr, StorageKind>::type { public: typedef typename internal::generic_xpr_base, MatrixXpr, StorageKind>::type Base; }; namespace internal { // Evaluator of Solve -> eval into a temporary template struct evaluator > : public evaluator::PlainObject> { typedef Solve SolveType; typedef typename SolveType::PlainObject PlainObject; typedef evaluator Base; enum { Flags = Base::Flags | EvalBeforeNestingBit }; EIGEN_DEVICE_FUNC explicit evaluator(const SolveType& solve) : m_result(solve.rows(), solve.cols()) { ::new (static_cast(this)) Base(m_result); solve.dec()._solve_impl(solve.rhs(), m_result); } protected: PlainObject m_result; }; // Specialization for "dst = dec.solve(rhs)" // NOTE we need to specialize it for Dense2Dense to avoid ambiguous specialization error and a Sparse2Sparse specialization must exist somewhere template struct Assignment, internal::assign_op, Dense2Dense> { typedef Solve SrcXprType; static void run(DstXprType &dst, const SrcXprType &src, const internal::assign_op &) { Index dstRows = src.rows(); Index dstCols = src.cols(); if((dst.rows()!=dstRows) || (dst.cols()!=dstCols)) dst.resize(dstRows, dstCols); src.dec()._solve_impl(src.rhs(), dst); } }; // Specialization for "dst = dec.transpose().solve(rhs)" template struct Assignment,RhsType>, internal::assign_op, Dense2Dense> { typedef Solve,RhsType> SrcXprType; static void run(DstXprType &dst, const SrcXprType &src, const internal::assign_op &) { Index dstRows = src.rows(); Index dstCols = src.cols(); if((dst.rows()!=dstRows) || (dst.cols()!=dstCols)) dst.resize(dstRows, dstCols); src.dec().nestedExpression().template _solve_impl_transposed(src.rhs(), dst); } }; // Specialization for "dst = dec.adjoint().solve(rhs)" template struct Assignment, const Transpose >,RhsType>, internal::assign_op, Dense2Dense> { typedef Solve, const Transpose >,RhsType> SrcXprType; static void run(DstXprType &dst, const SrcXprType &src, const internal::assign_op &) { Index dstRows = src.rows(); Index dstCols = src.cols(); if((dst.rows()!=dstRows) || (dst.cols()!=dstCols)) dst.resize(dstRows, dstCols); src.dec().nestedExpression().nestedExpression().template _solve_impl_transposed(src.rhs(), dst); } }; } // end namespace internal } // end namespace Eigen #endif // EIGEN_SOLVE_H RcppEigen/inst/include/Eigen/src/Core/CwiseUnaryOp.h0000644000176200001440000000754114562417221022010 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2014 Gael Guennebaud // Copyright (C) 2006-2008 Benoit Jacob // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CWISE_UNARY_OP_H #define EIGEN_CWISE_UNARY_OP_H namespace Eigen { namespace internal { template struct traits > : traits { typedef typename result_of< UnaryOp(const typename XprType::Scalar&) >::type Scalar; typedef typename XprType::Nested XprTypeNested; typedef typename remove_reference::type _XprTypeNested; enum { Flags = _XprTypeNested::Flags & RowMajorBit }; }; } template class CwiseUnaryOpImpl; /** \class CwiseUnaryOp * \ingroup Core_Module * * \brief Generic expression where a coefficient-wise unary operator is applied to an expression * * \tparam UnaryOp template functor implementing the operator * \tparam XprType the type of the expression to which we are applying the unary operator * * This class represents an expression where a unary operator is applied to an expression. * It is the return type of all operations taking exactly 1 input expression, regardless of the * presence of other inputs such as scalars. For example, the operator* in the expression 3*matrix * is considered unary, because only the right-hand side is an expression, and its * return type is a specialization of CwiseUnaryOp. * * Most of the time, this is the only way that it is used, so you typically don't have to name * CwiseUnaryOp types explicitly. * * \sa MatrixBase::unaryExpr(const CustomUnaryOp &) const, class CwiseBinaryOp, class CwiseNullaryOp */ template class CwiseUnaryOp : public CwiseUnaryOpImpl::StorageKind>, internal::no_assignment_operator { public: typedef typename CwiseUnaryOpImpl::StorageKind>::Base Base; EIGEN_GENERIC_PUBLIC_INTERFACE(CwiseUnaryOp) typedef typename internal::ref_selector::type XprTypeNested; typedef typename internal::remove_all::type NestedExpression; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit CwiseUnaryOp(const XprType& xpr, const UnaryOp& func = UnaryOp()) : m_xpr(xpr), m_functor(func) {} EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR Index rows() const EIGEN_NOEXCEPT { return m_xpr.rows(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR Index cols() const EIGEN_NOEXCEPT { return m_xpr.cols(); } /** \returns the functor representing the unary operation */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const UnaryOp& functor() const { return m_functor; } /** \returns the nested expression */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename internal::remove_all::type& nestedExpression() const { return m_xpr; } /** \returns the nested expression */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename internal::remove_all::type& nestedExpression() { return m_xpr; } protected: XprTypeNested m_xpr; const UnaryOp m_functor; }; // Generic API dispatcher template class CwiseUnaryOpImpl : public internal::generic_xpr_base >::type { public: typedef typename internal::generic_xpr_base >::type Base; }; } // end namespace Eigen #endif // EIGEN_CWISE_UNARY_OP_H RcppEigen/inst/include/Eigen/src/Core/Matrix.h0000644000176200001440000005742714562417221020674 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2006-2010 Benoit Jacob // Copyright (C) 2008-2009 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_MATRIX_H #define EIGEN_MATRIX_H namespace Eigen { namespace internal { template struct traits > { private: enum { size = internal::size_at_compile_time<_Rows,_Cols>::ret }; typedef typename find_best_packet<_Scalar,size>::type PacketScalar; enum { row_major_bit = _Options&RowMajor ? RowMajorBit : 0, is_dynamic_size_storage = _MaxRows==Dynamic || _MaxCols==Dynamic, max_size = is_dynamic_size_storage ? Dynamic : _MaxRows*_MaxCols, default_alignment = compute_default_alignment<_Scalar,max_size>::value, actual_alignment = ((_Options&DontAlign)==0) ? default_alignment : 0, required_alignment = unpacket_traits::alignment, packet_access_bit = (packet_traits<_Scalar>::Vectorizable && (EIGEN_UNALIGNED_VECTORIZE || (actual_alignment>=required_alignment))) ? PacketAccessBit : 0 }; public: typedef _Scalar Scalar; typedef Dense StorageKind; typedef Eigen::Index StorageIndex; typedef MatrixXpr XprKind; enum { RowsAtCompileTime = _Rows, ColsAtCompileTime = _Cols, MaxRowsAtCompileTime = _MaxRows, MaxColsAtCompileTime = _MaxCols, Flags = compute_matrix_flags<_Scalar, _Rows, _Cols, _Options, _MaxRows, _MaxCols>::ret, Options = _Options, InnerStrideAtCompileTime = 1, OuterStrideAtCompileTime = (Options&RowMajor) ? ColsAtCompileTime : RowsAtCompileTime, // FIXME, the following flag in only used to define NeedsToAlign in PlainObjectBase EvaluatorFlags = LinearAccessBit | DirectAccessBit | packet_access_bit | row_major_bit, Alignment = actual_alignment }; }; } /** \class Matrix * \ingroup Core_Module * * \brief The matrix class, also used for vectors and row-vectors * * The %Matrix class is the work-horse for all \em dense (\ref dense "note") matrices and vectors within Eigen. * Vectors are matrices with one column, and row-vectors are matrices with one row. * * The %Matrix class encompasses \em both fixed-size and dynamic-size objects (\ref fixedsize "note"). * * The first three template parameters are required: * \tparam _Scalar Numeric type, e.g. float, double, int or std::complex. * User defined scalar types are supported as well (see \ref user_defined_scalars "here"). * \tparam _Rows Number of rows, or \b Dynamic * \tparam _Cols Number of columns, or \b Dynamic * * The remaining template parameters are optional -- in most cases you don't have to worry about them. * \tparam _Options A combination of either \b #RowMajor or \b #ColMajor, and of either * \b #AutoAlign or \b #DontAlign. * The former controls \ref TopicStorageOrders "storage order", and defaults to column-major. The latter controls alignment, which is required * for vectorization. It defaults to aligning matrices except for fixed sizes that aren't a multiple of the packet size. * \tparam _MaxRows Maximum number of rows. Defaults to \a _Rows (\ref maxrows "note"). * \tparam _MaxCols Maximum number of columns. Defaults to \a _Cols (\ref maxrows "note"). * * Eigen provides a number of typedefs covering the usual cases. Here are some examples: * * \li \c Matrix2d is a 2x2 square matrix of doubles (\c Matrix) * \li \c Vector4f is a vector of 4 floats (\c Matrix) * \li \c RowVector3i is a row-vector of 3 ints (\c Matrix) * * \li \c MatrixXf is a dynamic-size matrix of floats (\c Matrix) * \li \c VectorXf is a dynamic-size vector of floats (\c Matrix) * * \li \c Matrix2Xf is a partially fixed-size (dynamic-size) matrix of floats (\c Matrix) * \li \c MatrixX3d is a partially dynamic-size (fixed-size) matrix of double (\c Matrix) * * See \link matrixtypedefs this page \endlink for a complete list of predefined \em %Matrix and \em Vector typedefs. * * You can access elements of vectors and matrices using normal subscripting: * * \code * Eigen::VectorXd v(10); * v[0] = 0.1; * v[1] = 0.2; * v(0) = 0.3; * v(1) = 0.4; * * Eigen::MatrixXi m(10, 10); * m(0, 1) = 1; * m(0, 2) = 2; * m(0, 3) = 3; * \endcode * * This class can be extended with the help of the plugin mechanism described on the page * \ref TopicCustomizing_Plugins by defining the preprocessor symbol \c EIGEN_MATRIX_PLUGIN. * * Some notes: * *
*
\anchor dense Dense versus sparse:
*
This %Matrix class handles dense, not sparse matrices and vectors. For sparse matrices and vectors, see the Sparse module. * * Dense matrices and vectors are plain usual arrays of coefficients. All the coefficients are stored, in an ordinary contiguous array. * This is unlike Sparse matrices and vectors where the coefficients are stored as a list of nonzero coefficients.
* *
\anchor fixedsize Fixed-size versus dynamic-size:
*
Fixed-size means that the numbers of rows and columns are known are compile-time. In this case, Eigen allocates the array * of coefficients as a fixed-size array, as a class member. This makes sense for very small matrices, typically up to 4x4, sometimes up * to 16x16. Larger matrices should be declared as dynamic-size even if one happens to know their size at compile-time. * * Dynamic-size means that the numbers of rows or columns are not necessarily known at compile-time. In this case they are runtime * variables, and the array of coefficients is allocated dynamically on the heap. * * Note that \em dense matrices, be they Fixed-size or Dynamic-size, do not expand dynamically in the sense of a std::map. * If you want this behavior, see the Sparse module.
* *
\anchor maxrows _MaxRows and _MaxCols:
*
In most cases, one just leaves these parameters to the default values. * These parameters mean the maximum size of rows and columns that the matrix may have. They are useful in cases * when the exact numbers of rows and columns are not known are compile-time, but it is known at compile-time that they cannot * exceed a certain value. This happens when taking dynamic-size blocks inside fixed-size matrices: in this case _MaxRows and _MaxCols * are the dimensions of the original matrix, while _Rows and _Cols are Dynamic.
*
* * ABI and storage layout * * The table below summarizes the ABI of some possible Matrix instances which is fixed thorough the lifetime of Eigen 3. * * * * * * *
Matrix typeEquivalent C structure
\code Matrix \endcode\code * struct { * T *data; // with (size_t(data)%EIGEN_MAX_ALIGN_BYTES)==0 * Eigen::Index rows, cols; * }; * \endcode
\code * Matrix * Matrix \endcode\code * struct { * T *data; // with (size_t(data)%EIGEN_MAX_ALIGN_BYTES)==0 * Eigen::Index size; * }; * \endcode
\code Matrix \endcode\code * struct { * T data[Rows*Cols]; // with (size_t(data)%A(Rows*Cols*sizeof(T)))==0 * }; * \endcode
\code Matrix \endcode\code * struct { * T data[MaxRows*MaxCols]; // with (size_t(data)%A(MaxRows*MaxCols*sizeof(T)))==0 * Eigen::Index rows, cols; * }; * \endcode
* Note that in this table Rows, Cols, MaxRows and MaxCols are all positive integers. A(S) is defined to the largest possible power-of-two * smaller to EIGEN_MAX_STATIC_ALIGN_BYTES. * * \see MatrixBase for the majority of the API methods for matrices, \ref TopicClassHierarchy, * \ref TopicStorageOrders */ template class Matrix : public PlainObjectBase > { public: /** \brief Base class typedef. * \sa PlainObjectBase */ typedef PlainObjectBase Base; enum { Options = _Options }; EIGEN_DENSE_PUBLIC_INTERFACE(Matrix) typedef typename Base::PlainObject PlainObject; using Base::base; using Base::coeffRef; /** * \brief Assigns matrices to each other. * * \note This is a special case of the templated operator=. Its purpose is * to prevent a default operator= from hiding the templated operator=. * * \callgraph */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Matrix& operator=(const Matrix& other) { return Base::_set(other); } /** \internal * \brief Copies the value of the expression \a other into \c *this with automatic resizing. * * *this might be resized to match the dimensions of \a other. If *this was a null matrix (not already initialized), * it will be initialized. * * Note that copying a row-vector into a vector (and conversely) is allowed. * The resizing, if any, is then done in the appropriate way so that row-vectors * remain row-vectors and vectors remain vectors. */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Matrix& operator=(const DenseBase& other) { return Base::_set(other); } /* Here, doxygen failed to copy the brief information when using \copydoc */ /** * \brief Copies the generic expression \a other into *this. * \copydetails DenseBase::operator=(const EigenBase &other) */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Matrix& operator=(const EigenBase &other) { return Base::operator=(other); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Matrix& operator=(const ReturnByValue& func) { return Base::operator=(func); } /** \brief Default constructor. * * For fixed-size matrices, does nothing. * * For dynamic-size matrices, creates an empty matrix of size 0. Does not allocate any array. Such a matrix * is called a null matrix. This constructor is the unique way to create null matrices: resizing * a matrix to 0 is not supported. * * \sa resize(Index,Index) */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Matrix() : Base() { Base::_check_template_params(); EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED } // FIXME is it still needed EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit Matrix(internal::constructor_without_unaligned_array_assert) : Base(internal::constructor_without_unaligned_array_assert()) { Base::_check_template_params(); EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED } #if EIGEN_HAS_RVALUE_REFERENCES EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Matrix(Matrix&& other) EIGEN_NOEXCEPT_IF(std::is_nothrow_move_constructible::value) : Base(std::move(other)) { Base::_check_template_params(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Matrix& operator=(Matrix&& other) EIGEN_NOEXCEPT_IF(std::is_nothrow_move_assignable::value) { Base::operator=(std::move(other)); return *this; } #endif #if EIGEN_HAS_CXX11 /** \copydoc PlainObjectBase(const Scalar&, const Scalar&, const Scalar&, const Scalar&, const ArgTypes&... args) * * Example: \include Matrix_variadic_ctor_cxx11.cpp * Output: \verbinclude Matrix_variadic_ctor_cxx11.out * * \sa Matrix(const std::initializer_list>&) */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Matrix(const Scalar& a0, const Scalar& a1, const Scalar& a2, const Scalar& a3, const ArgTypes&... args) : Base(a0, a1, a2, a3, args...) {} /** \brief Constructs a Matrix and initializes it from the coefficients given as initializer-lists grouped by row. \cpp11 * * In the general case, the constructor takes a list of rows, each row being represented as a list of coefficients: * * Example: \include Matrix_initializer_list_23_cxx11.cpp * Output: \verbinclude Matrix_initializer_list_23_cxx11.out * * Each of the inner initializer lists must contain the exact same number of elements, otherwise an assertion is triggered. * * In the case of a compile-time column vector, implicit transposition from a single row is allowed. * Therefore VectorXd{{1,2,3,4,5}} is legal and the more verbose syntax * RowVectorXd{{1},{2},{3},{4},{5}} can be avoided: * * Example: \include Matrix_initializer_list_vector_cxx11.cpp * Output: \verbinclude Matrix_initializer_list_vector_cxx11.out * * In the case of fixed-sized matrices, the initializer list sizes must exactly match the matrix sizes, * and implicit transposition is allowed for compile-time vectors only. * * \sa Matrix(const Scalar& a0, const Scalar& a1, const Scalar& a2, const Scalar& a3, const ArgTypes&... args) */ EIGEN_DEVICE_FUNC explicit EIGEN_STRONG_INLINE Matrix(const std::initializer_list>& list) : Base(list) {} #endif // end EIGEN_HAS_CXX11 #ifndef EIGEN_PARSED_BY_DOXYGEN // This constructor is for both 1x1 matrices and dynamic vectors template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit Matrix(const T& x) { Base::_check_template_params(); Base::template _init1(x); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Matrix(const T0& x, const T1& y) { Base::_check_template_params(); Base::template _init2(x, y); } #else /** \brief Constructs a fixed-sized matrix initialized with coefficients starting at \a data */ EIGEN_DEVICE_FUNC explicit Matrix(const Scalar *data); /** \brief Constructs a vector or row-vector with given dimension. \only_for_vectors * * This is useful for dynamic-size vectors. For fixed-size vectors, * it is redundant to pass these parameters, so one should use the default constructor * Matrix() instead. * * \warning This constructor is disabled for fixed-size \c 1x1 matrices. For instance, * calling Matrix(1) will call the initialization constructor: Matrix(const Scalar&). * For fixed-size \c 1x1 matrices it is therefore recommended to use the default * constructor Matrix() instead, especially when using one of the non standard * \c EIGEN_INITIALIZE_MATRICES_BY_{ZERO,\c NAN} macros (see \ref TopicPreprocessorDirectives). */ EIGEN_STRONG_INLINE explicit Matrix(Index dim); /** \brief Constructs an initialized 1x1 matrix with the given coefficient * \sa Matrix(const Scalar&, const Scalar&, const Scalar&, const Scalar&, const ArgTypes&...) */ Matrix(const Scalar& x); /** \brief Constructs an uninitialized matrix with \a rows rows and \a cols columns. * * This is useful for dynamic-size matrices. For fixed-size matrices, * it is redundant to pass these parameters, so one should use the default constructor * Matrix() instead. * * \warning This constructor is disabled for fixed-size \c 1x2 and \c 2x1 vectors. For instance, * calling Matrix2f(2,1) will call the initialization constructor: Matrix(const Scalar& x, const Scalar& y). * For fixed-size \c 1x2 or \c 2x1 vectors it is therefore recommended to use the default * constructor Matrix() instead, especially when using one of the non standard * \c EIGEN_INITIALIZE_MATRICES_BY_{ZERO,\c NAN} macros (see \ref TopicPreprocessorDirectives). */ EIGEN_DEVICE_FUNC Matrix(Index rows, Index cols); /** \brief Constructs an initialized 2D vector with given coefficients * \sa Matrix(const Scalar&, const Scalar&, const Scalar&, const Scalar&, const ArgTypes&...) */ Matrix(const Scalar& x, const Scalar& y); #endif // end EIGEN_PARSED_BY_DOXYGEN /** \brief Constructs an initialized 3D vector with given coefficients * \sa Matrix(const Scalar&, const Scalar&, const Scalar&, const Scalar&, const ArgTypes&...) */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Matrix(const Scalar& x, const Scalar& y, const Scalar& z) { Base::_check_template_params(); EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(Matrix, 3) m_storage.data()[0] = x; m_storage.data()[1] = y; m_storage.data()[2] = z; } /** \brief Constructs an initialized 4D vector with given coefficients * \sa Matrix(const Scalar&, const Scalar&, const Scalar&, const Scalar&, const ArgTypes&...) */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Matrix(const Scalar& x, const Scalar& y, const Scalar& z, const Scalar& w) { Base::_check_template_params(); EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(Matrix, 4) m_storage.data()[0] = x; m_storage.data()[1] = y; m_storage.data()[2] = z; m_storage.data()[3] = w; } /** \brief Copy constructor */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Matrix(const Matrix& other) : Base(other) { } /** \brief Copy constructor for generic expressions. * \sa MatrixBase::operator=(const EigenBase&) */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Matrix(const EigenBase &other) : Base(other.derived()) { } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index innerStride() const EIGEN_NOEXCEPT { return 1; } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index outerStride() const EIGEN_NOEXCEPT { return this->innerSize(); } /////////// Geometry module /////////// template EIGEN_DEVICE_FUNC explicit Matrix(const RotationBase& r); template EIGEN_DEVICE_FUNC Matrix& operator=(const RotationBase& r); // allow to extend Matrix outside Eigen #ifdef EIGEN_MATRIX_PLUGIN #include EIGEN_MATRIX_PLUGIN #endif protected: template friend struct internal::conservative_resize_like_impl; using Base::m_storage; }; /** \defgroup matrixtypedefs Global matrix typedefs * * \ingroup Core_Module * * %Eigen defines several typedef shortcuts for most common matrix and vector types. * * The general patterns are the following: * * \c MatrixSizeType where \c Size can be \c 2,\c 3,\c 4 for fixed size square matrices or \c X for dynamic size, * and where \c Type can be \c i for integer, \c f for float, \c d for double, \c cf for complex float, \c cd * for complex double. * * For example, \c Matrix3d is a fixed-size 3x3 matrix type of doubles, and \c MatrixXf is a dynamic-size matrix of floats. * * There are also \c VectorSizeType and \c RowVectorSizeType which are self-explanatory. For example, \c Vector4cf is * a fixed-size vector of 4 complex floats. * * With \cpp11, template alias are also defined for common sizes. * They follow the same pattern as above except that the scalar type suffix is replaced by a * template parameter, i.e.: * - `MatrixSize` where `Size` can be \c 2,\c 3,\c 4 for fixed size square matrices or \c X for dynamic size. * - `MatrixXSize` and `MatrixSizeX` where `Size` can be \c 2,\c 3,\c 4 for hybrid dynamic/fixed matrices. * - `VectorSize` and `RowVectorSize` for column and row vectors. * * With \cpp11, you can also use fully generic column and row vector types: `Vector` and `RowVector`. * * \sa class Matrix */ #define EIGEN_MAKE_TYPEDEFS(Type, TypeSuffix, Size, SizeSuffix) \ /** \ingroup matrixtypedefs */ \ typedef Matrix Matrix##SizeSuffix##TypeSuffix; \ /** \ingroup matrixtypedefs */ \ typedef Matrix Vector##SizeSuffix##TypeSuffix; \ /** \ingroup matrixtypedefs */ \ typedef Matrix RowVector##SizeSuffix##TypeSuffix; #define EIGEN_MAKE_FIXED_TYPEDEFS(Type, TypeSuffix, Size) \ /** \ingroup matrixtypedefs */ \ typedef Matrix Matrix##Size##X##TypeSuffix; \ /** \ingroup matrixtypedefs */ \ typedef Matrix Matrix##X##Size##TypeSuffix; #define EIGEN_MAKE_TYPEDEFS_ALL_SIZES(Type, TypeSuffix) \ EIGEN_MAKE_TYPEDEFS(Type, TypeSuffix, 2, 2) \ EIGEN_MAKE_TYPEDEFS(Type, TypeSuffix, 3, 3) \ EIGEN_MAKE_TYPEDEFS(Type, TypeSuffix, 4, 4) \ EIGEN_MAKE_TYPEDEFS(Type, TypeSuffix, Dynamic, X) \ EIGEN_MAKE_FIXED_TYPEDEFS(Type, TypeSuffix, 2) \ EIGEN_MAKE_FIXED_TYPEDEFS(Type, TypeSuffix, 3) \ EIGEN_MAKE_FIXED_TYPEDEFS(Type, TypeSuffix, 4) EIGEN_MAKE_TYPEDEFS_ALL_SIZES(int, i) EIGEN_MAKE_TYPEDEFS_ALL_SIZES(float, f) EIGEN_MAKE_TYPEDEFS_ALL_SIZES(double, d) EIGEN_MAKE_TYPEDEFS_ALL_SIZES(std::complex, cf) EIGEN_MAKE_TYPEDEFS_ALL_SIZES(std::complex, cd) #undef EIGEN_MAKE_TYPEDEFS_ALL_SIZES #undef EIGEN_MAKE_TYPEDEFS #undef EIGEN_MAKE_FIXED_TYPEDEFS #if EIGEN_HAS_CXX11 #define EIGEN_MAKE_TYPEDEFS(Size, SizeSuffix) \ /** \ingroup matrixtypedefs */ \ /** \brief \cpp11 */ \ template \ using Matrix##SizeSuffix = Matrix; \ /** \ingroup matrixtypedefs */ \ /** \brief \cpp11 */ \ template \ using Vector##SizeSuffix = Matrix; \ /** \ingroup matrixtypedefs */ \ /** \brief \cpp11 */ \ template \ using RowVector##SizeSuffix = Matrix; #define EIGEN_MAKE_FIXED_TYPEDEFS(Size) \ /** \ingroup matrixtypedefs */ \ /** \brief \cpp11 */ \ template \ using Matrix##Size##X = Matrix; \ /** \ingroup matrixtypedefs */ \ /** \brief \cpp11 */ \ template \ using Matrix##X##Size = Matrix; EIGEN_MAKE_TYPEDEFS(2, 2) EIGEN_MAKE_TYPEDEFS(3, 3) EIGEN_MAKE_TYPEDEFS(4, 4) EIGEN_MAKE_TYPEDEFS(Dynamic, X) EIGEN_MAKE_FIXED_TYPEDEFS(2) EIGEN_MAKE_FIXED_TYPEDEFS(3) EIGEN_MAKE_FIXED_TYPEDEFS(4) /** \ingroup matrixtypedefs * \brief \cpp11 */ template using Vector = Matrix; /** \ingroup matrixtypedefs * \brief \cpp11 */ template using RowVector = Matrix; #undef EIGEN_MAKE_TYPEDEFS #undef EIGEN_MAKE_FIXED_TYPEDEFS #endif // EIGEN_HAS_CXX11 } // end namespace Eigen #endif // EIGEN_MATRIX_H RcppEigen/inst/include/Eigen/src/Core/BandMatrix.h0000644000176200001440000003337314562417221021453 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_BANDMATRIX_H #define EIGEN_BANDMATRIX_H namespace Eigen { namespace internal { template class BandMatrixBase : public EigenBase { public: enum { Flags = internal::traits::Flags, CoeffReadCost = internal::traits::CoeffReadCost, RowsAtCompileTime = internal::traits::RowsAtCompileTime, ColsAtCompileTime = internal::traits::ColsAtCompileTime, MaxRowsAtCompileTime = internal::traits::MaxRowsAtCompileTime, MaxColsAtCompileTime = internal::traits::MaxColsAtCompileTime, Supers = internal::traits::Supers, Subs = internal::traits::Subs, Options = internal::traits::Options }; typedef typename internal::traits::Scalar Scalar; typedef Matrix DenseMatrixType; typedef typename DenseMatrixType::StorageIndex StorageIndex; typedef typename internal::traits::CoefficientsType CoefficientsType; typedef EigenBase Base; protected: enum { DataRowsAtCompileTime = ((Supers!=Dynamic) && (Subs!=Dynamic)) ? 1 + Supers + Subs : Dynamic, SizeAtCompileTime = EIGEN_SIZE_MIN_PREFER_DYNAMIC(RowsAtCompileTime,ColsAtCompileTime) }; public: using Base::derived; using Base::rows; using Base::cols; /** \returns the number of super diagonals */ inline Index supers() const { return derived().supers(); } /** \returns the number of sub diagonals */ inline Index subs() const { return derived().subs(); } /** \returns an expression of the underlying coefficient matrix */ inline const CoefficientsType& coeffs() const { return derived().coeffs(); } /** \returns an expression of the underlying coefficient matrix */ inline CoefficientsType& coeffs() { return derived().coeffs(); } /** \returns a vector expression of the \a i -th column, * only the meaningful part is returned. * \warning the internal storage must be column major. */ inline Block col(Index i) { EIGEN_STATIC_ASSERT((int(Options) & int(RowMajor)) == 0, THIS_METHOD_IS_ONLY_FOR_COLUMN_MAJOR_MATRICES); Index start = 0; Index len = coeffs().rows(); if (i<=supers()) { start = supers()-i; len = (std::min)(rows(),std::max(0,coeffs().rows() - (supers()-i))); } else if (i>=rows()-subs()) len = std::max(0,coeffs().rows() - (i + 1 - rows() + subs())); return Block(coeffs(), start, i, len, 1); } /** \returns a vector expression of the main diagonal */ inline Block diagonal() { return Block(coeffs(),supers(),0,1,(std::min)(rows(),cols())); } /** \returns a vector expression of the main diagonal (const version) */ inline const Block diagonal() const { return Block(coeffs(),supers(),0,1,(std::min)(rows(),cols())); } template struct DiagonalIntReturnType { enum { ReturnOpposite = (int(Options) & int(SelfAdjoint)) && (((Index) > 0 && Supers == 0) || ((Index) < 0 && Subs == 0)), Conjugate = ReturnOpposite && NumTraits::IsComplex, ActualIndex = ReturnOpposite ? -Index : Index, DiagonalSize = (RowsAtCompileTime==Dynamic || ColsAtCompileTime==Dynamic) ? Dynamic : (ActualIndex<0 ? EIGEN_SIZE_MIN_PREFER_DYNAMIC(ColsAtCompileTime, RowsAtCompileTime + ActualIndex) : EIGEN_SIZE_MIN_PREFER_DYNAMIC(RowsAtCompileTime, ColsAtCompileTime - ActualIndex)) }; typedef Block BuildType; typedef typename internal::conditional,BuildType >, BuildType>::type Type; }; /** \returns a vector expression of the \a N -th sub or super diagonal */ template inline typename DiagonalIntReturnType::Type diagonal() { return typename DiagonalIntReturnType::BuildType(coeffs(), supers()-N, (std::max)(0,N), 1, diagonalLength(N)); } /** \returns a vector expression of the \a N -th sub or super diagonal */ template inline const typename DiagonalIntReturnType::Type diagonal() const { return typename DiagonalIntReturnType::BuildType(coeffs(), supers()-N, (std::max)(0,N), 1, diagonalLength(N)); } /** \returns a vector expression of the \a i -th sub or super diagonal */ inline Block diagonal(Index i) { eigen_assert((i<0 && -i<=subs()) || (i>=0 && i<=supers())); return Block(coeffs(), supers()-i, std::max(0,i), 1, diagonalLength(i)); } /** \returns a vector expression of the \a i -th sub or super diagonal */ inline const Block diagonal(Index i) const { eigen_assert((i<0 && -i<=subs()) || (i>=0 && i<=supers())); return Block(coeffs(), supers()-i, std::max(0,i), 1, diagonalLength(i)); } template inline void evalTo(Dest& dst) const { dst.resize(rows(),cols()); dst.setZero(); dst.diagonal() = diagonal(); for (Index i=1; i<=supers();++i) dst.diagonal(i) = diagonal(i); for (Index i=1; i<=subs();++i) dst.diagonal(-i) = diagonal(-i); } DenseMatrixType toDenseMatrix() const { DenseMatrixType res(rows(),cols()); evalTo(res); return res; } protected: inline Index diagonalLength(Index i) const { return i<0 ? (std::min)(cols(),rows()+i) : (std::min)(rows(),cols()-i); } }; /** * \class BandMatrix * \ingroup Core_Module * * \brief Represents a rectangular matrix with a banded storage * * \tparam _Scalar Numeric type, i.e. float, double, int * \tparam _Rows Number of rows, or \b Dynamic * \tparam _Cols Number of columns, or \b Dynamic * \tparam _Supers Number of super diagonal * \tparam _Subs Number of sub diagonal * \tparam _Options A combination of either \b #RowMajor or \b #ColMajor, and of \b #SelfAdjoint * The former controls \ref TopicStorageOrders "storage order", and defaults to * column-major. The latter controls whether the matrix represents a selfadjoint * matrix in which case either Supers of Subs have to be null. * * \sa class TridiagonalMatrix */ template struct traits > { typedef _Scalar Scalar; typedef Dense StorageKind; typedef Eigen::Index StorageIndex; enum { CoeffReadCost = NumTraits::ReadCost, RowsAtCompileTime = _Rows, ColsAtCompileTime = _Cols, MaxRowsAtCompileTime = _Rows, MaxColsAtCompileTime = _Cols, Flags = LvalueBit, Supers = _Supers, Subs = _Subs, Options = _Options, DataRowsAtCompileTime = ((Supers!=Dynamic) && (Subs!=Dynamic)) ? 1 + Supers + Subs : Dynamic }; typedef Matrix CoefficientsType; }; template class BandMatrix : public BandMatrixBase > { public: typedef typename internal::traits::Scalar Scalar; typedef typename internal::traits::StorageIndex StorageIndex; typedef typename internal::traits::CoefficientsType CoefficientsType; explicit inline BandMatrix(Index rows=Rows, Index cols=Cols, Index supers=Supers, Index subs=Subs) : m_coeffs(1+supers+subs,cols), m_rows(rows), m_supers(supers), m_subs(subs) { } /** \returns the number of columns */ inline EIGEN_CONSTEXPR Index rows() const { return m_rows.value(); } /** \returns the number of rows */ inline EIGEN_CONSTEXPR Index cols() const { return m_coeffs.cols(); } /** \returns the number of super diagonals */ inline EIGEN_CONSTEXPR Index supers() const { return m_supers.value(); } /** \returns the number of sub diagonals */ inline EIGEN_CONSTEXPR Index subs() const { return m_subs.value(); } inline const CoefficientsType& coeffs() const { return m_coeffs; } inline CoefficientsType& coeffs() { return m_coeffs; } protected: CoefficientsType m_coeffs; internal::variable_if_dynamic m_rows; internal::variable_if_dynamic m_supers; internal::variable_if_dynamic m_subs; }; template class BandMatrixWrapper; template struct traits > { typedef typename _CoefficientsType::Scalar Scalar; typedef typename _CoefficientsType::StorageKind StorageKind; typedef typename _CoefficientsType::StorageIndex StorageIndex; enum { CoeffReadCost = internal::traits<_CoefficientsType>::CoeffReadCost, RowsAtCompileTime = _Rows, ColsAtCompileTime = _Cols, MaxRowsAtCompileTime = _Rows, MaxColsAtCompileTime = _Cols, Flags = LvalueBit, Supers = _Supers, Subs = _Subs, Options = _Options, DataRowsAtCompileTime = ((Supers!=Dynamic) && (Subs!=Dynamic)) ? 1 + Supers + Subs : Dynamic }; typedef _CoefficientsType CoefficientsType; }; template class BandMatrixWrapper : public BandMatrixBase > { public: typedef typename internal::traits::Scalar Scalar; typedef typename internal::traits::CoefficientsType CoefficientsType; typedef typename internal::traits::StorageIndex StorageIndex; explicit inline BandMatrixWrapper(const CoefficientsType& coeffs, Index rows=_Rows, Index cols=_Cols, Index supers=_Supers, Index subs=_Subs) : m_coeffs(coeffs), m_rows(rows), m_supers(supers), m_subs(subs) { EIGEN_UNUSED_VARIABLE(cols); //internal::assert(coeffs.cols()==cols() && (supers()+subs()+1)==coeffs.rows()); } /** \returns the number of columns */ inline EIGEN_CONSTEXPR Index rows() const { return m_rows.value(); } /** \returns the number of rows */ inline EIGEN_CONSTEXPR Index cols() const { return m_coeffs.cols(); } /** \returns the number of super diagonals */ inline EIGEN_CONSTEXPR Index supers() const { return m_supers.value(); } /** \returns the number of sub diagonals */ inline EIGEN_CONSTEXPR Index subs() const { return m_subs.value(); } inline const CoefficientsType& coeffs() const { return m_coeffs; } protected: const CoefficientsType& m_coeffs; internal::variable_if_dynamic m_rows; internal::variable_if_dynamic m_supers; internal::variable_if_dynamic m_subs; }; /** * \class TridiagonalMatrix * \ingroup Core_Module * * \brief Represents a tridiagonal matrix with a compact banded storage * * \tparam Scalar Numeric type, i.e. float, double, int * \tparam Size Number of rows and cols, or \b Dynamic * \tparam Options Can be 0 or \b SelfAdjoint * * \sa class BandMatrix */ template class TridiagonalMatrix : public BandMatrix { typedef BandMatrix Base; typedef typename Base::StorageIndex StorageIndex; public: explicit TridiagonalMatrix(Index size = Size) : Base(size,size,Options&SelfAdjoint?0:1,1) {} inline typename Base::template DiagonalIntReturnType<1>::Type super() { return Base::template diagonal<1>(); } inline const typename Base::template DiagonalIntReturnType<1>::Type super() const { return Base::template diagonal<1>(); } inline typename Base::template DiagonalIntReturnType<-1>::Type sub() { return Base::template diagonal<-1>(); } inline const typename Base::template DiagonalIntReturnType<-1>::Type sub() const { return Base::template diagonal<-1>(); } protected: }; struct BandShape {}; template struct evaluator_traits > : public evaluator_traits_base > { typedef BandShape Shape; }; template struct evaluator_traits > : public evaluator_traits_base > { typedef BandShape Shape; }; template<> struct AssignmentKind { typedef EigenBase2EigenBase Kind; }; } // end namespace internal } // end namespace Eigen #endif // EIGEN_BANDMATRIX_H RcppEigen/inst/include/Eigen/src/Core/Random.h0000644000176200001440000001711414562417221020635 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_RANDOM_H #define EIGEN_RANDOM_H namespace Eigen { namespace internal { template struct scalar_random_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_random_op) inline const Scalar operator() () const { return random(); } }; template struct functor_traits > { enum { Cost = 5 * NumTraits::MulCost, PacketAccess = false, IsRepeatable = false }; }; } // end namespace internal /** \returns a random matrix expression * * Numbers are uniformly spread through their whole definition range for integer types, * and in the [-1:1] range for floating point scalar types. * * The parameters \a rows and \a cols are the number of rows and of columns of * the returned matrix. Must be compatible with this MatrixBase type. * * \not_reentrant * * This variant is meant to be used for dynamic-size matrix types. For fixed-size types, * it is redundant to pass \a rows and \a cols as arguments, so Random() should be used * instead. * * * Example: \include MatrixBase_random_int_int.cpp * Output: \verbinclude MatrixBase_random_int_int.out * * This expression has the "evaluate before nesting" flag so that it will be evaluated into * a temporary matrix whenever it is nested in a larger expression. This prevents unexpected * behavior with expressions involving random matrices. * * See DenseBase::NullaryExpr(Index, const CustomNullaryOp&) for an example using C++11 random generators. * * \sa DenseBase::setRandom(), DenseBase::Random(Index), DenseBase::Random() */ template inline const typename DenseBase::RandomReturnType DenseBase::Random(Index rows, Index cols) { return NullaryExpr(rows, cols, internal::scalar_random_op()); } /** \returns a random vector expression * * Numbers are uniformly spread through their whole definition range for integer types, * and in the [-1:1] range for floating point scalar types. * * The parameter \a size is the size of the returned vector. * Must be compatible with this MatrixBase type. * * \only_for_vectors * \not_reentrant * * This variant is meant to be used for dynamic-size vector types. For fixed-size types, * it is redundant to pass \a size as argument, so Random() should be used * instead. * * Example: \include MatrixBase_random_int.cpp * Output: \verbinclude MatrixBase_random_int.out * * This expression has the "evaluate before nesting" flag so that it will be evaluated into * a temporary vector whenever it is nested in a larger expression. This prevents unexpected * behavior with expressions involving random matrices. * * \sa DenseBase::setRandom(), DenseBase::Random(Index,Index), DenseBase::Random() */ template inline const typename DenseBase::RandomReturnType DenseBase::Random(Index size) { return NullaryExpr(size, internal::scalar_random_op()); } /** \returns a fixed-size random matrix or vector expression * * Numbers are uniformly spread through their whole definition range for integer types, * and in the [-1:1] range for floating point scalar types. * * This variant is only for fixed-size MatrixBase types. For dynamic-size types, you * need to use the variants taking size arguments. * * Example: \include MatrixBase_random.cpp * Output: \verbinclude MatrixBase_random.out * * This expression has the "evaluate before nesting" flag so that it will be evaluated into * a temporary matrix whenever it is nested in a larger expression. This prevents unexpected * behavior with expressions involving random matrices. * * \not_reentrant * * \sa DenseBase::setRandom(), DenseBase::Random(Index,Index), DenseBase::Random(Index) */ template inline const typename DenseBase::RandomReturnType DenseBase::Random() { return NullaryExpr(RowsAtCompileTime, ColsAtCompileTime, internal::scalar_random_op()); } /** Sets all coefficients in this expression to random values. * * Numbers are uniformly spread through their whole definition range for integer types, * and in the [-1:1] range for floating point scalar types. * * \not_reentrant * * Example: \include MatrixBase_setRandom.cpp * Output: \verbinclude MatrixBase_setRandom.out * * \sa class CwiseNullaryOp, setRandom(Index), setRandom(Index,Index) */ template EIGEN_DEVICE_FUNC inline Derived& DenseBase::setRandom() { return *this = Random(rows(), cols()); } /** Resizes to the given \a newSize, and sets all coefficients in this expression to random values. * * Numbers are uniformly spread through their whole definition range for integer types, * and in the [-1:1] range for floating point scalar types. * * \only_for_vectors * \not_reentrant * * Example: \include Matrix_setRandom_int.cpp * Output: \verbinclude Matrix_setRandom_int.out * * \sa DenseBase::setRandom(), setRandom(Index,Index), class CwiseNullaryOp, DenseBase::Random() */ template EIGEN_STRONG_INLINE Derived& PlainObjectBase::setRandom(Index newSize) { resize(newSize); return setRandom(); } /** Resizes to the given size, and sets all coefficients in this expression to random values. * * Numbers are uniformly spread through their whole definition range for integer types, * and in the [-1:1] range for floating point scalar types. * * \not_reentrant * * \param rows the new number of rows * \param cols the new number of columns * * Example: \include Matrix_setRandom_int_int.cpp * Output: \verbinclude Matrix_setRandom_int_int.out * * \sa DenseBase::setRandom(), setRandom(Index), class CwiseNullaryOp, DenseBase::Random() */ template EIGEN_STRONG_INLINE Derived& PlainObjectBase::setRandom(Index rows, Index cols) { resize(rows, cols); return setRandom(); } /** Resizes to the given size, changing only the number of columns, and sets all * coefficients in this expression to random values. For the parameter of type * NoChange_t, just pass the special value \c NoChange. * * Numbers are uniformly spread through their whole definition range for integer types, * and in the [-1:1] range for floating point scalar types. * * \not_reentrant * * \sa DenseBase::setRandom(), setRandom(Index), setRandom(Index, NoChange_t), class CwiseNullaryOp, DenseBase::Random() */ template EIGEN_STRONG_INLINE Derived& PlainObjectBase::setRandom(NoChange_t, Index cols) { return setRandom(rows(), cols); } /** Resizes to the given size, changing only the number of rows, and sets all * coefficients in this expression to random values. For the parameter of type * NoChange_t, just pass the special value \c NoChange. * * Numbers are uniformly spread through their whole definition range for integer types, * and in the [-1:1] range for floating point scalar types. * * \not_reentrant * * \sa DenseBase::setRandom(), setRandom(Index), setRandom(NoChange_t, Index), class CwiseNullaryOp, DenseBase::Random() */ template EIGEN_STRONG_INLINE Derived& PlainObjectBase::setRandom(Index rows, NoChange_t) { return setRandom(rows, cols()); } } // end namespace Eigen #endif // EIGEN_RANDOM_H RcppEigen/inst/include/Eigen/src/Core/VectorwiseOp.h0000644000176200001440000010454014562417221022046 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2019 Gael Guennebaud // Copyright (C) 2006-2008 Benoit Jacob // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_PARTIAL_REDUX_H #define EIGEN_PARTIAL_REDUX_H namespace Eigen { /** \class PartialReduxExpr * \ingroup Core_Module * * \brief Generic expression of a partially reduxed matrix * * \tparam MatrixType the type of the matrix we are applying the redux operation * \tparam MemberOp type of the member functor * \tparam Direction indicates the direction of the redux (#Vertical or #Horizontal) * * This class represents an expression of a partial redux operator of a matrix. * It is the return type of some VectorwiseOp functions, * and most of the time this is the only way it is used. * * \sa class VectorwiseOp */ template< typename MatrixType, typename MemberOp, int Direction> class PartialReduxExpr; namespace internal { template struct traits > : traits { typedef typename MemberOp::result_type Scalar; typedef typename traits::StorageKind StorageKind; typedef typename traits::XprKind XprKind; typedef typename MatrixType::Scalar InputScalar; enum { RowsAtCompileTime = Direction==Vertical ? 1 : MatrixType::RowsAtCompileTime, ColsAtCompileTime = Direction==Horizontal ? 1 : MatrixType::ColsAtCompileTime, MaxRowsAtCompileTime = Direction==Vertical ? 1 : MatrixType::MaxRowsAtCompileTime, MaxColsAtCompileTime = Direction==Horizontal ? 1 : MatrixType::MaxColsAtCompileTime, Flags = RowsAtCompileTime == 1 ? RowMajorBit : 0, TraversalSize = Direction==Vertical ? MatrixType::RowsAtCompileTime : MatrixType::ColsAtCompileTime }; }; } template< typename MatrixType, typename MemberOp, int Direction> class PartialReduxExpr : public internal::dense_xpr_base< PartialReduxExpr >::type, internal::no_assignment_operator { public: typedef typename internal::dense_xpr_base::type Base; EIGEN_DENSE_PUBLIC_INTERFACE(PartialReduxExpr) EIGEN_DEVICE_FUNC explicit PartialReduxExpr(const MatrixType& mat, const MemberOp& func = MemberOp()) : m_matrix(mat), m_functor(func) {} EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index rows() const EIGEN_NOEXCEPT { return (Direction==Vertical ? 1 : m_matrix.rows()); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index cols() const EIGEN_NOEXCEPT { return (Direction==Horizontal ? 1 : m_matrix.cols()); } EIGEN_DEVICE_FUNC typename MatrixType::Nested nestedExpression() const { return m_matrix; } EIGEN_DEVICE_FUNC const MemberOp& functor() const { return m_functor; } protected: typename MatrixType::Nested m_matrix; const MemberOp m_functor; }; template struct partial_redux_dummy_func; #define EIGEN_MAKE_PARTIAL_REDUX_FUNCTOR(MEMBER,COST,VECTORIZABLE,BINARYOP) \ template \ struct member_##MEMBER { \ EIGEN_EMPTY_STRUCT_CTOR(member_##MEMBER) \ typedef ResultType result_type; \ typedef BINARYOP BinaryOp; \ template struct Cost { enum { value = COST }; }; \ enum { Vectorizable = VECTORIZABLE }; \ template \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \ ResultType operator()(const XprType& mat) const \ { return mat.MEMBER(); } \ BinaryOp binaryFunc() const { return BinaryOp(); } \ } #define EIGEN_MEMBER_FUNCTOR(MEMBER,COST) \ EIGEN_MAKE_PARTIAL_REDUX_FUNCTOR(MEMBER,COST,0,partial_redux_dummy_func) namespace internal { EIGEN_MEMBER_FUNCTOR(norm, (Size+5) * NumTraits::MulCost + (Size-1)*NumTraits::AddCost); EIGEN_MEMBER_FUNCTOR(stableNorm, (Size+5) * NumTraits::MulCost + (Size-1)*NumTraits::AddCost); EIGEN_MEMBER_FUNCTOR(blueNorm, (Size+5) * NumTraits::MulCost + (Size-1)*NumTraits::AddCost); EIGEN_MEMBER_FUNCTOR(hypotNorm, (Size-1) * functor_traits >::Cost ); EIGEN_MEMBER_FUNCTOR(all, (Size-1)*NumTraits::AddCost); EIGEN_MEMBER_FUNCTOR(any, (Size-1)*NumTraits::AddCost); EIGEN_MEMBER_FUNCTOR(count, (Size-1)*NumTraits::AddCost); EIGEN_MAKE_PARTIAL_REDUX_FUNCTOR(sum, (Size-1)*NumTraits::AddCost, 1, internal::scalar_sum_op); EIGEN_MAKE_PARTIAL_REDUX_FUNCTOR(minCoeff, (Size-1)*NumTraits::AddCost, 1, internal::scalar_min_op); EIGEN_MAKE_PARTIAL_REDUX_FUNCTOR(maxCoeff, (Size-1)*NumTraits::AddCost, 1, internal::scalar_max_op); EIGEN_MAKE_PARTIAL_REDUX_FUNCTOR(prod, (Size-1)*NumTraits::MulCost, 1, internal::scalar_product_op); template struct member_lpnorm { typedef ResultType result_type; enum { Vectorizable = 0 }; template struct Cost { enum { value = (Size+5) * NumTraits::MulCost + (Size-1)*NumTraits::AddCost }; }; EIGEN_DEVICE_FUNC member_lpnorm() {} template EIGEN_DEVICE_FUNC inline ResultType operator()(const XprType& mat) const { return mat.template lpNorm

(); } }; template struct member_redux { typedef BinaryOpT BinaryOp; typedef typename result_of< BinaryOp(const Scalar&,const Scalar&) >::type result_type; enum { Vectorizable = functor_traits::PacketAccess }; template struct Cost { enum { value = (Size-1) * functor_traits::Cost }; }; EIGEN_DEVICE_FUNC explicit member_redux(const BinaryOp func) : m_functor(func) {} template EIGEN_DEVICE_FUNC inline result_type operator()(const DenseBase& mat) const { return mat.redux(m_functor); } const BinaryOp& binaryFunc() const { return m_functor; } const BinaryOp m_functor; }; } /** \class VectorwiseOp * \ingroup Core_Module * * \brief Pseudo expression providing broadcasting and partial reduction operations * * \tparam ExpressionType the type of the object on which to do partial reductions * \tparam Direction indicates whether to operate on columns (#Vertical) or rows (#Horizontal) * * This class represents a pseudo expression with broadcasting and partial reduction features. * It is the return type of DenseBase::colwise() and DenseBase::rowwise() * and most of the time this is the only way it is explicitly used. * * To understand the logic of rowwise/colwise expression, let's consider a generic case `A.colwise().foo()` * where `foo` is any method of `VectorwiseOp`. This expression is equivalent to applying `foo()` to each * column of `A` and then re-assemble the outputs in a matrix expression: * \code [A.col(0).foo(), A.col(1).foo(), ..., A.col(A.cols()-1).foo()] \endcode * * Example: \include MatrixBase_colwise.cpp * Output: \verbinclude MatrixBase_colwise.out * * The begin() and end() methods are obviously exceptions to the previous rule as they * return STL-compatible begin/end iterators to the rows or columns of the nested expression. * Typical use cases include for-range-loop and calls to STL algorithms: * * Example: \include MatrixBase_colwise_iterator_cxx11.cpp * Output: \verbinclude MatrixBase_colwise_iterator_cxx11.out * * For a partial reduction on an empty input, some rules apply. * For the sake of clarity, let's consider a vertical reduction: * - If the number of columns is zero, then a 1x0 row-major vector expression is returned. * - Otherwise, if the number of rows is zero, then * - a row vector of zeros is returned for sum-like reductions (sum, squaredNorm, norm, etc.) * - a row vector of ones is returned for a product reduction (e.g., MatrixXd(n,0).colwise().prod()) * - an assert is triggered for all other reductions (minCoeff,maxCoeff,redux(bin_op)) * * \sa DenseBase::colwise(), DenseBase::rowwise(), class PartialReduxExpr */ template class VectorwiseOp { public: typedef typename ExpressionType::Scalar Scalar; typedef typename ExpressionType::RealScalar RealScalar; typedef Eigen::Index Index; ///< \deprecated since Eigen 3.3 typedef typename internal::ref_selector::non_const_type ExpressionTypeNested; typedef typename internal::remove_all::type ExpressionTypeNestedCleaned; template class Functor, typename ReturnScalar=Scalar> struct ReturnType { typedef PartialReduxExpr, Direction > Type; }; template struct ReduxReturnType { typedef PartialReduxExpr, Direction > Type; }; enum { isVertical = (Direction==Vertical) ? 1 : 0, isHorizontal = (Direction==Horizontal) ? 1 : 0 }; protected: template struct ExtendedType { typedef Replicate Type; }; /** \internal * Replicates a vector to match the size of \c *this */ template EIGEN_DEVICE_FUNC typename ExtendedType::Type extendedTo(const DenseBase& other) const { EIGEN_STATIC_ASSERT(EIGEN_IMPLIES(isVertical, OtherDerived::MaxColsAtCompileTime==1), YOU_PASSED_A_ROW_VECTOR_BUT_A_COLUMN_VECTOR_WAS_EXPECTED) EIGEN_STATIC_ASSERT(EIGEN_IMPLIES(isHorizontal, OtherDerived::MaxRowsAtCompileTime==1), YOU_PASSED_A_COLUMN_VECTOR_BUT_A_ROW_VECTOR_WAS_EXPECTED) return typename ExtendedType::Type (other.derived(), isVertical ? 1 : m_matrix.rows(), isHorizontal ? 1 : m_matrix.cols()); } template struct OppositeExtendedType { typedef Replicate Type; }; /** \internal * Replicates a vector in the opposite direction to match the size of \c *this */ template EIGEN_DEVICE_FUNC typename OppositeExtendedType::Type extendedToOpposite(const DenseBase& other) const { EIGEN_STATIC_ASSERT(EIGEN_IMPLIES(isHorizontal, OtherDerived::MaxColsAtCompileTime==1), YOU_PASSED_A_ROW_VECTOR_BUT_A_COLUMN_VECTOR_WAS_EXPECTED) EIGEN_STATIC_ASSERT(EIGEN_IMPLIES(isVertical, OtherDerived::MaxRowsAtCompileTime==1), YOU_PASSED_A_COLUMN_VECTOR_BUT_A_ROW_VECTOR_WAS_EXPECTED) return typename OppositeExtendedType::Type (other.derived(), isHorizontal ? 1 : m_matrix.rows(), isVertical ? 1 : m_matrix.cols()); } public: EIGEN_DEVICE_FUNC explicit inline VectorwiseOp(ExpressionType& matrix) : m_matrix(matrix) {} /** \internal */ EIGEN_DEVICE_FUNC inline const ExpressionType& _expression() const { return m_matrix; } #ifdef EIGEN_PARSED_BY_DOXYGEN /** STL-like RandomAccessIterator * iterator type over the columns or rows as returned by the begin() and end() methods. */ random_access_iterator_type iterator; /** This is the const version of iterator (aka read-only) */ random_access_iterator_type const_iterator; #else typedef internal::subvector_stl_iterator iterator; typedef internal::subvector_stl_iterator const_iterator; typedef internal::subvector_stl_reverse_iterator reverse_iterator; typedef internal::subvector_stl_reverse_iterator const_reverse_iterator; #endif /** returns an iterator to the first row (rowwise) or column (colwise) of the nested expression. * \sa end(), cbegin() */ iterator begin() { return iterator (m_matrix, 0); } /** const version of begin() */ const_iterator begin() const { return const_iterator(m_matrix, 0); } /** const version of begin() */ const_iterator cbegin() const { return const_iterator(m_matrix, 0); } /** returns a reverse iterator to the last row (rowwise) or column (colwise) of the nested expression. * \sa rend(), crbegin() */ reverse_iterator rbegin() { return reverse_iterator (m_matrix, m_matrix.template subVectors()-1); } /** const version of rbegin() */ const_reverse_iterator rbegin() const { return const_reverse_iterator (m_matrix, m_matrix.template subVectors()-1); } /** const version of rbegin() */ const_reverse_iterator crbegin() const { return const_reverse_iterator (m_matrix, m_matrix.template subVectors()-1); } /** returns an iterator to the row (resp. column) following the last row (resp. column) of the nested expression * \sa begin(), cend() */ iterator end() { return iterator (m_matrix, m_matrix.template subVectors()); } /** const version of end() */ const_iterator end() const { return const_iterator(m_matrix, m_matrix.template subVectors()); } /** const version of end() */ const_iterator cend() const { return const_iterator(m_matrix, m_matrix.template subVectors()); } /** returns a reverse iterator to the row (resp. column) before the first row (resp. column) of the nested expression * \sa begin(), cend() */ reverse_iterator rend() { return reverse_iterator (m_matrix, -1); } /** const version of rend() */ const_reverse_iterator rend() const { return const_reverse_iterator (m_matrix, -1); } /** const version of rend() */ const_reverse_iterator crend() const { return const_reverse_iterator (m_matrix, -1); } /** \returns a row or column vector expression of \c *this reduxed by \a func * * The template parameter \a BinaryOp is the type of the functor * of the custom redux operator. Note that func must be an associative operator. * * \warning the size along the reduction direction must be strictly positive, * otherwise an assertion is triggered. * * \sa class VectorwiseOp, DenseBase::colwise(), DenseBase::rowwise() */ template EIGEN_DEVICE_FUNC const typename ReduxReturnType::Type redux(const BinaryOp& func = BinaryOp()) const { eigen_assert(redux_length()>0 && "you are using an empty matrix"); return typename ReduxReturnType::Type(_expression(), internal::member_redux(func)); } typedef typename ReturnType::Type MinCoeffReturnType; typedef typename ReturnType::Type MaxCoeffReturnType; typedef PartialReduxExpr, const ExpressionTypeNestedCleaned>,internal::member_sum,Direction> SquaredNormReturnType; typedef CwiseUnaryOp, const SquaredNormReturnType> NormReturnType; typedef typename ReturnType::Type BlueNormReturnType; typedef typename ReturnType::Type StableNormReturnType; typedef typename ReturnType::Type HypotNormReturnType; typedef typename ReturnType::Type SumReturnType; typedef EIGEN_EXPR_BINARYOP_SCALAR_RETURN_TYPE(SumReturnType,Scalar,quotient) MeanReturnType; typedef typename ReturnType::Type AllReturnType; typedef typename ReturnType::Type AnyReturnType; typedef PartialReduxExpr, Direction> CountReturnType; typedef typename ReturnType::Type ProdReturnType; typedef Reverse ConstReverseReturnType; typedef Reverse ReverseReturnType; template struct LpNormReturnType { typedef PartialReduxExpr,Direction> Type; }; /** \returns a row (or column) vector expression of the smallest coefficient * of each column (or row) of the referenced expression. * * \warning the size along the reduction direction must be strictly positive, * otherwise an assertion is triggered. * * \warning the result is undefined if \c *this contains NaN. * * Example: \include PartialRedux_minCoeff.cpp * Output: \verbinclude PartialRedux_minCoeff.out * * \sa DenseBase::minCoeff() */ EIGEN_DEVICE_FUNC const MinCoeffReturnType minCoeff() const { eigen_assert(redux_length()>0 && "you are using an empty matrix"); return MinCoeffReturnType(_expression()); } /** \returns a row (or column) vector expression of the largest coefficient * of each column (or row) of the referenced expression. * * \warning the size along the reduction direction must be strictly positive, * otherwise an assertion is triggered. * * \warning the result is undefined if \c *this contains NaN. * * Example: \include PartialRedux_maxCoeff.cpp * Output: \verbinclude PartialRedux_maxCoeff.out * * \sa DenseBase::maxCoeff() */ EIGEN_DEVICE_FUNC const MaxCoeffReturnType maxCoeff() const { eigen_assert(redux_length()>0 && "you are using an empty matrix"); return MaxCoeffReturnType(_expression()); } /** \returns a row (or column) vector expression of the squared norm * of each column (or row) of the referenced expression. * This is a vector with real entries, even if the original matrix has complex entries. * * Example: \include PartialRedux_squaredNorm.cpp * Output: \verbinclude PartialRedux_squaredNorm.out * * \sa DenseBase::squaredNorm() */ EIGEN_DEVICE_FUNC const SquaredNormReturnType squaredNorm() const { return SquaredNormReturnType(m_matrix.cwiseAbs2()); } /** \returns a row (or column) vector expression of the norm * of each column (or row) of the referenced expression. * This is a vector with real entries, even if the original matrix has complex entries. * * Example: \include PartialRedux_norm.cpp * Output: \verbinclude PartialRedux_norm.out * * \sa DenseBase::norm() */ EIGEN_DEVICE_FUNC const NormReturnType norm() const { return NormReturnType(squaredNorm()); } /** \returns a row (or column) vector expression of the norm * of each column (or row) of the referenced expression. * This is a vector with real entries, even if the original matrix has complex entries. * * Example: \include PartialRedux_norm.cpp * Output: \verbinclude PartialRedux_norm.out * * \sa DenseBase::norm() */ template EIGEN_DEVICE_FUNC const typename LpNormReturnType

::Type lpNorm() const { return typename LpNormReturnType

::Type(_expression()); } /** \returns a row (or column) vector expression of the norm * of each column (or row) of the referenced expression, using * Blue's algorithm. * This is a vector with real entries, even if the original matrix has complex entries. * * \sa DenseBase::blueNorm() */ EIGEN_DEVICE_FUNC const BlueNormReturnType blueNorm() const { return BlueNormReturnType(_expression()); } /** \returns a row (or column) vector expression of the norm * of each column (or row) of the referenced expression, avoiding * underflow and overflow. * This is a vector with real entries, even if the original matrix has complex entries. * * \sa DenseBase::stableNorm() */ EIGEN_DEVICE_FUNC const StableNormReturnType stableNorm() const { return StableNormReturnType(_expression()); } /** \returns a row (or column) vector expression of the norm * of each column (or row) of the referenced expression, avoiding * underflow and overflow using a concatenation of hypot() calls. * This is a vector with real entries, even if the original matrix has complex entries. * * \sa DenseBase::hypotNorm() */ EIGEN_DEVICE_FUNC const HypotNormReturnType hypotNorm() const { return HypotNormReturnType(_expression()); } /** \returns a row (or column) vector expression of the sum * of each column (or row) of the referenced expression. * * Example: \include PartialRedux_sum.cpp * Output: \verbinclude PartialRedux_sum.out * * \sa DenseBase::sum() */ EIGEN_DEVICE_FUNC const SumReturnType sum() const { return SumReturnType(_expression()); } /** \returns a row (or column) vector expression of the mean * of each column (or row) of the referenced expression. * * \sa DenseBase::mean() */ EIGEN_DEVICE_FUNC const MeanReturnType mean() const { return sum() / Scalar(Direction==Vertical?m_matrix.rows():m_matrix.cols()); } /** \returns a row (or column) vector expression representing * whether \b all coefficients of each respective column (or row) are \c true. * This expression can be assigned to a vector with entries of type \c bool. * * \sa DenseBase::all() */ EIGEN_DEVICE_FUNC const AllReturnType all() const { return AllReturnType(_expression()); } /** \returns a row (or column) vector expression representing * whether \b at \b least one coefficient of each respective column (or row) is \c true. * This expression can be assigned to a vector with entries of type \c bool. * * \sa DenseBase::any() */ EIGEN_DEVICE_FUNC const AnyReturnType any() const { return AnyReturnType(_expression()); } /** \returns a row (or column) vector expression representing * the number of \c true coefficients of each respective column (or row). * This expression can be assigned to a vector whose entries have the same type as is used to * index entries of the original matrix; for dense matrices, this is \c std::ptrdiff_t . * * Example: \include PartialRedux_count.cpp * Output: \verbinclude PartialRedux_count.out * * \sa DenseBase::count() */ EIGEN_DEVICE_FUNC const CountReturnType count() const { return CountReturnType(_expression()); } /** \returns a row (or column) vector expression of the product * of each column (or row) of the referenced expression. * * Example: \include PartialRedux_prod.cpp * Output: \verbinclude PartialRedux_prod.out * * \sa DenseBase::prod() */ EIGEN_DEVICE_FUNC const ProdReturnType prod() const { return ProdReturnType(_expression()); } /** \returns a matrix expression * where each column (or row) are reversed. * * Example: \include Vectorwise_reverse.cpp * Output: \verbinclude Vectorwise_reverse.out * * \sa DenseBase::reverse() */ EIGEN_DEVICE_FUNC const ConstReverseReturnType reverse() const { return ConstReverseReturnType( _expression() ); } /** \returns a writable matrix expression * where each column (or row) are reversed. * * \sa reverse() const */ EIGEN_DEVICE_FUNC ReverseReturnType reverse() { return ReverseReturnType( _expression() ); } typedef Replicate ReplicateReturnType; EIGEN_DEVICE_FUNC const ReplicateReturnType replicate(Index factor) const; /** * \return an expression of the replication of each column (or row) of \c *this * * Example: \include DirectionWise_replicate.cpp * Output: \verbinclude DirectionWise_replicate.out * * \sa VectorwiseOp::replicate(Index), DenseBase::replicate(), class Replicate */ // NOTE implemented here because of sunstudio's compilation errors // isVertical*Factor+isHorizontal instead of (isVertical?Factor:1) to handle CUDA bug with ternary operator template const Replicate EIGEN_DEVICE_FUNC replicate(Index factor = Factor) const { return Replicate (_expression(),isVertical?factor:1,isHorizontal?factor:1); } /////////// Artithmetic operators /////////// /** Copies the vector \a other to each subvector of \c *this */ template EIGEN_DEVICE_FUNC ExpressionType& operator=(const DenseBase& other) { EIGEN_STATIC_ASSERT_VECTOR_ONLY(OtherDerived) EIGEN_STATIC_ASSERT_SAME_XPR_KIND(ExpressionType, OtherDerived) //eigen_assert((m_matrix.isNull()) == (other.isNull())); FIXME return m_matrix = extendedTo(other.derived()); } /** Adds the vector \a other to each subvector of \c *this */ template EIGEN_DEVICE_FUNC ExpressionType& operator+=(const DenseBase& other) { EIGEN_STATIC_ASSERT_VECTOR_ONLY(OtherDerived) EIGEN_STATIC_ASSERT_SAME_XPR_KIND(ExpressionType, OtherDerived) return m_matrix += extendedTo(other.derived()); } /** Substracts the vector \a other to each subvector of \c *this */ template EIGEN_DEVICE_FUNC ExpressionType& operator-=(const DenseBase& other) { EIGEN_STATIC_ASSERT_VECTOR_ONLY(OtherDerived) EIGEN_STATIC_ASSERT_SAME_XPR_KIND(ExpressionType, OtherDerived) return m_matrix -= extendedTo(other.derived()); } /** Multiples each subvector of \c *this by the vector \a other */ template EIGEN_DEVICE_FUNC ExpressionType& operator*=(const DenseBase& other) { EIGEN_STATIC_ASSERT_VECTOR_ONLY(OtherDerived) EIGEN_STATIC_ASSERT_ARRAYXPR(ExpressionType) EIGEN_STATIC_ASSERT_SAME_XPR_KIND(ExpressionType, OtherDerived) m_matrix *= extendedTo(other.derived()); return m_matrix; } /** Divides each subvector of \c *this by the vector \a other */ template EIGEN_DEVICE_FUNC ExpressionType& operator/=(const DenseBase& other) { EIGEN_STATIC_ASSERT_VECTOR_ONLY(OtherDerived) EIGEN_STATIC_ASSERT_ARRAYXPR(ExpressionType) EIGEN_STATIC_ASSERT_SAME_XPR_KIND(ExpressionType, OtherDerived) m_matrix /= extendedTo(other.derived()); return m_matrix; } /** Returns the expression of the sum of the vector \a other to each subvector of \c *this */ template EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC CwiseBinaryOp, const ExpressionTypeNestedCleaned, const typename ExtendedType::Type> operator+(const DenseBase& other) const { EIGEN_STATIC_ASSERT_VECTOR_ONLY(OtherDerived) EIGEN_STATIC_ASSERT_SAME_XPR_KIND(ExpressionType, OtherDerived) return m_matrix + extendedTo(other.derived()); } /** Returns the expression of the difference between each subvector of \c *this and the vector \a other */ template EIGEN_DEVICE_FUNC CwiseBinaryOp, const ExpressionTypeNestedCleaned, const typename ExtendedType::Type> operator-(const DenseBase& other) const { EIGEN_STATIC_ASSERT_VECTOR_ONLY(OtherDerived) EIGEN_STATIC_ASSERT_SAME_XPR_KIND(ExpressionType, OtherDerived) return m_matrix - extendedTo(other.derived()); } /** Returns the expression where each subvector is the product of the vector \a other * by the corresponding subvector of \c *this */ template EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC CwiseBinaryOp, const ExpressionTypeNestedCleaned, const typename ExtendedType::Type> EIGEN_DEVICE_FUNC operator*(const DenseBase& other) const { EIGEN_STATIC_ASSERT_VECTOR_ONLY(OtherDerived) EIGEN_STATIC_ASSERT_ARRAYXPR(ExpressionType) EIGEN_STATIC_ASSERT_SAME_XPR_KIND(ExpressionType, OtherDerived) return m_matrix * extendedTo(other.derived()); } /** Returns the expression where each subvector is the quotient of the corresponding * subvector of \c *this by the vector \a other */ template EIGEN_DEVICE_FUNC CwiseBinaryOp, const ExpressionTypeNestedCleaned, const typename ExtendedType::Type> operator/(const DenseBase& other) const { EIGEN_STATIC_ASSERT_VECTOR_ONLY(OtherDerived) EIGEN_STATIC_ASSERT_ARRAYXPR(ExpressionType) EIGEN_STATIC_ASSERT_SAME_XPR_KIND(ExpressionType, OtherDerived) return m_matrix / extendedTo(other.derived()); } /** \returns an expression where each column (or row) of the referenced matrix are normalized. * The referenced matrix is \b not modified. * \sa MatrixBase::normalized(), normalize() */ EIGEN_DEVICE_FUNC CwiseBinaryOp, const ExpressionTypeNestedCleaned, const typename OppositeExtendedType::Type> normalized() const { return m_matrix.cwiseQuotient(extendedToOpposite(this->norm())); } /** Normalize in-place each row or columns of the referenced matrix. * \sa MatrixBase::normalize(), normalized() */ EIGEN_DEVICE_FUNC void normalize() { m_matrix = this->normalized(); } EIGEN_DEVICE_FUNC inline void reverseInPlace(); /////////// Geometry module /////////// typedef Homogeneous HomogeneousReturnType; EIGEN_DEVICE_FUNC HomogeneousReturnType homogeneous() const; typedef typename ExpressionType::PlainObject CrossReturnType; template EIGEN_DEVICE_FUNC const CrossReturnType cross(const MatrixBase& other) const; enum { HNormalized_Size = Direction==Vertical ? internal::traits::RowsAtCompileTime : internal::traits::ColsAtCompileTime, HNormalized_SizeMinusOne = HNormalized_Size==Dynamic ? Dynamic : HNormalized_Size-1 }; typedef Block::RowsAtCompileTime), Direction==Horizontal ? int(HNormalized_SizeMinusOne) : int(internal::traits::ColsAtCompileTime)> HNormalized_Block; typedef Block::RowsAtCompileTime), Direction==Horizontal ? 1 : int(internal::traits::ColsAtCompileTime)> HNormalized_Factors; typedef CwiseBinaryOp::Scalar>, const HNormalized_Block, const Replicate > HNormalizedReturnType; EIGEN_DEVICE_FUNC const HNormalizedReturnType hnormalized() const; # ifdef EIGEN_VECTORWISEOP_PLUGIN # include EIGEN_VECTORWISEOP_PLUGIN # endif protected: Index redux_length() const { return Direction==Vertical ? m_matrix.rows() : m_matrix.cols(); } ExpressionTypeNested m_matrix; }; //const colwise moved to DenseBase.h due to CUDA compiler bug /** \returns a writable VectorwiseOp wrapper of *this providing additional partial reduction operations * * \sa rowwise(), class VectorwiseOp, \ref TutorialReductionsVisitorsBroadcasting */ template EIGEN_DEVICE_FUNC inline typename DenseBase::ColwiseReturnType DenseBase::colwise() { return ColwiseReturnType(derived()); } //const rowwise moved to DenseBase.h due to CUDA compiler bug /** \returns a writable VectorwiseOp wrapper of *this providing additional partial reduction operations * * \sa colwise(), class VectorwiseOp, \ref TutorialReductionsVisitorsBroadcasting */ template EIGEN_DEVICE_FUNC inline typename DenseBase::RowwiseReturnType DenseBase::rowwise() { return RowwiseReturnType(derived()); } } // end namespace Eigen #endif // EIGEN_PARTIAL_REDUX_H RcppEigen/inst/include/Eigen/src/Core/NoAlias.h0000644000176200001440000000704414562417221020744 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_NOALIAS_H #define EIGEN_NOALIAS_H namespace Eigen { /** \class NoAlias * \ingroup Core_Module * * \brief Pseudo expression providing an operator = assuming no aliasing * * \tparam ExpressionType the type of the object on which to do the lazy assignment * * This class represents an expression with special assignment operators * assuming no aliasing between the target expression and the source expression. * More precisely it alloas to bypass the EvalBeforeAssignBit flag of the source expression. * It is the return type of MatrixBase::noalias() * and most of the time this is the only way it is used. * * \sa MatrixBase::noalias() */ template class StorageBase> class NoAlias { public: typedef typename ExpressionType::Scalar Scalar; EIGEN_DEVICE_FUNC explicit NoAlias(ExpressionType& expression) : m_expression(expression) {} template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ExpressionType& operator=(const StorageBase& other) { call_assignment_no_alias(m_expression, other.derived(), internal::assign_op()); return m_expression; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ExpressionType& operator+=(const StorageBase& other) { call_assignment_no_alias(m_expression, other.derived(), internal::add_assign_op()); return m_expression; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ExpressionType& operator-=(const StorageBase& other) { call_assignment_no_alias(m_expression, other.derived(), internal::sub_assign_op()); return m_expression; } EIGEN_DEVICE_FUNC ExpressionType& expression() const { return m_expression; } protected: ExpressionType& m_expression; }; /** \returns a pseudo expression of \c *this with an operator= assuming * no aliasing between \c *this and the source expression. * * More precisely, noalias() allows to bypass the EvalBeforeAssignBit flag. * Currently, even though several expressions may alias, only product * expressions have this flag. Therefore, noalias() is only useful when * the source expression contains a matrix product. * * Here are some examples where noalias is useful: * \code * D.noalias() = A * B; * D.noalias() += A.transpose() * B; * D.noalias() -= 2 * A * B.adjoint(); * \endcode * * On the other hand the following example will lead to a \b wrong result: * \code * A.noalias() = A * B; * \endcode * because the result matrix A is also an operand of the matrix product. Therefore, * there is no alternative than evaluating A * B in a temporary, that is the default * behavior when you write: * \code * A = A * B; * \endcode * * \sa class NoAlias */ template NoAlias EIGEN_DEVICE_FUNC MatrixBase::noalias() { return NoAlias(derived()); } } // end namespace Eigen #endif // EIGEN_NOALIAS_H RcppEigen/inst/include/Eigen/src/Core/EigenBase.h0000644000176200001440000001332114562417221021233 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009 Benoit Jacob // Copyright (C) 2009 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_EIGENBASE_H #define EIGEN_EIGENBASE_H namespace Eigen { /** \class EigenBase * \ingroup Core_Module * * Common base class for all classes T such that MatrixBase has an operator=(T) and a constructor MatrixBase(T). * * In other words, an EigenBase object is an object that can be copied into a MatrixBase. * * Besides MatrixBase-derived classes, this also includes special matrix classes such as diagonal matrices, etc. * * Notice that this class is trivial, it is only used to disambiguate overloaded functions. * * \sa \blank \ref TopicClassHierarchy */ template struct EigenBase { // typedef typename internal::plain_matrix_type::type PlainObject; /** \brief The interface type of indices * \details To change this, \c \#define the preprocessor symbol \c EIGEN_DEFAULT_DENSE_INDEX_TYPE. * \sa StorageIndex, \ref TopicPreprocessorDirectives. * DEPRECATED: Since Eigen 3.3, its usage is deprecated. Use Eigen::Index instead. * Deprecation is not marked with a doxygen comment because there are too many existing usages to add the deprecation attribute. */ typedef Eigen::Index Index; // FIXME is it needed? typedef typename internal::traits::StorageKind StorageKind; /** \returns a reference to the derived object */ EIGEN_DEVICE_FUNC Derived& derived() { return *static_cast(this); } /** \returns a const reference to the derived object */ EIGEN_DEVICE_FUNC const Derived& derived() const { return *static_cast(this); } EIGEN_DEVICE_FUNC inline Derived& const_cast_derived() const { return *static_cast(const_cast(this)); } EIGEN_DEVICE_FUNC inline const Derived& const_derived() const { return *static_cast(this); } /** \returns the number of rows. \sa cols(), RowsAtCompileTime */ EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index rows() const EIGEN_NOEXCEPT { return derived().rows(); } /** \returns the number of columns. \sa rows(), ColsAtCompileTime*/ EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index cols() const EIGEN_NOEXCEPT { return derived().cols(); } /** \returns the number of coefficients, which is rows()*cols(). * \sa rows(), cols(), SizeAtCompileTime. */ EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index size() const EIGEN_NOEXCEPT { return rows() * cols(); } /** \internal Don't use it, but do the equivalent: \code dst = *this; \endcode */ template EIGEN_DEVICE_FUNC inline void evalTo(Dest& dst) const { derived().evalTo(dst); } /** \internal Don't use it, but do the equivalent: \code dst += *this; \endcode */ template EIGEN_DEVICE_FUNC inline void addTo(Dest& dst) const { // This is the default implementation, // derived class can reimplement it in a more optimized way. typename Dest::PlainObject res(rows(),cols()); evalTo(res); dst += res; } /** \internal Don't use it, but do the equivalent: \code dst -= *this; \endcode */ template EIGEN_DEVICE_FUNC inline void subTo(Dest& dst) const { // This is the default implementation, // derived class can reimplement it in a more optimized way. typename Dest::PlainObject res(rows(),cols()); evalTo(res); dst -= res; } /** \internal Don't use it, but do the equivalent: \code dst.applyOnTheRight(*this); \endcode */ template EIGEN_DEVICE_FUNC inline void applyThisOnTheRight(Dest& dst) const { // This is the default implementation, // derived class can reimplement it in a more optimized way. dst = dst * this->derived(); } /** \internal Don't use it, but do the equivalent: \code dst.applyOnTheLeft(*this); \endcode */ template EIGEN_DEVICE_FUNC inline void applyThisOnTheLeft(Dest& dst) const { // This is the default implementation, // derived class can reimplement it in a more optimized way. dst = this->derived() * dst; } }; /*************************************************************************** * Implementation of matrix base methods ***************************************************************************/ /** \brief Copies the generic expression \a other into *this. * * \details The expression must provide a (templated) evalTo(Derived& dst) const * function which does the actual job. In practice, this allows any user to write * its own special matrix without having to modify MatrixBase * * \returns a reference to *this. */ template template EIGEN_DEVICE_FUNC Derived& DenseBase::operator=(const EigenBase &other) { call_assignment(derived(), other.derived()); return derived(); } template template EIGEN_DEVICE_FUNC Derived& DenseBase::operator+=(const EigenBase &other) { call_assignment(derived(), other.derived(), internal::add_assign_op()); return derived(); } template template EIGEN_DEVICE_FUNC Derived& DenseBase::operator-=(const EigenBase &other) { call_assignment(derived(), other.derived(), internal::sub_assign_op()); return derived(); } } // end namespace Eigen #endif // EIGEN_EIGENBASE_H RcppEigen/inst/include/Eigen/src/Core/MathFunctions.h0000644000176200001440000016656014562417221022211 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2006-2010 Benoit Jacob // Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved. // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_MATHFUNCTIONS_H #define EIGEN_MATHFUNCTIONS_H // TODO this should better be moved to NumTraits // Source: WolframAlpha #define EIGEN_PI 3.141592653589793238462643383279502884197169399375105820974944592307816406L #define EIGEN_LOG2E 1.442695040888963407359924681001892137426645954152985934135449406931109219L #define EIGEN_LN2 0.693147180559945309417232121458176568075500134360255254120680009493393621L namespace Eigen { // On WINCE, std::abs is defined for int only, so let's defined our own overloads: // This issue has been confirmed with MSVC 2008 only, but the issue might exist for more recent versions too. #if EIGEN_OS_WINCE && EIGEN_COMP_MSVC && EIGEN_COMP_MSVC<=1500 long abs(long x) { return (labs(x)); } double abs(double x) { return (fabs(x)); } float abs(float x) { return (fabsf(x)); } long double abs(long double x) { return (fabsl(x)); } #endif namespace internal { /** \internal \class global_math_functions_filtering_base * * What it does: * Defines a typedef 'type' as follows: * - if type T has a member typedef Eigen_BaseClassForSpecializationOfGlobalMathFuncImpl, then * global_math_functions_filtering_base::type is a typedef for it. * - otherwise, global_math_functions_filtering_base::type is a typedef for T. * * How it's used: * To allow to defined the global math functions (like sin...) in certain cases, like the Array expressions. * When you do sin(array1+array2), the object array1+array2 has a complicated expression type, all what you want to know * is that it inherits ArrayBase. So we implement a partial specialization of sin_impl for ArrayBase. * So we must make sure to use sin_impl > and not sin_impl, otherwise our partial specialization * won't be used. How does sin know that? That's exactly what global_math_functions_filtering_base tells it. * * How it's implemented: * SFINAE in the style of enable_if. Highly susceptible of breaking compilers. With GCC, it sure does work, but if you replace * the typename dummy by an integer template parameter, it doesn't work anymore! */ template struct global_math_functions_filtering_base { typedef T type; }; template struct always_void { typedef void type; }; template struct global_math_functions_filtering_base ::type > { typedef typename T::Eigen_BaseClassForSpecializationOfGlobalMathFuncImpl type; }; #define EIGEN_MATHFUNC_IMPL(func, scalar) Eigen::internal::func##_impl::type> #define EIGEN_MATHFUNC_RETVAL(func, scalar) typename Eigen::internal::func##_retval::type>::type /**************************************************************************** * Implementation of real * ****************************************************************************/ template::IsComplex> struct real_default_impl { typedef typename NumTraits::Real RealScalar; EIGEN_DEVICE_FUNC static inline RealScalar run(const Scalar& x) { return x; } }; template struct real_default_impl { typedef typename NumTraits::Real RealScalar; EIGEN_DEVICE_FUNC static inline RealScalar run(const Scalar& x) { using std::real; return real(x); } }; template struct real_impl : real_default_impl {}; #if defined(EIGEN_GPU_COMPILE_PHASE) template struct real_impl > { typedef T RealScalar; EIGEN_DEVICE_FUNC static inline T run(const std::complex& x) { return x.real(); } }; #endif template struct real_retval { typedef typename NumTraits::Real type; }; /**************************************************************************** * Implementation of imag * ****************************************************************************/ template::IsComplex> struct imag_default_impl { typedef typename NumTraits::Real RealScalar; EIGEN_DEVICE_FUNC static inline RealScalar run(const Scalar&) { return RealScalar(0); } }; template struct imag_default_impl { typedef typename NumTraits::Real RealScalar; EIGEN_DEVICE_FUNC static inline RealScalar run(const Scalar& x) { using std::imag; return imag(x); } }; template struct imag_impl : imag_default_impl {}; #if defined(EIGEN_GPU_COMPILE_PHASE) template struct imag_impl > { typedef T RealScalar; EIGEN_DEVICE_FUNC static inline T run(const std::complex& x) { return x.imag(); } }; #endif template struct imag_retval { typedef typename NumTraits::Real type; }; /**************************************************************************** * Implementation of real_ref * ****************************************************************************/ template struct real_ref_impl { typedef typename NumTraits::Real RealScalar; EIGEN_DEVICE_FUNC static inline RealScalar& run(Scalar& x) { return reinterpret_cast(&x)[0]; } EIGEN_DEVICE_FUNC static inline const RealScalar& run(const Scalar& x) { return reinterpret_cast(&x)[0]; } }; template struct real_ref_retval { typedef typename NumTraits::Real & type; }; /**************************************************************************** * Implementation of imag_ref * ****************************************************************************/ template struct imag_ref_default_impl { typedef typename NumTraits::Real RealScalar; EIGEN_DEVICE_FUNC static inline RealScalar& run(Scalar& x) { return reinterpret_cast(&x)[1]; } EIGEN_DEVICE_FUNC static inline const RealScalar& run(const Scalar& x) { return reinterpret_cast(&x)[1]; } }; template struct imag_ref_default_impl { EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static inline Scalar run(Scalar&) { return Scalar(0); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static inline const Scalar run(const Scalar&) { return Scalar(0); } }; template struct imag_ref_impl : imag_ref_default_impl::IsComplex> {}; template struct imag_ref_retval { typedef typename NumTraits::Real & type; }; /**************************************************************************** * Implementation of conj * ****************************************************************************/ template::IsComplex> struct conj_default_impl { EIGEN_DEVICE_FUNC static inline Scalar run(const Scalar& x) { return x; } }; template struct conj_default_impl { EIGEN_DEVICE_FUNC static inline Scalar run(const Scalar& x) { using std::conj; return conj(x); } }; template::IsComplex> struct conj_impl : conj_default_impl {}; template struct conj_retval { typedef Scalar type; }; /**************************************************************************** * Implementation of abs2 * ****************************************************************************/ template struct abs2_impl_default { typedef typename NumTraits::Real RealScalar; EIGEN_DEVICE_FUNC static inline RealScalar run(const Scalar& x) { return x*x; } }; template struct abs2_impl_default // IsComplex { typedef typename NumTraits::Real RealScalar; EIGEN_DEVICE_FUNC static inline RealScalar run(const Scalar& x) { return x.real()*x.real() + x.imag()*x.imag(); } }; template struct abs2_impl { typedef typename NumTraits::Real RealScalar; EIGEN_DEVICE_FUNC static inline RealScalar run(const Scalar& x) { return abs2_impl_default::IsComplex>::run(x); } }; template struct abs2_retval { typedef typename NumTraits::Real type; }; /**************************************************************************** * Implementation of sqrt/rsqrt * ****************************************************************************/ template struct sqrt_impl { EIGEN_DEVICE_FUNC static EIGEN_ALWAYS_INLINE Scalar run(const Scalar& x) { EIGEN_USING_STD(sqrt); return sqrt(x); } }; // Complex sqrt defined in MathFunctionsImpl.h. template EIGEN_DEVICE_FUNC std::complex complex_sqrt(const std::complex& a_x); // Custom implementation is faster than `std::sqrt`, works on // GPU, and correctly handles special cases (unlike MSVC). template struct sqrt_impl > { EIGEN_DEVICE_FUNC static EIGEN_ALWAYS_INLINE std::complex run(const std::complex& x) { return complex_sqrt(x); } }; template struct sqrt_retval { typedef Scalar type; }; // Default implementation relies on numext::sqrt, at bottom of file. template struct rsqrt_impl; // Complex rsqrt defined in MathFunctionsImpl.h. template EIGEN_DEVICE_FUNC std::complex complex_rsqrt(const std::complex& a_x); template struct rsqrt_impl > { EIGEN_DEVICE_FUNC static EIGEN_ALWAYS_INLINE std::complex run(const std::complex& x) { return complex_rsqrt(x); } }; template struct rsqrt_retval { typedef Scalar type; }; /**************************************************************************** * Implementation of norm1 * ****************************************************************************/ template struct norm1_default_impl; template struct norm1_default_impl { typedef typename NumTraits::Real RealScalar; EIGEN_DEVICE_FUNC static inline RealScalar run(const Scalar& x) { EIGEN_USING_STD(abs); return abs(x.real()) + abs(x.imag()); } }; template struct norm1_default_impl { EIGEN_DEVICE_FUNC static inline Scalar run(const Scalar& x) { EIGEN_USING_STD(abs); return abs(x); } }; template struct norm1_impl : norm1_default_impl::IsComplex> {}; template struct norm1_retval { typedef typename NumTraits::Real type; }; /**************************************************************************** * Implementation of hypot * ****************************************************************************/ template struct hypot_impl; template struct hypot_retval { typedef typename NumTraits::Real type; }; /**************************************************************************** * Implementation of cast * ****************************************************************************/ template struct cast_impl { EIGEN_DEVICE_FUNC static inline NewType run(const OldType& x) { return static_cast(x); } }; // Casting from S -> Complex leads to an implicit conversion from S to T, // generating warnings on clang. Here we explicitly cast the real component. template struct cast_impl::IsComplex && NumTraits::IsComplex >::type> { EIGEN_DEVICE_FUNC static inline NewType run(const OldType& x) { typedef typename NumTraits::Real NewReal; return static_cast(static_cast(x)); } }; // here, for once, we're plainly returning NewType: we don't want cast to do weird things. template EIGEN_DEVICE_FUNC inline NewType cast(const OldType& x) { return cast_impl::run(x); } /**************************************************************************** * Implementation of round * ****************************************************************************/ template struct round_impl { EIGEN_DEVICE_FUNC static inline Scalar run(const Scalar& x) { EIGEN_STATIC_ASSERT((!NumTraits::IsComplex), NUMERIC_TYPE_MUST_BE_REAL) #if EIGEN_HAS_CXX11_MATH EIGEN_USING_STD(round); #endif return Scalar(round(x)); } }; #if !EIGEN_HAS_CXX11_MATH #if EIGEN_HAS_C99_MATH // Use ::roundf for float. template<> struct round_impl { EIGEN_DEVICE_FUNC static inline float run(const float& x) { return ::roundf(x); } }; #else template struct round_using_floor_ceil_impl { EIGEN_DEVICE_FUNC static inline Scalar run(const Scalar& x) { EIGEN_STATIC_ASSERT((!NumTraits::IsComplex), NUMERIC_TYPE_MUST_BE_REAL) // Without C99 round/roundf, resort to floor/ceil. EIGEN_USING_STD(floor); EIGEN_USING_STD(ceil); // If not enough precision to resolve a decimal at all, return the input. // Otherwise, adding 0.5 can trigger an increment by 1. const Scalar limit = Scalar(1ull << (NumTraits::digits() - 1)); if (x >= limit || x <= -limit) { return x; } return (x > Scalar(0)) ? Scalar(floor(x + Scalar(0.5))) : Scalar(ceil(x - Scalar(0.5))); } }; template<> struct round_impl : round_using_floor_ceil_impl {}; template<> struct round_impl : round_using_floor_ceil_impl {}; #endif // EIGEN_HAS_C99_MATH #endif // !EIGEN_HAS_CXX11_MATH template struct round_retval { typedef Scalar type; }; /**************************************************************************** * Implementation of rint * ****************************************************************************/ template struct rint_impl { EIGEN_DEVICE_FUNC static inline Scalar run(const Scalar& x) { EIGEN_STATIC_ASSERT((!NumTraits::IsComplex), NUMERIC_TYPE_MUST_BE_REAL) #if EIGEN_HAS_CXX11_MATH EIGEN_USING_STD(rint); #endif return rint(x); } }; #if !EIGEN_HAS_CXX11_MATH template<> struct rint_impl { EIGEN_DEVICE_FUNC static inline double run(const double& x) { return ::rint(x); } }; template<> struct rint_impl { EIGEN_DEVICE_FUNC static inline float run(const float& x) { return ::rintf(x); } }; #endif template struct rint_retval { typedef Scalar type; }; /**************************************************************************** * Implementation of arg * ****************************************************************************/ // Visual Studio 2017 has a bug where arg(float) returns 0 for negative inputs. // This seems to be fixed in VS 2019. #if EIGEN_HAS_CXX11_MATH && (!EIGEN_COMP_MSVC || EIGEN_COMP_MSVC >= 1920) // std::arg is only defined for types of std::complex, or integer types or float/double/long double template::IsComplex || is_integral::value || is_same::value || is_same::value || is_same::value > struct arg_default_impl; template struct arg_default_impl { typedef typename NumTraits::Real RealScalar; EIGEN_DEVICE_FUNC static inline RealScalar run(const Scalar& x) { #if defined(EIGEN_HIP_DEVICE_COMPILE) // HIP does not seem to have a native device side implementation for the math routine "arg" using std::arg; #else EIGEN_USING_STD(arg); #endif return static_cast(arg(x)); } }; // Must be non-complex floating-point type (e.g. half/bfloat16). template struct arg_default_impl { typedef typename NumTraits::Real RealScalar; EIGEN_DEVICE_FUNC static inline RealScalar run(const Scalar& x) { return (x < Scalar(0)) ? RealScalar(EIGEN_PI) : RealScalar(0); } }; #else template::IsComplex> struct arg_default_impl { typedef typename NumTraits::Real RealScalar; EIGEN_DEVICE_FUNC static inline RealScalar run(const Scalar& x) { return (x < RealScalar(0)) ? RealScalar(EIGEN_PI) : RealScalar(0); } }; template struct arg_default_impl { typedef typename NumTraits::Real RealScalar; EIGEN_DEVICE_FUNC static inline RealScalar run(const Scalar& x) { EIGEN_USING_STD(arg); return arg(x); } }; #endif template struct arg_impl : arg_default_impl {}; template struct arg_retval { typedef typename NumTraits::Real type; }; /**************************************************************************** * Implementation of expm1 * ****************************************************************************/ // This implementation is based on GSL Math's expm1. namespace std_fallback { // fallback expm1 implementation in case there is no expm1(Scalar) function in namespace of Scalar, // or that there is no suitable std::expm1 function available. Implementation // attributed to Kahan. See: http://www.plunk.org/~hatch/rightway.php. template EIGEN_DEVICE_FUNC inline Scalar expm1(const Scalar& x) { EIGEN_STATIC_ASSERT_NON_INTEGER(Scalar) typedef typename NumTraits::Real RealScalar; EIGEN_USING_STD(exp); Scalar u = exp(x); if (numext::equal_strict(u, Scalar(1))) { return x; } Scalar um1 = u - RealScalar(1); if (numext::equal_strict(um1, Scalar(-1))) { return RealScalar(-1); } EIGEN_USING_STD(log); Scalar logu = log(u); return numext::equal_strict(u, logu) ? u : (u - RealScalar(1)) * x / logu; } } template struct expm1_impl { EIGEN_DEVICE_FUNC static inline Scalar run(const Scalar& x) { EIGEN_STATIC_ASSERT_NON_INTEGER(Scalar) #if EIGEN_HAS_CXX11_MATH using std::expm1; #else using std_fallback::expm1; #endif return expm1(x); } }; template struct expm1_retval { typedef Scalar type; }; /**************************************************************************** * Implementation of log * ****************************************************************************/ // Complex log defined in MathFunctionsImpl.h. template EIGEN_DEVICE_FUNC std::complex complex_log(const std::complex& z); template struct log_impl { EIGEN_DEVICE_FUNC static inline Scalar run(const Scalar& x) { EIGEN_USING_STD(log); return static_cast(log(x)); } }; template struct log_impl > { EIGEN_DEVICE_FUNC static inline std::complex run(const std::complex& z) { return complex_log(z); } }; /**************************************************************************** * Implementation of log1p * ****************************************************************************/ namespace std_fallback { // fallback log1p implementation in case there is no log1p(Scalar) function in namespace of Scalar, // or that there is no suitable std::log1p function available template EIGEN_DEVICE_FUNC inline Scalar log1p(const Scalar& x) { EIGEN_STATIC_ASSERT_NON_INTEGER(Scalar) typedef typename NumTraits::Real RealScalar; EIGEN_USING_STD(log); Scalar x1p = RealScalar(1) + x; Scalar log_1p = log_impl::run(x1p); const bool is_small = numext::equal_strict(x1p, Scalar(1)); const bool is_inf = numext::equal_strict(x1p, log_1p); return (is_small || is_inf) ? x : x * (log_1p / (x1p - RealScalar(1))); } } template struct log1p_impl { EIGEN_DEVICE_FUNC static inline Scalar run(const Scalar& x) { EIGEN_STATIC_ASSERT_NON_INTEGER(Scalar) #if EIGEN_HAS_CXX11_MATH using std::log1p; #else using std_fallback::log1p; #endif return log1p(x); } }; // Specialization for complex types that are not supported by std::log1p. template struct log1p_impl > { EIGEN_DEVICE_FUNC static inline std::complex run( const std::complex& x) { EIGEN_STATIC_ASSERT_NON_INTEGER(RealScalar) return std_fallback::log1p(x); } }; template struct log1p_retval { typedef Scalar type; }; /**************************************************************************** * Implementation of pow * ****************************************************************************/ template::IsInteger&&NumTraits::IsInteger> struct pow_impl { //typedef Scalar retval; typedef typename ScalarBinaryOpTraits >::ReturnType result_type; static EIGEN_DEVICE_FUNC inline result_type run(const ScalarX& x, const ScalarY& y) { EIGEN_USING_STD(pow); return pow(x, y); } }; template struct pow_impl { typedef ScalarX result_type; static EIGEN_DEVICE_FUNC inline ScalarX run(ScalarX x, ScalarY y) { ScalarX res(1); eigen_assert(!NumTraits::IsSigned || y >= 0); if(y & 1) res *= x; y >>= 1; while(y) { x *= x; if(y&1) res *= x; y >>= 1; } return res; } }; /**************************************************************************** * Implementation of random * ****************************************************************************/ template struct random_default_impl {}; template struct random_impl : random_default_impl::IsComplex, NumTraits::IsInteger> {}; template struct random_retval { typedef Scalar type; }; template inline EIGEN_MATHFUNC_RETVAL(random, Scalar) random(const Scalar& x, const Scalar& y); template inline EIGEN_MATHFUNC_RETVAL(random, Scalar) random(); template struct random_default_impl { static inline Scalar run(const Scalar& x, const Scalar& y) { return x + (y-x) * Scalar(std::rand()) / Scalar(RAND_MAX); } static inline Scalar run() { return run(Scalar(NumTraits::IsSigned ? -1 : 0), Scalar(1)); } }; enum { meta_floor_log2_terminate, meta_floor_log2_move_up, meta_floor_log2_move_down, meta_floor_log2_bogus }; template struct meta_floor_log2_selector { enum { middle = (lower + upper) / 2, value = (upper <= lower + 1) ? int(meta_floor_log2_terminate) : (n < (1 << middle)) ? int(meta_floor_log2_move_down) : (n==0) ? int(meta_floor_log2_bogus) : int(meta_floor_log2_move_up) }; }; template::value> struct meta_floor_log2 {}; template struct meta_floor_log2 { enum { value = meta_floor_log2::middle>::value }; }; template struct meta_floor_log2 { enum { value = meta_floor_log2::middle, upper>::value }; }; template struct meta_floor_log2 { enum { value = (n >= ((unsigned int)(1) << (lower+1))) ? lower+1 : lower }; }; template struct meta_floor_log2 { // no value, error at compile time }; template struct random_default_impl { static inline Scalar run(const Scalar& x, const Scalar& y) { if (y <= x) return x; // ScalarU is the unsigned counterpart of Scalar, possibly Scalar itself. typedef typename make_unsigned::type ScalarU; // ScalarX is the widest of ScalarU and unsigned int. // We'll deal only with ScalarX and unsigned int below thus avoiding signed // types and arithmetic and signed overflows (which are undefined behavior). typedef typename conditional<(ScalarU(-1) > unsigned(-1)), ScalarU, unsigned>::type ScalarX; // The following difference doesn't overflow, provided our integer types are two's // complement and have the same number of padding bits in signed and unsigned variants. // This is the case in most modern implementations of C++. ScalarX range = ScalarX(y) - ScalarX(x); ScalarX offset = 0; ScalarX divisor = 1; ScalarX multiplier = 1; const unsigned rand_max = RAND_MAX; if (range <= rand_max) divisor = (rand_max + 1) / (range + 1); else multiplier = 1 + range / (rand_max + 1); // Rejection sampling. do { offset = (unsigned(std::rand()) * multiplier) / divisor; } while (offset > range); return Scalar(ScalarX(x) + offset); } static inline Scalar run() { #ifdef EIGEN_MAKING_DOCS return run(Scalar(NumTraits::IsSigned ? -10 : 0), Scalar(10)); #else enum { rand_bits = meta_floor_log2<(unsigned int)(RAND_MAX)+1>::value, scalar_bits = sizeof(Scalar) * CHAR_BIT, shift = EIGEN_PLAIN_ENUM_MAX(0, int(rand_bits) - int(scalar_bits)), offset = NumTraits::IsSigned ? (1 << (EIGEN_PLAIN_ENUM_MIN(rand_bits,scalar_bits)-1)) : 0 }; return Scalar((std::rand() >> shift) - offset); #endif } }; template struct random_default_impl { static inline Scalar run(const Scalar& x, const Scalar& y) { return Scalar(random(x.real(), y.real()), random(x.imag(), y.imag())); } static inline Scalar run() { typedef typename NumTraits::Real RealScalar; return Scalar(random(), random()); } }; template inline EIGEN_MATHFUNC_RETVAL(random, Scalar) random(const Scalar& x, const Scalar& y) { return EIGEN_MATHFUNC_IMPL(random, Scalar)::run(x, y); } template inline EIGEN_MATHFUNC_RETVAL(random, Scalar) random() { return EIGEN_MATHFUNC_IMPL(random, Scalar)::run(); } // Implementation of is* functions // std::is* do not work with fast-math and gcc, std::is* are available on MSVC 2013 and newer, as well as in clang. #if (EIGEN_HAS_CXX11_MATH && !(EIGEN_COMP_GNUC_STRICT && __FINITE_MATH_ONLY__)) || (EIGEN_COMP_MSVC>=1800) || (EIGEN_COMP_CLANG) #define EIGEN_USE_STD_FPCLASSIFY 1 #else #define EIGEN_USE_STD_FPCLASSIFY 0 #endif template EIGEN_DEVICE_FUNC typename internal::enable_if::value,bool>::type isnan_impl(const T&) { return false; } template EIGEN_DEVICE_FUNC typename internal::enable_if::value,bool>::type isinf_impl(const T&) { return false; } template EIGEN_DEVICE_FUNC typename internal::enable_if::value,bool>::type isfinite_impl(const T&) { return true; } template EIGEN_DEVICE_FUNC typename internal::enable_if<(!internal::is_integral::value)&&(!NumTraits::IsComplex),bool>::type isfinite_impl(const T& x) { #if defined(EIGEN_GPU_COMPILE_PHASE) return (::isfinite)(x); #elif EIGEN_USE_STD_FPCLASSIFY using std::isfinite; return isfinite EIGEN_NOT_A_MACRO (x); #else return x<=NumTraits::highest() && x>=NumTraits::lowest(); #endif } template EIGEN_DEVICE_FUNC typename internal::enable_if<(!internal::is_integral::value)&&(!NumTraits::IsComplex),bool>::type isinf_impl(const T& x) { #if defined(EIGEN_GPU_COMPILE_PHASE) return (::isinf)(x); #elif EIGEN_USE_STD_FPCLASSIFY using std::isinf; return isinf EIGEN_NOT_A_MACRO (x); #else return x>NumTraits::highest() || x::lowest(); #endif } template EIGEN_DEVICE_FUNC typename internal::enable_if<(!internal::is_integral::value)&&(!NumTraits::IsComplex),bool>::type isnan_impl(const T& x) { #if defined(EIGEN_GPU_COMPILE_PHASE) return (::isnan)(x); #elif EIGEN_USE_STD_FPCLASSIFY using std::isnan; return isnan EIGEN_NOT_A_MACRO (x); #else return x != x; #endif } #if (!EIGEN_USE_STD_FPCLASSIFY) #if EIGEN_COMP_MSVC template EIGEN_DEVICE_FUNC bool isinf_msvc_helper(T x) { return _fpclass(x)==_FPCLASS_NINF || _fpclass(x)==_FPCLASS_PINF; } //MSVC defines a _isnan builtin function, but for double only EIGEN_DEVICE_FUNC inline bool isnan_impl(const long double& x) { return _isnan(x)!=0; } EIGEN_DEVICE_FUNC inline bool isnan_impl(const double& x) { return _isnan(x)!=0; } EIGEN_DEVICE_FUNC inline bool isnan_impl(const float& x) { return _isnan(x)!=0; } EIGEN_DEVICE_FUNC inline bool isinf_impl(const long double& x) { return isinf_msvc_helper(x); } EIGEN_DEVICE_FUNC inline bool isinf_impl(const double& x) { return isinf_msvc_helper(x); } EIGEN_DEVICE_FUNC inline bool isinf_impl(const float& x) { return isinf_msvc_helper(x); } #elif (defined __FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ && EIGEN_COMP_GNUC) #if EIGEN_GNUC_AT_LEAST(5,0) #define EIGEN_TMP_NOOPT_ATTRIB EIGEN_DEVICE_FUNC inline __attribute__((optimize("no-finite-math-only"))) #else // NOTE the inline qualifier and noinline attribute are both needed: the former is to avoid linking issue (duplicate symbol), // while the second prevent too aggressive optimizations in fast-math mode: #define EIGEN_TMP_NOOPT_ATTRIB EIGEN_DEVICE_FUNC inline __attribute__((noinline,optimize("no-finite-math-only"))) #endif template<> EIGEN_TMP_NOOPT_ATTRIB bool isnan_impl(const long double& x) { return __builtin_isnan(x); } template<> EIGEN_TMP_NOOPT_ATTRIB bool isnan_impl(const double& x) { return __builtin_isnan(x); } template<> EIGEN_TMP_NOOPT_ATTRIB bool isnan_impl(const float& x) { return __builtin_isnan(x); } template<> EIGEN_TMP_NOOPT_ATTRIB bool isinf_impl(const double& x) { return __builtin_isinf(x); } template<> EIGEN_TMP_NOOPT_ATTRIB bool isinf_impl(const float& x) { return __builtin_isinf(x); } template<> EIGEN_TMP_NOOPT_ATTRIB bool isinf_impl(const long double& x) { return __builtin_isinf(x); } #undef EIGEN_TMP_NOOPT_ATTRIB #endif #endif // The following overload are defined at the end of this file template EIGEN_DEVICE_FUNC bool isfinite_impl(const std::complex& x); template EIGEN_DEVICE_FUNC bool isnan_impl(const std::complex& x); template EIGEN_DEVICE_FUNC bool isinf_impl(const std::complex& x); template T generic_fast_tanh_float(const T& a_x); } // end namespace internal /**************************************************************************** * Generic math functions * ****************************************************************************/ namespace numext { #if (!defined(EIGEN_GPUCC) || defined(EIGEN_CONSTEXPR_ARE_DEVICE_FUNC)) template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE T mini(const T& x, const T& y) { EIGEN_USING_STD(min) return min EIGEN_NOT_A_MACRO (x,y); } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE T maxi(const T& x, const T& y) { EIGEN_USING_STD(max) return max EIGEN_NOT_A_MACRO (x,y); } #else template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE T mini(const T& x, const T& y) { return y < x ? y : x; } template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE float mini(const float& x, const float& y) { return fminf(x, y); } template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE double mini(const double& x, const double& y) { return fmin(x, y); } template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE long double mini(const long double& x, const long double& y) { #if defined(EIGEN_HIPCC) // no "fminl" on HIP yet return (x < y) ? x : y; #else return fminl(x, y); #endif } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE T maxi(const T& x, const T& y) { return x < y ? y : x; } template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE float maxi(const float& x, const float& y) { return fmaxf(x, y); } template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE double maxi(const double& x, const double& y) { return fmax(x, y); } template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE long double maxi(const long double& x, const long double& y) { #if defined(EIGEN_HIPCC) // no "fmaxl" on HIP yet return (x > y) ? x : y; #else return fmaxl(x, y); #endif } #endif #if defined(SYCL_DEVICE_ONLY) #define SYCL_SPECIALIZE_SIGNED_INTEGER_TYPES_BINARY(NAME, FUNC) \ SYCL_SPECIALIZE_BINARY_FUNC(NAME, FUNC, cl::sycl::cl_char) \ SYCL_SPECIALIZE_BINARY_FUNC(NAME, FUNC, cl::sycl::cl_short) \ SYCL_SPECIALIZE_BINARY_FUNC(NAME, FUNC, cl::sycl::cl_int) \ SYCL_SPECIALIZE_BINARY_FUNC(NAME, FUNC, cl::sycl::cl_long) #define SYCL_SPECIALIZE_SIGNED_INTEGER_TYPES_UNARY(NAME, FUNC) \ SYCL_SPECIALIZE_UNARY_FUNC(NAME, FUNC, cl::sycl::cl_char) \ SYCL_SPECIALIZE_UNARY_FUNC(NAME, FUNC, cl::sycl::cl_short) \ SYCL_SPECIALIZE_UNARY_FUNC(NAME, FUNC, cl::sycl::cl_int) \ SYCL_SPECIALIZE_UNARY_FUNC(NAME, FUNC, cl::sycl::cl_long) #define SYCL_SPECIALIZE_UNSIGNED_INTEGER_TYPES_BINARY(NAME, FUNC) \ SYCL_SPECIALIZE_BINARY_FUNC(NAME, FUNC, cl::sycl::cl_uchar) \ SYCL_SPECIALIZE_BINARY_FUNC(NAME, FUNC, cl::sycl::cl_ushort) \ SYCL_SPECIALIZE_BINARY_FUNC(NAME, FUNC, cl::sycl::cl_uint) \ SYCL_SPECIALIZE_BINARY_FUNC(NAME, FUNC, cl::sycl::cl_ulong) #define SYCL_SPECIALIZE_UNSIGNED_INTEGER_TYPES_UNARY(NAME, FUNC) \ SYCL_SPECIALIZE_UNARY_FUNC(NAME, FUNC, cl::sycl::cl_uchar) \ SYCL_SPECIALIZE_UNARY_FUNC(NAME, FUNC, cl::sycl::cl_ushort) \ SYCL_SPECIALIZE_UNARY_FUNC(NAME, FUNC, cl::sycl::cl_uint) \ SYCL_SPECIALIZE_UNARY_FUNC(NAME, FUNC, cl::sycl::cl_ulong) #define SYCL_SPECIALIZE_INTEGER_TYPES_BINARY(NAME, FUNC) \ SYCL_SPECIALIZE_SIGNED_INTEGER_TYPES_BINARY(NAME, FUNC) \ SYCL_SPECIALIZE_UNSIGNED_INTEGER_TYPES_BINARY(NAME, FUNC) #define SYCL_SPECIALIZE_INTEGER_TYPES_UNARY(NAME, FUNC) \ SYCL_SPECIALIZE_SIGNED_INTEGER_TYPES_UNARY(NAME, FUNC) \ SYCL_SPECIALIZE_UNSIGNED_INTEGER_TYPES_UNARY(NAME, FUNC) #define SYCL_SPECIALIZE_FLOATING_TYPES_BINARY(NAME, FUNC) \ SYCL_SPECIALIZE_BINARY_FUNC(NAME, FUNC, cl::sycl::cl_float) \ SYCL_SPECIALIZE_BINARY_FUNC(NAME, FUNC,cl::sycl::cl_double) #define SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(NAME, FUNC) \ SYCL_SPECIALIZE_UNARY_FUNC(NAME, FUNC, cl::sycl::cl_float) \ SYCL_SPECIALIZE_UNARY_FUNC(NAME, FUNC,cl::sycl::cl_double) #define SYCL_SPECIALIZE_FLOATING_TYPES_UNARY_FUNC_RET_TYPE(NAME, FUNC, RET_TYPE) \ SYCL_SPECIALIZE_GEN_UNARY_FUNC(NAME, FUNC, RET_TYPE, cl::sycl::cl_float) \ SYCL_SPECIALIZE_GEN_UNARY_FUNC(NAME, FUNC, RET_TYPE, cl::sycl::cl_double) #define SYCL_SPECIALIZE_GEN_UNARY_FUNC(NAME, FUNC, RET_TYPE, ARG_TYPE) \ template<> \ EIGEN_DEVICE_FUNC \ EIGEN_ALWAYS_INLINE RET_TYPE NAME(const ARG_TYPE& x) { \ return cl::sycl::FUNC(x); \ } #define SYCL_SPECIALIZE_UNARY_FUNC(NAME, FUNC, TYPE) \ SYCL_SPECIALIZE_GEN_UNARY_FUNC(NAME, FUNC, TYPE, TYPE) #define SYCL_SPECIALIZE_GEN1_BINARY_FUNC(NAME, FUNC, RET_TYPE, ARG_TYPE1, ARG_TYPE2) \ template<> \ EIGEN_DEVICE_FUNC \ EIGEN_ALWAYS_INLINE RET_TYPE NAME(const ARG_TYPE1& x, const ARG_TYPE2& y) { \ return cl::sycl::FUNC(x, y); \ } #define SYCL_SPECIALIZE_GEN2_BINARY_FUNC(NAME, FUNC, RET_TYPE, ARG_TYPE) \ SYCL_SPECIALIZE_GEN1_BINARY_FUNC(NAME, FUNC, RET_TYPE, ARG_TYPE, ARG_TYPE) #define SYCL_SPECIALIZE_BINARY_FUNC(NAME, FUNC, TYPE) \ SYCL_SPECIALIZE_GEN2_BINARY_FUNC(NAME, FUNC, TYPE, TYPE) SYCL_SPECIALIZE_INTEGER_TYPES_BINARY(mini, min) SYCL_SPECIALIZE_FLOATING_TYPES_BINARY(mini, fmin) SYCL_SPECIALIZE_INTEGER_TYPES_BINARY(maxi, max) SYCL_SPECIALIZE_FLOATING_TYPES_BINARY(maxi, fmax) #endif template EIGEN_DEVICE_FUNC inline EIGEN_MATHFUNC_RETVAL(real, Scalar) real(const Scalar& x) { return EIGEN_MATHFUNC_IMPL(real, Scalar)::run(x); } template EIGEN_DEVICE_FUNC inline typename internal::add_const_on_value_type< EIGEN_MATHFUNC_RETVAL(real_ref, Scalar) >::type real_ref(const Scalar& x) { return internal::real_ref_impl::run(x); } template EIGEN_DEVICE_FUNC inline EIGEN_MATHFUNC_RETVAL(real_ref, Scalar) real_ref(Scalar& x) { return EIGEN_MATHFUNC_IMPL(real_ref, Scalar)::run(x); } template EIGEN_DEVICE_FUNC inline EIGEN_MATHFUNC_RETVAL(imag, Scalar) imag(const Scalar& x) { return EIGEN_MATHFUNC_IMPL(imag, Scalar)::run(x); } template EIGEN_DEVICE_FUNC inline EIGEN_MATHFUNC_RETVAL(arg, Scalar) arg(const Scalar& x) { return EIGEN_MATHFUNC_IMPL(arg, Scalar)::run(x); } template EIGEN_DEVICE_FUNC inline typename internal::add_const_on_value_type< EIGEN_MATHFUNC_RETVAL(imag_ref, Scalar) >::type imag_ref(const Scalar& x) { return internal::imag_ref_impl::run(x); } template EIGEN_DEVICE_FUNC inline EIGEN_MATHFUNC_RETVAL(imag_ref, Scalar) imag_ref(Scalar& x) { return EIGEN_MATHFUNC_IMPL(imag_ref, Scalar)::run(x); } template EIGEN_DEVICE_FUNC inline EIGEN_MATHFUNC_RETVAL(conj, Scalar) conj(const Scalar& x) { return EIGEN_MATHFUNC_IMPL(conj, Scalar)::run(x); } template EIGEN_DEVICE_FUNC inline EIGEN_MATHFUNC_RETVAL(abs2, Scalar) abs2(const Scalar& x) { return EIGEN_MATHFUNC_IMPL(abs2, Scalar)::run(x); } EIGEN_DEVICE_FUNC inline bool abs2(bool x) { return x; } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE T absdiff(const T& x, const T& y) { return x > y ? x - y : y - x; } template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE float absdiff(const float& x, const float& y) { return fabsf(x - y); } template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE double absdiff(const double& x, const double& y) { return fabs(x - y); } #if !defined(EIGEN_GPUCC) // HIP and CUDA do not support long double. template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE long double absdiff(const long double& x, const long double& y) { return fabsl(x - y); } #endif template EIGEN_DEVICE_FUNC inline EIGEN_MATHFUNC_RETVAL(norm1, Scalar) norm1(const Scalar& x) { return EIGEN_MATHFUNC_IMPL(norm1, Scalar)::run(x); } template EIGEN_DEVICE_FUNC inline EIGEN_MATHFUNC_RETVAL(hypot, Scalar) hypot(const Scalar& x, const Scalar& y) { return EIGEN_MATHFUNC_IMPL(hypot, Scalar)::run(x, y); } #if defined(SYCL_DEVICE_ONLY) SYCL_SPECIALIZE_FLOATING_TYPES_BINARY(hypot, hypot) #endif template EIGEN_DEVICE_FUNC inline EIGEN_MATHFUNC_RETVAL(log1p, Scalar) log1p(const Scalar& x) { return EIGEN_MATHFUNC_IMPL(log1p, Scalar)::run(x); } #if defined(SYCL_DEVICE_ONLY) SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(log1p, log1p) #endif #if defined(EIGEN_GPUCC) template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE float log1p(const float &x) { return ::log1pf(x); } template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE double log1p(const double &x) { return ::log1p(x); } #endif template EIGEN_DEVICE_FUNC inline typename internal::pow_impl::result_type pow(const ScalarX& x, const ScalarY& y) { return internal::pow_impl::run(x, y); } #if defined(SYCL_DEVICE_ONLY) SYCL_SPECIALIZE_FLOATING_TYPES_BINARY(pow, pow) #endif template EIGEN_DEVICE_FUNC bool (isnan) (const T &x) { return internal::isnan_impl(x); } template EIGEN_DEVICE_FUNC bool (isinf) (const T &x) { return internal::isinf_impl(x); } template EIGEN_DEVICE_FUNC bool (isfinite)(const T &x) { return internal::isfinite_impl(x); } #if defined(SYCL_DEVICE_ONLY) SYCL_SPECIALIZE_FLOATING_TYPES_UNARY_FUNC_RET_TYPE(isnan, isnan, bool) SYCL_SPECIALIZE_FLOATING_TYPES_UNARY_FUNC_RET_TYPE(isinf, isinf, bool) SYCL_SPECIALIZE_FLOATING_TYPES_UNARY_FUNC_RET_TYPE(isfinite, isfinite, bool) #endif template EIGEN_DEVICE_FUNC inline EIGEN_MATHFUNC_RETVAL(rint, Scalar) rint(const Scalar& x) { return EIGEN_MATHFUNC_IMPL(rint, Scalar)::run(x); } template EIGEN_DEVICE_FUNC inline EIGEN_MATHFUNC_RETVAL(round, Scalar) round(const Scalar& x) { return EIGEN_MATHFUNC_IMPL(round, Scalar)::run(x); } #if defined(SYCL_DEVICE_ONLY) SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(round, round) #endif template EIGEN_DEVICE_FUNC T (floor)(const T& x) { EIGEN_USING_STD(floor) return floor(x); } #if defined(SYCL_DEVICE_ONLY) SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(floor, floor) #endif #if defined(EIGEN_GPUCC) template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE float floor(const float &x) { return ::floorf(x); } template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE double floor(const double &x) { return ::floor(x); } #endif template EIGEN_DEVICE_FUNC T (ceil)(const T& x) { EIGEN_USING_STD(ceil); return ceil(x); } #if defined(SYCL_DEVICE_ONLY) SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(ceil, ceil) #endif #if defined(EIGEN_GPUCC) template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE float ceil(const float &x) { return ::ceilf(x); } template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE double ceil(const double &x) { return ::ceil(x); } #endif /** Log base 2 for 32 bits positive integers. * Conveniently returns 0 for x==0. */ inline int log2(int x) { eigen_assert(x>=0); unsigned int v(x); static const int table[32] = { 0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30, 8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31 }; v |= v >> 1; v |= v >> 2; v |= v >> 4; v |= v >> 8; v |= v >> 16; return table[(v * 0x07C4ACDDU) >> 27]; } /** \returns the square root of \a x. * * It is essentially equivalent to * \code using std::sqrt; return sqrt(x); \endcode * but slightly faster for float/double and some compilers (e.g., gcc), thanks to * specializations when SSE is enabled. * * It's usage is justified in performance critical functions, like norm/normalize. */ template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE EIGEN_MATHFUNC_RETVAL(sqrt, Scalar) sqrt(const Scalar& x) { return EIGEN_MATHFUNC_IMPL(sqrt, Scalar)::run(x); } // Boolean specialization, avoids implicit float to bool conversion (-Wimplicit-conversion-floating-point-to-bool). template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_DEVICE_FUNC bool sqrt(const bool &x) { return x; } #if defined(SYCL_DEVICE_ONLY) SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(sqrt, sqrt) #endif /** \returns the reciprocal square root of \a x. **/ template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE T rsqrt(const T& x) { return internal::rsqrt_impl::run(x); } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE T log(const T &x) { return internal::log_impl::run(x); } #if defined(SYCL_DEVICE_ONLY) SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(log, log) #endif #if defined(EIGEN_GPUCC) template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE float log(const float &x) { return ::logf(x); } template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE double log(const double &x) { return ::log(x); } #endif template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE typename internal::enable_if::IsSigned || NumTraits::IsComplex,typename NumTraits::Real>::type abs(const T &x) { EIGEN_USING_STD(abs); return abs(x); } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE typename internal::enable_if::IsSigned || NumTraits::IsComplex),typename NumTraits::Real>::type abs(const T &x) { return x; } #if defined(SYCL_DEVICE_ONLY) SYCL_SPECIALIZE_INTEGER_TYPES_UNARY(abs, abs) SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(abs, fabs) #endif #if defined(EIGEN_GPUCC) template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE float abs(const float &x) { return ::fabsf(x); } template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE double abs(const double &x) { return ::fabs(x); } template <> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE float abs(const std::complex& x) { return ::hypotf(x.real(), x.imag()); } template <> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE double abs(const std::complex& x) { return ::hypot(x.real(), x.imag()); } #endif template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE T exp(const T &x) { EIGEN_USING_STD(exp); return exp(x); } #if defined(SYCL_DEVICE_ONLY) SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(exp, exp) #endif #if defined(EIGEN_GPUCC) template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE float exp(const float &x) { return ::expf(x); } template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE double exp(const double &x) { return ::exp(x); } template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE std::complex exp(const std::complex& x) { float com = ::expf(x.real()); float res_real = com * ::cosf(x.imag()); float res_imag = com * ::sinf(x.imag()); return std::complex(res_real, res_imag); } template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE std::complex exp(const std::complex& x) { double com = ::exp(x.real()); double res_real = com * ::cos(x.imag()); double res_imag = com * ::sin(x.imag()); return std::complex(res_real, res_imag); } #endif template EIGEN_DEVICE_FUNC inline EIGEN_MATHFUNC_RETVAL(expm1, Scalar) expm1(const Scalar& x) { return EIGEN_MATHFUNC_IMPL(expm1, Scalar)::run(x); } #if defined(SYCL_DEVICE_ONLY) SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(expm1, expm1) #endif #if defined(EIGEN_GPUCC) template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE float expm1(const float &x) { return ::expm1f(x); } template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE double expm1(const double &x) { return ::expm1(x); } #endif template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE T cos(const T &x) { EIGEN_USING_STD(cos); return cos(x); } #if defined(SYCL_DEVICE_ONLY) SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(cos,cos) #endif #if defined(EIGEN_GPUCC) template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE float cos(const float &x) { return ::cosf(x); } template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE double cos(const double &x) { return ::cos(x); } #endif template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE T sin(const T &x) { EIGEN_USING_STD(sin); return sin(x); } #if defined(SYCL_DEVICE_ONLY) SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(sin, sin) #endif #if defined(EIGEN_GPUCC) template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE float sin(const float &x) { return ::sinf(x); } template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE double sin(const double &x) { return ::sin(x); } #endif template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE T tan(const T &x) { EIGEN_USING_STD(tan); return tan(x); } #if defined(SYCL_DEVICE_ONLY) SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(tan, tan) #endif #if defined(EIGEN_GPUCC) template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE float tan(const float &x) { return ::tanf(x); } template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE double tan(const double &x) { return ::tan(x); } #endif template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE T acos(const T &x) { EIGEN_USING_STD(acos); return acos(x); } #if EIGEN_HAS_CXX11_MATH template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE T acosh(const T &x) { EIGEN_USING_STD(acosh); return static_cast(acosh(x)); } #endif #if defined(SYCL_DEVICE_ONLY) SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(acos, acos) SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(acosh, acosh) #endif #if defined(EIGEN_GPUCC) template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE float acos(const float &x) { return ::acosf(x); } template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE double acos(const double &x) { return ::acos(x); } #endif template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE T asin(const T &x) { EIGEN_USING_STD(asin); return asin(x); } #if EIGEN_HAS_CXX11_MATH template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE T asinh(const T &x) { EIGEN_USING_STD(asinh); return static_cast(asinh(x)); } #endif #if defined(SYCL_DEVICE_ONLY) SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(asin, asin) SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(asinh, asinh) #endif #if defined(EIGEN_GPUCC) template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE float asin(const float &x) { return ::asinf(x); } template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE double asin(const double &x) { return ::asin(x); } #endif template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE T atan(const T &x) { EIGEN_USING_STD(atan); return static_cast(atan(x)); } #if EIGEN_HAS_CXX11_MATH template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE T atanh(const T &x) { EIGEN_USING_STD(atanh); return static_cast(atanh(x)); } #endif #if defined(SYCL_DEVICE_ONLY) SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(atan, atan) SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(atanh, atanh) #endif #if defined(EIGEN_GPUCC) template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE float atan(const float &x) { return ::atanf(x); } template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE double atan(const double &x) { return ::atan(x); } #endif template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE T cosh(const T &x) { EIGEN_USING_STD(cosh); return static_cast(cosh(x)); } #if defined(SYCL_DEVICE_ONLY) SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(cosh, cosh) #endif #if defined(EIGEN_GPUCC) template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE float cosh(const float &x) { return ::coshf(x); } template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE double cosh(const double &x) { return ::cosh(x); } #endif template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE T sinh(const T &x) { EIGEN_USING_STD(sinh); return static_cast(sinh(x)); } #if defined(SYCL_DEVICE_ONLY) SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(sinh, sinh) #endif #if defined(EIGEN_GPUCC) template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE float sinh(const float &x) { return ::sinhf(x); } template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE double sinh(const double &x) { return ::sinh(x); } #endif template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE T tanh(const T &x) { EIGEN_USING_STD(tanh); return tanh(x); } #if (!defined(EIGEN_GPUCC)) && EIGEN_FAST_MATH && !defined(SYCL_DEVICE_ONLY) EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE float tanh(float x) { return internal::generic_fast_tanh_float(x); } #endif #if defined(SYCL_DEVICE_ONLY) SYCL_SPECIALIZE_FLOATING_TYPES_UNARY(tanh, tanh) #endif #if defined(EIGEN_GPUCC) template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE float tanh(const float &x) { return ::tanhf(x); } template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE double tanh(const double &x) { return ::tanh(x); } #endif template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE T fmod(const T& a, const T& b) { EIGEN_USING_STD(fmod); return fmod(a, b); } #if defined(SYCL_DEVICE_ONLY) SYCL_SPECIALIZE_FLOATING_TYPES_BINARY(fmod, fmod) #endif #if defined(EIGEN_GPUCC) template <> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE float fmod(const float& a, const float& b) { return ::fmodf(a, b); } template <> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE double fmod(const double& a, const double& b) { return ::fmod(a, b); } #endif #if defined(SYCL_DEVICE_ONLY) #undef SYCL_SPECIALIZE_SIGNED_INTEGER_TYPES_BINARY #undef SYCL_SPECIALIZE_SIGNED_INTEGER_TYPES_UNARY #undef SYCL_SPECIALIZE_UNSIGNED_INTEGER_TYPES_BINARY #undef SYCL_SPECIALIZE_UNSIGNED_INTEGER_TYPES_UNARY #undef SYCL_SPECIALIZE_INTEGER_TYPES_BINARY #undef SYCL_SPECIALIZE_UNSIGNED_INTEGER_TYPES_UNARY #undef SYCL_SPECIALIZE_FLOATING_TYPES_BINARY #undef SYCL_SPECIALIZE_FLOATING_TYPES_UNARY #undef SYCL_SPECIALIZE_FLOATING_TYPES_UNARY_FUNC_RET_TYPE #undef SYCL_SPECIALIZE_GEN_UNARY_FUNC #undef SYCL_SPECIALIZE_UNARY_FUNC #undef SYCL_SPECIALIZE_GEN1_BINARY_FUNC #undef SYCL_SPECIALIZE_GEN2_BINARY_FUNC #undef SYCL_SPECIALIZE_BINARY_FUNC #endif } // end namespace numext namespace internal { template EIGEN_DEVICE_FUNC bool isfinite_impl(const std::complex& x) { return (numext::isfinite)(numext::real(x)) && (numext::isfinite)(numext::imag(x)); } template EIGEN_DEVICE_FUNC bool isnan_impl(const std::complex& x) { return (numext::isnan)(numext::real(x)) || (numext::isnan)(numext::imag(x)); } template EIGEN_DEVICE_FUNC bool isinf_impl(const std::complex& x) { return ((numext::isinf)(numext::real(x)) || (numext::isinf)(numext::imag(x))) && (!(numext::isnan)(x)); } /**************************************************************************** * Implementation of fuzzy comparisons * ****************************************************************************/ template struct scalar_fuzzy_default_impl {}; template struct scalar_fuzzy_default_impl { typedef typename NumTraits::Real RealScalar; template EIGEN_DEVICE_FUNC static inline bool isMuchSmallerThan(const Scalar& x, const OtherScalar& y, const RealScalar& prec) { return numext::abs(x) <= numext::abs(y) * prec; } EIGEN_DEVICE_FUNC static inline bool isApprox(const Scalar& x, const Scalar& y, const RealScalar& prec) { return numext::abs(x - y) <= numext::mini(numext::abs(x), numext::abs(y)) * prec; } EIGEN_DEVICE_FUNC static inline bool isApproxOrLessThan(const Scalar& x, const Scalar& y, const RealScalar& prec) { return x <= y || isApprox(x, y, prec); } }; template struct scalar_fuzzy_default_impl { typedef typename NumTraits::Real RealScalar; template EIGEN_DEVICE_FUNC static inline bool isMuchSmallerThan(const Scalar& x, const Scalar&, const RealScalar&) { return x == Scalar(0); } EIGEN_DEVICE_FUNC static inline bool isApprox(const Scalar& x, const Scalar& y, const RealScalar&) { return x == y; } EIGEN_DEVICE_FUNC static inline bool isApproxOrLessThan(const Scalar& x, const Scalar& y, const RealScalar&) { return x <= y; } }; template struct scalar_fuzzy_default_impl { typedef typename NumTraits::Real RealScalar; template EIGEN_DEVICE_FUNC static inline bool isMuchSmallerThan(const Scalar& x, const OtherScalar& y, const RealScalar& prec) { return numext::abs2(x) <= numext::abs2(y) * prec * prec; } EIGEN_DEVICE_FUNC static inline bool isApprox(const Scalar& x, const Scalar& y, const RealScalar& prec) { return numext::abs2(x - y) <= numext::mini(numext::abs2(x), numext::abs2(y)) * prec * prec; } }; template struct scalar_fuzzy_impl : scalar_fuzzy_default_impl::IsComplex, NumTraits::IsInteger> {}; template EIGEN_DEVICE_FUNC inline bool isMuchSmallerThan(const Scalar& x, const OtherScalar& y, const typename NumTraits::Real &precision = NumTraits::dummy_precision()) { return scalar_fuzzy_impl::template isMuchSmallerThan(x, y, precision); } template EIGEN_DEVICE_FUNC inline bool isApprox(const Scalar& x, const Scalar& y, const typename NumTraits::Real &precision = NumTraits::dummy_precision()) { return scalar_fuzzy_impl::isApprox(x, y, precision); } template EIGEN_DEVICE_FUNC inline bool isApproxOrLessThan(const Scalar& x, const Scalar& y, const typename NumTraits::Real &precision = NumTraits::dummy_precision()) { return scalar_fuzzy_impl::isApproxOrLessThan(x, y, precision); } /****************************************** *** The special case of the bool type *** ******************************************/ template<> struct random_impl { static inline bool run() { return random(0,1)==0 ? false : true; } static inline bool run(const bool& a, const bool& b) { return random(a, b)==0 ? false : true; } }; template<> struct scalar_fuzzy_impl { typedef bool RealScalar; template EIGEN_DEVICE_FUNC static inline bool isMuchSmallerThan(const bool& x, const bool&, const bool&) { return !x; } EIGEN_DEVICE_FUNC static inline bool isApprox(bool x, bool y, bool) { return x == y; } EIGEN_DEVICE_FUNC static inline bool isApproxOrLessThan(const bool& x, const bool& y, const bool&) { return (!x) || y; } }; } // end namespace internal // Default implementations that rely on other numext implementations namespace internal { // Specialization for complex types that are not supported by std::expm1. template struct expm1_impl > { EIGEN_DEVICE_FUNC static inline std::complex run( const std::complex& x) { EIGEN_STATIC_ASSERT_NON_INTEGER(RealScalar) RealScalar xr = x.real(); RealScalar xi = x.imag(); // expm1(z) = exp(z) - 1 // = exp(x + i * y) - 1 // = exp(x) * (cos(y) + i * sin(y)) - 1 // = exp(x) * cos(y) - 1 + i * exp(x) * sin(y) // Imag(expm1(z)) = exp(x) * sin(y) // Real(expm1(z)) = exp(x) * cos(y) - 1 // = exp(x) * cos(y) - 1. // = expm1(x) + exp(x) * (cos(y) - 1) // = expm1(x) + exp(x) * (2 * sin(y / 2) ** 2) RealScalar erm1 = numext::expm1(xr); RealScalar er = erm1 + RealScalar(1.); RealScalar sin2 = numext::sin(xi / RealScalar(2.)); sin2 = sin2 * sin2; RealScalar s = numext::sin(xi); RealScalar real_part = erm1 - RealScalar(2.) * er * sin2; return std::complex(real_part, er * s); } }; template struct rsqrt_impl { EIGEN_DEVICE_FUNC static EIGEN_ALWAYS_INLINE T run(const T& x) { return T(1)/numext::sqrt(x); } }; #if defined(EIGEN_GPU_COMPILE_PHASE) template struct conj_impl, true> { EIGEN_DEVICE_FUNC static inline std::complex run(const std::complex& x) { return std::complex(numext::real(x), -numext::imag(x)); } }; #endif } // end namespace internal } // end namespace Eigen #endif // EIGEN_MATHFUNCTIONS_H RcppEigen/inst/include/Eigen/src/Core/Replicate.h0000644000176200001440000001303014562417221021316 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009-2010 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_REPLICATE_H #define EIGEN_REPLICATE_H namespace Eigen { namespace internal { template struct traits > : traits { typedef typename MatrixType::Scalar Scalar; typedef typename traits::StorageKind StorageKind; typedef typename traits::XprKind XprKind; typedef typename ref_selector::type MatrixTypeNested; typedef typename remove_reference::type _MatrixTypeNested; enum { RowsAtCompileTime = RowFactor==Dynamic || int(MatrixType::RowsAtCompileTime)==Dynamic ? Dynamic : RowFactor * MatrixType::RowsAtCompileTime, ColsAtCompileTime = ColFactor==Dynamic || int(MatrixType::ColsAtCompileTime)==Dynamic ? Dynamic : ColFactor * MatrixType::ColsAtCompileTime, //FIXME we don't propagate the max sizes !!! MaxRowsAtCompileTime = RowsAtCompileTime, MaxColsAtCompileTime = ColsAtCompileTime, IsRowMajor = MaxRowsAtCompileTime==1 && MaxColsAtCompileTime!=1 ? 1 : MaxColsAtCompileTime==1 && MaxRowsAtCompileTime!=1 ? 0 : (MatrixType::Flags & RowMajorBit) ? 1 : 0, // FIXME enable DirectAccess with negative strides? Flags = IsRowMajor ? RowMajorBit : 0 }; }; } /** * \class Replicate * \ingroup Core_Module * * \brief Expression of the multiple replication of a matrix or vector * * \tparam MatrixType the type of the object we are replicating * \tparam RowFactor number of repetitions at compile time along the vertical direction, can be Dynamic. * \tparam ColFactor number of repetitions at compile time along the horizontal direction, can be Dynamic. * * This class represents an expression of the multiple replication of a matrix or vector. * It is the return type of DenseBase::replicate() and most of the time * this is the only way it is used. * * \sa DenseBase::replicate() */ template class Replicate : public internal::dense_xpr_base< Replicate >::type { typedef typename internal::traits::MatrixTypeNested MatrixTypeNested; typedef typename internal::traits::_MatrixTypeNested _MatrixTypeNested; public: typedef typename internal::dense_xpr_base::type Base; EIGEN_DENSE_PUBLIC_INTERFACE(Replicate) typedef typename internal::remove_all::type NestedExpression; template EIGEN_DEVICE_FUNC inline explicit Replicate(const OriginalMatrixType& matrix) : m_matrix(matrix), m_rowFactor(RowFactor), m_colFactor(ColFactor) { EIGEN_STATIC_ASSERT((internal::is_same::type,OriginalMatrixType>::value), THE_MATRIX_OR_EXPRESSION_THAT_YOU_PASSED_DOES_NOT_HAVE_THE_EXPECTED_TYPE) eigen_assert(RowFactor!=Dynamic && ColFactor!=Dynamic); } template EIGEN_DEVICE_FUNC inline Replicate(const OriginalMatrixType& matrix, Index rowFactor, Index colFactor) : m_matrix(matrix), m_rowFactor(rowFactor), m_colFactor(colFactor) { EIGEN_STATIC_ASSERT((internal::is_same::type,OriginalMatrixType>::value), THE_MATRIX_OR_EXPRESSION_THAT_YOU_PASSED_DOES_NOT_HAVE_THE_EXPECTED_TYPE) } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index rows() const { return m_matrix.rows() * m_rowFactor.value(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index cols() const { return m_matrix.cols() * m_colFactor.value(); } EIGEN_DEVICE_FUNC const _MatrixTypeNested& nestedExpression() const { return m_matrix; } protected: MatrixTypeNested m_matrix; const internal::variable_if_dynamic m_rowFactor; const internal::variable_if_dynamic m_colFactor; }; /** * \return an expression of the replication of \c *this * * Example: \include MatrixBase_replicate.cpp * Output: \verbinclude MatrixBase_replicate.out * * \sa VectorwiseOp::replicate(), DenseBase::replicate(Index,Index), class Replicate */ template template EIGEN_DEVICE_FUNC const Replicate DenseBase::replicate() const { return Replicate(derived()); } /** * \return an expression of the replication of each column (or row) of \c *this * * Example: \include DirectionWise_replicate_int.cpp * Output: \verbinclude DirectionWise_replicate_int.out * * \sa VectorwiseOp::replicate(), DenseBase::replicate(), class Replicate */ template EIGEN_DEVICE_FUNC const typename VectorwiseOp::ReplicateReturnType VectorwiseOp::replicate(Index factor) const { return typename VectorwiseOp::ReplicateReturnType (_expression(),Direction==Vertical?factor:1,Direction==Horizontal?factor:1); } } // end namespace Eigen #endif // EIGEN_REPLICATE_H RcppEigen/inst/include/Eigen/src/Core/DenseBase.h0000644000176200001440000007545114562417221021256 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2007-2010 Benoit Jacob // Copyright (C) 2008-2010 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_DENSEBASE_H #define EIGEN_DENSEBASE_H namespace Eigen { namespace internal { // The index type defined by EIGEN_DEFAULT_DENSE_INDEX_TYPE must be a signed type. // This dummy function simply aims at checking that at compile time. static inline void check_DenseIndex_is_signed() { EIGEN_STATIC_ASSERT(NumTraits::IsSigned,THE_INDEX_TYPE_MUST_BE_A_SIGNED_TYPE) } } // end namespace internal /** \class DenseBase * \ingroup Core_Module * * \brief Base class for all dense matrices, vectors, and arrays * * This class is the base that is inherited by all dense objects (matrix, vector, arrays, * and related expression types). The common Eigen API for dense objects is contained in this class. * * \tparam Derived is the derived type, e.g., a matrix type or an expression. * * This class can be extended with the help of the plugin mechanism described on the page * \ref TopicCustomizing_Plugins by defining the preprocessor symbol \c EIGEN_DENSEBASE_PLUGIN. * * \sa \blank \ref TopicClassHierarchy */ template class DenseBase #ifndef EIGEN_PARSED_BY_DOXYGEN : public DenseCoeffsBase::value> #else : public DenseCoeffsBase #endif // not EIGEN_PARSED_BY_DOXYGEN { public: /** Inner iterator type to iterate over the coefficients of a row or column. * \sa class InnerIterator */ typedef Eigen::InnerIterator InnerIterator; typedef typename internal::traits::StorageKind StorageKind; /** * \brief The type used to store indices * \details This typedef is relevant for types that store multiple indices such as * PermutationMatrix or Transpositions, otherwise it defaults to Eigen::Index * \sa \blank \ref TopicPreprocessorDirectives, Eigen::Index, SparseMatrixBase. */ typedef typename internal::traits::StorageIndex StorageIndex; /** The numeric type of the expression' coefficients, e.g. float, double, int or std::complex, etc. */ typedef typename internal::traits::Scalar Scalar; /** The numeric type of the expression' coefficients, e.g. float, double, int or std::complex, etc. * * It is an alias for the Scalar type */ typedef Scalar value_type; typedef typename NumTraits::Real RealScalar; typedef DenseCoeffsBase::value> Base; using Base::derived; using Base::const_cast_derived; using Base::rows; using Base::cols; using Base::size; using Base::rowIndexByOuterInner; using Base::colIndexByOuterInner; using Base::coeff; using Base::coeffByOuterInner; using Base::operator(); using Base::operator[]; using Base::x; using Base::y; using Base::z; using Base::w; using Base::stride; using Base::innerStride; using Base::outerStride; using Base::rowStride; using Base::colStride; typedef typename Base::CoeffReturnType CoeffReturnType; enum { RowsAtCompileTime = internal::traits::RowsAtCompileTime, /**< The number of rows at compile-time. This is just a copy of the value provided * by the \a Derived type. If a value is not known at compile-time, * it is set to the \a Dynamic constant. * \sa MatrixBase::rows(), MatrixBase::cols(), ColsAtCompileTime, SizeAtCompileTime */ ColsAtCompileTime = internal::traits::ColsAtCompileTime, /**< The number of columns at compile-time. This is just a copy of the value provided * by the \a Derived type. If a value is not known at compile-time, * it is set to the \a Dynamic constant. * \sa MatrixBase::rows(), MatrixBase::cols(), RowsAtCompileTime, SizeAtCompileTime */ SizeAtCompileTime = (internal::size_at_compile_time::RowsAtCompileTime, internal::traits::ColsAtCompileTime>::ret), /**< This is equal to the number of coefficients, i.e. the number of * rows times the number of columns, or to \a Dynamic if this is not * known at compile-time. \sa RowsAtCompileTime, ColsAtCompileTime */ MaxRowsAtCompileTime = internal::traits::MaxRowsAtCompileTime, /**< This value is equal to the maximum possible number of rows that this expression * might have. If this expression might have an arbitrarily high number of rows, * this value is set to \a Dynamic. * * This value is useful to know when evaluating an expression, in order to determine * whether it is possible to avoid doing a dynamic memory allocation. * * \sa RowsAtCompileTime, MaxColsAtCompileTime, MaxSizeAtCompileTime */ MaxColsAtCompileTime = internal::traits::MaxColsAtCompileTime, /**< This value is equal to the maximum possible number of columns that this expression * might have. If this expression might have an arbitrarily high number of columns, * this value is set to \a Dynamic. * * This value is useful to know when evaluating an expression, in order to determine * whether it is possible to avoid doing a dynamic memory allocation. * * \sa ColsAtCompileTime, MaxRowsAtCompileTime, MaxSizeAtCompileTime */ MaxSizeAtCompileTime = (internal::size_at_compile_time::MaxRowsAtCompileTime, internal::traits::MaxColsAtCompileTime>::ret), /**< This value is equal to the maximum possible number of coefficients that this expression * might have. If this expression might have an arbitrarily high number of coefficients, * this value is set to \a Dynamic. * * This value is useful to know when evaluating an expression, in order to determine * whether it is possible to avoid doing a dynamic memory allocation. * * \sa SizeAtCompileTime, MaxRowsAtCompileTime, MaxColsAtCompileTime */ IsVectorAtCompileTime = internal::traits::RowsAtCompileTime == 1 || internal::traits::ColsAtCompileTime == 1, /**< This is set to true if either the number of rows or the number of * columns is known at compile-time to be equal to 1. Indeed, in that case, * we are dealing with a column-vector (if there is only one column) or with * a row-vector (if there is only one row). */ NumDimensions = int(MaxSizeAtCompileTime) == 1 ? 0 : bool(IsVectorAtCompileTime) ? 1 : 2, /**< This value is equal to Tensor::NumDimensions, i.e. 0 for scalars, 1 for vectors, * and 2 for matrices. */ Flags = internal::traits::Flags, /**< This stores expression \ref flags flags which may or may not be inherited by new expressions * constructed from this one. See the \ref flags "list of flags". */ IsRowMajor = int(Flags) & RowMajorBit, /**< True if this expression has row-major storage order. */ InnerSizeAtCompileTime = int(IsVectorAtCompileTime) ? int(SizeAtCompileTime) : int(IsRowMajor) ? int(ColsAtCompileTime) : int(RowsAtCompileTime), InnerStrideAtCompileTime = internal::inner_stride_at_compile_time::ret, OuterStrideAtCompileTime = internal::outer_stride_at_compile_time::ret }; typedef typename internal::find_best_packet::type PacketScalar; enum { IsPlainObjectBase = 0 }; /** The plain matrix type corresponding to this expression. * \sa PlainObject */ typedef Matrix::Scalar, internal::traits::RowsAtCompileTime, internal::traits::ColsAtCompileTime, AutoAlign | (internal::traits::Flags&RowMajorBit ? RowMajor : ColMajor), internal::traits::MaxRowsAtCompileTime, internal::traits::MaxColsAtCompileTime > PlainMatrix; /** The plain array type corresponding to this expression. * \sa PlainObject */ typedef Array::Scalar, internal::traits::RowsAtCompileTime, internal::traits::ColsAtCompileTime, AutoAlign | (internal::traits::Flags&RowMajorBit ? RowMajor : ColMajor), internal::traits::MaxRowsAtCompileTime, internal::traits::MaxColsAtCompileTime > PlainArray; /** \brief The plain matrix or array type corresponding to this expression. * * This is not necessarily exactly the return type of eval(). In the case of plain matrices, * the return type of eval() is a const reference to a matrix, not a matrix! It is however guaranteed * that the return type of eval() is either PlainObject or const PlainObject&. */ typedef typename internal::conditional::XprKind,MatrixXpr >::value, PlainMatrix, PlainArray>::type PlainObject; /** \returns the number of nonzero coefficients which is in practice the number * of stored coefficients. */ EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index nonZeros() const { return size(); } /** \returns the outer size. * * \note For a vector, this returns just 1. For a matrix (non-vector), this is the major dimension * with respect to the \ref TopicStorageOrders "storage order", i.e., the number of columns for a * column-major matrix, and the number of rows for a row-major matrix. */ EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index outerSize() const { return IsVectorAtCompileTime ? 1 : int(IsRowMajor) ? this->rows() : this->cols(); } /** \returns the inner size. * * \note For a vector, this is just the size. For a matrix (non-vector), this is the minor dimension * with respect to the \ref TopicStorageOrders "storage order", i.e., the number of rows for a * column-major matrix, and the number of columns for a row-major matrix. */ EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index innerSize() const { return IsVectorAtCompileTime ? this->size() : int(IsRowMajor) ? this->cols() : this->rows(); } /** Only plain matrices/arrays, not expressions, may be resized; therefore the only useful resize methods are * Matrix::resize() and Array::resize(). The present method only asserts that the new size equals the old size, and does * nothing else. */ EIGEN_DEVICE_FUNC void resize(Index newSize) { EIGEN_ONLY_USED_FOR_DEBUG(newSize); eigen_assert(newSize == this->size() && "DenseBase::resize() does not actually allow to resize."); } /** Only plain matrices/arrays, not expressions, may be resized; therefore the only useful resize methods are * Matrix::resize() and Array::resize(). The present method only asserts that the new size equals the old size, and does * nothing else. */ EIGEN_DEVICE_FUNC void resize(Index rows, Index cols) { EIGEN_ONLY_USED_FOR_DEBUG(rows); EIGEN_ONLY_USED_FOR_DEBUG(cols); eigen_assert(rows == this->rows() && cols == this->cols() && "DenseBase::resize() does not actually allow to resize."); } #ifndef EIGEN_PARSED_BY_DOXYGEN /** \internal Represents a matrix with all coefficients equal to one another*/ typedef CwiseNullaryOp,PlainObject> ConstantReturnType; /** \internal \deprecated Represents a vector with linearly spaced coefficients that allows sequential access only. */ EIGEN_DEPRECATED typedef CwiseNullaryOp,PlainObject> SequentialLinSpacedReturnType; /** \internal Represents a vector with linearly spaced coefficients that allows random access. */ typedef CwiseNullaryOp,PlainObject> RandomAccessLinSpacedReturnType; /** \internal the return type of MatrixBase::eigenvalues() */ typedef Matrix::Scalar>::Real, internal::traits::ColsAtCompileTime, 1> EigenvaluesReturnType; #endif // not EIGEN_PARSED_BY_DOXYGEN /** Copies \a other into *this. \returns a reference to *this. */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& operator=(const DenseBase& other); /** Special case of the template operator=, in order to prevent the compiler * from generating a default operator= (issue hit with g++ 4.1) */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& operator=(const DenseBase& other); template EIGEN_DEVICE_FUNC Derived& operator=(const EigenBase &other); template EIGEN_DEVICE_FUNC Derived& operator+=(const EigenBase &other); template EIGEN_DEVICE_FUNC Derived& operator-=(const EigenBase &other); template EIGEN_DEVICE_FUNC Derived& operator=(const ReturnByValue& func); /** \internal * Copies \a other into *this without evaluating other. \returns a reference to *this. */ template /** \deprecated */ EIGEN_DEPRECATED EIGEN_DEVICE_FUNC Derived& lazyAssign(const DenseBase& other); EIGEN_DEVICE_FUNC CommaInitializer operator<< (const Scalar& s); template /** \deprecated it now returns \c *this */ EIGEN_DEPRECATED const Derived& flagged() const { return derived(); } template EIGEN_DEVICE_FUNC CommaInitializer operator<< (const DenseBase& other); typedef Transpose TransposeReturnType; EIGEN_DEVICE_FUNC TransposeReturnType transpose(); typedef typename internal::add_const >::type ConstTransposeReturnType; EIGEN_DEVICE_FUNC ConstTransposeReturnType transpose() const; EIGEN_DEVICE_FUNC void transposeInPlace(); EIGEN_DEVICE_FUNC static const ConstantReturnType Constant(Index rows, Index cols, const Scalar& value); EIGEN_DEVICE_FUNC static const ConstantReturnType Constant(Index size, const Scalar& value); EIGEN_DEVICE_FUNC static const ConstantReturnType Constant(const Scalar& value); EIGEN_DEPRECATED EIGEN_DEVICE_FUNC static const RandomAccessLinSpacedReturnType LinSpaced(Sequential_t, Index size, const Scalar& low, const Scalar& high); EIGEN_DEPRECATED EIGEN_DEVICE_FUNC static const RandomAccessLinSpacedReturnType LinSpaced(Sequential_t, const Scalar& low, const Scalar& high); EIGEN_DEVICE_FUNC static const RandomAccessLinSpacedReturnType LinSpaced(Index size, const Scalar& low, const Scalar& high); EIGEN_DEVICE_FUNC static const RandomAccessLinSpacedReturnType LinSpaced(const Scalar& low, const Scalar& high); template EIGEN_DEVICE_FUNC static const CwiseNullaryOp NullaryExpr(Index rows, Index cols, const CustomNullaryOp& func); template EIGEN_DEVICE_FUNC static const CwiseNullaryOp NullaryExpr(Index size, const CustomNullaryOp& func); template EIGEN_DEVICE_FUNC static const CwiseNullaryOp NullaryExpr(const CustomNullaryOp& func); EIGEN_DEVICE_FUNC static const ConstantReturnType Zero(Index rows, Index cols); EIGEN_DEVICE_FUNC static const ConstantReturnType Zero(Index size); EIGEN_DEVICE_FUNC static const ConstantReturnType Zero(); EIGEN_DEVICE_FUNC static const ConstantReturnType Ones(Index rows, Index cols); EIGEN_DEVICE_FUNC static const ConstantReturnType Ones(Index size); EIGEN_DEVICE_FUNC static const ConstantReturnType Ones(); EIGEN_DEVICE_FUNC void fill(const Scalar& value); EIGEN_DEVICE_FUNC Derived& setConstant(const Scalar& value); EIGEN_DEVICE_FUNC Derived& setLinSpaced(Index size, const Scalar& low, const Scalar& high); EIGEN_DEVICE_FUNC Derived& setLinSpaced(const Scalar& low, const Scalar& high); EIGEN_DEVICE_FUNC Derived& setZero(); EIGEN_DEVICE_FUNC Derived& setOnes(); EIGEN_DEVICE_FUNC Derived& setRandom(); template EIGEN_DEVICE_FUNC bool isApprox(const DenseBase& other, const RealScalar& prec = NumTraits::dummy_precision()) const; EIGEN_DEVICE_FUNC bool isMuchSmallerThan(const RealScalar& other, const RealScalar& prec = NumTraits::dummy_precision()) const; template EIGEN_DEVICE_FUNC bool isMuchSmallerThan(const DenseBase& other, const RealScalar& prec = NumTraits::dummy_precision()) const; EIGEN_DEVICE_FUNC bool isApproxToConstant(const Scalar& value, const RealScalar& prec = NumTraits::dummy_precision()) const; EIGEN_DEVICE_FUNC bool isConstant(const Scalar& value, const RealScalar& prec = NumTraits::dummy_precision()) const; EIGEN_DEVICE_FUNC bool isZero(const RealScalar& prec = NumTraits::dummy_precision()) const; EIGEN_DEVICE_FUNC bool isOnes(const RealScalar& prec = NumTraits::dummy_precision()) const; inline bool hasNaN() const; inline bool allFinite() const; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& operator*=(const Scalar& other); EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& operator/=(const Scalar& other); typedef typename internal::add_const_on_value_type::type>::type EvalReturnType; /** \returns the matrix or vector obtained by evaluating this expression. * * Notice that in the case of a plain matrix or vector (not an expression) this function just returns * a const reference, in order to avoid a useless copy. * * \warning Be careful with eval() and the auto C++ keyword, as detailed in this \link TopicPitfalls_auto_keyword page \endlink. */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EvalReturnType eval() const { // Even though MSVC does not honor strong inlining when the return type // is a dynamic matrix, we desperately need strong inlining for fixed // size types on MSVC. return typename internal::eval::type(derived()); } /** swaps *this with the expression \a other. * */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void swap(const DenseBase& other) { EIGEN_STATIC_ASSERT(!OtherDerived::IsPlainObjectBase,THIS_EXPRESSION_IS_NOT_A_LVALUE__IT_IS_READ_ONLY); eigen_assert(rows()==other.rows() && cols()==other.cols()); call_assignment(derived(), other.const_cast_derived(), internal::swap_assign_op()); } /** swaps *this with the matrix or array \a other. * */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void swap(PlainObjectBase& other) { eigen_assert(rows()==other.rows() && cols()==other.cols()); call_assignment(derived(), other.derived(), internal::swap_assign_op()); } EIGEN_DEVICE_FUNC inline const NestByValue nestByValue() const; EIGEN_DEVICE_FUNC inline const ForceAlignedAccess forceAlignedAccess() const; EIGEN_DEVICE_FUNC inline ForceAlignedAccess forceAlignedAccess(); template EIGEN_DEVICE_FUNC inline const typename internal::conditional,Derived&>::type forceAlignedAccessIf() const; template EIGEN_DEVICE_FUNC inline typename internal::conditional,Derived&>::type forceAlignedAccessIf(); EIGEN_DEVICE_FUNC Scalar sum() const; EIGEN_DEVICE_FUNC Scalar mean() const; EIGEN_DEVICE_FUNC Scalar trace() const; EIGEN_DEVICE_FUNC Scalar prod() const; template EIGEN_DEVICE_FUNC typename internal::traits::Scalar minCoeff() const; template EIGEN_DEVICE_FUNC typename internal::traits::Scalar maxCoeff() const; // By default, the fastest version with undefined NaN propagation semantics is // used. // TODO(rmlarsen): Replace with default template argument when we move to // c++11 or beyond. EIGEN_DEVICE_FUNC inline typename internal::traits::Scalar minCoeff() const { return minCoeff(); } EIGEN_DEVICE_FUNC inline typename internal::traits::Scalar maxCoeff() const { return maxCoeff(); } template EIGEN_DEVICE_FUNC typename internal::traits::Scalar minCoeff(IndexType* row, IndexType* col) const; template EIGEN_DEVICE_FUNC typename internal::traits::Scalar maxCoeff(IndexType* row, IndexType* col) const; template EIGEN_DEVICE_FUNC typename internal::traits::Scalar minCoeff(IndexType* index) const; template EIGEN_DEVICE_FUNC typename internal::traits::Scalar maxCoeff(IndexType* index) const; // TODO(rmlarsen): Replace these methods with a default template argument. template EIGEN_DEVICE_FUNC inline typename internal::traits::Scalar minCoeff(IndexType* row, IndexType* col) const { return minCoeff(row, col); } template EIGEN_DEVICE_FUNC inline typename internal::traits::Scalar maxCoeff(IndexType* row, IndexType* col) const { return maxCoeff(row, col); } template EIGEN_DEVICE_FUNC inline typename internal::traits::Scalar minCoeff(IndexType* index) const { return minCoeff(index); } template EIGEN_DEVICE_FUNC inline typename internal::traits::Scalar maxCoeff(IndexType* index) const { return maxCoeff(index); } template EIGEN_DEVICE_FUNC Scalar redux(const BinaryOp& func) const; template EIGEN_DEVICE_FUNC void visit(Visitor& func) const; /** \returns a WithFormat proxy object allowing to print a matrix the with given * format \a fmt. * * See class IOFormat for some examples. * * \sa class IOFormat, class WithFormat */ inline const WithFormat format(const IOFormat& fmt) const { return WithFormat(derived(), fmt); } /** \returns the unique coefficient of a 1x1 expression */ EIGEN_DEVICE_FUNC CoeffReturnType value() const { EIGEN_STATIC_ASSERT_SIZE_1x1(Derived) eigen_assert(this->rows() == 1 && this->cols() == 1); return derived().coeff(0,0); } EIGEN_DEVICE_FUNC bool all() const; EIGEN_DEVICE_FUNC bool any() const; EIGEN_DEVICE_FUNC Index count() const; typedef VectorwiseOp RowwiseReturnType; typedef const VectorwiseOp ConstRowwiseReturnType; typedef VectorwiseOp ColwiseReturnType; typedef const VectorwiseOp ConstColwiseReturnType; /** \returns a VectorwiseOp wrapper of *this for broadcasting and partial reductions * * Example: \include MatrixBase_rowwise.cpp * Output: \verbinclude MatrixBase_rowwise.out * * \sa colwise(), class VectorwiseOp, \ref TutorialReductionsVisitorsBroadcasting */ //Code moved here due to a CUDA compiler bug EIGEN_DEVICE_FUNC inline ConstRowwiseReturnType rowwise() const { return ConstRowwiseReturnType(derived()); } EIGEN_DEVICE_FUNC RowwiseReturnType rowwise(); /** \returns a VectorwiseOp wrapper of *this broadcasting and partial reductions * * Example: \include MatrixBase_colwise.cpp * Output: \verbinclude MatrixBase_colwise.out * * \sa rowwise(), class VectorwiseOp, \ref TutorialReductionsVisitorsBroadcasting */ EIGEN_DEVICE_FUNC inline ConstColwiseReturnType colwise() const { return ConstColwiseReturnType(derived()); } EIGEN_DEVICE_FUNC ColwiseReturnType colwise(); typedef CwiseNullaryOp,PlainObject> RandomReturnType; static const RandomReturnType Random(Index rows, Index cols); static const RandomReturnType Random(Index size); static const RandomReturnType Random(); template inline EIGEN_DEVICE_FUNC const Select select(const DenseBase& thenMatrix, const DenseBase& elseMatrix) const; template inline EIGEN_DEVICE_FUNC const Select select(const DenseBase& thenMatrix, const typename ThenDerived::Scalar& elseScalar) const; template inline EIGEN_DEVICE_FUNC const Select select(const typename ElseDerived::Scalar& thenScalar, const DenseBase& elseMatrix) const; template RealScalar lpNorm() const; template EIGEN_DEVICE_FUNC const Replicate replicate() const; /** * \return an expression of the replication of \c *this * * Example: \include MatrixBase_replicate_int_int.cpp * Output: \verbinclude MatrixBase_replicate_int_int.out * * \sa VectorwiseOp::replicate(), DenseBase::replicate(), class Replicate */ //Code moved here due to a CUDA compiler bug EIGEN_DEVICE_FUNC const Replicate replicate(Index rowFactor, Index colFactor) const { return Replicate(derived(), rowFactor, colFactor); } typedef Reverse ReverseReturnType; typedef const Reverse ConstReverseReturnType; EIGEN_DEVICE_FUNC ReverseReturnType reverse(); /** This is the const version of reverse(). */ //Code moved here due to a CUDA compiler bug EIGEN_DEVICE_FUNC ConstReverseReturnType reverse() const { return ConstReverseReturnType(derived()); } EIGEN_DEVICE_FUNC void reverseInPlace(); #ifdef EIGEN_PARSED_BY_DOXYGEN /** STL-like RandomAccessIterator * iterator type as returned by the begin() and end() methods. */ typedef random_access_iterator_type iterator; /** This is the const version of iterator (aka read-only) */ typedef random_access_iterator_type const_iterator; #else typedef typename internal::conditional< (Flags&DirectAccessBit)==DirectAccessBit, internal::pointer_based_stl_iterator, internal::generic_randaccess_stl_iterator >::type iterator_type; typedef typename internal::conditional< (Flags&DirectAccessBit)==DirectAccessBit, internal::pointer_based_stl_iterator, internal::generic_randaccess_stl_iterator >::type const_iterator_type; // Stl-style iterators are supported only for vectors. typedef typename internal::conditional< IsVectorAtCompileTime, iterator_type, void >::type iterator; typedef typename internal::conditional< IsVectorAtCompileTime, const_iterator_type, void >::type const_iterator; #endif inline iterator begin(); inline const_iterator begin() const; inline const_iterator cbegin() const; inline iterator end(); inline const_iterator end() const; inline const_iterator cend() const; #define EIGEN_CURRENT_STORAGE_BASE_CLASS Eigen::DenseBase #define EIGEN_DOC_BLOCK_ADDONS_NOT_INNER_PANEL #define EIGEN_DOC_BLOCK_ADDONS_INNER_PANEL_IF(COND) #define EIGEN_DOC_UNARY_ADDONS(X,Y) # include "../plugins/CommonCwiseUnaryOps.h" # include "../plugins/BlockMethods.h" # include "../plugins/IndexedViewMethods.h" # include "../plugins/ReshapedMethods.h" # ifdef EIGEN_DENSEBASE_PLUGIN # include EIGEN_DENSEBASE_PLUGIN # endif #undef EIGEN_CURRENT_STORAGE_BASE_CLASS #undef EIGEN_DOC_BLOCK_ADDONS_NOT_INNER_PANEL #undef EIGEN_DOC_BLOCK_ADDONS_INNER_PANEL_IF #undef EIGEN_DOC_UNARY_ADDONS // disable the use of evalTo for dense objects with a nice compilation error template EIGEN_DEVICE_FUNC inline void evalTo(Dest& ) const { EIGEN_STATIC_ASSERT((internal::is_same::value),THE_EVAL_EVALTO_FUNCTION_SHOULD_NEVER_BE_CALLED_FOR_DENSE_OBJECTS); } protected: EIGEN_DEFAULT_COPY_CONSTRUCTOR(DenseBase) /** Default constructor. Do nothing. */ EIGEN_DEVICE_FUNC DenseBase() { /* Just checks for self-consistency of the flags. * Only do it when debugging Eigen, as this borders on paranoia and could slow compilation down */ #ifdef EIGEN_INTERNAL_DEBUGGING EIGEN_STATIC_ASSERT((EIGEN_IMPLIES(MaxRowsAtCompileTime==1 && MaxColsAtCompileTime!=1, int(IsRowMajor)) && EIGEN_IMPLIES(MaxColsAtCompileTime==1 && MaxRowsAtCompileTime!=1, int(!IsRowMajor))), INVALID_STORAGE_ORDER_FOR_THIS_VECTOR_EXPRESSION) #endif } private: EIGEN_DEVICE_FUNC explicit DenseBase(int); EIGEN_DEVICE_FUNC DenseBase(int,int); template EIGEN_DEVICE_FUNC explicit DenseBase(const DenseBase&); }; } // end namespace Eigen #endif // EIGEN_DENSEBASE_H RcppEigen/inst/include/Eigen/src/Core/DiagonalProduct.h0000644000176200001440000000173414562417221022475 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008 Gael Guennebaud // Copyright (C) 2007-2009 Benoit Jacob // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_DIAGONALPRODUCT_H #define EIGEN_DIAGONALPRODUCT_H namespace Eigen { /** \returns the diagonal matrix product of \c *this by the diagonal matrix \a diagonal. */ template template EIGEN_DEVICE_FUNC inline const Product MatrixBase::operator*(const DiagonalBase &a_diagonal) const { return Product(derived(),a_diagonal.derived()); } } // end namespace Eigen #endif // EIGEN_DIAGONALPRODUCT_H RcppEigen/inst/include/Eigen/src/Core/Reverse.h0000644000176200001440000001654214562417221021034 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2006-2008 Benoit Jacob // Copyright (C) 2009 Ricard Marxer // Copyright (C) 2009-2010 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_REVERSE_H #define EIGEN_REVERSE_H namespace Eigen { namespace internal { template struct traits > : traits { typedef typename MatrixType::Scalar Scalar; typedef typename traits::StorageKind StorageKind; typedef typename traits::XprKind XprKind; typedef typename ref_selector::type MatrixTypeNested; typedef typename remove_reference::type _MatrixTypeNested; enum { RowsAtCompileTime = MatrixType::RowsAtCompileTime, ColsAtCompileTime = MatrixType::ColsAtCompileTime, MaxRowsAtCompileTime = MatrixType::MaxRowsAtCompileTime, MaxColsAtCompileTime = MatrixType::MaxColsAtCompileTime, Flags = _MatrixTypeNested::Flags & (RowMajorBit | LvalueBit) }; }; template struct reverse_packet_cond { static inline PacketType run(const PacketType& x) { return preverse(x); } }; template struct reverse_packet_cond { static inline PacketType run(const PacketType& x) { return x; } }; } // end namespace internal /** \class Reverse * \ingroup Core_Module * * \brief Expression of the reverse of a vector or matrix * * \tparam MatrixType the type of the object of which we are taking the reverse * \tparam Direction defines the direction of the reverse operation, can be Vertical, Horizontal, or BothDirections * * This class represents an expression of the reverse of a vector. * It is the return type of MatrixBase::reverse() and VectorwiseOp::reverse() * and most of the time this is the only way it is used. * * \sa MatrixBase::reverse(), VectorwiseOp::reverse() */ template class Reverse : public internal::dense_xpr_base< Reverse >::type { public: typedef typename internal::dense_xpr_base::type Base; EIGEN_DENSE_PUBLIC_INTERFACE(Reverse) typedef typename internal::remove_all::type NestedExpression; using Base::IsRowMajor; protected: enum { PacketSize = internal::packet_traits::size, IsColMajor = !IsRowMajor, ReverseRow = (Direction == Vertical) || (Direction == BothDirections), ReverseCol = (Direction == Horizontal) || (Direction == BothDirections), OffsetRow = ReverseRow && IsColMajor ? PacketSize : 1, OffsetCol = ReverseCol && IsRowMajor ? PacketSize : 1, ReversePacket = (Direction == BothDirections) || ((Direction == Vertical) && IsColMajor) || ((Direction == Horizontal) && IsRowMajor) }; typedef internal::reverse_packet_cond reverse_packet; public: EIGEN_DEVICE_FUNC explicit inline Reverse(const MatrixType& matrix) : m_matrix(matrix) { } EIGEN_INHERIT_ASSIGNMENT_OPERATORS(Reverse) EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index rows() const EIGEN_NOEXCEPT { return m_matrix.rows(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index cols() const EIGEN_NOEXCEPT { return m_matrix.cols(); } EIGEN_DEVICE_FUNC inline Index innerStride() const { return -m_matrix.innerStride(); } EIGEN_DEVICE_FUNC const typename internal::remove_all::type& nestedExpression() const { return m_matrix; } protected: typename MatrixType::Nested m_matrix; }; /** \returns an expression of the reverse of *this. * * Example: \include MatrixBase_reverse.cpp * Output: \verbinclude MatrixBase_reverse.out * */ template EIGEN_DEVICE_FUNC inline typename DenseBase::ReverseReturnType DenseBase::reverse() { return ReverseReturnType(derived()); } //reverse const overload moved DenseBase.h due to a CUDA compiler bug /** This is the "in place" version of reverse: it reverses \c *this. * * In most cases it is probably better to simply use the reversed expression * of a matrix. However, when reversing the matrix data itself is really needed, * then this "in-place" version is probably the right choice because it provides * the following additional benefits: * - less error prone: doing the same operation with .reverse() requires special care: * \code m = m.reverse().eval(); \endcode * - this API enables reverse operations without the need for a temporary * - it allows future optimizations (cache friendliness, etc.) * * \sa VectorwiseOp::reverseInPlace(), reverse() */ template EIGEN_DEVICE_FUNC inline void DenseBase::reverseInPlace() { if(cols()>rows()) { Index half = cols()/2; leftCols(half).swap(rightCols(half).reverse()); if((cols()%2)==1) { Index half2 = rows()/2; col(half).head(half2).swap(col(half).tail(half2).reverse()); } } else { Index half = rows()/2; topRows(half).swap(bottomRows(half).reverse()); if((rows()%2)==1) { Index half2 = cols()/2; row(half).head(half2).swap(row(half).tail(half2).reverse()); } } } namespace internal { template struct vectorwise_reverse_inplace_impl; template<> struct vectorwise_reverse_inplace_impl { template static void run(ExpressionType &xpr) { const int HalfAtCompileTime = ExpressionType::RowsAtCompileTime==Dynamic?Dynamic:ExpressionType::RowsAtCompileTime/2; Index half = xpr.rows()/2; xpr.topRows(fix(half)) .swap(xpr.bottomRows(fix(half)).colwise().reverse()); } }; template<> struct vectorwise_reverse_inplace_impl { template static void run(ExpressionType &xpr) { const int HalfAtCompileTime = ExpressionType::ColsAtCompileTime==Dynamic?Dynamic:ExpressionType::ColsAtCompileTime/2; Index half = xpr.cols()/2; xpr.leftCols(fix(half)) .swap(xpr.rightCols(fix(half)).rowwise().reverse()); } }; } // end namespace internal /** This is the "in place" version of VectorwiseOp::reverse: it reverses each column or row of \c *this. * * In most cases it is probably better to simply use the reversed expression * of a matrix. However, when reversing the matrix data itself is really needed, * then this "in-place" version is probably the right choice because it provides * the following additional benefits: * - less error prone: doing the same operation with .reverse() requires special care: * \code m = m.reverse().eval(); \endcode * - this API enables reverse operations without the need for a temporary * * \sa DenseBase::reverseInPlace(), reverse() */ template EIGEN_DEVICE_FUNC void VectorwiseOp::reverseInPlace() { internal::vectorwise_reverse_inplace_impl::run(m_matrix); } } // end namespace Eigen #endif // EIGEN_REVERSE_H RcppEigen/inst/include/Eigen/src/Core/GenericPacketMath.h0000644000176200001440000011363414562417221022737 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008 Gael Guennebaud // Copyright (C) 2006-2008 Benoit Jacob // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_GENERIC_PACKET_MATH_H #define EIGEN_GENERIC_PACKET_MATH_H namespace Eigen { namespace internal { /** \internal * \file GenericPacketMath.h * * Default implementation for types not supported by the vectorization. * In practice these functions are provided to make easier the writing * of generic vectorized code. */ #ifndef EIGEN_DEBUG_ALIGNED_LOAD #define EIGEN_DEBUG_ALIGNED_LOAD #endif #ifndef EIGEN_DEBUG_UNALIGNED_LOAD #define EIGEN_DEBUG_UNALIGNED_LOAD #endif #ifndef EIGEN_DEBUG_ALIGNED_STORE #define EIGEN_DEBUG_ALIGNED_STORE #endif #ifndef EIGEN_DEBUG_UNALIGNED_STORE #define EIGEN_DEBUG_UNALIGNED_STORE #endif struct default_packet_traits { enum { HasHalfPacket = 0, HasAdd = 1, HasSub = 1, HasShift = 1, HasMul = 1, HasNegate = 1, HasAbs = 1, HasArg = 0, HasAbs2 = 1, HasAbsDiff = 0, HasMin = 1, HasMax = 1, HasConj = 1, HasSetLinear = 1, HasBlend = 0, // This flag is used to indicate whether packet comparison is supported. // pcmp_eq, pcmp_lt and pcmp_le should be defined for it to be true. HasCmp = 0, HasDiv = 0, HasSqrt = 0, HasRsqrt = 0, HasExp = 0, HasExpm1 = 0, HasLog = 0, HasLog1p = 0, HasLog10 = 0, HasPow = 0, HasSin = 0, HasCos = 0, HasTan = 0, HasASin = 0, HasACos = 0, HasATan = 0, HasSinh = 0, HasCosh = 0, HasTanh = 0, HasLGamma = 0, HasDiGamma = 0, HasZeta = 0, HasPolygamma = 0, HasErf = 0, HasErfc = 0, HasNdtri = 0, HasBessel = 0, HasIGamma = 0, HasIGammaDerA = 0, HasGammaSampleDerAlpha = 0, HasIGammac = 0, HasBetaInc = 0, HasRound = 0, HasRint = 0, HasFloor = 0, HasCeil = 0, HasSign = 0 }; }; template struct packet_traits : default_packet_traits { typedef T type; typedef T half; enum { Vectorizable = 0, size = 1, AlignedOnScalar = 0, HasHalfPacket = 0 }; enum { HasAdd = 0, HasSub = 0, HasMul = 0, HasNegate = 0, HasAbs = 0, HasAbs2 = 0, HasMin = 0, HasMax = 0, HasConj = 0, HasSetLinear = 0 }; }; template struct packet_traits : packet_traits { }; template struct unpacket_traits { typedef T type; typedef T half; enum { size = 1, alignment = 1, vectorizable = false, masked_load_available=false, masked_store_available=false }; }; template struct unpacket_traits : unpacket_traits { }; template struct type_casting_traits { enum { VectorizedCast = 0, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; /** \internal Wrapper to ensure that multiple packet types can map to the same same underlying vector type. */ template struct eigen_packet_wrapper { EIGEN_ALWAYS_INLINE operator T&() { return m_val; } EIGEN_ALWAYS_INLINE operator const T&() const { return m_val; } EIGEN_ALWAYS_INLINE eigen_packet_wrapper() {} EIGEN_ALWAYS_INLINE eigen_packet_wrapper(const T &v) : m_val(v) {} EIGEN_ALWAYS_INLINE eigen_packet_wrapper& operator=(const T &v) { m_val = v; return *this; } T m_val; }; /** \internal A convenience utility for determining if the type is a scalar. * This is used to enable some generic packet implementations. */ template struct is_scalar { typedef typename unpacket_traits::type Scalar; enum { value = internal::is_same::value }; }; /** \internal \returns static_cast(a) (coeff-wise) */ template EIGEN_DEVICE_FUNC inline TgtPacket pcast(const SrcPacket& a) { return static_cast(a); } template EIGEN_DEVICE_FUNC inline TgtPacket pcast(const SrcPacket& a, const SrcPacket& /*b*/) { return static_cast(a); } template EIGEN_DEVICE_FUNC inline TgtPacket pcast(const SrcPacket& a, const SrcPacket& /*b*/, const SrcPacket& /*c*/, const SrcPacket& /*d*/) { return static_cast(a); } template EIGEN_DEVICE_FUNC inline TgtPacket pcast(const SrcPacket& a, const SrcPacket& /*b*/, const SrcPacket& /*c*/, const SrcPacket& /*d*/, const SrcPacket& /*e*/, const SrcPacket& /*f*/, const SrcPacket& /*g*/, const SrcPacket& /*h*/) { return static_cast(a); } /** \internal \returns reinterpret_cast(a) */ template EIGEN_DEVICE_FUNC inline Target preinterpret(const Packet& a); /* { return reinterpret_cast(a); } */ /** \internal \returns a + b (coeff-wise) */ template EIGEN_DEVICE_FUNC inline Packet padd(const Packet& a, const Packet& b) { return a+b; } // Avoid compiler warning for boolean algebra. template<> EIGEN_DEVICE_FUNC inline bool padd(const bool& a, const bool& b) { return a || b; } /** \internal \returns a - b (coeff-wise) */ template EIGEN_DEVICE_FUNC inline Packet psub(const Packet& a, const Packet& b) { return a-b; } /** \internal \returns -a (coeff-wise) */ template EIGEN_DEVICE_FUNC inline Packet pnegate(const Packet& a) { return -a; } template<> EIGEN_DEVICE_FUNC inline bool pnegate(const bool& a) { return !a; } /** \internal \returns conj(a) (coeff-wise) */ template EIGEN_DEVICE_FUNC inline Packet pconj(const Packet& a) { return numext::conj(a); } /** \internal \returns a * b (coeff-wise) */ template EIGEN_DEVICE_FUNC inline Packet pmul(const Packet& a, const Packet& b) { return a*b; } // Avoid compiler warning for boolean algebra. template<> EIGEN_DEVICE_FUNC inline bool pmul(const bool& a, const bool& b) { return a && b; } /** \internal \returns a / b (coeff-wise) */ template EIGEN_DEVICE_FUNC inline Packet pdiv(const Packet& a, const Packet& b) { return a/b; } // In the generic case, memset to all one bits. template struct ptrue_impl { static EIGEN_DEVICE_FUNC inline Packet run(const Packet& /*a*/){ Packet b; memset(static_cast(&b), 0xff, sizeof(Packet)); return b; } }; // For non-trivial scalars, set to Scalar(1) (i.e. a non-zero value). // Although this is technically not a valid bitmask, the scalar path for pselect // uses a comparison to zero, so this should still work in most cases. We don't // have another option, since the scalar type requires initialization. template struct ptrue_impl::value && NumTraits::RequireInitialization>::type > { static EIGEN_DEVICE_FUNC inline T run(const T& /*a*/){ return T(1); } }; /** \internal \returns one bits. */ template EIGEN_DEVICE_FUNC inline Packet ptrue(const Packet& a) { return ptrue_impl::run(a); } // In the general case, memset to zero. template struct pzero_impl { static EIGEN_DEVICE_FUNC inline Packet run(const Packet& /*a*/) { Packet b; memset(static_cast(&b), 0x00, sizeof(Packet)); return b; } }; // For scalars, explicitly set to Scalar(0), since the underlying representation // for zero may not consist of all-zero bits. template struct pzero_impl::value>::type> { static EIGEN_DEVICE_FUNC inline T run(const T& /*a*/) { return T(0); } }; /** \internal \returns packet of zeros */ template EIGEN_DEVICE_FUNC inline Packet pzero(const Packet& a) { return pzero_impl::run(a); } /** \internal \returns a <= b as a bit mask */ template EIGEN_DEVICE_FUNC inline Packet pcmp_le(const Packet& a, const Packet& b) { return a<=b ? ptrue(a) : pzero(a); } /** \internal \returns a < b as a bit mask */ template EIGEN_DEVICE_FUNC inline Packet pcmp_lt(const Packet& a, const Packet& b) { return a EIGEN_DEVICE_FUNC inline Packet pcmp_eq(const Packet& a, const Packet& b) { return a==b ? ptrue(a) : pzero(a); } /** \internal \returns a < b or a==NaN or b==NaN as a bit mask */ template EIGEN_DEVICE_FUNC inline Packet pcmp_lt_or_nan(const Packet& a, const Packet& b) { return a>=b ? pzero(a) : ptrue(a); } template struct bit_and { EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR EIGEN_ALWAYS_INLINE T operator()(const T& a, const T& b) const { return a & b; } }; template struct bit_or { EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR EIGEN_ALWAYS_INLINE T operator()(const T& a, const T& b) const { return a | b; } }; template struct bit_xor { EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR EIGEN_ALWAYS_INLINE T operator()(const T& a, const T& b) const { return a ^ b; } }; template struct bit_not { EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR EIGEN_ALWAYS_INLINE T operator()(const T& a) const { return ~a; } }; // Use operators &, |, ^, ~. template struct operator_bitwise_helper { EIGEN_DEVICE_FUNC static inline T bitwise_and(const T& a, const T& b) { return bit_and()(a, b); } EIGEN_DEVICE_FUNC static inline T bitwise_or(const T& a, const T& b) { return bit_or()(a, b); } EIGEN_DEVICE_FUNC static inline T bitwise_xor(const T& a, const T& b) { return bit_xor()(a, b); } EIGEN_DEVICE_FUNC static inline T bitwise_not(const T& a) { return bit_not()(a); } }; // Apply binary operations byte-by-byte template struct bytewise_bitwise_helper { EIGEN_DEVICE_FUNC static inline T bitwise_and(const T& a, const T& b) { return binary(a, b, bit_and()); } EIGEN_DEVICE_FUNC static inline T bitwise_or(const T& a, const T& b) { return binary(a, b, bit_or()); } EIGEN_DEVICE_FUNC static inline T bitwise_xor(const T& a, const T& b) { return binary(a, b, bit_xor()); } EIGEN_DEVICE_FUNC static inline T bitwise_not(const T& a) { return unary(a,bit_not()); } private: template EIGEN_DEVICE_FUNC static inline T unary(const T& a, Op op) { const unsigned char* a_ptr = reinterpret_cast(&a); T c; unsigned char* c_ptr = reinterpret_cast(&c); for (size_t i = 0; i < sizeof(T); ++i) { *c_ptr++ = op(*a_ptr++); } return c; } template EIGEN_DEVICE_FUNC static inline T binary(const T& a, const T& b, Op op) { const unsigned char* a_ptr = reinterpret_cast(&a); const unsigned char* b_ptr = reinterpret_cast(&b); T c; unsigned char* c_ptr = reinterpret_cast(&c); for (size_t i = 0; i < sizeof(T); ++i) { *c_ptr++ = op(*a_ptr++, *b_ptr++); } return c; } }; // In the general case, use byte-by-byte manipulation. template struct bitwise_helper : public bytewise_bitwise_helper {}; // For integers or non-trivial scalars, use binary operators. template struct bitwise_helper::value && (NumTraits::IsInteger || NumTraits::RequireInitialization)>::type > : public operator_bitwise_helper {}; /** \internal \returns the bitwise and of \a a and \a b */ template EIGEN_DEVICE_FUNC inline Packet pand(const Packet& a, const Packet& b) { return bitwise_helper::bitwise_and(a, b); } /** \internal \returns the bitwise or of \a a and \a b */ template EIGEN_DEVICE_FUNC inline Packet por(const Packet& a, const Packet& b) { return bitwise_helper::bitwise_or(a, b); } /** \internal \returns the bitwise xor of \a a and \a b */ template EIGEN_DEVICE_FUNC inline Packet pxor(const Packet& a, const Packet& b) { return bitwise_helper::bitwise_xor(a, b); } /** \internal \returns the bitwise not of \a a */ template EIGEN_DEVICE_FUNC inline Packet pnot(const Packet& a) { return bitwise_helper::bitwise_not(a); } /** \internal \returns the bitwise and of \a a and not \a b */ template EIGEN_DEVICE_FUNC inline Packet pandnot(const Packet& a, const Packet& b) { return pand(a, pnot(b)); } // In the general case, use bitwise select. template struct pselect_impl { static EIGEN_DEVICE_FUNC inline Packet run(const Packet& mask, const Packet& a, const Packet& b) { return por(pand(a,mask),pandnot(b,mask)); } }; // For scalars, use ternary select. template struct pselect_impl::value>::type > { static EIGEN_DEVICE_FUNC inline Packet run(const Packet& mask, const Packet& a, const Packet& b) { return numext::equal_strict(mask, Packet(0)) ? b : a; } }; /** \internal \returns \a or \b for each field in packet according to \mask */ template EIGEN_DEVICE_FUNC inline Packet pselect(const Packet& mask, const Packet& a, const Packet& b) { return pselect_impl::run(mask, a, b); } template<> EIGEN_DEVICE_FUNC inline bool pselect( const bool& cond, const bool& a, const bool& b) { return cond ? a : b; } /** \internal \returns the min or of \a a and \a b (coeff-wise) If either \a a or \a b are NaN, the result is implementation defined. */ template struct pminmax_impl { template static EIGEN_DEVICE_FUNC inline Packet run(const Packet& a, const Packet& b, Op op) { return op(a,b); } }; /** \internal \returns the min or max of \a a and \a b (coeff-wise) If either \a a or \a b are NaN, NaN is returned. */ template<> struct pminmax_impl { template static EIGEN_DEVICE_FUNC inline Packet run(const Packet& a, const Packet& b, Op op) { Packet not_nan_mask_a = pcmp_eq(a, a); Packet not_nan_mask_b = pcmp_eq(b, b); return pselect(not_nan_mask_a, pselect(not_nan_mask_b, op(a, b), b), a); } }; /** \internal \returns the min or max of \a a and \a b (coeff-wise) If both \a a and \a b are NaN, NaN is returned. Equivalent to std::fmin(a, b). */ template<> struct pminmax_impl { template static EIGEN_DEVICE_FUNC inline Packet run(const Packet& a, const Packet& b, Op op) { Packet not_nan_mask_a = pcmp_eq(a, a); Packet not_nan_mask_b = pcmp_eq(b, b); return pselect(not_nan_mask_a, pselect(not_nan_mask_b, op(a, b), a), b); } }; #ifndef SYCL_DEVICE_ONLY #define EIGEN_BINARY_OP_NAN_PROPAGATION(Type, Func) Func #else #define EIGEN_BINARY_OP_NAN_PROPAGATION(Type, Func) \ [](const Type& a, const Type& b) { \ return Func(a, b);} #endif /** \internal \returns the min of \a a and \a b (coeff-wise). If \a a or \b b is NaN, the return value is implementation defined. */ template EIGEN_DEVICE_FUNC inline Packet pmin(const Packet& a, const Packet& b) { return numext::mini(a,b); } /** \internal \returns the min of \a a and \a b (coeff-wise). NaNPropagation determines the NaN propagation semantics. */ template EIGEN_DEVICE_FUNC inline Packet pmin(const Packet& a, const Packet& b) { return pminmax_impl::run(a, b, EIGEN_BINARY_OP_NAN_PROPAGATION(Packet, (pmin))); } /** \internal \returns the max of \a a and \a b (coeff-wise) If \a a or \b b is NaN, the return value is implementation defined. */ template EIGEN_DEVICE_FUNC inline Packet pmax(const Packet& a, const Packet& b) { return numext::maxi(a, b); } /** \internal \returns the max of \a a and \a b (coeff-wise). NaNPropagation determines the NaN propagation semantics. */ template EIGEN_DEVICE_FUNC inline Packet pmax(const Packet& a, const Packet& b) { return pminmax_impl::run(a, b, EIGEN_BINARY_OP_NAN_PROPAGATION(Packet,(pmax))); } /** \internal \returns the absolute value of \a a */ template EIGEN_DEVICE_FUNC inline Packet pabs(const Packet& a) { return numext::abs(a); } template<> EIGEN_DEVICE_FUNC inline unsigned int pabs(const unsigned int& a) { return a; } template<> EIGEN_DEVICE_FUNC inline unsigned long pabs(const unsigned long& a) { return a; } template<> EIGEN_DEVICE_FUNC inline unsigned long long pabs(const unsigned long long& a) { return a; } /** \internal \returns the addsub value of \a a,b */ template EIGEN_DEVICE_FUNC inline Packet paddsub(const Packet& a, const Packet& b) { return pselect(peven_mask(a), padd(a, b), psub(a, b)); } /** \internal \returns the phase angle of \a a */ template EIGEN_DEVICE_FUNC inline Packet parg(const Packet& a) { using numext::arg; return arg(a); } /** \internal \returns \a a logically shifted by N bits to the right */ template EIGEN_DEVICE_FUNC inline int parithmetic_shift_right(const int& a) { return a >> N; } template EIGEN_DEVICE_FUNC inline long int parithmetic_shift_right(const long int& a) { return a >> N; } /** \internal \returns \a a arithmetically shifted by N bits to the right */ template EIGEN_DEVICE_FUNC inline int plogical_shift_right(const int& a) { return static_cast(static_cast(a) >> N); } template EIGEN_DEVICE_FUNC inline long int plogical_shift_right(const long int& a) { return static_cast(static_cast(a) >> N); } /** \internal \returns \a a shifted by N bits to the left */ template EIGEN_DEVICE_FUNC inline int plogical_shift_left(const int& a) { return a << N; } template EIGEN_DEVICE_FUNC inline long int plogical_shift_left(const long int& a) { return a << N; } /** \internal \returns the significant and exponent of the underlying floating point numbers * See https://en.cppreference.com/w/cpp/numeric/math/frexp */ template EIGEN_DEVICE_FUNC inline Packet pfrexp(const Packet& a, Packet& exponent) { int exp; EIGEN_USING_STD(frexp); Packet result = static_cast(frexp(a, &exp)); exponent = static_cast(exp); return result; } /** \internal \returns a * 2^((int)exponent) * See https://en.cppreference.com/w/cpp/numeric/math/ldexp */ template EIGEN_DEVICE_FUNC inline Packet pldexp(const Packet &a, const Packet &exponent) { EIGEN_USING_STD(ldexp) return static_cast(ldexp(a, static_cast(exponent))); } /** \internal \returns the min of \a a and \a b (coeff-wise) */ template EIGEN_DEVICE_FUNC inline Packet pabsdiff(const Packet& a, const Packet& b) { return pselect(pcmp_lt(a, b), psub(b, a), psub(a, b)); } /** \internal \returns a packet version of \a *from, from must be 16 bytes aligned */ template EIGEN_DEVICE_FUNC inline Packet pload(const typename unpacket_traits::type* from) { return *from; } /** \internal \returns a packet version of \a *from, (un-aligned load) */ template EIGEN_DEVICE_FUNC inline Packet ploadu(const typename unpacket_traits::type* from) { return *from; } /** \internal \returns a packet version of \a *from, (un-aligned masked load) * There is no generic implementation. We only have implementations for specialized * cases. Generic case should not be called. */ template EIGEN_DEVICE_FUNC inline typename enable_if::masked_load_available, Packet>::type ploadu(const typename unpacket_traits::type* from, typename unpacket_traits::mask_t umask); /** \internal \returns a packet with constant coefficients \a a, e.g.: (a,a,a,a) */ template EIGEN_DEVICE_FUNC inline Packet pset1(const typename unpacket_traits::type& a) { return a; } /** \internal \returns a packet with constant coefficients set from bits */ template EIGEN_DEVICE_FUNC inline Packet pset1frombits(BitsType a); /** \internal \returns a packet with constant coefficients \a a[0], e.g.: (a[0],a[0],a[0],a[0]) */ template EIGEN_DEVICE_FUNC inline Packet pload1(const typename unpacket_traits::type *a) { return pset1(*a); } /** \internal \returns a packet with elements of \a *from duplicated. * For instance, for a packet of 8 elements, 4 scalars will be read from \a *from and * duplicated to form: {from[0],from[0],from[1],from[1],from[2],from[2],from[3],from[3]} * Currently, this function is only used for scalar * complex products. */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet ploaddup(const typename unpacket_traits::type* from) { return *from; } /** \internal \returns a packet with elements of \a *from quadrupled. * For instance, for a packet of 8 elements, 2 scalars will be read from \a *from and * replicated to form: {from[0],from[0],from[0],from[0],from[1],from[1],from[1],from[1]} * Currently, this function is only used in matrix products. * For packet-size smaller or equal to 4, this function is equivalent to pload1 */ template EIGEN_DEVICE_FUNC inline Packet ploadquad(const typename unpacket_traits::type* from) { return pload1(from); } /** \internal equivalent to * \code * a0 = pload1(a+0); * a1 = pload1(a+1); * a2 = pload1(a+2); * a3 = pload1(a+3); * \endcode * \sa pset1, pload1, ploaddup, pbroadcast2 */ template EIGEN_DEVICE_FUNC inline void pbroadcast4(const typename unpacket_traits::type *a, Packet& a0, Packet& a1, Packet& a2, Packet& a3) { a0 = pload1(a+0); a1 = pload1(a+1); a2 = pload1(a+2); a3 = pload1(a+3); } /** \internal equivalent to * \code * a0 = pload1(a+0); * a1 = pload1(a+1); * \endcode * \sa pset1, pload1, ploaddup, pbroadcast4 */ template EIGEN_DEVICE_FUNC inline void pbroadcast2(const typename unpacket_traits::type *a, Packet& a0, Packet& a1) { a0 = pload1(a+0); a1 = pload1(a+1); } /** \internal \brief Returns a packet with coefficients (a,a+1,...,a+packet_size-1). */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet plset(const typename unpacket_traits::type& a) { return a; } /** \internal \returns a packet with constant coefficients \a a, e.g.: (x, 0, x, 0), where x is the value of all 1-bits. */ template EIGEN_DEVICE_FUNC inline Packet peven_mask(const Packet& /*a*/) { typedef typename unpacket_traits::type Scalar; const size_t n = unpacket_traits::size; EIGEN_ALIGN_TO_BOUNDARY(sizeof(Packet)) Scalar elements[n]; for(size_t i = 0; i < n; ++i) { memset(elements+i, ((i & 1) == 0 ? 0xff : 0), sizeof(Scalar)); } return ploadu(elements); } /** \internal copy the packet \a from to \a *to, \a to must be 16 bytes aligned */ template EIGEN_DEVICE_FUNC inline void pstore(Scalar* to, const Packet& from) { (*to) = from; } /** \internal copy the packet \a from to \a *to, (un-aligned store) */ template EIGEN_DEVICE_FUNC inline void pstoreu(Scalar* to, const Packet& from) { (*to) = from; } /** \internal copy the packet \a from to \a *to, (un-aligned store with a mask) * There is no generic implementation. We only have implementations for specialized * cases. Generic case should not be called. */ template EIGEN_DEVICE_FUNC inline typename enable_if::masked_store_available, void>::type pstoreu(Scalar* to, const Packet& from, typename unpacket_traits::mask_t umask); template EIGEN_DEVICE_FUNC inline Packet pgather(const Scalar* from, Index /*stride*/) { return ploadu(from); } template EIGEN_DEVICE_FUNC inline void pscatter(Scalar* to, const Packet& from, Index /*stride*/) { pstore(to, from); } /** \internal tries to do cache prefetching of \a addr */ template EIGEN_DEVICE_FUNC inline void prefetch(const Scalar* addr) { #if defined(EIGEN_HIP_DEVICE_COMPILE) // do nothing #elif defined(EIGEN_CUDA_ARCH) #if defined(__LP64__) || EIGEN_OS_WIN64 // 64-bit pointer operand constraint for inlined asm asm(" prefetch.L1 [ %1 ];" : "=l"(addr) : "l"(addr)); #else // 32-bit pointer operand constraint for inlined asm asm(" prefetch.L1 [ %1 ];" : "=r"(addr) : "r"(addr)); #endif #elif (!EIGEN_COMP_MSVC) && (EIGEN_COMP_GNUC || EIGEN_COMP_CLANG || EIGEN_COMP_ICC) __builtin_prefetch(addr); #endif } /** \internal \returns the reversed elements of \a a*/ template EIGEN_DEVICE_FUNC inline Packet preverse(const Packet& a) { return a; } /** \internal \returns \a a with real and imaginary part flipped (for complex type only) */ template EIGEN_DEVICE_FUNC inline Packet pcplxflip(const Packet& a) { return Packet(numext::imag(a),numext::real(a)); } /************************** * Special math functions ***************************/ /** \internal \returns the sine of \a a (coeff-wise) */ template EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet psin(const Packet& a) { EIGEN_USING_STD(sin); return sin(a); } /** \internal \returns the cosine of \a a (coeff-wise) */ template EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet pcos(const Packet& a) { EIGEN_USING_STD(cos); return cos(a); } /** \internal \returns the tan of \a a (coeff-wise) */ template EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet ptan(const Packet& a) { EIGEN_USING_STD(tan); return tan(a); } /** \internal \returns the arc sine of \a a (coeff-wise) */ template EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet pasin(const Packet& a) { EIGEN_USING_STD(asin); return asin(a); } /** \internal \returns the arc cosine of \a a (coeff-wise) */ template EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet pacos(const Packet& a) { EIGEN_USING_STD(acos); return acos(a); } /** \internal \returns the arc tangent of \a a (coeff-wise) */ template EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet patan(const Packet& a) { EIGEN_USING_STD(atan); return atan(a); } /** \internal \returns the hyperbolic sine of \a a (coeff-wise) */ template EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet psinh(const Packet& a) { EIGEN_USING_STD(sinh); return sinh(a); } /** \internal \returns the hyperbolic cosine of \a a (coeff-wise) */ template EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet pcosh(const Packet& a) { EIGEN_USING_STD(cosh); return cosh(a); } /** \internal \returns the hyperbolic tan of \a a (coeff-wise) */ template EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet ptanh(const Packet& a) { EIGEN_USING_STD(tanh); return tanh(a); } /** \internal \returns the exp of \a a (coeff-wise) */ template EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet pexp(const Packet& a) { EIGEN_USING_STD(exp); return exp(a); } /** \internal \returns the expm1 of \a a (coeff-wise) */ template EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet pexpm1(const Packet& a) { return numext::expm1(a); } /** \internal \returns the log of \a a (coeff-wise) */ template EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet plog(const Packet& a) { EIGEN_USING_STD(log); return log(a); } /** \internal \returns the log1p of \a a (coeff-wise) */ template EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet plog1p(const Packet& a) { return numext::log1p(a); } /** \internal \returns the log10 of \a a (coeff-wise) */ template EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet plog10(const Packet& a) { EIGEN_USING_STD(log10); return log10(a); } /** \internal \returns the log10 of \a a (coeff-wise) */ template EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet plog2(const Packet& a) { typedef typename internal::unpacket_traits::type Scalar; return pmul(pset1(Scalar(EIGEN_LOG2E)), plog(a)); } /** \internal \returns the square-root of \a a (coeff-wise) */ template EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet psqrt(const Packet& a) { return numext::sqrt(a); } /** \internal \returns the reciprocal square-root of \a a (coeff-wise) */ template EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet prsqrt(const Packet& a) { typedef typename internal::unpacket_traits::type Scalar; return pdiv(pset1(Scalar(1)), psqrt(a)); } /** \internal \returns the rounded value of \a a (coeff-wise) */ template EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet pround(const Packet& a) { using numext::round; return round(a); } /** \internal \returns the floor of \a a (coeff-wise) */ template EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet pfloor(const Packet& a) { using numext::floor; return floor(a); } /** \internal \returns the rounded value of \a a (coeff-wise) with current * rounding mode */ template EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet print(const Packet& a) { using numext::rint; return rint(a); } /** \internal \returns the ceil of \a a (coeff-wise) */ template EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet pceil(const Packet& a) { using numext::ceil; return ceil(a); } /** \internal \returns the first element of a packet */ template EIGEN_DEVICE_FUNC inline typename unpacket_traits::type pfirst(const Packet& a) { return a; } /** \internal \returns the sum of the elements of upper and lower half of \a a if \a a is larger than 4. * For a packet {a0, a1, a2, a3, a4, a5, a6, a7}, it returns a half packet {a0+a4, a1+a5, a2+a6, a3+a7} * For packet-size smaller or equal to 4, this boils down to a noop. */ template EIGEN_DEVICE_FUNC inline typename conditional<(unpacket_traits::size%8)==0,typename unpacket_traits::half,Packet>::type predux_half_dowto4(const Packet& a) { return a; } // Slow generic implementation of Packet reduction. template EIGEN_DEVICE_FUNC inline typename unpacket_traits::type predux_helper(const Packet& a, Op op) { typedef typename unpacket_traits::type Scalar; const size_t n = unpacket_traits::size; EIGEN_ALIGN_TO_BOUNDARY(sizeof(Packet)) Scalar elements[n]; pstoreu(elements, a); for(size_t k = n / 2; k > 0; k /= 2) { for(size_t i = 0; i < k; ++i) { elements[i] = op(elements[i], elements[i + k]); } } return elements[0]; } /** \internal \returns the sum of the elements of \a a*/ template EIGEN_DEVICE_FUNC inline typename unpacket_traits::type predux(const Packet& a) { return a; } /** \internal \returns the product of the elements of \a a */ template EIGEN_DEVICE_FUNC inline typename unpacket_traits::type predux_mul( const Packet& a) { typedef typename unpacket_traits::type Scalar; return predux_helper(a, EIGEN_BINARY_OP_NAN_PROPAGATION(Scalar, (pmul))); } /** \internal \returns the min of the elements of \a a */ template EIGEN_DEVICE_FUNC inline typename unpacket_traits::type predux_min( const Packet &a) { typedef typename unpacket_traits::type Scalar; return predux_helper(a, EIGEN_BINARY_OP_NAN_PROPAGATION(Scalar, (pmin))); } template EIGEN_DEVICE_FUNC inline typename unpacket_traits::type predux_min( const Packet& a) { typedef typename unpacket_traits::type Scalar; return predux_helper(a, EIGEN_BINARY_OP_NAN_PROPAGATION(Scalar, (pmin))); } /** \internal \returns the min of the elements of \a a */ template EIGEN_DEVICE_FUNC inline typename unpacket_traits::type predux_max( const Packet &a) { typedef typename unpacket_traits::type Scalar; return predux_helper(a, EIGEN_BINARY_OP_NAN_PROPAGATION(Scalar, (pmax))); } template EIGEN_DEVICE_FUNC inline typename unpacket_traits::type predux_max( const Packet& a) { typedef typename unpacket_traits::type Scalar; return predux_helper(a, EIGEN_BINARY_OP_NAN_PROPAGATION(Scalar, (pmax))); } #undef EIGEN_BINARY_OP_NAN_PROPAGATION /** \internal \returns true if all coeffs of \a a means "true" * It is supposed to be called on values returned by pcmp_*. */ // not needed yet // template EIGEN_DEVICE_FUNC inline bool predux_all(const Packet& a) // { return bool(a); } /** \internal \returns true if any coeffs of \a a means "true" * It is supposed to be called on values returned by pcmp_*. */ template EIGEN_DEVICE_FUNC inline bool predux_any(const Packet& a) { // Dirty but generic implementation where "true" is assumed to be non 0 and all the sames. // It is expected that "true" is either: // - Scalar(1) // - bits full of ones (NaN for floats), // - or first bit equals to 1 (1 for ints, smallest denormal for floats). // For all these cases, taking the sum is just fine, and this boils down to a no-op for scalars. typedef typename unpacket_traits::type Scalar; return numext::not_equal_strict(predux(a), Scalar(0)); } /*************************************************************************** * The following functions might not have to be overwritten for vectorized types ***************************************************************************/ /** \internal copy a packet with constant coefficient \a a (e.g., [a,a,a,a]) to \a *to. \a to must be 16 bytes aligned */ // NOTE: this function must really be templated on the packet type (think about different packet types for the same scalar type) template inline void pstore1(typename unpacket_traits::type* to, const typename unpacket_traits::type& a) { pstore(to, pset1(a)); } /** \internal \returns a * b + c (coeff-wise) */ template EIGEN_DEVICE_FUNC inline Packet pmadd(const Packet& a, const Packet& b, const Packet& c) { return padd(pmul(a, b),c); } /** \internal \returns a packet version of \a *from. * The pointer \a from must be aligned on a \a Alignment bytes boundary. */ template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE Packet ploadt(const typename unpacket_traits::type* from) { if(Alignment >= unpacket_traits::alignment) return pload(from); else return ploadu(from); } /** \internal copy the packet \a from to \a *to. * The pointer \a from must be aligned on a \a Alignment bytes boundary. */ template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void pstoret(Scalar* to, const Packet& from) { if(Alignment >= unpacket_traits::alignment) pstore(to, from); else pstoreu(to, from); } /** \internal \returns a packet version of \a *from. * Unlike ploadt, ploadt_ro takes advantage of the read-only memory path on the * hardware if available to speedup the loading of data that won't be modified * by the current computation. */ template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE Packet ploadt_ro(const typename unpacket_traits::type* from) { return ploadt(from); } /*************************************************************************** * Fast complex products (GCC generates a function call which is very slow) ***************************************************************************/ // Eigen+CUDA does not support complexes. #if !defined(EIGEN_GPUCC) template<> inline std::complex pmul(const std::complex& a, const std::complex& b) { return std::complex(a.real()*b.real() - a.imag()*b.imag(), a.imag()*b.real() + a.real()*b.imag()); } template<> inline std::complex pmul(const std::complex& a, const std::complex& b) { return std::complex(a.real()*b.real() - a.imag()*b.imag(), a.imag()*b.real() + a.real()*b.imag()); } #endif /*************************************************************************** * PacketBlock, that is a collection of N packets where the number of words * in the packet is a multiple of N. ***************************************************************************/ template ::size> struct PacketBlock { Packet packet[N]; }; template EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& /*kernel*/) { // Nothing to do in the scalar case, i.e. a 1x1 matrix. } /*************************************************************************** * Selector, i.e. vector of N boolean values used to select (i.e. blend) * words from 2 packets. ***************************************************************************/ template struct Selector { bool select[N]; }; template EIGEN_DEVICE_FUNC inline Packet pblend(const Selector::size>& ifPacket, const Packet& thenPacket, const Packet& elsePacket) { return ifPacket.select[0] ? thenPacket : elsePacket; } } // end namespace internal } // end namespace Eigen #endif // EIGEN_GENERIC_PACKET_MATH_H RcppEigen/inst/include/Eigen/src/Core/ReturnByValue.h0000644000176200001440000001027414562417221022164 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009-2010 Gael Guennebaud // Copyright (C) 2009-2010 Benoit Jacob // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_RETURNBYVALUE_H #define EIGEN_RETURNBYVALUE_H namespace Eigen { namespace internal { template struct traits > : public traits::ReturnType> { enum { // We're disabling the DirectAccess because e.g. the constructor of // the Block-with-DirectAccess expression requires to have a coeffRef method. // Also, we don't want to have to implement the stride stuff. Flags = (traits::ReturnType>::Flags | EvalBeforeNestingBit) & ~DirectAccessBit }; }; /* The ReturnByValue object doesn't even have a coeff() method. * So the only way that nesting it in an expression can work, is by evaluating it into a plain matrix. * So internal::nested always gives the plain return matrix type. * * FIXME: I don't understand why we need this specialization: isn't this taken care of by the EvalBeforeNestingBit ?? * Answer: EvalBeforeNestingBit should be deprecated since we have the evaluators */ template struct nested_eval, n, PlainObject> { typedef typename traits::ReturnType type; }; } // end namespace internal /** \class ReturnByValue * \ingroup Core_Module * */ template class ReturnByValue : public internal::dense_xpr_base< ReturnByValue >::type, internal::no_assignment_operator { public: typedef typename internal::traits::ReturnType ReturnType; typedef typename internal::dense_xpr_base::type Base; EIGEN_DENSE_PUBLIC_INTERFACE(ReturnByValue) template EIGEN_DEVICE_FUNC inline void evalTo(Dest& dst) const { static_cast(this)->evalTo(dst); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index rows() const EIGEN_NOEXCEPT { return static_cast(this)->rows(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index cols() const EIGEN_NOEXCEPT { return static_cast(this)->cols(); } #ifndef EIGEN_PARSED_BY_DOXYGEN #define Unusable YOU_ARE_TRYING_TO_ACCESS_A_SINGLE_COEFFICIENT_IN_A_SPECIAL_EXPRESSION_WHERE_THAT_IS_NOT_ALLOWED_BECAUSE_THAT_WOULD_BE_INEFFICIENT class Unusable{ Unusable(const Unusable&) {} Unusable& operator=(const Unusable&) {return *this;} }; const Unusable& coeff(Index) const { return *reinterpret_cast(this); } const Unusable& coeff(Index,Index) const { return *reinterpret_cast(this); } Unusable& coeffRef(Index) { return *reinterpret_cast(this); } Unusable& coeffRef(Index,Index) { return *reinterpret_cast(this); } #undef Unusable #endif }; template template EIGEN_DEVICE_FUNC Derived& DenseBase::operator=(const ReturnByValue& other) { other.evalTo(derived()); return derived(); } namespace internal { // Expression is evaluated in a temporary; default implementation of Assignment is bypassed so that // when a ReturnByValue expression is assigned, the evaluator is not constructed. // TODO: Finalize port to new regime; ReturnByValue should not exist in the expression world template struct evaluator > : public evaluator::ReturnType> { typedef ReturnByValue XprType; typedef typename internal::traits::ReturnType PlainObject; typedef evaluator Base; EIGEN_DEVICE_FUNC explicit evaluator(const XprType& xpr) : m_result(xpr.rows(), xpr.cols()) { ::new (static_cast(this)) Base(m_result); xpr.evalTo(m_result); } protected: PlainObject m_result; }; } // end namespace internal } // end namespace Eigen #endif // EIGEN_RETURNBYVALUE_H RcppEigen/inst/include/Eigen/src/Core/CommaInitializer.h0000644000176200001440000001353514562417221022660 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008 Gael Guennebaud // Copyright (C) 2006-2008 Benoit Jacob // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_COMMAINITIALIZER_H #define EIGEN_COMMAINITIALIZER_H namespace Eigen { /** \class CommaInitializer * \ingroup Core_Module * * \brief Helper class used by the comma initializer operator * * This class is internally used to implement the comma initializer feature. It is * the return type of MatrixBase::operator<<, and most of the time this is the only * way it is used. * * \sa \blank \ref MatrixBaseCommaInitRef "MatrixBase::operator<<", CommaInitializer::finished() */ template struct CommaInitializer { typedef typename XprType::Scalar Scalar; EIGEN_DEVICE_FUNC inline CommaInitializer(XprType& xpr, const Scalar& s) : m_xpr(xpr), m_row(0), m_col(1), m_currentBlockRows(1) { eigen_assert(m_xpr.rows() > 0 && m_xpr.cols() > 0 && "Cannot comma-initialize a 0x0 matrix (operator<<)"); m_xpr.coeffRef(0,0) = s; } template EIGEN_DEVICE_FUNC inline CommaInitializer(XprType& xpr, const DenseBase& other) : m_xpr(xpr), m_row(0), m_col(other.cols()), m_currentBlockRows(other.rows()) { eigen_assert(m_xpr.rows() >= other.rows() && m_xpr.cols() >= other.cols() && "Cannot comma-initialize a 0x0 matrix (operator<<)"); m_xpr.block(0, 0, other.rows(), other.cols()) = other; } /* Copy/Move constructor which transfers ownership. This is crucial in * absence of return value optimization to avoid assertions during destruction. */ // FIXME in C++11 mode this could be replaced by a proper RValue constructor EIGEN_DEVICE_FUNC inline CommaInitializer(const CommaInitializer& o) : m_xpr(o.m_xpr), m_row(o.m_row), m_col(o.m_col), m_currentBlockRows(o.m_currentBlockRows) { // Mark original object as finished. In absence of R-value references we need to const_cast: const_cast(o).m_row = m_xpr.rows(); const_cast(o).m_col = m_xpr.cols(); const_cast(o).m_currentBlockRows = 0; } /* inserts a scalar value in the target matrix */ EIGEN_DEVICE_FUNC CommaInitializer& operator,(const Scalar& s) { if (m_col==m_xpr.cols()) { m_row+=m_currentBlockRows; m_col = 0; m_currentBlockRows = 1; eigen_assert(m_row EIGEN_DEVICE_FUNC CommaInitializer& operator,(const DenseBase& other) { if (m_col==m_xpr.cols() && (other.cols()!=0 || other.rows()!=m_currentBlockRows)) { m_row+=m_currentBlockRows; m_col = 0; m_currentBlockRows = other.rows(); eigen_assert(m_row+m_currentBlockRows<=m_xpr.rows() && "Too many rows passed to comma initializer (operator<<)"); } eigen_assert((m_col + other.cols() <= m_xpr.cols()) && "Too many coefficients passed to comma initializer (operator<<)"); eigen_assert(m_currentBlockRows==other.rows()); m_xpr.template block (m_row, m_col, other.rows(), other.cols()) = other; m_col += other.cols(); return *this; } EIGEN_DEVICE_FUNC inline ~CommaInitializer() #if defined VERIFY_RAISES_ASSERT && (!defined EIGEN_NO_ASSERTION_CHECKING) && defined EIGEN_EXCEPTIONS EIGEN_EXCEPTION_SPEC(Eigen::eigen_assert_exception) #endif { finished(); } /** \returns the built matrix once all its coefficients have been set. * Calling finished is 100% optional. Its purpose is to write expressions * like this: * \code * quaternion.fromRotationMatrix((Matrix3f() << axis0, axis1, axis2).finished()); * \endcode */ EIGEN_DEVICE_FUNC inline XprType& finished() { eigen_assert(((m_row+m_currentBlockRows) == m_xpr.rows() || m_xpr.cols() == 0) && m_col == m_xpr.cols() && "Too few coefficients passed to comma initializer (operator<<)"); return m_xpr; } XprType& m_xpr; // target expression Index m_row; // current row id Index m_col; // current col id Index m_currentBlockRows; // current block height }; /** \anchor MatrixBaseCommaInitRef * Convenient operator to set the coefficients of a matrix. * * The coefficients must be provided in a row major order and exactly match * the size of the matrix. Otherwise an assertion is raised. * * Example: \include MatrixBase_set.cpp * Output: \verbinclude MatrixBase_set.out * * \note According the c++ standard, the argument expressions of this comma initializer are evaluated in arbitrary order. * * \sa CommaInitializer::finished(), class CommaInitializer */ template EIGEN_DEVICE_FUNC inline CommaInitializer DenseBase::operator<< (const Scalar& s) { return CommaInitializer(*static_cast(this), s); } /** \sa operator<<(const Scalar&) */ template template EIGEN_DEVICE_FUNC inline CommaInitializer DenseBase::operator<<(const DenseBase& other) { return CommaInitializer(*static_cast(this), other); } } // end namespace Eigen #endif // EIGEN_COMMAINITIALIZER_H RcppEigen/inst/include/Eigen/src/Core/AssignEvaluator.h0000644000176200001440000012131114562417221022517 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2011 Benoit Jacob // Copyright (C) 2011-2014 Gael Guennebaud // Copyright (C) 2011-2012 Jitse Niesen // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_ASSIGN_EVALUATOR_H #define EIGEN_ASSIGN_EVALUATOR_H namespace Eigen { // This implementation is based on Assign.h namespace internal { /*************************************************************************** * Part 1 : the logic deciding a strategy for traversal and unrolling * ***************************************************************************/ // copy_using_evaluator_traits is based on assign_traits template struct copy_using_evaluator_traits { typedef typename DstEvaluator::XprType Dst; typedef typename Dst::Scalar DstScalar; enum { DstFlags = DstEvaluator::Flags, SrcFlags = SrcEvaluator::Flags }; public: enum { DstAlignment = DstEvaluator::Alignment, SrcAlignment = SrcEvaluator::Alignment, DstHasDirectAccess = (DstFlags & DirectAccessBit) == DirectAccessBit, JointAlignment = EIGEN_PLAIN_ENUM_MIN(DstAlignment,SrcAlignment) }; private: enum { InnerSize = int(Dst::IsVectorAtCompileTime) ? int(Dst::SizeAtCompileTime) : int(DstFlags)&RowMajorBit ? int(Dst::ColsAtCompileTime) : int(Dst::RowsAtCompileTime), InnerMaxSize = int(Dst::IsVectorAtCompileTime) ? int(Dst::MaxSizeAtCompileTime) : int(DstFlags)&RowMajorBit ? int(Dst::MaxColsAtCompileTime) : int(Dst::MaxRowsAtCompileTime), RestrictedInnerSize = EIGEN_SIZE_MIN_PREFER_FIXED(InnerSize,MaxPacketSize), RestrictedLinearSize = EIGEN_SIZE_MIN_PREFER_FIXED(Dst::SizeAtCompileTime,MaxPacketSize), OuterStride = int(outer_stride_at_compile_time::ret), MaxSizeAtCompileTime = Dst::SizeAtCompileTime }; // TODO distinguish between linear traversal and inner-traversals typedef typename find_best_packet::type LinearPacketType; typedef typename find_best_packet::type InnerPacketType; enum { LinearPacketSize = unpacket_traits::size, InnerPacketSize = unpacket_traits::size }; public: enum { LinearRequiredAlignment = unpacket_traits::alignment, InnerRequiredAlignment = unpacket_traits::alignment }; private: enum { DstIsRowMajor = DstFlags&RowMajorBit, SrcIsRowMajor = SrcFlags&RowMajorBit, StorageOrdersAgree = (int(DstIsRowMajor) == int(SrcIsRowMajor)), MightVectorize = bool(StorageOrdersAgree) && (int(DstFlags) & int(SrcFlags) & ActualPacketAccessBit) && bool(functor_traits::PacketAccess), MayInnerVectorize = MightVectorize && int(InnerSize)!=Dynamic && int(InnerSize)%int(InnerPacketSize)==0 && int(OuterStride)!=Dynamic && int(OuterStride)%int(InnerPacketSize)==0 && (EIGEN_UNALIGNED_VECTORIZE || int(JointAlignment)>=int(InnerRequiredAlignment)), MayLinearize = bool(StorageOrdersAgree) && (int(DstFlags) & int(SrcFlags) & LinearAccessBit), MayLinearVectorize = bool(MightVectorize) && bool(MayLinearize) && bool(DstHasDirectAccess) && (EIGEN_UNALIGNED_VECTORIZE || (int(DstAlignment)>=int(LinearRequiredAlignment)) || MaxSizeAtCompileTime == Dynamic), /* If the destination isn't aligned, we have to do runtime checks and we don't unroll, so it's only good for large enough sizes. */ MaySliceVectorize = bool(MightVectorize) && bool(DstHasDirectAccess) && (int(InnerMaxSize)==Dynamic || int(InnerMaxSize)>=(EIGEN_UNALIGNED_VECTORIZE?InnerPacketSize:(3*InnerPacketSize))) /* slice vectorization can be slow, so we only want it if the slices are big, which is indicated by InnerMaxSize rather than InnerSize, think of the case of a dynamic block in a fixed-size matrix However, with EIGEN_UNALIGNED_VECTORIZE and unrolling, slice vectorization is still worth it */ }; public: enum { Traversal = int(Dst::SizeAtCompileTime) == 0 ? int(AllAtOnceTraversal) // If compile-size is zero, traversing will fail at compile-time. : (int(MayLinearVectorize) && (LinearPacketSize>InnerPacketSize)) ? int(LinearVectorizedTraversal) : int(MayInnerVectorize) ? int(InnerVectorizedTraversal) : int(MayLinearVectorize) ? int(LinearVectorizedTraversal) : int(MaySliceVectorize) ? int(SliceVectorizedTraversal) : int(MayLinearize) ? int(LinearTraversal) : int(DefaultTraversal), Vectorized = int(Traversal) == InnerVectorizedTraversal || int(Traversal) == LinearVectorizedTraversal || int(Traversal) == SliceVectorizedTraversal }; typedef typename conditional::type PacketType; private: enum { ActualPacketSize = int(Traversal)==LinearVectorizedTraversal ? LinearPacketSize : Vectorized ? InnerPacketSize : 1, UnrollingLimit = EIGEN_UNROLLING_LIMIT * ActualPacketSize, MayUnrollCompletely = int(Dst::SizeAtCompileTime) != Dynamic && int(Dst::SizeAtCompileTime) * (int(DstEvaluator::CoeffReadCost)+int(SrcEvaluator::CoeffReadCost)) <= int(UnrollingLimit), MayUnrollInner = int(InnerSize) != Dynamic && int(InnerSize) * (int(DstEvaluator::CoeffReadCost)+int(SrcEvaluator::CoeffReadCost)) <= int(UnrollingLimit) }; public: enum { Unrolling = (int(Traversal) == int(InnerVectorizedTraversal) || int(Traversal) == int(DefaultTraversal)) ? ( int(MayUnrollCompletely) ? int(CompleteUnrolling) : int(MayUnrollInner) ? int(InnerUnrolling) : int(NoUnrolling) ) : int(Traversal) == int(LinearVectorizedTraversal) ? ( bool(MayUnrollCompletely) && ( EIGEN_UNALIGNED_VECTORIZE || (int(DstAlignment)>=int(LinearRequiredAlignment))) ? int(CompleteUnrolling) : int(NoUnrolling) ) : int(Traversal) == int(LinearTraversal) ? ( bool(MayUnrollCompletely) ? int(CompleteUnrolling) : int(NoUnrolling) ) #if EIGEN_UNALIGNED_VECTORIZE : int(Traversal) == int(SliceVectorizedTraversal) ? ( bool(MayUnrollInner) ? int(InnerUnrolling) : int(NoUnrolling) ) #endif : int(NoUnrolling) }; #ifdef EIGEN_DEBUG_ASSIGN static void debug() { std::cerr << "DstXpr: " << typeid(typename DstEvaluator::XprType).name() << std::endl; std::cerr << "SrcXpr: " << typeid(typename SrcEvaluator::XprType).name() << std::endl; std::cerr.setf(std::ios::hex, std::ios::basefield); std::cerr << "DstFlags" << " = " << DstFlags << " (" << demangle_flags(DstFlags) << " )" << std::endl; std::cerr << "SrcFlags" << " = " << SrcFlags << " (" << demangle_flags(SrcFlags) << " )" << std::endl; std::cerr.unsetf(std::ios::hex); EIGEN_DEBUG_VAR(DstAlignment) EIGEN_DEBUG_VAR(SrcAlignment) EIGEN_DEBUG_VAR(LinearRequiredAlignment) EIGEN_DEBUG_VAR(InnerRequiredAlignment) EIGEN_DEBUG_VAR(JointAlignment) EIGEN_DEBUG_VAR(InnerSize) EIGEN_DEBUG_VAR(InnerMaxSize) EIGEN_DEBUG_VAR(LinearPacketSize) EIGEN_DEBUG_VAR(InnerPacketSize) EIGEN_DEBUG_VAR(ActualPacketSize) EIGEN_DEBUG_VAR(StorageOrdersAgree) EIGEN_DEBUG_VAR(MightVectorize) EIGEN_DEBUG_VAR(MayLinearize) EIGEN_DEBUG_VAR(MayInnerVectorize) EIGEN_DEBUG_VAR(MayLinearVectorize) EIGEN_DEBUG_VAR(MaySliceVectorize) std::cerr << "Traversal" << " = " << Traversal << " (" << demangle_traversal(Traversal) << ")" << std::endl; EIGEN_DEBUG_VAR(SrcEvaluator::CoeffReadCost) EIGEN_DEBUG_VAR(DstEvaluator::CoeffReadCost) EIGEN_DEBUG_VAR(Dst::SizeAtCompileTime) EIGEN_DEBUG_VAR(UnrollingLimit) EIGEN_DEBUG_VAR(MayUnrollCompletely) EIGEN_DEBUG_VAR(MayUnrollInner) std::cerr << "Unrolling" << " = " << Unrolling << " (" << demangle_unrolling(Unrolling) << ")" << std::endl; std::cerr << std::endl; } #endif }; /*************************************************************************** * Part 2 : meta-unrollers ***************************************************************************/ /************************ *** Default traversal *** ************************/ template struct copy_using_evaluator_DefaultTraversal_CompleteUnrolling { // FIXME: this is not very clean, perhaps this information should be provided by the kernel? typedef typename Kernel::DstEvaluatorType DstEvaluatorType; typedef typename DstEvaluatorType::XprType DstXprType; enum { outer = Index / DstXprType::InnerSizeAtCompileTime, inner = Index % DstXprType::InnerSizeAtCompileTime }; EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE void run(Kernel &kernel) { kernel.assignCoeffByOuterInner(outer, inner); copy_using_evaluator_DefaultTraversal_CompleteUnrolling::run(kernel); } }; template struct copy_using_evaluator_DefaultTraversal_CompleteUnrolling { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE void run(Kernel&) { } }; template struct copy_using_evaluator_DefaultTraversal_InnerUnrolling { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE void run(Kernel &kernel, Index outer) { kernel.assignCoeffByOuterInner(outer, Index_); copy_using_evaluator_DefaultTraversal_InnerUnrolling::run(kernel, outer); } }; template struct copy_using_evaluator_DefaultTraversal_InnerUnrolling { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE void run(Kernel&, Index) { } }; /*********************** *** Linear traversal *** ***********************/ template struct copy_using_evaluator_LinearTraversal_CompleteUnrolling { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE void run(Kernel& kernel) { kernel.assignCoeff(Index); copy_using_evaluator_LinearTraversal_CompleteUnrolling::run(kernel); } }; template struct copy_using_evaluator_LinearTraversal_CompleteUnrolling { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE void run(Kernel&) { } }; /************************** *** Inner vectorization *** **************************/ template struct copy_using_evaluator_innervec_CompleteUnrolling { // FIXME: this is not very clean, perhaps this information should be provided by the kernel? typedef typename Kernel::DstEvaluatorType DstEvaluatorType; typedef typename DstEvaluatorType::XprType DstXprType; typedef typename Kernel::PacketType PacketType; enum { outer = Index / DstXprType::InnerSizeAtCompileTime, inner = Index % DstXprType::InnerSizeAtCompileTime, SrcAlignment = Kernel::AssignmentTraits::SrcAlignment, DstAlignment = Kernel::AssignmentTraits::DstAlignment }; EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE void run(Kernel &kernel) { kernel.template assignPacketByOuterInner(outer, inner); enum { NextIndex = Index + unpacket_traits::size }; copy_using_evaluator_innervec_CompleteUnrolling::run(kernel); } }; template struct copy_using_evaluator_innervec_CompleteUnrolling { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE void run(Kernel&) { } }; template struct copy_using_evaluator_innervec_InnerUnrolling { typedef typename Kernel::PacketType PacketType; EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE void run(Kernel &kernel, Index outer) { kernel.template assignPacketByOuterInner(outer, Index_); enum { NextIndex = Index_ + unpacket_traits::size }; copy_using_evaluator_innervec_InnerUnrolling::run(kernel, outer); } }; template struct copy_using_evaluator_innervec_InnerUnrolling { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE void run(Kernel &, Index) { } }; /*************************************************************************** * Part 3 : implementation of all cases ***************************************************************************/ // dense_assignment_loop is based on assign_impl template struct dense_assignment_loop; /************************ ***** Special Cases ***** ************************/ // Zero-sized assignment is a no-op. template struct dense_assignment_loop { EIGEN_DEVICE_FUNC static void EIGEN_STRONG_INLINE run(Kernel& /*kernel*/) { typedef typename Kernel::DstEvaluatorType::XprType DstXprType; EIGEN_STATIC_ASSERT(int(DstXprType::SizeAtCompileTime) == 0, EIGEN_INTERNAL_ERROR_PLEASE_FILE_A_BUG_REPORT) } }; /************************ *** Default traversal *** ************************/ template struct dense_assignment_loop { EIGEN_DEVICE_FUNC static void EIGEN_STRONG_INLINE run(Kernel &kernel) { for(Index outer = 0; outer < kernel.outerSize(); ++outer) { for(Index inner = 0; inner < kernel.innerSize(); ++inner) { kernel.assignCoeffByOuterInner(outer, inner); } } } }; template struct dense_assignment_loop { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE void run(Kernel &kernel) { typedef typename Kernel::DstEvaluatorType::XprType DstXprType; copy_using_evaluator_DefaultTraversal_CompleteUnrolling::run(kernel); } }; template struct dense_assignment_loop { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE void run(Kernel &kernel) { typedef typename Kernel::DstEvaluatorType::XprType DstXprType; const Index outerSize = kernel.outerSize(); for(Index outer = 0; outer < outerSize; ++outer) copy_using_evaluator_DefaultTraversal_InnerUnrolling::run(kernel, outer); } }; /*************************** *** Linear vectorization *** ***************************/ // The goal of unaligned_dense_assignment_loop is simply to factorize the handling // of the non vectorizable beginning and ending parts template struct unaligned_dense_assignment_loop { // if IsAligned = true, then do nothing template EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE void run(Kernel&, Index, Index) {} }; template <> struct unaligned_dense_assignment_loop { // MSVC must not inline this functions. If it does, it fails to optimize the // packet access path. // FIXME check which version exhibits this issue #if EIGEN_COMP_MSVC template static EIGEN_DONT_INLINE void run(Kernel &kernel, Index start, Index end) #else template EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE void run(Kernel &kernel, Index start, Index end) #endif { for (Index index = start; index < end; ++index) kernel.assignCoeff(index); } }; template struct dense_assignment_loop { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE void run(Kernel &kernel) { const Index size = kernel.size(); typedef typename Kernel::Scalar Scalar; typedef typename Kernel::PacketType PacketType; enum { requestedAlignment = Kernel::AssignmentTraits::LinearRequiredAlignment, packetSize = unpacket_traits::size, dstIsAligned = int(Kernel::AssignmentTraits::DstAlignment)>=int(requestedAlignment), dstAlignment = packet_traits::AlignedOnScalar ? int(requestedAlignment) : int(Kernel::AssignmentTraits::DstAlignment), srcAlignment = Kernel::AssignmentTraits::JointAlignment }; const Index alignedStart = dstIsAligned ? 0 : internal::first_aligned(kernel.dstDataPtr(), size); const Index alignedEnd = alignedStart + ((size-alignedStart)/packetSize)*packetSize; unaligned_dense_assignment_loop::run(kernel, 0, alignedStart); for(Index index = alignedStart; index < alignedEnd; index += packetSize) kernel.template assignPacket(index); unaligned_dense_assignment_loop<>::run(kernel, alignedEnd, size); } }; template struct dense_assignment_loop { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE void run(Kernel &kernel) { typedef typename Kernel::DstEvaluatorType::XprType DstXprType; typedef typename Kernel::PacketType PacketType; enum { size = DstXprType::SizeAtCompileTime, packetSize =unpacket_traits::size, alignedSize = (int(size)/packetSize)*packetSize }; copy_using_evaluator_innervec_CompleteUnrolling::run(kernel); copy_using_evaluator_DefaultTraversal_CompleteUnrolling::run(kernel); } }; /************************** *** Inner vectorization *** **************************/ template struct dense_assignment_loop { typedef typename Kernel::PacketType PacketType; enum { SrcAlignment = Kernel::AssignmentTraits::SrcAlignment, DstAlignment = Kernel::AssignmentTraits::DstAlignment }; EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE void run(Kernel &kernel) { const Index innerSize = kernel.innerSize(); const Index outerSize = kernel.outerSize(); const Index packetSize = unpacket_traits::size; for(Index outer = 0; outer < outerSize; ++outer) for(Index inner = 0; inner < innerSize; inner+=packetSize) kernel.template assignPacketByOuterInner(outer, inner); } }; template struct dense_assignment_loop { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE void run(Kernel &kernel) { typedef typename Kernel::DstEvaluatorType::XprType DstXprType; copy_using_evaluator_innervec_CompleteUnrolling::run(kernel); } }; template struct dense_assignment_loop { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE void run(Kernel &kernel) { typedef typename Kernel::DstEvaluatorType::XprType DstXprType; typedef typename Kernel::AssignmentTraits Traits; const Index outerSize = kernel.outerSize(); for(Index outer = 0; outer < outerSize; ++outer) copy_using_evaluator_innervec_InnerUnrolling::run(kernel, outer); } }; /*********************** *** Linear traversal *** ***********************/ template struct dense_assignment_loop { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE void run(Kernel &kernel) { const Index size = kernel.size(); for(Index i = 0; i < size; ++i) kernel.assignCoeff(i); } }; template struct dense_assignment_loop { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE void run(Kernel &kernel) { typedef typename Kernel::DstEvaluatorType::XprType DstXprType; copy_using_evaluator_LinearTraversal_CompleteUnrolling::run(kernel); } }; /************************** *** Slice vectorization *** ***************************/ template struct dense_assignment_loop { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE void run(Kernel &kernel) { typedef typename Kernel::Scalar Scalar; typedef typename Kernel::PacketType PacketType; enum { packetSize = unpacket_traits::size, requestedAlignment = int(Kernel::AssignmentTraits::InnerRequiredAlignment), alignable = packet_traits::AlignedOnScalar || int(Kernel::AssignmentTraits::DstAlignment)>=sizeof(Scalar), dstIsAligned = int(Kernel::AssignmentTraits::DstAlignment)>=int(requestedAlignment), dstAlignment = alignable ? int(requestedAlignment) : int(Kernel::AssignmentTraits::DstAlignment) }; const Scalar *dst_ptr = kernel.dstDataPtr(); if((!bool(dstIsAligned)) && (UIntPtr(dst_ptr) % sizeof(Scalar))>0) { // the pointer is not aligned-on scalar, so alignment is not possible return dense_assignment_loop::run(kernel); } const Index packetAlignedMask = packetSize - 1; const Index innerSize = kernel.innerSize(); const Index outerSize = kernel.outerSize(); const Index alignedStep = alignable ? (packetSize - kernel.outerStride() % packetSize) & packetAlignedMask : 0; Index alignedStart = ((!alignable) || bool(dstIsAligned)) ? 0 : internal::first_aligned(dst_ptr, innerSize); for(Index outer = 0; outer < outerSize; ++outer) { const Index alignedEnd = alignedStart + ((innerSize-alignedStart) & ~packetAlignedMask); // do the non-vectorizable part of the assignment for(Index inner = 0; inner(outer, inner); // do the non-vectorizable part of the assignment for(Index inner = alignedEnd; inner struct dense_assignment_loop { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE void run(Kernel &kernel) { typedef typename Kernel::DstEvaluatorType::XprType DstXprType; typedef typename Kernel::PacketType PacketType; enum { innerSize = DstXprType::InnerSizeAtCompileTime, packetSize =unpacket_traits::size, vectorizableSize = (int(innerSize) / int(packetSize)) * int(packetSize), size = DstXprType::SizeAtCompileTime }; for(Index outer = 0; outer < kernel.outerSize(); ++outer) { copy_using_evaluator_innervec_InnerUnrolling::run(kernel, outer); copy_using_evaluator_DefaultTraversal_InnerUnrolling::run(kernel, outer); } } }; #endif /*************************************************************************** * Part 4 : Generic dense assignment kernel ***************************************************************************/ // This class generalize the assignment of a coefficient (or packet) from one dense evaluator // to another dense writable evaluator. // It is parametrized by the two evaluators, and the actual assignment functor. // This abstraction level permits to keep the evaluation loops as simple and as generic as possible. // One can customize the assignment using this generic dense_assignment_kernel with different // functors, or by completely overloading it, by-passing a functor. template class generic_dense_assignment_kernel { protected: typedef typename DstEvaluatorTypeT::XprType DstXprType; typedef typename SrcEvaluatorTypeT::XprType SrcXprType; public: typedef DstEvaluatorTypeT DstEvaluatorType; typedef SrcEvaluatorTypeT SrcEvaluatorType; typedef typename DstEvaluatorType::Scalar Scalar; typedef copy_using_evaluator_traits AssignmentTraits; typedef typename AssignmentTraits::PacketType PacketType; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE generic_dense_assignment_kernel(DstEvaluatorType &dst, const SrcEvaluatorType &src, const Functor &func, DstXprType& dstExpr) : m_dst(dst), m_src(src), m_functor(func), m_dstExpr(dstExpr) { #ifdef EIGEN_DEBUG_ASSIGN AssignmentTraits::debug(); #endif } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index size() const EIGEN_NOEXCEPT { return m_dstExpr.size(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index innerSize() const EIGEN_NOEXCEPT { return m_dstExpr.innerSize(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index outerSize() const EIGEN_NOEXCEPT { return m_dstExpr.outerSize(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index rows() const EIGEN_NOEXCEPT { return m_dstExpr.rows(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index cols() const EIGEN_NOEXCEPT { return m_dstExpr.cols(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index outerStride() const EIGEN_NOEXCEPT { return m_dstExpr.outerStride(); } EIGEN_DEVICE_FUNC DstEvaluatorType& dstEvaluator() EIGEN_NOEXCEPT { return m_dst; } EIGEN_DEVICE_FUNC const SrcEvaluatorType& srcEvaluator() const EIGEN_NOEXCEPT { return m_src; } /// Assign src(row,col) to dst(row,col) through the assignment functor. EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void assignCoeff(Index row, Index col) { m_functor.assignCoeff(m_dst.coeffRef(row,col), m_src.coeff(row,col)); } /// \sa assignCoeff(Index,Index) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void assignCoeff(Index index) { m_functor.assignCoeff(m_dst.coeffRef(index), m_src.coeff(index)); } /// \sa assignCoeff(Index,Index) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void assignCoeffByOuterInner(Index outer, Index inner) { Index row = rowIndexByOuterInner(outer, inner); Index col = colIndexByOuterInner(outer, inner); assignCoeff(row, col); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void assignPacket(Index row, Index col) { m_functor.template assignPacket(&m_dst.coeffRef(row,col), m_src.template packet(row,col)); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void assignPacket(Index index) { m_functor.template assignPacket(&m_dst.coeffRef(index), m_src.template packet(index)); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void assignPacketByOuterInner(Index outer, Index inner) { Index row = rowIndexByOuterInner(outer, inner); Index col = colIndexByOuterInner(outer, inner); assignPacket(row, col); } EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Index rowIndexByOuterInner(Index outer, Index inner) { typedef typename DstEvaluatorType::ExpressionTraits Traits; return int(Traits::RowsAtCompileTime) == 1 ? 0 : int(Traits::ColsAtCompileTime) == 1 ? inner : int(DstEvaluatorType::Flags)&RowMajorBit ? outer : inner; } EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Index colIndexByOuterInner(Index outer, Index inner) { typedef typename DstEvaluatorType::ExpressionTraits Traits; return int(Traits::ColsAtCompileTime) == 1 ? 0 : int(Traits::RowsAtCompileTime) == 1 ? inner : int(DstEvaluatorType::Flags)&RowMajorBit ? inner : outer; } EIGEN_DEVICE_FUNC const Scalar* dstDataPtr() const { return m_dstExpr.data(); } protected: DstEvaluatorType& m_dst; const SrcEvaluatorType& m_src; const Functor &m_functor; // TODO find a way to avoid the needs of the original expression DstXprType& m_dstExpr; }; // Special kernel used when computing small products whose operands have dynamic dimensions. It ensures that the // PacketSize used is no larger than 4, thereby increasing the chance that vectorized instructions will be used // when computing the product. template class restricted_packet_dense_assignment_kernel : public generic_dense_assignment_kernel { protected: typedef generic_dense_assignment_kernel Base; public: typedef typename Base::Scalar Scalar; typedef typename Base::DstXprType DstXprType; typedef copy_using_evaluator_traits AssignmentTraits; typedef typename AssignmentTraits::PacketType PacketType; EIGEN_DEVICE_FUNC restricted_packet_dense_assignment_kernel(DstEvaluatorTypeT &dst, const SrcEvaluatorTypeT &src, const Functor &func, DstXprType& dstExpr) : Base(dst, src, func, dstExpr) { } }; /*************************************************************************** * Part 5 : Entry point for dense rectangular assignment ***************************************************************************/ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void resize_if_allowed(DstXprType &dst, const SrcXprType& src, const Functor &/*func*/) { EIGEN_ONLY_USED_FOR_DEBUG(dst); EIGEN_ONLY_USED_FOR_DEBUG(src); eigen_assert(dst.rows() == src.rows() && dst.cols() == src.cols()); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void resize_if_allowed(DstXprType &dst, const SrcXprType& src, const internal::assign_op &/*func*/) { Index dstRows = src.rows(); Index dstCols = src.cols(); if(((dst.rows()!=dstRows) || (dst.cols()!=dstCols))) dst.resize(dstRows, dstCols); eigen_assert(dst.rows() == dstRows && dst.cols() == dstCols); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void call_dense_assignment_loop(DstXprType& dst, const SrcXprType& src, const Functor &func) { typedef evaluator DstEvaluatorType; typedef evaluator SrcEvaluatorType; SrcEvaluatorType srcEvaluator(src); // NOTE To properly handle A = (A*A.transpose())/s with A rectangular, // we need to resize the destination after the source evaluator has been created. resize_if_allowed(dst, src, func); DstEvaluatorType dstEvaluator(dst); typedef generic_dense_assignment_kernel Kernel; Kernel kernel(dstEvaluator, srcEvaluator, func, dst.const_cast_derived()); dense_assignment_loop::run(kernel); } // Specialization for filling the destination with a constant value. #ifndef EIGEN_GPU_COMPILE_PHASE template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void call_dense_assignment_loop(DstXprType& dst, const Eigen::CwiseNullaryOp, DstXprType>& src, const internal::assign_op& func) { resize_if_allowed(dst, src, func); std::fill_n(dst.data(), dst.size(), src.functor()()); } #endif template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void call_dense_assignment_loop(DstXprType& dst, const SrcXprType& src) { call_dense_assignment_loop(dst, src, internal::assign_op()); } /*************************************************************************** * Part 6 : Generic assignment ***************************************************************************/ // Based on the respective shapes of the destination and source, // the class AssignmentKind determine the kind of assignment mechanism. // AssignmentKind must define a Kind typedef. template struct AssignmentKind; // Assignment kind defined in this file: struct Dense2Dense {}; struct EigenBase2EigenBase {}; template struct AssignmentKind { typedef EigenBase2EigenBase Kind; }; template<> struct AssignmentKind { typedef Dense2Dense Kind; }; // This is the main assignment class template< typename DstXprType, typename SrcXprType, typename Functor, typename Kind = typename AssignmentKind< typename evaluator_traits::Shape , typename evaluator_traits::Shape >::Kind, typename EnableIf = void> struct Assignment; // The only purpose of this call_assignment() function is to deal with noalias() / "assume-aliasing" and automatic transposition. // Indeed, I (Gael) think that this concept of "assume-aliasing" was a mistake, and it makes thing quite complicated. // So this intermediate function removes everything related to "assume-aliasing" such that Assignment // does not has to bother about these annoying details. template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void call_assignment(Dst& dst, const Src& src) { call_assignment(dst, src, internal::assign_op()); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void call_assignment(const Dst& dst, const Src& src) { call_assignment(dst, src, internal::assign_op()); } // Deal with "assume-aliasing" template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void call_assignment(Dst& dst, const Src& src, const Func& func, typename enable_if< evaluator_assume_aliasing::value, void*>::type = 0) { typename plain_matrix_type::type tmp(src); call_assignment_no_alias(dst, tmp, func); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void call_assignment(Dst& dst, const Src& src, const Func& func, typename enable_if::value, void*>::type = 0) { call_assignment_no_alias(dst, src, func); } // by-pass "assume-aliasing" // When there is no aliasing, we require that 'dst' has been properly resized template class StorageBase, typename Src, typename Func> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void call_assignment(NoAlias& dst, const Src& src, const Func& func) { call_assignment_no_alias(dst.expression(), src, func); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void call_assignment_no_alias(Dst& dst, const Src& src, const Func& func) { enum { NeedToTranspose = ( (int(Dst::RowsAtCompileTime) == 1 && int(Src::ColsAtCompileTime) == 1) || (int(Dst::ColsAtCompileTime) == 1 && int(Src::RowsAtCompileTime) == 1) ) && int(Dst::SizeAtCompileTime) != 1 }; typedef typename internal::conditional, Dst>::type ActualDstTypeCleaned; typedef typename internal::conditional, Dst&>::type ActualDstType; ActualDstType actualDst(dst); // TODO check whether this is the right place to perform these checks: EIGEN_STATIC_ASSERT_LVALUE(Dst) EIGEN_STATIC_ASSERT_SAME_MATRIX_SIZE(ActualDstTypeCleaned,Src) EIGEN_CHECK_BINARY_COMPATIBILIY(Func,typename ActualDstTypeCleaned::Scalar,typename Src::Scalar); Assignment::run(actualDst, src, func); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void call_restricted_packet_assignment_no_alias(Dst& dst, const Src& src, const Func& func) { typedef evaluator DstEvaluatorType; typedef evaluator SrcEvaluatorType; typedef restricted_packet_dense_assignment_kernel Kernel; EIGEN_STATIC_ASSERT_LVALUE(Dst) EIGEN_CHECK_BINARY_COMPATIBILIY(Func,typename Dst::Scalar,typename Src::Scalar); SrcEvaluatorType srcEvaluator(src); resize_if_allowed(dst, src, func); DstEvaluatorType dstEvaluator(dst); Kernel kernel(dstEvaluator, srcEvaluator, func, dst.const_cast_derived()); dense_assignment_loop::run(kernel); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void call_assignment_no_alias(Dst& dst, const Src& src) { call_assignment_no_alias(dst, src, internal::assign_op()); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void call_assignment_no_alias_no_transpose(Dst& dst, const Src& src, const Func& func) { // TODO check whether this is the right place to perform these checks: EIGEN_STATIC_ASSERT_LVALUE(Dst) EIGEN_STATIC_ASSERT_SAME_MATRIX_SIZE(Dst,Src) EIGEN_CHECK_BINARY_COMPATIBILIY(Func,typename Dst::Scalar,typename Src::Scalar); Assignment::run(dst, src, func); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void call_assignment_no_alias_no_transpose(Dst& dst, const Src& src) { call_assignment_no_alias_no_transpose(dst, src, internal::assign_op()); } // forward declaration template void check_for_aliasing(const Dst &dst, const Src &src); // Generic Dense to Dense assignment // Note that the last template argument "Weak" is needed to make it possible to perform // both partial specialization+SFINAE without ambiguous specialization template< typename DstXprType, typename SrcXprType, typename Functor, typename Weak> struct Assignment { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE void run(DstXprType &dst, const SrcXprType &src, const Functor &func) { #ifndef EIGEN_NO_DEBUG internal::check_for_aliasing(dst, src); #endif call_dense_assignment_loop(dst, src, func); } }; // Generic assignment through evalTo. // TODO: not sure we have to keep that one, but it helps porting current code to new evaluator mechanism. // Note that the last template argument "Weak" is needed to make it possible to perform // both partial specialization+SFINAE without ambiguous specialization template< typename DstXprType, typename SrcXprType, typename Functor, typename Weak> struct Assignment { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE void run(DstXprType &dst, const SrcXprType &src, const internal::assign_op &/*func*/) { Index dstRows = src.rows(); Index dstCols = src.cols(); if((dst.rows()!=dstRows) || (dst.cols()!=dstCols)) dst.resize(dstRows, dstCols); eigen_assert(dst.rows() == src.rows() && dst.cols() == src.cols()); src.evalTo(dst); } // NOTE The following two functions are templated to avoid their instantiation if not needed // This is needed because some expressions supports evalTo only and/or have 'void' as scalar type. template EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE void run(DstXprType &dst, const SrcXprType &src, const internal::add_assign_op &/*func*/) { Index dstRows = src.rows(); Index dstCols = src.cols(); if((dst.rows()!=dstRows) || (dst.cols()!=dstCols)) dst.resize(dstRows, dstCols); eigen_assert(dst.rows() == src.rows() && dst.cols() == src.cols()); src.addTo(dst); } template EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE void run(DstXprType &dst, const SrcXprType &src, const internal::sub_assign_op &/*func*/) { Index dstRows = src.rows(); Index dstCols = src.cols(); if((dst.rows()!=dstRows) || (dst.cols()!=dstCols)) dst.resize(dstRows, dstCols); eigen_assert(dst.rows() == src.rows() && dst.cols() == src.cols()); src.subTo(dst); } }; } // namespace internal } // end namespace Eigen #endif // EIGEN_ASSIGN_EVALUATOR_H RcppEigen/inst/include/Eigen/src/Core/StableNorm.h0000644000176200001440000002077414562417221021471 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_STABLENORM_H #define EIGEN_STABLENORM_H namespace Eigen { namespace internal { template inline void stable_norm_kernel(const ExpressionType& bl, Scalar& ssq, Scalar& scale, Scalar& invScale) { Scalar maxCoeff = bl.cwiseAbs().maxCoeff(); if(maxCoeff>scale) { ssq = ssq * numext::abs2(scale/maxCoeff); Scalar tmp = Scalar(1)/maxCoeff; if(tmp > NumTraits::highest()) { invScale = NumTraits::highest(); scale = Scalar(1)/invScale; } else if(maxCoeff>NumTraits::highest()) // we got a INF { invScale = Scalar(1); scale = maxCoeff; } else { scale = maxCoeff; invScale = tmp; } } else if(maxCoeff!=maxCoeff) // we got a NaN { scale = maxCoeff; } // TODO if the maxCoeff is much much smaller than the current scale, // then we can neglect this sub vector if(scale>Scalar(0)) // if scale==0, then bl is 0 ssq += (bl*invScale).squaredNorm(); } template void stable_norm_impl_inner_step(const VectorType &vec, RealScalar& ssq, RealScalar& scale, RealScalar& invScale) { typedef typename VectorType::Scalar Scalar; const Index blockSize = 4096; typedef typename internal::nested_eval::type VectorTypeCopy; typedef typename internal::remove_all::type VectorTypeCopyClean; const VectorTypeCopy copy(vec); enum { CanAlign = ( (int(VectorTypeCopyClean::Flags)&DirectAccessBit) || (int(internal::evaluator::Alignment)>0) // FIXME Alignment)>0 might not be enough ) && (blockSize*sizeof(Scalar)*20) // if we cannot allocate on the stack, then let's not bother about this optimization }; typedef typename internal::conditional, internal::evaluator::Alignment>, typename VectorTypeCopyClean::ConstSegmentReturnType>::type SegmentWrapper; Index n = vec.size(); Index bi = internal::first_default_aligned(copy); if (bi>0) internal::stable_norm_kernel(copy.head(bi), ssq, scale, invScale); for (; bi typename VectorType::RealScalar stable_norm_impl(const VectorType &vec, typename enable_if::type* = 0 ) { using std::sqrt; using std::abs; Index n = vec.size(); if(n==1) return abs(vec.coeff(0)); typedef typename VectorType::RealScalar RealScalar; RealScalar scale(0); RealScalar invScale(1); RealScalar ssq(0); // sum of squares stable_norm_impl_inner_step(vec, ssq, scale, invScale); return scale * sqrt(ssq); } template typename MatrixType::RealScalar stable_norm_impl(const MatrixType &mat, typename enable_if::type* = 0 ) { using std::sqrt; typedef typename MatrixType::RealScalar RealScalar; RealScalar scale(0); RealScalar invScale(1); RealScalar ssq(0); // sum of squares for(Index j=0; j inline typename NumTraits::Scalar>::Real blueNorm_impl(const EigenBase& _vec) { typedef typename Derived::RealScalar RealScalar; using std::pow; using std::sqrt; using std::abs; // This program calculates the machine-dependent constants // bl, b2, slm, s2m, relerr overfl // from the "basic" machine-dependent numbers // nbig, ibeta, it, iemin, iemax, rbig. // The following define the basic machine-dependent constants. // For portability, the PORT subprograms "ilmaeh" and "rlmach" // are used. For any specific computer, each of the assignment // statements can be replaced static const int ibeta = std::numeric_limits::radix; // base for floating-point numbers static const int it = NumTraits::digits(); // number of base-beta digits in mantissa static const int iemin = NumTraits::min_exponent(); // minimum exponent static const int iemax = NumTraits::max_exponent(); // maximum exponent static const RealScalar rbig = NumTraits::highest(); // largest floating-point number static const RealScalar b1 = RealScalar(pow(RealScalar(ibeta),RealScalar(-((1-iemin)/2)))); // lower boundary of midrange static const RealScalar b2 = RealScalar(pow(RealScalar(ibeta),RealScalar((iemax + 1 - it)/2))); // upper boundary of midrange static const RealScalar s1m = RealScalar(pow(RealScalar(ibeta),RealScalar((2-iemin)/2))); // scaling factor for lower range static const RealScalar s2m = RealScalar(pow(RealScalar(ibeta),RealScalar(- ((iemax+it)/2)))); // scaling factor for upper range static const RealScalar eps = RealScalar(pow(double(ibeta), 1-it)); static const RealScalar relerr = sqrt(eps); // tolerance for neglecting asml const Derived& vec(_vec.derived()); Index n = vec.size(); RealScalar ab2 = b2 / RealScalar(n); RealScalar asml = RealScalar(0); RealScalar amed = RealScalar(0); RealScalar abig = RealScalar(0); for(Index j=0; j ab2) abig += numext::abs2(ax*s2m); else if(ax < b1) asml += numext::abs2(ax*s1m); else amed += numext::abs2(ax); } } if(amed!=amed) return amed; // we got a NaN if(abig > RealScalar(0)) { abig = sqrt(abig); if(abig > rbig) // overflow, or *this contains INF values return abig; // return INF if(amed > RealScalar(0)) { abig = abig/s2m; amed = sqrt(amed); } else return abig/s2m; } else if(asml > RealScalar(0)) { if (amed > RealScalar(0)) { abig = sqrt(amed); amed = sqrt(asml) / s1m; } else return sqrt(asml)/s1m; } else return sqrt(amed); asml = numext::mini(abig, amed); abig = numext::maxi(abig, amed); if(asml <= abig*relerr) return abig; else return abig * sqrt(RealScalar(1) + numext::abs2(asml/abig)); } } // end namespace internal /** \returns the \em l2 norm of \c *this avoiding underflow and overflow. * This version use a blockwise two passes algorithm: * 1 - find the absolute largest coefficient \c s * 2 - compute \f$ s \Vert \frac{*this}{s} \Vert \f$ in a standard way * * For architecture/scalar types supporting vectorization, this version * is faster than blueNorm(). Otherwise the blueNorm() is much faster. * * \sa norm(), blueNorm(), hypotNorm() */ template inline typename NumTraits::Scalar>::Real MatrixBase::stableNorm() const { return internal::stable_norm_impl(derived()); } /** \returns the \em l2 norm of \c *this using the Blue's algorithm. * A Portable Fortran Program to Find the Euclidean Norm of a Vector, * ACM TOMS, Vol 4, Issue 1, 1978. * * For architecture/scalar types without vectorization, this version * is much faster than stableNorm(). Otherwise the stableNorm() is faster. * * \sa norm(), stableNorm(), hypotNorm() */ template inline typename NumTraits::Scalar>::Real MatrixBase::blueNorm() const { return internal::blueNorm_impl(*this); } /** \returns the \em l2 norm of \c *this avoiding undeflow and overflow. * This version use a concatenation of hypot() calls, and it is very slow. * * \sa norm(), stableNorm() */ template inline typename NumTraits::Scalar>::Real MatrixBase::hypotNorm() const { if(size()==1) return numext::abs(coeff(0,0)); else return this->cwiseAbs().redux(internal::scalar_hypot_op()); } } // end namespace Eigen #endif // EIGEN_STABLENORM_H RcppEigen/inst/include/Eigen/src/Core/MapBase.h0000644000176200001440000002602114562417221020722 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2007-2010 Benoit Jacob // Copyright (C) 2008 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_MAPBASE_H #define EIGEN_MAPBASE_H #define EIGEN_STATIC_ASSERT_INDEX_BASED_ACCESS(Derived) \ EIGEN_STATIC_ASSERT((int(internal::evaluator::Flags) & LinearAccessBit) || Derived::IsVectorAtCompileTime, \ YOU_ARE_TRYING_TO_USE_AN_INDEX_BASED_ACCESSOR_ON_AN_EXPRESSION_THAT_DOES_NOT_SUPPORT_THAT) namespace Eigen { /** \ingroup Core_Module * * \brief Base class for dense Map and Block expression with direct access * * This base class provides the const low-level accessors (e.g. coeff, coeffRef) of dense * Map and Block objects with direct access. * Typical users do not have to directly deal with this class. * * This class can be extended by through the macro plugin \c EIGEN_MAPBASE_PLUGIN. * See \link TopicCustomizing_Plugins customizing Eigen \endlink for details. * * The \c Derived class has to provide the following two methods describing the memory layout: * \code Index innerStride() const; \endcode * \code Index outerStride() const; \endcode * * \sa class Map, class Block */ template class MapBase : public internal::dense_xpr_base::type { public: typedef typename internal::dense_xpr_base::type Base; enum { RowsAtCompileTime = internal::traits::RowsAtCompileTime, ColsAtCompileTime = internal::traits::ColsAtCompileTime, InnerStrideAtCompileTime = internal::traits::InnerStrideAtCompileTime, SizeAtCompileTime = Base::SizeAtCompileTime }; typedef typename internal::traits::StorageKind StorageKind; typedef typename internal::traits::Scalar Scalar; typedef typename internal::packet_traits::type PacketScalar; typedef typename NumTraits::Real RealScalar; typedef typename internal::conditional< bool(internal::is_lvalue::value), Scalar *, const Scalar *>::type PointerType; using Base::derived; // using Base::RowsAtCompileTime; // using Base::ColsAtCompileTime; // using Base::SizeAtCompileTime; using Base::MaxRowsAtCompileTime; using Base::MaxColsAtCompileTime; using Base::MaxSizeAtCompileTime; using Base::IsVectorAtCompileTime; using Base::Flags; using Base::IsRowMajor; using Base::rows; using Base::cols; using Base::size; using Base::coeff; using Base::coeffRef; using Base::lazyAssign; using Base::eval; using Base::innerStride; using Base::outerStride; using Base::rowStride; using Base::colStride; // bug 217 - compile error on ICC 11.1 using Base::operator=; typedef typename Base::CoeffReturnType CoeffReturnType; /** \copydoc DenseBase::rows() */ EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index rows() const EIGEN_NOEXCEPT { return m_rows.value(); } /** \copydoc DenseBase::cols() */ EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index cols() const EIGEN_NOEXCEPT { return m_cols.value(); } /** Returns a pointer to the first coefficient of the matrix or vector. * * \note When addressing this data, make sure to honor the strides returned by innerStride() and outerStride(). * * \sa innerStride(), outerStride() */ EIGEN_DEVICE_FUNC inline const Scalar* data() const { return m_data; } /** \copydoc PlainObjectBase::coeff(Index,Index) const */ EIGEN_DEVICE_FUNC inline const Scalar& coeff(Index rowId, Index colId) const { return m_data[colId * colStride() + rowId * rowStride()]; } /** \copydoc PlainObjectBase::coeff(Index) const */ EIGEN_DEVICE_FUNC inline const Scalar& coeff(Index index) const { EIGEN_STATIC_ASSERT_INDEX_BASED_ACCESS(Derived) return m_data[index * innerStride()]; } /** \copydoc PlainObjectBase::coeffRef(Index,Index) const */ EIGEN_DEVICE_FUNC inline const Scalar& coeffRef(Index rowId, Index colId) const { return this->m_data[colId * colStride() + rowId * rowStride()]; } /** \copydoc PlainObjectBase::coeffRef(Index) const */ EIGEN_DEVICE_FUNC inline const Scalar& coeffRef(Index index) const { EIGEN_STATIC_ASSERT_INDEX_BASED_ACCESS(Derived) return this->m_data[index * innerStride()]; } /** \internal */ template inline PacketScalar packet(Index rowId, Index colId) const { return internal::ploadt (m_data + (colId * colStride() + rowId * rowStride())); } /** \internal */ template inline PacketScalar packet(Index index) const { EIGEN_STATIC_ASSERT_INDEX_BASED_ACCESS(Derived) return internal::ploadt(m_data + index * innerStride()); } /** \internal Constructor for fixed size matrices or vectors */ EIGEN_DEVICE_FUNC explicit inline MapBase(PointerType dataPtr) : m_data(dataPtr), m_rows(RowsAtCompileTime), m_cols(ColsAtCompileTime) { EIGEN_STATIC_ASSERT_FIXED_SIZE(Derived) checkSanity(); } /** \internal Constructor for dynamically sized vectors */ EIGEN_DEVICE_FUNC inline MapBase(PointerType dataPtr, Index vecSize) : m_data(dataPtr), m_rows(RowsAtCompileTime == Dynamic ? vecSize : Index(RowsAtCompileTime)), m_cols(ColsAtCompileTime == Dynamic ? vecSize : Index(ColsAtCompileTime)) { EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived) eigen_assert(vecSize >= 0); eigen_assert(dataPtr == 0 || SizeAtCompileTime == Dynamic || SizeAtCompileTime == vecSize); checkSanity(); } /** \internal Constructor for dynamically sized matrices */ EIGEN_DEVICE_FUNC inline MapBase(PointerType dataPtr, Index rows, Index cols) : m_data(dataPtr), m_rows(rows), m_cols(cols) { eigen_assert( (dataPtr == 0) || ( rows >= 0 && (RowsAtCompileTime == Dynamic || RowsAtCompileTime == rows) && cols >= 0 && (ColsAtCompileTime == Dynamic || ColsAtCompileTime == cols))); checkSanity(); } #ifdef EIGEN_MAPBASE_PLUGIN #include EIGEN_MAPBASE_PLUGIN #endif protected: EIGEN_DEFAULT_COPY_CONSTRUCTOR(MapBase) EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(MapBase) template EIGEN_DEVICE_FUNC void checkSanity(typename internal::enable_if<(internal::traits::Alignment>0),void*>::type = 0) const { #if EIGEN_MAX_ALIGN_BYTES>0 // innerStride() is not set yet when this function is called, so we optimistically assume the lowest plausible value: const Index minInnerStride = InnerStrideAtCompileTime == Dynamic ? 1 : Index(InnerStrideAtCompileTime); EIGEN_ONLY_USED_FOR_DEBUG(minInnerStride); eigen_assert(( ((internal::UIntPtr(m_data) % internal::traits::Alignment) == 0) || (cols() * rows() * minInnerStride * sizeof(Scalar)) < internal::traits::Alignment ) && "data is not aligned"); #endif } template EIGEN_DEVICE_FUNC void checkSanity(typename internal::enable_if::Alignment==0,void*>::type = 0) const {} PointerType m_data; const internal::variable_if_dynamic m_rows; const internal::variable_if_dynamic m_cols; }; /** \ingroup Core_Module * * \brief Base class for non-const dense Map and Block expression with direct access * * This base class provides the non-const low-level accessors (e.g. coeff and coeffRef) of * dense Map and Block objects with direct access. * It inherits MapBase which defines the const variant for reading specific entries. * * \sa class Map, class Block */ template class MapBase : public MapBase { typedef MapBase ReadOnlyMapBase; public: typedef MapBase Base; typedef typename Base::Scalar Scalar; typedef typename Base::PacketScalar PacketScalar; typedef typename Base::StorageIndex StorageIndex; typedef typename Base::PointerType PointerType; using Base::derived; using Base::rows; using Base::cols; using Base::size; using Base::coeff; using Base::coeffRef; using Base::innerStride; using Base::outerStride; using Base::rowStride; using Base::colStride; typedef typename internal::conditional< internal::is_lvalue::value, Scalar, const Scalar >::type ScalarWithConstIfNotLvalue; EIGEN_DEVICE_FUNC inline const Scalar* data() const { return this->m_data; } EIGEN_DEVICE_FUNC inline ScalarWithConstIfNotLvalue* data() { return this->m_data; } // no const-cast here so non-const-correct code will give a compile error EIGEN_DEVICE_FUNC inline ScalarWithConstIfNotLvalue& coeffRef(Index row, Index col) { return this->m_data[col * colStride() + row * rowStride()]; } EIGEN_DEVICE_FUNC inline ScalarWithConstIfNotLvalue& coeffRef(Index index) { EIGEN_STATIC_ASSERT_INDEX_BASED_ACCESS(Derived) return this->m_data[index * innerStride()]; } template inline void writePacket(Index row, Index col, const PacketScalar& val) { internal::pstoret (this->m_data + (col * colStride() + row * rowStride()), val); } template inline void writePacket(Index index, const PacketScalar& val) { EIGEN_STATIC_ASSERT_INDEX_BASED_ACCESS(Derived) internal::pstoret (this->m_data + index * innerStride(), val); } EIGEN_DEVICE_FUNC explicit inline MapBase(PointerType dataPtr) : Base(dataPtr) {} EIGEN_DEVICE_FUNC inline MapBase(PointerType dataPtr, Index vecSize) : Base(dataPtr, vecSize) {} EIGEN_DEVICE_FUNC inline MapBase(PointerType dataPtr, Index rows, Index cols) : Base(dataPtr, rows, cols) {} EIGEN_DEVICE_FUNC Derived& operator=(const MapBase& other) { ReadOnlyMapBase::Base::operator=(other); return derived(); } // In theory we could simply refer to Base:Base::operator=, but MSVC does not like Base::Base, // see bugs 821 and 920. using ReadOnlyMapBase::Base::operator=; protected: EIGEN_DEFAULT_COPY_CONSTRUCTOR(MapBase) EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(MapBase) }; #undef EIGEN_STATIC_ASSERT_INDEX_BASED_ACCESS } // end namespace Eigen #endif // EIGEN_MAPBASE_H RcppEigen/inst/include/Eigen/src/Core/GlobalFunctions.h0000644000176200001440000002642714562417221022515 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2010-2016 Gael Guennebaud // Copyright (C) 2010 Benoit Jacob // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_GLOBAL_FUNCTIONS_H #define EIGEN_GLOBAL_FUNCTIONS_H #ifdef EIGEN_PARSED_BY_DOXYGEN #define EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(NAME,FUNCTOR,DOC_OP,DOC_DETAILS) \ /** \returns an expression of the coefficient-wise DOC_OP of \a x DOC_DETAILS \sa Math functions, class CwiseUnaryOp */ \ template \ inline const Eigen::CwiseUnaryOp, const Derived> \ NAME(const Eigen::ArrayBase& x); #else #define EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(NAME,FUNCTOR,DOC_OP,DOC_DETAILS) \ template \ inline const Eigen::CwiseUnaryOp, const Derived> \ (NAME)(const Eigen::ArrayBase& x) { \ return Eigen::CwiseUnaryOp, const Derived>(x.derived()); \ } #endif // EIGEN_PARSED_BY_DOXYGEN #define EIGEN_ARRAY_DECLARE_GLOBAL_EIGEN_UNARY(NAME,FUNCTOR) \ \ template \ struct NAME##_retval > \ { \ typedef const Eigen::CwiseUnaryOp, const Derived> type; \ }; \ template \ struct NAME##_impl > \ { \ static inline typename NAME##_retval >::type run(const Eigen::ArrayBase& x) \ { \ return typename NAME##_retval >::type(x.derived()); \ } \ }; namespace Eigen { EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(real,scalar_real_op,real part,\sa ArrayBase::real) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(imag,scalar_imag_op,imaginary part,\sa ArrayBase::imag) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(conj,scalar_conjugate_op,complex conjugate,\sa ArrayBase::conjugate) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(inverse,scalar_inverse_op,inverse,\sa ArrayBase::inverse) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(sin,scalar_sin_op,sine,\sa ArrayBase::sin) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(cos,scalar_cos_op,cosine,\sa ArrayBase::cos) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(tan,scalar_tan_op,tangent,\sa ArrayBase::tan) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(atan,scalar_atan_op,arc-tangent,\sa ArrayBase::atan) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(asin,scalar_asin_op,arc-sine,\sa ArrayBase::asin) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(acos,scalar_acos_op,arc-consine,\sa ArrayBase::acos) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(sinh,scalar_sinh_op,hyperbolic sine,\sa ArrayBase::sinh) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(cosh,scalar_cosh_op,hyperbolic cosine,\sa ArrayBase::cosh) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(tanh,scalar_tanh_op,hyperbolic tangent,\sa ArrayBase::tanh) #if EIGEN_HAS_CXX11_MATH EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(asinh,scalar_asinh_op,inverse hyperbolic sine,\sa ArrayBase::asinh) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(acosh,scalar_acosh_op,inverse hyperbolic cosine,\sa ArrayBase::acosh) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(atanh,scalar_atanh_op,inverse hyperbolic tangent,\sa ArrayBase::atanh) #endif EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(logistic,scalar_logistic_op,logistic function,\sa ArrayBase::logistic) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(lgamma,scalar_lgamma_op,natural logarithm of the gamma function,\sa ArrayBase::lgamma) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(digamma,scalar_digamma_op,derivative of lgamma,\sa ArrayBase::digamma) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(erf,scalar_erf_op,error function,\sa ArrayBase::erf) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(erfc,scalar_erfc_op,complement error function,\sa ArrayBase::erfc) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(ndtri,scalar_ndtri_op,inverse normal distribution function,\sa ArrayBase::ndtri) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(exp,scalar_exp_op,exponential,\sa ArrayBase::exp) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(expm1,scalar_expm1_op,exponential of a value minus 1,\sa ArrayBase::expm1) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(log,scalar_log_op,natural logarithm,\sa Eigen::log10 DOXCOMMA ArrayBase::log) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(log1p,scalar_log1p_op,natural logarithm of 1 plus the value,\sa ArrayBase::log1p) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(log10,scalar_log10_op,base 10 logarithm,\sa Eigen::log DOXCOMMA ArrayBase::log10) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(log2,scalar_log2_op,base 2 logarithm,\sa Eigen::log DOXCOMMA ArrayBase::log2) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(abs,scalar_abs_op,absolute value,\sa ArrayBase::abs DOXCOMMA MatrixBase::cwiseAbs) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(abs2,scalar_abs2_op,squared absolute value,\sa ArrayBase::abs2 DOXCOMMA MatrixBase::cwiseAbs2) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(arg,scalar_arg_op,complex argument,\sa ArrayBase::arg DOXCOMMA MatrixBase::cwiseArg) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(sqrt,scalar_sqrt_op,square root,\sa ArrayBase::sqrt DOXCOMMA MatrixBase::cwiseSqrt) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(rsqrt,scalar_rsqrt_op,reciprocal square root,\sa ArrayBase::rsqrt) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(square,scalar_square_op,square (power 2),\sa Eigen::abs2 DOXCOMMA Eigen::pow DOXCOMMA ArrayBase::square) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(cube,scalar_cube_op,cube (power 3),\sa Eigen::pow DOXCOMMA ArrayBase::cube) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(rint,scalar_rint_op,nearest integer,\sa Eigen::floor DOXCOMMA Eigen::ceil DOXCOMMA ArrayBase::round) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(round,scalar_round_op,nearest integer,\sa Eigen::floor DOXCOMMA Eigen::ceil DOXCOMMA ArrayBase::round) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(floor,scalar_floor_op,nearest integer not greater than the giben value,\sa Eigen::ceil DOXCOMMA ArrayBase::floor) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(ceil,scalar_ceil_op,nearest integer not less than the giben value,\sa Eigen::floor DOXCOMMA ArrayBase::ceil) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(isnan,scalar_isnan_op,not-a-number test,\sa Eigen::isinf DOXCOMMA Eigen::isfinite DOXCOMMA ArrayBase::isnan) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(isinf,scalar_isinf_op,infinite value test,\sa Eigen::isnan DOXCOMMA Eigen::isfinite DOXCOMMA ArrayBase::isinf) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(isfinite,scalar_isfinite_op,finite value test,\sa Eigen::isinf DOXCOMMA Eigen::isnan DOXCOMMA ArrayBase::isfinite) EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(sign,scalar_sign_op,sign (or 0),\sa ArrayBase::sign) /** \returns an expression of the coefficient-wise power of \a x to the given constant \a exponent. * * \tparam ScalarExponent is the scalar type of \a exponent. It must be compatible with the scalar type of the given expression (\c Derived::Scalar). * * \sa ArrayBase::pow() * * \relates ArrayBase */ #ifdef EIGEN_PARSED_BY_DOXYGEN template inline const CwiseBinaryOp,Derived,Constant > pow(const Eigen::ArrayBase& x, const ScalarExponent& exponent); #else template EIGEN_DEVICE_FUNC inline EIGEN_MSVC10_WORKAROUND_BINARYOP_RETURN_TYPE( const EIGEN_EXPR_BINARYOP_SCALAR_RETURN_TYPE(Derived,typename internal::promote_scalar_arg::type,pow)) pow(const Eigen::ArrayBase& x, const ScalarExponent& exponent) { typedef typename internal::promote_scalar_arg::type PromotedExponent; return EIGEN_EXPR_BINARYOP_SCALAR_RETURN_TYPE(Derived,PromotedExponent,pow)(x.derived(), typename internal::plain_constant_type::type(x.derived().rows(), x.derived().cols(), internal::scalar_constant_op(exponent))); } #endif /** \returns an expression of the coefficient-wise power of \a x to the given array of \a exponents. * * This function computes the coefficient-wise power. * * Example: \include Cwise_array_power_array.cpp * Output: \verbinclude Cwise_array_power_array.out * * \sa ArrayBase::pow() * * \relates ArrayBase */ template inline const Eigen::CwiseBinaryOp, const Derived, const ExponentDerived> pow(const Eigen::ArrayBase& x, const Eigen::ArrayBase& exponents) { return Eigen::CwiseBinaryOp, const Derived, const ExponentDerived>( x.derived(), exponents.derived() ); } /** \returns an expression of the coefficient-wise power of the scalar \a x to the given array of \a exponents. * * This function computes the coefficient-wise power between a scalar and an array of exponents. * * \tparam Scalar is the scalar type of \a x. It must be compatible with the scalar type of the given array expression (\c Derived::Scalar). * * Example: \include Cwise_scalar_power_array.cpp * Output: \verbinclude Cwise_scalar_power_array.out * * \sa ArrayBase::pow() * * \relates ArrayBase */ #ifdef EIGEN_PARSED_BY_DOXYGEN template inline const CwiseBinaryOp,Constant,Derived> pow(const Scalar& x,const Eigen::ArrayBase& x); #else template EIGEN_DEVICE_FUNC inline EIGEN_MSVC10_WORKAROUND_BINARYOP_RETURN_TYPE( const EIGEN_SCALAR_BINARYOP_EXPR_RETURN_TYPE(typename internal::promote_scalar_arg::type,Derived,pow)) pow(const Scalar& x, const Eigen::ArrayBase& exponents) { typedef typename internal::promote_scalar_arg::type PromotedScalar; return EIGEN_SCALAR_BINARYOP_EXPR_RETURN_TYPE(PromotedScalar,Derived,pow)( typename internal::plain_constant_type::type(exponents.derived().rows(), exponents.derived().cols(), internal::scalar_constant_op(x)), exponents.derived()); } #endif namespace internal { EIGEN_ARRAY_DECLARE_GLOBAL_EIGEN_UNARY(real,scalar_real_op) EIGEN_ARRAY_DECLARE_GLOBAL_EIGEN_UNARY(imag,scalar_imag_op) EIGEN_ARRAY_DECLARE_GLOBAL_EIGEN_UNARY(abs2,scalar_abs2_op) } } // TODO: cleanly disable those functions that are not supported on Array (numext::real_ref, internal::random, internal::isApprox...) #endif // EIGEN_GLOBAL_FUNCTIONS_H RcppEigen/inst/include/Eigen/src/Core/IndexedView.h0000644000176200001440000002262414562417221021632 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2017 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_INDEXED_VIEW_H #define EIGEN_INDEXED_VIEW_H namespace Eigen { namespace internal { template struct traits > : traits { enum { RowsAtCompileTime = int(array_size::value), ColsAtCompileTime = int(array_size::value), MaxRowsAtCompileTime = RowsAtCompileTime != Dynamic ? int(RowsAtCompileTime) : Dynamic, MaxColsAtCompileTime = ColsAtCompileTime != Dynamic ? int(ColsAtCompileTime) : Dynamic, XprTypeIsRowMajor = (int(traits::Flags)&RowMajorBit) != 0, IsRowMajor = (MaxRowsAtCompileTime==1&&MaxColsAtCompileTime!=1) ? 1 : (MaxColsAtCompileTime==1&&MaxRowsAtCompileTime!=1) ? 0 : XprTypeIsRowMajor, RowIncr = int(get_compile_time_incr::value), ColIncr = int(get_compile_time_incr::value), InnerIncr = IsRowMajor ? ColIncr : RowIncr, OuterIncr = IsRowMajor ? RowIncr : ColIncr, HasSameStorageOrderAsXprType = (IsRowMajor == XprTypeIsRowMajor), XprInnerStride = HasSameStorageOrderAsXprType ? int(inner_stride_at_compile_time::ret) : int(outer_stride_at_compile_time::ret), XprOuterstride = HasSameStorageOrderAsXprType ? int(outer_stride_at_compile_time::ret) : int(inner_stride_at_compile_time::ret), InnerSize = XprTypeIsRowMajor ? ColsAtCompileTime : RowsAtCompileTime, IsBlockAlike = InnerIncr==1 && OuterIncr==1, IsInnerPannel = HasSameStorageOrderAsXprType && is_same,typename conditional::type>::value, InnerStrideAtCompileTime = InnerIncr<0 || InnerIncr==DynamicIndex || XprInnerStride==Dynamic ? Dynamic : XprInnerStride * InnerIncr, OuterStrideAtCompileTime = OuterIncr<0 || OuterIncr==DynamicIndex || XprOuterstride==Dynamic ? Dynamic : XprOuterstride * OuterIncr, ReturnAsScalar = is_same::value && is_same::value, ReturnAsBlock = (!ReturnAsScalar) && IsBlockAlike, ReturnAsIndexedView = (!ReturnAsScalar) && (!ReturnAsBlock), // FIXME we deal with compile-time strides if and only if we have DirectAccessBit flag, // but this is too strict regarding negative strides... DirectAccessMask = (int(InnerIncr)!=UndefinedIncr && int(OuterIncr)!=UndefinedIncr && InnerIncr>=0 && OuterIncr>=0) ? DirectAccessBit : 0, FlagsRowMajorBit = IsRowMajor ? RowMajorBit : 0, FlagsLvalueBit = is_lvalue::value ? LvalueBit : 0, FlagsLinearAccessBit = (RowsAtCompileTime == 1 || ColsAtCompileTime == 1) ? LinearAccessBit : 0, Flags = (traits::Flags & (HereditaryBits | DirectAccessMask )) | FlagsLvalueBit | FlagsRowMajorBit | FlagsLinearAccessBit }; typedef Block BlockType; }; } template class IndexedViewImpl; /** \class IndexedView * \ingroup Core_Module * * \brief Expression of a non-sequential sub-matrix defined by arbitrary sequences of row and column indices * * \tparam XprType the type of the expression in which we are taking the intersections of sub-rows and sub-columns * \tparam RowIndices the type of the object defining the sequence of row indices * \tparam ColIndices the type of the object defining the sequence of column indices * * This class represents an expression of a sub-matrix (or sub-vector) defined as the intersection * of sub-sets of rows and columns, that are themself defined by generic sequences of row indices \f$ \{r_0,r_1,..r_{m-1}\} \f$ * and column indices \f$ \{c_0,c_1,..c_{n-1} \}\f$. Let \f$ A \f$ be the nested matrix, then the resulting matrix \f$ B \f$ has \c m * rows and \c n columns, and its entries are given by: \f$ B(i,j) = A(r_i,c_j) \f$. * * The \c RowIndices and \c ColIndices types must be compatible with the following API: * \code * operator[](Index) const; * Index size() const; * \endcode * * Typical supported types thus include: * - std::vector * - std::valarray * - std::array * - Plain C arrays: int[N] * - Eigen::ArrayXi * - decltype(ArrayXi::LinSpaced(...)) * - Any view/expressions of the previous types * - Eigen::ArithmeticSequence * - Eigen::internal::AllRange (helper for Eigen::all) * - Eigen::internal::SingleRange (helper for single index) * - etc. * * In typical usages of %Eigen, this class should never be used directly. It is the return type of * DenseBase::operator()(const RowIndices&, const ColIndices&). * * \sa class Block */ template class IndexedView : public IndexedViewImpl::StorageKind> { public: typedef typename IndexedViewImpl::StorageKind>::Base Base; EIGEN_GENERIC_PUBLIC_INTERFACE(IndexedView) EIGEN_INHERIT_ASSIGNMENT_OPERATORS(IndexedView) typedef typename internal::ref_selector::non_const_type MatrixTypeNested; typedef typename internal::remove_all::type NestedExpression; template IndexedView(XprType& xpr, const T0& rowIndices, const T1& colIndices) : m_xpr(xpr), m_rowIndices(rowIndices), m_colIndices(colIndices) {} /** \returns number of rows */ Index rows() const { return internal::size(m_rowIndices); } /** \returns number of columns */ Index cols() const { return internal::size(m_colIndices); } /** \returns the nested expression */ const typename internal::remove_all::type& nestedExpression() const { return m_xpr; } /** \returns the nested expression */ typename internal::remove_reference::type& nestedExpression() { return m_xpr; } /** \returns a const reference to the object storing/generating the row indices */ const RowIndices& rowIndices() const { return m_rowIndices; } /** \returns a const reference to the object storing/generating the column indices */ const ColIndices& colIndices() const { return m_colIndices; } protected: MatrixTypeNested m_xpr; RowIndices m_rowIndices; ColIndices m_colIndices; }; // Generic API dispatcher template class IndexedViewImpl : public internal::generic_xpr_base >::type { public: typedef typename internal::generic_xpr_base >::type Base; }; namespace internal { template struct unary_evaluator, IndexBased> : evaluator_base > { typedef IndexedView XprType; enum { CoeffReadCost = evaluator::CoeffReadCost /* TODO + cost of row/col index */, FlagsLinearAccessBit = (traits::RowsAtCompileTime == 1 || traits::ColsAtCompileTime == 1) ? LinearAccessBit : 0, FlagsRowMajorBit = traits::FlagsRowMajorBit, Flags = (evaluator::Flags & (HereditaryBits & ~RowMajorBit /*| LinearAccessBit | DirectAccessBit*/)) | FlagsLinearAccessBit | FlagsRowMajorBit, Alignment = 0 }; EIGEN_DEVICE_FUNC explicit unary_evaluator(const XprType& xpr) : m_argImpl(xpr.nestedExpression()), m_xpr(xpr) { EIGEN_INTERNAL_CHECK_COST_VALUE(CoeffReadCost); } typedef typename XprType::Scalar Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index row, Index col) const { return m_argImpl.coeff(m_xpr.rowIndices()[row], m_xpr.colIndices()[col]); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index row, Index col) { return m_argImpl.coeffRef(m_xpr.rowIndices()[row], m_xpr.colIndices()[col]); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index index) { EIGEN_STATIC_ASSERT_LVALUE(XprType) Index row = XprType::RowsAtCompileTime == 1 ? 0 : index; Index col = XprType::RowsAtCompileTime == 1 ? index : 0; return m_argImpl.coeffRef( m_xpr.rowIndices()[row], m_xpr.colIndices()[col]); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& coeffRef(Index index) const { Index row = XprType::RowsAtCompileTime == 1 ? 0 : index; Index col = XprType::RowsAtCompileTime == 1 ? index : 0; return m_argImpl.coeffRef( m_xpr.rowIndices()[row], m_xpr.colIndices()[col]); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const CoeffReturnType coeff(Index index) const { Index row = XprType::RowsAtCompileTime == 1 ? 0 : index; Index col = XprType::RowsAtCompileTime == 1 ? index : 0; return m_argImpl.coeff( m_xpr.rowIndices()[row], m_xpr.colIndices()[col]); } protected: evaluator m_argImpl; const XprType& m_xpr; }; } // end namespace internal } // end namespace Eigen #endif // EIGEN_INDEXED_VIEW_H RcppEigen/inst/include/Eigen/src/Core/products/0000755000176200001440000000000014562417221021103 5ustar liggesusersRcppEigen/inst/include/Eigen/src/Core/products/TriangularMatrixMatrix_BLAS.h0000644000176200001440000003305314107270226026540 0ustar liggesusers/* Copyright (c) 2011, Intel Corporation. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ******************************************************************************** * Content : Eigen bindings to BLAS F77 * Triangular matrix * matrix product functionality based on ?TRMM. ******************************************************************************** */ #ifndef EIGEN_TRIANGULAR_MATRIX_MATRIX_BLAS_H #define EIGEN_TRIANGULAR_MATRIX_MATRIX_BLAS_H namespace Eigen { namespace internal { template struct product_triangular_matrix_matrix_trmm : product_triangular_matrix_matrix {}; // try to go to BLAS specialization #define EIGEN_BLAS_TRMM_SPECIALIZE(Scalar, LhsIsTriangular) \ template \ struct product_triangular_matrix_matrix { \ static inline void run(Index _rows, Index _cols, Index _depth, const Scalar* _lhs, Index lhsStride,\ const Scalar* _rhs, Index rhsStride, Scalar* res, Index resIncr, Index resStride, Scalar alpha, level3_blocking& blocking) { \ EIGEN_ONLY_USED_FOR_DEBUG(resIncr); \ eigen_assert(resIncr == 1); \ product_triangular_matrix_matrix_trmm::run( \ _rows, _cols, _depth, _lhs, lhsStride, _rhs, rhsStride, res, resStride, alpha, blocking); \ } \ }; EIGEN_BLAS_TRMM_SPECIALIZE(double, true) EIGEN_BLAS_TRMM_SPECIALIZE(double, false) EIGEN_BLAS_TRMM_SPECIALIZE(dcomplex, true) EIGEN_BLAS_TRMM_SPECIALIZE(dcomplex, false) EIGEN_BLAS_TRMM_SPECIALIZE(float, true) EIGEN_BLAS_TRMM_SPECIALIZE(float, false) EIGEN_BLAS_TRMM_SPECIALIZE(scomplex, true) EIGEN_BLAS_TRMM_SPECIALIZE(scomplex, false) // implements col-major += alpha * op(triangular) * op(general) #define EIGEN_BLAS_TRMM_L(EIGTYPE, BLASTYPE, EIGPREFIX, BLASFUNC) \ template \ struct product_triangular_matrix_matrix_trmm \ { \ enum { \ IsLower = (Mode&Lower) == Lower, \ SetDiag = (Mode&(ZeroDiag|UnitDiag)) ? 0 : 1, \ IsUnitDiag = (Mode&UnitDiag) ? 1 : 0, \ IsZeroDiag = (Mode&ZeroDiag) ? 1 : 0, \ LowUp = IsLower ? Lower : Upper, \ conjA = ((LhsStorageOrder==ColMajor) && ConjugateLhs) ? 1 : 0 \ }; \ \ static void run( \ Index _rows, Index _cols, Index _depth, \ const EIGTYPE* _lhs, Index lhsStride, \ const EIGTYPE* _rhs, Index rhsStride, \ EIGTYPE* res, Index resStride, \ EIGTYPE alpha, level3_blocking& blocking) \ { \ Index diagSize = (std::min)(_rows,_depth); \ Index rows = IsLower ? _rows : diagSize; \ Index depth = IsLower ? diagSize : _depth; \ Index cols = _cols; \ \ typedef Matrix MatrixLhs; \ typedef Matrix MatrixRhs; \ \ /* Non-square case - doesn't fit to BLAS ?TRMM. Fall to default triangular product or call BLAS ?GEMM*/ \ if (rows != depth) { \ \ /* FIXME handle mkl_domain_get_max_threads */ \ /*int nthr = mkl_domain_get_max_threads(EIGEN_BLAS_DOMAIN_BLAS);*/ int nthr = 1;\ \ if (((nthr==1) && (((std::max)(rows,depth)-diagSize)/(double)diagSize < 0.5))) { \ /* Most likely no benefit to call TRMM or GEMM from BLAS */ \ product_triangular_matrix_matrix::run( \ _rows, _cols, _depth, _lhs, lhsStride, _rhs, rhsStride, res, 1, resStride, alpha, blocking); \ /*std::cout << "TRMM_L: A is not square! Go to Eigen TRMM implementation!\n";*/ \ } else { \ /* Make sense to call GEMM */ \ Map > lhsMap(_lhs,rows,depth,OuterStride<>(lhsStride)); \ MatrixLhs aa_tmp=lhsMap.template triangularView(); \ BlasIndex aStride = convert_index(aa_tmp.outerStride()); \ gemm_blocking_space gemm_blocking(_rows,_cols,_depth, 1, true); \ general_matrix_matrix_product::run( \ rows, cols, depth, aa_tmp.data(), aStride, _rhs, rhsStride, res, 1, resStride, alpha, gemm_blocking, 0); \ \ /*std::cout << "TRMM_L: A is not square! Go to BLAS GEMM implementation! " << nthr<<" \n";*/ \ } \ return; \ } \ char side = 'L', transa, uplo, diag = 'N'; \ EIGTYPE *b; \ const EIGTYPE *a; \ BlasIndex m, n, lda, ldb; \ \ /* Set m, n */ \ m = convert_index(diagSize); \ n = convert_index(cols); \ \ /* Set trans */ \ transa = (LhsStorageOrder==RowMajor) ? ((ConjugateLhs) ? 'C' : 'T') : 'N'; \ \ /* Set b, ldb */ \ Map > rhs(_rhs,depth,cols,OuterStride<>(rhsStride)); \ MatrixX##EIGPREFIX b_tmp; \ \ if (ConjugateRhs) b_tmp = rhs.conjugate(); else b_tmp = rhs; \ b = b_tmp.data(); \ ldb = convert_index(b_tmp.outerStride()); \ \ /* Set uplo */ \ uplo = IsLower ? 'L' : 'U'; \ if (LhsStorageOrder==RowMajor) uplo = (uplo == 'L') ? 'U' : 'L'; \ /* Set a, lda */ \ Map > lhs(_lhs,rows,depth,OuterStride<>(lhsStride)); \ MatrixLhs a_tmp; \ \ if ((conjA!=0) || (SetDiag==0)) { \ if (conjA) a_tmp = lhs.conjugate(); else a_tmp = lhs; \ if (IsZeroDiag) \ a_tmp.diagonal().setZero(); \ else if (IsUnitDiag) \ a_tmp.diagonal().setOnes();\ a = a_tmp.data(); \ lda = convert_index(a_tmp.outerStride()); \ } else { \ a = _lhs; \ lda = convert_index(lhsStride); \ } \ /*std::cout << "TRMM_L: A is square! Go to BLAS TRMM implementation! \n";*/ \ /* call ?trmm*/ \ BLASFUNC(&side, &uplo, &transa, &diag, &m, &n, (const BLASTYPE*)&numext::real_ref(alpha), (const BLASTYPE*)a, &lda, (BLASTYPE*)b, &ldb); \ \ /* Add op(a_triangular)*b into res*/ \ Map > res_tmp(res,rows,cols,OuterStride<>(resStride)); \ res_tmp=res_tmp+b_tmp; \ } \ }; #ifdef EIGEN_USE_MKL EIGEN_BLAS_TRMM_L(double, double, d, dtrmm) EIGEN_BLAS_TRMM_L(dcomplex, MKL_Complex16, cd, ztrmm) EIGEN_BLAS_TRMM_L(float, float, f, strmm) EIGEN_BLAS_TRMM_L(scomplex, MKL_Complex8, cf, ctrmm) #else EIGEN_BLAS_TRMM_L(double, double, d, dtrmm_) EIGEN_BLAS_TRMM_L(dcomplex, double, cd, ztrmm_) EIGEN_BLAS_TRMM_L(float, float, f, strmm_) EIGEN_BLAS_TRMM_L(scomplex, float, cf, ctrmm_) #endif // implements col-major += alpha * op(general) * op(triangular) #define EIGEN_BLAS_TRMM_R(EIGTYPE, BLASTYPE, EIGPREFIX, BLASFUNC) \ template \ struct product_triangular_matrix_matrix_trmm \ { \ enum { \ IsLower = (Mode&Lower) == Lower, \ SetDiag = (Mode&(ZeroDiag|UnitDiag)) ? 0 : 1, \ IsUnitDiag = (Mode&UnitDiag) ? 1 : 0, \ IsZeroDiag = (Mode&ZeroDiag) ? 1 : 0, \ LowUp = IsLower ? Lower : Upper, \ conjA = ((RhsStorageOrder==ColMajor) && ConjugateRhs) ? 1 : 0 \ }; \ \ static void run( \ Index _rows, Index _cols, Index _depth, \ const EIGTYPE* _lhs, Index lhsStride, \ const EIGTYPE* _rhs, Index rhsStride, \ EIGTYPE* res, Index resStride, \ EIGTYPE alpha, level3_blocking& blocking) \ { \ Index diagSize = (std::min)(_cols,_depth); \ Index rows = _rows; \ Index depth = IsLower ? _depth : diagSize; \ Index cols = IsLower ? diagSize : _cols; \ \ typedef Matrix MatrixLhs; \ typedef Matrix MatrixRhs; \ \ /* Non-square case - doesn't fit to BLAS ?TRMM. Fall to default triangular product or call BLAS ?GEMM*/ \ if (cols != depth) { \ \ int nthr = 1 /*mkl_domain_get_max_threads(EIGEN_BLAS_DOMAIN_BLAS)*/; \ \ if ((nthr==1) && (((std::max)(cols,depth)-diagSize)/(double)diagSize < 0.5)) { \ /* Most likely no benefit to call TRMM or GEMM from BLAS*/ \ product_triangular_matrix_matrix::run( \ _rows, _cols, _depth, _lhs, lhsStride, _rhs, rhsStride, res, 1, resStride, alpha, blocking); \ /*std::cout << "TRMM_R: A is not square! Go to Eigen TRMM implementation!\n";*/ \ } else { \ /* Make sense to call GEMM */ \ Map > rhsMap(_rhs,depth,cols, OuterStride<>(rhsStride)); \ MatrixRhs aa_tmp=rhsMap.template triangularView(); \ BlasIndex aStride = convert_index(aa_tmp.outerStride()); \ gemm_blocking_space gemm_blocking(_rows,_cols,_depth, 1, true); \ general_matrix_matrix_product::run( \ rows, cols, depth, _lhs, lhsStride, aa_tmp.data(), aStride, res, 1, resStride, alpha, gemm_blocking, 0); \ \ /*std::cout << "TRMM_R: A is not square! Go to BLAS GEMM implementation! " << nthr<<" \n";*/ \ } \ return; \ } \ char side = 'R', transa, uplo, diag = 'N'; \ EIGTYPE *b; \ const EIGTYPE *a; \ BlasIndex m, n, lda, ldb; \ \ /* Set m, n */ \ m = convert_index(rows); \ n = convert_index(diagSize); \ \ /* Set trans */ \ transa = (RhsStorageOrder==RowMajor) ? ((ConjugateRhs) ? 'C' : 'T') : 'N'; \ \ /* Set b, ldb */ \ Map > lhs(_lhs,rows,depth,OuterStride<>(lhsStride)); \ MatrixX##EIGPREFIX b_tmp; \ \ if (ConjugateLhs) b_tmp = lhs.conjugate(); else b_tmp = lhs; \ b = b_tmp.data(); \ ldb = convert_index(b_tmp.outerStride()); \ \ /* Set uplo */ \ uplo = IsLower ? 'L' : 'U'; \ if (RhsStorageOrder==RowMajor) uplo = (uplo == 'L') ? 'U' : 'L'; \ /* Set a, lda */ \ Map > rhs(_rhs,depth,cols, OuterStride<>(rhsStride)); \ MatrixRhs a_tmp; \ \ if ((conjA!=0) || (SetDiag==0)) { \ if (conjA) a_tmp = rhs.conjugate(); else a_tmp = rhs; \ if (IsZeroDiag) \ a_tmp.diagonal().setZero(); \ else if (IsUnitDiag) \ a_tmp.diagonal().setOnes();\ a = a_tmp.data(); \ lda = convert_index(a_tmp.outerStride()); \ } else { \ a = _rhs; \ lda = convert_index(rhsStride); \ } \ /*std::cout << "TRMM_R: A is square! Go to BLAS TRMM implementation! \n";*/ \ /* call ?trmm*/ \ BLASFUNC(&side, &uplo, &transa, &diag, &m, &n, (const BLASTYPE*)&numext::real_ref(alpha), (const BLASTYPE*)a, &lda, (BLASTYPE*)b, &ldb); \ \ /* Add op(a_triangular)*b into res*/ \ Map > res_tmp(res,rows,cols,OuterStride<>(resStride)); \ res_tmp=res_tmp+b_tmp; \ } \ }; #ifdef EIGEN_USE_MKL EIGEN_BLAS_TRMM_R(double, double, d, dtrmm) EIGEN_BLAS_TRMM_R(dcomplex, MKL_Complex16, cd, ztrmm) EIGEN_BLAS_TRMM_R(float, float, f, strmm) EIGEN_BLAS_TRMM_R(scomplex, MKL_Complex8, cf, ctrmm) #else EIGEN_BLAS_TRMM_R(double, double, d, dtrmm_) EIGEN_BLAS_TRMM_R(dcomplex, double, cd, ztrmm_) EIGEN_BLAS_TRMM_R(float, float, f, strmm_) EIGEN_BLAS_TRMM_R(scomplex, float, cf, ctrmm_) #endif } // end namespace internal } // end namespace Eigen #endif // EIGEN_TRIANGULAR_MATRIX_MATRIX_BLAS_H RcppEigen/inst/include/Eigen/src/Core/products/SelfadjointRank2Update.h0000644000176200001440000001003614562417221025557 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SELFADJOINTRANK2UPTADE_H #define EIGEN_SELFADJOINTRANK2UPTADE_H namespace Eigen { namespace internal { /* Optimized selfadjoint matrix += alpha * uv' + conj(alpha)*vu' * It corresponds to the Level2 syr2 BLAS routine */ template struct selfadjoint_rank2_update_selector; template struct selfadjoint_rank2_update_selector { static EIGEN_DEVICE_FUNC void run(Scalar* mat, Index stride, const UType& u, const VType& v, const Scalar& alpha) { const Index size = u.size(); for (Index i=0; i >(mat+stride*i+i, size-i) += (numext::conj(alpha) * numext::conj(u.coeff(i))) * v.tail(size-i) + (alpha * numext::conj(v.coeff(i))) * u.tail(size-i); } } }; template struct selfadjoint_rank2_update_selector { static void run(Scalar* mat, Index stride, const UType& u, const VType& v, const Scalar& alpha) { const Index size = u.size(); for (Index i=0; i >(mat+stride*i, i+1) += (numext::conj(alpha) * numext::conj(u.coeff(i))) * v.head(i+1) + (alpha * numext::conj(v.coeff(i))) * u.head(i+1); } }; template struct conj_expr_if : conditional::Scalar>,T> > {}; } // end namespace internal template template EIGEN_DEVICE_FUNC SelfAdjointView& SelfAdjointView ::rankUpdate(const MatrixBase& u, const MatrixBase& v, const Scalar& alpha) { typedef internal::blas_traits UBlasTraits; typedef typename UBlasTraits::DirectLinearAccessType ActualUType; typedef typename internal::remove_all::type _ActualUType; typename internal::add_const_on_value_type::type actualU = UBlasTraits::extract(u.derived()); typedef internal::blas_traits VBlasTraits; typedef typename VBlasTraits::DirectLinearAccessType ActualVType; typedef typename internal::remove_all::type _ActualVType; typename internal::add_const_on_value_type::type actualV = VBlasTraits::extract(v.derived()); // If MatrixType is row major, then we use the routine for lower triangular in the upper triangular case and // vice versa, and take the complex conjugate of all coefficients and vector entries. enum { IsRowMajor = (internal::traits::Flags&RowMajorBit) ? 1 : 0 }; Scalar actualAlpha = alpha * UBlasTraits::extractScalarFactor(u.derived()) * numext::conj(VBlasTraits::extractScalarFactor(v.derived())); if (IsRowMajor) actualAlpha = numext::conj(actualAlpha); typedef typename internal::remove_all::type>::type UType; typedef typename internal::remove_all::type>::type VType; internal::selfadjoint_rank2_update_selector ::run(_expression().const_cast_derived().data(),_expression().outerStride(),UType(actualU),VType(actualV),actualAlpha); return *this; } } // end namespace Eigen #endif // EIGEN_SELFADJOINTRANK2UPTADE_H RcppEigen/inst/include/Eigen/src/Core/products/GeneralMatrixMatrix.h0000644000176200001440000004721014562417221025207 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2009 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_GENERAL_MATRIX_MATRIX_H #define EIGEN_GENERAL_MATRIX_MATRIX_H namespace Eigen { namespace internal { template class level3_blocking; /* Specialization for a row-major destination matrix => simple transposition of the product */ template< typename Index, typename LhsScalar, int LhsStorageOrder, bool ConjugateLhs, typename RhsScalar, int RhsStorageOrder, bool ConjugateRhs, int ResInnerStride> struct general_matrix_matrix_product { typedef gebp_traits Traits; typedef typename ScalarBinaryOpTraits::ReturnType ResScalar; static EIGEN_STRONG_INLINE void run( Index rows, Index cols, Index depth, const LhsScalar* lhs, Index lhsStride, const RhsScalar* rhs, Index rhsStride, ResScalar* res, Index resIncr, Index resStride, ResScalar alpha, level3_blocking& blocking, GemmParallelInfo* info = 0) { // transpose the product such that the result is column major general_matrix_matrix_product ::run(cols,rows,depth,rhs,rhsStride,lhs,lhsStride,res,resIncr,resStride,alpha,blocking,info); } }; /* Specialization for a col-major destination matrix * => Blocking algorithm following Goto's paper */ template< typename Index, typename LhsScalar, int LhsStorageOrder, bool ConjugateLhs, typename RhsScalar, int RhsStorageOrder, bool ConjugateRhs, int ResInnerStride> struct general_matrix_matrix_product { typedef gebp_traits Traits; typedef typename ScalarBinaryOpTraits::ReturnType ResScalar; static void run(Index rows, Index cols, Index depth, const LhsScalar* _lhs, Index lhsStride, const RhsScalar* _rhs, Index rhsStride, ResScalar* _res, Index resIncr, Index resStride, ResScalar alpha, level3_blocking& blocking, GemmParallelInfo* info = 0) { typedef const_blas_data_mapper LhsMapper; typedef const_blas_data_mapper RhsMapper; typedef blas_data_mapper ResMapper; LhsMapper lhs(_lhs, lhsStride); RhsMapper rhs(_rhs, rhsStride); ResMapper res(_res, resStride, resIncr); Index kc = blocking.kc(); // cache block size along the K direction Index mc = (std::min)(rows,blocking.mc()); // cache block size along the M direction Index nc = (std::min)(cols,blocking.nc()); // cache block size along the N direction gemm_pack_lhs pack_lhs; gemm_pack_rhs pack_rhs; gebp_kernel gebp; #ifdef EIGEN_HAS_OPENMP if(info) { // this is the parallel version! int tid = omp_get_thread_num(); int threads = omp_get_num_threads(); LhsScalar* blockA = blocking.blockA(); eigen_internal_assert(blockA!=0); std::size_t sizeB = kc*nc; ei_declare_aligned_stack_constructed_variable(RhsScalar, blockB, sizeB, 0); // For each horizontal panel of the rhs, and corresponding vertical panel of the lhs... for(Index k=0; k rows of B', and cols of the A' // In order to reduce the chance that a thread has to wait for the other, // let's start by packing B'. pack_rhs(blockB, rhs.getSubMapper(k,0), actual_kc, nc); // Pack A_k to A' in a parallel fashion: // each thread packs the sub block A_k,i to A'_i where i is the thread id. // However, before copying to A'_i, we have to make sure that no other thread is still using it, // i.e., we test that info[tid].users equals 0. // Then, we set info[tid].users to the number of threads to mark that all other threads are going to use it. while(info[tid].users!=0) {} info[tid].users = threads; pack_lhs(blockA+info[tid].lhs_start*actual_kc, lhs.getSubMapper(info[tid].lhs_start,k), actual_kc, info[tid].lhs_length); // Notify the other threads that the part A'_i is ready to go. info[tid].sync = k; // Computes C_i += A' * B' per A'_i for(int shift=0; shift0) { while(info[i].sync!=k) { } } gebp(res.getSubMapper(info[i].lhs_start, 0), blockA+info[i].lhs_start*actual_kc, blockB, info[i].lhs_length, actual_kc, nc, alpha); } // Then keep going as usual with the remaining B' for(Index j=nc; j Pack lhs's panel into a sequential chunk of memory (L2/L3 caching) // Note that this panel will be read as many times as the number of blocks in the rhs's // horizontal panel which is, in practice, a very low number. pack_lhs(blockA, lhs.getSubMapper(i2,k2), actual_kc, actual_mc); // For each kc x nc block of the rhs's horizontal panel... for(Index j2=0; j2 struct gemm_functor { gemm_functor(const Lhs& lhs, const Rhs& rhs, Dest& dest, const Scalar& actualAlpha, BlockingType& blocking) : m_lhs(lhs), m_rhs(rhs), m_dest(dest), m_actualAlpha(actualAlpha), m_blocking(blocking) {} void initParallelSession(Index num_threads) const { m_blocking.initParallel(m_lhs.rows(), m_rhs.cols(), m_lhs.cols(), num_threads); m_blocking.allocateA(); } void operator() (Index row, Index rows, Index col=0, Index cols=-1, GemmParallelInfo* info=0) const { if(cols==-1) cols = m_rhs.cols(); Gemm::run(rows, cols, m_lhs.cols(), &m_lhs.coeffRef(row,0), m_lhs.outerStride(), &m_rhs.coeffRef(0,col), m_rhs.outerStride(), (Scalar*)&(m_dest.coeffRef(row,col)), m_dest.innerStride(), m_dest.outerStride(), m_actualAlpha, m_blocking, info); } typedef typename Gemm::Traits Traits; protected: const Lhs& m_lhs; const Rhs& m_rhs; Dest& m_dest; Scalar m_actualAlpha; BlockingType& m_blocking; }; template class gemm_blocking_space; template class level3_blocking { typedef _LhsScalar LhsScalar; typedef _RhsScalar RhsScalar; protected: LhsScalar* m_blockA; RhsScalar* m_blockB; Index m_mc; Index m_nc; Index m_kc; public: level3_blocking() : m_blockA(0), m_blockB(0), m_mc(0), m_nc(0), m_kc(0) {} inline Index mc() const { return m_mc; } inline Index nc() const { return m_nc; } inline Index kc() const { return m_kc; } inline LhsScalar* blockA() { return m_blockA; } inline RhsScalar* blockB() { return m_blockB; } }; template class gemm_blocking_space : public level3_blocking< typename conditional::type, typename conditional::type> { enum { Transpose = StorageOrder==RowMajor, ActualRows = Transpose ? MaxCols : MaxRows, ActualCols = Transpose ? MaxRows : MaxCols }; typedef typename conditional::type LhsScalar; typedef typename conditional::type RhsScalar; typedef gebp_traits Traits; enum { SizeA = ActualRows * MaxDepth, SizeB = ActualCols * MaxDepth }; #if EIGEN_MAX_STATIC_ALIGN_BYTES >= EIGEN_DEFAULT_ALIGN_BYTES EIGEN_ALIGN_MAX LhsScalar m_staticA[SizeA]; EIGEN_ALIGN_MAX RhsScalar m_staticB[SizeB]; #else EIGEN_ALIGN_MAX char m_staticA[SizeA * sizeof(LhsScalar) + EIGEN_DEFAULT_ALIGN_BYTES-1]; EIGEN_ALIGN_MAX char m_staticB[SizeB * sizeof(RhsScalar) + EIGEN_DEFAULT_ALIGN_BYTES-1]; #endif public: gemm_blocking_space(Index /*rows*/, Index /*cols*/, Index /*depth*/, Index /*num_threads*/, bool /*full_rows = false*/) { this->m_mc = ActualRows; this->m_nc = ActualCols; this->m_kc = MaxDepth; #if EIGEN_MAX_STATIC_ALIGN_BYTES >= EIGEN_DEFAULT_ALIGN_BYTES this->m_blockA = m_staticA; this->m_blockB = m_staticB; #else this->m_blockA = reinterpret_cast((internal::UIntPtr(m_staticA) + (EIGEN_DEFAULT_ALIGN_BYTES-1)) & ~std::size_t(EIGEN_DEFAULT_ALIGN_BYTES-1)); this->m_blockB = reinterpret_cast((internal::UIntPtr(m_staticB) + (EIGEN_DEFAULT_ALIGN_BYTES-1)) & ~std::size_t(EIGEN_DEFAULT_ALIGN_BYTES-1)); #endif } void initParallel(Index, Index, Index, Index) {} inline void allocateA() {} inline void allocateB() {} inline void allocateAll() {} }; template class gemm_blocking_space : public level3_blocking< typename conditional::type, typename conditional::type> { enum { Transpose = StorageOrder==RowMajor }; typedef typename conditional::type LhsScalar; typedef typename conditional::type RhsScalar; typedef gebp_traits Traits; Index m_sizeA; Index m_sizeB; public: gemm_blocking_space(Index rows, Index cols, Index depth, Index num_threads, bool l3_blocking) { this->m_mc = Transpose ? cols : rows; this->m_nc = Transpose ? rows : cols; this->m_kc = depth; if(l3_blocking) { computeProductBlockingSizes(this->m_kc, this->m_mc, this->m_nc, num_threads); } else // no l3 blocking { Index n = this->m_nc; computeProductBlockingSizes(this->m_kc, this->m_mc, n, num_threads); } m_sizeA = this->m_mc * this->m_kc; m_sizeB = this->m_kc * this->m_nc; } void initParallel(Index rows, Index cols, Index depth, Index num_threads) { this->m_mc = Transpose ? cols : rows; this->m_nc = Transpose ? rows : cols; this->m_kc = depth; eigen_internal_assert(this->m_blockA==0 && this->m_blockB==0); Index m = this->m_mc; computeProductBlockingSizes(this->m_kc, m, this->m_nc, num_threads); m_sizeA = this->m_mc * this->m_kc; m_sizeB = this->m_kc * this->m_nc; } void allocateA() { if(this->m_blockA==0) this->m_blockA = aligned_new(m_sizeA); } void allocateB() { if(this->m_blockB==0) this->m_blockB = aligned_new(m_sizeB); } void allocateAll() { allocateA(); allocateB(); } ~gemm_blocking_space() { aligned_delete(this->m_blockA, m_sizeA); aligned_delete(this->m_blockB, m_sizeB); } }; } // end namespace internal namespace internal { template struct generic_product_impl : generic_product_impl_base > { typedef typename Product::Scalar Scalar; typedef typename Lhs::Scalar LhsScalar; typedef typename Rhs::Scalar RhsScalar; typedef internal::blas_traits LhsBlasTraits; typedef typename LhsBlasTraits::DirectLinearAccessType ActualLhsType; typedef typename internal::remove_all::type ActualLhsTypeCleaned; typedef internal::blas_traits RhsBlasTraits; typedef typename RhsBlasTraits::DirectLinearAccessType ActualRhsType; typedef typename internal::remove_all::type ActualRhsTypeCleaned; enum { MaxDepthAtCompileTime = EIGEN_SIZE_MIN_PREFER_FIXED(Lhs::MaxColsAtCompileTime,Rhs::MaxRowsAtCompileTime) }; typedef generic_product_impl lazyproduct; template static void evalTo(Dst& dst, const Lhs& lhs, const Rhs& rhs) { // See http://eigen.tuxfamily.org/bz/show_bug.cgi?id=404 for a discussion and helper program // to determine the following heuristic. // EIGEN_GEMM_TO_COEFFBASED_THRESHOLD is typically defined to 20 in GeneralProduct.h, // unless it has been specialized by the user or for a given architecture. // Note that the condition rhs.rows()>0 was required because lazy product is (was?) not happy with empty inputs. // I'm not sure it is still required. if((rhs.rows()+dst.rows()+dst.cols())0) lazyproduct::eval_dynamic(dst, lhs, rhs, internal::assign_op()); else { dst.setZero(); scaleAndAddTo(dst, lhs, rhs, Scalar(1)); } } template static void addTo(Dst& dst, const Lhs& lhs, const Rhs& rhs) { if((rhs.rows()+dst.rows()+dst.cols())0) lazyproduct::eval_dynamic(dst, lhs, rhs, internal::add_assign_op()); else scaleAndAddTo(dst,lhs, rhs, Scalar(1)); } template static void subTo(Dst& dst, const Lhs& lhs, const Rhs& rhs) { if((rhs.rows()+dst.rows()+dst.cols())0) lazyproduct::eval_dynamic(dst, lhs, rhs, internal::sub_assign_op()); else scaleAndAddTo(dst, lhs, rhs, Scalar(-1)); } template static void scaleAndAddTo(Dest& dst, const Lhs& a_lhs, const Rhs& a_rhs, const Scalar& alpha) { eigen_assert(dst.rows()==a_lhs.rows() && dst.cols()==a_rhs.cols()); if(a_lhs.cols()==0 || a_lhs.rows()==0 || a_rhs.cols()==0) return; if (dst.cols() == 1) { // Fallback to GEMV if either the lhs or rhs is a runtime vector typename Dest::ColXpr dst_vec(dst.col(0)); return internal::generic_product_impl ::scaleAndAddTo(dst_vec, a_lhs, a_rhs.col(0), alpha); } else if (dst.rows() == 1) { // Fallback to GEMV if either the lhs or rhs is a runtime vector typename Dest::RowXpr dst_vec(dst.row(0)); return internal::generic_product_impl ::scaleAndAddTo(dst_vec, a_lhs.row(0), a_rhs, alpha); } typename internal::add_const_on_value_type::type lhs = LhsBlasTraits::extract(a_lhs); typename internal::add_const_on_value_type::type rhs = RhsBlasTraits::extract(a_rhs); Scalar actualAlpha = combine_scalar_factors(alpha, a_lhs, a_rhs); typedef internal::gemm_blocking_space<(Dest::Flags&RowMajorBit) ? RowMajor : ColMajor,LhsScalar,RhsScalar, Dest::MaxRowsAtCompileTime,Dest::MaxColsAtCompileTime,MaxDepthAtCompileTime> BlockingType; typedef internal::gemm_functor< Scalar, Index, internal::general_matrix_matrix_product< Index, LhsScalar, (ActualLhsTypeCleaned::Flags&RowMajorBit) ? RowMajor : ColMajor, bool(LhsBlasTraits::NeedToConjugate), RhsScalar, (ActualRhsTypeCleaned::Flags&RowMajorBit) ? RowMajor : ColMajor, bool(RhsBlasTraits::NeedToConjugate), (Dest::Flags&RowMajorBit) ? RowMajor : ColMajor, Dest::InnerStrideAtCompileTime>, ActualLhsTypeCleaned, ActualRhsTypeCleaned, Dest, BlockingType> GemmFunctor; BlockingType blocking(dst.rows(), dst.cols(), lhs.cols(), 1, true); internal::parallelize_gemm<(Dest::MaxRowsAtCompileTime>32 || Dest::MaxRowsAtCompileTime==Dynamic)> (GemmFunctor(lhs, rhs, dst, actualAlpha, blocking), a_lhs.rows(), a_rhs.cols(), a_lhs.cols(), Dest::Flags&RowMajorBit); } }; } // end namespace internal } // end namespace Eigen #endif // EIGEN_GENERAL_MATRIX_MATRIX_H RcppEigen/inst/include/Eigen/src/Core/products/GeneralMatrixMatrix_BLAS.h0000644000176200001440000001176214107270226026010 0ustar liggesusers/* Copyright (c) 2011, Intel Corporation. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ******************************************************************************** * Content : Eigen bindings to BLAS F77 * General matrix-matrix product functionality based on ?GEMM. ******************************************************************************** */ #ifndef EIGEN_GENERAL_MATRIX_MATRIX_BLAS_H #define EIGEN_GENERAL_MATRIX_MATRIX_BLAS_H namespace Eigen { namespace internal { /********************************************************************** * This file implements general matrix-matrix multiplication using BLAS * gemm function via partial specialization of * general_matrix_matrix_product::run(..) method for float, double, * std::complex and std::complex types **********************************************************************/ // gemm specialization #define GEMM_SPECIALIZATION(EIGTYPE, EIGPREFIX, BLASTYPE, BLASFUNC) \ template< \ typename Index, \ int LhsStorageOrder, bool ConjugateLhs, \ int RhsStorageOrder, bool ConjugateRhs> \ struct general_matrix_matrix_product \ { \ typedef gebp_traits Traits; \ \ static void run(Index rows, Index cols, Index depth, \ const EIGTYPE* _lhs, Index lhsStride, \ const EIGTYPE* _rhs, Index rhsStride, \ EIGTYPE* res, Index resIncr, Index resStride, \ EIGTYPE alpha, \ level3_blocking& /*blocking*/, \ GemmParallelInfo* /*info = 0*/) \ { \ using std::conj; \ \ EIGEN_ONLY_USED_FOR_DEBUG(resIncr); \ eigen_assert(resIncr == 1); \ char transa, transb; \ BlasIndex m, n, k, lda, ldb, ldc; \ const EIGTYPE *a, *b; \ EIGTYPE beta(1); \ MatrixX##EIGPREFIX a_tmp, b_tmp; \ \ /* Set transpose options */ \ transa = (LhsStorageOrder==RowMajor) ? ((ConjugateLhs) ? 'C' : 'T') : 'N'; \ transb = (RhsStorageOrder==RowMajor) ? ((ConjugateRhs) ? 'C' : 'T') : 'N'; \ \ /* Set m, n, k */ \ m = convert_index(rows); \ n = convert_index(cols); \ k = convert_index(depth); \ \ /* Set lda, ldb, ldc */ \ lda = convert_index(lhsStride); \ ldb = convert_index(rhsStride); \ ldc = convert_index(resStride); \ \ /* Set a, b, c */ \ if ((LhsStorageOrder==ColMajor) && (ConjugateLhs)) { \ Map > lhs(_lhs,m,k,OuterStride<>(lhsStride)); \ a_tmp = lhs.conjugate(); \ a = a_tmp.data(); \ lda = convert_index(a_tmp.outerStride()); \ } else a = _lhs; \ \ if ((RhsStorageOrder==ColMajor) && (ConjugateRhs)) { \ Map > rhs(_rhs,k,n,OuterStride<>(rhsStride)); \ b_tmp = rhs.conjugate(); \ b = b_tmp.data(); \ ldb = convert_index(b_tmp.outerStride()); \ } else b = _rhs; \ \ BLASFUNC(&transa, &transb, &m, &n, &k, (const BLASTYPE*)&numext::real_ref(alpha), (const BLASTYPE*)a, &lda, (const BLASTYPE*)b, &ldb, (const BLASTYPE*)&numext::real_ref(beta), (BLASTYPE*)res, &ldc); \ }}; #ifdef EIGEN_USE_MKL GEMM_SPECIALIZATION(double, d, double, dgemm) GEMM_SPECIALIZATION(float, f, float, sgemm) GEMM_SPECIALIZATION(dcomplex, cd, MKL_Complex16, zgemm) GEMM_SPECIALIZATION(scomplex, cf, MKL_Complex8, cgemm) #else GEMM_SPECIALIZATION(double, d, double, dgemm_) GEMM_SPECIALIZATION(float, f, float, sgemm_) GEMM_SPECIALIZATION(dcomplex, cd, double, zgemm_) GEMM_SPECIALIZATION(scomplex, cf, float, cgemm_) #endif } // end namespase internal } // end namespace Eigen #endif // EIGEN_GENERAL_MATRIX_MATRIX_BLAS_H RcppEigen/inst/include/Eigen/src/Core/products/TriangularMatrixVector.h0000644000176200001440000003460214107270226025736 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_TRIANGULARMATRIXVECTOR_H #define EIGEN_TRIANGULARMATRIXVECTOR_H namespace Eigen { namespace internal { template struct triangular_matrix_vector_product; template struct triangular_matrix_vector_product { typedef typename ScalarBinaryOpTraits::ReturnType ResScalar; enum { IsLower = ((Mode&Lower)==Lower), HasUnitDiag = (Mode & UnitDiag)==UnitDiag, HasZeroDiag = (Mode & ZeroDiag)==ZeroDiag }; static EIGEN_DONT_INLINE void run(Index _rows, Index _cols, const LhsScalar* _lhs, Index lhsStride, const RhsScalar* _rhs, Index rhsIncr, ResScalar* _res, Index resIncr, const RhsScalar& alpha); }; template EIGEN_DONT_INLINE void triangular_matrix_vector_product ::run(Index _rows, Index _cols, const LhsScalar* _lhs, Index lhsStride, const RhsScalar* _rhs, Index rhsIncr, ResScalar* _res, Index resIncr, const RhsScalar& alpha) { static const Index PanelWidth = EIGEN_TUNE_TRIANGULAR_PANEL_WIDTH; Index size = (std::min)(_rows,_cols); Index rows = IsLower ? _rows : (std::min)(_rows,_cols); Index cols = IsLower ? (std::min)(_rows,_cols) : _cols; typedef Map, 0, OuterStride<> > LhsMap; const LhsMap lhs(_lhs,rows,cols,OuterStride<>(lhsStride)); typename conj_expr_if::type cjLhs(lhs); typedef Map, 0, InnerStride<> > RhsMap; const RhsMap rhs(_rhs,cols,InnerStride<>(rhsIncr)); typename conj_expr_if::type cjRhs(rhs); typedef Map > ResMap; ResMap res(_res,rows); typedef const_blas_data_mapper LhsMapper; typedef const_blas_data_mapper RhsMapper; for (Index pi=0; pi0) res.segment(s,r) += (alpha * cjRhs.coeff(i)) * cjLhs.col(i).segment(s,r); if (HasUnitDiag) res.coeffRef(i) += alpha * cjRhs.coeff(i); } Index r = IsLower ? rows - pi - actualPanelWidth : pi; if (r>0) { Index s = IsLower ? pi+actualPanelWidth : 0; general_matrix_vector_product::run( r, actualPanelWidth, LhsMapper(&lhs.coeffRef(s,pi), lhsStride), RhsMapper(&rhs.coeffRef(pi), rhsIncr), &res.coeffRef(s), resIncr, alpha); } } if((!IsLower) && cols>size) { general_matrix_vector_product::run( rows, cols-size, LhsMapper(&lhs.coeffRef(0,size), lhsStride), RhsMapper(&rhs.coeffRef(size), rhsIncr), _res, resIncr, alpha); } } template struct triangular_matrix_vector_product { typedef typename ScalarBinaryOpTraits::ReturnType ResScalar; enum { IsLower = ((Mode&Lower)==Lower), HasUnitDiag = (Mode & UnitDiag)==UnitDiag, HasZeroDiag = (Mode & ZeroDiag)==ZeroDiag }; static EIGEN_DONT_INLINE void run(Index _rows, Index _cols, const LhsScalar* _lhs, Index lhsStride, const RhsScalar* _rhs, Index rhsIncr, ResScalar* _res, Index resIncr, const ResScalar& alpha); }; template EIGEN_DONT_INLINE void triangular_matrix_vector_product ::run(Index _rows, Index _cols, const LhsScalar* _lhs, Index lhsStride, const RhsScalar* _rhs, Index rhsIncr, ResScalar* _res, Index resIncr, const ResScalar& alpha) { static const Index PanelWidth = EIGEN_TUNE_TRIANGULAR_PANEL_WIDTH; Index diagSize = (std::min)(_rows,_cols); Index rows = IsLower ? _rows : diagSize; Index cols = IsLower ? diagSize : _cols; typedef Map, 0, OuterStride<> > LhsMap; const LhsMap lhs(_lhs,rows,cols,OuterStride<>(lhsStride)); typename conj_expr_if::type cjLhs(lhs); typedef Map > RhsMap; const RhsMap rhs(_rhs,cols); typename conj_expr_if::type cjRhs(rhs); typedef Map, 0, InnerStride<> > ResMap; ResMap res(_res,rows,InnerStride<>(resIncr)); typedef const_blas_data_mapper LhsMapper; typedef const_blas_data_mapper RhsMapper; for (Index pi=0; pi0) res.coeffRef(i) += alpha * (cjLhs.row(i).segment(s,r).cwiseProduct(cjRhs.segment(s,r).transpose())).sum(); if (HasUnitDiag) res.coeffRef(i) += alpha * cjRhs.coeff(i); } Index r = IsLower ? pi : cols - pi - actualPanelWidth; if (r>0) { Index s = IsLower ? 0 : pi + actualPanelWidth; general_matrix_vector_product::run( actualPanelWidth, r, LhsMapper(&lhs.coeffRef(pi,s), lhsStride), RhsMapper(&rhs.coeffRef(s), rhsIncr), &res.coeffRef(pi), resIncr, alpha); } } if(IsLower && rows>diagSize) { general_matrix_vector_product::run( rows-diagSize, cols, LhsMapper(&lhs.coeffRef(diagSize,0), lhsStride), RhsMapper(&rhs.coeffRef(0), rhsIncr), &res.coeffRef(diagSize), resIncr, alpha); } } /*************************************************************************** * Wrapper to product_triangular_vector ***************************************************************************/ template struct trmv_selector; } // end namespace internal namespace internal { template struct triangular_product_impl { template static void run(Dest& dst, const Lhs &lhs, const Rhs &rhs, const typename Dest::Scalar& alpha) { eigen_assert(dst.rows()==lhs.rows() && dst.cols()==rhs.cols()); internal::trmv_selector::Flags)&RowMajorBit) ? RowMajor : ColMajor>::run(lhs, rhs, dst, alpha); } }; template struct triangular_product_impl { template static void run(Dest& dst, const Lhs &lhs, const Rhs &rhs, const typename Dest::Scalar& alpha) { eigen_assert(dst.rows()==lhs.rows() && dst.cols()==rhs.cols()); Transpose dstT(dst); internal::trmv_selector<(Mode & (UnitDiag|ZeroDiag)) | ((Mode & Lower) ? Upper : Lower), (int(internal::traits::Flags)&RowMajorBit) ? ColMajor : RowMajor> ::run(rhs.transpose(),lhs.transpose(), dstT, alpha); } }; } // end namespace internal namespace internal { // TODO: find a way to factorize this piece of code with gemv_selector since the logic is exactly the same. template struct trmv_selector { template static void run(const Lhs &lhs, const Rhs &rhs, Dest& dest, const typename Dest::Scalar& alpha) { typedef typename Lhs::Scalar LhsScalar; typedef typename Rhs::Scalar RhsScalar; typedef typename Dest::Scalar ResScalar; typedef typename Dest::RealScalar RealScalar; typedef internal::blas_traits LhsBlasTraits; typedef typename LhsBlasTraits::DirectLinearAccessType ActualLhsType; typedef internal::blas_traits RhsBlasTraits; typedef typename RhsBlasTraits::DirectLinearAccessType ActualRhsType; typedef Map, EIGEN_PLAIN_ENUM_MIN(AlignedMax,internal::packet_traits::size)> MappedDest; typename internal::add_const_on_value_type::type actualLhs = LhsBlasTraits::extract(lhs); typename internal::add_const_on_value_type::type actualRhs = RhsBlasTraits::extract(rhs); LhsScalar lhs_alpha = LhsBlasTraits::extractScalarFactor(lhs); RhsScalar rhs_alpha = RhsBlasTraits::extractScalarFactor(rhs); ResScalar actualAlpha = alpha * lhs_alpha * rhs_alpha; enum { // FIXME find a way to allow an inner stride on the result if packet_traits::size==1 // on, the other hand it is good for the cache to pack the vector anyways... EvalToDestAtCompileTime = Dest::InnerStrideAtCompileTime==1, ComplexByReal = (NumTraits::IsComplex) && (!NumTraits::IsComplex), MightCannotUseDest = (Dest::InnerStrideAtCompileTime!=1) || ComplexByReal }; gemv_static_vector_if static_dest; bool alphaIsCompatible = (!ComplexByReal) || (numext::imag(actualAlpha)==RealScalar(0)); bool evalToDest = EvalToDestAtCompileTime && alphaIsCompatible; RhsScalar compatibleAlpha = get_factor::run(actualAlpha); ei_declare_aligned_stack_constructed_variable(ResScalar,actualDestPtr,dest.size(), evalToDest ? dest.data() : static_dest.data()); if(!evalToDest) { #ifdef EIGEN_DENSE_STORAGE_CTOR_PLUGIN Index size = dest.size(); EIGEN_DENSE_STORAGE_CTOR_PLUGIN #endif if(!alphaIsCompatible) { MappedDest(actualDestPtr, dest.size()).setZero(); compatibleAlpha = RhsScalar(1); } else MappedDest(actualDestPtr, dest.size()) = dest; } internal::triangular_matrix_vector_product ::run(actualLhs.rows(),actualLhs.cols(), actualLhs.data(),actualLhs.outerStride(), actualRhs.data(),actualRhs.innerStride(), actualDestPtr,1,compatibleAlpha); if (!evalToDest) { if(!alphaIsCompatible) dest += actualAlpha * MappedDest(actualDestPtr, dest.size()); else dest = MappedDest(actualDestPtr, dest.size()); } if ( ((Mode&UnitDiag)==UnitDiag) && (lhs_alpha!=LhsScalar(1)) ) { Index diagSize = (std::min)(lhs.rows(),lhs.cols()); dest.head(diagSize) -= (lhs_alpha-LhsScalar(1))*rhs.head(diagSize); } } }; template struct trmv_selector { template static void run(const Lhs &lhs, const Rhs &rhs, Dest& dest, const typename Dest::Scalar& alpha) { typedef typename Lhs::Scalar LhsScalar; typedef typename Rhs::Scalar RhsScalar; typedef typename Dest::Scalar ResScalar; typedef internal::blas_traits LhsBlasTraits; typedef typename LhsBlasTraits::DirectLinearAccessType ActualLhsType; typedef internal::blas_traits RhsBlasTraits; typedef typename RhsBlasTraits::DirectLinearAccessType ActualRhsType; typedef typename internal::remove_all::type ActualRhsTypeCleaned; typename add_const::type actualLhs = LhsBlasTraits::extract(lhs); typename add_const::type actualRhs = RhsBlasTraits::extract(rhs); LhsScalar lhs_alpha = LhsBlasTraits::extractScalarFactor(lhs); RhsScalar rhs_alpha = RhsBlasTraits::extractScalarFactor(rhs); ResScalar actualAlpha = alpha * lhs_alpha * rhs_alpha; enum { DirectlyUseRhs = ActualRhsTypeCleaned::InnerStrideAtCompileTime==1 }; gemv_static_vector_if static_rhs; ei_declare_aligned_stack_constructed_variable(RhsScalar,actualRhsPtr,actualRhs.size(), DirectlyUseRhs ? const_cast(actualRhs.data()) : static_rhs.data()); if(!DirectlyUseRhs) { #ifdef EIGEN_DENSE_STORAGE_CTOR_PLUGIN Index size = actualRhs.size(); EIGEN_DENSE_STORAGE_CTOR_PLUGIN #endif Map(actualRhsPtr, actualRhs.size()) = actualRhs; } internal::triangular_matrix_vector_product ::run(actualLhs.rows(),actualLhs.cols(), actualLhs.data(),actualLhs.outerStride(), actualRhsPtr,1, dest.data(),dest.innerStride(), actualAlpha); if ( ((Mode&UnitDiag)==UnitDiag) && (lhs_alpha!=LhsScalar(1)) ) { Index diagSize = (std::min)(lhs.rows(),lhs.cols()); dest.head(diagSize) -= (lhs_alpha-LhsScalar(1))*rhs.head(diagSize); } } }; } // end namespace internal } // end namespace Eigen #endif // EIGEN_TRIANGULARMATRIXVECTOR_H RcppEigen/inst/include/Eigen/src/Core/products/GeneralMatrixVector.h0000644000176200001440000005233414562417221025210 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2016 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_GENERAL_MATRIX_VECTOR_H #define EIGEN_GENERAL_MATRIX_VECTOR_H namespace Eigen { namespace internal { enum GEMVPacketSizeType { GEMVPacketFull = 0, GEMVPacketHalf, GEMVPacketQuarter }; template struct gemv_packet_cond { typedef T3 type; }; template struct gemv_packet_cond { typedef T1 type; }; template struct gemv_packet_cond { typedef T2 type; }; template class gemv_traits { typedef typename ScalarBinaryOpTraits::ReturnType ResScalar; #define PACKET_DECL_COND_PREFIX(prefix, name, packet_size) \ typedef typename gemv_packet_cond::type, \ typename packet_traits::half, \ typename unpacket_traits::half>::half>::type \ prefix ## name ## Packet PACKET_DECL_COND_PREFIX(_, Lhs, _PacketSize); PACKET_DECL_COND_PREFIX(_, Rhs, _PacketSize); PACKET_DECL_COND_PREFIX(_, Res, _PacketSize); #undef PACKET_DECL_COND_PREFIX public: enum { Vectorizable = unpacket_traits<_LhsPacket>::vectorizable && unpacket_traits<_RhsPacket>::vectorizable && int(unpacket_traits<_LhsPacket>::size)==int(unpacket_traits<_RhsPacket>::size), LhsPacketSize = Vectorizable ? unpacket_traits<_LhsPacket>::size : 1, RhsPacketSize = Vectorizable ? unpacket_traits<_RhsPacket>::size : 1, ResPacketSize = Vectorizable ? unpacket_traits<_ResPacket>::size : 1 }; typedef typename conditional::type LhsPacket; typedef typename conditional::type RhsPacket; typedef typename conditional::type ResPacket; }; /* Optimized col-major matrix * vector product: * This algorithm processes the matrix per vertical panels, * which are then processed horizontaly per chunck of 8*PacketSize x 1 vertical segments. * * Mixing type logic: C += alpha * A * B * | A | B |alpha| comments * |real |cplx |cplx | no vectorization * |real |cplx |real | alpha is converted to a cplx when calling the run function, no vectorization * |cplx |real |cplx | invalid, the caller has to do tmp: = A * B; C += alpha*tmp * |cplx |real |real | optimal case, vectorization possible via real-cplx mul * * The same reasoning apply for the transposed case. */ template struct general_matrix_vector_product { typedef gemv_traits Traits; typedef gemv_traits HalfTraits; typedef gemv_traits QuarterTraits; typedef typename ScalarBinaryOpTraits::ReturnType ResScalar; typedef typename Traits::LhsPacket LhsPacket; typedef typename Traits::RhsPacket RhsPacket; typedef typename Traits::ResPacket ResPacket; typedef typename HalfTraits::LhsPacket LhsPacketHalf; typedef typename HalfTraits::RhsPacket RhsPacketHalf; typedef typename HalfTraits::ResPacket ResPacketHalf; typedef typename QuarterTraits::LhsPacket LhsPacketQuarter; typedef typename QuarterTraits::RhsPacket RhsPacketQuarter; typedef typename QuarterTraits::ResPacket ResPacketQuarter; EIGEN_DEVICE_FUNC EIGEN_DONT_INLINE static void run( Index rows, Index cols, const LhsMapper& lhs, const RhsMapper& rhs, ResScalar* res, Index resIncr, RhsScalar alpha); }; template EIGEN_DEVICE_FUNC EIGEN_DONT_INLINE void general_matrix_vector_product::run( Index rows, Index cols, const LhsMapper& alhs, const RhsMapper& rhs, ResScalar* res, Index resIncr, RhsScalar alpha) { EIGEN_UNUSED_VARIABLE(resIncr); eigen_internal_assert(resIncr==1); // The following copy tells the compiler that lhs's attributes are not modified outside this function // This helps GCC to generate propoer code. LhsMapper lhs(alhs); conj_helper cj; conj_helper pcj; conj_helper pcj_half; conj_helper pcj_quarter; const Index lhsStride = lhs.stride(); // TODO: for padded aligned inputs, we could enable aligned reads enum { LhsAlignment = Unaligned, ResPacketSize = Traits::ResPacketSize, ResPacketSizeHalf = HalfTraits::ResPacketSize, ResPacketSizeQuarter = QuarterTraits::ResPacketSize, LhsPacketSize = Traits::LhsPacketSize, HasHalf = (int)ResPacketSizeHalf < (int)ResPacketSize, HasQuarter = (int)ResPacketSizeQuarter < (int)ResPacketSizeHalf }; const Index n8 = rows-8*ResPacketSize+1; const Index n4 = rows-4*ResPacketSize+1; const Index n3 = rows-3*ResPacketSize+1; const Index n2 = rows-2*ResPacketSize+1; const Index n1 = rows-1*ResPacketSize+1; const Index n_half = rows-1*ResPacketSizeHalf+1; const Index n_quarter = rows-1*ResPacketSizeQuarter+1; // TODO: improve the following heuristic: const Index block_cols = cols<128 ? cols : (lhsStride*sizeof(LhsScalar)<32000?16:4); ResPacket palpha = pset1(alpha); ResPacketHalf palpha_half = pset1(alpha); ResPacketQuarter palpha_quarter = pset1(alpha); for(Index j2=0; j2(ResScalar(0)), c1 = pset1(ResScalar(0)), c2 = pset1(ResScalar(0)), c3 = pset1(ResScalar(0)), c4 = pset1(ResScalar(0)), c5 = pset1(ResScalar(0)), c6 = pset1(ResScalar(0)), c7 = pset1(ResScalar(0)); for(Index j=j2; j(rhs(j,0)); c0 = pcj.pmadd(lhs.template load(i+LhsPacketSize*0,j),b0,c0); c1 = pcj.pmadd(lhs.template load(i+LhsPacketSize*1,j),b0,c1); c2 = pcj.pmadd(lhs.template load(i+LhsPacketSize*2,j),b0,c2); c3 = pcj.pmadd(lhs.template load(i+LhsPacketSize*3,j),b0,c3); c4 = pcj.pmadd(lhs.template load(i+LhsPacketSize*4,j),b0,c4); c5 = pcj.pmadd(lhs.template load(i+LhsPacketSize*5,j),b0,c5); c6 = pcj.pmadd(lhs.template load(i+LhsPacketSize*6,j),b0,c6); c7 = pcj.pmadd(lhs.template load(i+LhsPacketSize*7,j),b0,c7); } pstoreu(res+i+ResPacketSize*0, pmadd(c0,palpha,ploadu(res+i+ResPacketSize*0))); pstoreu(res+i+ResPacketSize*1, pmadd(c1,palpha,ploadu(res+i+ResPacketSize*1))); pstoreu(res+i+ResPacketSize*2, pmadd(c2,palpha,ploadu(res+i+ResPacketSize*2))); pstoreu(res+i+ResPacketSize*3, pmadd(c3,palpha,ploadu(res+i+ResPacketSize*3))); pstoreu(res+i+ResPacketSize*4, pmadd(c4,palpha,ploadu(res+i+ResPacketSize*4))); pstoreu(res+i+ResPacketSize*5, pmadd(c5,palpha,ploadu(res+i+ResPacketSize*5))); pstoreu(res+i+ResPacketSize*6, pmadd(c6,palpha,ploadu(res+i+ResPacketSize*6))); pstoreu(res+i+ResPacketSize*7, pmadd(c7,palpha,ploadu(res+i+ResPacketSize*7))); } if(i(ResScalar(0)), c1 = pset1(ResScalar(0)), c2 = pset1(ResScalar(0)), c3 = pset1(ResScalar(0)); for(Index j=j2; j(rhs(j,0)); c0 = pcj.pmadd(lhs.template load(i+LhsPacketSize*0,j),b0,c0); c1 = pcj.pmadd(lhs.template load(i+LhsPacketSize*1,j),b0,c1); c2 = pcj.pmadd(lhs.template load(i+LhsPacketSize*2,j),b0,c2); c3 = pcj.pmadd(lhs.template load(i+LhsPacketSize*3,j),b0,c3); } pstoreu(res+i+ResPacketSize*0, pmadd(c0,palpha,ploadu(res+i+ResPacketSize*0))); pstoreu(res+i+ResPacketSize*1, pmadd(c1,palpha,ploadu(res+i+ResPacketSize*1))); pstoreu(res+i+ResPacketSize*2, pmadd(c2,palpha,ploadu(res+i+ResPacketSize*2))); pstoreu(res+i+ResPacketSize*3, pmadd(c3,palpha,ploadu(res+i+ResPacketSize*3))); i+=ResPacketSize*4; } if(i(ResScalar(0)), c1 = pset1(ResScalar(0)), c2 = pset1(ResScalar(0)); for(Index j=j2; j(rhs(j,0)); c0 = pcj.pmadd(lhs.template load(i+LhsPacketSize*0,j),b0,c0); c1 = pcj.pmadd(lhs.template load(i+LhsPacketSize*1,j),b0,c1); c2 = pcj.pmadd(lhs.template load(i+LhsPacketSize*2,j),b0,c2); } pstoreu(res+i+ResPacketSize*0, pmadd(c0,palpha,ploadu(res+i+ResPacketSize*0))); pstoreu(res+i+ResPacketSize*1, pmadd(c1,palpha,ploadu(res+i+ResPacketSize*1))); pstoreu(res+i+ResPacketSize*2, pmadd(c2,palpha,ploadu(res+i+ResPacketSize*2))); i+=ResPacketSize*3; } if(i(ResScalar(0)), c1 = pset1(ResScalar(0)); for(Index j=j2; j(rhs(j,0)); c0 = pcj.pmadd(lhs.template load(i+LhsPacketSize*0,j),b0,c0); c1 = pcj.pmadd(lhs.template load(i+LhsPacketSize*1,j),b0,c1); } pstoreu(res+i+ResPacketSize*0, pmadd(c0,palpha,ploadu(res+i+ResPacketSize*0))); pstoreu(res+i+ResPacketSize*1, pmadd(c1,palpha,ploadu(res+i+ResPacketSize*1))); i+=ResPacketSize*2; } if(i(ResScalar(0)); for(Index j=j2; j(rhs(j,0)); c0 = pcj.pmadd(lhs.template load(i+0,j),b0,c0); } pstoreu(res+i+ResPacketSize*0, pmadd(c0,palpha,ploadu(res+i+ResPacketSize*0))); i+=ResPacketSize; } if(HasHalf && i(ResScalar(0)); for(Index j=j2; j(rhs(j,0)); c0 = pcj_half.pmadd(lhs.template load(i+0,j),b0,c0); } pstoreu(res+i+ResPacketSizeHalf*0, pmadd(c0,palpha_half,ploadu(res+i+ResPacketSizeHalf*0))); i+=ResPacketSizeHalf; } if(HasQuarter && i(ResScalar(0)); for(Index j=j2; j(rhs(j,0)); c0 = pcj_quarter.pmadd(lhs.template load(i+0,j),b0,c0); } pstoreu(res+i+ResPacketSizeQuarter*0, pmadd(c0,palpha_quarter,ploadu(res+i+ResPacketSizeQuarter*0))); i+=ResPacketSizeQuarter; } for(;i struct general_matrix_vector_product { typedef gemv_traits Traits; typedef gemv_traits HalfTraits; typedef gemv_traits QuarterTraits; typedef typename ScalarBinaryOpTraits::ReturnType ResScalar; typedef typename Traits::LhsPacket LhsPacket; typedef typename Traits::RhsPacket RhsPacket; typedef typename Traits::ResPacket ResPacket; typedef typename HalfTraits::LhsPacket LhsPacketHalf; typedef typename HalfTraits::RhsPacket RhsPacketHalf; typedef typename HalfTraits::ResPacket ResPacketHalf; typedef typename QuarterTraits::LhsPacket LhsPacketQuarter; typedef typename QuarterTraits::RhsPacket RhsPacketQuarter; typedef typename QuarterTraits::ResPacket ResPacketQuarter; EIGEN_DEVICE_FUNC EIGEN_DONT_INLINE static void run( Index rows, Index cols, const LhsMapper& lhs, const RhsMapper& rhs, ResScalar* res, Index resIncr, ResScalar alpha); }; template EIGEN_DEVICE_FUNC EIGEN_DONT_INLINE void general_matrix_vector_product::run( Index rows, Index cols, const LhsMapper& alhs, const RhsMapper& rhs, ResScalar* res, Index resIncr, ResScalar alpha) { // The following copy tells the compiler that lhs's attributes are not modified outside this function // This helps GCC to generate propoer code. LhsMapper lhs(alhs); eigen_internal_assert(rhs.stride()==1); conj_helper cj; conj_helper pcj; conj_helper pcj_half; conj_helper pcj_quarter; // TODO: fine tune the following heuristic. The rationale is that if the matrix is very large, // processing 8 rows at once might be counter productive wrt cache. const Index n8 = lhs.stride()*sizeof(LhsScalar)>32000 ? 0 : rows-7; const Index n4 = rows-3; const Index n2 = rows-1; // TODO: for padded aligned inputs, we could enable aligned reads enum { LhsAlignment = Unaligned, ResPacketSize = Traits::ResPacketSize, ResPacketSizeHalf = HalfTraits::ResPacketSize, ResPacketSizeQuarter = QuarterTraits::ResPacketSize, LhsPacketSize = Traits::LhsPacketSize, LhsPacketSizeHalf = HalfTraits::LhsPacketSize, LhsPacketSizeQuarter = QuarterTraits::LhsPacketSize, HasHalf = (int)ResPacketSizeHalf < (int)ResPacketSize, HasQuarter = (int)ResPacketSizeQuarter < (int)ResPacketSizeHalf }; Index i=0; for(; i(ResScalar(0)), c1 = pset1(ResScalar(0)), c2 = pset1(ResScalar(0)), c3 = pset1(ResScalar(0)), c4 = pset1(ResScalar(0)), c5 = pset1(ResScalar(0)), c6 = pset1(ResScalar(0)), c7 = pset1(ResScalar(0)); Index j=0; for(; j+LhsPacketSize<=cols; j+=LhsPacketSize) { RhsPacket b0 = rhs.template load(j,0); c0 = pcj.pmadd(lhs.template load(i+0,j),b0,c0); c1 = pcj.pmadd(lhs.template load(i+1,j),b0,c1); c2 = pcj.pmadd(lhs.template load(i+2,j),b0,c2); c3 = pcj.pmadd(lhs.template load(i+3,j),b0,c3); c4 = pcj.pmadd(lhs.template load(i+4,j),b0,c4); c5 = pcj.pmadd(lhs.template load(i+5,j),b0,c5); c6 = pcj.pmadd(lhs.template load(i+6,j),b0,c6); c7 = pcj.pmadd(lhs.template load(i+7,j),b0,c7); } ResScalar cc0 = predux(c0); ResScalar cc1 = predux(c1); ResScalar cc2 = predux(c2); ResScalar cc3 = predux(c3); ResScalar cc4 = predux(c4); ResScalar cc5 = predux(c5); ResScalar cc6 = predux(c6); ResScalar cc7 = predux(c7); for(; j(ResScalar(0)), c1 = pset1(ResScalar(0)), c2 = pset1(ResScalar(0)), c3 = pset1(ResScalar(0)); Index j=0; for(; j+LhsPacketSize<=cols; j+=LhsPacketSize) { RhsPacket b0 = rhs.template load(j,0); c0 = pcj.pmadd(lhs.template load(i+0,j),b0,c0); c1 = pcj.pmadd(lhs.template load(i+1,j),b0,c1); c2 = pcj.pmadd(lhs.template load(i+2,j),b0,c2); c3 = pcj.pmadd(lhs.template load(i+3,j),b0,c3); } ResScalar cc0 = predux(c0); ResScalar cc1 = predux(c1); ResScalar cc2 = predux(c2); ResScalar cc3 = predux(c3); for(; j(ResScalar(0)), c1 = pset1(ResScalar(0)); Index j=0; for(; j+LhsPacketSize<=cols; j+=LhsPacketSize) { RhsPacket b0 = rhs.template load(j,0); c0 = pcj.pmadd(lhs.template load(i+0,j),b0,c0); c1 = pcj.pmadd(lhs.template load(i+1,j),b0,c1); } ResScalar cc0 = predux(c0); ResScalar cc1 = predux(c1); for(; j(ResScalar(0)); ResPacketHalf c0_h = pset1(ResScalar(0)); ResPacketQuarter c0_q = pset1(ResScalar(0)); Index j=0; for(; j+LhsPacketSize<=cols; j+=LhsPacketSize) { RhsPacket b0 = rhs.template load(j,0); c0 = pcj.pmadd(lhs.template load(i,j),b0,c0); } ResScalar cc0 = predux(c0); if (HasHalf) { for(; j+LhsPacketSizeHalf<=cols; j+=LhsPacketSizeHalf) { RhsPacketHalf b0 = rhs.template load(j,0); c0_h = pcj_half.pmadd(lhs.template load(i,j),b0,c0_h); } cc0 += predux(c0_h); } if (HasQuarter) { for(; j+LhsPacketSizeQuarter<=cols; j+=LhsPacketSizeQuarter) { RhsPacketQuarter b0 = rhs.template load(j,0); c0_q = pcj_quarter.pmadd(lhs.template load(i,j),b0,c0_q); } cc0 += predux(c0_q); } for(; j // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SELFADJOINT_MATRIX_MATRIX_H #define EIGEN_SELFADJOINT_MATRIX_MATRIX_H namespace Eigen { namespace internal { // pack a selfadjoint block diagonal for use with the gebp_kernel template struct symm_pack_lhs { template inline void pack(Scalar* blockA, const const_blas_data_mapper& lhs, Index cols, Index i, Index& count) { // normal copy for(Index k=0; k::type>::half HalfPacket; typedef typename unpacket_traits::type>::half>::half QuarterPacket; enum { PacketSize = packet_traits::size, HalfPacketSize = unpacket_traits::size, QuarterPacketSize = unpacket_traits::size, HasHalf = (int)HalfPacketSize < (int)PacketSize, HasQuarter = (int)QuarterPacketSize < (int)HalfPacketSize}; const_blas_data_mapper lhs(_lhs,lhsStride); Index count = 0; //Index peeled_mc3 = (rows/Pack1)*Pack1; const Index peeled_mc3 = Pack1>=3*PacketSize ? (rows/(3*PacketSize))*(3*PacketSize) : 0; const Index peeled_mc2 = Pack1>=2*PacketSize ? peeled_mc3+((rows-peeled_mc3)/(2*PacketSize))*(2*PacketSize) : 0; const Index peeled_mc1 = Pack1>=1*PacketSize ? peeled_mc2+((rows-peeled_mc2)/(1*PacketSize))*(1*PacketSize) : 0; const Index peeled_mc_half = Pack1>=HalfPacketSize ? peeled_mc1+((rows-peeled_mc1)/(HalfPacketSize))*(HalfPacketSize) : 0; const Index peeled_mc_quarter = Pack1>=QuarterPacketSize ? peeled_mc_half+((rows-peeled_mc_half)/(QuarterPacketSize))*(QuarterPacketSize) : 0; if(Pack1>=3*PacketSize) for(Index i=0; i(blockA, lhs, cols, i, count); if(Pack1>=2*PacketSize) for(Index i=peeled_mc3; i(blockA, lhs, cols, i, count); if(Pack1>=1*PacketSize) for(Index i=peeled_mc2; i(blockA, lhs, cols, i, count); if(HasHalf && Pack1>=HalfPacketSize) for(Index i=peeled_mc1; i(blockA, lhs, cols, i, count); if(HasQuarter && Pack1>=QuarterPacketSize) for(Index i=peeled_mc_half; i(blockA, lhs, cols, i, count); // do the same with mr==1 for(Index i=peeled_mc_quarter; i struct symm_pack_rhs { enum { PacketSize = packet_traits::size }; void operator()(Scalar* blockB, const Scalar* _rhs, Index rhsStride, Index rows, Index cols, Index k2) { Index end_k = k2 + rows; Index count = 0; const_blas_data_mapper rhs(_rhs,rhsStride); Index packet_cols8 = nr>=8 ? (cols/8) * 8 : 0; Index packet_cols4 = nr>=4 ? (cols/4) * 4 : 0; // first part: normal case for(Index j2=0; j2=4) { blockB[count+2] = rhs(k,j2+2); blockB[count+3] = rhs(k,j2+3); } if (nr>=8) { blockB[count+4] = rhs(k,j2+4); blockB[count+5] = rhs(k,j2+5); blockB[count+6] = rhs(k,j2+6); blockB[count+7] = rhs(k,j2+7); } count += nr; } } // second part: diagonal block Index end8 = nr>=8 ? (std::min)(k2+rows,packet_cols8) : k2; if(nr>=8) { for(Index j2=k2; j2=4) { for(Index j2=end8; j2<(std::min)(k2+rows,packet_cols4); j2+=4) { // again we can split vertically in three different parts (transpose, symmetric, normal) // transpose for(Index k=k2; k=8) { for(Index j2=k2+rows; j2=4) { for(Index j2=(std::max)(packet_cols8,k2+rows); j2 the same with nr==1) for(Index j2=packet_cols4; j2 struct product_selfadjoint_matrix; template struct product_selfadjoint_matrix { static EIGEN_STRONG_INLINE void run( Index rows, Index cols, const Scalar* lhs, Index lhsStride, const Scalar* rhs, Index rhsStride, Scalar* res, Index resIncr, Index resStride, const Scalar& alpha, level3_blocking& blocking) { product_selfadjoint_matrix::IsComplex && EIGEN_LOGICAL_XOR(RhsSelfAdjoint,ConjugateRhs), EIGEN_LOGICAL_XOR(LhsSelfAdjoint,LhsStorageOrder==RowMajor) ? ColMajor : RowMajor, LhsSelfAdjoint, NumTraits::IsComplex && EIGEN_LOGICAL_XOR(LhsSelfAdjoint,ConjugateLhs), ColMajor,ResInnerStride> ::run(cols, rows, rhs, rhsStride, lhs, lhsStride, res, resIncr, resStride, alpha, blocking); } }; template struct product_selfadjoint_matrix { static EIGEN_DONT_INLINE void run( Index rows, Index cols, const Scalar* _lhs, Index lhsStride, const Scalar* _rhs, Index rhsStride, Scalar* res, Index resIncr, Index resStride, const Scalar& alpha, level3_blocking& blocking); }; template EIGEN_DONT_INLINE void product_selfadjoint_matrix::run( Index rows, Index cols, const Scalar* _lhs, Index lhsStride, const Scalar* _rhs, Index rhsStride, Scalar* _res, Index resIncr, Index resStride, const Scalar& alpha, level3_blocking& blocking) { Index size = rows; typedef gebp_traits Traits; typedef const_blas_data_mapper LhsMapper; typedef const_blas_data_mapper LhsTransposeMapper; typedef const_blas_data_mapper RhsMapper; typedef blas_data_mapper ResMapper; LhsMapper lhs(_lhs,lhsStride); LhsTransposeMapper lhs_transpose(_lhs,lhsStride); RhsMapper rhs(_rhs,rhsStride); ResMapper res(_res, resStride, resIncr); Index kc = blocking.kc(); // cache block size along the K direction Index mc = (std::min)(rows,blocking.mc()); // cache block size along the M direction // kc must be smaller than mc kc = (std::min)(kc,mc); std::size_t sizeA = kc*mc; std::size_t sizeB = kc*cols; ei_declare_aligned_stack_constructed_variable(Scalar, blockA, sizeA, blocking.blockA()); ei_declare_aligned_stack_constructed_variable(Scalar, blockB, sizeB, blocking.blockB()); gebp_kernel gebp_kernel; symm_pack_lhs pack_lhs; gemm_pack_rhs pack_rhs; gemm_pack_lhs pack_lhs_transposed; for(Index k2=0; k2 transposed packed copy // 2 - the diagonal block => special packed copy // 3 - the panel below the diagonal block => generic packed copy for(Index i2=0; i2() (blockA, lhs.getSubMapper(i2, k2), actual_kc, actual_mc); gebp_kernel(res.getSubMapper(i2, 0), blockA, blockB, actual_mc, actual_kc, cols, alpha); } } } // matrix * selfadjoint product template struct product_selfadjoint_matrix { static EIGEN_DONT_INLINE void run( Index rows, Index cols, const Scalar* _lhs, Index lhsStride, const Scalar* _rhs, Index rhsStride, Scalar* res, Index resIncr, Index resStride, const Scalar& alpha, level3_blocking& blocking); }; template EIGEN_DONT_INLINE void product_selfadjoint_matrix::run( Index rows, Index cols, const Scalar* _lhs, Index lhsStride, const Scalar* _rhs, Index rhsStride, Scalar* _res, Index resIncr, Index resStride, const Scalar& alpha, level3_blocking& blocking) { Index size = cols; typedef gebp_traits Traits; typedef const_blas_data_mapper LhsMapper; typedef blas_data_mapper ResMapper; LhsMapper lhs(_lhs,lhsStride); ResMapper res(_res,resStride, resIncr); Index kc = blocking.kc(); // cache block size along the K direction Index mc = (std::min)(rows,blocking.mc()); // cache block size along the M direction std::size_t sizeA = kc*mc; std::size_t sizeB = kc*cols; ei_declare_aligned_stack_constructed_variable(Scalar, blockA, sizeA, blocking.blockA()); ei_declare_aligned_stack_constructed_variable(Scalar, blockB, sizeB, blocking.blockB()); gebp_kernel gebp_kernel; gemm_pack_lhs pack_lhs; symm_pack_rhs pack_rhs; for(Index k2=0; k2 GEPP for(Index i2=0; i2 struct selfadjoint_product_impl { typedef typename Product::Scalar Scalar; typedef internal::blas_traits LhsBlasTraits; typedef typename LhsBlasTraits::DirectLinearAccessType ActualLhsType; typedef internal::blas_traits RhsBlasTraits; typedef typename RhsBlasTraits::DirectLinearAccessType ActualRhsType; enum { LhsIsUpper = (LhsMode&(Upper|Lower))==Upper, LhsIsSelfAdjoint = (LhsMode&SelfAdjoint)==SelfAdjoint, RhsIsUpper = (RhsMode&(Upper|Lower))==Upper, RhsIsSelfAdjoint = (RhsMode&SelfAdjoint)==SelfAdjoint }; template static void run(Dest &dst, const Lhs &a_lhs, const Rhs &a_rhs, const Scalar& alpha) { eigen_assert(dst.rows()==a_lhs.rows() && dst.cols()==a_rhs.cols()); typename internal::add_const_on_value_type::type lhs = LhsBlasTraits::extract(a_lhs); typename internal::add_const_on_value_type::type rhs = RhsBlasTraits::extract(a_rhs); Scalar actualAlpha = alpha * LhsBlasTraits::extractScalarFactor(a_lhs) * RhsBlasTraits::extractScalarFactor(a_rhs); typedef internal::gemm_blocking_space<(Dest::Flags&RowMajorBit) ? RowMajor : ColMajor,Scalar,Scalar, Lhs::MaxRowsAtCompileTime, Rhs::MaxColsAtCompileTime, Lhs::MaxColsAtCompileTime,1> BlockingType; BlockingType blocking(lhs.rows(), rhs.cols(), lhs.cols(), 1, false); internal::product_selfadjoint_matrix::Flags &RowMajorBit) ? RowMajor : ColMajor, LhsIsSelfAdjoint, NumTraits::IsComplex && EIGEN_LOGICAL_XOR(LhsIsUpper,bool(LhsBlasTraits::NeedToConjugate)), EIGEN_LOGICAL_XOR(RhsIsUpper,internal::traits::Flags &RowMajorBit) ? RowMajor : ColMajor, RhsIsSelfAdjoint, NumTraits::IsComplex && EIGEN_LOGICAL_XOR(RhsIsUpper,bool(RhsBlasTraits::NeedToConjugate)), internal::traits::Flags&RowMajorBit ? RowMajor : ColMajor, Dest::InnerStrideAtCompileTime> ::run( lhs.rows(), rhs.cols(), // sizes &lhs.coeffRef(0,0), lhs.outerStride(), // lhs info &rhs.coeffRef(0,0), rhs.outerStride(), // rhs info &dst.coeffRef(0,0), dst.innerStride(), dst.outerStride(), // result info actualAlpha, blocking // alpha ); } }; } // end namespace internal } // end namespace Eigen #endif // EIGEN_SELFADJOINT_MATRIX_MATRIX_H RcppEigen/inst/include/Eigen/src/Core/products/GeneralMatrixMatrixTriangular_BLAS.h0000644000176200001440000001543014562417221030040 0ustar liggesusers/* Copyright (c) 2011, Intel Corporation. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ******************************************************************************** * Content : Eigen bindings to BLAS F77 * Level 3 BLAS SYRK/HERK implementation. ******************************************************************************** */ #ifndef EIGEN_GENERAL_MATRIX_MATRIX_TRIANGULAR_BLAS_H #define EIGEN_GENERAL_MATRIX_MATRIX_TRIANGULAR_BLAS_H namespace Eigen { namespace internal { template struct general_matrix_matrix_rankupdate : general_matrix_matrix_triangular_product< Index,Scalar,AStorageOrder,ConjugateA,Scalar,AStorageOrder,ConjugateA,ResStorageOrder,1,UpLo,BuiltIn> {}; // try to go to BLAS specialization #define EIGEN_BLAS_RANKUPDATE_SPECIALIZE(Scalar) \ template \ struct general_matrix_matrix_triangular_product { \ static EIGEN_STRONG_INLINE void run(Index size, Index depth,const Scalar* lhs, Index lhsStride, \ const Scalar* rhs, Index rhsStride, Scalar* res, Index resIncr, Index resStride, Scalar alpha, level3_blocking& blocking) \ { \ if ( lhs==rhs && ((UpLo&(Lower|Upper))==UpLo) ) { \ general_matrix_matrix_rankupdate \ ::run(size,depth,lhs,lhsStride,rhs,rhsStride,res,resStride,alpha,blocking); \ } else { \ general_matrix_matrix_triangular_product \ ::run(size,depth,lhs,lhsStride,rhs,rhsStride,res,resIncr,resStride,alpha,blocking); \ } \ } \ }; EIGEN_BLAS_RANKUPDATE_SPECIALIZE(double) EIGEN_BLAS_RANKUPDATE_SPECIALIZE(float) // TODO handle complex cases // EIGEN_BLAS_RANKUPDATE_SPECIALIZE(dcomplex) // EIGEN_BLAS_RANKUPDATE_SPECIALIZE(scomplex) // SYRK for float/double #define EIGEN_BLAS_RANKUPDATE_R(EIGTYPE, BLASTYPE, BLASFUNC) \ template \ struct general_matrix_matrix_rankupdate { \ enum { \ IsLower = (UpLo&Lower) == Lower, \ LowUp = IsLower ? Lower : Upper, \ conjA = ((AStorageOrder==ColMajor) && ConjugateA) ? 1 : 0 \ }; \ static EIGEN_STRONG_INLINE void run(Index size, Index depth,const EIGTYPE* lhs, Index lhsStride, \ const EIGTYPE* /*rhs*/, Index /*rhsStride*/, EIGTYPE* res, Index resStride, EIGTYPE alpha, level3_blocking& /*blocking*/) \ { \ /* typedef Matrix MatrixRhs;*/ \ \ BlasIndex lda=convert_index(lhsStride), ldc=convert_index(resStride), n=convert_index(size), k=convert_index(depth); \ char uplo=((IsLower) ? 'L' : 'U'), trans=((AStorageOrder==RowMajor) ? 'T':'N'); \ EIGTYPE beta(1); \ BLASFUNC(&uplo, &trans, &n, &k, (const BLASTYPE*)&numext::real_ref(alpha), lhs, &lda, (const BLASTYPE*)&numext::real_ref(beta), res, &ldc); \ } \ }; // HERK for complex data #define EIGEN_BLAS_RANKUPDATE_C(EIGTYPE, BLASTYPE, RTYPE, BLASFUNC) \ template \ struct general_matrix_matrix_rankupdate { \ enum { \ IsLower = (UpLo&Lower) == Lower, \ LowUp = IsLower ? Lower : Upper, \ conjA = (((AStorageOrder==ColMajor) && ConjugateA) || ((AStorageOrder==RowMajor) && !ConjugateA)) ? 1 : 0 \ }; \ static EIGEN_STRONG_INLINE void run(Index size, Index depth,const EIGTYPE* lhs, Index lhsStride, \ const EIGTYPE* /*rhs*/, Index /*rhsStride*/, EIGTYPE* res, Index resStride, EIGTYPE alpha, level3_blocking& /*blocking*/) \ { \ typedef Matrix MatrixType; \ \ BlasIndex lda=convert_index(lhsStride), ldc=convert_index(resStride), n=convert_index(size), k=convert_index(depth); \ char uplo=((IsLower) ? 'L' : 'U'), trans=((AStorageOrder==RowMajor) ? 'C':'N'); \ RTYPE alpha_, beta_; \ const EIGTYPE* a_ptr; \ \ alpha_ = alpha.real(); \ beta_ = 1.0; \ /* Copy with conjugation in some cases*/ \ MatrixType a; \ if (conjA) { \ Map > mapA(lhs,n,k,OuterStride<>(lhsStride)); \ a = mapA.conjugate(); \ lda = a.outerStride(); \ a_ptr = a.data(); \ } else a_ptr=lhs; \ BLASFUNC(&uplo, &trans, &n, &k, &alpha_, (BLASTYPE*)a_ptr, &lda, &beta_, (BLASTYPE*)res, &ldc); \ } \ }; #ifdef EIGEN_USE_MKL EIGEN_BLAS_RANKUPDATE_R(double, double, dsyrk) EIGEN_BLAS_RANKUPDATE_R(float, float, ssyrk) #else EIGEN_BLAS_RANKUPDATE_R(double, double, dsyrk_) EIGEN_BLAS_RANKUPDATE_R(float, float, ssyrk_) #endif // TODO hanlde complex cases // EIGEN_BLAS_RANKUPDATE_C(dcomplex, double, double, zherk_) // EIGEN_BLAS_RANKUPDATE_C(scomplex, float, float, cherk_) } // end namespace internal } // end namespace Eigen #endif // EIGEN_GENERAL_MATRIX_MATRIX_TRIANGULAR_BLAS_H RcppEigen/inst/include/Eigen/src/Core/products/SelfadjointProduct.h0000644000176200001440000001402414562417221025060 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SELFADJOINT_PRODUCT_H #define EIGEN_SELFADJOINT_PRODUCT_H /********************************************************************** * This file implements a self adjoint product: C += A A^T updating only * half of the selfadjoint matrix C. * It corresponds to the level 3 SYRK and level 2 SYR Blas routines. **********************************************************************/ namespace Eigen { template struct selfadjoint_rank1_update { static void run(Index size, Scalar* mat, Index stride, const Scalar* vecX, const Scalar* vecY, const Scalar& alpha) { internal::conj_if cj; typedef Map > OtherMap; typedef typename internal::conditional::type ConjLhsType; for (Index i=0; i >(mat+stride*i+(UpLo==Lower ? i : 0), (UpLo==Lower ? size-i : (i+1))) += (alpha * cj(vecY[i])) * ConjLhsType(OtherMap(vecX+(UpLo==Lower ? i : 0),UpLo==Lower ? size-i : (i+1))); } } }; template struct selfadjoint_rank1_update { static void run(Index size, Scalar* mat, Index stride, const Scalar* vecX, const Scalar* vecY, const Scalar& alpha) { selfadjoint_rank1_update::run(size,mat,stride,vecY,vecX,alpha); } }; template struct selfadjoint_product_selector; template struct selfadjoint_product_selector { static void run(MatrixType& mat, const OtherType& other, const typename MatrixType::Scalar& alpha) { typedef typename MatrixType::Scalar Scalar; typedef internal::blas_traits OtherBlasTraits; typedef typename OtherBlasTraits::DirectLinearAccessType ActualOtherType; typedef typename internal::remove_all::type _ActualOtherType; typename internal::add_const_on_value_type::type actualOther = OtherBlasTraits::extract(other.derived()); Scalar actualAlpha = alpha * OtherBlasTraits::extractScalarFactor(other.derived()); enum { StorageOrder = (internal::traits::Flags&RowMajorBit) ? RowMajor : ColMajor, UseOtherDirectly = _ActualOtherType::InnerStrideAtCompileTime==1 }; internal::gemv_static_vector_if static_other; ei_declare_aligned_stack_constructed_variable(Scalar, actualOtherPtr, other.size(), (UseOtherDirectly ? const_cast(actualOther.data()) : static_other.data())); if(!UseOtherDirectly) Map(actualOtherPtr, actualOther.size()) = actualOther; selfadjoint_rank1_update::IsComplex, (!OtherBlasTraits::NeedToConjugate) && NumTraits::IsComplex> ::run(other.size(), mat.data(), mat.outerStride(), actualOtherPtr, actualOtherPtr, actualAlpha); } }; template struct selfadjoint_product_selector { static void run(MatrixType& mat, const OtherType& other, const typename MatrixType::Scalar& alpha) { typedef typename MatrixType::Scalar Scalar; typedef internal::blas_traits OtherBlasTraits; typedef typename OtherBlasTraits::DirectLinearAccessType ActualOtherType; typedef typename internal::remove_all::type _ActualOtherType; typename internal::add_const_on_value_type::type actualOther = OtherBlasTraits::extract(other.derived()); Scalar actualAlpha = alpha * OtherBlasTraits::extractScalarFactor(other.derived()); enum { IsRowMajor = (internal::traits::Flags&RowMajorBit) ? 1 : 0, OtherIsRowMajor = _ActualOtherType::Flags&RowMajorBit ? 1 : 0 }; Index size = mat.cols(); Index depth = actualOther.cols(); typedef internal::gemm_blocking_space BlockingType; BlockingType blocking(size, size, depth, 1, false); internal::general_matrix_matrix_triangular_product::IsComplex, Scalar, OtherIsRowMajor ? ColMajor : RowMajor, (!OtherBlasTraits::NeedToConjugate) && NumTraits::IsComplex, IsRowMajor ? RowMajor : ColMajor, MatrixType::InnerStrideAtCompileTime, UpLo> ::run(size, depth, actualOther.data(), actualOther.outerStride(), actualOther.data(), actualOther.outerStride(), mat.data(), mat.innerStride(), mat.outerStride(), actualAlpha, blocking); } }; // high level API template template EIGEN_DEVICE_FUNC SelfAdjointView& SelfAdjointView ::rankUpdate(const MatrixBase& u, const Scalar& alpha) { selfadjoint_product_selector::run(_expression().const_cast_derived(), u.derived(), alpha); return *this; } } // end namespace Eigen #endif // EIGEN_SELFADJOINT_PRODUCT_H RcppEigen/inst/include/Eigen/src/Core/products/GeneralMatrixVector_BLAS.h0000644000176200001440000001434014107270226026001 0ustar liggesusers/* Copyright (c) 2011, Intel Corporation. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ******************************************************************************** * Content : Eigen bindings to BLAS F77 * General matrix-vector product functionality based on ?GEMV. ******************************************************************************** */ #ifndef EIGEN_GENERAL_MATRIX_VECTOR_BLAS_H #define EIGEN_GENERAL_MATRIX_VECTOR_BLAS_H namespace Eigen { namespace internal { /********************************************************************** * This file implements general matrix-vector multiplication using BLAS * gemv function via partial specialization of * general_matrix_vector_product::run(..) method for float, double, * std::complex and std::complex types **********************************************************************/ // gemv specialization template struct general_matrix_vector_product_gemv; #define EIGEN_BLAS_GEMV_SPECIALIZE(Scalar) \ template \ struct general_matrix_vector_product,ColMajor,ConjugateLhs,Scalar,const_blas_data_mapper,ConjugateRhs,Specialized> { \ static void run( \ Index rows, Index cols, \ const const_blas_data_mapper &lhs, \ const const_blas_data_mapper &rhs, \ Scalar* res, Index resIncr, Scalar alpha) \ { \ if (ConjugateLhs) { \ general_matrix_vector_product,ColMajor,ConjugateLhs,Scalar,const_blas_data_mapper,ConjugateRhs,BuiltIn>::run( \ rows, cols, lhs, rhs, res, resIncr, alpha); \ } else { \ general_matrix_vector_product_gemv::run( \ rows, cols, lhs.data(), lhs.stride(), rhs.data(), rhs.stride(), res, resIncr, alpha); \ } \ } \ }; \ template \ struct general_matrix_vector_product,RowMajor,ConjugateLhs,Scalar,const_blas_data_mapper,ConjugateRhs,Specialized> { \ static void run( \ Index rows, Index cols, \ const const_blas_data_mapper &lhs, \ const const_blas_data_mapper &rhs, \ Scalar* res, Index resIncr, Scalar alpha) \ { \ general_matrix_vector_product_gemv::run( \ rows, cols, lhs.data(), lhs.stride(), rhs.data(), rhs.stride(), res, resIncr, alpha); \ } \ }; \ EIGEN_BLAS_GEMV_SPECIALIZE(double) EIGEN_BLAS_GEMV_SPECIALIZE(float) EIGEN_BLAS_GEMV_SPECIALIZE(dcomplex) EIGEN_BLAS_GEMV_SPECIALIZE(scomplex) #define EIGEN_BLAS_GEMV_SPECIALIZATION(EIGTYPE,BLASTYPE,BLASFUNC) \ template \ struct general_matrix_vector_product_gemv \ { \ typedef Matrix GEMVVector;\ \ static void run( \ Index rows, Index cols, \ const EIGTYPE* lhs, Index lhsStride, \ const EIGTYPE* rhs, Index rhsIncr, \ EIGTYPE* res, Index resIncr, EIGTYPE alpha) \ { \ BlasIndex m=convert_index(rows), n=convert_index(cols), \ lda=convert_index(lhsStride), incx=convert_index(rhsIncr), incy=convert_index(resIncr); \ const EIGTYPE beta(1); \ const EIGTYPE *x_ptr; \ char trans=(LhsStorageOrder==ColMajor) ? 'N' : (ConjugateLhs) ? 'C' : 'T'; \ if (LhsStorageOrder==RowMajor) { \ m = convert_index(cols); \ n = convert_index(rows); \ }\ GEMVVector x_tmp; \ if (ConjugateRhs) { \ Map > map_x(rhs,cols,1,InnerStride<>(incx)); \ x_tmp=map_x.conjugate(); \ x_ptr=x_tmp.data(); \ incx=1; \ } else x_ptr=rhs; \ BLASFUNC(&trans, &m, &n, (const BLASTYPE*)&numext::real_ref(alpha), (const BLASTYPE*)lhs, &lda, (const BLASTYPE*)x_ptr, &incx, (const BLASTYPE*)&numext::real_ref(beta), (BLASTYPE*)res, &incy); \ }\ }; #ifdef EIGEN_USE_MKL EIGEN_BLAS_GEMV_SPECIALIZATION(double, double, dgemv) EIGEN_BLAS_GEMV_SPECIALIZATION(float, float, sgemv) EIGEN_BLAS_GEMV_SPECIALIZATION(dcomplex, MKL_Complex16, zgemv) EIGEN_BLAS_GEMV_SPECIALIZATION(scomplex, MKL_Complex8 , cgemv) #else EIGEN_BLAS_GEMV_SPECIALIZATION(double, double, dgemv_) EIGEN_BLAS_GEMV_SPECIALIZATION(float, float, sgemv_) EIGEN_BLAS_GEMV_SPECIALIZATION(dcomplex, double, zgemv_) EIGEN_BLAS_GEMV_SPECIALIZATION(scomplex, float, cgemv_) #endif } // end namespase internal } // end namespace Eigen #endif // EIGEN_GENERAL_MATRIX_VECTOR_BLAS_H RcppEigen/inst/include/Eigen/src/Core/products/SelfadjointMatrixMatrix_BLAS.h0000644000176200001440000002646214107270226026700 0ustar liggesusers/* Copyright (c) 2011, Intel Corporation. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. // ******************************************************************************** * Content : Eigen bindings to BLAS F77 * Self adjoint matrix * matrix product functionality based on ?SYMM/?HEMM. ******************************************************************************** */ #ifndef EIGEN_SELFADJOINT_MATRIX_MATRIX_BLAS_H #define EIGEN_SELFADJOINT_MATRIX_MATRIX_BLAS_H namespace Eigen { namespace internal { /* Optimized selfadjoint matrix * matrix (?SYMM/?HEMM) product */ #define EIGEN_BLAS_SYMM_L(EIGTYPE, BLASTYPE, EIGPREFIX, BLASFUNC) \ template \ struct product_selfadjoint_matrix \ {\ \ static void run( \ Index rows, Index cols, \ const EIGTYPE* _lhs, Index lhsStride, \ const EIGTYPE* _rhs, Index rhsStride, \ EIGTYPE* res, Index resIncr, Index resStride, \ EIGTYPE alpha, level3_blocking& /*blocking*/) \ { \ EIGEN_ONLY_USED_FOR_DEBUG(resIncr); \ eigen_assert(resIncr == 1); \ char side='L', uplo='L'; \ BlasIndex m, n, lda, ldb, ldc; \ const EIGTYPE *a, *b; \ EIGTYPE beta(1); \ MatrixX##EIGPREFIX b_tmp; \ \ /* Set transpose options */ \ /* Set m, n, k */ \ m = convert_index(rows); \ n = convert_index(cols); \ \ /* Set lda, ldb, ldc */ \ lda = convert_index(lhsStride); \ ldb = convert_index(rhsStride); \ ldc = convert_index(resStride); \ \ /* Set a, b, c */ \ if (LhsStorageOrder==RowMajor) uplo='U'; \ a = _lhs; \ \ if (RhsStorageOrder==RowMajor) { \ Map > rhs(_rhs,n,m,OuterStride<>(rhsStride)); \ b_tmp = rhs.adjoint(); \ b = b_tmp.data(); \ ldb = convert_index(b_tmp.outerStride()); \ } else b = _rhs; \ \ BLASFUNC(&side, &uplo, &m, &n, (const BLASTYPE*)&numext::real_ref(alpha), (const BLASTYPE*)a, &lda, (const BLASTYPE*)b, &ldb, (const BLASTYPE*)&numext::real_ref(beta), (BLASTYPE*)res, &ldc); \ \ } \ }; #define EIGEN_BLAS_HEMM_L(EIGTYPE, BLASTYPE, EIGPREFIX, BLASFUNC) \ template \ struct product_selfadjoint_matrix \ {\ static void run( \ Index rows, Index cols, \ const EIGTYPE* _lhs, Index lhsStride, \ const EIGTYPE* _rhs, Index rhsStride, \ EIGTYPE* res, Index resIncr, Index resStride, \ EIGTYPE alpha, level3_blocking& /*blocking*/) \ { \ EIGEN_ONLY_USED_FOR_DEBUG(resIncr); \ eigen_assert(resIncr == 1); \ char side='L', uplo='L'; \ BlasIndex m, n, lda, ldb, ldc; \ const EIGTYPE *a, *b; \ EIGTYPE beta(1); \ MatrixX##EIGPREFIX b_tmp; \ Matrix a_tmp; \ \ /* Set transpose options */ \ /* Set m, n, k */ \ m = convert_index(rows); \ n = convert_index(cols); \ \ /* Set lda, ldb, ldc */ \ lda = convert_index(lhsStride); \ ldb = convert_index(rhsStride); \ ldc = convert_index(resStride); \ \ /* Set a, b, c */ \ if (((LhsStorageOrder==ColMajor) && ConjugateLhs) || ((LhsStorageOrder==RowMajor) && (!ConjugateLhs))) { \ Map, 0, OuterStride<> > lhs(_lhs,m,m,OuterStride<>(lhsStride)); \ a_tmp = lhs.conjugate(); \ a = a_tmp.data(); \ lda = convert_index(a_tmp.outerStride()); \ } else a = _lhs; \ if (LhsStorageOrder==RowMajor) uplo='U'; \ \ if (RhsStorageOrder==ColMajor && (!ConjugateRhs)) { \ b = _rhs; } \ else { \ if (RhsStorageOrder==ColMajor && ConjugateRhs) { \ Map > rhs(_rhs,m,n,OuterStride<>(rhsStride)); \ b_tmp = rhs.conjugate(); \ } else \ if (ConjugateRhs) { \ Map > rhs(_rhs,n,m,OuterStride<>(rhsStride)); \ b_tmp = rhs.adjoint(); \ } else { \ Map > rhs(_rhs,n,m,OuterStride<>(rhsStride)); \ b_tmp = rhs.transpose(); \ } \ b = b_tmp.data(); \ ldb = convert_index(b_tmp.outerStride()); \ } \ \ BLASFUNC(&side, &uplo, &m, &n, (const BLASTYPE*)&numext::real_ref(alpha), (const BLASTYPE*)a, &lda, (const BLASTYPE*)b, &ldb, (const BLASTYPE*)&numext::real_ref(beta), (BLASTYPE*)res, &ldc); \ \ } \ }; #ifdef EIGEN_USE_MKL EIGEN_BLAS_SYMM_L(double, double, d, dsymm) EIGEN_BLAS_SYMM_L(float, float, f, ssymm) EIGEN_BLAS_HEMM_L(dcomplex, MKL_Complex16, cd, zhemm) EIGEN_BLAS_HEMM_L(scomplex, MKL_Complex8, cf, chemm) #else EIGEN_BLAS_SYMM_L(double, double, d, dsymm_) EIGEN_BLAS_SYMM_L(float, float, f, ssymm_) EIGEN_BLAS_HEMM_L(dcomplex, double, cd, zhemm_) EIGEN_BLAS_HEMM_L(scomplex, float, cf, chemm_) #endif /* Optimized matrix * selfadjoint matrix (?SYMM/?HEMM) product */ #define EIGEN_BLAS_SYMM_R(EIGTYPE, BLASTYPE, EIGPREFIX, BLASFUNC) \ template \ struct product_selfadjoint_matrix \ {\ \ static void run( \ Index rows, Index cols, \ const EIGTYPE* _lhs, Index lhsStride, \ const EIGTYPE* _rhs, Index rhsStride, \ EIGTYPE* res, Index resIncr, Index resStride, \ EIGTYPE alpha, level3_blocking& /*blocking*/) \ { \ EIGEN_ONLY_USED_FOR_DEBUG(resIncr); \ eigen_assert(resIncr == 1); \ char side='R', uplo='L'; \ BlasIndex m, n, lda, ldb, ldc; \ const EIGTYPE *a, *b; \ EIGTYPE beta(1); \ MatrixX##EIGPREFIX b_tmp; \ \ /* Set m, n, k */ \ m = convert_index(rows); \ n = convert_index(cols); \ \ /* Set lda, ldb, ldc */ \ lda = convert_index(rhsStride); \ ldb = convert_index(lhsStride); \ ldc = convert_index(resStride); \ \ /* Set a, b, c */ \ if (RhsStorageOrder==RowMajor) uplo='U'; \ a = _rhs; \ \ if (LhsStorageOrder==RowMajor) { \ Map > lhs(_lhs,n,m,OuterStride<>(rhsStride)); \ b_tmp = lhs.adjoint(); \ b = b_tmp.data(); \ ldb = convert_index(b_tmp.outerStride()); \ } else b = _lhs; \ \ BLASFUNC(&side, &uplo, &m, &n, (const BLASTYPE*)&numext::real_ref(alpha), (const BLASTYPE*)a, &lda, (const BLASTYPE*)b, &ldb, (const BLASTYPE*)&numext::real_ref(beta), (BLASTYPE*)res, &ldc); \ \ } \ }; #define EIGEN_BLAS_HEMM_R(EIGTYPE, BLASTYPE, EIGPREFIX, BLASFUNC) \ template \ struct product_selfadjoint_matrix \ {\ static void run( \ Index rows, Index cols, \ const EIGTYPE* _lhs, Index lhsStride, \ const EIGTYPE* _rhs, Index rhsStride, \ EIGTYPE* res, Index resIncr, Index resStride, \ EIGTYPE alpha, level3_blocking& /*blocking*/) \ { \ EIGEN_ONLY_USED_FOR_DEBUG(resIncr); \ eigen_assert(resIncr == 1); \ char side='R', uplo='L'; \ BlasIndex m, n, lda, ldb, ldc; \ const EIGTYPE *a, *b; \ EIGTYPE beta(1); \ MatrixX##EIGPREFIX b_tmp; \ Matrix a_tmp; \ \ /* Set m, n, k */ \ m = convert_index(rows); \ n = convert_index(cols); \ \ /* Set lda, ldb, ldc */ \ lda = convert_index(rhsStride); \ ldb = convert_index(lhsStride); \ ldc = convert_index(resStride); \ \ /* Set a, b, c */ \ if (((RhsStorageOrder==ColMajor) && ConjugateRhs) || ((RhsStorageOrder==RowMajor) && (!ConjugateRhs))) { \ Map, 0, OuterStride<> > rhs(_rhs,n,n,OuterStride<>(rhsStride)); \ a_tmp = rhs.conjugate(); \ a = a_tmp.data(); \ lda = convert_index(a_tmp.outerStride()); \ } else a = _rhs; \ if (RhsStorageOrder==RowMajor) uplo='U'; \ \ if (LhsStorageOrder==ColMajor && (!ConjugateLhs)) { \ b = _lhs; } \ else { \ if (LhsStorageOrder==ColMajor && ConjugateLhs) { \ Map > lhs(_lhs,m,n,OuterStride<>(lhsStride)); \ b_tmp = lhs.conjugate(); \ } else \ if (ConjugateLhs) { \ Map > lhs(_lhs,n,m,OuterStride<>(lhsStride)); \ b_tmp = lhs.adjoint(); \ } else { \ Map > lhs(_lhs,n,m,OuterStride<>(lhsStride)); \ b_tmp = lhs.transpose(); \ } \ b = b_tmp.data(); \ ldb = convert_index(b_tmp.outerStride()); \ } \ \ BLASFUNC(&side, &uplo, &m, &n, (const BLASTYPE*)&numext::real_ref(alpha), (const BLASTYPE*)a, &lda, (const BLASTYPE*)b, &ldb, (const BLASTYPE*)&numext::real_ref(beta), (BLASTYPE*)res, &ldc); \ } \ }; #ifdef EIGEN_USE_MKL EIGEN_BLAS_SYMM_R(double, double, d, dsymm) EIGEN_BLAS_SYMM_R(float, float, f, ssymm) EIGEN_BLAS_HEMM_R(dcomplex, MKL_Complex16, cd, zhemm) EIGEN_BLAS_HEMM_R(scomplex, MKL_Complex8, cf, chemm) #else EIGEN_BLAS_SYMM_R(double, double, d, dsymm_) EIGEN_BLAS_SYMM_R(float, float, f, ssymm_) EIGEN_BLAS_HEMM_R(dcomplex, double, cd, zhemm_) EIGEN_BLAS_HEMM_R(scomplex, float, cf, chemm_) #endif } // end namespace internal } // end namespace Eigen #endif // EIGEN_SELFADJOINT_MATRIX_MATRIX_BLAS_H RcppEigen/inst/include/Eigen/src/Core/products/Parallelizer.h0000644000176200001440000001271614562417221023711 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2010 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_PARALLELIZER_H #define EIGEN_PARALLELIZER_H #if EIGEN_HAS_CXX11_ATOMIC #include #endif namespace Eigen { namespace internal { /** \internal */ inline void manage_multi_threading(Action action, int* v) { static int m_maxThreads = -1; EIGEN_UNUSED_VARIABLE(m_maxThreads) if(action==SetAction) { eigen_internal_assert(v!=0); m_maxThreads = *v; } else if(action==GetAction) { eigen_internal_assert(v!=0); #ifdef EIGEN_HAS_OPENMP if(m_maxThreads>0) *v = m_maxThreads; else *v = omp_get_max_threads(); #else *v = 1; #endif } else { eigen_internal_assert(false); } } } /** Must be call first when calling Eigen from multiple threads */ inline void initParallel() { int nbt; internal::manage_multi_threading(GetAction, &nbt); std::ptrdiff_t l1, l2, l3; internal::manage_caching_sizes(GetAction, &l1, &l2, &l3); } /** \returns the max number of threads reserved for Eigen * \sa setNbThreads */ inline int nbThreads() { int ret; internal::manage_multi_threading(GetAction, &ret); return ret; } /** Sets the max number of threads reserved for Eigen * \sa nbThreads */ inline void setNbThreads(int v) { internal::manage_multi_threading(SetAction, &v); } namespace internal { template struct GemmParallelInfo { GemmParallelInfo() : sync(-1), users(0), lhs_start(0), lhs_length(0) {} // volatile is not enough on all architectures (see bug 1572) // to guarantee that when thread A says to thread B that it is // done with packing a block, then all writes have been really // carried out... C++11 memory model+atomic guarantees this. #if EIGEN_HAS_CXX11_ATOMIC std::atomic sync; std::atomic users; #else Index volatile sync; int volatile users; #endif Index lhs_start; Index lhs_length; }; template void parallelize_gemm(const Functor& func, Index rows, Index cols, Index depth, bool transpose) { // TODO when EIGEN_USE_BLAS is defined, // we should still enable OMP for other scalar types // Without C++11, we have to disable GEMM's parallelization on // non x86 architectures because there volatile is not enough for our purpose. // See bug 1572. #if (! defined(EIGEN_HAS_OPENMP)) || defined(EIGEN_USE_BLAS) || ((!EIGEN_HAS_CXX11_ATOMIC) && !(EIGEN_ARCH_i386_OR_x86_64)) // FIXME the transpose variable is only needed to properly split // the matrix product when multithreading is enabled. This is a temporary // fix to support row-major destination matrices. This whole // parallelizer mechanism has to be redesigned anyway. EIGEN_UNUSED_VARIABLE(depth); EIGEN_UNUSED_VARIABLE(transpose); func(0,rows, 0,cols); #else // Dynamically check whether we should enable or disable OpenMP. // The conditions are: // - the max number of threads we can create is greater than 1 // - we are not already in a parallel code // - the sizes are large enough // compute the maximal number of threads from the size of the product: // This first heuristic takes into account that the product kernel is fully optimized when working with nr columns at once. Index size = transpose ? rows : cols; Index pb_max_threads = std::max(1,size / Functor::Traits::nr); // compute the maximal number of threads from the total amount of work: double work = static_cast(rows) * static_cast(cols) * static_cast(depth); double kMinTaskSize = 50000; // FIXME improve this heuristic. pb_max_threads = std::max(1, std::min(pb_max_threads, static_cast( work / kMinTaskSize ) )); // compute the number of threads we are going to use Index threads = std::min(nbThreads(), pb_max_threads); // if multi-threading is explicitly disabled, not useful, or if we already are in a parallel session, // then abort multi-threading // FIXME omp_get_num_threads()>1 only works for openmp, what if the user does not use openmp? if((!Condition) || (threads==1) || (omp_get_num_threads()>1)) return func(0,rows, 0,cols); Eigen::initParallel(); func.initParallelSession(threads); if(transpose) std::swap(rows,cols); ei_declare_aligned_stack_constructed_variable(GemmParallelInfo,info,threads,0); #pragma omp parallel num_threads(threads) { Index i = omp_get_thread_num(); // Note that the actual number of threads might be lower than the number of request ones. Index actual_threads = omp_get_num_threads(); Index blockCols = (cols / actual_threads) & ~Index(0x3); Index blockRows = (rows / actual_threads); blockRows = (blockRows/Functor::Traits::mr)*Functor::Traits::mr; Index r0 = i*blockRows; Index actualBlockRows = (i+1==actual_threads) ? rows-r0 : blockRows; Index c0 = i*blockCols; Index actualBlockCols = (i+1==actual_threads) ? cols-c0 : blockCols; info[i].lhs_start = r0; info[i].lhs_length = actualBlockRows; if(transpose) func(c0, actualBlockCols, 0, rows, info); else func(0, rows, c0, actualBlockCols, info); } #endif } } // end namespace internal } // end namespace Eigen #endif // EIGEN_PARALLELIZER_H RcppEigen/inst/include/Eigen/src/Core/products/TriangularSolverMatrix.h0000644000176200001440000003452614562417221025756 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_TRIANGULAR_SOLVER_MATRIX_H #define EIGEN_TRIANGULAR_SOLVER_MATRIX_H namespace Eigen { namespace internal { // if the rhs is row major, let's transpose the product template struct triangular_solve_matrix { static void run( Index size, Index cols, const Scalar* tri, Index triStride, Scalar* _other, Index otherIncr, Index otherStride, level3_blocking& blocking) { triangular_solve_matrix< Scalar, Index, Side==OnTheLeft?OnTheRight:OnTheLeft, (Mode&UnitDiag) | ((Mode&Upper) ? Lower : Upper), NumTraits::IsComplex && Conjugate, TriStorageOrder==RowMajor ? ColMajor : RowMajor, ColMajor, OtherInnerStride> ::run(size, cols, tri, triStride, _other, otherIncr, otherStride, blocking); } }; /* Optimized triangular solver with multiple right hand side and the triangular matrix on the left */ template struct triangular_solve_matrix { static EIGEN_DONT_INLINE void run( Index size, Index otherSize, const Scalar* _tri, Index triStride, Scalar* _other, Index otherIncr, Index otherStride, level3_blocking& blocking); }; template EIGEN_DONT_INLINE void triangular_solve_matrix::run( Index size, Index otherSize, const Scalar* _tri, Index triStride, Scalar* _other, Index otherIncr, Index otherStride, level3_blocking& blocking) { Index cols = otherSize; typedef const_blas_data_mapper TriMapper; typedef blas_data_mapper OtherMapper; TriMapper tri(_tri, triStride); OtherMapper other(_other, otherStride, otherIncr); typedef gebp_traits Traits; enum { SmallPanelWidth = EIGEN_PLAIN_ENUM_MAX(Traits::mr,Traits::nr), IsLower = (Mode&Lower) == Lower }; Index kc = blocking.kc(); // cache block size along the K direction Index mc = (std::min)(size,blocking.mc()); // cache block size along the M direction std::size_t sizeA = kc*mc; std::size_t sizeB = kc*cols; ei_declare_aligned_stack_constructed_variable(Scalar, blockA, sizeA, blocking.blockA()); ei_declare_aligned_stack_constructed_variable(Scalar, blockB, sizeB, blocking.blockB()); conj_if conj; gebp_kernel gebp_kernel; gemm_pack_lhs pack_lhs; gemm_pack_rhs pack_rhs; // the goal here is to subdivise the Rhs panels such that we keep some cache // coherence when accessing the rhs elements std::ptrdiff_t l1, l2, l3; manage_caching_sizes(GetAction, &l1, &l2, &l3); Index subcols = cols>0 ? l2/(4 * sizeof(Scalar) * std::max(otherStride,size)) : 0; subcols = std::max((subcols/Traits::nr)*Traits::nr, Traits::nr); for(Index k2=IsLower ? 0 : size; IsLower ? k20; IsLower ? k2+=kc : k2-=kc) { const Index actual_kc = (std::min)(IsLower ? size-k2 : k2, kc); // We have selected and packed a big horizontal panel R1 of rhs. Let B be the packed copy of this panel, // and R2 the remaining part of rhs. The corresponding vertical panel of lhs is split into // A11 (the triangular part) and A21 the remaining rectangular part. // Then the high level algorithm is: // - B = R1 => general block copy (done during the next step) // - R1 = A11^-1 B => tricky part // - update B from the new R1 => actually this has to be performed continuously during the above step // - R2 -= A21 * B => GEPP // The tricky part: compute R1 = A11^-1 B while updating B from R1 // The idea is to split A11 into multiple small vertical panels. // Each panel can be split into a small triangular part T1k which is processed without optimization, // and the remaining small part T2k which is processed using gebp with appropriate block strides for(Index j2=0; j2(actual_kc-k1, SmallPanelWidth); // tr solve for (Index k=0; k0) { Index startTarget = IsLower ? k2+k1+actualPanelWidth : k2-actual_kc; pack_lhs(blockA, tri.getSubMapper(startTarget,startBlock), actualPanelWidth, lengthTarget); gebp_kernel(other.getSubMapper(startTarget,j2), blockA, blockB+actual_kc*j2, lengthTarget, actualPanelWidth, actual_cols, Scalar(-1), actualPanelWidth, actual_kc, 0, blockBOffset); } } } // R2 -= A21 * B => GEPP { Index start = IsLower ? k2+kc : 0; Index end = IsLower ? size : k2-kc; for(Index i2=start; i20) { pack_lhs(blockA, tri.getSubMapper(i2, IsLower ? k2 : k2-kc), actual_kc, actual_mc); gebp_kernel(other.getSubMapper(i2, 0), blockA, blockB, actual_mc, actual_kc, cols, Scalar(-1), -1, -1, 0, 0); } } } } } /* Optimized triangular solver with multiple left hand sides and the triangular matrix on the right */ template struct triangular_solve_matrix { static EIGEN_DONT_INLINE void run( Index size, Index otherSize, const Scalar* _tri, Index triStride, Scalar* _other, Index otherIncr, Index otherStride, level3_blocking& blocking); }; template EIGEN_DONT_INLINE void triangular_solve_matrix::run( Index size, Index otherSize, const Scalar* _tri, Index triStride, Scalar* _other, Index otherIncr, Index otherStride, level3_blocking& blocking) { Index rows = otherSize; typedef typename NumTraits::Real RealScalar; typedef blas_data_mapper LhsMapper; typedef const_blas_data_mapper RhsMapper; LhsMapper lhs(_other, otherStride, otherIncr); RhsMapper rhs(_tri, triStride); typedef gebp_traits Traits; enum { RhsStorageOrder = TriStorageOrder, SmallPanelWidth = EIGEN_PLAIN_ENUM_MAX(Traits::mr,Traits::nr), IsLower = (Mode&Lower) == Lower }; Index kc = blocking.kc(); // cache block size along the K direction Index mc = (std::min)(rows,blocking.mc()); // cache block size along the M direction std::size_t sizeA = kc*mc; std::size_t sizeB = kc*size; ei_declare_aligned_stack_constructed_variable(Scalar, blockA, sizeA, blocking.blockA()); ei_declare_aligned_stack_constructed_variable(Scalar, blockB, sizeB, blocking.blockB()); conj_if conj; gebp_kernel gebp_kernel; gemm_pack_rhs pack_rhs; gemm_pack_rhs pack_rhs_panel; gemm_pack_lhs pack_lhs_panel; for(Index k2=IsLower ? size : 0; IsLower ? k2>0 : k20) pack_rhs(geb, rhs.getSubMapper(actual_k2,startPanel), actual_kc, rs); // triangular packing (we only pack the panels off the diagonal, // neglecting the blocks overlapping the diagonal { for (Index j2=0; j2(actual_kc-j2, SmallPanelWidth); Index actual_j2 = actual_k2 + j2; Index panelOffset = IsLower ? j2+actualPanelWidth : 0; Index panelLength = IsLower ? actual_kc-j2-actualPanelWidth : j2; if (panelLength>0) pack_rhs_panel(blockB+j2*actual_kc, rhs.getSubMapper(actual_k2+panelOffset, actual_j2), panelLength, actualPanelWidth, actual_kc, panelOffset); } } for(Index i2=0; i2 vertical panels of rhs) for (Index j2 = IsLower ? (actual_kc - ((actual_kc%SmallPanelWidth) ? Index(actual_kc%SmallPanelWidth) : Index(SmallPanelWidth))) : 0; IsLower ? j2>=0 : j2(actual_kc-j2, SmallPanelWidth); Index absolute_j2 = actual_k2 + j2; Index panelOffset = IsLower ? j2+actualPanelWidth : 0; Index panelLength = IsLower ? actual_kc - j2 - actualPanelWidth : j2; // GEBP if(panelLength>0) { gebp_kernel(lhs.getSubMapper(i2,absolute_j2), blockA, blockB+j2*actual_kc, actual_mc, panelLength, actualPanelWidth, Scalar(-1), actual_kc, actual_kc, // strides panelOffset, panelOffset); // offsets } // unblocked triangular solve for (Index k=0; k0) gebp_kernel(lhs.getSubMapper(i2, startPanel), blockA, geb, actual_mc, actual_kc, rs, Scalar(-1), -1, -1, 0, 0); } } } } // end namespace internal } // end namespace Eigen #endif // EIGEN_TRIANGULAR_SOLVER_MATRIX_H RcppEigen/inst/include/Eigen/src/Core/products/TriangularMatrixVector_BLAS.h0000644000176200001440000002451314107270226026537 0ustar liggesusers/* Copyright (c) 2011, Intel Corporation. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ******************************************************************************** * Content : Eigen bindings to BLAS F77 * Triangular matrix-vector product functionality based on ?TRMV. ******************************************************************************** */ #ifndef EIGEN_TRIANGULAR_MATRIX_VECTOR_BLAS_H #define EIGEN_TRIANGULAR_MATRIX_VECTOR_BLAS_H namespace Eigen { namespace internal { /********************************************************************** * This file implements triangular matrix-vector multiplication using BLAS **********************************************************************/ // trmv/hemv specialization template struct triangular_matrix_vector_product_trmv : triangular_matrix_vector_product {}; #define EIGEN_BLAS_TRMV_SPECIALIZE(Scalar) \ template \ struct triangular_matrix_vector_product { \ static void run(Index _rows, Index _cols, const Scalar* _lhs, Index lhsStride, \ const Scalar* _rhs, Index rhsIncr, Scalar* _res, Index resIncr, Scalar alpha) { \ triangular_matrix_vector_product_trmv::run( \ _rows, _cols, _lhs, lhsStride, _rhs, rhsIncr, _res, resIncr, alpha); \ } \ }; \ template \ struct triangular_matrix_vector_product { \ static void run(Index _rows, Index _cols, const Scalar* _lhs, Index lhsStride, \ const Scalar* _rhs, Index rhsIncr, Scalar* _res, Index resIncr, Scalar alpha) { \ triangular_matrix_vector_product_trmv::run( \ _rows, _cols, _lhs, lhsStride, _rhs, rhsIncr, _res, resIncr, alpha); \ } \ }; EIGEN_BLAS_TRMV_SPECIALIZE(double) EIGEN_BLAS_TRMV_SPECIALIZE(float) EIGEN_BLAS_TRMV_SPECIALIZE(dcomplex) EIGEN_BLAS_TRMV_SPECIALIZE(scomplex) // implements col-major: res += alpha * op(triangular) * vector #define EIGEN_BLAS_TRMV_CM(EIGTYPE, BLASTYPE, EIGPREFIX, BLASPREFIX, BLASPOSTFIX) \ template \ struct triangular_matrix_vector_product_trmv { \ enum { \ IsLower = (Mode&Lower) == Lower, \ SetDiag = (Mode&(ZeroDiag|UnitDiag)) ? 0 : 1, \ IsUnitDiag = (Mode&UnitDiag) ? 1 : 0, \ IsZeroDiag = (Mode&ZeroDiag) ? 1 : 0, \ LowUp = IsLower ? Lower : Upper \ }; \ static void run(Index _rows, Index _cols, const EIGTYPE* _lhs, Index lhsStride, \ const EIGTYPE* _rhs, Index rhsIncr, EIGTYPE* _res, Index resIncr, EIGTYPE alpha) \ { \ if (ConjLhs || IsZeroDiag) { \ triangular_matrix_vector_product::run( \ _rows, _cols, _lhs, lhsStride, _rhs, rhsIncr, _res, resIncr, alpha); \ return; \ }\ Index size = (std::min)(_rows,_cols); \ Index rows = IsLower ? _rows : size; \ Index cols = IsLower ? size : _cols; \ \ typedef VectorX##EIGPREFIX VectorRhs; \ EIGTYPE *x, *y;\ \ /* Set x*/ \ Map > rhs(_rhs,cols,InnerStride<>(rhsIncr)); \ VectorRhs x_tmp; \ if (ConjRhs) x_tmp = rhs.conjugate(); else x_tmp = rhs; \ x = x_tmp.data(); \ \ /* Square part handling */\ \ char trans, uplo, diag; \ BlasIndex m, n, lda, incx, incy; \ EIGTYPE const *a; \ EIGTYPE beta(1); \ \ /* Set m, n */ \ n = convert_index(size); \ lda = convert_index(lhsStride); \ incx = 1; \ incy = convert_index(resIncr); \ \ /* Set uplo, trans and diag*/ \ trans = 'N'; \ uplo = IsLower ? 'L' : 'U'; \ diag = IsUnitDiag ? 'U' : 'N'; \ \ /* call ?TRMV*/ \ BLASPREFIX##trmv##BLASPOSTFIX(&uplo, &trans, &diag, &n, (const BLASTYPE*)_lhs, &lda, (BLASTYPE*)x, &incx); \ \ /* Add op(a_tr)rhs into res*/ \ BLASPREFIX##axpy##BLASPOSTFIX(&n, (const BLASTYPE*)&numext::real_ref(alpha),(const BLASTYPE*)x, &incx, (BLASTYPE*)_res, &incy); \ /* Non-square case - doesn't fit to BLAS ?TRMV. Fall to default triangular product*/ \ if (size<(std::max)(rows,cols)) { \ if (ConjRhs) x_tmp = rhs.conjugate(); else x_tmp = rhs; \ x = x_tmp.data(); \ if (size(rows-size); \ n = convert_index(size); \ } \ else { \ x += size; \ y = _res; \ a = _lhs + size*lda; \ m = convert_index(size); \ n = convert_index(cols-size); \ } \ BLASPREFIX##gemv##BLASPOSTFIX(&trans, &m, &n, (const BLASTYPE*)&numext::real_ref(alpha), (const BLASTYPE*)a, &lda, (const BLASTYPE*)x, &incx, (const BLASTYPE*)&numext::real_ref(beta), (BLASTYPE*)y, &incy); \ } \ } \ }; #ifdef EIGEN_USE_MKL EIGEN_BLAS_TRMV_CM(double, double, d, d,) EIGEN_BLAS_TRMV_CM(dcomplex, MKL_Complex16, cd, z,) EIGEN_BLAS_TRMV_CM(float, float, f, s,) EIGEN_BLAS_TRMV_CM(scomplex, MKL_Complex8, cf, c,) #else EIGEN_BLAS_TRMV_CM(double, double, d, d, _) EIGEN_BLAS_TRMV_CM(dcomplex, double, cd, z, _) EIGEN_BLAS_TRMV_CM(float, float, f, s, _) EIGEN_BLAS_TRMV_CM(scomplex, float, cf, c, _) #endif // implements row-major: res += alpha * op(triangular) * vector #define EIGEN_BLAS_TRMV_RM(EIGTYPE, BLASTYPE, EIGPREFIX, BLASPREFIX, BLASPOSTFIX) \ template \ struct triangular_matrix_vector_product_trmv { \ enum { \ IsLower = (Mode&Lower) == Lower, \ SetDiag = (Mode&(ZeroDiag|UnitDiag)) ? 0 : 1, \ IsUnitDiag = (Mode&UnitDiag) ? 1 : 0, \ IsZeroDiag = (Mode&ZeroDiag) ? 1 : 0, \ LowUp = IsLower ? Lower : Upper \ }; \ static void run(Index _rows, Index _cols, const EIGTYPE* _lhs, Index lhsStride, \ const EIGTYPE* _rhs, Index rhsIncr, EIGTYPE* _res, Index resIncr, EIGTYPE alpha) \ { \ if (IsZeroDiag) { \ triangular_matrix_vector_product::run( \ _rows, _cols, _lhs, lhsStride, _rhs, rhsIncr, _res, resIncr, alpha); \ return; \ }\ Index size = (std::min)(_rows,_cols); \ Index rows = IsLower ? _rows : size; \ Index cols = IsLower ? size : _cols; \ \ typedef VectorX##EIGPREFIX VectorRhs; \ EIGTYPE *x, *y;\ \ /* Set x*/ \ Map > rhs(_rhs,cols,InnerStride<>(rhsIncr)); \ VectorRhs x_tmp; \ if (ConjRhs) x_tmp = rhs.conjugate(); else x_tmp = rhs; \ x = x_tmp.data(); \ \ /* Square part handling */\ \ char trans, uplo, diag; \ BlasIndex m, n, lda, incx, incy; \ EIGTYPE const *a; \ EIGTYPE beta(1); \ \ /* Set m, n */ \ n = convert_index(size); \ lda = convert_index(lhsStride); \ incx = 1; \ incy = convert_index(resIncr); \ \ /* Set uplo, trans and diag*/ \ trans = ConjLhs ? 'C' : 'T'; \ uplo = IsLower ? 'U' : 'L'; \ diag = IsUnitDiag ? 'U' : 'N'; \ \ /* call ?TRMV*/ \ BLASPREFIX##trmv##BLASPOSTFIX(&uplo, &trans, &diag, &n, (const BLASTYPE*)_lhs, &lda, (BLASTYPE*)x, &incx); \ \ /* Add op(a_tr)rhs into res*/ \ BLASPREFIX##axpy##BLASPOSTFIX(&n, (const BLASTYPE*)&numext::real_ref(alpha),(const BLASTYPE*)x, &incx, (BLASTYPE*)_res, &incy); \ /* Non-square case - doesn't fit to BLAS ?TRMV. Fall to default triangular product*/ \ if (size<(std::max)(rows,cols)) { \ if (ConjRhs) x_tmp = rhs.conjugate(); else x_tmp = rhs; \ x = x_tmp.data(); \ if (size(rows-size); \ n = convert_index(size); \ } \ else { \ x += size; \ y = _res; \ a = _lhs + size; \ m = convert_index(size); \ n = convert_index(cols-size); \ } \ BLASPREFIX##gemv##BLASPOSTFIX(&trans, &n, &m, (const BLASTYPE*)&numext::real_ref(alpha), (const BLASTYPE*)a, &lda, (const BLASTYPE*)x, &incx, (const BLASTYPE*)&numext::real_ref(beta), (BLASTYPE*)y, &incy); \ } \ } \ }; #ifdef EIGEN_USE_MKL EIGEN_BLAS_TRMV_RM(double, double, d, d,) EIGEN_BLAS_TRMV_RM(dcomplex, MKL_Complex16, cd, z,) EIGEN_BLAS_TRMV_RM(float, float, f, s,) EIGEN_BLAS_TRMV_RM(scomplex, MKL_Complex8, cf, c,) #else EIGEN_BLAS_TRMV_RM(double, double, d, d,_) EIGEN_BLAS_TRMV_RM(dcomplex, double, cd, z,_) EIGEN_BLAS_TRMV_RM(float, float, f, s,_) EIGEN_BLAS_TRMV_RM(scomplex, float, cf, c,_) #endif } // end namespase internal } // end namespace Eigen #endif // EIGEN_TRIANGULAR_MATRIX_VECTOR_BLAS_H RcppEigen/inst/include/Eigen/src/Core/products/TriangularSolverMatrix_BLAS.h0000644000176200001440000001506314107270226026547 0ustar liggesusers/* Copyright (c) 2011, Intel Corporation. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ******************************************************************************** * Content : Eigen bindings to BLAS F77 * Triangular matrix * matrix product functionality based on ?TRMM. ******************************************************************************** */ #ifndef EIGEN_TRIANGULAR_SOLVER_MATRIX_BLAS_H #define EIGEN_TRIANGULAR_SOLVER_MATRIX_BLAS_H namespace Eigen { namespace internal { // implements LeftSide op(triangular)^-1 * general #define EIGEN_BLAS_TRSM_L(EIGTYPE, BLASTYPE, BLASFUNC) \ template \ struct triangular_solve_matrix \ { \ enum { \ IsLower = (Mode&Lower) == Lower, \ IsUnitDiag = (Mode&UnitDiag) ? 1 : 0, \ IsZeroDiag = (Mode&ZeroDiag) ? 1 : 0, \ conjA = ((TriStorageOrder==ColMajor) && Conjugate) ? 1 : 0 \ }; \ static void run( \ Index size, Index otherSize, \ const EIGTYPE* _tri, Index triStride, \ EIGTYPE* _other, Index otherIncr, Index otherStride, level3_blocking& /*blocking*/) \ { \ EIGEN_ONLY_USED_FOR_DEBUG(otherIncr); \ eigen_assert(otherIncr == 1); \ BlasIndex m = convert_index(size), n = convert_index(otherSize), lda, ldb; \ char side = 'L', uplo, diag='N', transa; \ /* Set alpha_ */ \ EIGTYPE alpha(1); \ ldb = convert_index(otherStride);\ \ const EIGTYPE *a; \ /* Set trans */ \ transa = (TriStorageOrder==RowMajor) ? ((Conjugate) ? 'C' : 'T') : 'N'; \ /* Set uplo */ \ uplo = IsLower ? 'L' : 'U'; \ if (TriStorageOrder==RowMajor) uplo = (uplo == 'L') ? 'U' : 'L'; \ /* Set a, lda */ \ typedef Matrix MatrixTri; \ Map > tri(_tri,size,size,OuterStride<>(triStride)); \ MatrixTri a_tmp; \ \ if (conjA) { \ a_tmp = tri.conjugate(); \ a = a_tmp.data(); \ lda = convert_index(a_tmp.outerStride()); \ } else { \ a = _tri; \ lda = convert_index(triStride); \ } \ if (IsUnitDiag) diag='U'; \ /* call ?trsm*/ \ BLASFUNC(&side, &uplo, &transa, &diag, &m, &n, (const BLASTYPE*)&numext::real_ref(alpha), (const BLASTYPE*)a, &lda, (BLASTYPE*)_other, &ldb); \ } \ }; #ifdef EIGEN_USE_MKL EIGEN_BLAS_TRSM_L(double, double, dtrsm) EIGEN_BLAS_TRSM_L(dcomplex, MKL_Complex16, ztrsm) EIGEN_BLAS_TRSM_L(float, float, strsm) EIGEN_BLAS_TRSM_L(scomplex, MKL_Complex8, ctrsm) #else EIGEN_BLAS_TRSM_L(double, double, dtrsm_) EIGEN_BLAS_TRSM_L(dcomplex, double, ztrsm_) EIGEN_BLAS_TRSM_L(float, float, strsm_) EIGEN_BLAS_TRSM_L(scomplex, float, ctrsm_) #endif // implements RightSide general * op(triangular)^-1 #define EIGEN_BLAS_TRSM_R(EIGTYPE, BLASTYPE, BLASFUNC) \ template \ struct triangular_solve_matrix \ { \ enum { \ IsLower = (Mode&Lower) == Lower, \ IsUnitDiag = (Mode&UnitDiag) ? 1 : 0, \ IsZeroDiag = (Mode&ZeroDiag) ? 1 : 0, \ conjA = ((TriStorageOrder==ColMajor) && Conjugate) ? 1 : 0 \ }; \ static void run( \ Index size, Index otherSize, \ const EIGTYPE* _tri, Index triStride, \ EIGTYPE* _other, Index otherIncr, Index otherStride, level3_blocking& /*blocking*/) \ { \ EIGEN_ONLY_USED_FOR_DEBUG(otherIncr); \ eigen_assert(otherIncr == 1); \ BlasIndex m = convert_index(otherSize), n = convert_index(size), lda, ldb; \ char side = 'R', uplo, diag='N', transa; \ /* Set alpha_ */ \ EIGTYPE alpha(1); \ ldb = convert_index(otherStride);\ \ const EIGTYPE *a; \ /* Set trans */ \ transa = (TriStorageOrder==RowMajor) ? ((Conjugate) ? 'C' : 'T') : 'N'; \ /* Set uplo */ \ uplo = IsLower ? 'L' : 'U'; \ if (TriStorageOrder==RowMajor) uplo = (uplo == 'L') ? 'U' : 'L'; \ /* Set a, lda */ \ typedef Matrix MatrixTri; \ Map > tri(_tri,size,size,OuterStride<>(triStride)); \ MatrixTri a_tmp; \ \ if (conjA) { \ a_tmp = tri.conjugate(); \ a = a_tmp.data(); \ lda = convert_index(a_tmp.outerStride()); \ } else { \ a = _tri; \ lda = convert_index(triStride); \ } \ if (IsUnitDiag) diag='U'; \ /* call ?trsm*/ \ BLASFUNC(&side, &uplo, &transa, &diag, &m, &n, (const BLASTYPE*)&numext::real_ref(alpha), (const BLASTYPE*)a, &lda, (BLASTYPE*)_other, &ldb); \ /*std::cout << "TRMS_L specialization!\n";*/ \ } \ }; #ifdef EIGEN_USE_MKL EIGEN_BLAS_TRSM_R(double, double, dtrsm) EIGEN_BLAS_TRSM_R(dcomplex, MKL_Complex16, ztrsm) EIGEN_BLAS_TRSM_R(float, float, strsm) EIGEN_BLAS_TRSM_R(scomplex, MKL_Complex8, ctrsm) #else EIGEN_BLAS_TRSM_R(double, double, dtrsm_) EIGEN_BLAS_TRSM_R(dcomplex, double, ztrsm_) EIGEN_BLAS_TRSM_R(float, float, strsm_) EIGEN_BLAS_TRSM_R(scomplex, float, ctrsm_) #endif } // end namespace internal } // end namespace Eigen #endif // EIGEN_TRIANGULAR_SOLVER_MATRIX_BLAS_H RcppEigen/inst/include/Eigen/src/Core/products/TriangularMatrixMatrix.h0000644000176200001440000005077314562417221025752 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_TRIANGULAR_MATRIX_MATRIX_H #define EIGEN_TRIANGULAR_MATRIX_MATRIX_H namespace Eigen { namespace internal { // template // struct gemm_pack_lhs_triangular // { // Matrix::IsComplex && Conjugate> cj; // const_blas_data_mapper lhs(_lhs,lhsStride); // int count = 0; // const int peeled_mc = (rows/mr)*mr; // for(int i=0; i struct product_triangular_matrix_matrix; template struct product_triangular_matrix_matrix { static EIGEN_STRONG_INLINE void run( Index rows, Index cols, Index depth, const Scalar* lhs, Index lhsStride, const Scalar* rhs, Index rhsStride, Scalar* res, Index resIncr, Index resStride, const Scalar& alpha, level3_blocking& blocking) { product_triangular_matrix_matrix ::run(cols, rows, depth, rhs, rhsStride, lhs, lhsStride, res, resIncr, resStride, alpha, blocking); } }; // implements col-major += alpha * op(triangular) * op(general) template struct product_triangular_matrix_matrix { typedef gebp_traits Traits; enum { SmallPanelWidth = 2 * EIGEN_PLAIN_ENUM_MAX(Traits::mr,Traits::nr), IsLower = (Mode&Lower) == Lower, SetDiag = (Mode&(ZeroDiag|UnitDiag)) ? 0 : 1 }; static EIGEN_DONT_INLINE void run( Index _rows, Index _cols, Index _depth, const Scalar* _lhs, Index lhsStride, const Scalar* _rhs, Index rhsStride, Scalar* res, Index resIncr, Index resStride, const Scalar& alpha, level3_blocking& blocking); }; template EIGEN_DONT_INLINE void product_triangular_matrix_matrix::run( Index _rows, Index _cols, Index _depth, const Scalar* _lhs, Index lhsStride, const Scalar* _rhs, Index rhsStride, Scalar* _res, Index resIncr, Index resStride, const Scalar& alpha, level3_blocking& blocking) { // strip zeros Index diagSize = (std::min)(_rows,_depth); Index rows = IsLower ? _rows : diagSize; Index depth = IsLower ? diagSize : _depth; Index cols = _cols; typedef const_blas_data_mapper LhsMapper; typedef const_blas_data_mapper RhsMapper; typedef blas_data_mapper ResMapper; LhsMapper lhs(_lhs,lhsStride); RhsMapper rhs(_rhs,rhsStride); ResMapper res(_res, resStride, resIncr); Index kc = blocking.kc(); // cache block size along the K direction Index mc = (std::min)(rows,blocking.mc()); // cache block size along the M direction // The small panel size must not be larger than blocking size. // Usually this should never be the case because SmallPanelWidth^2 is very small // compared to L2 cache size, but let's be safe: Index panelWidth = (std::min)(Index(SmallPanelWidth),(std::min)(kc,mc)); std::size_t sizeA = kc*mc; std::size_t sizeB = kc*cols; ei_declare_aligned_stack_constructed_variable(Scalar, blockA, sizeA, blocking.blockA()); ei_declare_aligned_stack_constructed_variable(Scalar, blockB, sizeB, blocking.blockB()); // To work around an "error: member reference base type 'Matrix<...> // (Eigen::internal::constructor_without_unaligned_array_assert (*)())' is // not a structure or union" compilation error in nvcc (tested V8.0.61), // create a dummy internal::constructor_without_unaligned_array_assert // object to pass to the Matrix constructor. internal::constructor_without_unaligned_array_assert a; Matrix triangularBuffer(a); triangularBuffer.setZero(); if((Mode&ZeroDiag)==ZeroDiag) triangularBuffer.diagonal().setZero(); else triangularBuffer.diagonal().setOnes(); gebp_kernel gebp_kernel; gemm_pack_lhs pack_lhs; gemm_pack_rhs pack_rhs; for(Index k2=IsLower ? depth : 0; IsLower ? k2>0 : k2rows)) { actual_kc = rows-k2; k2 = k2+actual_kc-kc; } pack_rhs(blockB, rhs.getSubMapper(actual_k2,0), actual_kc, cols); // the selected lhs's panel has to be split in three different parts: // 1 - the part which is zero => skip it // 2 - the diagonal block => special kernel // 3 - the dense panel below (lower case) or above (upper case) the diagonal block => GEPP // the block diagonal, if any: if(IsLower || actual_k2(actual_kc-k1, panelWidth); Index lengthTarget = IsLower ? actual_kc-k1-actualPanelWidth : k1; Index startBlock = actual_k2+k1; Index blockBOffset = k1; // => GEBP with the micro triangular block // The trick is to pack this micro block while filling the opposite triangular part with zeros. // To this end we do an extra triangular copy to a small temporary buffer for (Index k=0;k0) { Index startTarget = IsLower ? actual_k2+k1+actualPanelWidth : actual_k2; pack_lhs(blockA, lhs.getSubMapper(startTarget,startBlock), actualPanelWidth, lengthTarget); gebp_kernel(res.getSubMapper(startTarget, 0), blockA, blockB, lengthTarget, actualPanelWidth, cols, alpha, actualPanelWidth, actual_kc, 0, blockBOffset); } } } // the part below (lower case) or above (upper case) the diagonal => GEPP { Index start = IsLower ? k2 : 0; Index end = IsLower ? rows : (std::min)(actual_k2,rows); for(Index i2=start; i2() (blockA, lhs.getSubMapper(i2, actual_k2), actual_kc, actual_mc); gebp_kernel(res.getSubMapper(i2, 0), blockA, blockB, actual_mc, actual_kc, cols, alpha, -1, -1, 0, 0); } } } } // implements col-major += alpha * op(general) * op(triangular) template struct product_triangular_matrix_matrix { typedef gebp_traits Traits; enum { SmallPanelWidth = EIGEN_PLAIN_ENUM_MAX(Traits::mr,Traits::nr), IsLower = (Mode&Lower) == Lower, SetDiag = (Mode&(ZeroDiag|UnitDiag)) ? 0 : 1 }; static EIGEN_DONT_INLINE void run( Index _rows, Index _cols, Index _depth, const Scalar* _lhs, Index lhsStride, const Scalar* _rhs, Index rhsStride, Scalar* res, Index resIncr, Index resStride, const Scalar& alpha, level3_blocking& blocking); }; template EIGEN_DONT_INLINE void product_triangular_matrix_matrix::run( Index _rows, Index _cols, Index _depth, const Scalar* _lhs, Index lhsStride, const Scalar* _rhs, Index rhsStride, Scalar* _res, Index resIncr, Index resStride, const Scalar& alpha, level3_blocking& blocking) { const Index PacketBytes = packet_traits::size*sizeof(Scalar); // strip zeros Index diagSize = (std::min)(_cols,_depth); Index rows = _rows; Index depth = IsLower ? _depth : diagSize; Index cols = IsLower ? diagSize : _cols; typedef const_blas_data_mapper LhsMapper; typedef const_blas_data_mapper RhsMapper; typedef blas_data_mapper ResMapper; LhsMapper lhs(_lhs,lhsStride); RhsMapper rhs(_rhs,rhsStride); ResMapper res(_res, resStride, resIncr); Index kc = blocking.kc(); // cache block size along the K direction Index mc = (std::min)(rows,blocking.mc()); // cache block size along the M direction std::size_t sizeA = kc*mc; std::size_t sizeB = kc*cols+EIGEN_MAX_ALIGN_BYTES/sizeof(Scalar); ei_declare_aligned_stack_constructed_variable(Scalar, blockA, sizeA, blocking.blockA()); ei_declare_aligned_stack_constructed_variable(Scalar, blockB, sizeB, blocking.blockB()); internal::constructor_without_unaligned_array_assert a; Matrix triangularBuffer(a); triangularBuffer.setZero(); if((Mode&ZeroDiag)==ZeroDiag) triangularBuffer.diagonal().setZero(); else triangularBuffer.diagonal().setOnes(); gebp_kernel gebp_kernel; gemm_pack_lhs pack_lhs; gemm_pack_rhs pack_rhs; gemm_pack_rhs pack_rhs_panel; for(Index k2=IsLower ? 0 : depth; IsLower ? k20; IsLower ? k2+=kc : k2-=kc) { Index actual_kc = (std::min)(IsLower ? depth-k2 : k2, kc); Index actual_k2 = IsLower ? k2 : k2-actual_kc; // align blocks with the end of the triangular part for trapezoidal rhs if(IsLower && (k2cols)) { actual_kc = cols-k2; k2 = actual_k2 + actual_kc - kc; } // remaining size Index rs = IsLower ? (std::min)(cols,actual_k2) : cols - k2; // size of the triangular part Index ts = (IsLower && actual_k2>=cols) ? 0 : actual_kc; Scalar* geb = blockB+ts*ts; geb = geb + internal::first_aligned(geb,PacketBytes/sizeof(Scalar)); pack_rhs(geb, rhs.getSubMapper(actual_k2,IsLower ? 0 : k2), actual_kc, rs); // pack the triangular part of the rhs padding the unrolled blocks with zeros if(ts>0) { for (Index j2=0; j2(actual_kc-j2, SmallPanelWidth); Index actual_j2 = actual_k2 + j2; Index panelOffset = IsLower ? j2+actualPanelWidth : 0; Index panelLength = IsLower ? actual_kc-j2-actualPanelWidth : j2; // general part pack_rhs_panel(blockB+j2*actual_kc, rhs.getSubMapper(actual_k2+panelOffset, actual_j2), panelLength, actualPanelWidth, actual_kc, panelOffset); // append the triangular part via a temporary buffer for (Index j=0;j0) { for (Index j2=0; j2(actual_kc-j2, SmallPanelWidth); Index panelLength = IsLower ? actual_kc-j2 : j2+actualPanelWidth; Index blockOffset = IsLower ? j2 : 0; gebp_kernel(res.getSubMapper(i2, actual_k2 + j2), blockA, blockB+j2*actual_kc, actual_mc, panelLength, actualPanelWidth, alpha, actual_kc, actual_kc, // strides blockOffset, blockOffset);// offsets } } gebp_kernel(res.getSubMapper(i2, IsLower ? 0 : k2), blockA, geb, actual_mc, actual_kc, rs, alpha, -1, -1, 0, 0); } } } /*************************************************************************** * Wrapper to product_triangular_matrix_matrix ***************************************************************************/ } // end namespace internal namespace internal { template struct triangular_product_impl { template static void run(Dest& dst, const Lhs &a_lhs, const Rhs &a_rhs, const typename Dest::Scalar& alpha) { typedef typename Lhs::Scalar LhsScalar; typedef typename Rhs::Scalar RhsScalar; typedef typename Dest::Scalar Scalar; typedef internal::blas_traits LhsBlasTraits; typedef typename LhsBlasTraits::DirectLinearAccessType ActualLhsType; typedef typename internal::remove_all::type ActualLhsTypeCleaned; typedef internal::blas_traits RhsBlasTraits; typedef typename RhsBlasTraits::DirectLinearAccessType ActualRhsType; typedef typename internal::remove_all::type ActualRhsTypeCleaned; typename internal::add_const_on_value_type::type lhs = LhsBlasTraits::extract(a_lhs); typename internal::add_const_on_value_type::type rhs = RhsBlasTraits::extract(a_rhs); LhsScalar lhs_alpha = LhsBlasTraits::extractScalarFactor(a_lhs); RhsScalar rhs_alpha = RhsBlasTraits::extractScalarFactor(a_rhs); Scalar actualAlpha = alpha * lhs_alpha * rhs_alpha; typedef internal::gemm_blocking_space<(Dest::Flags&RowMajorBit) ? RowMajor : ColMajor,Scalar,Scalar, Lhs::MaxRowsAtCompileTime, Rhs::MaxColsAtCompileTime, Lhs::MaxColsAtCompileTime,4> BlockingType; enum { IsLower = (Mode&Lower) == Lower }; Index stripedRows = ((!LhsIsTriangular) || (IsLower)) ? lhs.rows() : (std::min)(lhs.rows(),lhs.cols()); Index stripedCols = ((LhsIsTriangular) || (!IsLower)) ? rhs.cols() : (std::min)(rhs.cols(),rhs.rows()); Index stripedDepth = LhsIsTriangular ? ((!IsLower) ? lhs.cols() : (std::min)(lhs.cols(),lhs.rows())) : ((IsLower) ? rhs.rows() : (std::min)(rhs.rows(),rhs.cols())); BlockingType blocking(stripedRows, stripedCols, stripedDepth, 1, false); internal::product_triangular_matrix_matrix::Flags&RowMajorBit) ? RowMajor : ColMajor, LhsBlasTraits::NeedToConjugate, (internal::traits::Flags&RowMajorBit) ? RowMajor : ColMajor, RhsBlasTraits::NeedToConjugate, (internal::traits::Flags&RowMajorBit) ? RowMajor : ColMajor, Dest::InnerStrideAtCompileTime> ::run( stripedRows, stripedCols, stripedDepth, // sizes &lhs.coeffRef(0,0), lhs.outerStride(), // lhs info &rhs.coeffRef(0,0), rhs.outerStride(), // rhs info &dst.coeffRef(0,0), dst.innerStride(), dst.outerStride(), // result info actualAlpha, blocking ); // Apply correction if the diagonal is unit and a scalar factor was nested: if ((Mode&UnitDiag)==UnitDiag) { if (LhsIsTriangular && lhs_alpha!=LhsScalar(1)) { Index diagSize = (std::min)(lhs.rows(),lhs.cols()); dst.topRows(diagSize) -= ((lhs_alpha-LhsScalar(1))*a_rhs).topRows(diagSize); } else if ((!LhsIsTriangular) && rhs_alpha!=RhsScalar(1)) { Index diagSize = (std::min)(rhs.rows(),rhs.cols()); dst.leftCols(diagSize) -= (rhs_alpha-RhsScalar(1))*a_lhs.leftCols(diagSize); } } } }; } // end namespace internal } // end namespace Eigen #endif // EIGEN_TRIANGULAR_MATRIX_MATRIX_H RcppEigen/inst/include/Eigen/src/Core/products/TriangularSolverVector.h0000644000176200001440000001337214562417221025750 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2010 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_TRIANGULAR_SOLVER_VECTOR_H #define EIGEN_TRIANGULAR_SOLVER_VECTOR_H namespace Eigen { namespace internal { template struct triangular_solve_vector { static void run(Index size, const LhsScalar* _lhs, Index lhsStride, RhsScalar* rhs) { triangular_solve_vector::run(size, _lhs, lhsStride, rhs); } }; // forward and backward substitution, row-major, rhs is a vector template struct triangular_solve_vector { enum { IsLower = ((Mode&Lower)==Lower) }; static void run(Index size, const LhsScalar* _lhs, Index lhsStride, RhsScalar* rhs) { typedef Map, 0, OuterStride<> > LhsMap; const LhsMap lhs(_lhs,size,size,OuterStride<>(lhsStride)); typedef const_blas_data_mapper LhsMapper; typedef const_blas_data_mapper RhsMapper; typename internal::conditional< Conjugate, const CwiseUnaryOp,LhsMap>, const LhsMap&> ::type cjLhs(lhs); static const Index PanelWidth = EIGEN_TUNE_TRIANGULAR_PANEL_WIDTH; for(Index pi=IsLower ? 0 : size; IsLower ? pi0; IsLower ? pi+=PanelWidth : pi-=PanelWidth) { Index actualPanelWidth = (std::min)(IsLower ? size - pi : pi, PanelWidth); Index r = IsLower ? pi : size - pi; // remaining size if (r > 0) { // let's directly call the low level product function because: // 1 - it is faster to compile // 2 - it is slightly faster at runtime Index startRow = IsLower ? pi : pi-actualPanelWidth; Index startCol = IsLower ? 0 : pi; general_matrix_vector_product::run( actualPanelWidth, r, LhsMapper(&lhs.coeffRef(startRow,startCol), lhsStride), RhsMapper(rhs + startCol, 1), rhs + startRow, 1, RhsScalar(-1)); } for(Index k=0; k0) rhs[i] -= (cjLhs.row(i).segment(s,k).transpose().cwiseProduct(Map >(rhs+s,k))).sum(); if((!(Mode & UnitDiag)) && numext::not_equal_strict(rhs[i],RhsScalar(0))) rhs[i] /= cjLhs(i,i); } } } }; // forward and backward substitution, column-major, rhs is a vector template struct triangular_solve_vector { enum { IsLower = ((Mode&Lower)==Lower) }; static void run(Index size, const LhsScalar* _lhs, Index lhsStride, RhsScalar* rhs) { typedef Map, 0, OuterStride<> > LhsMap; const LhsMap lhs(_lhs,size,size,OuterStride<>(lhsStride)); typedef const_blas_data_mapper LhsMapper; typedef const_blas_data_mapper RhsMapper; typename internal::conditional,LhsMap>, const LhsMap& >::type cjLhs(lhs); static const Index PanelWidth = EIGEN_TUNE_TRIANGULAR_PANEL_WIDTH; for(Index pi=IsLower ? 0 : size; IsLower ? pi0; IsLower ? pi+=PanelWidth : pi-=PanelWidth) { Index actualPanelWidth = (std::min)(IsLower ? size - pi : pi, PanelWidth); Index startBlock = IsLower ? pi : pi-actualPanelWidth; Index endBlock = IsLower ? pi + actualPanelWidth : 0; for(Index k=0; k0) Map >(rhs+s,r) -= rhs[i] * cjLhs.col(i).segment(s,r); } } Index r = IsLower ? size - endBlock : startBlock; // remaining size if (r > 0) { // let's directly call the low level product function because: // 1 - it is faster to compile // 2 - it is slightly faster at runtime general_matrix_vector_product::run( r, actualPanelWidth, LhsMapper(&lhs.coeffRef(endBlock,startBlock), lhsStride), RhsMapper(rhs+startBlock, 1), rhs+endBlock, 1, RhsScalar(-1)); } } } }; } // end namespace internal } // end namespace Eigen #endif // EIGEN_TRIANGULAR_SOLVER_VECTOR_H RcppEigen/inst/include/Eigen/src/Core/products/SelfadjointMatrixVector.h0000644000176200001440000002334614562417221026076 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2009 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SELFADJOINT_MATRIX_VECTOR_H #define EIGEN_SELFADJOINT_MATRIX_VECTOR_H namespace Eigen { namespace internal { /* Optimized selfadjoint matrix * vector product: * This algorithm processes 2 columns at once that allows to both reduce * the number of load/stores of the result by a factor 2 and to reduce * the instruction dependency. */ template struct selfadjoint_matrix_vector_product; template struct selfadjoint_matrix_vector_product { static EIGEN_DONT_INLINE EIGEN_DEVICE_FUNC void run( Index size, const Scalar* lhs, Index lhsStride, const Scalar* rhs, Scalar* res, Scalar alpha); }; template EIGEN_DONT_INLINE EIGEN_DEVICE_FUNC void selfadjoint_matrix_vector_product::run( Index size, const Scalar* lhs, Index lhsStride, const Scalar* rhs, Scalar* res, Scalar alpha) { typedef typename packet_traits::type Packet; typedef typename NumTraits::Real RealScalar; const Index PacketSize = sizeof(Packet)/sizeof(Scalar); enum { IsRowMajor = StorageOrder==RowMajor ? 1 : 0, IsLower = UpLo == Lower ? 1 : 0, FirstTriangular = IsRowMajor == IsLower }; conj_helper::IsComplex && EIGEN_LOGICAL_XOR(ConjugateLhs, IsRowMajor), ConjugateRhs> cj0; conj_helper::IsComplex && EIGEN_LOGICAL_XOR(ConjugateLhs, !IsRowMajor), ConjugateRhs> cj1; conj_helper cjd; conj_helper::IsComplex && EIGEN_LOGICAL_XOR(ConjugateLhs, IsRowMajor), ConjugateRhs> pcj0; conj_helper::IsComplex && EIGEN_LOGICAL_XOR(ConjugateLhs, !IsRowMajor), ConjugateRhs> pcj1; Scalar cjAlpha = ConjugateRhs ? numext::conj(alpha) : alpha; Index bound = numext::maxi(Index(0), size-8) & 0xfffffffe; if (FirstTriangular) bound = size - bound; for (Index j=FirstTriangular ? bound : 0; j<(FirstTriangular ? size : bound);j+=2) { const Scalar* EIGEN_RESTRICT A0 = lhs + j*lhsStride; const Scalar* EIGEN_RESTRICT A1 = lhs + (j+1)*lhsStride; Scalar t0 = cjAlpha * rhs[j]; Packet ptmp0 = pset1(t0); Scalar t1 = cjAlpha * rhs[j+1]; Packet ptmp1 = pset1(t1); Scalar t2(0); Packet ptmp2 = pset1(t2); Scalar t3(0); Packet ptmp3 = pset1(t3); Index starti = FirstTriangular ? 0 : j+2; Index endi = FirstTriangular ? j : size; Index alignedStart = (starti) + internal::first_default_aligned(&res[starti], endi-starti); Index alignedEnd = alignedStart + ((endi-alignedStart)/(PacketSize))*(PacketSize); res[j] += cjd.pmul(numext::real(A0[j]), t0); res[j+1] += cjd.pmul(numext::real(A1[j+1]), t1); if(FirstTriangular) { res[j] += cj0.pmul(A1[j], t1); t3 += cj1.pmul(A1[j], rhs[j]); } else { res[j+1] += cj0.pmul(A0[j+1],t0); t2 += cj1.pmul(A0[j+1], rhs[j+1]); } for (Index i=starti; i huge speed up) // gcc 4.2 does this optimization automatically. const Scalar* EIGEN_RESTRICT a0It = A0 + alignedStart; const Scalar* EIGEN_RESTRICT a1It = A1 + alignedStart; const Scalar* EIGEN_RESTRICT rhsIt = rhs + alignedStart; Scalar* EIGEN_RESTRICT resIt = res + alignedStart; for (Index i=alignedStart; i(a0It); a0It += PacketSize; Packet A1i = ploadu(a1It); a1It += PacketSize; Packet Bi = ploadu(rhsIt); rhsIt += PacketSize; // FIXME should be aligned in most cases Packet Xi = pload (resIt); Xi = pcj0.pmadd(A0i,ptmp0, pcj0.pmadd(A1i,ptmp1,Xi)); ptmp2 = pcj1.pmadd(A0i, Bi, ptmp2); ptmp3 = pcj1.pmadd(A1i, Bi, ptmp3); pstore(resIt,Xi); resIt += PacketSize; } for (Index i=alignedEnd; i struct selfadjoint_product_impl { typedef typename Product::Scalar Scalar; typedef internal::blas_traits LhsBlasTraits; typedef typename LhsBlasTraits::DirectLinearAccessType ActualLhsType; typedef typename internal::remove_all::type ActualLhsTypeCleaned; typedef internal::blas_traits RhsBlasTraits; typedef typename RhsBlasTraits::DirectLinearAccessType ActualRhsType; typedef typename internal::remove_all::type ActualRhsTypeCleaned; enum { LhsUpLo = LhsMode&(Upper|Lower) }; template static EIGEN_DEVICE_FUNC void run(Dest& dest, const Lhs &a_lhs, const Rhs &a_rhs, const Scalar& alpha) { typedef typename Dest::Scalar ResScalar; typedef typename Rhs::Scalar RhsScalar; typedef Map, EIGEN_PLAIN_ENUM_MIN(AlignedMax,internal::packet_traits::size)> MappedDest; eigen_assert(dest.rows()==a_lhs.rows() && dest.cols()==a_rhs.cols()); typename internal::add_const_on_value_type::type lhs = LhsBlasTraits::extract(a_lhs); typename internal::add_const_on_value_type::type rhs = RhsBlasTraits::extract(a_rhs); Scalar actualAlpha = alpha * LhsBlasTraits::extractScalarFactor(a_lhs) * RhsBlasTraits::extractScalarFactor(a_rhs); enum { EvalToDest = (Dest::InnerStrideAtCompileTime==1), UseRhs = (ActualRhsTypeCleaned::InnerStrideAtCompileTime==1) }; internal::gemv_static_vector_if static_dest; internal::gemv_static_vector_if static_rhs; ei_declare_aligned_stack_constructed_variable(ResScalar,actualDestPtr,dest.size(), EvalToDest ? dest.data() : static_dest.data()); ei_declare_aligned_stack_constructed_variable(RhsScalar,actualRhsPtr,rhs.size(), UseRhs ? const_cast(rhs.data()) : static_rhs.data()); if(!EvalToDest) { #ifdef EIGEN_DENSE_STORAGE_CTOR_PLUGIN Index size = dest.size(); EIGEN_DENSE_STORAGE_CTOR_PLUGIN #endif MappedDest(actualDestPtr, dest.size()) = dest; } if(!UseRhs) { #ifdef EIGEN_DENSE_STORAGE_CTOR_PLUGIN Index size = rhs.size(); EIGEN_DENSE_STORAGE_CTOR_PLUGIN #endif Map(actualRhsPtr, rhs.size()) = rhs; } internal::selfadjoint_matrix_vector_product::Flags&RowMajorBit) ? RowMajor : ColMajor, int(LhsUpLo), bool(LhsBlasTraits::NeedToConjugate), bool(RhsBlasTraits::NeedToConjugate)>::run ( lhs.rows(), // size &lhs.coeffRef(0,0), lhs.outerStride(), // lhs info actualRhsPtr, // rhs info actualDestPtr, // result info actualAlpha // scale factor ); if(!EvalToDest) dest = MappedDest(actualDestPtr, dest.size()); } }; template struct selfadjoint_product_impl { typedef typename Product::Scalar Scalar; enum { RhsUpLo = RhsMode&(Upper|Lower) }; template static void run(Dest& dest, const Lhs &a_lhs, const Rhs &a_rhs, const Scalar& alpha) { // let's simply transpose the product Transpose destT(dest); selfadjoint_product_impl, int(RhsUpLo)==Upper ? Lower : Upper, false, Transpose, 0, true>::run(destT, a_rhs.transpose(), a_lhs.transpose(), alpha); } }; } // end namespace internal } // end namespace Eigen #endif // EIGEN_SELFADJOINT_MATRIX_VECTOR_H RcppEigen/inst/include/Eigen/src/Core/products/SelfadjointMatrixVector_BLAS.h0000644000176200001440000001213114107270226026662 0ustar liggesusers/* Copyright (c) 2011, Intel Corporation. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ******************************************************************************** * Content : Eigen bindings to BLAS F77 * Selfadjoint matrix-vector product functionality based on ?SYMV/HEMV. ******************************************************************************** */ #ifndef EIGEN_SELFADJOINT_MATRIX_VECTOR_BLAS_H #define EIGEN_SELFADJOINT_MATRIX_VECTOR_BLAS_H namespace Eigen { namespace internal { /********************************************************************** * This file implements selfadjoint matrix-vector multiplication using BLAS **********************************************************************/ // symv/hemv specialization template struct selfadjoint_matrix_vector_product_symv : selfadjoint_matrix_vector_product {}; #define EIGEN_BLAS_SYMV_SPECIALIZE(Scalar) \ template \ struct selfadjoint_matrix_vector_product { \ static void run( \ Index size, const Scalar* lhs, Index lhsStride, \ const Scalar* _rhs, Scalar* res, Scalar alpha) { \ enum {\ IsColMajor = StorageOrder==ColMajor \ }; \ if (IsColMajor == ConjugateLhs) {\ selfadjoint_matrix_vector_product::run( \ size, lhs, lhsStride, _rhs, res, alpha); \ } else {\ selfadjoint_matrix_vector_product_symv::run( \ size, lhs, lhsStride, _rhs, res, alpha); \ }\ } \ }; \ EIGEN_BLAS_SYMV_SPECIALIZE(double) EIGEN_BLAS_SYMV_SPECIALIZE(float) EIGEN_BLAS_SYMV_SPECIALIZE(dcomplex) EIGEN_BLAS_SYMV_SPECIALIZE(scomplex) #define EIGEN_BLAS_SYMV_SPECIALIZATION(EIGTYPE,BLASTYPE,BLASFUNC) \ template \ struct selfadjoint_matrix_vector_product_symv \ { \ typedef Matrix SYMVVector;\ \ static void run( \ Index size, const EIGTYPE* lhs, Index lhsStride, \ const EIGTYPE* _rhs, EIGTYPE* res, EIGTYPE alpha) \ { \ enum {\ IsRowMajor = StorageOrder==RowMajor ? 1 : 0, \ IsLower = UpLo == Lower ? 1 : 0 \ }; \ BlasIndex n=convert_index(size), lda=convert_index(lhsStride), incx=1, incy=1; \ EIGTYPE beta(1); \ const EIGTYPE *x_ptr; \ char uplo=(IsRowMajor) ? (IsLower ? 'U' : 'L') : (IsLower ? 'L' : 'U'); \ SYMVVector x_tmp; \ if (ConjugateRhs) { \ Map map_x(_rhs,size,1); \ x_tmp=map_x.conjugate(); \ x_ptr=x_tmp.data(); \ } else x_ptr=_rhs; \ BLASFUNC(&uplo, &n, (const BLASTYPE*)&numext::real_ref(alpha), (const BLASTYPE*)lhs, &lda, (const BLASTYPE*)x_ptr, &incx, (const BLASTYPE*)&numext::real_ref(beta), (BLASTYPE*)res, &incy); \ }\ }; #ifdef EIGEN_USE_MKL EIGEN_BLAS_SYMV_SPECIALIZATION(double, double, dsymv) EIGEN_BLAS_SYMV_SPECIALIZATION(float, float, ssymv) EIGEN_BLAS_SYMV_SPECIALIZATION(dcomplex, MKL_Complex16, zhemv) EIGEN_BLAS_SYMV_SPECIALIZATION(scomplex, MKL_Complex8, chemv) #else EIGEN_BLAS_SYMV_SPECIALIZATION(double, double, dsymv_) EIGEN_BLAS_SYMV_SPECIALIZATION(float, float, ssymv_) EIGEN_BLAS_SYMV_SPECIALIZATION(dcomplex, double, zhemv_) EIGEN_BLAS_SYMV_SPECIALIZATION(scomplex, float, chemv_) #endif } // end namespace internal } // end namespace Eigen #endif // EIGEN_SELFADJOINT_MATRIX_VECTOR_BLAS_H RcppEigen/inst/include/Eigen/src/Core/products/GeneralBlockPanelKernel.h0000644000176200001440000032364014562417221025735 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2009 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_GENERAL_BLOCK_PANEL_H #define EIGEN_GENERAL_BLOCK_PANEL_H namespace Eigen { namespace internal { enum GEBPPacketSizeType { GEBPPacketFull = 0, GEBPPacketHalf, GEBPPacketQuarter }; template class gebp_traits; /** \internal \returns b if a<=0, and returns a otherwise. */ inline std::ptrdiff_t manage_caching_sizes_helper(std::ptrdiff_t a, std::ptrdiff_t b) { return a<=0 ? b : a; } #if defined(EIGEN_DEFAULT_L1_CACHE_SIZE) #define EIGEN_SET_DEFAULT_L1_CACHE_SIZE(val) EIGEN_DEFAULT_L1_CACHE_SIZE #else #define EIGEN_SET_DEFAULT_L1_CACHE_SIZE(val) val #endif // defined(EIGEN_DEFAULT_L1_CACHE_SIZE) #if defined(EIGEN_DEFAULT_L2_CACHE_SIZE) #define EIGEN_SET_DEFAULT_L2_CACHE_SIZE(val) EIGEN_DEFAULT_L2_CACHE_SIZE #else #define EIGEN_SET_DEFAULT_L2_CACHE_SIZE(val) val #endif // defined(EIGEN_DEFAULT_L2_CACHE_SIZE) #if defined(EIGEN_DEFAULT_L3_CACHE_SIZE) #define EIGEN_SET_DEFAULT_L3_CACHE_SIZE(val) EIGEN_DEFAULT_L3_CACHE_SIZE #else #define EIGEN_SET_DEFAULT_L3_CACHE_SIZE(val) val #endif // defined(EIGEN_DEFAULT_L3_CACHE_SIZE) #if EIGEN_ARCH_i386_OR_x86_64 const std::ptrdiff_t defaultL1CacheSize = EIGEN_SET_DEFAULT_L1_CACHE_SIZE(32*1024); const std::ptrdiff_t defaultL2CacheSize = EIGEN_SET_DEFAULT_L2_CACHE_SIZE(256*1024); const std::ptrdiff_t defaultL3CacheSize = EIGEN_SET_DEFAULT_L3_CACHE_SIZE(2*1024*1024); #elif EIGEN_ARCH_PPC const std::ptrdiff_t defaultL1CacheSize = EIGEN_SET_DEFAULT_L1_CACHE_SIZE(64*1024); const std::ptrdiff_t defaultL2CacheSize = EIGEN_SET_DEFAULT_L2_CACHE_SIZE(512*1024); const std::ptrdiff_t defaultL3CacheSize = EIGEN_SET_DEFAULT_L3_CACHE_SIZE(4*1024*1024); #else const std::ptrdiff_t defaultL1CacheSize = EIGEN_SET_DEFAULT_L1_CACHE_SIZE(16*1024); const std::ptrdiff_t defaultL2CacheSize = EIGEN_SET_DEFAULT_L2_CACHE_SIZE(512*1024); const std::ptrdiff_t defaultL3CacheSize = EIGEN_SET_DEFAULT_L3_CACHE_SIZE(512*1024); #endif #undef EIGEN_SET_DEFAULT_L1_CACHE_SIZE #undef EIGEN_SET_DEFAULT_L2_CACHE_SIZE #undef EIGEN_SET_DEFAULT_L3_CACHE_SIZE /** \internal */ struct CacheSizes { CacheSizes(): m_l1(-1),m_l2(-1),m_l3(-1) { int l1CacheSize, l2CacheSize, l3CacheSize; queryCacheSizes(l1CacheSize, l2CacheSize, l3CacheSize); m_l1 = manage_caching_sizes_helper(l1CacheSize, defaultL1CacheSize); m_l2 = manage_caching_sizes_helper(l2CacheSize, defaultL2CacheSize); m_l3 = manage_caching_sizes_helper(l3CacheSize, defaultL3CacheSize); } std::ptrdiff_t m_l1; std::ptrdiff_t m_l2; std::ptrdiff_t m_l3; }; /** \internal */ inline void manage_caching_sizes(Action action, std::ptrdiff_t* l1, std::ptrdiff_t* l2, std::ptrdiff_t* l3) { static CacheSizes m_cacheSizes; if(action==SetAction) { // set the cpu cache size and cache all block sizes from a global cache size in byte eigen_internal_assert(l1!=0 && l2!=0); m_cacheSizes.m_l1 = *l1; m_cacheSizes.m_l2 = *l2; m_cacheSizes.m_l3 = *l3; } else if(action==GetAction) { eigen_internal_assert(l1!=0 && l2!=0); *l1 = m_cacheSizes.m_l1; *l2 = m_cacheSizes.m_l2; *l3 = m_cacheSizes.m_l3; } else { eigen_internal_assert(false); } } /* Helper for computeProductBlockingSizes. * * Given a m x k times k x n matrix product of scalar types \c LhsScalar and \c RhsScalar, * this function computes the blocking size parameters along the respective dimensions * for matrix products and related algorithms. The blocking sizes depends on various * parameters: * - the L1 and L2 cache sizes, * - the register level blocking sizes defined by gebp_traits, * - the number of scalars that fit into a packet (when vectorization is enabled). * * \sa setCpuCacheSizes */ template void evaluateProductBlockingSizesHeuristic(Index& k, Index& m, Index& n, Index num_threads = 1) { typedef gebp_traits Traits; // Explanations: // Let's recall that the product algorithms form mc x kc vertical panels A' on the lhs and // kc x nc blocks B' on the rhs. B' has to fit into L2/L3 cache. Moreover, A' is processed // per mr x kc horizontal small panels where mr is the blocking size along the m dimension // at the register level. This small horizontal panel has to stay within L1 cache. std::ptrdiff_t l1, l2, l3; manage_caching_sizes(GetAction, &l1, &l2, &l3); #ifdef EIGEN_VECTORIZE_AVX512 // We need to find a rationale for that, but without this adjustment, // performance with AVX512 is pretty bad, like -20% slower. // One reason is that with increasing packet-size, the blocking size k // has to become pretty small if we want that 1 lhs panel fit within L1. // For instance, with the 3pX4 kernel and double, the size of the lhs+rhs panels are: // k*(3*64 + 4*8) Bytes, with l1=32kBytes, and k%8=0, we have k=144. // This is quite small for a good reuse of the accumulation registers. l1 *= 4; #endif if (num_threads > 1) { typedef typename Traits::ResScalar ResScalar; enum { kdiv = KcFactor * (Traits::mr * sizeof(LhsScalar) + Traits::nr * sizeof(RhsScalar)), ksub = Traits::mr * Traits::nr * sizeof(ResScalar), kr = 8, mr = Traits::mr, nr = Traits::nr }; // Increasing k gives us more time to prefetch the content of the "C" // registers. However once the latency is hidden there is no point in // increasing the value of k, so we'll cap it at 320 (value determined // experimentally). // To avoid that k vanishes, we make k_cache at least as big as kr const Index k_cache = numext::maxi(kr, (numext::mini)((l1-ksub)/kdiv, 320)); if (k_cache < k) { k = k_cache - (k_cache % kr); eigen_internal_assert(k > 0); } const Index n_cache = (l2-l1) / (nr * sizeof(RhsScalar) * k); const Index n_per_thread = numext::div_ceil(n, num_threads); if (n_cache <= n_per_thread) { // Don't exceed the capacity of the l2 cache. eigen_internal_assert(n_cache >= static_cast(nr)); n = n_cache - (n_cache % nr); eigen_internal_assert(n > 0); } else { n = (numext::mini)(n, (n_per_thread + nr - 1) - ((n_per_thread + nr - 1) % nr)); } if (l3 > l2) { // l3 is shared between all cores, so we'll give each thread its own chunk of l3. const Index m_cache = (l3-l2) / (sizeof(LhsScalar) * k * num_threads); const Index m_per_thread = numext::div_ceil(m, num_threads); if(m_cache < m_per_thread && m_cache >= static_cast(mr)) { m = m_cache - (m_cache % mr); eigen_internal_assert(m > 0); } else { m = (numext::mini)(m, (m_per_thread + mr - 1) - ((m_per_thread + mr - 1) % mr)); } } } else { // In unit tests we do not want to use extra large matrices, // so we reduce the cache size to check the blocking strategy is not flawed #ifdef EIGEN_DEBUG_SMALL_PRODUCT_BLOCKS l1 = 9*1024; l2 = 32*1024; l3 = 512*1024; #endif // Early return for small problems because the computation below are time consuming for small problems. // Perhaps it would make more sense to consider k*n*m?? // Note that for very tiny problem, this function should be bypassed anyway // because we use the coefficient-based implementation for them. if((numext::maxi)(k,(numext::maxi)(m,n))<48) return; typedef typename Traits::ResScalar ResScalar; enum { k_peeling = 8, k_div = KcFactor * (Traits::mr * sizeof(LhsScalar) + Traits::nr * sizeof(RhsScalar)), k_sub = Traits::mr * Traits::nr * sizeof(ResScalar) }; // ---- 1st level of blocking on L1, yields kc ---- // Blocking on the third dimension (i.e., k) is chosen so that an horizontal panel // of size mr x kc of the lhs plus a vertical panel of kc x nr of the rhs both fits within L1 cache. // We also include a register-level block of the result (mx x nr). // (In an ideal world only the lhs panel would stay in L1) // Moreover, kc has to be a multiple of 8 to be compatible with loop peeling, leading to a maximum blocking size of: const Index max_kc = numext::maxi(((l1-k_sub)/k_div) & (~(k_peeling-1)),1); const Index old_k = k; if(k>max_kc) { // We are really blocking on the third dimension: // -> reduce blocking size to make sure the last block is as large as possible // while keeping the same number of sweeps over the result. k = (k%max_kc)==0 ? max_kc : max_kc - k_peeling * ((max_kc-1-(k%max_kc))/(k_peeling*(k/max_kc+1))); eigen_internal_assert(((old_k/k) == (old_k/max_kc)) && "the number of sweeps has to remain the same"); } // ---- 2nd level of blocking on max(L2,L3), yields nc ---- // TODO find a reliable way to get the actual amount of cache per core to use for 2nd level blocking, that is: // actual_l2 = max(l2, l3/nb_core_sharing_l3) // The number below is quite conservative: it is better to underestimate the cache size rather than overestimating it) // For instance, it corresponds to 6MB of L3 shared among 4 cores. #ifdef EIGEN_DEBUG_SMALL_PRODUCT_BLOCKS const Index actual_l2 = l3; #else const Index actual_l2 = 1572864; // == 1.5 MB #endif // Here, nc is chosen such that a block of kc x nc of the rhs fit within half of L2. // The second half is implicitly reserved to access the result and lhs coefficients. // When k= Index(Traits::nr*sizeof(RhsScalar))*k) { // L1 blocking max_nc = remaining_l1 / (k*sizeof(RhsScalar)); } else { // L2 blocking max_nc = (3*actual_l2)/(2*2*max_kc*sizeof(RhsScalar)); } // WARNING Below, we assume that Traits::nr is a power of two. Index nc = numext::mini(actual_l2/(2*k*sizeof(RhsScalar)), max_nc) & (~(Traits::nr-1)); if(n>nc) { // We are really blocking over the columns: // -> reduce blocking size to make sure the last block is as large as possible // while keeping the same number of sweeps over the packed lhs. // Here we allow one more sweep if this gives us a perfect match, thus the commented "-1" n = (n%nc)==0 ? nc : (nc - Traits::nr * ((nc/*-1*/-(n%nc))/(Traits::nr*(n/nc+1)))); } else if(old_k==k) { // So far, no blocking at all, i.e., kc==k, and nc==n. // In this case, let's perform a blocking over the rows such that the packed lhs data is kept in cache L1/L2 // TODO: part of this blocking strategy is now implemented within the kernel itself, so the L1-based heuristic here should be obsolete. Index problem_size = k*n*sizeof(LhsScalar); Index actual_lm = actual_l2; Index max_mc = m; if(problem_size<=1024) { // problem is small enough to keep in L1 // Let's choose m such that lhs's block fit in 1/3 of L1 actual_lm = l1; } else if(l3!=0 && problem_size<=32768) { // we have both L2 and L3, and problem is small enough to be kept in L2 // Let's choose m such that lhs's block fit in 1/3 of L2 actual_lm = l2; max_mc = (numext::mini)(576,max_mc); } Index mc = (numext::mini)(actual_lm/(3*k*sizeof(LhsScalar)), max_mc); if (mc > Traits::mr) mc -= mc % Traits::mr; else if (mc==0) return; m = (m%mc)==0 ? mc : (mc - Traits::mr * ((mc/*-1*/-(m%mc))/(Traits::mr*(m/mc+1)))); } } } template inline bool useSpecificBlockingSizes(Index& k, Index& m, Index& n) { #ifdef EIGEN_TEST_SPECIFIC_BLOCKING_SIZES if (EIGEN_TEST_SPECIFIC_BLOCKING_SIZES) { k = numext::mini(k, EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_K); m = numext::mini(m, EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_M); n = numext::mini(n, EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_N); return true; } #else EIGEN_UNUSED_VARIABLE(k) EIGEN_UNUSED_VARIABLE(m) EIGEN_UNUSED_VARIABLE(n) #endif return false; } /** \brief Computes the blocking parameters for a m x k times k x n matrix product * * \param[in,out] k Input: the third dimension of the product. Output: the blocking size along the same dimension. * \param[in,out] m Input: the number of rows of the left hand side. Output: the blocking size along the same dimension. * \param[in,out] n Input: the number of columns of the right hand side. Output: the blocking size along the same dimension. * * Given a m x k times k x n matrix product of scalar types \c LhsScalar and \c RhsScalar, * this function computes the blocking size parameters along the respective dimensions * for matrix products and related algorithms. * * The blocking size parameters may be evaluated: * - either by a heuristic based on cache sizes; * - or using fixed prescribed values (for testing purposes). * * \sa setCpuCacheSizes */ template void computeProductBlockingSizes(Index& k, Index& m, Index& n, Index num_threads = 1) { if (!useSpecificBlockingSizes(k, m, n)) { evaluateProductBlockingSizesHeuristic(k, m, n, num_threads); } } template inline void computeProductBlockingSizes(Index& k, Index& m, Index& n, Index num_threads = 1) { computeProductBlockingSizes(k, m, n, num_threads); } template struct RhsPanelHelper { private: static const int remaining_registers = EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS - registers_taken; public: typedef typename conditional=4, RhsPacketx4, RhsPacket>::type type; }; template struct QuadPacket { Packet B_0, B1, B2, B3; const Packet& get(const FixedInt<0>&) const { return B_0; } const Packet& get(const FixedInt<1>&) const { return B1; } const Packet& get(const FixedInt<2>&) const { return B2; } const Packet& get(const FixedInt<3>&) const { return B3; } }; template struct packet_conditional { typedef T3 type; }; template struct packet_conditional { typedef T1 type; }; template struct packet_conditional { typedef T2 type; }; #define PACKET_DECL_COND_PREFIX(prefix, name, packet_size) \ typedef typename packet_conditional::type, \ typename packet_traits::half, \ typename unpacket_traits::half>::half>::type \ prefix ## name ## Packet #define PACKET_DECL_COND(name, packet_size) \ typedef typename packet_conditional::type, \ typename packet_traits::half, \ typename unpacket_traits::half>::half>::type \ name ## Packet #define PACKET_DECL_COND_SCALAR_PREFIX(prefix, packet_size) \ typedef typename packet_conditional::type, \ typename packet_traits::half, \ typename unpacket_traits::half>::half>::type \ prefix ## ScalarPacket #define PACKET_DECL_COND_SCALAR(packet_size) \ typedef typename packet_conditional::type, \ typename packet_traits::half, \ typename unpacket_traits::half>::half>::type \ ScalarPacket /* Vectorization logic * real*real: unpack rhs to constant packets, ... * * cd*cd : unpack rhs to (b_r,b_r), (b_i,b_i), mul to get (a_r b_r,a_i b_r) (a_r b_i,a_i b_i), * storing each res packet into two packets (2x2), * at the end combine them: swap the second and addsub them * cf*cf : same but with 2x4 blocks * cplx*real : unpack rhs to constant packets, ... * real*cplx : load lhs as (a0,a0,a1,a1), and mul as usual */ template class gebp_traits { public: typedef _LhsScalar LhsScalar; typedef _RhsScalar RhsScalar; typedef typename ScalarBinaryOpTraits::ReturnType ResScalar; PACKET_DECL_COND_PREFIX(_, Lhs, _PacketSize); PACKET_DECL_COND_PREFIX(_, Rhs, _PacketSize); PACKET_DECL_COND_PREFIX(_, Res, _PacketSize); enum { ConjLhs = _ConjLhs, ConjRhs = _ConjRhs, Vectorizable = unpacket_traits<_LhsPacket>::vectorizable && unpacket_traits<_RhsPacket>::vectorizable, LhsPacketSize = Vectorizable ? unpacket_traits<_LhsPacket>::size : 1, RhsPacketSize = Vectorizable ? unpacket_traits<_RhsPacket>::size : 1, ResPacketSize = Vectorizable ? unpacket_traits<_ResPacket>::size : 1, NumberOfRegisters = EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS, // register block size along the N direction must be 1 or 4 nr = 4, // register block size along the M direction (currently, this one cannot be modified) default_mr = (EIGEN_PLAIN_ENUM_MIN(16,NumberOfRegisters)/2/nr)*LhsPacketSize, #if defined(EIGEN_HAS_SINGLE_INSTRUCTION_MADD) && !defined(EIGEN_VECTORIZE_ALTIVEC) && !defined(EIGEN_VECTORIZE_VSX) \ && ((!EIGEN_COMP_MSVC) || (EIGEN_COMP_MSVC>=1914)) // we assume 16 registers or more // See bug 992, if the scalar type is not vectorizable but that EIGEN_HAS_SINGLE_INSTRUCTION_MADD is defined, // then using 3*LhsPacketSize triggers non-implemented paths in syrk. // Bug 1515: MSVC prior to v19.14 yields to register spilling. mr = Vectorizable ? 3*LhsPacketSize : default_mr, #else mr = default_mr, #endif LhsProgress = LhsPacketSize, RhsProgress = 1 }; typedef typename conditional::type LhsPacket; typedef typename conditional::type RhsPacket; typedef typename conditional::type ResPacket; typedef LhsPacket LhsPacket4Packing; typedef QuadPacket RhsPacketx4; typedef ResPacket AccPacket; EIGEN_STRONG_INLINE void initAcc(AccPacket& p) { p = pset1(ResScalar(0)); } template EIGEN_STRONG_INLINE void loadRhs(const RhsScalar* b, RhsPacketType& dest) const { dest = pset1(*b); } EIGEN_STRONG_INLINE void loadRhs(const RhsScalar* b, RhsPacketx4& dest) const { pbroadcast4(b, dest.B_0, dest.B1, dest.B2, dest.B3); } template EIGEN_STRONG_INLINE void updateRhs(const RhsScalar* b, RhsPacketType& dest) const { loadRhs(b, dest); } EIGEN_STRONG_INLINE void updateRhs(const RhsScalar*, RhsPacketx4&) const { } EIGEN_STRONG_INLINE void loadRhsQuad(const RhsScalar* b, RhsPacket& dest) const { dest = ploadquad(b); } template EIGEN_STRONG_INLINE void loadLhs(const LhsScalar* a, LhsPacketType& dest) const { dest = pload(a); } template EIGEN_STRONG_INLINE void loadLhsUnaligned(const LhsScalar* a, LhsPacketType& dest) const { dest = ploadu(a); } template EIGEN_STRONG_INLINE void madd(const LhsPacketType& a, const RhsPacketType& b, AccPacketType& c, RhsPacketType& tmp, const LaneIdType&) const { conj_helper cj; // It would be a lot cleaner to call pmadd all the time. Unfortunately if we // let gcc allocate the register in which to store the result of the pmul // (in the case where there is no FMA) gcc fails to figure out how to avoid // spilling register. #ifdef EIGEN_HAS_SINGLE_INSTRUCTION_MADD EIGEN_UNUSED_VARIABLE(tmp); c = cj.pmadd(a,b,c); #else tmp = b; tmp = cj.pmul(a,tmp); c = padd(c,tmp); #endif } template EIGEN_STRONG_INLINE void madd(const LhsPacketType& a, const RhsPacketx4& b, AccPacketType& c, RhsPacket& tmp, const LaneIdType& lane) const { madd(a, b.get(lane), c, tmp, lane); } EIGEN_STRONG_INLINE void acc(const AccPacket& c, const ResPacket& alpha, ResPacket& r) const { r = pmadd(c,alpha,r); } template EIGEN_STRONG_INLINE void acc(const ResPacketHalf& c, const ResPacketHalf& alpha, ResPacketHalf& r) const { r = pmadd(c,alpha,r); } }; template class gebp_traits, RealScalar, _ConjLhs, false, Arch, _PacketSize> { public: typedef std::complex LhsScalar; typedef RealScalar RhsScalar; typedef typename ScalarBinaryOpTraits::ReturnType ResScalar; PACKET_DECL_COND_PREFIX(_, Lhs, _PacketSize); PACKET_DECL_COND_PREFIX(_, Rhs, _PacketSize); PACKET_DECL_COND_PREFIX(_, Res, _PacketSize); enum { ConjLhs = _ConjLhs, ConjRhs = false, Vectorizable = unpacket_traits<_LhsPacket>::vectorizable && unpacket_traits<_RhsPacket>::vectorizable, LhsPacketSize = Vectorizable ? unpacket_traits<_LhsPacket>::size : 1, RhsPacketSize = Vectorizable ? unpacket_traits<_RhsPacket>::size : 1, ResPacketSize = Vectorizable ? unpacket_traits<_ResPacket>::size : 1, NumberOfRegisters = EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS, nr = 4, #if defined(EIGEN_HAS_SINGLE_INSTRUCTION_MADD) && !defined(EIGEN_VECTORIZE_ALTIVEC) && !defined(EIGEN_VECTORIZE_VSX) // we assume 16 registers mr = 3*LhsPacketSize, #else mr = (EIGEN_PLAIN_ENUM_MIN(16,NumberOfRegisters)/2/nr)*LhsPacketSize, #endif LhsProgress = LhsPacketSize, RhsProgress = 1 }; typedef typename conditional::type LhsPacket; typedef typename conditional::type RhsPacket; typedef typename conditional::type ResPacket; typedef LhsPacket LhsPacket4Packing; typedef QuadPacket RhsPacketx4; typedef ResPacket AccPacket; EIGEN_STRONG_INLINE void initAcc(AccPacket& p) { p = pset1(ResScalar(0)); } template EIGEN_STRONG_INLINE void loadRhs(const RhsScalar* b, RhsPacketType& dest) const { dest = pset1(*b); } EIGEN_STRONG_INLINE void loadRhs(const RhsScalar* b, RhsPacketx4& dest) const { pbroadcast4(b, dest.B_0, dest.B1, dest.B2, dest.B3); } template EIGEN_STRONG_INLINE void updateRhs(const RhsScalar* b, RhsPacketType& dest) const { loadRhs(b, dest); } EIGEN_STRONG_INLINE void updateRhs(const RhsScalar*, RhsPacketx4&) const {} EIGEN_STRONG_INLINE void loadRhsQuad(const RhsScalar* b, RhsPacket& dest) const { loadRhsQuad_impl(b,dest, typename conditional::type()); } EIGEN_STRONG_INLINE void loadRhsQuad_impl(const RhsScalar* b, RhsPacket& dest, const true_type&) const { // FIXME we can do better! // what we want here is a ploadheight RhsScalar tmp[4] = {b[0],b[0],b[1],b[1]}; dest = ploadquad(tmp); } EIGEN_STRONG_INLINE void loadRhsQuad_impl(const RhsScalar* b, RhsPacket& dest, const false_type&) const { eigen_internal_assert(RhsPacketSize<=8); dest = pset1(*b); } EIGEN_STRONG_INLINE void loadLhs(const LhsScalar* a, LhsPacket& dest) const { dest = pload(a); } template EIGEN_STRONG_INLINE void loadLhsUnaligned(const LhsScalar* a, LhsPacketType& dest) const { dest = ploadu(a); } template EIGEN_STRONG_INLINE void madd(const LhsPacketType& a, const RhsPacketType& b, AccPacketType& c, RhsPacketType& tmp, const LaneIdType&) const { madd_impl(a, b, c, tmp, typename conditional::type()); } template EIGEN_STRONG_INLINE void madd_impl(const LhsPacketType& a, const RhsPacketType& b, AccPacketType& c, RhsPacketType& tmp, const true_type&) const { #ifdef EIGEN_HAS_SINGLE_INSTRUCTION_MADD EIGEN_UNUSED_VARIABLE(tmp); c.v = pmadd(a.v,b,c.v); #else tmp = b; tmp = pmul(a.v,tmp); c.v = padd(c.v,tmp); #endif } EIGEN_STRONG_INLINE void madd_impl(const LhsScalar& a, const RhsScalar& b, ResScalar& c, RhsScalar& /*tmp*/, const false_type&) const { c += a * b; } template EIGEN_STRONG_INLINE void madd(const LhsPacketType& a, const RhsPacketx4& b, AccPacketType& c, RhsPacket& tmp, const LaneIdType& lane) const { madd(a, b.get(lane), c, tmp, lane); } template EIGEN_STRONG_INLINE void acc(const AccPacketType& c, const ResPacketType& alpha, ResPacketType& r) const { conj_helper cj; r = cj.pmadd(c,alpha,r); } protected: }; template struct DoublePacket { Packet first; Packet second; }; template DoublePacket padd(const DoublePacket &a, const DoublePacket &b) { DoublePacket res; res.first = padd(a.first, b.first); res.second = padd(a.second,b.second); return res; } // note that for DoublePacket the "4" in "downto4" // corresponds to the number of complexes, so it means "8" // it terms of real coefficients. template const DoublePacket& predux_half_dowto4(const DoublePacket &a, typename enable_if::size<=8>::type* = 0) { return a; } template DoublePacket::half> predux_half_dowto4(const DoublePacket &a, typename enable_if::size==16>::type* = 0) { // yes, that's pretty hackish :( DoublePacket::half> res; typedef std::complex::type> Cplx; typedef typename packet_traits::type CplxPacket; res.first = predux_half_dowto4(CplxPacket(a.first)).v; res.second = predux_half_dowto4(CplxPacket(a.second)).v; return res; } // same here, "quad" actually means "8" in terms of real coefficients template void loadQuadToDoublePacket(const Scalar* b, DoublePacket& dest, typename enable_if::size<=8>::type* = 0) { dest.first = pset1(numext::real(*b)); dest.second = pset1(numext::imag(*b)); } template void loadQuadToDoublePacket(const Scalar* b, DoublePacket& dest, typename enable_if::size==16>::type* = 0) { // yes, that's pretty hackish too :( typedef typename NumTraits::Real RealScalar; RealScalar r[4] = {numext::real(b[0]), numext::real(b[0]), numext::real(b[1]), numext::real(b[1])}; RealScalar i[4] = {numext::imag(b[0]), numext::imag(b[0]), numext::imag(b[1]), numext::imag(b[1])}; dest.first = ploadquad(r); dest.second = ploadquad(i); } template struct unpacket_traits > { typedef DoublePacket::half> half; }; // template // DoublePacket pmadd(const DoublePacket &a, const DoublePacket &b) // { // DoublePacket res; // res.first = padd(a.first, b.first); // res.second = padd(a.second,b.second); // return res; // } template class gebp_traits, std::complex, _ConjLhs, _ConjRhs, Arch, _PacketSize > { public: typedef std::complex Scalar; typedef std::complex LhsScalar; typedef std::complex RhsScalar; typedef std::complex ResScalar; PACKET_DECL_COND_PREFIX(_, Lhs, _PacketSize); PACKET_DECL_COND_PREFIX(_, Rhs, _PacketSize); PACKET_DECL_COND_PREFIX(_, Res, _PacketSize); PACKET_DECL_COND(Real, _PacketSize); PACKET_DECL_COND_SCALAR(_PacketSize); enum { ConjLhs = _ConjLhs, ConjRhs = _ConjRhs, Vectorizable = unpacket_traits::vectorizable && unpacket_traits::vectorizable, ResPacketSize = Vectorizable ? unpacket_traits<_ResPacket>::size : 1, LhsPacketSize = Vectorizable ? unpacket_traits<_LhsPacket>::size : 1, RhsPacketSize = Vectorizable ? unpacket_traits::size : 1, RealPacketSize = Vectorizable ? unpacket_traits::size : 1, // FIXME: should depend on NumberOfRegisters nr = 4, mr = ResPacketSize, LhsProgress = ResPacketSize, RhsProgress = 1 }; typedef DoublePacket DoublePacketType; typedef typename conditional::type LhsPacket4Packing; typedef typename conditional::type LhsPacket; typedef typename conditional::type RhsPacket; typedef typename conditional::type ResPacket; typedef typename conditional::type AccPacket; // this actualy holds 8 packets! typedef QuadPacket RhsPacketx4; EIGEN_STRONG_INLINE void initAcc(Scalar& p) { p = Scalar(0); } EIGEN_STRONG_INLINE void initAcc(DoublePacketType& p) { p.first = pset1(RealScalar(0)); p.second = pset1(RealScalar(0)); } // Scalar path EIGEN_STRONG_INLINE void loadRhs(const RhsScalar* b, ScalarPacket& dest) const { dest = pset1(*b); } // Vectorized path template EIGEN_STRONG_INLINE void loadRhs(const RhsScalar* b, DoublePacket& dest) const { dest.first = pset1(numext::real(*b)); dest.second = pset1(numext::imag(*b)); } EIGEN_STRONG_INLINE void loadRhs(const RhsScalar* b, RhsPacketx4& dest) const { loadRhs(b, dest.B_0); loadRhs(b + 1, dest.B1); loadRhs(b + 2, dest.B2); loadRhs(b + 3, dest.B3); } // Scalar path EIGEN_STRONG_INLINE void updateRhs(const RhsScalar* b, ScalarPacket& dest) const { loadRhs(b, dest); } // Vectorized path template EIGEN_STRONG_INLINE void updateRhs(const RhsScalar* b, DoublePacket& dest) const { loadRhs(b, dest); } EIGEN_STRONG_INLINE void updateRhs(const RhsScalar*, RhsPacketx4&) const {} EIGEN_STRONG_INLINE void loadRhsQuad(const RhsScalar* b, ResPacket& dest) const { loadRhs(b,dest); } EIGEN_STRONG_INLINE void loadRhsQuad(const RhsScalar* b, DoublePacketType& dest) const { loadQuadToDoublePacket(b,dest); } // nothing special here EIGEN_STRONG_INLINE void loadLhs(const LhsScalar* a, LhsPacket& dest) const { dest = pload((const typename unpacket_traits::type*)(a)); } template EIGEN_STRONG_INLINE void loadLhsUnaligned(const LhsScalar* a, LhsPacketType& dest) const { dest = ploadu((const typename unpacket_traits::type*)(a)); } template EIGEN_STRONG_INLINE typename enable_if::value>::type madd(const LhsPacketType& a, const RhsPacketType& b, DoublePacket& c, TmpType& /*tmp*/, const LaneIdType&) const { c.first = padd(pmul(a,b.first), c.first); c.second = padd(pmul(a,b.second),c.second); } template EIGEN_STRONG_INLINE void madd(const LhsPacket& a, const RhsPacket& b, ResPacket& c, RhsPacket& /*tmp*/, const LaneIdType&) const { c = cj.pmadd(a,b,c); } template EIGEN_STRONG_INLINE void madd(const LhsPacketType& a, const RhsPacketx4& b, AccPacketType& c, RhsPacket& tmp, const LaneIdType& lane) const { madd(a, b.get(lane), c, tmp, lane); } EIGEN_STRONG_INLINE void acc(const Scalar& c, const Scalar& alpha, Scalar& r) const { r += alpha * c; } template EIGEN_STRONG_INLINE void acc(const DoublePacket& c, const ResPacketType& alpha, ResPacketType& r) const { // assemble c ResPacketType tmp; if((!ConjLhs)&&(!ConjRhs)) { tmp = pcplxflip(pconj(ResPacketType(c.second))); tmp = padd(ResPacketType(c.first),tmp); } else if((!ConjLhs)&&(ConjRhs)) { tmp = pconj(pcplxflip(ResPacketType(c.second))); tmp = padd(ResPacketType(c.first),tmp); } else if((ConjLhs)&&(!ConjRhs)) { tmp = pcplxflip(ResPacketType(c.second)); tmp = padd(pconj(ResPacketType(c.first)),tmp); } else if((ConjLhs)&&(ConjRhs)) { tmp = pcplxflip(ResPacketType(c.second)); tmp = psub(pconj(ResPacketType(c.first)),tmp); } r = pmadd(tmp,alpha,r); } protected: conj_helper cj; }; template class gebp_traits, false, _ConjRhs, Arch, _PacketSize > { public: typedef std::complex Scalar; typedef RealScalar LhsScalar; typedef Scalar RhsScalar; typedef Scalar ResScalar; PACKET_DECL_COND_PREFIX(_, Lhs, _PacketSize); PACKET_DECL_COND_PREFIX(_, Rhs, _PacketSize); PACKET_DECL_COND_PREFIX(_, Res, _PacketSize); PACKET_DECL_COND_PREFIX(_, Real, _PacketSize); PACKET_DECL_COND_SCALAR_PREFIX(_, _PacketSize); #undef PACKET_DECL_COND_SCALAR_PREFIX #undef PACKET_DECL_COND_PREFIX #undef PACKET_DECL_COND_SCALAR #undef PACKET_DECL_COND enum { ConjLhs = false, ConjRhs = _ConjRhs, Vectorizable = unpacket_traits<_RealPacket>::vectorizable && unpacket_traits<_ScalarPacket>::vectorizable, LhsPacketSize = Vectorizable ? unpacket_traits<_LhsPacket>::size : 1, RhsPacketSize = Vectorizable ? unpacket_traits<_RhsPacket>::size : 1, ResPacketSize = Vectorizable ? unpacket_traits<_ResPacket>::size : 1, NumberOfRegisters = EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS, // FIXME: should depend on NumberOfRegisters nr = 4, mr = (EIGEN_PLAIN_ENUM_MIN(16,NumberOfRegisters)/2/nr)*ResPacketSize, LhsProgress = ResPacketSize, RhsProgress = 1 }; typedef typename conditional::type LhsPacket; typedef typename conditional::type RhsPacket; typedef typename conditional::type ResPacket; typedef LhsPacket LhsPacket4Packing; typedef QuadPacket RhsPacketx4; typedef ResPacket AccPacket; EIGEN_STRONG_INLINE void initAcc(AccPacket& p) { p = pset1(ResScalar(0)); } template EIGEN_STRONG_INLINE void loadRhs(const RhsScalar* b, RhsPacketType& dest) const { dest = pset1(*b); } EIGEN_STRONG_INLINE void loadRhs(const RhsScalar* b, RhsPacketx4& dest) const { pbroadcast4(b, dest.B_0, dest.B1, dest.B2, dest.B3); } template EIGEN_STRONG_INLINE void updateRhs(const RhsScalar* b, RhsPacketType& dest) const { loadRhs(b, dest); } EIGEN_STRONG_INLINE void updateRhs(const RhsScalar*, RhsPacketx4&) const {} EIGEN_STRONG_INLINE void loadLhs(const LhsScalar* a, LhsPacket& dest) const { dest = ploaddup(a); } EIGEN_STRONG_INLINE void loadRhsQuad(const RhsScalar* b, RhsPacket& dest) const { dest = ploadquad(b); } template EIGEN_STRONG_INLINE void loadLhsUnaligned(const LhsScalar* a, LhsPacketType& dest) const { dest = ploaddup(a); } template EIGEN_STRONG_INLINE void madd(const LhsPacketType& a, const RhsPacketType& b, AccPacketType& c, RhsPacketType& tmp, const LaneIdType&) const { madd_impl(a, b, c, tmp, typename conditional::type()); } template EIGEN_STRONG_INLINE void madd_impl(const LhsPacketType& a, const RhsPacketType& b, AccPacketType& c, RhsPacketType& tmp, const true_type&) const { #ifdef EIGEN_HAS_SINGLE_INSTRUCTION_MADD EIGEN_UNUSED_VARIABLE(tmp); c.v = pmadd(a,b.v,c.v); #else tmp = b; tmp.v = pmul(a,tmp.v); c = padd(c,tmp); #endif } EIGEN_STRONG_INLINE void madd_impl(const LhsScalar& a, const RhsScalar& b, ResScalar& c, RhsScalar& /*tmp*/, const false_type&) const { c += a * b; } template EIGEN_STRONG_INLINE void madd(const LhsPacketType& a, const RhsPacketx4& b, AccPacketType& c, RhsPacket& tmp, const LaneIdType& lane) const { madd(a, b.get(lane), c, tmp, lane); } template EIGEN_STRONG_INLINE void acc(const AccPacketType& c, const ResPacketType& alpha, ResPacketType& r) const { conj_helper cj; r = cj.pmadd(alpha,c,r); } protected: }; /* optimized General packed Block * packed Panel product kernel * * Mixing type logic: C += A * B * | A | B | comments * |real |cplx | no vectorization yet, would require to pack A with duplication * |cplx |real | easy vectorization */ template struct gebp_kernel { typedef gebp_traits Traits; typedef gebp_traits HalfTraits; typedef gebp_traits QuarterTraits; typedef typename Traits::ResScalar ResScalar; typedef typename Traits::LhsPacket LhsPacket; typedef typename Traits::RhsPacket RhsPacket; typedef typename Traits::ResPacket ResPacket; typedef typename Traits::AccPacket AccPacket; typedef typename Traits::RhsPacketx4 RhsPacketx4; typedef typename RhsPanelHelper::type RhsPanel15; typedef gebp_traits SwappedTraits; typedef typename SwappedTraits::ResScalar SResScalar; typedef typename SwappedTraits::LhsPacket SLhsPacket; typedef typename SwappedTraits::RhsPacket SRhsPacket; typedef typename SwappedTraits::ResPacket SResPacket; typedef typename SwappedTraits::AccPacket SAccPacket; typedef typename HalfTraits::LhsPacket LhsPacketHalf; typedef typename HalfTraits::RhsPacket RhsPacketHalf; typedef typename HalfTraits::ResPacket ResPacketHalf; typedef typename HalfTraits::AccPacket AccPacketHalf; typedef typename QuarterTraits::LhsPacket LhsPacketQuarter; typedef typename QuarterTraits::RhsPacket RhsPacketQuarter; typedef typename QuarterTraits::ResPacket ResPacketQuarter; typedef typename QuarterTraits::AccPacket AccPacketQuarter; typedef typename DataMapper::LinearMapper LinearMapper; enum { Vectorizable = Traits::Vectorizable, LhsProgress = Traits::LhsProgress, LhsProgressHalf = HalfTraits::LhsProgress, LhsProgressQuarter = QuarterTraits::LhsProgress, RhsProgress = Traits::RhsProgress, RhsProgressHalf = HalfTraits::RhsProgress, RhsProgressQuarter = QuarterTraits::RhsProgress, ResPacketSize = Traits::ResPacketSize }; EIGEN_DONT_INLINE void operator()(const DataMapper& res, const LhsScalar* blockA, const RhsScalar* blockB, Index rows, Index depth, Index cols, ResScalar alpha, Index strideA=-1, Index strideB=-1, Index offsetA=0, Index offsetB=0); }; template::LhsProgress> struct last_row_process_16_packets { typedef gebp_traits Traits; typedef gebp_traits SwappedTraits; typedef typename Traits::ResScalar ResScalar; typedef typename SwappedTraits::LhsPacket SLhsPacket; typedef typename SwappedTraits::RhsPacket SRhsPacket; typedef typename SwappedTraits::ResPacket SResPacket; typedef typename SwappedTraits::AccPacket SAccPacket; EIGEN_STRONG_INLINE void operator()(const DataMapper& res, SwappedTraits &straits, const LhsScalar* blA, const RhsScalar* blB, Index depth, const Index endk, Index i, Index j2, ResScalar alpha, SAccPacket &C0) { EIGEN_UNUSED_VARIABLE(res); EIGEN_UNUSED_VARIABLE(straits); EIGEN_UNUSED_VARIABLE(blA); EIGEN_UNUSED_VARIABLE(blB); EIGEN_UNUSED_VARIABLE(depth); EIGEN_UNUSED_VARIABLE(endk); EIGEN_UNUSED_VARIABLE(i); EIGEN_UNUSED_VARIABLE(j2); EIGEN_UNUSED_VARIABLE(alpha); EIGEN_UNUSED_VARIABLE(C0); } }; template struct last_row_process_16_packets { typedef gebp_traits Traits; typedef gebp_traits SwappedTraits; typedef typename Traits::ResScalar ResScalar; typedef typename SwappedTraits::LhsPacket SLhsPacket; typedef typename SwappedTraits::RhsPacket SRhsPacket; typedef typename SwappedTraits::ResPacket SResPacket; typedef typename SwappedTraits::AccPacket SAccPacket; EIGEN_STRONG_INLINE void operator()(const DataMapper& res, SwappedTraits &straits, const LhsScalar* blA, const RhsScalar* blB, Index depth, const Index endk, Index i, Index j2, ResScalar alpha, SAccPacket &C0) { typedef typename unpacket_traits::half>::half SResPacketQuarter; typedef typename unpacket_traits::half>::half SLhsPacketQuarter; typedef typename unpacket_traits::half>::half SRhsPacketQuarter; typedef typename unpacket_traits::half>::half SAccPacketQuarter; SResPacketQuarter R = res.template gatherPacket(i, j2); SResPacketQuarter alphav = pset1(alpha); if (depth - endk > 0) { // We have to handle the last row(s) of the rhs, which // correspond to a half-packet SAccPacketQuarter c0 = predux_half_dowto4(predux_half_dowto4(C0)); for (Index kk = endk; kk < depth; kk++) { SLhsPacketQuarter a0; SRhsPacketQuarter b0; straits.loadLhsUnaligned(blB, a0); straits.loadRhs(blA, b0); straits.madd(a0,b0,c0,b0, fix<0>); blB += SwappedTraits::LhsProgress/4; blA += 1; } straits.acc(c0, alphav, R); } else { straits.acc(predux_half_dowto4(predux_half_dowto4(C0)), alphav, R); } res.scatterPacket(i, j2, R); } }; template struct lhs_process_one_packet { typedef typename GEBPTraits::RhsPacketx4 RhsPacketx4; EIGEN_STRONG_INLINE void peeled_kc_onestep(Index K, const LhsScalar* blA, const RhsScalar* blB, GEBPTraits traits, LhsPacket *A0, RhsPacketx4 *rhs_panel, RhsPacket *T0, AccPacket *C0, AccPacket *C1, AccPacket *C2, AccPacket *C3) { EIGEN_ASM_COMMENT("begin step of gebp micro kernel 1X4"); EIGEN_ASM_COMMENT("Note: these asm comments work around bug 935!"); traits.loadLhs(&blA[(0+1*K)*LhsProgress], *A0); traits.loadRhs(&blB[(0+4*K)*RhsProgress], *rhs_panel); traits.madd(*A0, *rhs_panel, *C0, *T0, fix<0>); traits.madd(*A0, *rhs_panel, *C1, *T0, fix<1>); traits.madd(*A0, *rhs_panel, *C2, *T0, fix<2>); traits.madd(*A0, *rhs_panel, *C3, *T0, fix<3>); #if EIGEN_GNUC_AT_LEAST(6,0) && defined(EIGEN_VECTORIZE_SSE) __asm__ ("" : "+x,m" (*A0)); #endif EIGEN_ASM_COMMENT("end step of gebp micro kernel 1X4"); } EIGEN_STRONG_INLINE void operator()( const DataMapper& res, const LhsScalar* blockA, const RhsScalar* blockB, ResScalar alpha, Index peelStart, Index peelEnd, Index strideA, Index strideB, Index offsetA, Index offsetB, int prefetch_res_offset, Index peeled_kc, Index pk, Index cols, Index depth, Index packet_cols4) { GEBPTraits traits; // loops on each largest micro horizontal panel of lhs // (LhsProgress x depth) for(Index i=peelStart; i(alpha); R0 = r0.template loadPacket(0); R1 = r1.template loadPacket(0); traits.acc(C0, alphav, R0); traits.acc(C1, alphav, R1); r0.storePacket(0, R0); r1.storePacket(0, R1); R0 = r2.template loadPacket(0); R1 = r3.template loadPacket(0); traits.acc(C2, alphav, R0); traits.acc(C3, alphav, R1); r2.storePacket(0, R0); r3.storePacket(0, R1); } // Deal with remaining columns of the rhs for(Index j2=packet_cols4; j2); \ EIGEN_ASM_COMMENT("end step of gebp micro kernel 1/half/quarterX1"); \ } while(false); EIGEN_GEBGP_ONESTEP(0); EIGEN_GEBGP_ONESTEP(1); EIGEN_GEBGP_ONESTEP(2); EIGEN_GEBGP_ONESTEP(3); EIGEN_GEBGP_ONESTEP(4); EIGEN_GEBGP_ONESTEP(5); EIGEN_GEBGP_ONESTEP(6); EIGEN_GEBGP_ONESTEP(7); blB += pk*RhsProgress; blA += pk*LhsProgress; EIGEN_ASM_COMMENT("end gebp micro kernel 1/half/quarterX1"); } // process remaining peeled loop for(Index k=peeled_kc; k(alpha); R0 = r0.template loadPacket(0); traits.acc(C0, alphav, R0); r0.storePacket(0, R0); } } } }; template struct lhs_process_fraction_of_packet : lhs_process_one_packet { EIGEN_STRONG_INLINE void peeled_kc_onestep(Index K, const LhsScalar* blA, const RhsScalar* blB, GEBPTraits traits, LhsPacket *A0, RhsPacket *B_0, RhsPacket *B1, RhsPacket *B2, RhsPacket *B3, AccPacket *C0, AccPacket *C1, AccPacket *C2, AccPacket *C3) { EIGEN_ASM_COMMENT("begin step of gebp micro kernel 1X4"); EIGEN_ASM_COMMENT("Note: these asm comments work around bug 935!"); traits.loadLhsUnaligned(&blA[(0+1*K)*(LhsProgress)], *A0); traits.broadcastRhs(&blB[(0+4*K)*RhsProgress], *B_0, *B1, *B2, *B3); traits.madd(*A0, *B_0, *C0, *B_0); traits.madd(*A0, *B1, *C1, *B1); traits.madd(*A0, *B2, *C2, *B2); traits.madd(*A0, *B3, *C3, *B3); EIGEN_ASM_COMMENT("end step of gebp micro kernel 1X4"); } }; template EIGEN_DONT_INLINE void gebp_kernel ::operator()(const DataMapper& res, const LhsScalar* blockA, const RhsScalar* blockB, Index rows, Index depth, Index cols, ResScalar alpha, Index strideA, Index strideB, Index offsetA, Index offsetB) { Traits traits; SwappedTraits straits; if(strideA==-1) strideA = depth; if(strideB==-1) strideB = depth; conj_helper cj; Index packet_cols4 = nr>=4 ? (cols/4) * 4 : 0; const Index peeled_mc3 = mr>=3*Traits::LhsProgress ? (rows/(3*LhsProgress))*(3*LhsProgress) : 0; const Index peeled_mc2 = mr>=2*Traits::LhsProgress ? peeled_mc3+((rows-peeled_mc3)/(2*LhsProgress))*(2*LhsProgress) : 0; const Index peeled_mc1 = mr>=1*Traits::LhsProgress ? peeled_mc2+((rows-peeled_mc2)/(1*LhsProgress))*(1*LhsProgress) : 0; const Index peeled_mc_half = mr>=LhsProgressHalf ? peeled_mc1+((rows-peeled_mc1)/(LhsProgressHalf))*(LhsProgressHalf) : 0; const Index peeled_mc_quarter = mr>=LhsProgressQuarter ? peeled_mc_half+((rows-peeled_mc_half)/(LhsProgressQuarter))*(LhsProgressQuarter) : 0; enum { pk = 8 }; // NOTE Such a large peeling factor is important for large matrices (~ +5% when >1000 on Haswell) const Index peeled_kc = depth & ~(pk-1); const int prefetch_res_offset = 32/sizeof(ResScalar); // const Index depth2 = depth & ~1; //---------- Process 3 * LhsProgress rows at once ---------- // This corresponds to 3*LhsProgress x nr register blocks. // Usually, make sense only with FMA if(mr>=3*Traits::LhsProgress) { // Here, the general idea is to loop on each largest micro horizontal panel of the lhs (3*Traits::LhsProgress x depth) // and on each largest micro vertical panel of the rhs (depth * nr). // Blocking sizes, i.e., 'depth' has been computed so that the micro horizontal panel of the lhs fit in L1. // However, if depth is too small, we can extend the number of rows of these horizontal panels. // This actual number of rows is computed as follow: const Index l1 = defaultL1CacheSize; // in Bytes, TODO, l1 should be passed to this function. // The max(1, ...) here is needed because we may be using blocking params larger than what our known l1 cache size // suggests we should be using: either because our known l1 cache size is inaccurate (e.g. on Android, we can only guess), // or because we are testing specific blocking sizes. const Index actual_panel_rows = (3*LhsProgress) * std::max(1,( (l1 - sizeof(ResScalar)*mr*nr - depth*nr*sizeof(RhsScalar)) / (depth * sizeof(LhsScalar) * 3*LhsProgress) )); for(Index i1=0; i1); \ traits.madd(A1, rhs_panel, C4, T0, fix<0>); \ traits.madd(A2, rhs_panel, C8, T0, fix<0>); \ traits.updateRhs(blB + (1+4*K) * Traits::RhsProgress, rhs_panel); \ traits.madd(A0, rhs_panel, C1, T0, fix<1>); \ traits.madd(A1, rhs_panel, C5, T0, fix<1>); \ traits.madd(A2, rhs_panel, C9, T0, fix<1>); \ traits.updateRhs(blB + (2+4*K) * Traits::RhsProgress, rhs_panel); \ traits.madd(A0, rhs_panel, C2, T0, fix<2>); \ traits.madd(A1, rhs_panel, C6, T0, fix<2>); \ traits.madd(A2, rhs_panel, C10, T0, fix<2>); \ traits.updateRhs(blB + (3+4*K) * Traits::RhsProgress, rhs_panel); \ traits.madd(A0, rhs_panel, C3, T0, fix<3>); \ traits.madd(A1, rhs_panel, C7, T0, fix<3>); \ traits.madd(A2, rhs_panel, C11, T0, fix<3>); \ EIGEN_ASM_COMMENT("end step of gebp micro kernel 3pX4"); \ } while (false) internal::prefetch(blB); EIGEN_GEBP_ONESTEP(0); EIGEN_GEBP_ONESTEP(1); EIGEN_GEBP_ONESTEP(2); EIGEN_GEBP_ONESTEP(3); EIGEN_GEBP_ONESTEP(4); EIGEN_GEBP_ONESTEP(5); EIGEN_GEBP_ONESTEP(6); EIGEN_GEBP_ONESTEP(7); blB += pk*4*RhsProgress; blA += pk*3*Traits::LhsProgress; EIGEN_ASM_COMMENT("end gebp micro kernel 3pX4"); } // process remaining peeled loop for(Index k=peeled_kc; k(alpha); R0 = r0.template loadPacket(0 * Traits::ResPacketSize); R1 = r0.template loadPacket(1 * Traits::ResPacketSize); R2 = r0.template loadPacket(2 * Traits::ResPacketSize); traits.acc(C0, alphav, R0); traits.acc(C4, alphav, R1); traits.acc(C8, alphav, R2); r0.storePacket(0 * Traits::ResPacketSize, R0); r0.storePacket(1 * Traits::ResPacketSize, R1); r0.storePacket(2 * Traits::ResPacketSize, R2); R0 = r1.template loadPacket(0 * Traits::ResPacketSize); R1 = r1.template loadPacket(1 * Traits::ResPacketSize); R2 = r1.template loadPacket(2 * Traits::ResPacketSize); traits.acc(C1, alphav, R0); traits.acc(C5, alphav, R1); traits.acc(C9, alphav, R2); r1.storePacket(0 * Traits::ResPacketSize, R0); r1.storePacket(1 * Traits::ResPacketSize, R1); r1.storePacket(2 * Traits::ResPacketSize, R2); R0 = r2.template loadPacket(0 * Traits::ResPacketSize); R1 = r2.template loadPacket(1 * Traits::ResPacketSize); R2 = r2.template loadPacket(2 * Traits::ResPacketSize); traits.acc(C2, alphav, R0); traits.acc(C6, alphav, R1); traits.acc(C10, alphav, R2); r2.storePacket(0 * Traits::ResPacketSize, R0); r2.storePacket(1 * Traits::ResPacketSize, R1); r2.storePacket(2 * Traits::ResPacketSize, R2); R0 = r3.template loadPacket(0 * Traits::ResPacketSize); R1 = r3.template loadPacket(1 * Traits::ResPacketSize); R2 = r3.template loadPacket(2 * Traits::ResPacketSize); traits.acc(C3, alphav, R0); traits.acc(C7, alphav, R1); traits.acc(C11, alphav, R2); r3.storePacket(0 * Traits::ResPacketSize, R0); r3.storePacket(1 * Traits::ResPacketSize, R1); r3.storePacket(2 * Traits::ResPacketSize, R2); } } // Deal with remaining columns of the rhs for(Index j2=packet_cols4; j2); \ traits.madd(A1, B_0, C4, B_0, fix<0>); \ traits.madd(A2, B_0, C8, B_0, fix<0>); \ EIGEN_ASM_COMMENT("end step of gebp micro kernel 3pX1"); \ } while (false) EIGEN_GEBGP_ONESTEP(0); EIGEN_GEBGP_ONESTEP(1); EIGEN_GEBGP_ONESTEP(2); EIGEN_GEBGP_ONESTEP(3); EIGEN_GEBGP_ONESTEP(4); EIGEN_GEBGP_ONESTEP(5); EIGEN_GEBGP_ONESTEP(6); EIGEN_GEBGP_ONESTEP(7); blB += int(pk) * int(RhsProgress); blA += int(pk) * 3 * int(Traits::LhsProgress); EIGEN_ASM_COMMENT("end gebp micro kernel 3pX1"); } // process remaining peeled loop for(Index k=peeled_kc; k(alpha); R0 = r0.template loadPacket(0 * Traits::ResPacketSize); R1 = r0.template loadPacket(1 * Traits::ResPacketSize); R2 = r0.template loadPacket(2 * Traits::ResPacketSize); traits.acc(C0, alphav, R0); traits.acc(C4, alphav, R1); traits.acc(C8, alphav, R2); r0.storePacket(0 * Traits::ResPacketSize, R0); r0.storePacket(1 * Traits::ResPacketSize, R1); r0.storePacket(2 * Traits::ResPacketSize, R2); } } } } //---------- Process 2 * LhsProgress rows at once ---------- if(mr>=2*Traits::LhsProgress) { const Index l1 = defaultL1CacheSize; // in Bytes, TODO, l1 should be passed to this function. // The max(1, ...) here is needed because we may be using blocking params larger than what our known l1 cache size // suggests we should be using: either because our known l1 cache size is inaccurate (e.g. on Android, we can only guess), // or because we are testing specific blocking sizes. Index actual_panel_rows = (2*LhsProgress) * std::max(1,( (l1 - sizeof(ResScalar)*mr*nr - depth*nr*sizeof(RhsScalar)) / (depth * sizeof(LhsScalar) * 2*LhsProgress) )); for(Index i1=peeled_mc3; i1=6 without FMA (bug 1637) #if EIGEN_GNUC_AT_LEAST(6,0) && defined(EIGEN_VECTORIZE_SSE) #define EIGEN_GEBP_2PX4_SPILLING_WORKAROUND __asm__ ("" : [a0] "+x,m" (A0),[a1] "+x,m" (A1)); #else #define EIGEN_GEBP_2PX4_SPILLING_WORKAROUND #endif #define EIGEN_GEBGP_ONESTEP(K) \ do { \ EIGEN_ASM_COMMENT("begin step of gebp micro kernel 2pX4"); \ traits.loadLhs(&blA[(0 + 2 * K) * LhsProgress], A0); \ traits.loadLhs(&blA[(1 + 2 * K) * LhsProgress], A1); \ traits.loadRhs(&blB[(0 + 4 * K) * RhsProgress], rhs_panel); \ traits.madd(A0, rhs_panel, C0, T0, fix<0>); \ traits.madd(A1, rhs_panel, C4, T0, fix<0>); \ traits.madd(A0, rhs_panel, C1, T0, fix<1>); \ traits.madd(A1, rhs_panel, C5, T0, fix<1>); \ traits.madd(A0, rhs_panel, C2, T0, fix<2>); \ traits.madd(A1, rhs_panel, C6, T0, fix<2>); \ traits.madd(A0, rhs_panel, C3, T0, fix<3>); \ traits.madd(A1, rhs_panel, C7, T0, fix<3>); \ EIGEN_GEBP_2PX4_SPILLING_WORKAROUND \ EIGEN_ASM_COMMENT("end step of gebp micro kernel 2pX4"); \ } while (false) internal::prefetch(blB+(48+0)); EIGEN_GEBGP_ONESTEP(0); EIGEN_GEBGP_ONESTEP(1); EIGEN_GEBGP_ONESTEP(2); EIGEN_GEBGP_ONESTEP(3); internal::prefetch(blB+(48+16)); EIGEN_GEBGP_ONESTEP(4); EIGEN_GEBGP_ONESTEP(5); EIGEN_GEBGP_ONESTEP(6); EIGEN_GEBGP_ONESTEP(7); blB += pk*4*RhsProgress; blA += pk*(2*Traits::LhsProgress); EIGEN_ASM_COMMENT("end gebp micro kernel 2pX4"); } // process remaining peeled loop for(Index k=peeled_kc; k(alpha); R0 = r0.template loadPacket(0 * Traits::ResPacketSize); R1 = r0.template loadPacket(1 * Traits::ResPacketSize); R2 = r1.template loadPacket(0 * Traits::ResPacketSize); R3 = r1.template loadPacket(1 * Traits::ResPacketSize); traits.acc(C0, alphav, R0); traits.acc(C4, alphav, R1); traits.acc(C1, alphav, R2); traits.acc(C5, alphav, R3); r0.storePacket(0 * Traits::ResPacketSize, R0); r0.storePacket(1 * Traits::ResPacketSize, R1); r1.storePacket(0 * Traits::ResPacketSize, R2); r1.storePacket(1 * Traits::ResPacketSize, R3); R0 = r2.template loadPacket(0 * Traits::ResPacketSize); R1 = r2.template loadPacket(1 * Traits::ResPacketSize); R2 = r3.template loadPacket(0 * Traits::ResPacketSize); R3 = r3.template loadPacket(1 * Traits::ResPacketSize); traits.acc(C2, alphav, R0); traits.acc(C6, alphav, R1); traits.acc(C3, alphav, R2); traits.acc(C7, alphav, R3); r2.storePacket(0 * Traits::ResPacketSize, R0); r2.storePacket(1 * Traits::ResPacketSize, R1); r3.storePacket(0 * Traits::ResPacketSize, R2); r3.storePacket(1 * Traits::ResPacketSize, R3); } } // Deal with remaining columns of the rhs for(Index j2=packet_cols4; j2); \ traits.madd(A1, B_0, C4, B_0, fix<0>); \ EIGEN_ASM_COMMENT("end step of gebp micro kernel 2pX1"); \ } while(false) EIGEN_GEBGP_ONESTEP(0); EIGEN_GEBGP_ONESTEP(1); EIGEN_GEBGP_ONESTEP(2); EIGEN_GEBGP_ONESTEP(3); EIGEN_GEBGP_ONESTEP(4); EIGEN_GEBGP_ONESTEP(5); EIGEN_GEBGP_ONESTEP(6); EIGEN_GEBGP_ONESTEP(7); blB += int(pk) * int(RhsProgress); blA += int(pk) * 2 * int(Traits::LhsProgress); EIGEN_ASM_COMMENT("end gebp micro kernel 2pX1"); } // process remaining peeled loop for(Index k=peeled_kc; k(alpha); R0 = r0.template loadPacket(0 * Traits::ResPacketSize); R1 = r0.template loadPacket(1 * Traits::ResPacketSize); traits.acc(C0, alphav, R0); traits.acc(C4, alphav, R1); r0.storePacket(0 * Traits::ResPacketSize, R0); r0.storePacket(1 * Traits::ResPacketSize, R1); } } } } //---------- Process 1 * LhsProgress rows at once ---------- if(mr>=1*Traits::LhsProgress) { lhs_process_one_packet p; p(res, blockA, blockB, alpha, peeled_mc2, peeled_mc1, strideA, strideB, offsetA, offsetB, prefetch_res_offset, peeled_kc, pk, cols, depth, packet_cols4); } //---------- Process LhsProgressHalf rows at once ---------- if((LhsProgressHalf < LhsProgress) && mr>=LhsProgressHalf) { lhs_process_fraction_of_packet p; p(res, blockA, blockB, alpha, peeled_mc1, peeled_mc_half, strideA, strideB, offsetA, offsetB, prefetch_res_offset, peeled_kc, pk, cols, depth, packet_cols4); } //---------- Process LhsProgressQuarter rows at once ---------- if((LhsProgressQuarter < LhsProgressHalf) && mr>=LhsProgressQuarter) { lhs_process_fraction_of_packet p; p(res, blockA, blockB, alpha, peeled_mc_half, peeled_mc_quarter, strideA, strideB, offsetA, offsetB, prefetch_res_offset, peeled_kc, pk, cols, depth, packet_cols4); } //---------- Process remaining rows, 1 at once ---------- if(peeled_mc_quarter::half>::size; const int SResPacketQuarterSize = unpacket_traits::half>::half>::size; if ((SwappedTraits::LhsProgress % 4) == 0 && (SwappedTraits::LhsProgress<=16) && (SwappedTraits::LhsProgress!=8 || SResPacketHalfSize==nr) && (SwappedTraits::LhsProgress!=16 || SResPacketQuarterSize==nr)) { SAccPacket C0, C1, C2, C3; straits.initAcc(C0); straits.initAcc(C1); straits.initAcc(C2); straits.initAcc(C3); const Index spk = (std::max)(1,SwappedTraits::LhsProgress/4); const Index endk = (depth/spk)*spk; const Index endk4 = (depth/(spk*4))*(spk*4); Index k=0; for(; k); straits.madd(A1,B_1,C1,B_1, fix<0>); straits.loadLhsUnaligned(blB+2*SwappedTraits::LhsProgress, A0); straits.loadLhsUnaligned(blB+3*SwappedTraits::LhsProgress, A1); straits.loadRhsQuad(blA+2*spk, B_0); straits.loadRhsQuad(blA+3*spk, B_1); straits.madd(A0,B_0,C2,B_0, fix<0>); straits.madd(A1,B_1,C3,B_1, fix<0>); blB += 4*SwappedTraits::LhsProgress; blA += 4*spk; } C0 = padd(padd(C0,C1),padd(C2,C3)); for(; k); blB += SwappedTraits::LhsProgress; blA += spk; } if(SwappedTraits::LhsProgress==8) { // Special case where we have to first reduce the accumulation register C0 typedef typename conditional=8,typename unpacket_traits::half,SResPacket>::type SResPacketHalf; typedef typename conditional=8,typename unpacket_traits::half,SLhsPacket>::type SLhsPacketHalf; typedef typename conditional=8,typename unpacket_traits::half,SRhsPacket>::type SRhsPacketHalf; typedef typename conditional=8,typename unpacket_traits::half,SAccPacket>::type SAccPacketHalf; SResPacketHalf R = res.template gatherPacket(i, j2); SResPacketHalf alphav = pset1(alpha); if(depth-endk>0) { // We have to handle the last row of the rhs which corresponds to a half-packet SLhsPacketHalf a0; SRhsPacketHalf b0; straits.loadLhsUnaligned(blB, a0); straits.loadRhs(blA, b0); SAccPacketHalf c0 = predux_half_dowto4(C0); straits.madd(a0,b0,c0,b0, fix<0>); straits.acc(c0, alphav, R); } else { straits.acc(predux_half_dowto4(C0), alphav, R); } res.scatterPacket(i, j2, R); } else if (SwappedTraits::LhsProgress==16) { // Special case where we have to first reduce the // accumulation register C0. We specialize the block in // template form, so that LhsProgress < 16 paths don't // fail to compile last_row_process_16_packets p; p(res, straits, blA, blB, depth, endk, i, j2,alpha, C0); } else { SResPacket R = res.template gatherPacket(i, j2); SResPacket alphav = pset1(alpha); straits.acc(C0, alphav, R); res.scatterPacket(i, j2, R); } } else // scalar path { // get a 1 x 4 res block as registers ResScalar C0(0), C1(0), C2(0), C3(0); for(Index k=0; k struct gemm_pack_lhs { typedef typename DataMapper::LinearMapper LinearMapper; EIGEN_DONT_INLINE void operator()(Scalar* blockA, const DataMapper& lhs, Index depth, Index rows, Index stride=0, Index offset=0); }; template EIGEN_DONT_INLINE void gemm_pack_lhs ::operator()(Scalar* blockA, const DataMapper& lhs, Index depth, Index rows, Index stride, Index offset) { typedef typename unpacket_traits::half HalfPacket; typedef typename unpacket_traits::half>::half QuarterPacket; enum { PacketSize = unpacket_traits::size, HalfPacketSize = unpacket_traits::size, QuarterPacketSize = unpacket_traits::size, HasHalf = (int)HalfPacketSize < (int)PacketSize, HasQuarter = (int)QuarterPacketSize < (int)HalfPacketSize}; EIGEN_ASM_COMMENT("EIGEN PRODUCT PACK LHS"); EIGEN_UNUSED_VARIABLE(stride); EIGEN_UNUSED_VARIABLE(offset); eigen_assert(((!PanelMode) && stride==0 && offset==0) || (PanelMode && stride>=depth && offset<=stride)); eigen_assert( ((Pack1%PacketSize)==0 && Pack1<=4*PacketSize) || (Pack1<=4) ); conj_if::IsComplex && Conjugate> cj; Index count = 0; const Index peeled_mc3 = Pack1>=3*PacketSize ? (rows/(3*PacketSize))*(3*PacketSize) : 0; const Index peeled_mc2 = Pack1>=2*PacketSize ? peeled_mc3+((rows-peeled_mc3)/(2*PacketSize))*(2*PacketSize) : 0; const Index peeled_mc1 = Pack1>=1*PacketSize ? peeled_mc2+((rows-peeled_mc2)/(1*PacketSize))*(1*PacketSize) : 0; const Index peeled_mc_half = Pack1>=HalfPacketSize ? peeled_mc1+((rows-peeled_mc1)/(HalfPacketSize))*(HalfPacketSize) : 0; const Index peeled_mc_quarter = Pack1>=QuarterPacketSize ? (rows/(QuarterPacketSize))*(QuarterPacketSize) : 0; const Index last_lhs_progress = rows > peeled_mc_quarter ? (rows - peeled_mc_quarter) & ~1 : 0; const Index peeled_mc0 = Pack2>=PacketSize ? peeled_mc_quarter : Pack2>1 && last_lhs_progress ? (rows/last_lhs_progress)*last_lhs_progress : 0; Index i=0; // Pack 3 packets if(Pack1>=3*PacketSize) { for(; i(i+0*PacketSize, k); B = lhs.template loadPacket(i+1*PacketSize, k); C = lhs.template loadPacket(i+2*PacketSize, k); pstore(blockA+count, cj.pconj(A)); count+=PacketSize; pstore(blockA+count, cj.pconj(B)); count+=PacketSize; pstore(blockA+count, cj.pconj(C)); count+=PacketSize; } if(PanelMode) count += (3*PacketSize) * (stride-offset-depth); } } // Pack 2 packets if(Pack1>=2*PacketSize) { for(; i(i+0*PacketSize, k); B = lhs.template loadPacket(i+1*PacketSize, k); pstore(blockA+count, cj.pconj(A)); count+=PacketSize; pstore(blockA+count, cj.pconj(B)); count+=PacketSize; } if(PanelMode) count += (2*PacketSize) * (stride-offset-depth); } } // Pack 1 packets if(Pack1>=1*PacketSize) { for(; i(i+0*PacketSize, k); pstore(blockA+count, cj.pconj(A)); count+=PacketSize; } if(PanelMode) count += (1*PacketSize) * (stride-offset-depth); } } // Pack half packets if(HasHalf && Pack1>=HalfPacketSize) { for(; i(i+0*(HalfPacketSize), k); pstoreu(blockA+count, cj.pconj(A)); count+=HalfPacketSize; } if(PanelMode) count += (HalfPacketSize) * (stride-offset-depth); } } // Pack quarter packets if(HasQuarter && Pack1>=QuarterPacketSize) { for(; i(i+0*(QuarterPacketSize), k); pstoreu(blockA+count, cj.pconj(A)); count+=QuarterPacketSize; } if(PanelMode) count += (QuarterPacketSize) * (stride-offset-depth); } } // Pack2 may be *smaller* than PacketSize—that happens for // products like real * complex, where we have to go half the // progress on the lhs in order to duplicate those operands to // address both real & imaginary parts on the rhs. This portion will // pack those half ones until they match the number expected on the // last peeling loop at this point (for the rhs). if(Pack21) { for(; i struct gemm_pack_lhs { typedef typename DataMapper::LinearMapper LinearMapper; EIGEN_DONT_INLINE void operator()(Scalar* blockA, const DataMapper& lhs, Index depth, Index rows, Index stride=0, Index offset=0); }; template EIGEN_DONT_INLINE void gemm_pack_lhs ::operator()(Scalar* blockA, const DataMapper& lhs, Index depth, Index rows, Index stride, Index offset) { typedef typename unpacket_traits::half HalfPacket; typedef typename unpacket_traits::half>::half QuarterPacket; enum { PacketSize = unpacket_traits::size, HalfPacketSize = unpacket_traits::size, QuarterPacketSize = unpacket_traits::size, HasHalf = (int)HalfPacketSize < (int)PacketSize, HasQuarter = (int)QuarterPacketSize < (int)HalfPacketSize}; EIGEN_ASM_COMMENT("EIGEN PRODUCT PACK LHS"); EIGEN_UNUSED_VARIABLE(stride); EIGEN_UNUSED_VARIABLE(offset); eigen_assert(((!PanelMode) && stride==0 && offset==0) || (PanelMode && stride>=depth && offset<=stride)); conj_if::IsComplex && Conjugate> cj; Index count = 0; bool gone_half = false, gone_quarter = false, gone_last = false; Index i = 0; int pack = Pack1; int psize = PacketSize; while(pack>0) { Index remaining_rows = rows-i; Index peeled_mc = gone_last ? Pack2>1 ? (rows/pack)*pack : 0 : i+(remaining_rows/pack)*pack; Index starting_pos = i; for(; i=psize && psize >= QuarterPacketSize) { const Index peeled_k = (depth/psize)*psize; for(; k kernel; for (int p = 0; p < psize; ++p) kernel.packet[p] = lhs.template loadPacket(i+p+m, k); ptranspose(kernel); for (int p = 0; p < psize; ++p) pstore(blockA+count+m+(pack)*p, cj.pconj(kernel.packet[p])); } else if (HasHalf && psize == HalfPacketSize) { gone_half = true; PacketBlock kernel_half; for (int p = 0; p < psize; ++p) kernel_half.packet[p] = lhs.template loadPacket(i+p+m, k); ptranspose(kernel_half); for (int p = 0; p < psize; ++p) pstore(blockA+count+m+(pack)*p, cj.pconj(kernel_half.packet[p])); } else if (HasQuarter && psize == QuarterPacketSize) { gone_quarter = true; PacketBlock kernel_quarter; for (int p = 0; p < psize; ++p) kernel_quarter.packet[p] = lhs.template loadPacket(i+p+m, k); ptranspose(kernel_quarter); for (int p = 0; p < psize; ++p) pstore(blockA+count+m+(pack)*p, cj.pconj(kernel_quarter.packet[p])); } } count += psize*pack; } } for(; k= psize/2 || left >= psize/4) && ((psize/2 == HalfPacketSize && HasHalf && !gone_half) || (psize/2 == QuarterPacketSize && HasQuarter && !gone_quarter))) { psize /= 2; pack = psize; continue; } // Pack2 may be *smaller* than PacketSize—that happens for // products like real * complex, where we have to go half the // progress on the lhs in order to duplicate those operands to // address both real & imaginary parts on the rhs. This portion will // pack those half ones until they match the number expected on the // last peeling loop at this point (for the rhs). if (Pack2 < PacketSize && !gone_last) { gone_last = true; psize = pack = left & ~1; } } } for(; i struct gemm_pack_rhs { typedef typename packet_traits::type Packet; typedef typename DataMapper::LinearMapper LinearMapper; enum { PacketSize = packet_traits::size }; EIGEN_DONT_INLINE void operator()(Scalar* blockB, const DataMapper& rhs, Index depth, Index cols, Index stride=0, Index offset=0); }; template EIGEN_DONT_INLINE void gemm_pack_rhs ::operator()(Scalar* blockB, const DataMapper& rhs, Index depth, Index cols, Index stride, Index offset) { EIGEN_ASM_COMMENT("EIGEN PRODUCT PACK RHS COLMAJOR"); EIGEN_UNUSED_VARIABLE(stride); EIGEN_UNUSED_VARIABLE(offset); eigen_assert(((!PanelMode) && stride==0 && offset==0) || (PanelMode && stride>=depth && offset<=stride)); conj_if::IsComplex && Conjugate> cj; Index packet_cols8 = nr>=8 ? (cols/8) * 8 : 0; Index packet_cols4 = nr>=4 ? (cols/4) * 4 : 0; Index count = 0; const Index peeled_k = (depth/PacketSize)*PacketSize; // if(nr>=8) // { // for(Index j2=0; j2 kernel; // for (int p = 0; p < PacketSize; ++p) { // kernel.packet[p] = ploadu(&rhs[(j2+p)*rhsStride+k]); // } // ptranspose(kernel); // for (int p = 0; p < PacketSize; ++p) { // pstoreu(blockB+count, cj.pconj(kernel.packet[p])); // count+=PacketSize; // } // } // } // for(; k=4) { for(Index j2=packet_cols8; j2 kernel; kernel.packet[0 ] = dm0.template loadPacket(k); kernel.packet[1%PacketSize] = dm1.template loadPacket(k); kernel.packet[2%PacketSize] = dm2.template loadPacket(k); kernel.packet[3%PacketSize] = dm3.template loadPacket(k); ptranspose(kernel); pstoreu(blockB+count+0*PacketSize, cj.pconj(kernel.packet[0])); pstoreu(blockB+count+1*PacketSize, cj.pconj(kernel.packet[1%PacketSize])); pstoreu(blockB+count+2*PacketSize, cj.pconj(kernel.packet[2%PacketSize])); pstoreu(blockB+count+3*PacketSize, cj.pconj(kernel.packet[3%PacketSize])); count+=4*PacketSize; } } for(; k struct gemm_pack_rhs { typedef typename packet_traits::type Packet; typedef typename unpacket_traits::half HalfPacket; typedef typename unpacket_traits::half>::half QuarterPacket; typedef typename DataMapper::LinearMapper LinearMapper; enum { PacketSize = packet_traits::size, HalfPacketSize = unpacket_traits::size, QuarterPacketSize = unpacket_traits::size}; EIGEN_DONT_INLINE void operator()(Scalar* blockB, const DataMapper& rhs, Index depth, Index cols, Index stride=0, Index offset=0) { EIGEN_ASM_COMMENT("EIGEN PRODUCT PACK RHS ROWMAJOR"); EIGEN_UNUSED_VARIABLE(stride); EIGEN_UNUSED_VARIABLE(offset); eigen_assert(((!PanelMode) && stride==0 && offset==0) || (PanelMode && stride>=depth && offset<=stride)); const bool HasHalf = (int)HalfPacketSize < (int)PacketSize; const bool HasQuarter = (int)QuarterPacketSize < (int)HalfPacketSize; conj_if::IsComplex && Conjugate> cj; Index packet_cols8 = nr>=8 ? (cols/8) * 8 : 0; Index packet_cols4 = nr>=4 ? (cols/4) * 4 : 0; Index count = 0; // if(nr>=8) // { // for(Index j2=0; j2(&rhs[k*rhsStride + j2]); // pstoreu(blockB+count, cj.pconj(A)); // } else if (PacketSize==4) { // Packet A = ploadu(&rhs[k*rhsStride + j2]); // Packet B = ploadu(&rhs[k*rhsStride + j2 + PacketSize]); // pstoreu(blockB+count, cj.pconj(A)); // pstoreu(blockB+count+PacketSize, cj.pconj(B)); // } else { // const Scalar* b0 = &rhs[k*rhsStride + j2]; // blockB[count+0] = cj(b0[0]); // blockB[count+1] = cj(b0[1]); // blockB[count+2] = cj(b0[2]); // blockB[count+3] = cj(b0[3]); // blockB[count+4] = cj(b0[4]); // blockB[count+5] = cj(b0[5]); // blockB[count+6] = cj(b0[6]); // blockB[count+7] = cj(b0[7]); // } // count += 8; // } // // skip what we have after // if(PanelMode) count += 8 * (stride-offset-depth); // } // } if(nr>=4) { for(Index j2=packet_cols8; j2(k, j2); pstoreu(blockB+count, cj.pconj(A)); count += PacketSize; } else if (HasHalf && HalfPacketSize==4) { HalfPacket A = rhs.template loadPacket(k, j2); pstoreu(blockB+count, cj.pconj(A)); count += HalfPacketSize; } else if (HasQuarter && QuarterPacketSize==4) { QuarterPacket A = rhs.template loadPacket(k, j2); pstoreu(blockB+count, cj.pconj(A)); count += QuarterPacketSize; } else { const LinearMapper dm0 = rhs.getLinearMapper(k, j2); blockB[count+0] = cj(dm0(0)); blockB[count+1] = cj(dm0(1)); blockB[count+2] = cj(dm0(2)); blockB[count+3] = cj(dm0(3)); count += 4; } } // skip what we have after if(PanelMode) count += 4 * (stride-offset-depth); } } // copy the remaining columns one at a time (nr==1) for(Index j2=packet_cols4; j2 // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_GENERAL_MATRIX_MATRIX_TRIANGULAR_H #define EIGEN_GENERAL_MATRIX_MATRIX_TRIANGULAR_H namespace Eigen { template struct selfadjoint_rank1_update; namespace internal { /********************************************************************** * This file implements a general A * B product while * evaluating only one triangular part of the product. * This is a more general version of self adjoint product (C += A A^T) * as the level 3 SYRK Blas routine. **********************************************************************/ // forward declarations (defined at the end of this file) template struct tribb_kernel; /* Optimized matrix-matrix product evaluating only one triangular half */ template struct general_matrix_matrix_triangular_product; // as usual if the result is row major => we transpose the product template struct general_matrix_matrix_triangular_product { typedef typename ScalarBinaryOpTraits::ReturnType ResScalar; static EIGEN_STRONG_INLINE void run(Index size, Index depth,const LhsScalar* lhs, Index lhsStride, const RhsScalar* rhs, Index rhsStride, ResScalar* res, Index resIncr, Index resStride, const ResScalar& alpha, level3_blocking& blocking) { general_matrix_matrix_triangular_product ::run(size,depth,rhs,rhsStride,lhs,lhsStride,res,resIncr,resStride,alpha,blocking); } }; template struct general_matrix_matrix_triangular_product { typedef typename ScalarBinaryOpTraits::ReturnType ResScalar; static EIGEN_STRONG_INLINE void run(Index size, Index depth,const LhsScalar* _lhs, Index lhsStride, const RhsScalar* _rhs, Index rhsStride, ResScalar* _res, Index resIncr, Index resStride, const ResScalar& alpha, level3_blocking& blocking) { typedef gebp_traits Traits; typedef const_blas_data_mapper LhsMapper; typedef const_blas_data_mapper RhsMapper; typedef blas_data_mapper ResMapper; LhsMapper lhs(_lhs,lhsStride); RhsMapper rhs(_rhs,rhsStride); ResMapper res(_res, resStride, resIncr); Index kc = blocking.kc(); Index mc = (std::min)(size,blocking.mc()); // !!! mc must be a multiple of nr: if(mc > Traits::nr) mc = (mc/Traits::nr)*Traits::nr; std::size_t sizeA = kc*mc; std::size_t sizeB = kc*size; ei_declare_aligned_stack_constructed_variable(LhsScalar, blockA, sizeA, blocking.blockA()); ei_declare_aligned_stack_constructed_variable(RhsScalar, blockB, sizeB, blocking.blockB()); gemm_pack_lhs pack_lhs; gemm_pack_rhs pack_rhs; gebp_kernel gebp; tribb_kernel sybb; for(Index k2=0; k2 processed with gebp or skipped // 2 - the actual_mc x actual_mc symmetric block => processed with a special kernel // 3 - after the diagonal => processed with gebp or skipped if (UpLo==Lower) gebp(res.getSubMapper(i2, 0), blockA, blockB, actual_mc, actual_kc, (std::min)(size,i2), alpha, -1, -1, 0, 0); sybb(_res+resStride*i2 + resIncr*i2, resIncr, resStride, blockA, blockB + actual_kc*i2, actual_mc, actual_kc, alpha); if (UpLo==Upper) { Index j2 = i2+actual_mc; gebp(res.getSubMapper(i2, j2), blockA, blockB+actual_kc*j2, actual_mc, actual_kc, (std::max)(Index(0), size-j2), alpha, -1, -1, 0, 0); } } } } }; // Optimized packed Block * packed Block product kernel evaluating only one given triangular part // This kernel is built on top of the gebp kernel: // - the current destination block is processed per panel of actual_mc x BlockSize // where BlockSize is set to the minimal value allowing gebp to be as fast as possible // - then, as usual, each panel is split into three parts along the diagonal, // the sub blocks above and below the diagonal are processed as usual, // while the triangular block overlapping the diagonal is evaluated into a // small temporary buffer which is then accumulated into the result using a // triangular traversal. template struct tribb_kernel { typedef gebp_traits Traits; typedef typename Traits::ResScalar ResScalar; enum { BlockSize = meta_least_common_multiple::ret }; void operator()(ResScalar* _res, Index resIncr, Index resStride, const LhsScalar* blockA, const RhsScalar* blockB, Index size, Index depth, const ResScalar& alpha) { typedef blas_data_mapper ResMapper; typedef blas_data_mapper BufferMapper; ResMapper res(_res, resStride, resIncr); gebp_kernel gebp_kernel1; gebp_kernel gebp_kernel2; Matrix buffer((internal::constructor_without_unaligned_array_assert())); // let's process the block per panel of actual_mc x BlockSize, // again, each is split into three parts, etc. for (Index j=0; j(BlockSize,size - j); const RhsScalar* actual_b = blockB+j*depth; if(UpLo==Upper) gebp_kernel1(res.getSubMapper(0, j), blockA, actual_b, j, depth, actualBlockSize, alpha, -1, -1, 0, 0); // selfadjoint micro block { Index i = j; buffer.setZero(); // 1 - apply the kernel on the temporary buffer gebp_kernel2(BufferMapper(buffer.data(), BlockSize), blockA+depth*i, actual_b, actualBlockSize, depth, actualBlockSize, alpha, -1, -1, 0, 0); // 2 - triangular accumulation for(Index j1=0; j1 struct general_product_to_triangular_selector; template struct general_product_to_triangular_selector { static void run(MatrixType& mat, const ProductType& prod, const typename MatrixType::Scalar& alpha, bool beta) { typedef typename MatrixType::Scalar Scalar; typedef typename internal::remove_all::type Lhs; typedef internal::blas_traits LhsBlasTraits; typedef typename LhsBlasTraits::DirectLinearAccessType ActualLhs; typedef typename internal::remove_all::type _ActualLhs; typename internal::add_const_on_value_type::type actualLhs = LhsBlasTraits::extract(prod.lhs()); typedef typename internal::remove_all::type Rhs; typedef internal::blas_traits RhsBlasTraits; typedef typename RhsBlasTraits::DirectLinearAccessType ActualRhs; typedef typename internal::remove_all::type _ActualRhs; typename internal::add_const_on_value_type::type actualRhs = RhsBlasTraits::extract(prod.rhs()); Scalar actualAlpha = alpha * LhsBlasTraits::extractScalarFactor(prod.lhs().derived()) * RhsBlasTraits::extractScalarFactor(prod.rhs().derived()); if(!beta) mat.template triangularView().setZero(); enum { StorageOrder = (internal::traits::Flags&RowMajorBit) ? RowMajor : ColMajor, UseLhsDirectly = _ActualLhs::InnerStrideAtCompileTime==1, UseRhsDirectly = _ActualRhs::InnerStrideAtCompileTime==1 }; internal::gemv_static_vector_if static_lhs; ei_declare_aligned_stack_constructed_variable(Scalar, actualLhsPtr, actualLhs.size(), (UseLhsDirectly ? const_cast(actualLhs.data()) : static_lhs.data())); if(!UseLhsDirectly) Map(actualLhsPtr, actualLhs.size()) = actualLhs; internal::gemv_static_vector_if static_rhs; ei_declare_aligned_stack_constructed_variable(Scalar, actualRhsPtr, actualRhs.size(), (UseRhsDirectly ? const_cast(actualRhs.data()) : static_rhs.data())); if(!UseRhsDirectly) Map(actualRhsPtr, actualRhs.size()) = actualRhs; selfadjoint_rank1_update::IsComplex, RhsBlasTraits::NeedToConjugate && NumTraits::IsComplex> ::run(actualLhs.size(), mat.data(), mat.outerStride(), actualLhsPtr, actualRhsPtr, actualAlpha); } }; template struct general_product_to_triangular_selector { static void run(MatrixType& mat, const ProductType& prod, const typename MatrixType::Scalar& alpha, bool beta) { typedef typename internal::remove_all::type Lhs; typedef internal::blas_traits LhsBlasTraits; typedef typename LhsBlasTraits::DirectLinearAccessType ActualLhs; typedef typename internal::remove_all::type _ActualLhs; typename internal::add_const_on_value_type::type actualLhs = LhsBlasTraits::extract(prod.lhs()); typedef typename internal::remove_all::type Rhs; typedef internal::blas_traits RhsBlasTraits; typedef typename RhsBlasTraits::DirectLinearAccessType ActualRhs; typedef typename internal::remove_all::type _ActualRhs; typename internal::add_const_on_value_type::type actualRhs = RhsBlasTraits::extract(prod.rhs()); typename ProductType::Scalar actualAlpha = alpha * LhsBlasTraits::extractScalarFactor(prod.lhs().derived()) * RhsBlasTraits::extractScalarFactor(prod.rhs().derived()); if(!beta) mat.template triangularView().setZero(); enum { IsRowMajor = (internal::traits::Flags&RowMajorBit) ? 1 : 0, LhsIsRowMajor = _ActualLhs::Flags&RowMajorBit ? 1 : 0, RhsIsRowMajor = _ActualRhs::Flags&RowMajorBit ? 1 : 0, SkipDiag = (UpLo&(UnitDiag|ZeroDiag))!=0 }; Index size = mat.cols(); if(SkipDiag) size--; Index depth = actualLhs.cols(); typedef internal::gemm_blocking_space BlockingType; BlockingType blocking(size, size, depth, 1, false); internal::general_matrix_matrix_triangular_product ::run(size, depth, &actualLhs.coeffRef(SkipDiag&&(UpLo&Lower)==Lower ? 1 : 0,0), actualLhs.outerStride(), &actualRhs.coeffRef(0,SkipDiag&&(UpLo&Upper)==Upper ? 1 : 0), actualRhs.outerStride(), mat.data() + (SkipDiag ? (bool(IsRowMajor) != ((UpLo&Lower)==Lower) ? mat.innerStride() : mat.outerStride() ) : 0), mat.innerStride(), mat.outerStride(), actualAlpha, blocking); } }; template template EIGEN_DEVICE_FUNC TriangularView& TriangularViewImpl::_assignProduct(const ProductType& prod, const Scalar& alpha, bool beta) { EIGEN_STATIC_ASSERT((UpLo&UnitDiag)==0, WRITING_TO_TRIANGULAR_PART_WITH_UNIT_DIAGONAL_IS_NOT_SUPPORTED); eigen_assert(derived().nestedExpression().rows() == prod.rows() && derived().cols() == prod.cols()); general_product_to_triangular_selector::InnerSize==1>::run(derived().nestedExpression().const_cast_derived(), prod, alpha, beta); return derived(); } } // end namespace Eigen #endif // EIGEN_GENERAL_MATRIX_MATRIX_TRIANGULAR_H RcppEigen/inst/include/Eigen/src/Core/DenseStorage.h0000644000176200001440000006142014562417221021777 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008 Gael Guennebaud // Copyright (C) 2006-2009 Benoit Jacob // Copyright (C) 2010-2013 Hauke Heibel // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_MATRIXSTORAGE_H #define EIGEN_MATRIXSTORAGE_H #ifdef EIGEN_DENSE_STORAGE_CTOR_PLUGIN #define EIGEN_INTERNAL_DENSE_STORAGE_CTOR_PLUGIN(X) X; EIGEN_DENSE_STORAGE_CTOR_PLUGIN; #else #define EIGEN_INTERNAL_DENSE_STORAGE_CTOR_PLUGIN(X) #endif namespace Eigen { namespace internal { struct constructor_without_unaligned_array_assert {}; template EIGEN_DEVICE_FUNC void check_static_allocation_size() { // if EIGEN_STACK_ALLOCATION_LIMIT is defined to 0, then no limit #if EIGEN_STACK_ALLOCATION_LIMIT EIGEN_STATIC_ASSERT(Size * sizeof(T) <= EIGEN_STACK_ALLOCATION_LIMIT, OBJECT_ALLOCATED_ON_STACK_IS_TOO_BIG); #endif } /** \internal * Static array. If the MatrixOrArrayOptions require auto-alignment, the array will be automatically aligned: * to 16 bytes boundary if the total size is a multiple of 16 bytes. */ template ::value > struct plain_array { T array[Size]; EIGEN_DEVICE_FUNC plain_array() { check_static_allocation_size(); } EIGEN_DEVICE_FUNC plain_array(constructor_without_unaligned_array_assert) { check_static_allocation_size(); } }; #if defined(EIGEN_DISABLE_UNALIGNED_ARRAY_ASSERT) #define EIGEN_MAKE_UNALIGNED_ARRAY_ASSERT(sizemask) #elif EIGEN_GNUC_AT_LEAST(4,7) // GCC 4.7 is too aggressive in its optimizations and remove the alignment test based on the fact the array is declared to be aligned. // See this bug report: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53900 // Hiding the origin of the array pointer behind a function argument seems to do the trick even if the function is inlined: template EIGEN_ALWAYS_INLINE PtrType eigen_unaligned_array_assert_workaround_gcc47(PtrType array) { return array; } #define EIGEN_MAKE_UNALIGNED_ARRAY_ASSERT(sizemask) \ eigen_assert((internal::UIntPtr(eigen_unaligned_array_assert_workaround_gcc47(array)) & (sizemask)) == 0 \ && "this assertion is explained here: " \ "http://eigen.tuxfamily.org/dox-devel/group__TopicUnalignedArrayAssert.html" \ " **** READ THIS WEB PAGE !!! ****"); #else #define EIGEN_MAKE_UNALIGNED_ARRAY_ASSERT(sizemask) \ eigen_assert((internal::UIntPtr(array) & (sizemask)) == 0 \ && "this assertion is explained here: " \ "http://eigen.tuxfamily.org/dox-devel/group__TopicUnalignedArrayAssert.html" \ " **** READ THIS WEB PAGE !!! ****"); #endif template struct plain_array { EIGEN_ALIGN_TO_BOUNDARY(8) T array[Size]; EIGEN_DEVICE_FUNC plain_array() { EIGEN_MAKE_UNALIGNED_ARRAY_ASSERT(7); check_static_allocation_size(); } EIGEN_DEVICE_FUNC plain_array(constructor_without_unaligned_array_assert) { check_static_allocation_size(); } }; template struct plain_array { EIGEN_ALIGN_TO_BOUNDARY(16) T array[Size]; EIGEN_DEVICE_FUNC plain_array() { EIGEN_MAKE_UNALIGNED_ARRAY_ASSERT(15); check_static_allocation_size(); } EIGEN_DEVICE_FUNC plain_array(constructor_without_unaligned_array_assert) { check_static_allocation_size(); } }; template struct plain_array { EIGEN_ALIGN_TO_BOUNDARY(32) T array[Size]; EIGEN_DEVICE_FUNC plain_array() { EIGEN_MAKE_UNALIGNED_ARRAY_ASSERT(31); check_static_allocation_size(); } EIGEN_DEVICE_FUNC plain_array(constructor_without_unaligned_array_assert) { check_static_allocation_size(); } }; template struct plain_array { EIGEN_ALIGN_TO_BOUNDARY(64) T array[Size]; EIGEN_DEVICE_FUNC plain_array() { EIGEN_MAKE_UNALIGNED_ARRAY_ASSERT(63); check_static_allocation_size(); } EIGEN_DEVICE_FUNC plain_array(constructor_without_unaligned_array_assert) { check_static_allocation_size(); } }; template struct plain_array { T array[1]; EIGEN_DEVICE_FUNC plain_array() {} EIGEN_DEVICE_FUNC plain_array(constructor_without_unaligned_array_assert) {} }; struct plain_array_helper { template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE static void copy(const plain_array& src, const Eigen::Index size, plain_array& dst) { smart_copy(src.array, src.array + size, dst.array); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE static void swap(plain_array& a, const Eigen::Index a_size, plain_array& b, const Eigen::Index b_size) { if (a_size < b_size) { std::swap_ranges(b.array, b.array + a_size, a.array); smart_move(b.array + a_size, b.array + b_size, a.array + a_size); } else if (a_size > b_size) { std::swap_ranges(a.array, a.array + b_size, b.array); smart_move(a.array + b_size, a.array + a_size, b.array + b_size); } else { std::swap_ranges(a.array, a.array + a_size, b.array); } } }; } // end namespace internal /** \internal * * \class DenseStorage * \ingroup Core_Module * * \brief Stores the data of a matrix * * This class stores the data of fixed-size, dynamic-size or mixed matrices * in a way as compact as possible. * * \sa Matrix */ template class DenseStorage; // purely fixed-size matrix template class DenseStorage { internal::plain_array m_data; public: EIGEN_DEVICE_FUNC DenseStorage() { EIGEN_INTERNAL_DENSE_STORAGE_CTOR_PLUGIN(Index size = Size) } EIGEN_DEVICE_FUNC explicit DenseStorage(internal::constructor_without_unaligned_array_assert) : m_data(internal::constructor_without_unaligned_array_assert()) {} #if !EIGEN_HAS_CXX11 || defined(EIGEN_DENSE_STORAGE_CTOR_PLUGIN) EIGEN_DEVICE_FUNC DenseStorage(const DenseStorage& other) : m_data(other.m_data) { EIGEN_INTERNAL_DENSE_STORAGE_CTOR_PLUGIN(Index size = Size) } #else EIGEN_DEVICE_FUNC DenseStorage(const DenseStorage&) = default; #endif #if !EIGEN_HAS_CXX11 EIGEN_DEVICE_FUNC DenseStorage& operator=(const DenseStorage& other) { if (this != &other) m_data = other.m_data; return *this; } #else EIGEN_DEVICE_FUNC DenseStorage& operator=(const DenseStorage&) = default; #endif #if EIGEN_HAS_RVALUE_REFERENCES #if !EIGEN_HAS_CXX11 EIGEN_DEVICE_FUNC DenseStorage(DenseStorage&& other) EIGEN_NOEXCEPT : m_data(std::move(other.m_data)) { } EIGEN_DEVICE_FUNC DenseStorage& operator=(DenseStorage&& other) EIGEN_NOEXCEPT { if (this != &other) m_data = std::move(other.m_data); return *this; } #else EIGEN_DEVICE_FUNC DenseStorage(DenseStorage&&) = default; EIGEN_DEVICE_FUNC DenseStorage& operator=(DenseStorage&&) = default; #endif #endif EIGEN_DEVICE_FUNC DenseStorage(Index size, Index rows, Index cols) { EIGEN_INTERNAL_DENSE_STORAGE_CTOR_PLUGIN({}) eigen_internal_assert(size==rows*cols && rows==_Rows && cols==_Cols); EIGEN_UNUSED_VARIABLE(size); EIGEN_UNUSED_VARIABLE(rows); EIGEN_UNUSED_VARIABLE(cols); } EIGEN_DEVICE_FUNC void swap(DenseStorage& other) { numext::swap(m_data, other.m_data); } EIGEN_DEVICE_FUNC static EIGEN_CONSTEXPR Index rows(void) EIGEN_NOEXCEPT {return _Rows;} EIGEN_DEVICE_FUNC static EIGEN_CONSTEXPR Index cols(void) EIGEN_NOEXCEPT {return _Cols;} EIGEN_DEVICE_FUNC void conservativeResize(Index,Index,Index) {} EIGEN_DEVICE_FUNC void resize(Index,Index,Index) {} EIGEN_DEVICE_FUNC const T *data() const { return m_data.array; } EIGEN_DEVICE_FUNC T *data() { return m_data.array; } }; // null matrix template class DenseStorage { public: EIGEN_DEVICE_FUNC DenseStorage() {} EIGEN_DEVICE_FUNC explicit DenseStorage(internal::constructor_without_unaligned_array_assert) {} EIGEN_DEVICE_FUNC DenseStorage(const DenseStorage&) {} EIGEN_DEVICE_FUNC DenseStorage& operator=(const DenseStorage&) { return *this; } EIGEN_DEVICE_FUNC DenseStorage(Index,Index,Index) {} EIGEN_DEVICE_FUNC void swap(DenseStorage& ) {} EIGEN_DEVICE_FUNC static EIGEN_CONSTEXPR Index rows(void) EIGEN_NOEXCEPT {return _Rows;} EIGEN_DEVICE_FUNC static EIGEN_CONSTEXPR Index cols(void) EIGEN_NOEXCEPT {return _Cols;} EIGEN_DEVICE_FUNC void conservativeResize(Index,Index,Index) {} EIGEN_DEVICE_FUNC void resize(Index,Index,Index) {} EIGEN_DEVICE_FUNC const T *data() const { return 0; } EIGEN_DEVICE_FUNC T *data() { return 0; } }; // more specializations for null matrices; these are necessary to resolve ambiguities template class DenseStorage : public DenseStorage { }; template class DenseStorage : public DenseStorage { }; template class DenseStorage : public DenseStorage { }; // dynamic-size matrix with fixed-size storage template class DenseStorage { internal::plain_array m_data; Index m_rows; Index m_cols; public: EIGEN_DEVICE_FUNC DenseStorage() : m_rows(0), m_cols(0) {} EIGEN_DEVICE_FUNC explicit DenseStorage(internal::constructor_without_unaligned_array_assert) : m_data(internal::constructor_without_unaligned_array_assert()), m_rows(0), m_cols(0) {} EIGEN_DEVICE_FUNC DenseStorage(const DenseStorage& other) : m_data(internal::constructor_without_unaligned_array_assert()), m_rows(other.m_rows), m_cols(other.m_cols) { internal::plain_array_helper::copy(other.m_data, m_rows * m_cols, m_data); } EIGEN_DEVICE_FUNC DenseStorage& operator=(const DenseStorage& other) { if (this != &other) { m_rows = other.m_rows; m_cols = other.m_cols; internal::plain_array_helper::copy(other.m_data, m_rows * m_cols, m_data); } return *this; } EIGEN_DEVICE_FUNC DenseStorage(Index, Index rows, Index cols) : m_rows(rows), m_cols(cols) {} EIGEN_DEVICE_FUNC void swap(DenseStorage& other) { internal::plain_array_helper::swap(m_data, m_rows * m_cols, other.m_data, other.m_rows * other.m_cols); numext::swap(m_rows,other.m_rows); numext::swap(m_cols,other.m_cols); } EIGEN_DEVICE_FUNC Index rows() const {return m_rows;} EIGEN_DEVICE_FUNC Index cols() const {return m_cols;} EIGEN_DEVICE_FUNC void conservativeResize(Index, Index rows, Index cols) { m_rows = rows; m_cols = cols; } EIGEN_DEVICE_FUNC void resize(Index, Index rows, Index cols) { m_rows = rows; m_cols = cols; } EIGEN_DEVICE_FUNC const T *data() const { return m_data.array; } EIGEN_DEVICE_FUNC T *data() { return m_data.array; } }; // dynamic-size matrix with fixed-size storage and fixed width template class DenseStorage { internal::plain_array m_data; Index m_rows; public: EIGEN_DEVICE_FUNC DenseStorage() : m_rows(0) {} EIGEN_DEVICE_FUNC explicit DenseStorage(internal::constructor_without_unaligned_array_assert) : m_data(internal::constructor_without_unaligned_array_assert()), m_rows(0) {} EIGEN_DEVICE_FUNC DenseStorage(const DenseStorage& other) : m_data(internal::constructor_without_unaligned_array_assert()), m_rows(other.m_rows) { internal::plain_array_helper::copy(other.m_data, m_rows * _Cols, m_data); } EIGEN_DEVICE_FUNC DenseStorage& operator=(const DenseStorage& other) { if (this != &other) { m_rows = other.m_rows; internal::plain_array_helper::copy(other.m_data, m_rows * _Cols, m_data); } return *this; } EIGEN_DEVICE_FUNC DenseStorage(Index, Index rows, Index) : m_rows(rows) {} EIGEN_DEVICE_FUNC void swap(DenseStorage& other) { internal::plain_array_helper::swap(m_data, m_rows * _Cols, other.m_data, other.m_rows * _Cols); numext::swap(m_rows, other.m_rows); } EIGEN_DEVICE_FUNC Index rows(void) const EIGEN_NOEXCEPT {return m_rows;} EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index cols(void) const EIGEN_NOEXCEPT {return _Cols;} EIGEN_DEVICE_FUNC void conservativeResize(Index, Index rows, Index) { m_rows = rows; } EIGEN_DEVICE_FUNC void resize(Index, Index rows, Index) { m_rows = rows; } EIGEN_DEVICE_FUNC const T *data() const { return m_data.array; } EIGEN_DEVICE_FUNC T *data() { return m_data.array; } }; // dynamic-size matrix with fixed-size storage and fixed height template class DenseStorage { internal::plain_array m_data; Index m_cols; public: EIGEN_DEVICE_FUNC DenseStorage() : m_cols(0) {} EIGEN_DEVICE_FUNC explicit DenseStorage(internal::constructor_without_unaligned_array_assert) : m_data(internal::constructor_without_unaligned_array_assert()), m_cols(0) {} EIGEN_DEVICE_FUNC DenseStorage(const DenseStorage& other) : m_data(internal::constructor_without_unaligned_array_assert()), m_cols(other.m_cols) { internal::plain_array_helper::copy(other.m_data, _Rows * m_cols, m_data); } EIGEN_DEVICE_FUNC DenseStorage& operator=(const DenseStorage& other) { if (this != &other) { m_cols = other.m_cols; internal::plain_array_helper::copy(other.m_data, _Rows * m_cols, m_data); } return *this; } EIGEN_DEVICE_FUNC DenseStorage(Index, Index, Index cols) : m_cols(cols) {} EIGEN_DEVICE_FUNC void swap(DenseStorage& other) { internal::plain_array_helper::swap(m_data, _Rows * m_cols, other.m_data, _Rows * other.m_cols); numext::swap(m_cols, other.m_cols); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index rows(void) const EIGEN_NOEXCEPT {return _Rows;} EIGEN_DEVICE_FUNC Index cols(void) const EIGEN_NOEXCEPT {return m_cols;} EIGEN_DEVICE_FUNC void conservativeResize(Index, Index, Index cols) { m_cols = cols; } EIGEN_DEVICE_FUNC void resize(Index, Index, Index cols) { m_cols = cols; } EIGEN_DEVICE_FUNC const T *data() const { return m_data.array; } EIGEN_DEVICE_FUNC T *data() { return m_data.array; } }; // purely dynamic matrix. template class DenseStorage { T *m_data; Index m_rows; Index m_cols; public: EIGEN_DEVICE_FUNC DenseStorage() : m_data(0), m_rows(0), m_cols(0) {} EIGEN_DEVICE_FUNC explicit DenseStorage(internal::constructor_without_unaligned_array_assert) : m_data(0), m_rows(0), m_cols(0) {} EIGEN_DEVICE_FUNC DenseStorage(Index size, Index rows, Index cols) : m_data(internal::conditional_aligned_new_auto(size)), m_rows(rows), m_cols(cols) { EIGEN_INTERNAL_DENSE_STORAGE_CTOR_PLUGIN({}) eigen_internal_assert(size==rows*cols && rows>=0 && cols >=0); } EIGEN_DEVICE_FUNC DenseStorage(const DenseStorage& other) : m_data(internal::conditional_aligned_new_auto(other.m_rows*other.m_cols)) , m_rows(other.m_rows) , m_cols(other.m_cols) { EIGEN_INTERNAL_DENSE_STORAGE_CTOR_PLUGIN(Index size = m_rows*m_cols) internal::smart_copy(other.m_data, other.m_data+other.m_rows*other.m_cols, m_data); } EIGEN_DEVICE_FUNC DenseStorage& operator=(const DenseStorage& other) { if (this != &other) { DenseStorage tmp(other); this->swap(tmp); } return *this; } #if EIGEN_HAS_RVALUE_REFERENCES EIGEN_DEVICE_FUNC DenseStorage(DenseStorage&& other) EIGEN_NOEXCEPT : m_data(std::move(other.m_data)) , m_rows(std::move(other.m_rows)) , m_cols(std::move(other.m_cols)) { other.m_data = nullptr; other.m_rows = 0; other.m_cols = 0; } EIGEN_DEVICE_FUNC DenseStorage& operator=(DenseStorage&& other) EIGEN_NOEXCEPT { numext::swap(m_data, other.m_data); numext::swap(m_rows, other.m_rows); numext::swap(m_cols, other.m_cols); return *this; } #endif EIGEN_DEVICE_FUNC ~DenseStorage() { internal::conditional_aligned_delete_auto(m_data, m_rows*m_cols); } EIGEN_DEVICE_FUNC void swap(DenseStorage& other) { numext::swap(m_data,other.m_data); numext::swap(m_rows,other.m_rows); numext::swap(m_cols,other.m_cols); } EIGEN_DEVICE_FUNC Index rows(void) const EIGEN_NOEXCEPT {return m_rows;} EIGEN_DEVICE_FUNC Index cols(void) const EIGEN_NOEXCEPT {return m_cols;} void conservativeResize(Index size, Index rows, Index cols) { m_data = internal::conditional_aligned_realloc_new_auto(m_data, size, m_rows*m_cols); m_rows = rows; m_cols = cols; } EIGEN_DEVICE_FUNC void resize(Index size, Index rows, Index cols) { if(size != m_rows*m_cols) { internal::conditional_aligned_delete_auto(m_data, m_rows*m_cols); if (size>0) // >0 and not simply !=0 to let the compiler knows that size cannot be negative m_data = internal::conditional_aligned_new_auto(size); else m_data = 0; EIGEN_INTERNAL_DENSE_STORAGE_CTOR_PLUGIN({}) } m_rows = rows; m_cols = cols; } EIGEN_DEVICE_FUNC const T *data() const { return m_data; } EIGEN_DEVICE_FUNC T *data() { return m_data; } }; // matrix with dynamic width and fixed height (so that matrix has dynamic size). template class DenseStorage { T *m_data; Index m_cols; public: EIGEN_DEVICE_FUNC DenseStorage() : m_data(0), m_cols(0) {} explicit DenseStorage(internal::constructor_without_unaligned_array_assert) : m_data(0), m_cols(0) {} EIGEN_DEVICE_FUNC DenseStorage(Index size, Index rows, Index cols) : m_data(internal::conditional_aligned_new_auto(size)), m_cols(cols) { EIGEN_INTERNAL_DENSE_STORAGE_CTOR_PLUGIN({}) eigen_internal_assert(size==rows*cols && rows==_Rows && cols >=0); EIGEN_UNUSED_VARIABLE(rows); } EIGEN_DEVICE_FUNC DenseStorage(const DenseStorage& other) : m_data(internal::conditional_aligned_new_auto(_Rows*other.m_cols)) , m_cols(other.m_cols) { EIGEN_INTERNAL_DENSE_STORAGE_CTOR_PLUGIN(Index size = m_cols*_Rows) internal::smart_copy(other.m_data, other.m_data+_Rows*m_cols, m_data); } EIGEN_DEVICE_FUNC DenseStorage& operator=(const DenseStorage& other) { if (this != &other) { DenseStorage tmp(other); this->swap(tmp); } return *this; } #if EIGEN_HAS_RVALUE_REFERENCES EIGEN_DEVICE_FUNC DenseStorage(DenseStorage&& other) EIGEN_NOEXCEPT : m_data(std::move(other.m_data)) , m_cols(std::move(other.m_cols)) { other.m_data = nullptr; other.m_cols = 0; } EIGEN_DEVICE_FUNC DenseStorage& operator=(DenseStorage&& other) EIGEN_NOEXCEPT { numext::swap(m_data, other.m_data); numext::swap(m_cols, other.m_cols); return *this; } #endif EIGEN_DEVICE_FUNC ~DenseStorage() { internal::conditional_aligned_delete_auto(m_data, _Rows*m_cols); } EIGEN_DEVICE_FUNC void swap(DenseStorage& other) { numext::swap(m_data,other.m_data); numext::swap(m_cols,other.m_cols); } EIGEN_DEVICE_FUNC static EIGEN_CONSTEXPR Index rows(void) EIGEN_NOEXCEPT {return _Rows;} EIGEN_DEVICE_FUNC Index cols(void) const EIGEN_NOEXCEPT {return m_cols;} EIGEN_DEVICE_FUNC void conservativeResize(Index size, Index, Index cols) { m_data = internal::conditional_aligned_realloc_new_auto(m_data, size, _Rows*m_cols); m_cols = cols; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void resize(Index size, Index, Index cols) { if(size != _Rows*m_cols) { internal::conditional_aligned_delete_auto(m_data, _Rows*m_cols); if (size>0) // >0 and not simply !=0 to let the compiler knows that size cannot be negative m_data = internal::conditional_aligned_new_auto(size); else m_data = 0; EIGEN_INTERNAL_DENSE_STORAGE_CTOR_PLUGIN({}) } m_cols = cols; } EIGEN_DEVICE_FUNC const T *data() const { return m_data; } EIGEN_DEVICE_FUNC T *data() { return m_data; } }; // matrix with dynamic height and fixed width (so that matrix has dynamic size). template class DenseStorage { T *m_data; Index m_rows; public: EIGEN_DEVICE_FUNC DenseStorage() : m_data(0), m_rows(0) {} explicit DenseStorage(internal::constructor_without_unaligned_array_assert) : m_data(0), m_rows(0) {} EIGEN_DEVICE_FUNC DenseStorage(Index size, Index rows, Index cols) : m_data(internal::conditional_aligned_new_auto(size)), m_rows(rows) { EIGEN_INTERNAL_DENSE_STORAGE_CTOR_PLUGIN({}) eigen_internal_assert(size==rows*cols && rows>=0 && cols == _Cols); EIGEN_UNUSED_VARIABLE(cols); } EIGEN_DEVICE_FUNC DenseStorage(const DenseStorage& other) : m_data(internal::conditional_aligned_new_auto(other.m_rows*_Cols)) , m_rows(other.m_rows) { EIGEN_INTERNAL_DENSE_STORAGE_CTOR_PLUGIN(Index size = m_rows*_Cols) internal::smart_copy(other.m_data, other.m_data+other.m_rows*_Cols, m_data); } EIGEN_DEVICE_FUNC DenseStorage& operator=(const DenseStorage& other) { if (this != &other) { DenseStorage tmp(other); this->swap(tmp); } return *this; } #if EIGEN_HAS_RVALUE_REFERENCES EIGEN_DEVICE_FUNC DenseStorage(DenseStorage&& other) EIGEN_NOEXCEPT : m_data(std::move(other.m_data)) , m_rows(std::move(other.m_rows)) { other.m_data = nullptr; other.m_rows = 0; } EIGEN_DEVICE_FUNC DenseStorage& operator=(DenseStorage&& other) EIGEN_NOEXCEPT { numext::swap(m_data, other.m_data); numext::swap(m_rows, other.m_rows); return *this; } #endif EIGEN_DEVICE_FUNC ~DenseStorage() { internal::conditional_aligned_delete_auto(m_data, _Cols*m_rows); } EIGEN_DEVICE_FUNC void swap(DenseStorage& other) { numext::swap(m_data,other.m_data); numext::swap(m_rows,other.m_rows); } EIGEN_DEVICE_FUNC Index rows(void) const EIGEN_NOEXCEPT {return m_rows;} EIGEN_DEVICE_FUNC static EIGEN_CONSTEXPR Index cols(void) {return _Cols;} void conservativeResize(Index size, Index rows, Index) { m_data = internal::conditional_aligned_realloc_new_auto(m_data, size, m_rows*_Cols); m_rows = rows; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void resize(Index size, Index rows, Index) { if(size != m_rows*_Cols) { internal::conditional_aligned_delete_auto(m_data, _Cols*m_rows); if (size>0) // >0 and not simply !=0 to let the compiler knows that size cannot be negative m_data = internal::conditional_aligned_new_auto(size); else m_data = 0; EIGEN_INTERNAL_DENSE_STORAGE_CTOR_PLUGIN({}) } m_rows = rows; } EIGEN_DEVICE_FUNC const T *data() const { return m_data; } EIGEN_DEVICE_FUNC T *data() { return m_data; } }; } // end namespace Eigen #endif // EIGEN_MATRIX_H RcppEigen/inst/include/Eigen/src/Core/SelfCwiseBinaryOp.h0000644000176200001440000000324114107270226022736 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009-2010 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SELFCWISEBINARYOP_H #define EIGEN_SELFCWISEBINARYOP_H namespace Eigen { // TODO generalize the scalar type of 'other' template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& DenseBase::operator*=(const Scalar& other) { internal::call_assignment(this->derived(), PlainObject::Constant(rows(),cols(),other), internal::mul_assign_op()); return derived(); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& ArrayBase::operator+=(const Scalar& other) { internal::call_assignment(this->derived(), PlainObject::Constant(rows(),cols(),other), internal::add_assign_op()); return derived(); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& ArrayBase::operator-=(const Scalar& other) { internal::call_assignment(this->derived(), PlainObject::Constant(rows(),cols(),other), internal::sub_assign_op()); return derived(); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& DenseBase::operator/=(const Scalar& other) { internal::call_assignment(this->derived(), PlainObject::Constant(rows(),cols(),other), internal::div_assign_op()); return derived(); } } // end namespace Eigen #endif // EIGEN_SELFCWISEBINARYOP_H RcppEigen/inst/include/Eigen/src/Core/CwiseBinaryOp.h0000644000176200001440000001734514562417221022141 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2014 Gael Guennebaud // Copyright (C) 2006-2008 Benoit Jacob // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CWISE_BINARY_OP_H #define EIGEN_CWISE_BINARY_OP_H namespace Eigen { namespace internal { template struct traits > { // we must not inherit from traits since it has // the potential to cause problems with MSVC typedef typename remove_all::type Ancestor; typedef typename traits::XprKind XprKind; enum { RowsAtCompileTime = traits::RowsAtCompileTime, ColsAtCompileTime = traits::ColsAtCompileTime, MaxRowsAtCompileTime = traits::MaxRowsAtCompileTime, MaxColsAtCompileTime = traits::MaxColsAtCompileTime }; // even though we require Lhs and Rhs to have the same scalar type (see CwiseBinaryOp constructor), // we still want to handle the case when the result type is different. typedef typename result_of< BinaryOp( const typename Lhs::Scalar&, const typename Rhs::Scalar& ) >::type Scalar; typedef typename cwise_promote_storage_type::StorageKind, typename traits::StorageKind, BinaryOp>::ret StorageKind; typedef typename promote_index_type::StorageIndex, typename traits::StorageIndex>::type StorageIndex; typedef typename Lhs::Nested LhsNested; typedef typename Rhs::Nested RhsNested; typedef typename remove_reference::type _LhsNested; typedef typename remove_reference::type _RhsNested; enum { Flags = cwise_promote_storage_order::StorageKind,typename traits::StorageKind,_LhsNested::Flags & RowMajorBit,_RhsNested::Flags & RowMajorBit>::value }; }; } // end namespace internal template class CwiseBinaryOpImpl; /** \class CwiseBinaryOp * \ingroup Core_Module * * \brief Generic expression where a coefficient-wise binary operator is applied to two expressions * * \tparam BinaryOp template functor implementing the operator * \tparam LhsType the type of the left-hand side * \tparam RhsType the type of the right-hand side * * This class represents an expression where a coefficient-wise binary operator is applied to two expressions. * It is the return type of binary operators, by which we mean only those binary operators where * both the left-hand side and the right-hand side are Eigen expressions. * For example, the return type of matrix1+matrix2 is a CwiseBinaryOp. * * Most of the time, this is the only way that it is used, so you typically don't have to name * CwiseBinaryOp types explicitly. * * \sa MatrixBase::binaryExpr(const MatrixBase &,const CustomBinaryOp &) const, class CwiseUnaryOp, class CwiseNullaryOp */ template class CwiseBinaryOp : public CwiseBinaryOpImpl< BinaryOp, LhsType, RhsType, typename internal::cwise_promote_storage_type::StorageKind, typename internal::traits::StorageKind, BinaryOp>::ret>, internal::no_assignment_operator { public: typedef typename internal::remove_all::type Functor; typedef typename internal::remove_all::type Lhs; typedef typename internal::remove_all::type Rhs; typedef typename CwiseBinaryOpImpl< BinaryOp, LhsType, RhsType, typename internal::cwise_promote_storage_type::StorageKind, typename internal::traits::StorageKind, BinaryOp>::ret>::Base Base; EIGEN_GENERIC_PUBLIC_INTERFACE(CwiseBinaryOp) typedef typename internal::ref_selector::type LhsNested; typedef typename internal::ref_selector::type RhsNested; typedef typename internal::remove_reference::type _LhsNested; typedef typename internal::remove_reference::type _RhsNested; #if EIGEN_COMP_MSVC && EIGEN_HAS_CXX11 //Required for Visual Studio or the Copy constructor will probably not get inlined! EIGEN_STRONG_INLINE CwiseBinaryOp(const CwiseBinaryOp&) = default; #endif EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CwiseBinaryOp(const Lhs& aLhs, const Rhs& aRhs, const BinaryOp& func = BinaryOp()) : m_lhs(aLhs), m_rhs(aRhs), m_functor(func) { EIGEN_CHECK_BINARY_COMPATIBILIY(BinaryOp,typename Lhs::Scalar,typename Rhs::Scalar); // require the sizes to match EIGEN_STATIC_ASSERT_SAME_MATRIX_SIZE(Lhs, Rhs) eigen_assert(aLhs.rows() == aRhs.rows() && aLhs.cols() == aRhs.cols()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR Index rows() const EIGEN_NOEXCEPT { // return the fixed size type if available to enable compile time optimizations return internal::traits::type>::RowsAtCompileTime==Dynamic ? m_rhs.rows() : m_lhs.rows(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR Index cols() const EIGEN_NOEXCEPT { // return the fixed size type if available to enable compile time optimizations return internal::traits::type>::ColsAtCompileTime==Dynamic ? m_rhs.cols() : m_lhs.cols(); } /** \returns the left hand side nested expression */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const _LhsNested& lhs() const { return m_lhs; } /** \returns the right hand side nested expression */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const _RhsNested& rhs() const { return m_rhs; } /** \returns the functor representing the binary operation */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const BinaryOp& functor() const { return m_functor; } protected: LhsNested m_lhs; RhsNested m_rhs; const BinaryOp m_functor; }; // Generic API dispatcher template class CwiseBinaryOpImpl : public internal::generic_xpr_base >::type { public: typedef typename internal::generic_xpr_base >::type Base; }; /** replaces \c *this by \c *this - \a other. * * \returns a reference to \c *this */ template template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived & MatrixBase::operator-=(const MatrixBase &other) { call_assignment(derived(), other.derived(), internal::sub_assign_op()); return derived(); } /** replaces \c *this by \c *this + \a other. * * \returns a reference to \c *this */ template template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived & MatrixBase::operator+=(const MatrixBase& other) { call_assignment(derived(), other.derived(), internal::add_assign_op()); return derived(); } } // end namespace Eigen #endif // EIGEN_CWISE_BINARY_OP_H RcppEigen/inst/include/Eigen/src/Core/Stride.h0000644000176200001440000001016414562417221020645 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2010 Benoit Jacob // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_STRIDE_H #define EIGEN_STRIDE_H namespace Eigen { /** \class Stride * \ingroup Core_Module * * \brief Holds strides information for Map * * This class holds the strides information for mapping arrays with strides with class Map. * * It holds two values: the inner stride and the outer stride. * * The inner stride is the pointer increment between two consecutive entries within a given row of a * row-major matrix or within a given column of a column-major matrix. * * The outer stride is the pointer increment between two consecutive rows of a row-major matrix or * between two consecutive columns of a column-major matrix. * * These two values can be passed either at compile-time as template parameters, or at runtime as * arguments to the constructor. * * Indeed, this class takes two template parameters: * \tparam _OuterStrideAtCompileTime the outer stride, or Dynamic if you want to specify it at runtime. * \tparam _InnerStrideAtCompileTime the inner stride, or Dynamic if you want to specify it at runtime. * * Here is an example: * \include Map_general_stride.cpp * Output: \verbinclude Map_general_stride.out * * Both strides can be negative, however, a negative stride of -1 cannot be specified at compiletime * because of the ambiguity with Dynamic which is defined to -1 (historically, negative strides were * not allowed). * * \sa class InnerStride, class OuterStride, \ref TopicStorageOrders */ template class Stride { public: typedef Eigen::Index Index; ///< \deprecated since Eigen 3.3 enum { InnerStrideAtCompileTime = _InnerStrideAtCompileTime, OuterStrideAtCompileTime = _OuterStrideAtCompileTime }; /** Default constructor, for use when strides are fixed at compile time */ EIGEN_DEVICE_FUNC Stride() : m_outer(OuterStrideAtCompileTime), m_inner(InnerStrideAtCompileTime) { // FIXME: for Eigen 4 we should use DynamicIndex instead of Dynamic. // FIXME: for Eigen 4 we should also unify this API with fix<> eigen_assert(InnerStrideAtCompileTime != Dynamic && OuterStrideAtCompileTime != Dynamic); } /** Constructor allowing to pass the strides at runtime */ EIGEN_DEVICE_FUNC Stride(Index outerStride, Index innerStride) : m_outer(outerStride), m_inner(innerStride) { } /** Copy constructor */ EIGEN_DEVICE_FUNC Stride(const Stride& other) : m_outer(other.outer()), m_inner(other.inner()) {} /** \returns the outer stride */ EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index outer() const { return m_outer.value(); } /** \returns the inner stride */ EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index inner() const { return m_inner.value(); } protected: internal::variable_if_dynamic m_outer; internal::variable_if_dynamic m_inner; }; /** \brief Convenience specialization of Stride to specify only an inner stride * See class Map for some examples */ template class InnerStride : public Stride<0, Value> { typedef Stride<0, Value> Base; public: EIGEN_DEVICE_FUNC InnerStride() : Base() {} EIGEN_DEVICE_FUNC InnerStride(Index v) : Base(0, v) {} // FIXME making this explicit could break valid code }; /** \brief Convenience specialization of Stride to specify only an outer stride * See class Map for some examples */ template class OuterStride : public Stride { typedef Stride Base; public: EIGEN_DEVICE_FUNC OuterStride() : Base() {} EIGEN_DEVICE_FUNC OuterStride(Index v) : Base(v,0) {} // FIXME making this explicit could break valid code }; } // end namespace Eigen #endif // EIGEN_STRIDE_H RcppEigen/inst/include/Eigen/src/Core/Transpositions.h0000644000176200001440000003237714562417221022464 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2010-2011 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_TRANSPOSITIONS_H #define EIGEN_TRANSPOSITIONS_H namespace Eigen { template class TranspositionsBase { typedef internal::traits Traits; public: typedef typename Traits::IndicesType IndicesType; typedef typename IndicesType::Scalar StorageIndex; typedef Eigen::Index Index; ///< \deprecated since Eigen 3.3 EIGEN_DEVICE_FUNC Derived& derived() { return *static_cast(this); } EIGEN_DEVICE_FUNC const Derived& derived() const { return *static_cast(this); } /** Copies the \a other transpositions into \c *this */ template Derived& operator=(const TranspositionsBase& other) { indices() = other.indices(); return derived(); } /** \returns the number of transpositions */ EIGEN_DEVICE_FUNC Index size() const { return indices().size(); } /** \returns the number of rows of the equivalent permutation matrix */ EIGEN_DEVICE_FUNC Index rows() const { return indices().size(); } /** \returns the number of columns of the equivalent permutation matrix */ EIGEN_DEVICE_FUNC Index cols() const { return indices().size(); } /** Direct access to the underlying index vector */ EIGEN_DEVICE_FUNC inline const StorageIndex& coeff(Index i) const { return indices().coeff(i); } /** Direct access to the underlying index vector */ inline StorageIndex& coeffRef(Index i) { return indices().coeffRef(i); } /** Direct access to the underlying index vector */ inline const StorageIndex& operator()(Index i) const { return indices()(i); } /** Direct access to the underlying index vector */ inline StorageIndex& operator()(Index i) { return indices()(i); } /** Direct access to the underlying index vector */ inline const StorageIndex& operator[](Index i) const { return indices()(i); } /** Direct access to the underlying index vector */ inline StorageIndex& operator[](Index i) { return indices()(i); } /** const version of indices(). */ EIGEN_DEVICE_FUNC const IndicesType& indices() const { return derived().indices(); } /** \returns a reference to the stored array representing the transpositions. */ EIGEN_DEVICE_FUNC IndicesType& indices() { return derived().indices(); } /** Resizes to given size. */ inline void resize(Index newSize) { indices().resize(newSize); } /** Sets \c *this to represents an identity transformation */ void setIdentity() { for(StorageIndex i = 0; i < indices().size(); ++i) coeffRef(i) = i; } // FIXME: do we want such methods ? // might be useful when the target matrix expression is complex, e.g.: // object.matrix().block(..,..,..,..) = trans * object.matrix().block(..,..,..,..); /* template void applyForwardToRows(MatrixType& mat) const { for(Index k=0 ; k void applyBackwardToRows(MatrixType& mat) const { for(Index k=size()-1 ; k>=0 ; --k) if(m_indices(k)!=k) mat.row(k).swap(mat.row(m_indices(k))); } */ /** \returns the inverse transformation */ inline Transpose inverse() const { return Transpose(derived()); } /** \returns the tranpose transformation */ inline Transpose transpose() const { return Transpose(derived()); } protected: }; namespace internal { template struct traits > : traits > { typedef Matrix<_StorageIndex, SizeAtCompileTime, 1, 0, MaxSizeAtCompileTime, 1> IndicesType; typedef TranspositionsStorage StorageKind; }; } /** \class Transpositions * \ingroup Core_Module * * \brief Represents a sequence of transpositions (row/column interchange) * * \tparam SizeAtCompileTime the number of transpositions, or Dynamic * \tparam MaxSizeAtCompileTime the maximum number of transpositions, or Dynamic. This optional parameter defaults to SizeAtCompileTime. Most of the time, you should not have to specify it. * * This class represents a permutation transformation as a sequence of \em n transpositions * \f$[T_{n-1} \ldots T_{i} \ldots T_{0}]\f$. It is internally stored as a vector of integers \c indices. * Each transposition \f$ T_{i} \f$ applied on the left of a matrix (\f$ T_{i} M\f$) interchanges * the rows \c i and \c indices[i] of the matrix \c M. * A transposition applied on the right (e.g., \f$ M T_{i}\f$) yields a column interchange. * * Compared to the class PermutationMatrix, such a sequence of transpositions is what is * computed during a decomposition with pivoting, and it is faster when applying the permutation in-place. * * To apply a sequence of transpositions to a matrix, simply use the operator * as in the following example: * \code * Transpositions tr; * MatrixXf mat; * mat = tr * mat; * \endcode * In this example, we detect that the matrix appears on both side, and so the transpositions * are applied in-place without any temporary or extra copy. * * \sa class PermutationMatrix */ template class Transpositions : public TranspositionsBase > { typedef internal::traits Traits; public: typedef TranspositionsBase Base; typedef typename Traits::IndicesType IndicesType; typedef typename IndicesType::Scalar StorageIndex; inline Transpositions() {} /** Copy constructor. */ template inline Transpositions(const TranspositionsBase& other) : m_indices(other.indices()) {} /** Generic constructor from expression of the transposition indices. */ template explicit inline Transpositions(const MatrixBase& indices) : m_indices(indices) {} /** Copies the \a other transpositions into \c *this */ template Transpositions& operator=(const TranspositionsBase& other) { return Base::operator=(other); } /** Constructs an uninitialized permutation matrix of given size. */ inline Transpositions(Index size) : m_indices(size) {} /** const version of indices(). */ EIGEN_DEVICE_FUNC const IndicesType& indices() const { return m_indices; } /** \returns a reference to the stored array representing the transpositions. */ EIGEN_DEVICE_FUNC IndicesType& indices() { return m_indices; } protected: IndicesType m_indices; }; namespace internal { template struct traits,_PacketAccess> > : traits > { typedef Map, _PacketAccess> IndicesType; typedef _StorageIndex StorageIndex; typedef TranspositionsStorage StorageKind; }; } template class Map,PacketAccess> : public TranspositionsBase,PacketAccess> > { typedef internal::traits Traits; public: typedef TranspositionsBase Base; typedef typename Traits::IndicesType IndicesType; typedef typename IndicesType::Scalar StorageIndex; explicit inline Map(const StorageIndex* indicesPtr) : m_indices(indicesPtr) {} inline Map(const StorageIndex* indicesPtr, Index size) : m_indices(indicesPtr,size) {} /** Copies the \a other transpositions into \c *this */ template Map& operator=(const TranspositionsBase& other) { return Base::operator=(other); } #ifndef EIGEN_PARSED_BY_DOXYGEN /** This is a special case of the templated operator=. Its purpose is to * prevent a default operator= from hiding the templated operator=. */ Map& operator=(const Map& other) { m_indices = other.m_indices; return *this; } #endif /** const version of indices(). */ EIGEN_DEVICE_FUNC const IndicesType& indices() const { return m_indices; } /** \returns a reference to the stored array representing the transpositions. */ EIGEN_DEVICE_FUNC IndicesType& indices() { return m_indices; } protected: IndicesType m_indices; }; namespace internal { template struct traits > : traits > { typedef TranspositionsStorage StorageKind; }; } template class TranspositionsWrapper : public TranspositionsBase > { typedef internal::traits Traits; public: typedef TranspositionsBase Base; typedef typename Traits::IndicesType IndicesType; typedef typename IndicesType::Scalar StorageIndex; explicit inline TranspositionsWrapper(IndicesType& indices) : m_indices(indices) {} /** Copies the \a other transpositions into \c *this */ template TranspositionsWrapper& operator=(const TranspositionsBase& other) { return Base::operator=(other); } /** const version of indices(). */ EIGEN_DEVICE_FUNC const IndicesType& indices() const { return m_indices; } /** \returns a reference to the stored array representing the transpositions. */ EIGEN_DEVICE_FUNC IndicesType& indices() { return m_indices; } protected: typename IndicesType::Nested m_indices; }; /** \returns the \a matrix with the \a transpositions applied to the columns. */ template EIGEN_DEVICE_FUNC const Product operator*(const MatrixBase &matrix, const TranspositionsBase& transpositions) { return Product (matrix.derived(), transpositions.derived()); } /** \returns the \a matrix with the \a transpositions applied to the rows. */ template EIGEN_DEVICE_FUNC const Product operator*(const TranspositionsBase &transpositions, const MatrixBase& matrix) { return Product (transpositions.derived(), matrix.derived()); } // Template partial specialization for transposed/inverse transpositions namespace internal { template struct traits > > : traits {}; } // end namespace internal template class Transpose > { typedef TranspositionsDerived TranspositionType; typedef typename TranspositionType::IndicesType IndicesType; public: explicit Transpose(const TranspositionType& t) : m_transpositions(t) {} EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index size() const EIGEN_NOEXCEPT { return m_transpositions.size(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index rows() const EIGEN_NOEXCEPT { return m_transpositions.size(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index cols() const EIGEN_NOEXCEPT { return m_transpositions.size(); } /** \returns the \a matrix with the inverse transpositions applied to the columns. */ template friend const Product operator*(const MatrixBase& matrix, const Transpose& trt) { return Product(matrix.derived(), trt); } /** \returns the \a matrix with the inverse transpositions applied to the rows. */ template const Product operator*(const MatrixBase& matrix) const { return Product(*this, matrix.derived()); } EIGEN_DEVICE_FUNC const TranspositionType& nestedExpression() const { return m_transpositions; } protected: const TranspositionType& m_transpositions; }; } // end namespace Eigen #endif // EIGEN_TRANSPOSITIONS_H RcppEigen/inst/include/Eigen/src/Core/VectorBlock.h0000644000176200001440000000664014562417221021634 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2010 Gael Guennebaud // Copyright (C) 2006-2008 Benoit Jacob // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_VECTORBLOCK_H #define EIGEN_VECTORBLOCK_H namespace Eigen { namespace internal { template struct traits > : public traits::Flags & RowMajorBit ? 1 : Size, traits::Flags & RowMajorBit ? Size : 1> > { }; } /** \class VectorBlock * \ingroup Core_Module * * \brief Expression of a fixed-size or dynamic-size sub-vector * * \tparam VectorType the type of the object in which we are taking a sub-vector * \tparam Size size of the sub-vector we are taking at compile time (optional) * * This class represents an expression of either a fixed-size or dynamic-size sub-vector. * It is the return type of DenseBase::segment(Index,Index) and DenseBase::segment(Index) and * most of the time this is the only way it is used. * * However, if you want to directly manipulate sub-vector expressions, * for instance if you want to write a function returning such an expression, you * will need to use this class. * * Here is an example illustrating the dynamic case: * \include class_VectorBlock.cpp * Output: \verbinclude class_VectorBlock.out * * \note Even though this expression has dynamic size, in the case where \a VectorType * has fixed size, this expression inherits a fixed maximal size which means that evaluating * it does not cause a dynamic memory allocation. * * Here is an example illustrating the fixed-size case: * \include class_FixedVectorBlock.cpp * Output: \verbinclude class_FixedVectorBlock.out * * \sa class Block, DenseBase::segment(Index,Index,Index,Index), DenseBase::segment(Index,Index) */ template class VectorBlock : public Block::Flags & RowMajorBit ? 1 : Size, internal::traits::Flags & RowMajorBit ? Size : 1> { typedef Block::Flags & RowMajorBit ? 1 : Size, internal::traits::Flags & RowMajorBit ? Size : 1> Base; enum { IsColVector = !(internal::traits::Flags & RowMajorBit) }; public: EIGEN_DENSE_PUBLIC_INTERFACE(VectorBlock) using Base::operator=; /** Dynamic-size constructor */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE VectorBlock(VectorType& vector, Index start, Index size) : Base(vector, IsColVector ? start : 0, IsColVector ? 0 : start, IsColVector ? size : 1, IsColVector ? 1 : size) { EIGEN_STATIC_ASSERT_VECTOR_ONLY(VectorBlock); } /** Fixed-size constructor */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE VectorBlock(VectorType& vector, Index start) : Base(vector, IsColVector ? start : 0, IsColVector ? 0 : start) { EIGEN_STATIC_ASSERT_VECTOR_ONLY(VectorBlock); } }; } // end namespace Eigen #endif // EIGEN_VECTORBLOCK_H RcppEigen/inst/include/Eigen/src/Core/Product.h0000644000176200001440000001625014562417221021035 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2011 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_PRODUCT_H #define EIGEN_PRODUCT_H namespace Eigen { template class ProductImpl; namespace internal { template struct traits > { typedef typename remove_all::type LhsCleaned; typedef typename remove_all::type RhsCleaned; typedef traits LhsTraits; typedef traits RhsTraits; typedef MatrixXpr XprKind; typedef typename ScalarBinaryOpTraits::Scalar, typename traits::Scalar>::ReturnType Scalar; typedef typename product_promote_storage_type::ret>::ret StorageKind; typedef typename promote_index_type::type StorageIndex; enum { RowsAtCompileTime = LhsTraits::RowsAtCompileTime, ColsAtCompileTime = RhsTraits::ColsAtCompileTime, MaxRowsAtCompileTime = LhsTraits::MaxRowsAtCompileTime, MaxColsAtCompileTime = RhsTraits::MaxColsAtCompileTime, // FIXME: only needed by GeneralMatrixMatrixTriangular InnerSize = EIGEN_SIZE_MIN_PREFER_FIXED(LhsTraits::ColsAtCompileTime, RhsTraits::RowsAtCompileTime), // The storage order is somewhat arbitrary here. The correct one will be determined through the evaluator. Flags = (MaxRowsAtCompileTime==1 && MaxColsAtCompileTime!=1) ? RowMajorBit : (MaxColsAtCompileTime==1 && MaxRowsAtCompileTime!=1) ? 0 : ( ((LhsTraits::Flags&NoPreferredStorageOrderBit) && (RhsTraits::Flags&RowMajorBit)) || ((RhsTraits::Flags&NoPreferredStorageOrderBit) && (LhsTraits::Flags&RowMajorBit)) ) ? RowMajorBit : NoPreferredStorageOrderBit }; }; } // end namespace internal /** \class Product * \ingroup Core_Module * * \brief Expression of the product of two arbitrary matrices or vectors * * \tparam _Lhs the type of the left-hand side expression * \tparam _Rhs the type of the right-hand side expression * * This class represents an expression of the product of two arbitrary matrices. * * The other template parameters are: * \tparam Option can be DefaultProduct, AliasFreeProduct, or LazyProduct * */ template class Product : public ProductImpl<_Lhs,_Rhs,Option, typename internal::product_promote_storage_type::StorageKind, typename internal::traits<_Rhs>::StorageKind, internal::product_type<_Lhs,_Rhs>::ret>::ret> { public: typedef _Lhs Lhs; typedef _Rhs Rhs; typedef typename ProductImpl< Lhs, Rhs, Option, typename internal::product_promote_storage_type::StorageKind, typename internal::traits::StorageKind, internal::product_type::ret>::ret>::Base Base; EIGEN_GENERIC_PUBLIC_INTERFACE(Product) typedef typename internal::ref_selector::type LhsNested; typedef typename internal::ref_selector::type RhsNested; typedef typename internal::remove_all::type LhsNestedCleaned; typedef typename internal::remove_all::type RhsNestedCleaned; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Product(const Lhs& lhs, const Rhs& rhs) : m_lhs(lhs), m_rhs(rhs) { eigen_assert(lhs.cols() == rhs.rows() && "invalid matrix product" && "if you wanted a coeff-wise or a dot product use the respective explicit functions"); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR Index rows() const EIGEN_NOEXCEPT { return m_lhs.rows(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR Index cols() const EIGEN_NOEXCEPT { return m_rhs.cols(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const LhsNestedCleaned& lhs() const { return m_lhs; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const RhsNestedCleaned& rhs() const { return m_rhs; } protected: LhsNested m_lhs; RhsNested m_rhs; }; namespace internal { template::ret> class dense_product_base : public internal::dense_xpr_base >::type {}; /** Conversion to scalar for inner-products */ template class dense_product_base : public internal::dense_xpr_base >::type { typedef Product ProductXpr; typedef typename internal::dense_xpr_base::type Base; public: using Base::derived; typedef typename Base::Scalar Scalar; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE operator const Scalar() const { return internal::evaluator(derived()).coeff(0,0); } }; } // namespace internal // Generic API dispatcher template class ProductImpl : public internal::generic_xpr_base, MatrixXpr, StorageKind>::type { public: typedef typename internal::generic_xpr_base, MatrixXpr, StorageKind>::type Base; }; template class ProductImpl : public internal::dense_product_base { typedef Product Derived; public: typedef typename internal::dense_product_base Base; EIGEN_DENSE_PUBLIC_INTERFACE(Derived) protected: enum { IsOneByOne = (RowsAtCompileTime == 1 || RowsAtCompileTime == Dynamic) && (ColsAtCompileTime == 1 || ColsAtCompileTime == Dynamic), EnableCoeff = IsOneByOne || Option==LazyProduct }; public: EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar coeff(Index row, Index col) const { EIGEN_STATIC_ASSERT(EnableCoeff, THIS_METHOD_IS_ONLY_FOR_INNER_OR_LAZY_PRODUCTS); eigen_assert( (Option==LazyProduct) || (this->rows() == 1 && this->cols() == 1) ); return internal::evaluator(derived()).coeff(row,col); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar coeff(Index i) const { EIGEN_STATIC_ASSERT(EnableCoeff, THIS_METHOD_IS_ONLY_FOR_INNER_OR_LAZY_PRODUCTS); eigen_assert( (Option==LazyProduct) || (this->rows() == 1 && this->cols() == 1) ); return internal::evaluator(derived()).coeff(i); } }; } // end namespace Eigen #endif // EIGEN_PRODUCT_H RcppEigen/inst/include/Eigen/src/Core/arch/0000755000176200001440000000000014562417221020155 5ustar liggesusersRcppEigen/inst/include/Eigen/src/Core/arch/ZVector/0000755000176200001440000000000014562417221021551 5ustar liggesusersRcppEigen/inst/include/Eigen/src/Core/arch/ZVector/MathFunctions.h0000644000176200001440000001753014562417221024512 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2007 Julien Pommier // Copyright (C) 2009 Gael Guennebaud // Copyright (C) 2016 Konstantinos Margaritis // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. /* The sin, cos, exp, and log functions of this file come from * Julien Pommier's sse math library: http://gruntthepeon.free.fr/ssemath/ */ #ifndef EIGEN_MATH_FUNCTIONS_ALTIVEC_H #define EIGEN_MATH_FUNCTIONS_ALTIVEC_H namespace Eigen { namespace internal { #if !defined(__ARCH__) || (defined(__ARCH__) && __ARCH__ >= 12) static _EIGEN_DECLARE_CONST_Packet4f(1 , 1.0f); static _EIGEN_DECLARE_CONST_Packet4f(half, 0.5f); static _EIGEN_DECLARE_CONST_Packet4i(0x7f, 0x7f); static _EIGEN_DECLARE_CONST_Packet4i(23, 23); static _EIGEN_DECLARE_CONST_Packet4f_FROM_INT(inv_mant_mask, ~0x7f800000); /* the smallest non denormalized float number */ static _EIGEN_DECLARE_CONST_Packet4f_FROM_INT(min_norm_pos, 0x00800000); static _EIGEN_DECLARE_CONST_Packet4f_FROM_INT(minus_inf, 0xff800000); // -1.f/0.f static _EIGEN_DECLARE_CONST_Packet4f_FROM_INT(minus_nan, 0xffffffff); /* natural logarithm computed for 4 simultaneous float return NaN for x <= 0 */ static _EIGEN_DECLARE_CONST_Packet4f(cephes_SQRTHF, 0.707106781186547524f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p0, 7.0376836292E-2f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p1, - 1.1514610310E-1f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p2, 1.1676998740E-1f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p3, - 1.2420140846E-1f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p4, + 1.4249322787E-1f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p5, - 1.6668057665E-1f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p6, + 2.0000714765E-1f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p7, - 2.4999993993E-1f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p8, + 3.3333331174E-1f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_log_q1, -2.12194440e-4f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_log_q2, 0.693359375f); static _EIGEN_DECLARE_CONST_Packet4f(exp_hi, 88.3762626647950f); static _EIGEN_DECLARE_CONST_Packet4f(exp_lo, -88.3762626647949f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_LOG2EF, 1.44269504088896341f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_C1, 0.693359375f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_C2, -2.12194440e-4f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_p0, 1.9875691500E-4f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_p1, 1.3981999507E-3f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_p2, 8.3334519073E-3f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_p3, 4.1665795894E-2f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_p4, 1.6666665459E-1f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_p5, 5.0000001201E-1f); #endif static _EIGEN_DECLARE_CONST_Packet2d(1 , 1.0); static _EIGEN_DECLARE_CONST_Packet2d(2 , 2.0); static _EIGEN_DECLARE_CONST_Packet2d(half, 0.5); static _EIGEN_DECLARE_CONST_Packet2d(exp_hi, 709.437); static _EIGEN_DECLARE_CONST_Packet2d(exp_lo, -709.436139303); static _EIGEN_DECLARE_CONST_Packet2d(cephes_LOG2EF, 1.4426950408889634073599); static _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_p0, 1.26177193074810590878e-4); static _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_p1, 3.02994407707441961300e-2); static _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_p2, 9.99999999999999999910e-1); static _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_q0, 3.00198505138664455042e-6); static _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_q1, 2.52448340349684104192e-3); static _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_q2, 2.27265548208155028766e-1); static _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_q3, 2.00000000000000000009e0); static _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_C1, 0.693145751953125); static _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_C2, 1.42860682030941723212e-6); template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet2d pexp(const Packet2d& _x) { Packet2d x = _x; Packet2d tmp, fx; Packet2l emm0; // clamp x x = pmax(pmin(x, p2d_exp_hi), p2d_exp_lo); /* express exp(x) as exp(g + n*log(2)) */ fx = pmadd(p2d_cephes_LOG2EF, x, p2d_half); fx = vec_floor(fx); tmp = pmul(fx, p2d_cephes_exp_C1); Packet2d z = pmul(fx, p2d_cephes_exp_C2); x = psub(x, tmp); x = psub(x, z); Packet2d x2 = pmul(x,x); Packet2d px = p2d_cephes_exp_p0; px = pmadd(px, x2, p2d_cephes_exp_p1); px = pmadd(px, x2, p2d_cephes_exp_p2); px = pmul (px, x); Packet2d qx = p2d_cephes_exp_q0; qx = pmadd(qx, x2, p2d_cephes_exp_q1); qx = pmadd(qx, x2, p2d_cephes_exp_q2); qx = pmadd(qx, x2, p2d_cephes_exp_q3); x = pdiv(px,psub(qx,px)); x = pmadd(p2d_2,x,p2d_1); // build 2^n emm0 = vec_ctsl(fx, 0); static const Packet2l p2l_1023 = { 1023, 1023 }; static const Packet2ul p2ul_52 = { 52, 52 }; emm0 = emm0 + p2l_1023; emm0 = emm0 << reinterpret_cast(p2ul_52); // Altivec's max & min operators just drop silent NaNs. Check NaNs in // inputs and return them unmodified. Packet2ul isnumber_mask = reinterpret_cast(vec_cmpeq(_x, _x)); return vec_sel(_x, pmax(pmul(x, reinterpret_cast(emm0)), _x), isnumber_mask); } template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f pexp(const Packet4f& _x) { #if !defined(__ARCH__) || (defined(__ARCH__) && __ARCH__ >= 12) Packet4f x = _x; Packet4f tmp, fx; Packet4i emm0; // clamp x x = pmax(pmin(x, p4f_exp_hi), p4f_exp_lo); // express exp(x) as exp(g + n*log(2)) fx = pmadd(x, p4f_cephes_LOG2EF, p4f_half); fx = pfloor(fx); tmp = pmul(fx, p4f_cephes_exp_C1); Packet4f z = pmul(fx, p4f_cephes_exp_C2); x = psub(x, tmp); x = psub(x, z); z = pmul(x,x); Packet4f y = p4f_cephes_exp_p0; y = pmadd(y, x, p4f_cephes_exp_p1); y = pmadd(y, x, p4f_cephes_exp_p2); y = pmadd(y, x, p4f_cephes_exp_p3); y = pmadd(y, x, p4f_cephes_exp_p4); y = pmadd(y, x, p4f_cephes_exp_p5); y = pmadd(y, z, x); y = padd(y, p4f_1); // build 2^n emm0 = (Packet4i){ (int)fx[0], (int)fx[1], (int)fx[2], (int)fx[3] }; emm0 = emm0 + p4i_0x7f; emm0 = emm0 << reinterpret_cast(p4i_23); return pmax(pmul(y, reinterpret_cast(emm0)), _x); #else Packet4f res; res.v4f[0] = pexp(_x.v4f[0]); res.v4f[1] = pexp(_x.v4f[1]); return res; #endif } template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet2d psqrt(const Packet2d& x) { return vec_sqrt(x); } template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f psqrt(const Packet4f& x) { Packet4f res; #if !defined(__ARCH__) || (defined(__ARCH__) && __ARCH__ >= 12) res = vec_sqrt(x); #else res.v4f[0] = psqrt(x.v4f[0]); res.v4f[1] = psqrt(x.v4f[1]); #endif return res; } template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet2d prsqrt(const Packet2d& x) { return pset1(1.0) / psqrt(x); } template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f prsqrt(const Packet4f& x) { Packet4f res; #if !defined(__ARCH__) || (defined(__ARCH__) && __ARCH__ >= 12) res = pset1(1.0) / psqrt(x); #else res.v4f[0] = prsqrt(x.v4f[0]); res.v4f[1] = prsqrt(x.v4f[1]); #endif return res; } // Hyperbolic Tangent function. template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f ptanh(const Packet4f& x) { return internal::generic_fast_tanh_float(x); } } // end namespace internal } // end namespace Eigen #endif // EIGEN_MATH_FUNCTIONS_ALTIVEC_H RcppEigen/inst/include/Eigen/src/Core/arch/ZVector/PacketMath.h0000755000176200001440000011003614562417221023747 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2016 Konstantinos Margaritis // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_PACKET_MATH_ZVECTOR_H #define EIGEN_PACKET_MATH_ZVECTOR_H namespace Eigen { namespace internal { #ifndef EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD #define EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD 16 #endif #ifndef EIGEN_HAS_SINGLE_INSTRUCTION_MADD #define EIGEN_HAS_SINGLE_INSTRUCTION_MADD #endif #ifndef EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS #define EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS 32 #endif typedef __vector int Packet4i; typedef __vector unsigned int Packet4ui; typedef __vector __bool int Packet4bi; typedef __vector short int Packet8i; typedef __vector unsigned char Packet16uc; typedef __vector double Packet2d; typedef __vector unsigned long long Packet2ul; typedef __vector long long Packet2l; // Z14 has builtin support for float vectors #if !defined(__ARCH__) || (defined(__ARCH__) && __ARCH__ >= 12) typedef __vector float Packet4f; #else typedef struct { Packet2d v4f[2]; } Packet4f; #endif typedef union { numext::int32_t i[4]; numext::uint32_t ui[4]; numext::int64_t l[2]; numext::uint64_t ul[2]; double d[2]; float f[4]; Packet4i v4i; Packet4ui v4ui; Packet2l v2l; Packet2ul v2ul; Packet2d v2d; #if !defined(__ARCH__) || (defined(__ARCH__) && __ARCH__ >= 12) Packet4f v4f; #endif } Packet; // We don't want to write the same code all the time, but we need to reuse the constants // and it doesn't really work to declare them global, so we define macros instead #define _EIGEN_DECLARE_CONST_FAST_Packet4i(NAME,X) \ Packet4i p4i_##NAME = reinterpret_cast(vec_splat_s32(X)) #define _EIGEN_DECLARE_CONST_FAST_Packet2d(NAME,X) \ Packet2d p2d_##NAME = reinterpret_cast(vec_splat_s64(X)) #define _EIGEN_DECLARE_CONST_FAST_Packet2l(NAME,X) \ Packet2l p2l_##NAME = reinterpret_cast(vec_splat_s64(X)) #define _EIGEN_DECLARE_CONST_Packet4i(NAME,X) \ Packet4i p4i_##NAME = pset1(X) #define _EIGEN_DECLARE_CONST_Packet2d(NAME,X) \ Packet2d p2d_##NAME = pset1(X) #define _EIGEN_DECLARE_CONST_Packet2l(NAME,X) \ Packet2l p2l_##NAME = pset1(X) // These constants are endian-agnostic static _EIGEN_DECLARE_CONST_FAST_Packet4i(ZERO, 0); //{ 0, 0, 0, 0,} static _EIGEN_DECLARE_CONST_FAST_Packet4i(ONE, 1); //{ 1, 1, 1, 1} static _EIGEN_DECLARE_CONST_FAST_Packet2d(ZERO, 0); static _EIGEN_DECLARE_CONST_FAST_Packet2l(ZERO, 0); static _EIGEN_DECLARE_CONST_FAST_Packet2l(ONE, 1); static Packet2d p2d_ONE = { 1.0, 1.0 }; static Packet2d p2d_ZERO_ = { numext::bit_cast0x8000000000000000ull), numext::bit_cast0x8000000000000000ull) }; #if !defined(__ARCH__) || (defined(__ARCH__) && __ARCH__ >= 12) #define _EIGEN_DECLARE_CONST_FAST_Packet4f(NAME,X) \ Packet4f p4f_##NAME = reinterpret_cast(vec_splat_s32(X)) #define _EIGEN_DECLARE_CONST_Packet4f(NAME,X) \ Packet4f p4f_##NAME = pset1(X) #define _EIGEN_DECLARE_CONST_Packet4f_FROM_INT(NAME,X) \ const Packet4f p4f_##NAME = reinterpret_cast(pset1(X)) static _EIGEN_DECLARE_CONST_FAST_Packet4f(ZERO, 0); //{ 0.0, 0.0, 0.0, 0.0} static _EIGEN_DECLARE_CONST_FAST_Packet4i(MINUS1,-1); //{ -1, -1, -1, -1} static Packet4f p4f_MZERO = { 0x80000000, 0x80000000, 0x80000000, 0x80000000}; #endif static Packet4i p4i_COUNTDOWN = { 0, 1, 2, 3 }; static Packet4f p4f_COUNTDOWN = { 0.0, 1.0, 2.0, 3.0 }; static Packet2d p2d_COUNTDOWN = reinterpret_cast(vec_sld(reinterpret_cast(p2d_ZERO), reinterpret_cast(p2d_ONE), 8)); static Packet16uc p16uc_PSET64_HI = { 0,1,2,3, 4,5,6,7, 0,1,2,3, 4,5,6,7 }; static Packet16uc p16uc_DUPLICATE32_HI = { 0,1,2,3, 0,1,2,3, 4,5,6,7, 4,5,6,7 }; // Mask alignment #define _EIGEN_MASK_ALIGNMENT 0xfffffffffffffff0 #define _EIGEN_ALIGNED_PTR(x) ((std::ptrdiff_t)(x) & _EIGEN_MASK_ALIGNMENT) // Handle endianness properly while loading constants // Define global static constants: static Packet16uc p16uc_FORWARD = { 0,1,2,3, 4,5,6,7, 8,9,10,11, 12,13,14,15 }; static Packet16uc p16uc_REVERSE32 = { 12,13,14,15, 8,9,10,11, 4,5,6,7, 0,1,2,3 }; static Packet16uc p16uc_REVERSE64 = { 8,9,10,11, 12,13,14,15, 0,1,2,3, 4,5,6,7 }; static Packet16uc p16uc_PSET32_WODD = vec_sld((Packet16uc) vec_splat((Packet4ui)p16uc_FORWARD, 0), (Packet16uc) vec_splat((Packet4ui)p16uc_FORWARD, 2), 8);//{ 0,1,2,3, 0,1,2,3, 8,9,10,11, 8,9,10,11 }; static Packet16uc p16uc_PSET32_WEVEN = vec_sld(p16uc_DUPLICATE32_HI, (Packet16uc) vec_splat((Packet4ui)p16uc_FORWARD, 3), 8);//{ 4,5,6,7, 4,5,6,7, 12,13,14,15, 12,13,14,15 }; /*static Packet16uc p16uc_HALF64_0_16 = vec_sld((Packet16uc)p4i_ZERO, vec_splat((Packet16uc) vec_abs(p4i_MINUS16), 3), 8); //{ 0,0,0,0, 0,0,0,0, 16,16,16,16, 16,16,16,16}; static Packet16uc p16uc_PSET64_HI = (Packet16uc) vec_mergeh((Packet4ui)p16uc_PSET32_WODD, (Packet4ui)p16uc_PSET32_WEVEN); //{ 0,1,2,3, 4,5,6,7, 0,1,2,3, 4,5,6,7 };*/ static Packet16uc p16uc_PSET64_LO = (Packet16uc) vec_mergel((Packet4ui)p16uc_PSET32_WODD, (Packet4ui)p16uc_PSET32_WEVEN); //{ 8,9,10,11, 12,13,14,15, 8,9,10,11, 12,13,14,15 }; /*static Packet16uc p16uc_TRANSPOSE64_HI = vec_add(p16uc_PSET64_HI, p16uc_HALF64_0_16); //{ 0,1,2,3, 4,5,6,7, 16,17,18,19, 20,21,22,23}; static Packet16uc p16uc_TRANSPOSE64_LO = vec_add(p16uc_PSET64_LO, p16uc_HALF64_0_16); //{ 8,9,10,11, 12,13,14,15, 24,25,26,27, 28,29,30,31};*/ static Packet16uc p16uc_TRANSPOSE64_HI = { 0,1,2,3, 4,5,6,7, 16,17,18,19, 20,21,22,23}; static Packet16uc p16uc_TRANSPOSE64_LO = { 8,9,10,11, 12,13,14,15, 24,25,26,27, 28,29,30,31}; static Packet16uc p16uc_COMPLEX32_REV = vec_sld(p16uc_REVERSE32, p16uc_REVERSE32, 8); //{ 4,5,6,7, 0,1,2,3, 12,13,14,15, 8,9,10,11 }; static Packet16uc p16uc_COMPLEX32_REV2 = vec_sld(p16uc_FORWARD, p16uc_FORWARD, 8); //{ 8,9,10,11, 12,13,14,15, 0,1,2,3, 4,5,6,7 }; #if EIGEN_HAS_BUILTIN(__builtin_prefetch) || EIGEN_COMP_GNUC #define EIGEN_ZVECTOR_PREFETCH(ADDR) __builtin_prefetch(ADDR); #else #define EIGEN_ZVECTOR_PREFETCH(ADDR) asm( " pfd [%[addr]]\n" :: [addr] "r" (ADDR) : "cc" ); #endif template<> struct packet_traits : default_packet_traits { typedef Packet4i type; typedef Packet4i half; enum { Vectorizable = 1, AlignedOnScalar = 1, size = 4, HasHalfPacket = 0, HasAdd = 1, HasSub = 1, HasMul = 1, HasDiv = 1, HasBlend = 1 }; }; template <> struct packet_traits : default_packet_traits { typedef Packet4f type; typedef Packet4f half; enum { Vectorizable = 1, AlignedOnScalar = 1, size = 4, HasHalfPacket = 0, HasAdd = 1, HasSub = 1, HasMul = 1, HasDiv = 1, HasMin = 1, HasMax = 1, HasAbs = 1, HasSin = 0, HasCos = 0, HasLog = 0, HasExp = 1, HasSqrt = 1, HasRsqrt = 1, HasTanh = 1, HasErf = 1, HasRound = 1, HasFloor = 1, HasCeil = 1, HasNegate = 1, HasBlend = 1 }; }; template<> struct packet_traits : default_packet_traits { typedef Packet2d type; typedef Packet2d half; enum { Vectorizable = 1, AlignedOnScalar = 1, size=2, HasHalfPacket = 1, HasAdd = 1, HasSub = 1, HasMul = 1, HasDiv = 1, HasMin = 1, HasMax = 1, HasAbs = 1, HasSin = 0, HasCos = 0, HasLog = 0, HasExp = 1, HasSqrt = 1, HasRsqrt = 1, HasRound = 1, HasFloor = 1, HasCeil = 1, HasNegate = 1, HasBlend = 1 }; }; template<> struct unpacket_traits { typedef int type; enum {size=4, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet4i half; }; template<> struct unpacket_traits { typedef float type; enum {size=4, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet4f half; }; template<> struct unpacket_traits { typedef double type; enum {size=2, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet2d half; }; /* Forward declaration */ EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel); inline std::ostream & operator <<(std::ostream & s, const Packet4i & v) { Packet vt; vt.v4i = v; s << vt.i[0] << ", " << vt.i[1] << ", " << vt.i[2] << ", " << vt.i[3]; return s; } inline std::ostream & operator <<(std::ostream & s, const Packet4ui & v) { Packet vt; vt.v4ui = v; s << vt.ui[0] << ", " << vt.ui[1] << ", " << vt.ui[2] << ", " << vt.ui[3]; return s; } inline std::ostream & operator <<(std::ostream & s, const Packet2l & v) { Packet vt; vt.v2l = v; s << vt.l[0] << ", " << vt.l[1]; return s; } inline std::ostream & operator <<(std::ostream & s, const Packet2ul & v) { Packet vt; vt.v2ul = v; s << vt.ul[0] << ", " << vt.ul[1] ; return s; } inline std::ostream & operator <<(std::ostream & s, const Packet2d & v) { Packet vt; vt.v2d = v; s << vt.d[0] << ", " << vt.d[1]; return s; } #if !defined(__ARCH__) || (defined(__ARCH__) && __ARCH__ >= 12) inline std::ostream & operator <<(std::ostream & s, const Packet4f & v) { Packet vt; vt.v4f = v; s << vt.f[0] << ", " << vt.f[1] << ", " << vt.f[2] << ", " << vt.f[3]; return s; } #endif template<> EIGEN_STRONG_INLINE Packet4i pload(const int* from) { // FIXME: No intrinsic yet EIGEN_DEBUG_ALIGNED_LOAD Packet *vfrom; vfrom = (Packet *) from; return vfrom->v4i; } template<> EIGEN_STRONG_INLINE Packet2d pload(const double* from) { // FIXME: No intrinsic yet EIGEN_DEBUG_ALIGNED_LOAD Packet *vfrom; vfrom = (Packet *) from; return vfrom->v2d; } template<> EIGEN_STRONG_INLINE void pstore(int* to, const Packet4i& from) { // FIXME: No intrinsic yet EIGEN_DEBUG_ALIGNED_STORE Packet *vto; vto = (Packet *) to; vto->v4i = from; } template<> EIGEN_STRONG_INLINE void pstore(double* to, const Packet2d& from) { // FIXME: No intrinsic yet EIGEN_DEBUG_ALIGNED_STORE Packet *vto; vto = (Packet *) to; vto->v2d = from; } template<> EIGEN_STRONG_INLINE Packet4i pset1(const int& from) { return vec_splats(from); } template<> EIGEN_STRONG_INLINE Packet2d pset1(const double& from) { return vec_splats(from); } template<> EIGEN_STRONG_INLINE void pbroadcast4(const int *a, Packet4i& a0, Packet4i& a1, Packet4i& a2, Packet4i& a3) { a3 = pload(a); a0 = vec_splat(a3, 0); a1 = vec_splat(a3, 1); a2 = vec_splat(a3, 2); a3 = vec_splat(a3, 3); } template<> EIGEN_STRONG_INLINE void pbroadcast4(const double *a, Packet2d& a0, Packet2d& a1, Packet2d& a2, Packet2d& a3) { a1 = pload(a); a0 = vec_splat(a1, 0); a1 = vec_splat(a1, 1); a3 = pload(a+2); a2 = vec_splat(a3, 0); a3 = vec_splat(a3, 1); } template<> EIGEN_DEVICE_FUNC inline Packet4i pgather(const int* from, Index stride) { int EIGEN_ALIGN16 ai[4]; ai[0] = from[0*stride]; ai[1] = from[1*stride]; ai[2] = from[2*stride]; ai[3] = from[3*stride]; return pload(ai); } template<> EIGEN_DEVICE_FUNC inline Packet2d pgather(const double* from, Index stride) { double EIGEN_ALIGN16 af[2]; af[0] = from[0*stride]; af[1] = from[1*stride]; return pload(af); } template<> EIGEN_DEVICE_FUNC inline void pscatter(int* to, const Packet4i& from, Index stride) { int EIGEN_ALIGN16 ai[4]; pstore((int *)ai, from); to[0*stride] = ai[0]; to[1*stride] = ai[1]; to[2*stride] = ai[2]; to[3*stride] = ai[3]; } template<> EIGEN_DEVICE_FUNC inline void pscatter(double* to, const Packet2d& from, Index stride) { double EIGEN_ALIGN16 af[2]; pstore(af, from); to[0*stride] = af[0]; to[1*stride] = af[1]; } template<> EIGEN_STRONG_INLINE Packet4i padd(const Packet4i& a, const Packet4i& b) { return (a + b); } template<> EIGEN_STRONG_INLINE Packet2d padd(const Packet2d& a, const Packet2d& b) { return (a + b); } template<> EIGEN_STRONG_INLINE Packet4i psub(const Packet4i& a, const Packet4i& b) { return (a - b); } template<> EIGEN_STRONG_INLINE Packet2d psub(const Packet2d& a, const Packet2d& b) { return (a - b); } template<> EIGEN_STRONG_INLINE Packet4i pmul(const Packet4i& a, const Packet4i& b) { return (a * b); } template<> EIGEN_STRONG_INLINE Packet2d pmul(const Packet2d& a, const Packet2d& b) { return (a * b); } template<> EIGEN_STRONG_INLINE Packet4i pdiv(const Packet4i& a, const Packet4i& b) { return (a / b); } template<> EIGEN_STRONG_INLINE Packet2d pdiv(const Packet2d& a, const Packet2d& b) { return (a / b); } template<> EIGEN_STRONG_INLINE Packet4i pnegate(const Packet4i& a) { return (-a); } template<> EIGEN_STRONG_INLINE Packet2d pnegate(const Packet2d& a) { return (-a); } template<> EIGEN_STRONG_INLINE Packet4i pconj(const Packet4i& a) { return a; } template<> EIGEN_STRONG_INLINE Packet2d pconj(const Packet2d& a) { return a; } template<> EIGEN_STRONG_INLINE Packet4i pmadd(const Packet4i& a, const Packet4i& b, const Packet4i& c) { return padd(pmul(a, b), c); } template<> EIGEN_STRONG_INLINE Packet2d pmadd(const Packet2d& a, const Packet2d& b, const Packet2d& c) { return vec_madd(a, b, c); } template<> EIGEN_STRONG_INLINE Packet4i plset(const int& a) { return padd(pset1(a), p4i_COUNTDOWN); } template<> EIGEN_STRONG_INLINE Packet2d plset(const double& a) { return padd(pset1(a), p2d_COUNTDOWN); } template<> EIGEN_STRONG_INLINE Packet4i pmin(const Packet4i& a, const Packet4i& b) { return vec_min(a, b); } template<> EIGEN_STRONG_INLINE Packet2d pmin(const Packet2d& a, const Packet2d& b) { return vec_min(a, b); } template<> EIGEN_STRONG_INLINE Packet4i pmax(const Packet4i& a, const Packet4i& b) { return vec_max(a, b); } template<> EIGEN_STRONG_INLINE Packet2d pmax(const Packet2d& a, const Packet2d& b) { return vec_max(a, b); } template<> EIGEN_STRONG_INLINE Packet4i pand(const Packet4i& a, const Packet4i& b) { return vec_and(a, b); } template<> EIGEN_STRONG_INLINE Packet2d pand(const Packet2d& a, const Packet2d& b) { return vec_and(a, b); } template<> EIGEN_STRONG_INLINE Packet4i por(const Packet4i& a, const Packet4i& b) { return vec_or(a, b); } template<> EIGEN_STRONG_INLINE Packet2d por(const Packet2d& a, const Packet2d& b) { return vec_or(a, b); } template<> EIGEN_STRONG_INLINE Packet4i pxor(const Packet4i& a, const Packet4i& b) { return vec_xor(a, b); } template<> EIGEN_STRONG_INLINE Packet2d pxor(const Packet2d& a, const Packet2d& b) { return vec_xor(a, b); } template<> EIGEN_STRONG_INLINE Packet4i pandnot(const Packet4i& a, const Packet4i& b) { return pand(a, vec_nor(b, b)); } template<> EIGEN_STRONG_INLINE Packet2d pandnot(const Packet2d& a, const Packet2d& b) { return vec_and(a, vec_nor(b, b)); } template<> EIGEN_STRONG_INLINE Packet2d pround(const Packet2d& a) { return vec_round(a); } template<> EIGEN_STRONG_INLINE Packet2d pceil(const Packet2d& a) { return vec_ceil(a); } template<> EIGEN_STRONG_INLINE Packet2d pfloor(const Packet2d& a) { return vec_floor(a); } template<> EIGEN_STRONG_INLINE Packet4i ploadu(const int* from) { return pload(from); } template<> EIGEN_STRONG_INLINE Packet2d ploadu(const double* from) { return pload(from); } template<> EIGEN_STRONG_INLINE Packet4i ploaddup(const int* from) { Packet4i p = pload(from); return vec_perm(p, p, p16uc_DUPLICATE32_HI); } template<> EIGEN_STRONG_INLINE Packet2d ploaddup(const double* from) { Packet2d p = pload(from); return vec_perm(p, p, p16uc_PSET64_HI); } template<> EIGEN_STRONG_INLINE void pstoreu(int* to, const Packet4i& from) { pstore(to, from); } template<> EIGEN_STRONG_INLINE void pstoreu(double* to, const Packet2d& from) { pstore(to, from); } template<> EIGEN_STRONG_INLINE void prefetch(const int* addr) { EIGEN_ZVECTOR_PREFETCH(addr); } template<> EIGEN_STRONG_INLINE void prefetch(const double* addr) { EIGEN_ZVECTOR_PREFETCH(addr); } template<> EIGEN_STRONG_INLINE int pfirst(const Packet4i& a) { int EIGEN_ALIGN16 x[4]; pstore(x, a); return x[0]; } template<> EIGEN_STRONG_INLINE double pfirst(const Packet2d& a) { double EIGEN_ALIGN16 x[2]; pstore(x, a); return x[0]; } template<> EIGEN_STRONG_INLINE Packet4i preverse(const Packet4i& a) { return reinterpret_cast(vec_perm(reinterpret_cast(a), reinterpret_cast(a), p16uc_REVERSE32)); } template<> EIGEN_STRONG_INLINE Packet2d preverse(const Packet2d& a) { return reinterpret_cast(vec_perm(reinterpret_cast(a), reinterpret_cast(a), p16uc_REVERSE64)); } template<> EIGEN_STRONG_INLINE Packet4i pabs(const Packet4i& a) { return vec_abs(a); } template<> EIGEN_STRONG_INLINE Packet2d pabs(const Packet2d& a) { return vec_abs(a); } template<> EIGEN_STRONG_INLINE int predux(const Packet4i& a) { Packet4i b, sum; b = vec_sld(a, a, 8); sum = padd(a, b); b = vec_sld(sum, sum, 4); sum = padd(sum, b); return pfirst(sum); } template<> EIGEN_STRONG_INLINE double predux(const Packet2d& a) { Packet2d b, sum; b = reinterpret_cast(vec_sld(reinterpret_cast(a), reinterpret_cast(a), 8)); sum = padd(a, b); return pfirst(sum); } // Other reduction functions: // mul template<> EIGEN_STRONG_INLINE int predux_mul(const Packet4i& a) { EIGEN_ALIGN16 int aux[4]; pstore(aux, a); return aux[0] * aux[1] * aux[2] * aux[3]; } template<> EIGEN_STRONG_INLINE double predux_mul(const Packet2d& a) { return pfirst(pmul(a, reinterpret_cast(vec_sld(reinterpret_cast(a), reinterpret_cast(a), 8)))); } // min template<> EIGEN_STRONG_INLINE int predux_min(const Packet4i& a) { Packet4i b, res; b = pmin(a, vec_sld(a, a, 8)); res = pmin(b, vec_sld(b, b, 4)); return pfirst(res); } template<> EIGEN_STRONG_INLINE double predux_min(const Packet2d& a) { return pfirst(pmin(a, reinterpret_cast(vec_sld(reinterpret_cast(a), reinterpret_cast(a), 8)))); } // max template<> EIGEN_STRONG_INLINE int predux_max(const Packet4i& a) { Packet4i b, res; b = pmax(a, vec_sld(a, a, 8)); res = pmax(b, vec_sld(b, b, 4)); return pfirst(res); } // max template<> EIGEN_STRONG_INLINE double predux_max(const Packet2d& a) { return pfirst(pmax(a, reinterpret_cast(vec_sld(reinterpret_cast(a), reinterpret_cast(a), 8)))); } EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { Packet4i t0 = vec_mergeh(kernel.packet[0], kernel.packet[2]); Packet4i t1 = vec_mergel(kernel.packet[0], kernel.packet[2]); Packet4i t2 = vec_mergeh(kernel.packet[1], kernel.packet[3]); Packet4i t3 = vec_mergel(kernel.packet[1], kernel.packet[3]); kernel.packet[0] = vec_mergeh(t0, t2); kernel.packet[1] = vec_mergel(t0, t2); kernel.packet[2] = vec_mergeh(t1, t3); kernel.packet[3] = vec_mergel(t1, t3); } EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { Packet2d t0 = vec_perm(kernel.packet[0], kernel.packet[1], p16uc_TRANSPOSE64_HI); Packet2d t1 = vec_perm(kernel.packet[0], kernel.packet[1], p16uc_TRANSPOSE64_LO); kernel.packet[0] = t0; kernel.packet[1] = t1; } template<> EIGEN_STRONG_INLINE Packet4i pblend(const Selector<4>& ifPacket, const Packet4i& thenPacket, const Packet4i& elsePacket) { Packet4ui select = { ifPacket.select[0], ifPacket.select[1], ifPacket.select[2], ifPacket.select[3] }; Packet4ui mask = vec_cmpeq(select, reinterpret_cast(p4i_ONE)); return vec_sel(elsePacket, thenPacket, mask); } template<> EIGEN_STRONG_INLINE Packet2d pblend(const Selector<2>& ifPacket, const Packet2d& thenPacket, const Packet2d& elsePacket) { Packet2ul select = { ifPacket.select[0], ifPacket.select[1] }; Packet2ul mask = vec_cmpeq(select, reinterpret_cast(p2l_ONE)); return vec_sel(elsePacket, thenPacket, mask); } /* z13 has no vector float support so we emulate that with double z14 has proper vector float support. */ #if !defined(__ARCH__) || (defined(__ARCH__) && __ARCH__ < 12) /* Helper function to simulate a vec_splat_packet4f */ template EIGEN_STRONG_INLINE Packet4f vec_splat_packet4f(const Packet4f& from) { Packet4f splat; switch (element) { case 0: splat.v4f[0] = vec_splat(from.v4f[0], 0); splat.v4f[1] = splat.v4f[0]; break; case 1: splat.v4f[0] = vec_splat(from.v4f[0], 1); splat.v4f[1] = splat.v4f[0]; break; case 2: splat.v4f[0] = vec_splat(from.v4f[1], 0); splat.v4f[1] = splat.v4f[0]; break; case 3: splat.v4f[0] = vec_splat(from.v4f[1], 1); splat.v4f[1] = splat.v4f[0]; break; } return splat; } template<> EIGEN_STRONG_INLINE Packet4f pload(const float* from) { // FIXME: No intrinsic yet EIGEN_DEBUG_ALIGNED_LOAD Packet4f vfrom; vfrom.v4f[0] = vec_ld2f(&from[0]); vfrom.v4f[1] = vec_ld2f(&from[2]); return vfrom; } template<> EIGEN_STRONG_INLINE void pstore(float* to, const Packet4f& from) { // FIXME: No intrinsic yet EIGEN_DEBUG_ALIGNED_STORE vec_st2f(from.v4f[0], &to[0]); vec_st2f(from.v4f[1], &to[2]); } template<> EIGEN_STRONG_INLINE Packet4f pset1(const float& from) { Packet4f to; to.v4f[0] = pset1(static_cast(from)); to.v4f[1] = to.v4f[0]; return to; } template<> EIGEN_STRONG_INLINE void pbroadcast4(const float *a, Packet4f& a0, Packet4f& a1, Packet4f& a2, Packet4f& a3) { a3 = pload(a); a0 = vec_splat_packet4f<0>(a3); a1 = vec_splat_packet4f<1>(a3); a2 = vec_splat_packet4f<2>(a3); a3 = vec_splat_packet4f<3>(a3); } template<> EIGEN_DEVICE_FUNC inline Packet4f pgather(const float* from, Index stride) { float EIGEN_ALIGN16 ai[4]; ai[0] = from[0*stride]; ai[1] = from[1*stride]; ai[2] = from[2*stride]; ai[3] = from[3*stride]; return pload(ai); } template<> EIGEN_DEVICE_FUNC inline void pscatter(float* to, const Packet4f& from, Index stride) { float EIGEN_ALIGN16 ai[4]; pstore((float *)ai, from); to[0*stride] = ai[0]; to[1*stride] = ai[1]; to[2*stride] = ai[2]; to[3*stride] = ai[3]; } template<> EIGEN_STRONG_INLINE Packet4f padd(const Packet4f& a, const Packet4f& b) { Packet4f c; c.v4f[0] = a.v4f[0] + b.v4f[0]; c.v4f[1] = a.v4f[1] + b.v4f[1]; return c; } template<> EIGEN_STRONG_INLINE Packet4f psub(const Packet4f& a, const Packet4f& b) { Packet4f c; c.v4f[0] = a.v4f[0] - b.v4f[0]; c.v4f[1] = a.v4f[1] - b.v4f[1]; return c; } template<> EIGEN_STRONG_INLINE Packet4f pmul(const Packet4f& a, const Packet4f& b) { Packet4f c; c.v4f[0] = a.v4f[0] * b.v4f[0]; c.v4f[1] = a.v4f[1] * b.v4f[1]; return c; } template<> EIGEN_STRONG_INLINE Packet4f pdiv(const Packet4f& a, const Packet4f& b) { Packet4f c; c.v4f[0] = a.v4f[0] / b.v4f[0]; c.v4f[1] = a.v4f[1] / b.v4f[1]; return c; } template<> EIGEN_STRONG_INLINE Packet4f pnegate(const Packet4f& a) { Packet4f c; c.v4f[0] = -a.v4f[0]; c.v4f[1] = -a.v4f[1]; return c; } template<> EIGEN_STRONG_INLINE Packet4f pmadd(const Packet4f& a, const Packet4f& b, const Packet4f& c) { Packet4f res; res.v4f[0] = vec_madd(a.v4f[0], b.v4f[0], c.v4f[0]); res.v4f[1] = vec_madd(a.v4f[1], b.v4f[1], c.v4f[1]); return res; } template<> EIGEN_STRONG_INLINE Packet4f pmin(const Packet4f& a, const Packet4f& b) { Packet4f res; res.v4f[0] = pmin(a.v4f[0], b.v4f[0]); res.v4f[1] = pmin(a.v4f[1], b.v4f[1]); return res; } template<> EIGEN_STRONG_INLINE Packet4f pmax(const Packet4f& a, const Packet4f& b) { Packet4f res; res.v4f[0] = pmax(a.v4f[0], b.v4f[0]); res.v4f[1] = pmax(a.v4f[1], b.v4f[1]); return res; } template<> EIGEN_STRONG_INLINE Packet4f pand(const Packet4f& a, const Packet4f& b) { Packet4f res; res.v4f[0] = pand(a.v4f[0], b.v4f[0]); res.v4f[1] = pand(a.v4f[1], b.v4f[1]); return res; } template<> EIGEN_STRONG_INLINE Packet4f por(const Packet4f& a, const Packet4f& b) { Packet4f res; res.v4f[0] = por(a.v4f[0], b.v4f[0]); res.v4f[1] = por(a.v4f[1], b.v4f[1]); return res; } template<> EIGEN_STRONG_INLINE Packet4f pxor(const Packet4f& a, const Packet4f& b) { Packet4f res; res.v4f[0] = pxor(a.v4f[0], b.v4f[0]); res.v4f[1] = pxor(a.v4f[1], b.v4f[1]); return res; } template<> EIGEN_STRONG_INLINE Packet4f pandnot(const Packet4f& a, const Packet4f& b) { Packet4f res; res.v4f[0] = pandnot(a.v4f[0], b.v4f[0]); res.v4f[1] = pandnot(a.v4f[1], b.v4f[1]); return res; } template<> EIGEN_STRONG_INLINE Packet4f pround(const Packet4f& a) { Packet4f res; res.v4f[0] = vec_round(a.v4f[0]); res.v4f[1] = vec_round(a.v4f[1]); return res; } template<> EIGEN_STRONG_INLINE Packet4f pceil(const Packet4f& a) { Packet4f res; res.v4f[0] = vec_ceil(a.v4f[0]); res.v4f[1] = vec_ceil(a.v4f[1]); return res; } template<> EIGEN_STRONG_INLINE Packet4f pfloor(const Packet4f& a) { Packet4f res; res.v4f[0] = vec_floor(a.v4f[0]); res.v4f[1] = vec_floor(a.v4f[1]); return res; } template<> EIGEN_STRONG_INLINE Packet4f ploaddup(const float* from) { Packet4f p = pload(from); p.v4f[1] = vec_splat(p.v4f[0], 1); p.v4f[0] = vec_splat(p.v4f[0], 0); return p; } template<> EIGEN_STRONG_INLINE float pfirst(const Packet4f& a) { float EIGEN_ALIGN16 x[2]; vec_st2f(a.v4f[0], &x[0]); return x[0]; } template<> EIGEN_STRONG_INLINE Packet4f preverse(const Packet4f& a) { Packet4f rev; rev.v4f[0] = preverse(a.v4f[1]); rev.v4f[1] = preverse(a.v4f[0]); return rev; } template<> EIGEN_STRONG_INLINE Packet4f pabs(const Packet4f& a) { Packet4f res; res.v4f[0] = pabs(a.v4f[0]); res.v4f[1] = pabs(a.v4f[1]); return res; } template<> EIGEN_STRONG_INLINE float predux(const Packet4f& a) { Packet2d sum; sum = padd(a.v4f[0], a.v4f[1]); double first = predux(sum); return static_cast(first); } template<> EIGEN_STRONG_INLINE float predux_mul(const Packet4f& a) { // Return predux_mul of the subvectors product return static_cast(pfirst(predux_mul(pmul(a.v4f[0], a.v4f[1])))); } template<> EIGEN_STRONG_INLINE float predux_min(const Packet4f& a) { Packet2d b, res; b = pmin(a.v4f[0], a.v4f[1]); res = pmin(b, reinterpret_cast(vec_sld(reinterpret_cast(b), reinterpret_cast(b), 8))); return static_cast(pfirst(res)); } template<> EIGEN_STRONG_INLINE float predux_max(const Packet4f& a) { Packet2d b, res; b = pmax(a.v4f[0], a.v4f[1]); res = pmax(b, reinterpret_cast(vec_sld(reinterpret_cast(b), reinterpret_cast(b), 8))); return static_cast(pfirst(res)); } /* Split the Packet4f PacketBlock into 4 Packet2d PacketBlocks and transpose each one */ EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { PacketBlock t0,t1,t2,t3; // copy top-left 2x2 Packet2d block t0.packet[0] = kernel.packet[0].v4f[0]; t0.packet[1] = kernel.packet[1].v4f[0]; // copy top-right 2x2 Packet2d block t1.packet[0] = kernel.packet[0].v4f[1]; t1.packet[1] = kernel.packet[1].v4f[1]; // copy bottom-left 2x2 Packet2d block t2.packet[0] = kernel.packet[2].v4f[0]; t2.packet[1] = kernel.packet[3].v4f[0]; // copy bottom-right 2x2 Packet2d block t3.packet[0] = kernel.packet[2].v4f[1]; t3.packet[1] = kernel.packet[3].v4f[1]; // Transpose all 2x2 blocks ptranspose(t0); ptranspose(t1); ptranspose(t2); ptranspose(t3); // Copy back transposed blocks, but exchange t1 and t2 due to transposition kernel.packet[0].v4f[0] = t0.packet[0]; kernel.packet[0].v4f[1] = t2.packet[0]; kernel.packet[1].v4f[0] = t0.packet[1]; kernel.packet[1].v4f[1] = t2.packet[1]; kernel.packet[2].v4f[0] = t1.packet[0]; kernel.packet[2].v4f[1] = t3.packet[0]; kernel.packet[3].v4f[0] = t1.packet[1]; kernel.packet[3].v4f[1] = t3.packet[1]; } template<> EIGEN_STRONG_INLINE Packet4f pblend(const Selector<4>& ifPacket, const Packet4f& thenPacket, const Packet4f& elsePacket) { Packet2ul select_hi = { ifPacket.select[0], ifPacket.select[1] }; Packet2ul select_lo = { ifPacket.select[2], ifPacket.select[3] }; Packet2ul mask_hi = vec_cmpeq(select_hi, reinterpret_cast(p2l_ONE)); Packet2ul mask_lo = vec_cmpeq(select_lo, reinterpret_cast(p2l_ONE)); Packet4f result; result.v4f[0] = vec_sel(elsePacket.v4f[0], thenPacket.v4f[0], mask_hi); result.v4f[1] = vec_sel(elsePacket.v4f[1], thenPacket.v4f[1], mask_lo); return result; } template<> Packet4f EIGEN_STRONG_INLINE pcmp_le(const Packet4f& a, const Packet4f& b) { Packet4f res; res.v4f[0] = pcmp_le(a.v4f[0], b.v4f[0]); res.v4f[1] = pcmp_le(a.v4f[1], b.v4f[1]); return res; } template<> Packet4f EIGEN_STRONG_INLINE pcmp_lt(const Packet4f& a, const Packet4f& b) { Packet4f res; res.v4f[0] = pcmp_lt(a.v4f[0], b.v4f[0]); res.v4f[1] = pcmp_lt(a.v4f[1], b.v4f[1]); return res; } template<> Packet4f EIGEN_STRONG_INLINE pcmp_eq(const Packet4f& a, const Packet4f& b) { Packet4f res; res.v4f[0] = pcmp_eq(a.v4f[0], b.v4f[0]); res.v4f[1] = pcmp_eq(a.v4f[1], b.v4f[1]); return res; } #else template<> EIGEN_STRONG_INLINE Packet4f pload(const float* from) { // FIXME: No intrinsic yet EIGEN_DEBUG_ALIGNED_LOAD Packet *vfrom; vfrom = (Packet *) from; return vfrom->v4f; } template<> EIGEN_STRONG_INLINE void pstore(float* to, const Packet4f& from) { // FIXME: No intrinsic yet EIGEN_DEBUG_ALIGNED_STORE Packet *vto; vto = (Packet *) to; vto->v4f = from; } template<> EIGEN_STRONG_INLINE Packet4f pset1(const float& from) { return vec_splats(from); } template<> EIGEN_STRONG_INLINE void pbroadcast4(const float *a, Packet4f& a0, Packet4f& a1, Packet4f& a2, Packet4f& a3) { a3 = pload(a); a0 = vec_splat(a3, 0); a1 = vec_splat(a3, 1); a2 = vec_splat(a3, 2); a3 = vec_splat(a3, 3); } template<> EIGEN_DEVICE_FUNC inline Packet4f pgather(const float* from, Index stride) { float EIGEN_ALIGN16 af[4]; af[0] = from[0*stride]; af[1] = from[1*stride]; af[2] = from[2*stride]; af[3] = from[3*stride]; return pload(af); } template<> EIGEN_DEVICE_FUNC inline void pscatter(float* to, const Packet4f& from, Index stride) { float EIGEN_ALIGN16 af[4]; pstore((float*)af, from); to[0*stride] = af[0]; to[1*stride] = af[1]; to[2*stride] = af[2]; to[3*stride] = af[3]; } template<> EIGEN_STRONG_INLINE Packet4f padd(const Packet4f& a, const Packet4f& b) { return (a + b); } template<> EIGEN_STRONG_INLINE Packet4f psub(const Packet4f& a, const Packet4f& b) { return (a - b); } template<> EIGEN_STRONG_INLINE Packet4f pmul(const Packet4f& a, const Packet4f& b) { return (a * b); } template<> EIGEN_STRONG_INLINE Packet4f pdiv(const Packet4f& a, const Packet4f& b) { return (a / b); } template<> EIGEN_STRONG_INLINE Packet4f pnegate(const Packet4f& a) { return (-a); } template<> EIGEN_STRONG_INLINE Packet4f pconj (const Packet4f& a) { return a; } template<> EIGEN_STRONG_INLINE Packet4f pmadd (const Packet4f& a, const Packet4f& b, const Packet4f& c) { return vec_madd(a, b, c); } template<> EIGEN_STRONG_INLINE Packet4f pmin (const Packet4f& a, const Packet4f& b) { return vec_min(a, b); } template<> EIGEN_STRONG_INLINE Packet4f pmax (const Packet4f& a, const Packet4f& b) { return vec_max(a, b); } template<> EIGEN_STRONG_INLINE Packet4f pand (const Packet4f& a, const Packet4f& b) { return vec_and(a, b); } template<> EIGEN_STRONG_INLINE Packet4f por (const Packet4f& a, const Packet4f& b) { return vec_or(a, b); } template<> EIGEN_STRONG_INLINE Packet4f pxor (const Packet4f& a, const Packet4f& b) { return vec_xor(a, b); } template<> EIGEN_STRONG_INLINE Packet4f pandnot(const Packet4f& a, const Packet4f& b) { return vec_and(a, vec_nor(b, b)); } template<> EIGEN_STRONG_INLINE Packet4f pround (const Packet4f& a) { return vec_round(a); } template<> EIGEN_STRONG_INLINE Packet4f pceil (const Packet4f& a) { return vec_ceil(a); } template<> EIGEN_STRONG_INLINE Packet4f pfloor (const Packet4f& a) { return vec_floor(a); } template<> EIGEN_STRONG_INLINE Packet4f pabs (const Packet4f& a) { return vec_abs(a); } template<> EIGEN_STRONG_INLINE float pfirst(const Packet4f& a) { float EIGEN_ALIGN16 x[4]; pstore(x, a); return x[0]; } template<> EIGEN_STRONG_INLINE Packet4f ploaddup(const float* from) { Packet4f p = pload(from); return vec_perm(p, p, p16uc_DUPLICATE32_HI); } template<> EIGEN_STRONG_INLINE Packet4f preverse(const Packet4f& a) { return reinterpret_cast(vec_perm(reinterpret_cast(a), reinterpret_cast(a), p16uc_REVERSE32)); } template<> EIGEN_STRONG_INLINE float predux(const Packet4f& a) { Packet4f b, sum; b = vec_sld(a, a, 8); sum = padd(a, b); b = vec_sld(sum, sum, 4); sum = padd(sum, b); return pfirst(sum); } // Other reduction functions: // mul template<> EIGEN_STRONG_INLINE float predux_mul(const Packet4f& a) { Packet4f prod; prod = pmul(a, vec_sld(a, a, 8)); return pfirst(pmul(prod, vec_sld(prod, prod, 4))); } // min template<> EIGEN_STRONG_INLINE float predux_min(const Packet4f& a) { Packet4f b, res; b = pmin(a, vec_sld(a, a, 8)); res = pmin(b, vec_sld(b, b, 4)); return pfirst(res); } // max template<> EIGEN_STRONG_INLINE float predux_max(const Packet4f& a) { Packet4f b, res; b = pmax(a, vec_sld(a, a, 8)); res = pmax(b, vec_sld(b, b, 4)); return pfirst(res); } EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { Packet4f t0 = vec_mergeh(kernel.packet[0], kernel.packet[2]); Packet4f t1 = vec_mergel(kernel.packet[0], kernel.packet[2]); Packet4f t2 = vec_mergeh(kernel.packet[1], kernel.packet[3]); Packet4f t3 = vec_mergel(kernel.packet[1], kernel.packet[3]); kernel.packet[0] = vec_mergeh(t0, t2); kernel.packet[1] = vec_mergel(t0, t2); kernel.packet[2] = vec_mergeh(t1, t3); kernel.packet[3] = vec_mergel(t1, t3); } template<> EIGEN_STRONG_INLINE Packet4f pblend(const Selector<4>& ifPacket, const Packet4f& thenPacket, const Packet4f& elsePacket) { Packet4ui select = { ifPacket.select[0], ifPacket.select[1], ifPacket.select[2], ifPacket.select[3] }; Packet4ui mask = vec_cmpeq(select, reinterpret_cast(p4i_ONE)); return vec_sel(elsePacket, thenPacket, mask); } #endif template<> EIGEN_STRONG_INLINE void prefetch(const float* addr) { EIGEN_ZVECTOR_PREFETCH(addr); } template<> EIGEN_STRONG_INLINE Packet4f ploadu (const float* from) { return pload(from); } template<> EIGEN_STRONG_INLINE void pstoreu(float* to, const Packet4f& from) { pstore(to, from); } template<> EIGEN_STRONG_INLINE Packet4f plset (const float& a) { return padd(pset1(a), p4f_COUNTDOWN); } } // end namespace internal } // end namespace Eigen #endif // EIGEN_PACKET_MATH_ZVECTOR_H RcppEigen/inst/include/Eigen/src/Core/arch/ZVector/Complex.h0000644000176200001440000004053014562417221023333 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2010 Gael Guennebaud // Copyright (C) 2016 Konstantinos Margaritis // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_COMPLEX32_ALTIVEC_H #define EIGEN_COMPLEX32_ALTIVEC_H namespace Eigen { namespace internal { #if !defined(__ARCH__) || (defined(__ARCH__) && __ARCH__ >= 12) static Packet4ui p4ui_CONJ_XOR = { 0x00000000, 0x80000000, 0x00000000, 0x80000000 }; //vec_mergeh((Packet4ui)p4i_ZERO, (Packet4ui)p4f_MZERO); #endif static Packet2ul p2ul_CONJ_XOR1 = (Packet2ul) vec_sld((Packet4ui) p2d_ZERO_, (Packet4ui) p2l_ZERO, 8);//{ 0x8000000000000000, 0x0000000000000000 }; static Packet2ul p2ul_CONJ_XOR2 = (Packet2ul) vec_sld((Packet4ui) p2l_ZERO, (Packet4ui) p2d_ZERO_, 8);//{ 0x8000000000000000, 0x0000000000000000 }; struct Packet1cd { EIGEN_STRONG_INLINE Packet1cd() {} EIGEN_STRONG_INLINE explicit Packet1cd(const Packet2d& a) : v(a) {} Packet2d v; }; struct Packet2cf { EIGEN_STRONG_INLINE Packet2cf() {} EIGEN_STRONG_INLINE explicit Packet2cf(const Packet4f& a) : v(a) {} #if !defined(__ARCH__) || (defined(__ARCH__) && __ARCH__ < 12) union { Packet4f v; Packet1cd cd[2]; }; #else Packet4f v; #endif }; template<> struct packet_traits > : default_packet_traits { typedef Packet2cf type; typedef Packet2cf half; enum { Vectorizable = 1, AlignedOnScalar = 1, size = 2, HasHalfPacket = 0, HasAdd = 1, HasSub = 1, HasMul = 1, HasDiv = 1, HasNegate = 1, HasAbs = 0, HasAbs2 = 0, HasMin = 0, HasMax = 0, HasBlend = 1, HasSetLinear = 0 }; }; template<> struct packet_traits > : default_packet_traits { typedef Packet1cd type; typedef Packet1cd half; enum { Vectorizable = 1, AlignedOnScalar = 1, size = 1, HasHalfPacket = 0, HasAdd = 1, HasSub = 1, HasMul = 1, HasDiv = 1, HasNegate = 1, HasAbs = 0, HasAbs2 = 0, HasMin = 0, HasMax = 0, HasSetLinear = 0 }; }; template<> struct unpacket_traits { typedef std::complex type; enum {size=2, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet2cf half; }; template<> struct unpacket_traits { typedef std::complex type; enum {size=1, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet1cd half; }; /* Forward declaration */ EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel); /* complex first */ template<> EIGEN_STRONG_INLINE Packet1cd pload (const std::complex* from) { EIGEN_DEBUG_ALIGNED_LOAD return Packet1cd(pload((const double*)from)); } template<> EIGEN_STRONG_INLINE Packet1cd ploadu(const std::complex* from) { EIGEN_DEBUG_UNALIGNED_LOAD return Packet1cd(ploadu((const double*)from)); } template<> EIGEN_STRONG_INLINE void pstore >(std::complex * to, const Packet1cd& from) { EIGEN_DEBUG_ALIGNED_STORE pstore((double*)to, from.v); } template<> EIGEN_STRONG_INLINE void pstoreu >(std::complex * to, const Packet1cd& from) { EIGEN_DEBUG_UNALIGNED_STORE pstoreu((double*)to, from.v); } template<> EIGEN_STRONG_INLINE Packet1cd pset1(const std::complex& from) { /* here we really have to use unaligned loads :( */ return ploadu(&from); } template<> EIGEN_DEVICE_FUNC inline Packet1cd pgather, Packet1cd>(const std::complex* from, Index stride EIGEN_UNUSED) { return pload(from); } template<> EIGEN_DEVICE_FUNC inline void pscatter, Packet1cd>(std::complex* to, const Packet1cd& from, Index stride EIGEN_UNUSED) { pstore >(to, from); } template<> EIGEN_STRONG_INLINE Packet1cd padd(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(a.v + b.v); } template<> EIGEN_STRONG_INLINE Packet1cd psub(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(a.v - b.v); } template<> EIGEN_STRONG_INLINE Packet1cd pnegate(const Packet1cd& a) { return Packet1cd(pnegate(Packet2d(a.v))); } template<> EIGEN_STRONG_INLINE Packet1cd pconj(const Packet1cd& a) { return Packet1cd((Packet2d)vec_xor((Packet2d)a.v, (Packet2d)p2ul_CONJ_XOR2)); } template<> EIGEN_STRONG_INLINE Packet1cd pmul(const Packet1cd& a, const Packet1cd& b) { Packet2d a_re, a_im, v1, v2; // Permute and multiply the real parts of a and b a_re = vec_perm(a.v, a.v, p16uc_PSET64_HI); // Get the imaginary parts of a a_im = vec_perm(a.v, a.v, p16uc_PSET64_LO); // multiply a_re * b v1 = vec_madd(a_re, b.v, p2d_ZERO); // multiply a_im * b and get the conjugate result v2 = vec_madd(a_im, b.v, p2d_ZERO); v2 = (Packet2d) vec_sld((Packet4ui)v2, (Packet4ui)v2, 8); v2 = (Packet2d) vec_xor((Packet2d)v2, (Packet2d) p2ul_CONJ_XOR1); return Packet1cd(v1 + v2); } template<> EIGEN_STRONG_INLINE Packet1cd pand (const Packet1cd& a, const Packet1cd& b) { return Packet1cd(vec_and(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet1cd por (const Packet1cd& a, const Packet1cd& b) { return Packet1cd(vec_or(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet1cd pxor (const Packet1cd& a, const Packet1cd& b) { return Packet1cd(vec_xor(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet1cd pandnot (const Packet1cd& a, const Packet1cd& b) { return Packet1cd(vec_and(a.v, vec_nor(b.v,b.v))); } template<> EIGEN_STRONG_INLINE Packet1cd ploaddup(const std::complex* from) { return pset1(*from); } template<> EIGEN_STRONG_INLINE Packet1cd pcmp_eq(const Packet1cd& a, const Packet1cd& b) { Packet2d eq = vec_cmpeq (a.v, b.v); Packet2d tmp = { eq[1], eq[0] }; return (Packet1cd)pand(eq, tmp); } template<> EIGEN_STRONG_INLINE void prefetch >(const std::complex * addr) { EIGEN_ZVECTOR_PREFETCH(addr); } template<> EIGEN_STRONG_INLINE std::complex pfirst(const Packet1cd& a) { std::complex EIGEN_ALIGN16 res; pstore >(&res, a); return res; } template<> EIGEN_STRONG_INLINE Packet1cd preverse(const Packet1cd& a) { return a; } template<> EIGEN_STRONG_INLINE std::complex predux(const Packet1cd& a) { return pfirst(a); } template<> EIGEN_STRONG_INLINE std::complex predux_mul(const Packet1cd& a) { return pfirst(a); } EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet1cd,Packet2d) template<> EIGEN_STRONG_INLINE Packet1cd pdiv(const Packet1cd& a, const Packet1cd& b) { // TODO optimize it for AltiVec Packet1cd res = pmul(a,pconj(b)); Packet2d s = vec_madd(b.v, b.v, p2d_ZERO_); return Packet1cd(pdiv(res.v, s + vec_perm(s, s, p16uc_REVERSE64))); } EIGEN_STRONG_INLINE Packet1cd pcplxflip/**/(const Packet1cd& x) { return Packet1cd(preverse(Packet2d(x.v))); } EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { Packet2d tmp = vec_perm(kernel.packet[0].v, kernel.packet[1].v, p16uc_TRANSPOSE64_HI); kernel.packet[1].v = vec_perm(kernel.packet[0].v, kernel.packet[1].v, p16uc_TRANSPOSE64_LO); kernel.packet[0].v = tmp; } /* complex follows */ template<> EIGEN_STRONG_INLINE Packet2cf pload (const std::complex* from) { EIGEN_DEBUG_ALIGNED_LOAD return Packet2cf(pload((const float*)from)); } template<> EIGEN_STRONG_INLINE Packet2cf ploadu(const std::complex* from) { EIGEN_DEBUG_UNALIGNED_LOAD return Packet2cf(ploadu((const float*)from)); } template<> EIGEN_STRONG_INLINE void pstore >(std::complex * to, const Packet2cf& from) { EIGEN_DEBUG_ALIGNED_STORE pstore((float*)to, from.v); } template<> EIGEN_STRONG_INLINE void pstoreu >(std::complex * to, const Packet2cf& from) { EIGEN_DEBUG_UNALIGNED_STORE pstoreu((float*)to, from.v); } template<> EIGEN_STRONG_INLINE std::complex pfirst(const Packet2cf& a) { std::complex EIGEN_ALIGN16 res[2]; pstore >(res, a); return res[0]; } #if !defined(__ARCH__) || (defined(__ARCH__) && __ARCH__ < 12) template<> EIGEN_STRONG_INLINE Packet2cf pset1(const std::complex& from) { Packet2cf res; res.cd[0] = Packet1cd(vec_ld2f((const float *)&from)); res.cd[1] = res.cd[0]; return res; } #else template<> EIGEN_STRONG_INLINE Packet2cf pset1(const std::complex& from) { Packet2cf res; if((std::ptrdiff_t(&from) % 16) == 0) res.v = pload((const float *)&from); else res.v = ploadu((const float *)&from); res.v = vec_perm(res.v, res.v, p16uc_PSET64_HI); return res; } #endif template<> EIGEN_DEVICE_FUNC inline Packet2cf pgather, Packet2cf>(const std::complex* from, Index stride) { std::complex EIGEN_ALIGN16 af[2]; af[0] = from[0*stride]; af[1] = from[1*stride]; return pload(af); } template<> EIGEN_DEVICE_FUNC inline void pscatter, Packet2cf>(std::complex* to, const Packet2cf& from, Index stride) { std::complex EIGEN_ALIGN16 af[2]; pstore >((std::complex *) af, from); to[0*stride] = af[0]; to[1*stride] = af[1]; } template<> EIGEN_STRONG_INLINE Packet2cf padd(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(padd(a.v, b.v)); } template<> EIGEN_STRONG_INLINE Packet2cf psub(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(psub(a.v, b.v)); } template<> EIGEN_STRONG_INLINE Packet2cf pnegate(const Packet2cf& a) { return Packet2cf(pnegate(Packet4f(a.v))); } template<> EIGEN_STRONG_INLINE Packet2cf pand (const Packet2cf& a, const Packet2cf& b) { return Packet2cf(pand(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet2cf por (const Packet2cf& a, const Packet2cf& b) { return Packet2cf(por(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet2cf pxor (const Packet2cf& a, const Packet2cf& b) { return Packet2cf(pxor(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet2cf pandnot(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(pandnot(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet2cf ploaddup(const std::complex* from) { return pset1(*from); } template<> EIGEN_STRONG_INLINE void prefetch >(const std::complex * addr) { EIGEN_ZVECTOR_PREFETCH(addr); } #if !defined(__ARCH__) || (defined(__ARCH__) && __ARCH__ < 12) template<> EIGEN_STRONG_INLINE Packet2cf pcmp_eq(const Packet2cf& a, const Packet2cf& b) { Packet4f eq = pcmp_eq (a.v, b.v); Packet2cf res; Packet2d tmp1 = { eq.v4f[0][1], eq.v4f[0][0] }; Packet2d tmp2 = { eq.v4f[1][1], eq.v4f[1][0] }; res.v.v4f[0] = pand(eq.v4f[0], tmp1); res.v.v4f[1] = pand(eq.v4f[1], tmp2); return res; } template<> EIGEN_STRONG_INLINE Packet2cf pconj(const Packet2cf& a) { Packet2cf res; res.v.v4f[0] = pconj(Packet1cd(reinterpret_cast(a.v.v4f[0]))).v; res.v.v4f[1] = pconj(Packet1cd(reinterpret_cast(a.v.v4f[1]))).v; return res; } template<> EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b) { Packet2cf res; res.v.v4f[0] = pmul(Packet1cd(reinterpret_cast(a.v.v4f[0])), Packet1cd(reinterpret_cast(b.v.v4f[0]))).v; res.v.v4f[1] = pmul(Packet1cd(reinterpret_cast(a.v.v4f[1])), Packet1cd(reinterpret_cast(b.v.v4f[1]))).v; return res; } template<> EIGEN_STRONG_INLINE Packet2cf preverse(const Packet2cf& a) { Packet2cf res; res.cd[0] = a.cd[1]; res.cd[1] = a.cd[0]; return res; } template<> EIGEN_STRONG_INLINE std::complex predux(const Packet2cf& a) { std::complex res; Packet1cd b = padd(a.cd[0], a.cd[1]); vec_st2f(b.v, (float*)&res); return res; } template<> EIGEN_STRONG_INLINE std::complex predux_mul(const Packet2cf& a) { std::complex res; Packet1cd b = pmul(a.cd[0], a.cd[1]); vec_st2f(b.v, (float*)&res); return res; } EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet2cf,Packet4f) template<> EIGEN_STRONG_INLINE Packet2cf pdiv(const Packet2cf& a, const Packet2cf& b) { // TODO optimize it for AltiVec Packet2cf res; res.cd[0] = pdiv(a.cd[0], b.cd[0]); res.cd[1] = pdiv(a.cd[1], b.cd[1]); return res; } EIGEN_STRONG_INLINE Packet2cf pcplxflip/**/(const Packet2cf& x) { Packet2cf res; res.cd[0] = pcplxflip(x.cd[0]); res.cd[1] = pcplxflip(x.cd[1]); return res; } EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { Packet1cd tmp = kernel.packet[0].cd[1]; kernel.packet[0].cd[1] = kernel.packet[1].cd[0]; kernel.packet[1].cd[0] = tmp; } template<> EIGEN_STRONG_INLINE Packet2cf pblend(const Selector<2>& ifPacket, const Packet2cf& thenPacket, const Packet2cf& elsePacket) { Packet2cf result; const Selector<4> ifPacket4 = { ifPacket.select[0], ifPacket.select[0], ifPacket.select[1], ifPacket.select[1] }; result.v = pblend(ifPacket4, thenPacket.v, elsePacket.v); return result; } #else template<> EIGEN_STRONG_INLINE Packet2cf pcmp_eq(const Packet2cf& a, const Packet2cf& b) { Packet4f eq = vec_cmpeq (a.v, b.v); Packet4f tmp = { eq[1], eq[0], eq[3], eq[2] }; return (Packet2cf)pand(eq, tmp); } template<> EIGEN_STRONG_INLINE Packet2cf pconj(const Packet2cf& a) { return Packet2cf(pxor(a.v, reinterpret_cast(p4ui_CONJ_XOR))); } template<> EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b) { Packet4f a_re, a_im, prod, prod_im; // Permute and multiply the real parts of a and b a_re = vec_perm(a.v, a.v, p16uc_PSET32_WODD); // Get the imaginary parts of a a_im = vec_perm(a.v, a.v, p16uc_PSET32_WEVEN); // multiply a_im * b and get the conjugate result prod_im = a_im * b.v; prod_im = pxor(prod_im, reinterpret_cast(p4ui_CONJ_XOR)); // permute back to a proper order prod_im = vec_perm(prod_im, prod_im, p16uc_COMPLEX32_REV); // multiply a_re * b, add prod_im prod = pmadd(a_re, b.v, prod_im); return Packet2cf(prod); } template<> EIGEN_STRONG_INLINE Packet2cf preverse(const Packet2cf& a) { Packet4f rev_a; rev_a = vec_perm(a.v, a.v, p16uc_COMPLEX32_REV2); return Packet2cf(rev_a); } template<> EIGEN_STRONG_INLINE std::complex predux(const Packet2cf& a) { Packet4f b; b = vec_sld(a.v, a.v, 8); b = padd(a.v, b); return pfirst(Packet2cf(b)); } template<> EIGEN_STRONG_INLINE std::complex predux_mul(const Packet2cf& a) { Packet4f b; Packet2cf prod; b = vec_sld(a.v, a.v, 8); prod = pmul(a, Packet2cf(b)); return pfirst(prod); } EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet2cf,Packet4f) template<> EIGEN_STRONG_INLINE Packet2cf pdiv(const Packet2cf& a, const Packet2cf& b) { // TODO optimize it for AltiVec Packet2cf res = pmul(a, pconj(b)); Packet4f s = pmul(b.v, b.v); return Packet2cf(pdiv(res.v, padd(s, vec_perm(s, s, p16uc_COMPLEX32_REV)))); } template<> EIGEN_STRONG_INLINE Packet2cf pcplxflip(const Packet2cf& x) { return Packet2cf(vec_perm(x.v, x.v, p16uc_COMPLEX32_REV)); } EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { Packet4f tmp = vec_perm(kernel.packet[0].v, kernel.packet[1].v, p16uc_TRANSPOSE64_HI); kernel.packet[1].v = vec_perm(kernel.packet[0].v, kernel.packet[1].v, p16uc_TRANSPOSE64_LO); kernel.packet[0].v = tmp; } template<> EIGEN_STRONG_INLINE Packet2cf pblend(const Selector<2>& ifPacket, const Packet2cf& thenPacket, const Packet2cf& elsePacket) { Packet2cf result; result.v = reinterpret_cast(pblend(ifPacket, reinterpret_cast(thenPacket.v), reinterpret_cast(elsePacket.v))); return result; } #endif } // end namespace internal } // end namespace Eigen #endif // EIGEN_COMPLEX32_ALTIVEC_H RcppEigen/inst/include/Eigen/src/Core/arch/SVE/0000755000176200001440000000000014562417221020612 5ustar liggesusersRcppEigen/inst/include/Eigen/src/Core/arch/SVE/MathFunctions.h0000644000176200001440000000225214562417221023546 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2020, Arm Limited and Contributors // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_MATH_FUNCTIONS_SVE_H #define EIGEN_MATH_FUNCTIONS_SVE_H namespace Eigen { namespace internal { template <> EIGEN_STRONG_INLINE EIGEN_UNUSED PacketXf pexp(const PacketXf& x) { return pexp_float(x); } template <> EIGEN_STRONG_INLINE EIGEN_UNUSED PacketXf plog(const PacketXf& x) { return plog_float(x); } template <> EIGEN_STRONG_INLINE EIGEN_UNUSED PacketXf psin(const PacketXf& x) { return psin_float(x); } template <> EIGEN_STRONG_INLINE EIGEN_UNUSED PacketXf pcos(const PacketXf& x) { return pcos_float(x); } // Hyperbolic Tangent function. template <> EIGEN_STRONG_INLINE EIGEN_UNUSED PacketXf ptanh(const PacketXf& x) { return internal::generic_fast_tanh_float(x); } } // end namespace internal } // end namespace Eigen #endif // EIGEN_MATH_FUNCTIONS_SVE_H RcppEigen/inst/include/Eigen/src/Core/arch/SVE/PacketMath.h0000644000176200001440000005132014562417221023005 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2020, Arm Limited and Contributors // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_PACKET_MATH_SVE_H #define EIGEN_PACKET_MATH_SVE_H namespace Eigen { namespace internal { #ifndef EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD #define EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD 8 #endif #ifndef EIGEN_HAS_SINGLE_INSTRUCTION_MADD #define EIGEN_HAS_SINGLE_INSTRUCTION_MADD #endif #define EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS 32 template struct sve_packet_size_selector { enum { size = SVEVectorLength / (sizeof(Scalar) * CHAR_BIT) }; }; /********************************* int32 **************************************/ typedef svint32_t PacketXi __attribute__((arm_sve_vector_bits(EIGEN_ARM64_SVE_VL))); template <> struct packet_traits : default_packet_traits { typedef PacketXi type; typedef PacketXi half; // Half not implemented yet enum { Vectorizable = 1, AlignedOnScalar = 1, size = sve_packet_size_selector::size, HasHalfPacket = 0, HasAdd = 1, HasSub = 1, HasShift = 1, HasMul = 1, HasNegate = 1, HasAbs = 1, HasArg = 0, HasAbs2 = 1, HasMin = 1, HasMax = 1, HasConj = 1, HasSetLinear = 0, HasBlend = 0, HasReduxp = 0 // Not implemented in SVE }; }; template <> struct unpacket_traits { typedef numext::int32_t type; typedef PacketXi half; // Half not yet implemented enum { size = sve_packet_size_selector::size, alignment = Aligned64, vectorizable = true, masked_load_available = false, masked_store_available = false }; }; template <> EIGEN_STRONG_INLINE void prefetch(const numext::int32_t* addr) { svprfw(svptrue_b32(), addr, SV_PLDL1KEEP); } template <> EIGEN_STRONG_INLINE PacketXi pset1(const numext::int32_t& from) { return svdup_n_s32(from); } template <> EIGEN_STRONG_INLINE PacketXi plset(const numext::int32_t& a) { numext::int32_t c[packet_traits::size]; for (int i = 0; i < packet_traits::size; i++) c[i] = i; return svadd_s32_z(svptrue_b32(), pset1(a), svld1_s32(svptrue_b32(), c)); } template <> EIGEN_STRONG_INLINE PacketXi padd(const PacketXi& a, const PacketXi& b) { return svadd_s32_z(svptrue_b32(), a, b); } template <> EIGEN_STRONG_INLINE PacketXi psub(const PacketXi& a, const PacketXi& b) { return svsub_s32_z(svptrue_b32(), a, b); } template <> EIGEN_STRONG_INLINE PacketXi pnegate(const PacketXi& a) { return svneg_s32_z(svptrue_b32(), a); } template <> EIGEN_STRONG_INLINE PacketXi pconj(const PacketXi& a) { return a; } template <> EIGEN_STRONG_INLINE PacketXi pmul(const PacketXi& a, const PacketXi& b) { return svmul_s32_z(svptrue_b32(), a, b); } template <> EIGEN_STRONG_INLINE PacketXi pdiv(const PacketXi& a, const PacketXi& b) { return svdiv_s32_z(svptrue_b32(), a, b); } template <> EIGEN_STRONG_INLINE PacketXi pmadd(const PacketXi& a, const PacketXi& b, const PacketXi& c) { return svmla_s32_z(svptrue_b32(), c, a, b); } template <> EIGEN_STRONG_INLINE PacketXi pmin(const PacketXi& a, const PacketXi& b) { return svmin_s32_z(svptrue_b32(), a, b); } template <> EIGEN_STRONG_INLINE PacketXi pmax(const PacketXi& a, const PacketXi& b) { return svmax_s32_z(svptrue_b32(), a, b); } template <> EIGEN_STRONG_INLINE PacketXi pcmp_le(const PacketXi& a, const PacketXi& b) { return svdup_n_s32_z(svcmplt_s32(svptrue_b32(), a, b), 0xffffffffu); } template <> EIGEN_STRONG_INLINE PacketXi pcmp_lt(const PacketXi& a, const PacketXi& b) { return svdup_n_s32_z(svcmplt_s32(svptrue_b32(), a, b), 0xffffffffu); } template <> EIGEN_STRONG_INLINE PacketXi pcmp_eq(const PacketXi& a, const PacketXi& b) { return svdup_n_s32_z(svcmpeq_s32(svptrue_b32(), a, b), 0xffffffffu); } template <> EIGEN_STRONG_INLINE PacketXi ptrue(const PacketXi& /*a*/) { return svdup_n_s32_z(svptrue_b32(), 0xffffffffu); } template <> EIGEN_STRONG_INLINE PacketXi pzero(const PacketXi& /*a*/) { return svdup_n_s32_z(svptrue_b32(), 0); } template <> EIGEN_STRONG_INLINE PacketXi pand(const PacketXi& a, const PacketXi& b) { return svand_s32_z(svptrue_b32(), a, b); } template <> EIGEN_STRONG_INLINE PacketXi por(const PacketXi& a, const PacketXi& b) { return svorr_s32_z(svptrue_b32(), a, b); } template <> EIGEN_STRONG_INLINE PacketXi pxor(const PacketXi& a, const PacketXi& b) { return sveor_s32_z(svptrue_b32(), a, b); } template <> EIGEN_STRONG_INLINE PacketXi pandnot(const PacketXi& a, const PacketXi& b) { return svbic_s32_z(svptrue_b32(), a, b); } template EIGEN_STRONG_INLINE PacketXi parithmetic_shift_right(PacketXi a) { return svasrd_n_s32_z(svptrue_b32(), a, N); } template EIGEN_STRONG_INLINE PacketXi plogical_shift_right(PacketXi a) { return svreinterpret_s32_u32(svlsr_u32_z(svptrue_b32(), svreinterpret_u32_s32(a), svdup_n_u32_z(svptrue_b32(), N))); } template EIGEN_STRONG_INLINE PacketXi plogical_shift_left(PacketXi a) { return svlsl_s32_z(svptrue_b32(), a, svdup_n_u32_z(svptrue_b32(), N)); } template <> EIGEN_STRONG_INLINE PacketXi pload(const numext::int32_t* from) { EIGEN_DEBUG_ALIGNED_LOAD return svld1_s32(svptrue_b32(), from); } template <> EIGEN_STRONG_INLINE PacketXi ploadu(const numext::int32_t* from) { EIGEN_DEBUG_UNALIGNED_LOAD return svld1_s32(svptrue_b32(), from); } template <> EIGEN_STRONG_INLINE PacketXi ploaddup(const numext::int32_t* from) { svuint32_t indices = svindex_u32(0, 1); // index {base=0, base+step=1, base+step*2, ...} indices = svzip1_u32(indices, indices); // index in the format {a0, a0, a1, a1, a2, a2, ...} return svld1_gather_u32index_s32(svptrue_b32(), from, indices); } template <> EIGEN_STRONG_INLINE PacketXi ploadquad(const numext::int32_t* from) { svuint32_t indices = svindex_u32(0, 1); // index {base=0, base+step=1, base+step*2, ...} indices = svzip1_u32(indices, indices); // index in the format {a0, a0, a1, a1, a2, a2, ...} indices = svzip1_u32(indices, indices); // index in the format {a0, a0, a0, a0, a1, a1, a1, a1, ...} return svld1_gather_u32index_s32(svptrue_b32(), from, indices); } template <> EIGEN_STRONG_INLINE void pstore(numext::int32_t* to, const PacketXi& from) { EIGEN_DEBUG_ALIGNED_STORE svst1_s32(svptrue_b32(), to, from); } template <> EIGEN_STRONG_INLINE void pstoreu(numext::int32_t* to, const PacketXi& from) { EIGEN_DEBUG_UNALIGNED_STORE svst1_s32(svptrue_b32(), to, from); } template <> EIGEN_DEVICE_FUNC inline PacketXi pgather(const numext::int32_t* from, Index stride) { // Indice format: {base=0, base+stride, base+stride*2, base+stride*3, ...} svint32_t indices = svindex_s32(0, stride); return svld1_gather_s32index_s32(svptrue_b32(), from, indices); } template <> EIGEN_DEVICE_FUNC inline void pscatter(numext::int32_t* to, const PacketXi& from, Index stride) { // Indice format: {base=0, base+stride, base+stride*2, base+stride*3, ...} svint32_t indices = svindex_s32(0, stride); svst1_scatter_s32index_s32(svptrue_b32(), to, indices, from); } template <> EIGEN_STRONG_INLINE numext::int32_t pfirst(const PacketXi& a) { // svlasta returns the first element if all predicate bits are 0 return svlasta_s32(svpfalse_b(), a); } template <> EIGEN_STRONG_INLINE PacketXi preverse(const PacketXi& a) { return svrev_s32(a); } template <> EIGEN_STRONG_INLINE PacketXi pabs(const PacketXi& a) { return svabs_s32_z(svptrue_b32(), a); } template <> EIGEN_STRONG_INLINE numext::int32_t predux(const PacketXi& a) { return static_cast(svaddv_s32(svptrue_b32(), a)); } template <> EIGEN_STRONG_INLINE numext::int32_t predux_mul(const PacketXi& a) { EIGEN_STATIC_ASSERT((EIGEN_ARM64_SVE_VL % 128 == 0), EIGEN_INTERNAL_ERROR_PLEASE_FILE_A_BUG_REPORT); // Multiply the vector by its reverse svint32_t prod = svmul_s32_z(svptrue_b32(), a, svrev_s32(a)); svint32_t half_prod; // Extract the high half of the vector. Depending on the VL more reductions need to be done if (EIGEN_ARM64_SVE_VL >= 2048) { half_prod = svtbl_s32(prod, svindex_u32(32, 1)); prod = svmul_s32_z(svptrue_b32(), prod, half_prod); } if (EIGEN_ARM64_SVE_VL >= 1024) { half_prod = svtbl_s32(prod, svindex_u32(16, 1)); prod = svmul_s32_z(svptrue_b32(), prod, half_prod); } if (EIGEN_ARM64_SVE_VL >= 512) { half_prod = svtbl_s32(prod, svindex_u32(8, 1)); prod = svmul_s32_z(svptrue_b32(), prod, half_prod); } if (EIGEN_ARM64_SVE_VL >= 256) { half_prod = svtbl_s32(prod, svindex_u32(4, 1)); prod = svmul_s32_z(svptrue_b32(), prod, half_prod); } // Last reduction half_prod = svtbl_s32(prod, svindex_u32(2, 1)); prod = svmul_s32_z(svptrue_b32(), prod, half_prod); // The reduction is done to the first element. return pfirst(prod); } template <> EIGEN_STRONG_INLINE numext::int32_t predux_min(const PacketXi& a) { return svminv_s32(svptrue_b32(), a); } template <> EIGEN_STRONG_INLINE numext::int32_t predux_max(const PacketXi& a) { return svmaxv_s32(svptrue_b32(), a); } template EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { int buffer[packet_traits::size * N] = {0}; int i = 0; PacketXi stride_index = svindex_s32(0, N); for (i = 0; i < N; i++) { svst1_scatter_s32index_s32(svptrue_b32(), buffer + i, stride_index, kernel.packet[i]); } for (i = 0; i < N; i++) { kernel.packet[i] = svld1_s32(svptrue_b32(), buffer + i * packet_traits::size); } } /********************************* float32 ************************************/ typedef svfloat32_t PacketXf __attribute__((arm_sve_vector_bits(EIGEN_ARM64_SVE_VL))); template <> struct packet_traits : default_packet_traits { typedef PacketXf type; typedef PacketXf half; enum { Vectorizable = 1, AlignedOnScalar = 1, size = sve_packet_size_selector::size, HasHalfPacket = 0, HasAdd = 1, HasSub = 1, HasShift = 1, HasMul = 1, HasNegate = 1, HasAbs = 1, HasArg = 0, HasAbs2 = 1, HasMin = 1, HasMax = 1, HasConj = 1, HasSetLinear = 0, HasBlend = 0, HasReduxp = 0, // Not implemented in SVE HasDiv = 1, HasFloor = 1, HasSin = EIGEN_FAST_MATH, HasCos = EIGEN_FAST_MATH, HasLog = 1, HasExp = 1, HasSqrt = 0, HasTanh = EIGEN_FAST_MATH, HasErf = EIGEN_FAST_MATH }; }; template <> struct unpacket_traits { typedef float type; typedef PacketXf half; // Half not yet implemented typedef PacketXi integer_packet; enum { size = sve_packet_size_selector::size, alignment = Aligned64, vectorizable = true, masked_load_available = false, masked_store_available = false }; }; template <> EIGEN_STRONG_INLINE PacketXf pset1(const float& from) { return svdup_n_f32(from); } template <> EIGEN_STRONG_INLINE PacketXf pset1frombits(numext::uint32_t from) { return svreinterpret_f32_u32(svdup_n_u32_z(svptrue_b32(), from)); } template <> EIGEN_STRONG_INLINE PacketXf plset(const float& a) { float c[packet_traits::size]; for (int i = 0; i < packet_traits::size; i++) c[i] = i; return svadd_f32_z(svptrue_b32(), pset1(a), svld1_f32(svptrue_b32(), c)); } template <> EIGEN_STRONG_INLINE PacketXf padd(const PacketXf& a, const PacketXf& b) { return svadd_f32_z(svptrue_b32(), a, b); } template <> EIGEN_STRONG_INLINE PacketXf psub(const PacketXf& a, const PacketXf& b) { return svsub_f32_z(svptrue_b32(), a, b); } template <> EIGEN_STRONG_INLINE PacketXf pnegate(const PacketXf& a) { return svneg_f32_z(svptrue_b32(), a); } template <> EIGEN_STRONG_INLINE PacketXf pconj(const PacketXf& a) { return a; } template <> EIGEN_STRONG_INLINE PacketXf pmul(const PacketXf& a, const PacketXf& b) { return svmul_f32_z(svptrue_b32(), a, b); } template <> EIGEN_STRONG_INLINE PacketXf pdiv(const PacketXf& a, const PacketXf& b) { return svdiv_f32_z(svptrue_b32(), a, b); } template <> EIGEN_STRONG_INLINE PacketXf pmadd(const PacketXf& a, const PacketXf& b, const PacketXf& c) { return svmla_f32_z(svptrue_b32(), c, a, b); } template <> EIGEN_STRONG_INLINE PacketXf pmin(const PacketXf& a, const PacketXf& b) { return svmin_f32_z(svptrue_b32(), a, b); } template <> EIGEN_STRONG_INLINE PacketXf pmin(const PacketXf& a, const PacketXf& b) { return pmin(a, b); } template <> EIGEN_STRONG_INLINE PacketXf pmin(const PacketXf& a, const PacketXf& b) { return svminnm_f32_z(svptrue_b32(), a, b); } template <> EIGEN_STRONG_INLINE PacketXf pmax(const PacketXf& a, const PacketXf& b) { return svmax_f32_z(svptrue_b32(), a, b); } template <> EIGEN_STRONG_INLINE PacketXf pmax(const PacketXf& a, const PacketXf& b) { return pmax(a, b); } template <> EIGEN_STRONG_INLINE PacketXf pmax(const PacketXf& a, const PacketXf& b) { return svmaxnm_f32_z(svptrue_b32(), a, b); } // Float comparisons in SVE return svbool (predicate). Use svdup to set active // lanes to 1 (0xffffffffu) and inactive lanes to 0. template <> EIGEN_STRONG_INLINE PacketXf pcmp_le(const PacketXf& a, const PacketXf& b) { return svreinterpret_f32_u32(svdup_n_u32_z(svcmplt_f32(svptrue_b32(), a, b), 0xffffffffu)); } template <> EIGEN_STRONG_INLINE PacketXf pcmp_lt(const PacketXf& a, const PacketXf& b) { return svreinterpret_f32_u32(svdup_n_u32_z(svcmplt_f32(svptrue_b32(), a, b), 0xffffffffu)); } template <> EIGEN_STRONG_INLINE PacketXf pcmp_eq(const PacketXf& a, const PacketXf& b) { return svreinterpret_f32_u32(svdup_n_u32_z(svcmpeq_f32(svptrue_b32(), a, b), 0xffffffffu)); } // Do a predicate inverse (svnot_b_z) on the predicate resulted from the // greater/equal comparison (svcmpge_f32). Then fill a float vector with the // active elements. template <> EIGEN_STRONG_INLINE PacketXf pcmp_lt_or_nan(const PacketXf& a, const PacketXf& b) { return svreinterpret_f32_u32(svdup_n_u32_z(svnot_b_z(svptrue_b32(), svcmpge_f32(svptrue_b32(), a, b)), 0xffffffffu)); } template <> EIGEN_STRONG_INLINE PacketXf pfloor(const PacketXf& a) { return svrintm_f32_z(svptrue_b32(), a); } template <> EIGEN_STRONG_INLINE PacketXf ptrue(const PacketXf& /*a*/) { return svreinterpret_f32_u32(svdup_n_u32_z(svptrue_b32(), 0xffffffffu)); } // Logical Operations are not supported for float, so reinterpret casts template <> EIGEN_STRONG_INLINE PacketXf pand(const PacketXf& a, const PacketXf& b) { return svreinterpret_f32_u32(svand_u32_z(svptrue_b32(), svreinterpret_u32_f32(a), svreinterpret_u32_f32(b))); } template <> EIGEN_STRONG_INLINE PacketXf por(const PacketXf& a, const PacketXf& b) { return svreinterpret_f32_u32(svorr_u32_z(svptrue_b32(), svreinterpret_u32_f32(a), svreinterpret_u32_f32(b))); } template <> EIGEN_STRONG_INLINE PacketXf pxor(const PacketXf& a, const PacketXf& b) { return svreinterpret_f32_u32(sveor_u32_z(svptrue_b32(), svreinterpret_u32_f32(a), svreinterpret_u32_f32(b))); } template <> EIGEN_STRONG_INLINE PacketXf pandnot(const PacketXf& a, const PacketXf& b) { return svreinterpret_f32_u32(svbic_u32_z(svptrue_b32(), svreinterpret_u32_f32(a), svreinterpret_u32_f32(b))); } template <> EIGEN_STRONG_INLINE PacketXf pload(const float* from) { EIGEN_DEBUG_ALIGNED_LOAD return svld1_f32(svptrue_b32(), from); } template <> EIGEN_STRONG_INLINE PacketXf ploadu(const float* from) { EIGEN_DEBUG_UNALIGNED_LOAD return svld1_f32(svptrue_b32(), from); } template <> EIGEN_STRONG_INLINE PacketXf ploaddup(const float* from) { svuint32_t indices = svindex_u32(0, 1); // index {base=0, base+step=1, base+step*2, ...} indices = svzip1_u32(indices, indices); // index in the format {a0, a0, a1, a1, a2, a2, ...} return svld1_gather_u32index_f32(svptrue_b32(), from, indices); } template <> EIGEN_STRONG_INLINE PacketXf ploadquad(const float* from) { svuint32_t indices = svindex_u32(0, 1); // index {base=0, base+step=1, base+step*2, ...} indices = svzip1_u32(indices, indices); // index in the format {a0, a0, a1, a1, a2, a2, ...} indices = svzip1_u32(indices, indices); // index in the format {a0, a0, a0, a0, a1, a1, a1, a1, ...} return svld1_gather_u32index_f32(svptrue_b32(), from, indices); } template <> EIGEN_STRONG_INLINE void pstore(float* to, const PacketXf& from) { EIGEN_DEBUG_ALIGNED_STORE svst1_f32(svptrue_b32(), to, from); } template <> EIGEN_STRONG_INLINE void pstoreu(float* to, const PacketXf& from) { EIGEN_DEBUG_UNALIGNED_STORE svst1_f32(svptrue_b32(), to, from); } template <> EIGEN_DEVICE_FUNC inline PacketXf pgather(const float* from, Index stride) { // Indice format: {base=0, base+stride, base+stride*2, base+stride*3, ...} svint32_t indices = svindex_s32(0, stride); return svld1_gather_s32index_f32(svptrue_b32(), from, indices); } template <> EIGEN_DEVICE_FUNC inline void pscatter(float* to, const PacketXf& from, Index stride) { // Indice format: {base=0, base+stride, base+stride*2, base+stride*3, ...} svint32_t indices = svindex_s32(0, stride); svst1_scatter_s32index_f32(svptrue_b32(), to, indices, from); } template <> EIGEN_STRONG_INLINE float pfirst(const PacketXf& a) { // svlasta returns the first element if all predicate bits are 0 return svlasta_f32(svpfalse_b(), a); } template <> EIGEN_STRONG_INLINE PacketXf preverse(const PacketXf& a) { return svrev_f32(a); } template <> EIGEN_STRONG_INLINE PacketXf pabs(const PacketXf& a) { return svabs_f32_z(svptrue_b32(), a); } // TODO(tellenbach): Should this go into MathFunctions.h? If so, change for // all vector extensions and the generic version. template <> EIGEN_STRONG_INLINE PacketXf pfrexp(const PacketXf& a, PacketXf& exponent) { return pfrexp_generic(a, exponent); } template <> EIGEN_STRONG_INLINE float predux(const PacketXf& a) { return svaddv_f32(svptrue_b32(), a); } // Other reduction functions: // mul // Only works for SVE Vls multiple of 128 template <> EIGEN_STRONG_INLINE float predux_mul(const PacketXf& a) { EIGEN_STATIC_ASSERT((EIGEN_ARM64_SVE_VL % 128 == 0), EIGEN_INTERNAL_ERROR_PLEASE_FILE_A_BUG_REPORT); // Multiply the vector by its reverse svfloat32_t prod = svmul_f32_z(svptrue_b32(), a, svrev_f32(a)); svfloat32_t half_prod; // Extract the high half of the vector. Depending on the VL more reductions need to be done if (EIGEN_ARM64_SVE_VL >= 2048) { half_prod = svtbl_f32(prod, svindex_u32(32, 1)); prod = svmul_f32_z(svptrue_b32(), prod, half_prod); } if (EIGEN_ARM64_SVE_VL >= 1024) { half_prod = svtbl_f32(prod, svindex_u32(16, 1)); prod = svmul_f32_z(svptrue_b32(), prod, half_prod); } if (EIGEN_ARM64_SVE_VL >= 512) { half_prod = svtbl_f32(prod, svindex_u32(8, 1)); prod = svmul_f32_z(svptrue_b32(), prod, half_prod); } if (EIGEN_ARM64_SVE_VL >= 256) { half_prod = svtbl_f32(prod, svindex_u32(4, 1)); prod = svmul_f32_z(svptrue_b32(), prod, half_prod); } // Last reduction half_prod = svtbl_f32(prod, svindex_u32(2, 1)); prod = svmul_f32_z(svptrue_b32(), prod, half_prod); // The reduction is done to the first element. return pfirst(prod); } template <> EIGEN_STRONG_INLINE float predux_min(const PacketXf& a) { return svminv_f32(svptrue_b32(), a); } template <> EIGEN_STRONG_INLINE float predux_max(const PacketXf& a) { return svmaxv_f32(svptrue_b32(), a); } template EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { float buffer[packet_traits::size * N] = {0}; int i = 0; PacketXi stride_index = svindex_s32(0, N); for (i = 0; i < N; i++) { svst1_scatter_s32index_f32(svptrue_b32(), buffer + i, stride_index, kernel.packet[i]); } for (i = 0; i < N; i++) { kernel.packet[i] = svld1_f32(svptrue_b32(), buffer + i * packet_traits::size); } } template<> EIGEN_STRONG_INLINE PacketXf pldexp(const PacketXf& a, const PacketXf& exponent) { return pldexp_generic(a, exponent); } } // namespace internal } // namespace Eigen #endif // EIGEN_PACKET_MATH_SVE_H RcppEigen/inst/include/Eigen/src/Core/arch/SVE/TypeCasting.h0000644000176200001440000000250714562417221023221 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2020, Arm Limited and Contributors // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_TYPE_CASTING_SVE_H #define EIGEN_TYPE_CASTING_SVE_H namespace Eigen { namespace internal { template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE PacketXf pcast(const PacketXi& a) { return svcvt_f32_s32_z(svptrue_b32(), a); } template <> EIGEN_STRONG_INLINE PacketXi pcast(const PacketXf& a) { return svcvt_s32_f32_z(svptrue_b32(), a); } template <> EIGEN_STRONG_INLINE PacketXf preinterpret(const PacketXi& a) { return svreinterpret_f32_s32(a); } template <> EIGEN_STRONG_INLINE PacketXi preinterpret(const PacketXf& a) { return svreinterpret_s32_f32(a); } } // namespace internal } // namespace Eigen #endif // EIGEN_TYPE_CASTING_SVE_H RcppEigen/inst/include/Eigen/src/Core/arch/SSE/0000755000176200001440000000000014562417221020607 5ustar liggesusersRcppEigen/inst/include/Eigen/src/Core/arch/SSE/MathFunctions.h0000644000176200001440000001515514562417221023551 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2007 Julien Pommier // Copyright (C) 2009 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. /* The sin and cos and functions of this file come from * Julien Pommier's sse math library: http://gruntthepeon.free.fr/ssemath/ */ #ifndef EIGEN_MATH_FUNCTIONS_SSE_H #define EIGEN_MATH_FUNCTIONS_SSE_H namespace Eigen { namespace internal { template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f plog(const Packet4f& _x) { return plog_float(_x); } template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet2d plog(const Packet2d& _x) { return plog_double(_x); } template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f plog2(const Packet4f& _x) { return plog2_float(_x); } template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet2d plog2(const Packet2d& _x) { return plog2_double(_x); } template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f plog1p(const Packet4f& _x) { return generic_plog1p(_x); } template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f pexpm1(const Packet4f& _x) { return generic_expm1(_x); } template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f pexp(const Packet4f& _x) { return pexp_float(_x); } template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet2d pexp(const Packet2d& x) { return pexp_double(x); } template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f psin(const Packet4f& _x) { return psin_float(_x); } template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f pcos(const Packet4f& _x) { return pcos_float(_x); } #if EIGEN_FAST_MATH // Functions for sqrt. // The EIGEN_FAST_MATH version uses the _mm_rsqrt_ps approximation and one step // of Newton's method, at a cost of 1-2 bits of precision as opposed to the // exact solution. It does not handle +inf, or denormalized numbers correctly. // The main advantage of this approach is not just speed, but also the fact that // it can be inlined and pipelined with other computations, further reducing its // effective latency. This is similar to Quake3's fast inverse square root. // For detail see here: http://www.beyond3d.com/content/articles/8/ template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f psqrt(const Packet4f& _x) { Packet4f minus_half_x = pmul(_x, pset1(-0.5f)); Packet4f denormal_mask = pandnot( pcmp_lt(_x, pset1((std::numeric_limits::min)())), pcmp_lt(_x, pzero(_x))); // Compute approximate reciprocal sqrt. Packet4f x = _mm_rsqrt_ps(_x); // Do a single step of Newton's iteration. x = pmul(x, pmadd(minus_half_x, pmul(x,x), pset1(1.5f))); // Flush results for denormals to zero. return pandnot(pmul(_x,x), denormal_mask); } #else template<>EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f psqrt(const Packet4f& x) { return _mm_sqrt_ps(x); } #endif template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet2d psqrt(const Packet2d& x) { return _mm_sqrt_pd(x); } template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet16b psqrt(const Packet16b& x) { return x; } #if EIGEN_FAST_MATH template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f prsqrt(const Packet4f& _x) { _EIGEN_DECLARE_CONST_Packet4f(one_point_five, 1.5f); _EIGEN_DECLARE_CONST_Packet4f(minus_half, -0.5f); _EIGEN_DECLARE_CONST_Packet4f_FROM_INT(inf, 0x7f800000u); _EIGEN_DECLARE_CONST_Packet4f_FROM_INT(flt_min, 0x00800000u); Packet4f neg_half = pmul(_x, p4f_minus_half); // Identity infinite, zero, negative and denormal arguments. Packet4f lt_min_mask = _mm_cmplt_ps(_x, p4f_flt_min); Packet4f inf_mask = _mm_cmpeq_ps(_x, p4f_inf); Packet4f not_normal_finite_mask = _mm_or_ps(lt_min_mask, inf_mask); // Compute an approximate result using the rsqrt intrinsic. Packet4f y_approx = _mm_rsqrt_ps(_x); // Do a single step of Newton-Raphson iteration to improve the approximation. // This uses the formula y_{n+1} = y_n * (1.5 - y_n * (0.5 * x) * y_n). // It is essential to evaluate the inner term like this because forming // y_n^2 may over- or underflow. Packet4f y_newton = pmul( y_approx, pmadd(y_approx, pmul(neg_half, y_approx), p4f_one_point_five)); // Select the result of the Newton-Raphson step for positive normal arguments. // For other arguments, choose the output of the intrinsic. This will // return rsqrt(+inf) = 0, rsqrt(x) = NaN if x < 0, and rsqrt(x) = +inf if // x is zero or a positive denormalized float (equivalent to flushing positive // denormalized inputs to zero). return pselect(not_normal_finite_mask, y_approx, y_newton); } #else template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f prsqrt(const Packet4f& x) { // Unfortunately we can't use the much faster mm_rsqrt_ps since it only provides an approximation. return _mm_div_ps(pset1(1.0f), _mm_sqrt_ps(x)); } #endif template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet2d prsqrt(const Packet2d& x) { return _mm_div_pd(pset1(1.0), _mm_sqrt_pd(x)); } // Hyperbolic Tangent function. template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f ptanh(const Packet4f& x) { return internal::generic_fast_tanh_float(x); } } // end namespace internal namespace numext { template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE float sqrt(const float &x) { return internal::pfirst(internal::Packet4f(_mm_sqrt_ss(_mm_set_ss(x)))); } template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE double sqrt(const double &x) { #if EIGEN_COMP_GNUC_STRICT // This works around a GCC bug generating poor code for _mm_sqrt_pd // See https://gitlab.com/libeigen/eigen/commit/8dca9f97e38970 return internal::pfirst(internal::Packet2d(__builtin_ia32_sqrtsd(_mm_set_sd(x)))); #else return internal::pfirst(internal::Packet2d(_mm_sqrt_pd(_mm_set_sd(x)))); #endif } } // end namespace numex } // end namespace Eigen #endif // EIGEN_MATH_FUNCTIONS_SSE_H RcppEigen/inst/include/Eigen/src/Core/arch/SSE/PacketMath.h0000755000176200001440000017572114562417221023021 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2009 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_PACKET_MATH_SSE_H #define EIGEN_PACKET_MATH_SSE_H namespace Eigen { namespace internal { #ifndef EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD #define EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD 8 #endif #if !defined(EIGEN_VECTORIZE_AVX) && !defined(EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS) // 32 bits => 8 registers // 64 bits => 16 registers #define EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS (2*sizeof(void*)) #endif #ifdef EIGEN_VECTORIZE_FMA #ifndef EIGEN_HAS_SINGLE_INSTRUCTION_MADD #define EIGEN_HAS_SINGLE_INSTRUCTION_MADD #endif #endif #if ((defined EIGEN_VECTORIZE_AVX) && (EIGEN_COMP_GNUC_STRICT || EIGEN_COMP_MINGW) && (__GXX_ABI_VERSION < 1004)) || EIGEN_OS_QNX // With GCC's default ABI version, a __m128 or __m256 are the same types and therefore we cannot // have overloads for both types without linking error. // One solution is to increase ABI version using -fabi-version=4 (or greater). // Otherwise, we workaround this inconvenience by wrapping 128bit types into the following helper // structure: typedef eigen_packet_wrapper<__m128> Packet4f; typedef eigen_packet_wrapper<__m128d> Packet2d; #else typedef __m128 Packet4f; typedef __m128d Packet2d; #endif typedef eigen_packet_wrapper<__m128i, 0> Packet4i; typedef eigen_packet_wrapper<__m128i, 1> Packet16b; template<> struct is_arithmetic<__m128> { enum { value = true }; }; template<> struct is_arithmetic<__m128i> { enum { value = true }; }; template<> struct is_arithmetic<__m128d> { enum { value = true }; }; template<> struct is_arithmetic { enum { value = true }; }; template<> struct is_arithmetic { enum { value = true }; }; template struct shuffle_mask{ enum { mask = (s)<<6|(r)<<4|(q)<<2|(p) }; }; // TODO: change the implementation of all swizzle* ops from macro to template, #define vec4f_swizzle1(v,p,q,r,s) \ Packet4f(_mm_castsi128_ps(_mm_shuffle_epi32( _mm_castps_si128(v), (shuffle_mask::mask)))) #define vec4i_swizzle1(v,p,q,r,s) \ Packet4i(_mm_shuffle_epi32( v, (shuffle_mask::mask))) #define vec2d_swizzle1(v,p,q) \ Packet2d(_mm_castsi128_pd(_mm_shuffle_epi32( _mm_castpd_si128(v), (shuffle_mask<2*p,2*p+1,2*q,2*q+1>::mask)))) #define vec4f_swizzle2(a,b,p,q,r,s) \ Packet4f(_mm_shuffle_ps( (a), (b), (shuffle_mask::mask))) #define vec4i_swizzle2(a,b,p,q,r,s) \ Packet4i(_mm_castps_si128( (_mm_shuffle_ps( _mm_castsi128_ps(a), _mm_castsi128_ps(b), (shuffle_mask::mask))))) EIGEN_STRONG_INLINE Packet4f vec4f_movelh(const Packet4f& a, const Packet4f& b) { return Packet4f(_mm_movelh_ps(a,b)); } EIGEN_STRONG_INLINE Packet4f vec4f_movehl(const Packet4f& a, const Packet4f& b) { return Packet4f(_mm_movehl_ps(a,b)); } EIGEN_STRONG_INLINE Packet4f vec4f_unpacklo(const Packet4f& a, const Packet4f& b) { return Packet4f(_mm_unpacklo_ps(a,b)); } EIGEN_STRONG_INLINE Packet4f vec4f_unpackhi(const Packet4f& a, const Packet4f& b) { return Packet4f(_mm_unpackhi_ps(a,b)); } #define vec4f_duplane(a,p) \ vec4f_swizzle2(a,a,p,p,p,p) #define vec2d_swizzle2(a,b,mask) \ Packet2d(_mm_shuffle_pd(a,b,mask)) EIGEN_STRONG_INLINE Packet2d vec2d_unpacklo(const Packet2d& a, const Packet2d& b) { return Packet2d(_mm_unpacklo_pd(a,b)); } EIGEN_STRONG_INLINE Packet2d vec2d_unpackhi(const Packet2d& a, const Packet2d& b) { return Packet2d(_mm_unpackhi_pd(a,b)); } #define vec2d_duplane(a,p) \ vec2d_swizzle2(a,a,(p<<1)|p) #define _EIGEN_DECLARE_CONST_Packet4f(NAME,X) \ const Packet4f p4f_##NAME = pset1(X) #define _EIGEN_DECLARE_CONST_Packet2d(NAME,X) \ const Packet2d p2d_##NAME = pset1(X) #define _EIGEN_DECLARE_CONST_Packet4f_FROM_INT(NAME,X) \ const Packet4f p4f_##NAME = pset1frombits(X) #define _EIGEN_DECLARE_CONST_Packet4i(NAME,X) \ const Packet4i p4i_##NAME = pset1(X) // Use the packet_traits defined in AVX/PacketMath.h instead if we're going // to leverage AVX instructions. #ifndef EIGEN_VECTORIZE_AVX template <> struct packet_traits : default_packet_traits { typedef Packet4f type; typedef Packet4f half; enum { Vectorizable = 1, AlignedOnScalar = 1, size = 4, HasHalfPacket = 0, HasCmp = 1, HasDiv = 1, HasSin = EIGEN_FAST_MATH, HasCos = EIGEN_FAST_MATH, HasLog = 1, HasLog1p = 1, HasExpm1 = 1, HasNdtri = 1, HasExp = 1, HasBessel = 1, HasSqrt = 1, HasRsqrt = 1, HasTanh = EIGEN_FAST_MATH, HasErf = EIGEN_FAST_MATH, HasBlend = 1, HasCeil = 1, HasFloor = 1, #ifdef EIGEN_VECTORIZE_SSE4_1 HasRound = 1, #endif HasRint = 1 }; }; template <> struct packet_traits : default_packet_traits { typedef Packet2d type; typedef Packet2d half; enum { Vectorizable = 1, AlignedOnScalar = 1, size=2, HasHalfPacket = 0, HasCmp = 1, HasDiv = 1, HasLog = 1, HasExp = 1, HasSqrt = 1, HasRsqrt = 1, HasBlend = 1, HasFloor = 1, HasCeil = 1, #ifdef EIGEN_VECTORIZE_SSE4_1 HasRound = 1, #endif HasRint = 1 }; }; #endif template<> struct packet_traits : default_packet_traits { typedef Packet4i type; typedef Packet4i half; enum { Vectorizable = 1, AlignedOnScalar = 1, size=4, HasShift = 1, HasBlend = 1 }; }; template<> struct packet_traits : default_packet_traits { typedef Packet16b type; typedef Packet16b half; enum { Vectorizable = 1, AlignedOnScalar = 1, HasHalfPacket = 0, size=16, HasAdd = 1, HasSub = 1, HasShift = 0, HasMul = 1, HasNegate = 1, HasAbs = 0, HasAbs2 = 0, HasMin = 0, HasMax = 0, HasConj = 0, HasSqrt = 1 }; }; template<> struct unpacket_traits { typedef float type; typedef Packet4f half; typedef Packet4i integer_packet; enum {size=4, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; }; template<> struct unpacket_traits { typedef double type; typedef Packet2d half; enum {size=2, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; }; template<> struct unpacket_traits { typedef int type; typedef Packet4i half; enum {size=4, alignment=Aligned16, vectorizable=false, masked_load_available=false, masked_store_available=false}; }; template<> struct unpacket_traits { typedef bool type; typedef Packet16b half; enum {size=16, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; }; #ifndef EIGEN_VECTORIZE_AVX template<> struct scalar_div_cost { enum { value = 7 }; }; template<> struct scalar_div_cost { enum { value = 8 }; }; #endif #if EIGEN_COMP_MSVC==1500 // Workaround MSVC 9 internal compiler error. // TODO: It has been detected with win64 builds (amd64), so let's check whether it also happens in 32bits+SSE mode // TODO: let's check whether there does not exist a better fix, like adding a pset0() function. (it crashed on pset1(0)). template<> EIGEN_STRONG_INLINE Packet4f pset1(const float& from) { return _mm_set_ps(from,from,from,from); } template<> EIGEN_STRONG_INLINE Packet2d pset1(const double& from) { return _mm_set_pd(from,from); } template<> EIGEN_STRONG_INLINE Packet4i pset1(const int& from) { return _mm_set_epi32(from,from,from,from); } #else template<> EIGEN_STRONG_INLINE Packet4f pset1(const float& from) { return _mm_set_ps1(from); } template<> EIGEN_STRONG_INLINE Packet2d pset1(const double& from) { return _mm_set1_pd(from); } template<> EIGEN_STRONG_INLINE Packet4i pset1(const int& from) { return _mm_set1_epi32(from); } #endif template<> EIGEN_STRONG_INLINE Packet16b pset1(const bool& from) { return _mm_set1_epi8(static_cast(from)); } template<> EIGEN_STRONG_INLINE Packet4f pset1frombits(unsigned int from) { return _mm_castsi128_ps(pset1(from)); } template<> EIGEN_STRONG_INLINE Packet2d pset1frombits(uint64_t from) { return _mm_castsi128_pd(_mm_set1_epi64x(from)); } template<> EIGEN_STRONG_INLINE Packet4f peven_mask(const Packet4f& /*a*/) { return _mm_castsi128_ps(_mm_set_epi32(0, -1, 0, -1)); } template<> EIGEN_STRONG_INLINE Packet4i peven_mask(const Packet4i& /*a*/) { return _mm_set_epi32(0, -1, 0, -1); } template<> EIGEN_STRONG_INLINE Packet2d peven_mask(const Packet2d& /*a*/) { return _mm_castsi128_pd(_mm_set_epi32(0, 0, -1, -1)); } template<> EIGEN_STRONG_INLINE Packet4f pzero(const Packet4f& /*a*/) { return _mm_setzero_ps(); } template<> EIGEN_STRONG_INLINE Packet2d pzero(const Packet2d& /*a*/) { return _mm_setzero_pd(); } template<> EIGEN_STRONG_INLINE Packet4i pzero(const Packet4i& /*a*/) { return _mm_setzero_si128(); } // GCC generates a shufps instruction for _mm_set1_ps/_mm_load1_ps instead of the more efficient pshufd instruction. // However, using inrinsics for pset1 makes gcc to generate crappy code in some cases (see bug 203) // Using inline assembly is also not an option because then gcc fails to reorder properly the instructions. // Therefore, we introduced the pload1 functions to be used in product kernels for which bug 203 does not apply. // Also note that with AVX, we want it to generate a vbroadcastss. #if EIGEN_COMP_GNUC_STRICT && (!defined __AVX__) template<> EIGEN_STRONG_INLINE Packet4f pload1(const float *from) { return vec4f_swizzle1(_mm_load_ss(from),0,0,0,0); } #endif template<> EIGEN_STRONG_INLINE Packet4f plset(const float& a) { return _mm_add_ps(pset1(a), _mm_set_ps(3,2,1,0)); } template<> EIGEN_STRONG_INLINE Packet2d plset(const double& a) { return _mm_add_pd(pset1(a),_mm_set_pd(1,0)); } template<> EIGEN_STRONG_INLINE Packet4i plset(const int& a) { return _mm_add_epi32(pset1(a),_mm_set_epi32(3,2,1,0)); } template<> EIGEN_STRONG_INLINE Packet4f padd(const Packet4f& a, const Packet4f& b) { return _mm_add_ps(a,b); } template<> EIGEN_STRONG_INLINE Packet2d padd(const Packet2d& a, const Packet2d& b) { return _mm_add_pd(a,b); } template<> EIGEN_STRONG_INLINE Packet4i padd(const Packet4i& a, const Packet4i& b) { return _mm_add_epi32(a,b); } template<> EIGEN_STRONG_INLINE Packet16b padd(const Packet16b& a, const Packet16b& b) { return _mm_or_si128(a,b); } template<> EIGEN_STRONG_INLINE Packet4f psub(const Packet4f& a, const Packet4f& b) { return _mm_sub_ps(a,b); } template<> EIGEN_STRONG_INLINE Packet2d psub(const Packet2d& a, const Packet2d& b) { return _mm_sub_pd(a,b); } template<> EIGEN_STRONG_INLINE Packet4i psub(const Packet4i& a, const Packet4i& b) { return _mm_sub_epi32(a,b); } template<> EIGEN_STRONG_INLINE Packet16b psub(const Packet16b& a, const Packet16b& b) { return _mm_xor_si128(a,b); } template<> EIGEN_STRONG_INLINE Packet4f pxor(const Packet4f& a, const Packet4f& b); template<> EIGEN_STRONG_INLINE Packet4f paddsub(const Packet4f& a, const Packet4f& b) { #ifdef EIGEN_VECTORIZE_SSE3 return _mm_addsub_ps(a,b); #else const Packet4f mask = _mm_castsi128_ps(_mm_setr_epi32(0x80000000,0x0,0x80000000,0x0)); return padd(a, pxor(mask, b)); #endif } template<> EIGEN_STRONG_INLINE Packet2d pxor(const Packet2d& , const Packet2d& ); template<> EIGEN_STRONG_INLINE Packet2d paddsub(const Packet2d& a, const Packet2d& b) { #ifdef EIGEN_VECTORIZE_SSE3 return _mm_addsub_pd(a,b); #else const Packet2d mask = _mm_castsi128_pd(_mm_setr_epi32(0x0,0x80000000,0x0,0x0)); return padd(a, pxor(mask, b)); #endif } template<> EIGEN_STRONG_INLINE Packet4f pnegate(const Packet4f& a) { const Packet4f mask = _mm_castsi128_ps(_mm_setr_epi32(0x80000000,0x80000000,0x80000000,0x80000000)); return _mm_xor_ps(a,mask); } template<> EIGEN_STRONG_INLINE Packet2d pnegate(const Packet2d& a) { const Packet2d mask = _mm_castsi128_pd(_mm_setr_epi32(0x0,0x80000000,0x0,0x80000000)); return _mm_xor_pd(a,mask); } template<> EIGEN_STRONG_INLINE Packet4i pnegate(const Packet4i& a) { return psub(Packet4i(_mm_setr_epi32(0,0,0,0)), a); } template<> EIGEN_STRONG_INLINE Packet16b pnegate(const Packet16b& a) { return psub(pset1(false), a); } template<> EIGEN_STRONG_INLINE Packet4f pconj(const Packet4f& a) { return a; } template<> EIGEN_STRONG_INLINE Packet2d pconj(const Packet2d& a) { return a; } template<> EIGEN_STRONG_INLINE Packet4i pconj(const Packet4i& a) { return a; } template<> EIGEN_STRONG_INLINE Packet4f pmul(const Packet4f& a, const Packet4f& b) { return _mm_mul_ps(a,b); } template<> EIGEN_STRONG_INLINE Packet2d pmul(const Packet2d& a, const Packet2d& b) { return _mm_mul_pd(a,b); } template<> EIGEN_STRONG_INLINE Packet4i pmul(const Packet4i& a, const Packet4i& b) { #ifdef EIGEN_VECTORIZE_SSE4_1 return _mm_mullo_epi32(a,b); #else // this version is slightly faster than 4 scalar products return vec4i_swizzle1( vec4i_swizzle2( _mm_mul_epu32(a,b), _mm_mul_epu32(vec4i_swizzle1(a,1,0,3,2), vec4i_swizzle1(b,1,0,3,2)), 0,2,0,2), 0,2,1,3); #endif } template<> EIGEN_STRONG_INLINE Packet16b pmul(const Packet16b& a, const Packet16b& b) { return _mm_and_si128(a,b); } template<> EIGEN_STRONG_INLINE Packet4f pdiv(const Packet4f& a, const Packet4f& b) { return _mm_div_ps(a,b); } template<> EIGEN_STRONG_INLINE Packet2d pdiv(const Packet2d& a, const Packet2d& b) { return _mm_div_pd(a,b); } // for some weird raisons, it has to be overloaded for packet of integers template<> EIGEN_STRONG_INLINE Packet4i pmadd(const Packet4i& a, const Packet4i& b, const Packet4i& c) { return padd(pmul(a,b), c); } #ifdef EIGEN_VECTORIZE_FMA template<> EIGEN_STRONG_INLINE Packet4f pmadd(const Packet4f& a, const Packet4f& b, const Packet4f& c) { return _mm_fmadd_ps(a,b,c); } template<> EIGEN_STRONG_INLINE Packet2d pmadd(const Packet2d& a, const Packet2d& b, const Packet2d& c) { return _mm_fmadd_pd(a,b,c); } #endif #ifdef EIGEN_VECTORIZE_SSE4_1 template<> EIGEN_DEVICE_FUNC inline Packet4f pselect(const Packet4f& mask, const Packet4f& a, const Packet4f& b) { return _mm_blendv_ps(b,a,mask); } template<> EIGEN_DEVICE_FUNC inline Packet4i pselect(const Packet4i& mask, const Packet4i& a, const Packet4i& b) { return _mm_castps_si128(_mm_blendv_ps(_mm_castsi128_ps(b),_mm_castsi128_ps(a),_mm_castsi128_ps(mask))); } template<> EIGEN_DEVICE_FUNC inline Packet2d pselect(const Packet2d& mask, const Packet2d& a, const Packet2d& b) { return _mm_blendv_pd(b,a,mask); } template<> EIGEN_DEVICE_FUNC inline Packet16b pselect(const Packet16b& mask, const Packet16b& a, const Packet16b& b) { return _mm_blendv_epi8(b,a,mask); } #else template<> EIGEN_DEVICE_FUNC inline Packet16b pselect(const Packet16b& mask, const Packet16b& a, const Packet16b& b) { Packet16b a_part = _mm_and_si128(mask, a); Packet16b b_part = _mm_andnot_si128(mask, b); return _mm_or_si128(a_part, b_part); } #endif template<> EIGEN_STRONG_INLINE Packet4i ptrue(const Packet4i& a) { return _mm_cmpeq_epi32(a, a); } template<> EIGEN_STRONG_INLINE Packet16b ptrue(const Packet16b& a) { return _mm_cmpeq_epi8(a, a); } template<> EIGEN_STRONG_INLINE Packet4f ptrue(const Packet4f& a) { Packet4i b = _mm_castps_si128(a); return _mm_castsi128_ps(_mm_cmpeq_epi32(b, b)); } template<> EIGEN_STRONG_INLINE Packet2d ptrue(const Packet2d& a) { Packet4i b = _mm_castpd_si128(a); return _mm_castsi128_pd(_mm_cmpeq_epi32(b, b)); } template<> EIGEN_STRONG_INLINE Packet4f pand(const Packet4f& a, const Packet4f& b) { return _mm_and_ps(a,b); } template<> EIGEN_STRONG_INLINE Packet2d pand(const Packet2d& a, const Packet2d& b) { return _mm_and_pd(a,b); } template<> EIGEN_STRONG_INLINE Packet4i pand(const Packet4i& a, const Packet4i& b) { return _mm_and_si128(a,b); } template<> EIGEN_STRONG_INLINE Packet16b pand(const Packet16b& a, const Packet16b& b) { return _mm_and_si128(a,b); } template<> EIGEN_STRONG_INLINE Packet4f por(const Packet4f& a, const Packet4f& b) { return _mm_or_ps(a,b); } template<> EIGEN_STRONG_INLINE Packet2d por(const Packet2d& a, const Packet2d& b) { return _mm_or_pd(a,b); } template<> EIGEN_STRONG_INLINE Packet4i por(const Packet4i& a, const Packet4i& b) { return _mm_or_si128(a,b); } template<> EIGEN_STRONG_INLINE Packet16b por(const Packet16b& a, const Packet16b& b) { return _mm_or_si128(a,b); } template<> EIGEN_STRONG_INLINE Packet4f pxor(const Packet4f& a, const Packet4f& b) { return _mm_xor_ps(a,b); } template<> EIGEN_STRONG_INLINE Packet2d pxor(const Packet2d& a, const Packet2d& b) { return _mm_xor_pd(a,b); } template<> EIGEN_STRONG_INLINE Packet4i pxor(const Packet4i& a, const Packet4i& b) { return _mm_xor_si128(a,b); } template<> EIGEN_STRONG_INLINE Packet16b pxor(const Packet16b& a, const Packet16b& b) { return _mm_xor_si128(a,b); } template<> EIGEN_STRONG_INLINE Packet4f pandnot(const Packet4f& a, const Packet4f& b) { return _mm_andnot_ps(b,a); } template<> EIGEN_STRONG_INLINE Packet2d pandnot(const Packet2d& a, const Packet2d& b) { return _mm_andnot_pd(b,a); } template<> EIGEN_STRONG_INLINE Packet4i pandnot(const Packet4i& a, const Packet4i& b) { return _mm_andnot_si128(b,a); } template<> EIGEN_STRONG_INLINE Packet4f pcmp_le(const Packet4f& a, const Packet4f& b) { return _mm_cmple_ps(a,b); } template<> EIGEN_STRONG_INLINE Packet4f pcmp_lt(const Packet4f& a, const Packet4f& b) { return _mm_cmplt_ps(a,b); } template<> EIGEN_STRONG_INLINE Packet4f pcmp_lt_or_nan(const Packet4f& a, const Packet4f& b) { return _mm_cmpnge_ps(a,b); } template<> EIGEN_STRONG_INLINE Packet4f pcmp_eq(const Packet4f& a, const Packet4f& b) { return _mm_cmpeq_ps(a,b); } template<> EIGEN_STRONG_INLINE Packet2d pcmp_le(const Packet2d& a, const Packet2d& b) { return _mm_cmple_pd(a,b); } template<> EIGEN_STRONG_INLINE Packet2d pcmp_lt(const Packet2d& a, const Packet2d& b) { return _mm_cmplt_pd(a,b); } template<> EIGEN_STRONG_INLINE Packet2d pcmp_lt_or_nan(const Packet2d& a, const Packet2d& b) { return _mm_cmpnge_pd(a,b); } template<> EIGEN_STRONG_INLINE Packet2d pcmp_eq(const Packet2d& a, const Packet2d& b) { return _mm_cmpeq_pd(a,b); } template<> EIGEN_STRONG_INLINE Packet4i pcmp_lt(const Packet4i& a, const Packet4i& b) { return _mm_cmplt_epi32(a,b); } template<> EIGEN_STRONG_INLINE Packet4i pcmp_eq(const Packet4i& a, const Packet4i& b) { return _mm_cmpeq_epi32(a,b); } template<> EIGEN_STRONG_INLINE Packet16b pcmp_eq(const Packet16b& a, const Packet16b& b) { return _mm_cmpeq_epi8(a,b); } template<> EIGEN_STRONG_INLINE Packet4i pcmp_le(const Packet4i& a, const Packet4i& b) { return por(pcmp_lt(a,b), pcmp_eq(a,b)); } template<> EIGEN_STRONG_INLINE Packet4f pmin(const Packet4f& a, const Packet4f& b) { #if EIGEN_COMP_GNUC && EIGEN_COMP_GNUC < 63 // There appears to be a bug in GCC, by which the optimizer may // flip the argument order in calls to _mm_min_ps, so we have to // resort to inline ASM here. This is supposed to be fixed in gcc6.3, // see also: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72867 #ifdef EIGEN_VECTORIZE_AVX Packet4f res; asm("vminps %[a], %[b], %[res]" : [res] "=x" (res) : [a] "x" (a), [b] "x" (b)); #else Packet4f res = b; asm("minps %[a], %[res]" : [res] "+x" (res) : [a] "x" (a)); #endif return res; #else // Arguments are reversed to match NaN propagation behavior of std::min. return _mm_min_ps(b, a); #endif } template<> EIGEN_STRONG_INLINE Packet2d pmin(const Packet2d& a, const Packet2d& b) { #if EIGEN_COMP_GNUC && EIGEN_COMP_GNUC < 63 // There appears to be a bug in GCC, by which the optimizer may // flip the argument order in calls to _mm_min_pd, so we have to // resort to inline ASM here. This is supposed to be fixed in gcc6.3, // see also: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72867 #ifdef EIGEN_VECTORIZE_AVX Packet2d res; asm("vminpd %[a], %[b], %[res]" : [res] "=x" (res) : [a] "x" (a), [b] "x" (b)); #else Packet2d res = b; asm("minpd %[a], %[res]" : [res] "+x" (res) : [a] "x" (a)); #endif return res; #else // Arguments are reversed to match NaN propagation behavior of std::min. return _mm_min_pd(b, a); #endif } template<> EIGEN_STRONG_INLINE Packet4i pmin(const Packet4i& a, const Packet4i& b) { #ifdef EIGEN_VECTORIZE_SSE4_1 return _mm_min_epi32(a,b); #else // after some bench, this version *is* faster than a scalar implementation Packet4i mask = _mm_cmplt_epi32(a,b); return _mm_or_si128(_mm_and_si128(mask,a),_mm_andnot_si128(mask,b)); #endif } template<> EIGEN_STRONG_INLINE Packet4f pmax(const Packet4f& a, const Packet4f& b) { #if EIGEN_COMP_GNUC && EIGEN_COMP_GNUC < 63 // There appears to be a bug in GCC, by which the optimizer may // flip the argument order in calls to _mm_max_ps, so we have to // resort to inline ASM here. This is supposed to be fixed in gcc6.3, // see also: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72867 #ifdef EIGEN_VECTORIZE_AVX Packet4f res; asm("vmaxps %[a], %[b], %[res]" : [res] "=x" (res) : [a] "x" (a), [b] "x" (b)); #else Packet4f res = b; asm("maxps %[a], %[res]" : [res] "+x" (res) : [a] "x" (a)); #endif return res; #else // Arguments are reversed to match NaN propagation behavior of std::max. return _mm_max_ps(b, a); #endif } template<> EIGEN_STRONG_INLINE Packet2d pmax(const Packet2d& a, const Packet2d& b) { #if EIGEN_COMP_GNUC && EIGEN_COMP_GNUC < 63 // There appears to be a bug in GCC, by which the optimizer may // flip the argument order in calls to _mm_max_pd, so we have to // resort to inline ASM here. This is supposed to be fixed in gcc6.3, // see also: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72867 #ifdef EIGEN_VECTORIZE_AVX Packet2d res; asm("vmaxpd %[a], %[b], %[res]" : [res] "=x" (res) : [a] "x" (a), [b] "x" (b)); #else Packet2d res = b; asm("maxpd %[a], %[res]" : [res] "+x" (res) : [a] "x" (a)); #endif return res; #else // Arguments are reversed to match NaN propagation behavior of std::max. return _mm_max_pd(b, a); #endif } template<> EIGEN_STRONG_INLINE Packet4i pmax(const Packet4i& a, const Packet4i& b) { #ifdef EIGEN_VECTORIZE_SSE4_1 return _mm_max_epi32(a,b); #else // after some bench, this version *is* faster than a scalar implementation Packet4i mask = _mm_cmpgt_epi32(a,b); return _mm_or_si128(_mm_and_si128(mask,a),_mm_andnot_si128(mask,b)); #endif } template EIGEN_STRONG_INLINE Packet pminmax_propagate_numbers(const Packet& a, const Packet& b, Op op) { // In this implementation, we take advantage of the fact that pmin/pmax for SSE // always return a if either a or b is NaN. Packet not_nan_mask_a = pcmp_eq(a, a); Packet m = op(a, b); return pselect(not_nan_mask_a, m, b); } template EIGEN_STRONG_INLINE Packet pminmax_propagate_nan(const Packet& a, const Packet& b, Op op) { // In this implementation, we take advantage of the fact that pmin/pmax for SSE // always return a if either a or b is NaN. Packet not_nan_mask_a = pcmp_eq(a, a); Packet m = op(b, a); return pselect(not_nan_mask_a, m, a); } // Add specializations for min/max with prescribed NaN progation. template<> EIGEN_STRONG_INLINE Packet4f pmin(const Packet4f& a, const Packet4f& b) { return pminmax_propagate_numbers(a, b, pmin); } template<> EIGEN_STRONG_INLINE Packet2d pmin(const Packet2d& a, const Packet2d& b) { return pminmax_propagate_numbers(a, b, pmin); } template<> EIGEN_STRONG_INLINE Packet4f pmax(const Packet4f& a, const Packet4f& b) { return pminmax_propagate_numbers(a, b, pmax); } template<> EIGEN_STRONG_INLINE Packet2d pmax(const Packet2d& a, const Packet2d& b) { return pminmax_propagate_numbers(a, b, pmax); } template<> EIGEN_STRONG_INLINE Packet4f pmin(const Packet4f& a, const Packet4f& b) { return pminmax_propagate_nan(a, b, pmin); } template<> EIGEN_STRONG_INLINE Packet2d pmin(const Packet2d& a, const Packet2d& b) { return pminmax_propagate_nan(a, b, pmin); } template<> EIGEN_STRONG_INLINE Packet4f pmax(const Packet4f& a, const Packet4f& b) { return pminmax_propagate_nan(a, b, pmax); } template<> EIGEN_STRONG_INLINE Packet2d pmax(const Packet2d& a, const Packet2d& b) { return pminmax_propagate_nan(a, b, pmax); } template EIGEN_STRONG_INLINE Packet4i parithmetic_shift_right(const Packet4i& a) { return _mm_srai_epi32(a,N); } template EIGEN_STRONG_INLINE Packet4i plogical_shift_right (const Packet4i& a) { return _mm_srli_epi32(a,N); } template EIGEN_STRONG_INLINE Packet4i plogical_shift_left (const Packet4i& a) { return _mm_slli_epi32(a,N); } template<> EIGEN_STRONG_INLINE Packet4f pabs(const Packet4f& a) { const Packet4f mask = _mm_castsi128_ps(_mm_setr_epi32(0x7FFFFFFF,0x7FFFFFFF,0x7FFFFFFF,0x7FFFFFFF)); return _mm_and_ps(a,mask); } template<> EIGEN_STRONG_INLINE Packet2d pabs(const Packet2d& a) { const Packet2d mask = _mm_castsi128_pd(_mm_setr_epi32(0xFFFFFFFF,0x7FFFFFFF,0xFFFFFFFF,0x7FFFFFFF)); return _mm_and_pd(a,mask); } template<> EIGEN_STRONG_INLINE Packet4i pabs(const Packet4i& a) { #ifdef EIGEN_VECTORIZE_SSSE3 return _mm_abs_epi32(a); #else Packet4i aux = _mm_srai_epi32(a,31); return _mm_sub_epi32(_mm_xor_si128(a,aux),aux); #endif } #ifdef EIGEN_VECTORIZE_SSE4_1 template<> EIGEN_STRONG_INLINE Packet4f pround(const Packet4f& a) { // Unfortunatly _mm_round_ps doesn't have a rounding mode to implement numext::round. const Packet4f mask = pset1frombits(0x80000000u); const Packet4f prev0dot5 = pset1frombits(0x3EFFFFFFu); return _mm_round_ps(padd(por(pand(a, mask), prev0dot5), a), _MM_FROUND_TO_ZERO); } template<> EIGEN_STRONG_INLINE Packet2d pround(const Packet2d& a) { const Packet2d mask = _mm_castsi128_pd(_mm_set_epi64x(0x8000000000000000ull, 0x8000000000000000ull)); const Packet2d prev0dot5 = _mm_castsi128_pd(_mm_set_epi64x(0x3FDFFFFFFFFFFFFFull, 0x3FDFFFFFFFFFFFFFull)); return _mm_round_pd(padd(por(pand(a, mask), prev0dot5), a), _MM_FROUND_TO_ZERO); } template<> EIGEN_STRONG_INLINE Packet4f print(const Packet4f& a) { return _mm_round_ps(a, _MM_FROUND_CUR_DIRECTION); } template<> EIGEN_STRONG_INLINE Packet2d print(const Packet2d& a) { return _mm_round_pd(a, _MM_FROUND_CUR_DIRECTION); } template<> EIGEN_STRONG_INLINE Packet4f pceil(const Packet4f& a) { return _mm_ceil_ps(a); } template<> EIGEN_STRONG_INLINE Packet2d pceil(const Packet2d& a) { return _mm_ceil_pd(a); } template<> EIGEN_STRONG_INLINE Packet4f pfloor(const Packet4f& a) { return _mm_floor_ps(a); } template<> EIGEN_STRONG_INLINE Packet2d pfloor(const Packet2d& a) { return _mm_floor_pd(a); } #else template<> EIGEN_STRONG_INLINE Packet4f print(const Packet4f& a) { // Adds and subtracts signum(a) * 2^23 to force rounding. const Packet4f limit = pset1(static_cast(1<<23)); const Packet4f abs_a = pabs(a); Packet4f r = padd(abs_a, limit); // Don't compile-away addition and subtraction. EIGEN_OPTIMIZATION_BARRIER(r); r = psub(r, limit); // If greater than limit, simply return a. Otherwise, account for sign. r = pselect(pcmp_lt(abs_a, limit), pselect(pcmp_lt(a, pzero(a)), pnegate(r), r), a); return r; } template<> EIGEN_STRONG_INLINE Packet2d print(const Packet2d& a) { // Adds and subtracts signum(a) * 2^52 to force rounding. const Packet2d limit = pset1(static_cast(1ull<<52)); const Packet2d abs_a = pabs(a); Packet2d r = padd(abs_a, limit); // Don't compile-away addition and subtraction. EIGEN_OPTIMIZATION_BARRIER(r); r = psub(r, limit); // If greater than limit, simply return a. Otherwise, account for sign. r = pselect(pcmp_lt(abs_a, limit), pselect(pcmp_lt(a, pzero(a)), pnegate(r), r), a); return r; } template<> EIGEN_STRONG_INLINE Packet4f pfloor(const Packet4f& a) { const Packet4f cst_1 = pset1(1.0f); Packet4f tmp = print(a); // If greater, subtract one. Packet4f mask = _mm_cmpgt_ps(tmp, a); mask = pand(mask, cst_1); return psub(tmp, mask); } template<> EIGEN_STRONG_INLINE Packet2d pfloor(const Packet2d& a) { const Packet2d cst_1 = pset1(1.0); Packet2d tmp = print(a); // If greater, subtract one. Packet2d mask = _mm_cmpgt_pd(tmp, a); mask = pand(mask, cst_1); return psub(tmp, mask); } template<> EIGEN_STRONG_INLINE Packet4f pceil(const Packet4f& a) { const Packet4f cst_1 = pset1(1.0f); Packet4f tmp = print(a); // If smaller, add one. Packet4f mask = _mm_cmplt_ps(tmp, a); mask = pand(mask, cst_1); return padd(tmp, mask); } template<> EIGEN_STRONG_INLINE Packet2d pceil(const Packet2d& a) { const Packet2d cst_1 = pset1(1.0); Packet2d tmp = print(a); // If smaller, add one. Packet2d mask = _mm_cmplt_pd(tmp, a); mask = pand(mask, cst_1); return padd(tmp, mask); } #endif template<> EIGEN_STRONG_INLINE Packet4f pload(const float* from) { EIGEN_DEBUG_ALIGNED_LOAD return _mm_load_ps(from); } template<> EIGEN_STRONG_INLINE Packet2d pload(const double* from) { EIGEN_DEBUG_ALIGNED_LOAD return _mm_load_pd(from); } template<> EIGEN_STRONG_INLINE Packet4i pload(const int* from) { EIGEN_DEBUG_ALIGNED_LOAD return _mm_load_si128(reinterpret_cast(from)); } template<> EIGEN_STRONG_INLINE Packet16b pload(const bool* from) { EIGEN_DEBUG_ALIGNED_LOAD return _mm_load_si128(reinterpret_cast(from)); } #if EIGEN_COMP_MSVC template<> EIGEN_STRONG_INLINE Packet4f ploadu(const float* from) { EIGEN_DEBUG_UNALIGNED_LOAD #if (EIGEN_COMP_MSVC==1600) // NOTE Some version of MSVC10 generates bad code when using _mm_loadu_ps // (i.e., it does not generate an unaligned load!! __m128 res = _mm_loadl_pi(_mm_set1_ps(0.0f), (const __m64*)(from)); res = _mm_loadh_pi(res, (const __m64*)(from+2)); return res; #else return _mm_loadu_ps(from); #endif } #else // NOTE: with the code below, MSVC's compiler crashes! template<> EIGEN_STRONG_INLINE Packet4f ploadu(const float* from) { EIGEN_DEBUG_UNALIGNED_LOAD return _mm_loadu_ps(from); } #endif template<> EIGEN_STRONG_INLINE Packet2d ploadu(const double* from) { EIGEN_DEBUG_UNALIGNED_LOAD return _mm_loadu_pd(from); } template<> EIGEN_STRONG_INLINE Packet4i ploadu(const int* from) { EIGEN_DEBUG_UNALIGNED_LOAD return _mm_loadu_si128(reinterpret_cast(from)); } template<> EIGEN_STRONG_INLINE Packet16b ploadu(const bool* from) { EIGEN_DEBUG_UNALIGNED_LOAD return _mm_loadu_si128(reinterpret_cast(from)); } template<> EIGEN_STRONG_INLINE Packet4f ploaddup(const float* from) { return vec4f_swizzle1(_mm_castpd_ps(_mm_load_sd(reinterpret_cast(from))), 0, 0, 1, 1); } template<> EIGEN_STRONG_INLINE Packet2d ploaddup(const double* from) { return pset1(from[0]); } template<> EIGEN_STRONG_INLINE Packet4i ploaddup(const int* from) { Packet4i tmp; tmp = _mm_loadl_epi64(reinterpret_cast(from)); return vec4i_swizzle1(tmp, 0, 0, 1, 1); } // Loads 8 bools from memory and returns the packet // {b0, b0, b1, b1, b2, b2, b3, b3, b4, b4, b5, b5, b6, b6, b7, b7} template<> EIGEN_STRONG_INLINE Packet16b ploaddup(const bool* from) { __m128i tmp = _mm_castpd_si128(pload1(reinterpret_cast(from))); return _mm_unpacklo_epi8(tmp, tmp); } // Loads 4 bools from memory and returns the packet // {b0, b0 b0, b0, b1, b1, b1, b1, b2, b2, b2, b2, b3, b3, b3, b3} template<> EIGEN_STRONG_INLINE Packet16b ploadquad(const bool* from) { __m128i tmp = _mm_castps_si128(pload1(reinterpret_cast(from))); tmp = _mm_unpacklo_epi8(tmp, tmp); return _mm_unpacklo_epi16(tmp, tmp); } template<> EIGEN_STRONG_INLINE void pstore(float* to, const Packet4f& from) { EIGEN_DEBUG_ALIGNED_STORE _mm_store_ps(to, from); } template<> EIGEN_STRONG_INLINE void pstore(double* to, const Packet2d& from) { EIGEN_DEBUG_ALIGNED_STORE _mm_store_pd(to, from); } template<> EIGEN_STRONG_INLINE void pstore(int* to, const Packet4i& from) { EIGEN_DEBUG_ALIGNED_STORE _mm_store_si128(reinterpret_cast<__m128i*>(to), from); } template<> EIGEN_STRONG_INLINE void pstore(bool* to, const Packet16b& from) { EIGEN_DEBUG_ALIGNED_STORE _mm_store_si128(reinterpret_cast<__m128i*>(to), from); } template<> EIGEN_STRONG_INLINE void pstoreu(double* to, const Packet2d& from) { EIGEN_DEBUG_UNALIGNED_STORE _mm_storeu_pd(to, from); } template<> EIGEN_STRONG_INLINE void pstoreu(float* to, const Packet4f& from) { EIGEN_DEBUG_UNALIGNED_STORE _mm_storeu_ps(to, from); } template<> EIGEN_STRONG_INLINE void pstoreu(int* to, const Packet4i& from) { EIGEN_DEBUG_UNALIGNED_STORE _mm_storeu_si128(reinterpret_cast<__m128i*>(to), from); } template<> EIGEN_STRONG_INLINE void pstoreu(bool* to, const Packet16b& from) { EIGEN_DEBUG_ALIGNED_STORE _mm_storeu_si128(reinterpret_cast<__m128i*>(to), from); } template<> EIGEN_DEVICE_FUNC inline Packet4f pgather(const float* from, Index stride) { return _mm_set_ps(from[3*stride], from[2*stride], from[1*stride], from[0*stride]); } template<> EIGEN_DEVICE_FUNC inline Packet2d pgather(const double* from, Index stride) { return _mm_set_pd(from[1*stride], from[0*stride]); } template<> EIGEN_DEVICE_FUNC inline Packet4i pgather(const int* from, Index stride) { return _mm_set_epi32(from[3*stride], from[2*stride], from[1*stride], from[0*stride]); } template<> EIGEN_DEVICE_FUNC inline Packet16b pgather(const bool* from, Index stride) { return _mm_set_epi8(from[15*stride], from[14*stride], from[13*stride], from[12*stride], from[11*stride], from[10*stride], from[9*stride], from[8*stride], from[7*stride], from[6*stride], from[5*stride], from[4*stride], from[3*stride], from[2*stride], from[1*stride], from[0*stride]); } template<> EIGEN_DEVICE_FUNC inline void pscatter(float* to, const Packet4f& from, Index stride) { to[stride*0] = _mm_cvtss_f32(from); to[stride*1] = _mm_cvtss_f32(_mm_shuffle_ps(from, from, 1)); to[stride*2] = _mm_cvtss_f32(_mm_shuffle_ps(from, from, 2)); to[stride*3] = _mm_cvtss_f32(_mm_shuffle_ps(from, from, 3)); } template<> EIGEN_DEVICE_FUNC inline void pscatter(double* to, const Packet2d& from, Index stride) { to[stride*0] = _mm_cvtsd_f64(from); to[stride*1] = _mm_cvtsd_f64(_mm_shuffle_pd(from, from, 1)); } template<> EIGEN_DEVICE_FUNC inline void pscatter(int* to, const Packet4i& from, Index stride) { to[stride*0] = _mm_cvtsi128_si32(from); to[stride*1] = _mm_cvtsi128_si32(_mm_shuffle_epi32(from, 1)); to[stride*2] = _mm_cvtsi128_si32(_mm_shuffle_epi32(from, 2)); to[stride*3] = _mm_cvtsi128_si32(_mm_shuffle_epi32(from, 3)); } template<> EIGEN_DEVICE_FUNC inline void pscatter(bool* to, const Packet16b& from, Index stride) { to[4*stride*0] = _mm_cvtsi128_si32(from); to[4*stride*1] = _mm_cvtsi128_si32(_mm_shuffle_epi32(from, 1)); to[4*stride*2] = _mm_cvtsi128_si32(_mm_shuffle_epi32(from, 2)); to[4*stride*3] = _mm_cvtsi128_si32(_mm_shuffle_epi32(from, 3)); } // some compilers might be tempted to perform multiple moves instead of using a vector path. template<> EIGEN_STRONG_INLINE void pstore1(float* to, const float& a) { Packet4f pa = _mm_set_ss(a); pstore(to, Packet4f(vec4f_swizzle1(pa,0,0,0,0))); } // some compilers might be tempted to perform multiple moves instead of using a vector path. template<> EIGEN_STRONG_INLINE void pstore1(double* to, const double& a) { Packet2d pa = _mm_set_sd(a); pstore(to, Packet2d(vec2d_swizzle1(pa,0,0))); } #if EIGEN_COMP_PGI && EIGEN_COMP_PGI < 1900 typedef const void * SsePrefetchPtrType; #else typedef const char * SsePrefetchPtrType; #endif #ifndef EIGEN_VECTORIZE_AVX template<> EIGEN_STRONG_INLINE void prefetch(const float* addr) { _mm_prefetch((SsePrefetchPtrType)(addr), _MM_HINT_T0); } template<> EIGEN_STRONG_INLINE void prefetch(const double* addr) { _mm_prefetch((SsePrefetchPtrType)(addr), _MM_HINT_T0); } template<> EIGEN_STRONG_INLINE void prefetch(const int* addr) { _mm_prefetch((SsePrefetchPtrType)(addr), _MM_HINT_T0); } #endif #if EIGEN_COMP_MSVC_STRICT && EIGEN_OS_WIN64 // The temporary variable fixes an internal compilation error in vs <= 2008 and a wrong-result bug in vs 2010 // Direct of the struct members fixed bug #62. template<> EIGEN_STRONG_INLINE float pfirst(const Packet4f& a) { return a.m128_f32[0]; } template<> EIGEN_STRONG_INLINE double pfirst(const Packet2d& a) { return a.m128d_f64[0]; } template<> EIGEN_STRONG_INLINE int pfirst(const Packet4i& a) { int x = _mm_cvtsi128_si32(a); return x; } #elif EIGEN_COMP_MSVC_STRICT // The temporary variable fixes an internal compilation error in vs <= 2008 and a wrong-result bug in vs 2010 template<> EIGEN_STRONG_INLINE float pfirst(const Packet4f& a) { float x = _mm_cvtss_f32(a); return x; } template<> EIGEN_STRONG_INLINE double pfirst(const Packet2d& a) { double x = _mm_cvtsd_f64(a); return x; } template<> EIGEN_STRONG_INLINE int pfirst(const Packet4i& a) { int x = _mm_cvtsi128_si32(a); return x; } #else template<> EIGEN_STRONG_INLINE float pfirst(const Packet4f& a) { return _mm_cvtss_f32(a); } template<> EIGEN_STRONG_INLINE double pfirst(const Packet2d& a) { return _mm_cvtsd_f64(a); } template<> EIGEN_STRONG_INLINE int pfirst(const Packet4i& a) { return _mm_cvtsi128_si32(a); } #endif template<> EIGEN_STRONG_INLINE bool pfirst(const Packet16b& a) { int x = _mm_cvtsi128_si32(a); return static_cast(x & 1); } template<> EIGEN_STRONG_INLINE Packet4f preverse(const Packet4f& a) { return _mm_shuffle_ps(a,a,0x1B); } template<> EIGEN_STRONG_INLINE Packet2d preverse(const Packet2d& a) { return _mm_shuffle_pd(a,a,0x1); } template<> EIGEN_STRONG_INLINE Packet4i preverse(const Packet4i& a) { return _mm_shuffle_epi32(a,0x1B); } template<> EIGEN_STRONG_INLINE Packet16b preverse(const Packet16b& a) { #ifdef EIGEN_VECTORIZE_SSSE3 __m128i mask = _mm_set_epi8(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15); return _mm_shuffle_epi8(a, mask); #else Packet16b tmp = _mm_shuffle_epi32(a, _MM_SHUFFLE(0, 1, 2, 3)); tmp = _mm_shufflehi_epi16(_mm_shufflelo_epi16(tmp, _MM_SHUFFLE(2, 3, 0, 1)), _MM_SHUFFLE(2, 3, 0, 1)); return _mm_or_si128(_mm_slli_epi16(tmp, 8), _mm_srli_epi16(tmp, 8)); #endif } template<> EIGEN_STRONG_INLINE Packet4f pfrexp(const Packet4f& a, Packet4f& exponent) { return pfrexp_generic(a,exponent); } // Extract exponent without existence of Packet2l. template<> EIGEN_STRONG_INLINE Packet2d pfrexp_generic_get_biased_exponent(const Packet2d& a) { const Packet2d cst_exp_mask = pset1frombits(static_cast(0x7ff0000000000000ull)); __m128i a_expo = _mm_srli_epi64(_mm_castpd_si128(pand(a, cst_exp_mask)), 52); return _mm_cvtepi32_pd(vec4i_swizzle1(a_expo, 0, 2, 1, 3)); } template<> EIGEN_STRONG_INLINE Packet2d pfrexp(const Packet2d& a, Packet2d& exponent) { return pfrexp_generic(a, exponent); } template<> EIGEN_STRONG_INLINE Packet4f pldexp(const Packet4f& a, const Packet4f& exponent) { return pldexp_generic(a,exponent); } // We specialize pldexp here, since the generic implementation uses Packet2l, which is not well // supported by SSE, and has more range than is needed for exponents. template<> EIGEN_STRONG_INLINE Packet2d pldexp(const Packet2d& a, const Packet2d& exponent) { // Clamp exponent to [-2099, 2099] const Packet2d max_exponent = pset1(2099.0); const Packet2d e = pmin(pmax(exponent, pnegate(max_exponent)), max_exponent); // Convert e to integer and swizzle to low-order bits. const Packet4i ei = vec4i_swizzle1(_mm_cvtpd_epi32(e), 0, 3, 1, 3); // Split 2^e into four factors and multiply: const Packet4i bias = _mm_set_epi32(0, 1023, 0, 1023); Packet4i b = parithmetic_shift_right<2>(ei); // floor(e/4) Packet2d c = _mm_castsi128_pd(_mm_slli_epi64(padd(b, bias), 52)); // 2^b Packet2d out = pmul(pmul(pmul(a, c), c), c); // a * 2^(3b) b = psub(psub(psub(ei, b), b), b); // e - 3b c = _mm_castsi128_pd(_mm_slli_epi64(padd(b, bias), 52)); // 2^(e - 3b) out = pmul(out, c); // a * 2^e return out; } // with AVX, the default implementations based on pload1 are faster #ifndef __AVX__ template<> EIGEN_STRONG_INLINE void pbroadcast4(const float *a, Packet4f& a0, Packet4f& a1, Packet4f& a2, Packet4f& a3) { a3 = pload(a); a0 = vec4f_swizzle1(a3, 0,0,0,0); a1 = vec4f_swizzle1(a3, 1,1,1,1); a2 = vec4f_swizzle1(a3, 2,2,2,2); a3 = vec4f_swizzle1(a3, 3,3,3,3); } template<> EIGEN_STRONG_INLINE void pbroadcast4(const double *a, Packet2d& a0, Packet2d& a1, Packet2d& a2, Packet2d& a3) { #ifdef EIGEN_VECTORIZE_SSE3 a0 = _mm_loaddup_pd(a+0); a1 = _mm_loaddup_pd(a+1); a2 = _mm_loaddup_pd(a+2); a3 = _mm_loaddup_pd(a+3); #else a1 = pload(a); a0 = vec2d_swizzle1(a1, 0,0); a1 = vec2d_swizzle1(a1, 1,1); a3 = pload(a+2); a2 = vec2d_swizzle1(a3, 0,0); a3 = vec2d_swizzle1(a3, 1,1); #endif } #endif EIGEN_STRONG_INLINE void punpackp(Packet4f* vecs) { vecs[1] = _mm_castsi128_ps(_mm_shuffle_epi32(_mm_castps_si128(vecs[0]), 0x55)); vecs[2] = _mm_castsi128_ps(_mm_shuffle_epi32(_mm_castps_si128(vecs[0]), 0xAA)); vecs[3] = _mm_castsi128_ps(_mm_shuffle_epi32(_mm_castps_si128(vecs[0]), 0xFF)); vecs[0] = _mm_castsi128_ps(_mm_shuffle_epi32(_mm_castps_si128(vecs[0]), 0x00)); } template<> EIGEN_STRONG_INLINE float predux(const Packet4f& a) { // Disable SSE3 _mm_hadd_pd that is extremely slow on all existing Intel's architectures // (from Nehalem to Haswell) // #ifdef EIGEN_VECTORIZE_SSE3 // Packet4f tmp = _mm_add_ps(a, vec4f_swizzle1(a,2,3,2,3)); // return pfirst(_mm_hadd_ps(tmp, tmp)); // #else Packet4f tmp = _mm_add_ps(a, _mm_movehl_ps(a,a)); return pfirst(_mm_add_ss(tmp, _mm_shuffle_ps(tmp,tmp, 1))); // #endif } template<> EIGEN_STRONG_INLINE double predux(const Packet2d& a) { // Disable SSE3 _mm_hadd_pd that is extremely slow on all existing Intel's architectures // (from Nehalem to Haswell) // #ifdef EIGEN_VECTORIZE_SSE3 // return pfirst(_mm_hadd_pd(a, a)); // #else return pfirst(_mm_add_sd(a, _mm_unpackhi_pd(a,a))); // #endif } #ifdef EIGEN_VECTORIZE_SSSE3 template<> EIGEN_STRONG_INLINE int predux(const Packet4i& a) { Packet4i tmp0 = _mm_hadd_epi32(a,a); return pfirst(_mm_hadd_epi32(tmp0,tmp0)); } #else template<> EIGEN_STRONG_INLINE int predux(const Packet4i& a) { Packet4i tmp = _mm_add_epi32(a, _mm_unpackhi_epi64(a,a)); return pfirst(tmp) + pfirst(_mm_shuffle_epi32(tmp, 1)); } #endif template<> EIGEN_STRONG_INLINE bool predux(const Packet16b& a) { Packet4i tmp = _mm_or_si128(a, _mm_unpackhi_epi64(a,a)); return (pfirst(tmp) != 0) || (pfirst(_mm_shuffle_epi32(tmp, 1)) != 0); } // Other reduction functions: // mul template<> EIGEN_STRONG_INLINE float predux_mul(const Packet4f& a) { Packet4f tmp = _mm_mul_ps(a, _mm_movehl_ps(a,a)); return pfirst(_mm_mul_ss(tmp, _mm_shuffle_ps(tmp,tmp, 1))); } template<> EIGEN_STRONG_INLINE double predux_mul(const Packet2d& a) { return pfirst(_mm_mul_sd(a, _mm_unpackhi_pd(a,a))); } template<> EIGEN_STRONG_INLINE int predux_mul(const Packet4i& a) { // after some experiments, it is seems this is the fastest way to implement it // for GCC (eg., reusing pmul is very slow !) // TODO try to call _mm_mul_epu32 directly EIGEN_ALIGN16 int aux[4]; pstore(aux, a); return (aux[0] * aux[1]) * (aux[2] * aux[3]); } template<> EIGEN_STRONG_INLINE bool predux_mul(const Packet16b& a) { Packet4i tmp = _mm_and_si128(a, _mm_unpackhi_epi64(a,a)); return ((pfirst(tmp) == 0x01010101) && (pfirst(_mm_shuffle_epi32(tmp, 1)) == 0x01010101)); } // min template<> EIGEN_STRONG_INLINE float predux_min(const Packet4f& a) { Packet4f tmp = _mm_min_ps(a, _mm_movehl_ps(a,a)); return pfirst(_mm_min_ss(tmp, _mm_shuffle_ps(tmp,tmp, 1))); } template<> EIGEN_STRONG_INLINE double predux_min(const Packet2d& a) { return pfirst(_mm_min_sd(a, _mm_unpackhi_pd(a,a))); } template<> EIGEN_STRONG_INLINE int predux_min(const Packet4i& a) { #ifdef EIGEN_VECTORIZE_SSE4_1 Packet4i tmp = _mm_min_epi32(a, _mm_shuffle_epi32(a, _MM_SHUFFLE(0,0,3,2))); return pfirst(_mm_min_epi32(tmp,_mm_shuffle_epi32(tmp, 1))); #else // after some experiments, it is seems this is the fastest way to implement it // for GCC (eg., it does not like using std::min after the pstore !!) EIGEN_ALIGN16 int aux[4]; pstore(aux, a); int aux0 = aux[0] EIGEN_STRONG_INLINE float predux_max(const Packet4f& a) { Packet4f tmp = _mm_max_ps(a, _mm_movehl_ps(a,a)); return pfirst(_mm_max_ss(tmp, _mm_shuffle_ps(tmp,tmp, 1))); } template<> EIGEN_STRONG_INLINE double predux_max(const Packet2d& a) { return pfirst(_mm_max_sd(a, _mm_unpackhi_pd(a,a))); } template<> EIGEN_STRONG_INLINE int predux_max(const Packet4i& a) { #ifdef EIGEN_VECTORIZE_SSE4_1 Packet4i tmp = _mm_max_epi32(a, _mm_shuffle_epi32(a, _MM_SHUFFLE(0,0,3,2))); return pfirst(_mm_max_epi32(tmp,_mm_shuffle_epi32(tmp, 1))); #else // after some experiments, it is seems this is the fastest way to implement it // for GCC (eg., it does not like using std::min after the pstore !!) EIGEN_ALIGN16 int aux[4]; pstore(aux, a); int aux0 = aux[0]>aux[1] ? aux[0] : aux[1]; int aux2 = aux[2]>aux[3] ? aux[2] : aux[3]; return aux0>aux2 ? aux0 : aux2; #endif // EIGEN_VECTORIZE_SSE4_1 } // not needed yet // template<> EIGEN_STRONG_INLINE bool predux_all(const Packet4f& x) // { // return _mm_movemask_ps(x) == 0xF; // } template<> EIGEN_STRONG_INLINE bool predux_any(const Packet4f& x) { return _mm_movemask_ps(x) != 0x0; } EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { _MM_TRANSPOSE4_PS(kernel.packet[0], kernel.packet[1], kernel.packet[2], kernel.packet[3]); } EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { __m128d tmp = _mm_unpackhi_pd(kernel.packet[0], kernel.packet[1]); kernel.packet[0] = _mm_unpacklo_pd(kernel.packet[0], kernel.packet[1]); kernel.packet[1] = tmp; } EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { __m128i T0 = _mm_unpacklo_epi32(kernel.packet[0], kernel.packet[1]); __m128i T1 = _mm_unpacklo_epi32(kernel.packet[2], kernel.packet[3]); __m128i T2 = _mm_unpackhi_epi32(kernel.packet[0], kernel.packet[1]); __m128i T3 = _mm_unpackhi_epi32(kernel.packet[2], kernel.packet[3]); kernel.packet[0] = _mm_unpacklo_epi64(T0, T1); kernel.packet[1] = _mm_unpackhi_epi64(T0, T1); kernel.packet[2] = _mm_unpacklo_epi64(T2, T3); kernel.packet[3] = _mm_unpackhi_epi64(T2, T3); } EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { __m128i T0 = _mm_unpacklo_epi8(kernel.packet[0], kernel.packet[1]); __m128i T1 = _mm_unpackhi_epi8(kernel.packet[0], kernel.packet[1]); __m128i T2 = _mm_unpacklo_epi8(kernel.packet[2], kernel.packet[3]); __m128i T3 = _mm_unpackhi_epi8(kernel.packet[2], kernel.packet[3]); kernel.packet[0] = _mm_unpacklo_epi16(T0, T2); kernel.packet[1] = _mm_unpackhi_epi16(T0, T2); kernel.packet[2] = _mm_unpacklo_epi16(T1, T3); kernel.packet[3] = _mm_unpackhi_epi16(T1, T3); } EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { // If we number the elements in the input thus: // kernel.packet[ 0] = {00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 0a, 0b, 0c, 0d, 0e, 0f} // kernel.packet[ 1] = {10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 1a, 1b, 1c, 1d, 1e, 1f} // ... // kernel.packet[15] = {f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, fa, fb, fc, fd, fe, ff}, // // the desired output is: // kernel.packet[ 0] = {00, 10, 20, 30, 40, 50, 60, 70, 80, 90, a0, b0, c0, d0, e0, f0} // kernel.packet[ 1] = {01, 11, 21, 31, 41, 51, 61, 71, 81, 91, a1, b1, c1, d1, e1, f1} // ... // kernel.packet[15] = {0f, 1f, 2f, 3f, 4f, 5f, 6f, 7f, 8f, 9f, af, bf, cf, df, ef, ff}, __m128i t0 = _mm_unpacklo_epi8(kernel.packet[0], kernel.packet[1]); // 00 10 01 11 02 12 03 13 04 14 05 15 06 16 07 17 __m128i t1 = _mm_unpackhi_epi8(kernel.packet[0], kernel.packet[1]); // 08 18 09 19 0a 1a 0b 1b 0c 1c 0d 1d 0e 1e 0f 1f __m128i t2 = _mm_unpacklo_epi8(kernel.packet[2], kernel.packet[3]); // 20 30 21 31 22 32 ... 27 37 __m128i t3 = _mm_unpackhi_epi8(kernel.packet[2], kernel.packet[3]); // 28 38 29 39 2a 3a ... 2f 3f __m128i t4 = _mm_unpacklo_epi8(kernel.packet[4], kernel.packet[5]); // 40 50 41 51 42 52 47 57 __m128i t5 = _mm_unpackhi_epi8(kernel.packet[4], kernel.packet[5]); // 48 58 49 59 4a 5a __m128i t6 = _mm_unpacklo_epi8(kernel.packet[6], kernel.packet[7]); __m128i t7 = _mm_unpackhi_epi8(kernel.packet[6], kernel.packet[7]); __m128i t8 = _mm_unpacklo_epi8(kernel.packet[8], kernel.packet[9]); __m128i t9 = _mm_unpackhi_epi8(kernel.packet[8], kernel.packet[9]); __m128i ta = _mm_unpacklo_epi8(kernel.packet[10], kernel.packet[11]); __m128i tb = _mm_unpackhi_epi8(kernel.packet[10], kernel.packet[11]); __m128i tc = _mm_unpacklo_epi8(kernel.packet[12], kernel.packet[13]); __m128i td = _mm_unpackhi_epi8(kernel.packet[12], kernel.packet[13]); __m128i te = _mm_unpacklo_epi8(kernel.packet[14], kernel.packet[15]); __m128i tf = _mm_unpackhi_epi8(kernel.packet[14], kernel.packet[15]); __m128i s0 = _mm_unpacklo_epi16(t0, t2); // 00 10 20 30 01 11 21 31 02 12 22 32 03 13 23 33 __m128i s1 = _mm_unpackhi_epi16(t0, t2); // 04 14 24 34 __m128i s2 = _mm_unpacklo_epi16(t1, t3); // 08 18 28 38 ... __m128i s3 = _mm_unpackhi_epi16(t1, t3); // 0c 1c 2c 3c ... __m128i s4 = _mm_unpacklo_epi16(t4, t6); // 40 50 60 70 41 51 61 71 42 52 62 72 43 53 63 73 __m128i s5 = _mm_unpackhi_epi16(t4, t6); // 44 54 64 74 ... __m128i s6 = _mm_unpacklo_epi16(t5, t7); __m128i s7 = _mm_unpackhi_epi16(t5, t7); __m128i s8 = _mm_unpacklo_epi16(t8, ta); __m128i s9 = _mm_unpackhi_epi16(t8, ta); __m128i sa = _mm_unpacklo_epi16(t9, tb); __m128i sb = _mm_unpackhi_epi16(t9, tb); __m128i sc = _mm_unpacklo_epi16(tc, te); __m128i sd = _mm_unpackhi_epi16(tc, te); __m128i se = _mm_unpacklo_epi16(td, tf); __m128i sf = _mm_unpackhi_epi16(td, tf); __m128i u0 = _mm_unpacklo_epi32(s0, s4); // 00 10 20 30 40 50 60 70 01 11 21 31 41 51 61 71 __m128i u1 = _mm_unpackhi_epi32(s0, s4); // 02 12 22 32 42 52 62 72 03 13 23 33 43 53 63 73 __m128i u2 = _mm_unpacklo_epi32(s1, s5); __m128i u3 = _mm_unpackhi_epi32(s1, s5); __m128i u4 = _mm_unpacklo_epi32(s2, s6); __m128i u5 = _mm_unpackhi_epi32(s2, s6); __m128i u6 = _mm_unpacklo_epi32(s3, s7); __m128i u7 = _mm_unpackhi_epi32(s3, s7); __m128i u8 = _mm_unpacklo_epi32(s8, sc); __m128i u9 = _mm_unpackhi_epi32(s8, sc); __m128i ua = _mm_unpacklo_epi32(s9, sd); __m128i ub = _mm_unpackhi_epi32(s9, sd); __m128i uc = _mm_unpacklo_epi32(sa, se); __m128i ud = _mm_unpackhi_epi32(sa, se); __m128i ue = _mm_unpacklo_epi32(sb, sf); __m128i uf = _mm_unpackhi_epi32(sb, sf); kernel.packet[0] = _mm_unpacklo_epi64(u0, u8); kernel.packet[1] = _mm_unpackhi_epi64(u0, u8); kernel.packet[2] = _mm_unpacklo_epi64(u1, u9); kernel.packet[3] = _mm_unpackhi_epi64(u1, u9); kernel.packet[4] = _mm_unpacklo_epi64(u2, ua); kernel.packet[5] = _mm_unpackhi_epi64(u2, ua); kernel.packet[6] = _mm_unpacklo_epi64(u3, ub); kernel.packet[7] = _mm_unpackhi_epi64(u3, ub); kernel.packet[8] = _mm_unpacklo_epi64(u4, uc); kernel.packet[9] = _mm_unpackhi_epi64(u4, uc); kernel.packet[10] = _mm_unpacklo_epi64(u5, ud); kernel.packet[11] = _mm_unpackhi_epi64(u5, ud); kernel.packet[12] = _mm_unpacklo_epi64(u6, ue); kernel.packet[13] = _mm_unpackhi_epi64(u6, ue); kernel.packet[14] = _mm_unpacklo_epi64(u7, uf); kernel.packet[15] = _mm_unpackhi_epi64(u7, uf); } template<> EIGEN_STRONG_INLINE Packet4i pblend(const Selector<4>& ifPacket, const Packet4i& thenPacket, const Packet4i& elsePacket) { const __m128i zero = _mm_setzero_si128(); const __m128i select = _mm_set_epi32(ifPacket.select[3], ifPacket.select[2], ifPacket.select[1], ifPacket.select[0]); __m128i false_mask = _mm_cmpeq_epi32(select, zero); #ifdef EIGEN_VECTORIZE_SSE4_1 return _mm_blendv_epi8(thenPacket, elsePacket, false_mask); #else return _mm_or_si128(_mm_andnot_si128(false_mask, thenPacket), _mm_and_si128(false_mask, elsePacket)); #endif } template<> EIGEN_STRONG_INLINE Packet4f pblend(const Selector<4>& ifPacket, const Packet4f& thenPacket, const Packet4f& elsePacket) { const __m128 zero = _mm_setzero_ps(); const __m128 select = _mm_set_ps(ifPacket.select[3], ifPacket.select[2], ifPacket.select[1], ifPacket.select[0]); __m128 false_mask = _mm_cmpeq_ps(select, zero); #ifdef EIGEN_VECTORIZE_SSE4_1 return _mm_blendv_ps(thenPacket, elsePacket, false_mask); #else return _mm_or_ps(_mm_andnot_ps(false_mask, thenPacket), _mm_and_ps(false_mask, elsePacket)); #endif } template<> EIGEN_STRONG_INLINE Packet2d pblend(const Selector<2>& ifPacket, const Packet2d& thenPacket, const Packet2d& elsePacket) { const __m128d zero = _mm_setzero_pd(); const __m128d select = _mm_set_pd(ifPacket.select[1], ifPacket.select[0]); __m128d false_mask = _mm_cmpeq_pd(select, zero); #ifdef EIGEN_VECTORIZE_SSE4_1 return _mm_blendv_pd(thenPacket, elsePacket, false_mask); #else return _mm_or_pd(_mm_andnot_pd(false_mask, thenPacket), _mm_and_pd(false_mask, elsePacket)); #endif } // Scalar path for pmadd with FMA to ensure consistency with vectorized path. #ifdef EIGEN_VECTORIZE_FMA template<> EIGEN_STRONG_INLINE float pmadd(const float& a, const float& b, const float& c) { return ::fmaf(a,b,c); } template<> EIGEN_STRONG_INLINE double pmadd(const double& a, const double& b, const double& c) { return ::fma(a,b,c); } #endif // Packet math for Eigen::half // Disable the following code since it's broken on too many platforms / compilers. //#elif defined(EIGEN_VECTORIZE_SSE) && (!EIGEN_ARCH_x86_64) && (!EIGEN_COMP_MSVC) #if 0 typedef struct { __m64 x; } Packet4h; template<> struct is_arithmetic { enum { value = true }; }; template <> struct packet_traits : default_packet_traits { typedef Packet4h type; // There is no half-size packet for Packet4h. typedef Packet4h half; enum { Vectorizable = 1, AlignedOnScalar = 1, size = 4, HasHalfPacket = 0, HasAdd = 1, HasSub = 1, HasMul = 1, HasDiv = 1, HasNegate = 0, HasAbs = 0, HasAbs2 = 0, HasMin = 0, HasMax = 0, HasConj = 0, HasSetLinear = 0, HasSqrt = 0, HasRsqrt = 0, HasExp = 0, HasLog = 0, HasBlend = 0 }; }; template<> struct unpacket_traits { typedef Eigen::half type; enum {size=4, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet4h half; }; template<> EIGEN_STRONG_INLINE Packet4h pset1(const Eigen::half& from) { Packet4h result; result.x = _mm_set1_pi16(from.x); return result; } template<> EIGEN_STRONG_INLINE Eigen::half pfirst(const Packet4h& from) { return half_impl::raw_uint16_to_half(static_cast(_mm_cvtsi64_si32(from.x))); } template<> EIGEN_STRONG_INLINE Packet4h pconj(const Packet4h& a) { return a; } template<> EIGEN_STRONG_INLINE Packet4h padd(const Packet4h& a, const Packet4h& b) { __int64_t a64 = _mm_cvtm64_si64(a.x); __int64_t b64 = _mm_cvtm64_si64(b.x); Eigen::half h[4]; Eigen::half ha = half_impl::raw_uint16_to_half(static_cast(a64)); Eigen::half hb = half_impl::raw_uint16_to_half(static_cast(b64)); h[0] = ha + hb; ha = half_impl::raw_uint16_to_half(static_cast(a64 >> 16)); hb = half_impl::raw_uint16_to_half(static_cast(b64 >> 16)); h[1] = ha + hb; ha = half_impl::raw_uint16_to_half(static_cast(a64 >> 32)); hb = half_impl::raw_uint16_to_half(static_cast(b64 >> 32)); h[2] = ha + hb; ha = half_impl::raw_uint16_to_half(static_cast(a64 >> 48)); hb = half_impl::raw_uint16_to_half(static_cast(b64 >> 48)); h[3] = ha + hb; Packet4h result; result.x = _mm_set_pi16(h[3].x, h[2].x, h[1].x, h[0].x); return result; } template<> EIGEN_STRONG_INLINE Packet4h psub(const Packet4h& a, const Packet4h& b) { __int64_t a64 = _mm_cvtm64_si64(a.x); __int64_t b64 = _mm_cvtm64_si64(b.x); Eigen::half h[4]; Eigen::half ha = half_impl::raw_uint16_to_half(static_cast(a64)); Eigen::half hb = half_impl::raw_uint16_to_half(static_cast(b64)); h[0] = ha - hb; ha = half_impl::raw_uint16_to_half(static_cast(a64 >> 16)); hb = half_impl::raw_uint16_to_half(static_cast(b64 >> 16)); h[1] = ha - hb; ha = half_impl::raw_uint16_to_half(static_cast(a64 >> 32)); hb = half_impl::raw_uint16_to_half(static_cast(b64 >> 32)); h[2] = ha - hb; ha = half_impl::raw_uint16_to_half(static_cast(a64 >> 48)); hb = half_impl::raw_uint16_to_half(static_cast(b64 >> 48)); h[3] = ha - hb; Packet4h result; result.x = _mm_set_pi16(h[3].x, h[2].x, h[1].x, h[0].x); return result; } template<> EIGEN_STRONG_INLINE Packet4h pmul(const Packet4h& a, const Packet4h& b) { __int64_t a64 = _mm_cvtm64_si64(a.x); __int64_t b64 = _mm_cvtm64_si64(b.x); Eigen::half h[4]; Eigen::half ha = half_impl::raw_uint16_to_half(static_cast(a64)); Eigen::half hb = half_impl::raw_uint16_to_half(static_cast(b64)); h[0] = ha * hb; ha = half_impl::raw_uint16_to_half(static_cast(a64 >> 16)); hb = half_impl::raw_uint16_to_half(static_cast(b64 >> 16)); h[1] = ha * hb; ha = half_impl::raw_uint16_to_half(static_cast(a64 >> 32)); hb = half_impl::raw_uint16_to_half(static_cast(b64 >> 32)); h[2] = ha * hb; ha = half_impl::raw_uint16_to_half(static_cast(a64 >> 48)); hb = half_impl::raw_uint16_to_half(static_cast(b64 >> 48)); h[3] = ha * hb; Packet4h result; result.x = _mm_set_pi16(h[3].x, h[2].x, h[1].x, h[0].x); return result; } template<> EIGEN_STRONG_INLINE Packet4h pdiv(const Packet4h& a, const Packet4h& b) { __int64_t a64 = _mm_cvtm64_si64(a.x); __int64_t b64 = _mm_cvtm64_si64(b.x); Eigen::half h[4]; Eigen::half ha = half_impl::raw_uint16_to_half(static_cast(a64)); Eigen::half hb = half_impl::raw_uint16_to_half(static_cast(b64)); h[0] = ha / hb; ha = half_impl::raw_uint16_to_half(static_cast(a64 >> 16)); hb = half_impl::raw_uint16_to_half(static_cast(b64 >> 16)); h[1] = ha / hb; ha = half_impl::raw_uint16_to_half(static_cast(a64 >> 32)); hb = half_impl::raw_uint16_to_half(static_cast(b64 >> 32)); h[2] = ha / hb; ha = half_impl::raw_uint16_to_half(static_cast(a64 >> 48)); hb = half_impl::raw_uint16_to_half(static_cast(b64 >> 48)); h[3] = ha / hb; Packet4h result; result.x = _mm_set_pi16(h[3].x, h[2].x, h[1].x, h[0].x); return result; } template<> EIGEN_STRONG_INLINE Packet4h pload(const Eigen::half* from) { Packet4h result; result.x = _mm_cvtsi64_m64(*reinterpret_cast(from)); return result; } template<> EIGEN_STRONG_INLINE Packet4h ploadu(const Eigen::half* from) { Packet4h result; result.x = _mm_cvtsi64_m64(*reinterpret_cast(from)); return result; } template<> EIGEN_STRONG_INLINE void pstore(Eigen::half* to, const Packet4h& from) { __int64_t r = _mm_cvtm64_si64(from.x); *(reinterpret_cast<__int64_t*>(to)) = r; } template<> EIGEN_STRONG_INLINE void pstoreu(Eigen::half* to, const Packet4h& from) { __int64_t r = _mm_cvtm64_si64(from.x); *(reinterpret_cast<__int64_t*>(to)) = r; } template<> EIGEN_STRONG_INLINE Packet4h ploadquad(const Eigen::half* from) { return pset1(*from); } template<> EIGEN_STRONG_INLINE Packet4h pgather(const Eigen::half* from, Index stride) { Packet4h result; result.x = _mm_set_pi16(from[3*stride].x, from[2*stride].x, from[1*stride].x, from[0*stride].x); return result; } template<> EIGEN_STRONG_INLINE void pscatter(Eigen::half* to, const Packet4h& from, Index stride) { __int64_t a = _mm_cvtm64_si64(from.x); to[stride*0].x = static_cast(a); to[stride*1].x = static_cast(a >> 16); to[stride*2].x = static_cast(a >> 32); to[stride*3].x = static_cast(a >> 48); } EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { __m64 T0 = _mm_unpacklo_pi16(kernel.packet[0].x, kernel.packet[1].x); __m64 T1 = _mm_unpacklo_pi16(kernel.packet[2].x, kernel.packet[3].x); __m64 T2 = _mm_unpackhi_pi16(kernel.packet[0].x, kernel.packet[1].x); __m64 T3 = _mm_unpackhi_pi16(kernel.packet[2].x, kernel.packet[3].x); kernel.packet[0].x = _mm_unpacklo_pi32(T0, T1); kernel.packet[1].x = _mm_unpackhi_pi32(T0, T1); kernel.packet[2].x = _mm_unpacklo_pi32(T2, T3); kernel.packet[3].x = _mm_unpackhi_pi32(T2, T3); } #endif } // end namespace internal } // end namespace Eigen #if EIGEN_COMP_PGI && EIGEN_COMP_PGI < 1900 // PGI++ does not define the following intrinsics in C++ mode. static inline __m128 _mm_castpd_ps (__m128d x) { return reinterpret_cast<__m128&>(x); } static inline __m128i _mm_castpd_si128(__m128d x) { return reinterpret_cast<__m128i&>(x); } static inline __m128d _mm_castps_pd (__m128 x) { return reinterpret_cast<__m128d&>(x); } static inline __m128i _mm_castps_si128(__m128 x) { return reinterpret_cast<__m128i&>(x); } static inline __m128 _mm_castsi128_ps(__m128i x) { return reinterpret_cast<__m128&>(x); } static inline __m128d _mm_castsi128_pd(__m128i x) { return reinterpret_cast<__m128d&>(x); } #endif #endif // EIGEN_PACKET_MATH_SSE_H RcppEigen/inst/include/Eigen/src/Core/arch/SSE/TypeCasting.h0000644000176200001440000000710214562417221023212 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2015 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_TYPE_CASTING_SSE_H #define EIGEN_TYPE_CASTING_SSE_H namespace Eigen { namespace internal { #ifndef EIGEN_VECTORIZE_AVX template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 2, TgtCoeffRatio = 1 }; }; template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 2 }; }; #endif template<> EIGEN_STRONG_INLINE Packet4i pcast(const Packet4f& a) { return _mm_cvttps_epi32(a); } template<> EIGEN_STRONG_INLINE Packet4f pcast(const Packet4i& a) { return _mm_cvtepi32_ps(a); } template<> EIGEN_STRONG_INLINE Packet4f pcast(const Packet2d& a, const Packet2d& b) { return _mm_shuffle_ps(_mm_cvtpd_ps(a), _mm_cvtpd_ps(b), (1 << 2) | (1 << 6)); } template<> EIGEN_STRONG_INLINE Packet2d pcast(const Packet4f& a) { // Simply discard the second half of the input return _mm_cvtps_pd(a); } template<> EIGEN_STRONG_INLINE Packet4i preinterpret(const Packet4f& a) { return _mm_castps_si128(a); } template<> EIGEN_STRONG_INLINE Packet4f preinterpret(const Packet4i& a) { return _mm_castsi128_ps(a); } template<> EIGEN_STRONG_INLINE Packet2d preinterpret(const Packet4i& a) { return _mm_castsi128_pd(a); } template<> EIGEN_STRONG_INLINE Packet4i preinterpret(const Packet2d& a) { return _mm_castpd_si128(a); } // Disable the following code since it's broken on too many platforms / compilers. //#elif defined(EIGEN_VECTORIZE_SSE) && (!EIGEN_ARCH_x86_64) && (!EIGEN_COMP_MSVC) #if 0 template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template<> EIGEN_STRONG_INLINE Packet4f pcast(const Packet4h& a) { __int64_t a64 = _mm_cvtm64_si64(a.x); Eigen::half h = raw_uint16_to_half(static_cast(a64)); float f1 = static_cast(h); h = raw_uint16_to_half(static_cast(a64 >> 16)); float f2 = static_cast(h); h = raw_uint16_to_half(static_cast(a64 >> 32)); float f3 = static_cast(h); h = raw_uint16_to_half(static_cast(a64 >> 48)); float f4 = static_cast(h); return _mm_set_ps(f4, f3, f2, f1); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template<> EIGEN_STRONG_INLINE Packet4h pcast(const Packet4f& a) { EIGEN_ALIGN16 float aux[4]; pstore(aux, a); Eigen::half h0(aux[0]); Eigen::half h1(aux[1]); Eigen::half h2(aux[2]); Eigen::half h3(aux[3]); Packet4h result; result.x = _mm_set_pi16(h3.x, h2.x, h1.x, h0.x); return result; } #endif } // end namespace internal } // end namespace Eigen #endif // EIGEN_TYPE_CASTING_SSE_H RcppEigen/inst/include/Eigen/src/Core/arch/SSE/Complex.h0000644000176200001440000003365314562417221022401 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2010 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_COMPLEX_SSE_H #define EIGEN_COMPLEX_SSE_H namespace Eigen { namespace internal { //---------- float ---------- struct Packet2cf { EIGEN_STRONG_INLINE Packet2cf() {} EIGEN_STRONG_INLINE explicit Packet2cf(const __m128& a) : v(a) {} Packet4f v; }; // Use the packet_traits defined in AVX/PacketMath.h instead if we're going // to leverage AVX instructions. #ifndef EIGEN_VECTORIZE_AVX template<> struct packet_traits > : default_packet_traits { typedef Packet2cf type; typedef Packet2cf half; enum { Vectorizable = 1, AlignedOnScalar = 1, size = 2, HasHalfPacket = 0, HasAdd = 1, HasSub = 1, HasMul = 1, HasDiv = 1, HasNegate = 1, HasSqrt = 1, HasAbs = 0, HasAbs2 = 0, HasMin = 0, HasMax = 0, HasSetLinear = 0, HasBlend = 1 }; }; #endif template<> struct unpacket_traits { typedef std::complex type; typedef Packet2cf half; typedef Packet4f as_real; enum { size=2, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false }; }; template<> EIGEN_STRONG_INLINE Packet2cf padd(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(_mm_add_ps(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet2cf psub(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(_mm_sub_ps(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet2cf pnegate(const Packet2cf& a) { const __m128 mask = _mm_castsi128_ps(_mm_setr_epi32(0x80000000,0x80000000,0x80000000,0x80000000)); return Packet2cf(_mm_xor_ps(a.v,mask)); } template<> EIGEN_STRONG_INLINE Packet2cf pconj(const Packet2cf& a) { const __m128 mask = _mm_castsi128_ps(_mm_setr_epi32(0x00000000,0x80000000,0x00000000,0x80000000)); return Packet2cf(_mm_xor_ps(a.v,mask)); } template<> EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b) { #ifdef EIGEN_VECTORIZE_SSE3 return Packet2cf(_mm_addsub_ps(_mm_mul_ps(_mm_moveldup_ps(a.v), b.v), _mm_mul_ps(_mm_movehdup_ps(a.v), vec4f_swizzle1(b.v, 1, 0, 3, 2)))); // return Packet2cf(_mm_addsub_ps(_mm_mul_ps(vec4f_swizzle1(a.v, 0, 0, 2, 2), b.v), // _mm_mul_ps(vec4f_swizzle1(a.v, 1, 1, 3, 3), // vec4f_swizzle1(b.v, 1, 0, 3, 2)))); #else const __m128 mask = _mm_castsi128_ps(_mm_setr_epi32(0x80000000,0x00000000,0x80000000,0x00000000)); return Packet2cf(_mm_add_ps(_mm_mul_ps(vec4f_swizzle1(a.v, 0, 0, 2, 2), b.v), _mm_xor_ps(_mm_mul_ps(vec4f_swizzle1(a.v, 1, 1, 3, 3), vec4f_swizzle1(b.v, 1, 0, 3, 2)), mask))); #endif } template<> EIGEN_STRONG_INLINE Packet2cf ptrue (const Packet2cf& a) { return Packet2cf(ptrue(Packet4f(a.v))); } template<> EIGEN_STRONG_INLINE Packet2cf pand (const Packet2cf& a, const Packet2cf& b) { return Packet2cf(_mm_and_ps(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet2cf por (const Packet2cf& a, const Packet2cf& b) { return Packet2cf(_mm_or_ps(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet2cf pxor (const Packet2cf& a, const Packet2cf& b) { return Packet2cf(_mm_xor_ps(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet2cf pandnot(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(_mm_andnot_ps(b.v,a.v)); } template<> EIGEN_STRONG_INLINE Packet2cf pload (const std::complex* from) { EIGEN_DEBUG_ALIGNED_LOAD return Packet2cf(pload(&numext::real_ref(*from))); } template<> EIGEN_STRONG_INLINE Packet2cf ploadu(const std::complex* from) { EIGEN_DEBUG_UNALIGNED_LOAD return Packet2cf(ploadu(&numext::real_ref(*from))); } template<> EIGEN_STRONG_INLINE Packet2cf pset1(const std::complex& from) { Packet2cf res; #ifdef EIGEN_VECTORIZE_SSE3 res.v = _mm_castpd_ps(_mm_loaddup_pd(reinterpret_cast(&from))); #else res.v = _mm_castpd_ps(_mm_load_sd(reinterpret_cast(&from))); res.v = _mm_movelh_ps(res.v, res.v); #endif return res; } template<> EIGEN_STRONG_INLINE Packet2cf ploaddup(const std::complex* from) { return pset1(*from); } template<> EIGEN_STRONG_INLINE void pstore >(std::complex * to, const Packet2cf& from) { EIGEN_DEBUG_ALIGNED_STORE pstore(&numext::real_ref(*to), Packet4f(from.v)); } template<> EIGEN_STRONG_INLINE void pstoreu >(std::complex * to, const Packet2cf& from) { EIGEN_DEBUG_UNALIGNED_STORE pstoreu(&numext::real_ref(*to), Packet4f(from.v)); } template<> EIGEN_DEVICE_FUNC inline Packet2cf pgather, Packet2cf>(const std::complex* from, Index stride) { return Packet2cf(_mm_set_ps(std::imag(from[1*stride]), std::real(from[1*stride]), std::imag(from[0*stride]), std::real(from[0*stride]))); } template<> EIGEN_DEVICE_FUNC inline void pscatter, Packet2cf>(std::complex* to, const Packet2cf& from, Index stride) { to[stride*0] = std::complex(_mm_cvtss_f32(_mm_shuffle_ps(from.v, from.v, 0)), _mm_cvtss_f32(_mm_shuffle_ps(from.v, from.v, 1))); to[stride*1] = std::complex(_mm_cvtss_f32(_mm_shuffle_ps(from.v, from.v, 2)), _mm_cvtss_f32(_mm_shuffle_ps(from.v, from.v, 3))); } template<> EIGEN_STRONG_INLINE void prefetch >(const std::complex * addr) { _mm_prefetch((SsePrefetchPtrType)(addr), _MM_HINT_T0); } template<> EIGEN_STRONG_INLINE std::complex pfirst(const Packet2cf& a) { #if EIGEN_GNUC_AT_MOST(4,3) // Workaround gcc 4.2 ICE - this is not performance wise ideal, but who cares... // This workaround also fix invalid code generation with gcc 4.3 EIGEN_ALIGN16 std::complex res[2]; _mm_store_ps((float*)res, a.v); return res[0]; #else std::complex res; _mm_storel_pi((__m64*)&res, a.v); return res; #endif } template<> EIGEN_STRONG_INLINE Packet2cf preverse(const Packet2cf& a) { return Packet2cf(_mm_castpd_ps(preverse(Packet2d(_mm_castps_pd(a.v))))); } template<> EIGEN_STRONG_INLINE std::complex predux(const Packet2cf& a) { return pfirst(Packet2cf(_mm_add_ps(a.v, _mm_movehl_ps(a.v,a.v)))); } template<> EIGEN_STRONG_INLINE std::complex predux_mul(const Packet2cf& a) { return pfirst(pmul(a, Packet2cf(_mm_movehl_ps(a.v,a.v)))); } EIGEN_STRONG_INLINE Packet2cf pcplxflip/* */(const Packet2cf& x) { return Packet2cf(vec4f_swizzle1(x.v, 1, 0, 3, 2)); } EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet2cf,Packet4f) template<> EIGEN_STRONG_INLINE Packet2cf pdiv(const Packet2cf& a, const Packet2cf& b) { // TODO optimize it for SSE3 and 4 Packet2cf res = pmul(a, pconj(b)); __m128 s = _mm_mul_ps(b.v,b.v); return Packet2cf(_mm_div_ps(res.v,_mm_add_ps(s,vec4f_swizzle1(s, 1, 0, 3, 2)))); } //---------- double ---------- struct Packet1cd { EIGEN_STRONG_INLINE Packet1cd() {} EIGEN_STRONG_INLINE explicit Packet1cd(const __m128d& a) : v(a) {} Packet2d v; }; // Use the packet_traits defined in AVX/PacketMath.h instead if we're going // to leverage AVX instructions. #ifndef EIGEN_VECTORIZE_AVX template<> struct packet_traits > : default_packet_traits { typedef Packet1cd type; typedef Packet1cd half; enum { Vectorizable = 1, AlignedOnScalar = 0, size = 1, HasHalfPacket = 0, HasAdd = 1, HasSub = 1, HasMul = 1, HasDiv = 1, HasNegate = 1, HasSqrt = 1, HasAbs = 0, HasAbs2 = 0, HasMin = 0, HasMax = 0, HasSetLinear = 0 }; }; #endif template<> struct unpacket_traits { typedef std::complex type; typedef Packet1cd half; typedef Packet2d as_real; enum { size=1, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false }; }; template<> EIGEN_STRONG_INLINE Packet1cd padd(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(_mm_add_pd(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet1cd psub(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(_mm_sub_pd(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet1cd pnegate(const Packet1cd& a) { return Packet1cd(pnegate(Packet2d(a.v))); } template<> EIGEN_STRONG_INLINE Packet1cd pconj(const Packet1cd& a) { const __m128d mask = _mm_castsi128_pd(_mm_set_epi32(0x80000000,0x0,0x0,0x0)); return Packet1cd(_mm_xor_pd(a.v,mask)); } template<> EIGEN_STRONG_INLINE Packet1cd pmul(const Packet1cd& a, const Packet1cd& b) { #ifdef EIGEN_VECTORIZE_SSE3 return Packet1cd(_mm_addsub_pd(_mm_mul_pd(_mm_movedup_pd(a.v), b.v), _mm_mul_pd(vec2d_swizzle1(a.v, 1, 1), vec2d_swizzle1(b.v, 1, 0)))); #else const __m128d mask = _mm_castsi128_pd(_mm_set_epi32(0x0,0x0,0x80000000,0x0)); return Packet1cd(_mm_add_pd(_mm_mul_pd(vec2d_swizzle1(a.v, 0, 0), b.v), _mm_xor_pd(_mm_mul_pd(vec2d_swizzle1(a.v, 1, 1), vec2d_swizzle1(b.v, 1, 0)), mask))); #endif } template<> EIGEN_STRONG_INLINE Packet1cd ptrue (const Packet1cd& a) { return Packet1cd(ptrue(Packet2d(a.v))); } template<> EIGEN_STRONG_INLINE Packet1cd pand (const Packet1cd& a, const Packet1cd& b) { return Packet1cd(_mm_and_pd(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet1cd por (const Packet1cd& a, const Packet1cd& b) { return Packet1cd(_mm_or_pd(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet1cd pxor (const Packet1cd& a, const Packet1cd& b) { return Packet1cd(_mm_xor_pd(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet1cd pandnot(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(_mm_andnot_pd(b.v,a.v)); } // FIXME force unaligned load, this is a temporary fix template<> EIGEN_STRONG_INLINE Packet1cd pload (const std::complex* from) { EIGEN_DEBUG_ALIGNED_LOAD return Packet1cd(pload((const double*)from)); } template<> EIGEN_STRONG_INLINE Packet1cd ploadu(const std::complex* from) { EIGEN_DEBUG_UNALIGNED_LOAD return Packet1cd(ploadu((const double*)from)); } template<> EIGEN_STRONG_INLINE Packet1cd pset1(const std::complex& from) { /* here we really have to use unaligned loads :( */ return ploadu(&from); } template<> EIGEN_STRONG_INLINE Packet1cd ploaddup(const std::complex* from) { return pset1(*from); } // FIXME force unaligned store, this is a temporary fix template<> EIGEN_STRONG_INLINE void pstore >(std::complex * to, const Packet1cd& from) { EIGEN_DEBUG_ALIGNED_STORE pstore((double*)to, Packet2d(from.v)); } template<> EIGEN_STRONG_INLINE void pstoreu >(std::complex * to, const Packet1cd& from) { EIGEN_DEBUG_UNALIGNED_STORE pstoreu((double*)to, Packet2d(from.v)); } template<> EIGEN_STRONG_INLINE void prefetch >(const std::complex * addr) { _mm_prefetch((SsePrefetchPtrType)(addr), _MM_HINT_T0); } template<> EIGEN_STRONG_INLINE std::complex pfirst(const Packet1cd& a) { EIGEN_ALIGN16 double res[2]; _mm_store_pd(res, a.v); return std::complex(res[0],res[1]); } template<> EIGEN_STRONG_INLINE Packet1cd preverse(const Packet1cd& a) { return a; } template<> EIGEN_STRONG_INLINE std::complex predux(const Packet1cd& a) { return pfirst(a); } template<> EIGEN_STRONG_INLINE std::complex predux_mul(const Packet1cd& a) { return pfirst(a); } EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet1cd,Packet2d) template<> EIGEN_STRONG_INLINE Packet1cd pdiv(const Packet1cd& a, const Packet1cd& b) { // TODO optimize it for SSE3 and 4 Packet1cd res = pmul(a,pconj(b)); __m128d s = _mm_mul_pd(b.v,b.v); return Packet1cd(_mm_div_pd(res.v, _mm_add_pd(s,_mm_shuffle_pd(s, s, 0x1)))); } EIGEN_STRONG_INLINE Packet1cd pcplxflip/* */(const Packet1cd& x) { return Packet1cd(preverse(Packet2d(x.v))); } EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { __m128d w1 = _mm_castps_pd(kernel.packet[0].v); __m128d w2 = _mm_castps_pd(kernel.packet[1].v); __m128 tmp = _mm_castpd_ps(_mm_unpackhi_pd(w1, w2)); kernel.packet[0].v = _mm_castpd_ps(_mm_unpacklo_pd(w1, w2)); kernel.packet[1].v = tmp; } template<> EIGEN_STRONG_INLINE Packet2cf pcmp_eq(const Packet2cf& a, const Packet2cf& b) { __m128 eq = _mm_cmpeq_ps(a.v, b.v); return Packet2cf(pand(eq, vec4f_swizzle1(eq, 1, 0, 3, 2))); } template<> EIGEN_STRONG_INLINE Packet1cd pcmp_eq(const Packet1cd& a, const Packet1cd& b) { __m128d eq = _mm_cmpeq_pd(a.v, b.v); return Packet1cd(pand(eq, vec2d_swizzle1(eq, 1, 0))); } template<> EIGEN_STRONG_INLINE Packet2cf pblend(const Selector<2>& ifPacket, const Packet2cf& thenPacket, const Packet2cf& elsePacket) { __m128d result = pblend(ifPacket, _mm_castps_pd(thenPacket.v), _mm_castps_pd(elsePacket.v)); return Packet2cf(_mm_castpd_ps(result)); } template<> EIGEN_STRONG_INLINE Packet1cd psqrt(const Packet1cd& a) { return psqrt_complex(a); } template<> EIGEN_STRONG_INLINE Packet2cf psqrt(const Packet2cf& a) { return psqrt_complex(a); } } // end namespace internal } // end namespace Eigen #endif // EIGEN_COMPLEX_SSE_H RcppEigen/inst/include/Eigen/src/Core/arch/AVX/0000755000176200001440000000000014562417221020613 5ustar liggesusersRcppEigen/inst/include/Eigen/src/Core/arch/AVX/MathFunctions.h0000644000176200001440000001764614562417221023564 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Pedro Gonnet (pedro.gonnet@gmail.com) // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_MATH_FUNCTIONS_AVX_H #define EIGEN_MATH_FUNCTIONS_AVX_H /* The sin and cos functions of this file are loosely derived from * Julien Pommier's sse math library: http://gruntthepeon.free.fr/ssemath/ */ namespace Eigen { namespace internal { template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8f psin(const Packet8f& _x) { return psin_float(_x); } template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8f pcos(const Packet8f& _x) { return pcos_float(_x); } template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8f plog(const Packet8f& _x) { return plog_float(_x); } template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4d plog(const Packet4d& _x) { return plog_double(_x); } template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8f plog2(const Packet8f& _x) { return plog2_float(_x); } template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4d plog2(const Packet4d& _x) { return plog2_double(_x); } template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8f plog1p(const Packet8f& _x) { return generic_plog1p(_x); } template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8f pexpm1(const Packet8f& _x) { return generic_expm1(_x); } // Exponential function. Works by writing "x = m*log(2) + r" where // "m = floor(x/log(2)+1/2)" and "r" is the remainder. The result is then // "exp(x) = 2^m*exp(r)" where exp(r) is in the range [-1,1). template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8f pexp(const Packet8f& _x) { return pexp_float(_x); } // Hyperbolic Tangent function. template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8f ptanh(const Packet8f& _x) { return internal::generic_fast_tanh_float(_x); } // Exponential function for doubles. template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4d pexp(const Packet4d& _x) { return pexp_double(_x); } // Functions for sqrt. // The EIGEN_FAST_MATH version uses the _mm_rsqrt_ps approximation and one step // of Newton's method, at a cost of 1-2 bits of precision as opposed to the // exact solution. It does not handle +inf, or denormalized numbers correctly. // The main advantage of this approach is not just speed, but also the fact that // it can be inlined and pipelined with other computations, further reducing its // effective latency. This is similar to Quake3's fast inverse square root. // For detail see here: http://www.beyond3d.com/content/articles/8/ #if EIGEN_FAST_MATH template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8f psqrt(const Packet8f& _x) { Packet8f minus_half_x = pmul(_x, pset1(-0.5f)); Packet8f denormal_mask = pandnot( pcmp_lt(_x, pset1((std::numeric_limits::min)())), pcmp_lt(_x, pzero(_x))); // Compute approximate reciprocal sqrt. Packet8f x = _mm256_rsqrt_ps(_x); // Do a single step of Newton's iteration. x = pmul(x, pmadd(minus_half_x, pmul(x,x), pset1(1.5f))); // Flush results for denormals to zero. return pandnot(pmul(_x,x), denormal_mask); } #else template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8f psqrt(const Packet8f& _x) { return _mm256_sqrt_ps(_x); } #endif template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4d psqrt(const Packet4d& _x) { return _mm256_sqrt_pd(_x); } #if EIGEN_FAST_MATH template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8f prsqrt(const Packet8f& _x) { _EIGEN_DECLARE_CONST_Packet8f_FROM_INT(inf, 0x7f800000); _EIGEN_DECLARE_CONST_Packet8f(one_point_five, 1.5f); _EIGEN_DECLARE_CONST_Packet8f(minus_half, -0.5f); _EIGEN_DECLARE_CONST_Packet8f_FROM_INT(flt_min, 0x00800000); Packet8f neg_half = pmul(_x, p8f_minus_half); // select only the inverse sqrt of positive normal inputs (denormals are // flushed to zero and cause infs as well). Packet8f lt_min_mask = _mm256_cmp_ps(_x, p8f_flt_min, _CMP_LT_OQ); Packet8f inf_mask = _mm256_cmp_ps(_x, p8f_inf, _CMP_EQ_OQ); Packet8f not_normal_finite_mask = _mm256_or_ps(lt_min_mask, inf_mask); // Compute an approximate result using the rsqrt intrinsic. Packet8f y_approx = _mm256_rsqrt_ps(_x); // Do a single step of Newton-Raphson iteration to improve the approximation. // This uses the formula y_{n+1} = y_n * (1.5 - y_n * (0.5 * x) * y_n). // It is essential to evaluate the inner term like this because forming // y_n^2 may over- or underflow. Packet8f y_newton = pmul(y_approx, pmadd(y_approx, pmul(neg_half, y_approx), p8f_one_point_five)); // Select the result of the Newton-Raphson step for positive normal arguments. // For other arguments, choose the output of the intrinsic. This will // return rsqrt(+inf) = 0, rsqrt(x) = NaN if x < 0, and rsqrt(x) = +inf if // x is zero or a positive denormalized float (equivalent to flushing positive // denormalized inputs to zero). return pselect(not_normal_finite_mask, y_approx, y_newton); } #else template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8f prsqrt(const Packet8f& _x) { _EIGEN_DECLARE_CONST_Packet8f(one, 1.0f); return _mm256_div_ps(p8f_one, _mm256_sqrt_ps(_x)); } #endif template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4d prsqrt(const Packet4d& _x) { _EIGEN_DECLARE_CONST_Packet4d(one, 1.0); return _mm256_div_pd(p4d_one, _mm256_sqrt_pd(_x)); } F16_PACKET_FUNCTION(Packet8f, Packet8h, psin) F16_PACKET_FUNCTION(Packet8f, Packet8h, pcos) F16_PACKET_FUNCTION(Packet8f, Packet8h, plog) F16_PACKET_FUNCTION(Packet8f, Packet8h, plog2) F16_PACKET_FUNCTION(Packet8f, Packet8h, plog1p) F16_PACKET_FUNCTION(Packet8f, Packet8h, pexpm1) F16_PACKET_FUNCTION(Packet8f, Packet8h, pexp) F16_PACKET_FUNCTION(Packet8f, Packet8h, ptanh) F16_PACKET_FUNCTION(Packet8f, Packet8h, psqrt) F16_PACKET_FUNCTION(Packet8f, Packet8h, prsqrt) template <> EIGEN_STRONG_INLINE Packet8h pfrexp(const Packet8h& a, Packet8h& exponent) { Packet8f fexponent; const Packet8h out = float2half(pfrexp(half2float(a), fexponent)); exponent = float2half(fexponent); return out; } template <> EIGEN_STRONG_INLINE Packet8h pldexp(const Packet8h& a, const Packet8h& exponent) { return float2half(pldexp(half2float(a), half2float(exponent))); } BF16_PACKET_FUNCTION(Packet8f, Packet8bf, psin) BF16_PACKET_FUNCTION(Packet8f, Packet8bf, pcos) BF16_PACKET_FUNCTION(Packet8f, Packet8bf, plog) BF16_PACKET_FUNCTION(Packet8f, Packet8bf, plog2) BF16_PACKET_FUNCTION(Packet8f, Packet8bf, plog1p) BF16_PACKET_FUNCTION(Packet8f, Packet8bf, pexpm1) BF16_PACKET_FUNCTION(Packet8f, Packet8bf, pexp) BF16_PACKET_FUNCTION(Packet8f, Packet8bf, ptanh) BF16_PACKET_FUNCTION(Packet8f, Packet8bf, psqrt) BF16_PACKET_FUNCTION(Packet8f, Packet8bf, prsqrt) template <> EIGEN_STRONG_INLINE Packet8bf pfrexp(const Packet8bf& a, Packet8bf& exponent) { Packet8f fexponent; const Packet8bf out = F32ToBf16(pfrexp(Bf16ToF32(a), fexponent)); exponent = F32ToBf16(fexponent); return out; } template <> EIGEN_STRONG_INLINE Packet8bf pldexp(const Packet8bf& a, const Packet8bf& exponent) { return F32ToBf16(pldexp(Bf16ToF32(a), Bf16ToF32(exponent))); } } // end namespace internal } // end namespace Eigen #endif // EIGEN_MATH_FUNCTIONS_AVX_H RcppEigen/inst/include/Eigen/src/Core/arch/AVX/PacketMath.h0000644000176200001440000017614014562417221023016 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner (benoit.steiner.goog@gmail.com) // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_PACKET_MATH_AVX_H #define EIGEN_PACKET_MATH_AVX_H namespace Eigen { namespace internal { #ifndef EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD #define EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD 8 #endif #if !defined(EIGEN_VECTORIZE_AVX512) && !defined(EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS) #define EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS 16 #endif #ifdef EIGEN_VECTORIZE_FMA #ifndef EIGEN_HAS_SINGLE_INSTRUCTION_MADD #define EIGEN_HAS_SINGLE_INSTRUCTION_MADD #endif #endif typedef __m256 Packet8f; typedef __m256i Packet8i; typedef __m256d Packet4d; typedef eigen_packet_wrapper<__m128i, 2> Packet8h; typedef eigen_packet_wrapper<__m128i, 3> Packet8bf; template<> struct is_arithmetic<__m256> { enum { value = true }; }; template<> struct is_arithmetic<__m256i> { enum { value = true }; }; template<> struct is_arithmetic<__m256d> { enum { value = true }; }; template<> struct is_arithmetic { enum { value = true }; }; template<> struct is_arithmetic { enum { value = true }; }; #define _EIGEN_DECLARE_CONST_Packet8f(NAME,X) \ const Packet8f p8f_##NAME = pset1(X) #define _EIGEN_DECLARE_CONST_Packet4d(NAME,X) \ const Packet4d p4d_##NAME = pset1(X) #define _EIGEN_DECLARE_CONST_Packet8f_FROM_INT(NAME,X) \ const Packet8f p8f_##NAME = _mm256_castsi256_ps(pset1(X)) #define _EIGEN_DECLARE_CONST_Packet8i(NAME,X) \ const Packet8i p8i_##NAME = pset1(X) // Use the packet_traits defined in AVX512/PacketMath.h instead if we're going // to leverage AVX512 instructions. #ifndef EIGEN_VECTORIZE_AVX512 template<> struct packet_traits : default_packet_traits { typedef Packet8f type; typedef Packet4f half; enum { Vectorizable = 1, AlignedOnScalar = 1, size = 8, HasHalfPacket = 1, HasCmp = 1, HasDiv = 1, HasSin = EIGEN_FAST_MATH, HasCos = EIGEN_FAST_MATH, HasLog = 1, HasLog1p = 1, HasExpm1 = 1, HasExp = 1, HasNdtri = 1, HasBessel = 1, HasSqrt = 1, HasRsqrt = 1, HasTanh = EIGEN_FAST_MATH, HasErf = EIGEN_FAST_MATH, HasBlend = 1, HasRound = 1, HasFloor = 1, HasCeil = 1, HasRint = 1 }; }; template<> struct packet_traits : default_packet_traits { typedef Packet4d type; typedef Packet2d half; enum { Vectorizable = 1, AlignedOnScalar = 1, size=4, HasHalfPacket = 1, HasCmp = 1, HasDiv = 1, HasLog = 1, HasExp = 1, HasSqrt = 1, HasRsqrt = 1, HasBlend = 1, HasRound = 1, HasFloor = 1, HasCeil = 1, HasRint = 1 }; }; template <> struct packet_traits : default_packet_traits { typedef Packet8h type; // There is no half-size packet for Packet8h. typedef Packet8h half; enum { Vectorizable = 1, AlignedOnScalar = 1, size = 8, HasHalfPacket = 0, HasCmp = 1, HasAdd = 1, HasSub = 1, HasMul = 1, HasDiv = 1, HasSin = EIGEN_FAST_MATH, HasCos = EIGEN_FAST_MATH, HasNegate = 1, HasAbs = 1, HasAbs2 = 0, HasMin = 1, HasMax = 1, HasConj = 1, HasSetLinear = 0, HasLog = 1, HasLog1p = 1, HasExpm1 = 1, HasExp = 1, HasSqrt = 1, HasRsqrt = 1, HasTanh = EIGEN_FAST_MATH, HasErf = EIGEN_FAST_MATH, HasBlend = 0, HasRound = 1, HasFloor = 1, HasCeil = 1, HasRint = 1, HasBessel = 1, HasNdtri = 1 }; }; template <> struct packet_traits : default_packet_traits { typedef Packet8bf type; // There is no half-size packet for current Packet8bf. // TODO: support as SSE path. typedef Packet8bf half; enum { Vectorizable = 1, AlignedOnScalar = 1, size = 8, HasHalfPacket = 0, HasCmp = 1, HasAdd = 1, HasSub = 1, HasMul = 1, HasDiv = 1, HasSin = EIGEN_FAST_MATH, HasCos = EIGEN_FAST_MATH, HasNegate = 1, HasAbs = 1, HasAbs2 = 0, HasMin = 1, HasMax = 1, HasConj = 1, HasSetLinear = 0, HasLog = 1, HasLog1p = 1, HasExpm1 = 1, HasExp = 1, HasSqrt = 1, HasRsqrt = 1, HasTanh = EIGEN_FAST_MATH, HasErf = EIGEN_FAST_MATH, HasBlend = 0, HasRound = 1, HasFloor = 1, HasCeil = 1, HasRint = 1, HasBessel = 1, HasNdtri = 1 }; }; #endif template<> struct scalar_div_cost { enum { value = 14 }; }; template<> struct scalar_div_cost { enum { value = 16 }; }; /* Proper support for integers is only provided by AVX2. In the meantime, we'll use SSE instructions and packets to deal with integers. template<> struct packet_traits : default_packet_traits { typedef Packet8i type; enum { Vectorizable = 1, AlignedOnScalar = 1, size=8 }; }; */ template<> struct unpacket_traits { typedef float type; typedef Packet4f half; typedef Packet8i integer_packet; typedef uint8_t mask_t; enum {size=8, alignment=Aligned32, vectorizable=true, masked_load_available=true, masked_store_available=true}; }; template<> struct unpacket_traits { typedef double type; typedef Packet2d half; enum {size=4, alignment=Aligned32, vectorizable=true, masked_load_available=false, masked_store_available=false}; }; template<> struct unpacket_traits { typedef int type; typedef Packet4i half; enum {size=8, alignment=Aligned32, vectorizable=false, masked_load_available=false, masked_store_available=false}; }; template<> struct unpacket_traits { typedef bfloat16 type; typedef Packet8bf half; enum {size=8, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; }; // Helper function for bit packing snippet of low precision comparison. // It packs the flags from 16x16 to 8x16. EIGEN_STRONG_INLINE __m128i Pack16To8(Packet8f rf) { return _mm_packs_epi32(_mm256_extractf128_si256(_mm256_castps_si256(rf), 0), _mm256_extractf128_si256(_mm256_castps_si256(rf), 1)); } template<> EIGEN_STRONG_INLINE Packet8f pset1(const float& from) { return _mm256_set1_ps(from); } template<> EIGEN_STRONG_INLINE Packet4d pset1(const double& from) { return _mm256_set1_pd(from); } template<> EIGEN_STRONG_INLINE Packet8i pset1(const int& from) { return _mm256_set1_epi32(from); } template<> EIGEN_STRONG_INLINE Packet8f pset1frombits(unsigned int from) { return _mm256_castsi256_ps(pset1(from)); } template<> EIGEN_STRONG_INLINE Packet4d pset1frombits(uint64_t from) { return _mm256_castsi256_pd(_mm256_set1_epi64x(from)); } template<> EIGEN_STRONG_INLINE Packet8f pzero(const Packet8f& /*a*/) { return _mm256_setzero_ps(); } template<> EIGEN_STRONG_INLINE Packet4d pzero(const Packet4d& /*a*/) { return _mm256_setzero_pd(); } template<> EIGEN_STRONG_INLINE Packet8i pzero(const Packet8i& /*a*/) { return _mm256_setzero_si256(); } template<> EIGEN_STRONG_INLINE Packet8f peven_mask(const Packet8f& /*a*/) { return _mm256_castsi256_ps(_mm256_set_epi32(0, -1, 0, -1, 0, -1, 0, -1)); } template<> EIGEN_STRONG_INLINE Packet8i peven_mask(const Packet8i& /*a*/) { return _mm256_set_epi32(0, -1, 0, -1, 0, -1, 0, -1); } template<> EIGEN_STRONG_INLINE Packet4d peven_mask(const Packet4d& /*a*/) { return _mm256_castsi256_pd(_mm256_set_epi32(0, 0, -1, -1, 0, 0, -1, -1)); } template<> EIGEN_STRONG_INLINE Packet8f pload1(const float* from) { return _mm256_broadcast_ss(from); } template<> EIGEN_STRONG_INLINE Packet4d pload1(const double* from) { return _mm256_broadcast_sd(from); } template<> EIGEN_STRONG_INLINE Packet8f plset(const float& a) { return _mm256_add_ps(_mm256_set1_ps(a), _mm256_set_ps(7.0,6.0,5.0,4.0,3.0,2.0,1.0,0.0)); } template<> EIGEN_STRONG_INLINE Packet4d plset(const double& a) { return _mm256_add_pd(_mm256_set1_pd(a), _mm256_set_pd(3.0,2.0,1.0,0.0)); } template<> EIGEN_STRONG_INLINE Packet8f padd(const Packet8f& a, const Packet8f& b) { return _mm256_add_ps(a,b); } template<> EIGEN_STRONG_INLINE Packet4d padd(const Packet4d& a, const Packet4d& b) { return _mm256_add_pd(a,b); } template<> EIGEN_STRONG_INLINE Packet8i padd(const Packet8i& a, const Packet8i& b) { #ifdef EIGEN_VECTORIZE_AVX2 return _mm256_add_epi32(a,b); #else __m128i lo = _mm_add_epi32(_mm256_extractf128_si256(a, 0), _mm256_extractf128_si256(b, 0)); __m128i hi = _mm_add_epi32(_mm256_extractf128_si256(a, 1), _mm256_extractf128_si256(b, 1)); return _mm256_insertf128_si256(_mm256_castsi128_si256(lo), (hi), 1); #endif } template<> EIGEN_STRONG_INLINE Packet8f psub(const Packet8f& a, const Packet8f& b) { return _mm256_sub_ps(a,b); } template<> EIGEN_STRONG_INLINE Packet4d psub(const Packet4d& a, const Packet4d& b) { return _mm256_sub_pd(a,b); } template<> EIGEN_STRONG_INLINE Packet8i psub(const Packet8i& a, const Packet8i& b) { #ifdef EIGEN_VECTORIZE_AVX2 return _mm256_sub_epi32(a,b); #else __m128i lo = _mm_sub_epi32(_mm256_extractf128_si256(a, 0), _mm256_extractf128_si256(b, 0)); __m128i hi = _mm_sub_epi32(_mm256_extractf128_si256(a, 1), _mm256_extractf128_si256(b, 1)); return _mm256_insertf128_si256(_mm256_castsi128_si256(lo), (hi), 1); #endif } template<> EIGEN_STRONG_INLINE Packet8f pnegate(const Packet8f& a) { return _mm256_sub_ps(_mm256_set1_ps(0.0),a); } template<> EIGEN_STRONG_INLINE Packet4d pnegate(const Packet4d& a) { return _mm256_sub_pd(_mm256_set1_pd(0.0),a); } template<> EIGEN_STRONG_INLINE Packet8f pconj(const Packet8f& a) { return a; } template<> EIGEN_STRONG_INLINE Packet4d pconj(const Packet4d& a) { return a; } template<> EIGEN_STRONG_INLINE Packet8i pconj(const Packet8i& a) { return a; } template<> EIGEN_STRONG_INLINE Packet8f pmul(const Packet8f& a, const Packet8f& b) { return _mm256_mul_ps(a,b); } template<> EIGEN_STRONG_INLINE Packet4d pmul(const Packet4d& a, const Packet4d& b) { return _mm256_mul_pd(a,b); } template<> EIGEN_STRONG_INLINE Packet8i pmul(const Packet8i& a, const Packet8i& b) { #ifdef EIGEN_VECTORIZE_AVX2 return _mm256_mullo_epi32(a,b); #else const __m128i lo = _mm_mullo_epi32(_mm256_extractf128_si256(a, 0), _mm256_extractf128_si256(b, 0)); const __m128i hi = _mm_mullo_epi32(_mm256_extractf128_si256(a, 1), _mm256_extractf128_si256(b, 1)); return _mm256_insertf128_si256(_mm256_castsi128_si256(lo), (hi), 1); #endif } template<> EIGEN_STRONG_INLINE Packet8f pdiv(const Packet8f& a, const Packet8f& b) { return _mm256_div_ps(a,b); } template<> EIGEN_STRONG_INLINE Packet4d pdiv(const Packet4d& a, const Packet4d& b) { return _mm256_div_pd(a,b); } template<> EIGEN_STRONG_INLINE Packet8i pdiv(const Packet8i& /*a*/, const Packet8i& /*b*/) { eigen_assert(false && "packet integer division are not supported by AVX"); return pset1(0); } #ifdef EIGEN_VECTORIZE_FMA template<> EIGEN_STRONG_INLINE Packet8f pmadd(const Packet8f& a, const Packet8f& b, const Packet8f& c) { #if ( (EIGEN_COMP_GNUC_STRICT && EIGEN_COMP_GNUC<80) || (EIGEN_COMP_CLANG) ) // Clang stupidly generates a vfmadd213ps instruction plus some vmovaps on registers, // and even register spilling with clang>=6.0 (bug 1637). // Gcc stupidly generates a vfmadd132ps instruction. // So let's enforce it to generate a vfmadd231ps instruction since the most common use // case is to accumulate the result of the product. Packet8f res = c; __asm__("vfmadd231ps %[a], %[b], %[c]" : [c] "+x" (res) : [a] "x" (a), [b] "x" (b)); return res; #else return _mm256_fmadd_ps(a,b,c); #endif } template<> EIGEN_STRONG_INLINE Packet4d pmadd(const Packet4d& a, const Packet4d& b, const Packet4d& c) { #if ( (EIGEN_COMP_GNUC_STRICT && EIGEN_COMP_GNUC<80) || (EIGEN_COMP_CLANG) ) // see above Packet4d res = c; __asm__("vfmadd231pd %[a], %[b], %[c]" : [c] "+x" (res) : [a] "x" (a), [b] "x" (b)); return res; #else return _mm256_fmadd_pd(a,b,c); #endif } #endif template<> EIGEN_STRONG_INLINE Packet8f pcmp_le(const Packet8f& a, const Packet8f& b) { return _mm256_cmp_ps(a,b,_CMP_LE_OQ); } template<> EIGEN_STRONG_INLINE Packet8f pcmp_lt(const Packet8f& a, const Packet8f& b) { return _mm256_cmp_ps(a,b,_CMP_LT_OQ); } template<> EIGEN_STRONG_INLINE Packet8f pcmp_lt_or_nan(const Packet8f& a, const Packet8f& b) { return _mm256_cmp_ps(a, b, _CMP_NGE_UQ); } template<> EIGEN_STRONG_INLINE Packet8f pcmp_eq(const Packet8f& a, const Packet8f& b) { return _mm256_cmp_ps(a,b,_CMP_EQ_OQ); } template<> EIGEN_STRONG_INLINE Packet4d pcmp_le(const Packet4d& a, const Packet4d& b) { return _mm256_cmp_pd(a,b,_CMP_LE_OQ); } template<> EIGEN_STRONG_INLINE Packet4d pcmp_lt(const Packet4d& a, const Packet4d& b) { return _mm256_cmp_pd(a,b,_CMP_LT_OQ); } template<> EIGEN_STRONG_INLINE Packet4d pcmp_lt_or_nan(const Packet4d& a, const Packet4d& b) { return _mm256_cmp_pd(a, b, _CMP_NGE_UQ); } template<> EIGEN_STRONG_INLINE Packet4d pcmp_eq(const Packet4d& a, const Packet4d& b) { return _mm256_cmp_pd(a,b,_CMP_EQ_OQ); } template<> EIGEN_STRONG_INLINE Packet8i pcmp_eq(const Packet8i& a, const Packet8i& b) { #ifdef EIGEN_VECTORIZE_AVX2 return _mm256_cmpeq_epi32(a,b); #else __m128i lo = _mm_cmpeq_epi32(_mm256_extractf128_si256(a, 0), _mm256_extractf128_si256(b, 0)); __m128i hi = _mm_cmpeq_epi32(_mm256_extractf128_si256(a, 1), _mm256_extractf128_si256(b, 1)); return _mm256_insertf128_si256(_mm256_castsi128_si256(lo), (hi), 1); #endif } template<> EIGEN_STRONG_INLINE Packet8f pmin(const Packet8f& a, const Packet8f& b) { #if EIGEN_COMP_GNUC && EIGEN_COMP_GNUC < 63 // There appears to be a bug in GCC, by which the optimizer may flip // the argument order in calls to _mm_min_ps/_mm_max_ps, so we have to // resort to inline ASM here. This is supposed to be fixed in gcc6.3, // see also: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72867 Packet8f res; asm("vminps %[a], %[b], %[res]" : [res] "=x" (res) : [a] "x" (a), [b] "x" (b)); return res; #else // Arguments are swapped to match NaN propagation behavior of std::min. return _mm256_min_ps(b,a); #endif } template<> EIGEN_STRONG_INLINE Packet4d pmin(const Packet4d& a, const Packet4d& b) { #if EIGEN_COMP_GNUC && EIGEN_COMP_GNUC < 63 // See pmin above Packet4d res; asm("vminpd %[a], %[b], %[res]" : [res] "=x" (res) : [a] "x" (a), [b] "x" (b)); return res; #else // Arguments are swapped to match NaN propagation behavior of std::min. return _mm256_min_pd(b,a); #endif } template<> EIGEN_STRONG_INLINE Packet8f pmax(const Packet8f& a, const Packet8f& b) { #if EIGEN_COMP_GNUC && EIGEN_COMP_GNUC < 63 // See pmin above Packet8f res; asm("vmaxps %[a], %[b], %[res]" : [res] "=x" (res) : [a] "x" (a), [b] "x" (b)); return res; #else // Arguments are swapped to match NaN propagation behavior of std::max. return _mm256_max_ps(b,a); #endif } template<> EIGEN_STRONG_INLINE Packet4d pmax(const Packet4d& a, const Packet4d& b) { #if EIGEN_COMP_GNUC && EIGEN_COMP_GNUC < 63 // See pmin above Packet4d res; asm("vmaxpd %[a], %[b], %[res]" : [res] "=x" (res) : [a] "x" (a), [b] "x" (b)); return res; #else // Arguments are swapped to match NaN propagation behavior of std::max. return _mm256_max_pd(b,a); #endif } // Add specializations for min/max with prescribed NaN progation. template<> EIGEN_STRONG_INLINE Packet8f pmin(const Packet8f& a, const Packet8f& b) { return pminmax_propagate_numbers(a, b, pmin); } template<> EIGEN_STRONG_INLINE Packet4d pmin(const Packet4d& a, const Packet4d& b) { return pminmax_propagate_numbers(a, b, pmin); } template<> EIGEN_STRONG_INLINE Packet8f pmax(const Packet8f& a, const Packet8f& b) { return pminmax_propagate_numbers(a, b, pmax); } template<> EIGEN_STRONG_INLINE Packet4d pmax(const Packet4d& a, const Packet4d& b) { return pminmax_propagate_numbers(a, b, pmax); } template<> EIGEN_STRONG_INLINE Packet8f pmin(const Packet8f& a, const Packet8f& b) { return pminmax_propagate_nan(a, b, pmin); } template<> EIGEN_STRONG_INLINE Packet4d pmin(const Packet4d& a, const Packet4d& b) { return pminmax_propagate_nan(a, b, pmin); } template<> EIGEN_STRONG_INLINE Packet8f pmax(const Packet8f& a, const Packet8f& b) { return pminmax_propagate_nan(a, b, pmax); } template<> EIGEN_STRONG_INLINE Packet4d pmax(const Packet4d& a, const Packet4d& b) { return pminmax_propagate_nan(a, b, pmax); } template<> EIGEN_STRONG_INLINE Packet8f print(const Packet8f& a) { return _mm256_round_ps(a, _MM_FROUND_CUR_DIRECTION); } template<> EIGEN_STRONG_INLINE Packet4d print(const Packet4d& a) { return _mm256_round_pd(a, _MM_FROUND_CUR_DIRECTION); } template<> EIGEN_STRONG_INLINE Packet8f pceil(const Packet8f& a) { return _mm256_ceil_ps(a); } template<> EIGEN_STRONG_INLINE Packet4d pceil(const Packet4d& a) { return _mm256_ceil_pd(a); } template<> EIGEN_STRONG_INLINE Packet8f pfloor(const Packet8f& a) { return _mm256_floor_ps(a); } template<> EIGEN_STRONG_INLINE Packet4d pfloor(const Packet4d& a) { return _mm256_floor_pd(a); } template<> EIGEN_STRONG_INLINE Packet8i ptrue(const Packet8i& a) { #ifdef EIGEN_VECTORIZE_AVX2 // vpcmpeqd has lower latency than the more general vcmpps return _mm256_cmpeq_epi32(a,a); #else const __m256 b = _mm256_castsi256_ps(a); return _mm256_castps_si256(_mm256_cmp_ps(b,b,_CMP_TRUE_UQ)); #endif } template<> EIGEN_STRONG_INLINE Packet8f ptrue(const Packet8f& a) { #ifdef EIGEN_VECTORIZE_AVX2 // vpcmpeqd has lower latency than the more general vcmpps const __m256i b = _mm256_castps_si256(a); return _mm256_castsi256_ps(_mm256_cmpeq_epi32(b,b)); #else return _mm256_cmp_ps(a,a,_CMP_TRUE_UQ); #endif } template<> EIGEN_STRONG_INLINE Packet4d ptrue(const Packet4d& a) { #ifdef EIGEN_VECTORIZE_AVX2 // vpcmpeqq has lower latency than the more general vcmppd const __m256i b = _mm256_castpd_si256(a); return _mm256_castsi256_pd(_mm256_cmpeq_epi64(b,b)); #else return _mm256_cmp_pd(a,a,_CMP_TRUE_UQ); #endif } template<> EIGEN_STRONG_INLINE Packet8f pand(const Packet8f& a, const Packet8f& b) { return _mm256_and_ps(a,b); } template<> EIGEN_STRONG_INLINE Packet4d pand(const Packet4d& a, const Packet4d& b) { return _mm256_and_pd(a,b); } template<> EIGEN_STRONG_INLINE Packet8i pand(const Packet8i& a, const Packet8i& b) { #ifdef EIGEN_VECTORIZE_AVX2 return _mm256_and_si256(a,b); #else return _mm256_castps_si256(_mm256_and_ps(_mm256_castsi256_ps(a),_mm256_castsi256_ps(b))); #endif } template<> EIGEN_STRONG_INLINE Packet8f por(const Packet8f& a, const Packet8f& b) { return _mm256_or_ps(a,b); } template<> EIGEN_STRONG_INLINE Packet4d por(const Packet4d& a, const Packet4d& b) { return _mm256_or_pd(a,b); } template<> EIGEN_STRONG_INLINE Packet8i por(const Packet8i& a, const Packet8i& b) { #ifdef EIGEN_VECTORIZE_AVX2 return _mm256_or_si256(a,b); #else return _mm256_castps_si256(_mm256_or_ps(_mm256_castsi256_ps(a),_mm256_castsi256_ps(b))); #endif } template<> EIGEN_STRONG_INLINE Packet8f pxor(const Packet8f& a, const Packet8f& b) { return _mm256_xor_ps(a,b); } template<> EIGEN_STRONG_INLINE Packet4d pxor(const Packet4d& a, const Packet4d& b) { return _mm256_xor_pd(a,b); } template<> EIGEN_STRONG_INLINE Packet8i pxor(const Packet8i& a, const Packet8i& b) { #ifdef EIGEN_VECTORIZE_AVX2 return _mm256_xor_si256(a,b); #else return _mm256_castps_si256(_mm256_xor_ps(_mm256_castsi256_ps(a),_mm256_castsi256_ps(b))); #endif } template<> EIGEN_STRONG_INLINE Packet8f pandnot(const Packet8f& a, const Packet8f& b) { return _mm256_andnot_ps(b,a); } template<> EIGEN_STRONG_INLINE Packet4d pandnot(const Packet4d& a, const Packet4d& b) { return _mm256_andnot_pd(b,a); } template<> EIGEN_STRONG_INLINE Packet8i pandnot(const Packet8i& a, const Packet8i& b) { #ifdef EIGEN_VECTORIZE_AVX2 return _mm256_andnot_si256(b,a); #else return _mm256_castps_si256(_mm256_andnot_ps(_mm256_castsi256_ps(b),_mm256_castsi256_ps(a))); #endif } template<> EIGEN_STRONG_INLINE Packet8f pround(const Packet8f& a) { const Packet8f mask = pset1frombits(static_cast(0x80000000u)); const Packet8f prev0dot5 = pset1frombits(static_cast(0x3EFFFFFFu)); return _mm256_round_ps(padd(por(pand(a, mask), prev0dot5), a), _MM_FROUND_TO_ZERO); } template<> EIGEN_STRONG_INLINE Packet4d pround(const Packet4d& a) { const Packet4d mask = pset1frombits(static_cast(0x8000000000000000ull)); const Packet4d prev0dot5 = pset1frombits(static_cast(0x3FDFFFFFFFFFFFFFull)); return _mm256_round_pd(padd(por(pand(a, mask), prev0dot5), a), _MM_FROUND_TO_ZERO); } template<> EIGEN_STRONG_INLINE Packet8f pselect(const Packet8f& mask, const Packet8f& a, const Packet8f& b) { return _mm256_blendv_ps(b,a,mask); } template<> EIGEN_STRONG_INLINE Packet4d pselect(const Packet4d& mask, const Packet4d& a, const Packet4d& b) { return _mm256_blendv_pd(b,a,mask); } template EIGEN_STRONG_INLINE Packet8i parithmetic_shift_right(Packet8i a) { #ifdef EIGEN_VECTORIZE_AVX2 return _mm256_srai_epi32(a, N); #else __m128i lo = _mm_srai_epi32(_mm256_extractf128_si256(a, 0), N); __m128i hi = _mm_srai_epi32(_mm256_extractf128_si256(a, 1), N); return _mm256_insertf128_si256(_mm256_castsi128_si256(lo), (hi), 1); #endif } template EIGEN_STRONG_INLINE Packet8i plogical_shift_right(Packet8i a) { #ifdef EIGEN_VECTORIZE_AVX2 return _mm256_srli_epi32(a, N); #else __m128i lo = _mm_srli_epi32(_mm256_extractf128_si256(a, 0), N); __m128i hi = _mm_srli_epi32(_mm256_extractf128_si256(a, 1), N); return _mm256_insertf128_si256(_mm256_castsi128_si256(lo), (hi), 1); #endif } template EIGEN_STRONG_INLINE Packet8i plogical_shift_left(Packet8i a) { #ifdef EIGEN_VECTORIZE_AVX2 return _mm256_slli_epi32(a, N); #else __m128i lo = _mm_slli_epi32(_mm256_extractf128_si256(a, 0), N); __m128i hi = _mm_slli_epi32(_mm256_extractf128_si256(a, 1), N); return _mm256_insertf128_si256(_mm256_castsi128_si256(lo), (hi), 1); #endif } template<> EIGEN_STRONG_INLINE Packet8f pload(const float* from) { EIGEN_DEBUG_ALIGNED_LOAD return _mm256_load_ps(from); } template<> EIGEN_STRONG_INLINE Packet4d pload(const double* from) { EIGEN_DEBUG_ALIGNED_LOAD return _mm256_load_pd(from); } template<> EIGEN_STRONG_INLINE Packet8i pload(const int* from) { EIGEN_DEBUG_ALIGNED_LOAD return _mm256_load_si256(reinterpret_cast(from)); } template<> EIGEN_STRONG_INLINE Packet8f ploadu(const float* from) { EIGEN_DEBUG_UNALIGNED_LOAD return _mm256_loadu_ps(from); } template<> EIGEN_STRONG_INLINE Packet4d ploadu(const double* from) { EIGEN_DEBUG_UNALIGNED_LOAD return _mm256_loadu_pd(from); } template<> EIGEN_STRONG_INLINE Packet8i ploadu(const int* from) { EIGEN_DEBUG_UNALIGNED_LOAD return _mm256_loadu_si256(reinterpret_cast(from)); } template<> EIGEN_STRONG_INLINE Packet8f ploadu(const float* from, uint8_t umask) { Packet8i mask = _mm256_set1_epi8(static_cast(umask)); const Packet8i bit_mask = _mm256_set_epi32(0xffffff7f, 0xffffffbf, 0xffffffdf, 0xffffffef, 0xfffffff7, 0xfffffffb, 0xfffffffd, 0xfffffffe); mask = por(mask, bit_mask); mask = pcmp_eq(mask, _mm256_set1_epi32(0xffffffff)); EIGEN_DEBUG_UNALIGNED_LOAD return _mm256_maskload_ps(from, mask); } // Loads 4 floats from memory a returns the packet {a0, a0 a1, a1, a2, a2, a3, a3} template<> EIGEN_STRONG_INLINE Packet8f ploaddup(const float* from) { // TODO try to find a way to avoid the need of a temporary register // Packet8f tmp = _mm256_castps128_ps256(_mm_loadu_ps(from)); // tmp = _mm256_insertf128_ps(tmp, _mm_movehl_ps(_mm256_castps256_ps128(tmp),_mm256_castps256_ps128(tmp)), 1); // return _mm256_unpacklo_ps(tmp,tmp); // _mm256_insertf128_ps is very slow on Haswell, thus: Packet8f tmp = _mm256_broadcast_ps((const __m128*)(const void*)from); // mimic an "inplace" permutation of the lower 128bits using a blend tmp = _mm256_blend_ps(tmp,_mm256_castps128_ps256(_mm_permute_ps( _mm256_castps256_ps128(tmp), _MM_SHUFFLE(1,0,1,0))), 15); // then we can perform a consistent permutation on the global register to get everything in shape: return _mm256_permute_ps(tmp, _MM_SHUFFLE(3,3,2,2)); } // Loads 2 doubles from memory a returns the packet {a0, a0 a1, a1} template<> EIGEN_STRONG_INLINE Packet4d ploaddup(const double* from) { Packet4d tmp = _mm256_broadcast_pd((const __m128d*)(const void*)from); return _mm256_permute_pd(tmp, 3<<2); } // Loads 2 floats from memory a returns the packet {a0, a0 a0, a0, a1, a1, a1, a1} template<> EIGEN_STRONG_INLINE Packet8f ploadquad(const float* from) { Packet8f tmp = _mm256_castps128_ps256(_mm_broadcast_ss(from)); return _mm256_insertf128_ps(tmp, _mm_broadcast_ss(from+1), 1); } template<> EIGEN_STRONG_INLINE void pstore(float* to, const Packet8f& from) { EIGEN_DEBUG_ALIGNED_STORE _mm256_store_ps(to, from); } template<> EIGEN_STRONG_INLINE void pstore(double* to, const Packet4d& from) { EIGEN_DEBUG_ALIGNED_STORE _mm256_store_pd(to, from); } template<> EIGEN_STRONG_INLINE void pstore(int* to, const Packet8i& from) { EIGEN_DEBUG_ALIGNED_STORE _mm256_storeu_si256(reinterpret_cast<__m256i*>(to), from); } template<> EIGEN_STRONG_INLINE void pstoreu(float* to, const Packet8f& from) { EIGEN_DEBUG_UNALIGNED_STORE _mm256_storeu_ps(to, from); } template<> EIGEN_STRONG_INLINE void pstoreu(double* to, const Packet4d& from) { EIGEN_DEBUG_UNALIGNED_STORE _mm256_storeu_pd(to, from); } template<> EIGEN_STRONG_INLINE void pstoreu(int* to, const Packet8i& from) { EIGEN_DEBUG_UNALIGNED_STORE _mm256_storeu_si256(reinterpret_cast<__m256i*>(to), from); } template<> EIGEN_STRONG_INLINE void pstoreu(float* to, const Packet8f& from, uint8_t umask) { Packet8i mask = _mm256_set1_epi8(static_cast(umask)); const Packet8i bit_mask = _mm256_set_epi32(0xffffff7f, 0xffffffbf, 0xffffffdf, 0xffffffef, 0xfffffff7, 0xfffffffb, 0xfffffffd, 0xfffffffe); mask = por(mask, bit_mask); mask = pcmp_eq(mask, _mm256_set1_epi32(0xffffffff)); EIGEN_DEBUG_UNALIGNED_STORE return _mm256_maskstore_ps(to, mask, from); } // NOTE: leverage _mm256_i32gather_ps and _mm256_i32gather_pd if AVX2 instructions are available // NOTE: for the record the following seems to be slower: return _mm256_i32gather_ps(from, _mm256_set1_epi32(stride), 4); template<> EIGEN_DEVICE_FUNC inline Packet8f pgather(const float* from, Index stride) { return _mm256_set_ps(from[7*stride], from[6*stride], from[5*stride], from[4*stride], from[3*stride], from[2*stride], from[1*stride], from[0*stride]); } template<> EIGEN_DEVICE_FUNC inline Packet4d pgather(const double* from, Index stride) { return _mm256_set_pd(from[3*stride], from[2*stride], from[1*stride], from[0*stride]); } template<> EIGEN_DEVICE_FUNC inline void pscatter(float* to, const Packet8f& from, Index stride) { __m128 low = _mm256_extractf128_ps(from, 0); to[stride*0] = _mm_cvtss_f32(low); to[stride*1] = _mm_cvtss_f32(_mm_shuffle_ps(low, low, 1)); to[stride*2] = _mm_cvtss_f32(_mm_shuffle_ps(low, low, 2)); to[stride*3] = _mm_cvtss_f32(_mm_shuffle_ps(low, low, 3)); __m128 high = _mm256_extractf128_ps(from, 1); to[stride*4] = _mm_cvtss_f32(high); to[stride*5] = _mm_cvtss_f32(_mm_shuffle_ps(high, high, 1)); to[stride*6] = _mm_cvtss_f32(_mm_shuffle_ps(high, high, 2)); to[stride*7] = _mm_cvtss_f32(_mm_shuffle_ps(high, high, 3)); } template<> EIGEN_DEVICE_FUNC inline void pscatter(double* to, const Packet4d& from, Index stride) { __m128d low = _mm256_extractf128_pd(from, 0); to[stride*0] = _mm_cvtsd_f64(low); to[stride*1] = _mm_cvtsd_f64(_mm_shuffle_pd(low, low, 1)); __m128d high = _mm256_extractf128_pd(from, 1); to[stride*2] = _mm_cvtsd_f64(high); to[stride*3] = _mm_cvtsd_f64(_mm_shuffle_pd(high, high, 1)); } template<> EIGEN_STRONG_INLINE void pstore1(float* to, const float& a) { Packet8f pa = pset1(a); pstore(to, pa); } template<> EIGEN_STRONG_INLINE void pstore1(double* to, const double& a) { Packet4d pa = pset1(a); pstore(to, pa); } template<> EIGEN_STRONG_INLINE void pstore1(int* to, const int& a) { Packet8i pa = pset1(a); pstore(to, pa); } #ifndef EIGEN_VECTORIZE_AVX512 template<> EIGEN_STRONG_INLINE void prefetch(const float* addr) { _mm_prefetch((SsePrefetchPtrType)(addr), _MM_HINT_T0); } template<> EIGEN_STRONG_INLINE void prefetch(const double* addr) { _mm_prefetch((SsePrefetchPtrType)(addr), _MM_HINT_T0); } template<> EIGEN_STRONG_INLINE void prefetch(const int* addr) { _mm_prefetch((SsePrefetchPtrType)(addr), _MM_HINT_T0); } #endif template<> EIGEN_STRONG_INLINE float pfirst(const Packet8f& a) { return _mm_cvtss_f32(_mm256_castps256_ps128(a)); } template<> EIGEN_STRONG_INLINE double pfirst(const Packet4d& a) { return _mm_cvtsd_f64(_mm256_castpd256_pd128(a)); } template<> EIGEN_STRONG_INLINE int pfirst(const Packet8i& a) { return _mm_cvtsi128_si32(_mm256_castsi256_si128(a)); } template<> EIGEN_STRONG_INLINE Packet8f preverse(const Packet8f& a) { __m256 tmp = _mm256_shuffle_ps(a,a,0x1b); return _mm256_permute2f128_ps(tmp, tmp, 1); } template<> EIGEN_STRONG_INLINE Packet4d preverse(const Packet4d& a) { __m256d tmp = _mm256_shuffle_pd(a,a,5); return _mm256_permute2f128_pd(tmp, tmp, 1); #if 0 // This version is unlikely to be faster as _mm256_shuffle_ps and _mm256_permute_pd // exhibit the same latency/throughput, but it is here for future reference/benchmarking... __m256d swap_halves = _mm256_permute2f128_pd(a,a,1); return _mm256_permute_pd(swap_halves,5); #endif } // pabs should be ok template<> EIGEN_STRONG_INLINE Packet8f pabs(const Packet8f& a) { const Packet8f mask = _mm256_castsi256_ps(_mm256_setr_epi32(0x7FFFFFFF,0x7FFFFFFF,0x7FFFFFFF,0x7FFFFFFF,0x7FFFFFFF,0x7FFFFFFF,0x7FFFFFFF,0x7FFFFFFF)); return _mm256_and_ps(a,mask); } template<> EIGEN_STRONG_INLINE Packet4d pabs(const Packet4d& a) { const Packet4d mask = _mm256_castsi256_pd(_mm256_setr_epi32(0xFFFFFFFF,0x7FFFFFFF,0xFFFFFFFF,0x7FFFFFFF,0xFFFFFFFF,0x7FFFFFFF,0xFFFFFFFF,0x7FFFFFFF)); return _mm256_and_pd(a,mask); } template<> EIGEN_STRONG_INLINE Packet8f pfrexp(const Packet8f& a, Packet8f& exponent) { return pfrexp_generic(a,exponent); } // Extract exponent without existence of Packet4l. template<> EIGEN_STRONG_INLINE Packet4d pfrexp_generic_get_biased_exponent(const Packet4d& a) { const Packet4d cst_exp_mask = pset1frombits(static_cast(0x7ff0000000000000ull)); __m256i a_expo = _mm256_castpd_si256(pand(a, cst_exp_mask)); #ifdef EIGEN_VECTORIZE_AVX2 a_expo = _mm256_srli_epi64(a_expo, 52); __m128i lo = _mm256_extractf128_si256(a_expo, 0); __m128i hi = _mm256_extractf128_si256(a_expo, 1); #else __m128i lo = _mm256_extractf128_si256(a_expo, 0); __m128i hi = _mm256_extractf128_si256(a_expo, 1); lo = _mm_srli_epi64(lo, 52); hi = _mm_srli_epi64(hi, 52); #endif Packet2d exponent_lo = _mm_cvtepi32_pd(vec4i_swizzle1(lo, 0, 2, 1, 3)); Packet2d exponent_hi = _mm_cvtepi32_pd(vec4i_swizzle1(hi, 0, 2, 1, 3)); Packet4d exponent = _mm256_insertf128_pd(_mm256_setzero_pd(), exponent_lo, 0); exponent = _mm256_insertf128_pd(exponent, exponent_hi, 1); return exponent; } template<> EIGEN_STRONG_INLINE Packet4d pfrexp(const Packet4d& a, Packet4d& exponent) { return pfrexp_generic(a, exponent); } template<> EIGEN_STRONG_INLINE Packet8f pldexp(const Packet8f& a, const Packet8f& exponent) { return pldexp_generic(a, exponent); } template<> EIGEN_STRONG_INLINE Packet4d pldexp(const Packet4d& a, const Packet4d& exponent) { // Clamp exponent to [-2099, 2099] const Packet4d max_exponent = pset1(2099.0); const Packet4i e = _mm256_cvtpd_epi32(pmin(pmax(exponent, pnegate(max_exponent)), max_exponent)); // Split 2^e into four factors and multiply. const Packet4i bias = pset1(1023); Packet4i b = parithmetic_shift_right<2>(e); // floor(e/4) // 2^b Packet4i hi = vec4i_swizzle1(padd(b, bias), 0, 2, 1, 3); Packet4i lo = _mm_slli_epi64(hi, 52); hi = _mm_slli_epi64(_mm_srli_epi64(hi, 32), 52); Packet4d c = _mm256_castsi256_pd(_mm256_insertf128_si256(_mm256_castsi128_si256(lo), hi, 1)); Packet4d out = pmul(pmul(pmul(a, c), c), c); // a * 2^(3b) // 2^(e - 3b) b = psub(psub(psub(e, b), b), b); // e - 3b hi = vec4i_swizzle1(padd(b, bias), 0, 2, 1, 3); lo = _mm_slli_epi64(hi, 52); hi = _mm_slli_epi64(_mm_srli_epi64(hi, 32), 52); c = _mm256_castsi256_pd(_mm256_insertf128_si256(_mm256_castsi128_si256(lo), hi, 1)); out = pmul(out, c); // a * 2^e return out; } template<> EIGEN_STRONG_INLINE float predux(const Packet8f& a) { return predux(Packet4f(_mm_add_ps(_mm256_castps256_ps128(a),_mm256_extractf128_ps(a,1)))); } template<> EIGEN_STRONG_INLINE double predux(const Packet4d& a) { return predux(Packet2d(_mm_add_pd(_mm256_castpd256_pd128(a),_mm256_extractf128_pd(a,1)))); } template<> EIGEN_STRONG_INLINE Packet4f predux_half_dowto4(const Packet8f& a) { return _mm_add_ps(_mm256_castps256_ps128(a),_mm256_extractf128_ps(a,1)); } template<> EIGEN_STRONG_INLINE float predux_mul(const Packet8f& a) { Packet8f tmp; tmp = _mm256_mul_ps(a, _mm256_permute2f128_ps(a,a,1)); tmp = _mm256_mul_ps(tmp, _mm256_shuffle_ps(tmp,tmp,_MM_SHUFFLE(1,0,3,2))); return pfirst(_mm256_mul_ps(tmp, _mm256_shuffle_ps(tmp,tmp,1))); } template<> EIGEN_STRONG_INLINE double predux_mul(const Packet4d& a) { Packet4d tmp; tmp = _mm256_mul_pd(a, _mm256_permute2f128_pd(a,a,1)); return pfirst(_mm256_mul_pd(tmp, _mm256_shuffle_pd(tmp,tmp,1))); } template<> EIGEN_STRONG_INLINE float predux_min(const Packet8f& a) { Packet8f tmp = _mm256_min_ps(a, _mm256_permute2f128_ps(a,a,1)); tmp = _mm256_min_ps(tmp, _mm256_shuffle_ps(tmp,tmp,_MM_SHUFFLE(1,0,3,2))); return pfirst(_mm256_min_ps(tmp, _mm256_shuffle_ps(tmp,tmp,1))); } template<> EIGEN_STRONG_INLINE double predux_min(const Packet4d& a) { Packet4d tmp = _mm256_min_pd(a, _mm256_permute2f128_pd(a,a,1)); return pfirst(_mm256_min_pd(tmp, _mm256_shuffle_pd(tmp, tmp, 1))); } template<> EIGEN_STRONG_INLINE float predux_max(const Packet8f& a) { Packet8f tmp = _mm256_max_ps(a, _mm256_permute2f128_ps(a,a,1)); tmp = _mm256_max_ps(tmp, _mm256_shuffle_ps(tmp,tmp,_MM_SHUFFLE(1,0,3,2))); return pfirst(_mm256_max_ps(tmp, _mm256_shuffle_ps(tmp,tmp,1))); } template<> EIGEN_STRONG_INLINE double predux_max(const Packet4d& a) { Packet4d tmp = _mm256_max_pd(a, _mm256_permute2f128_pd(a,a,1)); return pfirst(_mm256_max_pd(tmp, _mm256_shuffle_pd(tmp, tmp, 1))); } // not needed yet // template<> EIGEN_STRONG_INLINE bool predux_all(const Packet8f& x) // { // return _mm256_movemask_ps(x)==0xFF; // } template<> EIGEN_STRONG_INLINE bool predux_any(const Packet8f& x) { return _mm256_movemask_ps(x)!=0; } EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { __m256 T0 = _mm256_unpacklo_ps(kernel.packet[0], kernel.packet[1]); __m256 T1 = _mm256_unpackhi_ps(kernel.packet[0], kernel.packet[1]); __m256 T2 = _mm256_unpacklo_ps(kernel.packet[2], kernel.packet[3]); __m256 T3 = _mm256_unpackhi_ps(kernel.packet[2], kernel.packet[3]); __m256 T4 = _mm256_unpacklo_ps(kernel.packet[4], kernel.packet[5]); __m256 T5 = _mm256_unpackhi_ps(kernel.packet[4], kernel.packet[5]); __m256 T6 = _mm256_unpacklo_ps(kernel.packet[6], kernel.packet[7]); __m256 T7 = _mm256_unpackhi_ps(kernel.packet[6], kernel.packet[7]); __m256 S0 = _mm256_shuffle_ps(T0,T2,_MM_SHUFFLE(1,0,1,0)); __m256 S1 = _mm256_shuffle_ps(T0,T2,_MM_SHUFFLE(3,2,3,2)); __m256 S2 = _mm256_shuffle_ps(T1,T3,_MM_SHUFFLE(1,0,1,0)); __m256 S3 = _mm256_shuffle_ps(T1,T3,_MM_SHUFFLE(3,2,3,2)); __m256 S4 = _mm256_shuffle_ps(T4,T6,_MM_SHUFFLE(1,0,1,0)); __m256 S5 = _mm256_shuffle_ps(T4,T6,_MM_SHUFFLE(3,2,3,2)); __m256 S6 = _mm256_shuffle_ps(T5,T7,_MM_SHUFFLE(1,0,1,0)); __m256 S7 = _mm256_shuffle_ps(T5,T7,_MM_SHUFFLE(3,2,3,2)); kernel.packet[0] = _mm256_permute2f128_ps(S0, S4, 0x20); kernel.packet[1] = _mm256_permute2f128_ps(S1, S5, 0x20); kernel.packet[2] = _mm256_permute2f128_ps(S2, S6, 0x20); kernel.packet[3] = _mm256_permute2f128_ps(S3, S7, 0x20); kernel.packet[4] = _mm256_permute2f128_ps(S0, S4, 0x31); kernel.packet[5] = _mm256_permute2f128_ps(S1, S5, 0x31); kernel.packet[6] = _mm256_permute2f128_ps(S2, S6, 0x31); kernel.packet[7] = _mm256_permute2f128_ps(S3, S7, 0x31); } EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { __m256 T0 = _mm256_unpacklo_ps(kernel.packet[0], kernel.packet[1]); __m256 T1 = _mm256_unpackhi_ps(kernel.packet[0], kernel.packet[1]); __m256 T2 = _mm256_unpacklo_ps(kernel.packet[2], kernel.packet[3]); __m256 T3 = _mm256_unpackhi_ps(kernel.packet[2], kernel.packet[3]); __m256 S0 = _mm256_shuffle_ps(T0,T2,_MM_SHUFFLE(1,0,1,0)); __m256 S1 = _mm256_shuffle_ps(T0,T2,_MM_SHUFFLE(3,2,3,2)); __m256 S2 = _mm256_shuffle_ps(T1,T3,_MM_SHUFFLE(1,0,1,0)); __m256 S3 = _mm256_shuffle_ps(T1,T3,_MM_SHUFFLE(3,2,3,2)); kernel.packet[0] = _mm256_permute2f128_ps(S0, S1, 0x20); kernel.packet[1] = _mm256_permute2f128_ps(S2, S3, 0x20); kernel.packet[2] = _mm256_permute2f128_ps(S0, S1, 0x31); kernel.packet[3] = _mm256_permute2f128_ps(S2, S3, 0x31); } EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { __m256d T0 = _mm256_shuffle_pd(kernel.packet[0], kernel.packet[1], 15); __m256d T1 = _mm256_shuffle_pd(kernel.packet[0], kernel.packet[1], 0); __m256d T2 = _mm256_shuffle_pd(kernel.packet[2], kernel.packet[3], 15); __m256d T3 = _mm256_shuffle_pd(kernel.packet[2], kernel.packet[3], 0); kernel.packet[1] = _mm256_permute2f128_pd(T0, T2, 32); kernel.packet[3] = _mm256_permute2f128_pd(T0, T2, 49); kernel.packet[0] = _mm256_permute2f128_pd(T1, T3, 32); kernel.packet[2] = _mm256_permute2f128_pd(T1, T3, 49); } template<> EIGEN_STRONG_INLINE Packet8f pblend(const Selector<8>& ifPacket, const Packet8f& thenPacket, const Packet8f& elsePacket) { const __m256 zero = _mm256_setzero_ps(); const __m256 select = _mm256_set_ps(ifPacket.select[7], ifPacket.select[6], ifPacket.select[5], ifPacket.select[4], ifPacket.select[3], ifPacket.select[2], ifPacket.select[1], ifPacket.select[0]); __m256 false_mask = _mm256_cmp_ps(select, zero, _CMP_EQ_UQ); return _mm256_blendv_ps(thenPacket, elsePacket, false_mask); } template<> EIGEN_STRONG_INLINE Packet4d pblend(const Selector<4>& ifPacket, const Packet4d& thenPacket, const Packet4d& elsePacket) { const __m256d zero = _mm256_setzero_pd(); const __m256d select = _mm256_set_pd(ifPacket.select[3], ifPacket.select[2], ifPacket.select[1], ifPacket.select[0]); __m256d false_mask = _mm256_cmp_pd(select, zero, _CMP_EQ_UQ); return _mm256_blendv_pd(thenPacket, elsePacket, false_mask); } // Packet math for Eigen::half template<> struct unpacket_traits { typedef Eigen::half type; enum {size=8, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet8h half; }; template<> EIGEN_STRONG_INLINE Packet8h pset1(const Eigen::half& from) { return _mm_set1_epi16(numext::bit_cast(from)); } template<> EIGEN_STRONG_INLINE Eigen::half pfirst(const Packet8h& from) { return numext::bit_cast(static_cast(_mm_extract_epi16(from, 0))); } template<> EIGEN_STRONG_INLINE Packet8h pload(const Eigen::half* from) { return _mm_load_si128(reinterpret_cast(from)); } template<> EIGEN_STRONG_INLINE Packet8h ploadu(const Eigen::half* from) { return _mm_loadu_si128(reinterpret_cast(from)); } template<> EIGEN_STRONG_INLINE void pstore(Eigen::half* to, const Packet8h& from) { _mm_store_si128(reinterpret_cast<__m128i*>(to), from); } template<> EIGEN_STRONG_INLINE void pstoreu(Eigen::half* to, const Packet8h& from) { _mm_storeu_si128(reinterpret_cast<__m128i*>(to), from); } template<> EIGEN_STRONG_INLINE Packet8h ploaddup(const Eigen::half* from) { const numext::uint16_t a = numext::bit_cast(from[0]); const numext::uint16_t b = numext::bit_cast(from[1]); const numext::uint16_t c = numext::bit_cast(from[2]); const numext::uint16_t d = numext::bit_cast(from[3]); return _mm_set_epi16(d, d, c, c, b, b, a, a); } template<> EIGEN_STRONG_INLINE Packet8h ploadquad(const Eigen::half* from) { const numext::uint16_t a = numext::bit_cast(from[0]); const numext::uint16_t b = numext::bit_cast(from[1]); return _mm_set_epi16(b, b, b, b, a, a, a, a); } template<> EIGEN_STRONG_INLINE Packet8h ptrue(const Packet8h& a) { return _mm_cmpeq_epi32(a, a); } template <> EIGEN_STRONG_INLINE Packet8h pabs(const Packet8h& a) { const __m128i sign_mask = _mm_set1_epi16(static_cast(0x8000)); return _mm_andnot_si128(sign_mask, a); } EIGEN_STRONG_INLINE Packet8f half2float(const Packet8h& a) { #ifdef EIGEN_HAS_FP16_C return _mm256_cvtph_ps(a); #else EIGEN_ALIGN32 Eigen::half aux[8]; pstore(aux, a); float f0(aux[0]); float f1(aux[1]); float f2(aux[2]); float f3(aux[3]); float f4(aux[4]); float f5(aux[5]); float f6(aux[6]); float f7(aux[7]); return _mm256_set_ps(f7, f6, f5, f4, f3, f2, f1, f0); #endif } EIGEN_STRONG_INLINE Packet8h float2half(const Packet8f& a) { #ifdef EIGEN_HAS_FP16_C return _mm256_cvtps_ph(a, _MM_FROUND_TO_NEAREST_INT|_MM_FROUND_NO_EXC); #else EIGEN_ALIGN32 float aux[8]; pstore(aux, a); const numext::uint16_t s0 = numext::bit_cast(Eigen::half(aux[0])); const numext::uint16_t s1 = numext::bit_cast(Eigen::half(aux[1])); const numext::uint16_t s2 = numext::bit_cast(Eigen::half(aux[2])); const numext::uint16_t s3 = numext::bit_cast(Eigen::half(aux[3])); const numext::uint16_t s4 = numext::bit_cast(Eigen::half(aux[4])); const numext::uint16_t s5 = numext::bit_cast(Eigen::half(aux[5])); const numext::uint16_t s6 = numext::bit_cast(Eigen::half(aux[6])); const numext::uint16_t s7 = numext::bit_cast(Eigen::half(aux[7])); return _mm_set_epi16(s7, s6, s5, s4, s3, s2, s1, s0); #endif } template <> EIGEN_STRONG_INLINE Packet8h pmin(const Packet8h& a, const Packet8h& b) { return float2half(pmin(half2float(a), half2float(b))); } template <> EIGEN_STRONG_INLINE Packet8h pmax(const Packet8h& a, const Packet8h& b) { return float2half(pmax(half2float(a), half2float(b))); } template <> EIGEN_STRONG_INLINE Packet8h plset(const half& a) { return float2half(plset(static_cast(a))); } template<> EIGEN_STRONG_INLINE Packet8h por(const Packet8h& a,const Packet8h& b) { // in some cases Packet4i is a wrapper around __m128i, so we either need to // cast to Packet4i to directly call the intrinsics as below: return _mm_or_si128(a,b); } template<> EIGEN_STRONG_INLINE Packet8h pxor(const Packet8h& a,const Packet8h& b) { return _mm_xor_si128(a,b); } template<> EIGEN_STRONG_INLINE Packet8h pand(const Packet8h& a,const Packet8h& b) { return _mm_and_si128(a,b); } template<> EIGEN_STRONG_INLINE Packet8h pandnot(const Packet8h& a,const Packet8h& b) { return _mm_andnot_si128(b,a); } template<> EIGEN_STRONG_INLINE Packet8h pselect(const Packet8h& mask, const Packet8h& a, const Packet8h& b) { return _mm_blendv_epi8(b, a, mask); } template<> EIGEN_STRONG_INLINE Packet8h pround(const Packet8h& a) { return float2half(pround(half2float(a))); } template<> EIGEN_STRONG_INLINE Packet8h print(const Packet8h& a) { return float2half(print(half2float(a))); } template<> EIGEN_STRONG_INLINE Packet8h pceil(const Packet8h& a) { return float2half(pceil(half2float(a))); } template<> EIGEN_STRONG_INLINE Packet8h pfloor(const Packet8h& a) { return float2half(pfloor(half2float(a))); } template<> EIGEN_STRONG_INLINE Packet8h pcmp_eq(const Packet8h& a,const Packet8h& b) { return Pack16To8(pcmp_eq(half2float(a), half2float(b))); } template<> EIGEN_STRONG_INLINE Packet8h pcmp_le(const Packet8h& a,const Packet8h& b) { return Pack16To8(pcmp_le(half2float(a), half2float(b))); } template<> EIGEN_STRONG_INLINE Packet8h pcmp_lt(const Packet8h& a,const Packet8h& b) { return Pack16To8(pcmp_lt(half2float(a), half2float(b))); } template<> EIGEN_STRONG_INLINE Packet8h pcmp_lt_or_nan(const Packet8h& a,const Packet8h& b) { return Pack16To8(pcmp_lt_or_nan(half2float(a), half2float(b))); } template<> EIGEN_STRONG_INLINE Packet8h pconj(const Packet8h& a) { return a; } template<> EIGEN_STRONG_INLINE Packet8h pnegate(const Packet8h& a) { Packet8h sign_mask = _mm_set1_epi16(static_cast(0x8000)); return _mm_xor_si128(a, sign_mask); } template<> EIGEN_STRONG_INLINE Packet8h padd(const Packet8h& a, const Packet8h& b) { Packet8f af = half2float(a); Packet8f bf = half2float(b); Packet8f rf = padd(af, bf); return float2half(rf); } template<> EIGEN_STRONG_INLINE Packet8h psub(const Packet8h& a, const Packet8h& b) { Packet8f af = half2float(a); Packet8f bf = half2float(b); Packet8f rf = psub(af, bf); return float2half(rf); } template<> EIGEN_STRONG_INLINE Packet8h pmul(const Packet8h& a, const Packet8h& b) { Packet8f af = half2float(a); Packet8f bf = half2float(b); Packet8f rf = pmul(af, bf); return float2half(rf); } template<> EIGEN_STRONG_INLINE Packet8h pdiv(const Packet8h& a, const Packet8h& b) { Packet8f af = half2float(a); Packet8f bf = half2float(b); Packet8f rf = pdiv(af, bf); return float2half(rf); } template<> EIGEN_STRONG_INLINE Packet8h pgather(const Eigen::half* from, Index stride) { const numext::uint16_t s0 = numext::bit_cast(from[0*stride]); const numext::uint16_t s1 = numext::bit_cast(from[1*stride]); const numext::uint16_t s2 = numext::bit_cast(from[2*stride]); const numext::uint16_t s3 = numext::bit_cast(from[3*stride]); const numext::uint16_t s4 = numext::bit_cast(from[4*stride]); const numext::uint16_t s5 = numext::bit_cast(from[5*stride]); const numext::uint16_t s6 = numext::bit_cast(from[6*stride]); const numext::uint16_t s7 = numext::bit_cast(from[7*stride]); return _mm_set_epi16(s7, s6, s5, s4, s3, s2, s1, s0); } template<> EIGEN_STRONG_INLINE void pscatter(Eigen::half* to, const Packet8h& from, Index stride) { EIGEN_ALIGN32 Eigen::half aux[8]; pstore(aux, from); to[stride*0] = aux[0]; to[stride*1] = aux[1]; to[stride*2] = aux[2]; to[stride*3] = aux[3]; to[stride*4] = aux[4]; to[stride*5] = aux[5]; to[stride*6] = aux[6]; to[stride*7] = aux[7]; } template<> EIGEN_STRONG_INLINE Eigen::half predux(const Packet8h& a) { Packet8f af = half2float(a); float reduced = predux(af); return Eigen::half(reduced); } template<> EIGEN_STRONG_INLINE Eigen::half predux_max(const Packet8h& a) { Packet8f af = half2float(a); float reduced = predux_max(af); return Eigen::half(reduced); } template<> EIGEN_STRONG_INLINE Eigen::half predux_min(const Packet8h& a) { Packet8f af = half2float(a); float reduced = predux_min(af); return Eigen::half(reduced); } template<> EIGEN_STRONG_INLINE Eigen::half predux_mul(const Packet8h& a) { Packet8f af = half2float(a); float reduced = predux_mul(af); return Eigen::half(reduced); } template<> EIGEN_STRONG_INLINE Packet8h preverse(const Packet8h& a) { __m128i m = _mm_setr_epi8(14,15,12,13,10,11,8,9,6,7,4,5,2,3,0,1); return _mm_shuffle_epi8(a,m); } EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { __m128i a = kernel.packet[0]; __m128i b = kernel.packet[1]; __m128i c = kernel.packet[2]; __m128i d = kernel.packet[3]; __m128i e = kernel.packet[4]; __m128i f = kernel.packet[5]; __m128i g = kernel.packet[6]; __m128i h = kernel.packet[7]; __m128i a03b03 = _mm_unpacklo_epi16(a, b); __m128i c03d03 = _mm_unpacklo_epi16(c, d); __m128i e03f03 = _mm_unpacklo_epi16(e, f); __m128i g03h03 = _mm_unpacklo_epi16(g, h); __m128i a47b47 = _mm_unpackhi_epi16(a, b); __m128i c47d47 = _mm_unpackhi_epi16(c, d); __m128i e47f47 = _mm_unpackhi_epi16(e, f); __m128i g47h47 = _mm_unpackhi_epi16(g, h); __m128i a01b01c01d01 = _mm_unpacklo_epi32(a03b03, c03d03); __m128i a23b23c23d23 = _mm_unpackhi_epi32(a03b03, c03d03); __m128i e01f01g01h01 = _mm_unpacklo_epi32(e03f03, g03h03); __m128i e23f23g23h23 = _mm_unpackhi_epi32(e03f03, g03h03); __m128i a45b45c45d45 = _mm_unpacklo_epi32(a47b47, c47d47); __m128i a67b67c67d67 = _mm_unpackhi_epi32(a47b47, c47d47); __m128i e45f45g45h45 = _mm_unpacklo_epi32(e47f47, g47h47); __m128i e67f67g67h67 = _mm_unpackhi_epi32(e47f47, g47h47); __m128i a0b0c0d0e0f0g0h0 = _mm_unpacklo_epi64(a01b01c01d01, e01f01g01h01); __m128i a1b1c1d1e1f1g1h1 = _mm_unpackhi_epi64(a01b01c01d01, e01f01g01h01); __m128i a2b2c2d2e2f2g2h2 = _mm_unpacklo_epi64(a23b23c23d23, e23f23g23h23); __m128i a3b3c3d3e3f3g3h3 = _mm_unpackhi_epi64(a23b23c23d23, e23f23g23h23); __m128i a4b4c4d4e4f4g4h4 = _mm_unpacklo_epi64(a45b45c45d45, e45f45g45h45); __m128i a5b5c5d5e5f5g5h5 = _mm_unpackhi_epi64(a45b45c45d45, e45f45g45h45); __m128i a6b6c6d6e6f6g6h6 = _mm_unpacklo_epi64(a67b67c67d67, e67f67g67h67); __m128i a7b7c7d7e7f7g7h7 = _mm_unpackhi_epi64(a67b67c67d67, e67f67g67h67); kernel.packet[0] = a0b0c0d0e0f0g0h0; kernel.packet[1] = a1b1c1d1e1f1g1h1; kernel.packet[2] = a2b2c2d2e2f2g2h2; kernel.packet[3] = a3b3c3d3e3f3g3h3; kernel.packet[4] = a4b4c4d4e4f4g4h4; kernel.packet[5] = a5b5c5d5e5f5g5h5; kernel.packet[6] = a6b6c6d6e6f6g6h6; kernel.packet[7] = a7b7c7d7e7f7g7h7; } EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { EIGEN_ALIGN32 Eigen::half in[4][8]; pstore(in[0], kernel.packet[0]); pstore(in[1], kernel.packet[1]); pstore(in[2], kernel.packet[2]); pstore(in[3], kernel.packet[3]); EIGEN_ALIGN32 Eigen::half out[4][8]; for (int i = 0; i < 4; ++i) { for (int j = 0; j < 4; ++j) { out[i][j] = in[j][2*i]; } for (int j = 0; j < 4; ++j) { out[i][j+4] = in[j][2*i+1]; } } kernel.packet[0] = pload(out[0]); kernel.packet[1] = pload(out[1]); kernel.packet[2] = pload(out[2]); kernel.packet[3] = pload(out[3]); } // BFloat16 implementation. EIGEN_STRONG_INLINE Packet8f Bf16ToF32(const Packet8bf& a) { #ifdef EIGEN_VECTORIZE_AVX2 __m256i extend = _mm256_cvtepu16_epi32(a); return _mm256_castsi256_ps(_mm256_slli_epi32(extend, 16)); #else __m128i lo = _mm_cvtepu16_epi32(a); __m128i hi = _mm_cvtepu16_epi32(_mm_srli_si128(a, 8)); __m128i lo_shift = _mm_slli_epi32(lo, 16); __m128i hi_shift = _mm_slli_epi32(hi, 16); return _mm256_castsi256_ps(_mm256_insertf128_si256(_mm256_castsi128_si256(lo_shift), hi_shift, 1)); #endif } // Convert float to bfloat16 according to round-to-nearest-even/denormals algorithm. EIGEN_STRONG_INLINE Packet8bf F32ToBf16(const Packet8f& a) { Packet8bf r; __m256i input = _mm256_castps_si256(a); #ifdef EIGEN_VECTORIZE_AVX2 // uint32_t lsb = (input >> 16); __m256i t = _mm256_srli_epi32(input, 16); // uint32_t lsb = lsb & 1; t = _mm256_and_si256(t, _mm256_set1_epi32(1)); // uint32_t rounding_bias = 0x7fff + lsb; t = _mm256_add_epi32(t, _mm256_set1_epi32(0x7fff)); // input += rounding_bias; t = _mm256_add_epi32(t, input); // input = input >> 16; t = _mm256_srli_epi32(t, 16); // Check NaN before converting back to bf16 __m256 mask = _mm256_cmp_ps(a, a, _CMP_ORD_Q); __m256i nan = _mm256_set1_epi32(0x7fc0); t = _mm256_blendv_epi8(nan, t, _mm256_castps_si256(mask)); // output = numext::bit_cast(input); return _mm_packus_epi32(_mm256_extractf128_si256(t, 0), _mm256_extractf128_si256(t, 1)); #else // uint32_t lsb = (input >> 16); __m128i lo = _mm_srli_epi32(_mm256_extractf128_si256(input, 0), 16); __m128i hi = _mm_srli_epi32(_mm256_extractf128_si256(input, 1), 16); // uint32_t lsb = lsb & 1; lo = _mm_and_si128(lo, _mm_set1_epi32(1)); hi = _mm_and_si128(hi, _mm_set1_epi32(1)); // uint32_t rounding_bias = 0x7fff + lsb; lo = _mm_add_epi32(lo, _mm_set1_epi32(0x7fff)); hi = _mm_add_epi32(hi, _mm_set1_epi32(0x7fff)); // input += rounding_bias; lo = _mm_add_epi32(lo, _mm256_extractf128_si256(input, 0)); hi = _mm_add_epi32(hi, _mm256_extractf128_si256(input, 1)); // input = input >> 16; lo = _mm_srli_epi32(lo, 16); hi = _mm_srli_epi32(hi, 16); // Check NaN before converting back to bf16 __m256 mask = _mm256_cmp_ps(a, a, _CMP_ORD_Q); __m128i nan = _mm_set1_epi32(0x7fc0); lo = _mm_blendv_epi8(nan, lo, _mm_castps_si128(_mm256_castps256_ps128(mask))); hi = _mm_blendv_epi8(nan, hi, _mm_castps_si128(_mm256_extractf128_ps(mask, 1))); // output = numext::bit_cast(input); return _mm_packus_epi32(lo, hi); #endif } template<> EIGEN_STRONG_INLINE Packet8bf pset1(const bfloat16& from) { return _mm_set1_epi16(numext::bit_cast(from)); } template<> EIGEN_STRONG_INLINE bfloat16 pfirst(const Packet8bf& from) { return numext::bit_cast(static_cast(_mm_extract_epi16(from, 0))); } template<> EIGEN_STRONG_INLINE Packet8bf pload(const bfloat16* from) { return _mm_load_si128(reinterpret_cast(from)); } template<> EIGEN_STRONG_INLINE Packet8bf ploadu(const bfloat16* from) { return _mm_loadu_si128(reinterpret_cast(from)); } template<> EIGEN_STRONG_INLINE void pstore(bfloat16* to, const Packet8bf& from) { _mm_store_si128(reinterpret_cast<__m128i*>(to), from); } template<> EIGEN_STRONG_INLINE void pstoreu(bfloat16* to, const Packet8bf& from) { _mm_storeu_si128(reinterpret_cast<__m128i*>(to), from); } template<> EIGEN_STRONG_INLINE Packet8bf ploaddup(const bfloat16* from) { const numext::uint16_t a = numext::bit_cast(from[0]); const numext::uint16_t b = numext::bit_cast(from[1]); const numext::uint16_t c = numext::bit_cast(from[2]); const numext::uint16_t d = numext::bit_cast(from[3]); return _mm_set_epi16(d, d, c, c, b, b, a, a); } template<> EIGEN_STRONG_INLINE Packet8bf ploadquad(const bfloat16* from) { const numext::uint16_t a = numext::bit_cast(from[0]); const numext::uint16_t b = numext::bit_cast(from[1]); return _mm_set_epi16(b, b, b, b, a, a, a, a); } template<> EIGEN_STRONG_INLINE Packet8bf ptrue(const Packet8bf& a) { return _mm_cmpeq_epi32(a, a); } template <> EIGEN_STRONG_INLINE Packet8bf pabs(const Packet8bf& a) { const __m128i sign_mask = _mm_set1_epi16(static_cast(0x8000)); return _mm_andnot_si128(sign_mask, a); } template <> EIGEN_STRONG_INLINE Packet8bf pmin(const Packet8bf& a, const Packet8bf& b) { return F32ToBf16(pmin(Bf16ToF32(a), Bf16ToF32(b))); } template <> EIGEN_STRONG_INLINE Packet8bf pmax(const Packet8bf& a, const Packet8bf& b) { return F32ToBf16(pmax(Bf16ToF32(a), Bf16ToF32(b))); } template <> EIGEN_STRONG_INLINE Packet8bf plset(const bfloat16& a) { return F32ToBf16(plset(static_cast(a))); } template<> EIGEN_STRONG_INLINE Packet8bf por(const Packet8bf& a,const Packet8bf& b) { return _mm_or_si128(a,b); } template<> EIGEN_STRONG_INLINE Packet8bf pxor(const Packet8bf& a,const Packet8bf& b) { return _mm_xor_si128(a,b); } template<> EIGEN_STRONG_INLINE Packet8bf pand(const Packet8bf& a,const Packet8bf& b) { return _mm_and_si128(a,b); } template<> EIGEN_STRONG_INLINE Packet8bf pandnot(const Packet8bf& a,const Packet8bf& b) { return _mm_andnot_si128(b,a); } template<> EIGEN_STRONG_INLINE Packet8bf pselect(const Packet8bf& mask, const Packet8bf& a, const Packet8bf& b) { return _mm_blendv_epi8(b, a, mask); } template<> EIGEN_STRONG_INLINE Packet8bf pround(const Packet8bf& a) { return F32ToBf16(pround(Bf16ToF32(a))); } template<> EIGEN_STRONG_INLINE Packet8bf print(const Packet8bf& a) { return F32ToBf16(print(Bf16ToF32(a))); } template<> EIGEN_STRONG_INLINE Packet8bf pceil(const Packet8bf& a) { return F32ToBf16(pceil(Bf16ToF32(a))); } template<> EIGEN_STRONG_INLINE Packet8bf pfloor(const Packet8bf& a) { return F32ToBf16(pfloor(Bf16ToF32(a))); } template<> EIGEN_STRONG_INLINE Packet8bf pcmp_eq(const Packet8bf& a,const Packet8bf& b) { return Pack16To8(pcmp_eq(Bf16ToF32(a), Bf16ToF32(b))); } template<> EIGEN_STRONG_INLINE Packet8bf pcmp_le(const Packet8bf& a,const Packet8bf& b) { return Pack16To8(pcmp_le(Bf16ToF32(a), Bf16ToF32(b))); } template<> EIGEN_STRONG_INLINE Packet8bf pcmp_lt(const Packet8bf& a,const Packet8bf& b) { return Pack16To8(pcmp_lt(Bf16ToF32(a), Bf16ToF32(b))); } template<> EIGEN_STRONG_INLINE Packet8bf pcmp_lt_or_nan(const Packet8bf& a,const Packet8bf& b) { return Pack16To8(pcmp_lt_or_nan(Bf16ToF32(a), Bf16ToF32(b))); } template<> EIGEN_STRONG_INLINE Packet8bf pconj(const Packet8bf& a) { return a; } template<> EIGEN_STRONG_INLINE Packet8bf pnegate(const Packet8bf& a) { Packet8bf sign_mask = _mm_set1_epi16(static_cast(0x8000)); return _mm_xor_si128(a, sign_mask); } template<> EIGEN_STRONG_INLINE Packet8bf padd(const Packet8bf& a, const Packet8bf& b) { return F32ToBf16(padd(Bf16ToF32(a), Bf16ToF32(b))); } template<> EIGEN_STRONG_INLINE Packet8bf psub(const Packet8bf& a, const Packet8bf& b) { return F32ToBf16(psub(Bf16ToF32(a), Bf16ToF32(b))); } template<> EIGEN_STRONG_INLINE Packet8bf pmul(const Packet8bf& a, const Packet8bf& b) { return F32ToBf16(pmul(Bf16ToF32(a), Bf16ToF32(b))); } template<> EIGEN_STRONG_INLINE Packet8bf pdiv(const Packet8bf& a, const Packet8bf& b) { return F32ToBf16(pdiv(Bf16ToF32(a), Bf16ToF32(b))); } template<> EIGEN_STRONG_INLINE Packet8bf pgather(const bfloat16* from, Index stride) { const numext::uint16_t s0 = numext::bit_cast(from[0*stride]); const numext::uint16_t s1 = numext::bit_cast(from[1*stride]); const numext::uint16_t s2 = numext::bit_cast(from[2*stride]); const numext::uint16_t s3 = numext::bit_cast(from[3*stride]); const numext::uint16_t s4 = numext::bit_cast(from[4*stride]); const numext::uint16_t s5 = numext::bit_cast(from[5*stride]); const numext::uint16_t s6 = numext::bit_cast(from[6*stride]); const numext::uint16_t s7 = numext::bit_cast(from[7*stride]); return _mm_set_epi16(s7, s6, s5, s4, s3, s2, s1, s0); } template<> EIGEN_STRONG_INLINE void pscatter(bfloat16* to, const Packet8bf& from, Index stride) { EIGEN_ALIGN32 bfloat16 aux[8]; pstore(aux, from); to[stride*0] = aux[0]; to[stride*1] = aux[1]; to[stride*2] = aux[2]; to[stride*3] = aux[3]; to[stride*4] = aux[4]; to[stride*5] = aux[5]; to[stride*6] = aux[6]; to[stride*7] = aux[7]; } template<> EIGEN_STRONG_INLINE bfloat16 predux(const Packet8bf& a) { return static_cast(predux(Bf16ToF32(a))); } template<> EIGEN_STRONG_INLINE bfloat16 predux_max(const Packet8bf& a) { return static_cast(predux_max(Bf16ToF32(a))); } template<> EIGEN_STRONG_INLINE bfloat16 predux_min(const Packet8bf& a) { return static_cast(predux_min(Bf16ToF32(a))); } template<> EIGEN_STRONG_INLINE bfloat16 predux_mul(const Packet8bf& a) { return static_cast(predux_mul(Bf16ToF32(a))); } template<> EIGEN_STRONG_INLINE Packet8bf preverse(const Packet8bf& a) { __m128i m = _mm_setr_epi8(14,15,12,13,10,11,8,9,6,7,4,5,2,3,0,1); return _mm_shuffle_epi8(a,m); } EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { __m128i a = kernel.packet[0]; __m128i b = kernel.packet[1]; __m128i c = kernel.packet[2]; __m128i d = kernel.packet[3]; __m128i e = kernel.packet[4]; __m128i f = kernel.packet[5]; __m128i g = kernel.packet[6]; __m128i h = kernel.packet[7]; __m128i a03b03 = _mm_unpacklo_epi16(a, b); __m128i c03d03 = _mm_unpacklo_epi16(c, d); __m128i e03f03 = _mm_unpacklo_epi16(e, f); __m128i g03h03 = _mm_unpacklo_epi16(g, h); __m128i a47b47 = _mm_unpackhi_epi16(a, b); __m128i c47d47 = _mm_unpackhi_epi16(c, d); __m128i e47f47 = _mm_unpackhi_epi16(e, f); __m128i g47h47 = _mm_unpackhi_epi16(g, h); __m128i a01b01c01d01 = _mm_unpacklo_epi32(a03b03, c03d03); __m128i a23b23c23d23 = _mm_unpackhi_epi32(a03b03, c03d03); __m128i e01f01g01h01 = _mm_unpacklo_epi32(e03f03, g03h03); __m128i e23f23g23h23 = _mm_unpackhi_epi32(e03f03, g03h03); __m128i a45b45c45d45 = _mm_unpacklo_epi32(a47b47, c47d47); __m128i a67b67c67d67 = _mm_unpackhi_epi32(a47b47, c47d47); __m128i e45f45g45h45 = _mm_unpacklo_epi32(e47f47, g47h47); __m128i e67f67g67h67 = _mm_unpackhi_epi32(e47f47, g47h47); kernel.packet[0] = _mm_unpacklo_epi64(a01b01c01d01, e01f01g01h01); kernel.packet[1] = _mm_unpackhi_epi64(a01b01c01d01, e01f01g01h01); kernel.packet[2] = _mm_unpacklo_epi64(a23b23c23d23, e23f23g23h23); kernel.packet[3] = _mm_unpackhi_epi64(a23b23c23d23, e23f23g23h23); kernel.packet[4] = _mm_unpacklo_epi64(a45b45c45d45, e45f45g45h45); kernel.packet[5] = _mm_unpackhi_epi64(a45b45c45d45, e45f45g45h45); kernel.packet[6] = _mm_unpacklo_epi64(a67b67c67d67, e67f67g67h67); kernel.packet[7] = _mm_unpackhi_epi64(a67b67c67d67, e67f67g67h67); } EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { __m128i a = kernel.packet[0]; __m128i b = kernel.packet[1]; __m128i c = kernel.packet[2]; __m128i d = kernel.packet[3]; __m128i ab_03 = _mm_unpacklo_epi16(a, b); __m128i cd_03 = _mm_unpacklo_epi16(c, d); __m128i ab_47 = _mm_unpackhi_epi16(a, b); __m128i cd_47 = _mm_unpackhi_epi16(c, d); kernel.packet[0] = _mm_unpacklo_epi32(ab_03, cd_03); kernel.packet[1] = _mm_unpackhi_epi32(ab_03, cd_03); kernel.packet[2] = _mm_unpacklo_epi32(ab_47, cd_47); kernel.packet[3] = _mm_unpackhi_epi32(ab_47, cd_47); } } // end namespace internal } // end namespace Eigen #endif // EIGEN_PACKET_MATH_AVX_H RcppEigen/inst/include/Eigen/src/Core/arch/AVX/TypeCasting.h0000644000176200001440000000500414562417221023215 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2015 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_TYPE_CASTING_AVX_H #define EIGEN_TYPE_CASTING_AVX_H namespace Eigen { namespace internal { // For now we use SSE to handle integers, so we can't use AVX instructions to cast // from int to float template <> struct type_casting_traits { enum { VectorizedCast = 0, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template <> struct type_casting_traits { enum { VectorizedCast = 0, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; #ifndef EIGEN_VECTORIZE_AVX512 template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; #endif // EIGEN_VECTORIZE_AVX512 template<> EIGEN_STRONG_INLINE Packet8i pcast(const Packet8f& a) { return _mm256_cvttps_epi32(a); } template<> EIGEN_STRONG_INLINE Packet8f pcast(const Packet8i& a) { return _mm256_cvtepi32_ps(a); } template<> EIGEN_STRONG_INLINE Packet8i preinterpret(const Packet8f& a) { return _mm256_castps_si256(a); } template<> EIGEN_STRONG_INLINE Packet8f preinterpret(const Packet8i& a) { return _mm256_castsi256_ps(a); } template<> EIGEN_STRONG_INLINE Packet8f pcast(const Packet8h& a) { return half2float(a); } template<> EIGEN_STRONG_INLINE Packet8f pcast(const Packet8bf& a) { return Bf16ToF32(a); } template<> EIGEN_STRONG_INLINE Packet8h pcast(const Packet8f& a) { return float2half(a); } template<> EIGEN_STRONG_INLINE Packet8bf pcast(const Packet8f& a) { return F32ToBf16(a); } } // end namespace internal } // end namespace Eigen #endif // EIGEN_TYPE_CASTING_AVX_H RcppEigen/inst/include/Eigen/src/Core/arch/AVX/Complex.h0000644000176200001440000003556714562417221022413 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner (benoit.steiner.goog@gmail.com) // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_COMPLEX_AVX_H #define EIGEN_COMPLEX_AVX_H namespace Eigen { namespace internal { //---------- float ---------- struct Packet4cf { EIGEN_STRONG_INLINE Packet4cf() {} EIGEN_STRONG_INLINE explicit Packet4cf(const __m256& a) : v(a) {} __m256 v; }; #ifndef EIGEN_VECTORIZE_AVX512 template<> struct packet_traits > : default_packet_traits { typedef Packet4cf type; typedef Packet2cf half; enum { Vectorizable = 1, AlignedOnScalar = 1, size = 4, HasHalfPacket = 1, HasAdd = 1, HasSub = 1, HasMul = 1, HasDiv = 1, HasNegate = 1, HasSqrt = 1, HasAbs = 0, HasAbs2 = 0, HasMin = 0, HasMax = 0, HasSetLinear = 0 }; }; #endif template<> struct unpacket_traits { typedef std::complex type; typedef Packet2cf half; typedef Packet8f as_real; enum { size=4, alignment=Aligned32, vectorizable=true, masked_load_available=false, masked_store_available=false }; }; template<> EIGEN_STRONG_INLINE Packet4cf padd(const Packet4cf& a, const Packet4cf& b) { return Packet4cf(_mm256_add_ps(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet4cf psub(const Packet4cf& a, const Packet4cf& b) { return Packet4cf(_mm256_sub_ps(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet4cf pnegate(const Packet4cf& a) { return Packet4cf(pnegate(a.v)); } template<> EIGEN_STRONG_INLINE Packet4cf pconj(const Packet4cf& a) { const __m256 mask = _mm256_castsi256_ps(_mm256_setr_epi32(0x00000000,0x80000000,0x00000000,0x80000000,0x00000000,0x80000000,0x00000000,0x80000000)); return Packet4cf(_mm256_xor_ps(a.v,mask)); } template<> EIGEN_STRONG_INLINE Packet4cf pmul(const Packet4cf& a, const Packet4cf& b) { __m256 tmp1 = _mm256_mul_ps(_mm256_moveldup_ps(a.v), b.v); __m256 tmp2 = _mm256_mul_ps(_mm256_movehdup_ps(a.v), _mm256_permute_ps(b.v, _MM_SHUFFLE(2,3,0,1))); __m256 result = _mm256_addsub_ps(tmp1, tmp2); return Packet4cf(result); } template <> EIGEN_STRONG_INLINE Packet4cf pcmp_eq(const Packet4cf& a, const Packet4cf& b) { __m256 eq = _mm256_cmp_ps(a.v, b.v, _CMP_EQ_OQ); return Packet4cf(_mm256_and_ps(eq, _mm256_permute_ps(eq, 0xb1))); } template<> EIGEN_STRONG_INLINE Packet4cf ptrue(const Packet4cf& a) { return Packet4cf(ptrue(Packet8f(a.v))); } template<> EIGEN_STRONG_INLINE Packet4cf pand (const Packet4cf& a, const Packet4cf& b) { return Packet4cf(_mm256_and_ps(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet4cf por (const Packet4cf& a, const Packet4cf& b) { return Packet4cf(_mm256_or_ps(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet4cf pxor (const Packet4cf& a, const Packet4cf& b) { return Packet4cf(_mm256_xor_ps(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet4cf pandnot(const Packet4cf& a, const Packet4cf& b) { return Packet4cf(_mm256_andnot_ps(b.v,a.v)); } template<> EIGEN_STRONG_INLINE Packet4cf pload (const std::complex* from) { EIGEN_DEBUG_ALIGNED_LOAD return Packet4cf(pload(&numext::real_ref(*from))); } template<> EIGEN_STRONG_INLINE Packet4cf ploadu(const std::complex* from) { EIGEN_DEBUG_UNALIGNED_LOAD return Packet4cf(ploadu(&numext::real_ref(*from))); } template<> EIGEN_STRONG_INLINE Packet4cf pset1(const std::complex& from) { return Packet4cf(_mm256_castpd_ps(_mm256_broadcast_sd((const double*)(const void*)&from))); } template<> EIGEN_STRONG_INLINE Packet4cf ploaddup(const std::complex* from) { // FIXME The following might be optimized using _mm256_movedup_pd Packet2cf a = ploaddup(from); Packet2cf b = ploaddup(from+1); return Packet4cf(_mm256_insertf128_ps(_mm256_castps128_ps256(a.v), b.v, 1)); } template<> EIGEN_STRONG_INLINE void pstore >(std::complex* to, const Packet4cf& from) { EIGEN_DEBUG_ALIGNED_STORE pstore(&numext::real_ref(*to), from.v); } template<> EIGEN_STRONG_INLINE void pstoreu >(std::complex* to, const Packet4cf& from) { EIGEN_DEBUG_UNALIGNED_STORE pstoreu(&numext::real_ref(*to), from.v); } template<> EIGEN_DEVICE_FUNC inline Packet4cf pgather, Packet4cf>(const std::complex* from, Index stride) { return Packet4cf(_mm256_set_ps(std::imag(from[3*stride]), std::real(from[3*stride]), std::imag(from[2*stride]), std::real(from[2*stride]), std::imag(from[1*stride]), std::real(from[1*stride]), std::imag(from[0*stride]), std::real(from[0*stride]))); } template<> EIGEN_DEVICE_FUNC inline void pscatter, Packet4cf>(std::complex* to, const Packet4cf& from, Index stride) { __m128 low = _mm256_extractf128_ps(from.v, 0); to[stride*0] = std::complex(_mm_cvtss_f32(_mm_shuffle_ps(low, low, 0)), _mm_cvtss_f32(_mm_shuffle_ps(low, low, 1))); to[stride*1] = std::complex(_mm_cvtss_f32(_mm_shuffle_ps(low, low, 2)), _mm_cvtss_f32(_mm_shuffle_ps(low, low, 3))); __m128 high = _mm256_extractf128_ps(from.v, 1); to[stride*2] = std::complex(_mm_cvtss_f32(_mm_shuffle_ps(high, high, 0)), _mm_cvtss_f32(_mm_shuffle_ps(high, high, 1))); to[stride*3] = std::complex(_mm_cvtss_f32(_mm_shuffle_ps(high, high, 2)), _mm_cvtss_f32(_mm_shuffle_ps(high, high, 3))); } template<> EIGEN_STRONG_INLINE std::complex pfirst(const Packet4cf& a) { return pfirst(Packet2cf(_mm256_castps256_ps128(a.v))); } template<> EIGEN_STRONG_INLINE Packet4cf preverse(const Packet4cf& a) { __m128 low = _mm256_extractf128_ps(a.v, 0); __m128 high = _mm256_extractf128_ps(a.v, 1); __m128d lowd = _mm_castps_pd(low); __m128d highd = _mm_castps_pd(high); low = _mm_castpd_ps(_mm_shuffle_pd(lowd,lowd,0x1)); high = _mm_castpd_ps(_mm_shuffle_pd(highd,highd,0x1)); __m256 result = _mm256_setzero_ps(); result = _mm256_insertf128_ps(result, low, 1); result = _mm256_insertf128_ps(result, high, 0); return Packet4cf(result); } template<> EIGEN_STRONG_INLINE std::complex predux(const Packet4cf& a) { return predux(padd(Packet2cf(_mm256_extractf128_ps(a.v,0)), Packet2cf(_mm256_extractf128_ps(a.v,1)))); } template<> EIGEN_STRONG_INLINE std::complex predux_mul(const Packet4cf& a) { return predux_mul(pmul(Packet2cf(_mm256_extractf128_ps(a.v, 0)), Packet2cf(_mm256_extractf128_ps(a.v, 1)))); } EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet4cf,Packet8f) template<> EIGEN_STRONG_INLINE Packet4cf pdiv(const Packet4cf& a, const Packet4cf& b) { Packet4cf num = pmul(a, pconj(b)); __m256 tmp = _mm256_mul_ps(b.v, b.v); __m256 tmp2 = _mm256_shuffle_ps(tmp,tmp,0xB1); __m256 denom = _mm256_add_ps(tmp, tmp2); return Packet4cf(_mm256_div_ps(num.v, denom)); } template<> EIGEN_STRONG_INLINE Packet4cf pcplxflip(const Packet4cf& x) { return Packet4cf(_mm256_shuffle_ps(x.v, x.v, _MM_SHUFFLE(2, 3, 0 ,1))); } //---------- double ---------- struct Packet2cd { EIGEN_STRONG_INLINE Packet2cd() {} EIGEN_STRONG_INLINE explicit Packet2cd(const __m256d& a) : v(a) {} __m256d v; }; #ifndef EIGEN_VECTORIZE_AVX512 template<> struct packet_traits > : default_packet_traits { typedef Packet2cd type; typedef Packet1cd half; enum { Vectorizable = 1, AlignedOnScalar = 0, size = 2, HasHalfPacket = 1, HasAdd = 1, HasSub = 1, HasMul = 1, HasDiv = 1, HasNegate = 1, HasSqrt = 1, HasAbs = 0, HasAbs2 = 0, HasMin = 0, HasMax = 0, HasSetLinear = 0 }; }; #endif template<> struct unpacket_traits { typedef std::complex type; typedef Packet1cd half; typedef Packet4d as_real; enum { size=2, alignment=Aligned32, vectorizable=true, masked_load_available=false, masked_store_available=false }; }; template<> EIGEN_STRONG_INLINE Packet2cd padd(const Packet2cd& a, const Packet2cd& b) { return Packet2cd(_mm256_add_pd(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet2cd psub(const Packet2cd& a, const Packet2cd& b) { return Packet2cd(_mm256_sub_pd(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet2cd pnegate(const Packet2cd& a) { return Packet2cd(pnegate(a.v)); } template<> EIGEN_STRONG_INLINE Packet2cd pconj(const Packet2cd& a) { const __m256d mask = _mm256_castsi256_pd(_mm256_set_epi32(0x80000000,0x0,0x0,0x0,0x80000000,0x0,0x0,0x0)); return Packet2cd(_mm256_xor_pd(a.v,mask)); } template<> EIGEN_STRONG_INLINE Packet2cd pmul(const Packet2cd& a, const Packet2cd& b) { __m256d tmp1 = _mm256_shuffle_pd(a.v,a.v,0x0); __m256d even = _mm256_mul_pd(tmp1, b.v); __m256d tmp2 = _mm256_shuffle_pd(a.v,a.v,0xF); __m256d tmp3 = _mm256_shuffle_pd(b.v,b.v,0x5); __m256d odd = _mm256_mul_pd(tmp2, tmp3); return Packet2cd(_mm256_addsub_pd(even, odd)); } template <> EIGEN_STRONG_INLINE Packet2cd pcmp_eq(const Packet2cd& a, const Packet2cd& b) { __m256d eq = _mm256_cmp_pd(a.v, b.v, _CMP_EQ_OQ); return Packet2cd(pand(eq, _mm256_permute_pd(eq, 0x5))); } template<> EIGEN_STRONG_INLINE Packet2cd ptrue(const Packet2cd& a) { return Packet2cd(ptrue(Packet4d(a.v))); } template<> EIGEN_STRONG_INLINE Packet2cd pand (const Packet2cd& a, const Packet2cd& b) { return Packet2cd(_mm256_and_pd(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet2cd por (const Packet2cd& a, const Packet2cd& b) { return Packet2cd(_mm256_or_pd(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet2cd pxor (const Packet2cd& a, const Packet2cd& b) { return Packet2cd(_mm256_xor_pd(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet2cd pandnot(const Packet2cd& a, const Packet2cd& b) { return Packet2cd(_mm256_andnot_pd(b.v,a.v)); } template<> EIGEN_STRONG_INLINE Packet2cd pload (const std::complex* from) { EIGEN_DEBUG_ALIGNED_LOAD return Packet2cd(pload((const double*)from)); } template<> EIGEN_STRONG_INLINE Packet2cd ploadu(const std::complex* from) { EIGEN_DEBUG_UNALIGNED_LOAD return Packet2cd(ploadu((const double*)from)); } template<> EIGEN_STRONG_INLINE Packet2cd pset1(const std::complex& from) { // in case casting to a __m128d* is really not safe, then we can still fallback to this version: (much slower though) // return Packet2cd(_mm256_loadu2_m128d((const double*)&from,(const double*)&from)); return Packet2cd(_mm256_broadcast_pd((const __m128d*)(const void*)&from)); } template<> EIGEN_STRONG_INLINE Packet2cd ploaddup(const std::complex* from) { return pset1(*from); } template<> EIGEN_STRONG_INLINE void pstore >(std::complex * to, const Packet2cd& from) { EIGEN_DEBUG_ALIGNED_STORE pstore((double*)to, from.v); } template<> EIGEN_STRONG_INLINE void pstoreu >(std::complex * to, const Packet2cd& from) { EIGEN_DEBUG_UNALIGNED_STORE pstoreu((double*)to, from.v); } template<> EIGEN_DEVICE_FUNC inline Packet2cd pgather, Packet2cd>(const std::complex* from, Index stride) { return Packet2cd(_mm256_set_pd(std::imag(from[1*stride]), std::real(from[1*stride]), std::imag(from[0*stride]), std::real(from[0*stride]))); } template<> EIGEN_DEVICE_FUNC inline void pscatter, Packet2cd>(std::complex* to, const Packet2cd& from, Index stride) { __m128d low = _mm256_extractf128_pd(from.v, 0); to[stride*0] = std::complex(_mm_cvtsd_f64(low), _mm_cvtsd_f64(_mm_shuffle_pd(low, low, 1))); __m128d high = _mm256_extractf128_pd(from.v, 1); to[stride*1] = std::complex(_mm_cvtsd_f64(high), _mm_cvtsd_f64(_mm_shuffle_pd(high, high, 1))); } template<> EIGEN_STRONG_INLINE std::complex pfirst(const Packet2cd& a) { __m128d low = _mm256_extractf128_pd(a.v, 0); EIGEN_ALIGN16 double res[2]; _mm_store_pd(res, low); return std::complex(res[0],res[1]); } template<> EIGEN_STRONG_INLINE Packet2cd preverse(const Packet2cd& a) { __m256d result = _mm256_permute2f128_pd(a.v, a.v, 1); return Packet2cd(result); } template<> EIGEN_STRONG_INLINE std::complex predux(const Packet2cd& a) { return predux(padd(Packet1cd(_mm256_extractf128_pd(a.v,0)), Packet1cd(_mm256_extractf128_pd(a.v,1)))); } template<> EIGEN_STRONG_INLINE std::complex predux_mul(const Packet2cd& a) { return predux(pmul(Packet1cd(_mm256_extractf128_pd(a.v,0)), Packet1cd(_mm256_extractf128_pd(a.v,1)))); } EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet2cd,Packet4d) template<> EIGEN_STRONG_INLINE Packet2cd pdiv(const Packet2cd& a, const Packet2cd& b) { Packet2cd num = pmul(a, pconj(b)); __m256d tmp = _mm256_mul_pd(b.v, b.v); __m256d denom = _mm256_hadd_pd(tmp, tmp); return Packet2cd(_mm256_div_pd(num.v, denom)); } template<> EIGEN_STRONG_INLINE Packet2cd pcplxflip(const Packet2cd& x) { return Packet2cd(_mm256_shuffle_pd(x.v, x.v, 0x5)); } EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { __m256d P0 = _mm256_castps_pd(kernel.packet[0].v); __m256d P1 = _mm256_castps_pd(kernel.packet[1].v); __m256d P2 = _mm256_castps_pd(kernel.packet[2].v); __m256d P3 = _mm256_castps_pd(kernel.packet[3].v); __m256d T0 = _mm256_shuffle_pd(P0, P1, 15); __m256d T1 = _mm256_shuffle_pd(P0, P1, 0); __m256d T2 = _mm256_shuffle_pd(P2, P3, 15); __m256d T3 = _mm256_shuffle_pd(P2, P3, 0); kernel.packet[1].v = _mm256_castpd_ps(_mm256_permute2f128_pd(T0, T2, 32)); kernel.packet[3].v = _mm256_castpd_ps(_mm256_permute2f128_pd(T0, T2, 49)); kernel.packet[0].v = _mm256_castpd_ps(_mm256_permute2f128_pd(T1, T3, 32)); kernel.packet[2].v = _mm256_castpd_ps(_mm256_permute2f128_pd(T1, T3, 49)); } EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { __m256d tmp = _mm256_permute2f128_pd(kernel.packet[0].v, kernel.packet[1].v, 0+(2<<4)); kernel.packet[1].v = _mm256_permute2f128_pd(kernel.packet[0].v, kernel.packet[1].v, 1+(3<<4)); kernel.packet[0].v = tmp; } template<> EIGEN_STRONG_INLINE Packet2cd psqrt(const Packet2cd& a) { return psqrt_complex(a); } template<> EIGEN_STRONG_INLINE Packet4cf psqrt(const Packet4cf& a) { return psqrt_complex(a); } } // end namespace internal } // end namespace Eigen #endif // EIGEN_COMPLEX_AVX_H RcppEigen/inst/include/Eigen/src/Core/arch/GPU/0000755000176200001440000000000014562417221020610 5ustar liggesusersRcppEigen/inst/include/Eigen/src/Core/arch/GPU/MathFunctions.h0000644000176200001440000000520714562417221023547 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_MATH_FUNCTIONS_GPU_H #define EIGEN_MATH_FUNCTIONS_GPU_H namespace Eigen { namespace internal { // Make sure this is only available when targeting a GPU: we don't want to // introduce conflicts between these packet_traits definitions and the ones // we'll use on the host side (SSE, AVX, ...) #if defined(EIGEN_GPUCC) && defined(EIGEN_USE_GPU) template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 plog(const float4& a) { return make_float4(logf(a.x), logf(a.y), logf(a.z), logf(a.w)); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 plog(const double2& a) { using ::log; return make_double2(log(a.x), log(a.y)); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 plog1p(const float4& a) { return make_float4(log1pf(a.x), log1pf(a.y), log1pf(a.z), log1pf(a.w)); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 plog1p(const double2& a) { return make_double2(log1p(a.x), log1p(a.y)); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pexp(const float4& a) { return make_float4(expf(a.x), expf(a.y), expf(a.z), expf(a.w)); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pexp(const double2& a) { using ::exp; return make_double2(exp(a.x), exp(a.y)); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pexpm1(const float4& a) { return make_float4(expm1f(a.x), expm1f(a.y), expm1f(a.z), expm1f(a.w)); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pexpm1(const double2& a) { return make_double2(expm1(a.x), expm1(a.y)); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 psqrt(const float4& a) { return make_float4(sqrtf(a.x), sqrtf(a.y), sqrtf(a.z), sqrtf(a.w)); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 psqrt(const double2& a) { using ::sqrt; return make_double2(sqrt(a.x), sqrt(a.y)); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 prsqrt(const float4& a) { return make_float4(rsqrtf(a.x), rsqrtf(a.y), rsqrtf(a.z), rsqrtf(a.w)); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 prsqrt(const double2& a) { return make_double2(rsqrt(a.x), rsqrt(a.y)); } #endif } // end namespace internal } // end namespace Eigen #endif // EIGEN_MATH_FUNCTIONS_GPU_H RcppEigen/inst/include/Eigen/src/Core/arch/GPU/PacketMath.h0000644000176200001440000015732714562417221023021 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_PACKET_MATH_GPU_H #define EIGEN_PACKET_MATH_GPU_H namespace Eigen { namespace internal { // Read-only data cached load available. #if defined(EIGEN_HIP_DEVICE_COMPILE) || (defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 350) #define EIGEN_GPU_HAS_LDG 1 #endif // FP16 math available. #if (defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 530) #define EIGEN_CUDA_HAS_FP16_ARITHMETIC 1 #endif #if defined(EIGEN_HIP_DEVICE_COMPILE) || defined(EIGEN_CUDA_HAS_FP16_ARITHMETIC) #define EIGEN_GPU_HAS_FP16_ARITHMETIC 1 #endif // Make sure this is only available when targeting a GPU: we don't want to // introduce conflicts between these packet_traits definitions and the ones // we'll use on the host side (SSE, AVX, ...) #if defined(EIGEN_GPUCC) && defined(EIGEN_USE_GPU) template<> struct is_arithmetic { enum { value = true }; }; template<> struct is_arithmetic { enum { value = true }; }; template<> struct packet_traits : default_packet_traits { typedef float4 type; typedef float4 half; enum { Vectorizable = 1, AlignedOnScalar = 1, size=4, HasHalfPacket = 0, HasDiv = 1, HasSin = 0, HasCos = 0, HasLog = 1, HasExp = 1, HasSqrt = 1, HasRsqrt = 1, HasLGamma = 1, HasDiGamma = 1, HasZeta = 1, HasPolygamma = 1, HasErf = 1, HasErfc = 1, HasNdtri = 1, HasBessel = 1, HasIGamma = 1, HasIGammaDerA = 1, HasGammaSampleDerAlpha = 1, HasIGammac = 1, HasBetaInc = 1, HasBlend = 0, HasFloor = 1, }; }; template<> struct packet_traits : default_packet_traits { typedef double2 type; typedef double2 half; enum { Vectorizable = 1, AlignedOnScalar = 1, size=2, HasHalfPacket = 0, HasDiv = 1, HasLog = 1, HasExp = 1, HasSqrt = 1, HasRsqrt = 1, HasLGamma = 1, HasDiGamma = 1, HasZeta = 1, HasPolygamma = 1, HasErf = 1, HasErfc = 1, HasNdtri = 1, HasBessel = 1, HasIGamma = 1, HasIGammaDerA = 1, HasGammaSampleDerAlpha = 1, HasIGammac = 1, HasBetaInc = 1, HasBlend = 0, HasFloor = 1, }; }; template<> struct unpacket_traits { typedef float type; enum {size=4, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef float4 half; }; template<> struct unpacket_traits { typedef double type; enum {size=2, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef double2 half; }; template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pset1(const float& from) { return make_float4(from, from, from, from); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pset1(const double& from) { return make_double2(from, from); } // We need to distinguish ‘clang as the CUDA compiler’ from ‘clang as the host compiler, // invoked by NVCC’ (e.g. on MacOS). The former needs to see both host and device implementation // of the functions, while the latter can only deal with one of them. #if defined(EIGEN_CUDA_ARCH) || defined(EIGEN_HIPCC) || (defined(EIGEN_CUDACC) && EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC) namespace { EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float bitwise_and(const float& a, const float& b) { return __int_as_float(__float_as_int(a) & __float_as_int(b)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double bitwise_and(const double& a, const double& b) { return __longlong_as_double(__double_as_longlong(a) & __double_as_longlong(b)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float bitwise_or(const float& a, const float& b) { return __int_as_float(__float_as_int(a) | __float_as_int(b)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double bitwise_or(const double& a, const double& b) { return __longlong_as_double(__double_as_longlong(a) | __double_as_longlong(b)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float bitwise_xor(const float& a, const float& b) { return __int_as_float(__float_as_int(a) ^ __float_as_int(b)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double bitwise_xor(const double& a, const double& b) { return __longlong_as_double(__double_as_longlong(a) ^ __double_as_longlong(b)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float bitwise_andnot(const float& a, const float& b) { return __int_as_float(__float_as_int(a) & ~__float_as_int(b)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double bitwise_andnot(const double& a, const double& b) { return __longlong_as_double(__double_as_longlong(a) & ~__double_as_longlong(b)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float eq_mask(const float& a, const float& b) { return __int_as_float(a == b ? 0xffffffffu : 0u); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double eq_mask(const double& a, const double& b) { return __longlong_as_double(a == b ? 0xffffffffffffffffull : 0ull); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float lt_mask(const float& a, const float& b) { return __int_as_float(a < b ? 0xffffffffu : 0u); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double lt_mask(const double& a, const double& b) { return __longlong_as_double(a < b ? 0xffffffffffffffffull : 0ull); } } // namespace template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pand(const float4& a, const float4& b) { return make_float4(bitwise_and(a.x, b.x), bitwise_and(a.y, b.y), bitwise_and(a.z, b.z), bitwise_and(a.w, b.w)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pand(const double2& a, const double2& b) { return make_double2(bitwise_and(a.x, b.x), bitwise_and(a.y, b.y)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 por(const float4& a, const float4& b) { return make_float4(bitwise_or(a.x, b.x), bitwise_or(a.y, b.y), bitwise_or(a.z, b.z), bitwise_or(a.w, b.w)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 por(const double2& a, const double2& b) { return make_double2(bitwise_or(a.x, b.x), bitwise_or(a.y, b.y)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pxor(const float4& a, const float4& b) { return make_float4(bitwise_xor(a.x, b.x), bitwise_xor(a.y, b.y), bitwise_xor(a.z, b.z), bitwise_xor(a.w, b.w)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pxor(const double2& a, const double2& b) { return make_double2(bitwise_xor(a.x, b.x), bitwise_xor(a.y, b.y)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pandnot(const float4& a, const float4& b) { return make_float4(bitwise_andnot(a.x, b.x), bitwise_andnot(a.y, b.y), bitwise_andnot(a.z, b.z), bitwise_andnot(a.w, b.w)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pandnot(const double2& a, const double2& b) { return make_double2(bitwise_andnot(a.x, b.x), bitwise_andnot(a.y, b.y)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pcmp_eq(const float4& a, const float4& b) { return make_float4(eq_mask(a.x, b.x), eq_mask(a.y, b.y), eq_mask(a.z, b.z), eq_mask(a.w, b.w)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pcmp_lt(const float4& a, const float4& b) { return make_float4(lt_mask(a.x, b.x), lt_mask(a.y, b.y), lt_mask(a.z, b.z), lt_mask(a.w, b.w)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pcmp_eq(const double2& a, const double2& b) { return make_double2(eq_mask(a.x, b.x), eq_mask(a.y, b.y)); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pcmp_lt(const double2& a, const double2& b) { return make_double2(lt_mask(a.x, b.x), lt_mask(a.y, b.y)); } #endif // defined(EIGEN_CUDA_ARCH) || defined(EIGEN_HIPCC) || (defined(EIGEN_CUDACC) && EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC) template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 plset(const float& a) { return make_float4(a, a+1, a+2, a+3); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 plset(const double& a) { return make_double2(a, a+1); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 padd(const float4& a, const float4& b) { return make_float4(a.x+b.x, a.y+b.y, a.z+b.z, a.w+b.w); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 padd(const double2& a, const double2& b) { return make_double2(a.x+b.x, a.y+b.y); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 psub(const float4& a, const float4& b) { return make_float4(a.x-b.x, a.y-b.y, a.z-b.z, a.w-b.w); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 psub(const double2& a, const double2& b) { return make_double2(a.x-b.x, a.y-b.y); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pnegate(const float4& a) { return make_float4(-a.x, -a.y, -a.z, -a.w); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pnegate(const double2& a) { return make_double2(-a.x, -a.y); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pconj(const float4& a) { return a; } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pconj(const double2& a) { return a; } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pmul(const float4& a, const float4& b) { return make_float4(a.x*b.x, a.y*b.y, a.z*b.z, a.w*b.w); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pmul(const double2& a, const double2& b) { return make_double2(a.x*b.x, a.y*b.y); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pdiv(const float4& a, const float4& b) { return make_float4(a.x/b.x, a.y/b.y, a.z/b.z, a.w/b.w); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pdiv(const double2& a, const double2& b) { return make_double2(a.x/b.x, a.y/b.y); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pmin(const float4& a, const float4& b) { return make_float4(fminf(a.x, b.x), fminf(a.y, b.y), fminf(a.z, b.z), fminf(a.w, b.w)); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pmin(const double2& a, const double2& b) { return make_double2(fmin(a.x, b.x), fmin(a.y, b.y)); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pmax(const float4& a, const float4& b) { return make_float4(fmaxf(a.x, b.x), fmaxf(a.y, b.y), fmaxf(a.z, b.z), fmaxf(a.w, b.w)); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pmax(const double2& a, const double2& b) { return make_double2(fmax(a.x, b.x), fmax(a.y, b.y)); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pload(const float* from) { return *reinterpret_cast(from); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pload(const double* from) { return *reinterpret_cast(from); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 ploadu(const float* from) { return make_float4(from[0], from[1], from[2], from[3]); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 ploadu(const double* from) { return make_double2(from[0], from[1]); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 ploaddup(const float* from) { return make_float4(from[0], from[0], from[1], from[1]); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 ploaddup(const double* from) { return make_double2(from[0], from[0]); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pstore(float* to, const float4& from) { *reinterpret_cast(to) = from; } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pstore(double* to, const double2& from) { *reinterpret_cast(to) = from; } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pstoreu(float* to, const float4& from) { to[0] = from.x; to[1] = from.y; to[2] = from.z; to[3] = from.w; } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pstoreu(double* to, const double2& from) { to[0] = from.x; to[1] = from.y; } template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE float4 ploadt_ro(const float* from) { #if defined(EIGEN_GPU_HAS_LDG) return __ldg((const float4*)from); #else return make_float4(from[0], from[1], from[2], from[3]); #endif } template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE double2 ploadt_ro(const double* from) { #if defined(EIGEN_GPU_HAS_LDG) return __ldg((const double2*)from); #else return make_double2(from[0], from[1]); #endif } template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE float4 ploadt_ro(const float* from) { #if defined(EIGEN_GPU_HAS_LDG) return make_float4(__ldg(from+0), __ldg(from+1), __ldg(from+2), __ldg(from+3)); #else return make_float4(from[0], from[1], from[2], from[3]); #endif } template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE double2 ploadt_ro(const double* from) { #if defined(EIGEN_GPU_HAS_LDG) return make_double2(__ldg(from+0), __ldg(from+1)); #else return make_double2(from[0], from[1]); #endif } template<> EIGEN_DEVICE_FUNC inline float4 pgather(const float* from, Index stride) { return make_float4(from[0*stride], from[1*stride], from[2*stride], from[3*stride]); } template<> EIGEN_DEVICE_FUNC inline double2 pgather(const double* from, Index stride) { return make_double2(from[0*stride], from[1*stride]); } template<> EIGEN_DEVICE_FUNC inline void pscatter(float* to, const float4& from, Index stride) { to[stride*0] = from.x; to[stride*1] = from.y; to[stride*2] = from.z; to[stride*3] = from.w; } template<> EIGEN_DEVICE_FUNC inline void pscatter(double* to, const double2& from, Index stride) { to[stride*0] = from.x; to[stride*1] = from.y; } template<> EIGEN_DEVICE_FUNC inline float pfirst(const float4& a) { return a.x; } template<> EIGEN_DEVICE_FUNC inline double pfirst(const double2& a) { return a.x; } template<> EIGEN_DEVICE_FUNC inline float predux(const float4& a) { return a.x + a.y + a.z + a.w; } template<> EIGEN_DEVICE_FUNC inline double predux(const double2& a) { return a.x + a.y; } template<> EIGEN_DEVICE_FUNC inline float predux_max(const float4& a) { return fmaxf(fmaxf(a.x, a.y), fmaxf(a.z, a.w)); } template<> EIGEN_DEVICE_FUNC inline double predux_max(const double2& a) { return fmax(a.x, a.y); } template<> EIGEN_DEVICE_FUNC inline float predux_min(const float4& a) { return fminf(fminf(a.x, a.y), fminf(a.z, a.w)); } template<> EIGEN_DEVICE_FUNC inline double predux_min(const double2& a) { return fmin(a.x, a.y); } template<> EIGEN_DEVICE_FUNC inline float predux_mul(const float4& a) { return a.x * a.y * a.z * a.w; } template<> EIGEN_DEVICE_FUNC inline double predux_mul(const double2& a) { return a.x * a.y; } template<> EIGEN_DEVICE_FUNC inline float4 pabs(const float4& a) { return make_float4(fabsf(a.x), fabsf(a.y), fabsf(a.z), fabsf(a.w)); } template<> EIGEN_DEVICE_FUNC inline double2 pabs(const double2& a) { return make_double2(fabs(a.x), fabs(a.y)); } template<> EIGEN_DEVICE_FUNC inline float4 pfloor(const float4& a) { return make_float4(floorf(a.x), floorf(a.y), floorf(a.z), floorf(a.w)); } template<> EIGEN_DEVICE_FUNC inline double2 pfloor(const double2& a) { return make_double2(floor(a.x), floor(a.y)); } EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { float tmp = kernel.packet[0].y; kernel.packet[0].y = kernel.packet[1].x; kernel.packet[1].x = tmp; tmp = kernel.packet[0].z; kernel.packet[0].z = kernel.packet[2].x; kernel.packet[2].x = tmp; tmp = kernel.packet[0].w; kernel.packet[0].w = kernel.packet[3].x; kernel.packet[3].x = tmp; tmp = kernel.packet[1].z; kernel.packet[1].z = kernel.packet[2].y; kernel.packet[2].y = tmp; tmp = kernel.packet[1].w; kernel.packet[1].w = kernel.packet[3].y; kernel.packet[3].y = tmp; tmp = kernel.packet[2].w; kernel.packet[2].w = kernel.packet[3].z; kernel.packet[3].z = tmp; } EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { double tmp = kernel.packet[0].y; kernel.packet[0].y = kernel.packet[1].x; kernel.packet[1].x = tmp; } #endif // defined(EIGEN_GPUCC) && defined(EIGEN_USE_GPU) // Packet4h2 must be defined in the macro without EIGEN_CUDA_ARCH, meaning // its corresponding packet_traits must be visible on host. #if defined(EIGEN_HAS_CUDA_FP16) || defined(EIGEN_HAS_HIP_FP16) typedef ulonglong2 Packet4h2; template<> struct unpacket_traits { typedef Eigen::half type; enum {size=8, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet4h2 half; }; template<> struct is_arithmetic { enum { value = true }; }; template<> struct unpacket_traits { typedef Eigen::half type; enum {size=2, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef half2 half; }; template<> struct is_arithmetic { enum { value = true }; }; template<> struct packet_traits : default_packet_traits { typedef Packet4h2 type; typedef Packet4h2 half; enum { Vectorizable = 1, AlignedOnScalar = 1, size=8, HasHalfPacket = 0, HasAdd = 1, HasSub = 1, HasMul = 1, HasDiv = 1, HasSqrt = 1, HasRsqrt = 1, HasExp = 1, HasExpm1 = 1, HasLog = 1, HasLog1p = 1 }; }; namespace { // This is equivalent to make_half2, which is undocumented and doesn't seem to always exist. EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 combine_half(const __half& a, const __half& b) { #if defined(EIGEN_GPU_COMPILE_PHASE) return __halves2half2(a, b); #else // Round-about way since __halves2half2 is a __device__ function. return __floats2half2_rn(__half2float(a), __half2float(b)); #endif } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE __half get_half2_low(const half2& a) { #if defined(EIGEN_GPU_COMPILE_PHASE) return __low2half(a); #else return __float2half(__low2float(a)); #endif } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE __half get_half2_high(const half2& a) { #if defined(EIGEN_GPU_COMPILE_PHASE) return __high2half(a); #else return __float2half(__high2float(a)); #endif } } // namespace template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 pset1(const Eigen::half& from) { #if defined(EIGEN_GPU_COMPILE_PHASE) return __half2half2(from); #else const float f = __half2float(from); return __floats2half2_rn(f, f); #endif } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4h2 pset1(const Eigen::half& from) { Packet4h2 r; half2* p_alias = reinterpret_cast(&r); p_alias[0] = pset1(from); p_alias[1] = pset1(from); p_alias[2] = pset1(from); p_alias[3] = pset1(from); return r; } // We now need this visible on both host and device. // #if defined(EIGEN_CUDA_ARCH) || defined(EIGEN_HIPCC) || (defined(EIGEN_CUDACC) && EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC) namespace { EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 pload(const Eigen::half* from) { return *reinterpret_cast(from); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 ploadu(const Eigen::half* from) { return combine_half(from[0], from[1]); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 ploaddup(const Eigen::half* from) { return combine_half(from[0], from[0]); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pstore(Eigen::half* to, const half2& from) { *reinterpret_cast(to) = from; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pstoreu(Eigen::half* to, const half2& from) { to[0] = get_half2_low(from); to[1] = get_half2_high(from); } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE half2 ploadt_ro_aligned( const Eigen::half* from) { #if defined(EIGEN_GPU_HAS_LDG) // Input is guaranteed to be properly aligned. return __ldg(reinterpret_cast(from)); #else return combine_half(*(from+0), *(from+1)); #endif } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE half2 ploadt_ro_unaligned( const Eigen::half* from) { #if defined(EIGEN_GPU_HAS_LDG) return __halves2half2(__ldg(from+0), __ldg(from+1)); #else return combine_half(*(from+0), *(from+1)); #endif } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 pgather(const Eigen::half* from, Index stride) { return combine_half(from[0*stride], from[1*stride]); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pscatter( Eigen::half* to, const half2& from, Index stride) { to[stride*0] = get_half2_low(from); to[stride*1] = get_half2_high(from); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Eigen::half pfirst(const half2& a) { return get_half2_low(a); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 pabs(const half2& a) { half a1 = get_half2_low(a); half a2 = get_half2_high(a); half result1 = half_impl::raw_uint16_to_half(a1.x & 0x7FFF); half result2 = half_impl::raw_uint16_to_half(a2.x & 0x7FFF); return combine_half(result1, result2); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 ptrue(const half2& /*a*/) { half true_half = half_impl::raw_uint16_to_half(0xffffu); return pset1(true_half); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 pzero(const half2& /*a*/) { half false_half = half_impl::raw_uint16_to_half(0x0000u); return pset1(false_half); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { __half a1 = get_half2_low(kernel.packet[0]); __half a2 = get_half2_high(kernel.packet[0]); __half b1 = get_half2_low(kernel.packet[1]); __half b2 = get_half2_high(kernel.packet[1]); kernel.packet[0] = combine_half(a1, b1); kernel.packet[1] = combine_half(a2, b2); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 plset(const Eigen::half& a) { #if defined(EIGEN_GPU_HAS_FP16_ARITHMETIC) return __halves2half2(a, __hadd(a, __float2half(1.0f))); #else float f = __half2float(a) + 1.0f; return combine_half(a, __float2half(f)); #endif } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 pselect(const half2& mask, const half2& a, const half2& b) { half mask_low = get_half2_low(mask); half mask_high = get_half2_high(mask); half result_low = mask_low == half(0) ? get_half2_low(b) : get_half2_low(a); half result_high = mask_high == half(0) ? get_half2_high(b) : get_half2_high(a); return combine_half(result_low, result_high); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 pcmp_eq(const half2& a, const half2& b) { half true_half = half_impl::raw_uint16_to_half(0xffffu); half false_half = half_impl::raw_uint16_to_half(0x0000u); half a1 = get_half2_low(a); half a2 = get_half2_high(a); half b1 = get_half2_low(b); half b2 = get_half2_high(b); half eq1 = __half2float(a1) == __half2float(b1) ? true_half : false_half; half eq2 = __half2float(a2) == __half2float(b2) ? true_half : false_half; return combine_half(eq1, eq2); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 pcmp_lt(const half2& a, const half2& b) { half true_half = half_impl::raw_uint16_to_half(0xffffu); half false_half = half_impl::raw_uint16_to_half(0x0000u); half a1 = get_half2_low(a); half a2 = get_half2_high(a); half b1 = get_half2_low(b); half b2 = get_half2_high(b); half eq1 = __half2float(a1) < __half2float(b1) ? true_half : false_half; half eq2 = __half2float(a2) < __half2float(b2) ? true_half : false_half; return combine_half(eq1, eq2); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 pand(const half2& a, const half2& b) { half a1 = get_half2_low(a); half a2 = get_half2_high(a); half b1 = get_half2_low(b); half b2 = get_half2_high(b); half result1 = half_impl::raw_uint16_to_half(a1.x & b1.x); half result2 = half_impl::raw_uint16_to_half(a2.x & b2.x); return combine_half(result1, result2); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 por(const half2& a, const half2& b) { half a1 = get_half2_low(a); half a2 = get_half2_high(a); half b1 = get_half2_low(b); half b2 = get_half2_high(b); half result1 = half_impl::raw_uint16_to_half(a1.x | b1.x); half result2 = half_impl::raw_uint16_to_half(a2.x | b2.x); return combine_half(result1, result2); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 pxor(const half2& a, const half2& b) { half a1 = get_half2_low(a); half a2 = get_half2_high(a); half b1 = get_half2_low(b); half b2 = get_half2_high(b); half result1 = half_impl::raw_uint16_to_half(a1.x ^ b1.x); half result2 = half_impl::raw_uint16_to_half(a2.x ^ b2.x); return combine_half(result1, result2); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 pandnot(const half2& a, const half2& b) { half a1 = get_half2_low(a); half a2 = get_half2_high(a); half b1 = get_half2_low(b); half b2 = get_half2_high(b); half result1 = half_impl::raw_uint16_to_half(a1.x & ~b1.x); half result2 = half_impl::raw_uint16_to_half(a2.x & ~b2.x); return combine_half(result1, result2); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 padd(const half2& a, const half2& b) { #if defined(EIGEN_GPU_HAS_FP16_ARITHMETIC) return __hadd2(a, b); #else float a1 = __low2float(a); float a2 = __high2float(a); float b1 = __low2float(b); float b2 = __high2float(b); float r1 = a1 + b1; float r2 = a2 + b2; return __floats2half2_rn(r1, r2); #endif } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 psub(const half2& a, const half2& b) { #if defined(EIGEN_GPU_HAS_FP16_ARITHMETIC) return __hsub2(a, b); #else float a1 = __low2float(a); float a2 = __high2float(a); float b1 = __low2float(b); float b2 = __high2float(b); float r1 = a1 - b1; float r2 = a2 - b2; return __floats2half2_rn(r1, r2); #endif } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 pnegate(const half2& a) { #if defined(EIGEN_GPU_HAS_FP16_ARITHMETIC) return __hneg2(a); #else float a1 = __low2float(a); float a2 = __high2float(a); return __floats2half2_rn(-a1, -a2); #endif } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 pconj(const half2& a) { return a; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 pmul(const half2& a, const half2& b) { #if defined(EIGEN_GPU_HAS_FP16_ARITHMETIC) return __hmul2(a, b); #else float a1 = __low2float(a); float a2 = __high2float(a); float b1 = __low2float(b); float b2 = __high2float(b); float r1 = a1 * b1; float r2 = a2 * b2; return __floats2half2_rn(r1, r2); #endif } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 pmadd(const half2& a, const half2& b, const half2& c) { #if defined(EIGEN_GPU_HAS_FP16_ARITHMETIC) return __hfma2(a, b, c); #else float a1 = __low2float(a); float a2 = __high2float(a); float b1 = __low2float(b); float b2 = __high2float(b); float c1 = __low2float(c); float c2 = __high2float(c); float r1 = a1 * b1 + c1; float r2 = a2 * b2 + c2; return __floats2half2_rn(r1, r2); #endif } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 pdiv(const half2& a, const half2& b) { #if defined(EIGEN_GPU_HAS_FP16_ARITHMETIC) return __h2div(a, b); #else float a1 = __low2float(a); float a2 = __high2float(a); float b1 = __low2float(b); float b2 = __high2float(b); float r1 = a1 / b1; float r2 = a2 / b2; return __floats2half2_rn(r1, r2); #endif } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 pmin(const half2& a, const half2& b) { float a1 = __low2float(a); float a2 = __high2float(a); float b1 = __low2float(b); float b2 = __high2float(b); __half r1 = a1 < b1 ? get_half2_low(a) : get_half2_low(b); __half r2 = a2 < b2 ? get_half2_high(a) : get_half2_high(b); return combine_half(r1, r2); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 pmax(const half2& a, const half2& b) { float a1 = __low2float(a); float a2 = __high2float(a); float b1 = __low2float(b); float b2 = __high2float(b); __half r1 = a1 > b1 ? get_half2_low(a) : get_half2_low(b); __half r2 = a2 > b2 ? get_half2_high(a) : get_half2_high(b); return combine_half(r1, r2); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Eigen::half predux(const half2& a) { #if defined(EIGEN_GPU_HAS_FP16_ARITHMETIC) return __hadd(__low2half(a), __high2half(a)); #else float a1 = __low2float(a); float a2 = __high2float(a); return Eigen::half(__float2half(a1 + a2)); #endif } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Eigen::half predux_max(const half2& a) { #if defined(EIGEN_GPU_HAS_FP16_ARITHMETIC) __half first = __low2half(a); __half second = __high2half(a); return __hgt(first, second) ? first : second; #else float a1 = __low2float(a); float a2 = __high2float(a); return a1 > a2 ? get_half2_low(a) : get_half2_high(a); #endif } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Eigen::half predux_min(const half2& a) { #if defined(EIGEN_GPU_HAS_FP16_ARITHMETIC) __half first = __low2half(a); __half second = __high2half(a); return __hlt(first, second) ? first : second; #else float a1 = __low2float(a); float a2 = __high2float(a); return a1 < a2 ? get_half2_low(a) : get_half2_high(a); #endif } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Eigen::half predux_mul(const half2& a) { #if defined(EIGEN_GPU_HAS_FP16_ARITHMETIC) return __hmul(__low2half(a), __high2half(a)); #else float a1 = __low2float(a); float a2 = __high2float(a); return Eigen::half(__float2half(a1 * a2)); #endif } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 plog1p(const half2& a) { float a1 = __low2float(a); float a2 = __high2float(a); float r1 = log1pf(a1); float r2 = log1pf(a2); return __floats2half2_rn(r1, r2); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 pexpm1(const half2& a) { float a1 = __low2float(a); float a2 = __high2float(a); float r1 = expm1f(a1); float r2 = expm1f(a2); return __floats2half2_rn(r1, r2); } #if (EIGEN_CUDA_SDK_VER >= 80000 && defined(EIGEN_CUDA_HAS_FP16_ARITHMETIC)) || \ defined(EIGEN_HIP_DEVICE_COMPILE) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 plog(const half2& a) { return h2log(a); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 pexp(const half2& a) { return h2exp(a); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 psqrt(const half2& a) { return h2sqrt(a); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 prsqrt(const half2& a) { return h2rsqrt(a); } #else EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 plog(const half2& a) { float a1 = __low2float(a); float a2 = __high2float(a); float r1 = logf(a1); float r2 = logf(a2); return __floats2half2_rn(r1, r2); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 pexp(const half2& a) { float a1 = __low2float(a); float a2 = __high2float(a); float r1 = expf(a1); float r2 = expf(a2); return __floats2half2_rn(r1, r2); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 psqrt(const half2& a) { float a1 = __low2float(a); float a2 = __high2float(a); float r1 = sqrtf(a1); float r2 = sqrtf(a2); return __floats2half2_rn(r1, r2); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 prsqrt(const half2& a) { float a1 = __low2float(a); float a2 = __high2float(a); float r1 = rsqrtf(a1); float r2 = rsqrtf(a2); return __floats2half2_rn(r1, r2); } #endif } // namespace template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4h2 pload(const Eigen::half* from) { return *reinterpret_cast(from); } // unaligned load; template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4h2 ploadu(const Eigen::half* from) { Packet4h2 r; half2* p_alias = reinterpret_cast(&r); p_alias[0] = ploadu(from + 0); p_alias[1] = ploadu(from + 2); p_alias[2] = ploadu(from + 4); p_alias[3] = ploadu(from + 6); return r; } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4h2 ploaddup(const Eigen::half* from) { Packet4h2 r; half2* p_alias = reinterpret_cast(&r); p_alias[0] = ploaddup(from + 0); p_alias[1] = ploaddup(from + 1); p_alias[2] = ploaddup(from + 2); p_alias[3] = ploaddup(from + 3); return r; } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pstore( Eigen::half* to, const Packet4h2& from) { *reinterpret_cast(to) = from; } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pstoreu( Eigen::half* to, const Packet4h2& from) { const half2* from_alias = reinterpret_cast(&from); pstoreu(to + 0,from_alias[0]); pstoreu(to + 2,from_alias[1]); pstoreu(to + 4,from_alias[2]); pstoreu(to + 6,from_alias[3]); } template <> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE Packet4h2 ploadt_ro(const Eigen::half* from) { #if defined(EIGEN_GPU_HAS_LDG) Packet4h2 r; r = __ldg(reinterpret_cast(from)); return r; #else Packet4h2 r; half2* r_alias = reinterpret_cast(&r); r_alias[0] = ploadt_ro_aligned(from + 0); r_alias[1] = ploadt_ro_aligned(from + 2); r_alias[2] = ploadt_ro_aligned(from + 4); r_alias[3] = ploadt_ro_aligned(from + 6); return r; #endif } template <> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE Packet4h2 ploadt_ro(const Eigen::half* from) { Packet4h2 r; half2* r_alias = reinterpret_cast(&r); r_alias[0] = ploadt_ro_unaligned(from + 0); r_alias[1] = ploadt_ro_unaligned(from + 2); r_alias[2] = ploadt_ro_unaligned(from + 4); r_alias[3] = ploadt_ro_unaligned(from + 6); return r; } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4h2 pgather(const Eigen::half* from, Index stride) { Packet4h2 r; half2* p_alias = reinterpret_cast(&r); p_alias[0] = combine_half(from[0 * stride], from[1 * stride]); p_alias[1] = combine_half(from[2 * stride], from[3 * stride]); p_alias[2] = combine_half(from[4 * stride], from[5 * stride]); p_alias[3] = combine_half(from[6 * stride], from[7 * stride]); return r; } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pscatter( Eigen::half* to, const Packet4h2& from, Index stride) { const half2* from_alias = reinterpret_cast(&from); pscatter(to + stride * 0, from_alias[0], stride); pscatter(to + stride * 2, from_alias[1], stride); pscatter(to + stride * 4, from_alias[2], stride); pscatter(to + stride * 6, from_alias[3], stride); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Eigen::half pfirst( const Packet4h2& a) { return pfirst(*(reinterpret_cast(&a))); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4h2 pabs( const Packet4h2& a) { Packet4h2 r; half2* p_alias = reinterpret_cast(&r); const half2* a_alias = reinterpret_cast(&a); p_alias[0] = pabs(a_alias[0]); p_alias[1] = pabs(a_alias[1]); p_alias[2] = pabs(a_alias[2]); p_alias[3] = pabs(a_alias[3]); return r; } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4h2 ptrue( const Packet4h2& /*a*/) { half true_half = half_impl::raw_uint16_to_half(0xffffu); return pset1(true_half); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4h2 pzero(const Packet4h2& /*a*/) { half false_half = half_impl::raw_uint16_to_half(0x0000u); return pset1(false_half); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose_double( double* d_row0, double* d_row1, double* d_row2, double* d_row3, double* d_row4, double* d_row5, double* d_row6, double* d_row7) { double d_tmp; d_tmp = d_row0[1]; d_row0[1] = d_row4[0]; d_row4[0] = d_tmp; d_tmp = d_row1[1]; d_row1[1] = d_row5[0]; d_row5[0] = d_tmp; d_tmp = d_row2[1]; d_row2[1] = d_row6[0]; d_row6[0] = d_tmp; d_tmp = d_row3[1]; d_row3[1] = d_row7[0]; d_row7[0] = d_tmp; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose_half2( half2* f_row0, half2* f_row1, half2* f_row2, half2* f_row3) { half2 f_tmp; f_tmp = f_row0[1]; f_row0[1] = f_row2[0]; f_row2[0] = f_tmp; f_tmp = f_row1[1]; f_row1[1] = f_row3[0]; f_row3[0] = f_tmp; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose_half(half2& f0, half2& f1) { __half a1 = get_half2_low(f0); __half a2 = get_half2_high(f0); __half b1 = get_half2_low(f1); __half b2 = get_half2_high(f1); f0 = combine_half(a1, b1); f1 = combine_half(a2, b2); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { double* d_row0 = reinterpret_cast(&kernel.packet[0]); double* d_row1 = reinterpret_cast(&kernel.packet[1]); double* d_row2 = reinterpret_cast(&kernel.packet[2]); double* d_row3 = reinterpret_cast(&kernel.packet[3]); double* d_row4 = reinterpret_cast(&kernel.packet[4]); double* d_row5 = reinterpret_cast(&kernel.packet[5]); double* d_row6 = reinterpret_cast(&kernel.packet[6]); double* d_row7 = reinterpret_cast(&kernel.packet[7]); ptranspose_double(d_row0, d_row1, d_row2, d_row3, d_row4, d_row5, d_row6, d_row7); half2* f_row0 = reinterpret_cast(d_row0); half2* f_row1 = reinterpret_cast(d_row1); half2* f_row2 = reinterpret_cast(d_row2); half2* f_row3 = reinterpret_cast(d_row3); ptranspose_half2(f_row0, f_row1, f_row2, f_row3); ptranspose_half(f_row0[0], f_row1[0]); ptranspose_half(f_row0[1], f_row1[1]); ptranspose_half(f_row2[0], f_row3[0]); ptranspose_half(f_row2[1], f_row3[1]); f_row0 = reinterpret_cast(d_row0 + 1); f_row1 = reinterpret_cast(d_row1 + 1); f_row2 = reinterpret_cast(d_row2 + 1); f_row3 = reinterpret_cast(d_row3 + 1); ptranspose_half2(f_row0, f_row1, f_row2, f_row3); ptranspose_half(f_row0[0], f_row1[0]); ptranspose_half(f_row0[1], f_row1[1]); ptranspose_half(f_row2[0], f_row3[0]); ptranspose_half(f_row2[1], f_row3[1]); f_row0 = reinterpret_cast(d_row4); f_row1 = reinterpret_cast(d_row5); f_row2 = reinterpret_cast(d_row6); f_row3 = reinterpret_cast(d_row7); ptranspose_half2(f_row0, f_row1, f_row2, f_row3); ptranspose_half(f_row0[0], f_row1[0]); ptranspose_half(f_row0[1], f_row1[1]); ptranspose_half(f_row2[0], f_row3[0]); ptranspose_half(f_row2[1], f_row3[1]); f_row0 = reinterpret_cast(d_row4 + 1); f_row1 = reinterpret_cast(d_row5 + 1); f_row2 = reinterpret_cast(d_row6 + 1); f_row3 = reinterpret_cast(d_row7 + 1); ptranspose_half2(f_row0, f_row1, f_row2, f_row3); ptranspose_half(f_row0[0], f_row1[0]); ptranspose_half(f_row0[1], f_row1[1]); ptranspose_half(f_row2[0], f_row3[0]); ptranspose_half(f_row2[1], f_row3[1]); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4h2 plset(const Eigen::half& a) { #if defined(EIGEN_HIP_DEVICE_COMPILE) Packet4h2 r; half2* p_alias = reinterpret_cast(&r); p_alias[0] = __halves2half2(a, __hadd(a, __float2half(1.0f))); p_alias[1] = __halves2half2(__hadd(a, __float2half(2.0f)), __hadd(a, __float2half(3.0f))); p_alias[2] = __halves2half2(__hadd(a, __float2half(4.0f)), __hadd(a, __float2half(5.0f))); p_alias[3] = __halves2half2(__hadd(a, __float2half(6.0f)), __hadd(a, __float2half(7.0f))); return r; #elif defined(EIGEN_CUDA_HAS_FP16_ARITHMETIC) Packet4h2 r; half2* r_alias = reinterpret_cast(&r); half2 b = pset1(a); half2 c; half2 half_offset0 = __halves2half2(__float2half(0.0f),__float2half(2.0f)); half2 half_offset1 = __halves2half2(__float2half(4.0f),__float2half(6.0f)); c = __hadd2(b, half_offset0); r_alias[0] = plset(__low2half(c)); r_alias[1] = plset(__high2half(c)); c = __hadd2(b, half_offset1); r_alias[2] = plset(__low2half(c)); r_alias[3] = plset(__high2half(c)); return r; #else float f = __half2float(a); Packet4h2 r; half2* p_alias = reinterpret_cast(&r); p_alias[0] = combine_half(a, __float2half(f + 1.0f)); p_alias[1] = combine_half(__float2half(f + 2.0f), __float2half(f + 3.0f)); p_alias[2] = combine_half(__float2half(f + 4.0f), __float2half(f + 5.0f)); p_alias[3] = combine_half(__float2half(f + 6.0f), __float2half(f + 7.0f)); return r; #endif } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4h2 pselect(const Packet4h2& mask, const Packet4h2& a, const Packet4h2& b) { Packet4h2 r; half2* r_alias = reinterpret_cast(&r); const half2* mask_alias = reinterpret_cast(&mask); const half2* a_alias = reinterpret_cast(&a); const half2* b_alias = reinterpret_cast(&b); r_alias[0] = pselect(mask_alias[0], a_alias[0], b_alias[0]); r_alias[1] = pselect(mask_alias[1], a_alias[1], b_alias[1]); r_alias[2] = pselect(mask_alias[2], a_alias[2], b_alias[2]); r_alias[3] = pselect(mask_alias[3], a_alias[3], b_alias[3]); return r; } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4h2 pcmp_eq(const Packet4h2& a, const Packet4h2& b) { Packet4h2 r; half2* r_alias = reinterpret_cast(&r); const half2* a_alias = reinterpret_cast(&a); const half2* b_alias = reinterpret_cast(&b); r_alias[0] = pcmp_eq(a_alias[0], b_alias[0]); r_alias[1] = pcmp_eq(a_alias[1], b_alias[1]); r_alias[2] = pcmp_eq(a_alias[2], b_alias[2]); r_alias[3] = pcmp_eq(a_alias[3], b_alias[3]); return r; } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4h2 pand( const Packet4h2& a, const Packet4h2& b) { Packet4h2 r; half2* r_alias = reinterpret_cast(&r); const half2* a_alias = reinterpret_cast(&a); const half2* b_alias = reinterpret_cast(&b); r_alias[0] = pand(a_alias[0], b_alias[0]); r_alias[1] = pand(a_alias[1], b_alias[1]); r_alias[2] = pand(a_alias[2], b_alias[2]); r_alias[3] = pand(a_alias[3], b_alias[3]); return r; } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4h2 por( const Packet4h2& a, const Packet4h2& b) { Packet4h2 r; half2* r_alias = reinterpret_cast(&r); const half2* a_alias = reinterpret_cast(&a); const half2* b_alias = reinterpret_cast(&b); r_alias[0] = por(a_alias[0], b_alias[0]); r_alias[1] = por(a_alias[1], b_alias[1]); r_alias[2] = por(a_alias[2], b_alias[2]); r_alias[3] = por(a_alias[3], b_alias[3]); return r; } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4h2 pxor( const Packet4h2& a, const Packet4h2& b) { Packet4h2 r; half2* r_alias = reinterpret_cast(&r); const half2* a_alias = reinterpret_cast(&a); const half2* b_alias = reinterpret_cast(&b); r_alias[0] = pxor(a_alias[0], b_alias[0]); r_alias[1] = pxor(a_alias[1], b_alias[1]); r_alias[2] = pxor(a_alias[2], b_alias[2]); r_alias[3] = pxor(a_alias[3], b_alias[3]); return r; } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4h2 pandnot(const Packet4h2& a, const Packet4h2& b) { Packet4h2 r; half2* r_alias = reinterpret_cast(&r); const half2* a_alias = reinterpret_cast(&a); const half2* b_alias = reinterpret_cast(&b); r_alias[0] = pandnot(a_alias[0], b_alias[0]); r_alias[1] = pandnot(a_alias[1], b_alias[1]); r_alias[2] = pandnot(a_alias[2], b_alias[2]); r_alias[3] = pandnot(a_alias[3], b_alias[3]); return r; } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4h2 padd( const Packet4h2& a, const Packet4h2& b) { Packet4h2 r; half2* r_alias = reinterpret_cast(&r); const half2* a_alias = reinterpret_cast(&a); const half2* b_alias = reinterpret_cast(&b); r_alias[0] = padd(a_alias[0], b_alias[0]); r_alias[1] = padd(a_alias[1], b_alias[1]); r_alias[2] = padd(a_alias[2], b_alias[2]); r_alias[3] = padd(a_alias[3], b_alias[3]); return r; } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4h2 psub( const Packet4h2& a, const Packet4h2& b) { Packet4h2 r; half2* r_alias = reinterpret_cast(&r); const half2* a_alias = reinterpret_cast(&a); const half2* b_alias = reinterpret_cast(&b); r_alias[0] = psub(a_alias[0], b_alias[0]); r_alias[1] = psub(a_alias[1], b_alias[1]); r_alias[2] = psub(a_alias[2], b_alias[2]); r_alias[3] = psub(a_alias[3], b_alias[3]); return r; } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4h2 pnegate(const Packet4h2& a) { Packet4h2 r; half2* r_alias = reinterpret_cast(&r); const half2* a_alias = reinterpret_cast(&a); r_alias[0] = pnegate(a_alias[0]); r_alias[1] = pnegate(a_alias[1]); r_alias[2] = pnegate(a_alias[2]); r_alias[3] = pnegate(a_alias[3]); return r; } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4h2 pconj(const Packet4h2& a) { return a; } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4h2 pmul( const Packet4h2& a, const Packet4h2& b) { Packet4h2 r; half2* r_alias = reinterpret_cast(&r); const half2* a_alias = reinterpret_cast(&a); const half2* b_alias = reinterpret_cast(&b); r_alias[0] = pmul(a_alias[0], b_alias[0]); r_alias[1] = pmul(a_alias[1], b_alias[1]); r_alias[2] = pmul(a_alias[2], b_alias[2]); r_alias[3] = pmul(a_alias[3], b_alias[3]); return r; } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4h2 pmadd( const Packet4h2& a, const Packet4h2& b, const Packet4h2& c) { Packet4h2 r; half2* r_alias = reinterpret_cast(&r); const half2* a_alias = reinterpret_cast(&a); const half2* b_alias = reinterpret_cast(&b); const half2* c_alias = reinterpret_cast(&c); r_alias[0] = pmadd(a_alias[0], b_alias[0], c_alias[0]); r_alias[1] = pmadd(a_alias[1], b_alias[1], c_alias[1]); r_alias[2] = pmadd(a_alias[2], b_alias[2], c_alias[2]); r_alias[3] = pmadd(a_alias[3], b_alias[3], c_alias[3]); return r; } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4h2 pdiv( const Packet4h2& a, const Packet4h2& b) { Packet4h2 r; half2* r_alias = reinterpret_cast(&r); const half2* a_alias = reinterpret_cast(&a); const half2* b_alias = reinterpret_cast(&b); r_alias[0] = pdiv(a_alias[0], b_alias[0]); r_alias[1] = pdiv(a_alias[1], b_alias[1]); r_alias[2] = pdiv(a_alias[2], b_alias[2]); r_alias[3] = pdiv(a_alias[3], b_alias[3]); return r; } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4h2 pmin( const Packet4h2& a, const Packet4h2& b) { Packet4h2 r; half2* r_alias = reinterpret_cast(&r); const half2* a_alias = reinterpret_cast(&a); const half2* b_alias = reinterpret_cast(&b); r_alias[0] = pmin(a_alias[0], b_alias[0]); r_alias[1] = pmin(a_alias[1], b_alias[1]); r_alias[2] = pmin(a_alias[2], b_alias[2]); r_alias[3] = pmin(a_alias[3], b_alias[3]); return r; } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4h2 pmax( const Packet4h2& a, const Packet4h2& b) { Packet4h2 r; half2* r_alias = reinterpret_cast(&r); const half2* a_alias = reinterpret_cast(&a); const half2* b_alias = reinterpret_cast(&b); r_alias[0] = pmax(a_alias[0], b_alias[0]); r_alias[1] = pmax(a_alias[1], b_alias[1]); r_alias[2] = pmax(a_alias[2], b_alias[2]); r_alias[3] = pmax(a_alias[3], b_alias[3]); return r; } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Eigen::half predux( const Packet4h2& a) { const half2* a_alias = reinterpret_cast(&a); return predux(a_alias[0]) + predux(a_alias[1]) + predux(a_alias[2]) + predux(a_alias[3]); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Eigen::half predux_max( const Packet4h2& a) { const half2* a_alias = reinterpret_cast(&a); half2 m0 = combine_half(predux_max(a_alias[0]), predux_max(a_alias[1])); half2 m1 = combine_half(predux_max(a_alias[2]), predux_max(a_alias[3])); __half first = predux_max(m0); __half second = predux_max(m1); #if defined(EIGEN_CUDA_HAS_FP16_ARITHMETIC) return (__hgt(first, second) ? first : second); #else float ffirst = __half2float(first); float fsecond = __half2float(second); return (ffirst > fsecond)? first: second; #endif } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Eigen::half predux_min( const Packet4h2& a) { const half2* a_alias = reinterpret_cast(&a); half2 m0 = combine_half(predux_min(a_alias[0]), predux_min(a_alias[1])); half2 m1 = combine_half(predux_min(a_alias[2]), predux_min(a_alias[3])); __half first = predux_min(m0); __half second = predux_min(m1); #if defined(EIGEN_CUDA_HAS_FP16_ARITHMETIC) return (__hlt(first, second) ? first : second); #else float ffirst = __half2float(first); float fsecond = __half2float(second); return (ffirst < fsecond)? first: second; #endif } // likely overflow/underflow template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Eigen::half predux_mul( const Packet4h2& a) { const half2* a_alias = reinterpret_cast(&a); return predux_mul(pmul(pmul(a_alias[0], a_alias[1]), pmul(a_alias[2], a_alias[3]))); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4h2 plog1p(const Packet4h2& a) { Packet4h2 r; half2* r_alias = reinterpret_cast(&r); const half2* a_alias = reinterpret_cast(&a); r_alias[0] = plog1p(a_alias[0]); r_alias[1] = plog1p(a_alias[1]); r_alias[2] = plog1p(a_alias[2]); r_alias[3] = plog1p(a_alias[3]); return r; } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4h2 pexpm1(const Packet4h2& a) { Packet4h2 r; half2* r_alias = reinterpret_cast(&r); const half2* a_alias = reinterpret_cast(&a); r_alias[0] = pexpm1(a_alias[0]); r_alias[1] = pexpm1(a_alias[1]); r_alias[2] = pexpm1(a_alias[2]); r_alias[3] = pexpm1(a_alias[3]); return r; } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4h2 plog(const Packet4h2& a) { Packet4h2 r; half2* r_alias = reinterpret_cast(&r); const half2* a_alias = reinterpret_cast(&a); r_alias[0] = plog(a_alias[0]); r_alias[1] = plog(a_alias[1]); r_alias[2] = plog(a_alias[2]); r_alias[3] = plog(a_alias[3]); return r; } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4h2 pexp(const Packet4h2& a) { Packet4h2 r; half2* r_alias = reinterpret_cast(&r); const half2* a_alias = reinterpret_cast(&a); r_alias[0] = pexp(a_alias[0]); r_alias[1] = pexp(a_alias[1]); r_alias[2] = pexp(a_alias[2]); r_alias[3] = pexp(a_alias[3]); return r; } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4h2 psqrt(const Packet4h2& a) { Packet4h2 r; half2* r_alias = reinterpret_cast(&r); const half2* a_alias = reinterpret_cast(&a); r_alias[0] = psqrt(a_alias[0]); r_alias[1] = psqrt(a_alias[1]); r_alias[2] = psqrt(a_alias[2]); r_alias[3] = psqrt(a_alias[3]); return r; } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4h2 prsqrt(const Packet4h2& a) { Packet4h2 r; half2* r_alias = reinterpret_cast(&r); const half2* a_alias = reinterpret_cast(&a); r_alias[0] = prsqrt(a_alias[0]); r_alias[1] = prsqrt(a_alias[1]); r_alias[2] = prsqrt(a_alias[2]); r_alias[3] = prsqrt(a_alias[3]); return r; } // The following specialized padd, pmul, pdiv, pmin, pmax, pset1 are needed for // the implementation of GPU half reduction. template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 padd(const half2& a, const half2& b) { #if defined(EIGEN_GPU_HAS_FP16_ARITHMETIC) return __hadd2(a, b); #else float a1 = __low2float(a); float a2 = __high2float(a); float b1 = __low2float(b); float b2 = __high2float(b); float r1 = a1 + b1; float r2 = a2 + b2; return __floats2half2_rn(r1, r2); #endif } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 pmul(const half2& a, const half2& b) { #if defined(EIGEN_GPU_HAS_FP16_ARITHMETIC) return __hmul2(a, b); #else float a1 = __low2float(a); float a2 = __high2float(a); float b1 = __low2float(b); float b2 = __high2float(b); float r1 = a1 * b1; float r2 = a2 * b2; return __floats2half2_rn(r1, r2); #endif } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 pdiv(const half2& a, const half2& b) { #if defined(EIGEN_GPU_HAS_FP16_ARITHMETIC) return __h2div(a, b); #else float a1 = __low2float(a); float a2 = __high2float(a); float b1 = __low2float(b); float b2 = __high2float(b); float r1 = a1 / b1; float r2 = a2 / b2; return __floats2half2_rn(r1, r2); #endif } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 pmin(const half2& a, const half2& b) { float a1 = __low2float(a); float a2 = __high2float(a); float b1 = __low2float(b); float b2 = __high2float(b); __half r1 = a1 < b1 ? get_half2_low(a) : get_half2_low(b); __half r2 = a2 < b2 ? get_half2_high(a) : get_half2_high(b); return combine_half(r1, r2); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 pmax(const half2& a, const half2& b) { float a1 = __low2float(a); float a2 = __high2float(a); float b1 = __low2float(b); float b2 = __high2float(b); __half r1 = a1 > b1 ? get_half2_low(a) : get_half2_low(b); __half r2 = a2 > b2 ? get_half2_high(a) : get_half2_high(b); return combine_half(r1, r2); } // #endif // defined(EIGEN_CUDA_ARCH) || defined(EIGEN_HIPCC) || (defined(EIGEN_CUDACC) && EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC) #endif // defined(EIGEN_HAS_CUDA_FP16) || defined(EIGEN_HAS_HIP_FP16) #undef EIGEN_GPU_HAS_LDG #undef EIGEN_CUDA_HAS_FP16_ARITHMETIC #undef EIGEN_GPU_HAS_FP16_ARITHMETIC } // end namespace internal } // end namespace Eigen #endif // EIGEN_PACKET_MATH_GPU_H RcppEigen/inst/include/Eigen/src/Core/arch/GPU/TypeCasting.h0000644000176200001440000000432014562417221023212 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2016 Benoit Steiner // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_TYPE_CASTING_GPU_H #define EIGEN_TYPE_CASTING_GPU_H namespace Eigen { namespace internal { #if (defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 300) || \ (defined(EIGEN_HAS_HIP_FP16) && defined(EIGEN_HIP_DEVICE_COMPILE)) template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 2 }; }; template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pcast(const half2& a, const half2& b) { float2 r1 = __half22float2(a); float2 r2 = __half22float2(b); return make_float4(r1.x, r1.y, r2.x, r2.y); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4h2 pcast(const float4& a, const float4& b) { Packet4h2 r; half2* r_alias=reinterpret_cast(&r); r_alias[0]=__floats2half2_rn(a.x,a.y); r_alias[1]=__floats2half2_rn(a.z,a.w); r_alias[2]=__floats2half2_rn(b.x,b.y); r_alias[3]=__floats2half2_rn(b.z,b.w); return r; } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 2, TgtCoeffRatio = 1 }; }; template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pcast(const Packet4h2& a) { // Simply discard the second half of the input float4 r; const half2* a_alias=reinterpret_cast(&a); float2 r1 = __half22float2(a_alias[0]); float2 r2 = __half22float2(a_alias[1]); r.x=static_cast(r1.x); r.y=static_cast(r1.y); r.z=static_cast(r2.x); r.w=static_cast(r2.y); return r; } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE half2 pcast(const float4& a) { // Simply discard the second half of the input return __floats2half2_rn(a.x, a.y); } #endif } // end namespace internal } // end namespace Eigen #endif // EIGEN_TYPE_CASTING_GPU_H RcppEigen/inst/include/Eigen/src/Core/arch/AVX512/0000755000176200001440000000000014562417221021043 5ustar liggesusersRcppEigen/inst/include/Eigen/src/Core/arch/AVX512/MathFunctions.h0000644000176200001440000003204014562417221023775 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2016 Pedro Gonnet (pedro.gonnet@gmail.com) // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef THIRD_PARTY_EIGEN3_EIGEN_SRC_CORE_ARCH_AVX512_MATHFUNCTIONS_H_ #define THIRD_PARTY_EIGEN3_EIGEN_SRC_CORE_ARCH_AVX512_MATHFUNCTIONS_H_ namespace Eigen { namespace internal { // Disable the code for older versions of gcc that don't support many of the required avx512 instrinsics. #if EIGEN_GNUC_AT_LEAST(5, 3) || EIGEN_COMP_CLANG || EIGEN_COMP_MSVC >= 1923 #define _EIGEN_DECLARE_CONST_Packet16f(NAME, X) \ const Packet16f p16f_##NAME = pset1(X) #define _EIGEN_DECLARE_CONST_Packet16f_FROM_INT(NAME, X) \ const Packet16f p16f_##NAME = preinterpret(pset1(X)) #define _EIGEN_DECLARE_CONST_Packet8d(NAME, X) \ const Packet8d p8d_##NAME = pset1(X) #define _EIGEN_DECLARE_CONST_Packet8d_FROM_INT64(NAME, X) \ const Packet8d p8d_##NAME = _mm512_castsi512_pd(_mm512_set1_epi64(X)) #define _EIGEN_DECLARE_CONST_Packet16bf(NAME, X) \ const Packet16bf p16bf_##NAME = pset1(X) #define _EIGEN_DECLARE_CONST_Packet16bf_FROM_INT(NAME, X) \ const Packet16bf p16bf_##NAME = preinterpret(pset1(X)) template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet16f plog(const Packet16f& _x) { return plog_float(_x); } template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8d plog(const Packet8d& _x) { return plog_double(_x); } F16_PACKET_FUNCTION(Packet16f, Packet16h, plog) BF16_PACKET_FUNCTION(Packet16f, Packet16bf, plog) template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet16f plog2(const Packet16f& _x) { return plog2_float(_x); } template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8d plog2(const Packet8d& _x) { return plog2_double(_x); } F16_PACKET_FUNCTION(Packet16f, Packet16h, plog2) BF16_PACKET_FUNCTION(Packet16f, Packet16bf, plog2) // Exponential function. Works by writing "x = m*log(2) + r" where // "m = floor(x/log(2)+1/2)" and "r" is the remainder. The result is then // "exp(x) = 2^m*exp(r)" where exp(r) is in the range [-1,1). template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet16f pexp(const Packet16f& _x) { _EIGEN_DECLARE_CONST_Packet16f(1, 1.0f); _EIGEN_DECLARE_CONST_Packet16f(half, 0.5f); _EIGEN_DECLARE_CONST_Packet16f(127, 127.0f); _EIGEN_DECLARE_CONST_Packet16f(exp_hi, 88.3762626647950f); _EIGEN_DECLARE_CONST_Packet16f(exp_lo, -88.3762626647949f); _EIGEN_DECLARE_CONST_Packet16f(cephes_LOG2EF, 1.44269504088896341f); _EIGEN_DECLARE_CONST_Packet16f(cephes_exp_p0, 1.9875691500E-4f); _EIGEN_DECLARE_CONST_Packet16f(cephes_exp_p1, 1.3981999507E-3f); _EIGEN_DECLARE_CONST_Packet16f(cephes_exp_p2, 8.3334519073E-3f); _EIGEN_DECLARE_CONST_Packet16f(cephes_exp_p3, 4.1665795894E-2f); _EIGEN_DECLARE_CONST_Packet16f(cephes_exp_p4, 1.6666665459E-1f); _EIGEN_DECLARE_CONST_Packet16f(cephes_exp_p5, 5.0000001201E-1f); // Clamp x. Packet16f x = pmax(pmin(_x, p16f_exp_hi), p16f_exp_lo); // Express exp(x) as exp(m*ln(2) + r), start by extracting // m = floor(x/ln(2) + 0.5). Packet16f m = _mm512_floor_ps(pmadd(x, p16f_cephes_LOG2EF, p16f_half)); // Get r = x - m*ln(2). Note that we can do this without losing more than one // ulp precision due to the FMA instruction. _EIGEN_DECLARE_CONST_Packet16f(nln2, -0.6931471805599453f); Packet16f r = _mm512_fmadd_ps(m, p16f_nln2, x); Packet16f r2 = pmul(r, r); Packet16f r3 = pmul(r2, r); // Evaluate the polynomial approximant,improved by instruction-level parallelism. Packet16f y, y1, y2; y = pmadd(p16f_cephes_exp_p0, r, p16f_cephes_exp_p1); y1 = pmadd(p16f_cephes_exp_p3, r, p16f_cephes_exp_p4); y2 = padd(r, p16f_1); y = pmadd(y, r, p16f_cephes_exp_p2); y1 = pmadd(y1, r, p16f_cephes_exp_p5); y = pmadd(y, r3, y1); y = pmadd(y, r2, y2); // Build emm0 = 2^m. Packet16i emm0 = _mm512_cvttps_epi32(padd(m, p16f_127)); emm0 = _mm512_slli_epi32(emm0, 23); // Return 2^m * exp(r). return pmax(pmul(y, _mm512_castsi512_ps(emm0)), _x); } template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8d pexp(const Packet8d& _x) { return pexp_double(_x); } F16_PACKET_FUNCTION(Packet16f, Packet16h, pexp) BF16_PACKET_FUNCTION(Packet16f, Packet16bf, pexp) template <> EIGEN_STRONG_INLINE Packet16h pfrexp(const Packet16h& a, Packet16h& exponent) { Packet16f fexponent; const Packet16h out = float2half(pfrexp(half2float(a), fexponent)); exponent = float2half(fexponent); return out; } template <> EIGEN_STRONG_INLINE Packet16h pldexp(const Packet16h& a, const Packet16h& exponent) { return float2half(pldexp(half2float(a), half2float(exponent))); } template <> EIGEN_STRONG_INLINE Packet16bf pfrexp(const Packet16bf& a, Packet16bf& exponent) { Packet16f fexponent; const Packet16bf out = F32ToBf16(pfrexp(Bf16ToF32(a), fexponent)); exponent = F32ToBf16(fexponent); return out; } template <> EIGEN_STRONG_INLINE Packet16bf pldexp(const Packet16bf& a, const Packet16bf& exponent) { return F32ToBf16(pldexp(Bf16ToF32(a), Bf16ToF32(exponent))); } // Functions for sqrt. // The EIGEN_FAST_MATH version uses the _mm_rsqrt_ps approximation and one step // of Newton's method, at a cost of 1-2 bits of precision as opposed to the // exact solution. The main advantage of this approach is not just speed, but // also the fact that it can be inlined and pipelined with other computations, // further reducing its effective latency. #if EIGEN_FAST_MATH template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet16f psqrt(const Packet16f& _x) { Packet16f neg_half = pmul(_x, pset1(-.5f)); __mmask16 denormal_mask = _mm512_kand( _mm512_cmp_ps_mask(_x, pset1((std::numeric_limits::min)()), _CMP_LT_OQ), _mm512_cmp_ps_mask(_x, _mm512_setzero_ps(), _CMP_GE_OQ)); Packet16f x = _mm512_rsqrt14_ps(_x); // Do a single step of Newton's iteration. x = pmul(x, pmadd(neg_half, pmul(x, x), pset1(1.5f))); // Flush results for denormals to zero. return _mm512_mask_blend_ps(denormal_mask, pmul(_x,x), _mm512_setzero_ps()); } template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8d psqrt(const Packet8d& _x) { Packet8d neg_half = pmul(_x, pset1(-.5)); __mmask16 denormal_mask = _mm512_kand( _mm512_cmp_pd_mask(_x, pset1((std::numeric_limits::min)()), _CMP_LT_OQ), _mm512_cmp_pd_mask(_x, _mm512_setzero_pd(), _CMP_GE_OQ)); Packet8d x = _mm512_rsqrt14_pd(_x); // Do a single step of Newton's iteration. x = pmul(x, pmadd(neg_half, pmul(x, x), pset1(1.5))); // Do a second step of Newton's iteration. x = pmul(x, pmadd(neg_half, pmul(x, x), pset1(1.5))); return _mm512_mask_blend_pd(denormal_mask, pmul(_x,x), _mm512_setzero_pd()); } #else template <> EIGEN_STRONG_INLINE Packet16f psqrt(const Packet16f& x) { return _mm512_sqrt_ps(x); } template <> EIGEN_STRONG_INLINE Packet8d psqrt(const Packet8d& x) { return _mm512_sqrt_pd(x); } #endif F16_PACKET_FUNCTION(Packet16f, Packet16h, psqrt) BF16_PACKET_FUNCTION(Packet16f, Packet16bf, psqrt) // prsqrt for float. #if defined(EIGEN_VECTORIZE_AVX512ER) template <> EIGEN_STRONG_INLINE Packet16f prsqrt(const Packet16f& x) { return _mm512_rsqrt28_ps(x); } #elif EIGEN_FAST_MATH template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet16f prsqrt(const Packet16f& _x) { _EIGEN_DECLARE_CONST_Packet16f_FROM_INT(inf, 0x7f800000); _EIGEN_DECLARE_CONST_Packet16f(one_point_five, 1.5f); _EIGEN_DECLARE_CONST_Packet16f(minus_half, -0.5f); Packet16f neg_half = pmul(_x, p16f_minus_half); // Identity infinite, negative and denormal arguments. __mmask16 inf_mask = _mm512_cmp_ps_mask(_x, p16f_inf, _CMP_EQ_OQ); __mmask16 not_pos_mask = _mm512_cmp_ps_mask(_x, _mm512_setzero_ps(), _CMP_LE_OQ); __mmask16 not_finite_pos_mask = not_pos_mask | inf_mask; // Compute an approximate result using the rsqrt intrinsic, forcing +inf // for denormals for consistency with AVX and SSE implementations. Packet16f y_approx = _mm512_rsqrt14_ps(_x); // Do a single step of Newton-Raphson iteration to improve the approximation. // This uses the formula y_{n+1} = y_n * (1.5 - y_n * (0.5 * x) * y_n). // It is essential to evaluate the inner term like this because forming // y_n^2 may over- or underflow. Packet16f y_newton = pmul(y_approx, pmadd(y_approx, pmul(neg_half, y_approx), p16f_one_point_five)); // Select the result of the Newton-Raphson step for positive finite arguments. // For other arguments, choose the output of the intrinsic. This will // return rsqrt(+inf) = 0, rsqrt(x) = NaN if x < 0, and rsqrt(0) = +inf. return _mm512_mask_blend_ps(not_finite_pos_mask, y_newton, y_approx); } #else template <> EIGEN_STRONG_INLINE Packet16f prsqrt(const Packet16f& x) { _EIGEN_DECLARE_CONST_Packet16f(one, 1.0f); return _mm512_div_ps(p16f_one, _mm512_sqrt_ps(x)); } #endif F16_PACKET_FUNCTION(Packet16f, Packet16h, prsqrt) BF16_PACKET_FUNCTION(Packet16f, Packet16bf, prsqrt) // prsqrt for double. #if EIGEN_FAST_MATH template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8d prsqrt(const Packet8d& _x) { _EIGEN_DECLARE_CONST_Packet8d(one_point_five, 1.5); _EIGEN_DECLARE_CONST_Packet8d(minus_half, -0.5); _EIGEN_DECLARE_CONST_Packet8d_FROM_INT64(inf, 0x7ff0000000000000LL); Packet8d neg_half = pmul(_x, p8d_minus_half); // Identity infinite, negative and denormal arguments. __mmask8 inf_mask = _mm512_cmp_pd_mask(_x, p8d_inf, _CMP_EQ_OQ); __mmask8 not_pos_mask = _mm512_cmp_pd_mask(_x, _mm512_setzero_pd(), _CMP_LE_OQ); __mmask8 not_finite_pos_mask = not_pos_mask | inf_mask; // Compute an approximate result using the rsqrt intrinsic, forcing +inf // for denormals for consistency with AVX and SSE implementations. #if defined(EIGEN_VECTORIZE_AVX512ER) Packet8d y_approx = _mm512_rsqrt28_pd(_x); #else Packet8d y_approx = _mm512_rsqrt14_pd(_x); #endif // Do one or two steps of Newton-Raphson's to improve the approximation, depending on the // starting accuracy (either 2^-14 or 2^-28, depending on whether AVX512ER is available). // The Newton-Raphson algorithm has quadratic convergence and roughly doubles the number // of correct digits for each step. // This uses the formula y_{n+1} = y_n * (1.5 - y_n * (0.5 * x) * y_n). // It is essential to evaluate the inner term like this because forming // y_n^2 may over- or underflow. Packet8d y_newton = pmul(y_approx, pmadd(neg_half, pmul(y_approx, y_approx), p8d_one_point_five)); #if !defined(EIGEN_VECTORIZE_AVX512ER) y_newton = pmul(y_newton, pmadd(y_newton, pmul(neg_half, y_newton), p8d_one_point_five)); #endif // Select the result of the Newton-Raphson step for positive finite arguments. // For other arguments, choose the output of the intrinsic. This will // return rsqrt(+inf) = 0, rsqrt(x) = NaN if x < 0, and rsqrt(0) = +inf. return _mm512_mask_blend_pd(not_finite_pos_mask, y_newton, y_approx); } #else template <> EIGEN_STRONG_INLINE Packet8d prsqrt(const Packet8d& x) { _EIGEN_DECLARE_CONST_Packet8d(one, 1.0f); return _mm512_div_pd(p8d_one, _mm512_sqrt_pd(x)); } #endif template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet16f plog1p(const Packet16f& _x) { return generic_plog1p(_x); } F16_PACKET_FUNCTION(Packet16f, Packet16h, plog1p) BF16_PACKET_FUNCTION(Packet16f, Packet16bf, plog1p) template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet16f pexpm1(const Packet16f& _x) { return generic_expm1(_x); } F16_PACKET_FUNCTION(Packet16f, Packet16h, pexpm1) BF16_PACKET_FUNCTION(Packet16f, Packet16bf, pexpm1) #endif template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet16f psin(const Packet16f& _x) { return psin_float(_x); } template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet16f pcos(const Packet16f& _x) { return pcos_float(_x); } template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet16f ptanh(const Packet16f& _x) { return internal::generic_fast_tanh_float(_x); } F16_PACKET_FUNCTION(Packet16f, Packet16h, psin) F16_PACKET_FUNCTION(Packet16f, Packet16h, pcos) F16_PACKET_FUNCTION(Packet16f, Packet16h, ptanh) BF16_PACKET_FUNCTION(Packet16f, Packet16bf, psin) BF16_PACKET_FUNCTION(Packet16f, Packet16bf, pcos) BF16_PACKET_FUNCTION(Packet16f, Packet16bf, ptanh) } // end namespace internal } // end namespace Eigen #endif // THIRD_PARTY_EIGEN3_EIGEN_SRC_CORE_ARCH_AVX512_MATHFUNCTIONS_H_ RcppEigen/inst/include/Eigen/src/Core/arch/AVX512/PacketMath.h0000644000176200001440000025352314562417221023247 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2016 Benoit Steiner (benoit.steiner.goog@gmail.com) // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_PACKET_MATH_AVX512_H #define EIGEN_PACKET_MATH_AVX512_H namespace Eigen { namespace internal { #ifndef EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD #define EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD 8 #endif #ifndef EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS #define EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS 32 #endif #ifdef EIGEN_VECTORIZE_FMA #ifndef EIGEN_HAS_SINGLE_INSTRUCTION_MADD #define EIGEN_HAS_SINGLE_INSTRUCTION_MADD #endif #endif typedef __m512 Packet16f; typedef __m512i Packet16i; typedef __m512d Packet8d; typedef eigen_packet_wrapper<__m256i, 1> Packet16h; typedef eigen_packet_wrapper<__m256i, 2> Packet16bf; template <> struct is_arithmetic<__m512> { enum { value = true }; }; template <> struct is_arithmetic<__m512i> { enum { value = true }; }; template <> struct is_arithmetic<__m512d> { enum { value = true }; }; template<> struct is_arithmetic { enum { value = true }; }; template <> struct packet_traits : default_packet_traits { typedef Packet16h type; // There is no half-size packet for Packet16h. typedef Packet16h half; enum { Vectorizable = 1, AlignedOnScalar = 1, size = 16, HasHalfPacket = 1, HasCmp = 1, HasAdd = 1, HasSub = 1, HasMul = 1, HasDiv = 1, HasNegate = 1, HasAbs = 1, HasAbs2 = 0, HasMin = 1, HasMax = 1, HasConj = 1, HasSetLinear = 0, HasLog = 1, HasLog1p = 1, HasExpm1 = 1, HasExp = 1, HasSqrt = 1, HasRsqrt = 1, HasSin = EIGEN_FAST_MATH, HasCos = EIGEN_FAST_MATH, HasTanh = EIGEN_FAST_MATH, HasErf = EIGEN_FAST_MATH, HasBlend = 0, HasRound = 1, HasFloor = 1, HasCeil = 1, HasRint = 1, HasBessel = 1, HasNdtri = 1 }; }; template<> struct packet_traits : default_packet_traits { typedef Packet16f type; typedef Packet8f half; enum { Vectorizable = 1, AlignedOnScalar = 1, size = 16, HasHalfPacket = 1, HasAbs = 1, HasMin = 1, HasMax = 1, HasConj = 1, HasBlend = 0, HasSin = EIGEN_FAST_MATH, HasCos = EIGEN_FAST_MATH, #if EIGEN_GNUC_AT_LEAST(5, 3) || (!EIGEN_COMP_GNUC_STRICT) HasLog = 1, HasLog1p = 1, HasExpm1 = 1, HasNdtri = 1, HasBessel = 1, HasExp = 1, HasSqrt = EIGEN_FAST_MATH, HasRsqrt = EIGEN_FAST_MATH, HasTanh = EIGEN_FAST_MATH, HasErf = EIGEN_FAST_MATH, #endif HasCmp = 1, HasDiv = 1, HasRound = 1, HasFloor = 1, HasCeil = 1, HasRint = 1 }; }; template<> struct packet_traits : default_packet_traits { typedef Packet8d type; typedef Packet4d half; enum { Vectorizable = 1, AlignedOnScalar = 1, size = 8, HasHalfPacket = 1, #if EIGEN_GNUC_AT_LEAST(5, 3) || (!EIGEN_COMP_GNUC_STRICT) HasLog = 1, HasExp = 1, HasSqrt = EIGEN_FAST_MATH, HasRsqrt = EIGEN_FAST_MATH, #endif HasCmp = 1, HasDiv = 1, HasRound = 1, HasFloor = 1, HasCeil = 1, HasRint = 1 }; }; /* TODO Implement AVX512 for integers template<> struct packet_traits : default_packet_traits { typedef Packet16i type; enum { Vectorizable = 1, AlignedOnScalar = 1, size=8 }; }; */ template <> struct unpacket_traits { typedef float type; typedef Packet8f half; typedef Packet16i integer_packet; typedef uint16_t mask_t; enum { size = 16, alignment=Aligned64, vectorizable=true, masked_load_available=true, masked_store_available=true }; }; template <> struct unpacket_traits { typedef double type; typedef Packet4d half; enum { size = 8, alignment=Aligned64, vectorizable=true, masked_load_available=false, masked_store_available=false }; }; template <> struct unpacket_traits { typedef int type; typedef Packet8i half; enum { size = 16, alignment=Aligned64, vectorizable=false, masked_load_available=false, masked_store_available=false }; }; template<> struct unpacket_traits { typedef Eigen::half type; typedef Packet8h half; enum {size=16, alignment=Aligned32, vectorizable=true, masked_load_available=false, masked_store_available=false}; }; template <> EIGEN_STRONG_INLINE Packet16f pset1(const float& from) { return _mm512_set1_ps(from); } template <> EIGEN_STRONG_INLINE Packet8d pset1(const double& from) { return _mm512_set1_pd(from); } template <> EIGEN_STRONG_INLINE Packet16i pset1(const int& from) { return _mm512_set1_epi32(from); } template <> EIGEN_STRONG_INLINE Packet16f pset1frombits(unsigned int from) { return _mm512_castsi512_ps(_mm512_set1_epi32(from)); } template <> EIGEN_STRONG_INLINE Packet8d pset1frombits(const numext::uint64_t from) { return _mm512_castsi512_pd(_mm512_set1_epi64(from)); } template<> EIGEN_STRONG_INLINE Packet16f pzero(const Packet16f& /*a*/) { return _mm512_setzero_ps(); } template<> EIGEN_STRONG_INLINE Packet8d pzero(const Packet8d& /*a*/) { return _mm512_setzero_pd(); } template<> EIGEN_STRONG_INLINE Packet16i pzero(const Packet16i& /*a*/) { return _mm512_setzero_si512(); } template<> EIGEN_STRONG_INLINE Packet16f peven_mask(const Packet16f& /*a*/) { return _mm512_castsi512_ps(_mm512_set_epi32(0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1)); } template<> EIGEN_STRONG_INLINE Packet16i peven_mask(const Packet16i& /*a*/) { return _mm512_set_epi32(0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1); } template<> EIGEN_STRONG_INLINE Packet8d peven_mask(const Packet8d& /*a*/) { return _mm512_castsi512_pd(_mm512_set_epi32(0, 0, -1, -1, 0, 0, -1, -1, 0, 0, -1, -1, 0, 0, -1, -1)); } template <> EIGEN_STRONG_INLINE Packet16f pload1(const float* from) { return _mm512_broadcastss_ps(_mm_load_ps1(from)); } template <> EIGEN_STRONG_INLINE Packet8d pload1(const double* from) { return _mm512_set1_pd(*from); } template <> EIGEN_STRONG_INLINE Packet16f plset(const float& a) { return _mm512_add_ps( _mm512_set1_ps(a), _mm512_set_ps(15.0f, 14.0f, 13.0f, 12.0f, 11.0f, 10.0f, 9.0f, 8.0f, 7.0f, 6.0f, 5.0f, 4.0f, 3.0f, 2.0f, 1.0f, 0.0f)); } template <> EIGEN_STRONG_INLINE Packet8d plset(const double& a) { return _mm512_add_pd(_mm512_set1_pd(a), _mm512_set_pd(7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0, 0.0)); } template <> EIGEN_STRONG_INLINE Packet16f padd(const Packet16f& a, const Packet16f& b) { return _mm512_add_ps(a, b); } template <> EIGEN_STRONG_INLINE Packet8d padd(const Packet8d& a, const Packet8d& b) { return _mm512_add_pd(a, b); } template <> EIGEN_STRONG_INLINE Packet16i padd(const Packet16i& a, const Packet16i& b) { return _mm512_add_epi32(a, b); } template <> EIGEN_STRONG_INLINE Packet16f psub(const Packet16f& a, const Packet16f& b) { return _mm512_sub_ps(a, b); } template <> EIGEN_STRONG_INLINE Packet8d psub(const Packet8d& a, const Packet8d& b) { return _mm512_sub_pd(a, b); } template <> EIGEN_STRONG_INLINE Packet16i psub(const Packet16i& a, const Packet16i& b) { return _mm512_sub_epi32(a, b); } template <> EIGEN_STRONG_INLINE Packet16f pnegate(const Packet16f& a) { return _mm512_sub_ps(_mm512_set1_ps(0.0), a); } template <> EIGEN_STRONG_INLINE Packet8d pnegate(const Packet8d& a) { return _mm512_sub_pd(_mm512_set1_pd(0.0), a); } template <> EIGEN_STRONG_INLINE Packet16f pconj(const Packet16f& a) { return a; } template <> EIGEN_STRONG_INLINE Packet8d pconj(const Packet8d& a) { return a; } template <> EIGEN_STRONG_INLINE Packet16i pconj(const Packet16i& a) { return a; } template <> EIGEN_STRONG_INLINE Packet16f pmul(const Packet16f& a, const Packet16f& b) { return _mm512_mul_ps(a, b); } template <> EIGEN_STRONG_INLINE Packet8d pmul(const Packet8d& a, const Packet8d& b) { return _mm512_mul_pd(a, b); } template <> EIGEN_STRONG_INLINE Packet16i pmul(const Packet16i& a, const Packet16i& b) { return _mm512_mullo_epi32(a, b); } template <> EIGEN_STRONG_INLINE Packet16f pdiv(const Packet16f& a, const Packet16f& b) { return _mm512_div_ps(a, b); } template <> EIGEN_STRONG_INLINE Packet8d pdiv(const Packet8d& a, const Packet8d& b) { return _mm512_div_pd(a, b); } #ifdef EIGEN_VECTORIZE_FMA template <> EIGEN_STRONG_INLINE Packet16f pmadd(const Packet16f& a, const Packet16f& b, const Packet16f& c) { return _mm512_fmadd_ps(a, b, c); } template <> EIGEN_STRONG_INLINE Packet8d pmadd(const Packet8d& a, const Packet8d& b, const Packet8d& c) { return _mm512_fmadd_pd(a, b, c); } #endif template <> EIGEN_DEVICE_FUNC inline Packet16f pselect(const Packet16f& mask, const Packet16f& a, const Packet16f& b) { __mmask16 mask16 = _mm512_cmp_epi32_mask( _mm512_castps_si512(mask), _mm512_setzero_epi32(), _MM_CMPINT_EQ); return _mm512_mask_blend_ps(mask16, a, b); } template <> EIGEN_DEVICE_FUNC inline Packet8d pselect(const Packet8d& mask, const Packet8d& a, const Packet8d& b) { __mmask8 mask8 = _mm512_cmp_epi64_mask(_mm512_castpd_si512(mask), _mm512_setzero_epi32(), _MM_CMPINT_EQ); return _mm512_mask_blend_pd(mask8, a, b); } template <> EIGEN_STRONG_INLINE Packet16f pmin(const Packet16f& a, const Packet16f& b) { // Arguments are reversed to match NaN propagation behavior of std::min. return _mm512_min_ps(b, a); } template <> EIGEN_STRONG_INLINE Packet8d pmin(const Packet8d& a, const Packet8d& b) { // Arguments are reversed to match NaN propagation behavior of std::min. return _mm512_min_pd(b, a); } template <> EIGEN_STRONG_INLINE Packet16f pmax(const Packet16f& a, const Packet16f& b) { // Arguments are reversed to match NaN propagation behavior of std::max. return _mm512_max_ps(b, a); } template <> EIGEN_STRONG_INLINE Packet8d pmax(const Packet8d& a, const Packet8d& b) { // Arguments are reversed to match NaN propagation behavior of std::max. return _mm512_max_pd(b, a); } // Add specializations for min/max with prescribed NaN progation. template<> EIGEN_STRONG_INLINE Packet16f pmin(const Packet16f& a, const Packet16f& b) { return pminmax_propagate_numbers(a, b, pmin); } template<> EIGEN_STRONG_INLINE Packet8d pmin(const Packet8d& a, const Packet8d& b) { return pminmax_propagate_numbers(a, b, pmin); } template<> EIGEN_STRONG_INLINE Packet16f pmax(const Packet16f& a, const Packet16f& b) { return pminmax_propagate_numbers(a, b, pmax); } template<> EIGEN_STRONG_INLINE Packet8d pmax(const Packet8d& a, const Packet8d& b) { return pminmax_propagate_numbers(a, b, pmax); } template<> EIGEN_STRONG_INLINE Packet16f pmin(const Packet16f& a, const Packet16f& b) { return pminmax_propagate_nan(a, b, pmin); } template<> EIGEN_STRONG_INLINE Packet8d pmin(const Packet8d& a, const Packet8d& b) { return pminmax_propagate_nan(a, b, pmin); } template<> EIGEN_STRONG_INLINE Packet16f pmax(const Packet16f& a, const Packet16f& b) { return pminmax_propagate_nan(a, b, pmax); } template<> EIGEN_STRONG_INLINE Packet8d pmax(const Packet8d& a, const Packet8d& b) { return pminmax_propagate_nan(a, b, pmax); } #ifdef EIGEN_VECTORIZE_AVX512DQ template EIGEN_STRONG_INLINE Packet8f extract256(Packet16f x) { return _mm512_extractf32x8_ps(x,I_); } template EIGEN_STRONG_INLINE Packet2d extract128(Packet8d x) { return _mm512_extractf64x2_pd(x,I_); } EIGEN_STRONG_INLINE Packet16f cat256(Packet8f a, Packet8f b) { return _mm512_insertf32x8(_mm512_castps256_ps512(a),b,1); } #else // AVX512F does not define _mm512_extractf32x8_ps to extract _m256 from _m512 template EIGEN_STRONG_INLINE Packet8f extract256(Packet16f x) { return _mm256_castsi256_ps(_mm512_extracti64x4_epi64( _mm512_castps_si512(x),I_)); } // AVX512F does not define _mm512_extractf64x2_pd to extract _m128 from _m512 template EIGEN_STRONG_INLINE Packet2d extract128(Packet8d x) { return _mm_castsi128_pd(_mm512_extracti32x4_epi32( _mm512_castpd_si512(x),I_)); } EIGEN_STRONG_INLINE Packet16f cat256(Packet8f a, Packet8f b) { return _mm512_castsi512_ps(_mm512_inserti64x4(_mm512_castsi256_si512(_mm256_castps_si256(a)), _mm256_castps_si256(b),1)); } #endif // Helper function for bit packing snippet of low precision comparison. // It packs the flags from 32x16 to 16x16. EIGEN_STRONG_INLINE __m256i Pack32To16(Packet16f rf) { // Split data into small pieces and handle with AVX instructions // to guarantee internal order of vector. // Operation: // dst[15:0] := Saturate16(rf[31:0]) // dst[31:16] := Saturate16(rf[63:32]) // ... // dst[255:240] := Saturate16(rf[255:224]) __m256i lo = _mm256_castps_si256(extract256<0>(rf)); __m256i hi = _mm256_castps_si256(extract256<1>(rf)); __m128i result_lo = _mm_packs_epi32(_mm256_extractf128_si256(lo, 0), _mm256_extractf128_si256(lo, 1)); __m128i result_hi = _mm_packs_epi32(_mm256_extractf128_si256(hi, 0), _mm256_extractf128_si256(hi, 1)); return _mm256_insertf128_si256(_mm256_castsi128_si256(result_lo), result_hi, 1); } template <> EIGEN_STRONG_INLINE Packet16f pcmp_eq(const Packet16f& a, const Packet16f& b) { __mmask16 mask = _mm512_cmp_ps_mask(a, b, _CMP_EQ_OQ); return _mm512_castsi512_ps( _mm512_mask_set1_epi32(_mm512_set1_epi32(0), mask, 0xffffffffu)); } template<> EIGEN_STRONG_INLINE Packet16f pcmp_le(const Packet16f& a, const Packet16f& b) { __mmask16 mask = _mm512_cmp_ps_mask(a, b, _CMP_LE_OQ); return _mm512_castsi512_ps( _mm512_mask_set1_epi32(_mm512_set1_epi32(0), mask, 0xffffffffu)); } template<> EIGEN_STRONG_INLINE Packet16f pcmp_lt(const Packet16f& a, const Packet16f& b) { __mmask16 mask = _mm512_cmp_ps_mask(a, b, _CMP_LT_OQ); return _mm512_castsi512_ps( _mm512_mask_set1_epi32(_mm512_set1_epi32(0), mask, 0xffffffffu)); } template<> EIGEN_STRONG_INLINE Packet16f pcmp_lt_or_nan(const Packet16f& a, const Packet16f& b) { __mmask16 mask = _mm512_cmp_ps_mask(a, b, _CMP_NGE_UQ); return _mm512_castsi512_ps( _mm512_mask_set1_epi32(_mm512_set1_epi32(0), mask, 0xffffffffu)); } template<> EIGEN_STRONG_INLINE Packet16i pcmp_eq(const Packet16i& a, const Packet16i& b) { __mmask16 mask = _mm512_cmp_epi32_mask(a, b, _CMP_EQ_OQ); return _mm512_mask_set1_epi32(_mm512_set1_epi32(0), mask, 0xffffffffu); } template <> EIGEN_STRONG_INLINE Packet8d pcmp_eq(const Packet8d& a, const Packet8d& b) { __mmask8 mask = _mm512_cmp_pd_mask(a, b, _CMP_EQ_OQ); return _mm512_castsi512_pd( _mm512_mask_set1_epi64(_mm512_set1_epi64(0), mask, 0xffffffffffffffffu)); } template <> EIGEN_STRONG_INLINE Packet8d pcmp_le(const Packet8d& a, const Packet8d& b) { __mmask8 mask = _mm512_cmp_pd_mask(a, b, _CMP_LE_OQ); return _mm512_castsi512_pd( _mm512_mask_set1_epi64(_mm512_set1_epi64(0), mask, 0xffffffffffffffffu)); } template <> EIGEN_STRONG_INLINE Packet8d pcmp_lt(const Packet8d& a, const Packet8d& b) { __mmask8 mask = _mm512_cmp_pd_mask(a, b, _CMP_LT_OQ); return _mm512_castsi512_pd( _mm512_mask_set1_epi64(_mm512_set1_epi64(0), mask, 0xffffffffffffffffu)); } template <> EIGEN_STRONG_INLINE Packet8d pcmp_lt_or_nan(const Packet8d& a, const Packet8d& b) { __mmask8 mask = _mm512_cmp_pd_mask(a, b, _CMP_NGE_UQ); return _mm512_castsi512_pd( _mm512_mask_set1_epi64(_mm512_set1_epi64(0), mask, 0xffffffffffffffffu)); } template<> EIGEN_STRONG_INLINE Packet16f print(const Packet16f& a) { return _mm512_roundscale_ps(a, _MM_FROUND_CUR_DIRECTION); } template<> EIGEN_STRONG_INLINE Packet8d print(const Packet8d& a) { return _mm512_roundscale_pd(a, _MM_FROUND_CUR_DIRECTION); } template<> EIGEN_STRONG_INLINE Packet16f pceil(const Packet16f& a) { return _mm512_roundscale_ps(a, _MM_FROUND_TO_POS_INF); } template<> EIGEN_STRONG_INLINE Packet8d pceil(const Packet8d& a) { return _mm512_roundscale_pd(a, _MM_FROUND_TO_POS_INF); } template<> EIGEN_STRONG_INLINE Packet16f pfloor(const Packet16f& a) { return _mm512_roundscale_ps(a, _MM_FROUND_TO_NEG_INF); } template<> EIGEN_STRONG_INLINE Packet8d pfloor(const Packet8d& a) { return _mm512_roundscale_pd(a, _MM_FROUND_TO_NEG_INF); } template <> EIGEN_STRONG_INLINE Packet16i ptrue(const Packet16i& /*a*/) { return _mm512_set1_epi32(0xffffffffu); } template <> EIGEN_STRONG_INLINE Packet16f ptrue(const Packet16f& a) { return _mm512_castsi512_ps(ptrue(_mm512_castps_si512(a))); } template <> EIGEN_STRONG_INLINE Packet8d ptrue(const Packet8d& a) { return _mm512_castsi512_pd(ptrue(_mm512_castpd_si512(a))); } template <> EIGEN_STRONG_INLINE Packet16i pand(const Packet16i& a, const Packet16i& b) { return _mm512_and_si512(a,b); } template <> EIGEN_STRONG_INLINE Packet16f pand(const Packet16f& a, const Packet16f& b) { #ifdef EIGEN_VECTORIZE_AVX512DQ return _mm512_and_ps(a, b); #else return _mm512_castsi512_ps(pand(_mm512_castps_si512(a),_mm512_castps_si512(b))); #endif } template <> EIGEN_STRONG_INLINE Packet8d pand(const Packet8d& a, const Packet8d& b) { #ifdef EIGEN_VECTORIZE_AVX512DQ return _mm512_and_pd(a, b); #else Packet8d res = _mm512_undefined_pd(); Packet4d lane0_a = _mm512_extractf64x4_pd(a, 0); Packet4d lane0_b = _mm512_extractf64x4_pd(b, 0); res = _mm512_insertf64x4(res, _mm256_and_pd(lane0_a, lane0_b), 0); Packet4d lane1_a = _mm512_extractf64x4_pd(a, 1); Packet4d lane1_b = _mm512_extractf64x4_pd(b, 1); return _mm512_insertf64x4(res, _mm256_and_pd(lane1_a, lane1_b), 1); #endif } template <> EIGEN_STRONG_INLINE Packet16i por(const Packet16i& a, const Packet16i& b) { return _mm512_or_si512(a, b); } template <> EIGEN_STRONG_INLINE Packet16f por(const Packet16f& a, const Packet16f& b) { #ifdef EIGEN_VECTORIZE_AVX512DQ return _mm512_or_ps(a, b); #else return _mm512_castsi512_ps(por(_mm512_castps_si512(a),_mm512_castps_si512(b))); #endif } template <> EIGEN_STRONG_INLINE Packet8d por(const Packet8d& a, const Packet8d& b) { #ifdef EIGEN_VECTORIZE_AVX512DQ return _mm512_or_pd(a, b); #else return _mm512_castsi512_pd(por(_mm512_castpd_si512(a),_mm512_castpd_si512(b))); #endif } template <> EIGEN_STRONG_INLINE Packet16i pxor(const Packet16i& a, const Packet16i& b) { return _mm512_xor_si512(a, b); } template <> EIGEN_STRONG_INLINE Packet16f pxor(const Packet16f& a, const Packet16f& b) { #ifdef EIGEN_VECTORIZE_AVX512DQ return _mm512_xor_ps(a, b); #else return _mm512_castsi512_ps(pxor(_mm512_castps_si512(a),_mm512_castps_si512(b))); #endif } template <> EIGEN_STRONG_INLINE Packet8d pxor(const Packet8d& a, const Packet8d& b) { #ifdef EIGEN_VECTORIZE_AVX512DQ return _mm512_xor_pd(a, b); #else return _mm512_castsi512_pd(pxor(_mm512_castpd_si512(a),_mm512_castpd_si512(b))); #endif } template <> EIGEN_STRONG_INLINE Packet16i pandnot(const Packet16i& a, const Packet16i& b) { return _mm512_andnot_si512(b, a); } template <> EIGEN_STRONG_INLINE Packet16f pandnot(const Packet16f& a, const Packet16f& b) { #ifdef EIGEN_VECTORIZE_AVX512DQ return _mm512_andnot_ps(b, a); #else return _mm512_castsi512_ps(pandnot(_mm512_castps_si512(a),_mm512_castps_si512(b))); #endif } template <> EIGEN_STRONG_INLINE Packet8d pandnot(const Packet8d& a,const Packet8d& b) { #ifdef EIGEN_VECTORIZE_AVX512DQ return _mm512_andnot_pd(b, a); #else return _mm512_castsi512_pd(pandnot(_mm512_castpd_si512(a),_mm512_castpd_si512(b))); #endif } template<> EIGEN_STRONG_INLINE Packet16f pround(const Packet16f& a) { // Work-around for default std::round rounding mode. const Packet16f mask = pset1frombits(static_cast(0x80000000u)); const Packet16f prev0dot5 = pset1frombits(static_cast(0x3EFFFFFFu)); return _mm512_roundscale_ps(padd(por(pand(a, mask), prev0dot5), a), _MM_FROUND_TO_ZERO); } template<> EIGEN_STRONG_INLINE Packet8d pround(const Packet8d& a) { // Work-around for default std::round rounding mode. const Packet8d mask = pset1frombits(static_cast(0x8000000000000000ull)); const Packet8d prev0dot5 = pset1frombits(static_cast(0x3FDFFFFFFFFFFFFFull)); return _mm512_roundscale_pd(padd(por(pand(a, mask), prev0dot5), a), _MM_FROUND_TO_ZERO); } template EIGEN_STRONG_INLINE Packet16i parithmetic_shift_right(Packet16i a) { return _mm512_srai_epi32(a, N); } template EIGEN_STRONG_INLINE Packet16i plogical_shift_right(Packet16i a) { return _mm512_srli_epi32(a, N); } template EIGEN_STRONG_INLINE Packet16i plogical_shift_left(Packet16i a) { return _mm512_slli_epi32(a, N); } template <> EIGEN_STRONG_INLINE Packet16f pload(const float* from) { EIGEN_DEBUG_ALIGNED_LOAD return _mm512_load_ps(from); } template <> EIGEN_STRONG_INLINE Packet8d pload(const double* from) { EIGEN_DEBUG_ALIGNED_LOAD return _mm512_load_pd(from); } template <> EIGEN_STRONG_INLINE Packet16i pload(const int* from) { EIGEN_DEBUG_ALIGNED_LOAD return _mm512_load_si512( reinterpret_cast(from)); } template <> EIGEN_STRONG_INLINE Packet16f ploadu(const float* from) { EIGEN_DEBUG_UNALIGNED_LOAD return _mm512_loadu_ps(from); } template <> EIGEN_STRONG_INLINE Packet8d ploadu(const double* from) { EIGEN_DEBUG_UNALIGNED_LOAD return _mm512_loadu_pd(from); } template <> EIGEN_STRONG_INLINE Packet16i ploadu(const int* from) { EIGEN_DEBUG_UNALIGNED_LOAD return _mm512_loadu_si512( reinterpret_cast(from)); } template <> EIGEN_STRONG_INLINE Packet16f ploadu(const float* from, uint16_t umask) { __mmask16 mask = static_cast<__mmask16>(umask); EIGEN_DEBUG_UNALIGNED_LOAD return _mm512_maskz_loadu_ps(mask, from); } // Loads 8 floats from memory a returns the packet // {a0, a0 a1, a1, a2, a2, a3, a3, a4, a4, a5, a5, a6, a6, a7, a7} template <> EIGEN_STRONG_INLINE Packet16f ploaddup(const float* from) { // an unaligned load is required here as there is no requirement // on the alignment of input pointer 'from' __m256i low_half = _mm256_loadu_si256(reinterpret_cast(from)); __m512 even_elements = _mm512_castsi512_ps(_mm512_cvtepu32_epi64(low_half)); __m512 pairs = _mm512_permute_ps(even_elements, _MM_SHUFFLE(2, 2, 0, 0)); return pairs; } #ifdef EIGEN_VECTORIZE_AVX512DQ // FIXME: this does not look optimal, better load a Packet4d and shuffle... // Loads 4 doubles from memory a returns the packet {a0, a0 a1, a1, a2, a2, a3, // a3} template <> EIGEN_STRONG_INLINE Packet8d ploaddup(const double* from) { __m512d x = _mm512_setzero_pd(); x = _mm512_insertf64x2(x, _mm_loaddup_pd(&from[0]), 0); x = _mm512_insertf64x2(x, _mm_loaddup_pd(&from[1]), 1); x = _mm512_insertf64x2(x, _mm_loaddup_pd(&from[2]), 2); x = _mm512_insertf64x2(x, _mm_loaddup_pd(&from[3]), 3); return x; } #else template <> EIGEN_STRONG_INLINE Packet8d ploaddup(const double* from) { __m512d x = _mm512_setzero_pd(); x = _mm512_mask_broadcastsd_pd(x, 0x3<<0, _mm_load_sd(from+0)); x = _mm512_mask_broadcastsd_pd(x, 0x3<<2, _mm_load_sd(from+1)); x = _mm512_mask_broadcastsd_pd(x, 0x3<<4, _mm_load_sd(from+2)); x = _mm512_mask_broadcastsd_pd(x, 0x3<<6, _mm_load_sd(from+3)); return x; } #endif // Loads 4 floats from memory a returns the packet // {a0, a0 a0, a0, a1, a1, a1, a1, a2, a2, a2, a2, a3, a3, a3, a3} template <> EIGEN_STRONG_INLINE Packet16f ploadquad(const float* from) { Packet16f tmp = _mm512_castps128_ps512(ploadu(from)); const Packet16i scatter_mask = _mm512_set_epi32(3,3,3,3, 2,2,2,2, 1,1,1,1, 0,0,0,0); return _mm512_permutexvar_ps(scatter_mask, tmp); } // Loads 2 doubles from memory a returns the packet // {a0, a0 a0, a0, a1, a1, a1, a1} template <> EIGEN_STRONG_INLINE Packet8d ploadquad(const double* from) { __m256d lane0 = _mm256_set1_pd(*from); __m256d lane1 = _mm256_set1_pd(*(from+1)); __m512d tmp = _mm512_undefined_pd(); tmp = _mm512_insertf64x4(tmp, lane0, 0); return _mm512_insertf64x4(tmp, lane1, 1); } template <> EIGEN_STRONG_INLINE void pstore(float* to, const Packet16f& from) { EIGEN_DEBUG_ALIGNED_STORE _mm512_store_ps(to, from); } template <> EIGEN_STRONG_INLINE void pstore(double* to, const Packet8d& from) { EIGEN_DEBUG_ALIGNED_STORE _mm512_store_pd(to, from); } template <> EIGEN_STRONG_INLINE void pstore(int* to, const Packet16i& from) { EIGEN_DEBUG_ALIGNED_STORE _mm512_storeu_si512(reinterpret_cast<__m512i*>(to), from); } template <> EIGEN_STRONG_INLINE void pstoreu(float* to, const Packet16f& from) { EIGEN_DEBUG_UNALIGNED_STORE _mm512_storeu_ps(to, from); } template <> EIGEN_STRONG_INLINE void pstoreu(double* to, const Packet8d& from) { EIGEN_DEBUG_UNALIGNED_STORE _mm512_storeu_pd(to, from); } template <> EIGEN_STRONG_INLINE void pstoreu(int* to, const Packet16i& from) { EIGEN_DEBUG_UNALIGNED_STORE _mm512_storeu_si512( reinterpret_cast<__m512i*>(to), from); } template <> EIGEN_STRONG_INLINE void pstoreu(float* to, const Packet16f& from, uint16_t umask) { __mmask16 mask = static_cast<__mmask16>(umask); EIGEN_DEBUG_UNALIGNED_STORE return _mm512_mask_storeu_ps(to, mask, from); } template <> EIGEN_DEVICE_FUNC inline Packet16f pgather(const float* from, Index stride) { Packet16i stride_vector = _mm512_set1_epi32(convert_index(stride)); Packet16i stride_multiplier = _mm512_set_epi32(15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); Packet16i indices = _mm512_mullo_epi32(stride_vector, stride_multiplier); return _mm512_i32gather_ps(indices, from, 4); } template <> EIGEN_DEVICE_FUNC inline Packet8d pgather(const double* from, Index stride) { Packet8i stride_vector = _mm256_set1_epi32(convert_index(stride)); Packet8i stride_multiplier = _mm256_set_epi32(7, 6, 5, 4, 3, 2, 1, 0); Packet8i indices = _mm256_mullo_epi32(stride_vector, stride_multiplier); return _mm512_i32gather_pd(indices, from, 8); } template <> EIGEN_DEVICE_FUNC inline void pscatter(float* to, const Packet16f& from, Index stride) { Packet16i stride_vector = _mm512_set1_epi32(convert_index(stride)); Packet16i stride_multiplier = _mm512_set_epi32(15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); Packet16i indices = _mm512_mullo_epi32(stride_vector, stride_multiplier); _mm512_i32scatter_ps(to, indices, from, 4); } template <> EIGEN_DEVICE_FUNC inline void pscatter(double* to, const Packet8d& from, Index stride) { Packet8i stride_vector = _mm256_set1_epi32(convert_index(stride)); Packet8i stride_multiplier = _mm256_set_epi32(7, 6, 5, 4, 3, 2, 1, 0); Packet8i indices = _mm256_mullo_epi32(stride_vector, stride_multiplier); _mm512_i32scatter_pd(to, indices, from, 8); } template <> EIGEN_STRONG_INLINE void pstore1(float* to, const float& a) { Packet16f pa = pset1(a); pstore(to, pa); } template <> EIGEN_STRONG_INLINE void pstore1(double* to, const double& a) { Packet8d pa = pset1(a); pstore(to, pa); } template <> EIGEN_STRONG_INLINE void pstore1(int* to, const int& a) { Packet16i pa = pset1(a); pstore(to, pa); } template<> EIGEN_STRONG_INLINE void prefetch(const float* addr) { _mm_prefetch((SsePrefetchPtrType)(addr), _MM_HINT_T0); } template<> EIGEN_STRONG_INLINE void prefetch(const double* addr) { _mm_prefetch((SsePrefetchPtrType)(addr), _MM_HINT_T0); } template<> EIGEN_STRONG_INLINE void prefetch(const int* addr) { _mm_prefetch((SsePrefetchPtrType)(addr), _MM_HINT_T0); } template <> EIGEN_STRONG_INLINE float pfirst(const Packet16f& a) { return _mm_cvtss_f32(_mm512_extractf32x4_ps(a, 0)); } template <> EIGEN_STRONG_INLINE double pfirst(const Packet8d& a) { return _mm_cvtsd_f64(_mm256_extractf128_pd(_mm512_extractf64x4_pd(a, 0), 0)); } template <> EIGEN_STRONG_INLINE int pfirst(const Packet16i& a) { return _mm_extract_epi32(_mm512_extracti32x4_epi32(a, 0), 0); } template<> EIGEN_STRONG_INLINE Packet16f preverse(const Packet16f& a) { return _mm512_permutexvar_ps(_mm512_set_epi32(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15), a); } template<> EIGEN_STRONG_INLINE Packet8d preverse(const Packet8d& a) { return _mm512_permutexvar_pd(_mm512_set_epi32(0, 0, 0, 1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6, 0, 7), a); } template<> EIGEN_STRONG_INLINE Packet16f pabs(const Packet16f& a) { // _mm512_abs_ps intrinsic not found, so hack around it return _mm512_castsi512_ps(_mm512_and_si512(_mm512_castps_si512(a), _mm512_set1_epi32(0x7fffffff))); } template <> EIGEN_STRONG_INLINE Packet8d pabs(const Packet8d& a) { // _mm512_abs_ps intrinsic not found, so hack around it return _mm512_castsi512_pd(_mm512_and_si512(_mm512_castpd_si512(a), _mm512_set1_epi64(0x7fffffffffffffff))); } template<> EIGEN_STRONG_INLINE Packet16f pfrexp(const Packet16f& a, Packet16f& exponent){ return pfrexp_generic(a, exponent); } // Extract exponent without existence of Packet8l. template<> EIGEN_STRONG_INLINE Packet8d pfrexp_generic_get_biased_exponent(const Packet8d& a) { const Packet8d cst_exp_mask = pset1frombits(static_cast(0x7ff0000000000000ull)); #ifdef EIGEN_VECTORIZE_AVX512DQ return _mm512_cvtepi64_pd(_mm512_srli_epi64(_mm512_castpd_si512(pand(a, cst_exp_mask)), 52)); #else return _mm512_cvtepi32_pd(_mm512_cvtepi64_epi32(_mm512_srli_epi64(_mm512_castpd_si512(pand(a, cst_exp_mask)), 52))); #endif } template<> EIGEN_STRONG_INLINE Packet8d pfrexp(const Packet8d& a, Packet8d& exponent) { return pfrexp_generic(a, exponent); } template<> EIGEN_STRONG_INLINE Packet16f pldexp(const Packet16f& a, const Packet16f& exponent) { return pldexp_generic(a, exponent); } template<> EIGEN_STRONG_INLINE Packet8d pldexp(const Packet8d& a, const Packet8d& exponent) { // Clamp exponent to [-2099, 2099] const Packet8d max_exponent = pset1(2099.0); const Packet8i e = _mm512_cvtpd_epi32(pmin(pmax(exponent, pnegate(max_exponent)), max_exponent)); // Split 2^e into four factors and multiply. const Packet8i bias = pset1(1023); Packet8i b = parithmetic_shift_right<2>(e); // floor(e/4) // 2^b const Packet8i permute_idx = _mm256_setr_epi32(0, 4, 1, 5, 2, 6, 3, 7); Packet8i hi = _mm256_permutevar8x32_epi32(padd(b, bias), permute_idx); Packet8i lo = _mm256_slli_epi64(hi, 52); hi = _mm256_slli_epi64(_mm256_srli_epi64(hi, 32), 52); Packet8d c = _mm512_castsi512_pd(_mm512_inserti64x4(_mm512_castsi256_si512(lo), hi, 1)); Packet8d out = pmul(pmul(pmul(a, c), c), c); // a * 2^(3b) // 2^(e - 3b) b = psub(psub(psub(e, b), b), b); // e - 3b hi = _mm256_permutevar8x32_epi32(padd(b, bias), permute_idx); lo = _mm256_slli_epi64(hi, 52); hi = _mm256_slli_epi64(_mm256_srli_epi64(hi, 32), 52); c = _mm512_castsi512_pd(_mm512_inserti64x4(_mm512_castsi256_si512(lo), hi, 1)); out = pmul(out, c); // a * 2^e return out; } #ifdef EIGEN_VECTORIZE_AVX512DQ // AVX512F does not define _mm512_extractf32x8_ps to extract _m256 from _m512 #define EIGEN_EXTRACT_8f_FROM_16f(INPUT, OUTPUT) \ __m256 OUTPUT##_0 = _mm512_extractf32x8_ps(INPUT, 0); \ __m256 OUTPUT##_1 = _mm512_extractf32x8_ps(INPUT, 1) #else #define EIGEN_EXTRACT_8f_FROM_16f(INPUT, OUTPUT) \ __m256 OUTPUT##_0 = _mm256_insertf128_ps( \ _mm256_castps128_ps256(_mm512_extractf32x4_ps(INPUT, 0)), \ _mm512_extractf32x4_ps(INPUT, 1), 1); \ __m256 OUTPUT##_1 = _mm256_insertf128_ps( \ _mm256_castps128_ps256(_mm512_extractf32x4_ps(INPUT, 2)), \ _mm512_extractf32x4_ps(INPUT, 3), 1); #endif #ifdef EIGEN_VECTORIZE_AVX512DQ #define EIGEN_INSERT_8f_INTO_16f(OUTPUT, INPUTA, INPUTB) \ OUTPUT = _mm512_insertf32x8(_mm512_castps256_ps512(INPUTA), INPUTB, 1); #else #define EIGEN_INSERT_8f_INTO_16f(OUTPUT, INPUTA, INPUTB) \ OUTPUT = _mm512_undefined_ps(); \ OUTPUT = _mm512_insertf32x4(OUTPUT, _mm256_extractf128_ps(INPUTA, 0), 0); \ OUTPUT = _mm512_insertf32x4(OUTPUT, _mm256_extractf128_ps(INPUTA, 1), 1); \ OUTPUT = _mm512_insertf32x4(OUTPUT, _mm256_extractf128_ps(INPUTB, 0), 2); \ OUTPUT = _mm512_insertf32x4(OUTPUT, _mm256_extractf128_ps(INPUTB, 1), 3); #endif template <> EIGEN_STRONG_INLINE float predux(const Packet16f& a) { #ifdef EIGEN_VECTORIZE_AVX512DQ __m256 lane0 = _mm512_extractf32x8_ps(a, 0); __m256 lane1 = _mm512_extractf32x8_ps(a, 1); Packet8f x = _mm256_add_ps(lane0, lane1); return predux(x); #else __m128 lane0 = _mm512_extractf32x4_ps(a, 0); __m128 lane1 = _mm512_extractf32x4_ps(a, 1); __m128 lane2 = _mm512_extractf32x4_ps(a, 2); __m128 lane3 = _mm512_extractf32x4_ps(a, 3); __m128 sum = _mm_add_ps(_mm_add_ps(lane0, lane1), _mm_add_ps(lane2, lane3)); sum = _mm_hadd_ps(sum, sum); sum = _mm_hadd_ps(sum, _mm_permute_ps(sum, 1)); return _mm_cvtss_f32(sum); #endif } template <> EIGEN_STRONG_INLINE double predux(const Packet8d& a) { __m256d lane0 = _mm512_extractf64x4_pd(a, 0); __m256d lane1 = _mm512_extractf64x4_pd(a, 1); __m256d sum = _mm256_add_pd(lane0, lane1); __m256d tmp0 = _mm256_hadd_pd(sum, _mm256_permute2f128_pd(sum, sum, 1)); return _mm_cvtsd_f64(_mm256_castpd256_pd128(_mm256_hadd_pd(tmp0, tmp0))); } template <> EIGEN_STRONG_INLINE Packet8f predux_half_dowto4(const Packet16f& a) { #ifdef EIGEN_VECTORIZE_AVX512DQ __m256 lane0 = _mm512_extractf32x8_ps(a, 0); __m256 lane1 = _mm512_extractf32x8_ps(a, 1); return _mm256_add_ps(lane0, lane1); #else __m128 lane0 = _mm512_extractf32x4_ps(a, 0); __m128 lane1 = _mm512_extractf32x4_ps(a, 1); __m128 lane2 = _mm512_extractf32x4_ps(a, 2); __m128 lane3 = _mm512_extractf32x4_ps(a, 3); __m128 sum0 = _mm_add_ps(lane0, lane2); __m128 sum1 = _mm_add_ps(lane1, lane3); return _mm256_insertf128_ps(_mm256_castps128_ps256(sum0), sum1, 1); #endif } template <> EIGEN_STRONG_INLINE Packet4d predux_half_dowto4(const Packet8d& a) { __m256d lane0 = _mm512_extractf64x4_pd(a, 0); __m256d lane1 = _mm512_extractf64x4_pd(a, 1); return _mm256_add_pd(lane0, lane1); } template <> EIGEN_STRONG_INLINE float predux_mul(const Packet16f& a) { //#ifdef EIGEN_VECTORIZE_AVX512DQ #if 0 Packet8f lane0 = _mm512_extractf32x8_ps(a, 0); Packet8f lane1 = _mm512_extractf32x8_ps(a, 1); Packet8f res = pmul(lane0, lane1); res = pmul(res, _mm256_permute2f128_ps(res, res, 1)); res = pmul(res, _mm_permute_ps(res, _MM_SHUFFLE(0, 0, 3, 2))); return pfirst(pmul(res, _mm_permute_ps(res, _MM_SHUFFLE(0, 0, 0, 1)))); #else __m128 lane0 = _mm512_extractf32x4_ps(a, 0); __m128 lane1 = _mm512_extractf32x4_ps(a, 1); __m128 lane2 = _mm512_extractf32x4_ps(a, 2); __m128 lane3 = _mm512_extractf32x4_ps(a, 3); __m128 res = pmul(pmul(lane0, lane1), pmul(lane2, lane3)); res = pmul(res, _mm_permute_ps(res, _MM_SHUFFLE(0, 0, 3, 2))); return pfirst(pmul(res, _mm_permute_ps(res, _MM_SHUFFLE(0, 0, 0, 1)))); #endif } template <> EIGEN_STRONG_INLINE double predux_mul(const Packet8d& a) { __m256d lane0 = _mm512_extractf64x4_pd(a, 0); __m256d lane1 = _mm512_extractf64x4_pd(a, 1); __m256d res = pmul(lane0, lane1); res = pmul(res, _mm256_permute2f128_pd(res, res, 1)); return pfirst(pmul(res, _mm256_shuffle_pd(res, res, 1))); } template <> EIGEN_STRONG_INLINE float predux_min(const Packet16f& a) { __m128 lane0 = _mm512_extractf32x4_ps(a, 0); __m128 lane1 = _mm512_extractf32x4_ps(a, 1); __m128 lane2 = _mm512_extractf32x4_ps(a, 2); __m128 lane3 = _mm512_extractf32x4_ps(a, 3); __m128 res = _mm_min_ps(_mm_min_ps(lane0, lane1), _mm_min_ps(lane2, lane3)); res = _mm_min_ps(res, _mm_permute_ps(res, _MM_SHUFFLE(0, 0, 3, 2))); return pfirst(_mm_min_ps(res, _mm_permute_ps(res, _MM_SHUFFLE(0, 0, 0, 1)))); } template <> EIGEN_STRONG_INLINE double predux_min(const Packet8d& a) { __m256d lane0 = _mm512_extractf64x4_pd(a, 0); __m256d lane1 = _mm512_extractf64x4_pd(a, 1); __m256d res = _mm256_min_pd(lane0, lane1); res = _mm256_min_pd(res, _mm256_permute2f128_pd(res, res, 1)); return pfirst(_mm256_min_pd(res, _mm256_shuffle_pd(res, res, 1))); } template <> EIGEN_STRONG_INLINE float predux_max(const Packet16f& a) { __m128 lane0 = _mm512_extractf32x4_ps(a, 0); __m128 lane1 = _mm512_extractf32x4_ps(a, 1); __m128 lane2 = _mm512_extractf32x4_ps(a, 2); __m128 lane3 = _mm512_extractf32x4_ps(a, 3); __m128 res = _mm_max_ps(_mm_max_ps(lane0, lane1), _mm_max_ps(lane2, lane3)); res = _mm_max_ps(res, _mm_permute_ps(res, _MM_SHUFFLE(0, 0, 3, 2))); return pfirst(_mm_max_ps(res, _mm_permute_ps(res, _MM_SHUFFLE(0, 0, 0, 1)))); } template <> EIGEN_STRONG_INLINE double predux_max(const Packet8d& a) { __m256d lane0 = _mm512_extractf64x4_pd(a, 0); __m256d lane1 = _mm512_extractf64x4_pd(a, 1); __m256d res = _mm256_max_pd(lane0, lane1); res = _mm256_max_pd(res, _mm256_permute2f128_pd(res, res, 1)); return pfirst(_mm256_max_pd(res, _mm256_shuffle_pd(res, res, 1))); } template<> EIGEN_STRONG_INLINE bool predux_any(const Packet16f& x) { Packet16i xi = _mm512_castps_si512(x); __mmask16 tmp = _mm512_test_epi32_mask(xi,xi); return !_mm512_kortestz(tmp,tmp); } #define PACK_OUTPUT(OUTPUT, INPUT, INDEX, STRIDE) \ EIGEN_INSERT_8f_INTO_16f(OUTPUT[INDEX], INPUT[INDEX], INPUT[INDEX + STRIDE]); EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { __m512 T0 = _mm512_unpacklo_ps(kernel.packet[0], kernel.packet[1]); __m512 T1 = _mm512_unpackhi_ps(kernel.packet[0], kernel.packet[1]); __m512 T2 = _mm512_unpacklo_ps(kernel.packet[2], kernel.packet[3]); __m512 T3 = _mm512_unpackhi_ps(kernel.packet[2], kernel.packet[3]); __m512 T4 = _mm512_unpacklo_ps(kernel.packet[4], kernel.packet[5]); __m512 T5 = _mm512_unpackhi_ps(kernel.packet[4], kernel.packet[5]); __m512 T6 = _mm512_unpacklo_ps(kernel.packet[6], kernel.packet[7]); __m512 T7 = _mm512_unpackhi_ps(kernel.packet[6], kernel.packet[7]); __m512 T8 = _mm512_unpacklo_ps(kernel.packet[8], kernel.packet[9]); __m512 T9 = _mm512_unpackhi_ps(kernel.packet[8], kernel.packet[9]); __m512 T10 = _mm512_unpacklo_ps(kernel.packet[10], kernel.packet[11]); __m512 T11 = _mm512_unpackhi_ps(kernel.packet[10], kernel.packet[11]); __m512 T12 = _mm512_unpacklo_ps(kernel.packet[12], kernel.packet[13]); __m512 T13 = _mm512_unpackhi_ps(kernel.packet[12], kernel.packet[13]); __m512 T14 = _mm512_unpacklo_ps(kernel.packet[14], kernel.packet[15]); __m512 T15 = _mm512_unpackhi_ps(kernel.packet[14], kernel.packet[15]); __m512 S0 = _mm512_shuffle_ps(T0, T2, _MM_SHUFFLE(1, 0, 1, 0)); __m512 S1 = _mm512_shuffle_ps(T0, T2, _MM_SHUFFLE(3, 2, 3, 2)); __m512 S2 = _mm512_shuffle_ps(T1, T3, _MM_SHUFFLE(1, 0, 1, 0)); __m512 S3 = _mm512_shuffle_ps(T1, T3, _MM_SHUFFLE(3, 2, 3, 2)); __m512 S4 = _mm512_shuffle_ps(T4, T6, _MM_SHUFFLE(1, 0, 1, 0)); __m512 S5 = _mm512_shuffle_ps(T4, T6, _MM_SHUFFLE(3, 2, 3, 2)); __m512 S6 = _mm512_shuffle_ps(T5, T7, _MM_SHUFFLE(1, 0, 1, 0)); __m512 S7 = _mm512_shuffle_ps(T5, T7, _MM_SHUFFLE(3, 2, 3, 2)); __m512 S8 = _mm512_shuffle_ps(T8, T10, _MM_SHUFFLE(1, 0, 1, 0)); __m512 S9 = _mm512_shuffle_ps(T8, T10, _MM_SHUFFLE(3, 2, 3, 2)); __m512 S10 = _mm512_shuffle_ps(T9, T11, _MM_SHUFFLE(1, 0, 1, 0)); __m512 S11 = _mm512_shuffle_ps(T9, T11, _MM_SHUFFLE(3, 2, 3, 2)); __m512 S12 = _mm512_shuffle_ps(T12, T14, _MM_SHUFFLE(1, 0, 1, 0)); __m512 S13 = _mm512_shuffle_ps(T12, T14, _MM_SHUFFLE(3, 2, 3, 2)); __m512 S14 = _mm512_shuffle_ps(T13, T15, _MM_SHUFFLE(1, 0, 1, 0)); __m512 S15 = _mm512_shuffle_ps(T13, T15, _MM_SHUFFLE(3, 2, 3, 2)); EIGEN_EXTRACT_8f_FROM_16f(S0, S0); EIGEN_EXTRACT_8f_FROM_16f(S1, S1); EIGEN_EXTRACT_8f_FROM_16f(S2, S2); EIGEN_EXTRACT_8f_FROM_16f(S3, S3); EIGEN_EXTRACT_8f_FROM_16f(S4, S4); EIGEN_EXTRACT_8f_FROM_16f(S5, S5); EIGEN_EXTRACT_8f_FROM_16f(S6, S6); EIGEN_EXTRACT_8f_FROM_16f(S7, S7); EIGEN_EXTRACT_8f_FROM_16f(S8, S8); EIGEN_EXTRACT_8f_FROM_16f(S9, S9); EIGEN_EXTRACT_8f_FROM_16f(S10, S10); EIGEN_EXTRACT_8f_FROM_16f(S11, S11); EIGEN_EXTRACT_8f_FROM_16f(S12, S12); EIGEN_EXTRACT_8f_FROM_16f(S13, S13); EIGEN_EXTRACT_8f_FROM_16f(S14, S14); EIGEN_EXTRACT_8f_FROM_16f(S15, S15); PacketBlock tmp; tmp.packet[0] = _mm256_permute2f128_ps(S0_0, S4_0, 0x20); tmp.packet[1] = _mm256_permute2f128_ps(S1_0, S5_0, 0x20); tmp.packet[2] = _mm256_permute2f128_ps(S2_0, S6_0, 0x20); tmp.packet[3] = _mm256_permute2f128_ps(S3_0, S7_0, 0x20); tmp.packet[4] = _mm256_permute2f128_ps(S0_0, S4_0, 0x31); tmp.packet[5] = _mm256_permute2f128_ps(S1_0, S5_0, 0x31); tmp.packet[6] = _mm256_permute2f128_ps(S2_0, S6_0, 0x31); tmp.packet[7] = _mm256_permute2f128_ps(S3_0, S7_0, 0x31); tmp.packet[8] = _mm256_permute2f128_ps(S0_1, S4_1, 0x20); tmp.packet[9] = _mm256_permute2f128_ps(S1_1, S5_1, 0x20); tmp.packet[10] = _mm256_permute2f128_ps(S2_1, S6_1, 0x20); tmp.packet[11] = _mm256_permute2f128_ps(S3_1, S7_1, 0x20); tmp.packet[12] = _mm256_permute2f128_ps(S0_1, S4_1, 0x31); tmp.packet[13] = _mm256_permute2f128_ps(S1_1, S5_1, 0x31); tmp.packet[14] = _mm256_permute2f128_ps(S2_1, S6_1, 0x31); tmp.packet[15] = _mm256_permute2f128_ps(S3_1, S7_1, 0x31); // Second set of _m256 outputs tmp.packet[16] = _mm256_permute2f128_ps(S8_0, S12_0, 0x20); tmp.packet[17] = _mm256_permute2f128_ps(S9_0, S13_0, 0x20); tmp.packet[18] = _mm256_permute2f128_ps(S10_0, S14_0, 0x20); tmp.packet[19] = _mm256_permute2f128_ps(S11_0, S15_0, 0x20); tmp.packet[20] = _mm256_permute2f128_ps(S8_0, S12_0, 0x31); tmp.packet[21] = _mm256_permute2f128_ps(S9_0, S13_0, 0x31); tmp.packet[22] = _mm256_permute2f128_ps(S10_0, S14_0, 0x31); tmp.packet[23] = _mm256_permute2f128_ps(S11_0, S15_0, 0x31); tmp.packet[24] = _mm256_permute2f128_ps(S8_1, S12_1, 0x20); tmp.packet[25] = _mm256_permute2f128_ps(S9_1, S13_1, 0x20); tmp.packet[26] = _mm256_permute2f128_ps(S10_1, S14_1, 0x20); tmp.packet[27] = _mm256_permute2f128_ps(S11_1, S15_1, 0x20); tmp.packet[28] = _mm256_permute2f128_ps(S8_1, S12_1, 0x31); tmp.packet[29] = _mm256_permute2f128_ps(S9_1, S13_1, 0x31); tmp.packet[30] = _mm256_permute2f128_ps(S10_1, S14_1, 0x31); tmp.packet[31] = _mm256_permute2f128_ps(S11_1, S15_1, 0x31); // Pack them into the output PACK_OUTPUT(kernel.packet, tmp.packet, 0, 16); PACK_OUTPUT(kernel.packet, tmp.packet, 1, 16); PACK_OUTPUT(kernel.packet, tmp.packet, 2, 16); PACK_OUTPUT(kernel.packet, tmp.packet, 3, 16); PACK_OUTPUT(kernel.packet, tmp.packet, 4, 16); PACK_OUTPUT(kernel.packet, tmp.packet, 5, 16); PACK_OUTPUT(kernel.packet, tmp.packet, 6, 16); PACK_OUTPUT(kernel.packet, tmp.packet, 7, 16); PACK_OUTPUT(kernel.packet, tmp.packet, 8, 16); PACK_OUTPUT(kernel.packet, tmp.packet, 9, 16); PACK_OUTPUT(kernel.packet, tmp.packet, 10, 16); PACK_OUTPUT(kernel.packet, tmp.packet, 11, 16); PACK_OUTPUT(kernel.packet, tmp.packet, 12, 16); PACK_OUTPUT(kernel.packet, tmp.packet, 13, 16); PACK_OUTPUT(kernel.packet, tmp.packet, 14, 16); PACK_OUTPUT(kernel.packet, tmp.packet, 15, 16); } #define PACK_OUTPUT_2(OUTPUT, INPUT, INDEX, STRIDE) \ EIGEN_INSERT_8f_INTO_16f(OUTPUT[INDEX], INPUT[2 * INDEX], \ INPUT[2 * INDEX + STRIDE]); EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { __m512 T0 = _mm512_unpacklo_ps(kernel.packet[0], kernel.packet[1]); __m512 T1 = _mm512_unpackhi_ps(kernel.packet[0], kernel.packet[1]); __m512 T2 = _mm512_unpacklo_ps(kernel.packet[2], kernel.packet[3]); __m512 T3 = _mm512_unpackhi_ps(kernel.packet[2], kernel.packet[3]); __m512 S0 = _mm512_shuffle_ps(T0, T2, _MM_SHUFFLE(1, 0, 1, 0)); __m512 S1 = _mm512_shuffle_ps(T0, T2, _MM_SHUFFLE(3, 2, 3, 2)); __m512 S2 = _mm512_shuffle_ps(T1, T3, _MM_SHUFFLE(1, 0, 1, 0)); __m512 S3 = _mm512_shuffle_ps(T1, T3, _MM_SHUFFLE(3, 2, 3, 2)); EIGEN_EXTRACT_8f_FROM_16f(S0, S0); EIGEN_EXTRACT_8f_FROM_16f(S1, S1); EIGEN_EXTRACT_8f_FROM_16f(S2, S2); EIGEN_EXTRACT_8f_FROM_16f(S3, S3); PacketBlock tmp; tmp.packet[0] = _mm256_permute2f128_ps(S0_0, S1_0, 0x20); tmp.packet[1] = _mm256_permute2f128_ps(S2_0, S3_0, 0x20); tmp.packet[2] = _mm256_permute2f128_ps(S0_0, S1_0, 0x31); tmp.packet[3] = _mm256_permute2f128_ps(S2_0, S3_0, 0x31); tmp.packet[4] = _mm256_permute2f128_ps(S0_1, S1_1, 0x20); tmp.packet[5] = _mm256_permute2f128_ps(S2_1, S3_1, 0x20); tmp.packet[6] = _mm256_permute2f128_ps(S0_1, S1_1, 0x31); tmp.packet[7] = _mm256_permute2f128_ps(S2_1, S3_1, 0x31); PACK_OUTPUT_2(kernel.packet, tmp.packet, 0, 1); PACK_OUTPUT_2(kernel.packet, tmp.packet, 1, 1); PACK_OUTPUT_2(kernel.packet, tmp.packet, 2, 1); PACK_OUTPUT_2(kernel.packet, tmp.packet, 3, 1); } #define PACK_OUTPUT_SQ_D(OUTPUT, INPUT, INDEX, STRIDE) \ OUTPUT[INDEX] = _mm512_insertf64x4(OUTPUT[INDEX], INPUT[INDEX], 0); \ OUTPUT[INDEX] = _mm512_insertf64x4(OUTPUT[INDEX], INPUT[INDEX + STRIDE], 1); #define PACK_OUTPUT_D(OUTPUT, INPUT, INDEX, STRIDE) \ OUTPUT[INDEX] = _mm512_insertf64x4(OUTPUT[INDEX], INPUT[(2 * INDEX)], 0); \ OUTPUT[INDEX] = \ _mm512_insertf64x4(OUTPUT[INDEX], INPUT[(2 * INDEX) + STRIDE], 1); EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { __m512d T0 = _mm512_shuffle_pd(kernel.packet[0], kernel.packet[1], 0); __m512d T1 = _mm512_shuffle_pd(kernel.packet[0], kernel.packet[1], 0xff); __m512d T2 = _mm512_shuffle_pd(kernel.packet[2], kernel.packet[3], 0); __m512d T3 = _mm512_shuffle_pd(kernel.packet[2], kernel.packet[3], 0xff); PacketBlock tmp; tmp.packet[0] = _mm256_permute2f128_pd(_mm512_extractf64x4_pd(T0, 0), _mm512_extractf64x4_pd(T2, 0), 0x20); tmp.packet[1] = _mm256_permute2f128_pd(_mm512_extractf64x4_pd(T1, 0), _mm512_extractf64x4_pd(T3, 0), 0x20); tmp.packet[2] = _mm256_permute2f128_pd(_mm512_extractf64x4_pd(T0, 0), _mm512_extractf64x4_pd(T2, 0), 0x31); tmp.packet[3] = _mm256_permute2f128_pd(_mm512_extractf64x4_pd(T1, 0), _mm512_extractf64x4_pd(T3, 0), 0x31); tmp.packet[4] = _mm256_permute2f128_pd(_mm512_extractf64x4_pd(T0, 1), _mm512_extractf64x4_pd(T2, 1), 0x20); tmp.packet[5] = _mm256_permute2f128_pd(_mm512_extractf64x4_pd(T1, 1), _mm512_extractf64x4_pd(T3, 1), 0x20); tmp.packet[6] = _mm256_permute2f128_pd(_mm512_extractf64x4_pd(T0, 1), _mm512_extractf64x4_pd(T2, 1), 0x31); tmp.packet[7] = _mm256_permute2f128_pd(_mm512_extractf64x4_pd(T1, 1), _mm512_extractf64x4_pd(T3, 1), 0x31); PACK_OUTPUT_D(kernel.packet, tmp.packet, 0, 1); PACK_OUTPUT_D(kernel.packet, tmp.packet, 1, 1); PACK_OUTPUT_D(kernel.packet, tmp.packet, 2, 1); PACK_OUTPUT_D(kernel.packet, tmp.packet, 3, 1); } EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { __m512d T0 = _mm512_unpacklo_pd(kernel.packet[0], kernel.packet[1]); __m512d T1 = _mm512_unpackhi_pd(kernel.packet[0], kernel.packet[1]); __m512d T2 = _mm512_unpacklo_pd(kernel.packet[2], kernel.packet[3]); __m512d T3 = _mm512_unpackhi_pd(kernel.packet[2], kernel.packet[3]); __m512d T4 = _mm512_unpacklo_pd(kernel.packet[4], kernel.packet[5]); __m512d T5 = _mm512_unpackhi_pd(kernel.packet[4], kernel.packet[5]); __m512d T6 = _mm512_unpacklo_pd(kernel.packet[6], kernel.packet[7]); __m512d T7 = _mm512_unpackhi_pd(kernel.packet[6], kernel.packet[7]); PacketBlock tmp; tmp.packet[0] = _mm256_permute2f128_pd(_mm512_extractf64x4_pd(T0, 0), _mm512_extractf64x4_pd(T2, 0), 0x20); tmp.packet[1] = _mm256_permute2f128_pd(_mm512_extractf64x4_pd(T1, 0), _mm512_extractf64x4_pd(T3, 0), 0x20); tmp.packet[2] = _mm256_permute2f128_pd(_mm512_extractf64x4_pd(T0, 0), _mm512_extractf64x4_pd(T2, 0), 0x31); tmp.packet[3] = _mm256_permute2f128_pd(_mm512_extractf64x4_pd(T1, 0), _mm512_extractf64x4_pd(T3, 0), 0x31); tmp.packet[4] = _mm256_permute2f128_pd(_mm512_extractf64x4_pd(T0, 1), _mm512_extractf64x4_pd(T2, 1), 0x20); tmp.packet[5] = _mm256_permute2f128_pd(_mm512_extractf64x4_pd(T1, 1), _mm512_extractf64x4_pd(T3, 1), 0x20); tmp.packet[6] = _mm256_permute2f128_pd(_mm512_extractf64x4_pd(T0, 1), _mm512_extractf64x4_pd(T2, 1), 0x31); tmp.packet[7] = _mm256_permute2f128_pd(_mm512_extractf64x4_pd(T1, 1), _mm512_extractf64x4_pd(T3, 1), 0x31); tmp.packet[8] = _mm256_permute2f128_pd(_mm512_extractf64x4_pd(T4, 0), _mm512_extractf64x4_pd(T6, 0), 0x20); tmp.packet[9] = _mm256_permute2f128_pd(_mm512_extractf64x4_pd(T5, 0), _mm512_extractf64x4_pd(T7, 0), 0x20); tmp.packet[10] = _mm256_permute2f128_pd(_mm512_extractf64x4_pd(T4, 0), _mm512_extractf64x4_pd(T6, 0), 0x31); tmp.packet[11] = _mm256_permute2f128_pd(_mm512_extractf64x4_pd(T5, 0), _mm512_extractf64x4_pd(T7, 0), 0x31); tmp.packet[12] = _mm256_permute2f128_pd(_mm512_extractf64x4_pd(T4, 1), _mm512_extractf64x4_pd(T6, 1), 0x20); tmp.packet[13] = _mm256_permute2f128_pd(_mm512_extractf64x4_pd(T5, 1), _mm512_extractf64x4_pd(T7, 1), 0x20); tmp.packet[14] = _mm256_permute2f128_pd(_mm512_extractf64x4_pd(T4, 1), _mm512_extractf64x4_pd(T6, 1), 0x31); tmp.packet[15] = _mm256_permute2f128_pd(_mm512_extractf64x4_pd(T5, 1), _mm512_extractf64x4_pd(T7, 1), 0x31); PACK_OUTPUT_SQ_D(kernel.packet, tmp.packet, 0, 8); PACK_OUTPUT_SQ_D(kernel.packet, tmp.packet, 1, 8); PACK_OUTPUT_SQ_D(kernel.packet, tmp.packet, 2, 8); PACK_OUTPUT_SQ_D(kernel.packet, tmp.packet, 3, 8); PACK_OUTPUT_SQ_D(kernel.packet, tmp.packet, 4, 8); PACK_OUTPUT_SQ_D(kernel.packet, tmp.packet, 5, 8); PACK_OUTPUT_SQ_D(kernel.packet, tmp.packet, 6, 8); PACK_OUTPUT_SQ_D(kernel.packet, tmp.packet, 7, 8); } template <> EIGEN_STRONG_INLINE Packet16f pblend(const Selector<16>& /*ifPacket*/, const Packet16f& /*thenPacket*/, const Packet16f& /*elsePacket*/) { assert(false && "To be implemented"); return Packet16f(); } template <> EIGEN_STRONG_INLINE Packet8d pblend(const Selector<8>& ifPacket, const Packet8d& thenPacket, const Packet8d& elsePacket) { __mmask8 m = (ifPacket.select[0] ) | (ifPacket.select[1]<<1) | (ifPacket.select[2]<<2) | (ifPacket.select[3]<<3) | (ifPacket.select[4]<<4) | (ifPacket.select[5]<<5) | (ifPacket.select[6]<<6) | (ifPacket.select[7]<<7); return _mm512_mask_blend_pd(m, elsePacket, thenPacket); } // Packet math for Eigen::half template<> EIGEN_STRONG_INLINE Packet16h pset1(const Eigen::half& from) { return _mm256_set1_epi16(from.x); } template<> EIGEN_STRONG_INLINE Eigen::half pfirst(const Packet16h& from) { return half_impl::raw_uint16_to_half(static_cast(_mm256_extract_epi16(from, 0))); } template<> EIGEN_STRONG_INLINE Packet16h pload(const Eigen::half* from) { return _mm256_load_si256(reinterpret_cast(from)); } template<> EIGEN_STRONG_INLINE Packet16h ploadu(const Eigen::half* from) { return _mm256_loadu_si256(reinterpret_cast(from)); } template<> EIGEN_STRONG_INLINE void pstore(Eigen::half* to, const Packet16h& from) { // (void*) -> workaround clang warning: // cast from 'Eigen::half *' to '__m256i *' increases required alignment from 2 to 32 _mm256_store_si256((__m256i*)(void*)to, from); } template<> EIGEN_STRONG_INLINE void pstoreu(Eigen::half* to, const Packet16h& from) { // (void*) -> workaround clang warning: // cast from 'Eigen::half *' to '__m256i *' increases required alignment from 2 to 32 _mm256_storeu_si256((__m256i*)(void*)to, from); } template<> EIGEN_STRONG_INLINE Packet16h ploaddup(const Eigen::half* from) { unsigned short a = from[0].x; unsigned short b = from[1].x; unsigned short c = from[2].x; unsigned short d = from[3].x; unsigned short e = from[4].x; unsigned short f = from[5].x; unsigned short g = from[6].x; unsigned short h = from[7].x; return _mm256_set_epi16(h, h, g, g, f, f, e, e, d, d, c, c, b, b, a, a); } template<> EIGEN_STRONG_INLINE Packet16h ploadquad(const Eigen::half* from) { unsigned short a = from[0].x; unsigned short b = from[1].x; unsigned short c = from[2].x; unsigned short d = from[3].x; return _mm256_set_epi16(d, d, d, d, c, c, c, c, b, b, b, b, a, a, a, a); } EIGEN_STRONG_INLINE Packet16f half2float(const Packet16h& a) { #ifdef EIGEN_HAS_FP16_C return _mm512_cvtph_ps(a); #else EIGEN_ALIGN64 half aux[16]; pstore(aux, a); float f0(aux[0]); float f1(aux[1]); float f2(aux[2]); float f3(aux[3]); float f4(aux[4]); float f5(aux[5]); float f6(aux[6]); float f7(aux[7]); float f8(aux[8]); float f9(aux[9]); float fa(aux[10]); float fb(aux[11]); float fc(aux[12]); float fd(aux[13]); float fe(aux[14]); float ff(aux[15]); return _mm512_set_ps( ff, fe, fd, fc, fb, fa, f9, f8, f7, f6, f5, f4, f3, f2, f1, f0); #endif } EIGEN_STRONG_INLINE Packet16h float2half(const Packet16f& a) { #ifdef EIGEN_HAS_FP16_C return _mm512_cvtps_ph(a, _MM_FROUND_TO_NEAREST_INT|_MM_FROUND_NO_EXC); #else EIGEN_ALIGN64 float aux[16]; pstore(aux, a); half h0(aux[0]); half h1(aux[1]); half h2(aux[2]); half h3(aux[3]); half h4(aux[4]); half h5(aux[5]); half h6(aux[6]); half h7(aux[7]); half h8(aux[8]); half h9(aux[9]); half ha(aux[10]); half hb(aux[11]); half hc(aux[12]); half hd(aux[13]); half he(aux[14]); half hf(aux[15]); return _mm256_set_epi16( hf.x, he.x, hd.x, hc.x, hb.x, ha.x, h9.x, h8.x, h7.x, h6.x, h5.x, h4.x, h3.x, h2.x, h1.x, h0.x); #endif } template<> EIGEN_STRONG_INLINE Packet16h ptrue(const Packet16h& a) { return ptrue(Packet8i(a)); } template <> EIGEN_STRONG_INLINE Packet16h pabs(const Packet16h& a) { const __m256i sign_mask = _mm256_set1_epi16(static_cast(0x8000)); return _mm256_andnot_si256(sign_mask, a); } template <> EIGEN_STRONG_INLINE Packet16h pmin(const Packet16h& a, const Packet16h& b) { return float2half(pmin(half2float(a), half2float(b))); } template <> EIGEN_STRONG_INLINE Packet16h pmax(const Packet16h& a, const Packet16h& b) { return float2half(pmax(half2float(a), half2float(b))); } template <> EIGEN_STRONG_INLINE Packet16h plset(const half& a) { return float2half(plset(static_cast(a))); } template<> EIGEN_STRONG_INLINE Packet16h por(const Packet16h& a,const Packet16h& b) { // in some cases Packet8i is a wrapper around __m256i, so we need to // cast to Packet8i to call the correct overload. return por(Packet8i(a),Packet8i(b)); } template<> EIGEN_STRONG_INLINE Packet16h pxor(const Packet16h& a,const Packet16h& b) { return pxor(Packet8i(a),Packet8i(b)); } template<> EIGEN_STRONG_INLINE Packet16h pand(const Packet16h& a,const Packet16h& b) { return pand(Packet8i(a),Packet8i(b)); } template<> EIGEN_STRONG_INLINE Packet16h pandnot(const Packet16h& a,const Packet16h& b) { return pandnot(Packet8i(a),Packet8i(b)); } template<> EIGEN_STRONG_INLINE Packet16h pselect(const Packet16h& mask, const Packet16h& a, const Packet16h& b) { return _mm256_blendv_epi8(b, a, mask); } template<> EIGEN_STRONG_INLINE Packet16h pround(const Packet16h& a) { return float2half(pround(half2float(a))); } template<> EIGEN_STRONG_INLINE Packet16h print(const Packet16h& a) { return float2half(print(half2float(a))); } template<> EIGEN_STRONG_INLINE Packet16h pceil(const Packet16h& a) { return float2half(pceil(half2float(a))); } template<> EIGEN_STRONG_INLINE Packet16h pfloor(const Packet16h& a) { return float2half(pfloor(half2float(a))); } template<> EIGEN_STRONG_INLINE Packet16h pcmp_eq(const Packet16h& a,const Packet16h& b) { Packet16f af = half2float(a); Packet16f bf = half2float(b); return Pack32To16(pcmp_eq(af, bf)); } template<> EIGEN_STRONG_INLINE Packet16h pcmp_le(const Packet16h& a,const Packet16h& b) { return Pack32To16(pcmp_le(half2float(a), half2float(b))); } template<> EIGEN_STRONG_INLINE Packet16h pcmp_lt(const Packet16h& a,const Packet16h& b) { return Pack32To16(pcmp_lt(half2float(a), half2float(b))); } template<> EIGEN_STRONG_INLINE Packet16h pcmp_lt_or_nan(const Packet16h& a,const Packet16h& b) { return Pack32To16(pcmp_lt_or_nan(half2float(a), half2float(b))); } template<> EIGEN_STRONG_INLINE Packet16h pconj(const Packet16h& a) { return a; } template<> EIGEN_STRONG_INLINE Packet16h pnegate(const Packet16h& a) { Packet16h sign_mask = _mm256_set1_epi16(static_cast(0x8000)); return _mm256_xor_si256(a, sign_mask); } template<> EIGEN_STRONG_INLINE Packet16h padd(const Packet16h& a, const Packet16h& b) { Packet16f af = half2float(a); Packet16f bf = half2float(b); Packet16f rf = padd(af, bf); return float2half(rf); } template<> EIGEN_STRONG_INLINE Packet16h psub(const Packet16h& a, const Packet16h& b) { Packet16f af = half2float(a); Packet16f bf = half2float(b); Packet16f rf = psub(af, bf); return float2half(rf); } template<> EIGEN_STRONG_INLINE Packet16h pmul(const Packet16h& a, const Packet16h& b) { Packet16f af = half2float(a); Packet16f bf = half2float(b); Packet16f rf = pmul(af, bf); return float2half(rf); } template<> EIGEN_STRONG_INLINE Packet16h pdiv(const Packet16h& a, const Packet16h& b) { Packet16f af = half2float(a); Packet16f bf = half2float(b); Packet16f rf = pdiv(af, bf); return float2half(rf); } template<> EIGEN_STRONG_INLINE half predux(const Packet16h& from) { Packet16f from_float = half2float(from); return half(predux(from_float)); } template <> EIGEN_STRONG_INLINE Packet8h predux_half_dowto4(const Packet16h& a) { Packet8h lane0 = _mm256_extractf128_si256(a, 0); Packet8h lane1 = _mm256_extractf128_si256(a, 1); return padd(lane0, lane1); } template<> EIGEN_STRONG_INLINE Eigen::half predux_max(const Packet16h& a) { Packet16f af = half2float(a); float reduced = predux_max(af); return Eigen::half(reduced); } template<> EIGEN_STRONG_INLINE Eigen::half predux_min(const Packet16h& a) { Packet16f af = half2float(a); float reduced = predux_min(af); return Eigen::half(reduced); } template<> EIGEN_STRONG_INLINE half predux_mul(const Packet16h& from) { Packet16f from_float = half2float(from); return half(predux_mul(from_float)); } template<> EIGEN_STRONG_INLINE Packet16h preverse(const Packet16h& a) { __m128i m = _mm_setr_epi8(14,15,12,13,10,11,8,9,6,7,4,5,2,3,0,1); return _mm256_insertf128_si256( _mm256_castsi128_si256(_mm_shuffle_epi8(_mm256_extractf128_si256(a,1),m)), _mm_shuffle_epi8(_mm256_extractf128_si256(a,0),m), 1); } template<> EIGEN_STRONG_INLINE Packet16h pgather(const Eigen::half* from, Index stride) { return _mm256_set_epi16( from[15*stride].x, from[14*stride].x, from[13*stride].x, from[12*stride].x, from[11*stride].x, from[10*stride].x, from[9*stride].x, from[8*stride].x, from[7*stride].x, from[6*stride].x, from[5*stride].x, from[4*stride].x, from[3*stride].x, from[2*stride].x, from[1*stride].x, from[0*stride].x); } template<> EIGEN_STRONG_INLINE void pscatter(half* to, const Packet16h& from, Index stride) { EIGEN_ALIGN64 half aux[16]; pstore(aux, from); to[stride*0] = aux[0]; to[stride*1] = aux[1]; to[stride*2] = aux[2]; to[stride*3] = aux[3]; to[stride*4] = aux[4]; to[stride*5] = aux[5]; to[stride*6] = aux[6]; to[stride*7] = aux[7]; to[stride*8] = aux[8]; to[stride*9] = aux[9]; to[stride*10] = aux[10]; to[stride*11] = aux[11]; to[stride*12] = aux[12]; to[stride*13] = aux[13]; to[stride*14] = aux[14]; to[stride*15] = aux[15]; } EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { __m256i a = kernel.packet[0]; __m256i b = kernel.packet[1]; __m256i c = kernel.packet[2]; __m256i d = kernel.packet[3]; __m256i e = kernel.packet[4]; __m256i f = kernel.packet[5]; __m256i g = kernel.packet[6]; __m256i h = kernel.packet[7]; __m256i i = kernel.packet[8]; __m256i j = kernel.packet[9]; __m256i k = kernel.packet[10]; __m256i l = kernel.packet[11]; __m256i m = kernel.packet[12]; __m256i n = kernel.packet[13]; __m256i o = kernel.packet[14]; __m256i p = kernel.packet[15]; __m256i ab_07 = _mm256_unpacklo_epi16(a, b); __m256i cd_07 = _mm256_unpacklo_epi16(c, d); __m256i ef_07 = _mm256_unpacklo_epi16(e, f); __m256i gh_07 = _mm256_unpacklo_epi16(g, h); __m256i ij_07 = _mm256_unpacklo_epi16(i, j); __m256i kl_07 = _mm256_unpacklo_epi16(k, l); __m256i mn_07 = _mm256_unpacklo_epi16(m, n); __m256i op_07 = _mm256_unpacklo_epi16(o, p); __m256i ab_8f = _mm256_unpackhi_epi16(a, b); __m256i cd_8f = _mm256_unpackhi_epi16(c, d); __m256i ef_8f = _mm256_unpackhi_epi16(e, f); __m256i gh_8f = _mm256_unpackhi_epi16(g, h); __m256i ij_8f = _mm256_unpackhi_epi16(i, j); __m256i kl_8f = _mm256_unpackhi_epi16(k, l); __m256i mn_8f = _mm256_unpackhi_epi16(m, n); __m256i op_8f = _mm256_unpackhi_epi16(o, p); __m256i abcd_03 = _mm256_unpacklo_epi32(ab_07, cd_07); __m256i abcd_47 = _mm256_unpackhi_epi32(ab_07, cd_07); __m256i efgh_03 = _mm256_unpacklo_epi32(ef_07, gh_07); __m256i efgh_47 = _mm256_unpackhi_epi32(ef_07, gh_07); __m256i ijkl_03 = _mm256_unpacklo_epi32(ij_07, kl_07); __m256i ijkl_47 = _mm256_unpackhi_epi32(ij_07, kl_07); __m256i mnop_03 = _mm256_unpacklo_epi32(mn_07, op_07); __m256i mnop_47 = _mm256_unpackhi_epi32(mn_07, op_07); __m256i abcd_8b = _mm256_unpacklo_epi32(ab_8f, cd_8f); __m256i abcd_cf = _mm256_unpackhi_epi32(ab_8f, cd_8f); __m256i efgh_8b = _mm256_unpacklo_epi32(ef_8f, gh_8f); __m256i efgh_cf = _mm256_unpackhi_epi32(ef_8f, gh_8f); __m256i ijkl_8b = _mm256_unpacklo_epi32(ij_8f, kl_8f); __m256i ijkl_cf = _mm256_unpackhi_epi32(ij_8f, kl_8f); __m256i mnop_8b = _mm256_unpacklo_epi32(mn_8f, op_8f); __m256i mnop_cf = _mm256_unpackhi_epi32(mn_8f, op_8f); __m256i abcdefgh_01 = _mm256_unpacklo_epi64(abcd_03, efgh_03); __m256i abcdefgh_23 = _mm256_unpackhi_epi64(abcd_03, efgh_03); __m256i ijklmnop_01 = _mm256_unpacklo_epi64(ijkl_03, mnop_03); __m256i ijklmnop_23 = _mm256_unpackhi_epi64(ijkl_03, mnop_03); __m256i abcdefgh_45 = _mm256_unpacklo_epi64(abcd_47, efgh_47); __m256i abcdefgh_67 = _mm256_unpackhi_epi64(abcd_47, efgh_47); __m256i ijklmnop_45 = _mm256_unpacklo_epi64(ijkl_47, mnop_47); __m256i ijklmnop_67 = _mm256_unpackhi_epi64(ijkl_47, mnop_47); __m256i abcdefgh_89 = _mm256_unpacklo_epi64(abcd_8b, efgh_8b); __m256i abcdefgh_ab = _mm256_unpackhi_epi64(abcd_8b, efgh_8b); __m256i ijklmnop_89 = _mm256_unpacklo_epi64(ijkl_8b, mnop_8b); __m256i ijklmnop_ab = _mm256_unpackhi_epi64(ijkl_8b, mnop_8b); __m256i abcdefgh_cd = _mm256_unpacklo_epi64(abcd_cf, efgh_cf); __m256i abcdefgh_ef = _mm256_unpackhi_epi64(abcd_cf, efgh_cf); __m256i ijklmnop_cd = _mm256_unpacklo_epi64(ijkl_cf, mnop_cf); __m256i ijklmnop_ef = _mm256_unpackhi_epi64(ijkl_cf, mnop_cf); // NOTE: no unpacklo/hi instr in this case, so using permute instr. __m256i a_p_0 = _mm256_permute2x128_si256(abcdefgh_01, ijklmnop_01, 0x20); __m256i a_p_1 = _mm256_permute2x128_si256(abcdefgh_23, ijklmnop_23, 0x20); __m256i a_p_2 = _mm256_permute2x128_si256(abcdefgh_45, ijklmnop_45, 0x20); __m256i a_p_3 = _mm256_permute2x128_si256(abcdefgh_67, ijklmnop_67, 0x20); __m256i a_p_4 = _mm256_permute2x128_si256(abcdefgh_89, ijklmnop_89, 0x20); __m256i a_p_5 = _mm256_permute2x128_si256(abcdefgh_ab, ijklmnop_ab, 0x20); __m256i a_p_6 = _mm256_permute2x128_si256(abcdefgh_cd, ijklmnop_cd, 0x20); __m256i a_p_7 = _mm256_permute2x128_si256(abcdefgh_ef, ijklmnop_ef, 0x20); __m256i a_p_8 = _mm256_permute2x128_si256(abcdefgh_01, ijklmnop_01, 0x31); __m256i a_p_9 = _mm256_permute2x128_si256(abcdefgh_23, ijklmnop_23, 0x31); __m256i a_p_a = _mm256_permute2x128_si256(abcdefgh_45, ijklmnop_45, 0x31); __m256i a_p_b = _mm256_permute2x128_si256(abcdefgh_67, ijklmnop_67, 0x31); __m256i a_p_c = _mm256_permute2x128_si256(abcdefgh_89, ijklmnop_89, 0x31); __m256i a_p_d = _mm256_permute2x128_si256(abcdefgh_ab, ijklmnop_ab, 0x31); __m256i a_p_e = _mm256_permute2x128_si256(abcdefgh_cd, ijklmnop_cd, 0x31); __m256i a_p_f = _mm256_permute2x128_si256(abcdefgh_ef, ijklmnop_ef, 0x31); kernel.packet[0] = a_p_0; kernel.packet[1] = a_p_1; kernel.packet[2] = a_p_2; kernel.packet[3] = a_p_3; kernel.packet[4] = a_p_4; kernel.packet[5] = a_p_5; kernel.packet[6] = a_p_6; kernel.packet[7] = a_p_7; kernel.packet[8] = a_p_8; kernel.packet[9] = a_p_9; kernel.packet[10] = a_p_a; kernel.packet[11] = a_p_b; kernel.packet[12] = a_p_c; kernel.packet[13] = a_p_d; kernel.packet[14] = a_p_e; kernel.packet[15] = a_p_f; } EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { EIGEN_ALIGN64 half in[8][16]; pstore(in[0], kernel.packet[0]); pstore(in[1], kernel.packet[1]); pstore(in[2], kernel.packet[2]); pstore(in[3], kernel.packet[3]); pstore(in[4], kernel.packet[4]); pstore(in[5], kernel.packet[5]); pstore(in[6], kernel.packet[6]); pstore(in[7], kernel.packet[7]); EIGEN_ALIGN64 half out[8][16]; for (int i = 0; i < 8; ++i) { for (int j = 0; j < 8; ++j) { out[i][j] = in[j][2*i]; } for (int j = 0; j < 8; ++j) { out[i][j+8] = in[j][2*i+1]; } } kernel.packet[0] = pload(out[0]); kernel.packet[1] = pload(out[1]); kernel.packet[2] = pload(out[2]); kernel.packet[3] = pload(out[3]); kernel.packet[4] = pload(out[4]); kernel.packet[5] = pload(out[5]); kernel.packet[6] = pload(out[6]); kernel.packet[7] = pload(out[7]); } EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { EIGEN_ALIGN64 half in[4][16]; pstore(in[0], kernel.packet[0]); pstore(in[1], kernel.packet[1]); pstore(in[2], kernel.packet[2]); pstore(in[3], kernel.packet[3]); EIGEN_ALIGN64 half out[4][16]; for (int i = 0; i < 4; ++i) { for (int j = 0; j < 4; ++j) { out[i][j] = in[j][4*i]; } for (int j = 0; j < 4; ++j) { out[i][j+4] = in[j][4*i+1]; } for (int j = 0; j < 4; ++j) { out[i][j+8] = in[j][4*i+2]; } for (int j = 0; j < 4; ++j) { out[i][j+12] = in[j][4*i+3]; } } kernel.packet[0] = pload(out[0]); kernel.packet[1] = pload(out[1]); kernel.packet[2] = pload(out[2]); kernel.packet[3] = pload(out[3]); } template <> struct is_arithmetic { enum { value = true }; }; template <> struct packet_traits : default_packet_traits { typedef Packet16bf type; typedef Packet8bf half; enum { Vectorizable = 1, AlignedOnScalar = 1, size = 16, HasHalfPacket = 1, HasBlend = 0, HasInsert = 1, HasSin = EIGEN_FAST_MATH, HasCos = EIGEN_FAST_MATH, #if EIGEN_GNUC_AT_LEAST(5, 3) || (!EIGEN_COMP_GNUC_STRICT) #ifdef EIGEN_VECTORIZE_AVX512DQ HasLog = 1, // Currently fails test with bad accuracy. HasLog1p = 1, HasExpm1 = 1, HasNdtri = 1, HasBessel = 1, #endif HasExp = 1, HasSqrt = EIGEN_FAST_MATH, HasRsqrt = EIGEN_FAST_MATH, HasTanh = EIGEN_FAST_MATH, HasErf = EIGEN_FAST_MATH, #endif HasCmp = 1, HasDiv = 1 }; }; template <> struct unpacket_traits { typedef bfloat16 type; enum {size=16, alignment=Aligned32, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet8bf half; }; template <> EIGEN_STRONG_INLINE Packet16bf pset1(const bfloat16& from) { return _mm256_set1_epi16(from.value); } template <> EIGEN_STRONG_INLINE bfloat16 pfirst(const Packet16bf& from) { bfloat16 t; t.value = static_cast(_mm256_extract_epi16(from, 0)); return t; } template <> EIGEN_STRONG_INLINE Packet16bf pload(const bfloat16* from) { return _mm256_load_si256(reinterpret_cast(from)); } template <> EIGEN_STRONG_INLINE Packet16bf ploadu(const bfloat16* from) { return _mm256_loadu_si256(reinterpret_cast(from)); } template <> EIGEN_STRONG_INLINE void pstore(bfloat16* to, const Packet16bf& from) { _mm256_store_si256(reinterpret_cast<__m256i*>(to), from); } template <> EIGEN_STRONG_INLINE void pstoreu(bfloat16* to, const Packet16bf& from) { _mm256_storeu_si256(reinterpret_cast<__m256i*>(to), from); } template<> EIGEN_STRONG_INLINE Packet16bf ploaddup(const bfloat16* from) { Packet16bf r; unsigned short a = from[0].value; unsigned short b = from[1].value; unsigned short c = from[2].value; unsigned short d = from[3].value; unsigned short e = from[4].value; unsigned short f = from[5].value; unsigned short g = from[6].value; unsigned short h = from[7].value; return _mm256_set_epi16(h, h, g, g, f, f, e, e, d, d, c, c, b, b, a, a); } template<> EIGEN_STRONG_INLINE Packet16bf ploadquad(const bfloat16* from) { Packet16bf r; unsigned short a = from[0].value; unsigned short b = from[1].value; unsigned short c = from[2].value; unsigned short d = from[3].value; return _mm256_set_epi16(d, d, d, d, c, c, c, c, b, b, b, b, a, a, a, a); } EIGEN_STRONG_INLINE Packet16f Bf16ToF32(const Packet16bf& a) { return _mm512_castsi512_ps(_mm512_slli_epi32(_mm512_cvtepu16_epi32(a), 16)); } // Convert float to bfloat16 according to round-to-nearest-even/denormals algorithm. EIGEN_STRONG_INLINE Packet16bf F32ToBf16(const Packet16f& a) { Packet16bf r; #if defined(EIGEN_VECTORIZE_AVX512BF16) && EIGEN_GNUC_AT_LEAST(10, 1) // Since GCC 10.1 supports avx512bf16 and C style explicit cast // (C++ static_cast is not supported yet), do converion via intrinsic // and register path for performance. r = (__m256i)(_mm512_cvtneps_pbh(a)); #else __m512i t; __m512i input = _mm512_castps_si512(a); __m512i nan = _mm512_set1_epi32(0x7fc0); // uint32_t lsb = (input >> 16) & 1; t = _mm512_and_si512(_mm512_srli_epi32(input, 16), _mm512_set1_epi32(1)); // uint32_t rounding_bias = 0x7fff + lsb; t = _mm512_add_epi32(t, _mm512_set1_epi32(0x7fff)); // input += rounding_bias; t = _mm512_add_epi32(t, input); // input = input >> 16; t = _mm512_srli_epi32(t, 16); // Check NaN before converting back to bf16 __mmask16 mask = _mm512_cmp_ps_mask(a, a, _CMP_ORD_Q); t = _mm512_mask_blend_epi32(mask, nan, t); // output.value = static_cast(input); r = _mm512_cvtepi32_epi16(t); #endif // EIGEN_VECTORIZE_AVX512BF16 return r; } template <> EIGEN_STRONG_INLINE Packet16bf ptrue(const Packet16bf& a) { return ptrue(a); } template <> EIGEN_STRONG_INLINE Packet16bf por(const Packet16bf& a, const Packet16bf& b) { return por(a, b); } template <> EIGEN_STRONG_INLINE Packet16bf pxor(const Packet16bf& a, const Packet16bf& b) { return pxor(a, b); } template <> EIGEN_STRONG_INLINE Packet16bf pand(const Packet16bf& a, const Packet16bf& b) { return pand(a, b); } template <> EIGEN_STRONG_INLINE Packet16bf pandnot(const Packet16bf& a, const Packet16bf& b) { return pandnot(a, b); } template <> EIGEN_STRONG_INLINE Packet16bf pselect(const Packet16bf& mask, const Packet16bf& a, const Packet16bf& b) { // Input mask is expected to be all 0/1, handle it with 8-bit // intrinsic for performance. return _mm256_blendv_epi8(b, a, mask); } template<> EIGEN_STRONG_INLINE Packet16bf pround(const Packet16bf& a) { return F32ToBf16(pround(Bf16ToF32(a))); } template<> EIGEN_STRONG_INLINE Packet16bf print(const Packet16bf& a) { return F32ToBf16(print(Bf16ToF32(a))); } template<> EIGEN_STRONG_INLINE Packet16bf pceil(const Packet16bf& a) { return F32ToBf16(pceil(Bf16ToF32(a))); } template<> EIGEN_STRONG_INLINE Packet16bf pfloor(const Packet16bf& a) { return F32ToBf16(pfloor(Bf16ToF32(a))); } template <> EIGEN_STRONG_INLINE Packet16bf pcmp_eq(const Packet16bf& a, const Packet16bf& b) { return Pack32To16(pcmp_eq(Bf16ToF32(a), Bf16ToF32(b))); } template <> EIGEN_STRONG_INLINE Packet16bf pcmp_le(const Packet16bf& a, const Packet16bf& b) { return Pack32To16(pcmp_le(Bf16ToF32(a), Bf16ToF32(b))); } template <> EIGEN_STRONG_INLINE Packet16bf pcmp_lt(const Packet16bf& a, const Packet16bf& b) { return Pack32To16(pcmp_lt(Bf16ToF32(a), Bf16ToF32(b))); } template <> EIGEN_STRONG_INLINE Packet16bf pcmp_lt_or_nan(const Packet16bf& a, const Packet16bf& b) { return Pack32To16(pcmp_lt_or_nan(Bf16ToF32(a), Bf16ToF32(b))); } template <> EIGEN_STRONG_INLINE Packet16bf pnegate(const Packet16bf& a) { Packet16bf sign_mask = _mm256_set1_epi16(static_cast(0x8000)); return _mm256_xor_si256(a, sign_mask); } template <> EIGEN_STRONG_INLINE Packet16bf pconj(const Packet16bf& a) { return a; } template <> EIGEN_STRONG_INLINE Packet16bf pabs(const Packet16bf& a) { const __m256i sign_mask = _mm256_set1_epi16(static_cast(0x8000)); return _mm256_andnot_si256(sign_mask, a); } template <> EIGEN_STRONG_INLINE Packet16bf padd(const Packet16bf& a, const Packet16bf& b) { return F32ToBf16(padd(Bf16ToF32(a), Bf16ToF32(b))); } template <> EIGEN_STRONG_INLINE Packet16bf psub(const Packet16bf& a, const Packet16bf& b) { return F32ToBf16(psub(Bf16ToF32(a), Bf16ToF32(b))); } template <> EIGEN_STRONG_INLINE Packet16bf pmul(const Packet16bf& a, const Packet16bf& b) { return F32ToBf16(pmul(Bf16ToF32(a), Bf16ToF32(b))); } template <> EIGEN_STRONG_INLINE Packet16bf pdiv(const Packet16bf& a, const Packet16bf& b) { return F32ToBf16(pdiv(Bf16ToF32(a), Bf16ToF32(b))); } template <> EIGEN_STRONG_INLINE Packet16bf pmin(const Packet16bf& a, const Packet16bf& b) { return F32ToBf16(pmin(Bf16ToF32(a), Bf16ToF32(b))); } template <> EIGEN_STRONG_INLINE Packet16bf pmax(const Packet16bf& a, const Packet16bf& b) { return F32ToBf16(pmax(Bf16ToF32(a), Bf16ToF32(b))); } template <> EIGEN_STRONG_INLINE Packet16bf plset(const bfloat16& a) { return F32ToBf16(plset(static_cast(a))); } template <> EIGEN_STRONG_INLINE Packet8bf predux_half_dowto4(const Packet16bf& a) { Packet8bf lane0 = _mm256_extractf128_si256(a, 0); Packet8bf lane1 = _mm256_extractf128_si256(a, 1); return padd(lane0, lane1); } template <> EIGEN_STRONG_INLINE bfloat16 predux(const Packet16bf& p) { return static_cast(predux(Bf16ToF32(p))); } template <> EIGEN_STRONG_INLINE bfloat16 predux_mul(const Packet16bf& from) { return static_cast(predux_mul(Bf16ToF32(from))); } template <> EIGEN_STRONG_INLINE bfloat16 predux_min(const Packet16bf& from) { return static_cast(predux_min(Bf16ToF32(from))); } template <> EIGEN_STRONG_INLINE bfloat16 predux_max(const Packet16bf& from) { return static_cast(predux_max(Bf16ToF32(from))); } template <> EIGEN_STRONG_INLINE Packet16bf preverse(const Packet16bf& a) { __m256i m = _mm256_setr_epi8(14,15,12,13,10,11,8,9,6,7,4,5,2,3,0,1, 14,15,12,13,10,11,8,9,6,7,4,5,2,3,0,1); Packet16bf res; // Swap hi and lo first because shuffle is in 128-bit lanes. res = _mm256_permute2x128_si256(a, a, 1); // Shuffle 8-bit values in src within 2*128-bit lanes. return _mm256_shuffle_epi8(res, m); } template <> EIGEN_STRONG_INLINE Packet16bf pgather(const bfloat16* from, Index stride) { return _mm256_set_epi16( from[15*stride].value, from[14*stride].value, from[13*stride].value, from[12*stride].value, from[11*stride].value, from[10*stride].value, from[9*stride].value, from[8*stride].value, from[7*stride].value, from[6*stride].value, from[5*stride].value, from[4*stride].value, from[3*stride].value, from[2*stride].value, from[1*stride].value, from[0*stride].value); } template <> EIGEN_STRONG_INLINE void pscatter(bfloat16* to, const Packet16bf& from, Index stride) { EIGEN_ALIGN64 bfloat16 aux[16]; pstore(aux, from); to[stride*0] = aux[0]; to[stride*1] = aux[1]; to[stride*2] = aux[2]; to[stride*3] = aux[3]; to[stride*4] = aux[4]; to[stride*5] = aux[5]; to[stride*6] = aux[6]; to[stride*7] = aux[7]; to[stride*8] = aux[8]; to[stride*9] = aux[9]; to[stride*10] = aux[10]; to[stride*11] = aux[11]; to[stride*12] = aux[12]; to[stride*13] = aux[13]; to[stride*14] = aux[14]; to[stride*15] = aux[15]; } EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { __m256i a = kernel.packet[0]; __m256i b = kernel.packet[1]; __m256i c = kernel.packet[2]; __m256i d = kernel.packet[3]; __m256i e = kernel.packet[4]; __m256i f = kernel.packet[5]; __m256i g = kernel.packet[6]; __m256i h = kernel.packet[7]; __m256i i = kernel.packet[8]; __m256i j = kernel.packet[9]; __m256i k = kernel.packet[10]; __m256i l = kernel.packet[11]; __m256i m = kernel.packet[12]; __m256i n = kernel.packet[13]; __m256i o = kernel.packet[14]; __m256i p = kernel.packet[15]; __m256i ab_07 = _mm256_unpacklo_epi16(a, b); __m256i cd_07 = _mm256_unpacklo_epi16(c, d); __m256i ef_07 = _mm256_unpacklo_epi16(e, f); __m256i gh_07 = _mm256_unpacklo_epi16(g, h); __m256i ij_07 = _mm256_unpacklo_epi16(i, j); __m256i kl_07 = _mm256_unpacklo_epi16(k, l); __m256i mn_07 = _mm256_unpacklo_epi16(m, n); __m256i op_07 = _mm256_unpacklo_epi16(o, p); __m256i ab_8f = _mm256_unpackhi_epi16(a, b); __m256i cd_8f = _mm256_unpackhi_epi16(c, d); __m256i ef_8f = _mm256_unpackhi_epi16(e, f); __m256i gh_8f = _mm256_unpackhi_epi16(g, h); __m256i ij_8f = _mm256_unpackhi_epi16(i, j); __m256i kl_8f = _mm256_unpackhi_epi16(k, l); __m256i mn_8f = _mm256_unpackhi_epi16(m, n); __m256i op_8f = _mm256_unpackhi_epi16(o, p); __m256i abcd_03 = _mm256_unpacklo_epi32(ab_07, cd_07); __m256i abcd_47 = _mm256_unpackhi_epi32(ab_07, cd_07); __m256i efgh_03 = _mm256_unpacklo_epi32(ef_07, gh_07); __m256i efgh_47 = _mm256_unpackhi_epi32(ef_07, gh_07); __m256i ijkl_03 = _mm256_unpacklo_epi32(ij_07, kl_07); __m256i ijkl_47 = _mm256_unpackhi_epi32(ij_07, kl_07); __m256i mnop_03 = _mm256_unpacklo_epi32(mn_07, op_07); __m256i mnop_47 = _mm256_unpackhi_epi32(mn_07, op_07); __m256i abcd_8b = _mm256_unpacklo_epi32(ab_8f, cd_8f); __m256i abcd_cf = _mm256_unpackhi_epi32(ab_8f, cd_8f); __m256i efgh_8b = _mm256_unpacklo_epi32(ef_8f, gh_8f); __m256i efgh_cf = _mm256_unpackhi_epi32(ef_8f, gh_8f); __m256i ijkl_8b = _mm256_unpacklo_epi32(ij_8f, kl_8f); __m256i ijkl_cf = _mm256_unpackhi_epi32(ij_8f, kl_8f); __m256i mnop_8b = _mm256_unpacklo_epi32(mn_8f, op_8f); __m256i mnop_cf = _mm256_unpackhi_epi32(mn_8f, op_8f); __m256i abcdefgh_01 = _mm256_unpacklo_epi64(abcd_03, efgh_03); __m256i abcdefgh_23 = _mm256_unpackhi_epi64(abcd_03, efgh_03); __m256i ijklmnop_01 = _mm256_unpacklo_epi64(ijkl_03, mnop_03); __m256i ijklmnop_23 = _mm256_unpackhi_epi64(ijkl_03, mnop_03); __m256i abcdefgh_45 = _mm256_unpacklo_epi64(abcd_47, efgh_47); __m256i abcdefgh_67 = _mm256_unpackhi_epi64(abcd_47, efgh_47); __m256i ijklmnop_45 = _mm256_unpacklo_epi64(ijkl_47, mnop_47); __m256i ijklmnop_67 = _mm256_unpackhi_epi64(ijkl_47, mnop_47); __m256i abcdefgh_89 = _mm256_unpacklo_epi64(abcd_8b, efgh_8b); __m256i abcdefgh_ab = _mm256_unpackhi_epi64(abcd_8b, efgh_8b); __m256i ijklmnop_89 = _mm256_unpacklo_epi64(ijkl_8b, mnop_8b); __m256i ijklmnop_ab = _mm256_unpackhi_epi64(ijkl_8b, mnop_8b); __m256i abcdefgh_cd = _mm256_unpacklo_epi64(abcd_cf, efgh_cf); __m256i abcdefgh_ef = _mm256_unpackhi_epi64(abcd_cf, efgh_cf); __m256i ijklmnop_cd = _mm256_unpacklo_epi64(ijkl_cf, mnop_cf); __m256i ijklmnop_ef = _mm256_unpackhi_epi64(ijkl_cf, mnop_cf); // NOTE: no unpacklo/hi instr in this case, so using permute instr. kernel.packet[0] = _mm256_permute2x128_si256(abcdefgh_01, ijklmnop_01, 0x20); kernel.packet[1] = _mm256_permute2x128_si256(abcdefgh_23, ijklmnop_23, 0x20); kernel.packet[2] = _mm256_permute2x128_si256(abcdefgh_45, ijklmnop_45, 0x20); kernel.packet[3] = _mm256_permute2x128_si256(abcdefgh_67, ijklmnop_67, 0x20); kernel.packet[4] = _mm256_permute2x128_si256(abcdefgh_89, ijklmnop_89, 0x20); kernel.packet[5] = _mm256_permute2x128_si256(abcdefgh_ab, ijklmnop_ab, 0x20); kernel.packet[6] = _mm256_permute2x128_si256(abcdefgh_cd, ijklmnop_cd, 0x20); kernel.packet[7] = _mm256_permute2x128_si256(abcdefgh_ef, ijklmnop_ef, 0x20); kernel.packet[8] = _mm256_permute2x128_si256(abcdefgh_01, ijklmnop_01, 0x31); kernel.packet[9] = _mm256_permute2x128_si256(abcdefgh_23, ijklmnop_23, 0x31); kernel.packet[10] = _mm256_permute2x128_si256(abcdefgh_45, ijklmnop_45, 0x31); kernel.packet[11] = _mm256_permute2x128_si256(abcdefgh_67, ijklmnop_67, 0x31); kernel.packet[12] = _mm256_permute2x128_si256(abcdefgh_89, ijklmnop_89, 0x31); kernel.packet[13] = _mm256_permute2x128_si256(abcdefgh_ab, ijklmnop_ab, 0x31); kernel.packet[14] = _mm256_permute2x128_si256(abcdefgh_cd, ijklmnop_cd, 0x31); kernel.packet[15] = _mm256_permute2x128_si256(abcdefgh_ef, ijklmnop_ef, 0x31); } EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { __m256i a = kernel.packet[0]; __m256i b = kernel.packet[1]; __m256i c = kernel.packet[2]; __m256i d = kernel.packet[3]; __m256i ab_07 = _mm256_unpacklo_epi16(a, b); __m256i cd_07 = _mm256_unpacklo_epi16(c, d); __m256i ab_8f = _mm256_unpackhi_epi16(a, b); __m256i cd_8f = _mm256_unpackhi_epi16(c, d); __m256i abcd_03 = _mm256_unpacklo_epi32(ab_07, cd_07); __m256i abcd_47 = _mm256_unpackhi_epi32(ab_07, cd_07); __m256i abcd_8b = _mm256_unpacklo_epi32(ab_8f, cd_8f); __m256i abcd_cf = _mm256_unpackhi_epi32(ab_8f, cd_8f); // NOTE: no unpacklo/hi instr in this case, so using permute instr. kernel.packet[0] = _mm256_permute2x128_si256(abcd_03, abcd_47, 0x20); kernel.packet[1] = _mm256_permute2x128_si256(abcd_8b, abcd_cf, 0x20); kernel.packet[2] = _mm256_permute2x128_si256(abcd_03, abcd_47, 0x31); kernel.packet[3] = _mm256_permute2x128_si256(abcd_8b, abcd_cf, 0x31); } } // end namespace internal } // end namespace Eigen #endif // EIGEN_PACKET_MATH_AVX512_H RcppEigen/inst/include/Eigen/src/Core/arch/AVX512/TypeCasting.h0000644000176200001440000000412614562417221023451 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2019 Rasmus Munk Larsen // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_TYPE_CASTING_AVX512_H #define EIGEN_TYPE_CASTING_AVX512_H namespace Eigen { namespace internal { template<> EIGEN_STRONG_INLINE Packet16i pcast(const Packet16f& a) { return _mm512_cvttps_epi32(a); } template<> EIGEN_STRONG_INLINE Packet16f pcast(const Packet16i& a) { return _mm512_cvtepi32_ps(a); } template<> EIGEN_STRONG_INLINE Packet16i preinterpret(const Packet16f& a) { return _mm512_castps_si512(a); } template<> EIGEN_STRONG_INLINE Packet16f preinterpret(const Packet16i& a) { return _mm512_castsi512_ps(a); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template<> EIGEN_STRONG_INLINE Packet16f pcast(const Packet16h& a) { return half2float(a); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template<> EIGEN_STRONG_INLINE Packet16h pcast(const Packet16f& a) { return float2half(a); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template<> EIGEN_STRONG_INLINE Packet16f pcast(const Packet16bf& a) { return Bf16ToF32(a); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template<> EIGEN_STRONG_INLINE Packet16bf pcast(const Packet16f& a) { return F32ToBf16(a); } } // end namespace internal } // end namespace Eigen #endif // EIGEN_TYPE_CASTING_AVX512_H RcppEigen/inst/include/Eigen/src/Core/arch/AVX512/Complex.h0000644000176200001440000004141014562417221022623 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2018 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_COMPLEX_AVX512_H #define EIGEN_COMPLEX_AVX512_H namespace Eigen { namespace internal { //---------- float ---------- struct Packet8cf { EIGEN_STRONG_INLINE Packet8cf() {} EIGEN_STRONG_INLINE explicit Packet8cf(const __m512& a) : v(a) {} __m512 v; }; template<> struct packet_traits > : default_packet_traits { typedef Packet8cf type; typedef Packet4cf half; enum { Vectorizable = 1, AlignedOnScalar = 1, size = 8, HasHalfPacket = 1, HasAdd = 1, HasSub = 1, HasMul = 1, HasDiv = 1, HasNegate = 1, HasSqrt = 1, HasAbs = 0, HasAbs2 = 0, HasMin = 0, HasMax = 0, HasSetLinear = 0 }; }; template<> struct unpacket_traits { typedef std::complex type; typedef Packet4cf half; typedef Packet16f as_real; enum { size = 8, alignment=unpacket_traits::alignment, vectorizable=true, masked_load_available=false, masked_store_available=false }; }; template<> EIGEN_STRONG_INLINE Packet8cf ptrue(const Packet8cf& a) { return Packet8cf(ptrue(Packet16f(a.v))); } template<> EIGEN_STRONG_INLINE Packet8cf padd(const Packet8cf& a, const Packet8cf& b) { return Packet8cf(_mm512_add_ps(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet8cf psub(const Packet8cf& a, const Packet8cf& b) { return Packet8cf(_mm512_sub_ps(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet8cf pnegate(const Packet8cf& a) { return Packet8cf(pnegate(a.v)); } template<> EIGEN_STRONG_INLINE Packet8cf pconj(const Packet8cf& a) { const __m512 mask = _mm512_castsi512_ps(_mm512_setr_epi32( 0x00000000,0x80000000,0x00000000,0x80000000,0x00000000,0x80000000,0x00000000,0x80000000, 0x00000000,0x80000000,0x00000000,0x80000000,0x00000000,0x80000000,0x00000000,0x80000000)); return Packet8cf(pxor(a.v,mask)); } template<> EIGEN_STRONG_INLINE Packet8cf pmul(const Packet8cf& a, const Packet8cf& b) { __m512 tmp2 = _mm512_mul_ps(_mm512_movehdup_ps(a.v), _mm512_permute_ps(b.v, _MM_SHUFFLE(2,3,0,1))); return Packet8cf(_mm512_fmaddsub_ps(_mm512_moveldup_ps(a.v), b.v, tmp2)); } template<> EIGEN_STRONG_INLINE Packet8cf pand (const Packet8cf& a, const Packet8cf& b) { return Packet8cf(pand(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet8cf por (const Packet8cf& a, const Packet8cf& b) { return Packet8cf(por(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet8cf pxor (const Packet8cf& a, const Packet8cf& b) { return Packet8cf(pxor(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet8cf pandnot(const Packet8cf& a, const Packet8cf& b) { return Packet8cf(pandnot(a.v,b.v)); } template <> EIGEN_STRONG_INLINE Packet8cf pcmp_eq(const Packet8cf& a, const Packet8cf& b) { __m512 eq = pcmp_eq(a.v, b.v); return Packet8cf(pand(eq, _mm512_permute_ps(eq, 0xB1))); } template<> EIGEN_STRONG_INLINE Packet8cf pload (const std::complex* from) { EIGEN_DEBUG_ALIGNED_LOAD return Packet8cf(pload(&numext::real_ref(*from))); } template<> EIGEN_STRONG_INLINE Packet8cf ploadu(const std::complex* from) { EIGEN_DEBUG_UNALIGNED_LOAD return Packet8cf(ploadu(&numext::real_ref(*from))); } template<> EIGEN_STRONG_INLINE Packet8cf pset1(const std::complex& from) { return Packet8cf(_mm512_castpd_ps(pload1((const double*)(const void*)&from))); } template<> EIGEN_STRONG_INLINE Packet8cf ploaddup(const std::complex* from) { return Packet8cf( _mm512_castpd_ps( ploaddup((const double*)(const void*)from )) ); } template<> EIGEN_STRONG_INLINE Packet8cf ploadquad(const std::complex* from) { return Packet8cf( _mm512_castpd_ps( ploadquad((const double*)(const void*)from )) ); } template<> EIGEN_STRONG_INLINE void pstore >(std::complex* to, const Packet8cf& from) { EIGEN_DEBUG_ALIGNED_STORE pstore(&numext::real_ref(*to), from.v); } template<> EIGEN_STRONG_INLINE void pstoreu >(std::complex* to, const Packet8cf& from) { EIGEN_DEBUG_UNALIGNED_STORE pstoreu(&numext::real_ref(*to), from.v); } template<> EIGEN_DEVICE_FUNC inline Packet8cf pgather, Packet8cf>(const std::complex* from, Index stride) { return Packet8cf(_mm512_castpd_ps(pgather((const double*)(const void*)from, stride))); } template<> EIGEN_DEVICE_FUNC inline void pscatter, Packet8cf>(std::complex* to, const Packet8cf& from, Index stride) { pscatter((double*)(void*)to, _mm512_castps_pd(from.v), stride); } template<> EIGEN_STRONG_INLINE std::complex pfirst(const Packet8cf& a) { return pfirst(Packet2cf(_mm512_castps512_ps128(a.v))); } template<> EIGEN_STRONG_INLINE Packet8cf preverse(const Packet8cf& a) { return Packet8cf(_mm512_castsi512_ps( _mm512_permutexvar_epi64( _mm512_set_epi32(0, 0, 0, 1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6, 0, 7), _mm512_castps_si512(a.v)))); } template<> EIGEN_STRONG_INLINE std::complex predux(const Packet8cf& a) { return predux(padd(Packet4cf(extract256<0>(a.v)), Packet4cf(extract256<1>(a.v)))); } template<> EIGEN_STRONG_INLINE std::complex predux_mul(const Packet8cf& a) { return predux_mul(pmul(Packet4cf(extract256<0>(a.v)), Packet4cf(extract256<1>(a.v)))); } template <> EIGEN_STRONG_INLINE Packet4cf predux_half_dowto4(const Packet8cf& a) { __m256 lane0 = extract256<0>(a.v); __m256 lane1 = extract256<1>(a.v); __m256 res = _mm256_add_ps(lane0, lane1); return Packet4cf(res); } EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet8cf,Packet16f) template<> EIGEN_STRONG_INLINE Packet8cf pdiv(const Packet8cf& a, const Packet8cf& b) { Packet8cf num = pmul(a, pconj(b)); __m512 tmp = _mm512_mul_ps(b.v, b.v); __m512 tmp2 = _mm512_shuffle_ps(tmp,tmp,0xB1); __m512 denom = _mm512_add_ps(tmp, tmp2); return Packet8cf(_mm512_div_ps(num.v, denom)); } template<> EIGEN_STRONG_INLINE Packet8cf pcplxflip(const Packet8cf& x) { return Packet8cf(_mm512_shuffle_ps(x.v, x.v, _MM_SHUFFLE(2, 3, 0 ,1))); } //---------- double ---------- struct Packet4cd { EIGEN_STRONG_INLINE Packet4cd() {} EIGEN_STRONG_INLINE explicit Packet4cd(const __m512d& a) : v(a) {} __m512d v; }; template<> struct packet_traits > : default_packet_traits { typedef Packet4cd type; typedef Packet2cd half; enum { Vectorizable = 1, AlignedOnScalar = 0, size = 4, HasHalfPacket = 1, HasAdd = 1, HasSub = 1, HasMul = 1, HasDiv = 1, HasNegate = 1, HasSqrt = 1, HasAbs = 0, HasAbs2 = 0, HasMin = 0, HasMax = 0, HasSetLinear = 0 }; }; template<> struct unpacket_traits { typedef std::complex type; typedef Packet2cd half; typedef Packet8d as_real; enum { size = 4, alignment = unpacket_traits::alignment, vectorizable=true, masked_load_available=false, masked_store_available=false }; }; template<> EIGEN_STRONG_INLINE Packet4cd padd(const Packet4cd& a, const Packet4cd& b) { return Packet4cd(_mm512_add_pd(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet4cd psub(const Packet4cd& a, const Packet4cd& b) { return Packet4cd(_mm512_sub_pd(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet4cd pnegate(const Packet4cd& a) { return Packet4cd(pnegate(a.v)); } template<> EIGEN_STRONG_INLINE Packet4cd pconj(const Packet4cd& a) { const __m512d mask = _mm512_castsi512_pd( _mm512_set_epi32(0x80000000,0x0,0x0,0x0,0x80000000,0x0,0x0,0x0, 0x80000000,0x0,0x0,0x0,0x80000000,0x0,0x0,0x0)); return Packet4cd(pxor(a.v,mask)); } template<> EIGEN_STRONG_INLINE Packet4cd pmul(const Packet4cd& a, const Packet4cd& b) { __m512d tmp1 = _mm512_shuffle_pd(a.v,a.v,0x0); __m512d tmp2 = _mm512_shuffle_pd(a.v,a.v,0xFF); __m512d tmp3 = _mm512_shuffle_pd(b.v,b.v,0x55); __m512d odd = _mm512_mul_pd(tmp2, tmp3); return Packet4cd(_mm512_fmaddsub_pd(tmp1, b.v, odd)); } template<> EIGEN_STRONG_INLINE Packet4cd ptrue(const Packet4cd& a) { return Packet4cd(ptrue(Packet8d(a.v))); } template<> EIGEN_STRONG_INLINE Packet4cd pand (const Packet4cd& a, const Packet4cd& b) { return Packet4cd(pand(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet4cd por (const Packet4cd& a, const Packet4cd& b) { return Packet4cd(por(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet4cd pxor (const Packet4cd& a, const Packet4cd& b) { return Packet4cd(pxor(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet4cd pandnot(const Packet4cd& a, const Packet4cd& b) { return Packet4cd(pandnot(a.v,b.v)); } template <> EIGEN_STRONG_INLINE Packet4cd pcmp_eq(const Packet4cd& a, const Packet4cd& b) { __m512d eq = pcmp_eq(a.v, b.v); return Packet4cd(pand(eq, _mm512_permute_pd(eq, 0x55))); } template<> EIGEN_STRONG_INLINE Packet4cd pload (const std::complex* from) { EIGEN_DEBUG_ALIGNED_LOAD return Packet4cd(pload((const double*)from)); } template<> EIGEN_STRONG_INLINE Packet4cd ploadu(const std::complex* from) { EIGEN_DEBUG_UNALIGNED_LOAD return Packet4cd(ploadu((const double*)from)); } template<> EIGEN_STRONG_INLINE Packet4cd pset1(const std::complex& from) { #ifdef EIGEN_VECTORIZE_AVX512DQ return Packet4cd(_mm512_broadcast_f64x2(pset1(from).v)); #else return Packet4cd(_mm512_castps_pd(_mm512_broadcast_f32x4( _mm_castpd_ps(pset1(from).v)))); #endif } template<> EIGEN_STRONG_INLINE Packet4cd ploaddup(const std::complex* from) { return Packet4cd(_mm512_insertf64x4( _mm512_castpd256_pd512(ploaddup(from).v), ploaddup(from+1).v, 1)); } template<> EIGEN_STRONG_INLINE void pstore >(std::complex * to, const Packet4cd& from) { EIGEN_DEBUG_ALIGNED_STORE pstore((double*)to, from.v); } template<> EIGEN_STRONG_INLINE void pstoreu >(std::complex * to, const Packet4cd& from) { EIGEN_DEBUG_UNALIGNED_STORE pstoreu((double*)to, from.v); } template<> EIGEN_DEVICE_FUNC inline Packet4cd pgather, Packet4cd>(const std::complex* from, Index stride) { return Packet4cd(_mm512_insertf64x4(_mm512_castpd256_pd512( _mm256_insertf128_pd(_mm256_castpd128_pd256(ploadu(from+0*stride).v), ploadu(from+1*stride).v,1)), _mm256_insertf128_pd(_mm256_castpd128_pd256(ploadu(from+2*stride).v), ploadu(from+3*stride).v,1), 1)); } template<> EIGEN_DEVICE_FUNC inline void pscatter, Packet4cd>(std::complex* to, const Packet4cd& from, Index stride) { __m512i fromi = _mm512_castpd_si512(from.v); double* tod = (double*)(void*)to; _mm_storeu_pd(tod+0*stride, _mm_castsi128_pd(_mm512_extracti32x4_epi32(fromi,0)) ); _mm_storeu_pd(tod+2*stride, _mm_castsi128_pd(_mm512_extracti32x4_epi32(fromi,1)) ); _mm_storeu_pd(tod+4*stride, _mm_castsi128_pd(_mm512_extracti32x4_epi32(fromi,2)) ); _mm_storeu_pd(tod+6*stride, _mm_castsi128_pd(_mm512_extracti32x4_epi32(fromi,3)) ); } template<> EIGEN_STRONG_INLINE std::complex pfirst(const Packet4cd& a) { __m128d low = extract128<0>(a.v); EIGEN_ALIGN16 double res[2]; _mm_store_pd(res, low); return std::complex(res[0],res[1]); } template<> EIGEN_STRONG_INLINE Packet4cd preverse(const Packet4cd& a) { return Packet4cd(_mm512_shuffle_f64x2(a.v, a.v, (shuffle_mask<3,2,1,0>::mask))); } template<> EIGEN_STRONG_INLINE std::complex predux(const Packet4cd& a) { return predux(padd(Packet2cd(_mm512_extractf64x4_pd(a.v,0)), Packet2cd(_mm512_extractf64x4_pd(a.v,1)))); } template<> EIGEN_STRONG_INLINE std::complex predux_mul(const Packet4cd& a) { return predux_mul(pmul(Packet2cd(_mm512_extractf64x4_pd(a.v,0)), Packet2cd(_mm512_extractf64x4_pd(a.v,1)))); } template<> struct conj_helper { EIGEN_STRONG_INLINE Packet4cd pmadd(const Packet4cd& x, const Packet4cd& y, const Packet4cd& c) const { return padd(pmul(x,y),c); } EIGEN_STRONG_INLINE Packet4cd pmul(const Packet4cd& a, const Packet4cd& b) const { return internal::pmul(a, pconj(b)); } }; template<> struct conj_helper { EIGEN_STRONG_INLINE Packet4cd pmadd(const Packet4cd& x, const Packet4cd& y, const Packet4cd& c) const { return padd(pmul(x,y),c); } EIGEN_STRONG_INLINE Packet4cd pmul(const Packet4cd& a, const Packet4cd& b) const { return internal::pmul(pconj(a), b); } }; template<> struct conj_helper { EIGEN_STRONG_INLINE Packet4cd pmadd(const Packet4cd& x, const Packet4cd& y, const Packet4cd& c) const { return padd(pmul(x,y),c); } EIGEN_STRONG_INLINE Packet4cd pmul(const Packet4cd& a, const Packet4cd& b) const { return pconj(internal::pmul(a, b)); } }; EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet4cd,Packet8d) template<> EIGEN_STRONG_INLINE Packet4cd pdiv(const Packet4cd& a, const Packet4cd& b) { Packet4cd num = pmul(a, pconj(b)); __m512d tmp = _mm512_mul_pd(b.v, b.v); __m512d denom = padd(_mm512_permute_pd(tmp,0x55), tmp); return Packet4cd(_mm512_div_pd(num.v, denom)); } template<> EIGEN_STRONG_INLINE Packet4cd pcplxflip(const Packet4cd& x) { return Packet4cd(_mm512_permute_pd(x.v,0x55)); } EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { PacketBlock pb; pb.packet[0] = _mm512_castps_pd(kernel.packet[0].v); pb.packet[1] = _mm512_castps_pd(kernel.packet[1].v); pb.packet[2] = _mm512_castps_pd(kernel.packet[2].v); pb.packet[3] = _mm512_castps_pd(kernel.packet[3].v); ptranspose(pb); kernel.packet[0].v = _mm512_castpd_ps(pb.packet[0]); kernel.packet[1].v = _mm512_castpd_ps(pb.packet[1]); kernel.packet[2].v = _mm512_castpd_ps(pb.packet[2]); kernel.packet[3].v = _mm512_castpd_ps(pb.packet[3]); } EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { PacketBlock pb; pb.packet[0] = _mm512_castps_pd(kernel.packet[0].v); pb.packet[1] = _mm512_castps_pd(kernel.packet[1].v); pb.packet[2] = _mm512_castps_pd(kernel.packet[2].v); pb.packet[3] = _mm512_castps_pd(kernel.packet[3].v); pb.packet[4] = _mm512_castps_pd(kernel.packet[4].v); pb.packet[5] = _mm512_castps_pd(kernel.packet[5].v); pb.packet[6] = _mm512_castps_pd(kernel.packet[6].v); pb.packet[7] = _mm512_castps_pd(kernel.packet[7].v); ptranspose(pb); kernel.packet[0].v = _mm512_castpd_ps(pb.packet[0]); kernel.packet[1].v = _mm512_castpd_ps(pb.packet[1]); kernel.packet[2].v = _mm512_castpd_ps(pb.packet[2]); kernel.packet[3].v = _mm512_castpd_ps(pb.packet[3]); kernel.packet[4].v = _mm512_castpd_ps(pb.packet[4]); kernel.packet[5].v = _mm512_castpd_ps(pb.packet[5]); kernel.packet[6].v = _mm512_castpd_ps(pb.packet[6]); kernel.packet[7].v = _mm512_castpd_ps(pb.packet[7]); } EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { __m512d T0 = _mm512_shuffle_f64x2(kernel.packet[0].v, kernel.packet[1].v, (shuffle_mask<0,1,0,1>::mask)); // [a0 a1 b0 b1] __m512d T1 = _mm512_shuffle_f64x2(kernel.packet[0].v, kernel.packet[1].v, (shuffle_mask<2,3,2,3>::mask)); // [a2 a3 b2 b3] __m512d T2 = _mm512_shuffle_f64x2(kernel.packet[2].v, kernel.packet[3].v, (shuffle_mask<0,1,0,1>::mask)); // [c0 c1 d0 d1] __m512d T3 = _mm512_shuffle_f64x2(kernel.packet[2].v, kernel.packet[3].v, (shuffle_mask<2,3,2,3>::mask)); // [c2 c3 d2 d3] kernel.packet[3] = Packet4cd(_mm512_shuffle_f64x2(T1, T3, (shuffle_mask<1,3,1,3>::mask))); // [a3 b3 c3 d3] kernel.packet[2] = Packet4cd(_mm512_shuffle_f64x2(T1, T3, (shuffle_mask<0,2,0,2>::mask))); // [a2 b2 c2 d2] kernel.packet[1] = Packet4cd(_mm512_shuffle_f64x2(T0, T2, (shuffle_mask<1,3,1,3>::mask))); // [a1 b1 c1 d1] kernel.packet[0] = Packet4cd(_mm512_shuffle_f64x2(T0, T2, (shuffle_mask<0,2,0,2>::mask))); // [a0 b0 c0 d0] } template<> EIGEN_STRONG_INLINE Packet4cd psqrt(const Packet4cd& a) { return psqrt_complex(a); } template<> EIGEN_STRONG_INLINE Packet8cf psqrt(const Packet8cf& a) { return psqrt_complex(a); } } // end namespace internal } // end namespace Eigen #endif // EIGEN_COMPLEX_AVX512_H RcppEigen/inst/include/Eigen/src/Core/arch/SYCL/0000755000176200001440000000000014562417221020727 5ustar liggesusersRcppEigen/inst/include/Eigen/src/Core/arch/SYCL/MathFunctions.h0000644000176200001440000003037314562417221023670 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Mehdi Goli Codeplay Software Ltd. // Ralph Potter Codeplay Software Ltd. // Luke Iwanski Codeplay Software Ltd. // Contact: // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. /***************************************************************** * MathFunctions.h * * \brief: * MathFunctions * *****************************************************************/ #ifndef EIGEN_MATH_FUNCTIONS_SYCL_H #define EIGEN_MATH_FUNCTIONS_SYCL_H namespace Eigen { namespace internal { // Make sure this is only available when targeting a GPU: we don't want to // introduce conflicts between these packet_traits definitions and the ones // we'll use on the host side (SSE, AVX, ...) #if defined(SYCL_DEVICE_ONLY) #define SYCL_PLOG(packet_type) \ template <> \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE packet_type plog( \ const packet_type& a) { \ return cl::sycl::log(a); \ } SYCL_PLOG(cl::sycl::cl_float4) SYCL_PLOG(cl::sycl::cl_double2) #undef SYCL_PLOG #define SYCL_PLOG1P(packet_type) \ template <> \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE packet_type plog1p( \ const packet_type& a) { \ return cl::sycl::log1p(a); \ } SYCL_PLOG1P(cl::sycl::cl_float4) SYCL_PLOG1P(cl::sycl::cl_double2) #undef SYCL_PLOG1P #define SYCL_PLOG10(packet_type) \ template <> \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE packet_type plog10( \ const packet_type& a) { \ return cl::sycl::log10(a); \ } SYCL_PLOG10(cl::sycl::cl_float4) SYCL_PLOG10(cl::sycl::cl_double2) #undef SYCL_PLOG10 #define SYCL_PEXP(packet_type) \ template <> \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE packet_type pexp( \ const packet_type& a) { \ return cl::sycl::exp(a); \ } SYCL_PEXP(cl::sycl::cl_float4) SYCL_PEXP(cl::sycl::cl_float) SYCL_PEXP(cl::sycl::cl_double2) #undef SYCL_PEXP #define SYCL_PEXPM1(packet_type) \ template <> \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE packet_type pexpm1( \ const packet_type& a) { \ return cl::sycl::expm1(a); \ } SYCL_PEXPM1(cl::sycl::cl_float4) SYCL_PEXPM1(cl::sycl::cl_double2) #undef SYCL_PEXPM1 #define SYCL_PSQRT(packet_type) \ template <> \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE packet_type psqrt( \ const packet_type& a) { \ return cl::sycl::sqrt(a); \ } SYCL_PSQRT(cl::sycl::cl_float4) SYCL_PSQRT(cl::sycl::cl_double2) #undef SYCL_PSQRT #define SYCL_PRSQRT(packet_type) \ template <> \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE packet_type prsqrt( \ const packet_type& a) { \ return cl::sycl::rsqrt(a); \ } SYCL_PRSQRT(cl::sycl::cl_float4) SYCL_PRSQRT(cl::sycl::cl_double2) #undef SYCL_PRSQRT /** \internal \returns the hyperbolic sine of \a a (coeff-wise) */ #define SYCL_PSIN(packet_type) \ template <> \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE packet_type psin( \ const packet_type& a) { \ return cl::sycl::sin(a); \ } SYCL_PSIN(cl::sycl::cl_float4) SYCL_PSIN(cl::sycl::cl_double2) #undef SYCL_PSIN /** \internal \returns the hyperbolic cosine of \a a (coeff-wise) */ #define SYCL_PCOS(packet_type) \ template <> \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE packet_type pcos( \ const packet_type& a) { \ return cl::sycl::cos(a); \ } SYCL_PCOS(cl::sycl::cl_float4) SYCL_PCOS(cl::sycl::cl_double2) #undef SYCL_PCOS /** \internal \returns the hyperbolic tan of \a a (coeff-wise) */ #define SYCL_PTAN(packet_type) \ template <> \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE packet_type ptan( \ const packet_type& a) { \ return cl::sycl::tan(a); \ } SYCL_PTAN(cl::sycl::cl_float4) SYCL_PTAN(cl::sycl::cl_double2) #undef SYCL_PTAN /** \internal \returns the hyperbolic sine of \a a (coeff-wise) */ #define SYCL_PASIN(packet_type) \ template <> \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE packet_type pasin( \ const packet_type& a) { \ return cl::sycl::asin(a); \ } SYCL_PASIN(cl::sycl::cl_float4) SYCL_PASIN(cl::sycl::cl_double2) #undef SYCL_PASIN /** \internal \returns the hyperbolic cosine of \a a (coeff-wise) */ #define SYCL_PACOS(packet_type) \ template <> \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE packet_type pacos( \ const packet_type& a) { \ return cl::sycl::acos(a); \ } SYCL_PACOS(cl::sycl::cl_float4) SYCL_PACOS(cl::sycl::cl_double2) #undef SYCL_PACOS /** \internal \returns the hyperbolic tan of \a a (coeff-wise) */ #define SYCL_PATAN(packet_type) \ template <> \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE packet_type patan( \ const packet_type& a) { \ return cl::sycl::atan(a); \ } SYCL_PATAN(cl::sycl::cl_float4) SYCL_PATAN(cl::sycl::cl_double2) #undef SYCL_PATAN /** \internal \returns the hyperbolic sine of \a a (coeff-wise) */ #define SYCL_PSINH(packet_type) \ template <> \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE packet_type psinh( \ const packet_type& a) { \ return cl::sycl::sinh(a); \ } SYCL_PSINH(cl::sycl::cl_float4) SYCL_PSINH(cl::sycl::cl_double2) #undef SYCL_PSINH /** \internal \returns the hyperbolic cosine of \a a (coeff-wise) */ #define SYCL_PCOSH(packet_type) \ template <> \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE packet_type pcosh( \ const packet_type& a) { \ return cl::sycl::cosh(a); \ } SYCL_PCOSH(cl::sycl::cl_float4) SYCL_PCOSH(cl::sycl::cl_double2) #undef SYCL_PCOSH /** \internal \returns the hyperbolic tan of \a a (coeff-wise) */ #define SYCL_PTANH(packet_type) \ template <> \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE packet_type ptanh( \ const packet_type& a) { \ return cl::sycl::tanh(a); \ } SYCL_PTANH(cl::sycl::cl_float4) SYCL_PTANH(cl::sycl::cl_double2) #undef SYCL_PTANH #define SYCL_PCEIL(packet_type) \ template <> \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE packet_type pceil( \ const packet_type& a) { \ return cl::sycl::ceil(a); \ } SYCL_PCEIL(cl::sycl::cl_float4) SYCL_PCEIL(cl::sycl::cl_double2) #undef SYCL_PCEIL #define SYCL_PROUND(packet_type) \ template <> \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE packet_type pround( \ const packet_type& a) { \ return cl::sycl::round(a); \ } SYCL_PROUND(cl::sycl::cl_float4) SYCL_PROUND(cl::sycl::cl_double2) #undef SYCL_PROUND #define SYCL_PRINT(packet_type) \ template <> \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE packet_type print( \ const packet_type& a) { \ return cl::sycl::rint(a); \ } SYCL_PRINT(cl::sycl::cl_float4) SYCL_PRINT(cl::sycl::cl_double2) #undef SYCL_PRINT #define SYCL_FLOOR(packet_type) \ template <> \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE packet_type pfloor( \ const packet_type& a) { \ return cl::sycl::floor(a); \ } SYCL_FLOOR(cl::sycl::cl_float4) SYCL_FLOOR(cl::sycl::cl_double2) #undef SYCL_FLOOR #define SYCL_PMIN(packet_type, expr) \ template <> \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE packet_type pmin( \ const packet_type& a, const packet_type& b) { \ return expr; \ } SYCL_PMIN(cl::sycl::cl_float4, cl::sycl::fmin(a, b)) SYCL_PMIN(cl::sycl::cl_double2, cl::sycl::fmin(a, b)) #undef SYCL_PMIN #define SYCL_PMAX(packet_type, expr) \ template <> \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE packet_type pmax( \ const packet_type& a, const packet_type& b) { \ return expr; \ } SYCL_PMAX(cl::sycl::cl_float4, cl::sycl::fmax(a, b)) SYCL_PMAX(cl::sycl::cl_double2, cl::sycl::fmax(a, b)) #undef SYCL_PMAX #define SYCL_PLDEXP(packet_type) \ template <> \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE packet_type pldexp( \ const packet_type& a, const packet_type& exponent) { \ return cl::sycl::ldexp( \ a, exponent.template convert()); \ } SYCL_PLDEXP(cl::sycl::cl_float4) SYCL_PLDEXP(cl::sycl::cl_double2) #undef SYCL_PLDEXP #endif } // end namespace internal } // end namespace Eigen #endif // EIGEN_MATH_FUNCTIONS_SYCL_H RcppEigen/inst/include/Eigen/src/Core/arch/SYCL/PacketMath.h0000644000176200001440000006621214562417221023130 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Mehdi Goli Codeplay Software Ltd. // Ralph Potter Codeplay Software Ltd. // Luke Iwanski Codeplay Software Ltd. // Contact: // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. /***************************************************************** * PacketMath.h * * \brief: * PacketMath * *****************************************************************/ #ifndef EIGEN_PACKET_MATH_SYCL_H #define EIGEN_PACKET_MATH_SYCL_H #include namespace Eigen { namespace internal { #ifdef SYCL_DEVICE_ONLY #define SYCL_PLOADT_RO(address_space_target) \ template \ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE packet_type ploadt_ro( \ typename cl::sycl::multi_ptr< \ const typename unpacket_traits::type, \ cl::sycl::access::address_space::address_space_target>::pointer_t \ from) { \ typedef typename unpacket_traits::type scalar; \ typedef cl::sycl::multi_ptr< \ scalar, cl::sycl::access::address_space::address_space_target> \ multi_ptr; \ auto res = packet_type( \ static_cast::type>(0)); \ res.load(0, multi_ptr(const_cast(from))); \ return res; \ } SYCL_PLOADT_RO(global_space) SYCL_PLOADT_RO(local_space) #undef SYCL_PLOADT_RO #endif template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE packet_type ploadt_ro(const Eigen::TensorSycl::internal::RangeAccess< cl::sycl::access::mode::read_write, T>& from) { return ploadt_ro(from.get_pointer()); } #ifdef SYCL_DEVICE_ONLY #define SYCL_PLOAD(address_space_target, Alignment, AlignedType) \ template \ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE packet_type pload##AlignedType( \ typename cl::sycl::multi_ptr< \ const typename unpacket_traits::type, \ cl::sycl::access::address_space::address_space_target>::pointer_t \ from) { \ return ploadt_ro(from); \ } // global space SYCL_PLOAD(global_space, Unaligned, u) SYCL_PLOAD(global_space, Aligned, ) // local space SYCL_PLOAD(local_space, Unaligned, u) SYCL_PLOAD(local_space, Aligned, ) #undef SYCL_PLOAD #endif #define SYCL_PLOAD(Alignment, AlignedType) \ template \ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE packet_type pload##AlignedType( \ const Eigen::TensorSycl::internal::RangeAccess< \ cl::sycl::access::mode::read_write, \ typename unpacket_traits::type> \ from) { \ return ploadt_ro(from); \ } SYCL_PLOAD(Unaligned, u) SYCL_PLOAD(Aligned, ) #undef SYCL_PLOAD #ifdef SYCL_DEVICE_ONLY /** \internal \returns a packet version of \a *from. * The pointer \a from must be aligned on a \a Alignment bytes boundary. */ #define SYCL_PLOADT(address_space_target) \ template \ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE packet_type ploadt( \ typename cl::sycl::multi_ptr< \ const typename unpacket_traits::type, \ cl::sycl::access::address_space::address_space_target>::pointer_t \ from) { \ if (Alignment >= unpacket_traits::alignment) \ return pload(from); \ else \ return ploadu(from); \ } // global space SYCL_PLOADT(global_space) // local space SYCL_PLOADT(local_space) #undef SYCL_PLOADT #endif template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE packet_type ploadt(const Eigen::TensorSycl::internal::RangeAccess< cl::sycl::access::mode::read_write, typename unpacket_traits::type>& from) { return ploadt(from.get_pointer()); } #ifdef SYCL_DEVICE_ONLY // private_space #define SYCL_PLOADT_RO_SPECIAL(packet_type, Alignment) \ template <> \ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE packet_type \ ploadt_ro( \ const typename unpacket_traits::type* from) { \ typedef typename unpacket_traits::type scalar; \ auto res = packet_type(static_cast(0)); \ res.template load( \ 0, const_cast(from)); \ return res; \ } SYCL_PLOADT_RO_SPECIAL(cl::sycl::cl_float4, Aligned) SYCL_PLOADT_RO_SPECIAL(cl::sycl::cl_double2, Aligned) SYCL_PLOADT_RO_SPECIAL(cl::sycl::cl_float4, Unaligned) SYCL_PLOADT_RO_SPECIAL(cl::sycl::cl_double2, Unaligned) #define SYCL_PLOAD_SPECIAL(packet_type, alignment_type) \ template <> \ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE packet_type pload##alignment_type( \ const typename unpacket_traits::type* from) { \ typedef typename unpacket_traits::type scalar; \ auto res = packet_type(static_cast(0)); \ res.template load( \ 0, const_cast(from)); \ return res; \ } SYCL_PLOAD_SPECIAL(cl::sycl::cl_float4, ) SYCL_PLOAD_SPECIAL(cl::sycl::cl_double2, ) SYCL_PLOAD_SPECIAL(cl::sycl::cl_float4, u) SYCL_PLOAD_SPECIAL(cl::sycl::cl_double2, u) #undef SYCL_PLOAD_SPECIAL #define SYCL_PSTORE(scalar, packet_type, address_space_target, alignment) \ template <> \ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void pstore##alignment( \ typename cl::sycl::multi_ptr< \ scalar, \ cl::sycl::access::address_space::address_space_target>::pointer_t \ to, \ const packet_type& from) { \ typedef cl::sycl::multi_ptr< \ scalar, cl::sycl::access::address_space::address_space_target> \ multi_ptr; \ from.store(0, multi_ptr(to)); \ } // global space SYCL_PSTORE(float, cl::sycl::cl_float4, global_space, ) SYCL_PSTORE(float, cl::sycl::cl_float4, global_space, u) SYCL_PSTORE(double, cl::sycl::cl_double2, global_space, ) SYCL_PSTORE(double, cl::sycl::cl_double2, global_space, u) SYCL_PSTORE(float, cl::sycl::cl_float4, local_space, ) SYCL_PSTORE(float, cl::sycl::cl_float4, local_space, u) SYCL_PSTORE(double, cl::sycl::cl_double2, local_space, ) SYCL_PSTORE(double, cl::sycl::cl_double2, local_space, u) SYCL_PSTORE(float, cl::sycl::cl_float4, private_space, ) SYCL_PSTORE(float, cl::sycl::cl_float4, private_space, u) SYCL_PSTORE(double, cl::sycl::cl_double2, private_space, ) SYCL_PSTORE(double, cl::sycl::cl_double2, private_space, u) #undef SYCL_PSTORE #define SYCL_PSTORE_T(address_space_target) \ template \ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void pstoret( \ typename cl::sycl::multi_ptr< \ scalar, \ cl::sycl::access::address_space::address_space_target>::pointer_t \ to, \ const packet_type& from) { \ if (Alignment) \ pstore(to, from); \ else \ pstoreu(to, from); \ } SYCL_PSTORE_T(global_space) SYCL_PSTORE_T(local_space) #undef SYCL_PSTORE_T #define SYCL_PSET1(packet_type) \ template <> \ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE packet_type pset1( \ const typename unpacket_traits::type& from) { \ return packet_type(from); \ } // global space SYCL_PSET1(cl::sycl::cl_float4) SYCL_PSET1(cl::sycl::cl_double2) #undef SYCL_PSET1 template struct get_base_packet { template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE packet_type get_ploaddup(sycl_multi_pointer) {} template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE packet_type get_pgather(sycl_multi_pointer, Index) {} }; template <> struct get_base_packet { template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE cl::sycl::cl_float4 get_ploaddup( sycl_multi_pointer from) { return cl::sycl::cl_float4(from[0], from[0], from[1], from[1]); } template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE cl::sycl::cl_float4 get_pgather( sycl_multi_pointer from, Index stride) { return cl::sycl::cl_float4(from[0 * stride], from[1 * stride], from[2 * stride], from[3 * stride]); } template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void set_pscatter( sycl_multi_pointer to, const cl::sycl::cl_float4& from, Index stride) { auto tmp = stride; to[0] = from.x(); to[tmp] = from.y(); to[tmp += stride] = from.z(); to[tmp += stride] = from.w(); } static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE cl::sycl::cl_float4 set_plset( const float& a) { return cl::sycl::cl_float4(static_cast(a), static_cast(a + 1), static_cast(a + 2), static_cast(a + 3)); } }; template <> struct get_base_packet { template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE cl::sycl::cl_double2 get_ploaddup(const sycl_multi_pointer from) { return cl::sycl::cl_double2(from[0], from[0]); } template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE cl::sycl::cl_double2 get_pgather( const sycl_multi_pointer from, Index stride) { return cl::sycl::cl_double2(from[0 * stride], from[1 * stride]); } template static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void set_pscatter( sycl_multi_pointer to, const cl::sycl::cl_double2& from, Index stride) { to[0] = from.x(); to[stride] = from.y(); } static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE cl::sycl::cl_double2 set_plset( const double& a) { return cl::sycl::cl_double2(static_cast(a), static_cast(a + 1)); } }; #define SYCL_PLOAD_DUP(address_space_target) \ template \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE packet_type ploaddup( \ typename cl::sycl::multi_ptr< \ const typename unpacket_traits::type, \ cl::sycl::access::address_space::address_space_target>::pointer_t \ from) { \ return get_base_packet::get_ploaddup(from); \ } // global space SYCL_PLOAD_DUP(global_space) // local_space SYCL_PLOAD_DUP(local_space) #undef SYCL_PLOAD_DUP #define SYCL_PLOAD_DUP_SPECILIZE(packet_type) \ template <> \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE packet_type ploaddup( \ const typename unpacket_traits::type* from) { \ return get_base_packet::get_ploaddup(from); \ } SYCL_PLOAD_DUP_SPECILIZE(cl::sycl::cl_float4) SYCL_PLOAD_DUP_SPECILIZE(cl::sycl::cl_double2) #undef SYCL_PLOAD_DUP_SPECILIZE #define SYCL_PLSET(packet_type) \ template <> \ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE packet_type plset( \ const typename unpacket_traits::type& a) { \ return get_base_packet::set_plset(a); \ } SYCL_PLSET(cl::sycl::cl_float4) SYCL_PLSET(cl::sycl::cl_double2) #undef SYCL_PLSET #define SYCL_PGATHER(address_space_target) \ template \ EIGEN_DEVICE_FUNC inline packet_type pgather( \ typename cl::sycl::multi_ptr< \ const typename unpacket_traits::type, \ cl::sycl::access::address_space::address_space_target>::pointer_t \ from, \ Index stride) { \ return get_base_packet::get_pgather(from, stride); \ } // global space SYCL_PGATHER(global_space) // local space SYCL_PGATHER(local_space) #undef SYCL_PGATHER #define SYCL_PGATHER_SPECILIZE(scalar, packet_type) \ template <> \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE packet_type \ pgather( \ const typename unpacket_traits::type* from, Index stride) { \ return get_base_packet::get_pgather(from, stride); \ } SYCL_PGATHER_SPECILIZE(float, cl::sycl::cl_float4) SYCL_PGATHER_SPECILIZE(double, cl::sycl::cl_double2) #undef SYCL_PGATHER_SPECILIZE #define SYCL_PSCATTER(address_space_target) \ template \ EIGEN_DEVICE_FUNC inline void pscatter( \ typename cl::sycl::multi_ptr< \ typename unpacket_traits::type, \ cl::sycl::access::address_space::address_space_target>::pointer_t \ to, \ const packet_type& from, Index stride) { \ get_base_packet::set_pscatter(to, from, stride); \ } // global space SYCL_PSCATTER(global_space) // local space SYCL_PSCATTER(local_space) #undef SYCL_PSCATTER #define SYCL_PSCATTER_SPECILIZE(scalar, packet_type) \ template <> \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pscatter( \ typename unpacket_traits::type * to, \ const packet_type& from, Index stride) { \ get_base_packet::set_pscatter(to, from, stride); \ } SYCL_PSCATTER_SPECILIZE(float, cl::sycl::cl_float4) SYCL_PSCATTER_SPECILIZE(double, cl::sycl::cl_double2) #undef SYCL_PSCATTER_SPECILIZE #define SYCL_PMAD(packet_type) \ template <> \ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE packet_type pmadd( \ const packet_type& a, const packet_type& b, const packet_type& c) { \ return cl::sycl::mad(a, b, c); \ } SYCL_PMAD(cl::sycl::cl_float4) SYCL_PMAD(cl::sycl::cl_double2) #undef SYCL_PMAD template <> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE float pfirst( const cl::sycl::cl_float4& a) { return a.x(); } template <> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE double pfirst( const cl::sycl::cl_double2& a) { return a.x(); } template <> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE float predux( const cl::sycl::cl_float4& a) { return a.x() + a.y() + a.z() + a.w(); } template <> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE double predux( const cl::sycl::cl_double2& a) { return a.x() + a.y(); } template <> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE float predux_max( const cl::sycl::cl_float4& a) { return cl::sycl::fmax(cl::sycl::fmax(a.x(), a.y()), cl::sycl::fmax(a.z(), a.w())); } template <> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE double predux_max( const cl::sycl::cl_double2& a) { return cl::sycl::fmax(a.x(), a.y()); } template <> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE float predux_min( const cl::sycl::cl_float4& a) { return cl::sycl::fmin(cl::sycl::fmin(a.x(), a.y()), cl::sycl::fmin(a.z(), a.w())); } template <> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE double predux_min( const cl::sycl::cl_double2& a) { return cl::sycl::fmin(a.x(), a.y()); } template <> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE float predux_mul( const cl::sycl::cl_float4& a) { return a.x() * a.y() * a.z() * a.w(); } template <> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE double predux_mul( const cl::sycl::cl_double2& a) { return a.x() * a.y(); } template <> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE cl::sycl::cl_float4 pabs(const cl::sycl::cl_float4& a) { return cl::sycl::cl_float4(cl::sycl::fabs(a.x()), cl::sycl::fabs(a.y()), cl::sycl::fabs(a.z()), cl::sycl::fabs(a.w())); } template <> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE cl::sycl::cl_double2 pabs(const cl::sycl::cl_double2& a) { return cl::sycl::cl_double2(cl::sycl::fabs(a.x()), cl::sycl::fabs(a.y())); } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE Packet sycl_pcmp_le(const Packet &a, const Packet &b) { return ((a <= b) .template convert::type, cl::sycl::rounding_mode::automatic>()); } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE Packet sycl_pcmp_lt(const Packet &a, const Packet &b) { return ((a < b) .template convert::type, cl::sycl::rounding_mode::automatic>()); } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE Packet sycl_pcmp_eq(const Packet &a, const Packet &b) { return ((a == b) .template convert::type, cl::sycl::rounding_mode::automatic>()); } #define SYCL_PCMP(OP, TYPE) \ template <> \ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE TYPE pcmp_##OP(const TYPE &a, \ const TYPE &b) { \ return sycl_pcmp_##OP(a, b); \ } SYCL_PCMP(le, cl::sycl::cl_float4) SYCL_PCMP(lt, cl::sycl::cl_float4) SYCL_PCMP(eq, cl::sycl::cl_float4) SYCL_PCMP(le, cl::sycl::cl_double2) SYCL_PCMP(lt, cl::sycl::cl_double2) SYCL_PCMP(eq, cl::sycl::cl_double2) #undef SYCL_PCMP template struct convert_to_integer; template <> struct convert_to_integer { using type = std::int32_t; using packet_type = cl::sycl::cl_int4; }; template <> struct convert_to_integer { using type = std::int64_t; using packet_type = cl::sycl::cl_long2; }; template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename convert_to_integer< typename unpacket_traits::type>::packet_type vector_as_int(const PacketIn &p) { return ( p.template convert::type>::type, cl::sycl::rounding_mode::automatic>()); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE packetOut convert_vector(const PacketIn &p) { return (p.template convert::type, cl::sycl::rounding_mode::automatic>()); } #define SYCL_PAND(TYPE) \ template <> \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TYPE pand(const TYPE &a, \ const TYPE &b) { \ return convert_vector(vector_as_int(a) & vector_as_int(b)); \ } SYCL_PAND(cl::sycl::cl_float4) SYCL_PAND(cl::sycl::cl_double2) #undef SYCL_PAND #define SYCL_POR(TYPE) \ template <> \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TYPE por(const TYPE &a, \ const TYPE &b) { \ return convert_vector(vector_as_int(a) | vector_as_int(b)); \ } SYCL_POR(cl::sycl::cl_float4) SYCL_POR(cl::sycl::cl_double2) #undef SYCL_POR #define SYCL_PXOR(TYPE) \ template <> \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TYPE pxor(const TYPE &a, \ const TYPE &b) { \ return convert_vector(vector_as_int(a) ^ vector_as_int(b)); \ } SYCL_PXOR(cl::sycl::cl_float4) SYCL_PXOR(cl::sycl::cl_double2) #undef SYCL_PXOR #define SYCL_PANDNOT(TYPE) \ template <> \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TYPE pandnot(const TYPE &a, \ const TYPE &b) { \ return convert_vector(vector_as_int(a) & (~vector_as_int(b))); \ } SYCL_PANDNOT(cl::sycl::cl_float4) SYCL_PANDNOT(cl::sycl::cl_double2) #undef SYCL_PANDNOT EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void ptranspose( PacketBlock& kernel) { float tmp = kernel.packet[0].y(); kernel.packet[0].y() = kernel.packet[1].x(); kernel.packet[1].x() = tmp; tmp = kernel.packet[0].z(); kernel.packet[0].z() = kernel.packet[2].x(); kernel.packet[2].x() = tmp; tmp = kernel.packet[0].w(); kernel.packet[0].w() = kernel.packet[3].x(); kernel.packet[3].x() = tmp; tmp = kernel.packet[1].z(); kernel.packet[1].z() = kernel.packet[2].y(); kernel.packet[2].y() = tmp; tmp = kernel.packet[1].w(); kernel.packet[1].w() = kernel.packet[3].y(); kernel.packet[3].y() = tmp; tmp = kernel.packet[2].w(); kernel.packet[2].w() = kernel.packet[3].z(); kernel.packet[3].z() = tmp; } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void ptranspose( PacketBlock& kernel) { double tmp = kernel.packet[0].y(); kernel.packet[0].y() = kernel.packet[1].x(); kernel.packet[1].x() = tmp; } template <> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE cl::sycl::cl_float4 pblend( const Selector::size>& ifPacket, const cl::sycl::cl_float4& thenPacket, const cl::sycl::cl_float4& elsePacket) { cl::sycl::cl_int4 condition( ifPacket.select[0] ? 0 : -1, ifPacket.select[1] ? 0 : -1, ifPacket.select[2] ? 0 : -1, ifPacket.select[3] ? 0 : -1); return cl::sycl::select(thenPacket, elsePacket, condition); } template <> inline cl::sycl::cl_double2 pblend( const Selector::size>& ifPacket, const cl::sycl::cl_double2& thenPacket, const cl::sycl::cl_double2& elsePacket) { cl::sycl::cl_long2 condition(ifPacket.select[0] ? 0 : -1, ifPacket.select[1] ? 0 : -1); return cl::sycl::select(thenPacket, elsePacket, condition); } #endif // SYCL_DEVICE_ONLY #define SYCL_PSTORE(alignment) \ template \ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void pstore##alignment( \ const Eigen::TensorSycl::internal::RangeAccess< \ cl::sycl::access::mode::read_write, \ typename unpacket_traits::type>& to, \ const packet_type& from) { \ pstore##alignment(to.get_pointer(), from); \ } // global space SYCL_PSTORE() SYCL_PSTORE(u) #undef SYCL_PSTORE template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void pstoret( Eigen::TensorSycl::internal::RangeAccess< cl::sycl::access::mode::read_write, typename unpacket_traits::type> to, const packet_type& from) { pstoret(to.get_pointer(), from); } } // end namespace internal } // end namespace Eigen #endif // EIGEN_PACKET_MATH_SYCL_H RcppEigen/inst/include/Eigen/src/Core/arch/SYCL/TypeCasting.h0000644000176200001440000000510214562417221023330 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Mehdi Goli Codeplay Software Ltd. // Ralph Potter Codeplay Software Ltd. // Luke Iwanski Codeplay Software Ltd. // Contact: // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. /***************************************************************** * TypeCasting.h * * \brief: * TypeCasting * *****************************************************************/ #ifndef EIGEN_TYPE_CASTING_SYCL_H #define EIGEN_TYPE_CASTING_SYCL_H namespace Eigen { namespace internal { #ifdef SYCL_DEVICE_ONLY template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template <> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE cl::sycl::cl_int4 pcast(const cl::sycl::cl_float4& a) { return a .template convert(); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template <> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE cl::sycl::cl_float4 pcast(const cl::sycl::cl_int4& a) { return a.template convert(); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 2, TgtCoeffRatio = 1 }; }; template <> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE cl::sycl::cl_float4 pcast( const cl::sycl::cl_double2& a, const cl::sycl::cl_double2& b) { auto a1 = a.template convert(); auto b1 = b.template convert(); return cl::sycl::float4(a1.x(), a1.y(), b1.x(), b1.y()); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 2 }; }; template <> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE cl::sycl::cl_double2 pcast(const cl::sycl::cl_float4& a) { // Simply discard the second half of the input return cl::sycl::cl_double2(a.x(), a.y()); } #endif } // end namespace internal } // end namespace Eigen #endif // EIGEN_TYPE_CASTING_SYCL_H RcppEigen/inst/include/Eigen/src/Core/arch/SYCL/InteropHeaders.h0000644000176200001440000001640414562417221024021 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Mehdi Goli Codeplay Software Ltd. // Ralph Potter Codeplay Software Ltd. // Luke Iwanski Codeplay Software Ltd. // Contact: // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. /***************************************************************** * InteropHeaders.h * * \brief: * InteropHeaders * *****************************************************************/ #ifndef EIGEN_INTEROP_HEADERS_SYCL_H #define EIGEN_INTEROP_HEADERS_SYCL_H namespace Eigen { #if !defined(EIGEN_DONT_VECTORIZE_SYCL) namespace internal { template struct sycl_packet_traits : default_packet_traits { enum { Vectorizable = 1, AlignedOnScalar = 1, size = lengths, HasHalfPacket = 0, HasDiv = 1, HasLog = 1, HasExp = 1, HasSqrt = 1, HasRsqrt = 1, HasSin = 1, HasCos = 1, HasTan = 1, HasASin = 1, HasACos = 1, HasATan = 1, HasSinh = 1, HasCosh = 1, HasTanh = 1, HasLGamma = 0, HasDiGamma = 0, HasZeta = 0, HasPolygamma = 0, HasErf = 0, HasErfc = 0, HasNdtri = 0, HasIGamma = 0, HasIGammac = 0, HasBetaInc = 0, HasBlend = has_blend, // This flag is used to indicate whether packet comparison is supported. // pcmp_eq, pcmp_lt and pcmp_le should be defined for it to be true. HasCmp = 1, HasMax = 1, HasMin = 1, HasMul = 1, HasAdd = 1, HasFloor = 1, HasRound = 1, HasRint = 1, HasLog1p = 1, HasExpm1 = 1, HasCeil = 1, }; }; #ifdef SYCL_DEVICE_ONLY #define SYCL_PACKET_TRAITS(packet_type, has_blend, unpacket_type, lengths) \ template <> \ struct packet_traits \ : sycl_packet_traits { \ typedef packet_type type; \ typedef packet_type half; \ }; SYCL_PACKET_TRAITS(cl::sycl::cl_float4, 1, float, 4) SYCL_PACKET_TRAITS(cl::sycl::cl_float4, 1, const float, 4) SYCL_PACKET_TRAITS(cl::sycl::cl_double2, 0, double, 2) SYCL_PACKET_TRAITS(cl::sycl::cl_double2, 0, const double, 2) #undef SYCL_PACKET_TRAITS // Make sure this is only available when targeting a GPU: we don't want to // introduce conflicts between these packet_traits definitions and the ones // we'll use on the host side (SSE, AVX, ...) #define SYCL_ARITHMETIC(packet_type) \ template <> \ struct is_arithmetic { \ enum { value = true }; \ }; SYCL_ARITHMETIC(cl::sycl::cl_float4) SYCL_ARITHMETIC(cl::sycl::cl_double2) #undef SYCL_ARITHMETIC #define SYCL_UNPACKET_TRAITS(packet_type, unpacket_type, lengths) \ template <> \ struct unpacket_traits { \ typedef unpacket_type type; \ enum { size = lengths, vectorizable = true, alignment = Aligned16 }; \ typedef packet_type half; \ }; SYCL_UNPACKET_TRAITS(cl::sycl::cl_float4, float, 4) SYCL_UNPACKET_TRAITS(cl::sycl::cl_double2, double, 2) #undef SYCL_UNPACKET_TRAITS #endif } // end namespace internal #endif namespace TensorSycl { namespace internal { template struct PacketWrapper; // This function should never get called on the device #ifndef SYCL_DEVICE_ONLY template struct PacketWrapper { typedef typename ::Eigen::internal::unpacket_traits::type Scalar; template EIGEN_DEVICE_FUNC static Scalar scalarize(Index, PacketReturnType &) { eigen_assert(false && "THERE IS NO PACKETIZE VERSION FOR THE CHOSEN TYPE"); abort(); } EIGEN_DEVICE_FUNC static PacketReturnType convert_to_packet_type(Scalar in, Scalar) { return ::Eigen::internal::template plset(in); } EIGEN_DEVICE_FUNC static void set_packet(PacketReturnType, Scalar *) { eigen_assert(false && "THERE IS NO PACKETIZE VERSION FOR THE CHOSEN TYPE"); abort(); } }; #elif defined(SYCL_DEVICE_ONLY) template struct PacketWrapper { typedef typename ::Eigen::internal::unpacket_traits::type Scalar; template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE static Scalar scalarize(Index index, PacketReturnType &in) { switch (index) { case 0: return in.x(); case 1: return in.y(); case 2: return in.z(); case 3: return in.w(); default: //INDEX MUST BE BETWEEN 0 and 3.There is no abort function in SYCL kernel. so we cannot use abort here. // The code will never reach here __builtin_unreachable(); } __builtin_unreachable(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE static PacketReturnType convert_to_packet_type( Scalar in, Scalar other) { return PacketReturnType(in, other, other, other); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE static void set_packet(PacketReturnType &lhs, Scalar *rhs) { lhs = PacketReturnType(rhs[0], rhs[1], rhs[2], rhs[3]); } }; template struct PacketWrapper { typedef typename ::Eigen::internal::unpacket_traits::type Scalar; template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE static Scalar scalarize(Index, PacketReturnType &in) { return in; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE static PacketReturnType convert_to_packet_type(Scalar in, Scalar) { return PacketReturnType(in); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE static void set_packet(PacketReturnType &lhs, Scalar *rhs) { lhs = rhs[0]; } }; template struct PacketWrapper { typedef typename ::Eigen::internal::unpacket_traits::type Scalar; template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE static Scalar scalarize(Index index, PacketReturnType &in) { switch (index) { case 0: return in.x(); case 1: return in.y(); default: //INDEX MUST BE BETWEEN 0 and 1.There is no abort function in SYCL kernel. so we cannot use abort here. // The code will never reach here __builtin_unreachable(); } __builtin_unreachable(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE static PacketReturnType convert_to_packet_type( Scalar in, Scalar other) { return PacketReturnType(in, other); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE static void set_packet(PacketReturnType &lhs, Scalar *rhs) { lhs = PacketReturnType(rhs[0], rhs[1]); } }; #endif } // end namespace internal } // end namespace TensorSycl } // end namespace Eigen #endif // EIGEN_INTEROP_HEADERS_SYCL_H RcppEigen/inst/include/Eigen/src/Core/arch/SYCL/SyclMemoryModel.h0000644000176200001440000005254014562417221024172 0ustar liggesusers/*************************************************************************** * Copyright (C) 2017 Codeplay Software Limited * This Source Code Form is subject to the terms of the Mozilla * Public License v. 2.0. If a copy of the MPL was not distributed * with this file, You can obtain one at http://mozilla.org/MPL/2.0/. * * * SyclMemoryModel.h * * Description: * Interface for SYCL buffers to behave as a non-dereferenceable pointer * Interface for Placeholder accessor to behave as a pointer on both host * and device * * Authors: * * Ruyman Reyes Codeplay Software Ltd. * Mehdi Goli Codeplay Software Ltd. * Vanya Yaneva Codeplay Software Ltd. * **************************************************************************/ #if defined(EIGEN_USE_SYCL) && \ !defined(EIGEN_CXX11_TENSOR_TENSOR_SYCL_STORAGE_MEMORY_H) #define EIGEN_CXX11_TENSOR_TENSOR_SYCL_STORAGE_MEMORY_H #include #ifdef EIGEN_EXCEPTIONS #include #endif #include #include #include #include namespace Eigen { namespace TensorSycl { namespace internal { using sycl_acc_target = cl::sycl::access::target; using sycl_acc_mode = cl::sycl::access::mode; /** * Default values for template arguments */ using buffer_data_type_t = uint8_t; const sycl_acc_target default_acc_target = sycl_acc_target::global_buffer; const sycl_acc_mode default_acc_mode = sycl_acc_mode::read_write; /** * PointerMapper * Associates fake pointers with buffers. * */ class PointerMapper { public: using base_ptr_t = std::intptr_t; /* Structure of a virtual pointer * * |================================================| * | POINTER ADDRESS | * |================================================| */ struct virtual_pointer_t { /* Type for the pointers */ base_ptr_t m_contents; /** Conversions from virtual_pointer_t to * void * should just reinterpret_cast the integer number */ operator void *() const { return reinterpret_cast(m_contents); } /** * Convert back to the integer number. */ operator base_ptr_t() const { return m_contents; } /** * Add a certain value to the pointer to create a * new pointer to that offset */ virtual_pointer_t operator+(size_t off) { return m_contents + off; } /* Numerical order for sorting pointers in containers. */ bool operator<(virtual_pointer_t rhs) const { return (static_cast(m_contents) < static_cast(rhs.m_contents)); } bool operator>(virtual_pointer_t rhs) const { return (static_cast(m_contents) > static_cast(rhs.m_contents)); } /** * Numerical order for sorting pointers in containers */ bool operator==(virtual_pointer_t rhs) const { return (static_cast(m_contents) == static_cast(rhs.m_contents)); } /** * Simple forward to the equality overload. */ bool operator!=(virtual_pointer_t rhs) const { return !(this->operator==(rhs)); } /** * Converts a void * into a virtual pointer structure. * Note that this will only work if the void * was * already a virtual_pointer_t, but we have no way of * checking */ virtual_pointer_t(const void *ptr) : m_contents(reinterpret_cast(ptr)){}; /** * Creates a virtual_pointer_t from the given integer * number */ virtual_pointer_t(base_ptr_t u) : m_contents(u){}; }; /* Definition of a null pointer */ const virtual_pointer_t null_virtual_ptr = nullptr; /** * Whether if a pointer is null or not. * A pointer is nullptr if the value is of null_virtual_ptr */ static inline bool is_nullptr(virtual_pointer_t ptr) { return (static_cast(ptr) == nullptr); } /* basic type for all buffers */ using buffer_t = cl::sycl::buffer_mem; /** * Node that stores information about a device allocation. * Nodes are sorted by size to organise a free list of nodes * that can be recovered. */ struct pMapNode_t { buffer_t m_buffer; size_t m_size; bool m_free; pMapNode_t(buffer_t b, size_t size, bool f) : m_buffer{b}, m_size{size}, m_free{f} { m_buffer.set_final_data(nullptr); } bool operator<=(const pMapNode_t &rhs) { return (m_size <= rhs.m_size); } }; /** Storage of the pointer / buffer tree */ using pointerMap_t = std::map; /** * Obtain the insertion point in the pointer map for * a pointer of the given size. * \param requiredSize Size attemted to reclaim */ typename pointerMap_t::iterator get_insertion_point(size_t requiredSize) { typename pointerMap_t::iterator retVal; bool reuse = false; if (!m_freeList.empty()) { // try to re-use an existing block for (auto freeElem : m_freeList) { if (freeElem->second.m_size >= requiredSize) { retVal = freeElem; reuse = true; // Element is not going to be free anymore m_freeList.erase(freeElem); break; } } } if (!reuse) { retVal = std::prev(m_pointerMap.end()); } return retVal; } /** * Returns an iterator to the node that stores the information * of the given virtual pointer from the given pointer map structure. * If pointer is not found, throws std::out_of_range. * If the pointer map structure is empty, throws std::out_of_range * * \param pMap the pointerMap_t structure storing all the pointers * \param virtual_pointer_ptr The virtual pointer to obtain the node of * \throws std::out:of_range if the pointer is not found or pMap is empty */ typename pointerMap_t::iterator get_node(const virtual_pointer_t ptr) { if (this->count() == 0) { m_pointerMap.clear(); EIGEN_THROW_X(std::out_of_range("There are no pointers allocated\n")); } if (is_nullptr(ptr)) { m_pointerMap.clear(); EIGEN_THROW_X(std::out_of_range("Cannot access null pointer\n")); } // The previous element to the lower bound is the node that // holds this memory address auto node = m_pointerMap.lower_bound(ptr); // If the value of the pointer is not the one of the node // then we return the previous one if (node == std::end(m_pointerMap)) { --node; } else if (node->first != ptr) { if (node == std::begin(m_pointerMap)) { m_pointerMap.clear(); EIGEN_THROW_X( std::out_of_range("The pointer is not registered in the map\n")); } --node; } return node; } /* get_buffer. * Returns a buffer from the map using the pointer address */ template cl::sycl::buffer get_buffer( const virtual_pointer_t ptr) { using sycl_buffer_t = cl::sycl::buffer; // get_node() returns a `buffer_mem`, so we need to cast it to a `buffer<>`. // We can do this without the `buffer_mem` being a pointer, as we // only declare member variables in the base class (`buffer_mem`) and not in // the child class (`buffer<>). auto node = get_node(ptr); eigen_assert(node->first == ptr || node->first < ptr); eigen_assert(ptr < static_cast(node->second.m_size + node->first)); return *(static_cast(&node->second.m_buffer)); } /** * @brief Returns an accessor to the buffer of the given virtual pointer * @param accessMode * @param accessTarget * @param ptr The virtual pointer */ template cl::sycl::accessor get_access(const virtual_pointer_t ptr) { auto buf = get_buffer(ptr); return buf.template get_access(); } /** * @brief Returns an accessor to the buffer of the given virtual pointer * in the given command group scope * @param accessMode * @param accessTarget * @param ptr The virtual pointer * @param cgh Reference to the command group scope */ template cl::sycl::accessor get_access(const virtual_pointer_t ptr, cl::sycl::handler &cgh) { auto buf = get_buffer(ptr); return buf.template get_access(cgh); } /* * Returns the offset from the base address of this pointer. */ inline std::ptrdiff_t get_offset(const virtual_pointer_t ptr) { // The previous element to the lower bound is the node that // holds this memory address auto node = get_node(ptr); auto start = node->first; eigen_assert(start == ptr || start < ptr); eigen_assert(ptr < start + node->second.m_size); return (ptr - start); } /* * Returns the number of elements by which the given pointer is offset from * the base address. */ template inline size_t get_element_offset(const virtual_pointer_t ptr) { return get_offset(ptr) / sizeof(buffer_data_type); } /** * Constructs the PointerMapper structure. */ PointerMapper(base_ptr_t baseAddress = 4096) : m_pointerMap{}, m_freeList{}, m_baseAddress{baseAddress} { if (m_baseAddress == 0) { EIGEN_THROW_X(std::invalid_argument("Base address cannot be zero\n")); } }; /** * PointerMapper cannot be copied or moved */ PointerMapper(const PointerMapper &) = delete; /** * Empty the pointer list */ inline void clear() { m_freeList.clear(); m_pointerMap.clear(); } /* add_pointer. * Adds an existing pointer to the map and returns the virtual pointer id. */ inline virtual_pointer_t add_pointer(const buffer_t &b) { return add_pointer_impl(b); } /* add_pointer. * Adds a pointer to the map and returns the virtual pointer id. */ inline virtual_pointer_t add_pointer(buffer_t &&b) { return add_pointer_impl(b); } /** * @brief Fuses the given node with the previous nodes in the * pointer map if they are free * * @param node A reference to the free node to be fused */ void fuse_forward(typename pointerMap_t::iterator &node) { while (node != std::prev(m_pointerMap.end())) { // if following node is free // remove it and extend the current node with its size auto fwd_node = std::next(node); if (!fwd_node->second.m_free) { break; } auto fwd_size = fwd_node->second.m_size; m_freeList.erase(fwd_node); m_pointerMap.erase(fwd_node); node->second.m_size += fwd_size; } } /** * @brief Fuses the given node with the following nodes in the * pointer map if they are free * * @param node A reference to the free node to be fused */ void fuse_backward(typename pointerMap_t::iterator &node) { while (node != m_pointerMap.begin()) { // if previous node is free, extend it // with the size of the current one auto prev_node = std::prev(node); if (!prev_node->second.m_free) { break; } prev_node->second.m_size += node->second.m_size; // remove the current node m_freeList.erase(node); m_pointerMap.erase(node); // point to the previous node node = prev_node; } } /* remove_pointer. * Removes the given pointer from the map. * The pointer is allowed to be reused only if ReUse if true. */ template void remove_pointer(const virtual_pointer_t ptr) { if (is_nullptr(ptr)) { return; } auto node = this->get_node(ptr); node->second.m_free = true; m_freeList.emplace(node); // Fuse the node // with free nodes before and after it fuse_forward(node); fuse_backward(node); // If after fusing the node is the last one // simply remove it (since it is free) if (node == std::prev(m_pointerMap.end())) { m_freeList.erase(node); m_pointerMap.erase(node); } } /* count. * Return the number of active pointers (i.e, pointers that * have been malloc but not freed). */ size_t count() const { return (m_pointerMap.size() - m_freeList.size()); } private: /* add_pointer_impl. * Adds a pointer to the map and returns the virtual pointer id. * BufferT is either a const buffer_t& or a buffer_t&&. */ template virtual_pointer_t add_pointer_impl(BufferT b) { virtual_pointer_t retVal = nullptr; size_t bufSize = b.get_count(); pMapNode_t p{b, bufSize, false}; // If this is the first pointer: if (m_pointerMap.empty()) { virtual_pointer_t initialVal{m_baseAddress}; m_pointerMap.emplace(initialVal, p); return initialVal; } auto lastElemIter = get_insertion_point(bufSize); // We are recovering an existing free node if (lastElemIter->second.m_free) { lastElemIter->second.m_buffer = b; lastElemIter->second.m_free = false; // If the recovered node is bigger than the inserted one // add a new free node with the remaining space if (lastElemIter->second.m_size > bufSize) { // create a new node with the remaining space auto remainingSize = lastElemIter->second.m_size - bufSize; pMapNode_t p2{b, remainingSize, true}; // update size of the current node lastElemIter->second.m_size = bufSize; // add the new free node auto newFreePtr = lastElemIter->first + bufSize; auto freeNode = m_pointerMap.emplace(newFreePtr, p2).first; m_freeList.emplace(freeNode); } retVal = lastElemIter->first; } else { size_t lastSize = lastElemIter->second.m_size; retVal = lastElemIter->first + lastSize; m_pointerMap.emplace(retVal, p); } return retVal; } /** * Compare two iterators to pointer map entries according to * the size of the allocation on the device. */ struct SortBySize { bool operator()(typename pointerMap_t::iterator a, typename pointerMap_t::iterator b) const { return ((a->first < b->first) && (a->second <= b->second)) || ((a->first < b->first) && (b->second <= a->second)); } }; /* Maps the pointer addresses to buffer and size pairs. */ pointerMap_t m_pointerMap; /* List of free nodes available for re-using */ std::set m_freeList; /* Base address used when issuing the first virtual pointer, allows users * to specify alignment. Cannot be zero. */ std::intptr_t m_baseAddress; }; /* remove_pointer. * Removes the given pointer from the map. * The pointer is allowed to be reused only if ReUse if true. */ template <> inline void PointerMapper::remove_pointer(const virtual_pointer_t ptr) { if (is_nullptr(ptr)) { return; } m_pointerMap.erase(this->get_node(ptr)); } /** * Malloc-like interface to the pointer-mapper. * Given a size, creates a byte-typed buffer and returns a * fake pointer to keep track of it. * \param size Size in bytes of the desired allocation * \throw cl::sycl::exception if error while creating the buffer */ inline void *SYCLmalloc(size_t size, PointerMapper &pMap) { if (size == 0) { return nullptr; } // Create a generic buffer of the given size using buffer_t = cl::sycl::buffer; auto thePointer = pMap.add_pointer(buffer_t(cl::sycl::range<1>{size})); // Store the buffer on the global list return static_cast(thePointer); } /** * Free-like interface to the pointer mapper. * Given a fake-pointer created with the virtual-pointer malloc, * destroys the buffer and remove it from the list. * If ReUse is false, the pointer is not added to the freeList, * it should be false only for sub-buffers. */ template inline void SYCLfree(void *ptr, PointerMapper &pMap) { pMap.template remove_pointer(ptr); } /** * Clear all the memory allocated by SYCL. */ template inline void SYCLfreeAll(PointerMapper &pMap) { pMap.clear(); } template struct RangeAccess { static const auto global_access = cl::sycl::access::target::global_buffer; static const auto is_place_holder = cl::sycl::access::placeholder::true_t; typedef T scalar_t; typedef scalar_t &ref_t; typedef typename cl::sycl::global_ptr::pointer_t ptr_t; // the accessor type does not necessarily the same as T typedef cl::sycl::accessor accessor; typedef RangeAccess self_t; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE RangeAccess(accessor access, size_t offset, std::intptr_t virtual_ptr) : access_(access), offset_(offset), virtual_ptr_(virtual_ptr) {} RangeAccess(cl::sycl::buffer buff = cl::sycl::buffer(cl::sycl::range<1>(1))) : access_{accessor{buff}}, offset_(0), virtual_ptr_(-1) {} // This should be only used for null constructor on the host side RangeAccess(std::nullptr_t) : RangeAccess() {} // This template parameter must be removed and scalar_t should be replaced EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ptr_t get_pointer() const { return (access_.get_pointer().get() + offset_); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE self_t &operator+=(Index offset) { offset_ += (offset); return *this; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE self_t operator+(Index offset) const { return self_t(access_, offset_ + offset, virtual_ptr_); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE self_t operator-(Index offset) const { return self_t(access_, offset_ - offset, virtual_ptr_); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE self_t &operator-=(Index offset) { offset_ -= offset; return *this; } // THIS IS FOR NULL COMPARISON ONLY EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE friend bool operator==( const RangeAccess &lhs, std::nullptr_t) { return ((lhs.virtual_ptr_ == -1)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE friend bool operator!=( const RangeAccess &lhs, std::nullptr_t i) { return !(lhs == i); } // THIS IS FOR NULL COMPARISON ONLY EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE friend bool operator==( std::nullptr_t, const RangeAccess &rhs) { return ((rhs.virtual_ptr_ == -1)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE friend bool operator!=( std::nullptr_t i, const RangeAccess &rhs) { return !(i == rhs); } // Prefix operator (Increment and return value) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE self_t &operator++() { offset_++; return (*this); } // Postfix operator (Return value and increment) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE self_t operator++(int i) { EIGEN_UNUSED_VARIABLE(i); self_t temp_iterator(*this); offset_++; return temp_iterator; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::ptrdiff_t get_size() const { return (access_.get_count() - offset_); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::ptrdiff_t get_offset() const { return offset_; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void set_offset(std::ptrdiff_t offset) { offset_ = offset; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ref_t operator*() const { return *get_pointer(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ref_t operator*() { return *get_pointer(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ptr_t operator->() = delete; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ref_t operator[](int x) { return *(get_pointer() + x); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ref_t operator[](int x) const { return *(get_pointer() + x); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE scalar_t *get_virtual_pointer() const { return reinterpret_cast(virtual_ptr_ + (offset_ * sizeof(scalar_t))); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit operator bool() const { return (virtual_ptr_ != -1); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE operator RangeAccess() { return RangeAccess(access_, offset_, virtual_ptr_); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE operator RangeAccess() const { return RangeAccess(access_, offset_, virtual_ptr_); } // binding placeholder accessors to a command group handler for SYCL EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void bind( cl::sycl::handler &cgh) const { cgh.require(access_); } private: accessor access_; size_t offset_; std::intptr_t virtual_ptr_; // the location of the buffer in the map }; template struct RangeAccess : RangeAccess { typedef RangeAccess Base; using Base::Base; }; } // namespace internal } // namespace TensorSycl } // namespace Eigen #endif // EIGEN_CXX11_TENSOR_TENSOR_SYCL_STORAGE_MEMORY_H RcppEigen/inst/include/Eigen/src/Core/arch/NEON/0000755000176200001440000000000014562417221020714 5ustar liggesusersRcppEigen/inst/include/Eigen/src/Core/arch/NEON/MathFunctions.h0000644000176200001440000000601314562417221023647 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_MATH_FUNCTIONS_NEON_H #define EIGEN_MATH_FUNCTIONS_NEON_H namespace Eigen { namespace internal { template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet2f pexp(const Packet2f& x) { return pexp_float(x); } template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f pexp(const Packet4f& x) { return pexp_float(x); } template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet2f plog(const Packet2f& x) { return plog_float(x); } template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f plog(const Packet4f& x) { return plog_float(x); } template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet2f psin(const Packet2f& x) { return psin_float(x); } template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f psin(const Packet4f& x) { return psin_float(x); } template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet2f pcos(const Packet2f& x) { return pcos_float(x); } template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f pcos(const Packet4f& x) { return pcos_float(x); } // Hyperbolic Tangent function. template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet2f ptanh(const Packet2f& x) { return internal::generic_fast_tanh_float(x); } template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f ptanh(const Packet4f& x) { return internal::generic_fast_tanh_float(x); } BF16_PACKET_FUNCTION(Packet4f, Packet4bf, psin) BF16_PACKET_FUNCTION(Packet4f, Packet4bf, pcos) BF16_PACKET_FUNCTION(Packet4f, Packet4bf, plog) BF16_PACKET_FUNCTION(Packet4f, Packet4bf, pexp) BF16_PACKET_FUNCTION(Packet4f, Packet4bf, ptanh) template <> EIGEN_STRONG_INLINE Packet4bf pfrexp(const Packet4bf& a, Packet4bf& exponent) { Packet4f fexponent; const Packet4bf out = F32ToBf16(pfrexp(Bf16ToF32(a), fexponent)); exponent = F32ToBf16(fexponent); return out; } template <> EIGEN_STRONG_INLINE Packet4bf pldexp(const Packet4bf& a, const Packet4bf& exponent) { return F32ToBf16(pldexp(Bf16ToF32(a), Bf16ToF32(exponent))); } //---------- double ---------- #if EIGEN_ARCH_ARM64 && !EIGEN_APPLE_DOUBLE_NEON_BUG template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet2d pexp(const Packet2d& x) { return pexp_double(x); } template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet2d plog(const Packet2d& x) { return plog_double(x); } #endif } // end namespace internal } // end namespace Eigen #endif // EIGEN_MATH_FUNCTIONS_NEON_H RcppEigen/inst/include/Eigen/src/Core/arch/NEON/PacketMath.h0000644000176200001440000056212514562417221023121 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2009 Gael Guennebaud // Copyright (C) 2010 Konstantinos Margaritis // Heavily based on Gael's SSE version. // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_PACKET_MATH_NEON_H #define EIGEN_PACKET_MATH_NEON_H namespace Eigen { namespace internal { #ifndef EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD #define EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD 8 #endif #ifndef EIGEN_HAS_SINGLE_INSTRUCTION_MADD #define EIGEN_HAS_SINGLE_INSTRUCTION_MADD #endif #ifndef EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS #if EIGEN_ARCH_ARM64 #define EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS 32 #else #define EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS 16 #endif #endif #if EIGEN_COMP_MSVC_STRICT // In MSVC's arm_neon.h header file, all NEON vector types // are aliases to the same underlying type __n128. // We thus have to wrap them to make them different C++ types. // (See also bug 1428) typedef eigen_packet_wrapper Packet2f; typedef eigen_packet_wrapper Packet4f; typedef eigen_packet_wrapper Packet4c; typedef eigen_packet_wrapper Packet8c; typedef eigen_packet_wrapper Packet16c; typedef eigen_packet_wrapper Packet4uc; typedef eigen_packet_wrapper Packet8uc; typedef eigen_packet_wrapper Packet16uc; typedef eigen_packet_wrapper Packet4s; typedef eigen_packet_wrapper Packet8s; typedef eigen_packet_wrapper Packet4us; typedef eigen_packet_wrapper Packet8us; typedef eigen_packet_wrapper Packet2i; typedef eigen_packet_wrapper Packet4i; typedef eigen_packet_wrapper Packet2ui; typedef eigen_packet_wrapper Packet4ui; typedef eigen_packet_wrapper Packet2l; typedef eigen_packet_wrapper Packet2ul; #else typedef float32x2_t Packet2f; typedef float32x4_t Packet4f; typedef eigen_packet_wrapper Packet4c; typedef int8x8_t Packet8c; typedef int8x16_t Packet16c; typedef eigen_packet_wrapper Packet4uc; typedef uint8x8_t Packet8uc; typedef uint8x16_t Packet16uc; typedef int16x4_t Packet4s; typedef int16x8_t Packet8s; typedef uint16x4_t Packet4us; typedef uint16x8_t Packet8us; typedef int32x2_t Packet2i; typedef int32x4_t Packet4i; typedef uint32x2_t Packet2ui; typedef uint32x4_t Packet4ui; typedef int64x2_t Packet2l; typedef uint64x2_t Packet2ul; #endif // EIGEN_COMP_MSVC_STRICT EIGEN_STRONG_INLINE Packet4f shuffle1(const Packet4f& m, int mask){ const float* a = reinterpret_cast(&m); Packet4f res = {*(a + (mask & 3)), *(a + ((mask >> 2) & 3)), *(a + ((mask >> 4) & 3 )), *(a + ((mask >> 6) & 3))}; return res; } // fuctionally equivalent to _mm_shuffle_ps in SSE when interleave // == false (i.e. shuffle(m, n, mask) equals _mm_shuffle_ps(m, n, mask)), // interleave m and n when interleave == true. Currently used in LU/arch/InverseSize4.h // to enable a shared implementation for fast inversion of matrices of size 4. template EIGEN_STRONG_INLINE Packet4f shuffle2(const Packet4f &m, const Packet4f &n, int mask) { const float* a = reinterpret_cast(&m); const float* b = reinterpret_cast(&n); Packet4f res = {*(a + (mask & 3)), *(a + ((mask >> 2) & 3)), *(b + ((mask >> 4) & 3)), *(b + ((mask >> 6) & 3))}; return res; } template<> EIGEN_STRONG_INLINE Packet4f shuffle2(const Packet4f &m, const Packet4f &n, int mask) { const float* a = reinterpret_cast(&m); const float* b = reinterpret_cast(&n); Packet4f res = {*(a + (mask & 3)), *(b + ((mask >> 2) & 3)), *(a + ((mask >> 4) & 3)), *(b + ((mask >> 6) & 3))}; return res; } EIGEN_STRONG_INLINE static int eigen_neon_shuffle_mask(int p, int q, int r, int s) {return ((s)<<6|(r)<<4|(q)<<2|(p));} EIGEN_STRONG_INLINE Packet4f vec4f_swizzle1(const Packet4f& a, int p, int q, int r, int s) { return shuffle1(a, eigen_neon_shuffle_mask(p, q, r, s)); } EIGEN_STRONG_INLINE Packet4f vec4f_swizzle2(const Packet4f& a, const Packet4f& b, int p, int q, int r, int s) { return shuffle2(a,b,eigen_neon_shuffle_mask(p, q, r, s)); } EIGEN_STRONG_INLINE Packet4f vec4f_movelh(const Packet4f& a, const Packet4f& b) { return shuffle2(a,b,eigen_neon_shuffle_mask(0, 1, 0, 1)); } EIGEN_STRONG_INLINE Packet4f vec4f_movehl(const Packet4f& a, const Packet4f& b) { return shuffle2(b,a,eigen_neon_shuffle_mask(2, 3, 2, 3)); } EIGEN_STRONG_INLINE Packet4f vec4f_unpacklo(const Packet4f& a, const Packet4f& b) { return shuffle2(a,b,eigen_neon_shuffle_mask(0, 0, 1, 1)); } EIGEN_STRONG_INLINE Packet4f vec4f_unpackhi(const Packet4f& a, const Packet4f& b) { return shuffle2(a,b,eigen_neon_shuffle_mask(2, 2, 3, 3)); } #define vec4f_duplane(a, p) \ vdupq_lane_f32(vget_low_f32(a), p) #define _EIGEN_DECLARE_CONST_Packet4f(NAME,X) \ const Packet4f p4f_##NAME = pset1(X) #define _EIGEN_DECLARE_CONST_Packet4f_FROM_INT(NAME,X) \ const Packet4f p4f_##NAME = vreinterpretq_f32_u32(pset1(X)) #define _EIGEN_DECLARE_CONST_Packet4i(NAME,X) \ const Packet4i p4i_##NAME = pset1(X) #if EIGEN_ARCH_ARM64 // __builtin_prefetch tends to do nothing on ARM64 compilers because the // prefetch instructions there are too detailed for __builtin_prefetch to map // meaningfully to them. #define EIGEN_ARM_PREFETCH(ADDR) __asm__ __volatile__("prfm pldl1keep, [%[addr]]\n" ::[addr] "r"(ADDR) : ); #elif EIGEN_HAS_BUILTIN(__builtin_prefetch) || EIGEN_COMP_GNUC #define EIGEN_ARM_PREFETCH(ADDR) __builtin_prefetch(ADDR); #elif defined __pld #define EIGEN_ARM_PREFETCH(ADDR) __pld(ADDR) #elif EIGEN_ARCH_ARM32 #define EIGEN_ARM_PREFETCH(ADDR) __asm__ __volatile__ ("pld [%[addr]]\n" :: [addr] "r" (ADDR) : ); #else // by default no explicit prefetching #define EIGEN_ARM_PREFETCH(ADDR) #endif template <> struct packet_traits : default_packet_traits { typedef Packet4f type; typedef Packet2f half; enum { Vectorizable = 1, AlignedOnScalar = 1, size = 4, HasHalfPacket = 1, HasAdd = 1, HasSub = 1, HasShift = 1, HasMul = 1, HasNegate = 1, HasAbs = 1, HasArg = 0, HasAbs2 = 1, HasAbsDiff = 1, HasMin = 1, HasMax = 1, HasConj = 1, HasSetLinear = 0, HasBlend = 0, HasDiv = 1, HasFloor = 1, HasCeil = 1, HasRint = 1, HasSin = EIGEN_FAST_MATH, HasCos = EIGEN_FAST_MATH, HasLog = 1, HasExp = 1, HasSqrt = 1, HasRsqrt = 1, HasTanh = EIGEN_FAST_MATH, HasErf = EIGEN_FAST_MATH, HasBessel = 0, // Issues with accuracy. HasNdtri = 0 }; }; template <> struct packet_traits : default_packet_traits { typedef Packet16c type; typedef Packet8c half; enum { Vectorizable = 1, AlignedOnScalar = 1, size = 16, HasHalfPacket = 1, HasAdd = 1, HasSub = 1, HasShift = 1, HasMul = 1, HasNegate = 1, HasAbs = 1, HasAbsDiff = 1, HasArg = 0, HasAbs2 = 1, HasMin = 1, HasMax = 1, HasConj = 1, HasSetLinear = 0, HasBlend = 0 }; }; template <> struct packet_traits : default_packet_traits { typedef Packet16uc type; typedef Packet8uc half; enum { Vectorizable = 1, AlignedOnScalar = 1, size = 16, HasHalfPacket = 1, HasAdd = 1, HasSub = 1, HasShift = 1, HasMul = 1, HasNegate = 0, HasAbs = 1, HasAbsDiff = 1, HasArg = 0, HasAbs2 = 1, HasMin = 1, HasMax = 1, HasConj = 1, HasSetLinear = 0, HasBlend = 0, HasSqrt = 1 }; }; template <> struct packet_traits : default_packet_traits { typedef Packet8s type; typedef Packet4s half; enum { Vectorizable = 1, AlignedOnScalar = 1, size = 8, HasHalfPacket = 1, HasAdd = 1, HasSub = 1, HasShift = 1, HasMul = 1, HasNegate = 1, HasAbs = 1, HasAbsDiff = 1, HasArg = 0, HasAbs2 = 1, HasMin = 1, HasMax = 1, HasConj = 1, HasSetLinear = 0, HasBlend = 0 }; }; template <> struct packet_traits : default_packet_traits { typedef Packet8us type; typedef Packet4us half; enum { Vectorizable = 1, AlignedOnScalar = 1, size = 8, HasHalfPacket = 1, HasAdd = 1, HasSub = 1, HasShift = 1, HasMul = 1, HasNegate = 0, HasAbs = 0, HasAbsDiff = 1, HasArg = 0, HasAbs2 = 1, HasMin = 1, HasMax = 1, HasConj = 1, HasSetLinear = 0, HasBlend = 0, HasSqrt = 1 }; }; template <> struct packet_traits : default_packet_traits { typedef Packet4i type; typedef Packet2i half; enum { Vectorizable = 1, AlignedOnScalar = 1, size = 4, HasHalfPacket = 1, HasAdd = 1, HasSub = 1, HasShift = 1, HasMul = 1, HasNegate = 1, HasAbs = 1, HasArg = 0, HasAbs2 = 1, HasAbsDiff = 1, HasMin = 1, HasMax = 1, HasConj = 1, HasSetLinear = 0, HasBlend = 0 }; }; template <> struct packet_traits : default_packet_traits { typedef Packet4ui type; typedef Packet2ui half; enum { Vectorizable = 1, AlignedOnScalar = 1, size = 4, HasHalfPacket = 1, HasAdd = 1, HasSub = 1, HasShift = 1, HasMul = 1, HasNegate = 0, HasAbs = 0, HasArg = 0, HasAbs2 = 1, HasAbsDiff = 1, HasMin = 1, HasMax = 1, HasConj = 1, HasSetLinear = 0, HasBlend = 0, HasSqrt = 1 }; }; template <> struct packet_traits : default_packet_traits { typedef Packet2l type; typedef Packet2l half; enum { Vectorizable = 1, AlignedOnScalar = 1, size = 2, HasHalfPacket = 0, HasCmp = 1, HasAdd = 1, HasSub = 1, HasShift = 1, HasMul = 1, HasNegate = 1, HasAbs = 1, HasArg = 0, HasAbs2 = 1, HasAbsDiff = 1, HasMin = 1, HasMax = 1, HasConj = 1, HasSetLinear = 0, HasBlend = 0 }; }; template <> struct packet_traits : default_packet_traits { typedef Packet2ul type; typedef Packet2ul half; enum { Vectorizable = 1, AlignedOnScalar = 1, size = 2, HasHalfPacket = 0, HasCmp = 1, HasAdd = 1, HasSub = 1, HasShift = 1, HasMul = 1, HasNegate = 0, HasAbs = 0, HasArg = 0, HasAbs2 = 1, HasAbsDiff = 1, HasMin = 1, HasMax = 1, HasConj = 1, HasSetLinear = 0, HasBlend = 0 }; }; #if EIGEN_GNUC_AT_MOST(4, 4) && !EIGEN_COMP_LLVM // workaround gcc 4.2, 4.3 and 4.4 compilation issue EIGEN_STRONG_INLINE float32x4_t vld1q_f32(const float* x) { return ::vld1q_f32((const float32_t*)x); } EIGEN_STRONG_INLINE float32x2_t vld1_f32(const float* x) { return ::vld1_f32 ((const float32_t*)x); } EIGEN_STRONG_INLINE float32x2_t vld1_dup_f32(const float* x) { return ::vld1_dup_f32 ((const float32_t*)x); } EIGEN_STRONG_INLINE void vst1q_f32(float* to, float32x4_t from) { ::vst1q_f32((float32_t*)to,from); } EIGEN_STRONG_INLINE void vst1_f32 (float* to, float32x2_t from) { ::vst1_f32 ((float32_t*)to,from); } #endif template<> struct unpacket_traits { typedef float type; typedef Packet2f half; typedef Packet2i integer_packet; enum { size = 2, alignment = Aligned16, vectorizable = true, masked_load_available = false, masked_store_available = false }; }; template<> struct unpacket_traits { typedef float type; typedef Packet2f half; typedef Packet4i integer_packet; enum { size = 4, alignment = Aligned16, vectorizable = true, masked_load_available = false, masked_store_available = false }; }; template<> struct unpacket_traits { typedef int8_t type; typedef Packet4c half; enum { size = 4, alignment = Unaligned, vectorizable = true, masked_load_available = false, masked_store_available = false }; }; template<> struct unpacket_traits { typedef int8_t type; typedef Packet4c half; enum { size = 8, alignment = Aligned16, vectorizable = true, masked_load_available = false, masked_store_available = false }; }; template<> struct unpacket_traits { typedef int8_t type; typedef Packet8c half; enum { size = 16, alignment = Aligned16, vectorizable = true, masked_load_available = false, masked_store_available = false }; }; template<> struct unpacket_traits { typedef uint8_t type; typedef Packet4uc half; enum { size = 4, alignment = Unaligned, vectorizable = true, masked_load_available = false, masked_store_available = false }; }; template<> struct unpacket_traits { typedef uint8_t type; typedef Packet4uc half; enum { size = 8, alignment = Aligned16, vectorizable = true, masked_load_available = false, masked_store_available = false }; }; template<> struct unpacket_traits { typedef uint8_t type; typedef Packet8uc half; enum { size = 16, alignment = Aligned16, vectorizable = true, masked_load_available = false, masked_store_available = false}; }; template<> struct unpacket_traits { typedef int16_t type; typedef Packet4s half; enum { size = 4, alignment = Aligned16, vectorizable = true, masked_load_available = false, masked_store_available = false }; }; template<> struct unpacket_traits { typedef int16_t type; typedef Packet4s half; enum { size = 8, alignment = Aligned16, vectorizable = true, masked_load_available = false, masked_store_available = false }; }; template<> struct unpacket_traits { typedef uint16_t type; typedef Packet4us half; enum { size = 4, alignment = Aligned16, vectorizable = true, masked_load_available = false, masked_store_available = false }; }; template<> struct unpacket_traits { typedef uint16_t type; typedef Packet4us half; enum { size = 8, alignment = Aligned16, vectorizable = true, masked_load_available = false, masked_store_available = false }; }; template<> struct unpacket_traits { typedef int32_t type; typedef Packet2i half; enum { size = 2, alignment = Aligned16, vectorizable = true, masked_load_available = false, masked_store_available = false }; }; template<> struct unpacket_traits { typedef int32_t type; typedef Packet2i half; enum { size = 4, alignment = Aligned16, vectorizable = true, masked_load_available = false, masked_store_available = false }; }; template<> struct unpacket_traits { typedef uint32_t type; typedef Packet2ui half; enum { size = 2, alignment = Aligned16, vectorizable = true, masked_load_available = false, masked_store_available = false }; }; template<> struct unpacket_traits { typedef uint32_t type; typedef Packet2ui half; enum { size = 4, alignment = Aligned16, vectorizable = true, masked_load_available = false, masked_store_available = false }; }; template<> struct unpacket_traits { typedef int64_t type; typedef Packet2l half; enum { size = 2, alignment = Aligned16, vectorizable = true, masked_load_available = false, masked_store_available = false }; }; template<> struct unpacket_traits { typedef uint64_t type; typedef Packet2ul half; enum { size = 2, alignment = Aligned16, vectorizable = true, masked_load_available = false, masked_store_available = false }; }; template<> EIGEN_STRONG_INLINE Packet2f pset1(const float& from) { return vdup_n_f32(from); } template<> EIGEN_STRONG_INLINE Packet4f pset1(const float& from) { return vdupq_n_f32(from); } template<> EIGEN_STRONG_INLINE Packet4c pset1(const int8_t& from) { return vget_lane_s32(vreinterpret_s32_s8(vdup_n_s8(from)), 0); } template<> EIGEN_STRONG_INLINE Packet8c pset1(const int8_t& from) { return vdup_n_s8(from); } template<> EIGEN_STRONG_INLINE Packet16c pset1(const int8_t& from) { return vdupq_n_s8(from); } template<> EIGEN_STRONG_INLINE Packet4uc pset1(const uint8_t& from) { return vget_lane_u32(vreinterpret_u32_u8(vdup_n_u8(from)), 0); } template<> EIGEN_STRONG_INLINE Packet8uc pset1(const uint8_t& from) { return vdup_n_u8(from); } template<> EIGEN_STRONG_INLINE Packet16uc pset1(const uint8_t& from) { return vdupq_n_u8(from); } template<> EIGEN_STRONG_INLINE Packet4s pset1(const int16_t& from) { return vdup_n_s16(from); } template<> EIGEN_STRONG_INLINE Packet8s pset1(const int16_t& from) { return vdupq_n_s16(from); } template<> EIGEN_STRONG_INLINE Packet4us pset1(const uint16_t& from) { return vdup_n_u16(from); } template<> EIGEN_STRONG_INLINE Packet8us pset1(const uint16_t& from) { return vdupq_n_u16(from); } template<> EIGEN_STRONG_INLINE Packet2i pset1(const int32_t& from) { return vdup_n_s32(from); } template<> EIGEN_STRONG_INLINE Packet4i pset1(const int32_t& from) { return vdupq_n_s32(from); } template<> EIGEN_STRONG_INLINE Packet2ui pset1(const uint32_t& from) { return vdup_n_u32(from); } template<> EIGEN_STRONG_INLINE Packet4ui pset1(const uint32_t& from) { return vdupq_n_u32(from); } template<> EIGEN_STRONG_INLINE Packet2l pset1(const int64_t& from) { return vdupq_n_s64(from); } template<> EIGEN_STRONG_INLINE Packet2ul pset1(const uint64_t& from) { return vdupq_n_u64(from); } template<> EIGEN_STRONG_INLINE Packet2f pset1frombits(unsigned int from) { return vreinterpret_f32_u32(vdup_n_u32(from)); } template<> EIGEN_STRONG_INLINE Packet4f pset1frombits(unsigned int from) { return vreinterpretq_f32_u32(vdupq_n_u32(from)); } template<> EIGEN_STRONG_INLINE Packet2f plset(const float& a) { const float c[] = {0.0f,1.0f}; return vadd_f32(pset1(a), vld1_f32(c)); } template<> EIGEN_STRONG_INLINE Packet4f plset(const float& a) { const float c[] = {0.0f,1.0f,2.0f,3.0f}; return vaddq_f32(pset1(a), vld1q_f32(c)); } template<> EIGEN_STRONG_INLINE Packet4c plset(const int8_t& a) { return vget_lane_s32(vreinterpret_s32_s8(vadd_s8(vreinterpret_s8_u32(vdup_n_u32(0x03020100)), vdup_n_s8(a))), 0); } template<> EIGEN_STRONG_INLINE Packet8c plset(const int8_t& a) { const int8_t c[] = {0,1,2,3,4,5,6,7}; return vadd_s8(pset1(a), vld1_s8(c)); } template<> EIGEN_STRONG_INLINE Packet16c plset(const int8_t& a) { const int8_t c[] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}; return vaddq_s8(pset1(a), vld1q_s8(c)); } template<> EIGEN_STRONG_INLINE Packet4uc plset(const uint8_t& a) { return vget_lane_u32(vreinterpret_u32_u8(vadd_u8(vreinterpret_u8_u32(vdup_n_u32(0x03020100)), vdup_n_u8(a))), 0); } template<> EIGEN_STRONG_INLINE Packet8uc plset(const uint8_t& a) { const uint8_t c[] = {0,1,2,3,4,5,6,7}; return vadd_u8(pset1(a), vld1_u8(c)); } template<> EIGEN_STRONG_INLINE Packet16uc plset(const uint8_t& a) { const uint8_t c[] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}; return vaddq_u8(pset1(a), vld1q_u8(c)); } template<> EIGEN_STRONG_INLINE Packet4s plset(const int16_t& a) { const int16_t c[] = {0,1,2,3}; return vadd_s16(pset1(a), vld1_s16(c)); } template<> EIGEN_STRONG_INLINE Packet4us plset(const uint16_t& a) { const uint16_t c[] = {0,1,2,3}; return vadd_u16(pset1(a), vld1_u16(c)); } template<> EIGEN_STRONG_INLINE Packet8s plset(const int16_t& a) { const int16_t c[] = {0,1,2,3,4,5,6,7}; return vaddq_s16(pset1(a), vld1q_s16(c)); } template<> EIGEN_STRONG_INLINE Packet8us plset(const uint16_t& a) { const uint16_t c[] = {0,1,2,3,4,5,6,7}; return vaddq_u16(pset1(a), vld1q_u16(c)); } template<> EIGEN_STRONG_INLINE Packet2i plset(const int32_t& a) { const int32_t c[] = {0,1}; return vadd_s32(pset1(a), vld1_s32(c)); } template<> EIGEN_STRONG_INLINE Packet4i plset(const int32_t& a) { const int32_t c[] = {0,1,2,3}; return vaddq_s32(pset1(a), vld1q_s32(c)); } template<> EIGEN_STRONG_INLINE Packet2ui plset(const uint32_t& a) { const uint32_t c[] = {0,1}; return vadd_u32(pset1(a), vld1_u32(c)); } template<> EIGEN_STRONG_INLINE Packet4ui plset(const uint32_t& a) { const uint32_t c[] = {0,1,2,3}; return vaddq_u32(pset1(a), vld1q_u32(c)); } template<> EIGEN_STRONG_INLINE Packet2l plset(const int64_t& a) { const int64_t c[] = {0,1}; return vaddq_s64(pset1(a), vld1q_s64(c)); } template<> EIGEN_STRONG_INLINE Packet2ul plset(const uint64_t& a) { const uint64_t c[] = {0,1}; return vaddq_u64(pset1(a), vld1q_u64(c)); } template<> EIGEN_STRONG_INLINE Packet2f padd(const Packet2f& a, const Packet2f& b) { return vadd_f32(a,b); } template<> EIGEN_STRONG_INLINE Packet4f padd(const Packet4f& a, const Packet4f& b) { return vaddq_f32(a,b); } template<> EIGEN_STRONG_INLINE Packet4c padd(const Packet4c& a, const Packet4c& b) { return vget_lane_s32(vreinterpret_s32_s8(vadd_s8( vreinterpret_s8_s32(vdup_n_s32(a)), vreinterpret_s8_s32(vdup_n_s32(b)))), 0); } template<> EIGEN_STRONG_INLINE Packet8c padd(const Packet8c& a, const Packet8c& b) { return vadd_s8(a,b); } template<> EIGEN_STRONG_INLINE Packet16c padd(const Packet16c& a, const Packet16c& b) { return vaddq_s8(a,b); } template<> EIGEN_STRONG_INLINE Packet4uc padd(const Packet4uc& a, const Packet4uc& b) { return vget_lane_u32(vreinterpret_u32_u8(vadd_u8( vreinterpret_u8_u32(vdup_n_u32(a)), vreinterpret_u8_u32(vdup_n_u32(b)))), 0); } template<> EIGEN_STRONG_INLINE Packet8uc padd(const Packet8uc& a, const Packet8uc& b) { return vadd_u8(a,b); } template<> EIGEN_STRONG_INLINE Packet16uc padd(const Packet16uc& a, const Packet16uc& b) { return vaddq_u8(a,b); } template<> EIGEN_STRONG_INLINE Packet4s padd(const Packet4s& a, const Packet4s& b) { return vadd_s16(a,b); } template<> EIGEN_STRONG_INLINE Packet8s padd(const Packet8s& a, const Packet8s& b) { return vaddq_s16(a,b); } template<> EIGEN_STRONG_INLINE Packet4us padd(const Packet4us& a, const Packet4us& b) { return vadd_u16(a,b); } template<> EIGEN_STRONG_INLINE Packet8us padd(const Packet8us& a, const Packet8us& b) { return vaddq_u16(a,b); } template<> EIGEN_STRONG_INLINE Packet2i padd(const Packet2i& a, const Packet2i& b) { return vadd_s32(a,b); } template<> EIGEN_STRONG_INLINE Packet4i padd(const Packet4i& a, const Packet4i& b) { return vaddq_s32(a,b); } template<> EIGEN_STRONG_INLINE Packet2ui padd(const Packet2ui& a, const Packet2ui& b) { return vadd_u32(a,b); } template<> EIGEN_STRONG_INLINE Packet4ui padd(const Packet4ui& a, const Packet4ui& b) { return vaddq_u32(a,b); } template<> EIGEN_STRONG_INLINE Packet2l padd(const Packet2l& a, const Packet2l& b) { return vaddq_s64(a,b); } template<> EIGEN_STRONG_INLINE Packet2ul padd(const Packet2ul& a, const Packet2ul& b) { return vaddq_u64(a,b); } template<> EIGEN_STRONG_INLINE Packet2f psub(const Packet2f& a, const Packet2f& b) { return vsub_f32(a,b); } template<> EIGEN_STRONG_INLINE Packet4f psub(const Packet4f& a, const Packet4f& b) { return vsubq_f32(a,b); } template<> EIGEN_STRONG_INLINE Packet4c psub(const Packet4c& a, const Packet4c& b) { return vget_lane_s32(vreinterpret_s32_s8(vsub_s8( vreinterpret_s8_s32(vdup_n_s32(a)), vreinterpret_s8_s32(vdup_n_s32(b)))), 0); } template<> EIGEN_STRONG_INLINE Packet8c psub(const Packet8c& a, const Packet8c& b) { return vsub_s8(a,b); } template<> EIGEN_STRONG_INLINE Packet16c psub(const Packet16c& a, const Packet16c& b) { return vsubq_s8(a,b); } template<> EIGEN_STRONG_INLINE Packet4uc psub(const Packet4uc& a, const Packet4uc& b) { return vget_lane_u32(vreinterpret_u32_u8(vsub_u8( vreinterpret_u8_u32(vdup_n_u32(a)), vreinterpret_u8_u32(vdup_n_u32(b)))), 0); } template<> EIGEN_STRONG_INLINE Packet8uc psub(const Packet8uc& a, const Packet8uc& b) { return vsub_u8(a,b); } template<> EIGEN_STRONG_INLINE Packet16uc psub(const Packet16uc& a, const Packet16uc& b) { return vsubq_u8(a,b); } template<> EIGEN_STRONG_INLINE Packet4s psub(const Packet4s& a, const Packet4s& b) { return vsub_s16(a,b); } template<> EIGEN_STRONG_INLINE Packet8s psub(const Packet8s& a, const Packet8s& b) { return vsubq_s16(a,b); } template<> EIGEN_STRONG_INLINE Packet4us psub(const Packet4us& a, const Packet4us& b) { return vsub_u16(a,b); } template<> EIGEN_STRONG_INLINE Packet8us psub(const Packet8us& a, const Packet8us& b) { return vsubq_u16(a,b); } template<> EIGEN_STRONG_INLINE Packet2i psub(const Packet2i& a, const Packet2i& b) { return vsub_s32(a,b); } template<> EIGEN_STRONG_INLINE Packet4i psub(const Packet4i& a, const Packet4i& b) { return vsubq_s32(a,b); } template<> EIGEN_STRONG_INLINE Packet2ui psub(const Packet2ui& a, const Packet2ui& b) { return vsub_u32(a,b); } template<> EIGEN_STRONG_INLINE Packet4ui psub(const Packet4ui& a, const Packet4ui& b) { return vsubq_u32(a,b); } template<> EIGEN_STRONG_INLINE Packet2l psub(const Packet2l& a, const Packet2l& b) { return vsubq_s64(a,b); } template<> EIGEN_STRONG_INLINE Packet2ul psub(const Packet2ul& a, const Packet2ul& b) { return vsubq_u64(a,b); } template<> EIGEN_STRONG_INLINE Packet2f pxor(const Packet2f& a, const Packet2f& b); template<> EIGEN_STRONG_INLINE Packet2f paddsub(const Packet2f& a, const Packet2f & b) { Packet2f mask = {numext::bit_cast(0x80000000u), 0.0f}; return padd(a, pxor(mask, b)); } template<> EIGEN_STRONG_INLINE Packet4f pxor(const Packet4f& a, const Packet4f& b); template<> EIGEN_STRONG_INLINE Packet4f paddsub(const Packet4f& a, const Packet4f& b) { Packet4f mask = {numext::bit_cast(0x80000000u), 0.0f, numext::bit_cast(0x80000000u), 0.0f}; return padd(a, pxor(mask, b)); } template<> EIGEN_STRONG_INLINE Packet2f pnegate(const Packet2f& a) { return vneg_f32(a); } template<> EIGEN_STRONG_INLINE Packet4f pnegate(const Packet4f& a) { return vnegq_f32(a); } template<> EIGEN_STRONG_INLINE Packet4c pnegate(const Packet4c& a) { return vget_lane_s32(vreinterpret_s32_s8(vneg_s8(vreinterpret_s8_s32(vdup_n_s32(a)))), 0); } template<> EIGEN_STRONG_INLINE Packet8c pnegate(const Packet8c& a) { return vneg_s8(a); } template<> EIGEN_STRONG_INLINE Packet16c pnegate(const Packet16c& a) { return vnegq_s8(a); } template<> EIGEN_STRONG_INLINE Packet4s pnegate(const Packet4s& a) { return vneg_s16(a); } template<> EIGEN_STRONG_INLINE Packet8s pnegate(const Packet8s& a) { return vnegq_s16(a); } template<> EIGEN_STRONG_INLINE Packet2i pnegate(const Packet2i& a) { return vneg_s32(a); } template<> EIGEN_STRONG_INLINE Packet4i pnegate(const Packet4i& a) { return vnegq_s32(a); } template<> EIGEN_STRONG_INLINE Packet2l pnegate(const Packet2l& a) { #if EIGEN_ARCH_ARM64 return vnegq_s64(a); #else return vcombine_s64( vdup_n_s64(-vgetq_lane_s64(a, 0)), vdup_n_s64(-vgetq_lane_s64(a, 1))); #endif } template<> EIGEN_STRONG_INLINE Packet2f pconj(const Packet2f& a) { return a; } template<> EIGEN_STRONG_INLINE Packet4f pconj(const Packet4f& a) { return a; } template<> EIGEN_STRONG_INLINE Packet4c pconj(const Packet4c& a) { return a; } template<> EIGEN_STRONG_INLINE Packet8c pconj(const Packet8c& a) { return a; } template<> EIGEN_STRONG_INLINE Packet16c pconj(const Packet16c& a) { return a; } template<> EIGEN_STRONG_INLINE Packet4uc pconj(const Packet4uc& a) { return a; } template<> EIGEN_STRONG_INLINE Packet8uc pconj(const Packet8uc& a) { return a; } template<> EIGEN_STRONG_INLINE Packet16uc pconj(const Packet16uc& a) { return a; } template<> EIGEN_STRONG_INLINE Packet4s pconj(const Packet4s& a) { return a; } template<> EIGEN_STRONG_INLINE Packet8s pconj(const Packet8s& a) { return a; } template<> EIGEN_STRONG_INLINE Packet4us pconj(const Packet4us& a) { return a; } template<> EIGEN_STRONG_INLINE Packet8us pconj(const Packet8us& a) { return a; } template<> EIGEN_STRONG_INLINE Packet2i pconj(const Packet2i& a) { return a; } template<> EIGEN_STRONG_INLINE Packet4i pconj(const Packet4i& a) { return a; } template<> EIGEN_STRONG_INLINE Packet2ui pconj(const Packet2ui& a) { return a; } template<> EIGEN_STRONG_INLINE Packet4ui pconj(const Packet4ui& a) { return a; } template<> EIGEN_STRONG_INLINE Packet2l pconj(const Packet2l& a) { return a; } template<> EIGEN_STRONG_INLINE Packet2ul pconj(const Packet2ul& a) { return a; } template<> EIGEN_STRONG_INLINE Packet2f pmul(const Packet2f& a, const Packet2f& b) { return vmul_f32(a,b); } template<> EIGEN_STRONG_INLINE Packet4f pmul(const Packet4f& a, const Packet4f& b) { return vmulq_f32(a,b); } template<> EIGEN_STRONG_INLINE Packet4c pmul(const Packet4c& a, const Packet4c& b) { return vget_lane_s32(vreinterpret_s32_s8(vmul_s8( vreinterpret_s8_s32(vdup_n_s32(a)), vreinterpret_s8_s32(vdup_n_s32(b)))), 0); } template<> EIGEN_STRONG_INLINE Packet8c pmul(const Packet8c& a, const Packet8c& b) { return vmul_s8(a,b); } template<> EIGEN_STRONG_INLINE Packet16c pmul(const Packet16c& a, const Packet16c& b) { return vmulq_s8(a,b); } template<> EIGEN_STRONG_INLINE Packet4uc pmul(const Packet4uc& a, const Packet4uc& b) { return vget_lane_u32(vreinterpret_u32_u8(vmul_u8( vreinterpret_u8_u32(vdup_n_u32(a)), vreinterpret_u8_u32(vdup_n_u32(b)))), 0); } template<> EIGEN_STRONG_INLINE Packet8uc pmul(const Packet8uc& a, const Packet8uc& b) { return vmul_u8(a,b); } template<> EIGEN_STRONG_INLINE Packet16uc pmul(const Packet16uc& a, const Packet16uc& b) { return vmulq_u8(a,b); } template<> EIGEN_STRONG_INLINE Packet4s pmul(const Packet4s& a, const Packet4s& b) { return vmul_s16(a,b); } template<> EIGEN_STRONG_INLINE Packet8s pmul(const Packet8s& a, const Packet8s& b) { return vmulq_s16(a,b); } template<> EIGEN_STRONG_INLINE Packet4us pmul(const Packet4us& a, const Packet4us& b) { return vmul_u16(a,b); } template<> EIGEN_STRONG_INLINE Packet8us pmul(const Packet8us& a, const Packet8us& b) { return vmulq_u16(a,b); } template<> EIGEN_STRONG_INLINE Packet2i pmul(const Packet2i& a, const Packet2i& b) { return vmul_s32(a,b); } template<> EIGEN_STRONG_INLINE Packet4i pmul(const Packet4i& a, const Packet4i& b) { return vmulq_s32(a,b); } template<> EIGEN_STRONG_INLINE Packet2ui pmul(const Packet2ui& a, const Packet2ui& b) { return vmul_u32(a,b); } template<> EIGEN_STRONG_INLINE Packet4ui pmul(const Packet4ui& a, const Packet4ui& b) { return vmulq_u32(a,b); } template<> EIGEN_STRONG_INLINE Packet2l pmul(const Packet2l& a, const Packet2l& b) { return vcombine_s64( vdup_n_s64(vgetq_lane_s64(a, 0)*vgetq_lane_s64(b, 0)), vdup_n_s64(vgetq_lane_s64(a, 1)*vgetq_lane_s64(b, 1))); } template<> EIGEN_STRONG_INLINE Packet2ul pmul(const Packet2ul& a, const Packet2ul& b) { return vcombine_u64( vdup_n_u64(vgetq_lane_u64(a, 0)*vgetq_lane_u64(b, 0)), vdup_n_u64(vgetq_lane_u64(a, 1)*vgetq_lane_u64(b, 1))); } template<> EIGEN_STRONG_INLINE Packet2f pdiv(const Packet2f& a, const Packet2f& b) { #if EIGEN_ARCH_ARM64 return vdiv_f32(a,b); #else Packet2f inv, restep, div; // NEON does not offer a divide instruction, we have to do a reciprocal approximation // However NEON in contrast to other SIMD engines (AltiVec/SSE), offers // a reciprocal estimate AND a reciprocal step -which saves a few instructions // vrecpeq_f32() returns an estimate to 1/b, which we will finetune with // Newton-Raphson and vrecpsq_f32() inv = vrecpe_f32(b); // This returns a differential, by which we will have to multiply inv to get a better // approximation of 1/b. restep = vrecps_f32(b, inv); inv = vmul_f32(restep, inv); // Finally, multiply a by 1/b and get the wanted result of the division. div = vmul_f32(a, inv); return div; #endif } template<> EIGEN_STRONG_INLINE Packet4f pdiv(const Packet4f& a, const Packet4f& b) { #if EIGEN_ARCH_ARM64 return vdivq_f32(a,b); #else Packet4f inv, restep, div; // NEON does not offer a divide instruction, we have to do a reciprocal approximation // However NEON in contrast to other SIMD engines (AltiVec/SSE), offers // a reciprocal estimate AND a reciprocal step -which saves a few instructions // vrecpeq_f32() returns an estimate to 1/b, which we will finetune with // Newton-Raphson and vrecpsq_f32() inv = vrecpeq_f32(b); // This returns a differential, by which we will have to multiply inv to get a better // approximation of 1/b. restep = vrecpsq_f32(b, inv); inv = vmulq_f32(restep, inv); // Finally, multiply a by 1/b and get the wanted result of the division. div = vmulq_f32(a, inv); return div; #endif } template<> EIGEN_STRONG_INLINE Packet4c pdiv(const Packet4c& /*a*/, const Packet4c& /*b*/) { eigen_assert(false && "packet integer division are not supported by NEON"); return pset1(0); } template<> EIGEN_STRONG_INLINE Packet8c pdiv(const Packet8c& /*a*/, const Packet8c& /*b*/) { eigen_assert(false && "packet integer division are not supported by NEON"); return pset1(0); } template<> EIGEN_STRONG_INLINE Packet16c pdiv(const Packet16c& /*a*/, const Packet16c& /*b*/) { eigen_assert(false && "packet integer division are not supported by NEON"); return pset1(0); } template<> EIGEN_STRONG_INLINE Packet4uc pdiv(const Packet4uc& /*a*/, const Packet4uc& /*b*/) { eigen_assert(false && "packet integer division are not supported by NEON"); return pset1(0); } template<> EIGEN_STRONG_INLINE Packet8uc pdiv(const Packet8uc& /*a*/, const Packet8uc& /*b*/) { eigen_assert(false && "packet integer division are not supported by NEON"); return pset1(0); } template<> EIGEN_STRONG_INLINE Packet16uc pdiv(const Packet16uc& /*a*/, const Packet16uc& /*b*/) { eigen_assert(false && "packet integer division are not supported by NEON"); return pset1(0); } template<> EIGEN_STRONG_INLINE Packet4s pdiv(const Packet4s& /*a*/, const Packet4s& /*b*/) { eigen_assert(false && "packet integer division are not supported by NEON"); return pset1(0); } template<> EIGEN_STRONG_INLINE Packet8s pdiv(const Packet8s& /*a*/, const Packet8s& /*b*/) { eigen_assert(false && "packet integer division are not supported by NEON"); return pset1(0); } template<> EIGEN_STRONG_INLINE Packet4us pdiv(const Packet4us& /*a*/, const Packet4us& /*b*/) { eigen_assert(false && "packet integer division are not supported by NEON"); return pset1(0); } template<> EIGEN_STRONG_INLINE Packet8us pdiv(const Packet8us& /*a*/, const Packet8us& /*b*/) { eigen_assert(false && "packet integer division are not supported by NEON"); return pset1(0); } template<> EIGEN_STRONG_INLINE Packet2i pdiv(const Packet2i& /*a*/, const Packet2i& /*b*/) { eigen_assert(false && "packet integer division are not supported by NEON"); return pset1(0); } template<> EIGEN_STRONG_INLINE Packet4i pdiv(const Packet4i& /*a*/, const Packet4i& /*b*/) { eigen_assert(false && "packet integer division are not supported by NEON"); return pset1(0); } template<> EIGEN_STRONG_INLINE Packet2ui pdiv(const Packet2ui& /*a*/, const Packet2ui& /*b*/) { eigen_assert(false && "packet integer division are not supported by NEON"); return pset1(0); } template<> EIGEN_STRONG_INLINE Packet4ui pdiv(const Packet4ui& /*a*/, const Packet4ui& /*b*/) { eigen_assert(false && "packet integer division are not supported by NEON"); return pset1(0); } template<> EIGEN_STRONG_INLINE Packet2l pdiv(const Packet2l& /*a*/, const Packet2l& /*b*/) { eigen_assert(false && "packet integer division are not supported by NEON"); return pset1(0LL); } template<> EIGEN_STRONG_INLINE Packet2ul pdiv(const Packet2ul& /*a*/, const Packet2ul& /*b*/) { eigen_assert(false && "packet integer division are not supported by NEON"); return pset1(0ULL); } #ifdef __ARM_FEATURE_FMA template<> EIGEN_STRONG_INLINE Packet4f pmadd(const Packet4f& a, const Packet4f& b, const Packet4f& c) { return vfmaq_f32(c,a,b); } template<> EIGEN_STRONG_INLINE Packet2f pmadd(const Packet2f& a, const Packet2f& b, const Packet2f& c) { return vfma_f32(c,a,b); } #else template<> EIGEN_STRONG_INLINE Packet4f pmadd(const Packet4f& a, const Packet4f& b, const Packet4f& c) { return vmlaq_f32(c,a,b); } template<> EIGEN_STRONG_INLINE Packet2f pmadd(const Packet2f& a, const Packet2f& b, const Packet2f& c) { return vmla_f32(c,a,b); } #endif // No FMA instruction for int, so use MLA unconditionally. template<> EIGEN_STRONG_INLINE Packet4c pmadd(const Packet4c& a, const Packet4c& b, const Packet4c& c) { return vget_lane_s32(vreinterpret_s32_s8(vmla_s8( vreinterpret_s8_s32(vdup_n_s32(c)), vreinterpret_s8_s32(vdup_n_s32(a)), vreinterpret_s8_s32(vdup_n_s32(b)))), 0); } template<> EIGEN_STRONG_INLINE Packet8c pmadd(const Packet8c& a, const Packet8c& b, const Packet8c& c) { return vmla_s8(c,a,b); } template<> EIGEN_STRONG_INLINE Packet16c pmadd(const Packet16c& a, const Packet16c& b, const Packet16c& c) { return vmlaq_s8(c,a,b); } template<> EIGEN_STRONG_INLINE Packet4uc pmadd(const Packet4uc& a, const Packet4uc& b, const Packet4uc& c) { return vget_lane_u32(vreinterpret_u32_u8(vmla_u8( vreinterpret_u8_u32(vdup_n_u32(c)), vreinterpret_u8_u32(vdup_n_u32(a)), vreinterpret_u8_u32(vdup_n_u32(b)))), 0); } template<> EIGEN_STRONG_INLINE Packet8uc pmadd(const Packet8uc& a, const Packet8uc& b, const Packet8uc& c) { return vmla_u8(c,a,b); } template<> EIGEN_STRONG_INLINE Packet16uc pmadd(const Packet16uc& a, const Packet16uc& b, const Packet16uc& c) { return vmlaq_u8(c,a,b); } template<> EIGEN_STRONG_INLINE Packet4s pmadd(const Packet4s& a, const Packet4s& b, const Packet4s& c) { return vmla_s16(c,a,b); } template<> EIGEN_STRONG_INLINE Packet8s pmadd(const Packet8s& a, const Packet8s& b, const Packet8s& c) { return vmlaq_s16(c,a,b); } template<> EIGEN_STRONG_INLINE Packet4us pmadd(const Packet4us& a, const Packet4us& b, const Packet4us& c) { return vmla_u16(c,a,b); } template<> EIGEN_STRONG_INLINE Packet8us pmadd(const Packet8us& a, const Packet8us& b, const Packet8us& c) { return vmlaq_u16(c,a,b); } template<> EIGEN_STRONG_INLINE Packet2i pmadd(const Packet2i& a, const Packet2i& b, const Packet2i& c) { return vmla_s32(c,a,b); } template<> EIGEN_STRONG_INLINE Packet4i pmadd(const Packet4i& a, const Packet4i& b, const Packet4i& c) { return vmlaq_s32(c,a,b); } template<> EIGEN_STRONG_INLINE Packet2ui pmadd(const Packet2ui& a, const Packet2ui& b, const Packet2ui& c) { return vmla_u32(c,a,b); } template<> EIGEN_STRONG_INLINE Packet4ui pmadd(const Packet4ui& a, const Packet4ui& b, const Packet4ui& c) { return vmlaq_u32(c,a,b); } template<> EIGEN_STRONG_INLINE Packet2f pabsdiff(const Packet2f& a, const Packet2f& b) { return vabd_f32(a,b); } template<> EIGEN_STRONG_INLINE Packet4f pabsdiff(const Packet4f& a, const Packet4f& b) { return vabdq_f32(a,b); } template<> EIGEN_STRONG_INLINE Packet4c pabsdiff(const Packet4c& a, const Packet4c& b) { return vget_lane_s32(vreinterpret_s32_s8(vabd_s8( vreinterpret_s8_s32(vdup_n_s32(a)), vreinterpret_s8_s32(vdup_n_s32(b)))), 0); } template<> EIGEN_STRONG_INLINE Packet8c pabsdiff(const Packet8c& a, const Packet8c& b) { return vabd_s8(a,b); } template<> EIGEN_STRONG_INLINE Packet16c pabsdiff(const Packet16c& a, const Packet16c& b) { return vabdq_s8(a,b); } template<> EIGEN_STRONG_INLINE Packet4uc pabsdiff(const Packet4uc& a, const Packet4uc& b) { return vget_lane_u32(vreinterpret_u32_u8(vabd_u8( vreinterpret_u8_u32(vdup_n_u32(a)), vreinterpret_u8_u32(vdup_n_u32(b)))), 0); } template<> EIGEN_STRONG_INLINE Packet8uc pabsdiff(const Packet8uc& a, const Packet8uc& b) { return vabd_u8(a,b); } template<> EIGEN_STRONG_INLINE Packet16uc pabsdiff(const Packet16uc& a, const Packet16uc& b) { return vabdq_u8(a,b); } template<> EIGEN_STRONG_INLINE Packet4s pabsdiff(const Packet4s& a, const Packet4s& b) { return vabd_s16(a,b); } template<> EIGEN_STRONG_INLINE Packet8s pabsdiff(const Packet8s& a, const Packet8s& b) { return vabdq_s16(a,b); } template<> EIGEN_STRONG_INLINE Packet4us pabsdiff(const Packet4us& a, const Packet4us& b) { return vabd_u16(a,b); } template<> EIGEN_STRONG_INLINE Packet8us pabsdiff(const Packet8us& a, const Packet8us& b) { return vabdq_u16(a,b); } template<> EIGEN_STRONG_INLINE Packet2i pabsdiff(const Packet2i& a, const Packet2i& b) { return vabd_s32(a,b); } template<> EIGEN_STRONG_INLINE Packet4i pabsdiff(const Packet4i& a, const Packet4i& b) { return vabdq_s32(a,b); } template<> EIGEN_STRONG_INLINE Packet2ui pabsdiff(const Packet2ui& a, const Packet2ui& b) { return vabd_u32(a,b); } template<> EIGEN_STRONG_INLINE Packet4ui pabsdiff(const Packet4ui& a, const Packet4ui& b) { return vabdq_u32(a,b); } template<> EIGEN_STRONG_INLINE Packet2f pmin(const Packet2f& a, const Packet2f& b) { return vmin_f32(a,b); } template<> EIGEN_STRONG_INLINE Packet4f pmin(const Packet4f& a, const Packet4f& b) { return vminq_f32(a,b); } #ifdef __ARM_FEATURE_NUMERIC_MAXMIN // numeric max and min are only available if ARM_FEATURE_NUMERIC_MAXMIN is defined (which can only be the case for Armv8 systems). template<> EIGEN_STRONG_INLINE Packet4f pmin(const Packet4f& a, const Packet4f& b) { return vminnmq_f32(a, b); } template<> EIGEN_STRONG_INLINE Packet2f pmin(const Packet2f& a, const Packet2f& b) { return vminnm_f32(a, b); } #endif template<> EIGEN_STRONG_INLINE Packet4f pmin(const Packet4f& a, const Packet4f& b) { return pmin(a, b); } template<> EIGEN_STRONG_INLINE Packet2f pmin(const Packet2f& a, const Packet2f& b) { return pmin(a, b); } template<> EIGEN_STRONG_INLINE Packet4c pmin(const Packet4c& a, const Packet4c& b) { return vget_lane_s32(vreinterpret_s32_s8(vmin_s8( vreinterpret_s8_s32(vdup_n_s32(a)), vreinterpret_s8_s32(vdup_n_s32(b)))), 0); } template<> EIGEN_STRONG_INLINE Packet8c pmin(const Packet8c& a, const Packet8c& b) { return vmin_s8(a,b); } template<> EIGEN_STRONG_INLINE Packet16c pmin(const Packet16c& a, const Packet16c& b) { return vminq_s8(a,b); } template<> EIGEN_STRONG_INLINE Packet4uc pmin(const Packet4uc& a, const Packet4uc& b) { return vget_lane_u32(vreinterpret_u32_u8(vmin_u8( vreinterpret_u8_u32(vdup_n_u32(a)), vreinterpret_u8_u32(vdup_n_u32(b)))), 0); } template<> EIGEN_STRONG_INLINE Packet8uc pmin(const Packet8uc& a, const Packet8uc& b) { return vmin_u8(a,b); } template<> EIGEN_STRONG_INLINE Packet16uc pmin(const Packet16uc& a, const Packet16uc& b) { return vminq_u8(a,b); } template<> EIGEN_STRONG_INLINE Packet4s pmin(const Packet4s& a, const Packet4s& b) { return vmin_s16(a,b); } template<> EIGEN_STRONG_INLINE Packet8s pmin(const Packet8s& a, const Packet8s& b) { return vminq_s16(a,b); } template<> EIGEN_STRONG_INLINE Packet4us pmin(const Packet4us& a, const Packet4us& b) { return vmin_u16(a,b); } template<> EIGEN_STRONG_INLINE Packet8us pmin(const Packet8us& a, const Packet8us& b) { return vminq_u16(a,b); } template<> EIGEN_STRONG_INLINE Packet2i pmin(const Packet2i& a, const Packet2i& b) { return vmin_s32(a,b); } template<> EIGEN_STRONG_INLINE Packet4i pmin(const Packet4i& a, const Packet4i& b) { return vminq_s32(a,b); } template<> EIGEN_STRONG_INLINE Packet2ui pmin(const Packet2ui& a, const Packet2ui& b) { return vmin_u32(a,b); } template<> EIGEN_STRONG_INLINE Packet4ui pmin(const Packet4ui& a, const Packet4ui& b) { return vminq_u32(a,b); } template<> EIGEN_STRONG_INLINE Packet2l pmin(const Packet2l& a, const Packet2l& b) { return vcombine_s64( vdup_n_s64((std::min)(vgetq_lane_s64(a, 0), vgetq_lane_s64(b, 0))), vdup_n_s64((std::min)(vgetq_lane_s64(a, 1), vgetq_lane_s64(b, 1)))); } template<> EIGEN_STRONG_INLINE Packet2ul pmin(const Packet2ul& a, const Packet2ul& b) { return vcombine_u64( vdup_n_u64((std::min)(vgetq_lane_u64(a, 0), vgetq_lane_u64(b, 0))), vdup_n_u64((std::min)(vgetq_lane_u64(a, 1), vgetq_lane_u64(b, 1)))); } template<> EIGEN_STRONG_INLINE Packet2f pmax(const Packet2f& a, const Packet2f& b) { return vmax_f32(a,b); } template<> EIGEN_STRONG_INLINE Packet4f pmax(const Packet4f& a, const Packet4f& b) { return vmaxq_f32(a,b); } #ifdef __ARM_FEATURE_NUMERIC_MAXMIN // numeric max and min are only available if ARM_FEATURE_NUMERIC_MAXMIN is defined (which can only be the case for Armv8 systems). template<> EIGEN_STRONG_INLINE Packet4f pmax(const Packet4f& a, const Packet4f& b) { return vmaxnmq_f32(a, b); } template<> EIGEN_STRONG_INLINE Packet2f pmax(const Packet2f& a, const Packet2f& b) { return vmaxnm_f32(a, b); } #endif template<> EIGEN_STRONG_INLINE Packet4f pmax(const Packet4f& a, const Packet4f& b) { return pmax(a, b); } template<> EIGEN_STRONG_INLINE Packet2f pmax(const Packet2f& a, const Packet2f& b) { return pmax(a, b); } template<> EIGEN_STRONG_INLINE Packet4c pmax(const Packet4c& a, const Packet4c& b) { return vget_lane_s32(vreinterpret_s32_s8(vmax_s8( vreinterpret_s8_s32(vdup_n_s32(a)), vreinterpret_s8_s32(vdup_n_s32(b)))), 0); } template<> EIGEN_STRONG_INLINE Packet8c pmax(const Packet8c& a, const Packet8c& b) { return vmax_s8(a,b); } template<> EIGEN_STRONG_INLINE Packet16c pmax(const Packet16c& a, const Packet16c& b) { return vmaxq_s8(a,b); } template<> EIGEN_STRONG_INLINE Packet4uc pmax(const Packet4uc& a, const Packet4uc& b) { return vget_lane_u32(vreinterpret_u32_u8(vmax_u8( vreinterpret_u8_u32(vdup_n_u32(a)), vreinterpret_u8_u32(vdup_n_u32(b)))), 0); } template<> EIGEN_STRONG_INLINE Packet8uc pmax(const Packet8uc& a, const Packet8uc& b) { return vmax_u8(a,b); } template<> EIGEN_STRONG_INLINE Packet16uc pmax(const Packet16uc& a, const Packet16uc& b) { return vmaxq_u8(a,b); } template<> EIGEN_STRONG_INLINE Packet4s pmax(const Packet4s& a, const Packet4s& b) { return vmax_s16(a,b); } template<> EIGEN_STRONG_INLINE Packet8s pmax(const Packet8s& a, const Packet8s& b) { return vmaxq_s16(a,b); } template<> EIGEN_STRONG_INLINE Packet4us pmax(const Packet4us& a, const Packet4us& b) { return vmax_u16(a,b); } template<> EIGEN_STRONG_INLINE Packet8us pmax(const Packet8us& a, const Packet8us& b) { return vmaxq_u16(a,b); } template<> EIGEN_STRONG_INLINE Packet2i pmax(const Packet2i& a, const Packet2i& b) { return vmax_s32(a,b); } template<> EIGEN_STRONG_INLINE Packet4i pmax(const Packet4i& a, const Packet4i& b) { return vmaxq_s32(a,b); } template<> EIGEN_STRONG_INLINE Packet2ui pmax(const Packet2ui& a, const Packet2ui& b) { return vmax_u32(a,b); } template<> EIGEN_STRONG_INLINE Packet4ui pmax(const Packet4ui& a, const Packet4ui& b) { return vmaxq_u32(a,b); } template<> EIGEN_STRONG_INLINE Packet2l pmax(const Packet2l& a, const Packet2l& b) { return vcombine_s64( vdup_n_s64((std::max)(vgetq_lane_s64(a, 0), vgetq_lane_s64(b, 0))), vdup_n_s64((std::max)(vgetq_lane_s64(a, 1), vgetq_lane_s64(b, 1)))); } template<> EIGEN_STRONG_INLINE Packet2ul pmax(const Packet2ul& a, const Packet2ul& b) { return vcombine_u64( vdup_n_u64((std::max)(vgetq_lane_u64(a, 0), vgetq_lane_u64(b, 0))), vdup_n_u64((std::max)(vgetq_lane_u64(a, 1), vgetq_lane_u64(b, 1)))); } template<> EIGEN_STRONG_INLINE Packet2f pcmp_le(const Packet2f& a, const Packet2f& b) { return vreinterpret_f32_u32(vcle_f32(a,b)); } template<> EIGEN_STRONG_INLINE Packet4f pcmp_le(const Packet4f& a, const Packet4f& b) { return vreinterpretq_f32_u32(vcleq_f32(a,b)); } template<> EIGEN_STRONG_INLINE Packet4c pcmp_le(const Packet4c& a, const Packet4c& b) { return vget_lane_s32(vreinterpret_s32_u8(vcle_s8( vreinterpret_s8_s32(vdup_n_s32(a)), vreinterpret_s8_s32(vdup_n_s32(b)))), 0); } template<> EIGEN_STRONG_INLINE Packet8c pcmp_le(const Packet8c& a, const Packet8c& b) { return vreinterpret_s8_u8(vcle_s8(a,b)); } template<> EIGEN_STRONG_INLINE Packet16c pcmp_le(const Packet16c& a, const Packet16c& b) { return vreinterpretq_s8_u8(vcleq_s8(a,b)); } template<> EIGEN_STRONG_INLINE Packet4uc pcmp_le(const Packet4uc& a, const Packet4uc& b) { return vget_lane_u32(vreinterpret_u32_u8(vcle_u8( vreinterpret_u8_u32(vdup_n_u32(a)), vreinterpret_u8_u32(vdup_n_u32(b)))), 0); } template<> EIGEN_STRONG_INLINE Packet8uc pcmp_le(const Packet8uc& a, const Packet8uc& b) { return vcle_u8(a,b); } template<> EIGEN_STRONG_INLINE Packet16uc pcmp_le(const Packet16uc& a, const Packet16uc& b) { return vcleq_u8(a,b); } template<> EIGEN_STRONG_INLINE Packet4s pcmp_le(const Packet4s& a, const Packet4s& b) { return vreinterpret_s16_u16(vcle_s16(a,b)); } template<> EIGEN_STRONG_INLINE Packet8s pcmp_le(const Packet8s& a, const Packet8s& b) { return vreinterpretq_s16_u16(vcleq_s16(a,b)); } template<> EIGEN_STRONG_INLINE Packet4us pcmp_le(const Packet4us& a, const Packet4us& b) { return vcle_u16(a,b); } template<> EIGEN_STRONG_INLINE Packet8us pcmp_le(const Packet8us& a, const Packet8us& b) { return vcleq_u16(a,b); } template<> EIGEN_STRONG_INLINE Packet2i pcmp_le(const Packet2i& a, const Packet2i& b) { return vreinterpret_s32_u32(vcle_s32(a,b)); } template<> EIGEN_STRONG_INLINE Packet4i pcmp_le(const Packet4i& a, const Packet4i& b) { return vreinterpretq_s32_u32(vcleq_s32(a,b)); } template<> EIGEN_STRONG_INLINE Packet2ui pcmp_le(const Packet2ui& a, const Packet2ui& b) { return vcle_u32(a,b); } template<> EIGEN_STRONG_INLINE Packet4ui pcmp_le(const Packet4ui& a, const Packet4ui& b) { return vcleq_u32(a,b); } template<> EIGEN_STRONG_INLINE Packet2l pcmp_le(const Packet2l& a, const Packet2l& b) { #if EIGEN_ARCH_ARM64 return vreinterpretq_s64_u64(vcleq_s64(a,b)); #else return vcombine_s64( vdup_n_s64(vgetq_lane_s64(a, 0) <= vgetq_lane_s64(b, 0) ? numext::int64_t(-1) : 0), vdup_n_s64(vgetq_lane_s64(a, 1) <= vgetq_lane_s64(b, 1) ? numext::int64_t(-1) : 0)); #endif } template<> EIGEN_STRONG_INLINE Packet2ul pcmp_le(const Packet2ul& a, const Packet2ul& b) { #if EIGEN_ARCH_ARM64 return vcleq_u64(a,b); #else return vcombine_u64( vdup_n_u64(vgetq_lane_u64(a, 0) <= vgetq_lane_u64(b, 0) ? numext::uint64_t(-1) : 0), vdup_n_u64(vgetq_lane_u64(a, 1) <= vgetq_lane_u64(b, 1) ? numext::uint64_t(-1) : 0)); #endif } template<> EIGEN_STRONG_INLINE Packet2f pcmp_lt(const Packet2f& a, const Packet2f& b) { return vreinterpret_f32_u32(vclt_f32(a,b)); } template<> EIGEN_STRONG_INLINE Packet4f pcmp_lt(const Packet4f& a, const Packet4f& b) { return vreinterpretq_f32_u32(vcltq_f32(a,b)); } template<> EIGEN_STRONG_INLINE Packet4c pcmp_lt(const Packet4c& a, const Packet4c& b) { return vget_lane_s32(vreinterpret_s32_u8(vclt_s8( vreinterpret_s8_s32(vdup_n_s32(a)), vreinterpret_s8_s32(vdup_n_s32(b)))), 0); } template<> EIGEN_STRONG_INLINE Packet8c pcmp_lt(const Packet8c& a, const Packet8c& b) { return vreinterpret_s8_u8(vclt_s8(a,b)); } template<> EIGEN_STRONG_INLINE Packet16c pcmp_lt(const Packet16c& a, const Packet16c& b) { return vreinterpretq_s8_u8(vcltq_s8(a,b)); } template<> EIGEN_STRONG_INLINE Packet4uc pcmp_lt(const Packet4uc& a, const Packet4uc& b) { return vget_lane_u32(vreinterpret_u32_u8(vclt_u8( vreinterpret_u8_u32(vdup_n_u32(a)), vreinterpret_u8_u32(vdup_n_u32(b)))), 0); } template<> EIGEN_STRONG_INLINE Packet8uc pcmp_lt(const Packet8uc& a, const Packet8uc& b) { return vclt_u8(a,b); } template<> EIGEN_STRONG_INLINE Packet16uc pcmp_lt(const Packet16uc& a, const Packet16uc& b) { return vcltq_u8(a,b); } template<> EIGEN_STRONG_INLINE Packet4s pcmp_lt(const Packet4s& a, const Packet4s& b) { return vreinterpret_s16_u16(vclt_s16(a,b)); } template<> EIGEN_STRONG_INLINE Packet8s pcmp_lt(const Packet8s& a, const Packet8s& b) { return vreinterpretq_s16_u16(vcltq_s16(a,b)); } template<> EIGEN_STRONG_INLINE Packet4us pcmp_lt(const Packet4us& a, const Packet4us& b) { return vclt_u16(a,b); } template<> EIGEN_STRONG_INLINE Packet8us pcmp_lt(const Packet8us& a, const Packet8us& b) { return vcltq_u16(a,b); } template<> EIGEN_STRONG_INLINE Packet2i pcmp_lt(const Packet2i& a, const Packet2i& b) { return vreinterpret_s32_u32(vclt_s32(a,b)); } template<> EIGEN_STRONG_INLINE Packet4i pcmp_lt(const Packet4i& a, const Packet4i& b) { return vreinterpretq_s32_u32(vcltq_s32(a,b)); } template<> EIGEN_STRONG_INLINE Packet2ui pcmp_lt(const Packet2ui& a, const Packet2ui& b) { return vclt_u32(a,b); } template<> EIGEN_STRONG_INLINE Packet4ui pcmp_lt(const Packet4ui& a, const Packet4ui& b) { return vcltq_u32(a,b); } template<> EIGEN_STRONG_INLINE Packet2l pcmp_lt(const Packet2l& a, const Packet2l& b) { #if EIGEN_ARCH_ARM64 return vreinterpretq_s64_u64(vcltq_s64(a,b)); #else return vcombine_s64( vdup_n_s64(vgetq_lane_s64(a, 0) < vgetq_lane_s64(b, 0) ? numext::int64_t(-1) : 0), vdup_n_s64(vgetq_lane_s64(a, 1) < vgetq_lane_s64(b, 1) ? numext::int64_t(-1) : 0)); #endif } template<> EIGEN_STRONG_INLINE Packet2ul pcmp_lt(const Packet2ul& a, const Packet2ul& b) { #if EIGEN_ARCH_ARM64 return vcltq_u64(a,b); #else return vcombine_u64( vdup_n_u64(vgetq_lane_u64(a, 0) < vgetq_lane_u64(b, 0) ? numext::uint64_t(-1) : 0), vdup_n_u64(vgetq_lane_u64(a, 1) < vgetq_lane_u64(b, 1) ? numext::uint64_t(-1) : 0)); #endif } template<> EIGEN_STRONG_INLINE Packet2f pcmp_eq(const Packet2f& a, const Packet2f& b) { return vreinterpret_f32_u32(vceq_f32(a,b)); } template<> EIGEN_STRONG_INLINE Packet4f pcmp_eq(const Packet4f& a, const Packet4f& b) { return vreinterpretq_f32_u32(vceqq_f32(a,b)); } template<> EIGEN_STRONG_INLINE Packet4c pcmp_eq(const Packet4c& a, const Packet4c& b) { return vget_lane_s32(vreinterpret_s32_u8(vceq_s8( vreinterpret_s8_s32(vdup_n_s32(a)), vreinterpret_s8_s32(vdup_n_s32(b)))), 0); } template<> EIGEN_STRONG_INLINE Packet8c pcmp_eq(const Packet8c& a, const Packet8c& b) { return vreinterpret_s8_u8(vceq_s8(a,b)); } template<> EIGEN_STRONG_INLINE Packet16c pcmp_eq(const Packet16c& a, const Packet16c& b) { return vreinterpretq_s8_u8(vceqq_s8(a,b)); } template<> EIGEN_STRONG_INLINE Packet4uc pcmp_eq(const Packet4uc& a, const Packet4uc& b) { return vget_lane_u32(vreinterpret_u32_u8(vceq_u8( vreinterpret_u8_u32(vdup_n_u32(a)), vreinterpret_u8_u32(vdup_n_u32(b)))), 0); } template<> EIGEN_STRONG_INLINE Packet8uc pcmp_eq(const Packet8uc& a, const Packet8uc& b) { return vceq_u8(a,b); } template<> EIGEN_STRONG_INLINE Packet16uc pcmp_eq(const Packet16uc& a, const Packet16uc& b) { return vceqq_u8(a,b); } template<> EIGEN_STRONG_INLINE Packet4s pcmp_eq(const Packet4s& a, const Packet4s& b) { return vreinterpret_s16_u16(vceq_s16(a,b)); } template<> EIGEN_STRONG_INLINE Packet8s pcmp_eq(const Packet8s& a, const Packet8s& b) { return vreinterpretq_s16_u16(vceqq_s16(a,b)); } template<> EIGEN_STRONG_INLINE Packet4us pcmp_eq(const Packet4us& a, const Packet4us& b) { return vceq_u16(a,b); } template<> EIGEN_STRONG_INLINE Packet8us pcmp_eq(const Packet8us& a, const Packet8us& b) { return vceqq_u16(a,b); } template<> EIGEN_STRONG_INLINE Packet2i pcmp_eq(const Packet2i& a, const Packet2i& b) { return vreinterpret_s32_u32(vceq_s32(a,b)); } template<> EIGEN_STRONG_INLINE Packet4i pcmp_eq(const Packet4i& a, const Packet4i& b) { return vreinterpretq_s32_u32(vceqq_s32(a,b)); } template<> EIGEN_STRONG_INLINE Packet2ui pcmp_eq(const Packet2ui& a, const Packet2ui& b) { return vceq_u32(a,b); } template<> EIGEN_STRONG_INLINE Packet4ui pcmp_eq(const Packet4ui& a, const Packet4ui& b) { return vceqq_u32(a,b); } template<> EIGEN_STRONG_INLINE Packet2l pcmp_eq(const Packet2l& a, const Packet2l& b) { #if EIGEN_ARCH_ARM64 return vreinterpretq_s64_u64(vceqq_s64(a,b)); #else return vcombine_s64( vdup_n_s64(vgetq_lane_s64(a, 0) == vgetq_lane_s64(b, 0) ? numext::int64_t(-1) : 0), vdup_n_s64(vgetq_lane_s64(a, 1) == vgetq_lane_s64(b, 1) ? numext::int64_t(-1) : 0)); #endif } template<> EIGEN_STRONG_INLINE Packet2ul pcmp_eq(const Packet2ul& a, const Packet2ul& b) { #if EIGEN_ARCH_ARM64 return vceqq_u64(a,b); #else return vcombine_u64( vdup_n_u64(vgetq_lane_u64(a, 0) == vgetq_lane_u64(b, 0) ? numext::uint64_t(-1) : 0), vdup_n_u64(vgetq_lane_u64(a, 1) == vgetq_lane_u64(b, 1) ? numext::uint64_t(-1) : 0)); #endif } template<> EIGEN_STRONG_INLINE Packet2f pcmp_lt_or_nan(const Packet2f& a, const Packet2f& b) { return vreinterpret_f32_u32(vmvn_u32(vcge_f32(a,b))); } template<> EIGEN_STRONG_INLINE Packet4f pcmp_lt_or_nan(const Packet4f& a, const Packet4f& b) { return vreinterpretq_f32_u32(vmvnq_u32(vcgeq_f32(a,b))); } // Logical Operations are not supported for float, so we have to reinterpret casts using NEON intrinsics template<> EIGEN_STRONG_INLINE Packet2f pand(const Packet2f& a, const Packet2f& b) { return vreinterpret_f32_u32(vand_u32(vreinterpret_u32_f32(a),vreinterpret_u32_f32(b))); } template<> EIGEN_STRONG_INLINE Packet4f pand(const Packet4f& a, const Packet4f& b) { return vreinterpretq_f32_u32(vandq_u32(vreinterpretq_u32_f32(a),vreinterpretq_u32_f32(b))); } template<> EIGEN_STRONG_INLINE Packet4c pand(const Packet4c& a, const Packet4c& b) { return a & b; } template<> EIGEN_STRONG_INLINE Packet8c pand(const Packet8c& a, const Packet8c& b) { return vand_s8(a,b); } template<> EIGEN_STRONG_INLINE Packet16c pand(const Packet16c& a, const Packet16c& b) { return vandq_s8(a,b); } template<> EIGEN_STRONG_INLINE Packet4uc pand(const Packet4uc& a, const Packet4uc& b) { return a & b; } template<> EIGEN_STRONG_INLINE Packet8uc pand(const Packet8uc& a, const Packet8uc& b) { return vand_u8(a,b); } template<> EIGEN_STRONG_INLINE Packet16uc pand(const Packet16uc& a, const Packet16uc& b) { return vandq_u8(a,b); } template<> EIGEN_STRONG_INLINE Packet4s pand(const Packet4s& a, const Packet4s& b) { return vand_s16(a,b); } template<> EIGEN_STRONG_INLINE Packet8s pand(const Packet8s& a, const Packet8s& b) { return vandq_s16(a,b); } template<> EIGEN_STRONG_INLINE Packet4us pand(const Packet4us& a, const Packet4us& b) { return vand_u16(a,b); } template<> EIGEN_STRONG_INLINE Packet8us pand(const Packet8us& a, const Packet8us& b) { return vandq_u16(a,b); } template<> EIGEN_STRONG_INLINE Packet2i pand(const Packet2i& a, const Packet2i& b) { return vand_s32(a,b); } template<> EIGEN_STRONG_INLINE Packet4i pand(const Packet4i& a, const Packet4i& b) { return vandq_s32(a,b); } template<> EIGEN_STRONG_INLINE Packet2ui pand(const Packet2ui& a, const Packet2ui& b) { return vand_u32(a,b); } template<> EIGEN_STRONG_INLINE Packet4ui pand(const Packet4ui& a, const Packet4ui& b) { return vandq_u32(a,b); } template<> EIGEN_STRONG_INLINE Packet2l pand(const Packet2l& a, const Packet2l& b) { return vandq_s64(a,b); } template<> EIGEN_STRONG_INLINE Packet2ul pand(const Packet2ul& a, const Packet2ul& b) { return vandq_u64(a,b); } template<> EIGEN_STRONG_INLINE Packet2f por(const Packet2f& a, const Packet2f& b) { return vreinterpret_f32_u32(vorr_u32(vreinterpret_u32_f32(a),vreinterpret_u32_f32(b))); } template<> EIGEN_STRONG_INLINE Packet4f por(const Packet4f& a, const Packet4f& b) { return vreinterpretq_f32_u32(vorrq_u32(vreinterpretq_u32_f32(a),vreinterpretq_u32_f32(b))); } template<> EIGEN_STRONG_INLINE Packet4c por(const Packet4c& a, const Packet4c& b) { return a | b; } template<> EIGEN_STRONG_INLINE Packet8c por(const Packet8c& a, const Packet8c& b) { return vorr_s8(a,b); } template<> EIGEN_STRONG_INLINE Packet16c por(const Packet16c& a, const Packet16c& b) { return vorrq_s8(a,b); } template<> EIGEN_STRONG_INLINE Packet4uc por(const Packet4uc& a, const Packet4uc& b) { return a | b; } template<> EIGEN_STRONG_INLINE Packet8uc por(const Packet8uc& a, const Packet8uc& b) { return vorr_u8(a,b); } template<> EIGEN_STRONG_INLINE Packet16uc por(const Packet16uc& a, const Packet16uc& b) { return vorrq_u8(a,b); } template<> EIGEN_STRONG_INLINE Packet4s por(const Packet4s& a, const Packet4s& b) { return vorr_s16(a,b); } template<> EIGEN_STRONG_INLINE Packet8s por(const Packet8s& a, const Packet8s& b) { return vorrq_s16(a,b); } template<> EIGEN_STRONG_INLINE Packet4us por(const Packet4us& a, const Packet4us& b) { return vorr_u16(a,b); } template<> EIGEN_STRONG_INLINE Packet8us por(const Packet8us& a, const Packet8us& b) { return vorrq_u16(a,b); } template<> EIGEN_STRONG_INLINE Packet2i por(const Packet2i& a, const Packet2i& b) { return vorr_s32(a,b); } template<> EIGEN_STRONG_INLINE Packet4i por(const Packet4i& a, const Packet4i& b) { return vorrq_s32(a,b); } template<> EIGEN_STRONG_INLINE Packet2ui por(const Packet2ui& a, const Packet2ui& b) { return vorr_u32(a,b); } template<> EIGEN_STRONG_INLINE Packet4ui por(const Packet4ui& a, const Packet4ui& b) { return vorrq_u32(a,b); } template<> EIGEN_STRONG_INLINE Packet2l por(const Packet2l& a, const Packet2l& b) { return vorrq_s64(a,b); } template<> EIGEN_STRONG_INLINE Packet2ul por(const Packet2ul& a, const Packet2ul& b) { return vorrq_u64(a,b); } template<> EIGEN_STRONG_INLINE Packet2f pxor(const Packet2f& a, const Packet2f& b) { return vreinterpret_f32_u32(veor_u32(vreinterpret_u32_f32(a),vreinterpret_u32_f32(b))); } template<> EIGEN_STRONG_INLINE Packet4f pxor(const Packet4f& a, const Packet4f& b) { return vreinterpretq_f32_u32(veorq_u32(vreinterpretq_u32_f32(a),vreinterpretq_u32_f32(b))); } template<> EIGEN_STRONG_INLINE Packet4c pxor(const Packet4c& a, const Packet4c& b) { return a ^ b; } template<> EIGEN_STRONG_INLINE Packet8c pxor(const Packet8c& a, const Packet8c& b) { return veor_s8(a,b); } template<> EIGEN_STRONG_INLINE Packet16c pxor(const Packet16c& a, const Packet16c& b) { return veorq_s8(a,b); } template<> EIGEN_STRONG_INLINE Packet4uc pxor(const Packet4uc& a, const Packet4uc& b) { return a ^ b; } template<> EIGEN_STRONG_INLINE Packet8uc pxor(const Packet8uc& a, const Packet8uc& b) { return veor_u8(a,b); } template<> EIGEN_STRONG_INLINE Packet16uc pxor(const Packet16uc& a, const Packet16uc& b) { return veorq_u8(a,b); } template<> EIGEN_STRONG_INLINE Packet4s pxor(const Packet4s& a, const Packet4s& b) { return veor_s16(a,b); } template<> EIGEN_STRONG_INLINE Packet8s pxor(const Packet8s& a, const Packet8s& b) { return veorq_s16(a,b); } template<> EIGEN_STRONG_INLINE Packet4us pxor(const Packet4us& a, const Packet4us& b) { return veor_u16(a,b); } template<> EIGEN_STRONG_INLINE Packet8us pxor(const Packet8us& a, const Packet8us& b) { return veorq_u16(a,b); } template<> EIGEN_STRONG_INLINE Packet2i pxor(const Packet2i& a, const Packet2i& b) { return veor_s32(a,b); } template<> EIGEN_STRONG_INLINE Packet4i pxor(const Packet4i& a, const Packet4i& b) { return veorq_s32(a,b); } template<> EIGEN_STRONG_INLINE Packet2ui pxor(const Packet2ui& a, const Packet2ui& b) { return veor_u32(a,b); } template<> EIGEN_STRONG_INLINE Packet4ui pxor(const Packet4ui& a, const Packet4ui& b) { return veorq_u32(a,b); } template<> EIGEN_STRONG_INLINE Packet2l pxor(const Packet2l& a, const Packet2l& b) { return veorq_s64(a,b); } template<> EIGEN_STRONG_INLINE Packet2ul pxor(const Packet2ul& a, const Packet2ul& b) { return veorq_u64(a,b); } template<> EIGEN_STRONG_INLINE Packet2f pandnot(const Packet2f& a, const Packet2f& b) { return vreinterpret_f32_u32(vbic_u32(vreinterpret_u32_f32(a),vreinterpret_u32_f32(b))); } template<> EIGEN_STRONG_INLINE Packet4f pandnot(const Packet4f& a, const Packet4f& b) { return vreinterpretq_f32_u32(vbicq_u32(vreinterpretq_u32_f32(a),vreinterpretq_u32_f32(b))); } template<> EIGEN_STRONG_INLINE Packet4c pandnot(const Packet4c& a, const Packet4c& b) { return a & ~b; } template<> EIGEN_STRONG_INLINE Packet8c pandnot(const Packet8c& a, const Packet8c& b) { return vbic_s8(a,b); } template<> EIGEN_STRONG_INLINE Packet16c pandnot(const Packet16c& a, const Packet16c& b) { return vbicq_s8(a,b); } template<> EIGEN_STRONG_INLINE Packet4uc pandnot(const Packet4uc& a, const Packet4uc& b) { return a & ~b; } template<> EIGEN_STRONG_INLINE Packet8uc pandnot(const Packet8uc& a, const Packet8uc& b) { return vbic_u8(a,b); } template<> EIGEN_STRONG_INLINE Packet16uc pandnot(const Packet16uc& a, const Packet16uc& b) { return vbicq_u8(a,b); } template<> EIGEN_STRONG_INLINE Packet4s pandnot(const Packet4s& a, const Packet4s& b) { return vbic_s16(a,b); } template<> EIGEN_STRONG_INLINE Packet8s pandnot(const Packet8s& a, const Packet8s& b) { return vbicq_s16(a,b); } template<> EIGEN_STRONG_INLINE Packet4us pandnot(const Packet4us& a, const Packet4us& b) { return vbic_u16(a,b); } template<> EIGEN_STRONG_INLINE Packet8us pandnot(const Packet8us& a, const Packet8us& b) { return vbicq_u16(a,b); } template<> EIGEN_STRONG_INLINE Packet2i pandnot(const Packet2i& a, const Packet2i& b) { return vbic_s32(a,b); } template<> EIGEN_STRONG_INLINE Packet4i pandnot(const Packet4i& a, const Packet4i& b) { return vbicq_s32(a,b); } template<> EIGEN_STRONG_INLINE Packet2ui pandnot(const Packet2ui& a, const Packet2ui& b) { return vbic_u32(a,b); } template<> EIGEN_STRONG_INLINE Packet4ui pandnot(const Packet4ui& a, const Packet4ui& b) { return vbicq_u32(a,b); } template<> EIGEN_STRONG_INLINE Packet2l pandnot(const Packet2l& a, const Packet2l& b) { return vbicq_s64(a,b); } template<> EIGEN_STRONG_INLINE Packet2ul pandnot(const Packet2ul& a, const Packet2ul& b) { return vbicq_u64(a,b); } template EIGEN_STRONG_INLINE Packet4c parithmetic_shift_right(Packet4c& a) { return vget_lane_s32(vreinterpret_s32_s8(vshr_n_s8(vreinterpret_s8_s32(vdup_n_s32(a)), N)), 0); } template EIGEN_STRONG_INLINE Packet8c parithmetic_shift_right(Packet8c a) { return vshr_n_s8(a,N); } template EIGEN_STRONG_INLINE Packet16c parithmetic_shift_right(Packet16c a) { return vshrq_n_s8(a,N); } template EIGEN_STRONG_INLINE Packet4uc parithmetic_shift_right(Packet4uc& a) { return vget_lane_u32(vreinterpret_u32_u8(vshr_n_u8(vreinterpret_u8_u32(vdup_n_u32(a)), N)), 0); } template EIGEN_STRONG_INLINE Packet8uc parithmetic_shift_right(Packet8uc a) { return vshr_n_u8(a,N); } template EIGEN_STRONG_INLINE Packet16uc parithmetic_shift_right(Packet16uc a) { return vshrq_n_u8(a,N); } template EIGEN_STRONG_INLINE Packet4s parithmetic_shift_right(Packet4s a) { return vshr_n_s16(a,N); } template EIGEN_STRONG_INLINE Packet8s parithmetic_shift_right(Packet8s a) { return vshrq_n_s16(a,N); } template EIGEN_STRONG_INLINE Packet4us parithmetic_shift_right(Packet4us a) { return vshr_n_u16(a,N); } template EIGEN_STRONG_INLINE Packet8us parithmetic_shift_right(Packet8us a) { return vshrq_n_u16(a,N); } template EIGEN_STRONG_INLINE Packet2i parithmetic_shift_right(Packet2i a) { return vshr_n_s32(a,N); } template EIGEN_STRONG_INLINE Packet4i parithmetic_shift_right(Packet4i a) { return vshrq_n_s32(a,N); } template EIGEN_STRONG_INLINE Packet2ui parithmetic_shift_right(Packet2ui a) { return vshr_n_u32(a,N); } template EIGEN_STRONG_INLINE Packet4ui parithmetic_shift_right(Packet4ui a) { return vshrq_n_u32(a,N); } template EIGEN_STRONG_INLINE Packet2l parithmetic_shift_right(Packet2l a) { return vshrq_n_s64(a,N); } template EIGEN_STRONG_INLINE Packet2ul parithmetic_shift_right(Packet2ul a) { return vshrq_n_u64(a,N); } template EIGEN_STRONG_INLINE Packet4c plogical_shift_right(Packet4c& a) { return vget_lane_s32(vreinterpret_s32_u8(vshr_n_u8(vreinterpret_u8_s32(vdup_n_s32(a)), N)), 0); } template EIGEN_STRONG_INLINE Packet8c plogical_shift_right(Packet8c a) { return vreinterpret_s8_u8(vshr_n_u8(vreinterpret_u8_s8(a),N)); } template EIGEN_STRONG_INLINE Packet16c plogical_shift_right(Packet16c a) { return vreinterpretq_s8_u8(vshrq_n_u8(vreinterpretq_u8_s8(a),N)); } template EIGEN_STRONG_INLINE Packet4uc plogical_shift_right(Packet4uc& a) { return vget_lane_u32(vreinterpret_u32_s8(vshr_n_s8(vreinterpret_s8_u32(vdup_n_u32(a)), N)), 0); } template EIGEN_STRONG_INLINE Packet8uc plogical_shift_right(Packet8uc a) { return vshr_n_u8(a,N); } template EIGEN_STRONG_INLINE Packet16uc plogical_shift_right(Packet16uc a) { return vshrq_n_u8(a,N); } template EIGEN_STRONG_INLINE Packet4s plogical_shift_right(Packet4s a) { return vreinterpret_s16_u16(vshr_n_u16(vreinterpret_u16_s16(a),N)); } template EIGEN_STRONG_INLINE Packet8s plogical_shift_right(Packet8s a) { return vreinterpretq_s16_u16(vshrq_n_u16(vreinterpretq_u16_s16(a),N)); } template EIGEN_STRONG_INLINE Packet4us plogical_shift_right(Packet4us a) { return vshr_n_u16(a,N); } template EIGEN_STRONG_INLINE Packet8us plogical_shift_right(Packet8us a) { return vshrq_n_u16(a,N); } template EIGEN_STRONG_INLINE Packet2i plogical_shift_right(Packet2i a) { return vreinterpret_s32_u32(vshr_n_u32(vreinterpret_u32_s32(a),N)); } template EIGEN_STRONG_INLINE Packet4i plogical_shift_right(Packet4i a) { return vreinterpretq_s32_u32(vshrq_n_u32(vreinterpretq_u32_s32(a),N)); } template EIGEN_STRONG_INLINE Packet2ui plogical_shift_right(Packet2ui a) { return vshr_n_u32(a,N); } template EIGEN_STRONG_INLINE Packet4ui plogical_shift_right(Packet4ui a) { return vshrq_n_u32(a,N); } template EIGEN_STRONG_INLINE Packet2l plogical_shift_right(Packet2l a) { return vreinterpretq_s64_u64(vshrq_n_u64(vreinterpretq_u64_s64(a),N)); } template EIGEN_STRONG_INLINE Packet2ul plogical_shift_right(Packet2ul a) { return vshrq_n_u64(a,N); } template EIGEN_STRONG_INLINE Packet4c plogical_shift_left(Packet4c& a) { return vget_lane_s32(vreinterpret_s32_s8(vshl_n_s8(vreinterpret_s8_s32(vdup_n_s32(a)), N)), 0); } template EIGEN_STRONG_INLINE Packet8c plogical_shift_left(Packet8c a) { return vshl_n_s8(a,N); } template EIGEN_STRONG_INLINE Packet16c plogical_shift_left(Packet16c a) { return vshlq_n_s8(a,N); } template EIGEN_STRONG_INLINE Packet4uc plogical_shift_left(Packet4uc& a) { return vget_lane_u32(vreinterpret_u32_u8(vshl_n_u8(vreinterpret_u8_u32(vdup_n_u32(a)), N)), 0); } template EIGEN_STRONG_INLINE Packet8uc plogical_shift_left(Packet8uc a) { return vshl_n_u8(a,N); } template EIGEN_STRONG_INLINE Packet16uc plogical_shift_left(Packet16uc a) { return vshlq_n_u8(a,N); } template EIGEN_STRONG_INLINE Packet4s plogical_shift_left(Packet4s a) { return vshl_n_s16(a,N); } template EIGEN_STRONG_INLINE Packet8s plogical_shift_left(Packet8s a) { return vshlq_n_s16(a,N); } template EIGEN_STRONG_INLINE Packet4us plogical_shift_left(Packet4us a) { return vshl_n_u16(a,N); } template EIGEN_STRONG_INLINE Packet8us plogical_shift_left(Packet8us a) { return vshlq_n_u16(a,N); } template EIGEN_STRONG_INLINE Packet2i plogical_shift_left(Packet2i a) { return vshl_n_s32(a,N); } template EIGEN_STRONG_INLINE Packet4i plogical_shift_left(Packet4i a) { return vshlq_n_s32(a,N); } template EIGEN_STRONG_INLINE Packet2ui plogical_shift_left(Packet2ui a) { return vshl_n_u32(a,N); } template EIGEN_STRONG_INLINE Packet4ui plogical_shift_left(Packet4ui a) { return vshlq_n_u32(a,N); } template EIGEN_STRONG_INLINE Packet2l plogical_shift_left(Packet2l a) { return vshlq_n_s64(a,N); } template EIGEN_STRONG_INLINE Packet2ul plogical_shift_left(Packet2ul a) { return vshlq_n_u64(a,N); } template<> EIGEN_STRONG_INLINE Packet2f pload(const float* from) { EIGEN_DEBUG_ALIGNED_LOAD return vld1_f32(from); } template<> EIGEN_STRONG_INLINE Packet4f pload(const float* from) { EIGEN_DEBUG_ALIGNED_LOAD return vld1q_f32(from); } template<> EIGEN_STRONG_INLINE Packet4c pload(const int8_t* from) { Packet4c res; memcpy(&res, from, sizeof(Packet4c)); return res; } template<> EIGEN_STRONG_INLINE Packet8c pload(const int8_t* from) { EIGEN_DEBUG_ALIGNED_LOAD return vld1_s8(from); } template<> EIGEN_STRONG_INLINE Packet16c pload(const int8_t* from) { EIGEN_DEBUG_ALIGNED_LOAD return vld1q_s8(from); } template<> EIGEN_STRONG_INLINE Packet4uc pload(const uint8_t* from) { Packet4uc res; memcpy(&res, from, sizeof(Packet4uc)); return res; } template<> EIGEN_STRONG_INLINE Packet8uc pload(const uint8_t* from) { EIGEN_DEBUG_ALIGNED_LOAD return vld1_u8(from); } template<> EIGEN_STRONG_INLINE Packet16uc pload(const uint8_t* from) { EIGEN_DEBUG_ALIGNED_LOAD return vld1q_u8(from); } template<> EIGEN_STRONG_INLINE Packet4s pload(const int16_t* from) { EIGEN_DEBUG_ALIGNED_LOAD return vld1_s16(from); } template<> EIGEN_STRONG_INLINE Packet8s pload(const int16_t* from) { EIGEN_DEBUG_ALIGNED_LOAD return vld1q_s16(from); } template<> EIGEN_STRONG_INLINE Packet4us pload(const uint16_t* from) { EIGEN_DEBUG_ALIGNED_LOAD return vld1_u16(from); } template<> EIGEN_STRONG_INLINE Packet8us pload(const uint16_t* from) { EIGEN_DEBUG_ALIGNED_LOAD return vld1q_u16(from); } template<> EIGEN_STRONG_INLINE Packet2i pload(const int32_t* from) { EIGEN_DEBUG_ALIGNED_LOAD return vld1_s32(from); } template<> EIGEN_STRONG_INLINE Packet4i pload(const int32_t* from) { EIGEN_DEBUG_ALIGNED_LOAD return vld1q_s32(from); } template<> EIGEN_STRONG_INLINE Packet2ui pload(const uint32_t* from) { EIGEN_DEBUG_ALIGNED_LOAD return vld1_u32(from); } template<> EIGEN_STRONG_INLINE Packet4ui pload(const uint32_t* from) { EIGEN_DEBUG_ALIGNED_LOAD return vld1q_u32(from); } template<> EIGEN_STRONG_INLINE Packet2l pload(const int64_t* from) { EIGEN_DEBUG_ALIGNED_LOAD return vld1q_s64(from); } template<> EIGEN_STRONG_INLINE Packet2ul pload(const uint64_t* from) { EIGEN_DEBUG_ALIGNED_LOAD return vld1q_u64(from); } template<> EIGEN_STRONG_INLINE Packet2f ploadu(const float* from) { EIGEN_DEBUG_UNALIGNED_LOAD return vld1_f32(from); } template<> EIGEN_STRONG_INLINE Packet4f ploadu(const float* from) { EIGEN_DEBUG_UNALIGNED_LOAD return vld1q_f32(from); } template<> EIGEN_STRONG_INLINE Packet4c ploadu(const int8_t* from) { Packet4c res; memcpy(&res, from, sizeof(Packet4c)); return res; } template<> EIGEN_STRONG_INLINE Packet8c ploadu(const int8_t* from) { EIGEN_DEBUG_UNALIGNED_LOAD return vld1_s8(from); } template<> EIGEN_STRONG_INLINE Packet16c ploadu(const int8_t* from) { EIGEN_DEBUG_UNALIGNED_LOAD return vld1q_s8(from); } template<> EIGEN_STRONG_INLINE Packet4uc ploadu(const uint8_t* from) { Packet4uc res; memcpy(&res, from, sizeof(Packet4uc)); return res; } template<> EIGEN_STRONG_INLINE Packet8uc ploadu(const uint8_t* from) { EIGEN_DEBUG_UNALIGNED_LOAD return vld1_u8(from); } template<> EIGEN_STRONG_INLINE Packet16uc ploadu(const uint8_t* from) { EIGEN_DEBUG_UNALIGNED_LOAD return vld1q_u8(from); } template<> EIGEN_STRONG_INLINE Packet4s ploadu(const int16_t* from) { EIGEN_DEBUG_UNALIGNED_LOAD return vld1_s16(from); } template<> EIGEN_STRONG_INLINE Packet8s ploadu(const int16_t* from) { EIGEN_DEBUG_UNALIGNED_LOAD return vld1q_s16(from); } template<> EIGEN_STRONG_INLINE Packet4us ploadu(const uint16_t* from) { EIGEN_DEBUG_UNALIGNED_LOAD return vld1_u16(from); } template<> EIGEN_STRONG_INLINE Packet8us ploadu(const uint16_t* from) { EIGEN_DEBUG_UNALIGNED_LOAD return vld1q_u16(from); } template<> EIGEN_STRONG_INLINE Packet2i ploadu(const int32_t* from) { EIGEN_DEBUG_UNALIGNED_LOAD return vld1_s32(from); } template<> EIGEN_STRONG_INLINE Packet4i ploadu(const int32_t* from) { EIGEN_DEBUG_UNALIGNED_LOAD return vld1q_s32(from); } template<> EIGEN_STRONG_INLINE Packet2ui ploadu(const uint32_t* from) { EIGEN_DEBUG_UNALIGNED_LOAD return vld1_u32(from); } template<> EIGEN_STRONG_INLINE Packet4ui ploadu(const uint32_t* from) { EIGEN_DEBUG_UNALIGNED_LOAD return vld1q_u32(from); } template<> EIGEN_STRONG_INLINE Packet2l ploadu(const int64_t* from) { EIGEN_DEBUG_UNALIGNED_LOAD return vld1q_s64(from); } template<> EIGEN_STRONG_INLINE Packet2ul ploadu(const uint64_t* from) { EIGEN_DEBUG_UNALIGNED_LOAD return vld1q_u64(from); } template<> EIGEN_STRONG_INLINE Packet2f ploaddup(const float* from) { return vld1_dup_f32(from); } template<> EIGEN_STRONG_INLINE Packet4f ploaddup(const float* from) { return vcombine_f32(vld1_dup_f32(from), vld1_dup_f32(from+1)); } template<> EIGEN_STRONG_INLINE Packet4c ploaddup(const int8_t* from) { const int8x8_t a = vreinterpret_s8_s32(vdup_n_s32(pload(from))); return vget_lane_s32(vreinterpret_s32_s8(vzip_s8(a,a).val[0]), 0); } template<> EIGEN_STRONG_INLINE Packet8c ploaddup(const int8_t* from) { const int8x8_t a = vld1_s8(from); return vzip_s8(a,a).val[0]; } template<> EIGEN_STRONG_INLINE Packet16c ploaddup(const int8_t* from) { const int8x8_t a = vld1_s8(from); const int8x8x2_t b = vzip_s8(a,a); return vcombine_s8(b.val[0], b.val[1]); } template<> EIGEN_STRONG_INLINE Packet4uc ploaddup(const uint8_t* from) { const uint8x8_t a = vreinterpret_u8_u32(vdup_n_u32(pload(from))); return vget_lane_u32(vreinterpret_u32_u8(vzip_u8(a,a).val[0]), 0); } template<> EIGEN_STRONG_INLINE Packet8uc ploaddup(const uint8_t* from) { const uint8x8_t a = vld1_u8(from); return vzip_u8(a,a).val[0]; } template<> EIGEN_STRONG_INLINE Packet16uc ploaddup(const uint8_t* from) { const uint8x8_t a = vld1_u8(from); const uint8x8x2_t b = vzip_u8(a,a); return vcombine_u8(b.val[0], b.val[1]); } template<> EIGEN_STRONG_INLINE Packet4s ploaddup(const int16_t* from) { return vreinterpret_s16_u32(vzip_u32(vreinterpret_u32_s16(vld1_dup_s16(from)), vreinterpret_u32_s16(vld1_dup_s16(from+1))).val[0]); } template<> EIGEN_STRONG_INLINE Packet8s ploaddup(const int16_t* from) { const int16x4_t a = vld1_s16(from); const int16x4x2_t b = vzip_s16(a,a); return vcombine_s16(b.val[0], b.val[1]); } template<> EIGEN_STRONG_INLINE Packet4us ploaddup(const uint16_t* from) { return vreinterpret_u16_u32(vzip_u32(vreinterpret_u32_u16(vld1_dup_u16(from)), vreinterpret_u32_u16(vld1_dup_u16(from+1))).val[0]); } template<> EIGEN_STRONG_INLINE Packet8us ploaddup(const uint16_t* from) { const uint16x4_t a = vld1_u16(from); const uint16x4x2_t b = vzip_u16(a,a); return vcombine_u16(b.val[0], b.val[1]); } template<> EIGEN_STRONG_INLINE Packet2i ploaddup(const int32_t* from) { return vld1_dup_s32(from); } template<> EIGEN_STRONG_INLINE Packet4i ploaddup(const int32_t* from) { return vcombine_s32(vld1_dup_s32(from), vld1_dup_s32(from+1)); } template<> EIGEN_STRONG_INLINE Packet2ui ploaddup(const uint32_t* from) { return vld1_dup_u32(from); } template<> EIGEN_STRONG_INLINE Packet4ui ploaddup(const uint32_t* from) { return vcombine_u32(vld1_dup_u32(from), vld1_dup_u32(from+1)); } template<> EIGEN_STRONG_INLINE Packet2l ploaddup(const int64_t* from) { return vld1q_dup_s64(from); } template<> EIGEN_STRONG_INLINE Packet2ul ploaddup(const uint64_t* from) { return vld1q_dup_u64(from); } template<> EIGEN_STRONG_INLINE Packet4f ploadquad(const float* from) { return vld1q_dup_f32(from); } template<> EIGEN_STRONG_INLINE Packet4c ploadquad(const int8_t* from) { return vget_lane_s32(vreinterpret_s32_s8(vld1_dup_s8(from)), 0); } template<> EIGEN_STRONG_INLINE Packet8c ploadquad(const int8_t* from) { return vreinterpret_s8_u32(vzip_u32( vreinterpret_u32_s8(vld1_dup_s8(from)), vreinterpret_u32_s8(vld1_dup_s8(from+1))).val[0]); } template<> EIGEN_STRONG_INLINE Packet16c ploadquad(const int8_t* from) { const int8x8_t a = vreinterpret_s8_u32(vzip_u32( vreinterpret_u32_s8(vld1_dup_s8(from)), vreinterpret_u32_s8(vld1_dup_s8(from+1))).val[0]); const int8x8_t b = vreinterpret_s8_u32(vzip_u32( vreinterpret_u32_s8(vld1_dup_s8(from+2)), vreinterpret_u32_s8(vld1_dup_s8(from+3))).val[0]); return vcombine_s8(a,b); } template<> EIGEN_STRONG_INLINE Packet4uc ploadquad(const uint8_t* from) { return vget_lane_u32(vreinterpret_u32_u8(vld1_dup_u8(from)), 0); } template<> EIGEN_STRONG_INLINE Packet8uc ploadquad(const uint8_t* from) { return vreinterpret_u8_u32(vzip_u32( vreinterpret_u32_u8(vld1_dup_u8(from)), vreinterpret_u32_u8(vld1_dup_u8(from+1))).val[0]); } template<> EIGEN_STRONG_INLINE Packet16uc ploadquad(const uint8_t* from) { const uint8x8_t a = vreinterpret_u8_u32(vzip_u32( vreinterpret_u32_u8(vld1_dup_u8(from)), vreinterpret_u32_u8(vld1_dup_u8(from+1))).val[0]); const uint8x8_t b = vreinterpret_u8_u32(vzip_u32( vreinterpret_u32_u8(vld1_dup_u8(from+2)), vreinterpret_u32_u8(vld1_dup_u8(from+3))).val[0]); return vcombine_u8(a,b); } template<> EIGEN_STRONG_INLINE Packet8s ploadquad(const int16_t* from) { return vcombine_s16(vld1_dup_s16(from), vld1_dup_s16(from+1)); } template<> EIGEN_STRONG_INLINE Packet8us ploadquad(const uint16_t* from) { return vcombine_u16(vld1_dup_u16(from), vld1_dup_u16(from+1)); } template<> EIGEN_STRONG_INLINE Packet4i ploadquad(const int32_t* from) { return vld1q_dup_s32(from); } template<> EIGEN_STRONG_INLINE Packet4ui ploadquad(const uint32_t* from) { return vld1q_dup_u32(from); } template<> EIGEN_STRONG_INLINE void pstore(float* to, const Packet2f& from) { EIGEN_DEBUG_ALIGNED_STORE vst1_f32(to,from); } template<> EIGEN_STRONG_INLINE void pstore(float* to, const Packet4f& from) { EIGEN_DEBUG_ALIGNED_STORE vst1q_f32(to,from); } template<> EIGEN_STRONG_INLINE void pstore(int8_t* to, const Packet4c& from) { memcpy(to, &from, sizeof(from)); } template<> EIGEN_STRONG_INLINE void pstore(int8_t* to, const Packet8c& from) { EIGEN_DEBUG_ALIGNED_STORE vst1_s8(to,from); } template<> EIGEN_STRONG_INLINE void pstore(int8_t* to, const Packet16c& from) { EIGEN_DEBUG_ALIGNED_STORE vst1q_s8(to,from); } template<> EIGEN_STRONG_INLINE void pstore(uint8_t* to, const Packet4uc& from) { memcpy(to, &from, sizeof(from)); } template<> EIGEN_STRONG_INLINE void pstore(uint8_t* to, const Packet8uc& from) { EIGEN_DEBUG_ALIGNED_STORE vst1_u8(to,from); } template<> EIGEN_STRONG_INLINE void pstore(uint8_t* to, const Packet16uc& from) { EIGEN_DEBUG_ALIGNED_STORE vst1q_u8(to,from); } template<> EIGEN_STRONG_INLINE void pstore(int16_t* to, const Packet4s& from) { EIGEN_DEBUG_ALIGNED_STORE vst1_s16(to,from); } template<> EIGEN_STRONG_INLINE void pstore(int16_t* to, const Packet8s& from) { EIGEN_DEBUG_ALIGNED_STORE vst1q_s16(to,from); } template<> EIGEN_STRONG_INLINE void pstore(uint16_t* to, const Packet4us& from) { EIGEN_DEBUG_ALIGNED_STORE vst1_u16(to,from); } template<> EIGEN_STRONG_INLINE void pstore(uint16_t* to, const Packet8us& from) { EIGEN_DEBUG_ALIGNED_STORE vst1q_u16(to,from); } template<> EIGEN_STRONG_INLINE void pstore(int32_t* to, const Packet2i& from) { EIGEN_DEBUG_ALIGNED_STORE vst1_s32(to,from); } template<> EIGEN_STRONG_INLINE void pstore(int32_t* to, const Packet4i& from) { EIGEN_DEBUG_ALIGNED_STORE vst1q_s32(to,from); } template<> EIGEN_STRONG_INLINE void pstore(uint32_t* to, const Packet2ui& from) { EIGEN_DEBUG_ALIGNED_STORE vst1_u32(to,from); } template<> EIGEN_STRONG_INLINE void pstore(uint32_t* to, const Packet4ui& from) { EIGEN_DEBUG_ALIGNED_STORE vst1q_u32(to,from); } template<> EIGEN_STRONG_INLINE void pstore(int64_t* to, const Packet2l& from) { EIGEN_DEBUG_ALIGNED_STORE vst1q_s64(to,from); } template<> EIGEN_STRONG_INLINE void pstore(uint64_t* to, const Packet2ul& from) { EIGEN_DEBUG_ALIGNED_STORE vst1q_u64(to,from); } template<> EIGEN_STRONG_INLINE void pstoreu(float* to, const Packet2f& from) { EIGEN_DEBUG_UNALIGNED_STORE vst1_f32(to,from); } template<> EIGEN_STRONG_INLINE void pstoreu(float* to, const Packet4f& from) { EIGEN_DEBUG_UNALIGNED_STORE vst1q_f32(to,from); } template<> EIGEN_STRONG_INLINE void pstoreu(int8_t* to, const Packet4c& from) { memcpy(to, &from, sizeof(from)); } template<> EIGEN_STRONG_INLINE void pstoreu(int8_t* to, const Packet8c& from) { EIGEN_DEBUG_UNALIGNED_STORE vst1_s8(to,from); } template<> EIGEN_STRONG_INLINE void pstoreu(int8_t* to, const Packet16c& from) { EIGEN_DEBUG_UNALIGNED_STORE vst1q_s8(to,from); } template<> EIGEN_STRONG_INLINE void pstoreu(uint8_t* to, const Packet4uc& from) { memcpy(to, &from, sizeof(from)); } template<> EIGEN_STRONG_INLINE void pstoreu(uint8_t* to, const Packet8uc& from) { EIGEN_DEBUG_UNALIGNED_STORE vst1_u8(to,from); } template<> EIGEN_STRONG_INLINE void pstoreu(uint8_t* to, const Packet16uc& from) { EIGEN_DEBUG_UNALIGNED_STORE vst1q_u8(to,from); } template<> EIGEN_STRONG_INLINE void pstoreu(int16_t* to, const Packet4s& from) { EIGEN_DEBUG_UNALIGNED_STORE vst1_s16(to,from); } template<> EIGEN_STRONG_INLINE void pstoreu(int16_t* to, const Packet8s& from) { EIGEN_DEBUG_UNALIGNED_STORE vst1q_s16(to,from); } template<> EIGEN_STRONG_INLINE void pstoreu(uint16_t* to, const Packet4us& from) { EIGEN_DEBUG_UNALIGNED_STORE vst1_u16(to,from); } template<> EIGEN_STRONG_INLINE void pstoreu(uint16_t* to, const Packet8us& from) { EIGEN_DEBUG_UNALIGNED_STORE vst1q_u16(to,from); } template<> EIGEN_STRONG_INLINE void pstoreu(int32_t* to, const Packet2i& from) { EIGEN_DEBUG_UNALIGNED_STORE vst1_s32(to,from); } template<> EIGEN_STRONG_INLINE void pstoreu(int32_t* to, const Packet4i& from) { EIGEN_DEBUG_UNALIGNED_STORE vst1q_s32(to,from); } template<> EIGEN_STRONG_INLINE void pstoreu(uint32_t* to, const Packet2ui& from) { EIGEN_DEBUG_UNALIGNED_STORE vst1_u32(to,from); } template<> EIGEN_STRONG_INLINE void pstoreu(uint32_t* to, const Packet4ui& from) { EIGEN_DEBUG_UNALIGNED_STORE vst1q_u32(to,from); } template<> EIGEN_STRONG_INLINE void pstoreu(int64_t* to, const Packet2l& from) { EIGEN_DEBUG_UNALIGNED_STORE vst1q_s64(to,from); } template<> EIGEN_STRONG_INLINE void pstoreu(uint64_t* to, const Packet2ul& from) { EIGEN_DEBUG_UNALIGNED_STORE vst1q_u64(to,from); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet2f pgather(const float* from, Index stride) { Packet2f res = vld1_dup_f32(from); res = vld1_lane_f32(from + 1*stride, res, 1); return res; } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4f pgather(const float* from, Index stride) { Packet4f res = vld1q_dup_f32(from); res = vld1q_lane_f32(from + 1*stride, res, 1); res = vld1q_lane_f32(from + 2*stride, res, 2); res = vld1q_lane_f32(from + 3*stride, res, 3); return res; } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4c pgather(const int8_t* from, Index stride) { Packet4c res; for (int i = 0; i != 4; i++) reinterpret_cast(&res)[i] = *(from + i * stride); return res; } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet8c pgather(const int8_t* from, Index stride) { Packet8c res = vld1_dup_s8(from); res = vld1_lane_s8(from + 1*stride, res, 1); res = vld1_lane_s8(from + 2*stride, res, 2); res = vld1_lane_s8(from + 3*stride, res, 3); res = vld1_lane_s8(from + 4*stride, res, 4); res = vld1_lane_s8(from + 5*stride, res, 5); res = vld1_lane_s8(from + 6*stride, res, 6); res = vld1_lane_s8(from + 7*stride, res, 7); return res; } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet16c pgather(const int8_t* from, Index stride) { Packet16c res = vld1q_dup_s8(from); res = vld1q_lane_s8(from + 1*stride, res, 1); res = vld1q_lane_s8(from + 2*stride, res, 2); res = vld1q_lane_s8(from + 3*stride, res, 3); res = vld1q_lane_s8(from + 4*stride, res, 4); res = vld1q_lane_s8(from + 5*stride, res, 5); res = vld1q_lane_s8(from + 6*stride, res, 6); res = vld1q_lane_s8(from + 7*stride, res, 7); res = vld1q_lane_s8(from + 8*stride, res, 8); res = vld1q_lane_s8(from + 9*stride, res, 9); res = vld1q_lane_s8(from + 10*stride, res, 10); res = vld1q_lane_s8(from + 11*stride, res, 11); res = vld1q_lane_s8(from + 12*stride, res, 12); res = vld1q_lane_s8(from + 13*stride, res, 13); res = vld1q_lane_s8(from + 14*stride, res, 14); res = vld1q_lane_s8(from + 15*stride, res, 15); return res; } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4uc pgather(const uint8_t* from, Index stride) { Packet4uc res; for (int i = 0; i != 4; i++) reinterpret_cast(&res)[i] = *(from + i * stride); return res; } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet8uc pgather(const uint8_t* from, Index stride) { Packet8uc res = vld1_dup_u8(from); res = vld1_lane_u8(from + 1*stride, res, 1); res = vld1_lane_u8(from + 2*stride, res, 2); res = vld1_lane_u8(from + 3*stride, res, 3); res = vld1_lane_u8(from + 4*stride, res, 4); res = vld1_lane_u8(from + 5*stride, res, 5); res = vld1_lane_u8(from + 6*stride, res, 6); res = vld1_lane_u8(from + 7*stride, res, 7); return res; } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet16uc pgather(const uint8_t* from, Index stride) { Packet16uc res = vld1q_dup_u8(from); res = vld1q_lane_u8(from + 1*stride, res, 1); res = vld1q_lane_u8(from + 2*stride, res, 2); res = vld1q_lane_u8(from + 3*stride, res, 3); res = vld1q_lane_u8(from + 4*stride, res, 4); res = vld1q_lane_u8(from + 5*stride, res, 5); res = vld1q_lane_u8(from + 6*stride, res, 6); res = vld1q_lane_u8(from + 7*stride, res, 7); res = vld1q_lane_u8(from + 8*stride, res, 8); res = vld1q_lane_u8(from + 9*stride, res, 9); res = vld1q_lane_u8(from + 10*stride, res, 10); res = vld1q_lane_u8(from + 11*stride, res, 11); res = vld1q_lane_u8(from + 12*stride, res, 12); res = vld1q_lane_u8(from + 13*stride, res, 13); res = vld1q_lane_u8(from + 14*stride, res, 14); res = vld1q_lane_u8(from + 15*stride, res, 15); return res; } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4s pgather(const int16_t* from, Index stride) { Packet4s res = vld1_dup_s16(from); res = vld1_lane_s16(from + 1*stride, res, 1); res = vld1_lane_s16(from + 2*stride, res, 2); res = vld1_lane_s16(from + 3*stride, res, 3); return res; } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet8s pgather(const int16_t* from, Index stride) { Packet8s res = vld1q_dup_s16(from); res = vld1q_lane_s16(from + 1*stride, res, 1); res = vld1q_lane_s16(from + 2*stride, res, 2); res = vld1q_lane_s16(from + 3*stride, res, 3); res = vld1q_lane_s16(from + 4*stride, res, 4); res = vld1q_lane_s16(from + 5*stride, res, 5); res = vld1q_lane_s16(from + 6*stride, res, 6); res = vld1q_lane_s16(from + 7*stride, res, 7); return res; } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4us pgather(const uint16_t* from, Index stride) { Packet4us res = vld1_dup_u16(from); res = vld1_lane_u16(from + 1*stride, res, 1); res = vld1_lane_u16(from + 2*stride, res, 2); res = vld1_lane_u16(from + 3*stride, res, 3); return res; } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet8us pgather(const uint16_t* from, Index stride) { Packet8us res = vld1q_dup_u16(from); res = vld1q_lane_u16(from + 1*stride, res, 1); res = vld1q_lane_u16(from + 2*stride, res, 2); res = vld1q_lane_u16(from + 3*stride, res, 3); res = vld1q_lane_u16(from + 4*stride, res, 4); res = vld1q_lane_u16(from + 5*stride, res, 5); res = vld1q_lane_u16(from + 6*stride, res, 6); res = vld1q_lane_u16(from + 7*stride, res, 7); return res; } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet2i pgather(const int32_t* from, Index stride) { Packet2i res = vld1_dup_s32(from); res = vld1_lane_s32(from + 1*stride, res, 1); return res; } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4i pgather(const int32_t* from, Index stride) { Packet4i res = vld1q_dup_s32(from); res = vld1q_lane_s32(from + 1*stride, res, 1); res = vld1q_lane_s32(from + 2*stride, res, 2); res = vld1q_lane_s32(from + 3*stride, res, 3); return res; } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet2ui pgather(const uint32_t* from, Index stride) { Packet2ui res = vld1_dup_u32(from); res = vld1_lane_u32(from + 1*stride, res, 1); return res; } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4ui pgather(const uint32_t* from, Index stride) { Packet4ui res = vld1q_dup_u32(from); res = vld1q_lane_u32(from + 1*stride, res, 1); res = vld1q_lane_u32(from + 2*stride, res, 2); res = vld1q_lane_u32(from + 3*stride, res, 3); return res; } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet2l pgather(const int64_t* from, Index stride) { Packet2l res = vld1q_dup_s64(from); res = vld1q_lane_s64(from + 1*stride, res, 1); return res; } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet2ul pgather(const uint64_t* from, Index stride) { Packet2ul res = vld1q_dup_u64(from); res = vld1q_lane_u64(from + 1*stride, res, 1); return res; } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pscatter(float* to, const Packet2f& from, Index stride) { vst1_lane_f32(to + stride*0, from, 0); vst1_lane_f32(to + stride*1, from, 1); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pscatter(float* to, const Packet4f& from, Index stride) { vst1q_lane_f32(to + stride*0, from, 0); vst1q_lane_f32(to + stride*1, from, 1); vst1q_lane_f32(to + stride*2, from, 2); vst1q_lane_f32(to + stride*3, from, 3); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pscatter(int8_t* to, const Packet4c& from, Index stride) { for (int i = 0; i != 4; i++) *(to + i * stride) = reinterpret_cast(&from)[i]; } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pscatter(int8_t* to, const Packet8c& from, Index stride) { vst1_lane_s8(to + stride*0, from, 0); vst1_lane_s8(to + stride*1, from, 1); vst1_lane_s8(to + stride*2, from, 2); vst1_lane_s8(to + stride*3, from, 3); vst1_lane_s8(to + stride*4, from, 4); vst1_lane_s8(to + stride*5, from, 5); vst1_lane_s8(to + stride*6, from, 6); vst1_lane_s8(to + stride*7, from, 7); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pscatter(int8_t* to, const Packet16c& from, Index stride) { vst1q_lane_s8(to + stride*0, from, 0); vst1q_lane_s8(to + stride*1, from, 1); vst1q_lane_s8(to + stride*2, from, 2); vst1q_lane_s8(to + stride*3, from, 3); vst1q_lane_s8(to + stride*4, from, 4); vst1q_lane_s8(to + stride*5, from, 5); vst1q_lane_s8(to + stride*6, from, 6); vst1q_lane_s8(to + stride*7, from, 7); vst1q_lane_s8(to + stride*8, from, 8); vst1q_lane_s8(to + stride*9, from, 9); vst1q_lane_s8(to + stride*10, from, 10); vst1q_lane_s8(to + stride*11, from, 11); vst1q_lane_s8(to + stride*12, from, 12); vst1q_lane_s8(to + stride*13, from, 13); vst1q_lane_s8(to + stride*14, from, 14); vst1q_lane_s8(to + stride*15, from, 15); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pscatter(uint8_t* to, const Packet4uc& from, Index stride) { for (int i = 0; i != 4; i++) *(to + i * stride) = reinterpret_cast(&from)[i]; } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pscatter(uint8_t* to, const Packet8uc& from, Index stride) { vst1_lane_u8(to + stride*0, from, 0); vst1_lane_u8(to + stride*1, from, 1); vst1_lane_u8(to + stride*2, from, 2); vst1_lane_u8(to + stride*3, from, 3); vst1_lane_u8(to + stride*4, from, 4); vst1_lane_u8(to + stride*5, from, 5); vst1_lane_u8(to + stride*6, from, 6); vst1_lane_u8(to + stride*7, from, 7); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pscatter(uint8_t* to, const Packet16uc& from, Index stride) { vst1q_lane_u8(to + stride*0, from, 0); vst1q_lane_u8(to + stride*1, from, 1); vst1q_lane_u8(to + stride*2, from, 2); vst1q_lane_u8(to + stride*3, from, 3); vst1q_lane_u8(to + stride*4, from, 4); vst1q_lane_u8(to + stride*5, from, 5); vst1q_lane_u8(to + stride*6, from, 6); vst1q_lane_u8(to + stride*7, from, 7); vst1q_lane_u8(to + stride*8, from, 8); vst1q_lane_u8(to + stride*9, from, 9); vst1q_lane_u8(to + stride*10, from, 10); vst1q_lane_u8(to + stride*11, from, 11); vst1q_lane_u8(to + stride*12, from, 12); vst1q_lane_u8(to + stride*13, from, 13); vst1q_lane_u8(to + stride*14, from, 14); vst1q_lane_u8(to + stride*15, from, 15); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pscatter(int16_t* to, const Packet4s& from, Index stride) { vst1_lane_s16(to + stride*0, from, 0); vst1_lane_s16(to + stride*1, from, 1); vst1_lane_s16(to + stride*2, from, 2); vst1_lane_s16(to + stride*3, from, 3); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pscatter(int16_t* to, const Packet8s& from, Index stride) { vst1q_lane_s16(to + stride*0, from, 0); vst1q_lane_s16(to + stride*1, from, 1); vst1q_lane_s16(to + stride*2, from, 2); vst1q_lane_s16(to + stride*3, from, 3); vst1q_lane_s16(to + stride*4, from, 4); vst1q_lane_s16(to + stride*5, from, 5); vst1q_lane_s16(to + stride*6, from, 6); vst1q_lane_s16(to + stride*7, from, 7); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pscatter(uint16_t* to, const Packet4us& from, Index stride) { vst1_lane_u16(to + stride*0, from, 0); vst1_lane_u16(to + stride*1, from, 1); vst1_lane_u16(to + stride*2, from, 2); vst1_lane_u16(to + stride*3, from, 3); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pscatter(uint16_t* to, const Packet8us& from, Index stride) { vst1q_lane_u16(to + stride*0, from, 0); vst1q_lane_u16(to + stride*1, from, 1); vst1q_lane_u16(to + stride*2, from, 2); vst1q_lane_u16(to + stride*3, from, 3); vst1q_lane_u16(to + stride*4, from, 4); vst1q_lane_u16(to + stride*5, from, 5); vst1q_lane_u16(to + stride*6, from, 6); vst1q_lane_u16(to + stride*7, from, 7); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pscatter(int32_t* to, const Packet2i& from, Index stride) { vst1_lane_s32(to + stride*0, from, 0); vst1_lane_s32(to + stride*1, from, 1); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pscatter(int32_t* to, const Packet4i& from, Index stride) { vst1q_lane_s32(to + stride*0, from, 0); vst1q_lane_s32(to + stride*1, from, 1); vst1q_lane_s32(to + stride*2, from, 2); vst1q_lane_s32(to + stride*3, from, 3); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pscatter(uint32_t* to, const Packet2ui& from, Index stride) { vst1_lane_u32(to + stride*0, from, 0); vst1_lane_u32(to + stride*1, from, 1); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pscatter(uint32_t* to, const Packet4ui& from, Index stride) { vst1q_lane_u32(to + stride*0, from, 0); vst1q_lane_u32(to + stride*1, from, 1); vst1q_lane_u32(to + stride*2, from, 2); vst1q_lane_u32(to + stride*3, from, 3); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pscatter(int64_t* to, const Packet2l& from, Index stride) { vst1q_lane_s64(to + stride*0, from, 0); vst1q_lane_s64(to + stride*1, from, 1); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pscatter(uint64_t* to, const Packet2ul& from, Index stride) { vst1q_lane_u64(to + stride*0, from, 0); vst1q_lane_u64(to + stride*1, from, 1); } template<> EIGEN_STRONG_INLINE void prefetch(const float* addr) { EIGEN_ARM_PREFETCH(addr); } template<> EIGEN_STRONG_INLINE void prefetch(const int8_t* addr) { EIGEN_ARM_PREFETCH(addr); } template<> EIGEN_STRONG_INLINE void prefetch(const uint8_t* addr) { EIGEN_ARM_PREFETCH(addr); } template<> EIGEN_STRONG_INLINE void prefetch(const int16_t* addr) { EIGEN_ARM_PREFETCH(addr); } template<> EIGEN_STRONG_INLINE void prefetch(const uint16_t* addr) { EIGEN_ARM_PREFETCH(addr); } template<> EIGEN_STRONG_INLINE void prefetch(const int32_t* addr) { EIGEN_ARM_PREFETCH(addr); } template<> EIGEN_STRONG_INLINE void prefetch(const uint32_t* addr) { EIGEN_ARM_PREFETCH(addr); } template<> EIGEN_STRONG_INLINE void prefetch(const int64_t* addr) { EIGEN_ARM_PREFETCH(addr); } template<> EIGEN_STRONG_INLINE void prefetch(const uint64_t* addr) { EIGEN_ARM_PREFETCH(addr); } template<> EIGEN_STRONG_INLINE float pfirst(const Packet2f& a) { return vget_lane_f32(a,0); } template<> EIGEN_STRONG_INLINE float pfirst(const Packet4f& a) { return vgetq_lane_f32(a,0); } template<> EIGEN_STRONG_INLINE int8_t pfirst(const Packet4c& a) { return static_cast(a & 0xff); } template<> EIGEN_STRONG_INLINE int8_t pfirst(const Packet8c& a) { return vget_lane_s8(a,0); } template<> EIGEN_STRONG_INLINE int8_t pfirst(const Packet16c& a) { return vgetq_lane_s8(a,0); } template<> EIGEN_STRONG_INLINE uint8_t pfirst(const Packet4uc& a) { return static_cast(a & 0xff); } template<> EIGEN_STRONG_INLINE uint8_t pfirst(const Packet8uc& a) { return vget_lane_u8(a,0); } template<> EIGEN_STRONG_INLINE uint8_t pfirst(const Packet16uc& a) { return vgetq_lane_u8(a,0); } template<> EIGEN_STRONG_INLINE int16_t pfirst(const Packet4s& a) { return vget_lane_s16(a,0); } template<> EIGEN_STRONG_INLINE int16_t pfirst(const Packet8s& a) { return vgetq_lane_s16(a,0); } template<> EIGEN_STRONG_INLINE uint16_t pfirst(const Packet4us& a) { return vget_lane_u16(a,0); } template<> EIGEN_STRONG_INLINE uint16_t pfirst(const Packet8us& a) { return vgetq_lane_u16(a,0); } template<> EIGEN_STRONG_INLINE int32_t pfirst(const Packet2i& a) { return vget_lane_s32(a,0); } template<> EIGEN_STRONG_INLINE int32_t pfirst(const Packet4i& a) { return vgetq_lane_s32(a,0); } template<> EIGEN_STRONG_INLINE uint32_t pfirst(const Packet2ui& a) { return vget_lane_u32(a,0); } template<> EIGEN_STRONG_INLINE uint32_t pfirst(const Packet4ui& a) { return vgetq_lane_u32(a,0); } template<> EIGEN_STRONG_INLINE int64_t pfirst(const Packet2l& a) { return vgetq_lane_s64(a,0); } template<> EIGEN_STRONG_INLINE uint64_t pfirst(const Packet2ul& a) { return vgetq_lane_u64(a,0); } template<> EIGEN_STRONG_INLINE Packet2f preverse(const Packet2f& a) { return vrev64_f32(a); } template<> EIGEN_STRONG_INLINE Packet4f preverse(const Packet4f& a) { const float32x4_t a_r64 = vrev64q_f32(a); return vcombine_f32(vget_high_f32(a_r64), vget_low_f32(a_r64)); } template<> EIGEN_STRONG_INLINE Packet4c preverse(const Packet4c& a) { return vget_lane_s32(vreinterpret_s32_s8(vrev64_s8(vreinterpret_s8_s32(vdup_n_s32(a)))), 0); } template<> EIGEN_STRONG_INLINE Packet8c preverse(const Packet8c& a) { return vrev64_s8(a); } template<> EIGEN_STRONG_INLINE Packet16c preverse(const Packet16c& a) { const int8x16_t a_r64 = vrev64q_s8(a); return vcombine_s8(vget_high_s8(a_r64), vget_low_s8(a_r64)); } template<> EIGEN_STRONG_INLINE Packet4uc preverse(const Packet4uc& a) { return vget_lane_u32(vreinterpret_u32_u8(vrev64_u8(vreinterpret_u8_u32(vdup_n_u32(a)))), 0); } template<> EIGEN_STRONG_INLINE Packet8uc preverse(const Packet8uc& a) { return vrev64_u8(a); } template<> EIGEN_STRONG_INLINE Packet16uc preverse(const Packet16uc& a) { const uint8x16_t a_r64 = vrev64q_u8(a); return vcombine_u8(vget_high_u8(a_r64), vget_low_u8(a_r64)); } template<> EIGEN_STRONG_INLINE Packet4s preverse(const Packet4s& a) { return vrev64_s16(a); } template<> EIGEN_STRONG_INLINE Packet8s preverse(const Packet8s& a) { const int16x8_t a_r64 = vrev64q_s16(a); return vcombine_s16(vget_high_s16(a_r64), vget_low_s16(a_r64)); } template<> EIGEN_STRONG_INLINE Packet4us preverse(const Packet4us& a) { return vrev64_u16(a); } template<> EIGEN_STRONG_INLINE Packet8us preverse(const Packet8us& a) { const uint16x8_t a_r64 = vrev64q_u16(a); return vcombine_u16(vget_high_u16(a_r64), vget_low_u16(a_r64)); } template<> EIGEN_STRONG_INLINE Packet2i preverse(const Packet2i& a) { return vrev64_s32(a); } template<> EIGEN_STRONG_INLINE Packet4i preverse(const Packet4i& a) { const int32x4_t a_r64 = vrev64q_s32(a); return vcombine_s32(vget_high_s32(a_r64), vget_low_s32(a_r64)); } template<> EIGEN_STRONG_INLINE Packet2ui preverse(const Packet2ui& a) { return vrev64_u32(a); } template<> EIGEN_STRONG_INLINE Packet4ui preverse(const Packet4ui& a) { const uint32x4_t a_r64 = vrev64q_u32(a); return vcombine_u32(vget_high_u32(a_r64), vget_low_u32(a_r64)); } template<> EIGEN_STRONG_INLINE Packet2l preverse(const Packet2l& a) { return vcombine_s64(vget_high_s64(a), vget_low_s64(a)); } template<> EIGEN_STRONG_INLINE Packet2ul preverse(const Packet2ul& a) { return vcombine_u64(vget_high_u64(a), vget_low_u64(a)); } template<> EIGEN_STRONG_INLINE Packet2f pabs(const Packet2f& a) { return vabs_f32(a); } template<> EIGEN_STRONG_INLINE Packet4f pabs(const Packet4f& a) { return vabsq_f32(a); } template<> EIGEN_STRONG_INLINE Packet4c pabs(const Packet4c& a) { return vget_lane_s32(vreinterpret_s32_s8(vabs_s8(vreinterpret_s8_s32(vdup_n_s32(a)))), 0); } template<> EIGEN_STRONG_INLINE Packet8c pabs(const Packet8c& a) { return vabs_s8(a); } template<> EIGEN_STRONG_INLINE Packet16c pabs(const Packet16c& a) { return vabsq_s8(a); } template<> EIGEN_STRONG_INLINE Packet4uc pabs(const Packet4uc& a) { return a; } template<> EIGEN_STRONG_INLINE Packet8uc pabs(const Packet8uc& a) { return a; } template<> EIGEN_STRONG_INLINE Packet16uc pabs(const Packet16uc& a) { return a; } template<> EIGEN_STRONG_INLINE Packet4s pabs(const Packet4s& a) { return vabs_s16(a); } template<> EIGEN_STRONG_INLINE Packet8s pabs(const Packet8s& a) { return vabsq_s16(a); } template<> EIGEN_STRONG_INLINE Packet4us pabs(const Packet4us& a) { return a; } template<> EIGEN_STRONG_INLINE Packet8us pabs(const Packet8us& a) { return a; } template<> EIGEN_STRONG_INLINE Packet2i pabs(const Packet2i& a) { return vabs_s32(a); } template<> EIGEN_STRONG_INLINE Packet4i pabs(const Packet4i& a) { return vabsq_s32(a); } template<> EIGEN_STRONG_INLINE Packet2ui pabs(const Packet2ui& a) { return a; } template<> EIGEN_STRONG_INLINE Packet4ui pabs(const Packet4ui& a) { return a; } template<> EIGEN_STRONG_INLINE Packet2l pabs(const Packet2l& a) { #if EIGEN_ARCH_ARM64 return vabsq_s64(a); #else return vcombine_s64( vdup_n_s64((std::abs)(vgetq_lane_s64(a, 0))), vdup_n_s64((std::abs)(vgetq_lane_s64(a, 1)))); #endif } template<> EIGEN_STRONG_INLINE Packet2ul pabs(const Packet2ul& a) { return a; } template<> EIGEN_STRONG_INLINE Packet2f pfrexp(const Packet2f& a, Packet2f& exponent) { return pfrexp_generic(a,exponent); } template<> EIGEN_STRONG_INLINE Packet4f pfrexp(const Packet4f& a, Packet4f& exponent) { return pfrexp_generic(a,exponent); } template<> EIGEN_STRONG_INLINE Packet2f pldexp(const Packet2f& a, const Packet2f& exponent) { return pldexp_generic(a,exponent); } template<> EIGEN_STRONG_INLINE Packet4f pldexp(const Packet4f& a, const Packet4f& exponent) { return pldexp_generic(a,exponent); } template<> EIGEN_STRONG_INLINE float predux(const Packet2f& a) { return vget_lane_f32(vpadd_f32(a,a), 0); } template<> EIGEN_STRONG_INLINE float predux(const Packet4f& a) { const float32x2_t sum = vadd_f32(vget_low_f32(a), vget_high_f32(a)); return vget_lane_f32(vpadd_f32(sum, sum), 0); } template<> EIGEN_STRONG_INLINE int8_t predux(const Packet4c& a) { const int8x8_t a_dup = vreinterpret_s8_s32(vdup_n_s32(a)); int8x8_t sum = vpadd_s8(a_dup, a_dup); sum = vpadd_s8(sum, sum); return vget_lane_s8(sum, 0); } template<> EIGEN_STRONG_INLINE int8_t predux(const Packet8c& a) { int8x8_t sum = vpadd_s8(a,a); sum = vpadd_s8(sum, sum); sum = vpadd_s8(sum, sum); return vget_lane_s8(sum, 0); } template<> EIGEN_STRONG_INLINE int8_t predux(const Packet16c& a) { int8x8_t sum = vadd_s8(vget_low_s8(a), vget_high_s8(a)); sum = vpadd_s8(sum, sum); sum = vpadd_s8(sum, sum); sum = vpadd_s8(sum, sum); return vget_lane_s8(sum, 0); } template<> EIGEN_STRONG_INLINE uint8_t predux(const Packet4uc& a) { const uint8x8_t a_dup = vreinterpret_u8_u32(vdup_n_u32(a)); uint8x8_t sum = vpadd_u8(a_dup, a_dup); sum = vpadd_u8(sum, sum); return vget_lane_u8(sum, 0); } template<> EIGEN_STRONG_INLINE uint8_t predux(const Packet8uc& a) { uint8x8_t sum = vpadd_u8(a,a); sum = vpadd_u8(sum, sum); sum = vpadd_u8(sum, sum); return vget_lane_u8(sum, 0); } template<> EIGEN_STRONG_INLINE uint8_t predux(const Packet16uc& a) { uint8x8_t sum = vadd_u8(vget_low_u8(a), vget_high_u8(a)); sum = vpadd_u8(sum, sum); sum = vpadd_u8(sum, sum); sum = vpadd_u8(sum, sum); return vget_lane_u8(sum, 0); } template<> EIGEN_STRONG_INLINE int16_t predux(const Packet4s& a) { const int16x4_t sum = vpadd_s16(a,a); return vget_lane_s16(vpadd_s16(sum, sum), 0); } template<> EIGEN_STRONG_INLINE int16_t predux(const Packet8s& a) { int16x4_t sum = vadd_s16(vget_low_s16(a), vget_high_s16(a)); sum = vpadd_s16(sum, sum); sum = vpadd_s16(sum, sum); return vget_lane_s16(sum, 0); } template<> EIGEN_STRONG_INLINE uint16_t predux(const Packet4us& a) { const uint16x4_t sum = vpadd_u16(a,a); return vget_lane_u16(vpadd_u16(sum, sum), 0); } template<> EIGEN_STRONG_INLINE uint16_t predux(const Packet8us& a) { uint16x4_t sum = vadd_u16(vget_low_u16(a), vget_high_u16(a)); sum = vpadd_u16(sum, sum); sum = vpadd_u16(sum, sum); return vget_lane_u16(sum, 0); } template<> EIGEN_STRONG_INLINE int32_t predux(const Packet2i& a) { return vget_lane_s32(vpadd_s32(a,a), 0); } template<> EIGEN_STRONG_INLINE int32_t predux(const Packet4i& a) { const int32x2_t sum = vadd_s32(vget_low_s32(a), vget_high_s32(a)); return vget_lane_s32(vpadd_s32(sum, sum), 0); } template<> EIGEN_STRONG_INLINE uint32_t predux(const Packet2ui& a) { return vget_lane_u32(vpadd_u32(a,a), 0); } template<> EIGEN_STRONG_INLINE uint32_t predux(const Packet4ui& a) { const uint32x2_t sum = vadd_u32(vget_low_u32(a), vget_high_u32(a)); return vget_lane_u32(vpadd_u32(sum, sum), 0); } template<> EIGEN_STRONG_INLINE int64_t predux(const Packet2l& a) { return vgetq_lane_s64(a, 0) + vgetq_lane_s64(a, 1); } template<> EIGEN_STRONG_INLINE uint64_t predux(const Packet2ul& a) { return vgetq_lane_u64(a, 0) + vgetq_lane_u64(a, 1); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4c predux_half_dowto4(const Packet8c& a) { return vget_lane_s32(vreinterpret_s32_s8(vadd_s8(a, vreinterpret_s8_s32(vrev64_s32(vreinterpret_s32_s8(a))))), 0); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet8c predux_half_dowto4(const Packet16c& a) { return vadd_s8(vget_high_s8(a), vget_low_s8(a)); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4uc predux_half_dowto4(const Packet8uc& a) { return vget_lane_u32(vreinterpret_u32_u8(vadd_u8(a, vreinterpret_u8_u32(vrev64_u32(vreinterpret_u32_u8(a))))), 0); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet8uc predux_half_dowto4(const Packet16uc& a) { return vadd_u8(vget_high_u8(a), vget_low_u8(a)); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4s predux_half_dowto4(const Packet8s& a) { return vadd_s16(vget_high_s16(a), vget_low_s16(a)); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4us predux_half_dowto4(const Packet8us& a) { return vadd_u16(vget_high_u16(a), vget_low_u16(a)); } // Other reduction functions: // mul template<> EIGEN_STRONG_INLINE float predux_mul(const Packet2f& a) { return vget_lane_f32(a, 0) * vget_lane_f32(a, 1); } template<> EIGEN_STRONG_INLINE float predux_mul(const Packet4f& a) { return predux_mul(vmul_f32(vget_low_f32(a), vget_high_f32(a))); } template<> EIGEN_STRONG_INLINE int8_t predux_mul(const Packet4c& a) { int8x8_t prod = vreinterpret_s8_s32(vdup_n_s32(a)); prod = vmul_s8(prod, vrev16_s8(prod)); return vget_lane_s8(prod, 0) * vget_lane_s8(prod, 2); } template<> EIGEN_STRONG_INLINE int8_t predux_mul(const Packet8c& a) { int8x8_t prod = vmul_s8(a, vrev16_s8(a)); prod = vmul_s8(prod, vrev32_s8(prod)); return vget_lane_s8(prod, 0) * vget_lane_s8(prod, 4); } template<> EIGEN_STRONG_INLINE int8_t predux_mul(const Packet16c& a) { return predux_mul(vmul_s8(vget_low_s8(a), vget_high_s8(a))); } template<> EIGEN_STRONG_INLINE uint8_t predux_mul(const Packet4uc& a) { uint8x8_t prod = vreinterpret_u8_u32(vdup_n_u32(a)); prod = vmul_u8(prod, vrev16_u8(prod)); return vget_lane_u8(prod, 0) * vget_lane_u8(prod, 2); } template<> EIGEN_STRONG_INLINE uint8_t predux_mul(const Packet8uc& a) { uint8x8_t prod = vmul_u8(a, vrev16_u8(a)); prod = vmul_u8(prod, vrev32_u8(prod)); return vget_lane_u8(prod, 0) * vget_lane_u8(prod, 4); } template<> EIGEN_STRONG_INLINE uint8_t predux_mul(const Packet16uc& a) { return predux_mul(vmul_u8(vget_low_u8(a), vget_high_u8(a))); } template<> EIGEN_STRONG_INLINE int16_t predux_mul(const Packet4s& a) { const int16x4_t prod = vmul_s16(a, vrev32_s16(a)); return vget_lane_s16(prod, 0) * vget_lane_s16(prod, 2); } template<> EIGEN_STRONG_INLINE int16_t predux_mul(const Packet8s& a) { int16x4_t prod; // Get the product of a_lo * a_hi -> |a1*a5|a2*a6|a3*a7|a4*a8| prod = vmul_s16(vget_low_s16(a), vget_high_s16(a)); // Swap and multiply |a1*a5*a2*a6|a3*a7*a4*a8| prod = vmul_s16(prod, vrev32_s16(prod)); // Multiply |a1*a5*a2*a6*a3*a7*a4*a8| return vget_lane_s16(prod, 0) * vget_lane_s16(prod, 2); } template<> EIGEN_STRONG_INLINE uint16_t predux_mul(const Packet4us& a) { const uint16x4_t prod = vmul_u16(a, vrev32_u16(a)); return vget_lane_u16(prod, 0) * vget_lane_u16(prod, 2); } template<> EIGEN_STRONG_INLINE uint16_t predux_mul(const Packet8us& a) { uint16x4_t prod; // Get the product of a_lo * a_hi -> |a1*a5|a2*a6|a3*a7|a4*a8| prod = vmul_u16(vget_low_u16(a), vget_high_u16(a)); // Swap and multiply |a1*a5*a2*a6|a3*a7*a4*a8| prod = vmul_u16(prod, vrev32_u16(prod)); // Multiply |a1*a5*a2*a6*a3*a7*a4*a8| return vget_lane_u16(prod, 0) * vget_lane_u16(prod, 2); } template<> EIGEN_STRONG_INLINE int32_t predux_mul(const Packet2i& a) { return vget_lane_s32(a, 0) * vget_lane_s32(a, 1); } template<> EIGEN_STRONG_INLINE int32_t predux_mul(const Packet4i& a) { return predux_mul(vmul_s32(vget_low_s32(a), vget_high_s32(a))); } template<> EIGEN_STRONG_INLINE uint32_t predux_mul(const Packet2ui& a) { return vget_lane_u32(a, 0) * vget_lane_u32(a, 1); } template<> EIGEN_STRONG_INLINE uint32_t predux_mul(const Packet4ui& a) { return predux_mul(vmul_u32(vget_low_u32(a), vget_high_u32(a))); } template<> EIGEN_STRONG_INLINE int64_t predux_mul(const Packet2l& a) { return vgetq_lane_s64(a, 0) * vgetq_lane_s64(a, 1); } template<> EIGEN_STRONG_INLINE uint64_t predux_mul(const Packet2ul& a) { return vgetq_lane_u64(a, 0) * vgetq_lane_u64(a, 1); } // min template<> EIGEN_STRONG_INLINE float predux_min(const Packet2f& a) { return vget_lane_f32(vpmin_f32(a,a), 0); } template<> EIGEN_STRONG_INLINE float predux_min(const Packet4f& a) { const float32x2_t min = vmin_f32(vget_low_f32(a), vget_high_f32(a)); return vget_lane_f32(vpmin_f32(min, min), 0); } template<> EIGEN_STRONG_INLINE int8_t predux_min(const Packet4c& a) { const int8x8_t a_dup = vreinterpret_s8_s32(vdup_n_s32(a)); int8x8_t min = vpmin_s8(a_dup, a_dup); min = vpmin_s8(min, min); return vget_lane_s8(min, 0); } template<> EIGEN_STRONG_INLINE int8_t predux_min(const Packet8c& a) { int8x8_t min = vpmin_s8(a,a); min = vpmin_s8(min, min); min = vpmin_s8(min, min); return vget_lane_s8(min, 0); } template<> EIGEN_STRONG_INLINE int8_t predux_min(const Packet16c& a) { int8x8_t min = vmin_s8(vget_low_s8(a), vget_high_s8(a)); min = vpmin_s8(min, min); min = vpmin_s8(min, min); min = vpmin_s8(min, min); return vget_lane_s8(min, 0); } template<> EIGEN_STRONG_INLINE uint8_t predux_min(const Packet4uc& a) { const uint8x8_t a_dup = vreinterpret_u8_u32(vdup_n_u32(a)); uint8x8_t min = vpmin_u8(a_dup, a_dup); min = vpmin_u8(min, min); return vget_lane_u8(min, 0); } template<> EIGEN_STRONG_INLINE uint8_t predux_min(const Packet8uc& a) { uint8x8_t min = vpmin_u8(a,a); min = vpmin_u8(min, min); min = vpmin_u8(min, min); return vget_lane_u8(min, 0); } template<> EIGEN_STRONG_INLINE uint8_t predux_min(const Packet16uc& a) { uint8x8_t min = vmin_u8(vget_low_u8(a), vget_high_u8(a)); min = vpmin_u8(min, min); min = vpmin_u8(min, min); min = vpmin_u8(min, min); return vget_lane_u8(min, 0); } template<> EIGEN_STRONG_INLINE int16_t predux_min(const Packet4s& a) { const int16x4_t min = vpmin_s16(a,a); return vget_lane_s16(vpmin_s16(min, min), 0); } template<> EIGEN_STRONG_INLINE int16_t predux_min(const Packet8s& a) { int16x4_t min = vmin_s16(vget_low_s16(a), vget_high_s16(a)); min = vpmin_s16(min, min); min = vpmin_s16(min, min); return vget_lane_s16(min, 0); } template<> EIGEN_STRONG_INLINE uint16_t predux_min(const Packet4us& a) { const uint16x4_t min = vpmin_u16(a,a); return vget_lane_u16(vpmin_u16(min, min), 0); } template<> EIGEN_STRONG_INLINE uint16_t predux_min(const Packet8us& a) { uint16x4_t min = vmin_u16(vget_low_u16(a), vget_high_u16(a)); min = vpmin_u16(min, min); min = vpmin_u16(min, min); return vget_lane_u16(min, 0); } template<> EIGEN_STRONG_INLINE int32_t predux_min(const Packet2i& a) { return vget_lane_s32(vpmin_s32(a,a), 0); } template<> EIGEN_STRONG_INLINE int32_t predux_min(const Packet4i& a) { const int32x2_t min = vmin_s32(vget_low_s32(a), vget_high_s32(a)); return vget_lane_s32(vpmin_s32(min, min), 0); } template<> EIGEN_STRONG_INLINE uint32_t predux_min(const Packet2ui& a) { return vget_lane_u32(vpmin_u32(a,a), 0); } template<> EIGEN_STRONG_INLINE uint32_t predux_min(const Packet4ui& a) { const uint32x2_t min = vmin_u32(vget_low_u32(a), vget_high_u32(a)); return vget_lane_u32(vpmin_u32(min, min), 0); } template<> EIGEN_STRONG_INLINE int64_t predux_min(const Packet2l& a) { return (std::min)(vgetq_lane_s64(a, 0), vgetq_lane_s64(a, 1)); } template<> EIGEN_STRONG_INLINE uint64_t predux_min(const Packet2ul& a) { return (std::min)(vgetq_lane_u64(a, 0), vgetq_lane_u64(a, 1)); } // max template<> EIGEN_STRONG_INLINE float predux_max(const Packet2f& a) { return vget_lane_f32(vpmax_f32(a,a), 0); } template<> EIGEN_STRONG_INLINE float predux_max(const Packet4f& a) { const float32x2_t max = vmax_f32(vget_low_f32(a), vget_high_f32(a)); return vget_lane_f32(vpmax_f32(max, max), 0); } template<> EIGEN_STRONG_INLINE int8_t predux_max(const Packet4c& a) { const int8x8_t a_dup = vreinterpret_s8_s32(vdup_n_s32(a)); int8x8_t max = vpmax_s8(a_dup, a_dup); max = vpmax_s8(max, max); return vget_lane_s8(max, 0); } template<> EIGEN_STRONG_INLINE int8_t predux_max(const Packet8c& a) { int8x8_t max = vpmax_s8(a,a); max = vpmax_s8(max, max); max = vpmax_s8(max, max); return vget_lane_s8(max, 0); } template<> EIGEN_STRONG_INLINE int8_t predux_max(const Packet16c& a) { int8x8_t max = vmax_s8(vget_low_s8(a), vget_high_s8(a)); max = vpmax_s8(max, max); max = vpmax_s8(max, max); max = vpmax_s8(max, max); return vget_lane_s8(max, 0); } template<> EIGEN_STRONG_INLINE uint8_t predux_max(const Packet4uc& a) { const uint8x8_t a_dup = vreinterpret_u8_u32(vdup_n_u32(a)); uint8x8_t max = vpmax_u8(a_dup, a_dup); max = vpmax_u8(max, max); return vget_lane_u8(max, 0); } template<> EIGEN_STRONG_INLINE uint8_t predux_max(const Packet8uc& a) { uint8x8_t max = vpmax_u8(a,a); max = vpmax_u8(max, max); max = vpmax_u8(max, max); return vget_lane_u8(max, 0); } template<> EIGEN_STRONG_INLINE uint8_t predux_max(const Packet16uc& a) { uint8x8_t max = vmax_u8(vget_low_u8(a), vget_high_u8(a)); max = vpmax_u8(max, max); max = vpmax_u8(max, max); max = vpmax_u8(max, max); return vget_lane_u8(max, 0); } template<> EIGEN_STRONG_INLINE int16_t predux_max(const Packet4s& a) { const int16x4_t max = vpmax_s16(a,a); return vget_lane_s16(vpmax_s16(max, max), 0); } template<> EIGEN_STRONG_INLINE int16_t predux_max(const Packet8s& a) { int16x4_t max = vmax_s16(vget_low_s16(a), vget_high_s16(a)); max = vpmax_s16(max, max); max = vpmax_s16(max, max); return vget_lane_s16(max, 0); } template<> EIGEN_STRONG_INLINE uint16_t predux_max(const Packet4us& a) { const uint16x4_t max = vpmax_u16(a,a); return vget_lane_u16(vpmax_u16(max, max), 0); } template<> EIGEN_STRONG_INLINE uint16_t predux_max(const Packet8us& a) { uint16x4_t max = vmax_u16(vget_low_u16(a), vget_high_u16(a)); max = vpmax_u16(max, max); max = vpmax_u16(max, max); return vget_lane_u16(max, 0); } template<> EIGEN_STRONG_INLINE int32_t predux_max(const Packet2i& a) { return vget_lane_s32(vpmax_s32(a,a), 0); } template<> EIGEN_STRONG_INLINE int32_t predux_max(const Packet4i& a) { const int32x2_t max = vmax_s32(vget_low_s32(a), vget_high_s32(a)); return vget_lane_s32(vpmax_s32(max, max), 0); } template<> EIGEN_STRONG_INLINE uint32_t predux_max(const Packet2ui& a) { return vget_lane_u32(vpmax_u32(a,a), 0); } template<> EIGEN_STRONG_INLINE uint32_t predux_max(const Packet4ui& a) { const uint32x2_t max = vmax_u32(vget_low_u32(a), vget_high_u32(a)); return vget_lane_u32(vpmax_u32(max, max), 0); } template<> EIGEN_STRONG_INLINE int64_t predux_max(const Packet2l& a) { return (std::max)(vgetq_lane_s64(a, 0), vgetq_lane_s64(a, 1)); } template<> EIGEN_STRONG_INLINE uint64_t predux_max(const Packet2ul& a) { return (std::max)(vgetq_lane_u64(a, 0), vgetq_lane_u64(a, 1)); } template<> EIGEN_STRONG_INLINE bool predux_any(const Packet4f& x) { uint32x2_t tmp = vorr_u32(vget_low_u32( vreinterpretq_u32_f32(x)), vget_high_u32(vreinterpretq_u32_f32(x))); return vget_lane_u32(vpmax_u32(tmp, tmp), 0); } // Helpers for ptranspose. namespace detail { template void zip_in_place(Packet& p1, Packet& p2); template<> EIGEN_ALWAYS_INLINE void zip_in_place(Packet2f& p1, Packet2f& p2) { const float32x2x2_t tmp = vzip_f32(p1, p2); p1 = tmp.val[0]; p2 = tmp.val[1]; } template<> EIGEN_ALWAYS_INLINE void zip_in_place(Packet4f& p1, Packet4f& p2) { const float32x4x2_t tmp = vzipq_f32(p1, p2); p1 = tmp.val[0]; p2 = tmp.val[1]; } template<> EIGEN_ALWAYS_INLINE void zip_in_place(Packet8c& p1, Packet8c& p2) { const int8x8x2_t tmp = vzip_s8(p1, p2); p1 = tmp.val[0]; p2 = tmp.val[1]; } template<> EIGEN_ALWAYS_INLINE void zip_in_place(Packet16c& p1, Packet16c& p2) { const int8x16x2_t tmp = vzipq_s8(p1, p2); p1 = tmp.val[0]; p2 = tmp.val[1]; } template<> EIGEN_ALWAYS_INLINE void zip_in_place(Packet8uc& p1, Packet8uc& p2) { const uint8x8x2_t tmp = vzip_u8(p1, p2); p1 = tmp.val[0]; p2 = tmp.val[1]; } template<> EIGEN_ALWAYS_INLINE void zip_in_place(Packet16uc& p1, Packet16uc& p2) { const uint8x16x2_t tmp = vzipq_u8(p1, p2); p1 = tmp.val[0]; p2 = tmp.val[1]; } template<> EIGEN_ALWAYS_INLINE void zip_in_place(Packet2i& p1, Packet2i& p2) { const int32x2x2_t tmp = vzip_s32(p1, p2); p1 = tmp.val[0]; p2 = tmp.val[1]; } template<> EIGEN_ALWAYS_INLINE void zip_in_place(Packet4i& p1, Packet4i& p2) { const int32x4x2_t tmp = vzipq_s32(p1, p2); p1 = tmp.val[0]; p2 = tmp.val[1]; } template<> EIGEN_ALWAYS_INLINE void zip_in_place(Packet2ui& p1, Packet2ui& p2) { const uint32x2x2_t tmp = vzip_u32(p1, p2); p1 = tmp.val[0]; p2 = tmp.val[1]; } template<> EIGEN_ALWAYS_INLINE void zip_in_place(Packet4ui& p1, Packet4ui& p2) { const uint32x4x2_t tmp = vzipq_u32(p1, p2); p1 = tmp.val[0]; p2 = tmp.val[1]; } template<> EIGEN_ALWAYS_INLINE void zip_in_place(Packet4s& p1, Packet4s& p2) { const int16x4x2_t tmp = vzip_s16(p1, p2); p1 = tmp.val[0]; p2 = tmp.val[1]; } template<> EIGEN_ALWAYS_INLINE void zip_in_place(Packet8s& p1, Packet8s& p2) { const int16x8x2_t tmp = vzipq_s16(p1, p2); p1 = tmp.val[0]; p2 = tmp.val[1]; } template<> EIGEN_ALWAYS_INLINE void zip_in_place(Packet4us& p1, Packet4us& p2) { const uint16x4x2_t tmp = vzip_u16(p1, p2); p1 = tmp.val[0]; p2 = tmp.val[1]; } template<> EIGEN_ALWAYS_INLINE void zip_in_place(Packet8us& p1, Packet8us& p2) { const uint16x8x2_t tmp = vzipq_u16(p1, p2); p1 = tmp.val[0]; p2 = tmp.val[1]; } template EIGEN_ALWAYS_INLINE void ptranspose_impl(PacketBlock& kernel) { zip_in_place(kernel.packet[0], kernel.packet[1]); } template EIGEN_ALWAYS_INLINE void ptranspose_impl(PacketBlock& kernel) { zip_in_place(kernel.packet[0], kernel.packet[2]); zip_in_place(kernel.packet[1], kernel.packet[3]); zip_in_place(kernel.packet[0], kernel.packet[1]); zip_in_place(kernel.packet[2], kernel.packet[3]); } template EIGEN_ALWAYS_INLINE void ptranspose_impl(PacketBlock& kernel) { zip_in_place(kernel.packet[0], kernel.packet[4]); zip_in_place(kernel.packet[1], kernel.packet[5]); zip_in_place(kernel.packet[2], kernel.packet[6]); zip_in_place(kernel.packet[3], kernel.packet[7]); zip_in_place(kernel.packet[0], kernel.packet[2]); zip_in_place(kernel.packet[1], kernel.packet[3]); zip_in_place(kernel.packet[4], kernel.packet[6]); zip_in_place(kernel.packet[5], kernel.packet[7]); zip_in_place(kernel.packet[0], kernel.packet[1]); zip_in_place(kernel.packet[2], kernel.packet[3]); zip_in_place(kernel.packet[4], kernel.packet[5]); zip_in_place(kernel.packet[6], kernel.packet[7]); } template EIGEN_ALWAYS_INLINE void ptranspose_impl(PacketBlock& kernel) { EIGEN_UNROLL_LOOP for (int i=0; i<4; ++i) { const int m = (1 << i); EIGEN_UNROLL_LOOP for (int j=0; j& kernel) { detail::ptranspose_impl(kernel); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { detail::ptranspose_impl(kernel); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { const int8x8_t a = vreinterpret_s8_s32(vset_lane_s32(kernel.packet[2], vdup_n_s32(kernel.packet[0]), 1)); const int8x8_t b = vreinterpret_s8_s32(vset_lane_s32(kernel.packet[3], vdup_n_s32(kernel.packet[1]), 1)); const int8x8x2_t zip8 = vzip_s8(a,b); const int16x4x2_t zip16 = vzip_s16(vreinterpret_s16_s8(zip8.val[0]), vreinterpret_s16_s8(zip8.val[1])); kernel.packet[0] = vget_lane_s32(vreinterpret_s32_s16(zip16.val[0]), 0); kernel.packet[1] = vget_lane_s32(vreinterpret_s32_s16(zip16.val[0]), 1); kernel.packet[2] = vget_lane_s32(vreinterpret_s32_s16(zip16.val[1]), 0); kernel.packet[3] = vget_lane_s32(vreinterpret_s32_s16(zip16.val[1]), 1); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { detail::ptranspose_impl(kernel); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { detail::ptranspose_impl(kernel); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { detail::ptranspose_impl(kernel); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { detail::ptranspose_impl(kernel); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { detail::ptranspose_impl(kernel); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { const uint8x8_t a = vreinterpret_u8_u32(vset_lane_u32(kernel.packet[2], vdup_n_u32(kernel.packet[0]), 1)); const uint8x8_t b = vreinterpret_u8_u32(vset_lane_u32(kernel.packet[3], vdup_n_u32(kernel.packet[1]), 1)); const uint8x8x2_t zip8 = vzip_u8(a,b); const uint16x4x2_t zip16 = vzip_u16(vreinterpret_u16_u8(zip8.val[0]), vreinterpret_u16_u8(zip8.val[1])); kernel.packet[0] = vget_lane_u32(vreinterpret_u32_u16(zip16.val[0]), 0); kernel.packet[1] = vget_lane_u32(vreinterpret_u32_u16(zip16.val[0]), 1); kernel.packet[2] = vget_lane_u32(vreinterpret_u32_u16(zip16.val[1]), 0); kernel.packet[3] = vget_lane_u32(vreinterpret_u32_u16(zip16.val[1]), 1); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { detail::ptranspose_impl(kernel); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { detail::ptranspose_impl(kernel); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { detail::ptranspose_impl(kernel); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { detail::ptranspose_impl(kernel); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { detail::ptranspose_impl(kernel); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { detail::ptranspose_impl(kernel); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { detail::ptranspose_impl(kernel); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { detail::ptranspose_impl(kernel); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { detail::ptranspose_impl(kernel); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { detail::ptranspose_impl(kernel); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { detail::ptranspose_impl(kernel); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { detail::ptranspose_impl(kernel); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { detail::ptranspose_impl(kernel); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { detail::zip_in_place(kernel.packet[0], kernel.packet[1]); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { detail::ptranspose_impl(kernel); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { #if EIGEN_ARCH_ARM64 const int64x2_t tmp1 = vzip1q_s64(kernel.packet[0], kernel.packet[1]); kernel.packet[1] = vzip2q_s64(kernel.packet[0], kernel.packet[1]); kernel.packet[0] = tmp1; #else const int64x1_t tmp[2][2] = { { vget_low_s64(kernel.packet[0]), vget_high_s64(kernel.packet[0]) }, { vget_low_s64(kernel.packet[1]), vget_high_s64(kernel.packet[1]) } }; kernel.packet[0] = vcombine_s64(tmp[0][0], tmp[1][0]); kernel.packet[1] = vcombine_s64(tmp[0][1], tmp[1][1]); #endif } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { #if EIGEN_ARCH_ARM64 const uint64x2_t tmp1 = vzip1q_u64(kernel.packet[0], kernel.packet[1]); kernel.packet[1] = vzip2q_u64(kernel.packet[0], kernel.packet[1]); kernel.packet[0] = tmp1; #else const uint64x1_t tmp[2][2] = { { vget_low_u64(kernel.packet[0]), vget_high_u64(kernel.packet[0]) }, { vget_low_u64(kernel.packet[1]), vget_high_u64(kernel.packet[1]) } }; kernel.packet[0] = vcombine_u64(tmp[0][0], tmp[1][0]); kernel.packet[1] = vcombine_u64(tmp[0][1], tmp[1][1]); #endif } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet2f pselect( const Packet2f& mask, const Packet2f& a, const Packet2f& b) { return vbsl_f32(vreinterpret_u32_f32(mask), a, b); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4f pselect(const Packet4f& mask, const Packet4f& a, const Packet4f& b) { return vbslq_f32(vreinterpretq_u32_f32(mask), a, b); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet8c pselect(const Packet8c& mask, const Packet8c& a, const Packet8c& b) { return vbsl_s8(vreinterpret_u8_s8(mask), a, b); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet16c pselect(const Packet16c& mask, const Packet16c& a, const Packet16c& b) { return vbslq_s8(vreinterpretq_u8_s8(mask), a, b); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet8uc pselect(const Packet8uc& mask, const Packet8uc& a, const Packet8uc& b) { return vbsl_u8(mask, a, b); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet16uc pselect(const Packet16uc& mask, const Packet16uc& a, const Packet16uc& b) { return vbslq_u8(mask, a, b); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4s pselect(const Packet4s& mask, const Packet4s& a, const Packet4s& b) { return vbsl_s16(vreinterpret_u16_s16(mask), a, b); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet8s pselect(const Packet8s& mask, const Packet8s& a, const Packet8s& b) { return vbslq_s16(vreinterpretq_u16_s16(mask), a, b); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4us pselect(const Packet4us& mask, const Packet4us& a, const Packet4us& b) { return vbsl_u16(mask, a, b); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet8us pselect(const Packet8us& mask, const Packet8us& a, const Packet8us& b) { return vbslq_u16(mask, a, b); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet2i pselect(const Packet2i& mask, const Packet2i& a, const Packet2i& b) { return vbsl_s32(vreinterpret_u32_s32(mask), a, b); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4i pselect(const Packet4i& mask, const Packet4i& a, const Packet4i& b) { return vbslq_s32(vreinterpretq_u32_s32(mask), a, b); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet2ui pselect(const Packet2ui& mask, const Packet2ui& a, const Packet2ui& b) { return vbsl_u32(mask, a, b); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4ui pselect(const Packet4ui& mask, const Packet4ui& a, const Packet4ui& b) { return vbslq_u32(mask, a, b); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet2l pselect(const Packet2l& mask, const Packet2l& a, const Packet2l& b) { return vbslq_s64(vreinterpretq_u64_s64(mask), a, b); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet2ul pselect(const Packet2ul& mask, const Packet2ul& a, const Packet2ul& b) { return vbslq_u64(mask, a, b); } // Use armv8 rounding intinsics if available. #if EIGEN_ARCH_ARMV8 template<> EIGEN_STRONG_INLINE Packet2f print(const Packet2f& a) { return vrndn_f32(a); } template<> EIGEN_STRONG_INLINE Packet4f print(const Packet4f& a) { return vrndnq_f32(a); } template<> EIGEN_STRONG_INLINE Packet2f pfloor(const Packet2f& a) { return vrndm_f32(a); } template<> EIGEN_STRONG_INLINE Packet4f pfloor(const Packet4f& a) { return vrndmq_f32(a); } template<> EIGEN_STRONG_INLINE Packet2f pceil(const Packet2f& a) { return vrndp_f32(a); } template<> EIGEN_STRONG_INLINE Packet4f pceil(const Packet4f& a) { return vrndpq_f32(a); } #else template<> EIGEN_STRONG_INLINE Packet4f print(const Packet4f& a) { // Adds and subtracts signum(a) * 2^23 to force rounding. const Packet4f limit = pset1(static_cast(1<<23)); const Packet4f abs_a = pabs(a); Packet4f r = padd(abs_a, limit); // Don't compile-away addition and subtraction. EIGEN_OPTIMIZATION_BARRIER(r); r = psub(r, limit); // If greater than limit, simply return a. Otherwise, account for sign. r = pselect(pcmp_lt(abs_a, limit), pselect(pcmp_lt(a, pzero(a)), pnegate(r), r), a); return r; } template<> EIGEN_STRONG_INLINE Packet2f print(const Packet2f& a) { // Adds and subtracts signum(a) * 2^23 to force rounding. const Packet2f limit = pset1(static_cast(1<<23)); const Packet2f abs_a = pabs(a); Packet2f r = padd(abs_a, limit); // Don't compile-away addition and subtraction. EIGEN_OPTIMIZATION_BARRIER(r); r = psub(r, limit); // If greater than limit, simply return a. Otherwise, account for sign. r = pselect(pcmp_lt(abs_a, limit), pselect(pcmp_lt(a, pzero(a)), pnegate(r), r), a); return r; } template<> EIGEN_STRONG_INLINE Packet4f pfloor(const Packet4f& a) { const Packet4f cst_1 = pset1(1.0f); Packet4f tmp = print(a); // If greater, subtract one. Packet4f mask = pcmp_lt(a, tmp); mask = pand(mask, cst_1); return psub(tmp, mask); } template<> EIGEN_STRONG_INLINE Packet2f pfloor(const Packet2f& a) { const Packet2f cst_1 = pset1(1.0f); Packet2f tmp = print(a); // If greater, subtract one. Packet2f mask = pcmp_lt(a, tmp); mask = pand(mask, cst_1); return psub(tmp, mask); } template<> EIGEN_STRONG_INLINE Packet4f pceil(const Packet4f& a) { const Packet4f cst_1 = pset1(1.0f); Packet4f tmp = print(a); // If smaller, add one. Packet4f mask = pcmp_lt(tmp, a); mask = pand(mask, cst_1); return padd(tmp, mask); } template<> EIGEN_STRONG_INLINE Packet2f pceil(const Packet2f& a) { const Packet2f cst_1 = pset1(1.0); Packet2f tmp = print(a); // If smaller, add one. Packet2f mask = pcmp_lt(tmp, a); mask = pand(mask, cst_1); return padd(tmp, mask); } #endif /** * Computes the integer square root * @remarks The calculation is performed using an algorithm which iterates through each binary digit of the result * and tests whether setting that digit to 1 would cause the square of the value to be greater than the argument * value. The algorithm is described in detail here: http://ww1.microchip.com/downloads/en/AppNotes/91040a.pdf . */ template<> EIGEN_STRONG_INLINE Packet4uc psqrt(const Packet4uc& a) { uint8x8_t x = vreinterpret_u8_u32(vdup_n_u32(a)); uint8x8_t res = vdup_n_u8(0); uint8x8_t add = vdup_n_u8(0x8); for (int i = 0; i < 4; i++) { const uint8x8_t temp = vorr_u8(res, add); res = vbsl_u8(vcge_u8(x, vmul_u8(temp, temp)), temp, res); add = vshr_n_u8(add, 1); } return vget_lane_u32(vreinterpret_u32_u8(res), 0); } /// @copydoc Eigen::internal::psqrt(const Packet4uc& a) template<> EIGEN_STRONG_INLINE Packet8uc psqrt(const Packet8uc& a) { uint8x8_t res = vdup_n_u8(0); uint8x8_t add = vdup_n_u8(0x8); for (int i = 0; i < 4; i++) { const uint8x8_t temp = vorr_u8(res, add); res = vbsl_u8(vcge_u8(a, vmul_u8(temp, temp)), temp, res); add = vshr_n_u8(add, 1); } return res; } /// @copydoc Eigen::internal::psqrt(const Packet4uc& a) template<> EIGEN_STRONG_INLINE Packet16uc psqrt(const Packet16uc& a) { uint8x16_t res = vdupq_n_u8(0); uint8x16_t add = vdupq_n_u8(0x8); for (int i = 0; i < 4; i++) { const uint8x16_t temp = vorrq_u8(res, add); res = vbslq_u8(vcgeq_u8(a, vmulq_u8(temp, temp)), temp, res); add = vshrq_n_u8(add, 1); } return res; } /// @copydoc Eigen::internal::psqrt(const Packet4uc& a) template<> EIGEN_STRONG_INLINE Packet4us psqrt(const Packet4us& a) { uint16x4_t res = vdup_n_u16(0); uint16x4_t add = vdup_n_u16(0x80); for (int i = 0; i < 8; i++) { const uint16x4_t temp = vorr_u16(res, add); res = vbsl_u16(vcge_u16(a, vmul_u16(temp, temp)), temp, res); add = vshr_n_u16(add, 1); } return res; } /// @copydoc Eigen::internal::psqrt(const Packet4uc& a) template<> EIGEN_STRONG_INLINE Packet8us psqrt(const Packet8us& a) { uint16x8_t res = vdupq_n_u16(0); uint16x8_t add = vdupq_n_u16(0x80); for (int i = 0; i < 8; i++) { const uint16x8_t temp = vorrq_u16(res, add); res = vbslq_u16(vcgeq_u16(a, vmulq_u16(temp, temp)), temp, res); add = vshrq_n_u16(add, 1); } return res; } /// @copydoc Eigen::internal::psqrt(const Packet4uc& a) template<> EIGEN_STRONG_INLINE Packet2ui psqrt(const Packet2ui& a) { uint32x2_t res = vdup_n_u32(0); uint32x2_t add = vdup_n_u32(0x8000); for (int i = 0; i < 16; i++) { const uint32x2_t temp = vorr_u32(res, add); res = vbsl_u32(vcge_u32(a, vmul_u32(temp, temp)), temp, res); add = vshr_n_u32(add, 1); } return res; } /// @copydoc Eigen::internal::psqrt(const Packet4uc& a) template<> EIGEN_STRONG_INLINE Packet4ui psqrt(const Packet4ui& a) { uint32x4_t res = vdupq_n_u32(0); uint32x4_t add = vdupq_n_u32(0x8000); for (int i = 0; i < 16; i++) { const uint32x4_t temp = vorrq_u32(res, add); res = vbslq_u32(vcgeq_u32(a, vmulq_u32(temp, temp)), temp, res); add = vshrq_n_u32(add, 1); } return res; } template<> EIGEN_STRONG_INLINE Packet4f prsqrt(const Packet4f& a) { // Compute approximate reciprocal sqrt. Packet4f x = vrsqrteq_f32(a); // Do Newton iterations for 1/sqrt(x). x = vmulq_f32(vrsqrtsq_f32(vmulq_f32(a, x), x), x); x = vmulq_f32(vrsqrtsq_f32(vmulq_f32(a, x), x), x); const Packet4f infinity = pset1(NumTraits::infinity()); return pselect(pcmp_eq(a, pzero(a)), infinity, x); } template<> EIGEN_STRONG_INLINE Packet2f prsqrt(const Packet2f& a) { // Compute approximate reciprocal sqrt. Packet2f x = vrsqrte_f32(a); // Do Newton iterations for 1/sqrt(x). x = vmul_f32(vrsqrts_f32(vmul_f32(a, x), x), x); x = vmul_f32(vrsqrts_f32(vmul_f32(a, x), x), x); const Packet2f infinity = pset1(NumTraits::infinity()); return pselect(pcmp_eq(a, pzero(a)), infinity, x); } // Unfortunately vsqrt_f32 is only available for A64. #if EIGEN_ARCH_ARM64 template<> EIGEN_STRONG_INLINE Packet4f psqrt(const Packet4f& _x){return vsqrtq_f32(_x);} template<> EIGEN_STRONG_INLINE Packet2f psqrt(const Packet2f& _x){return vsqrt_f32(_x); } #else template<> EIGEN_STRONG_INLINE Packet4f psqrt(const Packet4f& a) { const Packet4f infinity = pset1(NumTraits::infinity()); const Packet4f is_zero_or_inf = por(pcmp_eq(a, pzero(a)), pcmp_eq(a, infinity)); return pselect(is_zero_or_inf, a, pmul(a, prsqrt(a))); } template<> EIGEN_STRONG_INLINE Packet2f psqrt(const Packet2f& a) { const Packet2f infinity = pset1(NumTraits::infinity()); const Packet2f is_zero_or_inf = por(pcmp_eq(a, pzero(a)), pcmp_eq(a, infinity)); return pselect(is_zero_or_inf, a, pmul(a, prsqrt(a))); } #endif //---------- bfloat16 ---------- // TODO: Add support for native armv8.6-a bfloat16_t // TODO: Guard if we have native bfloat16 support typedef eigen_packet_wrapper Packet4bf; template<> struct is_arithmetic { enum { value = true }; }; template<> struct packet_traits : default_packet_traits { typedef Packet4bf type; typedef Packet4bf half; enum { Vectorizable = 1, AlignedOnScalar = 1, size = 4, HasHalfPacket = 0, HasCmp = 1, HasAdd = 1, HasSub = 1, HasShift = 1, HasMul = 1, HasNegate = 1, HasAbs = 1, HasArg = 0, HasAbs2 = 1, HasAbsDiff = 1, HasMin = 1, HasMax = 1, HasConj = 1, HasSetLinear = 0, HasBlend = 0, HasDiv = 1, HasFloor = 1, HasCeil = 1, HasRint = 1, HasSin = EIGEN_FAST_MATH, HasCos = EIGEN_FAST_MATH, HasLog = 1, HasExp = 1, HasSqrt = 0, HasTanh = EIGEN_FAST_MATH, HasErf = EIGEN_FAST_MATH, HasBessel = 0, // Issues with accuracy. HasNdtri = 0 }; }; template<> struct unpacket_traits { typedef bfloat16 type; typedef Packet4bf half; enum { size = 4, alignment = Aligned16, vectorizable = true, masked_load_available = false, masked_store_available = false }; }; namespace detail { template<> EIGEN_ALWAYS_INLINE void zip_in_place(Packet4bf& p1, Packet4bf& p2) { const uint16x4x2_t tmp = vzip_u16(p1, p2); p1 = tmp.val[0]; p2 = tmp.val[1]; } } // namespace detail EIGEN_STRONG_INLINE Packet4bf F32ToBf16(const Packet4f& p) { // See the scalar implemention in BFloat16.h for a comprehensible explanation // of this fast rounding algorithm Packet4ui input = reinterpret_cast(p); // lsb = (input >> 16) & 1 Packet4ui lsb = vandq_u32(vshrq_n_u32(input, 16), vdupq_n_u32(1)); // rounding_bias = 0x7fff + lsb Packet4ui rounding_bias = vaddq_u32(lsb, vdupq_n_u32(0x7fff)); // input += rounding_bias input = vaddq_u32(input, rounding_bias); // input = input >> 16 input = vshrq_n_u32(input, 16); // Replace float-nans by bfloat16-nans, that is 0x7fc0 const Packet4ui bf16_nan = vdupq_n_u32(0x7fc0); const Packet4ui mask = vceqq_f32(p, p); input = vbslq_u32(mask, input, bf16_nan); // output = static_cast(input) return vmovn_u32(input); } EIGEN_STRONG_INLINE Packet4f Bf16ToF32(const Packet4bf& p) { return reinterpret_cast(vshlq_n_u32(vmovl_u16(p), 16)); } EIGEN_STRONG_INLINE Packet4bf F32MaskToBf16Mask(const Packet4f& p) { return vmovn_u32(vreinterpretq_u32_f32(p)); } template<> EIGEN_STRONG_INLINE Packet4bf pset1(const bfloat16& from) { return pset1(from.value); } template<> EIGEN_STRONG_INLINE bfloat16 pfirst(const Packet4bf& from) { return bfloat16_impl::raw_uint16_to_bfloat16(static_cast(pfirst(from))); } template<> EIGEN_STRONG_INLINE Packet4bf pload(const bfloat16* from) { return pload(reinterpret_cast(from)); } template<> EIGEN_STRONG_INLINE Packet4bf ploadu(const bfloat16* from) { return ploadu(reinterpret_cast(from)); } template<> EIGEN_STRONG_INLINE void pstore(bfloat16* to, const Packet4bf& from) { EIGEN_DEBUG_ALIGNED_STORE vst1_u16(reinterpret_cast(to), from); } template<> EIGEN_STRONG_INLINE void pstoreu(bfloat16* to, const Packet4bf& from) { EIGEN_DEBUG_UNALIGNED_STORE vst1_u16(reinterpret_cast(to), from); } template<> EIGEN_STRONG_INLINE Packet4bf ploaddup(const bfloat16* from) { return ploaddup(reinterpret_cast(from)); } template <> EIGEN_STRONG_INLINE Packet4bf pabs(const Packet4bf& a) { return F32ToBf16(pabs(Bf16ToF32(a))); } template <> EIGEN_STRONG_INLINE Packet4bf pmin(const Packet4bf &a, const Packet4bf &b) { return F32ToBf16(pmin(Bf16ToF32(a), Bf16ToF32(b))); } template <> EIGEN_STRONG_INLINE Packet4bf pmin(const Packet4bf &a, const Packet4bf &b) { return F32ToBf16(pmin(Bf16ToF32(a), Bf16ToF32(b))); } template <> EIGEN_STRONG_INLINE Packet4bf pmin(const Packet4bf &a, const Packet4bf &b) { return F32ToBf16(pmin(Bf16ToF32(a), Bf16ToF32(b))); } template <> EIGEN_STRONG_INLINE Packet4bf pmax(const Packet4bf &a, const Packet4bf &b) { return F32ToBf16(pmax(Bf16ToF32(a), Bf16ToF32(b))); } template <> EIGEN_STRONG_INLINE Packet4bf pmax(const Packet4bf &a, const Packet4bf &b) { return F32ToBf16(pmax(Bf16ToF32(a), Bf16ToF32(b))); } template <> EIGEN_STRONG_INLINE Packet4bf pmax(const Packet4bf &a, const Packet4bf &b) { return F32ToBf16(pmax(Bf16ToF32(a), Bf16ToF32(b))); } template<> EIGEN_STRONG_INLINE Packet4bf plset(const bfloat16& a) { return F32ToBf16(plset(static_cast(a))); } template<> EIGEN_STRONG_INLINE Packet4bf por(const Packet4bf& a,const Packet4bf& b) { return por(a, b); } template<> EIGEN_STRONG_INLINE Packet4bf pxor(const Packet4bf& a,const Packet4bf& b) { return pxor(a, b); } template<> EIGEN_STRONG_INLINE Packet4bf pand(const Packet4bf& a,const Packet4bf& b) { return pand(a, b); } template<> EIGEN_STRONG_INLINE Packet4bf pandnot(const Packet4bf& a,const Packet4bf& b) { return pandnot(a, b); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4bf pselect(const Packet4bf& mask, const Packet4bf& a, const Packet4bf& b) { return pselect(mask, a, b); } template<> EIGEN_STRONG_INLINE Packet4bf print(const Packet4bf& a) { return F32ToBf16(print(Bf16ToF32(a))); } template<> EIGEN_STRONG_INLINE Packet4bf pfloor(const Packet4bf& a) { return F32ToBf16(pfloor(Bf16ToF32(a))); } template<> EIGEN_STRONG_INLINE Packet4bf pceil(const Packet4bf& a) { return F32ToBf16(pceil(Bf16ToF32(a))); } template<> EIGEN_STRONG_INLINE Packet4bf pconj(const Packet4bf& a) { return a; } template<> EIGEN_STRONG_INLINE Packet4bf padd(const Packet4bf& a, const Packet4bf& b) { return F32ToBf16(padd(Bf16ToF32(a), Bf16ToF32(b))); } template<> EIGEN_STRONG_INLINE Packet4bf psub(const Packet4bf& a, const Packet4bf& b) { return F32ToBf16(psub(Bf16ToF32(a), Bf16ToF32(b))); } template<> EIGEN_STRONG_INLINE Packet4bf pmul(const Packet4bf& a, const Packet4bf& b) { return F32ToBf16(pmul(Bf16ToF32(a), Bf16ToF32(b))); } template<> EIGEN_STRONG_INLINE Packet4bf pdiv(const Packet4bf& a, const Packet4bf& b) { return F32ToBf16(pdiv(Bf16ToF32(a), Bf16ToF32(b))); } template<> EIGEN_STRONG_INLINE Packet4bf pgather(const bfloat16* from, Index stride) { return pgather(reinterpret_cast(from), stride); } template<> EIGEN_STRONG_INLINE void pscatter(bfloat16* to, const Packet4bf& from, Index stride) { pscatter(reinterpret_cast(to), from, stride); } template<> EIGEN_STRONG_INLINE bfloat16 predux(const Packet4bf& a) { return static_cast(predux(Bf16ToF32(a))); } template<> EIGEN_STRONG_INLINE bfloat16 predux_max(const Packet4bf& a) { return static_cast(predux_max(Bf16ToF32(a))); } template<> EIGEN_STRONG_INLINE bfloat16 predux_min(const Packet4bf& a) { return static_cast(predux_min(Bf16ToF32(a))); } template<> EIGEN_STRONG_INLINE bfloat16 predux_mul(const Packet4bf& a) { return static_cast(predux_mul(Bf16ToF32(a))); } template<> EIGEN_STRONG_INLINE Packet4bf preverse(const Packet4bf& a) { return preverse(a); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { detail::ptranspose_impl(kernel); } template<> EIGEN_STRONG_INLINE Packet4bf pabsdiff(const Packet4bf& a, const Packet4bf& b) { return F32ToBf16(pabsdiff(Bf16ToF32(a), Bf16ToF32(b))); } template<> EIGEN_STRONG_INLINE Packet4bf pcmp_eq(const Packet4bf& a, const Packet4bf& b) { return F32MaskToBf16Mask(pcmp_eq(Bf16ToF32(a), Bf16ToF32(b))); } template<> EIGEN_STRONG_INLINE Packet4bf pcmp_lt(const Packet4bf& a, const Packet4bf& b) { return F32MaskToBf16Mask(pcmp_lt(Bf16ToF32(a), Bf16ToF32(b))); } template<> EIGEN_STRONG_INLINE Packet4bf pcmp_lt_or_nan(const Packet4bf& a, const Packet4bf& b) { return F32MaskToBf16Mask(pcmp_lt_or_nan(Bf16ToF32(a), Bf16ToF32(b))); } template<> EIGEN_STRONG_INLINE Packet4bf pcmp_le(const Packet4bf& a, const Packet4bf& b) { return F32MaskToBf16Mask(pcmp_le(Bf16ToF32(a), Bf16ToF32(b))); } template<> EIGEN_STRONG_INLINE Packet4bf pnegate(const Packet4bf& a) { return pxor(a, pset1(static_cast(0x8000))); } //---------- double ---------- // Clang 3.5 in the iOS toolchain has an ICE triggered by NEON intrisics for double. // Confirmed at least with __apple_build_version__ = 6000054. #ifdef __apple_build_version__ // Let's hope that by the time __apple_build_version__ hits the 601* range, the bug will be fixed. // https://gist.github.com/yamaya/2924292 suggests that the 3 first digits are only updated with // major toolchain updates. #define EIGEN_APPLE_DOUBLE_NEON_BUG (__apple_build_version__ < 6010000) #else #define EIGEN_APPLE_DOUBLE_NEON_BUG 0 #endif #if EIGEN_ARCH_ARM64 && !EIGEN_APPLE_DOUBLE_NEON_BUG // Bug 907: workaround missing declarations of the following two functions in the ADK // Defining these functions as templates ensures that if these intrinsics are // already defined in arm_neon.h, then our workaround doesn't cause a conflict // and has lower priority in overload resolution. template uint64x2_t vreinterpretq_u64_f64(T a) { return (uint64x2_t) a; } template float64x2_t vreinterpretq_f64_u64(T a) { return (float64x2_t) a; } typedef float64x2_t Packet2d; typedef float64x1_t Packet1d; // fuctionally equivalent to _mm_shuffle_pd in SSE (i.e. shuffle(m, n, mask) equals _mm_shuffle_pd(m,n,mask)) // Currently used in LU/arch/InverseSize4.h to enable a shared implementation // for fast inversion of matrices of size 4. EIGEN_STRONG_INLINE Packet2d shuffle(const Packet2d& m, const Packet2d& n, int mask) { const double* a = reinterpret_cast(&m); const double* b = reinterpret_cast(&n); Packet2d res = {*(a + (mask & 1)), *(b + ((mask >> 1) & 1))}; return res; } EIGEN_STRONG_INLINE Packet2d vec2d_swizzle2(const Packet2d& a, const Packet2d& b, int mask) { return shuffle(a, b, mask); } EIGEN_STRONG_INLINE Packet2d vec2d_unpacklo(const Packet2d& a,const Packet2d& b) { return shuffle(a, b, 0); } EIGEN_STRONG_INLINE Packet2d vec2d_unpackhi(const Packet2d& a,const Packet2d& b) { return shuffle(a, b, 3); } #define vec2d_duplane(a, p) \ vdupq_laneq_f64(a, p) template<> struct packet_traits : default_packet_traits { typedef Packet2d type; typedef Packet2d half; enum { Vectorizable = 1, AlignedOnScalar = 1, size = 2, HasHalfPacket = 0, HasCmp = 1, HasAdd = 1, HasSub = 1, HasShift = 1, HasMul = 1, HasNegate = 1, HasAbs = 1, HasArg = 0, HasAbs2 = 1, HasAbsDiff = 1, HasMin = 1, HasMax = 1, HasConj = 1, HasSetLinear = 0, HasBlend = 0, HasDiv = 1, HasFloor = 1, HasCeil = 1, HasRint = 1, HasSin = 0, HasCos = 0, HasLog = 1, HasExp = 1, HasSqrt = 1, HasRsqrt = 1, HasTanh = 0, HasErf = 0 }; }; template<> struct unpacket_traits { typedef double type; typedef Packet2d half; typedef Packet2l integer_packet; enum { size = 2, alignment = Aligned16, vectorizable = true, masked_load_available = false, masked_store_available = false }; }; template<> EIGEN_STRONG_INLINE Packet2d pset1(const double& from) { return vdupq_n_f64(from); } template<> EIGEN_STRONG_INLINE Packet2d plset(const double& a) { const double c[] = {0.0,1.0}; return vaddq_f64(pset1(a), vld1q_f64(c)); } template<> EIGEN_STRONG_INLINE Packet2d padd(const Packet2d& a, const Packet2d& b) { return vaddq_f64(a,b); } template<> EIGEN_STRONG_INLINE Packet2d psub(const Packet2d& a, const Packet2d& b) { return vsubq_f64(a,b); } template<> EIGEN_STRONG_INLINE Packet2d pxor(const Packet2d& , const Packet2d& ); template<> EIGEN_STRONG_INLINE Packet2d paddsub(const Packet2d& a, const Packet2d& b){ const Packet2d mask = {numext::bit_cast(0x8000000000000000ull),0.0}; return padd(a, pxor(mask, b)); } template<> EIGEN_STRONG_INLINE Packet2d pnegate(const Packet2d& a) { return vnegq_f64(a); } template<> EIGEN_STRONG_INLINE Packet2d pconj(const Packet2d& a) { return a; } template<> EIGEN_STRONG_INLINE Packet2d pmul(const Packet2d& a, const Packet2d& b) { return vmulq_f64(a,b); } template<> EIGEN_STRONG_INLINE Packet2d pdiv(const Packet2d& a, const Packet2d& b) { return vdivq_f64(a,b); } #ifdef __ARM_FEATURE_FMA // See bug 936. See above comment about FMA for float. template<> EIGEN_STRONG_INLINE Packet2d pmadd(const Packet2d& a, const Packet2d& b, const Packet2d& c) { return vfmaq_f64(c,a,b); } #else template<> EIGEN_STRONG_INLINE Packet2d pmadd(const Packet2d& a, const Packet2d& b, const Packet2d& c) { return vmlaq_f64(c,a,b); } #endif template<> EIGEN_STRONG_INLINE Packet2d pmin(const Packet2d& a, const Packet2d& b) { return vminq_f64(a,b); } #ifdef __ARM_FEATURE_NUMERIC_MAXMIN // numeric max and min are only available if ARM_FEATURE_NUMERIC_MAXMIN is defined (which can only be the case for Armv8 systems). template<> EIGEN_STRONG_INLINE Packet2d pmin(const Packet2d& a, const Packet2d& b) { return vminnmq_f64(a, b); } template<> EIGEN_STRONG_INLINE Packet2d pmax(const Packet2d& a, const Packet2d& b) { return vmaxnmq_f64(a, b); } #endif template<> EIGEN_STRONG_INLINE Packet2d pmin(const Packet2d& a, const Packet2d& b) { return pmin(a, b); } template<> EIGEN_STRONG_INLINE Packet2d pmax(const Packet2d& a, const Packet2d& b) { return vmaxq_f64(a,b); } template<> EIGEN_STRONG_INLINE Packet2d pmax(const Packet2d& a, const Packet2d& b) { return pmax(a, b); } // Logical Operations are not supported for float, so we have to reinterpret casts using NEON intrinsics template<> EIGEN_STRONG_INLINE Packet2d pand(const Packet2d& a, const Packet2d& b) { return vreinterpretq_f64_u64(vandq_u64(vreinterpretq_u64_f64(a),vreinterpretq_u64_f64(b))); } template<> EIGEN_STRONG_INLINE Packet2d por(const Packet2d& a, const Packet2d& b) { return vreinterpretq_f64_u64(vorrq_u64(vreinterpretq_u64_f64(a),vreinterpretq_u64_f64(b))); } template<> EIGEN_STRONG_INLINE Packet2d pxor(const Packet2d& a, const Packet2d& b) { return vreinterpretq_f64_u64(veorq_u64(vreinterpretq_u64_f64(a),vreinterpretq_u64_f64(b))); } template<> EIGEN_STRONG_INLINE Packet2d pandnot(const Packet2d& a, const Packet2d& b) { return vreinterpretq_f64_u64(vbicq_u64(vreinterpretq_u64_f64(a),vreinterpretq_u64_f64(b))); } template<> EIGEN_STRONG_INLINE Packet2d pcmp_le(const Packet2d& a, const Packet2d& b) { return vreinterpretq_f64_u64(vcleq_f64(a,b)); } template<> EIGEN_STRONG_INLINE Packet2d pcmp_lt(const Packet2d& a, const Packet2d& b) { return vreinterpretq_f64_u64(vcltq_f64(a,b)); } template<> EIGEN_STRONG_INLINE Packet2d pcmp_lt_or_nan(const Packet2d& a, const Packet2d& b) { return vreinterpretq_f64_u32(vmvnq_u32(vreinterpretq_u32_u64(vcgeq_f64(a,b)))); } template<> EIGEN_STRONG_INLINE Packet2d pcmp_eq(const Packet2d& a, const Packet2d& b) { return vreinterpretq_f64_u64(vceqq_f64(a,b)); } template<> EIGEN_STRONG_INLINE Packet2d pload(const double* from) { EIGEN_DEBUG_ALIGNED_LOAD return vld1q_f64(from); } template<> EIGEN_STRONG_INLINE Packet2d ploadu(const double* from) { EIGEN_DEBUG_UNALIGNED_LOAD return vld1q_f64(from); } template<> EIGEN_STRONG_INLINE Packet2d ploaddup(const double* from) { return vld1q_dup_f64(from); } template<> EIGEN_STRONG_INLINE void pstore(double* to, const Packet2d& from) { EIGEN_DEBUG_ALIGNED_STORE vst1q_f64(to,from); } template<> EIGEN_STRONG_INLINE void pstoreu(double* to, const Packet2d& from) { EIGEN_DEBUG_UNALIGNED_STORE vst1q_f64(to,from); } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet2d pgather(const double* from, Index stride) { Packet2d res = pset1(0.0); res = vld1q_lane_f64(from + 0*stride, res, 0); res = vld1q_lane_f64(from + 1*stride, res, 1); return res; } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pscatter(double* to, const Packet2d& from, Index stride) { vst1q_lane_f64(to + stride*0, from, 0); vst1q_lane_f64(to + stride*1, from, 1); } template<> EIGEN_STRONG_INLINE void prefetch(const double* addr) { EIGEN_ARM_PREFETCH(addr); } // FIXME only store the 2 first elements ? template<> EIGEN_STRONG_INLINE double pfirst(const Packet2d& a) { return vgetq_lane_f64(a,0); } template<> EIGEN_STRONG_INLINE Packet2d preverse(const Packet2d& a) { return vcombine_f64(vget_high_f64(a), vget_low_f64(a)); } template<> EIGEN_STRONG_INLINE Packet2d pabs(const Packet2d& a) { return vabsq_f64(a); } #if EIGEN_COMP_CLANG && defined(__apple_build_version__) // workaround ICE, see bug 907 template<> EIGEN_STRONG_INLINE double predux(const Packet2d& a) { return (vget_low_f64(a) + vget_high_f64(a))[0]; } #else template<> EIGEN_STRONG_INLINE double predux(const Packet2d& a) { return vget_lane_f64(vget_low_f64(a) + vget_high_f64(a), 0); } #endif // Other reduction functions: // mul #if EIGEN_COMP_CLANG && defined(__apple_build_version__) template<> EIGEN_STRONG_INLINE double predux_mul(const Packet2d& a) { return (vget_low_f64(a) * vget_high_f64(a))[0]; } #else template<> EIGEN_STRONG_INLINE double predux_mul(const Packet2d& a) { return vget_lane_f64(vget_low_f64(a) * vget_high_f64(a), 0); } #endif // min template<> EIGEN_STRONG_INLINE double predux_min(const Packet2d& a) { return vgetq_lane_f64(vpminq_f64(a,a), 0); } // max template<> EIGEN_STRONG_INLINE double predux_max(const Packet2d& a) { return vgetq_lane_f64(vpmaxq_f64(a,a), 0); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { const float64x2_t tmp1 = vzip1q_f64(kernel.packet[0], kernel.packet[1]); const float64x2_t tmp2 = vzip2q_f64(kernel.packet[0], kernel.packet[1]); kernel.packet[0] = tmp1; kernel.packet[1] = tmp2; } template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet2d pselect( const Packet2d& mask, const Packet2d& a, const Packet2d& b) { return vbslq_f64(vreinterpretq_u64_f64(mask), a, b); } template<> EIGEN_STRONG_INLINE Packet2d print(const Packet2d& a) { return vrndnq_f64(a); } template<> EIGEN_STRONG_INLINE Packet2d pfloor(const Packet2d& a) { return vrndmq_f64(a); } template<> EIGEN_STRONG_INLINE Packet2d pceil(const Packet2d& a) { return vrndpq_f64(a); } template<> EIGEN_STRONG_INLINE Packet2d pldexp(const Packet2d& a, const Packet2d& exponent) { return pldexp_generic(a, exponent); } template<> EIGEN_STRONG_INLINE Packet2d pfrexp(const Packet2d& a, Packet2d& exponent) { return pfrexp_generic(a,exponent); } template<> EIGEN_STRONG_INLINE Packet2d pset1frombits(uint64_t from) { return vreinterpretq_f64_u64(vdupq_n_u64(from)); } template<> EIGEN_STRONG_INLINE Packet2d prsqrt(const Packet2d& a) { // Compute approximate reciprocal sqrt. Packet2d x = vrsqrteq_f64(a); // Do Newton iterations for 1/sqrt(x). x = vmulq_f64(vrsqrtsq_f64(vmulq_f64(a, x), x), x); x = vmulq_f64(vrsqrtsq_f64(vmulq_f64(a, x), x), x); x = vmulq_f64(vrsqrtsq_f64(vmulq_f64(a, x), x), x); const Packet2d infinity = pset1(NumTraits::infinity()); return pselect(pcmp_eq(a, pzero(a)), infinity, x); } template<> EIGEN_STRONG_INLINE Packet2d psqrt(const Packet2d& _x){ return vsqrtq_f64(_x); } #endif // EIGEN_ARCH_ARM64 // Do we have an fp16 types and supporting Neon intrinsics? #if EIGEN_HAS_ARM64_FP16_VECTOR_ARITHMETIC typedef float16x4_t Packet4hf; typedef float16x8_t Packet8hf; template <> struct packet_traits : default_packet_traits { typedef Packet8hf type; typedef Packet4hf half; enum { Vectorizable = 1, AlignedOnScalar = 1, size = 8, HasHalfPacket = 1, HasCmp = 1, HasCast = 1, HasAdd = 1, HasSub = 1, HasShift = 1, HasMul = 1, HasNegate = 1, HasAbs = 1, HasArg = 0, HasAbs2 = 1, HasAbsDiff = 0, HasMin = 1, HasMax = 1, HasConj = 1, HasSetLinear = 0, HasBlend = 0, HasInsert = 1, HasReduxp = 1, HasDiv = 1, HasFloor = 1, HasCeil = 1, HasRint = 1, HasSin = 0, HasCos = 0, HasLog = 0, HasExp = 0, HasSqrt = 1, HasRsqrt = 1, HasErf = EIGEN_FAST_MATH, HasBessel = 0, // Issues with accuracy. HasNdtri = 0 }; }; template <> struct unpacket_traits { typedef Eigen::half type; typedef Packet4hf half; enum { size = 4, alignment = Aligned16, vectorizable = true, masked_load_available = false, masked_store_available = false }; }; template <> struct unpacket_traits { typedef Eigen::half type; typedef Packet4hf half; enum { size = 8, alignment = Aligned16, vectorizable = true, masked_load_available = false, masked_store_available = false }; }; template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4hf predux_half_dowto4(const Packet8hf& a) { return vadd_f16(vget_low_f16(a), vget_high_f16(a)); } template <> EIGEN_STRONG_INLINE Packet8hf pset1(const Eigen::half& from) { return vdupq_n_f16(from.x); } template <> EIGEN_STRONG_INLINE Packet4hf pset1(const Eigen::half& from) { return vdup_n_f16(from.x); } template <> EIGEN_STRONG_INLINE Packet8hf plset(const Eigen::half& a) { const float16_t f[] = {0, 1, 2, 3, 4, 5, 6, 7}; Packet8hf countdown = vld1q_f16(f); return vaddq_f16(pset1(a), countdown); } template <> EIGEN_STRONG_INLINE Packet4hf plset(const Eigen::half& a) { const float16_t f[] = {0, 1, 2, 3}; Packet4hf countdown = vld1_f16(f); return vadd_f16(pset1(a), countdown); } template <> EIGEN_STRONG_INLINE Packet8hf padd(const Packet8hf& a, const Packet8hf& b) { return vaddq_f16(a, b); } template <> EIGEN_STRONG_INLINE Packet4hf padd(const Packet4hf& a, const Packet4hf& b) { return vadd_f16(a, b); } template <> EIGEN_STRONG_INLINE Packet8hf psub(const Packet8hf& a, const Packet8hf& b) { return vsubq_f16(a, b); } template <> EIGEN_STRONG_INLINE Packet4hf psub(const Packet4hf& a, const Packet4hf& b) { return vsub_f16(a, b); } template <> EIGEN_STRONG_INLINE Packet8hf pnegate(const Packet8hf& a) { return vnegq_f16(a); } template <> EIGEN_STRONG_INLINE Packet4hf pnegate(const Packet4hf& a) { return vneg_f16(a); } template <> EIGEN_STRONG_INLINE Packet8hf pconj(const Packet8hf& a) { return a; } template <> EIGEN_STRONG_INLINE Packet4hf pconj(const Packet4hf& a) { return a; } template <> EIGEN_STRONG_INLINE Packet8hf pmul(const Packet8hf& a, const Packet8hf& b) { return vmulq_f16(a, b); } template <> EIGEN_STRONG_INLINE Packet4hf pmul(const Packet4hf& a, const Packet4hf& b) { return vmul_f16(a, b); } template <> EIGEN_STRONG_INLINE Packet8hf pdiv(const Packet8hf& a, const Packet8hf& b) { return vdivq_f16(a, b); } template <> EIGEN_STRONG_INLINE Packet4hf pdiv(const Packet4hf& a, const Packet4hf& b) { return vdiv_f16(a, b); } template <> EIGEN_STRONG_INLINE Packet8hf pmadd(const Packet8hf& a, const Packet8hf& b, const Packet8hf& c) { return vfmaq_f16(c, a, b); } template <> EIGEN_STRONG_INLINE Packet4hf pmadd(const Packet4hf& a, const Packet4hf& b, const Packet4hf& c) { return vfma_f16(c, a, b); } template <> EIGEN_STRONG_INLINE Packet8hf pmin(const Packet8hf& a, const Packet8hf& b) { return vminq_f16(a, b); } template <> EIGEN_STRONG_INLINE Packet4hf pmin(const Packet4hf& a, const Packet4hf& b) { return vmin_f16(a, b); } #ifdef __ARM_FEATURE_NUMERIC_MAXMIN // numeric max and min are only available if ARM_FEATURE_NUMERIC_MAXMIN is defined (which can only be the case for Armv8 systems). template<> EIGEN_STRONG_INLINE Packet4hf pmin(const Packet4hf& a, const Packet4hf& b) { return vminnm_f16(a, b); } template<> EIGEN_STRONG_INLINE Packet8hf pmin(const Packet8hf& a, const Packet8hf& b) { return vminnmq_f16(a, b); } #endif template<> EIGEN_STRONG_INLINE Packet4hf pmin(const Packet4hf& a, const Packet4hf& b) { return pmin(a, b); } template<> EIGEN_STRONG_INLINE Packet8hf pmin(const Packet8hf& a, const Packet8hf& b) { return pmin(a, b); } template <> EIGEN_STRONG_INLINE Packet8hf pmax(const Packet8hf& a, const Packet8hf& b) { return vmaxq_f16(a, b); } template <> EIGEN_STRONG_INLINE Packet4hf pmax(const Packet4hf& a, const Packet4hf& b) { return vmax_f16(a, b); } #ifdef __ARM_FEATURE_NUMERIC_MAXMIN // numeric max and min are only available if ARM_FEATURE_NUMERIC_MAXMIN is defined (which can only be the case for Armv8 systems). template<> EIGEN_STRONG_INLINE Packet4hf pmax(const Packet4hf& a, const Packet4hf& b) { return vmaxnm_f16(a, b); } template<> EIGEN_STRONG_INLINE Packet8hf pmax(const Packet8hf& a, const Packet8hf& b) { return vmaxnmq_f16(a, b); } #endif template<> EIGEN_STRONG_INLINE Packet4hf pmax(const Packet4hf& a, const Packet4hf& b) { return pmax(a, b); } template<> EIGEN_STRONG_INLINE Packet8hf pmax(const Packet8hf& a, const Packet8hf& b) { return pmax(a, b); } #define EIGEN_MAKE_ARM_FP16_CMP_8(name) \ template <> \ EIGEN_STRONG_INLINE Packet8hf pcmp_##name(const Packet8hf& a, const Packet8hf& b) { \ return vreinterpretq_f16_u16(vc##name##q_f16(a, b)); \ } #define EIGEN_MAKE_ARM_FP16_CMP_4(name) \ template <> \ EIGEN_STRONG_INLINE Packet4hf pcmp_##name(const Packet4hf& a, const Packet4hf& b) { \ return vreinterpret_f16_u16(vc##name##_f16(a, b)); \ } EIGEN_MAKE_ARM_FP16_CMP_8(eq) EIGEN_MAKE_ARM_FP16_CMP_8(lt) EIGEN_MAKE_ARM_FP16_CMP_8(le) EIGEN_MAKE_ARM_FP16_CMP_4(eq) EIGEN_MAKE_ARM_FP16_CMP_4(lt) EIGEN_MAKE_ARM_FP16_CMP_4(le) #undef EIGEN_MAKE_ARM_FP16_CMP_8 #undef EIGEN_MAKE_ARM_FP16_CMP_4 template <> EIGEN_STRONG_INLINE Packet8hf pcmp_lt_or_nan(const Packet8hf& a, const Packet8hf& b) { return vreinterpretq_f16_u16(vmvnq_u16(vcgeq_f16(a, b))); } template <> EIGEN_STRONG_INLINE Packet4hf pcmp_lt_or_nan(const Packet4hf& a, const Packet4hf& b) { return vreinterpret_f16_u16(vmvn_u16(vcge_f16(a, b))); } template <> EIGEN_STRONG_INLINE Packet8hf print(const Packet8hf& a) { return vrndnq_f16(a); } template <> EIGEN_STRONG_INLINE Packet4hf print(const Packet4hf& a) { return vrndn_f16(a); } template <> EIGEN_STRONG_INLINE Packet8hf pfloor(const Packet8hf& a) { return vrndmq_f16(a); } template <> EIGEN_STRONG_INLINE Packet4hf pfloor(const Packet4hf& a) { return vrndm_f16(a); } template <> EIGEN_STRONG_INLINE Packet8hf pceil(const Packet8hf& a) { return vrndpq_f16(a); } template <> EIGEN_STRONG_INLINE Packet4hf pceil(const Packet4hf& a) { return vrndp_f16(a); } template <> EIGEN_STRONG_INLINE Packet8hf psqrt(const Packet8hf& a) { return vsqrtq_f16(a); } template <> EIGEN_STRONG_INLINE Packet4hf psqrt(const Packet4hf& a) { return vsqrt_f16(a); } template <> EIGEN_STRONG_INLINE Packet8hf pand(const Packet8hf& a, const Packet8hf& b) { return vreinterpretq_f16_u16(vandq_u16(vreinterpretq_u16_f16(a), vreinterpretq_u16_f16(b))); } template <> EIGEN_STRONG_INLINE Packet4hf pand(const Packet4hf& a, const Packet4hf& b) { return vreinterpret_f16_u16(vand_u16(vreinterpret_u16_f16(a), vreinterpret_u16_f16(b))); } template <> EIGEN_STRONG_INLINE Packet8hf por(const Packet8hf& a, const Packet8hf& b) { return vreinterpretq_f16_u16(vorrq_u16(vreinterpretq_u16_f16(a), vreinterpretq_u16_f16(b))); } template <> EIGEN_STRONG_INLINE Packet4hf por(const Packet4hf& a, const Packet4hf& b) { return vreinterpret_f16_u16(vorr_u16(vreinterpret_u16_f16(a), vreinterpret_u16_f16(b))); } template <> EIGEN_STRONG_INLINE Packet8hf pxor(const Packet8hf& a, const Packet8hf& b) { return vreinterpretq_f16_u16(veorq_u16(vreinterpretq_u16_f16(a), vreinterpretq_u16_f16(b))); } template <> EIGEN_STRONG_INLINE Packet4hf pxor(const Packet4hf& a, const Packet4hf& b) { return vreinterpret_f16_u16(veor_u16(vreinterpret_u16_f16(a), vreinterpret_u16_f16(b))); } template <> EIGEN_STRONG_INLINE Packet8hf pandnot(const Packet8hf& a, const Packet8hf& b) { return vreinterpretq_f16_u16(vbicq_u16(vreinterpretq_u16_f16(a), vreinterpretq_u16_f16(b))); } template <> EIGEN_STRONG_INLINE Packet4hf pandnot(const Packet4hf& a, const Packet4hf& b) { return vreinterpret_f16_u16(vbic_u16(vreinterpret_u16_f16(a), vreinterpret_u16_f16(b))); } template <> EIGEN_STRONG_INLINE Packet8hf pload(const Eigen::half* from) { EIGEN_DEBUG_ALIGNED_LOAD return vld1q_f16(reinterpret_cast(from)); } template <> EIGEN_STRONG_INLINE Packet4hf pload(const Eigen::half* from) { EIGEN_DEBUG_ALIGNED_LOAD return vld1_f16(reinterpret_cast(from)); } template <> EIGEN_STRONG_INLINE Packet8hf ploadu(const Eigen::half* from) { EIGEN_DEBUG_UNALIGNED_LOAD return vld1q_f16(reinterpret_cast(from)); } template <> EIGEN_STRONG_INLINE Packet4hf ploadu(const Eigen::half* from) { EIGEN_DEBUG_UNALIGNED_LOAD return vld1_f16(reinterpret_cast(from)); } template <> EIGEN_STRONG_INLINE Packet8hf ploaddup(const Eigen::half* from) { Packet8hf packet; packet[0] = from[0].x; packet[1] = from[0].x; packet[2] = from[1].x; packet[3] = from[1].x; packet[4] = from[2].x; packet[5] = from[2].x; packet[6] = from[3].x; packet[7] = from[3].x; return packet; } template <> EIGEN_STRONG_INLINE Packet4hf ploaddup(const Eigen::half* from) { float16x4_t packet; float16_t* tmp; tmp = (float16_t*)&packet; tmp[0] = from[0].x; tmp[1] = from[0].x; tmp[2] = from[1].x; tmp[3] = from[1].x; return packet; } template <> EIGEN_STRONG_INLINE Packet8hf ploadquad(const Eigen::half* from) { Packet4hf lo, hi; lo = vld1_dup_f16(reinterpret_cast(from)); hi = vld1_dup_f16(reinterpret_cast(from+1)); return vcombine_f16(lo, hi); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet8hf pinsertfirst(const Packet8hf& a, Eigen::half b) { return vsetq_lane_f16(b.x, a, 0); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4hf pinsertfirst(const Packet4hf& a, Eigen::half b) { return vset_lane_f16(b.x, a, 0); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet8hf pselect(const Packet8hf& mask, const Packet8hf& a, const Packet8hf& b) { return vbslq_f16(vreinterpretq_u16_f16(mask), a, b); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4hf pselect(const Packet4hf& mask, const Packet4hf& a, const Packet4hf& b) { return vbsl_f16(vreinterpret_u16_f16(mask), a, b); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet8hf pinsertlast(const Packet8hf& a, Eigen::half b) { return vsetq_lane_f16(b.x, a, 7); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4hf pinsertlast(const Packet4hf& a, Eigen::half b) { return vset_lane_f16(b.x, a, 3); } template <> EIGEN_STRONG_INLINE void pstore(Eigen::half* to, const Packet8hf& from) { EIGEN_DEBUG_ALIGNED_STORE vst1q_f16(reinterpret_cast(to), from); } template <> EIGEN_STRONG_INLINE void pstore(Eigen::half* to, const Packet4hf& from) { EIGEN_DEBUG_ALIGNED_STORE vst1_f16(reinterpret_cast(to), from); } template <> EIGEN_STRONG_INLINE void pstoreu(Eigen::half* to, const Packet8hf& from) { EIGEN_DEBUG_UNALIGNED_STORE vst1q_f16(reinterpret_cast(to), from); } template <> EIGEN_STRONG_INLINE void pstoreu(Eigen::half* to, const Packet4hf& from) { EIGEN_DEBUG_UNALIGNED_STORE vst1_f16(reinterpret_cast(to), from); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet8hf pgather(const Eigen::half* from, Index stride) { Packet8hf res = pset1(Eigen::half(0.f)); res = vsetq_lane_f16(from[0 * stride].x, res, 0); res = vsetq_lane_f16(from[1 * stride].x, res, 1); res = vsetq_lane_f16(from[2 * stride].x, res, 2); res = vsetq_lane_f16(from[3 * stride].x, res, 3); res = vsetq_lane_f16(from[4 * stride].x, res, 4); res = vsetq_lane_f16(from[5 * stride].x, res, 5); res = vsetq_lane_f16(from[6 * stride].x, res, 6); res = vsetq_lane_f16(from[7 * stride].x, res, 7); return res; } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet4hf pgather(const Eigen::half* from, Index stride) { Packet4hf res = pset1(Eigen::half(0.f)); res = vset_lane_f16(from[0 * stride].x, res, 0); res = vset_lane_f16(from[1 * stride].x, res, 1); res = vset_lane_f16(from[2 * stride].x, res, 2); res = vset_lane_f16(from[3 * stride].x, res, 3); return res; } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pscatter(Eigen::half* to, const Packet8hf& from, Index stride) { to[stride * 0].x = vgetq_lane_f16(from, 0); to[stride * 1].x = vgetq_lane_f16(from, 1); to[stride * 2].x = vgetq_lane_f16(from, 2); to[stride * 3].x = vgetq_lane_f16(from, 3); to[stride * 4].x = vgetq_lane_f16(from, 4); to[stride * 5].x = vgetq_lane_f16(from, 5); to[stride * 6].x = vgetq_lane_f16(from, 6); to[stride * 7].x = vgetq_lane_f16(from, 7); } template <> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pscatter(Eigen::half* to, const Packet4hf& from, Index stride) { to[stride * 0].x = vget_lane_f16(from, 0); to[stride * 1].x = vget_lane_f16(from, 1); to[stride * 2].x = vget_lane_f16(from, 2); to[stride * 3].x = vget_lane_f16(from, 3); } template <> EIGEN_STRONG_INLINE void prefetch(const Eigen::half* addr) { EIGEN_ARM_PREFETCH(addr); } template <> EIGEN_STRONG_INLINE Eigen::half pfirst(const Packet8hf& a) { float16_t x[8]; vst1q_f16(x, a); Eigen::half h; h.x = x[0]; return h; } template <> EIGEN_STRONG_INLINE Eigen::half pfirst(const Packet4hf& a) { float16_t x[4]; vst1_f16(x, a); Eigen::half h; h.x = x[0]; return h; } template<> EIGEN_STRONG_INLINE Packet8hf preverse(const Packet8hf& a) { float16x4_t a_lo, a_hi; Packet8hf a_r64; a_r64 = vrev64q_f16(a); a_lo = vget_low_f16(a_r64); a_hi = vget_high_f16(a_r64); return vcombine_f16(a_hi, a_lo); } template <> EIGEN_STRONG_INLINE Packet4hf preverse(const Packet4hf& a) { return vrev64_f16(a); } template <> EIGEN_STRONG_INLINE Packet8hf pabs(const Packet8hf& a) { return vabsq_f16(a); } template <> EIGEN_STRONG_INLINE Packet4hf pabs(const Packet4hf& a) { return vabs_f16(a); } template <> EIGEN_STRONG_INLINE Eigen::half predux(const Packet8hf& a) { float16x4_t a_lo, a_hi, sum; a_lo = vget_low_f16(a); a_hi = vget_high_f16(a); sum = vpadd_f16(a_lo, a_hi); sum = vpadd_f16(sum, sum); sum = vpadd_f16(sum, sum); Eigen::half h; h.x = vget_lane_f16(sum, 0); return h; } template <> EIGEN_STRONG_INLINE Eigen::half predux(const Packet4hf& a) { float16x4_t sum; sum = vpadd_f16(a, a); sum = vpadd_f16(sum, sum); Eigen::half h; h.x = vget_lane_f16(sum, 0); return h; } template <> EIGEN_STRONG_INLINE Eigen::half predux_mul(const Packet8hf& a) { float16x4_t a_lo, a_hi, prod; a_lo = vget_low_f16(a); a_hi = vget_high_f16(a); prod = vmul_f16(a_lo, a_hi); prod = vmul_f16(prod, vrev64_f16(prod)); Eigen::half h; h.x = vmulh_f16(vget_lane_f16(prod, 0), vget_lane_f16(prod, 1)); return h; } template <> EIGEN_STRONG_INLINE Eigen::half predux_mul(const Packet4hf& a) { float16x4_t prod; prod = vmul_f16(a, vrev64_f16(a)); Eigen::half h; h.x = vmulh_f16(vget_lane_f16(prod, 0), vget_lane_f16(prod, 1)); return h; } template <> EIGEN_STRONG_INLINE Eigen::half predux_min(const Packet8hf& a) { float16x4_t a_lo, a_hi, min; a_lo = vget_low_f16(a); a_hi = vget_high_f16(a); min = vpmin_f16(a_lo, a_hi); min = vpmin_f16(min, min); min = vpmin_f16(min, min); Eigen::half h; h.x = vget_lane_f16(min, 0); return h; } template <> EIGEN_STRONG_INLINE Eigen::half predux_min(const Packet4hf& a) { Packet4hf tmp; tmp = vpmin_f16(a, a); tmp = vpmin_f16(tmp, tmp); Eigen::half h; h.x = vget_lane_f16(tmp, 0); return h; } template <> EIGEN_STRONG_INLINE Eigen::half predux_max(const Packet8hf& a) { float16x4_t a_lo, a_hi, max; a_lo = vget_low_f16(a); a_hi = vget_high_f16(a); max = vpmax_f16(a_lo, a_hi); max = vpmax_f16(max, max); max = vpmax_f16(max, max); Eigen::half h; h.x = vget_lane_f16(max, 0); return h; } template <> EIGEN_STRONG_INLINE Eigen::half predux_max(const Packet4hf& a) { Packet4hf tmp; tmp = vpmax_f16(a, a); tmp = vpmax_f16(tmp, tmp); Eigen::half h; h.x = vget_lane_f16(tmp, 0); return h; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { const float16x8x2_t zip16_1 = vzipq_f16(kernel.packet[0], kernel.packet[1]); const float16x8x2_t zip16_2 = vzipq_f16(kernel.packet[2], kernel.packet[3]); const float32x4x2_t zip32_1 = vzipq_f32(vreinterpretq_f32_f16(zip16_1.val[0]), vreinterpretq_f32_f16(zip16_2.val[0])); const float32x4x2_t zip32_2 = vzipq_f32(vreinterpretq_f32_f16(zip16_1.val[1]), vreinterpretq_f32_f16(zip16_2.val[1])); kernel.packet[0] = vreinterpretq_f16_f32(zip32_1.val[0]); kernel.packet[1] = vreinterpretq_f16_f32(zip32_1.val[1]); kernel.packet[2] = vreinterpretq_f16_f32(zip32_2.val[0]); kernel.packet[3] = vreinterpretq_f16_f32(zip32_2.val[1]); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { EIGEN_ALIGN16 float16x4x4_t tmp_x4; float16_t* tmp = (float16_t*)&kernel; tmp_x4 = vld4_f16(tmp); kernel.packet[0] = tmp_x4.val[0]; kernel.packet[1] = tmp_x4.val[1]; kernel.packet[2] = tmp_x4.val[2]; kernel.packet[3] = tmp_x4.val[3]; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { float16x8x2_t T_1[4]; T_1[0] = vuzpq_f16(kernel.packet[0], kernel.packet[1]); T_1[1] = vuzpq_f16(kernel.packet[2], kernel.packet[3]); T_1[2] = vuzpq_f16(kernel.packet[4], kernel.packet[5]); T_1[3] = vuzpq_f16(kernel.packet[6], kernel.packet[7]); float16x8x2_t T_2[4]; T_2[0] = vuzpq_f16(T_1[0].val[0], T_1[1].val[0]); T_2[1] = vuzpq_f16(T_1[0].val[1], T_1[1].val[1]); T_2[2] = vuzpq_f16(T_1[2].val[0], T_1[3].val[0]); T_2[3] = vuzpq_f16(T_1[2].val[1], T_1[3].val[1]); float16x8x2_t T_3[4]; T_3[0] = vuzpq_f16(T_2[0].val[0], T_2[2].val[0]); T_3[1] = vuzpq_f16(T_2[0].val[1], T_2[2].val[1]); T_3[2] = vuzpq_f16(T_2[1].val[0], T_2[3].val[0]); T_3[3] = vuzpq_f16(T_2[1].val[1], T_2[3].val[1]); kernel.packet[0] = T_3[0].val[0]; kernel.packet[1] = T_3[2].val[0]; kernel.packet[2] = T_3[1].val[0]; kernel.packet[3] = T_3[3].val[0]; kernel.packet[4] = T_3[0].val[1]; kernel.packet[5] = T_3[2].val[1]; kernel.packet[6] = T_3[1].val[1]; kernel.packet[7] = T_3[3].val[1]; } #endif // end EIGEN_HAS_ARM64_FP16_VECTOR_ARITHMETIC } // end namespace internal } // end namespace Eigen #endif // EIGEN_PACKET_MATH_NEON_H RcppEigen/inst/include/Eigen/src/Core/arch/NEON/TypeCasting.h0000644000176200001440000014412614562417221023327 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2018 Rasmus Munk Larsen // Copyright (C) 2020 Antonio Sanchez // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_TYPE_CASTING_NEON_H #define EIGEN_TYPE_CASTING_NEON_H namespace Eigen { namespace internal { //============================================================================== // pcast, SrcType = float //============================================================================== template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet4f pcast(const Packet4f& a) { return a; } template <> EIGEN_STRONG_INLINE Packet2f pcast(const Packet2f& a) { return a; } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 2 }; }; template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 2 }; }; // If float64 exists, first convert to that to keep as much precision as possible. #if EIGEN_ARCH_ARM64 template <> EIGEN_STRONG_INLINE Packet2l pcast(const Packet4f& a) { // Discard second half of input. return vcvtq_s64_f64(vcvt_f64_f32(vget_low_f32(a))); } template <> EIGEN_STRONG_INLINE Packet2ul pcast(const Packet4f& a) { // Discard second half of input. return vcvtq_u64_f64(vcvt_f64_f32(vget_low_f32(a))); } #else template <> EIGEN_STRONG_INLINE Packet2l pcast(const Packet4f& a) { // Discard second half of input. return vmovl_s32(vget_low_s32(vcvtq_s32_f32(a))); } template <> EIGEN_STRONG_INLINE Packet2ul pcast(const Packet4f& a) { // Discard second half of input. return vmovl_u32(vget_low_u32(vcvtq_u32_f32(a))); } #endif // EIGEN_ARCH_ARM64 template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet4i pcast(const Packet4f& a) { return vcvtq_s32_f32(a); } template <> EIGEN_STRONG_INLINE Packet2i pcast(const Packet2f& a) { return vcvt_s32_f32(a); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet4ui pcast(const Packet4f& a) { return vcvtq_u32_f32(a); } template <> EIGEN_STRONG_INLINE Packet2ui pcast(const Packet2f& a) { return vcvt_u32_f32(a); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 2, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet8s pcast(const Packet4f& a, const Packet4f& b) { return vcombine_s16(vmovn_s32(vcvtq_s32_f32(a)), vmovn_s32(vcvtq_s32_f32(b))); } template <> EIGEN_STRONG_INLINE Packet4s pcast(const Packet2f& a, const Packet2f& b) { return vmovn_s32(vcombine_s32(vcvt_s32_f32(a), vcvt_s32_f32(b))); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 2, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet8us pcast(const Packet4f& a, const Packet4f& b) { return vcombine_u16(vmovn_u32(vcvtq_u32_f32(a)), vmovn_u32(vcvtq_u32_f32(b))); } template <> EIGEN_STRONG_INLINE Packet4us pcast(const Packet2f& a, const Packet2f& b) { return vmovn_u32(vcombine_u32(vcvt_u32_f32(a), vcvt_u32_f32(b))); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 4, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet16c pcast(const Packet4f& a, const Packet4f& b, const Packet4f& c, const Packet4f& d) { const int16x8_t ab_s16 = pcast(a, b); const int16x8_t cd_s16 = pcast(c, d); return vcombine_s8(vmovn_s16(ab_s16), vmovn_s16(cd_s16)); } template <> EIGEN_STRONG_INLINE Packet8c pcast(const Packet2f& a, const Packet2f& b, const Packet2f& c, const Packet2f& d) { const int16x4_t ab_s16 = pcast(a, b); const int16x4_t cd_s16 = pcast(c, d); return vmovn_s16(vcombine_s16(ab_s16, cd_s16)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 4, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet16uc pcast(const Packet4f& a, const Packet4f& b, const Packet4f& c, const Packet4f& d) { const uint16x8_t ab_u16 = pcast(a, b); const uint16x8_t cd_u16 = pcast(c, d); return vcombine_u8(vmovn_u16(ab_u16), vmovn_u16(cd_u16)); } template <> EIGEN_STRONG_INLINE Packet8uc pcast(const Packet2f& a, const Packet2f& b, const Packet2f& c, const Packet2f& d) { const uint16x4_t ab_u16 = pcast(a, b); const uint16x4_t cd_u16 = pcast(c, d); return vmovn_u16(vcombine_u16(ab_u16, cd_u16)); } //============================================================================== // pcast, SrcType = int8_t //============================================================================== template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 4 }; }; template <> EIGEN_STRONG_INLINE Packet4f pcast(const Packet16c& a) { // Discard all but first 4 bytes. return vcvtq_f32_s32(vmovl_s16(vget_low_s16(vmovl_s8(vget_low_s8(a))))); } template <> EIGEN_STRONG_INLINE Packet2f pcast(const Packet8c& a) { // Discard all but first 2 bytes. return vcvt_f32_s32(vget_low_s32(vmovl_s16(vget_low_s16(vmovl_s8(a))))); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 8 }; }; template <> EIGEN_STRONG_INLINE Packet2l pcast(const Packet16c& a) { // Discard all but first two bytes. return vmovl_s32(vget_low_s32(vmovl_s16(vget_low_s16(vmovl_s8(vget_low_s8(a)))))); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 8 }; }; template <> EIGEN_STRONG_INLINE Packet2ul pcast(const Packet16c& a) { return vreinterpretq_u64_s64(pcast(a)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 4 }; }; template <> EIGEN_STRONG_INLINE Packet4i pcast(const Packet16c& a) { // Discard all but first 4 bytes. return vmovl_s16(vget_low_s16(vmovl_s8(vget_low_s8(a)))); } template <> EIGEN_STRONG_INLINE Packet2i pcast(const Packet8c& a) { // Discard all but first 2 bytes. return vget_low_s32(vmovl_s16(vget_low_s16(vmovl_s8(a)))); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 4 }; }; template <> EIGEN_STRONG_INLINE Packet4ui pcast(const Packet16c& a) { return vreinterpretq_u32_s32(pcast(a)); } template <> EIGEN_STRONG_INLINE Packet2ui pcast(const Packet8c& a) { return vreinterpret_u32_s32(pcast(a)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 2 }; }; template <> EIGEN_STRONG_INLINE Packet8s pcast(const Packet16c& a) { // Discard second half of input. return vmovl_s8(vget_low_s8(a)); } template <> EIGEN_STRONG_INLINE Packet4s pcast(const Packet8c& a) { // Discard second half of input. return vget_low_s16(vmovl_s8(a)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 2 }; }; template <> EIGEN_STRONG_INLINE Packet8us pcast(const Packet16c& a) { return vreinterpretq_u16_s16(pcast(a)); } template <> EIGEN_STRONG_INLINE Packet4us pcast(const Packet8c& a) { return vreinterpret_u16_s16(pcast(a)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet16c pcast(const Packet16c& a) { return a; } template <> EIGEN_STRONG_INLINE Packet8c pcast(const Packet8c& a) { return a; } template <> EIGEN_STRONG_INLINE Packet4c pcast(const Packet4c& a) { return a; } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet16uc pcast(const Packet16c& a) { return vreinterpretq_u8_s8(a); } template <> EIGEN_STRONG_INLINE Packet8uc pcast(const Packet8c& a) { return vreinterpret_u8_s8(a); } template <> EIGEN_STRONG_INLINE Packet4uc pcast(const Packet4c& a) { return static_cast(a); } //============================================================================== // pcast, SrcType = uint8_t //============================================================================== template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 4 }; }; template <> EIGEN_STRONG_INLINE Packet4f pcast(const Packet16uc& a) { // Discard all but first 4 bytes. return vcvtq_f32_u32(vmovl_u16(vget_low_u16(vmovl_u8(vget_low_u8(a))))); } template <> EIGEN_STRONG_INLINE Packet2f pcast(const Packet8uc& a) { // Discard all but first 2 bytes. return vcvt_f32_u32(vget_low_u32(vmovl_u16(vget_low_u16(vmovl_u8(a))))); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 8 }; }; template <> EIGEN_STRONG_INLINE Packet2ul pcast(const Packet16uc& a) { // Discard all but first two bytes. return vmovl_u32(vget_low_u32(vmovl_u16(vget_low_u16(vmovl_u8(vget_low_u8(a)))))); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 8 }; }; template <> EIGEN_STRONG_INLINE Packet2l pcast(const Packet16uc& a) { return vreinterpretq_s64_u64(pcast(a)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 4 }; }; template <> EIGEN_STRONG_INLINE Packet4ui pcast(const Packet16uc& a) { // Discard all but first 4 bytes. return vmovl_u16(vget_low_u16(vmovl_u8(vget_low_u8(a)))); } template <> EIGEN_STRONG_INLINE Packet2ui pcast(const Packet8uc& a) { // Discard all but first 2 bytes. return vget_low_u32(vmovl_u16(vget_low_u16(vmovl_u8(a)))); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 4 }; }; template <> EIGEN_STRONG_INLINE Packet4i pcast(const Packet16uc& a) { return vreinterpretq_s32_u32(pcast(a)); } template <> EIGEN_STRONG_INLINE Packet2i pcast(const Packet8uc& a) { return vreinterpret_s32_u32(pcast(a)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 2 }; }; template <> EIGEN_STRONG_INLINE Packet8us pcast(const Packet16uc& a) { // Discard second half of input. return vmovl_u8(vget_low_u8(a)); } template <> EIGEN_STRONG_INLINE Packet4us pcast(const Packet8uc& a) { // Discard second half of input. return vget_low_u16(vmovl_u8(a)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 2 }; }; template <> EIGEN_STRONG_INLINE Packet8s pcast(const Packet16uc& a) { return vreinterpretq_s16_u16(pcast(a)); } template <> EIGEN_STRONG_INLINE Packet4s pcast(const Packet8uc& a) { return vreinterpret_s16_u16(pcast(a)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet16uc pcast(const Packet16uc& a) { return a; } template <> EIGEN_STRONG_INLINE Packet8uc pcast(const Packet8uc& a) { return a; } template <> EIGEN_STRONG_INLINE Packet4uc pcast(const Packet4uc& a) { return a; } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet16c pcast(const Packet16uc& a) { return vreinterpretq_s8_u8(a); } template <> EIGEN_STRONG_INLINE Packet8c pcast(const Packet8uc& a) { return vreinterpret_s8_u8(a); } template <> EIGEN_STRONG_INLINE Packet4c pcast(const Packet4uc& a) { return static_cast(a); } //============================================================================== // pcast, SrcType = int16_t //============================================================================== template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 2 }; }; template <> EIGEN_STRONG_INLINE Packet4f pcast(const Packet8s& a) { // Discard second half of input. return vcvtq_f32_s32(vmovl_s16(vget_low_s16(a))); } template <> EIGEN_STRONG_INLINE Packet2f pcast(const Packet4s& a) { // Discard second half of input. return vcvt_f32_s32(vget_low_s32(vmovl_s16(a))); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 4 }; }; template <> EIGEN_STRONG_INLINE Packet2l pcast(const Packet8s& a) { // Discard all but first two values. return vmovl_s32(vget_low_s32(vmovl_s16(vget_low_s16(a)))); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 4 }; }; template <> EIGEN_STRONG_INLINE Packet2ul pcast(const Packet8s& a) { return vreinterpretq_u64_s64(pcast(a)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 2 }; }; template <> EIGEN_STRONG_INLINE Packet4i pcast(const Packet8s& a) { // Discard second half of input. return vmovl_s16(vget_low_s16(a)); } template <> EIGEN_STRONG_INLINE Packet2i pcast(const Packet4s& a) { // Discard second half of input. return vget_low_s32(vmovl_s16(a)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 2 }; }; template <> EIGEN_STRONG_INLINE Packet4ui pcast(const Packet8s& a) { return vreinterpretq_u32_s32(pcast(a)); } template <> EIGEN_STRONG_INLINE Packet2ui pcast(const Packet4s& a) { return vreinterpret_u32_s32(pcast(a)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet8s pcast(const Packet8s& a) { return a; } template <> EIGEN_STRONG_INLINE Packet4s pcast(const Packet4s& a) { return a; } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet8us pcast(const Packet8s& a) { return vreinterpretq_u16_s16(a); } template <> EIGEN_STRONG_INLINE Packet4us pcast(const Packet4s& a) { return vreinterpret_u16_s16(a); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 2, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet16c pcast(const Packet8s& a, const Packet8s& b) { return vcombine_s8(vmovn_s16(a), vmovn_s16(b)); } template <> EIGEN_STRONG_INLINE Packet8c pcast(const Packet4s& a, const Packet4s& b) { return vmovn_s16(vcombine_s16(a, b)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 2, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet16uc pcast(const Packet8s& a, const Packet8s& b) { return vcombine_u8(vmovn_u16(vreinterpretq_u16_s16(a)), vmovn_u16(vreinterpretq_u16_s16(b))); } template <> EIGEN_STRONG_INLINE Packet8uc pcast(const Packet4s& a, const Packet4s& b) { return vmovn_u16(vcombine_u16(vreinterpret_u16_s16(a), vreinterpret_u16_s16(b))); } //============================================================================== // pcast, SrcType = uint16_t //============================================================================== template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 2 }; }; template <> EIGEN_STRONG_INLINE Packet4f pcast(const Packet8us& a) { // Discard second half of input. return vcvtq_f32_u32(vmovl_u16(vget_low_u16(a))); } template <> EIGEN_STRONG_INLINE Packet2f pcast(const Packet4us& a) { // Discard second half of input. return vcvt_f32_u32(vget_low_u32(vmovl_u16(a))); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 4 }; }; template <> EIGEN_STRONG_INLINE Packet2ul pcast(const Packet8us& a) { // Discard all but first two values. return vmovl_u32(vget_low_u32(vmovl_u16(vget_low_u16(a)))); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 4 }; }; template <> EIGEN_STRONG_INLINE Packet2l pcast(const Packet8us& a) { return vreinterpretq_s64_u64(pcast(a)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 2 }; }; template <> EIGEN_STRONG_INLINE Packet4ui pcast(const Packet8us& a) { // Discard second half of input. return vmovl_u16(vget_low_u16(a)); } template <> EIGEN_STRONG_INLINE Packet2ui pcast(const Packet4us& a) { // Discard second half of input. return vget_low_u32(vmovl_u16(a)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 2 }; }; template <> EIGEN_STRONG_INLINE Packet4i pcast(const Packet8us& a) { return vreinterpretq_s32_u32(pcast(a)); } template <> EIGEN_STRONG_INLINE Packet2i pcast(const Packet4us& a) { return vreinterpret_s32_u32(pcast(a)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet8us pcast(const Packet8us& a) { return a; } template <> EIGEN_STRONG_INLINE Packet4us pcast(const Packet4us& a) { return a; } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet8s pcast(const Packet8us& a) { return vreinterpretq_s16_u16(a); } template <> EIGEN_STRONG_INLINE Packet4s pcast(const Packet4us& a) { return vreinterpret_s16_u16(a); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 2, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet16uc pcast(const Packet8us& a, const Packet8us& b) { return vcombine_u8(vmovn_u16(a), vmovn_u16(b)); } template <> EIGEN_STRONG_INLINE Packet8uc pcast(const Packet4us& a, const Packet4us& b) { return vmovn_u16(vcombine_u16(a, b)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 2, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet16c pcast(const Packet8us& a, const Packet8us& b) { return vreinterpretq_s8_u8(pcast(a, b)); } template <> EIGEN_STRONG_INLINE Packet8c pcast(const Packet4us& a, const Packet4us& b) { return vreinterpret_s8_u8(pcast(a, b)); } //============================================================================== // pcast, SrcType = int32_t //============================================================================== template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet4f pcast(const Packet4i& a) { return vcvtq_f32_s32(a); } template <> EIGEN_STRONG_INLINE Packet2f pcast(const Packet2i& a) { return vcvt_f32_s32(a); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 2 }; }; template <> EIGEN_STRONG_INLINE Packet2l pcast(const Packet4i& a) { // Discard second half of input. return vmovl_s32(vget_low_s32(a)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 2 }; }; template <> EIGEN_STRONG_INLINE Packet2ul pcast(const Packet4i& a) { return vreinterpretq_u64_s64(pcast(a)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet4i pcast(const Packet4i& a) { return a; } template <> EIGEN_STRONG_INLINE Packet2i pcast(const Packet2i& a) { return a; } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet4ui pcast(const Packet4i& a) { return vreinterpretq_u32_s32(a); } template <> EIGEN_STRONG_INLINE Packet2ui pcast(const Packet2i& a) { return vreinterpret_u32_s32(a); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 2, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet8s pcast(const Packet4i& a, const Packet4i& b) { return vcombine_s16(vmovn_s32(a), vmovn_s32(b)); } template <> EIGEN_STRONG_INLINE Packet4s pcast(const Packet2i& a, const Packet2i& b) { return vmovn_s32(vcombine_s32(a, b)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 2, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet8us pcast(const Packet4i& a, const Packet4i& b) { return vcombine_u16(vmovn_u32(vreinterpretq_u32_s32(a)), vmovn_u32(vreinterpretq_u32_s32(b))); } template <> EIGEN_STRONG_INLINE Packet4us pcast(const Packet2i& a, const Packet2i& b) { return vmovn_u32(vreinterpretq_u32_s32(vcombine_s32(a, b))); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 4, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet16c pcast(const Packet4i& a, const Packet4i& b, const Packet4i& c, const Packet4i& d) { const int16x8_t ab_s16 = pcast(a, b); const int16x8_t cd_s16 = pcast(c, d); return vcombine_s8(vmovn_s16(ab_s16), vmovn_s16(cd_s16)); } template <> EIGEN_STRONG_INLINE Packet8c pcast(const Packet2i& a, const Packet2i& b, const Packet2i& c, const Packet2i& d) { const int16x4_t ab_s16 = vmovn_s32(vcombine_s32(a, b)); const int16x4_t cd_s16 = vmovn_s32(vcombine_s32(c, d)); return vmovn_s16(vcombine_s16(ab_s16, cd_s16)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 4, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet16uc pcast(const Packet4i& a, const Packet4i& b, const Packet4i& c, const Packet4i& d) { const uint16x8_t ab_u16 = pcast(a, b); const uint16x8_t cd_u16 = pcast(c, d); return vcombine_u8(vmovn_u16(ab_u16), vmovn_u16(cd_u16)); } template <> EIGEN_STRONG_INLINE Packet8uc pcast(const Packet2i& a, const Packet2i& b, const Packet2i& c, const Packet2i& d) { const uint16x4_t ab_u16 = pcast(a, b); const uint16x4_t cd_u16 = pcast(c, d); return vmovn_u16(vcombine_u16(ab_u16, cd_u16)); } //============================================================================== // pcast, SrcType = uint32_t //============================================================================== template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet4f pcast(const Packet4ui& a) { return vcvtq_f32_u32(a); } template <> EIGEN_STRONG_INLINE Packet2f pcast(const Packet2ui& a) { return vcvt_f32_u32(a); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 2 }; }; template <> EIGEN_STRONG_INLINE Packet2ul pcast(const Packet4ui& a) { // Discard second half of input. return vmovl_u32(vget_low_u32(a)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 2 }; }; template <> EIGEN_STRONG_INLINE Packet2l pcast(const Packet4ui& a) { return vreinterpretq_s64_u64(pcast(a)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet4ui pcast(const Packet4ui& a) { return a; } template <> EIGEN_STRONG_INLINE Packet2ui pcast(const Packet2ui& a) { return a; } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet4i pcast(const Packet4ui& a) { return vreinterpretq_s32_u32(a); } template <> EIGEN_STRONG_INLINE Packet2i pcast(const Packet2ui& a) { return vreinterpret_s32_u32(a); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 2, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet8us pcast(const Packet4ui& a, const Packet4ui& b) { return vcombine_u16(vmovn_u32(a), vmovn_u32(b)); } template <> EIGEN_STRONG_INLINE Packet4us pcast(const Packet2ui& a, const Packet2ui& b) { return vmovn_u32(vcombine_u32(a, b)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 2, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet8s pcast(const Packet4ui& a, const Packet4ui& b) { return vreinterpretq_s16_u16(pcast(a, b)); } template <> EIGEN_STRONG_INLINE Packet4s pcast(const Packet2ui& a, const Packet2ui& b) { return vreinterpret_s16_u16(pcast(a, b)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 4, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet16uc pcast(const Packet4ui& a, const Packet4ui& b, const Packet4ui& c, const Packet4ui& d) { const uint16x8_t ab_u16 = vcombine_u16(vmovn_u32(a), vmovn_u32(b)); const uint16x8_t cd_u16 = vcombine_u16(vmovn_u32(c), vmovn_u32(d)); return vcombine_u8(vmovn_u16(ab_u16), vmovn_u16(cd_u16)); } template <> EIGEN_STRONG_INLINE Packet8uc pcast(const Packet2ui& a, const Packet2ui& b, const Packet2ui& c, const Packet2ui& d) { const uint16x4_t ab_u16 = vmovn_u32(vcombine_u32(a, b)); const uint16x4_t cd_u16 = vmovn_u32(vcombine_u32(c, d)); return vmovn_u16(vcombine_u16(ab_u16, cd_u16)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 4, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet16c pcast(const Packet4ui& a, const Packet4ui& b, const Packet4ui& c, const Packet4ui& d) { return vreinterpretq_s8_u8(pcast(a, b, c, d)); } template <> EIGEN_STRONG_INLINE Packet8c pcast(const Packet2ui& a, const Packet2ui& b, const Packet2ui& c, const Packet2ui& d) { return vreinterpret_s8_u8(pcast(a, b, c, d)); } //============================================================================== // pcast, SrcType = int64_t //============================================================================== template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 2, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet4f pcast(const Packet2l& a, const Packet2l& b) { return vcvtq_f32_s32(vcombine_s32(vmovn_s64(a), vmovn_s64(b))); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet2l pcast(const Packet2l& a) { return a; } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet2ul pcast(const Packet2l& a) { return vreinterpretq_u64_s64(a); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 2, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet4i pcast(const Packet2l& a, const Packet2l& b) { return vcombine_s32(vmovn_s64(a), vmovn_s64(b)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 2, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet4ui pcast(const Packet2l& a, const Packet2l& b) { return vcombine_u32(vmovn_u64(vreinterpretq_u64_s64(a)), vmovn_u64(vreinterpretq_u64_s64(b))); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 4, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet8s pcast(const Packet2l& a, const Packet2l& b, const Packet2l& c, const Packet2l& d) { const int32x4_t ab_s32 = pcast(a, b); const int32x4_t cd_s32 = pcast(c, d); return vcombine_s16(vmovn_s32(ab_s32), vmovn_s32(cd_s32)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 4, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet8us pcast(const Packet2l& a, const Packet2l& b, const Packet2l& c, const Packet2l& d) { const uint32x4_t ab_u32 = pcast(a, b); const uint32x4_t cd_u32 = pcast(c, d); return vcombine_u16(vmovn_u32(ab_u32), vmovn_u32(cd_u32)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 8, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet16c pcast(const Packet2l& a, const Packet2l& b, const Packet2l& c, const Packet2l& d, const Packet2l& e, const Packet2l& f, const Packet2l& g, const Packet2l& h) { const int16x8_t abcd_s16 = pcast(a, b, c, d); const int16x8_t efgh_s16 = pcast(e, f, g, h); return vcombine_s8(vmovn_s16(abcd_s16), vmovn_s16(efgh_s16)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 8, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet16uc pcast(const Packet2l& a, const Packet2l& b, const Packet2l& c, const Packet2l& d, const Packet2l& e, const Packet2l& f, const Packet2l& g, const Packet2l& h) { const uint16x8_t abcd_u16 = pcast(a, b, c, d); const uint16x8_t efgh_u16 = pcast(e, f, g, h); return vcombine_u8(vmovn_u16(abcd_u16), vmovn_u16(efgh_u16)); } //============================================================================== // pcast, SrcType = uint64_t //============================================================================== template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 2, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet4f pcast(const Packet2ul& a, const Packet2ul& b) { return vcvtq_f32_u32(vcombine_u32(vmovn_u64(a), vmovn_u64(b))); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet2ul pcast(const Packet2ul& a) { return a; } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet2l pcast(const Packet2ul& a) { return vreinterpretq_s64_u64(a); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 2, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet4ui pcast(const Packet2ul& a, const Packet2ul& b) { return vcombine_u32(vmovn_u64(a), vmovn_u64(b)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 2, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet4i pcast(const Packet2ul& a, const Packet2ul& b) { return vreinterpretq_s32_u32(pcast(a, b)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 4, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet8us pcast(const Packet2ul& a, const Packet2ul& b, const Packet2ul& c, const Packet2ul& d) { const uint16x4_t ab_u16 = vmovn_u32(vcombine_u32(vmovn_u64(a), vmovn_u64(b))); const uint16x4_t cd_u16 = vmovn_u32(vcombine_u32(vmovn_u64(c), vmovn_u64(d))); return vcombine_u16(ab_u16, cd_u16); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 4, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet8s pcast(const Packet2ul& a, const Packet2ul& b, const Packet2ul& c, const Packet2ul& d) { return vreinterpretq_s16_u16(pcast(a, b, c, d)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 8, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet16uc pcast(const Packet2ul& a, const Packet2ul& b, const Packet2ul& c, const Packet2ul& d, const Packet2ul& e, const Packet2ul& f, const Packet2ul& g, const Packet2ul& h) { const uint16x8_t abcd_u16 = pcast(a, b, c, d); const uint16x8_t efgh_u16 = pcast(e, f, g, h); return vcombine_u8(vmovn_u16(abcd_u16), vmovn_u16(efgh_u16)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 8, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet16c pcast(const Packet2ul& a, const Packet2ul& b, const Packet2ul& c, const Packet2ul& d, const Packet2ul& e, const Packet2ul& f, const Packet2ul& g, const Packet2ul& h) { return vreinterpretq_s8_u8(pcast(a, b, c, d, e, f, g, h)); } //============================================================================== // preinterpret //============================================================================== template <> EIGEN_STRONG_INLINE Packet2f preinterpret(const Packet2i& a) { return vreinterpret_f32_s32(a); } template <> EIGEN_STRONG_INLINE Packet2f preinterpret(const Packet2ui& a) { return vreinterpret_f32_u32(a); } template <> EIGEN_STRONG_INLINE Packet4f preinterpret(const Packet4i& a) { return vreinterpretq_f32_s32(a); } template <> EIGEN_STRONG_INLINE Packet4f preinterpret(const Packet4ui& a) { return vreinterpretq_f32_u32(a); } template <> EIGEN_STRONG_INLINE Packet4c preinterpret(const Packet4uc& a) { return static_cast(a); } template <> EIGEN_STRONG_INLINE Packet8c preinterpret(const Packet8uc& a) { return vreinterpret_s8_u8(a); } template <> EIGEN_STRONG_INLINE Packet16c preinterpret(const Packet16uc& a) { return vreinterpretq_s8_u8(a); } template <> EIGEN_STRONG_INLINE Packet4uc preinterpret(const Packet4c& a) { return static_cast(a); } template <> EIGEN_STRONG_INLINE Packet8uc preinterpret(const Packet8c& a) { return vreinterpret_u8_s8(a); } template <> EIGEN_STRONG_INLINE Packet16uc preinterpret(const Packet16c& a) { return vreinterpretq_u8_s8(a); } template <> EIGEN_STRONG_INLINE Packet4s preinterpret(const Packet4us& a) { return vreinterpret_s16_u16(a); } template <> EIGEN_STRONG_INLINE Packet8s preinterpret(const Packet8us& a) { return vreinterpretq_s16_u16(a); } template <> EIGEN_STRONG_INLINE Packet4us preinterpret(const Packet4s& a) { return vreinterpret_u16_s16(a); } template <> EIGEN_STRONG_INLINE Packet8us preinterpret(const Packet8s& a) { return vreinterpretq_u16_s16(a); } template <> EIGEN_STRONG_INLINE Packet2i preinterpret(const Packet2f& a) { return vreinterpret_s32_f32(a); } template <> EIGEN_STRONG_INLINE Packet2i preinterpret(const Packet2ui& a) { return vreinterpret_s32_u32(a); } template <> EIGEN_STRONG_INLINE Packet4i preinterpret(const Packet4f& a) { return vreinterpretq_s32_f32(a); } template <> EIGEN_STRONG_INLINE Packet4i preinterpret(const Packet4ui& a) { return vreinterpretq_s32_u32(a); } template <> EIGEN_STRONG_INLINE Packet2ui preinterpret(const Packet2f& a) { return vreinterpret_u32_f32(a); } template <> EIGEN_STRONG_INLINE Packet2ui preinterpret(const Packet2i& a) { return vreinterpret_u32_s32(a); } template <> EIGEN_STRONG_INLINE Packet4ui preinterpret(const Packet4f& a) { return vreinterpretq_u32_f32(a); } template <> EIGEN_STRONG_INLINE Packet4ui preinterpret(const Packet4i& a) { return vreinterpretq_u32_s32(a); } template <> EIGEN_STRONG_INLINE Packet2l preinterpret(const Packet2ul& a) { return vreinterpretq_s64_u64(a); } template <> EIGEN_STRONG_INLINE Packet2ul preinterpret(const Packet2l& a) { return vreinterpretq_u64_s64(a); } #if EIGEN_ARCH_ARM64 //============================================================================== // pcast/preinterpret, Double //============================================================================== template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet2d pcast(const Packet2d& a) { return a; } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 2, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet4f pcast(const Packet2d& a, const Packet2d& b) { return vcombine_f32(vcvt_f32_f64(a), vcvt_f32_f64(b)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet2l pcast(const Packet2d& a) { return vcvtq_s64_f64(a); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet2ul pcast(const Packet2d& a) { return vcvtq_u64_f64(a); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 2, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet4i pcast(const Packet2d& a, const Packet2d& b) { return vcombine_s32(vmovn_s64(vcvtq_s64_f64(a)), vmovn_s64(vcvtq_s64_f64(b))); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 2, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet4ui pcast(const Packet2d& a, const Packet2d& b) { return vcombine_u32(vmovn_u64(vcvtq_u64_f64(a)), vmovn_u64(vcvtq_u64_f64(b))); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 4, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet8s pcast(const Packet2d& a, const Packet2d& b, const Packet2d& c, const Packet2d& d) { const int32x4_t ab_s32 = pcast(a, b); const int32x4_t cd_s32 = pcast(c, d); return vcombine_s16(vmovn_s32(ab_s32), vmovn_s32(cd_s32)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 4, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet8us pcast(const Packet2d& a, const Packet2d& b, const Packet2d& c, const Packet2d& d) { const uint32x4_t ab_u32 = pcast(a, b); const uint32x4_t cd_u32 = pcast(c, d); return vcombine_u16(vmovn_u32(ab_u32), vmovn_u32(cd_u32)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 8, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet16c pcast(const Packet2d& a, const Packet2d& b, const Packet2d& c, const Packet2d& d, const Packet2d& e, const Packet2d& f, const Packet2d& g, const Packet2d& h) { const int16x8_t abcd_s16 = pcast(a, b, c, d); const int16x8_t efgh_s16 = pcast(e, f, g, h); return vcombine_s8(vmovn_s16(abcd_s16), vmovn_s16(efgh_s16)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 8, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet16uc pcast(const Packet2d& a, const Packet2d& b, const Packet2d& c, const Packet2d& d, const Packet2d& e, const Packet2d& f, const Packet2d& g, const Packet2d& h) { const uint16x8_t abcd_u16 = pcast(a, b, c, d); const uint16x8_t efgh_u16 = pcast(e, f, g, h); return vcombine_u8(vmovn_u16(abcd_u16), vmovn_u16(efgh_u16)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 2 }; }; template <> EIGEN_STRONG_INLINE Packet2d pcast(const Packet4f& a) { // Discard second-half of input. return vcvt_f64_f32(vget_low_f32(a)); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 8 }; }; template <> EIGEN_STRONG_INLINE Packet2d pcast(const Packet16c& a) { // Discard all but first two values. return vcvt_f64_f32(pcast(vget_low_s8(a))); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 8 }; }; template <> EIGEN_STRONG_INLINE Packet2d pcast(const Packet16uc& a) { // Discard all but first two values. return vcvt_f64_f32(pcast(vget_low_u8(a))); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 4 }; }; template <> EIGEN_STRONG_INLINE Packet2d pcast(const Packet8s& a) { // Discard all but first two values. return vcvt_f64_f32(pcast(vget_low_s16(a))); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 4 }; }; template <> EIGEN_STRONG_INLINE Packet2d pcast(const Packet8us& a) { // Discard all but first two values. return vcvt_f64_f32(pcast(vget_low_u16(a))); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 2 }; }; template <> EIGEN_STRONG_INLINE Packet2d pcast(const Packet4i& a) { // Discard second half of input. return vcvtq_f64_s64(vmovl_s32(vget_low_s32(a))); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 2 }; }; template <> EIGEN_STRONG_INLINE Packet2d pcast(const Packet4ui& a) { // Discard second half of input. return vcvtq_f64_u64(vmovl_u32(vget_low_u32(a))); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet2d pcast(const Packet2l& a) { return vcvtq_f64_s64(a); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template <> EIGEN_STRONG_INLINE Packet2d pcast(const Packet2ul& a) { return vcvtq_f64_u64(a); } template <> EIGEN_STRONG_INLINE Packet2d preinterpret(const Packet2l& a) { return vreinterpretq_f64_s64(a); } template <> EIGEN_STRONG_INLINE Packet2d preinterpret(const Packet2ul& a) { return vreinterpretq_f64_u64(a); } template <> EIGEN_STRONG_INLINE Packet2l preinterpret(const Packet2d& a) { return vreinterpretq_s64_f64(a); } template <> EIGEN_STRONG_INLINE Packet2ul preinterpret(const Packet2d& a) { return vreinterpretq_u64_f64(a); } template <> EIGEN_STRONG_INLINE Packet2d preinterpret(const Packet4i& a) { return vreinterpretq_f64_s32(a); } template <> EIGEN_STRONG_INLINE Packet4i preinterpret(const Packet2d& a) { return vreinterpretq_s32_f64(a); } #endif // EIGEN_ARCH_ARM64 } // end namespace internal } // end namespace Eigen #endif // EIGEN_TYPE_CASTING_NEON_H RcppEigen/inst/include/Eigen/src/Core/arch/NEON/GeneralBlockPanelKernel.h0000644000176200001440000001523714562417221025546 0ustar liggesusersnamespace Eigen { namespace internal { #if EIGEN_ARCH_ARM && EIGEN_COMP_CLANG // Clang seems to excessively spill registers in the GEBP kernel on 32-bit arm. // Here we specialize gebp_traits to eliminate these register spills. // See #2138. template<> struct gebp_traits : gebp_traits { EIGEN_STRONG_INLINE void acc(const AccPacket& c, const ResPacket& alpha, ResPacket& r) const { // This volatile inline ASM both acts as a barrier to prevent reordering, // as well as enforces strict register use. asm volatile( "vmla.f32 %q[r], %q[c], %q[alpha]" : [r] "+w" (r) : [c] "w" (c), [alpha] "w" (alpha) : ); } template EIGEN_STRONG_INLINE void madd(const Packet4f& a, const Packet4f& b, Packet4f& c, Packet4f& tmp, const LaneIdType&) const { acc(a, b, c); } template EIGEN_STRONG_INLINE void madd(const Packet4f& a, const QuadPacket& b, Packet4f& c, Packet4f& tmp, const LaneIdType& lane) const { madd(a, b.get(lane), c, tmp, lane); } }; #endif // EIGEN_ARCH_ARM && EIGEN_COMP_CLANG #if EIGEN_ARCH_ARM64 template<> struct gebp_traits : gebp_traits { typedef float RhsPacket; typedef float32x4_t RhsPacketx4; EIGEN_STRONG_INLINE void loadRhs(const RhsScalar* b, RhsPacket& dest) const { dest = *b; } EIGEN_STRONG_INLINE void loadRhs(const RhsScalar* b, RhsPacketx4& dest) const { dest = vld1q_f32(b); } EIGEN_STRONG_INLINE void updateRhs(const RhsScalar* b, RhsPacket& dest) const { dest = *b; } EIGEN_STRONG_INLINE void updateRhs(const RhsScalar*, RhsPacketx4&) const {} EIGEN_STRONG_INLINE void loadRhsQuad(const RhsScalar* b, RhsPacket& dest) const { loadRhs(b,dest); } EIGEN_STRONG_INLINE void madd(const LhsPacket& a, const RhsPacket& b, AccPacket& c, RhsPacket& /*tmp*/, const FixedInt<0>&) const { c = vfmaq_n_f32(c, a, b); } // NOTE: Template parameter inference failed when compiled with Android NDK: // "candidate template ignored: could not match 'FixedInt' against 'Eigen::internal::FixedInt<0>". EIGEN_STRONG_INLINE void madd(const LhsPacket& a, const RhsPacketx4& b, AccPacket& c, RhsPacket& /*tmp*/, const FixedInt<0>&) const { madd_helper<0>(a, b, c); } EIGEN_STRONG_INLINE void madd(const LhsPacket& a, const RhsPacketx4& b, AccPacket& c, RhsPacket& /*tmp*/, const FixedInt<1>&) const { madd_helper<1>(a, b, c); } EIGEN_STRONG_INLINE void madd(const LhsPacket& a, const RhsPacketx4& b, AccPacket& c, RhsPacket& /*tmp*/, const FixedInt<2>&) const { madd_helper<2>(a, b, c); } EIGEN_STRONG_INLINE void madd(const LhsPacket& a, const RhsPacketx4& b, AccPacket& c, RhsPacket& /*tmp*/, const FixedInt<3>&) const { madd_helper<3>(a, b, c); } private: template EIGEN_STRONG_INLINE void madd_helper(const LhsPacket& a, const RhsPacketx4& b, AccPacket& c) const { #if EIGEN_COMP_GNUC_STRICT && !(EIGEN_GNUC_AT_LEAST(9,0)) // workaround gcc issue https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89101 // vfmaq_laneq_f32 is implemented through a costly dup if(LaneID==0) asm("fmla %0.4s, %1.4s, %2.s[0]\n" : "+w" (c) : "w" (a), "w" (b) : ); else if(LaneID==1) asm("fmla %0.4s, %1.4s, %2.s[1]\n" : "+w" (c) : "w" (a), "w" (b) : ); else if(LaneID==2) asm("fmla %0.4s, %1.4s, %2.s[2]\n" : "+w" (c) : "w" (a), "w" (b) : ); else if(LaneID==3) asm("fmla %0.4s, %1.4s, %2.s[3]\n" : "+w" (c) : "w" (a), "w" (b) : ); #else c = vfmaq_laneq_f32(c, a, b, LaneID); #endif } }; template<> struct gebp_traits : gebp_traits { typedef double RhsPacket; struct RhsPacketx4 { float64x2_t B_0, B_1; }; EIGEN_STRONG_INLINE void loadRhs(const RhsScalar* b, RhsPacket& dest) const { dest = *b; } EIGEN_STRONG_INLINE void loadRhs(const RhsScalar* b, RhsPacketx4& dest) const { dest.B_0 = vld1q_f64(b); dest.B_1 = vld1q_f64(b+2); } EIGEN_STRONG_INLINE void updateRhs(const RhsScalar* b, RhsPacket& dest) const { loadRhs(b,dest); } EIGEN_STRONG_INLINE void updateRhs(const RhsScalar*, RhsPacketx4&) const {} EIGEN_STRONG_INLINE void loadRhsQuad(const RhsScalar* b, RhsPacket& dest) const { loadRhs(b,dest); } EIGEN_STRONG_INLINE void madd(const LhsPacket& a, const RhsPacket& b, AccPacket& c, RhsPacket& /*tmp*/, const FixedInt<0>&) const { c = vfmaq_n_f64(c, a, b); } // NOTE: Template parameter inference failed when compiled with Android NDK: // "candidate template ignored: could not match 'FixedInt' against 'Eigen::internal::FixedInt<0>". EIGEN_STRONG_INLINE void madd(const LhsPacket& a, const RhsPacketx4& b, AccPacket& c, RhsPacket& /*tmp*/, const FixedInt<0>&) const { madd_helper<0>(a, b, c); } EIGEN_STRONG_INLINE void madd(const LhsPacket& a, const RhsPacketx4& b, AccPacket& c, RhsPacket& /*tmp*/, const FixedInt<1>&) const { madd_helper<1>(a, b, c); } EIGEN_STRONG_INLINE void madd(const LhsPacket& a, const RhsPacketx4& b, AccPacket& c, RhsPacket& /*tmp*/, const FixedInt<2>&) const { madd_helper<2>(a, b, c); } EIGEN_STRONG_INLINE void madd(const LhsPacket& a, const RhsPacketx4& b, AccPacket& c, RhsPacket& /*tmp*/, const FixedInt<3>&) const { madd_helper<3>(a, b, c); } private: template EIGEN_STRONG_INLINE void madd_helper(const LhsPacket& a, const RhsPacketx4& b, AccPacket& c) const { #if EIGEN_COMP_GNUC_STRICT && !(EIGEN_GNUC_AT_LEAST(9,0)) // workaround gcc issue https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89101 // vfmaq_laneq_f64 is implemented through a costly dup if(LaneID==0) asm("fmla %0.2d, %1.2d, %2.d[0]\n" : "+w" (c) : "w" (a), "w" (b.B_0) : ); else if(LaneID==1) asm("fmla %0.2d, %1.2d, %2.d[1]\n" : "+w" (c) : "w" (a), "w" (b.B_0) : ); else if(LaneID==2) asm("fmla %0.2d, %1.2d, %2.d[0]\n" : "+w" (c) : "w" (a), "w" (b.B_1) : ); else if(LaneID==3) asm("fmla %0.2d, %1.2d, %2.d[1]\n" : "+w" (c) : "w" (a), "w" (b.B_1) : ); #else if(LaneID==0) c = vfmaq_laneq_f64(c, a, b.B_0, 0); else if(LaneID==1) c = vfmaq_laneq_f64(c, a, b.B_0, 1); else if(LaneID==2) c = vfmaq_laneq_f64(c, a, b.B_1, 0); else if(LaneID==3) c = vfmaq_laneq_f64(c, a, b.B_1, 1); #endif } }; #endif // EIGEN_ARCH_ARM64 } // namespace internal } // namespace Eigen RcppEigen/inst/include/Eigen/src/Core/arch/NEON/Complex.h0000644000176200001440000005374714562417221022514 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2010 Gael Guennebaud // Copyright (C) 2010 Konstantinos Margaritis // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_COMPLEX_NEON_H #define EIGEN_COMPLEX_NEON_H namespace Eigen { namespace internal { inline uint32x4_t p4ui_CONJ_XOR() { // See bug 1325, clang fails to call vld1q_u64. #if EIGEN_COMP_CLANG || EIGEN_COMP_CASTXML uint32x4_t ret = { 0x00000000, 0x80000000, 0x00000000, 0x80000000 }; return ret; #else static const uint32_t conj_XOR_DATA[] = { 0x00000000, 0x80000000, 0x00000000, 0x80000000 }; return vld1q_u32( conj_XOR_DATA ); #endif } inline uint32x2_t p2ui_CONJ_XOR() { static const uint32_t conj_XOR_DATA[] = { 0x00000000, 0x80000000 }; return vld1_u32( conj_XOR_DATA ); } //---------- float ---------- struct Packet1cf { EIGEN_STRONG_INLINE Packet1cf() {} EIGEN_STRONG_INLINE explicit Packet1cf(const Packet2f& a) : v(a) {} Packet2f v; }; struct Packet2cf { EIGEN_STRONG_INLINE Packet2cf() {} EIGEN_STRONG_INLINE explicit Packet2cf(const Packet4f& a) : v(a) {} Packet4f v; }; template<> struct packet_traits > : default_packet_traits { typedef Packet2cf type; typedef Packet1cf half; enum { Vectorizable = 1, AlignedOnScalar = 1, size = 2, HasHalfPacket = 1, HasAdd = 1, HasSub = 1, HasMul = 1, HasDiv = 1, HasNegate = 1, HasAbs = 0, HasAbs2 = 0, HasMin = 0, HasMax = 0, HasSetLinear = 0 }; }; template<> struct unpacket_traits { typedef std::complex type; typedef Packet1cf half; typedef Packet2f as_real; enum { size = 1, alignment = Aligned16, vectorizable = true, masked_load_available = false, masked_store_available = false }; }; template<> struct unpacket_traits { typedef std::complex type; typedef Packet1cf half; typedef Packet4f as_real; enum { size = 2, alignment = Aligned16, vectorizable = true, masked_load_available = false, masked_store_available = false }; }; template<> EIGEN_STRONG_INLINE Packet1cf pcast(const float& a) { return Packet1cf(vset_lane_f32(a, vdup_n_f32(0.f), 0)); } template<> EIGEN_STRONG_INLINE Packet2cf pcast(const Packet2f& a) { return Packet2cf(vreinterpretq_f32_u64(vmovl_u32(vreinterpret_u32_f32(a)))); } template<> EIGEN_STRONG_INLINE Packet1cf pset1(const std::complex& from) { return Packet1cf(vld1_f32(reinterpret_cast(&from))); } template<> EIGEN_STRONG_INLINE Packet2cf pset1(const std::complex& from) { const float32x2_t r64 = vld1_f32(reinterpret_cast(&from)); return Packet2cf(vcombine_f32(r64, r64)); } template<> EIGEN_STRONG_INLINE Packet1cf padd(const Packet1cf& a, const Packet1cf& b) { return Packet1cf(padd(a.v, b.v)); } template<> EIGEN_STRONG_INLINE Packet2cf padd(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(padd(a.v, b.v)); } template<> EIGEN_STRONG_INLINE Packet1cf psub(const Packet1cf& a, const Packet1cf& b) { return Packet1cf(psub(a.v, b.v)); } template<> EIGEN_STRONG_INLINE Packet2cf psub(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(psub(a.v, b.v)); } template<> EIGEN_STRONG_INLINE Packet1cf pnegate(const Packet1cf& a) { return Packet1cf(pnegate(a.v)); } template<> EIGEN_STRONG_INLINE Packet2cf pnegate(const Packet2cf& a) { return Packet2cf(pnegate(a.v)); } template<> EIGEN_STRONG_INLINE Packet1cf pconj(const Packet1cf& a) { const Packet2ui b = vreinterpret_u32_f32(a.v); return Packet1cf(vreinterpret_f32_u32(veor_u32(b, p2ui_CONJ_XOR()))); } template<> EIGEN_STRONG_INLINE Packet2cf pconj(const Packet2cf& a) { const Packet4ui b = vreinterpretq_u32_f32(a.v); return Packet2cf(vreinterpretq_f32_u32(veorq_u32(b, p4ui_CONJ_XOR()))); } template<> EIGEN_STRONG_INLINE Packet1cf pmul(const Packet1cf& a, const Packet1cf& b) { Packet2f v1, v2; // Get the real values of a | a1_re | a1_re | v1 = vdup_lane_f32(a.v, 0); // Get the imag values of a | a1_im | a1_im | v2 = vdup_lane_f32(a.v, 1); // Multiply the real a with b v1 = vmul_f32(v1, b.v); // Multiply the imag a with b v2 = vmul_f32(v2, b.v); // Conjugate v2 v2 = vreinterpret_f32_u32(veor_u32(vreinterpret_u32_f32(v2), p2ui_CONJ_XOR())); // Swap real/imag elements in v2. v2 = vrev64_f32(v2); // Add and return the result return Packet1cf(vadd_f32(v1, v2)); } template<> EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b) { Packet4f v1, v2; // Get the real values of a | a1_re | a1_re | a2_re | a2_re | v1 = vcombine_f32(vdup_lane_f32(vget_low_f32(a.v), 0), vdup_lane_f32(vget_high_f32(a.v), 0)); // Get the imag values of a | a1_im | a1_im | a2_im | a2_im | v2 = vcombine_f32(vdup_lane_f32(vget_low_f32(a.v), 1), vdup_lane_f32(vget_high_f32(a.v), 1)); // Multiply the real a with b v1 = vmulq_f32(v1, b.v); // Multiply the imag a with b v2 = vmulq_f32(v2, b.v); // Conjugate v2 v2 = vreinterpretq_f32_u32(veorq_u32(vreinterpretq_u32_f32(v2), p4ui_CONJ_XOR())); // Swap real/imag elements in v2. v2 = vrev64q_f32(v2); // Add and return the result return Packet2cf(vaddq_f32(v1, v2)); } template<> EIGEN_STRONG_INLINE Packet1cf pcmp_eq(const Packet1cf& a, const Packet1cf& b) { // Compare real and imaginary parts of a and b to get the mask vector: // [re(a[0])==re(b[0]), im(a[0])==im(b[0])] Packet2f eq = pcmp_eq(a.v, b.v); // Swap real/imag elements in the mask in to get: // [im(a[0])==im(b[0]), re(a[0])==re(b[0])] Packet2f eq_swapped = vrev64_f32(eq); // Return re(a)==re(b) && im(a)==im(b) by computing bitwise AND of eq and eq_swapped return Packet1cf(pand(eq, eq_swapped)); } template<> EIGEN_STRONG_INLINE Packet2cf pcmp_eq(const Packet2cf& a, const Packet2cf& b) { // Compare real and imaginary parts of a and b to get the mask vector: // [re(a[0])==re(b[0]), im(a[0])==im(b[0]), re(a[1])==re(b[1]), im(a[1])==im(b[1])] Packet4f eq = pcmp_eq(a.v, b.v); // Swap real/imag elements in the mask in to get: // [im(a[0])==im(b[0]), re(a[0])==re(b[0]), im(a[1])==im(b[1]), re(a[1])==re(b[1])] Packet4f eq_swapped = vrev64q_f32(eq); // Return re(a)==re(b) && im(a)==im(b) by computing bitwise AND of eq and eq_swapped return Packet2cf(pand(eq, eq_swapped)); } template<> EIGEN_STRONG_INLINE Packet1cf pand(const Packet1cf& a, const Packet1cf& b) { return Packet1cf(vreinterpret_f32_u32(vand_u32(vreinterpret_u32_f32(a.v), vreinterpret_u32_f32(b.v)))); } template<> EIGEN_STRONG_INLINE Packet2cf pand(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(vreinterpretq_f32_u32(vandq_u32(vreinterpretq_u32_f32(a.v), vreinterpretq_u32_f32(b.v)))); } template<> EIGEN_STRONG_INLINE Packet1cf por(const Packet1cf& a, const Packet1cf& b) { return Packet1cf(vreinterpret_f32_u32(vorr_u32(vreinterpret_u32_f32(a.v), vreinterpret_u32_f32(b.v)))); } template<> EIGEN_STRONG_INLINE Packet2cf por(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(vreinterpretq_f32_u32(vorrq_u32(vreinterpretq_u32_f32(a.v), vreinterpretq_u32_f32(b.v)))); } template<> EIGEN_STRONG_INLINE Packet1cf pxor(const Packet1cf& a, const Packet1cf& b) { return Packet1cf(vreinterpret_f32_u32(veor_u32(vreinterpret_u32_f32(a.v), vreinterpret_u32_f32(b.v)))); } template<> EIGEN_STRONG_INLINE Packet2cf pxor(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(vreinterpretq_f32_u32(veorq_u32(vreinterpretq_u32_f32(a.v), vreinterpretq_u32_f32(b.v)))); } template<> EIGEN_STRONG_INLINE Packet1cf pandnot(const Packet1cf& a, const Packet1cf& b) { return Packet1cf(vreinterpret_f32_u32(vbic_u32(vreinterpret_u32_f32(a.v), vreinterpret_u32_f32(b.v)))); } template<> EIGEN_STRONG_INLINE Packet2cf pandnot(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(vreinterpretq_f32_u32(vbicq_u32(vreinterpretq_u32_f32(a.v), vreinterpretq_u32_f32(b.v)))); } template<> EIGEN_STRONG_INLINE Packet1cf pload(const std::complex* from) { EIGEN_DEBUG_ALIGNED_LOAD return Packet1cf(pload((const float*)from)); } template<> EIGEN_STRONG_INLINE Packet2cf pload(const std::complex* from) { EIGEN_DEBUG_ALIGNED_LOAD return Packet2cf(pload(reinterpret_cast(from))); } template<> EIGEN_STRONG_INLINE Packet1cf ploadu(const std::complex* from) { EIGEN_DEBUG_UNALIGNED_LOAD return Packet1cf(ploadu((const float*)from)); } template<> EIGEN_STRONG_INLINE Packet2cf ploadu(const std::complex* from) { EIGEN_DEBUG_UNALIGNED_LOAD return Packet2cf(ploadu(reinterpret_cast(from))); } template<> EIGEN_STRONG_INLINE Packet1cf ploaddup(const std::complex* from) { return pset1(*from); } template<> EIGEN_STRONG_INLINE Packet2cf ploaddup(const std::complex* from) { return pset1(*from); } template<> EIGEN_STRONG_INLINE void pstore >(std::complex *to, const Packet1cf& from) { EIGEN_DEBUG_ALIGNED_STORE pstore((float*)to, from.v); } template<> EIGEN_STRONG_INLINE void pstore >(std::complex *to, const Packet2cf& from) { EIGEN_DEBUG_ALIGNED_STORE pstore(reinterpret_cast(to), from.v); } template<> EIGEN_STRONG_INLINE void pstoreu >(std::complex *to, const Packet1cf& from) { EIGEN_DEBUG_UNALIGNED_STORE pstoreu((float*)to, from.v); } template<> EIGEN_STRONG_INLINE void pstoreu >(std::complex *to, const Packet2cf& from) { EIGEN_DEBUG_UNALIGNED_STORE pstoreu(reinterpret_cast(to), from.v); } template<> EIGEN_DEVICE_FUNC inline Packet1cf pgather, Packet1cf>( const std::complex* from, Index stride) { const Packet2f tmp = vdup_n_f32(std::real(from[0*stride])); return Packet1cf(vset_lane_f32(std::imag(from[0*stride]), tmp, 1)); } template<> EIGEN_DEVICE_FUNC inline Packet2cf pgather, Packet2cf>( const std::complex* from, Index stride) { Packet4f res = vdupq_n_f32(std::real(from[0*stride])); res = vsetq_lane_f32(std::imag(from[0*stride]), res, 1); res = vsetq_lane_f32(std::real(from[1*stride]), res, 2); res = vsetq_lane_f32(std::imag(from[1*stride]), res, 3); return Packet2cf(res); } template<> EIGEN_DEVICE_FUNC inline void pscatter, Packet1cf>( std::complex* to, const Packet1cf& from, Index stride) { to[stride*0] = std::complex(vget_lane_f32(from.v, 0), vget_lane_f32(from.v, 1)); } template<> EIGEN_DEVICE_FUNC inline void pscatter, Packet2cf>( std::complex* to, const Packet2cf& from, Index stride) { to[stride*0] = std::complex(vgetq_lane_f32(from.v, 0), vgetq_lane_f32(from.v, 1)); to[stride*1] = std::complex(vgetq_lane_f32(from.v, 2), vgetq_lane_f32(from.v, 3)); } template<> EIGEN_STRONG_INLINE void prefetch >(const std::complex *addr) { EIGEN_ARM_PREFETCH(reinterpret_cast(addr)); } template<> EIGEN_STRONG_INLINE std::complex pfirst(const Packet1cf& a) { EIGEN_ALIGN16 std::complex x; vst1_f32(reinterpret_cast(&x), a.v); return x; } template<> EIGEN_STRONG_INLINE std::complex pfirst(const Packet2cf& a) { EIGEN_ALIGN16 std::complex x[2]; vst1q_f32(reinterpret_cast(x), a.v); return x[0]; } template<> EIGEN_STRONG_INLINE Packet1cf preverse(const Packet1cf& a) { return a; } template<> EIGEN_STRONG_INLINE Packet2cf preverse(const Packet2cf& a) { return Packet2cf(vcombine_f32(vget_high_f32(a.v), vget_low_f32(a.v))); } template<> EIGEN_STRONG_INLINE Packet1cf pcplxflip(const Packet1cf& a) { return Packet1cf(vrev64_f32(a.v)); } template<> EIGEN_STRONG_INLINE Packet2cf pcplxflip(const Packet2cf& a) { return Packet2cf(vrev64q_f32(a.v)); } template<> EIGEN_STRONG_INLINE std::complex predux(const Packet1cf& a) { std::complex s; vst1_f32((float *)&s, a.v); return s; } template<> EIGEN_STRONG_INLINE std::complex predux(const Packet2cf& a) { std::complex s; vst1_f32(reinterpret_cast(&s), vadd_f32(vget_low_f32(a.v), vget_high_f32(a.v))); return s; } template<> EIGEN_STRONG_INLINE std::complex predux_mul(const Packet1cf& a) { std::complex s; vst1_f32((float *)&s, a.v); return s; } template<> EIGEN_STRONG_INLINE std::complex predux_mul(const Packet2cf& a) { float32x2_t a1, a2, v1, v2, prod; std::complex s; a1 = vget_low_f32(a.v); a2 = vget_high_f32(a.v); // Get the real values of a | a1_re | a1_re | a2_re | a2_re | v1 = vdup_lane_f32(a1, 0); // Get the real values of a | a1_im | a1_im | a2_im | a2_im | v2 = vdup_lane_f32(a1, 1); // Multiply the real a with b v1 = vmul_f32(v1, a2); // Multiply the imag a with b v2 = vmul_f32(v2, a2); // Conjugate v2 v2 = vreinterpret_f32_u32(veor_u32(vreinterpret_u32_f32(v2), p2ui_CONJ_XOR())); // Swap real/imag elements in v2. v2 = vrev64_f32(v2); // Add v1, v2 prod = vadd_f32(v1, v2); vst1_f32(reinterpret_cast(&s), prod); return s; } EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet1cf,Packet2f) EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet2cf,Packet4f) template<> EIGEN_STRONG_INLINE Packet1cf pdiv(const Packet1cf& a, const Packet1cf& b) { // TODO optimize it for NEON Packet1cf res = pmul(a, pconj(b)); Packet2f s, rev_s; // this computes the norm s = vmul_f32(b.v, b.v); rev_s = vrev64_f32(s); return Packet1cf(pdiv(res.v, vadd_f32(s, rev_s))); } template<> EIGEN_STRONG_INLINE Packet2cf pdiv(const Packet2cf& a, const Packet2cf& b) { // TODO optimize it for NEON Packet2cf res = pmul(a,pconj(b)); Packet4f s, rev_s; // this computes the norm s = vmulq_f32(b.v, b.v); rev_s = vrev64q_f32(s); return Packet2cf(pdiv(res.v, vaddq_f32(s, rev_s))); } EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& /*kernel*/) {} EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { Packet4f tmp = vcombine_f32(vget_high_f32(kernel.packet[0].v), vget_high_f32(kernel.packet[1].v)); kernel.packet[0].v = vcombine_f32(vget_low_f32(kernel.packet[0].v), vget_low_f32(kernel.packet[1].v)); kernel.packet[1].v = tmp; } template<> EIGEN_STRONG_INLINE Packet1cf psqrt(const Packet1cf& a) { return psqrt_complex(a); } template<> EIGEN_STRONG_INLINE Packet2cf psqrt(const Packet2cf& a) { return psqrt_complex(a); } //---------- double ---------- #if EIGEN_ARCH_ARM64 && !EIGEN_APPLE_DOUBLE_NEON_BUG // See bug 1325, clang fails to call vld1q_u64. #if EIGEN_COMP_CLANG || EIGEN_COMP_CASTXML static uint64x2_t p2ul_CONJ_XOR = {0x0, 0x8000000000000000}; #else const uint64_t p2ul_conj_XOR_DATA[] = { 0x0, 0x8000000000000000 }; static uint64x2_t p2ul_CONJ_XOR = vld1q_u64( p2ul_conj_XOR_DATA ); #endif struct Packet1cd { EIGEN_STRONG_INLINE Packet1cd() {} EIGEN_STRONG_INLINE explicit Packet1cd(const Packet2d& a) : v(a) {} Packet2d v; }; template<> struct packet_traits > : default_packet_traits { typedef Packet1cd type; typedef Packet1cd half; enum { Vectorizable = 1, AlignedOnScalar = 0, size = 1, HasHalfPacket = 0, HasAdd = 1, HasSub = 1, HasMul = 1, HasDiv = 1, HasNegate = 1, HasAbs = 0, HasAbs2 = 0, HasMin = 0, HasMax = 0, HasSetLinear = 0 }; }; template<> struct unpacket_traits { typedef std::complex type; typedef Packet1cd half; typedef Packet2d as_real; enum { size=1, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false }; }; template<> EIGEN_STRONG_INLINE Packet1cd pload(const std::complex* from) { EIGEN_DEBUG_ALIGNED_LOAD return Packet1cd(pload(reinterpret_cast(from))); } template<> EIGEN_STRONG_INLINE Packet1cd ploadu(const std::complex* from) { EIGEN_DEBUG_UNALIGNED_LOAD return Packet1cd(ploadu(reinterpret_cast(from))); } template<> EIGEN_STRONG_INLINE Packet1cd pset1(const std::complex& from) { /* here we really have to use unaligned loads :( */ return ploadu(&from); } template<> EIGEN_STRONG_INLINE Packet1cd padd(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(padd(a.v, b.v)); } template<> EIGEN_STRONG_INLINE Packet1cd psub(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(psub(a.v, b.v)); } template<> EIGEN_STRONG_INLINE Packet1cd pnegate(const Packet1cd& a) { return Packet1cd(pnegate(a.v)); } template<> EIGEN_STRONG_INLINE Packet1cd pconj(const Packet1cd& a) { return Packet1cd(vreinterpretq_f64_u64(veorq_u64(vreinterpretq_u64_f64(a.v), p2ul_CONJ_XOR))); } template<> EIGEN_STRONG_INLINE Packet1cd pmul(const Packet1cd& a, const Packet1cd& b) { Packet2d v1, v2; // Get the real values of a v1 = vdupq_lane_f64(vget_low_f64(a.v), 0); // Get the imag values of a v2 = vdupq_lane_f64(vget_high_f64(a.v), 0); // Multiply the real a with b v1 = vmulq_f64(v1, b.v); // Multiply the imag a with b v2 = vmulq_f64(v2, b.v); // Conjugate v2 v2 = vreinterpretq_f64_u64(veorq_u64(vreinterpretq_u64_f64(v2), p2ul_CONJ_XOR)); // Swap real/imag elements in v2. v2 = preverse(v2); // Add and return the result return Packet1cd(vaddq_f64(v1, v2)); } template<> EIGEN_STRONG_INLINE Packet1cd pcmp_eq(const Packet1cd& a, const Packet1cd& b) { // Compare real and imaginary parts of a and b to get the mask vector: // [re(a)==re(b), im(a)==im(b)] Packet2d eq = pcmp_eq(a.v, b.v); // Swap real/imag elements in the mask in to get: // [im(a)==im(b), re(a)==re(b)] Packet2d eq_swapped = vreinterpretq_f64_u32(vrev64q_u32(vreinterpretq_u32_f64(eq))); // Return re(a)==re(b) & im(a)==im(b) by computing bitwise AND of eq and eq_swapped return Packet1cd(pand(eq, eq_swapped)); } template<> EIGEN_STRONG_INLINE Packet1cd pand(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(vreinterpretq_f64_u64(vandq_u64(vreinterpretq_u64_f64(a.v),vreinterpretq_u64_f64(b.v)))); } template<> EIGEN_STRONG_INLINE Packet1cd por(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(vreinterpretq_f64_u64(vorrq_u64(vreinterpretq_u64_f64(a.v),vreinterpretq_u64_f64(b.v)))); } template<> EIGEN_STRONG_INLINE Packet1cd pxor(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(vreinterpretq_f64_u64(veorq_u64(vreinterpretq_u64_f64(a.v),vreinterpretq_u64_f64(b.v)))); } template<> EIGEN_STRONG_INLINE Packet1cd pandnot(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(vreinterpretq_f64_u64(vbicq_u64(vreinterpretq_u64_f64(a.v),vreinterpretq_u64_f64(b.v)))); } template<> EIGEN_STRONG_INLINE Packet1cd ploaddup(const std::complex* from) { return pset1(*from); } template<> EIGEN_STRONG_INLINE void pstore >(std::complex *to, const Packet1cd& from) { EIGEN_DEBUG_ALIGNED_STORE pstore(reinterpret_cast(to), from.v); } template<> EIGEN_STRONG_INLINE void pstoreu >(std::complex *to, const Packet1cd& from) { EIGEN_DEBUG_UNALIGNED_STORE pstoreu(reinterpret_cast(to), from.v); } template<> EIGEN_STRONG_INLINE void prefetch >(const std::complex *addr) { EIGEN_ARM_PREFETCH(reinterpret_cast(addr)); } template<> EIGEN_DEVICE_FUNC inline Packet1cd pgather, Packet1cd>( const std::complex* from, Index stride) { Packet2d res = pset1(0.0); res = vsetq_lane_f64(std::real(from[0*stride]), res, 0); res = vsetq_lane_f64(std::imag(from[0*stride]), res, 1); return Packet1cd(res); } template<> EIGEN_DEVICE_FUNC inline void pscatter, Packet1cd>( std::complex* to, const Packet1cd& from, Index stride) { to[stride*0] = std::complex(vgetq_lane_f64(from.v, 0), vgetq_lane_f64(from.v, 1)); } template<> EIGEN_STRONG_INLINE std::complex pfirst(const Packet1cd& a) { EIGEN_ALIGN16 std::complex res; pstore >(&res, a); return res; } template<> EIGEN_STRONG_INLINE Packet1cd preverse(const Packet1cd& a) { return a; } template<> EIGEN_STRONG_INLINE std::complex predux(const Packet1cd& a) { return pfirst(a); } template<> EIGEN_STRONG_INLINE std::complex predux_mul(const Packet1cd& a) { return pfirst(a); } EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet1cd,Packet2d) template<> EIGEN_STRONG_INLINE Packet1cd pdiv(const Packet1cd& a, const Packet1cd& b) { // TODO optimize it for NEON Packet1cd res = pmul(a,pconj(b)); Packet2d s = pmul(b.v, b.v); Packet2d rev_s = preverse(s); return Packet1cd(pdiv(res.v, padd(s,rev_s))); } EIGEN_STRONG_INLINE Packet1cd pcplxflip/**/(const Packet1cd& x) { return Packet1cd(preverse(Packet2d(x.v))); } EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { Packet2d tmp = vcombine_f64(vget_high_f64(kernel.packet[0].v), vget_high_f64(kernel.packet[1].v)); kernel.packet[0].v = vcombine_f64(vget_low_f64(kernel.packet[0].v), vget_low_f64(kernel.packet[1].v)); kernel.packet[1].v = tmp; } template<> EIGEN_STRONG_INLINE Packet1cd psqrt(const Packet1cd& a) { return psqrt_complex(a); } #endif // EIGEN_ARCH_ARM64 } // end namespace internal } // end namespace Eigen #endif // EIGEN_COMPLEX_NEON_H RcppEigen/inst/include/Eigen/src/Core/arch/AltiVec/0000755000176200001440000000000014562417221021504 5ustar liggesusersRcppEigen/inst/include/Eigen/src/Core/arch/AltiVec/MatrixProductCommon.h0000644000176200001440000002242214562417221025635 0ustar liggesusers//#define EIGEN_POWER_USE_PREFETCH // Use prefetching in gemm routines #ifdef EIGEN_POWER_USE_PREFETCH #define EIGEN_POWER_PREFETCH(p) prefetch(p) #else #define EIGEN_POWER_PREFETCH(p) #endif namespace Eigen { namespace internal { template EIGEN_STRONG_INLINE void gemm_extra_col( const DataMapper& res, const Scalar* lhs_base, const Scalar* rhs_base, Index depth, Index strideA, Index offsetA, Index row, Index col, Index remaining_rows, Index remaining_cols, const Packet& pAlpha); template EIGEN_STRONG_INLINE void gemm_extra_row( const DataMapper& res, const Scalar* lhs_base, const Scalar* rhs_base, Index depth, Index strideA, Index offsetA, Index row, Index col, Index rows, Index cols, Index remaining_rows, const Packet& pAlpha, const Packet& pMask); template EIGEN_STRONG_INLINE void gemm_unrolled_col( const DataMapper& res, const Scalar* lhs_base, const Scalar* rhs_base, Index depth, Index strideA, Index offsetA, Index& row, Index rows, Index col, Index remaining_cols, const Packet& pAlpha); template EIGEN_ALWAYS_INLINE Packet bmask(const int remaining_rows); template EIGEN_STRONG_INLINE void gemm_complex_extra_col( const DataMapper& res, const Scalar* lhs_base, const Scalar* rhs_base, Index depth, Index strideA, Index offsetA, Index strideB, Index row, Index col, Index remaining_rows, Index remaining_cols, const Packet& pAlphaReal, const Packet& pAlphaImag); template EIGEN_STRONG_INLINE void gemm_complex_extra_row( const DataMapper& res, const Scalar* lhs_base, const Scalar* rhs_base, Index depth, Index strideA, Index offsetA, Index strideB, Index row, Index col, Index rows, Index cols, Index remaining_rows, const Packet& pAlphaReal, const Packet& pAlphaImag, const Packet& pMask); template EIGEN_STRONG_INLINE void gemm_complex_unrolled_col( const DataMapper& res, const Scalar* lhs_base, const Scalar* rhs_base, Index depth, Index strideA, Index offsetA, Index strideB, Index& row, Index rows, Index col, Index remaining_cols, const Packet& pAlphaReal, const Packet& pAlphaImag); template EIGEN_ALWAYS_INLINE Packet ploadLhs(const Scalar* lhs); template EIGEN_ALWAYS_INLINE void bload(PacketBlock& acc, const DataMapper& res, Index row, Index col); template EIGEN_ALWAYS_INLINE void bload(PacketBlock& acc, const DataMapper& res, Index row, Index col); template EIGEN_ALWAYS_INLINE void bscale(PacketBlock& acc, PacketBlock& accZ, const Packet& pAlpha); template EIGEN_ALWAYS_INLINE void bscalec(PacketBlock& aReal, PacketBlock& aImag, const Packet& bReal, const Packet& bImag, PacketBlock& cReal, PacketBlock& cImag); const static Packet16uc p16uc_SETCOMPLEX32_FIRST = { 0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}; const static Packet16uc p16uc_SETCOMPLEX32_SECOND = { 8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}; //[a,b],[ai,bi] = [a,ai] - This is equivalent to p16uc_GETREAL64 const static Packet16uc p16uc_SETCOMPLEX64_FIRST = { 0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23}; //[a,b],[ai,bi] = [b,bi] - This is equivalent to p16uc_GETIMAG64 const static Packet16uc p16uc_SETCOMPLEX64_SECOND = { 8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31}; // Grab two decouples real/imaginary PacketBlocks and return two coupled (real/imaginary pairs) PacketBlocks. template EIGEN_ALWAYS_INLINE void bcouple_common(PacketBlock& taccReal, PacketBlock& taccImag, PacketBlock& acc1, PacketBlock& acc2) { acc1.packet[0].v = vec_perm(taccReal.packet[0], taccImag.packet[0], p16uc_SETCOMPLEX32_FIRST); acc1.packet[1].v = vec_perm(taccReal.packet[1], taccImag.packet[1], p16uc_SETCOMPLEX32_FIRST); acc1.packet[2].v = vec_perm(taccReal.packet[2], taccImag.packet[2], p16uc_SETCOMPLEX32_FIRST); acc1.packet[3].v = vec_perm(taccReal.packet[3], taccImag.packet[3], p16uc_SETCOMPLEX32_FIRST); acc2.packet[0].v = vec_perm(taccReal.packet[0], taccImag.packet[0], p16uc_SETCOMPLEX32_SECOND); acc2.packet[1].v = vec_perm(taccReal.packet[1], taccImag.packet[1], p16uc_SETCOMPLEX32_SECOND); acc2.packet[2].v = vec_perm(taccReal.packet[2], taccImag.packet[2], p16uc_SETCOMPLEX32_SECOND); acc2.packet[3].v = vec_perm(taccReal.packet[3], taccImag.packet[3], p16uc_SETCOMPLEX32_SECOND); } template EIGEN_ALWAYS_INLINE void bcouple(PacketBlock& taccReal, PacketBlock& taccImag, PacketBlock& tRes, PacketBlock& acc1, PacketBlock& acc2) { bcouple_common(taccReal, taccImag, acc1, acc2); acc1.packet[0] = padd(tRes.packet[0], acc1.packet[0]); acc1.packet[1] = padd(tRes.packet[1], acc1.packet[1]); acc1.packet[2] = padd(tRes.packet[2], acc1.packet[2]); acc1.packet[3] = padd(tRes.packet[3], acc1.packet[3]); acc2.packet[0] = padd(tRes.packet[4], acc2.packet[0]); acc2.packet[1] = padd(tRes.packet[5], acc2.packet[1]); acc2.packet[2] = padd(tRes.packet[6], acc2.packet[2]); acc2.packet[3] = padd(tRes.packet[7], acc2.packet[3]); } template EIGEN_ALWAYS_INLINE void bcouple_common(PacketBlock& taccReal, PacketBlock& taccImag, PacketBlock& acc1, PacketBlock& acc2) { acc1.packet[0].v = vec_perm(taccReal.packet[0], taccImag.packet[0], p16uc_SETCOMPLEX32_FIRST); acc2.packet[0].v = vec_perm(taccReal.packet[0], taccImag.packet[0], p16uc_SETCOMPLEX32_SECOND); } template EIGEN_ALWAYS_INLINE void bcouple(PacketBlock& taccReal, PacketBlock& taccImag, PacketBlock& tRes, PacketBlock& acc1, PacketBlock& acc2) { bcouple_common(taccReal, taccImag, acc1, acc2); acc1.packet[0] = padd(tRes.packet[0], acc1.packet[0]); acc2.packet[0] = padd(tRes.packet[1], acc2.packet[0]); } template<> EIGEN_ALWAYS_INLINE void bcouple_common(PacketBlock& taccReal, PacketBlock& taccImag, PacketBlock& acc1, PacketBlock& acc2) { acc1.packet[0].v = vec_perm(taccReal.packet[0], taccImag.packet[0], p16uc_SETCOMPLEX64_FIRST); acc1.packet[1].v = vec_perm(taccReal.packet[1], taccImag.packet[1], p16uc_SETCOMPLEX64_FIRST); acc1.packet[2].v = vec_perm(taccReal.packet[2], taccImag.packet[2], p16uc_SETCOMPLEX64_FIRST); acc1.packet[3].v = vec_perm(taccReal.packet[3], taccImag.packet[3], p16uc_SETCOMPLEX64_FIRST); acc2.packet[0].v = vec_perm(taccReal.packet[0], taccImag.packet[0], p16uc_SETCOMPLEX64_SECOND); acc2.packet[1].v = vec_perm(taccReal.packet[1], taccImag.packet[1], p16uc_SETCOMPLEX64_SECOND); acc2.packet[2].v = vec_perm(taccReal.packet[2], taccImag.packet[2], p16uc_SETCOMPLEX64_SECOND); acc2.packet[3].v = vec_perm(taccReal.packet[3], taccImag.packet[3], p16uc_SETCOMPLEX64_SECOND); } template<> EIGEN_ALWAYS_INLINE void bcouple_common(PacketBlock& taccReal, PacketBlock& taccImag, PacketBlock& acc1, PacketBlock& acc2) { acc1.packet[0].v = vec_perm(taccReal.packet[0], taccImag.packet[0], p16uc_SETCOMPLEX64_FIRST); acc2.packet[0].v = vec_perm(taccReal.packet[0], taccImag.packet[0], p16uc_SETCOMPLEX64_SECOND); } // This is necessary because ploadRhs for double returns a pair of vectors when MMA is enabled. template EIGEN_ALWAYS_INLINE Packet ploadRhs(const Scalar* rhs) { return ploadu(rhs); } } // end namespace internal } // end namespace Eigen RcppEigen/inst/include/Eigen/src/Core/arch/AltiVec/MathFunctions.h0000644000176200001440000000442314562417221024442 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2007 Julien Pommier // Copyright (C) 2009 Gael Guennebaud // Copyright (C) 2016 Konstantinos Margaritis // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_MATH_FUNCTIONS_ALTIVEC_H #define EIGEN_MATH_FUNCTIONS_ALTIVEC_H namespace Eigen { namespace internal { template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f plog(const Packet4f& _x) { return plog_float(_x); } template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f pexp(const Packet4f& _x) { return pexp_float(_x); } template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f psin(const Packet4f& _x) { return psin_float(_x); } template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f pcos(const Packet4f& _x) { return pcos_float(_x); } #ifndef EIGEN_COMP_CLANG template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f prsqrt(const Packet4f& x) { return vec_rsqrt(x); } #endif #ifdef __VSX__ #ifndef EIGEN_COMP_CLANG template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet2d prsqrt(const Packet2d& x) { return vec_rsqrt(x); } #endif template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f psqrt(const Packet4f& x) { return vec_sqrt(x); } template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet2d psqrt(const Packet2d& x) { return vec_sqrt(x); } template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet2d pexp(const Packet2d& _x) { return pexp_double(_x); } #endif // Hyperbolic Tangent function. template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f ptanh(const Packet4f& x) { return internal::generic_fast_tanh_float(x); } } // end namespace internal } // end namespace Eigen #endif // EIGEN_MATH_FUNCTIONS_ALTIVEC_H RcppEigen/inst/include/Eigen/src/Core/arch/AltiVec/PacketMath.h0000755000176200001440000030777214562417221023721 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2016 Konstantinos Margaritis // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_PACKET_MATH_ALTIVEC_H #define EIGEN_PACKET_MATH_ALTIVEC_H namespace Eigen { namespace internal { #ifndef EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD #define EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD 4 #endif #ifndef EIGEN_HAS_SINGLE_INSTRUCTION_MADD #define EIGEN_HAS_SINGLE_INSTRUCTION_MADD #endif // NOTE Altivec has 32 registers, but Eigen only accepts a value of 8 or 16 #ifndef EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS #define EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS 32 #endif typedef __vector float Packet4f; typedef __vector int Packet4i; typedef __vector unsigned int Packet4ui; typedef __vector __bool int Packet4bi; typedef __vector short int Packet8s; typedef __vector unsigned short int Packet8us; typedef __vector signed char Packet16c; typedef __vector unsigned char Packet16uc; typedef eigen_packet_wrapper<__vector unsigned short int,0> Packet8bf; // We don't want to write the same code all the time, but we need to reuse the constants // and it doesn't really work to declare them global, so we define macros instead #define _EIGEN_DECLARE_CONST_FAST_Packet4f(NAME,X) \ Packet4f p4f_##NAME = {X, X, X, X} #define _EIGEN_DECLARE_CONST_FAST_Packet4i(NAME,X) \ Packet4i p4i_##NAME = vec_splat_s32(X) #define _EIGEN_DECLARE_CONST_FAST_Packet4ui(NAME,X) \ Packet4ui p4ui_##NAME = {X, X, X, X} #define _EIGEN_DECLARE_CONST_FAST_Packet8us(NAME,X) \ Packet8us p8us_##NAME = {X, X, X, X, X, X, X, X} #define _EIGEN_DECLARE_CONST_FAST_Packet16uc(NAME,X) \ Packet16uc p16uc_##NAME = {X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X} #define _EIGEN_DECLARE_CONST_Packet4f(NAME,X) \ Packet4f p4f_##NAME = pset1(X) #define _EIGEN_DECLARE_CONST_Packet4i(NAME,X) \ Packet4i p4i_##NAME = pset1(X) #define _EIGEN_DECLARE_CONST_Packet2d(NAME,X) \ Packet2d p2d_##NAME = pset1(X) #define _EIGEN_DECLARE_CONST_Packet2l(NAME,X) \ Packet2l p2l_##NAME = pset1(X) #define _EIGEN_DECLARE_CONST_Packet4f_FROM_INT(NAME,X) \ const Packet4f p4f_##NAME = reinterpret_cast(pset1(X)) #define DST_CHAN 1 #define DST_CTRL(size, count, stride) (((size) << 24) | ((count) << 16) | (stride)) #define __UNPACK_TYPE__(PACKETNAME) typename unpacket_traits::type // These constants are endian-agnostic static _EIGEN_DECLARE_CONST_FAST_Packet4f(ZERO, 0); //{ 0.0, 0.0, 0.0, 0.0} static _EIGEN_DECLARE_CONST_FAST_Packet4i(ZERO, 0); //{ 0, 0, 0, 0,} static _EIGEN_DECLARE_CONST_FAST_Packet4i(ONE,1); //{ 1, 1, 1, 1} static _EIGEN_DECLARE_CONST_FAST_Packet4i(MINUS16,-16); //{ -16, -16, -16, -16} static _EIGEN_DECLARE_CONST_FAST_Packet4i(MINUS1,-1); //{ -1, -1, -1, -1} static _EIGEN_DECLARE_CONST_FAST_Packet4ui(SIGN, 0x80000000u); static _EIGEN_DECLARE_CONST_FAST_Packet4ui(PREV0DOT5, 0x3EFFFFFFu); static _EIGEN_DECLARE_CONST_FAST_Packet8us(ONE,1); //{ 1, 1, 1, 1, 1, 1, 1, 1} static _EIGEN_DECLARE_CONST_FAST_Packet16uc(ONE,1); static Packet4f p4f_MZERO = (Packet4f) vec_sl((Packet4ui)p4i_MINUS1, (Packet4ui)p4i_MINUS1); //{ 0x80000000, 0x80000000, 0x80000000, 0x80000000} #ifndef __VSX__ static Packet4f p4f_ONE = vec_ctf(p4i_ONE, 0); //{ 1.0, 1.0, 1.0, 1.0} #endif static Packet4f p4f_COUNTDOWN = { 0.0, 1.0, 2.0, 3.0 }; static Packet4i p4i_COUNTDOWN = { 0, 1, 2, 3 }; static Packet8s p8s_COUNTDOWN = { 0, 1, 2, 3, 4, 5, 6, 7 }; static Packet8us p8us_COUNTDOWN = { 0, 1, 2, 3, 4, 5, 6, 7 }; static Packet16c p16c_COUNTDOWN = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}; static Packet16uc p16uc_COUNTDOWN = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}; static Packet16uc p16uc_REVERSE32 = { 12,13,14,15, 8,9,10,11, 4,5,6,7, 0,1,2,3 }; static Packet16uc p16uc_REVERSE16 = { 14,15, 12,13, 10,11, 8,9, 6,7, 4,5, 2,3, 0,1 }; static Packet16uc p16uc_REVERSE8 = { 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0 }; static Packet16uc p16uc_DUPLICATE32_HI = { 0,1,2,3, 0,1,2,3, 4,5,6,7, 4,5,6,7 }; static Packet16uc p16uc_DUPLICATE16_HI = { 0,1,0,1, 2,3,2,3, 4,5,4,5, 6,7,6,7 }; static Packet16uc p16uc_DUPLICATE8_HI = { 0,0, 1,1, 2,2, 3,3, 4,4, 5,5, 6,6, 7,7 }; static const Packet16uc p16uc_DUPLICATE16_EVEN= { 0,1 ,0,1, 4,5, 4,5, 8,9, 8,9, 12,13, 12,13 }; static const Packet16uc p16uc_DUPLICATE16_ODD = { 2,3 ,2,3, 6,7, 6,7, 10,11, 10,11, 14,15, 14,15 }; static Packet16uc p16uc_QUADRUPLICATE16_HI = { 0,1,0,1,0,1,0,1, 2,3,2,3,2,3,2,3 }; // Handle endianness properly while loading constants // Define global static constants: #ifdef _BIG_ENDIAN static Packet16uc p16uc_FORWARD = vec_lvsl(0, (float*)0); #ifdef __VSX__ static Packet16uc p16uc_REVERSE64 = { 8,9,10,11, 12,13,14,15, 0,1,2,3, 4,5,6,7 }; #endif static Packet16uc p16uc_PSET32_WODD = vec_sld((Packet16uc) vec_splat((Packet4ui)p16uc_FORWARD, 0), (Packet16uc) vec_splat((Packet4ui)p16uc_FORWARD, 2), 8);//{ 0,1,2,3, 0,1,2,3, 8,9,10,11, 8,9,10,11 }; static Packet16uc p16uc_PSET32_WEVEN = vec_sld(p16uc_DUPLICATE32_HI, (Packet16uc) vec_splat((Packet4ui)p16uc_FORWARD, 3), 8);//{ 4,5,6,7, 4,5,6,7, 12,13,14,15, 12,13,14,15 }; static Packet16uc p16uc_HALF64_0_16 = vec_sld((Packet16uc)p4i_ZERO, vec_splat((Packet16uc) vec_abs(p4i_MINUS16), 3), 8); //{ 0,0,0,0, 0,0,0,0, 16,16,16,16, 16,16,16,16}; #else static Packet16uc p16uc_FORWARD = p16uc_REVERSE32; static Packet16uc p16uc_REVERSE64 = { 8,9,10,11, 12,13,14,15, 0,1,2,3, 4,5,6,7 }; static Packet16uc p16uc_PSET32_WODD = vec_sld((Packet16uc) vec_splat((Packet4ui)p16uc_FORWARD, 1), (Packet16uc) vec_splat((Packet4ui)p16uc_FORWARD, 3), 8);//{ 0,1,2,3, 0,1,2,3, 8,9,10,11, 8,9,10,11 }; static Packet16uc p16uc_PSET32_WEVEN = vec_sld((Packet16uc) vec_splat((Packet4ui)p16uc_FORWARD, 0), (Packet16uc) vec_splat((Packet4ui)p16uc_FORWARD, 2), 8);//{ 4,5,6,7, 4,5,6,7, 12,13,14,15, 12,13,14,15 }; static Packet16uc p16uc_HALF64_0_16 = vec_sld(vec_splat((Packet16uc) vec_abs(p4i_MINUS16), 0), (Packet16uc)p4i_ZERO, 8); //{ 0,0,0,0, 0,0,0,0, 16,16,16,16, 16,16,16,16}; #endif // _BIG_ENDIAN static Packet16uc p16uc_PSET64_HI = (Packet16uc) vec_mergeh((Packet4ui)p16uc_PSET32_WODD, (Packet4ui)p16uc_PSET32_WEVEN); //{ 0,1,2,3, 4,5,6,7, 0,1,2,3, 4,5,6,7 }; static Packet16uc p16uc_PSET64_LO = (Packet16uc) vec_mergel((Packet4ui)p16uc_PSET32_WODD, (Packet4ui)p16uc_PSET32_WEVEN); //{ 8,9,10,11, 12,13,14,15, 8,9,10,11, 12,13,14,15 }; static Packet16uc p16uc_TRANSPOSE64_HI = p16uc_PSET64_HI + p16uc_HALF64_0_16; //{ 0,1,2,3, 4,5,6,7, 16,17,18,19, 20,21,22,23}; static Packet16uc p16uc_TRANSPOSE64_LO = p16uc_PSET64_LO + p16uc_HALF64_0_16; //{ 8,9,10,11, 12,13,14,15, 24,25,26,27, 28,29,30,31}; static Packet16uc p16uc_COMPLEX32_REV = vec_sld(p16uc_REVERSE32, p16uc_REVERSE32, 8); //{ 4,5,6,7, 0,1,2,3, 12,13,14,15, 8,9,10,11 }; #ifdef _BIG_ENDIAN static Packet16uc p16uc_COMPLEX32_REV2 = vec_sld(p16uc_FORWARD, p16uc_FORWARD, 8); //{ 8,9,10,11, 12,13,14,15, 0,1,2,3, 4,5,6,7 }; #else static Packet16uc p16uc_COMPLEX32_REV2 = vec_sld(p16uc_PSET64_HI, p16uc_PSET64_LO, 8); //{ 8,9,10,11, 12,13,14,15, 0,1,2,3, 4,5,6,7 }; #endif // _BIG_ENDIAN #if EIGEN_HAS_BUILTIN(__builtin_prefetch) || EIGEN_COMP_GNUC #define EIGEN_PPC_PREFETCH(ADDR) __builtin_prefetch(ADDR); #else #define EIGEN_PPC_PREFETCH(ADDR) asm( " dcbt [%[addr]]\n" :: [addr] "r" (ADDR) : "cc" ); #endif template <> struct packet_traits : default_packet_traits { typedef Packet4f type; typedef Packet4f half; enum { Vectorizable = 1, AlignedOnScalar = 1, size = 4, HasHalfPacket = 1, HasAdd = 1, HasSub = 1, HasMul = 1, HasDiv = 1, HasMin = 1, HasMax = 1, HasAbs = 1, HasSin = EIGEN_FAST_MATH, HasCos = EIGEN_FAST_MATH, HasLog = 1, HasExp = 1, #ifdef __VSX__ HasSqrt = 1, #if !EIGEN_COMP_CLANG HasRsqrt = 1, #else HasRsqrt = 0, #endif #else HasSqrt = 0, HasRsqrt = 0, HasTanh = EIGEN_FAST_MATH, HasErf = EIGEN_FAST_MATH, #endif HasRound = 1, HasFloor = 1, HasCeil = 1, HasRint = 1, HasNegate = 1, HasBlend = 1 }; }; template <> struct packet_traits : default_packet_traits { typedef Packet8bf type; typedef Packet8bf half; enum { Vectorizable = 1, AlignedOnScalar = 1, size = 8, HasHalfPacket = 0, HasAdd = 1, HasSub = 1, HasMul = 1, HasDiv = 1, HasMin = 1, HasMax = 1, HasAbs = 1, HasSin = EIGEN_FAST_MATH, HasCos = EIGEN_FAST_MATH, HasLog = 1, HasExp = 1, #ifdef __VSX__ HasSqrt = 1, #if !EIGEN_COMP_CLANG HasRsqrt = 1, #else HasRsqrt = 0, #endif #else HasSqrt = 0, HasRsqrt = 0, HasTanh = EIGEN_FAST_MATH, HasErf = EIGEN_FAST_MATH, #endif HasRound = 1, HasFloor = 1, HasCeil = 1, HasRint = 1, HasNegate = 1, HasBlend = 1 }; }; template <> struct packet_traits : default_packet_traits { typedef Packet4i type; typedef Packet4i half; enum { Vectorizable = 1, AlignedOnScalar = 1, size = 4, HasHalfPacket = 0, HasAdd = 1, HasSub = 1, HasShift = 1, HasMul = 1, HasDiv = 0, HasBlend = 1 }; }; template <> struct packet_traits : default_packet_traits { typedef Packet8s type; typedef Packet8s half; enum { Vectorizable = 1, AlignedOnScalar = 1, size = 8, HasHalfPacket = 0, HasAdd = 1, HasSub = 1, HasMul = 1, HasDiv = 0, HasBlend = 1 }; }; template <> struct packet_traits : default_packet_traits { typedef Packet8us type; typedef Packet8us half; enum { Vectorizable = 1, AlignedOnScalar = 1, size = 8, HasHalfPacket = 0, HasAdd = 1, HasSub = 1, HasMul = 1, HasDiv = 0, HasBlend = 1 }; }; template <> struct packet_traits : default_packet_traits { typedef Packet16c type; typedef Packet16c half; enum { Vectorizable = 1, AlignedOnScalar = 1, size = 16, HasHalfPacket = 0, HasAdd = 1, HasSub = 1, HasMul = 1, HasDiv = 0, HasBlend = 1 }; }; template <> struct packet_traits : default_packet_traits { typedef Packet16uc type; typedef Packet16uc half; enum { Vectorizable = 1, AlignedOnScalar = 1, size = 16, HasHalfPacket = 0, HasAdd = 1, HasSub = 1, HasMul = 1, HasDiv = 0, HasBlend = 1 }; }; template<> struct unpacket_traits { typedef float type; typedef Packet4f half; typedef Packet4i integer_packet; enum {size=4, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; }; template<> struct unpacket_traits { typedef int type; typedef Packet4i half; enum {size=4, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; }; template<> struct unpacket_traits { typedef short int type; typedef Packet8s half; enum {size=8, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; }; template<> struct unpacket_traits { typedef unsigned short int type; typedef Packet8us half; enum {size=8, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; }; template<> struct unpacket_traits { typedef signed char type; typedef Packet16c half; enum {size=16, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; }; template<> struct unpacket_traits { typedef unsigned char type; typedef Packet16uc half; enum {size=16, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; }; template<> struct unpacket_traits { typedef bfloat16 type; typedef Packet8bf half; enum {size=8, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; }; inline std::ostream & operator <<(std::ostream & s, const Packet16c & v) { union { Packet16c v; signed char n[16]; } vt; vt.v = v; for (int i=0; i< 16; i++) s << vt.n[i] << ", "; return s; } inline std::ostream & operator <<(std::ostream & s, const Packet16uc & v) { union { Packet16uc v; unsigned char n[16]; } vt; vt.v = v; for (int i=0; i< 16; i++) s << vt.n[i] << ", "; return s; } inline std::ostream & operator <<(std::ostream & s, const Packet4f & v) { union { Packet4f v; float n[4]; } vt; vt.v = v; s << vt.n[0] << ", " << vt.n[1] << ", " << vt.n[2] << ", " << vt.n[3]; return s; } inline std::ostream & operator <<(std::ostream & s, const Packet4i & v) { union { Packet4i v; int n[4]; } vt; vt.v = v; s << vt.n[0] << ", " << vt.n[1] << ", " << vt.n[2] << ", " << vt.n[3]; return s; } inline std::ostream & operator <<(std::ostream & s, const Packet4ui & v) { union { Packet4ui v; unsigned int n[4]; } vt; vt.v = v; s << vt.n[0] << ", " << vt.n[1] << ", " << vt.n[2] << ", " << vt.n[3]; return s; } template EIGEN_STRONG_INLINE Packet pload_common(const __UNPACK_TYPE__(Packet)* from) { // some versions of GCC throw "unused-but-set-parameter". // ignoring these warnings for now. EIGEN_UNUSED_VARIABLE(from); EIGEN_DEBUG_ALIGNED_LOAD #ifdef __VSX__ return vec_xl(0, const_cast<__UNPACK_TYPE__(Packet)*>(from)); #else return vec_ld(0, from); #endif } // Need to define them first or we get specialization after instantiation errors template<> EIGEN_STRONG_INLINE Packet4f pload(const float* from) { return pload_common(from); } template<> EIGEN_STRONG_INLINE Packet4i pload(const int* from) { return pload_common(from); } template<> EIGEN_STRONG_INLINE Packet8s pload(const short int* from) { return pload_common(from); } template<> EIGEN_STRONG_INLINE Packet8us pload(const unsigned short int* from) { return pload_common(from); } template<> EIGEN_STRONG_INLINE Packet16c pload(const signed char* from) { return pload_common(from); } template<> EIGEN_STRONG_INLINE Packet16uc pload(const unsigned char* from) { return pload_common(from); } template<> EIGEN_STRONG_INLINE Packet8bf pload(const bfloat16* from) { return pload_common(reinterpret_cast(from)); } template EIGEN_STRONG_INLINE void pstore_common(__UNPACK_TYPE__(Packet)* to, const Packet& from){ // some versions of GCC throw "unused-but-set-parameter" (float *to). // ignoring these warnings for now. EIGEN_UNUSED_VARIABLE(to); EIGEN_DEBUG_ALIGNED_STORE #ifdef __VSX__ vec_xst(from, 0, to); #else vec_st(from, 0, to); #endif } template<> EIGEN_STRONG_INLINE void pstore(float* to, const Packet4f& from) { pstore_common(to, from); } template<> EIGEN_STRONG_INLINE void pstore(int* to, const Packet4i& from) { pstore_common(to, from); } template<> EIGEN_STRONG_INLINE void pstore(short int* to, const Packet8s& from) { pstore_common(to, from); } template<> EIGEN_STRONG_INLINE void pstore(unsigned short int* to, const Packet8us& from) { pstore_common(to, from); } template<> EIGEN_STRONG_INLINE void pstore(bfloat16* to, const Packet8bf& from) { pstore_common(reinterpret_cast(to), from); } template<> EIGEN_STRONG_INLINE void pstore(signed char* to, const Packet16c& from) { pstore_common(to, from); } template<> EIGEN_STRONG_INLINE void pstore(unsigned char* to, const Packet16uc& from) { pstore_common(to, from); } template EIGEN_STRONG_INLINE Packet pset1_size4(const __UNPACK_TYPE__(Packet)& from) { Packet v = {from, from, from, from}; return v; } template EIGEN_STRONG_INLINE Packet pset1_size8(const __UNPACK_TYPE__(Packet)& from) { Packet v = {from, from, from, from, from, from, from, from}; return v; } template EIGEN_STRONG_INLINE Packet pset1_size16(const __UNPACK_TYPE__(Packet)& from) { Packet v = {from, from, from, from, from, from, from, from, from, from, from, from, from, from, from, from}; return v; } template<> EIGEN_STRONG_INLINE Packet4f pset1(const float& from) { return pset1_size4(from); } template<> EIGEN_STRONG_INLINE Packet4i pset1(const int& from) { return pset1_size4(from); } template<> EIGEN_STRONG_INLINE Packet8s pset1(const short int& from) { return pset1_size8(from); } template<> EIGEN_STRONG_INLINE Packet8us pset1(const unsigned short int& from) { return pset1_size8(from); } template<> EIGEN_STRONG_INLINE Packet16c pset1(const signed char& from) { return pset1_size16(from); } template<> EIGEN_STRONG_INLINE Packet16uc pset1(const unsigned char& from) { return pset1_size16(from); } template<> EIGEN_STRONG_INLINE Packet4f pset1frombits(unsigned int from) { return reinterpret_cast(pset1(from)); } template<> EIGEN_STRONG_INLINE Packet8bf pset1(const bfloat16& from) { return pset1_size8(reinterpret_cast(from)); } template EIGEN_STRONG_INLINE void pbroadcast4_common(const __UNPACK_TYPE__(Packet) *a, Packet& a0, Packet& a1, Packet& a2, Packet& a3) { a3 = pload(a); a0 = vec_splat(a3, 0); a1 = vec_splat(a3, 1); a2 = vec_splat(a3, 2); a3 = vec_splat(a3, 3); } template<> EIGEN_STRONG_INLINE void pbroadcast4(const float *a, Packet4f& a0, Packet4f& a1, Packet4f& a2, Packet4f& a3) { pbroadcast4_common(a, a0, a1, a2, a3); } template<> EIGEN_STRONG_INLINE void pbroadcast4(const int *a, Packet4i& a0, Packet4i& a1, Packet4i& a2, Packet4i& a3) { pbroadcast4_common(a, a0, a1, a2, a3); } template EIGEN_DEVICE_FUNC inline Packet pgather_common(const __UNPACK_TYPE__(Packet)* from, Index stride) { EIGEN_ALIGN16 __UNPACK_TYPE__(Packet) a[4]; a[0] = from[0*stride]; a[1] = from[1*stride]; a[2] = from[2*stride]; a[3] = from[3*stride]; return pload(a); } template<> EIGEN_DEVICE_FUNC inline Packet4f pgather(const float* from, Index stride) { return pgather_common(from, stride); } template<> EIGEN_DEVICE_FUNC inline Packet4i pgather(const int* from, Index stride) { return pgather_common(from, stride); } template EIGEN_DEVICE_FUNC inline Packet pgather_size8(const __UNPACK_TYPE__(Packet)* from, Index stride) { EIGEN_ALIGN16 __UNPACK_TYPE__(Packet) a[8]; a[0] = from[0*stride]; a[1] = from[1*stride]; a[2] = from[2*stride]; a[3] = from[3*stride]; a[4] = from[4*stride]; a[5] = from[5*stride]; a[6] = from[6*stride]; a[7] = from[7*stride]; return pload(a); } template<> EIGEN_DEVICE_FUNC inline Packet8s pgather(const short int* from, Index stride) { return pgather_size8(from, stride); } template<> EIGEN_DEVICE_FUNC inline Packet8us pgather(const unsigned short int* from, Index stride) { return pgather_size8(from, stride); } template<> EIGEN_DEVICE_FUNC inline Packet8bf pgather(const bfloat16* from, Index stride) { return pgather_size8(from, stride); } template EIGEN_DEVICE_FUNC inline Packet pgather_size16(const __UNPACK_TYPE__(Packet)* from, Index stride) { EIGEN_ALIGN16 __UNPACK_TYPE__(Packet) a[16]; a[0] = from[0*stride]; a[1] = from[1*stride]; a[2] = from[2*stride]; a[3] = from[3*stride]; a[4] = from[4*stride]; a[5] = from[5*stride]; a[6] = from[6*stride]; a[7] = from[7*stride]; a[8] = from[8*stride]; a[9] = from[9*stride]; a[10] = from[10*stride]; a[11] = from[11*stride]; a[12] = from[12*stride]; a[13] = from[13*stride]; a[14] = from[14*stride]; a[15] = from[15*stride]; return pload(a); } template<> EIGEN_DEVICE_FUNC inline Packet16c pgather(const signed char* from, Index stride) { return pgather_size16(from, stride); } template<> EIGEN_DEVICE_FUNC inline Packet16uc pgather(const unsigned char* from, Index stride) { return pgather_size16(from, stride); } template EIGEN_DEVICE_FUNC inline void pscatter_size4(__UNPACK_TYPE__(Packet)* to, const Packet& from, Index stride) { EIGEN_ALIGN16 __UNPACK_TYPE__(Packet) a[4]; pstore<__UNPACK_TYPE__(Packet)>(a, from); to[0*stride] = a[0]; to[1*stride] = a[1]; to[2*stride] = a[2]; to[3*stride] = a[3]; } template<> EIGEN_DEVICE_FUNC inline void pscatter(float* to, const Packet4f& from, Index stride) { pscatter_size4(to, from, stride); } template<> EIGEN_DEVICE_FUNC inline void pscatter(int* to, const Packet4i& from, Index stride) { pscatter_size4(to, from, stride); } template EIGEN_DEVICE_FUNC inline void pscatter_size8(__UNPACK_TYPE__(Packet)* to, const Packet& from, Index stride) { EIGEN_ALIGN16 __UNPACK_TYPE__(Packet) a[8]; pstore<__UNPACK_TYPE__(Packet)>(a, from); to[0*stride] = a[0]; to[1*stride] = a[1]; to[2*stride] = a[2]; to[3*stride] = a[3]; to[4*stride] = a[4]; to[5*stride] = a[5]; to[6*stride] = a[6]; to[7*stride] = a[7]; } template<> EIGEN_DEVICE_FUNC inline void pscatter(short int* to, const Packet8s& from, Index stride) { pscatter_size8(to, from, stride); } template<> EIGEN_DEVICE_FUNC inline void pscatter(unsigned short int* to, const Packet8us& from, Index stride) { pscatter_size8(to, from, stride); } template<> EIGEN_DEVICE_FUNC inline void pscatter(bfloat16* to, const Packet8bf& from, Index stride) { pscatter_size8(to, from, stride); } template EIGEN_DEVICE_FUNC inline void pscatter_size16(__UNPACK_TYPE__(Packet)* to, const Packet& from, Index stride) { EIGEN_ALIGN16 __UNPACK_TYPE__(Packet) a[16]; pstore<__UNPACK_TYPE__(Packet)>(a, from); to[0*stride] = a[0]; to[1*stride] = a[1]; to[2*stride] = a[2]; to[3*stride] = a[3]; to[4*stride] = a[4]; to[5*stride] = a[5]; to[6*stride] = a[6]; to[7*stride] = a[7]; to[8*stride] = a[8]; to[9*stride] = a[9]; to[10*stride] = a[10]; to[11*stride] = a[11]; to[12*stride] = a[12]; to[13*stride] = a[13]; to[14*stride] = a[14]; to[15*stride] = a[15]; } template<> EIGEN_DEVICE_FUNC inline void pscatter(signed char* to, const Packet16c& from, Index stride) { pscatter_size16(to, from, stride); } template<> EIGEN_DEVICE_FUNC inline void pscatter(unsigned char* to, const Packet16uc& from, Index stride) { pscatter_size16(to, from, stride); } template<> EIGEN_STRONG_INLINE Packet4f plset(const float& a) { return pset1(a) + p4f_COUNTDOWN; } template<> EIGEN_STRONG_INLINE Packet4i plset(const int& a) { return pset1(a) + p4i_COUNTDOWN; } template<> EIGEN_STRONG_INLINE Packet8s plset(const short int& a) { return pset1(a) + p8s_COUNTDOWN; } template<> EIGEN_STRONG_INLINE Packet8us plset(const unsigned short int& a) { return pset1(a) + p8us_COUNTDOWN; } template<> EIGEN_STRONG_INLINE Packet16c plset(const signed char& a) { return pset1(a) + p16c_COUNTDOWN; } template<> EIGEN_STRONG_INLINE Packet16uc plset(const unsigned char& a) { return pset1(a) + p16uc_COUNTDOWN; } template<> EIGEN_STRONG_INLINE Packet4f padd (const Packet4f& a, const Packet4f& b) { return a + b; } template<> EIGEN_STRONG_INLINE Packet4i padd (const Packet4i& a, const Packet4i& b) { return a + b; } template<> EIGEN_STRONG_INLINE Packet4ui padd (const Packet4ui& a, const Packet4ui& b) { return a + b; } template<> EIGEN_STRONG_INLINE Packet8s padd (const Packet8s& a, const Packet8s& b) { return a + b; } template<> EIGEN_STRONG_INLINE Packet8us padd (const Packet8us& a, const Packet8us& b) { return a + b; } template<> EIGEN_STRONG_INLINE Packet16c padd (const Packet16c& a, const Packet16c& b) { return a + b; } template<> EIGEN_STRONG_INLINE Packet16uc padd(const Packet16uc& a, const Packet16uc& b) { return a + b; } template<> EIGEN_STRONG_INLINE Packet4f psub (const Packet4f& a, const Packet4f& b) { return a - b; } template<> EIGEN_STRONG_INLINE Packet4i psub (const Packet4i& a, const Packet4i& b) { return a - b; } template<> EIGEN_STRONG_INLINE Packet8s psub (const Packet8s& a, const Packet8s& b) { return a - b; } template<> EIGEN_STRONG_INLINE Packet8us psub (const Packet8us& a, const Packet8us& b) { return a - b; } template<> EIGEN_STRONG_INLINE Packet16c psub (const Packet16c& a, const Packet16c& b) { return a - b; } template<> EIGEN_STRONG_INLINE Packet16uc psub(const Packet16uc& a, const Packet16uc& b) { return a - b; } template<> EIGEN_STRONG_INLINE Packet4f pnegate(const Packet4f& a) { return p4f_ZERO - a; } template<> EIGEN_STRONG_INLINE Packet4i pnegate(const Packet4i& a) { return p4i_ZERO - a; } template<> EIGEN_STRONG_INLINE Packet4f pconj(const Packet4f& a) { return a; } template<> EIGEN_STRONG_INLINE Packet4i pconj(const Packet4i& a) { return a; } template<> EIGEN_STRONG_INLINE Packet4f pmul (const Packet4f& a, const Packet4f& b) { return vec_madd(a,b, p4f_MZERO); } template<> EIGEN_STRONG_INLINE Packet4i pmul (const Packet4i& a, const Packet4i& b) { return a * b; } template<> EIGEN_STRONG_INLINE Packet8s pmul (const Packet8s& a, const Packet8s& b) { return vec_mul(a,b); } template<> EIGEN_STRONG_INLINE Packet8us pmul (const Packet8us& a, const Packet8us& b) { return vec_mul(a,b); } template<> EIGEN_STRONG_INLINE Packet16c pmul (const Packet16c& a, const Packet16c& b) { return vec_mul(a,b); } template<> EIGEN_STRONG_INLINE Packet16uc pmul(const Packet16uc& a, const Packet16uc& b) { return vec_mul(a,b); } template<> EIGEN_STRONG_INLINE Packet4f pdiv(const Packet4f& a, const Packet4f& b) { #ifndef __VSX__ // VSX actually provides a div instruction Packet4f t, y_0, y_1; // Altivec does not offer a divide instruction, we have to do a reciprocal approximation y_0 = vec_re(b); // Do one Newton-Raphson iteration to get the needed accuracy t = vec_nmsub(y_0, b, p4f_ONE); y_1 = vec_madd(y_0, t, y_0); return vec_madd(a, y_1, p4f_MZERO); #else return vec_div(a, b); #endif } template<> EIGEN_STRONG_INLINE Packet4i pdiv(const Packet4i& /*a*/, const Packet4i& /*b*/) { eigen_assert(false && "packet integer division are not supported by AltiVec"); return pset1(0); } // for some weird raisons, it has to be overloaded for packet of integers template<> EIGEN_STRONG_INLINE Packet4f pmadd(const Packet4f& a, const Packet4f& b, const Packet4f& c) { return vec_madd(a,b,c); } template<> EIGEN_STRONG_INLINE Packet4i pmadd(const Packet4i& a, const Packet4i& b, const Packet4i& c) { return a*b + c; } template<> EIGEN_STRONG_INLINE Packet8s pmadd(const Packet8s& a, const Packet8s& b, const Packet8s& c) { return vec_madd(a,b,c); } template<> EIGEN_STRONG_INLINE Packet8us pmadd(const Packet8us& a, const Packet8us& b, const Packet8us& c) { return vec_madd(a,b,c); } template<> EIGEN_STRONG_INLINE Packet4f pmin(const Packet4f& a, const Packet4f& b) { #ifdef __VSX__ // NOTE: about 10% slower than vec_min, but consistent with std::min and SSE regarding NaN Packet4f ret; __asm__ ("xvcmpgesp %x0,%x1,%x2\n\txxsel %x0,%x1,%x2,%x0" : "=&wa" (ret) : "wa" (a), "wa" (b)); return ret; #else return vec_min(a, b); #endif } template<> EIGEN_STRONG_INLINE Packet4i pmin(const Packet4i& a, const Packet4i& b) { return vec_min(a, b); } template<> EIGEN_STRONG_INLINE Packet8s pmin(const Packet8s& a, const Packet8s& b) { return vec_min(a, b); } template<> EIGEN_STRONG_INLINE Packet8us pmin(const Packet8us& a, const Packet8us& b) { return vec_min(a, b); } template<> EIGEN_STRONG_INLINE Packet16c pmin(const Packet16c& a, const Packet16c& b) { return vec_min(a, b); } template<> EIGEN_STRONG_INLINE Packet16uc pmin(const Packet16uc& a, const Packet16uc& b) { return vec_min(a, b); } template<> EIGEN_STRONG_INLINE Packet4f pmax(const Packet4f& a, const Packet4f& b) { #ifdef __VSX__ // NOTE: about 10% slower than vec_max, but consistent with std::max and SSE regarding NaN Packet4f ret; __asm__ ("xvcmpgtsp %x0,%x2,%x1\n\txxsel %x0,%x1,%x2,%x0" : "=&wa" (ret) : "wa" (a), "wa" (b)); return ret; #else return vec_max(a, b); #endif } template<> EIGEN_STRONG_INLINE Packet4i pmax(const Packet4i& a, const Packet4i& b) { return vec_max(a, b); } template<> EIGEN_STRONG_INLINE Packet8s pmax(const Packet8s& a, const Packet8s& b) { return vec_max(a, b); } template<> EIGEN_STRONG_INLINE Packet8us pmax(const Packet8us& a, const Packet8us& b) { return vec_max(a, b); } template<> EIGEN_STRONG_INLINE Packet16c pmax(const Packet16c& a, const Packet16c& b) { return vec_max(a, b); } template<> EIGEN_STRONG_INLINE Packet16uc pmax(const Packet16uc& a, const Packet16uc& b) { return vec_max(a, b); } template<> EIGEN_STRONG_INLINE Packet4f pcmp_le(const Packet4f& a, const Packet4f& b) { return reinterpret_cast(vec_cmple(a,b)); } template<> EIGEN_STRONG_INLINE Packet4f pcmp_lt(const Packet4f& a, const Packet4f& b) { return reinterpret_cast(vec_cmplt(a,b)); } template<> EIGEN_STRONG_INLINE Packet4f pcmp_eq(const Packet4f& a, const Packet4f& b) { return reinterpret_cast(vec_cmpeq(a,b)); } template<> EIGEN_STRONG_INLINE Packet4f pcmp_lt_or_nan(const Packet4f& a, const Packet4f& b) { Packet4f c = reinterpret_cast(vec_cmpge(a,b)); return vec_nor(c,c); } template<> EIGEN_STRONG_INLINE Packet4i pcmp_le(const Packet4i& a, const Packet4i& b) { return reinterpret_cast(vec_cmple(a,b)); } template<> EIGEN_STRONG_INLINE Packet4i pcmp_lt(const Packet4i& a, const Packet4i& b) { return reinterpret_cast(vec_cmplt(a,b)); } template<> EIGEN_STRONG_INLINE Packet4i pcmp_eq(const Packet4i& a, const Packet4i& b) { return reinterpret_cast(vec_cmpeq(a,b)); } template<> EIGEN_STRONG_INLINE Packet8s pcmp_le(const Packet8s& a, const Packet8s& b) { return reinterpret_cast(vec_cmple(a,b)); } template<> EIGEN_STRONG_INLINE Packet8s pcmp_lt(const Packet8s& a, const Packet8s& b) { return reinterpret_cast(vec_cmplt(a,b)); } template<> EIGEN_STRONG_INLINE Packet8s pcmp_eq(const Packet8s& a, const Packet8s& b) { return reinterpret_cast(vec_cmpeq(a,b)); } template<> EIGEN_STRONG_INLINE Packet8us pcmp_le(const Packet8us& a, const Packet8us& b) { return reinterpret_cast(vec_cmple(a,b)); } template<> EIGEN_STRONG_INLINE Packet8us pcmp_lt(const Packet8us& a, const Packet8us& b) { return reinterpret_cast(vec_cmplt(a,b)); } template<> EIGEN_STRONG_INLINE Packet8us pcmp_eq(const Packet8us& a, const Packet8us& b) { return reinterpret_cast(vec_cmpeq(a,b)); } template<> EIGEN_STRONG_INLINE Packet16c pcmp_le(const Packet16c& a, const Packet16c& b) { return reinterpret_cast(vec_cmple(a,b)); } template<> EIGEN_STRONG_INLINE Packet16c pcmp_lt(const Packet16c& a, const Packet16c& b) { return reinterpret_cast(vec_cmplt(a,b)); } template<> EIGEN_STRONG_INLINE Packet16c pcmp_eq(const Packet16c& a, const Packet16c& b) { return reinterpret_cast(vec_cmpeq(a,b)); } template<> EIGEN_STRONG_INLINE Packet16uc pcmp_le(const Packet16uc& a, const Packet16uc& b) { return reinterpret_cast(vec_cmple(a,b)); } template<> EIGEN_STRONG_INLINE Packet16uc pcmp_lt(const Packet16uc& a, const Packet16uc& b) { return reinterpret_cast(vec_cmplt(a,b)); } template<> EIGEN_STRONG_INLINE Packet16uc pcmp_eq(const Packet16uc& a, const Packet16uc& b) { return reinterpret_cast(vec_cmpeq(a,b)); } template<> EIGEN_STRONG_INLINE Packet4f pand(const Packet4f& a, const Packet4f& b) { return vec_and(a, b); } template<> EIGEN_STRONG_INLINE Packet4i pand(const Packet4i& a, const Packet4i& b) { return vec_and(a, b); } template<> EIGEN_STRONG_INLINE Packet4ui pand(const Packet4ui& a, const Packet4ui& b) { return vec_and(a, b); } template<> EIGEN_STRONG_INLINE Packet8us pand(const Packet8us& a, const Packet8us& b) { return vec_and(a, b); } template<> EIGEN_STRONG_INLINE Packet8bf pand(const Packet8bf& a, const Packet8bf& b) { return pand(a, b); } template<> EIGEN_STRONG_INLINE Packet4f por(const Packet4f& a, const Packet4f& b) { return vec_or(a, b); } template<> EIGEN_STRONG_INLINE Packet4i por(const Packet4i& a, const Packet4i& b) { return vec_or(a, b); } template<> EIGEN_STRONG_INLINE Packet8s por(const Packet8s& a, const Packet8s& b) { return vec_or(a, b); } template<> EIGEN_STRONG_INLINE Packet8us por(const Packet8us& a, const Packet8us& b) { return vec_or(a, b); } template<> EIGEN_STRONG_INLINE Packet8bf por(const Packet8bf& a, const Packet8bf& b) { return por(a, b); } template<> EIGEN_STRONG_INLINE Packet4f pxor(const Packet4f& a, const Packet4f& b) { return vec_xor(a, b); } template<> EIGEN_STRONG_INLINE Packet4i pxor(const Packet4i& a, const Packet4i& b) { return vec_xor(a, b); } template<> EIGEN_STRONG_INLINE Packet8bf pxor(const Packet8bf& a, const Packet8bf& b) { return pxor(a, b); } template<> EIGEN_STRONG_INLINE Packet4f pandnot(const Packet4f& a, const Packet4f& b) { return vec_andc(a, b); } template<> EIGEN_STRONG_INLINE Packet4i pandnot(const Packet4i& a, const Packet4i& b) { return vec_andc(a, b); } template<> EIGEN_STRONG_INLINE Packet4f pselect(const Packet4f& mask, const Packet4f& a, const Packet4f& b) { return vec_sel(b, a, reinterpret_cast(mask)); } template<> EIGEN_STRONG_INLINE Packet4f pround(const Packet4f& a) { Packet4f t = vec_add(reinterpret_cast(vec_or(vec_and(reinterpret_cast(a), p4ui_SIGN), p4ui_PREV0DOT5)), a); Packet4f res; #ifdef __VSX__ __asm__("xvrspiz %x0, %x1\n\t" : "=&wa" (res) : "wa" (t)); #else __asm__("vrfiz %0, %1\n\t" : "=v" (res) : "v" (t)); #endif return res; } template<> EIGEN_STRONG_INLINE Packet4f pceil(const Packet4f& a) { return vec_ceil(a); } template<> EIGEN_STRONG_INLINE Packet4f pfloor(const Packet4f& a) { return vec_floor(a); } template<> EIGEN_STRONG_INLINE Packet4f print(const Packet4f& a) { Packet4f res; __asm__("xvrspic %x0, %x1\n\t" : "=&wa" (res) : "wa" (a)); return res; } template EIGEN_STRONG_INLINE Packet ploadu_common(const __UNPACK_TYPE__(Packet)* from) { EIGEN_DEBUG_ALIGNED_LOAD #ifdef _BIG_ENDIAN Packet16uc MSQ, LSQ; Packet16uc mask; MSQ = vec_ld(0, (unsigned char *)from); // most significant quadword LSQ = vec_ld(15, (unsigned char *)from); // least significant quadword mask = vec_lvsl(0, from); // create the permute mask //TODO: Add static_cast here return (Packet) vec_perm(MSQ, LSQ, mask); // align the data #else EIGEN_DEBUG_UNALIGNED_LOAD return vec_xl(0, const_cast<__UNPACK_TYPE__(Packet)*>(from)); #endif } template<> EIGEN_STRONG_INLINE Packet4f ploadu(const float* from) { return ploadu_common(from); } template<> EIGEN_STRONG_INLINE Packet4i ploadu(const int* from) { return ploadu_common(from); } template<> EIGEN_STRONG_INLINE Packet8s ploadu(const short int* from) { return ploadu_common(from); } template<> EIGEN_STRONG_INLINE Packet8us ploadu(const unsigned short int* from) { return ploadu_common(from); } template<> EIGEN_STRONG_INLINE Packet8bf ploadu(const bfloat16* from) { return ploadu_common(reinterpret_cast(from)); } template<> EIGEN_STRONG_INLINE Packet16c ploadu(const signed char* from) { return ploadu_common(from); } template<> EIGEN_STRONG_INLINE Packet16uc ploadu(const unsigned char* from) { return ploadu_common(from); } template EIGEN_STRONG_INLINE Packet ploaddup_common(const __UNPACK_TYPE__(Packet)* from) { Packet p; if((std::ptrdiff_t(from) % 16) == 0) p = pload(from); else p = ploadu(from); return vec_perm(p, p, p16uc_DUPLICATE32_HI); } template<> EIGEN_STRONG_INLINE Packet4f ploaddup(const float* from) { return ploaddup_common(from); } template<> EIGEN_STRONG_INLINE Packet4i ploaddup(const int* from) { return ploaddup_common(from); } template<> EIGEN_STRONG_INLINE Packet8s ploaddup(const short int* from) { Packet8s p; if((std::ptrdiff_t(from) % 16) == 0) p = pload(from); else p = ploadu(from); return vec_perm(p, p, p16uc_DUPLICATE16_HI); } template<> EIGEN_STRONG_INLINE Packet8us ploaddup(const unsigned short int* from) { Packet8us p; if((std::ptrdiff_t(from) % 16) == 0) p = pload(from); else p = ploadu(from); return vec_perm(p, p, p16uc_DUPLICATE16_HI); } template<> EIGEN_STRONG_INLINE Packet8s ploadquad(const short int* from) { Packet8s p; if((std::ptrdiff_t(from) % 16) == 0) p = pload(from); else p = ploadu(from); return vec_perm(p, p, p16uc_QUADRUPLICATE16_HI); } template<> EIGEN_STRONG_INLINE Packet8us ploadquad(const unsigned short int* from) { Packet8us p; if((std::ptrdiff_t(from) % 16) == 0) p = pload(from); else p = ploadu(from); return vec_perm(p, p, p16uc_QUADRUPLICATE16_HI); } template<> EIGEN_STRONG_INLINE Packet8bf ploadquad(const bfloat16* from) { return ploadquad(reinterpret_cast(from)); } template<> EIGEN_STRONG_INLINE Packet16c ploaddup(const signed char* from) { Packet16c p; if((std::ptrdiff_t(from) % 16) == 0) p = pload(from); else p = ploadu(from); return vec_perm(p, p, p16uc_DUPLICATE8_HI); } template<> EIGEN_STRONG_INLINE Packet16uc ploaddup(const unsigned char* from) { Packet16uc p; if((std::ptrdiff_t(from) % 16) == 0) p = pload(from); else p = ploadu(from); return vec_perm(p, p, p16uc_DUPLICATE8_HI); } template EIGEN_STRONG_INLINE void pstoreu_common(__UNPACK_TYPE__(Packet)* to, const Packet& from) { EIGEN_DEBUG_UNALIGNED_STORE #ifdef _BIG_ENDIAN // Taken from http://developer.apple.com/hardwaredrivers/ve/alignment.html // Warning: not thread safe! Packet16uc MSQ, LSQ, edges; Packet16uc edgeAlign, align; MSQ = vec_ld(0, (unsigned char *)to); // most significant quadword LSQ = vec_ld(15, (unsigned char *)to); // least significant quadword edgeAlign = vec_lvsl(0, to); // permute map to extract edges edges=vec_perm(LSQ,MSQ,edgeAlign); // extract the edges align = vec_lvsr( 0, to ); // permute map to misalign data MSQ = vec_perm(edges,(Packet16uc)from,align); // misalign the data (MSQ) LSQ = vec_perm((Packet16uc)from,edges,align); // misalign the data (LSQ) vec_st( LSQ, 15, (unsigned char *)to ); // Store the LSQ part first vec_st( MSQ, 0, (unsigned char *)to ); // Store the MSQ part second #else vec_xst(from, 0, to); #endif } template<> EIGEN_STRONG_INLINE void pstoreu(float* to, const Packet4f& from) { pstoreu_common(to, from); } template<> EIGEN_STRONG_INLINE void pstoreu(int* to, const Packet4i& from) { pstoreu_common(to, from); } template<> EIGEN_STRONG_INLINE void pstoreu(short int* to, const Packet8s& from) { pstoreu_common(to, from); } template<> EIGEN_STRONG_INLINE void pstoreu(unsigned short int* to, const Packet8us& from) { pstoreu_common(to, from); } template<> EIGEN_STRONG_INLINE void pstoreu(bfloat16* to, const Packet8bf& from) { pstoreu_common(reinterpret_cast(to), from); } template<> EIGEN_STRONG_INLINE void pstoreu(signed char* to, const Packet16c& from) { pstoreu_common(to, from); } template<> EIGEN_STRONG_INLINE void pstoreu(unsigned char* to, const Packet16uc& from) { pstoreu_common(to, from); } template<> EIGEN_STRONG_INLINE void prefetch(const float* addr) { EIGEN_PPC_PREFETCH(addr); } template<> EIGEN_STRONG_INLINE void prefetch(const int* addr) { EIGEN_PPC_PREFETCH(addr); } template<> EIGEN_STRONG_INLINE float pfirst(const Packet4f& a) { EIGEN_ALIGN16 float x; vec_ste(a, 0, &x); return x; } template<> EIGEN_STRONG_INLINE int pfirst(const Packet4i& a) { EIGEN_ALIGN16 int x; vec_ste(a, 0, &x); return x; } template EIGEN_STRONG_INLINE __UNPACK_TYPE__(Packet) pfirst_common(const Packet& a) { EIGEN_ALIGN16 __UNPACK_TYPE__(Packet) x; vec_ste(a, 0, &x); return x; } template<> EIGEN_STRONG_INLINE short int pfirst(const Packet8s& a) { return pfirst_common(a); } template<> EIGEN_STRONG_INLINE unsigned short int pfirst(const Packet8us& a) { return pfirst_common(a); } template<> EIGEN_STRONG_INLINE signed char pfirst(const Packet16c& a) { return pfirst_common(a); } template<> EIGEN_STRONG_INLINE unsigned char pfirst(const Packet16uc& a) { return pfirst_common(a); } template<> EIGEN_STRONG_INLINE Packet4f preverse(const Packet4f& a) { return reinterpret_cast(vec_perm(reinterpret_cast(a), reinterpret_cast(a), p16uc_REVERSE32)); } template<> EIGEN_STRONG_INLINE Packet4i preverse(const Packet4i& a) { return reinterpret_cast(vec_perm(reinterpret_cast(a), reinterpret_cast(a), p16uc_REVERSE32)); } template<> EIGEN_STRONG_INLINE Packet8s preverse(const Packet8s& a) { return reinterpret_cast(vec_perm(reinterpret_cast(a), reinterpret_cast(a), p16uc_REVERSE16)); } template<> EIGEN_STRONG_INLINE Packet8us preverse(const Packet8us& a) { return reinterpret_cast(vec_perm(reinterpret_cast(a), reinterpret_cast(a), p16uc_REVERSE16)); } template<> EIGEN_STRONG_INLINE Packet16c preverse(const Packet16c& a) { return vec_perm(a, a, p16uc_REVERSE8); } template<> EIGEN_STRONG_INLINE Packet16uc preverse(const Packet16uc& a) { return vec_perm(a, a, p16uc_REVERSE8); } template<> EIGEN_STRONG_INLINE Packet8bf preverse(const Packet8bf& a) { return preverse(a); } template<> EIGEN_STRONG_INLINE Packet4f pabs(const Packet4f& a) { return vec_abs(a); } template<> EIGEN_STRONG_INLINE Packet4i pabs(const Packet4i& a) { return vec_abs(a); } template<> EIGEN_STRONG_INLINE Packet8s pabs(const Packet8s& a) { return vec_abs(a); } template<> EIGEN_STRONG_INLINE Packet8us pabs(const Packet8us& a) { return a; } template<> EIGEN_STRONG_INLINE Packet16c pabs(const Packet16c& a) { return vec_abs(a); } template<> EIGEN_STRONG_INLINE Packet16uc pabs(const Packet16uc& a) { return a; } template<> EIGEN_STRONG_INLINE Packet8bf pabs(const Packet8bf& a) { _EIGEN_DECLARE_CONST_FAST_Packet8us(abs_mask,0x7FFF); return pand(p8us_abs_mask, a); } template EIGEN_STRONG_INLINE Packet4i parithmetic_shift_right(const Packet4i& a) { return vec_sra(a,reinterpret_cast(pset1(N))); } template EIGEN_STRONG_INLINE Packet4i plogical_shift_right(const Packet4i& a) { return vec_sr(a,reinterpret_cast(pset1(N))); } template EIGEN_STRONG_INLINE Packet4i plogical_shift_left(const Packet4i& a) { return vec_sl(a,reinterpret_cast(pset1(N))); } template EIGEN_STRONG_INLINE Packet4f plogical_shift_left(const Packet4f& a) { const _EIGEN_DECLARE_CONST_FAST_Packet4ui(mask, N); Packet4ui r = vec_sl(reinterpret_cast(a), p4ui_mask); return reinterpret_cast(r); } template EIGEN_STRONG_INLINE Packet4f plogical_shift_right(const Packet4f& a) { const _EIGEN_DECLARE_CONST_FAST_Packet4ui(mask, N); Packet4ui r = vec_sr(reinterpret_cast(a), p4ui_mask); return reinterpret_cast(r); } template EIGEN_STRONG_INLINE Packet4ui plogical_shift_right(const Packet4ui& a) { const _EIGEN_DECLARE_CONST_FAST_Packet4ui(mask, N); return vec_sr(a, p4ui_mask); } template EIGEN_STRONG_INLINE Packet4ui plogical_shift_left(const Packet4ui& a) { const _EIGEN_DECLARE_CONST_FAST_Packet4ui(mask, N); return vec_sl(a, p4ui_mask); } template EIGEN_STRONG_INLINE Packet8us plogical_shift_left(const Packet8us& a) { const _EIGEN_DECLARE_CONST_FAST_Packet8us(mask, N); return vec_sl(a, p8us_mask); } template EIGEN_STRONG_INLINE Packet8us plogical_shift_right(const Packet8us& a) { const _EIGEN_DECLARE_CONST_FAST_Packet8us(mask, N); return vec_sr(a, p8us_mask); } EIGEN_STRONG_INLINE Packet4f Bf16ToF32Even(const Packet8bf& bf){ return plogical_shift_left<16>(reinterpret_cast(bf.m_val)); } EIGEN_STRONG_INLINE Packet4f Bf16ToF32Odd(const Packet8bf& bf){ const _EIGEN_DECLARE_CONST_FAST_Packet4ui(high_mask, 0xFFFF0000); return pand( reinterpret_cast(bf.m_val), reinterpret_cast(p4ui_high_mask) ); } // Simple interleaving of bool masks, prevents true values from being // converted to NaNs. EIGEN_STRONG_INLINE Packet8bf F32ToBf16Bool(Packet4f even, Packet4f odd) { const _EIGEN_DECLARE_CONST_FAST_Packet4ui(high_mask, 0xFFFF0000); Packet4f bf_odd, bf_even; bf_odd = pand(reinterpret_cast(p4ui_high_mask), odd); bf_even = plogical_shift_right<16>(even); return reinterpret_cast(por(bf_even, bf_odd)); } EIGEN_STRONG_INLINE Packet8bf F32ToBf16(Packet4f p4f){ Packet4ui input = reinterpret_cast(p4f); Packet4ui lsb = plogical_shift_right<16>(input); lsb = pand(lsb, reinterpret_cast(p4i_ONE)); _EIGEN_DECLARE_CONST_FAST_Packet4ui(BIAS,0x7FFFu); Packet4ui rounding_bias = padd(lsb, p4ui_BIAS); input = padd(input, rounding_bias); //Test NaN and Subnormal - Begin const _EIGEN_DECLARE_CONST_FAST_Packet4ui(exp_mask, 0x7F800000); Packet4ui exp = pand(p4ui_exp_mask, reinterpret_cast(p4f)); const _EIGEN_DECLARE_CONST_FAST_Packet4ui(mantissa_mask, 0x7FFFFF); Packet4ui mantissa = pand(p4ui_mantissa_mask, reinterpret_cast(p4f)); const _EIGEN_DECLARE_CONST_FAST_Packet4ui(max_exp, 0x7F800000); Packet4bi is_max_exp = vec_cmpeq(exp, p4ui_max_exp); Packet4bi is_zero_exp = vec_cmpeq(exp, reinterpret_cast(p4i_ZERO)); Packet4bi is_mant_zero = vec_cmpeq(mantissa, reinterpret_cast(p4i_ZERO)); Packet4ui nan_selector = pandnot( reinterpret_cast(is_max_exp), reinterpret_cast(is_mant_zero) ); Packet4ui subnormal_selector = pandnot( reinterpret_cast(is_zero_exp), reinterpret_cast(is_mant_zero) ); const _EIGEN_DECLARE_CONST_FAST_Packet4ui(nan, 0x7FC00000); input = vec_sel(input, p4ui_nan, nan_selector); input = vec_sel(input, reinterpret_cast(p4f), subnormal_selector); //Test NaN and Subnormal - End input = plogical_shift_right<16>(input); return reinterpret_cast(input); } EIGEN_STRONG_INLINE Packet8bf F32ToBf16(Packet4f even, Packet4f odd){ Packet4f bf_odd, bf_even; bf_odd = reinterpret_cast(F32ToBf16(odd).m_val); bf_odd = plogical_shift_left<16>(bf_odd); bf_even = reinterpret_cast(F32ToBf16(even).m_val); return reinterpret_cast(por(bf_even, bf_odd)); } #define BF16_TO_F32_UNARY_OP_WRAPPER(OP, A) \ Packet4f a_even = Bf16ToF32Even(A);\ Packet4f a_odd = Bf16ToF32Odd(A);\ Packet4f op_even = OP(a_even);\ Packet4f op_odd = OP(a_odd);\ return F32ToBf16(op_even, op_odd);\ #define BF16_TO_F32_BINARY_OP_WRAPPER(OP, A, B) \ Packet4f a_even = Bf16ToF32Even(A);\ Packet4f a_odd = Bf16ToF32Odd(A);\ Packet4f b_even = Bf16ToF32Even(B);\ Packet4f b_odd = Bf16ToF32Odd(B);\ Packet4f op_even = OP(a_even, b_even);\ Packet4f op_odd = OP(a_odd, b_odd);\ return F32ToBf16(op_even, op_odd);\ #define BF16_TO_F32_BINARY_OP_WRAPPER_BOOL(OP, A, B) \ Packet4f a_even = Bf16ToF32Even(A);\ Packet4f a_odd = Bf16ToF32Odd(A);\ Packet4f b_even = Bf16ToF32Even(B);\ Packet4f b_odd = Bf16ToF32Odd(B);\ Packet4f op_even = OP(a_even, b_even);\ Packet4f op_odd = OP(a_odd, b_odd);\ return F32ToBf16Bool(op_even, op_odd);\ template<> EIGEN_STRONG_INLINE Packet8bf padd(const Packet8bf& a, const Packet8bf& b) { BF16_TO_F32_BINARY_OP_WRAPPER(padd, a, b); } template<> EIGEN_STRONG_INLINE Packet8bf pmul(const Packet8bf& a, const Packet8bf& b) { BF16_TO_F32_BINARY_OP_WRAPPER(pmul, a, b); } template<> EIGEN_STRONG_INLINE Packet8bf pdiv(const Packet8bf& a, const Packet8bf& b) { BF16_TO_F32_BINARY_OP_WRAPPER(pdiv, a, b); } template<> EIGEN_STRONG_INLINE Packet8bf pnegate(const Packet8bf& a) { BF16_TO_F32_UNARY_OP_WRAPPER(pnegate, a); } template<> EIGEN_STRONG_INLINE Packet8bf psub(const Packet8bf& a, const Packet8bf& b) { BF16_TO_F32_BINARY_OP_WRAPPER(psub, a, b); } template<> EIGEN_STRONG_INLINE Packet8bf psqrt (const Packet8bf& a){ BF16_TO_F32_UNARY_OP_WRAPPER(vec_sqrt, a); } template<> EIGEN_STRONG_INLINE Packet8bf prsqrt (const Packet8bf& a){ BF16_TO_F32_UNARY_OP_WRAPPER(prsqrt, a); } template<> EIGEN_STRONG_INLINE Packet8bf pexp (const Packet8bf& a){ BF16_TO_F32_UNARY_OP_WRAPPER(pexp_float, a); } template<> EIGEN_STRONG_INLINE Packet4f pldexp(const Packet4f& a, const Packet4f& exponent) { return pldexp_generic(a,exponent); } template<> EIGEN_STRONG_INLINE Packet8bf pldexp (const Packet8bf& a, const Packet8bf& exponent){ BF16_TO_F32_BINARY_OP_WRAPPER(pldexp, a, exponent); } template<> EIGEN_STRONG_INLINE Packet4f pfrexp(const Packet4f& a, Packet4f& exponent) { return pfrexp_generic(a,exponent); } template<> EIGEN_STRONG_INLINE Packet8bf pfrexp (const Packet8bf& a, Packet8bf& e){ Packet4f a_even = Bf16ToF32Even(a); Packet4f a_odd = Bf16ToF32Odd(a); Packet4f e_even; Packet4f e_odd; Packet4f op_even = pfrexp(a_even, e_even); Packet4f op_odd = pfrexp(a_odd, e_odd); e = F32ToBf16(e_even, e_odd); return F32ToBf16(op_even, op_odd); } template<> EIGEN_STRONG_INLINE Packet8bf psin (const Packet8bf& a){ BF16_TO_F32_UNARY_OP_WRAPPER(psin_float, a); } template<> EIGEN_STRONG_INLINE Packet8bf pcos (const Packet8bf& a){ BF16_TO_F32_UNARY_OP_WRAPPER(pcos_float, a); } template<> EIGEN_STRONG_INLINE Packet8bf plog (const Packet8bf& a){ BF16_TO_F32_UNARY_OP_WRAPPER(plog_float, a); } template<> EIGEN_STRONG_INLINE Packet8bf pfloor (const Packet8bf& a){ BF16_TO_F32_UNARY_OP_WRAPPER(pfloor, a); } template<> EIGEN_STRONG_INLINE Packet8bf pceil (const Packet8bf& a){ BF16_TO_F32_UNARY_OP_WRAPPER(pceil, a); } template<> EIGEN_STRONG_INLINE Packet8bf pround (const Packet8bf& a){ BF16_TO_F32_UNARY_OP_WRAPPER(pround, a); } template<> EIGEN_STRONG_INLINE Packet8bf print (const Packet8bf& a){ BF16_TO_F32_UNARY_OP_WRAPPER(print, a); } template<> EIGEN_STRONG_INLINE Packet8bf pmadd(const Packet8bf& a, const Packet8bf& b, const Packet8bf& c) { Packet4f a_even = Bf16ToF32Even(a); Packet4f a_odd = Bf16ToF32Odd(a); Packet4f b_even = Bf16ToF32Even(b); Packet4f b_odd = Bf16ToF32Odd(b); Packet4f c_even = Bf16ToF32Even(c); Packet4f c_odd = Bf16ToF32Odd(c); Packet4f pmadd_even = pmadd(a_even, b_even, c_even); Packet4f pmadd_odd = pmadd(a_odd, b_odd, c_odd); return F32ToBf16(pmadd_even, pmadd_odd); } template<> EIGEN_STRONG_INLINE Packet8bf pmin(const Packet8bf& a, const Packet8bf& b) { BF16_TO_F32_BINARY_OP_WRAPPER(pmin, a, b); } template<> EIGEN_STRONG_INLINE Packet8bf pmax(const Packet8bf& a, const Packet8bf& b) { BF16_TO_F32_BINARY_OP_WRAPPER(pmax, a, b); } template<> EIGEN_STRONG_INLINE Packet8bf pcmp_lt(const Packet8bf& a, const Packet8bf& b) { BF16_TO_F32_BINARY_OP_WRAPPER_BOOL(pcmp_lt, a, b); } template<> EIGEN_STRONG_INLINE Packet8bf pcmp_lt_or_nan(const Packet8bf& a, const Packet8bf& b) { BF16_TO_F32_BINARY_OP_WRAPPER_BOOL(pcmp_lt_or_nan, a, b); } template<> EIGEN_STRONG_INLINE Packet8bf pcmp_le(const Packet8bf& a, const Packet8bf& b) { BF16_TO_F32_BINARY_OP_WRAPPER_BOOL(pcmp_le, a, b); } template<> EIGEN_STRONG_INLINE Packet8bf pcmp_eq(const Packet8bf& a, const Packet8bf& b) { BF16_TO_F32_BINARY_OP_WRAPPER_BOOL(pcmp_eq, a, b); } template<> EIGEN_STRONG_INLINE bfloat16 pfirst(const Packet8bf& a) { return Eigen::bfloat16_impl::raw_uint16_to_bfloat16((pfirst(a))); } template<> EIGEN_STRONG_INLINE Packet8bf ploaddup(const bfloat16* from) { return ploaddup(reinterpret_cast(from)); } template<> EIGEN_STRONG_INLINE Packet8bf plset(const bfloat16& a) { bfloat16 countdown[8] = { bfloat16(0), bfloat16(1), bfloat16(2), bfloat16(3), bfloat16(4), bfloat16(5), bfloat16(6), bfloat16(7) }; return padd(pset1(a), pload(countdown)); } template<> EIGEN_STRONG_INLINE float predux(const Packet4f& a) { Packet4f b, sum; b = vec_sld(a, a, 8); sum = a + b; b = vec_sld(sum, sum, 4); sum += b; return pfirst(sum); } template<> EIGEN_STRONG_INLINE int predux(const Packet4i& a) { Packet4i sum; sum = vec_sums(a, p4i_ZERO); #ifdef _BIG_ENDIAN sum = vec_sld(sum, p4i_ZERO, 12); #else sum = vec_sld(p4i_ZERO, sum, 4); #endif return pfirst(sum); } template<> EIGEN_STRONG_INLINE bfloat16 predux(const Packet8bf& a) { float redux_even = predux(Bf16ToF32Even(a)); float redux_odd = predux(Bf16ToF32Odd(a)); float f32_result = redux_even + redux_odd; return bfloat16(f32_result); } template EIGEN_STRONG_INLINE __UNPACK_TYPE__(Packet) predux_size8(const Packet& a) { union{ Packet v; __UNPACK_TYPE__(Packet) n[8]; } vt; vt.v = a; EIGEN_ALIGN16 int first_loader[4] = { vt.n[0], vt.n[1], vt.n[2], vt.n[3] }; EIGEN_ALIGN16 int second_loader[4] = { vt.n[4], vt.n[5], vt.n[6], vt.n[7] }; Packet4i first_half = pload(first_loader); Packet4i second_half = pload(second_loader); return static_cast<__UNPACK_TYPE__(Packet)>(predux(first_half) + predux(second_half)); } template<> EIGEN_STRONG_INLINE short int predux(const Packet8s& a) { return predux_size8(a); } template<> EIGEN_STRONG_INLINE unsigned short int predux(const Packet8us& a) { return predux_size8(a); } template EIGEN_STRONG_INLINE __UNPACK_TYPE__(Packet) predux_size16(const Packet& a) { union{ Packet v; __UNPACK_TYPE__(Packet) n[16]; } vt; vt.v = a; EIGEN_ALIGN16 int first_loader[4] = { vt.n[0], vt.n[1], vt.n[2], vt.n[3] }; EIGEN_ALIGN16 int second_loader[4] = { vt.n[4], vt.n[5], vt.n[6], vt.n[7] }; EIGEN_ALIGN16 int third_loader[4] = { vt.n[8], vt.n[9], vt.n[10], vt.n[11] }; EIGEN_ALIGN16 int fourth_loader[4] = { vt.n[12], vt.n[13], vt.n[14], vt.n[15] }; Packet4i first_quarter = pload(first_loader); Packet4i second_quarter = pload(second_loader); Packet4i third_quarter = pload(third_loader); Packet4i fourth_quarter = pload(fourth_loader); return static_cast<__UNPACK_TYPE__(Packet)>(predux(first_quarter) + predux(second_quarter) + predux(third_quarter) + predux(fourth_quarter)); } template<> EIGEN_STRONG_INLINE signed char predux(const Packet16c& a) { return predux_size16(a); } template<> EIGEN_STRONG_INLINE unsigned char predux(const Packet16uc& a) { return predux_size16(a); } // Other reduction functions: // mul template<> EIGEN_STRONG_INLINE float predux_mul(const Packet4f& a) { Packet4f prod; prod = pmul(a, vec_sld(a, a, 8)); return pfirst(pmul(prod, vec_sld(prod, prod, 4))); } template<> EIGEN_STRONG_INLINE int predux_mul(const Packet4i& a) { EIGEN_ALIGN16 int aux[4]; pstore(aux, a); return aux[0] * aux[1] * aux[2] * aux[3]; } template<> EIGEN_STRONG_INLINE short int predux_mul(const Packet8s& a) { Packet8s pair, quad, octo; pair = vec_mul(a, vec_sld(a, a, 8)); quad = vec_mul(pair, vec_sld(pair, pair, 4)); octo = vec_mul(quad, vec_sld(quad, quad, 2)); return pfirst(octo); } template<> EIGEN_STRONG_INLINE unsigned short int predux_mul(const Packet8us& a) { Packet8us pair, quad, octo; pair = vec_mul(a, vec_sld(a, a, 8)); quad = vec_mul(pair, vec_sld(pair, pair, 4)); octo = vec_mul(quad, vec_sld(quad, quad, 2)); return pfirst(octo); } template<> EIGEN_STRONG_INLINE bfloat16 predux_mul(const Packet8bf& a) { float redux_even = predux_mul(Bf16ToF32Even(a)); float redux_odd = predux_mul(Bf16ToF32Odd(a)); float f32_result = redux_even * redux_odd; return bfloat16(f32_result); } template<> EIGEN_STRONG_INLINE signed char predux_mul(const Packet16c& a) { Packet16c pair, quad, octo, result; pair = vec_mul(a, vec_sld(a, a, 8)); quad = vec_mul(pair, vec_sld(pair, pair, 4)); octo = vec_mul(quad, vec_sld(quad, quad, 2)); result = vec_mul(octo, vec_sld(octo, octo, 1)); return pfirst(result); } template<> EIGEN_STRONG_INLINE unsigned char predux_mul(const Packet16uc& a) { Packet16uc pair, quad, octo, result; pair = vec_mul(a, vec_sld(a, a, 8)); quad = vec_mul(pair, vec_sld(pair, pair, 4)); octo = vec_mul(quad, vec_sld(quad, quad, 2)); result = vec_mul(octo, vec_sld(octo, octo, 1)); return pfirst(result); } // min template EIGEN_STRONG_INLINE __UNPACK_TYPE__(Packet) predux_min4(const Packet& a) { Packet b, res; b = vec_min(a, vec_sld(a, a, 8)); res = vec_min(b, vec_sld(b, b, 4)); return pfirst(res); } template<> EIGEN_STRONG_INLINE float predux_min(const Packet4f& a) { return predux_min4(a); } template<> EIGEN_STRONG_INLINE int predux_min(const Packet4i& a) { return predux_min4(a); } template<> EIGEN_STRONG_INLINE bfloat16 predux_min(const Packet8bf& a) { float redux_even = predux_min(Bf16ToF32Even(a)); float redux_odd = predux_min(Bf16ToF32Odd(a)); float f32_result = (std::min)(redux_even, redux_odd); return bfloat16(f32_result); } template<> EIGEN_STRONG_INLINE short int predux_min(const Packet8s& a) { Packet8s pair, quad, octo; //pair = { Min(a0,a4), Min(a1,a5), Min(a2,a6), Min(a3,a7) } pair = vec_min(a, vec_sld(a, a, 8)); //quad = { Min(a0, a4, a2, a6), Min(a1, a5, a3, a7) } quad = vec_min(pair, vec_sld(pair, pair, 4)); //octo = { Min(a0, a4, a2, a6, a1, a5, a3, a7) } octo = vec_min(quad, vec_sld(quad, quad, 2)); return pfirst(octo); } template<> EIGEN_STRONG_INLINE unsigned short int predux_min(const Packet8us& a) { Packet8us pair, quad, octo; //pair = { Min(a0,a4), Min(a1,a5), Min(a2,a6), Min(a3,a7) } pair = vec_min(a, vec_sld(a, a, 8)); //quad = { Min(a0, a4, a2, a6), Min(a1, a5, a3, a7) } quad = vec_min(pair, vec_sld(pair, pair, 4)); //octo = { Min(a0, a4, a2, a6, a1, a5, a3, a7) } octo = vec_min(quad, vec_sld(quad, quad, 2)); return pfirst(octo); } template<> EIGEN_STRONG_INLINE signed char predux_min(const Packet16c& a) { Packet16c pair, quad, octo, result; pair = vec_min(a, vec_sld(a, a, 8)); quad = vec_min(pair, vec_sld(pair, pair, 4)); octo = vec_min(quad, vec_sld(quad, quad, 2)); result = vec_min(octo, vec_sld(octo, octo, 1)); return pfirst(result); } template<> EIGEN_STRONG_INLINE unsigned char predux_min(const Packet16uc& a) { Packet16uc pair, quad, octo, result; pair = vec_min(a, vec_sld(a, a, 8)); quad = vec_min(pair, vec_sld(pair, pair, 4)); octo = vec_min(quad, vec_sld(quad, quad, 2)); result = vec_min(octo, vec_sld(octo, octo, 1)); return pfirst(result); } // max template EIGEN_STRONG_INLINE __UNPACK_TYPE__(Packet) predux_max4(const Packet& a) { Packet b, res; b = vec_max(a, vec_sld(a, a, 8)); res = vec_max(b, vec_sld(b, b, 4)); return pfirst(res); } template<> EIGEN_STRONG_INLINE float predux_max(const Packet4f& a) { return predux_max4(a); } template<> EIGEN_STRONG_INLINE int predux_max(const Packet4i& a) { return predux_max4(a); } template<> EIGEN_STRONG_INLINE bfloat16 predux_max(const Packet8bf& a) { float redux_even = predux_max(Bf16ToF32Even(a)); float redux_odd = predux_max(Bf16ToF32Odd(a)); float f32_result = (std::max)(redux_even, redux_odd); return bfloat16(f32_result); } template<> EIGEN_STRONG_INLINE short int predux_max(const Packet8s& a) { Packet8s pair, quad, octo; //pair = { Max(a0,a4), Max(a1,a5), Max(a2,a6), Max(a3,a7) } pair = vec_max(a, vec_sld(a, a, 8)); //quad = { Max(a0, a4, a2, a6), Max(a1, a5, a3, a7) } quad = vec_max(pair, vec_sld(pair, pair, 4)); //octo = { Max(a0, a4, a2, a6, a1, a5, a3, a7) } octo = vec_max(quad, vec_sld(quad, quad, 2)); return pfirst(octo); } template<> EIGEN_STRONG_INLINE unsigned short int predux_max(const Packet8us& a) { Packet8us pair, quad, octo; //pair = { Max(a0,a4), Max(a1,a5), Max(a2,a6), Max(a3,a7) } pair = vec_max(a, vec_sld(a, a, 8)); //quad = { Max(a0, a4, a2, a6), Max(a1, a5, a3, a7) } quad = vec_max(pair, vec_sld(pair, pair, 4)); //octo = { Max(a0, a4, a2, a6, a1, a5, a3, a7) } octo = vec_max(quad, vec_sld(quad, quad, 2)); return pfirst(octo); } template<> EIGEN_STRONG_INLINE signed char predux_max(const Packet16c& a) { Packet16c pair, quad, octo, result; pair = vec_max(a, vec_sld(a, a, 8)); quad = vec_max(pair, vec_sld(pair, pair, 4)); octo = vec_max(quad, vec_sld(quad, quad, 2)); result = vec_max(octo, vec_sld(octo, octo, 1)); return pfirst(result); } template<> EIGEN_STRONG_INLINE unsigned char predux_max(const Packet16uc& a) { Packet16uc pair, quad, octo, result; pair = vec_max(a, vec_sld(a, a, 8)); quad = vec_max(pair, vec_sld(pair, pair, 4)); octo = vec_max(quad, vec_sld(quad, quad, 2)); result = vec_max(octo, vec_sld(octo, octo, 1)); return pfirst(result); } template<> EIGEN_STRONG_INLINE bool predux_any(const Packet4f& x) { return vec_any_ne(x, pzero(x)); } template EIGEN_DEVICE_FUNC inline void ptranpose_common(PacketBlock& kernel){ T t0, t1, t2, t3; t0 = vec_mergeh(kernel.packet[0], kernel.packet[2]); t1 = vec_mergel(kernel.packet[0], kernel.packet[2]); t2 = vec_mergeh(kernel.packet[1], kernel.packet[3]); t3 = vec_mergel(kernel.packet[1], kernel.packet[3]); kernel.packet[0] = vec_mergeh(t0, t2); kernel.packet[1] = vec_mergel(t0, t2); kernel.packet[2] = vec_mergeh(t1, t3); kernel.packet[3] = vec_mergel(t1, t3); } EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { ptranpose_common(kernel); } EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { ptranpose_common(kernel); } EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { Packet8s t0, t1, t2, t3; t0 = vec_mergeh(kernel.packet[0], kernel.packet[2]); t1 = vec_mergel(kernel.packet[0], kernel.packet[2]); t2 = vec_mergeh(kernel.packet[1], kernel.packet[3]); t3 = vec_mergel(kernel.packet[1], kernel.packet[3]); kernel.packet[0] = vec_mergeh(t0, t2); kernel.packet[1] = vec_mergel(t0, t2); kernel.packet[2] = vec_mergeh(t1, t3); kernel.packet[3] = vec_mergel(t1, t3); } EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { Packet8us t0, t1, t2, t3; t0 = vec_mergeh(kernel.packet[0], kernel.packet[2]); t1 = vec_mergel(kernel.packet[0], kernel.packet[2]); t2 = vec_mergeh(kernel.packet[1], kernel.packet[3]); t3 = vec_mergel(kernel.packet[1], kernel.packet[3]); kernel.packet[0] = vec_mergeh(t0, t2); kernel.packet[1] = vec_mergel(t0, t2); kernel.packet[2] = vec_mergeh(t1, t3); kernel.packet[3] = vec_mergel(t1, t3); } EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { Packet8us t0, t1, t2, t3; t0 = vec_mergeh(kernel.packet[0].m_val, kernel.packet[2].m_val); t1 = vec_mergel(kernel.packet[0].m_val, kernel.packet[2].m_val); t2 = vec_mergeh(kernel.packet[1].m_val, kernel.packet[3].m_val); t3 = vec_mergel(kernel.packet[1].m_val, kernel.packet[3].m_val); kernel.packet[0] = vec_mergeh(t0, t2); kernel.packet[1] = vec_mergel(t0, t2); kernel.packet[2] = vec_mergeh(t1, t3); kernel.packet[3] = vec_mergel(t1, t3); } EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { Packet16c t0, t1, t2, t3; t0 = vec_mergeh(kernel.packet[0], kernel.packet[2]); t1 = vec_mergel(kernel.packet[0], kernel.packet[2]); t2 = vec_mergeh(kernel.packet[1], kernel.packet[3]); t3 = vec_mergel(kernel.packet[1], kernel.packet[3]); kernel.packet[0] = vec_mergeh(t0, t2); kernel.packet[1] = vec_mergel(t0, t2); kernel.packet[2] = vec_mergeh(t1, t3); kernel.packet[3] = vec_mergel(t1, t3); } EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { Packet16uc t0, t1, t2, t3; t0 = vec_mergeh(kernel.packet[0], kernel.packet[2]); t1 = vec_mergel(kernel.packet[0], kernel.packet[2]); t2 = vec_mergeh(kernel.packet[1], kernel.packet[3]); t3 = vec_mergel(kernel.packet[1], kernel.packet[3]); kernel.packet[0] = vec_mergeh(t0, t2); kernel.packet[1] = vec_mergel(t0, t2); kernel.packet[2] = vec_mergeh(t1, t3); kernel.packet[3] = vec_mergel(t1, t3); } EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { Packet8s v[8], sum[8]; v[0] = vec_mergeh(kernel.packet[0], kernel.packet[4]); v[1] = vec_mergel(kernel.packet[0], kernel.packet[4]); v[2] = vec_mergeh(kernel.packet[1], kernel.packet[5]); v[3] = vec_mergel(kernel.packet[1], kernel.packet[5]); v[4] = vec_mergeh(kernel.packet[2], kernel.packet[6]); v[5] = vec_mergel(kernel.packet[2], kernel.packet[6]); v[6] = vec_mergeh(kernel.packet[3], kernel.packet[7]); v[7] = vec_mergel(kernel.packet[3], kernel.packet[7]); sum[0] = vec_mergeh(v[0], v[4]); sum[1] = vec_mergel(v[0], v[4]); sum[2] = vec_mergeh(v[1], v[5]); sum[3] = vec_mergel(v[1], v[5]); sum[4] = vec_mergeh(v[2], v[6]); sum[5] = vec_mergel(v[2], v[6]); sum[6] = vec_mergeh(v[3], v[7]); sum[7] = vec_mergel(v[3], v[7]); kernel.packet[0] = vec_mergeh(sum[0], sum[4]); kernel.packet[1] = vec_mergel(sum[0], sum[4]); kernel.packet[2] = vec_mergeh(sum[1], sum[5]); kernel.packet[3] = vec_mergel(sum[1], sum[5]); kernel.packet[4] = vec_mergeh(sum[2], sum[6]); kernel.packet[5] = vec_mergel(sum[2], sum[6]); kernel.packet[6] = vec_mergeh(sum[3], sum[7]); kernel.packet[7] = vec_mergel(sum[3], sum[7]); } EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { Packet8us v[8], sum[8]; v[0] = vec_mergeh(kernel.packet[0], kernel.packet[4]); v[1] = vec_mergel(kernel.packet[0], kernel.packet[4]); v[2] = vec_mergeh(kernel.packet[1], kernel.packet[5]); v[3] = vec_mergel(kernel.packet[1], kernel.packet[5]); v[4] = vec_mergeh(kernel.packet[2], kernel.packet[6]); v[5] = vec_mergel(kernel.packet[2], kernel.packet[6]); v[6] = vec_mergeh(kernel.packet[3], kernel.packet[7]); v[7] = vec_mergel(kernel.packet[3], kernel.packet[7]); sum[0] = vec_mergeh(v[0], v[4]); sum[1] = vec_mergel(v[0], v[4]); sum[2] = vec_mergeh(v[1], v[5]); sum[3] = vec_mergel(v[1], v[5]); sum[4] = vec_mergeh(v[2], v[6]); sum[5] = vec_mergel(v[2], v[6]); sum[6] = vec_mergeh(v[3], v[7]); sum[7] = vec_mergel(v[3], v[7]); kernel.packet[0] = vec_mergeh(sum[0], sum[4]); kernel.packet[1] = vec_mergel(sum[0], sum[4]); kernel.packet[2] = vec_mergeh(sum[1], sum[5]); kernel.packet[3] = vec_mergel(sum[1], sum[5]); kernel.packet[4] = vec_mergeh(sum[2], sum[6]); kernel.packet[5] = vec_mergel(sum[2], sum[6]); kernel.packet[6] = vec_mergeh(sum[3], sum[7]); kernel.packet[7] = vec_mergel(sum[3], sum[7]); } EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { Packet8bf v[8], sum[8]; v[0] = vec_mergeh(kernel.packet[0].m_val, kernel.packet[4].m_val); v[1] = vec_mergel(kernel.packet[0].m_val, kernel.packet[4].m_val); v[2] = vec_mergeh(kernel.packet[1].m_val, kernel.packet[5].m_val); v[3] = vec_mergel(kernel.packet[1].m_val, kernel.packet[5].m_val); v[4] = vec_mergeh(kernel.packet[2].m_val, kernel.packet[6].m_val); v[5] = vec_mergel(kernel.packet[2].m_val, kernel.packet[6].m_val); v[6] = vec_mergeh(kernel.packet[3].m_val, kernel.packet[7].m_val); v[7] = vec_mergel(kernel.packet[3].m_val, kernel.packet[7].m_val); sum[0] = vec_mergeh(v[0].m_val, v[4].m_val); sum[1] = vec_mergel(v[0].m_val, v[4].m_val); sum[2] = vec_mergeh(v[1].m_val, v[5].m_val); sum[3] = vec_mergel(v[1].m_val, v[5].m_val); sum[4] = vec_mergeh(v[2].m_val, v[6].m_val); sum[5] = vec_mergel(v[2].m_val, v[6].m_val); sum[6] = vec_mergeh(v[3].m_val, v[7].m_val); sum[7] = vec_mergel(v[3].m_val, v[7].m_val); kernel.packet[0] = vec_mergeh(sum[0].m_val, sum[4].m_val); kernel.packet[1] = vec_mergel(sum[0].m_val, sum[4].m_val); kernel.packet[2] = vec_mergeh(sum[1].m_val, sum[5].m_val); kernel.packet[3] = vec_mergel(sum[1].m_val, sum[5].m_val); kernel.packet[4] = vec_mergeh(sum[2].m_val, sum[6].m_val); kernel.packet[5] = vec_mergel(sum[2].m_val, sum[6].m_val); kernel.packet[6] = vec_mergeh(sum[3].m_val, sum[7].m_val); kernel.packet[7] = vec_mergel(sum[3].m_val, sum[7].m_val); } EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { Packet16c step1[16], step2[16], step3[16]; step1[0] = vec_mergeh(kernel.packet[0], kernel.packet[8]); step1[1] = vec_mergel(kernel.packet[0], kernel.packet[8]); step1[2] = vec_mergeh(kernel.packet[1], kernel.packet[9]); step1[3] = vec_mergel(kernel.packet[1], kernel.packet[9]); step1[4] = vec_mergeh(kernel.packet[2], kernel.packet[10]); step1[5] = vec_mergel(kernel.packet[2], kernel.packet[10]); step1[6] = vec_mergeh(kernel.packet[3], kernel.packet[11]); step1[7] = vec_mergel(kernel.packet[3], kernel.packet[11]); step1[8] = vec_mergeh(kernel.packet[4], kernel.packet[12]); step1[9] = vec_mergel(kernel.packet[4], kernel.packet[12]); step1[10] = vec_mergeh(kernel.packet[5], kernel.packet[13]); step1[11] = vec_mergel(kernel.packet[5], kernel.packet[13]); step1[12] = vec_mergeh(kernel.packet[6], kernel.packet[14]); step1[13] = vec_mergel(kernel.packet[6], kernel.packet[14]); step1[14] = vec_mergeh(kernel.packet[7], kernel.packet[15]); step1[15] = vec_mergel(kernel.packet[7], kernel.packet[15]); step2[0] = vec_mergeh(step1[0], step1[8]); step2[1] = vec_mergel(step1[0], step1[8]); step2[2] = vec_mergeh(step1[1], step1[9]); step2[3] = vec_mergel(step1[1], step1[9]); step2[4] = vec_mergeh(step1[2], step1[10]); step2[5] = vec_mergel(step1[2], step1[10]); step2[6] = vec_mergeh(step1[3], step1[11]); step2[7] = vec_mergel(step1[3], step1[11]); step2[8] = vec_mergeh(step1[4], step1[12]); step2[9] = vec_mergel(step1[4], step1[12]); step2[10] = vec_mergeh(step1[5], step1[13]); step2[11] = vec_mergel(step1[5], step1[13]); step2[12] = vec_mergeh(step1[6], step1[14]); step2[13] = vec_mergel(step1[6], step1[14]); step2[14] = vec_mergeh(step1[7], step1[15]); step2[15] = vec_mergel(step1[7], step1[15]); step3[0] = vec_mergeh(step2[0], step2[8]); step3[1] = vec_mergel(step2[0], step2[8]); step3[2] = vec_mergeh(step2[1], step2[9]); step3[3] = vec_mergel(step2[1], step2[9]); step3[4] = vec_mergeh(step2[2], step2[10]); step3[5] = vec_mergel(step2[2], step2[10]); step3[6] = vec_mergeh(step2[3], step2[11]); step3[7] = vec_mergel(step2[3], step2[11]); step3[8] = vec_mergeh(step2[4], step2[12]); step3[9] = vec_mergel(step2[4], step2[12]); step3[10] = vec_mergeh(step2[5], step2[13]); step3[11] = vec_mergel(step2[5], step2[13]); step3[12] = vec_mergeh(step2[6], step2[14]); step3[13] = vec_mergel(step2[6], step2[14]); step3[14] = vec_mergeh(step2[7], step2[15]); step3[15] = vec_mergel(step2[7], step2[15]); kernel.packet[0] = vec_mergeh(step3[0], step3[8]); kernel.packet[1] = vec_mergel(step3[0], step3[8]); kernel.packet[2] = vec_mergeh(step3[1], step3[9]); kernel.packet[3] = vec_mergel(step3[1], step3[9]); kernel.packet[4] = vec_mergeh(step3[2], step3[10]); kernel.packet[5] = vec_mergel(step3[2], step3[10]); kernel.packet[6] = vec_mergeh(step3[3], step3[11]); kernel.packet[7] = vec_mergel(step3[3], step3[11]); kernel.packet[8] = vec_mergeh(step3[4], step3[12]); kernel.packet[9] = vec_mergel(step3[4], step3[12]); kernel.packet[10] = vec_mergeh(step3[5], step3[13]); kernel.packet[11] = vec_mergel(step3[5], step3[13]); kernel.packet[12] = vec_mergeh(step3[6], step3[14]); kernel.packet[13] = vec_mergel(step3[6], step3[14]); kernel.packet[14] = vec_mergeh(step3[7], step3[15]); kernel.packet[15] = vec_mergel(step3[7], step3[15]); } EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { Packet16uc step1[16], step2[16], step3[16]; step1[0] = vec_mergeh(kernel.packet[0], kernel.packet[8]); step1[1] = vec_mergel(kernel.packet[0], kernel.packet[8]); step1[2] = vec_mergeh(kernel.packet[1], kernel.packet[9]); step1[3] = vec_mergel(kernel.packet[1], kernel.packet[9]); step1[4] = vec_mergeh(kernel.packet[2], kernel.packet[10]); step1[5] = vec_mergel(kernel.packet[2], kernel.packet[10]); step1[6] = vec_mergeh(kernel.packet[3], kernel.packet[11]); step1[7] = vec_mergel(kernel.packet[3], kernel.packet[11]); step1[8] = vec_mergeh(kernel.packet[4], kernel.packet[12]); step1[9] = vec_mergel(kernel.packet[4], kernel.packet[12]); step1[10] = vec_mergeh(kernel.packet[5], kernel.packet[13]); step1[11] = vec_mergel(kernel.packet[5], kernel.packet[13]); step1[12] = vec_mergeh(kernel.packet[6], kernel.packet[14]); step1[13] = vec_mergel(kernel.packet[6], kernel.packet[14]); step1[14] = vec_mergeh(kernel.packet[7], kernel.packet[15]); step1[15] = vec_mergel(kernel.packet[7], kernel.packet[15]); step2[0] = vec_mergeh(step1[0], step1[8]); step2[1] = vec_mergel(step1[0], step1[8]); step2[2] = vec_mergeh(step1[1], step1[9]); step2[3] = vec_mergel(step1[1], step1[9]); step2[4] = vec_mergeh(step1[2], step1[10]); step2[5] = vec_mergel(step1[2], step1[10]); step2[6] = vec_mergeh(step1[3], step1[11]); step2[7] = vec_mergel(step1[3], step1[11]); step2[8] = vec_mergeh(step1[4], step1[12]); step2[9] = vec_mergel(step1[4], step1[12]); step2[10] = vec_mergeh(step1[5], step1[13]); step2[11] = vec_mergel(step1[5], step1[13]); step2[12] = vec_mergeh(step1[6], step1[14]); step2[13] = vec_mergel(step1[6], step1[14]); step2[14] = vec_mergeh(step1[7], step1[15]); step2[15] = vec_mergel(step1[7], step1[15]); step3[0] = vec_mergeh(step2[0], step2[8]); step3[1] = vec_mergel(step2[0], step2[8]); step3[2] = vec_mergeh(step2[1], step2[9]); step3[3] = vec_mergel(step2[1], step2[9]); step3[4] = vec_mergeh(step2[2], step2[10]); step3[5] = vec_mergel(step2[2], step2[10]); step3[6] = vec_mergeh(step2[3], step2[11]); step3[7] = vec_mergel(step2[3], step2[11]); step3[8] = vec_mergeh(step2[4], step2[12]); step3[9] = vec_mergel(step2[4], step2[12]); step3[10] = vec_mergeh(step2[5], step2[13]); step3[11] = vec_mergel(step2[5], step2[13]); step3[12] = vec_mergeh(step2[6], step2[14]); step3[13] = vec_mergel(step2[6], step2[14]); step3[14] = vec_mergeh(step2[7], step2[15]); step3[15] = vec_mergel(step2[7], step2[15]); kernel.packet[0] = vec_mergeh(step3[0], step3[8]); kernel.packet[1] = vec_mergel(step3[0], step3[8]); kernel.packet[2] = vec_mergeh(step3[1], step3[9]); kernel.packet[3] = vec_mergel(step3[1], step3[9]); kernel.packet[4] = vec_mergeh(step3[2], step3[10]); kernel.packet[5] = vec_mergel(step3[2], step3[10]); kernel.packet[6] = vec_mergeh(step3[3], step3[11]); kernel.packet[7] = vec_mergel(step3[3], step3[11]); kernel.packet[8] = vec_mergeh(step3[4], step3[12]); kernel.packet[9] = vec_mergel(step3[4], step3[12]); kernel.packet[10] = vec_mergeh(step3[5], step3[13]); kernel.packet[11] = vec_mergel(step3[5], step3[13]); kernel.packet[12] = vec_mergeh(step3[6], step3[14]); kernel.packet[13] = vec_mergel(step3[6], step3[14]); kernel.packet[14] = vec_mergeh(step3[7], step3[15]); kernel.packet[15] = vec_mergel(step3[7], step3[15]); } template EIGEN_STRONG_INLINE Packet pblend4(const Selector<4>& ifPacket, const Packet& thenPacket, const Packet& elsePacket) { Packet4ui select = { ifPacket.select[0], ifPacket.select[1], ifPacket.select[2], ifPacket.select[3] }; Packet4ui mask = reinterpret_cast(vec_cmpeq(reinterpret_cast(select), reinterpret_cast(p4i_ONE))); return vec_sel(elsePacket, thenPacket, mask); } template<> EIGEN_STRONG_INLINE Packet4i pblend(const Selector<4>& ifPacket, const Packet4i& thenPacket, const Packet4i& elsePacket) { return pblend4(ifPacket, thenPacket, elsePacket); } template<> EIGEN_STRONG_INLINE Packet4f pblend(const Selector<4>& ifPacket, const Packet4f& thenPacket, const Packet4f& elsePacket) { return pblend4(ifPacket, thenPacket, elsePacket); } template<> EIGEN_STRONG_INLINE Packet8s pblend(const Selector<8>& ifPacket, const Packet8s& thenPacket, const Packet8s& elsePacket) { Packet8us select = { ifPacket.select[0], ifPacket.select[1], ifPacket.select[2], ifPacket.select[3], ifPacket.select[4], ifPacket.select[5], ifPacket.select[6], ifPacket.select[7] }; Packet8us mask = reinterpret_cast(vec_cmpeq(select, p8us_ONE)); Packet8s result = vec_sel(elsePacket, thenPacket, mask); return result; } template<> EIGEN_STRONG_INLINE Packet8us pblend(const Selector<8>& ifPacket, const Packet8us& thenPacket, const Packet8us& elsePacket) { Packet8us select = { ifPacket.select[0], ifPacket.select[1], ifPacket.select[2], ifPacket.select[3], ifPacket.select[4], ifPacket.select[5], ifPacket.select[6], ifPacket.select[7] }; Packet8us mask = reinterpret_cast(vec_cmpeq(reinterpret_cast(select), p8us_ONE)); return vec_sel(elsePacket, thenPacket, mask); } template<> EIGEN_STRONG_INLINE Packet8bf pblend(const Selector<8>& ifPacket, const Packet8bf& thenPacket, const Packet8bf& elsePacket) { return pblend(ifPacket, thenPacket, elsePacket); } template<> EIGEN_STRONG_INLINE Packet16c pblend(const Selector<16>& ifPacket, const Packet16c& thenPacket, const Packet16c& elsePacket) { Packet16uc select = { ifPacket.select[0], ifPacket.select[1], ifPacket.select[2], ifPacket.select[3], ifPacket.select[4], ifPacket.select[5], ifPacket.select[6], ifPacket.select[7], ifPacket.select[8], ifPacket.select[9], ifPacket.select[10], ifPacket.select[11], ifPacket.select[12], ifPacket.select[13], ifPacket.select[14], ifPacket.select[15] }; Packet16uc mask = reinterpret_cast(vec_cmpeq(reinterpret_cast(select), p16uc_ONE)); return vec_sel(elsePacket, thenPacket, mask); } template<> EIGEN_STRONG_INLINE Packet16uc pblend(const Selector<16>& ifPacket, const Packet16uc& thenPacket, const Packet16uc& elsePacket) { Packet16uc select = { ifPacket.select[0], ifPacket.select[1], ifPacket.select[2], ifPacket.select[3], ifPacket.select[4], ifPacket.select[5], ifPacket.select[6], ifPacket.select[7], ifPacket.select[8], ifPacket.select[9], ifPacket.select[10], ifPacket.select[11], ifPacket.select[12], ifPacket.select[13], ifPacket.select[14], ifPacket.select[15] }; Packet16uc mask = reinterpret_cast(vec_cmpeq(reinterpret_cast(select), p16uc_ONE)); return vec_sel(elsePacket, thenPacket, mask); } template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template <> struct type_casting_traits { enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 }; }; template<> EIGEN_STRONG_INLINE Packet4i pcast(const Packet4f& a) { return vec_cts(a,0); } template<> EIGEN_STRONG_INLINE Packet4ui pcast(const Packet4f& a) { return vec_ctu(a,0); } template<> EIGEN_STRONG_INLINE Packet4f pcast(const Packet4i& a) { return vec_ctf(a,0); } template<> EIGEN_STRONG_INLINE Packet4f pcast(const Packet4ui& a) { return vec_ctf(a,0); } template<> EIGEN_STRONG_INLINE Packet8us pcast(const Packet8bf& a) { Packet4f float_even = Bf16ToF32Even(a); Packet4f float_odd = Bf16ToF32Odd(a); Packet4ui int_even = pcast(float_even); Packet4ui int_odd = pcast(float_odd); const _EIGEN_DECLARE_CONST_FAST_Packet4ui(low_mask, 0x0000FFFF); Packet4ui low_even = pand(int_even, p4ui_low_mask); Packet4ui low_odd = pand(int_odd, p4ui_low_mask); //Check values that are bigger than USHRT_MAX (0xFFFF) Packet4bi overflow_selector; if(vec_any_gt(int_even, p4ui_low_mask)){ overflow_selector = vec_cmpgt(int_even, p4ui_low_mask); low_even = vec_sel(low_even, p4ui_low_mask, overflow_selector); } if(vec_any_gt(int_odd, p4ui_low_mask)){ overflow_selector = vec_cmpgt(int_odd, p4ui_low_mask); low_odd = vec_sel(low_even, p4ui_low_mask, overflow_selector); } low_odd = plogical_shift_left<16>(low_odd); Packet4ui int_final = por(low_even, low_odd); return reinterpret_cast(int_final); } template<> EIGEN_STRONG_INLINE Packet8bf pcast(const Packet8us& a) { //short -> int -> float -> bfloat16 const _EIGEN_DECLARE_CONST_FAST_Packet4ui(low_mask, 0x0000FFFF); Packet4ui int_cast = reinterpret_cast(a); Packet4ui int_even = pand(int_cast, p4ui_low_mask); Packet4ui int_odd = plogical_shift_right<16>(int_cast); Packet4f float_even = pcast(int_even); Packet4f float_odd = pcast(int_odd); return F32ToBf16(float_even, float_odd); } template<> EIGEN_STRONG_INLINE Packet4i preinterpret(const Packet4f& a) { return reinterpret_cast(a); } template<> EIGEN_STRONG_INLINE Packet4f preinterpret(const Packet4i& a) { return reinterpret_cast(a); } //---------- double ---------- #ifdef __VSX__ typedef __vector double Packet2d; typedef __vector unsigned long long Packet2ul; typedef __vector long long Packet2l; #if EIGEN_COMP_CLANG typedef Packet2ul Packet2bl; #else typedef __vector __bool long Packet2bl; #endif static Packet2l p2l_ONE = { 1, 1 }; static Packet2l p2l_ZERO = reinterpret_cast(p4i_ZERO); static Packet2ul p2ul_SIGN = { 0x8000000000000000ull, 0x8000000000000000ull }; static Packet2ul p2ul_PREV0DOT5 = { 0x3FDFFFFFFFFFFFFFull, 0x3FDFFFFFFFFFFFFFull }; static Packet2d p2d_ONE = { 1.0, 1.0 }; static Packet2d p2d_ZERO = reinterpret_cast(p4f_ZERO); static Packet2d p2d_MZERO = { numext::bit_cast(0x8000000000000000ull), numext::bit_cast(0x8000000000000000ull) }; #ifdef _BIG_ENDIAN static Packet2d p2d_COUNTDOWN = reinterpret_cast(vec_sld(reinterpret_cast(p2d_ZERO), reinterpret_cast(p2d_ONE), 8)); #else static Packet2d p2d_COUNTDOWN = reinterpret_cast(vec_sld(reinterpret_cast(p2d_ONE), reinterpret_cast(p2d_ZERO), 8)); #endif template Packet2d vec_splat_dbl(Packet2d& a) { return vec_splat(a, index); } template<> struct packet_traits : default_packet_traits { typedef Packet2d type; typedef Packet2d half; enum { Vectorizable = 1, AlignedOnScalar = 1, size=2, HasHalfPacket = 1, HasAdd = 1, HasSub = 1, HasMul = 1, HasDiv = 1, HasMin = 1, HasMax = 1, HasAbs = 1, HasSin = 0, HasCos = 0, HasLog = 0, HasExp = 1, HasSqrt = 1, HasRsqrt = 1, HasRound = 1, HasFloor = 1, HasCeil = 1, HasRint = 1, HasNegate = 1, HasBlend = 1 }; }; template<> struct unpacket_traits { typedef double type; enum {size=2, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet2d half; }; inline std::ostream & operator <<(std::ostream & s, const Packet2l & v) { union { Packet2l v; int64_t n[2]; } vt; vt.v = v; s << vt.n[0] << ", " << vt.n[1]; return s; } inline std::ostream & operator <<(std::ostream & s, const Packet2d & v) { union { Packet2d v; double n[2]; } vt; vt.v = v; s << vt.n[0] << ", " << vt.n[1]; return s; } // Need to define them first or we get specialization after instantiation errors template<> EIGEN_STRONG_INLINE Packet2d pload(const double* from) { EIGEN_DEBUG_ALIGNED_LOAD return vec_xl(0, const_cast(from)); // cast needed by Clang } template<> EIGEN_STRONG_INLINE void pstore(double* to, const Packet2d& from) { EIGEN_DEBUG_ALIGNED_STORE vec_xst(from, 0, to); } template<> EIGEN_STRONG_INLINE Packet2d pset1(const double& from) { Packet2d v = {from, from}; return v; } template<> EIGEN_STRONG_INLINE Packet2d pset1frombits(unsigned long from) { Packet2l v = {static_cast(from), static_cast(from)}; return reinterpret_cast(v); } template<> EIGEN_STRONG_INLINE void pbroadcast4(const double *a, Packet2d& a0, Packet2d& a1, Packet2d& a2, Packet2d& a3) { //This way is faster than vec_splat (at least for doubles in Power 9) a0 = pset1(a[0]); a1 = pset1(a[1]); a2 = pset1(a[2]); a3 = pset1(a[3]); } template<> EIGEN_DEVICE_FUNC inline Packet2d pgather(const double* from, Index stride) { EIGEN_ALIGN16 double af[2]; af[0] = from[0*stride]; af[1] = from[1*stride]; return pload(af); } template<> EIGEN_DEVICE_FUNC inline void pscatter(double* to, const Packet2d& from, Index stride) { EIGEN_ALIGN16 double af[2]; pstore(af, from); to[0*stride] = af[0]; to[1*stride] = af[1]; } template<> EIGEN_STRONG_INLINE Packet2d plset(const double& a) { return pset1(a) + p2d_COUNTDOWN; } template<> EIGEN_STRONG_INLINE Packet2d padd(const Packet2d& a, const Packet2d& b) { return a + b; } template<> EIGEN_STRONG_INLINE Packet2d psub(const Packet2d& a, const Packet2d& b) { return a - b; } template<> EIGEN_STRONG_INLINE Packet2d pnegate(const Packet2d& a) { return p2d_ZERO - a; } template<> EIGEN_STRONG_INLINE Packet2d pconj(const Packet2d& a) { return a; } template<> EIGEN_STRONG_INLINE Packet2d pmul(const Packet2d& a, const Packet2d& b) { return vec_madd(a,b,p2d_MZERO); } template<> EIGEN_STRONG_INLINE Packet2d pdiv(const Packet2d& a, const Packet2d& b) { return vec_div(a,b); } // for some weird raisons, it has to be overloaded for packet of integers template<> EIGEN_STRONG_INLINE Packet2d pmadd(const Packet2d& a, const Packet2d& b, const Packet2d& c) { return vec_madd(a, b, c); } template<> EIGEN_STRONG_INLINE Packet2d pmin(const Packet2d& a, const Packet2d& b) { // NOTE: about 10% slower than vec_min, but consistent with std::min and SSE regarding NaN Packet2d ret; __asm__ ("xvcmpgedp %x0,%x1,%x2\n\txxsel %x0,%x1,%x2,%x0" : "=&wa" (ret) : "wa" (a), "wa" (b)); return ret; } template<> EIGEN_STRONG_INLINE Packet2d pmax(const Packet2d& a, const Packet2d& b) { // NOTE: about 10% slower than vec_max, but consistent with std::max and SSE regarding NaN Packet2d ret; __asm__ ("xvcmpgtdp %x0,%x2,%x1\n\txxsel %x0,%x1,%x2,%x0" : "=&wa" (ret) : "wa" (a), "wa" (b)); return ret; } template<> EIGEN_STRONG_INLINE Packet2d pcmp_le(const Packet2d& a, const Packet2d& b) { return reinterpret_cast(vec_cmple(a,b)); } template<> EIGEN_STRONG_INLINE Packet2d pcmp_lt(const Packet2d& a, const Packet2d& b) { return reinterpret_cast(vec_cmplt(a,b)); } template<> EIGEN_STRONG_INLINE Packet2d pcmp_eq(const Packet2d& a, const Packet2d& b) { return reinterpret_cast(vec_cmpeq(a,b)); } template<> EIGEN_STRONG_INLINE Packet2d pcmp_lt_or_nan(const Packet2d& a, const Packet2d& b) { Packet2d c = reinterpret_cast(vec_cmpge(a,b)); return vec_nor(c,c); } template<> EIGEN_STRONG_INLINE Packet2d pand(const Packet2d& a, const Packet2d& b) { return vec_and(a, b); } template<> EIGEN_STRONG_INLINE Packet2d por(const Packet2d& a, const Packet2d& b) { return vec_or(a, b); } template<> EIGEN_STRONG_INLINE Packet2d pxor(const Packet2d& a, const Packet2d& b) { return vec_xor(a, b); } template<> EIGEN_STRONG_INLINE Packet2d pandnot(const Packet2d& a, const Packet2d& b) { return vec_and(a, vec_nor(b, b)); } template<> EIGEN_STRONG_INLINE Packet2d pround(const Packet2d& a) { Packet2d t = vec_add(reinterpret_cast(vec_or(vec_and(reinterpret_cast(a), p2ul_SIGN), p2ul_PREV0DOT5)), a); Packet2d res; __asm__("xvrdpiz %x0, %x1\n\t" : "=&wa" (res) : "wa" (t)); return res; } template<> EIGEN_STRONG_INLINE Packet2d pceil(const Packet2d& a) { return vec_ceil(a); } template<> EIGEN_STRONG_INLINE Packet2d pfloor(const Packet2d& a) { return vec_floor(a); } template<> EIGEN_STRONG_INLINE Packet2d print(const Packet2d& a) { Packet2d res; __asm__("xvrdpic %x0, %x1\n\t" : "=&wa" (res) : "wa" (a)); return res; } template<> EIGEN_STRONG_INLINE Packet2d ploadu(const double* from) { EIGEN_DEBUG_UNALIGNED_LOAD return vec_xl(0, const_cast(from)); } template<> EIGEN_STRONG_INLINE Packet2d ploaddup(const double* from) { Packet2d p; if((std::ptrdiff_t(from) % 16) == 0) p = pload(from); else p = ploadu(from); return vec_splat_dbl<0>(p); } template<> EIGEN_STRONG_INLINE void pstoreu(double* to, const Packet2d& from) { EIGEN_DEBUG_UNALIGNED_STORE vec_xst(from, 0, to); } template<> EIGEN_STRONG_INLINE void prefetch(const double* addr) { EIGEN_PPC_PREFETCH(addr); } template<> EIGEN_STRONG_INLINE double pfirst(const Packet2d& a) { EIGEN_ALIGN16 double x[2]; pstore(x, a); return x[0]; } template<> EIGEN_STRONG_INLINE Packet2d preverse(const Packet2d& a) { return reinterpret_cast(vec_perm(reinterpret_cast(a), reinterpret_cast(a), p16uc_REVERSE64)); } template<> EIGEN_STRONG_INLINE Packet2d pabs(const Packet2d& a) { return vec_abs(a); } // VSX support varies between different compilers and even different // versions of the same compiler. For gcc version >= 4.9.3, we can use // vec_cts to efficiently convert Packet2d to Packet2l. Otherwise, use // a slow version that works with older compilers. // Update: apparently vec_cts/vec_ctf intrinsics for 64-bit doubles // are buggy, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70963 template<> inline Packet2l pcast(const Packet2d& x) { #if EIGEN_GNUC_AT_LEAST(5, 4) || \ (EIGEN_GNUC_AT(6, 1) && __GNUC_PATCHLEVEL__ >= 1) return vec_cts(x, 0); // TODO: check clang version. #else double tmp[2]; memcpy(tmp, &x, sizeof(tmp)); Packet2l l = { static_cast(tmp[0]), static_cast(tmp[1]) }; return l; #endif } template<> inline Packet2d pcast(const Packet2l& x) { unsigned long long tmp[2]; memcpy(tmp, &x, sizeof(tmp)); Packet2d d = { static_cast(tmp[0]), static_cast(tmp[1]) }; return d; } // Packet2l shifts. // For POWER8 we simply use vec_sr/l. // // Things are more complicated for POWER7. There is actually a // vec_xxsxdi intrinsic but it is not supported by some gcc versions. // So we need to shift by N % 32 and rearrage bytes. #ifdef __POWER8_VECTOR__ template EIGEN_STRONG_INLINE Packet2l plogical_shift_left(const Packet2l& a) { const Packet2ul shift = { N, N }; return vec_sl(a, shift); } template EIGEN_STRONG_INLINE Packet2l plogical_shift_right(const Packet2l& a) { const Packet2ul shift = { N, N }; return vec_sr(a, shift); } #else // Shifts [A, B, C, D] to [B, 0, D, 0]. // Used to implement left shifts for Packet2l. EIGEN_ALWAYS_INLINE Packet4i shift_even_left(const Packet4i& a) { static const Packet16uc perm = { 0x14, 0x15, 0x16, 0x17, 0x00, 0x01, 0x02, 0x03, 0x1c, 0x1d, 0x1e, 0x1f, 0x08, 0x09, 0x0a, 0x0b }; #ifdef _BIG_ENDIAN return vec_perm(p4i_ZERO, a, perm); #else return vec_perm(a, p4i_ZERO, perm); #endif } // Shifts [A, B, C, D] to [0, A, 0, C]. // Used to implement right shifts for Packet2l. EIGEN_ALWAYS_INLINE Packet4i shift_odd_right(const Packet4i& a) { static const Packet16uc perm = { 0x04, 0x05, 0x06, 0x07, 0x10, 0x11, 0x12, 0x13, 0x0c, 0x0d, 0x0e, 0x0f, 0x18, 0x19, 0x1a, 0x1b }; #ifdef _BIG_ENDIAN return vec_perm(p4i_ZERO, a, perm); #else return vec_perm(a, p4i_ZERO, perm); #endif } template struct plogical_shift_left_impl; template struct plogical_shift_left_impl= 0)>::type> { static EIGEN_STRONG_INLINE Packet2l run(const Packet2l& a) { static const unsigned n = static_cast(N); const Packet4ui shift = {n, n, n, n}; const Packet4i ai = reinterpret_cast(a); static const unsigned m = static_cast(32 - N); const Packet4ui shift_right = {m, m, m, m}; const Packet4i out_hi = vec_sl(ai, shift); const Packet4i out_lo = shift_even_left(vec_sr(ai, shift_right)); return reinterpret_cast(por(out_hi, out_lo)); } }; template struct plogical_shift_left_impl= 32)>::type> { static EIGEN_STRONG_INLINE Packet2l run(const Packet2l& a) { static const unsigned m = static_cast(N - 32); const Packet4ui shift = {m, m, m, m}; const Packet4i ai = reinterpret_cast(a); return reinterpret_cast(shift_even_left(vec_sl(ai, shift))); } }; template EIGEN_STRONG_INLINE Packet2l plogical_shift_left(const Packet2l& a) { return plogical_shift_left_impl::run(a); } template struct plogical_shift_right_impl; template struct plogical_shift_right_impl= 0)>::type> { static EIGEN_STRONG_INLINE Packet2l run(const Packet2l& a) { static const unsigned n = static_cast(N); const Packet4ui shift = {n, n, n, n}; const Packet4i ai = reinterpret_cast(a); static const unsigned m = static_cast(32 - N); const Packet4ui shift_left = {m, m, m, m}; const Packet4i out_lo = vec_sr(ai, shift); const Packet4i out_hi = shift_odd_right(vec_sl(ai, shift_left)); return reinterpret_cast(por(out_hi, out_lo)); } }; template struct plogical_shift_right_impl= 32)>::type> { static EIGEN_STRONG_INLINE Packet2l run(const Packet2l& a) { static const unsigned m = static_cast(N - 32); const Packet4ui shift = {m, m, m, m}; const Packet4i ai = reinterpret_cast(a); return reinterpret_cast(shift_odd_right(vec_sr(ai, shift))); } }; template EIGEN_STRONG_INLINE Packet2l plogical_shift_right(const Packet2l& a) { return plogical_shift_right_impl::run(a); } #endif template<> EIGEN_STRONG_INLINE Packet2d pldexp(const Packet2d& a, const Packet2d& exponent) { // Clamp exponent to [-2099, 2099] const Packet2d max_exponent = pset1(2099.0); const Packet2l e = pcast(pmin(pmax(exponent, pnegate(max_exponent)), max_exponent)); // Split 2^e into four factors and multiply: const Packet2l bias = { 1023, 1023 }; Packet2l b = plogical_shift_right<2>(e); // floor(e/4) Packet2d c = reinterpret_cast(plogical_shift_left<52>(b + bias)); Packet2d out = pmul(pmul(pmul(a, c), c), c); // a * 2^(3b) b = psub(psub(psub(e, b), b), b); // e - 3b c = reinterpret_cast(plogical_shift_left<52>(b + bias)); // 2^(e - 3b) out = pmul(out, c); // a * 2^e return out; } // Extract exponent without existence of Packet2l. template<> EIGEN_STRONG_INLINE Packet2d pfrexp_generic_get_biased_exponent(const Packet2d& a) { return pcast(plogical_shift_right<52>(reinterpret_cast(pabs(a)))); } template<> EIGEN_STRONG_INLINE Packet2d pfrexp (const Packet2d& a, Packet2d& exponent) { return pfrexp_generic(a, exponent); } template<> EIGEN_STRONG_INLINE double predux(const Packet2d& a) { Packet2d b, sum; b = reinterpret_cast(vec_sld(reinterpret_cast(a), reinterpret_cast(a), 8)); sum = a + b; return pfirst(sum); } // Other reduction functions: // mul template<> EIGEN_STRONG_INLINE double predux_mul(const Packet2d& a) { return pfirst(pmul(a, reinterpret_cast(vec_sld(reinterpret_cast(a), reinterpret_cast(a), 8)))); } // min template<> EIGEN_STRONG_INLINE double predux_min(const Packet2d& a) { return pfirst(pmin(a, reinterpret_cast(vec_sld(reinterpret_cast(a), reinterpret_cast(a), 8)))); } // max template<> EIGEN_STRONG_INLINE double predux_max(const Packet2d& a) { return pfirst(pmax(a, reinterpret_cast(vec_sld(reinterpret_cast(a), reinterpret_cast(a), 8)))); } EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { Packet2d t0, t1; t0 = vec_perm(kernel.packet[0], kernel.packet[1], p16uc_TRANSPOSE64_HI); t1 = vec_perm(kernel.packet[0], kernel.packet[1], p16uc_TRANSPOSE64_LO); kernel.packet[0] = t0; kernel.packet[1] = t1; } template<> EIGEN_STRONG_INLINE Packet2d pblend(const Selector<2>& ifPacket, const Packet2d& thenPacket, const Packet2d& elsePacket) { Packet2l select = { ifPacket.select[0], ifPacket.select[1] }; Packet2bl mask = reinterpret_cast( vec_cmpeq(reinterpret_cast(select), reinterpret_cast(p2l_ONE)) ); return vec_sel(elsePacket, thenPacket, mask); } #endif // __VSX__ } // end namespace internal } // end namespace Eigen #endif // EIGEN_PACKET_MATH_ALTIVEC_H RcppEigen/inst/include/Eigen/src/Core/arch/AltiVec/MatrixProductMMA.h0000644000176200001440000006036414562417221025026 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2020 Everton Constantino (everton.constantino@ibm.com) // Copyright (C) 2021 Chip Kerchner (chip.kerchner@ibm.com) // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_MATRIX_PRODUCT_MMA_ALTIVEC_H #define EIGEN_MATRIX_PRODUCT_MMA_ALTIVEC_H #pragma GCC target("cpu=power10") #ifdef __has_builtin #if !__has_builtin(__builtin_vsx_assemble_pair) #define __builtin_vsx_assemble_pair __builtin_mma_assemble_pair #endif #endif namespace Eigen { namespace internal { template EIGEN_ALWAYS_INLINE void bsetzeroMMA(__vector_quad* acc) { __builtin_mma_xxsetaccz(acc); } template EIGEN_ALWAYS_INLINE void storeAccumulator(Index i, Index j, const DataMapper& data, const Packet& alpha, __vector_quad* acc) { PacketBlock result; __builtin_mma_disassemble_acc(&result.packet, acc); PacketBlock tRes; bload(tRes, data, i, j); bscale(tRes, result, alpha); data.template storePacketBlock(i, j, tRes); } template EIGEN_ALWAYS_INLINE void storeComplexAccumulator(Index i, Index j, const DataMapper& data, const Packet& alphaReal, const Packet& alphaImag, __vector_quad* accReal, __vector_quad* accImag) { PacketBlock resultReal, resultImag; __builtin_mma_disassemble_acc(&resultReal.packet, accReal); __builtin_mma_disassemble_acc(&resultImag.packet, accImag); PacketBlock tRes; bload(tRes, data, i, j); PacketBlock taccReal, taccImag; bscalec(resultReal, resultImag, alphaReal, alphaImag, taccReal, taccImag); PacketBlock acc1, acc2; bcouple(taccReal, taccImag, tRes, acc1, acc2); data.template storePacketBlock(i + N*accColsC, j, acc1); data.template storePacketBlock(i + (N+1)*accColsC, j, acc2); } // Defaults to float32, since Eigen still supports C++03 we can't use default template arguments template EIGEN_ALWAYS_INLINE void pgerMMA(__vector_quad* acc, const RhsPacket& a, const LhsPacket& b) { if(NegativeAccumulate) { __builtin_mma_xvf32gernp(acc, (__vector unsigned char)a, (__vector unsigned char)b); } else { __builtin_mma_xvf32gerpp(acc, (__vector unsigned char)a, (__vector unsigned char)b); } } template EIGEN_ALWAYS_INLINE void pgerMMA(__vector_quad* acc, const PacketBlock& a, const Packet2d& b) { __vector_pair* a0 = (__vector_pair *)(&a.packet[0]); if(NegativeAccumulate) { __builtin_mma_xvf64gernp(acc, *a0, (__vector unsigned char)b); } else { __builtin_mma_xvf64gerpp(acc, *a0, (__vector unsigned char)b); } } template EIGEN_ALWAYS_INLINE void pgerMMA(__vector_quad* acc, const __vector_pair& a, const Packet2d& b) { if(NegativeAccumulate) { __builtin_mma_xvf64gernp(acc, (__vector_pair)a, (__vector unsigned char)b); } else { __builtin_mma_xvf64gerpp(acc, (__vector_pair)a, (__vector unsigned char)b); } } template EIGEN_ALWAYS_INLINE void pgerMMA(__vector_quad*, const __vector_pair&, const Packet4f&) { // Just for compilation } template EIGEN_ALWAYS_INLINE void pgercMMA(__vector_quad* accReal, __vector_quad* accImag, const Packet& lhsV, const Packet& lhsVi, const RhsPacket& rhsV, const RhsPacket& rhsVi) { pgerMMA(accReal, rhsV, lhsV); if(LhsIsReal) { pgerMMA(accImag, rhsVi, lhsV); } else { if(!RhsIsReal) { pgerMMA(accReal, rhsVi, lhsVi); pgerMMA(accImag, rhsVi, lhsV); } else { EIGEN_UNUSED_VARIABLE(rhsVi); } pgerMMA(accImag, rhsV, lhsVi); } } // This is necessary because ploadRhs for double returns a pair of vectors when MMA is enabled. template EIGEN_ALWAYS_INLINE void ploadRhsMMA(const Scalar* rhs, Packet& rhsV) { rhsV = ploadRhs((const Scalar*)(rhs)); } template<> EIGEN_ALWAYS_INLINE void ploadRhsMMA >(const double* rhs, PacketBlock& rhsV) { rhsV.packet[0] = ploadRhs((const double *)((Packet2d *)rhs )); rhsV.packet[1] = ploadRhs((const double *)(((Packet2d *)rhs) + 1)); } template<> EIGEN_ALWAYS_INLINE void ploadRhsMMA(const double* rhs, __vector_pair& rhsV) { #if EIGEN_COMP_LLVM __builtin_vsx_assemble_pair(&rhsV, (__vector unsigned char)(ploadRhs((const double *)(((Packet2d *)rhs) + 1))), (__vector unsigned char)(ploadRhs((const double *)((Packet2d *)rhs )))); #else __asm__ ("lxvp %x0,%1" : "=wa" (rhsV) : "Y" (*rhs)); #endif } template<> EIGEN_ALWAYS_INLINE void ploadRhsMMA(const float*, __vector_pair&) { // Just for compilation } // PEEL_MMA loop factor. #define PEEL_MMA 7 #define MICRO_MMA_UNROLL(func) \ func(0) func(1) func(2) func(3) func(4) func(5) func(6) func(7) #define MICRO_MMA_LOAD_ONE(iter) \ if (unroll_factor > iter) { \ lhsV##iter = ploadLhs(lhs_ptr##iter); \ lhs_ptr##iter += accCols; \ } else { \ EIGEN_UNUSED_VARIABLE(lhsV##iter); \ } #define MICRO_MMA_WORK_ONE(iter, type, peel) \ if (unroll_factor > iter) { \ pgerMMA(&accZero##iter, rhsV##peel, lhsV##iter); \ } #define MICRO_MMA_TYPE_PEEL(func, func2, type, peel) \ if (PEEL_MMA > peel) { \ Packet lhsV0, lhsV1, lhsV2, lhsV3, lhsV4, lhsV5, lhsV6, lhsV7; \ ploadRhsMMA(rhs_ptr + (accRows * peel), rhsV##peel); \ MICRO_MMA_UNROLL(func2); \ func(0,type,peel) func(1,type,peel) func(2,type,peel) func(3,type,peel) \ func(4,type,peel) func(5,type,peel) func(6,type,peel) func(7,type,peel) \ } else { \ EIGEN_UNUSED_VARIABLE(rhsV##peel); \ } #define MICRO_MMA_UNROLL_TYPE_PEEL(func, func2, type) \ type rhsV0, rhsV1, rhsV2, rhsV3, rhsV4, rhsV5, rhsV6, rhsV7, rhsV8, rhsV9; \ MICRO_MMA_TYPE_PEEL(func,func2,type,0); MICRO_MMA_TYPE_PEEL(func,func2,type,1); \ MICRO_MMA_TYPE_PEEL(func,func2,type,2); MICRO_MMA_TYPE_PEEL(func,func2,type,3); \ MICRO_MMA_TYPE_PEEL(func,func2,type,4); MICRO_MMA_TYPE_PEEL(func,func2,type,5); \ MICRO_MMA_TYPE_PEEL(func,func2,type,6); MICRO_MMA_TYPE_PEEL(func,func2,type,7); \ MICRO_MMA_TYPE_PEEL(func,func2,type,8); MICRO_MMA_TYPE_PEEL(func,func2,type,9); #define MICRO_MMA_UNROLL_TYPE_ONE(func, func2, type) \ type rhsV0; \ MICRO_MMA_TYPE_PEEL(func,func2,type,0); #define MICRO_MMA_ONE_PEEL \ if (sizeof(Scalar) == sizeof(float)) { \ MICRO_MMA_UNROLL_TYPE_PEEL(MICRO_MMA_WORK_ONE, MICRO_MMA_LOAD_ONE, RhsPacket); \ } else { \ MICRO_MMA_UNROLL_TYPE_PEEL(MICRO_MMA_WORK_ONE, MICRO_MMA_LOAD_ONE, __vector_pair); \ } \ rhs_ptr += (accRows * PEEL_MMA); #define MICRO_MMA_ONE \ if (sizeof(Scalar) == sizeof(float)) { \ MICRO_MMA_UNROLL_TYPE_ONE(MICRO_MMA_WORK_ONE, MICRO_MMA_LOAD_ONE, RhsPacket); \ } else { \ MICRO_MMA_UNROLL_TYPE_ONE(MICRO_MMA_WORK_ONE, MICRO_MMA_LOAD_ONE, __vector_pair); \ } \ rhs_ptr += accRows; #define MICRO_MMA_DST_PTR_ONE(iter) \ if (unroll_factor > iter) { \ bsetzeroMMA(&accZero##iter); \ } else { \ EIGEN_UNUSED_VARIABLE(accZero##iter); \ } #define MICRO_MMA_DST_PTR MICRO_MMA_UNROLL(MICRO_MMA_DST_PTR_ONE) #define MICRO_MMA_SRC_PTR_ONE(iter) \ if (unroll_factor > iter) { \ lhs_ptr##iter = lhs_base + ( (row/accCols) + iter )*strideA*accCols + accCols*offsetA; \ } else { \ EIGEN_UNUSED_VARIABLE(lhs_ptr##iter); \ } #define MICRO_MMA_SRC_PTR MICRO_MMA_UNROLL(MICRO_MMA_SRC_PTR_ONE) #define MICRO_MMA_PREFETCH_ONE(iter) \ if (unroll_factor > iter) { \ EIGEN_POWER_PREFETCH(lhs_ptr##iter); \ } #define MICRO_MMA_PREFETCH MICRO_MMA_UNROLL(MICRO_MMA_PREFETCH_ONE) #define MICRO_MMA_STORE_ONE(iter) \ if (unroll_factor > iter) { \ storeAccumulator(row + iter*accCols, col, res, pAlpha, &accZero##iter); \ } #define MICRO_MMA_STORE MICRO_MMA_UNROLL(MICRO_MMA_STORE_ONE) template EIGEN_STRONG_INLINE void gemm_unrolled_MMA_iteration( const DataMapper& res, const Scalar* lhs_base, const Scalar* rhs_base, Index depth, Index strideA, Index offsetA, Index& row, Index col, const Packet& pAlpha) { const Scalar* rhs_ptr = rhs_base; const Scalar* lhs_ptr0 = NULL, * lhs_ptr1 = NULL, * lhs_ptr2 = NULL, * lhs_ptr3 = NULL, * lhs_ptr4 = NULL, * lhs_ptr5 = NULL, * lhs_ptr6 = NULL, * lhs_ptr7 = NULL; __vector_quad accZero0, accZero1, accZero2, accZero3, accZero4, accZero5, accZero6, accZero7; MICRO_MMA_SRC_PTR MICRO_MMA_DST_PTR Index k = 0; for(; k + PEEL_MMA <= depth; k+= PEEL_MMA) { EIGEN_POWER_PREFETCH(rhs_ptr); MICRO_MMA_PREFETCH MICRO_MMA_ONE_PEEL } for(; k < depth; k++) { MICRO_MMA_ONE } MICRO_MMA_STORE row += unroll_factor*accCols; } template void gemmMMA(const DataMapper& res, const Scalar* blockA, const Scalar* blockB, Index rows, Index depth, Index cols, Scalar alpha, Index strideA, Index strideB, Index offsetA, Index offsetB) { const Index remaining_rows = rows % accCols; const Index remaining_cols = cols % accRows; if( strideA == -1 ) strideA = depth; if( strideB == -1 ) strideB = depth; const Packet pAlpha = pset1(alpha); const Packet pMask = bmask((const int)(remaining_rows)); Index col = 0; for(; col + accRows <= cols; col += accRows) { const Scalar* rhs_base = blockB + col*strideB + accRows*offsetB; const Scalar* lhs_base = blockA; Index row = 0; #define MAX_MMA_UNROLL 7 while(row + MAX_MMA_UNROLL*accCols <= rows) { gemm_unrolled_MMA_iteration(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, pAlpha); } switch( (rows-row)/accCols ) { #if MAX_MMA_UNROLL > 7 case 7: gemm_unrolled_MMA_iteration<7, Scalar, Packet, RhsPacket, DataMapper, Index, accRows, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, pAlpha); break; #endif #if MAX_MMA_UNROLL > 6 case 6: gemm_unrolled_MMA_iteration<6, Scalar, Packet, RhsPacket, DataMapper, Index, accRows, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, pAlpha); break; #endif #if MAX_MMA_UNROLL > 5 case 5: gemm_unrolled_MMA_iteration<5, Scalar, Packet, RhsPacket, DataMapper, Index, accRows, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, pAlpha); break; #endif #if MAX_MMA_UNROLL > 4 case 4: gemm_unrolled_MMA_iteration<4, Scalar, Packet, RhsPacket, DataMapper, Index, accRows, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, pAlpha); break; #endif #if MAX_MMA_UNROLL > 3 case 3: gemm_unrolled_MMA_iteration<3, Scalar, Packet, RhsPacket, DataMapper, Index, accRows, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, pAlpha); break; #endif #if MAX_MMA_UNROLL > 2 case 2: gemm_unrolled_MMA_iteration<2, Scalar, Packet, RhsPacket, DataMapper, Index, accRows, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, pAlpha); break; #endif #if MAX_MMA_UNROLL > 1 case 1: gemm_unrolled_MMA_iteration<1, Scalar, Packet, RhsPacket, DataMapper, Index, accRows, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, pAlpha); break; #endif default: break; } #undef MAX_MMA_UNROLL if(remaining_rows > 0) { gemm_extra_row(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, rows, cols, remaining_rows, pAlpha, pMask); } } if(remaining_cols > 0) { const Scalar* rhs_base = blockB + col*strideB + remaining_cols*offsetB; const Scalar* lhs_base = blockA; for(; col < cols; col++) { Index row = 0; gemm_unrolled_col(res, lhs_base, rhs_base, depth, strideA, offsetA, row, rows, col, remaining_cols, pAlpha); if (remaining_rows > 0) { gemm_extra_col(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, remaining_rows, remaining_cols, pAlpha); } rhs_base++; } } } #define accColsC (accCols / 2) #define advanceRows ((LhsIsReal) ? 1 : 2) #define advanceCols ((RhsIsReal) ? 1 : 2) // PEEL_COMPLEX_MMA loop factor. #define PEEL_COMPLEX_MMA 7 #define MICRO_COMPLEX_MMA_UNROLL(func) \ func(0) func(1) func(2) func(3) func(4) #define MICRO_COMPLEX_MMA_LOAD_ONE(iter) \ if (unroll_factor > iter) { \ lhsV##iter = ploadLhs(lhs_ptr_real##iter); \ lhs_ptr_real##iter += accCols; \ if(!LhsIsReal) { \ lhsVi##iter = ploadLhs(lhs_ptr_imag##iter); \ lhs_ptr_imag##iter += accCols; \ } else { \ EIGEN_UNUSED_VARIABLE(lhsVi##iter); \ } \ } else { \ EIGEN_UNUSED_VARIABLE(lhsV##iter); \ EIGEN_UNUSED_VARIABLE(lhsVi##iter); \ } #define MICRO_COMPLEX_MMA_WORK_ONE(iter, type, peel) \ if (unroll_factor > iter) { \ pgercMMA(&accReal##iter, &accImag##iter, lhsV##iter, lhsVi##iter, rhsV##peel, rhsVi##peel); \ } #define MICRO_COMPLEX_MMA_TYPE_PEEL(func, func2, type, peel) \ if (PEEL_COMPLEX_MMA > peel) { \ Packet lhsV0, lhsV1, lhsV2, lhsV3, lhsV4; \ Packet lhsVi0, lhsVi1, lhsVi2, lhsVi3, lhsVi4; \ ploadRhsMMA(rhs_ptr_real + (accRows * peel), rhsV##peel); \ if(!RhsIsReal) { \ ploadRhsMMA(rhs_ptr_imag + (accRows * peel), rhsVi##peel); \ } else { \ EIGEN_UNUSED_VARIABLE(rhsVi##peel); \ } \ MICRO_COMPLEX_MMA_UNROLL(func2); \ func(0,type,peel) func(1,type,peel) func(2,type,peel) func(3,type,peel) func(4,type,peel) \ } else { \ EIGEN_UNUSED_VARIABLE(rhsV##peel); \ EIGEN_UNUSED_VARIABLE(rhsVi##peel); \ } #define MICRO_COMPLEX_MMA_UNROLL_TYPE_PEEL(func, func2, type) \ type rhsV0, rhsV1, rhsV2, rhsV3, rhsV4, rhsV5, rhsV6, rhsV7, rhsV8, rhsV9; \ type rhsVi0, rhsVi1, rhsVi2, rhsVi3, rhsVi4, rhsVi5, rhsVi6, rhsVi7, rhsVi8, rhsVi9; \ MICRO_COMPLEX_MMA_TYPE_PEEL(func,func2,type,0); MICRO_COMPLEX_MMA_TYPE_PEEL(func,func2,type,1); \ MICRO_COMPLEX_MMA_TYPE_PEEL(func,func2,type,2); MICRO_COMPLEX_MMA_TYPE_PEEL(func,func2,type,3); \ MICRO_COMPLEX_MMA_TYPE_PEEL(func,func2,type,4); MICRO_COMPLEX_MMA_TYPE_PEEL(func,func2,type,5); \ MICRO_COMPLEX_MMA_TYPE_PEEL(func,func2,type,6); MICRO_COMPLEX_MMA_TYPE_PEEL(func,func2,type,7); \ MICRO_COMPLEX_MMA_TYPE_PEEL(func,func2,type,8); MICRO_COMPLEX_MMA_TYPE_PEEL(func,func2,type,9); #define MICRO_COMPLEX_MMA_UNROLL_TYPE_ONE(func, func2, type) \ type rhsV0, rhsVi0; \ MICRO_COMPLEX_MMA_TYPE_PEEL(func,func2,type,0); #define MICRO_COMPLEX_MMA_ONE_PEEL \ if (sizeof(Scalar) == sizeof(float)) { \ MICRO_COMPLEX_MMA_UNROLL_TYPE_PEEL(MICRO_COMPLEX_MMA_WORK_ONE, MICRO_COMPLEX_MMA_LOAD_ONE, RhsPacket); \ } else { \ MICRO_COMPLEX_MMA_UNROLL_TYPE_PEEL(MICRO_COMPLEX_MMA_WORK_ONE, MICRO_COMPLEX_MMA_LOAD_ONE, __vector_pair); \ } \ rhs_ptr_real += (accRows * PEEL_COMPLEX_MMA); \ if(!RhsIsReal) rhs_ptr_imag += (accRows * PEEL_COMPLEX_MMA); #define MICRO_COMPLEX_MMA_ONE \ if (sizeof(Scalar) == sizeof(float)) { \ MICRO_COMPLEX_MMA_UNROLL_TYPE_ONE(MICRO_COMPLEX_MMA_WORK_ONE, MICRO_COMPLEX_MMA_LOAD_ONE, RhsPacket); \ } else { \ MICRO_COMPLEX_MMA_UNROLL_TYPE_ONE(MICRO_COMPLEX_MMA_WORK_ONE, MICRO_COMPLEX_MMA_LOAD_ONE, __vector_pair); \ } \ rhs_ptr_real += accRows; \ if(!RhsIsReal) rhs_ptr_imag += accRows; #define MICRO_COMPLEX_MMA_DST_PTR_ONE(iter) \ if (unroll_factor > iter) { \ bsetzeroMMA(&accReal##iter); \ bsetzeroMMA(&accImag##iter); \ } else { \ EIGEN_UNUSED_VARIABLE(accReal##iter); \ EIGEN_UNUSED_VARIABLE(accImag##iter); \ } #define MICRO_COMPLEX_MMA_DST_PTR MICRO_COMPLEX_MMA_UNROLL(MICRO_COMPLEX_MMA_DST_PTR_ONE) #define MICRO_COMPLEX_MMA_SRC_PTR_ONE(iter) \ if (unroll_factor > iter) { \ lhs_ptr_real##iter = lhs_base + ( ((advanceRows*row)/accCols) + iter*advanceRows )*strideA*accCols + accCols*offsetA; \ if(!LhsIsReal) { \ lhs_ptr_imag##iter = lhs_ptr_real##iter + accCols*strideA; \ } else { \ EIGEN_UNUSED_VARIABLE(lhs_ptr_imag##iter); \ } \ } else { \ EIGEN_UNUSED_VARIABLE(lhs_ptr_real##iter); \ EIGEN_UNUSED_VARIABLE(lhs_ptr_imag##iter); \ } #define MICRO_COMPLEX_MMA_SRC_PTR MICRO_COMPLEX_MMA_UNROLL(MICRO_COMPLEX_MMA_SRC_PTR_ONE) #define MICRO_COMPLEX_MMA_PREFETCH_ONE(iter) \ if (unroll_factor > iter) { \ EIGEN_POWER_PREFETCH(lhs_ptr_real##iter); \ if(!LhsIsReal) { \ EIGEN_POWER_PREFETCH(lhs_ptr_imag##iter); \ } \ } #define MICRO_COMPLEX_MMA_PREFETCH MICRO_COMPLEX_MMA_UNROLL(MICRO_COMPLEX_MMA_PREFETCH_ONE) #define MICRO_COMPLEX_MMA_STORE_ONE(iter) \ if (unroll_factor > iter) { \ storeComplexAccumulator(row + iter*accCols, col, res, pAlphaReal, pAlphaImag, &accReal##iter, &accImag##iter); \ } #define MICRO_COMPLEX_MMA_STORE MICRO_COMPLEX_MMA_UNROLL(MICRO_COMPLEX_MMA_STORE_ONE) template EIGEN_STRONG_INLINE void gemm_complex_unrolled_MMA_iteration( const DataMapper& res, const Scalar* lhs_base, const Scalar* rhs_base, Index depth, Index strideA, Index offsetA, Index strideB, Index& row, Index col, const Packet& pAlphaReal, const Packet& pAlphaImag) { const Scalar* rhs_ptr_real = rhs_base; const Scalar* rhs_ptr_imag; if(!RhsIsReal) { rhs_ptr_imag = rhs_base + accRows*strideB; } else { EIGEN_UNUSED_VARIABLE(rhs_ptr_imag); } const Scalar* lhs_ptr_real0 = NULL, * lhs_ptr_imag0 = NULL, * lhs_ptr_real1 = NULL, * lhs_ptr_imag1 = NULL; const Scalar* lhs_ptr_real2 = NULL, * lhs_ptr_imag2 = NULL, * lhs_ptr_real3 = NULL, * lhs_ptr_imag3 = NULL; const Scalar* lhs_ptr_real4 = NULL, * lhs_ptr_imag4 = NULL; __vector_quad accReal0, accImag0, accReal1, accImag1, accReal2, accImag2, accReal3, accImag3, accReal4, accImag4; MICRO_COMPLEX_MMA_SRC_PTR MICRO_COMPLEX_MMA_DST_PTR Index k = 0; for(; k + PEEL_COMPLEX_MMA <= depth; k+= PEEL_COMPLEX_MMA) { EIGEN_POWER_PREFETCH(rhs_ptr_real); if(!RhsIsReal) { EIGEN_POWER_PREFETCH(rhs_ptr_imag); } MICRO_COMPLEX_MMA_PREFETCH MICRO_COMPLEX_MMA_ONE_PEEL } for(; k < depth; k++) { MICRO_COMPLEX_MMA_ONE } MICRO_COMPLEX_MMA_STORE row += unroll_factor*accCols; } template void gemm_complexMMA(const DataMapper& res, const LhsScalar* blockAc, const RhsScalar* blockBc, Index rows, Index depth, Index cols, Scalarc alpha, Index strideA, Index strideB, Index offsetA, Index offsetB) { const Index remaining_rows = rows % accCols; const Index remaining_cols = cols % accRows; if( strideA == -1 ) strideA = depth; if( strideB == -1 ) strideB = depth; const Packet pAlphaReal = pset1(alpha.real()); const Packet pAlphaImag = pset1(alpha.imag()); const Packet pMask = bmask((const int)(remaining_rows)); const Scalar* blockA = (Scalar *) blockAc; const Scalar* blockB = (Scalar *) blockBc; Index col = 0; for(; col + accRows <= cols; col += accRows) { const Scalar* rhs_base = blockB + advanceCols*col*strideB + accRows*offsetB; const Scalar* lhs_base = blockA; Index row = 0; #define MAX_COMPLEX_MMA_UNROLL 4 while(row + MAX_COMPLEX_MMA_UNROLL*accCols <= rows) { gemm_complex_unrolled_MMA_iteration(res, lhs_base, rhs_base, depth, strideA, offsetA, strideB, row, col, pAlphaReal, pAlphaImag); } switch( (rows-row)/accCols ) { #if MAX_COMPLEX_MMA_UNROLL > 4 case 4: gemm_complex_unrolled_MMA_iteration<4, Scalar, Packet, Packetc, RhsPacket, DataMapper, Index, accRows, accCols, ConjugateLhs, ConjugateRhs, LhsIsReal, RhsIsReal>(res, lhs_base, rhs_base, depth, strideA, offsetA, strideB, row, col, pAlphaReal, pAlphaImag); break; #endif #if MAX_COMPLEX_MMA_UNROLL > 3 case 3: gemm_complex_unrolled_MMA_iteration<3, Scalar, Packet, Packetc, RhsPacket, DataMapper, Index, accRows, accCols, ConjugateLhs, ConjugateRhs, LhsIsReal, RhsIsReal>(res, lhs_base, rhs_base, depth, strideA, offsetA, strideB, row, col, pAlphaReal, pAlphaImag); break; #endif #if MAX_COMPLEX_MMA_UNROLL > 2 case 2: gemm_complex_unrolled_MMA_iteration<2, Scalar, Packet, Packetc, RhsPacket, DataMapper, Index, accRows, accCols, ConjugateLhs, ConjugateRhs, LhsIsReal, RhsIsReal>(res, lhs_base, rhs_base, depth, strideA, offsetA, strideB, row, col, pAlphaReal, pAlphaImag); break; #endif #if MAX_COMPLEX_MMA_UNROLL > 1 case 1: gemm_complex_unrolled_MMA_iteration<1, Scalar, Packet, Packetc, RhsPacket, DataMapper, Index, accRows, accCols, ConjugateLhs, ConjugateRhs, LhsIsReal, RhsIsReal>(res, lhs_base, rhs_base, depth, strideA, offsetA, strideB, row, col, pAlphaReal, pAlphaImag); break; #endif default: break; } #undef MAX_COMPLEX_MMA_UNROLL if(remaining_rows > 0) { gemm_complex_extra_row(res, lhs_base, rhs_base, depth, strideA, offsetA, strideB, row, col, rows, cols, remaining_rows, pAlphaReal, pAlphaImag, pMask); } } if(remaining_cols > 0) { const Scalar* rhs_base = blockB + advanceCols*col*strideB + remaining_cols*offsetB; const Scalar* lhs_base = blockA; for(; col < cols; col++) { Index row = 0; gemm_complex_unrolled_col(res, lhs_base, rhs_base, depth, strideA, offsetA, strideB, row, rows, col, remaining_cols, pAlphaReal, pAlphaImag); if (remaining_rows > 0) { gemm_complex_extra_col(res, lhs_base, rhs_base, depth, strideA, offsetA, strideB, row, col, remaining_rows, remaining_cols, pAlphaReal, pAlphaImag); } rhs_base++; } } } #undef accColsC #undef advanceRows #undef advanceCols #pragma GCC reset_options } // end namespace internal } // end namespace Eigen #endif // EIGEN_MATRIX_PRODUCT_MMA_ALTIVEC_H RcppEigen/inst/include/Eigen/src/Core/arch/AltiVec/MatrixProduct.h0000644000176200001440000035107314562417221024473 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2020 Everton Constantino (everton.constantino@ibm.com) // Copyright (C) 2021 Chip Kerchner (chip.kerchner@ibm.com) // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_MATRIX_PRODUCT_ALTIVEC_H #define EIGEN_MATRIX_PRODUCT_ALTIVEC_H #ifndef EIGEN_ALTIVEC_USE_CUSTOM_PACK #define EIGEN_ALTIVEC_USE_CUSTOM_PACK 1 #endif #include "MatrixProductCommon.h" // Since LLVM doesn't support dynamic dispatching, force either always MMA or VSX #if EIGEN_COMP_LLVM #if !defined(EIGEN_ALTIVEC_DISABLE_MMA) && !defined(EIGEN_ALTIVEC_MMA_ONLY) #ifdef __MMA__ #define EIGEN_ALTIVEC_MMA_ONLY #else #define EIGEN_ALTIVEC_DISABLE_MMA #endif #endif #endif #ifdef __has_builtin #if __has_builtin(__builtin_mma_assemble_acc) #define ALTIVEC_MMA_SUPPORT #endif #endif #if defined(ALTIVEC_MMA_SUPPORT) && !defined(EIGEN_ALTIVEC_DISABLE_MMA) #include "MatrixProductMMA.h" #endif /************************************************************************************************** * TODO * * - Check StorageOrder on dhs_pack (the innermost second loop seems unvectorized when it could). * * - Check the possibility of transposing as GETREAL and GETIMAG when needed. * **************************************************************************************************/ namespace Eigen { namespace internal { /************************** * Constants and typedefs * **************************/ template struct quad_traits { typedef typename packet_traits::type vectortype; typedef PacketBlock type; typedef vectortype rhstype; enum { vectorsize = packet_traits::size, size = 4, rows = 4 }; }; template<> struct quad_traits { typedef Packet2d vectortype; typedef PacketBlock type; typedef PacketBlock rhstype; enum { vectorsize = packet_traits::size, size = 2, rows = 4 }; }; // MatrixProduct decomposes real/imaginary vectors into a real vector and an imaginary vector, this turned out // to be faster than Eigen's usual approach of having real/imaginary pairs on a single vector. This constants then // are responsible to extract from convert between Eigen's and MatrixProduct approach. const static Packet16uc p16uc_GETREAL32 = { 0, 1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19, 24, 25, 26, 27}; const static Packet16uc p16uc_GETIMAG32 = { 4, 5, 6, 7, 12, 13, 14, 15, 20, 21, 22, 23, 28, 29, 30, 31}; const static Packet16uc p16uc_GETREAL64 = { 0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23}; //[a,ai],[b,bi] = [ai,bi] const static Packet16uc p16uc_GETIMAG64 = { 8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31}; /********************************************* * Single precision real and complex packing * * *******************************************/ /** * Symm packing is related to packing of symmetric adjoint blocks, as expected the packing leaves * the diagonal real, whatever is below it is copied from the respective upper diagonal element and * conjugated. There's no PanelMode available for symm packing. * * Packing in general is supposed to leave the lhs block and the rhs block easy to be read by gemm using * its respective rank-update instructions. The float32/64 versions are different because at this moment * the size of the accumulator is fixed at 512-bits so you can't have a 4x4 accumulator of 64-bit elements. * * As mentioned earlier MatrixProduct breaks complex numbers into a real vector and a complex vector so packing has * to take that into account, at the moment, we run pack the real part and then the imaginary part, this is the main * reason why packing for complex is broken down into several different parts, also the reason why we endup having a * float32/64 and complex float32/64 version. **/ template EIGEN_ALWAYS_INLINE std::complex getAdjointVal(Index i, Index j, const_blas_data_mapper, Index, StorageOrder>& dt) { std::complex v; if(i < j) { v.real( dt(j,i).real()); v.imag(-dt(j,i).imag()); } else if(i > j) { v.real( dt(i,j).real()); v.imag( dt(i,j).imag()); } else { v.real( dt(i,j).real()); v.imag((Scalar)0.0); } return v; } template EIGEN_STRONG_INLINE void symm_pack_complex_rhs_helper(std::complex* blockB, const std::complex* _rhs, Index rhsStride, Index rows, Index cols, Index k2) { const Index depth = k2 + rows; const_blas_data_mapper, Index, StorageOrder> rhs(_rhs, rhsStride); const Index vectorSize = N*quad_traits::vectorsize; const Index vectorDelta = vectorSize * rows; Scalar* blockBf = reinterpret_cast(blockB); Index rir = 0, rii, j = 0; for(; j + vectorSize <= cols; j+=vectorSize) { rii = rir + vectorDelta; for(Index i = k2; i < depth; i++) { for(Index k = 0; k < vectorSize; k++) { std::complex v = getAdjointVal(i, j + k, rhs); blockBf[rir + k] = v.real(); blockBf[rii + k] = v.imag(); } rir += vectorSize; rii += vectorSize; } rir += vectorDelta; } if (j < cols) { rii = rir + ((cols - j) * rows); for(Index i = k2; i < depth; i++) { Index k = j; for(; k < cols; k++) { std::complex v = getAdjointVal(i, k, rhs); blockBf[rir] = v.real(); blockBf[rii] = v.imag(); rir += 1; rii += 1; } } } } template EIGEN_STRONG_INLINE void symm_pack_complex_lhs_helper(std::complex* blockA, const std::complex* _lhs, Index lhsStride, Index cols, Index rows) { const Index depth = cols; const_blas_data_mapper, Index, StorageOrder> lhs(_lhs, lhsStride); const Index vectorSize = quad_traits::vectorsize; const Index vectorDelta = vectorSize * depth; Scalar* blockAf = (Scalar *)(blockA); Index rir = 0, rii, j = 0; for(; j + vectorSize <= rows; j+=vectorSize) { rii = rir + vectorDelta; for(Index i = 0; i < depth; i++) { for(Index k = 0; k < vectorSize; k++) { std::complex v = getAdjointVal(j+k, i, lhs); blockAf[rir + k] = v.real(); blockAf[rii + k] = v.imag(); } rir += vectorSize; rii += vectorSize; } rir += vectorDelta; } if (j < rows) { rii = rir + ((rows - j) * depth); for(Index i = 0; i < depth; i++) { Index k = j; for(; k < rows; k++) { std::complex v = getAdjointVal(k, i, lhs); blockAf[rir] = v.real(); blockAf[rii] = v.imag(); rir += 1; rii += 1; } } } } template EIGEN_STRONG_INLINE void symm_pack_rhs_helper(Scalar* blockB, const Scalar* _rhs, Index rhsStride, Index rows, Index cols, Index k2) { const Index depth = k2 + rows; const_blas_data_mapper rhs(_rhs, rhsStride); const Index vectorSize = quad_traits::vectorsize; Index ri = 0, j = 0; for(; j + N*vectorSize <= cols; j+=N*vectorSize) { Index i = k2; for(; i < depth; i++) { for(Index k = 0; k < N*vectorSize; k++) { if(i <= j+k) blockB[ri + k] = rhs(j+k, i); else blockB[ri + k] = rhs(i, j+k); } ri += N*vectorSize; } } if (j < cols) { for(Index i = k2; i < depth; i++) { Index k = j; for(; k < cols; k++) { if(k <= i) blockB[ri] = rhs(i, k); else blockB[ri] = rhs(k, i); ri += 1; } } } } template EIGEN_STRONG_INLINE void symm_pack_lhs_helper(Scalar* blockA, const Scalar* _lhs, Index lhsStride, Index cols, Index rows) { const Index depth = cols; const_blas_data_mapper lhs(_lhs, lhsStride); const Index vectorSize = quad_traits::vectorsize; Index ri = 0, j = 0; for(; j + vectorSize <= rows; j+=vectorSize) { Index i = 0; for(; i < depth; i++) { for(Index k = 0; k < vectorSize; k++) { if(i <= j+k) blockA[ri + k] = lhs(j+k, i); else blockA[ri + k] = lhs(i, j+k); } ri += vectorSize; } } if (j < rows) { for(Index i = 0; i < depth; i++) { Index k = j; for(; k < rows; k++) { if(i <= k) blockA[ri] = lhs(k, i); else blockA[ri] = lhs(i, k); ri += 1; } } } } template struct symm_pack_rhs, Index, nr, StorageOrder> { void operator()(std::complex* blockB, const std::complex* _rhs, Index rhsStride, Index rows, Index cols, Index k2) { symm_pack_complex_rhs_helper(blockB, _rhs, rhsStride, rows, cols, k2); } }; template struct symm_pack_lhs, Index, Pack1, Pack2_dummy, StorageOrder> { void operator()(std::complex* blockA, const std::complex* _lhs, Index lhsStride, Index cols, Index rows) { symm_pack_complex_lhs_helper(blockA, _lhs, lhsStride, cols, rows); } }; // *********** symm_pack std::complex *********** template struct symm_pack_rhs, Index, nr, StorageOrder> { void operator()(std::complex* blockB, const std::complex* _rhs, Index rhsStride, Index rows, Index cols, Index k2) { symm_pack_complex_rhs_helper(blockB, _rhs, rhsStride, rows, cols, k2); } }; template struct symm_pack_lhs, Index, Pack1, Pack2_dummy, StorageOrder> { void operator()(std::complex* blockA, const std::complex* _lhs, Index lhsStride, Index cols, Index rows) { symm_pack_complex_lhs_helper(blockA, _lhs, lhsStride, cols, rows); } }; // *********** symm_pack float32 *********** template struct symm_pack_rhs { void operator()(float* blockB, const float* _rhs, Index rhsStride, Index rows, Index cols, Index k2) { symm_pack_rhs_helper(blockB, _rhs, rhsStride, rows, cols, k2); } }; template struct symm_pack_lhs { void operator()(float* blockA, const float* _lhs, Index lhsStride, Index cols, Index rows) { symm_pack_lhs_helper(blockA, _lhs, lhsStride, cols, rows); } }; // *********** symm_pack float64 *********** template struct symm_pack_rhs { void operator()(double* blockB, const double* _rhs, Index rhsStride, Index rows, Index cols, Index k2) { symm_pack_rhs_helper(blockB, _rhs, rhsStride, rows, cols, k2); } }; template struct symm_pack_lhs { void operator()(double* blockA, const double* _lhs, Index lhsStride, Index cols, Index rows) { symm_pack_lhs_helper(blockA, _lhs, lhsStride, cols, rows); } }; /** * PanelMode * Packing might be called several times before being multiplied by gebp_kernel, this happens because * on special occasions it fills part of block with other parts of the matrix. Two variables control * how PanelMode should behave: offset and stride. The idea is that those variables represent whatever * is going to be the real offset and stride in the future and this is what you should obey. The process * is to behave as you would with normal packing but leave the start of each part with the correct offset * and the end as well respecting the real stride the block will have. Gebp is aware of both blocks stride * and offset and behaves accordingly. **/ template EIGEN_ALWAYS_INLINE void storeBlock(Scalar* to, PacketBlock& block) { const Index size = 16 / sizeof(Scalar); pstore(to + (0 * size), block.packet[0]); pstore(to + (1 * size), block.packet[1]); pstore(to + (2 * size), block.packet[2]); pstore(to + (3 * size), block.packet[3]); } template EIGEN_ALWAYS_INLINE void storeBlock(Scalar* to, PacketBlock& block) { const Index size = 16 / sizeof(Scalar); pstore(to + (0 * size), block.packet[0]); pstore(to + (1 * size), block.packet[1]); } // General template for lhs & rhs complex packing. template struct dhs_cpack { EIGEN_STRONG_INLINE void operator()(std::complex* blockA, const DataMapper& lhs, Index depth, Index rows, Index stride, Index offset) { const Index vectorSize = quad_traits::vectorsize; const Index vectorDelta = vectorSize * ((PanelMode) ? stride : depth); Index rir = ((PanelMode) ? (vectorSize*offset) : 0), rii; Scalar* blockAt = reinterpret_cast(blockA); Index j = 0; for(; j + vectorSize <= rows; j+=vectorSize) { Index i = 0; rii = rir + vectorDelta; for(; i + vectorSize <= depth; i+=vectorSize) { PacketBlock blockr, blocki; PacketBlock cblock; if (UseLhs) { bload(cblock, lhs, j, i); } else { bload(cblock, lhs, i, j); } blockr.packet[0] = vec_perm(cblock.packet[0].v, cblock.packet[4].v, p16uc_GETREAL32); blockr.packet[1] = vec_perm(cblock.packet[1].v, cblock.packet[5].v, p16uc_GETREAL32); blockr.packet[2] = vec_perm(cblock.packet[2].v, cblock.packet[6].v, p16uc_GETREAL32); blockr.packet[3] = vec_perm(cblock.packet[3].v, cblock.packet[7].v, p16uc_GETREAL32); blocki.packet[0] = vec_perm(cblock.packet[0].v, cblock.packet[4].v, p16uc_GETIMAG32); blocki.packet[1] = vec_perm(cblock.packet[1].v, cblock.packet[5].v, p16uc_GETIMAG32); blocki.packet[2] = vec_perm(cblock.packet[2].v, cblock.packet[6].v, p16uc_GETIMAG32); blocki.packet[3] = vec_perm(cblock.packet[3].v, cblock.packet[7].v, p16uc_GETIMAG32); if(Conjugate) { blocki.packet[0] = -blocki.packet[0]; blocki.packet[1] = -blocki.packet[1]; blocki.packet[2] = -blocki.packet[2]; blocki.packet[3] = -blocki.packet[3]; } if(((StorageOrder == RowMajor) && UseLhs) || (((StorageOrder == ColMajor) && !UseLhs))) { ptranspose(blockr); ptranspose(blocki); } storeBlock(blockAt + rir, blockr); storeBlock(blockAt + rii, blocki); rir += 4*vectorSize; rii += 4*vectorSize; } for(; i < depth; i++) { PacketBlock blockr, blocki; PacketBlock cblock; if(((StorageOrder == ColMajor) && UseLhs) || (((StorageOrder == RowMajor) && !UseLhs))) { if (UseLhs) { cblock.packet[0] = lhs.template loadPacket(j + 0, i); cblock.packet[1] = lhs.template loadPacket(j + 2, i); } else { cblock.packet[0] = lhs.template loadPacket(i, j + 0); cblock.packet[1] = lhs.template loadPacket(i, j + 2); } } else { std::complex lhs0, lhs1; if (UseLhs) { lhs0 = lhs(j + 0, i); lhs1 = lhs(j + 1, i); cblock.packet[0] = pload2(&lhs0, &lhs1); lhs0 = lhs(j + 2, i); lhs1 = lhs(j + 3, i); cblock.packet[1] = pload2(&lhs0, &lhs1); } else { lhs0 = lhs(i, j + 0); lhs1 = lhs(i, j + 1); cblock.packet[0] = pload2(&lhs0, &lhs1); lhs0 = lhs(i, j + 2); lhs1 = lhs(i, j + 3); cblock.packet[1] = pload2(&lhs0, &lhs1); } } blockr.packet[0] = vec_perm(cblock.packet[0].v, cblock.packet[1].v, p16uc_GETREAL32); blocki.packet[0] = vec_perm(cblock.packet[0].v, cblock.packet[1].v, p16uc_GETIMAG32); if(Conjugate) { blocki.packet[0] = -blocki.packet[0]; } pstore(blockAt + rir, blockr.packet[0]); pstore(blockAt + rii, blocki.packet[0]); rir += vectorSize; rii += vectorSize; } rir += ((PanelMode) ? (vectorSize*(2*stride - depth)) : vectorDelta); } if (j < rows) { if(PanelMode) rir += (offset*(rows - j - vectorSize)); rii = rir + (((PanelMode) ? stride : depth) * (rows - j)); for(Index i = 0; i < depth; i++) { Index k = j; for(; k < rows; k++) { if (UseLhs) { blockAt[rir] = lhs(k, i).real(); if(Conjugate) blockAt[rii] = -lhs(k, i).imag(); else blockAt[rii] = lhs(k, i).imag(); } else { blockAt[rir] = lhs(i, k).real(); if(Conjugate) blockAt[rii] = -lhs(i, k).imag(); else blockAt[rii] = lhs(i, k).imag(); } rir += 1; rii += 1; } } } } }; // General template for lhs & rhs packing. template struct dhs_pack{ EIGEN_STRONG_INLINE void operator()(Scalar* blockA, const DataMapper& lhs, Index depth, Index rows, Index stride, Index offset) { const Index vectorSize = quad_traits::vectorsize; Index ri = 0, j = 0; for(; j + vectorSize <= rows; j+=vectorSize) { Index i = 0; if(PanelMode) ri += vectorSize*offset; for(; i + vectorSize <= depth; i+=vectorSize) { PacketBlock block; if (UseLhs) { bload(block, lhs, j, i); } else { bload(block, lhs, i, j); } if(((StorageOrder == RowMajor) && UseLhs) || ((StorageOrder == ColMajor) && !UseLhs)) { ptranspose(block); } storeBlock(blockA + ri, block); ri += 4*vectorSize; } for(; i < depth; i++) { if(((StorageOrder == RowMajor) && UseLhs) || ((StorageOrder == ColMajor) && !UseLhs)) { if (UseLhs) { blockA[ri+0] = lhs(j+0, i); blockA[ri+1] = lhs(j+1, i); blockA[ri+2] = lhs(j+2, i); blockA[ri+3] = lhs(j+3, i); } else { blockA[ri+0] = lhs(i, j+0); blockA[ri+1] = lhs(i, j+1); blockA[ri+2] = lhs(i, j+2); blockA[ri+3] = lhs(i, j+3); } } else { Packet lhsV; if (UseLhs) { lhsV = lhs.template loadPacket(j, i); } else { lhsV = lhs.template loadPacket(i, j); } pstore(blockA + ri, lhsV); } ri += vectorSize; } if(PanelMode) ri += vectorSize*(stride - offset - depth); } if (j < rows) { if(PanelMode) ri += offset*(rows - j); for(Index i = 0; i < depth; i++) { Index k = j; for(; k < rows; k++) { if (UseLhs) { blockA[ri] = lhs(k, i); } else { blockA[ri] = lhs(i, k); } ri += 1; } } } } }; // General template for lhs packing, float64 specialization. template struct dhs_pack { EIGEN_STRONG_INLINE void operator()(double* blockA, const DataMapper& lhs, Index depth, Index rows, Index stride, Index offset) { const Index vectorSize = quad_traits::vectorsize; Index ri = 0, j = 0; for(; j + vectorSize <= rows; j+=vectorSize) { Index i = 0; if(PanelMode) ri += vectorSize*offset; for(; i + vectorSize <= depth; i+=vectorSize) { PacketBlock block; if(StorageOrder == RowMajor) { block.packet[0] = lhs.template loadPacket(j + 0, i); block.packet[1] = lhs.template loadPacket(j + 1, i); ptranspose(block); } else { block.packet[0] = lhs.template loadPacket(j, i + 0); block.packet[1] = lhs.template loadPacket(j, i + 1); } storeBlock(blockA + ri, block); ri += 2*vectorSize; } for(; i < depth; i++) { if(StorageOrder == RowMajor) { blockA[ri+0] = lhs(j+0, i); blockA[ri+1] = lhs(j+1, i); } else { Packet2d lhsV = lhs.template loadPacket(j, i); pstore(blockA + ri, lhsV); } ri += vectorSize; } if(PanelMode) ri += vectorSize*(stride - offset - depth); } if (j < rows) { if(PanelMode) ri += offset*(rows - j); for(Index i = 0; i < depth; i++) { Index k = j; for(; k < rows; k++) { blockA[ri] = lhs(k, i); ri += 1; } } } } }; // General template for rhs packing, float64 specialization. template struct dhs_pack { EIGEN_STRONG_INLINE void operator()(double* blockB, const DataMapper& rhs, Index depth, Index cols, Index stride, Index offset) { const Index vectorSize = quad_traits::vectorsize; Index ri = 0, j = 0; for(; j + 2*vectorSize <= cols; j+=2*vectorSize) { Index i = 0; if(PanelMode) ri += offset*(2*vectorSize); for(; i + vectorSize <= depth; i+=vectorSize) { PacketBlock block; if(StorageOrder == ColMajor) { PacketBlock block1, block2; block1.packet[0] = rhs.template loadPacket(i, j + 0); block1.packet[1] = rhs.template loadPacket(i, j + 1); block2.packet[0] = rhs.template loadPacket(i, j + 2); block2.packet[1] = rhs.template loadPacket(i, j + 3); ptranspose(block1); ptranspose(block2); pstore(blockB + ri , block1.packet[0]); pstore(blockB + ri + 2, block2.packet[0]); pstore(blockB + ri + 4, block1.packet[1]); pstore(blockB + ri + 6, block2.packet[1]); } else { block.packet[0] = rhs.template loadPacket(i + 0, j + 0); //[a1 a2] block.packet[1] = rhs.template loadPacket(i + 0, j + 2); //[a3 a4] block.packet[2] = rhs.template loadPacket(i + 1, j + 0); //[b1 b2] block.packet[3] = rhs.template loadPacket(i + 1, j + 2); //[b3 b4] storeBlock(blockB + ri, block); } ri += 4*vectorSize; } for(; i < depth; i++) { if(StorageOrder == ColMajor) { blockB[ri+0] = rhs(i, j+0); blockB[ri+1] = rhs(i, j+1); ri += vectorSize; blockB[ri+0] = rhs(i, j+2); blockB[ri+1] = rhs(i, j+3); } else { Packet2d rhsV = rhs.template loadPacket(i, j); pstore(blockB + ri, rhsV); ri += vectorSize; rhsV = rhs.template loadPacket(i, j + 2); pstore(blockB + ri, rhsV); } ri += vectorSize; } if(PanelMode) ri += (2*vectorSize)*(stride - offset - depth); } if (j < cols) { if(PanelMode) ri += offset*(cols - j); for(Index i = 0; i < depth; i++) { Index k = j; for(; k < cols; k++) { blockB[ri] = rhs(i, k); ri += 1; } } } } }; // General template for lhs complex packing, float64 specialization. template struct dhs_cpack { EIGEN_STRONG_INLINE void operator()(std::complex* blockA, const DataMapper& lhs, Index depth, Index rows, Index stride, Index offset) { const Index vectorSize = quad_traits::vectorsize; const Index vectorDelta = vectorSize * ((PanelMode) ? stride : depth); Index rir = ((PanelMode) ? (vectorSize*offset) : 0), rii; double* blockAt = reinterpret_cast(blockA); Index j = 0; for(; j + vectorSize <= rows; j+=vectorSize) { Index i = 0; rii = rir + vectorDelta; for(; i + vectorSize <= depth; i+=vectorSize) { PacketBlock blockr, blocki; PacketBlock cblock; if(StorageOrder == ColMajor) { cblock.packet[0] = lhs.template loadPacket(j, i + 0); //[a1 a1i] cblock.packet[1] = lhs.template loadPacket(j, i + 1); //[b1 b1i] cblock.packet[2] = lhs.template loadPacket(j + 1, i + 0); //[a2 a2i] cblock.packet[3] = lhs.template loadPacket(j + 1, i + 1); //[b2 b2i] blockr.packet[0] = vec_perm(cblock.packet[0].v, cblock.packet[2].v, p16uc_GETREAL64); //[a1 a2] blockr.packet[1] = vec_perm(cblock.packet[1].v, cblock.packet[3].v, p16uc_GETREAL64); //[b1 b2] blocki.packet[0] = vec_perm(cblock.packet[0].v, cblock.packet[2].v, p16uc_GETIMAG64); blocki.packet[1] = vec_perm(cblock.packet[1].v, cblock.packet[3].v, p16uc_GETIMAG64); } else { cblock.packet[0] = lhs.template loadPacket(j + 0, i); //[a1 a1i] cblock.packet[1] = lhs.template loadPacket(j + 1, i); //[a2 a2i] cblock.packet[2] = lhs.template loadPacket(j + 0, i + 1); //[b1 b1i] cblock.packet[3] = lhs.template loadPacket(j + 1, i + 1); //[b2 b2i blockr.packet[0] = vec_perm(cblock.packet[0].v, cblock.packet[1].v, p16uc_GETREAL64); //[a1 a2] blockr.packet[1] = vec_perm(cblock.packet[2].v, cblock.packet[3].v, p16uc_GETREAL64); //[b1 b2] blocki.packet[0] = vec_perm(cblock.packet[0].v, cblock.packet[1].v, p16uc_GETIMAG64); blocki.packet[1] = vec_perm(cblock.packet[2].v, cblock.packet[3].v, p16uc_GETIMAG64); } if(Conjugate) { blocki.packet[0] = -blocki.packet[0]; blocki.packet[1] = -blocki.packet[1]; } storeBlock(blockAt + rir, blockr); storeBlock(blockAt + rii, blocki); rir += 2*vectorSize; rii += 2*vectorSize; } for(; i < depth; i++) { PacketBlock blockr, blocki; PacketBlock cblock; cblock.packet[0] = lhs.template loadPacket(j + 0, i); cblock.packet[1] = lhs.template loadPacket(j + 1, i); blockr.packet[0] = vec_perm(cblock.packet[0].v, cblock.packet[1].v, p16uc_GETREAL64); blocki.packet[0] = vec_perm(cblock.packet[0].v, cblock.packet[1].v, p16uc_GETIMAG64); if(Conjugate) { blocki.packet[0] = -blocki.packet[0]; } pstore(blockAt + rir, blockr.packet[0]); pstore(blockAt + rii, blocki.packet[0]); rir += vectorSize; rii += vectorSize; } rir += ((PanelMode) ? (vectorSize*(2*stride - depth)) : vectorDelta); } if (j < rows) { if(PanelMode) rir += (offset*(rows - j - vectorSize)); rii = rir + (((PanelMode) ? stride : depth) * (rows - j)); for(Index i = 0; i < depth; i++) { Index k = j; for(; k < rows; k++) { blockAt[rir] = lhs(k, i).real(); if(Conjugate) blockAt[rii] = -lhs(k, i).imag(); else blockAt[rii] = lhs(k, i).imag(); rir += 1; rii += 1; } } } } }; // General template for rhs complex packing, float64 specialization. template struct dhs_cpack { EIGEN_STRONG_INLINE void operator()(std::complex* blockB, const DataMapper& rhs, Index depth, Index cols, Index stride, Index offset) { const Index vectorSize = quad_traits::vectorsize; const Index vectorDelta = 2*vectorSize * ((PanelMode) ? stride : depth); Index rir = ((PanelMode) ? (2*vectorSize*offset) : 0), rii; double* blockBt = reinterpret_cast(blockB); Index j = 0; for(; j + 2*vectorSize <= cols; j+=2*vectorSize) { Index i = 0; rii = rir + vectorDelta; for(; i < depth; i++) { PacketBlock cblock; PacketBlock blockr, blocki; bload(cblock, rhs, i, j); blockr.packet[0] = vec_perm(cblock.packet[0].v, cblock.packet[1].v, p16uc_GETREAL64); blockr.packet[1] = vec_perm(cblock.packet[2].v, cblock.packet[3].v, p16uc_GETREAL64); blocki.packet[0] = vec_perm(cblock.packet[0].v, cblock.packet[1].v, p16uc_GETIMAG64); blocki.packet[1] = vec_perm(cblock.packet[2].v, cblock.packet[3].v, p16uc_GETIMAG64); if(Conjugate) { blocki.packet[0] = -blocki.packet[0]; blocki.packet[1] = -blocki.packet[1]; } storeBlock(blockBt + rir, blockr); storeBlock(blockBt + rii, blocki); rir += 2*vectorSize; rii += 2*vectorSize; } rir += ((PanelMode) ? (2*vectorSize*(2*stride - depth)) : vectorDelta); } if (j < cols) { if(PanelMode) rir += (offset*(cols - j - 2*vectorSize)); rii = rir + (((PanelMode) ? stride : depth) * (cols - j)); for(Index i = 0; i < depth; i++) { Index k = j; for(; k < cols; k++) { blockBt[rir] = rhs(i, k).real(); if(Conjugate) blockBt[rii] = -rhs(i, k).imag(); else blockBt[rii] = rhs(i, k).imag(); rir += 1; rii += 1; } } } } }; /************** * GEMM utils * **************/ // 512-bits rank1-update of acc. It can either positive or negative accumulate (useful for complex gemm). template EIGEN_ALWAYS_INLINE void pger_common(PacketBlock* acc, const Packet& lhsV, const Packet* rhsV) { if(NegativeAccumulate) { acc->packet[0] = vec_nmsub(lhsV, rhsV[0], acc->packet[0]); acc->packet[1] = vec_nmsub(lhsV, rhsV[1], acc->packet[1]); acc->packet[2] = vec_nmsub(lhsV, rhsV[2], acc->packet[2]); acc->packet[3] = vec_nmsub(lhsV, rhsV[3], acc->packet[3]); } else { acc->packet[0] = vec_madd(lhsV, rhsV[0], acc->packet[0]); acc->packet[1] = vec_madd(lhsV, rhsV[1], acc->packet[1]); acc->packet[2] = vec_madd(lhsV, rhsV[2], acc->packet[2]); acc->packet[3] = vec_madd(lhsV, rhsV[3], acc->packet[3]); } } template EIGEN_ALWAYS_INLINE void pger_common(PacketBlock* acc, const Packet& lhsV, const Packet* rhsV) { if(NegativeAccumulate) { acc->packet[0] = vec_nmsub(lhsV, rhsV[0], acc->packet[0]); } else { acc->packet[0] = vec_madd(lhsV, rhsV[0], acc->packet[0]); } } template EIGEN_ALWAYS_INLINE void pger(PacketBlock* acc, const Scalar* lhs, const Packet* rhsV) { Packet lhsV = pload(lhs); pger_common(acc, lhsV, rhsV); } template EIGEN_ALWAYS_INLINE void loadPacketRemaining(const Scalar* lhs, Packet &lhsV, Index remaining_rows) { #ifdef _ARCH_PWR9 lhsV = vec_xl_len((Scalar *)lhs, remaining_rows * sizeof(Scalar)); #else Index i = 0; do { lhsV[i] = lhs[i]; } while (++i < remaining_rows); #endif } template EIGEN_ALWAYS_INLINE void pger(PacketBlock* acc, const Scalar* lhs, const Packet* rhsV, Index remaining_rows) { Packet lhsV; loadPacketRemaining(lhs, lhsV, remaining_rows); pger_common(acc, lhsV, rhsV); } // 512-bits rank1-update of complex acc. It takes decoupled accumulators as entries. It also takes cares of mixed types real * complex and complex * real. template EIGEN_ALWAYS_INLINE void pgerc_common(PacketBlock* accReal, PacketBlock* accImag, const Packet &lhsV, const Packet &lhsVi, const Packet* rhsV, const Packet* rhsVi) { pger_common(accReal, lhsV, rhsV); if(LhsIsReal) { pger_common(accImag, lhsV, rhsVi); EIGEN_UNUSED_VARIABLE(lhsVi); } else { if (!RhsIsReal) { pger_common(accReal, lhsVi, rhsVi); pger_common(accImag, lhsV, rhsVi); } else { EIGEN_UNUSED_VARIABLE(rhsVi); } pger_common(accImag, lhsVi, rhsV); } } template EIGEN_ALWAYS_INLINE void pgerc(PacketBlock* accReal, PacketBlock* accImag, const Scalar* lhs_ptr, const Scalar* lhs_ptr_imag, const Packet* rhsV, const Packet* rhsVi) { Packet lhsV = ploadLhs(lhs_ptr); Packet lhsVi; if(!LhsIsReal) lhsVi = ploadLhs(lhs_ptr_imag); else EIGEN_UNUSED_VARIABLE(lhs_ptr_imag); pgerc_common(accReal, accImag, lhsV, lhsVi, rhsV, rhsVi); } template EIGEN_ALWAYS_INLINE void loadPacketRemaining(const Scalar* lhs_ptr, const Scalar* lhs_ptr_imag, Packet &lhsV, Packet &lhsVi, Index remaining_rows) { #ifdef _ARCH_PWR9 lhsV = vec_xl_len((Scalar *)lhs_ptr, remaining_rows * sizeof(Scalar)); if(!LhsIsReal) lhsVi = vec_xl_len((Scalar *)lhs_ptr_imag, remaining_rows * sizeof(Scalar)); else EIGEN_UNUSED_VARIABLE(lhs_ptr_imag); #else Index i = 0; do { lhsV[i] = lhs_ptr[i]; if(!LhsIsReal) lhsVi[i] = lhs_ptr_imag[i]; } while (++i < remaining_rows); if(LhsIsReal) EIGEN_UNUSED_VARIABLE(lhs_ptr_imag); #endif } template EIGEN_ALWAYS_INLINE void pgerc(PacketBlock* accReal, PacketBlock* accImag, const Scalar* lhs_ptr, const Scalar* lhs_ptr_imag, const Packet* rhsV, const Packet* rhsVi, Index remaining_rows) { Packet lhsV, lhsVi; loadPacketRemaining(lhs_ptr, lhs_ptr_imag, lhsV, lhsVi, remaining_rows); pgerc_common(accReal, accImag, lhsV, lhsVi, rhsV, rhsVi); } template EIGEN_ALWAYS_INLINE Packet ploadLhs(const Scalar* lhs) { return ploadu(lhs); } // Zero the accumulator on PacketBlock. template EIGEN_ALWAYS_INLINE void bsetzero(PacketBlock& acc) { acc.packet[0] = pset1((Scalar)0); acc.packet[1] = pset1((Scalar)0); acc.packet[2] = pset1((Scalar)0); acc.packet[3] = pset1((Scalar)0); } template EIGEN_ALWAYS_INLINE void bsetzero(PacketBlock& acc) { acc.packet[0] = pset1((Scalar)0); } // Scale the PacketBlock vectors by alpha. template EIGEN_ALWAYS_INLINE void bscale(PacketBlock& acc, PacketBlock& accZ, const Packet& pAlpha) { acc.packet[0] = pmadd(pAlpha, accZ.packet[0], acc.packet[0]); acc.packet[1] = pmadd(pAlpha, accZ.packet[1], acc.packet[1]); acc.packet[2] = pmadd(pAlpha, accZ.packet[2], acc.packet[2]); acc.packet[3] = pmadd(pAlpha, accZ.packet[3], acc.packet[3]); } template EIGEN_ALWAYS_INLINE void bscale(PacketBlock& acc, PacketBlock& accZ, const Packet& pAlpha) { acc.packet[0] = pmadd(pAlpha, accZ.packet[0], acc.packet[0]); } template EIGEN_ALWAYS_INLINE void bscalec_common(PacketBlock& acc, PacketBlock& accZ, const Packet& pAlpha) { acc.packet[0] = pmul(accZ.packet[0], pAlpha); acc.packet[1] = pmul(accZ.packet[1], pAlpha); acc.packet[2] = pmul(accZ.packet[2], pAlpha); acc.packet[3] = pmul(accZ.packet[3], pAlpha); } template EIGEN_ALWAYS_INLINE void bscalec_common(PacketBlock& acc, PacketBlock& accZ, const Packet& pAlpha) { acc.packet[0] = pmul(accZ.packet[0], pAlpha); } // Complex version of PacketBlock scaling. template EIGEN_ALWAYS_INLINE void bscalec(PacketBlock& aReal, PacketBlock& aImag, const Packet& bReal, const Packet& bImag, PacketBlock& cReal, PacketBlock& cImag) { bscalec_common(cReal, aReal, bReal); bscalec_common(cImag, aImag, bReal); pger_common(&cReal, bImag, aImag.packet); pger_common(&cImag, bImag, aReal.packet); } template EIGEN_ALWAYS_INLINE void band(PacketBlock& acc, const Packet& pMask) { acc.packet[0] = pand(acc.packet[0], pMask); acc.packet[1] = pand(acc.packet[1], pMask); acc.packet[2] = pand(acc.packet[2], pMask); acc.packet[3] = pand(acc.packet[3], pMask); } template EIGEN_ALWAYS_INLINE void bscalec(PacketBlock& aReal, PacketBlock& aImag, const Packet& bReal, const Packet& bImag, PacketBlock& cReal, PacketBlock& cImag, const Packet& pMask) { band(aReal, pMask); band(aImag, pMask); bscalec(aReal, aImag, bReal, bImag, cReal, cImag); } // Load a PacketBlock, the N parameters make tunning gemm easier so we can add more accumulators as needed. template EIGEN_ALWAYS_INLINE void bload(PacketBlock& acc, const DataMapper& res, Index row, Index col) { if (StorageOrder == RowMajor) { acc.packet[0] = res.template loadPacket(row + 0, col + N*accCols); acc.packet[1] = res.template loadPacket(row + 1, col + N*accCols); acc.packet[2] = res.template loadPacket(row + 2, col + N*accCols); acc.packet[3] = res.template loadPacket(row + 3, col + N*accCols); } else { acc.packet[0] = res.template loadPacket(row + N*accCols, col + 0); acc.packet[1] = res.template loadPacket(row + N*accCols, col + 1); acc.packet[2] = res.template loadPacket(row + N*accCols, col + 2); acc.packet[3] = res.template loadPacket(row + N*accCols, col + 3); } } // An overload of bload when you have a PacketBLock with 8 vectors. template EIGEN_ALWAYS_INLINE void bload(PacketBlock& acc, const DataMapper& res, Index row, Index col) { if (StorageOrder == RowMajor) { acc.packet[0] = res.template loadPacket(row + 0, col + N*accCols); acc.packet[1] = res.template loadPacket(row + 1, col + N*accCols); acc.packet[2] = res.template loadPacket(row + 2, col + N*accCols); acc.packet[3] = res.template loadPacket(row + 3, col + N*accCols); acc.packet[4] = res.template loadPacket(row + 0, col + (N+1)*accCols); acc.packet[5] = res.template loadPacket(row + 1, col + (N+1)*accCols); acc.packet[6] = res.template loadPacket(row + 2, col + (N+1)*accCols); acc.packet[7] = res.template loadPacket(row + 3, col + (N+1)*accCols); } else { acc.packet[0] = res.template loadPacket(row + N*accCols, col + 0); acc.packet[1] = res.template loadPacket(row + N*accCols, col + 1); acc.packet[2] = res.template loadPacket(row + N*accCols, col + 2); acc.packet[3] = res.template loadPacket(row + N*accCols, col + 3); acc.packet[4] = res.template loadPacket(row + (N+1)*accCols, col + 0); acc.packet[5] = res.template loadPacket(row + (N+1)*accCols, col + 1); acc.packet[6] = res.template loadPacket(row + (N+1)*accCols, col + 2); acc.packet[7] = res.template loadPacket(row + (N+1)*accCols, col + 3); } } template EIGEN_ALWAYS_INLINE void bload(PacketBlock& acc, const DataMapper& res, Index row, Index col) { acc.packet[0] = res.template loadPacket(row + N*accCols, col + 0); acc.packet[1] = res.template loadPacket(row + (N+1)*accCols, col + 0); } const static Packet4i mask41 = { -1, 0, 0, 0 }; const static Packet4i mask42 = { -1, -1, 0, 0 }; const static Packet4i mask43 = { -1, -1, -1, 0 }; const static Packet2l mask21 = { -1, 0 }; template EIGEN_ALWAYS_INLINE Packet bmask(const int remaining_rows) { if (remaining_rows == 0) { return pset1(float(0.0)); // Not used } else { switch (remaining_rows) { case 1: return Packet(mask41); case 2: return Packet(mask42); default: return Packet(mask43); } } } template<> EIGEN_ALWAYS_INLINE Packet2d bmask(const int remaining_rows) { if (remaining_rows == 0) { return pset1(double(0.0)); // Not used } else { return Packet2d(mask21); } } template EIGEN_ALWAYS_INLINE void bscale(PacketBlock& acc, PacketBlock& accZ, const Packet& pAlpha, const Packet& pMask) { band(accZ, pMask); bscale(acc, accZ, pAlpha); } template EIGEN_ALWAYS_INLINE void pbroadcast4_old(const __UNPACK_TYPE__(Packet)* a, Packet& a0, Packet& a1, Packet& a2, Packet& a3) { pbroadcast4(a, a0, a1, a2, a3); } template<> EIGEN_ALWAYS_INLINE void pbroadcast4_old(const double* a, Packet2d& a0, Packet2d& a1, Packet2d& a2, Packet2d& a3) { a1 = pload(a); a3 = pload(a + 2); a0 = vec_splat(a1, 0); a1 = vec_splat(a1, 1); a2 = vec_splat(a3, 0); a3 = vec_splat(a3, 1); } // PEEL loop factor. #define PEEL 7 template EIGEN_ALWAYS_INLINE void MICRO_EXTRA_COL( const Scalar* &lhs_ptr, const Scalar* &rhs_ptr, PacketBlock &accZero, Index remaining_rows, Index remaining_cols) { Packet rhsV[1]; rhsV[0] = pset1(rhs_ptr[0]); pger<1,Scalar, Packet, false>(&accZero, lhs_ptr, rhsV); lhs_ptr += remaining_rows; rhs_ptr += remaining_cols; } template EIGEN_STRONG_INLINE void gemm_extra_col( const DataMapper& res, const Scalar* lhs_base, const Scalar* rhs_base, Index depth, Index strideA, Index offsetA, Index row, Index col, Index remaining_rows, Index remaining_cols, const Packet& pAlpha) { const Scalar* rhs_ptr = rhs_base; const Scalar* lhs_ptr = lhs_base + row*strideA + remaining_rows*offsetA; PacketBlock accZero; bsetzero(accZero); Index remaining_depth = (depth & -accRows); Index k = 0; for(; k + PEEL <= remaining_depth; k+= PEEL) { EIGEN_POWER_PREFETCH(rhs_ptr); EIGEN_POWER_PREFETCH(lhs_ptr); for (int l = 0; l < PEEL; l++) { MICRO_EXTRA_COL(lhs_ptr, rhs_ptr, accZero, remaining_rows, remaining_cols); } } for(; k < remaining_depth; k++) { MICRO_EXTRA_COL(lhs_ptr, rhs_ptr, accZero, remaining_rows, remaining_cols); } for(; k < depth; k++) { Packet rhsV[1]; rhsV[0] = pset1(rhs_ptr[0]); pger<1, Scalar, Packet, Index, false>(&accZero, lhs_ptr, rhsV, remaining_rows); lhs_ptr += remaining_rows; rhs_ptr += remaining_cols; } accZero.packet[0] = vec_mul(pAlpha, accZero.packet[0]); for(Index i = 0; i < remaining_rows; i++) { res(row + i, col) += accZero.packet[0][i]; } } template EIGEN_ALWAYS_INLINE void MICRO_EXTRA_ROW( const Scalar* &lhs_ptr, const Scalar* &rhs_ptr, PacketBlock &accZero, Index remaining_rows) { Packet rhsV[4]; pbroadcast4(rhs_ptr, rhsV[0], rhsV[1], rhsV[2], rhsV[3]); pger<4, Scalar, Packet, false>(&accZero, lhs_ptr, rhsV); lhs_ptr += remaining_rows; rhs_ptr += accRows; } template EIGEN_STRONG_INLINE void gemm_extra_row( const DataMapper& res, const Scalar* lhs_base, const Scalar* rhs_base, Index depth, Index strideA, Index offsetA, Index row, Index col, Index rows, Index cols, Index remaining_rows, const Packet& pAlpha, const Packet& pMask) { const Scalar* rhs_ptr = rhs_base; const Scalar* lhs_ptr = lhs_base + row*strideA + remaining_rows*offsetA; PacketBlock accZero, acc; bsetzero(accZero); Index remaining_depth = (col + accRows < cols) ? depth : (depth & -accRows); Index k = 0; for(; k + PEEL <= remaining_depth; k+= PEEL) { EIGEN_POWER_PREFETCH(rhs_ptr); EIGEN_POWER_PREFETCH(lhs_ptr); for (int l = 0; l < PEEL; l++) { MICRO_EXTRA_ROW(lhs_ptr, rhs_ptr, accZero, remaining_rows); } } for(; k < remaining_depth; k++) { MICRO_EXTRA_ROW(lhs_ptr, rhs_ptr, accZero, remaining_rows); } if ((remaining_depth == depth) && (rows >= accCols)) { for(Index j = 0; j < 4; j++) { acc.packet[j] = res.template loadPacket(row, col + j); } bscale(acc, accZero, pAlpha, pMask); res.template storePacketBlock(row, col, acc); } else { for(; k < depth; k++) { Packet rhsV[4]; pbroadcast4(rhs_ptr, rhsV[0], rhsV[1], rhsV[2], rhsV[3]); pger<4, Scalar, Packet, Index, false>(&accZero, lhs_ptr, rhsV, remaining_rows); lhs_ptr += remaining_rows; rhs_ptr += accRows; } for(Index j = 0; j < 4; j++) { accZero.packet[j] = vec_mul(pAlpha, accZero.packet[j]); } for(Index j = 0; j < 4; j++) { for(Index i = 0; i < remaining_rows; i++) { res(row + i, col + j) += accZero.packet[j][i]; } } } } #define MICRO_UNROLL(func) \ func(0) func(1) func(2) func(3) func(4) func(5) func(6) func(7) #define MICRO_UNROLL_WORK(func, func2, peel) \ MICRO_UNROLL(func2); \ func(0,peel) func(1,peel) func(2,peel) func(3,peel) \ func(4,peel) func(5,peel) func(6,peel) func(7,peel) #define MICRO_LOAD_ONE(iter) \ if (unroll_factor > iter) { \ lhsV##iter = ploadLhs(lhs_ptr##iter); \ lhs_ptr##iter += accCols; \ } else { \ EIGEN_UNUSED_VARIABLE(lhsV##iter); \ } #define MICRO_WORK_ONE(iter, peel) \ if (unroll_factor > iter) { \ pger_common(&accZero##iter, lhsV##iter, rhsV##peel); \ } #define MICRO_TYPE_PEEL4(func, func2, peel) \ if (PEEL > peel) { \ Packet lhsV0, lhsV1, lhsV2, lhsV3, lhsV4, lhsV5, lhsV6, lhsV7; \ pbroadcast4(rhs_ptr + (accRows * peel), rhsV##peel[0], rhsV##peel[1], rhsV##peel[2], rhsV##peel[3]); \ MICRO_UNROLL_WORK(func, func2, peel) \ } else { \ EIGEN_UNUSED_VARIABLE(rhsV##peel); \ } #define MICRO_TYPE_PEEL1(func, func2, peel) \ if (PEEL > peel) { \ Packet lhsV0, lhsV1, lhsV2, lhsV3, lhsV4, lhsV5, lhsV6, lhsV7; \ rhsV##peel[0] = pset1(rhs_ptr[remaining_cols * peel]); \ MICRO_UNROLL_WORK(func, func2, peel) \ } else { \ EIGEN_UNUSED_VARIABLE(rhsV##peel); \ } #define MICRO_UNROLL_TYPE_PEEL(M, func, func1, func2) \ Packet rhsV0[M], rhsV1[M], rhsV2[M], rhsV3[M], rhsV4[M], rhsV5[M], rhsV6[M], rhsV7[M], rhsV8[M], rhsV9[M]; \ func(func1,func2,0); func(func1,func2,1); \ func(func1,func2,2); func(func1,func2,3); \ func(func1,func2,4); func(func1,func2,5); \ func(func1,func2,6); func(func1,func2,7); \ func(func1,func2,8); func(func1,func2,9); #define MICRO_UNROLL_TYPE_ONE(M, func, func1, func2) \ Packet rhsV0[M]; \ func(func1,func2,0); #define MICRO_ONE_PEEL4 \ MICRO_UNROLL_TYPE_PEEL(4, MICRO_TYPE_PEEL4, MICRO_WORK_ONE, MICRO_LOAD_ONE); \ rhs_ptr += (accRows * PEEL); #define MICRO_ONE4 \ MICRO_UNROLL_TYPE_ONE(4, MICRO_TYPE_PEEL4, MICRO_WORK_ONE, MICRO_LOAD_ONE); \ rhs_ptr += accRows; #define MICRO_ONE_PEEL1 \ MICRO_UNROLL_TYPE_PEEL(1, MICRO_TYPE_PEEL1, MICRO_WORK_ONE, MICRO_LOAD_ONE); \ rhs_ptr += (remaining_cols * PEEL); #define MICRO_ONE1 \ MICRO_UNROLL_TYPE_ONE(1, MICRO_TYPE_PEEL1, MICRO_WORK_ONE, MICRO_LOAD_ONE); \ rhs_ptr += remaining_cols; #define MICRO_DST_PTR_ONE(iter) \ if (unroll_factor > iter) { \ bsetzero(accZero##iter); \ } else { \ EIGEN_UNUSED_VARIABLE(accZero##iter); \ } #define MICRO_DST_PTR MICRO_UNROLL(MICRO_DST_PTR_ONE) #define MICRO_SRC_PTR_ONE(iter) \ if (unroll_factor > iter) { \ lhs_ptr##iter = lhs_base + ( (row/accCols) + iter )*strideA*accCols + accCols*offsetA; \ } else { \ EIGEN_UNUSED_VARIABLE(lhs_ptr##iter); \ } #define MICRO_SRC_PTR MICRO_UNROLL(MICRO_SRC_PTR_ONE) #define MICRO_PREFETCH_ONE(iter) \ if (unroll_factor > iter) { \ EIGEN_POWER_PREFETCH(lhs_ptr##iter); \ } #define MICRO_PREFETCH MICRO_UNROLL(MICRO_PREFETCH_ONE) #define MICRO_STORE_ONE(iter) \ if (unroll_factor > iter) { \ acc.packet[0] = res.template loadPacket(row + iter*accCols, col + 0); \ acc.packet[1] = res.template loadPacket(row + iter*accCols, col + 1); \ acc.packet[2] = res.template loadPacket(row + iter*accCols, col + 2); \ acc.packet[3] = res.template loadPacket(row + iter*accCols, col + 3); \ bscale(acc, accZero##iter, pAlpha); \ res.template storePacketBlock(row + iter*accCols, col, acc); \ } #define MICRO_STORE MICRO_UNROLL(MICRO_STORE_ONE) #define MICRO_COL_STORE_ONE(iter) \ if (unroll_factor > iter) { \ acc.packet[0] = res.template loadPacket(row + iter*accCols, col + 0); \ bscale(acc, accZero##iter, pAlpha); \ res.template storePacketBlock(row + iter*accCols, col, acc); \ } #define MICRO_COL_STORE MICRO_UNROLL(MICRO_COL_STORE_ONE) template EIGEN_STRONG_INLINE void gemm_unrolled_iteration( const DataMapper& res, const Scalar* lhs_base, const Scalar* rhs_base, Index depth, Index strideA, Index offsetA, Index& row, Index col, const Packet& pAlpha) { const Scalar* rhs_ptr = rhs_base; const Scalar* lhs_ptr0 = NULL, * lhs_ptr1 = NULL, * lhs_ptr2 = NULL, * lhs_ptr3 = NULL, * lhs_ptr4 = NULL, * lhs_ptr5 = NULL, * lhs_ptr6 = NULL, * lhs_ptr7 = NULL; PacketBlock accZero0, accZero1, accZero2, accZero3, accZero4, accZero5, accZero6, accZero7; PacketBlock acc; MICRO_SRC_PTR MICRO_DST_PTR Index k = 0; for(; k + PEEL <= depth; k+= PEEL) { EIGEN_POWER_PREFETCH(rhs_ptr); MICRO_PREFETCH MICRO_ONE_PEEL4 } for(; k < depth; k++) { MICRO_ONE4 } MICRO_STORE row += unroll_factor*accCols; } template EIGEN_STRONG_INLINE void gemm_unrolled_col_iteration( const DataMapper& res, const Scalar* lhs_base, const Scalar* rhs_base, Index depth, Index strideA, Index offsetA, Index& row, Index col, Index remaining_cols, const Packet& pAlpha) { const Scalar* rhs_ptr = rhs_base; const Scalar* lhs_ptr0 = NULL, * lhs_ptr1 = NULL, * lhs_ptr2 = NULL, * lhs_ptr3 = NULL, * lhs_ptr4 = NULL, * lhs_ptr5 = NULL, * lhs_ptr6 = NULL, *lhs_ptr7 = NULL; PacketBlock accZero0, accZero1, accZero2, accZero3, accZero4, accZero5, accZero6, accZero7; PacketBlock acc; MICRO_SRC_PTR MICRO_DST_PTR Index k = 0; for(; k + PEEL <= depth; k+= PEEL) { EIGEN_POWER_PREFETCH(rhs_ptr); MICRO_PREFETCH MICRO_ONE_PEEL1 } for(; k < depth; k++) { MICRO_ONE1 } MICRO_COL_STORE row += unroll_factor*accCols; } template EIGEN_STRONG_INLINE void gemm_unrolled_col( const DataMapper& res, const Scalar* lhs_base, const Scalar* rhs_base, Index depth, Index strideA, Index offsetA, Index& row, Index rows, Index col, Index remaining_cols, const Packet& pAlpha) { #define MAX_UNROLL 6 while(row + MAX_UNROLL*accCols <= rows) { gemm_unrolled_col_iteration(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, remaining_cols, pAlpha); } switch( (rows-row)/accCols ) { #if MAX_UNROLL > 7 case 7: gemm_unrolled_col_iteration<7, Scalar, Packet, DataMapper, Index, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, remaining_cols, pAlpha); break; #endif #if MAX_UNROLL > 6 case 6: gemm_unrolled_col_iteration<6, Scalar, Packet, DataMapper, Index, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, remaining_cols, pAlpha); break; #endif #if MAX_UNROLL > 5 case 5: gemm_unrolled_col_iteration<5, Scalar, Packet, DataMapper, Index, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, remaining_cols, pAlpha); break; #endif #if MAX_UNROLL > 4 case 4: gemm_unrolled_col_iteration<4, Scalar, Packet, DataMapper, Index, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, remaining_cols, pAlpha); break; #endif #if MAX_UNROLL > 3 case 3: gemm_unrolled_col_iteration<3, Scalar, Packet, DataMapper, Index, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, remaining_cols, pAlpha); break; #endif #if MAX_UNROLL > 2 case 2: gemm_unrolled_col_iteration<2, Scalar, Packet, DataMapper, Index, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, remaining_cols, pAlpha); break; #endif #if MAX_UNROLL > 1 case 1: gemm_unrolled_col_iteration<1, Scalar, Packet, DataMapper, Index, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, remaining_cols, pAlpha); break; #endif default: break; } #undef MAX_UNROLL } /**************** * GEMM kernels * * **************/ template EIGEN_STRONG_INLINE void gemm(const DataMapper& res, const Scalar* blockA, const Scalar* blockB, Index rows, Index depth, Index cols, Scalar alpha, Index strideA, Index strideB, Index offsetA, Index offsetB) { const Index remaining_rows = rows % accCols; const Index remaining_cols = cols % accRows; if( strideA == -1 ) strideA = depth; if( strideB == -1 ) strideB = depth; const Packet pAlpha = pset1(alpha); const Packet pMask = bmask((const int)(remaining_rows)); Index col = 0; for(; col + accRows <= cols; col += accRows) { const Scalar* rhs_base = blockB + col*strideB + accRows*offsetB; const Scalar* lhs_base = blockA; Index row = 0; #define MAX_UNROLL 6 while(row + MAX_UNROLL*accCols <= rows) { gemm_unrolled_iteration(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, pAlpha); } switch( (rows-row)/accCols ) { #if MAX_UNROLL > 7 case 7: gemm_unrolled_iteration<7, Scalar, Packet, DataMapper, Index, accRows, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, pAlpha); break; #endif #if MAX_UNROLL > 6 case 6: gemm_unrolled_iteration<6, Scalar, Packet, DataMapper, Index, accRows, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, pAlpha); break; #endif #if MAX_UNROLL > 5 case 5: gemm_unrolled_iteration<5, Scalar, Packet, DataMapper, Index, accRows, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, pAlpha); break; #endif #if MAX_UNROLL > 4 case 4: gemm_unrolled_iteration<4, Scalar, Packet, DataMapper, Index, accRows, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, pAlpha); break; #endif #if MAX_UNROLL > 3 case 3: gemm_unrolled_iteration<3, Scalar, Packet, DataMapper, Index, accRows, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, pAlpha); break; #endif #if MAX_UNROLL > 2 case 2: gemm_unrolled_iteration<2, Scalar, Packet, DataMapper, Index, accRows, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, pAlpha); break; #endif #if MAX_UNROLL > 1 case 1: gemm_unrolled_iteration<1, Scalar, Packet, DataMapper, Index, accRows, accCols>(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, pAlpha); break; #endif default: break; } #undef MAX_UNROLL if(remaining_rows > 0) { gemm_extra_row(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, rows, cols, remaining_rows, pAlpha, pMask); } } if(remaining_cols > 0) { const Scalar* rhs_base = blockB + col*strideB + remaining_cols*offsetB; const Scalar* lhs_base = blockA; for(; col < cols; col++) { Index row = 0; gemm_unrolled_col(res, lhs_base, rhs_base, depth, strideA, offsetA, row, rows, col, remaining_cols, pAlpha); if (remaining_rows > 0) { gemm_extra_col(res, lhs_base, rhs_base, depth, strideA, offsetA, row, col, remaining_rows, remaining_cols, pAlpha); } rhs_base++; } } } #define accColsC (accCols / 2) #define advanceRows ((LhsIsReal) ? 1 : 2) #define advanceCols ((RhsIsReal) ? 1 : 2) // PEEL_COMPLEX loop factor. #define PEEL_COMPLEX 3 template EIGEN_ALWAYS_INLINE void MICRO_COMPLEX_EXTRA_COL( const Scalar* &lhs_ptr_real, const Scalar* &lhs_ptr_imag, const Scalar* &rhs_ptr_real, const Scalar* &rhs_ptr_imag, PacketBlock &accReal, PacketBlock &accImag, Index remaining_rows, Index remaining_cols) { Packet rhsV[1], rhsVi[1]; rhsV[0] = pset1(rhs_ptr_real[0]); if(!RhsIsReal) rhsVi[0] = pset1(rhs_ptr_imag[0]); pgerc<1, Scalar, Packet, ConjugateLhs, ConjugateRhs, LhsIsReal, RhsIsReal>(&accReal, &accImag, lhs_ptr_real, lhs_ptr_imag, rhsV, rhsVi); lhs_ptr_real += remaining_rows; if(!LhsIsReal) lhs_ptr_imag += remaining_rows; else EIGEN_UNUSED_VARIABLE(lhs_ptr_imag); rhs_ptr_real += remaining_cols; if(!RhsIsReal) rhs_ptr_imag += remaining_cols; else EIGEN_UNUSED_VARIABLE(rhs_ptr_imag); } template EIGEN_STRONG_INLINE void gemm_complex_extra_col( const DataMapper& res, const Scalar* lhs_base, const Scalar* rhs_base, Index depth, Index strideA, Index offsetA, Index strideB, Index row, Index col, Index remaining_rows, Index remaining_cols, const Packet& pAlphaReal, const Packet& pAlphaImag) { const Scalar* rhs_ptr_real = rhs_base; const Scalar* rhs_ptr_imag; if(!RhsIsReal) rhs_ptr_imag = rhs_base + remaining_cols*strideB; else EIGEN_UNUSED_VARIABLE(rhs_ptr_imag); const Scalar* lhs_ptr_real = lhs_base + advanceRows*row*strideA + remaining_rows*offsetA; const Scalar* lhs_ptr_imag; if(!LhsIsReal) lhs_ptr_imag = lhs_ptr_real + remaining_rows*strideA; else EIGEN_UNUSED_VARIABLE(lhs_ptr_imag); PacketBlock accReal, accImag; PacketBlock taccReal, taccImag; PacketBlock acc0, acc1; bsetzero(accReal); bsetzero(accImag); Index remaining_depth = (depth & -accRows); Index k = 0; for(; k + PEEL_COMPLEX <= remaining_depth; k+= PEEL_COMPLEX) { EIGEN_POWER_PREFETCH(rhs_ptr_real); if(!RhsIsReal) { EIGEN_POWER_PREFETCH(rhs_ptr_imag); } EIGEN_POWER_PREFETCH(lhs_ptr_real); if(!LhsIsReal) { EIGEN_POWER_PREFETCH(lhs_ptr_imag); } for (int l = 0; l < PEEL_COMPLEX; l++) { MICRO_COMPLEX_EXTRA_COL(lhs_ptr_real, lhs_ptr_imag, rhs_ptr_real, rhs_ptr_imag, accReal, accImag, remaining_rows, remaining_cols); } } for(; k < remaining_depth; k++) { MICRO_COMPLEX_EXTRA_COL(lhs_ptr_real, lhs_ptr_imag, rhs_ptr_real, rhs_ptr_imag, accReal, accImag, remaining_rows, remaining_cols); } for(; k < depth; k++) { Packet rhsV[1], rhsVi[1]; rhsV[0] = pset1(rhs_ptr_real[0]); if(!RhsIsReal) rhsVi[0] = pset1(rhs_ptr_imag[0]); pgerc<1, Scalar, Packet, Index, ConjugateLhs, ConjugateRhs, LhsIsReal, RhsIsReal>(&accReal, &accImag, lhs_ptr_real, lhs_ptr_imag, rhsV, rhsVi, remaining_rows); lhs_ptr_real += remaining_rows; if(!LhsIsReal) lhs_ptr_imag += remaining_rows; rhs_ptr_real += remaining_cols; if(!RhsIsReal) rhs_ptr_imag += remaining_cols; } bscalec(accReal, accImag, pAlphaReal, pAlphaImag, taccReal, taccImag); bcouple_common(taccReal, taccImag, acc0, acc1); if ((sizeof(Scalar) == sizeof(float)) && (remaining_rows == 1)) { res(row + 0, col + 0) += pfirst(acc0.packet[0]); } else { acc0.packet[0] += res.template loadPacket(row + 0, col + 0); res.template storePacketBlock(row + 0, col + 0, acc0); if(remaining_rows > accColsC) { res(row + accColsC, col + 0) += pfirst(acc1.packet[0]); } } } template EIGEN_ALWAYS_INLINE void MICRO_COMPLEX_EXTRA_ROW( const Scalar* &lhs_ptr_real, const Scalar* &lhs_ptr_imag, const Scalar* &rhs_ptr_real, const Scalar* &rhs_ptr_imag, PacketBlock &accReal, PacketBlock &accImag, Index remaining_rows) { Packet rhsV[4], rhsVi[4]; pbroadcast4_old(rhs_ptr_real, rhsV[0], rhsV[1], rhsV[2], rhsV[3]); if(!RhsIsReal) pbroadcast4_old(rhs_ptr_imag, rhsVi[0], rhsVi[1], rhsVi[2], rhsVi[3]); pgerc<4, Scalar, Packet, ConjugateLhs, ConjugateRhs, LhsIsReal, RhsIsReal>(&accReal, &accImag, lhs_ptr_real, lhs_ptr_imag, rhsV, rhsVi); lhs_ptr_real += remaining_rows; if(!LhsIsReal) lhs_ptr_imag += remaining_rows; else EIGEN_UNUSED_VARIABLE(lhs_ptr_imag); rhs_ptr_real += accRows; if(!RhsIsReal) rhs_ptr_imag += accRows; else EIGEN_UNUSED_VARIABLE(rhs_ptr_imag); } template EIGEN_STRONG_INLINE void gemm_complex_extra_row( const DataMapper& res, const Scalar* lhs_base, const Scalar* rhs_base, Index depth, Index strideA, Index offsetA, Index strideB, Index row, Index col, Index rows, Index cols, Index remaining_rows, const Packet& pAlphaReal, const Packet& pAlphaImag, const Packet& pMask) { const Scalar* rhs_ptr_real = rhs_base; const Scalar* rhs_ptr_imag; if(!RhsIsReal) rhs_ptr_imag = rhs_base + accRows*strideB; else EIGEN_UNUSED_VARIABLE(rhs_ptr_imag); const Scalar* lhs_ptr_real = lhs_base + advanceRows*row*strideA + remaining_rows*offsetA; const Scalar* lhs_ptr_imag; if(!LhsIsReal) lhs_ptr_imag = lhs_ptr_real + remaining_rows*strideA; else EIGEN_UNUSED_VARIABLE(lhs_ptr_imag); PacketBlock accReal, accImag; PacketBlock taccReal, taccImag; PacketBlock acc0, acc1; PacketBlock tRes; bsetzero(accReal); bsetzero(accImag); Index remaining_depth = (col + accRows < cols) ? depth : (depth & -accRows); Index k = 0; for(; k + PEEL_COMPLEX <= remaining_depth; k+= PEEL_COMPLEX) { EIGEN_POWER_PREFETCH(rhs_ptr_real); if(!RhsIsReal) { EIGEN_POWER_PREFETCH(rhs_ptr_imag); } EIGEN_POWER_PREFETCH(lhs_ptr_real); if(!LhsIsReal) { EIGEN_POWER_PREFETCH(lhs_ptr_imag); } for (int l = 0; l < PEEL_COMPLEX; l++) { MICRO_COMPLEX_EXTRA_ROW(lhs_ptr_real, lhs_ptr_imag, rhs_ptr_real, rhs_ptr_imag, accReal, accImag, remaining_rows); } } for(; k < remaining_depth; k++) { MICRO_COMPLEX_EXTRA_ROW(lhs_ptr_real, lhs_ptr_imag, rhs_ptr_real, rhs_ptr_imag, accReal, accImag, remaining_rows); } if ((remaining_depth == depth) && (rows >= accCols)) { bload(tRes, res, row, col); bscalec(accReal, accImag, pAlphaReal, pAlphaImag, taccReal, taccImag, pMask); bcouple(taccReal, taccImag, tRes, acc0, acc1); res.template storePacketBlock(row + 0, col, acc0); res.template storePacketBlock(row + accColsC, col, acc1); } else { for(; k < depth; k++) { Packet rhsV[4], rhsVi[4]; pbroadcast4_old(rhs_ptr_real, rhsV[0], rhsV[1], rhsV[2], rhsV[3]); if(!RhsIsReal) pbroadcast4_old(rhs_ptr_imag, rhsVi[0], rhsVi[1], rhsVi[2], rhsVi[3]); pgerc<4, Scalar, Packet, Index, ConjugateLhs, ConjugateRhs, LhsIsReal, RhsIsReal>(&accReal, &accImag, lhs_ptr_real, lhs_ptr_imag, rhsV, rhsVi, remaining_rows); lhs_ptr_real += remaining_rows; if(!LhsIsReal) lhs_ptr_imag += remaining_rows; rhs_ptr_real += accRows; if(!RhsIsReal) rhs_ptr_imag += accRows; } bscalec(accReal, accImag, pAlphaReal, pAlphaImag, taccReal, taccImag); bcouple_common(taccReal, taccImag, acc0, acc1); if ((sizeof(Scalar) == sizeof(float)) && (remaining_rows == 1)) { for(Index j = 0; j < 4; j++) { res(row + 0, col + j) += pfirst(acc0.packet[j]); } } else { for(Index j = 0; j < 4; j++) { PacketBlock acc2; acc2.packet[0] = res.template loadPacket(row + 0, col + j) + acc0.packet[j]; res.template storePacketBlock(row + 0, col + j, acc2); if(remaining_rows > accColsC) { res(row + accColsC, col + j) += pfirst(acc1.packet[j]); } } } } } #define MICRO_COMPLEX_UNROLL(func) \ func(0) func(1) func(2) func(3) func(4) #define MICRO_COMPLEX_UNROLL_WORK(func, func2, peel) \ MICRO_COMPLEX_UNROLL(func2); \ func(0,peel) func(1,peel) func(2,peel) func(3,peel) func(4,peel) #define MICRO_COMPLEX_LOAD_ONE(iter) \ if (unroll_factor > iter) { \ lhsV##iter = ploadLhs(lhs_ptr_real##iter); \ lhs_ptr_real##iter += accCols; \ if(!LhsIsReal) { \ lhsVi##iter = ploadLhs(lhs_ptr_imag##iter); \ lhs_ptr_imag##iter += accCols; \ } else { \ EIGEN_UNUSED_VARIABLE(lhsVi##iter); \ } \ } else { \ EIGEN_UNUSED_VARIABLE(lhsV##iter); \ EIGEN_UNUSED_VARIABLE(lhsVi##iter); \ } #define MICRO_COMPLEX_WORK_ONE4(iter, peel) \ if (unroll_factor > iter) { \ pgerc_common<4, Packet, ConjugateLhs, ConjugateRhs, LhsIsReal, RhsIsReal>(&accReal##iter, &accImag##iter, lhsV##iter, lhsVi##iter, rhsV##peel, rhsVi##peel); \ } #define MICRO_COMPLEX_WORK_ONE1(iter, peel) \ if (unroll_factor > iter) { \ pgerc_common<1, Packet, ConjugateLhs, ConjugateRhs, LhsIsReal, RhsIsReal>(&accReal##iter, &accImag##iter, lhsV##iter, lhsVi##iter, rhsV##peel, rhsVi##peel); \ } #define MICRO_COMPLEX_TYPE_PEEL4(func, func2, peel) \ if (PEEL_COMPLEX > peel) { \ Packet lhsV0, lhsV1, lhsV2, lhsV3, lhsV4; \ Packet lhsVi0, lhsVi1, lhsVi2, lhsVi3, lhsVi4; \ pbroadcast4_old(rhs_ptr_real + (accRows * peel), rhsV##peel[0], rhsV##peel[1], rhsV##peel[2], rhsV##peel[3]); \ if(!RhsIsReal) { \ pbroadcast4_old(rhs_ptr_imag + (accRows * peel), rhsVi##peel[0], rhsVi##peel[1], rhsVi##peel[2], rhsVi##peel[3]); \ } else { \ EIGEN_UNUSED_VARIABLE(rhsVi##peel); \ } \ MICRO_COMPLEX_UNROLL_WORK(func, func2, peel) \ } else { \ EIGEN_UNUSED_VARIABLE(rhsV##peel); \ EIGEN_UNUSED_VARIABLE(rhsVi##peel); \ } #define MICRO_COMPLEX_TYPE_PEEL1(func, func2, peel) \ if (PEEL_COMPLEX > peel) { \ Packet lhsV0, lhsV1, lhsV2, lhsV3, lhsV4; \ Packet lhsVi0, lhsVi1, lhsVi2, lhsVi3, lhsVi4; \ rhsV##peel[0] = pset1(rhs_ptr_real[remaining_cols * peel]); \ if(!RhsIsReal) { \ rhsVi##peel[0] = pset1(rhs_ptr_imag[remaining_cols * peel]); \ } else { \ EIGEN_UNUSED_VARIABLE(rhsVi##peel); \ } \ MICRO_COMPLEX_UNROLL_WORK(func, func2, peel) \ } else { \ EIGEN_UNUSED_VARIABLE(rhsV##peel); \ EIGEN_UNUSED_VARIABLE(rhsVi##peel); \ } #define MICRO_COMPLEX_UNROLL_TYPE_PEEL(M, func, func1, func2) \ Packet rhsV0[M], rhsV1[M], rhsV2[M], rhsV3[M], rhsV4[M], rhsV5[M], rhsV6[M], rhsV7[M], rhsV8[M], rhsV9[M]; \ Packet rhsVi0[M], rhsVi1[M], rhsVi2[M], rhsVi3[M], rhsVi4[M], rhsVi5[M], rhsVi6[M], rhsVi7[M], rhsVi8[M], rhsVi9[M]; \ func(func1,func2,0); func(func1,func2,1); \ func(func1,func2,2); func(func1,func2,3); \ func(func1,func2,4); func(func1,func2,5); \ func(func1,func2,6); func(func1,func2,7); \ func(func1,func2,8); func(func1,func2,9); #define MICRO_COMPLEX_UNROLL_TYPE_ONE(M, func, func1, func2) \ Packet rhsV0[M], rhsVi0[M];\ func(func1,func2,0); #define MICRO_COMPLEX_ONE_PEEL4 \ MICRO_COMPLEX_UNROLL_TYPE_PEEL(4, MICRO_COMPLEX_TYPE_PEEL4, MICRO_COMPLEX_WORK_ONE4, MICRO_COMPLEX_LOAD_ONE); \ rhs_ptr_real += (accRows * PEEL_COMPLEX); \ if(!RhsIsReal) rhs_ptr_imag += (accRows * PEEL_COMPLEX); #define MICRO_COMPLEX_ONE4 \ MICRO_COMPLEX_UNROLL_TYPE_ONE(4, MICRO_COMPLEX_TYPE_PEEL4, MICRO_COMPLEX_WORK_ONE4, MICRO_COMPLEX_LOAD_ONE); \ rhs_ptr_real += accRows; \ if(!RhsIsReal) rhs_ptr_imag += accRows; #define MICRO_COMPLEX_ONE_PEEL1 \ MICRO_COMPLEX_UNROLL_TYPE_PEEL(1, MICRO_COMPLEX_TYPE_PEEL1, MICRO_COMPLEX_WORK_ONE1, MICRO_COMPLEX_LOAD_ONE); \ rhs_ptr_real += (remaining_cols * PEEL_COMPLEX); \ if(!RhsIsReal) rhs_ptr_imag += (remaining_cols * PEEL_COMPLEX); #define MICRO_COMPLEX_ONE1 \ MICRO_COMPLEX_UNROLL_TYPE_ONE(1, MICRO_COMPLEX_TYPE_PEEL1, MICRO_COMPLEX_WORK_ONE1, MICRO_COMPLEX_LOAD_ONE); \ rhs_ptr_real += remaining_cols; \ if(!RhsIsReal) rhs_ptr_imag += remaining_cols; #define MICRO_COMPLEX_DST_PTR_ONE(iter) \ if (unroll_factor > iter) { \ bsetzero(accReal##iter); \ bsetzero(accImag##iter); \ } else { \ EIGEN_UNUSED_VARIABLE(accReal##iter); \ EIGEN_UNUSED_VARIABLE(accImag##iter); \ } #define MICRO_COMPLEX_DST_PTR MICRO_COMPLEX_UNROLL(MICRO_COMPLEX_DST_PTR_ONE) #define MICRO_COMPLEX_SRC_PTR_ONE(iter) \ if (unroll_factor > iter) { \ lhs_ptr_real##iter = lhs_base + ( ((advanceRows*row)/accCols) + iter*advanceRows )*strideA*accCols + accCols*offsetA; \ if(!LhsIsReal) { \ lhs_ptr_imag##iter = lhs_ptr_real##iter + accCols*strideA; \ } else { \ EIGEN_UNUSED_VARIABLE(lhs_ptr_imag##iter); \ } \ } else { \ EIGEN_UNUSED_VARIABLE(lhs_ptr_real##iter); \ EIGEN_UNUSED_VARIABLE(lhs_ptr_imag##iter); \ } #define MICRO_COMPLEX_SRC_PTR MICRO_COMPLEX_UNROLL(MICRO_COMPLEX_SRC_PTR_ONE) #define MICRO_COMPLEX_PREFETCH_ONE(iter) \ if (unroll_factor > iter) { \ EIGEN_POWER_PREFETCH(lhs_ptr_real##iter); \ if(!LhsIsReal) { \ EIGEN_POWER_PREFETCH(lhs_ptr_imag##iter); \ } \ } #define MICRO_COMPLEX_PREFETCH MICRO_COMPLEX_UNROLL(MICRO_COMPLEX_PREFETCH_ONE) #define MICRO_COMPLEX_STORE_ONE(iter) \ if (unroll_factor > iter) { \ bload(tRes, res, row + iter*accCols, col); \ bscalec(accReal##iter, accImag##iter, pAlphaReal, pAlphaImag, taccReal, taccImag); \ bcouple(taccReal, taccImag, tRes, acc0, acc1); \ res.template storePacketBlock(row + iter*accCols + 0, col, acc0); \ res.template storePacketBlock(row + iter*accCols + accColsC, col, acc1); \ } #define MICRO_COMPLEX_STORE MICRO_COMPLEX_UNROLL(MICRO_COMPLEX_STORE_ONE) #define MICRO_COMPLEX_COL_STORE_ONE(iter) \ if (unroll_factor > iter) { \ bload(tRes, res, row + iter*accCols, col); \ bscalec(accReal##iter, accImag##iter, pAlphaReal, pAlphaImag, taccReal, taccImag); \ bcouple(taccReal, taccImag, tRes, acc0, acc1); \ res.template storePacketBlock(row + iter*accCols + 0, col, acc0); \ res.template storePacketBlock(row + iter*accCols + accColsC, col, acc1); \ } #define MICRO_COMPLEX_COL_STORE MICRO_COMPLEX_UNROLL(MICRO_COMPLEX_COL_STORE_ONE) template EIGEN_STRONG_INLINE void gemm_complex_unrolled_iteration( const DataMapper& res, const Scalar* lhs_base, const Scalar* rhs_base, Index depth, Index strideA, Index offsetA, Index strideB, Index& row, Index col, const Packet& pAlphaReal, const Packet& pAlphaImag) { const Scalar* rhs_ptr_real = rhs_base; const Scalar* rhs_ptr_imag; if(!RhsIsReal) { rhs_ptr_imag = rhs_base + accRows*strideB; } else { EIGEN_UNUSED_VARIABLE(rhs_ptr_imag); } const Scalar* lhs_ptr_real0 = NULL, * lhs_ptr_imag0 = NULL, * lhs_ptr_real1 = NULL, * lhs_ptr_imag1 = NULL; const Scalar* lhs_ptr_real2 = NULL, * lhs_ptr_imag2 = NULL, * lhs_ptr_real3 = NULL, * lhs_ptr_imag3 = NULL; const Scalar* lhs_ptr_real4 = NULL, * lhs_ptr_imag4 = NULL; PacketBlock accReal0, accImag0, accReal1, accImag1; PacketBlock accReal2, accImag2, accReal3, accImag3; PacketBlock accReal4, accImag4; PacketBlock taccReal, taccImag; PacketBlock acc0, acc1; PacketBlock tRes; MICRO_COMPLEX_SRC_PTR MICRO_COMPLEX_DST_PTR Index k = 0; for(; k + PEEL_COMPLEX <= depth; k+= PEEL_COMPLEX) { EIGEN_POWER_PREFETCH(rhs_ptr_real); if(!RhsIsReal) { EIGEN_POWER_PREFETCH(rhs_ptr_imag); } MICRO_COMPLEX_PREFETCH MICRO_COMPLEX_ONE_PEEL4 } for(; k < depth; k++) { MICRO_COMPLEX_ONE4 } MICRO_COMPLEX_STORE row += unroll_factor*accCols; } template EIGEN_STRONG_INLINE void gemm_complex_unrolled_col_iteration( const DataMapper& res, const Scalar* lhs_base, const Scalar* rhs_base, Index depth, Index strideA, Index offsetA, Index strideB, Index& row, Index col, Index remaining_cols, const Packet& pAlphaReal, const Packet& pAlphaImag) { const Scalar* rhs_ptr_real = rhs_base; const Scalar* rhs_ptr_imag; if(!RhsIsReal) { rhs_ptr_imag = rhs_base + remaining_cols*strideB; } else { EIGEN_UNUSED_VARIABLE(rhs_ptr_imag); } const Scalar* lhs_ptr_real0 = NULL, * lhs_ptr_imag0 = NULL, * lhs_ptr_real1 = NULL, * lhs_ptr_imag1 = NULL; const Scalar* lhs_ptr_real2 = NULL, * lhs_ptr_imag2 = NULL, * lhs_ptr_real3 = NULL, * lhs_ptr_imag3 = NULL; const Scalar* lhs_ptr_real4 = NULL, * lhs_ptr_imag4 = NULL; PacketBlock accReal0, accImag0, accReal1, accImag1; PacketBlock accReal2, accImag2, accReal3, accImag3; PacketBlock accReal4, accImag4; PacketBlock taccReal, taccImag; PacketBlock acc0, acc1; PacketBlock tRes; MICRO_COMPLEX_SRC_PTR MICRO_COMPLEX_DST_PTR Index k = 0; for(; k + PEEL_COMPLEX <= depth; k+= PEEL_COMPLEX) { EIGEN_POWER_PREFETCH(rhs_ptr_real); if(!RhsIsReal) { EIGEN_POWER_PREFETCH(rhs_ptr_imag); } MICRO_COMPLEX_PREFETCH MICRO_COMPLEX_ONE_PEEL1 } for(; k < depth; k++) { MICRO_COMPLEX_ONE1 } MICRO_COMPLEX_COL_STORE row += unroll_factor*accCols; } template EIGEN_STRONG_INLINE void gemm_complex_unrolled_col( const DataMapper& res, const Scalar* lhs_base, const Scalar* rhs_base, Index depth, Index strideA, Index offsetA, Index strideB, Index& row, Index rows, Index col, Index remaining_cols, const Packet& pAlphaReal, const Packet& pAlphaImag) { #define MAX_COMPLEX_UNROLL 3 while(row + MAX_COMPLEX_UNROLL*accCols <= rows) { gemm_complex_unrolled_col_iteration(res, lhs_base, rhs_base, depth, strideA, offsetA, strideB, row, col, remaining_cols, pAlphaReal, pAlphaImag); } switch( (rows-row)/accCols ) { #if MAX_COMPLEX_UNROLL > 4 case 4: gemm_complex_unrolled_col_iteration<4, Scalar, Packet, Packetc, DataMapper, Index, accCols, ConjugateLhs, ConjugateRhs, LhsIsReal, RhsIsReal>(res, lhs_base, rhs_base, depth, strideA, offsetA, strideB, row, col, remaining_cols, pAlphaReal, pAlphaImag); break; #endif #if MAX_COMPLEX_UNROLL > 3 case 3: gemm_complex_unrolled_col_iteration<3, Scalar, Packet, Packetc, DataMapper, Index, accCols, ConjugateLhs, ConjugateRhs, LhsIsReal, RhsIsReal>(res, lhs_base, rhs_base, depth, strideA, offsetA, strideB, row, col, remaining_cols, pAlphaReal, pAlphaImag); break; #endif #if MAX_COMPLEX_UNROLL > 2 case 2: gemm_complex_unrolled_col_iteration<2, Scalar, Packet, Packetc, DataMapper, Index, accCols, ConjugateLhs, ConjugateRhs, LhsIsReal, RhsIsReal>(res, lhs_base, rhs_base, depth, strideA, offsetA, strideB, row, col, remaining_cols, pAlphaReal, pAlphaImag); break; #endif #if MAX_COMPLEX_UNROLL > 1 case 1: gemm_complex_unrolled_col_iteration<1, Scalar, Packet, Packetc, DataMapper, Index, accCols, ConjugateLhs, ConjugateRhs, LhsIsReal, RhsIsReal>(res, lhs_base, rhs_base, depth, strideA, offsetA, strideB, row, col, remaining_cols, pAlphaReal, pAlphaImag); break; #endif default: break; } #undef MAX_COMPLEX_UNROLL } template EIGEN_STRONG_INLINE void gemm_complex(const DataMapper& res, const LhsScalar* blockAc, const RhsScalar* blockBc, Index rows, Index depth, Index cols, Scalarc alpha, Index strideA, Index strideB, Index offsetA, Index offsetB) { const Index remaining_rows = rows % accCols; const Index remaining_cols = cols % accRows; if( strideA == -1 ) strideA = depth; if( strideB == -1 ) strideB = depth; const Packet pAlphaReal = pset1(alpha.real()); const Packet pAlphaImag = pset1(alpha.imag()); const Packet pMask = bmask((const int)(remaining_rows)); const Scalar* blockA = (Scalar *) blockAc; const Scalar* blockB = (Scalar *) blockBc; Index col = 0; for(; col + accRows <= cols; col += accRows) { const Scalar* rhs_base = blockB + advanceCols*col*strideB + accRows*offsetB; const Scalar* lhs_base = blockA; Index row = 0; #define MAX_COMPLEX_UNROLL 3 while(row + MAX_COMPLEX_UNROLL*accCols <= rows) { gemm_complex_unrolled_iteration(res, lhs_base, rhs_base, depth, strideA, offsetA, strideB, row, col, pAlphaReal, pAlphaImag); } switch( (rows-row)/accCols ) { #if MAX_COMPLEX_UNROLL > 4 case 4: gemm_complex_unrolled_iteration<4, Scalar, Packet, Packetc, DataMapper, Index, accRows, accCols, ConjugateLhs, ConjugateRhs, LhsIsReal, RhsIsReal>(res, lhs_base, rhs_base, depth, strideA, offsetA, strideB, row, col, pAlphaReal, pAlphaImag); break; #endif #if MAX_COMPLEX_UNROLL > 3 case 3: gemm_complex_unrolled_iteration<3, Scalar, Packet, Packetc, DataMapper, Index, accRows, accCols, ConjugateLhs, ConjugateRhs, LhsIsReal, RhsIsReal>(res, lhs_base, rhs_base, depth, strideA, offsetA, strideB, row, col, pAlphaReal, pAlphaImag); break; #endif #if MAX_COMPLEX_UNROLL > 2 case 2: gemm_complex_unrolled_iteration<2, Scalar, Packet, Packetc, DataMapper, Index, accRows, accCols, ConjugateLhs, ConjugateRhs, LhsIsReal, RhsIsReal>(res, lhs_base, rhs_base, depth, strideA, offsetA, strideB, row, col, pAlphaReal, pAlphaImag); break; #endif #if MAX_COMPLEX_UNROLL > 1 case 1: gemm_complex_unrolled_iteration<1, Scalar, Packet, Packetc, DataMapper, Index, accRows, accCols, ConjugateLhs, ConjugateRhs, LhsIsReal, RhsIsReal>(res, lhs_base, rhs_base, depth, strideA, offsetA, strideB, row, col, pAlphaReal, pAlphaImag); break; #endif default: break; } #undef MAX_COMPLEX_UNROLL if(remaining_rows > 0) { gemm_complex_extra_row(res, lhs_base, rhs_base, depth, strideA, offsetA, strideB, row, col, rows, cols, remaining_rows, pAlphaReal, pAlphaImag, pMask); } } if(remaining_cols > 0) { const Scalar* rhs_base = blockB + advanceCols*col*strideB + remaining_cols*offsetB; const Scalar* lhs_base = blockA; for(; col < cols; col++) { Index row = 0; gemm_complex_unrolled_col(res, lhs_base, rhs_base, depth, strideA, offsetA, strideB, row, rows, col, remaining_cols, pAlphaReal, pAlphaImag); if (remaining_rows > 0) { gemm_complex_extra_col(res, lhs_base, rhs_base, depth, strideA, offsetA, strideB, row, col, remaining_rows, remaining_cols, pAlphaReal, pAlphaImag); } rhs_base++; } } } #undef accColsC #undef advanceCols #undef advanceRows /************************************ * ppc64le template specializations * * **********************************/ template struct gemm_pack_lhs { void operator()(double* blockA, const DataMapper& lhs, Index depth, Index rows, Index stride=0, Index offset=0); }; template void gemm_pack_lhs ::operator()(double* blockA, const DataMapper& lhs, Index depth, Index rows, Index stride, Index offset) { dhs_pack pack; pack(blockA, lhs, depth, rows, stride, offset); } template struct gemm_pack_lhs { void operator()(double* blockA, const DataMapper& lhs, Index depth, Index rows, Index stride=0, Index offset=0); }; template void gemm_pack_lhs ::operator()(double* blockA, const DataMapper& lhs, Index depth, Index rows, Index stride, Index offset) { dhs_pack pack; pack(blockA, lhs, depth, rows, stride, offset); } #if EIGEN_ALTIVEC_USE_CUSTOM_PACK template struct gemm_pack_rhs { void operator()(double* blockB, const DataMapper& rhs, Index depth, Index cols, Index stride=0, Index offset=0); }; template void gemm_pack_rhs ::operator()(double* blockB, const DataMapper& rhs, Index depth, Index cols, Index stride, Index offset) { dhs_pack pack; pack(blockB, rhs, depth, cols, stride, offset); } template struct gemm_pack_rhs { void operator()(double* blockB, const DataMapper& rhs, Index depth, Index cols, Index stride=0, Index offset=0); }; template void gemm_pack_rhs ::operator()(double* blockB, const DataMapper& rhs, Index depth, Index cols, Index stride, Index offset) { dhs_pack pack; pack(blockB, rhs, depth, cols, stride, offset); } #endif template struct gemm_pack_lhs { void operator()(float* blockA, const DataMapper& lhs, Index depth, Index rows, Index stride=0, Index offset=0); }; template void gemm_pack_lhs ::operator()(float* blockA, const DataMapper& lhs, Index depth, Index rows, Index stride, Index offset) { dhs_pack pack; pack(blockA, lhs, depth, rows, stride, offset); } template struct gemm_pack_lhs { void operator()(float* blockA, const DataMapper& lhs, Index depth, Index rows, Index stride=0, Index offset=0); }; template void gemm_pack_lhs ::operator()(float* blockA, const DataMapper& lhs, Index depth, Index rows, Index stride, Index offset) { dhs_pack pack; pack(blockA, lhs, depth, rows, stride, offset); } template struct gemm_pack_lhs, Index, DataMapper, Pack1, Pack2, Packet, RowMajor, Conjugate, PanelMode> { void operator()(std::complex* blockA, const DataMapper& lhs, Index depth, Index rows, Index stride=0, Index offset=0); }; template void gemm_pack_lhs, Index, DataMapper, Pack1, Pack2, Packet, RowMajor, Conjugate, PanelMode> ::operator()(std::complex* blockA, const DataMapper& lhs, Index depth, Index rows, Index stride, Index offset) { dhs_cpack pack; pack(blockA, lhs, depth, rows, stride, offset); } template struct gemm_pack_lhs, Index, DataMapper, Pack1, Pack2, Packet, ColMajor, Conjugate, PanelMode> { void operator()(std::complex* blockA, const DataMapper& lhs, Index depth, Index rows, Index stride=0, Index offset=0); }; template void gemm_pack_lhs, Index, DataMapper, Pack1, Pack2, Packet, ColMajor, Conjugate, PanelMode> ::operator()(std::complex* blockA, const DataMapper& lhs, Index depth, Index rows, Index stride, Index offset) { dhs_cpack pack; pack(blockA, lhs, depth, rows, stride, offset); } #if EIGEN_ALTIVEC_USE_CUSTOM_PACK template struct gemm_pack_rhs { void operator()(float* blockB, const DataMapper& rhs, Index depth, Index cols, Index stride=0, Index offset=0); }; template void gemm_pack_rhs ::operator()(float* blockB, const DataMapper& rhs, Index depth, Index cols, Index stride, Index offset) { dhs_pack pack; pack(blockB, rhs, depth, cols, stride, offset); } template struct gemm_pack_rhs { void operator()(float* blockB, const DataMapper& rhs, Index depth, Index cols, Index stride=0, Index offset=0); }; template void gemm_pack_rhs ::operator()(float* blockB, const DataMapper& rhs, Index depth, Index cols, Index stride, Index offset) { dhs_pack pack; pack(blockB, rhs, depth, cols, stride, offset); } #endif template struct gemm_pack_rhs, Index, DataMapper, nr, ColMajor, Conjugate, PanelMode> { void operator()(std::complex* blockB, const DataMapper& rhs, Index depth, Index cols, Index stride=0, Index offset=0); }; template void gemm_pack_rhs, Index, DataMapper, nr, ColMajor, Conjugate, PanelMode> ::operator()(std::complex* blockB, const DataMapper& rhs, Index depth, Index cols, Index stride, Index offset) { dhs_cpack pack; pack(blockB, rhs, depth, cols, stride, offset); } template struct gemm_pack_rhs, Index, DataMapper, nr, RowMajor, Conjugate, PanelMode> { void operator()(std::complex* blockB, const DataMapper& rhs, Index depth, Index cols, Index stride=0, Index offset=0); }; template void gemm_pack_rhs, Index, DataMapper, nr, RowMajor, Conjugate, PanelMode> ::operator()(std::complex* blockB, const DataMapper& rhs, Index depth, Index cols, Index stride, Index offset) { dhs_cpack pack; pack(blockB, rhs, depth, cols, stride, offset); } template struct gemm_pack_lhs, Index, DataMapper, Pack1, Pack2, Packet, RowMajor, Conjugate, PanelMode> { void operator()(std::complex* blockA, const DataMapper& lhs, Index depth, Index rows, Index stride=0, Index offset=0); }; template void gemm_pack_lhs, Index, DataMapper, Pack1, Pack2, Packet, RowMajor, Conjugate, PanelMode> ::operator()(std::complex* blockA, const DataMapper& lhs, Index depth, Index rows, Index stride, Index offset) { dhs_cpack pack; pack(blockA, lhs, depth, rows, stride, offset); } template struct gemm_pack_lhs, Index, DataMapper, Pack1, Pack2, Packet, ColMajor, Conjugate, PanelMode> { void operator()(std::complex* blockA, const DataMapper& lhs, Index depth, Index rows, Index stride=0, Index offset=0); }; template void gemm_pack_lhs, Index, DataMapper, Pack1, Pack2, Packet, ColMajor, Conjugate, PanelMode> ::operator()(std::complex* blockA, const DataMapper& lhs, Index depth, Index rows, Index stride, Index offset) { dhs_cpack pack; pack(blockA, lhs, depth, rows, stride, offset); } template struct gemm_pack_rhs, Index, DataMapper, nr, ColMajor, Conjugate, PanelMode> { void operator()(std::complex* blockB, const DataMapper& rhs, Index depth, Index cols, Index stride=0, Index offset=0); }; template void gemm_pack_rhs, Index, DataMapper, nr, ColMajor, Conjugate, PanelMode> ::operator()(std::complex* blockB, const DataMapper& rhs, Index depth, Index cols, Index stride, Index offset) { dhs_cpack pack; pack(blockB, rhs, depth, cols, stride, offset); } template struct gemm_pack_rhs, Index, DataMapper, nr, RowMajor, Conjugate, PanelMode> { void operator()(std::complex* blockB, const DataMapper& rhs, Index depth, Index cols, Index stride=0, Index offset=0); }; template void gemm_pack_rhs, Index, DataMapper, nr, RowMajor, Conjugate, PanelMode> ::operator()(std::complex* blockB, const DataMapper& rhs, Index depth, Index cols, Index stride, Index offset) { dhs_cpack pack; pack(blockB, rhs, depth, cols, stride, offset); } // ********* gebp specializations ********* template struct gebp_kernel { typedef typename quad_traits::vectortype Packet; typedef typename quad_traits::rhstype RhsPacket; void operator()(const DataMapper& res, const float* blockA, const float* blockB, Index rows, Index depth, Index cols, float alpha, Index strideA=-1, Index strideB=-1, Index offsetA=0, Index offsetB=0); }; template void gebp_kernel ::operator()(const DataMapper& res, const float* blockA, const float* blockB, Index rows, Index depth, Index cols, float alpha, Index strideA, Index strideB, Index offsetA, Index offsetB) { const Index accRows = quad_traits::rows; const Index accCols = quad_traits::size; void (*gemm_function)(const DataMapper&, const float*, const float*, Index, Index, Index, float, Index, Index, Index, Index); #ifdef EIGEN_ALTIVEC_MMA_ONLY //generate with MMA only gemm_function = &Eigen::internal::gemmMMA; #elif defined(ALTIVEC_MMA_SUPPORT) && !defined(EIGEN_ALTIVEC_DISABLE_MMA) if (__builtin_cpu_supports ("arch_3_1") && __builtin_cpu_supports ("mma")){ gemm_function = &Eigen::internal::gemmMMA; } else{ gemm_function = &Eigen::internal::gemm; } #else gemm_function = &Eigen::internal::gemm; #endif gemm_function(res, blockA, blockB, rows, depth, cols, alpha, strideA, strideB, offsetA, offsetB); } template struct gebp_kernel, std::complex, Index, DataMapper, mr, nr, ConjugateLhs, ConjugateRhs> { typedef Packet4f Packet; typedef Packet2cf Packetc; typedef Packet4f RhsPacket; void operator()(const DataMapper& res, const std::complex* blockA, const std::complex* blockB, Index rows, Index depth, Index cols, std::complex alpha, Index strideA=-1, Index strideB=-1, Index offsetA=0, Index offsetB=0); }; template void gebp_kernel, std::complex, Index, DataMapper, mr, nr, ConjugateLhs, ConjugateRhs> ::operator()(const DataMapper& res, const std::complex* blockA, const std::complex* blockB, Index rows, Index depth, Index cols, std::complex alpha, Index strideA, Index strideB, Index offsetA, Index offsetB) { const Index accRows = quad_traits::rows; const Index accCols = quad_traits::size; void (*gemm_function)(const DataMapper&, const std::complex*, const std::complex*, Index, Index, Index, std::complex, Index, Index, Index, Index); #ifdef EIGEN_ALTIVEC_MMA_ONLY //generate with MMA only gemm_function = &Eigen::internal::gemm_complexMMA, std::complex, std::complex, float, Index, Packet, Packetc, RhsPacket, DataMapper, accRows, accCols, ConjugateLhs, ConjugateRhs, false, false>; #elif defined(ALTIVEC_MMA_SUPPORT) && !defined(EIGEN_ALTIVEC_DISABLE_MMA) if (__builtin_cpu_supports ("arch_3_1") && __builtin_cpu_supports ("mma")){ gemm_function = &Eigen::internal::gemm_complexMMA, std::complex, std::complex, float, Index, Packet, Packetc, RhsPacket, DataMapper, accRows, accCols, ConjugateLhs, ConjugateRhs, false, false>; } else{ gemm_function = &Eigen::internal::gemm_complex, std::complex, std::complex, float, Index, Packet, Packetc, RhsPacket, DataMapper, accRows, accCols, ConjugateLhs, ConjugateRhs, false, false>; } #else gemm_function = &Eigen::internal::gemm_complex, std::complex, std::complex, float, Index, Packet, Packetc, RhsPacket, DataMapper, accRows, accCols, ConjugateLhs, ConjugateRhs, false, false>; #endif gemm_function(res, blockA, blockB, rows, depth, cols, alpha, strideA, strideB, offsetA, offsetB); } template struct gebp_kernel, Index, DataMapper, mr, nr, ConjugateLhs, ConjugateRhs> { typedef Packet4f Packet; typedef Packet2cf Packetc; typedef Packet4f RhsPacket; void operator()(const DataMapper& res, const float* blockA, const std::complex* blockB, Index rows, Index depth, Index cols, std::complex alpha, Index strideA=-1, Index strideB=-1, Index offsetA=0, Index offsetB=0); }; template void gebp_kernel, Index, DataMapper, mr, nr, ConjugateLhs, ConjugateRhs> ::operator()(const DataMapper& res, const float* blockA, const std::complex* blockB, Index rows, Index depth, Index cols, std::complex alpha, Index strideA, Index strideB, Index offsetA, Index offsetB) { const Index accRows = quad_traits::rows; const Index accCols = quad_traits::size; void (*gemm_function)(const DataMapper&, const float*, const std::complex*, Index, Index, Index, std::complex, Index, Index, Index, Index); #ifdef EIGEN_ALTIVEC_MMA_ONLY //generate with MMA only gemm_function = &Eigen::internal::gemm_complexMMA, std::complex, float, Index, Packet, Packetc, RhsPacket, DataMapper, accRows, accCols, ConjugateLhs, ConjugateRhs, true, false>; #elif defined(ALTIVEC_MMA_SUPPORT) && !defined(EIGEN_ALTIVEC_DISABLE_MMA) if (__builtin_cpu_supports ("arch_3_1") && __builtin_cpu_supports ("mma")){ gemm_function = &Eigen::internal::gemm_complexMMA, std::complex, float, Index, Packet, Packetc, RhsPacket, DataMapper, accRows, accCols, ConjugateLhs, ConjugateRhs, true, false>; } else{ gemm_function = &Eigen::internal::gemm_complex, std::complex, float, Index, Packet, Packetc, RhsPacket, DataMapper, accRows, accCols, ConjugateLhs, ConjugateRhs, true, false>; } #else gemm_function = &Eigen::internal::gemm_complex, std::complex, float, Index, Packet, Packetc, RhsPacket, DataMapper, accRows, accCols, ConjugateLhs, ConjugateRhs, true, false>; #endif gemm_function(res, blockA, blockB, rows, depth, cols, alpha, strideA, strideB, offsetA, offsetB); } template struct gebp_kernel, float, Index, DataMapper, mr, nr, ConjugateLhs, ConjugateRhs> { typedef Packet4f Packet; typedef Packet2cf Packetc; typedef Packet4f RhsPacket; void operator()(const DataMapper& res, const std::complex* blockA, const float* blockB, Index rows, Index depth, Index cols, std::complex alpha, Index strideA=-1, Index strideB=-1, Index offsetA=0, Index offsetB=0); }; template void gebp_kernel, float, Index, DataMapper, mr, nr, ConjugateLhs, ConjugateRhs> ::operator()(const DataMapper& res, const std::complex* blockA, const float* blockB, Index rows, Index depth, Index cols, std::complex alpha, Index strideA, Index strideB, Index offsetA, Index offsetB) { const Index accRows = quad_traits::rows; const Index accCols = quad_traits::size; void (*gemm_function)(const DataMapper&, const std::complex*, const float*, Index, Index, Index, std::complex, Index, Index, Index, Index); #ifdef EIGEN_ALTIVEC_MMA_ONLY //generate with MMA only gemm_function = &Eigen::internal::gemm_complexMMA, float, std::complex, float, Index, Packet, Packetc, RhsPacket, DataMapper, accRows, accCols, ConjugateLhs, ConjugateRhs, false, true>; #elif defined(ALTIVEC_MMA_SUPPORT) && !defined(EIGEN_ALTIVEC_DISABLE_MMA) if (__builtin_cpu_supports ("arch_3_1") && __builtin_cpu_supports ("mma")){ gemm_function = &Eigen::internal::gemm_complexMMA, float, std::complex, float, Index, Packet, Packetc, RhsPacket, DataMapper, accRows, accCols, ConjugateLhs, ConjugateRhs, false, true>; } else{ gemm_function = &Eigen::internal::gemm_complex, float, std::complex, float, Index, Packet, Packetc, RhsPacket, DataMapper, accRows, accCols, ConjugateLhs, ConjugateRhs, false, true>; } #else gemm_function = &Eigen::internal::gemm_complex, float, std::complex, float, Index, Packet, Packetc, RhsPacket, DataMapper, accRows, accCols, ConjugateLhs, ConjugateRhs, false, true>; #endif gemm_function(res, blockA, blockB, rows, depth, cols, alpha, strideA, strideB, offsetA, offsetB); } template struct gebp_kernel { typedef typename quad_traits::vectortype Packet; typedef typename quad_traits::rhstype RhsPacket; void operator()(const DataMapper& res, const double* blockA, const double* blockB, Index rows, Index depth, Index cols, double alpha, Index strideA=-1, Index strideB=-1, Index offsetA=0, Index offsetB=0); }; template void gebp_kernel ::operator()(const DataMapper& res, const double* blockA, const double* blockB, Index rows, Index depth, Index cols, double alpha, Index strideA, Index strideB, Index offsetA, Index offsetB) { const Index accRows = quad_traits::rows; const Index accCols = quad_traits::size; void (*gemm_function)(const DataMapper&, const double*, const double*, Index, Index, Index, double, Index, Index, Index, Index); #ifdef EIGEN_ALTIVEC_MMA_ONLY //generate with MMA only gemm_function = &Eigen::internal::gemmMMA; #elif defined(ALTIVEC_MMA_SUPPORT) && !defined(EIGEN_ALTIVEC_DISABLE_MMA) if (__builtin_cpu_supports ("arch_3_1") && __builtin_cpu_supports ("mma")){ gemm_function = &Eigen::internal::gemmMMA; } else{ gemm_function = &Eigen::internal::gemm; } #else gemm_function = &Eigen::internal::gemm; #endif gemm_function(res, blockA, blockB, rows, depth, cols, alpha, strideA, strideB, offsetA, offsetB); } template struct gebp_kernel, std::complex, Index, DataMapper, mr, nr, ConjugateLhs, ConjugateRhs> { typedef quad_traits::vectortype Packet; typedef Packet1cd Packetc; typedef quad_traits::rhstype RhsPacket; void operator()(const DataMapper& res, const std::complex* blockA, const std::complex* blockB, Index rows, Index depth, Index cols, std::complex alpha, Index strideA=-1, Index strideB=-1, Index offsetA=0, Index offsetB=0); }; template void gebp_kernel, std::complex, Index, DataMapper, mr, nr, ConjugateLhs, ConjugateRhs> ::operator()(const DataMapper& res, const std::complex* blockA, const std::complex* blockB, Index rows, Index depth, Index cols, std::complex alpha, Index strideA, Index strideB, Index offsetA, Index offsetB) { const Index accRows = quad_traits::rows; const Index accCols = quad_traits::size; void (*gemm_function)(const DataMapper&, const std::complex*, const std::complex*, Index, Index, Index, std::complex, Index, Index, Index, Index); #ifdef EIGEN_ALTIVEC_MMA_ONLY //generate with MMA only gemm_function = &Eigen::internal::gemm_complexMMA, std::complex, std::complex, double, Index, Packet, Packetc, RhsPacket, DataMapper, accRows, accCols, ConjugateLhs, ConjugateRhs, false, false>; #elif defined(ALTIVEC_MMA_SUPPORT) && !defined(EIGEN_ALTIVEC_DISABLE_MMA) if (__builtin_cpu_supports ("arch_3_1") && __builtin_cpu_supports ("mma")){ gemm_function = &Eigen::internal::gemm_complexMMA, std::complex, std::complex, double, Index, Packet, Packetc, RhsPacket, DataMapper, accRows, accCols, ConjugateLhs, ConjugateRhs, false, false>; } else{ gemm_function = &Eigen::internal::gemm_complex, std::complex, std::complex, double, Index, Packet, Packetc, RhsPacket, DataMapper, accRows, accCols, ConjugateLhs, ConjugateRhs, false, false>; } #else gemm_function = &Eigen::internal::gemm_complex, std::complex, std::complex, double, Index, Packet, Packetc, RhsPacket, DataMapper, accRows, accCols, ConjugateLhs, ConjugateRhs, false, false>; #endif gemm_function(res, blockA, blockB, rows, depth, cols, alpha, strideA, strideB, offsetA, offsetB); } template struct gebp_kernel, double, Index, DataMapper, mr, nr, ConjugateLhs, ConjugateRhs> { typedef quad_traits::vectortype Packet; typedef Packet1cd Packetc; typedef quad_traits::rhstype RhsPacket; void operator()(const DataMapper& res, const std::complex* blockA, const double* blockB, Index rows, Index depth, Index cols, std::complex alpha, Index strideA=-1, Index strideB=-1, Index offsetA=0, Index offsetB=0); }; template void gebp_kernel, double, Index, DataMapper, mr, nr, ConjugateLhs, ConjugateRhs> ::operator()(const DataMapper& res, const std::complex* blockA, const double* blockB, Index rows, Index depth, Index cols, std::complex alpha, Index strideA, Index strideB, Index offsetA, Index offsetB) { const Index accRows = quad_traits::rows; const Index accCols = quad_traits::size; void (*gemm_function)(const DataMapper&, const std::complex*, const double*, Index, Index, Index, std::complex, Index, Index, Index, Index); #ifdef EIGEN_ALTIVEC_MMA_ONLY //generate with MMA only gemm_function = &Eigen::internal::gemm_complexMMA, double, std::complex, double, Index, Packet, Packetc, RhsPacket, DataMapper, accRows, accCols, ConjugateLhs, ConjugateRhs, false, true>; #elif defined(ALTIVEC_MMA_SUPPORT) && !defined(EIGEN_ALTIVEC_DISABLE_MMA) if (__builtin_cpu_supports ("arch_3_1") && __builtin_cpu_supports ("mma")){ gemm_function = &Eigen::internal::gemm_complexMMA, double, std::complex, double, Index, Packet, Packetc, RhsPacket, DataMapper, accRows, accCols, ConjugateLhs, ConjugateRhs, false, true>; } else{ gemm_function = &Eigen::internal::gemm_complex, double, std::complex, double, Index, Packet, Packetc, RhsPacket, DataMapper, accRows, accCols, ConjugateLhs, ConjugateRhs, false, true>; } #else gemm_function = &Eigen::internal::gemm_complex, double, std::complex, double, Index, Packet, Packetc, RhsPacket, DataMapper, accRows, accCols, ConjugateLhs, ConjugateRhs, false, true>; #endif gemm_function(res, blockA, blockB, rows, depth, cols, alpha, strideA, strideB, offsetA, offsetB); } template struct gebp_kernel, Index, DataMapper, mr, nr, ConjugateLhs, ConjugateRhs> { typedef quad_traits::vectortype Packet; typedef Packet1cd Packetc; typedef quad_traits::rhstype RhsPacket; void operator()(const DataMapper& res, const double* blockA, const std::complex* blockB, Index rows, Index depth, Index cols, std::complex alpha, Index strideA=-1, Index strideB=-1, Index offsetA=0, Index offsetB=0); }; template void gebp_kernel, Index, DataMapper, mr, nr, ConjugateLhs, ConjugateRhs> ::operator()(const DataMapper& res, const double* blockA, const std::complex* blockB, Index rows, Index depth, Index cols, std::complex alpha, Index strideA, Index strideB, Index offsetA, Index offsetB) { const Index accRows = quad_traits::rows; const Index accCols = quad_traits::size; void (*gemm_function)(const DataMapper&, const double*, const std::complex*, Index, Index, Index, std::complex, Index, Index, Index, Index); #ifdef EIGEN_ALTIVEC_MMA_ONLY //generate with MMA only gemm_function = &Eigen::internal::gemm_complexMMA, std::complex, double, Index, Packet, Packetc, RhsPacket, DataMapper, accRows, accCols, ConjugateLhs, ConjugateRhs, true, false>; #elif defined(ALTIVEC_MMA_SUPPORT) && !defined(EIGEN_ALTIVEC_DISABLE_MMA) if (__builtin_cpu_supports ("arch_3_1") && __builtin_cpu_supports ("mma")){ gemm_function = &Eigen::internal::gemm_complexMMA, std::complex, double, Index, Packet, Packetc, RhsPacket, DataMapper, accRows, accCols, ConjugateLhs, ConjugateRhs, true, false>; } else{ gemm_function = &Eigen::internal::gemm_complex, std::complex, double, Index, Packet, Packetc, RhsPacket, DataMapper, accRows, accCols, ConjugateLhs, ConjugateRhs, true, false>; } #else gemm_function = &Eigen::internal::gemm_complex, std::complex, double, Index, Packet, Packetc, RhsPacket, DataMapper, accRows, accCols, ConjugateLhs, ConjugateRhs, true, false>; #endif gemm_function(res, blockA, blockB, rows, depth, cols, alpha, strideA, strideB, offsetA, offsetB); } } // end namespace internal } // end namespace Eigen #endif // EIGEN_MATRIX_PRODUCT_ALTIVEC_H RcppEigen/inst/include/Eigen/src/Core/arch/AltiVec/Complex.h0000644000176200001440000004023414562417221023267 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2010 Gael Guennebaud // Copyright (C) 2010-2016 Konstantinos Margaritis // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_COMPLEX32_ALTIVEC_H #define EIGEN_COMPLEX32_ALTIVEC_H namespace Eigen { namespace internal { static Packet4ui p4ui_CONJ_XOR = vec_mergeh((Packet4ui)p4i_ZERO, (Packet4ui)p4f_MZERO);//{ 0x00000000, 0x80000000, 0x00000000, 0x80000000 }; #ifdef __VSX__ #if defined(_BIG_ENDIAN) static Packet2ul p2ul_CONJ_XOR1 = (Packet2ul) vec_sld((Packet4ui) p2d_MZERO, (Packet4ui) p2l_ZERO, 8);//{ 0x8000000000000000, 0x0000000000000000 }; static Packet2ul p2ul_CONJ_XOR2 = (Packet2ul) vec_sld((Packet4ui) p2l_ZERO, (Packet4ui) p2d_MZERO, 8);//{ 0x8000000000000000, 0x0000000000000000 }; #else static Packet2ul p2ul_CONJ_XOR1 = (Packet2ul) vec_sld((Packet4ui) p2l_ZERO, (Packet4ui) p2d_MZERO, 8);//{ 0x8000000000000000, 0x0000000000000000 }; static Packet2ul p2ul_CONJ_XOR2 = (Packet2ul) vec_sld((Packet4ui) p2d_MZERO, (Packet4ui) p2l_ZERO, 8);//{ 0x8000000000000000, 0x0000000000000000 }; #endif #endif //---------- float ---------- struct Packet2cf { EIGEN_STRONG_INLINE explicit Packet2cf() {} EIGEN_STRONG_INLINE explicit Packet2cf(const Packet4f& a) : v(a) {} EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b) { Packet4f v1, v2; // Permute and multiply the real parts of a and b v1 = vec_perm(a.v, a.v, p16uc_PSET32_WODD); // Get the imaginary parts of a v2 = vec_perm(a.v, a.v, p16uc_PSET32_WEVEN); // multiply a_re * b v1 = vec_madd(v1, b.v, p4f_ZERO); // multiply a_im * b and get the conjugate result v2 = vec_madd(v2, b.v, p4f_ZERO); v2 = reinterpret_cast(pxor(v2, reinterpret_cast(p4ui_CONJ_XOR))); // permute back to a proper order v2 = vec_perm(v2, v2, p16uc_COMPLEX32_REV); return Packet2cf(padd(v1, v2)); } EIGEN_STRONG_INLINE Packet2cf& operator*=(const Packet2cf& b) { v = pmul(Packet2cf(*this), b).v; return *this; } EIGEN_STRONG_INLINE Packet2cf operator*(const Packet2cf& b) const { return Packet2cf(*this) *= b; } EIGEN_STRONG_INLINE Packet2cf& operator+=(const Packet2cf& b) { v = padd(v, b.v); return *this; } EIGEN_STRONG_INLINE Packet2cf operator+(const Packet2cf& b) const { return Packet2cf(*this) += b; } EIGEN_STRONG_INLINE Packet2cf& operator-=(const Packet2cf& b) { v = psub(v, b.v); return *this; } EIGEN_STRONG_INLINE Packet2cf operator-(const Packet2cf& b) const { return Packet2cf(*this) -= b; } EIGEN_STRONG_INLINE Packet2cf operator-(void) const { return Packet2cf(-v); } Packet4f v; }; template<> struct packet_traits > : default_packet_traits { typedef Packet2cf type; typedef Packet2cf half; typedef Packet4f as_real; enum { Vectorizable = 1, AlignedOnScalar = 1, size = 2, HasHalfPacket = 0, HasAdd = 1, HasSub = 1, HasMul = 1, HasDiv = 1, HasNegate = 1, HasAbs = 0, HasAbs2 = 0, HasMin = 0, HasMax = 0, #ifdef __VSX__ HasBlend = 1, #endif HasSetLinear = 0 }; }; template<> struct unpacket_traits { typedef std::complex type; enum {size=2, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet2cf half; typedef Packet4f as_real; }; template<> EIGEN_STRONG_INLINE Packet2cf pset1(const std::complex& from) { Packet2cf res; if((std::ptrdiff_t(&from) % 16) == 0) res.v = pload((const float *)&from); else res.v = ploadu((const float *)&from); res.v = vec_perm(res.v, res.v, p16uc_PSET64_HI); return res; } template<> EIGEN_STRONG_INLINE Packet2cf pload(const std::complex* from) { return Packet2cf(pload((const float *) from)); } template<> EIGEN_STRONG_INLINE Packet2cf ploadu(const std::complex* from) { return Packet2cf(ploadu((const float*) from)); } template<> EIGEN_STRONG_INLINE Packet2cf ploaddup(const std::complex* from) { return pset1(*from); } template<> EIGEN_STRONG_INLINE void pstore >(std::complex * to, const Packet2cf& from) { pstore((float*)to, from.v); } template<> EIGEN_STRONG_INLINE void pstoreu >(std::complex * to, const Packet2cf& from) { pstoreu((float*)to, from.v); } EIGEN_STRONG_INLINE Packet2cf pload2(const std::complex* from0, const std::complex* from1) { Packet4f res0, res1; #ifdef __VSX__ __asm__ ("lxsdx %x0,%y1" : "=wa" (res0) : "Z" (*from0)); __asm__ ("lxsdx %x0,%y1" : "=wa" (res1) : "Z" (*from1)); #ifdef _BIG_ENDIAN __asm__ ("xxpermdi %x0, %x1, %x2, 0" : "=wa" (res0) : "wa" (res0), "wa" (res1)); #else __asm__ ("xxpermdi %x0, %x2, %x1, 0" : "=wa" (res0) : "wa" (res0), "wa" (res1)); #endif #else *reinterpret_cast *>(&res0) = *from0; *reinterpret_cast *>(&res1) = *from1; res0 = vec_perm(res0, res1, p16uc_TRANSPOSE64_HI); #endif return Packet2cf(res0); } template<> EIGEN_DEVICE_FUNC inline Packet2cf pgather, Packet2cf>(const std::complex* from, Index stride) { EIGEN_ALIGN16 std::complex af[2]; af[0] = from[0*stride]; af[1] = from[1*stride]; return pload(af); } template<> EIGEN_DEVICE_FUNC inline void pscatter, Packet2cf>(std::complex* to, const Packet2cf& from, Index stride) { EIGEN_ALIGN16 std::complex af[2]; pstore >((std::complex *) af, from); to[0*stride] = af[0]; to[1*stride] = af[1]; } template<> EIGEN_STRONG_INLINE Packet2cf padd(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(a.v + b.v); } template<> EIGEN_STRONG_INLINE Packet2cf psub(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(a.v - b.v); } template<> EIGEN_STRONG_INLINE Packet2cf pnegate(const Packet2cf& a) { return Packet2cf(pnegate(a.v)); } template<> EIGEN_STRONG_INLINE Packet2cf pconj(const Packet2cf& a) { return Packet2cf(pxor(a.v, reinterpret_cast(p4ui_CONJ_XOR))); } template<> EIGEN_STRONG_INLINE Packet2cf pand (const Packet2cf& a, const Packet2cf& b) { return Packet2cf(pand(a.v, b.v)); } template<> EIGEN_STRONG_INLINE Packet2cf por (const Packet2cf& a, const Packet2cf& b) { return Packet2cf(por(a.v, b.v)); } template<> EIGEN_STRONG_INLINE Packet2cf pxor (const Packet2cf& a, const Packet2cf& b) { return Packet2cf(pxor(a.v, b.v)); } template<> EIGEN_STRONG_INLINE Packet2cf pandnot(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(pandnot(a.v, b.v)); } template<> EIGEN_STRONG_INLINE void prefetch >(const std::complex * addr) { EIGEN_PPC_PREFETCH(addr); } template<> EIGEN_STRONG_INLINE std::complex pfirst(const Packet2cf& a) { EIGEN_ALIGN16 std::complex res[2]; pstore((float *)&res, a.v); return res[0]; } template<> EIGEN_STRONG_INLINE Packet2cf preverse(const Packet2cf& a) { Packet4f rev_a; rev_a = vec_perm(a.v, a.v, p16uc_COMPLEX32_REV2); return Packet2cf(rev_a); } template<> EIGEN_STRONG_INLINE std::complex predux(const Packet2cf& a) { Packet4f b; b = vec_sld(a.v, a.v, 8); b = padd(a.v, b); return pfirst(Packet2cf(b)); } template<> EIGEN_STRONG_INLINE std::complex predux_mul(const Packet2cf& a) { Packet4f b; Packet2cf prod; b = vec_sld(a.v, a.v, 8); prod = pmul(a, Packet2cf(b)); return pfirst(prod); } EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet2cf,Packet4f) template<> EIGEN_STRONG_INLINE Packet2cf pdiv(const Packet2cf& a, const Packet2cf& b) { // TODO optimize it for AltiVec Packet2cf res = pmul(a, pconj(b)); Packet4f s = pmul(b.v, b.v); return Packet2cf(pdiv(res.v, padd(s, vec_perm(s, s, p16uc_COMPLEX32_REV)))); } template<> EIGEN_STRONG_INLINE Packet2cf pcplxflip(const Packet2cf& x) { return Packet2cf(vec_perm(x.v, x.v, p16uc_COMPLEX32_REV)); } EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { Packet4f tmp = vec_perm(kernel.packet[0].v, kernel.packet[1].v, p16uc_TRANSPOSE64_HI); kernel.packet[1].v = vec_perm(kernel.packet[0].v, kernel.packet[1].v, p16uc_TRANSPOSE64_LO); kernel.packet[0].v = tmp; } template<> EIGEN_STRONG_INLINE Packet2cf pcmp_eq(const Packet2cf& a, const Packet2cf& b) { Packet4f eq = reinterpret_cast(vec_cmpeq(a.v,b.v)); return Packet2cf(vec_and(eq, vec_perm(eq, eq, p16uc_COMPLEX32_REV))); } #ifdef __VSX__ template<> EIGEN_STRONG_INLINE Packet2cf pblend(const Selector<2>& ifPacket, const Packet2cf& thenPacket, const Packet2cf& elsePacket) { Packet2cf result; result.v = reinterpret_cast(pblend(ifPacket, reinterpret_cast(thenPacket.v), reinterpret_cast(elsePacket.v))); return result; } #endif template<> EIGEN_STRONG_INLINE Packet2cf psqrt(const Packet2cf& a) { return psqrt_complex(a); } //---------- double ---------- #ifdef __VSX__ struct Packet1cd { EIGEN_STRONG_INLINE Packet1cd() {} EIGEN_STRONG_INLINE explicit Packet1cd(const Packet2d& a) : v(a) {} EIGEN_STRONG_INLINE Packet1cd pmul(const Packet1cd& a, const Packet1cd& b) { Packet2d a_re, a_im, v1, v2; // Permute and multiply the real parts of a and b a_re = vec_perm(a.v, a.v, p16uc_PSET64_HI); // Get the imaginary parts of a a_im = vec_perm(a.v, a.v, p16uc_PSET64_LO); // multiply a_re * b v1 = vec_madd(a_re, b.v, p2d_ZERO); // multiply a_im * b and get the conjugate result v2 = vec_madd(a_im, b.v, p2d_ZERO); v2 = reinterpret_cast(vec_sld(reinterpret_cast(v2), reinterpret_cast(v2), 8)); v2 = pxor(v2, reinterpret_cast(p2ul_CONJ_XOR1)); return Packet1cd(padd(v1, v2)); } EIGEN_STRONG_INLINE Packet1cd& operator*=(const Packet1cd& b) { v = pmul(Packet1cd(*this), b).v; return *this; } EIGEN_STRONG_INLINE Packet1cd operator*(const Packet1cd& b) const { return Packet1cd(*this) *= b; } EIGEN_STRONG_INLINE Packet1cd& operator+=(const Packet1cd& b) { v = padd(v, b.v); return *this; } EIGEN_STRONG_INLINE Packet1cd operator+(const Packet1cd& b) const { return Packet1cd(*this) += b; } EIGEN_STRONG_INLINE Packet1cd& operator-=(const Packet1cd& b) { v = psub(v, b.v); return *this; } EIGEN_STRONG_INLINE Packet1cd operator-(const Packet1cd& b) const { return Packet1cd(*this) -= b; } EIGEN_STRONG_INLINE Packet1cd operator-(void) const { return Packet1cd(-v); } Packet2d v; }; template<> struct packet_traits > : default_packet_traits { typedef Packet1cd type; typedef Packet1cd half; typedef Packet2d as_real; enum { Vectorizable = 1, AlignedOnScalar = 0, size = 1, HasHalfPacket = 0, HasAdd = 1, HasSub = 1, HasMul = 1, HasDiv = 1, HasNegate = 1, HasAbs = 0, HasAbs2 = 0, HasMin = 0, HasMax = 0, HasSetLinear = 0 }; }; template<> struct unpacket_traits { typedef std::complex type; enum {size=1, alignment=Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false}; typedef Packet1cd half; typedef Packet2d as_real; }; template<> EIGEN_STRONG_INLINE Packet1cd pload (const std::complex* from) { return Packet1cd(pload((const double*)from)); } template<> EIGEN_STRONG_INLINE Packet1cd ploadu(const std::complex* from) { return Packet1cd(ploadu((const double*)from)); } template<> EIGEN_STRONG_INLINE void pstore >(std::complex * to, const Packet1cd& from) { pstore((double*)to, from.v); } template<> EIGEN_STRONG_INLINE void pstoreu >(std::complex * to, const Packet1cd& from) { pstoreu((double*)to, from.v); } template<> EIGEN_STRONG_INLINE Packet1cd pset1(const std::complex& from) { /* here we really have to use unaligned loads :( */ return ploadu(&from); } template<> EIGEN_DEVICE_FUNC inline Packet1cd pgather, Packet1cd>(const std::complex* from, Index) { return pload(from); } template<> EIGEN_DEVICE_FUNC inline void pscatter, Packet1cd>(std::complex* to, const Packet1cd& from, Index) { pstore >(to, from); } template<> EIGEN_STRONG_INLINE Packet1cd padd(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(a.v + b.v); } template<> EIGEN_STRONG_INLINE Packet1cd psub(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(a.v - b.v); } template<> EIGEN_STRONG_INLINE Packet1cd pnegate(const Packet1cd& a) { return Packet1cd(pnegate(Packet2d(a.v))); } template<> EIGEN_STRONG_INLINE Packet1cd pconj(const Packet1cd& a) { return Packet1cd(pxor(a.v, reinterpret_cast(p2ul_CONJ_XOR2))); } template<> EIGEN_STRONG_INLINE Packet1cd pand (const Packet1cd& a, const Packet1cd& b) { return Packet1cd(pand(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet1cd por (const Packet1cd& a, const Packet1cd& b) { return Packet1cd(por(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet1cd pxor (const Packet1cd& a, const Packet1cd& b) { return Packet1cd(pxor(a.v,b.v)); } template<> EIGEN_STRONG_INLINE Packet1cd pandnot(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(pandnot(a.v, b.v)); } template<> EIGEN_STRONG_INLINE Packet1cd ploaddup(const std::complex* from) { return pset1(*from); } template<> EIGEN_STRONG_INLINE void prefetch >(const std::complex * addr) { EIGEN_PPC_PREFETCH(addr); } template<> EIGEN_STRONG_INLINE std::complex pfirst(const Packet1cd& a) { EIGEN_ALIGN16 std::complex res[2]; pstore >(res, a); return res[0]; } template<> EIGEN_STRONG_INLINE Packet1cd preverse(const Packet1cd& a) { return a; } template<> EIGEN_STRONG_INLINE std::complex predux(const Packet1cd& a) { return pfirst(a); } template<> EIGEN_STRONG_INLINE std::complex predux_mul(const Packet1cd& a) { return pfirst(a); } EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet1cd,Packet2d) template<> EIGEN_STRONG_INLINE Packet1cd pdiv(const Packet1cd& a, const Packet1cd& b) { // TODO optimize it for AltiVec Packet1cd res = pmul(a,pconj(b)); Packet2d s = pmul(b.v, b.v); return Packet1cd(pdiv(res.v, padd(s, vec_perm(s, s, p16uc_REVERSE64)))); } EIGEN_STRONG_INLINE Packet1cd pcplxflip/**/(const Packet1cd& x) { return Packet1cd(preverse(Packet2d(x.v))); } EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { Packet2d tmp = vec_perm(kernel.packet[0].v, kernel.packet[1].v, p16uc_TRANSPOSE64_HI); kernel.packet[1].v = vec_perm(kernel.packet[0].v, kernel.packet[1].v, p16uc_TRANSPOSE64_LO); kernel.packet[0].v = tmp; } template<> EIGEN_STRONG_INLINE Packet1cd pcmp_eq(const Packet1cd& a, const Packet1cd& b) { // Compare real and imaginary parts of a and b to get the mask vector: // [re(a)==re(b), im(a)==im(b)] Packet2d eq = reinterpret_cast(vec_cmpeq(a.v,b.v)); // Swap real/imag elements in the mask in to get: // [im(a)==im(b), re(a)==re(b)] Packet2d eq_swapped = reinterpret_cast(vec_sld(reinterpret_cast(eq), reinterpret_cast(eq), 8)); // Return re(a)==re(b) & im(a)==im(b) by computing bitwise AND of eq and eq_swapped return Packet1cd(vec_and(eq, eq_swapped)); } template<> EIGEN_STRONG_INLINE Packet1cd psqrt(const Packet1cd& a) { return psqrt_complex(a); } #endif // __VSX__ } // end namespace internal } // end namespace Eigen #endif // EIGEN_COMPLEX32_ALTIVEC_H RcppEigen/inst/include/Eigen/src/Core/arch/CUDA/0000755000176200001440000000000014562417221020671 5ustar liggesusersRcppEigen/inst/include/Eigen/src/Core/arch/CUDA/Complex.h0000644000176200001440000004164514562417221022463 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Benoit Steiner // Copyright (C) 2021 C. Antonio Sanchez // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_COMPLEX_CUDA_H #define EIGEN_COMPLEX_CUDA_H // clang-format off // Many std::complex methods such as operator+, operator-, operator* and // operator/ are not constexpr. Due to this, GCC and older versions of clang do // not treat them as device functions and thus Eigen functors making use of // these operators fail to compile. Here, we manually specialize these // operators and functors for complex types when building for CUDA to enable // their use on-device. #if defined(EIGEN_CUDACC) && defined(EIGEN_GPU_COMPILE_PHASE) // ICC already specializes std::complex and std::complex // operators, preventing us from making them device functions here. // This will lead to silent runtime errors if the operators are used on device. // // To allow std::complex operator use on device, define _OVERRIDE_COMPLEX_SPECIALIZATION_ // prior to first inclusion of . This prevents ICC from adding // its own specializations, so our custom ones below can be used instead. #if !(defined(EIGEN_COMP_ICC) && defined(_USE_COMPLEX_SPECIALIZATION_)) // Import Eigen's internal operator specializations. #define EIGEN_USING_STD_COMPLEX_OPERATORS \ using Eigen::complex_operator_detail::operator+; \ using Eigen::complex_operator_detail::operator-; \ using Eigen::complex_operator_detail::operator*; \ using Eigen::complex_operator_detail::operator/; \ using Eigen::complex_operator_detail::operator+=; \ using Eigen::complex_operator_detail::operator-=; \ using Eigen::complex_operator_detail::operator*=; \ using Eigen::complex_operator_detail::operator/=; \ using Eigen::complex_operator_detail::operator==; \ using Eigen::complex_operator_detail::operator!=; namespace Eigen { // Specialized std::complex overloads. namespace complex_operator_detail { template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::complex complex_multiply(const std::complex& a, const std::complex& b) { const T a_real = numext::real(a); const T a_imag = numext::imag(a); const T b_real = numext::real(b); const T b_imag = numext::imag(b); return std::complex( a_real * b_real - a_imag * b_imag, a_imag * b_real + a_real * b_imag); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::complex complex_divide_fast(const std::complex& a, const std::complex& b) { const T a_real = numext::real(a); const T a_imag = numext::imag(a); const T b_real = numext::real(b); const T b_imag = numext::imag(b); const T norm = (b_real * b_real + b_imag * b_imag); return std::complex((a_real * b_real + a_imag * b_imag) / norm, (a_imag * b_real - a_real * b_imag) / norm); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::complex complex_divide_stable(const std::complex& a, const std::complex& b) { const T a_real = numext::real(a); const T a_imag = numext::imag(a); const T b_real = numext::real(b); const T b_imag = numext::imag(b); // Smith's complex division (https://arxiv.org/pdf/1210.4539.pdf), // guards against over/under-flow. const bool scale_imag = numext::abs(b_imag) <= numext::abs(b_real); const T rscale = scale_imag ? T(1) : b_real / b_imag; const T iscale = scale_imag ? b_imag / b_real : T(1); const T denominator = b_real * rscale + b_imag * iscale; return std::complex((a_real * rscale + a_imag * iscale) / denominator, (a_imag * rscale - a_real * iscale) / denominator); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::complex complex_divide(const std::complex& a, const std::complex& b) { #if EIGEN_FAST_MATH return complex_divide_fast(a, b); #else return complex_divide_stable(a, b); #endif } // NOTE: We cannot specialize compound assignment operators with Scalar T, // (i.e. operator@=(const T&), for @=+,-,*,/) // since they are already specialized for float/double/long double within // the standard header. We also do not specialize the stream // operators. #define EIGEN_CREATE_STD_COMPLEX_OPERATOR_SPECIALIZATIONS(T) \ \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \ std::complex operator+(const std::complex& a) { return a; } \ \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \ std::complex operator-(const std::complex& a) { \ return std::complex(-numext::real(a), -numext::imag(a)); \ } \ \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \ std::complex operator+(const std::complex& a, const std::complex& b) { \ return std::complex(numext::real(a) + numext::real(b), numext::imag(a) + numext::imag(b)); \ } \ \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \ std::complex operator+(const std::complex& a, const T& b) { \ return std::complex(numext::real(a) + b, numext::imag(a)); \ } \ \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \ std::complex operator+(const T& a, const std::complex& b) { \ return std::complex(a + numext::real(b), numext::imag(b)); \ } \ \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \ std::complex operator-(const std::complex& a, const std::complex& b) { \ return std::complex(numext::real(a) - numext::real(b), numext::imag(a) - numext::imag(b)); \ } \ \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \ std::complex operator-(const std::complex& a, const T& b) { \ return std::complex(numext::real(a) - b, numext::imag(a)); \ } \ \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \ std::complex operator-(const T& a, const std::complex& b) { \ return std::complex(a - numext::real(b), -numext::imag(b)); \ } \ \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \ std::complex operator*(const std::complex& a, const std::complex& b) { \ return complex_multiply(a, b); \ } \ \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \ std::complex operator*(const std::complex& a, const T& b) { \ return std::complex(numext::real(a) * b, numext::imag(a) * b); \ } \ \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \ std::complex operator*(const T& a, const std::complex& b) { \ return std::complex(a * numext::real(b), a * numext::imag(b)); \ } \ \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \ std::complex operator/(const std::complex& a, const std::complex& b) { \ return complex_divide(a, b); \ } \ \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \ std::complex operator/(const std::complex& a, const T& b) { \ return std::complex(numext::real(a) / b, numext::imag(a) / b); \ } \ \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \ std::complex operator/(const T& a, const std::complex& b) { \ return complex_divide(std::complex(a, 0), b); \ } \ \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \ std::complex& operator+=(std::complex& a, const std::complex& b) { \ numext::real_ref(a) += numext::real(b); \ numext::imag_ref(a) += numext::imag(b); \ return a; \ } \ \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \ std::complex& operator-=(std::complex& a, const std::complex& b) { \ numext::real_ref(a) -= numext::real(b); \ numext::imag_ref(a) -= numext::imag(b); \ return a; \ } \ \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \ std::complex& operator*=(std::complex& a, const std::complex& b) { \ a = complex_multiply(a, b); \ return a; \ } \ \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \ std::complex& operator/=(std::complex& a, const std::complex& b) { \ a = complex_divide(a, b); \ return a; \ } \ \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \ bool operator==(const std::complex& a, const std::complex& b) { \ return numext::real(a) == numext::real(b) && numext::imag(a) == numext::imag(b); \ } \ \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \ bool operator==(const std::complex& a, const T& b) { \ return numext::real(a) == b && numext::imag(a) == 0; \ } \ \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \ bool operator==(const T& a, const std::complex& b) { \ return a == numext::real(b) && 0 == numext::imag(b); \ } \ \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \ bool operator!=(const std::complex& a, const std::complex& b) { \ return !(a == b); \ } \ \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \ bool operator!=(const std::complex& a, const T& b) { \ return !(a == b); \ } \ \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \ bool operator!=(const T& a, const std::complex& b) { \ return !(a == b); \ } // Do not specialize for long double, since that reduces to double on device. EIGEN_CREATE_STD_COMPLEX_OPERATOR_SPECIALIZATIONS(float) EIGEN_CREATE_STD_COMPLEX_OPERATOR_SPECIALIZATIONS(double) #undef EIGEN_CREATE_STD_COMPLEX_OPERATOR_SPECIALIZATIONS } // namespace complex_operator_detail EIGEN_USING_STD_COMPLEX_OPERATORS namespace numext { EIGEN_USING_STD_COMPLEX_OPERATORS } // namespace numext namespace internal { EIGEN_USING_STD_COMPLEX_OPERATORS } // namespace internal } // namespace Eigen #endif // !(EIGEN_COMP_ICC && _USE_COMPLEX_SPECIALIZATION_) #endif // EIGEN_CUDACC && EIGEN_GPU_COMPILE_PHASE #endif // EIGEN_COMPLEX_CUDA_H RcppEigen/inst/include/Eigen/src/Core/arch/MSA/0000755000176200001440000000000014562417221020575 5ustar liggesusersRcppEigen/inst/include/Eigen/src/Core/arch/MSA/MathFunctions.h0000644000176200001440000003743714562417221023546 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2007 Julien Pommier // Copyright (C) 2014 Pedro Gonnet (pedro.gonnet@gmail.com) // Copyright (C) 2016 Gael Guennebaud // // Copyright (C) 2018 Wave Computing, Inc. // Written by: // Chris Larsen // Alexey Frunze (afrunze@wavecomp.com) // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. /* The sin, cos, exp, and log functions of this file come from * Julien Pommier's sse math library: http://gruntthepeon.free.fr/ssemath/ */ /* The tanh function of this file is an adaptation of * template T generic_fast_tanh_float(const T&) * from MathFunctionsImpl.h. */ #ifndef EIGEN_MATH_FUNCTIONS_MSA_H #define EIGEN_MATH_FUNCTIONS_MSA_H namespace Eigen { namespace internal { template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f plog(const Packet4f& _x) { static _EIGEN_DECLARE_CONST_Packet4f(cephes_SQRTHF, 0.707106781186547524f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p0, 7.0376836292e-2f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p1, -1.1514610310e-1f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p2, 1.1676998740e-1f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p3, -1.2420140846e-1f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p4, +1.4249322787e-1f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p5, -1.6668057665e-1f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p6, +2.0000714765e-1f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p7, -2.4999993993e-1f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p8, +3.3333331174e-1f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_log_q1, -2.12194440e-4f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_log_q2, 0.693359375f); static _EIGEN_DECLARE_CONST_Packet4f(half, 0.5f); static _EIGEN_DECLARE_CONST_Packet4f(1, 1.0f); // Convert negative argument into NAN (quiet negative, to be specific). Packet4f zero = (Packet4f)__builtin_msa_ldi_w(0); Packet4i neg_mask = __builtin_msa_fclt_w(_x, zero); Packet4i zero_mask = __builtin_msa_fceq_w(_x, zero); Packet4f non_neg_x_or_nan = padd(_x, (Packet4f)neg_mask); // Add 0.0 or NAN. Packet4f x = non_neg_x_or_nan; // Extract exponent from x = mantissa * 2**exponent, where 1.0 <= mantissa < 2.0. // N.B. the exponent is one less of what frexpf() would return. Packet4i e_int = __builtin_msa_ftint_s_w(__builtin_msa_flog2_w(x)); // Multiply x by 2**(-exponent-1) to get 0.5 <= x < 1.0 as from frexpf(). x = __builtin_msa_fexp2_w(x, (Packet4i)__builtin_msa_nori_b((v16u8)e_int, 0)); /* if (x < SQRTHF) { x = x + x - 1.0; } else { e += 1; x = x - 1.0; } */ Packet4f xx = padd(x, x); Packet4i ge_mask = __builtin_msa_fcle_w(p4f_cephes_SQRTHF, x); e_int = psub(e_int, ge_mask); x = (Packet4f)__builtin_msa_bsel_v((v16u8)ge_mask, (v16u8)xx, (v16u8)x); x = psub(x, p4f_1); Packet4f e = __builtin_msa_ffint_s_w(e_int); Packet4f x2 = pmul(x, x); Packet4f x3 = pmul(x2, x); Packet4f y, y1, y2; y = pmadd(p4f_cephes_log_p0, x, p4f_cephes_log_p1); y1 = pmadd(p4f_cephes_log_p3, x, p4f_cephes_log_p4); y2 = pmadd(p4f_cephes_log_p6, x, p4f_cephes_log_p7); y = pmadd(y, x, p4f_cephes_log_p2); y1 = pmadd(y1, x, p4f_cephes_log_p5); y2 = pmadd(y2, x, p4f_cephes_log_p8); y = pmadd(y, x3, y1); y = pmadd(y, x3, y2); y = pmul(y, x3); y = pmadd(e, p4f_cephes_log_q1, y); x = __builtin_msa_fmsub_w(x, x2, p4f_half); x = padd(x, y); x = pmadd(e, p4f_cephes_log_q2, x); // x is now the logarithm result candidate. We still need to handle the // extreme arguments of zero and positive infinity, though. // N.B. if the argument is +INFINITY, x is NAN because the polynomial terms // contain infinities of both signs (see the coefficients and code above). // INFINITY - INFINITY is NAN. // If the argument is +INFINITY, make it the new result candidate. // To achieve that we choose the smaller of the result candidate and the // argument. // This is correct for all finite pairs of values (the logarithm is smaller // than the argument). // This is also correct in the special case when the argument is +INFINITY // and the result candidate is NAN. This is because the fmin.df instruction // prefers non-NANs to NANs. x = __builtin_msa_fmin_w(x, non_neg_x_or_nan); // If the argument is zero (including -0.0), the result becomes -INFINITY. Packet4i neg_infs = __builtin_msa_slli_w(zero_mask, 23); x = (Packet4f)__builtin_msa_bsel_v((v16u8)zero_mask, (v16u8)x, (v16u8)neg_infs); return x; } template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f pexp(const Packet4f& _x) { // Limiting single-precision pexp's argument to [-128, +128] lets pexp // reach 0 and INFINITY naturally. static _EIGEN_DECLARE_CONST_Packet4f(exp_lo, -128.0f); static _EIGEN_DECLARE_CONST_Packet4f(exp_hi, +128.0f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_LOG2EF, 1.44269504088896341f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_C1, 0.693359375f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_C2, -2.12194440e-4f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_p0, 1.9875691500e-4f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_p1, 1.3981999507e-3f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_p2, 8.3334519073e-3f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_p3, 4.1665795894e-2f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_p4, 1.6666665459e-1f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_p5, 5.0000001201e-1f); static _EIGEN_DECLARE_CONST_Packet4f(half, 0.5f); static _EIGEN_DECLARE_CONST_Packet4f(1, 1.0f); Packet4f x = _x; // Clamp x. x = (Packet4f)__builtin_msa_bsel_v((v16u8)__builtin_msa_fclt_w(x, p4f_exp_lo), (v16u8)x, (v16u8)p4f_exp_lo); x = (Packet4f)__builtin_msa_bsel_v((v16u8)__builtin_msa_fclt_w(p4f_exp_hi, x), (v16u8)x, (v16u8)p4f_exp_hi); // Round to nearest integer by adding 0.5 (with x's sign) and truncating. Packet4f x2_add = (Packet4f)__builtin_msa_binsli_w((v4u32)p4f_half, (v4u32)x, 0); Packet4f x2 = pmadd(x, p4f_cephes_LOG2EF, x2_add); Packet4i x2_int = __builtin_msa_ftrunc_s_w(x2); Packet4f x2_int_f = __builtin_msa_ffint_s_w(x2_int); x = __builtin_msa_fmsub_w(x, x2_int_f, p4f_cephes_exp_C1); x = __builtin_msa_fmsub_w(x, x2_int_f, p4f_cephes_exp_C2); Packet4f z = pmul(x, x); Packet4f y = p4f_cephes_exp_p0; y = pmadd(y, x, p4f_cephes_exp_p1); y = pmadd(y, x, p4f_cephes_exp_p2); y = pmadd(y, x, p4f_cephes_exp_p3); y = pmadd(y, x, p4f_cephes_exp_p4); y = pmadd(y, x, p4f_cephes_exp_p5); y = pmadd(y, z, x); y = padd(y, p4f_1); // y *= 2**exponent. y = __builtin_msa_fexp2_w(y, x2_int); return y; } template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f ptanh(const Packet4f& _x) { static _EIGEN_DECLARE_CONST_Packet4f(tanh_tiny, 1e-4f); static _EIGEN_DECLARE_CONST_Packet4f(tanh_hi, 9.0f); // The monomial coefficients of the numerator polynomial (odd). static _EIGEN_DECLARE_CONST_Packet4f(alpha_1, 4.89352455891786e-3f); static _EIGEN_DECLARE_CONST_Packet4f(alpha_3, 6.37261928875436e-4f); static _EIGEN_DECLARE_CONST_Packet4f(alpha_5, 1.48572235717979e-5f); static _EIGEN_DECLARE_CONST_Packet4f(alpha_7, 5.12229709037114e-8f); static _EIGEN_DECLARE_CONST_Packet4f(alpha_9, -8.60467152213735e-11f); static _EIGEN_DECLARE_CONST_Packet4f(alpha_11, 2.00018790482477e-13f); static _EIGEN_DECLARE_CONST_Packet4f(alpha_13, -2.76076847742355e-16f); // The monomial coefficients of the denominator polynomial (even). static _EIGEN_DECLARE_CONST_Packet4f(beta_0, 4.89352518554385e-3f); static _EIGEN_DECLARE_CONST_Packet4f(beta_2, 2.26843463243900e-3f); static _EIGEN_DECLARE_CONST_Packet4f(beta_4, 1.18534705686654e-4f); static _EIGEN_DECLARE_CONST_Packet4f(beta_6, 1.19825839466702e-6f); Packet4f x = pabs(_x); Packet4i tiny_mask = __builtin_msa_fclt_w(x, p4f_tanh_tiny); // Clamp the inputs to the range [-9, 9] since anything outside // this range is -/+1.0f in single-precision. x = (Packet4f)__builtin_msa_bsel_v((v16u8)__builtin_msa_fclt_w(p4f_tanh_hi, x), (v16u8)x, (v16u8)p4f_tanh_hi); // Since the polynomials are odd/even, we need x**2. Packet4f x2 = pmul(x, x); // Evaluate the numerator polynomial p. Packet4f p = pmadd(x2, p4f_alpha_13, p4f_alpha_11); p = pmadd(x2, p, p4f_alpha_9); p = pmadd(x2, p, p4f_alpha_7); p = pmadd(x2, p, p4f_alpha_5); p = pmadd(x2, p, p4f_alpha_3); p = pmadd(x2, p, p4f_alpha_1); p = pmul(x, p); // Evaluate the denominator polynomial q. Packet4f q = pmadd(x2, p4f_beta_6, p4f_beta_4); q = pmadd(x2, q, p4f_beta_2); q = pmadd(x2, q, p4f_beta_0); // Divide the numerator by the denominator. p = pdiv(p, q); // Reinstate the sign. p = (Packet4f)__builtin_msa_binsli_w((v4u32)p, (v4u32)_x, 0); // When the argument is very small in magnitude it's more accurate to just return it. p = (Packet4f)__builtin_msa_bsel_v((v16u8)tiny_mask, (v16u8)p, (v16u8)_x); return p; } template Packet4f psincos_inner_msa_float(const Packet4f& _x) { static _EIGEN_DECLARE_CONST_Packet4f(sincos_max_arg, 13176795.0f); // Approx. (2**24) / (4/Pi). static _EIGEN_DECLARE_CONST_Packet4f(minus_cephes_DP1, -0.78515625f); static _EIGEN_DECLARE_CONST_Packet4f(minus_cephes_DP2, -2.4187564849853515625e-4f); static _EIGEN_DECLARE_CONST_Packet4f(minus_cephes_DP3, -3.77489497744594108e-8f); static _EIGEN_DECLARE_CONST_Packet4f(sincof_p0, -1.9515295891e-4f); static _EIGEN_DECLARE_CONST_Packet4f(sincof_p1, 8.3321608736e-3f); static _EIGEN_DECLARE_CONST_Packet4f(sincof_p2, -1.6666654611e-1f); static _EIGEN_DECLARE_CONST_Packet4f(coscof_p0, 2.443315711809948e-5f); static _EIGEN_DECLARE_CONST_Packet4f(coscof_p1, -1.388731625493765e-3f); static _EIGEN_DECLARE_CONST_Packet4f(coscof_p2, 4.166664568298827e-2f); static _EIGEN_DECLARE_CONST_Packet4f(cephes_FOPI, 1.27323954473516f); // 4/Pi. static _EIGEN_DECLARE_CONST_Packet4f(half, 0.5f); static _EIGEN_DECLARE_CONST_Packet4f(1, 1.0f); Packet4f x = pabs(_x); // Translate infinite arguments into NANs. Packet4f zero_or_nan_if_inf = psub(_x, _x); x = padd(x, zero_or_nan_if_inf); // Prevent sin/cos from generating values larger than 1.0 in magnitude // for very large arguments by setting x to 0.0. Packet4i small_or_nan_mask = __builtin_msa_fcult_w(x, p4f_sincos_max_arg); x = pand(x, (Packet4f)small_or_nan_mask); // Scale x by 4/Pi to find x's octant. Packet4f y = pmul(x, p4f_cephes_FOPI); // Get the octant. We'll reduce x by this number of octants or by one more than it. Packet4i y_int = __builtin_msa_ftrunc_s_w(y); // x's from even-numbered octants will translate to octant 0: [0, +Pi/4]. // x's from odd-numbered octants will translate to octant -1: [-Pi/4, 0]. // Adjustment for odd-numbered octants: octant = (octant + 1) & (~1). Packet4i y_int1 = __builtin_msa_addvi_w(y_int, 1); Packet4i y_int2 = (Packet4i)__builtin_msa_bclri_w((Packet4ui)y_int1, 0); // bclri = bit-clear y = __builtin_msa_ffint_s_w(y_int2); // Compute the sign to apply to the polynomial. Packet4i sign_mask = sine ? pxor(__builtin_msa_slli_w(y_int1, 29), (Packet4i)_x) : __builtin_msa_slli_w(__builtin_msa_addvi_w(y_int, 3), 29); // Get the polynomial selection mask. // We'll calculate both (sin and cos) polynomials and then select from the two. Packet4i poly_mask = __builtin_msa_ceqi_w(__builtin_msa_slli_w(y_int2, 30), 0); // Reduce x by y octants to get: -Pi/4 <= x <= +Pi/4. // The magic pass: "Extended precision modular arithmetic" // x = ((x - y * DP1) - y * DP2) - y * DP3 Packet4f tmp1 = pmul(y, p4f_minus_cephes_DP1); Packet4f tmp2 = pmul(y, p4f_minus_cephes_DP2); Packet4f tmp3 = pmul(y, p4f_minus_cephes_DP3); x = padd(x, tmp1); x = padd(x, tmp2); x = padd(x, tmp3); // Evaluate the cos(x) polynomial. y = p4f_coscof_p0; Packet4f z = pmul(x, x); y = pmadd(y, z, p4f_coscof_p1); y = pmadd(y, z, p4f_coscof_p2); y = pmul(y, z); y = pmul(y, z); y = __builtin_msa_fmsub_w(y, z, p4f_half); y = padd(y, p4f_1); // Evaluate the sin(x) polynomial. Packet4f y2 = p4f_sincof_p0; y2 = pmadd(y2, z, p4f_sincof_p1); y2 = pmadd(y2, z, p4f_sincof_p2); y2 = pmul(y2, z); y2 = pmadd(y2, x, x); // Select the correct result from the two polynomials. y = sine ? (Packet4f)__builtin_msa_bsel_v((v16u8)poly_mask, (v16u8)y, (v16u8)y2) : (Packet4f)__builtin_msa_bsel_v((v16u8)poly_mask, (v16u8)y2, (v16u8)y); // Update the sign. sign_mask = pxor(sign_mask, (Packet4i)y); y = (Packet4f)__builtin_msa_binsli_w((v4u32)y, (v4u32)sign_mask, 0); // binsli = bit-insert-left return y; } template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f psin(const Packet4f& x) { return psincos_inner_msa_float(x); } template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4f pcos(const Packet4f& x) { return psincos_inner_msa_float(x); } template <> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet2d pexp(const Packet2d& _x) { // Limiting double-precision pexp's argument to [-1024, +1024] lets pexp // reach 0 and INFINITY naturally. static _EIGEN_DECLARE_CONST_Packet2d(exp_lo, -1024.0); static _EIGEN_DECLARE_CONST_Packet2d(exp_hi, +1024.0); static _EIGEN_DECLARE_CONST_Packet2d(cephes_LOG2EF, 1.4426950408889634073599); static _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_C1, 0.693145751953125); static _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_C2, 1.42860682030941723212e-6); static _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_p0, 1.26177193074810590878e-4); static _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_p1, 3.02994407707441961300e-2); static _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_p2, 9.99999999999999999910e-1); static _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_q0, 3.00198505138664455042e-6); static _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_q1, 2.52448340349684104192e-3); static _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_q2, 2.27265548208155028766e-1); static _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_q3, 2.00000000000000000009e0); static _EIGEN_DECLARE_CONST_Packet2d(half, 0.5); static _EIGEN_DECLARE_CONST_Packet2d(1, 1.0); static _EIGEN_DECLARE_CONST_Packet2d(2, 2.0); Packet2d x = _x; // Clamp x. x = (Packet2d)__builtin_msa_bsel_v((v16u8)__builtin_msa_fclt_d(x, p2d_exp_lo), (v16u8)x, (v16u8)p2d_exp_lo); x = (Packet2d)__builtin_msa_bsel_v((v16u8)__builtin_msa_fclt_d(p2d_exp_hi, x), (v16u8)x, (v16u8)p2d_exp_hi); // Round to nearest integer by adding 0.5 (with x's sign) and truncating. Packet2d x2_add = (Packet2d)__builtin_msa_binsli_d((v2u64)p2d_half, (v2u64)x, 0); Packet2d x2 = pmadd(x, p2d_cephes_LOG2EF, x2_add); Packet2l x2_long = __builtin_msa_ftrunc_s_d(x2); Packet2d x2_long_d = __builtin_msa_ffint_s_d(x2_long); x = __builtin_msa_fmsub_d(x, x2_long_d, p2d_cephes_exp_C1); x = __builtin_msa_fmsub_d(x, x2_long_d, p2d_cephes_exp_C2); x2 = pmul(x, x); Packet2d px = p2d_cephes_exp_p0; px = pmadd(px, x2, p2d_cephes_exp_p1); px = pmadd(px, x2, p2d_cephes_exp_p2); px = pmul(px, x); Packet2d qx = p2d_cephes_exp_q0; qx = pmadd(qx, x2, p2d_cephes_exp_q1); qx = pmadd(qx, x2, p2d_cephes_exp_q2); qx = pmadd(qx, x2, p2d_cephes_exp_q3); x = pdiv(px, psub(qx, px)); x = pmadd(p2d_2, x, p2d_1); // x *= 2**exponent. x = __builtin_msa_fexp2_d(x, x2_long); return x; } } // end namespace internal } // end namespace Eigen #endif // EIGEN_MATH_FUNCTIONS_MSA_H RcppEigen/inst/include/Eigen/src/Core/arch/MSA/PacketMath.h0000644000176200001440000010151714562417221022774 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2018 Wave Computing, Inc. // Written by: // Chris Larsen // Alexey Frunze (afrunze@wavecomp.com) // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_PACKET_MATH_MSA_H #define EIGEN_PACKET_MATH_MSA_H #include #include namespace Eigen { namespace internal { #ifndef EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD #define EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD 8 #endif #ifndef EIGEN_HAS_SINGLE_INSTRUCTION_MADD #define EIGEN_HAS_SINGLE_INSTRUCTION_MADD #endif #ifndef EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS #define EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS 32 #endif #if 0 #define EIGEN_MSA_DEBUG \ static bool firstTime = true; \ do { \ if (firstTime) { \ std::cout << __FILE__ << ':' << __LINE__ << ':' << __FUNCTION__ << std::endl; \ firstTime = false; \ } \ } while (0) #else #define EIGEN_MSA_DEBUG #endif #define EIGEN_MSA_SHF_I8(a, b, c, d) (((d) << 6) | ((c) << 4) | ((b) << 2) | (a)) typedef v4f32 Packet4f; typedef v4i32 Packet4i; typedef v4u32 Packet4ui; #define _EIGEN_DECLARE_CONST_Packet4f(NAME, X) const Packet4f p4f_##NAME = { X, X, X, X } #define _EIGEN_DECLARE_CONST_Packet4i(NAME, X) const Packet4i p4i_##NAME = { X, X, X, X } #define _EIGEN_DECLARE_CONST_Packet4ui(NAME, X) const Packet4ui p4ui_##NAME = { X, X, X, X } inline std::ostream& operator<<(std::ostream& os, const Packet4f& value) { os << "[ " << value[0] << ", " << value[1] << ", " << value[2] << ", " << value[3] << " ]"; return os; } inline std::ostream& operator<<(std::ostream& os, const Packet4i& value) { os << "[ " << value[0] << ", " << value[1] << ", " << value[2] << ", " << value[3] << " ]"; return os; } inline std::ostream& operator<<(std::ostream& os, const Packet4ui& value) { os << "[ " << value[0] << ", " << value[1] << ", " << value[2] << ", " << value[3] << " ]"; return os; } template <> struct packet_traits : default_packet_traits { typedef Packet4f type; typedef Packet4f half; // Packet2f intrinsics not implemented yet enum { Vectorizable = 1, AlignedOnScalar = 1, size = 4, HasHalfPacket = 0, // Packet2f intrinsics not implemented yet // FIXME check the Has* HasDiv = 1, HasSin = EIGEN_FAST_MATH, HasCos = EIGEN_FAST_MATH, HasTanh = EIGEN_FAST_MATH, HasErf = EIGEN_FAST_MATH, HasLog = 1, HasExp = 1, HasSqrt = 1, HasRsqrt = 1, HasRound = 1, HasFloor = 1, HasCeil = 1, HasBlend = 1 }; }; template <> struct packet_traits : default_packet_traits { typedef Packet4i type; typedef Packet4i half; // Packet2i intrinsics not implemented yet enum { Vectorizable = 1, AlignedOnScalar = 1, size = 4, HasHalfPacket = 0, // Packet2i intrinsics not implemented yet // FIXME check the Has* HasDiv = 1, HasBlend = 1 }; }; template <> struct unpacket_traits { typedef float type; enum { size = 4, alignment = Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false }; typedef Packet4f half; }; template <> struct unpacket_traits { typedef int32_t type; enum { size = 4, alignment = Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false }; typedef Packet4i half; }; template <> EIGEN_STRONG_INLINE Packet4f pset1(const float& from) { EIGEN_MSA_DEBUG; Packet4f v = { from, from, from, from }; return v; } template <> EIGEN_STRONG_INLINE Packet4i pset1(const int32_t& from) { EIGEN_MSA_DEBUG; return __builtin_msa_fill_w(from); } template <> EIGEN_STRONG_INLINE Packet4f pload1(const float* from) { EIGEN_MSA_DEBUG; float f = *from; Packet4f v = { f, f, f, f }; return v; } template <> EIGEN_STRONG_INLINE Packet4i pload1(const int32_t* from) { EIGEN_MSA_DEBUG; return __builtin_msa_fill_w(*from); } template <> EIGEN_STRONG_INLINE Packet4f padd(const Packet4f& a, const Packet4f& b) { EIGEN_MSA_DEBUG; return __builtin_msa_fadd_w(a, b); } template <> EIGEN_STRONG_INLINE Packet4i padd(const Packet4i& a, const Packet4i& b) { EIGEN_MSA_DEBUG; return __builtin_msa_addv_w(a, b); } template <> EIGEN_STRONG_INLINE Packet4f plset(const float& a) { EIGEN_MSA_DEBUG; static const Packet4f countdown = { 0.0f, 1.0f, 2.0f, 3.0f }; return padd(pset1(a), countdown); } template <> EIGEN_STRONG_INLINE Packet4i plset(const int32_t& a) { EIGEN_MSA_DEBUG; static const Packet4i countdown = { 0, 1, 2, 3 }; return padd(pset1(a), countdown); } template <> EIGEN_STRONG_INLINE Packet4f psub(const Packet4f& a, const Packet4f& b) { EIGEN_MSA_DEBUG; return __builtin_msa_fsub_w(a, b); } template <> EIGEN_STRONG_INLINE Packet4i psub(const Packet4i& a, const Packet4i& b) { EIGEN_MSA_DEBUG; return __builtin_msa_subv_w(a, b); } template <> EIGEN_STRONG_INLINE Packet4f pnegate(const Packet4f& a) { EIGEN_MSA_DEBUG; return (Packet4f)__builtin_msa_bnegi_w((v4u32)a, 31); } template <> EIGEN_STRONG_INLINE Packet4i pnegate(const Packet4i& a) { EIGEN_MSA_DEBUG; return __builtin_msa_addvi_w((v4i32)__builtin_msa_nori_b((v16u8)a, 0), 1); } template <> EIGEN_STRONG_INLINE Packet4f pconj(const Packet4f& a) { EIGEN_MSA_DEBUG; return a; } template <> EIGEN_STRONG_INLINE Packet4i pconj(const Packet4i& a) { EIGEN_MSA_DEBUG; return a; } template <> EIGEN_STRONG_INLINE Packet4f pmul(const Packet4f& a, const Packet4f& b) { EIGEN_MSA_DEBUG; return __builtin_msa_fmul_w(a, b); } template <> EIGEN_STRONG_INLINE Packet4i pmul(const Packet4i& a, const Packet4i& b) { EIGEN_MSA_DEBUG; return __builtin_msa_mulv_w(a, b); } template <> EIGEN_STRONG_INLINE Packet4f pdiv(const Packet4f& a, const Packet4f& b) { EIGEN_MSA_DEBUG; return __builtin_msa_fdiv_w(a, b); } template <> EIGEN_STRONG_INLINE Packet4i pdiv(const Packet4i& a, const Packet4i& b) { EIGEN_MSA_DEBUG; return __builtin_msa_div_s_w(a, b); } template <> EIGEN_STRONG_INLINE Packet4f pmadd(const Packet4f& a, const Packet4f& b, const Packet4f& c) { EIGEN_MSA_DEBUG; return __builtin_msa_fmadd_w(c, a, b); } template <> EIGEN_STRONG_INLINE Packet4i pmadd(const Packet4i& a, const Packet4i& b, const Packet4i& c) { EIGEN_MSA_DEBUG; // Use "asm" construct to avoid __builtin_msa_maddv_w GNU C bug. Packet4i value = c; __asm__("maddv.w %w[value], %w[a], %w[b]\n" // Outputs : [value] "+f"(value) // Inputs : [a] "f"(a), [b] "f"(b)); return value; } template <> EIGEN_STRONG_INLINE Packet4f pand(const Packet4f& a, const Packet4f& b) { EIGEN_MSA_DEBUG; return (Packet4f)__builtin_msa_and_v((v16u8)a, (v16u8)b); } template <> EIGEN_STRONG_INLINE Packet4i pand(const Packet4i& a, const Packet4i& b) { EIGEN_MSA_DEBUG; return (Packet4i)__builtin_msa_and_v((v16u8)a, (v16u8)b); } template <> EIGEN_STRONG_INLINE Packet4f por(const Packet4f& a, const Packet4f& b) { EIGEN_MSA_DEBUG; return (Packet4f)__builtin_msa_or_v((v16u8)a, (v16u8)b); } template <> EIGEN_STRONG_INLINE Packet4i por(const Packet4i& a, const Packet4i& b) { EIGEN_MSA_DEBUG; return (Packet4i)__builtin_msa_or_v((v16u8)a, (v16u8)b); } template <> EIGEN_STRONG_INLINE Packet4f pxor(const Packet4f& a, const Packet4f& b) { EIGEN_MSA_DEBUG; return (Packet4f)__builtin_msa_xor_v((v16u8)a, (v16u8)b); } template <> EIGEN_STRONG_INLINE Packet4i pxor(const Packet4i& a, const Packet4i& b) { EIGEN_MSA_DEBUG; return (Packet4i)__builtin_msa_xor_v((v16u8)a, (v16u8)b); } template <> EIGEN_STRONG_INLINE Packet4f pandnot(const Packet4f& a, const Packet4f& b) { EIGEN_MSA_DEBUG; return pand(a, (Packet4f)__builtin_msa_xori_b((v16u8)b, 255)); } template <> EIGEN_STRONG_INLINE Packet4i pandnot(const Packet4i& a, const Packet4i& b) { EIGEN_MSA_DEBUG; return pand(a, (Packet4i)__builtin_msa_xori_b((v16u8)b, 255)); } template <> EIGEN_STRONG_INLINE Packet4f pmin(const Packet4f& a, const Packet4f& b) { EIGEN_MSA_DEBUG; #if EIGEN_FAST_MATH // This prefers numbers to NaNs. return __builtin_msa_fmin_w(a, b); #else // This prefers NaNs to numbers. Packet4i aNaN = __builtin_msa_fcun_w(a, a); Packet4i aMinOrNaN = por(__builtin_msa_fclt_w(a, b), aNaN); return (Packet4f)__builtin_msa_bsel_v((v16u8)aMinOrNaN, (v16u8)b, (v16u8)a); #endif } template <> EIGEN_STRONG_INLINE Packet4i pmin(const Packet4i& a, const Packet4i& b) { EIGEN_MSA_DEBUG; return __builtin_msa_min_s_w(a, b); } template <> EIGEN_STRONG_INLINE Packet4f pmax(const Packet4f& a, const Packet4f& b) { EIGEN_MSA_DEBUG; #if EIGEN_FAST_MATH // This prefers numbers to NaNs. return __builtin_msa_fmax_w(a, b); #else // This prefers NaNs to numbers. Packet4i aNaN = __builtin_msa_fcun_w(a, a); Packet4i aMaxOrNaN = por(__builtin_msa_fclt_w(b, a), aNaN); return (Packet4f)__builtin_msa_bsel_v((v16u8)aMaxOrNaN, (v16u8)b, (v16u8)a); #endif } template <> EIGEN_STRONG_INLINE Packet4i pmax(const Packet4i& a, const Packet4i& b) { EIGEN_MSA_DEBUG; return __builtin_msa_max_s_w(a, b); } template <> EIGEN_STRONG_INLINE Packet4f pload(const float* from) { EIGEN_MSA_DEBUG; EIGEN_DEBUG_ALIGNED_LOAD return (Packet4f)__builtin_msa_ld_w(const_cast(from), 0); } template <> EIGEN_STRONG_INLINE Packet4i pload(const int32_t* from) { EIGEN_MSA_DEBUG; EIGEN_DEBUG_ALIGNED_LOAD return __builtin_msa_ld_w(const_cast(from), 0); } template <> EIGEN_STRONG_INLINE Packet4f ploadu(const float* from) { EIGEN_MSA_DEBUG; EIGEN_DEBUG_UNALIGNED_LOAD return (Packet4f)__builtin_msa_ld_w(const_cast(from), 0); } template <> EIGEN_STRONG_INLINE Packet4i ploadu(const int32_t* from) { EIGEN_MSA_DEBUG; EIGEN_DEBUG_UNALIGNED_LOAD return (Packet4i)__builtin_msa_ld_w(const_cast(from), 0); } template <> EIGEN_STRONG_INLINE Packet4f ploaddup(const float* from) { EIGEN_MSA_DEBUG; float f0 = from[0], f1 = from[1]; Packet4f v0 = { f0, f0, f0, f0 }; Packet4f v1 = { f1, f1, f1, f1 }; return (Packet4f)__builtin_msa_ilvr_d((v2i64)v1, (v2i64)v0); } template <> EIGEN_STRONG_INLINE Packet4i ploaddup(const int32_t* from) { EIGEN_MSA_DEBUG; int32_t i0 = from[0], i1 = from[1]; Packet4i v0 = { i0, i0, i0, i0 }; Packet4i v1 = { i1, i1, i1, i1 }; return (Packet4i)__builtin_msa_ilvr_d((v2i64)v1, (v2i64)v0); } template <> EIGEN_STRONG_INLINE void pstore(float* to, const Packet4f& from) { EIGEN_MSA_DEBUG; EIGEN_DEBUG_ALIGNED_STORE __builtin_msa_st_w((Packet4i)from, to, 0); } template <> EIGEN_STRONG_INLINE void pstore(int32_t* to, const Packet4i& from) { EIGEN_MSA_DEBUG; EIGEN_DEBUG_ALIGNED_STORE __builtin_msa_st_w(from, to, 0); } template <> EIGEN_STRONG_INLINE void pstoreu(float* to, const Packet4f& from) { EIGEN_MSA_DEBUG; EIGEN_DEBUG_UNALIGNED_STORE __builtin_msa_st_w((Packet4i)from, to, 0); } template <> EIGEN_STRONG_INLINE void pstoreu(int32_t* to, const Packet4i& from) { EIGEN_MSA_DEBUG; EIGEN_DEBUG_UNALIGNED_STORE __builtin_msa_st_w(from, to, 0); } template <> EIGEN_DEVICE_FUNC inline Packet4f pgather(const float* from, Index stride) { EIGEN_MSA_DEBUG; float f = *from; Packet4f v = { f, f, f, f }; v[1] = from[stride]; v[2] = from[2 * stride]; v[3] = from[3 * stride]; return v; } template <> EIGEN_DEVICE_FUNC inline Packet4i pgather(const int32_t* from, Index stride) { EIGEN_MSA_DEBUG; int32_t i = *from; Packet4i v = { i, i, i, i }; v[1] = from[stride]; v[2] = from[2 * stride]; v[3] = from[3 * stride]; return v; } template <> EIGEN_DEVICE_FUNC inline void pscatter(float* to, const Packet4f& from, Index stride) { EIGEN_MSA_DEBUG; *to = from[0]; to += stride; *to = from[1]; to += stride; *to = from[2]; to += stride; *to = from[3]; } template <> EIGEN_DEVICE_FUNC inline void pscatter(int32_t* to, const Packet4i& from, Index stride) { EIGEN_MSA_DEBUG; *to = from[0]; to += stride; *to = from[1]; to += stride; *to = from[2]; to += stride; *to = from[3]; } template <> EIGEN_STRONG_INLINE void prefetch(const float* addr) { EIGEN_MSA_DEBUG; __builtin_prefetch(addr); } template <> EIGEN_STRONG_INLINE void prefetch(const int32_t* addr) { EIGEN_MSA_DEBUG; __builtin_prefetch(addr); } template <> EIGEN_STRONG_INLINE float pfirst(const Packet4f& a) { EIGEN_MSA_DEBUG; return a[0]; } template <> EIGEN_STRONG_INLINE int32_t pfirst(const Packet4i& a) { EIGEN_MSA_DEBUG; return a[0]; } template <> EIGEN_STRONG_INLINE Packet4f preverse(const Packet4f& a) { EIGEN_MSA_DEBUG; return (Packet4f)__builtin_msa_shf_w((v4i32)a, EIGEN_MSA_SHF_I8(3, 2, 1, 0)); } template <> EIGEN_STRONG_INLINE Packet4i preverse(const Packet4i& a) { EIGEN_MSA_DEBUG; return __builtin_msa_shf_w(a, EIGEN_MSA_SHF_I8(3, 2, 1, 0)); } template <> EIGEN_STRONG_INLINE Packet4f pabs(const Packet4f& a) { EIGEN_MSA_DEBUG; return (Packet4f)__builtin_msa_bclri_w((v4u32)a, 31); } template <> EIGEN_STRONG_INLINE Packet4i pabs(const Packet4i& a) { EIGEN_MSA_DEBUG; Packet4i zero = __builtin_msa_ldi_w(0); return __builtin_msa_add_a_w(zero, a); } template <> EIGEN_STRONG_INLINE float predux(const Packet4f& a) { EIGEN_MSA_DEBUG; Packet4f s = padd(a, (Packet4f)__builtin_msa_shf_w((v4i32)a, EIGEN_MSA_SHF_I8(2, 3, 0, 1))); s = padd(s, (Packet4f)__builtin_msa_shf_w((v4i32)s, EIGEN_MSA_SHF_I8(1, 0, 3, 2))); return s[0]; } template <> EIGEN_STRONG_INLINE int32_t predux(const Packet4i& a) { EIGEN_MSA_DEBUG; Packet4i s = padd(a, __builtin_msa_shf_w(a, EIGEN_MSA_SHF_I8(2, 3, 0, 1))); s = padd(s, __builtin_msa_shf_w(s, EIGEN_MSA_SHF_I8(1, 0, 3, 2))); return s[0]; } // Other reduction functions: // mul template <> EIGEN_STRONG_INLINE float predux_mul(const Packet4f& a) { EIGEN_MSA_DEBUG; Packet4f p = pmul(a, (Packet4f)__builtin_msa_shf_w((v4i32)a, EIGEN_MSA_SHF_I8(2, 3, 0, 1))); p = pmul(p, (Packet4f)__builtin_msa_shf_w((v4i32)p, EIGEN_MSA_SHF_I8(1, 0, 3, 2))); return p[0]; } template <> EIGEN_STRONG_INLINE int32_t predux_mul(const Packet4i& a) { EIGEN_MSA_DEBUG; Packet4i p = pmul(a, __builtin_msa_shf_w(a, EIGEN_MSA_SHF_I8(2, 3, 0, 1))); p = pmul(p, __builtin_msa_shf_w(p, EIGEN_MSA_SHF_I8(1, 0, 3, 2))); return p[0]; } // min template <> EIGEN_STRONG_INLINE float predux_min(const Packet4f& a) { EIGEN_MSA_DEBUG; // Swap 64-bit halves of a. Packet4f swapped = (Packet4f)__builtin_msa_shf_w((Packet4i)a, EIGEN_MSA_SHF_I8(2, 3, 0, 1)); #if !EIGEN_FAST_MATH // Detect presence of NaNs from pairs a[0]-a[2] and a[1]-a[3] as two 32-bit // masks of all zeroes/ones in low 64 bits. v16u8 unord = (v16u8)__builtin_msa_fcun_w(a, swapped); // Combine the two masks into one: 64 ones if no NaNs, otherwise 64 zeroes. unord = (v16u8)__builtin_msa_ceqi_d((v2i64)unord, 0); #endif // Continue with min computation. Packet4f v = __builtin_msa_fmin_w(a, swapped); v = __builtin_msa_fmin_w( v, (Packet4f)__builtin_msa_shf_w((Packet4i)v, EIGEN_MSA_SHF_I8(1, 0, 3, 2))); #if !EIGEN_FAST_MATH // Based on the mask select between v and 4 qNaNs. v16u8 qnans = (v16u8)__builtin_msa_fill_w(0x7FC00000); v = (Packet4f)__builtin_msa_bsel_v(unord, qnans, (v16u8)v); #endif return v[0]; } template <> EIGEN_STRONG_INLINE int32_t predux_min(const Packet4i& a) { EIGEN_MSA_DEBUG; Packet4i m = pmin(a, __builtin_msa_shf_w(a, EIGEN_MSA_SHF_I8(2, 3, 0, 1))); m = pmin(m, __builtin_msa_shf_w(m, EIGEN_MSA_SHF_I8(1, 0, 3, 2))); return m[0]; } // max template <> EIGEN_STRONG_INLINE float predux_max(const Packet4f& a) { EIGEN_MSA_DEBUG; // Swap 64-bit halves of a. Packet4f swapped = (Packet4f)__builtin_msa_shf_w((Packet4i)a, EIGEN_MSA_SHF_I8(2, 3, 0, 1)); #if !EIGEN_FAST_MATH // Detect presence of NaNs from pairs a[0]-a[2] and a[1]-a[3] as two 32-bit // masks of all zeroes/ones in low 64 bits. v16u8 unord = (v16u8)__builtin_msa_fcun_w(a, swapped); // Combine the two masks into one: 64 ones if no NaNs, otherwise 64 zeroes. unord = (v16u8)__builtin_msa_ceqi_d((v2i64)unord, 0); #endif // Continue with max computation. Packet4f v = __builtin_msa_fmax_w(a, swapped); v = __builtin_msa_fmax_w( v, (Packet4f)__builtin_msa_shf_w((Packet4i)v, EIGEN_MSA_SHF_I8(1, 0, 3, 2))); #if !EIGEN_FAST_MATH // Based on the mask select between v and 4 qNaNs. v16u8 qnans = (v16u8)__builtin_msa_fill_w(0x7FC00000); v = (Packet4f)__builtin_msa_bsel_v(unord, qnans, (v16u8)v); #endif return v[0]; } template <> EIGEN_STRONG_INLINE int32_t predux_max(const Packet4i& a) { EIGEN_MSA_DEBUG; Packet4i m = pmax(a, __builtin_msa_shf_w(a, EIGEN_MSA_SHF_I8(2, 3, 0, 1))); m = pmax(m, __builtin_msa_shf_w(m, EIGEN_MSA_SHF_I8(1, 0, 3, 2))); return m[0]; } inline std::ostream& operator<<(std::ostream& os, const PacketBlock& value) { os << "[ " << value.packet[0] << "," << std::endl << " " << value.packet[1] << "," << std::endl << " " << value.packet[2] << "," << std::endl << " " << value.packet[3] << " ]"; return os; } EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { EIGEN_MSA_DEBUG; v4i32 tmp1, tmp2, tmp3, tmp4; tmp1 = __builtin_msa_ilvr_w((v4i32)kernel.packet[1], (v4i32)kernel.packet[0]); tmp2 = __builtin_msa_ilvr_w((v4i32)kernel.packet[3], (v4i32)kernel.packet[2]); tmp3 = __builtin_msa_ilvl_w((v4i32)kernel.packet[1], (v4i32)kernel.packet[0]); tmp4 = __builtin_msa_ilvl_w((v4i32)kernel.packet[3], (v4i32)kernel.packet[2]); kernel.packet[0] = (Packet4f)__builtin_msa_ilvr_d((v2i64)tmp2, (v2i64)tmp1); kernel.packet[1] = (Packet4f)__builtin_msa_ilvod_d((v2i64)tmp2, (v2i64)tmp1); kernel.packet[2] = (Packet4f)__builtin_msa_ilvr_d((v2i64)tmp4, (v2i64)tmp3); kernel.packet[3] = (Packet4f)__builtin_msa_ilvod_d((v2i64)tmp4, (v2i64)tmp3); } inline std::ostream& operator<<(std::ostream& os, const PacketBlock& value) { os << "[ " << value.packet[0] << "," << std::endl << " " << value.packet[1] << "," << std::endl << " " << value.packet[2] << "," << std::endl << " " << value.packet[3] << " ]"; return os; } EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { EIGEN_MSA_DEBUG; v4i32 tmp1, tmp2, tmp3, tmp4; tmp1 = __builtin_msa_ilvr_w(kernel.packet[1], kernel.packet[0]); tmp2 = __builtin_msa_ilvr_w(kernel.packet[3], kernel.packet[2]); tmp3 = __builtin_msa_ilvl_w(kernel.packet[1], kernel.packet[0]); tmp4 = __builtin_msa_ilvl_w(kernel.packet[3], kernel.packet[2]); kernel.packet[0] = (Packet4i)__builtin_msa_ilvr_d((v2i64)tmp2, (v2i64)tmp1); kernel.packet[1] = (Packet4i)__builtin_msa_ilvod_d((v2i64)tmp2, (v2i64)tmp1); kernel.packet[2] = (Packet4i)__builtin_msa_ilvr_d((v2i64)tmp4, (v2i64)tmp3); kernel.packet[3] = (Packet4i)__builtin_msa_ilvod_d((v2i64)tmp4, (v2i64)tmp3); } template <> EIGEN_STRONG_INLINE Packet4f psqrt(const Packet4f& a) { EIGEN_MSA_DEBUG; return __builtin_msa_fsqrt_w(a); } template <> EIGEN_STRONG_INLINE Packet4f prsqrt(const Packet4f& a) { EIGEN_MSA_DEBUG; #if EIGEN_FAST_MATH return __builtin_msa_frsqrt_w(a); #else Packet4f ones = __builtin_msa_ffint_s_w(__builtin_msa_ldi_w(1)); return pdiv(ones, psqrt(a)); #endif } template <> EIGEN_STRONG_INLINE Packet4f pfloor(const Packet4f& a) { Packet4f v = a; int32_t old_mode, new_mode; asm volatile( "cfcmsa %[old_mode], $1\n" "ori %[new_mode], %[old_mode], 3\n" // 3 = round towards -INFINITY. "ctcmsa $1, %[new_mode]\n" "frint.w %w[v], %w[v]\n" "ctcmsa $1, %[old_mode]\n" : // outputs [old_mode] "=r"(old_mode), [new_mode] "=r"(new_mode), [v] "+f"(v) : // inputs : // clobbers ); return v; } template <> EIGEN_STRONG_INLINE Packet4f pceil(const Packet4f& a) { Packet4f v = a; int32_t old_mode, new_mode; asm volatile( "cfcmsa %[old_mode], $1\n" "ori %[new_mode], %[old_mode], 3\n" "xori %[new_mode], %[new_mode], 1\n" // 2 = round towards +INFINITY. "ctcmsa $1, %[new_mode]\n" "frint.w %w[v], %w[v]\n" "ctcmsa $1, %[old_mode]\n" : // outputs [old_mode] "=r"(old_mode), [new_mode] "=r"(new_mode), [v] "+f"(v) : // inputs : // clobbers ); return v; } template <> EIGEN_STRONG_INLINE Packet4f pround(const Packet4f& a) { Packet4f v = a; int32_t old_mode, new_mode; asm volatile( "cfcmsa %[old_mode], $1\n" "ori %[new_mode], %[old_mode], 3\n" "xori %[new_mode], %[new_mode], 3\n" // 0 = round to nearest, ties to even. "ctcmsa $1, %[new_mode]\n" "frint.w %w[v], %w[v]\n" "ctcmsa $1, %[old_mode]\n" : // outputs [old_mode] "=r"(old_mode), [new_mode] "=r"(new_mode), [v] "+f"(v) : // inputs : // clobbers ); return v; } template <> EIGEN_STRONG_INLINE Packet4f pblend(const Selector<4>& ifPacket, const Packet4f& thenPacket, const Packet4f& elsePacket) { Packet4ui select = { ifPacket.select[0], ifPacket.select[1], ifPacket.select[2], ifPacket.select[3] }; Packet4i mask = __builtin_msa_ceqi_w((Packet4i)select, 0); return (Packet4f)__builtin_msa_bsel_v((v16u8)mask, (v16u8)thenPacket, (v16u8)elsePacket); } template <> EIGEN_STRONG_INLINE Packet4i pblend(const Selector<4>& ifPacket, const Packet4i& thenPacket, const Packet4i& elsePacket) { Packet4ui select = { ifPacket.select[0], ifPacket.select[1], ifPacket.select[2], ifPacket.select[3] }; Packet4i mask = __builtin_msa_ceqi_w((Packet4i)select, 0); return (Packet4i)__builtin_msa_bsel_v((v16u8)mask, (v16u8)thenPacket, (v16u8)elsePacket); } //---------- double ---------- typedef v2f64 Packet2d; typedef v2i64 Packet2l; typedef v2u64 Packet2ul; #define _EIGEN_DECLARE_CONST_Packet2d(NAME, X) const Packet2d p2d_##NAME = { X, X } #define _EIGEN_DECLARE_CONST_Packet2l(NAME, X) const Packet2l p2l_##NAME = { X, X } #define _EIGEN_DECLARE_CONST_Packet2ul(NAME, X) const Packet2ul p2ul_##NAME = { X, X } inline std::ostream& operator<<(std::ostream& os, const Packet2d& value) { os << "[ " << value[0] << ", " << value[1] << " ]"; return os; } inline std::ostream& operator<<(std::ostream& os, const Packet2l& value) { os << "[ " << value[0] << ", " << value[1] << " ]"; return os; } inline std::ostream& operator<<(std::ostream& os, const Packet2ul& value) { os << "[ " << value[0] << ", " << value[1] << " ]"; return os; } template <> struct packet_traits : default_packet_traits { typedef Packet2d type; typedef Packet2d half; enum { Vectorizable = 1, AlignedOnScalar = 1, size = 2, HasHalfPacket = 0, // FIXME check the Has* HasDiv = 1, HasExp = 1, HasSqrt = 1, HasRsqrt = 1, HasRound = 1, HasFloor = 1, HasCeil = 1, HasBlend = 1 }; }; template <> struct unpacket_traits { typedef double type; enum { size = 2, alignment = Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false }; typedef Packet2d half; }; template <> EIGEN_STRONG_INLINE Packet2d pset1(const double& from) { EIGEN_MSA_DEBUG; Packet2d value = { from, from }; return value; } template <> EIGEN_STRONG_INLINE Packet2d padd(const Packet2d& a, const Packet2d& b) { EIGEN_MSA_DEBUG; return __builtin_msa_fadd_d(a, b); } template <> EIGEN_STRONG_INLINE Packet2d plset(const double& a) { EIGEN_MSA_DEBUG; static const Packet2d countdown = { 0.0, 1.0 }; return padd(pset1(a), countdown); } template <> EIGEN_STRONG_INLINE Packet2d psub(const Packet2d& a, const Packet2d& b) { EIGEN_MSA_DEBUG; return __builtin_msa_fsub_d(a, b); } template <> EIGEN_STRONG_INLINE Packet2d pnegate(const Packet2d& a) { EIGEN_MSA_DEBUG; return (Packet2d)__builtin_msa_bnegi_d((v2u64)a, 63); } template <> EIGEN_STRONG_INLINE Packet2d pconj(const Packet2d& a) { EIGEN_MSA_DEBUG; return a; } template <> EIGEN_STRONG_INLINE Packet2d pmul(const Packet2d& a, const Packet2d& b) { EIGEN_MSA_DEBUG; return __builtin_msa_fmul_d(a, b); } template <> EIGEN_STRONG_INLINE Packet2d pdiv(const Packet2d& a, const Packet2d& b) { EIGEN_MSA_DEBUG; return __builtin_msa_fdiv_d(a, b); } template <> EIGEN_STRONG_INLINE Packet2d pmadd(const Packet2d& a, const Packet2d& b, const Packet2d& c) { EIGEN_MSA_DEBUG; return __builtin_msa_fmadd_d(c, a, b); } // Logical Operations are not supported for float, so we have to reinterpret casts using MSA // intrinsics template <> EIGEN_STRONG_INLINE Packet2d pand(const Packet2d& a, const Packet2d& b) { EIGEN_MSA_DEBUG; return (Packet2d)__builtin_msa_and_v((v16u8)a, (v16u8)b); } template <> EIGEN_STRONG_INLINE Packet2d por(const Packet2d& a, const Packet2d& b) { EIGEN_MSA_DEBUG; return (Packet2d)__builtin_msa_or_v((v16u8)a, (v16u8)b); } template <> EIGEN_STRONG_INLINE Packet2d pxor(const Packet2d& a, const Packet2d& b) { EIGEN_MSA_DEBUG; return (Packet2d)__builtin_msa_xor_v((v16u8)a, (v16u8)b); } template <> EIGEN_STRONG_INLINE Packet2d pandnot(const Packet2d& a, const Packet2d& b) { EIGEN_MSA_DEBUG; return pand(a, (Packet2d)__builtin_msa_xori_b((v16u8)b, 255)); } template <> EIGEN_STRONG_INLINE Packet2d pload(const double* from) { EIGEN_MSA_DEBUG; EIGEN_DEBUG_UNALIGNED_LOAD return (Packet2d)__builtin_msa_ld_d(const_cast(from), 0); } template <> EIGEN_STRONG_INLINE Packet2d pmin(const Packet2d& a, const Packet2d& b) { EIGEN_MSA_DEBUG; #if EIGEN_FAST_MATH // This prefers numbers to NaNs. return __builtin_msa_fmin_d(a, b); #else // This prefers NaNs to numbers. v2i64 aNaN = __builtin_msa_fcun_d(a, a); v2i64 aMinOrNaN = por(__builtin_msa_fclt_d(a, b), aNaN); return (Packet2d)__builtin_msa_bsel_v((v16u8)aMinOrNaN, (v16u8)b, (v16u8)a); #endif } template <> EIGEN_STRONG_INLINE Packet2d pmax(const Packet2d& a, const Packet2d& b) { EIGEN_MSA_DEBUG; #if EIGEN_FAST_MATH // This prefers numbers to NaNs. return __builtin_msa_fmax_d(a, b); #else // This prefers NaNs to numbers. v2i64 aNaN = __builtin_msa_fcun_d(a, a); v2i64 aMaxOrNaN = por(__builtin_msa_fclt_d(b, a), aNaN); return (Packet2d)__builtin_msa_bsel_v((v16u8)aMaxOrNaN, (v16u8)b, (v16u8)a); #endif } template <> EIGEN_STRONG_INLINE Packet2d ploadu(const double* from) { EIGEN_MSA_DEBUG; EIGEN_DEBUG_UNALIGNED_LOAD return (Packet2d)__builtin_msa_ld_d(const_cast(from), 0); } template <> EIGEN_STRONG_INLINE Packet2d ploaddup(const double* from) { EIGEN_MSA_DEBUG; Packet2d value = { *from, *from }; return value; } template <> EIGEN_STRONG_INLINE void pstore(double* to, const Packet2d& from) { EIGEN_MSA_DEBUG; EIGEN_DEBUG_ALIGNED_STORE __builtin_msa_st_d((v2i64)from, to, 0); } template <> EIGEN_STRONG_INLINE void pstoreu(double* to, const Packet2d& from) { EIGEN_MSA_DEBUG; EIGEN_DEBUG_UNALIGNED_STORE __builtin_msa_st_d((v2i64)from, to, 0); } template <> EIGEN_DEVICE_FUNC inline Packet2d pgather(const double* from, Index stride) { EIGEN_MSA_DEBUG; Packet2d value; value[0] = *from; from += stride; value[1] = *from; return value; } template <> EIGEN_DEVICE_FUNC inline void pscatter(double* to, const Packet2d& from, Index stride) { EIGEN_MSA_DEBUG; *to = from[0]; to += stride; *to = from[1]; } template <> EIGEN_STRONG_INLINE void prefetch(const double* addr) { EIGEN_MSA_DEBUG; __builtin_prefetch(addr); } template <> EIGEN_STRONG_INLINE double pfirst(const Packet2d& a) { EIGEN_MSA_DEBUG; return a[0]; } template <> EIGEN_STRONG_INLINE Packet2d preverse(const Packet2d& a) { EIGEN_MSA_DEBUG; return (Packet2d)__builtin_msa_shf_w((v4i32)a, EIGEN_MSA_SHF_I8(2, 3, 0, 1)); } template <> EIGEN_STRONG_INLINE Packet2d pabs(const Packet2d& a) { EIGEN_MSA_DEBUG; return (Packet2d)__builtin_msa_bclri_d((v2u64)a, 63); } template <> EIGEN_STRONG_INLINE double predux(const Packet2d& a) { EIGEN_MSA_DEBUG; Packet2d s = padd(a, preverse(a)); return s[0]; } // Other reduction functions: // mul template <> EIGEN_STRONG_INLINE double predux_mul(const Packet2d& a) { EIGEN_MSA_DEBUG; Packet2d p = pmul(a, preverse(a)); return p[0]; } // min template <> EIGEN_STRONG_INLINE double predux_min(const Packet2d& a) { EIGEN_MSA_DEBUG; #if EIGEN_FAST_MATH Packet2d swapped = (Packet2d)__builtin_msa_shf_w((Packet4i)a, EIGEN_MSA_SHF_I8(2, 3, 0, 1)); Packet2d v = __builtin_msa_fmin_d(a, swapped); return v[0]; #else double a0 = a[0], a1 = a[1]; return ((numext::isnan)(a0) || a0 < a1) ? a0 : a1; #endif } // max template <> EIGEN_STRONG_INLINE double predux_max(const Packet2d& a) { EIGEN_MSA_DEBUG; #if EIGEN_FAST_MATH Packet2d swapped = (Packet2d)__builtin_msa_shf_w((Packet4i)a, EIGEN_MSA_SHF_I8(2, 3, 0, 1)); Packet2d v = __builtin_msa_fmax_d(a, swapped); return v[0]; #else double a0 = a[0], a1 = a[1]; return ((numext::isnan)(a0) || a0 > a1) ? a0 : a1; #endif } template <> EIGEN_STRONG_INLINE Packet2d psqrt(const Packet2d& a) { EIGEN_MSA_DEBUG; return __builtin_msa_fsqrt_d(a); } template <> EIGEN_STRONG_INLINE Packet2d prsqrt(const Packet2d& a) { EIGEN_MSA_DEBUG; #if EIGEN_FAST_MATH return __builtin_msa_frsqrt_d(a); #else Packet2d ones = __builtin_msa_ffint_s_d(__builtin_msa_ldi_d(1)); return pdiv(ones, psqrt(a)); #endif } inline std::ostream& operator<<(std::ostream& os, const PacketBlock& value) { os << "[ " << value.packet[0] << "," << std::endl << " " << value.packet[1] << " ]"; return os; } EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { EIGEN_MSA_DEBUG; Packet2d trn1 = (Packet2d)__builtin_msa_ilvev_d((v2i64)kernel.packet[1], (v2i64)kernel.packet[0]); Packet2d trn2 = (Packet2d)__builtin_msa_ilvod_d((v2i64)kernel.packet[1], (v2i64)kernel.packet[0]); kernel.packet[0] = trn1; kernel.packet[1] = trn2; } template <> EIGEN_STRONG_INLINE Packet2d pfloor(const Packet2d& a) { Packet2d v = a; int32_t old_mode, new_mode; asm volatile( "cfcmsa %[old_mode], $1\n" "ori %[new_mode], %[old_mode], 3\n" // 3 = round towards -INFINITY. "ctcmsa $1, %[new_mode]\n" "frint.d %w[v], %w[v]\n" "ctcmsa $1, %[old_mode]\n" : // outputs [old_mode] "=r"(old_mode), [new_mode] "=r"(new_mode), [v] "+f"(v) : // inputs : // clobbers ); return v; } template <> EIGEN_STRONG_INLINE Packet2d pceil(const Packet2d& a) { Packet2d v = a; int32_t old_mode, new_mode; asm volatile( "cfcmsa %[old_mode], $1\n" "ori %[new_mode], %[old_mode], 3\n" "xori %[new_mode], %[new_mode], 1\n" // 2 = round towards +INFINITY. "ctcmsa $1, %[new_mode]\n" "frint.d %w[v], %w[v]\n" "ctcmsa $1, %[old_mode]\n" : // outputs [old_mode] "=r"(old_mode), [new_mode] "=r"(new_mode), [v] "+f"(v) : // inputs : // clobbers ); return v; } template <> EIGEN_STRONG_INLINE Packet2d pround(const Packet2d& a) { Packet2d v = a; int32_t old_mode, new_mode; asm volatile( "cfcmsa %[old_mode], $1\n" "ori %[new_mode], %[old_mode], 3\n" "xori %[new_mode], %[new_mode], 3\n" // 0 = round to nearest, ties to even. "ctcmsa $1, %[new_mode]\n" "frint.d %w[v], %w[v]\n" "ctcmsa $1, %[old_mode]\n" : // outputs [old_mode] "=r"(old_mode), [new_mode] "=r"(new_mode), [v] "+f"(v) : // inputs : // clobbers ); return v; } template <> EIGEN_STRONG_INLINE Packet2d pblend(const Selector<2>& ifPacket, const Packet2d& thenPacket, const Packet2d& elsePacket) { Packet2ul select = { ifPacket.select[0], ifPacket.select[1] }; Packet2l mask = __builtin_msa_ceqi_d((Packet2l)select, 0); return (Packet2d)__builtin_msa_bsel_v((v16u8)mask, (v16u8)thenPacket, (v16u8)elsePacket); } } // end namespace internal } // end namespace Eigen #endif // EIGEN_PACKET_MATH_MSA_H RcppEigen/inst/include/Eigen/src/Core/arch/MSA/Complex.h0000644000176200001440000004220514562417221022360 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2018 Wave Computing, Inc. // Written by: // Chris Larsen // Alexey Frunze (afrunze@wavecomp.com) // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_COMPLEX_MSA_H #define EIGEN_COMPLEX_MSA_H #include namespace Eigen { namespace internal { //---------- float ---------- struct Packet2cf { EIGEN_STRONG_INLINE Packet2cf() { } EIGEN_STRONG_INLINE explicit Packet2cf(const std::complex& a, const std::complex& b) { Packet4f t = { std::real(a), std::imag(a), std::real(b), std::imag(b) }; v = t; } EIGEN_STRONG_INLINE explicit Packet2cf(const Packet4f& a) : v(a) { } EIGEN_STRONG_INLINE Packet2cf(const Packet2cf& a) : v(a.v) { } EIGEN_STRONG_INLINE Packet2cf& operator=(const Packet2cf& b) { v = b.v; return *this; } EIGEN_STRONG_INLINE Packet2cf conjugate(void) const { return Packet2cf((Packet4f)__builtin_msa_bnegi_d((v2u64)v, 63)); } EIGEN_STRONG_INLINE Packet2cf& operator*=(const Packet2cf& b) { Packet4f v1, v2; // Get the real values of a | a1_re | a1_re | a2_re | a2_re | v1 = (Packet4f)__builtin_msa_ilvev_w((v4i32)v, (v4i32)v); // Get the imag values of a | a1_im | a1_im | a2_im | a2_im | v2 = (Packet4f)__builtin_msa_ilvod_w((v4i32)v, (v4i32)v); // Multiply the real a with b v1 = pmul(v1, b.v); // Multiply the imag a with b v2 = pmul(v2, b.v); // Conjugate v2 v2 = Packet2cf(v2).conjugate().v; // Swap real/imag elements in v2. v2 = (Packet4f)__builtin_msa_shf_w((v4i32)v2, EIGEN_MSA_SHF_I8(1, 0, 3, 2)); // Add and return the result v = padd(v1, v2); return *this; } EIGEN_STRONG_INLINE Packet2cf operator*(const Packet2cf& b) const { return Packet2cf(*this) *= b; } EIGEN_STRONG_INLINE Packet2cf& operator+=(const Packet2cf& b) { v = padd(v, b.v); return *this; } EIGEN_STRONG_INLINE Packet2cf operator+(const Packet2cf& b) const { return Packet2cf(*this) += b; } EIGEN_STRONG_INLINE Packet2cf& operator-=(const Packet2cf& b) { v = psub(v, b.v); return *this; } EIGEN_STRONG_INLINE Packet2cf operator-(const Packet2cf& b) const { return Packet2cf(*this) -= b; } EIGEN_STRONG_INLINE Packet2cf& operator/=(const Packet2cf& b) { *this *= b.conjugate(); Packet4f s = pmul(b.v, b.v); s = padd(s, (Packet4f)__builtin_msa_shf_w((v4i32)s, EIGEN_MSA_SHF_I8(1, 0, 3, 2))); v = pdiv(v, s); return *this; } EIGEN_STRONG_INLINE Packet2cf operator/(const Packet2cf& b) const { return Packet2cf(*this) /= b; } EIGEN_STRONG_INLINE Packet2cf operator-(void) const { return Packet2cf(pnegate(v)); } Packet4f v; }; inline std::ostream& operator<<(std::ostream& os, const Packet2cf& value) { os << "[ (" << value.v[0] << ", " << value.v[1] << "i)," " (" << value.v[2] << ", " << value.v[3] << "i) ]"; return os; } template <> struct packet_traits > : default_packet_traits { typedef Packet2cf type; typedef Packet2cf half; enum { Vectorizable = 1, AlignedOnScalar = 1, size = 2, HasHalfPacket = 0, HasAdd = 1, HasSub = 1, HasMul = 1, HasDiv = 1, HasNegate = 1, HasAbs = 0, HasAbs2 = 0, HasMin = 0, HasMax = 0, HasSetLinear = 0, HasBlend = 1 }; }; template <> struct unpacket_traits { typedef std::complex type; enum { size = 2, alignment = Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false }; typedef Packet2cf half; }; template <> EIGEN_STRONG_INLINE Packet2cf pset1(const std::complex& from) { EIGEN_MSA_DEBUG; float f0 = from.real(), f1 = from.imag(); Packet4f v0 = { f0, f0, f0, f0 }; Packet4f v1 = { f1, f1, f1, f1 }; return Packet2cf((Packet4f)__builtin_msa_ilvr_w((Packet4i)v1, (Packet4i)v0)); } template <> EIGEN_STRONG_INLINE Packet2cf padd(const Packet2cf& a, const Packet2cf& b) { EIGEN_MSA_DEBUG; return a + b; } template <> EIGEN_STRONG_INLINE Packet2cf psub(const Packet2cf& a, const Packet2cf& b) { EIGEN_MSA_DEBUG; return a - b; } template <> EIGEN_STRONG_INLINE Packet2cf pnegate(const Packet2cf& a) { EIGEN_MSA_DEBUG; return -a; } template <> EIGEN_STRONG_INLINE Packet2cf pconj(const Packet2cf& a) { EIGEN_MSA_DEBUG; return a.conjugate(); } template <> EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b) { EIGEN_MSA_DEBUG; return a * b; } template <> EIGEN_STRONG_INLINE Packet2cf pand(const Packet2cf& a, const Packet2cf& b) { EIGEN_MSA_DEBUG; return Packet2cf(pand(a.v, b.v)); } template <> EIGEN_STRONG_INLINE Packet2cf por(const Packet2cf& a, const Packet2cf& b) { EIGEN_MSA_DEBUG; return Packet2cf(por(a.v, b.v)); } template <> EIGEN_STRONG_INLINE Packet2cf pxor(const Packet2cf& a, const Packet2cf& b) { EIGEN_MSA_DEBUG; return Packet2cf(pxor(a.v, b.v)); } template <> EIGEN_STRONG_INLINE Packet2cf pandnot(const Packet2cf& a, const Packet2cf& b) { EIGEN_MSA_DEBUG; return Packet2cf(pandnot(a.v, b.v)); } template <> EIGEN_STRONG_INLINE Packet2cf pload(const std::complex* from) { EIGEN_MSA_DEBUG; EIGEN_DEBUG_ALIGNED_LOAD return Packet2cf(pload((const float*)from)); } template <> EIGEN_STRONG_INLINE Packet2cf ploadu(const std::complex* from) { EIGEN_MSA_DEBUG; EIGEN_DEBUG_UNALIGNED_LOAD return Packet2cf(ploadu((const float*)from)); } template <> EIGEN_STRONG_INLINE Packet2cf ploaddup(const std::complex* from) { EIGEN_MSA_DEBUG; return pset1(*from); } template <> EIGEN_STRONG_INLINE void pstore >(std::complex* to, const Packet2cf& from) { EIGEN_MSA_DEBUG; EIGEN_DEBUG_ALIGNED_STORE pstore((float*)to, from.v); } template <> EIGEN_STRONG_INLINE void pstoreu >(std::complex* to, const Packet2cf& from) { EIGEN_MSA_DEBUG; EIGEN_DEBUG_UNALIGNED_STORE pstoreu((float*)to, from.v); } template <> EIGEN_DEVICE_FUNC inline Packet2cf pgather, Packet2cf>( const std::complex* from, Index stride) { EIGEN_MSA_DEBUG; return Packet2cf(from[0 * stride], from[1 * stride]); } template <> EIGEN_DEVICE_FUNC inline void pscatter, Packet2cf>(std::complex* to, const Packet2cf& from, Index stride) { EIGEN_MSA_DEBUG; *to = std::complex(from.v[0], from.v[1]); to += stride; *to = std::complex(from.v[2], from.v[3]); } template <> EIGEN_STRONG_INLINE void prefetch >(const std::complex* addr) { EIGEN_MSA_DEBUG; prefetch(reinterpret_cast(addr)); } template <> EIGEN_STRONG_INLINE std::complex pfirst(const Packet2cf& a) { EIGEN_MSA_DEBUG; return std::complex(a.v[0], a.v[1]); } template <> EIGEN_STRONG_INLINE Packet2cf preverse(const Packet2cf& a) { EIGEN_MSA_DEBUG; return Packet2cf((Packet4f)__builtin_msa_shf_w((v4i32)a.v, EIGEN_MSA_SHF_I8(2, 3, 0, 1))); } template <> EIGEN_STRONG_INLINE Packet2cf pcplxflip(const Packet2cf& a) { EIGEN_MSA_DEBUG; return Packet2cf((Packet4f)__builtin_msa_shf_w((v4i32)a.v, EIGEN_MSA_SHF_I8(1, 0, 3, 2))); } template <> EIGEN_STRONG_INLINE std::complex predux(const Packet2cf& a) { EIGEN_MSA_DEBUG; Packet4f value = (Packet4f)preverse((Packet2d)a.v); value += a.v; return std::complex(value[0], value[1]); } template <> EIGEN_STRONG_INLINE std::complex predux_mul(const Packet2cf& a) { EIGEN_MSA_DEBUG; return std::complex((a.v[0] * a.v[2]) - (a.v[1] * a.v[3]), (a.v[0] * a.v[3]) + (a.v[1] * a.v[2])); } EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet2cf, Packet4f) template <> EIGEN_STRONG_INLINE Packet2cf pdiv(const Packet2cf& a, const Packet2cf& b) { EIGEN_MSA_DEBUG; return a / b; } inline std::ostream& operator<<(std::ostream& os, const PacketBlock& value) { os << "[ " << value.packet[0] << ", " << std::endl << " " << value.packet[1] << " ]"; return os; } EIGEN_DEVICE_FUNC inline void ptranspose(PacketBlock& kernel) { EIGEN_MSA_DEBUG; Packet4f tmp = (Packet4f)__builtin_msa_ilvl_d((v2i64)kernel.packet[1].v, (v2i64)kernel.packet[0].v); kernel.packet[0].v = (Packet4f)__builtin_msa_ilvr_d((v2i64)kernel.packet[1].v, (v2i64)kernel.packet[0].v); kernel.packet[1].v = tmp; } template <> EIGEN_STRONG_INLINE Packet2cf pblend(const Selector<2>& ifPacket, const Packet2cf& thenPacket, const Packet2cf& elsePacket) { return (Packet2cf)(Packet4f)pblend(ifPacket, (Packet2d)thenPacket.v, (Packet2d)elsePacket.v); } //---------- double ---------- struct Packet1cd { EIGEN_STRONG_INLINE Packet1cd() { } EIGEN_STRONG_INLINE explicit Packet1cd(const std::complex& a) { v[0] = std::real(a); v[1] = std::imag(a); } EIGEN_STRONG_INLINE explicit Packet1cd(const Packet2d& a) : v(a) { } EIGEN_STRONG_INLINE Packet1cd(const Packet1cd& a) : v(a.v) { } EIGEN_STRONG_INLINE Packet1cd& operator=(const Packet1cd& b) { v = b.v; return *this; } EIGEN_STRONG_INLINE Packet1cd conjugate(void) const { static const v2u64 p2ul_CONJ_XOR = { 0x0, 0x8000000000000000 }; return (Packet1cd)pxor(v, (Packet2d)p2ul_CONJ_XOR); } EIGEN_STRONG_INLINE Packet1cd& operator*=(const Packet1cd& b) { Packet2d v1, v2; // Get the real values of a | a1_re | a1_re v1 = (Packet2d)__builtin_msa_ilvev_d((v2i64)v, (v2i64)v); // Get the imag values of a | a1_im | a1_im v2 = (Packet2d)__builtin_msa_ilvod_d((v2i64)v, (v2i64)v); // Multiply the real a with b v1 = pmul(v1, b.v); // Multiply the imag a with b v2 = pmul(v2, b.v); // Conjugate v2 v2 = Packet1cd(v2).conjugate().v; // Swap real/imag elements in v2. v2 = (Packet2d)__builtin_msa_shf_w((v4i32)v2, EIGEN_MSA_SHF_I8(2, 3, 0, 1)); // Add and return the result v = padd(v1, v2); return *this; } EIGEN_STRONG_INLINE Packet1cd operator*(const Packet1cd& b) const { return Packet1cd(*this) *= b; } EIGEN_STRONG_INLINE Packet1cd& operator+=(const Packet1cd& b) { v = padd(v, b.v); return *this; } EIGEN_STRONG_INLINE Packet1cd operator+(const Packet1cd& b) const { return Packet1cd(*this) += b; } EIGEN_STRONG_INLINE Packet1cd& operator-=(const Packet1cd& b) { v = psub(v, b.v); return *this; } EIGEN_STRONG_INLINE Packet1cd operator-(const Packet1cd& b) const { return Packet1cd(*this) -= b; } EIGEN_STRONG_INLINE Packet1cd& operator/=(const Packet1cd& b) { *this *= b.conjugate(); Packet2d s = pmul(b.v, b.v); s = padd(s, preverse(s)); v = pdiv(v, s); return *this; } EIGEN_STRONG_INLINE Packet1cd operator/(const Packet1cd& b) const { return Packet1cd(*this) /= b; } EIGEN_STRONG_INLINE Packet1cd operator-(void) const { return Packet1cd(pnegate(v)); } Packet2d v; }; inline std::ostream& operator<<(std::ostream& os, const Packet1cd& value) { os << "[ (" << value.v[0] << ", " << value.v[1] << "i) ]"; return os; } template <> struct packet_traits > : default_packet_traits { typedef Packet1cd type; typedef Packet1cd half; enum { Vectorizable = 1, AlignedOnScalar = 0, size = 1, HasHalfPacket = 0, HasAdd = 1, HasSub = 1, HasMul = 1, HasDiv = 1, HasNegate = 1, HasAbs = 0, HasAbs2 = 0, HasMin = 0, HasMax = 0, HasSetLinear = 0 }; }; template <> struct unpacket_traits { typedef std::complex type; enum { size = 1, alignment = Aligned16, vectorizable=true, masked_load_available=false, masked_store_available=false }; typedef Packet1cd half; }; template <> EIGEN_STRONG_INLINE Packet1cd pload(const std::complex* from) { EIGEN_MSA_DEBUG; EIGEN_DEBUG_ALIGNED_LOAD return Packet1cd(pload((const double*)from)); } template <> EIGEN_STRONG_INLINE Packet1cd ploadu(const std::complex* from) { EIGEN_MSA_DEBUG; EIGEN_DEBUG_UNALIGNED_LOAD return Packet1cd(ploadu((const double*)from)); } template <> EIGEN_STRONG_INLINE Packet1cd pset1(const std::complex& from) { EIGEN_MSA_DEBUG; return Packet1cd(from); } template <> EIGEN_STRONG_INLINE Packet1cd padd(const Packet1cd& a, const Packet1cd& b) { EIGEN_MSA_DEBUG; return a + b; } template <> EIGEN_STRONG_INLINE Packet1cd psub(const Packet1cd& a, const Packet1cd& b) { EIGEN_MSA_DEBUG; return a - b; } template <> EIGEN_STRONG_INLINE Packet1cd pnegate(const Packet1cd& a) { EIGEN_MSA_DEBUG; return -a; } template <> EIGEN_STRONG_INLINE Packet1cd pconj(const Packet1cd& a) { EIGEN_MSA_DEBUG; return a.conjugate(); } template <> EIGEN_STRONG_INLINE Packet1cd pmul(const Packet1cd& a, const Packet1cd& b) { EIGEN_MSA_DEBUG; return a * b; } template <> EIGEN_STRONG_INLINE Packet1cd pand(const Packet1cd& a, const Packet1cd& b) { EIGEN_MSA_DEBUG; return Packet1cd(pand(a.v, b.v)); } template <> EIGEN_STRONG_INLINE Packet1cd por(const Packet1cd& a, const Packet1cd& b) { EIGEN_MSA_DEBUG; return Packet1cd(por(a.v, b.v)); } template <> EIGEN_STRONG_INLINE Packet1cd pxor(const Packet1cd& a, const Packet1cd& b) { EIGEN_MSA_DEBUG; return Packet1cd(pxor(a.v, b.v)); } template <> EIGEN_STRONG_INLINE Packet1cd pandnot(const Packet1cd& a, const Packet1cd& b) { EIGEN_MSA_DEBUG; return Packet1cd(pandnot(a.v, b.v)); } template <> EIGEN_STRONG_INLINE Packet1cd ploaddup(const std::complex* from) { EIGEN_MSA_DEBUG; return pset1(*from); } template <> EIGEN_STRONG_INLINE void pstore >(std::complex* to, const Packet1cd& from) { EIGEN_MSA_DEBUG; EIGEN_DEBUG_ALIGNED_STORE pstore((double*)to, from.v); } template <> EIGEN_STRONG_INLINE void pstoreu >(std::complex* to, const Packet1cd& from) { EIGEN_MSA_DEBUG; EIGEN_DEBUG_UNALIGNED_STORE pstoreu((double*)to, from.v); } template <> EIGEN_STRONG_INLINE void prefetch >(const std::complex* addr) { EIGEN_MSA_DEBUG; prefetch(reinterpret_cast(addr)); } template <> EIGEN_DEVICE_FUNC inline Packet1cd pgather, Packet1cd>( const std::complex* from, Index stride __attribute__((unused))) { EIGEN_MSA_DEBUG; Packet1cd res; res.v[0] = std::real(from[0]); res.v[1] = std::imag(from[0]); return res; } template <> EIGEN_DEVICE_FUNC inline void pscatter, Packet1cd>(std::complex* to, const Packet1cd& from, Index stride __attribute__((unused))) { EIGEN_MSA_DEBUG; pstore(to, from); } template <> EIGEN_STRONG_INLINE std::complex pfirst(const Packet1cd& a) { EIGEN_MSA_DEBUG; return std::complex(a.v[0], a.v[1]); } template <> EIGEN_STRONG_INLINE Packet1cd preverse(const Packet1cd& a) { EIGEN_MSA_DEBUG; return a; } template <> EIGEN_STRONG_INLINE std::complex predux(const Packet1cd& a) { EIGEN_MSA_DEBUG; return pfirst(a); } template <> EIGEN_STRONG_INLINE std::complex predux_mul(const Packet1cd& a) { EIGEN_MSA_DEBUG; return pfirst(a); } EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(Packet1cd, Packet2d) template <> EIGEN_STRONG_INLINE Packet1cd pdiv(const Packet1cd& a, const Packet1cd& b) { EIGEN_MSA_DEBUG; return a / b; } EIGEN_STRONG_INLINE Packet1cd pcplxflip /**/ (const Packet1cd& x) { EIGEN_MSA_DEBUG; return Packet1cd(preverse(Packet2d(x.v))); } inline std::ostream& operator<<(std::ostream& os, const PacketBlock& value) { os << "[ " << value.packet[0] << ", " << std::endl << " " << value.packet[1] << " ]"; return os; } EIGEN_STRONG_INLINE void ptranspose(PacketBlock& kernel) { EIGEN_MSA_DEBUG; Packet2d v1, v2; v1 = (Packet2d)__builtin_msa_ilvev_d((v2i64)kernel.packet[0].v, (v2i64)kernel.packet[1].v); // Get the imag values of a v2 = (Packet2d)__builtin_msa_ilvod_d((v2i64)kernel.packet[0].v, (v2i64)kernel.packet[1].v); kernel.packet[0].v = v1; kernel.packet[1].v = v2; } } // end namespace internal } // end namespace Eigen #endif // EIGEN_COMPLEX_MSA_H RcppEigen/inst/include/Eigen/src/Core/arch/HIP/0000755000176200001440000000000014562417221020575 5ustar liggesusersRcppEigen/inst/include/Eigen/src/Core/arch/HIP/hcc/0000755000176200001440000000000014562417221021332 5ustar liggesusersRcppEigen/inst/include/Eigen/src/Core/arch/HIP/hcc/math_constants.h0000644000176200001440000000126314562417221024532 0ustar liggesusers/* * math_constants.h - * HIP equivalent of the CUDA header of the same name */ #ifndef __MATH_CONSTANTS_H__ #define __MATH_CONSTANTS_H__ /* single precision constants */ #define HIPRT_INF_F __int_as_float(0x7f800000) #define HIPRT_NAN_F __int_as_float(0x7fffffff) #define HIPRT_MIN_DENORM_F __int_as_float(0x00000001) #define HIPRT_MAX_NORMAL_F __int_as_float(0x7f7fffff) #define HIPRT_NEG_ZERO_F __int_as_float(0x80000000) #define HIPRT_ZERO_F 0.0f #define HIPRT_ONE_F 1.0f /* double precision constants */ #define HIPRT_INF __hiloint2double(0x7ff00000, 0x00000000) #define HIPRT_NAN __hiloint2double(0xfff80000, 0x00000000) #endif RcppEigen/inst/include/Eigen/src/Core/arch/Default/0000755000176200001440000000000014562417221021541 5ustar liggesusersRcppEigen/inst/include/Eigen/src/Core/arch/Default/BFloat16.h0000644000176200001440000006442714562417221023245 0ustar liggesusers/* Copyright 2017 The TensorFlow Authors. All Rights Reserved. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. ==============================================================================*/ #ifndef EIGEN_BFLOAT16_H #define EIGEN_BFLOAT16_H #define BF16_PACKET_FUNCTION(PACKET_F, PACKET_BF16, METHOD) \ template <> \ EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED \ PACKET_BF16 METHOD(const PACKET_BF16& _x) { \ return F32ToBf16(METHOD(Bf16ToF32(_x))); \ } namespace Eigen { struct bfloat16; namespace bfloat16_impl { // Make our own __bfloat16_raw definition. struct __bfloat16_raw { EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR __bfloat16_raw() : value(0) {} explicit EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR __bfloat16_raw(unsigned short raw) : value(raw) {} unsigned short value; }; EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR __bfloat16_raw raw_uint16_to_bfloat16(unsigned short value); template EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __bfloat16_raw float_to_bfloat16_rtne(float ff); // Forward declarations of template specializations, to avoid Visual C++ 2019 errors, saying: // > error C2908: explicit specialization; 'float_to_bfloat16_rtne' has already been instantiated template <> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __bfloat16_raw float_to_bfloat16_rtne(float ff); template <> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __bfloat16_raw float_to_bfloat16_rtne(float ff); EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC float bfloat16_to_float(__bfloat16_raw h); struct bfloat16_base : public __bfloat16_raw { EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR bfloat16_base() {} EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR bfloat16_base(const __bfloat16_raw& h) : __bfloat16_raw(h) {} }; } // namespace bfloat16_impl // Class definition. struct bfloat16 : public bfloat16_impl::bfloat16_base { typedef bfloat16_impl::__bfloat16_raw __bfloat16_raw; EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR bfloat16() {} EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR bfloat16(const __bfloat16_raw& h) : bfloat16_impl::bfloat16_base(h) {} explicit EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR bfloat16(bool b) : bfloat16_impl::bfloat16_base(bfloat16_impl::raw_uint16_to_bfloat16(b ? 0x3f80 : 0)) {} template explicit EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR bfloat16(T val) : bfloat16_impl::bfloat16_base(bfloat16_impl::float_to_bfloat16_rtne::value>(static_cast(val))) {} explicit EIGEN_DEVICE_FUNC bfloat16(float f) : bfloat16_impl::bfloat16_base(bfloat16_impl::float_to_bfloat16_rtne(f)) {} // Following the convention of numpy, converting between complex and // float will lead to loss of imag value. template explicit EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR bfloat16(const std::complex& val) : bfloat16_impl::bfloat16_base(bfloat16_impl::float_to_bfloat16_rtne(static_cast(val.real()))) {} EIGEN_DEVICE_FUNC operator float() const { // NOLINT: Allow implicit conversion to float, because it is lossless. return bfloat16_impl::bfloat16_to_float(*this); } }; } // namespace Eigen namespace std { template<> struct numeric_limits { static const bool is_specialized = true; static const bool is_signed = true; static const bool is_integer = false; static const bool is_exact = false; static const bool has_infinity = true; static const bool has_quiet_NaN = true; static const bool has_signaling_NaN = true; static const float_denorm_style has_denorm = std::denorm_absent; static const bool has_denorm_loss = false; static const std::float_round_style round_style = numeric_limits::round_style; static const bool is_iec559 = false; static const bool is_bounded = true; static const bool is_modulo = false; static const int digits = 8; static const int digits10 = 2; static const int max_digits10 = 4; static const int radix = 2; static const int min_exponent = numeric_limits::min_exponent; static const int min_exponent10 = numeric_limits::min_exponent10; static const int max_exponent = numeric_limits::max_exponent; static const int max_exponent10 = numeric_limits::max_exponent10; static const bool traps = numeric_limits::traps; static const bool tinyness_before = numeric_limits::tinyness_before; static Eigen::bfloat16 (min)() { return Eigen::bfloat16_impl::raw_uint16_to_bfloat16(0x0080); } static Eigen::bfloat16 lowest() { return Eigen::bfloat16_impl::raw_uint16_to_bfloat16(0xff7f); } static Eigen::bfloat16 (max)() { return Eigen::bfloat16_impl::raw_uint16_to_bfloat16(0x7f7f); } static Eigen::bfloat16 epsilon() { return Eigen::bfloat16_impl::raw_uint16_to_bfloat16(0x3c00); } static Eigen::bfloat16 round_error() { return Eigen::bfloat16(0x3f00); } static Eigen::bfloat16 infinity() { return Eigen::bfloat16_impl::raw_uint16_to_bfloat16(0x7f80); } static Eigen::bfloat16 quiet_NaN() { return Eigen::bfloat16_impl::raw_uint16_to_bfloat16(0x7fc0); } static Eigen::bfloat16 signaling_NaN() { return Eigen::bfloat16_impl::raw_uint16_to_bfloat16(0x7f81); } static Eigen::bfloat16 denorm_min() { return Eigen::bfloat16_impl::raw_uint16_to_bfloat16(0x0001); } }; // If std::numeric_limits is specialized, should also specialize // std::numeric_limits, std::numeric_limits, and // std::numeric_limits // https://stackoverflow.com/a/16519653/ template<> struct numeric_limits : numeric_limits {}; template<> struct numeric_limits : numeric_limits {}; template<> struct numeric_limits : numeric_limits {}; } // namespace std namespace Eigen { namespace bfloat16_impl { // We need to distinguish ‘clang as the CUDA compiler’ from ‘clang as the host compiler, // invoked by NVCC’ (e.g. on MacOS). The former needs to see both host and device implementation // of the functions, while the latter can only deal with one of them. #if !defined(EIGEN_HAS_NATIVE_BF16) || (EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC) // Emulate support for bfloat16 floats #if EIGEN_COMP_CLANG && defined(EIGEN_CUDACC) // We need to provide emulated *host-side* BF16 operators for clang. #pragma push_macro("EIGEN_DEVICE_FUNC") #undef EIGEN_DEVICE_FUNC #if defined(EIGEN_HAS_CUDA_BF16) && defined(EIGEN_HAS_NATIVE_BF16) #define EIGEN_DEVICE_FUNC __host__ #else // both host and device need emulated ops. #define EIGEN_DEVICE_FUNC __host__ __device__ #endif #endif // Definitions for CPUs, mostly working through conversion // to/from fp32. EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator + (const bfloat16& a, const bfloat16& b) { return bfloat16(float(a) + float(b)); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator + (const bfloat16& a, const int& b) { return bfloat16(float(a) + static_cast(b)); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator + (const int& a, const bfloat16& b) { return bfloat16(static_cast(a) + float(b)); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator * (const bfloat16& a, const bfloat16& b) { return bfloat16(float(a) * float(b)); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator - (const bfloat16& a, const bfloat16& b) { return bfloat16(float(a) - float(b)); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator / (const bfloat16& a, const bfloat16& b) { return bfloat16(float(a) / float(b)); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator - (const bfloat16& a) { bfloat16 result; result.value = a.value ^ 0x8000; return result; } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16& operator += (bfloat16& a, const bfloat16& b) { a = bfloat16(float(a) + float(b)); return a; } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16& operator *= (bfloat16& a, const bfloat16& b) { a = bfloat16(float(a) * float(b)); return a; } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16& operator -= (bfloat16& a, const bfloat16& b) { a = bfloat16(float(a) - float(b)); return a; } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16& operator /= (bfloat16& a, const bfloat16& b) { a = bfloat16(float(a) / float(b)); return a; } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator++(bfloat16& a) { a += bfloat16(1); return a; } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator--(bfloat16& a) { a -= bfloat16(1); return a; } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator++(bfloat16& a, int) { bfloat16 original_value = a; ++a; return original_value; } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator--(bfloat16& a, int) { bfloat16 original_value = a; --a; return original_value; } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator == (const bfloat16& a, const bfloat16& b) { return numext::equal_strict(float(a),float(b)); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator != (const bfloat16& a, const bfloat16& b) { return numext::not_equal_strict(float(a), float(b)); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator < (const bfloat16& a, const bfloat16& b) { return float(a) < float(b); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator <= (const bfloat16& a, const bfloat16& b) { return float(a) <= float(b); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator > (const bfloat16& a, const bfloat16& b) { return float(a) > float(b); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator >= (const bfloat16& a, const bfloat16& b) { return float(a) >= float(b); } #if EIGEN_COMP_CLANG && defined(EIGEN_CUDACC) #pragma pop_macro("EIGEN_DEVICE_FUNC") #endif #endif // Emulate support for bfloat16 floats // Division by an index. Do it in full float precision to avoid accuracy // issues in converting the denominator to bfloat16. EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 operator / (const bfloat16& a, Index b) { return bfloat16(static_cast(a) / static_cast(b)); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __bfloat16_raw truncate_to_bfloat16(const float v) { __bfloat16_raw output; if (Eigen::numext::isnan EIGEN_NOT_A_MACRO(v)) { output.value = std::signbit(v) ? 0xFFC0: 0x7FC0; return output; } const uint16_t* p = reinterpret_cast(&v); #if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ output.value = p[0]; #else output.value = p[1]; #endif return output; } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR __bfloat16_raw raw_uint16_to_bfloat16(numext::uint16_t value) { return __bfloat16_raw(value); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR numext::uint16_t raw_bfloat16_as_uint16(const __bfloat16_raw& bf) { return bf.value; } // float_to_bfloat16_rtne template specialization that does not make any // assumption about the value of its function argument (ff). template <> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __bfloat16_raw float_to_bfloat16_rtne(float ff) { #if (defined(EIGEN_HAS_CUDA_BF16) && defined(EIGEN_HAS_HIP_BF16)) // Nothing to do here #else __bfloat16_raw output; if (Eigen::numext::isnan EIGEN_NOT_A_MACRO(ff)) { // If the value is a NaN, squash it to a qNaN with msb of fraction set, // this makes sure after truncation we don't end up with an inf. // // qNaN magic: All exponent bits set + most significant bit of fraction // set. output.value = std::signbit(ff) ? 0xFFC0: 0x7FC0; } else { // Fast rounding algorithm that rounds a half value to nearest even. This // reduces expected error when we convert a large number of floats. Here // is how it works: // // Definitions: // To convert a float 32 to bfloat16, a float 32 can be viewed as 32 bits // with the following tags: // // Sign | Exp (8 bits) | Frac (23 bits) // S EEEEEEEE FFFFFFLRTTTTTTTTTTTTTTT // // S: Sign bit. // E: Exponent bits. // F: First 6 bits of fraction. // L: Least significant bit of resulting bfloat16 if we truncate away the // rest of the float32. This is also the 7th bit of fraction // R: Rounding bit, 8th bit of fraction. // T: Sticky bits, rest of fraction, 15 bits. // // To round half to nearest even, there are 3 cases where we want to round // down (simply truncate the result of the bits away, which consists of // rounding bit and sticky bits) and two cases where we want to round up // (truncate then add one to the result). // // The fast converting algorithm simply adds lsb (L) to 0x7fff (15 bits of // 1s) as the rounding bias, adds the rounding bias to the input, then // truncates the last 16 bits away. // // To understand how it works, we can analyze this algorithm case by case: // // 1. L = 0, R = 0: // Expect: round down, this is less than half value. // // Algorithm: // - Rounding bias: 0x7fff + 0 = 0x7fff // - Adding rounding bias to input may create any carry, depending on // whether there is any value set to 1 in T bits. // - R may be set to 1 if there is a carry. // - L remains 0. // - Note that this case also handles Inf and -Inf, where all fraction // bits, including L, R and Ts are all 0. The output remains Inf after // this algorithm. // // 2. L = 1, R = 0: // Expect: round down, this is less than half value. // // Algorithm: // - Rounding bias: 0x7fff + 1 = 0x8000 // - Adding rounding bias to input doesn't change sticky bits but // adds 1 to rounding bit. // - L remains 1. // // 3. L = 0, R = 1, all of T are 0: // Expect: round down, this is exactly at half, the result is already // even (L=0). // // Algorithm: // - Rounding bias: 0x7fff + 0 = 0x7fff // - Adding rounding bias to input sets all sticky bits to 1, but // doesn't create a carry. // - R remains 1. // - L remains 0. // // 4. L = 1, R = 1: // Expect: round up, this is exactly at half, the result needs to be // round to the next even number. // // Algorithm: // - Rounding bias: 0x7fff + 1 = 0x8000 // - Adding rounding bias to input doesn't change sticky bits, but // creates a carry from rounding bit. // - The carry sets L to 0, creates another carry bit and propagate // forward to F bits. // - If all the F bits are 1, a carry then propagates to the exponent // bits, which then creates the minimum value with the next exponent // value. Note that we won't have the case where exponents are all 1, // since that's either a NaN (handled in the other if condition) or inf // (handled in case 1). // // 5. L = 0, R = 1, any of T is 1: // Expect: round up, this is greater than half. // // Algorithm: // - Rounding bias: 0x7fff + 0 = 0x7fff // - Adding rounding bias to input creates a carry from sticky bits, // sets rounding bit to 0, then create another carry. // - The second carry sets L to 1. // // Examples: // // Exact half value that is already even: // Input: // Sign | Exp (8 bit) | Frac (first 7 bit) | Frac (last 16 bit) // S E E E E E E E E F F F F F F L RTTTTTTTTTTTTTTT // 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1000000000000000 // // This falls into case 3. We truncate the rest of 16 bits and no // carry is created into F and L: // // Output: // Sign | Exp (8 bit) | Frac (first 7 bit) // S E E E E E E E E F F F F F F L // 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 // // Exact half value, round to next even number: // Input: // Sign | Exp (8 bit) | Frac (first 7 bit) | Frac (last 16 bit) // S E E E E E E E E F F F F F F L RTTTTTTTTTTTTTTT // 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1000000000000000 // // This falls into case 4. We create a carry from R and T, // which then propagates into L and F: // // Output: // Sign | Exp (8 bit) | Frac (first 7 bit) // S E E E E E E E E F F F F F F L // 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 // // // Max denormal value round to min normal value: // Input: // Sign | Exp (8 bit) | Frac (first 7 bit) | Frac (last 16 bit) // S E E E E E E E E F F F F F F L RTTTTTTTTTTTTTTT // 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1111111111111111 // // This falls into case 4. We create a carry from R and T, // propagate into L and F, which then propagates into exponent // bits: // // Output: // Sign | Exp (8 bit) | Frac (first 7 bit) // S E E E E E E E E F F F F F F L // 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 // // Max normal value round to Inf: // Input: // Sign | Exp (8 bit) | Frac (first 7 bit) | Frac (last 16 bit) // S E E E E E E E E F F F F F F L RTTTTTTTTTTTTTTT // 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1111111111111111 // // This falls into case 4. We create a carry from R and T, // propagate into L and F, which then propagates into exponent // bits: // // Sign | Exp (8 bit) | Frac (first 7 bit) // S E E E E E E E E F F F F F F L // 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 // At this point, ff must be either a normal float, or +/-infinity. output = float_to_bfloat16_rtne(ff); } return output; #endif } // float_to_bfloat16_rtne template specialization that assumes that its function // argument (ff) is either a normal floating point number, or +/-infinity, or // zero. Used to improve the runtime performance of conversion from an integer // type to bfloat16. template <> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __bfloat16_raw float_to_bfloat16_rtne(float ff) { #if (defined(EIGEN_HAS_CUDA_BF16) && defined(EIGEN_HAS_HIP_BF16)) // Nothing to do here #else numext::uint32_t input = numext::bit_cast(ff); __bfloat16_raw output; // Least significant bit of resulting bfloat. numext::uint32_t lsb = (input >> 16) & 1; numext::uint32_t rounding_bias = 0x7fff + lsb; input += rounding_bias; output.value = static_cast(input >> 16); return output; #endif } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC float bfloat16_to_float(__bfloat16_raw h) { float result = 0; unsigned short* q = reinterpret_cast(&result); #if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ q[0] = h.value; #else q[1] = h.value; #endif return result; } // --- standard functions --- EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool (isinf)(const bfloat16& a) { EIGEN_USING_STD(isinf); return (isinf)(float(a)); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool (isnan)(const bfloat16& a) { EIGEN_USING_STD(isnan); return (isnan)(float(a)); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool (isfinite)(const bfloat16& a) { return !(isinf EIGEN_NOT_A_MACRO (a)) && !(isnan EIGEN_NOT_A_MACRO (a)); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 abs(const bfloat16& a) { bfloat16 result; result.value = a.value & 0x7FFF; return result; } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 exp(const bfloat16& a) { return bfloat16(::expf(float(a))); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 expm1(const bfloat16& a) { return bfloat16(numext::expm1(float(a))); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 log(const bfloat16& a) { return bfloat16(::logf(float(a))); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 log1p(const bfloat16& a) { return bfloat16(numext::log1p(float(a))); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 log10(const bfloat16& a) { return bfloat16(::log10f(float(a))); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 log2(const bfloat16& a) { return bfloat16(static_cast(EIGEN_LOG2E) * ::logf(float(a))); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 sqrt(const bfloat16& a) { return bfloat16(::sqrtf(float(a))); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 pow(const bfloat16& a, const bfloat16& b) { return bfloat16(::powf(float(a), float(b))); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 sin(const bfloat16& a) { return bfloat16(::sinf(float(a))); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 cos(const bfloat16& a) { return bfloat16(::cosf(float(a))); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 tan(const bfloat16& a) { return bfloat16(::tanf(float(a))); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 asin(const bfloat16& a) { return bfloat16(::asinf(float(a))); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 acos(const bfloat16& a) { return bfloat16(::acosf(float(a))); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 atan(const bfloat16& a) { return bfloat16(::atanf(float(a))); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 sinh(const bfloat16& a) { return bfloat16(::sinhf(float(a))); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 cosh(const bfloat16& a) { return bfloat16(::coshf(float(a))); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 tanh(const bfloat16& a) { return bfloat16(::tanhf(float(a))); } #if EIGEN_HAS_CXX11_MATH EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 asinh(const bfloat16& a) { return bfloat16(::asinhf(float(a))); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 acosh(const bfloat16& a) { return bfloat16(::acoshf(float(a))); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 atanh(const bfloat16& a) { return bfloat16(::atanhf(float(a))); } #endif EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 floor(const bfloat16& a) { return bfloat16(::floorf(float(a))); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 ceil(const bfloat16& a) { return bfloat16(::ceilf(float(a))); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 rint(const bfloat16& a) { return bfloat16(::rintf(float(a))); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 round(const bfloat16& a) { return bfloat16(::roundf(float(a))); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 fmod(const bfloat16& a, const bfloat16& b) { return bfloat16(::fmodf(float(a), float(b))); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 (min)(const bfloat16& a, const bfloat16& b) { const float f1 = static_cast(a); const float f2 = static_cast(b); return f2 < f1 ? b : a; } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 (max)(const bfloat16& a, const bfloat16& b) { const float f1 = static_cast(a); const float f2 = static_cast(b); return f1 < f2 ? b : a; } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 fmin(const bfloat16& a, const bfloat16& b) { const float f1 = static_cast(a); const float f2 = static_cast(b); return bfloat16(::fminf(f1, f2)); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bfloat16 fmax(const bfloat16& a, const bfloat16& b) { const float f1 = static_cast(a); const float f2 = static_cast(b); return bfloat16(::fmaxf(f1, f2)); } #ifndef EIGEN_NO_IO EIGEN_ALWAYS_INLINE std::ostream& operator << (std::ostream& os, const bfloat16& v) { os << static_cast(v); return os; } #endif } // namespace bfloat16_impl namespace internal { template<> struct random_default_impl { static inline bfloat16 run(const bfloat16& x, const bfloat16& y) { return x + (y-x) * bfloat16(float(std::rand()) / float(RAND_MAX)); } static inline bfloat16 run() { return run(bfloat16(-1.f), bfloat16(1.f)); } }; template<> struct is_arithmetic { enum { value = true }; }; } // namespace internal template<> struct NumTraits : GenericNumTraits { enum { IsSigned = true, IsInteger = false, IsComplex = false, RequireInitialization = false }; EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::bfloat16 epsilon() { return bfloat16_impl::raw_uint16_to_bfloat16(0x3c00); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::bfloat16 dummy_precision() { return bfloat16_impl::raw_uint16_to_bfloat16(0x3D4D); // bfloat16(5e-2f); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::bfloat16 highest() { return bfloat16_impl::raw_uint16_to_bfloat16(0x7F7F); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::bfloat16 lowest() { return bfloat16_impl::raw_uint16_to_bfloat16(0xFF7F); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::bfloat16 infinity() { return bfloat16_impl::raw_uint16_to_bfloat16(0x7f80); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::bfloat16 quiet_NaN() { return bfloat16_impl::raw_uint16_to_bfloat16(0x7fc0); } }; } // namespace Eigen namespace Eigen { namespace numext { template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool (isnan)(const Eigen::bfloat16& h) { return (bfloat16_impl::isnan)(h); } template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool (isinf)(const Eigen::bfloat16& h) { return (bfloat16_impl::isinf)(h); } template<> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool (isfinite)(const Eigen::bfloat16& h) { return (bfloat16_impl::isfinite)(h); } template <> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::bfloat16 bit_cast(const uint16_t& src) { return Eigen::bfloat16(Eigen::bfloat16_impl::raw_uint16_to_bfloat16(src)); } template <> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC uint16_t bit_cast(const Eigen::bfloat16& src) { return Eigen::bfloat16_impl::raw_bfloat16_as_uint16(src); } } // namespace numext } // namespace Eigen #if EIGEN_HAS_STD_HASH namespace std { template <> struct hash { EIGEN_STRONG_INLINE std::size_t operator()(const Eigen::bfloat16& a) const { return static_cast(Eigen::numext::bit_cast(a)); } }; } // namespace std #endif #endif // EIGEN_BFLOAT16_H RcppEigen/inst/include/Eigen/src/Core/arch/Default/GenericPacketMathFunctions.h0000644000176200001440000020416014562417221027124 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2007 Julien Pommier // Copyright (C) 2014 Pedro Gonnet (pedro.gonnet@gmail.com) // Copyright (C) 2009-2019 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. /* The exp and log functions of this file initially come from * Julien Pommier's sse math library: http://gruntthepeon.free.fr/ssemath/ */ #ifndef EIGEN_ARCH_GENERIC_PACKET_MATH_FUNCTIONS_H #define EIGEN_ARCH_GENERIC_PACKET_MATH_FUNCTIONS_H namespace Eigen { namespace internal { // Creates a Scalar integer type with same bit-width. template struct make_integer; template<> struct make_integer { typedef numext::int32_t type; }; template<> struct make_integer { typedef numext::int64_t type; }; template<> struct make_integer { typedef numext::int16_t type; }; template<> struct make_integer { typedef numext::int16_t type; }; template EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Packet pfrexp_generic_get_biased_exponent(const Packet& a) { typedef typename unpacket_traits::type Scalar; typedef typename unpacket_traits::integer_packet PacketI; enum { mantissa_bits = numext::numeric_limits::digits - 1}; return pcast(plogical_shift_right(preinterpret(pabs(a)))); } // Safely applies frexp, correctly handles denormals. // Assumes IEEE floating point format. template EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Packet pfrexp_generic(const Packet& a, Packet& exponent) { typedef typename unpacket_traits::type Scalar; typedef typename make_unsigned::type>::type ScalarUI; enum { TotalBits = sizeof(Scalar) * CHAR_BIT, MantissaBits = numext::numeric_limits::digits - 1, ExponentBits = int(TotalBits) - int(MantissaBits) - 1 }; EIGEN_CONSTEXPR ScalarUI scalar_sign_mantissa_mask = ~(((ScalarUI(1) << int(ExponentBits)) - ScalarUI(1)) << int(MantissaBits)); // ~0x7f800000 const Packet sign_mantissa_mask = pset1frombits(static_cast(scalar_sign_mantissa_mask)); const Packet half = pset1(Scalar(0.5)); const Packet zero = pzero(a); const Packet normal_min = pset1((numext::numeric_limits::min)()); // Minimum normal value, 2^-126 // To handle denormals, normalize by multiplying by 2^(int(MantissaBits)+1). const Packet is_denormal = pcmp_lt(pabs(a), normal_min); EIGEN_CONSTEXPR ScalarUI scalar_normalization_offset = ScalarUI(int(MantissaBits) + 1); // 24 // The following cannot be constexpr because bfloat16(uint16_t) is not constexpr. const Scalar scalar_normalization_factor = Scalar(ScalarUI(1) << int(scalar_normalization_offset)); // 2^24 const Packet normalization_factor = pset1(scalar_normalization_factor); const Packet normalized_a = pselect(is_denormal, pmul(a, normalization_factor), a); // Determine exponent offset: -126 if normal, -126-24 if denormal const Scalar scalar_exponent_offset = -Scalar((ScalarUI(1)<<(int(ExponentBits)-1)) - ScalarUI(2)); // -126 Packet exponent_offset = pset1(scalar_exponent_offset); const Packet normalization_offset = pset1(-Scalar(scalar_normalization_offset)); // -24 exponent_offset = pselect(is_denormal, padd(exponent_offset, normalization_offset), exponent_offset); // Determine exponent and mantissa from normalized_a. exponent = pfrexp_generic_get_biased_exponent(normalized_a); // Zero, Inf and NaN return 'a' unmodified, exponent is zero // (technically the exponent is unspecified for inf/NaN, but GCC/Clang set it to zero) const Scalar scalar_non_finite_exponent = Scalar((ScalarUI(1) << int(ExponentBits)) - ScalarUI(1)); // 255 const Packet non_finite_exponent = pset1(scalar_non_finite_exponent); const Packet is_zero_or_not_finite = por(pcmp_eq(a, zero), pcmp_eq(exponent, non_finite_exponent)); const Packet m = pselect(is_zero_or_not_finite, a, por(pand(normalized_a, sign_mantissa_mask), half)); exponent = pselect(is_zero_or_not_finite, zero, padd(exponent, exponent_offset)); return m; } // Safely applies ldexp, correctly handles overflows, underflows and denormals. // Assumes IEEE floating point format. template EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Packet pldexp_generic(const Packet& a, const Packet& exponent) { // We want to return a * 2^exponent, allowing for all possible integer // exponents without overflowing or underflowing in intermediate // computations. // // Since 'a' and the output can be denormal, the maximum range of 'exponent' // to consider for a float is: // -255-23 -> 255+23 // Below -278 any finite float 'a' will become zero, and above +278 any // finite float will become inf, including when 'a' is the smallest possible // denormal. // // Unfortunately, 2^(278) cannot be represented using either one or two // finite normal floats, so we must split the scale factor into at least // three parts. It turns out to be faster to split 'exponent' into four // factors, since [exponent>>2] is much faster to compute that [exponent/3]. // // Set e = min(max(exponent, -278), 278); // b = floor(e/4); // out = ((((a * 2^(b)) * 2^(b)) * 2^(b)) * 2^(e-3*b)) // // This will avoid any intermediate overflows and correctly handle 0, inf, // NaN cases. typedef typename unpacket_traits::integer_packet PacketI; typedef typename unpacket_traits::type Scalar; typedef typename unpacket_traits::type ScalarI; enum { TotalBits = sizeof(Scalar) * CHAR_BIT, MantissaBits = numext::numeric_limits::digits - 1, ExponentBits = int(TotalBits) - int(MantissaBits) - 1 }; const Packet max_exponent = pset1(Scalar((ScalarI(1)<((ScalarI(1)<<(int(ExponentBits)-1)) - ScalarI(1)); // 127 const PacketI e = pcast(pmin(pmax(exponent, pnegate(max_exponent)), max_exponent)); PacketI b = parithmetic_shift_right<2>(e); // floor(e/4); Packet c = preinterpret(plogical_shift_left(padd(b, bias))); // 2^b Packet out = pmul(pmul(pmul(a, c), c), c); // a * 2^(3b) b = psub(psub(psub(e, b), b), b); // e - 3b c = preinterpret(plogical_shift_left(padd(b, bias))); // 2^(e-3*b) out = pmul(out, c); return out; } // Explicitly multiplies // a * (2^e) // clamping e to the range // [NumTraits::min_exponent()-2, NumTraits::max_exponent()] // // This is approx 7x faster than pldexp_impl, but will prematurely over/underflow // if 2^e doesn't fit into a normal floating-point Scalar. // // Assumes IEEE floating point format template struct pldexp_fast_impl { typedef typename unpacket_traits::integer_packet PacketI; typedef typename unpacket_traits::type Scalar; typedef typename unpacket_traits::type ScalarI; enum { TotalBits = sizeof(Scalar) * CHAR_BIT, MantissaBits = numext::numeric_limits::digits - 1, ExponentBits = int(TotalBits) - int(MantissaBits) - 1 }; static EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Packet run(const Packet& a, const Packet& exponent) { const Packet bias = pset1(Scalar((ScalarI(1)<<(int(ExponentBits)-1)) - ScalarI(1))); // 127 const Packet limit = pset1(Scalar((ScalarI(1)<(pmin(pmax(padd(exponent, bias), pzero(limit)), limit)); // exponent + 127 // return a * (2^e) return pmul(a, preinterpret(plogical_shift_left(e))); } }; // Natural or base 2 logarithm. // Computes log(x) as log(2^e * m) = C*e + log(m), where the constant C =log(2) // and m is in the range [sqrt(1/2),sqrt(2)). In this range, the logarithm can // be easily approximated by a polynomial centered on m=1 for stability. // TODO(gonnet): Further reduce the interval allowing for lower-degree // polynomial interpolants -> ... -> profit! template EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet plog_impl_float(const Packet _x) { Packet x = _x; const Packet cst_1 = pset1(1.0f); const Packet cst_neg_half = pset1(-0.5f); // The smallest non denormalized float number. const Packet cst_min_norm_pos = pset1frombits( 0x00800000u); const Packet cst_minus_inf = pset1frombits( 0xff800000u); const Packet cst_pos_inf = pset1frombits( 0x7f800000u); // Polynomial coefficients. const Packet cst_cephes_SQRTHF = pset1(0.707106781186547524f); const Packet cst_cephes_log_p0 = pset1(7.0376836292E-2f); const Packet cst_cephes_log_p1 = pset1(-1.1514610310E-1f); const Packet cst_cephes_log_p2 = pset1(1.1676998740E-1f); const Packet cst_cephes_log_p3 = pset1(-1.2420140846E-1f); const Packet cst_cephes_log_p4 = pset1(+1.4249322787E-1f); const Packet cst_cephes_log_p5 = pset1(-1.6668057665E-1f); const Packet cst_cephes_log_p6 = pset1(+2.0000714765E-1f); const Packet cst_cephes_log_p7 = pset1(-2.4999993993E-1f); const Packet cst_cephes_log_p8 = pset1(+3.3333331174E-1f); // Truncate input values to the minimum positive normal. x = pmax(x, cst_min_norm_pos); Packet e; // extract significant in the range [0.5,1) and exponent x = pfrexp(x,e); // part2: Shift the inputs from the range [0.5,1) to [sqrt(1/2),sqrt(2)) // and shift by -1. The values are then centered around 0, which improves // the stability of the polynomial evaluation. // if( x < SQRTHF ) { // e -= 1; // x = x + x - 1.0; // } else { x = x - 1.0; } Packet mask = pcmp_lt(x, cst_cephes_SQRTHF); Packet tmp = pand(x, mask); x = psub(x, cst_1); e = psub(e, pand(cst_1, mask)); x = padd(x, tmp); Packet x2 = pmul(x, x); Packet x3 = pmul(x2, x); // Evaluate the polynomial approximant of degree 8 in three parts, probably // to improve instruction-level parallelism. Packet y, y1, y2; y = pmadd(cst_cephes_log_p0, x, cst_cephes_log_p1); y1 = pmadd(cst_cephes_log_p3, x, cst_cephes_log_p4); y2 = pmadd(cst_cephes_log_p6, x, cst_cephes_log_p7); y = pmadd(y, x, cst_cephes_log_p2); y1 = pmadd(y1, x, cst_cephes_log_p5); y2 = pmadd(y2, x, cst_cephes_log_p8); y = pmadd(y, x3, y1); y = pmadd(y, x3, y2); y = pmul(y, x3); y = pmadd(cst_neg_half, x2, y); x = padd(x, y); // Add the logarithm of the exponent back to the result of the interpolation. if (base2) { const Packet cst_log2e = pset1(static_cast(EIGEN_LOG2E)); x = pmadd(x, cst_log2e, e); } else { const Packet cst_ln2 = pset1(static_cast(EIGEN_LN2)); x = pmadd(e, cst_ln2, x); } Packet invalid_mask = pcmp_lt_or_nan(_x, pzero(_x)); Packet iszero_mask = pcmp_eq(_x,pzero(_x)); Packet pos_inf_mask = pcmp_eq(_x,cst_pos_inf); // Filter out invalid inputs, i.e.: // - negative arg will be NAN // - 0 will be -INF // - +INF will be +INF return pselect(iszero_mask, cst_minus_inf, por(pselect(pos_inf_mask,cst_pos_inf,x), invalid_mask)); } template EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet plog_float(const Packet _x) { return plog_impl_float(_x); } template EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet plog2_float(const Packet _x) { return plog_impl_float(_x); } /* Returns the base e (2.718...) or base 2 logarithm of x. * The argument is separated into its exponent and fractional parts. * The logarithm of the fraction in the interval [sqrt(1/2), sqrt(2)], * is approximated by * * log(1+x) = x - 0.5 x**2 + x**3 P(x)/Q(x). * * for more detail see: http://www.netlib.org/cephes/ */ template EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet plog_impl_double(const Packet _x) { Packet x = _x; const Packet cst_1 = pset1(1.0); const Packet cst_neg_half = pset1(-0.5); // The smallest non denormalized double. const Packet cst_min_norm_pos = pset1frombits( static_cast(0x0010000000000000ull)); const Packet cst_minus_inf = pset1frombits( static_cast(0xfff0000000000000ull)); const Packet cst_pos_inf = pset1frombits( static_cast(0x7ff0000000000000ull)); // Polynomial Coefficients for log(1+x) = x - x**2/2 + x**3 P(x)/Q(x) // 1/sqrt(2) <= x < sqrt(2) const Packet cst_cephes_SQRTHF = pset1(0.70710678118654752440E0); const Packet cst_cephes_log_p0 = pset1(1.01875663804580931796E-4); const Packet cst_cephes_log_p1 = pset1(4.97494994976747001425E-1); const Packet cst_cephes_log_p2 = pset1(4.70579119878881725854E0); const Packet cst_cephes_log_p3 = pset1(1.44989225341610930846E1); const Packet cst_cephes_log_p4 = pset1(1.79368678507819816313E1); const Packet cst_cephes_log_p5 = pset1(7.70838733755885391666E0); const Packet cst_cephes_log_q0 = pset1(1.0); const Packet cst_cephes_log_q1 = pset1(1.12873587189167450590E1); const Packet cst_cephes_log_q2 = pset1(4.52279145837532221105E1); const Packet cst_cephes_log_q3 = pset1(8.29875266912776603211E1); const Packet cst_cephes_log_q4 = pset1(7.11544750618563894466E1); const Packet cst_cephes_log_q5 = pset1(2.31251620126765340583E1); // Truncate input values to the minimum positive normal. x = pmax(x, cst_min_norm_pos); Packet e; // extract significant in the range [0.5,1) and exponent x = pfrexp(x,e); // Shift the inputs from the range [0.5,1) to [sqrt(1/2),sqrt(2)) // and shift by -1. The values are then centered around 0, which improves // the stability of the polynomial evaluation. // if( x < SQRTHF ) { // e -= 1; // x = x + x - 1.0; // } else { x = x - 1.0; } Packet mask = pcmp_lt(x, cst_cephes_SQRTHF); Packet tmp = pand(x, mask); x = psub(x, cst_1); e = psub(e, pand(cst_1, mask)); x = padd(x, tmp); Packet x2 = pmul(x, x); Packet x3 = pmul(x2, x); // Evaluate the polynomial approximant , probably to improve instruction-level parallelism. // y = x - 0.5*x^2 + x^3 * polevl( x, P, 5 ) / p1evl( x, Q, 5 ) ); Packet y, y1, y_; y = pmadd(cst_cephes_log_p0, x, cst_cephes_log_p1); y1 = pmadd(cst_cephes_log_p3, x, cst_cephes_log_p4); y = pmadd(y, x, cst_cephes_log_p2); y1 = pmadd(y1, x, cst_cephes_log_p5); y_ = pmadd(y, x3, y1); y = pmadd(cst_cephes_log_q0, x, cst_cephes_log_q1); y1 = pmadd(cst_cephes_log_q3, x, cst_cephes_log_q4); y = pmadd(y, x, cst_cephes_log_q2); y1 = pmadd(y1, x, cst_cephes_log_q5); y = pmadd(y, x3, y1); y_ = pmul(y_, x3); y = pdiv(y_, y); y = pmadd(cst_neg_half, x2, y); x = padd(x, y); // Add the logarithm of the exponent back to the result of the interpolation. if (base2) { const Packet cst_log2e = pset1(static_cast(EIGEN_LOG2E)); x = pmadd(x, cst_log2e, e); } else { const Packet cst_ln2 = pset1(static_cast(EIGEN_LN2)); x = pmadd(e, cst_ln2, x); } Packet invalid_mask = pcmp_lt_or_nan(_x, pzero(_x)); Packet iszero_mask = pcmp_eq(_x,pzero(_x)); Packet pos_inf_mask = pcmp_eq(_x,cst_pos_inf); // Filter out invalid inputs, i.e.: // - negative arg will be NAN // - 0 will be -INF // - +INF will be +INF return pselect(iszero_mask, cst_minus_inf, por(pselect(pos_inf_mask,cst_pos_inf,x), invalid_mask)); } template EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet plog_double(const Packet _x) { return plog_impl_double(_x); } template EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet plog2_double(const Packet _x) { return plog_impl_double(_x); } /** \internal \returns log(1 + x) computed using W. Kahan's formula. See: http://www.plunk.org/~hatch/rightway.php */ template Packet generic_plog1p(const Packet& x) { typedef typename unpacket_traits::type ScalarType; const Packet one = pset1(ScalarType(1)); Packet xp1 = padd(x, one); Packet small_mask = pcmp_eq(xp1, one); Packet log1 = plog(xp1); Packet inf_mask = pcmp_eq(xp1, log1); Packet log_large = pmul(x, pdiv(log1, psub(xp1, one))); return pselect(por(small_mask, inf_mask), x, log_large); } /** \internal \returns exp(x)-1 computed using W. Kahan's formula. See: http://www.plunk.org/~hatch/rightway.php */ template Packet generic_expm1(const Packet& x) { typedef typename unpacket_traits::type ScalarType; const Packet one = pset1(ScalarType(1)); const Packet neg_one = pset1(ScalarType(-1)); Packet u = pexp(x); Packet one_mask = pcmp_eq(u, one); Packet u_minus_one = psub(u, one); Packet neg_one_mask = pcmp_eq(u_minus_one, neg_one); Packet logu = plog(u); // The following comparison is to catch the case where // exp(x) = +inf. It is written in this way to avoid having // to form the constant +inf, which depends on the packet // type. Packet pos_inf_mask = pcmp_eq(logu, u); Packet expm1 = pmul(u_minus_one, pdiv(x, logu)); expm1 = pselect(pos_inf_mask, u, expm1); return pselect(one_mask, x, pselect(neg_one_mask, neg_one, expm1)); } // Exponential function. Works by writing "x = m*log(2) + r" where // "m = floor(x/log(2)+1/2)" and "r" is the remainder. The result is then // "exp(x) = 2^m*exp(r)" where exp(r) is in the range [-1,1). template EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet pexp_float(const Packet _x) { const Packet cst_1 = pset1(1.0f); const Packet cst_half = pset1(0.5f); const Packet cst_exp_hi = pset1( 88.723f); const Packet cst_exp_lo = pset1(-88.723f); const Packet cst_cephes_LOG2EF = pset1(1.44269504088896341f); const Packet cst_cephes_exp_p0 = pset1(1.9875691500E-4f); const Packet cst_cephes_exp_p1 = pset1(1.3981999507E-3f); const Packet cst_cephes_exp_p2 = pset1(8.3334519073E-3f); const Packet cst_cephes_exp_p3 = pset1(4.1665795894E-2f); const Packet cst_cephes_exp_p4 = pset1(1.6666665459E-1f); const Packet cst_cephes_exp_p5 = pset1(5.0000001201E-1f); // Clamp x. Packet x = pmax(pmin(_x, cst_exp_hi), cst_exp_lo); // Express exp(x) as exp(m*ln(2) + r), start by extracting // m = floor(x/ln(2) + 0.5). Packet m = pfloor(pmadd(x, cst_cephes_LOG2EF, cst_half)); // Get r = x - m*ln(2). If no FMA instructions are available, m*ln(2) is // subtracted out in two parts, m*C1+m*C2 = m*ln(2), to avoid accumulating // truncation errors. const Packet cst_cephes_exp_C1 = pset1(-0.693359375f); const Packet cst_cephes_exp_C2 = pset1(2.12194440e-4f); Packet r = pmadd(m, cst_cephes_exp_C1, x); r = pmadd(m, cst_cephes_exp_C2, r); Packet r2 = pmul(r, r); Packet r3 = pmul(r2, r); // Evaluate the polynomial approximant,improved by instruction-level parallelism. Packet y, y1, y2; y = pmadd(cst_cephes_exp_p0, r, cst_cephes_exp_p1); y1 = pmadd(cst_cephes_exp_p3, r, cst_cephes_exp_p4); y2 = padd(r, cst_1); y = pmadd(y, r, cst_cephes_exp_p2); y1 = pmadd(y1, r, cst_cephes_exp_p5); y = pmadd(y, r3, y1); y = pmadd(y, r2, y2); // Return 2^m * exp(r). // TODO: replace pldexp with faster implementation since y in [-1, 1). return pmax(pldexp(y,m), _x); } template EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet pexp_double(const Packet _x) { Packet x = _x; const Packet cst_1 = pset1(1.0); const Packet cst_2 = pset1(2.0); const Packet cst_half = pset1(0.5); const Packet cst_exp_hi = pset1(709.784); const Packet cst_exp_lo = pset1(-709.784); const Packet cst_cephes_LOG2EF = pset1(1.4426950408889634073599); const Packet cst_cephes_exp_p0 = pset1(1.26177193074810590878e-4); const Packet cst_cephes_exp_p1 = pset1(3.02994407707441961300e-2); const Packet cst_cephes_exp_p2 = pset1(9.99999999999999999910e-1); const Packet cst_cephes_exp_q0 = pset1(3.00198505138664455042e-6); const Packet cst_cephes_exp_q1 = pset1(2.52448340349684104192e-3); const Packet cst_cephes_exp_q2 = pset1(2.27265548208155028766e-1); const Packet cst_cephes_exp_q3 = pset1(2.00000000000000000009e0); const Packet cst_cephes_exp_C1 = pset1(0.693145751953125); const Packet cst_cephes_exp_C2 = pset1(1.42860682030941723212e-6); Packet tmp, fx; // clamp x x = pmax(pmin(x, cst_exp_hi), cst_exp_lo); // Express exp(x) as exp(g + n*log(2)). fx = pmadd(cst_cephes_LOG2EF, x, cst_half); // Get the integer modulus of log(2), i.e. the "n" described above. fx = pfloor(fx); // Get the remainder modulo log(2), i.e. the "g" described above. Subtract // n*log(2) out in two steps, i.e. n*C1 + n*C2, C1+C2=log2 to get the last // digits right. tmp = pmul(fx, cst_cephes_exp_C1); Packet z = pmul(fx, cst_cephes_exp_C2); x = psub(x, tmp); x = psub(x, z); Packet x2 = pmul(x, x); // Evaluate the numerator polynomial of the rational interpolant. Packet px = cst_cephes_exp_p0; px = pmadd(px, x2, cst_cephes_exp_p1); px = pmadd(px, x2, cst_cephes_exp_p2); px = pmul(px, x); // Evaluate the denominator polynomial of the rational interpolant. Packet qx = cst_cephes_exp_q0; qx = pmadd(qx, x2, cst_cephes_exp_q1); qx = pmadd(qx, x2, cst_cephes_exp_q2); qx = pmadd(qx, x2, cst_cephes_exp_q3); // I don't really get this bit, copied from the SSE2 routines, so... // TODO(gonnet): Figure out what is going on here, perhaps find a better // rational interpolant? x = pdiv(px, psub(qx, px)); x = pmadd(cst_2, x, cst_1); // Construct the result 2^n * exp(g) = e * x. The max is used to catch // non-finite values in the input. // TODO: replace pldexp with faster implementation since x in [-1, 1). return pmax(pldexp(x,fx), _x); } // The following code is inspired by the following stack-overflow answer: // https://stackoverflow.com/questions/30463616/payne-hanek-algorithm-implementation-in-c/30465751#30465751 // It has been largely optimized: // - By-pass calls to frexp. // - Aligned loads of required 96 bits of 2/pi. This is accomplished by // (1) balancing the mantissa and exponent to the required bits of 2/pi are // aligned on 8-bits, and (2) replicating the storage of the bits of 2/pi. // - Avoid a branch in rounding and extraction of the remaining fractional part. // Overall, I measured a speed up higher than x2 on x86-64. inline float trig_reduce_huge (float xf, int *quadrant) { using Eigen::numext::int32_t; using Eigen::numext::uint32_t; using Eigen::numext::int64_t; using Eigen::numext::uint64_t; const double pio2_62 = 3.4061215800865545e-19; // pi/2 * 2^-62 const uint64_t zero_dot_five = uint64_t(1) << 61; // 0.5 in 2.62-bit fixed-point foramt // 192 bits of 2/pi for Payne-Hanek reduction // Bits are introduced by packet of 8 to enable aligned reads. static const uint32_t two_over_pi [] = { 0x00000028, 0x000028be, 0x0028be60, 0x28be60db, 0xbe60db93, 0x60db9391, 0xdb939105, 0x9391054a, 0x91054a7f, 0x054a7f09, 0x4a7f09d5, 0x7f09d5f4, 0x09d5f47d, 0xd5f47d4d, 0xf47d4d37, 0x7d4d3770, 0x4d377036, 0x377036d8, 0x7036d8a5, 0x36d8a566, 0xd8a5664f, 0xa5664f10, 0x664f10e4, 0x4f10e410, 0x10e41000, 0xe4100000 }; uint32_t xi = numext::bit_cast(xf); // Below, -118 = -126 + 8. // -126 is to get the exponent, // +8 is to enable alignment of 2/pi's bits on 8 bits. // This is possible because the fractional part of x as only 24 meaningful bits. uint32_t e = (xi >> 23) - 118; // Extract the mantissa and shift it to align it wrt the exponent xi = ((xi & 0x007fffffu)| 0x00800000u) << (e & 0x7); uint32_t i = e >> 3; uint32_t twoopi_1 = two_over_pi[i-1]; uint32_t twoopi_2 = two_over_pi[i+3]; uint32_t twoopi_3 = two_over_pi[i+7]; // Compute x * 2/pi in 2.62-bit fixed-point format. uint64_t p; p = uint64_t(xi) * twoopi_3; p = uint64_t(xi) * twoopi_2 + (p >> 32); p = (uint64_t(xi * twoopi_1) << 32) + p; // Round to nearest: add 0.5 and extract integral part. uint64_t q = (p + zero_dot_five) >> 62; *quadrant = int(q); // Now it remains to compute "r = x - q*pi/2" with high accuracy, // since we have p=x/(pi/2) with high accuracy, we can more efficiently compute r as: // r = (p-q)*pi/2, // where the product can be be carried out with sufficient accuracy using double precision. p -= q<<62; return float(double(int64_t(p)) * pio2_62); } template EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED #if EIGEN_GNUC_AT_LEAST(4,4) && EIGEN_COMP_GNUC_STRICT __attribute__((optimize("-fno-unsafe-math-optimizations"))) #endif Packet psincos_float(const Packet& _x) { typedef typename unpacket_traits::integer_packet PacketI; const Packet cst_2oPI = pset1(0.636619746685028076171875f); // 2/PI const Packet cst_rounding_magic = pset1(12582912); // 2^23 for rounding const PacketI csti_1 = pset1(1); const Packet cst_sign_mask = pset1frombits(0x80000000u); Packet x = pabs(_x); // Scale x by 2/Pi to find x's octant. Packet y = pmul(x, cst_2oPI); // Rounding trick: Packet y_round = padd(y, cst_rounding_magic); EIGEN_OPTIMIZATION_BARRIER(y_round) PacketI y_int = preinterpret(y_round); // last 23 digits represent integer (if abs(x)<2^24) y = psub(y_round, cst_rounding_magic); // nearest integer to x*4/pi // Reduce x by y octants to get: -Pi/4 <= x <= +Pi/4 // using "Extended precision modular arithmetic" #if defined(EIGEN_HAS_SINGLE_INSTRUCTION_MADD) // This version requires true FMA for high accuracy // It provides a max error of 1ULP up to (with absolute_error < 5.9605e-08): const float huge_th = ComputeSine ? 117435.992f : 71476.0625f; x = pmadd(y, pset1(-1.57079601287841796875f), x); x = pmadd(y, pset1(-3.1391647326017846353352069854736328125e-07f), x); x = pmadd(y, pset1(-5.390302529957764765544681040410068817436695098876953125e-15f), x); #else // Without true FMA, the previous set of coefficients maintain 1ULP accuracy // up to x<15.7 (for sin), but accuracy is immediately lost for x>15.7. // We thus use one more iteration to maintain 2ULPs up to reasonably large inputs. // The following set of coefficients maintain 1ULP up to 9.43 and 14.16 for sin and cos respectively. // and 2 ULP up to: const float huge_th = ComputeSine ? 25966.f : 18838.f; x = pmadd(y, pset1(-1.5703125), x); // = 0xbfc90000 EIGEN_OPTIMIZATION_BARRIER(x) x = pmadd(y, pset1(-0.000483989715576171875), x); // = 0xb9fdc000 EIGEN_OPTIMIZATION_BARRIER(x) x = pmadd(y, pset1(1.62865035235881805419921875e-07), x); // = 0x342ee000 x = pmadd(y, pset1(5.5644315544167710640977020375430583953857421875e-11), x); // = 0x2e74b9ee // For the record, the following set of coefficients maintain 2ULP up // to a slightly larger range: // const float huge_th = ComputeSine ? 51981.f : 39086.125f; // but it slightly fails to maintain 1ULP for two values of sin below pi. // x = pmadd(y, pset1(-3.140625/2.), x); // x = pmadd(y, pset1(-0.00048351287841796875), x); // x = pmadd(y, pset1(-3.13855707645416259765625e-07), x); // x = pmadd(y, pset1(-6.0771006282767103812147979624569416046142578125e-11), x); // For the record, with only 3 iterations it is possible to maintain // 1 ULP up to 3PI (maybe more) and 2ULP up to 255. // The coefficients are: 0xbfc90f80, 0xb7354480, 0x2e74b9ee #endif if(predux_any(pcmp_le(pset1(huge_th),pabs(_x)))) { const int PacketSize = unpacket_traits::size; EIGEN_ALIGN_TO_BOUNDARY(sizeof(Packet)) float vals[PacketSize]; EIGEN_ALIGN_TO_BOUNDARY(sizeof(Packet)) float x_cpy[PacketSize]; EIGEN_ALIGN_TO_BOUNDARY(sizeof(Packet)) int y_int2[PacketSize]; pstoreu(vals, pabs(_x)); pstoreu(x_cpy, x); pstoreu(y_int2, y_int); for(int k=0; k=huge_th && (numext::isfinite)(val)) x_cpy[k] = trig_reduce_huge(val,&y_int2[k]); } x = ploadu(x_cpy); y_int = ploadu(y_int2); } // Compute the sign to apply to the polynomial. // sin: sign = second_bit(y_int) xor signbit(_x) // cos: sign = second_bit(y_int+1) Packet sign_bit = ComputeSine ? pxor(_x, preinterpret(plogical_shift_left<30>(y_int))) : preinterpret(plogical_shift_left<30>(padd(y_int,csti_1))); sign_bit = pand(sign_bit, cst_sign_mask); // clear all but left most bit // Get the polynomial selection mask from the second bit of y_int // We'll calculate both (sin and cos) polynomials and then select from the two. Packet poly_mask = preinterpret(pcmp_eq(pand(y_int, csti_1), pzero(y_int))); Packet x2 = pmul(x,x); // Evaluate the cos(x) polynomial. (-Pi/4 <= x <= Pi/4) Packet y1 = pset1(2.4372266125283204019069671630859375e-05f); y1 = pmadd(y1, x2, pset1(-0.00138865201734006404876708984375f )); y1 = pmadd(y1, x2, pset1(0.041666619479656219482421875f )); y1 = pmadd(y1, x2, pset1(-0.5f)); y1 = pmadd(y1, x2, pset1(1.f)); // Evaluate the sin(x) polynomial. (Pi/4 <= x <= Pi/4) // octave/matlab code to compute those coefficients: // x = (0:0.0001:pi/4)'; // A = [x.^3 x.^5 x.^7]; // w = ((1.-(x/(pi/4)).^2).^5)*2000+1; # weights trading relative accuracy // c = (A'*diag(w)*A)\(A'*diag(w)*(sin(x)-x)); # weighted LS, linear coeff forced to 1 // printf('%.64f\n %.64f\n%.64f\n', c(3), c(2), c(1)) // Packet y2 = pset1(-0.0001959234114083702898469196984621021329076029360294342041015625f); y2 = pmadd(y2, x2, pset1( 0.0083326873655616851693794799871284340042620897293090820312500000f)); y2 = pmadd(y2, x2, pset1(-0.1666666203982298255503735617821803316473960876464843750000000000f)); y2 = pmul(y2, x2); y2 = pmadd(y2, x, x); // Select the correct result from the two polynomials. y = ComputeSine ? pselect(poly_mask,y2,y1) : pselect(poly_mask,y1,y2); // Update the sign and filter huge inputs return pxor(y, sign_bit); } template EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet psin_float(const Packet& x) { return psincos_float(x); } template EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet pcos_float(const Packet& x) { return psincos_float(x); } template EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet psqrt_complex(const Packet& a) { typedef typename unpacket_traits::type Scalar; typedef typename Scalar::value_type RealScalar; typedef typename unpacket_traits::as_real RealPacket; // Computes the principal sqrt of the complex numbers in the input. // // For example, for packets containing 2 complex numbers stored in interleaved format // a = [a0, a1] = [x0, y0, x1, y1], // where x0 = real(a0), y0 = imag(a0) etc., this function returns // b = [b0, b1] = [u0, v0, u1, v1], // such that b0^2 = a0, b1^2 = a1. // // To derive the formula for the complex square roots, let's consider the equation for // a single complex square root of the number x + i*y. We want to find real numbers // u and v such that // (u + i*v)^2 = x + i*y <=> // u^2 - v^2 + i*2*u*v = x + i*v. // By equating the real and imaginary parts we get: // u^2 - v^2 = x // 2*u*v = y. // // For x >= 0, this has the numerically stable solution // u = sqrt(0.5 * (x + sqrt(x^2 + y^2))) // v = 0.5 * (y / u) // and for x < 0, // v = sign(y) * sqrt(0.5 * (-x + sqrt(x^2 + y^2))) // u = 0.5 * (y / v) // // To avoid unnecessary over- and underflow, we compute sqrt(x^2 + y^2) as // l = max(|x|, |y|) * sqrt(1 + (min(|x|, |y|) / max(|x|, |y|))^2) , // In the following, without lack of generality, we have annotated the code, assuming // that the input is a packet of 2 complex numbers. // // Step 1. Compute l = [l0, l0, l1, l1], where // l0 = sqrt(x0^2 + y0^2), l1 = sqrt(x1^2 + y1^2) // To avoid over- and underflow, we use the stable formula for each hypotenuse // l0 = (min0 == 0 ? max0 : max0 * sqrt(1 + (min0/max0)**2)), // where max0 = max(|x0|, |y0|), min0 = min(|x0|, |y0|), and similarly for l1. RealPacket a_abs = pabs(a.v); // [|x0|, |y0|, |x1|, |y1|] RealPacket a_abs_flip = pcplxflip(Packet(a_abs)).v; // [|y0|, |x0|, |y1|, |x1|] RealPacket a_max = pmax(a_abs, a_abs_flip); RealPacket a_min = pmin(a_abs, a_abs_flip); RealPacket a_min_zero_mask = pcmp_eq(a_min, pzero(a_min)); RealPacket a_max_zero_mask = pcmp_eq(a_max, pzero(a_max)); RealPacket r = pdiv(a_min, a_max); const RealPacket cst_one = pset1(RealScalar(1)); RealPacket l = pmul(a_max, psqrt(padd(cst_one, pmul(r, r)))); // [l0, l0, l1, l1] // Set l to a_max if a_min is zero. l = pselect(a_min_zero_mask, a_max, l); // Step 2. Compute [rho0, *, rho1, *], where // rho0 = sqrt(0.5 * (l0 + |x0|)), rho1 = sqrt(0.5 * (l1 + |x1|)) // We don't care about the imaginary parts computed here. They will be overwritten later. const RealPacket cst_half = pset1(RealScalar(0.5)); Packet rho; rho.v = psqrt(pmul(cst_half, padd(a_abs, l))); // Step 3. Compute [rho0, eta0, rho1, eta1], where // eta0 = (y0 / l0) / 2, and eta1 = (y1 / l1) / 2. // set eta = 0 of input is 0 + i0. RealPacket eta = pandnot(pmul(cst_half, pdiv(a.v, pcplxflip(rho).v)), a_max_zero_mask); RealPacket real_mask = peven_mask(a.v); Packet positive_real_result; // Compute result for inputs with positive real part. positive_real_result.v = pselect(real_mask, rho.v, eta); // Step 4. Compute solution for inputs with negative real part: // [|eta0|, sign(y0)*rho0, |eta1|, sign(y1)*rho1] const RealScalar neg_zero = RealScalar(numext::bit_cast(0x80000000u)); const RealPacket cst_imag_sign_mask = pset1(Scalar(RealScalar(0.0), neg_zero)).v; RealPacket imag_signs = pand(a.v, cst_imag_sign_mask); Packet negative_real_result; // Notice that rho is positive, so taking it's absolute value is a noop. negative_real_result.v = por(pabs(pcplxflip(positive_real_result).v), imag_signs); // Step 5. Select solution branch based on the sign of the real parts. Packet negative_real_mask; negative_real_mask.v = pcmp_lt(pand(real_mask, a.v), pzero(a.v)); negative_real_mask.v = por(negative_real_mask.v, pcplxflip(negative_real_mask).v); Packet result = pselect(negative_real_mask, negative_real_result, positive_real_result); // Step 6. Handle special cases for infinities: // * If z is (x,+∞), the result is (+∞,+∞) even if x is NaN // * If z is (x,-∞), the result is (+∞,-∞) even if x is NaN // * If z is (-∞,y), the result is (0*|y|,+∞) for finite or NaN y // * If z is (+∞,y), the result is (+∞,0*|y|) for finite or NaN y const RealPacket cst_pos_inf = pset1(NumTraits::infinity()); Packet is_inf; is_inf.v = pcmp_eq(a_abs, cst_pos_inf); Packet is_real_inf; is_real_inf.v = pand(is_inf.v, real_mask); is_real_inf = por(is_real_inf, pcplxflip(is_real_inf)); // prepare packet of (+∞,0*|y|) or (0*|y|,+∞), depending on the sign of the infinite real part. Packet real_inf_result; real_inf_result.v = pmul(a_abs, pset1(Scalar(RealScalar(1.0), RealScalar(0.0))).v); real_inf_result.v = pselect(negative_real_mask.v, pcplxflip(real_inf_result).v, real_inf_result.v); // prepare packet of (+∞,+∞) or (+∞,-∞), depending on the sign of the infinite imaginary part. Packet is_imag_inf; is_imag_inf.v = pandnot(is_inf.v, real_mask); is_imag_inf = por(is_imag_inf, pcplxflip(is_imag_inf)); Packet imag_inf_result; imag_inf_result.v = por(pand(cst_pos_inf, real_mask), pandnot(a.v, real_mask)); return pselect(is_imag_inf, imag_inf_result, pselect(is_real_inf, real_inf_result,result)); } // TODO(rmlarsen): The following set of utilities for double word arithmetic // should perhaps be refactored as a separate file, since it would be generally // useful for special function implementation etc. Writing the algorithms in // terms if a double word type would also make the code more readable. // This function splits x into the nearest integer n and fractional part r, // such that x = n + r holds exactly. template EIGEN_STRONG_INLINE void absolute_split(const Packet& x, Packet& n, Packet& r) { n = pround(x); r = psub(x, n); } // This function computes the sum {s, r}, such that x + y = s_hi + s_lo // holds exactly, and s_hi = fl(x+y), if |x| >= |y|. template EIGEN_STRONG_INLINE void fast_twosum(const Packet& x, const Packet& y, Packet& s_hi, Packet& s_lo) { s_hi = padd(x, y); const Packet t = psub(s_hi, x); s_lo = psub(y, t); } #ifdef EIGEN_HAS_SINGLE_INSTRUCTION_MADD // This function implements the extended precision product of // a pair of floating point numbers. Given {x, y}, it computes the pair // {p_hi, p_lo} such that x * y = p_hi + p_lo holds exactly and // p_hi = fl(x * y). template EIGEN_STRONG_INLINE void twoprod(const Packet& x, const Packet& y, Packet& p_hi, Packet& p_lo) { p_hi = pmul(x, y); p_lo = pmadd(x, y, pnegate(p_hi)); } #else // This function implements the Veltkamp splitting. Given a floating point // number x it returns the pair {x_hi, x_lo} such that x_hi + x_lo = x holds // exactly and that half of the significant of x fits in x_hi. // This is Algorithm 3 from Jean-Michel Muller, "Elementary Functions", // 3rd edition, Birkh\"auser, 2016. template EIGEN_STRONG_INLINE void veltkamp_splitting(const Packet& x, Packet& x_hi, Packet& x_lo) { typedef typename unpacket_traits::type Scalar; EIGEN_CONSTEXPR int shift = (NumTraits::digits() + 1) / 2; const Scalar shift_scale = Scalar(uint64_t(1) << shift); // Scalar constructor not necessarily constexpr. const Packet gamma = pmul(pset1(shift_scale + Scalar(1)), x); Packet rho = psub(x, gamma); x_hi = padd(rho, gamma); x_lo = psub(x, x_hi); } // This function implements Dekker's algorithm for products x * y. // Given floating point numbers {x, y} computes the pair // {p_hi, p_lo} such that x * y = p_hi + p_lo holds exactly and // p_hi = fl(x * y). template EIGEN_STRONG_INLINE void twoprod(const Packet& x, const Packet& y, Packet& p_hi, Packet& p_lo) { Packet x_hi, x_lo, y_hi, y_lo; veltkamp_splitting(x, x_hi, x_lo); veltkamp_splitting(y, y_hi, y_lo); p_hi = pmul(x, y); p_lo = pmadd(x_hi, y_hi, pnegate(p_hi)); p_lo = pmadd(x_hi, y_lo, p_lo); p_lo = pmadd(x_lo, y_hi, p_lo); p_lo = pmadd(x_lo, y_lo, p_lo); } #endif // EIGEN_HAS_SINGLE_INSTRUCTION_MADD // This function implements Dekker's algorithm for the addition // of two double word numbers represented by {x_hi, x_lo} and {y_hi, y_lo}. // It returns the result as a pair {s_hi, s_lo} such that // x_hi + x_lo + y_hi + y_lo = s_hi + s_lo holds exactly. // This is Algorithm 5 from Jean-Michel Muller, "Elementary Functions", // 3rd edition, Birkh\"auser, 2016. template EIGEN_STRONG_INLINE void twosum(const Packet& x_hi, const Packet& x_lo, const Packet& y_hi, const Packet& y_lo, Packet& s_hi, Packet& s_lo) { const Packet x_greater_mask = pcmp_lt(pabs(y_hi), pabs(x_hi)); Packet r_hi_1, r_lo_1; fast_twosum(x_hi, y_hi,r_hi_1, r_lo_1); Packet r_hi_2, r_lo_2; fast_twosum(y_hi, x_hi,r_hi_2, r_lo_2); const Packet r_hi = pselect(x_greater_mask, r_hi_1, r_hi_2); const Packet s1 = padd(padd(y_lo, r_lo_1), x_lo); const Packet s2 = padd(padd(x_lo, r_lo_2), y_lo); const Packet s = pselect(x_greater_mask, s1, s2); fast_twosum(r_hi, s, s_hi, s_lo); } // This is a version of twosum for double word numbers, // which assumes that |x_hi| >= |y_hi|. template EIGEN_STRONG_INLINE void fast_twosum(const Packet& x_hi, const Packet& x_lo, const Packet& y_hi, const Packet& y_lo, Packet& s_hi, Packet& s_lo) { Packet r_hi, r_lo; fast_twosum(x_hi, y_hi, r_hi, r_lo); const Packet s = padd(padd(y_lo, r_lo), x_lo); fast_twosum(r_hi, s, s_hi, s_lo); } // This is a version of twosum for adding a floating point number x to // double word number {y_hi, y_lo} number, with the assumption // that |x| >= |y_hi|. template EIGEN_STRONG_INLINE void fast_twosum(const Packet& x, const Packet& y_hi, const Packet& y_lo, Packet& s_hi, Packet& s_lo) { Packet r_hi, r_lo; fast_twosum(x, y_hi, r_hi, r_lo); const Packet s = padd(y_lo, r_lo); fast_twosum(r_hi, s, s_hi, s_lo); } // This function implements the multiplication of a double word // number represented by {x_hi, x_lo} by a floating point number y. // It returns the result as a pair {p_hi, p_lo} such that // (x_hi + x_lo) * y = p_hi + p_lo hold with a relative error // of less than 2*2^{-2p}, where p is the number of significand bit // in the floating point type. // This is Algorithm 7 from Jean-Michel Muller, "Elementary Functions", // 3rd edition, Birkh\"auser, 2016. template EIGEN_STRONG_INLINE void twoprod(const Packet& x_hi, const Packet& x_lo, const Packet& y, Packet& p_hi, Packet& p_lo) { Packet c_hi, c_lo1; twoprod(x_hi, y, c_hi, c_lo1); const Packet c_lo2 = pmul(x_lo, y); Packet t_hi, t_lo1; fast_twosum(c_hi, c_lo2, t_hi, t_lo1); const Packet t_lo2 = padd(t_lo1, c_lo1); fast_twosum(t_hi, t_lo2, p_hi, p_lo); } // This function implements the multiplication of two double word // numbers represented by {x_hi, x_lo} and {y_hi, y_lo}. // It returns the result as a pair {p_hi, p_lo} such that // (x_hi + x_lo) * (y_hi + y_lo) = p_hi + p_lo holds with a relative error // of less than 2*2^{-2p}, where p is the number of significand bit // in the floating point type. template EIGEN_STRONG_INLINE void twoprod(const Packet& x_hi, const Packet& x_lo, const Packet& y_hi, const Packet& y_lo, Packet& p_hi, Packet& p_lo) { Packet p_hi_hi, p_hi_lo; twoprod(x_hi, x_lo, y_hi, p_hi_hi, p_hi_lo); Packet p_lo_hi, p_lo_lo; twoprod(x_hi, x_lo, y_lo, p_lo_hi, p_lo_lo); fast_twosum(p_hi_hi, p_hi_lo, p_lo_hi, p_lo_lo, p_hi, p_lo); } // This function computes the reciprocal of a floating point number // with extra precision and returns the result as a double word. template void doubleword_reciprocal(const Packet& x, Packet& recip_hi, Packet& recip_lo) { typedef typename unpacket_traits::type Scalar; // 1. Approximate the reciprocal as the reciprocal of the high order element. Packet approx_recip = prsqrt(x); approx_recip = pmul(approx_recip, approx_recip); // 2. Run one step of Newton-Raphson iteration in double word arithmetic // to get the bottom half. The NR iteration for reciprocal of 'a' is // x_{i+1} = x_i * (2 - a * x_i) // -a*x_i Packet t1_hi, t1_lo; twoprod(pnegate(x), approx_recip, t1_hi, t1_lo); // 2 - a*x_i Packet t2_hi, t2_lo; fast_twosum(pset1(Scalar(2)), t1_hi, t2_hi, t2_lo); Packet t3_hi, t3_lo; fast_twosum(t2_hi, padd(t2_lo, t1_lo), t3_hi, t3_lo); // x_i * (2 - a * x_i) twoprod(t3_hi, t3_lo, approx_recip, recip_hi, recip_lo); } // This function computes log2(x) and returns the result as a double word. template struct accurate_log2 { template EIGEN_STRONG_INLINE void operator()(const Packet& x, Packet& log2_x_hi, Packet& log2_x_lo) { log2_x_hi = plog2(x); log2_x_lo = pzero(x); } }; // This specialization uses a more accurate algorithm to compute log2(x) for // floats in [1/sqrt(2);sqrt(2)] with a relative accuracy of ~6.42e-10. // This additional accuracy is needed to counter the error-magnification // inherent in multiplying by a potentially large exponent in pow(x,y). // The minimax polynomial used was calculated using the Sollya tool. // See sollya.org. template <> struct accurate_log2 { template EIGEN_STRONG_INLINE void operator()(const Packet& z, Packet& log2_x_hi, Packet& log2_x_lo) { // The function log(1+x)/x is approximated in the interval // [1/sqrt(2)-1;sqrt(2)-1] by a degree 10 polynomial of the form // Q(x) = (C0 + x * (C1 + x * (C2 + x * (C3 + x * P(x))))), // where the degree 6 polynomial P(x) is evaluated in single precision, // while the remaining 4 terms of Q(x), as well as the final multiplication by x // to reconstruct log(1+x) are evaluated in extra precision using // double word arithmetic. C0 through C3 are extra precise constants // stored as double words. // // The polynomial coefficients were calculated using Sollya commands: // > n = 10; // > f = log2(1+x)/x; // > interval = [sqrt(0.5)-1;sqrt(2)-1]; // > p = fpminimax(f,n,[|double,double,double,double,single...|],interval,relative,floating); const Packet p6 = pset1( 9.703654795885e-2f); const Packet p5 = pset1(-0.1690667718648f); const Packet p4 = pset1( 0.1720575392246f); const Packet p3 = pset1(-0.1789081543684f); const Packet p2 = pset1( 0.2050433009862f); const Packet p1 = pset1(-0.2404672354459f); const Packet p0 = pset1( 0.2885761857032f); const Packet C3_hi = pset1(-0.360674142838f); const Packet C3_lo = pset1(-6.13283912543e-09f); const Packet C2_hi = pset1(0.480897903442f); const Packet C2_lo = pset1(-1.44861207474e-08f); const Packet C1_hi = pset1(-0.721347510815f); const Packet C1_lo = pset1(-4.84483164698e-09f); const Packet C0_hi = pset1(1.44269502163f); const Packet C0_lo = pset1(2.01711713999e-08f); const Packet one = pset1(1.0f); const Packet x = psub(z, one); // Evaluate P(x) in working precision. // We evaluate it in multiple parts to improve instruction level // parallelism. Packet x2 = pmul(x,x); Packet p_even = pmadd(p6, x2, p4); p_even = pmadd(p_even, x2, p2); p_even = pmadd(p_even, x2, p0); Packet p_odd = pmadd(p5, x2, p3); p_odd = pmadd(p_odd, x2, p1); Packet p = pmadd(p_odd, x, p_even); // Now evaluate the low-order tems of Q(x) in double word precision. // In the following, due to the alternating signs and the fact that // |x| < sqrt(2)-1, we can assume that |C*_hi| >= q_i, and use // fast_twosum instead of the slower twosum. Packet q_hi, q_lo; Packet t_hi, t_lo; // C3 + x * p(x) twoprod(p, x, t_hi, t_lo); fast_twosum(C3_hi, C3_lo, t_hi, t_lo, q_hi, q_lo); // C2 + x * p(x) twoprod(q_hi, q_lo, x, t_hi, t_lo); fast_twosum(C2_hi, C2_lo, t_hi, t_lo, q_hi, q_lo); // C1 + x * p(x) twoprod(q_hi, q_lo, x, t_hi, t_lo); fast_twosum(C1_hi, C1_lo, t_hi, t_lo, q_hi, q_lo); // C0 + x * p(x) twoprod(q_hi, q_lo, x, t_hi, t_lo); fast_twosum(C0_hi, C0_lo, t_hi, t_lo, q_hi, q_lo); // log(z) ~= x * Q(x) twoprod(q_hi, q_lo, x, log2_x_hi, log2_x_lo); } }; // This specialization uses a more accurate algorithm to compute log2(x) for // floats in [1/sqrt(2);sqrt(2)] with a relative accuracy of ~1.27e-18. // This additional accuracy is needed to counter the error-magnification // inherent in multiplying by a potentially large exponent in pow(x,y). // The minimax polynomial used was calculated using the Sollya tool. // See sollya.org. template <> struct accurate_log2 { template EIGEN_STRONG_INLINE void operator()(const Packet& x, Packet& log2_x_hi, Packet& log2_x_lo) { // We use a transformation of variables: // r = c * (x-1) / (x+1), // such that // log2(x) = log2((1 + r/c) / (1 - r/c)) = f(r). // The function f(r) can be approximated well using an odd polynomial // of the form // P(r) = ((Q(r^2) * r^2 + C) * r^2 + 1) * r, // For the implementation of log2 here, Q is of degree 6 with // coefficient represented in working precision (double), while C is a // constant represented in extra precision as a double word to achieve // full accuracy. // // The polynomial coefficients were computed by the Sollya script: // // c = 2 / log(2); // trans = c * (x-1)/(x+1); // itrans = (1+x/c)/(1-x/c); // interval=[trans(sqrt(0.5)); trans(sqrt(2))]; // print(interval); // f = log2(itrans(x)); // p=fpminimax(f,[|1,3,5,7,9,11,13,15,17|],[|1,DD,double...|],interval,relative,floating); const Packet q12 = pset1(2.87074255468000586e-9); const Packet q10 = pset1(2.38957980901884082e-8); const Packet q8 = pset1(2.31032094540014656e-7); const Packet q6 = pset1(2.27279857398537278e-6); const Packet q4 = pset1(2.31271023278625638e-5); const Packet q2 = pset1(2.47556738444535513e-4); const Packet q0 = pset1(2.88543873228900172e-3); const Packet C_hi = pset1(0.0400377511598501157); const Packet C_lo = pset1(-4.77726582251425391e-19); const Packet one = pset1(1.0); const Packet cst_2_log2e_hi = pset1(2.88539008177792677); const Packet cst_2_log2e_lo = pset1(4.07660016854549667e-17); // c * (x - 1) Packet num_hi, num_lo; twoprod(cst_2_log2e_hi, cst_2_log2e_lo, psub(x, one), num_hi, num_lo); // TODO(rmlarsen): Investigate if using the division algorithm by // Muller et al. is faster/more accurate. // 1 / (x + 1) Packet denom_hi, denom_lo; doubleword_reciprocal(padd(x, one), denom_hi, denom_lo); // r = c * (x-1) / (x+1), Packet r_hi, r_lo; twoprod(num_hi, num_lo, denom_hi, denom_lo, r_hi, r_lo); // r2 = r * r Packet r2_hi, r2_lo; twoprod(r_hi, r_lo, r_hi, r_lo, r2_hi, r2_lo); // r4 = r2 * r2 Packet r4_hi, r4_lo; twoprod(r2_hi, r2_lo, r2_hi, r2_lo, r4_hi, r4_lo); // Evaluate Q(r^2) in working precision. We evaluate it in two parts // (even and odd in r^2) to improve instruction level parallelism. Packet q_even = pmadd(q12, r4_hi, q8); Packet q_odd = pmadd(q10, r4_hi, q6); q_even = pmadd(q_even, r4_hi, q4); q_odd = pmadd(q_odd, r4_hi, q2); q_even = pmadd(q_even, r4_hi, q0); Packet q = pmadd(q_odd, r2_hi, q_even); // Now evaluate the low order terms of P(x) in double word precision. // In the following, due to the increasing magnitude of the coefficients // and r being constrained to [-0.5, 0.5] we can use fast_twosum instead // of the slower twosum. // Q(r^2) * r^2 Packet p_hi, p_lo; twoprod(r2_hi, r2_lo, q, p_hi, p_lo); // Q(r^2) * r^2 + C Packet p1_hi, p1_lo; fast_twosum(C_hi, C_lo, p_hi, p_lo, p1_hi, p1_lo); // (Q(r^2) * r^2 + C) * r^2 Packet p2_hi, p2_lo; twoprod(r2_hi, r2_lo, p1_hi, p1_lo, p2_hi, p2_lo); // ((Q(r^2) * r^2 + C) * r^2 + 1) Packet p3_hi, p3_lo; fast_twosum(one, p2_hi, p2_lo, p3_hi, p3_lo); // log(z) ~= ((Q(r^2) * r^2 + C) * r^2 + 1) * r twoprod(p3_hi, p3_lo, r_hi, r_lo, log2_x_hi, log2_x_lo); } }; // This function computes exp2(x) (i.e. 2**x). template struct fast_accurate_exp2 { template EIGEN_STRONG_INLINE Packet operator()(const Packet& x) { // TODO(rmlarsen): Add a pexp2 packetop. return pexp(pmul(pset1(Scalar(EIGEN_LN2)), x)); } }; // This specialization uses a faster algorithm to compute exp2(x) for floats // in [-0.5;0.5] with a relative accuracy of 1 ulp. // The minimax polynomial used was calculated using the Sollya tool. // See sollya.org. template <> struct fast_accurate_exp2 { template EIGEN_STRONG_INLINE Packet operator()(const Packet& x) { // This function approximates exp2(x) by a degree 6 polynomial of the form // Q(x) = 1 + x * (C + x * P(x)), where the degree 4 polynomial P(x) is evaluated in // single precision, and the remaining steps are evaluated with extra precision using // double word arithmetic. C is an extra precise constant stored as a double word. // // The polynomial coefficients were calculated using Sollya commands: // > n = 6; // > f = 2^x; // > interval = [-0.5;0.5]; // > p = fpminimax(f,n,[|1,double,single...|],interval,relative,floating); const Packet p4 = pset1(1.539513905e-4f); const Packet p3 = pset1(1.340007293e-3f); const Packet p2 = pset1(9.618283249e-3f); const Packet p1 = pset1(5.550328270e-2f); const Packet p0 = pset1(0.2402264923f); const Packet C_hi = pset1(0.6931471825f); const Packet C_lo = pset1(2.36836577e-08f); const Packet one = pset1(1.0f); // Evaluate P(x) in working precision. // We evaluate even and odd parts of the polynomial separately // to gain some instruction level parallelism. Packet x2 = pmul(x,x); Packet p_even = pmadd(p4, x2, p2); Packet p_odd = pmadd(p3, x2, p1); p_even = pmadd(p_even, x2, p0); Packet p = pmadd(p_odd, x, p_even); // Evaluate the remaining terms of Q(x) with extra precision using // double word arithmetic. Packet p_hi, p_lo; // x * p(x) twoprod(p, x, p_hi, p_lo); // C + x * p(x) Packet q1_hi, q1_lo; twosum(p_hi, p_lo, C_hi, C_lo, q1_hi, q1_lo); // x * (C + x * p(x)) Packet q2_hi, q2_lo; twoprod(q1_hi, q1_lo, x, q2_hi, q2_lo); // 1 + x * (C + x * p(x)) Packet q3_hi, q3_lo; // Since |q2_hi| <= sqrt(2)-1 < 1, we can use fast_twosum // for adding it to unity here. fast_twosum(one, q2_hi, q3_hi, q3_lo); return padd(q3_hi, padd(q2_lo, q3_lo)); } }; // in [-0.5;0.5] with a relative accuracy of 1 ulp. // The minimax polynomial used was calculated using the Sollya tool. // See sollya.org. template <> struct fast_accurate_exp2 { template EIGEN_STRONG_INLINE Packet operator()(const Packet& x) { // This function approximates exp2(x) by a degree 10 polynomial of the form // Q(x) = 1 + x * (C + x * P(x)), where the degree 8 polynomial P(x) is evaluated in // single precision, and the remaining steps are evaluated with extra precision using // double word arithmetic. C is an extra precise constant stored as a double word. // // The polynomial coefficients were calculated using Sollya commands: // > n = 11; // > f = 2^x; // > interval = [-0.5;0.5]; // > p = fpminimax(f,n,[|1,DD,double...|],interval,relative,floating); const Packet p9 = pset1(4.431642109085495276e-10); const Packet p8 = pset1(7.073829923303358410e-9); const Packet p7 = pset1(1.017822306737031311e-7); const Packet p6 = pset1(1.321543498017646657e-6); const Packet p5 = pset1(1.525273342728892877e-5); const Packet p4 = pset1(1.540353045780084423e-4); const Packet p3 = pset1(1.333355814685869807e-3); const Packet p2 = pset1(9.618129107593478832e-3); const Packet p1 = pset1(5.550410866481961247e-2); const Packet p0 = pset1(0.240226506959101332); const Packet C_hi = pset1(0.693147180559945286); const Packet C_lo = pset1(4.81927865669806721e-17); const Packet one = pset1(1.0); // Evaluate P(x) in working precision. // We evaluate even and odd parts of the polynomial separately // to gain some instruction level parallelism. Packet x2 = pmul(x,x); Packet p_even = pmadd(p8, x2, p6); Packet p_odd = pmadd(p9, x2, p7); p_even = pmadd(p_even, x2, p4); p_odd = pmadd(p_odd, x2, p5); p_even = pmadd(p_even, x2, p2); p_odd = pmadd(p_odd, x2, p3); p_even = pmadd(p_even, x2, p0); p_odd = pmadd(p_odd, x2, p1); Packet p = pmadd(p_odd, x, p_even); // Evaluate the remaining terms of Q(x) with extra precision using // double word arithmetic. Packet p_hi, p_lo; // x * p(x) twoprod(p, x, p_hi, p_lo); // C + x * p(x) Packet q1_hi, q1_lo; twosum(p_hi, p_lo, C_hi, C_lo, q1_hi, q1_lo); // x * (C + x * p(x)) Packet q2_hi, q2_lo; twoprod(q1_hi, q1_lo, x, q2_hi, q2_lo); // 1 + x * (C + x * p(x)) Packet q3_hi, q3_lo; // Since |q2_hi| <= sqrt(2)-1 < 1, we can use fast_twosum // for adding it to unity here. fast_twosum(one, q2_hi, q3_hi, q3_lo); return padd(q3_hi, padd(q2_lo, q3_lo)); } }; // This function implements the non-trivial case of pow(x,y) where x is // positive and y is (possibly) non-integer. // Formally, pow(x,y) = exp2(y * log2(x)), where exp2(x) is shorthand for 2^x. // TODO(rmlarsen): We should probably add this as a packet up 'ppow', to make it // easier to specialize or turn off for specific types and/or backends.x template EIGEN_STRONG_INLINE Packet generic_pow_impl(const Packet& x, const Packet& y) { typedef typename unpacket_traits::type Scalar; // Split x into exponent e_x and mantissa m_x. Packet e_x; Packet m_x = pfrexp(x, e_x); // Adjust m_x to lie in [1/sqrt(2):sqrt(2)] to minimize absolute error in log2(m_x). EIGEN_CONSTEXPR Scalar sqrt_half = Scalar(0.70710678118654752440); const Packet m_x_scale_mask = pcmp_lt(m_x, pset1(sqrt_half)); m_x = pselect(m_x_scale_mask, pmul(pset1(Scalar(2)), m_x), m_x); e_x = pselect(m_x_scale_mask, psub(e_x, pset1(Scalar(1))), e_x); // Compute log2(m_x) with 6 extra bits of accuracy. Packet rx_hi, rx_lo; accurate_log2()(m_x, rx_hi, rx_lo); // Compute the two terms {y * e_x, y * r_x} in f = y * log2(x) with doubled // precision using double word arithmetic. Packet f1_hi, f1_lo, f2_hi, f2_lo; twoprod(e_x, y, f1_hi, f1_lo); twoprod(rx_hi, rx_lo, y, f2_hi, f2_lo); // Sum the two terms in f using double word arithmetic. We know // that |e_x| > |log2(m_x)|, except for the case where e_x==0. // This means that we can use fast_twosum(f1,f2). // In the case e_x == 0, e_x * y = f1 = 0, so we don't lose any // accuracy by violating the assumption of fast_twosum, because // it's a no-op. Packet f_hi, f_lo; fast_twosum(f1_hi, f1_lo, f2_hi, f2_lo, f_hi, f_lo); // Split f into integer and fractional parts. Packet n_z, r_z; absolute_split(f_hi, n_z, r_z); r_z = padd(r_z, f_lo); Packet n_r; absolute_split(r_z, n_r, r_z); n_z = padd(n_z, n_r); // We now have an accurate split of f = n_z + r_z and can compute // x^y = 2**{n_z + r_z) = exp2(r_z) * 2**{n_z}. // Since r_z is in [-0.5;0.5], we compute the first factor to high accuracy // using a specialized algorithm. Multiplication by the second factor can // be done exactly using pldexp(), since it is an integer power of 2. const Packet e_r = fast_accurate_exp2()(r_z); return pldexp(e_r, n_z); } // Generic implementation of pow(x,y). template EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet generic_pow(const Packet& x, const Packet& y) { typedef typename unpacket_traits::type Scalar; const Packet cst_pos_inf = pset1(NumTraits::infinity()); const Packet cst_zero = pset1(Scalar(0)); const Packet cst_one = pset1(Scalar(1)); const Packet cst_nan = pset1(NumTraits::quiet_NaN()); const Packet abs_x = pabs(x); // Predicates for sign and magnitude of x. const Packet x_is_zero = pcmp_eq(x, cst_zero); const Packet x_is_neg = pcmp_lt(x, cst_zero); const Packet abs_x_is_inf = pcmp_eq(abs_x, cst_pos_inf); const Packet abs_x_is_one = pcmp_eq(abs_x, cst_one); const Packet abs_x_is_gt_one = pcmp_lt(cst_one, abs_x); const Packet abs_x_is_lt_one = pcmp_lt(abs_x, cst_one); const Packet x_is_one = pandnot(abs_x_is_one, x_is_neg); const Packet x_is_neg_one = pand(abs_x_is_one, x_is_neg); const Packet x_is_nan = pandnot(ptrue(x), pcmp_eq(x, x)); // Predicates for sign and magnitude of y. const Packet y_is_one = pcmp_eq(y, cst_one); const Packet y_is_zero = pcmp_eq(y, cst_zero); const Packet y_is_neg = pcmp_lt(y, cst_zero); const Packet y_is_pos = pandnot(ptrue(y), por(y_is_zero, y_is_neg)); const Packet y_is_nan = pandnot(ptrue(y), pcmp_eq(y, y)); const Packet abs_y_is_inf = pcmp_eq(pabs(y), cst_pos_inf); EIGEN_CONSTEXPR Scalar huge_exponent = (NumTraits::max_exponent() * Scalar(EIGEN_LN2)) / NumTraits::epsilon(); const Packet abs_y_is_huge = pcmp_le(pset1(huge_exponent), pabs(y)); // Predicates for whether y is integer and/or even. const Packet y_is_int = pcmp_eq(pfloor(y), y); const Packet y_div_2 = pmul(y, pset1(Scalar(0.5))); const Packet y_is_even = pcmp_eq(pround(y_div_2), y_div_2); // Predicates encoding special cases for the value of pow(x,y) const Packet invalid_negative_x = pandnot(pandnot(pandnot(x_is_neg, abs_x_is_inf), y_is_int), abs_y_is_inf); const Packet pow_is_one = por(por(x_is_one, y_is_zero), pand(x_is_neg_one, por(abs_y_is_inf, pandnot(y_is_even, invalid_negative_x)))); const Packet pow_is_nan = por(invalid_negative_x, por(x_is_nan, y_is_nan)); const Packet pow_is_zero = por(por(por(pand(x_is_zero, y_is_pos), pand(abs_x_is_inf, y_is_neg)), pand(pand(abs_x_is_lt_one, abs_y_is_huge), y_is_pos)), pand(pand(abs_x_is_gt_one, abs_y_is_huge), y_is_neg)); const Packet pow_is_inf = por(por(por(pand(x_is_zero, y_is_neg), pand(abs_x_is_inf, y_is_pos)), pand(pand(abs_x_is_lt_one, abs_y_is_huge), y_is_neg)), pand(pand(abs_x_is_gt_one, abs_y_is_huge), y_is_pos)); // General computation of pow(x,y) for positive x or negative x and integer y. const Packet negate_pow_abs = pandnot(x_is_neg, y_is_even); const Packet pow_abs = generic_pow_impl(abs_x, y); return pselect(y_is_one, x, pselect(pow_is_one, cst_one, pselect(pow_is_nan, cst_nan, pselect(pow_is_inf, cst_pos_inf, pselect(pow_is_zero, cst_zero, pselect(negate_pow_abs, pnegate(pow_abs), pow_abs)))))); } /* polevl (modified for Eigen) * * Evaluate polynomial * * * * SYNOPSIS: * * int N; * Scalar x, y, coef[N+1]; * * y = polevl( x, coef); * * * * DESCRIPTION: * * Evaluates polynomial of degree N: * * 2 N * y = C + C x + C x +...+ C x * 0 1 2 N * * Coefficients are stored in reverse order: * * coef[0] = C , ..., coef[N] = C . * N 0 * * The function p1evl() assumes that coef[N] = 1.0 and is * omitted from the array. Its calling arguments are * otherwise the same as polevl(). * * * The Eigen implementation is templatized. For best speed, store * coef as a const array (constexpr), e.g. * * const double coef[] = {1.0, 2.0, 3.0, ...}; * */ template struct ppolevl { static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet run(const Packet& x, const typename unpacket_traits::type coeff[]) { EIGEN_STATIC_ASSERT((N > 0), YOU_MADE_A_PROGRAMMING_MISTAKE); return pmadd(ppolevl::run(x, coeff), x, pset1(coeff[N])); } }; template struct ppolevl { static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet run(const Packet& x, const typename unpacket_traits::type coeff[]) { EIGEN_UNUSED_VARIABLE(x); return pset1(coeff[0]); } }; /* chbevl (modified for Eigen) * * Evaluate Chebyshev series * * * * SYNOPSIS: * * int N; * Scalar x, y, coef[N], chebevl(); * * y = chbevl( x, coef, N ); * * * * DESCRIPTION: * * Evaluates the series * * N-1 * - ' * y = > coef[i] T (x/2) * - i * i=0 * * of Chebyshev polynomials Ti at argument x/2. * * Coefficients are stored in reverse order, i.e. the zero * order term is last in the array. Note N is the number of * coefficients, not the order. * * If coefficients are for the interval a to b, x must * have been transformed to x -> 2(2x - b - a)/(b-a) before * entering the routine. This maps x from (a, b) to (-1, 1), * over which the Chebyshev polynomials are defined. * * If the coefficients are for the inverted interval, in * which (a, b) is mapped to (1/b, 1/a), the transformation * required is x -> 2(2ab/x - b - a)/(b-a). If b is infinity, * this becomes x -> 4a/x - 1. * * * * SPEED: * * Taking advantage of the recurrence properties of the * Chebyshev polynomials, the routine requires one more * addition per loop than evaluating a nested polynomial of * the same degree. * */ template struct pchebevl { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Packet run(Packet x, const typename unpacket_traits::type coef[]) { typedef typename unpacket_traits::type Scalar; Packet b0 = pset1(coef[0]); Packet b1 = pset1(static_cast(0.f)); Packet b2; for (int i = 1; i < N; i++) { b2 = b1; b1 = b0; b0 = psub(pmadd(x, b1, pset1(coef[i])), b2); } return pmul(pset1(static_cast(0.5f)), psub(b0, b2)); } }; } // end namespace internal } // end namespace Eigen #endif // EIGEN_ARCH_GENERIC_PACKET_MATH_FUNCTIONS_H RcppEigen/inst/include/Eigen/src/Core/arch/Default/Half.h0000644000176200001440000010531614562417221022572 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. // // The conversion routines are Copyright (c) Fabian Giesen, 2016. // The original license follows: // // Copyright (c) Fabian Giesen, 2016 // All rights reserved. // Redistribution and use in source and binary forms, with or without // modification, are permitted. // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR // A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT // HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, // SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT // LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, // DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY // THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT // (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE // OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. // Standard 16-bit float type, mostly useful for GPUs. Defines a new // type Eigen::half (inheriting either from CUDA's or HIP's __half struct) with // operator overloads such that it behaves basically as an arithmetic // type. It will be quite slow on CPUs (so it is recommended to stay // in fp32 for CPUs, except for simple parameter conversions, I/O // to disk and the likes), but fast on GPUs. #ifndef EIGEN_HALF_H #define EIGEN_HALF_H #include #if defined(EIGEN_HAS_GPU_FP16) || defined(EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC) // When compiling with GPU support, the "__half_raw" base class as well as // some other routines are defined in the GPU compiler header files // (cuda_fp16.h, hip_fp16.h), and they are not tagged constexpr // As a consequence, we get compile failures when compiling Eigen with // GPU support. Hence the need to disable EIGEN_CONSTEXPR when building // Eigen with GPU support #pragma push_macro("EIGEN_CONSTEXPR") #undef EIGEN_CONSTEXPR #define EIGEN_CONSTEXPR #endif #define F16_PACKET_FUNCTION(PACKET_F, PACKET_F16, METHOD) \ template <> \ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC EIGEN_UNUSED \ PACKET_F16 METHOD(const PACKET_F16& _x) { \ return float2half(METHOD(half2float(_x))); \ } namespace Eigen { struct half; namespace half_impl { // We want to use the __half_raw struct from the HIP header file only during the device compile phase. // This is required because of a quirk in the way TensorFlow GPU builds are done. // When compiling TensorFlow source code with GPU support, files that // * contain GPU kernels (i.e. *.cu.cc files) are compiled via hipcc // * do not contain GPU kernels ( i.e. *.cc files) are compiled via gcc (typically) // // Tensorflow uses the Eigen::half type as its FP16 type, and there are functions that // * are defined in a file that gets compiled via hipcc AND // * have Eigen::half as a pass-by-value argument AND // * are called in a file that gets compiled via gcc // // In the scenario described above the caller and callee will see different versions // of the Eigen::half base class __half_raw, and they will be compiled by different compilers // // There appears to be an ABI mismatch between gcc and clang (which is called by hipcc) that results in // the callee getting corrupted values for the Eigen::half argument. // // Making the host side compile phase of hipcc use the same Eigen::half impl, as the gcc compile, resolves // this error, and hence the following convoluted #if condition #if !defined(EIGEN_HAS_GPU_FP16) || !defined(EIGEN_GPU_COMPILE_PHASE) // Make our own __half_raw definition that is similar to CUDA's. struct __half_raw { #if (defined(EIGEN_HAS_GPU_FP16) && !defined(EIGEN_GPU_COMPILE_PHASE)) // Eigen::half can be used as the datatype for shared memory declarations (in Eigen and TF) // The element type for shared memory cannot have non-trivial constructors // and hence the following special casing (which skips the zero-initilization). // Note that this check gets done even in the host compilation phase, and // hence the need for this EIGEN_DEVICE_FUNC __half_raw() {} #else EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR __half_raw() : x(0) {} #endif #if defined(EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC) explicit EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR __half_raw(numext::uint16_t raw) : x(numext::bit_cast<__fp16>(raw)) { } __fp16 x; #else explicit EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR __half_raw(numext::uint16_t raw) : x(raw) {} numext::uint16_t x; #endif }; #elif defined(EIGEN_HAS_HIP_FP16) // Nothing to do here // HIP fp16 header file has a definition for __half_raw #elif defined(EIGEN_HAS_CUDA_FP16) #if EIGEN_CUDA_SDK_VER < 90000 // In CUDA < 9.0, __half is the equivalent of CUDA 9's __half_raw typedef __half __half_raw; #endif // defined(EIGEN_HAS_CUDA_FP16) #elif defined(SYCL_DEVICE_ONLY) typedef cl::sycl::half __half_raw; #endif EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR __half_raw raw_uint16_to_half(numext::uint16_t x); EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __half_raw float_to_half_rtne(float ff); EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC float half_to_float(__half_raw h); struct half_base : public __half_raw { EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR half_base() {} EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR half_base(const __half_raw& h) : __half_raw(h) {} #if defined(EIGEN_HAS_GPU_FP16) #if defined(EIGEN_HAS_HIP_FP16) EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR half_base(const __half& h) { x = __half_as_ushort(h); } #elif defined(EIGEN_HAS_CUDA_FP16) #if EIGEN_CUDA_SDK_VER >= 90000 EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR half_base(const __half& h) : __half_raw(*(__half_raw*)&h) {} #endif #endif #endif }; } // namespace half_impl // Class definition. struct half : public half_impl::half_base { // Writing this out as separate #if-else blocks to make the code easier to follow // The same applies to most #if-else blocks in this file #if !defined(EIGEN_HAS_GPU_FP16) || !defined(EIGEN_GPU_COMPILE_PHASE) // Use the same base class for the following two scenarios // * when compiling without GPU support enabled // * during host compile phase when compiling with GPU support enabled typedef half_impl::__half_raw __half_raw; #elif defined(EIGEN_HAS_HIP_FP16) // Nothing to do here // HIP fp16 header file has a definition for __half_raw #elif defined(EIGEN_HAS_CUDA_FP16) // Note that EIGEN_CUDA_SDK_VER is set to 0 even when compiling with HIP, so // (EIGEN_CUDA_SDK_VER < 90000) is true even for HIP! So keeping this within // #if defined(EIGEN_HAS_CUDA_FP16) is needed #if defined(EIGEN_CUDA_SDK_VER) && EIGEN_CUDA_SDK_VER < 90000 typedef half_impl::__half_raw __half_raw; #endif #endif EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR half() {} EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR half(const __half_raw& h) : half_impl::half_base(h) {} #if defined(EIGEN_HAS_GPU_FP16) #if defined(EIGEN_HAS_HIP_FP16) EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR half(const __half& h) : half_impl::half_base(h) {} #elif defined(EIGEN_HAS_CUDA_FP16) #if defined(EIGEN_CUDA_SDK_VER) && EIGEN_CUDA_SDK_VER >= 90000 EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR half(const __half& h) : half_impl::half_base(h) {} #endif #endif #endif explicit EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR half(bool b) : half_impl::half_base(half_impl::raw_uint16_to_half(b ? 0x3c00 : 0)) {} template explicit EIGEN_DEVICE_FUNC half(T val) : half_impl::half_base(half_impl::float_to_half_rtne(static_cast(val))) {} explicit EIGEN_DEVICE_FUNC half(float f) : half_impl::half_base(half_impl::float_to_half_rtne(f)) {} // Following the convention of numpy, converting between complex and // float will lead to loss of imag value. template explicit EIGEN_DEVICE_FUNC half(std::complex c) : half_impl::half_base(half_impl::float_to_half_rtne(static_cast(c.real()))) {} EIGEN_DEVICE_FUNC operator float() const { // NOLINT: Allow implicit conversion to float, because it is lossless. return half_impl::half_to_float(*this); } #if defined(EIGEN_HAS_GPU_FP16) && !defined(EIGEN_GPU_COMPILE_PHASE) EIGEN_DEVICE_FUNC operator __half() const { ::__half_raw hr; hr.x = x; return __half(hr); } #endif }; } // end namespace Eigen namespace std { template<> struct numeric_limits { static const bool is_specialized = true; static const bool is_signed = true; static const bool is_integer = false; static const bool is_exact = false; static const bool has_infinity = true; static const bool has_quiet_NaN = true; static const bool has_signaling_NaN = true; static const float_denorm_style has_denorm = denorm_present; static const bool has_denorm_loss = false; static const std::float_round_style round_style = std::round_to_nearest; static const bool is_iec559 = false; static const bool is_bounded = false; static const bool is_modulo = false; static const int digits = 11; static const int digits10 = 3; // according to http://half.sourceforge.net/structstd_1_1numeric__limits_3_01half__float_1_1half_01_4.html static const int max_digits10 = 5; // according to http://half.sourceforge.net/structstd_1_1numeric__limits_3_01half__float_1_1half_01_4.html static const int radix = 2; static const int min_exponent = -13; static const int min_exponent10 = -4; static const int max_exponent = 16; static const int max_exponent10 = 4; static const bool traps = true; static const bool tinyness_before = false; static Eigen::half (min)() { return Eigen::half_impl::raw_uint16_to_half(0x400); } static Eigen::half lowest() { return Eigen::half_impl::raw_uint16_to_half(0xfbff); } static Eigen::half (max)() { return Eigen::half_impl::raw_uint16_to_half(0x7bff); } static Eigen::half epsilon() { return Eigen::half_impl::raw_uint16_to_half(0x0800); } static Eigen::half round_error() { return Eigen::half(0.5); } static Eigen::half infinity() { return Eigen::half_impl::raw_uint16_to_half(0x7c00); } static Eigen::half quiet_NaN() { return Eigen::half_impl::raw_uint16_to_half(0x7e00); } static Eigen::half signaling_NaN() { return Eigen::half_impl::raw_uint16_to_half(0x7d00); } static Eigen::half denorm_min() { return Eigen::half_impl::raw_uint16_to_half(0x1); } }; // If std::numeric_limits is specialized, should also specialize // std::numeric_limits, std::numeric_limits, and // std::numeric_limits // https://stackoverflow.com/a/16519653/ template<> struct numeric_limits : numeric_limits {}; template<> struct numeric_limits : numeric_limits {}; template<> struct numeric_limits : numeric_limits {}; } // end namespace std namespace Eigen { namespace half_impl { #if (defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && \ EIGEN_CUDA_ARCH >= 530) || \ (defined(EIGEN_HAS_HIP_FP16) && defined(HIP_DEVICE_COMPILE)) // Note: We deliberatly do *not* define this to 1 even if we have Arm's native // fp16 type since GPU halfs are rather different from native CPU halfs. // TODO: Rename to something like EIGEN_HAS_NATIVE_GPU_FP16 #define EIGEN_HAS_NATIVE_FP16 #endif // Intrinsics for native fp16 support. Note that on current hardware, // these are no faster than fp32 arithmetic (you need to use the half2 // versions to get the ALU speed increased), but you do save the // conversion steps back and forth. #if defined(EIGEN_HAS_NATIVE_FP16) EIGEN_STRONG_INLINE __device__ half operator + (const half& a, const half& b) { #if defined(EIGEN_CUDA_SDK_VER) && EIGEN_CUDA_SDK_VER >= 90000 return __hadd(::__half(a), ::__half(b)); #else return __hadd(a, b); #endif } EIGEN_STRONG_INLINE __device__ half operator * (const half& a, const half& b) { return __hmul(a, b); } EIGEN_STRONG_INLINE __device__ half operator - (const half& a, const half& b) { return __hsub(a, b); } EIGEN_STRONG_INLINE __device__ half operator / (const half& a, const half& b) { #if defined(EIGEN_CUDA_SDK_VER) && EIGEN_CUDA_SDK_VER >= 90000 return __hdiv(a, b); #else float num = __half2float(a); float denom = __half2float(b); return __float2half(num / denom); #endif } EIGEN_STRONG_INLINE __device__ half operator - (const half& a) { return __hneg(a); } EIGEN_STRONG_INLINE __device__ half& operator += (half& a, const half& b) { a = a + b; return a; } EIGEN_STRONG_INLINE __device__ half& operator *= (half& a, const half& b) { a = a * b; return a; } EIGEN_STRONG_INLINE __device__ half& operator -= (half& a, const half& b) { a = a - b; return a; } EIGEN_STRONG_INLINE __device__ half& operator /= (half& a, const half& b) { a = a / b; return a; } EIGEN_STRONG_INLINE __device__ bool operator == (const half& a, const half& b) { return __heq(a, b); } EIGEN_STRONG_INLINE __device__ bool operator != (const half& a, const half& b) { return __hne(a, b); } EIGEN_STRONG_INLINE __device__ bool operator < (const half& a, const half& b) { return __hlt(a, b); } EIGEN_STRONG_INLINE __device__ bool operator <= (const half& a, const half& b) { return __hle(a, b); } EIGEN_STRONG_INLINE __device__ bool operator > (const half& a, const half& b) { return __hgt(a, b); } EIGEN_STRONG_INLINE __device__ bool operator >= (const half& a, const half& b) { return __hge(a, b); } #endif #if defined(EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC) EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half operator + (const half& a, const half& b) { return half(vaddh_f16(a.x, b.x)); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half operator * (const half& a, const half& b) { return half(vmulh_f16(a.x, b.x)); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half operator - (const half& a, const half& b) { return half(vsubh_f16(a.x, b.x)); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half operator / (const half& a, const half& b) { return half(vdivh_f16(a.x, b.x)); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half operator - (const half& a) { return half(vnegh_f16(a.x)); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half& operator += (half& a, const half& b) { a = half(vaddh_f16(a.x, b.x)); return a; } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half& operator *= (half& a, const half& b) { a = half(vmulh_f16(a.x, b.x)); return a; } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half& operator -= (half& a, const half& b) { a = half(vsubh_f16(a.x, b.x)); return a; } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half& operator /= (half& a, const half& b) { a = half(vdivh_f16(a.x, b.x)); return a; } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator == (const half& a, const half& b) { return vceqh_f16(a.x, b.x); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator != (const half& a, const half& b) { return !vceqh_f16(a.x, b.x); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator < (const half& a, const half& b) { return vclth_f16(a.x, b.x); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator <= (const half& a, const half& b) { return vcleh_f16(a.x, b.x); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator > (const half& a, const half& b) { return vcgth_f16(a.x, b.x); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator >= (const half& a, const half& b) { return vcgeh_f16(a.x, b.x); } // We need to distinguish ‘clang as the CUDA compiler’ from ‘clang as the host compiler, // invoked by NVCC’ (e.g. on MacOS). The former needs to see both host and device implementation // of the functions, while the latter can only deal with one of them. #elif !defined(EIGEN_HAS_NATIVE_FP16) || (EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC) // Emulate support for half floats #if EIGEN_COMP_CLANG && defined(EIGEN_CUDACC) // We need to provide emulated *host-side* FP16 operators for clang. #pragma push_macro("EIGEN_DEVICE_FUNC") #undef EIGEN_DEVICE_FUNC #if defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_HAS_NATIVE_FP16) #define EIGEN_DEVICE_FUNC __host__ #else // both host and device need emulated ops. #define EIGEN_DEVICE_FUNC __host__ __device__ #endif #endif // Definitions for CPUs and older HIP+CUDA, mostly working through conversion // to/from fp32. EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half operator + (const half& a, const half& b) { return half(float(a) + float(b)); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half operator * (const half& a, const half& b) { return half(float(a) * float(b)); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half operator - (const half& a, const half& b) { return half(float(a) - float(b)); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half operator / (const half& a, const half& b) { return half(float(a) / float(b)); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half operator - (const half& a) { half result; result.x = a.x ^ 0x8000; return result; } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half& operator += (half& a, const half& b) { a = half(float(a) + float(b)); return a; } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half& operator *= (half& a, const half& b) { a = half(float(a) * float(b)); return a; } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half& operator -= (half& a, const half& b) { a = half(float(a) - float(b)); return a; } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half& operator /= (half& a, const half& b) { a = half(float(a) / float(b)); return a; } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator == (const half& a, const half& b) { return numext::equal_strict(float(a),float(b)); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator != (const half& a, const half& b) { return numext::not_equal_strict(float(a), float(b)); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator < (const half& a, const half& b) { return float(a) < float(b); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator <= (const half& a, const half& b) { return float(a) <= float(b); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator > (const half& a, const half& b) { return float(a) > float(b); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool operator >= (const half& a, const half& b) { return float(a) >= float(b); } #if defined(__clang__) && defined(__CUDA__) #pragma pop_macro("EIGEN_DEVICE_FUNC") #endif #endif // Emulate support for half floats // Division by an index. Do it in full float precision to avoid accuracy // issues in converting the denominator to half. EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half operator / (const half& a, Index b) { return half(static_cast(a) / static_cast(b)); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half operator++(half& a) { a += half(1); return a; } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half operator--(half& a) { a -= half(1); return a; } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half operator++(half& a, int) { half original_value = a; ++a; return original_value; } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half operator--(half& a, int) { half original_value = a; --a; return original_value; } // Conversion routines, including fallbacks for the host or older CUDA. // Note that newer Intel CPUs (Haswell or newer) have vectorized versions of // these in hardware. If we need more performance on older/other CPUs, they are // also possible to vectorize directly. EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR __half_raw raw_uint16_to_half(numext::uint16_t x) { // We cannot simply do a "return __half_raw(x)" here, because __half_raw is union type // in the hip_fp16 header file, and that will trigger a compile error // On the other hand, having anything but a return statement also triggers a compile error // because this is constexpr function. // Fortunately, since we need to disable EIGEN_CONSTEXPR for GPU anyway, we can get out // of this catch22 by having separate bodies for GPU / non GPU #if defined(EIGEN_HAS_GPU_FP16) __half_raw h; h.x = x; return h; #else return __half_raw(x); #endif } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC numext::uint16_t raw_half_as_uint16(const __half_raw& h) { // HIP/CUDA/Default have a member 'x' of type uint16_t. // For ARM64 native half, the member 'x' is of type __fp16, so we need to bit-cast. // For SYCL, cl::sycl::half is _Float16, so cast directly. #if defined(EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC) return numext::bit_cast(h.x); #elif defined(SYCL_DEVICE_ONLY) return numext::bit_cast(h); #else return h.x; #endif } union float32_bits { unsigned int u; float f; }; EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC __half_raw float_to_half_rtne(float ff) { #if (defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 300) || \ (defined(EIGEN_HAS_HIP_FP16) && defined(EIGEN_HIP_DEVICE_COMPILE)) __half tmp_ff = __float2half(ff); return *(__half_raw*)&tmp_ff; #elif defined(EIGEN_HAS_FP16_C) __half_raw h; h.x = _cvtss_sh(ff, 0); return h; #elif defined(EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC) __half_raw h; h.x = static_cast<__fp16>(ff); return h; #else float32_bits f; f.f = ff; const float32_bits f32infty = { 255 << 23 }; const float32_bits f16max = { (127 + 16) << 23 }; const float32_bits denorm_magic = { ((127 - 15) + (23 - 10) + 1) << 23 }; unsigned int sign_mask = 0x80000000u; __half_raw o; o.x = static_cast(0x0u); unsigned int sign = f.u & sign_mask; f.u ^= sign; // NOTE all the integer compares in this function can be safely // compiled into signed compares since all operands are below // 0x80000000. Important if you want fast straight SSE2 code // (since there's no unsigned PCMPGTD). if (f.u >= f16max.u) { // result is Inf or NaN (all exponent bits set) o.x = (f.u > f32infty.u) ? 0x7e00 : 0x7c00; // NaN->qNaN and Inf->Inf } else { // (De)normalized number or zero if (f.u < (113 << 23)) { // resulting FP16 is subnormal or zero // use a magic value to align our 10 mantissa bits at the bottom of // the float. as long as FP addition is round-to-nearest-even this // just works. f.f += denorm_magic.f; // and one integer subtract of the bias later, we have our final float! o.x = static_cast(f.u - denorm_magic.u); } else { unsigned int mant_odd = (f.u >> 13) & 1; // resulting mantissa is odd // update exponent, rounding bias part 1 // Equivalent to `f.u += ((unsigned int)(15 - 127) << 23) + 0xfff`, but // without arithmetic overflow. f.u += 0xc8000fffU; // rounding bias part 2 f.u += mant_odd; // take the bits! o.x = static_cast(f.u >> 13); } } o.x |= static_cast(sign >> 16); return o; #endif } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC float half_to_float(__half_raw h) { #if (defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 300) || \ (defined(EIGEN_HAS_HIP_FP16) && defined(EIGEN_HIP_DEVICE_COMPILE)) return __half2float(h); #elif defined(EIGEN_HAS_FP16_C) return _cvtsh_ss(h.x); #elif defined(EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC) return static_cast(h.x); #else const float32_bits magic = { 113 << 23 }; const unsigned int shifted_exp = 0x7c00 << 13; // exponent mask after shift float32_bits o; o.u = (h.x & 0x7fff) << 13; // exponent/mantissa bits unsigned int exp = shifted_exp & o.u; // just the exponent o.u += (127 - 15) << 23; // exponent adjust // handle exponent special cases if (exp == shifted_exp) { // Inf/NaN? o.u += (128 - 16) << 23; // extra exp adjust } else if (exp == 0) { // Zero/Denormal? o.u += 1 << 23; // extra exp adjust o.f -= magic.f; // renormalize } o.u |= (h.x & 0x8000) << 16; // sign bit return o.f; #endif } // --- standard functions --- EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool (isinf)(const half& a) { #ifdef EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC return (numext::bit_cast(a.x) & 0x7fff) == 0x7c00; #else return (a.x & 0x7fff) == 0x7c00; #endif } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool (isnan)(const half& a) { #if (defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 530) || \ (defined(EIGEN_HAS_HIP_FP16) && defined(EIGEN_HIP_DEVICE_COMPILE)) return __hisnan(a); #elif defined(EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC) return (numext::bit_cast(a.x) & 0x7fff) > 0x7c00; #else return (a.x & 0x7fff) > 0x7c00; #endif } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool (isfinite)(const half& a) { return !(isinf EIGEN_NOT_A_MACRO (a)) && !(isnan EIGEN_NOT_A_MACRO (a)); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half abs(const half& a) { #if defined(EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC) return half(vabsh_f16(a.x)); #else half result; result.x = a.x & 0x7FFF; return result; #endif } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half exp(const half& a) { #if (EIGEN_CUDA_SDK_VER >= 80000 && defined EIGEN_CUDA_ARCH && EIGEN_CUDA_ARCH >= 530) || \ defined(EIGEN_HIP_DEVICE_COMPILE) return half(hexp(a)); #else return half(::expf(float(a))); #endif } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half expm1(const half& a) { return half(numext::expm1(float(a))); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half log(const half& a) { #if (defined(EIGEN_HAS_CUDA_FP16) && EIGEN_CUDA_SDK_VER >= 80000 && defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 530) || \ (defined(EIGEN_HAS_HIP_FP16) && defined(EIGEN_HIP_DEVICE_COMPILE)) return half(::hlog(a)); #else return half(::logf(float(a))); #endif } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half log1p(const half& a) { return half(numext::log1p(float(a))); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half log10(const half& a) { return half(::log10f(float(a))); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half log2(const half& a) { return half(static_cast(EIGEN_LOG2E) * ::logf(float(a))); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half sqrt(const half& a) { #if (EIGEN_CUDA_SDK_VER >= 80000 && defined EIGEN_CUDA_ARCH && EIGEN_CUDA_ARCH >= 530) || \ defined(EIGEN_HIP_DEVICE_COMPILE) return half(hsqrt(a)); #else return half(::sqrtf(float(a))); #endif } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half pow(const half& a, const half& b) { return half(::powf(float(a), float(b))); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half sin(const half& a) { return half(::sinf(float(a))); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half cos(const half& a) { return half(::cosf(float(a))); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half tan(const half& a) { return half(::tanf(float(a))); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half tanh(const half& a) { return half(::tanhf(float(a))); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half asin(const half& a) { return half(::asinf(float(a))); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half acos(const half& a) { return half(::acosf(float(a))); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half floor(const half& a) { #if (EIGEN_CUDA_SDK_VER >= 80000 && defined EIGEN_CUDA_ARCH && EIGEN_CUDA_ARCH >= 300) || \ defined(EIGEN_HIP_DEVICE_COMPILE) return half(hfloor(a)); #else return half(::floorf(float(a))); #endif } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half ceil(const half& a) { #if (EIGEN_CUDA_SDK_VER >= 80000 && defined EIGEN_CUDA_ARCH && EIGEN_CUDA_ARCH >= 300) || \ defined(EIGEN_HIP_DEVICE_COMPILE) return half(hceil(a)); #else return half(::ceilf(float(a))); #endif } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half rint(const half& a) { return half(::rintf(float(a))); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half round(const half& a) { return half(::roundf(float(a))); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half fmod(const half& a, const half& b) { return half(::fmodf(float(a), float(b))); } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half (min)(const half& a, const half& b) { #if (defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 530) || \ (defined(EIGEN_HAS_HIP_FP16) && defined(EIGEN_HIP_DEVICE_COMPILE)) return __hlt(b, a) ? b : a; #else const float f1 = static_cast(a); const float f2 = static_cast(b); return f2 < f1 ? b : a; #endif } EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half (max)(const half& a, const half& b) { #if (defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 530) || \ (defined(EIGEN_HAS_HIP_FP16) && defined(EIGEN_HIP_DEVICE_COMPILE)) return __hlt(a, b) ? b : a; #else const float f1 = static_cast(a); const float f2 = static_cast(b); return f1 < f2 ? b : a; #endif } #ifndef EIGEN_NO_IO EIGEN_ALWAYS_INLINE std::ostream& operator << (std::ostream& os, const half& v) { os << static_cast(v); return os; } #endif } // end namespace half_impl // import Eigen::half_impl::half into Eigen namespace // using half_impl::half; namespace internal { template<> struct random_default_impl { static inline half run(const half& x, const half& y) { return x + (y-x) * half(float(std::rand()) / float(RAND_MAX)); } static inline half run() { return run(half(-1.f), half(1.f)); } }; template<> struct is_arithmetic { enum { value = true }; }; } // end namespace internal template<> struct NumTraits : GenericNumTraits { enum { IsSigned = true, IsInteger = false, IsComplex = false, RequireInitialization = false }; EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::half epsilon() { return half_impl::raw_uint16_to_half(0x0800); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::half dummy_precision() { return half_impl::raw_uint16_to_half(0x211f); // Eigen::half(1e-2f); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::half highest() { return half_impl::raw_uint16_to_half(0x7bff); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::half lowest() { return half_impl::raw_uint16_to_half(0xfbff); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::half infinity() { return half_impl::raw_uint16_to_half(0x7c00); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static EIGEN_STRONG_INLINE Eigen::half quiet_NaN() { return half_impl::raw_uint16_to_half(0x7e00); } }; } // end namespace Eigen #if defined(EIGEN_HAS_GPU_FP16) || defined(EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC) #pragma pop_macro("EIGEN_CONSTEXPR") #endif namespace Eigen { namespace numext { #if defined(EIGEN_GPU_COMPILE_PHASE) template <> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool(isnan)(const Eigen::half& h) { return (half_impl::isnan)(h); } template <> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool(isinf)(const Eigen::half& h) { return (half_impl::isinf)(h); } template <> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE bool(isfinite)(const Eigen::half& h) { return (half_impl::isfinite)(h); } #endif template <> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Eigen::half bit_cast(const uint16_t& src) { return Eigen::half(Eigen::half_impl::raw_uint16_to_half(src)); } template <> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC uint16_t bit_cast(const Eigen::half& src) { return Eigen::half_impl::raw_half_as_uint16(src); } } // namespace numext } // namespace Eigen // Add the missing shfl* intrinsics. // The __shfl* functions are only valid on HIP or _CUDA_ARCH_ >= 300. // CUDA defines them for (__CUDA_ARCH__ >= 300 || !defined(__CUDA_ARCH__)) // // HIP and CUDA prior to SDK 9.0 define // __shfl, __shfl_up, __shfl_down, __shfl_xor for int and float // CUDA since 9.0 deprecates those and instead defines // __shfl_sync, __shfl_up_sync, __shfl_down_sync, __shfl_xor_sync, // with native support for __half and __nv_bfloat16 // // Note that the following are __device__ - only functions. #if (defined(EIGEN_CUDACC) && (!defined(EIGEN_CUDA_ARCH) || EIGEN_CUDA_ARCH >= 300)) \ || defined(EIGEN_HIPCC) #if defined(EIGEN_HAS_CUDA_FP16) && EIGEN_CUDA_SDK_VER >= 90000 __device__ EIGEN_STRONG_INLINE Eigen::half __shfl_sync(unsigned mask, Eigen::half var, int srcLane, int width=warpSize) { const __half h = var; return static_cast(__shfl_sync(mask, h, srcLane, width)); } __device__ EIGEN_STRONG_INLINE Eigen::half __shfl_up_sync(unsigned mask, Eigen::half var, unsigned int delta, int width=warpSize) { const __half h = var; return static_cast(__shfl_up_sync(mask, h, delta, width)); } __device__ EIGEN_STRONG_INLINE Eigen::half __shfl_down_sync(unsigned mask, Eigen::half var, unsigned int delta, int width=warpSize) { const __half h = var; return static_cast(__shfl_down_sync(mask, h, delta, width)); } __device__ EIGEN_STRONG_INLINE Eigen::half __shfl_xor_sync(unsigned mask, Eigen::half var, int laneMask, int width=warpSize) { const __half h = var; return static_cast(__shfl_xor_sync(mask, h, laneMask, width)); } #else // HIP or CUDA SDK < 9.0 __device__ EIGEN_STRONG_INLINE Eigen::half __shfl(Eigen::half var, int srcLane, int width=warpSize) { const int ivar = static_cast(Eigen::numext::bit_cast(var)); return Eigen::numext::bit_cast(static_cast(__shfl(ivar, srcLane, width))); } __device__ EIGEN_STRONG_INLINE Eigen::half __shfl_up(Eigen::half var, unsigned int delta, int width=warpSize) { const int ivar = static_cast(Eigen::numext::bit_cast(var)); return Eigen::numext::bit_cast(static_cast(__shfl_up(ivar, delta, width))); } __device__ EIGEN_STRONG_INLINE Eigen::half __shfl_down(Eigen::half var, unsigned int delta, int width=warpSize) { const int ivar = static_cast(Eigen::numext::bit_cast(var)); return Eigen::numext::bit_cast(static_cast(__shfl_down(ivar, delta, width))); } __device__ EIGEN_STRONG_INLINE Eigen::half __shfl_xor(Eigen::half var, int laneMask, int width=warpSize) { const int ivar = static_cast(Eigen::numext::bit_cast(var)); return Eigen::numext::bit_cast(static_cast(__shfl_xor(ivar, laneMask, width))); } #endif // HIP vs CUDA #endif // __shfl* // ldg() has an overload for __half_raw, but we also need one for Eigen::half. #if (defined(EIGEN_CUDACC) && (!defined(EIGEN_CUDA_ARCH) || EIGEN_CUDA_ARCH >= 350)) \ || defined(EIGEN_HIPCC) EIGEN_STRONG_INLINE __device__ Eigen::half __ldg(const Eigen::half* ptr) { return Eigen::half_impl::raw_uint16_to_half(__ldg(reinterpret_cast(ptr))); } #endif // __ldg #if EIGEN_HAS_STD_HASH namespace std { template <> struct hash { EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::size_t operator()(const Eigen::half& a) const { return static_cast(Eigen::numext::bit_cast(a)); } }; } // end namespace std #endif #endif // EIGEN_HALF_H RcppEigen/inst/include/Eigen/src/Core/arch/Default/TypeCasting.h0000644000176200001440000000724214562417221024151 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2016 Benoit Steiner // Copyright (C) 2019 Rasmus Munk Larsen // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_GENERIC_TYPE_CASTING_H #define EIGEN_GENERIC_TYPE_CASTING_H namespace Eigen { namespace internal { template<> struct scalar_cast_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_cast_op) typedef Eigen::half result_type; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Eigen::half operator() (const float& a) const { #if (defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 300) || \ (defined(EIGEN_HAS_HIP_FP16) && defined(EIGEN_HIP_DEVICE_COMPILE)) return __float2half(a); #else return Eigen::half(a); #endif } }; template<> struct functor_traits > { enum { Cost = NumTraits::AddCost, PacketAccess = false }; }; template<> struct scalar_cast_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_cast_op) typedef Eigen::half result_type; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Eigen::half operator() (const int& a) const { #if (defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 300) || \ (defined(EIGEN_HAS_HIP_FP16) && defined(EIGEN_HIP_DEVICE_COMPILE)) return __float2half(static_cast(a)); #else return Eigen::half(static_cast(a)); #endif } }; template<> struct functor_traits > { enum { Cost = NumTraits::AddCost, PacketAccess = false }; }; template<> struct scalar_cast_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_cast_op) typedef float result_type; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float operator() (const Eigen::half& a) const { #if (defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && EIGEN_CUDA_ARCH >= 300) || \ (defined(EIGEN_HAS_HIP_FP16) && defined(EIGEN_HIP_DEVICE_COMPILE)) return __half2float(a); #else return static_cast(a); #endif } }; template<> struct functor_traits > { enum { Cost = NumTraits::AddCost, PacketAccess = false }; }; template<> struct scalar_cast_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_cast_op) typedef Eigen::bfloat16 result_type; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Eigen::bfloat16 operator() (const float& a) const { return Eigen::bfloat16(a); } }; template<> struct functor_traits > { enum { Cost = NumTraits::AddCost, PacketAccess = false }; }; template<> struct scalar_cast_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_cast_op) typedef Eigen::bfloat16 result_type; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Eigen::bfloat16 operator() (const int& a) const { return Eigen::bfloat16(static_cast(a)); } }; template<> struct functor_traits > { enum { Cost = NumTraits::AddCost, PacketAccess = false }; }; template<> struct scalar_cast_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_cast_op) typedef float result_type; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float operator() (const Eigen::bfloat16& a) const { return static_cast(a); } }; template<> struct functor_traits > { enum { Cost = NumTraits::AddCost, PacketAccess = false }; }; } } #endif // EIGEN_GENERIC_TYPE_CASTING_H RcppEigen/inst/include/Eigen/src/Core/arch/Default/Settings.h0000644000176200001440000000332214562417221023512 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2010 Gael Guennebaud // Copyright (C) 2006-2008 Benoit Jacob // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. /* All the parameters defined in this file can be specialized in the * architecture specific files, and/or by the user. * More to come... */ #ifndef EIGEN_DEFAULT_SETTINGS_H #define EIGEN_DEFAULT_SETTINGS_H /** Defines the maximal loop size to enable meta unrolling of loops. * Note that the value here is expressed in Eigen's own notion of "number of FLOPS", * it does not correspond to the number of iterations or the number of instructions */ #ifndef EIGEN_UNROLLING_LIMIT #define EIGEN_UNROLLING_LIMIT 110 #endif /** Defines the threshold between a "small" and a "large" matrix. * This threshold is mainly used to select the proper product implementation. */ #ifndef EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD #define EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD 8 #endif /** Defines the maximal width of the blocks used in the triangular product and solver * for vectors (level 2 blas xTRMV and xTRSV). The default is 8. */ #ifndef EIGEN_TUNE_TRIANGULAR_PANEL_WIDTH #define EIGEN_TUNE_TRIANGULAR_PANEL_WIDTH 8 #endif /** Defines the default number of registers available for that architecture. * Currently it must be 8 or 16. Other values will fail. */ #ifndef EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS #define EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS 8 #endif #endif // EIGEN_DEFAULT_SETTINGS_H RcppEigen/inst/include/Eigen/src/Core/arch/Default/GenericPacketMathFunctionsFwd.h0000644000176200001440000000727214562417221027572 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2019 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_ARCH_GENERIC_PACKET_MATH_FUNCTIONS_FWD_H #define EIGEN_ARCH_GENERIC_PACKET_MATH_FUNCTIONS_FWD_H namespace Eigen { namespace internal { // Forward declarations of the generic math functions // implemented in GenericPacketMathFunctions.h // This is needed to workaround a circular dependency. /*************************************************************************** * Some generic implementations to be used by implementors ***************************************************************************/ /** Default implementation of pfrexp. * It is expected to be called by implementers of template<> pfrexp. */ template EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Packet pfrexp_generic(const Packet& a, Packet& exponent); // Extracts the biased exponent value from Packet p, and casts the results to // a floating-point Packet type. Used by pfrexp_generic. Override this if // there is no unpacket_traits::integer_packet. template EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Packet pfrexp_generic_get_biased_exponent(const Packet& p); /** Default implementation of pldexp. * It is expected to be called by implementers of template<> pldexp. */ template EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Packet pldexp_generic(const Packet& a, const Packet& exponent); /** \internal \returns log(x) for single precision float */ template EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet plog_float(const Packet _x); /** \internal \returns log2(x) for single precision float */ template EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet plog2_float(const Packet _x); /** \internal \returns log(x) for single precision float */ template EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet plog_double(const Packet _x); /** \internal \returns log2(x) for single precision float */ template EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet plog2_double(const Packet _x); /** \internal \returns log(1 + x) */ template Packet generic_plog1p(const Packet& x); /** \internal \returns exp(x)-1 */ template Packet generic_expm1(const Packet& x); /** \internal \returns exp(x) for single precision float */ template EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet pexp_float(const Packet _x); /** \internal \returns exp(x) for double precision real numbers */ template EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet pexp_double(const Packet _x); /** \internal \returns sin(x) for single precision float */ template EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet psin_float(const Packet& x); /** \internal \returns cos(x) for single precision float */ template EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet pcos_float(const Packet& x); /** \internal \returns sqrt(x) for complex types */ template EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet psqrt_complex(const Packet& a); template struct ppolevl; } // end namespace internal } // end namespace Eigen #endif // EIGEN_ARCH_GENERIC_PACKET_MATH_FUNCTIONS_FWD_H RcppEigen/inst/include/Eigen/src/Core/arch/Default/ConjHelper.h0000644000176200001440000001220314562417221023741 0ustar liggesusers // This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2017 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_ARCH_CONJ_HELPER_H #define EIGEN_ARCH_CONJ_HELPER_H #define EIGEN_MAKE_CONJ_HELPER_CPLX_REAL(PACKET_CPLX, PACKET_REAL) \ template <> \ struct conj_helper { \ EIGEN_STRONG_INLINE PACKET_CPLX pmadd(const PACKET_REAL& x, \ const PACKET_CPLX& y, \ const PACKET_CPLX& c) const { \ return padd(c, this->pmul(x, y)); \ } \ EIGEN_STRONG_INLINE PACKET_CPLX pmul(const PACKET_REAL& x, \ const PACKET_CPLX& y) const { \ return PACKET_CPLX(Eigen::internal::pmul(x, y.v)); \ } \ }; \ \ template <> \ struct conj_helper { \ EIGEN_STRONG_INLINE PACKET_CPLX pmadd(const PACKET_CPLX& x, \ const PACKET_REAL& y, \ const PACKET_CPLX& c) const { \ return padd(c, this->pmul(x, y)); \ } \ EIGEN_STRONG_INLINE PACKET_CPLX pmul(const PACKET_CPLX& x, \ const PACKET_REAL& y) const { \ return PACKET_CPLX(Eigen::internal::pmul(x.v, y)); \ } \ }; namespace Eigen { namespace internal { template struct conj_if; template<> struct conj_if { template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T operator()(const T& x) const { return numext::conj(x); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T pconj(const T& x) const { return internal::pconj(x); } }; template<> struct conj_if { template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const T& operator()(const T& x) const { return x; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const T& pconj(const T& x) const { return x; } }; // Generic Implementation, assume scalars since the packet-version is // specialized below. template struct conj_helper { typedef typename ScalarBinaryOpTraits::ReturnType ResultType; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ResultType pmadd(const LhsType& x, const RhsType& y, const ResultType& c) const { return this->pmul(x, y) + c; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ResultType pmul(const LhsType& x, const RhsType& y) const { return conj_if()(x) * conj_if()(y); } }; template struct conj_helper { typedef typename ScalarBinaryOpTraits::ReturnType ResultType; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ResultType pmadd(const LhsScalar& x, const RhsScalar& y, const ResultType& c) const { return this->pmul(x, y) + c; } // We save a conjuation by using the identity conj(a)*conj(b) = conj(a*b). EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ResultType pmul(const LhsScalar& x, const RhsScalar& y) const { return numext::conj(x * y); } }; // Implementation with equal type, use packet operations. template struct conj_helper { typedef Packet ResultType; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet pmadd(const Packet& x, const Packet& y, const Packet& c) const { return Eigen::internal::pmadd(conj_if().pconj(x), conj_if().pconj(y), c); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet pmul(const Packet& x, const Packet& y) const { return Eigen::internal::pmul(conj_if().pconj(x), conj_if().pconj(y)); } }; template struct conj_helper { typedef Packet ResultType; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet pmadd(const Packet& x, const Packet& y, const Packet& c) const { return Eigen::internal::pmadd(pconj(x), pconj(y), c); } // We save a conjuation by using the identity conj(a)*conj(b) = conj(a*b). EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet pmul(const Packet& x, const Packet& y) const { return pconj(Eigen::internal::pmul(x, y)); } }; } // namespace internal } // namespace Eigen #endif // EIGEN_ARCH_CONJ_HELPER_H RcppEigen/inst/include/Eigen/src/Core/functors/0000755000176200001440000000000014562417221021103 5ustar liggesusersRcppEigen/inst/include/Eigen/src/Core/functors/StlFunctors.h0000644000176200001440000001160614562417221023546 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2010 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_STL_FUNCTORS_H #define EIGEN_STL_FUNCTORS_H namespace Eigen { // Portable replacements for certain functors. namespace numext { template struct equal_to { typedef bool result_type; EIGEN_DEVICE_FUNC bool operator()(const T& lhs, const T& rhs) const { return lhs == rhs; } }; template struct not_equal_to { typedef bool result_type; EIGEN_DEVICE_FUNC bool operator()(const T& lhs, const T& rhs) const { return lhs != rhs; } }; } namespace internal { // default functor traits for STL functors: template struct functor_traits > { enum { Cost = NumTraits::MulCost, PacketAccess = false }; }; template struct functor_traits > { enum { Cost = NumTraits::MulCost, PacketAccess = false }; }; template struct functor_traits > { enum { Cost = NumTraits::AddCost, PacketAccess = false }; }; template struct functor_traits > { enum { Cost = NumTraits::AddCost, PacketAccess = false }; }; template struct functor_traits > { enum { Cost = NumTraits::AddCost, PacketAccess = false }; }; template struct functor_traits > { enum { Cost = 1, PacketAccess = false }; }; template struct functor_traits > { enum { Cost = 1, PacketAccess = false }; }; template struct functor_traits > { enum { Cost = 1, PacketAccess = false }; }; template struct functor_traits > { enum { Cost = 1, PacketAccess = false }; }; template struct functor_traits > { enum { Cost = 1, PacketAccess = false }; }; template struct functor_traits > { enum { Cost = 1, PacketAccess = false }; }; template struct functor_traits > { enum { Cost = 1, PacketAccess = false }; }; template struct functor_traits > { enum { Cost = 1, PacketAccess = false }; }; template struct functor_traits > : functor_traits > {}; template struct functor_traits > { enum { Cost = 1, PacketAccess = false }; }; template struct functor_traits > : functor_traits > {}; #if (EIGEN_COMP_CXXVER < 11) // std::binder* are deprecated since c++11 and will be removed in c++17 template struct functor_traits > { enum { Cost = functor_traits::Cost, PacketAccess = false }; }; template struct functor_traits > { enum { Cost = functor_traits::Cost, PacketAccess = false }; }; #endif #if (EIGEN_COMP_CXXVER < 17) // std::unary_negate is deprecated since c++17 and will be removed in c++20 template struct functor_traits > { enum { Cost = 1 + functor_traits::Cost, PacketAccess = false }; }; // std::binary_negate is deprecated since c++17 and will be removed in c++20 template struct functor_traits > { enum { Cost = 1 + functor_traits::Cost, PacketAccess = false }; }; #endif #ifdef EIGEN_STDEXT_SUPPORT template struct functor_traits > { enum { Cost = 0, PacketAccess = false }; }; template struct functor_traits > { enum { Cost = 0, PacketAccess = false }; }; template struct functor_traits > > { enum { Cost = 0, PacketAccess = false }; }; template struct functor_traits > > { enum { Cost = 0, PacketAccess = false }; }; template struct functor_traits > { enum { Cost = functor_traits::Cost + functor_traits::Cost, PacketAccess = false }; }; template struct functor_traits > { enum { Cost = functor_traits::Cost + functor_traits::Cost + functor_traits::Cost, PacketAccess = false }; }; #endif // EIGEN_STDEXT_SUPPORT // allow to add new functors and specializations of functor_traits from outside Eigen. // this macro is really needed because functor_traits must be specialized after it is declared but before it is used... #ifdef EIGEN_FUNCTORS_PLUGIN #include EIGEN_FUNCTORS_PLUGIN #endif } // end namespace internal } // end namespace Eigen #endif // EIGEN_STL_FUNCTORS_H RcppEigen/inst/include/Eigen/src/Core/functors/UnaryFunctors.h0000644000176200001440000011632214562417221024103 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2016 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_UNARY_FUNCTORS_H #define EIGEN_UNARY_FUNCTORS_H namespace Eigen { namespace internal { /** \internal * \brief Template functor to compute the opposite of a scalar * * \sa class CwiseUnaryOp, MatrixBase::operator- */ template struct scalar_opposite_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_opposite_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& a) const { return -a; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a) const { return internal::pnegate(a); } }; template struct functor_traits > { enum { Cost = NumTraits::AddCost, PacketAccess = packet_traits::HasNegate }; }; /** \internal * \brief Template functor to compute the absolute value of a scalar * * \sa class CwiseUnaryOp, Cwise::abs */ template struct scalar_abs_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_abs_op) typedef typename NumTraits::Real result_type; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const result_type operator() (const Scalar& a) const { return numext::abs(a); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a) const { return internal::pabs(a); } }; template struct functor_traits > { enum { Cost = NumTraits::AddCost, PacketAccess = packet_traits::HasAbs }; }; /** \internal * \brief Template functor to compute the score of a scalar, to chose a pivot * * \sa class CwiseUnaryOp */ template struct scalar_score_coeff_op : scalar_abs_op { typedef void Score_is_abs; }; template struct functor_traits > : functor_traits > {}; /* Avoid recomputing abs when we know the score and they are the same. Not a true Eigen functor. */ template struct abs_knowing_score { EIGEN_EMPTY_STRUCT_CTOR(abs_knowing_score) typedef typename NumTraits::Real result_type; template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const result_type operator() (const Scalar& a, const Score&) const { return numext::abs(a); } }; template struct abs_knowing_score::Score_is_abs> { EIGEN_EMPTY_STRUCT_CTOR(abs_knowing_score) typedef typename NumTraits::Real result_type; template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const result_type operator() (const Scal&, const result_type& a) const { return a; } }; /** \internal * \brief Template functor to compute the squared absolute value of a scalar * * \sa class CwiseUnaryOp, Cwise::abs2 */ template struct scalar_abs2_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_abs2_op) typedef typename NumTraits::Real result_type; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const result_type operator() (const Scalar& a) const { return numext::abs2(a); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a) const { return internal::pmul(a,a); } }; template struct functor_traits > { enum { Cost = NumTraits::MulCost, PacketAccess = packet_traits::HasAbs2 }; }; /** \internal * \brief Template functor to compute the conjugate of a complex value * * \sa class CwiseUnaryOp, MatrixBase::conjugate() */ template struct scalar_conjugate_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_conjugate_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& a) const { return numext::conj(a); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a) const { return internal::pconj(a); } }; template struct functor_traits > { enum { Cost = 0, // Yes the cost is zero even for complexes because in most cases for which // the cost is used, conjugation turns to be a no-op. Some examples: // cost(a*conj(b)) == cost(a*b) // cost(a+conj(b)) == cost(a+b) // ::HasConj }; }; /** \internal * \brief Template functor to compute the phase angle of a complex * * \sa class CwiseUnaryOp, Cwise::arg */ template struct scalar_arg_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_arg_op) typedef typename NumTraits::Real result_type; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const result_type operator() (const Scalar& a) const { return numext::arg(a); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a) const { return internal::parg(a); } }; template struct functor_traits > { enum { Cost = NumTraits::IsComplex ? 5 * NumTraits::MulCost : NumTraits::AddCost, PacketAccess = packet_traits::HasArg }; }; /** \internal * \brief Template functor to cast a scalar to another type * * \sa class CwiseUnaryOp, MatrixBase::cast() */ template struct scalar_cast_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_cast_op) typedef NewType result_type; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const NewType operator() (const Scalar& a) const { return cast(a); } }; template struct functor_traits > { enum { Cost = is_same::value ? 0 : NumTraits::AddCost, PacketAccess = false }; }; /** \internal * \brief Template functor to arithmetically shift a scalar right by a number of bits * * \sa class CwiseUnaryOp, MatrixBase::shift_right() */ template struct scalar_shift_right_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_shift_right_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& a) const { return a >> N; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a) const { return internal::parithmetic_shift_right(a); } }; template struct functor_traits > { enum { Cost = NumTraits::AddCost, PacketAccess = packet_traits::HasShift }; }; /** \internal * \brief Template functor to logically shift a scalar left by a number of bits * * \sa class CwiseUnaryOp, MatrixBase::shift_left() */ template struct scalar_shift_left_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_shift_left_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& a) const { return a << N; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a) const { return internal::plogical_shift_left(a); } }; template struct functor_traits > { enum { Cost = NumTraits::AddCost, PacketAccess = packet_traits::HasShift }; }; /** \internal * \brief Template functor to extract the real part of a complex * * \sa class CwiseUnaryOp, MatrixBase::real() */ template struct scalar_real_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_real_op) typedef typename NumTraits::Real result_type; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE result_type operator() (const Scalar& a) const { return numext::real(a); } }; template struct functor_traits > { enum { Cost = 0, PacketAccess = false }; }; /** \internal * \brief Template functor to extract the imaginary part of a complex * * \sa class CwiseUnaryOp, MatrixBase::imag() */ template struct scalar_imag_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_imag_op) typedef typename NumTraits::Real result_type; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE result_type operator() (const Scalar& a) const { return numext::imag(a); } }; template struct functor_traits > { enum { Cost = 0, PacketAccess = false }; }; /** \internal * \brief Template functor to extract the real part of a complex as a reference * * \sa class CwiseUnaryOp, MatrixBase::real() */ template struct scalar_real_ref_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_real_ref_op) typedef typename NumTraits::Real result_type; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE result_type& operator() (const Scalar& a) const { return numext::real_ref(*const_cast(&a)); } }; template struct functor_traits > { enum { Cost = 0, PacketAccess = false }; }; /** \internal * \brief Template functor to extract the imaginary part of a complex as a reference * * \sa class CwiseUnaryOp, MatrixBase::imag() */ template struct scalar_imag_ref_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_imag_ref_op) typedef typename NumTraits::Real result_type; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE result_type& operator() (const Scalar& a) const { return numext::imag_ref(*const_cast(&a)); } }; template struct functor_traits > { enum { Cost = 0, PacketAccess = false }; }; /** \internal * * \brief Template functor to compute the exponential of a scalar * * \sa class CwiseUnaryOp, Cwise::exp() */ template struct scalar_exp_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_exp_op) EIGEN_DEVICE_FUNC inline const Scalar operator() (const Scalar& a) const { return numext::exp(a); } template EIGEN_DEVICE_FUNC inline Packet packetOp(const Packet& a) const { return internal::pexp(a); } }; template struct functor_traits > { enum { PacketAccess = packet_traits::HasExp, // The following numbers are based on the AVX implementation. #ifdef EIGEN_VECTORIZE_FMA // Haswell can issue 2 add/mul/madd per cycle. Cost = (sizeof(Scalar) == 4 // float: 8 pmadd, 4 pmul, 2 padd/psub, 6 other ? (8 * NumTraits::AddCost + 6 * NumTraits::MulCost) // double: 7 pmadd, 5 pmul, 3 padd/psub, 1 div, 13 other : (14 * NumTraits::AddCost + 6 * NumTraits::MulCost + scalar_div_cost::HasDiv>::value)) #else Cost = (sizeof(Scalar) == 4 // float: 7 pmadd, 6 pmul, 4 padd/psub, 10 other ? (21 * NumTraits::AddCost + 13 * NumTraits::MulCost) // double: 7 pmadd, 5 pmul, 3 padd/psub, 1 div, 13 other : (23 * NumTraits::AddCost + 12 * NumTraits::MulCost + scalar_div_cost::HasDiv>::value)) #endif }; }; /** \internal * * \brief Template functor to compute the exponential of a scalar - 1. * * \sa class CwiseUnaryOp, ArrayBase::expm1() */ template struct scalar_expm1_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_expm1_op) EIGEN_DEVICE_FUNC inline const Scalar operator() (const Scalar& a) const { return numext::expm1(a); } template EIGEN_DEVICE_FUNC inline Packet packetOp(const Packet& a) const { return internal::pexpm1(a); } }; template struct functor_traits > { enum { PacketAccess = packet_traits::HasExpm1, Cost = functor_traits >::Cost // TODO measure cost of expm1 }; }; /** \internal * * \brief Template functor to compute the logarithm of a scalar * * \sa class CwiseUnaryOp, ArrayBase::log() */ template struct scalar_log_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_log_op) EIGEN_DEVICE_FUNC inline const Scalar operator() (const Scalar& a) const { return numext::log(a); } template EIGEN_DEVICE_FUNC inline Packet packetOp(const Packet& a) const { return internal::plog(a); } }; template struct functor_traits > { enum { PacketAccess = packet_traits::HasLog, Cost = (PacketAccess // The following numbers are based on the AVX implementation. #ifdef EIGEN_VECTORIZE_FMA // 8 pmadd, 6 pmul, 8 padd/psub, 16 other, can issue 2 add/mul/madd per cycle. ? (20 * NumTraits::AddCost + 7 * NumTraits::MulCost) #else // 8 pmadd, 6 pmul, 8 padd/psub, 20 other ? (36 * NumTraits::AddCost + 14 * NumTraits::MulCost) #endif // Measured cost of std::log. : sizeof(Scalar)==4 ? 40 : 85) }; }; /** \internal * * \brief Template functor to compute the logarithm of 1 plus a scalar value * * \sa class CwiseUnaryOp, ArrayBase::log1p() */ template struct scalar_log1p_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_log1p_op) EIGEN_DEVICE_FUNC inline const Scalar operator() (const Scalar& a) const { return numext::log1p(a); } template EIGEN_DEVICE_FUNC inline Packet packetOp(const Packet& a) const { return internal::plog1p(a); } }; template struct functor_traits > { enum { PacketAccess = packet_traits::HasLog1p, Cost = functor_traits >::Cost // TODO measure cost of log1p }; }; /** \internal * * \brief Template functor to compute the base-10 logarithm of a scalar * * \sa class CwiseUnaryOp, Cwise::log10() */ template struct scalar_log10_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_log10_op) EIGEN_DEVICE_FUNC inline const Scalar operator() (const Scalar& a) const { EIGEN_USING_STD(log10) return log10(a); } template EIGEN_DEVICE_FUNC inline Packet packetOp(const Packet& a) const { return internal::plog10(a); } }; template struct functor_traits > { enum { Cost = 5 * NumTraits::MulCost, PacketAccess = packet_traits::HasLog10 }; }; /** \internal * * \brief Template functor to compute the base-2 logarithm of a scalar * * \sa class CwiseUnaryOp, Cwise::log2() */ template struct scalar_log2_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_log2_op) EIGEN_DEVICE_FUNC inline const Scalar operator() (const Scalar& a) const { return Scalar(EIGEN_LOG2E) * numext::log(a); } template EIGEN_DEVICE_FUNC inline Packet packetOp(const Packet& a) const { return internal::plog2(a); } }; template struct functor_traits > { enum { Cost = 5 * NumTraits::MulCost, PacketAccess = packet_traits::HasLog }; }; /** \internal * \brief Template functor to compute the square root of a scalar * \sa class CwiseUnaryOp, Cwise::sqrt() */ template struct scalar_sqrt_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_sqrt_op) EIGEN_DEVICE_FUNC inline const Scalar operator() (const Scalar& a) const { return numext::sqrt(a); } template EIGEN_DEVICE_FUNC inline Packet packetOp(const Packet& a) const { return internal::psqrt(a); } }; template struct functor_traits > { enum { #if EIGEN_FAST_MATH // The following numbers are based on the AVX implementation. Cost = (sizeof(Scalar) == 8 ? 28 // 4 pmul, 1 pmadd, 3 other : (3 * NumTraits::AddCost + 5 * NumTraits::MulCost)), #else // The following numbers are based on min VSQRT throughput on Haswell. Cost = (sizeof(Scalar) == 8 ? 28 : 14), #endif PacketAccess = packet_traits::HasSqrt }; }; // Boolean specialization to eliminate -Wimplicit-conversion-floating-point-to-bool warnings. template<> struct scalar_sqrt_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_sqrt_op) EIGEN_DEPRECATED EIGEN_DEVICE_FUNC inline bool operator() (const bool& a) const { return a; } template EIGEN_DEPRECATED EIGEN_DEVICE_FUNC inline Packet packetOp(const Packet& a) const { return a; } }; template <> struct functor_traits > { enum { Cost = 1, PacketAccess = packet_traits::Vectorizable }; }; /** \internal * \brief Template functor to compute the reciprocal square root of a scalar * \sa class CwiseUnaryOp, Cwise::rsqrt() */ template struct scalar_rsqrt_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_rsqrt_op) EIGEN_DEVICE_FUNC inline const Scalar operator() (const Scalar& a) const { return numext::rsqrt(a); } template EIGEN_DEVICE_FUNC inline Packet packetOp(const Packet& a) const { return internal::prsqrt(a); } }; template struct functor_traits > { enum { Cost = 5 * NumTraits::MulCost, PacketAccess = packet_traits::HasRsqrt }; }; /** \internal * \brief Template functor to compute the cosine of a scalar * \sa class CwiseUnaryOp, ArrayBase::cos() */ template struct scalar_cos_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_cos_op) EIGEN_DEVICE_FUNC inline Scalar operator() (const Scalar& a) const { return numext::cos(a); } template EIGEN_DEVICE_FUNC inline Packet packetOp(const Packet& a) const { return internal::pcos(a); } }; template struct functor_traits > { enum { Cost = 5 * NumTraits::MulCost, PacketAccess = packet_traits::HasCos }; }; /** \internal * \brief Template functor to compute the sine of a scalar * \sa class CwiseUnaryOp, ArrayBase::sin() */ template struct scalar_sin_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_sin_op) EIGEN_DEVICE_FUNC inline const Scalar operator() (const Scalar& a) const { return numext::sin(a); } template EIGEN_DEVICE_FUNC inline Packet packetOp(const Packet& a) const { return internal::psin(a); } }; template struct functor_traits > { enum { Cost = 5 * NumTraits::MulCost, PacketAccess = packet_traits::HasSin }; }; /** \internal * \brief Template functor to compute the tan of a scalar * \sa class CwiseUnaryOp, ArrayBase::tan() */ template struct scalar_tan_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_tan_op) EIGEN_DEVICE_FUNC inline const Scalar operator() (const Scalar& a) const { return numext::tan(a); } template EIGEN_DEVICE_FUNC inline Packet packetOp(const Packet& a) const { return internal::ptan(a); } }; template struct functor_traits > { enum { Cost = 5 * NumTraits::MulCost, PacketAccess = packet_traits::HasTan }; }; /** \internal * \brief Template functor to compute the arc cosine of a scalar * \sa class CwiseUnaryOp, ArrayBase::acos() */ template struct scalar_acos_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_acos_op) EIGEN_DEVICE_FUNC inline const Scalar operator() (const Scalar& a) const { return numext::acos(a); } template EIGEN_DEVICE_FUNC inline Packet packetOp(const Packet& a) const { return internal::pacos(a); } }; template struct functor_traits > { enum { Cost = 5 * NumTraits::MulCost, PacketAccess = packet_traits::HasACos }; }; /** \internal * \brief Template functor to compute the arc sine of a scalar * \sa class CwiseUnaryOp, ArrayBase::asin() */ template struct scalar_asin_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_asin_op) EIGEN_DEVICE_FUNC inline const Scalar operator() (const Scalar& a) const { return numext::asin(a); } template EIGEN_DEVICE_FUNC inline Packet packetOp(const Packet& a) const { return internal::pasin(a); } }; template struct functor_traits > { enum { Cost = 5 * NumTraits::MulCost, PacketAccess = packet_traits::HasASin }; }; /** \internal * \brief Template functor to compute the atan of a scalar * \sa class CwiseUnaryOp, ArrayBase::atan() */ template struct scalar_atan_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_atan_op) EIGEN_DEVICE_FUNC inline const Scalar operator() (const Scalar& a) const { return numext::atan(a); } template EIGEN_DEVICE_FUNC inline Packet packetOp(const Packet& a) const { return internal::patan(a); } }; template struct functor_traits > { enum { Cost = 5 * NumTraits::MulCost, PacketAccess = packet_traits::HasATan }; }; /** \internal * \brief Template functor to compute the tanh of a scalar * \sa class CwiseUnaryOp, ArrayBase::tanh() */ template struct scalar_tanh_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_tanh_op) EIGEN_DEVICE_FUNC inline const Scalar operator()(const Scalar& a) const { return numext::tanh(a); } template EIGEN_DEVICE_FUNC inline Packet packetOp(const Packet& x) const { return ptanh(x); } }; template struct functor_traits > { enum { PacketAccess = packet_traits::HasTanh, Cost = ( (EIGEN_FAST_MATH && is_same::value) // The following numbers are based on the AVX implementation, #ifdef EIGEN_VECTORIZE_FMA // Haswell can issue 2 add/mul/madd per cycle. // 9 pmadd, 2 pmul, 1 div, 2 other ? (2 * NumTraits::AddCost + 6 * NumTraits::MulCost + scalar_div_cost::HasDiv>::value) #else ? (11 * NumTraits::AddCost + 11 * NumTraits::MulCost + scalar_div_cost::HasDiv>::value) #endif // This number assumes a naive implementation of tanh : (6 * NumTraits::AddCost + 3 * NumTraits::MulCost + 2 * scalar_div_cost::HasDiv>::value + functor_traits >::Cost)) }; }; #if EIGEN_HAS_CXX11_MATH /** \internal * \brief Template functor to compute the atanh of a scalar * \sa class CwiseUnaryOp, ArrayBase::atanh() */ template struct scalar_atanh_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_atanh_op) EIGEN_DEVICE_FUNC inline const Scalar operator()(const Scalar& a) const { return numext::atanh(a); } }; template struct functor_traits > { enum { Cost = 5 * NumTraits::MulCost, PacketAccess = false }; }; #endif /** \internal * \brief Template functor to compute the sinh of a scalar * \sa class CwiseUnaryOp, ArrayBase::sinh() */ template struct scalar_sinh_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_sinh_op) EIGEN_DEVICE_FUNC inline const Scalar operator() (const Scalar& a) const { return numext::sinh(a); } template EIGEN_DEVICE_FUNC inline Packet packetOp(const Packet& a) const { return internal::psinh(a); } }; template struct functor_traits > { enum { Cost = 5 * NumTraits::MulCost, PacketAccess = packet_traits::HasSinh }; }; #if EIGEN_HAS_CXX11_MATH /** \internal * \brief Template functor to compute the asinh of a scalar * \sa class CwiseUnaryOp, ArrayBase::asinh() */ template struct scalar_asinh_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_asinh_op) EIGEN_DEVICE_FUNC inline const Scalar operator()(const Scalar& a) const { return numext::asinh(a); } }; template struct functor_traits > { enum { Cost = 5 * NumTraits::MulCost, PacketAccess = false }; }; #endif /** \internal * \brief Template functor to compute the cosh of a scalar * \sa class CwiseUnaryOp, ArrayBase::cosh() */ template struct scalar_cosh_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_cosh_op) EIGEN_DEVICE_FUNC inline const Scalar operator() (const Scalar& a) const { return numext::cosh(a); } template EIGEN_DEVICE_FUNC inline Packet packetOp(const Packet& a) const { return internal::pcosh(a); } }; template struct functor_traits > { enum { Cost = 5 * NumTraits::MulCost, PacketAccess = packet_traits::HasCosh }; }; #if EIGEN_HAS_CXX11_MATH /** \internal * \brief Template functor to compute the acosh of a scalar * \sa class CwiseUnaryOp, ArrayBase::acosh() */ template struct scalar_acosh_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_acosh_op) EIGEN_DEVICE_FUNC inline const Scalar operator()(const Scalar& a) const { return numext::acosh(a); } }; template struct functor_traits > { enum { Cost = 5 * NumTraits::MulCost, PacketAccess = false }; }; #endif /** \internal * \brief Template functor to compute the inverse of a scalar * \sa class CwiseUnaryOp, Cwise::inverse() */ template struct scalar_inverse_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_inverse_op) EIGEN_DEVICE_FUNC inline Scalar operator() (const Scalar& a) const { return Scalar(1)/a; } template EIGEN_DEVICE_FUNC inline const Packet packetOp(const Packet& a) const { return internal::pdiv(pset1(Scalar(1)),a); } }; template struct functor_traits > { enum { PacketAccess = packet_traits::HasDiv, Cost = scalar_div_cost::value }; }; /** \internal * \brief Template functor to compute the square of a scalar * \sa class CwiseUnaryOp, Cwise::square() */ template struct scalar_square_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_square_op) EIGEN_DEVICE_FUNC inline Scalar operator() (const Scalar& a) const { return a*a; } template EIGEN_DEVICE_FUNC inline const Packet packetOp(const Packet& a) const { return internal::pmul(a,a); } }; template struct functor_traits > { enum { Cost = NumTraits::MulCost, PacketAccess = packet_traits::HasMul }; }; // Boolean specialization to avoid -Wint-in-bool-context warnings on GCC. template<> struct scalar_square_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_square_op) EIGEN_DEPRECATED EIGEN_DEVICE_FUNC inline bool operator() (const bool& a) const { return a; } template EIGEN_DEPRECATED EIGEN_DEVICE_FUNC inline const Packet packetOp(const Packet& a) const { return a; } }; template<> struct functor_traits > { enum { Cost = 0, PacketAccess = packet_traits::Vectorizable }; }; /** \internal * \brief Template functor to compute the cube of a scalar * \sa class CwiseUnaryOp, Cwise::cube() */ template struct scalar_cube_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_cube_op) EIGEN_DEVICE_FUNC inline Scalar operator() (const Scalar& a) const { return a*a*a; } template EIGEN_DEVICE_FUNC inline const Packet packetOp(const Packet& a) const { return internal::pmul(a,pmul(a,a)); } }; template struct functor_traits > { enum { Cost = 2*NumTraits::MulCost, PacketAccess = packet_traits::HasMul }; }; // Boolean specialization to avoid -Wint-in-bool-context warnings on GCC. template<> struct scalar_cube_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_cube_op) EIGEN_DEPRECATED EIGEN_DEVICE_FUNC inline bool operator() (const bool& a) const { return a; } template EIGEN_DEPRECATED EIGEN_DEVICE_FUNC inline const Packet packetOp(const Packet& a) const { return a; } }; template<> struct functor_traits > { enum { Cost = 0, PacketAccess = packet_traits::Vectorizable }; }; /** \internal * \brief Template functor to compute the rounded value of a scalar * \sa class CwiseUnaryOp, ArrayBase::round() */ template struct scalar_round_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_round_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& a) const { return numext::round(a); } template EIGEN_DEVICE_FUNC inline Packet packetOp(const Packet& a) const { return internal::pround(a); } }; template struct functor_traits > { enum { Cost = NumTraits::MulCost, PacketAccess = packet_traits::HasRound }; }; /** \internal * \brief Template functor to compute the floor of a scalar * \sa class CwiseUnaryOp, ArrayBase::floor() */ template struct scalar_floor_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_floor_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& a) const { return numext::floor(a); } template EIGEN_DEVICE_FUNC inline Packet packetOp(const Packet& a) const { return internal::pfloor(a); } }; template struct functor_traits > { enum { Cost = NumTraits::MulCost, PacketAccess = packet_traits::HasFloor }; }; /** \internal * \brief Template functor to compute the rounded (with current rounding mode) value of a scalar * \sa class CwiseUnaryOp, ArrayBase::rint() */ template struct scalar_rint_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_rint_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& a) const { return numext::rint(a); } template EIGEN_DEVICE_FUNC inline Packet packetOp(const Packet& a) const { return internal::print(a); } }; template struct functor_traits > { enum { Cost = NumTraits::MulCost, PacketAccess = packet_traits::HasRint }; }; /** \internal * \brief Template functor to compute the ceil of a scalar * \sa class CwiseUnaryOp, ArrayBase::ceil() */ template struct scalar_ceil_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_ceil_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& a) const { return numext::ceil(a); } template EIGEN_DEVICE_FUNC inline Packet packetOp(const Packet& a) const { return internal::pceil(a); } }; template struct functor_traits > { enum { Cost = NumTraits::MulCost, PacketAccess = packet_traits::HasCeil }; }; /** \internal * \brief Template functor to compute whether a scalar is NaN * \sa class CwiseUnaryOp, ArrayBase::isnan() */ template struct scalar_isnan_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_isnan_op) typedef bool result_type; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE result_type operator() (const Scalar& a) const { #if defined(SYCL_DEVICE_ONLY) return numext::isnan(a); #else return (numext::isnan)(a); #endif } }; template struct functor_traits > { enum { Cost = NumTraits::MulCost, PacketAccess = false }; }; /** \internal * \brief Template functor to check whether a scalar is +/-inf * \sa class CwiseUnaryOp, ArrayBase::isinf() */ template struct scalar_isinf_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_isinf_op) typedef bool result_type; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE result_type operator() (const Scalar& a) const { #if defined(SYCL_DEVICE_ONLY) return numext::isinf(a); #else return (numext::isinf)(a); #endif } }; template struct functor_traits > { enum { Cost = NumTraits::MulCost, PacketAccess = false }; }; /** \internal * \brief Template functor to check whether a scalar has a finite value * \sa class CwiseUnaryOp, ArrayBase::isfinite() */ template struct scalar_isfinite_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_isfinite_op) typedef bool result_type; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE result_type operator() (const Scalar& a) const { #if defined(SYCL_DEVICE_ONLY) return numext::isfinite(a); #else return (numext::isfinite)(a); #endif } }; template struct functor_traits > { enum { Cost = NumTraits::MulCost, PacketAccess = false }; }; /** \internal * \brief Template functor to compute the logical not of a boolean * * \sa class CwiseUnaryOp, ArrayBase::operator! */ template struct scalar_boolean_not_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_boolean_not_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool operator() (const bool& a) const { return !a; } }; template struct functor_traits > { enum { Cost = NumTraits::AddCost, PacketAccess = false }; }; /** \internal * \brief Template functor to compute the signum of a scalar * \sa class CwiseUnaryOp, Cwise::sign() */ template::IsComplex!=0), bool is_integer=(NumTraits::IsInteger!=0) > struct scalar_sign_op; template struct scalar_sign_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_sign_op) EIGEN_DEVICE_FUNC inline const Scalar operator() (const Scalar& a) const { return Scalar( (a>Scalar(0)) - (a //EIGEN_DEVICE_FUNC inline Packet packetOp(const Packet& a) const { return internal::psign(a); } }; template struct scalar_sign_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_sign_op) EIGEN_DEVICE_FUNC inline const Scalar operator() (const Scalar& a) const { return (numext::isnan)(a) ? a : Scalar( (a>Scalar(0)) - (a //EIGEN_DEVICE_FUNC inline Packet packetOp(const Packet& a) const { return internal::psign(a); } }; template struct scalar_sign_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_sign_op) EIGEN_DEVICE_FUNC inline const Scalar operator() (const Scalar& a) const { typedef typename NumTraits::Real real_type; real_type aa = numext::abs(a); if (aa==real_type(0)) return Scalar(0); aa = real_type(1)/aa; return Scalar(a.real()*aa, a.imag()*aa ); } //TODO //template //EIGEN_DEVICE_FUNC inline Packet packetOp(const Packet& a) const { return internal::psign(a); } }; template struct functor_traits > { enum { Cost = NumTraits::IsComplex ? ( 8*NumTraits::MulCost ) // roughly : ( 3*NumTraits::AddCost), PacketAccess = packet_traits::HasSign }; }; /** \internal * \brief Template functor to compute the logistic function of a scalar * \sa class CwiseUnaryOp, ArrayBase::logistic() */ template struct scalar_logistic_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_logistic_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T operator()(const T& x) const { return packetOp(x); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet packetOp(const Packet& x) const { const Packet one = pset1(T(1)); return pdiv(one, padd(one, pexp(pnegate(x)))); } }; #ifndef EIGEN_GPU_COMPILE_PHASE /** \internal * \brief Template specialization of the logistic function for float. * * Uses just a 9/10-degree rational interpolant which * interpolates 1/(1+exp(-x)) - 0.5 up to a couple of ulps in the range * [-9, 18]. Below -9 we use the more accurate approximation * 1/(1+exp(-x)) ~= exp(x), and above 18 the logistic function is 1 withing * one ulp. The shifted logistic is interpolated because it was easier to * make the fit converge. * */ template <> struct scalar_logistic_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_logistic_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float operator()(const float& x) const { return packetOp(x); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet packetOp(const Packet& _x) const { const Packet cutoff_lower = pset1(-9.f); const Packet lt_mask = pcmp_lt(_x, cutoff_lower); const bool any_small = predux_any(lt_mask); // The upper cut-off is the smallest x for which the rational approximation evaluates to 1. // Choosing this value saves us a few instructions clamping the results at the end. #ifdef EIGEN_VECTORIZE_FMA const Packet cutoff_upper = pset1(15.7243833541870117f); #else const Packet cutoff_upper = pset1(15.6437711715698242f); #endif const Packet x = pmin(_x, cutoff_upper); // The monomial coefficients of the numerator polynomial (odd). const Packet alpha_1 = pset1(2.48287947061529e-01f); const Packet alpha_3 = pset1(8.51377133304701e-03f); const Packet alpha_5 = pset1(6.08574864600143e-05f); const Packet alpha_7 = pset1(1.15627324459942e-07f); const Packet alpha_9 = pset1(4.37031012579801e-11f); // The monomial coefficients of the denominator polynomial (even). const Packet beta_0 = pset1(9.93151921023180e-01f); const Packet beta_2 = pset1(1.16817656904453e-01f); const Packet beta_4 = pset1(1.70198817374094e-03f); const Packet beta_6 = pset1(6.29106785017040e-06f); const Packet beta_8 = pset1(5.76102136993427e-09f); const Packet beta_10 = pset1(6.10247389755681e-13f); // Since the polynomials are odd/even, we need x^2. const Packet x2 = pmul(x, x); // Evaluate the numerator polynomial p. Packet p = pmadd(x2, alpha_9, alpha_7); p = pmadd(x2, p, alpha_5); p = pmadd(x2, p, alpha_3); p = pmadd(x2, p, alpha_1); p = pmul(x, p); // Evaluate the denominator polynomial q. Packet q = pmadd(x2, beta_10, beta_8); q = pmadd(x2, q, beta_6); q = pmadd(x2, q, beta_4); q = pmadd(x2, q, beta_2); q = pmadd(x2, q, beta_0); // Divide the numerator by the denominator and shift it up. const Packet logistic = padd(pdiv(p, q), pset1(0.5f)); if (EIGEN_PREDICT_FALSE(any_small)) { const Packet exponential = pexp(_x); return pselect(lt_mask, exponential, logistic); } else { return logistic; } } }; #endif // #ifndef EIGEN_GPU_COMPILE_PHASE template struct functor_traits > { enum { // The cost estimate for float here here is for the common(?) case where // all arguments are greater than -9. Cost = scalar_div_cost::HasDiv>::value + (internal::is_same::value ? NumTraits::AddCost * 15 + NumTraits::MulCost * 11 : NumTraits::AddCost * 2 + functor_traits >::Cost), PacketAccess = packet_traits::HasAdd && packet_traits::HasDiv && (internal::is_same::value ? packet_traits::HasMul && packet_traits::HasMax && packet_traits::HasMin : packet_traits::HasNegate && packet_traits::HasExp) }; }; } // end namespace internal } // end namespace Eigen #endif // EIGEN_FUNCTORS_H RcppEigen/inst/include/Eigen/src/Core/functors/TernaryFunctors.h0000644000176200001440000000113714107270226024423 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2016 Eugene Brevdo // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_TERNARY_FUNCTORS_H #define EIGEN_TERNARY_FUNCTORS_H namespace Eigen { namespace internal { //---------- associative ternary functors ---------- } // end namespace internal } // end namespace Eigen #endif // EIGEN_TERNARY_FUNCTORS_H RcppEigen/inst/include/Eigen/src/Core/functors/AssignmentFunctors.h0000644000176200001440000001503614562417221025115 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2010 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_ASSIGNMENT_FUNCTORS_H #define EIGEN_ASSIGNMENT_FUNCTORS_H namespace Eigen { namespace internal { /** \internal * \brief Template functor for scalar/packet assignment * */ template struct assign_op { EIGEN_EMPTY_STRUCT_CTOR(assign_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void assignCoeff(DstScalar& a, const SrcScalar& b) const { a = b; } template EIGEN_STRONG_INLINE void assignPacket(DstScalar* a, const Packet& b) const { internal::pstoret(a,b); } }; // Empty overload for void type (used by PermutationMatrix) template struct assign_op {}; template struct functor_traits > { enum { Cost = NumTraits::ReadCost, PacketAccess = is_same::value && packet_traits::Vectorizable && packet_traits::Vectorizable }; }; /** \internal * \brief Template functor for scalar/packet assignment with addition * */ template struct add_assign_op { EIGEN_EMPTY_STRUCT_CTOR(add_assign_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void assignCoeff(DstScalar& a, const SrcScalar& b) const { a += b; } template EIGEN_STRONG_INLINE void assignPacket(DstScalar* a, const Packet& b) const { internal::pstoret(a,internal::padd(internal::ploadt(a),b)); } }; template struct functor_traits > { enum { Cost = NumTraits::ReadCost + NumTraits::AddCost, PacketAccess = is_same::value && packet_traits::HasAdd }; }; /** \internal * \brief Template functor for scalar/packet assignment with subtraction * */ template struct sub_assign_op { EIGEN_EMPTY_STRUCT_CTOR(sub_assign_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void assignCoeff(DstScalar& a, const SrcScalar& b) const { a -= b; } template EIGEN_STRONG_INLINE void assignPacket(DstScalar* a, const Packet& b) const { internal::pstoret(a,internal::psub(internal::ploadt(a),b)); } }; template struct functor_traits > { enum { Cost = NumTraits::ReadCost + NumTraits::AddCost, PacketAccess = is_same::value && packet_traits::HasSub }; }; /** \internal * \brief Template functor for scalar/packet assignment with multiplication * */ template struct mul_assign_op { EIGEN_EMPTY_STRUCT_CTOR(mul_assign_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void assignCoeff(DstScalar& a, const SrcScalar& b) const { a *= b; } template EIGEN_STRONG_INLINE void assignPacket(DstScalar* a, const Packet& b) const { internal::pstoret(a,internal::pmul(internal::ploadt(a),b)); } }; template struct functor_traits > { enum { Cost = NumTraits::ReadCost + NumTraits::MulCost, PacketAccess = is_same::value && packet_traits::HasMul }; }; /** \internal * \brief Template functor for scalar/packet assignment with diviving * */ template struct div_assign_op { EIGEN_EMPTY_STRUCT_CTOR(div_assign_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void assignCoeff(DstScalar& a, const SrcScalar& b) const { a /= b; } template EIGEN_STRONG_INLINE void assignPacket(DstScalar* a, const Packet& b) const { internal::pstoret(a,internal::pdiv(internal::ploadt(a),b)); } }; template struct functor_traits > { enum { Cost = NumTraits::ReadCost + NumTraits::MulCost, PacketAccess = is_same::value && packet_traits::HasDiv }; }; /** \internal * \brief Template functor for scalar/packet assignment with swapping * * It works as follow. For a non-vectorized evaluation loop, we have: * for(i) func(A.coeffRef(i), B.coeff(i)); * where B is a SwapWrapper expression. The trick is to make SwapWrapper::coeff behaves like a non-const coeffRef. * Actually, SwapWrapper might not even be needed since even if B is a plain expression, since it has to be writable * B.coeff already returns a const reference to the underlying scalar value. * * The case of a vectorized loop is more tricky: * for(i,j) func.assignPacket(&A.coeffRef(i,j), B.packet(i,j)); * Here, B must be a SwapWrapper whose packet function actually returns a proxy object holding a Scalar*, * the actual alignment and Packet type. * */ template struct swap_assign_op { EIGEN_EMPTY_STRUCT_CTOR(swap_assign_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void assignCoeff(Scalar& a, const Scalar& b) const { #ifdef EIGEN_GPUCC // FIXME is there some kind of cuda::swap? Scalar t=b; const_cast(b)=a; a=t; #else using std::swap; swap(a,const_cast(b)); #endif } }; template struct functor_traits > { enum { Cost = 3 * NumTraits::ReadCost, PacketAccess = #if defined(EIGEN_VECTORIZE_AVX) && EIGEN_COMP_CLANG && (EIGEN_COMP_CLANG<800 || defined(__apple_build_version__)) // This is a partial workaround for a bug in clang generating bad code // when mixing 256/512 bits loads and 128 bits moves. // See http://eigen.tuxfamily.org/bz/show_bug.cgi?id=1684 // https://bugs.llvm.org/show_bug.cgi?id=40815 0 #else packet_traits::Vectorizable #endif }; }; } // namespace internal } // namespace Eigen #endif // EIGEN_ASSIGNMENT_FUNCTORS_H RcppEigen/inst/include/Eigen/src/Core/functors/NullaryFunctors.h0000644000176200001440000002021614562417221024427 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2016 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_NULLARY_FUNCTORS_H #define EIGEN_NULLARY_FUNCTORS_H namespace Eigen { namespace internal { template struct scalar_constant_op { EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE scalar_constant_op(const scalar_constant_op& other) : m_other(other.m_other) { } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE scalar_constant_op(const Scalar& other) : m_other(other) { } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() () const { return m_other; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const PacketType packetOp() const { return internal::pset1(m_other); } const Scalar m_other; }; template struct functor_traits > { enum { Cost = 0 /* as the constant value should be loaded in register only once for the whole expression */, PacketAccess = packet_traits::Vectorizable, IsRepeatable = true }; }; template struct scalar_identity_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_identity_op) template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (IndexType row, IndexType col) const { return row==col ? Scalar(1) : Scalar(0); } }; template struct functor_traits > { enum { Cost = NumTraits::AddCost, PacketAccess = false, IsRepeatable = true }; }; template struct linspaced_op_impl; template struct linspaced_op_impl { typedef typename NumTraits::Real RealScalar; EIGEN_DEVICE_FUNC linspaced_op_impl(const Scalar& low, const Scalar& high, Index num_steps) : m_low(low), m_high(high), m_size1(num_steps==1 ? 1 : num_steps-1), m_step(num_steps==1 ? Scalar() : Scalar((high-low)/RealScalar(num_steps-1))), m_flip(numext::abs(high) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (IndexType i) const { if(m_flip) return (i==0)? m_low : Scalar(m_high - RealScalar(m_size1-i)*m_step); else return (i==m_size1)? m_high : Scalar(m_low + RealScalar(i)*m_step); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(IndexType i) const { // Principle: // [low, ..., low] + ( [step, ..., step] * ( [i, ..., i] + [0, ..., size] ) ) if(m_flip) { Packet pi = plset(Scalar(i-m_size1)); Packet res = padd(pset1(m_high), pmul(pset1(m_step), pi)); if (EIGEN_PREDICT_TRUE(i != 0)) return res; Packet mask = pcmp_lt(pset1(0), plset(0)); return pselect(mask, res, pset1(m_low)); } else { Packet pi = plset(Scalar(i)); Packet res = padd(pset1(m_low), pmul(pset1(m_step), pi)); if(EIGEN_PREDICT_TRUE(i != m_size1-unpacket_traits::size+1)) return res; Packet mask = pcmp_lt(plset(0), pset1(unpacket_traits::size-1)); return pselect(mask, res, pset1(m_high)); } } const Scalar m_low; const Scalar m_high; const Index m_size1; const Scalar m_step; const bool m_flip; }; template struct linspaced_op_impl { EIGEN_DEVICE_FUNC linspaced_op_impl(const Scalar& low, const Scalar& high, Index num_steps) : m_low(low), m_multiplier((high-low)/convert_index(num_steps<=1 ? 1 : num_steps-1)), m_divisor(convert_index((high>=low?num_steps:-num_steps)+(high-low))/((numext::abs(high-low)+1)==0?1:(numext::abs(high-low)+1))), m_use_divisor(num_steps>1 && (numext::abs(high-low)+1) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (IndexType i) const { if(m_use_divisor) return m_low + convert_index(i)/m_divisor; else return m_low + convert_index(i)*m_multiplier; } const Scalar m_low; const Scalar m_multiplier; const Scalar m_divisor; const bool m_use_divisor; }; // ----- Linspace functor ---------------------------------------------------------------- // Forward declaration (we default to random access which does not really give // us a speed gain when using packet access but it allows to use the functor in // nested expressions). template struct linspaced_op; template struct functor_traits< linspaced_op > { enum { Cost = 1, PacketAccess = (!NumTraits::IsInteger) && packet_traits::HasSetLinear && packet_traits::HasBlend, /*&& ((!NumTraits::IsInteger) || packet_traits::HasDiv),*/ // <- vectorization for integer is currently disabled IsRepeatable = true }; }; template struct linspaced_op { EIGEN_DEVICE_FUNC linspaced_op(const Scalar& low, const Scalar& high, Index num_steps) : impl((num_steps==1 ? high : low),high,num_steps) {} template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (IndexType i) const { return impl(i); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(IndexType i) const { return impl.template packetOp(i); } // This proxy object handles the actual required temporaries and the different // implementations (integer vs. floating point). const linspaced_op_impl::IsInteger> impl; }; // Linear access is automatically determined from the operator() prototypes available for the given functor. // If it exposes an operator()(i,j), then we assume the i and j coefficients are required independently // and linear access is not possible. In all other cases, linear access is enabled. // Users should not have to deal with this structure. template struct functor_has_linear_access { enum { ret = !has_binary_operator::value }; }; // For unreliable compilers, let's specialize the has_*ary_operator // helpers so that at least built-in nullary functors work fine. #if !( (EIGEN_COMP_MSVC>1600) || (EIGEN_GNUC_AT_LEAST(4,8)) || (EIGEN_COMP_ICC>=1600)) template struct has_nullary_operator,IndexType> { enum { value = 1}; }; template struct has_unary_operator,IndexType> { enum { value = 0}; }; template struct has_binary_operator,IndexType> { enum { value = 0}; }; template struct has_nullary_operator,IndexType> { enum { value = 0}; }; template struct has_unary_operator,IndexType> { enum { value = 0}; }; template struct has_binary_operator,IndexType> { enum { value = 1}; }; template struct has_nullary_operator,IndexType> { enum { value = 0}; }; template struct has_unary_operator,IndexType> { enum { value = 1}; }; template struct has_binary_operator,IndexType> { enum { value = 0}; }; template struct has_nullary_operator,IndexType> { enum { value = 1}; }; template struct has_unary_operator,IndexType> { enum { value = 0}; }; template struct has_binary_operator,IndexType> { enum { value = 0}; }; #endif } // end namespace internal } // end namespace Eigen #endif // EIGEN_NULLARY_FUNCTORS_H RcppEigen/inst/include/Eigen/src/Core/functors/BinaryFunctors.h0000644000176200001440000005067114562417221024235 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2010 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_BINARY_FUNCTORS_H #define EIGEN_BINARY_FUNCTORS_H namespace Eigen { namespace internal { //---------- associative binary functors ---------- template struct binary_op_base { typedef Arg1 first_argument_type; typedef Arg2 second_argument_type; }; /** \internal * \brief Template functor to compute the sum of two scalars * * \sa class CwiseBinaryOp, MatrixBase::operator+, class VectorwiseOp, DenseBase::sum() */ template struct scalar_sum_op : binary_op_base { typedef typename ScalarBinaryOpTraits::ReturnType result_type; #ifndef EIGEN_SCALAR_BINARY_OP_PLUGIN EIGEN_EMPTY_STRUCT_CTOR(scalar_sum_op) #else scalar_sum_op() { EIGEN_SCALAR_BINARY_OP_PLUGIN } #endif EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE result_type operator() (const LhsScalar& a, const RhsScalar& b) const { return a + b; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet packetOp(const Packet& a, const Packet& b) const { return internal::padd(a,b); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE result_type predux(const Packet& a) const { return internal::predux(a); } }; template struct functor_traits > { enum { Cost = (int(NumTraits::AddCost) + int(NumTraits::AddCost)) / 2, // rough estimate! PacketAccess = is_same::value && packet_traits::HasAdd && packet_traits::HasAdd // TODO vectorize mixed sum }; }; template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool scalar_sum_op::operator() (const bool& a, const bool& b) const { return a || b; } /** \internal * \brief Template functor to compute the product of two scalars * * \sa class CwiseBinaryOp, Cwise::operator*(), class VectorwiseOp, MatrixBase::redux() */ template struct scalar_product_op : binary_op_base { typedef typename ScalarBinaryOpTraits::ReturnType result_type; #ifndef EIGEN_SCALAR_BINARY_OP_PLUGIN EIGEN_EMPTY_STRUCT_CTOR(scalar_product_op) #else scalar_product_op() { EIGEN_SCALAR_BINARY_OP_PLUGIN } #endif EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE result_type operator() (const LhsScalar& a, const RhsScalar& b) const { return a * b; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet packetOp(const Packet& a, const Packet& b) const { return internal::pmul(a,b); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE result_type predux(const Packet& a) const { return internal::predux_mul(a); } }; template struct functor_traits > { enum { Cost = (int(NumTraits::MulCost) + int(NumTraits::MulCost))/2, // rough estimate! PacketAccess = is_same::value && packet_traits::HasMul && packet_traits::HasMul // TODO vectorize mixed product }; }; template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool scalar_product_op::operator() (const bool& a, const bool& b) const { return a && b; } /** \internal * \brief Template functor to compute the conjugate product of two scalars * * This is a short cut for conj(x) * y which is needed for optimization purpose; in Eigen2 support mode, this becomes x * conj(y) */ template struct scalar_conj_product_op : binary_op_base { enum { Conj = NumTraits::IsComplex }; typedef typename ScalarBinaryOpTraits::ReturnType result_type; EIGEN_EMPTY_STRUCT_CTOR(scalar_conj_product_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE result_type operator() (const LhsScalar& a, const RhsScalar& b) const { return conj_helper().pmul(a,b); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet packetOp(const Packet& a, const Packet& b) const { return conj_helper().pmul(a,b); } }; template struct functor_traits > { enum { Cost = NumTraits::MulCost, PacketAccess = internal::is_same::value && packet_traits::HasMul }; }; /** \internal * \brief Template functor to compute the min of two scalars * * \sa class CwiseBinaryOp, MatrixBase::cwiseMin, class VectorwiseOp, MatrixBase::minCoeff() */ template struct scalar_min_op : binary_op_base { typedef typename ScalarBinaryOpTraits::ReturnType result_type; EIGEN_EMPTY_STRUCT_CTOR(scalar_min_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE result_type operator() (const LhsScalar& a, const RhsScalar& b) const { return internal::pmin(a, b); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet packetOp(const Packet& a, const Packet& b) const { return internal::pmin(a,b); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE result_type predux(const Packet& a) const { return internal::predux_min(a); } }; template struct functor_traits > { enum { Cost = (NumTraits::AddCost+NumTraits::AddCost)/2, PacketAccess = internal::is_same::value && packet_traits::HasMin }; }; /** \internal * \brief Template functor to compute the max of two scalars * * \sa class CwiseBinaryOp, MatrixBase::cwiseMax, class VectorwiseOp, MatrixBase::maxCoeff() */ template struct scalar_max_op : binary_op_base { typedef typename ScalarBinaryOpTraits::ReturnType result_type; EIGEN_EMPTY_STRUCT_CTOR(scalar_max_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE result_type operator() (const LhsScalar& a, const RhsScalar& b) const { return internal::pmax(a,b); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet packetOp(const Packet& a, const Packet& b) const { return internal::pmax(a,b); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE result_type predux(const Packet& a) const { return internal::predux_max(a); } }; template struct functor_traits > { enum { Cost = (NumTraits::AddCost+NumTraits::AddCost)/2, PacketAccess = internal::is_same::value && packet_traits::HasMax }; }; /** \internal * \brief Template functors for comparison of two scalars * \todo Implement packet-comparisons */ template struct scalar_cmp_op; template struct functor_traits > { enum { Cost = (NumTraits::AddCost+NumTraits::AddCost)/2, PacketAccess = false }; }; template struct result_of(LhsScalar,RhsScalar)> { typedef bool type; }; template struct scalar_cmp_op : binary_op_base { typedef bool result_type; EIGEN_EMPTY_STRUCT_CTOR(scalar_cmp_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool operator()(const LhsScalar& a, const RhsScalar& b) const {return a==b;} }; template struct scalar_cmp_op : binary_op_base { typedef bool result_type; EIGEN_EMPTY_STRUCT_CTOR(scalar_cmp_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool operator()(const LhsScalar& a, const RhsScalar& b) const {return a struct scalar_cmp_op : binary_op_base { typedef bool result_type; EIGEN_EMPTY_STRUCT_CTOR(scalar_cmp_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool operator()(const LhsScalar& a, const RhsScalar& b) const {return a<=b;} }; template struct scalar_cmp_op : binary_op_base { typedef bool result_type; EIGEN_EMPTY_STRUCT_CTOR(scalar_cmp_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool operator()(const LhsScalar& a, const RhsScalar& b) const {return a>b;} }; template struct scalar_cmp_op : binary_op_base { typedef bool result_type; EIGEN_EMPTY_STRUCT_CTOR(scalar_cmp_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool operator()(const LhsScalar& a, const RhsScalar& b) const {return a>=b;} }; template struct scalar_cmp_op : binary_op_base { typedef bool result_type; EIGEN_EMPTY_STRUCT_CTOR(scalar_cmp_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool operator()(const LhsScalar& a, const RhsScalar& b) const {return !(a<=b || b<=a);} }; template struct scalar_cmp_op : binary_op_base { typedef bool result_type; EIGEN_EMPTY_STRUCT_CTOR(scalar_cmp_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool operator()(const LhsScalar& a, const RhsScalar& b) const {return a!=b;} }; /** \internal * \brief Template functor to compute the hypot of two \b positive \b and \b real scalars * * \sa MatrixBase::stableNorm(), class Redux */ template struct scalar_hypot_op : binary_op_base { EIGEN_EMPTY_STRUCT_CTOR(scalar_hypot_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (const Scalar &x, const Scalar &y) const { // This functor is used by hypotNorm only for which it is faster to first apply abs // on all coefficients prior to reduction through hypot. // This way we avoid calling abs on positive and real entries, and this also permits // to seamlessly handle complexes. Otherwise we would have to handle both real and complexes // through the same functor... return internal::positive_real_hypot(x,y); } }; template struct functor_traits > { enum { Cost = 3 * NumTraits::AddCost + 2 * NumTraits::MulCost + 2 * scalar_div_cost::value, PacketAccess = false }; }; /** \internal * \brief Template functor to compute the pow of two scalars * See the specification of pow in https://en.cppreference.com/w/cpp/numeric/math/pow */ template struct scalar_pow_op : binary_op_base { typedef typename ScalarBinaryOpTraits::ReturnType result_type; #ifndef EIGEN_SCALAR_BINARY_OP_PLUGIN EIGEN_EMPTY_STRUCT_CTOR(scalar_pow_op) #else scalar_pow_op() { typedef Scalar LhsScalar; typedef Exponent RhsScalar; EIGEN_SCALAR_BINARY_OP_PLUGIN } #endif EIGEN_DEVICE_FUNC inline result_type operator() (const Scalar& a, const Exponent& b) const { return numext::pow(a, b); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& b) const { return generic_pow(a,b); } }; template struct functor_traits > { enum { Cost = 5 * NumTraits::MulCost, PacketAccess = (!NumTraits::IsComplex && !NumTraits::IsInteger && packet_traits::HasExp && packet_traits::HasLog && packet_traits::HasRound && packet_traits::HasCmp && // Temporarly disable packet access for half/bfloat16 until // accuracy is improved. !is_same::value && !is_same::value ) }; }; //---------- non associative binary functors ---------- /** \internal * \brief Template functor to compute the difference of two scalars * * \sa class CwiseBinaryOp, MatrixBase::operator- */ template struct scalar_difference_op : binary_op_base { typedef typename ScalarBinaryOpTraits::ReturnType result_type; #ifndef EIGEN_SCALAR_BINARY_OP_PLUGIN EIGEN_EMPTY_STRUCT_CTOR(scalar_difference_op) #else scalar_difference_op() { EIGEN_SCALAR_BINARY_OP_PLUGIN } #endif EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const result_type operator() (const LhsScalar& a, const RhsScalar& b) const { return a - b; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& b) const { return internal::psub(a,b); } }; template struct functor_traits > { enum { Cost = (int(NumTraits::AddCost) + int(NumTraits::AddCost)) / 2, PacketAccess = is_same::value && packet_traits::HasSub && packet_traits::HasSub }; }; /** \internal * \brief Template functor to compute the quotient of two scalars * * \sa class CwiseBinaryOp, Cwise::operator/() */ template struct scalar_quotient_op : binary_op_base { typedef typename ScalarBinaryOpTraits::ReturnType result_type; #ifndef EIGEN_SCALAR_BINARY_OP_PLUGIN EIGEN_EMPTY_STRUCT_CTOR(scalar_quotient_op) #else scalar_quotient_op() { EIGEN_SCALAR_BINARY_OP_PLUGIN } #endif EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const result_type operator() (const LhsScalar& a, const RhsScalar& b) const { return a / b; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& b) const { return internal::pdiv(a,b); } }; template struct functor_traits > { typedef typename scalar_quotient_op::result_type result_type; enum { PacketAccess = is_same::value && packet_traits::HasDiv && packet_traits::HasDiv, Cost = scalar_div_cost::value }; }; /** \internal * \brief Template functor to compute the and of two booleans * * \sa class CwiseBinaryOp, ArrayBase::operator&& */ struct scalar_boolean_and_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_boolean_and_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool operator() (const bool& a, const bool& b) const { return a && b; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& b) const { return internal::pand(a,b); } }; template<> struct functor_traits { enum { Cost = NumTraits::AddCost, PacketAccess = true }; }; /** \internal * \brief Template functor to compute the or of two booleans * * \sa class CwiseBinaryOp, ArrayBase::operator|| */ struct scalar_boolean_or_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_boolean_or_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool operator() (const bool& a, const bool& b) const { return a || b; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& b) const { return internal::por(a,b); } }; template<> struct functor_traits { enum { Cost = NumTraits::AddCost, PacketAccess = true }; }; /** \internal * \brief Template functor to compute the xor of two booleans * * \sa class CwiseBinaryOp, ArrayBase::operator^ */ struct scalar_boolean_xor_op { EIGEN_EMPTY_STRUCT_CTOR(scalar_boolean_xor_op) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool operator() (const bool& a, const bool& b) const { return a ^ b; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& b) const { return internal::pxor(a,b); } }; template<> struct functor_traits { enum { Cost = NumTraits::AddCost, PacketAccess = true }; }; /** \internal * \brief Template functor to compute the absolute difference of two scalars * * \sa class CwiseBinaryOp, MatrixBase::absolute_difference */ template struct scalar_absolute_difference_op : binary_op_base { typedef typename ScalarBinaryOpTraits::ReturnType result_type; #ifndef EIGEN_SCALAR_BINARY_OP_PLUGIN EIGEN_EMPTY_STRUCT_CTOR(scalar_absolute_difference_op) #else scalar_absolute_difference_op() { EIGEN_SCALAR_BINARY_OP_PLUGIN } #endif EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const result_type operator() (const LhsScalar& a, const RhsScalar& b) const { return numext::absdiff(a,b); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& b) const { return internal::pabsdiff(a,b); } }; template struct functor_traits > { enum { Cost = (NumTraits::AddCost+NumTraits::AddCost)/2, PacketAccess = is_same::value && packet_traits::HasAbsDiff }; }; //---------- binary functors bound to a constant, thus appearing as a unary functor ---------- // The following two classes permits to turn any binary functor into a unary one with one argument bound to a constant value. // They are analogues to std::binder1st/binder2nd but with the following differences: // - they are compatible with packetOp // - they are portable across C++ versions (the std::binder* are deprecated in C++11) template struct bind1st_op : BinaryOp { typedef typename BinaryOp::first_argument_type first_argument_type; typedef typename BinaryOp::second_argument_type second_argument_type; typedef typename BinaryOp::result_type result_type; EIGEN_DEVICE_FUNC explicit bind1st_op(const first_argument_type &val) : m_value(val) {} EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const result_type operator() (const second_argument_type& b) const { return BinaryOp::operator()(m_value,b); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& b) const { return BinaryOp::packetOp(internal::pset1(m_value), b); } first_argument_type m_value; }; template struct functor_traits > : functor_traits {}; template struct bind2nd_op : BinaryOp { typedef typename BinaryOp::first_argument_type first_argument_type; typedef typename BinaryOp::second_argument_type second_argument_type; typedef typename BinaryOp::result_type result_type; EIGEN_DEVICE_FUNC explicit bind2nd_op(const second_argument_type &val) : m_value(val) {} EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const result_type operator() (const first_argument_type& a) const { return BinaryOp::operator()(a,m_value); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a) const { return BinaryOp::packetOp(a,internal::pset1(m_value)); } second_argument_type m_value; }; template struct functor_traits > : functor_traits {}; } // end namespace internal } // end namespace Eigen #endif // EIGEN_BINARY_FUNCTORS_H RcppEigen/inst/include/Eigen/src/Core/Reshaped.h0000644000176200001440000004121114562417221021143 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2017 Gael Guennebaud // Copyright (C) 2014 yoco // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_RESHAPED_H #define EIGEN_RESHAPED_H namespace Eigen { /** \class Reshaped * \ingroup Core_Module * * \brief Expression of a fixed-size or dynamic-size reshape * * \tparam XprType the type of the expression in which we are taking a reshape * \tparam Rows the number of rows of the reshape we are taking at compile time (optional) * \tparam Cols the number of columns of the reshape we are taking at compile time (optional) * \tparam Order can be ColMajor or RowMajor, default is ColMajor. * * This class represents an expression of either a fixed-size or dynamic-size reshape. * It is the return type of DenseBase::reshaped(NRowsType,NColsType) and * most of the time this is the only way it is used. * * However, in C++98, if you want to directly maniputate reshaped expressions, * for instance if you want to write a function returning such an expression, you * will need to use this class. In C++11, it is advised to use the \em auto * keyword for such use cases. * * Here is an example illustrating the dynamic case: * \include class_Reshaped.cpp * Output: \verbinclude class_Reshaped.out * * Here is an example illustrating the fixed-size case: * \include class_FixedReshaped.cpp * Output: \verbinclude class_FixedReshaped.out * * \sa DenseBase::reshaped(NRowsType,NColsType) */ namespace internal { template struct traits > : traits { typedef typename traits::Scalar Scalar; typedef typename traits::StorageKind StorageKind; typedef typename traits::XprKind XprKind; enum{ MatrixRows = traits::RowsAtCompileTime, MatrixCols = traits::ColsAtCompileTime, RowsAtCompileTime = Rows, ColsAtCompileTime = Cols, MaxRowsAtCompileTime = Rows, MaxColsAtCompileTime = Cols, XpxStorageOrder = ((int(traits::Flags) & RowMajorBit) == RowMajorBit) ? RowMajor : ColMajor, ReshapedStorageOrder = (RowsAtCompileTime == 1 && ColsAtCompileTime != 1) ? RowMajor : (ColsAtCompileTime == 1 && RowsAtCompileTime != 1) ? ColMajor : XpxStorageOrder, HasSameStorageOrderAsXprType = (ReshapedStorageOrder == XpxStorageOrder), InnerSize = (ReshapedStorageOrder==int(RowMajor)) ? int(ColsAtCompileTime) : int(RowsAtCompileTime), InnerStrideAtCompileTime = HasSameStorageOrderAsXprType ? int(inner_stride_at_compile_time::ret) : Dynamic, OuterStrideAtCompileTime = Dynamic, HasDirectAccess = internal::has_direct_access::ret && (Order==int(XpxStorageOrder)) && ((evaluator::Flags&LinearAccessBit)==LinearAccessBit), MaskPacketAccessBit = (InnerSize == Dynamic || (InnerSize % packet_traits::size) == 0) && (InnerStrideAtCompileTime == 1) ? PacketAccessBit : 0, //MaskAlignedBit = ((OuterStrideAtCompileTime!=Dynamic) && (((OuterStrideAtCompileTime * int(sizeof(Scalar))) % 16) == 0)) ? AlignedBit : 0, FlagsLinearAccessBit = (RowsAtCompileTime == 1 || ColsAtCompileTime == 1) ? LinearAccessBit : 0, FlagsLvalueBit = is_lvalue::value ? LvalueBit : 0, FlagsRowMajorBit = (ReshapedStorageOrder==int(RowMajor)) ? RowMajorBit : 0, FlagsDirectAccessBit = HasDirectAccess ? DirectAccessBit : 0, Flags0 = traits::Flags & ( (HereditaryBits & ~RowMajorBit) | MaskPacketAccessBit), Flags = (Flags0 | FlagsLinearAccessBit | FlagsLvalueBit | FlagsRowMajorBit | FlagsDirectAccessBit) }; }; template class ReshapedImpl_dense; } // end namespace internal template class ReshapedImpl; template class Reshaped : public ReshapedImpl::StorageKind> { typedef ReshapedImpl::StorageKind> Impl; public: //typedef typename Impl::Base Base; typedef Impl Base; EIGEN_GENERIC_PUBLIC_INTERFACE(Reshaped) EIGEN_INHERIT_ASSIGNMENT_OPERATORS(Reshaped) /** Fixed-size constructor */ EIGEN_DEVICE_FUNC inline Reshaped(XprType& xpr) : Impl(xpr) { EIGEN_STATIC_ASSERT(RowsAtCompileTime!=Dynamic && ColsAtCompileTime!=Dynamic,THIS_METHOD_IS_ONLY_FOR_FIXED_SIZE) eigen_assert(Rows * Cols == xpr.rows() * xpr.cols()); } /** Dynamic-size constructor */ EIGEN_DEVICE_FUNC inline Reshaped(XprType& xpr, Index reshapeRows, Index reshapeCols) : Impl(xpr, reshapeRows, reshapeCols) { eigen_assert((RowsAtCompileTime==Dynamic || RowsAtCompileTime==reshapeRows) && (ColsAtCompileTime==Dynamic || ColsAtCompileTime==reshapeCols)); eigen_assert(reshapeRows * reshapeCols == xpr.rows() * xpr.cols()); } }; // The generic default implementation for dense reshape simply forward to the internal::ReshapedImpl_dense // that must be specialized for direct and non-direct access... template class ReshapedImpl : public internal::ReshapedImpl_dense >::HasDirectAccess> { typedef internal::ReshapedImpl_dense >::HasDirectAccess> Impl; public: typedef Impl Base; EIGEN_INHERIT_ASSIGNMENT_OPERATORS(ReshapedImpl) EIGEN_DEVICE_FUNC inline ReshapedImpl(XprType& xpr) : Impl(xpr) {} EIGEN_DEVICE_FUNC inline ReshapedImpl(XprType& xpr, Index reshapeRows, Index reshapeCols) : Impl(xpr, reshapeRows, reshapeCols) {} }; namespace internal { /** \internal Internal implementation of dense Reshaped in the general case. */ template class ReshapedImpl_dense : public internal::dense_xpr_base >::type { typedef Reshaped ReshapedType; public: typedef typename internal::dense_xpr_base::type Base; EIGEN_DENSE_PUBLIC_INTERFACE(ReshapedType) EIGEN_INHERIT_ASSIGNMENT_OPERATORS(ReshapedImpl_dense) typedef typename internal::ref_selector::non_const_type MatrixTypeNested; typedef typename internal::remove_all::type NestedExpression; class InnerIterator; /** Fixed-size constructor */ EIGEN_DEVICE_FUNC inline ReshapedImpl_dense(XprType& xpr) : m_xpr(xpr), m_rows(Rows), m_cols(Cols) {} /** Dynamic-size constructor */ EIGEN_DEVICE_FUNC inline ReshapedImpl_dense(XprType& xpr, Index nRows, Index nCols) : m_xpr(xpr), m_rows(nRows), m_cols(nCols) {} EIGEN_DEVICE_FUNC Index rows() const { return m_rows; } EIGEN_DEVICE_FUNC Index cols() const { return m_cols; } #ifdef EIGEN_PARSED_BY_DOXYGEN /** \sa MapBase::data() */ EIGEN_DEVICE_FUNC inline const Scalar* data() const; EIGEN_DEVICE_FUNC inline Index innerStride() const; EIGEN_DEVICE_FUNC inline Index outerStride() const; #endif /** \returns the nested expression */ EIGEN_DEVICE_FUNC const typename internal::remove_all::type& nestedExpression() const { return m_xpr; } /** \returns the nested expression */ EIGEN_DEVICE_FUNC typename internal::remove_reference::type& nestedExpression() { return m_xpr; } protected: MatrixTypeNested m_xpr; const internal::variable_if_dynamic m_rows; const internal::variable_if_dynamic m_cols; }; /** \internal Internal implementation of dense Reshaped in the direct access case. */ template class ReshapedImpl_dense : public MapBase > { typedef Reshaped ReshapedType; typedef typename internal::ref_selector::non_const_type XprTypeNested; public: typedef MapBase Base; EIGEN_DENSE_PUBLIC_INTERFACE(ReshapedType) EIGEN_INHERIT_ASSIGNMENT_OPERATORS(ReshapedImpl_dense) /** Fixed-size constructor */ EIGEN_DEVICE_FUNC inline ReshapedImpl_dense(XprType& xpr) : Base(xpr.data()), m_xpr(xpr) {} /** Dynamic-size constructor */ EIGEN_DEVICE_FUNC inline ReshapedImpl_dense(XprType& xpr, Index nRows, Index nCols) : Base(xpr.data(), nRows, nCols), m_xpr(xpr) {} EIGEN_DEVICE_FUNC const typename internal::remove_all::type& nestedExpression() const { return m_xpr; } EIGEN_DEVICE_FUNC XprType& nestedExpression() { return m_xpr; } /** \sa MapBase::innerStride() */ EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index innerStride() const { return m_xpr.innerStride(); } /** \sa MapBase::outerStride() */ EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index outerStride() const { return ((Flags&RowMajorBit)==RowMajorBit) ? this->cols() : this->rows(); } protected: XprTypeNested m_xpr; }; // Evaluators template struct reshaped_evaluator; template struct evaluator > : reshaped_evaluator >::HasDirectAccess> { typedef Reshaped XprType; typedef typename XprType::Scalar Scalar; // TODO: should check for smaller packet types typedef typename packet_traits::type PacketScalar; enum { CoeffReadCost = evaluator::CoeffReadCost, HasDirectAccess = traits::HasDirectAccess, // RowsAtCompileTime = traits::RowsAtCompileTime, // ColsAtCompileTime = traits::ColsAtCompileTime, // MaxRowsAtCompileTime = traits::MaxRowsAtCompileTime, // MaxColsAtCompileTime = traits::MaxColsAtCompileTime, // // InnerStrideAtCompileTime = traits::HasSameStorageOrderAsXprType // ? int(inner_stride_at_compile_time::ret) // : Dynamic, // OuterStrideAtCompileTime = Dynamic, FlagsLinearAccessBit = (traits::RowsAtCompileTime == 1 || traits::ColsAtCompileTime == 1 || HasDirectAccess) ? LinearAccessBit : 0, FlagsRowMajorBit = (traits::ReshapedStorageOrder==int(RowMajor)) ? RowMajorBit : 0, FlagsDirectAccessBit = HasDirectAccess ? DirectAccessBit : 0, Flags0 = evaluator::Flags & (HereditaryBits & ~RowMajorBit), Flags = Flags0 | FlagsLinearAccessBit | FlagsRowMajorBit | FlagsDirectAccessBit, PacketAlignment = unpacket_traits::alignment, Alignment = evaluator::Alignment }; typedef reshaped_evaluator reshaped_evaluator_type; EIGEN_DEVICE_FUNC explicit evaluator(const XprType& xpr) : reshaped_evaluator_type(xpr) { EIGEN_INTERNAL_CHECK_COST_VALUE(CoeffReadCost); } }; template struct reshaped_evaluator : evaluator_base > { typedef Reshaped XprType; enum { CoeffReadCost = evaluator::CoeffReadCost /* TODO + cost of index computations */, Flags = (evaluator::Flags & (HereditaryBits /*| LinearAccessBit | DirectAccessBit*/)), Alignment = 0 }; EIGEN_DEVICE_FUNC explicit reshaped_evaluator(const XprType& xpr) : m_argImpl(xpr.nestedExpression()), m_xpr(xpr) { EIGEN_INTERNAL_CHECK_COST_VALUE(CoeffReadCost); } typedef typename XprType::Scalar Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef std::pair RowCol; inline RowCol index_remap(Index rowId, Index colId) const { if(Order==ColMajor) { const Index nth_elem_idx = colId * m_xpr.rows() + rowId; return RowCol(nth_elem_idx % m_xpr.nestedExpression().rows(), nth_elem_idx / m_xpr.nestedExpression().rows()); } else { const Index nth_elem_idx = colId + rowId * m_xpr.cols(); return RowCol(nth_elem_idx / m_xpr.nestedExpression().cols(), nth_elem_idx % m_xpr.nestedExpression().cols()); } } EIGEN_DEVICE_FUNC inline Scalar& coeffRef(Index rowId, Index colId) { EIGEN_STATIC_ASSERT_LVALUE(XprType) const RowCol row_col = index_remap(rowId, colId); return m_argImpl.coeffRef(row_col.first, row_col.second); } EIGEN_DEVICE_FUNC inline const Scalar& coeffRef(Index rowId, Index colId) const { const RowCol row_col = index_remap(rowId, colId); return m_argImpl.coeffRef(row_col.first, row_col.second); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const CoeffReturnType coeff(Index rowId, Index colId) const { const RowCol row_col = index_remap(rowId, colId); return m_argImpl.coeff(row_col.first, row_col.second); } EIGEN_DEVICE_FUNC inline Scalar& coeffRef(Index index) { EIGEN_STATIC_ASSERT_LVALUE(XprType) const RowCol row_col = index_remap(Rows == 1 ? 0 : index, Rows == 1 ? index : 0); return m_argImpl.coeffRef(row_col.first, row_col.second); } EIGEN_DEVICE_FUNC inline const Scalar& coeffRef(Index index) const { const RowCol row_col = index_remap(Rows == 1 ? 0 : index, Rows == 1 ? index : 0); return m_argImpl.coeffRef(row_col.first, row_col.second); } EIGEN_DEVICE_FUNC inline const CoeffReturnType coeff(Index index) const { const RowCol row_col = index_remap(Rows == 1 ? 0 : index, Rows == 1 ? index : 0); return m_argImpl.coeff(row_col.first, row_col.second); } #if 0 EIGEN_DEVICE_FUNC template inline PacketScalar packet(Index rowId, Index colId) const { const RowCol row_col = index_remap(rowId, colId); return m_argImpl.template packet(row_col.first, row_col.second); } template EIGEN_DEVICE_FUNC inline void writePacket(Index rowId, Index colId, const PacketScalar& val) { const RowCol row_col = index_remap(rowId, colId); m_argImpl.const_cast_derived().template writePacket (row_col.first, row_col.second, val); } template EIGEN_DEVICE_FUNC inline PacketScalar packet(Index index) const { const RowCol row_col = index_remap(RowsAtCompileTime == 1 ? 0 : index, RowsAtCompileTime == 1 ? index : 0); return m_argImpl.template packet(row_col.first, row_col.second); } template EIGEN_DEVICE_FUNC inline void writePacket(Index index, const PacketScalar& val) { const RowCol row_col = index_remap(RowsAtCompileTime == 1 ? 0 : index, RowsAtCompileTime == 1 ? index : 0); return m_argImpl.template packet(row_col.first, row_col.second, val); } #endif protected: evaluator m_argImpl; const XprType& m_xpr; }; template struct reshaped_evaluator : mapbase_evaluator, typename Reshaped::PlainObject> { typedef Reshaped XprType; typedef typename XprType::Scalar Scalar; EIGEN_DEVICE_FUNC explicit reshaped_evaluator(const XprType& xpr) : mapbase_evaluator(xpr) { // TODO: for the 3.4 release, this should be turned to an internal assertion, but let's keep it as is for the beta lifetime eigen_assert(((internal::UIntPtr(xpr.data()) % EIGEN_PLAIN_ENUM_MAX(1,evaluator::Alignment)) == 0) && "data is not aligned"); } }; } // end namespace internal } // end namespace Eigen #endif // EIGEN_RESHAPED_H RcppEigen/inst/include/Eigen/src/Core/PermutationMatrix.h0000644000176200001440000005041414562417221023111 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009 Benoit Jacob // Copyright (C) 2009-2015 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_PERMUTATIONMATRIX_H #define EIGEN_PERMUTATIONMATRIX_H namespace Eigen { namespace internal { enum PermPermProduct_t {PermPermProduct}; } // end namespace internal /** \class PermutationBase * \ingroup Core_Module * * \brief Base class for permutations * * \tparam Derived the derived class * * This class is the base class for all expressions representing a permutation matrix, * internally stored as a vector of integers. * The convention followed here is that if \f$ \sigma \f$ is a permutation, the corresponding permutation matrix * \f$ P_\sigma \f$ is such that if \f$ (e_1,\ldots,e_p) \f$ is the canonical basis, we have: * \f[ P_\sigma(e_i) = e_{\sigma(i)}. \f] * This convention ensures that for any two permutations \f$ \sigma, \tau \f$, we have: * \f[ P_{\sigma\circ\tau} = P_\sigma P_\tau. \f] * * Permutation matrices are square and invertible. * * Notice that in addition to the member functions and operators listed here, there also are non-member * operator* to multiply any kind of permutation object with any kind of matrix expression (MatrixBase) * on either side. * * \sa class PermutationMatrix, class PermutationWrapper */ template class PermutationBase : public EigenBase { typedef internal::traits Traits; typedef EigenBase Base; public: #ifndef EIGEN_PARSED_BY_DOXYGEN typedef typename Traits::IndicesType IndicesType; enum { Flags = Traits::Flags, RowsAtCompileTime = Traits::RowsAtCompileTime, ColsAtCompileTime = Traits::ColsAtCompileTime, MaxRowsAtCompileTime = Traits::MaxRowsAtCompileTime, MaxColsAtCompileTime = Traits::MaxColsAtCompileTime }; typedef typename Traits::StorageIndex StorageIndex; typedef Matrix DenseMatrixType; typedef PermutationMatrix PlainPermutationType; typedef PlainPermutationType PlainObject; using Base::derived; typedef Inverse InverseReturnType; typedef void Scalar; #endif /** Copies the other permutation into *this */ template Derived& operator=(const PermutationBase& other) { indices() = other.indices(); return derived(); } /** Assignment from the Transpositions \a tr */ template Derived& operator=(const TranspositionsBase& tr) { setIdentity(tr.size()); for(Index k=size()-1; k>=0; --k) applyTranspositionOnTheRight(k,tr.coeff(k)); return derived(); } /** \returns the number of rows */ inline EIGEN_DEVICE_FUNC Index rows() const { return Index(indices().size()); } /** \returns the number of columns */ inline EIGEN_DEVICE_FUNC Index cols() const { return Index(indices().size()); } /** \returns the size of a side of the respective square matrix, i.e., the number of indices */ inline EIGEN_DEVICE_FUNC Index size() const { return Index(indices().size()); } #ifndef EIGEN_PARSED_BY_DOXYGEN template void evalTo(MatrixBase& other) const { other.setZero(); for (Index i=0; i=0 && j>=0 && i=0 && j>=0 && i void assignTranspose(const PermutationBase& other) { for (Index i=0; i void assignProduct(const Lhs& lhs, const Rhs& rhs) { eigen_assert(lhs.cols() == rhs.rows()); for (Index i=0; i inline PlainPermutationType operator*(const PermutationBase& other) const { return PlainPermutationType(internal::PermPermProduct, derived(), other.derived()); } /** \returns the product of a permutation with another inverse permutation. * * \note \blank \note_try_to_help_rvo */ template inline PlainPermutationType operator*(const InverseImpl& other) const { return PlainPermutationType(internal::PermPermProduct, *this, other.eval()); } /** \returns the product of an inverse permutation with another permutation. * * \note \blank \note_try_to_help_rvo */ template friend inline PlainPermutationType operator*(const InverseImpl& other, const PermutationBase& perm) { return PlainPermutationType(internal::PermPermProduct, other.eval(), perm); } /** \returns the determinant of the permutation matrix, which is either 1 or -1 depending on the parity of the permutation. * * This function is O(\c n) procedure allocating a buffer of \c n booleans. */ Index determinant() const { Index res = 1; Index n = size(); Matrix mask(n); mask.fill(false); Index r = 0; while(r < n) { // search for the next seed while(r=n) break; // we got one, let's follow it until we are back to the seed Index k0 = r++; mask.coeffRef(k0) = true; for(Index k=indices().coeff(k0); k!=k0; k=indices().coeff(k)) { mask.coeffRef(k) = true; res = -res; } } return res; } protected: }; namespace internal { template struct traits > : traits > { typedef PermutationStorage StorageKind; typedef Matrix<_StorageIndex, SizeAtCompileTime, 1, 0, MaxSizeAtCompileTime, 1> IndicesType; typedef _StorageIndex StorageIndex; typedef void Scalar; }; } /** \class PermutationMatrix * \ingroup Core_Module * * \brief Permutation matrix * * \tparam SizeAtCompileTime the number of rows/cols, or Dynamic * \tparam MaxSizeAtCompileTime the maximum number of rows/cols, or Dynamic. This optional parameter defaults to SizeAtCompileTime. Most of the time, you should not have to specify it. * \tparam _StorageIndex the integer type of the indices * * This class represents a permutation matrix, internally stored as a vector of integers. * * \sa class PermutationBase, class PermutationWrapper, class DiagonalMatrix */ template class PermutationMatrix : public PermutationBase > { typedef PermutationBase Base; typedef internal::traits Traits; public: typedef const PermutationMatrix& Nested; #ifndef EIGEN_PARSED_BY_DOXYGEN typedef typename Traits::IndicesType IndicesType; typedef typename Traits::StorageIndex StorageIndex; #endif inline PermutationMatrix() {} /** Constructs an uninitialized permutation matrix of given size. */ explicit inline PermutationMatrix(Index size) : m_indices(size) { eigen_internal_assert(size <= NumTraits::highest()); } /** Copy constructor. */ template inline PermutationMatrix(const PermutationBase& other) : m_indices(other.indices()) {} /** Generic constructor from expression of the indices. The indices * array has the meaning that the permutations sends each integer i to indices[i]. * * \warning It is your responsibility to check that the indices array that you passes actually * describes a permutation, i.e., each value between 0 and n-1 occurs exactly once, where n is the * array's size. */ template explicit inline PermutationMatrix(const MatrixBase& indices) : m_indices(indices) {} /** Convert the Transpositions \a tr to a permutation matrix */ template explicit PermutationMatrix(const TranspositionsBase& tr) : m_indices(tr.size()) { *this = tr; } /** Copies the other permutation into *this */ template PermutationMatrix& operator=(const PermutationBase& other) { m_indices = other.indices(); return *this; } /** Assignment from the Transpositions \a tr */ template PermutationMatrix& operator=(const TranspositionsBase& tr) { return Base::operator=(tr.derived()); } /** const version of indices(). */ const IndicesType& indices() const { return m_indices; } /** \returns a reference to the stored array representing the permutation. */ IndicesType& indices() { return m_indices; } /**** multiplication helpers to hopefully get RVO ****/ #ifndef EIGEN_PARSED_BY_DOXYGEN template PermutationMatrix(const InverseImpl& other) : m_indices(other.derived().nestedExpression().size()) { eigen_internal_assert(m_indices.size() <= NumTraits::highest()); StorageIndex end = StorageIndex(m_indices.size()); for (StorageIndex i=0; i PermutationMatrix(internal::PermPermProduct_t, const Lhs& lhs, const Rhs& rhs) : m_indices(lhs.indices().size()) { Base::assignProduct(lhs,rhs); } #endif protected: IndicesType m_indices; }; namespace internal { template struct traits,_PacketAccess> > : traits > { typedef PermutationStorage StorageKind; typedef Map, _PacketAccess> IndicesType; typedef _StorageIndex StorageIndex; typedef void Scalar; }; } template class Map,_PacketAccess> : public PermutationBase,_PacketAccess> > { typedef PermutationBase Base; typedef internal::traits Traits; public: #ifndef EIGEN_PARSED_BY_DOXYGEN typedef typename Traits::IndicesType IndicesType; typedef typename IndicesType::Scalar StorageIndex; #endif inline Map(const StorageIndex* indicesPtr) : m_indices(indicesPtr) {} inline Map(const StorageIndex* indicesPtr, Index size) : m_indices(indicesPtr,size) {} /** Copies the other permutation into *this */ template Map& operator=(const PermutationBase& other) { return Base::operator=(other.derived()); } /** Assignment from the Transpositions \a tr */ template Map& operator=(const TranspositionsBase& tr) { return Base::operator=(tr.derived()); } #ifndef EIGEN_PARSED_BY_DOXYGEN /** This is a special case of the templated operator=. Its purpose is to * prevent a default operator= from hiding the templated operator=. */ Map& operator=(const Map& other) { m_indices = other.m_indices; return *this; } #endif /** const version of indices(). */ const IndicesType& indices() const { return m_indices; } /** \returns a reference to the stored array representing the permutation. */ IndicesType& indices() { return m_indices; } protected: IndicesType m_indices; }; template class TranspositionsWrapper; namespace internal { template struct traits > { typedef PermutationStorage StorageKind; typedef void Scalar; typedef typename _IndicesType::Scalar StorageIndex; typedef _IndicesType IndicesType; enum { RowsAtCompileTime = _IndicesType::SizeAtCompileTime, ColsAtCompileTime = _IndicesType::SizeAtCompileTime, MaxRowsAtCompileTime = IndicesType::MaxSizeAtCompileTime, MaxColsAtCompileTime = IndicesType::MaxSizeAtCompileTime, Flags = 0 }; }; } /** \class PermutationWrapper * \ingroup Core_Module * * \brief Class to view a vector of integers as a permutation matrix * * \tparam _IndicesType the type of the vector of integer (can be any compatible expression) * * This class allows to view any vector expression of integers as a permutation matrix. * * \sa class PermutationBase, class PermutationMatrix */ template class PermutationWrapper : public PermutationBase > { typedef PermutationBase Base; typedef internal::traits Traits; public: #ifndef EIGEN_PARSED_BY_DOXYGEN typedef typename Traits::IndicesType IndicesType; #endif inline PermutationWrapper(const IndicesType& indices) : m_indices(indices) {} /** const version of indices(). */ const typename internal::remove_all::type& indices() const { return m_indices; } protected: typename IndicesType::Nested m_indices; }; /** \returns the matrix with the permutation applied to the columns. */ template EIGEN_DEVICE_FUNC const Product operator*(const MatrixBase &matrix, const PermutationBase& permutation) { return Product (matrix.derived(), permutation.derived()); } /** \returns the matrix with the permutation applied to the rows. */ template EIGEN_DEVICE_FUNC const Product operator*(const PermutationBase &permutation, const MatrixBase& matrix) { return Product (permutation.derived(), matrix.derived()); } template class InverseImpl : public EigenBase > { typedef typename PermutationType::PlainPermutationType PlainPermutationType; typedef internal::traits PermTraits; protected: InverseImpl() {} public: typedef Inverse InverseType; using EigenBase >::derived; #ifndef EIGEN_PARSED_BY_DOXYGEN typedef typename PermutationType::DenseMatrixType DenseMatrixType; enum { RowsAtCompileTime = PermTraits::RowsAtCompileTime, ColsAtCompileTime = PermTraits::ColsAtCompileTime, MaxRowsAtCompileTime = PermTraits::MaxRowsAtCompileTime, MaxColsAtCompileTime = PermTraits::MaxColsAtCompileTime }; #endif #ifndef EIGEN_PARSED_BY_DOXYGEN template void evalTo(MatrixBase& other) const { other.setZero(); for (Index i=0; i friend const Product operator*(const MatrixBase& matrix, const InverseType& trPerm) { return Product(matrix.derived(), trPerm.derived()); } /** \returns the matrix with the inverse permutation applied to the rows. */ template const Product operator*(const MatrixBase& matrix) const { return Product(derived(), matrix.derived()); } }; template const PermutationWrapper MatrixBase::asPermutation() const { return derived(); } namespace internal { template<> struct AssignmentKind { typedef EigenBase2EigenBase Kind; }; } // end namespace internal } // end namespace Eigen #endif // EIGEN_PERMUTATIONMATRIX_H RcppEigen/inst/include/Eigen/src/Core/ForceAlignedAccess.h0000644000176200001440000001145514562417221023063 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009-2010 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_FORCEALIGNEDACCESS_H #define EIGEN_FORCEALIGNEDACCESS_H namespace Eigen { /** \class ForceAlignedAccess * \ingroup Core_Module * * \brief Enforce aligned packet loads and stores regardless of what is requested * * \param ExpressionType the type of the object of which we are forcing aligned packet access * * This class is the return type of MatrixBase::forceAlignedAccess() * and most of the time this is the only way it is used. * * \sa MatrixBase::forceAlignedAccess() */ namespace internal { template struct traits > : public traits {}; } template class ForceAlignedAccess : public internal::dense_xpr_base< ForceAlignedAccess >::type { public: typedef typename internal::dense_xpr_base::type Base; EIGEN_DENSE_PUBLIC_INTERFACE(ForceAlignedAccess) EIGEN_DEVICE_FUNC explicit inline ForceAlignedAccess(const ExpressionType& matrix) : m_expression(matrix) {} EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index rows() const EIGEN_NOEXCEPT { return m_expression.rows(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index cols() const EIGEN_NOEXCEPT { return m_expression.cols(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index outerStride() const EIGEN_NOEXCEPT { return m_expression.outerStride(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index innerStride() const EIGEN_NOEXCEPT { return m_expression.innerStride(); } EIGEN_DEVICE_FUNC inline const CoeffReturnType coeff(Index row, Index col) const { return m_expression.coeff(row, col); } EIGEN_DEVICE_FUNC inline Scalar& coeffRef(Index row, Index col) { return m_expression.const_cast_derived().coeffRef(row, col); } EIGEN_DEVICE_FUNC inline const CoeffReturnType coeff(Index index) const { return m_expression.coeff(index); } EIGEN_DEVICE_FUNC inline Scalar& coeffRef(Index index) { return m_expression.const_cast_derived().coeffRef(index); } template inline const PacketScalar packet(Index row, Index col) const { return m_expression.template packet(row, col); } template inline void writePacket(Index row, Index col, const PacketScalar& x) { m_expression.const_cast_derived().template writePacket(row, col, x); } template inline const PacketScalar packet(Index index) const { return m_expression.template packet(index); } template inline void writePacket(Index index, const PacketScalar& x) { m_expression.const_cast_derived().template writePacket(index, x); } EIGEN_DEVICE_FUNC operator const ExpressionType&() const { return m_expression; } protected: const ExpressionType& m_expression; private: ForceAlignedAccess& operator=(const ForceAlignedAccess&); }; /** \returns an expression of *this with forced aligned access * \sa forceAlignedAccessIf(),class ForceAlignedAccess */ template inline const ForceAlignedAccess MatrixBase::forceAlignedAccess() const { return ForceAlignedAccess(derived()); } /** \returns an expression of *this with forced aligned access * \sa forceAlignedAccessIf(), class ForceAlignedAccess */ template inline ForceAlignedAccess MatrixBase::forceAlignedAccess() { return ForceAlignedAccess(derived()); } /** \returns an expression of *this with forced aligned access if \a Enable is true. * \sa forceAlignedAccess(), class ForceAlignedAccess */ template template inline typename internal::add_const_on_value_type,Derived&>::type>::type MatrixBase::forceAlignedAccessIf() const { return derived(); // FIXME This should not work but apparently is never used } /** \returns an expression of *this with forced aligned access if \a Enable is true. * \sa forceAlignedAccess(), class ForceAlignedAccess */ template template inline typename internal::conditional,Derived&>::type MatrixBase::forceAlignedAccessIf() { return derived(); // FIXME This should not work but apparently is never used } } // end namespace Eigen #endif // EIGEN_FORCEALIGNEDACCESS_H RcppEigen/inst/include/Eigen/src/Core/DiagonalMatrix.h0000644000176200001440000003451614562417221022325 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009 Gael Guennebaud // Copyright (C) 2007-2009 Benoit Jacob // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_DIAGONALMATRIX_H #define EIGEN_DIAGONALMATRIX_H namespace Eigen { #ifndef EIGEN_PARSED_BY_DOXYGEN template class DiagonalBase : public EigenBase { public: typedef typename internal::traits::DiagonalVectorType DiagonalVectorType; typedef typename DiagonalVectorType::Scalar Scalar; typedef typename DiagonalVectorType::RealScalar RealScalar; typedef typename internal::traits::StorageKind StorageKind; typedef typename internal::traits::StorageIndex StorageIndex; enum { RowsAtCompileTime = DiagonalVectorType::SizeAtCompileTime, ColsAtCompileTime = DiagonalVectorType::SizeAtCompileTime, MaxRowsAtCompileTime = DiagonalVectorType::MaxSizeAtCompileTime, MaxColsAtCompileTime = DiagonalVectorType::MaxSizeAtCompileTime, IsVectorAtCompileTime = 0, Flags = NoPreferredStorageOrderBit }; typedef Matrix DenseMatrixType; typedef DenseMatrixType DenseType; typedef DiagonalMatrix PlainObject; EIGEN_DEVICE_FUNC inline const Derived& derived() const { return *static_cast(this); } EIGEN_DEVICE_FUNC inline Derived& derived() { return *static_cast(this); } EIGEN_DEVICE_FUNC DenseMatrixType toDenseMatrix() const { return derived(); } EIGEN_DEVICE_FUNC inline const DiagonalVectorType& diagonal() const { return derived().diagonal(); } EIGEN_DEVICE_FUNC inline DiagonalVectorType& diagonal() { return derived().diagonal(); } EIGEN_DEVICE_FUNC inline Index rows() const { return diagonal().size(); } EIGEN_DEVICE_FUNC inline Index cols() const { return diagonal().size(); } template EIGEN_DEVICE_FUNC const Product operator*(const MatrixBase &matrix) const { return Product(derived(),matrix.derived()); } typedef DiagonalWrapper, const DiagonalVectorType> > InverseReturnType; EIGEN_DEVICE_FUNC inline const InverseReturnType inverse() const { return InverseReturnType(diagonal().cwiseInverse()); } EIGEN_DEVICE_FUNC inline const DiagonalWrapper operator*(const Scalar& scalar) const { return DiagonalWrapper(diagonal() * scalar); } EIGEN_DEVICE_FUNC friend inline const DiagonalWrapper operator*(const Scalar& scalar, const DiagonalBase& other) { return DiagonalWrapper(scalar * other.diagonal()); } template EIGEN_DEVICE_FUNC #ifdef EIGEN_PARSED_BY_DOXYGEN inline unspecified_expression_type #else inline const DiagonalWrapper #endif operator+(const DiagonalBase& other) const { return (diagonal() + other.diagonal()).asDiagonal(); } template EIGEN_DEVICE_FUNC #ifdef EIGEN_PARSED_BY_DOXYGEN inline unspecified_expression_type #else inline const DiagonalWrapper #endif operator-(const DiagonalBase& other) const { return (diagonal() - other.diagonal()).asDiagonal(); } }; #endif /** \class DiagonalMatrix * \ingroup Core_Module * * \brief Represents a diagonal matrix with its storage * * \param _Scalar the type of coefficients * \param SizeAtCompileTime the dimension of the matrix, or Dynamic * \param MaxSizeAtCompileTime the dimension of the matrix, or Dynamic. This parameter is optional and defaults * to SizeAtCompileTime. Most of the time, you do not need to specify it. * * \sa class DiagonalWrapper */ namespace internal { template struct traits > : traits > { typedef Matrix<_Scalar,SizeAtCompileTime,1,0,MaxSizeAtCompileTime,1> DiagonalVectorType; typedef DiagonalShape StorageKind; enum { Flags = LvalueBit | NoPreferredStorageOrderBit }; }; } template class DiagonalMatrix : public DiagonalBase > { public: #ifndef EIGEN_PARSED_BY_DOXYGEN typedef typename internal::traits::DiagonalVectorType DiagonalVectorType; typedef const DiagonalMatrix& Nested; typedef _Scalar Scalar; typedef typename internal::traits::StorageKind StorageKind; typedef typename internal::traits::StorageIndex StorageIndex; #endif protected: DiagonalVectorType m_diagonal; public: /** const version of diagonal(). */ EIGEN_DEVICE_FUNC inline const DiagonalVectorType& diagonal() const { return m_diagonal; } /** \returns a reference to the stored vector of diagonal coefficients. */ EIGEN_DEVICE_FUNC inline DiagonalVectorType& diagonal() { return m_diagonal; } /** Default constructor without initialization */ EIGEN_DEVICE_FUNC inline DiagonalMatrix() {} /** Constructs a diagonal matrix with given dimension */ EIGEN_DEVICE_FUNC explicit inline DiagonalMatrix(Index dim) : m_diagonal(dim) {} /** 2D constructor. */ EIGEN_DEVICE_FUNC inline DiagonalMatrix(const Scalar& x, const Scalar& y) : m_diagonal(x,y) {} /** 3D constructor. */ EIGEN_DEVICE_FUNC inline DiagonalMatrix(const Scalar& x, const Scalar& y, const Scalar& z) : m_diagonal(x,y,z) {} #if EIGEN_HAS_CXX11 /** \brief Construct a diagonal matrix with fixed size from an arbitrary number of coefficients. \cpp11 * * There exists C++98 anologue constructors for fixed-size diagonal matrices having 2 or 3 coefficients. * * \warning To construct a diagonal matrix of fixed size, the number of values passed to this * constructor must match the fixed dimension of \c *this. * * \sa DiagonalMatrix(const Scalar&, const Scalar&) * \sa DiagonalMatrix(const Scalar&, const Scalar&, const Scalar&) */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE DiagonalMatrix(const Scalar& a0, const Scalar& a1, const Scalar& a2, const ArgTypes&... args) : m_diagonal(a0, a1, a2, args...) {} /** \brief Constructs a DiagonalMatrix and initializes it by elements given by an initializer list of initializer * lists \cpp11 */ EIGEN_DEVICE_FUNC explicit EIGEN_STRONG_INLINE DiagonalMatrix(const std::initializer_list>& list) : m_diagonal(list) {} #endif // EIGEN_HAS_CXX11 /** Copy constructor. */ template EIGEN_DEVICE_FUNC inline DiagonalMatrix(const DiagonalBase& other) : m_diagonal(other.diagonal()) {} #ifndef EIGEN_PARSED_BY_DOXYGEN /** copy constructor. prevent a default copy constructor from hiding the other templated constructor */ inline DiagonalMatrix(const DiagonalMatrix& other) : m_diagonal(other.diagonal()) {} #endif /** generic constructor from expression of the diagonal coefficients */ template EIGEN_DEVICE_FUNC explicit inline DiagonalMatrix(const MatrixBase& other) : m_diagonal(other) {} /** Copy operator. */ template EIGEN_DEVICE_FUNC DiagonalMatrix& operator=(const DiagonalBase& other) { m_diagonal = other.diagonal(); return *this; } #ifndef EIGEN_PARSED_BY_DOXYGEN /** This is a special case of the templated operator=. Its purpose is to * prevent a default operator= from hiding the templated operator=. */ EIGEN_DEVICE_FUNC DiagonalMatrix& operator=(const DiagonalMatrix& other) { m_diagonal = other.diagonal(); return *this; } #endif /** Resizes to given size. */ EIGEN_DEVICE_FUNC inline void resize(Index size) { m_diagonal.resize(size); } /** Sets all coefficients to zero. */ EIGEN_DEVICE_FUNC inline void setZero() { m_diagonal.setZero(); } /** Resizes and sets all coefficients to zero. */ EIGEN_DEVICE_FUNC inline void setZero(Index size) { m_diagonal.setZero(size); } /** Sets this matrix to be the identity matrix of the current size. */ EIGEN_DEVICE_FUNC inline void setIdentity() { m_diagonal.setOnes(); } /** Sets this matrix to be the identity matrix of the given size. */ EIGEN_DEVICE_FUNC inline void setIdentity(Index size) { m_diagonal.setOnes(size); } }; /** \class DiagonalWrapper * \ingroup Core_Module * * \brief Expression of a diagonal matrix * * \param _DiagonalVectorType the type of the vector of diagonal coefficients * * This class is an expression of a diagonal matrix, but not storing its own vector of diagonal coefficients, * instead wrapping an existing vector expression. It is the return type of MatrixBase::asDiagonal() * and most of the time this is the only way that it is used. * * \sa class DiagonalMatrix, class DiagonalBase, MatrixBase::asDiagonal() */ namespace internal { template struct traits > { typedef _DiagonalVectorType DiagonalVectorType; typedef typename DiagonalVectorType::Scalar Scalar; typedef typename DiagonalVectorType::StorageIndex StorageIndex; typedef DiagonalShape StorageKind; typedef typename traits::XprKind XprKind; enum { RowsAtCompileTime = DiagonalVectorType::SizeAtCompileTime, ColsAtCompileTime = DiagonalVectorType::SizeAtCompileTime, MaxRowsAtCompileTime = DiagonalVectorType::MaxSizeAtCompileTime, MaxColsAtCompileTime = DiagonalVectorType::MaxSizeAtCompileTime, Flags = (traits::Flags & LvalueBit) | NoPreferredStorageOrderBit }; }; } template class DiagonalWrapper : public DiagonalBase >, internal::no_assignment_operator { public: #ifndef EIGEN_PARSED_BY_DOXYGEN typedef _DiagonalVectorType DiagonalVectorType; typedef DiagonalWrapper Nested; #endif /** Constructor from expression of diagonal coefficients to wrap. */ EIGEN_DEVICE_FUNC explicit inline DiagonalWrapper(DiagonalVectorType& a_diagonal) : m_diagonal(a_diagonal) {} /** \returns a const reference to the wrapped expression of diagonal coefficients. */ EIGEN_DEVICE_FUNC const DiagonalVectorType& diagonal() const { return m_diagonal; } protected: typename DiagonalVectorType::Nested m_diagonal; }; /** \returns a pseudo-expression of a diagonal matrix with *this as vector of diagonal coefficients * * \only_for_vectors * * Example: \include MatrixBase_asDiagonal.cpp * Output: \verbinclude MatrixBase_asDiagonal.out * * \sa class DiagonalWrapper, class DiagonalMatrix, diagonal(), isDiagonal() **/ template EIGEN_DEVICE_FUNC inline const DiagonalWrapper MatrixBase::asDiagonal() const { return DiagonalWrapper(derived()); } /** \returns true if *this is approximately equal to a diagonal matrix, * within the precision given by \a prec. * * Example: \include MatrixBase_isDiagonal.cpp * Output: \verbinclude MatrixBase_isDiagonal.out * * \sa asDiagonal() */ template bool MatrixBase::isDiagonal(const RealScalar& prec) const { if(cols() != rows()) return false; RealScalar maxAbsOnDiagonal = static_cast(-1); for(Index j = 0; j < cols(); ++j) { RealScalar absOnDiagonal = numext::abs(coeff(j,j)); if(absOnDiagonal > maxAbsOnDiagonal) maxAbsOnDiagonal = absOnDiagonal; } for(Index j = 0; j < cols(); ++j) for(Index i = 0; i < j; ++i) { if(!internal::isMuchSmallerThan(coeff(i, j), maxAbsOnDiagonal, prec)) return false; if(!internal::isMuchSmallerThan(coeff(j, i), maxAbsOnDiagonal, prec)) return false; } return true; } namespace internal { template<> struct storage_kind_to_shape { typedef DiagonalShape Shape; }; struct Diagonal2Dense {}; template<> struct AssignmentKind { typedef Diagonal2Dense Kind; }; // Diagonal matrix to Dense assignment template< typename DstXprType, typename SrcXprType, typename Functor> struct Assignment { static void run(DstXprType &dst, const SrcXprType &src, const internal::assign_op &/*func*/) { Index dstRows = src.rows(); Index dstCols = src.cols(); if((dst.rows()!=dstRows) || (dst.cols()!=dstCols)) dst.resize(dstRows, dstCols); dst.setZero(); dst.diagonal() = src.diagonal(); } static void run(DstXprType &dst, const SrcXprType &src, const internal::add_assign_op &/*func*/) { dst.diagonal() += src.diagonal(); } static void run(DstXprType &dst, const SrcXprType &src, const internal::sub_assign_op &/*func*/) { dst.diagonal() -= src.diagonal(); } }; } // namespace internal } // end namespace Eigen #endif // EIGEN_DIAGONALMATRIX_H RcppEigen/inst/include/Eigen/src/Core/Redux.h0000644000176200001440000004537314562417221020514 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008 Gael Guennebaud // Copyright (C) 2006-2008 Benoit Jacob // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_REDUX_H #define EIGEN_REDUX_H namespace Eigen { namespace internal { // TODO // * implement other kind of vectorization // * factorize code /*************************************************************************** * Part 1 : the logic deciding a strategy for vectorization and unrolling ***************************************************************************/ template struct redux_traits { public: typedef typename find_best_packet::type PacketType; enum { PacketSize = unpacket_traits::size, InnerMaxSize = int(Evaluator::IsRowMajor) ? Evaluator::MaxColsAtCompileTime : Evaluator::MaxRowsAtCompileTime, OuterMaxSize = int(Evaluator::IsRowMajor) ? Evaluator::MaxRowsAtCompileTime : Evaluator::MaxColsAtCompileTime, SliceVectorizedWork = int(InnerMaxSize)==Dynamic ? Dynamic : int(OuterMaxSize)==Dynamic ? (int(InnerMaxSize)>=int(PacketSize) ? Dynamic : 0) : (int(InnerMaxSize)/int(PacketSize)) * int(OuterMaxSize) }; enum { MightVectorize = (int(Evaluator::Flags)&ActualPacketAccessBit) && (functor_traits::PacketAccess), MayLinearVectorize = bool(MightVectorize) && (int(Evaluator::Flags)&LinearAccessBit), MaySliceVectorize = bool(MightVectorize) && (int(SliceVectorizedWork)==Dynamic || int(SliceVectorizedWork)>=3) }; public: enum { Traversal = int(MayLinearVectorize) ? int(LinearVectorizedTraversal) : int(MaySliceVectorize) ? int(SliceVectorizedTraversal) : int(DefaultTraversal) }; public: enum { Cost = Evaluator::SizeAtCompileTime == Dynamic ? HugeCost : int(Evaluator::SizeAtCompileTime) * int(Evaluator::CoeffReadCost) + (Evaluator::SizeAtCompileTime-1) * functor_traits::Cost, UnrollingLimit = EIGEN_UNROLLING_LIMIT * (int(Traversal) == int(DefaultTraversal) ? 1 : int(PacketSize)) }; public: enum { Unrolling = Cost <= UnrollingLimit ? CompleteUnrolling : NoUnrolling }; #ifdef EIGEN_DEBUG_ASSIGN static void debug() { std::cerr << "Xpr: " << typeid(typename Evaluator::XprType).name() << std::endl; std::cerr.setf(std::ios::hex, std::ios::basefield); EIGEN_DEBUG_VAR(Evaluator::Flags) std::cerr.unsetf(std::ios::hex); EIGEN_DEBUG_VAR(InnerMaxSize) EIGEN_DEBUG_VAR(OuterMaxSize) EIGEN_DEBUG_VAR(SliceVectorizedWork) EIGEN_DEBUG_VAR(PacketSize) EIGEN_DEBUG_VAR(MightVectorize) EIGEN_DEBUG_VAR(MayLinearVectorize) EIGEN_DEBUG_VAR(MaySliceVectorize) std::cerr << "Traversal" << " = " << Traversal << " (" << demangle_traversal(Traversal) << ")" << std::endl; EIGEN_DEBUG_VAR(UnrollingLimit) std::cerr << "Unrolling" << " = " << Unrolling << " (" << demangle_unrolling(Unrolling) << ")" << std::endl; std::cerr << std::endl; } #endif }; /*************************************************************************** * Part 2 : unrollers ***************************************************************************/ /*** no vectorization ***/ template struct redux_novec_unroller { enum { HalfLength = Length/2 }; typedef typename Evaluator::Scalar Scalar; EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Scalar run(const Evaluator &eval, const Func& func) { return func(redux_novec_unroller::run(eval,func), redux_novec_unroller::run(eval,func)); } }; template struct redux_novec_unroller { enum { outer = Start / Evaluator::InnerSizeAtCompileTime, inner = Start % Evaluator::InnerSizeAtCompileTime }; typedef typename Evaluator::Scalar Scalar; EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Scalar run(const Evaluator &eval, const Func&) { return eval.coeffByOuterInner(outer, inner); } }; // This is actually dead code and will never be called. It is required // to prevent false warnings regarding failed inlining though // for 0 length run() will never be called at all. template struct redux_novec_unroller { typedef typename Evaluator::Scalar Scalar; EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Scalar run(const Evaluator&, const Func&) { return Scalar(); } }; /*** vectorization ***/ template struct redux_vec_unroller { template EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE PacketType run(const Evaluator &eval, const Func& func) { enum { PacketSize = unpacket_traits::size, HalfLength = Length/2 }; return func.packetOp( redux_vec_unroller::template run(eval,func), redux_vec_unroller::template run(eval,func) ); } }; template struct redux_vec_unroller { template EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE PacketType run(const Evaluator &eval, const Func&) { enum { PacketSize = unpacket_traits::size, index = Start * PacketSize, outer = index / int(Evaluator::InnerSizeAtCompileTime), inner = index % int(Evaluator::InnerSizeAtCompileTime), alignment = Evaluator::Alignment }; return eval.template packetByOuterInner(outer, inner); } }; /*************************************************************************** * Part 3 : implementation of all cases ***************************************************************************/ template::Traversal, int Unrolling = redux_traits::Unrolling > struct redux_impl; template struct redux_impl { typedef typename Evaluator::Scalar Scalar; template EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Scalar run(const Evaluator &eval, const Func& func, const XprType& xpr) { eigen_assert(xpr.rows()>0 && xpr.cols()>0 && "you are using an empty matrix"); Scalar res; res = eval.coeffByOuterInner(0, 0); for(Index i = 1; i < xpr.innerSize(); ++i) res = func(res, eval.coeffByOuterInner(0, i)); for(Index i = 1; i < xpr.outerSize(); ++i) for(Index j = 0; j < xpr.innerSize(); ++j) res = func(res, eval.coeffByOuterInner(i, j)); return res; } }; template struct redux_impl : redux_novec_unroller { typedef redux_novec_unroller Base; typedef typename Evaluator::Scalar Scalar; template EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Scalar run(const Evaluator &eval, const Func& func, const XprType& /*xpr*/) { return Base::run(eval,func); } }; template struct redux_impl { typedef typename Evaluator::Scalar Scalar; typedef typename redux_traits::PacketType PacketScalar; template static Scalar run(const Evaluator &eval, const Func& func, const XprType& xpr) { const Index size = xpr.size(); const Index packetSize = redux_traits::PacketSize; const int packetAlignment = unpacket_traits::alignment; enum { alignment0 = (bool(Evaluator::Flags & DirectAccessBit) && bool(packet_traits::AlignedOnScalar)) ? int(packetAlignment) : int(Unaligned), alignment = EIGEN_PLAIN_ENUM_MAX(alignment0, Evaluator::Alignment) }; const Index alignedStart = internal::first_default_aligned(xpr); const Index alignedSize2 = ((size-alignedStart)/(2*packetSize))*(2*packetSize); const Index alignedSize = ((size-alignedStart)/(packetSize))*(packetSize); const Index alignedEnd2 = alignedStart + alignedSize2; const Index alignedEnd = alignedStart + alignedSize; Scalar res; if(alignedSize) { PacketScalar packet_res0 = eval.template packet(alignedStart); if(alignedSize>packetSize) // we have at least two packets to partly unroll the loop { PacketScalar packet_res1 = eval.template packet(alignedStart+packetSize); for(Index index = alignedStart + 2*packetSize; index < alignedEnd2; index += 2*packetSize) { packet_res0 = func.packetOp(packet_res0, eval.template packet(index)); packet_res1 = func.packetOp(packet_res1, eval.template packet(index+packetSize)); } packet_res0 = func.packetOp(packet_res0,packet_res1); if(alignedEnd>alignedEnd2) packet_res0 = func.packetOp(packet_res0, eval.template packet(alignedEnd2)); } res = func.predux(packet_res0); for(Index index = 0; index < alignedStart; ++index) res = func(res,eval.coeff(index)); for(Index index = alignedEnd; index < size; ++index) res = func(res,eval.coeff(index)); } else // too small to vectorize anything. // since this is dynamic-size hence inefficient anyway for such small sizes, don't try to optimize. { res = eval.coeff(0); for(Index index = 1; index < size; ++index) res = func(res,eval.coeff(index)); } return res; } }; // NOTE: for SliceVectorizedTraversal we simply bypass unrolling template struct redux_impl { typedef typename Evaluator::Scalar Scalar; typedef typename redux_traits::PacketType PacketType; template EIGEN_DEVICE_FUNC static Scalar run(const Evaluator &eval, const Func& func, const XprType& xpr) { eigen_assert(xpr.rows()>0 && xpr.cols()>0 && "you are using an empty matrix"); const Index innerSize = xpr.innerSize(); const Index outerSize = xpr.outerSize(); enum { packetSize = redux_traits::PacketSize }; const Index packetedInnerSize = ((innerSize)/packetSize)*packetSize; Scalar res; if(packetedInnerSize) { PacketType packet_res = eval.template packet(0,0); for(Index j=0; j(j,i)); res = func.predux(packet_res); for(Index j=0; j::run(eval, func, xpr); } return res; } }; template struct redux_impl { typedef typename Evaluator::Scalar Scalar; typedef typename redux_traits::PacketType PacketType; enum { PacketSize = redux_traits::PacketSize, Size = Evaluator::SizeAtCompileTime, VectorizedSize = (int(Size) / int(PacketSize)) * int(PacketSize) }; template EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Scalar run(const Evaluator &eval, const Func& func, const XprType &xpr) { EIGEN_ONLY_USED_FOR_DEBUG(xpr) eigen_assert(xpr.rows()>0 && xpr.cols()>0 && "you are using an empty matrix"); if (VectorizedSize > 0) { Scalar res = func.predux(redux_vec_unroller::template run(eval,func)); if (VectorizedSize != Size) res = func(res,redux_novec_unroller::run(eval,func)); return res; } else { return redux_novec_unroller::run(eval,func); } } }; // evaluator adaptor template class redux_evaluator : public internal::evaluator<_XprType> { typedef internal::evaluator<_XprType> Base; public: typedef _XprType XprType; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit redux_evaluator(const XprType &xpr) : Base(xpr) {} typedef typename XprType::Scalar Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; typedef typename XprType::PacketScalar PacketScalar; enum { MaxRowsAtCompileTime = XprType::MaxRowsAtCompileTime, MaxColsAtCompileTime = XprType::MaxColsAtCompileTime, // TODO we should not remove DirectAccessBit and rather find an elegant way to query the alignment offset at runtime from the evaluator Flags = Base::Flags & ~DirectAccessBit, IsRowMajor = XprType::IsRowMajor, SizeAtCompileTime = XprType::SizeAtCompileTime, InnerSizeAtCompileTime = XprType::InnerSizeAtCompileTime }; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeffByOuterInner(Index outer, Index inner) const { return Base::coeff(IsRowMajor ? outer : inner, IsRowMajor ? inner : outer); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketType packetByOuterInner(Index outer, Index inner) const { return Base::template packet(IsRowMajor ? outer : inner, IsRowMajor ? inner : outer); } }; } // end namespace internal /*************************************************************************** * Part 4 : public API ***************************************************************************/ /** \returns the result of a full redux operation on the whole matrix or vector using \a func * * The template parameter \a BinaryOp is the type of the functor \a func which must be * an associative operator. Both current C++98 and C++11 functor styles are handled. * * \warning the matrix must be not empty, otherwise an assertion is triggered. * * \sa DenseBase::sum(), DenseBase::minCoeff(), DenseBase::maxCoeff(), MatrixBase::colwise(), MatrixBase::rowwise() */ template template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename internal::traits::Scalar DenseBase::redux(const Func& func) const { eigen_assert(this->rows()>0 && this->cols()>0 && "you are using an empty matrix"); typedef typename internal::redux_evaluator ThisEvaluator; ThisEvaluator thisEval(derived()); // The initial expression is passed to the reducer as an additional argument instead of // passing it as a member of redux_evaluator to help return internal::redux_impl::run(thisEval, func, derived()); } /** \returns the minimum of all coefficients of \c *this. * In case \c *this contains NaN, NaNPropagation determines the behavior: * NaNPropagation == PropagateFast : undefined * NaNPropagation == PropagateNaN : result is NaN * NaNPropagation == PropagateNumbers : result is minimum of elements that are not NaN * \warning the matrix must be not empty, otherwise an assertion is triggered. */ template template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename internal::traits::Scalar DenseBase::minCoeff() const { return derived().redux(Eigen::internal::scalar_min_op()); } /** \returns the maximum of all coefficients of \c *this. * In case \c *this contains NaN, NaNPropagation determines the behavior: * NaNPropagation == PropagateFast : undefined * NaNPropagation == PropagateNaN : result is NaN * NaNPropagation == PropagateNumbers : result is maximum of elements that are not NaN * \warning the matrix must be not empty, otherwise an assertion is triggered. */ template template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename internal::traits::Scalar DenseBase::maxCoeff() const { return derived().redux(Eigen::internal::scalar_max_op()); } /** \returns the sum of all coefficients of \c *this * * If \c *this is empty, then the value 0 is returned. * * \sa trace(), prod(), mean() */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename internal::traits::Scalar DenseBase::sum() const { if(SizeAtCompileTime==0 || (SizeAtCompileTime==Dynamic && size()==0)) return Scalar(0); return derived().redux(Eigen::internal::scalar_sum_op()); } /** \returns the mean of all coefficients of *this * * \sa trace(), prod(), sum() */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename internal::traits::Scalar DenseBase::mean() const { #ifdef __INTEL_COMPILER #pragma warning push #pragma warning ( disable : 2259 ) #endif return Scalar(derived().redux(Eigen::internal::scalar_sum_op())) / Scalar(this->size()); #ifdef __INTEL_COMPILER #pragma warning pop #endif } /** \returns the product of all coefficients of *this * * Example: \include MatrixBase_prod.cpp * Output: \verbinclude MatrixBase_prod.out * * \sa sum(), mean(), trace() */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename internal::traits::Scalar DenseBase::prod() const { if(SizeAtCompileTime==0 || (SizeAtCompileTime==Dynamic && size()==0)) return Scalar(1); return derived().redux(Eigen::internal::scalar_product_op()); } /** \returns the trace of \c *this, i.e. the sum of the coefficients on the main diagonal. * * \c *this can be any matrix, not necessarily square. * * \sa diagonal(), sum() */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename internal::traits::Scalar MatrixBase::trace() const { return derived().diagonal().sum(); } } // end namespace Eigen #endif // EIGEN_REDUX_H RcppEigen/inst/include/Eigen/src/Core/CwiseNullaryOp.h0000644000176200001440000010667214562417221022345 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2010 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CWISE_NULLARY_OP_H #define EIGEN_CWISE_NULLARY_OP_H namespace Eigen { namespace internal { template struct traits > : traits { enum { Flags = traits::Flags & RowMajorBit }; }; } // namespace internal /** \class CwiseNullaryOp * \ingroup Core_Module * * \brief Generic expression of a matrix where all coefficients are defined by a functor * * \tparam NullaryOp template functor implementing the operator * \tparam PlainObjectType the underlying plain matrix/array type * * This class represents an expression of a generic nullary operator. * It is the return type of the Ones(), Zero(), Constant(), Identity() and Random() methods, * and most of the time this is the only way it is used. * * However, if you want to write a function returning such an expression, you * will need to use this class. * * The functor NullaryOp must expose one of the following method:
\c operator()() if the procedural generation does not depend on the coefficient entries (e.g., random numbers)
\c operator()(Index i)if the procedural generation makes sense for vectors only and that it depends on the coefficient index \c i (e.g., linspace)
\c operator()(Index i,Index j)if the procedural generation depends on the matrix coordinates \c i, \c j (e.g., to generate a checkerboard with 0 and 1)
* It is also possible to expose the last two operators if the generation makes sense for matrices but can be optimized for vectors. * * See DenseBase::NullaryExpr(Index,const CustomNullaryOp&) for an example binding * C++11 random number generators. * * A nullary expression can also be used to implement custom sophisticated matrix manipulations * that cannot be covered by the existing set of natively supported matrix manipulations. * See this \ref TopicCustomizing_NullaryExpr "page" for some examples and additional explanations * on the behavior of CwiseNullaryOp. * * \sa class CwiseUnaryOp, class CwiseBinaryOp, DenseBase::NullaryExpr */ template class CwiseNullaryOp : public internal::dense_xpr_base< CwiseNullaryOp >::type, internal::no_assignment_operator { public: typedef typename internal::dense_xpr_base::type Base; EIGEN_DENSE_PUBLIC_INTERFACE(CwiseNullaryOp) EIGEN_DEVICE_FUNC CwiseNullaryOp(Index rows, Index cols, const NullaryOp& func = NullaryOp()) : m_rows(rows), m_cols(cols), m_functor(func) { eigen_assert(rows >= 0 && (RowsAtCompileTime == Dynamic || RowsAtCompileTime == rows) && cols >= 0 && (ColsAtCompileTime == Dynamic || ColsAtCompileTime == cols)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR Index rows() const { return m_rows.value(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR Index cols() const { return m_cols.value(); } /** \returns the functor representing the nullary operation */ EIGEN_DEVICE_FUNC const NullaryOp& functor() const { return m_functor; } protected: const internal::variable_if_dynamic m_rows; const internal::variable_if_dynamic m_cols; const NullaryOp m_functor; }; /** \returns an expression of a matrix defined by a custom functor \a func * * The parameters \a rows and \a cols are the number of rows and of columns of * the returned matrix. Must be compatible with this MatrixBase type. * * This variant is meant to be used for dynamic-size matrix types. For fixed-size types, * it is redundant to pass \a rows and \a cols as arguments, so Zero() should be used * instead. * * The template parameter \a CustomNullaryOp is the type of the functor. * * \sa class CwiseNullaryOp */ template template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE #ifndef EIGEN_PARSED_BY_DOXYGEN const CwiseNullaryOp::PlainObject> #else const CwiseNullaryOp #endif DenseBase::NullaryExpr(Index rows, Index cols, const CustomNullaryOp& func) { return CwiseNullaryOp(rows, cols, func); } /** \returns an expression of a matrix defined by a custom functor \a func * * The parameter \a size is the size of the returned vector. * Must be compatible with this MatrixBase type. * * \only_for_vectors * * This variant is meant to be used for dynamic-size vector types. For fixed-size types, * it is redundant to pass \a size as argument, so Zero() should be used * instead. * * The template parameter \a CustomNullaryOp is the type of the functor. * * Here is an example with C++11 random generators: \include random_cpp11.cpp * Output: \verbinclude random_cpp11.out * * \sa class CwiseNullaryOp */ template template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE #ifndef EIGEN_PARSED_BY_DOXYGEN const CwiseNullaryOp::PlainObject> #else const CwiseNullaryOp #endif DenseBase::NullaryExpr(Index size, const CustomNullaryOp& func) { EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived) if(RowsAtCompileTime == 1) return CwiseNullaryOp(1, size, func); else return CwiseNullaryOp(size, 1, func); } /** \returns an expression of a matrix defined by a custom functor \a func * * This variant is only for fixed-size DenseBase types. For dynamic-size types, you * need to use the variants taking size arguments. * * The template parameter \a CustomNullaryOp is the type of the functor. * * \sa class CwiseNullaryOp */ template template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE #ifndef EIGEN_PARSED_BY_DOXYGEN const CwiseNullaryOp::PlainObject> #else const CwiseNullaryOp #endif DenseBase::NullaryExpr(const CustomNullaryOp& func) { return CwiseNullaryOp(RowsAtCompileTime, ColsAtCompileTime, func); } /** \returns an expression of a constant matrix of value \a value * * The parameters \a rows and \a cols are the number of rows and of columns of * the returned matrix. Must be compatible with this DenseBase type. * * This variant is meant to be used for dynamic-size matrix types. For fixed-size types, * it is redundant to pass \a rows and \a cols as arguments, so Zero() should be used * instead. * * The template parameter \a CustomNullaryOp is the type of the functor. * * \sa class CwiseNullaryOp */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename DenseBase::ConstantReturnType DenseBase::Constant(Index rows, Index cols, const Scalar& value) { return DenseBase::NullaryExpr(rows, cols, internal::scalar_constant_op(value)); } /** \returns an expression of a constant matrix of value \a value * * The parameter \a size is the size of the returned vector. * Must be compatible with this DenseBase type. * * \only_for_vectors * * This variant is meant to be used for dynamic-size vector types. For fixed-size types, * it is redundant to pass \a size as argument, so Zero() should be used * instead. * * The template parameter \a CustomNullaryOp is the type of the functor. * * \sa class CwiseNullaryOp */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename DenseBase::ConstantReturnType DenseBase::Constant(Index size, const Scalar& value) { return DenseBase::NullaryExpr(size, internal::scalar_constant_op(value)); } /** \returns an expression of a constant matrix of value \a value * * This variant is only for fixed-size DenseBase types. For dynamic-size types, you * need to use the variants taking size arguments. * * The template parameter \a CustomNullaryOp is the type of the functor. * * \sa class CwiseNullaryOp */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename DenseBase::ConstantReturnType DenseBase::Constant(const Scalar& value) { EIGEN_STATIC_ASSERT_FIXED_SIZE(Derived) return DenseBase::NullaryExpr(RowsAtCompileTime, ColsAtCompileTime, internal::scalar_constant_op(value)); } /** \deprecated because of accuracy loss. In Eigen 3.3, it is an alias for LinSpaced(Index,const Scalar&,const Scalar&) * * \only_for_vectors * * Example: \include DenseBase_LinSpaced_seq_deprecated.cpp * Output: \verbinclude DenseBase_LinSpaced_seq_deprecated.out * * \sa LinSpaced(Index,const Scalar&, const Scalar&), setLinSpaced(Index,const Scalar&,const Scalar&) */ template EIGEN_DEPRECATED EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename DenseBase::RandomAccessLinSpacedReturnType DenseBase::LinSpaced(Sequential_t, Index size, const Scalar& low, const Scalar& high) { EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived) return DenseBase::NullaryExpr(size, internal::linspaced_op(low,high,size)); } /** \deprecated because of accuracy loss. In Eigen 3.3, it is an alias for LinSpaced(const Scalar&,const Scalar&) * * \sa LinSpaced(const Scalar&, const Scalar&) */ template EIGEN_DEPRECATED EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename DenseBase::RandomAccessLinSpacedReturnType DenseBase::LinSpaced(Sequential_t, const Scalar& low, const Scalar& high) { EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived) EIGEN_STATIC_ASSERT_FIXED_SIZE(Derived) return DenseBase::NullaryExpr(Derived::SizeAtCompileTime, internal::linspaced_op(low,high,Derived::SizeAtCompileTime)); } /** * \brief Sets a linearly spaced vector. * * The function generates 'size' equally spaced values in the closed interval [low,high]. * When size is set to 1, a vector of length 1 containing 'high' is returned. * * \only_for_vectors * * Example: \include DenseBase_LinSpaced.cpp * Output: \verbinclude DenseBase_LinSpaced.out * * For integer scalar types, an even spacing is possible if and only if the length of the range, * i.e., \c high-low is a scalar multiple of \c size-1, or if \c size is a scalar multiple of the * number of values \c high-low+1 (meaning each value can be repeated the same number of time). * If one of these two considions is not satisfied, then \c high is lowered to the largest value * satisfying one of this constraint. * Here are some examples: * * Example: \include DenseBase_LinSpacedInt.cpp * Output: \verbinclude DenseBase_LinSpacedInt.out * * \sa setLinSpaced(Index,const Scalar&,const Scalar&), CwiseNullaryOp */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename DenseBase::RandomAccessLinSpacedReturnType DenseBase::LinSpaced(Index size, const Scalar& low, const Scalar& high) { EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived) return DenseBase::NullaryExpr(size, internal::linspaced_op(low,high,size)); } /** * \copydoc DenseBase::LinSpaced(Index, const Scalar&, const Scalar&) * Special version for fixed size types which does not require the size parameter. */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename DenseBase::RandomAccessLinSpacedReturnType DenseBase::LinSpaced(const Scalar& low, const Scalar& high) { EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived) EIGEN_STATIC_ASSERT_FIXED_SIZE(Derived) return DenseBase::NullaryExpr(Derived::SizeAtCompileTime, internal::linspaced_op(low,high,Derived::SizeAtCompileTime)); } /** \returns true if all coefficients in this matrix are approximately equal to \a val, to within precision \a prec */ template EIGEN_DEVICE_FUNC bool DenseBase::isApproxToConstant (const Scalar& val, const RealScalar& prec) const { typename internal::nested_eval::type self(derived()); for(Index j = 0; j < cols(); ++j) for(Index i = 0; i < rows(); ++i) if(!internal::isApprox(self.coeff(i, j), val, prec)) return false; return true; } /** This is just an alias for isApproxToConstant(). * * \returns true if all coefficients in this matrix are approximately equal to \a value, to within precision \a prec */ template EIGEN_DEVICE_FUNC bool DenseBase::isConstant (const Scalar& val, const RealScalar& prec) const { return isApproxToConstant(val, prec); } /** Alias for setConstant(): sets all coefficients in this expression to \a val. * * \sa setConstant(), Constant(), class CwiseNullaryOp */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void DenseBase::fill(const Scalar& val) { setConstant(val); } /** Sets all coefficients in this expression to value \a val. * * \sa fill(), setConstant(Index,const Scalar&), setConstant(Index,Index,const Scalar&), setZero(), setOnes(), Constant(), class CwiseNullaryOp, setZero(), setOnes() */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& DenseBase::setConstant(const Scalar& val) { return derived() = Constant(rows(), cols(), val); } /** Resizes to the given \a size, and sets all coefficients in this expression to the given value \a val. * * \only_for_vectors * * Example: \include Matrix_setConstant_int.cpp * Output: \verbinclude Matrix_setConstant_int.out * * \sa MatrixBase::setConstant(const Scalar&), setConstant(Index,Index,const Scalar&), class CwiseNullaryOp, MatrixBase::Constant(const Scalar&) */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& PlainObjectBase::setConstant(Index size, const Scalar& val) { resize(size); return setConstant(val); } /** Resizes to the given size, and sets all coefficients in this expression to the given value \a val. * * \param rows the new number of rows * \param cols the new number of columns * \param val the value to which all coefficients are set * * Example: \include Matrix_setConstant_int_int.cpp * Output: \verbinclude Matrix_setConstant_int_int.out * * \sa MatrixBase::setConstant(const Scalar&), setConstant(Index,const Scalar&), class CwiseNullaryOp, MatrixBase::Constant(const Scalar&) */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& PlainObjectBase::setConstant(Index rows, Index cols, const Scalar& val) { resize(rows, cols); return setConstant(val); } /** Resizes to the given size, changing only the number of columns, and sets all * coefficients in this expression to the given value \a val. For the parameter * of type NoChange_t, just pass the special value \c NoChange. * * \sa MatrixBase::setConstant(const Scalar&), setConstant(Index,const Scalar&), class CwiseNullaryOp, MatrixBase::Constant(const Scalar&) */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& PlainObjectBase::setConstant(NoChange_t, Index cols, const Scalar& val) { return setConstant(rows(), cols, val); } /** Resizes to the given size, changing only the number of rows, and sets all * coefficients in this expression to the given value \a val. For the parameter * of type NoChange_t, just pass the special value \c NoChange. * * \sa MatrixBase::setConstant(const Scalar&), setConstant(Index,const Scalar&), class CwiseNullaryOp, MatrixBase::Constant(const Scalar&) */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& PlainObjectBase::setConstant(Index rows, NoChange_t, const Scalar& val) { return setConstant(rows, cols(), val); } /** * \brief Sets a linearly spaced vector. * * The function generates 'size' equally spaced values in the closed interval [low,high]. * When size is set to 1, a vector of length 1 containing 'high' is returned. * * \only_for_vectors * * Example: \include DenseBase_setLinSpaced.cpp * Output: \verbinclude DenseBase_setLinSpaced.out * * For integer scalar types, do not miss the explanations on the definition * of \link LinSpaced(Index,const Scalar&,const Scalar&) even spacing \endlink. * * \sa LinSpaced(Index,const Scalar&,const Scalar&), CwiseNullaryOp */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& DenseBase::setLinSpaced(Index newSize, const Scalar& low, const Scalar& high) { EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived) return derived() = Derived::NullaryExpr(newSize, internal::linspaced_op(low,high,newSize)); } /** * \brief Sets a linearly spaced vector. * * The function fills \c *this with equally spaced values in the closed interval [low,high]. * When size is set to 1, a vector of length 1 containing 'high' is returned. * * \only_for_vectors * * For integer scalar types, do not miss the explanations on the definition * of \link LinSpaced(Index,const Scalar&,const Scalar&) even spacing \endlink. * * \sa LinSpaced(Index,const Scalar&,const Scalar&), setLinSpaced(Index, const Scalar&, const Scalar&), CwiseNullaryOp */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& DenseBase::setLinSpaced(const Scalar& low, const Scalar& high) { EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived) return setLinSpaced(size(), low, high); } // zero: /** \returns an expression of a zero matrix. * * The parameters \a rows and \a cols are the number of rows and of columns of * the returned matrix. Must be compatible with this MatrixBase type. * * This variant is meant to be used for dynamic-size matrix types. For fixed-size types, * it is redundant to pass \a rows and \a cols as arguments, so Zero() should be used * instead. * * Example: \include MatrixBase_zero_int_int.cpp * Output: \verbinclude MatrixBase_zero_int_int.out * * \sa Zero(), Zero(Index) */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename DenseBase::ConstantReturnType DenseBase::Zero(Index rows, Index cols) { return Constant(rows, cols, Scalar(0)); } /** \returns an expression of a zero vector. * * The parameter \a size is the size of the returned vector. * Must be compatible with this MatrixBase type. * * \only_for_vectors * * This variant is meant to be used for dynamic-size vector types. For fixed-size types, * it is redundant to pass \a size as argument, so Zero() should be used * instead. * * Example: \include MatrixBase_zero_int.cpp * Output: \verbinclude MatrixBase_zero_int.out * * \sa Zero(), Zero(Index,Index) */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename DenseBase::ConstantReturnType DenseBase::Zero(Index size) { return Constant(size, Scalar(0)); } /** \returns an expression of a fixed-size zero matrix or vector. * * This variant is only for fixed-size MatrixBase types. For dynamic-size types, you * need to use the variants taking size arguments. * * Example: \include MatrixBase_zero.cpp * Output: \verbinclude MatrixBase_zero.out * * \sa Zero(Index), Zero(Index,Index) */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename DenseBase::ConstantReturnType DenseBase::Zero() { return Constant(Scalar(0)); } /** \returns true if *this is approximately equal to the zero matrix, * within the precision given by \a prec. * * Example: \include MatrixBase_isZero.cpp * Output: \verbinclude MatrixBase_isZero.out * * \sa class CwiseNullaryOp, Zero() */ template EIGEN_DEVICE_FUNC bool DenseBase::isZero(const RealScalar& prec) const { typename internal::nested_eval::type self(derived()); for(Index j = 0; j < cols(); ++j) for(Index i = 0; i < rows(); ++i) if(!internal::isMuchSmallerThan(self.coeff(i, j), static_cast(1), prec)) return false; return true; } /** Sets all coefficients in this expression to zero. * * Example: \include MatrixBase_setZero.cpp * Output: \verbinclude MatrixBase_setZero.out * * \sa class CwiseNullaryOp, Zero() */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& DenseBase::setZero() { return setConstant(Scalar(0)); } /** Resizes to the given \a size, and sets all coefficients in this expression to zero. * * \only_for_vectors * * Example: \include Matrix_setZero_int.cpp * Output: \verbinclude Matrix_setZero_int.out * * \sa DenseBase::setZero(), setZero(Index,Index), class CwiseNullaryOp, DenseBase::Zero() */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& PlainObjectBase::setZero(Index newSize) { resize(newSize); return setConstant(Scalar(0)); } /** Resizes to the given size, and sets all coefficients in this expression to zero. * * \param rows the new number of rows * \param cols the new number of columns * * Example: \include Matrix_setZero_int_int.cpp * Output: \verbinclude Matrix_setZero_int_int.out * * \sa DenseBase::setZero(), setZero(Index), class CwiseNullaryOp, DenseBase::Zero() */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& PlainObjectBase::setZero(Index rows, Index cols) { resize(rows, cols); return setConstant(Scalar(0)); } /** Resizes to the given size, changing only the number of columns, and sets all * coefficients in this expression to zero. For the parameter of type NoChange_t, * just pass the special value \c NoChange. * * \sa DenseBase::setZero(), setZero(Index), setZero(Index, Index), setZero(Index, NoChange_t), class CwiseNullaryOp, DenseBase::Zero() */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& PlainObjectBase::setZero(NoChange_t, Index cols) { return setZero(rows(), cols); } /** Resizes to the given size, changing only the number of rows, and sets all * coefficients in this expression to zero. For the parameter of type NoChange_t, * just pass the special value \c NoChange. * * \sa DenseBase::setZero(), setZero(Index), setZero(Index, Index), setZero(NoChange_t, Index), class CwiseNullaryOp, DenseBase::Zero() */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& PlainObjectBase::setZero(Index rows, NoChange_t) { return setZero(rows, cols()); } // ones: /** \returns an expression of a matrix where all coefficients equal one. * * The parameters \a rows and \a cols are the number of rows and of columns of * the returned matrix. Must be compatible with this MatrixBase type. * * This variant is meant to be used for dynamic-size matrix types. For fixed-size types, * it is redundant to pass \a rows and \a cols as arguments, so Ones() should be used * instead. * * Example: \include MatrixBase_ones_int_int.cpp * Output: \verbinclude MatrixBase_ones_int_int.out * * \sa Ones(), Ones(Index), isOnes(), class Ones */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename DenseBase::ConstantReturnType DenseBase::Ones(Index rows, Index cols) { return Constant(rows, cols, Scalar(1)); } /** \returns an expression of a vector where all coefficients equal one. * * The parameter \a newSize is the size of the returned vector. * Must be compatible with this MatrixBase type. * * \only_for_vectors * * This variant is meant to be used for dynamic-size vector types. For fixed-size types, * it is redundant to pass \a size as argument, so Ones() should be used * instead. * * Example: \include MatrixBase_ones_int.cpp * Output: \verbinclude MatrixBase_ones_int.out * * \sa Ones(), Ones(Index,Index), isOnes(), class Ones */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename DenseBase::ConstantReturnType DenseBase::Ones(Index newSize) { return Constant(newSize, Scalar(1)); } /** \returns an expression of a fixed-size matrix or vector where all coefficients equal one. * * This variant is only for fixed-size MatrixBase types. For dynamic-size types, you * need to use the variants taking size arguments. * * Example: \include MatrixBase_ones.cpp * Output: \verbinclude MatrixBase_ones.out * * \sa Ones(Index), Ones(Index,Index), isOnes(), class Ones */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename DenseBase::ConstantReturnType DenseBase::Ones() { return Constant(Scalar(1)); } /** \returns true if *this is approximately equal to the matrix where all coefficients * are equal to 1, within the precision given by \a prec. * * Example: \include MatrixBase_isOnes.cpp * Output: \verbinclude MatrixBase_isOnes.out * * \sa class CwiseNullaryOp, Ones() */ template EIGEN_DEVICE_FUNC bool DenseBase::isOnes (const RealScalar& prec) const { return isApproxToConstant(Scalar(1), prec); } /** Sets all coefficients in this expression to one. * * Example: \include MatrixBase_setOnes.cpp * Output: \verbinclude MatrixBase_setOnes.out * * \sa class CwiseNullaryOp, Ones() */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& DenseBase::setOnes() { return setConstant(Scalar(1)); } /** Resizes to the given \a newSize, and sets all coefficients in this expression to one. * * \only_for_vectors * * Example: \include Matrix_setOnes_int.cpp * Output: \verbinclude Matrix_setOnes_int.out * * \sa MatrixBase::setOnes(), setOnes(Index,Index), class CwiseNullaryOp, MatrixBase::Ones() */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& PlainObjectBase::setOnes(Index newSize) { resize(newSize); return setConstant(Scalar(1)); } /** Resizes to the given size, and sets all coefficients in this expression to one. * * \param rows the new number of rows * \param cols the new number of columns * * Example: \include Matrix_setOnes_int_int.cpp * Output: \verbinclude Matrix_setOnes_int_int.out * * \sa MatrixBase::setOnes(), setOnes(Index), class CwiseNullaryOp, MatrixBase::Ones() */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& PlainObjectBase::setOnes(Index rows, Index cols) { resize(rows, cols); return setConstant(Scalar(1)); } /** Resizes to the given size, changing only the number of rows, and sets all * coefficients in this expression to one. For the parameter of type NoChange_t, * just pass the special value \c NoChange. * * \sa MatrixBase::setOnes(), setOnes(Index), setOnes(Index, Index), setOnes(NoChange_t, Index), class CwiseNullaryOp, MatrixBase::Ones() */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& PlainObjectBase::setOnes(Index rows, NoChange_t) { return setOnes(rows, cols()); } /** Resizes to the given size, changing only the number of columns, and sets all * coefficients in this expression to one. For the parameter of type NoChange_t, * just pass the special value \c NoChange. * * \sa MatrixBase::setOnes(), setOnes(Index), setOnes(Index, Index), setOnes(Index, NoChange_t) class CwiseNullaryOp, MatrixBase::Ones() */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& PlainObjectBase::setOnes(NoChange_t, Index cols) { return setOnes(rows(), cols); } // Identity: /** \returns an expression of the identity matrix (not necessarily square). * * The parameters \a rows and \a cols are the number of rows and of columns of * the returned matrix. Must be compatible with this MatrixBase type. * * This variant is meant to be used for dynamic-size matrix types. For fixed-size types, * it is redundant to pass \a rows and \a cols as arguments, so Identity() should be used * instead. * * Example: \include MatrixBase_identity_int_int.cpp * Output: \verbinclude MatrixBase_identity_int_int.out * * \sa Identity(), setIdentity(), isIdentity() */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename MatrixBase::IdentityReturnType MatrixBase::Identity(Index rows, Index cols) { return DenseBase::NullaryExpr(rows, cols, internal::scalar_identity_op()); } /** \returns an expression of the identity matrix (not necessarily square). * * This variant is only for fixed-size MatrixBase types. For dynamic-size types, you * need to use the variant taking size arguments. * * Example: \include MatrixBase_identity.cpp * Output: \verbinclude MatrixBase_identity.out * * \sa Identity(Index,Index), setIdentity(), isIdentity() */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename MatrixBase::IdentityReturnType MatrixBase::Identity() { EIGEN_STATIC_ASSERT_FIXED_SIZE(Derived) return MatrixBase::NullaryExpr(RowsAtCompileTime, ColsAtCompileTime, internal::scalar_identity_op()); } /** \returns true if *this is approximately equal to the identity matrix * (not necessarily square), * within the precision given by \a prec. * * Example: \include MatrixBase_isIdentity.cpp * Output: \verbinclude MatrixBase_isIdentity.out * * \sa class CwiseNullaryOp, Identity(), Identity(Index,Index), setIdentity() */ template bool MatrixBase::isIdentity (const RealScalar& prec) const { typename internal::nested_eval::type self(derived()); for(Index j = 0; j < cols(); ++j) { for(Index i = 0; i < rows(); ++i) { if(i == j) { if(!internal::isApprox(self.coeff(i, j), static_cast(1), prec)) return false; } else { if(!internal::isMuchSmallerThan(self.coeff(i, j), static_cast(1), prec)) return false; } } } return true; } namespace internal { template=16)> struct setIdentity_impl { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Derived& run(Derived& m) { return m = Derived::Identity(m.rows(), m.cols()); } }; template struct setIdentity_impl { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE Derived& run(Derived& m) { m.setZero(); const Index size = numext::mini(m.rows(), m.cols()); for(Index i = 0; i < size; ++i) m.coeffRef(i,i) = typename Derived::Scalar(1); return m; } }; } // end namespace internal /** Writes the identity expression (not necessarily square) into *this. * * Example: \include MatrixBase_setIdentity.cpp * Output: \verbinclude MatrixBase_setIdentity.out * * \sa class CwiseNullaryOp, Identity(), Identity(Index,Index), isIdentity() */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& MatrixBase::setIdentity() { return internal::setIdentity_impl::run(derived()); } /** \brief Resizes to the given size, and writes the identity expression (not necessarily square) into *this. * * \param rows the new number of rows * \param cols the new number of columns * * Example: \include Matrix_setIdentity_int_int.cpp * Output: \verbinclude Matrix_setIdentity_int_int.out * * \sa MatrixBase::setIdentity(), class CwiseNullaryOp, MatrixBase::Identity() */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& MatrixBase::setIdentity(Index rows, Index cols) { derived().resize(rows, cols); return setIdentity(); } /** \returns an expression of the i-th unit (basis) vector. * * \only_for_vectors * * \sa MatrixBase::Unit(Index), MatrixBase::UnitX(), MatrixBase::UnitY(), MatrixBase::UnitZ(), MatrixBase::UnitW() */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename MatrixBase::BasisReturnType MatrixBase::Unit(Index newSize, Index i) { EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived) return BasisReturnType(SquareMatrixType::Identity(newSize,newSize), i); } /** \returns an expression of the i-th unit (basis) vector. * * \only_for_vectors * * This variant is for fixed-size vector only. * * \sa MatrixBase::Unit(Index,Index), MatrixBase::UnitX(), MatrixBase::UnitY(), MatrixBase::UnitZ(), MatrixBase::UnitW() */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename MatrixBase::BasisReturnType MatrixBase::Unit(Index i) { EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived) return BasisReturnType(SquareMatrixType::Identity(),i); } /** \returns an expression of the X axis unit vector (1{,0}^*) * * \only_for_vectors * * \sa MatrixBase::Unit(Index,Index), MatrixBase::Unit(Index), MatrixBase::UnitY(), MatrixBase::UnitZ(), MatrixBase::UnitW() */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename MatrixBase::BasisReturnType MatrixBase::UnitX() { return Derived::Unit(0); } /** \returns an expression of the Y axis unit vector (0,1{,0}^*) * * \only_for_vectors * * \sa MatrixBase::Unit(Index,Index), MatrixBase::Unit(Index), MatrixBase::UnitY(), MatrixBase::UnitZ(), MatrixBase::UnitW() */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename MatrixBase::BasisReturnType MatrixBase::UnitY() { return Derived::Unit(1); } /** \returns an expression of the Z axis unit vector (0,0,1{,0}^*) * * \only_for_vectors * * \sa MatrixBase::Unit(Index,Index), MatrixBase::Unit(Index), MatrixBase::UnitY(), MatrixBase::UnitZ(), MatrixBase::UnitW() */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename MatrixBase::BasisReturnType MatrixBase::UnitZ() { return Derived::Unit(2); } /** \returns an expression of the W axis unit vector (0,0,0,1) * * \only_for_vectors * * \sa MatrixBase::Unit(Index,Index), MatrixBase::Unit(Index), MatrixBase::UnitY(), MatrixBase::UnitZ(), MatrixBase::UnitW() */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename MatrixBase::BasisReturnType MatrixBase::UnitW() { return Derived::Unit(3); } /** \brief Set the coefficients of \c *this to the i-th unit (basis) vector * * \param i index of the unique coefficient to be set to 1 * * \only_for_vectors * * \sa MatrixBase::setIdentity(), class CwiseNullaryOp, MatrixBase::Unit(Index,Index) */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& MatrixBase::setUnit(Index i) { EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived); eigen_assert(i EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& MatrixBase::setUnit(Index newSize, Index i) { EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived); eigen_assert(i // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_INVERSE_H #define EIGEN_INVERSE_H namespace Eigen { template class InverseImpl; namespace internal { template struct traits > : traits { typedef typename XprType::PlainObject PlainObject; typedef traits BaseTraits; enum { Flags = BaseTraits::Flags & RowMajorBit }; }; } // end namespace internal /** \class Inverse * * \brief Expression of the inverse of another expression * * \tparam XprType the type of the expression we are taking the inverse * * This class represents an abstract expression of A.inverse() * and most of the time this is the only way it is used. * */ template class Inverse : public InverseImpl::StorageKind> { public: typedef typename XprType::StorageIndex StorageIndex; typedef typename XprType::Scalar Scalar; typedef typename internal::ref_selector::type XprTypeNested; typedef typename internal::remove_all::type XprTypeNestedCleaned; typedef typename internal::ref_selector::type Nested; typedef typename internal::remove_all::type NestedExpression; explicit EIGEN_DEVICE_FUNC Inverse(const XprType &xpr) : m_xpr(xpr) {} EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index rows() const EIGEN_NOEXCEPT { return m_xpr.cols(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index cols() const EIGEN_NOEXCEPT { return m_xpr.rows(); } EIGEN_DEVICE_FUNC const XprTypeNestedCleaned& nestedExpression() const { return m_xpr; } protected: XprTypeNested m_xpr; }; // Generic API dispatcher template class InverseImpl : public internal::generic_xpr_base >::type { public: typedef typename internal::generic_xpr_base >::type Base; typedef typename XprType::Scalar Scalar; private: Scalar coeff(Index row, Index col) const; Scalar coeff(Index i) const; }; namespace internal { /** \internal * \brief Default evaluator for Inverse expression. * * This default evaluator for Inverse expression simply evaluate the inverse into a temporary * by a call to internal::call_assignment_no_alias. * Therefore, inverse implementers only have to specialize Assignment, ...> for * there own nested expression. * * \sa class Inverse */ template struct unary_evaluator > : public evaluator::PlainObject> { typedef Inverse InverseType; typedef typename InverseType::PlainObject PlainObject; typedef evaluator Base; enum { Flags = Base::Flags | EvalBeforeNestingBit }; unary_evaluator(const InverseType& inv_xpr) : m_result(inv_xpr.rows(), inv_xpr.cols()) { ::new (static_cast(this)) Base(m_result); internal::call_assignment_no_alias(m_result, inv_xpr); } protected: PlainObject m_result; }; } // end namespace internal } // end namespace Eigen #endif // EIGEN_INVERSE_H RcppEigen/inst/include/Eigen/src/Core/TriangularMatrix.h0000644000176200001440000011260514562417221022713 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008 Benoit Jacob // Copyright (C) 2008-2009 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_TRIANGULARMATRIX_H #define EIGEN_TRIANGULARMATRIX_H namespace Eigen { namespace internal { template struct triangular_solve_retval; } /** \class TriangularBase * \ingroup Core_Module * * \brief Base class for triangular part in a matrix */ template class TriangularBase : public EigenBase { public: enum { Mode = internal::traits::Mode, RowsAtCompileTime = internal::traits::RowsAtCompileTime, ColsAtCompileTime = internal::traits::ColsAtCompileTime, MaxRowsAtCompileTime = internal::traits::MaxRowsAtCompileTime, MaxColsAtCompileTime = internal::traits::MaxColsAtCompileTime, SizeAtCompileTime = (internal::size_at_compile_time::RowsAtCompileTime, internal::traits::ColsAtCompileTime>::ret), /**< This is equal to the number of coefficients, i.e. the number of * rows times the number of columns, or to \a Dynamic if this is not * known at compile-time. \sa RowsAtCompileTime, ColsAtCompileTime */ MaxSizeAtCompileTime = (internal::size_at_compile_time::MaxRowsAtCompileTime, internal::traits::MaxColsAtCompileTime>::ret) }; typedef typename internal::traits::Scalar Scalar; typedef typename internal::traits::StorageKind StorageKind; typedef typename internal::traits::StorageIndex StorageIndex; typedef typename internal::traits::FullMatrixType DenseMatrixType; typedef DenseMatrixType DenseType; typedef Derived const& Nested; EIGEN_DEVICE_FUNC inline TriangularBase() { eigen_assert(!((int(Mode) & int(UnitDiag)) && (int(Mode) & int(ZeroDiag)))); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index rows() const EIGEN_NOEXCEPT { return derived().rows(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index cols() const EIGEN_NOEXCEPT { return derived().cols(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index outerStride() const EIGEN_NOEXCEPT { return derived().outerStride(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index innerStride() const EIGEN_NOEXCEPT { return derived().innerStride(); } // dummy resize function EIGEN_DEVICE_FUNC void resize(Index rows, Index cols) { EIGEN_UNUSED_VARIABLE(rows); EIGEN_UNUSED_VARIABLE(cols); eigen_assert(rows==this->rows() && cols==this->cols()); } EIGEN_DEVICE_FUNC inline Scalar coeff(Index row, Index col) const { return derived().coeff(row,col); } EIGEN_DEVICE_FUNC inline Scalar& coeffRef(Index row, Index col) { return derived().coeffRef(row,col); } /** \see MatrixBase::copyCoeff(row,col) */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void copyCoeff(Index row, Index col, Other& other) { derived().coeffRef(row, col) = other.coeff(row, col); } EIGEN_DEVICE_FUNC inline Scalar operator()(Index row, Index col) const { check_coordinates(row, col); return coeff(row,col); } EIGEN_DEVICE_FUNC inline Scalar& operator()(Index row, Index col) { check_coordinates(row, col); return coeffRef(row,col); } #ifndef EIGEN_PARSED_BY_DOXYGEN EIGEN_DEVICE_FUNC inline const Derived& derived() const { return *static_cast(this); } EIGEN_DEVICE_FUNC inline Derived& derived() { return *static_cast(this); } #endif // not EIGEN_PARSED_BY_DOXYGEN template EIGEN_DEVICE_FUNC void evalTo(MatrixBase &other) const; template EIGEN_DEVICE_FUNC void evalToLazy(MatrixBase &other) const; EIGEN_DEVICE_FUNC DenseMatrixType toDenseMatrix() const { DenseMatrixType res(rows(), cols()); evalToLazy(res); return res; } protected: void check_coordinates(Index row, Index col) const { EIGEN_ONLY_USED_FOR_DEBUG(row); EIGEN_ONLY_USED_FOR_DEBUG(col); eigen_assert(col>=0 && col=0 && row=row) || (mode==Lower && col<=row) || ((mode==StrictlyUpper || mode==UnitUpper) && col>row) || ((mode==StrictlyLower || mode==UnitLower) && col struct traits > : traits { typedef typename ref_selector::non_const_type MatrixTypeNested; typedef typename remove_reference::type MatrixTypeNestedNonRef; typedef typename remove_all::type MatrixTypeNestedCleaned; typedef typename MatrixType::PlainObject FullMatrixType; typedef MatrixType ExpressionType; enum { Mode = _Mode, FlagsLvalueBit = is_lvalue::value ? LvalueBit : 0, Flags = (MatrixTypeNestedCleaned::Flags & (HereditaryBits | FlagsLvalueBit) & (~(PacketAccessBit | DirectAccessBit | LinearAccessBit))) }; }; } template class TriangularViewImpl; template class TriangularView : public TriangularViewImpl<_MatrixType, _Mode, typename internal::traits<_MatrixType>::StorageKind > { public: typedef TriangularViewImpl<_MatrixType, _Mode, typename internal::traits<_MatrixType>::StorageKind > Base; typedef typename internal::traits::Scalar Scalar; typedef _MatrixType MatrixType; protected: typedef typename internal::traits::MatrixTypeNested MatrixTypeNested; typedef typename internal::traits::MatrixTypeNestedNonRef MatrixTypeNestedNonRef; typedef typename internal::remove_all::type MatrixConjugateReturnType; typedef TriangularView::type, _Mode> ConstTriangularView; public: typedef typename internal::traits::StorageKind StorageKind; typedef typename internal::traits::MatrixTypeNestedCleaned NestedExpression; enum { Mode = _Mode, Flags = internal::traits::Flags, TransposeMode = (Mode & Upper ? Lower : 0) | (Mode & Lower ? Upper : 0) | (Mode & (UnitDiag)) | (Mode & (ZeroDiag)), IsVectorAtCompileTime = false }; EIGEN_DEVICE_FUNC explicit inline TriangularView(MatrixType& matrix) : m_matrix(matrix) {} EIGEN_INHERIT_ASSIGNMENT_OPERATORS(TriangularView) /** \copydoc EigenBase::rows() */ EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index rows() const EIGEN_NOEXCEPT { return m_matrix.rows(); } /** \copydoc EigenBase::cols() */ EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index cols() const EIGEN_NOEXCEPT { return m_matrix.cols(); } /** \returns a const reference to the nested expression */ EIGEN_DEVICE_FUNC const NestedExpression& nestedExpression() const { return m_matrix; } /** \returns a reference to the nested expression */ EIGEN_DEVICE_FUNC NestedExpression& nestedExpression() { return m_matrix; } typedef TriangularView ConjugateReturnType; /** \sa MatrixBase::conjugate() const */ EIGEN_DEVICE_FUNC inline const ConjugateReturnType conjugate() const { return ConjugateReturnType(m_matrix.conjugate()); } /** \returns an expression of the complex conjugate of \c *this if Cond==true, * returns \c *this otherwise. */ template EIGEN_DEVICE_FUNC inline typename internal::conditional::type conjugateIf() const { typedef typename internal::conditional::type ReturnType; return ReturnType(m_matrix.template conjugateIf()); } typedef TriangularView AdjointReturnType; /** \sa MatrixBase::adjoint() const */ EIGEN_DEVICE_FUNC inline const AdjointReturnType adjoint() const { return AdjointReturnType(m_matrix.adjoint()); } typedef TriangularView TransposeReturnType; /** \sa MatrixBase::transpose() */ EIGEN_DEVICE_FUNC inline TransposeReturnType transpose() { EIGEN_STATIC_ASSERT_LVALUE(MatrixType) typename MatrixType::TransposeReturnType tmp(m_matrix); return TransposeReturnType(tmp); } typedef TriangularView ConstTransposeReturnType; /** \sa MatrixBase::transpose() const */ EIGEN_DEVICE_FUNC inline const ConstTransposeReturnType transpose() const { return ConstTransposeReturnType(m_matrix.transpose()); } template EIGEN_DEVICE_FUNC inline const Solve solve(const MatrixBase& other) const { return Solve(*this, other.derived()); } // workaround MSVC ICE #if EIGEN_COMP_MSVC template EIGEN_DEVICE_FUNC inline const internal::triangular_solve_retval solve(const MatrixBase& other) const { return Base::template solve(other); } #else using Base::solve; #endif /** \returns a selfadjoint view of the referenced triangular part which must be either \c #Upper or \c #Lower. * * This is a shortcut for \code this->nestedExpression().selfadjointView<(*this)::Mode>() \endcode * \sa MatrixBase::selfadjointView() */ EIGEN_DEVICE_FUNC SelfAdjointView selfadjointView() { EIGEN_STATIC_ASSERT((Mode&(UnitDiag|ZeroDiag))==0,PROGRAMMING_ERROR); return SelfAdjointView(m_matrix); } /** This is the const version of selfadjointView() */ EIGEN_DEVICE_FUNC const SelfAdjointView selfadjointView() const { EIGEN_STATIC_ASSERT((Mode&(UnitDiag|ZeroDiag))==0,PROGRAMMING_ERROR); return SelfAdjointView(m_matrix); } /** \returns the determinant of the triangular matrix * \sa MatrixBase::determinant() */ EIGEN_DEVICE_FUNC Scalar determinant() const { if (Mode & UnitDiag) return 1; else if (Mode & ZeroDiag) return 0; else return m_matrix.diagonal().prod(); } protected: MatrixTypeNested m_matrix; }; /** \ingroup Core_Module * * \brief Base class for a triangular part in a \b dense matrix * * This class is an abstract base class of class TriangularView, and objects of type TriangularViewImpl cannot be instantiated. * It extends class TriangularView with additional methods which available for dense expressions only. * * \sa class TriangularView, MatrixBase::triangularView() */ template class TriangularViewImpl<_MatrixType,_Mode,Dense> : public TriangularBase > { public: typedef TriangularView<_MatrixType, _Mode> TriangularViewType; typedef TriangularBase Base; typedef typename internal::traits::Scalar Scalar; typedef _MatrixType MatrixType; typedef typename MatrixType::PlainObject DenseMatrixType; typedef DenseMatrixType PlainObject; public: using Base::evalToLazy; using Base::derived; typedef typename internal::traits::StorageKind StorageKind; enum { Mode = _Mode, Flags = internal::traits::Flags }; /** \returns the outer-stride of the underlying dense matrix * \sa DenseCoeffsBase::outerStride() */ EIGEN_DEVICE_FUNC inline Index outerStride() const { return derived().nestedExpression().outerStride(); } /** \returns the inner-stride of the underlying dense matrix * \sa DenseCoeffsBase::innerStride() */ EIGEN_DEVICE_FUNC inline Index innerStride() const { return derived().nestedExpression().innerStride(); } /** \sa MatrixBase::operator+=() */ template EIGEN_DEVICE_FUNC TriangularViewType& operator+=(const DenseBase& other) { internal::call_assignment_no_alias(derived(), other.derived(), internal::add_assign_op()); return derived(); } /** \sa MatrixBase::operator-=() */ template EIGEN_DEVICE_FUNC TriangularViewType& operator-=(const DenseBase& other) { internal::call_assignment_no_alias(derived(), other.derived(), internal::sub_assign_op()); return derived(); } /** \sa MatrixBase::operator*=() */ EIGEN_DEVICE_FUNC TriangularViewType& operator*=(const typename internal::traits::Scalar& other) { return *this = derived().nestedExpression() * other; } /** \sa DenseBase::operator/=() */ EIGEN_DEVICE_FUNC TriangularViewType& operator/=(const typename internal::traits::Scalar& other) { return *this = derived().nestedExpression() / other; } /** \sa MatrixBase::fill() */ EIGEN_DEVICE_FUNC void fill(const Scalar& value) { setConstant(value); } /** \sa MatrixBase::setConstant() */ EIGEN_DEVICE_FUNC TriangularViewType& setConstant(const Scalar& value) { return *this = MatrixType::Constant(derived().rows(), derived().cols(), value); } /** \sa MatrixBase::setZero() */ EIGEN_DEVICE_FUNC TriangularViewType& setZero() { return setConstant(Scalar(0)); } /** \sa MatrixBase::setOnes() */ EIGEN_DEVICE_FUNC TriangularViewType& setOnes() { return setConstant(Scalar(1)); } /** \sa MatrixBase::coeff() * \warning the coordinates must fit into the referenced triangular part */ EIGEN_DEVICE_FUNC inline Scalar coeff(Index row, Index col) const { Base::check_coordinates_internal(row, col); return derived().nestedExpression().coeff(row, col); } /** \sa MatrixBase::coeffRef() * \warning the coordinates must fit into the referenced triangular part */ EIGEN_DEVICE_FUNC inline Scalar& coeffRef(Index row, Index col) { EIGEN_STATIC_ASSERT_LVALUE(TriangularViewType); Base::check_coordinates_internal(row, col); return derived().nestedExpression().coeffRef(row, col); } /** Assigns a triangular matrix to a triangular part of a dense matrix */ template EIGEN_DEVICE_FUNC TriangularViewType& operator=(const TriangularBase& other); /** Shortcut for\code *this = other.other.triangularView<(*this)::Mode>() \endcode */ template EIGEN_DEVICE_FUNC TriangularViewType& operator=(const MatrixBase& other); #ifndef EIGEN_PARSED_BY_DOXYGEN EIGEN_DEVICE_FUNC TriangularViewType& operator=(const TriangularViewImpl& other) { return *this = other.derived().nestedExpression(); } template /** \deprecated */ EIGEN_DEPRECATED EIGEN_DEVICE_FUNC void lazyAssign(const TriangularBase& other); template /** \deprecated */ EIGEN_DEPRECATED EIGEN_DEVICE_FUNC void lazyAssign(const MatrixBase& other); #endif /** Efficient triangular matrix times vector/matrix product */ template EIGEN_DEVICE_FUNC const Product operator*(const MatrixBase& rhs) const { return Product(derived(), rhs.derived()); } /** Efficient vector/matrix times triangular matrix product */ template friend EIGEN_DEVICE_FUNC const Product operator*(const MatrixBase& lhs, const TriangularViewImpl& rhs) { return Product(lhs.derived(),rhs.derived()); } /** \returns the product of the inverse of \c *this with \a other, \a *this being triangular. * * This function computes the inverse-matrix matrix product inverse(\c *this) * \a other if * \a Side==OnTheLeft (the default), or the right-inverse-multiply \a other * inverse(\c *this) if * \a Side==OnTheRight. * * Note that the template parameter \c Side can be omitted, in which case \c Side==OnTheLeft * * The matrix \c *this must be triangular and invertible (i.e., all the coefficients of the * diagonal must be non zero). It works as a forward (resp. backward) substitution if \c *this * is an upper (resp. lower) triangular matrix. * * Example: \include Triangular_solve.cpp * Output: \verbinclude Triangular_solve.out * * This function returns an expression of the inverse-multiply and can works in-place if it is assigned * to the same matrix or vector \a other. * * For users coming from BLAS, this function (and more specifically solveInPlace()) offer * all the operations supported by the \c *TRSV and \c *TRSM BLAS routines. * * \sa TriangularView::solveInPlace() */ template inline const internal::triangular_solve_retval solve(const MatrixBase& other) const; /** "in-place" version of TriangularView::solve() where the result is written in \a other * * \warning The parameter is only marked 'const' to make the C++ compiler accept a temporary expression here. * This function will const_cast it, so constness isn't honored here. * * Note that the template parameter \c Side can be omitted, in which case \c Side==OnTheLeft * * See TriangularView:solve() for the details. */ template EIGEN_DEVICE_FUNC void solveInPlace(const MatrixBase& other) const; template EIGEN_DEVICE_FUNC void solveInPlace(const MatrixBase& other) const { return solveInPlace(other); } /** Swaps the coefficients of the common triangular parts of two matrices */ template EIGEN_DEVICE_FUNC #ifdef EIGEN_PARSED_BY_DOXYGEN void swap(TriangularBase &other) #else void swap(TriangularBase const & other) #endif { EIGEN_STATIC_ASSERT_LVALUE(OtherDerived); call_assignment(derived(), other.const_cast_derived(), internal::swap_assign_op()); } /** Shortcut for \code (*this).swap(other.triangularView<(*this)::Mode>()) \endcode */ template /** \deprecated */ EIGEN_DEPRECATED EIGEN_DEVICE_FUNC void swap(MatrixBase const & other) { EIGEN_STATIC_ASSERT_LVALUE(OtherDerived); call_assignment(derived(), other.const_cast_derived(), internal::swap_assign_op()); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void _solve_impl(const RhsType &rhs, DstType &dst) const { if(!internal::is_same_dense(dst,rhs)) dst = rhs; this->solveInPlace(dst); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TriangularViewType& _assignProduct(const ProductType& prod, const Scalar& alpha, bool beta); protected: EIGEN_DEFAULT_COPY_CONSTRUCTOR(TriangularViewImpl) EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(TriangularViewImpl) }; /*************************************************************************** * Implementation of triangular evaluation/assignment ***************************************************************************/ #ifndef EIGEN_PARSED_BY_DOXYGEN // FIXME should we keep that possibility template template EIGEN_DEVICE_FUNC inline TriangularView& TriangularViewImpl::operator=(const MatrixBase& other) { internal::call_assignment_no_alias(derived(), other.derived(), internal::assign_op()); return derived(); } // FIXME should we keep that possibility template template EIGEN_DEVICE_FUNC void TriangularViewImpl::lazyAssign(const MatrixBase& other) { internal::call_assignment_no_alias(derived(), other.template triangularView()); } template template EIGEN_DEVICE_FUNC inline TriangularView& TriangularViewImpl::operator=(const TriangularBase& other) { eigen_assert(Mode == int(OtherDerived::Mode)); internal::call_assignment(derived(), other.derived()); return derived(); } template template EIGEN_DEVICE_FUNC void TriangularViewImpl::lazyAssign(const TriangularBase& other) { eigen_assert(Mode == int(OtherDerived::Mode)); internal::call_assignment_no_alias(derived(), other.derived()); } #endif /*************************************************************************** * Implementation of TriangularBase methods ***************************************************************************/ /** Assigns a triangular or selfadjoint matrix to a dense matrix. * If the matrix is triangular, the opposite part is set to zero. */ template template EIGEN_DEVICE_FUNC void TriangularBase::evalTo(MatrixBase &other) const { evalToLazy(other.derived()); } /*************************************************************************** * Implementation of TriangularView methods ***************************************************************************/ /*************************************************************************** * Implementation of MatrixBase methods ***************************************************************************/ /** * \returns an expression of a triangular view extracted from the current matrix * * The parameter \a Mode can have the following values: \c #Upper, \c #StrictlyUpper, \c #UnitUpper, * \c #Lower, \c #StrictlyLower, \c #UnitLower. * * Example: \include MatrixBase_triangularView.cpp * Output: \verbinclude MatrixBase_triangularView.out * * \sa class TriangularView */ template template EIGEN_DEVICE_FUNC typename MatrixBase::template TriangularViewReturnType::Type MatrixBase::triangularView() { return typename TriangularViewReturnType::Type(derived()); } /** This is the const version of MatrixBase::triangularView() */ template template EIGEN_DEVICE_FUNC typename MatrixBase::template ConstTriangularViewReturnType::Type MatrixBase::triangularView() const { return typename ConstTriangularViewReturnType::Type(derived()); } /** \returns true if *this is approximately equal to an upper triangular matrix, * within the precision given by \a prec. * * \sa isLowerTriangular() */ template bool MatrixBase::isUpperTriangular(const RealScalar& prec) const { RealScalar maxAbsOnUpperPart = static_cast(-1); for(Index j = 0; j < cols(); ++j) { Index maxi = numext::mini(j, rows()-1); for(Index i = 0; i <= maxi; ++i) { RealScalar absValue = numext::abs(coeff(i,j)); if(absValue > maxAbsOnUpperPart) maxAbsOnUpperPart = absValue; } } RealScalar threshold = maxAbsOnUpperPart * prec; for(Index j = 0; j < cols(); ++j) for(Index i = j+1; i < rows(); ++i) if(numext::abs(coeff(i, j)) > threshold) return false; return true; } /** \returns true if *this is approximately equal to a lower triangular matrix, * within the precision given by \a prec. * * \sa isUpperTriangular() */ template bool MatrixBase::isLowerTriangular(const RealScalar& prec) const { RealScalar maxAbsOnLowerPart = static_cast(-1); for(Index j = 0; j < cols(); ++j) for(Index i = j; i < rows(); ++i) { RealScalar absValue = numext::abs(coeff(i,j)); if(absValue > maxAbsOnLowerPart) maxAbsOnLowerPart = absValue; } RealScalar threshold = maxAbsOnLowerPart * prec; for(Index j = 1; j < cols(); ++j) { Index maxi = numext::mini(j, rows()-1); for(Index i = 0; i < maxi; ++i) if(numext::abs(coeff(i, j)) > threshold) return false; } return true; } /*************************************************************************** **************************************************************************** * Evaluators and Assignment of triangular expressions *************************************************************************** ***************************************************************************/ namespace internal { // TODO currently a triangular expression has the form TriangularView<.,.> // in the future triangular-ness should be defined by the expression traits // such that Transpose > is valid. (currently TriangularBase::transpose() is overloaded to make it work) template struct evaluator_traits > { typedef typename storage_kind_to_evaluator_kind::Kind Kind; typedef typename glue_shapes::Shape, TriangularShape>::type Shape; }; template struct unary_evaluator, IndexBased> : evaluator::type> { typedef TriangularView XprType; typedef evaluator::type> Base; EIGEN_DEVICE_FUNC unary_evaluator(const XprType &xpr) : Base(xpr.nestedExpression()) {} }; // Additional assignment kinds: struct Triangular2Triangular {}; struct Triangular2Dense {}; struct Dense2Triangular {}; template struct triangular_assignment_loop; /** \internal Specialization of the dense assignment kernel for triangular matrices. * The main difference is that the triangular, diagonal, and opposite parts are processed through three different functions. * \tparam UpLo must be either Lower or Upper * \tparam Mode must be either 0, UnitDiag, ZeroDiag, or SelfAdjoint */ template class triangular_dense_assignment_kernel : public generic_dense_assignment_kernel { protected: typedef generic_dense_assignment_kernel Base; typedef typename Base::DstXprType DstXprType; typedef typename Base::SrcXprType SrcXprType; using Base::m_dst; using Base::m_src; using Base::m_functor; public: typedef typename Base::DstEvaluatorType DstEvaluatorType; typedef typename Base::SrcEvaluatorType SrcEvaluatorType; typedef typename Base::Scalar Scalar; typedef typename Base::AssignmentTraits AssignmentTraits; EIGEN_DEVICE_FUNC triangular_dense_assignment_kernel(DstEvaluatorType &dst, const SrcEvaluatorType &src, const Functor &func, DstXprType& dstExpr) : Base(dst, src, func, dstExpr) {} #ifdef EIGEN_INTERNAL_DEBUGGING EIGEN_DEVICE_FUNC void assignCoeff(Index row, Index col) { eigen_internal_assert(row!=col); Base::assignCoeff(row,col); } #else using Base::assignCoeff; #endif EIGEN_DEVICE_FUNC void assignDiagonalCoeff(Index id) { if(Mode==UnitDiag && SetOpposite) m_functor.assignCoeff(m_dst.coeffRef(id,id), Scalar(1)); else if(Mode==ZeroDiag && SetOpposite) m_functor.assignCoeff(m_dst.coeffRef(id,id), Scalar(0)); else if(Mode==0) Base::assignCoeff(id,id); } EIGEN_DEVICE_FUNC void assignOppositeCoeff(Index row, Index col) { eigen_internal_assert(row!=col); if(SetOpposite) m_functor.assignCoeff(m_dst.coeffRef(row,col), Scalar(0)); } }; template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void call_triangular_assignment_loop(DstXprType& dst, const SrcXprType& src, const Functor &func) { typedef evaluator DstEvaluatorType; typedef evaluator SrcEvaluatorType; SrcEvaluatorType srcEvaluator(src); Index dstRows = src.rows(); Index dstCols = src.cols(); if((dst.rows()!=dstRows) || (dst.cols()!=dstCols)) dst.resize(dstRows, dstCols); DstEvaluatorType dstEvaluator(dst); typedef triangular_dense_assignment_kernel< Mode&(Lower|Upper),Mode&(UnitDiag|ZeroDiag|SelfAdjoint),SetOpposite, DstEvaluatorType,SrcEvaluatorType,Functor> Kernel; Kernel kernel(dstEvaluator, srcEvaluator, func, dst.const_cast_derived()); enum { unroll = DstXprType::SizeAtCompileTime != Dynamic && SrcEvaluatorType::CoeffReadCost < HugeCost && DstXprType::SizeAtCompileTime * (int(DstEvaluatorType::CoeffReadCost) + int(SrcEvaluatorType::CoeffReadCost)) / 2 <= EIGEN_UNROLLING_LIMIT }; triangular_assignment_loop::run(kernel); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void call_triangular_assignment_loop(DstXprType& dst, const SrcXprType& src) { call_triangular_assignment_loop(dst, src, internal::assign_op()); } template<> struct AssignmentKind { typedef Triangular2Triangular Kind; }; template<> struct AssignmentKind { typedef Triangular2Dense Kind; }; template<> struct AssignmentKind { typedef Dense2Triangular Kind; }; template< typename DstXprType, typename SrcXprType, typename Functor> struct Assignment { EIGEN_DEVICE_FUNC static void run(DstXprType &dst, const SrcXprType &src, const Functor &func) { eigen_assert(int(DstXprType::Mode) == int(SrcXprType::Mode)); call_triangular_assignment_loop(dst, src, func); } }; template< typename DstXprType, typename SrcXprType, typename Functor> struct Assignment { EIGEN_DEVICE_FUNC static void run(DstXprType &dst, const SrcXprType &src, const Functor &func) { call_triangular_assignment_loop(dst, src, func); } }; template< typename DstXprType, typename SrcXprType, typename Functor> struct Assignment { EIGEN_DEVICE_FUNC static void run(DstXprType &dst, const SrcXprType &src, const Functor &func) { call_triangular_assignment_loop(dst, src, func); } }; template struct triangular_assignment_loop { // FIXME: this is not very clean, perhaps this information should be provided by the kernel? typedef typename Kernel::DstEvaluatorType DstEvaluatorType; typedef typename DstEvaluatorType::XprType DstXprType; enum { col = (UnrollCount-1) / DstXprType::RowsAtCompileTime, row = (UnrollCount-1) % DstXprType::RowsAtCompileTime }; typedef typename Kernel::Scalar Scalar; EIGEN_DEVICE_FUNC static inline void run(Kernel &kernel) { triangular_assignment_loop::run(kernel); if(row==col) kernel.assignDiagonalCoeff(row); else if( ((Mode&Lower) && row>col) || ((Mode&Upper) && row struct triangular_assignment_loop { EIGEN_DEVICE_FUNC static inline void run(Kernel &) {} }; // TODO: experiment with a recursive assignment procedure splitting the current // triangular part into one rectangular and two triangular parts. template struct triangular_assignment_loop { typedef typename Kernel::Scalar Scalar; EIGEN_DEVICE_FUNC static inline void run(Kernel &kernel) { for(Index j = 0; j < kernel.cols(); ++j) { Index maxi = numext::mini(j, kernel.rows()); Index i = 0; if (((Mode&Lower) && SetOpposite) || (Mode&Upper)) { for(; i < maxi; ++i) if(Mode&Upper) kernel.assignCoeff(i, j); else kernel.assignOppositeCoeff(i, j); } else i = maxi; if(i template EIGEN_DEVICE_FUNC void TriangularBase::evalToLazy(MatrixBase &other) const { other.derived().resize(this->rows(), this->cols()); internal::call_triangular_assignment_loop(other.derived(), derived().nestedExpression()); } namespace internal { // Triangular = Product template< typename DstXprType, typename Lhs, typename Rhs, typename Scalar> struct Assignment, internal::assign_op::Scalar>, Dense2Triangular> { typedef Product SrcXprType; static void run(DstXprType &dst, const SrcXprType &src, const internal::assign_op &) { Index dstRows = src.rows(); Index dstCols = src.cols(); if((dst.rows()!=dstRows) || (dst.cols()!=dstCols)) dst.resize(dstRows, dstCols); dst._assignProduct(src, Scalar(1), false); } }; // Triangular += Product template< typename DstXprType, typename Lhs, typename Rhs, typename Scalar> struct Assignment, internal::add_assign_op::Scalar>, Dense2Triangular> { typedef Product SrcXprType; static void run(DstXprType &dst, const SrcXprType &src, const internal::add_assign_op &) { dst._assignProduct(src, Scalar(1), true); } }; // Triangular -= Product template< typename DstXprType, typename Lhs, typename Rhs, typename Scalar> struct Assignment, internal::sub_assign_op::Scalar>, Dense2Triangular> { typedef Product SrcXprType; static void run(DstXprType &dst, const SrcXprType &src, const internal::sub_assign_op &) { dst._assignProduct(src, Scalar(-1), true); } }; } // end namespace internal } // end namespace Eigen #endif // EIGEN_TRIANGULARMATRIX_H RcppEigen/inst/include/Eigen/src/Core/Ref.h0000644000176200001440000004263514562417221020137 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2012 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_REF_H #define EIGEN_REF_H namespace Eigen { namespace internal { template struct traits > : public traits > { typedef _PlainObjectType PlainObjectType; typedef _StrideType StrideType; enum { Options = _Options, Flags = traits >::Flags | NestByRefBit, Alignment = traits >::Alignment }; template struct match { enum { IsVectorAtCompileTime = PlainObjectType::IsVectorAtCompileTime || Derived::IsVectorAtCompileTime, HasDirectAccess = internal::has_direct_access::ret, StorageOrderMatch = IsVectorAtCompileTime || ((PlainObjectType::Flags&RowMajorBit)==(Derived::Flags&RowMajorBit)), InnerStrideMatch = int(StrideType::InnerStrideAtCompileTime)==int(Dynamic) || int(StrideType::InnerStrideAtCompileTime)==int(Derived::InnerStrideAtCompileTime) || (int(StrideType::InnerStrideAtCompileTime)==0 && int(Derived::InnerStrideAtCompileTime)==1), OuterStrideMatch = IsVectorAtCompileTime || int(StrideType::OuterStrideAtCompileTime)==int(Dynamic) || int(StrideType::OuterStrideAtCompileTime)==int(Derived::OuterStrideAtCompileTime), // NOTE, this indirection of evaluator::Alignment is needed // to workaround a very strange bug in MSVC related to the instantiation // of has_*ary_operator in evaluator. // This line is surprisingly very sensitive. For instance, simply adding parenthesis // as "DerivedAlignment = (int(evaluator::Alignment))," will make MSVC fail... DerivedAlignment = int(evaluator::Alignment), AlignmentMatch = (int(traits::Alignment)==int(Unaligned)) || (DerivedAlignment >= int(Alignment)), // FIXME the first condition is not very clear, it should be replaced by the required alignment ScalarTypeMatch = internal::is_same::value, MatchAtCompileTime = HasDirectAccess && StorageOrderMatch && InnerStrideMatch && OuterStrideMatch && AlignmentMatch && ScalarTypeMatch }; typedef typename internal::conditional::type type; }; }; template struct traits > : public traits {}; } template class RefBase : public MapBase { typedef typename internal::traits::PlainObjectType PlainObjectType; typedef typename internal::traits::StrideType StrideType; public: typedef MapBase Base; EIGEN_DENSE_PUBLIC_INTERFACE(RefBase) EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index innerStride() const { return StrideType::InnerStrideAtCompileTime != 0 ? m_stride.inner() : 1; } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index outerStride() const { return StrideType::OuterStrideAtCompileTime != 0 ? m_stride.outer() : IsVectorAtCompileTime ? this->size() : int(Flags)&RowMajorBit ? this->cols() : this->rows(); } EIGEN_DEVICE_FUNC RefBase() : Base(0,RowsAtCompileTime==Dynamic?0:RowsAtCompileTime,ColsAtCompileTime==Dynamic?0:ColsAtCompileTime), // Stride<> does not allow default ctor for Dynamic strides, so let' initialize it with dummy values: m_stride(StrideType::OuterStrideAtCompileTime==Dynamic?0:StrideType::OuterStrideAtCompileTime, StrideType::InnerStrideAtCompileTime==Dynamic?0:StrideType::InnerStrideAtCompileTime) {} EIGEN_INHERIT_ASSIGNMENT_OPERATORS(RefBase) protected: typedef Stride StrideBase; // Resolves inner stride if default 0. static EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index resolveInnerStride(Index inner) { return inner == 0 ? 1 : inner; } // Resolves outer stride if default 0. static EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR Index resolveOuterStride(Index inner, Index outer, Index rows, Index cols, bool isVectorAtCompileTime, bool isRowMajor) { return outer == 0 ? isVectorAtCompileTime ? inner * rows * cols : isRowMajor ? inner * cols : inner * rows : outer; } // Returns true if construction is valid, false if there is a stride mismatch, // and fails if there is a size mismatch. template EIGEN_DEVICE_FUNC bool construct(Expression& expr) { // Check matrix sizes. If this is a compile-time vector, we do allow // implicitly transposing. EIGEN_STATIC_ASSERT( EIGEN_PREDICATE_SAME_MATRIX_SIZE(PlainObjectType, Expression) // If it is a vector, the transpose sizes might match. || ( PlainObjectType::IsVectorAtCompileTime && ((int(PlainObjectType::RowsAtCompileTime)==Eigen::Dynamic || int(Expression::ColsAtCompileTime)==Eigen::Dynamic || int(PlainObjectType::RowsAtCompileTime)==int(Expression::ColsAtCompileTime)) && (int(PlainObjectType::ColsAtCompileTime)==Eigen::Dynamic || int(Expression::RowsAtCompileTime)==Eigen::Dynamic || int(PlainObjectType::ColsAtCompileTime)==int(Expression::RowsAtCompileTime)))), YOU_MIXED_MATRICES_OF_DIFFERENT_SIZES ) // Determine runtime rows and columns. Index rows = expr.rows(); Index cols = expr.cols(); if(PlainObjectType::RowsAtCompileTime==1) { eigen_assert(expr.rows()==1 || expr.cols()==1); rows = 1; cols = expr.size(); } else if(PlainObjectType::ColsAtCompileTime==1) { eigen_assert(expr.rows()==1 || expr.cols()==1); rows = expr.size(); cols = 1; } // Verify that the sizes are valid. eigen_assert( (PlainObjectType::RowsAtCompileTime == Dynamic) || (PlainObjectType::RowsAtCompileTime == rows)); eigen_assert( (PlainObjectType::ColsAtCompileTime == Dynamic) || (PlainObjectType::ColsAtCompileTime == cols)); // If this is a vector, we might be transposing, which means that stride should swap. const bool transpose = PlainObjectType::IsVectorAtCompileTime && (rows != expr.rows()); // If the storage format differs, we also need to swap the stride. const bool row_major = ((PlainObjectType::Flags)&RowMajorBit) != 0; const bool expr_row_major = (Expression::Flags&RowMajorBit) != 0; const bool storage_differs = (row_major != expr_row_major); const bool swap_stride = (transpose != storage_differs); // Determine expr's actual strides, resolving any defaults if zero. const Index expr_inner_actual = resolveInnerStride(expr.innerStride()); const Index expr_outer_actual = resolveOuterStride(expr_inner_actual, expr.outerStride(), expr.rows(), expr.cols(), Expression::IsVectorAtCompileTime != 0, expr_row_major); // If this is a column-major row vector or row-major column vector, the inner-stride // is arbitrary, so set it to either the compile-time inner stride or 1. const bool row_vector = (rows == 1); const bool col_vector = (cols == 1); const Index inner_stride = ( (!row_major && row_vector) || (row_major && col_vector) ) ? ( StrideType::InnerStrideAtCompileTime > 0 ? Index(StrideType::InnerStrideAtCompileTime) : 1) : swap_stride ? expr_outer_actual : expr_inner_actual; // If this is a column-major column vector or row-major row vector, the outer-stride // is arbitrary, so set it to either the compile-time outer stride or vector size. const Index outer_stride = ( (!row_major && col_vector) || (row_major && row_vector) ) ? ( StrideType::OuterStrideAtCompileTime > 0 ? Index(StrideType::OuterStrideAtCompileTime) : rows * cols * inner_stride) : swap_stride ? expr_inner_actual : expr_outer_actual; // Check if given inner/outer strides are compatible with compile-time strides. const bool inner_valid = (StrideType::InnerStrideAtCompileTime == Dynamic) || (resolveInnerStride(Index(StrideType::InnerStrideAtCompileTime)) == inner_stride); if (!inner_valid) { return false; } const bool outer_valid = (StrideType::OuterStrideAtCompileTime == Dynamic) || (resolveOuterStride( inner_stride, Index(StrideType::OuterStrideAtCompileTime), rows, cols, PlainObjectType::IsVectorAtCompileTime != 0, row_major) == outer_stride); if (!outer_valid) { return false; } ::new (static_cast(this)) Base(expr.data(), rows, cols); ::new (&m_stride) StrideBase( (StrideType::OuterStrideAtCompileTime == 0) ? 0 : outer_stride, (StrideType::InnerStrideAtCompileTime == 0) ? 0 : inner_stride ); return true; } StrideBase m_stride; }; /** \class Ref * \ingroup Core_Module * * \brief A matrix or vector expression mapping an existing expression * * \tparam PlainObjectType the equivalent matrix type of the mapped data * \tparam Options specifies the pointer alignment in bytes. It can be: \c #Aligned128, , \c #Aligned64, \c #Aligned32, \c #Aligned16, \c #Aligned8 or \c #Unaligned. * The default is \c #Unaligned. * \tparam StrideType optionally specifies strides. By default, Ref implies a contiguous storage along the inner dimension (inner stride==1), * but accepts a variable outer stride (leading dimension). * This can be overridden by specifying strides. * The type passed here must be a specialization of the Stride template, see examples below. * * This class provides a way to write non-template functions taking Eigen objects as parameters while limiting the number of copies. * A Ref<> object can represent either a const expression or a l-value: * \code * // in-out argument: * void foo1(Ref x); * * // read-only const argument: * void foo2(const Ref& x); * \endcode * * In the in-out case, the input argument must satisfy the constraints of the actual Ref<> type, otherwise a compilation issue will be triggered. * By default, a Ref can reference any dense vector expression of float having a contiguous memory layout. * Likewise, a Ref can reference any column-major dense matrix expression of float whose column's elements are contiguously stored with * the possibility to have a constant space in-between each column, i.e. the inner stride must be equal to 1, but the outer stride (or leading dimension) * can be greater than the number of rows. * * In the const case, if the input expression does not match the above requirement, then it is evaluated into a temporary before being passed to the function. * Here are some examples: * \code * MatrixXf A; * VectorXf a; * foo1(a.head()); // OK * foo1(A.col()); // OK * foo1(A.row()); // Compilation error because here innerstride!=1 * foo2(A.row()); // Compilation error because A.row() is a 1xN object while foo2 is expecting a Nx1 object * foo2(A.row().transpose()); // The row is copied into a contiguous temporary * foo2(2*a); // The expression is evaluated into a temporary * foo2(A.col().segment(2,4)); // No temporary * \endcode * * The range of inputs that can be referenced without temporary can be enlarged using the last two template parameters. * Here is an example accepting an innerstride!=1: * \code * // in-out argument: * void foo3(Ref > x); * foo3(A.row()); // OK * \endcode * The downside here is that the function foo3 might be significantly slower than foo1 because it won't be able to exploit vectorization, and will involve more * expensive address computations even if the input is contiguously stored in memory. To overcome this issue, one might propose to overload internally calling a * template function, e.g.: * \code * // in the .h: * void foo(const Ref& A); * void foo(const Ref >& A); * * // in the .cpp: * template void foo_impl(const TypeOfA& A) { * ... // crazy code goes here * } * void foo(const Ref& A) { foo_impl(A); } * void foo(const Ref >& A) { foo_impl(A); } * \endcode * * See also the following stackoverflow questions for further references: * - Correct usage of the Eigen::Ref<> class * * \sa PlainObjectBase::Map(), \ref TopicStorageOrders */ template class Ref : public RefBase > { private: typedef internal::traits Traits; template EIGEN_DEVICE_FUNC inline Ref(const PlainObjectBase& expr, typename internal::enable_if::MatchAtCompileTime),Derived>::type* = 0); public: typedef RefBase Base; EIGEN_DENSE_PUBLIC_INTERFACE(Ref) #ifndef EIGEN_PARSED_BY_DOXYGEN template EIGEN_DEVICE_FUNC inline Ref(PlainObjectBase& expr, typename internal::enable_if::MatchAtCompileTime),Derived>::type* = 0) { EIGEN_STATIC_ASSERT(bool(Traits::template match::MatchAtCompileTime), STORAGE_LAYOUT_DOES_NOT_MATCH); // Construction must pass since we will not create temprary storage in the non-const case. const bool success = Base::construct(expr.derived()); EIGEN_UNUSED_VARIABLE(success) eigen_assert(success); } template EIGEN_DEVICE_FUNC inline Ref(const DenseBase& expr, typename internal::enable_if::MatchAtCompileTime),Derived>::type* = 0) #else /** Implicit constructor from any dense expression */ template inline Ref(DenseBase& expr) #endif { EIGEN_STATIC_ASSERT(bool(internal::is_lvalue::value), THIS_EXPRESSION_IS_NOT_A_LVALUE__IT_IS_READ_ONLY); EIGEN_STATIC_ASSERT(bool(Traits::template match::MatchAtCompileTime), STORAGE_LAYOUT_DOES_NOT_MATCH); EIGEN_STATIC_ASSERT(!Derived::IsPlainObjectBase,THIS_EXPRESSION_IS_NOT_A_LVALUE__IT_IS_READ_ONLY); // Construction must pass since we will not create temporary storage in the non-const case. const bool success = Base::construct(expr.const_cast_derived()); EIGEN_UNUSED_VARIABLE(success) eigen_assert(success); } EIGEN_INHERIT_ASSIGNMENT_OPERATORS(Ref) }; // this is the const ref version template class Ref : public RefBase > { typedef internal::traits Traits; public: typedef RefBase Base; EIGEN_DENSE_PUBLIC_INTERFACE(Ref) template EIGEN_DEVICE_FUNC inline Ref(const DenseBase& expr, typename internal::enable_if::ScalarTypeMatch),Derived>::type* = 0) { // std::cout << match_helper::HasDirectAccess << "," << match_helper::OuterStrideMatch << "," << match_helper::InnerStrideMatch << "\n"; // std::cout << int(StrideType::OuterStrideAtCompileTime) << " - " << int(Derived::OuterStrideAtCompileTime) << "\n"; // std::cout << int(StrideType::InnerStrideAtCompileTime) << " - " << int(Derived::InnerStrideAtCompileTime) << "\n"; construct(expr.derived(), typename Traits::template match::type()); } EIGEN_DEVICE_FUNC inline Ref(const Ref& other) : Base(other) { // copy constructor shall not copy the m_object, to avoid unnecessary malloc and copy } template EIGEN_DEVICE_FUNC inline Ref(const RefBase& other) { construct(other.derived(), typename Traits::template match::type()); } protected: template EIGEN_DEVICE_FUNC void construct(const Expression& expr,internal::true_type) { // Check if we can use the underlying expr's storage directly, otherwise call the copy version. if (!Base::construct(expr)) { construct(expr, internal::false_type()); } } template EIGEN_DEVICE_FUNC void construct(const Expression& expr, internal::false_type) { internal::call_assignment_no_alias(m_object,expr,internal::assign_op()); Base::construct(m_object); } protected: TPlainObjectType m_object; }; } // end namespace Eigen #endif // EIGEN_REF_H RcppEigen/inst/include/Eigen/src/Core/CwiseTernaryOp.h0000644000176200001440000002010014107270226022315 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2014 Gael Guennebaud // Copyright (C) 2006-2008 Benoit Jacob // Copyright (C) 2016 Eugene Brevdo // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CWISE_TERNARY_OP_H #define EIGEN_CWISE_TERNARY_OP_H namespace Eigen { namespace internal { template struct traits > { // we must not inherit from traits since it has // the potential to cause problems with MSVC typedef typename remove_all::type Ancestor; typedef typename traits::XprKind XprKind; enum { RowsAtCompileTime = traits::RowsAtCompileTime, ColsAtCompileTime = traits::ColsAtCompileTime, MaxRowsAtCompileTime = traits::MaxRowsAtCompileTime, MaxColsAtCompileTime = traits::MaxColsAtCompileTime }; // even though we require Arg1, Arg2, and Arg3 to have the same scalar type // (see CwiseTernaryOp constructor), // we still want to handle the case when the result type is different. typedef typename result_of::type Scalar; typedef typename internal::traits::StorageKind StorageKind; typedef typename internal::traits::StorageIndex StorageIndex; typedef typename Arg1::Nested Arg1Nested; typedef typename Arg2::Nested Arg2Nested; typedef typename Arg3::Nested Arg3Nested; typedef typename remove_reference::type _Arg1Nested; typedef typename remove_reference::type _Arg2Nested; typedef typename remove_reference::type _Arg3Nested; enum { Flags = _Arg1Nested::Flags & RowMajorBit }; }; } // end namespace internal template class CwiseTernaryOpImpl; /** \class CwiseTernaryOp * \ingroup Core_Module * * \brief Generic expression where a coefficient-wise ternary operator is * applied to two expressions * * \tparam TernaryOp template functor implementing the operator * \tparam Arg1Type the type of the first argument * \tparam Arg2Type the type of the second argument * \tparam Arg3Type the type of the third argument * * This class represents an expression where a coefficient-wise ternary * operator is applied to three expressions. * It is the return type of ternary operators, by which we mean only those * ternary operators where * all three arguments are Eigen expressions. * For example, the return type of betainc(matrix1, matrix2, matrix3) is a * CwiseTernaryOp. * * Most of the time, this is the only way that it is used, so you typically * don't have to name * CwiseTernaryOp types explicitly. * * \sa MatrixBase::ternaryExpr(const MatrixBase &, const * MatrixBase &, const CustomTernaryOp &) const, class CwiseBinaryOp, * class CwiseUnaryOp, class CwiseNullaryOp */ template class CwiseTernaryOp : public CwiseTernaryOpImpl< TernaryOp, Arg1Type, Arg2Type, Arg3Type, typename internal::traits::StorageKind>, internal::no_assignment_operator { public: typedef typename internal::remove_all::type Arg1; typedef typename internal::remove_all::type Arg2; typedef typename internal::remove_all::type Arg3; typedef typename CwiseTernaryOpImpl< TernaryOp, Arg1Type, Arg2Type, Arg3Type, typename internal::traits::StorageKind>::Base Base; EIGEN_GENERIC_PUBLIC_INTERFACE(CwiseTernaryOp) typedef typename internal::ref_selector::type Arg1Nested; typedef typename internal::ref_selector::type Arg2Nested; typedef typename internal::ref_selector::type Arg3Nested; typedef typename internal::remove_reference::type _Arg1Nested; typedef typename internal::remove_reference::type _Arg2Nested; typedef typename internal::remove_reference::type _Arg3Nested; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CwiseTernaryOp(const Arg1& a1, const Arg2& a2, const Arg3& a3, const TernaryOp& func = TernaryOp()) : m_arg1(a1), m_arg2(a2), m_arg3(a3), m_functor(func) { // require the sizes to match EIGEN_STATIC_ASSERT_SAME_MATRIX_SIZE(Arg1, Arg2) EIGEN_STATIC_ASSERT_SAME_MATRIX_SIZE(Arg1, Arg3) // The index types should match EIGEN_STATIC_ASSERT((internal::is_same< typename internal::traits::StorageKind, typename internal::traits::StorageKind>::value), STORAGE_KIND_MUST_MATCH) EIGEN_STATIC_ASSERT((internal::is_same< typename internal::traits::StorageKind, typename internal::traits::StorageKind>::value), STORAGE_KIND_MUST_MATCH) eigen_assert(a1.rows() == a2.rows() && a1.cols() == a2.cols() && a1.rows() == a3.rows() && a1.cols() == a3.cols()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index rows() const { // return the fixed size type if available to enable compile time // optimizations if (internal::traits::type>:: RowsAtCompileTime == Dynamic && internal::traits::type>:: RowsAtCompileTime == Dynamic) return m_arg3.rows(); else if (internal::traits::type>:: RowsAtCompileTime == Dynamic && internal::traits::type>:: RowsAtCompileTime == Dynamic) return m_arg2.rows(); else return m_arg1.rows(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index cols() const { // return the fixed size type if available to enable compile time // optimizations if (internal::traits::type>:: ColsAtCompileTime == Dynamic && internal::traits::type>:: ColsAtCompileTime == Dynamic) return m_arg3.cols(); else if (internal::traits::type>:: ColsAtCompileTime == Dynamic && internal::traits::type>:: ColsAtCompileTime == Dynamic) return m_arg2.cols(); else return m_arg1.cols(); } /** \returns the first argument nested expression */ EIGEN_DEVICE_FUNC const _Arg1Nested& arg1() const { return m_arg1; } /** \returns the first argument nested expression */ EIGEN_DEVICE_FUNC const _Arg2Nested& arg2() const { return m_arg2; } /** \returns the third argument nested expression */ EIGEN_DEVICE_FUNC const _Arg3Nested& arg3() const { return m_arg3; } /** \returns the functor representing the ternary operation */ EIGEN_DEVICE_FUNC const TernaryOp& functor() const { return m_functor; } protected: Arg1Nested m_arg1; Arg2Nested m_arg2; Arg3Nested m_arg3; const TernaryOp m_functor; }; // Generic API dispatcher template class CwiseTernaryOpImpl : public internal::generic_xpr_base< CwiseTernaryOp >::type { public: typedef typename internal::generic_xpr_base< CwiseTernaryOp >::type Base; }; } // end namespace Eigen #endif // EIGEN_CWISE_TERNARY_OP_H RcppEigen/inst/include/Eigen/src/Core/Block.h0000644000176200001440000004433014562417221020447 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008 Gael Guennebaud // Copyright (C) 2006-2010 Benoit Jacob // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_BLOCK_H #define EIGEN_BLOCK_H namespace Eigen { namespace internal { template struct traits > : traits { typedef typename traits::Scalar Scalar; typedef typename traits::StorageKind StorageKind; typedef typename traits::XprKind XprKind; typedef typename ref_selector::type XprTypeNested; typedef typename remove_reference::type _XprTypeNested; enum{ MatrixRows = traits::RowsAtCompileTime, MatrixCols = traits::ColsAtCompileTime, RowsAtCompileTime = MatrixRows == 0 ? 0 : BlockRows, ColsAtCompileTime = MatrixCols == 0 ? 0 : BlockCols, MaxRowsAtCompileTime = BlockRows==0 ? 0 : RowsAtCompileTime != Dynamic ? int(RowsAtCompileTime) : int(traits::MaxRowsAtCompileTime), MaxColsAtCompileTime = BlockCols==0 ? 0 : ColsAtCompileTime != Dynamic ? int(ColsAtCompileTime) : int(traits::MaxColsAtCompileTime), XprTypeIsRowMajor = (int(traits::Flags)&RowMajorBit) != 0, IsRowMajor = (MaxRowsAtCompileTime==1&&MaxColsAtCompileTime!=1) ? 1 : (MaxColsAtCompileTime==1&&MaxRowsAtCompileTime!=1) ? 0 : XprTypeIsRowMajor, HasSameStorageOrderAsXprType = (IsRowMajor == XprTypeIsRowMajor), InnerSize = IsRowMajor ? int(ColsAtCompileTime) : int(RowsAtCompileTime), InnerStrideAtCompileTime = HasSameStorageOrderAsXprType ? int(inner_stride_at_compile_time::ret) : int(outer_stride_at_compile_time::ret), OuterStrideAtCompileTime = HasSameStorageOrderAsXprType ? int(outer_stride_at_compile_time::ret) : int(inner_stride_at_compile_time::ret), // FIXME, this traits is rather specialized for dense object and it needs to be cleaned further FlagsLvalueBit = is_lvalue::value ? LvalueBit : 0, FlagsRowMajorBit = IsRowMajor ? RowMajorBit : 0, Flags = (traits::Flags & (DirectAccessBit | (InnerPanel?CompressedAccessBit:0))) | FlagsLvalueBit | FlagsRowMajorBit, // FIXME DirectAccessBit should not be handled by expressions // // Alignment is needed by MapBase's assertions // We can sefely set it to false here. Internal alignment errors will be detected by an eigen_internal_assert in the respective evaluator Alignment = 0 }; }; template::ret> class BlockImpl_dense; } // end namespace internal template class BlockImpl; /** \class Block * \ingroup Core_Module * * \brief Expression of a fixed-size or dynamic-size block * * \tparam XprType the type of the expression in which we are taking a block * \tparam BlockRows the number of rows of the block we are taking at compile time (optional) * \tparam BlockCols the number of columns of the block we are taking at compile time (optional) * \tparam InnerPanel is true, if the block maps to a set of rows of a row major matrix or * to set of columns of a column major matrix (optional). The parameter allows to determine * at compile time whether aligned access is possible on the block expression. * * This class represents an expression of either a fixed-size or dynamic-size block. It is the return * type of DenseBase::block(Index,Index,Index,Index) and DenseBase::block(Index,Index) and * most of the time this is the only way it is used. * * However, if you want to directly maniputate block expressions, * for instance if you want to write a function returning such an expression, you * will need to use this class. * * Here is an example illustrating the dynamic case: * \include class_Block.cpp * Output: \verbinclude class_Block.out * * \note Even though this expression has dynamic size, in the case where \a XprType * has fixed size, this expression inherits a fixed maximal size which means that evaluating * it does not cause a dynamic memory allocation. * * Here is an example illustrating the fixed-size case: * \include class_FixedBlock.cpp * Output: \verbinclude class_FixedBlock.out * * \sa DenseBase::block(Index,Index,Index,Index), DenseBase::block(Index,Index), class VectorBlock */ template class Block : public BlockImpl::StorageKind> { typedef BlockImpl::StorageKind> Impl; public: //typedef typename Impl::Base Base; typedef Impl Base; EIGEN_GENERIC_PUBLIC_INTERFACE(Block) EIGEN_INHERIT_ASSIGNMENT_OPERATORS(Block) typedef typename internal::remove_all::type NestedExpression; /** Column or Row constructor */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Block(XprType& xpr, Index i) : Impl(xpr,i) { eigen_assert( (i>=0) && ( ((BlockRows==1) && (BlockCols==XprType::ColsAtCompileTime) && i= 0 && BlockRows >= 0 && startRow + BlockRows <= xpr.rows() && startCol >= 0 && BlockCols >= 0 && startCol + BlockCols <= xpr.cols()); } /** Dynamic-size constructor */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Block(XprType& xpr, Index startRow, Index startCol, Index blockRows, Index blockCols) : Impl(xpr, startRow, startCol, blockRows, blockCols) { eigen_assert((RowsAtCompileTime==Dynamic || RowsAtCompileTime==blockRows) && (ColsAtCompileTime==Dynamic || ColsAtCompileTime==blockCols)); eigen_assert(startRow >= 0 && blockRows >= 0 && startRow <= xpr.rows() - blockRows && startCol >= 0 && blockCols >= 0 && startCol <= xpr.cols() - blockCols); } }; // The generic default implementation for dense block simplu forward to the internal::BlockImpl_dense // that must be specialized for direct and non-direct access... template class BlockImpl : public internal::BlockImpl_dense { typedef internal::BlockImpl_dense Impl; typedef typename XprType::StorageIndex StorageIndex; public: typedef Impl Base; EIGEN_INHERIT_ASSIGNMENT_OPERATORS(BlockImpl) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE BlockImpl(XprType& xpr, Index i) : Impl(xpr,i) {} EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE BlockImpl(XprType& xpr, Index startRow, Index startCol) : Impl(xpr, startRow, startCol) {} EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE BlockImpl(XprType& xpr, Index startRow, Index startCol, Index blockRows, Index blockCols) : Impl(xpr, startRow, startCol, blockRows, blockCols) {} }; namespace internal { /** \internal Internal implementation of dense Blocks in the general case. */ template class BlockImpl_dense : public internal::dense_xpr_base >::type { typedef Block BlockType; typedef typename internal::ref_selector::non_const_type XprTypeNested; public: typedef typename internal::dense_xpr_base::type Base; EIGEN_DENSE_PUBLIC_INTERFACE(BlockType) EIGEN_INHERIT_ASSIGNMENT_OPERATORS(BlockImpl_dense) // class InnerIterator; // FIXME apparently never used /** Column or Row constructor */ EIGEN_DEVICE_FUNC inline BlockImpl_dense(XprType& xpr, Index i) : m_xpr(xpr), // It is a row if and only if BlockRows==1 and BlockCols==XprType::ColsAtCompileTime, // and it is a column if and only if BlockRows==XprType::RowsAtCompileTime and BlockCols==1, // all other cases are invalid. // The case a 1x1 matrix seems ambiguous, but the result is the same anyway. m_startRow( (BlockRows==1) && (BlockCols==XprType::ColsAtCompileTime) ? i : 0), m_startCol( (BlockRows==XprType::RowsAtCompileTime) && (BlockCols==1) ? i : 0), m_blockRows(BlockRows==1 ? 1 : xpr.rows()), m_blockCols(BlockCols==1 ? 1 : xpr.cols()) {} /** Fixed-size constructor */ EIGEN_DEVICE_FUNC inline BlockImpl_dense(XprType& xpr, Index startRow, Index startCol) : m_xpr(xpr), m_startRow(startRow), m_startCol(startCol), m_blockRows(BlockRows), m_blockCols(BlockCols) {} /** Dynamic-size constructor */ EIGEN_DEVICE_FUNC inline BlockImpl_dense(XprType& xpr, Index startRow, Index startCol, Index blockRows, Index blockCols) : m_xpr(xpr), m_startRow(startRow), m_startCol(startCol), m_blockRows(blockRows), m_blockCols(blockCols) {} EIGEN_DEVICE_FUNC inline Index rows() const { return m_blockRows.value(); } EIGEN_DEVICE_FUNC inline Index cols() const { return m_blockCols.value(); } EIGEN_DEVICE_FUNC inline Scalar& coeffRef(Index rowId, Index colId) { EIGEN_STATIC_ASSERT_LVALUE(XprType) return m_xpr.coeffRef(rowId + m_startRow.value(), colId + m_startCol.value()); } EIGEN_DEVICE_FUNC inline const Scalar& coeffRef(Index rowId, Index colId) const { return m_xpr.derived().coeffRef(rowId + m_startRow.value(), colId + m_startCol.value()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const CoeffReturnType coeff(Index rowId, Index colId) const { return m_xpr.coeff(rowId + m_startRow.value(), colId + m_startCol.value()); } EIGEN_DEVICE_FUNC inline Scalar& coeffRef(Index index) { EIGEN_STATIC_ASSERT_LVALUE(XprType) return m_xpr.coeffRef(m_startRow.value() + (RowsAtCompileTime == 1 ? 0 : index), m_startCol.value() + (RowsAtCompileTime == 1 ? index : 0)); } EIGEN_DEVICE_FUNC inline const Scalar& coeffRef(Index index) const { return m_xpr.coeffRef(m_startRow.value() + (RowsAtCompileTime == 1 ? 0 : index), m_startCol.value() + (RowsAtCompileTime == 1 ? index : 0)); } EIGEN_DEVICE_FUNC inline const CoeffReturnType coeff(Index index) const { return m_xpr.coeff(m_startRow.value() + (RowsAtCompileTime == 1 ? 0 : index), m_startCol.value() + (RowsAtCompileTime == 1 ? index : 0)); } template inline PacketScalar packet(Index rowId, Index colId) const { return m_xpr.template packet(rowId + m_startRow.value(), colId + m_startCol.value()); } template inline void writePacket(Index rowId, Index colId, const PacketScalar& val) { m_xpr.template writePacket(rowId + m_startRow.value(), colId + m_startCol.value(), val); } template inline PacketScalar packet(Index index) const { return m_xpr.template packet (m_startRow.value() + (RowsAtCompileTime == 1 ? 0 : index), m_startCol.value() + (RowsAtCompileTime == 1 ? index : 0)); } template inline void writePacket(Index index, const PacketScalar& val) { m_xpr.template writePacket (m_startRow.value() + (RowsAtCompileTime == 1 ? 0 : index), m_startCol.value() + (RowsAtCompileTime == 1 ? index : 0), val); } #ifdef EIGEN_PARSED_BY_DOXYGEN /** \sa MapBase::data() */ EIGEN_DEVICE_FUNC inline const Scalar* data() const; EIGEN_DEVICE_FUNC inline Index innerStride() const; EIGEN_DEVICE_FUNC inline Index outerStride() const; #endif EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename internal::remove_all::type& nestedExpression() const { return m_xpr; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE XprType& nestedExpression() { return m_xpr; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR StorageIndex startRow() const EIGEN_NOEXCEPT { return m_startRow.value(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR StorageIndex startCol() const EIGEN_NOEXCEPT { return m_startCol.value(); } protected: XprTypeNested m_xpr; const internal::variable_if_dynamic m_startRow; const internal::variable_if_dynamic m_startCol; const internal::variable_if_dynamic m_blockRows; const internal::variable_if_dynamic m_blockCols; }; /** \internal Internal implementation of dense Blocks in the direct access case.*/ template class BlockImpl_dense : public MapBase > { typedef Block BlockType; typedef typename internal::ref_selector::non_const_type XprTypeNested; enum { XprTypeIsRowMajor = (int(traits::Flags)&RowMajorBit) != 0 }; public: typedef MapBase Base; EIGEN_DENSE_PUBLIC_INTERFACE(BlockType) EIGEN_INHERIT_ASSIGNMENT_OPERATORS(BlockImpl_dense) /** Column or Row constructor */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE BlockImpl_dense(XprType& xpr, Index i) : Base(xpr.data() + i * ( ((BlockRows==1) && (BlockCols==XprType::ColsAtCompileTime) && (!XprTypeIsRowMajor)) || ((BlockRows==XprType::RowsAtCompileTime) && (BlockCols==1) && ( XprTypeIsRowMajor)) ? xpr.innerStride() : xpr.outerStride()), BlockRows==1 ? 1 : xpr.rows(), BlockCols==1 ? 1 : xpr.cols()), m_xpr(xpr), m_startRow( (BlockRows==1) && (BlockCols==XprType::ColsAtCompileTime) ? i : 0), m_startCol( (BlockRows==XprType::RowsAtCompileTime) && (BlockCols==1) ? i : 0) { init(); } /** Fixed-size constructor */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE BlockImpl_dense(XprType& xpr, Index startRow, Index startCol) : Base(xpr.data()+xpr.innerStride()*(XprTypeIsRowMajor?startCol:startRow) + xpr.outerStride()*(XprTypeIsRowMajor?startRow:startCol)), m_xpr(xpr), m_startRow(startRow), m_startCol(startCol) { init(); } /** Dynamic-size constructor */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE BlockImpl_dense(XprType& xpr, Index startRow, Index startCol, Index blockRows, Index blockCols) : Base(xpr.data()+xpr.innerStride()*(XprTypeIsRowMajor?startCol:startRow) + xpr.outerStride()*(XprTypeIsRowMajor?startRow:startCol), blockRows, blockCols), m_xpr(xpr), m_startRow(startRow), m_startCol(startCol) { init(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename internal::remove_all::type& nestedExpression() const EIGEN_NOEXCEPT { return m_xpr; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE XprType& nestedExpression() { return m_xpr; } /** \sa MapBase::innerStride() */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR Index innerStride() const EIGEN_NOEXCEPT { return internal::traits::HasSameStorageOrderAsXprType ? m_xpr.innerStride() : m_xpr.outerStride(); } /** \sa MapBase::outerStride() */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR Index outerStride() const EIGEN_NOEXCEPT { return internal::traits::HasSameStorageOrderAsXprType ? m_xpr.outerStride() : m_xpr.innerStride(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR StorageIndex startRow() const EIGEN_NOEXCEPT { return m_startRow.value(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR StorageIndex startCol() const EIGEN_NOEXCEPT { return m_startCol.value(); } #ifndef __SUNPRO_CC // FIXME sunstudio is not friendly with the above friend... // META-FIXME there is no 'friend' keyword around here. Is this obsolete? protected: #endif #ifndef EIGEN_PARSED_BY_DOXYGEN /** \internal used by allowAligned() */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE BlockImpl_dense(XprType& xpr, const Scalar* data, Index blockRows, Index blockCols) : Base(data, blockRows, blockCols), m_xpr(xpr) { init(); } #endif protected: EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void init() { m_outerStride = internal::traits::HasSameStorageOrderAsXprType ? m_xpr.outerStride() : m_xpr.innerStride(); } XprTypeNested m_xpr; const internal::variable_if_dynamic m_startRow; const internal::variable_if_dynamic m_startCol; Index m_outerStride; }; } // end namespace internal } // end namespace Eigen #endif // EIGEN_BLOCK_H RcppEigen/inst/include/Eigen/src/Core/MatrixBase.h0000644000176200001440000005646014562417221021463 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2006-2009 Benoit Jacob // Copyright (C) 2008 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_MATRIXBASE_H #define EIGEN_MATRIXBASE_H namespace Eigen { /** \class MatrixBase * \ingroup Core_Module * * \brief Base class for all dense matrices, vectors, and expressions * * This class is the base that is inherited by all matrix, vector, and related expression * types. Most of the Eigen API is contained in this class, and its base classes. Other important * classes for the Eigen API are Matrix, and VectorwiseOp. * * Note that some methods are defined in other modules such as the \ref LU_Module LU module * for all functions related to matrix inversions. * * \tparam Derived is the derived type, e.g. a matrix type, or an expression, etc. * * When writing a function taking Eigen objects as argument, if you want your function * to take as argument any matrix, vector, or expression, just let it take a * MatrixBase argument. As an example, here is a function printFirstRow which, given * a matrix, vector, or expression \a x, prints the first row of \a x. * * \code template void printFirstRow(const Eigen::MatrixBase& x) { cout << x.row(0) << endl; } * \endcode * * This class can be extended with the help of the plugin mechanism described on the page * \ref TopicCustomizing_Plugins by defining the preprocessor symbol \c EIGEN_MATRIXBASE_PLUGIN. * * \sa \blank \ref TopicClassHierarchy */ template class MatrixBase : public DenseBase { public: #ifndef EIGEN_PARSED_BY_DOXYGEN typedef MatrixBase StorageBaseType; typedef typename internal::traits::StorageKind StorageKind; typedef typename internal::traits::StorageIndex StorageIndex; typedef typename internal::traits::Scalar Scalar; typedef typename internal::packet_traits::type PacketScalar; typedef typename NumTraits::Real RealScalar; typedef DenseBase Base; using Base::RowsAtCompileTime; using Base::ColsAtCompileTime; using Base::SizeAtCompileTime; using Base::MaxRowsAtCompileTime; using Base::MaxColsAtCompileTime; using Base::MaxSizeAtCompileTime; using Base::IsVectorAtCompileTime; using Base::Flags; using Base::derived; using Base::const_cast_derived; using Base::rows; using Base::cols; using Base::size; using Base::coeff; using Base::coeffRef; using Base::lazyAssign; using Base::eval; using Base::operator-; using Base::operator+=; using Base::operator-=; using Base::operator*=; using Base::operator/=; typedef typename Base::CoeffReturnType CoeffReturnType; typedef typename Base::ConstTransposeReturnType ConstTransposeReturnType; typedef typename Base::RowXpr RowXpr; typedef typename Base::ColXpr ColXpr; #endif // not EIGEN_PARSED_BY_DOXYGEN #ifndef EIGEN_PARSED_BY_DOXYGEN /** type of the equivalent square matrix */ typedef Matrix SquareMatrixType; #endif // not EIGEN_PARSED_BY_DOXYGEN /** \returns the size of the main diagonal, which is min(rows(),cols()). * \sa rows(), cols(), SizeAtCompileTime. */ EIGEN_DEVICE_FUNC inline Index diagonalSize() const { return (numext::mini)(rows(),cols()); } typedef typename Base::PlainObject PlainObject; #ifndef EIGEN_PARSED_BY_DOXYGEN /** \internal Represents a matrix with all coefficients equal to one another*/ typedef CwiseNullaryOp,PlainObject> ConstantReturnType; /** \internal the return type of MatrixBase::adjoint() */ typedef typename internal::conditional::IsComplex, CwiseUnaryOp, ConstTransposeReturnType>, ConstTransposeReturnType >::type AdjointReturnType; /** \internal Return type of eigenvalues() */ typedef Matrix, internal::traits::ColsAtCompileTime, 1, ColMajor> EigenvaluesReturnType; /** \internal the return type of identity */ typedef CwiseNullaryOp,PlainObject> IdentityReturnType; /** \internal the return type of unit vectors */ typedef Block, SquareMatrixType>, internal::traits::RowsAtCompileTime, internal::traits::ColsAtCompileTime> BasisReturnType; #endif // not EIGEN_PARSED_BY_DOXYGEN #define EIGEN_CURRENT_STORAGE_BASE_CLASS Eigen::MatrixBase #define EIGEN_DOC_UNARY_ADDONS(X,Y) # include "../plugins/CommonCwiseBinaryOps.h" # include "../plugins/MatrixCwiseUnaryOps.h" # include "../plugins/MatrixCwiseBinaryOps.h" # ifdef EIGEN_MATRIXBASE_PLUGIN # include EIGEN_MATRIXBASE_PLUGIN # endif #undef EIGEN_CURRENT_STORAGE_BASE_CLASS #undef EIGEN_DOC_UNARY_ADDONS /** Special case of the template operator=, in order to prevent the compiler * from generating a default operator= (issue hit with g++ 4.1) */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& operator=(const MatrixBase& other); // We cannot inherit here via Base::operator= since it is causing // trouble with MSVC. template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& operator=(const DenseBase& other); template EIGEN_DEVICE_FUNC Derived& operator=(const EigenBase& other); template EIGEN_DEVICE_FUNC Derived& operator=(const ReturnByValue& other); template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& operator+=(const MatrixBase& other); template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& operator-=(const MatrixBase& other); template EIGEN_DEVICE_FUNC const Product operator*(const MatrixBase &other) const; template EIGEN_DEVICE_FUNC const Product lazyProduct(const MatrixBase &other) const; template Derived& operator*=(const EigenBase& other); template void applyOnTheLeft(const EigenBase& other); template void applyOnTheRight(const EigenBase& other); template EIGEN_DEVICE_FUNC const Product operator*(const DiagonalBase &diagonal) const; template EIGEN_DEVICE_FUNC typename ScalarBinaryOpTraits::Scalar,typename internal::traits::Scalar>::ReturnType dot(const MatrixBase& other) const; EIGEN_DEVICE_FUNC RealScalar squaredNorm() const; EIGEN_DEVICE_FUNC RealScalar norm() const; RealScalar stableNorm() const; RealScalar blueNorm() const; RealScalar hypotNorm() const; EIGEN_DEVICE_FUNC const PlainObject normalized() const; EIGEN_DEVICE_FUNC const PlainObject stableNormalized() const; EIGEN_DEVICE_FUNC void normalize(); EIGEN_DEVICE_FUNC void stableNormalize(); EIGEN_DEVICE_FUNC const AdjointReturnType adjoint() const; EIGEN_DEVICE_FUNC void adjointInPlace(); typedef Diagonal DiagonalReturnType; EIGEN_DEVICE_FUNC DiagonalReturnType diagonal(); typedef typename internal::add_const >::type ConstDiagonalReturnType; EIGEN_DEVICE_FUNC ConstDiagonalReturnType diagonal() const; template struct DiagonalIndexReturnType { typedef Diagonal Type; }; template struct ConstDiagonalIndexReturnType { typedef const Diagonal Type; }; template EIGEN_DEVICE_FUNC typename DiagonalIndexReturnType::Type diagonal(); template EIGEN_DEVICE_FUNC typename ConstDiagonalIndexReturnType::Type diagonal() const; typedef Diagonal DiagonalDynamicIndexReturnType; typedef typename internal::add_const >::type ConstDiagonalDynamicIndexReturnType; EIGEN_DEVICE_FUNC DiagonalDynamicIndexReturnType diagonal(Index index); EIGEN_DEVICE_FUNC ConstDiagonalDynamicIndexReturnType diagonal(Index index) const; template struct TriangularViewReturnType { typedef TriangularView Type; }; template struct ConstTriangularViewReturnType { typedef const TriangularView Type; }; template EIGEN_DEVICE_FUNC typename TriangularViewReturnType::Type triangularView(); template EIGEN_DEVICE_FUNC typename ConstTriangularViewReturnType::Type triangularView() const; template struct SelfAdjointViewReturnType { typedef SelfAdjointView Type; }; template struct ConstSelfAdjointViewReturnType { typedef const SelfAdjointView Type; }; template EIGEN_DEVICE_FUNC typename SelfAdjointViewReturnType::Type selfadjointView(); template EIGEN_DEVICE_FUNC typename ConstSelfAdjointViewReturnType::Type selfadjointView() const; const SparseView sparseView(const Scalar& m_reference = Scalar(0), const typename NumTraits::Real& m_epsilon = NumTraits::dummy_precision()) const; EIGEN_DEVICE_FUNC static const IdentityReturnType Identity(); EIGEN_DEVICE_FUNC static const IdentityReturnType Identity(Index rows, Index cols); EIGEN_DEVICE_FUNC static const BasisReturnType Unit(Index size, Index i); EIGEN_DEVICE_FUNC static const BasisReturnType Unit(Index i); EIGEN_DEVICE_FUNC static const BasisReturnType UnitX(); EIGEN_DEVICE_FUNC static const BasisReturnType UnitY(); EIGEN_DEVICE_FUNC static const BasisReturnType UnitZ(); EIGEN_DEVICE_FUNC static const BasisReturnType UnitW(); EIGEN_DEVICE_FUNC const DiagonalWrapper asDiagonal() const; const PermutationWrapper asPermutation() const; EIGEN_DEVICE_FUNC Derived& setIdentity(); EIGEN_DEVICE_FUNC Derived& setIdentity(Index rows, Index cols); EIGEN_DEVICE_FUNC Derived& setUnit(Index i); EIGEN_DEVICE_FUNC Derived& setUnit(Index newSize, Index i); bool isIdentity(const RealScalar& prec = NumTraits::dummy_precision()) const; bool isDiagonal(const RealScalar& prec = NumTraits::dummy_precision()) const; bool isUpperTriangular(const RealScalar& prec = NumTraits::dummy_precision()) const; bool isLowerTriangular(const RealScalar& prec = NumTraits::dummy_precision()) const; template bool isOrthogonal(const MatrixBase& other, const RealScalar& prec = NumTraits::dummy_precision()) const; bool isUnitary(const RealScalar& prec = NumTraits::dummy_precision()) const; /** \returns true if each coefficients of \c *this and \a other are all exactly equal. * \warning When using floating point scalar values you probably should rather use a * fuzzy comparison such as isApprox() * \sa isApprox(), operator!= */ template EIGEN_DEVICE_FUNC inline bool operator==(const MatrixBase& other) const { return cwiseEqual(other).all(); } /** \returns true if at least one pair of coefficients of \c *this and \a other are not exactly equal to each other. * \warning When using floating point scalar values you probably should rather use a * fuzzy comparison such as isApprox() * \sa isApprox(), operator== */ template EIGEN_DEVICE_FUNC inline bool operator!=(const MatrixBase& other) const { return cwiseNotEqual(other).any(); } NoAlias EIGEN_DEVICE_FUNC noalias(); // TODO forceAlignedAccess is temporarily disabled // Need to find a nicer workaround. inline const Derived& forceAlignedAccess() const { return derived(); } inline Derived& forceAlignedAccess() { return derived(); } template inline const Derived& forceAlignedAccessIf() const { return derived(); } template inline Derived& forceAlignedAccessIf() { return derived(); } EIGEN_DEVICE_FUNC Scalar trace() const; template EIGEN_DEVICE_FUNC RealScalar lpNorm() const; EIGEN_DEVICE_FUNC MatrixBase& matrix() { return *this; } EIGEN_DEVICE_FUNC const MatrixBase& matrix() const { return *this; } /** \returns an \link Eigen::ArrayBase Array \endlink expression of this matrix * \sa ArrayBase::matrix() */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ArrayWrapper array() { return ArrayWrapper(derived()); } /** \returns a const \link Eigen::ArrayBase Array \endlink expression of this matrix * \sa ArrayBase::matrix() */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const ArrayWrapper array() const { return ArrayWrapper(derived()); } /////////// LU module /////////// inline const FullPivLU fullPivLu() const; inline const PartialPivLU partialPivLu() const; inline const PartialPivLU lu() const; EIGEN_DEVICE_FUNC inline const Inverse inverse() const; template inline void computeInverseAndDetWithCheck( ResultType& inverse, typename ResultType::Scalar& determinant, bool& invertible, const RealScalar& absDeterminantThreshold = NumTraits::dummy_precision() ) const; template inline void computeInverseWithCheck( ResultType& inverse, bool& invertible, const RealScalar& absDeterminantThreshold = NumTraits::dummy_precision() ) const; EIGEN_DEVICE_FUNC Scalar determinant() const; /////////// Cholesky module /////////// inline const LLT llt() const; inline const LDLT ldlt() const; /////////// QR module /////////// inline const HouseholderQR householderQr() const; inline const ColPivHouseholderQR colPivHouseholderQr() const; inline const FullPivHouseholderQR fullPivHouseholderQr() const; inline const CompleteOrthogonalDecomposition completeOrthogonalDecomposition() const; /////////// Eigenvalues module /////////// inline EigenvaluesReturnType eigenvalues() const; inline RealScalar operatorNorm() const; /////////// SVD module /////////// inline JacobiSVD jacobiSvd(unsigned int computationOptions = 0) const; inline BDCSVD bdcSvd(unsigned int computationOptions = 0) const; /////////// Geometry module /////////// #ifndef EIGEN_PARSED_BY_DOXYGEN /// \internal helper struct to form the return type of the cross product template struct cross_product_return_type { typedef typename ScalarBinaryOpTraits::Scalar,typename internal::traits::Scalar>::ReturnType Scalar; typedef Matrix type; }; #endif // EIGEN_PARSED_BY_DOXYGEN template EIGEN_DEVICE_FUNC #ifndef EIGEN_PARSED_BY_DOXYGEN inline typename cross_product_return_type::type #else inline PlainObject #endif cross(const MatrixBase& other) const; template EIGEN_DEVICE_FUNC inline PlainObject cross3(const MatrixBase& other) const; EIGEN_DEVICE_FUNC inline PlainObject unitOrthogonal(void) const; EIGEN_DEVICE_FUNC inline Matrix eulerAngles(Index a0, Index a1, Index a2) const; // put this as separate enum value to work around possible GCC 4.3 bug (?) enum { HomogeneousReturnTypeDirection = ColsAtCompileTime==1&&RowsAtCompileTime==1 ? ((internal::traits::Flags&RowMajorBit)==RowMajorBit ? Horizontal : Vertical) : ColsAtCompileTime==1 ? Vertical : Horizontal }; typedef Homogeneous HomogeneousReturnType; EIGEN_DEVICE_FUNC inline HomogeneousReturnType homogeneous() const; enum { SizeMinusOne = SizeAtCompileTime==Dynamic ? Dynamic : SizeAtCompileTime-1 }; typedef Block::ColsAtCompileTime==1 ? SizeMinusOne : 1, internal::traits::ColsAtCompileTime==1 ? 1 : SizeMinusOne> ConstStartMinusOne; typedef EIGEN_EXPR_BINARYOP_SCALAR_RETURN_TYPE(ConstStartMinusOne,Scalar,quotient) HNormalizedReturnType; EIGEN_DEVICE_FUNC inline const HNormalizedReturnType hnormalized() const; ////////// Householder module /////////// EIGEN_DEVICE_FUNC void makeHouseholderInPlace(Scalar& tau, RealScalar& beta); template EIGEN_DEVICE_FUNC void makeHouseholder(EssentialPart& essential, Scalar& tau, RealScalar& beta) const; template EIGEN_DEVICE_FUNC void applyHouseholderOnTheLeft(const EssentialPart& essential, const Scalar& tau, Scalar* workspace); template EIGEN_DEVICE_FUNC void applyHouseholderOnTheRight(const EssentialPart& essential, const Scalar& tau, Scalar* workspace); ///////// Jacobi module ///////// template EIGEN_DEVICE_FUNC void applyOnTheLeft(Index p, Index q, const JacobiRotation& j); template EIGEN_DEVICE_FUNC void applyOnTheRight(Index p, Index q, const JacobiRotation& j); ///////// SparseCore module ///////// template EIGEN_STRONG_INLINE const typename SparseMatrixBase::template CwiseProductDenseReturnType::Type cwiseProduct(const SparseMatrixBase &other) const { return other.cwiseProduct(derived()); } ///////// MatrixFunctions module ///////// typedef typename internal::stem_function::type StemFunction; #define EIGEN_MATRIX_FUNCTION(ReturnType, Name, Description) \ /** \returns an expression of the matrix Description of \c *this. \brief This function requires the unsupported MatrixFunctions module. To compute the coefficient-wise Description use ArrayBase::##Name . */ \ const ReturnType Name() const; #define EIGEN_MATRIX_FUNCTION_1(ReturnType, Name, Description, Argument) \ /** \returns an expression of the matrix Description of \c *this. \brief This function requires the unsupported MatrixFunctions module. To compute the coefficient-wise Description use ArrayBase::##Name . */ \ const ReturnType Name(Argument) const; EIGEN_MATRIX_FUNCTION(MatrixExponentialReturnValue, exp, exponential) /** \brief Helper function for the unsupported MatrixFunctions module.*/ const MatrixFunctionReturnValue matrixFunction(StemFunction f) const; EIGEN_MATRIX_FUNCTION(MatrixFunctionReturnValue, cosh, hyperbolic cosine) EIGEN_MATRIX_FUNCTION(MatrixFunctionReturnValue, sinh, hyperbolic sine) #if EIGEN_HAS_CXX11_MATH EIGEN_MATRIX_FUNCTION(MatrixFunctionReturnValue, atanh, inverse hyperbolic cosine) EIGEN_MATRIX_FUNCTION(MatrixFunctionReturnValue, acosh, inverse hyperbolic cosine) EIGEN_MATRIX_FUNCTION(MatrixFunctionReturnValue, asinh, inverse hyperbolic sine) #endif EIGEN_MATRIX_FUNCTION(MatrixFunctionReturnValue, cos, cosine) EIGEN_MATRIX_FUNCTION(MatrixFunctionReturnValue, sin, sine) EIGEN_MATRIX_FUNCTION(MatrixSquareRootReturnValue, sqrt, square root) EIGEN_MATRIX_FUNCTION(MatrixLogarithmReturnValue, log, logarithm) EIGEN_MATRIX_FUNCTION_1(MatrixPowerReturnValue, pow, power to \c p, const RealScalar& p) EIGEN_MATRIX_FUNCTION_1(MatrixComplexPowerReturnValue, pow, power to \c p, const std::complex& p) protected: EIGEN_DEFAULT_COPY_CONSTRUCTOR(MatrixBase) EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(MatrixBase) private: EIGEN_DEVICE_FUNC explicit MatrixBase(int); EIGEN_DEVICE_FUNC MatrixBase(int,int); template EIGEN_DEVICE_FUNC explicit MatrixBase(const MatrixBase&); protected: // mixing arrays and matrices is not legal template Derived& operator+=(const ArrayBase& ) {EIGEN_STATIC_ASSERT(std::ptrdiff_t(sizeof(typename OtherDerived::Scalar))==-1,YOU_CANNOT_MIX_ARRAYS_AND_MATRICES); return *this;} // mixing arrays and matrices is not legal template Derived& operator-=(const ArrayBase& ) {EIGEN_STATIC_ASSERT(std::ptrdiff_t(sizeof(typename OtherDerived::Scalar))==-1,YOU_CANNOT_MIX_ARRAYS_AND_MATRICES); return *this;} }; /*************************************************************************** * Implementation of matrix base methods ***************************************************************************/ /** replaces \c *this by \c *this * \a other. * * \returns a reference to \c *this * * Example: \include MatrixBase_applyOnTheRight.cpp * Output: \verbinclude MatrixBase_applyOnTheRight.out */ template template inline Derived& MatrixBase::operator*=(const EigenBase &other) { other.derived().applyThisOnTheRight(derived()); return derived(); } /** replaces \c *this by \c *this * \a other. It is equivalent to MatrixBase::operator*=(). * * Example: \include MatrixBase_applyOnTheRight.cpp * Output: \verbinclude MatrixBase_applyOnTheRight.out */ template template inline void MatrixBase::applyOnTheRight(const EigenBase &other) { other.derived().applyThisOnTheRight(derived()); } /** replaces \c *this by \a other * \c *this. * * Example: \include MatrixBase_applyOnTheLeft.cpp * Output: \verbinclude MatrixBase_applyOnTheLeft.out */ template template inline void MatrixBase::applyOnTheLeft(const EigenBase &other) { other.derived().applyThisOnTheLeft(derived()); } } // end namespace Eigen #endif // EIGEN_MATRIXBASE_H RcppEigen/inst/include/Eigen/src/Core/MathFunctionsImpl.h0000644000176200001440000001576414562417221023032 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2014 Pedro Gonnet (pedro.gonnet@gmail.com) // Copyright (C) 2016 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_MATHFUNCTIONSIMPL_H #define EIGEN_MATHFUNCTIONSIMPL_H namespace Eigen { namespace internal { /** \internal \returns the hyperbolic tan of \a a (coeff-wise) Doesn't do anything fancy, just a 13/6-degree rational interpolant which is accurate up to a couple of ulps in the (approximate) range [-8, 8], outside of which tanh(x) = +/-1 in single precision. The input is clamped to the range [-c, c]. The value c is chosen as the smallest value where the approximation evaluates to exactly 1. In the reange [-0.0004, 0.0004] the approxmation tanh(x) ~= x is used for better accuracy as x tends to zero. This implementation works on both scalars and packets. */ template T generic_fast_tanh_float(const T& a_x) { // Clamp the inputs to the range [-c, c] #ifdef EIGEN_VECTORIZE_FMA const T plus_clamp = pset1(7.99881172180175781f); const T minus_clamp = pset1(-7.99881172180175781f); #else const T plus_clamp = pset1(7.90531110763549805f); const T minus_clamp = pset1(-7.90531110763549805f); #endif const T tiny = pset1(0.0004f); const T x = pmax(pmin(a_x, plus_clamp), minus_clamp); const T tiny_mask = pcmp_lt(pabs(a_x), tiny); // The monomial coefficients of the numerator polynomial (odd). const T alpha_1 = pset1(4.89352455891786e-03f); const T alpha_3 = pset1(6.37261928875436e-04f); const T alpha_5 = pset1(1.48572235717979e-05f); const T alpha_7 = pset1(5.12229709037114e-08f); const T alpha_9 = pset1(-8.60467152213735e-11f); const T alpha_11 = pset1(2.00018790482477e-13f); const T alpha_13 = pset1(-2.76076847742355e-16f); // The monomial coefficients of the denominator polynomial (even). const T beta_0 = pset1(4.89352518554385e-03f); const T beta_2 = pset1(2.26843463243900e-03f); const T beta_4 = pset1(1.18534705686654e-04f); const T beta_6 = pset1(1.19825839466702e-06f); // Since the polynomials are odd/even, we need x^2. const T x2 = pmul(x, x); // Evaluate the numerator polynomial p. T p = pmadd(x2, alpha_13, alpha_11); p = pmadd(x2, p, alpha_9); p = pmadd(x2, p, alpha_7); p = pmadd(x2, p, alpha_5); p = pmadd(x2, p, alpha_3); p = pmadd(x2, p, alpha_1); p = pmul(x, p); // Evaluate the denominator polynomial q. T q = pmadd(x2, beta_6, beta_4); q = pmadd(x2, q, beta_2); q = pmadd(x2, q, beta_0); // Divide the numerator by the denominator. return pselect(tiny_mask, x, pdiv(p, q)); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE RealScalar positive_real_hypot(const RealScalar& x, const RealScalar& y) { // IEEE IEC 6059 special cases. if ((numext::isinf)(x) || (numext::isinf)(y)) return NumTraits::infinity(); if ((numext::isnan)(x) || (numext::isnan)(y)) return NumTraits::quiet_NaN(); EIGEN_USING_STD(sqrt); RealScalar p, qp; p = numext::maxi(x,y); if(p==RealScalar(0)) return RealScalar(0); qp = numext::mini(y,x) / p; return p * sqrt(RealScalar(1) + qp*qp); } template struct hypot_impl { typedef typename NumTraits::Real RealScalar; static EIGEN_DEVICE_FUNC inline RealScalar run(const Scalar& x, const Scalar& y) { EIGEN_USING_STD(abs); return positive_real_hypot(abs(x), abs(y)); } }; // Generic complex sqrt implementation that correctly handles corner cases // according to https://en.cppreference.com/w/cpp/numeric/complex/sqrt template EIGEN_DEVICE_FUNC std::complex complex_sqrt(const std::complex& z) { // Computes the principal sqrt of the input. // // For a complex square root of the number x + i*y. We want to find real // numbers u and v such that // (u + i*v)^2 = x + i*y <=> // u^2 - v^2 + i*2*u*v = x + i*v. // By equating the real and imaginary parts we get: // u^2 - v^2 = x // 2*u*v = y. // // For x >= 0, this has the numerically stable solution // u = sqrt(0.5 * (x + sqrt(x^2 + y^2))) // v = y / (2 * u) // and for x < 0, // v = sign(y) * sqrt(0.5 * (-x + sqrt(x^2 + y^2))) // u = y / (2 * v) // // Letting w = sqrt(0.5 * (|x| + |z|)), // if x == 0: u = w, v = sign(y) * w // if x > 0: u = w, v = y / (2 * w) // if x < 0: u = |y| / (2 * w), v = sign(y) * w const T x = numext::real(z); const T y = numext::imag(z); const T zero = T(0); const T w = numext::sqrt(T(0.5) * (numext::abs(x) + numext::hypot(x, y))); return (numext::isinf)(y) ? std::complex(NumTraits::infinity(), y) : x == zero ? std::complex(w, y < zero ? -w : w) : x > zero ? std::complex(w, y / (2 * w)) : std::complex(numext::abs(y) / (2 * w), y < zero ? -w : w ); } // Generic complex rsqrt implementation. template EIGEN_DEVICE_FUNC std::complex complex_rsqrt(const std::complex& z) { // Computes the principal reciprocal sqrt of the input. // // For a complex reciprocal square root of the number z = x + i*y. We want to // find real numbers u and v such that // (u + i*v)^2 = 1 / (x + i*y) <=> // u^2 - v^2 + i*2*u*v = x/|z|^2 - i*v/|z|^2. // By equating the real and imaginary parts we get: // u^2 - v^2 = x/|z|^2 // 2*u*v = y/|z|^2. // // For x >= 0, this has the numerically stable solution // u = sqrt(0.5 * (x + |z|)) / |z| // v = -y / (2 * u * |z|) // and for x < 0, // v = -sign(y) * sqrt(0.5 * (-x + |z|)) / |z| // u = -y / (2 * v * |z|) // // Letting w = sqrt(0.5 * (|x| + |z|)), // if x == 0: u = w / |z|, v = -sign(y) * w / |z| // if x > 0: u = w / |z|, v = -y / (2 * w * |z|) // if x < 0: u = |y| / (2 * w * |z|), v = -sign(y) * w / |z| const T x = numext::real(z); const T y = numext::imag(z); const T zero = T(0); const T abs_z = numext::hypot(x, y); const T w = numext::sqrt(T(0.5) * (numext::abs(x) + abs_z)); const T woz = w / abs_z; // Corner cases consistent with 1/sqrt(z) on gcc/clang. return abs_z == zero ? std::complex(NumTraits::infinity(), NumTraits::quiet_NaN()) : ((numext::isinf)(x) || (numext::isinf)(y)) ? std::complex(zero, zero) : x == zero ? std::complex(woz, y < zero ? woz : -woz) : x > zero ? std::complex(woz, -y / (2 * w * abs_z)) : std::complex(numext::abs(y) / (2 * w * abs_z), y < zero ? woz : -woz ); } template EIGEN_DEVICE_FUNC std::complex complex_log(const std::complex& z) { // Computes complex log. T a = numext::abs(z); EIGEN_USING_STD(atan2); T b = atan2(z.imag(), z.real()); return std::complex(numext::log(a), b); } } // end namespace internal } // end namespace Eigen #endif // EIGEN_MATHFUNCTIONSIMPL_H RcppEigen/inst/include/Eigen/src/Core/NumTraits.h0000644000176200001440000003112414562417221021340 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2006-2010 Benoit Jacob // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_NUMTRAITS_H #define EIGEN_NUMTRAITS_H namespace Eigen { namespace internal { // default implementation of digits10(), based on numeric_limits if specialized, // 0 for integer types, and log10(epsilon()) otherwise. template< typename T, bool use_numeric_limits = std::numeric_limits::is_specialized, bool is_integer = NumTraits::IsInteger> struct default_digits10_impl { EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static int run() { return std::numeric_limits::digits10; } }; template struct default_digits10_impl // Floating point { EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static int run() { using std::log10; using std::ceil; typedef typename NumTraits::Real Real; return int(ceil(-log10(NumTraits::epsilon()))); } }; template struct default_digits10_impl // Integer { EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static int run() { return 0; } }; // default implementation of digits(), based on numeric_limits if specialized, // 0 for integer types, and log2(epsilon()) otherwise. template< typename T, bool use_numeric_limits = std::numeric_limits::is_specialized, bool is_integer = NumTraits::IsInteger> struct default_digits_impl { EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static int run() { return std::numeric_limits::digits; } }; template struct default_digits_impl // Floating point { EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static int run() { using std::log; using std::ceil; typedef typename NumTraits::Real Real; return int(ceil(-log(NumTraits::epsilon())/log(static_cast(2)))); } }; template struct default_digits_impl // Integer { EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static int run() { return 0; } }; } // end namespace internal namespace numext { /** \internal bit-wise cast without changing the underlying bit representation. */ // TODO: Replace by std::bit_cast (available in C++20) template EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Tgt bit_cast(const Src& src) { #if EIGEN_HAS_TYPE_TRAITS // The behaviour of memcpy is not specified for non-trivially copyable types EIGEN_STATIC_ASSERT(std::is_trivially_copyable::value, THIS_TYPE_IS_NOT_SUPPORTED); EIGEN_STATIC_ASSERT(std::is_trivially_copyable::value && std::is_default_constructible::value, THIS_TYPE_IS_NOT_SUPPORTED); #endif EIGEN_STATIC_ASSERT(sizeof(Src) == sizeof(Tgt), THIS_TYPE_IS_NOT_SUPPORTED); Tgt tgt; EIGEN_USING_STD(memcpy) memcpy(&tgt, &src, sizeof(Tgt)); return tgt; } } // namespace numext /** \class NumTraits * \ingroup Core_Module * * \brief Holds information about the various numeric (i.e. scalar) types allowed by Eigen. * * \tparam T the numeric type at hand * * This class stores enums, typedefs and static methods giving information about a numeric type. * * The provided data consists of: * \li A typedef \c Real, giving the "real part" type of \a T. If \a T is already real, * then \c Real is just a typedef to \a T. If \a T is \c std::complex then \c Real * is a typedef to \a U. * \li A typedef \c NonInteger, giving the type that should be used for operations producing non-integral values, * such as quotients, square roots, etc. If \a T is a floating-point type, then this typedef just gives * \a T again. Note however that many Eigen functions such as internal::sqrt simply refuse to * take integers. Outside of a few cases, Eigen doesn't do automatic type promotion. Thus, this typedef is * only intended as a helper for code that needs to explicitly promote types. * \li A typedef \c Literal giving the type to use for numeric literals such as "2" or "0.5". For instance, for \c std::complex, Literal is defined as \c U. * Of course, this type must be fully compatible with \a T. In doubt, just use \a T here. * \li A typedef \a Nested giving the type to use to nest a value inside of the expression tree. If you don't know what * this means, just use \a T here. * \li An enum value \a IsComplex. It is equal to 1 if \a T is a \c std::complex * type, and to 0 otherwise. * \li An enum value \a IsInteger. It is equal to \c 1 if \a T is an integer type such as \c int, * and to \c 0 otherwise. * \li Enum values ReadCost, AddCost and MulCost representing a rough estimate of the number of CPU cycles needed * to by move / add / mul instructions respectively, assuming the data is already stored in CPU registers. * Stay vague here. No need to do architecture-specific stuff. If you don't know what this means, just use \c Eigen::HugeCost. * \li An enum value \a IsSigned. It is equal to \c 1 if \a T is a signed type and to 0 if \a T is unsigned. * \li An enum value \a RequireInitialization. It is equal to \c 1 if the constructor of the numeric type \a T must * be called, and to 0 if it is safe not to call it. Default is 0 if \a T is an arithmetic type, and 1 otherwise. * \li An epsilon() function which, unlike std::numeric_limits::epsilon(), * it returns a \a Real instead of a \a T. * \li A dummy_precision() function returning a weak epsilon value. It is mainly used as a default * value by the fuzzy comparison operators. * \li highest() and lowest() functions returning the highest and lowest possible values respectively. * \li digits() function returning the number of radix digits (non-sign digits for integers, mantissa for floating-point). This is * the analogue of std::numeric_limits::digits * which is used as the default implementation if specialized. * \li digits10() function returning the number of decimal digits that can be represented without change. This is * the analogue of std::numeric_limits::digits10 * which is used as the default implementation if specialized. * \li min_exponent() and max_exponent() functions returning the highest and lowest possible values, respectively, * such that the radix raised to the power exponent-1 is a normalized floating-point number. These are equivalent to * std::numeric_limits::min_exponent/ * std::numeric_limits::max_exponent. * \li infinity() function returning a representation of positive infinity, if available. * \li quiet_NaN function returning a non-signaling "not-a-number", if available. */ template struct GenericNumTraits { enum { IsInteger = std::numeric_limits::is_integer, IsSigned = std::numeric_limits::is_signed, IsComplex = 0, RequireInitialization = internal::is_arithmetic::value ? 0 : 1, ReadCost = 1, AddCost = 1, MulCost = 1 }; typedef T Real; typedef typename internal::conditional< IsInteger, typename internal::conditional::type, T >::type NonInteger; typedef T Nested; typedef T Literal; EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static inline Real epsilon() { return numext::numeric_limits::epsilon(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static inline int digits10() { return internal::default_digits10_impl::run(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static inline int digits() { return internal::default_digits_impl::run(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static inline int min_exponent() { return numext::numeric_limits::min_exponent; } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static inline int max_exponent() { return numext::numeric_limits::max_exponent; } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static inline Real dummy_precision() { // make sure to override this for floating-point types return Real(0); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static inline T highest() { return (numext::numeric_limits::max)(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static inline T lowest() { return IsInteger ? (numext::numeric_limits::min)() : static_cast(-(numext::numeric_limits::max)()); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static inline T infinity() { return numext::numeric_limits::infinity(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static inline T quiet_NaN() { return numext::numeric_limits::quiet_NaN(); } }; template struct NumTraits : GenericNumTraits {}; template<> struct NumTraits : GenericNumTraits { EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static inline float dummy_precision() { return 1e-5f; } }; template<> struct NumTraits : GenericNumTraits { EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static inline double dummy_precision() { return 1e-12; } }; template<> struct NumTraits : GenericNumTraits { EIGEN_CONSTEXPR static inline long double dummy_precision() { return 1e-15l; } }; template struct NumTraits > : GenericNumTraits > { typedef _Real Real; typedef typename NumTraits<_Real>::Literal Literal; enum { IsComplex = 1, RequireInitialization = NumTraits<_Real>::RequireInitialization, ReadCost = 2 * NumTraits<_Real>::ReadCost, AddCost = 2 * NumTraits::AddCost, MulCost = 4 * NumTraits::MulCost + 2 * NumTraits::AddCost }; EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static inline Real epsilon() { return NumTraits::epsilon(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static inline Real dummy_precision() { return NumTraits::dummy_precision(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static inline int digits10() { return NumTraits::digits10(); } }; template struct NumTraits > { typedef Array ArrayType; typedef typename NumTraits::Real RealScalar; typedef Array Real; typedef typename NumTraits::NonInteger NonIntegerScalar; typedef Array NonInteger; typedef ArrayType & Nested; typedef typename NumTraits::Literal Literal; enum { IsComplex = NumTraits::IsComplex, IsInteger = NumTraits::IsInteger, IsSigned = NumTraits::IsSigned, RequireInitialization = 1, ReadCost = ArrayType::SizeAtCompileTime==Dynamic ? HugeCost : ArrayType::SizeAtCompileTime * int(NumTraits::ReadCost), AddCost = ArrayType::SizeAtCompileTime==Dynamic ? HugeCost : ArrayType::SizeAtCompileTime * int(NumTraits::AddCost), MulCost = ArrayType::SizeAtCompileTime==Dynamic ? HugeCost : ArrayType::SizeAtCompileTime * int(NumTraits::MulCost) }; EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static inline RealScalar epsilon() { return NumTraits::epsilon(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static inline RealScalar dummy_precision() { return NumTraits::dummy_precision(); } EIGEN_CONSTEXPR static inline int digits10() { return NumTraits::digits10(); } }; template<> struct NumTraits : GenericNumTraits { enum { RequireInitialization = 1, ReadCost = HugeCost, AddCost = HugeCost, MulCost = HugeCost }; EIGEN_CONSTEXPR static inline int digits10() { return 0; } private: static inline std::string epsilon(); static inline std::string dummy_precision(); static inline std::string lowest(); static inline std::string highest(); static inline std::string infinity(); static inline std::string quiet_NaN(); }; // Empty specialization for void to allow template specialization based on NumTraits::Real with T==void and SFINAE. template<> struct NumTraits {}; template<> struct NumTraits : GenericNumTraits {}; } // end namespace Eigen #endif // EIGEN_NUMTRAITS_H RcppEigen/inst/include/Eigen/src/Core/ArrayWrapper.h0000644000176200001440000001555214562417221022040 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009-2010 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_ARRAYWRAPPER_H #define EIGEN_ARRAYWRAPPER_H namespace Eigen { /** \class ArrayWrapper * \ingroup Core_Module * * \brief Expression of a mathematical vector or matrix as an array object * * This class is the return type of MatrixBase::array(), and most of the time * this is the only way it is use. * * \sa MatrixBase::array(), class MatrixWrapper */ namespace internal { template struct traits > : public traits::type > { typedef ArrayXpr XprKind; // Let's remove NestByRefBit enum { Flags0 = traits::type >::Flags, LvalueBitFlag = is_lvalue::value ? LvalueBit : 0, Flags = (Flags0 & ~(NestByRefBit | LvalueBit)) | LvalueBitFlag }; }; } template class ArrayWrapper : public ArrayBase > { public: typedef ArrayBase Base; EIGEN_DENSE_PUBLIC_INTERFACE(ArrayWrapper) EIGEN_INHERIT_ASSIGNMENT_OPERATORS(ArrayWrapper) typedef typename internal::remove_all::type NestedExpression; typedef typename internal::conditional< internal::is_lvalue::value, Scalar, const Scalar >::type ScalarWithConstIfNotLvalue; typedef typename internal::ref_selector::non_const_type NestedExpressionType; using Base::coeffRef; EIGEN_DEVICE_FUNC explicit EIGEN_STRONG_INLINE ArrayWrapper(ExpressionType& matrix) : m_expression(matrix) {} EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index rows() const EIGEN_NOEXCEPT { return m_expression.rows(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index cols() const EIGEN_NOEXCEPT { return m_expression.cols(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index outerStride() const EIGEN_NOEXCEPT { return m_expression.outerStride(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index innerStride() const EIGEN_NOEXCEPT { return m_expression.innerStride(); } EIGEN_DEVICE_FUNC inline ScalarWithConstIfNotLvalue* data() { return m_expression.data(); } EIGEN_DEVICE_FUNC inline const Scalar* data() const { return m_expression.data(); } EIGEN_DEVICE_FUNC inline const Scalar& coeffRef(Index rowId, Index colId) const { return m_expression.coeffRef(rowId, colId); } EIGEN_DEVICE_FUNC inline const Scalar& coeffRef(Index index) const { return m_expression.coeffRef(index); } template EIGEN_DEVICE_FUNC inline void evalTo(Dest& dst) const { dst = m_expression; } EIGEN_DEVICE_FUNC const typename internal::remove_all::type& nestedExpression() const { return m_expression; } /** Forwards the resizing request to the nested expression * \sa DenseBase::resize(Index) */ EIGEN_DEVICE_FUNC void resize(Index newSize) { m_expression.resize(newSize); } /** Forwards the resizing request to the nested expression * \sa DenseBase::resize(Index,Index)*/ EIGEN_DEVICE_FUNC void resize(Index rows, Index cols) { m_expression.resize(rows,cols); } protected: NestedExpressionType m_expression; }; /** \class MatrixWrapper * \ingroup Core_Module * * \brief Expression of an array as a mathematical vector or matrix * * This class is the return type of ArrayBase::matrix(), and most of the time * this is the only way it is use. * * \sa MatrixBase::matrix(), class ArrayWrapper */ namespace internal { template struct traits > : public traits::type > { typedef MatrixXpr XprKind; // Let's remove NestByRefBit enum { Flags0 = traits::type >::Flags, LvalueBitFlag = is_lvalue::value ? LvalueBit : 0, Flags = (Flags0 & ~(NestByRefBit | LvalueBit)) | LvalueBitFlag }; }; } template class MatrixWrapper : public MatrixBase > { public: typedef MatrixBase > Base; EIGEN_DENSE_PUBLIC_INTERFACE(MatrixWrapper) EIGEN_INHERIT_ASSIGNMENT_OPERATORS(MatrixWrapper) typedef typename internal::remove_all::type NestedExpression; typedef typename internal::conditional< internal::is_lvalue::value, Scalar, const Scalar >::type ScalarWithConstIfNotLvalue; typedef typename internal::ref_selector::non_const_type NestedExpressionType; using Base::coeffRef; EIGEN_DEVICE_FUNC explicit inline MatrixWrapper(ExpressionType& matrix) : m_expression(matrix) {} EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index rows() const EIGEN_NOEXCEPT { return m_expression.rows(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index cols() const EIGEN_NOEXCEPT { return m_expression.cols(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index outerStride() const EIGEN_NOEXCEPT { return m_expression.outerStride(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index innerStride() const EIGEN_NOEXCEPT { return m_expression.innerStride(); } EIGEN_DEVICE_FUNC inline ScalarWithConstIfNotLvalue* data() { return m_expression.data(); } EIGEN_DEVICE_FUNC inline const Scalar* data() const { return m_expression.data(); } EIGEN_DEVICE_FUNC inline const Scalar& coeffRef(Index rowId, Index colId) const { return m_expression.derived().coeffRef(rowId, colId); } EIGEN_DEVICE_FUNC inline const Scalar& coeffRef(Index index) const { return m_expression.coeffRef(index); } EIGEN_DEVICE_FUNC const typename internal::remove_all::type& nestedExpression() const { return m_expression; } /** Forwards the resizing request to the nested expression * \sa DenseBase::resize(Index) */ EIGEN_DEVICE_FUNC void resize(Index newSize) { m_expression.resize(newSize); } /** Forwards the resizing request to the nested expression * \sa DenseBase::resize(Index,Index)*/ EIGEN_DEVICE_FUNC void resize(Index rows, Index cols) { m_expression.resize(rows,cols); } protected: NestedExpressionType m_expression; }; } // end namespace Eigen #endif // EIGEN_ARRAYWRAPPER_H RcppEigen/inst/include/Eigen/src/Core/NestByValue.h0000644000176200001440000000473014562417221021616 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008 Gael Guennebaud // Copyright (C) 2006-2008 Benoit Jacob // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_NESTBYVALUE_H #define EIGEN_NESTBYVALUE_H namespace Eigen { namespace internal { template struct traits > : public traits { enum { Flags = traits::Flags & ~NestByRefBit }; }; } /** \class NestByValue * \ingroup Core_Module * * \brief Expression which must be nested by value * * \tparam ExpressionType the type of the object of which we are requiring nesting-by-value * * This class is the return type of MatrixBase::nestByValue() * and most of the time this is the only way it is used. * * \sa MatrixBase::nestByValue() */ template class NestByValue : public internal::dense_xpr_base< NestByValue >::type { public: typedef typename internal::dense_xpr_base::type Base; EIGEN_DENSE_PUBLIC_INTERFACE(NestByValue) EIGEN_DEVICE_FUNC explicit inline NestByValue(const ExpressionType& matrix) : m_expression(matrix) {} EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index rows() const EIGEN_NOEXCEPT { return m_expression.rows(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index cols() const EIGEN_NOEXCEPT { return m_expression.cols(); } EIGEN_DEVICE_FUNC operator const ExpressionType&() const { return m_expression; } EIGEN_DEVICE_FUNC const ExpressionType& nestedExpression() const { return m_expression; } protected: const ExpressionType m_expression; }; /** \returns an expression of the temporary version of *this. */ template EIGEN_DEVICE_FUNC inline const NestByValue DenseBase::nestByValue() const { return NestByValue(derived()); } namespace internal { // Evaluator of Solve -> eval into a temporary template struct evaluator > : public evaluator { typedef evaluator Base; EIGEN_DEVICE_FUNC explicit evaluator(const NestByValue& xpr) : Base(xpr.nestedExpression()) {} }; } } // end namespace Eigen #endif // EIGEN_NESTBYVALUE_H RcppEigen/inst/include/Eigen/src/Core/PlainObjectBase.h0000644000176200001440000014005114562417221022377 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2009 Gael Guennebaud // Copyright (C) 2006-2008 Benoit Jacob // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_DENSESTORAGEBASE_H #define EIGEN_DENSESTORAGEBASE_H #if defined(EIGEN_INITIALIZE_MATRICES_BY_ZERO) # define EIGEN_INITIALIZE_COEFFS # define EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED for(Index i=0;i::quiet_NaN(); #else # undef EIGEN_INITIALIZE_COEFFS # define EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED #endif namespace Eigen { namespace internal { template struct check_rows_cols_for_overflow { template EIGEN_DEVICE_FUNC static EIGEN_ALWAYS_INLINE void run(Index, Index) { } }; template<> struct check_rows_cols_for_overflow { template EIGEN_DEVICE_FUNC static EIGEN_ALWAYS_INLINE void run(Index rows, Index cols) { // http://hg.mozilla.org/mozilla-central/file/6c8a909977d3/xpcom/ds/CheckedInt.h#l242 // we assume Index is signed Index max_index = (std::size_t(1) << (8 * sizeof(Index) - 1)) - 1; // assume Index is signed bool error = (rows == 0 || cols == 0) ? false : (rows > max_index / cols); if (error) throw_std_bad_alloc(); } }; template struct conservative_resize_like_impl; template struct matrix_swap_impl; } // end namespace internal #ifdef EIGEN_PARSED_BY_DOXYGEN namespace doxygen { // This is a workaround to doxygen not being able to understand the inheritance logic // when it is hidden by the dense_xpr_base helper struct. // Moreover, doxygen fails to include members that are not documented in the declaration body of // MatrixBase if we inherits MatrixBase >, // this is why we simply inherits MatrixBase, though this does not make sense. /** This class is just a workaround for Doxygen and it does not not actually exist. */ template struct dense_xpr_base_dispatcher; /** This class is just a workaround for Doxygen and it does not not actually exist. */ template struct dense_xpr_base_dispatcher > : public MatrixBase {}; /** This class is just a workaround for Doxygen and it does not not actually exist. */ template struct dense_xpr_base_dispatcher > : public ArrayBase {}; } // namespace doxygen /** \class PlainObjectBase * \ingroup Core_Module * \brief %Dense storage base class for matrices and arrays. * * This class can be extended with the help of the plugin mechanism described on the page * \ref TopicCustomizing_Plugins by defining the preprocessor symbol \c EIGEN_PLAINOBJECTBASE_PLUGIN. * * \tparam Derived is the derived type, e.g., a Matrix or Array * * \sa \ref TopicClassHierarchy */ template class PlainObjectBase : public doxygen::dense_xpr_base_dispatcher #else template class PlainObjectBase : public internal::dense_xpr_base::type #endif { public: enum { Options = internal::traits::Options }; typedef typename internal::dense_xpr_base::type Base; typedef typename internal::traits::StorageKind StorageKind; typedef typename internal::traits::Scalar Scalar; typedef typename internal::packet_traits::type PacketScalar; typedef typename NumTraits::Real RealScalar; typedef Derived DenseType; using Base::RowsAtCompileTime; using Base::ColsAtCompileTime; using Base::SizeAtCompileTime; using Base::MaxRowsAtCompileTime; using Base::MaxColsAtCompileTime; using Base::MaxSizeAtCompileTime; using Base::IsVectorAtCompileTime; using Base::Flags; typedef Eigen::Map MapType; typedef const Eigen::Map ConstMapType; typedef Eigen::Map AlignedMapType; typedef const Eigen::Map ConstAlignedMapType; template struct StridedMapType { typedef Eigen::Map type; }; template struct StridedConstMapType { typedef Eigen::Map type; }; template struct StridedAlignedMapType { typedef Eigen::Map type; }; template struct StridedConstAlignedMapType { typedef Eigen::Map type; }; protected: DenseStorage m_storage; public: enum { NeedsToAlign = (SizeAtCompileTime != Dynamic) && (internal::traits::Alignment>0) }; EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF(NeedsToAlign) EIGEN_DEVICE_FUNC Base& base() { return *static_cast(this); } EIGEN_DEVICE_FUNC const Base& base() const { return *static_cast(this); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR Index rows() const EIGEN_NOEXCEPT { return m_storage.rows(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR Index cols() const EIGEN_NOEXCEPT { return m_storage.cols(); } /** This is an overloaded version of DenseCoeffsBase::coeff(Index,Index) const * provided to by-pass the creation of an evaluator of the expression, thus saving compilation efforts. * * See DenseCoeffsBase::coeff(Index) const for details. */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& coeff(Index rowId, Index colId) const { if(Flags & RowMajorBit) return m_storage.data()[colId + rowId * m_storage.cols()]; else // column-major return m_storage.data()[rowId + colId * m_storage.rows()]; } /** This is an overloaded version of DenseCoeffsBase::coeff(Index) const * provided to by-pass the creation of an evaluator of the expression, thus saving compilation efforts. * * See DenseCoeffsBase::coeff(Index) const for details. */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& coeff(Index index) const { return m_storage.data()[index]; } /** This is an overloaded version of DenseCoeffsBase::coeffRef(Index,Index) const * provided to by-pass the creation of an evaluator of the expression, thus saving compilation efforts. * * See DenseCoeffsBase::coeffRef(Index,Index) const for details. */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index rowId, Index colId) { if(Flags & RowMajorBit) return m_storage.data()[colId + rowId * m_storage.cols()]; else // column-major return m_storage.data()[rowId + colId * m_storage.rows()]; } /** This is an overloaded version of DenseCoeffsBase::coeffRef(Index) const * provided to by-pass the creation of an evaluator of the expression, thus saving compilation efforts. * * See DenseCoeffsBase::coeffRef(Index) const for details. */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index index) { return m_storage.data()[index]; } /** This is the const version of coeffRef(Index,Index) which is thus synonym of coeff(Index,Index). * It is provided for convenience. */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& coeffRef(Index rowId, Index colId) const { if(Flags & RowMajorBit) return m_storage.data()[colId + rowId * m_storage.cols()]; else // column-major return m_storage.data()[rowId + colId * m_storage.rows()]; } /** This is the const version of coeffRef(Index) which is thus synonym of coeff(Index). * It is provided for convenience. */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& coeffRef(Index index) const { return m_storage.data()[index]; } /** \internal */ template EIGEN_STRONG_INLINE PacketScalar packet(Index rowId, Index colId) const { return internal::ploadt (m_storage.data() + (Flags & RowMajorBit ? colId + rowId * m_storage.cols() : rowId + colId * m_storage.rows())); } /** \internal */ template EIGEN_STRONG_INLINE PacketScalar packet(Index index) const { return internal::ploadt(m_storage.data() + index); } /** \internal */ template EIGEN_STRONG_INLINE void writePacket(Index rowId, Index colId, const PacketScalar& val) { internal::pstoret (m_storage.data() + (Flags & RowMajorBit ? colId + rowId * m_storage.cols() : rowId + colId * m_storage.rows()), val); } /** \internal */ template EIGEN_STRONG_INLINE void writePacket(Index index, const PacketScalar& val) { internal::pstoret(m_storage.data() + index, val); } /** \returns a const pointer to the data array of this matrix */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar *data() const { return m_storage.data(); } /** \returns a pointer to the data array of this matrix */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar *data() { return m_storage.data(); } /** Resizes \c *this to a \a rows x \a cols matrix. * * This method is intended for dynamic-size matrices, although it is legal to call it on any * matrix as long as fixed dimensions are left unchanged. If you only want to change the number * of rows and/or of columns, you can use resize(NoChange_t, Index), resize(Index, NoChange_t). * * If the current number of coefficients of \c *this exactly matches the * product \a rows * \a cols, then no memory allocation is performed and * the current values are left unchanged. In all other cases, including * shrinking, the data is reallocated and all previous values are lost. * * Example: \include Matrix_resize_int_int.cpp * Output: \verbinclude Matrix_resize_int_int.out * * \sa resize(Index) for vectors, resize(NoChange_t, Index), resize(Index, NoChange_t) */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void resize(Index rows, Index cols) { eigen_assert( EIGEN_IMPLIES(RowsAtCompileTime!=Dynamic,rows==RowsAtCompileTime) && EIGEN_IMPLIES(ColsAtCompileTime!=Dynamic,cols==ColsAtCompileTime) && EIGEN_IMPLIES(RowsAtCompileTime==Dynamic && MaxRowsAtCompileTime!=Dynamic,rows<=MaxRowsAtCompileTime) && EIGEN_IMPLIES(ColsAtCompileTime==Dynamic && MaxColsAtCompileTime!=Dynamic,cols<=MaxColsAtCompileTime) && rows>=0 && cols>=0 && "Invalid sizes when resizing a matrix or array."); internal::check_rows_cols_for_overflow::run(rows, cols); #ifdef EIGEN_INITIALIZE_COEFFS Index size = rows*cols; bool size_changed = size != this->size(); m_storage.resize(size, rows, cols); if(size_changed) EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED #else m_storage.resize(rows*cols, rows, cols); #endif } /** Resizes \c *this to a vector of length \a size * * \only_for_vectors. This method does not work for * partially dynamic matrices when the static dimension is anything other * than 1. For example it will not work with Matrix. * * Example: \include Matrix_resize_int.cpp * Output: \verbinclude Matrix_resize_int.out * * \sa resize(Index,Index), resize(NoChange_t, Index), resize(Index, NoChange_t) */ EIGEN_DEVICE_FUNC inline void resize(Index size) { EIGEN_STATIC_ASSERT_VECTOR_ONLY(PlainObjectBase) eigen_assert(((SizeAtCompileTime == Dynamic && (MaxSizeAtCompileTime==Dynamic || size<=MaxSizeAtCompileTime)) || SizeAtCompileTime == size) && size>=0); #ifdef EIGEN_INITIALIZE_COEFFS bool size_changed = size != this->size(); #endif if(RowsAtCompileTime == 1) m_storage.resize(size, 1, size); else m_storage.resize(size, size, 1); #ifdef EIGEN_INITIALIZE_COEFFS if(size_changed) EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED #endif } /** Resizes the matrix, changing only the number of columns. For the parameter of type NoChange_t, just pass the special value \c NoChange * as in the example below. * * Example: \include Matrix_resize_NoChange_int.cpp * Output: \verbinclude Matrix_resize_NoChange_int.out * * \sa resize(Index,Index) */ EIGEN_DEVICE_FUNC inline void resize(NoChange_t, Index cols) { resize(rows(), cols); } /** Resizes the matrix, changing only the number of rows. For the parameter of type NoChange_t, just pass the special value \c NoChange * as in the example below. * * Example: \include Matrix_resize_int_NoChange.cpp * Output: \verbinclude Matrix_resize_int_NoChange.out * * \sa resize(Index,Index) */ EIGEN_DEVICE_FUNC inline void resize(Index rows, NoChange_t) { resize(rows, cols()); } /** Resizes \c *this to have the same dimensions as \a other. * Takes care of doing all the checking that's needed. * * Note that copying a row-vector into a vector (and conversely) is allowed. * The resizing, if any, is then done in the appropriate way so that row-vectors * remain row-vectors and vectors remain vectors. */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void resizeLike(const EigenBase& _other) { const OtherDerived& other = _other.derived(); internal::check_rows_cols_for_overflow::run(other.rows(), other.cols()); const Index othersize = other.rows()*other.cols(); if(RowsAtCompileTime == 1) { eigen_assert(other.rows() == 1 || other.cols() == 1); resize(1, othersize); } else if(ColsAtCompileTime == 1) { eigen_assert(other.rows() == 1 || other.cols() == 1); resize(othersize, 1); } else resize(other.rows(), other.cols()); } /** Resizes the matrix to \a rows x \a cols while leaving old values untouched. * * The method is intended for matrices of dynamic size. If you only want to change the number * of rows and/or of columns, you can use conservativeResize(NoChange_t, Index) or * conservativeResize(Index, NoChange_t). * * Matrices are resized relative to the top-left element. In case values need to be * appended to the matrix they will be uninitialized. */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void conservativeResize(Index rows, Index cols) { internal::conservative_resize_like_impl::run(*this, rows, cols); } /** Resizes the matrix to \a rows x \a cols while leaving old values untouched. * * As opposed to conservativeResize(Index rows, Index cols), this version leaves * the number of columns unchanged. * * In case the matrix is growing, new rows will be uninitialized. */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void conservativeResize(Index rows, NoChange_t) { // Note: see the comment in conservativeResize(Index,Index) conservativeResize(rows, cols()); } /** Resizes the matrix to \a rows x \a cols while leaving old values untouched. * * As opposed to conservativeResize(Index rows, Index cols), this version leaves * the number of rows unchanged. * * In case the matrix is growing, new columns will be uninitialized. */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void conservativeResize(NoChange_t, Index cols) { // Note: see the comment in conservativeResize(Index,Index) conservativeResize(rows(), cols); } /** Resizes the vector to \a size while retaining old values. * * \only_for_vectors. This method does not work for * partially dynamic matrices when the static dimension is anything other * than 1. For example it will not work with Matrix. * * When values are appended, they will be uninitialized. */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void conservativeResize(Index size) { internal::conservative_resize_like_impl::run(*this, size); } /** Resizes the matrix to \a rows x \a cols of \c other, while leaving old values untouched. * * The method is intended for matrices of dynamic size. If you only want to change the number * of rows and/or of columns, you can use conservativeResize(NoChange_t, Index) or * conservativeResize(Index, NoChange_t). * * Matrices are resized relative to the top-left element. In case values need to be * appended to the matrix they will copied from \c other. */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void conservativeResizeLike(const DenseBase& other) { internal::conservative_resize_like_impl::run(*this, other); } /** This is a special case of the templated operator=. Its purpose is to * prevent a default operator= from hiding the templated operator=. */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& operator=(const PlainObjectBase& other) { return _set(other); } /** \sa MatrixBase::lazyAssign() */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& lazyAssign(const DenseBase& other) { _resize_to_match(other); return Base::lazyAssign(other.derived()); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& operator=(const ReturnByValue& func) { resize(func.rows(), func.cols()); return Base::operator=(func); } // Prevent user from trying to instantiate PlainObjectBase objects // by making all its constructor protected. See bug 1074. protected: EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PlainObjectBase() : m_storage() { // _check_template_params(); // EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED } #ifndef EIGEN_PARSED_BY_DOXYGEN // FIXME is it still needed ? /** \internal */ EIGEN_DEVICE_FUNC explicit PlainObjectBase(internal::constructor_without_unaligned_array_assert) : m_storage(internal::constructor_without_unaligned_array_assert()) { // _check_template_params(); EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED } #endif #if EIGEN_HAS_RVALUE_REFERENCES EIGEN_DEVICE_FUNC PlainObjectBase(PlainObjectBase&& other) EIGEN_NOEXCEPT : m_storage( std::move(other.m_storage) ) { } EIGEN_DEVICE_FUNC PlainObjectBase& operator=(PlainObjectBase&& other) EIGEN_NOEXCEPT { _check_template_params(); m_storage = std::move(other.m_storage); return *this; } #endif /** Copy constructor */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PlainObjectBase(const PlainObjectBase& other) : Base(), m_storage(other.m_storage) { } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PlainObjectBase(Index size, Index rows, Index cols) : m_storage(size, rows, cols) { // _check_template_params(); // EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED } #if EIGEN_HAS_CXX11 /** \brief Construct a row of column vector with fixed size from an arbitrary number of coefficients. \cpp11 * * \only_for_vectors * * This constructor is for 1D array or vectors with more than 4 coefficients. * There exists C++98 analogue constructors for fixed-size array/vector having 1, 2, 3, or 4 coefficients. * * \warning To construct a column (resp. row) vector of fixed length, the number of values passed to this * constructor must match the the fixed number of rows (resp. columns) of \c *this. */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PlainObjectBase(const Scalar& a0, const Scalar& a1, const Scalar& a2, const Scalar& a3, const ArgTypes&... args) : m_storage() { _check_template_params(); EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(PlainObjectBase, sizeof...(args) + 4); m_storage.data()[0] = a0; m_storage.data()[1] = a1; m_storage.data()[2] = a2; m_storage.data()[3] = a3; Index i = 4; auto x = {(m_storage.data()[i++] = args, 0)...}; static_cast(x); } /** \brief Constructs a Matrix or Array and initializes it by elements given by an initializer list of initializer * lists \cpp11 */ EIGEN_DEVICE_FUNC explicit EIGEN_STRONG_INLINE PlainObjectBase(const std::initializer_list>& list) : m_storage() { _check_template_params(); size_t list_size = 0; if (list.begin() != list.end()) { list_size = list.begin()->size(); } // This is to allow syntax like VectorXi {{1, 2, 3, 4}} if (ColsAtCompileTime == 1 && list.size() == 1) { eigen_assert(list_size == static_cast(RowsAtCompileTime) || RowsAtCompileTime == Dynamic); resize(list_size, ColsAtCompileTime); std::copy(list.begin()->begin(), list.begin()->end(), m_storage.data()); } else { eigen_assert(list.size() == static_cast(RowsAtCompileTime) || RowsAtCompileTime == Dynamic); eigen_assert(list_size == static_cast(ColsAtCompileTime) || ColsAtCompileTime == Dynamic); resize(list.size(), list_size); Index row_index = 0; for (const std::initializer_list& row : list) { eigen_assert(list_size == row.size()); Index col_index = 0; for (const Scalar& e : row) { coeffRef(row_index, col_index) = e; ++col_index; } ++row_index; } } } #endif // end EIGEN_HAS_CXX11 /** \sa PlainObjectBase::operator=(const EigenBase&) */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PlainObjectBase(const DenseBase &other) : m_storage() { _check_template_params(); resizeLike(other); _set_noalias(other); } /** \sa PlainObjectBase::operator=(const EigenBase&) */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PlainObjectBase(const EigenBase &other) : m_storage() { _check_template_params(); resizeLike(other); *this = other.derived(); } /** \brief Copy constructor with in-place evaluation */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PlainObjectBase(const ReturnByValue& other) { _check_template_params(); // FIXME this does not automatically transpose vectors if necessary resize(other.rows(), other.cols()); other.evalTo(this->derived()); } public: /** \brief Copies the generic expression \a other into *this. * \copydetails DenseBase::operator=(const EigenBase &other) */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& operator=(const EigenBase &other) { _resize_to_match(other); Base::operator=(other.derived()); return this->derived(); } /** \name Map * These are convenience functions returning Map objects. The Map() static functions return unaligned Map objects, * while the AlignedMap() functions return aligned Map objects and thus should be called only with 16-byte-aligned * \a data pointers. * * Here is an example using strides: * \include Matrix_Map_stride.cpp * Output: \verbinclude Matrix_Map_stride.out * * \see class Map */ //@{ static inline ConstMapType Map(const Scalar* data) { return ConstMapType(data); } static inline MapType Map(Scalar* data) { return MapType(data); } static inline ConstMapType Map(const Scalar* data, Index size) { return ConstMapType(data, size); } static inline MapType Map(Scalar* data, Index size) { return MapType(data, size); } static inline ConstMapType Map(const Scalar* data, Index rows, Index cols) { return ConstMapType(data, rows, cols); } static inline MapType Map(Scalar* data, Index rows, Index cols) { return MapType(data, rows, cols); } static inline ConstAlignedMapType MapAligned(const Scalar* data) { return ConstAlignedMapType(data); } static inline AlignedMapType MapAligned(Scalar* data) { return AlignedMapType(data); } static inline ConstAlignedMapType MapAligned(const Scalar* data, Index size) { return ConstAlignedMapType(data, size); } static inline AlignedMapType MapAligned(Scalar* data, Index size) { return AlignedMapType(data, size); } static inline ConstAlignedMapType MapAligned(const Scalar* data, Index rows, Index cols) { return ConstAlignedMapType(data, rows, cols); } static inline AlignedMapType MapAligned(Scalar* data, Index rows, Index cols) { return AlignedMapType(data, rows, cols); } template static inline typename StridedConstMapType >::type Map(const Scalar* data, const Stride& stride) { return typename StridedConstMapType >::type(data, stride); } template static inline typename StridedMapType >::type Map(Scalar* data, const Stride& stride) { return typename StridedMapType >::type(data, stride); } template static inline typename StridedConstMapType >::type Map(const Scalar* data, Index size, const Stride& stride) { return typename StridedConstMapType >::type(data, size, stride); } template static inline typename StridedMapType >::type Map(Scalar* data, Index size, const Stride& stride) { return typename StridedMapType >::type(data, size, stride); } template static inline typename StridedConstMapType >::type Map(const Scalar* data, Index rows, Index cols, const Stride& stride) { return typename StridedConstMapType >::type(data, rows, cols, stride); } template static inline typename StridedMapType >::type Map(Scalar* data, Index rows, Index cols, const Stride& stride) { return typename StridedMapType >::type(data, rows, cols, stride); } template static inline typename StridedConstAlignedMapType >::type MapAligned(const Scalar* data, const Stride& stride) { return typename StridedConstAlignedMapType >::type(data, stride); } template static inline typename StridedAlignedMapType >::type MapAligned(Scalar* data, const Stride& stride) { return typename StridedAlignedMapType >::type(data, stride); } template static inline typename StridedConstAlignedMapType >::type MapAligned(const Scalar* data, Index size, const Stride& stride) { return typename StridedConstAlignedMapType >::type(data, size, stride); } template static inline typename StridedAlignedMapType >::type MapAligned(Scalar* data, Index size, const Stride& stride) { return typename StridedAlignedMapType >::type(data, size, stride); } template static inline typename StridedConstAlignedMapType >::type MapAligned(const Scalar* data, Index rows, Index cols, const Stride& stride) { return typename StridedConstAlignedMapType >::type(data, rows, cols, stride); } template static inline typename StridedAlignedMapType >::type MapAligned(Scalar* data, Index rows, Index cols, const Stride& stride) { return typename StridedAlignedMapType >::type(data, rows, cols, stride); } //@} using Base::setConstant; EIGEN_DEVICE_FUNC Derived& setConstant(Index size, const Scalar& val); EIGEN_DEVICE_FUNC Derived& setConstant(Index rows, Index cols, const Scalar& val); EIGEN_DEVICE_FUNC Derived& setConstant(NoChange_t, Index cols, const Scalar& val); EIGEN_DEVICE_FUNC Derived& setConstant(Index rows, NoChange_t, const Scalar& val); using Base::setZero; EIGEN_DEVICE_FUNC Derived& setZero(Index size); EIGEN_DEVICE_FUNC Derived& setZero(Index rows, Index cols); EIGEN_DEVICE_FUNC Derived& setZero(NoChange_t, Index cols); EIGEN_DEVICE_FUNC Derived& setZero(Index rows, NoChange_t); using Base::setOnes; EIGEN_DEVICE_FUNC Derived& setOnes(Index size); EIGEN_DEVICE_FUNC Derived& setOnes(Index rows, Index cols); EIGEN_DEVICE_FUNC Derived& setOnes(NoChange_t, Index cols); EIGEN_DEVICE_FUNC Derived& setOnes(Index rows, NoChange_t); using Base::setRandom; Derived& setRandom(Index size); Derived& setRandom(Index rows, Index cols); Derived& setRandom(NoChange_t, Index cols); Derived& setRandom(Index rows, NoChange_t); #ifdef EIGEN_PLAINOBJECTBASE_PLUGIN #include EIGEN_PLAINOBJECTBASE_PLUGIN #endif protected: /** \internal Resizes *this in preparation for assigning \a other to it. * Takes care of doing all the checking that's needed. * * Note that copying a row-vector into a vector (and conversely) is allowed. * The resizing, if any, is then done in the appropriate way so that row-vectors * remain row-vectors and vectors remain vectors. */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void _resize_to_match(const EigenBase& other) { #ifdef EIGEN_NO_AUTOMATIC_RESIZING eigen_assert((this->size()==0 || (IsVectorAtCompileTime ? (this->size() == other.size()) : (rows() == other.rows() && cols() == other.cols()))) && "Size mismatch. Automatic resizing is disabled because EIGEN_NO_AUTOMATIC_RESIZING is defined"); EIGEN_ONLY_USED_FOR_DEBUG(other); #else resizeLike(other); #endif } /** * \brief Copies the value of the expression \a other into \c *this with automatic resizing. * * *this might be resized to match the dimensions of \a other. If *this was a null matrix (not already initialized), * it will be initialized. * * Note that copying a row-vector into a vector (and conversely) is allowed. * The resizing, if any, is then done in the appropriate way so that row-vectors * remain row-vectors and vectors remain vectors. * * \sa operator=(const MatrixBase&), _set_noalias() * * \internal */ // aliasing is dealt once in internal::call_assignment // so at this stage we have to assume aliasing... and resising has to be done later. template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& _set(const DenseBase& other) { internal::call_assignment(this->derived(), other.derived()); return this->derived(); } /** \internal Like _set() but additionally makes the assumption that no aliasing effect can happen (which * is the case when creating a new matrix) so one can enforce lazy evaluation. * * \sa operator=(const MatrixBase&), _set() */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& _set_noalias(const DenseBase& other) { // I don't think we need this resize call since the lazyAssign will anyways resize // and lazyAssign will be called by the assign selector. //_resize_to_match(other); // the 'false' below means to enforce lazy evaluation. We don't use lazyAssign() because // it wouldn't allow to copy a row-vector into a column-vector. internal::call_assignment_no_alias(this->derived(), other.derived(), internal::assign_op()); return this->derived(); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void _init2(Index rows, Index cols, typename internal::enable_if::type* = 0) { const bool t0_is_integer_alike = internal::is_valid_index_type::value; const bool t1_is_integer_alike = internal::is_valid_index_type::value; EIGEN_STATIC_ASSERT(t0_is_integer_alike && t1_is_integer_alike, FLOATING_POINT_ARGUMENT_PASSED__INTEGER_WAS_EXPECTED) resize(rows,cols); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void _init2(const T0& val0, const T1& val1, typename internal::enable_if::type* = 0) { EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(PlainObjectBase, 2) m_storage.data()[0] = Scalar(val0); m_storage.data()[1] = Scalar(val1); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void _init2(const Index& val0, const Index& val1, typename internal::enable_if< (!internal::is_same::value) && (internal::is_same::value) && (internal::is_same::value) && Base::SizeAtCompileTime==2,T1>::type* = 0) { EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(PlainObjectBase, 2) m_storage.data()[0] = Scalar(val0); m_storage.data()[1] = Scalar(val1); } // The argument is convertible to the Index type and we either have a non 1x1 Matrix, or a dynamic-sized Array, // then the argument is meant to be the size of the object. template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void _init1(Index size, typename internal::enable_if< (Base::SizeAtCompileTime!=1 || !internal::is_convertible::value) && ((!internal::is_same::XprKind,ArrayXpr>::value || Base::SizeAtCompileTime==Dynamic)),T>::type* = 0) { // NOTE MSVC 2008 complains if we directly put bool(NumTraits::IsInteger) as the EIGEN_STATIC_ASSERT argument. const bool is_integer_alike = internal::is_valid_index_type::value; EIGEN_UNUSED_VARIABLE(is_integer_alike); EIGEN_STATIC_ASSERT(is_integer_alike, FLOATING_POINT_ARGUMENT_PASSED__INTEGER_WAS_EXPECTED) resize(size); } // We have a 1x1 matrix/array => the argument is interpreted as the value of the unique coefficient (case where scalar type can be implicitly converted) template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void _init1(const Scalar& val0, typename internal::enable_if::value,T>::type* = 0) { EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(PlainObjectBase, 1) m_storage.data()[0] = val0; } // We have a 1x1 matrix/array => the argument is interpreted as the value of the unique coefficient (case where scalar type match the index type) template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void _init1(const Index& val0, typename internal::enable_if< (!internal::is_same::value) && (internal::is_same::value) && Base::SizeAtCompileTime==1 && internal::is_convertible::value,T*>::type* = 0) { EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(PlainObjectBase, 1) m_storage.data()[0] = Scalar(val0); } // Initialize a fixed size matrix from a pointer to raw data template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void _init1(const Scalar* data){ this->_set_noalias(ConstMapType(data)); } // Initialize an arbitrary matrix from a dense expression template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void _init1(const DenseBase& other){ this->_set_noalias(other); } // Initialize an arbitrary matrix from an object convertible to the Derived type. template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void _init1(const Derived& other){ this->_set_noalias(other); } // Initialize an arbitrary matrix from a generic Eigen expression template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void _init1(const EigenBase& other){ this->derived() = other; } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void _init1(const ReturnByValue& other) { resize(other.rows(), other.cols()); other.evalTo(this->derived()); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void _init1(const RotationBase& r) { this->derived() = r; } // For fixed-size Array template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void _init1(const Scalar& val0, typename internal::enable_if< Base::SizeAtCompileTime!=Dynamic && Base::SizeAtCompileTime!=1 && internal::is_convertible::value && internal::is_same::XprKind,ArrayXpr>::value,T>::type* = 0) { Base::setConstant(val0); } // For fixed-size Array template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void _init1(const Index& val0, typename internal::enable_if< (!internal::is_same::value) && (internal::is_same::value) && Base::SizeAtCompileTime!=Dynamic && Base::SizeAtCompileTime!=1 && internal::is_convertible::value && internal::is_same::XprKind,ArrayXpr>::value,T*>::type* = 0) { Base::setConstant(val0); } template friend struct internal::matrix_swap_impl; public: #ifndef EIGEN_PARSED_BY_DOXYGEN /** \internal * \brief Override DenseBase::swap() since for dynamic-sized matrices * of same type it is enough to swap the data pointers. */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void swap(DenseBase & other) { enum { SwapPointers = internal::is_same::value && Base::SizeAtCompileTime==Dynamic }; internal::matrix_swap_impl::run(this->derived(), other.derived()); } /** \internal * \brief const version forwarded to DenseBase::swap */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void swap(DenseBase const & other) { Base::swap(other.derived()); } EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE void _check_template_params() { EIGEN_STATIC_ASSERT((EIGEN_IMPLIES(MaxRowsAtCompileTime==1 && MaxColsAtCompileTime!=1, (int(Options)&RowMajor)==RowMajor) && EIGEN_IMPLIES(MaxColsAtCompileTime==1 && MaxRowsAtCompileTime!=1, (int(Options)&RowMajor)==0) && ((RowsAtCompileTime == Dynamic) || (RowsAtCompileTime >= 0)) && ((ColsAtCompileTime == Dynamic) || (ColsAtCompileTime >= 0)) && ((MaxRowsAtCompileTime == Dynamic) || (MaxRowsAtCompileTime >= 0)) && ((MaxColsAtCompileTime == Dynamic) || (MaxColsAtCompileTime >= 0)) && (MaxRowsAtCompileTime == RowsAtCompileTime || RowsAtCompileTime==Dynamic) && (MaxColsAtCompileTime == ColsAtCompileTime || ColsAtCompileTime==Dynamic) && (Options & (DontAlign|RowMajor)) == Options), INVALID_MATRIX_TEMPLATE_PARAMETERS) } enum { IsPlainObjectBase = 1 }; #endif public: // These apparently need to be down here for nvcc+icc to prevent duplicate // Map symbol. template friend class Eigen::Map; friend class Eigen::Map; friend class Eigen::Map; #if EIGEN_MAX_ALIGN_BYTES>0 // for EIGEN_MAX_ALIGN_BYTES==0, AlignedMax==Unaligned, and many compilers generate warnings for friend-ing a class twice. friend class Eigen::Map; friend class Eigen::Map; #endif }; namespace internal { template struct conservative_resize_like_impl { #if EIGEN_HAS_TYPE_TRAITS static const bool IsRelocatable = std::is_trivially_copyable::value; #else static const bool IsRelocatable = !NumTraits::RequireInitialization; #endif static void run(DenseBase& _this, Index rows, Index cols) { if (_this.rows() == rows && _this.cols() == cols) return; EIGEN_STATIC_ASSERT_DYNAMIC_SIZE(Derived) if ( IsRelocatable && (( Derived::IsRowMajor && _this.cols() == cols) || // row-major and we change only the number of rows (!Derived::IsRowMajor && _this.rows() == rows) )) // column-major and we change only the number of columns { internal::check_rows_cols_for_overflow::run(rows, cols); _this.derived().m_storage.conservativeResize(rows*cols,rows,cols); } else { // The storage order does not allow us to use reallocation. Derived tmp(rows,cols); const Index common_rows = numext::mini(rows, _this.rows()); const Index common_cols = numext::mini(cols, _this.cols()); tmp.block(0,0,common_rows,common_cols) = _this.block(0,0,common_rows,common_cols); _this.derived().swap(tmp); } } static void run(DenseBase& _this, const DenseBase& other) { if (_this.rows() == other.rows() && _this.cols() == other.cols()) return; // Note: Here is space for improvement. Basically, for conservativeResize(Index,Index), // neither RowsAtCompileTime or ColsAtCompileTime must be Dynamic. If only one of the // dimensions is dynamic, one could use either conservativeResize(Index rows, NoChange_t) or // conservativeResize(NoChange_t, Index cols). For these methods new static asserts like // EIGEN_STATIC_ASSERT_DYNAMIC_ROWS and EIGEN_STATIC_ASSERT_DYNAMIC_COLS would be good. EIGEN_STATIC_ASSERT_DYNAMIC_SIZE(Derived) EIGEN_STATIC_ASSERT_DYNAMIC_SIZE(OtherDerived) if ( IsRelocatable && (( Derived::IsRowMajor && _this.cols() == other.cols()) || // row-major and we change only the number of rows (!Derived::IsRowMajor && _this.rows() == other.rows()) )) // column-major and we change only the number of columns { const Index new_rows = other.rows() - _this.rows(); const Index new_cols = other.cols() - _this.cols(); _this.derived().m_storage.conservativeResize(other.size(),other.rows(),other.cols()); if (new_rows>0) _this.bottomRightCorner(new_rows, other.cols()) = other.bottomRows(new_rows); else if (new_cols>0) _this.bottomRightCorner(other.rows(), new_cols) = other.rightCols(new_cols); } else { // The storage order does not allow us to use reallocation. Derived tmp(other); const Index common_rows = numext::mini(tmp.rows(), _this.rows()); const Index common_cols = numext::mini(tmp.cols(), _this.cols()); tmp.block(0,0,common_rows,common_cols) = _this.block(0,0,common_rows,common_cols); _this.derived().swap(tmp); } } }; // Here, the specialization for vectors inherits from the general matrix case // to allow calling .conservativeResize(rows,cols) on vectors. template struct conservative_resize_like_impl : conservative_resize_like_impl { typedef conservative_resize_like_impl Base; using Base::run; using Base::IsRelocatable; static void run(DenseBase& _this, Index size) { const Index new_rows = Derived::RowsAtCompileTime==1 ? 1 : size; const Index new_cols = Derived::RowsAtCompileTime==1 ? size : 1; if(IsRelocatable) _this.derived().m_storage.conservativeResize(size,new_rows,new_cols); else Base::run(_this.derived(), new_rows, new_cols); } static void run(DenseBase& _this, const DenseBase& other) { if (_this.rows() == other.rows() && _this.cols() == other.cols()) return; const Index num_new_elements = other.size() - _this.size(); const Index new_rows = Derived::RowsAtCompileTime==1 ? 1 : other.rows(); const Index new_cols = Derived::RowsAtCompileTime==1 ? other.cols() : 1; if(IsRelocatable) _this.derived().m_storage.conservativeResize(other.size(),new_rows,new_cols); else Base::run(_this.derived(), new_rows, new_cols); if (num_new_elements > 0) _this.tail(num_new_elements) = other.tail(num_new_elements); } }; template struct matrix_swap_impl { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE void run(MatrixTypeA& a, MatrixTypeB& b) { a.base().swap(b); } }; template struct matrix_swap_impl { EIGEN_DEVICE_FUNC static inline void run(MatrixTypeA& a, MatrixTypeB& b) { static_cast(a).m_storage.swap(static_cast(b).m_storage); } }; } // end namespace internal } // end namespace Eigen #endif // EIGEN_DENSESTORAGEBASE_H RcppEigen/inst/include/Eigen/src/Core/ArithmeticSequence.h0000644000176200001440000004541614562417221023205 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2017 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_ARITHMETIC_SEQUENCE_H #define EIGEN_ARITHMETIC_SEQUENCE_H namespace Eigen { namespace internal { #if (!EIGEN_HAS_CXX11) || !((!EIGEN_COMP_GNUC) || EIGEN_COMP_GNUC>=48) template struct aseq_negate {}; template<> struct aseq_negate { typedef Index type; }; template struct aseq_negate > { typedef FixedInt<-N> type; }; // Compilation error in the following case: template<> struct aseq_negate > {}; template::value, bool SizeIsSymbolic =symbolic::is_symbolic::value> struct aseq_reverse_first_type { typedef Index type; }; template struct aseq_reverse_first_type { typedef symbolic::AddExpr > >, symbolic::ValueExpr > > type; }; template struct aseq_reverse_first_type_aux { typedef Index type; }; template struct aseq_reverse_first_type_aux::type> { typedef FixedInt<(SizeType::value-1)*IncrType::value> type; }; template struct aseq_reverse_first_type { typedef typename aseq_reverse_first_type_aux::type Aux; typedef symbolic::AddExpr > type; }; template struct aseq_reverse_first_type { typedef symbolic::AddExpr > >, symbolic::ValueExpr >, symbolic::ValueExpr<> > type; }; #endif // Helper to cleanup the type of the increment: template struct cleanup_seq_incr { typedef typename cleanup_index_type::type type; }; } //-------------------------------------------------------------------------------- // seq(first,last,incr) and seqN(first,size,incr) //-------------------------------------------------------------------------------- template > class ArithmeticSequence; template ArithmeticSequence::type, typename internal::cleanup_index_type::type, typename internal::cleanup_seq_incr::type > seqN(FirstType first, SizeType size, IncrType incr); /** \class ArithmeticSequence * \ingroup Core_Module * * This class represents an arithmetic progression \f$ a_0, a_1, a_2, ..., a_{n-1}\f$ defined by * its \em first value \f$ a_0 \f$, its \em size (aka length) \em n, and the \em increment (aka stride) * that is equal to \f$ a_{i+1}-a_{i}\f$ for any \em i. * * It is internally used as the return type of the Eigen::seq and Eigen::seqN functions, and as the input arguments * of DenseBase::operator()(const RowIndices&, const ColIndices&), and most of the time this is the * only way it is used. * * \tparam FirstType type of the first element, usually an Index, * but internally it can be a symbolic expression * \tparam SizeType type representing the size of the sequence, usually an Index * or a compile time integral constant. Internally, it can also be a symbolic expression * \tparam IncrType type of the increment, can be a runtime Index, or a compile time integral constant (default is compile-time 1) * * \sa Eigen::seq, Eigen::seqN, DenseBase::operator()(const RowIndices&, const ColIndices&), class IndexedView */ template class ArithmeticSequence { public: ArithmeticSequence(FirstType first, SizeType size) : m_first(first), m_size(size) {} ArithmeticSequence(FirstType first, SizeType size, IncrType incr) : m_first(first), m_size(size), m_incr(incr) {} enum { SizeAtCompileTime = internal::get_fixed_value::value, IncrAtCompileTime = internal::get_fixed_value::value }; /** \returns the size, i.e., number of elements, of the sequence */ Index size() const { return m_size; } /** \returns the first element \f$ a_0 \f$ in the sequence */ Index first() const { return m_first; } /** \returns the value \f$ a_i \f$ at index \a i in the sequence. */ Index operator[](Index i) const { return m_first + i * m_incr; } const FirstType& firstObject() const { return m_first; } const SizeType& sizeObject() const { return m_size; } const IncrType& incrObject() const { return m_incr; } protected: FirstType m_first; SizeType m_size; IncrType m_incr; public: #if EIGEN_HAS_CXX11 && ((!EIGEN_COMP_GNUC) || EIGEN_COMP_GNUC>=48) auto reverse() const -> decltype(Eigen::seqN(m_first+(m_size+fix<-1>())*m_incr,m_size,-m_incr)) { return seqN(m_first+(m_size+fix<-1>())*m_incr,m_size,-m_incr); } #else protected: typedef typename internal::aseq_negate::type ReverseIncrType; typedef typename internal::aseq_reverse_first_type::type ReverseFirstType; public: ArithmeticSequence reverse() const { return seqN(m_first+(m_size+fix<-1>())*m_incr,m_size,-m_incr); } #endif }; /** \returns an ArithmeticSequence starting at \a first, of length \a size, and increment \a incr * * \sa seqN(FirstType,SizeType), seq(FirstType,LastType,IncrType) */ template ArithmeticSequence::type,typename internal::cleanup_index_type::type,typename internal::cleanup_seq_incr::type > seqN(FirstType first, SizeType size, IncrType incr) { return ArithmeticSequence::type,typename internal::cleanup_index_type::type,typename internal::cleanup_seq_incr::type>(first,size,incr); } /** \returns an ArithmeticSequence starting at \a first, of length \a size, and unit increment * * \sa seqN(FirstType,SizeType,IncrType), seq(FirstType,LastType) */ template ArithmeticSequence::type,typename internal::cleanup_index_type::type > seqN(FirstType first, SizeType size) { return ArithmeticSequence::type,typename internal::cleanup_index_type::type>(first,size); } #ifdef EIGEN_PARSED_BY_DOXYGEN /** \returns an ArithmeticSequence starting at \a f, up (or down) to \a l, and with positive (or negative) increment \a incr * * It is essentially an alias to: * \code * seqN(f, (l-f+incr)/incr, incr); * \endcode * * \sa seqN(FirstType,SizeType,IncrType), seq(FirstType,LastType) */ template auto seq(FirstType f, LastType l, IncrType incr); /** \returns an ArithmeticSequence starting at \a f, up (or down) to \a l, and unit increment * * It is essentially an alias to: * \code * seqN(f,l-f+1); * \endcode * * \sa seqN(FirstType,SizeType), seq(FirstType,LastType,IncrType) */ template auto seq(FirstType f, LastType l); #else // EIGEN_PARSED_BY_DOXYGEN #if EIGEN_HAS_CXX11 template auto seq(FirstType f, LastType l) -> decltype(seqN(typename internal::cleanup_index_type::type(f), ( typename internal::cleanup_index_type::type(l) - typename internal::cleanup_index_type::type(f)+fix<1>()))) { return seqN(typename internal::cleanup_index_type::type(f), (typename internal::cleanup_index_type::type(l) -typename internal::cleanup_index_type::type(f)+fix<1>())); } template auto seq(FirstType f, LastType l, IncrType incr) -> decltype(seqN(typename internal::cleanup_index_type::type(f), ( typename internal::cleanup_index_type::type(l) - typename internal::cleanup_index_type::type(f)+typename internal::cleanup_seq_incr::type(incr) ) / typename internal::cleanup_seq_incr::type(incr), typename internal::cleanup_seq_incr::type(incr))) { typedef typename internal::cleanup_seq_incr::type CleanedIncrType; return seqN(typename internal::cleanup_index_type::type(f), ( typename internal::cleanup_index_type::type(l) -typename internal::cleanup_index_type::type(f)+CleanedIncrType(incr)) / CleanedIncrType(incr), CleanedIncrType(incr)); } #else // EIGEN_HAS_CXX11 template typename internal::enable_if::value || symbolic::is_symbolic::value), ArithmeticSequence::type,Index> >::type seq(FirstType f, LastType l) { return seqN(typename internal::cleanup_index_type::type(f), Index((typename internal::cleanup_index_type::type(l)-typename internal::cleanup_index_type::type(f)+fix<1>()))); } template typename internal::enable_if::value, ArithmeticSequence,symbolic::ValueExpr<> >, symbolic::ValueExpr > > > >::type seq(const symbolic::BaseExpr &f, LastType l) { return seqN(f.derived(),(typename internal::cleanup_index_type::type(l)-f.derived()+fix<1>())); } template typename internal::enable_if::value, ArithmeticSequence::type, symbolic::AddExpr >, symbolic::ValueExpr > > > >::type seq(FirstType f, const symbolic::BaseExpr &l) { return seqN(typename internal::cleanup_index_type::type(f),(l.derived()-typename internal::cleanup_index_type::type(f)+fix<1>())); } template ArithmeticSequence >,symbolic::ValueExpr > > > seq(const symbolic::BaseExpr &f, const symbolic::BaseExpr &l) { return seqN(f.derived(),(l.derived()-f.derived()+fix<1>())); } template typename internal::enable_if::value || symbolic::is_symbolic::value), ArithmeticSequence::type,Index,typename internal::cleanup_seq_incr::type> >::type seq(FirstType f, LastType l, IncrType incr) { typedef typename internal::cleanup_seq_incr::type CleanedIncrType; return seqN(typename internal::cleanup_index_type::type(f), Index((typename internal::cleanup_index_type::type(l)-typename internal::cleanup_index_type::type(f)+CleanedIncrType(incr))/CleanedIncrType(incr)), incr); } template typename internal::enable_if::value, ArithmeticSequence, symbolic::ValueExpr<> >, symbolic::ValueExpr::type> >, symbolic::ValueExpr::type> >, typename internal::cleanup_seq_incr::type> >::type seq(const symbolic::BaseExpr &f, LastType l, IncrType incr) { typedef typename internal::cleanup_seq_incr::type CleanedIncrType; return seqN(f.derived(),(typename internal::cleanup_index_type::type(l)-f.derived()+CleanedIncrType(incr))/CleanedIncrType(incr), incr); } template typename internal::enable_if::value, ArithmeticSequence::type, symbolic::QuotientExpr >, symbolic::ValueExpr::type> >, symbolic::ValueExpr::type> >, typename internal::cleanup_seq_incr::type> >::type seq(FirstType f, const symbolic::BaseExpr &l, IncrType incr) { typedef typename internal::cleanup_seq_incr::type CleanedIncrType; return seqN(typename internal::cleanup_index_type::type(f), (l.derived()-typename internal::cleanup_index_type::type(f)+CleanedIncrType(incr))/CleanedIncrType(incr), incr); } template ArithmeticSequence >, symbolic::ValueExpr::type> >, symbolic::ValueExpr::type> >, typename internal::cleanup_seq_incr::type> seq(const symbolic::BaseExpr &f, const symbolic::BaseExpr &l, IncrType incr) { typedef typename internal::cleanup_seq_incr::type CleanedIncrType; return seqN(f.derived(),(l.derived()-f.derived()+CleanedIncrType(incr))/CleanedIncrType(incr), incr); } #endif // EIGEN_HAS_CXX11 #endif // EIGEN_PARSED_BY_DOXYGEN #if EIGEN_HAS_CXX11 || defined(EIGEN_PARSED_BY_DOXYGEN) /** \cpp11 * \returns a symbolic ArithmeticSequence representing the last \a size elements with increment \a incr. * * It is a shortcut for: \code seqN(last-(size-fix<1>)*incr, size, incr) \endcode * * \sa lastN(SizeType), seqN(FirstType,SizeType), seq(FirstType,LastType,IncrType) */ template auto lastN(SizeType size, IncrType incr) -> decltype(seqN(Eigen::last-(size-fix<1>())*incr, size, incr)) { return seqN(Eigen::last-(size-fix<1>())*incr, size, incr); } /** \cpp11 * \returns a symbolic ArithmeticSequence representing the last \a size elements with a unit increment. * * It is a shortcut for: \code seq(last+fix<1>-size, last) \endcode * * \sa lastN(SizeType,IncrType, seqN(FirstType,SizeType), seq(FirstType,LastType) */ template auto lastN(SizeType size) -> decltype(seqN(Eigen::last+fix<1>()-size, size)) { return seqN(Eigen::last+fix<1>()-size, size); } #endif namespace internal { // Convert a symbolic span into a usable one (i.e., remove last/end "keywords") template struct make_size_type { typedef typename internal::conditional::value, Index, T>::type type; }; template struct IndexedViewCompatibleType, XprSize> { typedef ArithmeticSequence::type,IncrType> type; }; template ArithmeticSequence::type,IncrType> makeIndexedViewCompatible(const ArithmeticSequence& ids, Index size,SpecializedType) { return ArithmeticSequence::type,IncrType>( eval_expr_given_size(ids.firstObject(),size),eval_expr_given_size(ids.sizeObject(),size),ids.incrObject()); } template struct get_compile_time_incr > { enum { value = get_fixed_value::value }; }; } // end namespace internal /** \namespace Eigen::indexing * \ingroup Core_Module * * The sole purpose of this namespace is to be able to import all functions * and symbols that are expected to be used within operator() for indexing * and slicing. If you already imported the whole Eigen namespace: * \code using namespace Eigen; \endcode * then you are already all set. Otherwise, if you don't want/cannot import * the whole Eigen namespace, the following line: * \code using namespace Eigen::indexing; \endcode * is equivalent to: * \code using Eigen::all; using Eigen::seq; using Eigen::seqN; using Eigen::lastN; // c++11 only using Eigen::last; using Eigen::lastp1; using Eigen::fix; \endcode */ namespace indexing { using Eigen::all; using Eigen::seq; using Eigen::seqN; #if EIGEN_HAS_CXX11 using Eigen::lastN; #endif using Eigen::last; using Eigen::lastp1; using Eigen::fix; } } // end namespace Eigen #endif // EIGEN_ARITHMETIC_SEQUENCE_H RcppEigen/inst/include/Eigen/src/Core/Assign_MKL.h0000755000176200001440000003031014562417221021340 0ustar liggesusers/* Copyright (c) 2011, Intel Corporation. All rights reserved. Copyright (C) 2015 Gael Guennebaud Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ******************************************************************************** * Content : Eigen bindings to Intel(R) MKL * MKL VML support for coefficient-wise unary Eigen expressions like a=b.sin() ******************************************************************************** */ #ifndef EIGEN_ASSIGN_VML_H #define EIGEN_ASSIGN_VML_H namespace Eigen { namespace internal { template class vml_assign_traits { private: enum { DstHasDirectAccess = Dst::Flags & DirectAccessBit, SrcHasDirectAccess = Src::Flags & DirectAccessBit, StorageOrdersAgree = (int(Dst::IsRowMajor) == int(Src::IsRowMajor)), InnerSize = int(Dst::IsVectorAtCompileTime) ? int(Dst::SizeAtCompileTime) : int(Dst::Flags)&RowMajorBit ? int(Dst::ColsAtCompileTime) : int(Dst::RowsAtCompileTime), InnerMaxSize = int(Dst::IsVectorAtCompileTime) ? int(Dst::MaxSizeAtCompileTime) : int(Dst::Flags)&RowMajorBit ? int(Dst::MaxColsAtCompileTime) : int(Dst::MaxRowsAtCompileTime), MaxSizeAtCompileTime = Dst::SizeAtCompileTime, MightEnableVml = StorageOrdersAgree && DstHasDirectAccess && SrcHasDirectAccess && Src::InnerStrideAtCompileTime==1 && Dst::InnerStrideAtCompileTime==1, MightLinearize = MightEnableVml && (int(Dst::Flags) & int(Src::Flags) & LinearAccessBit), VmlSize = MightLinearize ? MaxSizeAtCompileTime : InnerMaxSize, LargeEnough = VmlSize==Dynamic || VmlSize>=EIGEN_MKL_VML_THRESHOLD }; public: enum { EnableVml = MightEnableVml && LargeEnough, Traversal = MightLinearize ? LinearTraversal : DefaultTraversal }; }; #define EIGEN_PP_EXPAND(ARG) ARG #if !defined (EIGEN_FAST_MATH) || (EIGEN_FAST_MATH != 1) #define EIGEN_VMLMODE_EXPAND_xLA , VML_HA #else #define EIGEN_VMLMODE_EXPAND_xLA , VML_LA #endif #define EIGEN_VMLMODE_EXPAND_x_ #define EIGEN_VMLMODE_PREFIX_xLA vm #define EIGEN_VMLMODE_PREFIX_x_ v #define EIGEN_VMLMODE_PREFIX(VMLMODE) EIGEN_CAT(EIGEN_VMLMODE_PREFIX_x,VMLMODE) #define EIGEN_MKL_VML_DECLARE_UNARY_CALL(EIGENOP, VMLOP, EIGENTYPE, VMLTYPE, VMLMODE) \ template< typename DstXprType, typename SrcXprNested> \ struct Assignment, SrcXprNested>, assign_op, \ Dense2Dense, typename enable_if::EnableVml>::type> { \ typedef CwiseUnaryOp, SrcXprNested> SrcXprType; \ static void run(DstXprType &dst, const SrcXprType &src, const assign_op &func) { \ resize_if_allowed(dst, src, func); \ eigen_assert(dst.rows() == src.rows() && dst.cols() == src.cols()); \ if(vml_assign_traits::Traversal==LinearTraversal) { \ VMLOP(dst.size(), (const VMLTYPE*)src.nestedExpression().data(), \ (VMLTYPE*)dst.data() EIGEN_PP_EXPAND(EIGEN_VMLMODE_EXPAND_x##VMLMODE) ); \ } else { \ const Index outerSize = dst.outerSize(); \ for(Index outer = 0; outer < outerSize; ++outer) { \ const EIGENTYPE *src_ptr = src.IsRowMajor ? &(src.nestedExpression().coeffRef(outer,0)) : \ &(src.nestedExpression().coeffRef(0, outer)); \ EIGENTYPE *dst_ptr = dst.IsRowMajor ? &(dst.coeffRef(outer,0)) : &(dst.coeffRef(0, outer)); \ VMLOP( dst.innerSize(), (const VMLTYPE*)src_ptr, \ (VMLTYPE*)dst_ptr EIGEN_PP_EXPAND(EIGEN_VMLMODE_EXPAND_x##VMLMODE)); \ } \ } \ } \ }; \ #define EIGEN_MKL_VML_DECLARE_UNARY_CALLS_REAL(EIGENOP, VMLOP, VMLMODE) \ EIGEN_MKL_VML_DECLARE_UNARY_CALL(EIGENOP, EIGEN_CAT(EIGEN_VMLMODE_PREFIX(VMLMODE),s##VMLOP), float, float, VMLMODE) \ EIGEN_MKL_VML_DECLARE_UNARY_CALL(EIGENOP, EIGEN_CAT(EIGEN_VMLMODE_PREFIX(VMLMODE),d##VMLOP), double, double, VMLMODE) #define EIGEN_MKL_VML_DECLARE_UNARY_CALLS_CPLX(EIGENOP, VMLOP, VMLMODE) \ EIGEN_MKL_VML_DECLARE_UNARY_CALL(EIGENOP, EIGEN_CAT(EIGEN_VMLMODE_PREFIX(VMLMODE),c##VMLOP), scomplex, MKL_Complex8, VMLMODE) \ EIGEN_MKL_VML_DECLARE_UNARY_CALL(EIGENOP, EIGEN_CAT(EIGEN_VMLMODE_PREFIX(VMLMODE),z##VMLOP), dcomplex, MKL_Complex16, VMLMODE) #define EIGEN_MKL_VML_DECLARE_UNARY_CALLS(EIGENOP, VMLOP, VMLMODE) \ EIGEN_MKL_VML_DECLARE_UNARY_CALLS_REAL(EIGENOP, VMLOP, VMLMODE) \ EIGEN_MKL_VML_DECLARE_UNARY_CALLS_CPLX(EIGENOP, VMLOP, VMLMODE) EIGEN_MKL_VML_DECLARE_UNARY_CALLS(sin, Sin, LA) EIGEN_MKL_VML_DECLARE_UNARY_CALLS(asin, Asin, LA) EIGEN_MKL_VML_DECLARE_UNARY_CALLS(sinh, Sinh, LA) EIGEN_MKL_VML_DECLARE_UNARY_CALLS(cos, Cos, LA) EIGEN_MKL_VML_DECLARE_UNARY_CALLS(acos, Acos, LA) EIGEN_MKL_VML_DECLARE_UNARY_CALLS(cosh, Cosh, LA) EIGEN_MKL_VML_DECLARE_UNARY_CALLS(tan, Tan, LA) EIGEN_MKL_VML_DECLARE_UNARY_CALLS(atan, Atan, LA) EIGEN_MKL_VML_DECLARE_UNARY_CALLS(tanh, Tanh, LA) // EIGEN_MKL_VML_DECLARE_UNARY_CALLS(abs, Abs, _) EIGEN_MKL_VML_DECLARE_UNARY_CALLS(exp, Exp, LA) EIGEN_MKL_VML_DECLARE_UNARY_CALLS(log, Ln, LA) EIGEN_MKL_VML_DECLARE_UNARY_CALLS(log10, Log10, LA) EIGEN_MKL_VML_DECLARE_UNARY_CALLS(sqrt, Sqrt, _) EIGEN_MKL_VML_DECLARE_UNARY_CALLS_REAL(square, Sqr, _) EIGEN_MKL_VML_DECLARE_UNARY_CALLS_CPLX(arg, Arg, _) EIGEN_MKL_VML_DECLARE_UNARY_CALLS_REAL(round, Round, _) EIGEN_MKL_VML_DECLARE_UNARY_CALLS_REAL(floor, Floor, _) EIGEN_MKL_VML_DECLARE_UNARY_CALLS_REAL(ceil, Ceil, _) #define EIGEN_MKL_VML_DECLARE_POW_CALL(EIGENOP, VMLOP, EIGENTYPE, VMLTYPE, VMLMODE) \ template< typename DstXprType, typename SrcXprNested, typename Plain> \ struct Assignment, SrcXprNested, \ const CwiseNullaryOp,Plain> >, assign_op, \ Dense2Dense, typename enable_if::EnableVml>::type> { \ typedef CwiseBinaryOp, SrcXprNested, \ const CwiseNullaryOp,Plain> > SrcXprType; \ static void run(DstXprType &dst, const SrcXprType &src, const assign_op &func) { \ resize_if_allowed(dst, src, func); \ eigen_assert(dst.rows() == src.rows() && dst.cols() == src.cols()); \ VMLTYPE exponent = reinterpret_cast(src.rhs().functor().m_other); \ if(vml_assign_traits::Traversal==LinearTraversal) \ { \ VMLOP( dst.size(), (const VMLTYPE*)src.lhs().data(), exponent, \ (VMLTYPE*)dst.data() EIGEN_PP_EXPAND(EIGEN_VMLMODE_EXPAND_x##VMLMODE) ); \ } else { \ const Index outerSize = dst.outerSize(); \ for(Index outer = 0; outer < outerSize; ++outer) { \ const EIGENTYPE *src_ptr = src.IsRowMajor ? &(src.lhs().coeffRef(outer,0)) : \ &(src.lhs().coeffRef(0, outer)); \ EIGENTYPE *dst_ptr = dst.IsRowMajor ? &(dst.coeffRef(outer,0)) : &(dst.coeffRef(0, outer)); \ VMLOP( dst.innerSize(), (const VMLTYPE*)src_ptr, exponent, \ (VMLTYPE*)dst_ptr EIGEN_PP_EXPAND(EIGEN_VMLMODE_EXPAND_x##VMLMODE)); \ } \ } \ } \ }; EIGEN_MKL_VML_DECLARE_POW_CALL(pow, vmsPowx, float, float, LA) EIGEN_MKL_VML_DECLARE_POW_CALL(pow, vmdPowx, double, double, LA) EIGEN_MKL_VML_DECLARE_POW_CALL(pow, vmcPowx, scomplex, MKL_Complex8, LA) EIGEN_MKL_VML_DECLARE_POW_CALL(pow, vmzPowx, dcomplex, MKL_Complex16, LA) } // end namespace internal } // end namespace Eigen #endif // EIGEN_ASSIGN_VML_H RcppEigen/inst/include/Eigen/src/Core/CoreEvaluators.h0000644000176200001440000017454114562417221022363 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2011 Benoit Jacob // Copyright (C) 2011-2014 Gael Guennebaud // Copyright (C) 2011-2012 Jitse Niesen // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_COREEVALUATORS_H #define EIGEN_COREEVALUATORS_H namespace Eigen { namespace internal { // This class returns the evaluator kind from the expression storage kind. // Default assumes index based accessors template struct storage_kind_to_evaluator_kind { typedef IndexBased Kind; }; // This class returns the evaluator shape from the expression storage kind. // It can be Dense, Sparse, Triangular, Diagonal, SelfAdjoint, Band, etc. template struct storage_kind_to_shape; template<> struct storage_kind_to_shape { typedef DenseShape Shape; }; template<> struct storage_kind_to_shape { typedef SolverShape Shape; }; template<> struct storage_kind_to_shape { typedef PermutationShape Shape; }; template<> struct storage_kind_to_shape { typedef TranspositionsShape Shape; }; // Evaluators have to be specialized with respect to various criteria such as: // - storage/structure/shape // - scalar type // - etc. // Therefore, we need specialization of evaluator providing additional template arguments for each kind of evaluators. // We currently distinguish the following kind of evaluators: // - unary_evaluator for expressions taking only one arguments (CwiseUnaryOp, CwiseUnaryView, Transpose, MatrixWrapper, ArrayWrapper, Reverse, Replicate) // - binary_evaluator for expression taking two arguments (CwiseBinaryOp) // - ternary_evaluator for expression taking three arguments (CwiseTernaryOp) // - product_evaluator for linear algebra products (Product); special case of binary_evaluator because it requires additional tags for dispatching. // - mapbase_evaluator for Map, Block, Ref // - block_evaluator for Block (special dispatching to a mapbase_evaluator or unary_evaluator) template< typename T, typename Arg1Kind = typename evaluator_traits::Kind, typename Arg2Kind = typename evaluator_traits::Kind, typename Arg3Kind = typename evaluator_traits::Kind, typename Arg1Scalar = typename traits::Scalar, typename Arg2Scalar = typename traits::Scalar, typename Arg3Scalar = typename traits::Scalar> struct ternary_evaluator; template< typename T, typename LhsKind = typename evaluator_traits::Kind, typename RhsKind = typename evaluator_traits::Kind, typename LhsScalar = typename traits::Scalar, typename RhsScalar = typename traits::Scalar> struct binary_evaluator; template< typename T, typename Kind = typename evaluator_traits::Kind, typename Scalar = typename T::Scalar> struct unary_evaluator; // evaluator_traits contains traits for evaluator template struct evaluator_traits_base { // by default, get evaluator kind and shape from storage typedef typename storage_kind_to_evaluator_kind::StorageKind>::Kind Kind; typedef typename storage_kind_to_shape::StorageKind>::Shape Shape; }; // Default evaluator traits template struct evaluator_traits : public evaluator_traits_base { }; template::Shape > struct evaluator_assume_aliasing { static const bool value = false; }; // By default, we assume a unary expression: template struct evaluator : public unary_evaluator { typedef unary_evaluator Base; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit evaluator(const T& xpr) : Base(xpr) {} }; // TODO: Think about const-correctness template struct evaluator : evaluator { EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit evaluator(const T& xpr) : evaluator(xpr) {} }; // ---------- base class for all evaluators ---------- template struct evaluator_base { // TODO that's not very nice to have to propagate all these traits. They are currently only needed to handle outer,inner indices. typedef traits ExpressionTraits; enum { Alignment = 0 }; // noncopyable: // Don't make this class inherit noncopyable as this kills EBO (Empty Base Optimization) // and make complex evaluator much larger than then should do. EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE evaluator_base() {} EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ~evaluator_base() {} private: EIGEN_DEVICE_FUNC evaluator_base(const evaluator_base&); EIGEN_DEVICE_FUNC const evaluator_base& operator=(const evaluator_base&); }; // -------------------- Matrix and Array -------------------- // // evaluator is a common base class for the // Matrix and Array evaluators. // Here we directly specialize evaluator. This is not really a unary expression, and it is, by definition, dense, // so no need for more sophisticated dispatching. // this helper permits to completely eliminate m_outerStride if it is known at compiletime. template class plainobjectbase_evaluator_data { public: EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE plainobjectbase_evaluator_data(const Scalar* ptr, Index outerStride) : data(ptr) { #ifndef EIGEN_INTERNAL_DEBUGGING EIGEN_UNUSED_VARIABLE(outerStride); #endif eigen_internal_assert(outerStride==OuterStride); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR Index outerStride() const EIGEN_NOEXCEPT { return OuterStride; } const Scalar *data; }; template class plainobjectbase_evaluator_data { public: EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE plainobjectbase_evaluator_data(const Scalar* ptr, Index outerStride) : data(ptr), m_outerStride(outerStride) {} EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index outerStride() const { return m_outerStride; } const Scalar *data; protected: Index m_outerStride; }; template struct evaluator > : evaluator_base { typedef PlainObjectBase PlainObjectType; typedef typename PlainObjectType::Scalar Scalar; typedef typename PlainObjectType::CoeffReturnType CoeffReturnType; enum { IsRowMajor = PlainObjectType::IsRowMajor, IsVectorAtCompileTime = PlainObjectType::IsVectorAtCompileTime, RowsAtCompileTime = PlainObjectType::RowsAtCompileTime, ColsAtCompileTime = PlainObjectType::ColsAtCompileTime, CoeffReadCost = NumTraits::ReadCost, Flags = traits::EvaluatorFlags, Alignment = traits::Alignment }; enum { // We do not need to know the outer stride for vectors OuterStrideAtCompileTime = IsVectorAtCompileTime ? 0 : int(IsRowMajor) ? ColsAtCompileTime : RowsAtCompileTime }; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE evaluator() : m_d(0,OuterStrideAtCompileTime) { EIGEN_INTERNAL_CHECK_COST_VALUE(CoeffReadCost); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit evaluator(const PlainObjectType& m) : m_d(m.data(),IsVectorAtCompileTime ? 0 : m.outerStride()) { EIGEN_INTERNAL_CHECK_COST_VALUE(CoeffReadCost); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index row, Index col) const { if (IsRowMajor) return m_d.data[row * m_d.outerStride() + col]; else return m_d.data[row + col * m_d.outerStride()]; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { return m_d.data[index]; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index row, Index col) { if (IsRowMajor) return const_cast(m_d.data)[row * m_d.outerStride() + col]; else return const_cast(m_d.data)[row + col * m_d.outerStride()]; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index index) { return const_cast(m_d.data)[index]; } template EIGEN_STRONG_INLINE PacketType packet(Index row, Index col) const { if (IsRowMajor) return ploadt(m_d.data + row * m_d.outerStride() + col); else return ploadt(m_d.data + row + col * m_d.outerStride()); } template EIGEN_STRONG_INLINE PacketType packet(Index index) const { return ploadt(m_d.data + index); } template EIGEN_STRONG_INLINE void writePacket(Index row, Index col, const PacketType& x) { if (IsRowMajor) return pstoret (const_cast(m_d.data) + row * m_d.outerStride() + col, x); else return pstoret (const_cast(m_d.data) + row + col * m_d.outerStride(), x); } template EIGEN_STRONG_INLINE void writePacket(Index index, const PacketType& x) { return pstoret(const_cast(m_d.data) + index, x); } protected: plainobjectbase_evaluator_data m_d; }; template struct evaluator > : evaluator > > { typedef Matrix XprType; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE evaluator() {} EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit evaluator(const XprType& m) : evaluator >(m) { } }; template struct evaluator > : evaluator > > { typedef Array XprType; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE evaluator() {} EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit evaluator(const XprType& m) : evaluator >(m) { } }; // -------------------- Transpose -------------------- template struct unary_evaluator, IndexBased> : evaluator_base > { typedef Transpose XprType; enum { CoeffReadCost = evaluator::CoeffReadCost, Flags = evaluator::Flags ^ RowMajorBit, Alignment = evaluator::Alignment }; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit unary_evaluator(const XprType& t) : m_argImpl(t.nestedExpression()) {} typedef typename XprType::Scalar Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index row, Index col) const { return m_argImpl.coeff(col, row); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { return m_argImpl.coeff(index); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index row, Index col) { return m_argImpl.coeffRef(col, row); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename XprType::Scalar& coeffRef(Index index) { return m_argImpl.coeffRef(index); } template EIGEN_STRONG_INLINE PacketType packet(Index row, Index col) const { return m_argImpl.template packet(col, row); } template EIGEN_STRONG_INLINE PacketType packet(Index index) const { return m_argImpl.template packet(index); } template EIGEN_STRONG_INLINE void writePacket(Index row, Index col, const PacketType& x) { m_argImpl.template writePacket(col, row, x); } template EIGEN_STRONG_INLINE void writePacket(Index index, const PacketType& x) { m_argImpl.template writePacket(index, x); } protected: evaluator m_argImpl; }; // -------------------- CwiseNullaryOp -------------------- // Like Matrix and Array, this is not really a unary expression, so we directly specialize evaluator. // Likewise, there is not need to more sophisticated dispatching here. template::value, bool has_unary = has_unary_operator::value, bool has_binary = has_binary_operator::value> struct nullary_wrapper { template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar operator()(const NullaryOp& op, IndexType i, IndexType j) const { return op(i,j); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar operator()(const NullaryOp& op, IndexType i) const { return op(i); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T packetOp(const NullaryOp& op, IndexType i, IndexType j) const { return op.template packetOp(i,j); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T packetOp(const NullaryOp& op, IndexType i) const { return op.template packetOp(i); } }; template struct nullary_wrapper { template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar operator()(const NullaryOp& op, IndexType=0, IndexType=0) const { return op(); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T packetOp(const NullaryOp& op, IndexType=0, IndexType=0) const { return op.template packetOp(); } }; template struct nullary_wrapper { template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar operator()(const NullaryOp& op, IndexType i, IndexType j=0) const { return op(i,j); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T packetOp(const NullaryOp& op, IndexType i, IndexType j=0) const { return op.template packetOp(i,j); } }; // We need the following specialization for vector-only functors assigned to a runtime vector, // for instance, using linspace and assigning a RowVectorXd to a MatrixXd or even a row of a MatrixXd. // In this case, i==0 and j is used for the actual iteration. template struct nullary_wrapper { template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar operator()(const NullaryOp& op, IndexType i, IndexType j) const { eigen_assert(i==0 || j==0); return op(i+j); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T packetOp(const NullaryOp& op, IndexType i, IndexType j) const { eigen_assert(i==0 || j==0); return op.template packetOp(i+j); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar operator()(const NullaryOp& op, IndexType i) const { return op(i); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T packetOp(const NullaryOp& op, IndexType i) const { return op.template packetOp(i); } }; template struct nullary_wrapper {}; #if 0 && EIGEN_COMP_MSVC>0 // Disable this ugly workaround. This is now handled in traits::match, // but this piece of code might still become handly if some other weird compilation // erros pop up again. // MSVC exhibits a weird compilation error when // compiling: // Eigen::MatrixXf A = MatrixXf::Random(3,3); // Ref R = 2.f*A; // and that has_*ary_operator> have not been instantiated yet. // The "problem" is that evaluator<2.f*A> is instantiated by traits::match<2.f*A> // and at that time has_*ary_operator returns true regardless of T. // Then nullary_wrapper is badly instantiated as nullary_wrapper<.,.,true,true,true>. // The trick is thus to defer the proper instantiation of nullary_wrapper when coeff(), // and packet() are really instantiated as implemented below: // This is a simple wrapper around Index to enforce the re-instantiation of // has_*ary_operator when needed. template struct nullary_wrapper_workaround_msvc { nullary_wrapper_workaround_msvc(const T&); operator T()const; }; template struct nullary_wrapper { template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar operator()(const NullaryOp& op, IndexType i, IndexType j) const { return nullary_wrapper >::value, has_unary_operator >::value, has_binary_operator >::value>().operator()(op,i,j); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar operator()(const NullaryOp& op, IndexType i) const { return nullary_wrapper >::value, has_unary_operator >::value, has_binary_operator >::value>().operator()(op,i); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T packetOp(const NullaryOp& op, IndexType i, IndexType j) const { return nullary_wrapper >::value, has_unary_operator >::value, has_binary_operator >::value>().template packetOp(op,i,j); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T packetOp(const NullaryOp& op, IndexType i) const { return nullary_wrapper >::value, has_unary_operator >::value, has_binary_operator >::value>().template packetOp(op,i); } }; #endif // MSVC workaround template struct evaluator > : evaluator_base > { typedef CwiseNullaryOp XprType; typedef typename internal::remove_all::type PlainObjectTypeCleaned; enum { CoeffReadCost = internal::functor_traits::Cost, Flags = (evaluator::Flags & ( HereditaryBits | (functor_has_linear_access::ret ? LinearAccessBit : 0) | (functor_traits::PacketAccess ? PacketAccessBit : 0))) | (functor_traits::IsRepeatable ? 0 : EvalBeforeNestingBit), Alignment = AlignedMax }; EIGEN_DEVICE_FUNC explicit evaluator(const XprType& n) : m_functor(n.functor()), m_wrapper() { EIGEN_INTERNAL_CHECK_COST_VALUE(CoeffReadCost); } typedef typename XprType::CoeffReturnType CoeffReturnType; template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(IndexType row, IndexType col) const { return m_wrapper(m_functor, row, col); } template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(IndexType index) const { return m_wrapper(m_functor,index); } template EIGEN_STRONG_INLINE PacketType packet(IndexType row, IndexType col) const { return m_wrapper.template packetOp(m_functor, row, col); } template EIGEN_STRONG_INLINE PacketType packet(IndexType index) const { return m_wrapper.template packetOp(m_functor, index); } protected: const NullaryOp m_functor; const internal::nullary_wrapper m_wrapper; }; // -------------------- CwiseUnaryOp -------------------- template struct unary_evaluator, IndexBased > : evaluator_base > { typedef CwiseUnaryOp XprType; enum { CoeffReadCost = int(evaluator::CoeffReadCost) + int(functor_traits::Cost), Flags = evaluator::Flags & (HereditaryBits | LinearAccessBit | (functor_traits::PacketAccess ? PacketAccessBit : 0)), Alignment = evaluator::Alignment }; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit unary_evaluator(const XprType& op) : m_d(op) { EIGEN_INTERNAL_CHECK_COST_VALUE(functor_traits::Cost); EIGEN_INTERNAL_CHECK_COST_VALUE(CoeffReadCost); } typedef typename XprType::CoeffReturnType CoeffReturnType; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index row, Index col) const { return m_d.func()(m_d.argImpl.coeff(row, col)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { return m_d.func()(m_d.argImpl.coeff(index)); } template EIGEN_STRONG_INLINE PacketType packet(Index row, Index col) const { return m_d.func().packetOp(m_d.argImpl.template packet(row, col)); } template EIGEN_STRONG_INLINE PacketType packet(Index index) const { return m_d.func().packetOp(m_d.argImpl.template packet(index)); } protected: // this helper permits to completely eliminate the functor if it is empty struct Data { EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Data(const XprType& xpr) : op(xpr.functor()), argImpl(xpr.nestedExpression()) {} EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const UnaryOp& func() const { return op; } UnaryOp op; evaluator argImpl; }; Data m_d; }; // -------------------- CwiseTernaryOp -------------------- // this is a ternary expression template struct evaluator > : public ternary_evaluator > { typedef CwiseTernaryOp XprType; typedef ternary_evaluator > Base; EIGEN_DEVICE_FUNC explicit evaluator(const XprType& xpr) : Base(xpr) {} }; template struct ternary_evaluator, IndexBased, IndexBased> : evaluator_base > { typedef CwiseTernaryOp XprType; enum { CoeffReadCost = int(evaluator::CoeffReadCost) + int(evaluator::CoeffReadCost) + int(evaluator::CoeffReadCost) + int(functor_traits::Cost), Arg1Flags = evaluator::Flags, Arg2Flags = evaluator::Flags, Arg3Flags = evaluator::Flags, SameType = is_same::value && is_same::value, StorageOrdersAgree = (int(Arg1Flags)&RowMajorBit)==(int(Arg2Flags)&RowMajorBit) && (int(Arg1Flags)&RowMajorBit)==(int(Arg3Flags)&RowMajorBit), Flags0 = (int(Arg1Flags) | int(Arg2Flags) | int(Arg3Flags)) & ( HereditaryBits | (int(Arg1Flags) & int(Arg2Flags) & int(Arg3Flags) & ( (StorageOrdersAgree ? LinearAccessBit : 0) | (functor_traits::PacketAccess && StorageOrdersAgree && SameType ? PacketAccessBit : 0) ) ) ), Flags = (Flags0 & ~RowMajorBit) | (Arg1Flags & RowMajorBit), Alignment = EIGEN_PLAIN_ENUM_MIN( EIGEN_PLAIN_ENUM_MIN(evaluator::Alignment, evaluator::Alignment), evaluator::Alignment) }; EIGEN_DEVICE_FUNC explicit ternary_evaluator(const XprType& xpr) : m_d(xpr) { EIGEN_INTERNAL_CHECK_COST_VALUE(functor_traits::Cost); EIGEN_INTERNAL_CHECK_COST_VALUE(CoeffReadCost); } typedef typename XprType::CoeffReturnType CoeffReturnType; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index row, Index col) const { return m_d.func()(m_d.arg1Impl.coeff(row, col), m_d.arg2Impl.coeff(row, col), m_d.arg3Impl.coeff(row, col)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { return m_d.func()(m_d.arg1Impl.coeff(index), m_d.arg2Impl.coeff(index), m_d.arg3Impl.coeff(index)); } template EIGEN_STRONG_INLINE PacketType packet(Index row, Index col) const { return m_d.func().packetOp(m_d.arg1Impl.template packet(row, col), m_d.arg2Impl.template packet(row, col), m_d.arg3Impl.template packet(row, col)); } template EIGEN_STRONG_INLINE PacketType packet(Index index) const { return m_d.func().packetOp(m_d.arg1Impl.template packet(index), m_d.arg2Impl.template packet(index), m_d.arg3Impl.template packet(index)); } protected: // this helper permits to completely eliminate the functor if it is empty struct Data { EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Data(const XprType& xpr) : op(xpr.functor()), arg1Impl(xpr.arg1()), arg2Impl(xpr.arg2()), arg3Impl(xpr.arg3()) {} EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const TernaryOp& func() const { return op; } TernaryOp op; evaluator arg1Impl; evaluator arg2Impl; evaluator arg3Impl; }; Data m_d; }; // -------------------- CwiseBinaryOp -------------------- // this is a binary expression template struct evaluator > : public binary_evaluator > { typedef CwiseBinaryOp XprType; typedef binary_evaluator > Base; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit evaluator(const XprType& xpr) : Base(xpr) {} }; template struct binary_evaluator, IndexBased, IndexBased> : evaluator_base > { typedef CwiseBinaryOp XprType; enum { CoeffReadCost = int(evaluator::CoeffReadCost) + int(evaluator::CoeffReadCost) + int(functor_traits::Cost), LhsFlags = evaluator::Flags, RhsFlags = evaluator::Flags, SameType = is_same::value, StorageOrdersAgree = (int(LhsFlags)&RowMajorBit)==(int(RhsFlags)&RowMajorBit), Flags0 = (int(LhsFlags) | int(RhsFlags)) & ( HereditaryBits | (int(LhsFlags) & int(RhsFlags) & ( (StorageOrdersAgree ? LinearAccessBit : 0) | (functor_traits::PacketAccess && StorageOrdersAgree && SameType ? PacketAccessBit : 0) ) ) ), Flags = (Flags0 & ~RowMajorBit) | (LhsFlags & RowMajorBit), Alignment = EIGEN_PLAIN_ENUM_MIN(evaluator::Alignment,evaluator::Alignment) }; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit binary_evaluator(const XprType& xpr) : m_d(xpr) { EIGEN_INTERNAL_CHECK_COST_VALUE(functor_traits::Cost); EIGEN_INTERNAL_CHECK_COST_VALUE(CoeffReadCost); } typedef typename XprType::CoeffReturnType CoeffReturnType; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index row, Index col) const { return m_d.func()(m_d.lhsImpl.coeff(row, col), m_d.rhsImpl.coeff(row, col)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { return m_d.func()(m_d.lhsImpl.coeff(index), m_d.rhsImpl.coeff(index)); } template EIGEN_STRONG_INLINE PacketType packet(Index row, Index col) const { return m_d.func().packetOp(m_d.lhsImpl.template packet(row, col), m_d.rhsImpl.template packet(row, col)); } template EIGEN_STRONG_INLINE PacketType packet(Index index) const { return m_d.func().packetOp(m_d.lhsImpl.template packet(index), m_d.rhsImpl.template packet(index)); } protected: // this helper permits to completely eliminate the functor if it is empty struct Data { EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Data(const XprType& xpr) : op(xpr.functor()), lhsImpl(xpr.lhs()), rhsImpl(xpr.rhs()) {} EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const BinaryOp& func() const { return op; } BinaryOp op; evaluator lhsImpl; evaluator rhsImpl; }; Data m_d; }; // -------------------- CwiseUnaryView -------------------- template struct unary_evaluator, IndexBased> : evaluator_base > { typedef CwiseUnaryView XprType; enum { CoeffReadCost = int(evaluator::CoeffReadCost) + int(functor_traits::Cost), Flags = (evaluator::Flags & (HereditaryBits | LinearAccessBit | DirectAccessBit)), Alignment = 0 // FIXME it is not very clear why alignment is necessarily lost... }; EIGEN_DEVICE_FUNC explicit unary_evaluator(const XprType& op) : m_d(op) { EIGEN_INTERNAL_CHECK_COST_VALUE(functor_traits::Cost); EIGEN_INTERNAL_CHECK_COST_VALUE(CoeffReadCost); } typedef typename XprType::Scalar Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index row, Index col) const { return m_d.func()(m_d.argImpl.coeff(row, col)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { return m_d.func()(m_d.argImpl.coeff(index)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index row, Index col) { return m_d.func()(m_d.argImpl.coeffRef(row, col)); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index index) { return m_d.func()(m_d.argImpl.coeffRef(index)); } protected: // this helper permits to completely eliminate the functor if it is empty struct Data { EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Data(const XprType& xpr) : op(xpr.functor()), argImpl(xpr.nestedExpression()) {} EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const UnaryOp& func() const { return op; } UnaryOp op; evaluator argImpl; }; Data m_d; }; // -------------------- Map -------------------- // FIXME perhaps the PlainObjectType could be provided by Derived::PlainObject ? // but that might complicate template specialization template struct mapbase_evaluator; template struct mapbase_evaluator : evaluator_base { typedef Derived XprType; typedef typename XprType::PointerType PointerType; typedef typename XprType::Scalar Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; enum { IsRowMajor = XprType::RowsAtCompileTime, ColsAtCompileTime = XprType::ColsAtCompileTime, CoeffReadCost = NumTraits::ReadCost }; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit mapbase_evaluator(const XprType& map) : m_data(const_cast(map.data())), m_innerStride(map.innerStride()), m_outerStride(map.outerStride()) { EIGEN_STATIC_ASSERT(EIGEN_IMPLIES(evaluator::Flags&PacketAccessBit, internal::inner_stride_at_compile_time::ret==1), PACKET_ACCESS_REQUIRES_TO_HAVE_INNER_STRIDE_FIXED_TO_1); EIGEN_INTERNAL_CHECK_COST_VALUE(CoeffReadCost); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index row, Index col) const { return m_data[col * colStride() + row * rowStride()]; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { return m_data[index * m_innerStride.value()]; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index row, Index col) { return m_data[col * colStride() + row * rowStride()]; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index index) { return m_data[index * m_innerStride.value()]; } template EIGEN_STRONG_INLINE PacketType packet(Index row, Index col) const { PointerType ptr = m_data + row * rowStride() + col * colStride(); return internal::ploadt(ptr); } template EIGEN_STRONG_INLINE PacketType packet(Index index) const { return internal::ploadt(m_data + index * m_innerStride.value()); } template EIGEN_STRONG_INLINE void writePacket(Index row, Index col, const PacketType& x) { PointerType ptr = m_data + row * rowStride() + col * colStride(); return internal::pstoret(ptr, x); } template EIGEN_STRONG_INLINE void writePacket(Index index, const PacketType& x) { internal::pstoret(m_data + index * m_innerStride.value(), x); } protected: EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR Index rowStride() const EIGEN_NOEXCEPT { return XprType::IsRowMajor ? m_outerStride.value() : m_innerStride.value(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR Index colStride() const EIGEN_NOEXCEPT { return XprType::IsRowMajor ? m_innerStride.value() : m_outerStride.value(); } PointerType m_data; const internal::variable_if_dynamic m_innerStride; const internal::variable_if_dynamic m_outerStride; }; template struct evaluator > : public mapbase_evaluator, PlainObjectType> { typedef Map XprType; typedef typename XprType::Scalar Scalar; // TODO: should check for smaller packet types once we can handle multi-sized packet types typedef typename packet_traits::type PacketScalar; enum { InnerStrideAtCompileTime = StrideType::InnerStrideAtCompileTime == 0 ? int(PlainObjectType::InnerStrideAtCompileTime) : int(StrideType::InnerStrideAtCompileTime), OuterStrideAtCompileTime = StrideType::OuterStrideAtCompileTime == 0 ? int(PlainObjectType::OuterStrideAtCompileTime) : int(StrideType::OuterStrideAtCompileTime), HasNoInnerStride = InnerStrideAtCompileTime == 1, HasNoOuterStride = StrideType::OuterStrideAtCompileTime == 0, HasNoStride = HasNoInnerStride && HasNoOuterStride, IsDynamicSize = PlainObjectType::SizeAtCompileTime==Dynamic, PacketAccessMask = bool(HasNoInnerStride) ? ~int(0) : ~int(PacketAccessBit), LinearAccessMask = bool(HasNoStride) || bool(PlainObjectType::IsVectorAtCompileTime) ? ~int(0) : ~int(LinearAccessBit), Flags = int( evaluator::Flags) & (LinearAccessMask&PacketAccessMask), Alignment = int(MapOptions)&int(AlignedMask) }; EIGEN_DEVICE_FUNC explicit evaluator(const XprType& map) : mapbase_evaluator(map) { } }; // -------------------- Ref -------------------- template struct evaluator > : public mapbase_evaluator, PlainObjectType> { typedef Ref XprType; enum { Flags = evaluator >::Flags, Alignment = evaluator >::Alignment }; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit evaluator(const XprType& ref) : mapbase_evaluator(ref) { } }; // -------------------- Block -------------------- template::ret> struct block_evaluator; template struct evaluator > : block_evaluator { typedef Block XprType; typedef typename XprType::Scalar Scalar; // TODO: should check for smaller packet types once we can handle multi-sized packet types typedef typename packet_traits::type PacketScalar; enum { CoeffReadCost = evaluator::CoeffReadCost, RowsAtCompileTime = traits::RowsAtCompileTime, ColsAtCompileTime = traits::ColsAtCompileTime, MaxRowsAtCompileTime = traits::MaxRowsAtCompileTime, MaxColsAtCompileTime = traits::MaxColsAtCompileTime, ArgTypeIsRowMajor = (int(evaluator::Flags)&RowMajorBit) != 0, IsRowMajor = (MaxRowsAtCompileTime==1 && MaxColsAtCompileTime!=1) ? 1 : (MaxColsAtCompileTime==1 && MaxRowsAtCompileTime!=1) ? 0 : ArgTypeIsRowMajor, HasSameStorageOrderAsArgType = (IsRowMajor == ArgTypeIsRowMajor), InnerSize = IsRowMajor ? int(ColsAtCompileTime) : int(RowsAtCompileTime), InnerStrideAtCompileTime = HasSameStorageOrderAsArgType ? int(inner_stride_at_compile_time::ret) : int(outer_stride_at_compile_time::ret), OuterStrideAtCompileTime = HasSameStorageOrderAsArgType ? int(outer_stride_at_compile_time::ret) : int(inner_stride_at_compile_time::ret), MaskPacketAccessBit = (InnerStrideAtCompileTime == 1 || HasSameStorageOrderAsArgType) ? PacketAccessBit : 0, FlagsLinearAccessBit = (RowsAtCompileTime == 1 || ColsAtCompileTime == 1 || (InnerPanel && (evaluator::Flags&LinearAccessBit))) ? LinearAccessBit : 0, FlagsRowMajorBit = XprType::Flags&RowMajorBit, Flags0 = evaluator::Flags & ( (HereditaryBits & ~RowMajorBit) | DirectAccessBit | MaskPacketAccessBit), Flags = Flags0 | FlagsLinearAccessBit | FlagsRowMajorBit, PacketAlignment = unpacket_traits::alignment, Alignment0 = (InnerPanel && (OuterStrideAtCompileTime!=Dynamic) && (OuterStrideAtCompileTime!=0) && (((OuterStrideAtCompileTime * int(sizeof(Scalar))) % int(PacketAlignment)) == 0)) ? int(PacketAlignment) : 0, Alignment = EIGEN_PLAIN_ENUM_MIN(evaluator::Alignment, Alignment0) }; typedef block_evaluator block_evaluator_type; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit evaluator(const XprType& block) : block_evaluator_type(block) { EIGEN_INTERNAL_CHECK_COST_VALUE(CoeffReadCost); } }; // no direct-access => dispatch to a unary evaluator template struct block_evaluator : unary_evaluator > { typedef Block XprType; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit block_evaluator(const XprType& block) : unary_evaluator(block) {} }; template struct unary_evaluator, IndexBased> : evaluator_base > { typedef Block XprType; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit unary_evaluator(const XprType& block) : m_argImpl(block.nestedExpression()), m_startRow(block.startRow()), m_startCol(block.startCol()), m_linear_offset(ForwardLinearAccess?(ArgType::IsRowMajor ? block.startRow()*block.nestedExpression().cols() + block.startCol() : block.startCol()*block.nestedExpression().rows() + block.startRow()):0) { } typedef typename XprType::Scalar Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; enum { RowsAtCompileTime = XprType::RowsAtCompileTime, ForwardLinearAccess = (InnerPanel || int(XprType::IsRowMajor)==int(ArgType::IsRowMajor)) && bool(evaluator::Flags&LinearAccessBit) }; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index row, Index col) const { return m_argImpl.coeff(m_startRow.value() + row, m_startCol.value() + col); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { return linear_coeff_impl(index, bool_constant()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index row, Index col) { return m_argImpl.coeffRef(m_startRow.value() + row, m_startCol.value() + col); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index index) { return linear_coeffRef_impl(index, bool_constant()); } template EIGEN_STRONG_INLINE PacketType packet(Index row, Index col) const { return m_argImpl.template packet(m_startRow.value() + row, m_startCol.value() + col); } template EIGEN_STRONG_INLINE PacketType packet(Index index) const { if (ForwardLinearAccess) return m_argImpl.template packet(m_linear_offset.value() + index); else return packet(RowsAtCompileTime == 1 ? 0 : index, RowsAtCompileTime == 1 ? index : 0); } template EIGEN_STRONG_INLINE void writePacket(Index row, Index col, const PacketType& x) { return m_argImpl.template writePacket(m_startRow.value() + row, m_startCol.value() + col, x); } template EIGEN_STRONG_INLINE void writePacket(Index index, const PacketType& x) { if (ForwardLinearAccess) return m_argImpl.template writePacket(m_linear_offset.value() + index, x); else return writePacket(RowsAtCompileTime == 1 ? 0 : index, RowsAtCompileTime == 1 ? index : 0, x); } protected: EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType linear_coeff_impl(Index index, internal::true_type /* ForwardLinearAccess */) const { return m_argImpl.coeff(m_linear_offset.value() + index); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType linear_coeff_impl(Index index, internal::false_type /* not ForwardLinearAccess */) const { return coeff(RowsAtCompileTime == 1 ? 0 : index, RowsAtCompileTime == 1 ? index : 0); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& linear_coeffRef_impl(Index index, internal::true_type /* ForwardLinearAccess */) { return m_argImpl.coeffRef(m_linear_offset.value() + index); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& linear_coeffRef_impl(Index index, internal::false_type /* not ForwardLinearAccess */) { return coeffRef(RowsAtCompileTime == 1 ? 0 : index, RowsAtCompileTime == 1 ? index : 0); } evaluator m_argImpl; const variable_if_dynamic m_startRow; const variable_if_dynamic m_startCol; const variable_if_dynamic m_linear_offset; }; // TODO: This evaluator does not actually use the child evaluator; // all action is via the data() as returned by the Block expression. template struct block_evaluator : mapbase_evaluator, typename Block::PlainObject> { typedef Block XprType; typedef typename XprType::Scalar Scalar; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit block_evaluator(const XprType& block) : mapbase_evaluator(block) { // TODO: for the 3.3 release, this should be turned to an internal assertion, but let's keep it as is for the beta lifetime eigen_assert(((internal::UIntPtr(block.data()) % EIGEN_PLAIN_ENUM_MAX(1,evaluator::Alignment)) == 0) && "data is not aligned"); } }; // -------------------- Select -------------------- // NOTE shall we introduce a ternary_evaluator? // TODO enable vectorization for Select template struct evaluator > : evaluator_base > { typedef Select XprType; enum { CoeffReadCost = evaluator::CoeffReadCost + EIGEN_PLAIN_ENUM_MAX(evaluator::CoeffReadCost, evaluator::CoeffReadCost), Flags = (unsigned int)evaluator::Flags & evaluator::Flags & HereditaryBits, Alignment = EIGEN_PLAIN_ENUM_MIN(evaluator::Alignment, evaluator::Alignment) }; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit evaluator(const XprType& select) : m_conditionImpl(select.conditionMatrix()), m_thenImpl(select.thenMatrix()), m_elseImpl(select.elseMatrix()) { EIGEN_INTERNAL_CHECK_COST_VALUE(CoeffReadCost); } typedef typename XprType::CoeffReturnType CoeffReturnType; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index row, Index col) const { if (m_conditionImpl.coeff(row, col)) return m_thenImpl.coeff(row, col); else return m_elseImpl.coeff(row, col); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { if (m_conditionImpl.coeff(index)) return m_thenImpl.coeff(index); else return m_elseImpl.coeff(index); } protected: evaluator m_conditionImpl; evaluator m_thenImpl; evaluator m_elseImpl; }; // -------------------- Replicate -------------------- template struct unary_evaluator > : evaluator_base > { typedef Replicate XprType; typedef typename XprType::CoeffReturnType CoeffReturnType; enum { Factor = (RowFactor==Dynamic || ColFactor==Dynamic) ? Dynamic : RowFactor*ColFactor }; typedef typename internal::nested_eval::type ArgTypeNested; typedef typename internal::remove_all::type ArgTypeNestedCleaned; enum { CoeffReadCost = evaluator::CoeffReadCost, LinearAccessMask = XprType::IsVectorAtCompileTime ? LinearAccessBit : 0, Flags = (evaluator::Flags & (HereditaryBits|LinearAccessMask) & ~RowMajorBit) | (traits::Flags & RowMajorBit), Alignment = evaluator::Alignment }; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit unary_evaluator(const XprType& replicate) : m_arg(replicate.nestedExpression()), m_argImpl(m_arg), m_rows(replicate.nestedExpression().rows()), m_cols(replicate.nestedExpression().cols()) {} EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index row, Index col) const { // try to avoid using modulo; this is a pure optimization strategy const Index actual_row = internal::traits::RowsAtCompileTime==1 ? 0 : RowFactor==1 ? row : row % m_rows.value(); const Index actual_col = internal::traits::ColsAtCompileTime==1 ? 0 : ColFactor==1 ? col : col % m_cols.value(); return m_argImpl.coeff(actual_row, actual_col); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { // try to avoid using modulo; this is a pure optimization strategy const Index actual_index = internal::traits::RowsAtCompileTime==1 ? (ColFactor==1 ? index : index%m_cols.value()) : (RowFactor==1 ? index : index%m_rows.value()); return m_argImpl.coeff(actual_index); } template EIGEN_STRONG_INLINE PacketType packet(Index row, Index col) const { const Index actual_row = internal::traits::RowsAtCompileTime==1 ? 0 : RowFactor==1 ? row : row % m_rows.value(); const Index actual_col = internal::traits::ColsAtCompileTime==1 ? 0 : ColFactor==1 ? col : col % m_cols.value(); return m_argImpl.template packet(actual_row, actual_col); } template EIGEN_STRONG_INLINE PacketType packet(Index index) const { const Index actual_index = internal::traits::RowsAtCompileTime==1 ? (ColFactor==1 ? index : index%m_cols.value()) : (RowFactor==1 ? index : index%m_rows.value()); return m_argImpl.template packet(actual_index); } protected: const ArgTypeNested m_arg; evaluator m_argImpl; const variable_if_dynamic m_rows; const variable_if_dynamic m_cols; }; // -------------------- MatrixWrapper and ArrayWrapper -------------------- // // evaluator_wrapper_base is a common base class for the // MatrixWrapper and ArrayWrapper evaluators. template struct evaluator_wrapper_base : evaluator_base { typedef typename remove_all::type ArgType; enum { CoeffReadCost = evaluator::CoeffReadCost, Flags = evaluator::Flags, Alignment = evaluator::Alignment }; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit evaluator_wrapper_base(const ArgType& arg) : m_argImpl(arg) {} typedef typename ArgType::Scalar Scalar; typedef typename ArgType::CoeffReturnType CoeffReturnType; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index row, Index col) const { return m_argImpl.coeff(row, col); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { return m_argImpl.coeff(index); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index row, Index col) { return m_argImpl.coeffRef(row, col); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index index) { return m_argImpl.coeffRef(index); } template EIGEN_STRONG_INLINE PacketType packet(Index row, Index col) const { return m_argImpl.template packet(row, col); } template EIGEN_STRONG_INLINE PacketType packet(Index index) const { return m_argImpl.template packet(index); } template EIGEN_STRONG_INLINE void writePacket(Index row, Index col, const PacketType& x) { m_argImpl.template writePacket(row, col, x); } template EIGEN_STRONG_INLINE void writePacket(Index index, const PacketType& x) { m_argImpl.template writePacket(index, x); } protected: evaluator m_argImpl; }; template struct unary_evaluator > : evaluator_wrapper_base > { typedef MatrixWrapper XprType; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit unary_evaluator(const XprType& wrapper) : evaluator_wrapper_base >(wrapper.nestedExpression()) { } }; template struct unary_evaluator > : evaluator_wrapper_base > { typedef ArrayWrapper XprType; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit unary_evaluator(const XprType& wrapper) : evaluator_wrapper_base >(wrapper.nestedExpression()) { } }; // -------------------- Reverse -------------------- // defined in Reverse.h: template struct reverse_packet_cond; template struct unary_evaluator > : evaluator_base > { typedef Reverse XprType; typedef typename XprType::Scalar Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; enum { IsRowMajor = XprType::IsRowMajor, IsColMajor = !IsRowMajor, ReverseRow = (Direction == Vertical) || (Direction == BothDirections), ReverseCol = (Direction == Horizontal) || (Direction == BothDirections), ReversePacket = (Direction == BothDirections) || ((Direction == Vertical) && IsColMajor) || ((Direction == Horizontal) && IsRowMajor), CoeffReadCost = evaluator::CoeffReadCost, // let's enable LinearAccess only with vectorization because of the product overhead // FIXME enable DirectAccess with negative strides? Flags0 = evaluator::Flags, LinearAccess = ( (Direction==BothDirections) && (int(Flags0)&PacketAccessBit) ) || ((ReverseRow && XprType::ColsAtCompileTime==1) || (ReverseCol && XprType::RowsAtCompileTime==1)) ? LinearAccessBit : 0, Flags = int(Flags0) & (HereditaryBits | PacketAccessBit | LinearAccess), Alignment = 0 // FIXME in some rare cases, Alignment could be preserved, like a Vector4f. }; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit unary_evaluator(const XprType& reverse) : m_argImpl(reverse.nestedExpression()), m_rows(ReverseRow ? reverse.nestedExpression().rows() : 1), m_cols(ReverseCol ? reverse.nestedExpression().cols() : 1) { } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index row, Index col) const { return m_argImpl.coeff(ReverseRow ? m_rows.value() - row - 1 : row, ReverseCol ? m_cols.value() - col - 1 : col); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { return m_argImpl.coeff(m_rows.value() * m_cols.value() - index - 1); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index row, Index col) { return m_argImpl.coeffRef(ReverseRow ? m_rows.value() - row - 1 : row, ReverseCol ? m_cols.value() - col - 1 : col); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index index) { return m_argImpl.coeffRef(m_rows.value() * m_cols.value() - index - 1); } template EIGEN_STRONG_INLINE PacketType packet(Index row, Index col) const { enum { PacketSize = unpacket_traits::size, OffsetRow = ReverseRow && IsColMajor ? PacketSize : 1, OffsetCol = ReverseCol && IsRowMajor ? PacketSize : 1 }; typedef internal::reverse_packet_cond reverse_packet; return reverse_packet::run(m_argImpl.template packet( ReverseRow ? m_rows.value() - row - OffsetRow : row, ReverseCol ? m_cols.value() - col - OffsetCol : col)); } template EIGEN_STRONG_INLINE PacketType packet(Index index) const { enum { PacketSize = unpacket_traits::size }; return preverse(m_argImpl.template packet(m_rows.value() * m_cols.value() - index - PacketSize)); } template EIGEN_STRONG_INLINE void writePacket(Index row, Index col, const PacketType& x) { // FIXME we could factorize some code with packet(i,j) enum { PacketSize = unpacket_traits::size, OffsetRow = ReverseRow && IsColMajor ? PacketSize : 1, OffsetCol = ReverseCol && IsRowMajor ? PacketSize : 1 }; typedef internal::reverse_packet_cond reverse_packet; m_argImpl.template writePacket( ReverseRow ? m_rows.value() - row - OffsetRow : row, ReverseCol ? m_cols.value() - col - OffsetCol : col, reverse_packet::run(x)); } template EIGEN_STRONG_INLINE void writePacket(Index index, const PacketType& x) { enum { PacketSize = unpacket_traits::size }; m_argImpl.template writePacket (m_rows.value() * m_cols.value() - index - PacketSize, preverse(x)); } protected: evaluator m_argImpl; // If we do not reverse rows, then we do not need to know the number of rows; same for columns // Nonetheless, in this case it is important to set to 1 such that the coeff(index) method works fine for vectors. const variable_if_dynamic m_rows; const variable_if_dynamic m_cols; }; // -------------------- Diagonal -------------------- template struct evaluator > : evaluator_base > { typedef Diagonal XprType; enum { CoeffReadCost = evaluator::CoeffReadCost, Flags = (unsigned int)(evaluator::Flags & (HereditaryBits | DirectAccessBit) & ~RowMajorBit) | LinearAccessBit, Alignment = 0 }; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit evaluator(const XprType& diagonal) : m_argImpl(diagonal.nestedExpression()), m_index(diagonal.index()) { } typedef typename XprType::Scalar Scalar; typedef typename XprType::CoeffReturnType CoeffReturnType; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index row, Index) const { return m_argImpl.coeff(row + rowOffset(), row + colOffset()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const { return m_argImpl.coeff(index + rowOffset(), index + colOffset()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index row, Index) { return m_argImpl.coeffRef(row + rowOffset(), row + colOffset()); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index index) { return m_argImpl.coeffRef(index + rowOffset(), index + colOffset()); } protected: evaluator m_argImpl; const internal::variable_if_dynamicindex m_index; private: EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR Index rowOffset() const { return m_index.value() > 0 ? 0 : -m_index.value(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR Index colOffset() const { return m_index.value() > 0 ? m_index.value() : 0; } }; //---------------------------------------------------------------------- // deprecated code //---------------------------------------------------------------------- // -------------------- EvalToTemp -------------------- // expression class for evaluating nested expression to a temporary template class EvalToTemp; template struct traits > : public traits { }; template class EvalToTemp : public dense_xpr_base >::type { public: typedef typename dense_xpr_base::type Base; EIGEN_GENERIC_PUBLIC_INTERFACE(EvalToTemp) explicit EvalToTemp(const ArgType& arg) : m_arg(arg) { } const ArgType& arg() const { return m_arg; } EIGEN_CONSTEXPR Index rows() const EIGEN_NOEXCEPT { return m_arg.rows(); } EIGEN_CONSTEXPR Index cols() const EIGEN_NOEXCEPT { return m_arg.cols(); } private: const ArgType& m_arg; }; template struct evaluator > : public evaluator { typedef EvalToTemp XprType; typedef typename ArgType::PlainObject PlainObject; typedef evaluator Base; EIGEN_DEVICE_FUNC explicit evaluator(const XprType& xpr) : m_result(xpr.arg()) { ::new (static_cast(this)) Base(m_result); } // This constructor is used when nesting an EvalTo evaluator in another evaluator EIGEN_DEVICE_FUNC evaluator(const ArgType& arg) : m_result(arg) { ::new (static_cast(this)) Base(m_result); } protected: PlainObject m_result; }; } // namespace internal } // end namespace Eigen #endif // EIGEN_COREEVALUATORS_H RcppEigen/inst/include/Eigen/src/Core/SelfAdjointView.h0000644000176200001440000003522714562417221022457 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SELFADJOINTMATRIX_H #define EIGEN_SELFADJOINTMATRIX_H namespace Eigen { /** \class SelfAdjointView * \ingroup Core_Module * * * \brief Expression of a selfadjoint matrix from a triangular part of a dense matrix * * \param MatrixType the type of the dense matrix storing the coefficients * \param TriangularPart can be either \c #Lower or \c #Upper * * This class is an expression of a sefladjoint matrix from a triangular part of a matrix * with given dense storage of the coefficients. It is the return type of MatrixBase::selfadjointView() * and most of the time this is the only way that it is used. * * \sa class TriangularBase, MatrixBase::selfadjointView() */ namespace internal { template struct traits > : traits { typedef typename ref_selector::non_const_type MatrixTypeNested; typedef typename remove_all::type MatrixTypeNestedCleaned; typedef MatrixType ExpressionType; typedef typename MatrixType::PlainObject FullMatrixType; enum { Mode = UpLo | SelfAdjoint, FlagsLvalueBit = is_lvalue::value ? LvalueBit : 0, Flags = MatrixTypeNestedCleaned::Flags & (HereditaryBits|FlagsLvalueBit) & (~(PacketAccessBit | DirectAccessBit | LinearAccessBit)) // FIXME these flags should be preserved }; }; } template class SelfAdjointView : public TriangularBase > { public: typedef _MatrixType MatrixType; typedef TriangularBase Base; typedef typename internal::traits::MatrixTypeNested MatrixTypeNested; typedef typename internal::traits::MatrixTypeNestedCleaned MatrixTypeNestedCleaned; typedef MatrixTypeNestedCleaned NestedExpression; /** \brief The type of coefficients in this matrix */ typedef typename internal::traits::Scalar Scalar; typedef typename MatrixType::StorageIndex StorageIndex; typedef typename internal::remove_all::type MatrixConjugateReturnType; typedef SelfAdjointView::type, UpLo> ConstSelfAdjointView; enum { Mode = internal::traits::Mode, Flags = internal::traits::Flags, TransposeMode = ((int(Mode) & int(Upper)) ? Lower : 0) | ((int(Mode) & int(Lower)) ? Upper : 0) }; typedef typename MatrixType::PlainObject PlainObject; EIGEN_DEVICE_FUNC explicit inline SelfAdjointView(MatrixType& matrix) : m_matrix(matrix) { EIGEN_STATIC_ASSERT(UpLo==Lower || UpLo==Upper,SELFADJOINTVIEW_ACCEPTS_UPPER_AND_LOWER_MODE_ONLY); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index rows() const EIGEN_NOEXCEPT { return m_matrix.rows(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index cols() const EIGEN_NOEXCEPT { return m_matrix.cols(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index outerStride() const EIGEN_NOEXCEPT { return m_matrix.outerStride(); } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index innerStride() const EIGEN_NOEXCEPT { return m_matrix.innerStride(); } /** \sa MatrixBase::coeff() * \warning the coordinates must fit into the referenced triangular part */ EIGEN_DEVICE_FUNC inline Scalar coeff(Index row, Index col) const { Base::check_coordinates_internal(row, col); return m_matrix.coeff(row, col); } /** \sa MatrixBase::coeffRef() * \warning the coordinates must fit into the referenced triangular part */ EIGEN_DEVICE_FUNC inline Scalar& coeffRef(Index row, Index col) { EIGEN_STATIC_ASSERT_LVALUE(SelfAdjointView); Base::check_coordinates_internal(row, col); return m_matrix.coeffRef(row, col); } /** \internal */ EIGEN_DEVICE_FUNC const MatrixTypeNestedCleaned& _expression() const { return m_matrix; } EIGEN_DEVICE_FUNC const MatrixTypeNestedCleaned& nestedExpression() const { return m_matrix; } EIGEN_DEVICE_FUNC MatrixTypeNestedCleaned& nestedExpression() { return m_matrix; } /** Efficient triangular matrix times vector/matrix product */ template EIGEN_DEVICE_FUNC const Product operator*(const MatrixBase& rhs) const { return Product(*this, rhs.derived()); } /** Efficient vector/matrix times triangular matrix product */ template friend EIGEN_DEVICE_FUNC const Product operator*(const MatrixBase& lhs, const SelfAdjointView& rhs) { return Product(lhs.derived(),rhs); } friend EIGEN_DEVICE_FUNC const SelfAdjointView operator*(const Scalar& s, const SelfAdjointView& mat) { return (s*mat.nestedExpression()).template selfadjointView(); } /** Perform a symmetric rank 2 update of the selfadjoint matrix \c *this: * \f$ this = this + \alpha u v^* + conj(\alpha) v u^* \f$ * \returns a reference to \c *this * * The vectors \a u and \c v \b must be column vectors, however they can be * a adjoint expression without any overhead. Only the meaningful triangular * part of the matrix is updated, the rest is left unchanged. * * \sa rankUpdate(const MatrixBase&, Scalar) */ template EIGEN_DEVICE_FUNC SelfAdjointView& rankUpdate(const MatrixBase& u, const MatrixBase& v, const Scalar& alpha = Scalar(1)); /** Perform a symmetric rank K update of the selfadjoint matrix \c *this: * \f$ this = this + \alpha ( u u^* ) \f$ where \a u is a vector or matrix. * * \returns a reference to \c *this * * Note that to perform \f$ this = this + \alpha ( u^* u ) \f$ you can simply * call this function with u.adjoint(). * * \sa rankUpdate(const MatrixBase&, const MatrixBase&, Scalar) */ template EIGEN_DEVICE_FUNC SelfAdjointView& rankUpdate(const MatrixBase& u, const Scalar& alpha = Scalar(1)); /** \returns an expression of a triangular view extracted from the current selfadjoint view of a given triangular part * * The parameter \a TriMode can have the following values: \c #Upper, \c #StrictlyUpper, \c #UnitUpper, * \c #Lower, \c #StrictlyLower, \c #UnitLower. * * If \c TriMode references the same triangular part than \c *this, then this method simply return a \c TriangularView of the nested expression, * otherwise, the nested expression is first transposed, thus returning a \c TriangularView> object. * * \sa MatrixBase::triangularView(), class TriangularView */ template EIGEN_DEVICE_FUNC typename internal::conditional<(TriMode&(Upper|Lower))==(UpLo&(Upper|Lower)), TriangularView, TriangularView >::type triangularView() const { typename internal::conditional<(TriMode&(Upper|Lower))==(UpLo&(Upper|Lower)), MatrixType&, typename MatrixType::ConstTransposeReturnType>::type tmp1(m_matrix); typename internal::conditional<(TriMode&(Upper|Lower))==(UpLo&(Upper|Lower)), MatrixType&, typename MatrixType::AdjointReturnType>::type tmp2(tmp1); return typename internal::conditional<(TriMode&(Upper|Lower))==(UpLo&(Upper|Lower)), TriangularView, TriangularView >::type(tmp2); } typedef SelfAdjointView ConjugateReturnType; /** \sa MatrixBase::conjugate() const */ EIGEN_DEVICE_FUNC inline const ConjugateReturnType conjugate() const { return ConjugateReturnType(m_matrix.conjugate()); } /** \returns an expression of the complex conjugate of \c *this if Cond==true, * returns \c *this otherwise. */ template EIGEN_DEVICE_FUNC inline typename internal::conditional::type conjugateIf() const { typedef typename internal::conditional::type ReturnType; return ReturnType(m_matrix.template conjugateIf()); } typedef SelfAdjointView AdjointReturnType; /** \sa MatrixBase::adjoint() const */ EIGEN_DEVICE_FUNC inline const AdjointReturnType adjoint() const { return AdjointReturnType(m_matrix.adjoint()); } typedef SelfAdjointView TransposeReturnType; /** \sa MatrixBase::transpose() */ EIGEN_DEVICE_FUNC inline TransposeReturnType transpose() { EIGEN_STATIC_ASSERT_LVALUE(MatrixType) typename MatrixType::TransposeReturnType tmp(m_matrix); return TransposeReturnType(tmp); } typedef SelfAdjointView ConstTransposeReturnType; /** \sa MatrixBase::transpose() const */ EIGEN_DEVICE_FUNC inline const ConstTransposeReturnType transpose() const { return ConstTransposeReturnType(m_matrix.transpose()); } /** \returns a const expression of the main diagonal of the matrix \c *this * * This method simply returns the diagonal of the nested expression, thus by-passing the SelfAdjointView decorator. * * \sa MatrixBase::diagonal(), class Diagonal */ EIGEN_DEVICE_FUNC typename MatrixType::ConstDiagonalReturnType diagonal() const { return typename MatrixType::ConstDiagonalReturnType(m_matrix); } /////////// Cholesky module /////////// const LLT llt() const; const LDLT ldlt() const; /////////// Eigenvalue module /////////// /** Real part of #Scalar */ typedef typename NumTraits::Real RealScalar; /** Return type of eigenvalues() */ typedef Matrix::ColsAtCompileTime, 1> EigenvaluesReturnType; EIGEN_DEVICE_FUNC EigenvaluesReturnType eigenvalues() const; EIGEN_DEVICE_FUNC RealScalar operatorNorm() const; protected: MatrixTypeNested m_matrix; }; // template // internal::selfadjoint_matrix_product_returntype > // operator*(const MatrixBase& lhs, const SelfAdjointView& rhs) // { // return internal::matrix_selfadjoint_product_returntype >(lhs.derived(),rhs); // } // selfadjoint to dense matrix namespace internal { // TODO currently a selfadjoint expression has the form SelfAdjointView<.,.> // in the future selfadjoint-ness should be defined by the expression traits // such that Transpose > is valid. (currently TriangularBase::transpose() is overloaded to make it work) template struct evaluator_traits > { typedef typename storage_kind_to_evaluator_kind::Kind Kind; typedef SelfAdjointShape Shape; }; template class triangular_dense_assignment_kernel : public generic_dense_assignment_kernel { protected: typedef generic_dense_assignment_kernel Base; typedef typename Base::DstXprType DstXprType; typedef typename Base::SrcXprType SrcXprType; using Base::m_dst; using Base::m_src; using Base::m_functor; public: typedef typename Base::DstEvaluatorType DstEvaluatorType; typedef typename Base::SrcEvaluatorType SrcEvaluatorType; typedef typename Base::Scalar Scalar; typedef typename Base::AssignmentTraits AssignmentTraits; EIGEN_DEVICE_FUNC triangular_dense_assignment_kernel(DstEvaluatorType &dst, const SrcEvaluatorType &src, const Functor &func, DstXprType& dstExpr) : Base(dst, src, func, dstExpr) {} EIGEN_DEVICE_FUNC void assignCoeff(Index row, Index col) { eigen_internal_assert(row!=col); Scalar tmp = m_src.coeff(row,col); m_functor.assignCoeff(m_dst.coeffRef(row,col), tmp); m_functor.assignCoeff(m_dst.coeffRef(col,row), numext::conj(tmp)); } EIGEN_DEVICE_FUNC void assignDiagonalCoeff(Index id) { Base::assignCoeff(id,id); } EIGEN_DEVICE_FUNC void assignOppositeCoeff(Index, Index) { eigen_internal_assert(false && "should never be called"); } }; } // end namespace internal /*************************************************************************** * Implementation of MatrixBase methods ***************************************************************************/ /** This is the const version of MatrixBase::selfadjointView() */ template template EIGEN_DEVICE_FUNC typename MatrixBase::template ConstSelfAdjointViewReturnType::Type MatrixBase::selfadjointView() const { return typename ConstSelfAdjointViewReturnType::Type(derived()); } /** \returns an expression of a symmetric/self-adjoint view extracted from the upper or lower triangular part of the current matrix * * The parameter \a UpLo can be either \c #Upper or \c #Lower * * Example: \include MatrixBase_selfadjointView.cpp * Output: \verbinclude MatrixBase_selfadjointView.out * * \sa class SelfAdjointView */ template template EIGEN_DEVICE_FUNC typename MatrixBase::template SelfAdjointViewReturnType::Type MatrixBase::selfadjointView() { return typename SelfAdjointViewReturnType::Type(derived()); } } // end namespace Eigen #endif // EIGEN_SELFADJOINTMATRIX_H RcppEigen/inst/include/Eigen/src/Core/Map.h0000644000176200001440000001613014562417221020127 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2007-2010 Benoit Jacob // Copyright (C) 2008 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_MAP_H #define EIGEN_MAP_H namespace Eigen { namespace internal { template struct traits > : public traits { typedef traits TraitsBase; enum { PlainObjectTypeInnerSize = ((traits::Flags&RowMajorBit)==RowMajorBit) ? PlainObjectType::ColsAtCompileTime : PlainObjectType::RowsAtCompileTime, InnerStrideAtCompileTime = StrideType::InnerStrideAtCompileTime == 0 ? int(PlainObjectType::InnerStrideAtCompileTime) : int(StrideType::InnerStrideAtCompileTime), OuterStrideAtCompileTime = StrideType::OuterStrideAtCompileTime == 0 ? (InnerStrideAtCompileTime==Dynamic || PlainObjectTypeInnerSize==Dynamic ? Dynamic : int(InnerStrideAtCompileTime) * int(PlainObjectTypeInnerSize)) : int(StrideType::OuterStrideAtCompileTime), Alignment = int(MapOptions)&int(AlignedMask), Flags0 = TraitsBase::Flags & (~NestByRefBit), Flags = is_lvalue::value ? int(Flags0) : (int(Flags0) & ~LvalueBit) }; private: enum { Options }; // Expressions don't have Options }; } /** \class Map * \ingroup Core_Module * * \brief A matrix or vector expression mapping an existing array of data. * * \tparam PlainObjectType the equivalent matrix type of the mapped data * \tparam MapOptions specifies the pointer alignment in bytes. It can be: \c #Aligned128, \c #Aligned64, \c #Aligned32, \c #Aligned16, \c #Aligned8 or \c #Unaligned. * The default is \c #Unaligned. * \tparam StrideType optionally specifies strides. By default, Map assumes the memory layout * of an ordinary, contiguous array. This can be overridden by specifying strides. * The type passed here must be a specialization of the Stride template, see examples below. * * This class represents a matrix or vector expression mapping an existing array of data. * It can be used to let Eigen interface without any overhead with non-Eigen data structures, * such as plain C arrays or structures from other libraries. By default, it assumes that the * data is laid out contiguously in memory. You can however override this by explicitly specifying * inner and outer strides. * * Here's an example of simply mapping a contiguous array as a \ref TopicStorageOrders "column-major" matrix: * \include Map_simple.cpp * Output: \verbinclude Map_simple.out * * If you need to map non-contiguous arrays, you can do so by specifying strides: * * Here's an example of mapping an array as a vector, specifying an inner stride, that is, the pointer * increment between two consecutive coefficients. Here, we're specifying the inner stride as a compile-time * fixed value. * \include Map_inner_stride.cpp * Output: \verbinclude Map_inner_stride.out * * Here's an example of mapping an array while specifying an outer stride. Here, since we're mapping * as a column-major matrix, 'outer stride' means the pointer increment between two consecutive columns. * Here, we're specifying the outer stride as a runtime parameter. Note that here \c OuterStride<> is * a short version of \c OuterStride because the default template parameter of OuterStride * is \c Dynamic * \include Map_outer_stride.cpp * Output: \verbinclude Map_outer_stride.out * * For more details and for an example of specifying both an inner and an outer stride, see class Stride. * * \b Tip: to change the array of data mapped by a Map object, you can use the C++ * placement new syntax: * * Example: \include Map_placement_new.cpp * Output: \verbinclude Map_placement_new.out * * This class is the return type of PlainObjectBase::Map() but can also be used directly. * * \sa PlainObjectBase::Map(), \ref TopicStorageOrders */ template class Map : public MapBase > { public: typedef MapBase Base; EIGEN_DENSE_PUBLIC_INTERFACE(Map) typedef typename Base::PointerType PointerType; typedef PointerType PointerArgType; EIGEN_DEVICE_FUNC inline PointerType cast_to_pointer_type(PointerArgType ptr) { return ptr; } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index innerStride() const { return StrideType::InnerStrideAtCompileTime != 0 ? m_stride.inner() : 1; } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR inline Index outerStride() const { return StrideType::OuterStrideAtCompileTime != 0 ? m_stride.outer() : internal::traits::OuterStrideAtCompileTime != Dynamic ? Index(internal::traits::OuterStrideAtCompileTime) : IsVectorAtCompileTime ? (this->size() * innerStride()) : int(Flags)&RowMajorBit ? (this->cols() * innerStride()) : (this->rows() * innerStride()); } /** Constructor in the fixed-size case. * * \param dataPtr pointer to the array to map * \param stride optional Stride object, passing the strides. */ EIGEN_DEVICE_FUNC explicit inline Map(PointerArgType dataPtr, const StrideType& stride = StrideType()) : Base(cast_to_pointer_type(dataPtr)), m_stride(stride) { PlainObjectType::Base::_check_template_params(); } /** Constructor in the dynamic-size vector case. * * \param dataPtr pointer to the array to map * \param size the size of the vector expression * \param stride optional Stride object, passing the strides. */ EIGEN_DEVICE_FUNC inline Map(PointerArgType dataPtr, Index size, const StrideType& stride = StrideType()) : Base(cast_to_pointer_type(dataPtr), size), m_stride(stride) { PlainObjectType::Base::_check_template_params(); } /** Constructor in the dynamic-size matrix case. * * \param dataPtr pointer to the array to map * \param rows the number of rows of the matrix expression * \param cols the number of columns of the matrix expression * \param stride optional Stride object, passing the strides. */ EIGEN_DEVICE_FUNC inline Map(PointerArgType dataPtr, Index rows, Index cols, const StrideType& stride = StrideType()) : Base(cast_to_pointer_type(dataPtr), rows, cols), m_stride(stride) { PlainObjectType::Base::_check_template_params(); } EIGEN_INHERIT_ASSIGNMENT_OPERATORS(Map) protected: StrideType m_stride; }; } // end namespace Eigen #endif // EIGEN_MAP_H RcppEigen/inst/include/Eigen/src/Core/util/0000755000176200001440000000000014562417221020215 5ustar liggesusersRcppEigen/inst/include/Eigen/src/Core/util/XprHelper.h0000644000176200001440000010566214562417221022311 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008 Gael Guennebaud // Copyright (C) 2006-2008 Benoit Jacob // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_XPRHELPER_H #define EIGEN_XPRHELPER_H // just a workaround because GCC seems to not really like empty structs // FIXME: gcc 4.3 generates bad code when strict-aliasing is enabled // so currently we simply disable this optimization for gcc 4.3 #if EIGEN_COMP_GNUC && !EIGEN_GNUC_AT(4,3) #define EIGEN_EMPTY_STRUCT_CTOR(X) \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE X() {} \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE X(const X& ) {} #else #define EIGEN_EMPTY_STRUCT_CTOR(X) #endif namespace Eigen { namespace internal { template EIGEN_DEVICE_FUNC inline IndexDest convert_index(const IndexSrc& idx) { // for sizeof(IndexDest)>=sizeof(IndexSrc) compilers should be able to optimize this away: eigen_internal_assert(idx <= NumTraits::highest() && "Index value to big for target type"); return IndexDest(idx); } // true if T can be considered as an integral index (i.e., and integral type or enum) template struct is_valid_index_type { enum { value = #if EIGEN_HAS_TYPE_TRAITS internal::is_integral::value || std::is_enum::value #elif EIGEN_COMP_MSVC internal::is_integral::value || __is_enum(T) #else // without C++11, we use is_convertible to Index instead of is_integral in order to treat enums as Index. internal::is_convertible::value && !internal::is_same::value && !is_same::value #endif }; }; // true if both types are not valid index types template struct valid_indexed_view_overload { enum { value = !(internal::is_valid_index_type::value && internal::is_valid_index_type::value) }; }; // promote_scalar_arg is an helper used in operation between an expression and a scalar, like: // expression * scalar // Its role is to determine how the type T of the scalar operand should be promoted given the scalar type ExprScalar of the given expression. // The IsSupported template parameter must be provided by the caller as: internal::has_ReturnType >::value using the proper order for ExprScalar and T. // Then the logic is as follows: // - if the operation is natively supported as defined by IsSupported, then the scalar type is not promoted, and T is returned. // - otherwise, NumTraits::Literal is returned if T is implicitly convertible to NumTraits::Literal AND that this does not imply a float to integer conversion. // - otherwise, ExprScalar is returned if T is implicitly convertible to ExprScalar AND that this does not imply a float to integer conversion. // - In all other cases, the promoted type is not defined, and the respective operation is thus invalid and not available (SFINAE). template struct promote_scalar_arg; template struct promote_scalar_arg { typedef T type; }; // Recursively check safe conversion to PromotedType, and then ExprScalar if they are different. template::value, bool IsSafe = NumTraits::IsInteger || !NumTraits::IsInteger> struct promote_scalar_arg_unsupported; // Start recursion with NumTraits::Literal template struct promote_scalar_arg : promote_scalar_arg_unsupported::Literal> {}; // We found a match! template struct promote_scalar_arg_unsupported { typedef PromotedType type; }; // No match, but no real-to-integer issues, and ExprScalar and current PromotedType are different, // so let's try to promote to ExprScalar template struct promote_scalar_arg_unsupported : promote_scalar_arg_unsupported {}; // Unsafe real-to-integer, let's stop. template struct promote_scalar_arg_unsupported {}; // T is not even convertible to ExprScalar, let's stop. template struct promote_scalar_arg_unsupported {}; //classes inheriting no_assignment_operator don't generate a default operator=. class no_assignment_operator { private: no_assignment_operator& operator=(const no_assignment_operator&); protected: EIGEN_DEFAULT_COPY_CONSTRUCTOR(no_assignment_operator) EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(no_assignment_operator) }; /** \internal return the index type with the largest number of bits */ template struct promote_index_type { typedef typename conditional<(sizeof(I1)::type type; }; /** \internal If the template parameter Value is Dynamic, this class is just a wrapper around a T variable that * can be accessed using value() and setValue(). * Otherwise, this class is an empty structure and value() just returns the template parameter Value. */ template class variable_if_dynamic { public: EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(variable_if_dynamic) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit variable_if_dynamic(T v) { EIGEN_ONLY_USED_FOR_DEBUG(v); eigen_assert(v == T(Value)); } EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE EIGEN_CONSTEXPR T value() { return T(Value); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR operator T() const { return T(Value); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void setValue(T v) const { EIGEN_ONLY_USED_FOR_DEBUG(v); eigen_assert(v == T(Value)); } }; template class variable_if_dynamic { T m_value; public: EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit variable_if_dynamic(T value = 0) EIGEN_NO_THROW : m_value(value) {} EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T value() const { return m_value; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE operator T() const { return m_value; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void setValue(T value) { m_value = value; } }; /** \internal like variable_if_dynamic but for DynamicIndex */ template class variable_if_dynamicindex { public: EIGEN_EMPTY_STRUCT_CTOR(variable_if_dynamicindex) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit variable_if_dynamicindex(T v) { EIGEN_ONLY_USED_FOR_DEBUG(v); eigen_assert(v == T(Value)); } EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE EIGEN_CONSTEXPR T value() { return T(Value); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void setValue(T) {} }; template class variable_if_dynamicindex { T m_value; EIGEN_DEVICE_FUNC variable_if_dynamicindex() { eigen_assert(false); } public: EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit variable_if_dynamicindex(T value) : m_value(value) {} EIGEN_DEVICE_FUNC T EIGEN_STRONG_INLINE value() const { return m_value; } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void setValue(T value) { m_value = value; } }; template struct functor_traits { enum { Cost = 10, PacketAccess = false, IsRepeatable = false }; }; template struct packet_traits; template struct unpacket_traits; template::size)==0 || is_same::half>::value> struct find_best_packet_helper; template< int Size, typename PacketType> struct find_best_packet_helper { typedef PacketType type; }; template struct find_best_packet_helper { typedef typename find_best_packet_helper::half>::type type; }; template struct find_best_packet { typedef typename find_best_packet_helper::type>::type type; }; #if EIGEN_MAX_STATIC_ALIGN_BYTES>0 template struct compute_default_alignment_helper { enum { value = 0 }; }; template struct compute_default_alignment_helper // Match { enum { value = AlignmentBytes }; }; template struct compute_default_alignment_helper // Try-half { // current packet too large, try with an half-packet enum { value = compute_default_alignment_helper::value }; }; #else // If static alignment is disabled, no need to bother. // This also avoids a division by zero in "bool Match = bool((ArrayBytes%AlignmentBytes)==0)" template struct compute_default_alignment_helper { enum { value = 0 }; }; #endif template struct compute_default_alignment { enum { value = compute_default_alignment_helper::value }; }; template struct compute_default_alignment { enum { value = EIGEN_MAX_ALIGN_BYTES }; }; template class make_proper_matrix_type { enum { IsColVector = _Cols==1 && _Rows!=1, IsRowVector = _Rows==1 && _Cols!=1, Options = IsColVector ? (_Options | ColMajor) & ~RowMajor : IsRowVector ? (_Options | RowMajor) & ~ColMajor : _Options }; public: typedef Matrix<_Scalar, _Rows, _Cols, Options, _MaxRows, _MaxCols> type; }; template class compute_matrix_flags { enum { row_major_bit = Options&RowMajor ? RowMajorBit : 0 }; public: // FIXME currently we still have to handle DirectAccessBit at the expression level to handle DenseCoeffsBase<> // and then propagate this information to the evaluator's flags. // However, I (Gael) think that DirectAccessBit should only matter at the evaluation stage. enum { ret = DirectAccessBit | LvalueBit | NestByRefBit | row_major_bit }; }; template struct size_at_compile_time { enum { ret = (_Rows==Dynamic || _Cols==Dynamic) ? Dynamic : _Rows * _Cols }; }; template struct size_of_xpr_at_compile_time { enum { ret = size_at_compile_time::RowsAtCompileTime,traits::ColsAtCompileTime>::ret }; }; /* plain_matrix_type : the difference from eval is that plain_matrix_type is always a plain matrix type, * whereas eval is a const reference in the case of a matrix */ template::StorageKind> struct plain_matrix_type; template struct plain_matrix_type_dense; template struct plain_matrix_type { typedef typename plain_matrix_type_dense::XprKind, traits::Flags>::type type; }; template struct plain_matrix_type { typedef typename T::PlainObject type; }; template struct plain_matrix_type_dense { typedef Matrix::Scalar, traits::RowsAtCompileTime, traits::ColsAtCompileTime, AutoAlign | (Flags&RowMajorBit ? RowMajor : ColMajor), traits::MaxRowsAtCompileTime, traits::MaxColsAtCompileTime > type; }; template struct plain_matrix_type_dense { typedef Array::Scalar, traits::RowsAtCompileTime, traits::ColsAtCompileTime, AutoAlign | (Flags&RowMajorBit ? RowMajor : ColMajor), traits::MaxRowsAtCompileTime, traits::MaxColsAtCompileTime > type; }; /* eval : the return type of eval(). For matrices, this is just a const reference * in order to avoid a useless copy */ template::StorageKind> struct eval; template struct eval { typedef typename plain_matrix_type::type type; // typedef typename T::PlainObject type; // typedef T::Matrix::Scalar, // traits::RowsAtCompileTime, // traits::ColsAtCompileTime, // AutoAlign | (traits::Flags&RowMajorBit ? RowMajor : ColMajor), // traits::MaxRowsAtCompileTime, // traits::MaxColsAtCompileTime // > type; }; template struct eval { typedef typename plain_matrix_type::type type; }; // for matrices, no need to evaluate, just use a const reference to avoid a useless copy template struct eval, Dense> { typedef const Matrix<_Scalar, _Rows, _Cols, _Options, _MaxRows, _MaxCols>& type; }; template struct eval, Dense> { typedef const Array<_Scalar, _Rows, _Cols, _Options, _MaxRows, _MaxCols>& type; }; /* similar to plain_matrix_type, but using the evaluator's Flags */ template::StorageKind> struct plain_object_eval; template struct plain_object_eval { typedef typename plain_matrix_type_dense::XprKind, evaluator::Flags>::type type; }; /* plain_matrix_type_column_major : same as plain_matrix_type but guaranteed to be column-major */ template struct plain_matrix_type_column_major { enum { Rows = traits::RowsAtCompileTime, Cols = traits::ColsAtCompileTime, MaxRows = traits::MaxRowsAtCompileTime, MaxCols = traits::MaxColsAtCompileTime }; typedef Matrix::Scalar, Rows, Cols, (MaxRows==1&&MaxCols!=1) ? RowMajor : ColMajor, MaxRows, MaxCols > type; }; /* plain_matrix_type_row_major : same as plain_matrix_type but guaranteed to be row-major */ template struct plain_matrix_type_row_major { enum { Rows = traits::RowsAtCompileTime, Cols = traits::ColsAtCompileTime, MaxRows = traits::MaxRowsAtCompileTime, MaxCols = traits::MaxColsAtCompileTime }; typedef Matrix::Scalar, Rows, Cols, (MaxCols==1&&MaxRows!=1) ? ColMajor : RowMajor, MaxRows, MaxCols > type; }; /** \internal The reference selector for template expressions. The idea is that we don't * need to use references for expressions since they are light weight proxy * objects which should generate no copying overhead. */ template struct ref_selector { typedef typename conditional< bool(traits::Flags & NestByRefBit), T const&, const T >::type type; typedef typename conditional< bool(traits::Flags & NestByRefBit), T &, T >::type non_const_type; }; /** \internal Adds the const qualifier on the value-type of T2 if and only if T1 is a const type */ template struct transfer_constness { typedef typename conditional< bool(internal::is_const::value), typename internal::add_const_on_value_type::type, T2 >::type type; }; // However, we still need a mechanism to detect whether an expression which is evaluated multiple time // has to be evaluated into a temporary. // That's the purpose of this new nested_eval helper: /** \internal Determines how a given expression should be nested when evaluated multiple times. * For example, when you do a * (b+c), Eigen will determine how the expression b+c should be * evaluated into the bigger product expression. The choice is between nesting the expression b+c as-is, or * evaluating that expression b+c into a temporary variable d, and nest d so that the resulting expression is * a*d. Evaluating can be beneficial for example if every coefficient access in the resulting expression causes * many coefficient accesses in the nested expressions -- as is the case with matrix product for example. * * \tparam T the type of the expression being nested. * \tparam n the number of coefficient accesses in the nested expression for each coefficient access in the bigger expression. * \tparam PlainObject the type of the temporary if needed. */ template::type> struct nested_eval { enum { ScalarReadCost = NumTraits::Scalar>::ReadCost, CoeffReadCost = evaluator::CoeffReadCost, // NOTE What if an evaluator evaluate itself into a temporary? // Then CoeffReadCost will be small (e.g., 1) but we still have to evaluate, especially if n>1. // This situation is already taken care by the EvalBeforeNestingBit flag, which is turned ON // for all evaluator creating a temporary. This flag is then propagated by the parent evaluators. // Another solution could be to count the number of temps? NAsInteger = n == Dynamic ? HugeCost : n, CostEval = (NAsInteger+1) * ScalarReadCost + CoeffReadCost, CostNoEval = NAsInteger * CoeffReadCost, Evaluate = (int(evaluator::Flags) & EvalBeforeNestingBit) || (int(CostEval) < int(CostNoEval)) }; typedef typename conditional::type>::type type; }; template EIGEN_DEVICE_FUNC inline T* const_cast_ptr(const T* ptr) { return const_cast(ptr); } template::XprKind> struct dense_xpr_base { /* dense_xpr_base should only ever be used on dense expressions, thus falling either into the MatrixXpr or into the ArrayXpr cases */ }; template struct dense_xpr_base { typedef MatrixBase type; }; template struct dense_xpr_base { typedef ArrayBase type; }; template::XprKind, typename StorageKind = typename traits::StorageKind> struct generic_xpr_base; template struct generic_xpr_base { typedef typename dense_xpr_base::type type; }; template struct cast_return_type { typedef typename XprType::Scalar CurrentScalarType; typedef typename remove_all::type _CastType; typedef typename _CastType::Scalar NewScalarType; typedef typename conditional::value, const XprType&,CastType>::type type; }; template struct promote_storage_type; template struct promote_storage_type { typedef A ret; }; template struct promote_storage_type { typedef A ret; }; template struct promote_storage_type { typedef A ret; }; /** \internal Specify the "storage kind" of applying a coefficient-wise * binary operations between two expressions of kinds A and B respectively. * The template parameter Functor permits to specialize the resulting storage kind wrt to * the functor. * The default rules are as follows: * \code * A op A -> A * A op dense -> dense * dense op B -> dense * sparse op dense -> sparse * dense op sparse -> sparse * \endcode */ template struct cwise_promote_storage_type; template struct cwise_promote_storage_type { typedef A ret; }; template struct cwise_promote_storage_type { typedef Dense ret; }; template struct cwise_promote_storage_type { typedef Dense ret; }; template struct cwise_promote_storage_type { typedef Dense ret; }; template struct cwise_promote_storage_type { typedef Sparse ret; }; template struct cwise_promote_storage_type { typedef Sparse ret; }; template struct cwise_promote_storage_order { enum { value = LhsOrder }; }; template struct cwise_promote_storage_order { enum { value = RhsOrder }; }; template struct cwise_promote_storage_order { enum { value = LhsOrder }; }; template struct cwise_promote_storage_order { enum { value = Order }; }; /** \internal Specify the "storage kind" of multiplying an expression of kind A with kind B. * The template parameter ProductTag permits to specialize the resulting storage kind wrt to * some compile-time properties of the product: GemmProduct, GemvProduct, OuterProduct, InnerProduct. * The default rules are as follows: * \code * K * K -> K * dense * K -> dense * K * dense -> dense * diag * K -> K * K * diag -> K * Perm * K -> K * K * Perm -> K * \endcode */ template struct product_promote_storage_type; template struct product_promote_storage_type { typedef A ret;}; template struct product_promote_storage_type { typedef Dense ret;}; template struct product_promote_storage_type { typedef Dense ret; }; template struct product_promote_storage_type { typedef Dense ret; }; template struct product_promote_storage_type { typedef A ret; }; template struct product_promote_storage_type { typedef B ret; }; template struct product_promote_storage_type { typedef Dense ret; }; template struct product_promote_storage_type { typedef Dense ret; }; template struct product_promote_storage_type { typedef A ret; }; template struct product_promote_storage_type { typedef B ret; }; template struct product_promote_storage_type { typedef Dense ret; }; template struct product_promote_storage_type { typedef Dense ret; }; /** \internal gives the plain matrix or array type to store a row/column/diagonal of a matrix type. * \tparam Scalar optional parameter allowing to pass a different scalar type than the one of the MatrixType. */ template struct plain_row_type { typedef Matrix MatrixRowType; typedef Array ArrayRowType; typedef typename conditional< is_same< typename traits::XprKind, MatrixXpr >::value, MatrixRowType, ArrayRowType >::type type; }; template struct plain_col_type { typedef Matrix MatrixColType; typedef Array ArrayColType; typedef typename conditional< is_same< typename traits::XprKind, MatrixXpr >::value, MatrixColType, ArrayColType >::type type; }; template struct plain_diag_type { enum { diag_size = EIGEN_SIZE_MIN_PREFER_DYNAMIC(ExpressionType::RowsAtCompileTime, ExpressionType::ColsAtCompileTime), max_diag_size = EIGEN_SIZE_MIN_PREFER_FIXED(ExpressionType::MaxRowsAtCompileTime, ExpressionType::MaxColsAtCompileTime) }; typedef Matrix MatrixDiagType; typedef Array ArrayDiagType; typedef typename conditional< is_same< typename traits::XprKind, MatrixXpr >::value, MatrixDiagType, ArrayDiagType >::type type; }; template struct plain_constant_type { enum { Options = (traits::Flags&RowMajorBit)?RowMajor:0 }; typedef Array::RowsAtCompileTime, traits::ColsAtCompileTime, Options, traits::MaxRowsAtCompileTime,traits::MaxColsAtCompileTime> array_type; typedef Matrix::RowsAtCompileTime, traits::ColsAtCompileTime, Options, traits::MaxRowsAtCompileTime,traits::MaxColsAtCompileTime> matrix_type; typedef CwiseNullaryOp, const typename conditional::XprKind, MatrixXpr >::value, matrix_type, array_type>::type > type; }; template struct is_lvalue { enum { value = (!bool(is_const::value)) && bool(traits::Flags & LvalueBit) }; }; template struct is_diagonal { enum { ret = false }; }; template struct is_diagonal > { enum { ret = true }; }; template struct is_diagonal > { enum { ret = true }; }; template struct is_diagonal > { enum { ret = true }; }; template struct is_identity { enum { value = false }; }; template struct is_identity, T> > { enum { value = true }; }; template struct glue_shapes; template<> struct glue_shapes { typedef TriangularShape type; }; template struct possibly_same_dense { enum { value = has_direct_access::ret && has_direct_access::ret && is_same::value }; }; template EIGEN_DEVICE_FUNC bool is_same_dense(const T1 &mat1, const T2 &mat2, typename enable_if::value>::type * = 0) { return (mat1.data()==mat2.data()) && (mat1.innerStride()==mat2.innerStride()) && (mat1.outerStride()==mat2.outerStride()); } template EIGEN_DEVICE_FUNC bool is_same_dense(const T1 &, const T2 &, typename enable_if::value>::type * = 0) { return false; } // Internal helper defining the cost of a scalar division for the type T. // The default heuristic can be specialized for each scalar type and architecture. template struct scalar_div_cost { enum { value = 8*NumTraits::MulCost }; }; template struct scalar_div_cost, Vectorized> { enum { value = 2*scalar_div_cost::value + 6*NumTraits::MulCost + 3*NumTraits::AddCost }; }; template struct scalar_div_cost::type> { enum { value = 24 }; }; template struct scalar_div_cost::type> { enum { value = 21 }; }; #ifdef EIGEN_DEBUG_ASSIGN std::string demangle_traversal(int t) { if(t==DefaultTraversal) return "DefaultTraversal"; if(t==LinearTraversal) return "LinearTraversal"; if(t==InnerVectorizedTraversal) return "InnerVectorizedTraversal"; if(t==LinearVectorizedTraversal) return "LinearVectorizedTraversal"; if(t==SliceVectorizedTraversal) return "SliceVectorizedTraversal"; return "?"; } std::string demangle_unrolling(int t) { if(t==NoUnrolling) return "NoUnrolling"; if(t==InnerUnrolling) return "InnerUnrolling"; if(t==CompleteUnrolling) return "CompleteUnrolling"; return "?"; } std::string demangle_flags(int f) { std::string res; if(f&RowMajorBit) res += " | RowMajor"; if(f&PacketAccessBit) res += " | Packet"; if(f&LinearAccessBit) res += " | Linear"; if(f&LvalueBit) res += " | Lvalue"; if(f&DirectAccessBit) res += " | Direct"; if(f&NestByRefBit) res += " | NestByRef"; if(f&NoPreferredStorageOrderBit) res += " | NoPreferredStorageOrderBit"; return res; } #endif } // end namespace internal /** \class ScalarBinaryOpTraits * \ingroup Core_Module * * \brief Determines whether the given binary operation of two numeric types is allowed and what the scalar return type is. * * This class permits to control the scalar return type of any binary operation performed on two different scalar types through (partial) template specializations. * * For instance, let \c U1, \c U2 and \c U3 be three user defined scalar types for which most operations between instances of \c U1 and \c U2 returns an \c U3. * You can let %Eigen knows that by defining: \code template struct ScalarBinaryOpTraits { typedef U3 ReturnType; }; template struct ScalarBinaryOpTraits { typedef U3 ReturnType; }; \endcode * You can then explicitly disable some particular operations to get more explicit error messages: \code template<> struct ScalarBinaryOpTraits > {}; \endcode * Or customize the return type for individual operation: \code template<> struct ScalarBinaryOpTraits > { typedef U1 ReturnType; }; \endcode * * By default, the following generic combinations are supported:
ScalarAScalarBBinaryOpReturnTypeNote
\c T \c T \c * \c T
\c NumTraits::Real \c T \c * \c T Only if \c NumTraits::IsComplex
\c T \c NumTraits::Real \c * \c T Only if \c NumTraits::IsComplex
* * \sa CwiseBinaryOp */ template > struct ScalarBinaryOpTraits #ifndef EIGEN_PARSED_BY_DOXYGEN // for backward compatibility, use the hints given by the (deprecated) internal::scalar_product_traits class. : internal::scalar_product_traits #endif // EIGEN_PARSED_BY_DOXYGEN {}; template struct ScalarBinaryOpTraits { typedef T ReturnType; }; template struct ScalarBinaryOpTraits::IsComplex,T>::type>::Real, BinaryOp> { typedef T ReturnType; }; template struct ScalarBinaryOpTraits::IsComplex,T>::type>::Real, T, BinaryOp> { typedef T ReturnType; }; // For Matrix * Permutation template struct ScalarBinaryOpTraits { typedef T ReturnType; }; // For Permutation * Matrix template struct ScalarBinaryOpTraits { typedef T ReturnType; }; // for Permutation*Permutation template struct ScalarBinaryOpTraits { typedef void ReturnType; }; // We require Lhs and Rhs to have "compatible" scalar types. // It is tempting to always allow mixing different types but remember that this is often impossible in the vectorized paths. // So allowing mixing different types gives very unexpected errors when enabling vectorization, when the user tries to // add together a float matrix and a double matrix. #define EIGEN_CHECK_BINARY_COMPATIBILIY(BINOP,LHS,RHS) \ EIGEN_STATIC_ASSERT((Eigen::internal::has_ReturnType >::value), \ YOU_MIXED_DIFFERENT_NUMERIC_TYPES__YOU_NEED_TO_USE_THE_CAST_METHOD_OF_MATRIXBASE_TO_CAST_NUMERIC_TYPES_EXPLICITLY) } // end namespace Eigen #endif // EIGEN_XPRHELPER_H RcppEigen/inst/include/Eigen/src/Core/util/ReshapedHelper.h0000644000176200001440000000263014562417221023262 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2017 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_RESHAPED_HELPER_H #define EIGEN_RESHAPED_HELPER_H namespace Eigen { enum AutoSize_t { AutoSize }; const int AutoOrder = 2; namespace internal { template struct get_compiletime_reshape_size { enum { value = get_fixed_value::value }; }; template Index get_runtime_reshape_size(SizeType size, Index /*other*/, Index /*total*/) { return internal::get_runtime_value(size); } template struct get_compiletime_reshape_size { enum { other_size = get_fixed_value::value, value = (TotalSize==Dynamic || other_size==Dynamic) ? Dynamic : TotalSize / other_size }; }; inline Index get_runtime_reshape_size(AutoSize_t /*size*/, Index other, Index total) { return total/other; } template struct get_compiletime_reshape_order { enum { value = Order == AutoOrder ? Flags & RowMajorBit : Order }; }; } } // end namespace Eigen #endif // EIGEN_RESHAPED_HELPER_H RcppEigen/inst/include/Eigen/src/Core/util/Memory.h0000644000176200001440000013310514562417221021641 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2015 Gael Guennebaud // Copyright (C) 2008-2009 Benoit Jacob // Copyright (C) 2009 Kenneth Riddile // Copyright (C) 2010 Hauke Heibel // Copyright (C) 2010 Thomas Capricelli // Copyright (C) 2013 Pavel Holoborodko // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. /***************************************************************************** *** Platform checks for aligned malloc functions *** *****************************************************************************/ #ifndef EIGEN_MEMORY_H #define EIGEN_MEMORY_H #ifndef EIGEN_MALLOC_ALREADY_ALIGNED // Try to determine automatically if malloc is already aligned. // On 64-bit systems, glibc's malloc returns 16-byte-aligned pointers, see: // http://www.gnu.org/s/libc/manual/html_node/Aligned-Memory-Blocks.html // This is true at least since glibc 2.8. // This leaves the question how to detect 64-bit. According to this document, // http://gcc.fyxm.net/summit/2003/Porting%20to%2064%20bit.pdf // page 114, "[The] LP64 model [...] is used by all 64-bit UNIX ports" so it's indeed // quite safe, at least within the context of glibc, to equate 64-bit with LP64. #if defined(__GLIBC__) && ((__GLIBC__>=2 && __GLIBC_MINOR__ >= 8) || __GLIBC__>2) \ && defined(__LP64__) && ! defined( __SANITIZE_ADDRESS__ ) && (EIGEN_DEFAULT_ALIGN_BYTES == 16) #define EIGEN_GLIBC_MALLOC_ALREADY_ALIGNED 1 #else #define EIGEN_GLIBC_MALLOC_ALREADY_ALIGNED 0 #endif // FreeBSD 6 seems to have 16-byte aligned malloc // See http://svn.freebsd.org/viewvc/base/stable/6/lib/libc/stdlib/malloc.c?view=markup // FreeBSD 7 seems to have 16-byte aligned malloc except on ARM and MIPS architectures // See http://svn.freebsd.org/viewvc/base/stable/7/lib/libc/stdlib/malloc.c?view=markup #if defined(__FreeBSD__) && !(EIGEN_ARCH_ARM || EIGEN_ARCH_MIPS) && (EIGEN_DEFAULT_ALIGN_BYTES == 16) #define EIGEN_FREEBSD_MALLOC_ALREADY_ALIGNED 1 #else #define EIGEN_FREEBSD_MALLOC_ALREADY_ALIGNED 0 #endif #if (EIGEN_OS_MAC && (EIGEN_DEFAULT_ALIGN_BYTES == 16)) \ || (EIGEN_OS_WIN64 && (EIGEN_DEFAULT_ALIGN_BYTES == 16)) \ || EIGEN_GLIBC_MALLOC_ALREADY_ALIGNED \ || EIGEN_FREEBSD_MALLOC_ALREADY_ALIGNED #define EIGEN_MALLOC_ALREADY_ALIGNED 1 #else #define EIGEN_MALLOC_ALREADY_ALIGNED 0 #endif #endif namespace Eigen { namespace internal { EIGEN_DEVICE_FUNC inline void throw_std_bad_alloc() { #ifdef EIGEN_EXCEPTIONS throw std::bad_alloc(); #else std::size_t huge = static_cast(-1); #if defined(EIGEN_HIPCC) // // calls to "::operator new" are to be treated as opaque function calls (i.e no inlining), // and as a consequence the code in the #else block triggers the hipcc warning : // "no overloaded function has restriction specifiers that are compatible with the ambient context" // // "throw_std_bad_alloc" has the EIGEN_DEVICE_FUNC attribute, so it seems that hipcc expects // the same on "operator new" // Reverting code back to the old version in this #if block for the hipcc compiler // new int[huge]; #else void* unused = ::operator new(huge); EIGEN_UNUSED_VARIABLE(unused); #endif #endif } /***************************************************************************** *** Implementation of handmade aligned functions *** *****************************************************************************/ /* ----- Hand made implementations of aligned malloc/free and realloc ----- */ /** \internal Like malloc, but the returned pointer is guaranteed to be 16-byte aligned. * Fast, but wastes 16 additional bytes of memory. Does not throw any exception. */ EIGEN_DEVICE_FUNC inline void* handmade_aligned_malloc(std::size_t size, std::size_t alignment = EIGEN_DEFAULT_ALIGN_BYTES) { eigen_assert(alignment >= sizeof(void*) && (alignment & (alignment-1)) == 0 && "Alignment must be at least sizeof(void*) and a power of 2"); EIGEN_USING_STD(malloc) void *original = malloc(size+alignment); if (original == 0) return 0; void *aligned = reinterpret_cast((reinterpret_cast(original) & ~(std::size_t(alignment-1))) + alignment); *(reinterpret_cast(aligned) - 1) = original; return aligned; } /** \internal Frees memory allocated with handmade_aligned_malloc */ EIGEN_DEVICE_FUNC inline void handmade_aligned_free(void *ptr) { if (ptr) { EIGEN_USING_STD(free) free(*(reinterpret_cast(ptr) - 1)); } } /** \internal * \brief Reallocates aligned memory. * Since we know that our handmade version is based on std::malloc * we can use std::realloc to implement efficient reallocation. */ inline void* handmade_aligned_realloc(void* ptr, std::size_t size, std::size_t = 0) { if (ptr == 0) return handmade_aligned_malloc(size); void *original = *(reinterpret_cast(ptr) - 1); std::ptrdiff_t previous_offset = static_cast(ptr)-static_cast(original); original = std::realloc(original,size+EIGEN_DEFAULT_ALIGN_BYTES); if (original == 0) return 0; void *aligned = reinterpret_cast((reinterpret_cast(original) & ~(std::size_t(EIGEN_DEFAULT_ALIGN_BYTES-1))) + EIGEN_DEFAULT_ALIGN_BYTES); void *previous_aligned = static_cast(original)+previous_offset; if(aligned!=previous_aligned) std::memmove(aligned, previous_aligned, size); *(reinterpret_cast(aligned) - 1) = original; return aligned; } /***************************************************************************** *** Implementation of portable aligned versions of malloc/free/realloc *** *****************************************************************************/ #ifdef EIGEN_NO_MALLOC EIGEN_DEVICE_FUNC inline void check_that_malloc_is_allowed() { eigen_assert(false && "heap allocation is forbidden (EIGEN_NO_MALLOC is defined)"); } #elif defined EIGEN_RUNTIME_NO_MALLOC EIGEN_DEVICE_FUNC inline bool is_malloc_allowed_impl(bool update, bool new_value = false) { static bool value = true; if (update == 1) value = new_value; return value; } EIGEN_DEVICE_FUNC inline bool is_malloc_allowed() { return is_malloc_allowed_impl(false); } EIGEN_DEVICE_FUNC inline bool set_is_malloc_allowed(bool new_value) { return is_malloc_allowed_impl(true, new_value); } EIGEN_DEVICE_FUNC inline void check_that_malloc_is_allowed() { eigen_assert(is_malloc_allowed() && "heap allocation is forbidden (EIGEN_RUNTIME_NO_MALLOC is defined and g_is_malloc_allowed is false)"); } #else EIGEN_DEVICE_FUNC inline void check_that_malloc_is_allowed() {} #endif /** \internal Allocates \a size bytes. The returned pointer is guaranteed to have 16 or 32 bytes alignment depending on the requirements. * On allocation error, the returned pointer is null, and std::bad_alloc is thrown. */ EIGEN_DEVICE_FUNC inline void* aligned_malloc(std::size_t size) { check_that_malloc_is_allowed(); void *result; #if (EIGEN_DEFAULT_ALIGN_BYTES==0) || EIGEN_MALLOC_ALREADY_ALIGNED EIGEN_USING_STD(malloc) result = malloc(size); #if EIGEN_DEFAULT_ALIGN_BYTES==16 eigen_assert((size<16 || (std::size_t(result)%16)==0) && "System's malloc returned an unaligned pointer. Compile with EIGEN_MALLOC_ALREADY_ALIGNED=0 to fallback to handmade aligned memory allocator."); #endif #else result = handmade_aligned_malloc(size); #endif if(!result && size) throw_std_bad_alloc(); return result; } /** \internal Frees memory allocated with aligned_malloc. */ EIGEN_DEVICE_FUNC inline void aligned_free(void *ptr) { #if (EIGEN_DEFAULT_ALIGN_BYTES==0) || EIGEN_MALLOC_ALREADY_ALIGNED EIGEN_USING_STD(free) free(ptr); #else handmade_aligned_free(ptr); #endif } /** * \internal * \brief Reallocates an aligned block of memory. * \throws std::bad_alloc on allocation failure */ inline void* aligned_realloc(void *ptr, std::size_t new_size, std::size_t old_size) { EIGEN_UNUSED_VARIABLE(old_size) void *result; #if (EIGEN_DEFAULT_ALIGN_BYTES==0) || EIGEN_MALLOC_ALREADY_ALIGNED result = std::realloc(ptr,new_size); #else result = handmade_aligned_realloc(ptr,new_size,old_size); #endif if (!result && new_size) throw_std_bad_alloc(); return result; } /***************************************************************************** *** Implementation of conditionally aligned functions *** *****************************************************************************/ /** \internal Allocates \a size bytes. If Align is true, then the returned ptr is 16-byte-aligned. * On allocation error, the returned pointer is null, and a std::bad_alloc is thrown. */ template EIGEN_DEVICE_FUNC inline void* conditional_aligned_malloc(std::size_t size) { return aligned_malloc(size); } template<> EIGEN_DEVICE_FUNC inline void* conditional_aligned_malloc(std::size_t size) { check_that_malloc_is_allowed(); EIGEN_USING_STD(malloc) void *result = malloc(size); if(!result && size) throw_std_bad_alloc(); return result; } /** \internal Frees memory allocated with conditional_aligned_malloc */ template EIGEN_DEVICE_FUNC inline void conditional_aligned_free(void *ptr) { aligned_free(ptr); } template<> EIGEN_DEVICE_FUNC inline void conditional_aligned_free(void *ptr) { EIGEN_USING_STD(free) free(ptr); } template inline void* conditional_aligned_realloc(void* ptr, std::size_t new_size, std::size_t old_size) { return aligned_realloc(ptr, new_size, old_size); } template<> inline void* conditional_aligned_realloc(void* ptr, std::size_t new_size, std::size_t) { return std::realloc(ptr, new_size); } /***************************************************************************** *** Construction/destruction of array elements *** *****************************************************************************/ /** \internal Destructs the elements of an array. * The \a size parameters tells on how many objects to call the destructor of T. */ template EIGEN_DEVICE_FUNC inline void destruct_elements_of_array(T *ptr, std::size_t size) { // always destruct an array starting from the end. if(ptr) while(size) ptr[--size].~T(); } /** \internal Constructs the elements of an array. * The \a size parameter tells on how many objects to call the constructor of T. */ template EIGEN_DEVICE_FUNC inline T* construct_elements_of_array(T *ptr, std::size_t size) { std::size_t i; EIGEN_TRY { for (i = 0; i < size; ++i) ::new (ptr + i) T; return ptr; } EIGEN_CATCH(...) { destruct_elements_of_array(ptr, i); EIGEN_THROW; } return NULL; } /***************************************************************************** *** Implementation of aligned new/delete-like functions *** *****************************************************************************/ template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void check_size_for_overflow(std::size_t size) { if(size > std::size_t(-1) / sizeof(T)) throw_std_bad_alloc(); } /** \internal Allocates \a size objects of type T. The returned pointer is guaranteed to have 16 bytes alignment. * On allocation error, the returned pointer is undefined, but a std::bad_alloc is thrown. * The default constructor of T is called. */ template EIGEN_DEVICE_FUNC inline T* aligned_new(std::size_t size) { check_size_for_overflow(size); T *result = reinterpret_cast(aligned_malloc(sizeof(T)*size)); EIGEN_TRY { return construct_elements_of_array(result, size); } EIGEN_CATCH(...) { aligned_free(result); EIGEN_THROW; } return result; } template EIGEN_DEVICE_FUNC inline T* conditional_aligned_new(std::size_t size) { check_size_for_overflow(size); T *result = reinterpret_cast(conditional_aligned_malloc(sizeof(T)*size)); EIGEN_TRY { return construct_elements_of_array(result, size); } EIGEN_CATCH(...) { conditional_aligned_free(result); EIGEN_THROW; } return result; } /** \internal Deletes objects constructed with aligned_new * The \a size parameters tells on how many objects to call the destructor of T. */ template EIGEN_DEVICE_FUNC inline void aligned_delete(T *ptr, std::size_t size) { destruct_elements_of_array(ptr, size); Eigen::internal::aligned_free(ptr); } /** \internal Deletes objects constructed with conditional_aligned_new * The \a size parameters tells on how many objects to call the destructor of T. */ template EIGEN_DEVICE_FUNC inline void conditional_aligned_delete(T *ptr, std::size_t size) { destruct_elements_of_array(ptr, size); conditional_aligned_free(ptr); } template EIGEN_DEVICE_FUNC inline T* conditional_aligned_realloc_new(T* pts, std::size_t new_size, std::size_t old_size) { check_size_for_overflow(new_size); check_size_for_overflow(old_size); if(new_size < old_size) destruct_elements_of_array(pts+new_size, old_size-new_size); T *result = reinterpret_cast(conditional_aligned_realloc(reinterpret_cast(pts), sizeof(T)*new_size, sizeof(T)*old_size)); if(new_size > old_size) { EIGEN_TRY { construct_elements_of_array(result+old_size, new_size-old_size); } EIGEN_CATCH(...) { conditional_aligned_free(result); EIGEN_THROW; } } return result; } template EIGEN_DEVICE_FUNC inline T* conditional_aligned_new_auto(std::size_t size) { if(size==0) return 0; // short-cut. Also fixes Bug 884 check_size_for_overflow(size); T *result = reinterpret_cast(conditional_aligned_malloc(sizeof(T)*size)); if(NumTraits::RequireInitialization) { EIGEN_TRY { construct_elements_of_array(result, size); } EIGEN_CATCH(...) { conditional_aligned_free(result); EIGEN_THROW; } } return result; } template inline T* conditional_aligned_realloc_new_auto(T* pts, std::size_t new_size, std::size_t old_size) { check_size_for_overflow(new_size); check_size_for_overflow(old_size); if(NumTraits::RequireInitialization && (new_size < old_size)) destruct_elements_of_array(pts+new_size, old_size-new_size); T *result = reinterpret_cast(conditional_aligned_realloc(reinterpret_cast(pts), sizeof(T)*new_size, sizeof(T)*old_size)); if(NumTraits::RequireInitialization && (new_size > old_size)) { EIGEN_TRY { construct_elements_of_array(result+old_size, new_size-old_size); } EIGEN_CATCH(...) { conditional_aligned_free(result); EIGEN_THROW; } } return result; } template EIGEN_DEVICE_FUNC inline void conditional_aligned_delete_auto(T *ptr, std::size_t size) { if(NumTraits::RequireInitialization) destruct_elements_of_array(ptr, size); conditional_aligned_free(ptr); } /****************************************************************************/ /** \internal Returns the index of the first element of the array that is well aligned with respect to the requested \a Alignment. * * \tparam Alignment requested alignment in Bytes. * \param array the address of the start of the array * \param size the size of the array * * \note If no element of the array is well aligned or the requested alignment is not a multiple of a scalar, * the size of the array is returned. For example with SSE, the requested alignment is typically 16-bytes. If * packet size for the given scalar type is 1, then everything is considered well-aligned. * * \note Otherwise, if the Alignment is larger that the scalar size, we rely on the assumptions that sizeof(Scalar) is a * power of 2. On the other hand, we do not assume that the array address is a multiple of sizeof(Scalar), as that fails for * example with Scalar=double on certain 32-bit platforms, see bug #79. * * There is also the variant first_aligned(const MatrixBase&) defined in DenseCoeffsBase.h. * \sa first_default_aligned() */ template EIGEN_DEVICE_FUNC inline Index first_aligned(const Scalar* array, Index size) { const Index ScalarSize = sizeof(Scalar); const Index AlignmentSize = Alignment / ScalarSize; const Index AlignmentMask = AlignmentSize-1; if(AlignmentSize<=1) { // Either the requested alignment if smaller than a scalar, or it exactly match a 1 scalar // so that all elements of the array have the same alignment. return 0; } else if( (UIntPtr(array) & (sizeof(Scalar)-1)) || (Alignment%ScalarSize)!=0) { // The array is not aligned to the size of a single scalar, or the requested alignment is not a multiple of the scalar size. // Consequently, no element of the array is well aligned. return size; } else { Index first = (AlignmentSize - (Index((UIntPtr(array)/sizeof(Scalar))) & AlignmentMask)) & AlignmentMask; return (first < size) ? first : size; } } /** \internal Returns the index of the first element of the array that is well aligned with respect the largest packet requirement. * \sa first_aligned(Scalar*,Index) and first_default_aligned(DenseBase) */ template EIGEN_DEVICE_FUNC inline Index first_default_aligned(const Scalar* array, Index size) { typedef typename packet_traits::type DefaultPacketType; return first_aligned::alignment>(array, size); } /** \internal Returns the smallest integer multiple of \a base and greater or equal to \a size */ template inline Index first_multiple(Index size, Index base) { return ((size+base-1)/base)*base; } // std::copy is much slower than memcpy, so let's introduce a smart_copy which // use memcpy on trivial types, i.e., on types that does not require an initialization ctor. template struct smart_copy_helper; template EIGEN_DEVICE_FUNC void smart_copy(const T* start, const T* end, T* target) { smart_copy_helper::RequireInitialization>::run(start, end, target); } template struct smart_copy_helper { EIGEN_DEVICE_FUNC static inline void run(const T* start, const T* end, T* target) { IntPtr size = IntPtr(end)-IntPtr(start); if(size==0) return; eigen_internal_assert(start!=0 && end!=0 && target!=0); EIGEN_USING_STD(memcpy) memcpy(target, start, size); } }; template struct smart_copy_helper { EIGEN_DEVICE_FUNC static inline void run(const T* start, const T* end, T* target) { std::copy(start, end, target); } }; // intelligent memmove. falls back to std::memmove for POD types, uses std::copy otherwise. template struct smart_memmove_helper; template void smart_memmove(const T* start, const T* end, T* target) { smart_memmove_helper::RequireInitialization>::run(start, end, target); } template struct smart_memmove_helper { static inline void run(const T* start, const T* end, T* target) { IntPtr size = IntPtr(end)-IntPtr(start); if(size==0) return; eigen_internal_assert(start!=0 && end!=0 && target!=0); std::memmove(target, start, size); } }; template struct smart_memmove_helper { static inline void run(const T* start, const T* end, T* target) { if (UIntPtr(target) < UIntPtr(start)) { std::copy(start, end, target); } else { std::ptrdiff_t count = (std::ptrdiff_t(end)-std::ptrdiff_t(start)) / sizeof(T); std::copy_backward(start, end, target + count); } } }; #if EIGEN_HAS_RVALUE_REFERENCES template EIGEN_DEVICE_FUNC T* smart_move(T* start, T* end, T* target) { return std::move(start, end, target); } #else template EIGEN_DEVICE_FUNC T* smart_move(T* start, T* end, T* target) { return std::copy(start, end, target); } #endif /***************************************************************************** *** Implementation of runtime stack allocation (falling back to malloc) *** *****************************************************************************/ // you can overwrite Eigen's default behavior regarding alloca by defining EIGEN_ALLOCA // to the appropriate stack allocation function #if ! defined EIGEN_ALLOCA && ! defined EIGEN_GPU_COMPILE_PHASE #if EIGEN_OS_LINUX || EIGEN_OS_MAC || (defined alloca) #define EIGEN_ALLOCA alloca #elif EIGEN_COMP_MSVC #define EIGEN_ALLOCA _alloca #endif #endif // With clang -Oz -mthumb, alloca changes the stack pointer in a way that is // not allowed in Thumb2. -DEIGEN_STACK_ALLOCATION_LIMIT=0 doesn't work because // the compiler still emits bad code because stack allocation checks use "<=". // TODO: Eliminate after https://bugs.llvm.org/show_bug.cgi?id=23772 // is fixed. #if defined(__clang__) && defined(__thumb__) #undef EIGEN_ALLOCA #endif // This helper class construct the allocated memory, and takes care of destructing and freeing the handled data // at destruction time. In practice this helper class is mainly useful to avoid memory leak in case of exceptions. template class aligned_stack_memory_handler : noncopyable { public: /* Creates a stack_memory_handler responsible for the buffer \a ptr of size \a size. * Note that \a ptr can be 0 regardless of the other parameters. * This constructor takes care of constructing/initializing the elements of the buffer if required by the scalar type T (see NumTraits::RequireInitialization). * In this case, the buffer elements will also be destructed when this handler will be destructed. * Finally, if \a dealloc is true, then the pointer \a ptr is freed. **/ EIGEN_DEVICE_FUNC aligned_stack_memory_handler(T* ptr, std::size_t size, bool dealloc) : m_ptr(ptr), m_size(size), m_deallocate(dealloc) { if(NumTraits::RequireInitialization && m_ptr) Eigen::internal::construct_elements_of_array(m_ptr, size); } EIGEN_DEVICE_FUNC ~aligned_stack_memory_handler() { if(NumTraits::RequireInitialization && m_ptr) Eigen::internal::destruct_elements_of_array(m_ptr, m_size); if(m_deallocate) Eigen::internal::aligned_free(m_ptr); } protected: T* m_ptr; std::size_t m_size; bool m_deallocate; }; #ifdef EIGEN_ALLOCA template::Evaluate && Xpr::MaxSizeAtCompileTime==Dynamic > struct local_nested_eval_wrapper { static const bool NeedExternalBuffer = false; typedef typename Xpr::Scalar Scalar; typedef typename nested_eval::type ObjectType; ObjectType object; EIGEN_DEVICE_FUNC local_nested_eval_wrapper(const Xpr& xpr, Scalar* ptr) : object(xpr) { EIGEN_UNUSED_VARIABLE(ptr); eigen_internal_assert(ptr==0); } }; template struct local_nested_eval_wrapper { static const bool NeedExternalBuffer = true; typedef typename Xpr::Scalar Scalar; typedef typename plain_object_eval::type PlainObject; typedef Map ObjectType; ObjectType object; EIGEN_DEVICE_FUNC local_nested_eval_wrapper(const Xpr& xpr, Scalar* ptr) : object(ptr==0 ? reinterpret_cast(Eigen::internal::aligned_malloc(sizeof(Scalar)*xpr.size())) : ptr, xpr.rows(), xpr.cols()), m_deallocate(ptr==0) { if(NumTraits::RequireInitialization && object.data()) Eigen::internal::construct_elements_of_array(object.data(), object.size()); object = xpr; } EIGEN_DEVICE_FUNC ~local_nested_eval_wrapper() { if(NumTraits::RequireInitialization && object.data()) Eigen::internal::destruct_elements_of_array(object.data(), object.size()); if(m_deallocate) Eigen::internal::aligned_free(object.data()); } private: bool m_deallocate; }; #endif // EIGEN_ALLOCA template class scoped_array : noncopyable { T* m_ptr; public: explicit scoped_array(std::ptrdiff_t size) { m_ptr = new T[size]; } ~scoped_array() { delete[] m_ptr; } T& operator[](std::ptrdiff_t i) { return m_ptr[i]; } const T& operator[](std::ptrdiff_t i) const { return m_ptr[i]; } T* &ptr() { return m_ptr; } const T* ptr() const { return m_ptr; } operator const T*() const { return m_ptr; } }; template void swap(scoped_array &a,scoped_array &b) { std::swap(a.ptr(),b.ptr()); } } // end namespace internal /** \internal * * The macro ei_declare_aligned_stack_constructed_variable(TYPE,NAME,SIZE,BUFFER) declares, allocates, * and construct an aligned buffer named NAME of SIZE elements of type TYPE on the stack * if the size in bytes is smaller than EIGEN_STACK_ALLOCATION_LIMIT, and if stack allocation is supported by the platform * (currently, this is Linux, OSX and Visual Studio only). Otherwise the memory is allocated on the heap. * The allocated buffer is automatically deleted when exiting the scope of this declaration. * If BUFFER is non null, then the declared variable is simply an alias for BUFFER, and no allocation/deletion occurs. * Here is an example: * \code * { * ei_declare_aligned_stack_constructed_variable(float,data,size,0); * // use data[0] to data[size-1] * } * \endcode * The underlying stack allocation function can controlled with the EIGEN_ALLOCA preprocessor token. * * The macro ei_declare_local_nested_eval(XPR_T,XPR,N,NAME) is analogue to * \code * typename internal::nested_eval::type NAME(XPR); * \endcode * with the advantage of using aligned stack allocation even if the maximal size of XPR at compile time is unknown. * This is accomplished through alloca if this later is supported and if the required number of bytes * is below EIGEN_STACK_ALLOCATION_LIMIT. */ #ifdef EIGEN_ALLOCA #if EIGEN_DEFAULT_ALIGN_BYTES>0 // We always manually re-align the result of EIGEN_ALLOCA. // If alloca is already aligned, the compiler should be smart enough to optimize away the re-alignment. #define EIGEN_ALIGNED_ALLOCA(SIZE) reinterpret_cast((internal::UIntPtr(EIGEN_ALLOCA(SIZE+EIGEN_DEFAULT_ALIGN_BYTES-1)) + EIGEN_DEFAULT_ALIGN_BYTES-1) & ~(std::size_t(EIGEN_DEFAULT_ALIGN_BYTES-1))) #else #define EIGEN_ALIGNED_ALLOCA(SIZE) EIGEN_ALLOCA(SIZE) #endif #define ei_declare_aligned_stack_constructed_variable(TYPE,NAME,SIZE,BUFFER) \ Eigen::internal::check_size_for_overflow(SIZE); \ TYPE* NAME = (BUFFER)!=0 ? (BUFFER) \ : reinterpret_cast( \ (sizeof(TYPE)*SIZE<=EIGEN_STACK_ALLOCATION_LIMIT) ? EIGEN_ALIGNED_ALLOCA(sizeof(TYPE)*SIZE) \ : Eigen::internal::aligned_malloc(sizeof(TYPE)*SIZE) ); \ Eigen::internal::aligned_stack_memory_handler EIGEN_CAT(NAME,_stack_memory_destructor)((BUFFER)==0 ? NAME : 0,SIZE,sizeof(TYPE)*SIZE>EIGEN_STACK_ALLOCATION_LIMIT) #define ei_declare_local_nested_eval(XPR_T,XPR,N,NAME) \ Eigen::internal::local_nested_eval_wrapper EIGEN_CAT(NAME,_wrapper)(XPR, reinterpret_cast( \ ( (Eigen::internal::local_nested_eval_wrapper::NeedExternalBuffer) && ((sizeof(typename XPR_T::Scalar)*XPR.size())<=EIGEN_STACK_ALLOCATION_LIMIT) ) \ ? EIGEN_ALIGNED_ALLOCA( sizeof(typename XPR_T::Scalar)*XPR.size() ) : 0 ) ) ; \ typename Eigen::internal::local_nested_eval_wrapper::ObjectType NAME(EIGEN_CAT(NAME,_wrapper).object) #else #define ei_declare_aligned_stack_constructed_variable(TYPE,NAME,SIZE,BUFFER) \ Eigen::internal::check_size_for_overflow(SIZE); \ TYPE* NAME = (BUFFER)!=0 ? BUFFER : reinterpret_cast(Eigen::internal::aligned_malloc(sizeof(TYPE)*SIZE)); \ Eigen::internal::aligned_stack_memory_handler EIGEN_CAT(NAME,_stack_memory_destructor)((BUFFER)==0 ? NAME : 0,SIZE,true) #define ei_declare_local_nested_eval(XPR_T,XPR,N,NAME) typename Eigen::internal::nested_eval::type NAME(XPR) #endif /***************************************************************************** *** Implementation of EIGEN_MAKE_ALIGNED_OPERATOR_NEW [_IF] *** *****************************************************************************/ #if EIGEN_HAS_CXX17_OVERALIGN // C++17 -> no need to bother about alignment anymore :) #define EIGEN_MAKE_ALIGNED_OPERATOR_NEW_NOTHROW(NeedsToAlign) #define EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF(NeedsToAlign) #define EIGEN_MAKE_ALIGNED_OPERATOR_NEW #define EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF_VECTORIZABLE_FIXED_SIZE(Scalar,Size) #else // HIP does not support new/delete on device. #if EIGEN_MAX_ALIGN_BYTES!=0 && !defined(EIGEN_HIP_DEVICE_COMPILE) #define EIGEN_MAKE_ALIGNED_OPERATOR_NEW_NOTHROW(NeedsToAlign) \ EIGEN_DEVICE_FUNC \ void* operator new(std::size_t size, const std::nothrow_t&) EIGEN_NO_THROW { \ EIGEN_TRY { return Eigen::internal::conditional_aligned_malloc(size); } \ EIGEN_CATCH (...) { return 0; } \ } #define EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF(NeedsToAlign) \ EIGEN_DEVICE_FUNC \ void *operator new(std::size_t size) { \ return Eigen::internal::conditional_aligned_malloc(size); \ } \ EIGEN_DEVICE_FUNC \ void *operator new[](std::size_t size) { \ return Eigen::internal::conditional_aligned_malloc(size); \ } \ EIGEN_DEVICE_FUNC \ void operator delete(void * ptr) EIGEN_NO_THROW { Eigen::internal::conditional_aligned_free(ptr); } \ EIGEN_DEVICE_FUNC \ void operator delete[](void * ptr) EIGEN_NO_THROW { Eigen::internal::conditional_aligned_free(ptr); } \ EIGEN_DEVICE_FUNC \ void operator delete(void * ptr, std::size_t /* sz */) EIGEN_NO_THROW { Eigen::internal::conditional_aligned_free(ptr); } \ EIGEN_DEVICE_FUNC \ void operator delete[](void * ptr, std::size_t /* sz */) EIGEN_NO_THROW { Eigen::internal::conditional_aligned_free(ptr); } \ /* in-place new and delete. since (at least afaik) there is no actual */ \ /* memory allocated we can safely let the default implementation handle */ \ /* this particular case. */ \ EIGEN_DEVICE_FUNC \ static void *operator new(std::size_t size, void *ptr) { return ::operator new(size,ptr); } \ EIGEN_DEVICE_FUNC \ static void *operator new[](std::size_t size, void* ptr) { return ::operator new[](size,ptr); } \ EIGEN_DEVICE_FUNC \ void operator delete(void * memory, void *ptr) EIGEN_NO_THROW { return ::operator delete(memory,ptr); } \ EIGEN_DEVICE_FUNC \ void operator delete[](void * memory, void *ptr) EIGEN_NO_THROW { return ::operator delete[](memory,ptr); } \ /* nothrow-new (returns zero instead of std::bad_alloc) */ \ EIGEN_MAKE_ALIGNED_OPERATOR_NEW_NOTHROW(NeedsToAlign) \ EIGEN_DEVICE_FUNC \ void operator delete(void *ptr, const std::nothrow_t&) EIGEN_NO_THROW { \ Eigen::internal::conditional_aligned_free(ptr); \ } \ typedef void eigen_aligned_operator_new_marker_type; #else #define EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF(NeedsToAlign) #endif #define EIGEN_MAKE_ALIGNED_OPERATOR_NEW EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF(true) #define EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF_VECTORIZABLE_FIXED_SIZE(Scalar,Size) \ EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF(bool( \ ((Size)!=Eigen::Dynamic) && \ (((EIGEN_MAX_ALIGN_BYTES>=16) && ((sizeof(Scalar)*(Size))%(EIGEN_MAX_ALIGN_BYTES )==0)) || \ ((EIGEN_MAX_ALIGN_BYTES>=32) && ((sizeof(Scalar)*(Size))%(EIGEN_MAX_ALIGN_BYTES/2)==0)) || \ ((EIGEN_MAX_ALIGN_BYTES>=64) && ((sizeof(Scalar)*(Size))%(EIGEN_MAX_ALIGN_BYTES/4)==0)) ))) #endif /****************************************************************************/ /** \class aligned_allocator * \ingroup Core_Module * * \brief STL compatible allocator to use with types requiring a non standrad alignment. * * The memory is aligned as for dynamically aligned matrix/array types such as MatrixXd. * By default, it will thus provide at least 16 bytes alignment and more in following cases: * - 32 bytes alignment if AVX is enabled. * - 64 bytes alignment if AVX512 is enabled. * * This can be controlled using the \c EIGEN_MAX_ALIGN_BYTES macro as documented * \link TopicPreprocessorDirectivesPerformance there \endlink. * * Example: * \code * // Matrix4f requires 16 bytes alignment: * std::map< int, Matrix4f, std::less, * aligned_allocator > > my_map_mat4; * // Vector3f does not require 16 bytes alignment, no need to use Eigen's allocator: * std::map< int, Vector3f > my_map_vec3; * \endcode * * \sa \blank \ref TopicStlContainers. */ template class aligned_allocator : public std::allocator { public: typedef std::size_t size_type; typedef std::ptrdiff_t difference_type; typedef T* pointer; typedef const T* const_pointer; typedef T& reference; typedef const T& const_reference; typedef T value_type; template struct rebind { typedef aligned_allocator other; }; aligned_allocator() : std::allocator() {} aligned_allocator(const aligned_allocator& other) : std::allocator(other) {} template aligned_allocator(const aligned_allocator& other) : std::allocator(other) {} ~aligned_allocator() {} #if EIGEN_COMP_GNUC_STRICT && EIGEN_GNUC_AT_LEAST(7,0) // In gcc std::allocator::max_size() is bugged making gcc triggers a warning: // eigen/Eigen/src/Core/util/Memory.h:189:12: warning: argument 1 value '18446744073709551612' exceeds maximum object size 9223372036854775807 // See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87544 size_type max_size() const { return (std::numeric_limits::max)()/sizeof(T); } #endif pointer allocate(size_type num, const void* /*hint*/ = 0) { internal::check_size_for_overflow(num); return static_cast( internal::aligned_malloc(num * sizeof(T)) ); } void deallocate(pointer p, size_type /*num*/) { internal::aligned_free(p); } }; //---------- Cache sizes ---------- #if !defined(EIGEN_NO_CPUID) # if EIGEN_COMP_GNUC && EIGEN_ARCH_i386_OR_x86_64 # if defined(__PIC__) && EIGEN_ARCH_i386 // Case for x86 with PIC # define EIGEN_CPUID(abcd,func,id) \ __asm__ __volatile__ ("xchgl %%ebx, %k1;cpuid; xchgl %%ebx,%k1": "=a" (abcd[0]), "=&r" (abcd[1]), "=c" (abcd[2]), "=d" (abcd[3]) : "a" (func), "c" (id)); # elif defined(__PIC__) && EIGEN_ARCH_x86_64 // Case for x64 with PIC. In theory this is only a problem with recent gcc and with medium or large code model, not with the default small code model. // However, we cannot detect which code model is used, and the xchg overhead is negligible anyway. # define EIGEN_CPUID(abcd,func,id) \ __asm__ __volatile__ ("xchg{q}\t{%%}rbx, %q1; cpuid; xchg{q}\t{%%}rbx, %q1": "=a" (abcd[0]), "=&r" (abcd[1]), "=c" (abcd[2]), "=d" (abcd[3]) : "0" (func), "2" (id)); # else // Case for x86_64 or x86 w/o PIC # define EIGEN_CPUID(abcd,func,id) \ __asm__ __volatile__ ("cpuid": "=a" (abcd[0]), "=b" (abcd[1]), "=c" (abcd[2]), "=d" (abcd[3]) : "0" (func), "2" (id) ); # endif # elif EIGEN_COMP_MSVC # if (EIGEN_COMP_MSVC > 1500) && EIGEN_ARCH_i386_OR_x86_64 # define EIGEN_CPUID(abcd,func,id) __cpuidex((int*)abcd,func,id) # endif # endif #endif namespace internal { #ifdef EIGEN_CPUID inline bool cpuid_is_vendor(int abcd[4], const int vendor[3]) { return abcd[1]==vendor[0] && abcd[3]==vendor[1] && abcd[2]==vendor[2]; } inline void queryCacheSizes_intel_direct(int& l1, int& l2, int& l3) { int abcd[4]; l1 = l2 = l3 = 0; int cache_id = 0; int cache_type = 0; do { abcd[0] = abcd[1] = abcd[2] = abcd[3] = 0; EIGEN_CPUID(abcd,0x4,cache_id); cache_type = (abcd[0] & 0x0F) >> 0; if(cache_type==1||cache_type==3) // data or unified cache { int cache_level = (abcd[0] & 0xE0) >> 5; // A[7:5] int ways = (abcd[1] & 0xFFC00000) >> 22; // B[31:22] int partitions = (abcd[1] & 0x003FF000) >> 12; // B[21:12] int line_size = (abcd[1] & 0x00000FFF) >> 0; // B[11:0] int sets = (abcd[2]); // C[31:0] int cache_size = (ways+1) * (partitions+1) * (line_size+1) * (sets+1); switch(cache_level) { case 1: l1 = cache_size; break; case 2: l2 = cache_size; break; case 3: l3 = cache_size; break; default: break; } } cache_id++; } while(cache_type>0 && cache_id<16); } inline void queryCacheSizes_intel_codes(int& l1, int& l2, int& l3) { int abcd[4]; abcd[0] = abcd[1] = abcd[2] = abcd[3] = 0; l1 = l2 = l3 = 0; EIGEN_CPUID(abcd,0x00000002,0); unsigned char * bytes = reinterpret_cast(abcd)+2; bool check_for_p2_core2 = false; for(int i=0; i<14; ++i) { switch(bytes[i]) { case 0x0A: l1 = 8; break; // 0Ah data L1 cache, 8 KB, 2 ways, 32 byte lines case 0x0C: l1 = 16; break; // 0Ch data L1 cache, 16 KB, 4 ways, 32 byte lines case 0x0E: l1 = 24; break; // 0Eh data L1 cache, 24 KB, 6 ways, 64 byte lines case 0x10: l1 = 16; break; // 10h data L1 cache, 16 KB, 4 ways, 32 byte lines (IA-64) case 0x15: l1 = 16; break; // 15h code L1 cache, 16 KB, 4 ways, 32 byte lines (IA-64) case 0x2C: l1 = 32; break; // 2Ch data L1 cache, 32 KB, 8 ways, 64 byte lines case 0x30: l1 = 32; break; // 30h code L1 cache, 32 KB, 8 ways, 64 byte lines case 0x60: l1 = 16; break; // 60h data L1 cache, 16 KB, 8 ways, 64 byte lines, sectored case 0x66: l1 = 8; break; // 66h data L1 cache, 8 KB, 4 ways, 64 byte lines, sectored case 0x67: l1 = 16; break; // 67h data L1 cache, 16 KB, 4 ways, 64 byte lines, sectored case 0x68: l1 = 32; break; // 68h data L1 cache, 32 KB, 4 ways, 64 byte lines, sectored case 0x1A: l2 = 96; break; // code and data L2 cache, 96 KB, 6 ways, 64 byte lines (IA-64) case 0x22: l3 = 512; break; // code and data L3 cache, 512 KB, 4 ways (!), 64 byte lines, dual-sectored case 0x23: l3 = 1024; break; // code and data L3 cache, 1024 KB, 8 ways, 64 byte lines, dual-sectored case 0x25: l3 = 2048; break; // code and data L3 cache, 2048 KB, 8 ways, 64 byte lines, dual-sectored case 0x29: l3 = 4096; break; // code and data L3 cache, 4096 KB, 8 ways, 64 byte lines, dual-sectored case 0x39: l2 = 128; break; // code and data L2 cache, 128 KB, 4 ways, 64 byte lines, sectored case 0x3A: l2 = 192; break; // code and data L2 cache, 192 KB, 6 ways, 64 byte lines, sectored case 0x3B: l2 = 128; break; // code and data L2 cache, 128 KB, 2 ways, 64 byte lines, sectored case 0x3C: l2 = 256; break; // code and data L2 cache, 256 KB, 4 ways, 64 byte lines, sectored case 0x3D: l2 = 384; break; // code and data L2 cache, 384 KB, 6 ways, 64 byte lines, sectored case 0x3E: l2 = 512; break; // code and data L2 cache, 512 KB, 4 ways, 64 byte lines, sectored case 0x40: l2 = 0; break; // no integrated L2 cache (P6 core) or L3 cache (P4 core) case 0x41: l2 = 128; break; // code and data L2 cache, 128 KB, 4 ways, 32 byte lines case 0x42: l2 = 256; break; // code and data L2 cache, 256 KB, 4 ways, 32 byte lines case 0x43: l2 = 512; break; // code and data L2 cache, 512 KB, 4 ways, 32 byte lines case 0x44: l2 = 1024; break; // code and data L2 cache, 1024 KB, 4 ways, 32 byte lines case 0x45: l2 = 2048; break; // code and data L2 cache, 2048 KB, 4 ways, 32 byte lines case 0x46: l3 = 4096; break; // code and data L3 cache, 4096 KB, 4 ways, 64 byte lines case 0x47: l3 = 8192; break; // code and data L3 cache, 8192 KB, 8 ways, 64 byte lines case 0x48: l2 = 3072; break; // code and data L2 cache, 3072 KB, 12 ways, 64 byte lines case 0x49: if(l2!=0) l3 = 4096; else {check_for_p2_core2=true; l3 = l2 = 4096;} break;// code and data L3 cache, 4096 KB, 16 ways, 64 byte lines (P4) or L2 for core2 case 0x4A: l3 = 6144; break; // code and data L3 cache, 6144 KB, 12 ways, 64 byte lines case 0x4B: l3 = 8192; break; // code and data L3 cache, 8192 KB, 16 ways, 64 byte lines case 0x4C: l3 = 12288; break; // code and data L3 cache, 12288 KB, 12 ways, 64 byte lines case 0x4D: l3 = 16384; break; // code and data L3 cache, 16384 KB, 16 ways, 64 byte lines case 0x4E: l2 = 6144; break; // code and data L2 cache, 6144 KB, 24 ways, 64 byte lines case 0x78: l2 = 1024; break; // code and data L2 cache, 1024 KB, 4 ways, 64 byte lines case 0x79: l2 = 128; break; // code and data L2 cache, 128 KB, 8 ways, 64 byte lines, dual-sectored case 0x7A: l2 = 256; break; // code and data L2 cache, 256 KB, 8 ways, 64 byte lines, dual-sectored case 0x7B: l2 = 512; break; // code and data L2 cache, 512 KB, 8 ways, 64 byte lines, dual-sectored case 0x7C: l2 = 1024; break; // code and data L2 cache, 1024 KB, 8 ways, 64 byte lines, dual-sectored case 0x7D: l2 = 2048; break; // code and data L2 cache, 2048 KB, 8 ways, 64 byte lines case 0x7E: l2 = 256; break; // code and data L2 cache, 256 KB, 8 ways, 128 byte lines, sect. (IA-64) case 0x7F: l2 = 512; break; // code and data L2 cache, 512 KB, 2 ways, 64 byte lines case 0x80: l2 = 512; break; // code and data L2 cache, 512 KB, 8 ways, 64 byte lines case 0x81: l2 = 128; break; // code and data L2 cache, 128 KB, 8 ways, 32 byte lines case 0x82: l2 = 256; break; // code and data L2 cache, 256 KB, 8 ways, 32 byte lines case 0x83: l2 = 512; break; // code and data L2 cache, 512 KB, 8 ways, 32 byte lines case 0x84: l2 = 1024; break; // code and data L2 cache, 1024 KB, 8 ways, 32 byte lines case 0x85: l2 = 2048; break; // code and data L2 cache, 2048 KB, 8 ways, 32 byte lines case 0x86: l2 = 512; break; // code and data L2 cache, 512 KB, 4 ways, 64 byte lines case 0x87: l2 = 1024; break; // code and data L2 cache, 1024 KB, 8 ways, 64 byte lines case 0x88: l3 = 2048; break; // code and data L3 cache, 2048 KB, 4 ways, 64 byte lines (IA-64) case 0x89: l3 = 4096; break; // code and data L3 cache, 4096 KB, 4 ways, 64 byte lines (IA-64) case 0x8A: l3 = 8192; break; // code and data L3 cache, 8192 KB, 4 ways, 64 byte lines (IA-64) case 0x8D: l3 = 3072; break; // code and data L3 cache, 3072 KB, 12 ways, 128 byte lines (IA-64) default: break; } } if(check_for_p2_core2 && l2 == l3) l3 = 0; l1 *= 1024; l2 *= 1024; l3 *= 1024; } inline void queryCacheSizes_intel(int& l1, int& l2, int& l3, int max_std_funcs) { if(max_std_funcs>=4) queryCacheSizes_intel_direct(l1,l2,l3); else if(max_std_funcs>=2) queryCacheSizes_intel_codes(l1,l2,l3); else l1 = l2 = l3 = 0; } inline void queryCacheSizes_amd(int& l1, int& l2, int& l3) { int abcd[4]; abcd[0] = abcd[1] = abcd[2] = abcd[3] = 0; // First query the max supported function. EIGEN_CPUID(abcd,0x80000000,0); if(static_cast(abcd[0]) >= static_cast(0x80000006)) { EIGEN_CPUID(abcd,0x80000005,0); l1 = (abcd[2] >> 24) * 1024; // C[31:24] = L1 size in KB abcd[0] = abcd[1] = abcd[2] = abcd[3] = 0; EIGEN_CPUID(abcd,0x80000006,0); l2 = (abcd[2] >> 16) * 1024; // C[31;16] = l2 cache size in KB l3 = ((abcd[3] & 0xFFFC000) >> 18) * 512 * 1024; // D[31;18] = l3 cache size in 512KB } else { l1 = l2 = l3 = 0; } } #endif /** \internal * Queries and returns the cache sizes in Bytes of the L1, L2, and L3 data caches respectively */ inline void queryCacheSizes(int& l1, int& l2, int& l3) { #ifdef EIGEN_CPUID int abcd[4]; const int GenuineIntel[] = {0x756e6547, 0x49656e69, 0x6c65746e}; const int AuthenticAMD[] = {0x68747541, 0x69746e65, 0x444d4163}; const int AMDisbetter_[] = {0x69444d41, 0x74656273, 0x21726574}; // "AMDisbetter!" // identify the CPU vendor EIGEN_CPUID(abcd,0x0,0); int max_std_funcs = abcd[0]; if(cpuid_is_vendor(abcd,GenuineIntel)) queryCacheSizes_intel(l1,l2,l3,max_std_funcs); else if(cpuid_is_vendor(abcd,AuthenticAMD) || cpuid_is_vendor(abcd,AMDisbetter_)) queryCacheSizes_amd(l1,l2,l3); else // by default let's use Intel's API queryCacheSizes_intel(l1,l2,l3,max_std_funcs); // here is the list of other vendors: // ||cpuid_is_vendor(abcd,"VIA VIA VIA ") // ||cpuid_is_vendor(abcd,"CyrixInstead") // ||cpuid_is_vendor(abcd,"CentaurHauls") // ||cpuid_is_vendor(abcd,"GenuineTMx86") // ||cpuid_is_vendor(abcd,"TransmetaCPU") // ||cpuid_is_vendor(abcd,"RiseRiseRise") // ||cpuid_is_vendor(abcd,"Geode by NSC") // ||cpuid_is_vendor(abcd,"SiS SiS SiS ") // ||cpuid_is_vendor(abcd,"UMC UMC UMC ") // ||cpuid_is_vendor(abcd,"NexGenDriven") #else l1 = l2 = l3 = -1; #endif } /** \internal * \returns the size in Bytes of the L1 data cache */ inline int queryL1CacheSize() { int l1(-1), l2, l3; queryCacheSizes(l1,l2,l3); return l1; } /** \internal * \returns the size in Bytes of the L2 or L3 cache if this later is present */ inline int queryTopLevelCacheSize() { int l1, l2(-1), l3(-1); queryCacheSizes(l1,l2,l3); return (std::max)(l2,l3); } } // end namespace internal } // end namespace Eigen #endif // EIGEN_MEMORY_H RcppEigen/inst/include/Eigen/src/Core/util/IntegralConstant.h0000644000176200001440000002530514562417221023652 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2017 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_INTEGRAL_CONSTANT_H #define EIGEN_INTEGRAL_CONSTANT_H namespace Eigen { namespace internal { template class FixedInt; template class VariableAndFixedInt; /** \internal * \class FixedInt * * This class embeds a compile-time integer \c N. * * It is similar to c++11 std::integral_constant but with some additional features * such as: * - implicit conversion to int * - arithmetic and some bitwise operators: -, +, *, /, %, &, | * - c++98/14 compatibility with fix and fix() syntax to define integral constants. * * It is strongly discouraged to directly deal with this class FixedInt. Instances are expcected to * be created by the user using Eigen::fix or Eigen::fix(). In C++98-11, the former syntax does * not create a FixedInt instance but rather a point to function that needs to be \em cleaned-up * using the generic helper: * \code * internal::cleanup_index_type::type * internal::cleanup_index_type::type * \endcode * where T can a FixedInt, a pointer to function FixedInt (*)(), or numerous other integer-like representations. * \c DynamicKey is either Dynamic (default) or DynamicIndex and used to identify true compile-time values. * * For convenience, you can extract the compile-time value \c N in a generic way using the following helper: * \code * internal::get_fixed_value::value * \endcode * that will give you \c N if T equals FixedInt or FixedInt (*)(), and \c DefaultVal if T does not embed any compile-time value (e.g., T==int). * * \sa fix, class VariableAndFixedInt */ template class FixedInt { public: static const int value = N; EIGEN_CONSTEXPR operator int() const { return value; } FixedInt() {} FixedInt( VariableAndFixedInt other) { #ifndef EIGEN_INTERNAL_DEBUGGING EIGEN_UNUSED_VARIABLE(other); #endif eigen_internal_assert(int(other)==N); } FixedInt<-N> operator-() const { return FixedInt<-N>(); } template FixedInt operator+( FixedInt) const { return FixedInt(); } template FixedInt operator-( FixedInt) const { return FixedInt(); } template FixedInt operator*( FixedInt) const { return FixedInt(); } template FixedInt operator/( FixedInt) const { return FixedInt(); } template FixedInt operator%( FixedInt) const { return FixedInt(); } template FixedInt operator|( FixedInt) const { return FixedInt(); } template FixedInt operator&( FixedInt) const { return FixedInt(); } #if EIGEN_HAS_CXX14_VARIABLE_TEMPLATES // Needed in C++14 to allow fix(): FixedInt operator() () const { return *this; } VariableAndFixedInt operator() (int val) const { return VariableAndFixedInt(val); } #else FixedInt ( FixedInt (*)() ) {} #endif #if EIGEN_HAS_CXX11 FixedInt(std::integral_constant) {} #endif }; /** \internal * \class VariableAndFixedInt * * This class embeds both a compile-time integer \c N and a runtime integer. * Both values are supposed to be equal unless the compile-time value \c N has a special * value meaning that the runtime-value should be used. Depending on the context, this special * value can be either Eigen::Dynamic (for positive quantities) or Eigen::DynamicIndex (for * quantities that can be negative). * * It is the return-type of the function Eigen::fix(int), and most of the time this is the only * way it is used. It is strongly discouraged to directly deal with instances of VariableAndFixedInt. * Indeed, in order to write generic code, it is the responsibility of the callee to properly convert * it to either a true compile-time quantity (i.e. a FixedInt), or to a runtime quantity (e.g., an Index) * using the following generic helper: * \code * internal::cleanup_index_type::type * internal::cleanup_index_type::type * \endcode * where T can be a template instantiation of VariableAndFixedInt or numerous other integer-like representations. * \c DynamicKey is either Dynamic (default) or DynamicIndex and used to identify true compile-time values. * * For convenience, you can also extract the compile-time value \c N using the following helper: * \code * internal::get_fixed_value::value * \endcode * that will give you \c N if T equals VariableAndFixedInt, and \c DefaultVal if T does not embed any compile-time value (e.g., T==int). * * \sa fix(int), class FixedInt */ template class VariableAndFixedInt { public: static const int value = N; operator int() const { return m_value; } VariableAndFixedInt(int val) { m_value = val; } protected: int m_value; }; template struct get_fixed_value { static const int value = Default; }; template struct get_fixed_value,Default> { static const int value = N; }; #if !EIGEN_HAS_CXX14 template struct get_fixed_value (*)(),Default> { static const int value = N; }; #endif template struct get_fixed_value,Default> { static const int value = N ; }; template struct get_fixed_value,Default> { static const int value = N; }; template EIGEN_DEVICE_FUNC Index get_runtime_value(const T &x) { return x; } #if !EIGEN_HAS_CXX14 template EIGEN_DEVICE_FUNC Index get_runtime_value(FixedInt (*)()) { return N; } #endif // Cleanup integer/FixedInt/VariableAndFixedInt/etc types: // By default, no cleanup: template struct cleanup_index_type { typedef T type; }; // Convert any integral type (e.g., short, int, unsigned int, etc.) to Eigen::Index template struct cleanup_index_type::value>::type> { typedef Index type; }; #if !EIGEN_HAS_CXX14 // In c++98/c++11, fix is a pointer to function that we better cleanup to a true FixedInt: template struct cleanup_index_type (*)(), DynamicKey> { typedef FixedInt type; }; #endif // If VariableAndFixedInt does not match DynamicKey, then we turn it to a pure compile-time value: template struct cleanup_index_type, DynamicKey> { typedef FixedInt type; }; // If VariableAndFixedInt matches DynamicKey, then we turn it to a pure runtime-value (aka Index): template struct cleanup_index_type, DynamicKey> { typedef Index type; }; #if EIGEN_HAS_CXX11 template struct cleanup_index_type, DynamicKey> { typedef FixedInt type; }; #endif } // end namespace internal #ifndef EIGEN_PARSED_BY_DOXYGEN #if EIGEN_HAS_CXX14_VARIABLE_TEMPLATES template static const internal::FixedInt fix{}; #else template inline internal::FixedInt fix() { return internal::FixedInt(); } // The generic typename T is mandatory. Otherwise, a code like fix could refer to either the function above or this next overload. // This way a code like fix can only refer to the previous function. template inline internal::VariableAndFixedInt fix(T val) { return internal::VariableAndFixedInt(internal::convert_index(val)); } #endif #else // EIGEN_PARSED_BY_DOXYGEN /** \var fix() * \ingroup Core_Module * * This \em identifier permits to construct an object embedding a compile-time integer \c N. * * \tparam N the compile-time integer value * * It is typically used in conjunction with the Eigen::seq and Eigen::seqN functions to pass compile-time values to them: * \code * seqN(10,fix<4>,fix<-3>) // <=> [10 7 4 1] * \endcode * * See also the function fix(int) to pass both a compile-time and runtime value. * * In c++14, it is implemented as: * \code * template static const internal::FixedInt fix{}; * \endcode * where internal::FixedInt is an internal template class similar to * \c std::integral_constant * Here, \c fix is thus an object of type \c internal::FixedInt. * * In c++98/11, it is implemented as a function: * \code * template inline internal::FixedInt fix(); * \endcode * Here internal::FixedInt is thus a pointer to function. * * If for some reason you want a true object in c++98 then you can write: \code fix() \endcode which is also valid in c++14. * * \sa fix(int), seq, seqN */ template static const auto fix(); /** \fn fix(int) * \ingroup Core_Module * * This function returns an object embedding both a compile-time integer \c N, and a fallback runtime value \a val. * * \tparam N the compile-time integer value * \param val the fallback runtime integer value * * This function is a more general version of the \ref fix identifier/function that can be used in template code * where the compile-time value could turn out to actually mean "undefined at compile-time". For positive integers * such as a size or a dimension, this case is identified by Eigen::Dynamic, whereas runtime signed integers * (e.g., an increment/stride) are identified as Eigen::DynamicIndex. In such a case, the runtime value \a val * will be used as a fallback. * * A typical use case would be: * \code * template void foo(const MatrixBase &mat) { * const int N = Derived::RowsAtCompileTime==Dynamic ? Dynamic : Derived::RowsAtCompileTime/2; * const int n = mat.rows()/2; * ... mat( seqN(0,fix(n) ) ...; * } * \endcode * In this example, the function Eigen::seqN knows that the second argument is expected to be a size. * If the passed compile-time value N equals Eigen::Dynamic, then the proxy object returned by fix will be dissmissed, and converted to an Eigen::Index of value \c n. * Otherwise, the runtime-value \c n will be dissmissed, and the returned ArithmeticSequence will be of the exact same type as seqN(0,fix) . * * \sa fix, seqN, class ArithmeticSequence */ template static const auto fix(int val); #endif // EIGEN_PARSED_BY_DOXYGEN } // end namespace Eigen #endif // EIGEN_INTEGRAL_CONSTANT_H RcppEigen/inst/include/Eigen/src/Core/util/MKL_support.h0000755000176200001440000001025414562417221022612 0ustar liggesusers/* Copyright (c) 2011, Intel Corporation. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ******************************************************************************** * Content : Eigen bindings to Intel(R) MKL * Include file with common MKL declarations ******************************************************************************** */ #ifndef EIGEN_MKL_SUPPORT_H #define EIGEN_MKL_SUPPORT_H #ifdef EIGEN_USE_MKL_ALL #ifndef EIGEN_USE_BLAS #define EIGEN_USE_BLAS #endif #ifndef EIGEN_USE_LAPACKE #define EIGEN_USE_LAPACKE #endif #ifndef EIGEN_USE_MKL_VML #define EIGEN_USE_MKL_VML #endif #endif #ifdef EIGEN_USE_LAPACKE_STRICT #define EIGEN_USE_LAPACKE #endif #if defined(EIGEN_USE_MKL_VML) && !defined(EIGEN_USE_MKL) #define EIGEN_USE_MKL #endif #if defined EIGEN_USE_MKL # if (!defined MKL_DIRECT_CALL) && (!defined EIGEN_MKL_NO_DIRECT_CALL) # define MKL_DIRECT_CALL # define MKL_DIRECT_CALL_JUST_SET # endif # include /*Check IMKL version for compatibility: < 10.3 is not usable with Eigen*/ # ifndef INTEL_MKL_VERSION # undef EIGEN_USE_MKL /* INTEL_MKL_VERSION is not even defined on older versions */ # elif INTEL_MKL_VERSION < 100305 /* the intel-mkl-103-release-notes say this was when the lapacke.h interface was added*/ # undef EIGEN_USE_MKL # endif # ifndef EIGEN_USE_MKL /*If the MKL version is too old, undef everything*/ # undef EIGEN_USE_MKL_ALL # undef EIGEN_USE_LAPACKE # undef EIGEN_USE_MKL_VML # undef EIGEN_USE_LAPACKE_STRICT # undef EIGEN_USE_LAPACKE # ifdef MKL_DIRECT_CALL_JUST_SET # undef MKL_DIRECT_CALL # endif # endif #endif #if defined EIGEN_USE_MKL #define EIGEN_MKL_VML_THRESHOLD 128 /* MKL_DOMAIN_BLAS, etc are defined only in 10.3 update 7 */ /* MKL_BLAS, etc are not defined in 11.2 */ #ifdef MKL_DOMAIN_ALL #define EIGEN_MKL_DOMAIN_ALL MKL_DOMAIN_ALL #else #define EIGEN_MKL_DOMAIN_ALL MKL_ALL #endif #ifdef MKL_DOMAIN_BLAS #define EIGEN_MKL_DOMAIN_BLAS MKL_DOMAIN_BLAS #else #define EIGEN_MKL_DOMAIN_BLAS MKL_BLAS #endif #ifdef MKL_DOMAIN_FFT #define EIGEN_MKL_DOMAIN_FFT MKL_DOMAIN_FFT #else #define EIGEN_MKL_DOMAIN_FFT MKL_FFT #endif #ifdef MKL_DOMAIN_VML #define EIGEN_MKL_DOMAIN_VML MKL_DOMAIN_VML #else #define EIGEN_MKL_DOMAIN_VML MKL_VML #endif #ifdef MKL_DOMAIN_PARDISO #define EIGEN_MKL_DOMAIN_PARDISO MKL_DOMAIN_PARDISO #else #define EIGEN_MKL_DOMAIN_PARDISO MKL_PARDISO #endif #endif #if defined(EIGEN_USE_BLAS) && !defined(EIGEN_USE_MKL) #include "../../misc/blas.h" #endif namespace Eigen { typedef std::complex dcomplex; typedef std::complex scomplex; #if defined(EIGEN_USE_MKL) typedef MKL_INT BlasIndex; #else typedef int BlasIndex; #endif } // end namespace Eigen #endif // EIGEN_MKL_SUPPORT_H RcppEigen/inst/include/Eigen/src/Core/util/ReenableStupidWarnings.h0000644000176200001440000000200614562417221025003 0ustar liggesusers#ifdef EIGEN_WARNINGS_DISABLED_2 // "DisableStupidWarnings.h" was included twice recursively: Do not reenable warnings yet! # undef EIGEN_WARNINGS_DISABLED_2 #elif defined(EIGEN_WARNINGS_DISABLED) #undef EIGEN_WARNINGS_DISABLED #ifndef EIGEN_PERMANENTLY_DISABLE_STUPID_WARNINGS #ifdef _MSC_VER #pragma warning( pop ) #elif defined __INTEL_COMPILER #pragma warning pop #elif defined __clang__ // #pragma clang diagnostic pop #elif defined __GNUC__ && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)) // #pragma GCC diagnostic pop #endif #if defined __NVCC__ // Don't reenable the diagnostic messages, as it turns out these messages need // to be disabled at the point of the template instantiation (i.e the user code) // otherwise they'll be triggered by nvcc. // #pragma diag_default code_is_unreachable // #pragma diag_default initialization_not_reachable // #pragma diag_default 2651 // #pragma diag_default 2653 #endif #endif #endif // EIGEN_WARNINGS_DISABLED RcppEigen/inst/include/Eigen/src/Core/util/Meta.h0000755000176200001440000007123014562417221021262 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2015 Gael Guennebaud // Copyright (C) 2006-2008 Benoit Jacob // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_META_H #define EIGEN_META_H #if defined(EIGEN_GPU_COMPILE_PHASE) #include #if defined(EIGEN_CUDA_ARCH) #include #endif #if defined(EIGEN_HIP_DEVICE_COMPILE) #include "Eigen/src/Core/arch/HIP/hcc/math_constants.h" #endif #endif // Recent versions of ICC require for pointer types below. #define EIGEN_ICC_NEEDS_CSTDINT (EIGEN_COMP_ICC>=1600 && EIGEN_COMP_CXXVER >= 11) // Define portable (u)int{32,64} types #if EIGEN_HAS_CXX11 || EIGEN_ICC_NEEDS_CSTDINT #include namespace Eigen { namespace numext { typedef std::uint8_t uint8_t; typedef std::int8_t int8_t; typedef std::uint16_t uint16_t; typedef std::int16_t int16_t; typedef std::uint32_t uint32_t; typedef std::int32_t int32_t; typedef std::uint64_t uint64_t; typedef std::int64_t int64_t; } } #else // Without c++11, all compilers able to compile Eigen also // provide the C99 stdint.h header file. #include namespace Eigen { namespace numext { typedef ::uint8_t uint8_t; typedef ::int8_t int8_t; typedef ::uint16_t uint16_t; typedef ::int16_t int16_t; typedef ::uint32_t uint32_t; typedef ::int32_t int32_t; typedef ::uint64_t uint64_t; typedef ::int64_t int64_t; } } #endif namespace Eigen { typedef EIGEN_DEFAULT_DENSE_INDEX_TYPE DenseIndex; /** * \brief The Index type as used for the API. * \details To change this, \c \#define the preprocessor symbol \c EIGEN_DEFAULT_DENSE_INDEX_TYPE. * \sa \blank \ref TopicPreprocessorDirectives, StorageIndex. */ typedef EIGEN_DEFAULT_DENSE_INDEX_TYPE Index; namespace internal { /** \internal * \file Meta.h * This file contains generic metaprogramming classes which are not specifically related to Eigen. * \note In case you wonder, yes we're aware that Boost already provides all these features, * we however don't want to add a dependency to Boost. */ // Only recent versions of ICC complain about using ptrdiff_t to hold pointers, // and older versions do not provide *intptr_t types. #if EIGEN_ICC_NEEDS_CSTDINT typedef std::intptr_t IntPtr; typedef std::uintptr_t UIntPtr; #else typedef std::ptrdiff_t IntPtr; typedef std::size_t UIntPtr; #endif #undef EIGEN_ICC_NEEDS_CSTDINT struct true_type { enum { value = 1 }; }; struct false_type { enum { value = 0 }; }; template struct bool_constant; template<> struct bool_constant : true_type {}; template<> struct bool_constant : false_type {}; template struct conditional { typedef Then type; }; template struct conditional { typedef Else type; }; template struct remove_reference { typedef T type; }; template struct remove_reference { typedef T type; }; template struct remove_pointer { typedef T type; }; template struct remove_pointer { typedef T type; }; template struct remove_pointer { typedef T type; }; template struct remove_const { typedef T type; }; template struct remove_const { typedef T type; }; template struct remove_const { typedef T type[]; }; template struct remove_const { typedef T type[Size]; }; template struct remove_all { typedef T type; }; template struct remove_all { typedef typename remove_all::type type; }; template struct remove_all { typedef typename remove_all::type type; }; template struct remove_all { typedef typename remove_all::type type; }; template struct remove_all { typedef typename remove_all::type type; }; template struct remove_all { typedef typename remove_all::type type; }; template struct is_arithmetic { enum { value = false }; }; template<> struct is_arithmetic { enum { value = true }; }; template<> struct is_arithmetic { enum { value = true }; }; template<> struct is_arithmetic { enum { value = true }; }; template<> struct is_arithmetic { enum { value = true }; }; template<> struct is_arithmetic { enum { value = true }; }; template<> struct is_arithmetic { enum { value = true }; }; template<> struct is_arithmetic { enum { value = true }; }; template<> struct is_arithmetic { enum { value = true }; }; template<> struct is_arithmetic{ enum { value = true }; }; template<> struct is_arithmetic { enum { value = true }; }; template<> struct is_arithmetic { enum { value = true }; }; template<> struct is_arithmetic { enum { value = true }; }; template<> struct is_arithmetic { enum { value = true }; }; template struct is_same { enum { value = 0 }; }; template struct is_same { enum { value = 1 }; }; template< class T > struct is_void : is_same::type> {}; #if EIGEN_HAS_CXX11 template<> struct is_arithmetic { enum { value = true }; }; template<> struct is_arithmetic { enum { value = true }; }; using std::is_integral; #else template struct is_integral { enum { value = false }; }; template<> struct is_integral { enum { value = true }; }; template<> struct is_integral { enum { value = true }; }; template<> struct is_integral { enum { value = true }; }; template<> struct is_integral { enum { value = true }; }; template<> struct is_integral { enum { value = true }; }; template<> struct is_integral { enum { value = true }; }; template<> struct is_integral { enum { value = true }; }; template<> struct is_integral { enum { value = true }; }; template<> struct is_integral { enum { value = true }; }; template<> struct is_integral { enum { value = true }; }; #if EIGEN_COMP_MSVC template<> struct is_integral { enum { value = true }; }; template<> struct is_integral { enum { value = true }; }; #endif #endif #if EIGEN_HAS_CXX11 using std::make_unsigned; #else // TODO: Possibly improve this implementation of make_unsigned. // It is currently used only by // template struct random_default_impl. template struct make_unsigned; template<> struct make_unsigned { typedef unsigned char type; }; template<> struct make_unsigned { typedef unsigned char type; }; template<> struct make_unsigned { typedef unsigned char type; }; template<> struct make_unsigned { typedef unsigned short type; }; template<> struct make_unsigned { typedef unsigned short type; }; template<> struct make_unsigned { typedef unsigned int type; }; template<> struct make_unsigned { typedef unsigned int type; }; template<> struct make_unsigned { typedef unsigned long type; }; template<> struct make_unsigned { typedef unsigned long type; }; #if EIGEN_COMP_MSVC template<> struct make_unsigned { typedef unsigned __int64 type; }; template<> struct make_unsigned { typedef unsigned __int64 type; }; #endif // Some platforms define int64_t as `long long` even for C++03, where // `long long` is not guaranteed by the standard. In this case we are missing // the definition for make_unsigned. If we just define it, we run into issues // where `long long` doesn't exist in some compilers for C++03. We therefore add // the specialization for these platforms only. #if EIGEN_OS_MAC || EIGEN_COMP_MINGW template<> struct make_unsigned { typedef unsigned long long type; }; template<> struct make_unsigned { typedef unsigned long long type; }; #endif #endif template struct add_const { typedef const T type; }; template struct add_const { typedef T& type; }; template struct is_const { enum { value = 0 }; }; template struct is_const { enum { value = 1 }; }; template struct add_const_on_value_type { typedef const T type; }; template struct add_const_on_value_type { typedef T const& type; }; template struct add_const_on_value_type { typedef T const* type; }; template struct add_const_on_value_type { typedef T const* const type; }; template struct add_const_on_value_type { typedef T const* const type; }; #if EIGEN_HAS_CXX11 using std::is_convertible; #else template struct is_convertible_impl { private: struct any_conversion { template any_conversion(const volatile T&); template any_conversion(T&); }; struct yes {int a[1];}; struct no {int a[2];}; template static yes test(T, int); template static no test(any_conversion, ...); public: static typename internal::remove_reference::type* ms_from; #ifdef __INTEL_COMPILER #pragma warning push #pragma warning ( disable : 2259 ) #endif enum { value = sizeof(test(*ms_from, 0))==sizeof(yes) }; #ifdef __INTEL_COMPILER #pragma warning pop #endif }; template struct is_convertible { enum { value = is_convertible_impl::value }; }; template struct is_convertible { enum { value = false }; }; template struct is_convertible { enum { value = true }; }; #endif /** \internal Allows to enable/disable an overload * according to a compile time condition. */ template struct enable_if; template struct enable_if { typedef T type; }; #if defined(EIGEN_GPU_COMPILE_PHASE) && !EIGEN_HAS_CXX11 #if !defined(__FLT_EPSILON__) #define __FLT_EPSILON__ FLT_EPSILON #define __DBL_EPSILON__ DBL_EPSILON #endif namespace device { template struct numeric_limits { EIGEN_DEVICE_FUNC static EIGEN_CONSTEXPR T epsilon() { return 0; } static T (max)() { assert(false && "Highest not supported for this type"); } static T (min)() { assert(false && "Lowest not supported for this type"); } static T infinity() { assert(false && "Infinity not supported for this type"); } static T quiet_NaN() { assert(false && "quiet_NaN not supported for this type"); } }; template<> struct numeric_limits { EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static float epsilon() { return __FLT_EPSILON__; } EIGEN_DEVICE_FUNC static float (max)() { #if defined(EIGEN_CUDA_ARCH) return CUDART_MAX_NORMAL_F; #else return HIPRT_MAX_NORMAL_F; #endif } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static float (min)() { return FLT_MIN; } EIGEN_DEVICE_FUNC static float infinity() { #if defined(EIGEN_CUDA_ARCH) return CUDART_INF_F; #else return HIPRT_INF_F; #endif } EIGEN_DEVICE_FUNC static float quiet_NaN() { #if defined(EIGEN_CUDA_ARCH) return CUDART_NAN_F; #else return HIPRT_NAN_F; #endif } }; template<> struct numeric_limits { EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static double epsilon() { return __DBL_EPSILON__; } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static double (max)() { return DBL_MAX; } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static double (min)() { return DBL_MIN; } EIGEN_DEVICE_FUNC static double infinity() { #if defined(EIGEN_CUDA_ARCH) return CUDART_INF; #else return HIPRT_INF; #endif } EIGEN_DEVICE_FUNC static double quiet_NaN() { #if defined(EIGEN_CUDA_ARCH) return CUDART_NAN; #else return HIPRT_NAN; #endif } }; template<> struct numeric_limits { EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static int epsilon() { return 0; } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static int (max)() { return INT_MAX; } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static int (min)() { return INT_MIN; } }; template<> struct numeric_limits { EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static unsigned int epsilon() { return 0; } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static unsigned int (max)() { return UINT_MAX; } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static unsigned int (min)() { return 0; } }; template<> struct numeric_limits { EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static long epsilon() { return 0; } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static long (max)() { return LONG_MAX; } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static long (min)() { return LONG_MIN; } }; template<> struct numeric_limits { EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static unsigned long epsilon() { return 0; } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static unsigned long (max)() { return ULONG_MAX; } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static unsigned long (min)() { return 0; } }; template<> struct numeric_limits { EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static long long epsilon() { return 0; } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static long long (max)() { return LLONG_MAX; } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static long long (min)() { return LLONG_MIN; } }; template<> struct numeric_limits { EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static unsigned long long epsilon() { return 0; } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static unsigned long long (max)() { return ULLONG_MAX; } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static unsigned long long (min)() { return 0; } }; template<> struct numeric_limits { EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static bool epsilon() { return false; } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static bool (max)() { return true; } EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR static bool (min)() { return false; } }; } #endif // defined(EIGEN_GPU_COMPILE_PHASE) && !EIGEN_HAS_CXX11 /** \internal * A base class do disable default copy ctor and copy assignment operator. */ class noncopyable { EIGEN_DEVICE_FUNC noncopyable(const noncopyable&); EIGEN_DEVICE_FUNC const noncopyable& operator=(const noncopyable&); protected: EIGEN_DEVICE_FUNC noncopyable() {} EIGEN_DEVICE_FUNC ~noncopyable() {} }; /** \internal * Provides access to the number of elements in the object of as a compile-time constant expression. * It "returns" Eigen::Dynamic if the size cannot be resolved at compile-time (default). * * Similar to std::tuple_size, but more general. * * It currently supports: * - any types T defining T::SizeAtCompileTime * - plain C arrays as T[N] * - std::array (c++11) * - some internal types such as SingleRange and AllRange * * The second template parameter eases SFINAE-based specializations. */ template struct array_size { enum { value = Dynamic }; }; template struct array_size::type> { enum { value = T::SizeAtCompileTime }; }; template struct array_size { enum { value = N }; }; template struct array_size { enum { value = N }; }; #if EIGEN_HAS_CXX11 template struct array_size > { enum { value = N }; }; template struct array_size > { enum { value = N }; }; #endif /** \internal * Analogue of the std::size free function. * It returns the size of the container or view \a x of type \c T * * It currently supports: * - any types T defining a member T::size() const * - plain C arrays as T[N] * */ template EIGEN_CONSTEXPR Index size(const T& x) { return x.size(); } template EIGEN_CONSTEXPR Index size(const T (&) [N]) { return N; } /** \internal * Convenient struct to get the result type of a nullary, unary, binary, or * ternary functor. * * Pre C++11: * Supports both a Func::result_type member and templated * Func::result::type member. * * If none of these members is provided, then the type of the first * argument is returned. * * Post C++11: * This uses std::result_of. However, note the `type` member removes * const and converts references/pointers to their corresponding value type. */ #if EIGEN_HAS_STD_INVOKE_RESULT template struct result_of; template struct result_of { typedef typename std::invoke_result::type type1; typedef typename remove_all::type type; }; #elif EIGEN_HAS_STD_RESULT_OF template struct result_of { typedef typename std::result_of::type type1; typedef typename remove_all::type type; }; #else template struct result_of { }; struct has_none {int a[1];}; struct has_std_result_type {int a[2];}; struct has_tr1_result {int a[3];}; template struct nullary_result_of_select {}; template struct nullary_result_of_select {typedef typename Func::result_type type;}; template struct nullary_result_of_select {typedef typename Func::template result::type type;}; template struct result_of { template static has_std_result_type testFunctor(T const *, typename T::result_type const * = 0); template static has_tr1_result testFunctor(T const *, typename T::template result::type const * = 0); static has_none testFunctor(...); // note that the following indirection is needed for gcc-3.3 enum {FunctorType = sizeof(testFunctor(static_cast(0)))}; typedef typename nullary_result_of_select::type type; }; template struct unary_result_of_select {typedef typename internal::remove_all::type type;}; template struct unary_result_of_select {typedef typename Func::result_type type;}; template struct unary_result_of_select {typedef typename Func::template result::type type;}; template struct result_of { template static has_std_result_type testFunctor(T const *, typename T::result_type const * = 0); template static has_tr1_result testFunctor(T const *, typename T::template result::type const * = 0); static has_none testFunctor(...); // note that the following indirection is needed for gcc-3.3 enum {FunctorType = sizeof(testFunctor(static_cast(0)))}; typedef typename unary_result_of_select::type type; }; template struct binary_result_of_select {typedef typename internal::remove_all::type type;}; template struct binary_result_of_select {typedef typename Func::result_type type;}; template struct binary_result_of_select {typedef typename Func::template result::type type;}; template struct result_of { template static has_std_result_type testFunctor(T const *, typename T::result_type const * = 0); template static has_tr1_result testFunctor(T const *, typename T::template result::type const * = 0); static has_none testFunctor(...); // note that the following indirection is needed for gcc-3.3 enum {FunctorType = sizeof(testFunctor(static_cast(0)))}; typedef typename binary_result_of_select::type type; }; template struct ternary_result_of_select {typedef typename internal::remove_all::type type;}; template struct ternary_result_of_select {typedef typename Func::result_type type;}; template struct ternary_result_of_select {typedef typename Func::template result::type type;}; template struct result_of { template static has_std_result_type testFunctor(T const *, typename T::result_type const * = 0); template static has_tr1_result testFunctor(T const *, typename T::template result::type const * = 0); static has_none testFunctor(...); // note that the following indirection is needed for gcc-3.3 enum {FunctorType = sizeof(testFunctor(static_cast(0)))}; typedef typename ternary_result_of_select::type type; }; #endif #if EIGEN_HAS_STD_INVOKE_RESULT template struct invoke_result { typedef typename std::invoke_result::type type1; typedef typename remove_all::type type; }; #elif EIGEN_HAS_CXX11 template struct invoke_result { typedef typename result_of::type type1; typedef typename remove_all::type type; }; #else template struct invoke_result { typedef typename result_of::type type1; typedef typename remove_all::type type; }; template struct invoke_result { typedef typename result_of::type type1; typedef typename remove_all::type type; }; template struct invoke_result { typedef typename result_of::type type1; typedef typename remove_all::type type; }; template struct invoke_result { typedef typename result_of::type type1; typedef typename remove_all::type type; }; #endif struct meta_yes { char a[1]; }; struct meta_no { char a[2]; }; // Check whether T::ReturnType does exist template struct has_ReturnType { template static meta_yes testFunctor(C const *, typename C::ReturnType const * = 0); template static meta_no testFunctor(...); enum { value = sizeof(testFunctor(static_cast(0))) == sizeof(meta_yes) }; }; template const T* return_ptr(); template struct has_nullary_operator { template static meta_yes testFunctor(C const *,typename enable_if<(sizeof(return_ptr()->operator()())>0)>::type * = 0); static meta_no testFunctor(...); enum { value = sizeof(testFunctor(static_cast(0))) == sizeof(meta_yes) }; }; template struct has_unary_operator { template static meta_yes testFunctor(C const *,typename enable_if<(sizeof(return_ptr()->operator()(IndexType(0)))>0)>::type * = 0); static meta_no testFunctor(...); enum { value = sizeof(testFunctor(static_cast(0))) == sizeof(meta_yes) }; }; template struct has_binary_operator { template static meta_yes testFunctor(C const *,typename enable_if<(sizeof(return_ptr()->operator()(IndexType(0),IndexType(0)))>0)>::type * = 0); static meta_no testFunctor(...); enum { value = sizeof(testFunctor(static_cast(0))) == sizeof(meta_yes) }; }; /** \internal In short, it computes int(sqrt(\a Y)) with \a Y an integer. * Usage example: \code meta_sqrt<1023>::ret \endcode */ template Y))) > // use ?: instead of || just to shut up a stupid gcc 4.3 warning class meta_sqrt { enum { MidX = (InfX+SupX)/2, TakeInf = MidX*MidX > Y ? 1 : 0, NewInf = int(TakeInf) ? InfX : int(MidX), NewSup = int(TakeInf) ? int(MidX) : SupX }; public: enum { ret = meta_sqrt::ret }; }; template class meta_sqrt { public: enum { ret = (SupX*SupX <= Y) ? SupX : InfX }; }; /** \internal Computes the least common multiple of two positive integer A and B * at compile-time. */ template=B)> struct meta_least_common_multiple { enum { ret = meta_least_common_multiple::ret }; }; template struct meta_least_common_multiple { enum { ret = meta_least_common_multiple::ret }; }; template struct meta_least_common_multiple { enum { ret = A*K }; }; /** \internal determines whether the product of two numeric types is allowed and what the return type is */ template struct scalar_product_traits { enum { Defined = 0 }; }; // FIXME quick workaround around current limitation of result_of // template // struct result_of(ArgType0,ArgType1)> { // typedef typename scalar_product_traits::type, typename remove_all::type>::ReturnType type; // }; /** \internal Obtains a POD type suitable to use as storage for an object of a size * of at most Len bytes, aligned as specified by \c Align. */ template struct aligned_storage { struct type { EIGEN_ALIGN_TO_BOUNDARY(Align) unsigned char data[Len]; }; }; } // end namespace internal namespace numext { #if defined(EIGEN_GPU_COMPILE_PHASE) template EIGEN_DEVICE_FUNC void swap(T &a, T &b) { T tmp = b; b = a; a = tmp; } #else template EIGEN_STRONG_INLINE void swap(T &a, T &b) { std::swap(a,b); } #endif #if defined(EIGEN_GPU_COMPILE_PHASE) && !EIGEN_HAS_CXX11 using internal::device::numeric_limits; #else using std::numeric_limits; #endif // Integer division with rounding up. // T is assumed to be an integer type with a>=0, and b>0 template EIGEN_DEVICE_FUNC T div_ceil(const T &a, const T &b) { return (a+b-1) / b; } // The aim of the following functions is to bypass -Wfloat-equal warnings // when we really want a strict equality comparison on floating points. template EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool equal_strict(const X& x,const Y& y) { return x == y; } #if !defined(EIGEN_GPU_COMPILE_PHASE) || (!defined(EIGEN_CUDA_ARCH) && defined(EIGEN_CONSTEXPR_ARE_DEVICE_FUNC)) template<> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool equal_strict(const float& x,const float& y) { return std::equal_to()(x,y); } template<> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool equal_strict(const double& x,const double& y) { return std::equal_to()(x,y); } #endif template EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool not_equal_strict(const X& x,const Y& y) { return x != y; } #if !defined(EIGEN_GPU_COMPILE_PHASE) || (!defined(EIGEN_CUDA_ARCH) && defined(EIGEN_CONSTEXPR_ARE_DEVICE_FUNC)) template<> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool not_equal_strict(const float& x,const float& y) { return std::not_equal_to()(x,y); } template<> EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC bool not_equal_strict(const double& x,const double& y) { return std::not_equal_to()(x,y); } #endif } // end namespace numext } // end namespace Eigen #endif // EIGEN_META_H RcppEigen/inst/include/Eigen/src/Core/util/NonMPL2.h0000644000176200001440000000012514107270226021546 0ustar liggesusers#ifdef EIGEN_MPL2_ONLY #error Including non-MPL2 code in EIGEN_MPL2_ONLY mode #endif RcppEigen/inst/include/Eigen/src/Core/util/BlasUtil.h0000755000176200001440000005516414562417221022123 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2009-2010 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_BLASUTIL_H #define EIGEN_BLASUTIL_H // This file contains many lightweight helper classes used to // implement and control fast level 2 and level 3 BLAS-like routines. namespace Eigen { namespace internal { // forward declarations template struct gebp_kernel; template struct gemm_pack_rhs; template struct gemm_pack_lhs; template< typename Index, typename LhsScalar, int LhsStorageOrder, bool ConjugateLhs, typename RhsScalar, int RhsStorageOrder, bool ConjugateRhs, int ResStorageOrder, int ResInnerStride> struct general_matrix_matrix_product; template struct general_matrix_vector_product; template struct get_factor { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE To run(const From& x) { return To(x); } }; template struct get_factor::Real> { EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE typename NumTraits::Real run(const Scalar& x) { return numext::real(x); } }; template class BlasVectorMapper { public: EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE BlasVectorMapper(Scalar *data) : m_data(data) {} EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE Scalar operator()(Index i) const { return m_data[i]; } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE Packet load(Index i) const { return ploadt(m_data + i); } template EIGEN_DEVICE_FUNC bool aligned(Index i) const { return (UIntPtr(m_data+i)%sizeof(Packet))==0; } protected: Scalar* m_data; }; template class BlasLinearMapper; template class BlasLinearMapper { public: EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE BlasLinearMapper(Scalar *data, Index incr=1) : m_data(data) { EIGEN_ONLY_USED_FOR_DEBUG(incr); eigen_assert(incr==1); } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void prefetch(int i) const { internal::prefetch(&operator()(i)); } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE Scalar& operator()(Index i) const { return m_data[i]; } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE PacketType loadPacket(Index i) const { return ploadt(m_data + i); } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void storePacket(Index i, const PacketType &p) const { pstoret(m_data + i, p); } protected: Scalar *m_data; }; // Lightweight helper class to access matrix coefficients. template class blas_data_mapper; // TMP to help PacketBlock store implementation. // There's currently no known use case for PacketBlock load. // The default implementation assumes ColMajor order. // It always store each packet sequentially one `stride` apart. template struct PacketBlockManagement { PacketBlockManagement pbm; EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void store(Scalar *to, const Index stride, Index i, Index j, const PacketBlock &block) const { pbm.store(to, stride, i, j, block); pstoreu(to + i + (j + idx)*stride, block.packet[idx]); } }; // PacketBlockManagement specialization to take care of RowMajor order without ifs. template struct PacketBlockManagement { PacketBlockManagement pbm; EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void store(Scalar *to, const Index stride, Index i, Index j, const PacketBlock &block) const { pbm.store(to, stride, i, j, block); pstoreu(to + j + (i + idx)*stride, block.packet[idx]); } }; template struct PacketBlockManagement { EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void store(Scalar *to, const Index stride, Index i, Index j, const PacketBlock &block) const { EIGEN_UNUSED_VARIABLE(to); EIGEN_UNUSED_VARIABLE(stride); EIGEN_UNUSED_VARIABLE(i); EIGEN_UNUSED_VARIABLE(j); EIGEN_UNUSED_VARIABLE(block); } }; template struct PacketBlockManagement { EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void store(Scalar *to, const Index stride, Index i, Index j, const PacketBlock &block) const { EIGEN_UNUSED_VARIABLE(to); EIGEN_UNUSED_VARIABLE(stride); EIGEN_UNUSED_VARIABLE(i); EIGEN_UNUSED_VARIABLE(j); EIGEN_UNUSED_VARIABLE(block); } }; template class blas_data_mapper { public: typedef BlasLinearMapper LinearMapper; typedef BlasVectorMapper VectorMapper; EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE blas_data_mapper(Scalar* data, Index stride, Index incr=1) : m_data(data), m_stride(stride) { EIGEN_ONLY_USED_FOR_DEBUG(incr); eigen_assert(incr==1); } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE blas_data_mapper getSubMapper(Index i, Index j) const { return blas_data_mapper(&operator()(i, j), m_stride); } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE LinearMapper getLinearMapper(Index i, Index j) const { return LinearMapper(&operator()(i, j)); } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE VectorMapper getVectorMapper(Index i, Index j) const { return VectorMapper(&operator()(i, j)); } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE Scalar& operator()(Index i, Index j) const { return m_data[StorageOrder==RowMajor ? j + i*m_stride : i + j*m_stride]; } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE PacketType loadPacket(Index i, Index j) const { return ploadt(&operator()(i, j)); } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE PacketT load(Index i, Index j) const { return ploadt(&operator()(i, j)); } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void scatterPacket(Index i, Index j, const SubPacket &p) const { pscatter(&operator()(i, j), p, m_stride); } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE SubPacket gatherPacket(Index i, Index j) const { return pgather(&operator()(i, j), m_stride); } EIGEN_DEVICE_FUNC const Index stride() const { return m_stride; } EIGEN_DEVICE_FUNC const Scalar* data() const { return m_data; } EIGEN_DEVICE_FUNC Index firstAligned(Index size) const { if (UIntPtr(m_data)%sizeof(Scalar)) { return -1; } return internal::first_default_aligned(m_data, size); } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void storePacketBlock(Index i, Index j, const PacketBlock &block) const { PacketBlockManagement pbm; pbm.store(m_data, m_stride, i, j, block); } protected: Scalar* EIGEN_RESTRICT m_data; const Index m_stride; }; // Implementation of non-natural increment (i.e. inner-stride != 1) // The exposed API is not complete yet compared to the Incr==1 case // because some features makes less sense in this case. template class BlasLinearMapper { public: EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE BlasLinearMapper(Scalar *data,Index incr) : m_data(data), m_incr(incr) {} EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void prefetch(int i) const { internal::prefetch(&operator()(i)); } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE Scalar& operator()(Index i) const { return m_data[i*m_incr.value()]; } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE PacketType loadPacket(Index i) const { return pgather(m_data + i*m_incr.value(), m_incr.value()); } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void storePacket(Index i, const PacketType &p) const { pscatter(m_data + i*m_incr.value(), p, m_incr.value()); } protected: Scalar *m_data; const internal::variable_if_dynamic m_incr; }; template class blas_data_mapper { public: typedef BlasLinearMapper LinearMapper; EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE blas_data_mapper(Scalar* data, Index stride, Index incr) : m_data(data), m_stride(stride), m_incr(incr) {} EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE blas_data_mapper getSubMapper(Index i, Index j) const { return blas_data_mapper(&operator()(i, j), m_stride, m_incr.value()); } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE LinearMapper getLinearMapper(Index i, Index j) const { return LinearMapper(&operator()(i, j), m_incr.value()); } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE Scalar& operator()(Index i, Index j) const { return m_data[StorageOrder==RowMajor ? j*m_incr.value() + i*m_stride : i*m_incr.value() + j*m_stride]; } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE PacketType loadPacket(Index i, Index j) const { return pgather(&operator()(i, j),m_incr.value()); } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE PacketT load(Index i, Index j) const { return pgather(&operator()(i, j),m_incr.value()); } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void scatterPacket(Index i, Index j, const SubPacket &p) const { pscatter(&operator()(i, j), p, m_stride); } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE SubPacket gatherPacket(Index i, Index j) const { return pgather(&operator()(i, j), m_stride); } // storePacketBlock_helper defines a way to access values inside the PacketBlock, this is essentially required by the Complex types. template struct storePacketBlock_helper { storePacketBlock_helper spbh; EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void store(const blas_data_mapper* sup, Index i, Index j, const PacketBlock& block) const { spbh.store(sup, i,j,block); for(int l = 0; l < unpacket_traits::size; l++) { ScalarT *v = &sup->operator()(i+l, j+idx); *v = block.packet[idx][l]; } } }; template struct storePacketBlock_helper, n, idx> { storePacketBlock_helper, n, idx-1> spbh; EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void store(const blas_data_mapper* sup, Index i, Index j, const PacketBlock& block) const { spbh.store(sup,i,j,block); for(int l = 0; l < unpacket_traits::size; l++) { std::complex *v = &sup->operator()(i+l, j+idx); v->real(block.packet[idx].v[2*l+0]); v->imag(block.packet[idx].v[2*l+1]); } } }; template struct storePacketBlock_helper, n, idx> { storePacketBlock_helper, n, idx-1> spbh; EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void store(const blas_data_mapper* sup, Index i, Index j, const PacketBlock& block) const { spbh.store(sup,i,j,block); for(int l = 0; l < unpacket_traits::size; l++) { std::complex *v = &sup->operator()(i+l, j+idx); v->real(block.packet[idx].v[2*l+0]); v->imag(block.packet[idx].v[2*l+1]); } } }; template struct storePacketBlock_helper { EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void store(const blas_data_mapper*, Index, Index, const PacketBlock& ) const { } }; template struct storePacketBlock_helper, n, -1> { EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void store(const blas_data_mapper*, Index, Index, const PacketBlock& ) const { } }; template struct storePacketBlock_helper, n, -1> { EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void store(const blas_data_mapper*, Index, Index, const PacketBlock& ) const { } }; // This function stores a PacketBlock on m_data, this approach is really quite slow compare to Incr=1 and should be avoided when possible. template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void storePacketBlock(Index i, Index j, const PacketBlock&block) const { storePacketBlock_helper spb; spb.store(this, i,j,block); } protected: Scalar* EIGEN_RESTRICT m_data; const Index m_stride; const internal::variable_if_dynamic m_incr; }; // lightweight helper class to access matrix coefficients (const version) template class const_blas_data_mapper : public blas_data_mapper { public: EIGEN_ALWAYS_INLINE const_blas_data_mapper(const Scalar *data, Index stride) : blas_data_mapper(data, stride) {} EIGEN_ALWAYS_INLINE const_blas_data_mapper getSubMapper(Index i, Index j) const { return const_blas_data_mapper(&(this->operator()(i, j)), this->m_stride); } }; /* Helper class to analyze the factors of a Product expression. * In particular it allows to pop out operator-, scalar multiples, * and conjugate */ template struct blas_traits { typedef typename traits::Scalar Scalar; typedef const XprType& ExtractType; typedef XprType _ExtractType; enum { IsComplex = NumTraits::IsComplex, IsTransposed = false, NeedToConjugate = false, HasUsableDirectAccess = ( (int(XprType::Flags)&DirectAccessBit) && ( bool(XprType::IsVectorAtCompileTime) || int(inner_stride_at_compile_time::ret) == 1) ) ? 1 : 0, HasScalarFactor = false }; typedef typename conditional::type DirectLinearAccessType; static inline EIGEN_DEVICE_FUNC ExtractType extract(const XprType& x) { return x; } static inline EIGEN_DEVICE_FUNC const Scalar extractScalarFactor(const XprType&) { return Scalar(1); } }; // pop conjugate template struct blas_traits, NestedXpr> > : blas_traits { typedef blas_traits Base; typedef CwiseUnaryOp, NestedXpr> XprType; typedef typename Base::ExtractType ExtractType; enum { IsComplex = NumTraits::IsComplex, NeedToConjugate = Base::NeedToConjugate ? 0 : IsComplex }; static inline ExtractType extract(const XprType& x) { return Base::extract(x.nestedExpression()); } static inline Scalar extractScalarFactor(const XprType& x) { return conj(Base::extractScalarFactor(x.nestedExpression())); } }; // pop scalar multiple template struct blas_traits, const CwiseNullaryOp,Plain>, NestedXpr> > : blas_traits { enum { HasScalarFactor = true }; typedef blas_traits Base; typedef CwiseBinaryOp, const CwiseNullaryOp,Plain>, NestedXpr> XprType; typedef typename Base::ExtractType ExtractType; static inline EIGEN_DEVICE_FUNC ExtractType extract(const XprType& x) { return Base::extract(x.rhs()); } static inline EIGEN_DEVICE_FUNC Scalar extractScalarFactor(const XprType& x) { return x.lhs().functor().m_other * Base::extractScalarFactor(x.rhs()); } }; template struct blas_traits, NestedXpr, const CwiseNullaryOp,Plain> > > : blas_traits { enum { HasScalarFactor = true }; typedef blas_traits Base; typedef CwiseBinaryOp, NestedXpr, const CwiseNullaryOp,Plain> > XprType; typedef typename Base::ExtractType ExtractType; static inline ExtractType extract(const XprType& x) { return Base::extract(x.lhs()); } static inline Scalar extractScalarFactor(const XprType& x) { return Base::extractScalarFactor(x.lhs()) * x.rhs().functor().m_other; } }; template struct blas_traits, const CwiseNullaryOp,Plain1>, const CwiseNullaryOp,Plain2> > > : blas_traits,Plain1> > {}; // pop opposite template struct blas_traits, NestedXpr> > : blas_traits { enum { HasScalarFactor = true }; typedef blas_traits Base; typedef CwiseUnaryOp, NestedXpr> XprType; typedef typename Base::ExtractType ExtractType; static inline ExtractType extract(const XprType& x) { return Base::extract(x.nestedExpression()); } static inline Scalar extractScalarFactor(const XprType& x) { return - Base::extractScalarFactor(x.nestedExpression()); } }; // pop/push transpose template struct blas_traits > : blas_traits { typedef typename NestedXpr::Scalar Scalar; typedef blas_traits Base; typedef Transpose XprType; typedef Transpose ExtractType; // const to get rid of a compile error; anyway blas traits are only used on the RHS typedef Transpose _ExtractType; typedef typename conditional::type DirectLinearAccessType; enum { IsTransposed = Base::IsTransposed ? 0 : 1 }; static inline ExtractType extract(const XprType& x) { return ExtractType(Base::extract(x.nestedExpression())); } static inline Scalar extractScalarFactor(const XprType& x) { return Base::extractScalarFactor(x.nestedExpression()); } }; template struct blas_traits : blas_traits {}; template::HasUsableDirectAccess> struct extract_data_selector { EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE static const typename T::Scalar* run(const T& m) { return blas_traits::extract(m).data(); } }; template struct extract_data_selector { static typename T::Scalar* run(const T&) { return 0; } }; template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE const typename T::Scalar* extract_data(const T& m) { return extract_data_selector::run(m); } /** * \c combine_scalar_factors extracts and multiplies factors from GEMM and GEMV products. * There is a specialization for booleans */ template struct combine_scalar_factors_impl { EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE static ResScalar run(const Lhs& lhs, const Rhs& rhs) { return blas_traits::extractScalarFactor(lhs) * blas_traits::extractScalarFactor(rhs); } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE static ResScalar run(const ResScalar& alpha, const Lhs& lhs, const Rhs& rhs) { return alpha * blas_traits::extractScalarFactor(lhs) * blas_traits::extractScalarFactor(rhs); } }; template struct combine_scalar_factors_impl { EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE static bool run(const Lhs& lhs, const Rhs& rhs) { return blas_traits::extractScalarFactor(lhs) && blas_traits::extractScalarFactor(rhs); } EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE static bool run(const bool& alpha, const Lhs& lhs, const Rhs& rhs) { return alpha && blas_traits::extractScalarFactor(lhs) && blas_traits::extractScalarFactor(rhs); } }; template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE ResScalar combine_scalar_factors(const ResScalar& alpha, const Lhs& lhs, const Rhs& rhs) { return combine_scalar_factors_impl::run(alpha, lhs, rhs); } template EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE ResScalar combine_scalar_factors(const Lhs& lhs, const Rhs& rhs) { return combine_scalar_factors_impl::run(lhs, rhs); } } // end namespace internal } // end namespace Eigen #endif // EIGEN_BLASUTIL_H RcppEigen/inst/include/Eigen/src/Core/util/IndexedViewHelper.h0000644000176200001440000001505014562417221023742 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2017 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_INDEXED_VIEW_HELPER_H #define EIGEN_INDEXED_VIEW_HELPER_H namespace Eigen { namespace internal { struct symbolic_last_tag {}; } /** \var last * \ingroup Core_Module * * Can be used as a parameter to Eigen::seq and Eigen::seqN functions to symbolically reference the last element/row/columns * of the underlying vector or matrix once passed to DenseBase::operator()(const RowIndices&, const ColIndices&). * * This symbolic placeholder supports standard arithmetic operations. * * A typical usage example would be: * \code * using namespace Eigen; * using Eigen::last; * VectorXd v(n); * v(seq(2,last-2)).setOnes(); * \endcode * * \sa end */ static const symbolic::SymbolExpr last; // PLEASE use Eigen::last instead of Eigen::placeholders::last /** \var lastp1 * \ingroup Core_Module * * Can be used as a parameter to Eigen::seq and Eigen::seqN functions to symbolically * reference the last+1 element/row/columns of the underlying vector or matrix once * passed to DenseBase::operator()(const RowIndices&, const ColIndices&). * * This symbolic placeholder supports standard arithmetic operations. * It is essentially an alias to last+fix<1>. * * \sa last */ #ifdef EIGEN_PARSED_BY_DOXYGEN static const auto lastp1 = last+fix<1>; #else // Using a FixedExpr<1> expression is important here to make sure the compiler // can fully optimize the computation starting indices with zero overhead. static const symbolic::AddExpr,symbolic::ValueExpr > > lastp1(last+fix<1>()); #endif namespace internal { // Replace symbolic last/end "keywords" by their true runtime value inline Index eval_expr_given_size(Index x, Index /* size */) { return x; } template FixedInt eval_expr_given_size(FixedInt x, Index /*size*/) { return x; } template Index eval_expr_given_size(const symbolic::BaseExpr &x, Index size) { return x.derived().eval(last=size-1); } // Extract increment/step at compile time template struct get_compile_time_incr { enum { value = UndefinedIncr }; }; // Analogue of std::get<0>(x), but tailored for our needs. template EIGEN_CONSTEXPR Index first(const T& x) EIGEN_NOEXCEPT { return x.first(); } // IndexedViewCompatibleType/makeIndexedViewCompatible turn an arbitrary object of type T into something usable by MatrixSlice // The generic implementation is a no-op template struct IndexedViewCompatibleType { typedef T type; }; template const T& makeIndexedViewCompatible(const T& x, Index /*size*/, Q) { return x; } //-------------------------------------------------------------------------------- // Handling of a single Index //-------------------------------------------------------------------------------- struct SingleRange { enum { SizeAtCompileTime = 1 }; SingleRange(Index val) : m_value(val) {} Index operator[](Index) const { return m_value; } static EIGEN_CONSTEXPR Index size() EIGEN_NOEXCEPT { return 1; } Index first() const EIGEN_NOEXCEPT { return m_value; } Index m_value; }; template<> struct get_compile_time_incr { enum { value = 1 }; // 1 or 0 ?? }; // Turn a single index into something that looks like an array (i.e., that exposes a .size(), and operator[](int) methods) template struct IndexedViewCompatibleType::value>::type> { // Here we could simply use Array, but maybe it's less work for the compiler to use // a simpler wrapper as SingleRange //typedef Eigen::Array type; typedef SingleRange type; }; template struct IndexedViewCompatibleType::value>::type> { typedef SingleRange type; }; template typename enable_if::value,SingleRange>::type makeIndexedViewCompatible(const T& id, Index size, SpecializedType) { return eval_expr_given_size(id,size); } //-------------------------------------------------------------------------------- // Handling of all //-------------------------------------------------------------------------------- struct all_t { all_t() {} }; // Convert a symbolic 'all' into a usable range type template struct AllRange { enum { SizeAtCompileTime = XprSize }; AllRange(Index size = XprSize) : m_size(size) {} EIGEN_CONSTEXPR Index operator[](Index i) const EIGEN_NOEXCEPT { return i; } EIGEN_CONSTEXPR Index size() const EIGEN_NOEXCEPT { return m_size.value(); } EIGEN_CONSTEXPR Index first() const EIGEN_NOEXCEPT { return 0; } variable_if_dynamic m_size; }; template struct IndexedViewCompatibleType { typedef AllRange type; }; template inline AllRange::value> makeIndexedViewCompatible(all_t , XprSizeType size, SpecializedType) { return AllRange::value>(size); } template struct get_compile_time_incr > { enum { value = 1 }; }; } // end namespace internal /** \var all * \ingroup Core_Module * Can be used as a parameter to DenseBase::operator()(const RowIndices&, const ColIndices&) to index all rows or columns */ static const Eigen::internal::all_t all; // PLEASE use Eigen::all instead of Eigen::placeholders::all namespace placeholders { typedef symbolic::SymbolExpr last_t; typedef symbolic::AddExpr,symbolic::ValueExpr > > end_t; typedef Eigen::internal::all_t all_t; EIGEN_DEPRECATED static const all_t all = Eigen::all; // PLEASE use Eigen::all instead of Eigen::placeholders::all EIGEN_DEPRECATED static const last_t last = Eigen::last; // PLEASE use Eigen::last instead of Eigen::placeholders::last EIGEN_DEPRECATED static const end_t end = Eigen::lastp1; // PLEASE use Eigen::lastp1 instead of Eigen::placeholders::end } } // end namespace Eigen #endif // EIGEN_INDEXED_VIEW_HELPER_H RcppEigen/inst/include/Eigen/src/Core/util/ForwardDeclarations.h0000644000176200001440000003630314562417221024330 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2007-2010 Benoit Jacob // Copyright (C) 2008-2009 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_FORWARDDECLARATIONS_H #define EIGEN_FORWARDDECLARATIONS_H namespace Eigen { namespace internal { template struct traits; // here we say once and for all that traits == traits // When constness must affect traits, it has to be constness on template parameters on which T itself depends. // For example, traits > != traits >, but // traits > == traits > template struct traits : traits {}; template struct has_direct_access { enum { ret = (traits::Flags & DirectAccessBit) ? 1 : 0 }; }; template struct accessors_level { enum { has_direct_access = (traits::Flags & DirectAccessBit) ? 1 : 0, has_write_access = (traits::Flags & LvalueBit) ? 1 : 0, value = has_direct_access ? (has_write_access ? DirectWriteAccessors : DirectAccessors) : (has_write_access ? WriteAccessors : ReadOnlyAccessors) }; }; template struct evaluator_traits; template< typename T> struct evaluator; } // end namespace internal template struct NumTraits; template struct EigenBase; template class DenseBase; template class PlainObjectBase; template class DenseCoeffsBase; template class Matrix; template class MatrixBase; template class ArrayBase; template class Flagged; template class StorageBase > class NoAlias; template class NestByValue; template class ForceAlignedAccess; template class SwapWrapper; template class Block; template class IndexedView; template class Reshaped; template class VectorBlock; template class Transpose; template class Conjugate; template class CwiseNullaryOp; template class CwiseUnaryOp; template class CwiseUnaryView; template class CwiseBinaryOp; template class CwiseTernaryOp; template class Solve; template class Inverse; template class Product; template class DiagonalBase; template class DiagonalWrapper; template class DiagonalMatrix; template class DiagonalProduct; template class Diagonal; template class PermutationMatrix; template class Transpositions; template class PermutationBase; template class TranspositionsBase; template class PermutationWrapper; template class TranspositionsWrapper; template::has_write_access ? WriteAccessors : ReadOnlyAccessors > class MapBase; template class Stride; template class InnerStride; template class OuterStride; template > class Map; template class RefBase; template,OuterStride<> >::type > class Ref; template class TriangularBase; template class TriangularView; template class SelfAdjointView; template class SparseView; template class WithFormat; template struct CommaInitializer; template class ReturnByValue; template class ArrayWrapper; template class MatrixWrapper; template class SolverBase; template class InnerIterator; namespace internal { template class generic_randaccess_stl_iterator; template class pointer_based_stl_iterator; template class subvector_stl_iterator; template class subvector_stl_reverse_iterator; template struct kernel_retval_base; template struct kernel_retval; template struct image_retval_base; template struct image_retval; } // end namespace internal namespace internal { template class BandMatrix; } namespace internal { template struct product_type; template struct EnableIf; /** \internal * \class product_evaluator * Products need their own evaluator with more template arguments allowing for * easier partial template specializations. */ template< typename T, int ProductTag = internal::product_type::ret, typename LhsShape = typename evaluator_traits::Shape, typename RhsShape = typename evaluator_traits::Shape, typename LhsScalar = typename traits::Scalar, typename RhsScalar = typename traits::Scalar > struct product_evaluator; } template::value> struct ProductReturnType; // this is a workaround for sun CC template struct LazyProductReturnType; namespace internal { // Provides scalar/packet-wise product and product with accumulation // with optional conjugation of the arguments. template struct conj_helper; template struct scalar_sum_op; template struct scalar_difference_op; template struct scalar_conj_product_op; template struct scalar_min_op; template struct scalar_max_op; template struct scalar_opposite_op; template struct scalar_conjugate_op; template struct scalar_real_op; template struct scalar_imag_op; template struct scalar_abs_op; template struct scalar_abs2_op; template struct scalar_absolute_difference_op; template struct scalar_sqrt_op; template struct scalar_rsqrt_op; template struct scalar_exp_op; template struct scalar_log_op; template struct scalar_cos_op; template struct scalar_sin_op; template struct scalar_acos_op; template struct scalar_asin_op; template struct scalar_tan_op; template struct scalar_inverse_op; template struct scalar_square_op; template struct scalar_cube_op; template struct scalar_cast_op; template struct scalar_random_op; template struct scalar_constant_op; template struct scalar_identity_op; template struct scalar_sign_op; template struct scalar_pow_op; template struct scalar_hypot_op; template struct scalar_product_op; template struct scalar_quotient_op; // SpecialFunctions module template struct scalar_lgamma_op; template struct scalar_digamma_op; template struct scalar_erf_op; template struct scalar_erfc_op; template struct scalar_ndtri_op; template struct scalar_igamma_op; template struct scalar_igammac_op; template struct scalar_zeta_op; template struct scalar_betainc_op; // Bessel functions in SpecialFunctions module template struct scalar_bessel_i0_op; template struct scalar_bessel_i0e_op; template struct scalar_bessel_i1_op; template struct scalar_bessel_i1e_op; template struct scalar_bessel_j0_op; template struct scalar_bessel_y0_op; template struct scalar_bessel_j1_op; template struct scalar_bessel_y1_op; template struct scalar_bessel_k0_op; template struct scalar_bessel_k0e_op; template struct scalar_bessel_k1_op; template struct scalar_bessel_k1e_op; } // end namespace internal struct IOFormat; // Array module template class Array; template class Select; template class PartialReduxExpr; template class VectorwiseOp; template class Replicate; template class Reverse; template class FullPivLU; template class PartialPivLU; namespace internal { template struct inverse_impl; } template class HouseholderQR; template class ColPivHouseholderQR; template class FullPivHouseholderQR; template class CompleteOrthogonalDecomposition; template class SVDBase; template class JacobiSVD; template class BDCSVD; template class LLT; template class LDLT; template class HouseholderSequence; template class JacobiRotation; // Geometry module: template class RotationBase; template class Cross; template class QuaternionBase; template class Rotation2D; template class AngleAxis; template class Translation; template class AlignedBox; template class Quaternion; template class Transform; template class ParametrizedLine; template class Hyperplane; template class UniformScaling; template class Homogeneous; // Sparse module: template class SparseMatrixBase; // MatrixFunctions module template struct MatrixExponentialReturnValue; template class MatrixFunctionReturnValue; template class MatrixSquareRootReturnValue; template class MatrixLogarithmReturnValue; template class MatrixPowerReturnValue; template class MatrixComplexPowerReturnValue; namespace internal { template struct stem_function { typedef std::complex::Real> ComplexScalar; typedef ComplexScalar type(ComplexScalar, int); }; } } // end namespace Eigen #endif // EIGEN_FORWARDDECLARATIONS_H RcppEigen/inst/include/Eigen/src/Core/util/DisableStupidWarnings.h0000755000176200001440000001157114562417221024643 0ustar liggesusers#ifndef EIGEN_WARNINGS_DISABLED #define EIGEN_WARNINGS_DISABLED #ifdef _MSC_VER // 4100 - unreferenced formal parameter (occurred e.g. in aligned_allocator::destroy(pointer p)) // 4101 - unreferenced local variable // 4181 - qualifier applied to reference type ignored // 4211 - nonstandard extension used : redefined extern to static // 4244 - 'argument' : conversion from 'type1' to 'type2', possible loss of data // 4273 - QtAlignedMalloc, inconsistent DLL linkage // 4324 - structure was padded due to declspec(align()) // 4503 - decorated name length exceeded, name was truncated // 4512 - assignment operator could not be generated // 4522 - 'class' : multiple assignment operators specified // 4700 - uninitialized local variable 'xyz' used // 4714 - function marked as __forceinline not inlined // 4717 - 'function' : recursive on all control paths, function will cause runtime stack overflow // 4800 - 'type' : forcing value to bool 'true' or 'false' (performance warning) #ifndef EIGEN_PERMANENTLY_DISABLE_STUPID_WARNINGS #pragma warning( push ) #endif #pragma warning( disable : 4100 4101 4181 4211 4244 4273 4324 4503 4512 4522 4700 4714 4717 4800) #elif defined __INTEL_COMPILER // 2196 - routine is both "inline" and "noinline" ("noinline" assumed) // ICC 12 generates this warning even without any inline keyword, when defining class methods 'inline' i.e. inside of class body // typedef that may be a reference type. // 279 - controlling expression is constant // ICC 12 generates this warning on assert(constant_expression_depending_on_template_params) and frankly this is a legitimate use case. // 1684 - conversion from pointer to same-sized integral type (potential portability problem) // 2259 - non-pointer conversion from "Eigen::Index={ptrdiff_t={long}}" to "int" may lose significant bits #ifndef EIGEN_PERMANENTLY_DISABLE_STUPID_WARNINGS #pragma warning push #endif #pragma warning disable 2196 279 1684 2259 #elif defined __clang__ // -Wconstant-logical-operand - warning: use of logical && with constant operand; switch to bitwise & or remove constant // this is really a stupid warning as it warns on compile-time expressions involving enums // #ifndef EIGEN_PERMANENTLY_DISABLE_STUPID_WARNINGS // #pragma clang diagnostic push // #endif // #pragma clang diagnostic ignored "-Wconstant-logical-operand" // #if __clang_major__ >= 3 && __clang_minor__ >= 5 // #pragma clang diagnostic ignored "-Wabsolute-value" // #endif // #if __clang_major__ >= 10 // #pragma clang diagnostic ignored "-Wimplicit-int-float-conversion" // #endif // #if ( defined(__ALTIVEC__) || defined(__VSX__) ) && __cplusplus < 201103L // // warning: generic selections are a C11-specific feature // // ignoring warnings thrown at vec_ctf in Altivec/PacketMath.h // #pragma clang diagnostic ignored "-Wc11-extensions" // #endif #elif defined __GNUC__ // #if (!defined(EIGEN_PERMANENTLY_DISABLE_STUPID_WARNINGS)) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)) // #pragma GCC diagnostic push // #endif // // g++ warns about local variables shadowing member functions, which is too strict // #pragma GCC diagnostic ignored "-Wshadow" // #if __GNUC__ == 4 && __GNUC_MINOR__ < 8 // // Until g++-4.7 there are warnings when comparing unsigned int vs 0, even in templated functions: // #pragma GCC diagnostic ignored "-Wtype-limits" // #endif // #if __GNUC__>=6 // #pragma GCC diagnostic ignored "-Wignored-attributes" // #endif // #if __GNUC__==7 // // See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89325 // #pragma GCC diagnostic ignored "-Wattributes" // #endif #endif #if defined __NVCC__ #pragma diag_suppress boolean_controlling_expr_is_constant // Disable the "statement is unreachable" message #pragma diag_suppress code_is_unreachable // Disable the "dynamic initialization in unreachable code" message #pragma diag_suppress initialization_not_reachable // Disable the "invalid error number" message that we get with older versions of nvcc #pragma diag_suppress 1222 // Disable the "calling a __host__ function from a __host__ __device__ function is not allowed" messages (yes, there are many of them and they seem to change with every version of the compiler) #pragma diag_suppress 2527 #pragma diag_suppress 2529 #pragma diag_suppress 2651 #pragma diag_suppress 2653 #pragma diag_suppress 2668 #pragma diag_suppress 2669 #pragma diag_suppress 2670 #pragma diag_suppress 2671 #pragma diag_suppress 2735 #pragma diag_suppress 2737 #pragma diag_suppress 2739 #endif #else // warnings already disabled: # ifndef EIGEN_WARNINGS_DISABLED_2 # define EIGEN_WARNINGS_DISABLED_2 # elif defined(EIGEN_INTERNAL_DEBUGGING) # error "Do not include \"DisableStupidWarnings.h\" recursively more than twice!" # endif #endif // not EIGEN_WARNINGS_DISABLED RcppEigen/inst/include/Eigen/src/Core/util/ConfigureVectorization.h0000644000176200001440000004664414562417221025106 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2018 Gael Guennebaud // Copyright (C) 2020, Arm Limited and Contributors // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CONFIGURE_VECTORIZATION_H #define EIGEN_CONFIGURE_VECTORIZATION_H //------------------------------------------------------------------------------------------ // Static and dynamic alignment control // // The main purpose of this section is to define EIGEN_MAX_ALIGN_BYTES and EIGEN_MAX_STATIC_ALIGN_BYTES // as the maximal boundary in bytes on which dynamically and statically allocated data may be alignment respectively. // The values of EIGEN_MAX_ALIGN_BYTES and EIGEN_MAX_STATIC_ALIGN_BYTES can be specified by the user. If not, // a default value is automatically computed based on architecture, compiler, and OS. // // This section also defines macros EIGEN_ALIGN_TO_BOUNDARY(N) and the shortcuts EIGEN_ALIGN{8,16,32,_MAX} // to be used to declare statically aligned buffers. //------------------------------------------------------------------------------------------ /* EIGEN_ALIGN_TO_BOUNDARY(n) forces data to be n-byte aligned. This is used to satisfy SIMD requirements. * However, we do that EVEN if vectorization (EIGEN_VECTORIZE) is disabled, * so that vectorization doesn't affect binary compatibility. * * If we made alignment depend on whether or not EIGEN_VECTORIZE is defined, it would be impossible to link * vectorized and non-vectorized code. * * FIXME: this code can be cleaned up once we switch to proper C++11 only. */ #if (defined EIGEN_CUDACC) #define EIGEN_ALIGN_TO_BOUNDARY(n) __align__(n) #define EIGEN_ALIGNOF(x) __alignof(x) #elif EIGEN_HAS_ALIGNAS #define EIGEN_ALIGN_TO_BOUNDARY(n) alignas(n) #define EIGEN_ALIGNOF(x) alignof(x) #elif EIGEN_COMP_GNUC || EIGEN_COMP_PGI || EIGEN_COMP_IBM || EIGEN_COMP_ARM #define EIGEN_ALIGN_TO_BOUNDARY(n) __attribute__((aligned(n))) #define EIGEN_ALIGNOF(x) __alignof(x) #elif EIGEN_COMP_MSVC #define EIGEN_ALIGN_TO_BOUNDARY(n) __declspec(align(n)) #define EIGEN_ALIGNOF(x) __alignof(x) #elif EIGEN_COMP_SUNCC // FIXME not sure about this one: #define EIGEN_ALIGN_TO_BOUNDARY(n) __attribute__((aligned(n))) #define EIGEN_ALIGNOF(x) __alignof(x) #else #error Please tell me what is the equivalent of alignas(n) and alignof(x) for your compiler #endif // If the user explicitly disable vectorization, then we also disable alignment #if defined(EIGEN_DONT_VECTORIZE) #if defined(EIGEN_GPUCC) // GPU code is always vectorized and requires memory alignment for // statically allocated buffers. #define EIGEN_IDEAL_MAX_ALIGN_BYTES 16 #else #define EIGEN_IDEAL_MAX_ALIGN_BYTES 0 #endif #elif defined(__AVX512F__) // 64 bytes static alignment is preferred only if really required #define EIGEN_IDEAL_MAX_ALIGN_BYTES 64 #elif defined(__AVX__) // 32 bytes static alignment is preferred only if really required #define EIGEN_IDEAL_MAX_ALIGN_BYTES 32 #else #define EIGEN_IDEAL_MAX_ALIGN_BYTES 16 #endif // EIGEN_MIN_ALIGN_BYTES defines the minimal value for which the notion of explicit alignment makes sense #define EIGEN_MIN_ALIGN_BYTES 16 // Defined the boundary (in bytes) on which the data needs to be aligned. Note // that unless EIGEN_ALIGN is defined and not equal to 0, the data may not be // aligned at all regardless of the value of this #define. #if (defined(EIGEN_DONT_ALIGN_STATICALLY) || defined(EIGEN_DONT_ALIGN)) && defined(EIGEN_MAX_STATIC_ALIGN_BYTES) && EIGEN_MAX_STATIC_ALIGN_BYTES>0 #error EIGEN_MAX_STATIC_ALIGN_BYTES and EIGEN_DONT_ALIGN[_STATICALLY] are both defined with EIGEN_MAX_STATIC_ALIGN_BYTES!=0. Use EIGEN_MAX_STATIC_ALIGN_BYTES=0 as a synonym of EIGEN_DONT_ALIGN_STATICALLY. #endif // EIGEN_DONT_ALIGN_STATICALLY and EIGEN_DONT_ALIGN are deprecated // They imply EIGEN_MAX_STATIC_ALIGN_BYTES=0 #if defined(EIGEN_DONT_ALIGN_STATICALLY) || defined(EIGEN_DONT_ALIGN) #ifdef EIGEN_MAX_STATIC_ALIGN_BYTES #undef EIGEN_MAX_STATIC_ALIGN_BYTES #endif #define EIGEN_MAX_STATIC_ALIGN_BYTES 0 #endif #ifndef EIGEN_MAX_STATIC_ALIGN_BYTES // Try to automatically guess what is the best default value for EIGEN_MAX_STATIC_ALIGN_BYTES // 16 byte alignment is only useful for vectorization. Since it affects the ABI, we need to enable // 16 byte alignment on all platforms where vectorization might be enabled. In theory we could always // enable alignment, but it can be a cause of problems on some platforms, so we just disable it in // certain common platform (compiler+architecture combinations) to avoid these problems. // Only static alignment is really problematic (relies on nonstandard compiler extensions), // try to keep heap alignment even when we have to disable static alignment. #if EIGEN_COMP_GNUC && !(EIGEN_ARCH_i386_OR_x86_64 || EIGEN_ARCH_ARM_OR_ARM64 || EIGEN_ARCH_PPC || EIGEN_ARCH_IA64 || EIGEN_ARCH_MIPS) #define EIGEN_GCC_AND_ARCH_DOESNT_WANT_STACK_ALIGNMENT 1 #elif EIGEN_ARCH_ARM_OR_ARM64 && EIGEN_COMP_GNUC_STRICT && EIGEN_GNUC_AT_MOST(4, 6) // Old versions of GCC on ARM, at least 4.4, were once seen to have buggy static alignment support. // Not sure which version fixed it, hopefully it doesn't affect 4.7, which is still somewhat in use. // 4.8 and newer seem definitely unaffected. #define EIGEN_GCC_AND_ARCH_DOESNT_WANT_STACK_ALIGNMENT 1 #else #define EIGEN_GCC_AND_ARCH_DOESNT_WANT_STACK_ALIGNMENT 0 #endif // static alignment is completely disabled with GCC 3, Sun Studio, and QCC/QNX #if !EIGEN_GCC_AND_ARCH_DOESNT_WANT_STACK_ALIGNMENT \ && !EIGEN_GCC3_OR_OLDER \ && !EIGEN_COMP_SUNCC \ && !EIGEN_OS_QNX #define EIGEN_ARCH_WANTS_STACK_ALIGNMENT 1 #else #define EIGEN_ARCH_WANTS_STACK_ALIGNMENT 0 #endif #if EIGEN_ARCH_WANTS_STACK_ALIGNMENT #define EIGEN_MAX_STATIC_ALIGN_BYTES EIGEN_IDEAL_MAX_ALIGN_BYTES #else #define EIGEN_MAX_STATIC_ALIGN_BYTES 0 #endif #endif // If EIGEN_MAX_ALIGN_BYTES is defined, then it is considered as an upper bound for EIGEN_MAX_STATIC_ALIGN_BYTES #if defined(EIGEN_MAX_ALIGN_BYTES) && EIGEN_MAX_ALIGN_BYTES0 is the true test whether we want to align arrays on the stack or not. // It takes into account both the user choice to explicitly enable/disable alignment (by setting EIGEN_MAX_STATIC_ALIGN_BYTES) // and the architecture config (EIGEN_ARCH_WANTS_STACK_ALIGNMENT). // Henceforth, only EIGEN_MAX_STATIC_ALIGN_BYTES should be used. // Shortcuts to EIGEN_ALIGN_TO_BOUNDARY #define EIGEN_ALIGN8 EIGEN_ALIGN_TO_BOUNDARY(8) #define EIGEN_ALIGN16 EIGEN_ALIGN_TO_BOUNDARY(16) #define EIGEN_ALIGN32 EIGEN_ALIGN_TO_BOUNDARY(32) #define EIGEN_ALIGN64 EIGEN_ALIGN_TO_BOUNDARY(64) #if EIGEN_MAX_STATIC_ALIGN_BYTES>0 #define EIGEN_ALIGN_MAX EIGEN_ALIGN_TO_BOUNDARY(EIGEN_MAX_STATIC_ALIGN_BYTES) #else #define EIGEN_ALIGN_MAX #endif // Dynamic alignment control #if defined(EIGEN_DONT_ALIGN) && defined(EIGEN_MAX_ALIGN_BYTES) && EIGEN_MAX_ALIGN_BYTES>0 #error EIGEN_MAX_ALIGN_BYTES and EIGEN_DONT_ALIGN are both defined with EIGEN_MAX_ALIGN_BYTES!=0. Use EIGEN_MAX_ALIGN_BYTES=0 as a synonym of EIGEN_DONT_ALIGN. #endif #ifdef EIGEN_DONT_ALIGN #ifdef EIGEN_MAX_ALIGN_BYTES #undef EIGEN_MAX_ALIGN_BYTES #endif #define EIGEN_MAX_ALIGN_BYTES 0 #elif !defined(EIGEN_MAX_ALIGN_BYTES) #define EIGEN_MAX_ALIGN_BYTES EIGEN_IDEAL_MAX_ALIGN_BYTES #endif #if EIGEN_IDEAL_MAX_ALIGN_BYTES > EIGEN_MAX_ALIGN_BYTES #define EIGEN_DEFAULT_ALIGN_BYTES EIGEN_IDEAL_MAX_ALIGN_BYTES #else #define EIGEN_DEFAULT_ALIGN_BYTES EIGEN_MAX_ALIGN_BYTES #endif #ifndef EIGEN_UNALIGNED_VECTORIZE #define EIGEN_UNALIGNED_VECTORIZE 1 #endif //---------------------------------------------------------------------- // if alignment is disabled, then disable vectorization. Note: EIGEN_MAX_ALIGN_BYTES is the proper check, it takes into // account both the user's will (EIGEN_MAX_ALIGN_BYTES,EIGEN_DONT_ALIGN) and our own platform checks #if EIGEN_MAX_ALIGN_BYTES==0 #ifndef EIGEN_DONT_VECTORIZE #define EIGEN_DONT_VECTORIZE #endif #endif // The following (except #include and _M_IX86_FP ??) can likely be // removed as gcc 4.1 and msvc 2008 are not supported anyways. #if EIGEN_COMP_MSVC #include // for _aligned_malloc -- need it regardless of whether vectorization is enabled #if (EIGEN_COMP_MSVC >= 1500) // 2008 or later // a user reported that in 64-bit mode, MSVC doesn't care to define _M_IX86_FP. #if (defined(_M_IX86_FP) && (_M_IX86_FP >= 2)) || EIGEN_ARCH_x86_64 #define EIGEN_SSE2_ON_MSVC_2008_OR_LATER #endif #endif #else #if (defined __SSE2__) && ( (!EIGEN_COMP_GNUC) || EIGEN_COMP_ICC || EIGEN_GNUC_AT_LEAST(4,2) ) #define EIGEN_SSE2_ON_NON_MSVC_BUT_NOT_OLD_GCC #endif #endif #if !(defined(EIGEN_DONT_VECTORIZE) || defined(EIGEN_GPUCC)) #if defined (EIGEN_SSE2_ON_NON_MSVC_BUT_NOT_OLD_GCC) || defined(EIGEN_SSE2_ON_MSVC_2008_OR_LATER) // Defines symbols for compile-time detection of which instructions are // used. // EIGEN_VECTORIZE_YY is defined if and only if the instruction set YY is used #define EIGEN_VECTORIZE #define EIGEN_VECTORIZE_SSE #define EIGEN_VECTORIZE_SSE2 // Detect sse3/ssse3/sse4: // gcc and icc defines __SSE3__, ... // there is no way to know about this on msvc. You can define EIGEN_VECTORIZE_SSE* if you // want to force the use of those instructions with msvc. #ifdef __SSE3__ #define EIGEN_VECTORIZE_SSE3 #endif #ifdef __SSSE3__ #define EIGEN_VECTORIZE_SSSE3 #endif #ifdef __SSE4_1__ #define EIGEN_VECTORIZE_SSE4_1 #endif #ifdef __SSE4_2__ #define EIGEN_VECTORIZE_SSE4_2 #endif #ifdef __AVX__ #ifndef EIGEN_USE_SYCL #define EIGEN_VECTORIZE_AVX #endif #define EIGEN_VECTORIZE_SSE3 #define EIGEN_VECTORIZE_SSSE3 #define EIGEN_VECTORIZE_SSE4_1 #define EIGEN_VECTORIZE_SSE4_2 #endif #ifdef __AVX2__ #ifndef EIGEN_USE_SYCL #define EIGEN_VECTORIZE_AVX2 #define EIGEN_VECTORIZE_AVX #endif #define EIGEN_VECTORIZE_SSE3 #define EIGEN_VECTORIZE_SSSE3 #define EIGEN_VECTORIZE_SSE4_1 #define EIGEN_VECTORIZE_SSE4_2 #endif #if defined(__FMA__) || (EIGEN_COMP_MSVC && defined(__AVX2__)) // MSVC does not expose a switch dedicated for FMA // For MSVC, AVX2 => FMA #define EIGEN_VECTORIZE_FMA #endif #if defined(__AVX512F__) #ifndef EIGEN_VECTORIZE_FMA #if EIGEN_COMP_GNUC #error Please add -mfma to your compiler flags: compiling with -mavx512f alone without SSE/AVX FMA is not supported (bug 1638). #else #error Please enable FMA in your compiler flags (e.g. -mfma): compiling with AVX512 alone without SSE/AVX FMA is not supported (bug 1638). #endif #endif #ifndef EIGEN_USE_SYCL #define EIGEN_VECTORIZE_AVX512 #define EIGEN_VECTORIZE_AVX2 #define EIGEN_VECTORIZE_AVX #endif #define EIGEN_VECTORIZE_FMA #define EIGEN_VECTORIZE_SSE3 #define EIGEN_VECTORIZE_SSSE3 #define EIGEN_VECTORIZE_SSE4_1 #define EIGEN_VECTORIZE_SSE4_2 #ifndef EIGEN_USE_SYCL #ifdef __AVX512DQ__ #define EIGEN_VECTORIZE_AVX512DQ #endif #ifdef __AVX512ER__ #define EIGEN_VECTORIZE_AVX512ER #endif #ifdef __AVX512BF16__ #define EIGEN_VECTORIZE_AVX512BF16 #endif #endif #endif // Disable AVX support on broken xcode versions #if defined(__apple_build_version__) && (__apple_build_version__ == 11000033 ) && ( __MAC_OS_X_VERSION_MIN_REQUIRED == 101500 ) // A nasty bug in the clang compiler shipped with xcode in a common compilation situation // when XCode 11.0 and Mac deployment target macOS 10.15 is https://trac.macports.org/ticket/58776#no1 #ifdef EIGEN_VECTORIZE_AVX #undef EIGEN_VECTORIZE_AVX #warning "Disabling AVX support: clang compiler shipped with XCode 11.[012] generates broken assembly with -macosx-version-min=10.15 and AVX enabled. " #ifdef EIGEN_VECTORIZE_AVX2 #undef EIGEN_VECTORIZE_AVX2 #endif #ifdef EIGEN_VECTORIZE_FMA #undef EIGEN_VECTORIZE_FMA #endif #ifdef EIGEN_VECTORIZE_AVX512 #undef EIGEN_VECTORIZE_AVX512 #endif #ifdef EIGEN_VECTORIZE_AVX512DQ #undef EIGEN_VECTORIZE_AVX512DQ #endif #ifdef EIGEN_VECTORIZE_AVX512ER #undef EIGEN_VECTORIZE_AVX512ER #endif #endif // NOTE: Confirmed test failures in XCode 11.0, and XCode 11.2 with -macosx-version-min=10.15 and AVX // NOTE using -macosx-version-min=10.15 with Xcode 11.0 results in runtime segmentation faults in many tests, 11.2 produce core dumps in 3 tests // NOTE using -macosx-version-min=10.14 produces functioning and passing tests in all cases // NOTE __clang_version__ "11.0.0 (clang-1100.0.33.8)" XCode 11.0 <- Produces many segfault and core dumping tests // with -macosx-version-min=10.15 and AVX // NOTE __clang_version__ "11.0.0 (clang-1100.0.33.12)" XCode 11.2 <- Produces 3 core dumping tests with // -macosx-version-min=10.15 and AVX #endif // include files // This extern "C" works around a MINGW-w64 compilation issue // https://sourceforge.net/tracker/index.php?func=detail&aid=3018394&group_id=202880&atid=983354 // In essence, intrin.h is included by windows.h and also declares intrinsics (just as emmintrin.h etc. below do). // However, intrin.h uses an extern "C" declaration, and g++ thus complains of duplicate declarations // with conflicting linkage. The linkage for intrinsics doesn't matter, but at that stage the compiler doesn't know; // so, to avoid compile errors when windows.h is included after Eigen/Core, ensure intrinsics are extern "C" here too. // notice that since these are C headers, the extern "C" is theoretically needed anyways. extern "C" { // In theory we should only include immintrin.h and not the other *mmintrin.h header files directly. // Doing so triggers some issues with ICC. However old gcc versions seems to not have this file, thus: #if EIGEN_COMP_ICC >= 1110 #include #else #include #include #include #ifdef EIGEN_VECTORIZE_SSE3 #include #endif #ifdef EIGEN_VECTORIZE_SSSE3 #include #endif #ifdef EIGEN_VECTORIZE_SSE4_1 #include #endif #ifdef EIGEN_VECTORIZE_SSE4_2 #include #endif #if defined(EIGEN_VECTORIZE_AVX) || defined(EIGEN_VECTORIZE_AVX512) #include #endif #endif } // end extern "C" #elif defined __VSX__ #define EIGEN_VECTORIZE #define EIGEN_VECTORIZE_VSX #include // We need to #undef all these ugly tokens defined in // => use __vector instead of vector #undef bool #undef vector #undef pixel #elif defined __ALTIVEC__ #define EIGEN_VECTORIZE #define EIGEN_VECTORIZE_ALTIVEC #include // We need to #undef all these ugly tokens defined in // => use __vector instead of vector #undef bool #undef vector #undef pixel #elif ((defined __ARM_NEON) || (defined __ARM_NEON__)) && !(defined EIGEN_ARM64_USE_SVE) #define EIGEN_VECTORIZE #define EIGEN_VECTORIZE_NEON #include // We currently require SVE to be enabled explicitly via EIGEN_ARM64_USE_SVE and // will not select the backend automatically #elif (defined __ARM_FEATURE_SVE) && (defined EIGEN_ARM64_USE_SVE) #define EIGEN_VECTORIZE #define EIGEN_VECTORIZE_SVE #include // Since we depend on knowing SVE vector lengths at compile-time, we need // to ensure a fixed lengths is set #if defined __ARM_FEATURE_SVE_BITS #define EIGEN_ARM64_SVE_VL __ARM_FEATURE_SVE_BITS #else #error "Eigen requires a fixed SVE lector length but EIGEN_ARM64_SVE_VL is not set." #endif #elif (defined __s390x__ && defined __VEC__) #define EIGEN_VECTORIZE #define EIGEN_VECTORIZE_ZVECTOR #include #elif defined __mips_msa // Limit MSA optimizations to little-endian CPUs for now. // TODO: Perhaps, eventually support MSA optimizations on big-endian CPUs? #if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__) #if defined(__LP64__) #define EIGEN_MIPS_64 #else #define EIGEN_MIPS_32 #endif #define EIGEN_VECTORIZE #define EIGEN_VECTORIZE_MSA #include #endif #endif #endif // Following the Arm ACLE arm_neon.h should also include arm_fp16.h but not all // compilers seem to follow this. We therefore include it explicitly. // See also: https://bugs.llvm.org/show_bug.cgi?id=47955 #if defined(EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC) #include #endif #if defined(__F16C__) && (!defined(EIGEN_GPUCC) && (!defined(EIGEN_COMP_CLANG) || EIGEN_COMP_CLANG>=380)) // We can use the optimized fp16 to float and float to fp16 conversion routines #define EIGEN_HAS_FP16_C #if defined(EIGEN_COMP_CLANG) // Workaround for clang: The FP16C intrinsics for clang are included by // immintrin.h, as opposed to emmintrin.h as suggested by Intel: // https://software.intel.com/sites/landingpage/IntrinsicsGuide/#othertechs=FP16C&expand=1711 #include #endif #endif #if defined EIGEN_CUDACC #define EIGEN_VECTORIZE_GPU #include #if EIGEN_CUDA_SDK_VER >= 70500 #define EIGEN_HAS_CUDA_FP16 #endif #endif #if defined(EIGEN_HAS_CUDA_FP16) #include #include #endif #if defined(EIGEN_HIPCC) #define EIGEN_VECTORIZE_GPU #include #define EIGEN_HAS_HIP_FP16 #include #endif /** \brief Namespace containing all symbols from the %Eigen library. */ namespace Eigen { inline static const char *SimdInstructionSetsInUse(void) { #if defined(EIGEN_VECTORIZE_AVX512) return "AVX512, FMA, AVX2, AVX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2"; #elif defined(EIGEN_VECTORIZE_AVX) return "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2"; #elif defined(EIGEN_VECTORIZE_SSE4_2) return "SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2"; #elif defined(EIGEN_VECTORIZE_SSE4_1) return "SSE, SSE2, SSE3, SSSE3, SSE4.1"; #elif defined(EIGEN_VECTORIZE_SSSE3) return "SSE, SSE2, SSE3, SSSE3"; #elif defined(EIGEN_VECTORIZE_SSE3) return "SSE, SSE2, SSE3"; #elif defined(EIGEN_VECTORIZE_SSE2) return "SSE, SSE2"; #elif defined(EIGEN_VECTORIZE_ALTIVEC) return "AltiVec"; #elif defined(EIGEN_VECTORIZE_VSX) return "VSX"; #elif defined(EIGEN_VECTORIZE_NEON) return "ARM NEON"; #elif defined(EIGEN_VECTORIZE_SVE) return "ARM SVE"; #elif defined(EIGEN_VECTORIZE_ZVECTOR) return "S390X ZVECTOR"; #elif defined(EIGEN_VECTORIZE_MSA) return "MIPS MSA"; #else return "None"; #endif } } // end namespace Eigen #endif // EIGEN_CONFIGURE_VECTORIZATION_H RcppEigen/inst/include/Eigen/src/Core/util/Macros.h0000644000176200001440000014725514562417221021630 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2015 Gael Guennebaud // Copyright (C) 2006-2008 Benoit Jacob // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_MACROS_H #define EIGEN_MACROS_H //------------------------------------------------------------------------------------------ // Eigen version and basic defaults //------------------------------------------------------------------------------------------ #define EIGEN_WORLD_VERSION 3 #define EIGEN_MAJOR_VERSION 4 #define EIGEN_MINOR_VERSION 0 #define EIGEN_VERSION_AT_LEAST(x,y,z) (EIGEN_WORLD_VERSION>x || (EIGEN_WORLD_VERSION>=x && \ (EIGEN_MAJOR_VERSION>y || (EIGEN_MAJOR_VERSION>=y && \ EIGEN_MINOR_VERSION>=z)))) #ifdef EIGEN_DEFAULT_TO_ROW_MAJOR #define EIGEN_DEFAULT_MATRIX_STORAGE_ORDER_OPTION Eigen::RowMajor #else #define EIGEN_DEFAULT_MATRIX_STORAGE_ORDER_OPTION Eigen::ColMajor #endif #ifndef EIGEN_DEFAULT_DENSE_INDEX_TYPE #define EIGEN_DEFAULT_DENSE_INDEX_TYPE std::ptrdiff_t #endif // Upperbound on the C++ version to use. // Expected values are 03, 11, 14, 17, etc. // By default, let's use an arbitrarily large C++ version. #ifndef EIGEN_MAX_CPP_VER #define EIGEN_MAX_CPP_VER 99 #endif /** Allows to disable some optimizations which might affect the accuracy of the result. * Such optimization are enabled by default, and set EIGEN_FAST_MATH to 0 to disable them. * They currently include: * - single precision ArrayBase::sin() and ArrayBase::cos() for SSE and AVX vectorization. */ #ifndef EIGEN_FAST_MATH #define EIGEN_FAST_MATH 1 #endif #ifndef EIGEN_STACK_ALLOCATION_LIMIT // 131072 == 128 KB #define EIGEN_STACK_ALLOCATION_LIMIT 131072 #endif //------------------------------------------------------------------------------------------ // Compiler identification, EIGEN_COMP_* //------------------------------------------------------------------------------------------ /// \internal EIGEN_COMP_GNUC set to 1 for all compilers compatible with GCC #ifdef __GNUC__ #define EIGEN_COMP_GNUC (__GNUC__*10+__GNUC_MINOR__) #else #define EIGEN_COMP_GNUC 0 #endif /// \internal EIGEN_COMP_CLANG set to major+minor version (e.g., 307 for clang 3.7) if the compiler is clang #if defined(__clang__) #define EIGEN_COMP_CLANG (__clang_major__*100+__clang_minor__) #else #define EIGEN_COMP_CLANG 0 #endif /// \internal EIGEN_COMP_CASTXML set to 1 if being preprocessed by CastXML #if defined(__castxml__) #define EIGEN_COMP_CASTXML 1 #else #define EIGEN_COMP_CASTXML 0 #endif /// \internal EIGEN_COMP_LLVM set to 1 if the compiler backend is llvm #if defined(__llvm__) #define EIGEN_COMP_LLVM 1 #else #define EIGEN_COMP_LLVM 0 #endif /// \internal EIGEN_COMP_ICC set to __INTEL_COMPILER if the compiler is Intel compiler, 0 otherwise #if defined(__INTEL_COMPILER) #define EIGEN_COMP_ICC __INTEL_COMPILER #else #define EIGEN_COMP_ICC 0 #endif /// \internal EIGEN_COMP_MINGW set to 1 if the compiler is mingw #if defined(__MINGW32__) #define EIGEN_COMP_MINGW 1 #else #define EIGEN_COMP_MINGW 0 #endif /// \internal EIGEN_COMP_SUNCC set to 1 if the compiler is Solaris Studio #if defined(__SUNPRO_CC) #define EIGEN_COMP_SUNCC 1 #else #define EIGEN_COMP_SUNCC 0 #endif /// \internal EIGEN_COMP_MSVC set to _MSC_VER if the compiler is Microsoft Visual C++, 0 otherwise. #if defined(_MSC_VER) #define EIGEN_COMP_MSVC _MSC_VER #else #define EIGEN_COMP_MSVC 0 #endif #if defined(__NVCC__) #if defined(__CUDACC_VER_MAJOR__) && (__CUDACC_VER_MAJOR__ >= 9) #define EIGEN_COMP_NVCC ((__CUDACC_VER_MAJOR__ * 10000) + (__CUDACC_VER_MINOR__ * 100)) #elif defined(__CUDACC_VER__) #define EIGEN_COMP_NVCC __CUDACC_VER__ #else #error "NVCC did not define compiler version." #endif #else #define EIGEN_COMP_NVCC 0 #endif // For the record, here is a table summarizing the possible values for EIGEN_COMP_MSVC: // name ver MSC_VER // 2008 9 1500 // 2010 10 1600 // 2012 11 1700 // 2013 12 1800 // 2015 14 1900 // "15" 15 1900 // 2017-14.1 15.0 1910 // 2017-14.11 15.3 1911 // 2017-14.12 15.5 1912 // 2017-14.13 15.6 1913 // 2017-14.14 15.7 1914 /// \internal EIGEN_COMP_MSVC_LANG set to _MSVC_LANG if the compiler is Microsoft Visual C++, 0 otherwise. #if defined(_MSVC_LANG) #define EIGEN_COMP_MSVC_LANG _MSVC_LANG #else #define EIGEN_COMP_MSVC_LANG 0 #endif // For the record, here is a table summarizing the possible values for EIGEN_COMP_MSVC_LANG: // MSVC option Standard MSVC_LANG // /std:c++14 (default as of VS 2019) C++14 201402L // /std:c++17 C++17 201703L // /std:c++latest >C++17 >201703L /// \internal EIGEN_COMP_MSVC_STRICT set to 1 if the compiler is really Microsoft Visual C++ and not ,e.g., ICC or clang-cl #if EIGEN_COMP_MSVC && !(EIGEN_COMP_ICC || EIGEN_COMP_LLVM || EIGEN_COMP_CLANG) #define EIGEN_COMP_MSVC_STRICT _MSC_VER #else #define EIGEN_COMP_MSVC_STRICT 0 #endif /// \internal EIGEN_COMP_IBM set to xlc version if the compiler is IBM XL C++ // XLC version // 3.1 0x0301 // 4.5 0x0405 // 5.0 0x0500 // 12.1 0x0C01 #if defined(__IBMCPP__) || defined(__xlc__) || defined(__ibmxl__) #define EIGEN_COMP_IBM __xlC__ #else #define EIGEN_COMP_IBM 0 #endif /// \internal EIGEN_COMP_PGI set to PGI version if the compiler is Portland Group Compiler #if defined(__PGI) #define EIGEN_COMP_PGI (__PGIC__*100+__PGIC_MINOR__) #else #define EIGEN_COMP_PGI 0 #endif /// \internal EIGEN_COMP_ARM set to 1 if the compiler is ARM Compiler #if defined(__CC_ARM) || defined(__ARMCC_VERSION) #define EIGEN_COMP_ARM 1 #else #define EIGEN_COMP_ARM 0 #endif /// \internal EIGEN_COMP_EMSCRIPTEN set to 1 if the compiler is Emscripten Compiler #if defined(__EMSCRIPTEN__) #define EIGEN_COMP_EMSCRIPTEN 1 #else #define EIGEN_COMP_EMSCRIPTEN 0 #endif /// \internal EIGEN_GNUC_STRICT set to 1 if the compiler is really GCC and not a compatible compiler (e.g., ICC, clang, mingw, etc.) #if EIGEN_COMP_GNUC && !(EIGEN_COMP_CLANG || EIGEN_COMP_ICC || EIGEN_COMP_MINGW || EIGEN_COMP_PGI || EIGEN_COMP_IBM || EIGEN_COMP_ARM || EIGEN_COMP_EMSCRIPTEN) #define EIGEN_COMP_GNUC_STRICT 1 #else #define EIGEN_COMP_GNUC_STRICT 0 #endif #if EIGEN_COMP_GNUC #define EIGEN_GNUC_AT_LEAST(x,y) ((__GNUC__==x && __GNUC_MINOR__>=y) || __GNUC__>x) #define EIGEN_GNUC_AT_MOST(x,y) ((__GNUC__==x && __GNUC_MINOR__<=y) || __GNUC__= 8 #define EIGEN_ARCH_ARMV8 1 #else #define EIGEN_ARCH_ARMV8 0 #endif /// \internal EIGEN_HAS_ARM64_FP16 set to 1 if the architecture provides an IEEE /// compliant Arm fp16 type #if EIGEN_ARCH_ARM64 #ifndef EIGEN_HAS_ARM64_FP16 #if defined(__ARM_FP16_FORMAT_IEEE) #define EIGEN_HAS_ARM64_FP16 1 #else #define EIGEN_HAS_ARM64_FP16 0 #endif #endif #endif /// \internal EIGEN_HAS_ARM64_FP16_VECTOR_ARITHMETIC set to 1 if the architecture /// supports Neon vector intrinsics for fp16. #if EIGEN_ARCH_ARM64 #ifndef EIGEN_HAS_ARM64_FP16_VECTOR_ARITHMETIC #if defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC) #define EIGEN_HAS_ARM64_FP16_VECTOR_ARITHMETIC 1 #else #define EIGEN_HAS_ARM64_FP16_VECTOR_ARITHMETIC 0 #endif #endif #endif /// \internal EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC set to 1 if the architecture /// supports Neon scalar intrinsics for fp16. #if EIGEN_ARCH_ARM64 #ifndef EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC #if defined(__ARM_FEATURE_FP16_SCALAR_ARITHMETIC) #define EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC 1 #endif #endif #endif /// \internal EIGEN_ARCH_MIPS set to 1 if the architecture is MIPS #if defined(__mips__) || defined(__mips) #define EIGEN_ARCH_MIPS 1 #else #define EIGEN_ARCH_MIPS 0 #endif /// \internal EIGEN_ARCH_SPARC set to 1 if the architecture is SPARC #if defined(__sparc__) || defined(__sparc) #define EIGEN_ARCH_SPARC 1 #else #define EIGEN_ARCH_SPARC 0 #endif /// \internal EIGEN_ARCH_IA64 set to 1 if the architecture is Intel Itanium #if defined(__ia64__) #define EIGEN_ARCH_IA64 1 #else #define EIGEN_ARCH_IA64 0 #endif /// \internal EIGEN_ARCH_PPC set to 1 if the architecture is PowerPC #if defined(__powerpc__) || defined(__ppc__) || defined(_M_PPC) #define EIGEN_ARCH_PPC 1 #else #define EIGEN_ARCH_PPC 0 #endif //------------------------------------------------------------------------------------------ // Operating system identification, EIGEN_OS_* //------------------------------------------------------------------------------------------ /// \internal EIGEN_OS_UNIX set to 1 if the OS is a unix variant #if defined(__unix__) || defined(__unix) #define EIGEN_OS_UNIX 1 #else #define EIGEN_OS_UNIX 0 #endif /// \internal EIGEN_OS_LINUX set to 1 if the OS is based on Linux kernel #if defined(__linux__) #define EIGEN_OS_LINUX 1 #else #define EIGEN_OS_LINUX 0 #endif /// \internal EIGEN_OS_ANDROID set to 1 if the OS is Android // note: ANDROID is defined when using ndk_build, __ANDROID__ is defined when using a standalone toolchain. #if defined(__ANDROID__) || defined(ANDROID) #define EIGEN_OS_ANDROID 1 #else #define EIGEN_OS_ANDROID 0 #endif /// \internal EIGEN_OS_GNULINUX set to 1 if the OS is GNU Linux and not Linux-based OS (e.g., not android) #if defined(__gnu_linux__) && !(EIGEN_OS_ANDROID) #define EIGEN_OS_GNULINUX 1 #else #define EIGEN_OS_GNULINUX 0 #endif /// \internal EIGEN_OS_BSD set to 1 if the OS is a BSD variant #if defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__) || defined(__bsdi__) || defined(__DragonFly__) #define EIGEN_OS_BSD 1 #else #define EIGEN_OS_BSD 0 #endif /// \internal EIGEN_OS_MAC set to 1 if the OS is MacOS #if defined(__APPLE__) #define EIGEN_OS_MAC 1 #else #define EIGEN_OS_MAC 0 #endif /// \internal EIGEN_OS_QNX set to 1 if the OS is QNX #if defined(__QNX__) #define EIGEN_OS_QNX 1 #else #define EIGEN_OS_QNX 0 #endif /// \internal EIGEN_OS_WIN set to 1 if the OS is Windows based #if defined(_WIN32) #define EIGEN_OS_WIN 1 #else #define EIGEN_OS_WIN 0 #endif /// \internal EIGEN_OS_WIN64 set to 1 if the OS is Windows 64bits #if defined(_WIN64) #define EIGEN_OS_WIN64 1 #else #define EIGEN_OS_WIN64 0 #endif /// \internal EIGEN_OS_WINCE set to 1 if the OS is Windows CE #if defined(_WIN32_WCE) #define EIGEN_OS_WINCE 1 #else #define EIGEN_OS_WINCE 0 #endif /// \internal EIGEN_OS_CYGWIN set to 1 if the OS is Windows/Cygwin #if defined(__CYGWIN__) #define EIGEN_OS_CYGWIN 1 #else #define EIGEN_OS_CYGWIN 0 #endif /// \internal EIGEN_OS_WIN_STRICT set to 1 if the OS is really Windows and not some variants #if EIGEN_OS_WIN && !( EIGEN_OS_WINCE || EIGEN_OS_CYGWIN ) #define EIGEN_OS_WIN_STRICT 1 #else #define EIGEN_OS_WIN_STRICT 0 #endif /// \internal EIGEN_OS_SUN set to __SUNPRO_C if the OS is SUN // compiler solaris __SUNPRO_C // version studio // 5.7 10 0x570 // 5.8 11 0x580 // 5.9 12 0x590 // 5.10 12.1 0x5100 // 5.11 12.2 0x5110 // 5.12 12.3 0x5120 #if (defined(sun) || defined(__sun)) && !(defined(__SVR4) || defined(__svr4__)) #define EIGEN_OS_SUN __SUNPRO_C #else #define EIGEN_OS_SUN 0 #endif /// \internal EIGEN_OS_SOLARIS set to 1 if the OS is Solaris #if (defined(sun) || defined(__sun)) && (defined(__SVR4) || defined(__svr4__)) #define EIGEN_OS_SOLARIS 1 #else #define EIGEN_OS_SOLARIS 0 #endif //------------------------------------------------------------------------------------------ // Detect GPU compilers and architectures //------------------------------------------------------------------------------------------ // NVCC is not supported as the target platform for HIPCC // Note that this also makes EIGEN_CUDACC and EIGEN_HIPCC mutually exclusive #if defined(__NVCC__) && defined(__HIPCC__) #error "NVCC as the target platform for HIPCC is currently not supported." #endif #if defined(__CUDACC__) && !defined(EIGEN_NO_CUDA) // Means the compiler is either nvcc or clang with CUDA enabled #define EIGEN_CUDACC __CUDACC__ #endif #if defined(__CUDA_ARCH__) && !defined(EIGEN_NO_CUDA) // Means we are generating code for the device #define EIGEN_CUDA_ARCH __CUDA_ARCH__ #endif #if defined(EIGEN_CUDACC) #include #define EIGEN_CUDA_SDK_VER (CUDA_VERSION * 10) #else #define EIGEN_CUDA_SDK_VER 0 #endif #if defined(__HIPCC__) && !defined(EIGEN_NO_HIP) // Means the compiler is HIPCC (analogous to EIGEN_CUDACC, but for HIP) #define EIGEN_HIPCC __HIPCC__ // We need to include hip_runtime.h here because it pulls in // ++ hip_common.h which contains the define for __HIP_DEVICE_COMPILE__ // ++ host_defines.h which contains the defines for the __host__ and __device__ macros #include #if defined(__HIP_DEVICE_COMPILE__) // analogous to EIGEN_CUDA_ARCH, but for HIP #define EIGEN_HIP_DEVICE_COMPILE __HIP_DEVICE_COMPILE__ #endif // For HIP (ROCm 3.5 and higher), we need to explicitly set the launch_bounds attribute // value to 1024. The compiler assigns a default value of 256 when the attribute is not // specified. This results in failures on the HIP platform, for cases when a GPU kernel // without an explicit launch_bounds attribute is called with a threads_per_block value // greater than 256. // // This is a regression in functioanlity and is expected to be fixed within the next // couple of ROCm releases (compiler will go back to using 1024 value as the default) // // In the meantime, we will use a "only enabled for HIP" macro to set the launch_bounds // attribute. #define EIGEN_HIP_LAUNCH_BOUNDS_1024 __launch_bounds__(1024) #endif #if !defined(EIGEN_HIP_LAUNCH_BOUNDS_1024) #define EIGEN_HIP_LAUNCH_BOUNDS_1024 #endif // !defined(EIGEN_HIP_LAUNCH_BOUNDS_1024) // Unify CUDA/HIPCC #if defined(EIGEN_CUDACC) || defined(EIGEN_HIPCC) // // If either EIGEN_CUDACC or EIGEN_HIPCC is defined, then define EIGEN_GPUCC // #define EIGEN_GPUCC // // EIGEN_HIPCC implies the HIP compiler and is used to tweak Eigen code for use in HIP kernels // EIGEN_CUDACC implies the CUDA compiler and is used to tweak Eigen code for use in CUDA kernels // // In most cases the same tweaks are required to the Eigen code to enable in both the HIP and CUDA kernels. // For those cases, the corresponding code should be guarded with // #if defined(EIGEN_GPUCC) // instead of // #if defined(EIGEN_CUDACC) || defined(EIGEN_HIPCC) // // For cases where the tweak is specific to HIP, the code should be guarded with // #if defined(EIGEN_HIPCC) // // For cases where the tweak is specific to CUDA, the code should be guarded with // #if defined(EIGEN_CUDACC) // #endif #if defined(EIGEN_CUDA_ARCH) || defined(EIGEN_HIP_DEVICE_COMPILE) // // If either EIGEN_CUDA_ARCH or EIGEN_HIP_DEVICE_COMPILE is defined, then define EIGEN_GPU_COMPILE_PHASE // #define EIGEN_GPU_COMPILE_PHASE // // GPU compilers (HIPCC, NVCC) typically do two passes over the source code, // + one to compile the source for the "host" (ie CPU) // + another to compile the source for the "device" (ie. GPU) // // Code that needs to enabled only during the either the "host" or "device" compilation phase // needs to be guarded with a macro that indicates the current compilation phase // // EIGEN_HIP_DEVICE_COMPILE implies the device compilation phase in HIP // EIGEN_CUDA_ARCH implies the device compilation phase in CUDA // // In most cases, the "host" / "device" specific code is the same for both HIP and CUDA // For those cases, the code should be guarded with // #if defined(EIGEN_GPU_COMPILE_PHASE) // instead of // #if defined(EIGEN_CUDA_ARCH) || defined(EIGEN_HIP_DEVICE_COMPILE) // // For cases where the tweak is specific to HIP, the code should be guarded with // #if defined(EIGEN_HIP_DEVICE_COMPILE) // // For cases where the tweak is specific to CUDA, the code should be guarded with // #if defined(EIGEN_CUDA_ARCH) // #endif #if defined(EIGEN_USE_SYCL) && defined(__SYCL_DEVICE_ONLY__) // EIGEN_USE_SYCL is a user-defined macro while __SYCL_DEVICE_ONLY__ is a compiler-defined macro. // In most cases we want to check if both macros are defined which can be done using the define below. #define SYCL_DEVICE_ONLY #endif //------------------------------------------------------------------------------------------ // Detect Compiler/Architecture/OS specific features //------------------------------------------------------------------------------------------ #if EIGEN_GNUC_AT_MOST(4,3) && !EIGEN_COMP_CLANG // see bug 89 #define EIGEN_SAFE_TO_USE_STANDARD_ASSERT_MACRO 0 #else #define EIGEN_SAFE_TO_USE_STANDARD_ASSERT_MACRO 1 #endif // Cross compiler wrapper around LLVM's __has_builtin #ifdef __has_builtin # define EIGEN_HAS_BUILTIN(x) __has_builtin(x) #else # define EIGEN_HAS_BUILTIN(x) 0 #endif // A Clang feature extension to determine compiler features. // We use it to determine 'cxx_rvalue_references' #ifndef __has_feature # define __has_feature(x) 0 #endif // Some old compilers do not support template specializations like: // template void foo(const T x[N]); #if !( EIGEN_COMP_CLANG && ( (EIGEN_COMP_CLANG<309) \ || (defined(__apple_build_version__) && (__apple_build_version__ < 9000000))) \ || EIGEN_COMP_GNUC_STRICT && EIGEN_COMP_GNUC<49) #define EIGEN_HAS_STATIC_ARRAY_TEMPLATE 1 #else #define EIGEN_HAS_STATIC_ARRAY_TEMPLATE 0 #endif // The macro EIGEN_CPLUSPLUS is a replacement for __cplusplus/_MSVC_LANG that // works for both platforms, indicating the C++ standard version number. // // With MSVC, without defining /Zc:__cplusplus, the __cplusplus macro will // report 199711L regardless of the language standard specified via /std. // We need to rely on _MSVC_LANG instead, which is only available after // VS2015.3. #if EIGEN_COMP_MSVC_LANG > 0 #define EIGEN_CPLUSPLUS EIGEN_COMP_MSVC_LANG #elif EIGEN_COMP_MSVC >= 1900 #define EIGEN_CPLUSPLUS 201103L #elif defined(__cplusplus) #define EIGEN_CPLUSPLUS __cplusplus #else #define EIGEN_CPLUSPLUS 0 #endif // The macro EIGEN_COMP_CXXVER defines the c++ verson expected by the compiler. // For instance, if compiling with gcc and -std=c++17, then EIGEN_COMP_CXXVER // is defined to 17. #if EIGEN_CPLUSPLUS > 201703L #define EIGEN_COMP_CXXVER 20 #elif EIGEN_CPLUSPLUS > 201402L #define EIGEN_COMP_CXXVER 17 #elif EIGEN_CPLUSPLUS > 201103L #define EIGEN_COMP_CXXVER 14 #elif EIGEN_CPLUSPLUS >= 201103L #define EIGEN_COMP_CXXVER 11 #else #define EIGEN_COMP_CXXVER 03 #endif #ifndef EIGEN_HAS_CXX14_VARIABLE_TEMPLATES #if defined(__cpp_variable_templates) && __cpp_variable_templates >= 201304 && EIGEN_MAX_CPP_VER>=14 #define EIGEN_HAS_CXX14_VARIABLE_TEMPLATES 1 #else #define EIGEN_HAS_CXX14_VARIABLE_TEMPLATES 0 #endif #endif // The macros EIGEN_HAS_CXX?? defines a rough estimate of available c++ features // but in practice we should not rely on them but rather on the availabilty of // individual features as defined later. // This is why there is no EIGEN_HAS_CXX17. // FIXME: get rid of EIGEN_HAS_CXX14 and maybe even EIGEN_HAS_CXX11. #if EIGEN_MAX_CPP_VER>=11 && EIGEN_COMP_CXXVER>=11 #define EIGEN_HAS_CXX11 1 #else #define EIGEN_HAS_CXX11 0 #endif #if EIGEN_MAX_CPP_VER>=14 && EIGEN_COMP_CXXVER>=14 #define EIGEN_HAS_CXX14 1 #else #define EIGEN_HAS_CXX14 0 #endif // Do we support r-value references? #ifndef EIGEN_HAS_RVALUE_REFERENCES #if EIGEN_MAX_CPP_VER>=11 && \ (__has_feature(cxx_rvalue_references) || \ (EIGEN_COMP_CXXVER >= 11) || (EIGEN_COMP_MSVC >= 1600)) #define EIGEN_HAS_RVALUE_REFERENCES 1 #else #define EIGEN_HAS_RVALUE_REFERENCES 0 #endif #endif // Does the compiler support C99? // Need to include to make sure _GLIBCXX_USE_C99 gets defined #include #ifndef EIGEN_HAS_C99_MATH #if EIGEN_MAX_CPP_VER>=11 && \ ((defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901)) \ || (defined(__GNUC__) && defined(_GLIBCXX_USE_C99)) \ || (defined(_LIBCPP_VERSION) && !defined(_MSC_VER)) \ || (EIGEN_COMP_MSVC >= 1900) || defined(SYCL_DEVICE_ONLY)) #define EIGEN_HAS_C99_MATH 1 #else #define EIGEN_HAS_C99_MATH 0 #endif #endif // Does the compiler support result_of? // result_of was deprecated in c++17 and removed in c++ 20 #ifndef EIGEN_HAS_STD_RESULT_OF #if EIGEN_HAS_CXX11 && EIGEN_COMP_CXXVER < 17 #define EIGEN_HAS_STD_RESULT_OF 1 #else #define EIGEN_HAS_STD_RESULT_OF 0 #endif #endif // Does the compiler support std::hash? #ifndef EIGEN_HAS_STD_HASH // The std::hash struct is defined in C++11 but is not labelled as a __device__ // function and is not constexpr, so cannot be used on device. #if EIGEN_HAS_CXX11 && !defined(EIGEN_GPU_COMPILE_PHASE) #define EIGEN_HAS_STD_HASH 1 #else #define EIGEN_HAS_STD_HASH 0 #endif #endif // EIGEN_HAS_STD_HASH #ifndef EIGEN_HAS_STD_INVOKE_RESULT #if EIGEN_MAX_CPP_VER >= 17 && EIGEN_COMP_CXXVER >= 17 #define EIGEN_HAS_STD_INVOKE_RESULT 1 #else #define EIGEN_HAS_STD_INVOKE_RESULT 0 #endif #endif #ifndef EIGEN_HAS_ALIGNAS #if EIGEN_MAX_CPP_VER>=11 && EIGEN_HAS_CXX11 && \ ( __has_feature(cxx_alignas) \ || EIGEN_HAS_CXX14 \ || (EIGEN_COMP_MSVC >= 1800) \ || (EIGEN_GNUC_AT_LEAST(4,8)) \ || (EIGEN_COMP_CLANG>=305) \ || (EIGEN_COMP_ICC>=1500) \ || (EIGEN_COMP_PGI>=1500) \ || (EIGEN_COMP_SUNCC>=0x5130)) #define EIGEN_HAS_ALIGNAS 1 #else #define EIGEN_HAS_ALIGNAS 0 #endif #endif // Does the compiler support type_traits? // - full support of type traits was added only to GCC 5.1.0. // - 20150626 corresponds to the last release of 4.x libstdc++ #ifndef EIGEN_HAS_TYPE_TRAITS #if EIGEN_MAX_CPP_VER>=11 && (EIGEN_HAS_CXX11 || EIGEN_COMP_MSVC >= 1700) \ && ((!EIGEN_COMP_GNUC_STRICT) || EIGEN_GNUC_AT_LEAST(5, 1)) \ && ((!defined(__GLIBCXX__)) || __GLIBCXX__ > 20150626) #define EIGEN_HAS_TYPE_TRAITS 1 #define EIGEN_INCLUDE_TYPE_TRAITS #else #define EIGEN_HAS_TYPE_TRAITS 0 #endif #endif // Does the compiler support variadic templates? #ifndef EIGEN_HAS_VARIADIC_TEMPLATES #if EIGEN_MAX_CPP_VER>=11 && (EIGEN_COMP_CXXVER >= 11) \ && (!defined(__NVCC__) || !EIGEN_ARCH_ARM_OR_ARM64 || (EIGEN_COMP_NVCC >= 80000) ) // ^^ Disable the use of variadic templates when compiling with versions of nvcc older than 8.0 on ARM devices: // this prevents nvcc from crashing when compiling Eigen on Tegra X1 #define EIGEN_HAS_VARIADIC_TEMPLATES 1 #elif EIGEN_MAX_CPP_VER>=11 && (EIGEN_COMP_CXXVER >= 11) && defined(SYCL_DEVICE_ONLY) #define EIGEN_HAS_VARIADIC_TEMPLATES 1 #else #define EIGEN_HAS_VARIADIC_TEMPLATES 0 #endif #endif // Does the compiler fully support const expressions? (as in c++14) #ifndef EIGEN_HAS_CONSTEXPR #if defined(EIGEN_CUDACC) // Const expressions are supported provided that c++11 is enabled and we're using either clang or nvcc 7.5 or above #if EIGEN_MAX_CPP_VER>=14 && (EIGEN_COMP_CXXVER >= 11 && (EIGEN_COMP_CLANG || EIGEN_COMP_NVCC >= 70500)) #define EIGEN_HAS_CONSTEXPR 1 #endif #elif EIGEN_MAX_CPP_VER>=14 && (__has_feature(cxx_relaxed_constexpr) || (EIGEN_COMP_CXXVER >= 14) || \ (EIGEN_GNUC_AT_LEAST(4,8) && (EIGEN_COMP_CXXVER >= 11)) || \ (EIGEN_COMP_CLANG >= 306 && (EIGEN_COMP_CXXVER >= 11))) #define EIGEN_HAS_CONSTEXPR 1 #endif #ifndef EIGEN_HAS_CONSTEXPR #define EIGEN_HAS_CONSTEXPR 0 #endif #endif // EIGEN_HAS_CONSTEXPR #if EIGEN_HAS_CONSTEXPR #define EIGEN_CONSTEXPR constexpr #else #define EIGEN_CONSTEXPR #endif // Does the compiler support C++11 math? // Let's be conservative and enable the default C++11 implementation only if we are sure it exists #ifndef EIGEN_HAS_CXX11_MATH #if EIGEN_MAX_CPP_VER>=11 && ((EIGEN_COMP_CXXVER > 11) || (EIGEN_COMP_CXXVER == 11) && (EIGEN_COMP_GNUC_STRICT || EIGEN_COMP_CLANG || EIGEN_COMP_MSVC || EIGEN_COMP_ICC) \ && (EIGEN_ARCH_i386_OR_x86_64) && (EIGEN_OS_GNULINUX || EIGEN_OS_WIN_STRICT || EIGEN_OS_MAC)) #define EIGEN_HAS_CXX11_MATH 1 #else #define EIGEN_HAS_CXX11_MATH 0 #endif #endif // Does the compiler support proper C++11 containers? #ifndef EIGEN_HAS_CXX11_CONTAINERS #if EIGEN_MAX_CPP_VER>=11 && \ ((EIGEN_COMP_CXXVER > 11) \ || ((EIGEN_COMP_CXXVER == 11) && (EIGEN_COMP_GNUC_STRICT || EIGEN_COMP_CLANG || EIGEN_COMP_MSVC || EIGEN_COMP_ICC>=1400))) #define EIGEN_HAS_CXX11_CONTAINERS 1 #else #define EIGEN_HAS_CXX11_CONTAINERS 0 #endif #endif // Does the compiler support C++11 noexcept? #ifndef EIGEN_HAS_CXX11_NOEXCEPT #if EIGEN_MAX_CPP_VER>=11 && \ (__has_feature(cxx_noexcept) \ || (EIGEN_COMP_CXXVER > 11) \ || ((EIGEN_COMP_CXXVER == 11) && (EIGEN_COMP_GNUC_STRICT || EIGEN_COMP_CLANG || EIGEN_COMP_MSVC || EIGEN_COMP_ICC>=1400))) #define EIGEN_HAS_CXX11_NOEXCEPT 1 #else #define EIGEN_HAS_CXX11_NOEXCEPT 0 #endif #endif #ifndef EIGEN_HAS_CXX11_ATOMIC #if EIGEN_MAX_CPP_VER>=11 && \ (__has_feature(cxx_atomic) \ || (EIGEN_COMP_CXXVER > 11) \ || ((EIGEN_COMP_CXXVER == 11) && (EIGEN_COMP_MSVC==0 || EIGEN_COMP_MSVC >= 1700))) #define EIGEN_HAS_CXX11_ATOMIC 1 #else #define EIGEN_HAS_CXX11_ATOMIC 0 #endif #endif #ifndef EIGEN_HAS_CXX11_OVERRIDE_FINAL #if EIGEN_MAX_CPP_VER>=11 && \ (EIGEN_COMP_CXXVER >= 11 || EIGEN_COMP_MSVC >= 1700) #define EIGEN_HAS_CXX11_OVERRIDE_FINAL 1 #else #define EIGEN_HAS_CXX11_OVERRIDE_FINAL 0 #endif #endif // NOTE: the required Apple's clang version is very conservative // and it could be that XCode 9 works just fine. // NOTE: the MSVC version is based on https://en.cppreference.com/w/cpp/compiler_support // and not tested. #ifndef EIGEN_HAS_CXX17_OVERALIGN #if EIGEN_MAX_CPP_VER>=17 && EIGEN_COMP_CXXVER>=17 && ( \ (EIGEN_COMP_MSVC >= 1912) \ || (EIGEN_GNUC_AT_LEAST(7,0)) \ || ((!defined(__apple_build_version__)) && (EIGEN_COMP_CLANG>=500)) \ || (( defined(__apple_build_version__)) && (__apple_build_version__>=10000000)) \ ) #define EIGEN_HAS_CXX17_OVERALIGN 1 #else #define EIGEN_HAS_CXX17_OVERALIGN 0 #endif #endif #if defined(EIGEN_CUDACC) && EIGEN_HAS_CONSTEXPR // While available already with c++11, this is useful mostly starting with c++14 and relaxed constexpr rules #if defined(__NVCC__) // nvcc considers constexpr functions as __host__ __device__ with the option --expt-relaxed-constexpr #ifdef __CUDACC_RELAXED_CONSTEXPR__ #define EIGEN_CONSTEXPR_ARE_DEVICE_FUNC #endif #elif defined(__clang__) && defined(__CUDA__) && __has_feature(cxx_relaxed_constexpr) // clang++ always considers constexpr functions as implicitly __host__ __device__ #define EIGEN_CONSTEXPR_ARE_DEVICE_FUNC #endif #endif // Does the compiler support the __int128 and __uint128_t extensions for 128-bit // integer arithmetic? // // Clang and GCC define __SIZEOF_INT128__ when these extensions are supported, // but we avoid using them in certain cases: // // * Building using Clang for Windows, where the Clang runtime library has // 128-bit support only on LP64 architectures, but Windows is LLP64. #ifndef EIGEN_HAS_BUILTIN_INT128 #if defined(__SIZEOF_INT128__) && !(EIGEN_OS_WIN && EIGEN_COMP_CLANG) #define EIGEN_HAS_BUILTIN_INT128 1 #else #define EIGEN_HAS_BUILTIN_INT128 0 #endif #endif //------------------------------------------------------------------------------------------ // Preprocessor programming helpers //------------------------------------------------------------------------------------------ // This macro can be used to prevent from macro expansion, e.g.: // std::max EIGEN_NOT_A_MACRO(a,b) #define EIGEN_NOT_A_MACRO #define EIGEN_DEBUG_VAR(x) std::cerr << #x << " = " << x << std::endl; // concatenate two tokens #define EIGEN_CAT2(a,b) a ## b #define EIGEN_CAT(a,b) EIGEN_CAT2(a,b) #define EIGEN_COMMA , // convert a token to a string #define EIGEN_MAKESTRING2(a) #a #define EIGEN_MAKESTRING(a) EIGEN_MAKESTRING2(a) // EIGEN_STRONG_INLINE is a stronger version of the inline, using __forceinline on MSVC, // but it still doesn't use GCC's always_inline. This is useful in (common) situations where MSVC needs forceinline // but GCC is still doing fine with just inline. #ifndef EIGEN_STRONG_INLINE #if (EIGEN_COMP_MSVC || EIGEN_COMP_ICC) && !defined(EIGEN_GPUCC) #define EIGEN_STRONG_INLINE __forceinline #else #define EIGEN_STRONG_INLINE inline #endif #endif // EIGEN_ALWAYS_INLINE is the stronget, it has the effect of making the function inline and adding every possible // attribute to maximize inlining. This should only be used when really necessary: in particular, // it uses __attribute__((always_inline)) on GCC, which most of the time is useless and can severely harm compile times. // FIXME with the always_inline attribute, // gcc 3.4.x and 4.1 reports the following compilation error: // Eval.h:91: sorry, unimplemented: inlining failed in call to 'const Eigen::Eval Eigen::MatrixBase::eval() const' // : function body not available // See also bug 1367 #if EIGEN_GNUC_AT_LEAST(4,2) && !defined(SYCL_DEVICE_ONLY) #define EIGEN_ALWAYS_INLINE __attribute__((always_inline)) inline #else #define EIGEN_ALWAYS_INLINE EIGEN_STRONG_INLINE #endif #if EIGEN_COMP_GNUC #define EIGEN_DONT_INLINE __attribute__((noinline)) #elif EIGEN_COMP_MSVC #define EIGEN_DONT_INLINE __declspec(noinline) #else #define EIGEN_DONT_INLINE #endif #if EIGEN_COMP_GNUC #define EIGEN_PERMISSIVE_EXPR __extension__ #else #define EIGEN_PERMISSIVE_EXPR #endif // GPU stuff // Disable some features when compiling with GPU compilers (NVCC/clang-cuda/SYCL/HIPCC) #if defined(EIGEN_CUDACC) || defined(SYCL_DEVICE_ONLY) || defined(EIGEN_HIPCC) // Do not try asserts on device code #ifndef EIGEN_NO_DEBUG #define EIGEN_NO_DEBUG #endif #ifdef EIGEN_INTERNAL_DEBUGGING #undef EIGEN_INTERNAL_DEBUGGING #endif #ifdef EIGEN_EXCEPTIONS #undef EIGEN_EXCEPTIONS #endif #endif #if defined(SYCL_DEVICE_ONLY) #ifndef EIGEN_DONT_VECTORIZE #define EIGEN_DONT_VECTORIZE #endif #define EIGEN_DEVICE_FUNC __attribute__((flatten)) __attribute__((always_inline)) // All functions callable from CUDA/HIP code must be qualified with __device__ #elif defined(EIGEN_GPUCC) #define EIGEN_DEVICE_FUNC __host__ __device__ #else #define EIGEN_DEVICE_FUNC #endif // this macro allows to get rid of linking errors about multiply defined functions. // - static is not very good because it prevents definitions from different object files to be merged. // So static causes the resulting linked executable to be bloated with multiple copies of the same function. // - inline is not perfect either as it unwantedly hints the compiler toward inlining the function. #define EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_DEVICE_FUNC #define EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_DEVICE_FUNC inline #ifdef NDEBUG # ifndef EIGEN_NO_DEBUG # define EIGEN_NO_DEBUG # endif #endif // eigen_plain_assert is where we implement the workaround for the assert() bug in GCC <= 4.3, see bug 89 #ifdef EIGEN_NO_DEBUG #ifdef SYCL_DEVICE_ONLY // used to silence the warning on SYCL device #define eigen_plain_assert(x) EIGEN_UNUSED_VARIABLE(x) #else #define eigen_plain_assert(x) #endif #else #if EIGEN_SAFE_TO_USE_STANDARD_ASSERT_MACRO namespace Eigen { namespace internal { inline bool copy_bool(bool b) { return b; } } } #define eigen_plain_assert(x) assert(x) #else // work around bug 89 #include // for abort #include // for std::cerr namespace Eigen { namespace internal { // trivial function copying a bool. Must be EIGEN_DONT_INLINE, so we implement it after including Eigen headers. // see bug 89. namespace { EIGEN_DONT_INLINE bool copy_bool(bool b) { return b; } } inline void assert_fail(const char *condition, const char *function, const char *file, int line) { std::cerr << "assertion failed: " << condition << " in function " << function << " at " << file << ":" << line << std::endl; abort(); } } } #define eigen_plain_assert(x) \ do { \ if(!Eigen::internal::copy_bool(x)) \ Eigen::internal::assert_fail(EIGEN_MAKESTRING(x), __PRETTY_FUNCTION__, __FILE__, __LINE__); \ } while(false) #endif #endif // eigen_assert can be overridden #ifndef eigen_assert #define eigen_assert(x) eigen_plain_assert(x) #endif #ifdef EIGEN_INTERNAL_DEBUGGING #define eigen_internal_assert(x) eigen_assert(x) #else #define eigen_internal_assert(x) #endif #ifdef EIGEN_NO_DEBUG #define EIGEN_ONLY_USED_FOR_DEBUG(x) EIGEN_UNUSED_VARIABLE(x) #else #define EIGEN_ONLY_USED_FOR_DEBUG(x) #endif #ifndef EIGEN_NO_DEPRECATED_WARNING #if EIGEN_COMP_GNUC #define EIGEN_DEPRECATED __attribute__((deprecated)) #elif EIGEN_COMP_MSVC #define EIGEN_DEPRECATED __declspec(deprecated) #else #define EIGEN_DEPRECATED #endif #else #define EIGEN_DEPRECATED #endif #if EIGEN_COMP_GNUC #define EIGEN_UNUSED __attribute__((unused)) #else #define EIGEN_UNUSED #endif // Suppresses 'unused variable' warnings. namespace Eigen { namespace internal { template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void ignore_unused_variable(const T&) {} } } #define EIGEN_UNUSED_VARIABLE(var) Eigen::internal::ignore_unused_variable(var); #if !defined(EIGEN_ASM_COMMENT) #if EIGEN_COMP_GNUC && (EIGEN_ARCH_i386_OR_x86_64 || EIGEN_ARCH_ARM_OR_ARM64) #define EIGEN_ASM_COMMENT(X) __asm__("#" X) #else #define EIGEN_ASM_COMMENT(X) #endif #endif // Acts as a barrier preventing operations involving `X` from crossing. This // occurs, for example, in the fast rounding trick where a magic constant is // added then subtracted, which is otherwise compiled away with -ffast-math. // // See bug 1674 #if !defined(EIGEN_OPTIMIZATION_BARRIER) #if EIGEN_COMP_GNUC // According to https://gcc.gnu.org/onlinedocs/gcc/Constraints.html: // X: Any operand whatsoever. // r: A register operand is allowed provided that it is in a general // register. // g: Any register, memory or immediate integer operand is allowed, except // for registers that are not general registers. // w: (AArch32/AArch64) Floating point register, Advanced SIMD vector // register or SVE vector register. // x: (SSE) Any SSE register. // (AArch64) Like w, but restricted to registers 0 to 15 inclusive. // v: (PowerPC) An Altivec vector register. // wa:(PowerPC) A VSX register. // // "X" (uppercase) should work for all cases, though this seems to fail for // some versions of GCC for arm/aarch64 with // "error: inconsistent operand constraints in an 'asm'" // Clang x86_64/arm/aarch64 seems to require "g" to support both scalars and // vectors, otherwise // "error: non-trivial scalar-to-vector conversion, possible invalid // constraint for vector type" // // GCC for ppc64le generates an internal compiler error with x/X/g. // GCC for AVX generates an internal compiler error with X. // // Tested on icc/gcc/clang for sse, avx, avx2, avx512dq // gcc for arm, aarch64, // gcc for ppc64le, // both vectors and scalars. // // Note that this is restricted to plain types - this will not work // directly for std::complex, Eigen::half, Eigen::bfloat16. For these, // you will need to apply to the underlying POD type. #if EIGEN_ARCH_PPC && EIGEN_COMP_GNUC_STRICT // This seems to be broken on clang. Packet4f is loaded into a single // register rather than a vector, zeroing out some entries. Integer // types also generate a compile error. // General, Altivec, VSX. #define EIGEN_OPTIMIZATION_BARRIER(X) __asm__ ("" : "+r,v,wa" (X)); #elif EIGEN_ARCH_ARM_OR_ARM64 // General, NEON. #define EIGEN_OPTIMIZATION_BARRIER(X) __asm__ ("" : "+g,w" (X)); #elif EIGEN_ARCH_i386_OR_x86_64 // General, SSE. #define EIGEN_OPTIMIZATION_BARRIER(X) __asm__ ("" : "+g,x" (X)); #else // Not implemented for other architectures. #define EIGEN_OPTIMIZATION_BARRIER(X) #endif #else // Not implemented for other compilers. #define EIGEN_OPTIMIZATION_BARRIER(X) #endif #endif #if EIGEN_COMP_MSVC // NOTE MSVC often gives C4127 warnings with compiletime if statements. See bug 1362. // This workaround is ugly, but it does the job. # define EIGEN_CONST_CONDITIONAL(cond) (void)0, cond #else # define EIGEN_CONST_CONDITIONAL(cond) cond #endif #ifdef EIGEN_DONT_USE_RESTRICT_KEYWORD #define EIGEN_RESTRICT #endif #ifndef EIGEN_RESTRICT #define EIGEN_RESTRICT __restrict #endif #ifndef EIGEN_DEFAULT_IO_FORMAT #ifdef EIGEN_MAKING_DOCS // format used in Eigen's documentation // needed to define it here as escaping characters in CMake add_definition's argument seems very problematic. #define EIGEN_DEFAULT_IO_FORMAT Eigen::IOFormat(3, 0, " ", "\n", "", "") #else #define EIGEN_DEFAULT_IO_FORMAT Eigen::IOFormat() #endif #endif // just an empty macro ! #define EIGEN_EMPTY // When compiling CUDA/HIP device code with NVCC or HIPCC // pull in math functions from the global namespace. // In host mode, and when device code is compiled with clang, // use the std versions. #if (defined(EIGEN_CUDA_ARCH) && defined(__NVCC__)) || defined(EIGEN_HIP_DEVICE_COMPILE) #define EIGEN_USING_STD(FUNC) using ::FUNC; #else #define EIGEN_USING_STD(FUNC) using std::FUNC; #endif #if EIGEN_COMP_MSVC_STRICT && (EIGEN_COMP_MSVC < 1900 || (EIGEN_COMP_MSVC == 1900 && EIGEN_COMP_NVCC)) // For older MSVC versions, as well as 1900 && CUDA 8, using the base operator is necessary, // otherwise we get duplicate definition errors // For later MSVC versions, we require explicit operator= definition, otherwise we get // use of implicitly deleted operator errors. // (cf Bugs 920, 1000, 1324, 2291) #define EIGEN_INHERIT_ASSIGNMENT_EQUAL_OPERATOR(Derived) \ using Base::operator =; #elif EIGEN_COMP_CLANG // workaround clang bug (see http://forum.kde.org/viewtopic.php?f=74&t=102653) #define EIGEN_INHERIT_ASSIGNMENT_EQUAL_OPERATOR(Derived) \ using Base::operator =; \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& operator=(const Derived& other) { Base::operator=(other); return *this; } \ template \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& operator=(const DenseBase& other) { Base::operator=(other.derived()); return *this; } #else #define EIGEN_INHERIT_ASSIGNMENT_EQUAL_OPERATOR(Derived) \ using Base::operator =; \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& operator=(const Derived& other) \ { \ Base::operator=(other); \ return *this; \ } #endif /** * \internal * \brief Macro to explicitly define the default copy constructor. * This is necessary, because the implicit definition is deprecated if the copy-assignment is overridden. */ #if EIGEN_HAS_CXX11 #define EIGEN_DEFAULT_COPY_CONSTRUCTOR(CLASS) CLASS(const CLASS&) = default; #else #define EIGEN_DEFAULT_COPY_CONSTRUCTOR(CLASS) #endif /** \internal * \brief Macro to manually inherit assignment operators. * This is necessary, because the implicitly defined assignment operator gets deleted when a custom operator= is defined. * With C++11 or later this also default-implements the copy-constructor */ #define EIGEN_INHERIT_ASSIGNMENT_OPERATORS(Derived) \ EIGEN_INHERIT_ASSIGNMENT_EQUAL_OPERATOR(Derived) \ EIGEN_DEFAULT_COPY_CONSTRUCTOR(Derived) /** \internal * \brief Macro to manually define default constructors and destructors. * This is necessary when the copy constructor is re-defined. * For empty helper classes this should usually be protected, to avoid accidentally creating empty objects. * * Hiding the default destructor lead to problems in C++03 mode together with boost::multiprecision */ #if EIGEN_HAS_CXX11 #define EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(Derived) \ Derived() = default; \ ~Derived() = default; #else #define EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(Derived) \ Derived() {}; \ /* ~Derived() {}; */ #endif /** * Just a side note. Commenting within defines works only by documenting * behind the object (via '!<'). Comments cannot be multi-line and thus * we have these extra long lines. What is confusing doxygen over here is * that we use '\' and basically have a bunch of typedefs with their * documentation in a single line. **/ #define EIGEN_GENERIC_PUBLIC_INTERFACE(Derived) \ typedef typename Eigen::internal::traits::Scalar Scalar; /*!< \brief Numeric type, e.g. float, double, int or std::complex. */ \ typedef typename Eigen::NumTraits::Real RealScalar; /*!< \brief The underlying numeric type for composed scalar types. \details In cases where Scalar is e.g. std::complex, T were corresponding to RealScalar. */ \ typedef typename Base::CoeffReturnType CoeffReturnType; /*!< \brief The return type for coefficient access. \details Depending on whether the object allows direct coefficient access (e.g. for a MatrixXd), this type is either 'const Scalar&' or simply 'Scalar' for objects that do not allow direct coefficient access. */ \ typedef typename Eigen::internal::ref_selector::type Nested; \ typedef typename Eigen::internal::traits::StorageKind StorageKind; \ typedef typename Eigen::internal::traits::StorageIndex StorageIndex; \ enum CompileTimeTraits \ { RowsAtCompileTime = Eigen::internal::traits::RowsAtCompileTime, \ ColsAtCompileTime = Eigen::internal::traits::ColsAtCompileTime, \ Flags = Eigen::internal::traits::Flags, \ SizeAtCompileTime = Base::SizeAtCompileTime, \ MaxSizeAtCompileTime = Base::MaxSizeAtCompileTime, \ IsVectorAtCompileTime = Base::IsVectorAtCompileTime }; \ using Base::derived; \ using Base::const_cast_derived; // FIXME Maybe the EIGEN_DENSE_PUBLIC_INTERFACE could be removed as importing PacketScalar is rarely needed #define EIGEN_DENSE_PUBLIC_INTERFACE(Derived) \ EIGEN_GENERIC_PUBLIC_INTERFACE(Derived) \ typedef typename Base::PacketScalar PacketScalar; #define EIGEN_PLAIN_ENUM_MIN(a,b) (((int)a <= (int)b) ? (int)a : (int)b) #define EIGEN_PLAIN_ENUM_MAX(a,b) (((int)a >= (int)b) ? (int)a : (int)b) // EIGEN_SIZE_MIN_PREFER_DYNAMIC gives the min between compile-time sizes. 0 has absolute priority, followed by 1, // followed by Dynamic, followed by other finite values. The reason for giving Dynamic the priority over // finite values is that min(3, Dynamic) should be Dynamic, since that could be anything between 0 and 3. #define EIGEN_SIZE_MIN_PREFER_DYNAMIC(a,b) (((int)a == 0 || (int)b == 0) ? 0 \ : ((int)a == 1 || (int)b == 1) ? 1 \ : ((int)a == Dynamic || (int)b == Dynamic) ? Dynamic \ : ((int)a <= (int)b) ? (int)a : (int)b) // EIGEN_SIZE_MIN_PREFER_FIXED is a variant of EIGEN_SIZE_MIN_PREFER_DYNAMIC comparing MaxSizes. The difference is that finite values // now have priority over Dynamic, so that min(3, Dynamic) gives 3. Indeed, whatever the actual value is // (between 0 and 3), it is not more than 3. #define EIGEN_SIZE_MIN_PREFER_FIXED(a,b) (((int)a == 0 || (int)b == 0) ? 0 \ : ((int)a == 1 || (int)b == 1) ? 1 \ : ((int)a == Dynamic && (int)b == Dynamic) ? Dynamic \ : ((int)a == Dynamic) ? (int)b \ : ((int)b == Dynamic) ? (int)a \ : ((int)a <= (int)b) ? (int)a : (int)b) // see EIGEN_SIZE_MIN_PREFER_DYNAMIC. No need for a separate variant for MaxSizes here. #define EIGEN_SIZE_MAX(a,b) (((int)a == Dynamic || (int)b == Dynamic) ? Dynamic \ : ((int)a >= (int)b) ? (int)a : (int)b) #define EIGEN_LOGICAL_XOR(a,b) (((a) || (b)) && !((a) && (b))) #define EIGEN_IMPLIES(a,b) (!(a) || (b)) #if EIGEN_HAS_BUILTIN(__builtin_expect) || EIGEN_COMP_GNUC #define EIGEN_PREDICT_FALSE(x) (__builtin_expect(x, false)) #define EIGEN_PREDICT_TRUE(x) (__builtin_expect(false || (x), true)) #else #define EIGEN_PREDICT_FALSE(x) (x) #define EIGEN_PREDICT_TRUE(x) (x) #endif // the expression type of a standard coefficient wise binary operation #define EIGEN_CWISE_BINARY_RETURN_TYPE(LHS,RHS,OPNAME) \ CwiseBinaryOp< \ EIGEN_CAT(EIGEN_CAT(internal::scalar_,OPNAME),_op)< \ typename internal::traits::Scalar, \ typename internal::traits::Scalar \ >, \ const LHS, \ const RHS \ > #define EIGEN_MAKE_CWISE_BINARY_OP(METHOD,OPNAME) \ template \ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const EIGEN_CWISE_BINARY_RETURN_TYPE(Derived,OtherDerived,OPNAME) \ (METHOD)(const EIGEN_CURRENT_STORAGE_BASE_CLASS &other) const \ { \ return EIGEN_CWISE_BINARY_RETURN_TYPE(Derived,OtherDerived,OPNAME)(derived(), other.derived()); \ } #define EIGEN_SCALAR_BINARY_SUPPORTED(OPNAME,TYPEA,TYPEB) \ (Eigen::internal::has_ReturnType > >::value) #define EIGEN_EXPR_BINARYOP_SCALAR_RETURN_TYPE(EXPR,SCALAR,OPNAME) \ CwiseBinaryOp::Scalar,SCALAR>, const EXPR, \ const typename internal::plain_constant_type::type> #define EIGEN_SCALAR_BINARYOP_EXPR_RETURN_TYPE(SCALAR,EXPR,OPNAME) \ CwiseBinaryOp::Scalar>, \ const typename internal::plain_constant_type::type, const EXPR> // Workaround for MSVC 2010 (see ML thread "patch with compile for for MSVC 2010") #if EIGEN_COMP_MSVC_STRICT && (EIGEN_COMP_MSVC_STRICT<=1600) #define EIGEN_MSVC10_WORKAROUND_BINARYOP_RETURN_TYPE(X) typename internal::enable_if::type #else #define EIGEN_MSVC10_WORKAROUND_BINARYOP_RETURN_TYPE(X) X #endif #define EIGEN_MAKE_SCALAR_BINARY_OP_ONTHERIGHT(METHOD,OPNAME) \ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE \ EIGEN_MSVC10_WORKAROUND_BINARYOP_RETURN_TYPE(const EIGEN_EXPR_BINARYOP_SCALAR_RETURN_TYPE(Derived,typename internal::promote_scalar_arg::type,OPNAME))\ (METHOD)(const T& scalar) const { \ typedef typename internal::promote_scalar_arg::type PromotedT; \ return EIGEN_EXPR_BINARYOP_SCALAR_RETURN_TYPE(Derived,PromotedT,OPNAME)(derived(), \ typename internal::plain_constant_type::type(derived().rows(), derived().cols(), internal::scalar_constant_op(scalar))); \ } #define EIGEN_MAKE_SCALAR_BINARY_OP_ONTHELEFT(METHOD,OPNAME) \ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE friend \ EIGEN_MSVC10_WORKAROUND_BINARYOP_RETURN_TYPE(const EIGEN_SCALAR_BINARYOP_EXPR_RETURN_TYPE(typename internal::promote_scalar_arg::type,Derived,OPNAME)) \ (METHOD)(const T& scalar, const StorageBaseType& matrix) { \ typedef typename internal::promote_scalar_arg::type PromotedT; \ return EIGEN_SCALAR_BINARYOP_EXPR_RETURN_TYPE(PromotedT,Derived,OPNAME)( \ typename internal::plain_constant_type::type(matrix.derived().rows(), matrix.derived().cols(), internal::scalar_constant_op(scalar)), matrix.derived()); \ } #define EIGEN_MAKE_SCALAR_BINARY_OP(METHOD,OPNAME) \ EIGEN_MAKE_SCALAR_BINARY_OP_ONTHELEFT(METHOD,OPNAME) \ EIGEN_MAKE_SCALAR_BINARY_OP_ONTHERIGHT(METHOD,OPNAME) #if (defined(_CPPUNWIND) || defined(__EXCEPTIONS)) && !defined(EIGEN_CUDA_ARCH) && !defined(EIGEN_EXCEPTIONS) && !defined(EIGEN_USE_SYCL) && !defined(EIGEN_HIP_DEVICE_COMPILE) #define EIGEN_EXCEPTIONS #endif #ifdef EIGEN_EXCEPTIONS # define EIGEN_THROW_X(X) throw X # define EIGEN_THROW throw # define EIGEN_TRY try # define EIGEN_CATCH(X) catch (X) #else # if defined(EIGEN_CUDA_ARCH) # define EIGEN_THROW_X(X) asm("trap;") # define EIGEN_THROW asm("trap;") # elif defined(EIGEN_HIP_DEVICE_COMPILE) # define EIGEN_THROW_X(X) asm("s_trap 0") # define EIGEN_THROW asm("s_trap 0") # else # define EIGEN_THROW_X(X) std::abort() # define EIGEN_THROW std::abort() # endif # define EIGEN_TRY if (true) # define EIGEN_CATCH(X) else #endif #if EIGEN_HAS_CXX11_NOEXCEPT # define EIGEN_INCLUDE_TYPE_TRAITS # define EIGEN_NOEXCEPT noexcept # define EIGEN_NOEXCEPT_IF(x) noexcept(x) # define EIGEN_NO_THROW noexcept(true) # define EIGEN_EXCEPTION_SPEC(X) noexcept(false) #else # define EIGEN_NOEXCEPT # define EIGEN_NOEXCEPT_IF(x) # define EIGEN_NO_THROW throw() # if EIGEN_COMP_MSVC || EIGEN_COMP_CXXVER>=17 // MSVC does not support exception specifications (warning C4290), // and they are deprecated in c++11 anyway. This is even an error in c++17. # define EIGEN_EXCEPTION_SPEC(X) throw() # else # define EIGEN_EXCEPTION_SPEC(X) throw(X) # endif #endif #if EIGEN_HAS_VARIADIC_TEMPLATES // The all function is used to enable a variadic version of eigen_assert which can take a parameter pack as its input. namespace Eigen { namespace internal { inline bool all(){ return true; } template bool all(T t, Ts ... ts){ return t && all(ts...); } } } #endif #if EIGEN_HAS_CXX11_OVERRIDE_FINAL // provide override and final specifiers if they are available: # define EIGEN_OVERRIDE override # define EIGEN_FINAL final #else # define EIGEN_OVERRIDE # define EIGEN_FINAL #endif // Wrapping #pragma unroll in a macro since it is required for SYCL #if defined(SYCL_DEVICE_ONLY) #if defined(_MSC_VER) #define EIGEN_UNROLL_LOOP __pragma(unroll) #else #define EIGEN_UNROLL_LOOP _Pragma("unroll") #endif #else #define EIGEN_UNROLL_LOOP #endif #endif // EIGEN_MACROS_H RcppEigen/inst/include/Eigen/src/Core/util/SymbolicIndex.h0000644000176200001440000002734314562417221023150 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2017 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SYMBOLIC_INDEX_H #define EIGEN_SYMBOLIC_INDEX_H namespace Eigen { /** \namespace Eigen::symbolic * \ingroup Core_Module * * This namespace defines a set of classes and functions to build and evaluate symbolic expressions of scalar type Index. * Here is a simple example: * * \code * // First step, defines symbols: * struct x_tag {}; static const symbolic::SymbolExpr x; * struct y_tag {}; static const symbolic::SymbolExpr y; * struct z_tag {}; static const symbolic::SymbolExpr z; * * // Defines an expression: * auto expr = (x+3)/y+z; * * // And evaluate it: (c++14) * std::cout << expr.eval(x=6,y=3,z=-13) << "\n"; * * // In c++98/11, only one symbol per expression is supported for now: * auto expr98 = (3-x)/2; * std::cout << expr98.eval(x=6) << "\n"; * \endcode * * It is currently only used internally to define and manipulate the Eigen::last and Eigen::lastp1 symbols in Eigen::seq and Eigen::seqN. * */ namespace symbolic { template class Symbol; template class NegateExpr; template class AddExpr; template class ProductExpr; template class QuotientExpr; // A simple wrapper around an integral value to provide the eval method. // We could also use a free-function symbolic_eval... template class ValueExpr { public: ValueExpr(IndexType val) : m_value(val) {} template IndexType eval_impl(const T&) const { return m_value; } protected: IndexType m_value; }; // Specialization for compile-time value, // It is similar to ValueExpr(N) but this version helps the compiler to generate better code. template class ValueExpr > { public: ValueExpr() {} template EIGEN_CONSTEXPR Index eval_impl(const T&) const { return N; } }; /** \class BaseExpr * \ingroup Core_Module * Common base class of any symbolic expressions */ template class BaseExpr { public: const Derived& derived() const { return *static_cast(this); } /** Evaluate the expression given the \a values of the symbols. * * \param values defines the values of the symbols, it can either be a SymbolValue or a std::tuple of SymbolValue * as constructed by SymbolExpr::operator= operator. * */ template Index eval(const T& values) const { return derived().eval_impl(values); } #if EIGEN_HAS_CXX14 template Index eval(Types&&... values) const { return derived().eval_impl(std::make_tuple(values...)); } #endif NegateExpr operator-() const { return NegateExpr(derived()); } AddExpr > operator+(Index b) const { return AddExpr >(derived(), b); } AddExpr > operator-(Index a) const { return AddExpr >(derived(), -a); } ProductExpr > operator*(Index a) const { return ProductExpr >(derived(),a); } QuotientExpr > operator/(Index a) const { return QuotientExpr >(derived(),a); } friend AddExpr > operator+(Index a, const BaseExpr& b) { return AddExpr >(b.derived(), a); } friend AddExpr,ValueExpr<> > operator-(Index a, const BaseExpr& b) { return AddExpr,ValueExpr<> >(-b.derived(), a); } friend ProductExpr,Derived> operator*(Index a, const BaseExpr& b) { return ProductExpr,Derived>(a,b.derived()); } friend QuotientExpr,Derived> operator/(Index a, const BaseExpr& b) { return QuotientExpr,Derived>(a,b.derived()); } template AddExpr > > operator+(internal::FixedInt) const { return AddExpr > >(derived(), ValueExpr >()); } template AddExpr > > operator-(internal::FixedInt) const { return AddExpr > >(derived(), ValueExpr >()); } template ProductExpr > > operator*(internal::FixedInt) const { return ProductExpr > >(derived(),ValueExpr >()); } template QuotientExpr > > operator/(internal::FixedInt) const { return QuotientExpr > >(derived(),ValueExpr >()); } template friend AddExpr > > operator+(internal::FixedInt, const BaseExpr& b) { return AddExpr > >(b.derived(), ValueExpr >()); } template friend AddExpr,ValueExpr > > operator-(internal::FixedInt, const BaseExpr& b) { return AddExpr,ValueExpr > >(-b.derived(), ValueExpr >()); } template friend ProductExpr >,Derived> operator*(internal::FixedInt, const BaseExpr& b) { return ProductExpr >,Derived>(ValueExpr >(),b.derived()); } template friend QuotientExpr >,Derived> operator/(internal::FixedInt, const BaseExpr& b) { return QuotientExpr > ,Derived>(ValueExpr >(),b.derived()); } #if (!EIGEN_HAS_CXX14) template AddExpr > > operator+(internal::FixedInt (*)()) const { return AddExpr > >(derived(), ValueExpr >()); } template AddExpr > > operator-(internal::FixedInt (*)()) const { return AddExpr > >(derived(), ValueExpr >()); } template ProductExpr > > operator*(internal::FixedInt (*)()) const { return ProductExpr > >(derived(),ValueExpr >()); } template QuotientExpr > > operator/(internal::FixedInt (*)()) const { return QuotientExpr > >(derived(),ValueExpr >()); } template friend AddExpr > > operator+(internal::FixedInt (*)(), const BaseExpr& b) { return AddExpr > >(b.derived(), ValueExpr >()); } template friend AddExpr,ValueExpr > > operator-(internal::FixedInt (*)(), const BaseExpr& b) { return AddExpr,ValueExpr > >(-b.derived(), ValueExpr >()); } template friend ProductExpr >,Derived> operator*(internal::FixedInt (*)(), const BaseExpr& b) { return ProductExpr >,Derived>(ValueExpr >(),b.derived()); } template friend QuotientExpr >,Derived> operator/(internal::FixedInt (*)(), const BaseExpr& b) { return QuotientExpr > ,Derived>(ValueExpr >(),b.derived()); } #endif template AddExpr operator+(const BaseExpr &b) const { return AddExpr(derived(), b.derived()); } template AddExpr > operator-(const BaseExpr &b) const { return AddExpr >(derived(), -b.derived()); } template ProductExpr operator*(const BaseExpr &b) const { return ProductExpr(derived(), b.derived()); } template QuotientExpr operator/(const BaseExpr &b) const { return QuotientExpr(derived(), b.derived()); } }; template struct is_symbolic { // BaseExpr has no conversion ctor, so we only have to check whether T can be statically cast to its base class BaseExpr. enum { value = internal::is_convertible >::value }; }; /** Represents the actual value of a symbol identified by its tag * * It is the return type of SymbolValue::operator=, and most of the time this is only way it is used. */ template class SymbolValue { public: /** Default constructor from the value \a val */ SymbolValue(Index val) : m_value(val) {} /** \returns the stored value of the symbol */ Index value() const { return m_value; } protected: Index m_value; }; /** Expression of a symbol uniquely identified by the template parameter type \c tag */ template class SymbolExpr : public BaseExpr > { public: /** Alias to the template parameter \c tag */ typedef tag Tag; SymbolExpr() {} /** Associate the value \a val to the given symbol \c *this, uniquely identified by its \c Tag. * * The returned object should be passed to ExprBase::eval() to evaluate a given expression with this specified runtime-time value. */ SymbolValue operator=(Index val) const { return SymbolValue(val); } Index eval_impl(const SymbolValue &values) const { return values.value(); } #if EIGEN_HAS_CXX14 // C++14 versions suitable for multiple symbols template Index eval_impl(const std::tuple& values) const { return std::get >(values).value(); } #endif }; template class NegateExpr : public BaseExpr > { public: NegateExpr(const Arg0& arg0) : m_arg0(arg0) {} template Index eval_impl(const T& values) const { return -m_arg0.eval_impl(values); } protected: Arg0 m_arg0; }; template class AddExpr : public BaseExpr > { public: AddExpr(const Arg0& arg0, const Arg1& arg1) : m_arg0(arg0), m_arg1(arg1) {} template Index eval_impl(const T& values) const { return m_arg0.eval_impl(values) + m_arg1.eval_impl(values); } protected: Arg0 m_arg0; Arg1 m_arg1; }; template class ProductExpr : public BaseExpr > { public: ProductExpr(const Arg0& arg0, const Arg1& arg1) : m_arg0(arg0), m_arg1(arg1) {} template Index eval_impl(const T& values) const { return m_arg0.eval_impl(values) * m_arg1.eval_impl(values); } protected: Arg0 m_arg0; Arg1 m_arg1; }; template class QuotientExpr : public BaseExpr > { public: QuotientExpr(const Arg0& arg0, const Arg1& arg1) : m_arg0(arg0), m_arg1(arg1) {} template Index eval_impl(const T& values) const { return m_arg0.eval_impl(values) / m_arg1.eval_impl(values); } protected: Arg0 m_arg0; Arg1 m_arg1; }; } // end namespace symbolic } // end namespace Eigen #endif // EIGEN_SYMBOLIC_INDEX_H RcppEigen/inst/include/Eigen/src/Core/util/Constants.h0000644000176200001440000005265314562417221022355 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2015 Gael Guennebaud // Copyright (C) 2007-2009 Benoit Jacob // Copyright (C) 2020, Arm Limited and Contributors // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_CONSTANTS_H #define EIGEN_CONSTANTS_H namespace Eigen { /** This value means that a positive quantity (e.g., a size) is not known at compile-time, and that instead the value is * stored in some runtime variable. * * Changing the value of Dynamic breaks the ABI, as Dynamic is often used as a template parameter for Matrix. */ const int Dynamic = -1; /** This value means that a signed quantity (e.g., a signed index) is not known at compile-time, and that instead its value * has to be specified at runtime. */ const int DynamicIndex = 0xffffff; /** This value means that the increment to go from one value to another in a sequence is not constant for each step. */ const int UndefinedIncr = 0xfffffe; /** This value means +Infinity; it is currently used only as the p parameter to MatrixBase::lpNorm(). * The value Infinity there means the L-infinity norm. */ const int Infinity = -1; /** This value means that the cost to evaluate an expression coefficient is either very expensive or * cannot be known at compile time. * * This value has to be positive to (1) simplify cost computation, and (2) allow to distinguish between a very expensive and very very expensive expressions. * It thus must also be large enough to make sure unrolling won't happen and that sub expressions will be evaluated, but not too large to avoid overflow. */ const int HugeCost = 10000; /** \defgroup flags Flags * \ingroup Core_Module * * These are the possible bits which can be OR'ed to constitute the flags of a matrix or * expression. * * It is important to note that these flags are a purely compile-time notion. They are a compile-time property of * an expression type, implemented as enum's. They are not stored in memory at runtime, and they do not incur any * runtime overhead. * * \sa MatrixBase::Flags */ /** \ingroup flags * * for a matrix, this means that the storage order is row-major. * If this bit is not set, the storage order is column-major. * For an expression, this determines the storage order of * the matrix created by evaluation of that expression. * \sa \blank \ref TopicStorageOrders */ const unsigned int RowMajorBit = 0x1; /** \ingroup flags * means the expression should be evaluated by the calling expression */ const unsigned int EvalBeforeNestingBit = 0x2; /** \ingroup flags * \deprecated * means the expression should be evaluated before any assignment */ EIGEN_DEPRECATED const unsigned int EvalBeforeAssigningBit = 0x4; // FIXME deprecated /** \ingroup flags * * Short version: means the expression might be vectorized * * Long version: means that the coefficients can be handled by packets * and start at a memory location whose alignment meets the requirements * of the present CPU architecture for optimized packet access. In the fixed-size * case, there is the additional condition that it be possible to access all the * coefficients by packets (this implies the requirement that the size be a multiple of 16 bytes, * and that any nontrivial strides don't break the alignment). In the dynamic-size case, * there is no such condition on the total size and strides, so it might not be possible to access * all coeffs by packets. * * \note This bit can be set regardless of whether vectorization is actually enabled. * To check for actual vectorizability, see \a ActualPacketAccessBit. */ const unsigned int PacketAccessBit = 0x8; #ifdef EIGEN_VECTORIZE /** \ingroup flags * * If vectorization is enabled (EIGEN_VECTORIZE is defined) this constant * is set to the value \a PacketAccessBit. * * If vectorization is not enabled (EIGEN_VECTORIZE is not defined) this constant * is set to the value 0. */ const unsigned int ActualPacketAccessBit = PacketAccessBit; #else const unsigned int ActualPacketAccessBit = 0x0; #endif /** \ingroup flags * * Short version: means the expression can be seen as 1D vector. * * Long version: means that one can access the coefficients * of this expression by coeff(int), and coeffRef(int) in the case of a lvalue expression. These * index-based access methods are guaranteed * to not have to do any runtime computation of a (row, col)-pair from the index, so that it * is guaranteed that whenever it is available, index-based access is at least as fast as * (row,col)-based access. Expressions for which that isn't possible don't have the LinearAccessBit. * * If both PacketAccessBit and LinearAccessBit are set, then the * packets of this expression can be accessed by packet(int), and writePacket(int) in the case of a * lvalue expression. * * Typically, all vector expressions have the LinearAccessBit, but there is one exception: * Product expressions don't have it, because it would be troublesome for vectorization, even when the * Product is a vector expression. Thus, vector Product expressions allow index-based coefficient access but * not index-based packet access, so they don't have the LinearAccessBit. */ const unsigned int LinearAccessBit = 0x10; /** \ingroup flags * * Means the expression has a coeffRef() method, i.e. is writable as its individual coefficients are directly addressable. * This rules out read-only expressions. * * Note that DirectAccessBit and LvalueBit are mutually orthogonal, as there are examples of expression having one but note * the other: * \li writable expressions that don't have a very simple memory layout as a strided array, have LvalueBit but not DirectAccessBit * \li Map-to-const expressions, for example Map, have DirectAccessBit but not LvalueBit * * Expressions having LvalueBit also have their coeff() method returning a const reference instead of returning a new value. */ const unsigned int LvalueBit = 0x20; /** \ingroup flags * * Means that the underlying array of coefficients can be directly accessed as a plain strided array. The memory layout * of the array of coefficients must be exactly the natural one suggested by rows(), cols(), * outerStride(), innerStride(), and the RowMajorBit. This rules out expressions such as Diagonal, whose coefficients, * though referencable, do not have such a regular memory layout. * * See the comment on LvalueBit for an explanation of how LvalueBit and DirectAccessBit are mutually orthogonal. */ const unsigned int DirectAccessBit = 0x40; /** \deprecated \ingroup flags * * means the first coefficient packet is guaranteed to be aligned. * An expression cannot have the AlignedBit without the PacketAccessBit flag. * In other words, this means we are allow to perform an aligned packet access to the first element regardless * of the expression kind: * \code * expression.packet(0); * \endcode */ EIGEN_DEPRECATED const unsigned int AlignedBit = 0x80; const unsigned int NestByRefBit = 0x100; /** \ingroup flags * * for an expression, this means that the storage order * can be either row-major or column-major. * The precise choice will be decided at evaluation time or when * combined with other expressions. * \sa \blank \ref RowMajorBit, \ref TopicStorageOrders */ const unsigned int NoPreferredStorageOrderBit = 0x200; /** \ingroup flags * * Means that the underlying coefficients can be accessed through pointers to the sparse (un)compressed storage format, * that is, the expression provides: * \code inline const Scalar* valuePtr() const; inline const Index* innerIndexPtr() const; inline const Index* outerIndexPtr() const; inline const Index* innerNonZeroPtr() const; \endcode */ const unsigned int CompressedAccessBit = 0x400; // list of flags that are inherited by default const unsigned int HereditaryBits = RowMajorBit | EvalBeforeNestingBit; /** \defgroup enums Enumerations * \ingroup Core_Module * * Various enumerations used in %Eigen. Many of these are used as template parameters. */ /** \ingroup enums * Enum containing possible values for the \c Mode or \c UpLo parameter of * MatrixBase::selfadjointView() and MatrixBase::triangularView(), and selfadjoint solvers. */ enum UpLoType { /** View matrix as a lower triangular matrix. */ Lower=0x1, /** View matrix as an upper triangular matrix. */ Upper=0x2, /** %Matrix has ones on the diagonal; to be used in combination with #Lower or #Upper. */ UnitDiag=0x4, /** %Matrix has zeros on the diagonal; to be used in combination with #Lower or #Upper. */ ZeroDiag=0x8, /** View matrix as a lower triangular matrix with ones on the diagonal. */ UnitLower=UnitDiag|Lower, /** View matrix as an upper triangular matrix with ones on the diagonal. */ UnitUpper=UnitDiag|Upper, /** View matrix as a lower triangular matrix with zeros on the diagonal. */ StrictlyLower=ZeroDiag|Lower, /** View matrix as an upper triangular matrix with zeros on the diagonal. */ StrictlyUpper=ZeroDiag|Upper, /** Used in BandMatrix and SelfAdjointView to indicate that the matrix is self-adjoint. */ SelfAdjoint=0x10, /** Used to support symmetric, non-selfadjoint, complex matrices. */ Symmetric=0x20 }; /** \ingroup enums * Enum for indicating whether a buffer is aligned or not. */ enum AlignmentType { Unaligned=0, /**< Data pointer has no specific alignment. */ Aligned8=8, /**< Data pointer is aligned on a 8 bytes boundary. */ Aligned16=16, /**< Data pointer is aligned on a 16 bytes boundary. */ Aligned32=32, /**< Data pointer is aligned on a 32 bytes boundary. */ Aligned64=64, /**< Data pointer is aligned on a 64 bytes boundary. */ Aligned128=128, /**< Data pointer is aligned on a 128 bytes boundary. */ AlignedMask=255, Aligned=16, /**< \deprecated Synonym for Aligned16. */ #if EIGEN_MAX_ALIGN_BYTES==128 AlignedMax = Aligned128 #elif EIGEN_MAX_ALIGN_BYTES==64 AlignedMax = Aligned64 #elif EIGEN_MAX_ALIGN_BYTES==32 AlignedMax = Aligned32 #elif EIGEN_MAX_ALIGN_BYTES==16 AlignedMax = Aligned16 #elif EIGEN_MAX_ALIGN_BYTES==8 AlignedMax = Aligned8 #elif EIGEN_MAX_ALIGN_BYTES==0 AlignedMax = Unaligned #else #error Invalid value for EIGEN_MAX_ALIGN_BYTES #endif }; /** \ingroup enums * Enum containing possible values for the \p Direction parameter of * Reverse, PartialReduxExpr and VectorwiseOp. */ enum DirectionType { /** For Reverse, all columns are reversed; * for PartialReduxExpr and VectorwiseOp, act on columns. */ Vertical, /** For Reverse, all rows are reversed; * for PartialReduxExpr and VectorwiseOp, act on rows. */ Horizontal, /** For Reverse, both rows and columns are reversed; * not used for PartialReduxExpr and VectorwiseOp. */ BothDirections }; /** \internal \ingroup enums * Enum to specify how to traverse the entries of a matrix. */ enum TraversalType { /** \internal Default traversal, no vectorization, no index-based access */ DefaultTraversal, /** \internal No vectorization, use index-based access to have only one for loop instead of 2 nested loops */ LinearTraversal, /** \internal Equivalent to a slice vectorization for fixed-size matrices having good alignment * and good size */ InnerVectorizedTraversal, /** \internal Vectorization path using a single loop plus scalar loops for the * unaligned boundaries */ LinearVectorizedTraversal, /** \internal Generic vectorization path using one vectorized loop per row/column with some * scalar loops to handle the unaligned boundaries */ SliceVectorizedTraversal, /** \internal Special case to properly handle incompatible scalar types or other defecting cases*/ InvalidTraversal, /** \internal Evaluate all entries at once */ AllAtOnceTraversal }; /** \internal \ingroup enums * Enum to specify whether to unroll loops when traversing over the entries of a matrix. */ enum UnrollingType { /** \internal Do not unroll loops. */ NoUnrolling, /** \internal Unroll only the inner loop, but not the outer loop. */ InnerUnrolling, /** \internal Unroll both the inner and the outer loop. If there is only one loop, * because linear traversal is used, then unroll that loop. */ CompleteUnrolling }; /** \internal \ingroup enums * Enum to specify whether to use the default (built-in) implementation or the specialization. */ enum SpecializedType { Specialized, BuiltIn }; /** \ingroup enums * Enum containing possible values for the \p _Options template parameter of * Matrix, Array and BandMatrix. */ enum StorageOptions { /** Storage order is column major (see \ref TopicStorageOrders). */ ColMajor = 0, /** Storage order is row major (see \ref TopicStorageOrders). */ RowMajor = 0x1, // it is only a coincidence that this is equal to RowMajorBit -- don't rely on that /** Align the matrix itself if it is vectorizable fixed-size */ AutoAlign = 0, /** Don't require alignment for the matrix itself (the array of coefficients, if dynamically allocated, may still be requested to be aligned) */ // FIXME --- clarify the situation DontAlign = 0x2 }; /** \ingroup enums * Enum for specifying whether to apply or solve on the left or right. */ enum SideType { /** Apply transformation on the left. */ OnTheLeft = 1, /** Apply transformation on the right. */ OnTheRight = 2 }; /** \ingroup enums * Enum for specifying NaN-propagation behavior, e.g. for coeff-wise min/max. */ enum NaNPropagationOptions { /** Implementation defined behavior if NaNs are present. */ PropagateFast = 0, /** Always propagate NaNs. */ PropagateNaN, /** Always propagate not-NaNs. */ PropagateNumbers }; /* the following used to be written as: * * struct NoChange_t {}; * namespace { * EIGEN_UNUSED NoChange_t NoChange; * } * * on the ground that it feels dangerous to disambiguate overloaded functions on enum/integer types. * However, this leads to "variable declared but never referenced" warnings on Intel Composer XE, * and we do not know how to get rid of them (bug 450). */ enum NoChange_t { NoChange }; enum Sequential_t { Sequential }; enum Default_t { Default }; /** \internal \ingroup enums * Used in AmbiVector. */ enum AmbiVectorMode { IsDense = 0, IsSparse }; /** \ingroup enums * Used as template parameter in DenseCoeffBase and MapBase to indicate * which accessors should be provided. */ enum AccessorLevels { /** Read-only access via a member function. */ ReadOnlyAccessors, /** Read/write access via member functions. */ WriteAccessors, /** Direct read-only access to the coefficients. */ DirectAccessors, /** Direct read/write access to the coefficients. */ DirectWriteAccessors }; /** \ingroup enums * Enum with options to give to various decompositions. */ enum DecompositionOptions { /** \internal Not used (meant for LDLT?). */ Pivoting = 0x01, /** \internal Not used (meant for LDLT?). */ NoPivoting = 0x02, /** Used in JacobiSVD to indicate that the square matrix U is to be computed. */ ComputeFullU = 0x04, /** Used in JacobiSVD to indicate that the thin matrix U is to be computed. */ ComputeThinU = 0x08, /** Used in JacobiSVD to indicate that the square matrix V is to be computed. */ ComputeFullV = 0x10, /** Used in JacobiSVD to indicate that the thin matrix V is to be computed. */ ComputeThinV = 0x20, /** Used in SelfAdjointEigenSolver and GeneralizedSelfAdjointEigenSolver to specify * that only the eigenvalues are to be computed and not the eigenvectors. */ EigenvaluesOnly = 0x40, /** Used in SelfAdjointEigenSolver and GeneralizedSelfAdjointEigenSolver to specify * that both the eigenvalues and the eigenvectors are to be computed. */ ComputeEigenvectors = 0x80, /** \internal */ EigVecMask = EigenvaluesOnly | ComputeEigenvectors, /** Used in GeneralizedSelfAdjointEigenSolver to indicate that it should * solve the generalized eigenproblem \f$ Ax = \lambda B x \f$. */ Ax_lBx = 0x100, /** Used in GeneralizedSelfAdjointEigenSolver to indicate that it should * solve the generalized eigenproblem \f$ ABx = \lambda x \f$. */ ABx_lx = 0x200, /** Used in GeneralizedSelfAdjointEigenSolver to indicate that it should * solve the generalized eigenproblem \f$ BAx = \lambda x \f$. */ BAx_lx = 0x400, /** \internal */ GenEigMask = Ax_lBx | ABx_lx | BAx_lx }; /** \ingroup enums * Possible values for the \p QRPreconditioner template parameter of JacobiSVD. */ enum QRPreconditioners { /** Do not specify what is to be done if the SVD of a non-square matrix is asked for. */ NoQRPreconditioner, /** Use a QR decomposition without pivoting as the first step. */ HouseholderQRPreconditioner, /** Use a QR decomposition with column pivoting as the first step. */ ColPivHouseholderQRPreconditioner, /** Use a QR decomposition with full pivoting as the first step. */ FullPivHouseholderQRPreconditioner }; #ifdef Success #error The preprocessor symbol 'Success' is defined, possibly by the X11 header file X.h #endif /** \ingroup enums * Enum for reporting the status of a computation. */ enum ComputationInfo { /** Computation was successful. */ Success = 0, /** The provided data did not satisfy the prerequisites. */ NumericalIssue = 1, /** Iterative procedure did not converge. */ NoConvergence = 2, /** The inputs are invalid, or the algorithm has been improperly called. * When assertions are enabled, such errors trigger an assert. */ InvalidInput = 3 }; /** \ingroup enums * Enum used to specify how a particular transformation is stored in a matrix. * \sa Transform, Hyperplane::transform(). */ enum TransformTraits { /** Transformation is an isometry. */ Isometry = 0x1, /** Transformation is an affine transformation stored as a (Dim+1)^2 matrix whose last row is * assumed to be [0 ... 0 1]. */ Affine = 0x2, /** Transformation is an affine transformation stored as a (Dim) x (Dim+1) matrix. */ AffineCompact = 0x10 | Affine, /** Transformation is a general projective transformation stored as a (Dim+1)^2 matrix. */ Projective = 0x20 }; /** \internal \ingroup enums * Enum used to choose between implementation depending on the computer architecture. */ namespace Architecture { enum Type { Generic = 0x0, SSE = 0x1, AltiVec = 0x2, VSX = 0x3, NEON = 0x4, MSA = 0x5, SVE = 0x6, #if defined EIGEN_VECTORIZE_SSE Target = SSE #elif defined EIGEN_VECTORIZE_ALTIVEC Target = AltiVec #elif defined EIGEN_VECTORIZE_VSX Target = VSX #elif defined EIGEN_VECTORIZE_NEON Target = NEON #elif defined EIGEN_VECTORIZE_SVE Target = SVE #elif defined EIGEN_VECTORIZE_MSA Target = MSA #else Target = Generic #endif }; } /** \internal \ingroup enums * Enum used as template parameter in Product and product evaluators. */ enum ProductImplType { DefaultProduct=0, LazyProduct, AliasFreeProduct, CoeffBasedProductMode, LazyCoeffBasedProductMode, OuterProduct, InnerProduct, GemvProduct, GemmProduct }; /** \internal \ingroup enums * Enum used in experimental parallel implementation. */ enum Action {GetAction, SetAction}; /** The type used to identify a dense storage. */ struct Dense {}; /** The type used to identify a general sparse storage. */ struct Sparse {}; /** The type used to identify a general solver (factored) storage. */ struct SolverStorage {}; /** The type used to identify a permutation storage. */ struct PermutationStorage {}; /** The type used to identify a permutation storage. */ struct TranspositionsStorage {}; /** The type used to identify a matrix expression */ struct MatrixXpr {}; /** The type used to identify an array expression */ struct ArrayXpr {}; // An evaluator must define its shape. By default, it can be one of the following: struct DenseShape { static std::string debugName() { return "DenseShape"; } }; struct SolverShape { static std::string debugName() { return "SolverShape"; } }; struct HomogeneousShape { static std::string debugName() { return "HomogeneousShape"; } }; struct DiagonalShape { static std::string debugName() { return "DiagonalShape"; } }; struct BandShape { static std::string debugName() { return "BandShape"; } }; struct TriangularShape { static std::string debugName() { return "TriangularShape"; } }; struct SelfAdjointShape { static std::string debugName() { return "SelfAdjointShape"; } }; struct PermutationShape { static std::string debugName() { return "PermutationShape"; } }; struct TranspositionsShape { static std::string debugName() { return "TranspositionsShape"; } }; struct SparseShape { static std::string debugName() { return "SparseShape"; } }; namespace internal { // random access iterators based on coeff*() accessors. struct IndexBased {}; // evaluator based on iterators to access coefficients. struct IteratorBased {}; /** \internal * Constants for comparison functors */ enum ComparisonName { cmp_EQ = 0, cmp_LT = 1, cmp_LE = 2, cmp_UNORD = 3, cmp_NEQ = 4, cmp_GT = 5, cmp_GE = 6 }; } // end namespace internal } // end namespace Eigen #endif // EIGEN_CONSTANTS_H RcppEigen/inst/include/Eigen/src/Core/util/StaticAssert.h0000644000176200001440000002466414562417221023013 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008 Gael Guennebaud // Copyright (C) 2008 Benoit Jacob // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_STATIC_ASSERT_H #define EIGEN_STATIC_ASSERT_H /* Some notes on Eigen's static assertion mechanism: * * - in EIGEN_STATIC_ASSERT(CONDITION,MSG) the parameter CONDITION must be a compile time boolean * expression, and MSG an enum listed in struct internal::static_assertion * * - define EIGEN_NO_STATIC_ASSERT to disable them (and save compilation time) * in that case, the static assertion is converted to the following runtime assert: * eigen_assert(CONDITION && "MSG") * * - currently EIGEN_STATIC_ASSERT can only be used in function scope * */ #ifndef EIGEN_STATIC_ASSERT #ifndef EIGEN_NO_STATIC_ASSERT #if EIGEN_MAX_CPP_VER>=11 && (__has_feature(cxx_static_assert) || (EIGEN_COMP_CXXVER >= 11) || (EIGEN_COMP_MSVC >= 1600)) // if native static_assert is enabled, let's use it #define EIGEN_STATIC_ASSERT(X,MSG) static_assert(X,#MSG); #else // not CXX0X namespace Eigen { namespace internal { template struct static_assertion {}; template<> struct static_assertion { enum { YOU_TRIED_CALLING_A_VECTOR_METHOD_ON_A_MATRIX=1, YOU_MIXED_VECTORS_OF_DIFFERENT_SIZES=1, YOU_MIXED_MATRICES_OF_DIFFERENT_SIZES=1, THIS_METHOD_IS_ONLY_FOR_VECTORS_OF_A_SPECIFIC_SIZE=1, THIS_METHOD_IS_ONLY_FOR_MATRICES_OF_A_SPECIFIC_SIZE=1, THIS_METHOD_IS_ONLY_FOR_OBJECTS_OF_A_SPECIFIC_SIZE=1, OUT_OF_RANGE_ACCESS=1, YOU_MADE_A_PROGRAMMING_MISTAKE=1, EIGEN_INTERNAL_ERROR_PLEASE_FILE_A_BUG_REPORT=1, EIGEN_INTERNAL_COMPILATION_ERROR_OR_YOU_MADE_A_PROGRAMMING_MISTAKE=1, YOU_CALLED_A_FIXED_SIZE_METHOD_ON_A_DYNAMIC_SIZE_MATRIX_OR_VECTOR=1, YOU_CALLED_A_DYNAMIC_SIZE_METHOD_ON_A_FIXED_SIZE_MATRIX_OR_VECTOR=1, UNALIGNED_LOAD_AND_STORE_OPERATIONS_UNIMPLEMENTED_ON_ALTIVEC=1, THIS_FUNCTION_IS_NOT_FOR_INTEGER_NUMERIC_TYPES=1, FLOATING_POINT_ARGUMENT_PASSED__INTEGER_WAS_EXPECTED=1, NUMERIC_TYPE_MUST_BE_REAL=1, COEFFICIENT_WRITE_ACCESS_TO_SELFADJOINT_NOT_SUPPORTED=1, WRITING_TO_TRIANGULAR_PART_WITH_UNIT_DIAGONAL_IS_NOT_SUPPORTED=1, THIS_METHOD_IS_ONLY_FOR_FIXED_SIZE=1, INVALID_MATRIX_PRODUCT=1, INVALID_VECTOR_VECTOR_PRODUCT__IF_YOU_WANTED_A_DOT_OR_COEFF_WISE_PRODUCT_YOU_MUST_USE_THE_EXPLICIT_FUNCTIONS=1, INVALID_MATRIX_PRODUCT__IF_YOU_WANTED_A_COEFF_WISE_PRODUCT_YOU_MUST_USE_THE_EXPLICIT_FUNCTION=1, YOU_MIXED_DIFFERENT_NUMERIC_TYPES__YOU_NEED_TO_USE_THE_CAST_METHOD_OF_MATRIXBASE_TO_CAST_NUMERIC_TYPES_EXPLICITLY=1, THIS_METHOD_IS_ONLY_FOR_COLUMN_MAJOR_MATRICES=1, THIS_METHOD_IS_ONLY_FOR_ROW_MAJOR_MATRICES=1, INVALID_MATRIX_TEMPLATE_PARAMETERS=1, INVALID_MATRIXBASE_TEMPLATE_PARAMETERS=1, BOTH_MATRICES_MUST_HAVE_THE_SAME_STORAGE_ORDER=1, THIS_METHOD_IS_ONLY_FOR_DIAGONAL_MATRIX=1, THE_MATRIX_OR_EXPRESSION_THAT_YOU_PASSED_DOES_NOT_HAVE_THE_EXPECTED_TYPE=1, THIS_METHOD_IS_ONLY_FOR_EXPRESSIONS_WITH_DIRECT_MEMORY_ACCESS_SUCH_AS_MAP_OR_PLAIN_MATRICES=1, YOU_ALREADY_SPECIFIED_THIS_STRIDE=1, INVALID_STORAGE_ORDER_FOR_THIS_VECTOR_EXPRESSION=1, THE_BRACKET_OPERATOR_IS_ONLY_FOR_VECTORS__USE_THE_PARENTHESIS_OPERATOR_INSTEAD=1, PACKET_ACCESS_REQUIRES_TO_HAVE_INNER_STRIDE_FIXED_TO_1=1, THIS_METHOD_IS_ONLY_FOR_SPECIFIC_TRANSFORMATIONS=1, YOU_CANNOT_MIX_ARRAYS_AND_MATRICES=1, YOU_PERFORMED_AN_INVALID_TRANSFORMATION_CONVERSION=1, THIS_EXPRESSION_IS_NOT_A_LVALUE__IT_IS_READ_ONLY=1, YOU_ARE_TRYING_TO_USE_AN_INDEX_BASED_ACCESSOR_ON_AN_EXPRESSION_THAT_DOES_NOT_SUPPORT_THAT=1, THIS_METHOD_IS_ONLY_FOR_1x1_EXPRESSIONS=1, THIS_METHOD_IS_ONLY_FOR_INNER_OR_LAZY_PRODUCTS=1, THIS_METHOD_IS_ONLY_FOR_EXPRESSIONS_OF_BOOL=1, THIS_METHOD_IS_ONLY_FOR_ARRAYS_NOT_MATRICES=1, YOU_PASSED_A_ROW_VECTOR_BUT_A_COLUMN_VECTOR_WAS_EXPECTED=1, YOU_PASSED_A_COLUMN_VECTOR_BUT_A_ROW_VECTOR_WAS_EXPECTED=1, THE_INDEX_TYPE_MUST_BE_A_SIGNED_TYPE=1, THE_STORAGE_ORDER_OF_BOTH_SIDES_MUST_MATCH=1, OBJECT_ALLOCATED_ON_STACK_IS_TOO_BIG=1, IMPLICIT_CONVERSION_TO_SCALAR_IS_FOR_INNER_PRODUCT_ONLY=1, STORAGE_LAYOUT_DOES_NOT_MATCH=1, EIGEN_INTERNAL_ERROR_PLEASE_FILE_A_BUG_REPORT__INVALID_COST_VALUE=1, THIS_COEFFICIENT_ACCESSOR_TAKING_ONE_ACCESS_IS_ONLY_FOR_EXPRESSIONS_ALLOWING_LINEAR_ACCESS=1, MATRIX_FREE_CONJUGATE_GRADIENT_IS_COMPATIBLE_WITH_UPPER_UNION_LOWER_MODE_ONLY=1, THIS_TYPE_IS_NOT_SUPPORTED=1, STORAGE_KIND_MUST_MATCH=1, STORAGE_INDEX_MUST_MATCH=1, CHOLMOD_SUPPORTS_DOUBLE_PRECISION_ONLY=1, SELFADJOINTVIEW_ACCEPTS_UPPER_AND_LOWER_MODE_ONLY=1, INVALID_TEMPLATE_PARAMETER=1, GPU_TENSOR_CONTRACTION_DOES_NOT_SUPPORT_OUTPUT_KERNELS=1, THE_ARRAY_SIZE_SHOULD_EQUAL_WITH_PACKET_SIZE=1 }; }; } // end namespace internal } // end namespace Eigen // Specialized implementation for MSVC to avoid "conditional // expression is constant" warnings. This implementation doesn't // appear to work under GCC, hence the multiple implementations. #if EIGEN_COMP_MSVC #define EIGEN_STATIC_ASSERT(CONDITION,MSG) \ {Eigen::internal::static_assertion::MSG;} #else // In some cases clang interprets bool(CONDITION) as function declaration #define EIGEN_STATIC_ASSERT(CONDITION,MSG) \ if (Eigen::internal::static_assertion(CONDITION)>::MSG) {} #endif #endif // not CXX0X #else // EIGEN_NO_STATIC_ASSERT #define EIGEN_STATIC_ASSERT(CONDITION,MSG) eigen_assert((CONDITION) && #MSG); #endif // EIGEN_NO_STATIC_ASSERT #endif // EIGEN_STATIC_ASSERT // static assertion failing if the type \a TYPE is not a vector type #define EIGEN_STATIC_ASSERT_VECTOR_ONLY(TYPE) \ EIGEN_STATIC_ASSERT(TYPE::IsVectorAtCompileTime, \ YOU_TRIED_CALLING_A_VECTOR_METHOD_ON_A_MATRIX) // static assertion failing if the type \a TYPE is not fixed-size #define EIGEN_STATIC_ASSERT_FIXED_SIZE(TYPE) \ EIGEN_STATIC_ASSERT(TYPE::SizeAtCompileTime!=Eigen::Dynamic, \ YOU_CALLED_A_FIXED_SIZE_METHOD_ON_A_DYNAMIC_SIZE_MATRIX_OR_VECTOR) // static assertion failing if the type \a TYPE is not dynamic-size #define EIGEN_STATIC_ASSERT_DYNAMIC_SIZE(TYPE) \ EIGEN_STATIC_ASSERT(TYPE::SizeAtCompileTime==Eigen::Dynamic, \ YOU_CALLED_A_DYNAMIC_SIZE_METHOD_ON_A_FIXED_SIZE_MATRIX_OR_VECTOR) // static assertion failing if the type \a TYPE is not a vector type of the given size #define EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(TYPE, SIZE) \ EIGEN_STATIC_ASSERT(TYPE::IsVectorAtCompileTime && TYPE::SizeAtCompileTime==SIZE, \ THIS_METHOD_IS_ONLY_FOR_VECTORS_OF_A_SPECIFIC_SIZE) // static assertion failing if the type \a TYPE is not a vector type of the given size #define EIGEN_STATIC_ASSERT_MATRIX_SPECIFIC_SIZE(TYPE, ROWS, COLS) \ EIGEN_STATIC_ASSERT(TYPE::RowsAtCompileTime==ROWS && TYPE::ColsAtCompileTime==COLS, \ THIS_METHOD_IS_ONLY_FOR_MATRICES_OF_A_SPECIFIC_SIZE) // static assertion failing if the two vector expression types are not compatible (same fixed-size or dynamic size) #define EIGEN_STATIC_ASSERT_SAME_VECTOR_SIZE(TYPE0,TYPE1) \ EIGEN_STATIC_ASSERT( \ (int(TYPE0::SizeAtCompileTime)==Eigen::Dynamic \ || int(TYPE1::SizeAtCompileTime)==Eigen::Dynamic \ || int(TYPE0::SizeAtCompileTime)==int(TYPE1::SizeAtCompileTime)),\ YOU_MIXED_VECTORS_OF_DIFFERENT_SIZES) #define EIGEN_PREDICATE_SAME_MATRIX_SIZE(TYPE0,TYPE1) \ ( \ (int(Eigen::internal::size_of_xpr_at_compile_time::ret)==0 && int(Eigen::internal::size_of_xpr_at_compile_time::ret)==0) \ || (\ (int(TYPE0::RowsAtCompileTime)==Eigen::Dynamic \ || int(TYPE1::RowsAtCompileTime)==Eigen::Dynamic \ || int(TYPE0::RowsAtCompileTime)==int(TYPE1::RowsAtCompileTime)) \ && (int(TYPE0::ColsAtCompileTime)==Eigen::Dynamic \ || int(TYPE1::ColsAtCompileTime)==Eigen::Dynamic \ || int(TYPE0::ColsAtCompileTime)==int(TYPE1::ColsAtCompileTime))\ ) \ ) #define EIGEN_STATIC_ASSERT_NON_INTEGER(TYPE) \ EIGEN_STATIC_ASSERT(!Eigen::NumTraits::IsInteger, THIS_FUNCTION_IS_NOT_FOR_INTEGER_NUMERIC_TYPES) // static assertion failing if it is guaranteed at compile-time that the two matrix expression types have different sizes #define EIGEN_STATIC_ASSERT_SAME_MATRIX_SIZE(TYPE0,TYPE1) \ EIGEN_STATIC_ASSERT( \ EIGEN_PREDICATE_SAME_MATRIX_SIZE(TYPE0,TYPE1),\ YOU_MIXED_MATRICES_OF_DIFFERENT_SIZES) #define EIGEN_STATIC_ASSERT_SIZE_1x1(TYPE) \ EIGEN_STATIC_ASSERT((TYPE::RowsAtCompileTime == 1 || TYPE::RowsAtCompileTime == Eigen::Dynamic) && \ (TYPE::ColsAtCompileTime == 1 || TYPE::ColsAtCompileTime == Eigen::Dynamic), \ THIS_METHOD_IS_ONLY_FOR_1x1_EXPRESSIONS) #define EIGEN_STATIC_ASSERT_LVALUE(Derived) \ EIGEN_STATIC_ASSERT(Eigen::internal::is_lvalue::value, \ THIS_EXPRESSION_IS_NOT_A_LVALUE__IT_IS_READ_ONLY) #define EIGEN_STATIC_ASSERT_ARRAYXPR(Derived) \ EIGEN_STATIC_ASSERT((Eigen::internal::is_same::XprKind, ArrayXpr>::value), \ THIS_METHOD_IS_ONLY_FOR_ARRAYS_NOT_MATRICES) #define EIGEN_STATIC_ASSERT_SAME_XPR_KIND(Derived1, Derived2) \ EIGEN_STATIC_ASSERT((Eigen::internal::is_same::XprKind, \ typename Eigen::internal::traits::XprKind \ >::value), \ YOU_CANNOT_MIX_ARRAYS_AND_MATRICES) // Check that a cost value is positive, and that is stay within a reasonable range // TODO this check could be enabled for internal debugging only #define EIGEN_INTERNAL_CHECK_COST_VALUE(C) \ EIGEN_STATIC_ASSERT((C)>=0 && (C)<=HugeCost*HugeCost, EIGEN_INTERNAL_ERROR_PLEASE_FILE_A_BUG_REPORT__INVALID_COST_VALUE); #endif // EIGEN_STATIC_ASSERT_H RcppEigen/inst/include/Eigen/src/Core/Fuzzy.h0000644000176200001440000001317714562417221020551 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2006-2008 Benoit Jacob // Copyright (C) 2008 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_FUZZY_H #define EIGEN_FUZZY_H namespace Eigen { namespace internal { template::IsInteger> struct isApprox_selector { EIGEN_DEVICE_FUNC static bool run(const Derived& x, const OtherDerived& y, const typename Derived::RealScalar& prec) { typename internal::nested_eval::type nested(x); typename internal::nested_eval::type otherNested(y); return (nested - otherNested).cwiseAbs2().sum() <= prec * prec * numext::mini(nested.cwiseAbs2().sum(), otherNested.cwiseAbs2().sum()); } }; template struct isApprox_selector { EIGEN_DEVICE_FUNC static bool run(const Derived& x, const OtherDerived& y, const typename Derived::RealScalar&) { return x.matrix() == y.matrix(); } }; template::IsInteger> struct isMuchSmallerThan_object_selector { EIGEN_DEVICE_FUNC static bool run(const Derived& x, const OtherDerived& y, const typename Derived::RealScalar& prec) { return x.cwiseAbs2().sum() <= numext::abs2(prec) * y.cwiseAbs2().sum(); } }; template struct isMuchSmallerThan_object_selector { EIGEN_DEVICE_FUNC static bool run(const Derived& x, const OtherDerived&, const typename Derived::RealScalar&) { return x.matrix() == Derived::Zero(x.rows(), x.cols()).matrix(); } }; template::IsInteger> struct isMuchSmallerThan_scalar_selector { EIGEN_DEVICE_FUNC static bool run(const Derived& x, const typename Derived::RealScalar& y, const typename Derived::RealScalar& prec) { return x.cwiseAbs2().sum() <= numext::abs2(prec * y); } }; template struct isMuchSmallerThan_scalar_selector { EIGEN_DEVICE_FUNC static bool run(const Derived& x, const typename Derived::RealScalar&, const typename Derived::RealScalar&) { return x.matrix() == Derived::Zero(x.rows(), x.cols()).matrix(); } }; } // end namespace internal /** \returns \c true if \c *this is approximately equal to \a other, within the precision * determined by \a prec. * * \note The fuzzy compares are done multiplicatively. Two vectors \f$ v \f$ and \f$ w \f$ * are considered to be approximately equal within precision \f$ p \f$ if * \f[ \Vert v - w \Vert \leqslant p\,\min(\Vert v\Vert, \Vert w\Vert). \f] * For matrices, the comparison is done using the Hilbert-Schmidt norm (aka Frobenius norm * L2 norm). * * \note Because of the multiplicativeness of this comparison, one can't use this function * to check whether \c *this is approximately equal to the zero matrix or vector. * Indeed, \c isApprox(zero) returns false unless \c *this itself is exactly the zero matrix * or vector. If you want to test whether \c *this is zero, use internal::isMuchSmallerThan(const * RealScalar&, RealScalar) instead. * * \sa internal::isMuchSmallerThan(const RealScalar&, RealScalar) const */ template template EIGEN_DEVICE_FUNC bool DenseBase::isApprox( const DenseBase& other, const RealScalar& prec ) const { return internal::isApprox_selector::run(derived(), other.derived(), prec); } /** \returns \c true if the norm of \c *this is much smaller than \a other, * within the precision determined by \a prec. * * \note The fuzzy compares are done multiplicatively. A vector \f$ v \f$ is * considered to be much smaller than \f$ x \f$ within precision \f$ p \f$ if * \f[ \Vert v \Vert \leqslant p\,\vert x\vert. \f] * * For matrices, the comparison is done using the Hilbert-Schmidt norm. For this reason, * the value of the reference scalar \a other should come from the Hilbert-Schmidt norm * of a reference matrix of same dimensions. * * \sa isApprox(), isMuchSmallerThan(const DenseBase&, RealScalar) const */ template EIGEN_DEVICE_FUNC bool DenseBase::isMuchSmallerThan( const typename NumTraits::Real& other, const RealScalar& prec ) const { return internal::isMuchSmallerThan_scalar_selector::run(derived(), other, prec); } /** \returns \c true if the norm of \c *this is much smaller than the norm of \a other, * within the precision determined by \a prec. * * \note The fuzzy compares are done multiplicatively. A vector \f$ v \f$ is * considered to be much smaller than a vector \f$ w \f$ within precision \f$ p \f$ if * \f[ \Vert v \Vert \leqslant p\,\Vert w\Vert. \f] * For matrices, the comparison is done using the Hilbert-Schmidt norm. * * \sa isApprox(), isMuchSmallerThan(const RealScalar&, RealScalar) const */ template template EIGEN_DEVICE_FUNC bool DenseBase::isMuchSmallerThan( const DenseBase& other, const RealScalar& prec ) const { return internal::isMuchSmallerThan_object_selector::run(derived(), other.derived(), prec); } } // end namespace Eigen #endif // EIGEN_FUZZY_H RcppEigen/inst/include/Eigen/src/Core/Transpose.h0000644000176200001440000004230614562417221021374 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2006-2008 Benoit Jacob // Copyright (C) 2009-2014 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_TRANSPOSE_H #define EIGEN_TRANSPOSE_H namespace Eigen { namespace internal { template struct traits > : public traits { typedef typename ref_selector::type MatrixTypeNested; typedef typename remove_reference::type MatrixTypeNestedPlain; enum { RowsAtCompileTime = MatrixType::ColsAtCompileTime, ColsAtCompileTime = MatrixType::RowsAtCompileTime, MaxRowsAtCompileTime = MatrixType::MaxColsAtCompileTime, MaxColsAtCompileTime = MatrixType::MaxRowsAtCompileTime, FlagsLvalueBit = is_lvalue::value ? LvalueBit : 0, Flags0 = traits::Flags & ~(LvalueBit | NestByRefBit), Flags1 = Flags0 | FlagsLvalueBit, Flags = Flags1 ^ RowMajorBit, InnerStrideAtCompileTime = inner_stride_at_compile_time::ret, OuterStrideAtCompileTime = outer_stride_at_compile_time::ret }; }; } template class TransposeImpl; /** \class Transpose * \ingroup Core_Module * * \brief Expression of the transpose of a matrix * * \tparam MatrixType the type of the object of which we are taking the transpose * * This class represents an expression of the transpose of a matrix. * It is the return type of MatrixBase::transpose() and MatrixBase::adjoint() * and most of the time this is the only way it is used. * * \sa MatrixBase::transpose(), MatrixBase::adjoint() */ template class Transpose : public TransposeImpl::StorageKind> { public: typedef typename internal::ref_selector::non_const_type MatrixTypeNested; typedef typename TransposeImpl::StorageKind>::Base Base; EIGEN_GENERIC_PUBLIC_INTERFACE(Transpose) typedef typename internal::remove_all::type NestedExpression; EIGEN_DEVICE_FUNC explicit EIGEN_STRONG_INLINE Transpose(MatrixType& matrix) : m_matrix(matrix) {} EIGEN_INHERIT_ASSIGNMENT_OPERATORS(Transpose) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR Index rows() const EIGEN_NOEXCEPT { return m_matrix.cols(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE EIGEN_CONSTEXPR Index cols() const EIGEN_NOEXCEPT { return m_matrix.rows(); } /** \returns the nested expression */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename internal::remove_all::type& nestedExpression() const { return m_matrix; } /** \returns the nested expression */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename internal::remove_reference::type& nestedExpression() { return m_matrix; } /** \internal */ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void resize(Index nrows, Index ncols) { m_matrix.resize(ncols,nrows); } protected: typename internal::ref_selector::non_const_type m_matrix; }; namespace internal { template::ret> struct TransposeImpl_base { typedef typename dense_xpr_base >::type type; }; template struct TransposeImpl_base { typedef typename dense_xpr_base >::type type; }; } // end namespace internal // Generic API dispatcher template class TransposeImpl : public internal::generic_xpr_base >::type { public: typedef typename internal::generic_xpr_base >::type Base; }; template class TransposeImpl : public internal::TransposeImpl_base::type { public: typedef typename internal::TransposeImpl_base::type Base; using Base::coeffRef; EIGEN_DENSE_PUBLIC_INTERFACE(Transpose) EIGEN_INHERIT_ASSIGNMENT_OPERATORS(TransposeImpl) EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index innerStride() const { return derived().nestedExpression().innerStride(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index outerStride() const { return derived().nestedExpression().outerStride(); } typedef typename internal::conditional< internal::is_lvalue::value, Scalar, const Scalar >::type ScalarWithConstIfNotLvalue; EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ScalarWithConstIfNotLvalue* data() { return derived().nestedExpression().data(); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar* data() const { return derived().nestedExpression().data(); } // FIXME: shall we keep the const version of coeffRef? EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& coeffRef(Index rowId, Index colId) const { return derived().nestedExpression().coeffRef(colId, rowId); } EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& coeffRef(Index index) const { return derived().nestedExpression().coeffRef(index); } protected: EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(TransposeImpl) }; /** \returns an expression of the transpose of *this. * * Example: \include MatrixBase_transpose.cpp * Output: \verbinclude MatrixBase_transpose.out * * \warning If you want to replace a matrix by its own transpose, do \b NOT do this: * \code * m = m.transpose(); // bug!!! caused by aliasing effect * \endcode * Instead, use the transposeInPlace() method: * \code * m.transposeInPlace(); * \endcode * which gives Eigen good opportunities for optimization, or alternatively you can also do: * \code * m = m.transpose().eval(); * \endcode * * \sa transposeInPlace(), adjoint() */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Transpose DenseBase::transpose() { return TransposeReturnType(derived()); } /** This is the const version of transpose(). * * Make sure you read the warning for transpose() ! * * \sa transposeInPlace(), adjoint() */ template EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename DenseBase::ConstTransposeReturnType DenseBase::transpose() const { return ConstTransposeReturnType(derived()); } /** \returns an expression of the adjoint (i.e. conjugate transpose) of *this. * * Example: \include MatrixBase_adjoint.cpp * Output: \verbinclude MatrixBase_adjoint.out * * \warning If you want to replace a matrix by its own adjoint, do \b NOT do this: * \code * m = m.adjoint(); // bug!!! caused by aliasing effect * \endcode * Instead, use the adjointInPlace() method: * \code * m.adjointInPlace(); * \endcode * which gives Eigen good opportunities for optimization, or alternatively you can also do: * \code * m = m.adjoint().eval(); * \endcode * * \sa adjointInPlace(), transpose(), conjugate(), class Transpose, class internal::scalar_conjugate_op */ template EIGEN_DEVICE_FUNC inline const typename MatrixBase::AdjointReturnType MatrixBase::adjoint() const { return AdjointReturnType(this->transpose()); } /*************************************************************************** * "in place" transpose implementation ***************************************************************************/ namespace internal { template::size)) && (internal::evaluator::Flags&PacketAccessBit) > struct inplace_transpose_selector; template struct inplace_transpose_selector { // square matrix static void run(MatrixType& m) { m.matrix().template triangularView().swap(m.matrix().transpose().template triangularView()); } }; template struct inplace_transpose_selector { // PacketSize x PacketSize static void run(MatrixType& m) { typedef typename MatrixType::Scalar Scalar; typedef typename internal::packet_traits::type Packet; const Index PacketSize = internal::packet_traits::size; const Index Alignment = internal::evaluator::Alignment; PacketBlock A; for (Index i=0; i(i,0); internal::ptranspose(A); for (Index i=0; i(m.rowIndexByOuterInner(i,0), m.colIndexByOuterInner(i,0), A.packet[i]); } }; template void BlockedInPlaceTranspose(MatrixType& m) { typedef typename MatrixType::Scalar Scalar; typedef typename internal::packet_traits::type Packet; const Index PacketSize = internal::packet_traits::size; eigen_assert(m.rows() == m.cols()); int row_start = 0; for (; row_start + PacketSize <= m.rows(); row_start += PacketSize) { for (int col_start = row_start; col_start + PacketSize <= m.cols(); col_start += PacketSize) { PacketBlock A; if (row_start == col_start) { for (Index i=0; i(row_start + i,col_start); internal::ptranspose(A); for (Index i=0; i(m.rowIndexByOuterInner(row_start + i, col_start), m.colIndexByOuterInner(row_start + i,col_start), A.packet[i]); } else { PacketBlock B; for (Index i=0; i(row_start + i,col_start); B.packet[i] = m.template packetByOuterInner(col_start + i, row_start); } internal::ptranspose(A); internal::ptranspose(B); for (Index i=0; i(m.rowIndexByOuterInner(row_start + i, col_start), m.colIndexByOuterInner(row_start + i,col_start), B.packet[i]); m.template writePacket(m.rowIndexByOuterInner(col_start + i, row_start), m.colIndexByOuterInner(col_start + i,row_start), A.packet[i]); } } } } for (Index row = row_start; row < m.rows(); ++row) { m.matrix().row(row).head(row).swap( m.matrix().col(row).head(row).transpose()); } } template struct inplace_transpose_selector { // non square or dynamic matrix static void run(MatrixType& m) { typedef typename MatrixType::Scalar Scalar; if (m.rows() == m.cols()) { const Index PacketSize = internal::packet_traits::size; if (!NumTraits::IsComplex && m.rows() >= PacketSize) { if ((m.rows() % PacketSize) == 0) BlockedInPlaceTranspose::Alignment>(m); else BlockedInPlaceTranspose(m); } else { m.matrix().template triangularView().swap(m.matrix().transpose().template triangularView()); } } else { m = m.transpose().eval(); } } }; } // end namespace internal /** This is the "in place" version of transpose(): it replaces \c *this by its own transpose. * Thus, doing * \code * m.transposeInPlace(); * \endcode * has the same effect on m as doing * \code * m = m.transpose().eval(); * \endcode * and is faster and also safer because in the latter line of code, forgetting the eval() results * in a bug caused by \ref TopicAliasing "aliasing". * * Notice however that this method is only useful if you want to replace a matrix by its own transpose. * If you just need the transpose of a matrix, use transpose(). * * \note if the matrix is not square, then \c *this must be a resizable matrix. * This excludes (non-square) fixed-size matrices, block-expressions and maps. * * \sa transpose(), adjoint(), adjointInPlace() */ template EIGEN_DEVICE_FUNC inline void DenseBase::transposeInPlace() { eigen_assert((rows() == cols() || (RowsAtCompileTime == Dynamic && ColsAtCompileTime == Dynamic)) && "transposeInPlace() called on a non-square non-resizable matrix"); internal::inplace_transpose_selector::run(derived()); } /*************************************************************************** * "in place" adjoint implementation ***************************************************************************/ /** This is the "in place" version of adjoint(): it replaces \c *this by its own transpose. * Thus, doing * \code * m.adjointInPlace(); * \endcode * has the same effect on m as doing * \code * m = m.adjoint().eval(); * \endcode * and is faster and also safer because in the latter line of code, forgetting the eval() results * in a bug caused by aliasing. * * Notice however that this method is only useful if you want to replace a matrix by its own adjoint. * If you just need the adjoint of a matrix, use adjoint(). * * \note if the matrix is not square, then \c *this must be a resizable matrix. * This excludes (non-square) fixed-size matrices, block-expressions and maps. * * \sa transpose(), adjoint(), transposeInPlace() */ template EIGEN_DEVICE_FUNC inline void MatrixBase::adjointInPlace() { derived() = adjoint().eval(); } #ifndef EIGEN_NO_DEBUG // The following is to detect aliasing problems in most common cases. namespace internal { template struct check_transpose_aliasing_compile_time_selector { enum { ret = bool(blas_traits::IsTransposed) != DestIsTransposed }; }; template struct check_transpose_aliasing_compile_time_selector > { enum { ret = bool(blas_traits::IsTransposed) != DestIsTransposed || bool(blas_traits::IsTransposed) != DestIsTransposed }; }; template struct check_transpose_aliasing_run_time_selector { static bool run(const Scalar* dest, const OtherDerived& src) { return (bool(blas_traits::IsTransposed) != DestIsTransposed) && (dest!=0 && dest==(const Scalar*)extract_data(src)); } }; template struct check_transpose_aliasing_run_time_selector > { static bool run(const Scalar* dest, const CwiseBinaryOp& src) { return ((blas_traits::IsTransposed != DestIsTransposed) && (dest!=0 && dest==(const Scalar*)extract_data(src.lhs()))) || ((blas_traits::IsTransposed != DestIsTransposed) && (dest!=0 && dest==(const Scalar*)extract_data(src.rhs()))); } }; // the following selector, checkTransposeAliasing_impl, based on MightHaveTransposeAliasing, // is because when the condition controlling the assert is known at compile time, ICC emits a warning. // This is actually a good warning: in expressions that don't have any transposing, the condition is // known at compile time to be false, and using that, we can avoid generating the code of the assert again // and again for all these expressions that don't need it. template::IsTransposed,OtherDerived>::ret > struct checkTransposeAliasing_impl { static void run(const Derived& dst, const OtherDerived& other) { eigen_assert((!check_transpose_aliasing_run_time_selector ::IsTransposed,OtherDerived> ::run(extract_data(dst), other)) && "aliasing detected during transposition, use transposeInPlace() " "or evaluate the rhs into a temporary using .eval()"); } }; template struct checkTransposeAliasing_impl { static void run(const Derived&, const OtherDerived&) { } }; template void check_for_aliasing(const Dst &dst, const Src &src) { if((!Dst::IsVectorAtCompileTime) && dst.rows()>1 && dst.cols()>1) internal::checkTransposeAliasing_impl::run(dst, src); } } // end namespace internal #endif // EIGEN_NO_DEBUG } // end namespace Eigen #endif // EIGEN_TRANSPOSE_H RcppEigen/inst/include/Eigen/src/Core/IO.h0000644000176200001440000002005614562417221017723 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2006-2008 Benoit Jacob // Copyright (C) 2008 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_IO_H #define EIGEN_IO_H namespace Eigen { enum { DontAlignCols = 1 }; enum { StreamPrecision = -1, FullPrecision = -2 }; namespace internal { template std::ostream & print_matrix(std::ostream & s, const Derived& _m, const IOFormat& fmt); } /** \class IOFormat * \ingroup Core_Module * * \brief Stores a set of parameters controlling the way matrices are printed * * List of available parameters: * - \b precision number of digits for floating point values, or one of the special constants \c StreamPrecision and \c FullPrecision. * The default is the special value \c StreamPrecision which means to use the * stream's own precision setting, as set for instance using \c cout.precision(3). The other special value * \c FullPrecision means that the number of digits will be computed to match the full precision of each floating-point * type. * - \b flags an OR-ed combination of flags, the default value is 0, the only currently available flag is \c DontAlignCols which * allows to disable the alignment of columns, resulting in faster code. * - \b coeffSeparator string printed between two coefficients of the same row * - \b rowSeparator string printed between two rows * - \b rowPrefix string printed at the beginning of each row * - \b rowSuffix string printed at the end of each row * - \b matPrefix string printed at the beginning of the matrix * - \b matSuffix string printed at the end of the matrix * - \b fill character printed to fill the empty space in aligned columns * * Example: \include IOFormat.cpp * Output: \verbinclude IOFormat.out * * \sa DenseBase::format(), class WithFormat */ struct IOFormat { /** Default constructor, see class IOFormat for the meaning of the parameters */ IOFormat(int _precision = StreamPrecision, int _flags = 0, const std::string& _coeffSeparator = " ", const std::string& _rowSeparator = "\n", const std::string& _rowPrefix="", const std::string& _rowSuffix="", const std::string& _matPrefix="", const std::string& _matSuffix="", const char _fill=' ') : matPrefix(_matPrefix), matSuffix(_matSuffix), rowPrefix(_rowPrefix), rowSuffix(_rowSuffix), rowSeparator(_rowSeparator), rowSpacer(""), coeffSeparator(_coeffSeparator), fill(_fill), precision(_precision), flags(_flags) { // TODO check if rowPrefix, rowSuffix or rowSeparator contains a newline // don't add rowSpacer if columns are not to be aligned if((flags & DontAlignCols)) return; int i = int(matSuffix.length())-1; while (i>=0 && matSuffix[i]!='\n') { rowSpacer += ' '; i--; } } std::string matPrefix, matSuffix; std::string rowPrefix, rowSuffix, rowSeparator, rowSpacer; std::string coeffSeparator; char fill; int precision; int flags; }; /** \class WithFormat * \ingroup Core_Module * * \brief Pseudo expression providing matrix output with given format * * \tparam ExpressionType the type of the object on which IO stream operations are performed * * This class represents an expression with stream operators controlled by a given IOFormat. * It is the return type of DenseBase::format() * and most of the time this is the only way it is used. * * See class IOFormat for some examples. * * \sa DenseBase::format(), class IOFormat */ template class WithFormat { public: WithFormat(const ExpressionType& matrix, const IOFormat& format) : m_matrix(matrix), m_format(format) {} friend std::ostream & operator << (std::ostream & s, const WithFormat& wf) { return internal::print_matrix(s, wf.m_matrix.eval(), wf.m_format); } protected: typename ExpressionType::Nested m_matrix; IOFormat m_format; }; namespace internal { // NOTE: This helper is kept for backward compatibility with previous code specializing // this internal::significant_decimals_impl structure. In the future we should directly // call digits10() which has been introduced in July 2016 in 3.3. template struct significant_decimals_impl { static inline int run() { return NumTraits::digits10(); } }; /** \internal * print the matrix \a _m to the output stream \a s using the output format \a fmt */ template std::ostream & print_matrix(std::ostream & s, const Derived& _m, const IOFormat& fmt) { using internal::is_same; using internal::conditional; if(_m.size() == 0) { s << fmt.matPrefix << fmt.matSuffix; return s; } typename Derived::Nested m = _m; typedef typename Derived::Scalar Scalar; typedef typename conditional< is_same::value || is_same::value || is_same::value || is_same::value, int, typename conditional< is_same >::value || is_same >::value || is_same >::value || is_same >::value, std::complex, const Scalar& >::type >::type PrintType; Index width = 0; std::streamsize explicit_precision; if(fmt.precision == StreamPrecision) { explicit_precision = 0; } else if(fmt.precision == FullPrecision) { if (NumTraits::IsInteger) { explicit_precision = 0; } else { explicit_precision = significant_decimals_impl::run(); } } else { explicit_precision = fmt.precision; } std::streamsize old_precision = 0; if(explicit_precision) old_precision = s.precision(explicit_precision); bool align_cols = !(fmt.flags & DontAlignCols); if(align_cols) { // compute the largest width for(Index j = 0; j < m.cols(); ++j) for(Index i = 0; i < m.rows(); ++i) { std::stringstream sstr; sstr.copyfmt(s); sstr << static_cast(m.coeff(i,j)); width = std::max(width, Index(sstr.str().length())); } } std::streamsize old_width = s.width(); char old_fill_character = s.fill(); s << fmt.matPrefix; for(Index i = 0; i < m.rows(); ++i) { if (i) s << fmt.rowSpacer; s << fmt.rowPrefix; if(width) { s.fill(fmt.fill); s.width(width); } s << static_cast(m.coeff(i, 0)); for(Index j = 1; j < m.cols(); ++j) { s << fmt.coeffSeparator; if(width) { s.fill(fmt.fill); s.width(width); } s << static_cast(m.coeff(i, j)); } s << fmt.rowSuffix; if( i < m.rows() - 1) s << fmt.rowSeparator; } s << fmt.matSuffix; if(explicit_precision) s.precision(old_precision); if(width) { s.fill(old_fill_character); s.width(old_width); } return s; } } // end namespace internal /** \relates DenseBase * * Outputs the matrix, to the given stream. * * If you wish to print the matrix with a format different than the default, use DenseBase::format(). * * It is also possible to change the default format by defining EIGEN_DEFAULT_IO_FORMAT before including Eigen headers. * If not defined, this will automatically be defined to Eigen::IOFormat(), that is the Eigen::IOFormat with default parameters. * * \sa DenseBase::format() */ template std::ostream & operator << (std::ostream & s, const DenseBase & m) { return internal::print_matrix(s, m.eval(), EIGEN_DEFAULT_IO_FORMAT); } } // end namespace Eigen #endif // EIGEN_IO_H RcppEigen/inst/include/Eigen/src/Core/Select.h0000644000176200001440000001377714562417221020647 0ustar liggesusers// This file is part of Eigen, a lightweight C++ template library // for linear algebra. // // Copyright (C) 2008-2010 Gael Guennebaud // // This Source Code Form is subject to the terms of the Mozilla // Public License v. 2.0. If a copy of the MPL was not distributed // with this file, You can obtain one at http://mozilla.org/MPL/2.0/. #ifndef EIGEN_SELECT_H #define EIGEN_SELECT_H namespace Eigen { /** \class Select * \ingroup Core_Module * * \brief Expression of a coefficient wise version of the C++ ternary operator ?: * * \param ConditionMatrixType the type of the \em condition expression which must be a boolean matrix * \param ThenMatrixType the type of the \em then expression * \param ElseMatrixType the type of the \em else expression * * This class represents an expression of a coefficient wise version of the C++ ternary operator ?:. * It is the return type of DenseBase::select() and most of the time this is the only way it is used. * * \sa DenseBase::select(const DenseBase&, const DenseBase&) const */ namespace internal { template struct traits > : traits { typedef typename traits::Scalar Scalar; typedef Dense StorageKind; typedef typename traits::XprKind XprKind; typedef typename ConditionMatrixType::Nested ConditionMatrixNested; typedef typename ThenMatrixType::Nested ThenMatrixNested; typedef typename ElseMatrixType::Nested ElseMatrixNested; enum { RowsAtCompileTime = ConditionMatrixType::RowsAtCompileTime, ColsAtCompileTime = ConditionMatrixType::ColsAtCompileTime, MaxRowsAtCompileTime = ConditionMatrixType::MaxRowsAtCompileTime, MaxColsAtCompileTime = ConditionMatrixType::MaxColsAtCompileTime, Flags = (unsigned int)ThenMatrixType::Flags & ElseMatrixType::Flags & RowMajorBit }; }; } template class Select : public internal::dense_xpr_base< Select >::type, internal::no_assignment_operator { public: typedef typename internal::dense_xpr_base