pax_global_header00006660000000000000000000000064125335144260014517gustar00rootroot0000000000000052 comment=28ce6889d05f56f7d436b1a162c7e0eab063bb74 Yeti-6.4.0/000077500000000000000000000000001253351442600124405ustar00rootroot00000000000000Yeti-6.4.0/.gitattributes000066400000000000000000000004031253351442600153300ustar00rootroot00000000000000*.[hc] filter=cleanup-code *.i filter=cleanup-text README filter=cleanup-text TODO filter=cleanup-text AUTHORS filter=cleanup-text NEWS filter=cleanup-text configure filter=cleanup-text Makefile filter=cleanup-yorick-makefile Yeti-6.4.0/.gitignore000066400000000000000000000011611253351442600144270ustar00rootroot00000000000000# Specific files # ################## ywrap.c build/ devel/ old/ # Automatic backups # ##################### *~ # Compiled sources and libraries # ################################## *.com *.class *.dll *.exe *.o *.so lib*.a # Packages # ############ # it's better to unpack these files and commit the raw source # git has its own built in compression methods *.7z *.bz2 *.dmg *.gz *.iso *.jar *.rar *.tar *.tar.bz2 *.tar.gz *.tar.xz *.tgz *.txz *.xz *.zip # Logs and databases # ###################### *.log *.sql *.sqlite # OS generated files # ###################### .DS_Store* ehthumbs.db Icon? Thumbs.db .directory Yeti-6.4.0/AUTHORS000066400000000000000000000001041253351442600135030ustar00rootroot00000000000000Éric Thiébaut (Centre de Recherche Astrophysique de Lyon, France) Yeti-6.4.0/LICENSE.md000066400000000000000000000522501253351442600140500ustar00rootroot00000000000000 # CeCILL-C FREE SOFTWARE LICENSE AGREEMENT ## Notice This Agreement is a Free Software license agreement that is the result of discussions between its authors in order to ensure compliance with the two main principles guiding its drafting: * firstly, compliance with the principles governing the distribution of Free Software: access to source code, broad rights granted to users, * secondly, the election of a governing law, French law, with which it is conformant, both as regards the law of torts and intellectual property law, and the protection that it offers to both authors and holders of the economic rights over software. The authors of the CeCILL-C (for Ce[a] C[nrs] I[nria] L[ogiciel] L[ibre]) license are: Commissariat à l'Energie Atomique - CEA, a public scientific, technical and industrial research establishment, having its principal place of business at 25 rue Leblanc, immeuble Le Ponant D, 75015 Paris, France. Centre National de la Recherche Scientifique - CNRS, a public scientific and technological establishment, having its principal place of business at 3 rue Michel-Ange, 75794 Paris cedex 16, France. Institut National de Recherche en Informatique et en Automatique - INRIA, a public scientific and technological establishment, having its principal place of business at Domaine de Voluceau, Rocquencourt, BP 105, 78153 Le Chesnay cedex, France. ## Preamble The purpose of this Free Software license agreement is to grant users the right to modify and re-use the software governed by this license. The exercising of this right is conditional upon the obligation to make available to the community the modifications made to the source code of the software so as to contribute to its evolution. In consideration of access to the source code and the rights to copy, modify and redistribute granted by the license, users are provided only with a limited warranty and the software's author, the holder of the economic rights, and the successive licensors only have limited liability. In this respect, the risks associated with loading, using, modifying and/or developing or reproducing the software by the user are brought to the user's attention, given its Free Software status, which may make it complicated to use, with the result that its use is reserved for developers and experienced professionals having in-depth computer knowledge. Users are therefore encouraged to load and test the suitability of the software as regards their requirements in conditions enabling the security of their systems and/or data to be ensured and, more generally, to use and operate it in the same conditions of security. This Agreement may be freely reproduced and published, provided it is not altered, and that no provisions are either added or removed herefrom. This Agreement may apply to any or all software for which the holder of the economic rights decides to submit the use thereof to its provisions. ## Article 1 - DEFINITIONS For the purpose of this Agreement, when the following expressions commence with a capital letter, they shall have the following meaning: Agreement: means this license agreement, and its possible subsequent versions and annexes. Software: means the software in its Object Code and/or Source Code form and, where applicable, its documentation, "as is" when the Licensee accepts the Agreement. Initial Software: means the Software in its Source Code and possibly its Object Code form and, where applicable, its documentation, "as is" when it is first distributed under the terms and conditions of the Agreement. Modified Software: means the Software modified by at least one Integrated Contribution. Source Code: means all the Software's instructions and program lines to which access is required so as to modify the Software. Object Code: means the binary files originating from the compilation of the Source Code. Holder: means the holder(s) of the economic rights over the Initial Software. Licensee: means the Software user(s) having accepted the Agreement. Contributor: means a Licensee having made at least one Integrated Contribution. Licensor: means the Holder, or any other individual or legal entity, who distributes the Software under the Agreement. Integrated Contribution: means any or all modifications, corrections, translations, adaptations and/or new functions integrated into the Source Code by any or all Contributors. Related Module: means a set of sources files including their documentation that, without modification to the Source Code, enables supplementary functions or services in addition to those offered by the Software. Derivative Software: means any combination of the Software, modified or not, and of a Related Module. Parties: mean both the Licensee and the Licensor. These expressions may be used both in singular and plural form. ## Article 2 - PURPOSE The purpose of the Agreement is the grant by the Licensor to the Licensee of a non-exclusive, transferable and worldwide license for the Software as set forth in Article 5 hereinafter for the whole term of the protection granted by the rights over said Software. ## Article 3 - ACCEPTANCE 3.1 The Licensee shall be deemed as having accepted the terms and conditions of this Agreement upon the occurrence of the first of the following events: * (i) loading the Software by any or all means, notably, by downloading from a remote server, or by loading from a physical medium; * (ii) the first time the Licensee exercises any of the rights granted hereunder. 3.2 One copy of the Agreement, containing a notice relating to the characteristics of the Software, to the limited warranty, and to the fact that its use is restricted to experienced users has been provided to the Licensee prior to its acceptance as set forth in Article 3.1 hereinabove, and the Licensee hereby acknowledges that it has read and understood it. ## Article 4 - EFFECTIVE DATE AND TERM ### 4.1 EFFECTIVE DATE The Agreement shall become effective on the date when it is accepted by the Licensee as set forth in Article 3.1. ### 4.2 TERM The Agreement shall remain in force for the entire legal term of protection of the economic rights over the Software. ## Article 5 - SCOPE OF RIGHTS GRANTED The Licensor hereby grants to the Licensee, who accepts, the following rights over the Software for any or all use, and for the term of the Agreement, on the basis of the terms and conditions set forth hereinafter. Besides, if the Licensor owns or comes to own one or more patents protecting all or part of the functions of the Software or of its components, the Licensor undertakes not to enforce the rights granted by these patents against successive Licensees using, exploiting or modifying the Software. If these patents are transferred, the Licensor undertakes to have the transferees subscribe to the obligations set forth in this paragraph. ### 5.1 RIGHT OF USE The Licensee is authorized to use the Software, without any limitation as to its fields of application, with it being hereinafter specified that this comprises: 1. permanent or temporary reproduction of all or part of the Software by any or all means and in any or all form. 2. loading, displaying, running, or storing the Software on any or all medium. 3. entitlement to observe, study or test its operation so as to determine the ideas and principles behind any or all constituent elements of said Software. This shall apply when the Licensee carries out any or all loading, displaying, running, transmission or storage operation as regards the Software, that it is entitled to carry out hereunder. ### 5.2 RIGHT OF MODIFICATION The right of modification includes the right to translate, adapt, arrange, or make any or all modifications to the Software, and the right to reproduce the resulting software. It includes, in particular, the right to create a Derivative Software. The Licensee is authorized to make any or all modification to the Software provided that it includes an explicit notice that it is the author of said modification and indicates the date of the creation thereof. ### 5.3 RIGHT OF DISTRIBUTION In particular, the right of distribution includes the right to publish, transmit and communicate the Software to the general public on any or all medium, and by any or all means, and the right to market, either in consideration of a fee, or free of charge, one or more copies of the Software by any means. The Licensee is further authorized to distribute copies of the modified or unmodified Software to third parties according to the terms and conditions set forth hereinafter. #### 5.3.1 DISTRIBUTION OF SOFTWARE WITHOUT MODIFICATION The Licensee is authorized to distribute true copies of the Software in Source Code or Object Code form, provided that said distribution complies with all the provisions of the Agreement and is accompanied by: 1. a copy of the Agreement, 2. a notice relating to the limitation of both the Licensor's warranty and liability as set forth in Articles 8 and 9, and that, in the event that only the Object Code of the Software is redistributed, the Licensee allows effective access to the full Source Code of the Software at a minimum during the entire period of its distribution of the Software, it being understood that the additional cost of acquiring the Source Code shall not exceed the cost of transferring the data. #### 5.3.2 DISTRIBUTION OF MODIFIED SOFTWARE When the Licensee makes an Integrated Contribution to the Software, the terms and conditions for the distribution of the resulting Modified Software become subject to all the provisions of this Agreement. The Licensee is authorized to distribute the Modified Software, in source code or object code form, provided that said distribution complies with all the provisions of the Agreement and is accompanied by: 1. a copy of the Agreement, 2. a notice relating to the limitation of both the Licensor's warranty and liability as set forth in Articles 8 and 9, and that, in the event that only the object code of the Modified Software is redistributed, the Licensee allows effective access to the full source code of the Modified Software at a minimum during the entire period of its distribution of the Modified Software, it being understood that the additional cost of acquiring the source code shall not exceed the cost of transferring the data. #### 5.3.3 DISTRIBUTION OF DERIVATIVE SOFTWARE When the Licensee creates Derivative Software, this Derivative Software may be distributed under a license agreement other than this Agreement, subject to compliance with the requirement to include a notice concerning the rights over the Software as defined in Article 6.4. In the event the creation of the Derivative Software required modification of the Source Code, the Licensee undertakes that: 1. the resulting Modified Software will be governed by this Agreement, 2. the Integrated Contributions in the resulting Modified Software will be clearly identified and documented, 3. the Licensee will allow effective access to the source code of the Modified Software, at a minimum during the entire period of distribution of the Derivative Software, such that such modifications may be carried over in a subsequent version of the Software; it being understood that the additional cost of purchasing the source code of the Modified Software shall not exceed the cost of transferring the data. #### 5.3.4 COMPATIBILITY WITH THE CeCILL LICENSE When a Modified Software contains an Integrated Contribution subject to the CeCILL license agreement, or when a Derivative Software contains a Related Module subject to the CeCILL license agreement, the provisions set forth in the third item of Article 6.4 are optional. ## Article 6 - INTELLECTUAL PROPERTY ### 6.1 OVER THE INITIAL SOFTWARE The Holder owns the economic rights over the Initial Software. Any or all use of the Initial Software is subject to compliance with the terms and conditions under which the Holder has elected to distribute its work and no one shall be entitled to modify the terms and conditions for the distribution of said Initial Software. The Holder undertakes that the Initial Software will remain ruled at least by this Agreement, for the duration set forth in Article 4.2. ### 6.2 OVER THE INTEGRATED CONTRIBUTIONS The Licensee who develops an Integrated Contribution is the owner of the intellectual property rights over this Contribution as defined by applicable law. ### 6.3 OVER THE RELATED MODULES The Licensee who develops a Related Module is the owner of the intellectual property rights over this Related Module as defined by applicable law and is free to choose the type of agreement that shall govern its distribution under the conditions defined in Article 5.3.3. ### 6.4 NOTICE OF RIGHTS The Licensee expressly undertakes: 1. not to remove, or modify, in any manner, the intellectual property notices attached to the Software; 2. to reproduce said notices, in an identical manner, in the copies of the Software modified or not; 3. to ensure that use of the Software, its intellectual property notices and the fact that it is governed by the Agreement is indicated in a text that is easily accessible, specifically from the interface of any Derivative Software. The Licensee undertakes not to directly or indirectly infringe the intellectual property rights of the Holder and/or Contributors on the Software and to take, where applicable, vis-à-vis its staff, any and all measures required to ensure respect of said intellectual property rights of the Holder and/or Contributors. ## Article 7 - RELATED SERVICES 7.1 Under no circumstances shall the Agreement oblige the Licensor to provide technical assistance or maintenance services for the Software. However, the Licensor is entitled to offer this type of services. The terms and conditions of such technical assistance, and/or such maintenance, shall be set forth in a separate instrument. Only the Licensor offering said maintenance and/or technical assistance services shall incur liability therefor. 7.2 Similarly, any Licensor is entitled to offer to its licensees, under its sole responsibility, a warranty, that shall only be binding upon itself, for the redistribution of the Software and/or the Modified Software, under terms and conditions that it is free to decide. Said warranty, and the financial terms and conditions of its application, shall be subject of a separate instrument executed between the Licensor and the Licensee. ## Article 8 - LIABILITY 8.1 Subject to the provisions of Article 8.2, the Licensee shall be entitled to claim compensation for any direct loss it may have suffered from the Software as a result of a fault on the part of the relevant Licensor, subject to providing evidence thereof. 8.2 The Licensor's liability is limited to the commitments made under this Agreement and shall not be incurred as a result of in particular: (i) loss due the Licensee's total or partial failure to fulfill its obligations, (ii) direct or consequential loss that is suffered by the Licensee due to the use or performance of the Software, and (iii) more generally, any consequential loss. In particular the Parties expressly agree that any or all pecuniary or business loss (i.e. loss of data, loss of profits, operating loss, loss of customers or orders, opportunity cost, any disturbance to business activities) or any or all legal proceedings instituted against the Licensee by a third party, shall constitute consequential loss and shall not provide entitlement to any or all compensation from the Licensor. ## Article 9 - WARRANTY 9.1 The Licensee acknowledges that the scientific and technical state-of-the-art when the Software was distributed did not enable all possible uses to be tested and verified, nor for the presence of possible defects to be detected. In this respect, the Licensee's attention has been drawn to the risks associated with loading, using, modifying and/or developing and reproducing the Software which are reserved for experienced users. The Licensee shall be responsible for verifying, by any or all means, the suitability of the product for its requirements, its good working order, and for ensuring that it shall not cause damage to either persons or properties. 9.2 The Licensor hereby represents, in good faith, that it is entitled to grant all the rights over the Software (including in particular the rights set forth in Article 5). 9.3 The Licensee acknowledges that the Software is supplied "as is" by the Licensor without any other express or tacit warranty, other than that provided for in Article 9.2 and, in particular, without any warranty as to its commercial value, its secured, safe, innovative or relevant nature. Specifically, the Licensor does not warrant that the Software is free from any error, that it will operate without interruption, that it will be compatible with the Licensee's own equipment and software configuration, nor that it will meet the Licensee's requirements. 9.4 The Licensor does not either expressly or tacitly warrant that the Software does not infringe any third party intellectual property right relating to a patent, software or any other property right. Therefore, the Licensor disclaims any and all liability towards the Licensee arising out of any or all proceedings for infringement that may be instituted in respect of the use, modification and redistribution of the Software. Nevertheless, should such proceedings be instituted against the Licensee, the Licensor shall provide it with technical and legal assistance for its defense. Such technical and legal assistance shall be decided on a case-by-case basis between the relevant Licensor and the Licensee pursuant to a memorandum of understanding. The Licensor disclaims any and all liability as regards the Licensee's use of the name of the Software. No warranty is given as regards the existence of prior rights over the name of the Software or as regards the existence of a trademark. ## Article 10 - TERMINATION 10.1 In the event of a breach by the Licensee of its obligations hereunder, the Licensor may automatically terminate this Agreement thirty (30) days after notice has been sent to the Licensee and has remained ineffective. 10.2 A Licensee whose Agreement is terminated shall no longer be authorized to use, modify or distribute the Software. However, any licenses that it may have granted prior to termination of the Agreement shall remain valid subject to their having been granted in compliance with the terms and conditions hereof. ## Article 11 - MISCELLANEOUS ### 11.1 EXCUSABLE EVENTS Neither Party shall be liable for any or all delay, or failure to perform the Agreement, that may be attributable to an event of force majeure, an act of God or an outside cause, such as defective functioning or interruptions of the electricity or telecommunications networks, network paralysis following a virus attack, intervention by government authorities, natural disasters, water damage, earthquakes, fire, explosions, strikes and labor unrest, war, etc. 11.2 Any failure by either Party, on one or more occasions, to invoke one or more of the provisions hereof, shall under no circumstances be interpreted as being a waiver by the interested Party of its right to invoke said provision(s) subsequently. 11.3 The Agreement cancels and replaces any or all previous agreements, whether written or oral, between the Parties and having the same purpose, and constitutes the entirety of the agreement between said Parties concerning said purpose. No supplement or modification to the terms and conditions hereof shall be effective as between the Parties unless it is made in writing and signed by their duly authorized representatives. 11.4 In the event that one or more of the provisions hereof were to conflict with a current or future applicable act or legislative text, said act or legislative text shall prevail, and the Parties shall make the necessary amendments so as to comply with said act or legislative text. All other provisions shall remain effective. Similarly, invalidity of a provision of the Agreement, for any reason whatsoever, shall not cause the Agreement as a whole to be invalid. ### 11.5 LANGUAGE The Agreement is drafted in both French and English and both versions are deemed authentic. ## Article 12 - NEW VERSIONS OF THE AGREEMENT 12.1 Any person is authorized to duplicate and distribute copies of this Agreement. 12.2 So as to ensure coherence, the wording of this Agreement is protected and may only be modified by the authors of the License, who reserve the right to periodically publish updates or new versions of the Agreement, each with a separate number. These subsequent versions may address new issues encountered by Free Software. 12.3 Any Software distributed under a given version of the Agreement may only be subsequently distributed under the same version of the Agreement or a subsequent version. ## Article 13 - GOVERNING LAW AND JURISDICTION 13.1 The Agreement is governed by French law. The Parties agree to endeavor to seek an amicable solution to any disagreements or disputes that may arise during the performance of the Agreement. 13.2 Failing an amicable solution within two (2) months as from their occurrence, and unless emergency proceedings are necessary, the disagreements or disputes shall be referred to the Paris Courts having jurisdiction, by the more diligent Party. Version 1.0 dated 2006-09-05. Yeti-6.4.0/Makefile.in000066400000000000000000000053431253351442600145120ustar00rootroot00000000000000# # Makefile - # # Top-level Makefile for Yeti package and plugins. # #------------------------------------------------------------------------------ # These values filled in by: yorick -batch make.i Y_MAKEDIR= Y_EXE= Y_EXE_PKGS= Y_EXE_HOME= Y_EXE_SITE= #---------------------------------------------------------- configuration flags # These values filled in by: yorick -batch config.i YETI_PKGS = core fftw regex tiff YETI_VERSION_MAJOR = x YETI_VERSION_MINOR = x YETI_VERSION_MICRO = x YETI_VERSION_SUFFIX = "" YORICK_VERSION_MAJOR = x YORICK_VERSION_MINOR = x YORICK_VERSION_MICRO = x YORICK_VERSION_SUFFIX = "" SUBDIRS = $(YETI_PKGS) doc TMPDIR=/tmp DISTRIB_PKGS = core fftw regex tiff DISTRIB_DIRS = doc $(DISTRIB_PKGS) DISTRIB_FILES = AUTHORS LICENSE.md NEWS README.md VERSION \ Makefile Makefile.in configure config.h.in config.i #------------------------------------------------------------------------------ default: all all: check @for dir in $(SUBDIRS); do \ (cd $$dir; make); \ done check: @if test -z "$(YETI_PKGS)" -o ! -r config.h ; then \ echo "***************************************************************"; \ echo " Before building Yeti, you must run the configuration script."; \ echo " This is achieved by a command like:"; \ echo " yorick -batch ./config.i [...]"; \ echo ""; \ echo " See README.md file for detailled instructions. For a summary"; \ echo " of configuration options, you can also try:"; \ echo " yorick -batch ./config.i --help"; \ echo "***************************************************************"; \ false; \ else \ true; \ fi install: check @for dir in $(SUBDIRS); do \ (cd $$dir; make $@); \ done # @for src in AUTHORS COPYING NEWS README.md LICENSE.md VERSION; do \ # dst=$(Y_EXE_SITE)/doc/$$src.yeti; \ # cp -pf $$src $$dst; \ # chmod 644 $$dst; \ # done clean: check rm -f *~ *.tmp @for dir in $(SUBDIRS); do \ (cd $$dir; make $@); \ done distclean: clean rm -f config.h Makefile distrib: @version=`cat VERSION`; \ target="yeti-$${version}"; \ tmpdir="$(TMPDIR)/$${target}"; \ archive="$${tmpdir}.tar.bz2"; \ if test -d "$${tmpdir}"; then \ echo "directory $${tmpdir} already exists"; \ false; \ else \ olddir=`pwd`; \ mkdir -p "$${tmpdir}"; \ for file in $(DISTRIB_FILES); do \ cp -a "$${file}" "$${tmpdir}/$${file}"; \ done; \ for dir in $(DISTRIB_DIRS); do \ cp -a "$${dir}" "$${tmpdir}/$${dir}"; \ done; \ cd "$${tmpdir}"; \ touch config.h; \ make clean YETI_PKGS="$(DISTRIB_PKGS)"; \ rm -f config.h; \ rm -rf */RCS */*~; \ cd "$(TMPDIR)"; \ tar cf - "$${target}" | bzip2 -9 > "$${archive}"; \ cd "$${olddir}"; \ rm -rf "$${tmpdir}"; \ echo "archive is: $${archive}"; \ fi Yeti-6.4.0/NEWS000066400000000000000000000407471253351442600131530ustar00rootroot00000000000000 * YYYY-MM-DD: COMMENT =============================================================================== * 2015-06-02: Yeti version 6.4.0 released. * 2015-06-02: - Yeti is managed as a Git repository on GitHub (https://github.com/emmt/Yeti). - GSL support abandonned (use YGSL: https://github.com/emmt/gsl). =============================================================================== * 2013-09-13: Yeti version 6.3.3 released. * 2013-09-13: - New function: anonymous to create anonymous functions. - New function: h_save to save variables to a hash-table. - New function: h_functor to create functor objects. - Manage to autoload yeti_yhdf.i whenever yhdf_* functions are called. - Add support for saving/restoring the evaluator of the hash table in a Yeti Hierarchical Data File. =============================================================================== * 2010-01-31: Yeti version 6.3.2 released. * 2011-01-31: - New "configure" script to mimics GNU-configure. - Heavy cleanup of the README file ;-) - Function "set_alarm" removed, you should use Yorick's own "after" function which works better and has more features. - Removed some old (unused) stuff to allow for compilation of a stand-alone Yorick+Yeti interpreter (thanks to Dave Munro for the fix). =============================================================================== * 2010-04-16: Yeti version 6.3.1 released. * 2010-04-16: - Memory leak in hash table code fixed. - Integer indexing of hash table has been removed. - Improved robustness of hash tables with respect to interrupts. =============================================================================== * 2010-04-13: Yeti version 6.3.0 released. * 2010-04-13: - Yeti is now under the CeCILL-C license (http://www.cecill.info). - Lots of functions have been moved from Yeti to Yorick. - Some code cleanup. =============================================================================== * 2009-12-16: Yeti version 6.2.5 released. * 2009-12-16: - Range objects can be saved into Yeti Hierarchical Data Files. - New built-in functions: parse_range() and make_range(). =============================================================================== * 2009-12-09: Yeti version 6.2.4 released. * 2009-09-21: - New function morph_enhance to perform non-linear noise reduction on a 2D/3D array. * 2009-09-30: - Changes names of private functions in yeti_sort.c to prevent conflict with stdlib.h in Mac-OS-X. * 2009-05-28: - Some hash table functions have been fixed. - Many small corrections in the documentation in yeti.i. - New functions to save/restore sparse matrices in/from files: sparse_save, sparse_restore. =============================================================================== * 2008-10-29: Yeti version 6.2.3 released. * 2008-10-03: - Fixed bug in h_next which trigger on 64-bit machines. - Some update in hash table documentation. * 2008-04-02: - New functions: o h_grow: grow a member of a hash table; o mem_clear: clear some global symbols; o fullsizeof: compute size of arrays, lists, hash-tables, ... o make_hermitian: make an array Hermitian. - Some documentation fixed. - Function mem_info correctly accounts for hash-tables, pointers and lists. - Fix symbol_names not reporting scalar symbols. * 2008-02-14: - Add an error handler in GSL interface to avoid GSL aborting on error. =============================================================================== * 2008-02-14: Yeti version 6.2.2 released. * 2008-02-14: - Fix functions: identof, is_scalar, is_vector, is_matrix, is_integer, is_real, is_complex, is_string, and is_numerical when argument is an L-value. Thanks to "sguieu" for reporting this bug on Yorick forum at SourceForge. - Functions mem_copy and mem_copy also fixed to prevent this problem. * 2007-12-26: - Cleanup of code to manage dimension lists (restricted functions to: yeti_reset_dimlist, yeti_grow_dimlist and yeti_start_dimlist). Get rid of ynew_dim dependency which will be soon removed from Yorick's core. - Minor fix in window_geometry. * 2007-11-29: - Configuration script fixed to allow for spaces in path to Yorick executable and modified to have defaults for FFTW/GSL/TIFF linker flags (i.e. so that only --with-fftw is needed if FFTW is installed in standard place). * 2007-10-25: - Function h_show now displays name of symbolic links. * 2007-10-16: - New function h_grow to grow contents of hash table members. * 2007-07-27: Yeti version 6.2.2pre1 released. * 2007-07-27: - Script config.i, files config.h.in and Makefile.in updated to account for version numbers in the form: MAJOR.MINOR.MICROSUFFIX. - New "global" variable YETI_VERSION_SUFFIX. * 2007-07-26: - New "global" variables: YETI_VERSION_MAJOR, YETI_VERSION_MINOR and YETI_VERSION_MICRO. - New functions (rgl_roughness_*) for regularization based on roughness with various norms and boundary conditions. =============================================================================== * 2007-05-14: Yeti version 6.2.1 released. * 2007-05-11: - Configuration script fixed to work with Cygwin/MS-Windows and to allow for compilation with Yorick CVS version. - Yeti Hierarchical Data File (YHDF) can now store functions and symbolic links (by their names). * 2007-04-30: - *** POSSIBLE INCOMPATIBILITY *** Due to inconsistencies, the API for symbolic links has been reworked. To create a symbolic link to a variable, the ambiguous `link` function has been deleted in favor of `symlink_to_variable` and `symlink_to_name`. The functions `link_name`, `solve_link` and `is_link` have been renamed as `name_of_symlink`, `value_of_symlink` and `is_symlink` respectively. =============================================================================== * 2007-04-24: Yeti version 6.2.0 released. * 2007-04-24: - Add rules to build and install documentation of Yeti plugins in Yorick installation directory. * 2007-04-19: - The `symbol_names` function can now specifically select lists, hash tables and/or auto-loaded functions. - The `about` routine now account for auto-loaded functions. - Bug in function `make_dimlist` fixed. * 2007-03-23: - Hash table objects can now have their own evaluator, which can be queried/set by the `h_evaluator` function. - New function `h_number` to query number of entries in a hash table. - Function `is_hash` returns 2 for a hash table object implementing its own evaluator. - `h_clone`, `h_copy`, `h_info` and `h_show` fixed to account for the evaluator of an hash table object. - Restricted possible values for JOB/FLAGS in evaluation of matrix products: must be 0 (default) for direct product and 1 for transpose product. * 2007-01-26: New interpreted function: `setup_package`. * 2007-01-23: New object type: symbolic link. New related functions: link, is_link, link_name and solve_link. =============================================================================== * 2006-12-17: Yeti version 6.1.7 released. * 2006-12-01: Renamed built-in `typeIDof` as `identof` and provide constants for `T_CHAR`, `T_SHORT`, ... =============================================================================== * 2006-07-19: Yeti version 6.1.6 released. * 2006-07-19: New builtin function `insure_temporary`. * 2006-07-19: New builtin function `quick_select`. New functions `quick_median` and `quick_interquartile_range` for fast estimation of the median and the inter-quartile range of an array of values. =============================================================================== * 2006-06-10: Yeti version 6.1.5 released. This is the first public release of Yeti which makes use of the new Yorick API "YAPI" and which provides support for GSL (the GNU Scientific Library). * 2006-06-10: - Documentation for Yeti-GSL completed. - Fixed inconsistency in error computation for Yeti-GSL. - New function h_show to display a hash table as an expanded tree. * 2006-01-30: - Built-in function is_list removed from Yeti (it is now part of Yorick). * 2005-12-08: - New built-in functions: make_dimlist. * 2005-12-01: - New built-in functions: window_select, window_exists and window_list. * 2005-11-18: - Changed yeti_tiff plugin so that it uses new Yorick API "YAPI". - In tiff_plugin, TIFF objects can now be indexed by their tag names. * 2005-11-14: - Changed yeti_gsl plugin so that it uses new Yorick API "YAPI". * 2005-09-27: - New plugin yeti_gsl which implements support for GNU Scientific Library. At this time, the plugin provides 118 special functions from the GSL. - Fix bug in sparse_expand(). - In about() function, explicitely filter range operators which have a built-in function counterpart to avoid a deadly bug in Yorick. =============================================================================== * 2005-09-16: Yeti version 6.0.2 released. * 2005-09-16: Use Yorick (instead of Bourne shell) for configuration of Yeti -- much more easy and protable ;-) * 2005-09-13: New built-in function 'machine_constant' to query machine dependent constant (such as DBL_EPSILON). * 2005-08-31: Fixed signedness of strings in yeti_hash.c to avoid compiler warnings. * 2005-08-11: Fixed bad variable name in morph_white_top_hat(), and morph_black_top_hat. =============================================================================== * 2005-07-30: Yeti version 6.0.0 released. * 2005-07-19: - New functions cost_l2(), cost_l2l1() and cost_l2l0() to compute, L2, L2-L1 and L2-L0 cost functions and optionally their gradient. * 2005-06-15: - *** POSSIBLE INCOMPATIBILITY *** The old sparse matrix API has been removed in favor of a new sparse_matrix object class. The new class is different from the old implementation: [1] it has two index lists (one for the rows, the other for the columns of the non-zero coefficients) to break the former limit on the dimension lists of input/output spaces; [2] the sparse matrix object can be used directly as a function to apply (possibly transpose) matrix multiplication; [3] the sparse matrix object can be used as a Yorick structure to query its contents. - *** POSSIBLE INCOMPATIBILITY *** The built-in function _yeti_init was renamed as yeti_init. * 2005-06-14: - New morpho-math functions: morph_dilation, morph_erosion, morph_opening, morph_closing, morph_white_top_hat, morph_black_top_hat. * 2005-06-03: - *** POSSIBLE INCOMPATIBILITY *** Built-in get_path() removed (conflict with same function in Yorick 1.6.02). The new get_path function return a list of path separated by ':' into a single string whereas the former version built-in Yeti used to return an array of strings. * 2005-05-24: - Lots of changes to make Yeti a pure plugin for Yorick 1.6 and trying to maintain compatibility with previous version of Yeti. - Temporarily removed all the chn_* function (channel). - Builtin expand_path() removed in favor of filepath(). - Built-in strlower() and strlower() replaced by interpreted version which use the new strcase() built-in function in Yorick-1.6. Similarly, strtrim() removed because it is now provided by Yorick-1.6 with same behaviour. - "yeti_fftw" and "yeti_tiff" packages no longer require Yeti to be loaded (they are pure standalone Yorick packages). - Regular expression functions regmatch(), regsub(), regcomp(), etc are now provided by a separate package "yeti_regex". Two reasons for that: (1) new Yorick-1.6 now provides some regular expression engine (see strgrep), and (2) my regular expression package required the GNU REGEX library (built-in GNU C library). The GNU REGEX library can be optionally built into the new "yeti_regex" package (i.e. you no longer need an external REGEX library). Also note that the "yeti_regex" does not require Yeti to be loaded (it is a pure standalone Yorick package). - New Built-in functions: is_complex, is_hash, is_integer, is_list, is_matrix, is_numerical, is_real, is_scalar, is_string, is_vector. =============================================================================== * 2005-05-24: yeti-5.3.12 released. * 2005-05-02: - Changes in 'yeti_plugin.c', 'yeti_misc.c', 'yeti_utils.c' and 'yeti.i' to avoid problems on 64-bit machines. A (minor) side effect is that addresses in 'mem_*' hack functions must now be long integer (i.e. int's are no longer allowed). * 2005-04-14: - Fix bugs caused by using a nil string (e.g. string(0)) as hash key and which trigger segmentation violation interrupt (SIGSEGV). - New functions h_first and h_next to scan a hash table. * 2005-02-02: - Support for reloadable plugins. - Complete rewrite of 'configure' script. - CP40/GPL support is now provided as a separate plugin. - Startup routine '_yeti_init' to provide a more useful list of paths for script files (*.i) than the one setup by Yorick. - Removed 'yeti_version' function in favor of '_yeti_init' function and pre-defined variables YETI_VERSION and YETI_HOME. * 2004-10-14: New function "get_includes" to get a list of included files. =============================================================================== * 2004-09-27: yeti-5.3.11 released. * 2004-09-17: Fix a bug in "arc" and "round" built-in functions which triggers when the argument was a non-double scalar (thanks to Clémentine Béchet for finding this bug). * 2004-09-16: "about" interpreted function can optionally ignore case. * 2004-09-15: New interpreted function "h_clone" to clone/copy a hash object. * 2004-09-15: Work around Yorick bug with palette in gg_fit_2d_spike. =============================================================================== * 2004-09-09: yeti-5.3.10 released. * 2004-09-09: Yeti now displays its version number which can be retrieved by function yeti_version. * 2004-09-09: New built-in function "current_include" to get the path of the currently parsed file if any. * 2004-03-12: New built-in function "arc" to compute lengh of arc in radians. * 2004-03-04: Fix built-in "heapsort" to always return a vector (as claimed in the doc). * 2004-03-03: Avoid Gist markers when drawing widgets in yeti_gist_gui.i. * 2004-02-27: Hash table objects can be invoked as a function with a member name (syntaxic shortcut for h_get) or with a nil argument to get the number of elements. =============================================================================== * 2004-02-23: yeti-5.3.8 released. * 2004-02-23: Added built-in function "eigen" to compute the spectral decomposition of real symetric matrices. =============================================================================== = Local Variables: = = mode: text = = tab-width: 8 = = fill-column: 79 = = coding: utf-8 = = End: = =============================================================================== Yeti-6.4.0/README.md000066400000000000000000000300121253351442600137130ustar00rootroot00000000000000 # Yeti: a Yorick extension Yeti is an extension of Yorick (a fast interpreted interactive data processing language written by David Munro) which implements (see "*Quick Reference*" below for a list of additional functions): * hash table objects * regular expressions * complex, real-complex and complex-real FFT by FFTW (the Fastest Fourier Transform in the West -- version 2) * wavelet filtering ("*à trou*" method) * fast convolution along a chosen dimension with various border conditions * more string functions * memory hacking routines * more math functions (sinc, round, arc) * generalized matrix-vector multiplication (with possibly sparse matrix) * routines to query/check Yorick's symbols * support for reading TIFF images * morpho-math operators * ... This distribution of Yeti may come with several extensions (depending whether corresponding directories exist or not): * `fftw` ..... support for FFTW * `regex` .... support for POSIX regular expressions * `tiff` ..... support for reading TIFF images ## Compilation and Installation Starting with version 6.0.0, Yeti is built as a regular Yorick plugin and some of the Yeti extensions are built as standalone Yorick plugins (they do not require Yeti to be used). The installation of Yeti depends on that of Yorick which must have been installed prior to Yeti. You'll need at least version 2.2 of Yorick (for Yorick version 1.5, you can install Yeti 5.3 and for Yorick 1.6 to 2.1 you can install Yeti 6.2). The first installation step consists in the configuration of Yeti software suite. This is done via the "configure" script (1). In order to figure out the different options (and their default values), you can just do: ./configure --help At least version 2.2 of Yorick is required to configure/build this version of Yeti. By default, the script tries to find an axecutable named "yorick" in your path (according to your environment variable PATH). You can however specify a different Yorick executable with option: ./configure [...] --yorick=PATH_TO_YORICK_EXECUTABLE [...] with `PATH_TO_YORICK_EXECUTABLE` the full path to Yorick executable. If you want Yeti plugin for extended regular support, you'll have the choice to use a system provided POSIX REGEX library or the GNU REGEX library which can be built into the plugin. This is achieved by defining the preprocessor macro `HAVE_REGEX` to true if you trust your system REGEX library. To use the POSIX REGEX library of your system: ./configure [...] --with-regex-defs='-DHAVE_REGEX=1' [...] and to build the GNU REGEX library into the plugin (this is the default behaviour): ./configure [...] --with-regex-defs='-DHAVE_REGEX=0' [...] For instance, here is how to call the configure script to use builtin REGEX support and to enable plugins for FFTW (installed in /usr/local) and TIFF (installed in standard locations): ./configure --with-regex \ --with-fftw --with-fftw-defs="-I/usr/local/include" \ --with-fftw-libs="-L/usr/local/lib -lrfftw -lfftw" \ --with-tiff --with-tiff-libs="-ltiff" In order to check your configuration settings, you can add `--help` as the last argument of the call to `./configure`. After configuration, you can build Yeti and all related plugins by just doing: make In order to install Yeti files into Yorick installation directories, you simply do: make install Note that the `configure` script is actually a wrapper around the command: yorick -batch ./config.i [...] with [...] the arguments passed to `configure`. ## Quick Reference h_cleanup ............ delete void members of hash table object h_clone .............. clone a hash table h_copy ............... duplicate hash table object h_delete ............. delete members from a hash table h_first .............. get name of first hash table member h_functor ............ create "functor" object h_get ................ get value of hash table member h_has ................ check existence of hash table member h_info ............... list contents of hash table h_keys ............... get member names of a hash table object h_list ............... make a hash table into a list h_new. ............... create a new hash table object h_next ............... get name of next hash table member h_pop ................ pop member out of an hash table object h_restore_builtin .... restore builtin functions h_save ............... save variables in a hash table h_save_symbols ....... save builtin functions h_set................. set member of hash table object h_set_copy............ set member of hash table object h_show ............... display a hash table as an expanded tree h_stat ............... get statistics of hash table object Yeti Hierarchical Data (YHD) files: #include "yeti_yhdf.i" yhd_check ............ check version of YHD file yhd_info ............. print some information about an YHD file yhd_restore .......... restore a hash table object from an YHD file yhd_save ............. save a hash table object into an YHD file Regular Expressions: #include "yeti_regex.i" regcomp ............... compile regular expression regmatch .............. match a regular expression against an array of strings regmatch_part ......... peek substrings given indices returned by regmatch regsub ................ substitute regular expression into an array of strings Miscellaneous: anonymous ............. create anonymous (lambda) functions expand_path ........... expand directory names to absolute paths heapsort............... sort an array by heap-sort method insure_temporary ...... make sure a variable is not referenced is_hash ............... check if an object is a hash table is_sparse_matrix ...... check if an object is a sparse matrix quick_select .......... find K-th smallest element in an array quick_median .......... find median value (faster than median function) quick_interquartile_range ...................... compute inter-quartile range of values yeti_init setup ....... setup Yeti internals and query version Memory Hacking: mem_base .............. get base address of an array object mem_copy .............. copy array data at a given address mem_info .............. print memory information mem_peek .............. make a new array from a base address, type and dimension list native_byte_order ..... compute native byte order Binary Encoding of Data: get_encoding .......... get description of binary encoding for various machines install_encoding ...... install binary description into a binary stream same_encoding ......... compare two encodings Math/Numerical: arc ................... lengh of arc in radians cost_l2 ............... cost function and gradient for l2 norm cost_l2l0 ............. cost function and gradient for l2-l0 norm cost_l2l1 ............. cost function and gradient for l2-l1 norm machine_constant ...... get machine dependent constant (such as EPSILON) mvmult ................ (sparse)matrix-vector multiplication round ................. round to nearest integer sinc .................. cardinal sine: sinc(x) = sin(pi*x)/(pi*x) smooth3 ............... smooth an array by 3-element convolution sparse_expand ......... convert a sparse matrix array into a regular array sparse_grow ........... augment a sparse array sparse_matrix ......... create a new sparse matrix sparse_squeeze ........ convert a regular array into a sparse one yeti_convolve ......... convolution along a given dimension yeti_wavelet .......... "à trou" wavelet decomposition Strings: strlower .............. convert array of strings to lower case strtrimleft ........... remove leading spaces from an array of strings strtrimright .......... remove trailing spaces from an array of strings strupper .............. convert array of strings to upper case Yorick Internals: memory_info ........... display memory used by Yorick symbols symbol_info ........... get some information about existing Yorick symbols nrefsof ............... get number of references of an object Morpho-math operations: morph_black_top_hat ... perform valley detection morph_closing ......... perform morpho-math closing operation morph_dilation ........ perform morpho-math dilation operation morph_erosion ......... perform morpho-math erosion operation morph_opening ......... perform morpho-math opening operation morph_white_top_hat ... perform summit detection TIFF images: #include "yeti_tiff.i" tiff_open ............. open TIFF file tiff_debug ............ control printing of TIFF warning messages tiff_read_pixels ...... read pixel values in a TIFF file tiff_read_image ....... read image in a TIFF file tiff_read_directory ... move to next TIFF "directory" tiff_read ............. read image/pixels in a TIFF file tiff_check ............ check if a file is a readable TIFF file. FFTW: #include "yeti_fftw.i" fftw_plan ............. setup a plan for FFTW fftw .................. computes FFT of an array according to a plan cfftw ................. computes complex FFT of an array fftw_indgen ........... generates FFT indices fftw_dist ............. computes length of spatial frequencies fftw_smooth ........... smooths an array fftw_convolve ......... fast convolution of two arrays ## Copyright and Warranty (See file [LICENSE](./LICENSE.md) for details.) > Copyright (C), 1996-2015: Éric Thiébaut > > This software is governed by the CeCILL-C license under French law and abiding > by the rules of distribution of free software. You can use, modify and/or > redistribute the software under the terms of the CeCILL-C license as > circulated by CEA, CNRS and INRIA at the following URL > "http://www.cecill.info". > > As a counterpart to the access to the source code and rights to copy, modify > and redistribute granted by the license, users are provided only with a > limited warranty and the software's author, the holder of the economic rights, > and the successive licensors have only limited liability. > > In this respect, the user's attention is drawn to the risks associated with > loading, using, modifying and/or developing or reproducing the software by the > user in light of its specific status of free software, that may mean that it > is complicated to manipulate, and that also therefore means that it is > reserved for developers and experienced professionals having in-depth computer > knowledge. Users are therefore encouraged to load and test the software's > suitability as regards their requirements in conditions enabling the security > of their systems and/or data to be ensured and, more generally, to use and > operate it in the same conditions as regards security. > > The fact that you are presently reading this means that you have had knowledge > of the CeCILL-C license and that you accept its terms. ## References and notes 1. [Yorick](http://github.com/dhmunro/yorick) is an interpreted programming language for scientific simulations or calculations, postprocessing or steering large simulation codes, interactive scientific graphics, and reading, writing, or translating large files of numbers. 2. [FFTW](http://www.fftw.org/) is *the Fastest Fourier Transform in the West*. Please note that since FFTW API has changed with FFTW version 3, only FFTW version 2 (latest is 2.1.5) is supported in Yeti. A new plug-in [XFFT](https://github.com/emmt/xfft) will be soon available for FFTW3 in Yorick. 3. To use some special functions of [GSL](http://www.gnu.org/software/gsl/) (the GNU Scientific Library) in Yorick, `yeti_gsl` has been abandoned in favor of a separate plugin [YGSL](https://github.com/emmt/ygsl). 4. [LibTIFF](http://www.libtiff.org/) is a free TIFF library which provides support for the *Tag Image File Format* (TIFF), a widely used format for storing image data. Yeti-6.4.0/TODO000066400000000000000000000044031253351442600131310ustar00rootroot00000000000000NOTATIONS --------- [ ] means "to do" [-] means "abandoned" (i.e. it was a bad idea or found another way to solve for that) [+] means "done" YETI ---- [ ] use GIT instead of RCS; [ ] autoload yhd_save, yhd_restore, etc.; [ ] add support for sparse matrix in YHD files; [ ] make a h_save and h_restore functions: h_save, tab, var1, var2, ...; // saves variables as members of tab same as: h_set, tab, var1=var1, var2=var2, ...; tab = h_save(var1, var2, ...); // creates new hash-table with members same as: tab = h_new(var1=var1, var2=var2, ...); h_restore, tab; // restore all members in current context h_restore, tab, var1, var2, ...; // restore some members [-] make Yeti 'relocatable' like Yorick (now Yeti is a regular plugin of Yorick) [ ] write a widget/plotter server (with Tcl/Tk or Gtk) [+] make Yeti as a plugin for Yorick 1.6 [+] implement (re)loadable plugins [ ] use Yeti stack managed workspace in yeti_regex.c [ ] give up compatibility with Yorick-1.4 [ ] cleanup yeti_strcpy, yeti_strncpy in yeti_utils.c -- provide a p_strncpy which is missing in Yorick's Play library, in fact: #define p_strncpy(s,n) p_strncat((char *)0, (s), (n)) would do the trick [ ] implement binary data conversion in yeti_channel.c INTERNALS OF YORICK ------------------- [-] Modify FreeBIFunction() in ydata.c so that freeing builtin function becomes impossible. [-] Patch the list object code to avoid loop in references. [-] Have some mean to avoid warning messages when freeing builtin functions (which sucks when a plugin get reloaded). HASH OBJECTS ------------ [ ] h_cpy() to effectively duplicate a hash table object. [ ] There is maybe a possibility to extend the cases where member assignation is allowed (OBJ.MEMBER = VALUE should behave as h_set, OBJ, MEMBER=VALUE). [ ] cleanup hash-table code (remove/change h_copy/h_clone functions and make the 'ydf' format available in yeti_hash.i) [ ] no needs(?) for 'ReplaceRef' in hash table code INTERFACE TO YORICK INTERNALS ----------------------------- [ ] remove suffix _base in all yeti_push_new..._base and remove corresponding macros that return a DataBlock (if you really want the address of the new DataBlock you can just take the value of the topmost stack synbol) Yeti-6.4.0/VERSION000066400000000000000000000000061253351442600135040ustar00rootroot000000000000006.4.0 Yeti-6.4.0/config.h.in000066400000000000000000000077341253351442600144760ustar00rootroot00000000000000/* * config.h - * * Configuration settings for compiling Yeti and others Yorick extensions. * *----------------------------------------------------------------------------- * * Copyright (C) 1996-2010 Eric Thiébaut * * This software is governed by the CeCILL-C license under French law and * abiding by the rules of distribution of free software. You can use, modify * and/or redistribute the software under the terms of the CeCILL-C license as * circulated by CEA, CNRS and INRIA at the following URL * "http://www.cecill.info". * * As a counterpart to the access to the source code and rights to copy, * modify and redistribute granted by the license, users are provided only * with a limited warranty and the software's author, the holder of the * economic rights, and the successive licensors have only limited liability. * * In this respect, the user's attention is drawn to the risks associated with * loading, using, modifying and/or developing or reproducing the software by * the user in light of its specific status of free software, that may mean * that it is complicated to manipulate, and that also therefore means that it * is reserved for developers and experienced professionals having in-depth * computer knowledge. Users are therefore encouraged to load and test the * software's suitability as regards their requirements in conditions enabling * the security of their systems and/or data to be ensured and, more * generally, to use and operate it in the same conditions as regards * security. * * The fact that you are presently reading this means that you have had * knowledge of the CeCILL-C license and that you accept its terms. * *----------------------------------------------------------------------------- * * $Id$ * $Log$ */ #ifndef _YETI_CONFIG_H #define _YETI_CONFIG_H 1 /*---------------------------------------------------------------------------*/ /* Yorick version numbers: */ #define YORICK_VERSION_MAJOR x #define YORICK_VERSION_MINOR x #define YORICK_VERSION_MICRO x #define YORICK_VERSION_SUFFIX "" /* Yeti version numbers: */ #define YETI_VERSION_MAJOR x #define YETI_VERSION_MINOR x #define YETI_VERSION_MICRO x #define YETI_VERSION_SUFFIX "" /* Define the following macro to true if Yorick does not export the definitions of autoload_t structure: */ #define YETI_MUST_DEFINE_AUTOLOAD_TYPE x /*---------------------------------------------------------------------------*/ /* Byte order (+1 for big endian, -1 for little endian). */ #define YETI_BYTE_ORDER x /* Size (in bytes) of basic C-types. */ #define YETI_CHAR_SIZE x #define YETI_SHORT_SIZE x #define YETI_INT_SIZE x #define YETI_LONG_SIZE x #define YETI_FLOAT_SIZE x #define YETI_DOUBLE_SIZE x #define YETI_POINTER_SIZE x /*---------------------------------------------------------------------------*/ /* Unfortunately the code interface to Yorick change with the version and none of the Yorick headers provide this information. The following defintions attempt to provide a more uniform interface. */ #undef p_strfree #if (YORICK_VERSION_MAJOR == 1 && YORICK_VERSION_MINOR == 4) # include "defstr.h" # define p_malloc Ymalloc /* usage: p_malloc(SIZE) */ # define p_realloc Yrealloc /* usage: p_realloc(ADDR, SIZE) */ # define p_free Yfree /* usage: p_free(ADDR) */ # define p_strcpy StrCpy /* usage: p_strcpy(STR) -- also see yeti_strcpy */ # define p_strfree StrFree /* usage: p_strfree(STR) */ # define p_stralloc StrAlloc /* usage: p_stralloc(LEN) */ #endif /* Yorick 1.4 */ #if ((YORICK_VERSION_MAJOR == 1 && YORICK_VERSION_MINOR >= 5) || YORICK_VERSION_MAJOR >= 2) # include "pstdlib.h" # define p_strfree p_free /* usage: p_strfree(STR) */ # define p_stralloc(LEN) p_malloc((LEN)+1) /* usage: p_stralloc(LEN) */ #endif /* Yorick 1.5 and newer */ #ifndef p_stralloc # error "unsupported Yorick version" #endif /*---------------------------------------------------------------------------*/ #endif /* _YETI_CONFIG_H */ Yeti-6.4.0/config.i000066400000000000000000000417741253351442600140740ustar00rootroot00000000000000/* config.i - * * Configuration script for setting up building of Yeti. * *----------------------------------------------------------------------------- * * Copyright (C) 1996-2010 Eric Thiébaut * * This software is governed by the CeCILL-C license under French law and * abiding by the rules of distribution of free software. You can use, modify * and/or redistribute the software under the terms of the CeCILL-C license as * circulated by CEA, CNRS and INRIA at the following URL * "http://www.cecill.info". * * As a counterpart to the access to the source code and rights to copy, * modify and redistribute granted by the license, users are provided only * with a limited warranty and the software's author, the holder of the * economic rights, and the successive licensors have only limited liability. * * In this respect, the user's attention is drawn to the risks associated with * loading, using, modifying and/or developing or reproducing the software by * the user in light of its specific status of free software, that may mean * that it is complicated to manipulate, and that also therefore means that it * is reserved for developers and experienced professionals having in-depth * computer knowledge. Users are therefore encouraged to load and test the * software's suitability as regards their requirements in conditions enabling * the security of their systems and/or data to be ensured and, more * generally, to use and operate it in the same conditions as regards * security. * * The fact that you are presently reading this means that you have had * knowledge of the CeCILL-C license and that you accept its terms. */ /*---------------------------------------------------------------------------*/ /* GLOBAL CONFIGURATION PARAMETERS */ /* path to Yorick executable: */ local CFG_YORICK; /* Yorick version: */ local CFG_YORICK_VERSION; local CFG_YORICK_VERSION_MAJOR, CFG_YORICK_VERSION_MINOR, CFG_YORICK_VERSION_MICRO; /* Yeti version: */ local CFG_YETI_VERSION; local CFG_YETI_VERSION_MAJOR, CFG_YETI_VERSION_MINOR, CFG_YETI_VERSION_MICRO; /* Settings for FFTW plugin: */ local CFG_WITH_FFTW, CFG_WITH_FFTW_DEFS, CFG_WITH_FFTW_LIBS; CFG_WITH_FFTW = "no"; CFG_WITH_FFTW_DEFS = ""; CFG_WITH_FFTW_LIBS = "-lrfftw -lfftw"; /* Settings for REGEX plugin: */ local CFG_WITH_REGEX, CFG_WITH_REGEX_DEFS, CFG_WITH_REGEX_LIBS; CFG_WITH_REGEX = "yes"; CFG_WITH_REGEX_DEFS = ""; CFG_WITH_REGEX_LIBS = ""; /* Settings for TIFF plugin: */ local CFG_WITH_TIFF, CFG_WITH_TIFF_DEFS, CFG_WITH_TIFF_LIBS; CFG_WITH_TIFF = "no"; CFG_WITH_TIFF_DEFS = ""; CFG_WITH_TIFF_LIBS = "-ltiff"; /*---------------------------------------------------------------------------*/ /* HELP AND MAIN CONFIGURATION FUNCTIONS */ func cfg_help { w = cfg_prt; /* shortcut */ w, "Usage: yorick -batch ./config.i [OPTIONS]"; w, " Configure Yeti software for compilation and installation. The default"; w, " settings shown in brackets below are based on values set by environment"; w, " variables or guessed by this script."; w, ""; w, "Options: (default values shown in brackets)"; w, " -h, --help print this help summary"; w, " --debug turn on debug mode for this script"; w, " --yorick=PATH path to Yorick executable [%s]", CFG_YORICK; w, ""; w, " --with-fftw=yes/no build FFTW plugin? [%s]", CFG_WITH_FFTW; w, " --with-fftw-defs=DEFS preprocessor options for FFTW [%s]", CFG_WITH_FFTW_DEFS; w, " --with-fftw-libs=LIBS library specification for FFTW [%s]", CFG_WITH_FFTW_LIBS; w, ""; w, " --with-regex=yes/no build REGEX plugin? [%s]", CFG_WITH_REGEX; w, " --with-regex-defs=DEFS preprocessor options for REGEX [%s]", CFG_WITH_REGEX_DEFS; w, " --with-regex-libs=LIBS library specification for REGEX [%s]", CFG_WITH_REGEX_LIBS; w, ""; w, " --with-tiff=yes/no build TIFF plugin? [%s]", CFG_WITH_TIFF; w, " --with-tiff-defs=DEFS preprocessor options for TIFF [%s]", CFG_WITH_TIFF_DEFS; w, " --with-tiff-libs=LIBS library specification for TIFF [%s]", CFG_WITH_TIFF_LIBS; w, ""; w, "Alternative syntax:"; w, " --with-PACKAGE same as --with-PACKAGE=yes"; w, " --without-PACKAGE same as --with-PACKAGE=no"; } func cfg_configure(argv) { extern CFG_DIR; extern CFG_YORICK; extern CFG_YORICK_VERSION; extern CFG_YORICK_VERSION_MAJOR, CFG_YORICK_VERSION_MINOR, CFG_YORICK_VERSION_MICRO; extern CFG_YETI_VERSION; extern CFG_YETI_VERSION_MAJOR, CFG_YETI_VERSION_MINOR, CFG_YETI_VERSION_MICRO; extern CFG_WITH_FFTW, CFG_WITH_FFTW_DEFS, CFG_WITH_FFTW_LIBS; extern CFG_WITH_REGEX, CFG_WITH_REGEX_DEFS, CFG_WITH_REGEX_LIBS; extern CFG_WITH_TIFF, CFG_WITH_TIFF_DEFS, CFG_WITH_TIFF_LIBS; CFG_YORICK = argv(1); CFG_DIR = get_cwd(); CFG_TMP = "config.tmp"; CFG_DEBUG = 0n; pkg_list = ["fftw", "regex", "tiff"]; nil = string(); s = string(); argc = numberof(argv); for (i=2 ; i<=argc ; ++i) { arg = argv(i); if (arg == "-h" || arg == "--help") { cfg_help; if (batch()) quit; } else if (arg == "--debug") { CFG_DEBUG = 1n; } else if ((s = cfg_split(arg, "--yorick="))) { CFG_YORICK = s; } else if ((s = cfg_split(arg, "--with-"))) { sel = strfind("=", s); if (sel(2) < 0) { cfg_define, "CFG_WITH_" + s, "yes"; } else if (sel(1) > 0) { cfg_define, "CFG_WITH_" + strpart(s, 1:sel(1)), strpart(s, sel(2)+1:0); } else { cfg_die, "bad option \"", arg, "\""; } } else if ((s = cfg_split(arg, "--without-"))) { cfg_define, "CFG_WITH_" + s, "no"; } else { cfg_die, "unknown option: \"", arg, "\"; try \"--help\""; } } /* Get version of Yeti. */ CFG_YETI_VERSION = rdline(open("VERSION")); cfg_parse_version, "CFG_YETI", CFG_YETI_VERSION; /* Get version of Yorick executable. */ if (structof(CFG_YORICK) != string) { CFG_YORICK_VERSION = nil; } else { write, format="%s\n", open(CFG_TMP, "w"), "write, format=\"%s\", Y_VERSION;" CFG_YORICK_VERSION = rdline(popen("\"" + CFG_YORICK + "\" -batch " + CFG_TMP, 0)); if (! CFG_DEBUG) remove, CFG_TMP; } if (! CFG_YORICK_VERSION) cfg_die, "bad path to Yorick executable"; cfg_parse_version, "CFG_YORICK", CFG_YORICK_VERSION; if (CFG_YORICK_VERSION_MAJOR < 1 || (CFG_YORICK_VERSION_MAJOR == 1 && CFG_YORICK_VERSION_MINOR < 6)) { cfg_die, "too old Yorick version (upgrade to at least Yorick 1.6-02)"; } /* Figure out some builtin capabilities of Yorick. */ CFG_PROVIDE_ROUND = (is_func(round) == 2); CFG_PROVIDE_IDENTOF = (is_func(identof) == 2); CFG_PROVIDE_SWAP = (is_func(swap) == 2); CFG_PROVIDE_UNREF = (is_func(unref) == 2); CFG_PROVIDE_IS_LIST = (is_func(is_list) == 2); CFG_PROVIDE_IS_SCALAR = (is_func(is_scalar) == 2); CFG_PROVIDE_IS_VECTOR = (is_func(is_vector) == 2); CFG_PROVIDE_IS_MATRIX = (is_func(is_matrix) == 2); CFG_PROVIDE_IS_INTEGER = (is_func(is_integer) == 2); CFG_PROVIDE_IS_REAL = (is_func(is_real) == 2); CFG_PROVIDE_IS_COMPLEX = (is_func(is_complex) == 2); CFG_PROVIDE_IS_STRING = (is_func(is_string) == 2); CFG_PROVIDE_IS_NUMERICAL = (is_func(is_numerical) == 2); CFG_PROVIDE_SYMBOL_EXISTS = (is_func(symbol_exists) == 2); CFG_PROVIDE_SYMBOL_NAMES = (is_func(symbol_names) == 2); CFG_PROVIDE_GET_INCLUDES = (is_func(get_includes) == 2); CFG_PROVIDE_CURRENT_INCLUDE = (is_func(current_include) == 2); CFG_PROVIDE_WINDOW_GEOMETRY = (is_func(window_geometry) == 2); CFG_PROVIDE_WINDOW_EXISTS = (is_func(window_exists) == 2); CFG_PROVIDE_WINDOW_SELECT = (is_func(window_select) == 2); CFG_PROVIDE_WINDOW_LIST = (is_func(window_list) == 2); if (is_func(random_n) != 3) { cfg_die, "expecting that random_n() be an autoloaded function"; } else { CFG_YETI_MUST_DEFINE_AUTOLOAD_TYPE = (nameof(random_n) != "random_n"); } /* Byte order. */ CFG_BYTE_ORDER = 0; n = sizeof(int); buf = array(char, n); for (i = 1; i <= n; ++i) buf(i) = i; val = cfg_cast(buf, int); if (n == 2) { if (val == 0x0102) { CFG_BYTE_ORDER = +1; } else if (val == 0x0201) { CFG_BYTE_ORDER = -1; } } else if (n == 4) { if (val == 0x01020304) { CFG_BYTE_ORDER = +1; } else if (val == 0x04030201) { CFG_BYTE_ORDER = -1; } } else if (n == 8) { if ((val & 0xFFFFFFFF) == 0x05060708) { CFG_BYTE_ORDER = -1; } else if ((val & 0xFFFFFFFF) == 0x04030201) { CFG_BYTE_ORDER = -1; } } if (! CFG_BYTE_ORDER) { cfg_die, "unknown byte order"; } /* Greeting message. */ cfg_prt; cfg_prt, "*** This is the configuration script for Yeti ***"; cfg_prt; cfg_prt, "Yeti version is: %d.%d.%d%s", CFG_YETI_VERSION_MAJOR, CFG_YETI_VERSION_MINOR, CFG_YETI_VERSION_MICRO, CFG_YETI_VERSION_SUFFIX; cfg_prt; cfg_prt, "Yorick executable is: %s", CFG_YORICK; cfg_prt, "Yorick version is: %d.%d.%d%s", CFG_YORICK_VERSION_MAJOR, CFG_YORICK_VERSION_MINOR, CFG_YORICK_VERSION_MICRO, CFG_YORICK_VERSION_SUFFIX; /* Build/fix Makefile and yeti.h in directory yeti. */ cfg_prt; cfg_prt, "Setup for building Yeti..."; cfg_change_dir, "core"; cfg_fix_makefile; cfg_change_dir, "doc"; cfg_fix_makefile; yeti_pkgs = "core"; for (i=1 ; i<=numberof(pkg_list) ; ++i) { pkg = pkg_list(i); PKG = strcase(1, pkg); with_pkg = symbol_def("CFG_WITH_" + PKG); if ((s = structof(with_pkg)) == string) { with_pkg = (strcase(0, with_pkg) == "yes"); } else { with_pkg = !(! with_pkg); } cfg_prt; def_have = (pkg == "fftw" || pkg == "tiff"); if (with_pkg) { defs = symbol_def("CFG_WITH_" + PKG + "_DEFS"); libs = symbol_def("CFG_WITH_" + PKG + "_LIBS"); if (def_have) defs += " -DHAVE_" + PKG + "=1"; cfg_prt, "Package \"%s\" will be built with the following settings:", pkg; cfg_prt, " PKG_CFLAGS = %s", defs; cfg_prt, " PKG_DEPLIBS = %s", libs; yeti_pkgs += " " + pkg; } else { cfg_prt, "Package \"%s\" will not be built.", pkg; defs = (def_have ? "-DHAVE_" + PKG + "=0" : ""); libs = ""; } /* Build/fix package Makefile. */ cfg_change_dir, pkg; cfg_fix_makefile; cfg_update, "Makefile", "make", "PKG_CFLAGS", defs, "PKG_DEPLIBS", libs; } /* Fix top level Makefile and creates config.h. */ cfg_prt; cfg_prt, "Setting up top-level Makefile..."; cd, CFG_DIR; cfg_update, "Makefile.in", "make", "YETI_PKGS", yeti_pkgs, "YETI_VERSION_MAJOR", CFG_YETI_VERSION_MAJOR, "YETI_VERSION_MINOR", CFG_YETI_VERSION_MINOR, "YETI_VERSION_MICRO", CFG_YETI_VERSION_MICRO, "YETI_VERSION_SUFFIX", "\"" + CFG_YETI_VERSION_SUFFIX + "\"", "YORICK_VERSION_MAJOR", CFG_YORICK_VERSION_MAJOR, "YORICK_VERSION_MINOR", CFG_YORICK_VERSION_MINOR, "YORICK_VERSION_MICRO", CFG_YORICK_VERSION_MICRO, "YORICK_VERSION_SUFFIX", "\"" + CFG_YORICK_VERSION_SUFFIX + "\""; cfg_fix_makefile; cfg_update, "config.h.in", "cpp", "YETI_VERSION_MAJOR", CFG_YETI_VERSION_MAJOR, "YETI_VERSION_MINOR", CFG_YETI_VERSION_MINOR, "YETI_VERSION_MICRO", CFG_YETI_VERSION_MICRO, "YETI_VERSION_SUFFIX", "\"" + CFG_YETI_VERSION_SUFFIX + "\"", "YORICK_VERSION_MAJOR", CFG_YORICK_VERSION_MAJOR, "YORICK_VERSION_MINOR", CFG_YORICK_VERSION_MINOR, "YORICK_VERSION_MICRO", CFG_YORICK_VERSION_MICRO, "YORICK_VERSION_SUFFIX", "\"" + CFG_YORICK_VERSION_SUFFIX + "\"", "YETI_BYTE_ORDER", CFG_BYTE_ORDER, "YETI_CHAR_SIZE", sizeof(char), "YETI_SHORT_SIZE", sizeof(short), "YETI_INT_SIZE", sizeof(int), "YETI_LONG_SIZE", sizeof(long), "YETI_FLOAT_SIZE", sizeof(float), "YETI_DOUBLE_SIZE", sizeof(double), "YETI_POINTER_SIZE", sizeof(pointer), "YETI_MUST_DEFINE_AUTOLOAD_TYPE", CFG_YETI_MUST_DEFINE_AUTOLOAD_TYPE; cfg_prt, " created config.h"; /* Final message. */ cfg_prt; cfg_prt, "OK all done, you can build and install Yeti plugin(s) by:"; cfg_prt, " make all"; cfg_prt, " make install"; cfg_prt; } /*---------------------------------------------------------------------------*/ /* UTILITY FUNCTIONS */ func cfg_prt(x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13) /* DOCUMENT cfg_prt, str; -or- cfg_prt, fmt, arg1, arg2, ...; Print out string STR (with no leading space and a terminating newline) or arguments ARG1, ARG2, etc with format FMT. SEE ALSO: write. */ { w = write; /* short alias */ if (is_void(x0)) { w, format="%s\n", ""; return; } if (is_void(x1)) { w, format="%s\n", x0; return; } if (strpart(x0, 0:0) != "\n") x0 += "\n"; /**/ if (is_void(x2)) w,format=x0,x1; else if (is_void(x3)) w,format=x0,x1,x2; else if (is_void(x4)) w,format=x0,x1,x2,x3; else if (is_void(x5)) w,format=x0,x1,x2,x3,x4; else if (is_void(x6)) w,format=x0,x1,x2,x3,x4,x5; else if (is_void(x7)) w,format=x0,x1,x2,x3,x4,x5,x6; else if (is_void(x8)) w,format=x0,x1,x2,x3,x4,x5,x6,x7; else if (is_void(x9)) w,format=x0,x1,x2,x3,x4,x5,x6,x7,x8; else if (is_void(x10)) w,format=x0,x1,x2,x3,x4,x5,x6,x7,x8,x9; else if (is_void(x11)) w,format=x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10; else if (is_void(x12)) w,format=x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11; else if (is_void(x13)) w,format=x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12; else w,format=x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13; } func cfg_yes_or_no(arg) { return (arg ? "yes" : "no"); } func cfg_die(msg, ..) { while (more_args()) msg += next_arg(); if (batch()) { write, format="*** ERROR *** %s\n", msg; quit; } else { error, msg; } } func cfg_parse_version(prefix, version) { extern CFG_YORICK_VERSION_MAJOR, CFG_YORICK_VERSION_MINOR, CFG_YORICK_VERSION_MICRO, CFG_YORICK_VERSION_SUFFIX; extern CFG_YETI_VERSION_MAJOR, CFG_YETI_VERSION_MINOR, CFG_YETI_VERSION_MICRO, CFG_YETI_VERSION_SUFFIX; major = 0L; minor = 0L; micro = 0L; suffix = dummy = string(); n = sread(version, format="%d.%d.%d%s%s", major, minor, micro, suffix, dummy); if (n != 3 && n != 4) { cfg_die, "bad version \"", version, "\" for ", prefix; } symbol_set, prefix + "_VERSION_MAJOR", major; symbol_set, prefix + "_VERSION_MINOR", minor; symbol_set, prefix + "_VERSION_MICRO", micro; symbol_set, prefix + "_VERSION_SUFFIX", (suffix ? suffix : ""); } func cfg_load(filename, grain) /* Load text file. */ { buf = []; if (! (file = open(filename, "r", 1))) { cfg_die, "cannot open file \"" + filename + "\" for reading"; } if (! grain) grain = 10; while (1n) { tmp = rdline(file, grain); i = where(tmp); if (! is_array(i)) return buf; grow, buf, tmp(1:i(0)); /* In order to avoid O(N^2) process, adjust number of lines read at a time so as to double the total number of lines read after next rdline: */ grain = numberof(buf); } } func cfg_update(target, style, ..) { if (style == "cpp") { fmt1 = "^#[ \t]*define[ \t][ \t]*%s([ \t]|$)"; fmt2 = "#define %s %s"; } else if (style == "sh") { fmt1 = "^[ \t]*%s="; fmt2 = "%s=%s"; } else if (style == "make") { fmt1 = "^[ \t]*%s[ \t]*="; fmt2 = "%s = %s"; } else { cfg_die, "cfg_update: bad value for STYLE parameter"; } /* Load text file. */ buf = cfg_load(target); while (more_args()) { macro = next_arg(); if (! more_args()) cfg_die, "cfg_update: missing macro value"; value = next_arg(); if ((s = structof(value)) != string) { if (s==long || s==int || s==short || s==char) { value = swrite(format="%d", value); } else { cfg_die, "unexpected data type for value of macro ", macro; } } pat = swrite(format=fmt1, macro); i = where(strgrep(pat, buf)(2,) > 0); if (is_array(i)) buf(i) = swrite(format=fmt2, macro, value); } /* Replace old file by new one. */ if (strlen(target) > 3 && strpart(target, -3:-3) != "/" && strpart(target, -2:0) == ".in") { dest = strpart(target, 1:-3); write, open(dest, "w"), format="%s\n", buf; } else { backup = target + ".bak"; temporary = target + ".tmp"; write, open(temporary, "w"), format="%s\n", buf; rename, target, backup; rename, temporary, target; if (! CFG_DEBUG) remove, backup; } } func cfg_split(arg, prefix) { len = strlen(prefix); if (strpart(arg, 1:len) == prefix) { return strpart(arg, len+1:0); } } func cfg_define(name, value) { c = strchar(strcase(1, name)); if (is_array((i = where((c == '-'))))) c(i) = '_'; symbol_set, strchar(c), value; } func cfg_change_dir(dir) { write, format=" entering directory: %s\n", dir; cd, CFG_DIR; cd, dir; } func cfg_fix_makefile { write, format="%s", " "; system, "\"" + CFG_YORICK + "\" -batch make.i"; } func cfg_cast(a, type, dimlist) /* DOCUMENT cfg_cast(a, type, dims) This function returns array A reshaped to an array with given TYPE and dimension list. SEE ALSO: reshape. */ { local r; reshape, r, &a, type, dimlist; return r; } /*---------------------------------------------------------------------------*/ /* CLOSURE (run the configuration script if in batch mode) */ if (batch()) { cfg_configure, get_argv(); quit; } /* * Local Variables: * mode: Yorick * tab-width: 8 * c-basic-offset: 2 * indent-tabs-mode: nil * fill-column: 78 * coding: utf-8 * End: */ Yeti-6.4.0/configure000077500000000000000000000020411253351442600143440ustar00rootroot00000000000000#! /bin/sh # # Wrapper script around "yorick -batch config.i" dir=`dirname "$0"` config_i="$dir/config.i" if test ! -f "$config_i" -o ! -r "$config_i"; then echo >&2 "File \"config.i\" not found." echo >&2 "Maybe Yeti distribution not properly unpacked." exit 1 fi yorick= n=1 while test $n -le $#; do eval arg=\$$n case "$arg" in --yorick=*) yorick=`echo "$arg" | sed 's/^--yorick=//'` ;; esac n=`expr $n + 1` done # Search Yorick in the path: if test "x$yorick" = "x"; then exe_sfx= save_IFS=$IFS IFS=":" for dir in $PATH; do file=$dir/yorick$exe_sfx if test -f "$file" -a -x "$file"; then yorick=$file break fi done IFS=$save_IFS fi if test "x$yorick" = "x" -o ! -f "$yorick" -o ! -x "$yorick"; then echo >&2 "Yorick excutable not found." echo >&2 "Try to specify the path with option --yorick=..." exit 1 fi exec "$yorick" -batch "$config_i" "$@" # Local Variables: # mode: sh # tab-width: 8 # c-basic-offset: 2 # indent-tabs-mode: nil # fill-column: 78 # coding: utf-8 # End: Yeti-6.4.0/core/000077500000000000000000000000001253351442600133705ustar00rootroot00000000000000Yeti-6.4.0/core/Makefile000066400000000000000000000107251253351442600150350ustar00rootroot00000000000000# # Makefile - # # Makefile for Yeti main plugin. # #------------------------------------------------------------------------------ # # Copyright (C) 2006-2010, Eric Thiébaut # # This software is governed by the CeCILL-C license under French law and # abiding by the rules of distribution of free software. You can use, modify # and/or redistribute the software under the terms of the CeCILL-C license as # circulated by CEA, CNRS and INRIA at the following URL # "http://www.cecill.info". # # As a counterpart to the access to the source code and rights to copy, # modify and redistribute granted by the license, users are provided only # with a limited warranty and the software's author, the holder of the # economic rights, and the successive licensors have only limited liability. # # In this respect, the user's attention is drawn to the risks associated with # loading, using, modifying and/or developing or reproducing the software by # the user in light of its specific status of free software, that may mean # that it is complicated to manipulate, and that also therefore means that it # is reserved for developers and experienced professionals having in-depth # computer knowledge. Users are therefore encouraged to load and test the # software's suitability as regards their requirements in conditions enabling # the security of their systems and/or data to be ensured and, more # generally, to use and operate it in the same conditions as regards # security. # # The fact that you are presently reading this means that you have had # knowledge of the CeCILL-C license and that you accept its terms. #------------------------------------------------------------------------------ # these values filled in by: yorick -batch make.i Y_MAKEDIR= Y_EXE= Y_EXE_PKGS= Y_EXE_HOME= Y_EXE_SITE= Y_HOME_PKG= #----------------------------------------------------------- optimization flags # options for make command line, e.g.- make COPT=-g TGT=exe COPT=$(COPT_DEFAULT) TGT=$(DEFAULT_TGT) #------------------------------------------------------ macros for this package PKG_NAME=yeti PKG_I=yeti.i OBJS = yeti_convolve.o \ yeti_cost.o \ yeti_hash.o \ yeti_sort.o \ yeti_math.o \ yeti_misc.o \ yeti_morph.o \ yeti_new.o \ yeti_rgl.o \ yeti_sparse.o \ yeti_symlink.o \ yeti_utils.o # change to give the executable a name other than yorick PKG_EXENAME = yorick # PKG_DEPLIBS=-Lsomedir -lsomelib for dependencies of this package PKG_DEPLIBS = # set compiler (or rarely loader) flags specific to this package PKG_CFLAGS = -I.. PKG_LDFLAGS = # list of additional package names you want in PKG_EXENAME # (typically Y_EXE_PKGS should be first here) EXTRA_PKGS=$(Y_EXE_PKGS) # list of additional files for clean PKG_CLEAN=$(PKG_NAME)$(PLUG_SFX) $(PKG_NAME).dll $(PKG_NAME).so # autoload file for this package, if any PKG_I_START= # non-pkg.i include files for this package, if any PKG_I_EXTRA=yeti_yhdf.i #-------------------------------------- standard targets and rules (in Makepkg) # set macros Makepkg uses in target and dependency names # DLL_TARGETS, LIB_TARGETS, EXE_TARGETS # are any additional targets (defined below) prerequisite to # the plugin library, archive library, and executable, respectively PKG_I_DEPS=$(PKG_I) Y_DISTMAKE=distmake include $(Y_MAKEDIR)/Make.cfg include $(Y_MAKEDIR)/Makepkg include $(Y_MAKEDIR)/Make$(TGT) # override macros Makepkg sets for rules and other macros # Y_HOME and Y_SITE in Make.cfg may not be correct (e.g.- relocatable) Y_HOME=$(Y_EXE_HOME) Y_SITE=$(Y_EXE_SITE) # reduce chance of yorick-1.5 corrupting this Makefile MAKE_TEMPLATE = protect-against-1.5 #------------------------------------------- targets and rules for this package # simple example: #myfunc.o: myapi.h # more complex example (also consider using PKG_CFLAGS above): #myfunc.o: myapi.h myfunc.c # $(CC) $(CPPFLAGS) $(CFLAGS) -DMY_SWITCH -o $@ -c myfunc.c yeti_convolve.o: yeti.h #yeti_cost.o: yeti_eigen.o: yeti_eigen.c $(CC) $(CPPFLAGS) $(CFLAGS) -DYORICK -o $@ -c $< yeti_hash.o: yeti.h ../config.h yeti_heapsort.o: yeti.h ../config.h yeti_math.o: yeti.h ../config.h yeti_misc.o: yeti.h ../config.h yeti_morph.o: yeti.h ../config.h yeti_rgl.o: yeti_rgl.c $(CC) $(CPPFLAGS) $(CFLAGS) -DYORICK -o $@ -c $< yeti_utils.o: yeti.h ../config.h #yeti_new.o: # # Local Variables: # mode: Makefile # tab-width: 8 # fill-column: 78 # coding: utf-8 # End: #-------------------------------------------------------------- end of Makefile Yeti-6.4.0/core/yeti.h000066400000000000000000000461551253351442600145260ustar00rootroot00000000000000/* * yeti.h - * * Definitions for writing Yorick extensions. * *----------------------------------------------------------------------------- * * Copyright (C) 1996-2010 Eric Thiébaut * * This software is governed by the CeCILL-C license under French law and * abiding by the rules of distribution of free software. You can use, modify * and/or redistribute the software under the terms of the CeCILL-C license as * circulated by CEA, CNRS and INRIA at the following URL * "http://www.cecill.info". * * As a counterpart to the access to the source code and rights to copy, * modify and redistribute granted by the license, users are provided only * with a limited warranty and the software's author, the holder of the * economic rights, and the successive licensors have only limited liability. * * In this respect, the user's attention is drawn to the risks associated with * loading, using, modifying and/or developing or reproducing the software by * the user in light of its specific status of free software, that may mean * that it is complicated to manipulate, and that also therefore means that it * is reserved for developers and experienced professionals having in-depth * computer knowledge. Users are therefore encouraged to load and test the * software's suitability as regards their requirements in conditions enabling * the security of their systems and/or data to be ensured and, more * generally, to use and operate it in the same conditions as regards * security. * * The fact that you are presently reading this means that you have had * knowledge of the CeCILL-C license and that you accept its terms. * *----------------------------------------------------------------------------- */ #ifndef _YETI_H #define _YETI_H 1 #include #include "ydata.h" #include "defmem.h" /*---------------------------------------------------------------------------*/ /* In recent (>=1.5.12) versions of Yorick, RefNC in defined in binio.h which is included by ydata.h. */ #ifndef RefNC # define RefNC(db) (++(db)->references , (db)) #endif /*---------------------------------------------------------------------------*/ /* USEFUL MACROS */ #define YETI_ROUND_UP(a,b) ((((a)+(b)-1)/(b))*(b)) /*----- Return smallest multiple of integer B that is greater or equal integer A. */ /* * Utility macros: YETI_STRINGIFY takes an argument and wraps it in "" (double * quotation marks), YETI_JOIN joins two arguments. Both are capable of * performing macro expansion of their arguments. */ #define YETI_VERBATIM(x) x #if defined(__STDC__) || defined(__cplusplus) || defined(c_plusplus) # define YETI_STRINGIFY(x) YETI_STRINGIFY1(x) # define YETI_STRINGIFY1(x) # x # define YETI_JOIN(a,b) YETI_JOIN1(a, b) # define YETI_JOIN1(a,b) a ## b #else # define YETI_STRINGIFY(x) "x" # define YETI_JOIN(a,b) YETI_VERBATIM(a)/**/YETI_VERBATIM(b) #endif /*---------------------------------------------------------------------------*/ /* Macro to get rid of some GCC extensions when not compiling with GCC. */ #if ! (defined(__GNUC__) && __GNUC__ > 1) # undef __attribute__ # define __attribute__(x) /* empty */ #endif /*---------------------------------------------------------------------------*/ /* C++ needs to know that types and declarations are C, not C++. */ #ifdef __cplusplus # define _YETI_BEGIN_DECLS extern "C" { # define _YETI_END_DECLS } #else # define _YETI_BEGIN_DECLS # define _YETI_END_DECLS #endif _YETI_BEGIN_DECLS /*---------------------------------------------------------------------------*/ /* Refine definition of YError to avoid GCC warnings (about uninitialized variables or reaching end of non-void function): */ PLUG_API void YError(const char *msg) __attribute__ ((noreturn)); extern void yeti_error(const char *str, ...) __attribute__ ((noreturn)); /*----- Build error message from a list of strings (last element of the list must be NULL) and call YError. The maximum length of the final message is 127 characters; otherwise the message get, silently truncated. */ /* The following routines/macros are designed to simply the handling of Yorick's symbols. */ #define YETI_IS_REF(S) ((S)->ops == &referenceSym) #define YETI_DEREF_SYMBOL(S) (YETI_IS_REF(S) ? &globTab[(S)->index] : (S)) /*----- Return S or the referenced symbol if S is a reference. */ #define YETI_SOLVE_REF(S) if (YETI_IS_REF(S)) (S) = &globTab[(S)->index] /*----- Solve Yorick's symbol reference(s). This macro is intended to be used for symbols that the parser push on the stack as arguments of a built-in routine. The argument of the macro should be a variable. Also beware that the macro could break an if-else statement if not enclosed into braces. Use ReplaceRef(S) to replace the stack symbol S by whatever it point to. */ extern char *yeti_strcpy(const char *str); extern char *yeti_strncpy(const char *str, size_t len); /*----- Return a copy of string STR. If STR is NULL, NULL is returned; otherwise LEN+1 bytes get dynamically allocated for the copy. The return value is intended to be managed as an element of a Yorick's string array, i.e. the copy must be deleted by StrFree for Yorick version 1.4 and p_free for newer Yorick versions. */ extern int yeti_is_range(Symbol *s); extern int yeti_is_structdef(Symbol *s); extern int yeti_is_stream(Symbol *s); extern int yeti_is_nil(Symbol *s); extern int yeti_is_void(Symbol *s); /*----- Check various properties of symbol *S. Note 1: if S is a reference, the test is performed onto the referenced object but S does not get replaced by the referenced object (e.g. not as with YNotNil), so S can be outside the stack. Note 2: yeti_is_nil and yeti_is_void should return the same result but, in case this matters, yeti_is_nil checks the datablock address while yeti_is_void checks the datablock Operations address. */ extern void yeti_debug_symbol(Symbol *s); /*----- Print-out contents of symbol *S. */ extern void yeti_bad_argument(Symbol *s) __attribute__ ((noreturn)); /*----- Trigger an error (by calling YError) due to a bad built-in routine argument *SYM. The first reference level is assumed to have been resolved (see for instance YETI_SOLVE_REF). */ extern void yeti_unknown_keyword(void) __attribute__ ((noreturn)); /*----- Call YError with the message: "unrecognized keyword in builtin function call". */ extern DataBlock *yeti_get_datablock(Symbol *s, const Operations *ops); /*----- Get data block from symbol S. If OPS is non-NULL, the virtual function table (Operations) of the data block symbol must match OPS. If S is a reference the referenced symbol is considered instead of S and get popped onto the stack to replace S (as with ReplaceRef); this is required to avoid returning a temporary data block that could be unreferenced elsewhere. */ extern Array *yeti_get_array(Symbol *s, int nil_ok); /*----- Get array from symbol S. If S is a reference the referenced symbol is considered instead of S and get popped onto the stack to replace S (as with ReplaceRef); this is required to avoid returning a temporary array that could be unreferenced elsewhere. If the considered object, is void, then if NIL_OK is non-zero NULL is returned; otherwise, YError is called and the function does not returns. */ extern int yeti_get_boolean(Symbol *s); /*----- Return 1/0 according to the value of symbol S. The result should be the same as the statement (s?1:0) in Yorick. */ extern long yeti_get_optional_integer(Symbol *s, long defaultValue); /*----- Return the value of symbol `*s' if it is a scalar (or 1 element array) integer (char, short, int or long) and `defaultValue' is symbol is void. Call yeti_bad_argument otherwise. */ typedef struct yeti_scalar yeti_scalar_t; extern yeti_scalar_t *yeti_get_scalar(Symbol *s, yeti_scalar_t *scalar); /*----- Fetch scalar value stored in Yorick symbol S and fill SCALAR accordingly. The return value is SCALAR. */ struct yeti_scalar { int type; /* One of: T_CHAR, T_SHORT, T_INT, T_LONG, T_FLOAT, T_DOUBLE, T_COMPLEX, T_POINTER, T_STRUCT, T_RANGE, T_VOID, T_FUNCTION, T_BUILTIN, T_STRUCTDEF, T_STREAM, T_OPAQUE. Never: T_LVALUE. */ union { char c; int i; short s; long l; float f; double d; struct {double re, im;} z; char *q; void *p; } value; }; /*---------------------------------------------------------------------------*/ /* STACK FUNCTIONS/MACROS */ extern void yeti_pop_and_reduce_to(Symbol *s); /*----- Pop topmost stack element in-place of S and drop all elements above S. S must belong to the stack. This routine is useful to pop the result of a builtin routine. The call is equivalent to: PopTo(S); Drop(N); with N = sp - S (sp taken before PopTo). */ #define yeti_get_integer(SYMBOL) YGetInteger(SYMBOL) #define yeti_get_real(SYMBOL) YGetReal(SYMBOL) #define yeti_get_string(SYMBOL) YGetString(SYMBOL) extern void **yeti_get_pointer(Symbol *s); /*----- Funtions to get a scalar value from Yorick stack element S. */ extern void yeti_push_char_value(int value); extern void yeti_push_short_value(int value); extern void yeti_push_float_value(double value); extern void yeti_push_complex_value(double re, double im); extern void yeti_push_string_value(const char *value); #define yeti_push_int_value(value) PushIntValue(value) #define yeti_push_long_value(value) PushLongValue(value) #define yeti_push_double_value(value) PushDoubleValue(value) /*----- These functions push a new scalar value of a particular type on top of the stack. String VALUE can be NULL. */ #define YETI_PUSH_NEW_ARRAY(SDEF, DIMS) ((Array *)PushDataBlock(NewArray(SDEF, DIMS))) #define YETI_PUSH_NEW_ARRAY_C(DIMS) YETI_PUSH_NEW_ARRAY(&charStruct, DIMS) #define YETI_PUSH_NEW_ARRAY_S(DIMS) YETI_PUSH_NEW_ARRAY(&shortStruct, DIMS) #define YETI_PUSH_NEW_ARRAY_I(DIMS) YETI_PUSH_NEW_ARRAY(&intStruct, DIMS) #define YETI_PUSH_NEW_ARRAY_L(DIMS) YETI_PUSH_NEW_ARRAY(&longStruct, DIMS) #define YETI_PUSH_NEW_ARRAY_F(DIMS) YETI_PUSH_NEW_ARRAY(&floatStruct, DIMS) #define YETI_PUSH_NEW_ARRAY_D(DIMS) YETI_PUSH_NEW_ARRAY(&doubleStruct, DIMS) #define YETI_PUSH_NEW_ARRAY_Z(DIMS) YETI_PUSH_NEW_ARRAY(&complexStruct, DIMS) #define YETI_PUSH_NEW_ARRAY_Q(DIMS) YETI_PUSH_NEW_ARRAY(&stringStruct, DIMS) #define YETI_PUSH_NEW_ARRAY_P(DIMS) YETI_PUSH_NEW_ARRAY(&pointerStruct, DIMS) /*----- These macros allocate a new Yorick array with dimension list DIMS, push it on top of the stack and return the address of the array structure. There must be an element left on top of the stack to store the new array. See also: YETI_PUSH_NEW_. */ #define YETI_PUSH_NEW_(SDEF, DIMS, MEMBER) (YETI_PUSH_NEW_ARRAY(SDEF, DIMS)->value.MEMBER) #define YETI_PUSH_NEW_C(DIMS) YETI_PUSH_NEW_(&charStruct, DIMS, c) #define YETI_PUSH_NEW_S(DIMS) YETI_PUSH_NEW_(&shortStruct, DIMS, s) #define YETI_PUSH_NEW_I(DIMS) YETI_PUSH_NEW_(&intStruct, DIMS, i) #define YETI_PUSH_NEW_L(DIMS) YETI_PUSH_NEW_(&longStruct, DIMS, l) #define YETI_PUSH_NEW_F(DIMS) YETI_PUSH_NEW_(&floatStruct, DIMS, f) #define YETI_PUSH_NEW_D(DIMS) YETI_PUSH_NEW_(&doubleStruct, DIMS, d) #define YETI_PUSH_NEW_Z(DIMS) YETI_PUSH_NEW_(&complexStruct, DIMS, d) #define YETI_PUSH_NEW_Q(DIMS) YETI_PUSH_NEW_(&stringStruct, DIMS, q) #define YETI_PUSH_NEW_P(DIMS) YETI_PUSH_NEW_(&pointerStruct, DIMS, p) /*----- These macros allocate a new Yorick array with dimension list DIMS, push it on top of the stack and return the base address of the array contents. See YETI_PUSH_NEW_ARRAY for side effects and restrictions. */ /*---------------------------------------------------------------------------*/ /* DIMENSIONS OF ARRAYS. */ PLUG_API Dimension *tmpDims; /*----- tmpDims is a global temporary for Dimension lists under construction -- you should always use it, then just leave your garbage there when you are done for the next guy to clean up -- your part of the perpetual cleanup comes first. */ extern void yeti_reset_dimlist(void); /*----- Prepares global variable tmpDims for the building of a new dimension list (i.e. takes care of freeing old dimension list if any). */ extern Dimension *yeti_grow_dimlist(long number); /*----- Appends dimension of length NUMBER to tmpDims and returns new value of tmpDims. Note that tmpDims must have been properly initialized by yeti_reset_dimlist or yeti_start_dimlist (which to see). For instance to build the dimension list of a NCOLS by NROWS array, do: yeti_reset_dimlist(); yeti_grow_dimlist(ncols); yeti_grow_dimlist(nrows); */ extern Dimension *yeti_start_dimlist(long number); /*----- Initializes global temporary tmpDims with a single first dimension with length NUMBER and returns tmpDims. Same as: yeti_reset_dimlist(); yeti_grow_dimlist(number); */ extern Dimension *yeti_make_dims(const long number[], const long origin[], size_t ndims); /*----- Build and store a dimension list in tmpDims, NDIMS is the number of dimensions, NUMBER[i] and ORIGIN[i] are the length and starting index along the i-th dimension (if ORIGIN is NULL, all origins are set to 1). The new value of tmpDims is returned. */ extern size_t yeti_get_dims(const Dimension *dims, long number[], long origin[], size_t maxdims); /*----- Store dimensions along chained list DIMS in arrays NUMBER and ORIGIN (if ORIGIN is NULL, it is not used). There must be no more than MAXDIMS dimensions. Returns the number of dimensions. */ extern int yeti_same_dims(const Dimension *dims1, const Dimension *dims2); /*----- Check that two dimension lists are identical (also see Conform in ydata.c or yeti_total_number_2). Returns 1 (true) or 0 (false). */ extern void yeti_assert_same_dims(const Dimension *dims1, const Dimension *dims2); /*----- Assert that two dimension lists are identical, raise an error (see YError) if this not the case. */ extern long yeti_total_number(const Dimension *dims); /*----- Returns number of elements of the array with dimension list DIMS. */ extern long yeti_total_number_2(const Dimension *dims1, const Dimension *dims2); /*----- Check that two dimension lists are identical and return the number of elements of the corresponding array. An error is raised (via YError) if dimension lists are not identical. */ /*---------------------------------------------------------------------------*/ /* WORKSPACE */ extern void *yeti_push_workspace(size_t nbytes); /*----- Return temporary worspace of NBYTES bytes. In case of error (insufficient memory), YError get called; the routine therefore always returns a valid address. An opaque workspace object get pushed onto Yorick's stack so that it is automatically deleted but the caller has to make sure that the stack is large enough (see CheckStack). */ /*---------------------------------------------------------------------------*/ /* OPAQUE OBJECTS */ /* Very simple implementation of "opaque" objects in Yorick. The main purpose is to let Yorick handle object reference counts and free object resources as soon as the object is no longer referenced. The main lack is that no operator overloading is providing (although it is possible in principle but necessitates much more code...) */ typedef struct yeti_opaque yeti_opaque_t; typedef struct yeti_opaque_class yeti_opaque_class_t; struct yeti_opaque { /* The yeti_opaque_t structure stores information about an instance of an object. The two first members (references and ops) are common to any Yorick's DataBlock. */ int references; /* reference counter */ Operations *ops; /* virtual function table */ const yeti_opaque_class_t *class; /* opaque class definition */ void *data; /* opaque object client data. */ }; struct yeti_opaque_class { /* This structure provides the object class definition (class name and methods). For each class there must be a unique such structure so that its address can be used as an identifier to identify the object class. */ const char *name; /* Object class name. */ void (*delete)(void *); /* Method called when object is being deleted (the argument is the object client data). If the "delete" method is NULL, nothing particular is done when the object is deleted. */ void (*print)(void *); /* Method used to print object information (the argument is the object client data). If the "print" method is NULL, a default one is supplied. */ }; extern yeti_opaque_t *yeti_new_opaque(void *data, const yeti_opaque_class_t *class); /*----- Create a new Yorick data block to store an instance of an opaque object. Since Yeti's implementation of opaque objects does not keep track of object client data references, a single client data instance cannot be referenced by several object data blocks (unless the "delete" method provided by the class takes care of that); it is nevertheless still possible that several Yorick's symbols share the same data block. */ extern yeti_opaque_t *yeti_get_opaque(Symbol *stack, const yeti_opaque_class_t *class, int fatal); /*----- Returns a pointer to the object referenced by Yorick's symbol STACK. If CLASS is non-NULL the object must be of that class. If FATAL is non-zero, any error result in calling YError (i.e. the routine never return on error); otherwise, NULL is returned on error. Note: STACK must belong to the stack and, if it is a reference, it gets replaced by the referenced object (as by calling ReplaceRef); this is needed to avoid using a temporary object that may be unreferenced elsewhere. */ #define yeti_get_opaque_data(STACK, CLASS) \ (yeti_get_opaque(STACK, CLASS, 1)->data) /*----- Get the client data of the stack opaque object referenced by symbol *STACK which must be a Yeti object type and, if CLASS is not NULL, must be an instance of this class. An error is issued if symbol *STACK does not reference an opaque object or if CLASS is not NULL and is not that of the object. See yeti_get_opaque for side effects. */ /*---------------------------------------------------------------------------*/ _YETI_END_DECLS #endif /* _YETI_H */ /* * Local Variables: * mode: C * tab-width: 8 * c-basic-offset: 2 * indent-tabs-mode: nil * fill-column: 78 * coding: utf-8 * End: */ Yeti-6.4.0/core/yeti.i000066400000000000000000002604521253351442600145250ustar00rootroot00000000000000/* * yeti.i - * * Main startup file for Yeti (an extension of Yorick). * *----------------------------------------------------------------------------- * * Copyright (C) 1996-2013 Éric Thiébaut * * This software is governed by the CeCILL-C license under French law and * abiding by the rules of distribution of free software. You can use, modify * and/or redistribute the software under the terms of the CeCILL-C license as * circulated by CEA, CNRS and INRIA at the following URL * "http://www.cecill.info". * * As a counterpart to the access to the source code and rights to copy, modify * and redistribute granted by the license, users are provided only with a * limited warranty and the software's author, the holder of the economic * rights, and the successive licensors have only limited liability. * * In this respect, the user's attention is drawn to the risks associated with * loading, using, modifying and/or developing or reproducing the software by * the user in light of its specific status of free software, that may mean * that it is complicated to manipulate, and that also therefore means that it * is reserved for developers and experienced professionals having in-depth * computer knowledge. Users are therefore encouraged to load and test the * software's suitability as regards their requirements in conditions enabling * the security of their systems and/or data to be ensured and, more generally, * to use and operate it in the same conditions as regards security. * * The fact that you are presently reading this means that you have had * knowledge of the CeCILL-C license and that you accept its terms. * *----------------------------------------------------------------------------- */ if (is_func(plug_in) && is_func(yeti_init) != 2) plug_in, "yeti"; local YETI_HOME, YETI_VERSION, YETI_VERSION_MAJOR, YETI_VERSION_MINOR; local YETI_VERSION_MICRO, YETI_VERSION_SUFFIX; extern yeti_init; /* DOCUMENT YETI_HOME the directory where Yeti is installed or YETI_VERSION version of current Yeti interpreter (string) or YETI_VERSION_MAJOR major Yeti version number (integer) or YETI_VERSION_MINOR minor Yeti version number (integer) or YETI_VERSION_MICRO micro Yeti version number (integer) or YETI_VERSION_SUFFIX suffix Yeti version number (string, e.g. "pre1") or yeti_init; or yeti_init(); YETI_VERSION and YETI_HOME are global variables predefined by Yeti to store its version number (as "MAJOR.MINOR.MICROSUFFIX") and installation directory (e.g. "/usr/local/lib/yeti-VERSION"). In YETI_VERSION, a non-empty suffix like "x" or "pre1" indicates a development version. The function yeti_init can be used to restore the values of YETI_VERSION and YETI_HOME. When called as a function, yeti_init() returns Yeti version as a string. If Yeti is loaded as a plugin, YETI_HOME is left undefined and no path initialization is performed. Otherwise, the first time yeti_init is called (this is automatically done at Yeti startup), it set the default path list for Yeti applications. A convenient way to check if your script is parsed by Yeti is to do: if (is_func(yeti_init) == 2) { // we are in Yeti ... } else { // not in Yeti ... } SEE ALSO: Y_LAUNCH, Y_HOME, Y_SITE, Y_VERSION, get_path, set_path. */ if (batch()) { yeti_init; } else { write, format=" Yeti %s ready. Copyright (c) 1996-2009, Eric THIEBAUT.\n", yeti_init(); } func setup_package(plugname) /* DOCUMENT PACKAGE_HOME = setup_package(); or PACKAGE_HOME = setup_package(plugname); The setup_package function must be directly called in a Yorick source file, the so-called Yorick package source file. This function determines the package directory which is the absolute directory name of the package source file and setup Yorick search paths to include this directory. The returned value is the package directory (guaranteed to be terminated by a slash "/"). If PLUGNAME is specified, the corresponding plugin is loaded (preferentially from the package directory). SEE ALSO: plug_in, plug_dir, current_include, get_path, set_path. */ { /* Quick check. */ path = current_include(); if (is_void(path)) { error, "setup_package must be called from a Yorick source file"; } /* Figure out the absolute directory from where we are called. */ cwd = cd("."); j = where(strchar(path) == '/'); if (is_array(j)) { pkgdir = cd(strpart(path, 1:j(0))); cd, cwd; } else { pkgdir = cwd; } if (is_void(pkgdir)) { error, "bad path for include file: \"" + path + "\""; } if (strpart(pkgdir, 0:0) != "/") { pkgdir += "/"; } /* Setup Yorick search path. */ list = get_path(); if (! strlen(list)) { list = []; flag = 1n; } else { c = strchar(list); j = where(c == ':'); if (is_array(j)) { c(j) = 0; list = strchar(c); } found = (list == pkgdir); if (noneof(found)) { flag = 1n; } else if (! found(1) || sum(found) > 1) { flag = 1n; list = list(where(! found)); } else { flag = 0n; /* no need to add PKGDIR */ } } if (flag) { set_path, (numberof(list) ? pkgdir + sum(":" + list) : pkgdir); } /* Setup list of directories for plugins so that the package directory is searched first and load package plugin. */ if (! is_void(plugname) && is_func(plug_in)) { list = plug_dir(); if (is_void(list)) { plug_dir, pkgdir; } else { /* move directory in first position */ plug_dir, grow(pkgdir, list(where(list != pkgdir))); } plug_in, plugname; } return pkgdir; } func anonymous(args, code) /* DOCUMENT f = anonymous(args, code); Make an anonymous function with ARGS its argument list and CODE the body of the function. ARGS and CODE must be scalar strings. For instance: f1 = anonymous("x", "c = x*x; return sqrt(c + abs(x));"); f2 = anonymous("x,y", "return cos(x*y + abs(x));"); define two functions f1 and f2 which take respectively one and two arguments. When variables f1 and f2 get out of scope the function definition is automatically deleted. Other example: a = _lst(12,34,67); b = map(anonymous("x", "return sin(x);"), a); B is a list with its elements the sines of the elements of A. SEE ALSO: map, include, funcdef, h_functor, closure. */ { include, ["func __anonymous__(" + args + "){" + code + "}"], 1; return __anonymous__; } /*---------------------------------------------------------------------------*/ /* SORTING */ extern heapsort; /* DOCUMENT heapsort(a) or heapsort, a; When called as a function, returns a vector of numberof(A) longs containing index values such that A(heapsort(A)) is a monotonically increasing vector. When called as a subroutine, performs in-place sorting of elements of array A. This function uses the heap-sort algorithm which may be superior to the quicksort algorithm (for instance for integer valued arrays). Beware that headpsort(A) and sort(A) differ for multidimensional arrays. SEE ALSO: quick_select, sort. */ extern quick_select; /* DOCUMENT quick_select(a, k [, first, last]) or quick_select, a, k [, first, last]; Find the K-th smallest element in array A. When called as a function, the value of the K-th smallest element in array A is returned. When called as a subroutine, the elements of A are re-ordered (in-place operation) so that A(K) is the K-th smallest element in array A and A(J) <= A(K) for J <= K and A(J) >= A(K) for J >= K. Optional arguments FIRST and LAST can be used to specify the indices of the first and/or last element of A to consider: elements before FIRST and after LAST are ignored and left unchanged when called as a subroutine; index K however always refers to the full range of A. By default, FIRST=1 and LAST=numberof(A). Yorick indexing rules are supported for arguments K, FIRST and LAST (i.e. 0 means last element, etc). EXAMPLES The index K which splits a sample of N=numberof(A) elements into fractions ALPHA (strictly before K, that is K - 1 elements) and 1 - ALPHA (strictly after K, that is N - K elements) is such that: (1 - ALPHA)*(K - 1) = ALPHA*(N - K) hence: K = 1 + ALPHA*(N - 1) Accounting for rounding to nearest integer, this leads to compute the value at the boundary of the split as: q(ALPHA) = quick_select(A, long(1.5 + ALPHA*(numberof(A) - 1))) Therefore the first inter-quartile split is at (1-based and rounded to the nearest integer) index: K1 = (N + 5)/4 (with integer division) the second inter-quartile (median) is at: K2 = N/2 + 1 (with integer division) the third inter-quartile is at: K3 = (3*N + 3)/4 (with integer division) SEE ALSO: quick_median, quick_quartile, sort, heapsort. */ func quick_median(a) /* DOCUMENT quick_median(a) Returns the median of values in array A. SEE ALSO median, quick_quartile, quick_select, insure_temporary. */ { n = numberof(a); k = (n + 1)/2; if (n % 2) { /* odd number of elements */ return quick_select(a, k); } else { /* even number of elements */ insure_temporary, a; quick_select, a, k; return (double(a(k)) + a(min:k+1:n))/2.0; } } local quick_interquartile_range; func quick_quartile(a) /* DOCUMENT q = quick_quartile(a); or iqr = quick_interquartile_range(a); The function quick_quartile() returns the 3 quartiles of the values in array A. The function quick_interquartile_range() returns IQR = Q(3) - Q(1), the interquartile range of values in array A. Linear interpolation is used to estimate the value of A at fractional orders. Array A must have at least 3 elements. SEE ALSO quick_median, quick_select, insure_temporary. */ { /* Check argument and prepare for in-place operation. */ if ((n = numberof(a)) <= 2) { error, "expecting an array with at least 3 elements"; } insure_temporary, a; q = array(double, 3); /* The 1st interquartile is at fractional index (n + 2)/4. */ k1 = (p = n + 2)/4; quick_select, a, k1; if ((r = (p & 3)) == 0) { q(1) = double(a(k1)); } else { ++k1; quick_select, a, k1, k1; u = r/4.0; q(1) = (1.0 - u)*a(k1 - 1) + u*a(k1); } /* The median (2nd interquartile) is at fractional index (n + 1)/2. */ k2 = (p = n + 1)/2; if (k2 > k1) { quick_select, a, k2, k1 + 1; } if ((p & 1) == 0) { q(2) = double(a(k2)); } else { ++k2; quick_select, a, k2, k2; q(2) = (double(a(k2 - 1)) + a(k2))/2.0; } /* The 3rd interquartile is at fractional index (3*n + 2)/4. */ k3 = (p = 3*n + 2)/4; if (k3 > k2) { quick_select, a, k3, k2 + 1; } if ((r = (p & 3)) == 0) { q(3) = double(a(k3)); } else { ++k3; quick_select, a, k3, k3; u = r/4.0; q(3) = (1.0 - u)*a(k3 - 1) + u*a(k3); } return q; } func quick_interquartile_range(a) { /* Check argument and prepare for in-place operation. */ if ((n = numberof(a)) <= 2) { error, "expecting an array with at least 3 elements"; } insure_temporary, a; /* The 1st interquartile is at fractional index (n + 2)/4. */ k1 = (p = n + 2)/4; quick_select, a, k1; if ((r = (p & 3)) == 0) { q1 = double(a(k1)); } else { ++k1; quick_select, a, k1, k1; u = r/4.0; q1 = (1.0 - u)*a(k1 - 1) + u*a(k1); } /* The 3rd interquartile is at fractional index (3*n + 2)/4. */ k3 = (p = 3*n + 2)/4; if (k3 > k1) { quick_select, a, k3, k1 + 1; } if ((r = (p & 3)) == 0) { q3 = double(a(k3)); } else { ++k3; quick_select, a, k3, k3; u = r/4.0; q3 = (1.0 - u)*a(k3 - 1) + u*a(k3); } return q3 - q1; } /*---------------------------------------------------------------------------*/ /* SYMBOLIC LINKS */ extern symlink_to_variable; extern symlink_to_name; extern is_symlink; extern name_of_symlink; extern value_of_symlink; /* DOCUMENT lnk = symlink_to_variable(var) or lnk = symlink_to_name(varname) or is_symlink(lnk) or name_of_symlink(lnk) or value_of_symlink(lnk) The call symlink_to_variable(var) creates a symbolic link to variable VAR. The call symlink_to_name(varname) creates a symbolic link to variable whose name is VARNAME. When the link object LNK is used in an 'eval' context or a 'get member' context (see examples below), LNK gets replaced 'on the fly' by the symbol which is actually stored into the corresponding Yorick's variable. Therefore LNK adds no additional reference to the variable which only has to exist when LNK is later used. This functionality can be used to implement 'virtual' methods for pseudo-object in Yorick (using hash tables). For instance: > lnk = symlink_to_variable(foo); // variable foo does not yet exists > lnk = symlink_to_name("foo"); // same link, using a name > func foo(x) { return 2*x; } > lnk(9) 18 > func foo(x) { return 3*x; } > lnk(9) 27 > z = array(complex, 10, 4); > lnk = symlink_to_variable(z); > info, lnk.re; array(double,10,4) The function is_symlink(LNK) check whether LNK is a symbolic link. The function name_of_symlink(LNK) returns the name of the variable linked by LNK. The function value_of_symlink(LNK) returns the actual value of the variable corresponding to the symbolic link LNK. This function can be used to force the substitution in a context where it is not automatically done. For instance: > lnk = symlink_to_variable(a); > a = random(10); > avg(lnk) ERROR (*main*) avg requires numeric argument > avg(value_of_symlink(lnk)) 0.383679 > avg(a) 0.383679 SEE ALSO: h_new. */ /*---------------------------------------------------------------------------*/ /* HASH TABLE OBJECTS */ extern h_debug; /* DOCUMENT h_debug, object, ... Print out some debug information on OBJECT. **************************** *** WILL BE REMOVED SOON *** ****************************/ extern h_new; /* DOCUMENT h_new(); or h_new(key=value, ...); or h_new("key", value, ...); Returns a new hash table with member(s) KEY set to VALUE. There may be any number of KEY-VALUE pairs. A particular member of a hash table TAB can be specified as a scalar string, i.e. "KEY", or using keyword syntax, i.e. KEY=. The keyword syntax is however only possible if KEY is a valid Yorick's symbol name. VALUE can be anything (even a non-array object). h_save and h_functor (which to see) provide alternative means to create hash table object. yhd_save and yhd_restore (which to see) let you save and restore hash tables to data files. A hash table can be used to implement some kind of object-oriented abstraction in Yorick. However, in Yorick, a hash table must have a simple tree structure -- no loops or rings are allowed (loops break Yorick's memory manager -- beware). You need to be careful not to do this as the error will not be detected. The difference between a hash table and a list object is that items are retrieved by key identifier rather than by order (by h_get, get_member or dot dereferenciation). It is possible to dereference the contents of TAB using the dot operator (as for a structure) or the get_member function. For instance, it is legal to do: tab = h_new(x=span(-7,7,100), name="my name", op=sin, scale=33); plg, tab.op(tab.x), tab.x; but the member must already exists and there are restrictions to assignation, i.e. only contents of array members can be assigned: tab.name() = "some other string"; // ok tab.name = "some other string"; // error tab.x(RANGE_OR_INDEX) = EXPR; // ok if conformable AND member X // is not a 'fast' scalar (int, // long or double scalar) tab.x = EXPR; // error and assignation cannot therefore change the dimension list or data type of a hash table member. Redefinition/creation of a member can always be performed with the h_set function which is the recommended method to set the value of a hash table member. Hash tables behave differently depending how they are used: tab.key - de-reference hash member tab("key") - returns member named "key" in hash table TAB, this is exactly the same as: h_get(tab, "key") tab() - returns number of elements in hash table TAB tab(i) - returns i-th member in hash table TAB; i is a scalar integer and can be less or equal zero to start from the last one; if the hash table is unmodified, tab(i) is the same as tab(keys(i)) where keys=h_keys(tab) -- beware that this is very inefficient way to access the contents of a hash table and will probably be removed soon. However, beware that the behaviour of calls such that TAB(...) may be changed if the has table implements its own "evaluator" (see h_evaluator). For instance, to explore the whole hash table, there are different possibilities: keys = h_keys(tab); n = numberof(keys); // alternatively: n = tab() for (i = 1; i <= n; ++i) { a = tab(keys(i)); ...; } or: for (key = h_first(tab); key; key = h_next(tab, key)) { a = tab(key); ...; } or: n = tab(); for (i=1 ; i<=n ; ++i) { a = tab(i); ...; } the third form is slower for large tables and will be made obsolete soon. An important point to remember when using hash table is that hash members are references to their contents, i.e. h_set, hash, member=x; makes an additional reference to array X and does not copy the array although you can force that, e.g.: tmp = x; // make a copy of array X h_set, hash, member=tmp; // reference copy in hash table tmp = []; // delete one reference to the copy Because assignation result is its rhs (right-hand-side), you cannot do: h_set, hash, member=(tmp = x); // assignation result is X Similarly, unlike Yorick array data types, a statement like x=hash does not make a copy of the hash table, it merely makes an additional reference to the list. CAVEATS: In Yorick (or Yeti), many objects can be used to reference other objects: pointers, lists and hash tables. Since Yorick uses a simple reference counter to delete unused object, cyclic references (i.e. an object referencing itself either directly or indirectly) result in objects that will not be properly deleted. It is the user reponsibility to create no cyclic references in order to avoid memory leaks. Checking a potential (or effective) cyclic reference would require recursive investigation of all members of the parent object and could be very time consuming. SEE ALSO: h_save, h_copy, h_get, h_has, h_keys, h_pop, h_set, h_stat, h_first, h_next, yhd_save, yhd_restore, _lst, h_functor, get_member. */ extern h_get; /* DOCUMENT h_get(tab, key=); or h_get(tab, "key"); Returns the value of member KEY of hash table TAB. If no member KEY exists in TAB, nil is returned. h_get(TAB, "KEY") is identical to get_member(TAB, "KEY") and also to TAB("KEY"). SEE ALSO h_new, get_member. */ extern h_set; /* DOCUMENT h_set, tab, key=value, ...; or h_set, tab, "key", value, ...; Stores VALUE in member KEY of hash table TAB. There may be any number of KEY-VALUE pairs. If called as a function, the returned value is TAB. SEE ALSO h_new, h_set_copy. */ func h_set_copy(tab, ..) /* DOCUMENT h_set_copy, tab, key, value, ...; Set member KEY (a scalar string) of hash table TAB with VALUE. Unlike h_set, VALUE is duplicated if it is an array. There may be any number of KEY-VALUE pairs. SEE ALSO h_copy, h_new, h_set. */ { while (more_args()) { key = next_arg(); value = next_arg(); h_set, tab, key, value; } return tab; } func h_copy(tab, recursively) /* DOCUMENT h_copy(tab); or h_copy(tab, recursively); Effectively copy contents of hash table TAB into a new hash table that is returned. If argument RECURSIVELY is true, every hash table contained into TAB get also duplicated. This routine is needed because doing CPY=TAB, where TAB is a hash table, would only make a new reference to TAB: CPY and TAB would be the same object. SEE ALSO h_new, h_set, h_clone. */ { key_list = h_keys(tab); n = h_number(tab); /* number of members */ new = h_new(); h_evaluator, new, h_evaluator(tab); if (recursively) { for (i=1 ; i<=n ; ++i) { key = key_list(i); member = h_get(tab, key); h_set, new, key, (is_hash(member) ? h_copy(member, 1) : member); } } else { for (i=1 ; i<=n ; ++i) { key = key_list(i); member = h_get(tab, key); h_set, new, key, member; } } return new; } /* * NOTE: h_clone(tab, copy=1) is the same as h_copy(tab) * h_clone(tab, copy=1, depth=-1) is the same as h_copy(tab, 1) */ func h_clone(tab, copy=, depth=) /* DOCUMENT h_clone(tab, copy=, depth=); Make a new hash table with same contents as TAB. If keyword COPY is true, a fresh copy is made for array members. Otherwise, array members are just referenced one more time by the new hash table. If keyword DEPTH is non-zero, every hash table referenced by TAB get also cloned (this is done recursively) until level DEPTH has been reached (infinite recursion if DEPTH is negative). The value of keyword COPY is kept the same across the recursions. SEE ALSO h_new, h_set, h_copy. */ { local member; key_list = h_keys(tab); /* list of hash keys */ n = h_number(tab); /* number of members */ new = h_new(); h_evaluator, new, h_evaluator(tab); if (depth) { --depth; for (i=1 ; i<=n ; ++i) { key = key_list(i); if (copy) member = h_get(tab, key); else eq_nocopy, member, h_get(tab, key); h_set, new, key, (is_hash(member) ? h_clone(member, copy=copy, depth=depth) : member); } } else if (copy) { for (i=1 ; i<=n ; ++i) { key = key_list(i); member = h_get(tab, key); h_set, new, key, member; } } else { for (i=1 ; i<=n ; ++i) { key = key_list(i); h_set, new, key, h_get(tab, key); } } return new; } extern h_number; /* DOCUMENT h_number(tab); Returns number of entries in hash table TAB. SEE ALSO h_new, h_keys. */ extern h_keys; /* DOCUMENT h_keys(tab); Returns list of members of hash table TAB as a string vector of key names. The order in which keys are returned is arbitrary. SEE ALSO h_new, h_first, h_next, h_number. */ extern h_has; /* DOCUMENT h_has(tab, "key"); or h_has(tab, key=); Returns 1 if member KEY is defined in hash table TAB, else 0. SEE ALSO h_new. */ extern h_first; extern h_next; /* DOCUMENT h_first(tab); or h_next(tab, key); Get first or next key in hash table TAB. A NULL string is returned if key is not found or if it is the last one (for h_next). Thes routines are useful to run through all entries in a hash table (however beware that the hash table should be left unchanged during the scan). For instance: for (key = h_first(tab); key; key = h_next(tab, key)) { value = h_get(tab, key); ...; } SEE ALSO h_new, h_keys. */ extern h_evaluator; /* DOCUMENT h_evaluator(obj) or h_evaluator(obj, evl); or h_evaluator, obj, evl; Set/query evaluator function of hash table OBJ. When called as a function, the evaluator of OBJ prior to any change is returned as a scalar string. If EVL is specified, it becomes the new evaluator of OBJ. EVL must be a scalar string (the name of the evaluator function), or a function, or nil. If EVL is explicitely nil (for instance []) or a NULL-string (for instance string(0)), the default behaviour is restored. When hash table OBJ is used as: OBJ(...) where "..." represents any list of arguments (including none) then its evaluator get called as: EVL(OBJ, ...) that is with OBJ prepended to the same argument list. EXAMPLES: // create a hash table: obj = h_new(data=random(200), count=0); // define a fucntion: func eval_me(self, incr) { if (incr) h_set, self, count = (self.count + incr); return self.data(1 + abs(self.count)%200); } // set evaluator (which must be already defined as a function): h_evaluator, obj, eval_me; obj(49); // return 49-th value obj(); // return same value obj(3); // return 51-th value h_evaluator, obj, []; // restore standard behaviour // set evaluator (not necessarily already defined as a function): h_evaluator, obj, "some_name"; // then define the function code prior to use: func some_name(self, a, b) { return self.count; } SEE ALSO: h_new, h_get. */ func h_save(args) /* DOCUMENT tbl = h_save(var1, var2, ...); or h_save, tbl, var1, var2, ...; Save variables VAR1, VAR2, ... into an hash table. When called as a function, the resulting new hash table is returned by the function; when called as a subroutine, the first argument, TBL, must be a hash table whose contents gets updated. The VARi arguments may be: - a simple variable reference, in which case the name of the VARi specifies the key in the hash table; - a (KEY,VAL) pair where KEY is the name of the hash entry (a scalar string) and VAL is the value of the hash entry; as a special case, if KEY is the NULL string, string(0), the corresponding value is used to set the evaluator of the hash table; - a keyword KEY=VAL; Note that positional arguments are processed in order, and the keywords, if any, are processed last; if multiple arguments are stored with the same key, the final value will be the last one (according to the processing order). PERFORMANCES: Currently, this function is implemented by interpreted code. It is however quite fast: about 1.5 microseconds to create a table with one entry, plus about 0.6 microsecond per additional entry. For comparison, it takes 0.3 microsecond to create a single entry table with h_new() and about 0.06 microsecond per additional entry. All these times were measured on an Intel Core i7-870 at 2.93GHz. SEE ALSO: h_new, h_evaluator, h_functor, save. */ { /* Get the keywords and the number of positional arguments. */ keys = args(-); nargs = args(*); nkeys = numberof(keys); /* Create/fetch the hash table and process positional arguments. */ local obj, key; if (am_subroutine()) { if (nargs >= 1) eq_nocopy, obj, args(1); if (! is_hash(obj)) { error, ("expecting a hash table object as first argument " + "when called as a subroutine"); } i = 1; } else { obj = h_new(); i = 0; } while (++i <= nargs) { eq_nocopy, key, args(-,i); if (! key) { /* Argument is not a simple variable reference; then, the key is the value of the current positional argument and the the value is that of the next positional argument. */ eq_nocopy, key, args(i); if (! is_string(key) || ! is_scalar(key)) { error, "expecting key name or variable reference"; } if (++i > nargs) { error, "missing value after last key name in argument list"; } if (! key) { h_evaluator, obj, args(i); continue; } } h_set, obj, key, args(i); } /* Process keywords. */ for (i = 1; i <= nkeys; ++i) { key = keys(i); h_set, obj, key, args(key); } return obj; } //errs2caller, h_save; wrap_args, h_save; func h_functor(args) /* DOCUMENT obj = h_functor(fn, ..., var, ..., key=val, ..., "key", val, ...); This function creates a functor object OBJ (actually a hash table) which calls function FN with itself prepended to its argument list: obj(arg1, arg2, ...) is the same as: fn(obj, arg1, arg2, ...) First positional argument FN specify the function to call, it can be a name or any object callable as a function (including another functor or an anonymous function, see anonymous). Any other arguments (either positional ones or keywords) are used to populate the created object with hash entries. Any given keyword -- say KEY=VAL -- is stored into the returned object with "KEY" as its name and VAL as its value. Any simple variable reference -- say VAR -- is stored with name "VAR" and the contents of VAR as value. Finally, pairs of names and values -- say "KEY",VAL -- are stored with name "KEY" and value VAL. If you want to use the contents of a variable (not the variable name itself) as the name of an entry, just turn it into an expression, e.g., noop(VAR) or VAR+"" will do the trick. See h_save() for performance issues if that matters to you. SEE ALSO: h_new, h_evaluator, h_save, anonymous, closure, wrap_args, noop. */ { /* Get the keywords and the number of positional arguments. */ keys = args(-); nargs = args(*); nkeys = numberof(keys); /* Get the functions. */ local fn; if (nargs >= 1) eq_nocopy, fn, args(1); if (is_void(fn)) { error, "invalid or missing function"; } obj = h_new(); /* Add hash table entries from positional arguments. */ local key; for (i = 2; i <= nargs; ++i) { eq_nocopy, key, args(-,i); if (! key) { /* Argument is not a simple variable reference; then, the key is the value of the current positional argument and the the value is that of the next positional argument. */ eq_nocopy, key, args(i); if (! is_string(key) || ! is_scalar(key)) { error, "expecting attribute name or variable reference"; } if (++i > nargs) { error, "missing value for last attribute"; } } if (h_has(obj, key)) { error, ("attribute \""+key+"\" already exists"); } h_set, obj, key, args(i); } /* Add hash table entries from keywords. */ for (k = 1; k <= nkeys; ++k) { key = keys(k); if (h_has(obj, key)) { error, ("attribute \""+key+"\" already exists"); } h_set, obj, key, args(key); } h_evaluator, obj, fn; return obj; } wrap_args, h_functor; func h_info(tab, align) /* DOCUMENT h_info, tab; or h_info, tab, align; List contents of hash table TAB in alphabetical order of keys. If second argument is true, the key names are right aligned. SEE ALSO: h_new, h_keys, h_first, h_next, h_show, sort. */ { key_list = h_keys(tab); if (is_void(key_list)) return; key_list = key_list(sort(key_list)); n = numberof(key_list); width = max(strlen(key_list)); format = swrite(format=(align?"%%%ds":"%%-%ds"), width + 1); for (i=1 ; i<=n ; ++i) { key = key_list(i); write, format=format, key+":"; info, h_get(tab, key); } } local __h_show_worker; func h_show(tab, prefix=, maxcnt=, depth=) /* DOCUMENT h_show, tab; Display contents of hash table TAB in a tree-like representation. Keyword PREFIX can be used to prepend a prefix to the printed lines. Keyword MAXCNT (default 5) can be used to specify the maximum number of elements for printing array values. SEE ALSO: h_info, h_keys. */ { __h_show_maxcnt = (is_void(maxcnt) ? 5 : maxcnt); __h_show_worker, tab, , (is_void(prefix) ? "" : prefix), 0; } func __h_show_worker(obj, name, prefix, stage) { if (! name) { name = ""; } else if (strlen(name) == 0) { name = "\"\""; } else if (strgrep("^[_A-Za-z][0-9_A-Za-z]*$", name)(2) < 0) { name = print(name)(sum); } if (stage == 1) { prefix1 = prefix + " |-"; prefix2 = prefix + " | "; } else if (stage == 2) { prefix1 = prefix + " `-"; prefix2 = prefix + " "; } else { prefix1 = prefix; prefix2 = prefix; } if (is_hash(obj)) { key_list = h_keys(obj); if (is_array(key_list)) { key_list = key_list(sort(key_list)); //width = max(strlen(key_list)); //format = swrite(format=(align?"%%%ds":"%%-%ds"), width + 1); } n = numberof(key_list); e = h_evaluator(obj); write, format="%s %s (hash_table, %s%d %s)\n", prefix1, name, (e ? "evaluator=\""+e+"\", " : ""), n, (n <= 1 ? "entry" : "entries"); for (k = 1; k <= n; ++k) { key = key_list(k); __h_show_worker, h_get(obj,key), key, prefix2, 1 + (k == n); } } else if (is_array(obj)) { descr = typeof(obj); dims = dimsof(obj); n = numberof(dims); k = 1; while (++k <= n) { descr += swrite(format=",%d", dims(k)); } if (numberof(obj) <= __h_show_maxcnt) { write, format="%s %s (%s) %s\n", prefix1, name, descr, sum(print(obj)); } else { write, format="%s %s (%s)\n", prefix1, name, descr; } } else if (is_void(obj)) { write, format="%s %s (void) []\n", prefix1, name; } else if (is_symlink(obj)) { write, format="%s %s (%s) \"%s\"\n", prefix1, name, typeof(obj), name_of_symlink(obj); } else { write, format="%s %s (%s)\n", prefix1, name, typeof(obj); } } extern h_pop; /* DOCUMENT h_pop(tab, "key"); or h_pop(tab, key=); Pop member KEY out of hash table TAB and return it. When called as a subroutine, the net result is therefore to delete the member from the hash table. SEE ALSO h_new, h_delete. */ func h_delete(h, ..) /* DOCUMENT h_delete(tab, "key", ...); Delete members KEY, ... from hash table TAB and return it. Any KEY arguments may be present and must be array of strings or nil. SEE ALSO h_new, h_pop. */ { local key; while (more_args()) { eq_nocopy, key, next_arg(); n = numberof(key); for (i=1 ; i<=n ; ++i) h_pop, h, key(i); } return h; } extern h_stat; /* DOCUMENT h_stat(tab); Returns an histogram of the slot occupation in hash table TAB. The result is a long integer vector with i-th value equal to the number of slots with (i-1) items. Note: efficient hash table should keep the number of items per slot as low as possible. SEE ALSO h_new. */ func h_list(tab, sorted) /* DOCUMENT h_list(tab); or h_list(tab, sorted); Convert hash table TAB into a list: _lst("KEY1", VALUE1, ...). The order of key-value pairs is arbitrary unless argument SORTED is true in which case keys get sorted in alphabetical order. SEE ALSO h_new, _lst, sort. */ { keylist = h_keys(tab); n = numberof(keylist); if (sorted && n>1) keylist = keylist(sort(keylist)(::-1)); list = _lst(); for (i=1 ; i<=n ; ++i) { /* grow the list the fast way, adding new values to its head (adding to the tail would make growth an N^2 proposition, as would using the grow function) */ key = keylist(i); list = _cat(key, h_get(tab, key), list); } return list; } func h_cleanup(tab, recursively) /* DOCUMENT h_cleanup, tab, 0/1; or h_cleanup(tab, 0/1); Delete all void members of hash table TAB and return TAB. If the second argument is a true (non nil and non-zero) empty members get deleted recursively. SEE ALSO h_new. */ { local member; keylist = h_keys(tab); n = numberof(keylist); for (i=1 ; i<=n ; ++i) { key = keylist(i); eq_nocopy, member, h_get(tab, key); if (is_void(member)) h_pop, tab, key; else if (recursively && is_hash(member)) h_cleanup, member, recursively; } return tab; } func h_grow(tab, .., flatten=) /* DOCUMENT h_grow, tab, key, value, ...; Grow member named KEY of hash table TAB by VALUE. There may be any number of key-value pairs. If keyword FLATTEN is true, then VALUE(*) instead of VALUE is appended to the former contents of TAB.KEY. If member KEY does not already exists in TAB, then a new member is created with VALUE, or VALUE(*), as contents. SEE ALSO h_new. */ { local key, value; if (flatten) { while (more_args()) { eq_nocopy, key, next_arg(); h_set, tab, key, grow(h_get(tab, key), next_arg()(*)); } } else { while (more_args()) { eq_nocopy, key, next_arg(); h_set, tab, key, grow(h_get(tab, key), next_arg()); } } } func h_save_symbols(____l____, ..) /* DOCUMENT h_save_symbols(namelist, ...); or h_save_symbols(flag); Return hash table which references symbols given in NAMELIST or selected by FLAG (see symbol_names). Of course, the symbol names will be used as member names in the result. SEE ALSO h_new, h_restore_builtin, symbol_names. */ { /* Attempt to use dummy symbol names in this routine to avoid clash with the symbols ddefined in caller's context. */ while (more_args()) grow, ____l____, next_arg(); if ((____s____ = structof(____l____)) != string) { if ((____s____!=long && ____s____!=int && ____s____!=short && ____s____!=char) || dimsof(____l____)(1)) error, "expected a list of names, or nil, or a scalar integer"; ____l____ = symbol_names(____l____); } ____s____ = h_new(); ____n____ = numberof(____l____); for (____i____=1 ; ____i____<=____n____ ; ++____i____) { ____k____ = ____l____(____i____); h_set, ____s____, ____k____, symbol_def(____k____); } return ____s____; } local SAVE_BUILTINS; local __h_saved_builtins; func h_restore_builtin(name) { return h_get(__h_saved_builtins, name); } /* DOCUMENT h_restore_builtin(name); Get the original definition of builtin function NAME. This is useful if you deleted by accident a builtin function and want to recover it; for instance: sin = 1; ... sin = h_restore_builtin("sin"); would restore the definition of the sine function that was redefined by the assignation. To enable this feature, you must define the global variable SAVE_BUILTINS to be true before loading the Yeti package. For instance: SAVE_BUILTINS = 1; include, "yeti.i"; then all all current definitions of builtin functions will be referenced in global hash table __h_saved_builtins and could be retrieved by calling h_restore_builtin. Note that this feature is disabled in batch mode. SEE ALSO h_new, h_save_symbols, batch. */ if (! batch() && SAVE_BUILTINS && ! is_hash(__h_saved_builtins)) { __h_saved_builtins = h_save_symbols(32); } /*---------------------------------------------------------------------------*/ /* MORPHO-MATH OPERATORS */ extern morph_dilation; extern morph_erosion; /* DOCUMENT morph_dilation(a, r); or morph_erosion(a, r); These functions perform a dilation/erosion morpho-math operation onto input array A which must have at most 3 dimensions. A dilation (erosion) operation replaces every voxel of A by the maximum (minimum) value found in the voxel neighborhood as defined by the structuring element. Argument R defines the structuring element as follows: - If R is a scalar integer, then it is taken as the radius (in voxels) of the structuring element. - Otherwise, R gives the offsets of the structuring element relative to the coordinates of the voxel of interest. In that case, R must an array of integers with last dimension equals to the number of dimensions of A. In other words, if A is a 3-D array, then the offsets are: DX = R(1,..) DY = R(2,..) DZ = R(3,..) and the neighborhood of a voxel at (X,Y,Z) is defined as: (X + DX(I), Y + DY(I), Z + DZ(i)) for i=1,...,numberof(DX). Conversely, R = [DX, DY, DZ]. Thanks to that definition, structuring element with arbitrary shape and relative position can be used in morpho-math operations. For instance, the dilation of an image (a 2-D array) IMG by a 3-by-5 rectangular structuring element centered at the pixel of interest is obtained by: dx = indgen(-1:1); dy = indgen(-2:2); result = morph_dilation(img, [dx, dy(-,)]) SEE ALSO: morph_closing, morph_opening, morph_white_top_hat, morph_black_top_hat, morph_enhance. */ func morph_closing(a, r) { return morph_erosion(morph_dilation(a, r), r); } func morph_opening(a, r) { return morph_dilation(morph_erosion(a, r), r); } /* DOCUMENT morph_closing(a, r); or morph_opening(a, r); Perform an image closing/opening of A by a structuring element R. A closing is a dilation followed by an erosion, whereas an opening is an erosion followed by a dilation. See morph_dilation for the meaning of the arguments. SEE ALSO: morph_dilation, morph_white_top_hat, morph_black_top_hat. */ func morph_white_top_hat(a, r, s) { if (! is_void(s)) a = morph_closing(a, s); return a - morph_opening(a, r); } func morph_black_top_hat(a, r, s) { if (! is_void(s)) a = morph_opening(a, s); return morph_closing(a, r) - a; } /* DOCUMENT morph_white_top_hat(a, r); or morph_white_top_hat(a, r, s); or morph_black_top_hat(a, r); or morph_black_top_hat(a, r, s); Perform a summit/valley detection by applying a top-hat filter to array A. Argument R defines the structuring element for the feature detection. Optional argument gives the structuring element used to apply a smoothing to A prior to the top-hat filter. If R and S are specified as the radii of the structuring elements, then S should be smaller than R. For instance: morph_white_top_hat(bitmap, 3, 1) may be used to detect text or lines in a bimap image. SEE ALSO: morph_dilation, morph_closing, morph_enhance. */ func morph_enhance(a, r, s) /* DOCUMENT morph_enhance(a, r); or morph_enhance(a, r, s); Perform noise reduction with edge preserving on array A. The result is obtained by rescaling the values in A in a non-linear way between the local minimum and the local maximum. Argument R defines the structuring element for the local neighborhood. Argument S is a shape factor for the rescaling function which is a sigmoid function. If S is given, it must be a non-negative value, the larger is S, the steeper is the rescaling function. The shape factor should be larger than 3 or 5 to have a noticeable effect. If S is omitted, a step-like rescaling function is chosen: the output elements are set to either the local minimum or the local maximum which one is the closest. This corresponds to the limit of very large shape factors and implements the "toggle filter" proposed by Kramer & Bruckner [1]. The morph_enhance() may be iterated to achieve deblurring of the input array A (hundreds of iterations may be required). REFERENCES [1] H.P. Kramer & J.B. Bruckner, "iterations of a nonlinear transformation for enhancement of digital images", Pattern Recognition, vol. 7, pp. 53-58, 1975. SEE ALSO: morph_erosion, morph_dilation. */ { if (is_void(s)) { s = -1.0; /* special value */ } else if (s < 0.0) { error, "S must be non-negative"; } else { /* Pre-compute the range of the sigmoid function to detect early return with no change and skip the time consuming morpho-math operations. */ s = double(s); hi = 1.0/(1.0 + exp(-s)); lo = ((hi == 1.0) ? 0.0 : 1.0/(1.0 + exp(s))); if (hi == lo) { return a; } } /* Compute the local minima and maxima. */ amin = morph_erosion(a, r); amax = morph_dilation(a, r); /* Staircase remapping of values. */ if (s < 0.0) { test = ((a - amin) >= (amax - a)); return merge(amax(where(test)), amin(where(! test)), test); } /* Remapping of values with a sigmoid. */ test = ((amin < a)&(a < amax)); // values that need to change w = where(test); if (! is_array(w)) { return a; } type = structof(a); integer = is_integer(a); if (numberof(w) != numberof(a)) { /* Not all values change, select only those for which there is a difference between the local minimum and the local maximum. */ unchanged = a(where(! test)); a = a(w); amin = amin(w); amax = amax(w); } /* We use the sigmoid function f(t) = 1/(1 + exp(-t)) for the rescaling function g(t) = alpha*f(s*t) + beta with S the shape parameter, and (ALPHA,BETA) chosen to map the range [-1,1] into [0,1]. */ alpha = 1.0/(hi - lo); beta = alpha*lo; /* Linearly map the values in the range [-1,1] -- we already know that the local minimum and maximum are different. */ a = (double(a - amin) - double(amax - a))/double(amax - amin); a = alpha/(1.0 + exp(-s*a)) - beta; a = a*amax + (1.0 - a)*amin; if (! is_void(unchanged)) { a = merge(a, unchanged, test); } if (type != structof(a)) { if (integer) return type(round(a)); return type(a); } return a; } /*---------------------------------------------------------------------------*/ /* COST FUNCTIONS AND REGULARIZATION */ extern cost_l2; extern cost_l2l1; extern cost_l2l0; /* DOCUMENT cost_l2(hyper, res [, grd]) or cost_l2l1(hyper, res [, grd]) or cost_l2l0(hyper, res [, grd]) These functions compute the cost for an array of residuals RES and hyper-parameters HYPER (which can have 1, 2 or 3 elements). If optional third argument GRD is provided, it must be a simple variable reference used to store the gradient of the cost function with respect to the residuals. The cost_l2() function returns the sum of squared residuals times HYPER(1): COST_L2 = MU*sum(RES^2) where MU = HYPER(1). The cost_l2l1() and cost_l2l0() functions are quadratic (L2) for small residuals and non-quadratic (L1 and L0 respectively) for larger residuals. The thresholds for L2 / non-L2 transition are given by the second and third value of HYPER. If HYPER = [MU, TINF, TSUP] with TINF < 0 and TSUP > 0, an asymmetric cost function is computed as: COST_L2L0 = MU*(TINF^2*sum(atan(RES(INEG)/TINF)^2) + TSUP^2*sum(atan(RES(IPOS)/TPOS)^2)) COST_L2L1 = 2*MU*(TINF^2*sum(RES(INEG)/TINF - log(1 + RES(INEG)/TINF)) + TSUP^2*sum(RES(IPOS)/TSUP - log(1 + RES(IPOS)/TSUP))) with INEG = where(RES < 0) and IPOS = where(RES >= 0). If any or the thresholds is negative or zero, the L2 norm is used for residuals with the corresponding sign (same as having an infinite threshold level). The different cases are: TINF < 0 ==> L2-L1/L0 norm for negative residuals TINF = 0 ==> L2 norm for negative residuals TSUP = 0 ==> L2 norm for positive residuals TSUP > 0 ==> L2-L1/L0 norm for positive residuals For residuals much smaller (in magnitude) than the thresholds, the non-L2 cost function behave as the L2 one. For residuals much larger (in magnitude), than the thresholds, the L2-L1 cost function is L1 (i.e. scales as abs(RES)) and the L2-L0 cost function is L0 (tends to saturate). If HYPER = [MU, T], with T>0, a symmetric non-L2 cost function is computed with TINF = -T and TSUP = +T; in other words: COST_L2L0 = MU*T^2*sum(atan(RES/T)^2) COST_L2L1 = 2*MU*T^2*sum(abs(RES/T) - log(1 + abs(RES/T))) If HYPER has only one element (MU) the L2 cost function is used. Note that HYPER = [MU, 0] or HYPER = [MU, 0, 0] is the same as HYPER = MU (i.e. L2 cost function). This is an implementation issue; by continuity, the cost should be zero for a threshold equals to zero. SEE ALSO: rgl_roughness_l2; */ extern rgl_roughness_l2; extern rgl_roughness_l2_periodic; extern rgl_roughness_l1; extern rgl_roughness_l1_periodic; extern rgl_roughness_l2l1; extern rgl_roughness_l2l1_periodic; extern rgl_roughness_l2l0; extern rgl_roughness_l2l0_periodic; extern rgl_roughness_cauchy; extern rgl_roughness_cauchy_periodic; /* DOCUMENT err = rgl_roughness_SUFFIX(hyper, offset, arr); or err = rgl_roughness_SUFFIX(hyper, offset, arr, grd); Compute regularization penalty based on the roughness of array ARR. SUFFIX indicates the type of cost function and the boundary condition (see below). HYPER is the array of hyper-parameters; depending on the particular cost function, HYPER may have 1 or 2 elements (see below). OFFSET is an array of offsets for each dimensions of ARR (missing offsets are treated as being equal to zero): OFFSET(j) is the offset along j-th dimension between elements to compare. The penalty is equal to the sum of the costs of the differences between values of ARR separated by OFFSET; schematically: ERR = sum_k COST(ARR(k + OFFSET) - ARR(k)) The following penalties are implemented: rgl_roughness_l1 L1 norm rgl_roughness_l1_periodic L1 norm, periodic rgl_roughness_l2 L2 norm rgl_roughness_l2_periodic L2 norm, periodic rgl_roughness_l2l1 L2-L1 norm rgl_roughness_l2l1_periodic L2-L1 norm, periodic rgl_roughness_l2l0 L2-L0 norm rgl_roughness_l2l0_periodic L2-L0 norm, periodic rgl_roughness_cauchy Cauchy norm rgl_roughness_cauchy_periodic Cauchy norm, periodic The suffix "periodic" indicates periodic boundary condition. The different cost functions are): L1(x) = mu * abs(x) L2(x) = mu * x^2 L2L0(x) = mu * eps^2 * atan(x/eps))^2 L2L1(x) = 2 * mu * eps^2 * (abs(x/eps) - log(1 + abs(x/eps))) Cauchy(x) = mu * eps^2 * log(1 + (x/eps)^2) where X = ARR(k + OFFSET) - ARR(k), MU = HYPER(1) is the weight of the regularization and EPS = HYPER(2) is a threshold level. Restrictions: MU >= 0 and EPS >= 0 and the result is ERR = 0 when MU = 0 or EPS = 0 -- the case EPS = 0, is implemented by continuity. The L2-L0, L2-L1 and Cauchy cost functions behave as L2(X) = MU*X^2 for abs(X) much smaller than EPS. They differ in their tail for large values of abs(X): L2-L0 tends to be flat; L2-L1 behave as abs(X) and CAUCHY is intermediate. From a Baysian viewpoint, L2 correspond to the neg-log likelihood of a Gaussian distribution, CAUCHY correspond to the neg-log likelihood of a Cauchy (or Lorentzian) distribution. Optional argument GRD must be an unadorned variable where to store the gradient. If the argument GRD is omitted, no gradient is computed. On entry, the value of GRD may be empty to automatically or an array (convertible to real type) with same dimension list as ARR. In the first case, a new array is created to store the gradient; in the second case, the contents of GRD is augmented by the gradient (and GRD is converted to "double" if it is not yet the case). EXAMPLES To compute isotropic quadratic roughness along 2 first dimensions of A: g = array(double, dimsof(a)); // to store the gradient mu = 1e3; // regularization weight rgl = rgl_roughness_l2; // shortcut f = (rgl( mu, 1, a, g) + rgl( mu, [ 0, 1], a, g) + rgl(0.5*mu, [-1, 1], a, g) + rgl(0.5*mu, [ 1, 1], a, g)); To compute anisotropic roughness along first and third dimensions of A: g = array(double, dimsof(a)); // to store the gradient mu1 = 1e3; // regularization weight along first dimension mu2 = 3e4; // regularization weight along second dimension rgl = rgl_roughness_l2; // shortcut f = (rgl(mu1, 1, a, g) + // 1st dim rgl(mu3, [ 0, 0, 1], a, g) + // 3rd dim rgl(mu1 + mu3, [-1, 0, 1], a, g) + // 1st & 3rd dim rgl(mu1 + mu3, [ 1, 0, 1], a, g)); // 1st & 3rd dim SEE ALSO cost_l2. */ /*---------------------------------------------------------------------------*/ /* 1D CONVOLUTION AND "A TROUS" WAVELET TRANSFORM */ extern __yeti_convolve_f; /* PROTOTYPE void yeti_convolve_f(float array dst, float array src, int stride, int n, int nafter, float array ker, int w, int scale, int border, float array ws); */ extern __yeti_convolve_d; /* PROTOTYPE void yeti_convolve_d(double array dst, double array src, int stride, int n, int nafter, double array ker, int w, int scale, int border, double array ws); */ func yeti_convolve(a, which=, kernel=, scale=, border=, count=) /* DOCUMENT ap = yeti_convolve(a) Convolve array A along its dimensions (all by default) by a given kernel. By default, the convolution kernel is [1,4,6,4,1]/16.0. This can be changed by using keyword KERNEL (but the kernel must have an odd number of elements). The following operation is performed (with special handling for the boundaries, see keyword BORDER) along the direction(s) of interest: | ____ | \ | AP(i)= \ KERNEL(j+W) * A(i + j*SCALE) | / | /___ | -W <= j <= +W | where numberof(KERNEL)=2*W+1. Except for the SCALE factor, AP is mostly a convolution of A by array KERNEL along the direction of interest. Keyword WHICH can be used to specify the dimension(s) of interest; by default, all dimensions get convolved. As for indices, elements in WHICH less than 1 is taken as relative to the final dimension of the array. You may specify repeated convolution along some dimensions by using them several times in array WHICH (see keyword COUNT). Keyword BORDER can be used to set the handling of boundary conditions: BORDER=0 Extrapolate missing values by the left/rightmost ones (this is the default behaviour). BORDER=1 Extrapolate missing left values by zero and missing right values by the rightmost one. BORDER=2 Extrapolate missing left values by the leftmost one and missing right values by zero. BORDER=3 Extrapolate missing left/right values by zero. BORDER=4 Use periodic conditions. BORDER>4 or BORDER<0 Do not extrapolate missing values but normalize convolution product by sum of kernel weights taken into account (assuming they are all positive). By default, SCALE=1 which corresponds to a simple convolution. An other value can be used thanks to keyword SCALE (e.g. for the wavelet "a trou" method). The value of SCALE must be a positive integer. Keyword COUNT can be used to augment the amount of smoothing: COUNT (default COUNT=1) is the number of convolution passes. It is better (i.e. faster) to use only one pass with appropriate convolution kernel (see keyword KERNEL). SEE ALSO yeti_wavelet. RESTRICTIONS 1. Should use the in-place ability of the operation to limit the number of array copies. 2. Complex convolution not yet implemented (although it exists in the C-code). */ { /* Check data type of A. */ type = structof(a); if (type == complex) { return (yeti_convolve(double(a), which=which, kernel=kernel, scale=scale, border=border, count=count) + 1i*yeti_convolve(a.im, which=which, kernel=kernel, scale=scale, border=border, count=count)); } else if (type == double) { op = __yeti_convolve_d; } else if (type == float || type == long || type == int || type == short || type == char) { op = __yeti_convolve_f; type = float; } else { error, "bad data type"; } a = type(a); /* force a private copy of A */ /* Check dimensions of A and keyword WHICH. */ dims = dimsof(a); rank = dims(1); if (is_void(which)) { which = indgen(rank); } else { which += (which <= 0)*rank; if (min(which) < 1 || max(which) > rank) error, "dimension index out of range in WHICH"; } /* Check KERNEL and other keywords. */ if (is_void(kernel)) { k0= type(0.375); /* 6.0/16.0 */ k1= type(0.25); /* 4.0/16.0 */ k2= type(0.0625); /* 1.0/16.0 */ kernel= [k2, k1, k0, k1, k2]; } if ((w = numberof(kernel))%2 != 1) error, "KERNEL must have an odd number of elements"; if (is_void(scale)) scale = 1; else if (structof(scale+0)!=long || scale<=0) error, "bad value for keyword SCALE"; if (is_void(border)) border = 0; if (is_void(count)) count = 1; /* Compute strides. */ stride = array(1, rank); for (s=1,i=2 ; i<=rank ; ++i) stride(i) = stride(i-1)*dims(i); stride = stride(which); dims = dims(which + 1); nafter = numberof(a)/(dims*stride); /* Apply the operator along every dimensions of interest. */ for (i=1 ; i<=numberof(which) ; ++i) { len = dims(i); for (j=1 ; j<=count ; ++j) { op, a, a, stride(i), len, nafter(i), kernel, (w-1)/2, scale, border, array(type, 2*len); } } return a; } func yeti_wavelet(a, order, which=, kernel=, border=) /* DOCUMENT cube = yeti_wavelet(a, order) Compute the "a trou" wavelet transform of A. The result is such that: CUBE(.., i) = S_i - S_(i+1) where: S_1 = A S_(i+1) = yeti_convolve(S_i, SCALE=2^(i-1)) As a consequence: CUBE(..,sum) = A; SEE ALSO yeti_convolve. */ { if (((s=structof(order)) != long && s!=int && s!=short && s!=char) || dimsof(order)(1) || order<0) { error, "ORDER must be a non-negative integer"; } dims = dimsof(a); grow, dims, order+1; ++dims(1); cube = array(structof(a(1)+0.0f), dims); for (scale=1, i=1 ; i<=order ; ++i, scale*=2) { ap = a; a = yeti_convolve(a, which=which, kernel=kernel, scale=scale, border=border); cube(..,i) = ap-a; } cube(..,0) = a; return cube; } extern smooth3; /* DOCUMENT smooth3(a) Returns array A smoothed by a simple 3-element convolution (but for the edges). In one dimension, the smoothing operation reads: smooth3(A)(i) = C*A(i) + D*(A(i-1) + A(i+1)) but for the first and last element for which: smooth3(A)(1) = E*A(1) + D*A(2) smooth3(A)(n) = E*A(n) + D*A(n-1) where N is the length of the dimension and the coefficients are: C = 0.5 D = 0.25 E = 0.75 With the default value of C (see keyword C below), the smoothing operation is identical to: smooth3(A) = A(pcen)(zcen) for a 1D array smooth3(A) = A(pcen,pcen)(zcen,zcen) for a 2D array ... and so on Keyword C can be used to specify another value for the coefficient C (default: C=0.5); coefficients D and E are computed as follows: D = 0.5*(1 - C) E = 0.5*(1 + C) The default is to smooth A along all its dimensions, but keyword WHICH can be used to specify the only dimension to smooth. If WHICH is less or equal zero, then the smoothed dimension is the last one + WHICH. The smoothing operator implemented by smooth3 has the following properties: 1. The smoothing operator is linear and symmetric (for any number of dimensions in A). The symmetry of the smoothing operator is important for the computation of gradients in regularization. For instance, let Y = smooth3(X) and Q be a scalar function of Y, then then the gradient of Q with respect to X is simply: DQ_DX = smooth3(DQ_DY) where DQ_DY is the gradient of Q with respect to Y. 2. For a vector, A, smooth3(A)=S(,+)*A(+) where the matrix S is tridiagonal: [E D ] [D C D ] [ D C D ] [ \ \ \ ] where, to improve readability, [ \ \ \ ] missing values are all zero. [ D C D ] [ D C D] [ D E] You can, in principle, reverse the smoothing operation with TDsolve along each dimensions of smooth3(A). Note: for a vector A, the operator S-I applied to A (where I is the identity matrix) is the finite difference 2nd derivatives of A (but for the edges). 3. The definition of coefficients C, D and E insure that the smoothing operator does not change the sum of the element values of its argument, i.e.: sum(smooth3(A)) = sum(A). 4. Only an array with all elements having the same value is invariant by the smoothing operator. In fact "slopes" along dimensions of A are almost invariant, only the values along the edges are changed. KEYWORDS: c, which. SEE ALSO: TDsolve. */ /*---------------------------------------------------------------------------*/ /* STRING ROUTINES */ func strtrimleft(s) {return strtrim(s, 1);} func strtrimright(s) {return strtrim(s, 2);} /* DOCUMENT strtrimleft(s); or strtrimrigth(s); Returns input (array of) string(s) S without leading or trailing blanks. SEE ALSO strlower, strupper, string, strtrim. */ func strlower(s) { return strcase(0, s); } func strupper(s) { return strcase(1, s); } /* DOCUMENT strlower(s); or strupper(s); Returns input (array of) string(s) S converted to lower/upper case letters. SEE ALSO string, strcase, strtrimleft. */ /*---------------------------------------------------------------------------*/ /* MATH ROUTINES */ extern sinc; /* DOCUMENT sinc(x); Returns the "sampling function" of X as defined by Woodward (1953) and Bracewell (1999): sinc(x) = 1 for x=0 sin(PI*x)/(PI*x) otherwise Note: This definition correspond to the "normalized sinc function"; some other authors may define the sampling function without the PI factors in the above expression. REFERENCES Bracewell, R. "The Filtering or Interpolating Function, sinc(x)." In "The Fourier Transform and Its Applications", 3rd ed. New York: McGraw-Hill, pp. 62-64, 1999. Woodward, P. M. "Probability and Information Theory with Applications to Radar". New York: McGraw-Hill, 1953. SEE ALSO: sin. */ extern arc; /* DOCUMENT arc(x); Returns angle X wrapped in range (-PI, +PI]. */ /*---------------------------------------------------------------------------*/ /* SPARSE MATRICES AND MATRIX-VECTOR MULTIPLICATION */ extern sparse_matrix; /* DOCUMENT s = sparse_matrix(coefs, row_dimlist, row_indices, col_dimlist, col_indices); Returns a sparse matrix object. COEFS is an array with the non-zero coefficients of the full matrix. ROW_DIMLIST and COL_DIMLIST are the dimension lists of the matrix 'rows' and 'columns'. ROW_INDICES and COL_INDICES are the 'row' and 'column' indices of the non-zero coefficients of the full matrix. The sparse matrix object S can be used to perform sparse matrix multiplication as follows: S(x) or S(x, 0) yields the result of matrix multiplication of 'vector' X by S; X must be an array with dimension list COL_DIMLIST (or a vector with as many elements as an array with such a dimension list); the result is an array with dimension list ROW_DIMLIST. S(y, 1) yields the result of matrix multiplication of 'vector' Y by the transpose of S; Y must be an array with dimension list ROW_DIMLIST (or a vector with as many elements as an array with such a dimension list); the result is an array with dimension list COL_DIMLIST. The contents of the sparse matrix object S can be queried as with a regular Yorick structure: S.coefs, S.row_dimlist, S.row_indices, S.col_dimlist or S.col_indices are valid expressions if S is a sparse matrix. SEE ALSO: is_sparse_matrix, mvmult, sparse_expand, sparse_squeeze, sparse_grow. */ extern is_sparse_matrix; /* DOCUMENT is_sparse_matrix(obj) * Returns true if OBJ is a sparse matrix object; false otherwise. * * SEE ALSO: sparse_matrix. */ func sparse_grow(s, coefs, row_indices, col_indices) /* DOCUMENT sparse_grow(s, coefs, row_indices, col_indices); Returns a sparse matrix object obtained by growing the non-zero coefficients of S by COEFS with the corresponding row/column indices given by ROW_INDICES and COL_INDICES which must have the same number of elements as COEFS. SEE ALSO: sparse_matrix. */ { return sparse_matrix(grow(s.coefs, coefs), s.row_dimlist, grow(s.row_indices, row_indices), s.col_dimlist, grow(s.col_indices, col_indices)); } func sparse_squeeze(a, n) /* DOCUMENT s = sparse_squeeze(a); or s = sparse_squeeze(a, n); Convert array A into its sparse matrix representation. Optional argument N (default, N=1) is the number of dimensions of the input space. The dimension list of the input space are the N trailing dimensions of A and, assuming that A has NDIMS dimensions, the dimension list of the output space are the NDIMS - N leading dimensions of A. SEE ALSO: sparse_matrix, sparse_expand. */ { if (! is_array(a)) error, "unexpected non-array"; dimlist = dimsof(a); ndims = dimlist(1); if (is_void(n)) n = 1; /* one trailing dimension for the input space */ if ((m = ndims - n) < 0) error, "input space has too many dimensions"; if (! is_array((i = where(a)))) error, "input array is zero everywhere!"; (row_dimlist = array(long, m + 1))(1) = m; stride = 1; if (m >= 1) { row_dimlist(2:) = dimlist(2:m+1); for (j=m+1;j>=2;--j) stride *= dimlist(j); } (col_dimlist = array(long, n + 1))(1) = n; if (n >= 1) col_dimlist(2:) = dimlist(m+2:0); j = i - 1; return sparse_matrix(a(i), row_dimlist, 1 + j%stride, col_dimlist, 1 + j/stride); } func sparse_expand(s) /* DOCUMENT a = sparse_expand(s); Convert sparse matrix S into standard Yorick's array A. SEE ALSO: sparse_squeeze, histogram. */ { row_dimlist = s.row_dimlist; stride = 1; j = row_dimlist(1) + 2; while (--j >= 2) stride *= row_dimlist(j); a = array(structof(s.coefs), row_dimlist, s.col_dimlist); #if 0 /* We cannot do that because, coefficients may not be unique. */ a(s.row_indices + (s.col_indices - 1)*stride) = s.coefs; #endif a(*) = histogram(s.row_indices + (s.col_indices - 1)*stride, s.coefs, top=numberof(a)); return a; } local sparse_restore, sparse_save; /* DOCUMENT sparse_save, pdb, obj; or sparse_restore(pdb); The subroutine sparse_save saves the sparse matrix OBJ into file PDB. The function sparse_restore restores the sparse matrix saved into file PDB. PDB is either a file name or a PDB file handle. SEE ALSO: createb, openb, restore, save, sparse_matrix. */ func sparse_save(pdb, obj) { if (! is_sparse_matrix(obj)) error, "expecting a sparse matrix"; if (structof(pdb) == string) { logfile = pdb + "L"; if (open(logfile, "r", 1)) logfile = 0; pdb = createb(pdb); if (logfile) remove, logfile; } local coefs, row_dimlist, row_indices, col_dimlist, col_indices; eq_nocopy, coefs, obj.coefs; eq_nocopy, row_dimlist, obj.row_dimlist; eq_nocopy, row_indices, obj.row_indices; eq_nocopy, col_dimlist, obj.col_dimlist; eq_nocopy, col_indices, obj.col_indices; save, pdb, coefs, row_dimlist, row_indices, col_dimlist, col_indices; } func sparse_restore(pdb) { local coefs, row_dimlist, row_indices, col_dimlist, col_indices; if (structof(pdb) == string) pdb = openb(pdb); restore, pdb, coefs, row_dimlist, row_indices, col_dimlist, col_indices; return sparse_matrix(coefs, row_dimlist, row_indices, col_dimlist, col_indices); } extern mvmult; /* DOCUMENT y = mvmult(a, x); or y = mvmult(a, x, 0/1); Returns the result of (generalized) matrix-vector multiplication of vector X (a regular Yorick array) by matrix A (a regular Yorick array or a sparse matrix). The matrix-vector multiplication is performed as if there is only one index running over the elements of X and the trailing/leading dimensions of A. If optional last argument is omitted or false, the summation index runs across the trailing dimensions of A which must be the same as those of X and the dimensions of the result are the remaining leading dimensions of A. If optional last argument is 1, the matrix operator is transposed: the summation index runs across the leading dimensions of A which must be the same as those of X and the dimensions of the result are the remaining trailing dimensions of A. SEE ALSO: sparse_matrix, sparse_squeeze. */ /*---------------------------------------------------------------------------*/ /* ACCESSING YORICK'S INTERNALS */ extern is_hash; /* DOCUMENT is_hash(object) Returns 1, if OBJECT is a regular hash table; returns 2, if OBJECT is a hash table with a specialized evaluator; returns 0, if OBJECT is not a hash table. SEE ALSO: h_new, h_evaluator, is_array, is_func, is_integer, is_list, is_range, is_scalar, is_stream, is_struct, is_void. */ extern nrefsof; /* DOCUMENT nrefsof(object) Returns number of references on OBJECT. SEE ALSO: unref. */ extern get_encoding; /* DOCUMENT get_encoding(name); Return the data layout for machine NAME, one of: "native" the current machine (little-endians) "i86" Intel x86 Linux "ibmpc" IBM PC (2 byte int) "alpha" Compaq alpha "dec" DEC workstation (MIPS), Intel x86 Windows "vax" DEC VAX (H-double) "vaxg" DEC VAX (G-double) (big-endians) "xdr" External Data Representation "sun" Sun, HP, SGI, IBM-RS6000, MIPS 32 bit "sun3" Sun-2 or Sun-3 (old) "sgi64" SGI, Sun, HP, IBM-RS6000 64 bit "mac" MacIntosh 68000 (power Mac, Gx are __sun) "macl" MacIntosh 68000 (12 byte double) "cray" Cray XMP, YMP The result is a vector of 32 long's as follow: [size, align, order] repeated 6 times for char, short, int, long, float, and double, except that char align is always 1, so result(2) is the structure alignment (see struct_align). [sign_address, exponent_address, exponent_bits, mantissa_address, mantissa_bits, mantissa_normalization, exponent_bias] repeated twice for float and double. See the comment at the top of file prmtyp.i for an explanation of these fields. The total number of items is therefore 3*6 + 7*2 = 32. SEE ALSO get_primitives, set_primitives, install_encoding, machine_constant. */ func install_encoding(file, encoding) /* DOCUMENT install_encoding, file, encoding; Set layout of primitive data types for binary stream FILE. ENCODING may be one of the names accepted by get_encoding or an array of 32 integers as explained in get_encoding documentation. SEE ALSO: get_encoding, install_struct. */ { /* Get encoding parameters with minimal check. */ if (structof(encoding) == string) { p = get_encoding(encoding); } else { if ((s = structof(encoding)) == long) p = encoding; else if (/*s==char || s==short || */s==int) p = long(encoding); else error, "bad data type for ENCODING"; if (numberof(p) != 32) error, "bad number of elements for encoding"; } /* Install primitive definitions. */ install_struct, file, "char", 1, 1, p( 3); install_struct, file, "short", p( 4), p( 5), p( 6); install_struct, file, "int", p( 7), p( 8), p( 9); install_struct, file, "long", p(10), p(11), p(12); install_struct, file, "float", p(13), p(14), p(15), p(19:25); install_struct, file, "double", p(16), p(17), p(18), p(26:32); struct_align, file, p(2); } func same_encoding(a, b) /* DOCUMENT same_encoding(a, b) Compare primitives A and B which must be conformable integer arrays with first dimension equals to 32 (see set_primitives). The result is an array of int's with one less dimension than A-B (the first one). Some checking is performed for the operands. The byte order for the char data type is ignored in the comparison. SEE ALSO install_encoding, get_encoding.*/ { if (! is_array((d = dimsof(a, b))) || d(1) < 1 || d(2) != 32) error, "bad dimensions"; diff = abs(a - b); if ((s = structof(diff)) != long && s != int) error, "bad data type"; if (anyof(a(1,..) != 1) || anyof(b(1,..) != 1)) error, "unexpected sizeof(char) != 1"; diff(3, ..) = 0; /* ignore byte order for type char */ return ! diff(max,); } local DBL_EPSILON, DBL_MIN, DBL_MAX; local FLT_EPSILON, FLT_MIN, FLT_MAX; extern machine_constant; /* DOCUMENT machine_constant(str) Returns the value of the machine dependent constant given its name STR. STR is a scalar string which can be one of (prefixes "FLT_" and "DBL_" are for single/double precision respectively): "FLT_MIN", "DBL_MIN" - minimum normalized positive floating-point number; "FLT_MAX", "DBL_MAX" - maximum representable finite floating-point number; "FLT_EPSILON", "DBL_EPSILON" - the difference between 1 and the least value greater than 1 that is representable in the given floating point type: B^(1 - P); "FLT_MIN_EXP", "DBL_MIN_EXP" - minimum integer EMIN such that FLT_RADIX^(EMIN - 1) is a normalized floating-point value; "FLT_MIN_10_EXP" "DBL_MIN_10_EXP" - minimum negative integer such that 10 raised to that power is in the range of normalized floating-point numbers: ceil(log10(B)*(EMIN - 1)); "FLT_MAX_EXP", "DBL_MAX_EXP" - maximum integer EMAX such that FLT_RADIX^(EMAX - 1) is a normalized floating-point value; "FLT_MAX_10_EXP" "DBL_MAX_10_EXP" - maximum integer such that 10 raised to that power is in the range of normalized floating-point numbers: floor(log10((1 - B^(-P))*(B^EMAX))) "FLT_RADIX" - radix of exponent representation, B; "FLT_MANT_DIG", "DBL_MANT_DIG" - number of base-FLT_RADIX significant digits P in the mantissa; "FLT_DIG", "DBL_DIG" - number of decimal digits, Q, such that any floating-point number with Q decimal digits can be rounded into a floating-point number with P (FLT/DBL_MANT_DIG) radix B (FLT_RADIX) digits and back again without change to the Q decimal digits: Q = P*log10(B) if B is a power of 10 Q = floor((P - 1)*log10(B)) otherwise SEE ALSO: get_encoding. */ DBL_EPSILON = machine_constant("DBL_EPSILON"); DBL_MIN = machine_constant("DBL_MIN"); DBL_MAX = machine_constant("DBL_MAX"); FLT_EPSILON = machine_constant("FLT_EPSILON"); FLT_MIN = machine_constant("FLT_MIN"); FLT_MAX = machine_constant("FLT_MAX"); func symbol_info(____n____) /* DOCUMENT symbol_info, flags; or symbol_info, names; or symbol_info; Print out some information about Yorick's symbols. FLAGS is a scalar integer used to select symbol types (as in symbol_names). NAMES is an array of symbol names. If argument is omitted or undefined, all defined array symbols get selected (as with FLAGS=3). SEE ALSO: info, mem_info, symbol_def, symbol_names.*/ { /* attempt to use _very_ odd names to avoid clash with caller */ if (is_void(____n____)) ____n____ = symbol_names(3); else if (! is_string(____n____)) ____n____ = symbol_names(____n____); for (____i____ = 1; ____i____ <= numberof(____n____); ++____i____) { write, format="%s:", ____n____(____i____); info, symbol_def(____n____(____i____)); } } func mem_info(____a____) /* DOCUMENT mem_info; or mem_info, count; Print out some information about memory occupation. If COUNT is specified, the COUNT biggest (in bytes) symbols are listed (use COUNT<0 to list all symbols sorted by size). Only the memory used by Yorick's array symbols (including array of pointers), lists and hash tables is considered. BUGS: Symbols which are aliases (e.g. by using eq_nocopy) may be considered several times. SEE ALSO: symbol_def, symbol_info, symbol_names, mem_clear, fullsizeof. */ { ____n____ = symbol_names(3); ____i____ = numberof(____n____); ____s____ = array(long, ____i____); while (____i____ > 0) { ____s____(____i____) = fullsizeof(symbol_def(____n____(____i____))); --____i____; } ____i____ = sum(____s____); write, format="Total memory used by array symbols: %d bytes (%.3f Mb)\n", ____i____, ____i____/1024.0^2; if (____a____) { ____i____ = sort(____s____); if (____a____ > 0 && ____a____ < numberof(____i____)) ____i____ = ____i____(1-____a____:0); ____n____ = ____n____(____i____); ____s____ = ____s____(____i____); ____i____ = numberof(____i____); if (____i____ > 1) { write, format="The %d biggest symbols are:\n", ____i____; } else { write, format="%s", "The biggest symbol is:\n"; } ____a____ = swrite(format=" %%%ds: %%%.0fd bytes,", max(strlen(____n____)), ceil(log10(max(____s____)))); while (____i____ > 0) { write, format=____a____, ____n____(____i____), ____s____(____i____); info, symbol_def(____n____(____i____)); --____i____; } } } func mem_clear(_____s_____, _____f_____) /* DOCUMENT mem_clear; or mem_clear, minsize; or mem_clear, minsize, flags; *** USE THIS FUNCTION WITH CARE *** Clear (that is destroy) global symbols larger than MINSIZE bytes (default 1024 bytes). Symbol names starting with an underscore are not destroyed. Optional argument FLAGS (default 0) is a bitwise combination of the following bits: 0x01 - quiet mode, do not even print the summary. 0x02 - verbose mode, print out the names of matched symbols. 0x04 - dry-run mode, the symbols are not really destroyed. When called as a function, returns the number of bytes released. SEE ALSO: symbol_def, symbol_info, symbol_names, fullsizeof. */ { _____t_____ = 0; if (is_void(_____s_____)) { _____s_____ = 1024; } _____n_____ = symbol_names(1 | 2 | 1024); _____i_____ = 1 + numberof(_____n_____); while (--_____i_____ > 0) { if (strpart(_____n_____(_____i_____), 1:1) != "_") { _____u_____ = fullsizeof(symbol_def(_____n_____(_____i_____))); if (_____u_____ > _____s_____) { if ((_____f_____ & 0x02) != 0) { write, _____n_____(_____i_____); } if ((_____f_____ & 0x04) == 0) { symbol_set, _____n_____(_____i_____), []; } _____t_____ += _____u_____; } } } if (am_subroutine()) { if ((_____f_____ & 0x01) == 0) { write, format="%d bytes of memory cleared\n", _____t_____; } } else { return t; } } func fullsizeof(x) /* DOCUMENT fullsizeof(x) Returns size in bytes of object X. Similar to sizeof (which see) function but also works for lists, arrays of pointers, or hash tables. SEE ALSO: sizeof, is_list, is_array, is_hash. */ { if (is_array(x)) { if (structof(x) == pointer) { s = 0; for (k = numberof(x); k >= 1; --k) { s += fullsizeof(*x(k)); } return s; } else { return sizeof(x); } } if (is_hash(x)) { s = 0; for (k = h_first(x); k; k = h_next(x, k)) { s += fullsizeof(h_get(x, k)); } return s; } if (is_list(x)) { s = 0; while (x) { s += fullsizeof(_car(x)); x = _cdr(x); } return s; } return sizeof(x); } extern insure_temporary; /* DOCUMENT insure_temporary, var1 [, var2, ...]; Insure that symbols VAR1 (VAR2 ...) are temporary variables referring to arrays. Useful prior to in-place operations to avoid side-effects for caller. SEE ALSO: eq_nocopy, nrefsof, swap, unref. */ extern mem_base; extern mem_copy; extern mem_peek; /* DOCUMENT mem_base(array); or mem_copy, address, expression; or mem_peek(address, type, dimlist); Hacker routines to read/write data at a given memory location. These routines allow the user to do _very_ nasty but sometimes needed things. They do not provide the safety level of ususal Yorick routines, and must therefore be used with extreme care (you've been warned). In all these routines, ADDRESS is either a long integer scalar or a scalar pointer (e.g. &OBJECT). mem_base returns the address (as a long scalar) of the first element of array object ARRAY. You can use this function if you need to add some offset to the address of an object, e.g. to reach some particular element of an array or a structure. mem_copy copy the contents of EXPRESSION at memory location ADDRESS. mem_peek returns a new array of data type TYPE and dimension list DIMLIST filled with memory contents starting at address ADDRESS. EXAMPLE The following statement converts the contents of complex array Z as an array of doubles: X = mem_peek(mem_base(Z), double, 2, dimsof(Z)); then: X(1,..) is Z.re X(2,..) is Z.im SEE ALSO reshape, native_byte_order. */ func native_byte_order(type) /* DOCUMENT native_byte_order() or native_byte_order(type) Returns the native byte order, one of: "LITTLE_ENDIAN", "BIG_ENDIAN", or "PDP_ENDIAN". Optional argument TYPE is an integer data type (default is long). SEE ALSO mem_peek. */ { if (is_void(type)) type = long; size = sizeof(type); (carr = array(char, size))(*) = indgen(size:1:-1); value = mem_peek(mem_base(carr), type); if (size == 4) { if (value == 0x01020304) { return "LITTLE_ENDIAN"; } else if (value == 0x04030201) { return "BIG_ENDIAN"; } else if (value == 0x03040102) { return "PDP_ENDIAN"; } } else if (size == 2) { if (value == 0x0102) { return "LITTLE_ENDIAN"; } else if (value == 0x0201) { return "BIG_ENDIAN"; } } error, "unknown byte order"; } local Y_MMMARK, Y_PSEUDO, Y_RUBBER, Y_RUBBER1, Y_NULLER; local Y_MIN_DFLT, Y_MAX_DFLT; extern parse_range; extern make_range; /* DOCUMENT arr = parse_range(rng); or rng = make_range(arr); The parse_range() function converts the index range RNG into an array of 4 long integers: [FLAGS,MIN,MAX,STEP]. The make_range() function does the opposite. For a completely specified range, FLAGS is 1. Otherwise, FLAGS can be Y_MMMARK for a matrix multiply marker (+) -- which is almost certainly a syntax error in any other context -- Y_PSEUDO for the pseudo-range index (-), Y_RUBBER for the .. index, Y_RUBBER1 for the * index, and Y_NULLER for the result of where(0). Bits Y_MIN_DFLT and Y_MAX_DFLT can be set in FLAGS to indicate whether the minimum or maximum is defaulted; there is no flag for a default step, so there is no way to tell the difference between x(:) and x(::1). SEE ALSO: is_range. */ Y_MMMARK = 2; Y_PSEUDO = 3; Y_RUBBER = 4; Y_RUBBER1 = 5; Y_NULLER = 6; Y_MIN_DFLT = 16; Y_MAX_DFLT = 32; extern make_dimlist; /* DOCUMENT make_dimlist(arg1, arg2, ...) or make_dimlist, arg1, arg2, ...; Concatenate all arguments as a single dimension list. The function form returns the resulting dimension list whereas the subroutine form redefines the contents of its first argument which must be a simple variable reference. The resulting dimension list is always of the form [NDIMS, DIM1, DIM2, ...]. EXAMPLES In the following example, a first call to make_dimlist is needed to make sure that input argument DIMS is a valid dimension list if there are no other input arguments: func foo(a, b, dims, ..) { // build up dimension list: make_dimlist, dims; while (more_args()) make_dimlist, dims, next_arg(); ...; } Here is an other example: func foo(a, b, ..) { // build up dimension list: dims = [0]; while (more_args()) make_dimlist, dims, next_arg(); ...; } SEE ALSO: array, build_dimlist. */ /*---------------------------------------------------------------------------*/ /* COMPLEX NUMBERS */ func make_hermitian(z, half=, method=, debug=) /* DOCUMENT zp = make_hermitian(z) or make_hermitian, z; Insure that complex array Z is hermitian (in the FFT sense). The resulting hermitian array ZP is such that: ZP(kneg(k)) = conj(ZP(k)) where k is the index of a given FFT frequency U and kneg(k) is the index of -U. When called as a subroutine, the operation is made in-place. Input array Z must be complex. The particular method used to apply the hermitian constraint can be set by keyword METHOD. Use METHOD = 0 or undefined to "copy" the values: ZP(k) = Z(k) ZP(kneg(k)) = conj(Z(k)) for all indices k such that k < kneg(k) -- in words, the only relevant values in input array Z are those which appear before their negative frequency counterpart. Use METHOD = 1 to average the values: ZP(k) = (Z(k) + conj(Z(kneg(k))))/2 ZP(kneg(k)) = (conj(Z(k)) + Z(kneg(k)))/2 Finally, use METHOD = 2 to "sum" the values (useful to make a gradient hermitian): ZP(k) = Z(k) + conj(Z(kneg(k))) ZP(kneg(k)) = conj(Z(k)) + Z(kneg(k)) Set keyword HALF true to indicate that only half the Fourier frequencies are stored into Z. This is for instance the case when real FFTW is used. SEE ALSO: fft, fftw. */ { if (! am_subroutine()) { /* force copy */ insure_temporary, z; } /* Compute indices of negative frequencies starting by the last dimension. */ local u, v; dimlist = dimsof(z); if ((n = numberof(dimlist)) <= 1) { /* Same as if: u = v = 1 below */ z.im = 0.0; return z; } flag = 0n; for (k = n; k >= 2; --k) { dim = dimlist(k); /* Compute index J of negative frequency along that dimension. */ if (k == 2) { /* Cope with first dimension. Index J is 1-based. */ if (half) { /* Only zero-th frequency along the first dimension has possibly a negative counterpart in the same array. */ j = [1]; } else { (j = indgen(dim+1:2:-1))(1) = 1; } } else { /* Other dimension than the first one. Index J is zero-based. */ (j = indgen(dim:1:-1))(1) = 0; } if (flag) { v = j + dim*v(-,..); } else { v = j; flag = 1n; } } u = array(long, dimsof(v)); if (half) { stride = dimlist(2); u(*) = indgen(1 : 1 + stride*(numberof(u) - 1) : stride); } else { u(*) = indgen(numberof(u)); } if (debug) { return [&u, &v]; } /* Fix frequencies which must be real (at least zero-th frequency). */ z(u(where(u == v))).im = 0.0; /* Fix other frequencies. */ j = where(u < v); if (is_array(j)) { u = u(j); v = v(j); if (! method) { /* apply "copy" method */ z(v) = conj(z(u)); } else if (method == 1) { /* apply "average" method */ tmp = 0.5*(z(u) + conj(z(v))); z(u) = tmp; z(v) = conj(tmp); } else if (method == 2) { /* apply "sum" method */ tmp = z(u) + conj(z(v)); z(u) = tmp; z(v) = conj(tmp); } else { error, "bad METHOD"; } } return z; } /*---------------------------------------------------------------------------*/ /* Manage to autoload other parts of Yeti package. */ if (! is_func(yhd_save)) { autoload, "yeti_yhdf.i", yhd_save, yhd_restore, yhd_check, yhd_info; } /* * Local Variables: * mode: Yorick * tab-width: 8 * c-basic-offset: 2 * indent-tabs-mode: nil * fill-column: 79 * coding: utf-8 * End: */ Yeti-6.4.0/core/yeti_convolve.c000066400000000000000000000252001253351442600164200ustar00rootroot00000000000000/* * yeti_convolve.c - * * Convolution and wavelet smoothing along one dimension. * *----------------------------------------------------------------------------- * * Copyright (C) 1996-2010 Eric Thiébaut * * This software is governed by the CeCILL-C license under French law and * abiding by the rules of distribution of free software. You can use, modify * and/or redistribute the software under the terms of the CeCILL-C license as * circulated by CEA, CNRS and INRIA at the following URL * "http://www.cecill.info". * * As a counterpart to the access to the source code and rights to copy, * modify and redistribute granted by the license, users are provided only * with a limited warranty and the software's author, the holder of the * economic rights, and the successive licensors have only limited liability. * * In this respect, the user's attention is drawn to the risks associated with * loading, using, modifying and/or developing or reproducing the software by * the user in light of its specific status of free software, that may mean * that it is complicated to manipulate, and that also therefore means that it * is reserved for developers and experienced professionals having in-depth * computer knowledge. Users are therefore encouraged to load and test the * software's suitability as regards their requirements in conditions enabling * the security of their systems and/or data to be ensured and, more * generally, to use and operate it in the same conditions as regards * security. * * The fact that you are presently reading this means that you have had * knowledge of the CeCILL-C license and that you accept its terms. * *----------------------------------------------------------------------------- */ #ifndef _YETI_CONVOLVE_C #define _YETI_CONVOLVE_C 1 extern void yeti_convolve_f(float dst[], const float src[], int stride, int n, int nafter, const float ker[], int w, int scale, int border, float ws[]); extern void yeti_convolve_c(float dst[], const float src[], int stride, int n, int nafter, const float ker[], int w, int scale, int border, float ws[]); extern void yeti_convolve_d(double dst[], const double src[], int stride, int n, int nafter, const double ker[], int w, int scale, int border, double ws[]); extern void yeti_convolve_z(double dst[], const double src[], int stride, int n, int nafter, const double ker[], int w, int scale, int border, double ws[]); #define HAVE_MEMCPY 1 /* use memcpy instead of loops? */ #if HAVE_MEMCPY # include #endif /*---------------------------------------------------------------------------*/ /* OPERATIONS FOR COMPLEX DATA TYPE */ #define _(name, op, real_t) \ void name(real_t dst[], const real_t src[], int stride, int n, int nafter, \ const real_t ker[], int w, int scale, int border, real_t ws[]) \ { \ op(dst, src, 2*stride, n, nafter, ker, w, scale, border, ws); \ op(dst+1, src+1, 2*stride, n, nafter, ker, w, scale, border, ws); \ } _(yeti_convolve_c, yeti_convolve_f, float) _(yeti_convolve_z, yeti_convolve_d, double) #undef _ /*---------------------------------------------------------------------------*/ /* OPERATIONS FOR FLOATING POINT DATA TYPE */ #define real_t float #define ZERO 0.0f #define CONVOLVE yeti_convolve_f #define CONVOLVE_1 convolve_f #include __FILE__ #define real_t double #define ZERO 0.0 #define CONVOLVE yeti_convolve_d #define CONVOLVE_1 convolve_d #include __FILE__ #else /* _YETI_CONVOLVE_C defined. ------------------------------------------*/ /* Private routines, data and definitions used in this file. */ #ifdef CONVOLVE_1 static void CONVOLVE_1(real_t dst[], const real_t src[], int n, const real_t ker[], int w, int scale, int border); #endif #ifdef CONVOLVE void CONVOLVE(real_t *dst, const real_t *src, int stride, int n, int nafter, const real_t *ker, int w, int scale, int border, real_t *ws) { int i, j, k, l; ker += w; if (stride == 1) { if (dst == src) { for (k=l=0; l1) { /************************* * * * WAVELET CONVOLUTION * * * *************************/ int ws = w*scale; switch (border) { case 0: /* Extrapolate missing left values by the leftmost one and missing right values by the rightmost one. */ xl = src[0]; xr = src[n-1]; for (i=0 ; i=0 ? (k=n) sum += ker[j] * xr; else if (k>=0) sum += ker[j] * src[k]; } dst[i] = sum; } break; case 2: /* Extrapolate missing left values by the leftmost one and missing right values by zero. */ xl = src[0]; for (i=0 ; i=0) sum += ker[j] * src[k]; } dst[i] = sum; } break; case 4: /* Periodic conditions. */ for (i=0 ; i=0) { sum += (xr= ker[j]) * src[k]; xl += xr; } } dst[i] = xl ? sum/xl : ZERO; } } } else { /************************ * * * NORMAL CONVOLUTION * * * ************************/ int jl, jr, wp1=w+1; switch (border) { case 0: /* Extrapolate missing left values by the leftmost one and missing right values by the rightmost one. */ xl = src[0]; xr = src[n-1]; for (i=0 ; i wp1) jr = wp1; /* limit of right border */ for (j=-w ; j wp1) jr = wp1; /* limit of right border */ for (j=jl, k=i+j ; j wp1) jr = wp1; /* limit of right border */ for (j=-w ; j wp1) jr = wp1; /* limit of right border */ for (j=jl, k=i+j ; j=n) k = 0; sum += ker[j] * src[k]; } dst[i] = sum; } break; default: /* Do not extrapolate missing values but normalize convolution product by sum of kernel weights taken into account (assuming they are all positive). */ for (i=0 ; i= 0) { xr = ker[j]; sum += xr * src[k]; xl += xr; } } dst[i] = xl ? sum/xl : ZERO; } } } } #endif /* CONVOLVE_1 */ /*---------------------------------------------------------------------------*/ #undef real_t #undef ZERO #undef CONVOLVE #undef CONVOLVE_1 #endif /* _YETI_CONVOLVE_C */ /* * Local Variables: * mode: C * tab-width: 8 * c-basic-offset: 2 * indent-tabs-mode: nil * fill-column: 78 * coding: utf-8 * End: */ Yeti-6.4.0/core/yeti_cost.c000066400000000000000000000253021253351442600155400ustar00rootroot00000000000000/* * yeti_cost.c - * * Implement various cost functions for solving inverse problems * in Yorick. * *----------------------------------------------------------------------------- * * Copyright (C) 1996-2010 Eric Thiébaut * * This software is governed by the CeCILL-C license under French law and * abiding by the rules of distribution of free software. You can use, modify * and/or redistribute the software under the terms of the CeCILL-C license as * circulated by CEA, CNRS and INRIA at the following URL * "http://www.cecill.info". * * As a counterpart to the access to the source code and rights to copy, * modify and redistribute granted by the license, users are provided only * with a limited warranty and the software's author, the holder of the * economic rights, and the successive licensors have only limited liability. * * In this respect, the user's attention is drawn to the risks associated with * loading, using, modifying and/or developing or reproducing the software by * the user in light of its specific status of free software, that may mean * that it is complicated to manipulate, and that also therefore means that it * is reserved for developers and experienced professionals having in-depth * computer knowledge. Users are therefore encouraged to load and test the * software's suitability as regards their requirements in conditions enabling * the security of their systems and/or data to be ensured and, more * generally, to use and operate it in the same conditions as regards * security. * * The fact that you are presently reading this means that you have had * knowledge of the CeCILL-C license and that you accept its terms. * *----------------------------------------------------------------------------- */ #include #include #include extern BuiltIn Y_cost_l2; extern BuiltIn Y_cost_l2l1; extern BuiltIn Y_cost_l2l0; typedef double cost_worker_t(const double hyper[], const double x[], double g[], size_t number, int kase); static cost_worker_t cost_l2; static cost_worker_t cost_l2l1; static cost_worker_t cost_l2l0; static void cost_wrapper(int argc, const char *name, cost_worker_t *worker); void Y_cost_l2(int argc) { cost_wrapper(argc, "l2", cost_l2); } void Y_cost_l2l1(int argc) { cost_wrapper(argc, "l2-l1", cost_l2l1); } void Y_cost_l2l0(int argc) { cost_wrapper(argc, "l2-l0", cost_l2l0); } static void cost_wrapper(int argc, const char *name, cost_worker_t *worker) { const double ZERO = 0.0; double result, mu, tpos, tneg, hyper[3]; Operand op; size_t number; const double *x; double *g ; Symbol *s; long index; int kase, temporary; if (argc < 2 || argc > 3) YError("expecting 2 or 3 arguments"); /* Get the hyper-parameters. */ s = sp - argc + 1; if (s->ops && s->ops->FormOperand(s, &op)->ops->isArray) { number = op.type.number; if (number < 1 || number > 3) { YError("expecting 1, 2 or 3 hyper-parameters"); return; } switch (op.ops->typeID) { case T_CHAR: case T_SHORT: case T_INT: case T_LONG: case T_FLOAT: op.ops->ToDouble(&op); case T_DOUBLE: x = (const double *)op.value; break; default: YError("bad data type for the hyper-parameters"); return; } } else { YError("hyper-parameters must be an array"); return; } if (number == 1) { mu = x[0]; tneg = ZERO; tpos = ZERO; } else if (number == 2) { mu = x[0]; tneg = -x[1]; tpos = +x[1]; } else { mu = x[0]; tneg = x[1]; tpos = x[2]; } kase = 0; if (tneg < ZERO) kase |= 1; else if (tneg != ZERO) YError("lower threshold must be negative"); if (tpos > ZERO) kase |= 2; else if (tpos != ZERO) YError("upper threshold must be positive"); /* Get the parameters. */ ++s; x = (double *)0; temporary = 0; if (s->ops && s->ops->FormOperand(s, &op)->ops->isArray) { switch (op.ops->typeID) { case T_CHAR: case T_SHORT: case T_INT: case T_LONG: case T_FLOAT: op.ops->ToDouble(&op); case T_DOUBLE: x = (const double *)op.value; temporary = (! op.references); number = op.type.number; } } if (! x) { YError("invalid input array"); return; } if (argc == 3) { /* Get the symbol for the gradient. */ /* If gradient is required and input array X is a temporary one, re-use X as the output gradient; otherwise, create a new array from scratch for G (see BuildResultU in ops0.c). */ ++s; if (s->ops!=&referenceSym) YError("needs simple variable reference to store the gradient"); index = s->index; Drop(1); if (temporary) { g = (double *)x; } else { g = ((Array *)PushDataBlock(NewArray(&doubleStruct, op.type.dims)))->value.d; } } else { index = -1L; g = (double *)0; } hyper[0] = mu; hyper[1] = tneg; hyper[2] = tpos; result = worker(hyper, x, g, number, kase); if (index >= 0L) PopTo(&globTab[index]); PushDoubleValue(result); } static double cost_l2(const double hyper[], const double x[], double g[], size_t number, int kase) { double mu, result, gscl, t; size_t i; result = 0.0; mu = hyper[0]; gscl = mu + mu; if (g) { for (i = 0 ; i < number ; ++i) { t = x[i]; g[i] = gscl*t; result += mu*t*t; } } else { for (i = 0 ; i < number ; ++i) { t = x[i]; result += mu*t*t; } } return result; } static double cost_l2l1(const double hyper[], const double x[], double g[], size_t number, int kase) { const double ZERO = 0.0; const double ONE = 1.0; double mu, result, qneg, qpos, fneg, fpos, gscl, t, q; size_t i; result = ZERO; mu = hyper[0]; gscl = mu + mu; switch (kase) { case 0: /* L2 norm for all residuals. */ if (g) { for (i = 0 ; i < number ; ++i) { t = x[i]; g[i] = gscl*t; result += mu*t*t; } } else { for (i = 0 ; i < number ; ++i) { t = x[i]; result += mu*t*t; } } break; case 1: /* L2-L1 norm for negative residuals, L2 norm for positive residuals. */ qneg = ONE/hyper[1]; fneg = gscl*hyper[1]*hyper[1]; if (g) { for (i = 0 ; i < number ; ++i) { if ((t = x[i]) < ZERO) { q = qneg*t; g[i] = gscl*t/(ONE + q); result += fneg*(q - log(ONE + q)); } else { g[i] = gscl*t; result += mu*t*t; } } } else { for (i = 0 ; i < number ; ++i) { if ((t = x[i]) < ZERO) { q = qneg*t; result += fneg*(q - log(ONE + q)); } else { result += mu*t*t; } } } break; case 2: /* L2 norm for negative residuals, L2-L1 norm for positive residuals. */ qpos = ONE/hyper[2]; fpos = gscl*hyper[2]*hyper[2]; if (g) { for (i = 0 ; i < number ; ++i) { if ((t = x[i]) > ZERO) { q = qpos*t; g[i] = gscl*t/(ONE + q); result += fpos*(q - log(ONE + q)); } else { g[i] = gscl*t; result += mu*t*t; } } } else { for (i = 0 ; i < number ; ++i) { if ((t = x[i]) > ZERO) { q = qpos*t; result += fpos*(q - log(ONE + q)); } else { result += mu*t*t; } } } break; case 3: /* L2-L1 norm for all residuals. */ qneg = ONE/hyper[1]; fneg = gscl*hyper[1]*hyper[1]; qpos = ONE/hyper[2]; fpos = gscl*hyper[2]*hyper[2]; if (g) { for (i = 0 ; i < number ; ++i) { if ((t = x[i]) < ZERO) { q = qneg*t; g[i] = gscl*t/(ONE + q); result += fneg*(q - log(ONE + q)); } else { q = qpos*t; g[i] = gscl*t/(ONE + q); result += fpos*(q - log(ONE + q)); } } } else { for (i = 0 ; i < number ; ++i) { if ((t = x[i]) < ZERO) { q = qneg*t; result += fneg*(q - log(ONE + q)); } else { q = qpos*t; result += fpos*(q - log(ONE + q)); } } } break; } return result; } static double cost_l2l0(const double hyper[], const double x[], double g[], size_t number, int kase) { const double ZERO = 0.0; const double ONE = 1.0; double mu, result, tneg, tpos, qneg, qpos, r, s, t; size_t i; result = ZERO; mu = hyper[0]; s = mu + mu; switch (kase) { case 0: /* L2 norm for all residuals. */ if (g) { for (i = 0 ; i < number ; ++i) { r = x[i]; g[i] = s*r; result += r*r; } } else { for (i = 0 ; i < number ; ++i) { r = x[i]; result += r*r; } } break; case 1: /* L2-L0 norm for negative residuals, L2 norm for positive residuals. */ tneg = hyper[1]; qneg = ONE/tneg; if (g) { for (i = 0 ; i < number ; ++i) { if ((r = x[i]) < ZERO) { t = qneg*r; r = tneg*atan(t); g[i] = s*r/(ONE + t*t); result += r*r; } else { g[i] = s*r; result += r*r; } } } else { for (i = 0 ; i < number ; ++i) { if ((r = x[i]) < ZERO) { r = tneg*atan(qneg*r); result += r*r; } else { result += r*r; } } } break; case 2: /* L2 norm for negative residuals, L2-L0 norm for positive residuals. */ tpos = hyper[2]; qpos = ONE/tpos; if (g) { for (i = 0 ; i < number ; ++i) { if ((r = x[i]) > ZERO) { t = qpos*r; r = tpos*atan(t); g[i] = s*r/(ONE + t*t); result += r*r; } else { g[i] = s*r; result += r*r; } } } else { } break; case 3: /* L2-L0 norm for all residuals. */ tneg = hyper[1]; qneg = ONE/tneg; tpos = hyper[2]; qpos = ONE/tpos; if (g) { for (i = 0 ; i < number ; ++i) { if ((r = x[i]) < ZERO) { t = qneg*r; r = tneg*atan(t); } else { t = qpos*r; r = tpos*atan(t); } g[i] = s*r/(ONE + t*t); result += r*r; } } else { for (i = 0 ; i < number ; ++i) { if ((r = x[i]) < ZERO) { r = tneg*atan(qneg*r); } else { r = tpos*atan(qpos*r); } result += r*r; } } break; } return mu*result; } /* * Local Variables: * mode: C * tab-width: 8 * c-basic-offset: 2 * indent-tabs-mode: nil * fill-column: 78 * coding: utf-8 * End: */ Yeti-6.4.0/core/yeti_hash.c000066400000000000000000001021651253351442600155160ustar00rootroot00000000000000/* * yeti_hash.c - * * Implement hash table objects in Yorick. * *----------------------------------------------------------------------------- * * Copyright (C) 1996-2010 Eric Thiébaut * * This software is governed by the CeCILL-C license under French law and * abiding by the rules of distribution of free software. You can use, modify * and/or redistribute the software under the terms of the CeCILL-C license as * circulated by CEA, CNRS and INRIA at the following URL * "http://www.cecill.info". * * As a counterpart to the access to the source code and rights to copy, * modify and redistribute granted by the license, users are provided only * with a limited warranty and the software's author, the holder of the * economic rights, and the successive licensors have only limited liability. * * In this respect, the user's attention is drawn to the risks associated with * loading, using, modifying and/or developing or reproducing the software by * the user in light of its specific status of free software, that may mean * that it is complicated to manipulate, and that also therefore means that it * is reserved for developers and experienced professionals having in-depth * computer knowledge. Users are therefore encouraged to load and test the * software's suitability as regards their requirements in conditions enabling * the security of their systems and/or data to be ensured and, more * generally, to use and operate it in the same conditions as regards * security. * * The fact that you are presently reading this means that you have had * knowledge of the CeCILL-C license and that you accept its terms. * *----------------------------------------------------------------------------- */ #include #include #include #include "config.h" #include "yeti.h" #include "yio.h" #undef H_DEBUG /*---------------------------------------------------------------------------*/ /* DEFINITIONS FOR STRING HASH TABLES */ /* Some macros to adapt implementation. */ #define h_error(MSG) YError(MSG) #define h_malloc(SIZE) p_malloc(SIZE) #define h_free(ADDR) p_free(ADDR) #define OFFSET(type, member) ((char *)&((type *)0)->member - (char *)0) typedef unsigned int h_uint_t; typedef struct h_table h_table_t; typedef struct h_entry h_entry_t; struct h_table { int references; /* reference counter */ Operations *ops; /* virtual function table */ long eval; /* index to eval method (-1L if none) */ int rehash; /* this table needs rehash? */ h_uint_t number; /* number of entries */ h_uint_t size; /* number of allocated slots */ h_entry_t **slot; /* dynamically malloc'ed slots */ }; struct h_entry { h_entry_t *next; /* next entry or NULL */ OpTable *sym_ops; /* client data value = Yorick's symbol */ SymbolValue sym_value; h_uint_t key; /* hash key */ char name[1]; /* entry name, actual size is large enough for whole string name to fit (MUST BE LAST MEMBER) */ }; /* * Tests about the hashing method: * ------------------ -------- --------------------------------------------- * hash code cost(*) histogram of slot occupation * ------------------ -------- --------------------------------------------- * KEY+=(KEY<<1)+CODE 1.38 [1386,545,100,17] * KEY+=(KEY<<2)+CODE 1.42 [1399,522,107,20] * KEY+=(KEY<<3)+CODE 1.43 [1404,511,116,15, 2] * KEY =(KEY<<1)^CODE 1.81 [1434,481, 99,31, 2, 0,0,0,0,0,0,0,0,0,0,0,1] * KEY =(KEY<<2)^CODE 2.09 [1489,401,112,31, 9, 4,1,0,0,0,0,0,0,0,0,0,1] * KEY =(KEY<<3)^CODE 2.82 [1575,310, 95,28,19,10,4,3,2,1,0,0,0,0,0,0,1] * ------------------ -------- --------------------------------------------- * (*) cost = mean # of tests to localize an item * TCL randomize method is: KEY += (KEY<<3) + C * Yorick randomize method is: KEY = (KEY<<1) ^ C */ /* Piece of code to randomize a string. KEY, LEN, CODE and NAME must be variables. KEY, LEN, CODE must be unsigned integers (h_uint_t) and NAME an unsigned character array. */ #define H_HASH(KEY, LEN, NAME, CODE) \ do { \ const unsigned char * __temp__ = (const unsigned char *)NAME; \ for (KEY = LEN = 0; (CODE = __temp__[LEN]); ++LEN) { \ KEY += (KEY<<3) + CODE; \ } \ } while (0) /* Use this macro to check if hash table ENTRY match string NAME. LEN is the length of NAME and KEY the hash key computed from NAME. */ #define H_MATCH(ENTRY, KEY, NAME, LEN) \ ((ENTRY)->key == KEY && ! strncmp(NAME, (ENTRY)->name, LEN)) extern h_table_t *h_new(h_uint_t number); /*----- Create a new empty hash table with at least NUMBER slots pre-allocated (rounded up to a power of 2). */ extern void h_delete(h_table_t *table); /*----- Destroy hash table TABLE and its contents. */ extern h_entry_t *h_find(h_table_t *table, const char *name); /*----- Returns the address of the entry in hash table TABLE that match NAME. If no entry is identified by NAME (or in case of error) NULL is returned. */ extern int h_remove(h_table_t *table, const char *name); /*----- Remove entry identifed by NAME from hash table TABLE. Return value is: 0 if no entry in TABLE match NAME, 1 if and entry matching NAME was found and unreferenced, -1 in case of error. */ extern int h_insert(h_table_t *table, const char *name, Symbol *sym); /*----- Insert entry identifed by NAME with contents SYM in hash table TABLE. Return value is: 0 if no former entry in TABLE matched NAME (hence a new entry was created); 1 if a former entry in TABLE matched NAME (which was properly unreferenced); -1 in case of error. */ /*---------------------------------------------------------------------------*/ /* PRIVATE ROUTINES */ extern BuiltIn Y_is_hash; extern BuiltIn Y_h_new, Y_h_get, Y_h_set, Y_h_has, Y_h_pop, Y_h_stat; extern BuiltIn Y_h_debug, Y_h_keys, Y_h_first, Y_h_next; static h_table_t *get_hash(Symbol *stack); /*----- Returns hash table stored by symbol STACK. STACK get replaced by the referenced object if it is a reference symbol. */ static void set_members(h_table_t *obj, Symbol *stack, int nargs); /*----- Parse arguments STACK[0]..STACK[NARGS-1] as key-value pairs to store in hash table OBJ. */ static int get_hash_and_key(int nargs, h_table_t **table, const char **keystr); static void get_member(Symbol *owner, h_table_t *table, const char *name); /*----- Replace stack symbol OWNER by the contents of entry matching NAME in hash TABLE (taking care of UnRef/Ref properly). */ static void rehash(h_table_t *table); /*----- Rehash hash TABLE (taking care of interrupts). */ /*--------------------------------------------------------------------------*/ /* IMPLEMENTATION OF HASH TABLES AS OPAQUE YORICK OBJECTS */ extern PromoteOp PromXX; extern UnaryOp ToAnyX, NegateX, ComplementX, NotX, TrueX; extern BinaryOp AddX, SubtractX, MultiplyX, DivideX, ModuloX, PowerX; extern BinaryOp EqualX, NotEqualX, GreaterX, GreaterEQX; extern BinaryOp ShiftLX, ShiftRX, OrX, AndX, XorX; extern BinaryOp AssignX, MatMultX; extern UnaryOp EvalX, SetupX, PrintX; static MemberOp GetMemberH; static UnaryOp PrintH; static void FreeH(void *addr); /* ******* Use Unref(hash) ******* */ static void EvalH(Operand *op); Operations hashOps = { &FreeH, T_OPAQUE, 0, /* promoteID = */T_STRING/* means illegal */, "hash_table", {&PromXX, &PromXX, &PromXX, &PromXX, &PromXX, &PromXX, &PromXX, &PromXX}, &ToAnyX, &ToAnyX, &ToAnyX, &ToAnyX, &ToAnyX, &ToAnyX, &ToAnyX, &NegateX, &ComplementX, &NotX, &TrueX, &AddX, &SubtractX, &MultiplyX, &DivideX, &ModuloX, &PowerX, &EqualX, &NotEqualX, &GreaterX, &GreaterEQX, &ShiftLX, &ShiftRX, &OrX, &AndX, &XorX, &AssignX, &EvalH, &SetupX, &GetMemberH, &MatMultX, &PrintH }; /* FreeH is automatically called by Yorick to delete an object instance that is no longer referenced. */ static void FreeH(void *addr) { h_delete((h_table_t *)addr); } /* PrintH is used by Yorick's info command. */ static void PrintH(Operand *op) { h_table_t *obj = (h_table_t *)op->value; char line[80]; ForceNewline(); PrintFunc("Object of type: "); PrintFunc(obj->ops->typeName); PrintFunc(" (evaluator="); if (obj->eval < 0L) { PrintFunc("(nil)"); } else { PrintFunc("\""); PrintFunc(globalTable.names[obj->eval]); PrintFunc("\""); } sprintf(line, ", references=%d, number=%u, size=%u)", obj->references, obj->number, obj->size); PrintFunc(line); ForceNewline(); } /* GetMemberH implements the de-referencing '.' operator. */ static void GetMemberH(Operand *op, char *name) { get_member(op->owner, (h_table_t *)op->value, name); } /* EvalH implements hash table used as a function or as an indexed array. */ static void EvalH(Operand *op) { Symbol *s, *owner; h_table_t *table; DataBlock *old, *db; OpTable *ops; Operations *oper; int i, nargs, offset; /* Get the hash table. */ owner = op->owner; table = (h_table_t *)owner->value.db; nargs = sp - owner; /* number of arguments */ if (table->eval >= 0L) { /* this hash table implement its own eval method */ s = &globTab[table->eval]; while (s->ops == &referenceSym) { s = &globTab[s->index]; } db = s->value.db; /* correctness checked below */ if (s->ops != &dataBlockSym || db == NULL || ((oper = db->ops) != &functionOps && oper != &builtinOps && oper != &auto_ops)) { YError("non-function eval method"); } /* shift stack to prepend reference to eval method */ offset = owner - spBottom; /* stack may move */ if (CheckStack(2)) { owner = spBottom + offset; op->owner = owner; } /*** CRITICAL CODE START ***/ { volatile Symbol *stack = owner; ++nargs; /* one more argument: the object itself */ i = nargs; stack[i].ops = &intScalar; /* set safe OpTable */ sp = (Symbol *)stack + i; /* it is now safe to grow the stack */ while (--i >= 0) { ops = stack[i].ops; stack[i].ops = &intScalar; /* set safe OpTable */ stack[i + 1].value = stack[i].value; stack[i + 1].index = stack[i].index; stack[i + 1].ops = ops; /* set true OpTable *after* initialization */ } stack->value.db = RefNC(db); /* we already know that db != NULL */ stack->ops = &dataBlockSym; } /*** CRITICAL CODE END ***/ /* re-form operand and call Eval method */ op->owner = owner; /* stack may have moved */ op->references = nargs; /* (see FormEvalOp in array.c) */ op->ops = db->ops; op->value = db; op->ops->Eval(op); return; } /* got exactly one argument */ if (nargs == 1 && sp->ops != NULL) { Operand arg; sp->ops->FormOperand(sp, &arg); if (arg.ops->typeID == T_STRING) { if (arg.type.dims == NULL) { char *name = *(char **)arg.value; h_entry_t *entry = h_find(table, name); Drop(1); /* discard key name (after using it) */ old = (owner->ops == &dataBlockSym) ? owner->value.db : NULL; owner->ops = &intScalar; /* avoid clash in case of interrupts */ if (entry != NULL) { if ((ops = entry->sym_ops) == &dataBlockSym) { db = entry->sym_value.db; owner->value.db = Ref(db); } else { owner->value = entry->sym_value; } } else { /* NULLER_DATA_BLOCK NewRange(0L, 0L, 1L, R_NULLER); */ owner->value.db = RefNC(&nilDB); ops = &dataBlockSym; } Unref(old); owner->ops = ops; /* change ops only AFTER value updated */ return; } } else if (arg.ops->typeID == T_VOID) { Drop(2); PushLongValue(table->number); return; } } YError("expecting or a single hash key name or nil (integer indexing no longer supported)"); } /*---------------------------------------------------------------------------*/ /* BUILTIN ROUTINES */ static int is_nil(Symbol *s); static void push_string_value(const char *value); static int is_nil(Symbol *s) { while (s->ops == &referenceSym) s = &globTab[s->index]; return (s->ops == &dataBlockSym && s->value.db == &nilDB); } static void push_string_value(const char *value) { ((Array *)PushDataBlock(NewArray(&stringStruct, NULL)))->value.q[0] = (value ? p_strcpy((char *)value) : NULL); } void Y_is_hash(int nargs) { Symbol *s; int result; if (nargs != 1) YError("is_hash takes exactly one argument"); s = YETI_DEREF_SYMBOL(sp); if (s->ops == &dataBlockSym && s->value.db->ops == &hashOps) { if (((h_table_t *)s->value.db)->eval >= 0L) { result = 2; } else { result = 1; } } else { result = 0; } PushIntValue(result); } void Y_h_debug(int nargs) { int i; for (i=1 ; i<=nargs ; ++i) yeti_debug_symbol(sp - nargs + i); Drop(nargs); } void Y_h_new(int nargs) { h_table_t *obj; int initial_size, got_members; const int min_size = 16; Symbol *stack = sp - nargs + 1; /* first argument (we know that the stack will NOT be moved) */ if (nargs == 0 || (nargs == 1 && is_nil(sp))) { got_members = 0; initial_size = 0; } else { got_members = 1; initial_size = nargs/2; } if (initial_size < min_size) initial_size = min_size; obj = h_new(initial_size); PushDataBlock(obj); if (got_members) set_members(obj, stack, nargs); } void Y_h_set(int nargs) { h_table_t *table; if (nargs < 1 || nargs%2 != 1) YError("usage: h_set,table,\"key\",value,... -or- h_set,table,key=value,..."); table = get_hash(sp - nargs + 1); if (nargs > 1) { set_members(table, sp - nargs + 2, nargs - 1); Drop(nargs-1); /* just left the target object on top of the stack */ } } void Y_h_get(int nargs) { /* Get hash table object and key name, then replace first argument (the hash table object) by entry contents. */ h_table_t *table; const char *name; if (get_hash_and_key(nargs, &table, &name)) { YError("usage: h_get(table, \"key\") -or- h_get(table, key=)"); } Drop(nargs - 1); /* only left hash table on top of stack */ get_member(sp, table, name); /* replace top of stack by entry contents */ } void Y_h_has(int nargs) { int result; h_table_t *table; const char *name; if (get_hash_and_key(nargs, &table, &name)) { YError("usage: h_has(table, \"key\") -or- h_has(table, key=)"); } result = (h_find(table, name) != NULL); Drop(nargs); PushIntValue(result); } void Y_h_pop(int nargs) { h_uint_t key, len, code, index; h_entry_t *entry, *prev; h_table_t *table; const char *name; Symbol *stack = sp + 1; /* location to put new element */ if (get_hash_and_key(nargs, &table, &name)) { YError("usage: h_pop(table, \"key\") -or- h_pop(table, key=)"); } /* *** Code more or less stolen from 'h_remove' *** */ if (name) { /* Compute hash key. */ H_HASH(key, len, name, code); /* Find the entry. */ prev = NULL; index = (key % table->size); entry = table->slot[index]; while (entry) { if (H_MATCH(entry, key, name, len)) { /* Delete the entry: (1) remove entry from chained list of entries in its slot, (2) pop contents of entry, (3) free entry memory. */ /*** CRITICAL CODE BEGIN ***/ if (prev) prev->next = entry->next; else table->slot[index] = entry->next; stack->ops = entry->sym_ops; stack->value = entry->sym_value; h_free(entry); --table->number; sp = stack; /* sp updated AFTER new stack element finalized */ /*** CRITICAL CODE END ***/ return; /* entry found and popped */ } prev = entry; entry = entry->next; } } PushDataBlock(RefNC(&nilDB)); /* entry not found */ } void Y_h_number(int nargs) { Symbol *s; long result; if (nargs != 1) YError("h_number takes exactly one argument"); s = YETI_DEREF_SYMBOL(sp); if (s->ops != &dataBlockSym || s->value.db->ops != &hashOps) { YError("inexpected non-hash table argument"); } result = ((h_table_t *)s->value.db)->number; PushLongValue(result); } void Y_h_keys(int nargs) { h_entry_t *entry; h_table_t *table; char **result; h_uint_t i, j, number; if (nargs != 1) YError("h_keys takes exactly one argument"); table = get_hash(sp); number = table->number; if (number) { result = YETI_PUSH_NEW_Q(yeti_start_dimlist(number)); j = 0; for (i = 0; i < table->size; ++i) { for (entry = table->slot[i]; entry != NULL; entry = entry->next) { if (j >= number) YError("corrupted hash table"); result[j++] = p_strcpy(entry->name); } } } else { PushDataBlock(RefNC(&nilDB)); } } void Y_h_first(int nargs) { h_table_t *table; char *name; h_uint_t j, n; h_entry_t **slot; if (nargs != 1) YError("h_first takes exactly one argument"); table = get_hash(sp); name = NULL; slot = table->slot; n = table->size; for (j = 0; j < n; ++j) { if (slot[j]) { name = slot[j]->name; break; } } push_string_value(name); } void Y_h_next(int nargs) { Operand arg; h_table_t *table; h_entry_t *entry, **slot; const char *name; h_uint_t key, len, code, j, n; if (nargs != 2) YError("h_next takes exactly two arguments"); table = get_hash(sp - 1); /* Get scalar string argument. */ if (sp->ops == NULL) { bad_arg: YError("expecting a scalar string"); } sp->ops->FormOperand(sp, &arg); if (arg.type.dims != NULL || arg.ops->typeID != T_STRING) { goto bad_arg; } name = *(const char **)arg.value; if (name == NULL) { /* Left nil string as result on top of stack. */ return; } /* Compute hash key. */ H_HASH(key, len, name, code); /* Locate matching entry. */ j = (key % table->size); slot = table->slot; for (entry = slot[j]; entry != NULL; entry = entry->next) { if (H_MATCH(entry, key, (const char *)name, len)) { /* Get 'next' hash entry. */ if (entry->next) { name = (const char *)entry->next->name; } else { name = (const char *)0; n = table->size; while (++j < n) { entry = slot[j]; if (entry) { name = (const char *)entry->name; break; } } } push_string_value(name); return; } } YError("hash entry not found"); } void Y_h_stat(int nargs) { Array *array; h_entry_t *entry, **slot; h_table_t *table; long *result; h_uint_t i, number, max_count=0, sum_count=0; if (nargs != 1) YError("h_stat takes exactly one argument"); table = get_hash(sp); number = table->number; slot = table->slot; array = YETI_PUSH_NEW_ARRAY_L(yeti_start_dimlist(number + 1)); result = array->value.l; for (i = 0; i <= number; ++i) { result[i] = 0L; } for (i = 0; i < table->size; ++i) { h_uint_t count = 0; for (entry = slot[i]; entry != NULL; entry = entry->next) { ++count; } if (count <= number) { ++result[count]; } if (count > max_count) { max_count = count; } sum_count += count; } if (sum_count != number) { table->number = sum_count; YError("corrupted hash table"); } } #if YETI_MUST_DEFINE_AUTOLOAD_TYPE typedef struct autoload_t autoload_t; struct autoload_t { int references; /* reference counter */ Operations *ops; /* virtual function table */ long ifile; /* index into table of autoload files */ long isymbol; /* global symtab index */ autoload_t *next; /* linked list for each ifile */ }; #endif /* YETI_MUST_DEFINE_AUTOLOAD_TYPE */ void Y_h_evaluator(int nargs) { static long default_eval_index = -1; /* index of default eval method in globTab */ static unsigned char type[256]; /* array of integers to check consistency of a symbol's name */ h_table_t *table; char *str; long old_index; int push_result; /* Initialization of internals (digits must have lowest values). */ if (default_eval_index < 0L) { int i; unsigned char value = 0; for (i = 0; i < 256; ++i) { type[i] = value; } for (i = '0'; i <= '9'; ++i) { type[i] = ++value; } for (i = 'A'; i <= 'Z'; ++i) { type[i] = ++value; } type['_'] = ++value; for (i = 'a'; i <= 'z'; ++i) { type[i] = ++value; } default_eval_index = Globalize("*hash_evaluator*", 0L); } if (nargs < 1 || nargs > 2) YError("h_evaluator takes 1 or 2 arguments"); push_result = ! yarg_subroutine(); table = get_hash(sp - nargs + 1); old_index = table->eval; if (nargs == 2) { long new_index = -1L; Symbol *s = sp; while (s->ops == &referenceSym) { s = &globTab[s->index]; } if (s->ops == &dataBlockSym) { Operations *ops = s->value.db->ops; if (ops == &functionOps) { new_index = ((Function *)s->value.db)->code[0].index; } else if (ops == &builtinOps) { new_index = ((BIFunction *)s->value.db)->index; } else if (ops == &auto_ops) { new_index = ((autoload_t *)s->value.db)->isymbol; } else if (ops == &stringOps) { Array *a = (Array *)s->value.db; if (a->type.dims == NULL) { /* got a scalar string */ unsigned char *q = (unsigned char *)a->value.q[0]; if (q == NULL) { /* nil symbol's name corresponds to default value */ new_index = default_eval_index; } else { /* symbol's name must not have a zero length, nor start with an invalid character nor a digit */ if (type[q[0]] > 10) { int c, i = 0; for (;;) { if ((c = q[++i]) == 0) { new_index = Globalize((char *)q, i); break; } if (! type[c]) { /* symbol's must not contain an invalid character */ break; } } } } } } else if (ops == &voidOps) { /* void symbol corresponds to default value */ new_index = default_eval_index; } } if (new_index < 0L) { YError("evaluator must be a function or a valid symbol's name"); } if (new_index == default_eval_index) { table->eval = -1L; } else { table->eval = new_index; } } if (push_result) { if (old_index >= 0L && old_index != default_eval_index) { str = globalTable.names[old_index]; } else { str = (char *)0; } push_string_value(str); } } /*---------------------------------------------------------------------------*/ static void get_member(Symbol *owner, h_table_t *table, const char *name) { OpTable *ops; h_entry_t *entry = h_find(table, name); DataBlock *old = (owner->ops == &dataBlockSym) ? owner->value.db : NULL; owner->ops = &intScalar; /* avoid clash in case of interrupts */ if (entry) { if ((ops = entry->sym_ops) == &dataBlockSym) { DataBlock *db = entry->sym_value.db; owner->value.db = Ref(db); } else { owner->value = entry->sym_value; } } else { owner->value.db = RefNC(&nilDB); ops = &dataBlockSym; } owner->ops = ops; /* change ops only AFTER value updated */ Unref(old); } /* get args from the top of the stack: first arg is hash table, second arg should be key name or keyword followed by third nil arg */ static int get_hash_and_key(int nargs, h_table_t **table, const char **keystr) { Operand op; Symbol *s, *stack; stack = sp - nargs + 1; if (nargs == 2) { /* e.g.: foo(table, "key") */ s = stack + 1; /* symbol for key */ if (s->ops) { s->ops->FormOperand(s, &op); if (! op.type.dims && op.ops->typeID == T_STRING) { *table = get_hash(stack); *keystr = *(char **)op.value; return 0; } } } else if (nargs == 3) { /* e.g.: foo(table, key=) */ if (! (stack + 1)->ops && is_nil(stack + 2)) { *table = get_hash(stack); *keystr = globalTable.names[(stack + 1)->index]; return 0; } } return -1; } static h_table_t *get_hash(Symbol *stack) { DataBlock *db; Symbol *sym = (stack->ops == &referenceSym) ? &globTab[stack->index] : stack; if (sym->ops != &dataBlockSym || sym->value.db->ops != &hashOps) YError("expected hash table object"); db = sym->value.db; if (sym != stack) { /* Replace reference onto the stack (equivalent to the statement ReplaceRef(s); see ydata.c for actual code of this routine). */ stack->value.db = Ref(db); stack->ops = &dataBlockSym; /* change ops only AFTER value updated */ } return (h_table_t *)db; } static void set_members(h_table_t *table, Symbol *stack, int nargs) { Operand op; int i; const char *name; if (nargs%2 != 0) YError("last key has no value"); for (i = 0; i < nargs; i += 2, stack += 2) { /* Get key name. */ if (stack->ops) { stack->ops->FormOperand(stack, &op); if (! op.type.dims && op.ops == &stringOps) { name = *(char **)op.value; } else { name = NULL; } } else { name = globalTable.names[stack->index]; } if (! name) { YError("bad key, expecting a non-nil scalar string name or a keyword"); } /* Replace value. */ h_insert(table, name, stack + 1); } } /*--------------------------------------------------------------------------*/ /* The following code implement management of hash tables with string keys and aimed at the storage of Yorick DataBlock. The randomization algorithm is taken from Tcl (which is 25-30% more efficient than Yorick's algorithm). */ h_table_t *h_new(h_uint_t number) { h_uint_t nbytes, size = 1; h_table_t *table; /* Member SIZE of a hash table is always a power of 2, greater or equal 2*NUMBER (twice the number of entries in the table). */ while (size < number) { size <<= 1; } size <<= 1; nbytes = size*sizeof(h_entry_t *); table = h_malloc(sizeof(h_table_t)); if (table == NULL) { enomem: h_error("insufficient memory for new hash table"); return NULL; } table->slot = h_malloc(nbytes); if (table->slot == NULL) { h_free(table); goto enomem; } memset(table->slot, 0, nbytes); table->rehash = 0; table->references = 0; table->ops = &hashOps; table->eval = -1L; table->number = 0; table->size = size; return table; } void h_delete(h_table_t *table) { h_uint_t i, size;; h_entry_t *entry, **slot; if (table != NULL) { if (table->rehash) { rehash(table); } size = table->size; slot = table->slot; for (i = 0; i < size; ++i) { entry = slot[i]; while (entry) { void *addr = entry; if (entry->sym_ops == &dataBlockSym) { DataBlock *db = entry->sym_value.db; Unref(db); } entry = entry->next; h_free(addr); } } h_free(slot); h_free(table); } } h_entry_t *h_find(h_table_t *table, const char *name) { h_uint_t key, len, code; h_entry_t *entry; /* Check key string and compute hash key. */ if (name == NULL) return NULL; /* not found */ H_HASH(key, len, name, code); /* Ensure consistency of the buckets. */ if (table->rehash) { rehash(table); } /* Locate matching entry. */ for (entry = table->slot[key % table->size]; entry != NULL; entry = entry->next) { if (H_MATCH(entry, key, name, len)) return entry; } /* Not found. */ return NULL; } int h_remove(h_table_t *table, const char *name) { h_uint_t key, len, code, index; h_entry_t *entry, *prev; /* Check key string and compute hash key. */ if (name == NULL) return 0; /* not found */ H_HASH(key, len, name, code); /* Ensure consistency of the buckets. */ if (table->rehash) { rehash(table); } /* Find the entry. */ prev = NULL; index = key % table->size; entry = table->slot[index]; while (entry != NULL) { if (H_MATCH(entry, key, name, len)) { /* Delete the entry: (1) remove entry from chained list of entries in its slot, (2) unreference contents of entry, (3) free entry memory. */ /*** CRITICAL CODE BEGIN ***/ if (prev != NULL) { prev->next = entry->next; } else { table->slot[index] = entry->next; } if (entry->sym_ops == &dataBlockSym) { DataBlock *db = entry->sym_value.db; Unref(db); } h_free(entry); --table->number; /*** CRITICAL CODE END ***/ return 1; /* entry found and deleted */ } prev = entry; entry = entry->next; } return 0; /* not found */ } int h_insert(h_table_t *table, const char *name, Symbol *sym) { h_uint_t key, len, code, index; h_entry_t *entry; DataBlock *db; /* Check key string. */ if (name == NULL) { h_error("invalid nil key name"); return -1; /* error */ } /* Compute hash key. */ H_HASH(key, len, name, code); /* Ensure consistency of the buckets. */ if (table->rehash) { rehash(table); } /* Prepare symbol for storage. */ if (sym->ops == &referenceSym) { /* We do not need to call ReplaceRef because the referenced symbol will be properly inserted into the hash table and the stack symbol will be left unchanged. */ sym = &globTab[sym->index]; } if (sym->ops == &dataBlockSym && sym->value.db->ops == &lvalueOps) { /* Symbol is an LValue, e.g. part of an array, we fetch (make a private copy of) the data to release the link on the total array. */ FetchLValue(sym->value.db, sym); } /* Replace contents of the entry with same key name if it already exists. */ for (entry = table->slot[key % table->size]; entry != NULL; entry = entry->next) { if (H_MATCH(entry, key, name, len)) { /*** CRITICAL CODE BEGIN ***/ db = (entry->sym_ops == &dataBlockSym) ? entry->sym_value.db : NULL; entry->sym_ops = &intScalar; /* avoid clash in case of interrupts */ Unref(db); if (sym->ops == &dataBlockSym) { db = sym->value.db; entry->sym_value.db = Ref(db); } else { entry->sym_value = sym->value; } entry->sym_ops = sym->ops; /* change ops only AFTER value updated */ /*** CRITICAL CODE END ***/ return 1; /* old entry replaced */ } } /* Must create a new entry. */ if (((table->number + 1)<<1) > table->size) { /* Must grow hash table slot array, i.e. "re-hash". This is done in such a way that the buckets array is always consistent. This is needed to be robust in case of interrupts (at most one entry could be lost in this case). */ h_entry_t **old, **new; h_uint_t size; size_t nbytes; size = table->size; nbytes = size*sizeof(h_entry_t *); old = table->slot; new = h_malloc(2*nbytes); if (new == NULL) { not_enough_memory: h_error("insufficient memory to store new hash entry"); return -1; } memcpy(new, old, nbytes); memset((char *)new + nbytes, 0, nbytes); /*** CRITICAL CODE BEGIN ***/ table->slot = new; table->rehash = 1; h_free(old); /*** CRITICAL CODE END ***/ rehash(table); } /* Create new entry. */ entry = h_malloc(OFFSET(h_entry_t, name) + 1 + len); if (entry == NULL) goto not_enough_memory; memcpy(entry->name, name, len+1); entry->key = key; if (sym->ops == &dataBlockSym) { db = sym->value.db; entry->sym_value.db = Ref(db); } else { entry->sym_value = sym->value; } entry->sym_ops = sym->ops; /* Insert new entry. */ index = key % table->size; /*** CRITICAL CODE BEGIN ***/ entry->next = table->slot[index]; table->slot[index] = entry; ++table->number; /*** CRITICAL CODE END ***/ return 0; /* a new entry was created */ } /* This function rehash a recently grown hash table. The complications come from the needs to be robust with respet to interruptions so that the task can be interrupted at (almost) any time and resumed later with a minimun risk to loose entries. */ static void rehash(h_table_t *table) { h_entry_t **slot, *prev, *entry; h_uint_t i, j, new_size, old_size; if (table->rehash) { slot = table->slot; new_size = (old_size = table->size) << 1; for (i = 0; i < old_size; ++i) { prev = NULL; entry = slot[i]; while (entry != NULL) { /* Compute index of the entry in the full array of buckets. */ j = entry->key % new_size; if (j == i) { /* No change in entry location, just move to next entry in bucket. */ prev = entry; entry = entry->next; } else { /*** CRITICAL CODE BEGIN ***/ /* Remove entry from its bucket. */ if (prev == NULL) { slot[i] = entry->next; } else { prev->next = entry->next; } /* Insert entry in its new bucket. */ entry->next = slot[j]; slot[j] = entry; /*** CRITICAL CODE END ***/ /* Move to next entry in former bucket. */ entry = ((prev == NULL) ? slot[i] : prev->next); } } } /*** CRITICAL CODE BEGIN ***/ table->rehash = 0; /* clear before setting size */ table->size = new_size; /*** CRITICAL CODE END ***/ } } /* * Local Variables: * mode: C * tab-width: 8 * c-basic-offset: 2 * indent-tabs-mode: nil * fill-column: 78 * coding: utf-8 * End: */ Yeti-6.4.0/core/yeti_hash_test.i000066400000000000000000000334771253351442600165740ustar00rootroot00000000000000 func eval_me(self, x, y) { return abs(self.x - x, self.y - y); } func call_me(self, x, y) { return sin(self.x - x)*cos(self.y - y); } func test1 { /* create a hash table object */ tab = h_new(x=[1,4], y=[7,9]); /* set evaluator (use function name since it does not yet exists) */ h_evaluator, tab, "eval_me"; tab(4,11); h_show, tab; other = h_new(x=3.6, y=11.0); h_evaluator, other, call_me; } func db_get_member(db, name, type=, keys=) { if (is_void(keys)) { keys = h_keys(db); n = numberof(keys); if (n > 1) keys = keys(sort(keys)); } else { n = numberof(keys); } if (n) { x = array((is_void(type) ? double : type), n); for (i=1 ; i<=n ; ++i) { x(i) = h_get(h_get(db, keys(i)), name); } return x; } } func h_analyse(s) { if (is_hash(s)) { s = h_stat(s); } else if (! is_integer(s)) { error, "expecting a hash table or an array with hash statistics"; } n = numberof(s); q = double(indgen(0:n-1)); write, "number = ", sum(q*s); write, "empty = ", s(1); write, "avg = ", sum(q*q*s)/sum(q*s); } func _h_test_eval1(self, key) { return h_get(self, key); } func h_test(repeat) { require, "utils.i"; if (is_void(repeat)) repeat = 1; names = ["B_nu", "B_nu_bar", "B_nu_scale", "GISTPATH", "GIST_FORMAT", "HX_blkbnd", "HX_block", "LPR_FORMAT", "LUrcond", "LUsolve", "PS2EPSI_FORMAT", "QRsolve", "Ray_Path", "SVdec", "SVrank", "SVsolve", "TDsolve", "Y_LAUNCH", "Y_SITE", "Y_VERSION", "_", "__alpha", "__cray", "__dec", "__i86", "__ibmpc", "__mac", "__macl", "__sgi64", "__sun", "__sun3", "__vax", "__vaxg", "__xdr", "_car", "_cat", "_cdr", "_cpy", "_dgecox", "_dgelss", "_dgelx", "_dgesv", "_dgesvx", "_dgetrf", "_dgtsv", "_get_matrix", "_get_msize", "_init_clog", "_init_drat", "_init_pdb", "_jc", "_jr", "_jt", "_len", "_lst", "_map", "_not_cdf", "_not_pdb", "_not_pdbf", "_nxt", "_pl_init", "_plmk_color", "_plmk_count", "_plmk_markers", "_plmk_msize", "_plmk_width", "_prt", "_raw1_flat", "_raw1_linear", "_raw2_flat", "_raw2_linear", "_raw_pcens", "_raw_track", "_ray_integ", "_ray_reduce", "_read", "_rev", "_roll2", "_set_pdb", "_timer_elapsed", "_to_real_system", "_write", "a", "abs", "acos", "acosh", "add_member", "add_next_file", "add_record", "add_variable", "adjust", "adjust_ireg", "akap", "alloc_mesh", "allof", "alpha_primitives", "am_subroutine", "anh", "animate", "anorm", "anyof", "apply_funcs", "area", "array", "arrowl", "arroww", "asin", "asinh", "aspect", "at_pdb_close", "at_pdb_open", "atan", "atanh", "auss_tsigma", "avg", "ax", "b", "b_optional", "backup", "base", "batch", "bins", "blks", "block", "bn", "bnds", "bndy", "bnu", "boltz", "bookmark", "bound", "bounds", "bp", "break", "bytscl", "c", "c_adjust", "call", "catch", "cc", "cd", "ceil", "cell", "cells", "changed", "char", "clogfile", "close", "close102", "close102_default", "closed", "cmax", "cmin", "collect", "color", "color_bar", "colors", "command_line", "complex", "conj", "continue", "contour", "conv3_rays", "copyright", "cos", "cosh", "cray_primitives", "create", "createb", "cs_adjust", "csch", "cum", "current_window", "d", "data_align", "dbauto", "dbcont", "dbdis", "dbdt", "dbexit", "dbinfo", "dbret", "dbskip", "dbup", "dd", "dec_primitives", "default_gate", "default_integrate", "default_ocompute", "dif", "digitize", "dims", "dimsof", "dirs", "disassemble", "display", "do", "double", "drat_akap", "drat_amult", "drat_backlight", "drat_channel", "drat_compress", "drat_ekap", "drat_emult", "drat_gate", "drat_gav", "drat_gb", "drat_glist", "drat_integrate", "drat_ireg", "drat_ireg_adj", "drat_isymz", "drat_khold", "drat_lhold", "drat_linear", "drat_nomilne", "drat_oadjust", "drat_ocompute", "drat_omult", "drat_quiet", "drat_rt", "drat_start", "drat_static", "drat_stop", "drat_symmetry", "drat_zt", "ds", "dummy", "dump", "dump_clog", "dx", "dy", "dz", "e", "e30", "ecolor", "edges", "edit_times", "ee", "ekap", "elapsed", "else", "eps", "eq_nocopy", "erfc", "error", "ewidth", "exist", "exit", "exp", "expm1", "extern", "f", "f__map", "f_save", "face", "ff", "fflush", "fft", "fft_braw", "fft_dirs", "fft_fraw", "fft_init", "fft_inplace", "fft_raw", "fft_setup", "fi", "file", "filename", "final", "find_boundary", "first", "flip", "floor", "fma", "font", "for", "form_mesh", "format", "fudge", "full", "func", "gauss_gate", "gauss_int", "gauss_norm", "gauss_t0", "gauss_tsigma", "gaussian_gate", "gav", "gb", "get_addrs", "get_argv", "get_command_line", "get_cwd", "get_env", "get_home", "get_kaps", "get_member", "get_ncycs", "get_primitives", "get_ray_path", "get_s0", "get_std_limits", "get_times", "get_vars", "gotopen", "gridxy", "group", "grow", "guess_symmetry", "has_ireg", "has_records", "has_time", "hcp", "hcp_file", "hcp_finish", "hcp_out", "hcpoff", "hcpon", "hcps", "height", "help", "help_file", "help_topic", "help_worker", "hex24b_track", "hex24f_track", "hex5_track", "hex_mesh", "hex_mesh2", "hex_query", "hex_startflag", "hex_triang", "hide", "histeq_scale", "histogram", "hnu", "hnub", "hollow", "hooks", "how", "hydra_adj", "hydra_blks", "hydra_bnd", "hydra_mesh", "hydra_mrk", "hydra_start", "hydra_xyz", "i", "i0", "i86_primitives", "ic", "if", "ijk_max", "ijx", "im", "im_part", "ims", "imsof", "include", "indent", "indgen", "info", "install_struct", "instant", "integ", "integ_flat", "integ_linear", "integrator", "intens", "internal_rays", "interp", "irays", "ireg", "is_array", "is_complex", "is_func", "is_present", "is_range", "is_stream", "is_struct", "is_void", "item", "iwork", "ix", "j", "j0", "jc", "jr", "jt", "justify", "jx", "k0", "kc", "keep", "keybd_focus", "khold", "kmax", "kt", "kxlm", "l", "label", "labl", "labs", "last", "ldims", "ldvt", "legal", "legend", "legends", "length", "levs", "lhold", "library", "limits", "line", "list", "list__map", "ljdir", "ljoff", "lm", "lmax", "local", "log", "log10", "logxy", "long", "lsdir", "lwork", "m", "mac_primitives", "macl_primitives", "make_sphere", "mark", "marker", "marks", "mask", "max", "max_trans", "mbnds", "mcolor", "median", "merge", "merge2", "mesh", "mesh_loc", "min", "mkdir", "mnmax", "more_args", "mouse", "moush", "mphase", "msg", "msize", "mspace", "n", "n0", "n1", "n2", "n3", "n4", "n5", "n6", "n7", "n8", "n9", "nallof", "name", "nameof", "nblk", "nbnds", "nc", "ncyc", "ndb", "ndims", "nedges", "next_arg", "ng", "ngroup", "ni", "ni1", "nij", "nj", "nj1", "njk", "nk", "nk1", "nki", "nlist", "nolj", "nomilne", "noneof", "norj", "nrays", "nrhs", "ntot", "nub", "numberof", "nz", "nzones", "o", "odd", "old", "one_norm", "opac", "opaque", "open", "open102", "openb", "openb_hooks", "orgs", "orgsof", "orient", "orig", "ouble", "outname", "p", "pair", "pairs", "palette", "pat", "path", "pause", "pc_primitives", "pcen", "pcen_source", "periodic", "phi", "phi12", "phi_up", "pi", "pic3_rays", "picture_rays", "pivot", "plc", "pldefault", "pldj", "pledit", "plf", "plfc", "plfc_colors", "plfc_levs", "plfp", "plg", "pli", "plm", "plmesh", "plmk", "plmk_default", "plq", "plsys", "plt", "plt1", "pltitle", "pltitle_font", "pltitle_height", "plv", "pointer", "poly", "popen", "port", "power", "pr1", "prev", "primitives", "print", "print_format", "process_argv", "psum", "pt1", "pt2", "ptcen", "px", "q", "q_up", "qrt", "query", "quit", "qx", "qxy", "qy", "qz", "r", "radius", "random", "random_seed", "randomize", "range", "raw_collect", "raw_legal", "raw_not_cdf", "raw_read_n", "raw_show", "raw_style", "ray", "rays", "rcond", "rdline", "rdv", "re", "re_part", "read", "read_clog", "read_n", "recover_file", "redraw", "reg_track", "region", "remove", "rename", "require", "reset_options", "reshape", "restore", "result", "result__map", "return", "reverse", "rgb_read", "rjdir", "rjoff", "rmdir", "roll", "rphase", "rspace", "rt", "s", "save", "scalar", "sech", "seed", "selfem", "set_blocksize", "set_filesize", "set_idler", "set_path", "set_primitives", "set_site", "set_tolerances", "set_vars", "setup", "sgi64_primitives", "short", "show", "sign", "sin", "sinh", "sizeof", "slimits", "slims", "smooth", "snap", "snap_dt", "snap_i", "snap_result", "snap_worker", "solid", "sort", "source", "span", "spanl", "spann", "split", "sqrt", "sread", "start", "stds", "stds1", "stop", "streak", "streak_save", "streak_saver", "streak_times", "stride", "string", "strlen", "strmatch", "strpart", "strtok", "struct", "struct_align", "structof", "sum", "sun3_primitives", "sun_primitives", "swrite", "symbol_def", "symbol_set", "sys", "system", "t", "t0", "tail", "tail__map", "tan", "tanh", "text", "theta", "theta12", "theta_up", "time", "timer", "timer_print", "times", "timestamp", "title", "tmp", "tn", "top", "topic", "tops", "tosys", "track_integ", "track_rays", "track_reduce", "track_solve", "transp", "transpose", "triangle", "tsigma", "tt", "type", "typeof", "u", "ublk", "uncen", "uncp", "unit", "unzoom", "update", "update_mesh", "updateb", "use_origins", "v", "value", "vars", "vax_primitives", "vaxg_primitives", "vert", "viewport", "void", "volume", "vsq", "vt", "w", "w2", "warranty", "where", "where2", "which", "while", "width", "window", "winkill", "work", "write", "ws", "ww", "x", "x3ff", "x3ffe", "x4000", "x401", "x7f", "x81", "xc", "xdr_primitives", "xm", "xp", "xpict", "xtitle", "xy", "xytitles", "xyz", "y", "yPDBclose", "yPDBopen", "yc", "ym", "ymax", "ymin", "yorick_init", "yorick_stats", "ypict", "ytitle", "z", "zcen", "zmax", "zmin", "zncen", "zone", "zoom_factor", "zsym", "zt"]; /* reserved keywords */ i = 0; reserved = h_new("do",++i, "for",++i, "while",++i, "if",++i, "else",++i, "goto",++i, "break",++i, "continue",++i, "func",++i, "return",++i, "struct",++i, "extern",++i, "local",++i, "more_args",++i, "next_arg",++i, /* "char",++i, "short",++i, "int",++i, "long",++i, "float",++i, "double",++i, "complex",++i, "string",++i, "pointer",++i, */ "min",++i, "max",++i, "avg",++i, "rms",++i, "sum",++i, "mnx",++i, "mxx",++i, "dif",++i, "pcen",++i, "psum",++i, "zcen",++i, "cum",++i, "ptp",++i, "uncp",++i); n = numberof(names); ok = "OK - %s\n"; tab = h_new(); for (i = 1; i <= n; ++i) { h_set, tab, names(i), names(i); } if ((value = tab()) != n) { error, swrite(format="tab() != %d\n", n); } else { write, format=ok, "tab() yields number of keys"; } /* Check h_keys(). */ value = h_keys(tab); if (numberof(value) != n) { error, swrite(format="tab() != %d\n", n); } else if (anyof(value(sort(value)) != names(sort(names)))) { error, swrite(format="h_keys(tab) != list of keys\n", n); } else { write, format=ok, "h_keys(tab) yields list of keys"; } /* Check values stored into hash table. */ for (i = 1; i <= n; ++i) { key = names(i); if ((value = h_get(tab, key)) != key) { error, swrite(format="h_get(tab, \"%s\") != \"%s\"\n", value, key); } } write, format=ok, "h_get(tab, \"key\") yields value"; for (i = 1; i <= n; ++i) { key = names(i); if ((value = tab(key)) != key) { error, swrite(format="tab(\"%s\") != \"%s\"\n", value, key); } } write, format=ok, "tab(\"key\") yields value"; /* Check for tab.key and h_get(tab, key=) syntaxes for non-keyword keys. */ local failure; for (i = 1; i <= n; ++i) { key = names(i); if (reserved(key)) continue; expr = swrite(format="tab.%s == \"%s\"", key, key); failure = 2; /* needed to detect syntax errors */ include, ["failure = (" + expr + " ? 0 : 1);"], 1; if (failure) { error, swrite(format="assertion failed: %s\n", expr); } } write, format=ok, "tab.key yields value"; for (i = 1; i <= n; ++i) { key = names(i); if (reserved(key)) continue; expr = swrite(format="h_get(tab, %s=) == \"%s\"", key, key); failure = 2; /* needed to detect syntax errors */ include, ["failure = (" + expr + " ? 0 : 1);"], 1; if (failure) { error, swrite(format="assertion failed: %s\n", expr); } } write, format=ok, "h_get(tab, key=) yields value"; /* Check h_evaluator. */ h_evaluator, tab, "_h_test_eval1"; for (i = 1; i <= n; ++i) { key = names(i); if ((value = tab(key)) != key) { error, swrite(format="tab(\"%s\") != \"%s\"\n", value, key); } } write, format=ok, "tab(\"key\") yields value with h_evaluator"; /* Speed test (can also be used to detect memory leaks). */ write, ""; timer_start; for (k = 1; k <= repeat; ++k) { tab = h_new(); for (i = 1; i <= n; ++i) { h_set, tab, names(i), i; } } timer_elapsed, repeat; stat = h_stat(tab); h_analyse, stat; stat(1:max(where(stat))); } h_test; Yeti-6.4.0/core/yeti_math.c000066400000000000000000000177111253351442600155260ustar00rootroot00000000000000/* * yeti_math.c - * * Additional math builtin functions and generalized matrix-vector * multiplication for Yorick. * *----------------------------------------------------------------------------- * * Copyright (C) 1996-2010 Eric Thiébaut * * This software is governed by the CeCILL-C license under French law and * abiding by the rules of distribution of free software. You can use, modify * and/or redistribute the software under the terms of the CeCILL-C license as * circulated by CEA, CNRS and INRIA at the following URL * "http://www.cecill.info". * * As a counterpart to the access to the source code and rights to copy, * modify and redistribute granted by the license, users are provided only * with a limited warranty and the software's author, the holder of the * economic rights, and the successive licensors have only limited liability. * * In this respect, the user's attention is drawn to the risks associated with * loading, using, modifying and/or developing or reproducing the software by * the user in light of its specific status of free software, that may mean * that it is complicated to manipulate, and that also therefore means that it * is reserved for developers and experienced professionals having in-depth * computer knowledge. Users are therefore encouraged to load and test the * software's suitability as regards their requirements in conditions enabling * the security of their systems and/or data to be ensured and, more * generally, to use and operate it in the same conditions as regards * security. * * The fact that you are presently reading this means that you have had * knowledge of the CeCILL-C license and that you accept its terms. * *----------------------------------------------------------------------------- */ #include #include "config.h" #include "yeti.h" #include "ydata.h" #include "bcast.h" #include "yio.h" /* Some constants. */ #define PI 3.141592653589793238462643383279502884197 #define TWO_PI 6.283185307179586476925286766559005768394 #define ONE_OVER_TWO_PI 0.1591549430918953357688837633725143620345 /* Use definition of sinc in use in signal processing (i.e. normalized sinc) otherwise use mathematical definition. */ #ifndef NORMALIZED_SINC # define NORMALIZED_SINC 1 #endif /* * Utility macros: STRINGIFY takes an argument and wraps it in "" (double * quotation marks), JOIN joins two arguments. Both are capable of * performing macro expansion of their arguments. */ #define VERBATIM(x) x #if defined(__STDC__) || defined(__cplusplus) || defined(c_plusplus) # define STRINGIFY(x) STRINGIFY1(x) # define STRINGIFY1(x) #x # define JOIN(a,b) JOIN1(a,b) # define JOIN1(a,b) a##b #else # define STRINGIFY(x) "x" # define JOIN(a,b) VERBATIM(a)/**/VERBATIM(b) #endif extern BuiltIn Y_sinc; /* ANSI standard math.h functions */ extern double sin(double); extern double cos(double); extern double tan(double); extern double asin(double); extern double acos(double); extern double atan(double); extern double atan2(double, double); extern double sinh(double); extern double cosh(double); extern double tanh(double); extern double exp(double); extern double log(double); extern double log10(double); extern double sqrt(double); extern double ceil(double); extern double floor(double); /* function either present in math library or implemented in nonc.c */ extern double hypot(double, double); /* Some functions and definitions stolen from Yorick std0.c and ops0.c in order to not use 'private' Yorick API. */ typedef void looper_t(double *dst, const double *src, const long n); static void *build_result(Operand *op, StructDef *base); static void unary_worker(int nArgs, looper_t *DLooper, looper_t *ZLooper); static void pop_to_d(Symbol *s); /* same as PopToD in ops0.c */ static void pop_to_d(Symbol *s) { Array *array = (Array *)sp->value.db; PopTo(s); if (s->ops==&dataBlockSym && !array->type.dims) { s->ops= &doubleScalar; s->value.d= array->value.d[0]; Unref(array); } } /* similar to BuildResultU in ops0.c */ static void *build_result(Operand *op, StructDef *base) { if (! op->references && op->type.base == base) { /* similar to PushCopy in ydata.c */ Symbol *stack = sp + 1; Symbol *s = op->owner; int isDB = (s->ops == &dataBlockSym); stack->ops = s->ops; if (isDB) stack->value.db = Ref(s->value.db); else stack->value = s->value; sp = stack; /* sp updated AFTER new stack element intact */ return (isDB ? op->value : &sp->value); } else { return (void *)(((Array *)(PushDataBlock(NewArray(base, op->type.dims))))->value.c); } } static void unary_worker(int nArgs, looper_t *DLooper, looper_t *ZLooper) { Operand op; int promoteID; if (nArgs!=1) YError("expecting exactly one argument"); if (!sp->ops) YError("unexpected keyword"); sp->ops->FormOperand(sp, &op); promoteID = op.ops->promoteID; if (promoteID <= T_DOUBLE) { if (promoteID < T_DOUBLE) op.ops->ToDouble(&op); DLooper(build_result(&op, &doubleStruct), op.value, op.type.number); pop_to_d(sp - 2); } else { if (promoteID>T_COMPLEX) YError("expecting numeric argument"); ZLooper(build_result(&op, &complexStruct), op.value, 2*op.type.number); PopTo(sp - 2); } Drop(1); } /* ----- sinc(x) = sin(PI*x)/PI/x ----- */ static void sincDLoop(double *dst, const double *src, const long n); static void sincZLoop(double *dst, const double *src, const long n) ; void Y_sinc(int nArgs) { unary_worker(nArgs, &sincDLoop, &sincZLoop); } static void sincDLoop(double *dst, const double *src, const long n) { #if NORMALIZED_SINC const double pi = PI; #endif double x; long i; for (i=0 ; i fabs(ri)) { ri /= rr; rr = 1.0/((1.0 + ri*ri)*rr); dst[i] = (lr + li*ri)*rr; dst[i+1] = (li - lr*ri)*rr; } else { rr /= ri; ri = 1.0/((1.0 + rr*rr)*ri); dst[i] = (lr*rr + li)*ri; dst[i+1] = (li*rr - lr)*ri; } } else { dst[i] = 1.0; dst[i+1] = 0.0; /* Not needed? */ } } } /*---------------------------------------------------------------------------*/ /* ARC */ extern BuiltIn Y_arc; void Y_arc(int nArgs) { Operand op; int promoteID; long number, i; if (nArgs != 1) YError("arc takes exactly one argument"); if (! sp->ops) YError("unexpected keyword"); sp->ops->FormOperand(sp, &op); promoteID = op.ops->promoteID; if (promoteID == T_DOUBLE) { const double rad = TWO_PI; const double scl = ONE_OVER_TWO_PI; double *x, *y; x = op.value; y = build_result(&op, &doubleStruct); number = op.type.number; for (i=0 ; iToFloat(&op); x = op.value; y = build_result(&op, &floatStruct); number = op.type.number; for (i=0 ; i * * This software is governed by the CeCILL-C license under French law and * abiding by the rules of distribution of free software. You can use, modify * and/or redistribute the software under the terms of the CeCILL-C license as * circulated by CEA, CNRS and INRIA at the following URL * "http://www.cecill.info". * * As a counterpart to the access to the source code and rights to copy, * modify and redistribute granted by the license, users are provided only * with a limited warranty and the software's author, the holder of the * economic rights, and the successive licensors have only limited liability. * * In this respect, the user's attention is drawn to the risks associated with * loading, using, modifying and/or developing or reproducing the software by * the user in light of its specific status of free software, that may mean * that it is complicated to manipulate, and that also therefore means that it * is reserved for developers and experienced professionals having in-depth * computer knowledge. Users are therefore encouraged to load and test the * software's suitability as regards their requirements in conditions enabling * the security of their systems and/or data to be ensured and, more * generally, to use and operate it in the same conditions as regards * security. * * The fact that you are presently reading this means that you have had * knowledge of the CeCILL-C license and that you accept its terms. * *----------------------------------------------------------------------------- */ #include #include #include #include #include #include #include #include #include "config.h" #include "yeti.h" /* Shall we use faster complex division? (depends Yorick version) */ #if (YORICK_VERSION_MAJOR >= 2) # define USE_FASTER_DIVIDE_Z 0 #elif (YORICK_VERSION_MAJOR == 1 && YORICK_VERSION_MINOR >= 6) # define USE_FASTER_DIVIDE_Z 0 #elif (YORICK_VERSION_MAJOR == 1 && YORICK_VERSION_MINOR == 5 && YORICK_VERSION_MICRO >= 15) # define USE_FASTER_DIVIDE_Z 0 #else # define USE_FASTER_DIVIDE_Z 1 #endif /* Built-in functions defined in this file: */ extern BuiltIn Y_yeti_init; extern BuiltIn Y_mem_base, Y_mem_copy, Y_mem_peek; extern BuiltIn Y_get_encoding; extern BuiltIn Y_nrefsof; extern BuiltIn Y_smooth3; extern BuiltIn Y_insure_temporary; /*---------------------------------------------------------------------------*/ /* INITIALIZATION OF YETI */ /* The order of parsing of startup files is as follows: * 1. Yorick startup scripts: paths.i, std.i, graph.i, matrix.i, fft.i; * 2. Package(s) startup scripts: yeti.i, ...; * 3. Yorick post-initialization: stdx.i (just call 'set_path'). * * It is therefore possible to fool Yorick post-initialization by * changing builtin function 'set_path' to something else. * * Until step 3, search path include the launch directory. * Built-in 'set_site' function is called at statup by 'std.i' to * define global variables: * Y_LAUNCH the directory containing the Yorick executable * Y_VERSION Yorick's version as "MAJOR.MINOR.MICRO" * Y_HOME Yorick's "site directory" with machine dependent files * Y_SITE Yorick's "site directory" with machine independent files */ /* Symbols defined in std0.c: */ extern char *yLaunchDir; extern int yBatchMode; /* Symbols defined in ops0.c: */ extern void *BuildResult2(Operand *l, Operand *r); /* Symbols defined in ycode.c: */ extern char *yHomeDir; /* e.g., "/usr/local/lib/yorick/1.5" */ extern char *ySiteDir; /* e.g., "/usr/local/share/yorick/1.5" */ extern char *yUserPath; /* e.g., ".:~/yorick:~/Yorick" */ static void globalize_string(const char *name, const char *value); static void globalize_long(const char *name, long value); #if USE_FASTER_DIVIDE_Z static void fast_DivideZ(Operand *l, Operand *r); #endif /* USE_FASTER_DIVIDE_Z */ void Y_yeti_init(int argc) { const char *version = YETI_STRINGIFY(YETI_VERSION_MAJOR) "." \ YETI_STRINGIFY(YETI_VERSION_MINOR) "." \ YETI_STRINGIFY(YETI_VERSION_MICRO) YETI_VERSION_SUFFIX; #if USE_FASTER_DIVIDE_Z /* Replace complex division by faster code. */ complexOps.Divide = fast_DivideZ; #endif /* USE_FASTER_DIVIDE_Z */ /* Restore global variables. */ globalize_string("YETI_VERSION", version); globalize_long("YETI_VERSION_MAJOR", YETI_VERSION_MAJOR); globalize_long("YETI_VERSION_MINOR", YETI_VERSION_MINOR); globalize_long("YETI_VERSION_MICRO", YETI_VERSION_MICRO); globalize_string("YETI_VERSION_SUFFIX", YETI_VERSION_SUFFIX); if (! CalledAsSubroutine()) { yeti_push_string_value(version); } } static void globalize_string(const char *name, const char *value) { long index = Globalize(name, 0L); DataBlock *old = (globTab[index].ops == &dataBlockSym ? globTab[index].value.db : 0); Array *obj = NewArray(&stringStruct, (Dimension *)0); globTab[index].ops = &intScalar; /* in case of interrupt */ globTab[index].value.db = (DataBlock *)obj; globTab[index].ops = &dataBlockSym; Unref(old); obj->value.q[0] = p_strcpy(value); } static void globalize_long(const char *name, long value) { long index = Globalize(name, 0L); DataBlock *old = (globTab[index].ops == &dataBlockSym ? globTab[index].value.db : 0); globTab[index].ops = &longScalar; /* in case of interrupt */ globTab[index].value.l = value; Unref(old); } #if USE_FASTER_DIVIDE_Z /* Faster code for complex division (save 1 division out of 3 with respect to original Yorick DivideZ code resulting in ~33% faster code). */ static void fast_DivideZ(Operand *l, Operand *r) { const double one=1.0; double lr, li, rr, ri; /* watch out for dst==lv or rv */ double *lv, *rv, *dst; size_t i, n; dst = BuildResult2(l, r); if (! dst) YError("operands not conformable in binary /"); n = l->type.number; lv = l->value; rv = r->value; for (i=0 ; i0?rr:-rr)>(ri>0?ri:-ri)) { /* be careful about overflow... */ ri /= rr; rr = one/((one + ri*ri)*rr); dst[2*i] = (lr + li*ri)*rr; dst[2*i+1] = (li - lr*ri)*rr; } else { rr /= ri; /* do not care of division by zero here, since Yorick catches floating point exceptions */ ri = one/((one + rr*rr)*ri); dst[2*i] = (lr*rr + li)*ri; dst[2*i+1] = (li*rr - lr)*ri; } } PopTo(l->owner); } #endif /* USE_FASTER_DIVIDE_Z */ /*---------------------------------------------------------------------------*/ /* MEMORY HACKING ROUTINES */ static void *get_address(Symbol *s); static void build_dimlist(Symbol *stack, int nArgs); static Operand *form_operand_db(Symbol *owner, Operand *op); void Y_mem_base(int argc) { Array *array; Symbol *s; OpTable *ops; long value; if (argc != 1) YError("mem_base takes exactly 1 argument"); /*** based on Address() in ops3.c ***/ /* Taking the address of a variable X, where X is a scalar constant, causes X to be replaced by an Array. This is obscure, but there is no other obvious way to get both the efficiency of the scalar Symbols, AND the reference-count safety of Yorick pointers. Notice that if the address of a scalar is taken, the efficient representation is lost. */ if (sp->ops != &referenceSym) { bad_arg: YError("expected a reference to an array object"); } s = &globTab[sp->index]; ops = s->ops; if (ops == &dataBlockSym) { array = (Array *)s->value.db; } else if (ops == &doubleScalar) { array = NewArray(&doubleStruct, (Dimension *)0); array->value.d[0] = s->value.d; s->value.db = (DataBlock *)array; s->ops = &dataBlockSym; } else if (ops == &longScalar) { array = NewArray(&longStruct, (Dimension *)0); array->value.l[0] = s->value.l; s->value.db = (DataBlock *)array; s->ops = &dataBlockSym; } else if (ops == &intScalar) { array = NewArray(&intStruct, (Dimension *)0); array->value.i[0] = s->value.i; s->value.db = (DataBlock *)array; s->ops = &dataBlockSym; } else { goto bad_arg; } if (! array->ops->isArray) goto bad_arg; value = (long)array->value.c; Drop(2); PushLongValue(value); } void Y_mem_copy(int argc) { void *address; Symbol *s; if (argc != 2) YError("mem_copy takes exactly 2 arguments"); address = get_address(sp - 1); s = (sp->ops == &referenceSym) ? &globTab[sp->index] : sp; if (s->ops == &doubleScalar) { (void)memcpy(address, &(s->value.d), sizeof(double)); } else if (s->ops == &longScalar) { (void)memcpy(address, &(s->value.l), sizeof(long)); } else if (s->ops == &intScalar) { (void)memcpy(address, &(s->value.i), sizeof(int)); } else if (s->ops == &dataBlockSym && s->value.db->ops->isArray) { Array *array = (Array *)s->value.db; (void)memcpy(address, array->value.c, array->type.number*array->type.base->size); } else { YError("unexpected non-array data"); } } void Y_mem_peek(int argc) { Symbol *s, *stack = sp - argc + 1; StructDef *base; Array *array; void *address; if (argc < 2) YError("mem_peek takes at least 2 arguments"); address = get_address(stack); s = stack + 1; if (s->ops == &referenceSym) s = &globTab[s->index]; if (s->ops != &dataBlockSym || s->value.db->ops != &structDefOps) YError("expected type definition as second argument"); base = (StructDef *)s->value.db; if (base->dataOps->typeID < T_CHAR || base->dataOps->typeID > T_COMPLEX) YError("only basic data types are supported"); build_dimlist(stack + 2, argc - 2); array = PushDataBlock(NewArray(base, tmpDims)); memcpy(array->value.c, address, array->type.number*array->type.base->size); } static void *get_address(Symbol *s) { Operand op; if (! s->ops) YError("unexpected keyword argument"); s->ops->FormOperand(s, &op); if (op.type.dims == (Dimension *)0) { if (op.ops->typeID == T_LONG) return (void *)*(long *)op.value; if (op.ops->typeID == T_POINTER) return (void *)*(void **)op.value; } YError("bad address (expecting long integer or pointer scalar)"); return (void *)0; /* avoid compiler warning */ } /* The following function is a pure copy of BuildDimList in 'ops3.c' of Yorick source code -- required to avoid plugin clash. */ static void build_dimlist(Symbol *stack, int nArgs) { Dimension *tmp= tmpDims; tmpDims= 0; FreeDimension(tmp); while (nArgs--) { if (stack->ops==&referenceSym) ReplaceRef(stack); if (stack->ops==&longScalar) { if (stack->value.l<=0) goto badl; tmpDims= NewDimension(stack->value.l, 1L, tmpDims); } else if (stack->ops==&intScalar) { if (stack->value.i<=0) goto badl; tmpDims= NewDimension(stack->value.i, 1L, tmpDims); } else if (stack->ops==&dataBlockSym) { Operand op; form_operand_db(stack, &op); if (op.ops==&rangeOps) { Range *range= op.value; long len; if (range->rf || range->nilFlags || range->inc!=1) YError("only min:max ranges allowed in dimension list"); len= range->max-range->min+1; if (len<=0) goto badl; tmpDims= NewDimension(len, range->min, tmpDims); } else if (op.ops->promoteID<=T_LONG && (!op.type.dims || !op.type.dims->next)) { long len; op.ops->ToLong(&op); if (!op.type.dims) { len= *(long *)op.value; if (len<=0) goto badl; tmpDims= NewDimension(len, 1L, tmpDims); } else { long *dim= op.value; long n= *dim++; if (n>10 || n>=op.type.number) YError("dimension list format [#dims, len1, len2, ...]"); while (n--) { len= *dim++; if (len<=0) goto badl; tmpDims= NewDimension(len, 1L, tmpDims); } } } else if (op.ops!=&voidOps) { goto badl; } } else { badl: YError("bad dimension list"); } stack++; } } /* The following function is a pure copy of FormOperandDB in 'ops0.c' of Yorick source code -- required to avoid plugin clash. */ static Operand *form_operand_db(Symbol *owner, Operand *op) { DataBlock *db= owner->value.db; Operations *ops= db->ops; op->owner= owner; if (ops->isArray) { Array *array= (Array *)db; op->ops= ops; op->references= array->references; op->type.base= array->type.base; op->type.dims= array->type.dims; op->type.number= array->type.number; op->value= array->value.c; } else if (ops==&lvalueOps) { LValue *lvalue= (LValue *)db; StructDef *base= lvalue->type.base; if (lvalue->strider || base->model) { Array *array= FetchLValue(lvalue, owner); op->ops= array->ops; op->references= array->references; op->type.base= array->type.base; op->type.dims= array->type.dims; op->type.number= array->type.number; op->value= array->value.c; } else { op->ops= base->dataOps; op->references= 1; /* NEVER try to use this as result */ op->type.base= base; op->type.dims= lvalue->type.dims; op->type.number= lvalue->type.number; op->value= lvalue->address.m; } } else { op->ops= ops; op->references= db->references; op->type.base= 0; op->type.dims= 0; op->type.number= 0; op->value= db; } return op; } /*---------------------------------------------------------------------------*/ /* DATA ENCODING */ #include "prmtyp.h" void Y_get_encoding(int argc) { const char *name; static struct { const char *name; long layout[32]; } db[] = { {"alpha", {1,1,-1, 2,2,-1, 4,4,-1, 8,8,-1, 4,4,-1, 8,8,-1, 0,1,8,9,23,0,127, 0,1,11,12,52,0,1023}}, {"cray", {1,1,1, 8,8,1, 8,8,1, 8,8,1, 8,8,1, 8,8,1, 0,1,15,16,48,1,16384, 0,1,15,16,48,1,16384}}, {"dec", {1,1,-1, 2,2,-1, 4,4,-1, 4,4,-1, 4,4,-1, 8,8,-1, 0,1,8,9,23,0,127, 0,1,11,12,52,0,1023}}, {"i86", {1,1,-1, 2,2,-1, 4,4,-1, 4,4,-1, 4,4,-1, 8,4,-1, 0,1,8,9,23,0,127, 0,1,11,12,52,0,1023}}, {"ibmpc", {1,1,-1, 2,2,-1, 2,2,-1, 4,2,-1, 4,2,-1, 8,2,-1, 0,1,8,9,23,0,127, 0,1,11,12,52,0,1023}}, {"mac", {1,1,1, 2,2,1, 2,2,1, 4,2,1, 4,2,1, 8,2,1, 0,1,8,9,23,0,127, 0,1,11,12,52,0,1023}}, {"macl", {1,1,1, 2,2,1, 2,2,1, 4,2,1, 4,2,1, 12,2,1, 0,1,8,9,23,0,127, 0,1,15,32,64,1,16382}}, {"sgi64", {1,1,1, 2,2,1, 4,4,1, 8,8,1, 4,4,1, 8,8,1, 0,1,8,9,23,0,127, 0,1,11,12,52,0,1023}}, {"sun", {1,1,1, 2,2,1, 4,4,1, 4,4,1, 4,4,1, 8,8,1, 0,1,8,9,23,0,127, 0,1,11,12,52,0,1023}}, {"sun3", {1,1,1, 2,2,1, 4,2,1, 4,2,1, 4,2,1, 8,2,1, 0,1,8,9,23,0,127, 0,1,11,12,52,0,1023}}, {"vax", {1,1,-1, 2,1,-1, 4,1,-1, 4,1,-1, 4,1,2, 8,1,2, 0,1,8,9,23,0,129, 0,1,8,9,55,0,129}}, {"vaxg", {1,1,-1, 2,1,-1, 4,1,-1, 4,1,-1, 4,1,2, 8,1,2, 0,1,8,9,23,0,129, 0,1,11,12,52,0,1025}}, {"xdr", {1,1,1, 2,2,1, 4,4,1, 4,4,1, 4,4,1, 8,4,1, 0,1,8,9,23,0,127, 0,1,11,12,52,0,1023}}, {"native", {sizeof(char), P_STRUCT_ALIGN, 0, sizeof(short), P_SHORT_ALIGN, P_SHORT_ORDER, sizeof(int), P_INT_ALIGN, P_INT_ORDER, sizeof(long), P_LONG_ALIGN, P_LONG_ORDER, sizeof(float), P_FLOAT_ALIGN, P_FLOAT_ORDER, sizeof(double), P_DOUBLE_ALIGN, P_DOUBLE_ORDER, P_FLOAT_LAYOUT, P_DOUBLE_LAYOUT}} }; const int ndb = sizeof(db)/sizeof(db[0]); if (argc!=1) YError("get_encoding takes exactly one argument"); name = YGetString(sp); if (name) { long *result = YETI_PUSH_NEW_L(yeti_start_dimlist(32)); int i, c = name[0]; for (i=0 ; iops) YError("unexpected keyword argument"); PushLongValue(sp->ops->FormOperand(sp, &op)->references); } void Y_insure_temporary(int argc) { OpTable *ops; Symbol *glob, *stack; Array *array, *copy; int i; if (argc < 1 || ! CalledAsSubroutine()) { YError("insure_temporary must be called as a subroutine"); } for (i = 1 - argc ; i <= 0 ; ++i) { stack = sp + i; if (stack->ops != &referenceSym) { YError("insure_temporary expects variable reference(s)"); } glob = &globTab[stack->index]; ops = glob->ops; if (ops == &doubleScalar) { copy = NewArray(&doubleStruct, (Dimension *)0); copy->value.d[0] = glob->value.d; glob->value.db = (DataBlock *)copy; glob->ops = &dataBlockSym; } else if (ops == &longScalar) { copy = NewArray(&longStruct, (Dimension *)0); copy->value.l[0] = glob->value.l; glob->value.db = (DataBlock *)copy; glob->ops = &dataBlockSym; } else if (ops == &intScalar) { copy = NewArray(&intStruct, (Dimension *)0); copy->value.i[0] = glob->value.i; glob->value.db = (DataBlock *)copy; glob->ops = &dataBlockSym; } else if (ops == &dataBlockSym) { array = (Array *)glob->value.db; if (array->references >= 1 && array->ops->isArray) { /* make a fresh copy */ copy = NewArray(array->type.base, array->type.dims); glob->value.db = (DataBlock *)copy; --array->references; array->type.base->Copy(array->type.base, copy->value.c, array->value.c, array->type.number); } } } } /*---------------------------------------------------------------------------*/ /* SMOOTHING */ static void smooth_single(double *x, double p25, double p50, double p75, long n1, long n2, long n3); void Y_smooth3(int argc) { Operand op; double *x = NULL; long n1, n2, n3; int single = 0, is_complex; long which = 0; /* avoid compiler warning */ Symbol *stack; Dimension *dims; int nparsed = 0; double p25 = 0.25, p50 = 0.50, p75 = 0.75; for (stack=sp-argc+1 ; stack<=sp ; ++stack) { if (stack->ops) { /* non-keyword argument */ if (++nparsed == 1) { stack->ops->FormOperand(stack, &op); } else { YError("too many arguments"); } } else { /* keyword argument */ const char *keyword = globalTable.names[stack->index]; ++stack; if (keyword[0] == 'c' && keyword[1] == 0) { if (YNotNil(stack)) { p50 = YGetReal(stack); p25 = 0.5*(1.0 - p50); p75 = 0.5*(1.0 + p50); } } else if (keyword[0] == 'w' && ! strcmp(keyword, "which")) { if (YNotNil(stack)) { which = YGetInteger(stack); single = 1; } } else { YError("unknown keyword"); } } } if (nparsed != 1) YError("bad number of arguments"); /* Get input array. */ is_complex = (op.ops->typeID == T_COMPLEX); n1 = (is_complex ? 2*op.type.number : op.type.number); stack = op.owner; switch (op.ops->typeID) { case T_CHAR: case T_SHORT: case T_INT: case T_LONG: case T_FLOAT: /* Convert input in a new array of double's. */ op.ops->ToDouble(&op); x = op.value; dims = op.type.dims; break; case T_DOUBLE: case T_COMPLEX: /* If input array has references (is not temporary), make a new copy. */ if (op.references) { Array *array = NewArray((is_complex ? &complexStruct : &doubleStruct), op.type.dims); PushDataBlock(array); x = array->value.d; dims = array->type.dims; memcpy(x, op.value, n1*sizeof(double)); PopTo(stack); } else { x = op.value; dims = op.type.dims; } break; default: YError("bad data type for input array"); } while (sp != stack) Drop(1); /* left result on top of the stack */ /* Apply operator. */ n3 = 1; /* product of dimensions after current one */ if (single) { /* Apply operator along a single dimension. */ Dimension *tmp = dims; long rank=0; while (tmp) { ++rank; tmp = tmp->next; } if (which <= 0) which += rank; if (which <= 0 || which > rank) YError("WHICH is out of range"); while (dims) { n2 = dims->number; n1 /= n2; if (rank-- == which) { smooth_single(x, p25, p50, p75, n1, n2, n3); break; } n3 *= n2; dims = dims->next; } } else { /* Apply operator to every dimensions. */ while (dims) { n2 = dims->number; n1 /= n2; smooth_single(x, p25, p50, p75, n1, n2, n3); n3 *= n2; dims = dims->next; } } } static void smooth_single(double *x, double p25, double p50, double p75, long n1, long n2, long n3) { if (n2 >= 2) { long i, stride = n1, n = n1*n2; double x1, x2, x3; if (stride == 1) { for ( ; --n3>=0 ; x+=n) { x2 = x[0]; x3 = x[1]; x[0] = p75*x2 + p25*x3; for (i=2 ; i=0 ; x+=p) { for (n1=stride ; --n1>=0 ; ++x) { x2 = x[0]; x3 = x[stride]; x[0] = p75*x2 + p25*x3; for (i=2*stride ; i * * This software is governed by the CeCILL-C license under French law and * abiding by the rules of distribution of free software. You can use, modify * and/or redistribute the software under the terms of the CeCILL-C license as * circulated by CEA, CNRS and INRIA at the following URL * "http://www.cecill.info". * * As a counterpart to the access to the source code and rights to copy, * modify and redistribute granted by the license, users are provided only * with a limited warranty and the software's author, the holder of the * economic rights, and the successive licensors have only limited liability. * * In this respect, the user's attention is drawn to the risks associated with * loading, using, modifying and/or developing or reproducing the software by * the user in light of its specific status of free software, that may mean * that it is complicated to manipulate, and that also therefore means that it * is reserved for developers and experienced professionals having in-depth * computer knowledge. Users are therefore encouraged to load and test the * software's suitability as regards their requirements in conditions enabling * the security of their systems and/or data to be ensured and, more * generally, to use and operate it in the same conditions as regards * security. * * The fact that you are presently reading this means that you have had * knowledge of the CeCILL-C license and that you accept its terms. * *----------------------------------------------------------------------------- */ #ifndef _YETI_MORPH_C #define _YETI_MORPH_C 1 #include "config.h" #include "yeti.h" #include #include #define voxel_t unsigned char #define MORPH_DILATION dilation_c #define MORPH_EROSION erosion_c #include __FILE__ #define voxel_t short #define MORPH_DILATION dilation_s #define MORPH_EROSION erosion_s #include __FILE__ #define voxel_t int #define MORPH_DILATION dilation_i #define MORPH_EROSION erosion_i #include __FILE__ #define voxel_t long #define MORPH_DILATION dilation_l #define MORPH_EROSION erosion_l #include __FILE__ #define voxel_t float #define MORPH_DILATION dilation_f #define MORPH_EROSION erosion_f #include __FILE__ #define voxel_t double #define MORPH_DILATION dilation_d #define MORPH_EROSION erosion_d #include __FILE__ static void morph_op(int argc, int mop); static long *get_offset(Symbol *s, Dimension **dims); extern BuiltIn Y_morph_erosion, Y_morph_dilation; void Y_morph_erosion(int argc) { morph_op(argc, 0); } void Y_morph_dilation(int argc) { morph_op(argc, 1); } static void morph_op(int argc, int mop) { char msg[80]; Operand op; Dimension *dims; Symbol *s; Array *ap; long ndims, width, height, depth, number, *off, *dx, *dy, *dz; if (argc != 2) { sprintf(msg, "morph_%s takes exactly 2 arguments", (mop ? "dilation" : "erosion")); YError(msg); } /* Get input array. */ s = sp - 1; if (! s->ops) YError("unexpected keyword argument"); dims = s->ops->FormOperand(s, &op)->type.dims; ndims = 0; width = height = depth = 0; while (dims) { if (++ndims > 3) YError("too many dimensions for input array"); depth = height; height = width; width = dims->number; dims = dims->next; } /* Get radius / offset array. */ off = get_offset(sp, &dims); if (! dims) { /* Only one extra scalar argument: the structuring element is a sphere. */ long x, y, z, r, n, lim0, lim1, lim2; r = off[0]; if (r < 0) { YError("radius of structuring element must be a positive integer"); } Drop(1); /* to be able to push temporary workspace */ n = 2*r + 1; lim0 = r*(r + 1); number = 0; if (depth > 1) { n = n*n*n; /* maximum number of offsets per dimension */ off = yeti_push_workspace(3*sizeof(long)*n); dx = off; dy = dx + n; dz = dy + n; for (z=-r ; z<=r ; ++z) { lim1 = lim0 - z*z; for (y=-r ; y<=r ; ++y) { lim2 = lim1 - y*y; for (x=-r ; x<=r ; ++x) { /* To be inside the structuring element, we must have * sqrt(x*x + y*y + z*z) < r + 1/2 * which is the same as: * x*x + y*y + z*z <= r*(r + 1) * because X, Y, Z and R are integers. */ if (x*x <= lim2) { dx[number] = x; dy[number] = y; dz[number] = z; ++number; } } } } } else if (height > 1) { n = n*n; /* maximum number of offsets per dimension */ dx = yeti_push_workspace(2*sizeof(long)*n); dy = dx + n; dz = NULL; for (y=-r ; y<=r ; ++y) { lim1 = lim0 - y*y; for (x=-r ; x<=r ; ++x) { if (x*x <= lim1) { dx[number] = x; dy[number] = y; ++number; } } } } else { dx = yeti_push_workspace(sizeof(long)*n); dy = NULL; dz = NULL; for (x=-r ; x<=r ; ++x) { dx[number++] = x; } } } else { if (ndims > 1) { if (dims->number != ndims) { YError("last dimension of OFF not equal to number of dimensions of A"); } dims = dims->next; } number = 1; while (dims) { number *= dims->number; dims = dims->next; } dx = off; dy = (ndims >= 2 ? dx + number : NULL); dz = (ndims >= 3 ? dy + number : NULL); } /* Allocate output array and apply the operation. */ ap = ((Array *)PushDataBlock(NewArray(op.type.base, op.type.dims))); switch (op.ops->typeID) { #undef _ #define _(ID) (mop ? dilation_##ID : erosion_##ID)((void *)ap->value.ID, \ op.value, width, height, depth, dx, dy, dz, number); break case T_CHAR: _(c); case T_SHORT: _(s); case T_INT: _(i); case T_LONG: _(l); case T_FLOAT: _(f); case T_DOUBLE: _(d); #undef _ default: YError("bad data type"); } } /* almost the same as YGet_L */ static long *get_offset(Symbol *s, Dimension **dims) { Operand op; if (! s->ops) YError("unexpected keyword argument"); if (s->ops == &referenceSym && globTab[s->index].ops == &longScalar) { if (dims) *dims= 0; return &globTab[s->index].value.l; } switch (s->ops->FormOperand(s, &op)->ops->typeID) { case T_CHAR: case T_SHORT: case T_INT: op.ops->ToLong(&op); case T_LONG: if (dims) *dims = op.type.dims; return op.value; } YError("bad data type for index offsets"); return 0L; } #else /* _YETI_MORPH_C ------------------------------------------------------*/ #ifdef MORPH_SEGMENTATION long MORPH_SEGMENTATION(long region[], search_t search[], long index[], long nb[], unsigned short x0[], unsigned short x1[], unsigned short y0[], unsigned short y1[], const voxel_t img[], long width, long height) { #define LEFT ((search_t)1) #define RIGHT ((search_t)2) #define DOWN ((search_t)4) #define UP ((search_t)8) #define ALL ((search_t)(LEFT|RIGHT|DOWN|UP)) long number, x, y; long i, j, k, l, mark, count; voxel_t level; search_t s; mark = 1; number = width*height; for (i=0 ; i 0, k - 1); CHECK_PIXEL(RIGHT, x < (width - 1), k + 1); CHECK_PIXEL(DOWN, y > 0, k - width); CHECK_PIXEL(UP, y < (height - 1), k + width); #undef CHECK_PIXEL } if (nb) { nb[mark - 1] = count; } if (x0 || x1) { long xmin, xmax; xmin = xmax = index[0]%width; for (j=1 ; j xmax) xmax = x; } if (x0) x0[mark - 1] = xmin; if (x1) x1[mark - 1] = xmax; } if (y0 || y1) { long ymin, ymax; ymin = ymax = index[0]/width; for (j=1 ; j ymax) ymax = y; } if (y0) y0[mark - 1] = ymin; if (y1) y1[mark - 1] = ymax; } ++mark; } #undef LEFT #undef RIGHT #undef DOWN #undef UP #undef ALL return mark; } #endif /* MORPH_SEGMENTATION */ #undef _ #define _(CMP) (voxel_t dst[], const voxel_t src[], \ long width, long height, long depth, \ const long dx[], const long dy[], const long dz[], \ long number) \ { \ long i, x, y, z, xp, yp, zp; \ voxel_t val, tmp; \ \ val = 0; /* avoids compiler warnings */ \ if (depth > 1) { \ /* 3-D case. */ \ for (z=0 ; z= 0 && xp < width && \ yp >= 0 && yp < height && \ zp >= 0 && zp < depth) { \ if (any) { \ if ((tmp = src[(zp*height + yp)*width + xp]) CMP val) { \ val = tmp; \ } \ } else { \ val = src[(zp*height + yp)*width + xp]; \ any = 1; \ } \ } \ } \ if (any) { \ dst[(z*height + y)*width + x] = val; \ } \ } \ } \ } \ } else if (height > 1) { \ /* 2-D case. */ \ for (y=0 ; y= 0 && xp < width && \ yp >= 0 && yp < height) { \ if (any) { \ if ((tmp = src[yp*width + xp]) CMP val) { \ val = tmp; \ } \ } else { \ val = src[yp*width + xp]; \ any = 1; \ } \ } \ } \ if (any) { \ dst[y*width + x] = val; \ } \ } \ } \ } else { \ /* 1-D case. */ \ for (x=0 ; x= 0 && xp < width) { \ if (any) { \ if ((tmp = src[xp]) CMP val) { \ val = tmp; \ } \ } else { \ val = src[xp]; \ any = 1; \ } \ } \ } \ if (any) { \ dst[x] = val; \ } \ } \ } \ } #ifdef MORPH_DILATION static void MORPH_DILATION _(>) #endif #ifdef MORPH_EROSION static void MORPH_EROSION _(<) #endif /* MORPH_EROSION */ #undef _ #undef MORPH_SEGMENTATION #undef MORPH_DILATION #undef MORPH_EROSION #undef voxel_t #endif /* _YETI_MORPH_C -----------------------------------------------------*/ /* * Local Variables: * mode: C * tab-width: 8 * c-basic-offset: 2 * indent-tabs-mode: nil * fill-column: 78 * coding: utf-8 * End: */ Yeti-6.4.0/core/yeti_new.c000066400000000000000000000201431253351442600153570ustar00rootroot00000000000000/* * yeti_new.c - * * Various built-in functions using the ney Yorick API defined in "yapi.h". * *----------------------------------------------------------------------------- * * Copyright (C) 2005-2009 Eric Thiébaut * * This software is governed by the CeCILL-C license under French law and * abiding by the rules of distribution of free software. You can use, modify * and/or redistribute the software under the terms of the CeCILL-C license as * circulated by CEA, CNRS and INRIA at the following URL * "http://www.cecill.info". * * As a counterpart to the access to the source code and rights to copy, * modify and redistribute granted by the license, users are provided only * with a limited warranty and the software's author, the holder of the * economic rights, and the successive licensors have only limited liability. * * In this respect, the user's attention is drawn to the risks associated with * loading, using, modifying and/or developing or reproducing the software by * the user in light of its specific status of free software, that may mean * that it is complicated to manipulate, and that also therefore means that it * is reserved for developers and experienced professionals having in-depth * computer knowledge. Users are therefore encouraged to load and test the * software's suitability as regards their requirements in conditions enabling * the security of their systems and/or data to be ensured and, more * generally, to use and operate it in the same conditions as regards * security. * * The fact that you are presently reading this means that you have had * knowledge of the CeCILL-C license and that you accept its terms. * *----------------------------------------------------------------------------- */ #ifndef _YETI_NEW_C #define _YETI_NEW_C 1 #include #include #include #include #include #include #include #include /* Built-in functions defined in this file. */ void Y_is_dimlist(int argc); void Y_same_dimlist(int argc); void Y_make_dimlist(int argc); void Y_make_range(int argc); void Y_parse_range(int argc); /*---------------------------------------------------------------------------*/ /* RANGE */ #if 0 void Y_is_range(int argc) { int result = 0; if (argc != 1) y_error("wrong number of arguments"); switch (yarg_typeid(0)) { case Y_CHAR: case Y_SHORT: case Y_INT: case Y_LONG: if (yarg_rank(0) == 0) { result = 2; } break; case Y_RANGE: result = 1; break; case Y_VOID: result = 3; break; } push_int(result); } #endif void Y_make_range(int argc) { long dims[Y_DIMSIZE], ntot, *arr; if (argc != 1) y_error("wrong number of arguments"); switch (yarg_typeid(0)) { case Y_CHAR: case Y_SHORT: case Y_INT: case Y_LONG: arr = ygeta_l(0, &ntot, dims); if ((ntot == 4) && (dims[0] == 1)) { ypush_range(&arr[1], arr[0]); return; } } y_error("expecting an array of 4 integers"); } void Y_parse_range(int argc) { long dims[2], *arr; if (argc != 1) y_error("wrong number of arguments"); if (yarg_typeid(0) != Y_RANGE) y_error("expecting a range"); dims[0] = 1; dims[1] = 4; arr = ypush_l(dims); arr[0] = yget_range(1, &arr[1]); /* iarg=1 because result already pushed on top of stack */ } /*---------------------------------------------------------------------------*/ /* DIMENSION LIST */ #define type_t unsigned char #define CHECK_DIMS check_dims_c #include __FILE__ #define type_t short #define CHECK_DIMS check_dims_s #include __FILE__ #define type_t int #define CHECK_DIMS check_dims_i #include __FILE__ #define type_t long #define CHECK_DIMS check_dims_l #include __FILE__ #if 0 void Y_tmpfile(int argc) { const char *tail = "XXXXXX"; ystring_t src, dst, *arr; long len, size; int pad, fd; if (argc != 1) y_error("tmpfile takes exaclty one argument"); src = ygets_q(0); len = (src && src[0] ? strlen(src) : 0); if (len < 6 || strcmp(src + (len - 6), tail)) { pad = 1; size = len + 7; } else { pad = 0; size = len + 1; } arr = ypush_q(0); dst = p_malloc(size); dst[size - 1] = '\0'; /* mark end of string */ arr[0] = dst; /* then store string pointer */ if (len > 0) memcpy(dst, src, len); if (pad) memcpy(dst + len, tail, 6); fprintf(stderr, "template=\"%s\"\n", dst); fd = mkstemp(dst); if (fd < 0) { y_error("tmpfile failed to create a unique temporary file"); } else { close(fd); } } #endif #if 0 void Y_is_dimlist(int argc) { } void Y_same_dimlist(int argc) { } #endif /* n or [l, n1, n2, .., nl] */ void Y_make_dimlist(int argc) { long *dimlist, dims[Y_DIMSIZE], ref, ndims, j, n; int iarg; /* argument index */ int nvoids; /* number of void arguments */ int iarg_of_result; /* index of potentially valid result */ if (argc < 1) y_error("make_dimlist takes at least one argument"); if (yarg_subroutine()) { ref = yget_ref(argc - 1); if (ref < 0L) y_error("expecting a simple reference for first argument"); } else { ref = -1L; } /* First pass: count total number of dimensions. */ nvoids = 0; iarg_of_result = -1; ndims = 0L; for (iarg = argc - 1; iarg >= 0; --iarg) { switch (yarg_typeid(iarg)) { case Y_CHAR: ndims += check_dims_c(ygeta_c(iarg, NULL, dims), dims); break; case Y_SHORT: ndims += check_dims_s(ygeta_s(iarg, NULL, dims), dims); break; case Y_INT: ndims += check_dims_i(ygeta_i(iarg, NULL, dims), dims); break; case Y_LONG: ndims += check_dims_l(ygeta_l(iarg, NULL, dims), dims); if (iarg_of_result < 0 && dims[0] == 1L) { /* First argument which is elligible as the resulting dimension list. */ iarg_of_result = iarg; } break; case Y_VOID: ++nvoids; break; default: y_error("unexpected data type in dimension list"); } } if (argc - nvoids == 1 && iarg_of_result >= 0) { /* Exactly one non void argument and which is elligible as the resulting dimension list. */ if (ref < 0L) { /* Called as a function; nothing to do except dropping the void arguments above the result if any. */ if (iarg_of_result > 0) { yarg_drop(iarg_of_result); } return; } /* Called as a subroutine: elligible result must be the first argument. */ if (iarg_of_result == argc -1) { return; } } /* Second pass: build up new dimension list. */ dims[0] = 1; dims[1] = ndims + 1; dimlist = ypush_l(dims); *dimlist = ndims; for (iarg = argc; iarg >= 1; --iarg) { #define GET_DIMS(type_t, x) \ { \ type_t *ptr = (type_t *)ygeta_##x(iarg, &n, dims); \ if (dims[0]) { \ for (j=1L ; j= 0L) { /* replace reference by topmost stack element */ yput_global(ref, 0); } } #else /* _YETI_NEW_C *********************************************************/ #ifdef CHECK_DIMS static long CHECK_DIMS(const void *array, const long dims[]) { const type_t *value = array; long j, n; n = value[0]; if (! dims[0] && n > 0L) return 1L; if (dims[0] == 1L && dims[1] == n + 1L) { for (j=1L ; j<=n ; ++j) { if (value[j] <= 0) { goto bad_dimlist; } } return n; } bad_dimlist: y_error("bad dimension list @"); return -1L; /* avoid compiler warnings */ } #endif /* CHECK_DIMS */ #undef type_t #undef CHECK_DIMS #endif /* _YETI_NEW_C ********************************************************/ /* * Local Variables: * mode: C * tab-width: 8 * c-basic-offset: 2 * indent-tabs-mode: nil * fill-column: 78 * coding: utf-8 * End: */ Yeti-6.4.0/core/yeti_rgl.c000066400000000000000000000516231253351442600153610ustar00rootroot00000000000000/* * yeti_rgl.c - * * Regularization for Yeti (an extension of Yorick). * *----------------------------------------------------------------------------- * * Copyright (C) 1996-2010 Eric Thiébaut * * This software is governed by the CeCILL-C license under French law and * abiding by the rules of distribution of free software. You can use, modify * and/or redistribute the software under the terms of the CeCILL-C license as * circulated by CEA, CNRS and INRIA at the following URL * "http://www.cecill.info". * * As a counterpart to the access to the source code and rights to copy, * modify and redistribute granted by the license, users are provided only * with a limited warranty and the software's author, the holder of the * economic rights, and the successive licensors have only limited liability. * * In this respect, the user's attention is drawn to the risks associated with * loading, using, modifying and/or developing or reproducing the software by * the user in light of its specific status of free software, that may mean * that it is complicated to manipulate, and that also therefore means that it * is reserved for developers and experienced professionals having in-depth * computer knowledge. Users are therefore encouraged to load and test the * software's suitability as regards their requirements in conditions enabling * the security of their systems and/or data to be ensured and, more * generally, to use and operate it in the same conditions as regards * security. * * The fact that you are presently reading this means that you have had * knowledge of the CeCILL-C license and that you accept its terms. * *----------------------------------------------------------------------------- * * CUSTOMIZATION AT COMPILE TIME * * Yorick: This code can be compiled with preprocessor flag * -DYORICK to enable Yorick support. * * Number of dimensions: Compile this code with preprocessor flag * -DRGL_MAX_NDIMS=n with n <= 8 to limit the maximum number * of dimensions (and reduce the size of the code). Since the * code is smart enough to "compress" dimensions, it may * however be applicable for more than RGL_MAX_NDIMS * dimensions. * *----------------------------------------------------------------------------- */ #ifndef _RGL_CODE #define _RGL_CODE 1 #include #include #include #ifdef YORICK # include /* for Yorick interface */ #endif #ifndef NULL # define NULL 0 #endif #ifndef RGL_MAX_NDIMS # define RGL_MAX_NDIMS 8 #endif /*---------------------------------------------------------------------------*/ /* DEFINITIONS */ typedef double rgl_roughness_penalty_t(const double hyper[], /* hyper-parameters */ const long ndims, /* number of dimensions */ const long dim[], /* dimensions */ const long off[], /* offsets */ const double arr[], /* model array */ double grd[]); /* gradient (can be NULL) */ extern rgl_roughness_penalty_t rgl_roughness_l2; extern rgl_roughness_penalty_t rgl_roughness_l2_periodic; extern rgl_roughness_penalty_t rgl_roughness_l1; extern rgl_roughness_penalty_t rgl_roughness_l1_periodic; extern rgl_roughness_penalty_t rgl_roughness_l2l1; extern rgl_roughness_penalty_t rgl_roughness_l2l1_periodic; extern rgl_roughness_penalty_t rgl_roughness_l2l0; extern rgl_roughness_penalty_t rgl_roughness_l2l0_periodic; extern rgl_roughness_penalty_t rgl_roughness_cauchy; extern rgl_roughness_penalty_t rgl_roughness_cauchy_periodic; #define integer_t long #define real_t double /* Error codes: */ #define RGL_ERROR_BAD_ADDRESS -1 #define RGL_ERROR_BAD_DIMENSION -2 #define RGL_ERROR_BAD_HYPER -3 #define RGL_ERROR_TOO_MANY_DIMS -4 /* Maximum number of dimensions: */ #define RGL_MAX_NDIMS 8 /* Codes for cost functions: */ #define RGL_COST_L1 1 #define RGL_COST_L2 2 #define RGL_COST_L2L1 3 #define RGL_COST_CAUCHY 4 #define RGL_COST_L2L0 5 #define RGL_MAX(a,b) ((a) >= (b) ? (a) : (b)) #define RGL_MIN(a,b) ((a) <= (b) ? (a) : (b)) #define RGL_PERIODIC 0 #define RGL_COST RGL_COST_L2 #define RGL_ROUGHNESS rgl_roughness_l2 #include __FILE__ #define RGL_PERIODIC 1 #define RGL_COST RGL_COST_L2 #define RGL_ROUGHNESS rgl_roughness_l2_periodic #include __FILE__ #define RGL_PERIODIC 0 #define RGL_COST RGL_COST_L1 #define RGL_ROUGHNESS rgl_roughness_l1 #include __FILE__ #define RGL_PERIODIC 1 #define RGL_COST RGL_COST_L1 #define RGL_ROUGHNESS rgl_roughness_l1_periodic #include __FILE__ #define RGL_PERIODIC 0 #define RGL_COST RGL_COST_L2L1 #define RGL_ROUGHNESS rgl_roughness_l2l1 #include __FILE__ #define RGL_PERIODIC 1 #define RGL_COST RGL_COST_L2L1 #define RGL_ROUGHNESS rgl_roughness_l2l1_periodic #include __FILE__ #define RGL_PERIODIC 0 #define RGL_COST RGL_COST_CAUCHY #define RGL_ROUGHNESS rgl_roughness_cauchy #include __FILE__ #define RGL_PERIODIC 1 #define RGL_COST RGL_COST_CAUCHY #define RGL_ROUGHNESS rgl_roughness_cauchy_periodic #include __FILE__ #define RGL_PERIODIC 0 #define RGL_COST RGL_COST_L2L0 #define RGL_ROUGHNESS rgl_roughness_l2l0 #include __FILE__ #define RGL_PERIODIC 1 #define RGL_COST RGL_COST_L2L0 #define RGL_ROUGHNESS rgl_roughness_l2l0_periodic #include __FILE__ /*---------------------------------------------------------------------------*/ /* YORICK INTERFACE */ #ifdef YORICK static long *get_vector_l(int iarg, long *ntot) { if (yarg_number(iarg) != 1 || yarg_rank(iarg) > 1) { y_error("expecting a vector of integers"); } return ygeta_l(iarg, ntot, NULL); } static double *get_vector_d(int iarg, long *ntot) { int id = yarg_number(iarg); if (id < 1 || id > 2 || yarg_rank(iarg) > 1) { y_error("expecting a vector of reals"); } return ygeta_d(iarg, ntot, NULL); } static double *get_array_d(int iarg, long *ntot, long dims[]) { int id = yarg_number(iarg); if (id < 1 || id > 2) { y_error("expecting an array of reals"); } return ygeta_d(iarg, ntot, dims); } static void roughness(int argc, const char *name, rgl_roughness_penalty_t *rgl, int n) { double penalty; char buf[100]; double *arr, *grd, *hyp; long dims[Y_DIMSIZE]; long off[Y_DIMSIZE - 1], dim[Y_DIMSIZE - 1]; long *offset; long j, ndims, ref, noffs, nhyps, ntot; int iarg, type, flag; if (argc < 3 || argc > 4) { strcpy(buf, name); strcat(buf, " takes 3 or 4 arguments"); y_error(buf); } /* Get HYPER argument. */ hyp = get_vector_d(argc - 1, &nhyps); if (nhyps != n) { y_error("bad number of hyper-parameters"); } for (j = 0; j < nhyps; ++j) { if (hyp[j] < 0.0) { y_error("invalid hyper-parameter value(s)"); } } /* Get OFFSET and ARR arguments. Check compatibility of OFFSET and dimension list of ARR. */ offset = get_vector_l(argc - 2, &noffs); arr = get_array_d(argc - 3, &ntot, dims); ndims = dims[0]; for (j = 0; j < ndims; ++j) { if (j < noffs) { off[j] = offset[j]; } else { off[j] = 0; } dim[j] = dims[j + 1]; } for (j = ndims; j < noffs; ++j) { if (offset[j]) { y_error("non-zero extra offset(s)"); } } /* Get GRD argument. Create output gradient if needed. */ grd = NULL; if (argc >= 4) { iarg = argc - 4; ref = yget_ref(iarg); if (ref == -1L) { y_error("expecting a simple variable reference for argument GRD"); } type = yarg_typeid(iarg); flag = 0; switch (type) { case Y_CHAR: case Y_SHORT: case Y_INT: case Y_LONG: case Y_FLOAT: case Y_DOUBLE: grd = ygeta_d(iarg, NULL, dims); if (dims[0] != ndims) { flag = 1; } else { for (j = 0; j < ndims; ++j) { if (dims[j + 1] != dim[j]) { flag = 1; break; } } } break; case Y_VOID: grd = ypush_d(dims); break; default: flag = 1; } if (flag) { y_error("argument GRD must be nil or an array of reals with same dimension list as ARR"); } if (type != Y_DOUBLE) { yput_global(ref, iarg); } } /* Compute penalty and return result. */ penalty = rgl(hyp, ndims, dim, off, arr, grd); if (penalty < 0.0) { if (penalty == -1.0) { strcpy(buf, "bad 1st hyper-parameter in "); } else if (penalty == -2.0) { strcpy(buf, "bad 2nd hyper-parameter in "); } else if (penalty == -11.0) { strcpy(buf, "too many dimensions in "); } else { strcpy(buf, "unknown error in "); } strcat(buf, name); y_error(buf); } ypush_double(penalty); } #define MAKE_BUILTIN(cost, n) \ void Y_rgl_roughness_##cost(int argc) \ { \ roughness(argc, "rgl_roughness_"#cost, rgl_roughness_##cost, n); \ } MAKE_BUILTIN(l2, 1) MAKE_BUILTIN(l2_periodic, 1) MAKE_BUILTIN(l1, 1) MAKE_BUILTIN(l1_periodic, 1) MAKE_BUILTIN(l2l1, 2) MAKE_BUILTIN(l2l1_periodic, 2) MAKE_BUILTIN(l2l0, 2) MAKE_BUILTIN(l2l0_periodic, 2) MAKE_BUILTIN(cauchy, 2) MAKE_BUILTIN(cauchy_periodic, 2) #endif /* YORICK */ #else /* _RGL_CODE is defined */ /*---------------------------------------------------------------------------*/ /* ROUGHNESS PENALTY */ /* * Nomenclature: * * PREFIX_NAME_COST[_PERIODIC] * * where: * * PREFIX = rgl * * NAME = roughness * * COST = l1 / l2 / l2l1 / cauchy / l2l0 * * DIMENSIONS = # of dimensions of interest, prefixed with a 'p' for * periodic bounds * * Prototype of penalty functions: * * double rgl(const double hyper[], const integer_t ndims, * const long dim[], const long off[], * const double arr[], double grd[]); * * the returned value is the penalty and the arguments are: * * HYPER is array of hyper-parameters. HYPER[0] is the weight of the * regularization; other elements of HYPER depend on the * regularization and/or on the cost function, for instance, HYPER[1] * is the threshold in L2-L1, L2-L0 and Cauchy cost functions; * * NDIMS is the number of dimensions (number of elements in DIM and OFF * arrays). * * DIM is the list of dimensions. DIM[0] is the * lenght of the faster varying index. Hence ARR has * DIM[0]*DIM[1]*...*DIM[NDIMS - 1] elements. * * OFF is the list of offsets. OFF[j] is the offset along j+1-th * dimension. * * ARR is the model array. * * GRD is an optional (can be NULL) array to store the gradient. If * non-NULL, GRD must have as many elements has ARR. GRD is assumed * to have been correctly initialized. This can be used to sum * the gradient of the cost for differetnt offsets. */ #ifdef RGL_ROUGHNESS double RGL_ROUGHNESS(const double hyper[], /* hyper-parameters */ const integer_t ndims, /* number of dimensions */ const integer_t dim[], /* dimensions */ const integer_t off[], /* offsets */ const real_t arr[], /* model array */ real_t grd[]) /* gradient (can be NULL) */ { const double ZERO = 0.0; #if (RGL_COST == RGL_COST_L2) double w ,r; #endif #if (RGL_COST == RGL_COST_L1) double w; #endif #if (RGL_COST == RGL_COST_L2L1) const double ONE = 1.0; double q, r, s, w; #endif #if (RGL_COST == RGL_COST_L2L0) const double ONE = 1.0; double q, r, s, w; #endif #if (RGL_COST == RGL_COST_CAUCHY) const double ONE = 1.0; double q, r, s, w; #endif double penalty; #undef s1 #define s1 1 /* fisrt stride is always equal to 1 */ integer_t j1, e1, lo1, hi1; integer_t j2, e2, lo2, hi2, s2; integer_t j3, e3, lo3, hi3, s3; integer_t j4, e4, lo4, hi4, s4; integer_t j5, e5, lo5, hi5, s5; integer_t j6, e6, lo6, hi6, s6; integer_t j7, e7, lo7, hi7, s7; integer_t j8, e8, lo8, hi8, s8; integer_t j9, s9; integer_t n, j, jc; integer_t dim_c[RGL_MAX_NDIMS]; /* compact offsets */ integer_t off_c[RGL_MAX_NDIMS]; /* compact dimensions */ /* Check arguments. */ if (hyper[0] < ZERO) { return -1.0; } #if (RGL_COST == RGL_COST_L2L1) || \ (RGL_COST == RGL_COST_L2L0) || \ (RGL_COST == RGL_COST_CAUCHY) if (hyper[1] <= ZERO) { /* By continuity, the cost is ZERO when HYPER[1] = 0. */ return (hyper[1] ? -2.0 : 0.0); } #endif if (ndims <= 0 || dim == NULL || off == NULL || hyper == NULL || arr == NULL || hyper[0] <= 0.0) { return 0.0; } /* Compact dimensions. */ jc = 0; /* index over "compact" dimensions list */ dim_c[0] = dim[0]; off_c[0] = off[0]; for (j = 1; j < ndims; ++j) { if (off[j] == 0 && off_c[jc] == 0) { /* Collapse with previous dimension. */ dim_c[jc] *= dim[j]; } else { /* Add new dimension. */ if (++jc >= RGL_MAX_NDIMS) { return -1.0; } dim_c[jc] = dim[j]; off_c[jc] = off[j]; } } n = jc + 1; /* number of "compact" dimensions */ #if RGL_PERIODIC for (jc = 0; jc < n; ++jc) { if (off_c[jc] >= 0) { off_c[jc] %= dim_c[jc]; } else { off_c[jc] = ((-off_c[jc])%dim_c[jc]); if (off_c[jc]) { off_c[jc] = dim_c[jc] - off_c[jc]; } } } #endif /* RGL_PERIODIC */ /* Macros for spk = (k+1)-th stride, jpk = (k+1)-th index. */ #undef jp1 #define jp1 j2 #undef jp2 #define jp2 j3 #undef jp3 #define jp3 j4 #undef jp4 #define jp4 j5 #undef jp5 #define jp5 j6 #undef jp6 #define jp6 j7 #undef jp7 #define jp7 j8 #undef jp8 #define jp8 j9 #undef sp1 #define sp1 s2 #undef sp2 #define sp2 s3 #undef sp3 #define sp3 s4 #undef sp4 #define sp4 s5 #undef sp5 #define sp5 s6 #undef sp6 #define sp6 s7 #undef sp7 #define sp7 s8 #undef sp8 #define sp8 s9 #undef LOOP #undef CODE #undef BODY_1 #undef FINAL_1 #undef BODY_2 #undef FINAL_2 /* Macro definitions and constants for L1 (absolute value) cost function. */ #if (RGL_COST == RGL_COST_L1) w = (grd ? hyper[0] : ZERO); # define BODY_1(a1, a2) penalty += fabs(arr[a2] - arr[a1]); # define FINAL_1 penalty *= hyper[0]; # define BODY_2(a1, a2) if (arr[a2] > arr[a1]) { \ penalty += w*(arr[a2] - arr[a1]); \ grd[a2] += w; \ grd[a1] -= w; \ } \ if (arr[a2] < arr[a1]) { \ penalty -= w*(arr[a2] - arr[a1]); \ grd[a2] -= w; \ grd[a1] += w; \ } #define FINAL_2 /* nothing to do */ #endif /* RGL_COST_L1 */ /* Macro definitions and constants for L2 (quadratic) cost function. */ #if (RGL_COST == RGL_COST_L2) w = (grd ? 2.0*hyper[0] : ZERO); # define BODY_1(a1, a2) r = arr[a2] - arr[a1]; \ penalty += r*r; # define FINAL_1 penalty *= hyper[0]; # define BODY_2(a1, a2) r = arr[a2] - arr[a1]; \ penalty += r*r; \ grd[a2] += w*r; \ grd[a1] -= w*r; # define FINAL_2 penalty *= hyper[0]; #endif /* RGL_COST_L2 */ /* Macro definitions and constants for L2-L1 cost function. */ #if (RGL_COST == RGL_COST_L2L1) w = 2.0*hyper[0]; q = ONE/hyper[1]; # define BODY_1(a1, a2) r = arr[a2] - arr[a1]; \ s = q*fabs(r); \ penalty += (s - log(ONE + s)); # define FINAL_1 penalty *= (w*hyper[1]*hyper[1]); # define BODY_2(a1, a2) r = arr[a2] - arr[a1]; \ s = q*fabs(r); \ penalty += (s - log(ONE + s)); \ r *= w/(ONE + s); \ grd[a2] += r; \ grd[a1] -= r; # define FINAL_2 penalty *= (w*hyper[1]*hyper[1]); #endif /* RGL_COST_L2L1 */ /* Macro definitions and constants for L2-L0 cost function. */ #if (RGL_COST == RGL_COST_L2L0) w = 2.0*hyper[0]*hyper[1]; q = ONE/hyper[1]; # define BODY_1(a1, a2) s = atan(q*(arr[a2] - arr[a1])); \ penalty += s*s; # define FINAL_1 penalty *= (hyper[0]*hyper[1]*hyper[1]); # define BODY_2(a1, a2) r = q*(arr[a2] - arr[a1]); \ s = atan(r); \ penalty += s*s; \ r = w*s/(ONE + r*r); \ grd[a2] += r; \ grd[a1] -= r; # define FINAL_2 penalty *= (hyper[0]*hyper[1]*hyper[1]); #endif /* RGL_COST_L2L0 */ /* Macro definitions and constants for CAUCHY cost function. */ #if (RGL_COST == RGL_COST_CAUCHY) w = 2.0*hyper[0]*hyper[1]; q = ONE/hyper[1]; # define BODY_1(a1, a2) r = q*(arr[a2] - arr[a1]); \ s = ONE + r*r; \ penalty += log(s); # define FINAL_1 penalty *= (hyper[0]*hyper[1]*hyper[1]); # define BODY_2(a1, a2) r = q*(arr[a2] - arr[a1]); \ s = ONE + r*r; \ penalty += log(s); \ r *= w/s; \ grd[a2] += r; \ grd[a1] -= r; # define FINAL_2 penalty *= (hyper[0]*hyper[1]*hyper[1]); #endif /* RGL_COST_CAUCHY */ #if RGL_PERIODIC /* * Periodic case * ~~~~~~~~~~~~~ * * At k-th dimension, j_k is the offset of the 'other' element * not accounting for indices lower than k. Therefore a loop * writes: * * for (i_k = 0; i_k < dim_k; ++i_k) { * j_k = j_{k+1} + ((i_k + off_k)%dim_k)*s_k; * ...; * * where i_k and s_k are the index and the stride along k-th * dimension. * * This can be rewritten as: * * j_k = j_{k+1} + ((i_k*s_k + off_k*s_k) % (dim_k*s_k) * = j_{k+1} + (e_k % s_{k+1}) * * with: * * e_k = i_k*s_k + off_k*s_k * = lo_k, lo_k + s_k, ..., hi_k - s_k * lo_k = off_k*s_k (i_k = 0) * hi_k = lo_k + dim_k*s_k (i_k = dim_k) * * finally the loop reads: * * for (e_k = lo_k; e_k < hi_k; e_k += s_k) { * j_k = j_{k+1} + (e_k % s_{k+1}); * ...; * * Note that j1 is the position of the 'other' element inside the * final loop. */ # define LOOP(k) \ for (e##k = lo##k, j##k = jp##k + e##k; \ e##k < hi##k; \ e##k += s##k, j##k = jp##k + (e##k % sp##k)) # define CODE(k) \ sp##k = dim_c[k-1]*s##k; /* next stride */ \ lo##k = off_c[k-1]*s##k; \ hi##k = (off_c[k-1] + dim_c[k-1])*s##k; \ if (n == k) { \ jp##k = 0; /* let the optimizer do the job */ \ if (grd) { \ LOOPS { \ BODY_2(j, j1) \ ++j; \ } \ FINAL_2 \ } else { \ LOOPS { \ BODY_1(j, j1) \ ++j; \ } \ FINAL_1 \ } \ return penalty; \ } #else /* not RGL_PERIODIC */ # define LOOP(k) \ for (j##k = lo##k + jp##k, e##k = hi##k + jp##k; j##k < e##k; j##k += s##k) # define CODE(k) \ j += off_c[k-1]*s##k; /* increment total offset */ \ sp##k = dim_c[k-1]*s##k; /* next stride */ \ lo##k = (off_c[k-1] >= 0 ? 0 : -off_c[k-1]*s##k); \ hi##k = (off_c[k-1] >= 0 ? dim_c[k-1] - off_c[k-1] : dim_c[k-1])*s##k;\ if (lo##k >= hi##k) { \ return 0.0; \ } \ if (n == k) { \ jp##k = 0; /* let the optimizer do the job */ \ if (grd) { \ LOOPS { \ BODY_2(j1, j1 + j) \ } \ FINAL_2 \ } else { \ LOOPS { \ BODY_1(j1, j1 + j) \ } \ FINAL_1 \ } \ return penalty; \ } #endif /* not RGL_PERIODIC */ /* Loop over dimensions. */ penalty = 0.0; j = 0; #if (RGL_MAX_NDIMS >= 1) # undef LOOPS # define LOOPS LOOP(1) CODE(1); #endif #if (RGL_MAX_NDIMS >= 2) # undef LOOPS # define LOOPS LOOP(2) LOOP(1) CODE(2); #endif #if (RGL_MAX_NDIMS >= 3) # undef LOOPS # define LOOPS LOOP(3) LOOP(2) LOOP(1) CODE(3); #endif #if (RGL_MAX_NDIMS >= 4) # undef LOOPS # define LOOPS LOOP(4) LOOP(3) LOOP(2) LOOP(1) CODE(4); #endif #if (RGL_MAX_NDIMS >= 5) # undef LOOPS # define LOOPS LOOP(5) LOOP(4) LOOP(3) LOOP(2) LOOP(1) CODE(5); #endif #if (RGL_MAX_NDIMS >= 6) # undef LOOPS # define LOOPS LOOP(6) LOOP(5) LOOP(4) LOOP(3) LOOP(2) LOOP(1) CODE(6); #endif #if (RGL_MAX_NDIMS >= 7) # undef LOOPS # define LOOPS LOOP(7) LOOP(6) LOOP(5) LOOP(4) LOOP(3) LOOP(2) LOOP(1) CODE(7); #endif #if (RGL_MAX_NDIMS >= 8) # undef LOOPS # define LOOPS LOOP(8) LOOP(7) LOOP(6) LOOP(5) LOOP(4) LOOP(3) LOOP(2) LOOP(1) CODE(8); #endif return -11.0; } #endif /* RGL_ROUGHNESS */ /*---------------------------------------------------------------------------*/ /* CLEANUP */ /* Undefine all macros. */ #undef RGL_COST #undef RGL_PERIODIC #undef RGL_ROUGHNESS #endif /* _RGL_CODE */ /* * Local Variables: * mode: C * tab-width: 8 * c-basic-offset: 2 * indent-tabs-mode: nil * fill-column: 78 * coding: utf-8 * End: */ Yeti-6.4.0/core/yeti_rgl_test.i000066400000000000000000000164441253351442600164300ustar00rootroot00000000000000func rgl1(mu,x,g) { if (! mu) return 0.0; r = (2.0*mu)*x(dif,..); penalty = (0.25/mu)*sum(r*r); if (! is_void(g)) { g(2:0,..) += r; g(1:-1,..) -= r; } return penalty; } func rgl2(mu,x,g) { if (! mu) return 0.0; r = (2.0*mu)*x(,dif,..); penalty = (0.25/mu)*sum(r*r); if (! is_void(g)) { g(,2:0,..) += r; g(,1:-1,..) -= r; } return penalty; } func rgl3(mu,x,g) { if (! mu) return 0.0; r = (2.0*mu)*x(,,dif,..); penalty = (0.25/mu)*sum(r*r); if (! is_void(g)) { g(,,2:0,..) += r; g(,,1:-1,..) -= r; } return penalty; } func rgl4(mu,x,g) { if (! mu) return 0.0; r = (2.0*mu)*x(,,,dif,..); penalty = (0.25/mu)*sum(r*r); if (! is_void(g)) { g(,,,2:0,..) += r; g(,,,1:-1,..) -= r; } return penalty; } func rgl5(mu,x,g) { if (! mu) return 0.0; r = (2.0*mu)*x(,,,,dif,..); penalty = (0.25/mu)*sum(r*r); if (! is_void(g)) { g(,,,,2:0,..) += r; g(,,,,1:-1,..) -= r; } return penalty; } func rgl6(mu,x,g) { if (! mu) return 0.0; r = (2.0*mu)*x(,,,,,dif,..); penalty = (0.25/mu)*sum(r*r); if (! is_void(g)) { g(,,,,,2:0,..) += r; g(,,,,,1:-1,..) -= r; } return penalty; } /* periodic */ func rgl_p(mu,off,x,&g) { if (mu == 0.0 || noneof(off)) { return 0.0; } dims = dimsof(x); ndims = dims(1); noffs = numberof(off); if (noffs > ndims) { if (anyof(off(ndims+1:noffs))) { error, "non-zero extra offset(s)"; } off = off(1:ndims); } else if (noffs < ndims) { grow, off, array(long, ndims - noffs); } r = roll(x, off) - x; if (! is_void(g)) { g += (2.0*mu)*(roll(r, -off) - r); } return mu*sum(r*r); } func rgl1a(hyper,f,x,g) { local grd; if (! hyper(1)) return 0.0; r = x(dif,..); if (is_void(g)) return f(hyper, r); penalty = f(hyper, r, grd); g(2:0,..) += grd; g(1:-1,..) -= grd; return penalty; } func rgl2a(hyper,f,x,g) { local grd; if (! hyper(1)) return 0.0; r = x(,dif,..); if (is_void(g)) return f(hyper, r); penalty = f(hyper, r, grd); g(,2:0,..) += grd; g(,1:-1,..) -= grd; return penalty; } func rgl3a(hyper,f,x,g) { local grd; if (! hyper(1)) return 0.0; r = x(,,dif,..); if (is_void(g)) return f(hyper, r); penalty = f(hyper, r, grd); g(,,2:0,..) += grd; g(,,1:-1,..) -= grd; return penalty; } func rgl4a(hyper,f,x,g) { local grd; if (! hyper(1)) return 0.0; r = x(,,,dif,..); if (is_void(g)) return f(hyper, r); penalty = f(hyper, r, grd); g(,,,2:0,..) += grd; g(,,,1:-1,..) -= grd; return penalty; } func rgl5a(hyper,f,x,g) { local grd; if (! hyper(1)) return 0.0; r = x(,,,,dif,..); if (is_void(g)) return f(hyper, r); penalty = f(hyper, r, grd); g(,,,,2:0,..) += grd; g(,,,,1:-1,..) -= grd; return penalty; } func rgl_test { format = "%2d %-15s - delta_penalty = %9.2g / max(|delta_gradient|) = %g\n"; mu = pi; eps = 0.1; x = random(3,4,5,6,7) - 0.5; /*----------------------------------------*/ cost = "l2"; g0 = array(double, dimsof(x)); g1 = array(double, dimsof(x)); e0 = rgl1(mu,x,g0); e1 = rgl_roughness_l2(mu,1,x,g1); write, format=format, 1, cost, e1 - e0, max(abs(g1 - g0)); g0 = array(double, dimsof(x)); g1 = array(double, dimsof(x)); e0 = rgl2(mu,x,g0); e1 = rgl_roughness_l2(mu,[0,1],x,g1); write, format=format, 2, cost, e1 - e0, max(abs(g1 - g0)); g0 = array(double, dimsof(x)); g1 = array(double, dimsof(x)); e0 = rgl3(mu,x,g0); e1 = rgl_roughness_l2(mu,[0,0,1],x,g1); write, format=format, 3, cost, e1 - e0, max(abs(g1 - g0)); g0 = array(double, dimsof(x)); g1 = array(double, dimsof(x)); e0 = rgl4(mu,x,g0); e1 = rgl_roughness_l2(mu,[0,0,0,1],x,g1); write, format=format, 4, cost, e1 - e0, max(abs(g1 - g0)); /*----------------------------------------*/ cost = "l2_periodic"; g0 = array(double, dimsof(x)); g1 = array(double, dimsof(x)); off = 1; e0 = rgl_p(mu,off,x,g0); e1 = rgl_roughness_l2_periodic(mu,off,x,g1); write, format=format, 5, cost, e1 - e0, max(abs(g1 - g0)); g0 = array(double, dimsof(x)); g1 = array(double, dimsof(x)); off = [0,1]; e0 = rgl_p(mu,off,x,g0); e1 = rgl_roughness_l2_periodic(mu,off,x,g1); write, format=format, 6, cost, e1 - e0, max(abs(g1 - g0)); g0 = array(double, dimsof(x)); g1 = array(double, dimsof(x)); off = [0,0,1]; e0 = rgl_p(mu,off,x,g0); e1 = rgl_roughness_l2_periodic(mu,off,x,g1); write, format=format, 7, cost, e1 - e0, max(abs(g1 - g0)); g0 = array(double, dimsof(x)); g1 = array(double, dimsof(x)); off = [0,1,1]; e0 = rgl_p(mu,off,x,g0); e1 = rgl_roughness_l2_periodic(mu,off,x,g1); write, format=format, 8, cost, e1 - e0, max(abs(g1 - g0)); g0 = array(double, dimsof(x)); g1 = array(double, dimsof(x)); off = [0,2,0,1]; e0 = rgl_p(mu,off,x,g0); e1 = rgl_roughness_l2_periodic(mu,off,x,g1); write, format=format, 9, cost, e1 - e0, max(abs(g1 - g0)); g0 = array(double, dimsof(x)); g1 = array(double, dimsof(x)); off = [0,2,0,-3]; e0 = rgl_p(mu,off,x,g0); e1 = rgl_roughness_l2_periodic(mu,off,x,g1); write, format=format, 10, cost, e1 - e0, max(abs(g1 - g0)); /*----------------------------------------*/ cost = "l2l1"; hyper = [mu, eps]; g0 = array(double, dimsof(x)); g1 = array(double, dimsof(x)); e0 = rgl1a(hyper,cost_l2l1,x,g0); e1 = rgl_roughness_l2l1(hyper,1,x,g1); write, format=format, 11, cost, e1 - e0, max(abs(g1 - g0)); g0 = array(double, dimsof(x)); g1 = array(double, dimsof(x)); e0 = rgl2a(hyper,cost_l2l1,x,g0); e1 = rgl_roughness_l2l1(hyper,[0,1],x,g1); write, format=format, 12, cost, e1 - e0, max(abs(g1 - g0)); g0 = array(double, dimsof(x)); g1 = array(double, dimsof(x)); e0 = rgl3a(hyper,cost_l2l1,x,g0); e1 = rgl_roughness_l2l1(hyper,[0,0,1],x,g1); write, format=format, 13, cost, e1 - e0, max(abs(g1 - g0)); g0 = array(double, dimsof(x)); g1 = array(double, dimsof(x)); e0 = rgl4a(hyper,cost_l2l1,x,g0); e1 = rgl_roughness_l2l1(hyper,[0,0,0,1],x,g1); write, format=format, 14, cost, e1 - e0, max(abs(g1 - g0)); g0 = array(double, dimsof(x)); g1 = array(double, dimsof(x)); e0 = rgl5a(hyper,cost_l2l1,x,g0); e1 = rgl_roughness_l2l1(hyper,[0,0,0,0,1],x,g1); write, format=format, 15, cost, e1 - e0, max(abs(g1 - g0)); /*----------------------------------------*/ cost = "l2l0"; hyper = [mu, eps]; g0 = array(double, dimsof(x)); g1 = array(double, dimsof(x)); e0 = rgl1a(hyper,cost_l2l0,x,g0); e1 = rgl_roughness_l2l0(hyper,1,x,g1); write, format=format, 16, cost, e1 - e0, max(abs(g1 - g0)); g0 = array(double, dimsof(x)); g1 = array(double, dimsof(x)); e0 = rgl2a(hyper,cost_l2l0,x,g0); e1 = rgl_roughness_l2l0(hyper,[0,1],x,g1); write, format=format, 17, cost, e1 - e0, max(abs(g1 - g0)); g0 = array(double, dimsof(x)); g1 = array(double, dimsof(x)); e0 = rgl3a(hyper,cost_l2l0,x,g0); e1 = rgl_roughness_l2l0(hyper,[0,0,1],x,g1); write, format=format, 18, cost, e1 - e0, max(abs(g1 - g0)); g0 = array(double, dimsof(x)); g1 = array(double, dimsof(x)); e0 = rgl4a(hyper,cost_l2l0,x,g0); e1 = rgl_roughness_l2l0(hyper,[0,0,0,1],x,g1); write, format=format, 19, cost, e1 - e0, max(abs(g1 - g0)); g0 = array(double, dimsof(x)); g1 = array(double, dimsof(x)); e0 = rgl5a(hyper,cost_l2l0,x,g0); e1 = rgl_roughness_l2l0(hyper,[0,0,0,0,1],x,g1); write, format=format, 20, cost, e1 - e0, max(abs(g1 - g0)); } plug_dir,"."; include,"./yeti.i"; rgl_test; Yeti-6.4.0/core/yeti_sort.c000066400000000000000000000250321253351442600155570ustar00rootroot00000000000000/* * yeti_sort.c - * * Implement sorting functions in Yeti. * *----------------------------------------------------------------------------- * * Copyright (C) 1996-2010 Eric Thiébaut * * This software is governed by the CeCILL-C license under French law and * abiding by the rules of distribution of free software. You can use, modify * and/or redistribute the software under the terms of the CeCILL-C license as * circulated by CEA, CNRS and INRIA at the following URL * "http://www.cecill.info". * * As a counterpart to the access to the source code and rights to copy, * modify and redistribute granted by the license, users are provided only * with a limited warranty and the software's author, the holder of the * economic rights, and the successive licensors have only limited liability. * * In this respect, the user's attention is drawn to the risks associated with * loading, using, modifying and/or developing or reproducing the software by * the user in light of its specific status of free software, that may mean * that it is complicated to manipulate, and that also therefore means that it * is reserved for developers and experienced professionals having in-depth * computer knowledge. Users are therefore encouraged to load and test the * software's suitability as regards their requirements in conditions enabling * the security of their systems and/or data to be ensured and, more * generally, to use and operate it in the same conditions as regards * security. * * The fact that you are presently reading this means that you have had * knowledge of the CeCILL-C license and that you accept its terms. * *----------------------------------------------------------------------------- */ #ifndef _YETI_SORT_C #define _YETI_SORT_C 1 #include "config.h" #include "yeti.h" #if (YORICK_VERSION_MAJOR == 1) && (YORICK_VERSION_MINOR <= 4) # include #else # include "pstdio.h" #endif #include #include #include #include "yio.h" #define index_t long #define value_t unsigned char #define HEAPSORT _yeti_heapsort_c #define HEAPSORT1 _yeti_heapsort1_c #define QUICKSELECT _yeti_quick_select_c #include __FILE__ #define value_t short #define HEAPSORT _yeti_heapsort_s #define HEAPSORT1 _yeti_heapsort1_s #define QUICKSELECT _yeti_quick_select_s #include __FILE__ #define value_t int #define HEAPSORT _yeti_heapsort_i #define HEAPSORT1 _yeti_heapsort1_i #define QUICKSELECT _yeti_quick_select_i #include __FILE__ #define value_t long #define HEAPSORT _yeti_heapsort_l #define HEAPSORT1 _yeti_heapsort1_l #define QUICKSELECT _yeti_quick_select_l #include __FILE__ #define value_t float #define HEAPSORT _yeti_heapsort_f #define HEAPSORT1 _yeti_heapsort1_f #define QUICKSELECT _yeti_quick_select_f #include __FILE__ #define value_t double #define HEAPSORT _yeti_heapsort_d #define HEAPSORT1 _yeti_heapsort1_d #define QUICKSELECT _yeti_quick_select_d #include __FILE__ extern BuiltIn Y_heapsort; void Y_heapsort(int argc) { Operand op; index_t *index = NULL, number; if (argc != 1) YError("heapsort takes exactly one argument"); if (! sp->ops) YError("unexpected keyword"); sp->ops->FormOperand(sp, &op); number = op.type.number; if (CalledAsSubroutine()) { switch (op.ops->typeID) { case T_CHAR: _yeti_heapsort_c(op.value, number); return; case T_SHORT: _yeti_heapsort_s(op.value, number); return; case T_INT: _yeti_heapsort_i(op.value, number); return; case T_LONG: _yeti_heapsort_l(op.value, number); return; case T_FLOAT: _yeti_heapsort_f(op.value, number); return; case T_DOUBLE: _yeti_heapsort_d(op.value, number); return; } } else { switch (op.ops->typeID) { case T_CHAR: case T_SHORT: case T_INT: case T_LONG: case T_FLOAT: case T_DOUBLE: index = YETI_PUSH_NEW_L(yeti_start_dimlist(number)); switch (op.ops->typeID) { case T_CHAR: _yeti_heapsort1_c(index, op.value, number); break; case T_SHORT: _yeti_heapsort1_s(index, op.value, number); break; case T_INT: _yeti_heapsort1_i(index, op.value, number); break; case T_LONG: _yeti_heapsort1_l(index, op.value, number); break; case T_FLOAT: _yeti_heapsort1_f(index, op.value, number); break; default: _yeti_heapsort1_d(index, op.value, number); break; } return; } } YError("bad data type"); } extern BuiltIn Y_quick_select; void Y_quick_select(int argc) { Operand op; index_t number, k, first, last, offset, elsize; Symbol *s; void *ptr; int in_place, type; if (argc < 2 || argc > 4) { YError("quick_select takes 2 to 4 arguments"); } s = &sp[1 - argc]; if (argc >= 4) { last = yeti_get_optional_integer(s + 3, 0); } else { last = 0; } if (argc >= 3) { first = yeti_get_optional_integer(s + 2, 1); } else { first = 1; } k = YGetInteger(s + 1); if (! s->ops) YError("unexpected keyword"); s->ops->FormOperand(s, &op); number = op.type.number; type = op.ops->typeID; elsize = op.type.base->size; switch (type) { case T_CHAR: case T_SHORT: case T_INT: case T_LONG: case T_FLOAT: case T_DOUBLE: if (k <= 0) k += number; if (k <= 0 || k > number) YError("out of range index K"); if (first <= 0) first += number; if (first <= 0 || first > number) YError("out of range index FIRST"); if (last <= 0) last += number; if (last <= 0 || last > number) YError("out of range index LAST"); if (last < first || k < first || k > last) { YError("selected index range is empty"); } break; default: YError("bad data type"); } in_place = CalledAsSubroutine(); if (in_place) { ptr = op.value; } else { /* creates a temporary copy as needed */ if (op.references) { ptr = (void*)(((Array*)PushDataBlock(NewArray(op.type.base, op.type.dims)))->value.c); memcpy(ptr, op.value, number*elsize); PopTo(s); } else { ptr = op.value; } } offset = first - 1; number = last - first + 1; k -= first; /* must be zero-based index */ ptr = (void *)(((char *)ptr) + offset*elsize); switch (type) { case T_CHAR: _yeti_quick_select_c(k, number, (unsigned char *)ptr); if (! in_place) { yeti_push_char_value(((unsigned char *)ptr)[k]); } break; case T_SHORT: _yeti_quick_select_s(k, number, (short *)ptr); if (! in_place) { yeti_push_short_value(((short *)ptr)[k]); } break; case T_INT: _yeti_quick_select_i(k, number, (int *)ptr); if (! in_place) { yeti_push_int_value(((int *)ptr)[k]); } break; case T_LONG: _yeti_quick_select_l(k, number, (long *)ptr); if (! in_place) { yeti_push_long_value(((long *)ptr)[k]); } break; case T_FLOAT: _yeti_quick_select_f(k, number, (float *)ptr); if (! in_place) { yeti_push_float_value(((float *)ptr)[k]); } break; case T_DOUBLE: _yeti_quick_select_d(k, number, (double *)ptr); if (! in_place) { yeti_push_double_value(((double *)ptr)[k]); } break; } } #else /* _YETI_SORT_C */ #ifdef HEAPSORT /* HEAPSORT - in-place sorting of array */ static void HEAPSORT(value_t a[], const index_t n) { index_t i,j,k,l; value_t asave; if (n < 2) return; k = n/2; l = n - 1; for (;;) { if (k > 0) { asave = a[--k]; } else { asave = a[l]; a[l] = a[0]; if (--l == 0) { a[0] = asave; return; } } i = k; while ((j = 2*i + 1) <= l) { if (j < l && a[j] < a[j + 1]) ++j; if (a[j] <= asave) break; a[i] = a[j]; i = j; } a[i] = asave; } } #endif /* HEAPSORT */ #ifdef HEAPSORT0 /* HEAPSORT0 - indirect sorting of an array, with C-indexing (starting at 0) */ static void HEAPSORT0(index_t index[], const value_t a[], const index_t n) { index_t i,j,k,l,isave; value_t asave; for (i=0 ; i 0) { isave = index[--k]; } else { isave = index[l]; index[l] = index[0]; if (--l == 0) { index[0] = isave; return; } } asave = a[isave]; i = k; while ((j = 2*i + 1) <= l) { if (j < l && a[index[j]] < a[index[j + 1]]) ++j; if (a[index[j]] <= asave) break; index[i] = index[j]; i = j; } index[i] = isave; } } #endif /* HEAPSORT0 */ #ifdef HEAPSORT1 /* HEAPSORT1 - indirect sorting of an array, with Yorick/FORTRAN indexing (starting at 1) */ static void HEAPSORT1(index_t index[], const value_t *a, index_t n) { index_t i,j,k,l,isave; value_t asave; for (i=0 ; i 0) { isave = index[--k]; } else { isave = index[l]; index[l] = index[0]; if (--l == 0) { index[0] = isave; return; } } asave = a[isave]; i = k; while ((j = 2*i + 1) <= l) { if (j < l && a[index[j]] < a[index[j + 1]]) ++j; if (a[index[j]] <= asave) break; index[i] = index[j]; i = j; } index[i] = isave; } } #endif /* HEAPSORT1 */ #ifdef QUICKSELECT #define SWAP(a,b) t=(a);(a)=(b);(b)=t static value_t QUICKSELECT(long k, long n, value_t arr[]) { index_t i, j, top, bot, mid; value_t a, t; bot = 0; top = n - 1; for (;;) { if (top <= bot + 1) { if (top == bot + 1 && arr[bot] > arr[top]) { SWAP(arr[bot], arr[top]); } return arr[k]; } else { mid = (bot + top)/2; SWAP(arr[mid], arr[bot + 1]); if (arr[bot] > arr[top]) { SWAP(arr[bot], arr[top]); } if (arr[bot + 1] > arr[top]) { SWAP(arr[bot + 1], arr[top]); } if (arr[bot] > arr[bot + 1]) { SWAP(arr[bot], arr[bot + 1]); } i = bot + 1; j = top; a = arr[i]; for (;;) { while (arr[++i] < a) ; while (arr[--j] > a) ; if (j < i) break; SWAP(arr[i], arr[j]); } arr[bot + 1] = arr[j]; arr[j] = a; if (j >= k) top = j - 1; if (j <= k) bot = i; } } } #undef SWAP #endif /* QUICKSELECT */ #undef HEAPSORT #undef HEAPSORT0 #undef HEAPSORT1 #undef QUICKSELECT #undef value_t #endif /* _YETI_SORT_C */ /* * Local Variables: * mode: C * tab-width: 8 * c-basic-offset: 2 * indent-tabs-mode: nil * fill-column: 78 * coding: utf-8 * End: */ Yeti-6.4.0/core/yeti_sparse.c000066400000000000000000000513671253351442600160770ustar00rootroot00000000000000/* * yeti_sparse.c - * * Implement sparse matrix in Yorick. * *----------------------------------------------------------------------------- * * Copyright (C) 1996-2010 Eric Thiébaut * * This software is governed by the CeCILL-C license under French law and * abiding by the rules of distribution of free software. You can use, modify * and/or redistribute the software under the terms of the CeCILL-C license as * circulated by CEA, CNRS and INRIA at the following URL * "http://www.cecill.info". * * As a counterpart to the access to the source code and rights to copy, * modify and redistribute granted by the license, users are provided only * with a limited warranty and the software's author, the holder of the * economic rights, and the successive licensors have only limited liability. * * In this respect, the user's attention is drawn to the risks associated with * loading, using, modifying and/or developing or reproducing the software by * the user in light of its specific status of free software, that may mean * that it is complicated to manipulate, and that also therefore means that it * is reserved for developers and experienced professionals having in-depth * computer knowledge. Users are therefore encouraged to load and test the * software's suitability as regards their requirements in conditions enabling * the security of their systems and/or data to be ensured and, more * generally, to use and operate it in the same conditions as regards * security. * * The fact that you are presently reading this means that you have had * knowledge of the CeCILL-C license and that you accept its terms. * *----------------------------------------------------------------------------- */ #include #include #include #include #include /* Debug level: 0 or undefined = none, * 1 = perform assertions, * 2 = verbose debug. */ #undef YETI_SPARSE_DEBUG extern BuiltIn Y_sparse_matrix, Y_is_sparse_matrix; extern BuiltIn Y_mvmult; #if defined(__GNUC__) && __GNUC__ > 1 extern void YError(const char *msg) __attribute__ ((noreturn)); #endif /*--------------------------------------------------------------------------*/ /* IMPLEMENTATION OF SPARSE MATRICES AS OPAQUE YORICK OBJECTS */ extern PromoteOp PromXX; extern UnaryOp ToAnyX, NegateX, ComplementX, NotX, TrueX; extern BinaryOp AddX, SubtractX, MultiplyX, DivideX, ModuloX, PowerX; extern BinaryOp EqualX, NotEqualX, GreaterX, GreaterEQX; extern BinaryOp ShiftLX, ShiftRX, OrX, AndX, XorX; extern BinaryOp AssignX, MatMultX; extern UnaryOp EvalX, SetupX, PrintX; static MemberOp sparse_get_member; static UnaryOp sparse_print; static void sparse_free(void *addr); /* ******* Use Unref(obj) ******* */ static void sparse_eval(Operand *op); Operations sparseOps = { &sparse_free, T_OPAQUE, 0, /* promoteID = */T_STRING/* means illegal */, "sparse_matrix", {&PromXX, &PromXX, &PromXX, &PromXX, &PromXX, &PromXX, &PromXX, &PromXX}, &ToAnyX, &ToAnyX, &ToAnyX, &ToAnyX, &ToAnyX, &ToAnyX, &ToAnyX, &NegateX, &ComplementX, &NotX, &TrueX, &AddX, &SubtractX, &MultiplyX, &DivideX, &ModuloX, &PowerX, &EqualX, &NotEqualX, &GreaterX, &GreaterEQX, &ShiftLX, &ShiftRX, &OrX, &AndX, &XorX, &AssignX, &sparse_eval, &SetupX, &sparse_get_member, &MatMultX, &sparse_print }; /* * Sparse matrices store a list of non-zero coefficients and two lists of * indices: the 'row' and 'column' indices of the non-zero coefficients. * The so-called 'rows' and 'colums' can have any dimension list and * represent respectively the output and input spaces of the matrix * (i.e. the matrix operates onto a 'vector' in the input space to produce * a 'vector' in the output space). With Yorick dimension ordering in * mind, the 'rows' and 'colums' represent respectively the leading and * trailing dimensions of the matrix. * * The 'sparse' structure describes a sparse matrix. * * The 'index' structure describes the row/column index of a matrix. */ typedef struct index index_t; typedef struct sparse sparse_t; struct index { size_t nelem; /* number of elements in indexed array */ size_t ndims; /* number of dimensions in DIMLIST */ size_t *dimlist; /* list of dimensions */ size_t *indices; /* indices of non-zero elements along this dimension */ }; struct sparse { int references; /* reference counter */ Operations *ops; /* virtual function table */ size_t number; /* number of non-zero elements */ index_t row, col; /* indices for rows / colums of non zero elements */ void *coefs; /* non-zero elements of the sparse matrix */ }; static void sparse_print(Operand *op) { sparse_t *obj = (sparse_t *)op->value; char line[80]; ForceNewline(); PrintFunc("Object of type: "); PrintFunc(obj->ops->typeName); sprintf(line, " (references=%d)", obj->references); PrintFunc(line); ForceNewline(); } static void sparse_free(void *addr) { /* A sparse matrix is allocated as a single memory chunk. */ if (addr) p_free(addr); } static long *get_array_l(Symbol *s, size_t *number); static double *get_array_d(Symbol *s, size_t *number); static long *get_dimlist(Symbol *s, size_t *ndims_ptr, size_t *nelem_ptr); static unsigned int get_flags(Symbol *s, unsigned int default_value); /** Push a new array with given dimension list. If DIMLIST is NULL, then the array is a vector of length N; otherwise N is the number of dimensions (and the number of lements in DIMLIST). */ static Array *push_new_array(StructDef *base, size_t n, const size_t dimlist[]); /** Pop topmost stack element in place of OWNER. If CLEANUP is true, drop symbols from top of the stack until OWNER is the topmost one. */ static void pop_to(Symbol *owner, int cleanup); /* usage: sparse_matrix(coefs, row_dimlist, row_indices, * col_dimlist, col_indices) */ void Y_sparse_matrix(int argc) { size_t off1, off2, nint, size; size_t i, number=0, ndims1, nelem1, len1, ndims2, nelem2, len2; long *dims1, *idx1, *dims2, *idx2; size_t *row_indices, *col_indices; double *coefs, *nonzero; sparse_t *sparse; /* Parse the arguments. */ if (argc != 5) { YError("sparse_matrix takes exactly 5 arguments"); } nonzero = get_array_d(sp - 4, &number); dims1 = get_dimlist(sp - 3, &ndims1, &nelem1); idx1 = get_array_l(sp - 2, &len1); dims2 = get_dimlist(sp - 1, &ndims2, &nelem2); idx2 = get_array_l(sp , &len2); /* Check row (1st) indices. */ if (len1 != number) { YError("bad number of elements for list of row indices"); } for (i=0 ; i nelem1) { YError("out of range row index"); } } /* Check column (2nd) indices. */ if (len2 != number) { YError("bad number of elements for list of column indices"); } for (i=0 ; i nelem2) { YError("out of range column index"); } } /* Allocate memory for the sparse matrix. Push the opaque object as soon as possible onto the stack to limit memory leak in case of interrupt. */ #define ROUND_UP(a,b) ((((a) + (b) - 1)/(b))*(b)) off1 = ROUND_UP(sizeof(sparse_t), sizeof(size_t)); nint = ndims1 + number + ndims2 + number; off2 = ROUND_UP(off1 + nint*sizeof(size_t), sizeof(double)); size = ROUND_UP(off2 + number*sizeof(double), sizeof(double)); sparse = p_malloc(size); sparse->references = 0; sparse->ops = &sparseOps; PushDataBlock(sparse); /* early push */ sparse->number = number; sparse->row.nelem = nelem1; sparse->row.ndims = ndims1; sparse->row.dimlist = (size_t *)((char *)sparse + off1); sparse->row.indices = sparse->row.dimlist + ndims1; sparse->col.nelem = nelem2; sparse->col.ndims = ndims2; sparse->col.dimlist = sparse->row.indices + number; sparse->col.indices = sparse->col.dimlist + ndims2; sparse->coefs = (double *)((char *)sparse + off2); /* Fill up coefficients and list of row/column indices (beware that Yorick uses 1-based indices). */ for (i=0 ; irow.dimlist[i] = dims1[i]; } for (i=0 ; icol.dimlist[i] = dims2[i]; } coefs = sparse->coefs; row_indices = sparse->row.indices; col_indices = sparse->col.indices; for (i=0 ; iops == &referenceSym ? &globTab[sp->index] : sp); result = (s->ops == &dataBlockSym && s->value.db->ops == &sparseOps); PushIntValue(result); } static long *get_array_l(Symbol *s, size_t *number_ptr) { Operand op; if (! s->ops) YError("unexpected keyword argument"); switch (s->ops->FormOperand(s, &op)->ops->typeID) { case T_CHAR: case T_SHORT: case T_INT: op.ops->ToLong(&op); case T_LONG: if (number_ptr) { size_t number = 1; Dimension *dims = op.type.dims; while (dims) { number *= dims->number; dims = dims->next; } *number_ptr = number; } return op.value; } YError("expecting array of integers"); return 0; } static double *get_array_d(Symbol *s, size_t *number_ptr) { Operand op; if (! s->ops) YError("unexpected keyword argument"); switch (s->ops->FormOperand(s, &op)->ops->typeID) { case T_CHAR: case T_SHORT: case T_INT: case T_LONG: case T_FLOAT: op.ops->ToDouble(&op); case T_DOUBLE: if (number_ptr) { size_t number = 1; Dimension *dims = op.type.dims; while (dims) { number *= dims->number; dims = dims->next; } *number_ptr = number; } return op.value; } YError("expecting array of reals"); return 0; } static long *get_dimlist(Symbol *s, size_t *ndims_ptr, size_t *nelem_ptr) { Operand op; size_t i, ndims, nelem; long *dimlist; Dimension *dims; if (! s->ops) { goto bad_dimlist; } switch (s->ops->FormOperand(s, &op)->ops->typeID) { case T_CHAR: case T_SHORT: case T_INT: op.ops->ToLong(&op); case T_LONG: dims = op.type.dims; if (! dims) { ndims = 1; dimlist = (long *)op.value; } else if (! dims->next && (ndims = *(long *)op.value) == dims->number - 1) { dimlist = (long *)op.value + 1; } else { goto bad_dimlist; } nelem = 1; for (i=0 ; iops==&longScalar) return s->value.l; if (s->ops==&intScalar) return s->value.i; if (! s->ops->FormOperand(s, &op)->type.dims) { switch (op.ops->typeID) { case T_CHAR: return *(char*)op.value;; case T_SHORT: return *(short*)op.value; case T_INT: return *(int*)op.value; case T_LONG: return *(long*)op.value; case T_VOID: return default_value; } } YError("expecting nil or integer scalar argument"); return 0; /* avoids compiler warnings */ } /* sparse_eval implements sparse matrix used as a function (or as an indexed array). */ static void sparse_eval(Operand *op0) { Operand op; size_t k, number; Symbol *sym, *stack = op0->owner; Dimension *dims; sparse_t *sparse; const size_t *j, *i; const index_t *inp, *out; const double *a, *x; double *y; unsigned int flags; if (sp - stack > 2) { YError("sparse matrix operator takes 1 or 2 arguments"); } /* Get the 'matrix'. */ #if defined(YETI_SPARSE_DEBUG) && YETI_SPARSE_DEBUG >= 1 sym = (stack->ops == &referenceSym) ? &globTab[stack->index] : stack; if (sym->ops != &dataBlockSym || sym->value.db->ops != &sparseOps) YError("unexpected non-sparse matrix object (must be a BUG!)"); sparse = (sparse_t *)sym->value.db; #else sparse = (sparse_t *)stack->value.db; #endif /* Get the flags. */ if (sp - stack == 2) { flags = get_flags(sp, 0); } else { flags = 0; } if (! flags) { inp = &sparse->col; out = &sparse->row; } else if (flags == 1) { inp = &sparse->row; out = &sparse->col; } else { inp = 0; /* avoids compiler warning */ out = 0; /* avoids compiler warning */ YError("unsupported job value (should be 0 or 1)"); } /* Get the input 'vector'. */ sym = stack + 1; if (! sym->ops) YError("unexpected keyword argument"); switch (sym->ops->FormOperand(sym, &op)->ops->typeID) { case T_CHAR: case T_SHORT: case T_INT: case T_LONG: case T_FLOAT: op.ops->ToDouble(&op); break; case T_DOUBLE: break; default: YError("bad data type for input 'vector'"); return; } number = 1; dims = op.type.dims; while (dims) { number *= dims->number; dims = dims->next; } if ((dims = op.type.dims) != NULL && dims->next) { /* Check the dimension list. */ k = inp->ndims; while (k-- >= 1) { if (! dims || dims->number != inp->dimlist[k]) { YError("bad dimension list for input 'vector'"); } dims = dims->next; } } else if (number != inp->nelem) { YError("bad number of elements for input 'vector'"); } x = op.value; /* Create the output 'vector' and perform the matrix multiplication. */ y = push_new_array(&doubleStruct, out->ndims, out->dimlist)->value.d; memset(y, 0, out->nelem*sizeof(*y)); i = out->indices; j = inp->indices; number = sparse->number; a = sparse->coefs; for (k=0 ; kowner, 1); } static void push_indices(const index_t *p, size_t number); static void push_dimlist(const index_t *p); static void sparse_get_member(Operand *op, char *name) { static long row_dimlist_id = -1L; static long row_indices_id = -1L; static long col_dimlist_id = -1L; static long col_indices_id = -1L; static long coefs_id = -1L; sparse_t *this = (sparse_t *)op->value; if (coefs_id < 0) { row_dimlist_id = Globalize("row_dimlist", 0L); row_indices_id = Globalize("row_indices", 0L); col_dimlist_id = Globalize("col_dimlist", 0L); col_indices_id = Globalize("col_indices", 0L); coefs_id = Globalize("coefs", 0L); } if (name) { long id = Globalize(name, 0L); int ok = 0; CheckStack(1); if (id == coefs_id) { memcpy(push_new_array(&doubleStruct, this->number, NULL)->value.d, this->coefs, this->number*sizeof(double)); ok = 1; } else if (id == row_dimlist_id) { push_dimlist(&this->row); ok = 1; } else if (id == row_indices_id) { push_indices(&this->row, this->number); ok = 1; } else if (id == col_dimlist_id) { push_dimlist(&this->col); ok = 1; } else if (id == col_indices_id) { push_indices(&this->col, this->number); ok = 1; } if (ok) { /* Pop result in place of owner symbol. */ pop_to(op->owner, 0); return; } } YError("illegal sparse matrix member"); } static void pop_to(Symbol *owner, int cleanup) { DataBlock *old = (owner->ops == &dataBlockSym) ? owner->value.db : NULL; Symbol *stack; owner->ops = &intScalar; /* avoid clash in case of interrupts */ stack = sp--; /* sp decremented BEFORE stack element is moved */ owner->value = stack->value; owner->ops = stack->ops; Unref(old); if (cleanup) { while (sp - owner > 0) { stack = sp--; /* sp decremented BEFORE stack element is deleted */ if (stack->ops == &dataBlockSym) Unref(stack->value.db); } } } static void push_dimlist(const index_t *p) { size_t i, ndims = p->ndims; const size_t *dimlist = p->dimlist; long *ptr = push_new_array(&longStruct, ndims + 1, NULL)->value.l; *ptr++ = ndims; for (i=0 ; ivalue.l; const size_t *index = p->indices; for (i=0 ; i 3) YError("mvmult takes 2 or 3 arguments"); stack = sp - argc + 1; if (! stack->ops) YError("unexpected keyword argument"); stack->ops->FormOperand(stack, &op); if (op.ops == &sparseOps) { sparse_eval(&op); /* that's all folks! */ } else { /* Get the optional flags. */ flags = (argc == 3 ? get_flags(sp, 0) : 0); if ((unsigned int)flags > 1U) { YError("unsupported job value (should be 0 or 1)"); } /* Get the 'matrix' A. */ switch (op.ops->typeID) { case T_CHAR: case T_SHORT: case T_INT: case T_LONG: case T_FLOAT: op.ops->ToDouble(&op); case T_DOUBLE: ndims_a = pack_dimlist(op.type.dims, dimlist_a, MAXDIMS); a = op.value; break; default: YError("expecting array of reals for the 'matrix'"); return; /* avoid compiler warnings */ } /* Get the 'vector' X. */ ++stack; if (! stack->ops) YError("unexpected keyword argument"); stack->ops->FormOperand(stack, &op); switch (op.ops->typeID) { case T_CHAR: case T_SHORT: case T_INT: case T_LONG: case T_FLOAT: op.ops->ToDouble(&op); case T_DOUBLE: ndims_x = pack_dimlist(op.type.dims, dimlist_x, MAXDIMS); x = op.value; break; default: YError("expecting array of reals for the 'vector'"); return; /* avoid compiler warnings */ } /* Cleanup temporary dimension list. */ dims = tmpDims; tmpDims = NULL; if (dims) FreeDimension(dims); /* Check dimension lists and build dimension list of the result. */ if (ndims_a < ndims_x) { bad_dim_list: YError("incompatible dimension lists"); return; /* avoid compiler warnings */ } ndims_y = ndims_a - ndims_x; nx = ny = 1; if (flags) { /* Leading dimensions of A must match dimensions of X, trailing dimensions of A are the dimensions of Y. */ for (i = 0 ; i < ndims_x ; ++i) { if (dimlist_x[i] != dimlist_a[i]) goto bad_dim_list; nx *= dimlist_x[i]; } for (i = ndims_x ; i < ndims_a ; ++i) { tmpDims = NewDimension(dimlist_a[i], 1L, tmpDims); ny *= dimlist_a[i]; } } else { /* Trailing dimensions of A must match dimensions of X, leading dimensions of A are the dimensions of Y. */ for (i = 0 ; i < ndims_x ; ++i) { if (dimlist_x[i] != dimlist_a[i + ndims_y]) goto bad_dim_list; nx *= dimlist_x[i]; } for (i = 0 ; i < ndims_y ; ++i) { tmpDims = NewDimension(dimlist_a[i], 1L, tmpDims); ny *= dimlist_a[i]; } } /* Allocate output array and perform matrix multiplication. */ y = ((Array *)PushDataBlock(NewArray(&doubleStruct, tmpDims)))->value.d; if (flags) { for (i = 0 ; i < ny ; ++i, a += nx) { s = zero; for (j = 0 ; j < nx ; ++j) { s += a[j]*x[j]; } y[i] = s; } } else { memset(y, 0, ny*sizeof(*y)); for (j = 0 ; j < nx ; ++j, a += ny) { if ((s = x[j]) != zero) { for (i = 0 ; i < ny ; ++i) { y[i] += a[i]*s; } } } } } } static size_t pack_dimlist(const Dimension *dims, size_t dimlist[], size_t maxdims) { const Dimension *ptr = dims; size_t i, n = 0; while (ptr) { ++n; ptr = ptr->next; } if (n > maxdims) YError("too many dimensions"); i = n; while (i-- >= 1) { dimlist[i] = dims->number; dims = dims->next; } return n; } /* * Local Variables: * mode: C * tab-width: 8 * c-basic-offset: 2 * indent-tabs-mode: nil * fill-column: 78 * coding: utf-8 * End: */ Yeti-6.4.0/core/yeti_sparse_test.i000066400000000000000000000027131253351442600171330ustar00rootroot00000000000000write, "*** TODO: check empty sparse matrix is possible"; write, "*** TODO: check scalar input/output spaces is possible"; write, "*** TODO: implement complex matrix"; write, "*** TODO: optimize (1) add coefficients with same indices"; write, "*** TODO: optimize (2) only keep non-zero coefficients (==> sparse_shrink becomes trivial)"; write, "*** TODO: optimize (3) sort coefficients according to transpose or not"; func sparse_test(row_dimlist, col_dimlist) { a = random(row_dimlist, col_dimlist) - 0.5; dims = dimsof(a); a *= (random(dims) < 0.9); /* make it sparse */ stride = numberof(array(char, row_dimlist)); nonzero = where(a); s = sparse_matrix(a(nonzero), row_dimlist, 1 + (nonzero - 1)%stride, col_dimlist, 1 + (nonzero - 1)/stride); /* make A a 2-D matrix. */ (ap = array(double, stride, numberof(a)/stride))(*) = a(*); x = random(col_dimlist); (y0 = array(double, row_dimlist))(*) = ap(,+)*(x(*))(+); y1 = s(x); y2 = mvmult(a, x); y3 = mvmult(s, x); write, format="y0 vs. y%d: %g\n", indgen(3), [max(abs(y1 - y0)), max(abs(y2 - y0)), max(abs(y3 - y0))]; u = random(row_dimlist); (v0 = array(double, col_dimlist))(*) = ap(+,)*(u(*))(+); v1 = s(u, 1); v2 = mvmult(a, u, 1); v3 = mvmult(s, u, 1); write, format="v0 vs. v%d: %g\n", indgen(3), [max(abs(v1 - v0)), max(abs(v2 - v0)), max(abs(v3 - v0))]; //error; } #if 1 sparse_test,[2,8,5],[3,11,2,3]; sparse_test,[3,8,5,11],[4,1,2,2,3]; #endif Yeti-6.4.0/core/yeti_symlink.c000066400000000000000000000202051253351442600162530ustar00rootroot00000000000000/* * yeti_symlink.c - * * Implement symbolic links in Yeti. * *----------------------------------------------------------------------------- * * Copyright (C) 1996-2010 Eric Thiébaut * * This software is governed by the CeCILL-C license under French law and * abiding by the rules of distribution of free software. You can use, modify * and/or redistribute the software under the terms of the CeCILL-C license as * circulated by CEA, CNRS and INRIA at the following URL * "http://www.cecill.info". * * As a counterpart to the access to the source code and rights to copy, * modify and redistribute granted by the license, users are provided only * with a limited warranty and the software's author, the holder of the * economic rights, and the successive licensors have only limited liability. * * In this respect, the user's attention is drawn to the risks associated with * loading, using, modifying and/or developing or reproducing the software by * the user in light of its specific status of free software, that may mean * that it is complicated to manipulate, and that also therefore means that it * is reserved for developers and experienced professionals having in-depth * computer knowledge. Users are therefore encouraged to load and test the * software's suitability as regards their requirements in conditions enabling * the security of their systems and/or data to be ensured and, more * generally, to use and operate it in the same conditions as regards * security. * * The fact that you are presently reading this means that you have had * knowledge of the CeCILL-C license and that you accept its terms. * *----------------------------------------------------------------------------- */ #include #include "plugin.h" #include "ydata.h" #include "yio.h" #include "defmem.h" #include "pstdlib.h" /* Debug level: 0 or undefined = none, * 1 = perform assertions, * 2 = verbose debug. */ #define YETI_SYMLINK_DEBUG 0 extern BuiltIn Y_symlink_to_name, Y_symlink_to_variable; extern BuiltIn Y_is_symlink; extern BuiltIn Y_name_of_symlink, Y_value_of_symlink; extern DataBlock *ForceToDB(Symbol *s); /* Implement symbolic links as a foreign Yorick data type. */ typedef struct _symlink symlink_t; struct _symlink { int references; /* reference counter */ Operations *ops; /* virtual function table */ long index; /* index into global symbol table */ }; static symlink_t *new_symlink(long index); static void free_symlink(void *list); /* ******* Use Unref(list) ******* */ static void dereference_symlink(Operand *op); extern PromoteOp PromXX; extern UnaryOp ToAnyX, NegateX, ComplementX, NotX, TrueX; extern BinaryOp AddX, SubtractX, MultiplyX, DivideX, ModuloX, PowerX; extern BinaryOp EqualX, NotEqualX, GreaterX, GreaterEQX; extern BinaryOp ShiftLX, ShiftRX, OrX, AndX, XorX; extern BinaryOp AssignX, MatMultX; extern UnaryOp EvalX, SetupX, PrintX; static UnaryOp eval_symlink; static UnaryOp print_symlink; static MemberOp get_symlink_member; Operations symlink_ops = { &free_symlink, T_OPAQUE, 0, T_STRING, "symlink", {&PromXX, &PromXX, &PromXX, &PromXX, &PromXX, &PromXX, &PromXX, &PromXX}, &ToAnyX, &ToAnyX, &ToAnyX, &ToAnyX, &ToAnyX, &ToAnyX, &ToAnyX, &NegateX, &ComplementX, &NotX, &TrueX, &AddX, &SubtractX, &MultiplyX, &DivideX, &ModuloX, &PowerX, &EqualX, &NotEqualX, &GreaterX, &GreaterEQX, &ShiftLX, &ShiftRX, &OrX, &AndX, &XorX, &AssignX, &eval_symlink, &SetupX, &get_symlink_member, &MatMultX, &print_symlink }; /* Set up a block allocator which grabs space for 64 symlink_t objects at a time. */ static MemryBlock symlink_block= {0, 0, sizeof(symlink_t), 64*sizeof(symlink_t)}; static symlink_t *new_symlink(long index) { symlink_t *lnk = NextUnit(&symlink_block); lnk->references = 0; lnk->ops = &symlink_ops; lnk->index = index; return lnk; } static void free_symlink(void *addr) /* ******* Use Unref(lnk) ******* */ { /* symlink_t *lnk = addr; */ FreeUnit(&symlink_block, addr); } static void print_symlink(Operand *op) { symlink_t *lnk = op->value; char *name = globalTable.names[lnk->index]; ForceNewline(); PrintFunc("symbolic link to \""); PrintFunc(name); PrintFunc("\""); ForceNewline(); } static void dereference_symlink(Operand *op) { Symbol *s, *owner; symlink_t *lnk; DataBlock *db; /* Replace owner by the globTab symbol which is indexed by the symbolic link object. */ owner = op->owner; #if defined(YETI_SYMLINK_DEBUG) && YETI_SYMLINK_DEBUG >= 1 if (! owner || (owner - sp) > 0 || (owner - spBottom) < 0) { /* owner should be on stack if this called from Eval or Print */ YError("symbolic link object evaluated in illegal situation"); } #endif lnk = op->value; s = &globTab[lnk->index]; if (owner->ops == &dataBlockSym) { /* always take this branch, and Unref usually does free_symlink */ owner->ops = &intScalar; Unref(owner->value.db); #if defined(YETI_SYMLINK_DEBUG) && YETI_SYMLINK_DEBUG >= 1 } else { YError("assertion failed for symbolic link object"); #endif } if (s->ops == &dataBlockSym) { db = s->value.db; if (db->ops == &symlink_ops) { YError("illegal symbolic link to symbolic link"); } owner->value.db = Ref(db); owner->ops = s->ops; } else { owner->value = s->value; owner->ops = s->ops; db = ForceToDB(owner); } op->ops = db->ops; op->value = db; } static void eval_symlink(Operand *op) { dereference_symlink(op); op->ops->Eval(op); } static void get_symlink_member(Operand *op, char *name) { dereference_symlink(op); op->ops->GetMember(op, name); } void Y_symlink_to_variable(int nargs) { if (nargs != 1) { YError("symlink_to_variable takes exactly one argument"); } if (sp->ops != &referenceSym) { YError("expecting simple variable reference"); } PushDataBlock(new_symlink(sp->index)); } void Y_symlink_to_name(int nargs) { Operand op; const char *name; int i, c; if (nargs != 1) { YError("symlink_to_name takes exactly one argument"); } if (! sp->ops) { YError("unexpected keyword argument"); } sp->ops->FormOperand(sp, &op); if (op.ops->typeID != T_STRING || op.type.dims) { YError("expecting scalar string argument"); } name = *(char **)op.value; i = -1; if (name) { while ((c = name[++i]) != '\0') { if ((c < 'a' || c > 'z') && (c < 'A' || c > 'Z') && (c != '_') && (i == 0 || c < '0' || c > '9')) { i = -1; break; } } } if (i <= 0) { YError("invalid symbol name"); } PushDataBlock(new_symlink(Globalize(name, i))); } void Y_is_symlink(int nargs) { Symbol *s; int result; if (nargs != 1) YError("is_symlink takes exactly one argument"); s = (sp->ops == &referenceSym ? &globTab[sp->index] : sp); result = (s->ops == &dataBlockSym && s->value.db->ops == &symlink_ops); PushIntValue(result); } void Y_name_of_symlink(int nargs) { Symbol *s; char *name; symlink_t *lnk; Array *array; if (nargs != 1) YError("name_of_symlink takes exactly one argument"); s = (sp->ops == &referenceSym ? &globTab[sp->index] : sp); if (s->ops != &dataBlockSym || s->value.db->ops != &symlink_ops) { YError("expecting a symbolic link object"); } lnk = (symlink_t *)s->value.db; name = globalTable.names[lnk->index]; array = (Array *)PushDataBlock(NewArray(&stringStruct, NULL)); array->value.q[0] = p_strcpy(name); } void Y_value_of_symlink(int nargs) { Symbol *s, *stack; DataBlock *db; symlink_t *lnk; if (nargs != 1) YError("value_of_symlink takes exactly one argument"); s = (sp->ops == &referenceSym ? &globTab[sp->index] : sp); if (s->ops != &dataBlockSym || s->value.db->ops != &symlink_ops) { YError("expecting a symbolic link object"); } lnk = (symlink_t *)s->value.db; s = &globTab[lnk->index]; if (s->ops == &dataBlockSym) { db = s->value.db; PushDataBlock(Ref(db)); } else { stack = sp + 1; stack->ops = s->ops; stack->value = s->value; sp = stack; } } /* * Local Variables: * mode: C * tab-width: 8 * c-basic-offset: 2 * indent-tabs-mode: nil * fill-column: 78 * coding: utf-8 * End: */ Yeti-6.4.0/core/yeti_template.c000066400000000000000000000040031253351442600163760ustar00rootroot00000000000000/* * yeti_template.c -- * * Implementation of XXXX in Yeti. * *----------------------------------------------------------------------------- * * Copyright (C) 2009 Eric Thiébaut * * This software is governed by the CeCILL-C license under French law and * abiding by the rules of distribution of free software. You can use, modify * and/ or redistribute the software under the terms of the CeCILL-C license * as circulated by CEA, CNRS and INRIA at the following URL * "http://www.cecill.info". * * As a counterpart to the access to the source code and rights to copy, * modify and redistribute granted by the license, users are provided only * with a limited warranty and the software's author, the holder of the * economic rights, and the successive licensors have only limited liability. * * In this respect, the user's attention is drawn to the risks associated with * loading, using, modifying and/or developing or reproducing the software by * the user in light of its specific status of free software, that may mean * that it is complicated to manipulate, and that also therefore means that it * is reserved for developers and experienced professionals having in-depth * computer knowledge. Users are therefore encouraged to load and test the * software's suitability as regards their requirements in conditions enabling * the security of their systems and/or data to be ensured and, more * generally, to use and operate it in the same conditions as regards * security. * * The fact that you are presently reading this means that you have had * knowledge of the CeCILL-C license and that you accept its terms. * *----------------------------------------------------------------------------- */ #include #include #include #include #include #include #include /* * Local Variables: * mode: C * tab-width: 8 * c-basic-offset: 2 * indent-tabs-mode: nil * fill-column: 78 * coding: utf-8 * End: */ Yeti-6.4.0/core/yeti_test.i000066400000000000000000000075131253351442600155610ustar00rootroot00000000000000if (! is_integer(0)) error; if (! is_integer(char(0))) error; if (! is_integer(short(0))) error; if (! is_integer(int(0))) error; if (! is_integer(long(0))) error; if ( is_integer(0.0)) error; if ( is_integer(float(0))) error; if ( is_integer(double(0))) error; if ( is_integer(complex(0))) error; if ( is_integer(string(0))) error; if (! is_integer((x=[0]))) error; if (! is_integer((x=[char(0)]))) error; if (! is_integer((x=[short(0)]))) error; if (! is_integer((x=[int(0)]))) error; if (! is_integer((x=[long(0)]))) error; if ( is_integer((x=[0.0]))) error; if ( is_integer((x=[float(0)]))) error; if ( is_integer((x=[double(0)]))) error; if ( is_integer((x=[complex(0)]))) error; if ( is_integer((x=[string(0)]))) error; if ( is_real(0)) error; if ( is_real(char(0))) error; if ( is_real(short(0))) error; if ( is_real(int(0))) error; if ( is_real(long(0))) error; if (! is_real(0.0)) error; if (! is_real(float(0))) error; if (! is_real(double(0))) error; if ( is_real(complex(0))) error; if ( is_real(string(0))) error; if ( is_real((x=[0]))) error; if ( is_real((x=[char(0)]))) error; if ( is_real((x=[short(0)]))) error; if ( is_real((x=[int(0)]))) error; if ( is_real((x=[long(0)]))) error; if (! is_real((x=[0.0]))) error; if (! is_real((x=[float(0)]))) error; if (! is_real((x=[double(0)]))) error; if ( is_real((x=[complex(0)]))) error; if ( is_real((x=[string(0)]))) error; if ( is_complex(0)) error; if ( is_complex(char(0))) error; if ( is_complex(short(0))) error; if ( is_complex(int(0))) error; if ( is_complex(long(0))) error; if ( is_complex(0.0)) error; if ( is_complex(float(0))) error; if ( is_complex(double(0))) error; if (! is_complex(complex(0))) error; if ( is_complex(string(0))) error; if ( is_complex((x=[0]))) error; if ( is_complex((x=[char(0)]))) error; if ( is_complex((x=[short(0)]))) error; if ( is_complex((x=[int(0)]))) error; if ( is_complex((x=[long(0)]))) error; if ( is_complex((x=[0.0]))) error; if ( is_complex((x=[float(0)]))) error; if ( is_complex((x=[double(0)]))) error; if (! is_complex((x=[complex(0)]))) error; if ( is_complex((x=[string(0)]))) error; if (! is_numerical(0)) error; if (! is_numerical(char(0))) error; if (! is_numerical(short(0))) error; if (! is_numerical(int(0))) error; if (! is_numerical(long(0))) error; if (! is_numerical(0.0)) error; if (! is_numerical(float(0))) error; if (! is_numerical(double(0))) error; if (! is_numerical(complex(0))) error; if ( is_numerical(string(0))) error; if (! is_numerical((x=[0]))) error; if (! is_numerical((x=[char(0)]))) error; if (! is_numerical((x=[short(0)]))) error; if (! is_numerical((x=[int(0)]))) error; if (! is_numerical((x=[long(0)]))) error; if (! is_numerical((x=[0.0]))) error; if (! is_numerical((x=[float(0)]))) error; if (! is_numerical((x=[double(0)]))) error; if (! is_numerical((x=[complex(0)]))) error; if ( is_numerical((x=[string(0)]))) error; /* check for L-value bug. */ t1 = 1; t2 = 1.0; t3 = "hello"; vp = [&t1, &t2, &t3]; if (! is_scalar(*vp(1))) error; if (! is_scalar(*vp(2))) error; if (! is_scalar(*vp(3))) error; if (! is_integer(*vp(1))) error; if ( is_integer(*vp(2))) error; if ( is_integer(*vp(3))) error; if ( is_real(*vp(1))) error; if (! is_real(*vp(2))) error; if ( is_real(*vp(3))) error; if (! is_numerical(*vp(1))) error; if (! is_numerical(*vp(2))) error; if ( is_numerical(*vp(3))) error; if ( is_complex(*vp(1))) error; if ( is_complex(*vp(2))) error; if ( is_complex(*vp(3))) error; if ( is_string(*vp(1))) error; if ( is_string(*vp(2))) error; if (! is_string(*vp(3))) error; func yeti_test_quick_quartile { for (n = 300; n <= 400; ++n) { x = random_n(n); q = quick_quartile(x); n1 = numberof(where(x < q(1))); n2 = numberof(where(x < q(2))); n3 = numberof(where(x < q(3))); f = 1e2/n; write, format="%4d: %.2f%% %.2f%% %.2f%%\n", n, f*n1, f*n2, f*n3; } } Yeti-6.4.0/core/yeti_utils.c000066400000000000000000000522241253351442600157330ustar00rootroot00000000000000/* * yeti_utils.c - * * Routines to make the coding of built-in Yorick's functions easier. * *----------------------------------------------------------------------------- * * Copyright (C) 1996-2010 Eric Thiébaut * * This software is governed by the CeCILL-C license under French law and * abiding by the rules of distribution of free software. You can use, modify * and/or redistribute the software under the terms of the CeCILL-C license as * circulated by CEA, CNRS and INRIA at the following URL * "http://www.cecill.info". * * As a counterpart to the access to the source code and rights to copy, * modify and redistribute granted by the license, users are provided only * with a limited warranty and the software's author, the holder of the * economic rights, and the successive licensors have only limited liability. * * In this respect, the user's attention is drawn to the risks associated with * loading, using, modifying and/or developing or reproducing the software by * the user in light of its specific status of free software, that may mean * that it is complicated to manipulate, and that also therefore means that it * is reserved for developers and experienced professionals having in-depth * computer knowledge. Users are therefore encouraged to load and test the * software's suitability as regards their requirements in conditions enabling * the security of their systems and/or data to be ensured and, more * generally, to use and operate it in the same conditions as regards * security. * * The fact that you are presently reading this means that you have had * knowledge of the CeCILL-C license and that you accept its terms. * *----------------------------------------------------------------------------- */ #include #include #include "config.h" #include "yeti.h" #include "yio.h" /*---------------------------------------------------------------------------*/ /* PRIVATE ROUTINES */ static void unexpected_keyword_argument(void); /*---------------------------------------------------------------------------*/ char *yeti_strcpy(const char *s) { if (s) { size_t len = strlen(s); char *t = p_stralloc(len); memcpy(t, s, len); t[len] = '\0'; return t; } return (char *)0; } char *yeti_strncpy(const char *s, size_t len) { if (s) { char *t = p_stralloc(len); memcpy(t, s, len); t[len] = '\0'; return t; } return (char *)0; } int yeti_is_nil(Symbol *s) { YETI_SOLVE_REF(s); return (s->ops==&dataBlockSym && s->value.db==&nilDB); } #undef FUNCTION #define FUNCTION(NAME, OPS) \ int yeti_is_##NAME(Symbol *s) \ { \ YETI_SOLVE_REF(s); \ return (s->ops==&dataBlockSym && s->value.db->ops==&OPS); \ } FUNCTION(void, voidOps) FUNCTION(range, rangeOps) FUNCTION(structdef, structDefOps) FUNCTION(stream, streamOps) void yeti_bad_argument(Symbol *s) { static char buf[80]; char *msg; if (s->ops==NULL) { msg = "unexpected keyword argument"; } else if (s->ops==&intScalar) { msg = "unexpected int scalar argument"; } else if (s->ops==&longScalar) { msg = "unexpected long scalar argument"; } else if (s->ops==&doubleScalar) { msg = "unexpected double scalar argument"; } else if (s->ops==&dataBlockSym) { sprintf(buf, "unexpected %s%s argument", s->value.db->ops->typeName, (s->value.db->ops->isArray ? " array" : "")); msg = buf; } else if (s->ops==&referenceSym) { /* I assume that YETI_SOLVE_REF or ReplaceRef was already applied to that symbol. */ msg = "***BUG*** too many reference levels"; } else if (s->ops==&returnSym) { msg = "***BUG*** unexpected return symbol"; } else { msg = "***BUG*** unknown symbol type"; } YError(msg); } static void unexpected_keyword_argument(void) { unexpected_keyword_argument(); } void yeti_unknown_keyword(void) { YError("unrecognized keyword in builtin function call"); } void yeti_debug_symbol(Symbol *s) { fprintf(stderr, "yeti_debug_symbol: s= (Symbol*)0x%lx\n", (unsigned long)s); if (s == NULL) return; if (s->ops==&doubleScalar) { fprintf(stderr, " s->ops= &doubleScalar\n"); fprintf(stderr, " s->index= (long)%ld\n", s->index); fprintf(stderr, " s->value.d= (double)%g\n", s->value.d); } else if (s->ops==&longScalar) { fprintf(stderr, " s->ops= &longScalar\n"); fprintf(stderr, " s->index= (long)%ld\n", s->index); fprintf(stderr, " s->value.l= (long)%ld\n", s->value.l); } else if (s->ops==&intScalar) { fprintf(stderr, " s->ops= &intScalar\n"); fprintf(stderr, " s->index= (long)%ld\n", s->index); fprintf(stderr, " s->value.i= (int)%d\n", s->value.i); } else if (s->ops==&dataBlockSym) { fprintf(stderr, " s->ops= &dataBlockSym\n"); fprintf(stderr, " s->index= (long)%ld\n", s->index); fprintf(stderr, " s->value.db= (DataBlock*)0x%lx\n", (unsigned long)s->value.db); } else if (s->ops==&referenceSym) { /* In `ydata.h': referenceSym is not a "complete" Symbol in the sense that the parser ensures it never appears in a binary operation ops - &referenceSym index - to globTab entry value.offset - stack offset (for Return only) */ fprintf(stderr, " s->ops= &referenceSym\n"); fprintf(stderr, " s->index= (long)%ld\n", s->index); fprintf(stderr, " s->value.offset= (long)%ld\n", s->value.offset); } else if (s->ops==&returnSym) { /* In `ydata.h': returnSym is not a "complete" Symbol in the sense that the parser ensures it never appears in a binary operation ops - &returnSym index - (unused) value.pc - VM program counter (for Return only) */ fprintf(stderr, " s->ops= &returnSym\n"); fprintf(stderr, " s->value.pc= (Instruction *)0x%lx\n", (unsigned long)s->value.pc); } else if (s->ops==NULL) { /* Keywords may also appear on the program stack-- these are marked by ops==0. */ fprintf(stderr, " s->ops= (OpTable*)NULL (KEYWORD)\n"); } else { fprintf(stderr, " s->ops= (OpTable*)0x%lx\n", (unsigned long)s->ops); } } int yeti_get_boolean(Symbol *s) { if (s->ops == &referenceSym) s = &globTab[s->index]; if (s->ops == &intScalar) return (s->value.i != 0); if (s->ops == &longScalar) return (s->value.l != 0L); if (s->ops == &doubleScalar) return (s->value.d != 0.0); if (s->ops == &dataBlockSym) { Operand op; s->ops->FormOperand(s, &op); if (! op.type.dims) { switch (op.ops->typeID) { case T_CHAR: return (*(char *)op.value != 0); case T_SHORT: return (*(short *)op.value != 0); case T_INT: return (*(int *)op.value != 0); case T_LONG: return (*(long *)op.value != 0L); case T_FLOAT: return (*(float *)op.value != 0.0F); case T_DOUBLE: return (*(double *)op.value != 0.0); case T_COMPLEX:return (((double *)op.value)[0] != 0.0 || ((double *)op.value)[1] != 0.0); case T_STRING: return (op.value != NULL); case T_VOID: return 0; } } } YError("bad non-boolean argument"); return 0; /* avoid compiler warning */ } long yeti_get_optional_integer(Symbol *s, long default_value) { if (s->ops==&longScalar) return s->value.l; if (s->ops==&intScalar) return s->value.i; YETI_SOLVE_REF(s); if (s->ops && s->ops!=&doubleScalar) { Operand op; s->ops->FormOperand(s, &op); if (! op.type.dims) { if (op.ops==&charOps) return *(char*)op.value; if (op.ops==&shortOps) return *(short*)op.value; if (op.ops==&intOps) return *(int*)op.value; if (op.ops==&longOps) return *(long*)op.value; if (op.ops==&voidOps) return default_value; } } yeti_bad_argument(s); } yeti_scalar_t *yeti_get_scalar(Symbol *s, yeti_scalar_t *scalar) { if (s->ops == &longScalar) { scalar->type = T_LONG; scalar->value.l = s->value.l; } else if (s->ops == &doubleScalar) { scalar->type = T_DOUBLE; scalar->value.d = s->value.d; } else if (s->ops == &intScalar) { scalar->type = T_INT; scalar->value.i = s->value.i; } else { Operand op; if (! s->ops) unexpected_keyword_argument(); s->ops->FormOperand(s, &op); if (op.type.dims) YError("expecting scalar argument"); scalar->type = op.ops->typeID; switch(scalar->type) { #define _(MEMBER, TYPE) scalar->value.MEMBER = *(TYPE*)(op.value); break case T_CHAR: _(c, char); case T_SHORT: _(s, short); case T_INT: _(i, int); case T_LONG: _(l, long); case T_FLOAT: _(f, float); case T_DOUBLE: _(d, double); case T_COMPLEX: scalar->value.z.re = ((double*)(op.value))[0]; scalar->value.z.im = ((double*)(op.value))[1]; break; case T_STRING: _(q, char *); case T_POINTER: _(p, void *); #undef _ default: scalar->value.p = op.value; } } return scalar; } DataBlock *yeti_get_datablock(Symbol *stack, const Operations *ops) { #if 0 if (stack->ops==&referenceSym) ReplaceRef(stack); if (stack->ops!=&dataBlockSym || (ops && stack->value.db->ops!=ops)) yeti_bad_argument(stack); return stack->value.db; #else DataBlock *db; Symbol *sym = YETI_DEREF_SYMBOL(stack); if (sym->ops != &dataBlockSym || (ops && sym->value.db->ops != ops)) yeti_bad_argument(sym); db = sym->value.db; if (sym != stack) { /* Replace reference onto the stack (equivalent to the statement ReplaceRef(stack); see ydata.c for actual code of this routine). */ stack->value.db = Ref(db); stack->ops = &dataBlockSym; /* change ops only AFTER value updated */ } return db; #endif } Array *yeti_get_array(Symbol *s, int nil_ok) { #if 0 ReplaceRef(s); if (s->ops == &dataBlockSym) { if (s->ops->isArray) return (Array *)s->value.db; if (nil_ok && s->value.db == &nilDB) return NULL; } #else Symbol *ref = YETI_DEREF_SYMBOL(s); if (ref->ops == &dataBlockSym) { DataBlock *db = ref->value.db; if (db->ops->isArray) { if (ref != s) { /* Replace reference onto the stack (equivalent to the statement ReplaceRef(stack); see ydata.c for actual code of this routine). */ s->value.db = Ref(db); s->ops = &dataBlockSym; /* change ops only AFTER value updated */ } return (Array *)db; } if (nil_ok && db == &nilDB) return NULL; } #endif YError("unexpected non-array argument"); return NULL; /* avoid compiler warning */ } /*---------------------------------------------------------------------------*/ /* STACK MANAGEMENT */ /* Pop topmost stack element in-place of S and drop all elements above S. */ void yeti_pop_and_reduce_to(Symbol *s) { if (s < sp) { DataBlock *old= s->ops==&dataBlockSym? s->value.db : 0; Symbol *stack= sp--; s->value= stack->value; s->ops= stack->ops; Unref(old); /* old unreferenced AFTER stack element is replaced */ while (sp > s) { stack= sp--; /* sp decremented BEFORE stack element is deleted */ if (stack->ops == &dataBlockSym) { DataBlock *db = stack->value.db; Unref(db); } } } else if (s > sp) { YError("attempt to pop outside the stack"); } } /*---------------------------------------------------------------------------*/ /* PUSH A SCALAR ON TOP OF THE STACK */ void yeti_push_char_value(int value) { YETI_PUSH_NEW_C(NULL)[0] = value; } void yeti_push_short_value(int value) { YETI_PUSH_NEW_S(NULL)[0] = value; } void yeti_push_float_value(double value) { YETI_PUSH_NEW_F(NULL)[0] = value; } void yeti_push_complex_value(double re, double im) { double *ptr = YETI_PUSH_NEW_Z(NULL); ptr[0] = re; ptr[1] = im; } void yeti_push_string_value(const char *value) { YETI_PUSH_NEW_Q(NULL)[0] = (value ? p_strcpy((char *)value) : NULL); } /*---------------------------------------------------------------------------*/ /* GET A SCALAR FROM THE STACK */ void **yeti_get_pointer(Symbol *s) { Operand op; if (!s->ops) unexpected_keyword_argument(); s->ops->FormOperand(s, &op); if (op.ops->typeID!=T_POINTER || op.type.dims) YError("expecting scalar pointer argument"); return *(void **)op.value; } /*---------------------------------------------------------------------------*/ /* ERROR MANAGEMENT */ #define MSG_MAX_LEN 127 void yeti_error(const char *str, ...) { unsigned int msglen=0; char msg[MSG_MAX_LEN+1]; va_list ap; va_start(ap, str); while (str) { unsigned int len = strlen(str); if (len + msglen > MSG_MAX_LEN) len = MSG_MAX_LEN - msglen; if (len > 0) memcpy(msg+msglen, str, len); msglen += len; str = va_arg(ap, char *); } va_end(ap); msg[msglen] = 0; YError(msg); } #undef MSG_MAX_LEN /*---------------------------------------------------------------------------*/ /* DIMENSIONS OF ARRAYS */ void yeti_reset_dimlist(void) { /* tmpDims is a global temporary for Dimension lists under construction -- you should always use it, then just leave your garbage there when you are done for the next guy to clean up -- your part of the perpetual cleanup comes first. */ Dimension *dims = tmpDims; tmpDims = NULL; if (dims) FreeDimension(dims); } Dimension *yeti_start_dimlist(long number) { Dimension *dims = tmpDims; tmpDims = NULL; if (dims) FreeDimension(dims); return tmpDims = NewDimension(number, 1L, NULL); } Dimension *yeti_grow_dimlist(long number) { return tmpDims = NewDimension(number, 1L, tmpDims); } static void not_same_dims(void); int yeti_same_dims(const Dimension *dims1, const Dimension *dims2) { while (dims1 != dims2) { if (! dims1 || ! dims2 || dims1->number != dims2->number) return 0; dims1 = dims1->next; dims2 = dims2->next; } return 1; } void yeti_assert_same_dims(const Dimension *dims1, const Dimension *dims2) { while (dims1 != dims2) { if (! dims1 || ! dims2 || dims1->number != dims2->number) not_same_dims(); dims1 = dims1->next; dims2 = dims2->next; } } long yeti_total_number(const Dimension *dims) { long number = 1; while (dims) { number *= dims->number; dims = dims->next; } return number; } long yeti_total_number_2(const Dimension *dims1, const Dimension *dims2) { long number = 1; while (dims1) { if (! dims2 || dims1->number != dims2->number) not_same_dims(); number *= dims1->number; dims1 = dims1->next; dims2 = dims2->next; } if (dims2) not_same_dims(); return number; } static void not_same_dims(void) { YError("input arrays must have same dimensions"); } Dimension *yeti_make_dims(const long number[], const long origin[], size_t ndims) { size_t i; Dimension *dims = tmpDims; tmpDims = NULL; if (dims) FreeDimension(dims); if (origin) { for (i=0 ; inext) ++ndims; if (ndims > maxdims) YError("too many dimensions"); i = ndims; if (origin) { while (i-- >= 1) { number[i] = dims->number; origin[i] = dims->origin; dims = dims->next; } } else { while (i-- >= 1) { number[i] = dims->number; dims = dims->next; } } return ndims; } /*---------------------------------------------------------------------------*/ /* OPAQUE OBJECTS */ extern PromoteOp PromXX; extern UnaryOp ToAnyX, NegateX, ComplementX, NotX, TrueX; extern BinaryOp AddX, SubtractX, MultiplyX, DivideX, ModuloX, PowerX; extern BinaryOp EqualX, NotEqualX, GreaterX, GreaterEQX; extern BinaryOp ShiftLX, ShiftRX, OrX, AndX, XorX; extern BinaryOp AssignX, MatMultX; extern UnaryOp EvalX, SetupX, PrintX; extern MemberOp GetMemberX; static void free_opaque(void *addr); static UnaryOp print_opaque; static void bad_opaque_class(const yeti_opaque_class_t *class); static Operations opaque_ops= { &free_opaque, T_OPAQUE, 0, T_STRING, "opaque_object", {&PromXX, &PromXX, &PromXX, &PromXX, &PromXX, &PromXX, &PromXX, &PromXX}, &ToAnyX, &ToAnyX, &ToAnyX, &ToAnyX, &ToAnyX, &ToAnyX, &ToAnyX, &NegateX, &ComplementX, &NotX, &TrueX, &AddX, &SubtractX, &MultiplyX, &DivideX, &ModuloX, &PowerX, &EqualX, &NotEqualX, &GreaterX, &GreaterEQX, &ShiftLX, &ShiftRX, &OrX, &AndX, &XorX, &AssignX, &EvalX, &SetupX, &GetMemberX, &MatMultX, &print_opaque }; /* See `defmem.h' for explanations about meaning and usage of MemoryBlock structure. */ static MemryBlock obj_block = {NULL, NULL, sizeof(yeti_opaque_t), 64*sizeof(yeti_opaque_t)}; yeti_opaque_t *yeti_new_opaque(void *data, const yeti_opaque_class_t *class) { yeti_opaque_t *obj = NextUnit(&obj_block); obj->references = 0; obj->ops = &opaque_ops; obj->class = class; obj->data = data; return obj; } /* free_opaque is automatically called by Yorick to delete an opaque object instance that is no longer referenced */ static void free_opaque(void *addr) { yeti_opaque_t *obj = addr; if (obj->class && obj->class->delete) obj->class->delete(obj->data); FreeUnit(&obj_block, addr); } /* print_opaque is used by Yorick's info command */ static void print_opaque(Operand *op) { yeti_opaque_t *obj = (yeti_opaque_t *)op->value; if (obj->class->print) { obj->class->print(obj->data); } else { char line[80]; ForceNewline(); PrintFunc("Opaque object instance of class: "); PrintFunc(obj->class && obj->class->name ? obj->class->name : ""); sprintf(line, ", references=%d", obj->references); PrintFunc(line); ForceNewline(); } } yeti_opaque_t *yeti_get_opaque(Symbol *stack, const yeti_opaque_class_t *class, int fatal) { yeti_opaque_t *obj; Symbol *s = (stack->ops == &referenceSym) ? &globTab[stack->index] : stack; if (s->ops != &dataBlockSym || s->value.db->ops != &opaque_ops) { if (fatal) YError("not an opaque object"); return NULL; } obj = (yeti_opaque_t *)s->value.db; if (class && obj->class != class) { if (fatal) bad_opaque_class(class); return NULL; } if (s != stack) { /* Replace reference onto the stack (equivalent to the statement ReplaceRef(stack); see ydata.c for actual code of this routine). */ stack->value.db = Ref(s->value.db); stack->ops = &dataBlockSym; /* change ops only AFTER value updated */ } return obj; } static void bad_opaque_class(const yeti_opaque_class_t *class) { #undef MAXLEN #define MAXLEN 40 char msg[(60+MAXLEN)]; int len; strcpy(msg, "bad object (not instance of "); if (class == NULL || class->name == NULL) { strcat(msg, ""); } else if ((len = strlen(class->name)) > MAXLEN) { strncat(msg, class->name, (len-MAXLEN)); strcat(msg, "[...]"); } else { strcat(msg, class->name); } strcat(msg, " class)"); YError(msg); } /*---------------------------------------------------------------------------*/ /* WORKSPACE */ typedef struct ws ws_t; struct ws { /* Common part of all Yorick's DataBlocks: */ int references; /* reference counter */ Operations *ops; /* virtual function table */ }; static void FreeWS(void *addr); static UnaryOp PrintWS; extern PromoteOp PromXX; extern UnaryOp ToAnyX, NegateX, ComplementX, NotX, TrueX; extern BinaryOp AddX, SubtractX, MultiplyX, DivideX, ModuloX, PowerX; extern BinaryOp EqualX, NotEqualX, GreaterX, GreaterEQX; extern BinaryOp ShiftLX, ShiftRX, OrX, AndX, XorX; extern BinaryOp AssignX, MatMultX; extern UnaryOp EvalX, SetupX, PrintX; extern MemberOp GetMemberX; Operations wsOps= { &FreeWS, T_OPAQUE, 0, /* promoteID= */ T_STRING /* means illegal */, "workspace", {&PromXX, &PromXX, &PromXX, &PromXX, &PromXX, &PromXX, &PromXX, &PromXX}, &ToAnyX, &ToAnyX, &ToAnyX, &ToAnyX, &ToAnyX, &ToAnyX, &ToAnyX, &NegateX, &ComplementX, &NotX, &TrueX, &AddX, &SubtractX, &MultiplyX, &DivideX, &ModuloX, &PowerX, &EqualX, &NotEqualX, &GreaterX, &GreaterEQX, &ShiftLX, &ShiftRX, &OrX, &AndX, &XorX, &AssignX, &EvalX, &SetupX, &GetMemberX, &MatMultX, &PrintWS }; /* FreeRE is automatically called by Yorick to delete a regex data block that is no more referenced. */ static void FreeWS(void *addr) { p_free(addr); } /* PrintRE is used by Yorick's info command. */ static void PrintWS(Operand *op) { ForceNewline(); PrintFunc("object of type: workspace"); ForceNewline(); } void *yeti_push_workspace(size_t nbytes) { /* EXTRA is the number of bytes needed to store DataBlock header rounded up to the size of a double (to avoid alignment errors). */ const size_t extra = YETI_ROUND_UP(sizeof(ws_t), sizeof(double)); ws_t *ws = p_malloc(nbytes + extra); ws->references = 0; ws->ops = &wsOps; return (void *)((char *)PushDataBlock(ws) + extra); } /* * Local Variables: * mode: C * tab-width: 8 * c-basic-offset: 2 * indent-tabs-mode: nil * fill-column: 78 * coding: utf-8 * End: */ Yeti-6.4.0/core/yeti_yhdf.i000066400000000000000000000634561253351442600155440ustar00rootroot00000000000000/* * yeti_yhdf.i -- * * Implement support for Yeti Hierarchical Data File. * *----------------------------------------------------------------------------- * * Copyright (C) 2005-2009 Eric Thiébaut * * This software is governed by the CeCILL-C license under French law and * abiding by the rules of distribution of free software. You can use, modify * and/ or redistribute the software under the terms of the CeCILL-C license * as circulated by CEA, CNRS and INRIA at the following URL * "http://www.cecill.info". * * As a counterpart to the access to the source code and rights to copy, * modify and redistribute granted by the license, users are provided only * with a limited warranty and the software's author, the holder of the * economic rights, and the successive licensors have only limited liability. * * In this respect, the user's attention is drawn to the risks associated with * loading, using, modifying and/or developing or reproducing the software by * the user in light of its specific status of free software, that may mean * that it is complicated to manipulate, and that also therefore means that it * is reserved for developers and experienced professionals having in-depth * computer knowledge. Users are therefore encouraged to load and test the * software's suitability as regards their requirements in conditions enabling * the security of their systems and/or data to be ensured and, more * generally, to use and operate it in the same conditions as regards * security. * * The fact that you are presently reading this means that you have had * knowledge of the CeCILL-C license and that you accept its terms. * *----------------------------------------------------------------------------- */ local yhd_format; /* DOCUMENT DESCRIPTION OF YHD FILE FORMAT A YHD file consists in a header (256 bytes) followed by any number of records (one record for each member of the saved hash_table object). The file header is a 256 character array filled with a text string padded with nulls: YetiHD-VERSION (DATE)\n ENCODING\n COMMENT\n where VERSION is the version number (an integer); DATE is the creation date of the file (see Yorick built-in timestamp); ENCODING is a human-readable array of 32 integers separated by commas and enclosed in square brackets (ie.: [n1,n2,....,n32]); COMMENT is an optional comment string. All binary data of a YHD file is written following the ENCODING format of the file. The format of a record for an array member is as follow: | Number Type Name Description | ----------------------------------------------------------------------- | 1 long TYPE data type of record | 1 long IDSIZE number of bytes in member identifier (may be 0) | 1 long RANK number of dimensions, 0 if scalar/non-array | RANK long DIMLIST dimension list (absent if RANK is 0) | IDSIZE char IDENT identifier of record (see below) | *special* DATA binary data of the record TYPE is: <0 - string array 5 - float array 11 - range | 0 - void 6 - double array 12 - evaluator as function | 1 - char array 7 - complex array 13 - evaluator as symbol name | 2 - short array 8 - pointer array | 3 - int array 9 - function | 4 - long array 10 - symbolic link For string array, TYPE is strictly less than zero and is minus the number of characters needed to represent all elements of the array in packed form (more on this below). Void objects are also used to represent NULL pointer data -- this means that a NULL pointer element takes 3*sizeof(long) bytes to be stored in the file, which may be an issue if you use large pointer array sparsely filled with data. IDENT is the full name of the member: it is a IDSIZE char array where null characters are used to separate submember names and with a final null. If IDSIZE=0, no IDENT is written. Arrays of strings are written in packed form, each string being prefixed with '\1' (nil string) or '\2' (non-nil string) and suffixed with '\0', hence: | '\1' '\0' (2 bytes) for a nil string | '\2' ... '\0' (2+LEN bytes) for a string of length LEN this is needed to distinguish nil-string from "". The data part of an arrays of pointers consists in anonymous records (records with IDSIZE=0 and no IDENT) for each element of the array. Non-array members such as functions and symbolic links have the following record: | Number Type Name Description | ----------------------------------------------------------------------- | 1 long TYPE data type of record (9 or 10) | 1 long IDSIZE number of bytes in member identifier (may be 0) | 1 long LENGTH number of bytes for the name of the function | IDSIZE char IDENT identifier of record (see above) | LENGTH char NAME name of function or symbolic link Note that the final '\0' of the name is not saved in the file. Range members have the following record: | Number Type Name Description | ----------------------------------------------------------------------- | 1 long TYPE data type of record (11) | 1 long IDSIZE number of bytes in member identifier (may be 0) | 1 long FLAGS flags of range | IDSIZE char IDENT identifier of record (see above) | 3 long RANGE MIN,MAX,STEP Evaluators have the following record: | Number Type Name Description | ----------------------------------------------------------------------- | 1 long TYPE data type of record (12 or 13) | 1 long IDSIZE number of bytes in member identifier (may be 0) | 1 long LENGTH number of bytes for the name of the evaluator | IDSIZE char IDENT identifier of record (see above) | LENGTH char NAME name of function or symbolic link Note that the final '\0' of the name is not saved in the file. The last component of identifier (the member name) must be empty for an evaluator. */ func yhd_save(filename, obj, keylist, .., comment=, encoding=, overwrite=) /* DOCUMENT yhd_save, filename, obj; -or- yhd_save, filename, obj, keylist, ...; Save contents of hash object OBJ into the Yeti Hierarchical Data (YHD) file FILENAME. If additional arguments are provided, they are the names of members to save. The default is to save every member. Keyword COMMENT can be used to store a (short) string comment in the file header. The comment is truncated if it is too long (more than about 130 bytes) to fit into the header. COMMENT must not contain any DEL (octal 177) character. Keyword ENCODING can be used to specify a particular binary data format for the file; ENCODING can be the name of some known data format (see get_encoding) or an array of 32 integers (see set_primitives). The default is to use the native data format. If keyword OVERWRITE is true and file FILENAME already exists, the new file will (silently) overwrite the old one; othwerwise, file FILENAME must not already exist (default behaviour). SEE ALSO yhd_restore, yhd_info, yhd_check, yhd_format, get_encoding, set_primitives, h_new. */ { /* Declaration of variables that will be inherited by subroutines called by this routine (not really necessary, but just to make this clear). */ local file, address, elsize; /* Set some 'constants'. */ YHD_HEADER_SIZE = 256; YHD_VERSION = 2; // version number /* Get list of members to save. */ if (! is_hash(obj)) error, "expecting hash_table object"; while (more_args()) grow, keylist, next_arg(); if (! is_void(keylist) && structof(keylist) != string) error, "invalid member list"; /* Check COMMENT string. */ if (is_void(comment)) comment = ""; else if (strmatch(comment, "\177")) error, "invalid character in COMMENT"; /* Create binary file with correct primitives and avoid log-file. */ if (! overwrite && open(filename,"r",1)) error, "file \"" + filename + "\" already exists"; logname = filename + "L"; remove_log = (open(logname, "r", 1) ? 0n : 1n); file = open(filename, "wb"); if (remove_log) remove, logname; if (is_void(encoding)) encoding = "native"; if (structof(encoding) == string) encoding = get_encoding(encoding); install_encoding, file, encoding; save, file, complex; /* install the definition of a complex */ /* Write header. */ ident = swrite(format="YetiHD-%d (%s)\n[%d", YHD_VERSION, timestamp(), encoding(1)); for (i = 2; i <= 32; ++i) ident += swrite(format=",%d", encoding(i)); maxlen = YHD_HEADER_SIZE - 3 - strlen(ident); if (strlen(comment) > maxlen) { __yhd_warn, "too long comment get truncated"; comment = strpart(comment, 1:maxlen); } ident += swrite(format="]\n%s\n", comment); len = strlen(ident); (hdr = array(char, YHD_HEADER_SIZE))(1:len) = (*pointer(ident))(1:len); address = 0; _write, file, address, hdr; address += YHD_HEADER_SIZE; /* Build table of size of primary data types in file encoding. */ elsize = [encoding(1), encoding(4), encoding(7), encoding(10), encoding(13), encoding(16), 2*encoding(16)]; /* Save members. */ __yhd_save_hash, obj, [], keylist; } func __yhd_save_member(data, ident) { /**/extern file, address, elsize, long_size; idsize = numberof(ident); if (is_array(data)) { if ((s = structof(data)) == char) { type = 1; } else if (s == short) { type = 2; } else if (s == int) { type = 3; } else if (s == long) { type = 4; } else if (s == float) { type = 5; } else if (s == double) { type = 6; } else if (s == complex) { type = 7; } else if (s == pointer) { type = 8; } else if (s == string) { /* For string arrays, TYPE is minus the number of character written to the file. */ type = -(2*numberof(data) + sum(strlen(data))); } else { /* Unsupported array type. */ type = 0; } if (type) { /* Write array record header. */ dimlist = dimsof(data); number = numberof(data); header = array(long, 2 + numberof(dimlist)); header(1) = type; header(2) = idsize; header(3:) = dimlist; _write, file, address, header; address += long_size*numberof(header); if (idsize) { _write, file, address, ident; address += idsize; } /* Write data array. */ if (type < 0) { /* string array */ nil = ['\1', '\0']; for (i = 1; i <= number; ++i) { c = *pointer(data(i)); n = numberof(c); if (n) { _write, file, address++, '\2'; _write, file, address, c; address += n; } else { _write, file, address, nil; address += 2; } } } else if (type == 8) { /* pointer data array */ for (i = 1; i <= number; ++i) { __yhd_save_member, *data(i); /* no ident */ } } else { /* Numerical data array. */ _write, file, address, data; address += elsize(type)*number; } return; /* end for supported array types */ } } /* Non-array types. */ if (is_hash(data)) { __yhd_save_hash, data, ident; } else if (is_func(data)) { __yhd_save_named, ident, 9, nameof(data); } else if (is_symlink(data)) { __yhd_save_named, ident, 10, name_of_symlink(data); } else if (is_range(data)) { temp = parse_range(data); header = array(long, 3); header(1) = 11; /* type */ header(2) = idsize; header(3) = temp(1); /* flags */ _write, file, address, header; address += long_size*numberof(header); if (idsize) { _write, file, address, ident; address += idsize; } _write, file, address, temp(2:4); address += long_size*3; } else { /* Void or unsupported data type. */ if (! is_void(data)) { if (idsize) { /* hash table member */ __yhd_warn, "unsupported data type: ", typeof(data), " for member \"", __yhd_member_name(ident), "\" - replaced by void data"; } else { /* element of a pointer array: save NULL pointer */ __yhd_warn, "unsupported data type: ", typeof(data), " - replaced by NULL pointer element"; } } _write, file, address, long([0, idsize, 0]); /* type, idsize, rank */ address += 3*long_size; if (idsize) { _write, file, address, ident; address += idsize; } } } func __yhd_save_hash(hash, prefix, keylist) { /* Declaration and initialization of shared variables. */ /**/extern address, file, elsize; long_size = elsize(4); /* sizeof(long) in file encoding */ /* Deal with the evaluator if any. */ evl = h_evaluator(hash); if (evl) { ident = grow(prefix, '\0'); /* the evaluator has an empty member name */ if (is_func(evl)) { /* The evaluator is given by a function object (FIXME: check this). */ __yhd_save_named, ident, 12, nameof(evl); } else { /* The evaluator is given by its name. */ __yhd_save_named, ident, 13, evl; } } /* Recursively save all/some chidren of current member. */ if (is_void(keylist)) { keylist = h_keys(hash); if (is_array(keylist)) keylist = keylist(sort(keylist)); } n = numberof(keylist); for (i = 1; i <= n; ++i) { key = keylist(i); ident = grow(prefix, strchar(key)); __yhd_save_member, h_get(hash, key), ident; } } func __yhd_save_named(ident, type, name) { /**/extern address, file, long_size; idsize = numberof(ident); length = strlen(name); header = array(long, 3); header(1) = type; header(2) = idsize; header(3) = length; _write, file, address, header; address += long_size*numberof(header); if (idsize) { _write, file, address, ident; address += idsize; } if (length) { _write, file, address, strchar(name)(1:length); address += length; } } func yhd_check(file, &version, &date, &encoding, &comment) /* DOCUMENT yhd_check(file); -or- yhd_check(file, version, date, encoding, comment); Return 1 (true) if FILE is a valid YHD file; otherwise return 0 (false). The nature of FILE is guessed by reading its header. Input argument FILE can be a file name (scalar string) of a binary file stream opened for reading; all other arguments are pure outputs and may be omitted (if result is false, the contents of these outputs is undetermined). SEE ALSO yhd_info, yhd_save, yhd_restore, yhd_format. */ { /* Read header array. */ YHD_HEADER_SIZE = 256; if (structof(file) == string) file = open(file, "rb"); hdr = array(char, YHD_HEADER_SIZE); if (_read(file, 0, hdr) != YHD_HEADER_SIZE) return 0n; hdrstr = string(&hdr); /* Parse header string (hopefully the DEL character (octal 177) is not part of the comment string). */ comment = date = token = string(); encoding = array(long, 32); version = 0; if (sread(hdrstr, format="YetiHD-%d (%[^)])\n[%[^]]]\n%[^\177]", version, date, token, comment) >= 3) { /* Parse encoding array. */ value = 0; for (i = 1; i < 32; ++i) { if (sread(token, format="%d ,%[^]]", value, token) != 2) break; encoding(i) = value; } if (i==32 && sread(token, format="%d %[^]]", value, token) == 1) { /* Finalize encoding array and comment string. */ encoding(i) = value; if (strpart(comment, 0:0) == "\n") comment = strpart(comment, :-1); return 1n; } } return 0n; } func yhd_info(file) /* DOCUMENT yhd_info, file; Print out some information about YHD file. FILE can be a file name (scalar string) of a binary file stream opened for reading. SEE ALSO yhd_check, yhd_restore, yhd_save, yhd_format. */ { local version, date, encoding, comment; if (! yhd_check(file, version, date, encoding, comment)) { error, (structof(file) == string ? "\""+file+"\" is not a valid YHD file" : "invalid YHD file"); } write, format="%s:\n version = %d\n date = %s\n comment = %s\n", (structof(file) == string ? file : "YHD file"), version, date, comment; } func yhd_restore(filename, keylist, ..) /* DOCUMENT yhd_restore(filename); -or- yhd_restore(filename, keylist, ...); Restore and return hash table object saved in YHD file FILENAME. If additional arguments are provided, they are the names of members to restore. The default is to restore every member. SEE ALSO yhd_check, yhd_info, yhd_save, yhd_format. */ { /* Declaration of variables that will be inherited by subroutines called by this routine (not really necessary, but just to make this clear). */ local file, address, elsize, type, dimlist, ident; /* List of members to restore. */ while (more_args()) grow, keylist, next_arg(); if (! is_void(keylist) && structof(keylist) != string) error, "invalid member list"; /* Open file, read header and set primitives. */ local version, date, encoding, comment; file = open(filename, "rb"); if (! yhd_check(file, version, date, encoding, comment)) error, "\""+filename+"\" is not a valid YHD file"; if (version != 1 && version != 2) { error, swrite(format="unsupported YHD file version: %d", version); } install_encoding, file, encoding; save, file, complex; /* install the definition of a complex */ address = 256; /* header has already been read */ /* Build table of size of primary data types in file encoding. */ elsize = [encoding(1), encoding(4), encoding(7), encoding(10), encoding(13), encoding(16), 2*encoding(16)]; /* Read contents of file. */ obj = h_new(); for (;;) { /* Read header of next member. */ if (! __yhd_read_member_header(ident, type, dimlist)) return obj; /* normal end-of-file */ /* Skip member if first "path" component not in KEYLIST. */ path = strchar(ident); key = path(1); if (! is_void(keylist) && noneof(keylist == key)) { __yhd_restore_data, type, dimlist, 1; continue; } /* Convert identifier to OWNER-KEY pair. */ owner = obj; n = numberof(path); for (j = 2; j <= n; ++j) { if (! key) key = ""; /* deal with NULL strings returned by strchar() */ if (h_has(owner, key)) { owner = h_get(owner, key); if (! is_hash(owner)) { error, "inconsistent hierarchical member \""+ __yhd_member_name(path)+"\""; } } else { h_set, owner, key, (tmp = h_new()); owner = tmp; } key = path(j); } if (type == 12 || type ==13) { /* Evaluators must have empty member name. */ if (key) error, ("expecting empty member name for an evaluator, got \""+ key + "\" in member \"" + __yhd_member_name(path) + "\""); if (h_evaluator(owner)) __yhd_warn, "duplicate evaluator \"", __yhd_member_name(path), "\" in YHD file (last setting overrides previous ones)"; h_evaluator, owner, __yhd_restore_data(type, dimlist); } else { /* For normal members, deal with NULL strings returned by strchar() and check that the member does not yet exist before restoring it. */ if (! key) key = ""; if (h_has(owner, key)) __yhd_warn, "duplicate member \"", __yhd_member_name(path), "\" in YHD file (last value overrides previous ones)"; h_set, owner, key, __yhd_restore_data(type, dimlist); } } return obj } func __yhd_read_member_header(&ident, &type, &dimlist, pt) { /**/extern file, address, elsize; /* Figure out if there is anything else to read. */ tmp = 'a'; if (! _read(file, address, tmp)) return 0; /* normal end-of-file */ /* Read record header */ long_size = elsize(4); /* sizeof(long) in file encoding */ char_size = elsize(1); /* sizeof(char) in file encoding */ tmp = __yhd_read(long_size, long, 3); type = tmp(1); idsize = tmp(2); rank = tmp(3); if (type != 0) { if (rank < 0) error, "bad RANK in record header of YHD file"; if (type <= 8) { dimlist = array(rank, rank+1); if (rank) dimlist(2:) = __yhd_read(long_size, long, rank); } else { dimlist = rank; /* special */ } } else { /* Void object. */ if (rank != 0) error, "bad RANK in record header of YHD file"; dimlist = []; } if (pt) { /* Element of pointer array: IDSIZE must be 0. */ if (idsize) error, "unexpected named member in YHD file"; } else { if (idsize < 1) error, "unexpected anonymous member in YHD file"; ident = __yhd_read(char_size, char, idsize); } return 1; } func __yhd_restore_data(type, dimlist, skip) { /**/extern file, address, elsize; if (type >= 1 && type <= 7) { /* Numerical array data. */ size = elsize(type); if (skip) { n = numberof(dimlist); for (i = 2; i <= n; ++i) size *= dimlist(i); address += size; return; } return __yhd_read(size, (type == 1 ? char : (type == 2 ? short : (type == 3 ? int : (type == 4 ? long : (type == 5 ? float : (type == 6 ? double : complex)))))), dimlist); } if (type < 0) { /* Restore string array. For string arrays, TYPE is minus the number of character written to the file. */ char_size = elsize(1); /* sizeof(char) in file encoding */ count = -type; if (skip) { address += char_size*count; return; } c = __yhd_read(char_size, char, count); j = where(! c); data = array(string, dimlist); number = numberof(data); if (numberof(j) != number) { __yhd_warn, "bad string array in file (elements left empty)"; } else { k1 = 1; for (i = 1; i <= number; ++i) { k2 = j(i); if (c(k1) == '\2') data(i) = string(&c(k1+1:k2)); k1 = k2 + 1; } } return data; } if (type == 8) { /* Pointer array. */ local etype, edims; /* type and dimension list for every element */ ptr = array(pointer, dimlist); number = numberof(ptr); for (i = 1; i <= number; ++i) { if (! __yhd_read_member_header(/*not needed*/, etype, edims, 1)) { __yhd_warn, "short YHD file (unterminated pointer array)"; break; } tmp = __yhd_restore_data(etype, edims, skip); if (! skip && etype) ptr(i) = &tmp; } if (skip) return; return ptr; } if (type == 9 || type == 10) { /* Restore function. */ length = dimlist; /* special */ if (skip) { address += length; return; } if (length <= 0) { __yhd_warn, "unexpected lenght for function or symbolic link name"; return; } cbuf = array(char, length); if (_read(file, address, cbuf) != length) { error, "short YHD file (unterminated function or symbolic link)"; } name = strchar(cbuf); if (strlen(name) != length) { error, "invalid name in function or symbolic link member"; } address += length; if (type == 9) { /* function member */ if (symbol_exists(name)) { value = symbol_def(name); if (is_func(value)) { return value; } } __yhd_warn, ("function \"" + name + "\" not defined (will be replaced by a symbolic link"); } /* symbolic link */ return symlink_to_name(name); } if (type == 11) { /* Restore range. */ long_size = elsize(4); /* sizeof(long) in file encoding */ if (skip) { address += 3*long_size; return; } temp = array(long, 4); temp(1) = dimlist; /* special: "flags" in this context */ temp(2:4) = __yhd_read(long_size, long, 3); return make_range(temp); } if (type == 12 || type == 13) { /* Restore evaluator. */ length = dimlist; /* special */ if (skip) { address += length; return; } if (length <= 0) { __yhd_warn, "unexpected lenght for evaluator name"; return; } cbuf = array(char, length); if (_read(file, address, cbuf) != length) { error, "short YHD file (unterminated evaluator name)"; } name = strchar(cbuf); if (strlen(name) != length) { error, "invalid evaluator name"; } address += length; if (type == 12) { /* evaluator specified as a function */ if (symbol_exists(name)) { value = symbol_def(name); if (is_func(value)) { return value; } } __yhd_warn, ("evaluator function \"" + name + "\" not defined (will be replaced by a symbolic link"); } return name; } if (type) { error, "invalid TYPE in record header of YHD file"; } } func __yhd_read(element_size, data_type, dimlist) { /**/extern file, address; data = array(data_type, dimlist); nbytes = element_size*numberof(data); if (data_type != char) _read, file, address, data; else if (_read(file, address, data) != nbytes) error, "short file"; address += nbytes; return data; } func __yhd_warn(s, ..) { while (more_args()) s += next_arg(); write, format="WARNING - %s\n", s; } func __yhd_member_name(ident, separator) { if (structof(ident) == char) { ident = strchar(ident); } name = ident(1); if ((n = numberof(i)) >= 2) { if (is_void(separator)) separator = "."; for (j = 2; j <= n; ++j) { name += separator + ident(j); } } return name; } func __yhd_insert_dim(dimlist, first_dim) { newlist = 0; grow, newlist, dimlist; newlist(1) = numberof(newlist)-1; newlist(2) = first_dim; return newlist; } /* * Local Variables: * mode: Yorick * tab-width: 8 * c-basic-offset: 2 * indent-tabs-mode: nil * fill-column: 78 * coding: utf-8 * End: */ Yeti-6.4.0/core/yeti_yhdf_test.i000066400000000000000000000141351253351442600165710ustar00rootroot00000000000000#if 0 plug_dir,"."; include,"./yeti.i"; include,"./yeti_yhdf.i"; #endif func yhd_test_random_0(type, ..) { dimlist = [0]; while (more_args()) { arg = next_arg(); if (is_void(arg)) continue; if ((s=structof(arg))!=long) { if (s==char || s==short || s==int) arg=long(arg); else error, "bad data type in dimension list"; } if (min(arg) <= 0) error, "invalid null or negative dimension"; rank = dimsof(arg)(1); if (rank == 0) grow, dimlist, arg; else if (rank == 1 && numberof(arg) == arg(1)+1) grow, dimlist, arg(2:); else error, "invalid dimension list"; } dimlist(1) = numberof(dimlist)-1; x = random(dimlist); if (type==double) return x; if (type==float) return float(x); if (type==complex) return x + 1i*random(dimlist); scale = 2.0^(8*sizeof(type)); min_value = (type==char) ? 0.0 : -scale/2.0; max_value = scale + min_value - 1.0; x = floor(scale*x) + min_value; if (min(x) < min_value) error, "bad min value"; if (max(x) > max_value) error, "bad max value"; return type(x); } func yhd_test_random(type) { MAX_LEN = 10; /* maximum string lenght */ MAX_RANK = 4; /* maximum number of dimensions */ MAX_DIM = 7; /* maximum dimension lenght */ rank = long(MAX_RANK*random() + 0.5); dimlist = array(rank, rank+1); if (rank) dimlist(2:) = long((MAX_DIM-1)*random(rank) + 1.5); if (type==string) { s = array(string, dimlist); len = long(MAX_LEN*random(numberof(s)) + 0.5); k = where(len); n = numberof(k); for (i=1 ; i<=n ; ++i) { j = k(i); s(j) = string(&char(256*random(len(j)))); } return s; } x = random(dimlist); if (type==double) return x; if (type==float) return float(x); if (type==complex) return x + 1i*random(dimlist); scale = 2.0^(8*sizeof(type)); min_value = (type==char) ? 0.0 : -scale/2.0; max_value = scale + min_value - 1.0; x = floor(scale*x) + min_value; if (min(x) < min_value) error, "bad min value"; if (max(x) > max_value) error, "bad max value"; return type(x); } func yhd_test(tmpfilename) { write, "Building some complex hash table..."; nil = string(0); null = pointer(0); ptr = [[&yhd_test_random(float), &yhd_test_random(long), &yhd_test_random(short), null], [&yhd_test_random(char), &yhd_test_random(long), &yhd_test_random(short), &yhd_test_random(string)]]; a = h_new(f=sin, /* builtin function */ g=symlink_to_name("foo"), x=random(12,7,14), /* array of doubles */ y="hello", /* a scalar string */ /* a hash table */ z=h_new(msg=[["this", "is" , "a" , "string" ], ["array", "with", "some" , "elements"], ["and", "one" , "null" , nil ], ["and", "some", "more" , "(3)" ], ["nulls", nil , nil , nil ]], int_array=yhd_test_random(int), #if 1 complex_array=yhd_test_random(complex), #endif char_array=yhd_test_random(char), short_array=yhd_test_random(short), long_array=yhd_test_random(long), string_array=yhd_test_random(string), double_array=yhd_test_random(double), float_array=yhd_test_random(float)), "this is a more complex member name",0xDEADC0DE, #if 1 p=[[&yhd_test_random(float), &yhd_test_random(long)], [pointer(0), &ptr]], #endif "break",933, /* member name is a reserved keyword */ "",-711, /* empty member name */ "void",[] /* empty member */ ); write, "Try with \"native\" encoding..."; yhd_save, tmpfilename, a, overwrite=1; b = yhd_restore(tmpfilename); yhd_test_compare, a, b; names = ["alpha", "cray", "dec", "i86", "ibmpc", "mac", "macl", "sgi64", "sun", "sun3", "vax", "vaxg", "xdr"]; for (i=1 ; i<=numberof(names) ; ++i) { write, "Try with \""+names(i)+"\" encoding..."; yhd_save, tmpfilename, a, overwrite=1, encoding=names(i); b = yhd_restore(tmpfilename); yhd_test_compare,a, b; } remove, tmpfilename; } func yhd_test_compare(a, b, name) { if (typeof(a) != typeof(b)) { write, format=" *** not same data type for member \"%s\" (%s vs. %s)\n", name, typeof(a), typeof(b); return; } if (is_array(a)) { if ((da = dimsof(a))(1) != (db = dimsof(b))(1) || anyof(da!=db)) { write, format=" *** not same dimension list for member \"%s\"\n", name; return; } s = structof(a); if (s==char || s==short || s==int || s==long || s==float || s==double || s==complex || s==string) { if (anyof(a != b)) { write, format=" *** value(s) differ for member \"%s\"\n", name; return; } return; } if (s==pointer) { n = numberof(a); f = name+"(%d)"; for (i=1 ; i<=n ; ++i) { yhd_test_compare, *a(i), *b(i), swrite(format=f, i); } return; } } else if (is_hash(a)) { prefix = (is_void(name) ? "" : name+"::"); na = numberof((ka = h_keys(a))); nb = numberof((kb = h_keys(b))); if (nb) match = array(int, nb); for (i=1 ; i<=na ; ++i) { key = ka(i); name = prefix+key; if (h_has(b, key)) { yhd_test_compare, h_get(a, key), h_get(b, key), name; if (nb) match(where(key == kb)) = 1; } else { write, format=" *** missing member \"%s\" in B\n", name; } } if (nb) { missing = kb(where(!match)); f = " *** missing member \""+prefix+"%s\" in A\n" for (i=1 ; i<=numberof(missing) ; ++i) write, format=f, missing(i); } } else if (is_func(a)) { if (a != b) { write, format=" *** value(s) differ for member \"%s\"\n", name; return; } } else if (is_symlink(a)) { if (name_of_symlink(a) != name_of_symlink(b)) { write, format=" *** value(s) differ for member \"%s\"\n", name; return; } } else if (! is_void(a)) { write, format=" *** unexpected type for member \"%s\" (%s)\n", name, typeof(a); } } Yeti-6.4.0/doc/000077500000000000000000000000001253351442600132055ustar00rootroot00000000000000Yeti-6.4.0/doc/Makefile000066400000000000000000000020141253351442600146420ustar00rootroot00000000000000.SUFFIXES: .c .i .o .h .so .dll .dyn .a .exe .txt .text .ps .pdf .html #------------------------------------------------------------------------------ # these values filled in by: yorick -batch make.i Y_MAKEDIR= Y_EXE= Y_EXE_PKGS= Y_EXE_HOME= Y_EXE_SITE= DOCDIR=$(Y_EXE_SITE)/doc # command and rules to produce documentation PREFIX = $(HOME)/sw BINDIR = $(PREFIX)/bin ASCIIDOC = $(BINDIR)/asciidoc A2X = $(BINDIR)/a2x DOCTYPE = article default: doc clean: rm -f *~ core doc: make_doc.i $(Y_EXE) -batch ./make_doc.i install: doc for file in *.doc; do \ cp -pf $$file $(DOCDIR)/$$file; \ chmod 644 $(DOCDIR)/$$file; \ done cp -pf ../README.md $(DOCDIR)/README.yeti; \ chmod 644 $(DOCDIR)/README.yeti html: yeti.html README.md ../README: yeti.text rm -f $@ cp -a $< $@ chmod 444 $@ %.html: %.txt Makefile $(ASCIIDOC) -b xhtml11 -a icons -o $@ $< %.text: %.txt Makefile $(A2X) -d $(DOCTYPE) -f text --icons --icons-dir=./pics $< #------------------------------------------------------------------------------ Yeti-6.4.0/doc/make_doc.i000066400000000000000000000006161253351442600151240ustar00rootroot00000000000000/* * Yorick script to generate alphabetized listings of all help * command documentation. */ #include "mkdoc.i" /* assume current working directory is top level of distribution */ mkdoc, ["../core/yeti.i", "../core/yeti_yhdf.i"], "yeti.doc"; mkdoc, "../fftw/yeti_fftw.i", "yeti_fftw.doc"; mkdoc, "../regex/yeti_regex.i", "yeti_regex.doc"; mkdoc, "../tiff/yeti_tiff.i", "yeti_tiff.doc"; quit; Yeti-6.4.0/fftw/000077500000000000000000000000001253351442600134065ustar00rootroot00000000000000Yeti-6.4.0/fftw/Makefile000066400000000000000000000040471253351442600150530ustar00rootroot00000000000000# these values filled in by yorick -batch make.i Y_MAKEDIR= Y_EXE= Y_EXE_PKGS= Y_EXE_HOME= Y_EXE_SITE= # ----------------------------------------------------- optimization flags # options for make command line, e.g.- make COPT=-g TGT=exe COPT=$(COPT_DEFAULT) TGT=$(DEFAULT_TGT) # ------------------------------------------------ macros for this package PKG_NAME=yeti_fftw PKG_I=yeti_fftw.i OBJS=yeti_fftw.o # change to give the executable a name other than yorick PKG_EXENAME=yorick # PKG_DEPLIBS=-Lsomedir -lsomelib for dependencies of this package PKG_DEPLIBS = -L/home/soft/local/lib -lrfftw -lfftw # set compiler (or rarely loader) flags specific to this package PKG_CFLAGS = -I/home/soft/local/include -DHAVE_FFTW=1 PKG_LDFLAGS= # list of additional package names you want in PKG_EXENAME # (typically Y_EXE_PKGS should be first here) EXTRA_PKGS=$(Y_EXE_PKGS) # list of additional files for clean PKG_CLEAN= # autoload file for this package, if any PKG_I_START= # non-pkg.i include files for this package, if any PKG_I_EXTRA= # -------------------------------- standard targets and rules (in Makepkg) # set macros Makepkg uses in target and dependency names # DLL_TARGETS, LIB_TARGETS, EXE_TARGETS # are any additional targets (defined below) prerequisite to # the plugin library, archive library, and executable, respectively PKG_I_DEPS=$(PKG_I) Y_DISTMAKE=distmake include $(Y_MAKEDIR)/Make.cfg include $(Y_MAKEDIR)/Makepkg include $(Y_MAKEDIR)/Make$(TGT) # override macros Makepkg sets for rules and other macros # Y_HOME and Y_SITE in Make.cfg may not be correct (e.g.- relocatable) Y_HOME=$(Y_EXE_HOME) Y_SITE=$(Y_EXE_SITE) # reduce chance of yorick-1.5 corrupting this Makefile MAKE_TEMPLATE = protect-against-1.5 # ------------------------------------- targets and rules for this package # simple example: #myfunc.o: myapi.h # more complex example (also consider using PKG_CFLAGS above): #myfunc.o: myapi.h myfunc.c # $(CC) $(CPPFLAGS) $(CFLAGS) -DMY_SWITCH -o $@ -c myfunc.c # -------------------------------------------------------- end of Makefile Yeti-6.4.0/fftw/yeti_fftw.c000066400000000000000000000427121253351442600155600ustar00rootroot00000000000000/* * yeti_fftw.c - * * Implement support for FFTW, the "fastest Fourier transform in the West", * in Yorick. * *----------------------------------------------------------------------------- * * Copyright (C) 2003-2006, 2015: Éric Thiébaut * * * This software is governed by the CeCILL-C license under French law and * abiding by the rules of distribution of free software. You can use, modify * and/or redistribute the software under the terms of the CeCILL-C license as * circulated by CEA, CNRS and INRIA at the following URL * "http://www.cecill.info". * * As a counterpart to the access to the source code and rights to copy, modify * and redistribute granted by the license, users are provided only with a * limited warranty and the software's author, the holder of the economic * rights, and the successive licensors have only limited liability. * * In this respect, the user's attention is drawn to the risks associated with * loading, using, modifying and/or developing or reproducing the software by * the user in light of its specific status of free software, that may mean * that it is complicated to manipulate, and that also therefore means that it * is reserved for developers and experienced professionals having in-depth * computer knowledge. Users are therefore encouraged to load and test the * software's suitability as regards their requirements in conditions enabling * the security of their systems and/or data to be ensured and, more generally, * to use and operate it in the same conditions as regards security. * * The fact that you are presently reading this means that you have had * knowledge of the CeCILL-C license and that you accept its terms. * *----------------------------------------------------------------------------- */ #include #include #include #include #include "play.h" #include "pstdlib.h" #include "ydata.h" #include "yio.h" /* BUILT-IN ROUTINES */ extern BuiltIn Y_fftw, Y_fftw_plan; #ifndef HAVE_FFTW # define HAVE_FFTW 0 #endif #if HAVE_FFTW /*---------------------------------------------------------------------------*/ #if defined(FFTW_PREFIX) && (FFTW_PREFIX != 0) # include "dfftw.h" # include "drfftw.h" #else # include "fftw.h" # include "rfftw.h" #endif #ifdef FFTW_ENABLE_FLOAT # error only double precision real supported #endif /* Offset (in bytes) of MEMBER in structure TYPE. */ #define OFFSET_OF(type, member) ((char *)&((type *)0)->member - (char *)0) /* PRIVATE ROUTINES */ static int get_boolean(Symbol *s); static void FreePlan(void *addr); static void PrintPlan(Operand *op); /*---------------------------------------------------------------------------*/ /* FFTW plan opaque object */ typedef struct y_fftw_plan_struct y_fftw_plan_t; struct y_fftw_plan_struct { int references; /* reference counter */ Operations *ops; /* virtual function table */ int flags; /* FFTW flags */ int dir; /* transform direction FFTW_FORWARD or FFTW_BACKWARD */ int real; /* real transform? */ void *plan; /* FFTW plan for transform */ void *buf; /* NULL or an array of N complex numbers, that FFTW will use as temporary space to perform the in-place computation. */ int rank; /* dimensionality of the arrays to be transformed */ int dims[1]; /* Dimension list for FFTW which uses row-major format to store arrays: the first dimension's index varies most slowly and the last dimension's index varies most quickly (i.e. opposite of Yorick interpreter but same order as the chained dimension list). _MUST_ BE LAST MEMBER (actual size is max(rank,1)). */ }; extern PromoteOp PromXX; extern UnaryOp ToAnyX, NegateX, ComplementX, NotX, TrueX; extern BinaryOp AddX, SubtractX, MultiplyX, DivideX, ModuloX, PowerX; extern BinaryOp EqualX, NotEqualX, GreaterX, GreaterEQX; extern BinaryOp ShiftLX, ShiftRX, OrX, AndX, XorX; extern BinaryOp AssignX, MatMultX; extern UnaryOp EvalX, SetupX, PrintX; extern MemberOp GetMemberX; Operations fftwPlanOps = { &FreePlan, T_OPAQUE, 0, T_STRING, "fftw_plan", {&PromXX, &PromXX, &PromXX, &PromXX, &PromXX, &PromXX, &PromXX, &PromXX}, &ToAnyX, &ToAnyX, &ToAnyX, &ToAnyX, &ToAnyX, &ToAnyX, &ToAnyX, &NegateX, &ComplementX, &NotX, &TrueX, &AddX, &SubtractX, &MultiplyX, &DivideX, &ModuloX, &PowerX, &EqualX, &NotEqualX, &GreaterX, &GreaterEQX, &ShiftLX, &ShiftRX, &OrX, &AndX, &XorX, &AssignX, &EvalX, &SetupX, &GetMemberX, &MatMultX, &PrintPlan }; static void FreePlan(void *addr) { if (addr) { y_fftw_plan_t *p = (y_fftw_plan_t *)addr; if (p->rank >= 1 && p->plan) { if (p->real) rfftwnd_destroy_plan(p->plan); else if (p->rank == 1) fftw_destroy_plan(p->plan); else fftwnd_destroy_plan(p->plan); } if (p->buf) p_free(p->buf); p_free(addr); } } static void PrintPlan(Operand *op) { y_fftw_plan_t *p = (y_fftw_plan_t *)op->value; const char *dir; char line[80]; int i, flags = p->flags; if (p->real) { if (p->dir == FFTW_REAL_TO_COMPLEX) dir = "REAL_TO_COMPLEX"; else dir = "COMPLEX_TO_REAL"; } else { if (p->dir == FFTW_FORWARD) dir = "FORWARD"; else dir = "BACKWARD"; } ForceNewline(); PrintFunc("Object of type: "); PrintFunc(p->ops->typeName); sprintf(line, " (dims=["); PrintFunc(line); for (i=p->rank-1 ; i>=0 ; --i) { sprintf(line, (i >= 1 ? "%d," : "%d"), p->dims[i]); PrintFunc(line); } #ifdef DEBUG sprintf(line, "], references=%d, dir=%s, flags=", p->references, dir); #else sprintf(line, "], dir=%s, flags=", dir); #endif PrintFunc(line); PrintFunc((flags & FFTW_IN_PLACE) ? "IN_PLACE" : "OUT_OF_PLACE"); PrintFunc((flags & FFTW_MEASURE) ? "|MEASURE)" : "|ESTIMATE)"); ForceNewline(); } /* IMPLEMENTATION NOTES: * * FFTW can compute many different kinds of FFT's (1-D or n-D, in-place or * out-of-place, complex or real), here are the (hopefully motivated) * choices I have made: * * . FFT of a scalar (rank=0) is a no-operation: * y[l] = sum_{k=0}^{N-1} x[k]*exp(±i*2*PI*k*l/N) * ==> y[0] = x[0] when N=1 * * . 1-D complex FFT's are computed in-place with a scratch buffer; * * . n-D complex FFT's are computed in-place (no scratch buffer); * * . All real FFT's use the rfftwnd calls (i.e. I do not want to translate * the way 1-D real FFT's are organized in FFTW); * * . Real to complex FFT's are computed in-place (no scratch buffer); * * . Complex to real FFT's are computed out-of-place, since complex to * real FFTW destroy the contents of input arrays, I take care of making * a temporary copy as needed. * */ void Y_fftw_plan(int argc) { y_fftw_plan_t *p; int i, len=0, rank=0, dir=0, number=0; int measure=0, real=0; Symbol *stack; long *dimlist=NULL; Operand op; unsigned int size; /* Parse arguments from first to last one. */ for (stack=sp-argc+1 ; stack<=sp ; ++stack) { if (stack->ops) { /* non-keyword argument */ if (! dimlist) { stack->ops->FormOperand(stack, &op); switch (op.ops->typeID) { case T_CHAR: case T_SHORT: case T_INT: op.ops->ToLong(&op); case T_LONG: /* Check dimension list and compute rank. */ dimlist = op.value; if (! op.type.dims) { /* dimension list specified as a scalar */ if ((number = dimlist[0]) <= 0) goto bad_dimlist; rank = (number > 1 ? 1 : 0); } else if (! op.type.dims->next) { /* dimension list specified as a vector */ rank = dimlist[0]; len = op.type.number; if (len != rank + 1) goto bad_dimlist; for (i=1 ; iindex]; ++stack; if (! strcmp(keyword, "real")) { real = get_boolean(stack); } else if (! strcmp(keyword, "measure")) { measure = get_boolean(stack); } else { YError("unknown keyword in fftw_plan"); } } } if (! dir) YError("too few arguments in fftw_plan"); /* Allocate new plan (with at least one slot for dims member) and push it on top of the stack. */ size = OFFSET_OF(y_fftw_plan_t, dims) + (rank > 1 ? rank : 1)*sizeof(*p->dims); p = p_malloc(size); memset(p, 0, size); p->ops = &fftwPlanOps; PushDataBlock(p); /* _AFTER_ having set OPS */ p->dir = dir; p->flags = ((measure ? FFTW_MEASURE : FFTW_ESTIMATE) | ((real && dir == FFTW_COMPLEX_TO_REAL) ? FFTW_OUT_OF_PLACE : FFTW_IN_PLACE)); p->real = real; p->rank = rank; /* Store list of dimensions for this plan in row-major order. */ if (len == 0) { p->dims[0] = number; } else { i = 0; while (--len >= 1) p->dims[i++] = dimlist[len]; } /* Create plan (noting to do for rank=0, because FFT of a scalar is a no-op). */ if (rank >= 1) { if (real) { /* Always use n-D plan for real FFT (because storage of complex values in 1-D RFFTW is not very useful in a language like Yorick). */ p->plan = rfftwnd_create_plan(rank, p->dims, p->dir, p->flags); } else if (rank == 1) { /* Allocate plan and scratch buffer for 1-D complex transforms. */ p->plan = fftw_create_plan(p->dims[0], p->dir, p->flags); p->buf = p_malloc(2*sizeof(double)*p->dims[0]); } else { /* Allocate plan for N-D complex transforms. */ p->plan = fftwnd_create_plan(rank, p->dims, p->dir, p->flags); } if (! p->plan) YError("failed to create FFTW plan"); } #ifdef DEBUG for (i=0 ; irank ; ++i) printf("dims[%d]=%d\n",i,p->dims[i]); #endif } void Y_fftw(int argc) { Array *array; Dimension *dimlist; Symbol *s; Operand op; y_fftw_plan_t *p; int i, j, rank, *dims; int nr, nc, nhc, n, len; int real_to_complex, complex_to_real; void *inp = NULL; /* address of array to transform */ if (argc != 2) YError("fftw takes exactly 2 arguments"); /* Get FFTW plan. */ s = sp; if (! s->ops) YError("unexpected keyword"); if (s->ops == &referenceSym) s = &globTab[s->index]; if (s->ops != &dataBlockSym || s->value.db->ops != &fftwPlanOps) YError("expecting a FFTW plan"); p = (y_fftw_plan_t *)s->value.db; /* Get input array. */ s = sp - 1; if (! s->ops) YError("unexpected keyword"); s->ops->FormOperand(s, &op); /* Check input data type. */ real_to_complex = (p->real && p->dir == FFTW_REAL_TO_COMPLEX); complex_to_real = (p->real && p->dir == FFTW_COMPLEX_TO_REAL); switch (op.ops->typeID) { case T_CHAR: case T_SHORT: case T_INT: case T_LONG: case T_FLOAT: case T_DOUBLE: break; case T_COMPLEX: if (! real_to_complex) break; default: YError("bad data type for input array"); } /* Check dimension list. */ dims = p->dims; rank = p->rank; dimlist = op.type.dims; i = 0; while (dimlist) { if (i >= rank || dimlist->number != ((complex_to_real && i == rank-1) ? dims[i]/2+1 : dims[i])) { i = -1; /* trigger error below */ break; } ++i; dimlist = dimlist->next; } if (i != rank) YError("dimension list of input array incompatible with FFTW plan"); if (rank == 0) { /********************************* *** *** *** transform of a scalar *** *** *** *********************************/ if (complex_to_real) op.ops->ToDouble(&op); else op.ops->ToComplex(&op); } else if (real_to_complex) { /******************************* *** *** *** real -> complex FFT *** *** *** *******************************/ const double zero = 0.0; double *cptr; /* Push a complex array with proper dimensions to store the result on top of the stack. */ nr = dims[rank - 1]; /* number of real's along 1st dim */ nhc = nr/2 + 1; /* number of complex's along 1st dim */ if (tmpDims) { Dimension *oldDims = tmpDims; tmpDims = 0; FreeDimension(oldDims); } tmpDims = NewDimension(nhc, 1, tmpDims); for (n=1, i=rank-2 ; i>=0 ; --i) { n *= (len = dims[i]); tmpDims = NewDimension(len, 1, tmpDims); } if (n*nr != op.type.number) YError("BUG in dimension list code"); array = NewArray(&complexStruct, tmpDims); PushDataBlock(array); inp = array->value.d; /* Copy input into output array, taking care of padding (zero padding is in case rank=0). */ cptr = inp; /* complex array as double */ nc = 2*nhc; /* complex array as double */ #define COPY(type_t) \ { \ type_t *rptr = op.value; \ for (j=0 ; jtypeID) { case T_CHAR: COPY(char); case T_SHORT: COPY(short); case T_INT: COPY(int); case T_LONG: COPY(long); case T_FLOAT: COPY(float); case T_DOUBLE: COPY(double); } #undef COPY /* Replace input array by output one. */ PopTo(op.owner); /* Apply the real->complex transform. */ rfftwnd_one_real_to_complex(p->plan, inp, NULL); } else { /****************************************** *** *** *** complex -> real or complex FFT *** *** *** ******************************************/ /* Make sure input array is complex and a temporary one (either because it will be the result or because FFTW_COMPLEX_TO_REAL destroys its input). */ switch (op.ops->typeID) { case T_CHAR: case T_SHORT: case T_INT: case T_LONG: case T_FLOAT: case T_DOUBLE: /* Convert input in a new complex array. */ op.ops->ToComplex(&op); inp = op.value; break; case T_COMPLEX: /* If input array has references (is not temporary), make a new copy. */ if (op.references) { array = NewArray(&complexStruct, op.type.dims); PushDataBlock(array); inp = array->value.d; memcpy(inp, op.value, 2*op.type.number*sizeof(double)); PopTo(op.owner); } else { inp = op.value; } break; } if (p->real) { /* Push output real array and apply out-of-place complex to real transform, then pop output array in place of (temporary) input one. */ if (tmpDims) { Dimension *oldDims = tmpDims; tmpDims = 0; FreeDimension(oldDims); } for (i=rank-1 ; i>=0 ; --i) tmpDims = NewDimension(dims[i], 1, tmpDims); array = NewArray(&doubleStruct, tmpDims); PushDataBlock(array); rfftwnd_one_complex_to_real(p->plan, inp, array->value.d); PopTo(sp - 2); } else if (rank == 1) { /* Apply in-place 1-D complex transform with scratch buffer. */ fftw_one(p->plan, inp, p->buf); } else { /* Apply in-place n-D complex transform (no scratch buffer). */ fftwnd_one(p->plan, inp, NULL); } } /* Drop FFTW plan and left result on top of the stack. */ Drop(1); } /*---------------------------------------------------------------------------*/ static int get_boolean(Symbol *s) { if (s->ops == &referenceSym) s = &globTab[s->index]; if (s->ops == &intScalar) return (s->value.i != 0); if (s->ops == &longScalar) return (s->value.l != 0L); if (s->ops == &doubleScalar) return (s->value.d != 0.0); if (s->ops == &dataBlockSym) { Operand op; s->ops->FormOperand(s, &op); if (! op.type.dims) { switch (op.ops->typeID) { case T_CHAR: return (*(char *)op.value != 0); case T_SHORT: return (*(short *)op.value != 0); case T_INT: return (*(int *)op.value != 0); case T_LONG: return (*(long *)op.value != 0L); case T_FLOAT: return (*(float *)op.value != 0.0F); case T_DOUBLE: return (*(double *)op.value != 0.0); case T_COMPLEX:return (((double *)op.value)[0] != 0.0 || ((double *)op.value)[1] != 0.0); case T_STRING: return (op.value != NULL); case T_VOID: return 0; } } } YError("bad non-boolean argument"); return 0; /* avoid compiler warning */ } /*---------------------------------------------------------------------------*/ #else /* not HAVE_FFTW */ static char *no_fftw_support = "no FFTW support in this version of Yorick"; void Y_fftw(int nargs) { YError(no_fftw_support); } void Y_fftw_plan(int nargs) { YError(no_fftw_support); } #endif /* not HAVE_FFTW */ /* * Local Variables: * mode: C * tab-width: 8 * c-basic-offset: 2 * indent-tabs-mode: nil * fill-column: 79 * coding: utf-8 * End: */ Yeti-6.4.0/fftw/yeti_fftw.i000066400000000000000000000241041253351442600155610ustar00rootroot00000000000000/* * yeti_fftw.i -- * * Implement support for FFTW, the "fastest Fourier transform in the West", * in Yorick. * *----------------------------------------------------------------------------- * * Copyright (C) 1996-2013 Éric Thiébaut * * This software is governed by the CeCILL-C license under French law and * abiding by the rules of distribution of free software. You can use, modify * and/or redistribute the software under the terms of the CeCILL-C license as * circulated by CEA, CNRS and INRIA at the following URL * "http://www.cecill.info". * * As a counterpart to the access to the source code and rights to copy, modify * and redistribute granted by the license, users are provided only with a * limited warranty and the software's author, the holder of the economic * rights, and the successive licensors have only limited liability. * * In this respect, the user's attention is drawn to the risks associated with * loading, using, modifying and/or developing or reproducing the software by * the user in light of its specific status of free software, that may mean * that it is complicated to manipulate, and that also therefore means that it * is reserved for developers and experienced professionals having in-depth * computer knowledge. Users are therefore encouraged to load and test the * software's suitability as regards their requirements in conditions enabling * the security of their systems and/or data to be ensured and, more generally, * to use and operate it in the same conditions as regards security. * * The fact that you are presently reading this means that you have had * knowledge of the CeCILL-C license and that you accept its terms. * *----------------------------------------------------------------------------- */ /* load dynamic code */ if (is_func(plug_in)) plug_in, "yeti_fftw"; extern fftw_plan; /* DOCUMENT fftw_plan(dimlist, dir) Creates a plan for fast Fourier transforming by fftw (which see) of arrays of dimension list DIMLIST. DIR=+/-1 and has the same meaning as in fft (which see): DIR meaning 1-D discrete Fourier transform --- --------------------- ------------------------------ +1 "forward" transform sum_k x(k)*exp(-i*2*PI*k*l/N) -1 "backward" transform sum_k x(k)*exp(+i*2*PI*k*l/N) where i=sqrt(-1). Except when keyword REAL is true (se below), the name "forward" or "backward" is only a question of convention, only the sign of the complex exponent really matters. If keyword REAL is true, then the result is a plan for a real to complex transform if DIR=+1 ("forward") or for a complex to real transform if DIR=-1 ("backward"). The result of a real to complex transform contains only half of the complex DFT amplitudes (since the negative-frequency amplitudes for real data are the complex conjugate of the positive-frequency amplitudes). If the real array is N1xN2x...xNn then the result is a complex (N1/2 + 1)xN2x...xNn array. Reciprocally, the complex to real transform takes a (N1/2 + 1)xN2x...xNn complex input array to compute a N1xN2x...xNn real array. When the plan is created with REAL set to true, DIMS must be the dimension list of the real array and DIR must be +1 ("forward") for a real to complex transform and -1 ("backward") for a complex to real transform. If keyword MEASURE is true, then FFTW attempts to find the optimal plan by actually computing several FFT's and measuring their execution time. The default is to not run any FFT and provide a "reasonable" plan (for a RISC processor with many registers). Computing an efficient plan for FFTW (with keyword MEASURE set to true) may be very expensive. FFTW is therefore mostly advantageous when several FFT's of arrays with same dimension lists are to be computed; in this case the user should save the plan in a variable, e.g.: plan_for_x_and_dir = fftw_plan(dimsof(x), dir, measure=1); for (...) { ...; fft_of_x = fftw(x, plan_for_x_and_dir); ...; } instead of: for (...) { ...; fft_of_x = fftw(x, fftw_plan(dimsof(x), dir, measure=1)); ...; } However note that it is relatively inexpensive to compute a plan for the default strategy; therefore: for (...) { ...; fft_of_x = fftw(x, fftw_plan(dimsof(x), dir)); ...; } is not too inefficient (this is what does cfftw). SEE ALSO fftw, fft, fft_setup. */ extern fftw; func cfftw(x, dir) { return fftw(x, fftw_plan(dimsof(x), dir)); } /* DOCUMENT fftw(x, plan) -or- cfftw(x, dir) Computes the fast Fourier transform of X with the "fastest Fourier transform in the West". The fftw function makes use of PLAN that has been created by fftw_plan (which see) and computes a real to complex or complex to real transform, if PLAN was created with keyword REAL set to true; otherwise, fftw computes a complex transform. The cfftw function always computes a complex transform and creates a temporary plan for the dimensions of X and FFT direction DIR (+/-1). If you want to compute seral FFT's of identical dimensions and directions, or if you want to compute real to complex (or complex to real) transforms, or if you want to use the "measure" strategy in defining FFTW plan, you should rather use fftw_plan and fftw. SEE ALSO fftw_plan. */ func fftw_indgen(dim) { return (u= indgen(0:dim-1)) - dim*(u > dim/2); } /* DOCUMENT fftw_indgen(len) Return FFT frequencies along a dimension of length LEN. SEE ALSO: indgen, span, fftw_dist. */ func fftw_dist(.., nyquist=, square=, half=) /* DOCUMENT fftw_dist(dimlist); Returns Euclidian lenght of spatial frequencies in frequel units for a FFT of dimensions DIMLIST. If keyword NYQUIST is true, the frequel coordinates get rescaled so that the Nyquist frequency is equal to NYQUIST along every dimension. This is obtained by using coordinates: (2.0*NYQUIST/DIM(i))*fft_indgen(DIM(i)) along i-th dimension of lenght DIM(i). If keyword SQUARE is true, the square of the Euclidian norm is returned instead. If keyword HALF is true, the length is only computed for half of the spatial frequencies so that it can be used with a real to complex FFTW forward transform (the first dimension becomes DIM(1)/2 + 1). SEE ALSO: fftw, fftw_indgen. */ { /* Build dimension list. */ local arg, dims; while (more_args()) { eq_nocopy, arg, next_arg(); if ((s = structof(arg)) == long || s == int || s == short || s == char) { /* got an integer array */ if (! (n = dimsof(arg)(1))) { /* got a scalar */ grow, dims, arg; } else if (n == 1 && (n = numberof(arg) - 1) == arg(1)) { /* got a vector which is a valid dimension list */ if (n) grow, dims, arg(2:); } else { error, "bad dimension list"; } } else if (! is_void(arg)) { error, "unexpected data type in dimension list"; } } if (! (n = numberof(dims))) return 0.0; /* scalar array */ if (min(dims) <= 0) error, "negative value in dimension list"; /* Build square radius array one dimension at a time, starting with the last dimension. */ local r2; if (is_void(nyquist)) { for (k=n ; k>=1 ; --k) { u = double((k==1 && half ? indgen(0:dims(k)/2) : fftw_indgen(dims(k)))); r2 = (k=1 ; --k) { dim = dims(k); u = (s/dim)*(k==1 && half ? indgen(0:dim/2) : fftw_indgen(dim)); r2 = (k Copyright (C) This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA Also add information on how to contact you by electronic and paper mail. If the program is interactive, make it output a short notice like this when it starts in an interactive mode: Gnomovision version 69, Copyright (C) year name of author Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items--whatever suits your program. You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here is a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the program `Gnomovision' (which makes passes at compilers) written by James Hacker. , 1 April 1989 Ty Coon, President of Vice This General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Library General Public License instead of this License. Yeti-6.4.0/regex/glibc/COPYING.LIB000066400000000000000000000636501253351442600163040ustar00rootroot00000000000000 GNU LESSER GENERAL PUBLIC LICENSE Version 2.1, February 1999 Copyright (C) 1991, 1999 Free Software Foundation, Inc. 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. [This is the first released version of the Lesser GPL. It also counts as the successor of the GNU Library Public License, version 2, hence the version number 2.1.] Preamble The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public Licenses are intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This license, the Lesser General Public License, applies to some specially designated software packages--typically libraries--of the Free Software Foundation and other authors who decide to use it. You can use it too, but we suggest you first think carefully about whether this license or the ordinary General Public License is the better strategy to use in any particular case, based on the explanations below. When we speak of free software, we are referring to freedom of use, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish); that you receive source code or can get it if you want it; that you can change the software and use pieces of it in new free programs; and that you are informed that you can do these things. To protect your rights, we need to make restrictions that forbid distributors to deny you these rights or to ask you to surrender these rights. These restrictions translate to certain responsibilities for you if you distribute copies of the library or if you modify it. For example, if you distribute copies of the library, whether gratis or for a fee, you must give the recipients all the rights that we gave you. You must make sure that they, too, receive or can get the source code. If you link other code with the library, you must provide complete object files to the recipients, so that they can relink them with the library after making changes to the library and recompiling it. And you must show them these terms so they know their rights. We protect your rights with a two-step method: (1) we copyright the library, and (2) we offer you this license, which gives you legal permission to copy, distribute and/or modify the library. To protect each distributor, we want to make it very clear that there is no warranty for the free library. Also, if the library is modified by someone else and passed on, the recipients should know that what they have is not the original version, so that the original author's reputation will not be affected by problems that might be introduced by others. ^L Finally, software patents pose a constant threat to the existence of any free program. We wish to make sure that a company cannot effectively restrict the users of a free program by obtaining a restrictive license from a patent holder. Therefore, we insist that any patent license obtained for a version of the library must be consistent with the full freedom of use specified in this license. Most GNU software, including some libraries, is covered by the ordinary GNU General Public License. This license, the GNU Lesser General Public License, applies to certain designated libraries, and is quite different from the ordinary General Public License. We use this license for certain libraries in order to permit linking those libraries into non-free programs. When a program is linked with a library, whether statically or using a shared library, the combination of the two is legally speaking a combined work, a derivative of the original library. The ordinary General Public License therefore permits such linking only if the entire combination fits its criteria of freedom. The Lesser General Public License permits more lax criteria for linking other code with the library. We call this license the "Lesser" General Public License because it does Less to protect the user's freedom than the ordinary General Public License. It also provides other free software developers Less of an advantage over competing non-free programs. These disadvantages are the reason we use the ordinary General Public License for many libraries. However, the Lesser license provides advantages in certain special circumstances. For example, on rare occasions, there may be a special need to encourage the widest possible use of a certain library, so that it becomes a de-facto standard. To achieve this, non-free programs must be allowed to use the library. A more frequent case is that a free library does the same job as widely used non-free libraries. In this case, there is little to gain by limiting the free library to free software only, so we use the Lesser General Public License. In other cases, permission to use a particular library in non-free programs enables a greater number of people to use a large body of free software. For example, permission to use the GNU C Library in non-free programs enables many more people to use the whole GNU operating system, as well as its variant, the GNU/Linux operating system. Although the Lesser General Public License is Less protective of the users' freedom, it does ensure that the user of a program that is linked with the Library has the freedom and the wherewithal to run that program using a modified version of the Library. The precise terms and conditions for copying, distribution and modification follow. Pay close attention to the difference between a "work based on the library" and a "work that uses the library". The former contains code derived from the library, whereas the latter must be combined with the library in order to run. ^L GNU LESSER GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License Agreement applies to any software library or other program which contains a notice placed by the copyright holder or other authorized party saying it may be distributed under the terms of this Lesser General Public License (also called "this License"). Each licensee is addressed as "you". A "library" means a collection of software functions and/or data prepared so as to be conveniently linked with application programs (which use some of those functions and data) to form executables. The "Library", below, refers to any such software library or work which has been distributed under these terms. A "work based on the Library" means either the Library or any derivative work under copyright law: that is to say, a work containing the Library or a portion of it, either verbatim or with modifications and/or translated straightforwardly into another language. (Hereinafter, translation is included without limitation in the term "modification".) "Source code" for a work means the preferred form of the work for making modifications to it. For a library, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the library. Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running a program using the Library is not restricted, and output from such a program is covered only if its contents constitute a work based on the Library (independent of the use of the Library in a tool for writing it). Whether that is true depends on what the Library does and what the program that uses the Library does. 1. You may copy and distribute verbatim copies of the Library's complete source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and distribute a copy of this License along with the Library. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Library or any portion of it, thus forming a work based on the Library, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) The modified work must itself be a software library. b) You must cause the files modified to carry prominent notices stating that you changed the files and the date of any change. c) You must cause the whole of the work to be licensed at no charge to all third parties under the terms of this License. d) If a facility in the modified Library refers to a function or a table of data to be supplied by an application program that uses the facility, other than as an argument passed when the facility is invoked, then you must make a good faith effort to ensure that, in the event an application does not supply such function or table, the facility still operates, and performs whatever part of its purpose remains meaningful. (For example, a function in a library to compute square roots has a purpose that is entirely well-defined independent of the application. Therefore, Subsection 2d requires that any application-supplied function or table used by this function must be optional: if the application does not supply it, the square root function must still compute square roots.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Library, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Library, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Library. In addition, mere aggregation of another work not based on the Library with the Library (or with a work based on the Library) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may opt to apply the terms of the ordinary GNU General Public License instead of this License to a given copy of the Library. To do this, you must alter all the notices that refer to this License, so that they refer to the ordinary GNU General Public License, version 2, instead of to this License. (If a newer version than version 2 of the ordinary GNU General Public License has appeared, then you can specify that version instead if you wish.) Do not make any other change in these notices. ^L Once this change is made in a given copy, it is irreversible for that copy, so the ordinary GNU General Public License applies to all subsequent copies and derivative works made from that copy. This option is useful when you wish to copy part of the code of the Library into a program that is not a library. 4. You may copy and distribute the Library (or a portion or derivative of it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange. If distribution of object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place satisfies the requirement to distribute the source code, even though third parties are not compelled to copy the source along with the object code. 5. A program that contains no derivative of any portion of the Library, but is designed to work with the Library by being compiled or linked with it, is called a "work that uses the Library". Such a work, in isolation, is not a derivative work of the Library, and therefore falls outside the scope of this License. However, linking a "work that uses the Library" with the Library creates an executable that is a derivative of the Library (because it contains portions of the Library), rather than a "work that uses the library". The executable is therefore covered by this License. Section 6 states terms for distribution of such executables. When a "work that uses the Library" uses material from a header file that is part of the Library, the object code for the work may be a derivative work of the Library even though the source code is not. Whether this is true is especially significant if the work can be linked without the Library, or if the work is itself a library. The threshold for this to be true is not precisely defined by law. If such an object file uses only numerical parameters, data structure layouts and accessors, and small macros and small inline functions (ten lines or less in length), then the use of the object file is unrestricted, regardless of whether it is legally a derivative work. (Executables containing this object code plus portions of the Library will still fall under Section 6.) Otherwise, if the work is a derivative of the Library, you may distribute the object code for the work under the terms of Section 6. Any executables containing that work also fall under Section 6, whether or not they are linked directly with the Library itself. ^L 6. As an exception to the Sections above, you may also combine or link a "work that uses the Library" with the Library to produce a work containing portions of the Library, and distribute that work under terms of your choice, provided that the terms permit modification of the work for the customer's own use and reverse engineering for debugging such modifications. You must give prominent notice with each copy of the work that the Library is used in it and that the Library and its use are covered by this License. You must supply a copy of this License. If the work during execution displays copyright notices, you must include the copyright notice for the Library among them, as well as a reference directing the user to the copy of this License. Also, you must do one of these things: a) Accompany the work with the complete corresponding machine-readable source code for the Library including whatever changes were used in the work (which must be distributed under Sections 1 and 2 above); and, if the work is an executable linked with the Library, with the complete machine-readable "work that uses the Library", as object code and/or source code, so that the user can modify the Library and then relink to produce a modified executable containing the modified Library. (It is understood that the user who changes the contents of definitions files in the Library will not necessarily be able to recompile the application to use the modified definitions.) b) Use a suitable shared library mechanism for linking with the Library. A suitable mechanism is one that (1) uses at run time a copy of the library already present on the user's computer system, rather than copying library functions into the executable, and (2) will operate properly with a modified version of the library, if the user installs one, as long as the modified version is interface-compatible with the version that the work was made with. c) Accompany the work with a written offer, valid for at least three years, to give the same user the materials specified in Subsection 6a, above, for a charge no more than the cost of performing this distribution. d) If distribution of the work is made by offering access to copy from a designated place, offer equivalent access to copy the above specified materials from the same place. e) Verify that the user has already received a copy of these materials or that you have already sent this user a copy. For an executable, the required form of the "work that uses the Library" must include any data and utility programs needed for reproducing the executable from it. However, as a special exception, the materials to be distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. It may happen that this requirement contradicts the license restrictions of other proprietary libraries that do not normally accompany the operating system. Such a contradiction means you cannot use both them and the Library together in an executable that you distribute. ^L 7. You may place library facilities that are a work based on the Library side-by-side in a single library together with other library facilities not covered by this License, and distribute such a combined library, provided that the separate distribution of the work based on the Library and of the other library facilities is otherwise permitted, and provided that you do these two things: a) Accompany the combined library with a copy of the same work based on the Library, uncombined with any other library facilities. This must be distributed under the terms of the Sections above. b) Give prominent notice with the combined library of the fact that part of it is a work based on the Library, and explaining where to find the accompanying uncombined form of the same work. 8. You may not copy, modify, sublicense, link with, or distribute the Library except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense, link with, or distribute the Library is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 9. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Library or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Library (or any work based on the Library), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Library or works based on it. 10. Each time you redistribute the Library (or any work based on the Library), the recipient automatically receives a license from the original licensor to copy, distribute, link with or modify the Library subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties with this License. ^L 11. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Library at all. For example, if a patent license would not permit royalty-free redistribution of the Library by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Library. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply, and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 12. If the distribution and/or use of the Library is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Library under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 13. The Free Software Foundation may publish revised and/or new versions of the Lesser General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Library specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Library does not specify a license version number, you may choose any version ever published by the Free Software Foundation. ^L 14. If you wish to incorporate parts of the Library into other free programs whose distribution conditions are incompatible with these, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 15. BECAUSE THE LIBRARY IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE LIBRARY, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE LIBRARY "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE LIBRARY IS WITH YOU. SHOULD THE LIBRARY PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 16. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE LIBRARY AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE LIBRARY (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE LIBRARY TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS ^L How to Apply These Terms to Your New Libraries If you develop a new library, and you want it to be of the greatest possible use to the public, we recommend making it free software that everyone can redistribute and change. You can do so by permitting redistribution under these terms (or, alternatively, under the terms of the ordinary General Public License). To apply these terms, attach the following notices to the library. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. Copyright (C) This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version. This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA Also add information on how to contact you by electronic and paper mail. You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the library, if necessary. Here is a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the library `Frob' (a library for tweaking knobs) written by James Random Hacker. , 1 April 1990 Ty Coon, President of Vice That's all there is to it! Yeti-6.4.0/regex/glibc/VERSION000066400000000000000000000000141253351442600156750ustar00rootroot00000000000000glibc-2.3.5 Yeti-6.4.0/regex/glibc/regcomp.c000066400000000000000000003620721253351442600164440ustar00rootroot00000000000000/* Extended regular expression matching and search library. Copyright (C) 2002, 2003, 2004 Free Software Foundation, Inc. This file is part of the GNU C Library. Contributed by Isamu Hasegawa . The GNU C Library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version. The GNU C Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. You should have received a copy of the GNU Lesser General Public License along with the GNU C Library; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA. */ static reg_errcode_t re_compile_internal (regex_t *preg, const char * pattern, int length, reg_syntax_t syntax); static void re_compile_fastmap_iter (regex_t *bufp, const re_dfastate_t *init_state, char *fastmap); static reg_errcode_t init_dfa (re_dfa_t *dfa, int pat_len); static void init_word_char (re_dfa_t *dfa); #ifdef RE_ENABLE_I18N static void free_charset (re_charset_t *cset); #endif /* RE_ENABLE_I18N */ static void free_workarea_compile (regex_t *preg); static reg_errcode_t create_initial_state (re_dfa_t *dfa); #ifdef RE_ENABLE_I18N static void optimize_utf8 (re_dfa_t *dfa); #endif struct subexp_optimize { re_dfa_t *dfa; re_token_t *nodes; int no_sub, re_nsub; }; static bin_tree_t *optimize_subexps (struct subexp_optimize *so, bin_tree_t *node, int sidx, int depth); static reg_errcode_t analyze (re_dfa_t *dfa); static reg_errcode_t analyze_tree (re_dfa_t *dfa, bin_tree_t *node); static void calc_first (re_dfa_t *dfa, bin_tree_t *node); static void calc_next (re_dfa_t *dfa, bin_tree_t *node); static void calc_epsdest (re_dfa_t *dfa, bin_tree_t *node); static reg_errcode_t duplicate_node_closure (re_dfa_t *dfa, int top_org_node, int top_clone_node, int root_node, unsigned int constraint); static reg_errcode_t duplicate_node (int *new_idx, re_dfa_t *dfa, int org_idx, unsigned int constraint); static int search_duplicated_node (re_dfa_t *dfa, int org_node, unsigned int constraint); static reg_errcode_t calc_eclosure (re_dfa_t *dfa); static reg_errcode_t calc_eclosure_iter (re_node_set *new_set, re_dfa_t *dfa, int node, int root); static void calc_inveclosure (re_dfa_t *dfa); static int fetch_number (re_string_t *input, re_token_t *token, reg_syntax_t syntax); static void fetch_token (re_token_t *result, re_string_t *input, reg_syntax_t syntax); static int peek_token (re_token_t *token, re_string_t *input, reg_syntax_t syntax); static int peek_token_bracket (re_token_t *token, re_string_t *input, reg_syntax_t syntax); static bin_tree_t *parse (re_string_t *regexp, regex_t *preg, reg_syntax_t syntax, reg_errcode_t *err); static bin_tree_t *parse_reg_exp (re_string_t *regexp, regex_t *preg, re_token_t *token, reg_syntax_t syntax, int nest, reg_errcode_t *err); static bin_tree_t *parse_branch (re_string_t *regexp, regex_t *preg, re_token_t *token, reg_syntax_t syntax, int nest, reg_errcode_t *err); static bin_tree_t *parse_expression (re_string_t *regexp, regex_t *preg, re_token_t *token, reg_syntax_t syntax, int nest, reg_errcode_t *err); static bin_tree_t *parse_sub_exp (re_string_t *regexp, regex_t *preg, re_token_t *token, reg_syntax_t syntax, int nest, reg_errcode_t *err); static bin_tree_t *parse_dup_op (bin_tree_t *dup_elem, re_string_t *regexp, re_dfa_t *dfa, re_token_t *token, reg_syntax_t syntax, reg_errcode_t *err); static bin_tree_t *parse_bracket_exp (re_string_t *regexp, re_dfa_t *dfa, re_token_t *token, reg_syntax_t syntax, reg_errcode_t *err); static reg_errcode_t parse_bracket_element (bracket_elem_t *elem, re_string_t *regexp, re_token_t *token, int token_len, re_dfa_t *dfa, reg_syntax_t syntax, int accept_hyphen); static reg_errcode_t parse_bracket_symbol (bracket_elem_t *elem, re_string_t *regexp, re_token_t *token); #ifndef _LIBC # ifdef RE_ENABLE_I18N static reg_errcode_t build_range_exp (re_bitset_ptr_t sbcset, re_charset_t *mbcset, int *range_alloc, bracket_elem_t *start_elem, bracket_elem_t *end_elem); static reg_errcode_t build_collating_symbol (re_bitset_ptr_t sbcset, re_charset_t *mbcset, int *coll_sym_alloc, const unsigned char *name); # else /* not RE_ENABLE_I18N */ static reg_errcode_t build_range_exp (re_bitset_ptr_t sbcset, bracket_elem_t *start_elem, bracket_elem_t *end_elem); static reg_errcode_t build_collating_symbol (re_bitset_ptr_t sbcset, const unsigned char *name); # endif /* not RE_ENABLE_I18N */ #endif /* not _LIBC */ #ifdef RE_ENABLE_I18N static reg_errcode_t build_equiv_class (re_bitset_ptr_t sbcset, re_charset_t *mbcset, int *equiv_class_alloc, const unsigned char *name); static reg_errcode_t build_charclass (unsigned RE_TRANSLATE_TYPE trans, re_bitset_ptr_t sbcset, re_charset_t *mbcset, int *char_class_alloc, const unsigned char *class_name, reg_syntax_t syntax); #else /* not RE_ENABLE_I18N */ static reg_errcode_t build_equiv_class (re_bitset_ptr_t sbcset, const unsigned char *name); static reg_errcode_t build_charclass (unsigned RE_TRANSLATE_TYPE trans, re_bitset_ptr_t sbcset, const unsigned char *class_name, reg_syntax_t syntax); #endif /* not RE_ENABLE_I18N */ static bin_tree_t *build_charclass_op (re_dfa_t *dfa, unsigned RE_TRANSLATE_TYPE trans, const unsigned char *class_name, const unsigned char *extra, int non_match, reg_errcode_t *err); static bin_tree_t *create_tree (re_dfa_t *dfa, bin_tree_t *left, bin_tree_t *right, re_token_type_t type, int index); static bin_tree_t *re_dfa_add_tree_node (re_dfa_t *dfa, bin_tree_t *left, bin_tree_t *right, const re_token_t *token) __attribute ((noinline)); static bin_tree_t *duplicate_tree (const bin_tree_t *src, re_dfa_t *dfa); static void mark_opt_subexp (const bin_tree_t *src, re_dfa_t *dfa); static void mark_opt_subexp_iter (const bin_tree_t *src, re_dfa_t *dfa, int idx); /* This table gives an error message for each of the error codes listed in regex.h. Obviously the order here has to be same as there. POSIX doesn't require that we do anything for REG_NOERROR, but why not be nice? */ const char __re_error_msgid[] attribute_hidden = { #define REG_NOERROR_IDX 0 gettext_noop ("Success") /* REG_NOERROR */ "\0" #define REG_NOMATCH_IDX (REG_NOERROR_IDX + sizeof "Success") gettext_noop ("No match") /* REG_NOMATCH */ "\0" #define REG_BADPAT_IDX (REG_NOMATCH_IDX + sizeof "No match") gettext_noop ("Invalid regular expression") /* REG_BADPAT */ "\0" #define REG_ECOLLATE_IDX (REG_BADPAT_IDX + sizeof "Invalid regular expression") gettext_noop ("Invalid collation character") /* REG_ECOLLATE */ "\0" #define REG_ECTYPE_IDX (REG_ECOLLATE_IDX + sizeof "Invalid collation character") gettext_noop ("Invalid character class name") /* REG_ECTYPE */ "\0" #define REG_EESCAPE_IDX (REG_ECTYPE_IDX + sizeof "Invalid character class name") gettext_noop ("Trailing backslash") /* REG_EESCAPE */ "\0" #define REG_ESUBREG_IDX (REG_EESCAPE_IDX + sizeof "Trailing backslash") gettext_noop ("Invalid back reference") /* REG_ESUBREG */ "\0" #define REG_EBRACK_IDX (REG_ESUBREG_IDX + sizeof "Invalid back reference") gettext_noop ("Unmatched [ or [^") /* REG_EBRACK */ "\0" #define REG_EPAREN_IDX (REG_EBRACK_IDX + sizeof "Unmatched [ or [^") gettext_noop ("Unmatched ( or \\(") /* REG_EPAREN */ "\0" #define REG_EBRACE_IDX (REG_EPAREN_IDX + sizeof "Unmatched ( or \\(") gettext_noop ("Unmatched \\{") /* REG_EBRACE */ "\0" #define REG_BADBR_IDX (REG_EBRACE_IDX + sizeof "Unmatched \\{") gettext_noop ("Invalid content of \\{\\}") /* REG_BADBR */ "\0" #define REG_ERANGE_IDX (REG_BADBR_IDX + sizeof "Invalid content of \\{\\}") gettext_noop ("Invalid range end") /* REG_ERANGE */ "\0" #define REG_ESPACE_IDX (REG_ERANGE_IDX + sizeof "Invalid range end") gettext_noop ("Memory exhausted") /* REG_ESPACE */ "\0" #define REG_BADRPT_IDX (REG_ESPACE_IDX + sizeof "Memory exhausted") gettext_noop ("Invalid preceding regular expression") /* REG_BADRPT */ "\0" #define REG_EEND_IDX (REG_BADRPT_IDX + sizeof "Invalid preceding regular expression") gettext_noop ("Premature end of regular expression") /* REG_EEND */ "\0" #define REG_ESIZE_IDX (REG_EEND_IDX + sizeof "Premature end of regular expression") gettext_noop ("Regular expression too big") /* REG_ESIZE */ "\0" #define REG_ERPAREN_IDX (REG_ESIZE_IDX + sizeof "Regular expression too big") gettext_noop ("Unmatched ) or \\)") /* REG_ERPAREN */ }; const size_t __re_error_msgid_idx[] attribute_hidden = { REG_NOERROR_IDX, REG_NOMATCH_IDX, REG_BADPAT_IDX, REG_ECOLLATE_IDX, REG_ECTYPE_IDX, REG_EESCAPE_IDX, REG_ESUBREG_IDX, REG_EBRACK_IDX, REG_EPAREN_IDX, REG_EBRACE_IDX, REG_BADBR_IDX, REG_ERANGE_IDX, REG_ESPACE_IDX, REG_BADRPT_IDX, REG_EEND_IDX, REG_ESIZE_IDX, REG_ERPAREN_IDX }; /* Entry points for GNU code. */ /* re_compile_pattern is the GNU regular expression compiler: it compiles PATTERN (of length LENGTH) and puts the result in BUFP. Returns 0 if the pattern was valid, otherwise an error string. Assumes the `allocated' (and perhaps `buffer') and `translate' fields are set in BUFP on entry. */ const char * re_compile_pattern (pattern, length, bufp) const char *pattern; size_t length; struct re_pattern_buffer *bufp; { reg_errcode_t ret; /* And GNU code determines whether or not to get register information by passing null for the REGS argument to re_match, etc., not by setting no_sub, unless RE_NO_SUB is set. */ bufp->no_sub = !!(re_syntax_options & RE_NO_SUB); /* Match anchors at newline. */ bufp->newline_anchor = 1; ret = re_compile_internal (bufp, pattern, length, re_syntax_options); if (!ret) return NULL; return gettext (__re_error_msgid + __re_error_msgid_idx[(int) ret]); } #ifdef _LIBC weak_alias (__re_compile_pattern, re_compile_pattern) #endif /* Set by `re_set_syntax' to the current regexp syntax to recognize. Can also be assigned to arbitrarily: each pattern buffer stores its own syntax, so it can be changed between regex compilations. */ /* This has no initializer because initialized variables in Emacs become read-only after dumping. */ reg_syntax_t re_syntax_options; /* Specify the precise syntax of regexps for compilation. This provides for compatibility for various utilities which historically have different, incompatible syntaxes. The argument SYNTAX is a bit mask comprised of the various bits defined in regex.h. We return the old syntax. */ reg_syntax_t re_set_syntax (syntax) reg_syntax_t syntax; { reg_syntax_t ret = re_syntax_options; re_syntax_options = syntax; return ret; } #ifdef _LIBC weak_alias (__re_set_syntax, re_set_syntax) #endif int re_compile_fastmap (bufp) struct re_pattern_buffer *bufp; { re_dfa_t *dfa = (re_dfa_t *) bufp->buffer; char *fastmap = bufp->fastmap; memset (fastmap, '\0', sizeof (char) * SBC_MAX); re_compile_fastmap_iter (bufp, dfa->init_state, fastmap); if (dfa->init_state != dfa->init_state_word) re_compile_fastmap_iter (bufp, dfa->init_state_word, fastmap); if (dfa->init_state != dfa->init_state_nl) re_compile_fastmap_iter (bufp, dfa->init_state_nl, fastmap); if (dfa->init_state != dfa->init_state_begbuf) re_compile_fastmap_iter (bufp, dfa->init_state_begbuf, fastmap); bufp->fastmap_accurate = 1; return 0; } #ifdef _LIBC weak_alias (__re_compile_fastmap, re_compile_fastmap) #endif static inline void __attribute ((always_inline)) re_set_fastmap (char *fastmap, int icase, int ch) { fastmap[ch] = 1; if (icase) fastmap[tolower (ch)] = 1; } /* Helper function for re_compile_fastmap. Compile fastmap for the initial_state INIT_STATE. */ static void re_compile_fastmap_iter (bufp, init_state, fastmap) regex_t *bufp; const re_dfastate_t *init_state; char *fastmap; { re_dfa_t *dfa = (re_dfa_t *) bufp->buffer; int node_cnt; int icase = (dfa->mb_cur_max == 1 && (bufp->syntax & RE_ICASE)); for (node_cnt = 0; node_cnt < init_state->nodes.nelem; ++node_cnt) { int node = init_state->nodes.elems[node_cnt]; re_token_type_t type = dfa->nodes[node].type; if (type == CHARACTER) { re_set_fastmap (fastmap, icase, dfa->nodes[node].opr.c); #ifdef RE_ENABLE_I18N if ((bufp->syntax & RE_ICASE) && dfa->mb_cur_max > 1) { unsigned char *buf = alloca (dfa->mb_cur_max), *p; wchar_t wc; mbstate_t state; p = buf; *p++ = dfa->nodes[node].opr.c; while (++node < dfa->nodes_len && dfa->nodes[node].type == CHARACTER && dfa->nodes[node].mb_partial) *p++ = dfa->nodes[node].opr.c; memset (&state, 0, sizeof (state)); if (mbrtowc (&wc, (const char *) buf, p - buf, &state) == p - buf && __wcrtomb ((char *) buf, towlower (wc), &state) > 0) re_set_fastmap (fastmap, 0, buf[0]); } #endif } else if (type == SIMPLE_BRACKET) { int i, j, ch; for (i = 0, ch = 0; i < BITSET_UINTS; ++i) for (j = 0; j < UINT_BITS; ++j, ++ch) if (dfa->nodes[node].opr.sbcset[i] & (1 << j)) re_set_fastmap (fastmap, icase, ch); } #ifdef RE_ENABLE_I18N else if (type == COMPLEX_BRACKET) { int i; re_charset_t *cset = dfa->nodes[node].opr.mbcset; if (cset->non_match || cset->ncoll_syms || cset->nequiv_classes || cset->nranges || cset->nchar_classes) { # ifdef _LIBC if (_NL_CURRENT_WORD (LC_COLLATE, _NL_COLLATE_NRULES) != 0) { /* In this case we want to catch the bytes which are the first byte of any collation elements. e.g. In da_DK, we want to catch 'a' since "aa" is a valid collation element, and don't catch 'b' since 'b' is the only collation element which starts from 'b'. */ int j, ch; const int32_t *table = (const int32_t *) _NL_CURRENT (LC_COLLATE, _NL_COLLATE_TABLEMB); for (i = 0, ch = 0; i < BITSET_UINTS; ++i) for (j = 0; j < UINT_BITS; ++j, ++ch) if (table[ch] < 0) re_set_fastmap (fastmap, icase, ch); } # else if (dfa->mb_cur_max > 1) for (i = 0; i < SBC_MAX; ++i) if (__btowc (i) == WEOF) re_set_fastmap (fastmap, icase, i); # endif /* not _LIBC */ } for (i = 0; i < cset->nmbchars; ++i) { char buf[256]; mbstate_t state; memset (&state, '\0', sizeof (state)); __wcrtomb (buf, cset->mbchars[i], &state); re_set_fastmap (fastmap, icase, *(unsigned char *) buf); if ((bufp->syntax & RE_ICASE) && dfa->mb_cur_max > 1) { __wcrtomb (buf, towlower (cset->mbchars[i]), &state); re_set_fastmap (fastmap, 0, *(unsigned char *) buf); } } } #endif /* RE_ENABLE_I18N */ else if (type == OP_PERIOD #ifdef RE_ENABLE_I18N || type == OP_UTF8_PERIOD #endif /* RE_ENABLE_I18N */ || type == END_OF_RE) { memset (fastmap, '\1', sizeof (char) * SBC_MAX); if (type == END_OF_RE) bufp->can_be_null = 1; return; } } } /* Entry point for POSIX code. */ /* regcomp takes a regular expression as a string and compiles it. PREG is a regex_t *. We do not expect any fields to be initialized, since POSIX says we shouldn't. Thus, we set `buffer' to the compiled pattern; `used' to the length of the compiled pattern; `syntax' to RE_SYNTAX_POSIX_EXTENDED if the REG_EXTENDED bit in CFLAGS is set; otherwise, to RE_SYNTAX_POSIX_BASIC; `newline_anchor' to REG_NEWLINE being set in CFLAGS; `fastmap' to an allocated space for the fastmap; `fastmap_accurate' to zero; `re_nsub' to the number of subexpressions in PATTERN. PATTERN is the address of the pattern string. CFLAGS is a series of bits which affect compilation. If REG_EXTENDED is set, we use POSIX extended syntax; otherwise, we use POSIX basic syntax. If REG_NEWLINE is set, then . and [^...] don't match newline. Also, regexec will try a match beginning after every newline. If REG_ICASE is set, then we considers upper- and lowercase versions of letters to be equivalent when matching. If REG_NOSUB is set, then when PREG is passed to regexec, that routine will report only success or failure, and nothing about the registers. It returns 0 if it succeeds, nonzero if it doesn't. (See regex.h for the return codes and their meanings.) */ int regcomp (preg, pattern, cflags) regex_t *__restrict preg; const char *__restrict pattern; int cflags; { reg_errcode_t ret; reg_syntax_t syntax = ((cflags & REG_EXTENDED) ? RE_SYNTAX_POSIX_EXTENDED : RE_SYNTAX_POSIX_BASIC); preg->buffer = NULL; preg->allocated = 0; preg->used = 0; /* Try to allocate space for the fastmap. */ preg->fastmap = re_malloc (char, SBC_MAX); if (BE (preg->fastmap == NULL, 0)) return REG_ESPACE; syntax |= (cflags & REG_ICASE) ? RE_ICASE : 0; /* If REG_NEWLINE is set, newlines are treated differently. */ if (cflags & REG_NEWLINE) { /* REG_NEWLINE implies neither . nor [^...] match newline. */ syntax &= ~RE_DOT_NEWLINE; syntax |= RE_HAT_LISTS_NOT_NEWLINE; /* It also changes the matching behavior. */ preg->newline_anchor = 1; } else preg->newline_anchor = 0; preg->no_sub = !!(cflags & REG_NOSUB); preg->translate = NULL; ret = re_compile_internal (preg, pattern, strlen (pattern), syntax); /* POSIX doesn't distinguish between an unmatched open-group and an unmatched close-group: both are REG_EPAREN. */ if (ret == REG_ERPAREN) ret = REG_EPAREN; /* We have already checked preg->fastmap != NULL. */ if (BE (ret == REG_NOERROR, 1)) /* Compute the fastmap now, since regexec cannot modify the pattern buffer. This function never fails in this implementation. */ (void) re_compile_fastmap (preg); else { /* Some error occurred while compiling the expression. */ re_free (preg->fastmap); preg->fastmap = NULL; } return (int) ret; } #ifdef _LIBC weak_alias (__regcomp, regcomp) #endif /* Returns a message corresponding to an error code, ERRCODE, returned from either regcomp or regexec. We don't use PREG here. */ size_t regerror (errcode, preg, errbuf, errbuf_size) int errcode; const regex_t *preg; char *errbuf; size_t errbuf_size; { const char *msg; size_t msg_size; if (BE (errcode < 0 || errcode >= (int) (sizeof (__re_error_msgid_idx) / sizeof (__re_error_msgid_idx[0])), 0)) /* Only error codes returned by the rest of the code should be passed to this routine. If we are given anything else, or if other regex code generates an invalid error code, then the program has a bug. Dump core so we can fix it. */ abort (); msg = gettext (__re_error_msgid + __re_error_msgid_idx[errcode]); msg_size = strlen (msg) + 1; /* Includes the null. */ if (BE (errbuf_size != 0, 1)) { if (BE (msg_size > errbuf_size, 0)) { #if defined HAVE_MEMPCPY || defined _LIBC *((char *) __mempcpy (errbuf, msg, errbuf_size - 1)) = '\0'; #else memcpy (errbuf, msg, errbuf_size - 1); errbuf[errbuf_size - 1] = 0; #endif } else memcpy (errbuf, msg, msg_size); } return msg_size; } #ifdef _LIBC weak_alias (__regerror, regerror) #endif #ifdef RE_ENABLE_I18N /* This static array is used for the map to single-byte characters when UTF-8 is used. Otherwise we would allocate memory just to initialize it the same all the time. UTF-8 is the preferred encoding so this is a worthwhile optimization. */ static const bitset utf8_sb_map = { /* Set the first 128 bits. */ # if UINT_MAX == 0xffffffff 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff # else # error "Add case for new unsigned int size" # endif }; #endif static void free_dfa_content (re_dfa_t *dfa) { int i, j; if (dfa->nodes) for (i = 0; i < dfa->nodes_len; ++i) { re_token_t *node = dfa->nodes + i; #ifdef RE_ENABLE_I18N if (node->type == COMPLEX_BRACKET && node->duplicated == 0) free_charset (node->opr.mbcset); else #endif /* RE_ENABLE_I18N */ if (node->type == SIMPLE_BRACKET && node->duplicated == 0) re_free (node->opr.sbcset); } re_free (dfa->nexts); for (i = 0; i < dfa->nodes_len; ++i) { if (dfa->eclosures != NULL) re_node_set_free (dfa->eclosures + i); if (dfa->inveclosures != NULL) re_node_set_free (dfa->inveclosures + i); if (dfa->edests != NULL) re_node_set_free (dfa->edests + i); } re_free (dfa->edests); re_free (dfa->eclosures); re_free (dfa->inveclosures); re_free (dfa->nodes); if (dfa->state_table) for (i = 0; i <= dfa->state_hash_mask; ++i) { struct re_state_table_entry *entry = dfa->state_table + i; for (j = 0; j < entry->num; ++j) { re_dfastate_t *state = entry->array[j]; free_state (state); } re_free (entry->array); } re_free (dfa->state_table); #ifdef RE_ENABLE_I18N if (dfa->sb_char != utf8_sb_map) re_free (dfa->sb_char); #endif re_free (dfa->subexp_map); #ifdef DEBUG re_free (dfa->re_str); #endif re_free (dfa); } /* Free dynamically allocated space used by PREG. */ void regfree (preg) regex_t *preg; { re_dfa_t *dfa = (re_dfa_t *) preg->buffer; if (BE (dfa != NULL, 1)) free_dfa_content (dfa); preg->buffer = NULL; preg->allocated = 0; re_free (preg->fastmap); preg->fastmap = NULL; re_free (preg->translate); preg->translate = NULL; } #ifdef _LIBC weak_alias (__regfree, regfree) #endif /* Entry points compatible with 4.2 BSD regex library. We don't define them unless specifically requested. */ #if defined _REGEX_RE_COMP || defined _LIBC /* BSD has one and only one pattern buffer. */ static struct re_pattern_buffer re_comp_buf; char * # ifdef _LIBC /* Make these definitions weak in libc, so POSIX programs can redefine these names if they don't use our functions, and still use regcomp/regexec above without link errors. */ weak_function # endif re_comp (s) const char *s; { reg_errcode_t ret; char *fastmap; if (!s) { if (!re_comp_buf.buffer) return gettext ("No previous regular expression"); return 0; } if (re_comp_buf.buffer) { fastmap = re_comp_buf.fastmap; re_comp_buf.fastmap = NULL; __regfree (&re_comp_buf); memset (&re_comp_buf, '\0', sizeof (re_comp_buf)); re_comp_buf.fastmap = fastmap; } if (re_comp_buf.fastmap == NULL) { re_comp_buf.fastmap = (char *) malloc (SBC_MAX); if (re_comp_buf.fastmap == NULL) return (char *) gettext (__re_error_msgid + __re_error_msgid_idx[(int) REG_ESPACE]); } /* Since `re_exec' always passes NULL for the `regs' argument, we don't need to initialize the pattern buffer fields which affect it. */ /* Match anchors at newlines. */ re_comp_buf.newline_anchor = 1; ret = re_compile_internal (&re_comp_buf, s, strlen (s), re_syntax_options); if (!ret) return NULL; /* Yes, we're discarding `const' here if !HAVE_LIBINTL. */ return (char *) gettext (__re_error_msgid + __re_error_msgid_idx[(int) ret]); } #ifdef _LIBC libc_freeres_fn (free_mem) { __regfree (&re_comp_buf); } #endif #endif /* _REGEX_RE_COMP */ /* Internal entry point. Compile the regular expression PATTERN, whose length is LENGTH. SYNTAX indicate regular expression's syntax. */ static reg_errcode_t re_compile_internal (preg, pattern, length, syntax) regex_t *preg; const char * pattern; int length; reg_syntax_t syntax; { reg_errcode_t err = REG_NOERROR; re_dfa_t *dfa; re_string_t regexp; /* Initialize the pattern buffer. */ preg->fastmap_accurate = 0; preg->syntax = syntax; preg->not_bol = preg->not_eol = 0; preg->used = 0; preg->re_nsub = 0; preg->can_be_null = 0; preg->regs_allocated = REGS_UNALLOCATED; /* Initialize the dfa. */ dfa = (re_dfa_t *) preg->buffer; if (BE (preg->allocated < sizeof (re_dfa_t), 0)) { /* If zero allocated, but buffer is non-null, try to realloc enough space. This loses if buffer's address is bogus, but that is the user's responsibility. If ->buffer is NULL this is a simple allocation. */ dfa = re_realloc (preg->buffer, re_dfa_t, 1); if (dfa == NULL) return REG_ESPACE; preg->allocated = sizeof (re_dfa_t); preg->buffer = (unsigned char *) dfa; } preg->used = sizeof (re_dfa_t); err = init_dfa (dfa, length); if (BE (err != REG_NOERROR, 0)) { free_dfa_content (dfa); preg->buffer = NULL; preg->allocated = 0; return err; } #ifdef DEBUG dfa->re_str = re_malloc (char, length + 1); strncpy (dfa->re_str, pattern, length + 1); #endif err = re_string_construct (®exp, pattern, length, preg->translate, syntax & RE_ICASE, dfa); if (BE (err != REG_NOERROR, 0)) { re_compile_internal_free_return: free_workarea_compile (preg); re_string_destruct (®exp); free_dfa_content (dfa); preg->buffer = NULL; preg->allocated = 0; return err; } /* Parse the regular expression, and build a structure tree. */ preg->re_nsub = 0; dfa->str_tree = parse (®exp, preg, syntax, &err); if (BE (dfa->str_tree == NULL, 0)) goto re_compile_internal_free_return; #ifdef RE_ENABLE_I18N /* If possible, do searching in single byte encoding to speed things up. */ if (dfa->is_utf8 && !(syntax & RE_ICASE) && preg->translate == NULL) optimize_utf8 (dfa); #endif if (preg->re_nsub > 0) { struct subexp_optimize so; so.dfa = dfa; so.nodes = dfa->nodes; so.no_sub = preg->no_sub; so.re_nsub = preg->re_nsub; dfa->str_tree = optimize_subexps (&so, dfa->str_tree, -1, 0); } /* Analyze the tree and collect information which is necessary to create the dfa. */ err = analyze (dfa); if (BE (err != REG_NOERROR, 0)) goto re_compile_internal_free_return; /* Then create the initial state of the dfa. */ err = create_initial_state (dfa); /* Release work areas. */ free_workarea_compile (preg); re_string_destruct (®exp); if (BE (err != REG_NOERROR, 0)) { free_dfa_content (dfa); preg->buffer = NULL; preg->allocated = 0; } return err; } /* Initialize DFA. We use the length of the regular expression PAT_LEN as the initial length of some arrays. */ static reg_errcode_t init_dfa (dfa, pat_len) re_dfa_t *dfa; int pat_len; { int table_size; #ifndef _LIBC char *codeset_name; #endif memset (dfa, '\0', sizeof (re_dfa_t)); /* Force allocation of str_tree_storage the first time. */ dfa->str_tree_storage_idx = BIN_TREE_STORAGE_SIZE; dfa->nodes_alloc = pat_len + 1; dfa->nodes = re_malloc (re_token_t, dfa->nodes_alloc); dfa->states_alloc = pat_len + 1; /* table_size = 2 ^ ceil(log pat_len) */ for (table_size = 1; table_size > 0; table_size <<= 1) if (table_size > pat_len) break; dfa->state_table = calloc (sizeof (struct re_state_table_entry), table_size); dfa->state_hash_mask = table_size - 1; dfa->mb_cur_max = MB_CUR_MAX; #ifdef _LIBC if (dfa->mb_cur_max == 6 && strcmp (_NL_CURRENT (LC_CTYPE, _NL_CTYPE_CODESET_NAME), "UTF-8") == 0) dfa->is_utf8 = 1; dfa->map_notascii = (_NL_CURRENT_WORD (LC_CTYPE, _NL_CTYPE_MAP_TO_NONASCII) != 0); #else # ifdef HAVE_LANGINFO_CODESET codeset_name = nl_langinfo (CODESET); # else codeset_name = getenv ("LC_ALL"); if (codeset_name == NULL) codeset_name = getenv ("LC_CTYPE"); if (codeset_name == NULL) codeset_name = getenv ("LANG"); if (codeset_name == NULL) codeset_name = ""; else if (strchr (codeset_name, '.') != NULL) codeset_name = strchr (codeset_name, '.') + 1; # endif if (strcasecmp (codeset_name, "UTF-8") == 0 || strcasecmp (codeset_name, "UTF8") == 0) dfa->is_utf8 = 1; /* We check exhaustively in the loop below if this charset is a superset of ASCII. */ dfa->map_notascii = 0; #endif #ifdef RE_ENABLE_I18N if (dfa->mb_cur_max > 1) { if (dfa->is_utf8) dfa->sb_char = (re_bitset_ptr_t) utf8_sb_map; else { int i, j, ch; dfa->sb_char = (re_bitset_ptr_t) calloc (sizeof (bitset), 1); if (BE (dfa->sb_char == NULL, 0)) return REG_ESPACE; /* Clear all bits by, then set those corresponding to single byte chars. */ bitset_empty (dfa->sb_char); for (i = 0, ch = 0; i < BITSET_UINTS; ++i) for (j = 0; j < UINT_BITS; ++j, ++ch) { wchar_t wch = __btowc (ch); if (wch != WEOF) dfa->sb_char[i] |= 1 << j; # ifndef _LIBC if (isascii (ch) && wch != (wchar_t) ch) dfa->map_notascii = 1; # endif } } } #endif if (BE (dfa->nodes == NULL || dfa->state_table == NULL, 0)) return REG_ESPACE; return REG_NOERROR; } /* Initialize WORD_CHAR table, which indicate which character is "word". In this case "word" means that it is the word construction character used by some operators like "\<", "\>", etc. */ static void init_word_char (dfa) re_dfa_t *dfa; { int i, j, ch; dfa->word_ops_used = 1; for (i = 0, ch = 0; i < BITSET_UINTS; ++i) for (j = 0; j < UINT_BITS; ++j, ++ch) if (isalnum (ch) || ch == '_') dfa->word_char[i] |= 1 << j; } /* Free the work area which are only used while compiling. */ static void free_workarea_compile (preg) regex_t *preg; { re_dfa_t *dfa = (re_dfa_t *) preg->buffer; bin_tree_storage_t *storage, *next; for (storage = dfa->str_tree_storage; storage; storage = next) { next = storage->next; re_free (storage); } dfa->str_tree_storage = NULL; dfa->str_tree_storage_idx = BIN_TREE_STORAGE_SIZE; dfa->str_tree = NULL; re_free (dfa->org_indices); dfa->org_indices = NULL; } /* Create initial states for all contexts. */ static reg_errcode_t create_initial_state (dfa) re_dfa_t *dfa; { int first, i; reg_errcode_t err; re_node_set init_nodes; /* Initial states have the epsilon closure of the node which is the first node of the regular expression. */ first = dfa->str_tree->first; dfa->init_node = first; err = re_node_set_init_copy (&init_nodes, dfa->eclosures + first); if (BE (err != REG_NOERROR, 0)) return err; /* The back-references which are in initial states can epsilon transit, since in this case all of the subexpressions can be null. Then we add epsilon closures of the nodes which are the next nodes of the back-references. */ if (dfa->nbackref > 0) for (i = 0; i < init_nodes.nelem; ++i) { int node_idx = init_nodes.elems[i]; re_token_type_t type = dfa->nodes[node_idx].type; int clexp_idx; if (type != OP_BACK_REF) continue; for (clexp_idx = 0; clexp_idx < init_nodes.nelem; ++clexp_idx) { re_token_t *clexp_node; clexp_node = dfa->nodes + init_nodes.elems[clexp_idx]; if (clexp_node->type == OP_CLOSE_SUBEXP && clexp_node->opr.idx == dfa->nodes[node_idx].opr.idx) break; } if (clexp_idx == init_nodes.nelem) continue; if (type == OP_BACK_REF) { int dest_idx = dfa->edests[node_idx].elems[0]; if (!re_node_set_contains (&init_nodes, dest_idx)) { re_node_set_merge (&init_nodes, dfa->eclosures + dest_idx); i = 0; } } } /* It must be the first time to invoke acquire_state. */ dfa->init_state = re_acquire_state_context (&err, dfa, &init_nodes, 0); /* We don't check ERR here, since the initial state must not be NULL. */ if (BE (dfa->init_state == NULL, 0)) return err; if (dfa->init_state->has_constraint) { dfa->init_state_word = re_acquire_state_context (&err, dfa, &init_nodes, CONTEXT_WORD); dfa->init_state_nl = re_acquire_state_context (&err, dfa, &init_nodes, CONTEXT_NEWLINE); dfa->init_state_begbuf = re_acquire_state_context (&err, dfa, &init_nodes, CONTEXT_NEWLINE | CONTEXT_BEGBUF); if (BE (dfa->init_state_word == NULL || dfa->init_state_nl == NULL || dfa->init_state_begbuf == NULL, 0)) return err; } else dfa->init_state_word = dfa->init_state_nl = dfa->init_state_begbuf = dfa->init_state; re_node_set_free (&init_nodes); return REG_NOERROR; } #ifdef RE_ENABLE_I18N /* If it is possible to do searching in single byte encoding instead of UTF-8 to speed things up, set dfa->mb_cur_max to 1, clear is_utf8 and change DFA nodes where needed. */ static void optimize_utf8 (dfa) re_dfa_t *dfa; { int node, i, mb_chars = 0, has_period = 0; for (node = 0; node < dfa->nodes_len; ++node) switch (dfa->nodes[node].type) { case CHARACTER: if (dfa->nodes[node].opr.c >= 0x80) mb_chars = 1; break; case ANCHOR: switch (dfa->nodes[node].opr.idx) { case LINE_FIRST: case LINE_LAST: case BUF_FIRST: case BUF_LAST: break; default: /* Word anchors etc. cannot be handled. */ return; } break; case OP_PERIOD: has_period = 1; break; case OP_BACK_REF: case OP_ALT: case END_OF_RE: case OP_DUP_ASTERISK: case OP_DUP_QUESTION: case OP_OPEN_SUBEXP: case OP_CLOSE_SUBEXP: break; case SIMPLE_BRACKET: /* Just double check. */ for (i = 0x80 / UINT_BITS; i < BITSET_UINTS; ++i) if (dfa->nodes[node].opr.sbcset[i]) return; break; default: return; } if (mb_chars || has_period) for (node = 0; node < dfa->nodes_len; ++node) { if (dfa->nodes[node].type == CHARACTER && dfa->nodes[node].opr.c >= 0x80) dfa->nodes[node].mb_partial = 0; else if (dfa->nodes[node].type == OP_PERIOD) dfa->nodes[node].type = OP_UTF8_PERIOD; } /* The search can be in single byte locale. */ dfa->mb_cur_max = 1; dfa->is_utf8 = 0; dfa->has_mb_node = dfa->nbackref > 0 || has_period; } #endif static bin_tree_t * optimize_subexps (so, node, sidx, depth) struct subexp_optimize *so; bin_tree_t *node; int sidx, depth; { int idx, new_depth, new_sidx; bin_tree_t *ret; if (node == NULL) return NULL; new_depth = 0; new_sidx = sidx; if ((depth & 1) && node->type == CONCAT && node->right && node->right->type == 0 && so->nodes[idx = node->right->node_idx].type == OP_CLOSE_SUBEXP) { new_depth = depth + 1; if (new_depth == 2 || (so->nodes[idx].opr.idx < 8 * sizeof (so->dfa->used_bkref_map) && so->dfa->used_bkref_map & (1 << so->nodes[idx].opr.idx))) new_sidx = so->nodes[idx].opr.idx; } node->left = optimize_subexps (so, node->left, new_sidx, new_depth); new_depth = (depth & 1) == 0 && node->type == CONCAT && node->left && node->left->type == 0 && so->nodes[node->left->node_idx].type == OP_OPEN_SUBEXP ? depth + 1 : 0; node->right = optimize_subexps (so, node->right, sidx, new_depth); if (node->type != CONCAT) return node; if ((depth & 1) == 0 && node->left && node->left->type == 0 && so->nodes[idx = node->left->node_idx].type == OP_OPEN_SUBEXP) ret = node->right; else if ((depth & 1) && node->right && node->right->type == 0 && so->nodes[idx = node->right->node_idx].type == OP_CLOSE_SUBEXP) ret = node->left; else return node; if (so->nodes[idx].opr.idx < 8 * sizeof (so->dfa->used_bkref_map) && so->dfa->used_bkref_map & (1 << so->nodes[idx].opr.idx)) return node; if (!so->no_sub) { int i; if (depth < 2) return node; if (so->dfa->subexp_map == NULL) { so->dfa->subexp_map = re_malloc (int, so->re_nsub); if (so->dfa->subexp_map == NULL) return node; for (i = 0; i < so->re_nsub; i++) so->dfa->subexp_map[i] = i; } i = so->nodes[idx].opr.idx; assert (sidx < i); so->dfa->subexp_map[i] = sidx; } so->nodes[idx].type = OP_DELETED_SUBEXP; ret->parent = node->parent; return ret; } /* Analyze the structure tree, and calculate "first", "next", "edest", "eclosure", and "inveclosure". */ static reg_errcode_t analyze (dfa) re_dfa_t *dfa; { int i; reg_errcode_t ret; /* Allocate arrays. */ dfa->nexts = re_malloc (int, dfa->nodes_alloc); dfa->org_indices = re_malloc (int, dfa->nodes_alloc); dfa->edests = re_malloc (re_node_set, dfa->nodes_alloc); dfa->eclosures = re_malloc (re_node_set, dfa->nodes_alloc); dfa->inveclosures = re_malloc (re_node_set, dfa->nodes_alloc); if (BE (dfa->nexts == NULL || dfa->org_indices == NULL || dfa->edests == NULL || dfa->eclosures == NULL || dfa->inveclosures == NULL, 0)) return REG_ESPACE; /* Initialize them. */ for (i = 0; i < dfa->nodes_len; ++i) { dfa->nexts[i] = -1; re_node_set_init_empty (dfa->edests + i); re_node_set_init_empty (dfa->eclosures + i); re_node_set_init_empty (dfa->inveclosures + i); } ret = analyze_tree (dfa, dfa->str_tree); if (BE (ret == REG_NOERROR, 1)) { ret = calc_eclosure (dfa); if (ret == REG_NOERROR) calc_inveclosure (dfa); } return ret; } /* Helper functions for analyze. This function calculate "first", "next", and "edest" for the subtree whose root is NODE. */ static reg_errcode_t analyze_tree (dfa, node) re_dfa_t *dfa; bin_tree_t *node; { reg_errcode_t ret; if (node->first == -1) calc_first (dfa, node); if (node->next == -1) calc_next (dfa, node); calc_epsdest (dfa, node); /* Calculate "first" etc. for the left child. */ if (node->left != NULL) { ret = analyze_tree (dfa, node->left); if (BE (ret != REG_NOERROR, 0)) return ret; } /* Calculate "first" etc. for the right child. */ if (node->right != NULL) { ret = analyze_tree (dfa, node->right); if (BE (ret != REG_NOERROR, 0)) return ret; } return REG_NOERROR; } /* Calculate "first" for the node NODE. */ static void calc_first (dfa, node) re_dfa_t *dfa; bin_tree_t *node; { int idx, type; idx = node->node_idx; type = (node->type == 0) ? dfa->nodes[idx].type : node->type; switch (type) { #ifdef DEBUG case OP_OPEN_BRACKET: case OP_CLOSE_BRACKET: case OP_OPEN_DUP_NUM: case OP_CLOSE_DUP_NUM: case OP_DUP_PLUS: case OP_NON_MATCH_LIST: case OP_OPEN_COLL_ELEM: case OP_CLOSE_COLL_ELEM: case OP_OPEN_EQUIV_CLASS: case OP_CLOSE_EQUIV_CLASS: case OP_OPEN_CHAR_CLASS: case OP_CLOSE_CHAR_CLASS: /* These must not appear here. */ assert (0); #endif case END_OF_RE: case CHARACTER: case OP_PERIOD: case OP_DUP_ASTERISK: case OP_DUP_QUESTION: #ifdef RE_ENABLE_I18N case OP_UTF8_PERIOD: case COMPLEX_BRACKET: #endif /* RE_ENABLE_I18N */ case SIMPLE_BRACKET: case OP_BACK_REF: case ANCHOR: case OP_OPEN_SUBEXP: case OP_CLOSE_SUBEXP: node->first = idx; break; case OP_ALT: node->first = idx; break; /* else fall through */ default: #ifdef DEBUG assert (node->left != NULL); #endif if (node->left->first == -1) calc_first (dfa, node->left); node->first = node->left->first; break; } } /* Calculate "next" for the node NODE. */ static void calc_next (dfa, node) re_dfa_t *dfa; bin_tree_t *node; { int idx, type; bin_tree_t *parent = node->parent; if (parent == NULL) { node->next = -1; idx = node->node_idx; if (node->type == 0) dfa->nexts[idx] = node->next; return; } idx = parent->node_idx; type = (parent->type == 0) ? dfa->nodes[idx].type : parent->type; switch (type) { case OP_DUP_ASTERISK: node->next = idx; break; case CONCAT: if (parent->left == node) { if (parent->right->first == -1) calc_first (dfa, parent->right); node->next = parent->right->first; break; } /* else fall through */ default: if (parent->next == -1) calc_next (dfa, parent); node->next = parent->next; break; } idx = node->node_idx; if (node->type == 0) dfa->nexts[idx] = node->next; } /* Calculate "edest" for the node NODE. */ static void calc_epsdest (dfa, node) re_dfa_t *dfa; bin_tree_t *node; { int idx; idx = node->node_idx; if (node->type == 0) { if (dfa->nodes[idx].type == OP_DUP_ASTERISK || dfa->nodes[idx].type == OP_DUP_QUESTION) { if (node->left->first == -1) calc_first (dfa, node->left); if (node->next == -1) calc_next (dfa, node); re_node_set_init_2 (dfa->edests + idx, node->left->first, node->next); } else if (dfa->nodes[idx].type == OP_ALT) { int left, right; if (node->left != NULL) { if (node->left->first == -1) calc_first (dfa, node->left); left = node->left->first; } else { if (node->next == -1) calc_next (dfa, node); left = node->next; } if (node->right != NULL) { if (node->right->first == -1) calc_first (dfa, node->right); right = node->right->first; } else { if (node->next == -1) calc_next (dfa, node); right = node->next; } re_node_set_init_2 (dfa->edests + idx, left, right); } else if (dfa->nodes[idx].type == ANCHOR || dfa->nodes[idx].type == OP_OPEN_SUBEXP || dfa->nodes[idx].type == OP_CLOSE_SUBEXP || dfa->nodes[idx].type == OP_BACK_REF) re_node_set_init_1 (dfa->edests + idx, node->next); else assert (!IS_EPSILON_NODE (dfa->nodes[idx].type)); } } /* Duplicate the epsilon closure of the node ROOT_NODE. Note that duplicated nodes have constraint INIT_CONSTRAINT in addition to their own constraint. */ static reg_errcode_t duplicate_node_closure (dfa, top_org_node, top_clone_node, root_node, init_constraint) re_dfa_t *dfa; int top_org_node, top_clone_node, root_node; unsigned int init_constraint; { reg_errcode_t err; int org_node, clone_node, ret; unsigned int constraint = init_constraint; for (org_node = top_org_node, clone_node = top_clone_node;;) { int org_dest, clone_dest; if (dfa->nodes[org_node].type == OP_BACK_REF) { /* If the back reference epsilon-transit, its destination must also have the constraint. Then duplicate the epsilon closure of the destination of the back reference, and store it in edests of the back reference. */ org_dest = dfa->nexts[org_node]; re_node_set_empty (dfa->edests + clone_node); err = duplicate_node (&clone_dest, dfa, org_dest, constraint); if (BE (err != REG_NOERROR, 0)) return err; dfa->nexts[clone_node] = dfa->nexts[org_node]; ret = re_node_set_insert (dfa->edests + clone_node, clone_dest); if (BE (ret < 0, 0)) return REG_ESPACE; } else if (dfa->edests[org_node].nelem == 0) { /* In case of the node can't epsilon-transit, don't duplicate the destination and store the original destination as the destination of the node. */ dfa->nexts[clone_node] = dfa->nexts[org_node]; break; } else if (dfa->edests[org_node].nelem == 1) { /* In case of the node can epsilon-transit, and it has only one destination. */ org_dest = dfa->edests[org_node].elems[0]; re_node_set_empty (dfa->edests + clone_node); if (dfa->nodes[org_node].type == ANCHOR) { /* In case of the node has another constraint, append it. */ if (org_node == root_node && clone_node != org_node) { /* ...but if the node is root_node itself, it means the epsilon closure have a loop, then tie it to the destination of the root_node. */ ret = re_node_set_insert (dfa->edests + clone_node, org_dest); if (BE (ret < 0, 0)) return REG_ESPACE; break; } constraint |= dfa->nodes[org_node].opr.ctx_type; } err = duplicate_node (&clone_dest, dfa, org_dest, constraint); if (BE (err != REG_NOERROR, 0)) return err; ret = re_node_set_insert (dfa->edests + clone_node, clone_dest); if (BE (ret < 0, 0)) return REG_ESPACE; } else /* dfa->edests[org_node].nelem == 2 */ { /* In case of the node can epsilon-transit, and it has two destinations. E.g. '|', '*', '+', '?'. */ org_dest = dfa->edests[org_node].elems[0]; re_node_set_empty (dfa->edests + clone_node); /* Search for a duplicated node which satisfies the constraint. */ clone_dest = search_duplicated_node (dfa, org_dest, constraint); if (clone_dest == -1) { /* There are no such a duplicated node, create a new one. */ err = duplicate_node (&clone_dest, dfa, org_dest, constraint); if (BE (err != REG_NOERROR, 0)) return err; ret = re_node_set_insert (dfa->edests + clone_node, clone_dest); if (BE (ret < 0, 0)) return REG_ESPACE; err = duplicate_node_closure (dfa, org_dest, clone_dest, root_node, constraint); if (BE (err != REG_NOERROR, 0)) return err; } else { /* There are a duplicated node which satisfy the constraint, use it to avoid infinite loop. */ ret = re_node_set_insert (dfa->edests + clone_node, clone_dest); if (BE (ret < 0, 0)) return REG_ESPACE; } org_dest = dfa->edests[org_node].elems[1]; err = duplicate_node (&clone_dest, dfa, org_dest, constraint); if (BE (err != REG_NOERROR, 0)) return err; ret = re_node_set_insert (dfa->edests + clone_node, clone_dest); if (BE (ret < 0, 0)) return REG_ESPACE; } org_node = org_dest; clone_node = clone_dest; } return REG_NOERROR; } /* Search for a node which is duplicated from the node ORG_NODE, and satisfies the constraint CONSTRAINT. */ static int search_duplicated_node (dfa, org_node, constraint) re_dfa_t *dfa; int org_node; unsigned int constraint; { int idx; for (idx = dfa->nodes_len - 1; dfa->nodes[idx].duplicated && idx > 0; --idx) { if (org_node == dfa->org_indices[idx] && constraint == dfa->nodes[idx].constraint) return idx; /* Found. */ } return -1; /* Not found. */ } /* Duplicate the node whose index is ORG_IDX and set the constraint CONSTRAINT. The new index will be stored in NEW_IDX and return REG_NOERROR if succeeded, otherwise return the error code. */ static reg_errcode_t duplicate_node (new_idx, dfa, org_idx, constraint) re_dfa_t *dfa; int *new_idx, org_idx; unsigned int constraint; { int dup_idx = re_dfa_add_node (dfa, dfa->nodes[org_idx], 1); if (BE (dup_idx == -1, 0)) return REG_ESPACE; dfa->nodes[dup_idx].constraint = constraint; if (dfa->nodes[org_idx].type == ANCHOR) dfa->nodes[dup_idx].constraint |= dfa->nodes[org_idx].opr.ctx_type; dfa->nodes[dup_idx].duplicated = 1; re_node_set_init_empty (dfa->edests + dup_idx); re_node_set_init_empty (dfa->eclosures + dup_idx); re_node_set_init_empty (dfa->inveclosures + dup_idx); /* Store the index of the original node. */ dfa->org_indices[dup_idx] = org_idx; *new_idx = dup_idx; return REG_NOERROR; } static void calc_inveclosure (dfa) re_dfa_t *dfa; { int src, idx, dest; for (src = 0; src < dfa->nodes_len; ++src) { if (dfa->nodes[src].type == OP_DELETED_SUBEXP) continue; for (idx = 0; idx < dfa->eclosures[src].nelem; ++idx) { dest = dfa->eclosures[src].elems[idx]; re_node_set_insert_last (dfa->inveclosures + dest, src); } } } /* Calculate "eclosure" for all the node in DFA. */ static reg_errcode_t calc_eclosure (dfa) re_dfa_t *dfa; { int node_idx, incomplete; #ifdef DEBUG assert (dfa->nodes_len > 0); #endif incomplete = 0; /* For each nodes, calculate epsilon closure. */ for (node_idx = 0; ; ++node_idx) { reg_errcode_t err; re_node_set eclosure_elem; if (node_idx == dfa->nodes_len) { if (!incomplete) break; incomplete = 0; node_idx = 0; } #ifdef DEBUG assert (dfa->eclosures[node_idx].nelem != -1); #endif if (dfa->nodes[node_idx].type == OP_DELETED_SUBEXP) continue; /* If we have already calculated, skip it. */ if (dfa->eclosures[node_idx].nelem != 0) continue; /* Calculate epsilon closure of `node_idx'. */ err = calc_eclosure_iter (&eclosure_elem, dfa, node_idx, 1); if (BE (err != REG_NOERROR, 0)) return err; if (dfa->eclosures[node_idx].nelem == 0) { incomplete = 1; re_node_set_free (&eclosure_elem); } } return REG_NOERROR; } /* Calculate epsilon closure of NODE. */ static reg_errcode_t calc_eclosure_iter (new_set, dfa, node, root) re_node_set *new_set; re_dfa_t *dfa; int node, root; { reg_errcode_t err; unsigned int constraint; int i, incomplete; re_node_set eclosure; incomplete = 0; err = re_node_set_alloc (&eclosure, dfa->edests[node].nelem + 1); if (BE (err != REG_NOERROR, 0)) return err; /* This indicates that we are calculating this node now. We reference this value to avoid infinite loop. */ dfa->eclosures[node].nelem = -1; constraint = ((dfa->nodes[node].type == ANCHOR) ? dfa->nodes[node].opr.ctx_type : 0); /* If the current node has constraints, duplicate all nodes. Since they must inherit the constraints. */ if (constraint && dfa->edests[node].nelem && !dfa->nodes[dfa->edests[node].elems[0]].duplicated) { int org_node, cur_node; org_node = cur_node = node; err = duplicate_node_closure (dfa, node, node, node, constraint); if (BE (err != REG_NOERROR, 0)) return err; } /* Expand each epsilon destination nodes. */ if (IS_EPSILON_NODE(dfa->nodes[node].type)) for (i = 0; i < dfa->edests[node].nelem; ++i) { re_node_set eclosure_elem; int edest = dfa->edests[node].elems[i]; /* If calculating the epsilon closure of `edest' is in progress, return intermediate result. */ if (dfa->eclosures[edest].nelem == -1) { incomplete = 1; continue; } /* If we haven't calculated the epsilon closure of `edest' yet, calculate now. Otherwise use calculated epsilon closure. */ if (dfa->eclosures[edest].nelem == 0) { err = calc_eclosure_iter (&eclosure_elem, dfa, edest, 0); if (BE (err != REG_NOERROR, 0)) return err; } else eclosure_elem = dfa->eclosures[edest]; /* Merge the epsilon closure of `edest'. */ re_node_set_merge (&eclosure, &eclosure_elem); /* If the epsilon closure of `edest' is incomplete, the epsilon closure of this node is also incomplete. */ if (dfa->eclosures[edest].nelem == 0) { incomplete = 1; re_node_set_free (&eclosure_elem); } } /* Epsilon closures include itself. */ re_node_set_insert (&eclosure, node); if (incomplete && !root) dfa->eclosures[node].nelem = 0; else dfa->eclosures[node] = eclosure; *new_set = eclosure; return REG_NOERROR; } /* Functions for token which are used in the parser. */ /* Fetch a token from INPUT. We must not use this function inside bracket expressions. */ static void fetch_token (result, input, syntax) re_token_t *result; re_string_t *input; reg_syntax_t syntax; { re_string_skip_bytes (input, peek_token (result, input, syntax)); } /* Peek a token from INPUT, and return the length of the token. We must not use this function inside bracket expressions. */ static int peek_token (token, input, syntax) re_token_t *token; re_string_t *input; reg_syntax_t syntax; { unsigned char c; if (re_string_eoi (input)) { token->type = END_OF_RE; return 0; } c = re_string_peek_byte (input, 0); token->opr.c = c; token->word_char = 0; #ifdef RE_ENABLE_I18N token->mb_partial = 0; if (input->mb_cur_max > 1 && !re_string_first_byte (input, re_string_cur_idx (input))) { token->type = CHARACTER; token->mb_partial = 1; return 1; } #endif if (c == '\\') { unsigned char c2; if (re_string_cur_idx (input) + 1 >= re_string_length (input)) { token->type = BACK_SLASH; return 1; } c2 = re_string_peek_byte_case (input, 1); token->opr.c = c2; token->type = CHARACTER; #ifdef RE_ENABLE_I18N if (input->mb_cur_max > 1) { wint_t wc = re_string_wchar_at (input, re_string_cur_idx (input) + 1); token->word_char = IS_WIDE_WORD_CHAR (wc) != 0; } else #endif token->word_char = IS_WORD_CHAR (c2) != 0; switch (c2) { case '|': if (!(syntax & RE_LIMITED_OPS) && !(syntax & RE_NO_BK_VBAR)) token->type = OP_ALT; break; case '1': case '2': case '3': case '4': case '5': case '6': case '7': case '8': case '9': if (!(syntax & RE_NO_BK_REFS)) { token->type = OP_BACK_REF; token->opr.idx = c2 - '1'; } break; case '<': if (!(syntax & RE_NO_GNU_OPS)) { token->type = ANCHOR; token->opr.ctx_type = WORD_FIRST; } break; case '>': if (!(syntax & RE_NO_GNU_OPS)) { token->type = ANCHOR; token->opr.ctx_type = WORD_LAST; } break; case 'b': if (!(syntax & RE_NO_GNU_OPS)) { token->type = ANCHOR; token->opr.ctx_type = WORD_DELIM; } break; case 'B': if (!(syntax & RE_NO_GNU_OPS)) { token->type = ANCHOR; token->opr.ctx_type = NOT_WORD_DELIM; } break; case 'w': if (!(syntax & RE_NO_GNU_OPS)) token->type = OP_WORD; break; case 'W': if (!(syntax & RE_NO_GNU_OPS)) token->type = OP_NOTWORD; break; case 's': if (!(syntax & RE_NO_GNU_OPS)) token->type = OP_SPACE; break; case 'S': if (!(syntax & RE_NO_GNU_OPS)) token->type = OP_NOTSPACE; break; case '`': if (!(syntax & RE_NO_GNU_OPS)) { token->type = ANCHOR; token->opr.ctx_type = BUF_FIRST; } break; case '\'': if (!(syntax & RE_NO_GNU_OPS)) { token->type = ANCHOR; token->opr.ctx_type = BUF_LAST; } break; case '(': if (!(syntax & RE_NO_BK_PARENS)) token->type = OP_OPEN_SUBEXP; break; case ')': if (!(syntax & RE_NO_BK_PARENS)) token->type = OP_CLOSE_SUBEXP; break; case '+': if (!(syntax & RE_LIMITED_OPS) && (syntax & RE_BK_PLUS_QM)) token->type = OP_DUP_PLUS; break; case '?': if (!(syntax & RE_LIMITED_OPS) && (syntax & RE_BK_PLUS_QM)) token->type = OP_DUP_QUESTION; break; case '{': if ((syntax & RE_INTERVALS) && (!(syntax & RE_NO_BK_BRACES))) token->type = OP_OPEN_DUP_NUM; break; case '}': if ((syntax & RE_INTERVALS) && (!(syntax & RE_NO_BK_BRACES))) token->type = OP_CLOSE_DUP_NUM; break; default: break; } return 2; } token->type = CHARACTER; #ifdef RE_ENABLE_I18N if (input->mb_cur_max > 1) { wint_t wc = re_string_wchar_at (input, re_string_cur_idx (input)); token->word_char = IS_WIDE_WORD_CHAR (wc) != 0; } else #endif token->word_char = IS_WORD_CHAR (token->opr.c); switch (c) { case '\n': if (syntax & RE_NEWLINE_ALT) token->type = OP_ALT; break; case '|': if (!(syntax & RE_LIMITED_OPS) && (syntax & RE_NO_BK_VBAR)) token->type = OP_ALT; break; case '*': token->type = OP_DUP_ASTERISK; break; case '+': if (!(syntax & RE_LIMITED_OPS) && !(syntax & RE_BK_PLUS_QM)) token->type = OP_DUP_PLUS; break; case '?': if (!(syntax & RE_LIMITED_OPS) && !(syntax & RE_BK_PLUS_QM)) token->type = OP_DUP_QUESTION; break; case '{': if ((syntax & RE_INTERVALS) && (syntax & RE_NO_BK_BRACES)) token->type = OP_OPEN_DUP_NUM; break; case '}': if ((syntax & RE_INTERVALS) && (syntax & RE_NO_BK_BRACES)) token->type = OP_CLOSE_DUP_NUM; break; case '(': if (syntax & RE_NO_BK_PARENS) token->type = OP_OPEN_SUBEXP; break; case ')': if (syntax & RE_NO_BK_PARENS) token->type = OP_CLOSE_SUBEXP; break; case '[': token->type = OP_OPEN_BRACKET; break; case '.': token->type = OP_PERIOD; break; case '^': if (!(syntax & (RE_CONTEXT_INDEP_ANCHORS | RE_CARET_ANCHORS_HERE)) && re_string_cur_idx (input) != 0) { char prev = re_string_peek_byte (input, -1); if (!(syntax & RE_NEWLINE_ALT) || prev != '\n') break; } token->type = ANCHOR; token->opr.ctx_type = LINE_FIRST; break; case '$': if (!(syntax & RE_CONTEXT_INDEP_ANCHORS) && re_string_cur_idx (input) + 1 != re_string_length (input)) { re_token_t next; re_string_skip_bytes (input, 1); peek_token (&next, input, syntax); re_string_skip_bytes (input, -1); if (next.type != OP_ALT && next.type != OP_CLOSE_SUBEXP) break; } token->type = ANCHOR; token->opr.ctx_type = LINE_LAST; break; default: break; } return 1; } /* Peek a token from INPUT, and return the length of the token. We must not use this function out of bracket expressions. */ static int peek_token_bracket (token, input, syntax) re_token_t *token; re_string_t *input; reg_syntax_t syntax; { unsigned char c; if (re_string_eoi (input)) { token->type = END_OF_RE; return 0; } c = re_string_peek_byte (input, 0); token->opr.c = c; #ifdef RE_ENABLE_I18N if (input->mb_cur_max > 1 && !re_string_first_byte (input, re_string_cur_idx (input))) { token->type = CHARACTER; return 1; } #endif /* RE_ENABLE_I18N */ if (c == '\\' && (syntax & RE_BACKSLASH_ESCAPE_IN_LISTS) && re_string_cur_idx (input) + 1 < re_string_length (input)) { /* In this case, '\' escape a character. */ unsigned char c2; re_string_skip_bytes (input, 1); c2 = re_string_peek_byte (input, 0); token->opr.c = c2; token->type = CHARACTER; return 1; } if (c == '[') /* '[' is a special char in a bracket exps. */ { unsigned char c2; int token_len; if (re_string_cur_idx (input) + 1 < re_string_length (input)) c2 = re_string_peek_byte (input, 1); else c2 = 0; token->opr.c = c2; token_len = 2; switch (c2) { case '.': token->type = OP_OPEN_COLL_ELEM; break; case '=': token->type = OP_OPEN_EQUIV_CLASS; break; case ':': if (syntax & RE_CHAR_CLASSES) { token->type = OP_OPEN_CHAR_CLASS; break; } /* else fall through. */ default: token->type = CHARACTER; token->opr.c = c; token_len = 1; break; } return token_len; } switch (c) { case '-': token->type = OP_CHARSET_RANGE; break; case ']': token->type = OP_CLOSE_BRACKET; break; case '^': token->type = OP_NON_MATCH_LIST; break; default: token->type = CHARACTER; } return 1; } /* Functions for parser. */ /* Entry point of the parser. Parse the regular expression REGEXP and return the structure tree. If an error is occured, ERR is set by error code, and return NULL. This function build the following tree, from regular expression : CAT / \ / \ EOR CAT means concatenation. EOR means end of regular expression. */ static bin_tree_t * parse (regexp, preg, syntax, err) re_string_t *regexp; regex_t *preg; reg_syntax_t syntax; reg_errcode_t *err; { re_dfa_t *dfa = (re_dfa_t *) preg->buffer; bin_tree_t *tree, *eor, *root; re_token_t current_token; dfa->syntax = syntax; fetch_token (¤t_token, regexp, syntax | RE_CARET_ANCHORS_HERE); tree = parse_reg_exp (regexp, preg, ¤t_token, syntax, 0, err); if (BE (*err != REG_NOERROR && tree == NULL, 0)) return NULL; eor = re_dfa_add_tree_node (dfa, NULL, NULL, ¤t_token); if (tree != NULL) root = create_tree (dfa, tree, eor, CONCAT, 0); else root = eor; if (BE (eor == NULL || root == NULL, 0)) { *err = REG_ESPACE; return NULL; } return root; } /* This function build the following tree, from regular expression |: ALT / \ / \ ALT means alternative, which represents the operator `|'. */ static bin_tree_t * parse_reg_exp (regexp, preg, token, syntax, nest, err) re_string_t *regexp; regex_t *preg; re_token_t *token; reg_syntax_t syntax; int nest; reg_errcode_t *err; { re_dfa_t *dfa = (re_dfa_t *) preg->buffer; bin_tree_t *tree, *branch = NULL; tree = parse_branch (regexp, preg, token, syntax, nest, err); if (BE (*err != REG_NOERROR && tree == NULL, 0)) return NULL; while (token->type == OP_ALT) { re_token_t alt_token = *token; fetch_token (token, regexp, syntax | RE_CARET_ANCHORS_HERE); if (token->type != OP_ALT && token->type != END_OF_RE && (nest == 0 || token->type != OP_CLOSE_SUBEXP)) { branch = parse_branch (regexp, preg, token, syntax, nest, err); if (BE (*err != REG_NOERROR && branch == NULL, 0)) return NULL; } else branch = NULL; tree = re_dfa_add_tree_node (dfa, tree, branch, &alt_token); if (BE (tree == NULL, 0)) { *err = REG_ESPACE; return NULL; } dfa->has_plural_match = 1; } return tree; } /* This function build the following tree, from regular expression : CAT / \ / \ CAT means concatenation. */ static bin_tree_t * parse_branch (regexp, preg, token, syntax, nest, err) re_string_t *regexp; regex_t *preg; re_token_t *token; reg_syntax_t syntax; int nest; reg_errcode_t *err; { bin_tree_t *tree, *exp; re_dfa_t *dfa = (re_dfa_t *) preg->buffer; tree = parse_expression (regexp, preg, token, syntax, nest, err); if (BE (*err != REG_NOERROR && tree == NULL, 0)) return NULL; while (token->type != OP_ALT && token->type != END_OF_RE && (nest == 0 || token->type != OP_CLOSE_SUBEXP)) { exp = parse_expression (regexp, preg, token, syntax, nest, err); if (BE (*err != REG_NOERROR && exp == NULL, 0)) { return NULL; } if (tree != NULL && exp != NULL) { tree = create_tree (dfa, tree, exp, CONCAT, 0); if (tree == NULL) { *err = REG_ESPACE; return NULL; } } else if (tree == NULL) tree = exp; /* Otherwise exp == NULL, we don't need to create new tree. */ } return tree; } /* This function build the following tree, from regular expression a*: * | a */ static bin_tree_t * parse_expression (regexp, preg, token, syntax, nest, err) re_string_t *regexp; regex_t *preg; re_token_t *token; reg_syntax_t syntax; int nest; reg_errcode_t *err; { re_dfa_t *dfa = (re_dfa_t *) preg->buffer; bin_tree_t *tree; switch (token->type) { case CHARACTER: tree = re_dfa_add_tree_node (dfa, NULL, NULL, token); if (BE (tree == NULL, 0)) { *err = REG_ESPACE; return NULL; } #ifdef RE_ENABLE_I18N if (dfa->mb_cur_max > 1) { while (!re_string_eoi (regexp) && !re_string_first_byte (regexp, re_string_cur_idx (regexp))) { bin_tree_t *mbc_remain; fetch_token (token, regexp, syntax); mbc_remain = re_dfa_add_tree_node (dfa, NULL, NULL, token); tree = create_tree (dfa, tree, mbc_remain, CONCAT, 0); if (BE (mbc_remain == NULL || tree == NULL, 0)) { *err = REG_ESPACE; return NULL; } } } #endif break; case OP_OPEN_SUBEXP: tree = parse_sub_exp (regexp, preg, token, syntax, nest + 1, err); if (BE (*err != REG_NOERROR && tree == NULL, 0)) return NULL; break; case OP_OPEN_BRACKET: tree = parse_bracket_exp (regexp, dfa, token, syntax, err); if (BE (*err != REG_NOERROR && tree == NULL, 0)) return NULL; break; case OP_BACK_REF: if (!BE (dfa->completed_bkref_map & (1 << token->opr.idx), 1)) { *err = REG_ESUBREG; return NULL; } dfa->used_bkref_map |= 1 << token->opr.idx; tree = re_dfa_add_tree_node (dfa, NULL, NULL, token); if (BE (tree == NULL, 0)) { *err = REG_ESPACE; return NULL; } ++dfa->nbackref; dfa->has_mb_node = 1; break; case OP_OPEN_DUP_NUM: if (syntax & RE_CONTEXT_INVALID_DUP) { *err = REG_BADRPT; return NULL; } /* FALLTHROUGH */ case OP_DUP_ASTERISK: case OP_DUP_PLUS: case OP_DUP_QUESTION: if (syntax & RE_CONTEXT_INVALID_OPS) { *err = REG_BADRPT; return NULL; } else if (syntax & RE_CONTEXT_INDEP_OPS) { fetch_token (token, regexp, syntax); return parse_expression (regexp, preg, token, syntax, nest, err); } /* else fall through */ case OP_CLOSE_SUBEXP: if ((token->type == OP_CLOSE_SUBEXP) && !(syntax & RE_UNMATCHED_RIGHT_PAREN_ORD)) { *err = REG_ERPAREN; return NULL; } /* else fall through */ case OP_CLOSE_DUP_NUM: /* We treat it as a normal character. */ /* Then we can these characters as normal characters. */ token->type = CHARACTER; /* mb_partial and word_char bits should be initialized already by peek_token. */ tree = re_dfa_add_tree_node (dfa, NULL, NULL, token); if (BE (tree == NULL, 0)) { *err = REG_ESPACE; return NULL; } break; case ANCHOR: if ((token->opr.ctx_type & (WORD_DELIM | NOT_WORD_DELIM | WORD_FIRST | WORD_LAST)) && dfa->word_ops_used == 0) init_word_char (dfa); if (token->opr.ctx_type == WORD_DELIM || token->opr.ctx_type == NOT_WORD_DELIM) { bin_tree_t *tree_first, *tree_last; if (token->opr.ctx_type == WORD_DELIM) { token->opr.ctx_type = WORD_FIRST; tree_first = re_dfa_add_tree_node (dfa, NULL, NULL, token); token->opr.ctx_type = WORD_LAST; } else { token->opr.ctx_type = INSIDE_WORD; tree_first = re_dfa_add_tree_node (dfa, NULL, NULL, token); token->opr.ctx_type = INSIDE_NOTWORD; } tree_last = re_dfa_add_tree_node (dfa, NULL, NULL, token); token->type = OP_ALT; tree = re_dfa_add_tree_node (dfa, tree_first, tree_last, token); if (BE (tree_first == NULL || tree_last == NULL || tree == NULL, 0)) { *err = REG_ESPACE; return NULL; } } else { tree = re_dfa_add_tree_node (dfa, NULL, NULL, token); if (BE (tree == NULL, 0)) { *err = REG_ESPACE; return NULL; } } /* We must return here, since ANCHORs can't be followed by repetition operators. eg. RE"^*" is invalid or "", it must not be "". */ fetch_token (token, regexp, syntax); return tree; case OP_PERIOD: tree = re_dfa_add_tree_node (dfa, NULL, NULL, token); if (BE (tree == NULL, 0)) { *err = REG_ESPACE; return NULL; } if (dfa->mb_cur_max > 1) dfa->has_mb_node = 1; break; case OP_WORD: case OP_NOTWORD: tree = build_charclass_op (dfa, regexp->trans, (const unsigned char *) "alnum", (const unsigned char *) "_", token->type == OP_NOTWORD, err); if (BE (*err != REG_NOERROR && tree == NULL, 0)) return NULL; break; case OP_SPACE: case OP_NOTSPACE: tree = build_charclass_op (dfa, regexp->trans, (const unsigned char *) "space", (const unsigned char *) "", token->type == OP_NOTSPACE, err); if (BE (*err != REG_NOERROR && tree == NULL, 0)) return NULL; break; case OP_ALT: case END_OF_RE: return NULL; case BACK_SLASH: *err = REG_EESCAPE; return NULL; default: /* Must not happen? */ #ifdef DEBUG assert (0); #endif return NULL; } fetch_token (token, regexp, syntax); while (token->type == OP_DUP_ASTERISK || token->type == OP_DUP_PLUS || token->type == OP_DUP_QUESTION || token->type == OP_OPEN_DUP_NUM) { tree = parse_dup_op (tree, regexp, dfa, token, syntax, err); if (BE (*err != REG_NOERROR && tree == NULL, 0)) return NULL; /* In BRE consecutive duplications are not allowed. */ if ((syntax & RE_CONTEXT_INVALID_DUP) && (token->type == OP_DUP_ASTERISK || token->type == OP_OPEN_DUP_NUM)) { *err = REG_BADRPT; return NULL; } dfa->has_plural_match = 1; } return tree; } /* This function build the following tree, from regular expression (): SUBEXP | */ static bin_tree_t * parse_sub_exp (regexp, preg, token, syntax, nest, err) re_string_t *regexp; regex_t *preg; re_token_t *token; reg_syntax_t syntax; int nest; reg_errcode_t *err; { re_dfa_t *dfa = (re_dfa_t *) preg->buffer; bin_tree_t *tree, *left_par, *right_par; size_t cur_nsub; cur_nsub = preg->re_nsub++; left_par = re_dfa_add_tree_node (dfa, NULL, NULL, token); if (BE (left_par == NULL, 0)) { *err = REG_ESPACE; return NULL; } dfa->nodes[left_par->node_idx].opr.idx = cur_nsub; fetch_token (token, regexp, syntax | RE_CARET_ANCHORS_HERE); /* The subexpression may be a null string. */ if (token->type == OP_CLOSE_SUBEXP) tree = NULL; else { tree = parse_reg_exp (regexp, preg, token, syntax, nest, err); if (BE (*err != REG_NOERROR && tree == NULL, 0)) return NULL; } if (BE (token->type != OP_CLOSE_SUBEXP, 0)) { *err = REG_EPAREN; return NULL; } right_par = re_dfa_add_tree_node (dfa, NULL, NULL, token); dfa->completed_bkref_map |= 1 << cur_nsub; tree = ((tree == NULL) ? right_par : create_tree (dfa, tree, right_par, CONCAT, 0)); tree = create_tree (dfa, left_par, tree, CONCAT, 0); if (BE (right_par == NULL || tree == NULL, 0)) { *err = REG_ESPACE; return NULL; } dfa->nodes[right_par->node_idx].opr.idx = cur_nsub; return tree; } /* This function parse repetition operators like "*", "+", "{1,3}" etc. */ static bin_tree_t * parse_dup_op (elem, regexp, dfa, token, syntax, err) bin_tree_t *elem; re_string_t *regexp; re_dfa_t *dfa; re_token_t *token; reg_syntax_t syntax; reg_errcode_t *err; { re_token_t dup_token; bin_tree_t *tree = NULL, *old_tree = NULL; int i, start, end, start_idx = re_string_cur_idx (regexp); re_token_t start_token = *token; if (token->type == OP_OPEN_DUP_NUM) { end = 0; start = fetch_number (regexp, token, syntax); if (start == -1) { if (token->type == CHARACTER && token->opr.c == ',') start = 0; /* We treat "{,m}" as "{0,m}". */ else { *err = REG_BADBR; /* {} is invalid. */ return NULL; } } if (BE (start != -2, 1)) { /* We treat "{n}" as "{n,n}". */ end = ((token->type == OP_CLOSE_DUP_NUM) ? start : ((token->type == CHARACTER && token->opr.c == ',') ? fetch_number (regexp, token, syntax) : -2)); } if (BE (start == -2 || end == -2, 0)) { /* Invalid sequence. */ if (BE (!(syntax & RE_INVALID_INTERVAL_ORD), 0)) { if (token->type == END_OF_RE) *err = REG_EBRACE; else *err = REG_BADBR; return NULL; } /* If the syntax bit is set, rollback. */ re_string_set_index (regexp, start_idx); *token = start_token; token->type = CHARACTER; /* mb_partial and word_char bits should be already initialized by peek_token. */ return elem; } if (BE (end != -1 && start > end, 0)) { /* First number greater than second. */ *err = REG_BADBR; return NULL; } } else { start = (token->type == OP_DUP_PLUS) ? 1 : 0; end = (token->type == OP_DUP_QUESTION) ? 1 : -1; } fetch_token (token, regexp, syntax); /* Treat "{0}*" etc. as "{0}". */ if (BE (elem == NULL || (start == 0 && end == 0), 0)) return NULL; /* Extract "{n,m}" to "...{0,}". */ if (BE (start > 0, 0)) { tree = elem; for (i = 2; i <= start; ++i) { elem = duplicate_tree (elem, dfa); tree = create_tree (dfa, tree, elem, CONCAT, 0); if (BE (elem == NULL || tree == NULL, 0)) goto parse_dup_op_espace; } if (start == end) return tree; /* Duplicate ELEM before it is marked optional. */ elem = duplicate_tree (elem, dfa); old_tree = tree; } else old_tree = NULL; mark_opt_subexp (elem, dfa); dup_token.type = (end == -1 ? OP_DUP_ASTERISK : OP_DUP_QUESTION); tree = re_dfa_add_tree_node (dfa, elem, NULL, &dup_token); if (BE (tree == NULL, 0)) goto parse_dup_op_espace; /* This loop is actually executed only when end != -1, to rewrite {0,n} as ((...?)?)?... We have already created the start+1-th copy. */ for (i = start + 2; i <= end; ++i) { elem = duplicate_tree (elem, dfa); tree = create_tree (dfa, tree, elem, CONCAT, 0); if (BE (elem == NULL || tree == NULL, 0)) goto parse_dup_op_espace; tree = re_dfa_add_tree_node (dfa, tree, NULL, &dup_token); if (BE (tree == NULL, 0)) goto parse_dup_op_espace; } if (old_tree) tree = create_tree (dfa, old_tree, tree, CONCAT, 0); return tree; parse_dup_op_espace: *err = REG_ESPACE; return NULL; } /* Size of the names for collating symbol/equivalence_class/character_class. I'm not sure, but maybe enough. */ #define BRACKET_NAME_BUF_SIZE 32 #ifndef _LIBC /* Local function for parse_bracket_exp only used in case of NOT _LIBC. Build the range expression which starts from START_ELEM, and ends at END_ELEM. The result are written to MBCSET and SBCSET. RANGE_ALLOC is the allocated size of mbcset->range_starts, and mbcset->range_ends, is a pointer argument sinse we may update it. */ static reg_errcode_t # ifdef RE_ENABLE_I18N build_range_exp (sbcset, mbcset, range_alloc, start_elem, end_elem) re_charset_t *mbcset; int *range_alloc; # else /* not RE_ENABLE_I18N */ build_range_exp (sbcset, start_elem, end_elem) # endif /* not RE_ENABLE_I18N */ re_bitset_ptr_t sbcset; bracket_elem_t *start_elem, *end_elem; { unsigned int start_ch, end_ch; /* Equivalence Classes and Character Classes can't be a range start/end. */ if (BE (start_elem->type == EQUIV_CLASS || start_elem->type == CHAR_CLASS || end_elem->type == EQUIV_CLASS || end_elem->type == CHAR_CLASS, 0)) return REG_ERANGE; /* We can handle no multi character collating elements without libc support. */ if (BE ((start_elem->type == COLL_SYM && strlen ((char *) start_elem->opr.name) > 1) || (end_elem->type == COLL_SYM && strlen ((char *) end_elem->opr.name) > 1), 0)) return REG_ECOLLATE; # ifdef RE_ENABLE_I18N { wchar_t wc, start_wc, end_wc; wchar_t cmp_buf[6] = {L'\0', L'\0', L'\0', L'\0', L'\0', L'\0'}; start_ch = ((start_elem->type == SB_CHAR) ? start_elem->opr.ch : ((start_elem->type == COLL_SYM) ? start_elem->opr.name[0] : 0)); end_ch = ((end_elem->type == SB_CHAR) ? end_elem->opr.ch : ((end_elem->type == COLL_SYM) ? end_elem->opr.name[0] : 0)); start_wc = ((start_elem->type == SB_CHAR || start_elem->type == COLL_SYM) ? __btowc (start_ch) : start_elem->opr.wch); end_wc = ((end_elem->type == SB_CHAR || end_elem->type == COLL_SYM) ? __btowc (end_ch) : end_elem->opr.wch); if (start_wc == WEOF || end_wc == WEOF) return REG_ECOLLATE; cmp_buf[0] = start_wc; cmp_buf[4] = end_wc; if (wcscoll (cmp_buf, cmp_buf + 4) > 0) return REG_ERANGE; /* Got valid collation sequence values, add them as a new entry. However, for !_LIBC we have no collation elements: if the character set is single byte, the single byte character set that we build below suffices. parse_bracket_exp passes no MBCSET if dfa->mb_cur_max == 1. */ if (mbcset) { /* Check the space of the arrays. */ if (BE (*range_alloc == mbcset->nranges, 0)) { /* There is not enough space, need realloc. */ wchar_t *new_array_start, *new_array_end; int new_nranges; /* +1 in case of mbcset->nranges is 0. */ new_nranges = 2 * mbcset->nranges + 1; /* Use realloc since mbcset->range_starts and mbcset->range_ends are NULL if *range_alloc == 0. */ new_array_start = re_realloc (mbcset->range_starts, wchar_t, new_nranges); new_array_end = re_realloc (mbcset->range_ends, wchar_t, new_nranges); if (BE (new_array_start == NULL || new_array_end == NULL, 0)) return REG_ESPACE; mbcset->range_starts = new_array_start; mbcset->range_ends = new_array_end; *range_alloc = new_nranges; } mbcset->range_starts[mbcset->nranges] = start_wc; mbcset->range_ends[mbcset->nranges++] = end_wc; } /* Build the table for single byte characters. */ for (wc = 0; wc < SBC_MAX; ++wc) { cmp_buf[2] = wc; if (wcscoll (cmp_buf, cmp_buf + 2) <= 0 && wcscoll (cmp_buf + 2, cmp_buf + 4) <= 0) bitset_set (sbcset, wc); } } # else /* not RE_ENABLE_I18N */ { unsigned int ch; start_ch = ((start_elem->type == SB_CHAR ) ? start_elem->opr.ch : ((start_elem->type == COLL_SYM) ? start_elem->opr.name[0] : 0)); end_ch = ((end_elem->type == SB_CHAR ) ? end_elem->opr.ch : ((end_elem->type == COLL_SYM) ? end_elem->opr.name[0] : 0)); if (start_ch > end_ch) return REG_ERANGE; /* Build the table for single byte characters. */ for (ch = 0; ch < SBC_MAX; ++ch) if (start_ch <= ch && ch <= end_ch) bitset_set (sbcset, ch); } # endif /* not RE_ENABLE_I18N */ return REG_NOERROR; } #endif /* not _LIBC */ #ifndef _LIBC /* Helper function for parse_bracket_exp only used in case of NOT _LIBC.. Build the collating element which is represented by NAME. The result are written to MBCSET and SBCSET. COLL_SYM_ALLOC is the allocated size of mbcset->coll_sym, is a pointer argument since we may update it. */ static reg_errcode_t # ifdef RE_ENABLE_I18N build_collating_symbol (sbcset, mbcset, coll_sym_alloc, name) re_charset_t *mbcset; int *coll_sym_alloc; # else /* not RE_ENABLE_I18N */ build_collating_symbol (sbcset, name) # endif /* not RE_ENABLE_I18N */ re_bitset_ptr_t sbcset; const unsigned char *name; { size_t name_len = strlen ((const char *) name); if (BE (name_len != 1, 0)) return REG_ECOLLATE; else { bitset_set (sbcset, name[0]); return REG_NOERROR; } } #endif /* not _LIBC */ /* This function parse bracket expression like "[abc]", "[a-c]", "[[.a-a.]]" etc. */ static bin_tree_t * parse_bracket_exp (regexp, dfa, token, syntax, err) re_string_t *regexp; re_dfa_t *dfa; re_token_t *token; reg_syntax_t syntax; reg_errcode_t *err; { #ifdef _LIBC const unsigned char *collseqmb; const char *collseqwc; uint32_t nrules; int32_t table_size; const int32_t *symb_table; const unsigned char *extra; /* Local function for parse_bracket_exp used in _LIBC environement. Seek the collating symbol entry correspondings to NAME. Return the index of the symbol in the SYMB_TABLE. */ auto inline int32_t __attribute ((always_inline)) seek_collating_symbol_entry (name, name_len) const unsigned char *name; size_t name_len; { int32_t hash = elem_hash ((const char *) name, name_len); int32_t elem = hash % table_size; int32_t second = hash % (table_size - 2); while (symb_table[2 * elem] != 0) { /* First compare the hashing value. */ if (symb_table[2 * elem] == hash /* Compare the length of the name. */ && name_len == extra[symb_table[2 * elem + 1]] /* Compare the name. */ && memcmp (name, &extra[symb_table[2 * elem + 1] + 1], name_len) == 0) { /* Yep, this is the entry. */ break; } /* Next entry. */ elem += second; } return elem; } /* Local function for parse_bracket_exp used in _LIBC environement. Look up the collation sequence value of BR_ELEM. Return the value if succeeded, UINT_MAX otherwise. */ auto inline unsigned int __attribute ((always_inline)) lookup_collation_sequence_value (br_elem) bracket_elem_t *br_elem; { if (br_elem->type == SB_CHAR) { /* if (MB_CUR_MAX == 1) */ if (nrules == 0) return collseqmb[br_elem->opr.ch]; else { wint_t wc = __btowc (br_elem->opr.ch); return __collseq_table_lookup (collseqwc, wc); } } else if (br_elem->type == MB_CHAR) { return __collseq_table_lookup (collseqwc, br_elem->opr.wch); } else if (br_elem->type == COLL_SYM) { size_t sym_name_len = strlen ((char *) br_elem->opr.name); if (nrules != 0) { int32_t elem, idx; elem = seek_collating_symbol_entry (br_elem->opr.name, sym_name_len); if (symb_table[2 * elem] != 0) { /* We found the entry. */ idx = symb_table[2 * elem + 1]; /* Skip the name of collating element name. */ idx += 1 + extra[idx]; /* Skip the byte sequence of the collating element. */ idx += 1 + extra[idx]; /* Adjust for the alignment. */ idx = (idx + 3) & ~3; /* Skip the multibyte collation sequence value. */ idx += sizeof (unsigned int); /* Skip the wide char sequence of the collating element. */ idx += sizeof (unsigned int) * (1 + *(unsigned int *) (extra + idx)); /* Return the collation sequence value. */ return *(unsigned int *) (extra + idx); } else if (symb_table[2 * elem] == 0 && sym_name_len == 1) { /* No valid character. Match it as a single byte character. */ return collseqmb[br_elem->opr.name[0]]; } } else if (sym_name_len == 1) return collseqmb[br_elem->opr.name[0]]; } return UINT_MAX; } /* Local function for parse_bracket_exp used in _LIBC environement. Build the range expression which starts from START_ELEM, and ends at END_ELEM. The result are written to MBCSET and SBCSET. RANGE_ALLOC is the allocated size of mbcset->range_starts, and mbcset->range_ends, is a pointer argument sinse we may update it. */ auto inline reg_errcode_t __attribute ((always_inline)) build_range_exp (sbcset, mbcset, range_alloc, start_elem, end_elem) re_charset_t *mbcset; int *range_alloc; re_bitset_ptr_t sbcset; bracket_elem_t *start_elem, *end_elem; { unsigned int ch; uint32_t start_collseq; uint32_t end_collseq; /* Equivalence Classes and Character Classes can't be a range start/end. */ if (BE (start_elem->type == EQUIV_CLASS || start_elem->type == CHAR_CLASS || end_elem->type == EQUIV_CLASS || end_elem->type == CHAR_CLASS, 0)) return REG_ERANGE; start_collseq = lookup_collation_sequence_value (start_elem); end_collseq = lookup_collation_sequence_value (end_elem); /* Check start/end collation sequence values. */ if (BE (start_collseq == UINT_MAX || end_collseq == UINT_MAX, 0)) return REG_ECOLLATE; if (BE ((syntax & RE_NO_EMPTY_RANGES) && start_collseq > end_collseq, 0)) return REG_ERANGE; /* Got valid collation sequence values, add them as a new entry. However, if we have no collation elements, and the character set is single byte, the single byte character set that we build below suffices. */ if (nrules > 0 || dfa->mb_cur_max > 1) { /* Check the space of the arrays. */ if (BE (*range_alloc == mbcset->nranges, 0)) { /* There is not enough space, need realloc. */ uint32_t *new_array_start; uint32_t *new_array_end; int new_nranges; /* +1 in case of mbcset->nranges is 0. */ new_nranges = 2 * mbcset->nranges + 1; new_array_start = re_realloc (mbcset->range_starts, uint32_t, new_nranges); new_array_end = re_realloc (mbcset->range_ends, uint32_t, new_nranges); if (BE (new_array_start == NULL || new_array_end == NULL, 0)) return REG_ESPACE; mbcset->range_starts = new_array_start; mbcset->range_ends = new_array_end; *range_alloc = new_nranges; } mbcset->range_starts[mbcset->nranges] = start_collseq; mbcset->range_ends[mbcset->nranges++] = end_collseq; } /* Build the table for single byte characters. */ for (ch = 0; ch < SBC_MAX; ch++) { uint32_t ch_collseq; /* if (MB_CUR_MAX == 1) */ if (nrules == 0) ch_collseq = collseqmb[ch]; else ch_collseq = __collseq_table_lookup (collseqwc, __btowc (ch)); if (start_collseq <= ch_collseq && ch_collseq <= end_collseq) bitset_set (sbcset, ch); } return REG_NOERROR; } /* Local function for parse_bracket_exp used in _LIBC environement. Build the collating element which is represented by NAME. The result are written to MBCSET and SBCSET. COLL_SYM_ALLOC is the allocated size of mbcset->coll_sym, is a pointer argument sinse we may update it. */ auto inline reg_errcode_t __attribute ((always_inline)) build_collating_symbol (sbcset, mbcset, coll_sym_alloc, name) re_charset_t *mbcset; int *coll_sym_alloc; re_bitset_ptr_t sbcset; const unsigned char *name; { int32_t elem, idx; size_t name_len = strlen ((const char *) name); if (nrules != 0) { elem = seek_collating_symbol_entry (name, name_len); if (symb_table[2 * elem] != 0) { /* We found the entry. */ idx = symb_table[2 * elem + 1]; /* Skip the name of collating element name. */ idx += 1 + extra[idx]; } else if (symb_table[2 * elem] == 0 && name_len == 1) { /* No valid character, treat it as a normal character. */ bitset_set (sbcset, name[0]); return REG_NOERROR; } else return REG_ECOLLATE; /* Got valid collation sequence, add it as a new entry. */ /* Check the space of the arrays. */ if (BE (*coll_sym_alloc == mbcset->ncoll_syms, 0)) { /* Not enough, realloc it. */ /* +1 in case of mbcset->ncoll_syms is 0. */ int new_coll_sym_alloc = 2 * mbcset->ncoll_syms + 1; /* Use realloc since mbcset->coll_syms is NULL if *alloc == 0. */ int32_t *new_coll_syms = re_realloc (mbcset->coll_syms, int32_t, new_coll_sym_alloc); if (BE (new_coll_syms == NULL, 0)) return REG_ESPACE; mbcset->coll_syms = new_coll_syms; *coll_sym_alloc = new_coll_sym_alloc; } mbcset->coll_syms[mbcset->ncoll_syms++] = idx; return REG_NOERROR; } else { if (BE (name_len != 1, 0)) return REG_ECOLLATE; else { bitset_set (sbcset, name[0]); return REG_NOERROR; } } } #endif re_token_t br_token; re_bitset_ptr_t sbcset; #ifdef RE_ENABLE_I18N re_charset_t *mbcset; int coll_sym_alloc = 0, range_alloc = 0, mbchar_alloc = 0; int equiv_class_alloc = 0, char_class_alloc = 0; #endif /* not RE_ENABLE_I18N */ int non_match = 0; bin_tree_t *work_tree; int token_len; int first_round = 1; #ifdef _LIBC collseqmb = (const unsigned char *) _NL_CURRENT (LC_COLLATE, _NL_COLLATE_COLLSEQMB); nrules = _NL_CURRENT_WORD (LC_COLLATE, _NL_COLLATE_NRULES); if (nrules) { /* if (MB_CUR_MAX > 1) */ collseqwc = _NL_CURRENT (LC_COLLATE, _NL_COLLATE_COLLSEQWC); table_size = _NL_CURRENT_WORD (LC_COLLATE, _NL_COLLATE_SYMB_HASH_SIZEMB); symb_table = (const int32_t *) _NL_CURRENT (LC_COLLATE, _NL_COLLATE_SYMB_TABLEMB); extra = (const unsigned char *) _NL_CURRENT (LC_COLLATE, _NL_COLLATE_SYMB_EXTRAMB); } #endif sbcset = (re_bitset_ptr_t) calloc (sizeof (unsigned int), BITSET_UINTS); #ifdef RE_ENABLE_I18N mbcset = (re_charset_t *) calloc (sizeof (re_charset_t), 1); #endif /* RE_ENABLE_I18N */ #ifdef RE_ENABLE_I18N if (BE (sbcset == NULL || mbcset == NULL, 0)) #else if (BE (sbcset == NULL, 0)) #endif /* RE_ENABLE_I18N */ { *err = REG_ESPACE; return NULL; } token_len = peek_token_bracket (token, regexp, syntax); if (BE (token->type == END_OF_RE, 0)) { *err = REG_BADPAT; goto parse_bracket_exp_free_return; } if (token->type == OP_NON_MATCH_LIST) { #ifdef RE_ENABLE_I18N mbcset->non_match = 1; #endif /* not RE_ENABLE_I18N */ non_match = 1; if (syntax & RE_HAT_LISTS_NOT_NEWLINE) bitset_set (sbcset, '\0'); re_string_skip_bytes (regexp, token_len); /* Skip a token. */ token_len = peek_token_bracket (token, regexp, syntax); if (BE (token->type == END_OF_RE, 0)) { *err = REG_BADPAT; goto parse_bracket_exp_free_return; } } /* We treat the first ']' as a normal character. */ if (token->type == OP_CLOSE_BRACKET) token->type = CHARACTER; while (1) { bracket_elem_t start_elem, end_elem; unsigned char start_name_buf[BRACKET_NAME_BUF_SIZE]; unsigned char end_name_buf[BRACKET_NAME_BUF_SIZE]; reg_errcode_t ret; int token_len2 = 0, is_range_exp = 0; re_token_t token2; start_elem.opr.name = start_name_buf; ret = parse_bracket_element (&start_elem, regexp, token, token_len, dfa, syntax, first_round); if (BE (ret != REG_NOERROR, 0)) { *err = ret; goto parse_bracket_exp_free_return; } first_round = 0; /* Get information about the next token. We need it in any case. */ token_len = peek_token_bracket (token, regexp, syntax); /* Do not check for ranges if we know they are not allowed. */ if (start_elem.type != CHAR_CLASS && start_elem.type != EQUIV_CLASS) { if (BE (token->type == END_OF_RE, 0)) { *err = REG_EBRACK; goto parse_bracket_exp_free_return; } if (token->type == OP_CHARSET_RANGE) { re_string_skip_bytes (regexp, token_len); /* Skip '-'. */ token_len2 = peek_token_bracket (&token2, regexp, syntax); if (BE (token2.type == END_OF_RE, 0)) { *err = REG_EBRACK; goto parse_bracket_exp_free_return; } if (token2.type == OP_CLOSE_BRACKET) { /* We treat the last '-' as a normal character. */ re_string_skip_bytes (regexp, -token_len); token->type = CHARACTER; } else is_range_exp = 1; } } if (is_range_exp == 1) { end_elem.opr.name = end_name_buf; ret = parse_bracket_element (&end_elem, regexp, &token2, token_len2, dfa, syntax, 1); if (BE (ret != REG_NOERROR, 0)) { *err = ret; goto parse_bracket_exp_free_return; } token_len = peek_token_bracket (token, regexp, syntax); #ifdef _LIBC *err = build_range_exp (sbcset, mbcset, &range_alloc, &start_elem, &end_elem); #else # ifdef RE_ENABLE_I18N *err = build_range_exp (sbcset, dfa->mb_cur_max > 1 ? mbcset : NULL, &range_alloc, &start_elem, &end_elem); # else *err = build_range_exp (sbcset, &start_elem, &end_elem); # endif #endif /* RE_ENABLE_I18N */ if (BE (*err != REG_NOERROR, 0)) goto parse_bracket_exp_free_return; } else { switch (start_elem.type) { case SB_CHAR: bitset_set (sbcset, start_elem.opr.ch); break; #ifdef RE_ENABLE_I18N case MB_CHAR: /* Check whether the array has enough space. */ if (BE (mbchar_alloc == mbcset->nmbchars, 0)) { wchar_t *new_mbchars; /* Not enough, realloc it. */ /* +1 in case of mbcset->nmbchars is 0. */ mbchar_alloc = 2 * mbcset->nmbchars + 1; /* Use realloc since array is NULL if *alloc == 0. */ new_mbchars = re_realloc (mbcset->mbchars, wchar_t, mbchar_alloc); if (BE (new_mbchars == NULL, 0)) goto parse_bracket_exp_espace; mbcset->mbchars = new_mbchars; } mbcset->mbchars[mbcset->nmbchars++] = start_elem.opr.wch; break; #endif /* RE_ENABLE_I18N */ case EQUIV_CLASS: *err = build_equiv_class (sbcset, #ifdef RE_ENABLE_I18N mbcset, &equiv_class_alloc, #endif /* RE_ENABLE_I18N */ start_elem.opr.name); if (BE (*err != REG_NOERROR, 0)) goto parse_bracket_exp_free_return; break; case COLL_SYM: *err = build_collating_symbol (sbcset, #ifdef RE_ENABLE_I18N mbcset, &coll_sym_alloc, #endif /* RE_ENABLE_I18N */ start_elem.opr.name); if (BE (*err != REG_NOERROR, 0)) goto parse_bracket_exp_free_return; break; case CHAR_CLASS: *err = build_charclass (regexp->trans, sbcset, #ifdef RE_ENABLE_I18N mbcset, &char_class_alloc, #endif /* RE_ENABLE_I18N */ start_elem.opr.name, syntax); if (BE (*err != REG_NOERROR, 0)) goto parse_bracket_exp_free_return; break; default: assert (0); break; } } if (BE (token->type == END_OF_RE, 0)) { *err = REG_EBRACK; goto parse_bracket_exp_free_return; } if (token->type == OP_CLOSE_BRACKET) break; } re_string_skip_bytes (regexp, token_len); /* Skip a token. */ /* If it is non-matching list. */ if (non_match) bitset_not (sbcset); #ifdef RE_ENABLE_I18N /* Ensure only single byte characters are set. */ if (dfa->mb_cur_max > 1) bitset_mask (sbcset, dfa->sb_char); #endif /* RE_ENABLE_I18N */ /* Build a tree for simple bracket. */ br_token.type = SIMPLE_BRACKET; br_token.opr.sbcset = sbcset; work_tree = re_dfa_add_tree_node (dfa, NULL, NULL, &br_token); if (BE (work_tree == NULL, 0)) goto parse_bracket_exp_espace; #ifdef RE_ENABLE_I18N if (mbcset->nmbchars || mbcset->ncoll_syms || mbcset->nequiv_classes || mbcset->nranges || (dfa->mb_cur_max > 1 && (mbcset->nchar_classes || mbcset->non_match))) { re_token_t alt_token; bin_tree_t *mbc_tree; int sbc_idx; /* Build a tree for complex bracket. */ dfa->has_mb_node = 1; for (sbc_idx = 0; sbc_idx < BITSET_UINTS; ++sbc_idx) if (sbcset[sbc_idx]) break; /* If there are no bits set in sbcset, there is no point of having both SIMPLE_BRACKET and COMPLEX_BRACKET. */ if (sbc_idx == BITSET_UINTS) { re_free (sbcset); dfa->nodes[work_tree->node_idx].type = COMPLEX_BRACKET; dfa->nodes[work_tree->node_idx].opr.mbcset = mbcset; return work_tree; } br_token.type = COMPLEX_BRACKET; br_token.opr.mbcset = mbcset; mbc_tree = re_dfa_add_tree_node (dfa, NULL, NULL, &br_token); if (BE (mbc_tree == NULL, 0)) goto parse_bracket_exp_espace; /* Then join them by ALT node. */ alt_token.type = OP_ALT; dfa->has_plural_match = 1; work_tree = re_dfa_add_tree_node (dfa, work_tree, mbc_tree, &alt_token); if (BE (mbc_tree != NULL, 1)) return work_tree; } else { free_charset (mbcset); return work_tree; } #else /* not RE_ENABLE_I18N */ return work_tree; #endif /* not RE_ENABLE_I18N */ parse_bracket_exp_espace: *err = REG_ESPACE; parse_bracket_exp_free_return: re_free (sbcset); #ifdef RE_ENABLE_I18N free_charset (mbcset); #endif /* RE_ENABLE_I18N */ return NULL; } /* Parse an element in the bracket expression. */ static reg_errcode_t parse_bracket_element (elem, regexp, token, token_len, dfa, syntax, accept_hyphen) bracket_elem_t *elem; re_string_t *regexp; re_token_t *token; int token_len; re_dfa_t *dfa; reg_syntax_t syntax; int accept_hyphen; { #ifdef RE_ENABLE_I18N int cur_char_size; cur_char_size = re_string_char_size_at (regexp, re_string_cur_idx (regexp)); if (cur_char_size > 1) { elem->type = MB_CHAR; elem->opr.wch = re_string_wchar_at (regexp, re_string_cur_idx (regexp)); re_string_skip_bytes (regexp, cur_char_size); return REG_NOERROR; } #endif /* RE_ENABLE_I18N */ re_string_skip_bytes (regexp, token_len); /* Skip a token. */ if (token->type == OP_OPEN_COLL_ELEM || token->type == OP_OPEN_CHAR_CLASS || token->type == OP_OPEN_EQUIV_CLASS) return parse_bracket_symbol (elem, regexp, token); if (BE (token->type == OP_CHARSET_RANGE, 0) && !accept_hyphen) { /* A '-' must only appear as anything but a range indicator before the closing bracket. Everything else is an error. */ re_token_t token2; (void) peek_token_bracket (&token2, regexp, syntax); if (token2.type != OP_CLOSE_BRACKET) /* The actual error value is not standardized since this whole case is undefined. But ERANGE makes good sense. */ return REG_ERANGE; } elem->type = SB_CHAR; elem->opr.ch = token->opr.c; return REG_NOERROR; } /* Parse a bracket symbol in the bracket expression. Bracket symbols are such as [::], [..], and [==]. */ static reg_errcode_t parse_bracket_symbol (elem, regexp, token) bracket_elem_t *elem; re_string_t *regexp; re_token_t *token; { unsigned char ch, delim = token->opr.c; int i = 0; if (re_string_eoi(regexp)) return REG_EBRACK; for (;; ++i) { if (i >= BRACKET_NAME_BUF_SIZE) return REG_EBRACK; if (token->type == OP_OPEN_CHAR_CLASS) ch = re_string_fetch_byte_case (regexp); else ch = re_string_fetch_byte (regexp); if (re_string_eoi(regexp)) return REG_EBRACK; if (ch == delim && re_string_peek_byte (regexp, 0) == ']') break; elem->opr.name[i] = ch; } re_string_skip_bytes (regexp, 1); elem->opr.name[i] = '\0'; switch (token->type) { case OP_OPEN_COLL_ELEM: elem->type = COLL_SYM; break; case OP_OPEN_EQUIV_CLASS: elem->type = EQUIV_CLASS; break; case OP_OPEN_CHAR_CLASS: elem->type = CHAR_CLASS; break; default: break; } return REG_NOERROR; } /* Helper function for parse_bracket_exp. Build the equivalence class which is represented by NAME. The result are written to MBCSET and SBCSET. EQUIV_CLASS_ALLOC is the allocated size of mbcset->equiv_classes, is a pointer argument sinse we may update it. */ static reg_errcode_t #ifdef RE_ENABLE_I18N build_equiv_class (sbcset, mbcset, equiv_class_alloc, name) re_charset_t *mbcset; int *equiv_class_alloc; #else /* not RE_ENABLE_I18N */ build_equiv_class (sbcset, name) #endif /* not RE_ENABLE_I18N */ re_bitset_ptr_t sbcset; const unsigned char *name; { #if defined _LIBC uint32_t nrules = _NL_CURRENT_WORD (LC_COLLATE, _NL_COLLATE_NRULES); if (nrules != 0) { const int32_t *table, *indirect; const unsigned char *weights, *extra, *cp; unsigned char char_buf[2]; int32_t idx1, idx2; unsigned int ch; size_t len; /* This #include defines a local function! */ # include /* Calculate the index for equivalence class. */ cp = name; table = (const int32_t *) _NL_CURRENT (LC_COLLATE, _NL_COLLATE_TABLEMB); weights = (const unsigned char *) _NL_CURRENT (LC_COLLATE, _NL_COLLATE_WEIGHTMB); extra = (const unsigned char *) _NL_CURRENT (LC_COLLATE, _NL_COLLATE_EXTRAMB); indirect = (const int32_t *) _NL_CURRENT (LC_COLLATE, _NL_COLLATE_INDIRECTMB); idx1 = findidx (&cp); if (BE (idx1 == 0 || cp < name + strlen ((const char *) name), 0)) /* This isn't a valid character. */ return REG_ECOLLATE; /* Build single byte matcing table for this equivalence class. */ char_buf[1] = (unsigned char) '\0'; len = weights[idx1]; for (ch = 0; ch < SBC_MAX; ++ch) { char_buf[0] = ch; cp = char_buf; idx2 = findidx (&cp); /* idx2 = table[ch]; */ if (idx2 == 0) /* This isn't a valid character. */ continue; if (len == weights[idx2]) { int cnt = 0; while (cnt <= len && weights[idx1 + 1 + cnt] == weights[idx2 + 1 + cnt]) ++cnt; if (cnt > len) bitset_set (sbcset, ch); } } /* Check whether the array has enough space. */ if (BE (*equiv_class_alloc == mbcset->nequiv_classes, 0)) { /* Not enough, realloc it. */ /* +1 in case of mbcset->nequiv_classes is 0. */ int new_equiv_class_alloc = 2 * mbcset->nequiv_classes + 1; /* Use realloc since the array is NULL if *alloc == 0. */ int32_t *new_equiv_classes = re_realloc (mbcset->equiv_classes, int32_t, new_equiv_class_alloc); if (BE (new_equiv_classes == NULL, 0)) return REG_ESPACE; mbcset->equiv_classes = new_equiv_classes; *equiv_class_alloc = new_equiv_class_alloc; } mbcset->equiv_classes[mbcset->nequiv_classes++] = idx1; } else #endif /* _LIBC */ { if (BE (strlen ((const char *) name) != 1, 0)) return REG_ECOLLATE; bitset_set (sbcset, *name); } return REG_NOERROR; } /* Helper function for parse_bracket_exp. Build the character class which is represented by NAME. The result are written to MBCSET and SBCSET. CHAR_CLASS_ALLOC is the allocated size of mbcset->char_classes, is a pointer argument sinse we may update it. */ static reg_errcode_t #ifdef RE_ENABLE_I18N build_charclass (trans, sbcset, mbcset, char_class_alloc, class_name, syntax) re_charset_t *mbcset; int *char_class_alloc; #else /* not RE_ENABLE_I18N */ build_charclass (trans, sbcset, class_name, syntax) #endif /* not RE_ENABLE_I18N */ unsigned RE_TRANSLATE_TYPE trans; re_bitset_ptr_t sbcset; const unsigned char *class_name; reg_syntax_t syntax; { int i; const char *name = (const char *) class_name; /* In case of REG_ICASE "upper" and "lower" match the both of upper and lower cases. */ if ((syntax & RE_ICASE) && (strcmp (name, "upper") == 0 || strcmp (name, "lower") == 0)) name = "alpha"; #ifdef RE_ENABLE_I18N /* Check the space of the arrays. */ if (BE (*char_class_alloc == mbcset->nchar_classes, 0)) { /* Not enough, realloc it. */ /* +1 in case of mbcset->nchar_classes is 0. */ int new_char_class_alloc = 2 * mbcset->nchar_classes + 1; /* Use realloc since array is NULL if *alloc == 0. */ wctype_t *new_char_classes = re_realloc (mbcset->char_classes, wctype_t, new_char_class_alloc); if (BE (new_char_classes == NULL, 0)) return REG_ESPACE; mbcset->char_classes = new_char_classes; *char_class_alloc = new_char_class_alloc; } mbcset->char_classes[mbcset->nchar_classes++] = __wctype (name); #endif /* RE_ENABLE_I18N */ #define BUILD_CHARCLASS_LOOP(ctype_func) \ for (i = 0; i < SBC_MAX; ++i) \ { \ if (ctype_func (i)) \ { \ int ch = trans ? trans[i] : i; \ bitset_set (sbcset, ch); \ } \ } if (strcmp (name, "alnum") == 0) BUILD_CHARCLASS_LOOP (isalnum) else if (strcmp (name, "cntrl") == 0) BUILD_CHARCLASS_LOOP (iscntrl) else if (strcmp (name, "lower") == 0) BUILD_CHARCLASS_LOOP (islower) else if (strcmp (name, "space") == 0) BUILD_CHARCLASS_LOOP (isspace) else if (strcmp (name, "alpha") == 0) BUILD_CHARCLASS_LOOP (isalpha) else if (strcmp (name, "digit") == 0) BUILD_CHARCLASS_LOOP (isdigit) else if (strcmp (name, "print") == 0) BUILD_CHARCLASS_LOOP (isprint) else if (strcmp (name, "upper") == 0) BUILD_CHARCLASS_LOOP (isupper) else if (strcmp (name, "blank") == 0) BUILD_CHARCLASS_LOOP (isblank) else if (strcmp (name, "graph") == 0) BUILD_CHARCLASS_LOOP (isgraph) else if (strcmp (name, "punct") == 0) BUILD_CHARCLASS_LOOP (ispunct) else if (strcmp (name, "xdigit") == 0) BUILD_CHARCLASS_LOOP (isxdigit) else return REG_ECTYPE; return REG_NOERROR; } static bin_tree_t * build_charclass_op (dfa, trans, class_name, extra, non_match, err) re_dfa_t *dfa; unsigned RE_TRANSLATE_TYPE trans; const unsigned char *class_name; const unsigned char *extra; int non_match; reg_errcode_t *err; { re_bitset_ptr_t sbcset; #ifdef RE_ENABLE_I18N re_charset_t *mbcset; int alloc = 0; #endif /* not RE_ENABLE_I18N */ reg_errcode_t ret; re_token_t br_token; bin_tree_t *tree; sbcset = (re_bitset_ptr_t) calloc (sizeof (unsigned int), BITSET_UINTS); #ifdef RE_ENABLE_I18N mbcset = (re_charset_t *) calloc (sizeof (re_charset_t), 1); #endif /* RE_ENABLE_I18N */ #ifdef RE_ENABLE_I18N if (BE (sbcset == NULL || mbcset == NULL, 0)) #else /* not RE_ENABLE_I18N */ if (BE (sbcset == NULL, 0)) #endif /* not RE_ENABLE_I18N */ { *err = REG_ESPACE; return NULL; } if (non_match) { #ifdef RE_ENABLE_I18N /* if (syntax & RE_HAT_LISTS_NOT_NEWLINE) bitset_set(cset->sbcset, '\0'); */ mbcset->non_match = 1; #endif /* not RE_ENABLE_I18N */ } /* We don't care the syntax in this case. */ ret = build_charclass (trans, sbcset, #ifdef RE_ENABLE_I18N mbcset, &alloc, #endif /* RE_ENABLE_I18N */ class_name, 0); if (BE (ret != REG_NOERROR, 0)) { re_free (sbcset); #ifdef RE_ENABLE_I18N free_charset (mbcset); #endif /* RE_ENABLE_I18N */ *err = ret; return NULL; } /* \w match '_' also. */ for (; *extra; extra++) bitset_set (sbcset, *extra); /* If it is non-matching list. */ if (non_match) bitset_not (sbcset); #ifdef RE_ENABLE_I18N /* Ensure only single byte characters are set. */ if (dfa->mb_cur_max > 1) bitset_mask (sbcset, dfa->sb_char); #endif /* Build a tree for simple bracket. */ br_token.type = SIMPLE_BRACKET; br_token.opr.sbcset = sbcset; tree = re_dfa_add_tree_node (dfa, NULL, NULL, &br_token); if (BE (tree == NULL, 0)) goto build_word_op_espace; #ifdef RE_ENABLE_I18N if (dfa->mb_cur_max > 1) { re_token_t alt_token; bin_tree_t *mbc_tree; /* Build a tree for complex bracket. */ br_token.type = COMPLEX_BRACKET; br_token.opr.mbcset = mbcset; dfa->has_mb_node = 1; mbc_tree = re_dfa_add_tree_node (dfa, NULL, NULL, &br_token); if (BE (mbc_tree == NULL, 0)) goto build_word_op_espace; /* Then join them by ALT node. */ alt_token.type = OP_ALT; dfa->has_plural_match = 1; tree = re_dfa_add_tree_node (dfa, tree, mbc_tree, &alt_token); if (BE (mbc_tree != NULL, 1)) return tree; } else { free_charset (mbcset); return tree; } #else /* not RE_ENABLE_I18N */ return tree; #endif /* not RE_ENABLE_I18N */ build_word_op_espace: re_free (sbcset); #ifdef RE_ENABLE_I18N free_charset (mbcset); #endif /* RE_ENABLE_I18N */ *err = REG_ESPACE; return NULL; } /* This is intended for the expressions like "a{1,3}". Fetch a number from `input', and return the number. Return -1, if the number field is empty like "{,1}". Return -2, If an error is occured. */ static int fetch_number (input, token, syntax) re_string_t *input; re_token_t *token; reg_syntax_t syntax; { int num = -1; unsigned char c; while (1) { fetch_token (token, input, syntax); c = token->opr.c; if (BE (token->type == END_OF_RE, 0)) return -2; if (token->type == OP_CLOSE_DUP_NUM || c == ',') break; num = ((token->type != CHARACTER || c < '0' || '9' < c || num == -2) ? -2 : ((num == -1) ? c - '0' : num * 10 + c - '0')); num = (num > RE_DUP_MAX) ? -2 : num; } return num; } #ifdef RE_ENABLE_I18N static void free_charset (re_charset_t *cset) { re_free (cset->mbchars); # ifdef _LIBC re_free (cset->coll_syms); re_free (cset->equiv_classes); re_free (cset->range_starts); re_free (cset->range_ends); # endif re_free (cset->char_classes); re_free (cset); } #endif /* RE_ENABLE_I18N */ /* Functions for binary tree operation. */ /* Create a tree node. */ static bin_tree_t * create_tree (dfa, left, right, type, index) re_dfa_t *dfa; bin_tree_t *left; bin_tree_t *right; re_token_type_t type; int index; { bin_tree_t *tree; if (BE (dfa->str_tree_storage_idx == BIN_TREE_STORAGE_SIZE, 0)) { bin_tree_storage_t *storage = re_malloc (bin_tree_storage_t, 1); if (storage == NULL) return NULL; storage->next = dfa->str_tree_storage; dfa->str_tree_storage = storage; dfa->str_tree_storage_idx = 0; } tree = &dfa->str_tree_storage->data[dfa->str_tree_storage_idx++]; tree->parent = NULL; tree->left = left; tree->right = right; tree->type = type; tree->node_idx = index; tree->first = -1; tree->next = -1; re_node_set_init_empty (&tree->eclosure); if (left != NULL) left->parent = tree; if (right != NULL) right->parent = tree; return tree; } /* Create both a DFA node and a tree for it. */ static bin_tree_t * re_dfa_add_tree_node (dfa, left, right, token) re_dfa_t *dfa; bin_tree_t *left; bin_tree_t *right; const re_token_t *token; { int new_idx = re_dfa_add_node (dfa, *token, 0); if (new_idx == -1) return NULL; return create_tree (dfa, left, right, 0, new_idx); } /* Mark the tree SRC as an optional subexpression. */ static void mark_opt_subexp (src, dfa) const bin_tree_t *src; re_dfa_t *dfa; { /* Pass an OPT_SUBEXP_IDX which is != 1 if the duplicated tree is a subexpression. */ if (src->type == CONCAT && src->left->type == NON_TYPE && dfa->nodes[src->left->node_idx].type == OP_OPEN_SUBEXP) mark_opt_subexp_iter (src, dfa, dfa->nodes[src->left->node_idx].opr.idx); } /* Recursive tree walker for mark_opt_subexp. */ static void mark_opt_subexp_iter (src, dfa, idx) const bin_tree_t *src; re_dfa_t *dfa; int idx; { int node_idx; if (src->type == NON_TYPE) { node_idx = src->node_idx; if ((dfa->nodes[node_idx].type == OP_OPEN_SUBEXP || dfa->nodes[node_idx].type == OP_CLOSE_SUBEXP) && dfa->nodes[node_idx].opr.idx == idx) dfa->nodes[node_idx].opt_subexp = 1; } if (src->left != NULL) mark_opt_subexp_iter (src->left, dfa, idx); if (src->right != NULL) mark_opt_subexp_iter (src->right, dfa, idx); } /* Duplicate the node SRC, and return new node. */ static bin_tree_t * duplicate_tree (src, dfa) const bin_tree_t *src; re_dfa_t *dfa; { bin_tree_t *left = NULL, *right = NULL, *new_tree; int new_node_idx; /* Since node indies must be according to Post-order of the tree, we must duplicate the left at first. */ if (src->left != NULL) { left = duplicate_tree (src->left, dfa); if (left == NULL) return NULL; } /* Secondaly, duplicate the right. */ if (src->right != NULL) { right = duplicate_tree (src->right, dfa); if (right == NULL) return NULL; } /* At last, duplicate itself. */ if (src->type == NON_TYPE) { new_node_idx = re_dfa_add_node (dfa, dfa->nodes[src->node_idx], 0); dfa->nodes[new_node_idx].duplicated = 1; if (BE (new_node_idx == -1, 0)) return NULL; } else new_node_idx = src->type; new_tree = create_tree (dfa, left, right, src->type, new_node_idx); return new_tree; } Yeti-6.4.0/regex/glibc/regex.c000066400000000000000000000066061253351442600161200ustar00rootroot00000000000000/* Extended regular expression matching and search library. Copyright (C) 2002, 2003 Free Software Foundation, Inc. This file is part of the GNU C Library. Contributed by Isamu Hasegawa . The GNU C Library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version. The GNU C Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. You should have received a copy of the GNU Lesser General Public License along with the GNU C Library; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA. */ #ifdef HAVE_CONFIG_H #include "config.h" #endif #ifdef _AIX #pragma alloca #else # ifndef allocax /* predefined by HP cc +Olibcalls */ # ifdef __GNUC__ # define alloca(size) __builtin_alloca (size) # else # if HAVE_ALLOCA_H # include # else # ifdef __hpux void *alloca (); # else # if !defined __OS2__ && !defined WIN32 char *alloca (); # else # include /* OS/2 defines alloca in here */ # endif # endif # endif # endif # endif #endif #ifdef _LIBC /* We have to keep the namespace clean. */ # define regfree(preg) __regfree (preg) # define regexec(pr, st, nm, pm, ef) __regexec (pr, st, nm, pm, ef) # define regcomp(preg, pattern, cflags) __regcomp (preg, pattern, cflags) # define regerror(errcode, preg, errbuf, errbuf_size) \ __regerror(errcode, preg, errbuf, errbuf_size) # define re_set_registers(bu, re, nu, st, en) \ __re_set_registers (bu, re, nu, st, en) # define re_match_2(bufp, string1, size1, string2, size2, pos, regs, stop) \ __re_match_2 (bufp, string1, size1, string2, size2, pos, regs, stop) # define re_match(bufp, string, size, pos, regs) \ __re_match (bufp, string, size, pos, regs) # define re_search(bufp, string, size, startpos, range, regs) \ __re_search (bufp, string, size, startpos, range, regs) # define re_compile_pattern(pattern, length, bufp) \ __re_compile_pattern (pattern, length, bufp) # define re_set_syntax(syntax) __re_set_syntax (syntax) # define re_search_2(bufp, st1, s1, st2, s2, startpos, range, regs, stop) \ __re_search_2 (bufp, st1, s1, st2, s2, startpos, range, regs, stop) # define re_compile_fastmap(bufp) __re_compile_fastmap (bufp) # include "../locale/localeinfo.h" #endif /* POSIX says that must be included (by the caller) before . */ #include /* On some systems, limits.h sets RE_DUP_MAX to a lower value than GNU regex allows. Include it before , which correctly #undefs RE_DUP_MAX and sets it to the right value. */ #include #include #include "regex_internal.h" #include "regex_internal.c" #include "regcomp.c" #include "regexec.c" /* Binary backward compatibility. */ #if _LIBC # include # if SHLIB_COMPAT (libc, GLIBC_2_0, GLIBC_2_3) link_warning (re_max_failures, "the 're_max_failures' variable is obsolete and will go away.") int re_max_failures = 2000; # endif #endif Yeti-6.4.0/regex/glibc/regex.h000066400000000000000000000537401253351442600161260ustar00rootroot00000000000000/* Definitions for data structures and routines for the regular expression library. Copyright (C) 1985,1989-93,1995-98,2000,2001,2002,2003 Free Software Foundation, Inc. This file is part of the GNU C Library. The GNU C Library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version. The GNU C Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. You should have received a copy of the GNU Lesser General Public License along with the GNU C Library; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA. */ #ifndef _REGEX_H #define _REGEX_H 1 #include /* Allow the use in C++ code. */ #ifdef __cplusplus extern "C" { #endif /* POSIX says that must be included (by the caller) before . */ #if !defined _POSIX_C_SOURCE && !defined _POSIX_SOURCE && defined VMS /* VMS doesn't have `size_t' in , even though POSIX says it should be there. */ # include #endif /* The following two types have to be signed and unsigned integer type wide enough to hold a value of a pointer. For most ANSI compilers ptrdiff_t and size_t should be likely OK. Still size of these two types is 2 for Microsoft C. Ugh... */ typedef long int s_reg_t; typedef unsigned long int active_reg_t; /* The following bits are used to determine the regexp syntax we recognize. The set/not-set meanings are chosen so that Emacs syntax remains the value 0. The bits are given in alphabetical order, and the definitions shifted by one from the previous bit; thus, when we add or remove a bit, only one other definition need change. */ typedef unsigned long int reg_syntax_t; /* If this bit is not set, then \ inside a bracket expression is literal. If set, then such a \ quotes the following character. */ #define RE_BACKSLASH_ESCAPE_IN_LISTS ((unsigned long int) 1) /* If this bit is not set, then + and ? are operators, and \+ and \? are literals. If set, then \+ and \? are operators and + and ? are literals. */ #define RE_BK_PLUS_QM (RE_BACKSLASH_ESCAPE_IN_LISTS << 1) /* If this bit is set, then character classes are supported. They are: [:alpha:], [:upper:], [:lower:], [:digit:], [:alnum:], [:xdigit:], [:space:], [:print:], [:punct:], [:graph:], and [:cntrl:]. If not set, then character classes are not supported. */ #define RE_CHAR_CLASSES (RE_BK_PLUS_QM << 1) /* If this bit is set, then ^ and $ are always anchors (outside bracket expressions, of course). If this bit is not set, then it depends: ^ is an anchor if it is at the beginning of a regular expression or after an open-group or an alternation operator; $ is an anchor if it is at the end of a regular expression, or before a close-group or an alternation operator. This bit could be (re)combined with RE_CONTEXT_INDEP_OPS, because POSIX draft 11.2 says that * etc. in leading positions is undefined. We already implemented a previous draft which made those constructs invalid, though, so we haven't changed the code back. */ #define RE_CONTEXT_INDEP_ANCHORS (RE_CHAR_CLASSES << 1) /* If this bit is set, then special characters are always special regardless of where they are in the pattern. If this bit is not set, then special characters are special only in some contexts; otherwise they are ordinary. Specifically, * + ? and intervals are only special when not after the beginning, open-group, or alternation operator. */ #define RE_CONTEXT_INDEP_OPS (RE_CONTEXT_INDEP_ANCHORS << 1) /* If this bit is set, then *, +, ?, and { cannot be first in an re or immediately after an alternation or begin-group operator. */ #define RE_CONTEXT_INVALID_OPS (RE_CONTEXT_INDEP_OPS << 1) /* If this bit is set, then . matches newline. If not set, then it doesn't. */ #define RE_DOT_NEWLINE (RE_CONTEXT_INVALID_OPS << 1) /* If this bit is set, then . doesn't match NUL. If not set, then it does. */ #define RE_DOT_NOT_NULL (RE_DOT_NEWLINE << 1) /* If this bit is set, nonmatching lists [^...] do not match newline. If not set, they do. */ #define RE_HAT_LISTS_NOT_NEWLINE (RE_DOT_NOT_NULL << 1) /* If this bit is set, either \{...\} or {...} defines an interval, depending on RE_NO_BK_BRACES. If not set, \{, \}, {, and } are literals. */ #define RE_INTERVALS (RE_HAT_LISTS_NOT_NEWLINE << 1) /* If this bit is set, +, ? and | aren't recognized as operators. If not set, they are. */ #define RE_LIMITED_OPS (RE_INTERVALS << 1) /* If this bit is set, newline is an alternation operator. If not set, newline is literal. */ #define RE_NEWLINE_ALT (RE_LIMITED_OPS << 1) /* If this bit is set, then `{...}' defines an interval, and \{ and \} are literals. If not set, then `\{...\}' defines an interval. */ #define RE_NO_BK_BRACES (RE_NEWLINE_ALT << 1) /* If this bit is set, (...) defines a group, and \( and \) are literals. If not set, \(...\) defines a group, and ( and ) are literals. */ #define RE_NO_BK_PARENS (RE_NO_BK_BRACES << 1) /* If this bit is set, then \ matches . If not set, then \ is a back-reference. */ #define RE_NO_BK_REFS (RE_NO_BK_PARENS << 1) /* If this bit is set, then | is an alternation operator, and \| is literal. If not set, then \| is an alternation operator, and | is literal. */ #define RE_NO_BK_VBAR (RE_NO_BK_REFS << 1) /* If this bit is set, then an ending range point collating higher than the starting range point, as in [z-a], is invalid. If not set, then when ending range point collates higher than the starting range point, the range is ignored. */ #define RE_NO_EMPTY_RANGES (RE_NO_BK_VBAR << 1) /* If this bit is set, then an unmatched ) is ordinary. If not set, then an unmatched ) is invalid. */ #define RE_UNMATCHED_RIGHT_PAREN_ORD (RE_NO_EMPTY_RANGES << 1) /* If this bit is set, succeed as soon as we match the whole pattern, without further backtracking. */ #define RE_NO_POSIX_BACKTRACKING (RE_UNMATCHED_RIGHT_PAREN_ORD << 1) /* If this bit is set, do not process the GNU regex operators. If not set, then the GNU regex operators are recognized. */ #define RE_NO_GNU_OPS (RE_NO_POSIX_BACKTRACKING << 1) /* If this bit is set, turn on internal regex debugging. If not set, and debugging was on, turn it off. This only works if regex.c is compiled -DDEBUG. We define this bit always, so that all that's needed to turn on debugging is to recompile regex.c; the calling code can always have this bit set, and it won't affect anything in the normal case. */ #define RE_DEBUG (RE_NO_GNU_OPS << 1) /* If this bit is set, a syntactically invalid interval is treated as a string of ordinary characters. For example, the ERE 'a{1' is treated as 'a\{1'. */ #define RE_INVALID_INTERVAL_ORD (RE_DEBUG << 1) /* If this bit is set, then ignore case when matching. If not set, then case is significant. */ #define RE_ICASE (RE_INVALID_INTERVAL_ORD << 1) /* This bit is used internally like RE_CONTEXT_INDEP_ANCHORS but only for ^, because it is difficult to scan the regex backwards to find whether ^ should be special. */ #define RE_CARET_ANCHORS_HERE (RE_ICASE << 1) /* If this bit is set, then \{ cannot be first in an bre or immediately after an alternation or begin-group operator. */ #define RE_CONTEXT_INVALID_DUP (RE_CARET_ANCHORS_HERE << 1) /* If this bit is set, then no_sub will be set to 1 during re_compile_pattern. */ #define RE_NO_SUB (RE_CONTEXT_INVALID_DUP << 1) /* This global variable defines the particular regexp syntax to use (for some interfaces). When a regexp is compiled, the syntax used is stored in the pattern buffer, so changing this does not affect already-compiled regexps. */ extern reg_syntax_t re_syntax_options; /* Define combinations of the above bits for the standard possibilities. (The [[[ comments delimit what gets put into the Texinfo file, so don't delete them!) */ /* [[[begin syntaxes]]] */ #define RE_SYNTAX_EMACS 0 #define RE_SYNTAX_AWK \ (RE_BACKSLASH_ESCAPE_IN_LISTS | RE_DOT_NOT_NULL \ | RE_NO_BK_PARENS | RE_NO_BK_REFS \ | RE_NO_BK_VBAR | RE_NO_EMPTY_RANGES \ | RE_DOT_NEWLINE | RE_CONTEXT_INDEP_ANCHORS \ | RE_UNMATCHED_RIGHT_PAREN_ORD | RE_NO_GNU_OPS) #define RE_SYNTAX_GNU_AWK \ ((RE_SYNTAX_POSIX_EXTENDED | RE_BACKSLASH_ESCAPE_IN_LISTS | RE_DEBUG) \ & ~(RE_DOT_NOT_NULL | RE_INTERVALS | RE_CONTEXT_INDEP_OPS \ | RE_CONTEXT_INVALID_OPS )) #define RE_SYNTAX_POSIX_AWK \ (RE_SYNTAX_POSIX_EXTENDED | RE_BACKSLASH_ESCAPE_IN_LISTS \ | RE_INTERVALS | RE_NO_GNU_OPS) #define RE_SYNTAX_GREP \ (RE_BK_PLUS_QM | RE_CHAR_CLASSES \ | RE_HAT_LISTS_NOT_NEWLINE | RE_INTERVALS \ | RE_NEWLINE_ALT) #define RE_SYNTAX_EGREP \ (RE_CHAR_CLASSES | RE_CONTEXT_INDEP_ANCHORS \ | RE_CONTEXT_INDEP_OPS | RE_HAT_LISTS_NOT_NEWLINE \ | RE_NEWLINE_ALT | RE_NO_BK_PARENS \ | RE_NO_BK_VBAR) #define RE_SYNTAX_POSIX_EGREP \ (RE_SYNTAX_EGREP | RE_INTERVALS | RE_NO_BK_BRACES \ | RE_INVALID_INTERVAL_ORD) /* P1003.2/D11.2, section 4.20.7.1, lines 5078ff. */ #define RE_SYNTAX_ED RE_SYNTAX_POSIX_BASIC #define RE_SYNTAX_SED RE_SYNTAX_POSIX_BASIC /* Syntax bits common to both basic and extended POSIX regex syntax. */ #define _RE_SYNTAX_POSIX_COMMON \ (RE_CHAR_CLASSES | RE_DOT_NEWLINE | RE_DOT_NOT_NULL \ | RE_INTERVALS | RE_NO_EMPTY_RANGES) #define RE_SYNTAX_POSIX_BASIC \ (_RE_SYNTAX_POSIX_COMMON | RE_BK_PLUS_QM | RE_CONTEXT_INVALID_DUP) /* Differs from ..._POSIX_BASIC only in that RE_BK_PLUS_QM becomes RE_LIMITED_OPS, i.e., \? \+ \| are not recognized. Actually, this isn't minimal, since other operators, such as \`, aren't disabled. */ #define RE_SYNTAX_POSIX_MINIMAL_BASIC \ (_RE_SYNTAX_POSIX_COMMON | RE_LIMITED_OPS) #define RE_SYNTAX_POSIX_EXTENDED \ (_RE_SYNTAX_POSIX_COMMON | RE_CONTEXT_INDEP_ANCHORS \ | RE_CONTEXT_INDEP_OPS | RE_NO_BK_BRACES \ | RE_NO_BK_PARENS | RE_NO_BK_VBAR \ | RE_CONTEXT_INVALID_OPS | RE_UNMATCHED_RIGHT_PAREN_ORD) /* Differs from ..._POSIX_EXTENDED in that RE_CONTEXT_INDEP_OPS is removed and RE_NO_BK_REFS is added. */ #define RE_SYNTAX_POSIX_MINIMAL_EXTENDED \ (_RE_SYNTAX_POSIX_COMMON | RE_CONTEXT_INDEP_ANCHORS \ | RE_CONTEXT_INVALID_OPS | RE_NO_BK_BRACES \ | RE_NO_BK_PARENS | RE_NO_BK_REFS \ | RE_NO_BK_VBAR | RE_UNMATCHED_RIGHT_PAREN_ORD) /* [[[end syntaxes]]] */ /* Maximum number of duplicates an interval can allow. Some systems (erroneously) define this in other header files, but we want our value, so remove any previous define. */ #ifdef RE_DUP_MAX # undef RE_DUP_MAX #endif /* If sizeof(int) == 2, then ((1 << 15) - 1) overflows. */ #define RE_DUP_MAX (0x7fff) /* POSIX `cflags' bits (i.e., information for `regcomp'). */ /* If this bit is set, then use extended regular expression syntax. If not set, then use basic regular expression syntax. */ #define REG_EXTENDED 1 /* If this bit is set, then ignore case when matching. If not set, then case is significant. */ #define REG_ICASE (REG_EXTENDED << 1) /* If this bit is set, then anchors do not match at newline characters in the string. If not set, then anchors do match at newlines. */ #define REG_NEWLINE (REG_ICASE << 1) /* If this bit is set, then report only success or fail in regexec. If not set, then returns differ between not matching and errors. */ #define REG_NOSUB (REG_NEWLINE << 1) /* POSIX `eflags' bits (i.e., information for regexec). */ /* If this bit is set, then the beginning-of-line operator doesn't match the beginning of the string (presumably because it's not the beginning of a line). If not set, then the beginning-of-line operator does match the beginning of the string. */ #define REG_NOTBOL 1 /* Like REG_NOTBOL, except for the end-of-line. */ #define REG_NOTEOL (1 << 1) /* Use PMATCH[0] to delimit the start and end of the search in the buffer. */ #define REG_STARTEND (1 << 2) /* If any error codes are removed, changed, or added, update the `re_error_msg' table in regex.c. */ typedef enum { #ifdef _XOPEN_SOURCE REG_ENOSYS = -1, /* This will never happen for this implementation. */ #endif REG_NOERROR = 0, /* Success. */ REG_NOMATCH, /* Didn't find a match (for regexec). */ /* POSIX regcomp return error codes. (In the order listed in the standard.) */ REG_BADPAT, /* Invalid pattern. */ REG_ECOLLATE, /* Inalid collating element. */ REG_ECTYPE, /* Invalid character class name. */ REG_EESCAPE, /* Trailing backslash. */ REG_ESUBREG, /* Invalid back reference. */ REG_EBRACK, /* Unmatched left bracket. */ REG_EPAREN, /* Parenthesis imbalance. */ REG_EBRACE, /* Unmatched \{. */ REG_BADBR, /* Invalid contents of \{\}. */ REG_ERANGE, /* Invalid range end. */ REG_ESPACE, /* Ran out of memory. */ REG_BADRPT, /* No preceding re for repetition op. */ /* Error codes we've added. */ REG_EEND, /* Premature end. */ REG_ESIZE, /* Compiled pattern bigger than 2^16 bytes. */ REG_ERPAREN /* Unmatched ) or \); not returned from regcomp. */ } reg_errcode_t; /* This data structure represents a compiled pattern. Before calling the pattern compiler, the fields `buffer', `allocated', `fastmap', `translate', and `no_sub' can be set. After the pattern has been compiled, the `re_nsub' field is available. All other fields are private to the regex routines. */ #ifndef RE_TRANSLATE_TYPE # define RE_TRANSLATE_TYPE char * #endif struct re_pattern_buffer { /* [[[begin pattern_buffer]]] */ /* Space that holds the compiled pattern. It is declared as `unsigned char *' because its elements are sometimes used as array indexes. */ unsigned char *buffer; /* Number of bytes to which `buffer' points. */ unsigned long int allocated; /* Number of bytes actually used in `buffer'. */ unsigned long int used; /* Syntax setting with which the pattern was compiled. */ reg_syntax_t syntax; /* Pointer to a fastmap, if any, otherwise zero. re_search uses the fastmap, if there is one, to skip over impossible starting points for matches. */ char *fastmap; /* Either a translate table to apply to all characters before comparing them, or zero for no translation. The translation is applied to a pattern when it is compiled and to a string when it is matched. */ RE_TRANSLATE_TYPE translate; /* Number of subexpressions found by the compiler. */ size_t re_nsub; /* Zero if this pattern cannot match the empty string, one else. Well, in truth it's used only in `re_search_2', to see whether or not we should use the fastmap, so we don't set this absolutely perfectly; see `re_compile_fastmap' (the `duplicate' case). */ unsigned can_be_null : 1; /* If REGS_UNALLOCATED, allocate space in the `regs' structure for `max (RE_NREGS, re_nsub + 1)' groups. If REGS_REALLOCATE, reallocate space if necessary. If REGS_FIXED, use what's there. */ #define REGS_UNALLOCATED 0 #define REGS_REALLOCATE 1 #define REGS_FIXED 2 unsigned regs_allocated : 2; /* Set to zero when `regex_compile' compiles a pattern; set to one by `re_compile_fastmap' if it updates the fastmap. */ unsigned fastmap_accurate : 1; /* If set, `re_match_2' does not return information about subexpressions. */ unsigned no_sub : 1; /* If set, a beginning-of-line anchor doesn't match at the beginning of the string. */ unsigned not_bol : 1; /* Similarly for an end-of-line anchor. */ unsigned not_eol : 1; /* If true, an anchor at a newline matches. */ unsigned newline_anchor : 1; /* [[[end pattern_buffer]]] */ }; typedef struct re_pattern_buffer regex_t; /* Type for byte offsets within the string. POSIX mandates this. */ typedef int regoff_t; /* This is the structure we store register match data in. See regex.texinfo for a full description of what registers match. */ struct re_registers { unsigned num_regs; regoff_t *start; regoff_t *end; }; /* If `regs_allocated' is REGS_UNALLOCATED in the pattern buffer, `re_match_2' returns information about at least this many registers the first time a `regs' structure is passed. */ #ifndef RE_NREGS # define RE_NREGS 30 #endif /* POSIX specification for registers. Aside from the different names than `re_registers', POSIX uses an array of structures, instead of a structure of arrays. */ typedef struct { regoff_t rm_so; /* Byte offset from string's start to substring's start. */ regoff_t rm_eo; /* Byte offset from string's start to substring's end. */ } regmatch_t; /* Declarations for routines. */ /* To avoid duplicating every routine declaration -- once with a prototype (if we are ANSI), and once without (if we aren't) -- we use the following macro to declare argument types. This unfortunately clutters up the declarations a bit, but I think it's worth it. */ #if __STDC__ # define _RE_ARGS(args) args #else /* not __STDC__ */ # define _RE_ARGS(args) () #endif /* not __STDC__ */ /* Sets the current default syntax to SYNTAX, and return the old syntax. You can also simply assign to the `re_syntax_options' variable. */ extern reg_syntax_t re_set_syntax _RE_ARGS ((reg_syntax_t syntax)); /* Compile the regular expression PATTERN, with length LENGTH and syntax given by the global `re_syntax_options', into the buffer BUFFER. Return NULL if successful, and an error string if not. */ extern const char *re_compile_pattern _RE_ARGS ((const char *pattern, size_t length, struct re_pattern_buffer *buffer)); /* Compile a fastmap for the compiled pattern in BUFFER; used to accelerate searches. Return 0 if successful and -2 if was an internal error. */ extern int re_compile_fastmap _RE_ARGS ((struct re_pattern_buffer *buffer)); /* Search in the string STRING (with length LENGTH) for the pattern compiled into BUFFER. Start searching at position START, for RANGE characters. Return the starting position of the match, -1 for no match, or -2 for an internal error. Also return register information in REGS (if REGS and BUFFER->no_sub are nonzero). */ extern int re_search _RE_ARGS ((struct re_pattern_buffer *buffer, const char *string, int length, int start, int range, struct re_registers *regs)); /* Like `re_search', but search in the concatenation of STRING1 and STRING2. Also, stop searching at index START + STOP. */ extern int re_search_2 _RE_ARGS ((struct re_pattern_buffer *buffer, const char *string1, int length1, const char *string2, int length2, int start, int range, struct re_registers *regs, int stop)); /* Like `re_search', but return how many characters in STRING the regexp in BUFFER matched, starting at position START. */ extern int re_match _RE_ARGS ((struct re_pattern_buffer *buffer, const char *string, int length, int start, struct re_registers *regs)); /* Relates to `re_match' as `re_search_2' relates to `re_search'. */ extern int re_match_2 _RE_ARGS ((struct re_pattern_buffer *buffer, const char *string1, int length1, const char *string2, int length2, int start, struct re_registers *regs, int stop)); /* Set REGS to hold NUM_REGS registers, storing them in STARTS and ENDS. Subsequent matches using BUFFER and REGS will use this memory for recording register information. STARTS and ENDS must be allocated with malloc, and must each be at least `NUM_REGS * sizeof (regoff_t)' bytes long. If NUM_REGS == 0, then subsequent matches should allocate their own register data. Unless this function is called, the first search or match using PATTERN_BUFFER will allocate its own register data, without freeing the old data. */ extern void re_set_registers _RE_ARGS ((struct re_pattern_buffer *buffer, struct re_registers *regs, unsigned num_regs, regoff_t *starts, regoff_t *ends)); #if defined _REGEX_RE_COMP || defined _LIBC # ifndef _CRAY /* 4.2 bsd compatibility. */ extern char *re_comp _RE_ARGS ((const char *)); extern int re_exec _RE_ARGS ((const char *)); # endif #endif /* GCC 2.95 and later have "__restrict"; C99 compilers have "restrict", and "configure" may have defined "restrict". */ #ifndef __restrict # if ! (2 < __GNUC__ || (2 == __GNUC__ && 95 <= __GNUC_MINOR__)) # if defined restrict || 199901L <= __STDC_VERSION__ # define __restrict restrict # else # define __restrict # endif # endif #endif /* gcc 3.1 and up support the [restrict] syntax. */ #ifndef __restrict_arr # if __GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 1) # define __restrict_arr __restrict # else # define __restrict_arr # endif #endif /* POSIX compatibility. */ extern int regcomp _RE_ARGS ((regex_t *__restrict __preg, const char *__restrict __pattern, int __cflags)); extern int regexec _RE_ARGS ((const regex_t *__restrict __preg, const char *__restrict __string, size_t __nmatch, regmatch_t __pmatch[__restrict_arr], int __eflags)); extern size_t regerror _RE_ARGS ((int __errcode, const regex_t *__preg, char *__errbuf, size_t __errbuf_size)); extern void regfree _RE_ARGS ((regex_t *__preg)); #ifdef __cplusplus } #endif /* C++ */ #endif /* regex.h */ /* Local variables: make-backup-files: t version-control: t trim-versions-without-asking: nil End: */ Yeti-6.4.0/regex/glibc/regex_internal.c000066400000000000000000001421371253351442600200140ustar00rootroot00000000000000/* Extended regular expression matching and search library. Copyright (C) 2002, 2003, 2004 Free Software Foundation, Inc. This file is part of the GNU C Library. Contributed by Isamu Hasegawa . The GNU C Library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version. The GNU C Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. You should have received a copy of the GNU Lesser General Public License along with the GNU C Library; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA. */ static void re_string_construct_common (const char *str, int len, re_string_t *pstr, RE_TRANSLATE_TYPE trans, int icase, const re_dfa_t *dfa) internal_function; #ifdef RE_ENABLE_I18N static int re_string_skip_chars (re_string_t *pstr, int new_raw_idx, wint_t *last_wc) internal_function; #endif /* RE_ENABLE_I18N */ static reg_errcode_t register_state (re_dfa_t *dfa, re_dfastate_t *newstate, unsigned int hash) internal_function; static re_dfastate_t *create_ci_newstate (re_dfa_t *dfa, const re_node_set *nodes, unsigned int hash) internal_function; static re_dfastate_t *create_cd_newstate (re_dfa_t *dfa, const re_node_set *nodes, unsigned int context, unsigned int hash) internal_function; static unsigned int inline calc_state_hash (const re_node_set *nodes, unsigned int context) internal_function; /* Functions for string operation. */ /* This function allocate the buffers. It is necessary to call re_string_reconstruct before using the object. */ static reg_errcode_t re_string_allocate (pstr, str, len, init_len, trans, icase, dfa) re_string_t *pstr; const char *str; int len, init_len, icase; RE_TRANSLATE_TYPE trans; const re_dfa_t *dfa; { reg_errcode_t ret; int init_buf_len; /* Ensure at least one character fits into the buffers. */ if (init_len < dfa->mb_cur_max) init_len = dfa->mb_cur_max; init_buf_len = (len + 1 < init_len) ? len + 1: init_len; re_string_construct_common (str, len, pstr, trans, icase, dfa); ret = re_string_realloc_buffers (pstr, init_buf_len); if (BE (ret != REG_NOERROR, 0)) return ret; pstr->word_char = dfa->word_char; pstr->word_ops_used = dfa->word_ops_used; pstr->mbs = pstr->mbs_allocated ? pstr->mbs : (unsigned char *) str; pstr->valid_len = (pstr->mbs_allocated || dfa->mb_cur_max > 1) ? 0 : len; pstr->valid_raw_len = pstr->valid_len; return REG_NOERROR; } /* This function allocate the buffers, and initialize them. */ static reg_errcode_t re_string_construct (pstr, str, len, trans, icase, dfa) re_string_t *pstr; const char *str; int len, icase; RE_TRANSLATE_TYPE trans; const re_dfa_t *dfa; { reg_errcode_t ret; memset (pstr, '\0', sizeof (re_string_t)); re_string_construct_common (str, len, pstr, trans, icase, dfa); if (len > 0) { ret = re_string_realloc_buffers (pstr, len + 1); if (BE (ret != REG_NOERROR, 0)) return ret; } pstr->mbs = pstr->mbs_allocated ? pstr->mbs : (unsigned char *) str; if (icase) { #ifdef RE_ENABLE_I18N if (dfa->mb_cur_max > 1) { while (1) { ret = build_wcs_upper_buffer (pstr); if (BE (ret != REG_NOERROR, 0)) return ret; if (pstr->valid_raw_len >= len) break; if (pstr->bufs_len > pstr->valid_len + dfa->mb_cur_max) break; ret = re_string_realloc_buffers (pstr, pstr->bufs_len * 2); if (BE (ret != REG_NOERROR, 0)) return ret; } } else #endif /* RE_ENABLE_I18N */ build_upper_buffer (pstr); } else { #ifdef RE_ENABLE_I18N if (dfa->mb_cur_max > 1) build_wcs_buffer (pstr); else #endif /* RE_ENABLE_I18N */ { if (trans != NULL) re_string_translate_buffer (pstr); else { pstr->valid_len = pstr->bufs_len; pstr->valid_raw_len = pstr->bufs_len; } } } return REG_NOERROR; } /* Helper functions for re_string_allocate, and re_string_construct. */ static reg_errcode_t re_string_realloc_buffers (pstr, new_buf_len) re_string_t *pstr; int new_buf_len; { #ifdef RE_ENABLE_I18N if (pstr->mb_cur_max > 1) { wint_t *new_array = re_realloc (pstr->wcs, wint_t, new_buf_len); if (BE (new_array == NULL, 0)) return REG_ESPACE; pstr->wcs = new_array; if (pstr->offsets != NULL) { int *new_array = re_realloc (pstr->offsets, int, new_buf_len); if (BE (new_array == NULL, 0)) return REG_ESPACE; pstr->offsets = new_array; } } #endif /* RE_ENABLE_I18N */ if (pstr->mbs_allocated) { unsigned char *new_array = re_realloc (pstr->mbs, unsigned char, new_buf_len); if (BE (new_array == NULL, 0)) return REG_ESPACE; pstr->mbs = new_array; } pstr->bufs_len = new_buf_len; return REG_NOERROR; } static void re_string_construct_common (str, len, pstr, trans, icase, dfa) const char *str; int len; re_string_t *pstr; RE_TRANSLATE_TYPE trans; int icase; const re_dfa_t *dfa; { pstr->raw_mbs = (const unsigned char *) str; pstr->len = len; pstr->raw_len = len; pstr->trans = (unsigned RE_TRANSLATE_TYPE) trans; pstr->icase = icase ? 1 : 0; pstr->mbs_allocated = (trans != NULL || icase); pstr->mb_cur_max = dfa->mb_cur_max; pstr->is_utf8 = dfa->is_utf8; pstr->map_notascii = dfa->map_notascii; pstr->stop = pstr->len; pstr->raw_stop = pstr->stop; } #ifdef RE_ENABLE_I18N /* Build wide character buffer PSTR->WCS. If the byte sequence of the string are: (0), (1), (0), (1), Then wide character buffer will be: , WEOF , , WEOF , We use WEOF for padding, they indicate that the position isn't a first byte of a multibyte character. Note that this function assumes PSTR->VALID_LEN elements are already built and starts from PSTR->VALID_LEN. */ static void build_wcs_buffer (pstr) re_string_t *pstr; { #ifdef _LIBC unsigned char buf[pstr->mb_cur_max]; #else unsigned char buf[64]; #endif mbstate_t prev_st; int byte_idx, end_idx, mbclen, remain_len; /* Build the buffers from pstr->valid_len to either pstr->len or pstr->bufs_len. */ end_idx = (pstr->bufs_len > pstr->len) ? pstr->len : pstr->bufs_len; for (byte_idx = pstr->valid_len; byte_idx < end_idx;) { wchar_t wc; const char *p; remain_len = end_idx - byte_idx; prev_st = pstr->cur_state; /* Apply the translation if we need. */ if (BE (pstr->trans != NULL, 0)) { int i, ch; for (i = 0; i < pstr->mb_cur_max && i < remain_len; ++i) { ch = pstr->raw_mbs [pstr->raw_mbs_idx + byte_idx + i]; buf[i] = pstr->mbs[byte_idx + i] = pstr->trans[ch]; } p = (const char *) buf; } else p = (const char *) pstr->raw_mbs + pstr->raw_mbs_idx + byte_idx; mbclen = mbrtowc (&wc, p, remain_len, &pstr->cur_state); if (BE (mbclen == (size_t) -2, 0)) { /* The buffer doesn't have enough space, finish to build. */ pstr->cur_state = prev_st; break; } else if (BE (mbclen == (size_t) -1 || mbclen == 0, 0)) { /* We treat these cases as a singlebyte character. */ mbclen = 1; wc = (wchar_t) pstr->raw_mbs[pstr->raw_mbs_idx + byte_idx]; if (BE (pstr->trans != NULL, 0)) wc = pstr->trans[wc]; pstr->cur_state = prev_st; } /* Write wide character and padding. */ pstr->wcs[byte_idx++] = wc; /* Write paddings. */ for (remain_len = byte_idx + mbclen - 1; byte_idx < remain_len ;) pstr->wcs[byte_idx++] = WEOF; } pstr->valid_len = byte_idx; pstr->valid_raw_len = byte_idx; } /* Build wide character buffer PSTR->WCS like build_wcs_buffer, but for REG_ICASE. */ static int build_wcs_upper_buffer (pstr) re_string_t *pstr; { mbstate_t prev_st; int src_idx, byte_idx, end_idx, mbclen, remain_len; #ifdef _LIBC unsigned char buf[pstr->mb_cur_max]; #else unsigned char buf[64]; #endif byte_idx = pstr->valid_len; end_idx = (pstr->bufs_len > pstr->len) ? pstr->len : pstr->bufs_len; /* The following optimization assumes that ASCII characters can be mapped to wide characters with a simple cast. */ if (! pstr->map_notascii && pstr->trans == NULL && !pstr->offsets_needed) { while (byte_idx < end_idx) { wchar_t wc; if (isascii (pstr->raw_mbs[pstr->raw_mbs_idx + byte_idx]) && mbsinit (&pstr->cur_state)) { /* In case of a singlebyte character. */ pstr->mbs[byte_idx] = toupper (pstr->raw_mbs[pstr->raw_mbs_idx + byte_idx]); /* The next step uses the assumption that wchar_t is encoded ASCII-safe: all ASCII values can be converted like this. */ pstr->wcs[byte_idx] = (wchar_t) pstr->mbs[byte_idx]; ++byte_idx; continue; } remain_len = end_idx - byte_idx; prev_st = pstr->cur_state; mbclen = mbrtowc (&wc, ((const char *) pstr->raw_mbs + pstr->raw_mbs_idx + byte_idx), remain_len, &pstr->cur_state); if (BE (mbclen > 0, 1)) { wchar_t wcu = wc; if (iswlower (wc)) { int mbcdlen; wcu = towupper (wc); mbcdlen = wcrtomb (buf, wcu, &prev_st); if (BE (mbclen == mbcdlen, 1)) memcpy (pstr->mbs + byte_idx, buf, mbclen); else { src_idx = byte_idx; goto offsets_needed; } } else memcpy (pstr->mbs + byte_idx, pstr->raw_mbs + pstr->raw_mbs_idx + byte_idx, mbclen); pstr->wcs[byte_idx++] = wcu; /* Write paddings. */ for (remain_len = byte_idx + mbclen - 1; byte_idx < remain_len ;) pstr->wcs[byte_idx++] = WEOF; } else if (mbclen == (size_t) -1 || mbclen == 0) { /* It is an invalid character or '\0'. Just use the byte. */ int ch = pstr->raw_mbs[pstr->raw_mbs_idx + byte_idx]; pstr->mbs[byte_idx] = ch; /* And also cast it to wide char. */ pstr->wcs[byte_idx++] = (wchar_t) ch; if (BE (mbclen == (size_t) -1, 0)) pstr->cur_state = prev_st; } else { /* The buffer doesn't have enough space, finish to build. */ pstr->cur_state = prev_st; break; } } pstr->valid_len = byte_idx; pstr->valid_raw_len = byte_idx; return REG_NOERROR; } else for (src_idx = pstr->valid_raw_len; byte_idx < end_idx;) { wchar_t wc; const char *p; offsets_needed: remain_len = end_idx - byte_idx; prev_st = pstr->cur_state; if (BE (pstr->trans != NULL, 0)) { int i, ch; for (i = 0; i < pstr->mb_cur_max && i < remain_len; ++i) { ch = pstr->raw_mbs [pstr->raw_mbs_idx + src_idx + i]; buf[i] = pstr->trans[ch]; } p = (const char *) buf; } else p = (const char *) pstr->raw_mbs + pstr->raw_mbs_idx + src_idx; mbclen = mbrtowc (&wc, p, remain_len, &pstr->cur_state); if (BE (mbclen > 0, 1)) { wchar_t wcu = wc; if (iswlower (wc)) { int mbcdlen; wcu = towupper (wc); mbcdlen = wcrtomb ((char *) buf, wcu, &prev_st); if (BE (mbclen == mbcdlen, 1)) memcpy (pstr->mbs + byte_idx, buf, mbclen); else { int i; if (byte_idx + mbcdlen > pstr->bufs_len) { pstr->cur_state = prev_st; break; } if (pstr->offsets == NULL) { pstr->offsets = re_malloc (int, pstr->bufs_len); if (pstr->offsets == NULL) return REG_ESPACE; } if (!pstr->offsets_needed) { for (i = 0; i < byte_idx; ++i) pstr->offsets[i] = i; pstr->offsets_needed = 1; } memcpy (pstr->mbs + byte_idx, buf, mbcdlen); pstr->wcs[byte_idx] = wcu; pstr->offsets[byte_idx] = src_idx; for (i = 1; i < mbcdlen; ++i) { pstr->offsets[byte_idx + i] = src_idx + (i < mbclen ? i : mbclen - 1); pstr->wcs[byte_idx + i] = WEOF; } pstr->len += mbcdlen - mbclen; if (pstr->raw_stop > src_idx) pstr->stop += mbcdlen - mbclen; end_idx = (pstr->bufs_len > pstr->len) ? pstr->len : pstr->bufs_len; byte_idx += mbcdlen; src_idx += mbclen; continue; } } else memcpy (pstr->mbs + byte_idx, p, mbclen); if (BE (pstr->offsets_needed != 0, 0)) { int i; for (i = 0; i < mbclen; ++i) pstr->offsets[byte_idx + i] = src_idx + i; } src_idx += mbclen; pstr->wcs[byte_idx++] = wcu; /* Write paddings. */ for (remain_len = byte_idx + mbclen - 1; byte_idx < remain_len ;) pstr->wcs[byte_idx++] = WEOF; } else if (mbclen == (size_t) -1 || mbclen == 0) { /* It is an invalid character or '\0'. Just use the byte. */ int ch = pstr->raw_mbs[pstr->raw_mbs_idx + src_idx]; if (BE (pstr->trans != NULL, 0)) ch = pstr->trans [ch]; pstr->mbs[byte_idx] = ch; if (BE (pstr->offsets_needed != 0, 0)) pstr->offsets[byte_idx] = src_idx; ++src_idx; /* And also cast it to wide char. */ pstr->wcs[byte_idx++] = (wchar_t) ch; if (BE (mbclen == (size_t) -1, 0)) pstr->cur_state = prev_st; } else { /* The buffer doesn't have enough space, finish to build. */ pstr->cur_state = prev_st; break; } } pstr->valid_len = byte_idx; pstr->valid_raw_len = src_idx; return REG_NOERROR; } /* Skip characters until the index becomes greater than NEW_RAW_IDX. Return the index. */ static int re_string_skip_chars (pstr, new_raw_idx, last_wc) re_string_t *pstr; int new_raw_idx; wint_t *last_wc; { mbstate_t prev_st; int rawbuf_idx, mbclen; wchar_t wc = 0; /* Skip the characters which are not necessary to check. */ for (rawbuf_idx = pstr->raw_mbs_idx + pstr->valid_raw_len; rawbuf_idx < new_raw_idx;) { int remain_len; remain_len = pstr->len - rawbuf_idx; prev_st = pstr->cur_state; mbclen = mbrtowc (&wc, (const char *) pstr->raw_mbs + rawbuf_idx, remain_len, &pstr->cur_state); if (BE (mbclen == (size_t) -2 || mbclen == (size_t) -1 || mbclen == 0, 0)) { /* We treat these cases as a singlebyte character. */ mbclen = 1; pstr->cur_state = prev_st; } /* Then proceed the next character. */ rawbuf_idx += mbclen; } *last_wc = (wint_t) wc; return rawbuf_idx; } #endif /* RE_ENABLE_I18N */ /* Build the buffer PSTR->MBS, and apply the translation if we need. This function is used in case of REG_ICASE. */ static void build_upper_buffer (pstr) re_string_t *pstr; { int char_idx, end_idx; end_idx = (pstr->bufs_len > pstr->len) ? pstr->len : pstr->bufs_len; for (char_idx = pstr->valid_len; char_idx < end_idx; ++char_idx) { int ch = pstr->raw_mbs[pstr->raw_mbs_idx + char_idx]; if (BE (pstr->trans != NULL, 0)) ch = pstr->trans[ch]; if (islower (ch)) pstr->mbs[char_idx] = toupper (ch); else pstr->mbs[char_idx] = ch; } pstr->valid_len = char_idx; pstr->valid_raw_len = char_idx; } /* Apply TRANS to the buffer in PSTR. */ static void re_string_translate_buffer (pstr) re_string_t *pstr; { int buf_idx, end_idx; end_idx = (pstr->bufs_len > pstr->len) ? pstr->len : pstr->bufs_len; for (buf_idx = pstr->valid_len; buf_idx < end_idx; ++buf_idx) { int ch = pstr->raw_mbs[pstr->raw_mbs_idx + buf_idx]; pstr->mbs[buf_idx] = pstr->trans[ch]; } pstr->valid_len = buf_idx; pstr->valid_raw_len = buf_idx; } /* This function re-construct the buffers. Concretely, convert to wide character in case of pstr->mb_cur_max > 1, convert to upper case in case of REG_ICASE, apply translation. */ static reg_errcode_t re_string_reconstruct (pstr, idx, eflags) re_string_t *pstr; int idx, eflags; { int offset = idx - pstr->raw_mbs_idx; if (BE (offset < 0, 0)) { /* Reset buffer. */ #ifdef RE_ENABLE_I18N if (pstr->mb_cur_max > 1) memset (&pstr->cur_state, '\0', sizeof (mbstate_t)); #endif /* RE_ENABLE_I18N */ pstr->len = pstr->raw_len; pstr->stop = pstr->raw_stop; pstr->valid_len = 0; pstr->raw_mbs_idx = 0; pstr->valid_raw_len = 0; pstr->offsets_needed = 0; pstr->tip_context = ((eflags & REG_NOTBOL) ? CONTEXT_BEGBUF : CONTEXT_NEWLINE | CONTEXT_BEGBUF); if (!pstr->mbs_allocated) pstr->mbs = (unsigned char *) pstr->raw_mbs; offset = idx; } if (BE (offset != 0, 1)) { /* Are the characters which are already checked remain? */ if (BE (offset < pstr->valid_raw_len, 1) #ifdef RE_ENABLE_I18N /* Handling this would enlarge the code too much. Accept a slowdown in that case. */ && pstr->offsets_needed == 0 #endif ) { /* Yes, move them to the front of the buffer. */ pstr->tip_context = re_string_context_at (pstr, offset - 1, eflags); #ifdef RE_ENABLE_I18N if (pstr->mb_cur_max > 1) memmove (pstr->wcs, pstr->wcs + offset, (pstr->valid_len - offset) * sizeof (wint_t)); #endif /* RE_ENABLE_I18N */ if (BE (pstr->mbs_allocated, 0)) memmove (pstr->mbs, pstr->mbs + offset, pstr->valid_len - offset); pstr->valid_len -= offset; pstr->valid_raw_len -= offset; #if DEBUG assert (pstr->valid_len > 0); #endif } else { /* No, skip all characters until IDX. */ #ifdef RE_ENABLE_I18N if (BE (pstr->offsets_needed, 0)) { pstr->len = pstr->raw_len - idx + offset; pstr->stop = pstr->raw_stop - idx + offset; pstr->offsets_needed = 0; } #endif pstr->valid_len = 0; pstr->valid_raw_len = 0; #ifdef RE_ENABLE_I18N if (pstr->mb_cur_max > 1) { int wcs_idx; wint_t wc = WEOF; if (pstr->is_utf8) { const unsigned char *raw, *p, *q, *end; /* Special case UTF-8. Multi-byte chars start with any byte other than 0x80 - 0xbf. */ raw = pstr->raw_mbs + pstr->raw_mbs_idx; end = raw + (offset - pstr->mb_cur_max); for (p = raw + offset - 1; p >= end; --p) if ((*p & 0xc0) != 0x80) { mbstate_t cur_state; wchar_t wc2; int mlen = raw + pstr->len - p; unsigned char buf[6]; q = p; if (BE (pstr->trans != NULL, 0)) { int i = mlen < 6 ? mlen : 6; while (--i >= 0) buf[i] = pstr->trans[p[i]]; q = buf; } /* XXX Don't use mbrtowc, we know which conversion to use (UTF-8 -> UCS4). */ memset (&cur_state, 0, sizeof (cur_state)); mlen = mbrtowc (&wc2, p, mlen, &cur_state) - (raw + offset - p); if (mlen >= 0) { memset (&pstr->cur_state, '\0', sizeof (mbstate_t)); pstr->valid_len = mlen; wc = wc2; } break; } } if (wc == WEOF) pstr->valid_len = re_string_skip_chars (pstr, idx, &wc) - idx; if (BE (pstr->valid_len, 0)) { for (wcs_idx = 0; wcs_idx < pstr->valid_len; ++wcs_idx) pstr->wcs[wcs_idx] = WEOF; if (pstr->mbs_allocated) memset (pstr->mbs, 255, pstr->valid_len); } pstr->valid_raw_len = pstr->valid_len; pstr->tip_context = ((BE (pstr->word_ops_used != 0, 0) && IS_WIDE_WORD_CHAR (wc)) ? CONTEXT_WORD : ((IS_WIDE_NEWLINE (wc) && pstr->newline_anchor) ? CONTEXT_NEWLINE : 0)); } else #endif /* RE_ENABLE_I18N */ { int c = pstr->raw_mbs[pstr->raw_mbs_idx + offset - 1]; if (pstr->trans) c = pstr->trans[c]; pstr->tip_context = (bitset_contain (pstr->word_char, c) ? CONTEXT_WORD : ((IS_NEWLINE (c) && pstr->newline_anchor) ? CONTEXT_NEWLINE : 0)); } } if (!BE (pstr->mbs_allocated, 0)) pstr->mbs += offset; } pstr->raw_mbs_idx = idx; pstr->len -= offset; pstr->stop -= offset; /* Then build the buffers. */ #ifdef RE_ENABLE_I18N if (pstr->mb_cur_max > 1) { if (pstr->icase) { int ret = build_wcs_upper_buffer (pstr); if (BE (ret != REG_NOERROR, 0)) return ret; } else build_wcs_buffer (pstr); } else #endif /* RE_ENABLE_I18N */ if (BE (pstr->mbs_allocated, 0)) { if (pstr->icase) build_upper_buffer (pstr); else if (pstr->trans != NULL) re_string_translate_buffer (pstr); } else pstr->valid_len = pstr->len; pstr->cur_idx = 0; return REG_NOERROR; } static unsigned char re_string_peek_byte_case (pstr, idx) const re_string_t *pstr; int idx; { int ch, off; /* Handle the common (easiest) cases first. */ if (BE (!pstr->mbs_allocated, 1)) return re_string_peek_byte (pstr, idx); #ifdef RE_ENABLE_I18N if (pstr->mb_cur_max > 1 && ! re_string_is_single_byte_char (pstr, pstr->cur_idx + idx)) return re_string_peek_byte (pstr, idx); #endif off = pstr->cur_idx + idx; #ifdef RE_ENABLE_I18N if (pstr->offsets_needed) off = pstr->offsets[off]; #endif ch = pstr->raw_mbs[pstr->raw_mbs_idx + off]; #ifdef RE_ENABLE_I18N /* Ensure that e.g. for tr_TR.UTF-8 BACKSLASH DOTLESS SMALL LETTER I this function returns CAPITAL LETTER I instead of first byte of DOTLESS SMALL LETTER I. The latter would confuse the parser, since peek_byte_case doesn't advance cur_idx in any way. */ if (pstr->offsets_needed && !isascii (ch)) return re_string_peek_byte (pstr, idx); #endif return ch; } static unsigned char re_string_fetch_byte_case (pstr) re_string_t *pstr; { if (BE (!pstr->mbs_allocated, 1)) return re_string_fetch_byte (pstr); #ifdef RE_ENABLE_I18N if (pstr->offsets_needed) { int off, ch; /* For tr_TR.UTF-8 [[:islower:]] there is [[: CAPITAL LETTER I WITH DOT lower:]] in mbs. Skip in that case the whole multi-byte character and return the original letter. On the other side, with [[: DOTLESS SMALL LETTER I return [[:I, as doing anything else would complicate things too much. */ if (!re_string_first_byte (pstr, pstr->cur_idx)) return re_string_fetch_byte (pstr); off = pstr->offsets[pstr->cur_idx]; ch = pstr->raw_mbs[pstr->raw_mbs_idx + off]; if (! isascii (ch)) return re_string_fetch_byte (pstr); re_string_skip_bytes (pstr, re_string_char_size_at (pstr, pstr->cur_idx)); return ch; } #endif return pstr->raw_mbs[pstr->raw_mbs_idx + pstr->cur_idx++]; } static void re_string_destruct (pstr) re_string_t *pstr; { #ifdef RE_ENABLE_I18N re_free (pstr->wcs); re_free (pstr->offsets); #endif /* RE_ENABLE_I18N */ if (pstr->mbs_allocated) re_free (pstr->mbs); } /* Return the context at IDX in INPUT. */ static unsigned int re_string_context_at (input, idx, eflags) const re_string_t *input; int idx, eflags; { int c; if (BE (idx < 0, 0)) /* In this case, we use the value stored in input->tip_context, since we can't know the character in input->mbs[-1] here. */ return input->tip_context; if (BE (idx == input->len, 0)) return ((eflags & REG_NOTEOL) ? CONTEXT_ENDBUF : CONTEXT_NEWLINE | CONTEXT_ENDBUF); #ifdef RE_ENABLE_I18N if (input->mb_cur_max > 1) { wint_t wc; int wc_idx = idx; while(input->wcs[wc_idx] == WEOF) { #ifdef DEBUG /* It must not happen. */ assert (wc_idx >= 0); #endif --wc_idx; if (wc_idx < 0) return input->tip_context; } wc = input->wcs[wc_idx]; if (BE (input->word_ops_used != 0, 0) && IS_WIDE_WORD_CHAR (wc)) return CONTEXT_WORD; return (IS_WIDE_NEWLINE (wc) && input->newline_anchor ? CONTEXT_NEWLINE : 0); } else #endif { c = re_string_byte_at (input, idx); if (bitset_contain (input->word_char, c)) return CONTEXT_WORD; return IS_NEWLINE (c) && input->newline_anchor ? CONTEXT_NEWLINE : 0; } } /* Functions for set operation. */ static reg_errcode_t re_node_set_alloc (set, size) re_node_set *set; int size; { set->alloc = size; set->nelem = 0; set->elems = re_malloc (int, size); if (BE (set->elems == NULL, 0)) return REG_ESPACE; return REG_NOERROR; } static reg_errcode_t re_node_set_init_1 (set, elem) re_node_set *set; int elem; { set->alloc = 1; set->nelem = 1; set->elems = re_malloc (int, 1); if (BE (set->elems == NULL, 0)) { set->alloc = set->nelem = 0; return REG_ESPACE; } set->elems[0] = elem; return REG_NOERROR; } static reg_errcode_t re_node_set_init_2 (set, elem1, elem2) re_node_set *set; int elem1, elem2; { set->alloc = 2; set->elems = re_malloc (int, 2); if (BE (set->elems == NULL, 0)) return REG_ESPACE; if (elem1 == elem2) { set->nelem = 1; set->elems[0] = elem1; } else { set->nelem = 2; if (elem1 < elem2) { set->elems[0] = elem1; set->elems[1] = elem2; } else { set->elems[0] = elem2; set->elems[1] = elem1; } } return REG_NOERROR; } static reg_errcode_t re_node_set_init_copy (dest, src) re_node_set *dest; const re_node_set *src; { dest->nelem = src->nelem; if (src->nelem > 0) { dest->alloc = dest->nelem; dest->elems = re_malloc (int, dest->alloc); if (BE (dest->elems == NULL, 0)) { dest->alloc = dest->nelem = 0; return REG_ESPACE; } memcpy (dest->elems, src->elems, src->nelem * sizeof (int)); } else re_node_set_init_empty (dest); return REG_NOERROR; } /* Calculate the intersection of the sets SRC1 and SRC2. And merge it to DEST. Return value indicate the error code or REG_NOERROR if succeeded. Note: We assume dest->elems is NULL, when dest->alloc is 0. */ static reg_errcode_t re_node_set_add_intersect (dest, src1, src2) re_node_set *dest; const re_node_set *src1, *src2; { int i1, i2, is, id, delta, sbase; if (src1->nelem == 0 || src2->nelem == 0) return REG_NOERROR; /* We need dest->nelem + 2 * elems_in_intersection; this is a conservative estimate. */ if (src1->nelem + src2->nelem + dest->nelem > dest->alloc) { int new_alloc = src1->nelem + src2->nelem + dest->alloc; int *new_elems = re_realloc (dest->elems, int, new_alloc); if (BE (new_elems == NULL, 0)) return REG_ESPACE; dest->elems = new_elems; dest->alloc = new_alloc; } /* Find the items in the intersection of SRC1 and SRC2, and copy into the top of DEST those that are not already in DEST itself. */ sbase = dest->nelem + src1->nelem + src2->nelem; i1 = src1->nelem - 1; i2 = src2->nelem - 1; id = dest->nelem - 1; for (;;) { if (src1->elems[i1] == src2->elems[i2]) { /* Try to find the item in DEST. Maybe we could binary search? */ while (id >= 0 && dest->elems[id] > src1->elems[i1]) --id; if (id < 0 || dest->elems[id] != src1->elems[i1]) dest->elems[--sbase] = src1->elems[i1]; if (--i1 < 0 || --i2 < 0) break; } /* Lower the highest of the two items. */ else if (src1->elems[i1] < src2->elems[i2]) { if (--i2 < 0) break; } else { if (--i1 < 0) break; } } id = dest->nelem - 1; is = dest->nelem + src1->nelem + src2->nelem - 1; delta = is - sbase + 1; /* Now copy. When DELTA becomes zero, the remaining DEST elements are already in place; this is more or less the same loop that is in re_node_set_merge. */ dest->nelem += delta; if (delta > 0 && id >= 0) for (;;) { if (dest->elems[is] > dest->elems[id]) { /* Copy from the top. */ dest->elems[id + delta--] = dest->elems[is--]; if (delta == 0) break; } else { /* Slide from the bottom. */ dest->elems[id + delta] = dest->elems[id]; if (--id < 0) break; } } /* Copy remaining SRC elements. */ memcpy (dest->elems, dest->elems + sbase, delta * sizeof (int)); return REG_NOERROR; } /* Calculate the union set of the sets SRC1 and SRC2. And store it to DEST. Return value indicate the error code or REG_NOERROR if succeeded. */ static reg_errcode_t re_node_set_init_union (dest, src1, src2) re_node_set *dest; const re_node_set *src1, *src2; { int i1, i2, id; if (src1 != NULL && src1->nelem > 0 && src2 != NULL && src2->nelem > 0) { dest->alloc = src1->nelem + src2->nelem; dest->elems = re_malloc (int, dest->alloc); if (BE (dest->elems == NULL, 0)) return REG_ESPACE; } else { if (src1 != NULL && src1->nelem > 0) return re_node_set_init_copy (dest, src1); else if (src2 != NULL && src2->nelem > 0) return re_node_set_init_copy (dest, src2); else re_node_set_init_empty (dest); return REG_NOERROR; } for (i1 = i2 = id = 0 ; i1 < src1->nelem && i2 < src2->nelem ;) { if (src1->elems[i1] > src2->elems[i2]) { dest->elems[id++] = src2->elems[i2++]; continue; } if (src1->elems[i1] == src2->elems[i2]) ++i2; dest->elems[id++] = src1->elems[i1++]; } if (i1 < src1->nelem) { memcpy (dest->elems + id, src1->elems + i1, (src1->nelem - i1) * sizeof (int)); id += src1->nelem - i1; } else if (i2 < src2->nelem) { memcpy (dest->elems + id, src2->elems + i2, (src2->nelem - i2) * sizeof (int)); id += src2->nelem - i2; } dest->nelem = id; return REG_NOERROR; } /* Calculate the union set of the sets DEST and SRC. And store it to DEST. Return value indicate the error code or REG_NOERROR if succeeded. */ static reg_errcode_t re_node_set_merge (dest, src) re_node_set *dest; const re_node_set *src; { int is, id, sbase, delta; if (src == NULL || src->nelem == 0) return REG_NOERROR; if (dest->alloc < 2 * src->nelem + dest->nelem) { int new_alloc = 2 * (src->nelem + dest->alloc); int *new_buffer = re_realloc (dest->elems, int, new_alloc); if (BE (new_buffer == NULL, 0)) return REG_ESPACE; dest->elems = new_buffer; dest->alloc = new_alloc; } if (BE (dest->nelem == 0, 0)) { dest->nelem = src->nelem; memcpy (dest->elems, src->elems, src->nelem * sizeof (int)); return REG_NOERROR; } /* Copy into the top of DEST the items of SRC that are not found in DEST. Maybe we could binary search in DEST? */ for (sbase = dest->nelem + 2 * src->nelem, is = src->nelem - 1, id = dest->nelem - 1; is >= 0 && id >= 0; ) { if (dest->elems[id] == src->elems[is]) is--, id--; else if (dest->elems[id] < src->elems[is]) dest->elems[--sbase] = src->elems[is--]; else /* if (dest->elems[id] > src->elems[is]) */ --id; } if (is >= 0) { /* If DEST is exhausted, the remaining items of SRC must be unique. */ sbase -= is + 1; memcpy (dest->elems + sbase, src->elems, (is + 1) * sizeof (int)); } id = dest->nelem - 1; is = dest->nelem + 2 * src->nelem - 1; delta = is - sbase + 1; if (delta == 0) return REG_NOERROR; /* Now copy. When DELTA becomes zero, the remaining DEST elements are already in place. */ dest->nelem += delta; for (;;) { if (dest->elems[is] > dest->elems[id]) { /* Copy from the top. */ dest->elems[id + delta--] = dest->elems[is--]; if (delta == 0) break; } else { /* Slide from the bottom. */ dest->elems[id + delta] = dest->elems[id]; if (--id < 0) { /* Copy remaining SRC elements. */ memcpy (dest->elems, dest->elems + sbase, delta * sizeof (int)); break; } } } return REG_NOERROR; } /* Insert the new element ELEM to the re_node_set* SET. SET should not already have ELEM. return -1 if an error is occured, return 1 otherwise. */ static int re_node_set_insert (set, elem) re_node_set *set; int elem; { int idx; /* In case the set is empty. */ if (set->alloc == 0) { if (BE (re_node_set_init_1 (set, elem) == REG_NOERROR, 1)) return 1; else return -1; } if (BE (set->nelem, 0) == 0) { /* We already guaranteed above that set->alloc != 0. */ set->elems[0] = elem; ++set->nelem; return 1; } /* Realloc if we need. */ if (set->alloc == set->nelem) { int *new_array; set->alloc = set->alloc * 2; new_array = re_realloc (set->elems, int, set->alloc); if (BE (new_array == NULL, 0)) return -1; set->elems = new_array; } /* Move the elements which follows the new element. Test the first element separately to skip a check in the inner loop. */ if (elem < set->elems[0]) { idx = 0; for (idx = set->nelem; idx > 0; idx--) set->elems[idx] = set->elems[idx - 1]; } else { for (idx = set->nelem; set->elems[idx - 1] > elem; idx--) set->elems[idx] = set->elems[idx - 1]; } /* Insert the new element. */ set->elems[idx] = elem; ++set->nelem; return 1; } /* Insert the new element ELEM to the re_node_set* SET. SET should not already have any element greater than or equal to ELEM. Return -1 if an error is occured, return 1 otherwise. */ static int re_node_set_insert_last (set, elem) re_node_set *set; int elem; { /* Realloc if we need. */ if (set->alloc == set->nelem) { int *new_array; set->alloc = (set->alloc + 1) * 2; new_array = re_realloc (set->elems, int, set->alloc); if (BE (new_array == NULL, 0)) return -1; set->elems = new_array; } /* Insert the new element. */ set->elems[set->nelem++] = elem; return 1; } /* Compare two node sets SET1 and SET2. return 1 if SET1 and SET2 are equivalent, return 0 otherwise. */ static int re_node_set_compare (set1, set2) const re_node_set *set1, *set2; { int i; if (set1 == NULL || set2 == NULL || set1->nelem != set2->nelem) return 0; for (i = set1->nelem ; --i >= 0 ; ) if (set1->elems[i] != set2->elems[i]) return 0; return 1; } /* Return (idx + 1) if SET contains the element ELEM, return 0 otherwise. */ static int re_node_set_contains (set, elem) const re_node_set *set; int elem; { unsigned int idx, right, mid; if (set->nelem <= 0) return 0; /* Binary search the element. */ idx = 0; right = set->nelem - 1; while (idx < right) { mid = (idx + right) / 2; if (set->elems[mid] < elem) idx = mid + 1; else right = mid; } return set->elems[idx] == elem ? idx + 1 : 0; } static void re_node_set_remove_at (set, idx) re_node_set *set; int idx; { if (idx < 0 || idx >= set->nelem) return; --set->nelem; for (; idx < set->nelem; idx++) set->elems[idx] = set->elems[idx + 1]; } /* Add the token TOKEN to dfa->nodes, and return the index of the token. Or return -1, if an error will be occured. */ static int re_dfa_add_node (dfa, token, mode) re_dfa_t *dfa; re_token_t token; int mode; { if (BE (dfa->nodes_len >= dfa->nodes_alloc, 0)) { int new_nodes_alloc = dfa->nodes_alloc * 2; re_token_t *new_array = re_realloc (dfa->nodes, re_token_t, new_nodes_alloc); if (BE (new_array == NULL, 0)) return -1; dfa->nodes = new_array; if (mode) { int *new_nexts, *new_indices; re_node_set *new_edests, *new_eclosures, *new_inveclosures; new_nexts = re_realloc (dfa->nexts, int, new_nodes_alloc); new_indices = re_realloc (dfa->org_indices, int, new_nodes_alloc); new_edests = re_realloc (dfa->edests, re_node_set, new_nodes_alloc); new_eclosures = re_realloc (dfa->eclosures, re_node_set, new_nodes_alloc); new_inveclosures = re_realloc (dfa->inveclosures, re_node_set, new_nodes_alloc); if (BE (new_nexts == NULL || new_indices == NULL || new_edests == NULL || new_eclosures == NULL || new_inveclosures == NULL, 0)) return -1; dfa->nexts = new_nexts; dfa->org_indices = new_indices; dfa->edests = new_edests; dfa->eclosures = new_eclosures; dfa->inveclosures = new_inveclosures; } dfa->nodes_alloc = new_nodes_alloc; } dfa->nodes[dfa->nodes_len] = token; dfa->nodes[dfa->nodes_len].opt_subexp = 0; dfa->nodes[dfa->nodes_len].duplicated = 0; dfa->nodes[dfa->nodes_len].constraint = 0; return dfa->nodes_len++; } static unsigned int inline calc_state_hash (nodes, context) const re_node_set *nodes; unsigned int context; { unsigned int hash = nodes->nelem + context; int i; for (i = 0 ; i < nodes->nelem ; i++) hash += nodes->elems[i]; return hash; } /* Search for the state whose node_set is equivalent to NODES. Return the pointer to the state, if we found it in the DFA. Otherwise create the new one and return it. In case of an error return NULL and set the error code in ERR. Note: - We assume NULL as the invalid state, then it is possible that return value is NULL and ERR is REG_NOERROR. - We never return non-NULL value in case of any errors, it is for optimization. */ static re_dfastate_t* re_acquire_state (err, dfa, nodes) reg_errcode_t *err; re_dfa_t *dfa; const re_node_set *nodes; { unsigned int hash; re_dfastate_t *new_state; struct re_state_table_entry *spot; int i; if (BE (nodes->nelem == 0, 0)) { *err = REG_NOERROR; return NULL; } hash = calc_state_hash (nodes, 0); spot = dfa->state_table + (hash & dfa->state_hash_mask); for (i = 0 ; i < spot->num ; i++) { re_dfastate_t *state = spot->array[i]; if (hash != state->hash) continue; if (re_node_set_compare (&state->nodes, nodes)) return state; } /* There are no appropriate state in the dfa, create the new one. */ new_state = create_ci_newstate (dfa, nodes, hash); if (BE (new_state != NULL, 1)) return new_state; else { *err = REG_ESPACE; return NULL; } } /* Search for the state whose node_set is equivalent to NODES and whose context is equivalent to CONTEXT. Return the pointer to the state, if we found it in the DFA. Otherwise create the new one and return it. In case of an error return NULL and set the error code in ERR. Note: - We assume NULL as the invalid state, then it is possible that return value is NULL and ERR is REG_NOERROR. - We never return non-NULL value in case of any errors, it is for optimization. */ static re_dfastate_t* re_acquire_state_context (err, dfa, nodes, context) reg_errcode_t *err; re_dfa_t *dfa; const re_node_set *nodes; unsigned int context; { unsigned int hash; re_dfastate_t *new_state; struct re_state_table_entry *spot; int i; if (nodes->nelem == 0) { *err = REG_NOERROR; return NULL; } hash = calc_state_hash (nodes, context); spot = dfa->state_table + (hash & dfa->state_hash_mask); for (i = 0 ; i < spot->num ; i++) { re_dfastate_t *state = spot->array[i]; if (state->hash == hash && state->context == context && re_node_set_compare (state->entrance_nodes, nodes)) return state; } /* There are no appropriate state in `dfa', create the new one. */ new_state = create_cd_newstate (dfa, nodes, context, hash); if (BE (new_state != NULL, 1)) return new_state; else { *err = REG_ESPACE; return NULL; } } /* Finish initialization of the new state NEWSTATE, and using its hash value HASH put in the appropriate bucket of DFA's state table. Return value indicates the error code if failed. */ static reg_errcode_t register_state (dfa, newstate, hash) re_dfa_t *dfa; re_dfastate_t *newstate; unsigned int hash; { struct re_state_table_entry *spot; reg_errcode_t err; int i; newstate->hash = hash; err = re_node_set_alloc (&newstate->non_eps_nodes, newstate->nodes.nelem); if (BE (err != REG_NOERROR, 0)) return REG_ESPACE; for (i = 0; i < newstate->nodes.nelem; i++) { int elem = newstate->nodes.elems[i]; if (!IS_EPSILON_NODE (dfa->nodes[elem].type)) re_node_set_insert_last (&newstate->non_eps_nodes, elem); } spot = dfa->state_table + (hash & dfa->state_hash_mask); if (BE (spot->alloc <= spot->num, 0)) { int new_alloc = 2 * spot->num + 2; re_dfastate_t **new_array = re_realloc (spot->array, re_dfastate_t *, new_alloc); if (BE (new_array == NULL, 0)) return REG_ESPACE; spot->array = new_array; spot->alloc = new_alloc; } spot->array[spot->num++] = newstate; return REG_NOERROR; } /* Create the new state which is independ of contexts. Return the new state if succeeded, otherwise return NULL. */ static re_dfastate_t * create_ci_newstate (dfa, nodes, hash) re_dfa_t *dfa; const re_node_set *nodes; unsigned int hash; { int i; reg_errcode_t err; re_dfastate_t *newstate; newstate = (re_dfastate_t *) calloc (sizeof (re_dfastate_t), 1); if (BE (newstate == NULL, 0)) return NULL; err = re_node_set_init_copy (&newstate->nodes, nodes); if (BE (err != REG_NOERROR, 0)) { re_free (newstate); return NULL; } newstate->entrance_nodes = &newstate->nodes; for (i = 0 ; i < nodes->nelem ; i++) { re_token_t *node = dfa->nodes + nodes->elems[i]; re_token_type_t type = node->type; if (type == CHARACTER && !node->constraint) continue; /* If the state has the halt node, the state is a halt state. */ else if (type == END_OF_RE) newstate->halt = 1; #ifdef RE_ENABLE_I18N else if (type == COMPLEX_BRACKET || type == OP_UTF8_PERIOD || (type == OP_PERIOD && dfa->mb_cur_max > 1)) newstate->accept_mb = 1; #endif /* RE_ENABLE_I18N */ else if (type == OP_BACK_REF) newstate->has_backref = 1; else if (type == ANCHOR || node->constraint) newstate->has_constraint = 1; } err = register_state (dfa, newstate, hash); if (BE (err != REG_NOERROR, 0)) { free_state (newstate); newstate = NULL; } return newstate; } /* Create the new state which is depend on the context CONTEXT. Return the new state if succeeded, otherwise return NULL. */ static re_dfastate_t * create_cd_newstate (dfa, nodes, context, hash) re_dfa_t *dfa; const re_node_set *nodes; unsigned int context, hash; { int i, nctx_nodes = 0; reg_errcode_t err; re_dfastate_t *newstate; newstate = (re_dfastate_t *) calloc (sizeof (re_dfastate_t), 1); if (BE (newstate == NULL, 0)) return NULL; err = re_node_set_init_copy (&newstate->nodes, nodes); if (BE (err != REG_NOERROR, 0)) { re_free (newstate); return NULL; } newstate->context = context; newstate->entrance_nodes = &newstate->nodes; for (i = 0 ; i < nodes->nelem ; i++) { unsigned int constraint = 0; re_token_t *node = dfa->nodes + nodes->elems[i]; re_token_type_t type = node->type; if (node->constraint) constraint = node->constraint; if (type == CHARACTER && !constraint) continue; /* If the state has the halt node, the state is a halt state. */ else if (type == END_OF_RE) newstate->halt = 1; #ifdef RE_ENABLE_I18N else if (type == COMPLEX_BRACKET || type == OP_UTF8_PERIOD || (type == OP_PERIOD && dfa->mb_cur_max > 1)) newstate->accept_mb = 1; #endif /* RE_ENABLE_I18N */ else if (type == OP_BACK_REF) newstate->has_backref = 1; else if (type == ANCHOR) constraint = node->opr.ctx_type; if (constraint) { if (newstate->entrance_nodes == &newstate->nodes) { newstate->entrance_nodes = re_malloc (re_node_set, 1); if (BE (newstate->entrance_nodes == NULL, 0)) { free_state (newstate); return NULL; } re_node_set_init_copy (newstate->entrance_nodes, nodes); nctx_nodes = 0; newstate->has_constraint = 1; } if (NOT_SATISFY_PREV_CONSTRAINT (constraint,context)) { re_node_set_remove_at (&newstate->nodes, i - nctx_nodes); ++nctx_nodes; } } } err = register_state (dfa, newstate, hash); if (BE (err != REG_NOERROR, 0)) { free_state (newstate); newstate = NULL; } return newstate; } static void free_state (state) re_dfastate_t *state; { re_node_set_free (&state->non_eps_nodes); re_node_set_free (&state->inveclosure); if (state->entrance_nodes != &state->nodes) { re_node_set_free (state->entrance_nodes); re_free (state->entrance_nodes); } re_node_set_free (&state->nodes); re_free (state->trtable); re_free (state); } Yeti-6.4.0/regex/glibc/regex_internal.h000066400000000000000000000601441253351442600200160ustar00rootroot00000000000000/* Extended regular expression matching and search library. Copyright (C) 2002, 2003, 2004 Free Software Foundation, Inc. This file is part of the GNU C Library. Contributed by Isamu Hasegawa . The GNU C Library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version. The GNU C Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. You should have received a copy of the GNU Lesser General Public License along with the GNU C Library; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA. */ #ifndef _REGEX_INTERNAL_H #define _REGEX_INTERNAL_H 1 #include #include #include #include #include #if defined HAVE_LANGINFO_H || defined HAVE_LANGINFO_CODESET || defined _LIBC # include #endif #if defined HAVE_LOCALE_H || defined _LIBC # include #endif #if defined HAVE_WCHAR_H || defined _LIBC # include #endif /* HAVE_WCHAR_H || _LIBC */ #if defined HAVE_WCTYPE_H || defined _LIBC # include #endif /* HAVE_WCTYPE_H || _LIBC */ /* In case that the system doesn't have isblank(). */ #if !defined _LIBC && !defined HAVE_ISBLANK && !defined isblank # define isblank(ch) ((ch) == ' ' || (ch) == '\t') #endif #ifdef _LIBC # ifndef _RE_DEFINE_LOCALE_FUNCTIONS # define _RE_DEFINE_LOCALE_FUNCTIONS 1 # include # include # include # endif #endif /* This is for other GNU distributions with internationalized messages. */ #if (HAVE_LIBINTL_H && ENABLE_NLS) || defined _LIBC # include # ifdef _LIBC # undef gettext # define gettext(msgid) \ INTUSE(__dcgettext) (INTUSE(_libc_intl_domainname), msgid, LC_MESSAGES) # endif #else # define gettext(msgid) (msgid) #endif #ifndef gettext_noop /* This define is so xgettext can find the internationalizable strings. */ # define gettext_noop(String) String #endif #if (defined MB_CUR_MAX && HAVE_LOCALE_H && HAVE_WCTYPE_H && HAVE_WCHAR_H && HAVE_WCRTOMB && HAVE_MBRTOWC && HAVE_WCSCOLL) || _LIBC # define RE_ENABLE_I18N #endif #if __GNUC__ >= 3 # define BE(expr, val) __builtin_expect (expr, val) #else # define BE(expr, val) (expr) # define inline #endif /* Number of bits in a byte. */ #define BYTE_BITS 8 /* Number of single byte character. */ #define SBC_MAX 256 #define COLL_ELEM_LEN_MAX 8 /* The character which represents newline. */ #define NEWLINE_CHAR '\n' #define WIDE_NEWLINE_CHAR L'\n' /* Rename to standard API for using out of glibc. */ #ifndef _LIBC # define __wctype wctype # define __iswctype iswctype # define __btowc btowc # define __mempcpy mempcpy # define __wcrtomb wcrtomb # define __regfree regfree # define attribute_hidden #endif /* not _LIBC */ #ifdef __GNUC__ # define __attribute(arg) __attribute__ (arg) #else # define __attribute(arg) #endif extern const char __re_error_msgid[] attribute_hidden; extern const size_t __re_error_msgid_idx[] attribute_hidden; /* Number of bits in an unsinged int. */ #define UINT_BITS (sizeof (unsigned int) * BYTE_BITS) /* Number of unsigned int in an bit_set. */ #define BITSET_UINTS ((SBC_MAX + UINT_BITS - 1) / UINT_BITS) typedef unsigned int bitset[BITSET_UINTS]; typedef unsigned int *re_bitset_ptr_t; typedef const unsigned int *re_const_bitset_ptr_t; #define bitset_set(set,i) (set[i / UINT_BITS] |= 1 << i % UINT_BITS) #define bitset_clear(set,i) (set[i / UINT_BITS] &= ~(1 << i % UINT_BITS)) #define bitset_contain(set,i) (set[i / UINT_BITS] & (1 << i % UINT_BITS)) #define bitset_empty(set) memset (set, 0, sizeof (unsigned int) * BITSET_UINTS) #define bitset_set_all(set) \ memset (set, 255, sizeof (unsigned int) * BITSET_UINTS) #define bitset_copy(dest,src) \ memcpy (dest, src, sizeof (unsigned int) * BITSET_UINTS) static inline void bitset_not (bitset set); static inline void bitset_merge (bitset dest, const bitset src); static inline void bitset_not_merge (bitset dest, const bitset src); static inline void bitset_mask (bitset dest, const bitset src); #define PREV_WORD_CONSTRAINT 0x0001 #define PREV_NOTWORD_CONSTRAINT 0x0002 #define NEXT_WORD_CONSTRAINT 0x0004 #define NEXT_NOTWORD_CONSTRAINT 0x0008 #define PREV_NEWLINE_CONSTRAINT 0x0010 #define NEXT_NEWLINE_CONSTRAINT 0x0020 #define PREV_BEGBUF_CONSTRAINT 0x0040 #define NEXT_ENDBUF_CONSTRAINT 0x0080 #define WORD_DELIM_CONSTRAINT 0x0100 #define NOT_WORD_DELIM_CONSTRAINT 0x0200 typedef enum { INSIDE_WORD = PREV_WORD_CONSTRAINT | NEXT_WORD_CONSTRAINT, WORD_FIRST = PREV_NOTWORD_CONSTRAINT | NEXT_WORD_CONSTRAINT, WORD_LAST = PREV_WORD_CONSTRAINT | NEXT_NOTWORD_CONSTRAINT, INSIDE_NOTWORD = PREV_NOTWORD_CONSTRAINT | NEXT_NOTWORD_CONSTRAINT, LINE_FIRST = PREV_NEWLINE_CONSTRAINT, LINE_LAST = NEXT_NEWLINE_CONSTRAINT, BUF_FIRST = PREV_BEGBUF_CONSTRAINT, BUF_LAST = NEXT_ENDBUF_CONSTRAINT, WORD_DELIM = WORD_DELIM_CONSTRAINT, NOT_WORD_DELIM = NOT_WORD_DELIM_CONSTRAINT } re_context_type; typedef struct { int alloc; int nelem; int *elems; } re_node_set; typedef enum { NON_TYPE = 0, /* Node type, These are used by token, node, tree. */ CHARACTER = 1, END_OF_RE = 2, SIMPLE_BRACKET = 3, OP_BACK_REF = 4, OP_PERIOD = 5, #ifdef RE_ENABLE_I18N COMPLEX_BRACKET = 6, OP_UTF8_PERIOD = 7, #endif /* RE_ENABLE_I18N */ /* We define EPSILON_BIT as a macro so that OP_OPEN_SUBEXP is used when the debugger shows values of this enum type. */ #define EPSILON_BIT 8 OP_OPEN_SUBEXP = EPSILON_BIT | 0, OP_CLOSE_SUBEXP = EPSILON_BIT | 1, OP_ALT = EPSILON_BIT | 2, OP_DUP_ASTERISK = EPSILON_BIT | 3, OP_DUP_PLUS = EPSILON_BIT | 4, OP_DUP_QUESTION = EPSILON_BIT | 5, ANCHOR = EPSILON_BIT | 6, OP_DELETED_SUBEXP = EPSILON_BIT | 7, /* Tree type, these are used only by tree. */ CONCAT = 16, /* Token type, these are used only by token. */ OP_OPEN_BRACKET = 17, OP_CLOSE_BRACKET, OP_CHARSET_RANGE, OP_OPEN_DUP_NUM, OP_CLOSE_DUP_NUM, OP_NON_MATCH_LIST, OP_OPEN_COLL_ELEM, OP_CLOSE_COLL_ELEM, OP_OPEN_EQUIV_CLASS, OP_CLOSE_EQUIV_CLASS, OP_OPEN_CHAR_CLASS, OP_CLOSE_CHAR_CLASS, OP_WORD, OP_NOTWORD, OP_SPACE, OP_NOTSPACE, BACK_SLASH } re_token_type_t; #ifdef RE_ENABLE_I18N typedef struct { /* Multibyte characters. */ wchar_t *mbchars; /* Collating symbols. */ # ifdef _LIBC int32_t *coll_syms; # endif /* Equivalence classes. */ # ifdef _LIBC int32_t *equiv_classes; # endif /* Range expressions. */ # ifdef _LIBC uint32_t *range_starts; uint32_t *range_ends; # else /* not _LIBC */ wchar_t *range_starts; wchar_t *range_ends; # endif /* not _LIBC */ /* Character classes. */ wctype_t *char_classes; /* If this character set is the non-matching list. */ unsigned int non_match : 1; /* # of multibyte characters. */ int nmbchars; /* # of collating symbols. */ int ncoll_syms; /* # of equivalence classes. */ int nequiv_classes; /* # of range expressions. */ int nranges; /* # of character classes. */ int nchar_classes; } re_charset_t; #endif /* RE_ENABLE_I18N */ typedef struct { union { unsigned char c; /* for CHARACTER */ re_bitset_ptr_t sbcset; /* for SIMPLE_BRACKET */ #ifdef RE_ENABLE_I18N re_charset_t *mbcset; /* for COMPLEX_BRACKET */ #endif /* RE_ENABLE_I18N */ int idx; /* for BACK_REF */ re_context_type ctx_type; /* for ANCHOR */ } opr; #if __GNUC__ >= 2 re_token_type_t type : 8; #else re_token_type_t type; #endif unsigned int constraint : 10; /* context constraint */ unsigned int duplicated : 1; unsigned int opt_subexp : 1; #ifdef RE_ENABLE_I18N /* These 2 bits can be moved into the union if needed (e.g. if running out of bits; move opr.c to opr.c.c and move the flags to opr.c.flags). */ unsigned int mb_partial : 1; #endif unsigned int word_char : 1; } re_token_t; #define IS_EPSILON_NODE(type) ((type) & EPSILON_BIT) #define ACCEPT_MB_NODE(type) \ ((type) >= OP_PERIOD && (type) <= OP_UTF8_PERIOD) struct re_string_t { /* Indicate the raw buffer which is the original string passed as an argument of regexec(), re_search(), etc.. */ const unsigned char *raw_mbs; /* Store the multibyte string. In case of "case insensitive mode" like REG_ICASE, upper cases of the string are stored, otherwise MBS points the same address that RAW_MBS points. */ unsigned char *mbs; #ifdef RE_ENABLE_I18N /* Store the wide character string which is corresponding to MBS. */ wint_t *wcs; int *offsets; mbstate_t cur_state; #endif /* Index in RAW_MBS. Each character mbs[i] corresponds to raw_mbs[raw_mbs_idx + i]. */ int raw_mbs_idx; /* The length of the valid characters in the buffers. */ int valid_len; /* The corresponding number of bytes in raw_mbs array. */ int valid_raw_len; /* The length of the buffers MBS and WCS. */ int bufs_len; /* The index in MBS, which is updated by re_string_fetch_byte. */ int cur_idx; /* length of RAW_MBS array. */ int raw_len; /* This is RAW_LEN - RAW_MBS_IDX + VALID_LEN - VALID_RAW_LEN. */ int len; /* End of the buffer may be shorter than its length in the cases such as re_match_2, re_search_2. Then, we use STOP for end of the buffer instead of LEN. */ int raw_stop; /* This is RAW_STOP - RAW_MBS_IDX adjusted through OFFSETS. */ int stop; /* The context of mbs[0]. We store the context independently, since the context of mbs[0] may be different from raw_mbs[0], which is the beginning of the input string. */ unsigned int tip_context; /* The translation passed as a part of an argument of re_compile_pattern. */ unsigned RE_TRANSLATE_TYPE trans; /* Copy of re_dfa_t's word_char. */ re_const_bitset_ptr_t word_char; /* 1 if REG_ICASE. */ unsigned char icase; unsigned char is_utf8; unsigned char map_notascii; unsigned char mbs_allocated; unsigned char offsets_needed; unsigned char newline_anchor; unsigned char word_ops_used; int mb_cur_max; }; typedef struct re_string_t re_string_t; struct re_dfa_t; typedef struct re_dfa_t re_dfa_t; #ifndef _LIBC # ifdef __i386__ # define internal_function __attribute ((regparm (3), stdcall)) # else # define internal_function # endif #endif #ifndef RE_NO_INTERNAL_PROTOTYPES static reg_errcode_t re_string_allocate (re_string_t *pstr, const char *str, int len, int init_len, RE_TRANSLATE_TYPE trans, int icase, const re_dfa_t *dfa) internal_function; static reg_errcode_t re_string_construct (re_string_t *pstr, const char *str, int len, RE_TRANSLATE_TYPE trans, int icase, const re_dfa_t *dfa) internal_function; static reg_errcode_t re_string_reconstruct (re_string_t *pstr, int idx, int eflags) internal_function; static reg_errcode_t re_string_realloc_buffers (re_string_t *pstr, int new_buf_len) internal_function; # ifdef RE_ENABLE_I18N static void build_wcs_buffer (re_string_t *pstr) internal_function; static int build_wcs_upper_buffer (re_string_t *pstr) internal_function; # endif /* RE_ENABLE_I18N */ static void build_upper_buffer (re_string_t *pstr) internal_function; static void re_string_translate_buffer (re_string_t *pstr) internal_function; static void re_string_destruct (re_string_t *pstr) internal_function; # ifdef RE_ENABLE_I18N static int re_string_elem_size_at (const re_string_t *pstr, int idx) internal_function __attribute ((pure)); static inline int re_string_char_size_at (const re_string_t *pstr, int idx) internal_function __attribute ((pure)); static inline wint_t re_string_wchar_at (const re_string_t *pstr, int idx) internal_function __attribute ((pure)); # endif /* RE_ENABLE_I18N */ static unsigned int re_string_context_at (const re_string_t *input, int idx, int eflags) internal_function __attribute ((pure)); static unsigned char re_string_peek_byte_case (const re_string_t *pstr, int idx) internal_function __attribute ((pure)); static unsigned char re_string_fetch_byte_case (re_string_t *pstr) internal_function __attribute ((pure)); #endif #define re_string_peek_byte(pstr, offset) \ ((pstr)->mbs[(pstr)->cur_idx + offset]) #define re_string_fetch_byte(pstr) \ ((pstr)->mbs[(pstr)->cur_idx++]) #define re_string_first_byte(pstr, idx) \ ((idx) == (pstr)->valid_len || (pstr)->wcs[idx] != WEOF) #define re_string_is_single_byte_char(pstr, idx) \ ((pstr)->wcs[idx] != WEOF && ((pstr)->valid_len == (idx) + 1 \ || (pstr)->wcs[(idx) + 1] != WEOF)) #define re_string_eoi(pstr) ((pstr)->stop <= (pstr)->cur_idx) #define re_string_cur_idx(pstr) ((pstr)->cur_idx) #define re_string_get_buffer(pstr) ((pstr)->mbs) #define re_string_length(pstr) ((pstr)->len) #define re_string_byte_at(pstr,idx) ((pstr)->mbs[idx]) #define re_string_skip_bytes(pstr,idx) ((pstr)->cur_idx += (idx)) #define re_string_set_index(pstr,idx) ((pstr)->cur_idx = (idx)) #define re_malloc(t,n) ((t *) malloc ((n) * sizeof (t))) #define re_realloc(p,t,n) ((t *) realloc (p, (n) * sizeof (t))) #define re_free(p) free (p) struct bin_tree_t { struct bin_tree_t *parent; struct bin_tree_t *left; struct bin_tree_t *right; /* `node_idx' is the index in dfa->nodes, if `type' == 0. Otherwise `type' indicate the type of this node. */ re_token_type_t type; int node_idx; int first; int next; re_node_set eclosure; }; typedef struct bin_tree_t bin_tree_t; #define BIN_TREE_STORAGE_SIZE \ ((1024 - sizeof (void *)) / sizeof (bin_tree_t)) struct bin_tree_storage_t { struct bin_tree_storage_t *next; bin_tree_t data[BIN_TREE_STORAGE_SIZE]; }; typedef struct bin_tree_storage_t bin_tree_storage_t; #define CONTEXT_WORD 1 #define CONTEXT_NEWLINE (CONTEXT_WORD << 1) #define CONTEXT_BEGBUF (CONTEXT_NEWLINE << 1) #define CONTEXT_ENDBUF (CONTEXT_BEGBUF << 1) #define IS_WORD_CONTEXT(c) ((c) & CONTEXT_WORD) #define IS_NEWLINE_CONTEXT(c) ((c) & CONTEXT_NEWLINE) #define IS_BEGBUF_CONTEXT(c) ((c) & CONTEXT_BEGBUF) #define IS_ENDBUF_CONTEXT(c) ((c) & CONTEXT_ENDBUF) #define IS_ORDINARY_CONTEXT(c) ((c) == 0) #define IS_WORD_CHAR(ch) (isalnum (ch) || (ch) == '_') #define IS_NEWLINE(ch) ((ch) == NEWLINE_CHAR) #define IS_WIDE_WORD_CHAR(ch) (iswalnum (ch) || (ch) == L'_') #define IS_WIDE_NEWLINE(ch) ((ch) == WIDE_NEWLINE_CHAR) #define NOT_SATISFY_PREV_CONSTRAINT(constraint,context) \ ((((constraint) & PREV_WORD_CONSTRAINT) && !IS_WORD_CONTEXT (context)) \ || ((constraint & PREV_NOTWORD_CONSTRAINT) && IS_WORD_CONTEXT (context)) \ || ((constraint & PREV_NEWLINE_CONSTRAINT) && !IS_NEWLINE_CONTEXT (context))\ || ((constraint & PREV_BEGBUF_CONSTRAINT) && !IS_BEGBUF_CONTEXT (context))) #define NOT_SATISFY_NEXT_CONSTRAINT(constraint,context) \ ((((constraint) & NEXT_WORD_CONSTRAINT) && !IS_WORD_CONTEXT (context)) \ || (((constraint) & NEXT_NOTWORD_CONSTRAINT) && IS_WORD_CONTEXT (context)) \ || (((constraint) & NEXT_NEWLINE_CONSTRAINT) && !IS_NEWLINE_CONTEXT (context)) \ || (((constraint) & NEXT_ENDBUF_CONSTRAINT) && !IS_ENDBUF_CONTEXT (context))) struct re_dfastate_t { unsigned int hash; re_node_set nodes; re_node_set non_eps_nodes; re_node_set inveclosure; re_node_set *entrance_nodes; struct re_dfastate_t **trtable; unsigned int context : 4; unsigned int halt : 1; /* If this state can accept `multi byte'. Note that we refer to multibyte characters, and multi character collating elements as `multi byte'. */ unsigned int accept_mb : 1; /* If this state has backreference node(s). */ unsigned int has_backref : 1; unsigned int has_constraint : 1; unsigned int word_trtable : 1; }; typedef struct re_dfastate_t re_dfastate_t; struct re_state_table_entry { int num; int alloc; re_dfastate_t **array; }; /* Array type used in re_sub_match_last_t and re_sub_match_top_t. */ typedef struct { int next_idx; int alloc; re_dfastate_t **array; } state_array_t; /* Store information about the node NODE whose type is OP_CLOSE_SUBEXP. */ typedef struct { int node; int str_idx; /* The position NODE match at. */ state_array_t path; } re_sub_match_last_t; /* Store information about the node NODE whose type is OP_OPEN_SUBEXP. And information about the node, whose type is OP_CLOSE_SUBEXP, corresponding to NODE is stored in LASTS. */ typedef struct { int str_idx; int node; int next_last_offset; state_array_t *path; int alasts; /* Allocation size of LASTS. */ int nlasts; /* The number of LASTS. */ re_sub_match_last_t **lasts; } re_sub_match_top_t; struct re_backref_cache_entry { int node; int str_idx; int subexp_from; int subexp_to; char more; char unused; unsigned short int eps_reachable_subexps_map; }; typedef struct { /* The string object corresponding to the input string. */ re_string_t input; #if defined _LIBC || (defined __STDC_VERSION__ && __STDC_VERSION__ >= 199901L) re_dfa_t *const dfa; #else re_dfa_t *dfa; #endif /* EFLAGS of the argument of regexec. */ int eflags; /* Where the matching ends. */ int match_last; int last_node; /* The state log used by the matcher. */ re_dfastate_t **state_log; int state_log_top; /* Back reference cache. */ int nbkref_ents; int abkref_ents; struct re_backref_cache_entry *bkref_ents; int max_mb_elem_len; int nsub_tops; int asub_tops; re_sub_match_top_t **sub_tops; } re_match_context_t; typedef struct { re_dfastate_t **sifted_states; re_dfastate_t **limited_states; int last_node; int last_str_idx; re_node_set limits; } re_sift_context_t; struct re_fail_stack_ent_t { int idx; int node; regmatch_t *regs; re_node_set eps_via_nodes; }; struct re_fail_stack_t { int num; int alloc; struct re_fail_stack_ent_t *stack; }; struct re_dfa_t { re_token_t *nodes; int nodes_alloc; int nodes_len; int *nexts; int *org_indices; re_node_set *edests; re_node_set *eclosures; re_node_set *inveclosures; struct re_state_table_entry *state_table; re_dfastate_t *init_state; re_dfastate_t *init_state_word; re_dfastate_t *init_state_nl; re_dfastate_t *init_state_begbuf; bin_tree_t *str_tree; bin_tree_storage_t *str_tree_storage; re_bitset_ptr_t sb_char; int str_tree_storage_idx; /* number of subexpressions `re_nsub' is in regex_t. */ unsigned int state_hash_mask; int states_alloc; int init_node; int nbackref; /* The number of backreference in this dfa. */ /* Bitmap expressing which backreference is used. */ unsigned int used_bkref_map; unsigned int completed_bkref_map; unsigned int has_plural_match : 1; /* If this dfa has "multibyte node", which is a backreference or a node which can accept multibyte character or multi character collating element. */ unsigned int has_mb_node : 1; unsigned int is_utf8 : 1; unsigned int map_notascii : 1; unsigned int word_ops_used : 1; int mb_cur_max; bitset word_char; reg_syntax_t syntax; int *subexp_map; #ifdef DEBUG char* re_str; #endif }; #ifndef RE_NO_INTERNAL_PROTOTYPES static reg_errcode_t re_node_set_alloc (re_node_set *set, int size) internal_function; static reg_errcode_t re_node_set_init_1 (re_node_set *set, int elem) internal_function; static reg_errcode_t re_node_set_init_2 (re_node_set *set, int elem1, int elem2) internal_function; #define re_node_set_init_empty(set) memset (set, '\0', sizeof (re_node_set)) static reg_errcode_t re_node_set_init_copy (re_node_set *dest, const re_node_set *src) internal_function; static reg_errcode_t re_node_set_add_intersect (re_node_set *dest, const re_node_set *src1, const re_node_set *src2) internal_function; static reg_errcode_t re_node_set_init_union (re_node_set *dest, const re_node_set *src1, const re_node_set *src2) internal_function; static reg_errcode_t re_node_set_merge (re_node_set *dest, const re_node_set *src) internal_function; static int re_node_set_insert (re_node_set *set, int elem) internal_function; static int re_node_set_insert_last (re_node_set *set, int elem) internal_function; static int re_node_set_compare (const re_node_set *set1, const re_node_set *set2) internal_function __attribute ((pure)); static int re_node_set_contains (const re_node_set *set, int elem) internal_function __attribute ((pure)); static void re_node_set_remove_at (re_node_set *set, int idx) internal_function; #define re_node_set_remove(set,id) \ (re_node_set_remove_at (set, re_node_set_contains (set, id) - 1)) #define re_node_set_empty(p) ((p)->nelem = 0) #define re_node_set_free(set) re_free ((set)->elems) static int re_dfa_add_node (re_dfa_t *dfa, re_token_t token, int mode) internal_function; static re_dfastate_t *re_acquire_state (reg_errcode_t *err, re_dfa_t *dfa, const re_node_set *nodes) internal_function; static re_dfastate_t *re_acquire_state_context (reg_errcode_t *err, re_dfa_t *dfa, const re_node_set *nodes, unsigned int context) internal_function; static void free_state (re_dfastate_t *state) internal_function; #endif typedef enum { SB_CHAR, MB_CHAR, EQUIV_CLASS, COLL_SYM, CHAR_CLASS } bracket_elem_type; typedef struct { bracket_elem_type type; union { unsigned char ch; unsigned char *name; wchar_t wch; } opr; } bracket_elem_t; /* Inline functions for bitset operation. */ static inline void bitset_not (bitset set) { int bitset_i; for (bitset_i = 0; bitset_i < BITSET_UINTS; ++bitset_i) set[bitset_i] = ~set[bitset_i]; } static inline void bitset_merge (bitset dest, const bitset src) { int bitset_i; for (bitset_i = 0; bitset_i < BITSET_UINTS; ++bitset_i) dest[bitset_i] |= src[bitset_i]; } static inline void bitset_not_merge (bitset dest, const bitset src) { int i; for (i = 0; i < BITSET_UINTS; ++i) dest[i] |= ~src[i]; } static inline void bitset_mask (bitset dest, const bitset src) { int bitset_i; for (bitset_i = 0; bitset_i < BITSET_UINTS; ++bitset_i) dest[bitset_i] &= src[bitset_i]; } #if defined RE_ENABLE_I18N && !defined RE_NO_INTERNAL_PROTOTYPES /* Inline functions for re_string. */ static inline int internal_function re_string_char_size_at (const re_string_t *pstr, int idx) { int byte_idx; if (pstr->mb_cur_max == 1) return 1; for (byte_idx = 1; idx + byte_idx < pstr->valid_len; ++byte_idx) if (pstr->wcs[idx + byte_idx] != WEOF) break; return byte_idx; } static inline wint_t internal_function re_string_wchar_at (const re_string_t *pstr, int idx) { if (pstr->mb_cur_max == 1) return (wint_t) pstr->mbs[idx]; return (wint_t) pstr->wcs[idx]; } static int internal_function re_string_elem_size_at (const re_string_t *pstr, int idx) { #ifdef _LIBC const unsigned char *p, *extra; const int32_t *table, *indirect; int32_t tmp; # include uint_fast32_t nrules = _NL_CURRENT_WORD (LC_COLLATE, _NL_COLLATE_NRULES); if (nrules != 0) { table = (const int32_t *) _NL_CURRENT (LC_COLLATE, _NL_COLLATE_TABLEMB); extra = (const unsigned char *) _NL_CURRENT (LC_COLLATE, _NL_COLLATE_EXTRAMB); indirect = (const int32_t *) _NL_CURRENT (LC_COLLATE, _NL_COLLATE_INDIRECTMB); p = pstr->mbs + idx; tmp = findidx (&p); return p - pstr->mbs - idx; } else #endif /* _LIBC */ return 1; } #endif /* RE_ENABLE_I18N */ #endif /* _REGEX_INTERNAL_H */ Yeti-6.4.0/regex/glibc/regexec.c000066400000000000000000004403401253351442600164250ustar00rootroot00000000000000/* Extended regular expression matching and search library. Copyright (C) 2002, 2003, 2004 Free Software Foundation, Inc. This file is part of the GNU C Library. Contributed by Isamu Hasegawa . The GNU C Library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version. The GNU C Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. You should have received a copy of the GNU Lesser General Public License along with the GNU C Library; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA. */ static reg_errcode_t match_ctx_init (re_match_context_t *cache, int eflags, int n) internal_function; static void match_ctx_clean (re_match_context_t *mctx) internal_function; static void match_ctx_free (re_match_context_t *cache) internal_function; static reg_errcode_t match_ctx_add_entry (re_match_context_t *cache, int node, int str_idx, int from, int to) internal_function; static int search_cur_bkref_entry (re_match_context_t *mctx, int str_idx) internal_function; static reg_errcode_t match_ctx_add_subtop (re_match_context_t *mctx, int node, int str_idx) internal_function; static re_sub_match_last_t * match_ctx_add_sublast (re_sub_match_top_t *subtop, int node, int str_idx) internal_function; static void sift_ctx_init (re_sift_context_t *sctx, re_dfastate_t **sifted_sts, re_dfastate_t **limited_sts, int last_node, int last_str_idx) internal_function; static reg_errcode_t re_search_internal (const regex_t *preg, const char *string, int length, int start, int range, int stop, size_t nmatch, regmatch_t pmatch[], int eflags) internal_function; static int re_search_2_stub (struct re_pattern_buffer *bufp, const char *string1, int length1, const char *string2, int length2, int start, int range, struct re_registers *regs, int stop, int ret_len) internal_function; static int re_search_stub (struct re_pattern_buffer *bufp, const char *string, int length, int start, int range, int stop, struct re_registers *regs, int ret_len) internal_function; static unsigned re_copy_regs (struct re_registers *regs, regmatch_t *pmatch, int nregs, int regs_allocated) internal_function; static inline re_dfastate_t *acquire_init_state_context (reg_errcode_t *err, const re_match_context_t *mctx, int idx) __attribute ((always_inline)) internal_function; static reg_errcode_t prune_impossible_nodes (re_match_context_t *mctx) internal_function; static int check_matching (re_match_context_t *mctx, int fl_longest_match, int *p_match_first) internal_function; static int check_halt_node_context (const re_dfa_t *dfa, int node, unsigned int context) internal_function; static int check_halt_state_context (const re_match_context_t *mctx, const re_dfastate_t *state, int idx) internal_function; static void update_regs (re_dfa_t *dfa, regmatch_t *pmatch, regmatch_t *prev_idx_match, int cur_node, int cur_idx, int nmatch) internal_function; static int proceed_next_node (const re_match_context_t *mctx, int nregs, regmatch_t *regs, int *pidx, int node, re_node_set *eps_via_nodes, struct re_fail_stack_t *fs) internal_function; static reg_errcode_t push_fail_stack (struct re_fail_stack_t *fs, int str_idx, int dest_node, int nregs, regmatch_t *regs, re_node_set *eps_via_nodes) internal_function; static int pop_fail_stack (struct re_fail_stack_t *fs, int *pidx, int nregs, regmatch_t *regs, re_node_set *eps_via_nodes) internal_function; static reg_errcode_t set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t nmatch, regmatch_t *pmatch, int fl_backtrack) internal_function; static reg_errcode_t free_fail_stack_return (struct re_fail_stack_t *fs) internal_function; #ifdef RE_ENABLE_I18N static int sift_states_iter_mb (const re_match_context_t *mctx, re_sift_context_t *sctx, int node_idx, int str_idx, int max_str_idx) internal_function; #endif /* RE_ENABLE_I18N */ static reg_errcode_t sift_states_backward (re_match_context_t *mctx, re_sift_context_t *sctx) internal_function; static reg_errcode_t build_sifted_states (re_match_context_t *mctx, re_sift_context_t *sctx, int str_idx, re_node_set *cur_dest) internal_function; static reg_errcode_t update_cur_sifted_state (re_match_context_t *mctx, re_sift_context_t *sctx, int str_idx, re_node_set *dest_nodes) internal_function; static reg_errcode_t add_epsilon_src_nodes (re_dfa_t *dfa, re_node_set *dest_nodes, const re_node_set *candidates) internal_function; static reg_errcode_t sub_epsilon_src_nodes (re_dfa_t *dfa, int node, re_node_set *dest_nodes, const re_node_set *and_nodes) internal_function; static int check_dst_limits (re_match_context_t *mctx, re_node_set *limits, int dst_node, int dst_idx, int src_node, int src_idx) internal_function; static int check_dst_limits_calc_pos_1 (re_match_context_t *mctx, int boundaries, int subexp_idx, int from_node, int bkref_idx) internal_function; static int check_dst_limits_calc_pos (re_match_context_t *mctx, int limit, int subexp_idx, int node, int str_idx, int bkref_idx) internal_function; static reg_errcode_t check_subexp_limits (re_dfa_t *dfa, re_node_set *dest_nodes, const re_node_set *candidates, re_node_set *limits, struct re_backref_cache_entry *bkref_ents, int str_idx) internal_function; static reg_errcode_t sift_states_bkref (re_match_context_t *mctx, re_sift_context_t *sctx, int str_idx, const re_node_set *candidates) internal_function; static reg_errcode_t clean_state_log_if_needed (re_match_context_t *mctx, int next_state_log_idx) internal_function; static reg_errcode_t merge_state_array (re_dfa_t *dfa, re_dfastate_t **dst, re_dfastate_t **src, int num) internal_function; static re_dfastate_t *find_recover_state (reg_errcode_t *err, re_match_context_t *mctx) internal_function; static re_dfastate_t *transit_state (reg_errcode_t *err, re_match_context_t *mctx, re_dfastate_t *state) internal_function; static re_dfastate_t *merge_state_with_log (reg_errcode_t *err, re_match_context_t *mctx, re_dfastate_t *next_state) internal_function; static reg_errcode_t check_subexp_matching_top (re_match_context_t *mctx, re_node_set *cur_nodes, int str_idx) internal_function; #if 0 static re_dfastate_t *transit_state_sb (reg_errcode_t *err, re_match_context_t *mctx, re_dfastate_t *pstate) internal_function; #endif #ifdef RE_ENABLE_I18N static reg_errcode_t transit_state_mb (re_match_context_t *mctx, re_dfastate_t *pstate) internal_function; #endif /* RE_ENABLE_I18N */ static reg_errcode_t transit_state_bkref (re_match_context_t *mctx, const re_node_set *nodes) internal_function; static reg_errcode_t get_subexp (re_match_context_t *mctx, int bkref_node, int bkref_str_idx) internal_function; static reg_errcode_t get_subexp_sub (re_match_context_t *mctx, const re_sub_match_top_t *sub_top, re_sub_match_last_t *sub_last, int bkref_node, int bkref_str) internal_function; static int find_subexp_node (const re_dfa_t *dfa, const re_node_set *nodes, int subexp_idx, int type) internal_function; static reg_errcode_t check_arrival (re_match_context_t *mctx, state_array_t *path, int top_node, int top_str, int last_node, int last_str, int type) internal_function; static reg_errcode_t check_arrival_add_next_nodes (re_match_context_t *mctx, int str_idx, re_node_set *cur_nodes, re_node_set *next_nodes) internal_function; static reg_errcode_t check_arrival_expand_ecl (re_dfa_t *dfa, re_node_set *cur_nodes, int ex_subexp, int type) internal_function; static reg_errcode_t check_arrival_expand_ecl_sub (re_dfa_t *dfa, re_node_set *dst_nodes, int target, int ex_subexp, int type) internal_function; static reg_errcode_t expand_bkref_cache (re_match_context_t *mctx, re_node_set *cur_nodes, int cur_str, int subexp_num, int type) internal_function; static re_dfastate_t **build_trtable (re_dfa_t *dfa, re_dfastate_t *state) internal_function; #ifdef RE_ENABLE_I18N static int check_node_accept_bytes (re_dfa_t *dfa, int node_idx, const re_string_t *input, int idx) internal_function; # ifdef _LIBC static unsigned int find_collation_sequence_value (const unsigned char *mbs, size_t name_len) internal_function; # endif /* _LIBC */ #endif /* RE_ENABLE_I18N */ static int group_nodes_into_DFAstates (re_dfa_t *dfa, const re_dfastate_t *state, re_node_set *states_node, bitset *states_ch) internal_function; static int check_node_accept (const re_match_context_t *mctx, const re_token_t *node, int idx) internal_function; static reg_errcode_t extend_buffers (re_match_context_t *mctx) internal_function; /* Entry point for POSIX code. */ /* regexec searches for a given pattern, specified by PREG, in the string STRING. If NMATCH is zero or REG_NOSUB was set in the cflags argument to `regcomp', we ignore PMATCH. Otherwise, we assume PMATCH has at least NMATCH elements, and we set them to the offsets of the corresponding matched substrings. EFLAGS specifies `execution flags' which affect matching: if REG_NOTBOL is set, then ^ does not match at the beginning of the string; if REG_NOTEOL is set, then $ does not match at the end. We return 0 if we find a match and REG_NOMATCH if not. */ int regexec (preg, string, nmatch, pmatch, eflags) const regex_t *__restrict preg; const char *__restrict string; size_t nmatch; regmatch_t pmatch[]; int eflags; { reg_errcode_t err; int start, length; if (eflags & ~(REG_NOTBOL | REG_NOTEOL | REG_STARTEND)) return REG_BADPAT; if (eflags & REG_STARTEND) { start = pmatch[0].rm_so; length = pmatch[0].rm_eo; } else { start = 0; length = strlen (string); } if (preg->no_sub) err = re_search_internal (preg, string, length, start, length - start, length, 0, NULL, eflags); else err = re_search_internal (preg, string, length, start, length - start, length, nmatch, pmatch, eflags); return err != REG_NOERROR; } #ifdef _LIBC # include versioned_symbol (libc, __regexec, regexec, GLIBC_2_3_4); # if SHLIB_COMPAT (libc, GLIBC_2_0, GLIBC_2_3_4) __typeof__ (__regexec) __compat_regexec; int attribute_compat_text_section __compat_regexec (const regex_t *__restrict preg, const char *__restrict string, size_t nmatch, regmatch_t pmatch[], int eflags) { return regexec (preg, string, nmatch, pmatch, eflags & (REG_NOTBOL | REG_NOTEOL)); } compat_symbol (libc, __compat_regexec, regexec, GLIBC_2_0); # endif #endif /* Entry points for GNU code. */ /* re_match, re_search, re_match_2, re_search_2 The former two functions operate on STRING with length LENGTH, while the later two operate on concatenation of STRING1 and STRING2 with lengths LENGTH1 and LENGTH2, respectively. re_match() matches the compiled pattern in BUFP against the string, starting at index START. re_search() first tries matching at index START, then it tries to match starting from index START + 1, and so on. The last start position tried is START + RANGE. (Thus RANGE = 0 forces re_search to operate the same way as re_match().) The parameter STOP of re_{match,search}_2 specifies that no match exceeding the first STOP characters of the concatenation of the strings should be concerned. If REGS is not NULL, and BUFP->no_sub is not set, the offsets of the match and all groups is stroed in REGS. (For the "_2" variants, the offsets are computed relative to the concatenation, not relative to the individual strings.) On success, re_match* functions return the length of the match, re_search* return the position of the start of the match. Return value -1 means no match was found and -2 indicates an internal error. */ int re_match (bufp, string, length, start, regs) struct re_pattern_buffer *bufp; const char *string; int length, start; struct re_registers *regs; { return re_search_stub (bufp, string, length, start, 0, length, regs, 1); } #ifdef _LIBC weak_alias (__re_match, re_match) #endif int re_search (bufp, string, length, start, range, regs) struct re_pattern_buffer *bufp; const char *string; int length, start, range; struct re_registers *regs; { return re_search_stub (bufp, string, length, start, range, length, regs, 0); } #ifdef _LIBC weak_alias (__re_search, re_search) #endif int re_match_2 (bufp, string1, length1, string2, length2, start, regs, stop) struct re_pattern_buffer *bufp; const char *string1, *string2; int length1, length2, start, stop; struct re_registers *regs; { return re_search_2_stub (bufp, string1, length1, string2, length2, start, 0, regs, stop, 1); } #ifdef _LIBC weak_alias (__re_match_2, re_match_2) #endif int re_search_2 (bufp, string1, length1, string2, length2, start, range, regs, stop) struct re_pattern_buffer *bufp; const char *string1, *string2; int length1, length2, start, range, stop; struct re_registers *regs; { return re_search_2_stub (bufp, string1, length1, string2, length2, start, range, regs, stop, 0); } #ifdef _LIBC weak_alias (__re_search_2, re_search_2) #endif static int re_search_2_stub (bufp, string1, length1, string2, length2, start, range, regs, stop, ret_len) struct re_pattern_buffer *bufp; const char *string1, *string2; int length1, length2, start, range, stop, ret_len; struct re_registers *regs; { const char *str; int rval; int len = length1 + length2; int free_str = 0; if (BE (length1 < 0 || length2 < 0 || stop < 0, 0)) return -2; /* Concatenate the strings. */ if (length2 > 0) if (length1 > 0) { char *s = re_malloc (char, len); if (BE (s == NULL, 0)) return -2; memcpy (s, string1, length1); memcpy (s + length1, string2, length2); str = s; free_str = 1; } else str = string2; else str = string1; rval = re_search_stub (bufp, str, len, start, range, stop, regs, ret_len); if (free_str) re_free ((char *) str); return rval; } /* The parameters have the same meaning as those of re_search. Additional parameters: If RET_LEN is nonzero the length of the match is returned (re_match style); otherwise the position of the match is returned. */ static int re_search_stub (bufp, string, length, start, range, stop, regs, ret_len) struct re_pattern_buffer *bufp; const char *string; int length, start, range, stop, ret_len; struct re_registers *regs; { reg_errcode_t result; regmatch_t *pmatch; int nregs, rval; int eflags = 0; /* Check for out-of-range. */ if (BE (start < 0 || start > length, 0)) return -1; if (BE (start + range > length, 0)) range = length - start; else if (BE (start + range < 0, 0)) range = -start; eflags |= (bufp->not_bol) ? REG_NOTBOL : 0; eflags |= (bufp->not_eol) ? REG_NOTEOL : 0; /* Compile fastmap if we haven't yet. */ if (range > 0 && bufp->fastmap != NULL && !bufp->fastmap_accurate) re_compile_fastmap (bufp); if (BE (bufp->no_sub, 0)) regs = NULL; /* We need at least 1 register. */ if (regs == NULL) nregs = 1; else if (BE (bufp->regs_allocated == REGS_FIXED && regs->num_regs < bufp->re_nsub + 1, 0)) { nregs = regs->num_regs; if (BE (nregs < 1, 0)) { /* Nothing can be copied to regs. */ regs = NULL; nregs = 1; } } else nregs = bufp->re_nsub + 1; pmatch = re_malloc (regmatch_t, nregs); if (BE (pmatch == NULL, 0)) return -2; result = re_search_internal (bufp, string, length, start, range, stop, nregs, pmatch, eflags); rval = 0; /* I hope we needn't fill ther regs with -1's when no match was found. */ if (result != REG_NOERROR) rval = -1; else if (regs != NULL) { /* If caller wants register contents data back, copy them. */ bufp->regs_allocated = re_copy_regs (regs, pmatch, nregs, bufp->regs_allocated); if (BE (bufp->regs_allocated == REGS_UNALLOCATED, 0)) rval = -2; } if (BE (rval == 0, 1)) { if (ret_len) { assert (pmatch[0].rm_so == start); rval = pmatch[0].rm_eo - start; } else rval = pmatch[0].rm_so; } re_free (pmatch); return rval; } static unsigned re_copy_regs (regs, pmatch, nregs, regs_allocated) struct re_registers *regs; regmatch_t *pmatch; int nregs, regs_allocated; { int rval = REGS_REALLOCATE; int i; int need_regs = nregs + 1; /* We need one extra element beyond `num_regs' for the `-1' marker GNU code uses. */ /* Have the register data arrays been allocated? */ if (regs_allocated == REGS_UNALLOCATED) { /* No. So allocate them with malloc. */ regs->start = re_malloc (regoff_t, need_regs); regs->end = re_malloc (regoff_t, need_regs); if (BE (regs->start == NULL, 0) || BE (regs->end == NULL, 0)) return REGS_UNALLOCATED; regs->num_regs = need_regs; } else if (regs_allocated == REGS_REALLOCATE) { /* Yes. If we need more elements than were already allocated, reallocate them. If we need fewer, just leave it alone. */ if (BE (need_regs > regs->num_regs, 0)) { regoff_t *new_start = re_realloc (regs->start, regoff_t, need_regs); regoff_t *new_end = re_realloc (regs->end, regoff_t, need_regs); if (BE (new_start == NULL, 0) || BE (new_end == NULL, 0)) return REGS_UNALLOCATED; regs->start = new_start; regs->end = new_end; regs->num_regs = need_regs; } } else { assert (regs_allocated == REGS_FIXED); /* This function may not be called with REGS_FIXED and nregs too big. */ assert (regs->num_regs >= nregs); rval = REGS_FIXED; } /* Copy the regs. */ for (i = 0; i < nregs; ++i) { regs->start[i] = pmatch[i].rm_so; regs->end[i] = pmatch[i].rm_eo; } for ( ; i < regs->num_regs; ++i) regs->start[i] = regs->end[i] = -1; return rval; } /* Set REGS to hold NUM_REGS registers, storing them in STARTS and ENDS. Subsequent matches using PATTERN_BUFFER and REGS will use this memory for recording register information. STARTS and ENDS must be allocated using the malloc library routine, and must each be at least NUM_REGS * sizeof (regoff_t) bytes long. If NUM_REGS == 0, then subsequent matches should allocate their own register data. Unless this function is called, the first search or match using PATTERN_BUFFER will allocate its own register data, without freeing the old data. */ void re_set_registers (bufp, regs, num_regs, starts, ends) struct re_pattern_buffer *bufp; struct re_registers *regs; unsigned num_regs; regoff_t *starts, *ends; { if (num_regs) { bufp->regs_allocated = REGS_REALLOCATE; regs->num_regs = num_regs; regs->start = starts; regs->end = ends; } else { bufp->regs_allocated = REGS_UNALLOCATED; regs->num_regs = 0; regs->start = regs->end = (regoff_t *) 0; } } #ifdef _LIBC weak_alias (__re_set_registers, re_set_registers) #endif /* Entry points compatible with 4.2 BSD regex library. We don't define them unless specifically requested. */ #if defined _REGEX_RE_COMP || defined _LIBC int # ifdef _LIBC weak_function # endif re_exec (s) const char *s; { return 0 == regexec (&re_comp_buf, s, 0, NULL, 0); } #endif /* _REGEX_RE_COMP */ /* Internal entry point. */ /* Searches for a compiled pattern PREG in the string STRING, whose length is LENGTH. NMATCH, PMATCH, and EFLAGS have the same mingings with regexec. START, and RANGE have the same meanings with re_search. Return REG_NOERROR if we find a match, and REG_NOMATCH if not, otherwise return the error code. Note: We assume front end functions already check ranges. (START + RANGE >= 0 && START + RANGE <= LENGTH) */ static reg_errcode_t re_search_internal (preg, string, length, start, range, stop, nmatch, pmatch, eflags) const regex_t *preg; const char *string; int length, start, range, stop, eflags; size_t nmatch; regmatch_t pmatch[]; { reg_errcode_t err; re_dfa_t *dfa = (re_dfa_t *)preg->buffer; int left_lim, right_lim, incr; int fl_longest_match, match_first, match_kind, match_last = -1; int sb, ch; #if defined _LIBC || (defined __STDC_VERSION__ && __STDC_VERSION__ >= 199901L) re_match_context_t mctx = { .dfa = dfa }; #else re_match_context_t mctx; #endif char *fastmap = (preg->fastmap != NULL && preg->fastmap_accurate && range && !preg->can_be_null) ? preg->fastmap : NULL; unsigned RE_TRANSLATE_TYPE t = (unsigned RE_TRANSLATE_TYPE) preg->translate; #if !(defined _LIBC || (defined __STDC_VERSION__ && __STDC_VERSION__ >= 199901L)) memset (&mctx, '\0', sizeof (re_match_context_t)); mctx.dfa = dfa; #endif /* Check if the DFA haven't been compiled. */ if (BE (preg->used == 0 || dfa->init_state == NULL || dfa->init_state_word == NULL || dfa->init_state_nl == NULL || dfa->init_state_begbuf == NULL, 0)) return REG_NOMATCH; #ifdef DEBUG /* We assume front-end functions already check them. */ assert (start + range >= 0 && start + range <= length); #endif /* If initial states with non-begbuf contexts have no elements, the regex must be anchored. If preg->newline_anchor is set, we'll never use init_state_nl, so do not check it. */ if (dfa->init_state->nodes.nelem == 0 && dfa->init_state_word->nodes.nelem == 0 && (dfa->init_state_nl->nodes.nelem == 0 || !preg->newline_anchor)) { if (start != 0 && start + range != 0) return REG_NOMATCH; start = range = 0; } /* We must check the longest matching, if nmatch > 0. */ fl_longest_match = (nmatch != 0 || dfa->nbackref); err = re_string_allocate (&mctx.input, string, length, dfa->nodes_len + 1, preg->translate, preg->syntax & RE_ICASE, dfa); if (BE (err != REG_NOERROR, 0)) goto free_return; mctx.input.stop = stop; mctx.input.raw_stop = stop; mctx.input.newline_anchor = preg->newline_anchor; err = match_ctx_init (&mctx, eflags, dfa->nbackref * 2); if (BE (err != REG_NOERROR, 0)) goto free_return; /* We will log all the DFA states through which the dfa pass, if nmatch > 1, or this dfa has "multibyte node", which is a back-reference or a node which can accept multibyte character or multi character collating element. */ if (nmatch > 1 || dfa->has_mb_node) { mctx.state_log = re_malloc (re_dfastate_t *, mctx.input.bufs_len + 1); if (BE (mctx.state_log == NULL, 0)) { err = REG_ESPACE; goto free_return; } } else mctx.state_log = NULL; match_first = start; mctx.input.tip_context = (eflags & REG_NOTBOL) ? CONTEXT_BEGBUF : CONTEXT_NEWLINE | CONTEXT_BEGBUF; /* Check incrementally whether of not the input string match. */ incr = (range < 0) ? -1 : 1; left_lim = (range < 0) ? start + range : start; right_lim = (range < 0) ? start : start + range; sb = dfa->mb_cur_max == 1; match_kind = (fastmap ? ((sb || !(preg->syntax & RE_ICASE || t) ? 4 : 0) | (range >= 0 ? 2 : 0) | (t != NULL ? 1 : 0)) : 8); for (;; match_first += incr) { err = REG_NOMATCH; if (match_first < left_lim || right_lim < match_first) goto free_return; /* Advance as rapidly as possible through the string, until we find a plausible place to start matching. This may be done with varying efficiency, so there are various possibilities: only the most common of them are specialized, in order to save on code size. We use a switch statement for speed. */ switch (match_kind) { case 8: /* No fastmap. */ break; case 7: /* Fastmap with single-byte translation, match forward. */ while (BE (match_first < right_lim, 1) && !fastmap[t[(unsigned char) string[match_first]]]) ++match_first; goto forward_match_found_start_or_reached_end; case 6: /* Fastmap without translation, match forward. */ while (BE (match_first < right_lim, 1) && !fastmap[(unsigned char) string[match_first]]) ++match_first; forward_match_found_start_or_reached_end: if (BE (match_first == right_lim, 0)) { ch = match_first >= length ? 0 : (unsigned char) string[match_first]; if (!fastmap[t ? t[ch] : ch]) goto free_return; } break; case 4: case 5: /* Fastmap without multi-byte translation, match backwards. */ while (match_first >= left_lim) { ch = match_first >= length ? 0 : (unsigned char) string[match_first]; if (fastmap[t ? t[ch] : ch]) break; --match_first; } if (match_first < left_lim) goto free_return; break; default: /* In this case, we can't determine easily the current byte, since it might be a component byte of a multibyte character. Then we use the constructed buffer instead. */ for (;;) { /* If MATCH_FIRST is out of the valid range, reconstruct the buffers. */ unsigned int offset = match_first - mctx.input.raw_mbs_idx; if (BE (offset >= (unsigned int) mctx.input.valid_raw_len, 0)) { err = re_string_reconstruct (&mctx.input, match_first, eflags); if (BE (err != REG_NOERROR, 0)) goto free_return; offset = match_first - mctx.input.raw_mbs_idx; } /* If MATCH_FIRST is out of the buffer, leave it as '\0'. Note that MATCH_FIRST must not be smaller than 0. */ ch = (match_first >= length ? 0 : re_string_byte_at (&mctx.input, offset)); if (fastmap[ch]) break; match_first += incr; if (match_first < left_lim || match_first > right_lim) { err = REG_NOMATCH; goto free_return; } } break; } /* Reconstruct the buffers so that the matcher can assume that the matching starts from the beginning of the buffer. */ err = re_string_reconstruct (&mctx.input, match_first, eflags); if (BE (err != REG_NOERROR, 0)) goto free_return; #ifdef RE_ENABLE_I18N /* Don't consider this char as a possible match start if it part, yet isn't the head, of a multibyte character. */ if (!sb && !re_string_first_byte (&mctx.input, 0)) continue; #endif /* It seems to be appropriate one, then use the matcher. */ /* We assume that the matching starts from 0. */ mctx.state_log_top = mctx.nbkref_ents = mctx.max_mb_elem_len = 0; match_last = check_matching (&mctx, fl_longest_match, range >= 0 ? &match_first : NULL); if (match_last != -1) { if (BE (match_last == -2, 0)) { err = REG_ESPACE; goto free_return; } else { mctx.match_last = match_last; if ((!preg->no_sub && nmatch > 1) || dfa->nbackref) { re_dfastate_t *pstate = mctx.state_log[match_last]; mctx.last_node = check_halt_state_context (&mctx, pstate, match_last); } if ((!preg->no_sub && nmatch > 1 && dfa->has_plural_match) || dfa->nbackref) { err = prune_impossible_nodes (&mctx); if (err == REG_NOERROR) break; if (BE (err != REG_NOMATCH, 0)) goto free_return; match_last = -1; } else break; /* We found a match. */ } } match_ctx_clean (&mctx); } #ifdef DEBUG assert (match_last != -1); assert (err == REG_NOERROR); #endif /* Set pmatch[] if we need. */ if (nmatch > 0) { int reg_idx; /* Initialize registers. */ for (reg_idx = 1; reg_idx < nmatch; ++reg_idx) pmatch[reg_idx].rm_so = pmatch[reg_idx].rm_eo = -1; /* Set the points where matching start/end. */ pmatch[0].rm_so = 0; pmatch[0].rm_eo = mctx.match_last; if (!preg->no_sub && nmatch > 1) { err = set_regs (preg, &mctx, nmatch, pmatch, dfa->has_plural_match && dfa->nbackref > 0); if (BE (err != REG_NOERROR, 0)) goto free_return; } /* At last, add the offset to the each registers, since we slided the buffers so that we could assume that the matching starts from 0. */ for (reg_idx = 0; reg_idx < nmatch; ++reg_idx) if (pmatch[reg_idx].rm_so != -1) { #ifdef RE_ENABLE_I18N if (BE (mctx.input.offsets_needed != 0, 0)) { if (pmatch[reg_idx].rm_so == mctx.input.valid_len) pmatch[reg_idx].rm_so += mctx.input.valid_raw_len - mctx.input.valid_len; else pmatch[reg_idx].rm_so = mctx.input.offsets[pmatch[reg_idx].rm_so]; if (pmatch[reg_idx].rm_eo == mctx.input.valid_len) pmatch[reg_idx].rm_eo += mctx.input.valid_raw_len - mctx.input.valid_len; else pmatch[reg_idx].rm_eo = mctx.input.offsets[pmatch[reg_idx].rm_eo]; } #else assert (mctx.input.offsets_needed == 0); #endif pmatch[reg_idx].rm_so += match_first; pmatch[reg_idx].rm_eo += match_first; } if (dfa->subexp_map) for (reg_idx = 0; reg_idx + 1 < nmatch && reg_idx < preg->re_nsub; reg_idx++) if (dfa->subexp_map[reg_idx] != reg_idx) { pmatch[reg_idx + 1].rm_so = pmatch[dfa->subexp_map[reg_idx] + 1].rm_so; pmatch[reg_idx + 1].rm_eo = pmatch[dfa->subexp_map[reg_idx] + 1].rm_eo; } } free_return: re_free (mctx.state_log); if (dfa->nbackref) match_ctx_free (&mctx); re_string_destruct (&mctx.input); return err; } static reg_errcode_t prune_impossible_nodes (mctx) re_match_context_t *mctx; { re_dfa_t *const dfa = mctx->dfa; int halt_node, match_last; reg_errcode_t ret; re_dfastate_t **sifted_states; re_dfastate_t **lim_states = NULL; re_sift_context_t sctx; #ifdef DEBUG assert (mctx->state_log != NULL); #endif match_last = mctx->match_last; halt_node = mctx->last_node; sifted_states = re_malloc (re_dfastate_t *, match_last + 1); if (BE (sifted_states == NULL, 0)) { ret = REG_ESPACE; goto free_return; } if (dfa->nbackref) { lim_states = re_malloc (re_dfastate_t *, match_last + 1); if (BE (lim_states == NULL, 0)) { ret = REG_ESPACE; goto free_return; } while (1) { memset (lim_states, '\0', sizeof (re_dfastate_t *) * (match_last + 1)); sift_ctx_init (&sctx, sifted_states, lim_states, halt_node, match_last); ret = sift_states_backward (mctx, &sctx); re_node_set_free (&sctx.limits); if (BE (ret != REG_NOERROR, 0)) goto free_return; if (sifted_states[0] != NULL || lim_states[0] != NULL) break; do { --match_last; if (match_last < 0) { ret = REG_NOMATCH; goto free_return; } } while (mctx->state_log[match_last] == NULL || !mctx->state_log[match_last]->halt); halt_node = check_halt_state_context (mctx, mctx->state_log[match_last], match_last); } ret = merge_state_array (dfa, sifted_states, lim_states, match_last + 1); re_free (lim_states); lim_states = NULL; if (BE (ret != REG_NOERROR, 0)) goto free_return; } else { sift_ctx_init (&sctx, sifted_states, lim_states, halt_node, match_last); ret = sift_states_backward (mctx, &sctx); re_node_set_free (&sctx.limits); if (BE (ret != REG_NOERROR, 0)) goto free_return; } re_free (mctx->state_log); mctx->state_log = sifted_states; sifted_states = NULL; mctx->last_node = halt_node; mctx->match_last = match_last; ret = REG_NOERROR; free_return: re_free (sifted_states); re_free (lim_states); return ret; } /* Acquire an initial state and return it. We must select appropriate initial state depending on the context, since initial states may have constraints like "\<", "^", etc.. */ static inline re_dfastate_t * acquire_init_state_context (err, mctx, idx) reg_errcode_t *err; const re_match_context_t *mctx; int idx; { re_dfa_t *const dfa = mctx->dfa; if (dfa->init_state->has_constraint) { unsigned int context; context = re_string_context_at (&mctx->input, idx - 1, mctx->eflags); if (IS_WORD_CONTEXT (context)) return dfa->init_state_word; else if (IS_ORDINARY_CONTEXT (context)) return dfa->init_state; else if (IS_BEGBUF_CONTEXT (context) && IS_NEWLINE_CONTEXT (context)) return dfa->init_state_begbuf; else if (IS_NEWLINE_CONTEXT (context)) return dfa->init_state_nl; else if (IS_BEGBUF_CONTEXT (context)) { /* It is relatively rare case, then calculate on demand. */ return re_acquire_state_context (err, dfa, dfa->init_state->entrance_nodes, context); } else /* Must not happen? */ return dfa->init_state; } else return dfa->init_state; } /* Check whether the regular expression match input string INPUT or not, and return the index where the matching end, return -1 if not match, or return -2 in case of an error. FL_LONGEST_MATCH means we want the POSIX longest matching. If P_MATCH_FIRST is not NULL, and the match fails, it is set to the next place where we may want to try matching. Note that the matcher assume that the maching starts from the current index of the buffer. */ static int check_matching (mctx, fl_longest_match, p_match_first) re_match_context_t *mctx; int fl_longest_match; int *p_match_first; { re_dfa_t *const dfa = mctx->dfa; reg_errcode_t err; int match = 0; int match_last = -1; int cur_str_idx = re_string_cur_idx (&mctx->input); re_dfastate_t *cur_state; int at_init_state = p_match_first != NULL; int next_start_idx = cur_str_idx; err = REG_NOERROR; cur_state = acquire_init_state_context (&err, mctx, cur_str_idx); /* An initial state must not be NULL (invalid). */ if (BE (cur_state == NULL, 0)) { assert (err == REG_ESPACE); return -2; } if (mctx->state_log != NULL) { mctx->state_log[cur_str_idx] = cur_state; /* Check OP_OPEN_SUBEXP in the initial state in case that we use them later. E.g. Processing back references. */ if (BE (dfa->nbackref, 0)) { at_init_state = 0; err = check_subexp_matching_top (mctx, &cur_state->nodes, 0); if (BE (err != REG_NOERROR, 0)) return err; if (cur_state->has_backref) { err = transit_state_bkref (mctx, &cur_state->nodes); if (BE (err != REG_NOERROR, 0)) return err; } } } /* If the RE accepts NULL string. */ if (BE (cur_state->halt, 0)) { if (!cur_state->has_constraint || check_halt_state_context (mctx, cur_state, cur_str_idx)) { if (!fl_longest_match) return cur_str_idx; else { match_last = cur_str_idx; match = 1; } } } while (!re_string_eoi (&mctx->input)) { re_dfastate_t *old_state = cur_state; int next_char_idx = re_string_cur_idx (&mctx->input) + 1; if (BE (next_char_idx >= mctx->input.bufs_len, 0) || (BE (next_char_idx >= mctx->input.valid_len, 0) && mctx->input.valid_len < mctx->input.len)) { err = extend_buffers (mctx); if (BE (err != REG_NOERROR, 0)) { assert (err == REG_ESPACE); return -2; } } cur_state = transit_state (&err, mctx, cur_state); if (mctx->state_log != NULL) cur_state = merge_state_with_log (&err, mctx, cur_state); if (cur_state == NULL) { /* Reached the invalid state or an error. Try to recover a valid state using the state log, if available and if we have not already found a valid (even if not the longest) match. */ if (BE (err != REG_NOERROR, 0)) return -2; if (mctx->state_log == NULL || (match && !fl_longest_match) || (cur_state = find_recover_state (&err, mctx)) == NULL) break; } if (BE (at_init_state, 0)) { if (old_state == cur_state) next_start_idx = next_char_idx; else at_init_state = 0; } if (cur_state->halt) { /* Reached a halt state. Check the halt state can satisfy the current context. */ if (!cur_state->has_constraint || check_halt_state_context (mctx, cur_state, re_string_cur_idx (&mctx->input))) { /* We found an appropriate halt state. */ match_last = re_string_cur_idx (&mctx->input); match = 1; /* We found a match, do not modify match_first below. */ p_match_first = NULL; if (!fl_longest_match) break; } } } if (p_match_first) *p_match_first += next_start_idx; return match_last; } /* Check NODE match the current context. */ static int check_halt_node_context (dfa, node, context) const re_dfa_t *dfa; int node; unsigned int context; { re_token_type_t type = dfa->nodes[node].type; unsigned int constraint = dfa->nodes[node].constraint; if (type != END_OF_RE) return 0; if (!constraint) return 1; if (NOT_SATISFY_NEXT_CONSTRAINT (constraint, context)) return 0; return 1; } /* Check the halt state STATE match the current context. Return 0 if not match, if the node, STATE has, is a halt node and match the context, return the node. */ static int check_halt_state_context (mctx, state, idx) const re_match_context_t *mctx; const re_dfastate_t *state; int idx; { int i; unsigned int context; #ifdef DEBUG assert (state->halt); #endif context = re_string_context_at (&mctx->input, idx, mctx->eflags); for (i = 0; i < state->nodes.nelem; ++i) if (check_halt_node_context (mctx->dfa, state->nodes.elems[i], context)) return state->nodes.elems[i]; return 0; } /* Compute the next node to which "NFA" transit from NODE("NFA" is a NFA corresponding to the DFA). Return the destination node, and update EPS_VIA_NODES, return -1 in case of errors. */ static int proceed_next_node (mctx, nregs, regs, pidx, node, eps_via_nodes, fs) const re_match_context_t *mctx; regmatch_t *regs; int nregs, *pidx, node; re_node_set *eps_via_nodes; struct re_fail_stack_t *fs; { re_dfa_t *const dfa = mctx->dfa; int i, err, dest_node; dest_node = -1; if (IS_EPSILON_NODE (dfa->nodes[node].type)) { re_node_set *cur_nodes = &mctx->state_log[*pidx]->nodes; re_node_set *edests = &dfa->edests[node]; int dest_node; err = re_node_set_insert (eps_via_nodes, node); if (BE (err < 0, 0)) return -2; /* Pick up a valid destination, or return -1 if none is found. */ for (dest_node = -1, i = 0; i < edests->nelem; ++i) { int candidate = edests->elems[i]; if (!re_node_set_contains (cur_nodes, candidate)) continue; if (dest_node == -1) dest_node = candidate; else { /* In order to avoid infinite loop like "(a*)*", return the second epsilon-transition if the first was already considered. */ if (re_node_set_contains (eps_via_nodes, dest_node)) return candidate; /* Otherwise, push the second epsilon-transition on the fail stack. */ else if (fs != NULL && push_fail_stack (fs, *pidx, candidate, nregs, regs, eps_via_nodes)) return -2; /* We know we are going to exit. */ break; } } return dest_node; } else { int naccepted = 0; re_token_type_t type = dfa->nodes[node].type; #ifdef RE_ENABLE_I18N if (ACCEPT_MB_NODE (type)) naccepted = check_node_accept_bytes (dfa, node, &mctx->input, *pidx); else #endif /* RE_ENABLE_I18N */ if (type == OP_BACK_REF) { int subexp_idx = dfa->nodes[node].opr.idx + 1; naccepted = regs[subexp_idx].rm_eo - regs[subexp_idx].rm_so; if (fs != NULL) { if (regs[subexp_idx].rm_so == -1 || regs[subexp_idx].rm_eo == -1) return -1; else if (naccepted) { char *buf = (char *) re_string_get_buffer (&mctx->input); if (memcmp (buf + regs[subexp_idx].rm_so, buf + *pidx, naccepted) != 0) return -1; } } if (naccepted == 0) { err = re_node_set_insert (eps_via_nodes, node); if (BE (err < 0, 0)) return -2; dest_node = dfa->edests[node].elems[0]; if (re_node_set_contains (&mctx->state_log[*pidx]->nodes, dest_node)) return dest_node; } } if (naccepted != 0 || check_node_accept (mctx, dfa->nodes + node, *pidx)) { dest_node = dfa->nexts[node]; *pidx = (naccepted == 0) ? *pidx + 1 : *pidx + naccepted; if (fs && (*pidx > mctx->match_last || mctx->state_log[*pidx] == NULL || !re_node_set_contains (&mctx->state_log[*pidx]->nodes, dest_node))) return -1; re_node_set_empty (eps_via_nodes); return dest_node; } } return -1; } static reg_errcode_t push_fail_stack (fs, str_idx, dest_node, nregs, regs, eps_via_nodes) struct re_fail_stack_t *fs; int str_idx, dest_node, nregs; regmatch_t *regs; re_node_set *eps_via_nodes; { reg_errcode_t err; int num = fs->num++; if (fs->num == fs->alloc) { struct re_fail_stack_ent_t *new_array; new_array = realloc (fs->stack, (sizeof (struct re_fail_stack_ent_t) * fs->alloc * 2)); if (new_array == NULL) return REG_ESPACE; fs->alloc *= 2; fs->stack = new_array; } fs->stack[num].idx = str_idx; fs->stack[num].node = dest_node; fs->stack[num].regs = re_malloc (regmatch_t, nregs); if (fs->stack[num].regs == NULL) return REG_ESPACE; memcpy (fs->stack[num].regs, regs, sizeof (regmatch_t) * nregs); err = re_node_set_init_copy (&fs->stack[num].eps_via_nodes, eps_via_nodes); return err; } static int pop_fail_stack (fs, pidx, nregs, regs, eps_via_nodes) struct re_fail_stack_t *fs; int *pidx, nregs; regmatch_t *regs; re_node_set *eps_via_nodes; { int num = --fs->num; assert (num >= 0); *pidx = fs->stack[num].idx; memcpy (regs, fs->stack[num].regs, sizeof (regmatch_t) * nregs); re_node_set_free (eps_via_nodes); re_free (fs->stack[num].regs); *eps_via_nodes = fs->stack[num].eps_via_nodes; return fs->stack[num].node; } /* Set the positions where the subexpressions are starts/ends to registers PMATCH. Note: We assume that pmatch[0] is already set, and pmatch[i].rm_so == pmatch[i].rm_eo == -1 for 0 < i < nmatch. */ static reg_errcode_t set_regs (preg, mctx, nmatch, pmatch, fl_backtrack) const regex_t *preg; const re_match_context_t *mctx; size_t nmatch; regmatch_t *pmatch; int fl_backtrack; { re_dfa_t *dfa = (re_dfa_t *) preg->buffer; int idx, cur_node, real_nmatch; re_node_set eps_via_nodes; struct re_fail_stack_t *fs; struct re_fail_stack_t fs_body = { 0, 2, NULL }; regmatch_t *prev_idx_match; #ifdef DEBUG assert (nmatch > 1); assert (mctx->state_log != NULL); #endif if (fl_backtrack) { fs = &fs_body; fs->stack = re_malloc (struct re_fail_stack_ent_t, fs->alloc); if (fs->stack == NULL) return REG_ESPACE; } else fs = NULL; cur_node = dfa->init_node; real_nmatch = (nmatch <= preg->re_nsub) ? nmatch : preg->re_nsub + 1; re_node_set_init_empty (&eps_via_nodes); prev_idx_match = (regmatch_t *) alloca (sizeof (regmatch_t) * real_nmatch); memcpy (prev_idx_match, pmatch, sizeof (regmatch_t) * real_nmatch); for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;) { update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, real_nmatch); if (idx == pmatch[0].rm_eo && cur_node == mctx->last_node) { int reg_idx; if (fs) { for (reg_idx = 0; reg_idx < nmatch; ++reg_idx) if (pmatch[reg_idx].rm_so > -1 && pmatch[reg_idx].rm_eo == -1) break; if (reg_idx == nmatch) { re_node_set_free (&eps_via_nodes); return free_fail_stack_return (fs); } cur_node = pop_fail_stack (fs, &idx, nmatch, pmatch, &eps_via_nodes); } else { re_node_set_free (&eps_via_nodes); return REG_NOERROR; } } /* Proceed to next node. */ cur_node = proceed_next_node (mctx, nmatch, pmatch, &idx, cur_node, &eps_via_nodes, fs); if (BE (cur_node < 0, 0)) { if (BE (cur_node == -2, 0)) { re_node_set_free (&eps_via_nodes); free_fail_stack_return (fs); return REG_ESPACE; } if (fs) cur_node = pop_fail_stack (fs, &idx, nmatch, pmatch, &eps_via_nodes); else { re_node_set_free (&eps_via_nodes); return REG_NOMATCH; } } } re_node_set_free (&eps_via_nodes); return free_fail_stack_return (fs); } static reg_errcode_t free_fail_stack_return (fs) struct re_fail_stack_t *fs; { if (fs) { int fs_idx; for (fs_idx = 0; fs_idx < fs->num; ++fs_idx) { re_node_set_free (&fs->stack[fs_idx].eps_via_nodes); re_free (fs->stack[fs_idx].regs); } re_free (fs->stack); } return REG_NOERROR; } static void update_regs (dfa, pmatch, prev_idx_match, cur_node, cur_idx, nmatch) re_dfa_t *dfa; regmatch_t *pmatch, *prev_idx_match; int cur_node, cur_idx, nmatch; { int type = dfa->nodes[cur_node].type; if (type == OP_OPEN_SUBEXP) { int reg_num = dfa->nodes[cur_node].opr.idx + 1; /* We are at the first node of this sub expression. */ if (reg_num < nmatch) { pmatch[reg_num].rm_so = cur_idx; pmatch[reg_num].rm_eo = -1; } } else if (type == OP_CLOSE_SUBEXP) { int reg_num = dfa->nodes[cur_node].opr.idx + 1; if (reg_num < nmatch) { /* We are at the last node of this sub expression. */ if (pmatch[reg_num].rm_so < cur_idx) { pmatch[reg_num].rm_eo = cur_idx; /* This is a non-empty match or we are not inside an optional subexpression. Accept this right away. */ memcpy (prev_idx_match, pmatch, sizeof (regmatch_t) * nmatch); } else { if (dfa->nodes[cur_node].opt_subexp && prev_idx_match[reg_num].rm_so != -1) /* We transited through an empty match for an optional subexpression, like (a?)*, and this is not the subexp's first match. Copy back the old content of the registers so that matches of an inner subexpression are undone as well, like in ((a?))*. */ memcpy (pmatch, prev_idx_match, sizeof (regmatch_t) * nmatch); else /* We completed a subexpression, but it may be part of an optional one, so do not update PREV_IDX_MATCH. */ pmatch[reg_num].rm_eo = cur_idx; } } } } /* This function checks the STATE_LOG from the SCTX->last_str_idx to 0 and sift the nodes in each states according to the following rules. Updated state_log will be wrote to STATE_LOG. Rules: We throw away the Node `a' in the STATE_LOG[STR_IDX] if... 1. When STR_IDX == MATCH_LAST(the last index in the state_log): If `a' isn't the LAST_NODE and `a' can't epsilon transit to the LAST_NODE, we throw away the node `a'. 2. When 0 <= STR_IDX < MATCH_LAST and `a' accepts string `s' and transit to `b': i. If 'b' isn't in the STATE_LOG[STR_IDX+strlen('s')], we throw away the node `a'. ii. If 'b' is in the STATE_LOG[STR_IDX+strlen('s')] but 'b' is thrown away, we throw away the node `a'. 3. When 0 <= STR_IDX < MATCH_LAST and 'a' epsilon transit to 'b': i. If 'b' isn't in the STATE_LOG[STR_IDX], we throw away the node `a'. ii. If 'b' is in the STATE_LOG[STR_IDX] but 'b' is thrown away, we throw away the node `a'. */ #define STATE_NODE_CONTAINS(state,node) \ ((state) != NULL && re_node_set_contains (&(state)->nodes, node)) static reg_errcode_t sift_states_backward (mctx, sctx) re_match_context_t *mctx; re_sift_context_t *sctx; { reg_errcode_t err; int null_cnt = 0; int str_idx = sctx->last_str_idx; re_node_set cur_dest; #ifdef DEBUG assert (mctx->state_log != NULL && mctx->state_log[str_idx] != NULL); #endif /* Build sifted state_log[str_idx]. It has the nodes which can epsilon transit to the last_node and the last_node itself. */ err = re_node_set_init_1 (&cur_dest, sctx->last_node); if (BE (err != REG_NOERROR, 0)) return err; err = update_cur_sifted_state (mctx, sctx, str_idx, &cur_dest); if (BE (err != REG_NOERROR, 0)) goto free_return; /* Then check each states in the state_log. */ while (str_idx > 0) { /* Update counters. */ null_cnt = (sctx->sifted_states[str_idx] == NULL) ? null_cnt + 1 : 0; if (null_cnt > mctx->max_mb_elem_len) { memset (sctx->sifted_states, '\0', sizeof (re_dfastate_t *) * str_idx); re_node_set_free (&cur_dest); return REG_NOERROR; } re_node_set_empty (&cur_dest); --str_idx; if (mctx->state_log[str_idx]) { err = build_sifted_states (mctx, sctx, str_idx, &cur_dest); if (BE (err != REG_NOERROR, 0)) goto free_return; } /* Add all the nodes which satisfy the following conditions: - It can epsilon transit to a node in CUR_DEST. - It is in CUR_SRC. And update state_log. */ err = update_cur_sifted_state (mctx, sctx, str_idx, &cur_dest); if (BE (err != REG_NOERROR, 0)) goto free_return; } err = REG_NOERROR; free_return: re_node_set_free (&cur_dest); return err; } static reg_errcode_t build_sifted_states (mctx, sctx, str_idx, cur_dest) re_match_context_t *mctx; re_sift_context_t *sctx; int str_idx; re_node_set *cur_dest; { re_dfa_t *const dfa = mctx->dfa; re_node_set *cur_src = &mctx->state_log[str_idx]->non_eps_nodes; int i; /* Then build the next sifted state. We build the next sifted state on `cur_dest', and update `sifted_states[str_idx]' with `cur_dest'. Note: `cur_dest' is the sifted state from `state_log[str_idx + 1]'. `cur_src' points the node_set of the old `state_log[str_idx]' (with the epsilon nodes pre-filtered out). */ for (i = 0; i < cur_src->nelem; i++) { int prev_node = cur_src->elems[i]; int naccepted = 0; int ret; #if defined DEBUG || defined RE_ENABLE_I18N re_token_type_t type = dfa->nodes[prev_node].type; #endif #ifdef DEBUG assert (!IS_EPSILON_NODE (type)); #endif #ifdef RE_ENABLE_I18N /* If the node may accept `multi byte'. */ if (ACCEPT_MB_NODE (type)) naccepted = sift_states_iter_mb (mctx, sctx, prev_node, str_idx, sctx->last_str_idx); #endif /* RE_ENABLE_I18N */ /* We don't check backreferences here. See update_cur_sifted_state(). */ if (!naccepted && check_node_accept (mctx, dfa->nodes + prev_node, str_idx) && STATE_NODE_CONTAINS (sctx->sifted_states[str_idx + 1], dfa->nexts[prev_node])) naccepted = 1; if (naccepted == 0) continue; if (sctx->limits.nelem) { int to_idx = str_idx + naccepted; if (check_dst_limits (mctx, &sctx->limits, dfa->nexts[prev_node], to_idx, prev_node, str_idx)) continue; } ret = re_node_set_insert (cur_dest, prev_node); if (BE (ret == -1, 0)) return REG_ESPACE; } return REG_NOERROR; } /* Helper functions. */ static reg_errcode_t clean_state_log_if_needed (mctx, next_state_log_idx) re_match_context_t *mctx; int next_state_log_idx; { int top = mctx->state_log_top; if (next_state_log_idx >= mctx->input.bufs_len || (next_state_log_idx >= mctx->input.valid_len && mctx->input.valid_len < mctx->input.len)) { reg_errcode_t err; err = extend_buffers (mctx); if (BE (err != REG_NOERROR, 0)) return err; } if (top < next_state_log_idx) { memset (mctx->state_log + top + 1, '\0', sizeof (re_dfastate_t *) * (next_state_log_idx - top)); mctx->state_log_top = next_state_log_idx; } return REG_NOERROR; } static reg_errcode_t merge_state_array (dfa, dst, src, num) re_dfa_t *dfa; re_dfastate_t **dst; re_dfastate_t **src; int num; { int st_idx; reg_errcode_t err; for (st_idx = 0; st_idx < num; ++st_idx) { if (dst[st_idx] == NULL) dst[st_idx] = src[st_idx]; else if (src[st_idx] != NULL) { re_node_set merged_set; err = re_node_set_init_union (&merged_set, &dst[st_idx]->nodes, &src[st_idx]->nodes); if (BE (err != REG_NOERROR, 0)) return err; dst[st_idx] = re_acquire_state (&err, dfa, &merged_set); re_node_set_free (&merged_set); if (BE (err != REG_NOERROR, 0)) return err; } } return REG_NOERROR; } static reg_errcode_t update_cur_sifted_state (mctx, sctx, str_idx, dest_nodes) re_match_context_t *mctx; re_sift_context_t *sctx; int str_idx; re_node_set *dest_nodes; { re_dfa_t *const dfa = mctx->dfa; reg_errcode_t err; const re_node_set *candidates; candidates = ((mctx->state_log[str_idx] == NULL) ? NULL : &mctx->state_log[str_idx]->nodes); if (dest_nodes->nelem == 0) sctx->sifted_states[str_idx] = NULL; else { if (candidates) { /* At first, add the nodes which can epsilon transit to a node in DEST_NODE. */ err = add_epsilon_src_nodes (dfa, dest_nodes, candidates); if (BE (err != REG_NOERROR, 0)) return err; /* Then, check the limitations in the current sift_context. */ if (sctx->limits.nelem) { err = check_subexp_limits (dfa, dest_nodes, candidates, &sctx->limits, mctx->bkref_ents, str_idx); if (BE (err != REG_NOERROR, 0)) return err; } } sctx->sifted_states[str_idx] = re_acquire_state (&err, dfa, dest_nodes); if (BE (err != REG_NOERROR, 0)) return err; } if (candidates && mctx->state_log[str_idx]->has_backref) { err = sift_states_bkref (mctx, sctx, str_idx, candidates); if (BE (err != REG_NOERROR, 0)) return err; } return REG_NOERROR; } static reg_errcode_t add_epsilon_src_nodes (dfa, dest_nodes, candidates) re_dfa_t *dfa; re_node_set *dest_nodes; const re_node_set *candidates; { reg_errcode_t err = REG_NOERROR; int i; re_dfastate_t *state = re_acquire_state (&err, dfa, dest_nodes); if (BE (err != REG_NOERROR, 0)) return err; if (!state->inveclosure.alloc) { err = re_node_set_alloc (&state->inveclosure, dest_nodes->nelem); if (BE (err != REG_NOERROR, 0)) return REG_ESPACE; for (i = 0; i < dest_nodes->nelem; i++) re_node_set_merge (&state->inveclosure, dfa->inveclosures + dest_nodes->elems[i]); } return re_node_set_add_intersect (dest_nodes, candidates, &state->inveclosure); } static reg_errcode_t sub_epsilon_src_nodes (dfa, node, dest_nodes, candidates) re_dfa_t *dfa; int node; re_node_set *dest_nodes; const re_node_set *candidates; { int ecl_idx; reg_errcode_t err; re_node_set *inv_eclosure = dfa->inveclosures + node; re_node_set except_nodes; re_node_set_init_empty (&except_nodes); for (ecl_idx = 0; ecl_idx < inv_eclosure->nelem; ++ecl_idx) { int cur_node = inv_eclosure->elems[ecl_idx]; if (cur_node == node) continue; if (IS_EPSILON_NODE (dfa->nodes[cur_node].type)) { int edst1 = dfa->edests[cur_node].elems[0]; int edst2 = ((dfa->edests[cur_node].nelem > 1) ? dfa->edests[cur_node].elems[1] : -1); if ((!re_node_set_contains (inv_eclosure, edst1) && re_node_set_contains (dest_nodes, edst1)) || (edst2 > 0 && !re_node_set_contains (inv_eclosure, edst2) && re_node_set_contains (dest_nodes, edst2))) { err = re_node_set_add_intersect (&except_nodes, candidates, dfa->inveclosures + cur_node); if (BE (err != REG_NOERROR, 0)) { re_node_set_free (&except_nodes); return err; } } } } for (ecl_idx = 0; ecl_idx < inv_eclosure->nelem; ++ecl_idx) { int cur_node = inv_eclosure->elems[ecl_idx]; if (!re_node_set_contains (&except_nodes, cur_node)) { int idx = re_node_set_contains (dest_nodes, cur_node) - 1; re_node_set_remove_at (dest_nodes, idx); } } re_node_set_free (&except_nodes); return REG_NOERROR; } static int check_dst_limits (mctx, limits, dst_node, dst_idx, src_node, src_idx) re_match_context_t *mctx; re_node_set *limits; int dst_node, dst_idx, src_node, src_idx; { re_dfa_t *const dfa = mctx->dfa; int lim_idx, src_pos, dst_pos; int dst_bkref_idx = search_cur_bkref_entry (mctx, dst_idx); int src_bkref_idx = search_cur_bkref_entry (mctx, src_idx); for (lim_idx = 0; lim_idx < limits->nelem; ++lim_idx) { int subexp_idx; struct re_backref_cache_entry *ent; ent = mctx->bkref_ents + limits->elems[lim_idx]; subexp_idx = dfa->nodes[ent->node].opr.idx; dst_pos = check_dst_limits_calc_pos (mctx, limits->elems[lim_idx], subexp_idx, dst_node, dst_idx, dst_bkref_idx); src_pos = check_dst_limits_calc_pos (mctx, limits->elems[lim_idx], subexp_idx, src_node, src_idx, src_bkref_idx); /* In case of: ( ) ( ) ( ) */ if (src_pos == dst_pos) continue; /* This is unrelated limitation. */ else return 1; } return 0; } static int check_dst_limits_calc_pos_1 (mctx, boundaries, subexp_idx, from_node, bkref_idx) re_match_context_t *mctx; int boundaries, subexp_idx, from_node, bkref_idx; { re_dfa_t *const dfa = mctx->dfa; re_node_set *eclosures = dfa->eclosures + from_node; int node_idx; /* Else, we are on the boundary: examine the nodes on the epsilon closure. */ for (node_idx = 0; node_idx < eclosures->nelem; ++node_idx) { int node = eclosures->elems[node_idx]; switch (dfa->nodes[node].type) { case OP_BACK_REF: if (bkref_idx != -1) { struct re_backref_cache_entry *ent = mctx->bkref_ents + bkref_idx; do { int dst, cpos; if (ent->node != node) continue; if (subexp_idx <= 8 * sizeof (ent->eps_reachable_subexps_map) && !(ent->eps_reachable_subexps_map & (1 << subexp_idx))) continue; /* Recurse trying to reach the OP_OPEN_SUBEXP and OP_CLOSE_SUBEXP cases below. But, if the destination node is the same node as the source node, don't recurse because it would cause an infinite loop: a regex that exhibits this behavior is ()\1*\1* */ dst = dfa->edests[node].elems[0]; if (dst == from_node) { if (boundaries & 1) return -1; else /* if (boundaries & 2) */ return 0; } cpos = check_dst_limits_calc_pos_1 (mctx, boundaries, subexp_idx, dst, bkref_idx); if (cpos == -1 /* && (boundaries & 1) */) return -1; if (cpos == 0 && (boundaries & 2)) return 0; ent->eps_reachable_subexps_map &= ~(1 << subexp_idx); } while (ent++->more); } break; case OP_OPEN_SUBEXP: if ((boundaries & 1) && subexp_idx == dfa->nodes[node].opr.idx) return -1; break; case OP_CLOSE_SUBEXP: if ((boundaries & 2) && subexp_idx == dfa->nodes[node].opr.idx) return 0; break; default: break; } } return (boundaries & 2) ? 1 : 0; } static int check_dst_limits_calc_pos (mctx, limit, subexp_idx, from_node, str_idx, bkref_idx) re_match_context_t *mctx; int limit, subexp_idx, from_node, str_idx, bkref_idx; { struct re_backref_cache_entry *lim = mctx->bkref_ents + limit; int boundaries; /* If we are outside the range of the subexpression, return -1 or 1. */ if (str_idx < lim->subexp_from) return -1; if (lim->subexp_to < str_idx) return 1; /* If we are within the subexpression, return 0. */ boundaries = (str_idx == lim->subexp_from); boundaries |= (str_idx == lim->subexp_to) << 1; if (boundaries == 0) return 0; /* Else, examine epsilon closure. */ return check_dst_limits_calc_pos_1 (mctx, boundaries, subexp_idx, from_node, bkref_idx); } /* Check the limitations of sub expressions LIMITS, and remove the nodes which are against limitations from DEST_NODES. */ static reg_errcode_t check_subexp_limits (dfa, dest_nodes, candidates, limits, bkref_ents, str_idx) re_dfa_t *dfa; re_node_set *dest_nodes; const re_node_set *candidates; re_node_set *limits; struct re_backref_cache_entry *bkref_ents; int str_idx; { reg_errcode_t err; int node_idx, lim_idx; for (lim_idx = 0; lim_idx < limits->nelem; ++lim_idx) { int subexp_idx; struct re_backref_cache_entry *ent; ent = bkref_ents + limits->elems[lim_idx]; if (str_idx <= ent->subexp_from || ent->str_idx < str_idx) continue; /* This is unrelated limitation. */ subexp_idx = dfa->nodes[ent->node].opr.idx; if (ent->subexp_to == str_idx) { int ops_node = -1; int cls_node = -1; for (node_idx = 0; node_idx < dest_nodes->nelem; ++node_idx) { int node = dest_nodes->elems[node_idx]; re_token_type_t type = dfa->nodes[node].type; if (type == OP_OPEN_SUBEXP && subexp_idx == dfa->nodes[node].opr.idx) ops_node = node; else if (type == OP_CLOSE_SUBEXP && subexp_idx == dfa->nodes[node].opr.idx) cls_node = node; } /* Check the limitation of the open subexpression. */ /* Note that (ent->subexp_to = str_idx != ent->subexp_from). */ if (ops_node >= 0) { err = sub_epsilon_src_nodes (dfa, ops_node, dest_nodes, candidates); if (BE (err != REG_NOERROR, 0)) return err; } /* Check the limitation of the close subexpression. */ if (cls_node >= 0) for (node_idx = 0; node_idx < dest_nodes->nelem; ++node_idx) { int node = dest_nodes->elems[node_idx]; if (!re_node_set_contains (dfa->inveclosures + node, cls_node) && !re_node_set_contains (dfa->eclosures + node, cls_node)) { /* It is against this limitation. Remove it form the current sifted state. */ err = sub_epsilon_src_nodes (dfa, node, dest_nodes, candidates); if (BE (err != REG_NOERROR, 0)) return err; --node_idx; } } } else /* (ent->subexp_to != str_idx) */ { for (node_idx = 0; node_idx < dest_nodes->nelem; ++node_idx) { int node = dest_nodes->elems[node_idx]; re_token_type_t type = dfa->nodes[node].type; if (type == OP_CLOSE_SUBEXP || type == OP_OPEN_SUBEXP) { if (subexp_idx != dfa->nodes[node].opr.idx) continue; /* It is against this limitation. Remove it form the current sifted state. */ err = sub_epsilon_src_nodes (dfa, node, dest_nodes, candidates); if (BE (err != REG_NOERROR, 0)) return err; } } } } return REG_NOERROR; } static reg_errcode_t sift_states_bkref (mctx, sctx, str_idx, candidates) re_match_context_t *mctx; re_sift_context_t *sctx; int str_idx; const re_node_set *candidates; { re_dfa_t *const dfa = mctx->dfa; reg_errcode_t err; int node_idx, node; re_sift_context_t local_sctx; int first_idx = search_cur_bkref_entry (mctx, str_idx); if (first_idx == -1) return REG_NOERROR; local_sctx.sifted_states = NULL; /* Mark that it hasn't been initialized. */ for (node_idx = 0; node_idx < candidates->nelem; ++node_idx) { int enabled_idx; re_token_type_t type; struct re_backref_cache_entry *entry; node = candidates->elems[node_idx]; type = dfa->nodes[node].type; /* Avoid infinite loop for the REs like "()\1+". */ if (node == sctx->last_node && str_idx == sctx->last_str_idx) continue; if (type != OP_BACK_REF) continue; entry = mctx->bkref_ents + first_idx; enabled_idx = first_idx; do { int subexp_len, to_idx, dst_node; re_dfastate_t *cur_state; if (entry->node != node) continue; subexp_len = entry->subexp_to - entry->subexp_from; to_idx = str_idx + subexp_len; dst_node = (subexp_len ? dfa->nexts[node] : dfa->edests[node].elems[0]); if (to_idx > sctx->last_str_idx || sctx->sifted_states[to_idx] == NULL || !STATE_NODE_CONTAINS (sctx->sifted_states[to_idx], dst_node) || check_dst_limits (mctx, &sctx->limits, node, str_idx, dst_node, to_idx)) continue; if (local_sctx.sifted_states == NULL) { local_sctx = *sctx; err = re_node_set_init_copy (&local_sctx.limits, &sctx->limits); if (BE (err != REG_NOERROR, 0)) goto free_return; } local_sctx.last_node = node; local_sctx.last_str_idx = str_idx; err = re_node_set_insert (&local_sctx.limits, enabled_idx); if (BE (err < 0, 0)) { err = REG_ESPACE; goto free_return; } cur_state = local_sctx.sifted_states[str_idx]; err = sift_states_backward (mctx, &local_sctx); if (BE (err != REG_NOERROR, 0)) goto free_return; if (sctx->limited_states != NULL) { err = merge_state_array (dfa, sctx->limited_states, local_sctx.sifted_states, str_idx + 1); if (BE (err != REG_NOERROR, 0)) goto free_return; } local_sctx.sifted_states[str_idx] = cur_state; re_node_set_remove (&local_sctx.limits, enabled_idx); /* mctx->bkref_ents may have changed, reload the pointer. */ entry = mctx->bkref_ents + enabled_idx; } while (enabled_idx++, entry++->more); } err = REG_NOERROR; free_return: if (local_sctx.sifted_states != NULL) { re_node_set_free (&local_sctx.limits); } return err; } #ifdef RE_ENABLE_I18N static int sift_states_iter_mb (mctx, sctx, node_idx, str_idx, max_str_idx) const re_match_context_t *mctx; re_sift_context_t *sctx; int node_idx, str_idx, max_str_idx; { re_dfa_t *const dfa = mctx->dfa; int naccepted; /* Check the node can accept `multi byte'. */ naccepted = check_node_accept_bytes (dfa, node_idx, &mctx->input, str_idx); if (naccepted > 0 && str_idx + naccepted <= max_str_idx && !STATE_NODE_CONTAINS (sctx->sifted_states[str_idx + naccepted], dfa->nexts[node_idx])) /* The node can't accept the `multi byte', or the destination was already thrown away, then the node could't accept the current input `multi byte'. */ naccepted = 0; /* Otherwise, it is sure that the node could accept `naccepted' bytes input. */ return naccepted; } #endif /* RE_ENABLE_I18N */ /* Functions for state transition. */ /* Return the next state to which the current state STATE will transit by accepting the current input byte, and update STATE_LOG if necessary. If STATE can accept a multibyte char/collating element/back reference update the destination of STATE_LOG. */ static re_dfastate_t * transit_state (err, mctx, state) reg_errcode_t *err; re_match_context_t *mctx; re_dfastate_t *state; { re_dfa_t *const dfa = mctx->dfa; re_dfastate_t **trtable; unsigned char ch; #ifdef RE_ENABLE_I18N /* If the current state can accept multibyte. */ if (BE (state->accept_mb, 0)) { *err = transit_state_mb (mctx, state); if (BE (*err != REG_NOERROR, 0)) return NULL; } #endif /* RE_ENABLE_I18N */ /* Then decide the next state with the single byte. */ if (1) { /* Use transition table */ ch = re_string_fetch_byte (&mctx->input); trtable = state->trtable; if (trtable == NULL) { trtable = build_trtable (dfa, state); if (trtable == NULL) { *err = REG_ESPACE; return NULL; } } if (BE (state->word_trtable, 0)) { unsigned int context; context = re_string_context_at (&mctx->input, re_string_cur_idx (&mctx->input) - 1, mctx->eflags); if (IS_WORD_CONTEXT (context)) return trtable[ch + SBC_MAX]; else return trtable[ch]; } else return trtable[ch]; } #if 0 else /* don't use transition table */ return transit_state_sb (err, mctx, state); #endif } /* Update the state_log if we need */ re_dfastate_t * merge_state_with_log (err, mctx, next_state) reg_errcode_t *err; re_match_context_t *mctx; re_dfastate_t *next_state; { re_dfa_t *const dfa = mctx->dfa; int cur_idx = re_string_cur_idx (&mctx->input); if (cur_idx > mctx->state_log_top) { mctx->state_log[cur_idx] = next_state; mctx->state_log_top = cur_idx; } else if (mctx->state_log[cur_idx] == 0) { mctx->state_log[cur_idx] = next_state; } else { re_dfastate_t *pstate; unsigned int context; re_node_set next_nodes, *log_nodes, *table_nodes = NULL; /* If (state_log[cur_idx] != 0), it implies that cur_idx is the destination of a multibyte char/collating element/ back reference. Then the next state is the union set of these destinations and the results of the transition table. */ pstate = mctx->state_log[cur_idx]; log_nodes = pstate->entrance_nodes; if (next_state != NULL) { table_nodes = next_state->entrance_nodes; *err = re_node_set_init_union (&next_nodes, table_nodes, log_nodes); if (BE (*err != REG_NOERROR, 0)) return NULL; } else next_nodes = *log_nodes; /* Note: We already add the nodes of the initial state, then we don't need to add them here. */ context = re_string_context_at (&mctx->input, re_string_cur_idx (&mctx->input) - 1, mctx->eflags); next_state = mctx->state_log[cur_idx] = re_acquire_state_context (err, dfa, &next_nodes, context); /* We don't need to check errors here, since the return value of this function is next_state and ERR is already set. */ if (table_nodes != NULL) re_node_set_free (&next_nodes); } if (BE (dfa->nbackref, 0) && next_state != NULL) { /* Check OP_OPEN_SUBEXP in the current state in case that we use them later. We must check them here, since the back references in the next state might use them. */ *err = check_subexp_matching_top (mctx, &next_state->nodes, cur_idx); if (BE (*err != REG_NOERROR, 0)) return NULL; /* If the next state has back references. */ if (next_state->has_backref) { *err = transit_state_bkref (mctx, &next_state->nodes); if (BE (*err != REG_NOERROR, 0)) return NULL; next_state = mctx->state_log[cur_idx]; } } return next_state; } /* Skip bytes in the input that correspond to part of a multi-byte match, then look in the log for a state from which to restart matching. */ re_dfastate_t * find_recover_state (err, mctx) reg_errcode_t *err; re_match_context_t *mctx; { re_dfastate_t *cur_state = NULL; do { int max = mctx->state_log_top; int cur_str_idx = re_string_cur_idx (&mctx->input); do { if (++cur_str_idx > max) return NULL; re_string_skip_bytes (&mctx->input, 1); } while (mctx->state_log[cur_str_idx] == NULL); cur_state = merge_state_with_log (err, mctx, NULL); } while (err == REG_NOERROR && cur_state == NULL); return cur_state; } /* Helper functions for transit_state. */ /* From the node set CUR_NODES, pick up the nodes whose types are OP_OPEN_SUBEXP and which have corresponding back references in the regular expression. And register them to use them later for evaluating the correspoding back references. */ static reg_errcode_t check_subexp_matching_top (mctx, cur_nodes, str_idx) re_match_context_t *mctx; re_node_set *cur_nodes; int str_idx; { re_dfa_t *const dfa = mctx->dfa; int node_idx; reg_errcode_t err; /* TODO: This isn't efficient. Because there might be more than one nodes whose types are OP_OPEN_SUBEXP and whose index is SUBEXP_IDX, we must check all nodes. E.g. RE: (a){2} */ for (node_idx = 0; node_idx < cur_nodes->nelem; ++node_idx) { int node = cur_nodes->elems[node_idx]; if (dfa->nodes[node].type == OP_OPEN_SUBEXP && dfa->nodes[node].opr.idx < (8 * sizeof (dfa->used_bkref_map)) && dfa->used_bkref_map & (1 << dfa->nodes[node].opr.idx)) { err = match_ctx_add_subtop (mctx, node, str_idx); if (BE (err != REG_NOERROR, 0)) return err; } } return REG_NOERROR; } #if 0 /* Return the next state to which the current state STATE will transit by accepting the current input byte. */ static re_dfastate_t * transit_state_sb (err, mctx, state) reg_errcode_t *err; re_match_context_t *mctx; re_dfastate_t *state; { re_dfa_t *const dfa = mctx->dfa; re_node_set next_nodes; re_dfastate_t *next_state; int node_cnt, cur_str_idx = re_string_cur_idx (&mctx->input); unsigned int context; *err = re_node_set_alloc (&next_nodes, state->nodes.nelem + 1); if (BE (*err != REG_NOERROR, 0)) return NULL; for (node_cnt = 0; node_cnt < state->nodes.nelem; ++node_cnt) { int cur_node = state->nodes.elems[node_cnt]; if (check_node_accept (mctx, dfa->nodes + cur_node, cur_str_idx)) { *err = re_node_set_merge (&next_nodes, dfa->eclosures + dfa->nexts[cur_node]); if (BE (*err != REG_NOERROR, 0)) { re_node_set_free (&next_nodes); return NULL; } } } context = re_string_context_at (&mctx->input, cur_str_idx, mctx->eflags); next_state = re_acquire_state_context (err, dfa, &next_nodes, context); /* We don't need to check errors here, since the return value of this function is next_state and ERR is already set. */ re_node_set_free (&next_nodes); re_string_skip_bytes (&mctx->input, 1); return next_state; } #endif #ifdef RE_ENABLE_I18N static reg_errcode_t transit_state_mb (mctx, pstate) re_match_context_t *mctx; re_dfastate_t *pstate; { re_dfa_t *const dfa = mctx->dfa; reg_errcode_t err; int i; for (i = 0; i < pstate->nodes.nelem; ++i) { re_node_set dest_nodes, *new_nodes; int cur_node_idx = pstate->nodes.elems[i]; int naccepted = 0, dest_idx; unsigned int context; re_dfastate_t *dest_state; if (dfa->nodes[cur_node_idx].constraint) { context = re_string_context_at (&mctx->input, re_string_cur_idx (&mctx->input), mctx->eflags); if (NOT_SATISFY_NEXT_CONSTRAINT (dfa->nodes[cur_node_idx].constraint, context)) continue; } /* How many bytes the node can accept? */ if (ACCEPT_MB_NODE (dfa->nodes[cur_node_idx].type)) naccepted = check_node_accept_bytes (dfa, cur_node_idx, &mctx->input, re_string_cur_idx (&mctx->input)); if (naccepted == 0) continue; /* The node can accepts `naccepted' bytes. */ dest_idx = re_string_cur_idx (&mctx->input) + naccepted; mctx->max_mb_elem_len = ((mctx->max_mb_elem_len < naccepted) ? naccepted : mctx->max_mb_elem_len); err = clean_state_log_if_needed (mctx, dest_idx); if (BE (err != REG_NOERROR, 0)) return err; #ifdef DEBUG assert (dfa->nexts[cur_node_idx] != -1); #endif /* `cur_node_idx' may point the entity of the OP_CONTEXT_NODE, then we use pstate->nodes.elems[i] instead. */ new_nodes = dfa->eclosures + dfa->nexts[pstate->nodes.elems[i]]; dest_state = mctx->state_log[dest_idx]; if (dest_state == NULL) dest_nodes = *new_nodes; else { err = re_node_set_init_union (&dest_nodes, dest_state->entrance_nodes, new_nodes); if (BE (err != REG_NOERROR, 0)) return err; } context = re_string_context_at (&mctx->input, dest_idx - 1, mctx->eflags); mctx->state_log[dest_idx] = re_acquire_state_context (&err, dfa, &dest_nodes, context); if (dest_state != NULL) re_node_set_free (&dest_nodes); if (BE (mctx->state_log[dest_idx] == NULL && err != REG_NOERROR, 0)) return err; } return REG_NOERROR; } #endif /* RE_ENABLE_I18N */ static reg_errcode_t transit_state_bkref (mctx, nodes) re_match_context_t *mctx; const re_node_set *nodes; { re_dfa_t *const dfa = mctx->dfa; reg_errcode_t err; int i; int cur_str_idx = re_string_cur_idx (&mctx->input); for (i = 0; i < nodes->nelem; ++i) { int dest_str_idx, prev_nelem, bkc_idx; int node_idx = nodes->elems[i]; unsigned int context; const re_token_t *node = dfa->nodes + node_idx; re_node_set *new_dest_nodes; /* Check whether `node' is a backreference or not. */ if (node->type != OP_BACK_REF) continue; if (node->constraint) { context = re_string_context_at (&mctx->input, cur_str_idx, mctx->eflags); if (NOT_SATISFY_NEXT_CONSTRAINT (node->constraint, context)) continue; } /* `node' is a backreference. Check the substring which the substring matched. */ bkc_idx = mctx->nbkref_ents; err = get_subexp (mctx, node_idx, cur_str_idx); if (BE (err != REG_NOERROR, 0)) goto free_return; /* And add the epsilon closures (which is `new_dest_nodes') of the backreference to appropriate state_log. */ #ifdef DEBUG assert (dfa->nexts[node_idx] != -1); #endif for (; bkc_idx < mctx->nbkref_ents; ++bkc_idx) { int subexp_len; re_dfastate_t *dest_state; struct re_backref_cache_entry *bkref_ent; bkref_ent = mctx->bkref_ents + bkc_idx; if (bkref_ent->node != node_idx || bkref_ent->str_idx != cur_str_idx) continue; subexp_len = bkref_ent->subexp_to - bkref_ent->subexp_from; new_dest_nodes = (subexp_len == 0 ? dfa->eclosures + dfa->edests[node_idx].elems[0] : dfa->eclosures + dfa->nexts[node_idx]); dest_str_idx = (cur_str_idx + bkref_ent->subexp_to - bkref_ent->subexp_from); context = re_string_context_at (&mctx->input, dest_str_idx - 1, mctx->eflags); dest_state = mctx->state_log[dest_str_idx]; prev_nelem = ((mctx->state_log[cur_str_idx] == NULL) ? 0 : mctx->state_log[cur_str_idx]->nodes.nelem); /* Add `new_dest_node' to state_log. */ if (dest_state == NULL) { mctx->state_log[dest_str_idx] = re_acquire_state_context (&err, dfa, new_dest_nodes, context); if (BE (mctx->state_log[dest_str_idx] == NULL && err != REG_NOERROR, 0)) goto free_return; } else { re_node_set dest_nodes; err = re_node_set_init_union (&dest_nodes, dest_state->entrance_nodes, new_dest_nodes); if (BE (err != REG_NOERROR, 0)) { re_node_set_free (&dest_nodes); goto free_return; } mctx->state_log[dest_str_idx] = re_acquire_state_context (&err, dfa, &dest_nodes, context); re_node_set_free (&dest_nodes); if (BE (mctx->state_log[dest_str_idx] == NULL && err != REG_NOERROR, 0)) goto free_return; } /* We need to check recursively if the backreference can epsilon transit. */ if (subexp_len == 0 && mctx->state_log[cur_str_idx]->nodes.nelem > prev_nelem) { err = check_subexp_matching_top (mctx, new_dest_nodes, cur_str_idx); if (BE (err != REG_NOERROR, 0)) goto free_return; err = transit_state_bkref (mctx, new_dest_nodes); if (BE (err != REG_NOERROR, 0)) goto free_return; } } } err = REG_NOERROR; free_return: return err; } /* Enumerate all the candidates which the backreference BKREF_NODE can match at BKREF_STR_IDX, and register them by match_ctx_add_entry(). Note that we might collect inappropriate candidates here. However, the cost of checking them strictly here is too high, then we delay these checking for prune_impossible_nodes(). */ static reg_errcode_t get_subexp (mctx, bkref_node, bkref_str_idx) re_match_context_t *mctx; int bkref_node, bkref_str_idx; { re_dfa_t *const dfa = mctx->dfa; int subexp_num, sub_top_idx; const char *buf = (const char *) re_string_get_buffer (&mctx->input); /* Return if we have already checked BKREF_NODE at BKREF_STR_IDX. */ int cache_idx = search_cur_bkref_entry (mctx, bkref_str_idx); if (cache_idx != -1) { const struct re_backref_cache_entry *entry = mctx->bkref_ents + cache_idx; do if (entry->node == bkref_node) return REG_NOERROR; /* We already checked it. */ while (entry++->more); } subexp_num = dfa->nodes[bkref_node].opr.idx; /* For each sub expression */ for (sub_top_idx = 0; sub_top_idx < mctx->nsub_tops; ++sub_top_idx) { reg_errcode_t err; re_sub_match_top_t *sub_top = mctx->sub_tops[sub_top_idx]; re_sub_match_last_t *sub_last; int sub_last_idx, sl_str, bkref_str_off; if (dfa->nodes[sub_top->node].opr.idx != subexp_num) continue; /* It isn't related. */ sl_str = sub_top->str_idx; bkref_str_off = bkref_str_idx; /* At first, check the last node of sub expressions we already evaluated. */ for (sub_last_idx = 0; sub_last_idx < sub_top->nlasts; ++sub_last_idx) { int sl_str_diff; sub_last = sub_top->lasts[sub_last_idx]; sl_str_diff = sub_last->str_idx - sl_str; /* The matched string by the sub expression match with the substring at the back reference? */ if (sl_str_diff > 0) { if (BE (bkref_str_off + sl_str_diff > mctx->input.valid_len, 0)) { /* Not enough chars for a successful match. */ if (bkref_str_off + sl_str_diff > mctx->input.len) break; err = clean_state_log_if_needed (mctx, bkref_str_off + sl_str_diff); if (BE (err != REG_NOERROR, 0)) return err; buf = (const char *) re_string_get_buffer (&mctx->input); } if (memcmp (buf + bkref_str_off, buf + sl_str, sl_str_diff) != 0) break; /* We don't need to search this sub expression any more. */ } bkref_str_off += sl_str_diff; sl_str += sl_str_diff; err = get_subexp_sub (mctx, sub_top, sub_last, bkref_node, bkref_str_idx); /* Reload buf, since the preceding call might have reallocated the buffer. */ buf = (const char *) re_string_get_buffer (&mctx->input); if (err == REG_NOMATCH) continue; if (BE (err != REG_NOERROR, 0)) return err; } if (sub_last_idx < sub_top->nlasts) continue; if (sub_last_idx > 0) ++sl_str; /* Then, search for the other last nodes of the sub expression. */ for (; sl_str <= bkref_str_idx; ++sl_str) { int cls_node, sl_str_off; const re_node_set *nodes; sl_str_off = sl_str - sub_top->str_idx; /* The matched string by the sub expression match with the substring at the back reference? */ if (sl_str_off > 0) { if (BE (bkref_str_off >= mctx->input.valid_len, 0)) { /* If we are at the end of the input, we cannot match. */ if (bkref_str_off >= mctx->input.len) break; err = extend_buffers (mctx); if (BE (err != REG_NOERROR, 0)) return err; buf = (const char *) re_string_get_buffer (&mctx->input); } if (buf [bkref_str_off++] != buf[sl_str - 1]) break; /* We don't need to search this sub expression any more. */ } if (mctx->state_log[sl_str] == NULL) continue; /* Does this state have a ')' of the sub expression? */ nodes = &mctx->state_log[sl_str]->nodes; cls_node = find_subexp_node (dfa, nodes, subexp_num, OP_CLOSE_SUBEXP); if (cls_node == -1) continue; /* No. */ if (sub_top->path == NULL) { sub_top->path = calloc (sizeof (state_array_t), sl_str - sub_top->str_idx + 1); if (sub_top->path == NULL) return REG_ESPACE; } /* Can the OP_OPEN_SUBEXP node arrive the OP_CLOSE_SUBEXP node in the current context? */ err = check_arrival (mctx, sub_top->path, sub_top->node, sub_top->str_idx, cls_node, sl_str, OP_CLOSE_SUBEXP); if (err == REG_NOMATCH) continue; if (BE (err != REG_NOERROR, 0)) return err; sub_last = match_ctx_add_sublast (sub_top, cls_node, sl_str); if (BE (sub_last == NULL, 0)) return REG_ESPACE; err = get_subexp_sub (mctx, sub_top, sub_last, bkref_node, bkref_str_idx); if (err == REG_NOMATCH) continue; } } return REG_NOERROR; } /* Helper functions for get_subexp(). */ /* Check SUB_LAST can arrive to the back reference BKREF_NODE at BKREF_STR. If it can arrive, register the sub expression expressed with SUB_TOP and SUB_LAST. */ static reg_errcode_t get_subexp_sub (mctx, sub_top, sub_last, bkref_node, bkref_str) re_match_context_t *mctx; const re_sub_match_top_t *sub_top; re_sub_match_last_t *sub_last; int bkref_node, bkref_str; { reg_errcode_t err; int to_idx; /* Can the subexpression arrive the back reference? */ err = check_arrival (mctx, &sub_last->path, sub_last->node, sub_last->str_idx, bkref_node, bkref_str, OP_OPEN_SUBEXP); if (err != REG_NOERROR) return err; err = match_ctx_add_entry (mctx, bkref_node, bkref_str, sub_top->str_idx, sub_last->str_idx); if (BE (err != REG_NOERROR, 0)) return err; to_idx = bkref_str + sub_last->str_idx - sub_top->str_idx; return clean_state_log_if_needed (mctx, to_idx); } /* Find the first node which is '(' or ')' and whose index is SUBEXP_IDX. Search '(' if FL_OPEN, or search ')' otherwise. TODO: This function isn't efficient... Because there might be more than one nodes whose types are OP_OPEN_SUBEXP and whose index is SUBEXP_IDX, we must check all nodes. E.g. RE: (a){2} */ static int find_subexp_node (dfa, nodes, subexp_idx, type) const re_dfa_t *dfa; const re_node_set *nodes; int subexp_idx, type; { int cls_idx; for (cls_idx = 0; cls_idx < nodes->nelem; ++cls_idx) { int cls_node = nodes->elems[cls_idx]; const re_token_t *node = dfa->nodes + cls_node; if (node->type == type && node->opr.idx == subexp_idx) return cls_node; } return -1; } /* Check whether the node TOP_NODE at TOP_STR can arrive to the node LAST_NODE at LAST_STR. We record the path onto PATH since it will be heavily reused. Return REG_NOERROR if it can arrive, or REG_NOMATCH otherwise. */ static reg_errcode_t check_arrival (mctx, path, top_node, top_str, last_node, last_str, type) re_match_context_t *mctx; state_array_t *path; int top_node, top_str, last_node, last_str, type; { re_dfa_t *const dfa = mctx->dfa; reg_errcode_t err; int subexp_num, backup_cur_idx, str_idx, null_cnt; re_dfastate_t *cur_state = NULL; re_node_set *cur_nodes, next_nodes; re_dfastate_t **backup_state_log; unsigned int context; subexp_num = dfa->nodes[top_node].opr.idx; /* Extend the buffer if we need. */ if (BE (path->alloc < last_str + mctx->max_mb_elem_len + 1, 0)) { re_dfastate_t **new_array; int old_alloc = path->alloc; path->alloc += last_str + mctx->max_mb_elem_len + 1; new_array = re_realloc (path->array, re_dfastate_t *, path->alloc); if (new_array == NULL) { path->alloc = old_alloc; return REG_ESPACE; } path->array = new_array; memset (new_array + old_alloc, '\0', sizeof (re_dfastate_t *) * (path->alloc - old_alloc)); } str_idx = path->next_idx == 0 ? top_str : path->next_idx; /* Temporary modify MCTX. */ backup_state_log = mctx->state_log; backup_cur_idx = mctx->input.cur_idx; mctx->state_log = path->array; mctx->input.cur_idx = str_idx; /* Setup initial node set. */ context = re_string_context_at (&mctx->input, str_idx - 1, mctx->eflags); if (str_idx == top_str) { err = re_node_set_init_1 (&next_nodes, top_node); if (BE (err != REG_NOERROR, 0)) return err; err = check_arrival_expand_ecl (dfa, &next_nodes, subexp_num, type); if (BE (err != REG_NOERROR, 0)) { re_node_set_free (&next_nodes); return err; } } else { cur_state = mctx->state_log[str_idx]; if (cur_state && cur_state->has_backref) { err = re_node_set_init_copy (&next_nodes, &cur_state->nodes); if (BE ( err != REG_NOERROR, 0)) return err; } else re_node_set_init_empty (&next_nodes); } if (str_idx == top_str || (cur_state && cur_state->has_backref)) { if (next_nodes.nelem) { err = expand_bkref_cache (mctx, &next_nodes, str_idx, subexp_num, type); if (BE ( err != REG_NOERROR, 0)) { re_node_set_free (&next_nodes); return err; } } cur_state = re_acquire_state_context (&err, dfa, &next_nodes, context); if (BE (cur_state == NULL && err != REG_NOERROR, 0)) { re_node_set_free (&next_nodes); return err; } mctx->state_log[str_idx] = cur_state; } for (null_cnt = 0; str_idx < last_str && null_cnt <= mctx->max_mb_elem_len;) { re_node_set_empty (&next_nodes); if (mctx->state_log[str_idx + 1]) { err = re_node_set_merge (&next_nodes, &mctx->state_log[str_idx + 1]->nodes); if (BE (err != REG_NOERROR, 0)) { re_node_set_free (&next_nodes); return err; } } if (cur_state) { err = check_arrival_add_next_nodes (mctx, str_idx, &cur_state->non_eps_nodes, &next_nodes); if (BE (err != REG_NOERROR, 0)) { re_node_set_free (&next_nodes); return err; } } ++str_idx; if (next_nodes.nelem) { err = check_arrival_expand_ecl (dfa, &next_nodes, subexp_num, type); if (BE (err != REG_NOERROR, 0)) { re_node_set_free (&next_nodes); return err; } err = expand_bkref_cache (mctx, &next_nodes, str_idx, subexp_num, type); if (BE ( err != REG_NOERROR, 0)) { re_node_set_free (&next_nodes); return err; } } context = re_string_context_at (&mctx->input, str_idx - 1, mctx->eflags); cur_state = re_acquire_state_context (&err, dfa, &next_nodes, context); if (BE (cur_state == NULL && err != REG_NOERROR, 0)) { re_node_set_free (&next_nodes); return err; } mctx->state_log[str_idx] = cur_state; null_cnt = cur_state == NULL ? null_cnt + 1 : 0; } re_node_set_free (&next_nodes); cur_nodes = (mctx->state_log[last_str] == NULL ? NULL : &mctx->state_log[last_str]->nodes); path->next_idx = str_idx; /* Fix MCTX. */ mctx->state_log = backup_state_log; mctx->input.cur_idx = backup_cur_idx; /* Then check the current node set has the node LAST_NODE. */ if (cur_nodes != NULL && re_node_set_contains (cur_nodes, last_node)) return REG_NOERROR; return REG_NOMATCH; } /* Helper functions for check_arrival. */ /* Calculate the destination nodes of CUR_NODES at STR_IDX, and append them to NEXT_NODES. TODO: This function is similar to the functions transit_state*(), however this function has many additional works. Can't we unify them? */ static reg_errcode_t check_arrival_add_next_nodes (mctx, str_idx, cur_nodes, next_nodes) re_match_context_t *mctx; int str_idx; re_node_set *cur_nodes, *next_nodes; { re_dfa_t *const dfa = mctx->dfa; int result; int cur_idx; #ifdef RE_ENABLE_I18N reg_errcode_t err; #endif re_node_set union_set; re_node_set_init_empty (&union_set); for (cur_idx = 0; cur_idx < cur_nodes->nelem; ++cur_idx) { int naccepted = 0; int cur_node = cur_nodes->elems[cur_idx]; #if defined DEBUG || defined RE_ENABLE_I18N re_token_type_t type = dfa->nodes[cur_node].type; #endif #ifdef DEBUG assert (!IS_EPSILON_NODE (type)); #endif #ifdef RE_ENABLE_I18N /* If the node may accept `multi byte'. */ if (ACCEPT_MB_NODE (type)) { naccepted = check_node_accept_bytes (dfa, cur_node, &mctx->input, str_idx); if (naccepted > 1) { re_dfastate_t *dest_state; int next_node = dfa->nexts[cur_node]; int next_idx = str_idx + naccepted; dest_state = mctx->state_log[next_idx]; re_node_set_empty (&union_set); if (dest_state) { err = re_node_set_merge (&union_set, &dest_state->nodes); if (BE (err != REG_NOERROR, 0)) { re_node_set_free (&union_set); return err; } } result = re_node_set_insert (&union_set, next_node); if (BE (result < 0, 0)) { re_node_set_free (&union_set); return REG_ESPACE; } mctx->state_log[next_idx] = re_acquire_state (&err, dfa, &union_set); if (BE (mctx->state_log[next_idx] == NULL && err != REG_NOERROR, 0)) { re_node_set_free (&union_set); return err; } } } #endif /* RE_ENABLE_I18N */ if (naccepted || check_node_accept (mctx, dfa->nodes + cur_node, str_idx)) { result = re_node_set_insert (next_nodes, dfa->nexts[cur_node]); if (BE (result < 0, 0)) { re_node_set_free (&union_set); return REG_ESPACE; } } } re_node_set_free (&union_set); return REG_NOERROR; } /* For all the nodes in CUR_NODES, add the epsilon closures of them to CUR_NODES, however exclude the nodes which are: - inside the sub expression whose number is EX_SUBEXP, if FL_OPEN. - out of the sub expression whose number is EX_SUBEXP, if !FL_OPEN. */ static reg_errcode_t check_arrival_expand_ecl (dfa, cur_nodes, ex_subexp, type) re_dfa_t *dfa; re_node_set *cur_nodes; int ex_subexp, type; { reg_errcode_t err; int idx, outside_node; re_node_set new_nodes; #ifdef DEBUG assert (cur_nodes->nelem); #endif err = re_node_set_alloc (&new_nodes, cur_nodes->nelem); if (BE (err != REG_NOERROR, 0)) return err; /* Create a new node set NEW_NODES with the nodes which are epsilon closures of the node in CUR_NODES. */ for (idx = 0; idx < cur_nodes->nelem; ++idx) { int cur_node = cur_nodes->elems[idx]; re_node_set *eclosure = dfa->eclosures + cur_node; outside_node = find_subexp_node (dfa, eclosure, ex_subexp, type); if (outside_node == -1) { /* There are no problematic nodes, just merge them. */ err = re_node_set_merge (&new_nodes, eclosure); if (BE (err != REG_NOERROR, 0)) { re_node_set_free (&new_nodes); return err; } } else { /* There are problematic nodes, re-calculate incrementally. */ err = check_arrival_expand_ecl_sub (dfa, &new_nodes, cur_node, ex_subexp, type); if (BE (err != REG_NOERROR, 0)) { re_node_set_free (&new_nodes); return err; } } } re_node_set_free (cur_nodes); *cur_nodes = new_nodes; return REG_NOERROR; } /* Helper function for check_arrival_expand_ecl. Check incrementally the epsilon closure of TARGET, and if it isn't problematic append it to DST_NODES. */ static reg_errcode_t check_arrival_expand_ecl_sub (dfa, dst_nodes, target, ex_subexp, type) re_dfa_t *dfa; int target, ex_subexp, type; re_node_set *dst_nodes; { int cur_node; for (cur_node = target; !re_node_set_contains (dst_nodes, cur_node);) { int err; if (dfa->nodes[cur_node].type == type && dfa->nodes[cur_node].opr.idx == ex_subexp) { if (type == OP_CLOSE_SUBEXP) { err = re_node_set_insert (dst_nodes, cur_node); if (BE (err == -1, 0)) return REG_ESPACE; } break; } err = re_node_set_insert (dst_nodes, cur_node); if (BE (err == -1, 0)) return REG_ESPACE; if (dfa->edests[cur_node].nelem == 0) break; if (dfa->edests[cur_node].nelem == 2) { err = check_arrival_expand_ecl_sub (dfa, dst_nodes, dfa->edests[cur_node].elems[1], ex_subexp, type); if (BE (err != REG_NOERROR, 0)) return err; } cur_node = dfa->edests[cur_node].elems[0]; } return REG_NOERROR; } /* For all the back references in the current state, calculate the destination of the back references by the appropriate entry in MCTX->BKREF_ENTS. */ static reg_errcode_t expand_bkref_cache (mctx, cur_nodes, cur_str, subexp_num, type) re_match_context_t *mctx; int cur_str, subexp_num, type; re_node_set *cur_nodes; { re_dfa_t *const dfa = mctx->dfa; reg_errcode_t err; int cache_idx_start = search_cur_bkref_entry (mctx, cur_str); struct re_backref_cache_entry *ent; if (cache_idx_start == -1) return REG_NOERROR; restart: ent = mctx->bkref_ents + cache_idx_start; do { int to_idx, next_node; /* Is this entry ENT is appropriate? */ if (!re_node_set_contains (cur_nodes, ent->node)) continue; /* No. */ to_idx = cur_str + ent->subexp_to - ent->subexp_from; /* Calculate the destination of the back reference, and append it to MCTX->STATE_LOG. */ if (to_idx == cur_str) { /* The backreference did epsilon transit, we must re-check all the node in the current state. */ re_node_set new_dests; reg_errcode_t err2, err3; next_node = dfa->edests[ent->node].elems[0]; if (re_node_set_contains (cur_nodes, next_node)) continue; err = re_node_set_init_1 (&new_dests, next_node); err2 = check_arrival_expand_ecl (dfa, &new_dests, subexp_num, type); err3 = re_node_set_merge (cur_nodes, &new_dests); re_node_set_free (&new_dests); if (BE (err != REG_NOERROR || err2 != REG_NOERROR || err3 != REG_NOERROR, 0)) { err = (err != REG_NOERROR ? err : (err2 != REG_NOERROR ? err2 : err3)); return err; } /* TODO: It is still inefficient... */ goto restart; } else { re_node_set union_set; next_node = dfa->nexts[ent->node]; if (mctx->state_log[to_idx]) { int ret; if (re_node_set_contains (&mctx->state_log[to_idx]->nodes, next_node)) continue; err = re_node_set_init_copy (&union_set, &mctx->state_log[to_idx]->nodes); ret = re_node_set_insert (&union_set, next_node); if (BE (err != REG_NOERROR || ret < 0, 0)) { re_node_set_free (&union_set); err = err != REG_NOERROR ? err : REG_ESPACE; return err; } } else { err = re_node_set_init_1 (&union_set, next_node); if (BE (err != REG_NOERROR, 0)) return err; } mctx->state_log[to_idx] = re_acquire_state (&err, dfa, &union_set); re_node_set_free (&union_set); if (BE (mctx->state_log[to_idx] == NULL && err != REG_NOERROR, 0)) return err; } } while (ent++->more); return REG_NOERROR; } /* Build transition table for the state. Return the new table if succeeded, otherwise return NULL. */ static re_dfastate_t ** build_trtable (dfa, state) re_dfa_t *dfa; re_dfastate_t *state; { reg_errcode_t err; int i, j, ch; unsigned int elem, mask; int dests_node_malloced = 0, dest_states_malloced = 0; int ndests; /* Number of the destination states from `state'. */ re_dfastate_t **trtable; re_dfastate_t **dest_states = NULL, **dest_states_word, **dest_states_nl; re_node_set follows, *dests_node; bitset *dests_ch; bitset acceptable; /* We build DFA states which corresponds to the destination nodes from `state'. `dests_node[i]' represents the nodes which i-th destination state contains, and `dests_ch[i]' represents the characters which i-th destination state accepts. */ #ifdef _LIBC if (__libc_use_alloca ((sizeof (re_node_set) + sizeof (bitset)) * SBC_MAX)) dests_node = (re_node_set *) alloca ((sizeof (re_node_set) + sizeof (bitset)) * SBC_MAX); else #endif { dests_node = (re_node_set *) malloc ((sizeof (re_node_set) + sizeof (bitset)) * SBC_MAX); if (BE (dests_node == NULL, 0)) return NULL; dests_node_malloced = 1; } dests_ch = (bitset *) (dests_node + SBC_MAX); /* Initialize transiton table. */ state->word_trtable = 0; /* At first, group all nodes belonging to `state' into several destinations. */ ndests = group_nodes_into_DFAstates (dfa, state, dests_node, dests_ch); if (BE (ndests <= 0, 0)) { if (dests_node_malloced) free (dests_node); /* Return NULL in case of an error, trtable otherwise. */ if (ndests == 0) { state->trtable = (re_dfastate_t **) calloc (sizeof (re_dfastate_t *), SBC_MAX);; return state->trtable; } return NULL; } err = re_node_set_alloc (&follows, ndests + 1); if (BE (err != REG_NOERROR, 0)) goto out_free; #ifdef _LIBC if (__libc_use_alloca ((sizeof (re_node_set) + sizeof (bitset)) * SBC_MAX + ndests * 3 * sizeof (re_dfastate_t *))) dest_states = (re_dfastate_t **) alloca (ndests * 3 * sizeof (re_dfastate_t *)); else #endif { dest_states = (re_dfastate_t **) malloc (ndests * 3 * sizeof (re_dfastate_t *)); if (BE (dest_states == NULL, 0)) { out_free: if (dest_states_malloced) free (dest_states); re_node_set_free (&follows); for (i = 0; i < ndests; ++i) re_node_set_free (dests_node + i); if (dests_node_malloced) free (dests_node); return NULL; } dest_states_malloced = 1; } dest_states_word = dest_states + ndests; dest_states_nl = dest_states_word + ndests; bitset_empty (acceptable); /* Then build the states for all destinations. */ for (i = 0; i < ndests; ++i) { int next_node; re_node_set_empty (&follows); /* Merge the follows of this destination states. */ for (j = 0; j < dests_node[i].nelem; ++j) { next_node = dfa->nexts[dests_node[i].elems[j]]; if (next_node != -1) { err = re_node_set_merge (&follows, dfa->eclosures + next_node); if (BE (err != REG_NOERROR, 0)) goto out_free; } } dest_states[i] = re_acquire_state_context (&err, dfa, &follows, 0); if (BE (dest_states[i] == NULL && err != REG_NOERROR, 0)) goto out_free; /* If the new state has context constraint, build appropriate states for these contexts. */ if (dest_states[i]->has_constraint) { dest_states_word[i] = re_acquire_state_context (&err, dfa, &follows, CONTEXT_WORD); if (BE (dest_states_word[i] == NULL && err != REG_NOERROR, 0)) goto out_free; if (dest_states[i] != dest_states_word[i] && dfa->mb_cur_max > 1) state->word_trtable = 1; dest_states_nl[i] = re_acquire_state_context (&err, dfa, &follows, CONTEXT_NEWLINE); if (BE (dest_states_nl[i] == NULL && err != REG_NOERROR, 0)) goto out_free; } else { dest_states_word[i] = dest_states[i]; dest_states_nl[i] = dest_states[i]; } bitset_merge (acceptable, dests_ch[i]); } if (!BE (state->word_trtable, 0)) { /* We don't care about whether the following character is a word character, or we are in a single-byte character set so we can discern by looking at the character code: allocate a 256-entry transition table. */ trtable = (re_dfastate_t **) calloc (sizeof (re_dfastate_t *), SBC_MAX); if (BE (trtable == NULL, 0)) goto out_free; /* For all characters ch...: */ for (i = 0; i < BITSET_UINTS; ++i) for (ch = i * UINT_BITS, elem = acceptable[i], mask = 1; elem; mask <<= 1, elem >>= 1, ++ch) if (BE (elem & 1, 0)) { /* There must be exactly one destination which accepts character ch. See group_nodes_into_DFAstates. */ for (j = 0; (dests_ch[j][i] & mask) == 0; ++j) ; /* j-th destination accepts the word character ch. */ if (dfa->word_char[i] & mask) trtable[ch] = dest_states_word[j]; else trtable[ch] = dest_states[j]; } } else { /* We care about whether the following character is a word character, and we are in a multi-byte character set: discern by looking at the character code: build two 256-entry transition tables, one starting at trtable[0] and one starting at trtable[SBC_MAX]. */ trtable = (re_dfastate_t **) calloc (sizeof (re_dfastate_t *), 2 * SBC_MAX); if (BE (trtable == NULL, 0)) goto out_free; /* For all characters ch...: */ for (i = 0; i < BITSET_UINTS; ++i) for (ch = i * UINT_BITS, elem = acceptable[i], mask = 1; elem; mask <<= 1, elem >>= 1, ++ch) if (BE (elem & 1, 0)) { /* There must be exactly one destination which accepts character ch. See group_nodes_into_DFAstates. */ for (j = 0; (dests_ch[j][i] & mask) == 0; ++j) ; /* j-th destination accepts the word character ch. */ trtable[ch] = dest_states[j]; trtable[ch + SBC_MAX] = dest_states_word[j]; } } /* new line */ if (bitset_contain (acceptable, NEWLINE_CHAR)) { /* The current state accepts newline character. */ for (j = 0; j < ndests; ++j) if (bitset_contain (dests_ch[j], NEWLINE_CHAR)) { /* k-th destination accepts newline character. */ trtable[NEWLINE_CHAR] = dest_states_nl[j]; if (state->word_trtable) trtable[NEWLINE_CHAR + SBC_MAX] = dest_states_nl[j]; /* There must be only one destination which accepts newline. See group_nodes_into_DFAstates. */ break; } } if (dest_states_malloced) free (dest_states); re_node_set_free (&follows); for (i = 0; i < ndests; ++i) re_node_set_free (dests_node + i); if (dests_node_malloced) free (dests_node); state->trtable = trtable; return trtable; } /* Group all nodes belonging to STATE into several destinations. Then for all destinations, set the nodes belonging to the destination to DESTS_NODE[i] and set the characters accepted by the destination to DEST_CH[i]. This function return the number of destinations. */ static int group_nodes_into_DFAstates (dfa, state, dests_node, dests_ch) re_dfa_t *dfa; const re_dfastate_t *state; re_node_set *dests_node; bitset *dests_ch; { reg_errcode_t err; int result; int i, j, k; int ndests; /* Number of the destinations from `state'. */ bitset accepts; /* Characters a node can accept. */ const re_node_set *cur_nodes = &state->nodes; bitset_empty (accepts); ndests = 0; /* For all the nodes belonging to `state', */ for (i = 0; i < cur_nodes->nelem; ++i) { re_token_t *node = &dfa->nodes[cur_nodes->elems[i]]; re_token_type_t type = node->type; unsigned int constraint = node->constraint; /* Enumerate all single byte character this node can accept. */ if (type == CHARACTER) bitset_set (accepts, node->opr.c); else if (type == SIMPLE_BRACKET) { bitset_merge (accepts, node->opr.sbcset); } else if (type == OP_PERIOD) { #ifdef RE_ENABLE_I18N if (dfa->mb_cur_max > 1) bitset_merge (accepts, dfa->sb_char); else #endif bitset_set_all (accepts); if (!(dfa->syntax & RE_DOT_NEWLINE)) bitset_clear (accepts, '\n'); if (dfa->syntax & RE_DOT_NOT_NULL) bitset_clear (accepts, '\0'); } #ifdef RE_ENABLE_I18N else if (type == OP_UTF8_PERIOD) { memset (accepts, 255, sizeof (unsigned int) * BITSET_UINTS / 2); if (!(dfa->syntax & RE_DOT_NEWLINE)) bitset_clear (accepts, '\n'); if (dfa->syntax & RE_DOT_NOT_NULL) bitset_clear (accepts, '\0'); } #endif else continue; /* Check the `accepts' and sift the characters which are not match it the context. */ if (constraint) { if (constraint & NEXT_NEWLINE_CONSTRAINT) { int accepts_newline = bitset_contain (accepts, NEWLINE_CHAR); bitset_empty (accepts); if (accepts_newline) bitset_set (accepts, NEWLINE_CHAR); else continue; } if (constraint & NEXT_ENDBUF_CONSTRAINT) { bitset_empty (accepts); continue; } if (constraint & NEXT_WORD_CONSTRAINT) { unsigned int any_set = 0; if (type == CHARACTER && !node->word_char) { bitset_empty (accepts); continue; } #ifdef RE_ENABLE_I18N if (dfa->mb_cur_max > 1) for (j = 0; j < BITSET_UINTS; ++j) any_set |= (accepts[j] &= (dfa->word_char[j] | ~dfa->sb_char[j])); else #endif for (j = 0; j < BITSET_UINTS; ++j) any_set |= (accepts[j] &= dfa->word_char[j]); if (!any_set) continue; } if (constraint & NEXT_NOTWORD_CONSTRAINT) { unsigned int any_set = 0; if (type == CHARACTER && node->word_char) { bitset_empty (accepts); continue; } #ifdef RE_ENABLE_I18N if (dfa->mb_cur_max > 1) for (j = 0; j < BITSET_UINTS; ++j) any_set |= (accepts[j] &= ~(dfa->word_char[j] & dfa->sb_char[j])); else #endif for (j = 0; j < BITSET_UINTS; ++j) any_set |= (accepts[j] &= ~dfa->word_char[j]); if (!any_set) continue; } } /* Then divide `accepts' into DFA states, or create a new state. Above, we make sure that accepts is not empty. */ for (j = 0; j < ndests; ++j) { bitset intersec; /* Intersection sets, see below. */ bitset remains; /* Flags, see below. */ int has_intersec, not_subset, not_consumed; /* Optimization, skip if this state doesn't accept the character. */ if (type == CHARACTER && !bitset_contain (dests_ch[j], node->opr.c)) continue; /* Enumerate the intersection set of this state and `accepts'. */ has_intersec = 0; for (k = 0; k < BITSET_UINTS; ++k) has_intersec |= intersec[k] = accepts[k] & dests_ch[j][k]; /* And skip if the intersection set is empty. */ if (!has_intersec) continue; /* Then check if this state is a subset of `accepts'. */ not_subset = not_consumed = 0; for (k = 0; k < BITSET_UINTS; ++k) { not_subset |= remains[k] = ~accepts[k] & dests_ch[j][k]; not_consumed |= accepts[k] = accepts[k] & ~dests_ch[j][k]; } /* If this state isn't a subset of `accepts', create a new group state, which has the `remains'. */ if (not_subset) { bitset_copy (dests_ch[ndests], remains); bitset_copy (dests_ch[j], intersec); err = re_node_set_init_copy (dests_node + ndests, &dests_node[j]); if (BE (err != REG_NOERROR, 0)) goto error_return; ++ndests; } /* Put the position in the current group. */ result = re_node_set_insert (&dests_node[j], cur_nodes->elems[i]); if (BE (result < 0, 0)) goto error_return; /* If all characters are consumed, go to next node. */ if (!not_consumed) break; } /* Some characters remain, create a new group. */ if (j == ndests) { bitset_copy (dests_ch[ndests], accepts); err = re_node_set_init_1 (dests_node + ndests, cur_nodes->elems[i]); if (BE (err != REG_NOERROR, 0)) goto error_return; ++ndests; bitset_empty (accepts); } } return ndests; error_return: for (j = 0; j < ndests; ++j) re_node_set_free (dests_node + j); return -1; } #ifdef RE_ENABLE_I18N /* Check how many bytes the node `dfa->nodes[node_idx]' accepts. Return the number of the bytes the node accepts. STR_IDX is the current index of the input string. This function handles the nodes which can accept one character, or one collating element like '.', '[a-z]', opposite to the other nodes can only accept one byte. */ static int check_node_accept_bytes (dfa, node_idx, input, str_idx) re_dfa_t *dfa; int node_idx, str_idx; const re_string_t *input; { const re_token_t *node = dfa->nodes + node_idx; int char_len, elem_len; int i; if (BE (node->type == OP_UTF8_PERIOD, 0)) { unsigned char c = re_string_byte_at (input, str_idx), d; if (BE (c < 0xc2, 1)) return 0; if (str_idx + 2 > input->len) return 0; d = re_string_byte_at (input, str_idx + 1); if (c < 0xe0) return (d < 0x80 || d > 0xbf) ? 0 : 2; else if (c < 0xf0) { char_len = 3; if (c == 0xe0 && d < 0xa0) return 0; } else if (c < 0xf8) { char_len = 4; if (c == 0xf0 && d < 0x90) return 0; } else if (c < 0xfc) { char_len = 5; if (c == 0xf8 && d < 0x88) return 0; } else if (c < 0xfe) { char_len = 6; if (c == 0xfc && d < 0x84) return 0; } else return 0; if (str_idx + char_len > input->len) return 0; for (i = 1; i < char_len; ++i) { d = re_string_byte_at (input, str_idx + i); if (d < 0x80 || d > 0xbf) return 0; } return char_len; } char_len = re_string_char_size_at (input, str_idx); if (node->type == OP_PERIOD) { if (char_len <= 1) return 0; /* FIXME: I don't think this if is needed, as both '\n' and '\0' are char_len == 1. */ /* '.' accepts any one character except the following two cases. */ if ((!(dfa->syntax & RE_DOT_NEWLINE) && re_string_byte_at (input, str_idx) == '\n') || ((dfa->syntax & RE_DOT_NOT_NULL) && re_string_byte_at (input, str_idx) == '\0')) return 0; return char_len; } elem_len = re_string_elem_size_at (input, str_idx); if ((elem_len <= 1 && char_len <= 1) || char_len == 0) return 0; if (node->type == COMPLEX_BRACKET) { const re_charset_t *cset = node->opr.mbcset; # ifdef _LIBC const unsigned char *pin = ((char *) re_string_get_buffer (input) + str_idx); int j; uint32_t nrules; # endif /* _LIBC */ int match_len = 0; wchar_t wc = ((cset->nranges || cset->nchar_classes || cset->nmbchars) ? re_string_wchar_at (input, str_idx) : 0); /* match with multibyte character? */ for (i = 0; i < cset->nmbchars; ++i) if (wc == cset->mbchars[i]) { match_len = char_len; goto check_node_accept_bytes_match; } /* match with character_class? */ for (i = 0; i < cset->nchar_classes; ++i) { wctype_t wt = cset->char_classes[i]; if (__iswctype (wc, wt)) { match_len = char_len; goto check_node_accept_bytes_match; } } # ifdef _LIBC nrules = _NL_CURRENT_WORD (LC_COLLATE, _NL_COLLATE_NRULES); if (nrules != 0) { unsigned int in_collseq = 0; const int32_t *table, *indirect; const unsigned char *weights, *extra; const char *collseqwc; int32_t idx; /* This #include defines a local function! */ # include /* match with collating_symbol? */ if (cset->ncoll_syms) extra = (const unsigned char *) _NL_CURRENT (LC_COLLATE, _NL_COLLATE_SYMB_EXTRAMB); for (i = 0; i < cset->ncoll_syms; ++i) { const unsigned char *coll_sym = extra + cset->coll_syms[i]; /* Compare the length of input collating element and the length of current collating element. */ if (*coll_sym != elem_len) continue; /* Compare each bytes. */ for (j = 0; j < *coll_sym; j++) if (pin[j] != coll_sym[1 + j]) break; if (j == *coll_sym) { /* Match if every bytes is equal. */ match_len = j; goto check_node_accept_bytes_match; } } if (cset->nranges) { if (elem_len <= char_len) { collseqwc = _NL_CURRENT (LC_COLLATE, _NL_COLLATE_COLLSEQWC); in_collseq = __collseq_table_lookup (collseqwc, wc); } else in_collseq = find_collation_sequence_value (pin, elem_len); } /* match with range expression? */ for (i = 0; i < cset->nranges; ++i) if (cset->range_starts[i] <= in_collseq && in_collseq <= cset->range_ends[i]) { match_len = elem_len; goto check_node_accept_bytes_match; } /* match with equivalence_class? */ if (cset->nequiv_classes) { const unsigned char *cp = pin; table = (const int32_t *) _NL_CURRENT (LC_COLLATE, _NL_COLLATE_TABLEMB); weights = (const unsigned char *) _NL_CURRENT (LC_COLLATE, _NL_COLLATE_WEIGHTMB); extra = (const unsigned char *) _NL_CURRENT (LC_COLLATE, _NL_COLLATE_EXTRAMB); indirect = (const int32_t *) _NL_CURRENT (LC_COLLATE, _NL_COLLATE_INDIRECTMB); idx = findidx (&cp); if (idx > 0) for (i = 0; i < cset->nequiv_classes; ++i) { int32_t equiv_class_idx = cset->equiv_classes[i]; size_t weight_len = weights[idx]; if (weight_len == weights[equiv_class_idx]) { int cnt = 0; while (cnt <= weight_len && (weights[equiv_class_idx + 1 + cnt] == weights[idx + 1 + cnt])) ++cnt; if (cnt > weight_len) { match_len = elem_len; goto check_node_accept_bytes_match; } } } } } else # endif /* _LIBC */ { /* match with range expression? */ #if __GNUC__ >= 2 wchar_t cmp_buf[] = {L'\0', L'\0', wc, L'\0', L'\0', L'\0'}; #else wchar_t cmp_buf[] = {L'\0', L'\0', L'\0', L'\0', L'\0', L'\0'}; cmp_buf[2] = wc; #endif for (i = 0; i < cset->nranges; ++i) { cmp_buf[0] = cset->range_starts[i]; cmp_buf[4] = cset->range_ends[i]; if (wcscoll (cmp_buf, cmp_buf + 2) <= 0 && wcscoll (cmp_buf + 2, cmp_buf + 4) <= 0) { match_len = char_len; goto check_node_accept_bytes_match; } } } check_node_accept_bytes_match: if (!cset->non_match) return match_len; else { if (match_len > 0) return 0; else return (elem_len > char_len) ? elem_len : char_len; } } return 0; } # ifdef _LIBC static unsigned int find_collation_sequence_value (mbs, mbs_len) const unsigned char *mbs; size_t mbs_len; { uint32_t nrules = _NL_CURRENT_WORD (LC_COLLATE, _NL_COLLATE_NRULES); if (nrules == 0) { if (mbs_len == 1) { /* No valid character. Match it as a single byte character. */ const unsigned char *collseq = (const unsigned char *) _NL_CURRENT (LC_COLLATE, _NL_COLLATE_COLLSEQMB); return collseq[mbs[0]]; } return UINT_MAX; } else { int32_t idx; const unsigned char *extra = (const unsigned char *) _NL_CURRENT (LC_COLLATE, _NL_COLLATE_SYMB_EXTRAMB); int32_t extrasize = (const unsigned char *) _NL_CURRENT (LC_COLLATE, _NL_COLLATE_SYMB_EXTRAMB + 1) - extra; for (idx = 0; idx < extrasize;) { int mbs_cnt, found = 0; int32_t elem_mbs_len; /* Skip the name of collating element name. */ idx = idx + extra[idx] + 1; elem_mbs_len = extra[idx++]; if (mbs_len == elem_mbs_len) { for (mbs_cnt = 0; mbs_cnt < elem_mbs_len; ++mbs_cnt) if (extra[idx + mbs_cnt] != mbs[mbs_cnt]) break; if (mbs_cnt == elem_mbs_len) /* Found the entry. */ found = 1; } /* Skip the byte sequence of the collating element. */ idx += elem_mbs_len; /* Adjust for the alignment. */ idx = (idx + 3) & ~3; /* Skip the collation sequence value. */ idx += sizeof (uint32_t); /* Skip the wide char sequence of the collating element. */ idx = idx + sizeof (uint32_t) * (extra[idx] + 1); /* If we found the entry, return the sequence value. */ if (found) return *(uint32_t *) (extra + idx); /* Skip the collation sequence value. */ idx += sizeof (uint32_t); } return UINT_MAX; } } # endif /* _LIBC */ #endif /* RE_ENABLE_I18N */ /* Check whether the node accepts the byte which is IDX-th byte of the INPUT. */ static int check_node_accept (mctx, node, idx) const re_match_context_t *mctx; const re_token_t *node; int idx; { unsigned char ch; ch = re_string_byte_at (&mctx->input, idx); switch (node->type) { case CHARACTER: if (node->opr.c != ch) return 0; break; case SIMPLE_BRACKET: if (!bitset_contain (node->opr.sbcset, ch)) return 0; break; #ifdef RE_ENABLE_I18N case OP_UTF8_PERIOD: if (ch >= 0x80) return 0; /* FALLTHROUGH */ #endif case OP_PERIOD: if ((ch == '\n' && !(mctx->dfa->syntax & RE_DOT_NEWLINE)) || (ch == '\0' && (mctx->dfa->syntax & RE_DOT_NOT_NULL))) return 0; break; default: return 0; } if (node->constraint) { /* The node has constraints. Check whether the current context satisfies the constraints. */ unsigned int context = re_string_context_at (&mctx->input, idx, mctx->eflags); if (NOT_SATISFY_NEXT_CONSTRAINT (node->constraint, context)) return 0; } return 1; } /* Extend the buffers, if the buffers have run out. */ static reg_errcode_t extend_buffers (mctx) re_match_context_t *mctx; { reg_errcode_t ret; re_string_t *pstr = &mctx->input; /* Double the lengthes of the buffers. */ ret = re_string_realloc_buffers (pstr, pstr->bufs_len * 2); if (BE (ret != REG_NOERROR, 0)) return ret; if (mctx->state_log != NULL) { /* And double the length of state_log. */ /* XXX We have no indication of the size of this buffer. If this allocation fail we have no indication that the state_log array does not have the right size. */ re_dfastate_t **new_array = re_realloc (mctx->state_log, re_dfastate_t *, pstr->bufs_len + 1); if (BE (new_array == NULL, 0)) return REG_ESPACE; mctx->state_log = new_array; } /* Then reconstruct the buffers. */ if (pstr->icase) { #ifdef RE_ENABLE_I18N if (pstr->mb_cur_max > 1) { ret = build_wcs_upper_buffer (pstr); if (BE (ret != REG_NOERROR, 0)) return ret; } else #endif /* RE_ENABLE_I18N */ build_upper_buffer (pstr); } else { #ifdef RE_ENABLE_I18N if (pstr->mb_cur_max > 1) build_wcs_buffer (pstr); else #endif /* RE_ENABLE_I18N */ { if (pstr->trans != NULL) re_string_translate_buffer (pstr); } } return REG_NOERROR; } /* Functions for matching context. */ /* Initialize MCTX. */ static reg_errcode_t match_ctx_init (mctx, eflags, n) re_match_context_t *mctx; int eflags, n; { mctx->eflags = eflags; mctx->match_last = -1; if (n > 0) { mctx->bkref_ents = re_malloc (struct re_backref_cache_entry, n); mctx->sub_tops = re_malloc (re_sub_match_top_t *, n); if (BE (mctx->bkref_ents == NULL || mctx->sub_tops == NULL, 0)) return REG_ESPACE; } /* Already zero-ed by the caller. else mctx->bkref_ents = NULL; mctx->nbkref_ents = 0; mctx->nsub_tops = 0; */ mctx->abkref_ents = n; mctx->max_mb_elem_len = 1; mctx->asub_tops = n; return REG_NOERROR; } /* Clean the entries which depend on the current input in MCTX. This function must be invoked when the matcher changes the start index of the input, or changes the input string. */ static void match_ctx_clean (mctx) re_match_context_t *mctx; { int st_idx; for (st_idx = 0; st_idx < mctx->nsub_tops; ++st_idx) { int sl_idx; re_sub_match_top_t *top = mctx->sub_tops[st_idx]; for (sl_idx = 0; sl_idx < top->nlasts; ++sl_idx) { re_sub_match_last_t *last = top->lasts[sl_idx]; re_free (last->path.array); re_free (last); } re_free (top->lasts); if (top->path) { re_free (top->path->array); re_free (top->path); } free (top); } mctx->nsub_tops = 0; mctx->nbkref_ents = 0; } /* Free all the memory associated with MCTX. */ static void match_ctx_free (mctx) re_match_context_t *mctx; { /* First, free all the memory associated with MCTX->SUB_TOPS. */ match_ctx_clean (mctx); re_free (mctx->sub_tops); re_free (mctx->bkref_ents); } /* Add a new backreference entry to MCTX. Note that we assume that caller never call this function with duplicate entry, and call with STR_IDX which isn't smaller than any existing entry. */ static reg_errcode_t match_ctx_add_entry (mctx, node, str_idx, from, to) re_match_context_t *mctx; int node, str_idx, from, to; { if (mctx->nbkref_ents >= mctx->abkref_ents) { struct re_backref_cache_entry* new_entry; new_entry = re_realloc (mctx->bkref_ents, struct re_backref_cache_entry, mctx->abkref_ents * 2); if (BE (new_entry == NULL, 0)) { re_free (mctx->bkref_ents); return REG_ESPACE; } mctx->bkref_ents = new_entry; memset (mctx->bkref_ents + mctx->nbkref_ents, '\0', sizeof (struct re_backref_cache_entry) * mctx->abkref_ents); mctx->abkref_ents *= 2; } if (mctx->nbkref_ents > 0 && mctx->bkref_ents[mctx->nbkref_ents - 1].str_idx == str_idx) mctx->bkref_ents[mctx->nbkref_ents - 1].more = 1; mctx->bkref_ents[mctx->nbkref_ents].node = node; mctx->bkref_ents[mctx->nbkref_ents].str_idx = str_idx; mctx->bkref_ents[mctx->nbkref_ents].subexp_from = from; mctx->bkref_ents[mctx->nbkref_ents].subexp_to = to; /* This is a cache that saves negative results of check_dst_limits_calc_pos. If bit N is clear, means that this entry won't epsilon-transition to an OP_OPEN_SUBEXP or OP_CLOSE_SUBEXP for the N+1-th subexpression. If it is set, check_dst_limits_calc_pos_1 will recurse and try to find one such node. A backreference does not epsilon-transition unless it is empty, so set to all zeros if FROM != TO. */ mctx->bkref_ents[mctx->nbkref_ents].eps_reachable_subexps_map = (from == to ? ~0 : 0); mctx->bkref_ents[mctx->nbkref_ents++].more = 0; if (mctx->max_mb_elem_len < to - from) mctx->max_mb_elem_len = to - from; return REG_NOERROR; } /* Search for the first entry which has the same str_idx, or -1 if none is found. Note that MCTX->BKREF_ENTS is already sorted by MCTX->STR_IDX. */ static int search_cur_bkref_entry (mctx, str_idx) re_match_context_t *mctx; int str_idx; { int left, right, mid, last; last = right = mctx->nbkref_ents; for (left = 0; left < right;) { mid = (left + right) / 2; if (mctx->bkref_ents[mid].str_idx < str_idx) left = mid + 1; else right = mid; } if (left < last && mctx->bkref_ents[left].str_idx == str_idx) return left; else return -1; } /* Register the node NODE, whose type is OP_OPEN_SUBEXP, and which matches at STR_IDX. */ static reg_errcode_t match_ctx_add_subtop (mctx, node, str_idx) re_match_context_t *mctx; int node, str_idx; { #ifdef DEBUG assert (mctx->sub_tops != NULL); assert (mctx->asub_tops > 0); #endif if (BE (mctx->nsub_tops == mctx->asub_tops, 0)) { int new_asub_tops = mctx->asub_tops * 2; re_sub_match_top_t **new_array = re_realloc (mctx->sub_tops, re_sub_match_top_t *, new_asub_tops); if (BE (new_array == NULL, 0)) return REG_ESPACE; mctx->sub_tops = new_array; mctx->asub_tops = new_asub_tops; } mctx->sub_tops[mctx->nsub_tops] = calloc (1, sizeof (re_sub_match_top_t)); if (BE (mctx->sub_tops[mctx->nsub_tops] == NULL, 0)) return REG_ESPACE; mctx->sub_tops[mctx->nsub_tops]->node = node; mctx->sub_tops[mctx->nsub_tops++]->str_idx = str_idx; return REG_NOERROR; } /* Register the node NODE, whose type is OP_CLOSE_SUBEXP, and which matches at STR_IDX, whose corresponding OP_OPEN_SUBEXP is SUB_TOP. */ static re_sub_match_last_t * match_ctx_add_sublast (subtop, node, str_idx) re_sub_match_top_t *subtop; int node, str_idx; { re_sub_match_last_t *new_entry; if (BE (subtop->nlasts == subtop->alasts, 0)) { int new_alasts = 2 * subtop->alasts + 1; re_sub_match_last_t **new_array = re_realloc (subtop->lasts, re_sub_match_last_t *, new_alasts); if (BE (new_array == NULL, 0)) return NULL; subtop->lasts = new_array; subtop->alasts = new_alasts; } new_entry = calloc (1, sizeof (re_sub_match_last_t)); if (BE (new_entry != NULL, 1)) { subtop->lasts[subtop->nlasts] = new_entry; new_entry->node = node; new_entry->str_idx = str_idx; ++subtop->nlasts; } return new_entry; } static void sift_ctx_init (sctx, sifted_sts, limited_sts, last_node, last_str_idx) re_sift_context_t *sctx; re_dfastate_t **sifted_sts, **limited_sts; int last_node, last_str_idx; { sctx->sifted_states = sifted_sts; sctx->limited_states = limited_sts; sctx->last_node = last_node; sctx->last_str_idx = last_str_idx; re_node_set_init_empty (&sctx->limits); } Yeti-6.4.0/regex/yeti_regex.c000066400000000000000000001031071253351442600160640ustar00rootroot00000000000000/* * yeti_regex.c -- * * Regular expressions for Yorick. * *----------------------------------------------------------------------------- * * Copyright (C) 1998, 1999, 2002, 2015: Éric Thiébaut * * * This software is governed by the CeCILL-C license under French law and * abiding by the rules of distribution of free software. You can use, modify * and/or redistribute the software under the terms of the CeCILL-C license as * circulated by CEA, CNRS and INRIA at the following URL * "http://www.cecill.info". * * As a counterpart to the access to the source code and rights to copy, modify * and redistribute granted by the license, users are provided only with a * limited warranty and the software's author, the holder of the economic * rights, and the successive licensors have only limited liability. * * In this respect, the user's attention is drawn to the risks associated with * loading, using, modifying and/or developing or reproducing the software by * the user in light of its specific status of free software, that may mean * that it is complicated to manipulate, and that also therefore means that it * is reserved for developers and experienced professionals having in-depth * computer knowledge. Users are therefore encouraged to load and test the * software's suitability as regards their requirements in conditions enabling * the security of their systems and/or data to be ensured and, more generally, * to use and operate it in the same conditions as regards security. * * The fact that you are presently reading this means that you have had * knowledge of the CeCILL-C license and that you accept its terms. * *----------------------------------------------------------------------------- */ #include /* POSIX says that must be included (by the caller) before . */ #include /* On some systems, limits.h sets RE_DUP_MAX to a lower value than GNU regex allows. Include it before , which correctly #undefs RE_DUP_MAX and sets it to the right value. */ #include /*---------------------------------------------------------------------------*/ /* DEFINITIONS FOR POSIX REGULAR EXPRESSION ROUTINES */ #if HAVE_REGEX /* Use external regular expression library. */ # include #else /* not HAVE_REGEX: use builtin regex from GLIBC */ # define HAVE_REGERROR 1 /* we de have regerror in builtin library */ # undef HAVE_ALLOCA_H # undef HAVE_CONFIG_H # undef HAVE_ISBLANK /* isblank() is a GNU extension */ # undef HAVE_LANGINFO_CODESET # undef HAVE_LANGINFO_H # undef HAVE_LIBINTL # undef HAVE_LIBINTL_H # undef HAVE_LOCALE_H # undef HAVE_MBRTOWC # undef HAVE_MEMPCPY /* mempcpy() is a GNU extension */ # undef HAVE_WCHAR_H # undef HAVE_WCRTOMB # undef HAVE_WCSCOLL # undef HAVE_WCTYPE_H # ifdef _AIX # pragma alloca # else # ifndef allocax /* predefined by HP cc +Olibcalls */ # ifdef __GNUC__ # define alloca(size) __builtin_alloca (size) # else # if HAVE_ALLOCA_H # include # else # ifdef __hpux void *alloca (); # else # if !defined __OS2__ && !defined WIN32 char *alloca (); # else # include /* OS/2 defines alloca in here */ # endif # endif # endif # endif # endif # endif /* We have to keep the namespace clean. */ # define __re_error_msgid yt___re_error_msgid # define __re_error_msgid_idx yt___re_error_msgid_idx # define re_comp yt_re_comp # define re_compile_fastmap yt_re_compile_fastmap # define re_compile_pattern yt_re_compile_pattern # define re_exec yt_re_exec # define re_match yt_re_match # define re_match_2 yt_re_match_2 # define re_search yt_re_search # define re_search_2 yt_re_search_2 # define re_set_registers yt_re_set_registers # define re_set_syntax yt_re_set_syntax # define re_syntax_options yt_re_syntax_options # define regcomp yt_regcomp # define regerror yt_regerror # define regexec yt_regexec # define regfree yt_regfree # include "./glibc/regex.h" # include "./glibc/regex_internal.h" # include "./glibc/regex_internal.c" # include "./glibc/regcomp.c" # include "./glibc/regexec.c" # if defined(_LIBC) # error macro _LIBC defined # endif #endif /* HAVE_REGEX */ /*---------------------------------------------------------------------------*/ #include "pstdlib.h" #include "ydata.h" #include "yio.h" /* Macro to get rid of some GCC extensions when not compiling with GCC. */ #if ! (defined(__GNUC__) && __GNUC__ > 1) # undef __attribute__ # define __attribute__(x) /* empty */ #endif # define p_strfree p_free /* usage: p_strfree(STR) */ # define p_stralloc(LEN) p_malloc((LEN)+1) /* usage: p_stralloc(LEN) */ /* Redefine definition of YError to avoid GCC warnings (about uninitialized variables or reaching end of non-void function): */ extern void YError(const char *msg) __attribute__ ((noreturn)); /*---------------------------------------------------------------------------*/ /* MISCELLANEOUS PRIVATE ROUTINES */ #define MY_ROUND_UP(a,b) ((((a)+(b)-1)/(b))*(b)) typedef struct ws ws_t; struct ws { /* Common part of all Yorick's DataBlocks: */ int references; /* reference counter */ Operations *ops; /* virtual function table */ }; static void FreeWS(void *addr); static UnaryOp PrintWS; extern PromoteOp PromXX; extern UnaryOp ToAnyX, NegateX, ComplementX, NotX, TrueX; extern BinaryOp AddX, SubtractX, MultiplyX, DivideX, ModuloX, PowerX; extern BinaryOp EqualX, NotEqualX, GreaterX, GreaterEQX; extern BinaryOp ShiftLX, ShiftRX, OrX, AndX, XorX; extern BinaryOp AssignX, MatMultX; extern UnaryOp EvalX, SetupX, PrintX; extern MemberOp GetMemberX; Operations wsOps= { &FreeWS, T_OPAQUE, 0, /* promoteID= */ T_STRING /* means illegal */, "workspace", {&PromXX, &PromXX, &PromXX, &PromXX, &PromXX, &PromXX, &PromXX, &PromXX}, &ToAnyX, &ToAnyX, &ToAnyX, &ToAnyX, &ToAnyX, &ToAnyX, &ToAnyX, &NegateX, &ComplementX, &NotX, &TrueX, &AddX, &SubtractX, &MultiplyX, &DivideX, &ModuloX, &PowerX, &EqualX, &NotEqualX, &GreaterX, &GreaterEQX, &ShiftLX, &ShiftRX, &OrX, &AndX, &XorX, &AssignX, &EvalX, &SetupX, &GetMemberX, &MatMultX, &PrintWS }; /* FreeRE is automatically called by Yorick to delete a regex data block that is no more referenced. */ static void FreeWS(void *addr) { p_free(addr); } /* PrintRE is used by Yorick's info command. */ static void PrintWS(Operand *op) { ForceNewline(); PrintFunc("object of type: workspace"); ForceNewline(); } static void *my_push_workspace(size_t nbytes) { /* EXTRA is the number of bytes needed to store DataBlock header rounded up to the size of a double (to avoid alignment errors). */ const size_t extra = MY_ROUND_UP(sizeof(ws_t), sizeof(double)); ws_t *ws = p_malloc(nbytes + extra); ws->references = 0; ws->ops = &wsOps; return (void *)((char *)PushDataBlock(ws) + extra); } /* tmpDims is a global temporary for Dimension lists under construction -- you should always use it, then just leave your garbage there when you are done for the next guy to clean up -- your part of the perpetual cleanup comes first. */ static void my_reset_dims(void) { if (tmpDims) { Dimension *dims = tmpDims; tmpDims = NULL; FreeDimension(dims); } } #define MY_APPEND_DIMENSION(number, origin) \ (tmpDims = NewDimension(number, origin, tmpDims)) /*----- Append a new temporary dimension (i.e. insert dimension before tmpDims) and return tmpDims. */ /* Suffixes in Yorick: * C = char * S = short * I = int * L = long * F = float * D = double * Z = complex * Q = string * P = pointer */ #define MY_PUSH_NEW_ARRAY(SDEF, DIMS)((Array *)PushDataBlock(NewArray(SDEF, DIMS))) #define MY_PUSH_NEW_ARRAY_C(DIMS) MY_PUSH_NEW_ARRAY(&charStruct, DIMS) #define MY_PUSH_NEW_ARRAY_S(DIMS) MY_PUSH_NEW_ARRAY(&shortStruct, DIMS) #define MY_PUSH_NEW_ARRAY_I(DIMS) MY_PUSH_NEW_ARRAY(&intStruct, DIMS) #define MY_PUSH_NEW_ARRAY_L(DIMS) MY_PUSH_NEW_ARRAY(&longStruct, DIMS) #define MY_PUSH_NEW_ARRAY_F(DIMS) MY_PUSH_NEW_ARRAY(&floatStruct, DIMS) #define MY_PUSH_NEW_ARRAY_D(DIMS) MY_PUSH_NEW_ARRAY(&doubleStruct, DIMS) #define MY_PUSH_NEW_ARRAY_Z(DIMS) MY_PUSH_NEW_ARRAY(&complexStruct, DIMS) #define MY_PUSH_NEW_ARRAY_Q(DIMS) MY_PUSH_NEW_ARRAY(&stringStruct, DIMS) #define MY_PUSH_NEW_ARRAY_P(DIMS) MY_PUSH_NEW_ARRAY(&pointerStruct, DIMS) /*----- These macros allocate a new Yorick array with dimension list DIMS, push it on top of the stack and return the address of the array structure. There must be an element left on top of the stack to store the new array. See also: MY_PUSH_NEW_A. */ #define MY_PUSH_NEW_A(SDEF, DIMS, MEMBER) (MY_PUSH_NEW_ARRAY(SDEF, DIMS)->value.MEMBER) #define MY_PUSH_NEW_C(DIMS) MY_PUSH_NEW_A(&charStruct, DIMS, c) #define MY_PUSH_NEW_S(DIMS) MY_PUSH_NEW_A(&shortStruct, DIMS, s) #define MY_PUSH_NEW_I(DIMS) MY_PUSH_NEW_A(&intStruct, DIMS, i) #define MY_PUSH_NEW_L(DIMS) MY_PUSH_NEW_A(&longStruct, DIMS, l) #define MY_PUSH_NEW_F(DIMS) MY_PUSH_NEW_A(&floatStruct, DIMS, f) #define MY_PUSH_NEW_D(DIMS) MY_PUSH_NEW_A(&doubleStruct, DIMS, d) #define MY_PUSH_NEW_Z(DIMS) MY_PUSH_NEW_A(&complexStruct, DIMS, d) #define MY_PUSH_NEW_Q(DIMS) MY_PUSH_NEW_A(&stringStruct, DIMS, q) #define MY_PUSH_NEW_P(DIMS) MY_PUSH_NEW_A(&pointerStruct, DIMS, p) /*----- These macros allocate a new Yorick array with dimension list DIMS, push it on top of the stack and return the base address of the array contents. See MY_PUSH_NEW_ARRAY for side effects and restrictions. */ static char *my_strncpy(const char *s, size_t len) { if (s) { char *t = p_stralloc(len); memcpy(t, s, len); t[len] = '\0'; return t; } return (char *)0; } static int my_get_boolean(Symbol *s) { if (s->ops == &referenceSym) s = &globTab[s->index]; if (s->ops == &intScalar) return (s->value.i != 0); if (s->ops == &longScalar) return (s->value.l != 0L); if (s->ops == &doubleScalar) return (s->value.d != 0.0); if (s->ops == &dataBlockSym) { Operand op; s->ops->FormOperand(s, &op); if (! op.type.dims) { switch (op.ops->typeID) { case T_CHAR: return (*(char *)op.value != 0); case T_SHORT: return (*(short *)op.value != 0); case T_INT: return (*(int *)op.value != 0); case T_LONG: return (*(long *)op.value != 0L); case T_FLOAT: return (*(float *)op.value != 0.0F); case T_DOUBLE: return (*(double *)op.value != 0.0); case T_COMPLEX:return (((double *)op.value)[0] != 0.0 || ((double *)op.value)[1] != 0.0); case T_STRING: return (op.value != NULL); case T_VOID: return 0; } } } YError("bad non-boolean argument"); return 0; /* avoid compiler warning */ } static void my_unknown_keyword(void) { YError("unrecognized keyword in builtin function call"); } /*---------------------------------------------------------------------------*/ /* SUPPORT FOR DYNAMIC STRING */ static char *ds_string = NULL; static size_t ds_size = 0; static size_t ds_length = 0; static void ds_reset(void); static void ds_free(void); static char *ds_copy(void); static void ds_append(const char *str, size_t len); static void ds_reset(void) { if (ds_string) ds_string[0] = 0; else ds_size = 0; ds_length = 0; } static void ds_free(void) { char *tmp = ds_string; ds_string = NULL; ds_length = 0; ds_size = 0; if (tmp) p_free(tmp); } static void ds_append(const char *str, size_t len) { if (len) { size_t newlen; if (! ds_string) ds_length = ds_size = 0; /* in case of interrupts... */ if ((newlen = ds_length + len) >= ds_size) { char *newstr, *oldstr = ds_string; size_t newsiz = 128; while (newlen >= newsiz) newsiz *= 2; newstr = p_malloc(newsiz); if (ds_length) memcpy(newstr, ds_string, ds_length); newstr[ds_length] = 0; ds_string = newstr; ds_size = newsiz; if (oldstr) p_free(oldstr); } ds_string[newlen] = 0; memcpy(ds_string + ds_length, str, len); ds_length = newlen; } } static char *ds_copy(void) { if (! ds_string) ds_length = ds_size = 0; /* in case of interrupts... */ return my_strncpy(ds_string, ds_length); } /*---------------------------------------------------------------------------*/ /* The regdb_t is a Yorick DataBlock that stores compiled regular expression -- when it is destroyed, all related resources are automatically freed. */ typedef struct regdb regdb_t; struct regdb { /* Common part of all Yorick's DataBlocks: */ int references; /* reference counter */ Operations *ops; /* virtual function table */ /* Specific part for this kind of object: */ int cflags; /* flags used to compile the regular expression */ regex_t regex; /* compiled regular expression */ }; static void FreeRE(void *addr); static UnaryOp PrintRE; extern PromoteOp PromXX; extern UnaryOp ToAnyX, NegateX, ComplementX, NotX, TrueX; extern BinaryOp AddX, SubtractX, MultiplyX, DivideX, ModuloX, PowerX; extern BinaryOp EqualX, NotEqualX, GreaterX, GreaterEQX; extern BinaryOp ShiftLX, ShiftRX, OrX, AndX, XorX; extern BinaryOp AssignX, MatMultX; extern UnaryOp EvalX, SetupX, PrintX; extern MemberOp GetMemberX; Operations regexOps= { &FreeRE, T_OPAQUE, 0, /* promoteID= */ T_STRING /* means illegal */, "regex", {&PromXX, &PromXX, &PromXX, &PromXX, &PromXX, &PromXX, &PromXX, &PromXX}, &ToAnyX, &ToAnyX, &ToAnyX, &ToAnyX, &ToAnyX, &ToAnyX, &ToAnyX, &NegateX, &ComplementX, &NotX, &TrueX, &AddX, &SubtractX, &MultiplyX, &DivideX, &ModuloX, &PowerX, &EqualX, &NotEqualX, &GreaterX, &GreaterEQX, &ShiftLX, &ShiftRX, &OrX, &AndX, &XorX, &AssignX, &EvalX, &SetupX, &GetMemberX, &MatMultX, &PrintRE }; /* FreeRE is automatically called by Yorick to delete a regex data block that is no more referenced. */ static void FreeRE(void *addr) { regfree(&(((regdb_t *)addr)->regex)); p_free(addr); } /* PrintRE is used by Yorick's info command. */ static void PrintRE(Operand *op) { regdb_t *re = (regdb_t *)op->value; char line[140]; int flag = 0, cflags = re->cflags; sprintf(line, "compiled regular expression (%d ref.): nsub=%d; flags=", re->references, (int)re->regex.re_nsub); if (! (cflags®_EXTENDED)) { strcat(line, "basic"); flag = 1; } if (cflags®_ICASE) { if (flag) strcat(line, "|"); strcat(line, "icase"); flag = 1; } if (cflags®_NOSUB) { if (flag) strcat(line, "|"); strcat(line, "nosub"); flag = 1; } if (cflags®_NEWLINE) { if (flag) strcat(line, "|"); strcat(line, "newline"); flag = 1; } if (! flag) strcat(line, ""); ForceNewline(); PrintFunc(line); ForceNewline(); } /*---------------------------------------------------------------------------*/ extern BuiltIn Y_regcomp, Y_regmatch, Y_regsub; static const char *regex_error_message(int errcode, const regex_t *preg); /* Return regular expression error message stored in a static buffer. */ #define DEFAULT_CFLAGS (REG_EXTENDED) static regdb_t *new_regdb(const char *regex, int cflags); /*----- Compile regular expression and return new data-block. */ static regdb_t *get_regdb(Symbol *stack, int cflags); /*----- Return compiled regular expression data-block for stack symbol STACK. If STACK is a scalar string, it gets compiled according to CFLAGS or with DEFAULT_CFLAGS if CFLAGS=-1 (see man page of regcomp); otherwise it must be a compiled regular expression compiled and CFLAGS must be -1. If STACK is a reference or a scalar string, it get replaced by the result. */ /*---------------------------------------------------------------------------*/ static long id_all = -1; static long id_basic = -1; static long id_icase = -1; static long id_indices = -1; static long id_newline = -1; static long id_nosub = -1; static long id_notbol = -1; static long id_noteol = -1; static long id_start = -1; static int first_time = 1; static void initialize(void) { id_all = Globalize("all", 3); id_basic = Globalize("basic", 5); id_icase = Globalize("icase", 5); id_indices = Globalize("indices", 7); id_newline = Globalize("newline", 7); id_nosub = Globalize("nosub", 5); id_notbol = Globalize("notbol", 6); id_noteol = Globalize("noteol", 6); id_start = Globalize("start", 5); } void Y_regcomp(int argc) { Symbol *stack, *regex = NULL; DataBlock *db; Array *array; int cflags=DEFAULT_CFLAGS; /* Initialize internals as needed. */ if (first_time) { initialize(); first_time = 0; } /* Parse arguments from first to last one. */ for (stack=sp-argc+1 ; stack<=sp ; ++stack) { if (stack->ops) { /* Normal argument. */ if (regex) goto badNArgs; regex = (stack->ops == &referenceSym) ? &globTab[stack->index] : stack; } else { /* Must be a keyword: sp[i] is for keyword and sp[i+1] for related value. */ long id = stack->index; ++stack; if (id == id_icase) { if (my_get_boolean(stack)) cflags |= REG_ICASE; } else if (id == id_newline) { if (my_get_boolean(stack)) cflags |= REG_NEWLINE; } else if (id == id_nosub) { if (my_get_boolean(stack)) cflags |= REG_NOSUB; } else if (id == id_basic) { if (my_get_boolean(stack)) cflags &= ~REG_EXTENDED; } else { my_unknown_keyword(); } } } if (! regex) { badNArgs: YError("regcomp takes exactly 1 non-keyword argument"); } if (regex->ops == &referenceSym) regex = &globTab[regex->index]; if (regex->ops != &dataBlockSym || (db = regex->value.db)->ops != &stringOps || (array = (Array *)db)->type.dims != NULL) { YError("expecting scalar string"); } db = (DataBlock *)new_regdb(array->value.q[0], cflags); PushDataBlock(db); } /*---------------------------------------------------------------------------*/ void Y_regmatch(int argc) { void **outPtr; /* array of pointers to outputs */ long *outIndex; /* array of output index */ long nmatch; /* number of required outputs */ regoff_t start_option=1; /* starting index in matching string */ regoff_t start=1; /* actual starting index in matching string */ regoff_t len; /* length of input string */ regmatch_t *pmatch; Dimension *dims = NULL; Symbol *stack, *last_arg, *regexSymbol= NULL; regdb_t *re; regex_t *regex; char **input=NULL, *str; int indices=0, eflags=0, tflags=DEFAULT_CFLAGS, cflags=-1; long i, number, j; int status, *match; /* Initialize internals as needed. */ if (first_time) { initialize(); first_time = 0; } /* First pass on argument list to parse keywords and figure out the number of outputs requested. */ last_arg = sp; nmatch = -2; for (stack = last_arg+1-argc ; stack <= last_arg ; ++stack) { if (stack->ops) { /* Normal argument. */ ++nmatch; } else { /* Keyword argument: sp[i] is for keyword name and sp[i+1] for its value. */ long id = stack->index; ++stack; if (id == id_icase) { if (my_get_boolean(stack)) tflags |= REG_ICASE; cflags = tflags; } else if (id == id_newline) { if (my_get_boolean(stack)) tflags |= REG_NEWLINE; cflags = tflags; } else if (id == id_nosub) { if (my_get_boolean(stack)) tflags |= REG_NOSUB; cflags = tflags; } else if (id == id_basic) { if (my_get_boolean(stack)) tflags &= ~REG_EXTENDED; cflags = tflags; } else if (id == id_notbol) { if (my_get_boolean(stack)) eflags |= REG_NOTBOL; } else if (id == id_noteol) { if (my_get_boolean(stack)) eflags |= REG_NOTEOL; } else if (id == id_indices) { indices = my_get_boolean(stack); } else if (id == id_start) { start_option = YGetInteger(stack); } else { my_unknown_keyword(); } } } if (nmatch < 0) YError("regmatch takes at least 2 non-keyword arguments"); /* Allocate enough workspace for outputs. */ if (nmatch > 0) { CheckStack(nmatch + 4); last_arg = sp; /* in case stack was relocated */ #define ALLOC_WS(PTR, NUMBER) PTR=my_push_workspace((NUMBER)*sizeof(*(PTR))) ALLOC_WS(outPtr, nmatch); ALLOC_WS(outIndex, nmatch); ALLOC_WS(pmatch, nmatch); /*cflags|= REG_EXTENDED;*/ #undef ALLOC_WS } else { outPtr = NULL; outIndex = NULL; pmatch = NULL; /*cflags|= REG_NOSUB;*/ } /* Parse non-keyword arguments from first to last one (must be done _after_ call to CheckStack). */ j = 0; for (stack = last_arg+1-argc ; stack <= last_arg ; ++stack) { if (stack->ops) { /* Normal argument. */ if (! regexSymbol) { regexSymbol = stack; } else if (! input) { input = YGet_Q(stack, 0, &dims); } else { outPtr[j] = stack; outIndex[j] = (stack->ops == &referenceSym) ? stack->index : -1; ++j; } } else { /* Keyword argument (skip it). */ ++stack; } } /* Get/compile regular expression. */ re = get_regdb(regexSymbol, cflags); regex = &re->regex; /* Push result on top of the stack (must be done *BEFORE* other outputs). */ match = MY_PUSH_NEW_I(dims); /* Prepare output arrays. */ number = TotalNumber(dims); if (indices) { Dimension *ptr; my_reset_dims(); MY_APPEND_DIMENSION(2, 1); for (ptr=dims ; ptr!=NULL ; ptr=ptr->next) { MY_APPEND_DIMENSION(ptr->number, ptr->origin); } for (j=0 ; j= 1) { if ((start = start_option) <= len) { str += start - 1; } else { str = NULL; } } else { if ((start = len - start_option) >= 1) { str += start - 1; } else { str = NULL; } } } if (! str || (status = regexec(regex, str, nmatch, pmatch, eflags)) == REG_NOMATCH) { /* No match. */ match[i] = 0; if (indices) { for (j=0 ; j pmatch[j].rm_so) { OUTPUT_Q(j, i) = my_strncpy(str + pmatch[j].rm_so, pmatch[j].rm_eo - pmatch[j].rm_so); } } } } else { YError(regex_error_message(status, regex)); } } # undef OUTPUT # undef OUTPUT_L # undef OUTPUT_Q /* Pop outputs in place (from last to first one) and left result on top of the stack. */ for (j=nmatch-1 ; j>=0 ; --j) { if (outIndex[j]<0) Drop(1); else PopTo(&globTab[outIndex[j]]); } } /*---------------------------------------------------------------------------*/ void Y_regsub(int argc) { regmatch_t *match; Symbol *stack, *regexSymbol=NULL; regdb_t *re; regex_t *regex; Dimension *dims= NULL; const char *substr= NULL, *src, *end; char **input= NULL, **output, *dst; int all=0, eflags=0, tflags=DEFAULT_CFLAGS, cflags=-1; long len, number, nsub, srclen, index; int i, j, c, status, nnodes, argnum; size_t part1; struct Node { /* This structure describes a node in the substitution string. An array of such nodes provides some sort of compiled version of the substitution string. */ const char *p; /* non-NULL means textual string NULL means index of sub-expression */ long l; /* length of textual string or index of sub-expression */ } *node; /* Initialize internals as needed. */ if (first_time) { initialize(); first_time = 0; } /* Parse arguments from first to last one. */ argnum = 0; for (stack=sp+1-argc ; stack<=sp ; ++stack) { if (stack->ops) { /* Normal argument. */ switch (++argnum) { case 1: regexSymbol = stack; break; case 2: input = YGet_Q(stack, 0, &dims); break; case 3: substr = YGetString(stack); break; default: goto badNArgs; } } else { /* Keyword argument: sp[i] is for keyword name and sp[i+1] for its value. */ long id = stack->index; ++stack; if (id == id_icase) { if (my_get_boolean(stack)) tflags |= REG_ICASE; cflags = tflags; } else if (id == id_newline) { if (my_get_boolean(stack)) tflags |= REG_NEWLINE; cflags = tflags; } else if (id == id_nosub) { if (my_get_boolean(stack)) tflags |= REG_NOSUB; cflags = tflags; } else if (id == id_basic) { if (my_get_boolean(stack)) tflags &= ~REG_EXTENDED; cflags = tflags; } else if (id == id_notbol) { if (my_get_boolean(stack)) eflags |= REG_NOTBOL; } else if (id == id_noteol) { if (my_get_boolean(stack)) eflags |= REG_NOTEOL; } else if (id == id_all) { all = my_get_boolean(stack); } else { my_unknown_keyword(); } } } if (argnum < 2 || argnum > 3) { badNArgs: YError("regsub takes 2 or 3 non-keyword arguments"); } /* Get/compile regular expression. */ re = get_regdb(regexSymbol, cflags); regex = &re->regex; /* Make sure the stack can holds 2 more items: a temporary workspace and the result of the call. */ CheckStack(2); /* Allocate workspace: NSUB+1 regmatch_t for MATCH array * + LEN struct Node for NODE array * + LEN+1 char for literal parts of SUBSTR. * Notes: 1. Allocate as many nodes as characters in SUBSTR since this is * the maximum possible number of nodes; furthermore the * allocation is done in one chunk. * 2. Round up sizes of different parts to avoid alignment errors. */ len = (substr ? strlen(substr) : 0); nsub = regex->re_nsub; /* number of subexpressions in compiled regex */ part1 = (nsub + 1)*sizeof(match[0]); part1 = MY_ROUND_UP(part1, sizeof(node[0])); match = my_push_workspace(part1 + len*(1 + sizeof(node[0])) + 1); node = (void *)((char *)match + part1); dst = (char *)(node + len); /* Compile substitution string SUBSTR by splitting it into nodes. */ nnodes = 0; if (len) { #define ADD_NODE(P,L) node[nnodes].p=(P); node[nnodes++].l=(L) int l = 0; for (i=0 ; ; ) { c = substr[i++]; #if 0 if (c == '&') { if (l) { ADD_NODE(dst,l); dst[l]=0; dst+=l+1; l=0; } ADD_NODE(NULL, 0); continue; } #endif if (c == '\\') { c = substr[i++]; index = c - '0'; if (index >= 0 && index <= 9) { if (index > nsub) YError("sub-expression index overreach number of sub-expressions"); if (l) { ADD_NODE(dst,l); dst[l]=0; dst+=l+1; l=0; } ADD_NODE(NULL, index); continue; } if (c == 0) YError("bad final backslash in substitution string"); } else if (c==0) { /* End of string. */ if (l) { ADD_NODE(dst,l); dst[l]=0; } break; } /* Literal character. */ dst[l++] = c; } #undef ADD_NODE } /* Allocate output string array and enough workspace. */ number = TotalNumber(dims); output = MY_PUSH_NEW_Q(dims); /* Match regular expression against input string(s). */ for (i=0 ; i 0) ds_append(src, len); /* Substitute each nodes. */ for (j=0 ; j match[index].rm_so) { ds_append(src + match[index].rm_so, match[index].rm_eo - match[index].rm_so); } } } /* Skip the part of the source string that matched the entire regular expression (advance source pointer by at least 1 character to avoid infinite loop). */ if (match[0].rm_eo > match[0].rm_so) src += match[0].rm_eo; else src += match[0].rm_so + 1; if (! all || src >= end) break; tflags |= REG_NOTBOL; /* since SRC is advanced, we are no longer at the beginning of the string. */ } /* Copy the tail of the source string that didn't match the regular expression. */ len = srclen - (src - input[i]); if (len > 0) ds_append(src, len); /* Stores the substituted string into the output array. */ output[i] = ds_copy(); } ds_free(); } /*---------------------------------------------------------------------------*/ static regdb_t *new_regdb(const char *regex, int cflags) { regdb_t *re; int status; if (! regex) YError("unexpected nil string"); re = p_malloc(sizeof(regdb_t)); re->references = 0; re->ops = ®exOps; re->cflags = cflags; status = regcomp(&re->regex, regex, cflags); if (status) { const char *msg = regex_error_message(status, &re->regex); FreeRE(re); YError(msg); } return re; } static regdb_t *get_regdb(Symbol *stack, int cflags) { Symbol *s = (stack->ops == &referenceSym) ? &globTab[stack->index] : stack; if (s->ops == &dataBlockSym) { DataBlock *db = s->value.db; if (db->ops == ®exOps) { if (cflags != -1) YError("attempt to modify flags in compiled regular expression"); if (s != stack) { /* Replace reference (we already know that S is a DataBlock). */ stack->value.db = Ref(db); stack->ops = &dataBlockSym; /* change ops only AFTER value updated */ } return (regdb_t *)db; } if (db->ops == &stringOps) { Array *array = (Array *)db; if (! array->type.dims) { /* Compile regular expression and store it into a new data block that is returned. */ regdb_t *re = new_regdb(array->value.q[0], (cflags == -1 ? DEFAULT_CFLAGS : cflags)); db = (stack->ops==&dataBlockSym) ? stack->value.db : NULL; stack->value.db = (DataBlock *)re; stack->ops = &dataBlockSym; if (db) Unref(db); return re; } } } YError("expecting scalar string or compiled regular expression"); #if ! (defined(__GNUC__) && __GNUC__ > 1) return NULL; /* This is to avoid compiler warnings. */ #endif } /*---------------------------------------------------------------------------*/ static const char *regex_error_message(int errcode, const regex_t *preg) { #if HAVE_REGERROR static char errbuf[128]; regerror(errcode, preg, errbuf, sizeof(errbuf)); return errbuf; #else /* HAVE_REGERROR */ switch (errcode) { #ifdef REG_BADRPT case REG_BADRPT: return "regex: Invalid use of repetition operators such as using `*' as the first character."; #endif /* REG_BADRPT */ #ifdef REG_BADBR case REG_BADBR: return "regex: Invalid use of back reference operator."; #endif /* REG_BADBR */ #ifdef REG_EBRACE case REG_EBRACE: return "regex: Un-matched brace interval operators."; #endif /* REG_EBRACE */ #ifdef REG_EBRACK case REG_EBRACK: return "regex: Un-matched bracket list operators."; #endif /* REG_EBRACK */ #ifdef REG_ERANGE case REG_ERANGE: return "regex: Invalid use of the range operator, eg. the ending point of the range occurs prior to the starting point."; #endif /* REG_ERANGE */ #ifdef REG_ECTYPE case REG_ECTYPE: return "regex: Unknown character class name."; #endif /* REG_ECTYPE */ #ifdef REG_EPAREN case REG_EPAREN: return "regex: Un-matched parenthesis group operators."; #endif /* REG_EPAREN */ #ifdef REG_ESUBREG case REG_ESUBREG: return "regex: Invalid back reference to a subexpression."; #endif /* REG_ESUBREG */ #ifdef REG_EEND case REG_EEND: return "regex: Non specific error."; #endif /* REG_EEND */ #ifdef REG_ESCAPE case REG_ESCAPE: return "regex: Invalid escape sequence."; #endif /* REG_ESCAPE */ #ifdef REG_BADPAT case REG_BADPAT: return "regex: Invalid use of pattern operators such as group or list"; #endif /* REG_BADPAT */ #ifdef REG_ESIZE case REG_ESIZE: return "regex: Compiled regular expression requires a pattern buffer larger than 64Kb."; #endif /* REG_ESIZE */ #ifdef REG_ESPACE case REG_ESPACE: return "The regex routines ran out of memory."; #endif /* REG_ESPACE */ default: return "regex: Unknown error."; } #endif /* HAVE_REGERROR */ } /* * Local Variables: * mode: C * tab-width: 8 * c-basic-offset: 2 * indent-tabs-mode: nil * fill-column: 79 * coding: utf-8 * End: */ Yeti-6.4.0/regex/yeti_regex.i000066400000000000000000000214471253351442600161000ustar00rootroot00000000000000/* * yeti_regex.i -- * * POSIX regular expressions for Yorick. * *----------------------------------------------------------------------------- * * Copyright (C) 1998, 1999, 2002, 2015: Éric Thiébaut * * * This software is governed by the CeCILL-C license under French law and * abiding by the rules of distribution of free software. You can use, modify * and/or redistribute the software under the terms of the CeCILL-C license as * circulated by CEA, CNRS and INRIA at the following URL * "http://www.cecill.info". * * As a counterpart to the access to the source code and rights to copy, modify * and redistribute granted by the license, users are provided only with a * limited warranty and the software's author, the holder of the economic * rights, and the successive licensors have only limited liability. * * In this respect, the user's attention is drawn to the risks associated with * loading, using, modifying and/or developing or reproducing the software by * the user in light of its specific status of free software, that may mean * that it is complicated to manipulate, and that also therefore means that it * is reserved for developers and experienced professionals having in-depth * computer knowledge. Users are therefore encouraged to load and test the * software's suitability as regards their requirements in conditions enabling * the security of their systems and/or data to be ensured and, more generally, * to use and operate it in the same conditions as regards security. * * The fact that you are presently reading this means that you have had * knowledge of the CeCILL-C license and that you accept its terms. * *----------------------------------------------------------------------------- */ /* load dynamic code */ if (is_func(plug_in)) plug_in, "yeti_regex"; extern regcomp; /* DOCUMENT regcomp(reg); Compile regular expression STR for further use. REG must be a scalar string. If keyword NEWLINE is true (non-nil and non-zero), then newlines are treated specifically (i.e. as if text lines separated by newlines were individual strings): - Match-any-character operators don't match a newline. - A non-matching list ([^...]) not containing a newline does not match a newline. - Match-beginning-of-line operator (^) matches the empty string immediately after a newline, regardless of whether keyword NOTBOL is true when the regular expression is used in regsub or regmatch. - Match-end-of-line operator ($) matches the empty string immediately before a newline, regardless of whether keyword NOTEOL is true when the regular expression is used. The default is to treat newline as any other character. If keyword BASIC is true (non-nil and non-zero), then POSIX Basic Regular Expression syntax is used; the default is to use POSIX Extended Regular Expression syntax. If keyword ICASE is true (non-nil and non-zero), the regular expression will be case insensitive. The default is to differentiate case. Keyword NOSUB can be set to a non-zero value if support for subexpression addressing of matches is not required (see regsub and regmatch). KEYWORDS: basic, icase, newline, nosub. SEE ALSO: regmatch, regsub. */ extern regmatch; /* DOCUMENT regmatch(reg, str); -or- regmatch(reg, str, match0, match1, match2, match3, ...); Test if regular expression REG matches string array STR. The result is an array of int's with same dimension list as STR and with elements set to 1/0 whether REG match or not the corresponding element in STR. REG can be a scalar string which get compiled at runtime and the same compilation keywords as those accepted by regcomp can be specified: BASIC, ICASE, NEWLINE and/or NOSUB. Otherwise, REG must be a pre-compiled regular expression (by regcomp) and no compilation keyword can be specified. Keyword START can be used to specify a starting index for the matching in the input string(s). If START is less or equal zero, then it is counted from the end of the input string(s): START=0 to start with the last character, START=-1 to start with the 2 last characters. The match is considered as a failure when the starting character is outside the string. The default is START=1 (start with the first character ot every input string). If keyword NOTBOL is true (non-nil and non-zero), the match-beginning-of-line operator (^) always fails to match. Similarly, if keyword NOTEOL is true, the match-end-of-line operator ($) always fails to match. These keywords may be used when different portions of a string are matched against a regular expression and the beginning/end of the string should not be interpreted as the beginning/end of the line (but see the keyword NEWLINE in regcomp). In the second form, matching patterns get stored in output variables MATCH0, MATCH1... MATCH0 will be filled by the part of STR that matched the complete regular expression REG; MATCHn (with n=1,2...), will contain the part of STR that matched the n-th parenthesized subexpression in REG. If keyword INDICES is true (non-nil and non-zero) the indices of matching (sub)expressions get stored into the MATCHn variables: MATCHn(1,..) and MATCHn(2,..) are the indices of the first and last+1 characters of the matching (sub)expressions -- it must be last+1 to allow for empty match. If keyword INDICE is false or omitted, the MATCHn variables will be string arrays with same dimension lists as STR. KEYWORDS: basic, icase, indices, newline, nosub, notbol, noteol. SEE ALSO: regcomp, , regmatch_part, regsub. */ extern regsub; /* DOCUMENT regsub(reg, str) -or- regsub(reg, str, sub) Substitute pattern(s) in STR matching regular expression REG by string SUB (which is the nil string by default). STR is an array of strings, and the result has the same dimension list as STR. Each string of STR is matched against regular expression REG and, if there is a match, the matching part is replaced by SUB (which must be a scalar string). If SUB contains a sequence "\n" (where n=0,...,9), then this sequence is replaced in the result by the n-th parenthesized subexpression of REG ("\0" stands for the whole matching part). Other backslash sequences "\c" (where c is any character) in SUB get replaced by the escaped character c. Beware that Yorick's parser interprets backslash sequences in literal strings, you may therefore have to write 2 backslashes. REG can be a scalar string which get compiled at runtime and the same compilation keywords as those accepted by regcomp can be specified: BASIC, ICASE, NEWLINE and/or NOSUB. Otherwise, REG must be a pre-compiled regular expression (by regcomp) and no compilation keyword can be specified. If keyword ALL is true (non-nil and non-zero), then all occurences of REG get substituted. Otherwise only the 1st occurence of REG get substituted. Of course substitution is performed for every element of STR. If keyword NOTBOL is true (non-nil and non-zero), the match-beginning-of-line operator (^) always fails to match. Similarly, if keyword NOTEOL is true, the match-end-of-line operator ($) always fails to match. These keywords may be used when different portions of a string are matched against a regular expression and the beginning/end of the string should not be interpreted as the beginning/end of the line (but see the keyword NEWLINE above). KEYWORDS: all, basic, icase, newline, nosub, notbol, noteol. SEE ALSO: regcomp, regmatch. */ func regmatch_part(s, i) /* DOCUMENT regmatch_part(str, idx); Get part of string STR indexed by IDX (which should be one of the outputs of regmatch when called with keyword INDICES=1). The result is an array of strings with same dimension list as STR. IDX must have one more dimension than STR, the first dimension of IDX must be 2 and the other must be identical to those of STR. SEE ALSO regmatch. */ { sp = array(string, dimsof(s)); i1 = i(1,); i2 = i(2,) - 1; i = where(i2 >= i1); n = numberof(i); for (k=1 ; k<=n ; ++k) { ik = i(k); sp(ik) = strpart(s(ik), i1(ik):i2(ik)); } return sp; } /* * Local Variables: * mode: Yorick * tab-width: 8 * c-basic-offset: 2 * indent-tabs-mode: nil * fill-column: 79 * coding: utf-8 * End: */ Yeti-6.4.0/tiff/000077500000000000000000000000001253351442600133705ustar00rootroot00000000000000Yeti-6.4.0/tiff/Makefile000066400000000000000000000037561253351442600150430ustar00rootroot00000000000000# these values filled in by yorick -batch make.i Y_MAKEDIR= Y_EXE= Y_EXE_PKGS= Y_EXE_HOME= Y_EXE_SITE= # ----------------------------------------------------- optimization flags # options for make command line, e.g.- make COPT=-g TGT=exe COPT=$(COPT_DEFAULT) TGT=$(DEFAULT_TGT) # ------------------------------------------------ macros for this package PKG_NAME=yeti_tiff PKG_I=yeti_tiff.i OBJS=yeti_tiff.o # change to give the executable a name other than yorick PKG_EXENAME=yorick # PKG_DEPLIBS=-Lsomedir -lsomelib for dependencies of this package PKG_DEPLIBS = -ltiff # set compiler (or rarely loader) flags specific to this package PKG_CFLAGS = -DHAVE_TIFF=1 PKG_LDFLAGS= # list of additional package names you want in PKG_EXENAME # (typically Y_EXE_PKGS should be first here) EXTRA_PKGS=$(Y_EXE_PKGS) # list of additional files for clean PKG_CLEAN= # autoload file for this package, if any PKG_I_START= # non-pkg.i include files for this package, if any PKG_I_EXTRA= # -------------------------------- standard targets and rules (in Makepkg) # set macros Makepkg uses in target and dependency names # DLL_TARGETS, LIB_TARGETS, EXE_TARGETS # are any additional targets (defined below) prerequisite to # the plugin library, archive library, and executable, respectively PKG_I_DEPS=$(PKG_I) Y_DISTMAKE=distmake include $(Y_MAKEDIR)/Make.cfg include $(Y_MAKEDIR)/Makepkg include $(Y_MAKEDIR)/Make$(TGT) # override macros Makepkg sets for rules and other macros # Y_HOME and Y_SITE in Make.cfg may not be correct (e.g.- relocatable) Y_HOME=$(Y_EXE_HOME) Y_SITE=$(Y_EXE_SITE) # reduce chance of yorick-1.5 corrupting this Makefile MAKE_TEMPLATE = protect-against-1.5 # ------------------------------------- targets and rules for this package # simple example: #myfunc.o: myapi.h # more complex example (also consider using PKG_CFLAGS above): #myfunc.o: myapi.h myfunc.c # $(CC) $(CPPFLAGS) $(CFLAGS) -DMY_SWITCH -o $@ -c myfunc.c # -------------------------------------------------------- end of Makefile Yeti-6.4.0/tiff/yeti_tiff.c000066400000000000000000001034401253351442600155200ustar00rootroot00000000000000/* * yeti_tiff.c - * * Implement support for TIFF images in Yorick. * *----------------------------------------------------------------------------- * * Copyright (C) 2003-2006, 2015: Éric Thiébaut * * * This software is governed by the CeCILL-C license under French law and * abiding by the rules of distribution of free software. You can use, modify * and/or redistribute the software under the terms of the CeCILL-C license as * circulated by CEA, CNRS and INRIA at the following URL * "http://www.cecill.info". * * As a counterpart to the access to the source code and rights to copy, modify * and redistribute granted by the license, users are provided only with a * limited warranty and the software's author, the holder of the economic * rights, and the successive licensors have only limited liability. * * In this respect, the user's attention is drawn to the risks associated with * loading, using, modifying and/or developing or reproducing the software by * the user in light of its specific status of free software, that may mean * that it is complicated to manipulate, and that also therefore means that it * is reserved for developers and experienced professionals having in-depth * computer knowledge. Users are therefore encouraged to load and test the * software's suitability as regards their requirements in conditions enabling * the security of their systems and/or data to be ensured and, more generally, * to use and operate it in the same conditions as regards security. * * The fact that you are presently reading this means that you have had * knowledge of the CeCILL-C license and that you accept its terms. * *----------------------------------------------------------------------------- */ #include #include #include #include #include "yapi.h" /*---------------------------------------------------------------------------*/ #define TEMPORARY_HACK 1 #if TEMPORARY_HACK # include "pstdlib.h" # include "yio.h" # warning "following code is temporary hack" /*typedef void ybuiltin_t(int argc);*/ #define y_free p_free #define y_malloc p_malloc #define y_strcpy p_strcpy /* y_expand_name returns NAME after expansion */ PLUG_API char *y_expand_name(const char *name); PLUG_API char *y_expand_name(const char *name) { return YExpandName(name); } PLUG_API void *ypush_a(int typeid, long *dims); void *ypush_a(int typeid, long *dims) { switch (typeid) { case Y_CHAR: return ypush_c(dims); case Y_SHORT: return ypush_s(dims); case Y_INT: return ypush_i(dims); case Y_LONG: return ypush_l(dims); case Y_FLOAT: return ypush_f(dims); case Y_DOUBLE: return ypush_d(dims); case Y_COMPLEX: return ypush_z(dims); case Y_STRING: return ypush_q(dims); case Y_POINTER: return ypush_p(dims); } y_error("(BUG) non-array type number"); return 0; } PLUG_API void ypush_string(const char *value); PLUG_API void ypush_string(const char *value) { ypush_q(NULL)[0] = (value ? p_strcpy((char *)value) : NULL); } /* better if there exists an yget_any (not specially array) */ PLUG_API int yarg_true(int iarg); PLUG_API int yarg_true(int iarg) { int typeid; long dims[Y_DIMSIZE]; void *ptr; typeid = yarg_typeid(iarg); if (typeid >= Y_CHAR && typeid <= Y_POINTER) { ptr = ygeta_any(iarg, NULL, dims, NULL); if (! dims[0]) { switch (typeid) { case Y_CHAR: return (*(char *)ptr != 0); case Y_SHORT: return (*(short *)ptr != 0); case Y_INT: return (*(int *)ptr != 0); case Y_LONG: return (*(long *)ptr != 0L); case Y_FLOAT: return (*(float *)ptr != 0.0F); case Y_DOUBLE: return (*(double *)ptr != 0.0); case Y_COMPLEX: return (((double *)ptr)[0] != 0.0 || ((double *)ptr)[1] != 0.0); case Y_STRING: return (*(ystring_t *)ptr != NULL); case Y_POINTER: return (*(ypointer_t *)ptr != NULL); } } } else if (typeid == Y_VOID) { return 0; } else { return 1; } y_error("bad non-boolean argument"); return 0; /* avoid compiler warning */ } #endif /* TEMPORARY_HACK */ /*---------------------------------------------------------------------------*/ /* PUBLIC ROUTINES */ extern ybuiltin_t Y_tiff_open; extern ybuiltin_t Y_tiff_read_directory; extern ybuiltin_t Y_tiff_read_image; extern ybuiltin_t Y_tiff_read_pixels; extern ybuiltin_t Y_tiff_debug; /*---------------------------------------------------------------------------*/ #ifndef HAVE_TIFF # define HAVE_TIFF 0 #endif #if HAVE_TIFF #include #include /*---------------------------------------------------------------------------*/ /* DATA TYPES */ typedef struct _tag tag_t; typedef struct _object object_t; /* PRIVATE DATA */ static char message[2048]; static int debug = 0; /* PRIVATE ROUTINES */ static void *push_workspace(long nbytes); static void load_pixels(TIFF *tiff); static void error_handler(const char* module, const char* fmt, va_list ap); static void warning_handler(const char* module, const char* fmt, va_list ap); static int cmapbits(unsigned int n, const uint16 r[], const uint16 g[], const uint16 b[]); static void missing_required_tag(const char *tagname); /*---------------------------------------------------------------------------*/ /* OPAQUE OBJECTS */ static void on_free(void *); static void on_print(void *); static void on_eval(void *, int); static void on_extract(void *, long); static object_t *get_object(int iarg); static y_userobj_t tiff_class = { "TIFF file handle", on_free, on_print, on_eval, on_extract, NULL }; struct _object { TIFF *handle; /* TIFF file handle */ char *path; /* full path */ char *mode; /* mode when TIFFOpen was called */ }; /* * The tags understood by libtiff, the number of parameter values, and the * expected types for the parameter values are shown below. The data types * are: char* is null-terminated string and corresponds to the ASCII data * type; uint16 is an unsigned 16-bit value; uint32 is an unsigned 32-bit * value; uint16* is an array of unsigned 16-bit values. void* is an array * of data values of unspecified type. * * Consult the TIFF specification for information on the meaning of each * tag. * * Tag Name Count Types Notes * TIFFTAG_ARTIST 1 char* * TIFFTAG_BADFAXLINES 1 uint32 * TIFFTAG_BITSPERSAMPLE 1 uint16 - * TIFFTAG_CLEANFAXDATA 1 uint16 * TIFFTAG_COLORMAP 3 uint16* 1< 0 * TIFFTAG_SAMPLEFORMAT 1 uint16 - * TIFFTAG_SAMPLESPERPIXEL 1 uint16 - value must be <= 4 * TIFFTAG_SMAXSAMPLEVALUE 1 double * TIFFTAG_SMINSAMPLEVALUE 1 double * TIFFTAG_SOFTWARE 1 char* * TIFFTAG_STONITS 1 double - * TIFFTAG_SUBFILETYPE 1 uint32 * TIFFTAG_SUBIFD 2 uint16,uint32* count & offsets array * TIFFTAG_TARGETPRINTER 1 char* * TIFFTAG_THRESHHOLDING 1 uint16 * TIFFTAG_TILEDEPTH 1 uint32 - * TIFFTAG_TILELENGTH 1 uint32 - must be a multiple of 8 * TIFFTAG_TILEWIDTH 1 uint32 - must be a multiple of 8 * TIFFTAG_TRANSFERFUNCTION 1 or 3 = uint16*1<>8; cmap += number; for (i=0 ; i>8; cmap += number; for (i=0 ; i>8; } } } #define TAG_UINT16(name, tag) {push_tag_uint16, name, tag, -1} #define TAG_UINT32(name, tag) {push_tag_uint32, name, tag, -1} #define TAG_INT(name, tag) {push_tag_int, name, tag, -1} #define TAG_FLOAT(name, tag) {push_tag_float, name, tag, -1} #define TAG_DOUBLE(name, tag) {push_tag_double, name, tag, -1} #define TAG_STRING(name, tag) {push_tag_string, name, tag, -1} #define TAG_COLORMAP(name, tag) {push_tag_colormap, name, tag, -1} static long filename_index = -1L; static long filemode_index = -1L; static tag_t tag_table[] = { TAG_STRING( "artist", TIFFTAG_ARTIST), TAG_UINT16( "bitspersample", TIFFTAG_BITSPERSAMPLE), TAG_UINT16( "cleanfaxdata", TIFFTAG_CLEANFAXDATA), TAG_COLORMAP( "colormap", TIFFTAG_COLORMAP), TAG_UINT16( "compression", TIFFTAG_COMPRESSION), TAG_UINT32( "consecutivebadfaxlines", TIFFTAG_CONSECUTIVEBADFAXLINES), TAG_STRING( "copyright", TIFFTAG_COPYRIGHT), TAG_UINT16( "datatype", TIFFTAG_DATATYPE), TAG_STRING( "datetime", TIFFTAG_DATETIME), TAG_STRING( "documentname", TIFFTAG_DOCUMENTNAME), TAG_INT( "faxmode", TIFFTAG_FAXMODE), TAG_UINT16( "fillorder", TIFFTAG_FILLORDER), TAG_UINT32( "group3options", TIFFTAG_GROUP3OPTIONS), TAG_UINT32( "group4options", TIFFTAG_GROUP4OPTIONS), TAG_STRING( "hostcomputer", TIFFTAG_HOSTCOMPUTER), TAG_UINT32( "imagedepth", TIFFTAG_IMAGEDEPTH), TAG_STRING( "imagedescription", TIFFTAG_IMAGEDESCRIPTION), TAG_UINT32( "imagelength", TIFFTAG_IMAGELENGTH), TAG_UINT32( "imagewidth", TIFFTAG_IMAGEWIDTH), TAG_STRING( "inknames", TIFFTAG_INKNAMES), TAG_UINT16( "inkset", TIFFTAG_INKSET), TAG_INT( "jpegquality", TIFFTAG_JPEGQUALITY), TAG_INT( "jpegcolormode", TIFFTAG_JPEGCOLORMODE), TAG_INT( "jpegtablesmode", TIFFTAG_JPEGTABLESMODE), TAG_STRING( "make", TIFFTAG_MAKE), TAG_UINT16( "matteing", TIFFTAG_MATTEING), TAG_UINT16( "maxsamplevalue", TIFFTAG_MAXSAMPLEVALUE), TAG_UINT16( "minsamplevalue", TIFFTAG_MINSAMPLEVALUE), TAG_STRING( "model", TIFFTAG_MODEL), TAG_UINT16( "orientation", TIFFTAG_ORIENTATION), TAG_STRING( "pagename", TIFFTAG_PAGENAME), /*TAG_UINT16( "pagenumber", TIFFTAG_PAGENUMBER),*/ TAG_UINT16( "photometric", TIFFTAG_PHOTOMETRIC), TAG_UINT16( "planarconfig", TIFFTAG_PLANARCONFIG), TAG_UINT16( "predictor", TIFFTAG_PREDICTOR), TAG_UINT16( "resolutionunit", TIFFTAG_RESOLUTIONUNIT), TAG_UINT32( "rowsperstrip", TIFFTAG_ROWSPERSTRIP), TAG_UINT16( "sampleformat", TIFFTAG_SAMPLEFORMAT), TAG_UINT16( "samplesperpixel", TIFFTAG_SAMPLESPERPIXEL), TAG_DOUBLE( "smaxsamplevalue", TIFFTAG_SMAXSAMPLEVALUE), TAG_DOUBLE( "sminsamplevalue", TIFFTAG_SMINSAMPLEVALUE), TAG_STRING( "software", TIFFTAG_SOFTWARE), TAG_FLOAT( "xposition", TIFFTAG_XPOSITION), TAG_FLOAT( "xresolution", TIFFTAG_XRESOLUTION), TAG_FLOAT( "yposition", TIFFTAG_YPOSITION), TAG_FLOAT( "yresolution", TIFFTAG_YRESOLUTION), TAG_DOUBLE( "stonits", TIFFTAG_STONITS), TAG_UINT32( "subfiletype", TIFFTAG_SUBFILETYPE), TAG_STRING( "targetprinter", TIFFTAG_TARGETPRINTER), TAG_UINT16( "threshholding", TIFFTAG_THRESHHOLDING), TAG_UINT32( "tiledepth", TIFFTAG_TILEDEPTH), TAG_UINT32( "tilelength", TIFFTAG_TILELENGTH), TAG_UINT32( "tilewidth", TIFFTAG_TILEWIDTH), TAG_UINT16( "ycbcrpositioning", TIFFTAG_YCBCRPOSITIONING), /*TAG_UINT16( "ycbcrsampling", TIFFTAG_YCBCRSAMPLING),*/ {0, 0, 0, 0}, }; #undef TAG_UINT16 #undef TAG_UINT32 #undef TAG_INT #undef TAG_FLOAT #undef TAG_DOUBLE #undef TAG_STRING #undef TAG_COLORMAP static void tag_error(const char *errmsg, const char *tagname); static void tag_error(const char *errmsg, const char *tagname) { if (tagname) { sprintf(message, "%s \"%.40s%s\"", errmsg, tagname, (strlen(tagname)>40 ? "..." : "")); errmsg = message; } y_error(errmsg); } static void push_tag(object_t *this, long index) { tag_t *entry; /* find TIFF tag entry in table */ if (index == filename_index) { ypush_string(this->path); } else if (index == filemode_index) { ypush_string(this->mode); } else { for (entry = tag_table ; entry->name ; ++entry) { if (entry->index == index) { entry->push(this->handle, entry->tag); return; } } tag_error("non-existing TIFF tag", yfind_name(index)); } } static void on_extract(void *addr, long index) { push_tag((object_t *)addr, index); } static void on_eval(void *addr, int argc) { long index; char *name; if (argc != 1) { y_error("expecting exactly one scalar string argument"); } name = ygets_q(argc - 1); if (name) { index = yfind_global(name, 0); } else { index = -1L; } push_tag((object_t *)addr, index); } static void error_handler(const char* module, const char* fmt, va_list ap) { char *ptr = message; strcpy(ptr, "TIFF"); if (module) { strcat(ptr, " ["); strcat(ptr, module); strcat(ptr, "]: "); } else { strcat(ptr, ": "); } vsprintf(ptr + strlen(ptr), fmt, ap); } static void warning_handler(const char* module, const char* fmt, va_list ap) { if (debug) { fputs("TIFF WARNING", stderr); if (module) { fputs(" [", stderr); fputs(module, stderr); fputs("]: ", stderr); } else { fputs(": ", stderr); } vfprintf(stderr, fmt, ap); fputs("\n", stderr); fflush(stderr); } } /* on_free is automatically called by Yorick to cleanup object instance data when object is no longer referenced */ static void on_free(void *addr) { object_t *this = (object_t *)addr; if (this->handle) TIFFClose(this->handle); if (this->path) p_free(this->path); if (this->mode) p_free(this->mode); } /* on_print is used by Yorick's info command */ static void on_print(void *addr) { object_t *this = (object_t *)addr; y_print(tiff_class.type_name, 0); y_print(": path=\"", 0); y_print(this->path, 0); y_print("\"", 1); } static void bad_arg_list(const char *function) { sprintf(message, "bad argument list to %s function", function); y_error(message); } void Y_tiff_debug(int argc) { int prev = debug; if (argc!=1) bad_arg_list("tiff_debug"); debug = yarg_true(0); ypush_int(prev); } void Y_tiff_open(int argc) { object_t *this; char *filename, *filemode; /* Initialization. */ if (filename_index < 0L) { tag_t *m; TIFFSetErrorHandler(error_handler); TIFFSetWarningHandler(warning_handler); for (m=tag_table ; m->name ; ++m) { m->index = yget_global(m->name, 0); } filemode_index = yget_global("filemode", 0); filename_index = yget_global("filename", 0); } message[0] = 0; if (argc<1 || argc>2) bad_arg_list("tiff_open"); filename = ygets_q(argc - 1); filemode = (argc >= 2 ? ygets_q(argc - 2) : "r"); /* Push new opaque object on the stack (which will be automatically destroyed in case of error). */ this = (object_t *)ypush_obj(&tiff_class, sizeof(object_t)); this->path = y_expand_name(filename); this->mode = y_strcpy(filemode); this->handle = TIFFOpen(this->path, filemode); if (! this->handle) y_error(message); } void Y_tiff_read_directory(int argc) { int result; if (argc != 1) bad_arg_list("tiff_read_directory"); message[0] = 0; result = TIFFReadDirectory(get_object(argc - 1)->handle); if (! result && message[0]) y_error(message); ypush_int(result); } void Y_tiff_read_image(int argc) { long dims[Y_DIMSIZE]; uint16 photometric, bitsPerSample; uint32 width, height, depth; void *raster; object_t *this; TIFF *tiff; int stopOnError; if (argc < 1 || argc > 2) bad_arg_list("tiff_read_image"); this = get_object(argc - 1); tiff = this->handle; stopOnError = (argc >= 2 ? yarg_true(argc - 2) : 0); /* Get TIFF information to figure out the image type. */ message[0] = 0; /* reset error message buffer */ if (! TIFFGetFieldDefaulted(tiff, TIFFTAG_PHOTOMETRIC, &photometric)) missing_required_tag("photometric"); if (! TIFFGetFieldDefaulted(tiff, TIFFTAG_IMAGEDEPTH, &depth)) missing_required_tag("depth"); if (depth != 1) y_error("TIFF depth != 1 not yet supported"); /* Push image on top of the stack. */ switch (photometric) { case PHOTOMETRIC_MINISWHITE: /* Grey and binary images. */ case PHOTOMETRIC_MINISBLACK: /* Grey and binary images. */ load_pixels(tiff); break; case PHOTOMETRIC_RGB: /* RGB Full color images */ case PHOTOMETRIC_PALETTE: /* palette color images */ /* Read RGBA image (do not stop on error). */ if (! TIFFGetFieldDefaulted(tiff, TIFFTAG_BITSPERSAMPLE, &bitsPerSample)) missing_required_tag("bitsPerSample"); if (! TIFFGetFieldDefaulted(tiff, TIFFTAG_IMAGEWIDTH, &width)) missing_required_tag("imageWidth"); if (! TIFFGetFieldDefaulted(tiff, TIFFTAG_IMAGELENGTH, &height)) missing_required_tag("imageLength"); dims[0] = 3; dims[1] = 4; dims[2] = width; dims[3] = height; raster = ypush_c(dims); if (! TIFFReadRGBAImage(tiff, width, height, raster, stopOnError)) { if (! message[0]) strcpy(message, "TIFFReadRGBAImage failed to read complete image"); if (stopOnError) y_error(message); fprintf(stderr, "TIFF WARNING: %s\n", message); } break; default: y_error("unknown photometric in TIFF file"); } } void Y_tiff_read_pixels(int argc) { if (argc != 1) bad_arg_list("tiff_read_pixels"); load_pixels(get_object(argc - 1)->handle); } #if 0 void Y_tiff_SetField(int argc) { object_t *this; TIFF *tiff; if (argc < 1 && argc % 2 != 1) bad_arg_list("TIFFSetField"); this = get_object(argc - 1); tiff = this->handle; } #endif static void missing_required_tag(const char *tagname) { if (! message[0]) sprintf(message, "missing required TIFF tag \"%s\"", tagname); y_error(message); } #if 0 /* unused */ static void set_tag_read_only(object_t *this, tag_t *m, Symbol *s) { sprintf(message, "TIFF field \"%s\" is readonly", m->name); y_error(message); } #endif /* unused */ #if 0 /* unused */ static void cannot_set_member(tag_t *m); static void cannot_set_member(tag_t *m) { if (! message[0]) sprintf(message, "cannot set value of TIFF field \"%s\"", m->name); y_error(message); } #endif /* unused */ static int cmapbits(unsigned int n, const uint16 r[], const uint16 g[], const uint16 b[]) { unsigned int i; for (i=0 ; i= 256 || g[i] >= 256 || b[i] >= 256) return 16; } return 8; /* assume 8-bit colormap */ } static object_t *get_object(int iarg) { void *addr = yget_obj(iarg, &tiff_class); if (! addr) y_error("expecting TIFF object"); return (object_t *)addr; } /*---------------------------------------------------------------------------*/ static void *push_workspace(long nbytes) { long dims[2]; dims[0] = 1; dims[1] = nbytes; return ypush_c(dims); } /* * TIFFTAG_PHOTOMETRIC 262 // photometric interpretation * PHOTOMETRIC_MINISWHITE 0 // min value is white * PHOTOMETRIC_MINISBLACK 1 // min value is black * PHOTOMETRIC_RGB 2 // RGB color model * PHOTOMETRIC_PALETTE 3 // color map indexed * PHOTOMETRIC_MASK 4 // $holdout mask * PHOTOMETRIC_SEPARATED 5 // !color separations * PHOTOMETRIC_YCBCR 6 // !CCIR 601 * PHOTOMETRIC_CIELAB 8 // !1976 CIE L*a*b* * PHOTOMETRIC_ITULAB 10 // ITU L*a*b* * PHOTOMETRIC_LOGL 32844 // CIE Log2(L) * PHOTOMETRIC_LOGLUV 32845 // CIE Log2(L) (u',v') */ static void load_pixels(TIFF *tiff) { long dims[4]; void *buf, *raster; tstrip_t strip, numberOfStrips; uint16 orientation; uint16 sampleFormat; uint16 samplesPerPixel; uint16 bitsPerSample; uint16 planarConfig; uint16 photometric; uint32 width, height, depth; uint32 rowsPerStrip; uint32 stripSize; uint32 x, y, y1, y0, rowLength, rowSize; int single, complexData = 0, typeid; /* Get information about pixels organization. */ message[0] = 0; if (TIFFIsTiled(tiff)) y_error("reading of tiled TIFF images not yet implemented"); if (! TIFFGetFieldDefaulted(tiff, TIFFTAG_PHOTOMETRIC, &photometric)) missing_required_tag("photometric"); if (! TIFFGetFieldDefaulted(tiff, TIFFTAG_SAMPLESPERPIXEL, &samplesPerPixel)) missing_required_tag("samplesPerPixel"); if (! TIFFGetFieldDefaulted(tiff, TIFFTAG_BITSPERSAMPLE, &bitsPerSample)) missing_required_tag("bitsPerSample"); if (! TIFFGetFieldDefaulted(tiff, TIFFTAG_IMAGEWIDTH, &width)) missing_required_tag("imageWidth"); if (! TIFFGetFieldDefaulted(tiff, TIFFTAG_IMAGELENGTH, &height)) missing_required_tag("imageLength"); if (! TIFFGetFieldDefaulted(tiff, TIFFTAG_PLANARCONFIG, &planarConfig)) missing_required_tag("planarConfig"); if (! TIFFGetFieldDefaulted(tiff, TIFFTAG_IMAGEDEPTH, &depth)) missing_required_tag("imageDepth"); if (! TIFFGetFieldDefaulted(tiff, TIFFTAG_ROWSPERSTRIP, &rowsPerStrip)) missing_required_tag("rowsPerStrip"); if (! TIFFGetFieldDefaulted(tiff, TIFFTAG_SAMPLEFORMAT, &sampleFormat)) missing_required_tag("sampleFormat"); if (! TIFFGetFieldDefaulted(tiff, TIFFTAG_ORIENTATION, &orientation)) missing_required_tag("orientation"); /* Figure out which data type to use for the result. */ typeid = Y_VOID; /* deliberately initialize with a non-array type */ if (sampleFormat == SAMPLEFORMAT_UINT || sampleFormat == SAMPLEFORMAT_INT) { if (bitsPerSample == 1 || bitsPerSample == 2 || bitsPerSample == 4 || bitsPerSample == 8) { typeid = Y_CHAR; } else if (bitsPerSample == 8*sizeof(long)) { typeid = Y_LONG; } else if (bitsPerSample == 8*sizeof(short)) { typeid = Y_SHORT; } else if (bitsPerSample == 8*sizeof(int)) { typeid = Y_INT; } } else if (sampleFormat == SAMPLEFORMAT_IEEEFP) { if (bitsPerSample == 8*sizeof(float)) { typeid = Y_FLOAT; } else if (bitsPerSample == 8*sizeof(double)) { typeid = Y_DOUBLE; } #ifdef SAMPLEFORMAT_COMPLEXINT } else if (sampleFormat == SAMPLEFORMAT_COMPLEXINT) { complexData = 1; typeid = Y_COMPLEX; #endif #ifdef SAMPLEFORMAT_COMPLEXIEEEFP } else if (sampleFormat == SAMPLEFORMAT_COMPLEXIEEEFP) { complexData = 1; typeid = Y_COMPLEX; #endif } if (complexData #ifdef SAMPLEFORMAT_COMPLEXIEEEFP && (sampleFormat != SAMPLEFORMAT_COMPLEXIEEEFP || bitsPerSample == 8*sizeof(double)) #endif ) { y_error("unsupported TIFF complex sample"); } if (typeid == Y_VOID) { sprintf(message, "unsupported TIFF image data/format (BitsPerSample=%d, SampleFormat=%d)", (int)bitsPerSample, (int)sampleFormat); y_error(message); } /* We need to push 2 buffers onto the stack: a scanline/tile buffer and an image buffer. */ ypush_check(2); /* Allocate a strip buffer onto the stack. */ numberOfStrips = TIFFNumberOfStrips(tiff); stripSize = TIFFStripSize(tiff); buf = push_workspace(stripSize); /* Allocate raster image. */ single = (samplesPerPixel == 1); if (single) { dims[0] = 2; dims[1] = width; dims[2] = height; } else { dims[0] = 3; dims[1] = samplesPerPixel; dims[2] = width; dims[3] = height; } raster = ypush_a(typeid, dims); rowLength = samplesPerPixel*width; rowSize = ((bitsPerSample + 7)/8)*rowLength; if (planarConfig == PLANARCONFIG_CONTIG) { for (strip=0, y0=0 ; y0= numberOfStrips) y_error("bad number of strips"); TIFFReadEncodedStrip(tiff, strip, buf, stripSize); /* Convert every rows in the strip. */ if ((y1 = y0 + rowsPerStrip) > height) y1 = height; if (bitsPerSample%8 == 0) { /* Just copy values. */ memcpy(dst, src, rowSize*(y1 - y0)); } else if (bitsPerSample == 1) { unsigned int mask=1, value; uint32 n = (rowLength/8)*8; for (y=y0 ; y>1) & mask; dst[x+2] = (value>>2) & mask; dst[x+3] = (value>>3) & mask; dst[x+4] = (value>>4) & mask; dst[x+5] = (value>>5) & mask; dst[x+6] = (value>>6) & mask; dst[x+7] = (value>>6) & mask; } if (x < rowLength) { for (value = *src++ ; x < rowLength ; ++x, value >>= 1) { dst[x] = value & mask; } } } } else if (bitsPerSample == 2) { unsigned int mask=3, value; uint32 n = (rowLength/4)*4; for (y=y0 ; y>2) & mask; dst[x+2] = (value>>4) & mask; dst[x+3] = (value>>6) & mask; } if (x < rowLength) { for (value = *src++ ; x < rowLength ; ++x, value >>= 2) { dst[x] = value & mask; } } } } else if (bitsPerSample == 4) { unsigned int mask=15, value; uint32 n = (rowLength/2)*2; for (y=y0 ; y>4) & mask; } if (x < rowLength) { for (value = *src++ ; x < rowLength ; ++x, value >>= 4) { dst[x] = value & mask; } } } } } } else { /* Must be PLANARCONFIG_SEPARATE. */ y_error("unsupported TIFF planar configuration"); } #warning "take orientation into account..." #if 0 int transpose=0, revx=0, revy=0; switch (orientation) { case ORIENTATION_TOPLEFT: /* row 0 top, col 0 lhs */ revx = 0; revy = 1; transpose = 0; break; case ORIENTATION_TOPRIGHT: /* row 0 top, col 0 rhs */ revx = 1; revy = 1; transpose = 0; break; case ORIENTATION_BOTRIGHT: /* row 0 bottom, col 0 rhs */ revx = 1; revy = 0; transpose = 0; break; case ORIENTATION_BOTLEFT: /* row 0 bottom, col 0 lhs */ revx = 0; revy = 0; transpose = 0; break; case ORIENTATION_LEFTTOP: /* row 0 lhs, col 0 top */ revx = 0; revy = 0; transpose = 1; break; case ORIENTATION_RIGHTTOP: /* row 0 rhs, col 0 top */ revx = 0; revy = 0; transpose = 1; break; case ORIENTATION_RIGHTBOT: /* row 0 rhs, col 0 bottom */ revx = 0; revy = 0; transpose = 1; break; case ORIENTATION_LEFTBOT: /* row 0 lhs, col 0 bottom */ revx = 0; revy = 0; transpose = 1; break; } if (revx) { if (samplesPerPixel == 1) { for (y=0 ; y * * This software is governed by the CeCILL-C license under French law and * abiding by the rules of distribution of free software. You can use, modify * and/or redistribute the software under the terms of the CeCILL-C license as * circulated by CEA, CNRS and INRIA at the following URL * "http://www.cecill.info". * * As a counterpart to the access to the source code and rights to copy, modify * and redistribute granted by the license, users are provided only with a * limited warranty and the software's author, the holder of the economic * rights, and the successive licensors have only limited liability. * * In this respect, the user's attention is drawn to the risks associated with * loading, using, modifying and/or developing or reproducing the software by * the user in light of its specific status of free software, that may mean * that it is complicated to manipulate, and that also therefore means that it * is reserved for developers and experienced professionals having in-depth * computer knowledge. Users are therefore encouraged to load and test the * software's suitability as regards their requirements in conditions enabling * the security of their systems and/or data to be ensured and, more generally, * to use and operate it in the same conditions as regards security. * * The fact that you are presently reading this means that you have had * knowledge of the CeCILL-C license and that you accept its terms. * *----------------------------------------------------------------------------- */ /* load dynamic code */ if (is_func(plug_in)) plug_in, "yeti_tiff"; extern tiff_open; /* DOCUMENT tiff_open(filename) or tiff_open(filename, filemode) Open TIFF file FILENAME with mode FILEMODE (default "r", i.e. reading) and return an opaque handle that can be used to manage the TIFF file. The returned handle can also be used like a structure to query TIFF field or pseudo-tag value, e.g.: obj = tiff_open(filename); width = obj.imagewidth; // width of image height = obj.imagelength; // height of image If TIFF field or pseudo-tag MEMBER is undefined, then OBJ.MEMBER gives a nil value; but if the field or pseudo-tag MEMBER is unsupported or does not exist in TIFF specification, an error is raised. The identifiers of TIFF fields or pseudo-tags can be guessed from the tag macros in "tiff.h" header file from libtiff library (e.g. TIFFTAG_IMAGEDESCRIPTION becomes "imagedescription"), this file also gives some description about the tag values. Additional tags are: obj.filename // full path of file (scalar string) obj.filemode // file mode (scalar string) The value of the TIFF tags can also be obtained by indexing the TIFF object with the tag name, for instance: width = obj("imagewidth"); // width of image path = obj("filename"); // path of image file SEE ALSO: tiff_debug, tiff_read_directory, tiff_read_image. */ extern tiff_debug; /* DOCUMENT tiff_debug(value) Set debug flag for TIFF operations and returns the previous debug value. If VALUE is true, TIFF warning messages get printed to standard error; otherwise these messages are never printed. */ extern tiff_read_pixels; /* DOCUMENT tiff_read_pixels(obj) Returns pixel values of image in current "TIFF-directory" (see tiff_read_directory) of TIFF handle OBJ. The result is a WIDTH-by-HEIGHT array for a grayscale or a colormapped image and a SAMPLESPERPIXEL-by-WIDTH-by-HEIGHT array for an RGB image. SEE ALSO: tiff_debug, tiff_open, tiff_read_directory, tiff_read_image. */ extern tiff_read_image; /* DOCUMENT tiff_read_image(obj) or tiff_read_image(obj, stop_on_error) Returns image data in current "TIFF-directory" (see tiff_read_directory) of TIFF handle OBJ. The result is a WIDTH-by-HEIGHT array for a grayscale image and a 4-by-WIDTH-by-HEIGHT array for a color image (the first dimension correspond to: red, green, blue and alpha channels). If optional argument STOP_ON_ERROR is true (it is false by default), then any error while reading the image get raised; otherwise, a simple warning is printed out (note: in any case, an error is raised if there is not enough memory to store the image or if the image format is unsupported). SEE ALSO: tiff_debug, tiff_open, tiff_read_directory, tiff_read_pixels. */ extern tiff_read_directory; /* DOCUMENT tiff_read_directory(obj) Read the next directory in the specified file and make it the current directory. Applications only need to call tiff_read_directory to read multiple subfiles in a single TIFF file -- the first directory in a file is automatically read when tiff_open is called. If the next directory was successfully read, 1 is returned. Otherwise, if there are no more directories to be read, 0 is returned. SEE ALSO: tiff_debug, tiff_open, tiff_read_directory. */ func tiff_read(filename, index=, raw=) /* DOCUMENT tiff_read(filename) Return a grayscale or RGBA image from TIFF file FILENAME (see tiff_read_image) or an array of pixel values if keyword RAW is true (see tiff_read_pixels). Keyword INDEX may be used to specify the TIFF directory number of the image to read (default INDEX=1). SEE ALSO tiff_open, tiff_read_directory, tiff_read_image, tiff_read_pixels. */ { this = tiff_open(filename); if (! is_void(index)) { if (index < 1) error, "bad index"; while (--index) { if (! tiff_read_directory(this)) error, "index too large"; } } //photometric = this.photometric; return (raw ? tiff_read_pixels(this) : tiff_read_image(this, 0)); } func tiff_check(filename) /* DOCUMENT tiff_check(filename) Check whether or not FILENAME is an existing TIFF file, returned value is: 1 if FILENAME is a readable big endian TIFF file, 2 if FILENAME is a readable little endian TIFF file, 0 otherwise (FILENAME unreadable or not a TIFF file). Note that the check is intended to be fast, it is not completely reliable since only the 2 first bytes of FILENAME are used. SEE ALSO: tiff_open. */ { magic = array(char, 2); if ((f = open(filename, "rb", 1)) && _read(f, 0, magic) == sizeof(magic) && (m = magic(1)) == magic(2)) { if (m == 0x4d) return 1; /* big endian */ if (m == 0x49) return 2; /* little endian */ } return 0; } /* * Local Variables: * mode: Yorick * tab-width: 8 * c-basic-offset: 2 * indent-tabs-mode: nil * fill-column: 79 * coding: utf-8 * End: */ Yeti-6.4.0/tools/000077500000000000000000000000001253351442600136005ustar00rootroot00000000000000Yeti-6.4.0/tools/c_cleanup000077500000000000000000000041301253351442600154550ustar00rootroot00000000000000#! /bin/sh # # This script is a filter to cleanup C/C++ files. The following # operations are performed: # - Trailing spaces are removed. # - Unix-like end of line markers are used (LF). # - Code is re-indented by AStyle. # # Note that the filter is called for every 'git-add', 'git-status', etc. # So it is better if it is fast... debug=no bypass=no if test "$debug" = "yes"; then echo >&2 "command -----> $0 $@" fi # Simply execute "cat" to bypass the filter: test "$bypass" = "yes" && exec cat ## AStyle options: astyle=/usr/bin/astyle options="--options=none --mode=c --quiet --preserve-date \ --style=stroustrup --indent=spaces=2 \ --pad-header --indent-labels --lineend=linux \ --min-conditional-indent=0 --max-instatement-indent=80 \ --keep-one-line-blocks --keep-one-line-statements \ --align-pointer=name" # --suffix=none ## Option --align-reference=... is only available after version 2.02. version=$($astyle --version 2>&1 \ | sed 's/^[^0-9]*\([.0-9]*\).*/\1/;s/0*\([1-9][0-9]*\|0\)/\1/g') if test "$debug" = "yes"; then echo >&2 "version -----> $version" fi major=$(echo "$version" | cut -d. -f1) minor=$(echo "$version" | cut -d. -f2) if test "$major" -gt 2 -o \( $major -eq 2 -a $minor -ge 2 \); then options="$options --align-reference=name" fi if test "$debug" = "yes"; then echo >&2 "major -------> $major" echo >&2 "minor -------> $minor" fi # This script can be as a filter (with no arguments) or not. if [ $# -gt 0 ]; then $astyle $options "$@" code=$? else # For some reasons, AStyle cannot work as a filter with GIT. So I first # save the input in a temporary file and run AStyle on it to generate the # output. tmp=$(mktemp "/tmp/c_cleanup-XXXXXXXXXX") cat >"$tmp" $astyle $options <"$tmp" code=$? if test "$debug" = "yes"; then echo >&2 "input -------> $tmp" echo >&2 "status ------> $code" else rm -f "$tmp" fi fi return $code ## No needs to remove spaces, astyle takes care of that; otherwise add: ## | sed -e 's/[ ][ ]*$//' # Local Variables: # mode: sh # tab-width: 8 # indent-tabs-mode: nil # fill-column: 78 # coding: utf-8 # End: Yeti-6.4.0/tools/code_cleanup000077500000000000000000000011461253351442600161510ustar00rootroot00000000000000#! /bin/sh # # This script is a filter to cleanup text or code files. The following # operations are performed: # - Trailing spaces are removed. # - Unix-like end of line markers are used (LF). # - Leading tabulations are replaced by spaces. # debug=no bypass=no if test "$debug" = "yes"; then echo >&2 "command -----> $0 $@" fi # Simply execute "cat" to bypass the filter: test "$bypass" = "yes" && exec cat expand --initial --tabs=8 \ | sed -e 's/\x0d$//;s/[ ]*$//;s/[ ]*\x0d/\x0a/g' # Local Variables: # mode: sh # tab-width: 8 # indent-tabs-mode: nil # fill-column: 78 # coding: utf-8 # End: Yeti-6.4.0/tools/ymk_cleanup000077500000000000000000000007251253351442600160410ustar00rootroot00000000000000#! /bin/sed -f # # This SED script is a filter to Yorick Makefile. The following # operations are performed: # - Trailing spaces are removed. # - Tabulations are left unchanged (this is intended for Makefiles). # - Unix-like end of line markers are used (LF). # - Macros like Y_HOME which are overwritten by 'yorick -batch make.i' are # cleared. # s/\x0d$// s/ *$// s/ *\x0d/\x0a/g s/^\( *Y_\(MAKEDIR\|EXE\(\|_PKGS\|_HOME\|_SITE\)\|HOME_PKG\) *=\).*/\1/ #