zpaq-1.10.orig/LICENSE

LICENSE description added by Matt Mahoney on May 23, 2010.

All code with the exception of the SHA1 class is Copyright (C), Ocarina Networks Inc, as dated in the source code, and is licensed under the GNU General Public License, version 3. The SHA1 class is derived from code in RFC-3174, which is Copyright (C), 2001, The Internet Society. Both licenses are included below.

--------------------------------------------------------------------------

License for code derived from RFC-3174 (class SHA1).
Source: http://datatracker.ietf.org/doc/rfc3174/

Copyright (C) The Internet Society (2001). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

--------------------------------------------------------------------------

License for all code not derived from RFC-3174.
Source: http://www.gnu.org/licenses/gpl.txt

GNU GENERAL PUBLIC LICENSE
Version 3, 29 June 2007

Copyright (C) 2007 Free Software Foundation, Inc.
Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

Preamble

The GNU General Public License is a free, copyleft license for software and other kinds of works.

The licenses for most software and other practical works are designed to take away your freedom to share and change the works. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change all versions of a program--to make sure it remains free software for all its users. We, the Free Software Foundation, use the GNU General Public License for most of our software; it applies also to any other work released this way by its authors. You can apply it to your programs, too.

When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for them if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs, and that you know you can do these things.

To protect your rights, we need to prevent others from denying you these rights or asking you to surrender the rights.
Therefore, you have certain responsibilities if you distribute copies of the software, or if you modify it: responsibilities to respect the freedom of others.

For example, if you distribute copies of such a program, whether gratis or for a fee, you must pass on to the recipients the same freedoms that you received. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights.

Developers that use the GNU GPL protect your rights with two steps: (1) assert copyright on the software, and (2) offer you this License giving you legal permission to copy, distribute and/or modify it.

For the developers' and authors' protection, the GPL clearly explains that there is no warranty for this free software. For both users' and authors' sake, the GPL requires that modified versions be marked as changed, so that their problems will not be attributed erroneously to authors of previous versions.

Some devices are designed to deny users access to install or run modified versions of the software inside them, although the manufacturer can do so. This is fundamentally incompatible with the aim of protecting users' freedom to change the software. The systematic pattern of such abuse occurs in the area of products for individuals to use, which is precisely where it is most unacceptable. Therefore, we have designed this version of the GPL to prohibit the practice for those products. If such problems arise substantially in other domains, we stand ready to extend this provision to those domains in future versions of the GPL, as needed to protect the freedom of users.

Finally, every program is threatened constantly by software patents. States should not allow patents to restrict development and use of software on general-purpose computers, but in those that do, we wish to avoid the special danger that patents applied to a free program could make it effectively proprietary.
To prevent this, the GPL assures that patents cannot be used to render the program non-free.

The precise terms and conditions for copying, distribution and modification follow.

TERMS AND CONDITIONS

0. Definitions.

"This License" refers to version 3 of the GNU General Public License.

"Copyright" also means copyright-like laws that apply to other kinds of works, such as semiconductor masks.

"The Program" refers to any copyrightable work licensed under this License. Each licensee is addressed as "you". "Licensees" and "recipients" may be individuals or organizations.

To "modify" a work means to copy from or adapt all or part of the work in a fashion requiring copyright permission, other than the making of an exact copy. The resulting work is called a "modified version" of the earlier work or a work "based on" the earlier work.

A "covered work" means either the unmodified Program or a work based on the Program.

To "propagate" a work means to do anything with it that, without permission, would make you directly or secondarily liable for infringement under applicable copyright law, except executing it on a computer or modifying a private copy. Propagation includes copying, distribution (with or without modification), making available to the public, and in some countries other activities as well.

To "convey" a work means any kind of propagation that enables other parties to make or receive copies. Mere interaction with a user through a computer network, with no transfer of a copy, is not conveying.

An interactive user interface displays "Appropriate Legal Notices" to the extent that it includes a convenient and prominently visible feature that (1) displays an appropriate copyright notice, and (2) tells the user that there is no warranty for the work (except to the extent that warranties are provided), that licensees may convey the work under this License, and how to view a copy of this License.
If the interface presents a list of user commands or options, such as a menu, a prominent item in the list meets this criterion.

1. Source Code.

The "source code" for a work means the preferred form of the work for making modifications to it. "Object code" means any non-source form of a work.

A "Standard Interface" means an interface that either is an official standard defined by a recognized standards body, or, in the case of interfaces specified for a particular programming language, one that is widely used among developers working in that language.

The "System Libraries" of an executable work include anything, other than the work as a whole, that (a) is included in the normal form of packaging a Major Component, but which is not part of that Major Component, and (b) serves only to enable use of the work with that Major Component, or to implement a Standard Interface for which an implementation is available to the public in source code form. A "Major Component", in this context, means a major essential component (kernel, window system, and so on) of the specific operating system (if any) on which the executable work runs, or a compiler used to produce the work, or an object code interpreter used to run it.

The "Corresponding Source" for a work in object code form means all the source code needed to generate, install, and (for an executable work) run the object code and to modify the work, including scripts to control those activities. However, it does not include the work's System Libraries, or general-purpose tools or generally available free programs which are used unmodified in performing those activities but which are not part of the work.
For example, Corresponding Source includes interface definition files associated with source files for the work, and the source code for shared libraries and dynamically linked subprograms that the work is specifically designed to require, such as by intimate data communication or control flow between those subprograms and other parts of the work.

The Corresponding Source need not include anything that users can regenerate automatically from other parts of the Corresponding Source.

The Corresponding Source for a work in source code form is that same work.

2. Basic Permissions.

All rights granted under this License are granted for the term of copyright on the Program, and are irrevocable provided the stated conditions are met. This License explicitly affirms your unlimited permission to run the unmodified Program. The output from running a covered work is covered by this License only if the output, given its content, constitutes a covered work. This License acknowledges your rights of fair use or other equivalent, as provided by copyright law.

You may make, run and propagate covered works that you do not convey, without conditions so long as your license otherwise remains in force. You may convey covered works to others for the sole purpose of having them make modifications exclusively for you, or provide you with facilities for running those works, provided that you comply with the terms of this License in conveying all material for which you do not control copyright. Those thus making or running the covered works for you must do so exclusively on your behalf, under your direction and control, on terms that prohibit them from making any copies of your copyrighted material outside their relationship with you.

Conveying under any other circumstances is permitted solely under the conditions stated below. Sublicensing is not allowed; section 10 makes it unnecessary.

3. Protecting Users' Legal Rights From Anti-Circumvention Law.

No covered work shall be deemed part of an effective technological measure under any applicable law fulfilling obligations under article 11 of the WIPO copyright treaty adopted on 20 December 1996, or similar laws prohibiting or restricting circumvention of such measures.

When you convey a covered work, you waive any legal power to forbid circumvention of technological measures to the extent such circumvention is effected by exercising rights under this License with respect to the covered work, and you disclaim any intention to limit operation or modification of the work as a means of enforcing, against the work's users, your or third parties' legal rights to forbid circumvention of technological measures.

4. Conveying Verbatim Copies.

You may convey verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice; keep intact all notices stating that this License and any non-permissive terms added in accord with section 7 apply to the code; keep intact all notices of the absence of any warranty; and give all recipients a copy of this License along with the Program.

You may charge any price or no price for each copy that you convey, and you may offer support or warranty protection for a fee.

5. Conveying Modified Source Versions.

You may convey a work based on the Program, or the modifications to produce it from the Program, in the form of source code under the terms of section 4, provided that you also meet all of these conditions:

    a) The work must carry prominent notices stating that you modified it, and giving a relevant date.

    b) The work must carry prominent notices stating that it is released under this License and any conditions added under section 7. This requirement modifies the requirement in section 4 to "keep intact all notices".

    c) You must license the entire work, as a whole, under this License to anyone who comes into possession of a copy. This License will therefore apply, along with any applicable section 7 additional terms, to the whole of the work, and all its parts, regardless of how they are packaged. This License gives no permission to license the work in any other way, but it does not invalidate such permission if you have separately received it.

    d) If the work has interactive user interfaces, each must display Appropriate Legal Notices; however, if the Program has interactive interfaces that do not display Appropriate Legal Notices, your work need not make them do so.

A compilation of a covered work with other separate and independent works, which are not by their nature extensions of the covered work, and which are not combined with it such as to form a larger program, in or on a volume of a storage or distribution medium, is called an "aggregate" if the compilation and its resulting copyright are not used to limit the access or legal rights of the compilation's users beyond what the individual works permit. Inclusion of a covered work in an aggregate does not cause this License to apply to the other parts of the aggregate.

6. Conveying Non-Source Forms.

You may convey a covered work in object code form under the terms of sections 4 and 5, provided that you also convey the machine-readable Corresponding Source under the terms of this License, in one of these ways:

    a) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by the Corresponding Source fixed on a durable physical medium customarily used for software interchange.

    b) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by a written offer, valid for at least three years and valid for as long as you offer spare parts or customer support for that product model, to give anyone who possesses the object code either (1) a copy of the Corresponding Source for all the software in the product that is covered by this License, on a durable physical medium customarily used for software interchange, for a price no more than your reasonable cost of physically performing this conveying of source, or (2) access to copy the Corresponding Source from a network server at no charge.

    c) Convey individual copies of the object code with a copy of the written offer to provide the Corresponding Source. This alternative is allowed only occasionally and noncommercially, and only if you received the object code with such an offer, in accord with subsection 6b.

    d) Convey the object code by offering access from a designated place (gratis or for a charge), and offer equivalent access to the Corresponding Source in the same way through the same place at no further charge. You need not require recipients to copy the Corresponding Source along with the object code. If the place to copy the object code is a network server, the Corresponding Source may be on a different server (operated by you or a third party) that supports equivalent copying facilities, provided you maintain clear directions next to the object code saying where to find the Corresponding Source. Regardless of what server hosts the Corresponding Source, you remain obligated to ensure that it is available for as long as needed to satisfy these requirements.

    e) Convey the object code using peer-to-peer transmission, provided you inform other peers where the object code and Corresponding Source of the work are being offered to the general public at no charge under subsection 6d.

A separable portion of the object code, whose source code is excluded from the Corresponding Source as a System Library, need not be included in conveying the object code work.

A "User Product" is either (1) a "consumer product", which means any tangible personal property which is normally used for personal, family, or household purposes, or (2) anything designed or sold for incorporation into a dwelling. In determining whether a product is a consumer product, doubtful cases shall be resolved in favor of coverage. For a particular product received by a particular user, "normally used" refers to a typical or common use of that class of product, regardless of the status of the particular user or of the way in which the particular user actually uses, or expects or is expected to use, the product. A product is a consumer product regardless of whether the product has substantial commercial, industrial or non-consumer uses, unless such uses represent the only significant mode of use of the product.

"Installation Information" for a User Product means any methods, procedures, authorization keys, or other information required to install and execute modified versions of a covered work in that User Product from a modified version of its Corresponding Source. The information must suffice to ensure that the continued functioning of the modified object code is in no case prevented or interfered with solely because modification has been made.

If you convey an object code work under this section in, or with, or specifically for use in, a User Product, and the conveying occurs as part of a transaction in which the right of possession and use of the User Product is transferred to the recipient in perpetuity or for a fixed term (regardless of how the transaction is characterized), the Corresponding Source conveyed under this section must be accompanied by the Installation Information.
But this requirement does not apply if neither you nor any third party retains the ability to install modified object code on the User Product (for example, the work has been installed in ROM).

The requirement to provide Installation Information does not include a requirement to continue to provide support service, warranty, or updates for a work that has been modified or installed by the recipient, or for the User Product in which it has been modified or installed. Access to a network may be denied when the modification itself materially and adversely affects the operation of the network or violates the rules and protocols for communication across the network.

Corresponding Source conveyed, and Installation Information provided, in accord with this section must be in a format that is publicly documented (and with an implementation available to the public in source code form), and must require no special password or key for unpacking, reading or copying.

7. Additional Terms.

"Additional permissions" are terms that supplement the terms of this License by making exceptions from one or more of its conditions. Additional permissions that are applicable to the entire Program shall be treated as though they were included in this License, to the extent that they are valid under applicable law. If additional permissions apply only to part of the Program, that part may be used separately under those permissions, but the entire Program remains governed by this License without regard to the additional permissions.

When you convey a copy of a covered work, you may at your option remove any additional permissions from that copy, or from any part of it. (Additional permissions may be written to require their own removal in certain cases when you modify the work.) You may place additional permissions on material, added by you to a covered work, for which you have or can give appropriate copyright permission.

Notwithstanding any other provision of this License, for material you add to a covered work, you may (if authorized by the copyright holders of that material) supplement the terms of this License with terms:

    a) Disclaiming warranty or limiting liability differently from the terms of sections 15 and 16 of this License; or

    b) Requiring preservation of specified reasonable legal notices or author attributions in that material or in the Appropriate Legal Notices displayed by works containing it; or

    c) Prohibiting misrepresentation of the origin of that material, or requiring that modified versions of such material be marked in reasonable ways as different from the original version; or

    d) Limiting the use for publicity purposes of names of licensors or authors of the material; or

    e) Declining to grant rights under trademark law for use of some trade names, trademarks, or service marks; or

    f) Requiring indemnification of licensors and authors of that material by anyone who conveys the material (or modified versions of it) with contractual assumptions of liability to the recipient, for any liability that these contractual assumptions directly impose on those licensors and authors.

All other non-permissive additional terms are considered "further restrictions" within the meaning of section 10. If the Program as you received it, or any part of it, contains a notice stating that it is governed by this License along with a term that is a further restriction, you may remove that term. If a license document contains a further restriction but permits relicensing or conveying under this License, you may add to a covered work material governed by the terms of that license document, provided that the further restriction does not survive such relicensing or conveying.

If you add terms to a covered work in accord with this section, you must place, in the relevant source files, a statement of the additional terms that apply to those files, or a notice indicating where to find the applicable terms.

Additional terms, permissive or non-permissive, may be stated in the form of a separately written license, or stated as exceptions; the above requirements apply either way.

8. Termination.

You may not propagate or modify a covered work except as expressly provided under this License. Any attempt otherwise to propagate or modify it is void, and will automatically terminate your rights under this License (including any patent licenses granted under the third paragraph of section 11).

However, if you cease all violation of this License, then your license from a particular copyright holder is reinstated (a) provisionally, unless and until the copyright holder explicitly and finally terminates your license, and (b) permanently, if the copyright holder fails to notify you of the violation by some reasonable means prior to 60 days after the cessation.

Moreover, your license from a particular copyright holder is reinstated permanently if the copyright holder notifies you of the violation by some reasonable means, this is the first time you have received notice of violation of this License (for any work) from that copyright holder, and you cure the violation prior to 30 days after your receipt of the notice.

Termination of your rights under this section does not terminate the licenses of parties who have received copies or rights from you under this License. If your rights have been terminated and not permanently reinstated, you do not qualify to receive new licenses for the same material under section 10.

9. Acceptance Not Required for Having Copies.

You are not required to accept this License in order to receive or run a copy of the Program.
Ancillary propagation of a covered work occurring solely as a consequence of using peer-to-peer transmission to receive a copy likewise does not require acceptance. However, nothing other than this License grants you permission to propagate or modify any covered work. These actions infringe copyright if you do not accept this License. Therefore, by modifying or propagating a covered work, you indicate your acceptance of this License to do so.

10. Automatic Licensing of Downstream Recipients.

Each time you convey a covered work, the recipient automatically receives a license from the original licensors, to run, modify and propagate that work, subject to this License. You are not responsible for enforcing compliance by third parties with this License.

An "entity transaction" is a transaction transferring control of an organization, or substantially all assets of one, or subdividing an organization, or merging organizations. If propagation of a covered work results from an entity transaction, each party to that transaction who receives a copy of the work also receives whatever licenses to the work the party's predecessor in interest had or could give under the previous paragraph, plus a right to possession of the Corresponding Source of the work from the predecessor in interest, if the predecessor has it or can get it with reasonable efforts.

You may not impose any further restrictions on the exercise of the rights granted or affirmed under this License. For example, you may not impose a license fee, royalty, or other charge for exercise of rights granted under this License, and you may not initiate litigation (including a cross-claim or counterclaim in a lawsuit) alleging that any patent claim is infringed by making, using, selling, offering for sale, or importing the Program or any portion of it.

11. Patents.

A "contributor" is a copyright holder who authorizes use under this License of the Program or a work on which the Program is based.
The work thus licensed is called the contributor's "contributor version".

A contributor's "essential patent claims" are all patent claims owned or controlled by the contributor, whether already acquired or hereafter acquired, that would be infringed by some manner, permitted by this License, of making, using, or selling its contributor version, but do not include claims that would be infringed only as a consequence of further modification of the contributor version. For purposes of this definition, "control" includes the right to grant patent sublicenses in a manner consistent with the requirements of this License.

Each contributor grants you a non-exclusive, worldwide, royalty-free patent license under the contributor's essential patent claims, to make, use, sell, offer for sale, import and otherwise run, modify and propagate the contents of its contributor version.

In the following three paragraphs, a "patent license" is any express agreement or commitment, however denominated, not to enforce a patent (such as an express permission to practice a patent or covenant not to sue for patent infringement). To "grant" such a patent license to a party means to make such an agreement or commitment not to enforce a patent against the party.

If you convey a covered work, knowingly relying on a patent license, and the Corresponding Source of the work is not available for anyone to copy, free of charge and under the terms of this License, through a publicly available network server or other readily accessible means, then you must either (1) cause the Corresponding Source to be so available, or (2) arrange to deprive yourself of the benefit of the patent license for this particular work, or (3) arrange, in a manner consistent with the requirements of this License, to extend the patent license to downstream recipients. "Knowingly relying" means you have actual knowledge that, but for the patent license, your conveying the covered work in a country, or your recipient's use of the covered work in a country, would infringe one or more identifiable patents in that country that you have reason to believe are valid.

If, pursuant to or in connection with a single transaction or arrangement, you convey, or propagate by procuring conveyance of, a covered work, and grant a patent license to some of the parties receiving the covered work authorizing them to use, propagate, modify or convey a specific copy of the covered work, then the patent license you grant is automatically extended to all recipients of the covered work and works based on it.

A patent license is "discriminatory" if it does not include within the scope of its coverage, prohibits the exercise of, or is conditioned on the non-exercise of one or more of the rights that are specifically granted under this License. You may not convey a covered work if you are a party to an arrangement with a third party that is in the business of distributing software, under which you make payment to the third party based on the extent of your activity of conveying the work, and under which the third party grants, to any of the parties who would receive the covered work from you, a discriminatory patent license (a) in connection with copies of the covered work conveyed by you (or copies made from those copies), or (b) primarily for and in connection with specific products or compilations that contain the covered work, unless you entered into that arrangement, or that patent license was granted, prior to 28 March 2007.

Nothing in this License shall be construed as excluding or limiting any implied license or other defenses to infringement that may otherwise be available to you under applicable patent law.

12. No Surrender of Others' Freedom.

If conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot convey a covered work so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not convey it at all. For example, if you agree to terms that obligate you to collect a royalty for further conveying from those to whom you convey the Program, the only way you could satisfy both those terms and this License would be to refrain entirely from conveying the Program.

13. Use with the GNU Affero General Public License.

Notwithstanding any other provision of this License, you have permission to link or combine any covered work with a work licensed under version 3 of the GNU Affero General Public License into a single combined work, and to convey the resulting work. The terms of this License will continue to apply to the part which is the covered work, but the special requirements of the GNU Affero General Public License, section 13, concerning interaction through a network will apply to the combination as such.

14. Revised Versions of this License.

The Free Software Foundation may publish revised and/or new versions of the GNU General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns.

Each version is given a distinguishing version number. If the Program specifies that a certain numbered version of the GNU General Public License "or any later version" applies to it, you have the option of following the terms and conditions either of that numbered version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of the GNU General Public License, you may choose any version ever published by the Free Software Foundation.

If the Program specifies that a proxy can decide which future versions of the GNU General Public License can be used, that proxy's public statement of acceptance of a version permanently authorizes you to choose that version for the Program.

Later license versions may give you additional or different permissions. However, no additional obligations are imposed on any author or copyright holder as a result of your choosing to follow a later version.

15. Disclaimer of Warranty.

THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.

16. Limitation of Liability.

IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

17. Interpretation of Sections 15 and 16.

If the disclaimer of warranty and limitation of liability provided above cannot be given local legal effect according to their terms, reviewing courts shall apply local law that most closely approximates an absolute waiver of all civil liability in connection with the Program, unless a warranty or assumption of liability accompanies a copy of the Program in return for a fee. END OF TERMS AND CONDITIONS zpaq-1.10.orig/zpaqmake.bat0000644000000000000500000000156311267740711014170 0ustar rootsrc: This script is called by ZPAQ to compile optimized versions. It is : expected to compile %1.cpp to %1.exe with -DOPT -DNDEBUG, : #include and link it to zpaq.cpp or zpaq.o. These files can : be anywhere, but this script is expected to find them. : The following assumes zpaq.cpp and zpaq.h are in C:\bin\src : and temporary files go in %TEMP%. Adjust accordingly. @echo off : MinGW 4.4.0 (recommended) g++ -O2 -s -fomit-frame-pointer -march=pentiumpro -DNDEBUG -DOPT %1.cpp -IC:\bin\src C:\bin\src\zpaq.cpp -o %1.exe : Borland compiler :cd %temp% :bcc32 -O -6 -DNDEBUG -DOPT -IC:\bin\src %1.cpp C:\bin\src\zpaq.cpp -e%1.exe :del zpaq_*.obj zpaq_*.map zpaq_*.tds : Digital Mars :cd %temp% :\dm\bin\dmc -o -6 -IC:\bin\src %1.cpp -DNDEBUG -DOPT C:\bin\src\zpaq.cpp :del zpaq_*.obj zpaq_*.map : Optionally compress the output .exe upx -qqq %1.exe zpaq-1.10.orig/readme.txt0000644000000000000500000002421711316271520013656 0ustar rootsrcREADME for ZPAQ v1.10 Matt Mahoney - Dec. 28, 2009, matmahoney (at) yahoo (dot) com. ZPAQ is a configurable file compressor and archiver. Its goal is a high compression ratio in an open format without loss of compatibility between versions as new compression algorithms are discovered. ZPAQ includes tools to help develop and test new algorithms. All software is (C) 2009, Ocarina Networks Inc. and written by Matt Mahoney. It is open source licensed under GPL v3. 
http://www.gnu.org/copyleft/gpl.html

Contents:

  zpaq.exe - The ZPAQ compressor, decompressor, and environment for
    developing new compression algorithms in the ZPAQ format.
    Compiled for 32 bit Windows.
  zpaq.cpp, zpaq.h - Source code (GPL) for zpaq.exe. See comments for usage.
  zpaqmake.bat - Script used by ZPAQ to build optimized code.
  min.cfg - ZPAQ config file for fast compression.
  mid.cfg - Config file for average compression (default).
  max.cfg - Config file for good compression.
  lzppre.exe - LZP preprocessor, required with min.cfg.
  lzppre.cpp - Source code for lzppre.exe.
  readme.txt - This file.

Brief usage summary:

  To compress:      zpaq ocmax.cfg,2 archive files...
  To decompress:    zpaq ox archive files...
  To list contents: zpaq l archive
  For help:         zpaq

For compression, "c" means compress. To append to an existing archive,
use "a" instead, as "zpaq oamax.cfg,2 archive files...".

"o" means optimize (run faster). You need a C++ compiler installed
to use this option. If not, drop the "o". You can still use zpaq but
it will take about twice as long to run.

"max.cfg" selects maximum (but slow) compression. min.cfg selects
minimum but fast compression. mid.cfg is in the middle. Decompression
speed will be the same as compression.

",2" means use 4 times more memory. Each increment doubles usage.
You need the same memory to decompress.

"ox" means extract fast. You can extract more slowly with "x" if you
don't have C++ installed. Output files are renamed in the same order
they are stored and listed. If you don't rename the output files, then
the files will be extracted to the current directory with the same names
they had when stored.

See zpaq.cpp for complete descriptions, many other options, and how to
write config files for custom compression algorithms, and installation
instructions.

History
-------

Versions prior to 1.00 are not compatible with the ZPAQ standard
and are obsolete. All versions 1.00 and higher are forward and
backward compatible.

v0.01 - Feb. 15, 2009.
Original release. Conforms to v0.29 of spec. except does not support
postprocessing.

v0.02 - Feb. 18, 2009. Adds R=X, X=R, and LJ instructions and R[256]
register. Removes .= instruction. Spaces are required before ZPAQL
operands. Adds end of segment signal to decoder. Adds "x" transform
(E8E9). PASS transform is changed to "0". Adds a header byte to
describe HCOMP language. Not compatible with v0.01. Conforms to v0.32
of spec. Current max.cfg does poorly with maximumcompression.com.
Expect more changes.

v0.03 - Feb. 19, 2009. Fixed MIX, MIX2, and IMIX spec. to reduce
overflow, which resulted in poor compression of large files. Modified
stretch function for better compression.

  Block 1: requires 314.476 MB memory (with POST X to turn on E8E9)
  maxcomp\a10.jpg        842468 -> 829159
  maxcomp\acrord32.exe  3870784 -> 1154882
  maxcomp\english.dic   4067439 -> 476099
  maxcomp\FlashMX.pdf   4526946 -> 3649140
  maxcomp\fp.log       20617071 -> 432826
  maxcomp\mso97.dll     3782416 -> 1545417
  maxcomp\ohs.doc       4168192 -> 757538
  maxcomp\rafale.bmp    4149414 -> 763314
  maxcomp\vcfiu.hlp     4121418 -> 499321
  maxcomp\world95.txt   2988578 -> 441130
  53,134,726 -> 10,548,826

v0.04 - Feb. 21, 2009. Fixed train() spec. to fix poor compression
with SSE and possibly other components. Modified squash() for better
compression. New max.cfg.

v0.05 - Feb. 26, 2009. Changed representation of squashed probabilities
to 15 bits (0..32767) and stretched to 6 bit scale in (-2048..2047),
and mixer weights to 20 bit signed numbers. Mixers are now guaranteed
not to overflow. The higher resolution improves compression on
highly redundant files. MIX2 now has weights constrained to add to 1
which also improves compression.

v0.06 - Feb. 27, 2009. Optionally appends a SHA1 hash of the input file
for each segment, which is checked by the decompressor. Added "b" command
to append without a checksum. Replaced IMIX2 with ISSE. Compression
prints memory usage by component.

v0.07 - Feb. 28, 2009.
Modified ISSE to use decreasing learning rate on the fixed size inversely proportional to a count. ISSE drops the c and rate parameters. SSE drops the mask parameter. Bit history next-state tables are updated by removing some of the n0=0 or n1=0 states and adding other states. v0.08 - Mar. 8, 2009. Added LZP preprocessor. Improved memory utilization reporting. Minor speed improvements. Added mid.cfg. Changed MATCH so that the buffer and hash table sizes are specified separately. Clarified role of comment field. Removed zpaqd.exe. v0.09 - Mar. 9, 2009. Removed counters from ISSE and ICM and replaced bit history map with initial estimates based on n1/(n0+n1) to improve speed. Fixed a bug where x clobbers files when it says it isn't. v1.00 - Mar. 12, 2009. First level 1 candidate. Simplified the bit history tables and replaced with code to generate them in both the documentation and code. First release of the reference standard unzpaq1 v1.00. Improved compression on some files. v1.01 - Apr. 27, 2009. Updated unzpaq to fix VS2005 compiler issues. v1.02 - June 14, 2009. Updated zpaq and unzpaq to close files immediately after extraction instead of when program exits. Fixed g++ 4.4 compiler warnings. v1.03 - Sept. 8, 2009. unzpaq and zpaq: added support for appending unnamed segments to the previous file. In unzpaq 1.02 and earlier you would need to extract each segment to a different file and concatenate them manually. Also, unzpaq will refuse to extract filenames stored with an absolute path, drive letter, or that have upward links "../" or "..\" or that have control characters (ASCII 0-31) in the file name unless a filename is given on the command line (in which case any name is allowed). Quits on the first error rather than skipping files. zpaq only: made mid.cfg the default configuration. Also added the k command to create segmented files. 
When the offset is not 0 the segment is stored with no name to signal the decompressor to append to the previous file (which may be in a different ZPAQ block). Added the r command to store full paths. 1.02 and earlier always did this. By default, 1.03 stores only the file name. Updated the s command to output the full header as a C array. Sept. 14, 2009. Added zpaqsfx 1.03. v1.04 - Sept. 18, 2009. zpaq will extract from self extracting archives. Added progress meter. zpaqsfx.exe is slightly smaller. Fixed zpaqsfx.cpp compiler issue (replaced "and" with "&&" in main()). v1.05 - Sept. 28, 2009. Removed built in x (E8E9) and p (LZP) preprocessors and made these external programs (included). Config files now specify an external preprocessor command line and ZPAQL code to invert the transform. The inversion is verified before compression. Added structured programming (if/ifnot-else-endif, do-while/until/forever) to ZPAQL. Reorganized the less commonly used commands. New commands to extract from single blocks, extract with paths (default is now to current directory), extract unnamed blocks as separate files, compress without filenames or with full paths, or without comments, debug both the HCOMP and new PCOMP sections of config files, and display trace in either decimal or hexadecimal. Fixed detection of corrupted input in decoder. unzpaq.exe not included in distribution because zpaq.exe has all the same functions. v1.06 - Sept. 29, 2009. Updated specification zpaq1.pdf to include a recommendation of adding a 13 byte locater tag to mark the start of a ZPAQ archive embedded in other data. Updated zpaq.cpp, unzpaq.cpp, and zpaqsfx.cpp to find this tag. Also added "ta" to append this tag. Some minor bug fixes and porting issues fixed. Changed unzpaq to extract to current directory by default. v1.07 - Oct. 2, 2009. zpaq config files now accept arguments. Fixed a bug in min.cfg. Cleaned up "tr" command display. min.cfg, mid.cfg, max.cfg accept an argument to change memory. 
min.cfg takes a second argument to change LZP minimum match. pcomp external preprocessor command must end with ; v1.08 - Oct. 14, 2009. Added optimization, which makes zpaq about twice as fast if an external C++ compiler is available. The "o" option compiles the model and creates a temporary program optimized for the current input, and runs it. Also changed meaning of "nx" to mean decompress all output to one file. Fixed ZPAQL shift instructions to be consistent with spec on non x86 machines. v1.09 - Oct 21, 2009. Port to Linux. Preprocessor temporary files now go in %TEMP% or $TEMP. TMPDIR not used. Optimized decompressor now verifies header contents matches code. File size display fixed for sizes over 2 GB. Added q option (quiet) to suppress output. Compression shows preprocessed size if different. v1.10 - Dec. 28, 2009. zpaq.cpp bug fix for g++ 4.4.1/Linux. Thanks to Tom Hargreaves for a patch. zpaq.h is still v1.09. zpaq-1.10.orig/mid.cfg0000644000000000000500000000142211263657240013113 0ustar rootsrc(zpaq 1.07+ config file tuned for average compression. Uses 111 * 2^$1 MB memory, where $1 is the first argument. (C) 2009, Ocarina Networks Inc. Written by Matt Mahoney. This software is free under GPL v3. http://www.gnu.org/copyleft/gpl.html ) comp 3 3 0 0 8 (hh hm ph pm n) 0 icm 5 (order 0...5 chain) 1 isse 13 0 2 isse $1+17 1 3 isse $1+18 2 4 isse $1+18 3 5 isse $1+19 4 6 match $1+22 $1+24 (order 7) 7 mix 16 0 7 24 255 (order 1) hcomp c++ *c=a b=c a=0 (save in rotating buffer M) d= 1 hash *d=a (orders 1...5 for isse) b-- d++ hash *d=a b-- d++ hash *d=a b-- d++ hash *d=a b-- d++ hash *d=a b-- d++ hash b-- hash *d=a (order 7 for match) d++ a=*c a<<= 8 *d=a (order 1 for mix) halt post 0 end zpaq-1.10.orig/zpaq.h0000644000000000000500000002346311267725351013021 0ustar rootsrc/* header file for zpaq v1.09 archiver and file compressor. (C) 2009, Ocarina Networks, Inc. Written by Matt Mahoney, matmahoney@yahoo.com, Oct. 21, 2009. 
LICENSE

    This program is free software; you can redistribute it and/or
    modify it under the terms of the GNU General Public License as
    published by the Free Software Foundation; either version 3 of
    the License, or (at your option) any later version.

    This program is distributed in the hope that it will be useful, but
    WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
    General Public License for more details.

See zpaq.cpp source code for documentation.
*/

#include #include #include #include

const int LEVEL=1;  // ZPAQ level 0=experimental 1=final

// 1, 2, 4 byte unsigned integers
typedef unsigned char U8;
typedef unsigned short U16;
typedef unsigned int U32;

// An Array of T is cleared and aligned on a 64 byte address
//   with no constructors called. No copy or assignment.
// Array<T> a(n, ex=0);  - creates n<<ex elements of type T
template <class T>
class Array {
private:
  T *data;     // user location of [0] on a 64 byte boundary
  int n;       // user size-1
  int offset;  // distance back in bytes to start of actual allocation
  void operator=(const Array&);  // no assignment
  Array(const Array&);  // no copy
public:
  Array(int sz=0, int ex=0): data(0), n(-1), offset(0) {
    resize(sz, ex);} // [0..sz-1] = 0
  void resize(int sz, int ex=0); // change size, erase content to zeros
  ~Array() {resize(0);}  // free memory
  int size() const {return n+1;}  // get size
  T& operator[](int i) {assert(n>=0 && i>=0 && U32(i)<=U32(n)); return data[i];}
  T& operator()(int i) {assert(n>=0 && (n&(n+1))==0); return data[i&n];}
};

// Change size to sz<<ex elements
template <class T>
void Array<T>::resize(int sz, int ex) {
  while (ex>0) {
    if (sz<0 || sz>=(1<<30)) fprintf(stderr, "Array too big\n"), exit(1);
    sz*=2, --ex;
  }
  if (sz<0) fprintf(stderr, "Array too big\n"), exit(1);
  if (n>-1) {
    assert(offset>0 && offset<=64);
    assert((char*)data-offset);
    free((char*)data-offset);
  }
  n=-1;
  if (sz<=0) return;
  n=sz-1;
  data=(T*)calloc(64+(n+1)*sizeof(T), 1);
  if (!data) fprintf(stderr, "Out of memory\n"), exit(1);
  offset=64-int((long)data&63);
  assert(offset>0 && offset<=64);
  data=(T*)((char*)data+offset);
}

// The SHA1 class is used to compute segment checksums.
// SHA-1 code modified from RFC 3174.
// http://www.faqs.org/rfcs/rfc3174.html

enum
{
    shaSuccess = 0,
    shaNull,            /* Null pointer parameter */
    shaInputTooLong,    /* input data too long */
    shaStateError       /* called Input after Result */
};
const int SHA1HashSize=20;

class SHA1 {
  U32 Intermediate_Hash[SHA1HashSize/4]; /* Message Digest */
  U32 Length_Low;            /* Message length in bits */
  U32 Length_High;           /* Message length in bits */
  int Message_Block_Index;   /* Index into message block array */
  U8 Message_Block[64];      /* 512-bit message blocks */
  int Computed;              /* Is the digest computed? */
  int Corrupted;             /* Is the message digest corrupted? */
  U8 result_buf[20];         // Place to put result
  void SHA1PadMessage();
  void SHA1ProcessMessageBlock();
  U32 SHA1CircularShift(int bits, U32 word) {
    return (((word) << (bits)) | ((word) >> (32-(bits))));
  }
  int SHA1Reset();   // Initialize
  int SHA1Input(const U8 *, unsigned int n);   // Hash n bytes
  int SHA1Result(U8 Message_Digest[SHA1HashSize]);   // Store result
public:
  SHA1() {SHA1Reset();}   // Begin hash
  void put(int c) {   // Hash 1 byte
    U8 ch=c;
    SHA1Input(&ch, 1);
  }
  int result(int i);   // Finish and return byte i (0..19) of SHA1 hash
  double size() const {   // Number of bytes hashed so far
    return (Length_Low+4294967296.0*Length_High)/8;}
};

// A Reader reads from a file or an array U8 p[n]
class Reader {
  FILE *in;
  const U8 *ptr;
  int len;
public:
  Reader(FILE *f): in(f), ptr(0), len(0) {}  // Read from file
  Reader(const U8 *p, int n): in(0), ptr(p), len(n) {}  // Read from p[n]
  int get() {  // return 1 byte or EOF
    if (in) return getc(in);
    else if (ptr && len) return --len, *ptr++;
    return EOF;
  }
};

// A ZPAQL machine COMP+HCOMP or PCOMP.
class ZPAQL {
public:
  ZPAQL();
  int read(Reader r);     // Read header from archive or array
  int write(FILE* out);   // Write header to archive
  void verify();          // Compare header to zlist/pzlist
  void inith();           // Initialize as HCOMP to run
  void initp();           // Initialize as PCOMP to run
  U32 H(int i) {return h(i);}  // get element of h
  void run(U32 input);    // Execute with input
  FILE* output;           // Destination for OUT instruction, or 0 to suppress
  SHA1* sha1;             // Points to checksum computer
  void step(U32 input, bool ishex);  // Execute while displaying progress
  double memory();        // Return memory requirement in bytes

  // ZPAQ1 block header
  Array<U8> header;   // hsize[2] hh hm ph pm n COMP (guard) HCOMP (guard)
  int cend;           // COMP in header[7...cend-1]
  int hbegin, hend;   // HCOMP/PCOMP in header[hbegin...hend-1]
  int select;         // Which optimized version of run()? (default 0)

private:
  // Machine state for executing HCOMP
  Array<U8> m;        // memory array M for HCOMP
  Array<U32> h;       // hash array H for HCOMP
  Array<U32> r;       // 256 element register array
  U32 a, b, c, d;     // machine registers
  int f;              // condition flag
  int pc;             // program counter

  // Support code
  void init(int hbits, int mbits);  // initialize H and M sizes
  int execute();  // execute 1 instruction, return 0 after HALT, else 1
  void run0(U32 input);  // default run() when select==0
  void div(U32 x) {if (x) a/=x; else a=0;}
  void mod(U32 x) {if (x) a%=x; else a=0;}
  void swap(U32& x) {a^=x; x^=a; a^=x;}
  void swap(U8& x)  {a^=x; x^=a; a^=x;}
  void err();  // exit with run time error
};

// A Component is a context model, indirect context model, match model,
// fixed weight mixer, adaptive 2 input mixer without or with current
// partial byte as context, adaptive m input mixer (without or with),
// or SSE (without or with).
struct Component {
  int limit;      // max count for cm
  U32 cxt;        // saved context
  int a, b, c;    // multi-purpose variables
  Array<U32> cm;  // cm[cxt] -> p in bits 31..10, n in 9..0; MATCH index
  Array<U8> ht;   // ICM hash table[0..size1][0..15] of bit histories; MATCH buf
  Array<U16> a16; // multi-use
  Component();    // initialize to all 0
};

// Next state table generator
class StateTable {
  enum {B=6, N=64}; // sizes of b, t
  static U8 ns[1024]; // state*4 -> next state if 0, if 1, n0, n1
  static const int bound[B]; // n0 -> max n1, n1 -> max n0
  int num_states(int n0, int n1);  // compute t[n0][n1][1]
  void discount(int& n0);  // set new value of n0 after 1 or n1 after 0
  void next_state(int& n0, int& n1, int y);  // new (n0,n1) after bit y
public:
  int next(int state, int y) {  // next state for bit y
    assert(state>=0 && state<256);
    assert(y>=0 && y<4);
    return ns[state*4+y];
  }
  int cminit(int state) {  // initial probability of 1 * 2^23
    assert(state>=0 && state<256);
    return ((ns[state*4+3]*2+1)<<22)/(ns[state*4+2]+ns[state*4+3]+1);
  }
  StateTable();
};

// A predictor guesses the next bit
class Predictor {
public:
  Predictor(ZPAQL&);   // build model
  int predict();       // probability that next bit is a 1 (0..4095)
  void update(int y);  // train on bit y (0..1)
  void stat();         // print statistics
private:

  // Predictor state
  int c8;               // last 0...7 bits.
  int hmap4;            // c8 split into nibbles
  int p[256];           // predictions
  ZPAQL& z;             // VM to compute context hashes, includes H, n
  Component comp[256];  // the model, includes P

  // Modeling support functions
  int predict0();       // default
  int predict1();       // optimized
  void update0(int y);  // default
  void update1(int y);  // optimized
  int dt[1024];         // division table for cm: dt[i] = 2^16/(i+1.5)
  U16 squasht[4096];    // squash() lookup table
  short stretcht[32768];// stretch() lookup table
  StateTable st;        // next, cminit functions

  // reduce prediction error in cr.cm
  void train(Component& cr, int y) {
    assert(y==0 || y==1);
    U32& pn=cr.cm(cr.cxt);
    int count=pn&0x3ff;
    int error=y*32767-(cr.cm(cr.cxt)>>17);
    pn+=(error*dt[count]&-1024)+(count<cr.limit);
  }

  // x -> floor(32768/(1+exp(-x/64)))
  int squash(int x) {
    assert(x>=-2048 && x<=2047);
    return squasht[x+2048];
  }

  // x -> round(64*log((x+0.5)/(32767.5-x))), approx inverse of squash
  int stretch(int x) {
    assert(x>=0 && x<=32767);
    return stretcht[x];
  }

  // bound x to a 12 bit signed int
  int clamp2k(int x) {
    if (x<-2048) return -2048;
    else if (x>2047) return 2047;
    else return x;
  }

  // bound x to a 20 bit signed int
  int clamp512k(int x) {
    if (x<-(1<<19)) return -(1<<19);
    else if (x>=(1<<19)) return (1<<19)-1;
    else return x;
  }

  // Get cxt in ht, creating a new row if needed
  int find(Array<U8>& ht, int sizebits, U32 cxt);
};

// Globals for optimization
extern const char *pre_cmd; // preprocessor command
extern const U8 *zlist;     // model header for COMP, HCOMP
extern const U8 *pzlist;    // postprocessor code
zpaq-1.10.orig/lzppre.cpp0000644000000000000500000001707111260255227013704 0ustar rootsrc/* lzppre.cpp LZP preprocessor

(C) 2009, Ocarina Networks, Inc.
    Written by Matt Mahoney, matmahoney@yahoo.com, Sept. 28, 2009.

    LICENSE

    This program is free software; you can redistribute it and/or
    modify it under the terms of the GNU General Public License as
    published by the Free Software Foundation; either version 3 of
    the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details at Visit . Usage: lzppre ph pm esc minlen hmul input output lzppre preprocesses its input for compression. The inverse operation is provided by the ZPAQL code in the comments below. Encoding is as follows. The sequence (esc 0) codes for esc. The sequence (esc n) where n > 0 codes for a match of length n+minlen from an output buffer of the last 2^pm bytes after the last place that had the same rolling context hash. The hash is computed from input byte c by hash=(hash*hmul+c)&(1< #include #include #include #include #include #include // 1, 2, 4 byte unsigned integers typedef unsigned char U8; typedef unsigned short U16; typedef unsigned int U32; // Print an error message and exit void error(const char* msg="") { fprintf(stderr, "\nError: %s\n", msg); exit(1); } // An Array of T is cleared and aligned on a 64 byte address // with no constructors called. No copy or assignment. 
// Array a(n, ex=0); - creates n< class Array { private: T *data; // user location of [0] on a 64 byte boundary int n; // user size-1 int offset; // distance back in bytes to start of actual allocation void operator=(const Array&); // no assignment Array(const Array&); // no copy public: Array(int sz=0, int ex=0): data(0), n(-1), offset(0) { resize(sz, ex);} // [0..sz-1] = 0 void resize(int sz, int ex=0); // change size, erase content to zeros ~Array() {resize(0);} // free memory int size() {return n+1;} // get size T& operator[](int i) {assert(n>=0 && i>=0 && U32(i)<=U32(n)); return data[i];} T& operator()(int i) {assert(n>=0 && (n&(n+1))==0); return data[i&n];} }; template void Array::resize(int sz, int ex) { while (ex>0) { if (sz<0 || sz>=(1<<30)) fprintf(stderr, "Array too big\n"), exit(1); sz*=2, --ex; } if (sz<0) fprintf(stderr, "Array too big\n"), exit(1); if (n>-1) { assert(offset>0 && offset<=64); assert((char*)data-offset); free((char*)data-offset); } n=-1; if (sz<=0) return; n=sz-1; data=(T*)calloc(64+(n+1)*sizeof(T), 1); if (!data) fprintf(stderr, "Out of memory\n"), exit(1); offset=64-int((long)data&63); assert(offset>0 && offset<=64); data=(T*)((char*)data+offset); } //////////////////////////// PreProcessor //////////////////////// const U32 EOS=U32(-1); class PreProcessor { FILE *out; // output int ph, pm; // memory sizes for H, M from config file int esc, minlen, hmul; // lzp parameters int state; // 0 = init, 1 = after U32 b, c, d; // general purpose state for transforms Array m; Array h; void lzp(U32 a); // (p) LZP transform void lzp_flush(); // used by lzp() public: PreProcessor(FILE *f, int e, int m_, int h_, int ph_, int pm_); void compress(U32 a); }; PreProcessor::PreProcessor(FILE *f, int e, int m_, int h_, int ph_, int pm_): out(f), ph(ph_), pm(pm_), esc(e), minlen(m_), hmul(h_) { state=0; b=c=d=0; m.resize(1, pm); h.resize(1, ph); } // LZP preprocessor. 
Strings of length (minlen+len) that match the // last occurrence occuring in the same context hash within 2^pm // are replaced with the 2 byte sequence (esc len) where len=(1...255). // The byte esc is replaced with (esc 0). The context hash is updated // by byte A by hash = hash*hmul+A mod 2^ph. void PreProcessor::lzp(U32 a) { // State is as follows: // F = is last byte ESC? // D = context hash // B = number of bytes output // M = output buffer[0...B-1], size 2^pm // C = pointer to match in M, C < B // H = index mapping D to last match in M, size 2^ph */ /* (ZPAQL code for LZP inverse transform with ESC=5, MINLEN=3, HMUL=40) jf 30 (last byte was esc?) a> 0 jf 37 (goto output esc) a+= 3 r=a 0 c=*d *d=b (top of copy loop) a=*c *b=a b++ c++ out d<>a a*= 40 a+=d d<>a a=r 0 a-- r=a 0 a> 0 jt -20 (to top of copy loop) halt a== 5 jf 1 halt a> 255 jf 4 aa a*= 40 a+=d d<>a halt static const U8 prog[59]={ // compiled from above 1,56,0,47,30,239,0,47,37,135,0,55,0,86,113,69,96,9, 17,57,24,151,0,131,24,7,0,2,55,0,239,0,39,236,56,223,0, 47,1,56,239,255,47,4,224,56,71,0,113,96,9,57,24,151,0,131, 24,56,0}; if (state==0) { for (int i=0; i<59; ++i) { if (i==36 || i==47) encp->compress(ESC); else if (i==10) encp->compress(MINLEN); else if (i==22 || i==54) encp->compress(HMUL); else encp->compress(prog[i]); } state=1; } */ // Forward transform uses similar state information: // b = number of bytes input // c = number of bytes output // d = hash of context at c // m = buffer with pending output at m(c...b-1) // h = index of context hashes h(d) -> m(0...c-1) if (a==EOS) { while (b!=c) lzp_flush(); } else { m(b++)=a; if (b>256+minlen+c || b==(1<0 && c-p+258+minlenminlen) { putc(esc, out); putc(len-minlen, out); while (len-->0) { assert(c!=b); h(d)=c; d=d*hmul+m(c++); } } // Encode a literal else { putc(m(c), out); if (m(c)==esc) putc(0, out); h(d)=c; d=d*hmul+m(c++); } } // Compress one byte (0...255) or EOS void PreProcessor::compress(U32 a) { lzp(a); } int main(int argc, char** argv) 
{ if (argc<8) printf("Usage: lzppre ph pm esc minlen hmul input output\n"), exit(1); FILE *in=fopen(argv[6], "rb"); if (!in) perror(argv[6]), exit(1); FILE *out=fopen(argv[7], "wb"); if (!out) perror(argv[7]), exit(1); int ph=atoi(argv[1]); int pm=atoi(argv[2]); int esc=atoi(argv[3]); int minlen=atoi(argv[4]); int hmul=atoi(argv[5]); PreProcessor pp(out, esc, minlen, hmul, ph, pm); int c; while ((c=getc(in))!=EOF) pp.compress(c); pp.compress(EOS); return 0; } zpaq-1.10.orig/min.cfg0000644000000000000500000000360411263657344013136 0ustar rootsrc(zpaq 1.07 minimum (fast) compression. Uses 4 x 2^$1 MB memory. $2 increases minimum match length. Requires lzppre as an external preprocessor. (C) 2009, Ocarina Networks Inc. Written by Matt Mahoney. This software is free under GPL v3. http://www.gnu.org/copyleft/gpl.html ) comp 3 3 $1+18 $1+20 1 (hh hm PH PM n) 0 cm $1+19 5 (context model size=2^19, limit=5*4) hcomp *d<>a a^=*d a<<= 8 *d=a (order 2 context) halt pcomp lzppre $1+18 $1+20 127 $2+2 96 ; (lzppre PH PM ESC MINLEN HMUL) (If you change these values, then change them in the code too) (The sequence ESC 0 codes for ESC. The sequence ESC LEN codes for a match of length LEN+MINLEN at the last place in the output buffer M (size 2^PM) that had the same context hash in the low PH bits of D. D indexes hash table H which points into buffer M, which contains B bytes. When called, A contains the byte to be decoded and F=true if the last byte was ESC. 
The rolling context hash D is updated by D=D*HMUL+M[B]) if (last byte was ESC then copy from match) a> 0 jf 50 (goto output esc) a+= $2+2 (MINLEN) r=a 0 (save length in R0) c=*d (c points to match) do (find match and output it) *d=b (update index with last output byte) a=*c *b=a b++ c++ out (copy and output matching byte) d<>a a*= 96 (HMUL) a+=d d<>a (update context hash) a=r 0 a-- r=a 0 (decrement length) a> 0 while (repeat until length is 0) halt endif (otherwise, set F for ESC) a== 127 (ESC) if halt endif (reset state at EOF) a> 255 if b=0 c=0 a= 1 a<<= $1+18 d=a do d-- *d=0 a=d (clear index) a> 0 while halt (F=0) (goto here: output esc) a= 127 (ESC) endif *d=b (update index) *b=a b++ out (update buffer and output) d<>a a*= 96 (HMUL) a+=d d<>a (update context hash) halt end zpaq-1.10.orig/zpaq.cpp0000644000000000000500000045553011316262132013344 0ustar rootsrc/* zpaq v1.10 archiver and file compressor. (C) 2009, Ocarina Networks, Inc. Written by Matt Mahoney, matmahoney@yahoo.com, Dec. 28, 2009. LICENSE This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details at Visit . This program compresses files into archives and decompresses them. The archive format is compatible with other ZPAQ level 1 compliant programs. See http://mattmahoney.net/dc/ Installation ------------ ZPAQ runs from a command window. In Windows, the following files need to be somewhere in your PATH or in the current directory when run: zpaq.exe - this program. zpaqmake.bat - script used for optimization. 
lzppre.exe - preprocessor for fast compression (min.cfg) In addition you will need a C++ compiler and zpaq.cpp and zpaq.h to use the optimization feature. Optimization speeds up compression and decompression typically by 50% to 100% or more. It works by generating C++ code for an optimized version of ZPAQ tuned to the input, then compiling and running it. The script zpaqmake.bat should take one argument %1, which will be a C++ program without the .cpp extension. It should compile %1.cpp with options -DOPT -DNDEBUG, link to zpaq.cpp (wherever you put it) and name the result %1.exe. The compiler needs to know where zpaq.h and zpaq.cpp can be found. For example, suppose that zpaq.cpp and zpaq.h are placed in the directory C:\zpaq. Then zpaqmake.bat would contain at a minimum: g++ -O2 -DOPT -DNDEBUG %1.cpp -IC:\zpaq C:\zpaq\zpaq.cpp -o %1.exe g++ 4.4.0 is the recommended compiler, but others will work. -DOPT is required. It removes features not needed for specialized compressors. -DNDEBUG removes run time checks for better speed. %1.cpp is the input file. %1 is passed to the script. It includes the full path, normally %TEMP%/filename where filename is "zpaq_" followed by 40 hexadecimal digits. -IC:\zpaq tells g++ to look in C:\zpaq for zpaq.h C:\zpaq\zpaq.cpp is this file. Alternatively it may be compiled in advance with -DOPT -DNDEBUG to zpaq.o and it could be linked. -o %1.exe names the output file. -O2 optimizes for speed. It often works better than -O3. Other optimizations should be specific to the installed computer. Some useful options are: -s to strip debugging symbols to save space. -fomit-frame-pointer improves speed a bit. -march=pentiumpro is the oldest target that doesn't hurt speed much. You can also compile zpaq.cpp to zpaq.o in advance and link to it in the script. If you do so, compile zpaq.cpp with -c -DOPT -DNDEBUG and other optimizations as appropriate. The script can also compress the resulting .exe, e.g. 
upx -qqq %1.exe If you don't use the optimization feature then you don't need a C++ compiler, source code, or script. You only need zpaq.exe and lzppre.exe in your PATH. Linux Installation ------------------ If you are installing in Linux then you will need to write an equivalent shell script called zpaqmake (no .bat extension) and make it executable. The output should still have a .exe extension. For example, suppose that zpaq.cpp and zpaq.h are in /usr/zpaq #!/bin/csh g++ -O2 -DOPT -DNDEBUG $1.cpp -I/usr/zpaq /usr/zpaq/zpaq.cpp -o $1.exe You will also need to compile zpaq.cpp and lzppre.cpp and put the executables somewhere in your $PATH. Compile with -DNDEBUG. Don't add a .exe extension. If you want to use /tmp for temporary files, then you need to set setenv TEMP /tmp prior to running zpaq. Command summary --------------- To compress to new archive: zpaq [opnsitqv]c[F[,N...]] archive files... To append to archive: zpaq [opnsitqv]a[F[,N...]] archive files... Optional modifiers: o = compress faster (requires C++ compiler) p = store filename paths in archive n = don't store filenames (names will be needed to decompress) s = don't store SHA1 checksums (saves 20 bytes) i = don't store file sizes as comments (saves a few bytes) t = append locator tag to non-ZPAQ data q = quiet v = verbose (show F as it compiles) F = use options in configuration file F (min.cfg, max.cfg) ,N = pass numeric arguments to F To list contents: zpaq l archive To extract: zpaq [opntq]x[N] archive [files...] o = extract faster (requires C++ compiler) p = extract to stored paths instead of current directory n = decompress all to one file t = don't post-process (for debugging) q = quiet N = extract only block N (1, 2, 3...) files... = rename extracted files (clobbers) otherwise use stored names (does not clobber) To debug configuration file F: zpaq [pthv]rF[,N...] [args...] 
p = run PCOMP (default is to run HCOMP) t = trace (single step), args are numeric inputs otherwise args are input, output (default stdin, stdout) h = trace display in hexadecimal v = verbose compile ,N = pass numeric arguments to F Basic commands -------------- zpaq c archive files... Compresses one or more files and creates a new archive. If the archive exists then it is overwritten. File names are stored without a path ("C:\tmp\foo.txt" is saved as "foo.txt"). zpaq a archive files... Compresses files and appends to archive. If the archive does not exist then it is created as with the c command. zpaq l archive List the archive contents. Files are listed in the same order they were added. zpaq x archive Extract the contents of the archive. New files are created and named according to the stored filenames. Does not clobber existing files. Extracts to current directory. zpaq x archive files... Extract files and renames in the order they were added to the archive. Clobbers any already existing output files. The number of files extracted is the smaller of the number of filenames on the command line or the number of files in the archive. Archive format -------------- The precise archive format is ZPAQ level 1 revision 1, found at http://mattmahoney.net/dc/zpaq1.pdf revised Sept. 29, 2009. A ZPAQ archive consists of a sequence of blocks that can be decompressed independently. Each block contains one or more segments that must be decompressed in sequence from the start of the block. A block header describes the compression algorithm. A segment contains an optional filename field, an optional comment string, the compressed file, and optionally the SHA1 checksum of the data prior to compression. The "c" and "a" commands create or append one block with each file in a separate segment. The "l" and "x" commands list or extract all of the blocks in the order that they were added. 
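As a rough mental model of the layout just described (not the on-disk
encoding, which is defined by the ZPAQ specification), the nesting of
blocks and segments could be pictured with the C++ sketch below. All type,
field, and function names are hypothetical and chosen only for
illustration. They do not correspond to anything in zpaq.h.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory picture of an archive. Names are illustrative
// only. An empty string or vector stands for "field not stored".
struct Segment {
  std::string filename;                  // optional filename field
  std::string comment;                   // optional comment (e.g. file size)
  std::vector<unsigned char> compressed; // the compressed file data
  std::vector<unsigned char> sha1;       // optional 20 byte checksum
};

struct Block {
  std::vector<unsigned char> header;  // describes the compression algorithm
  std::vector<Segment> segments;      // decoded in order from block start
};

// Blocks can be decompressed independently of each other.
typedef std::vector<Block> Archive;

// Count the segments that a listing would walk through, in stored order.
std::size_t count_segments(const Archive& a) {
  std::size_t n = 0;
  for (std::size_t i = 0; i < a.size(); ++i) n += a[i].segments.size();
  return n;
}
```

In this picture, a "c" or "a" command adds one Block holding one Segment
per input file.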
An archive may be mixed with other data provided that each ZPAQ block sequence is preceded by a locator tag (appended with "ta"). ZPAQ will ignore the other data. One use for this technique is to append ZPAQ blocks to a self extracting archive stub. Compression options ------------------- zpaq [opnsitv]ca[F[,N...]] archive files... Create (c) or append (a) archive with optional modifiers and an optional configuration file F. Three files are included: min.cfg - for fast but poor compression. max.cfg - for slow but good compression. mid.cfg - for moderate speed and compression (default). Other config files are available as add-on options or you can write them as explained later. A numeric argument may be appended to F to increase memory usage for better compression. Each increment doubles usage. There should be no space before or after the comma. For example: zpaq cmax.cfg archive files... = 246 MB zpaq cmax.cfg,1 archive files... = 476 MB zpaq cmax.cfg,2 archive files... = 938 MB zpaq cmax.cfg,3 archive files... = 1861 MB zpaq cmax.cfg,-1 archive files... = 130 MB (negative values allowed) Modifiers may be in any order before the "c" or "a" command. The modifiers, command, and configuration file must be written together without any spaces, for example zpaq ipscmax.cfg books.zpaq book1 book2 creates archive books.zpaq with options i, p, s, and configuration file max.cfg. Modifiers have the following meaning: p Store file name paths as given on the command line. The default is to store the name without the path. For example, "zpaq pc books.zpaq tmp\book1" will store the name as "tmp\book1" instead of "book1". If the p option is also given during extraction, then ZPAQ will attempt to extract book1 to the subdirectory tmp instead of the current directory. This will fail if tmp does not exist. ZPAQ does not create directories as needed. n Do not store filenames. The effect is to require that filenames be given during decompression. s Do not store SHA1 checksums. 
This saves 20 bytes. The decompressor will not check that the output is identical to the original input. i Do not store anything in the comment field. Normally the input file size is stored as a decimal string, taking a few bytes. The comment field has no effect on the program except that it is displayed by the "l" and "x" commands. v Verbose. Show config file F (if present) as it is compiled. This is useful for error checking. q Quiet. Don't display compression progress on the screen. t Append a locator tag to non-ZPAQ data. The tag is a string of 13 bytes that allows ZPAQ and UNZPAQ to find the start of a sequence of ZPAQ blocks embedded in other data. zpaqsfx.exe already has this tag at the end. However, if a new stub is compiled from the source then the t command should be used when appending the first file. o means optimize. If successful, compression is typically 50% to 100% faster. ZPAQ will look for a program named zpaq_X.exe in the temporary directory, where X is derived from the SHA1 checksum of the block header produced by config file F with arguments N. If the program exists, then ZPAQ will call it with the same arguments to perform the compression. If it does not exist then ZPAQ will create a source code file zpaq_X.cpp in the temporary directory, compile it, and link it to zpaq.cpp or zpaq.o depending on the installation. The temporary directory is specified by the environment variable TEMP if it exists, or else the current directory. The program zpaq_X.exe will compress its input in the same format as described by F, but faster. If F specifies a preprocessor, then zpaq_X.exe will expect to find it too. It will also decompress archive blocks in the same configuration but fail if it attempts to decompress blocks in any other configuration. zpaq_X.exe will accept the "c", "a" and "x" commands with all of the same modifiers, but will ignore the "v" and "o" modifiers and ignore any config file F and arguments passed to it. 
It will not accept the "l" or "r" commands. Extraction requires a block
number ("x1", "x2", etc). A different optimized program is used to extract
each block.

ZPAQ will call the external program zpaqmake to compile zpaq_X.cpp,
passing it zpaq_X as an argument. Normally this will be a script that
calls a C++ compiler to produce zpaq_X.o, links to zpaq.o and outputs
zpaq_X.exe. The script could link to zpaq.cpp instead of zpaq.o.

Extraction options
------------------

  zpaq [opntq]x[N] archive [files...]

p means extract using stored paths if present. The default is to extract
to the current directory regardless of how the file names are stored.
Stored paths must be relative to the current directory, not start with
a "/", "\", a drive letter like "C:" or contain "../" or "..\". If
extracting to a subdirectory, it must already exist. It will not be
created.

[files...] overrides and has no restrictions on file names. Each segment
extracts to a different file. If any segments do not have a stored
filename then they can only be extracted using the "p" or "n" modifiers.

n means to ignore stored filenames and append all output to one file,
the first file in [files...].

t means extract without postprocessing (for debugging). Expect checksum
errors.

q means quiet. Don't display decompression progress.

N means to extract only from block number N, where 1 is the first block.
Otherwise all blocks are extracted. The "l" command shows which files
are in each block.

o means optimize. This typically speeds up decompression by 50% to 100%.
For each block in the archive, ZPAQ will compute the SHA1 checksum X of
the block header including compressed postprocessor code, and call
zpaq_X.exe in the temporary directory with the same arguments but with
the block number N appended. If zpaq_X.exe does not exist in the temporary
directory (TMPDIR, else TEMP, else the current directory), then it will
create it by calling the external program zpaqmake passing zpaq_X as an
argument.
The resulting program will work like the one created with "oc" or "oa" except that it won't be able to pre-process. If the block uses postprocessing, then X will be different than the corresponding compressor. Development options ------------------- zpaq [pthv]*rF[,N...] [args...] Run the ZPAQL program in HCOMP section of configuration file F. The program is run once for each byte of input from the file named in the first argument and once at EOF with the input byte (or -1) in the A register. Output is to the file named in the second argument. If run with no arguments then take input from stdin and output to stdout. Modifiers: p Run the PCOMP section rather than HCOMP. t Trace (single step) the program. The arguments should be numbers rather than file names. The program is run once for each argument with the value in the A register. As each instruction is executed the register contents are shown. At HALT, memory contents are displayed. h When tracing, display register and memory contents in hexadecimal instead of decimal. v Verbose. Display the config file as it is being compiled. If an error occurs, it will be easier to locate. v is also useful for displaying jump targets. ,N Pass up to 9 numeric arguments to config file F (like the c and a commands). Configuration files ------------------- ZPAQ uses a configurable compression algorithm based on bitwise prediction and arithmetic coding, and optional pre- and post-processing. The algorithm is described precisely in http://mattmahoney.net/dc/zpaq1.pdf The compression and decompression algorithm is described in a configuration file. The decompression algorithm is stored in the ZPAQ archive. The configuration file is only needed during compression. It has 3 parts: COMP - a description of a sequence of bit predictors. Each component takes a context and earlier predictions as input, and outputs a prediction for the next bit. 
The last component's prediction is output to the arithmetic coder which codes the bit at a cost of log2(1/p), where p is the probability guessed for that bit. (Thus, better prediction leads to better compression). Bits are coded in MSB (most significant bit) to LSB (least significant bit) order. HCOMP - a program that is called once for each byte of uncompressed data with that byte as input, and outputs an array of 32-bit contexts, one for each component in the COMP section. The program is written in ZPAQL, a sandboxed assembler-like language designed for small size and fast interpretation (rather than for easy development). POST/PCOMP - an optional pair of programs to preprocess the input before compression and postprocess the output after decoding for decompression. POST indicates no pre- or postprocessing. The model described by COMP and HCOMP sees a 0 byte followed by a concatenation of the uncompressed files. During decompression, the leading 0 indicates no postprocessing. PCOMP describes an external program to preprocess the input files during compression, and a ZPAQL program to perform the reverse conversion to restore the original input. Unlike COMP and HCOMP, two programs are needed because the compression and decompression models are not the same. During compression, ZPAQ will run the input through both programs and compare the output with the input. If they don't match, then ZPAQ will refuse to compress the file. If they do match, then the input files are preprocessed and compressed, along with the postprocessing program that will be used later to invert the preprocessing transform. The compression model described in the COMP and HCOMP sections sees a 1 as the first byte to indicate that the decoded data should be postprocessed before output. This is followed by a 2 byte program length (LSB first), the ZPAQL postprocessor code, and a concatenation of the preprocessed input files. 
The PCOMP code sees just the preprocessed files as input, each ending with
EOS (-1).

The preprocessor is an external program. It is not needed for
decompression so it is not saved in the archive. It expects to be called
with an input filename and output filename as its last 2 arguments.

The postprocessor is a ZPAQL program that is called once for each input
byte and once with input EOS (-1) at the end of each segment. The program
is initialized at the beginning of a block but maintains state information
between segments within the same block. Its input is from archive.zpaq.pre
during compression testing and from the decoder during decompression.

The configuration file has the following format:

  COMP hh hm ph pm n
    (n numbered component descriptions)
  HCOMP
    (program to generate contexts, memory size = hh, hm)
  POST (for no pre/post processing)
    0
  END

Or (for custom pre/post processing):

  COMP hh hm ph pm n
    (...)
  HCOMP
    (...)
  PCOMP preprocessor-command ;
    (postprocessor program, memory size = ph, pm)
  END

Configuration files are free format (all white space is the same) and
mostly not case sensitive. They may contain comments in ((nested)
parentheses). For example, mid.cfg:

  comp 3 3 0 0 8 (hh hm ph pm n)
    0 icm 5        (chain of indirect model orders 0 to 5)
    1 isse 13 0
    2 isse 17 1
    3 isse 18 2
    4 isse 18 3
    5 isse 19 4
    6 match 22 24  (order 7 match model with 16 MB buffer)
    7 mix 16 0 7 24 255  (order 1 mixer, output to arithmetic coder)
  hcomp
    c++ *c=a b=c a=0   (save in rotating buffer)
    d= 1 hash *d=a     (order 1 context for isse 1)
    b-- d++ hash *d=a  (order 2 context)
    b-- d++ hash *d=a  (order 3 context)
    b-- d++ hash *d=a  (order 4 context)
    b-- d++ hash *d=a  (order 5 context)
    b-- d++ hash b-- hash *d=a  (order 7 context for match)
    d++ a=*c a<<= 8 *d=a        (order 1 context for mix)
    halt
  post (no pre/post processing)
    0
  end

Components
----------

The COMP section has 5 arguments (hh, hm, ph, pm, n) followed by a list of
n components numbered consecutively from 0 through n-1.
hh, hm, ph, and pm describe the sizes of the arrays used by the HCOMP and PCOMP virtual machines as described later. Each machine has two arrays, H and M. Their sizes are 2^hh and 2^hm respectively in HCOMP, and 2^ph and 2^pm in PCOMP. The HCOMP program computes the context for the n components by placing them in H[0] through H[n-1] as 32-bit numbers. Thus, hh should be set so that 2^hh >= n. In mid.cfg, n = 8 and hh = 3, allowing up to 8 contexts. Larger values would work but waste memory. Memory usage is 2^(hh+2) + 2^hm + 2^(ph+2) + 2^pm bytes. mid.cfg does not use pre/post processing. Thus, there is no PCOMP virtual machine, so ph and pm are set to 0. The 9 possible component types are: CONST c CM s limit ICM s MATCH s b AVG j k wt MIX2 s j k rate mask MIX s j m rate mask ISSE s j SSE s j start limit All component parameters are numbers in the range 0..255. Each component outputs a "stretched" probability in the form ln(p(1)/p(0)). where p(1) and p(0) are the model's estimated probabilities that the next bit will be a 1 or 0, respectively. Thus, negative numbers predict a 0 and positive predict 1. The magnitude is the confidence of the prediction. The output is a number in the range -32 to 32 with precision 1/64 (a 12 bit signed number). Components are as follows: CONST c (constant) Output is (c-128)/16. Thus, numbers larger than 128 predict 1 and smaller predict 0, regardless of context. CONST is very fast and uses no memory. CM s limit (context model) Outputs a prediction by looking up the context in a table of size 2^s using the s low bits of the H[i] (for component i) XORed with a 9 bit expansion of the previously coded (high order) bits of the current byte. (Recall that H[i] is updated once per byte). Each table entry contains a prediction p(1) and a count in the range 0..limit*4 (max 1020). The prediction is updated in proportion to the prediction error and inversely proportional to the count. A large limit (max 255) is best for stationary sources. 
  A smaller value makes the model more adaptive to changing statistics.
  Memory usage is 2^(s+2) bytes.

ICM s (indirect context model)

  Outputs a prediction by looking up the context in a hash table of
  size 64 * 2^s bit histories (1 byte states). The histories index a
  second table of size 256 that outputs a prediction. The table is
  updated by adjusting the prediction to reduce the prediction error
  (at a slow, fixed rate) and updating the bit history. A bit history
  represents a recent count of 0 and 1 bits and indicates whether the
  most recent bit was a 0 or 1. The hash table is indexed by the low
  s+10 bits of H[i] and the previous bits of the current byte, with
  the highest 8 bits (s+9..s+2) used to detect hash collisions. An ICM
  works best on nonstationary sources or where memory efficiency is
  important. It uses 2^(s+6) bytes.

MATCH s b

  Outputs a prediction by searching for the previous occurrence of the
  current context in a history buffer of size 2^b bytes, and predicting
  whatever bit came next, with a confidence proportional to the length
  of the match. Matches are found using an index of 2^s pointers into
  the history buffer, each of which points to the previous occurrence
  of the current context. A MATCH is useful for any data that has
  repeat occurrences of strings longer than about 6-8 bytes within a
  window of size 2^b. Generally, larger b (up to the file size) gives
  better compression, and s = b-2 gives adequate indexing. The context
  should be a high order hash. Memory usage is 4*2^s + 2^b bytes.

AVG j k wt

  Averages the predictions of components j and k (which must precede i,
  the current component). The average is weighted by wt/256 for
  component j and 1 - wt/256 for component k. Often, averaging two
  predictions gives better results than either prediction by itself.
  wt should be selected to favor the more accurate component. AVG is
  very fast and uses no memory.

MIX2 s j k rate mask

  Averages the predictions of components j and k (which must precede i)
  adaptively.
  The weight is selected from a table of size 2^s by the low s bits of
  H[i] added to the masked, previously coded bits of the current byte
  (an 8 bit value). A mask of 255 includes the current byte, and a mask
  of 0 excludes it. (Other masks are rarely useful). The adaptation rate
  is selectable. Typical values are around 8 to 32. Lower values are
  best for stationary sources. Higher rates are more adaptive. A MIX2
  generally gives better compression than AVG but at a cost in speed
  and memory. Uses 2^(s+1) bytes of memory.

MIX s j m rate mask

  A MIX works like a MIX2 but with m inputs over a range of components
  j..j+m-1, all of which must precede i. A typical use is as the final
  component, taking all other components as input with a low order
  context. A MIX with 2 inputs is different than a MIX2 in that the
  weights are not constrained to add to 1. This sometimes gives better
  compression, sometimes worse. Memory usage is m*2^(s+2) bytes.
  Execution time is proportional to m.

ISSE s j (indirect secondary symbol estimator)

  An ISSE takes a prediction and a context as input and outputs an
  adjusted prediction. The low s+10 bits of H[i] and the previous bits
  of the current byte index a hash table of size 2^(s+6) bit histories
  as with an ICM. The bit history is used as an 8 bit context to select
  a pair of weights for a 2 input MIX (not a MIX2) with component j
  (preceding i) as one input and a CONST 144 as the other. The effect is
  to adjust the previous prediction using a (typically longer) context.
  A typical use is a chain of ISSE after a low order CM or ICM working
  up to higher order contexts as in mid.cfg. (This architecture is also
  used in the PAQ9A compressor). Uses 2^(s+6) bytes.

SSE s j start limit (secondary symbol estimator)

  An SSE takes a prediction and context as input (like an ISSE) and
  outputs an adjusted prediction. The mapping is direct, however.
The input from component j and the context are mapped to a 2^s by 64 CM table by quantizing the prediction to 64 levels and interpolating between the two nearest values. The context is formed by adding the partial current byte to the low s bits of H[i]. The table is updated in proportion to the prediction error and inversely proportional to a count as with a CM. The count is initialized to start and has the range (start..limit*4). A large limit is best for stationary sources. A smaller limit is more adaptive. The starting count does not start at 0 because the table is initialized so that output predictions are the same as input predictions regardless of context. If the initial guess is close, then a higher start value works better. An SSE sometimes gives better compression than an ISSE, especially on stationary sources where a CM works better than an ICM. But it uses 2^12 times more memory for the same context size, so it is useful mostly for low order contexts. A typical use is to adjust the output of a MIX. It is sometimes followed by an AVG to average its input and output, typically weighted 75% to 90% in favor of the output. Sometimes more than one SSE or SSE-AVG pair is used in series with progressively higher order contexts, or may be used in parallel and mixed. An SSE uses 2^(s+8) bytes. All components are designed to work with context hashes that are uniformly distributed over the low order bits (depending on the s parameter for that component). A CM, MIX2, MIX, or SSE may also be used effectively with direct context lookup for low orders. In this case, the low 9 bits of a CM or low 8 bits of the other components should be cleared to leave space to combine with the bits of the current byte. 
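The per-component memory figures quoted in the descriptions above can be
tallied for a given configuration. The following C++ sketch does this for
the components of the mid.cfg example. The function names are
hypothetical and this is not code from zpaq.cpp. It only restates the
formulas from the text and leaves out the H and M arrays of the HCOMP and
PCOMP machines.

```cpp
#include <cstdint>

// Memory formulas as stated in the component descriptions.
// Function names are hypothetical, for illustration only.
std::uint64_t pow2(int n) { return std::uint64_t(1) << n; }
std::uint64_t cm_mem(int s)           { return pow2(s + 2); }
std::uint64_t icm_mem(int s)          { return pow2(s + 6); }
std::uint64_t match_mem(int s, int b) { return 4 * pow2(s) + pow2(b); }
std::uint64_t mix2_mem(int s)         { return pow2(s + 1); }
std::uint64_t mix_mem(int s, int m)   { return m * pow2(s + 2); }
std::uint64_t isse_mem(int s)         { return pow2(s + 6); }
std::uint64_t sse_mem(int s)          { return pow2(s + 8); }

// Components of mid.cfg: icm 5, isse 13, 17, 18, 18, 19,
// match 22 24, and mix 16 0 7 (a MIX with 7 inputs).
std::uint64_t mid_cfg_component_mem() {
  return icm_mem(5) + isse_mem(13) + isse_mem(17) + isse_mem(18)
       + isse_mem(18) + isse_mem(19) + match_mem(22, 24) + mix_mem(16, 7);
}
```

By these formulas the mid.cfg components add up to 111,413,248 bytes,
most of it in the two largest ISSE tables and the MATCH buffer.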
This is summarized:

  Component              Context size   Memory
  -------------------    ------------   ------
  CONST c                     0         0
  CM s limit                  s         2^(s+2)
  ICM s                       s+10      2^(s+6)
  MATCH s b                   s         2^(s+2) + 2^b
  AVG j k wt                  0         0
  MIX2 s j k rate mask        s         2^(s+1)
  MIX s j m rate mask         s         m*2^(s+2)
  ISSE s j                    s+10      2^(s+6)
  SSE s j start limit         s         2^(s+8)

Although the ZPAQ standard does not specify a maximum for s, this program
will not create arrays 2GB (2^31) or larger.

ZPAQL
-----

There are one or two ZPAQL programs in a configuration file. The first,
HCOMP, describes a program that computes the context hashes. The second,
PCOMP, is optional. It describes the code that inverts any preprocessing
performed by an external program prior to compression. The COMP and HCOMP
sections are stored in the block headers uncompressed. PCOMP, if used, is
prepended to the input data and compressed along with it.

Each virtual machine has the following state:

  4 general purpose 32 bit registers, A, B, C, D.
  A 1 bit flag register F.
  A 16 bit program counter, PC.
  256 32-bit registers R0 through R255.
  An array of 32 bit elements, H, of size 2^hh (HCOMP) or 2^ph (PCOMP).
  An array of 8 bit elements, M, of size 2^hm (HCOMP) or 2^pm (PCOMP).

Recall that the first line of a configuration file is:

  COMP hh hm ph pm n

HCOMP is called once per byte of input to be compressed or decompressed
with that byte in the A register. It returns with context hashes for the
n components in H[0] through H[n-1].

PCOMP is called once per decompressed byte with that byte in the A
register. At the end of a segment, it is called with EOS (-1) in A.
Output is by the OUT instruction. The output should be the uncompressed
data exactly as it was originally input prior to preprocessing. H has no
special meaning.

All state variables are initialized to 0 at the beginning of a block.
State is maintained between calls (and across segment boundaries) except
for A (used for input) and PC, which is reset to 0 (the first
instruction).
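The machine state listed above can be pictured as a plain C++ struct.
This is only a sketch under the assumptions stated in the text. The struct
and its member names are hypothetical and are not the class declared in
zpaq.h. It shows which parts of the state persist between calls and which
are reset.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of one ZPAQL machine's state (not the zpaq.h class).
struct VMState {
  std::uint32_t a, b, c, d;        // 4 general purpose 32 bit registers
  bool f;                          // 1 bit flag register
  std::uint16_t pc;                // 16 bit program counter
  std::uint32_t r[256];            // R0 through R255
  std::vector<std::uint32_t> h;    // size 2^hh (HCOMP) or 2^ph (PCOMP)
  std::vector<std::uint8_t> m;     // size 2^hm (HCOMP) or 2^pm (PCOMP)

  // At the beginning of a block every state variable is 0.
  VMState(int hbits, int mbits)
    : a(0), b(0), c(0), d(0), f(false), pc(0), r(),
      h(std::size_t(1) << hbits, 0), m(std::size_t(1) << mbits, 0) {}

  // Start one call: A receives the input byte (or EOS) and PC returns to
  // the first instruction. All other state is carried over unchanged.
  void begin_call(std::uint32_t input) { a = input; pc = 0; }
};
```

For mid.cfg (hh = 3, hm = 3) this gives H and M arrays of 8 elements each.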
The A register is used as the destination of most arithmetic or logical
operations. B and C may be used as pointers into M. D points into H.
F stores the result of comparisons and is used to decide conditional
jumps. R0 through R255 are used for auxiliary storage.

All operations are modulo 2^32. All array index operations are modulo
the size of the array (i.e. using the low bits of the pointer).

The instruction set is as follows:

  - Y=Z (assignment)
    - where Y is A B C D *B *C *D
    - where Z is A B C D *B *C *D (0...255)
  - AxZ (binary operations)
    - where x is += -= *= /= %= &= &~ |= ^= <<= >>= == < >
    - where Z is as above.
  - Yx (unary operations)
    - where Y is as above
    - where x is <>A ++ -- ! =0
    - except A<>A is not valid.
  - J N (conditional jumps)
    - where J is JT JF JMP
    - where N is a number in (-128...127).
  - LJ NN (long jump)
    - where NN is in (0...65535).
  - X=R N (read R array)
    - where X is A B C D
    - where N is in (0...255).
  - R=A N (write R array)
    - where N is in (0...255).
  - ERROR
  - HALT
  - OUT
  - HASH
  - HASHD

All instructions except LJ are 1 or 2 bytes, where the second byte is
a number in the range 0..255 (-128..127 for jumps). A 2 byte instruction
must be written as 2 tokens separated by a space, e.g. "A= 3", not "A=3"
or "A = 3". The exception is assigning 0, which has a 1 byte form, "A=0".

The notations *B, *C, and *D mean M[B], M[C], and H[D] respectively,
modulo the array sizes. For example "*B=*D" assigns M[B]=H[D]
(discarding the high 24 bits of H[D] because M[B] is a byte).

Binary operations always put the result in A.

=, +=, -=, *=, &=, |=, ^= have the same meanings as in C/C++.

/=, %= have the result 0 if the right operand is 0.

A&~B means A &= ~B;

A<<=B, A>>=B mean the same as in C/C++ but are explicitly defined when
B > 31 to mean the low 5 bits of B.

==, <, > compare and put the result in F as 1 (true) or 0 (false).
Comparison is unsigned. Thus PCOMP would test for EOS (-1) as "A> 255".
There are no !=, <=, or >= operators.

B<>A means swap B with A.
A must be the right operand. "A<>B" is not valid. When 32 and 8 bit values are swapped as in "*B<>A", the high bits are unchanged. ++ and -- increment and decrement as in C/C++ but must be written in postfix form. "++A" is not valid. Note that "*B++" increments *B, not B. ! means to complement all bits. Thus, "A!" means A = ~A; JT (jump if true), JF (jump if false), and JMP (jump) operands are relative to the next instruction in the range -128..127. Thus "A> 255 JT 1 A++" increments A not to exceed 256. A jump outside the range of the program is a run time error. LJ is a long jump. It is 3 bytes but the operand is written as a number in the range 0..65535 but not exceeding the size of the program. Thus, "A> 255 JT 3 LJ 0" jumps to the beginning of the program if A <= 255. The R registers can only be read or written, as in "R=A 3 B=R 3" which assigns A to R3, then R3 to B. These registers can only be assigned from A or to A, B, C, or D. ERROR causes an error like an undefined instruction, but is not reserved for future use (possibly in ZPAQ level 2) like other undefined instructions. HALT causes the program to end (and compression to resume). A program should always execute HALT. OUT in PCOMP outputs the low 8 bits of A as one byte to the file being extracted. In HCOMP it has no effect. HASH is equivalent to A = (A + *B + 512) * 773; HASHD is equivalent to *D = (*D + A + 512) * 773; These are convenient for computing context hashes that work well with the COMP components. They are not required, however. For example, "A+=*D A*= 12 *D=A" updates a rolling order s/2 context hash for an s-bit wide component pointed to by D. In general, an order ceil(s/k) hash can be updated by using a multiplier which is an odd multiple of 2^k. HASH and HASHD are not rolling hashes. They must be computed completely for each context. HASH is convenient when M is used as a history buffer. In most programs it is not necessary to code jump instructions. 
ZPAQL supports the following structured programming constructs:

  IF ... ENDIF              (execute ... if F is true)
  IF ... ELSE ... ENDIF     (execute 1st part if true, 2nd if false)
  IFNOT ... ENDIF           (execute ... if F is false)
  IFNOT ... ELSE ... ENDIF  (execute 1st part if false, 2nd if true)
  DO ... WHILE              (loop while true (test F at end))
  DO ... UNTIL              (loop while false)
  DO ... FOREVER            (loop forever)

These constructs may be nested 1000 deep. However IF statements and DO
loops nest independently and may be crossed. For example, the following
loop outputs a 0 terminated string pointed to by B by breaking out when
it finds a 0.

  DO
    A=*B A> 0 IF (JF endif)
      OUT B++
    FOREVER (JMP do)
  ENDIF

IF, IFNOT, and ELSE are coded as JF, JT and JMP respectively. They can
only jump over at most 127 instructions. If the code in these sections
is longer, then use the long forms IFL, IFNOTL, or ELSEL. These behave
the same but are coded using LJ instead.

There are no special forms for WHILE, UNTIL, or FOREVER. The compiler
will automatically use the long forms when needed.

Parameters
----------

In a config file, parameters may be passed as $1, $2, ..., $9. These are
replaced with numeric values passed on the command line. For example:

  zpaq cmax.cfg,3,4 archive files...

would have the effect of replacing $1 with 3 and $2 with 4. The default
value is 0, i.e. $3 through $9 are replaced with 0.

In addition, a parameter may have the form $N+M, where N is 1 through 9
and M is a number. The effect is to add M. For example, $2+10 would be
replaced with 14. Parameters may be used anywhere in the config file
where a number is allowed.

Pre/Post processing
-------------------

The PCOMP/POST section has the form:

  POST 0 END

to indicate no preprocessing or postprocessing, or

  PCOMP preprocessor-command ;
    (postprocessing code)
  END

to preprocess with an external program and to invert the transform with
postprocessing code written in ZPAQL. The preprocessing command must end
with a space followed by a semicolon.
The command may contain spaces or options. The program is expected to take as two additional arguments an input file and an output file. ZPAQ will call the program by appending the input file and a temporary file "%TEMP%\archive.zpaq.pre" formed by appending the extension to the archive name. If the program needs to save any state information then it should do so in a file named "%TEMP%\archive.zpaq.tmp" (i.e. replace ".pre" with ".tmp" in the output filename). ZPAQ will delete this file before compressing the first file to initialize the state of the preprocessor, and again after compressing the last file to clean up. It will also delete archive.zpaq.pre before and after compressing each file. Before each file is compressed, ZPAQ will verify that the transformed data in archive.zpaq.pre will be converted back to the original input file by inputting archive.zpaq.pre to the ZPAQL program in PCOMP and comparing its output to the original input. If the output is verified then the file is compressed. Otherwise it is skipped. The algorithm is: Command: zpaq [pnsitvo]ca[F][,N...]] archive inputfiles... 
  if F then compile header Z from F
  else use default header Z
  If "c" then open archive for write
  If "a" then open archive for append
  Delete archive.zpaq.tmp
  FIRST = true
  For each inputfile FILENAME loop
    Open FILENAME as IN
    If open fails then continue
    CHECK1 = SHA1(IN)
    SIZE = |IN|
    if Z.PCOMP then
      Close IN
      Delete archive.zpaq.pre
      Run Z.preprocessor-command FILENAME archive.zpaq.pre
      CHECK2 = SHA1(Z.PCOMP(archive.zpaq.pre, EOS))
      if CHECK1 != CHECK2 then continue
      Open archive.zpaq.pre as IN
    Else if Z.POST then
      Rewind IN
    If FIRST then
      Code start of block
      Code Z.COMP, Z.HCOMP
    Code start of segment
    If not "p" then strip path from FILENAME
    If not "n" then code FILENAME
    If not "i" then code SIZE as comment
    If FIRST then
      If Z.PCOMP then compress 1, |Z.PCOMP|, Z.PCOMP
      Else if Z.POST then compress 0
    Compress IN
    Close IN
    If not "s" then code CHECK1
    Code end of segment
    FIRST = false
  Code end of block
  Close archive
  Delete archive.zpaq.tmp, archive.zpaq.pre

Temporary files will be placed in %TEMP% in Windows or $TEMP in Linux.
Windows normally defines %TEMP% as a directory for temporary files.
If the environment variable TEMP is not set, then temporary files will
be placed in the current directory. To use /tmp in Linux, use the command
"setenv TEMP /tmp" before running ZPAQ.

Example: Suppose a preprocessor program, caesar.exe, implements a Caesar
cipher. It takes a number, input file, and output file as 3 arguments.
It encrypts by adding the number to each byte of the input file.
For example:

  caesar 3 book1 book1.enc

would encrypt book1 to book1.enc by changing A to D, B to E, etc.
To decrypt to book1.out:

  caesar -3 book1.enc book1.out

Then the following config file would use caesar.exe as a preprocessor
with a key of 5 and compress with a simple stationary order 0 model.

  COMP 0 0 0 0 1
    0 cm 9 255
  HCOMP
    halt
  PCOMP caesar 5 ;
    a> 255 jf 1 halt  (ignore EOS)
    a-= 5 out halt    (subtract 5 from each byte)
  END

The ZPAQL code inverts the transform by subtracting 5 from each byte.
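The Caesar example can be traced with a small C++ sketch. It assumes a
stand-in function for the external caesar program and a function that
mirrors the two lines of PCOMP code. Both names are hypothetical, and the
round trip check compares bytes directly where ZPAQ compares SHA1
checksums.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::vector<std::uint8_t> Bytes;

const std::uint32_t EOS = 0xFFFFFFFFu;  // -1 as a 32 bit value

// Stand-in for the external preprocessor: add key to each byte.
Bytes caesar(const Bytes& in, int key) {
  Bytes out;
  for (std::size_t i = 0; i < in.size(); ++i)
    out.push_back(std::uint8_t(in[i] + key));
  return out;
}

// Mirrors the PCOMP code "a> 255 jf 1 halt  a-= 5 out halt".
// It is called once per byte and once with EOS at the end.
void postprocess(std::uint32_t a, Bytes& out) {
  if (a > 255) return;                   // EOS: halt without output
  a -= 5;                                // arithmetic is modulo 2^32
  out.push_back(std::uint8_t(a & 255));  // OUT writes the low 8 bits of A
}

// Round trip in the spirit of the verification step described above.
bool round_trip_ok(const Bytes& original) {
  const Bytes pre = caesar(original, 5);
  Bytes restored;
  for (std::size_t i = 0; i < pre.size(); ++i) postprocess(pre[i], restored);
  postprocess(EOS, restored);
  return restored == original;
}
```

Bytes that wrap around (for example 255, which the preprocessor turns
into 4) are restored correctly because OUT keeps only the low 8 bits.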
During decompression, the code is called once for each (transformed)
decompressed byte in the A register, and once with EOS (0xFFFFFFFF)
at the end of file, which is ignored.

To compile
----------

  g++ -O2 -march=pentiumpro -fomit-frame-pointer -s zpaq.cpp -o zpaq

To turn off run time checks for better speed, compile with -DNDEBUG
If linking to optimized code generated by "oc", "oa", or "ox" then
compile with -DOPT. This also removes some features to save space.

*/

#include "zpaq.h"
#include
#include
#include

// Print an error message and exit
void error(const char* msg="") {
#ifdef OPT
  fprintf(stderr, "\nOPT error: %s\n", msg);
#else
  fprintf(stderr, "\nError: %s\n", msg);
#endif
  exit(1);
}

// Append string s to array a, enlarging as needed
void append(Array<char>& a, const char* s) {
  if (!s) return;
  if (!a.size()) a.resize(strlen(s)+1);
  int len=strlen(&a[0])+strlen(s)+1;
  if (len>a.size()) {
    Array<char> tmp(a.size());
    strcpy(&tmp[0], &a[0]);
    a.resize(len*5/4+64);
    strcpy(&a[0], &tmp[0]);
  }
  strcat(&a[0], s);
}

//////////////////////////// SHA-1 //////////////////////////////

// The SHA1 class is used to compute segment checksums.
// SHA-1 code modified from RFC 3174.
// http://www.faqs.org/rfcs/rfc3174.html

int SHA1::result(int i) {
  assert(i>=0 && i<20);
  if (!Computed && shaSuccess != SHA1Result(result_buf))
    error("SHA1 failed\n");
  return result_buf[i];
}

/*
 *  SHA1Reset
 *
 *  Description:
 *      This function will initialize the SHA1Context in preparation
 *      for computing a new SHA1 message digest.
 *
 *  Parameters: none
 *
 *  Returns:
 *      sha Error Code.
 *
 */
int SHA1::SHA1Reset()
{
  Length_Low             = 0;
  Length_High            = 0;
  Message_Block_Index    = 0;

  Intermediate_Hash[0]   = 0x67452301;
  Intermediate_Hash[1]   = 0xEFCDAB89;
  Intermediate_Hash[2]   = 0x98BADCFE;
  Intermediate_Hash[3]   = 0x10325476;
  Intermediate_Hash[4]   = 0xC3D2E1F0;

  Computed   = 0;
  Corrupted  = 0;

  return shaSuccess;
}

/*
 *  SHA1Result
 *
 *  Description:
 *      This function will return the 160-bit message digest into the
 *      Message_Digest array provided by the caller.
 *      NOTE: The first octet of hash is stored in the 0th element,
 *            the last octet of hash in the 19th element.
 *
 *  Parameters:
 *      Message_Digest: [out]
 *          Where the digest is returned.
 *
 *  Returns:
 *      sha Error Code.
 *
 */
int SHA1::SHA1Result(U8 Message_Digest[SHA1HashSize])
{
  int i;

  if (!Message_Digest)
  {
    return shaNull;
  }

  if (Corrupted)
  {
    return Corrupted;
  }

  if (!Computed)
  {
    SHA1PadMessage();
    for(i=0; i<64; ++i)
    {
      /* message may be sensitive, clear it out */
      Message_Block[i] = 0;
    }
//    Length_Low = 0;    /* and DON'T clear length */
//    Length_High = 0;
    Computed = 1;
  }

  for(i = 0; i < SHA1HashSize; ++i)
  {
    Message_Digest[i] = Intermediate_Hash[i>>2]
                        >> 8 * ( 3 - ( i & 0x03 ) );
  }

  return shaSuccess;
}

/*
 *  SHA1Input
 *
 *  Description:
 *      This function accepts an array of octets as the next portion
 *      of the message.
 *
 *  Parameters:
 *      message_array: [in]
 *          An array of characters representing the next portion of
 *          the message.
 *      length: [in]
 *          The length of the message in message_array
 *
 *  Returns:
 *      sha Error Code.
 *
 */
int SHA1::SHA1Input(const U8 *message_array, unsigned length)
{
  if (!length)
  {
    return shaSuccess;
  }

  if (!message_array)
  {
    return shaNull;
  }

  if (Computed)
  {
    Corrupted = shaStateError;
    return shaStateError;
  }

  if (Corrupted)
  {
    return Corrupted;
  }
  while(length-- && !Corrupted)
  {
    Message_Block[Message_Block_Index++] = (*message_array & 0xFF);

    Length_Low += 8;
    if (Length_Low == 0)
    {
      Length_High++;
      if (Length_High == 0)
      {
        /* Message is too long */
        Corrupted = 1;
      }
    }

    if (Message_Block_Index == 64)
    {
      SHA1ProcessMessageBlock();
    }

    message_array++;
  }

  return shaSuccess;
}

/*
 *  SHA1ProcessMessageBlock
 *
 *  Description:
 *      This function will process the next 512 bits of the message
 *      stored in the Message_Block array.
 *
 *  Parameters:
 *      None.
 *
 *  Returns:
 *      Nothing.
 *
 *  Comments:
 *      Many of the variable names in this code, especially the
 *      single character names, were used because those were the
 *      names used in the publication.
 *
 *
 */
void SHA1::SHA1ProcessMessageBlock()
{
  const U32 K[] =    {       /* Constants defined in SHA-1   */
                       0x5A827999,
                       0x6ED9EBA1,
                       0x8F1BBCDC,
                       0xCA62C1D6
                     };
  int      t;                 /* Loop counter                */
  U32      temp;              /* Temporary word value        */
  U32      W[80];             /* Word sequence               */
  U32      A, B, C, D, E;     /* Word buffers                */

  /*
   *  Initialize the first 16 words in the array W
   */
  for(t = 0; t < 16; t++)
  {
    W[t] = Message_Block[t * 4] << 24;
    W[t] |= Message_Block[t * 4 + 1] << 16;
    W[t] |= Message_Block[t * 4 + 2] << 8;
    W[t] |= Message_Block[t * 4 + 3];
  }

  for(t = 16; t < 80; t++)
  {
    W[t] = SHA1CircularShift(1,W[t-3] ^ W[t-8] ^ W[t-14] ^ W[t-16]);
  }

  A = Intermediate_Hash[0];
  B = Intermediate_Hash[1];
  C = Intermediate_Hash[2];
  D = Intermediate_Hash[3];
  E = Intermediate_Hash[4];

  for(t = 0; t < 20; t++)
  {
    temp =  SHA1CircularShift(5,A) +
            ((B & C) | ((~B) & D)) + E + W[t] + K[0];
    E = D;
    D = C;
    C = SHA1CircularShift(30,B);
    B = A;
    A = temp;
  }

  for(t = 20; t < 40; t++)
  {
    temp = SHA1CircularShift(5,A) + (B ^ C ^ D) + E + W[t] + K[1];
    E = D;
    D = C;
    C = SHA1CircularShift(30,B);
    B = A;
    A = temp;
  }

  for(t = 40; t < 60; t++)
  {
    temp = SHA1CircularShift(5,A) +
           ((B & C) | (B & D) | (C & D)) + E + W[t] + K[2];
    E = D;
    D = C;
    C = SHA1CircularShift(30,B);
    B = A;
    A = temp;
  }

  for(t = 60; t < 80; t++)
  {
    temp = SHA1CircularShift(5,A) + (B ^ C ^ D) + E + W[t] + K[3];
    E = D;
    D = C;
    C = SHA1CircularShift(30,B);
    B = A;
    A = temp;
  }

  Intermediate_Hash[0] += A;
  Intermediate_Hash[1] += B;
  Intermediate_Hash[2] += C;
  Intermediate_Hash[3] += D;
  Intermediate_Hash[4] += E;

  Message_Block_Index = 0;
}

/*
 *  SHA1PadMessage
 *
 *  Description:
 *      According to the standard, the message must be padded to an even
 *      512 bits.  The first padding bit must be a '1'.  The last 64
 *      bits represent the length of the original message.  All bits in
 *      between should be 0.  This function will pad the message
 *      according to those rules by filling the Message_Block array
 *      accordingly.  It will also call the ProcessMessageBlock function
 *      provided appropriately.  When it returns, it can be assumed that
 *      the message digest has been computed.
 *
 *  Parameters:
 *      ProcessMessageBlock: [in]
 *          The appropriate SHA*ProcessMessageBlock function
 *  Returns:
 *      Nothing.
 *
 */
void SHA1::SHA1PadMessage()
{
  /*
   *  Check to see if the current message block is too small to hold
   *  the initial padding bits and length.  If so, we will pad the
   *  block, process it, and then continue padding into a second
   *  block.
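The one-block versus two-block decision described in this comment can be
sketched in isolation as follows. The helper names sha1_pad_blocks and
sha1_padded_size are hypothetical and not part of the SHA1 class; the
arithmetic assumes only the rules stated above (one 0x80 marker byte, an
8 octet length field, and 64 byte blocks).

```cpp
// Number of 64-byte blocks still to be processed during padding, given the
// count of message bytes already buffered in the current block (0..63).
int sha1_pad_blocks(int message_block_index) {
  // The 0x80 marker and the 8 length octets need 9 free bytes, so an
  // index of 55 or less fits in the current block.
  return message_block_index > 55 ? 2 : 1;
}

// Total padded length in bytes for a message of n bytes:
// n + 1 marker byte + 8 length bytes, rounded up to a multiple of 64.
unsigned long long sha1_padded_size(unsigned long long n) {
  return (n + 1 + 8 + 63) / 64 * 64;
}
```

For instance, a 55 byte message pads to a single 64 byte block, while a
56 byte message needs a second block to hold the length field.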
   */
  if (Message_Block_Index > 55)
  {
    Message_Block[Message_Block_Index++] = 0x80;
    while(Message_Block_Index < 64)
    {
      Message_Block[Message_Block_Index++] = 0;
    }

    SHA1ProcessMessageBlock();

    while(Message_Block_Index < 56)
    {
      Message_Block[Message_Block_Index++] = 0;
    }
  }
  else
  {
    Message_Block[Message_Block_Index++] = 0x80;
    while(Message_Block_Index < 56)
    {
      Message_Block[Message_Block_Index++] = 0;
    }
  }

  /*
   *  Store the message length as the last 8 octets
   */
  Message_Block[56] = Length_High >> 24;
  Message_Block[57] = Length_High >> 16;
  Message_Block[58] = Length_High >> 8;
  Message_Block[59] = Length_High;
  Message_Block[60] = Length_Low >> 24;
  Message_Block[61] = Length_Low >> 16;
  Message_Block[62] = Length_Low >> 8;
  Message_Block[63] = Length_Low;

  SHA1ProcessMessageBlock();
}

//////////////////////////// ZPAQL //////////////////////////////

// Symbolic constants, instruction size, and names
typedef enum {NONE,CONST,CM,ICM,MATCH,AVG,MIX2,MIX,ISSE,SSE,
  JT=39,JF=47,JMP=63,LJ=255,
  POST=256,PCOMP,END,IF,IFNOT,ELSE,ENDIF,DO,
  WHILE,UNTIL,FOREVER,IFL,IFNOTL,ELSEL,SEMICOLON} CompType;
static const int compsize[256]={0,2,3,2,3,4,6,6,3,5};
bool verbose=false;  // global: display lots of stuff?
bool quiet=false;    // global: display less stuff?
static const char* compname[]=
  {"","const","cm","icm","match","avg","mix2","mix","isse","sse",0};

#ifndef OPT

// Opcodes from ZPAQ spec, table 1, without operands (N, M).
static const char* opcodelist[272]={ "error","a++", "a--", "a!", "a=0", "", "", "a=r", "b<>a", "b++", "b--", "b!", "b=0", "", "", "b=r", "c<>a", "c++", "c--", "c!", "c=0", "", "", "c=r", "d<>a", "d++", "d--", "d!", "d=0", "", "", "d=r", "*b<>a","*b++", "*b--", "*b!", "*b=0", "", "", "jt", "*c<>a","*c++", "*c--", "*c!", "*c=0", "", "", "jf", "*d<>a","*d++", "*d--", "*d!", "*d=0", "", "", "r=a", "halt", "out", "", "hash", "hashd","", "", "jmp", "a=a", "a=b", "a=c", "a=d", "a=*b", "a=*c", "a=*d", "a=", "b=a", "b=b", "b=c", "b=d", "b=*b", "b=*c", "b=*d", "b=", "c=a", "c=b", "c=c", "c=d", "c=*b", "c=*c", "c=*d", "c=", "d=a", "d=b", "d=c", "d=d", "d=*b", "d=*c", "d=*d", "d=", "*b=a", "*b=b", "*b=c", "*b=d", "*b=*b","*b=*c","*b=*d","*b=", "*c=a", "*c=b", "*c=c", "*c=d", "*c=*b","*c=*c","*c=*d","*c=", "*d=a", "*d=b", "*d=c", "*d=d", "*d=*b","*d=*c","*d=*d","*d=", "", "", "", "", "", "", "", "", "a+=a", "a+=b", "a+=c", "a+=d", "a+=*b","a+=*c","a+=*d","a+=", "a-=a", "a-=b", "a-=c", "a-=d", "a-=*b","a-=*c","a-=*d","a-=", "a*=a", "a*=b", "a*=c", "a*=d", "a*=*b","a*=*c","a*=*d","a*=", "a/=a", "a/=b", "a/=c", "a/=d", "a/=*b","a/=*c","a/=*d","a/=", "a%=a", "a%=b", "a%=c", "a%=d", "a%=*b","a%=*c","a%=*d","a%=", "a&=a", "a&=b", "a&=c", "a&=d", "a&=*b","a&=*c","a&=*d","a&=", "a&~a", "a&~b", "a&~c", "a&~d", "a&~*b","a&~*c","a&~*d","a&~", "a|=a", "a|=b", "a|=c", "a|=d", "a|=*b","a|=*c","a|=*d","a|=", "a^=a", "a^=b", "a^=c", "a^=d", "a^=*b","a^=*c","a^=*d","a^=", "a<<=a","a<<=b","a<<=c","a<<=d","a<<=*b","a<<=*c","a<<=*d","a<<=", "a>>=a","a>>=b","a>>=c","a>>=d","a>>=*b","a>>=*c","a>>=*d","a>>=", "a==a", "a==b", "a==c", "a==d", "a==*b","a==*c","a==*d","a==", "aa", "a>b", "a>c", "a>d", "a>*b", "a>*c", "a>*d", "a>", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "lj", "post", "pcomp","end", "if", "ifnot","else", "endif","do", "while","until","forever","ifl","ifnotl","elsel",";", 0}; #endif // Constructor ZPAQL::ZPAQL() { cend=hbegin=hend=0; // COMP and HCOMP locations 
a=b=c=d=f=pc=0; // machine state output=0; sha1=0; select=0; } // Read header, return number of bytes read int ZPAQL::read(Reader r) { // Get header size and allocate int hsize=r.get(); hsize+=r.get()*256; header.resize(hsize+300); cend=hbegin=hend=0; header[cend++]=hsize&255; header[cend++]=hsize>>8; while (cend<7) header[cend++]=r.get(); // hh hm ph pm n // Read COMP int n=header[cend-1]; for (int i=0; iheader.size()-8) error("COMP list too big"); for (int j=1; j=7 && cendhbegin && hend=7 && cendhbegin && hend2) return; const U8* list=select==1?zlist:pzlist; int hsize=list[0]+256*list[1]; if (hsize!=cend+hend-hbegin-2 || memcmp(&header[0], list, cend) || memcmp(&header[hbegin], list+cend, hend-hbegin)) error("block header verify"); #endif } // Initialize machine state as HCOMP void ZPAQL::inith() { assert(header.size()>6); init(header[2], header[3]); // hh, hm } // Initialize machine state as PCOMP void ZPAQL::initp() { assert(header.size()>6); init(header[4], header[5]); // ph, pm } // Initialize machine state to run a program. // Set select to nonzero if header matches anything in the cache // or else add it. void ZPAQL::init(int hbits, int mbits) { assert(header.size()>0); assert(h.size()==0); assert(m.size()==0); assert(cend>=7); assert(hbegin>=cend+128); assert(hend>=hbegin); assert(hend6); assert(hbegin>=cend+128); assert(hend>=hbegin); assert(hend0); assert(h.size()>0); assert(header[0]+256*header[1]==cend+hend-hbegin-2); pc=hbegin; a=input; #ifdef OPT error("no model"); #else while (execute()) ; #endif } #ifndef OPT // Execute program input and show progress void ZPAQL::step(U32 input, bool ishex) { assert(cend>6); assert(hbegin>=cend+128); assert(hend>=hbegin); assert(hend0); assert(h.size()>0); pc=hbegin; a=input; printf("\n" " pc opcode f a b *b c *c d *d\n" "----- -------- - ---------- ---------- --- ---------- --- ---------- ----------\n"); printf(ishex ? 
" %d %08X %08X %02X %08X %02X %08X %08X\n" : " %d %10u %10u %3u %10u %3u %10u %10u\n", f, a, b, m(b), c, m(c), d, h(d)); while (1) { assert(pc>=cend && pc) { chomp; $code++; if ($_ ne "") { $comment=$_; s/ N$/N/; if (/^([ABCD])(=)(R)/) {($a,$op,$b)=($1,$2,$3);} elsif (/^(R)(=)(A)/) {($a,$op,$b)=($1,$2,$3);} elsif (/^(\*?[ABCD])(\W*)(\*[ABCDN0])$/) {($a,$op,$b)=($1,$2,$3);} elsif (/^(\*?[ABCD])(\W*)([ABCDN0])$/) {($a,$op,$b)=($1,$2,$3);} elsif (/^(\*?[ABCD])(\W*)$/) {($a,$op,$b)=($1,$2);} else {($a,$op,$b)=($_);} $a=~tr/A-Z/a-z/; $b=~tr/A-Z/a-z/; $a=~s/\*([bc])/m($1)/; $b=~s/\*([bc])/m($1)/; $a=~s/\*d/h(d)/; $b=~s/\*d/h(d)/; $b=~s/n/header[pc++]/; $op=~s/&~/&= ~/; $a=~s/error//; $a=~s/halt/return 0/; print(" case $code: "); if ($a eq "jtn") {print"if (f) $go; else ++pc;";} elsif ($a eq "lj n m") {print"if((pc=hbegin+header[pc]+256*header[pc+1])>=hend)err();";} elsif ($a eq "jfn") {print"if (!f) $go; else ++pc;";} elsif ($a eq "jmpn") {print"$go;";} elsif ($a eq "out") {print"if (output) putc(a, output); if (sha1) sha1->put(a);";} elsif ($a eq "hash") {print"a = (a+m(b)+512)*773;"} elsif ($a eq "hashd") {print"h(d) = (h(d)+a+512)*773;"} elsif ($op eq "<>") {print"swap($a);";} elsif ($op eq "==" || $op eq "<" || $op eq ">") {print"f = ($a $op $b);";} elsif ($op eq "++" || $op eq "--") {print"$op$a;";} elsif ($op eq "!") {print"$a = ~$a;";} elsif ($op eq ".=") {print"$a = ($a<<8)+$b;";} elsif ($op eq "/=") {print"div($b);";} elsif ($op eq "%=") {print"mod($b);";} elsif ($b eq "r") {print"$a = r[header[pc++]];";} elsif ($a eq "r") {print"r[header[pc++]] = $b;";} elsif ($a) {print("$a $op $b;");} else {print"err();";} if ($a ne "return 0") {print" break;"} if ($comment eq "") {$comment="undefined";} print" // $comment\n"; } } print" default: err();\n }\n"; */ switch(header[pc++]) { case 0: err(); break; // ERROR case 1: ++a; break; // A++ case 2: --a; break; // A-- case 3: a = ~a; break; // A! 
case 4: a = 0; break; // A=0 case 7: a = r[header[pc++]]; break; // A=R N case 8: swap(b); break; // B<>A case 9: ++b; break; // B++ case 10: --b; break; // B-- case 11: b = ~b; break; // B! case 12: b = 0; break; // B=0 case 15: b = r[header[pc++]]; break; // B=R N case 16: swap(c); break; // C<>A case 17: ++c; break; // C++ case 18: --c; break; // C-- case 19: c = ~c; break; // C! case 20: c = 0; break; // C=0 case 23: c = r[header[pc++]]; break; // C=R N case 24: swap(d); break; // D<>A case 25: ++d; break; // D++ case 26: --d; break; // D-- case 27: d = ~d; break; // D! case 28: d = 0; break; // D=0 case 31: d = r[header[pc++]]; break; // D=R N case 32: swap(m(b)); break; // *B<>A case 33: ++m(b); break; // *B++ case 34: --m(b); break; // *B-- case 35: m(b) = ~m(b); break; // *B! case 36: m(b) = 0; break; // *B=0 case 39: if (f) pc+=((header[pc]+128)&255)-127; else ++pc; break; // JT N case 40: swap(m(c)); break; // *C<>A case 41: ++m(c); break; // *C++ case 42: --m(c); break; // *C-- case 43: m(c) = ~m(c); break; // *C! case 44: m(c) = 0; break; // *C=0 case 47: if (!f) pc+=((header[pc]+128)&255)-127; else ++pc; break; // JF N case 48: swap(h(d)); break; // *D<>A case 49: ++h(d); break; // *D++ case 50: --h(d); break; // *D-- case 51: h(d) = ~h(d); break; // *D! 
case 52: h(d) = 0; break; // *D=0 case 55: r[header[pc++]] = a; break; // R=A N case 56: return 0 ; // HALT case 57: if (output) putc(a, output); if (sha1) sha1->put(a); break; // OUT case 59: a = (a+m(b)+512)*773; break; // HASH case 60: h(d) = (h(d)+a+512)*773; break; // HASHD case 63: pc+=((header[pc]+128)&255)-127; break; // JMP N case 64: a = a; break; // A=A case 65: a = b; break; // A=B case 66: a = c; break; // A=C case 67: a = d; break; // A=D case 68: a = m(b); break; // A=*B case 69: a = m(c); break; // A=*C case 70: a = h(d); break; // A=*D case 71: a = header[pc++]; break; // A= N case 72: b = a; break; // B=A case 73: b = b; break; // B=B case 74: b = c; break; // B=C case 75: b = d; break; // B=D case 76: b = m(b); break; // B=*B case 77: b = m(c); break; // B=*C case 78: b = h(d); break; // B=*D case 79: b = header[pc++]; break; // B= N case 80: c = a; break; // C=A case 81: c = b; break; // C=B case 82: c = c; break; // C=C case 83: c = d; break; // C=D case 84: c = m(b); break; // C=*B case 85: c = m(c); break; // C=*C case 86: c = h(d); break; // C=*D case 87: c = header[pc++]; break; // C= N case 88: d = a; break; // D=A case 89: d = b; break; // D=B case 90: d = c; break; // D=C case 91: d = d; break; // D=D case 92: d = m(b); break; // D=*B case 93: d = m(c); break; // D=*C case 94: d = h(d); break; // D=*D case 95: d = header[pc++]; break; // D= N case 96: m(b) = a; break; // *B=A case 97: m(b) = b; break; // *B=B case 98: m(b) = c; break; // *B=C case 99: m(b) = d; break; // *B=D case 100: m(b) = m(b); break; // *B=*B case 101: m(b) = m(c); break; // *B=*C case 102: m(b) = h(d); break; // *B=*D case 103: m(b) = header[pc++]; break; // *B= N case 104: m(c) = a; break; // *C=A case 105: m(c) = b; break; // *C=B case 106: m(c) = c; break; // *C=C case 107: m(c) = d; break; // *C=D case 108: m(c) = m(b); break; // *C=*B case 109: m(c) = m(c); break; // *C=*C case 110: m(c) = h(d); break; // *C=*D case 111: m(c) = header[pc++]; break; // *C= N 
case 112: h(d) = a; break; // *D=A case 113: h(d) = b; break; // *D=B case 114: h(d) = c; break; // *D=C case 115: h(d) = d; break; // *D=D case 116: h(d) = m(b); break; // *D=*B case 117: h(d) = m(c); break; // *D=*C case 118: h(d) = h(d); break; // *D=*D case 119: h(d) = header[pc++]; break; // *D= N case 128: a += a; break; // A+=A case 129: a += b; break; // A+=B case 130: a += c; break; // A+=C case 131: a += d; break; // A+=D case 132: a += m(b); break; // A+=*B case 133: a += m(c); break; // A+=*C case 134: a += h(d); break; // A+=*D case 135: a += header[pc++]; break; // A+= N case 136: a -= a; break; // A-=A case 137: a -= b; break; // A-=B case 138: a -= c; break; // A-=C case 139: a -= d; break; // A-=D case 140: a -= m(b); break; // A-=*B case 141: a -= m(c); break; // A-=*C case 142: a -= h(d); break; // A-=*D case 143: a -= header[pc++]; break; // A-= N case 144: a *= a; break; // A*=A case 145: a *= b; break; // A*=B case 146: a *= c; break; // A*=C case 147: a *= d; break; // A*=D case 148: a *= m(b); break; // A*=*B case 149: a *= m(c); break; // A*=*C case 150: a *= h(d); break; // A*=*D case 151: a *= header[pc++]; break; // A*= N case 152: div(a); break; // A/=A case 153: div(b); break; // A/=B case 154: div(c); break; // A/=C case 155: div(d); break; // A/=D case 156: div(m(b)); break; // A/=*B case 157: div(m(c)); break; // A/=*C case 158: div(h(d)); break; // A/=*D case 159: div(header[pc++]); break; // A/= N case 160: mod(a); break; // A%=A case 161: mod(b); break; // A%=B case 162: mod(c); break; // A%=C case 163: mod(d); break; // A%=D case 164: mod(m(b)); break; // A%=*B case 165: mod(m(c)); break; // A%=*C case 166: mod(h(d)); break; // A%=*D case 167: mod(header[pc++]); break; // A%= N case 168: a &= a; break; // A&=A case 169: a &= b; break; // A&=B case 170: a &= c; break; // A&=C case 171: a &= d; break; // A&=D case 172: a &= m(b); break; // A&=*B case 173: a &= m(c); break; // A&=*C case 174: a &= h(d); break; // A&=*D case 175: a 
&= header[pc++]; break; // A&= N case 176: a &= ~ a; break; // A&~A case 177: a &= ~ b; break; // A&~B case 178: a &= ~ c; break; // A&~C case 179: a &= ~ d; break; // A&~D case 180: a &= ~ m(b); break; // A&~*B case 181: a &= ~ m(c); break; // A&~*C case 182: a &= ~ h(d); break; // A&~*D case 183: a &= ~ header[pc++]; break; // A&~ N case 184: a |= a; break; // A|=A case 185: a |= b; break; // A|=B case 186: a |= c; break; // A|=C case 187: a |= d; break; // A|=D case 188: a |= m(b); break; // A|=*B case 189: a |= m(c); break; // A|=*C case 190: a |= h(d); break; // A|=*D case 191: a |= header[pc++]; break; // A|= N case 192: a ^= a; break; // A^=A case 193: a ^= b; break; // A^=B case 194: a ^= c; break; // A^=C case 195: a ^= d; break; // A^=D case 196: a ^= m(b); break; // A^=*B case 197: a ^= m(c); break; // A^=*C case 198: a ^= h(d); break; // A^=*D case 199: a ^= header[pc++]; break; // A^= N case 200: a <<= (a&31); break; // A<<=A case 201: a <<= (b&31); break; // A<<=B case 202: a <<= (c&31); break; // A<<=C case 203: a <<= (d&31); break; // A<<=D case 204: a <<= (m(b)&31); break; // A<<=*B case 205: a <<= (m(c)&31); break; // A<<=*C case 206: a <<= (h(d)&31); break; // A<<=*D case 207: a <<= (header[pc++]&31); break; // A<<= N case 208: a >>= (a&31); break; // A>>=A case 209: a >>= (b&31); break; // A>>=B case 210: a >>= (c&31); break; // A>>=C case 211: a >>= (d&31); break; // A>>=D case 212: a >>= (m(b)&31); break; // A>>=*B case 213: a >>= (m(c)&31); break; // A>>=*C case 214: a >>= (h(d)&31); break; // A>>=*D case 215: a >>= (header[pc++]&31); break; // A>>= N case 216: f = (a == a); break; // A==A case 217: f = (a == b); break; // A==B case 218: f = (a == c); break; // A==C case 219: f = (a == d); break; // A==D case 220: f = (a == U32(m(b))); break; // A==*B case 221: f = (a == U32(m(c))); break; // A==*C case 222: f = (a == h(d)); break; // A==*D case 223: f = (a == U32(header[pc++])); break; // A== N case 224: f = (a < a); break; // A a); break; 
// A>A case 233: f = (a > b); break; // A>B case 234: f = (a > c); break; // A>C case 235: f = (a > d); break; // A>D case 236: f = (a > U32(m(b))); break; // A>*B case 237: f = (a > U32(m(c))); break; // A>*C case 238: f = (a > h(d)); break; // A>*D case 239: f = (a > U32(header[pc++])); break; // A> N case 255: if((pc=hbegin+header[pc]+256*header[pc+1])>=hend)err();break;//LJ default: err(); } return 1; } #endif // Print illegal instruction error message and exit void ZPAQL::err() { --pc; fprintf(stderr, "\nExecution aborted: pc=%d a=%d b=%d->%d c=%d->%d d=%d->%d\n", pc-hbegin, a, b, m(b), c, m(c), d, h(d)); if (pc>=hbegin && pc0) { c=getc(in); if (c=='(') ++paren; if (c==')') --paren, c=' '; if (c==EOF) return 0; } // read token separated by whitespace do { if (lowercase && isupper(c)) c=tolower(c); s[len++]=c; } while (len<511 && (c=getc(in))!=EOF && c>' '); s[len++]=0; if (verbose) printf("%s ", s); // Substitute parameters $1..$9 with args[0..8], $i+n with args[i-1]+n if (s[0]=='$' && s[1]>='1' && s[1]<='9') { int i=s[1]-'1'; assert(i>=0 && i<9); int val=args[i]; if (s[2]=='+') val+=atoi(s+3); sprintf(s, "%d", val); if (verbose) printf("(%s) ", s); } return s; } // Read a token, which must be in the NULL terminated list or else // exit with an error. If found, return its index. 
int rtoken(FILE* in, const char* list[]) { assert(in); assert(list); const char* tok=token(in); if (!tok) fprintf(stderr, "\nUnexpected end of configuration file\n"), exit(1); for (int i=0; list[i]; ++i) if (!strcmp(list[i], tok)) return i; fprintf(stderr, "\nConfiguration file error at %s\n", tok), exit(1); assert(0); return -1; // not reached } // Read a token which must be the specified value s void rtoken(FILE* in, const char* s) { assert(s); const char* t=token(in); if (!t) fprintf(stderr, "\nExpected %s, found EOF\n", s), exit(1); if (strcmp(s, t)) fprintf(stderr, "\nExpected %s, found %s\n", s, t), exit(1); } // Read a number in (low...high) or exit with an error int rtoken(FILE* in, int low, int high) { const char* tok=token(in); if (!tok) fprintf(stderr, "\nUnexpected end of configuration file\n"), exit(1); int n=0; const char* p=tok; int sign=1; if (*p=='-') sign=-1, ++p; while (*p) { if (isdigit(*p)) n=n*10+*p-'0'; else fprintf(stderr, "\nConfiguration file error at %s: expected a number\n", tok), exit(1); ++p; } n*=sign; if (n>=low && n<=high) return n; fprintf(stderr, "\nConfiguration file error: expected (%d...%d), found %d\n", low, high, n); exit(1); return 0; } // Stack of n elements of type T template class Stack { Array s; int top; public: Stack(int n): s(n), top(0) {} void push(const T& x) { if (top>=s.size()) error("stack full"); s[top++]=x; } T pop() { if (top<=0) error("stack empty"); return s[--top]; } }; // Compile HCOMP or PCOMP code. Exit on error. 
Return // code for end token (POST, PCOMP, END) CompType compile_comp(FILE *in, ZPAQL& z) { int op=0; Stack if_stack(1000), do_stack(1000); // IF, DO saved addresses if (verbose) printf("\n"); int indent=0; // program listing indentation while (z.hend<0x10000) { if (verbose) { printf("(%4d) ", z.hend-z.hbegin); for (int i=0; iz.hbegin && a=0); if (j>127) error("IF too big, try IFL, IFNOTL"); z.header[a]=j; if (verbose) printf("((%d) %s %d (to %d)) ", a-z.hbegin-1, opcodelist[z.header[a-1]], j, z.hend-z.hbegin+2); } else { // IFL, IFNOTL int j=z.hend-z.hbegin+2+(op==LJ); assert(j>=0); z.header[a]=j&255; z.header[a+1]=(j>>8)&255; if (verbose) printf("((%d) lj %d) ", a-z.hbegin-1, j); } if_stack.push(z.hend+1); // save JMP target location } else if (op==ENDIF) { int a=if_stack.pop(); // jump target address assert(a>z.hbegin && a=0); if (z.header[a-1]!=LJ) { assert(z.header[a-1]==JT || z.header[a-1]==JF || z.header[a-1]==JMP); if (j>127) error("IF too big, try IFL, IFNOTL, ELSEL\n"); z.header[a]=j; if (verbose) printf("((%d) %s %d (to %d))\n", a-z.hbegin-1, opcodelist[z.header[a-1]], j, z.hend-z.hbegin); } else { j=z.hend-z.hbegin; z.header[a]=j&255; z.header[a+1]=(j>>8)&255; if (verbose) printf("((%d) lj %d)\n", a-1, j); } --indent; } else if (op==DO) { do_stack.push(z.hend); if (verbose) printf("\n"); ++indent; } else if (op==WHILE || op==UNTIL || op==FOREVER) { int a=do_stack.pop(); assert(a>=z.hbegin && a=-127) { // backward short jump if (op==WHILE) op=JT; if (op==UNTIL) op=JF; if (op==FOREVER) op=JMP; operand=j&255; if (verbose) printf("(%s %d (to %d)) ", opcodelist[op], j, z.hend-z.hbegin+2+j); } else { // backward long jump j=a-z.hbegin; assert(j>=0 && j>8; if (verbose) printf("(lj %d) ", j); } --indent; } else if ((op&7)==7) { // 2 byte operand, read N if (op==LJ) { operand=rtoken(in, 0, 65535); operand2=operand>>8; operand&=255; if (verbose) printf("(to %d) ", operand+256*operand2); } else if (op==JT || op==JF || op==JMP) { operand=rtoken(in, -128, 127); if 
(verbose) printf("(to %d) ", z.hend-z.hbegin+2+operand); operand&=255; } else operand=rtoken(in, 0, 255); } if (verbose) { if (operand2>=0) printf("(%d %d %d)\n", op, operand, operand2); else if (operand>=0) printf("(%d %d)\n", op, operand); else if (op>=0 && op<=255) printf("(%d)\n", op); } if (op>=0 && op<=255) z.header[z.hend++]=op; if (operand>=0) z.header[z.hend++]=operand; if (operand2>=0) z.header[z.hend++]=operand2; if (z.hend-z.hbegin>=0x10000 || z.hend>z.header.size()-144) error("program too big"); } z.header[z.hend++]=0; // END return CompType(op); } // Compile a configuration file. Store COMP/HCOMP section in z. // If there is a PCOMP section, store it in pz and store the PCOMP // command in pcomp_cmd. Replace "$1..$9+n" with args[0..8]+n void compile(FILE* in, ZPAQL& z, ZPAQL& pz, Array& pcomp_cmd, int args[]) { // Allocate header z.header.resize(0x11000); // Compile the COMP section of header z.cend=z.hbegin=z.hend=2; rtoken(in, "comp"); z.header[z.cend++]=rtoken(in, 0, 255); // hh z.header[z.cend++]=rtoken(in, 0, 255); // hm z.header[z.cend++]=rtoken(in, 0, 255); // ph z.header[z.cend++]=rtoken(in, 0, 255); // pm int n=z.header[z.cend++]=rtoken(in, 0, 255); // n if (verbose) printf("\n"); for (int i=0; i0 && clen<10); for (int j=1; j=0x10000) printf("\nProgram too big\n"), exit(1); // Compute header size int hsize=z.hend-z.hbegin+z.cend-2; z.header[0]=hsize&255; z.header[1]=hsize>>8; // Compile POST 0 END if (op==POST) { rtoken(in, 0, 0); rtoken(in, "end"); } // Compile PCOMP pcomp_cmd\n program... 
END else if (op==PCOMP) { pz.header.resize(0x10300); pz.header[4]=z.header[4]; // copy ph pz.header[5]=z.header[5]; // copy pm pz.cend=8; // empty COMP section // get pcomp_cmd ending with ";" (case sensitive) const char *tok; while ((tok=token(in, false))!=0 && strcmp(tok, ";")) { if (pcomp_cmd.size() && pcomp_cmd[0]) append(pcomp_cmd, " "); append(pcomp_cmd, tok); } pz.hbegin=pz.hend=pz.cend+128; op=compile_comp(in, pz); if (op!=END) error("Expected END in configuation file"); // Compute header size int hsize=pz.hend-pz.hbegin+pz.cend-2; pz.header[0]=hsize&255; pz.header[1]=hsize>>8; } } #endif // ifndef OPT ///////////////////////////// Predictor /////////////////////////// Component::Component(): limit(0), cxt(0), a(0), b(0), c(0) {} U8 StateTable::ns[1024]={0}; const int StateTable::bound[B]={20,48,15,8,6,5}; // n0 -> max n1, n1 -> max n0 // How many states with count of n0 zeros, n1 ones (0...2) int StateTable::num_states(int n0, int n1) { if (n0=N || n1>=N || n1>=B || n0>bound[n1]) return 0; return 1+(n1>0 && n0+n1<=17); } // New value of count n0 if 1 is observed (and vice versa) void StateTable::discount(int& n0) { n0=(n0>=1)+(n0>=2)+(n0>=3)+(n0>=4)+(n0>=5)+(n0>=7)+(n0>=8); } // compute next n0,n1 (0 to N) given input y (0 or 1) void StateTable::next_state(int& n0, int& n1, int y) { if (n0 20,0 // 48,1,0 -> 48,1 // 15,2,0 -> 8,1 // 8,3,0 -> 6,2 // 8,3,1 -> 5,3 // 6,4,0 -> 5,3 // 5,5,0 -> 5,4 // 5,5,1 -> 4,5 while (!num_states(n0, n1)) { if (n1<2) --n0; else { n0=(n0*(n1-1)+(n1/2))/n1; --n1; } } } } // Initialize next state table ns[state*4] -> next if 0, next if 1, n0, n1 StateTable::StateTable() { // Assign states by increasing priority U8 t[N][N][2]={{{0}}}; // (n0,n1,y) -> state number int state=0; for (int i=0; i=0 && n<=2); if (n) { t[n0][n1][0]=state; t[n0][n1][1]=state+n-1; state+=n; } } } // Generate next state table for (int n0=0; n0=0 && s<256); int s0=n0, s1=n1; next_state(s0, s1, 0); assert(s0>=0 && s0=0 && s1=0 && s0=0 && s10); printf("%2d 
%s", i, compname[type]); for (int j=1; j0); assert(cr.ht.size()>0); int count=0; for (int j=0; j0); int count=0; for (int j=0; j0); int count=0; for (int j=0; j0); for (int j=0; j0) { int hcount=0; for (int j=0; j0) ++hcount; printf(": %d/%d (%1.2f%%)", hcount, cr.ht.size(), hcount*100.0/cr.ht.size()); } cp+=compsize[type]; printf("\n"); } } // Initailize the model Predictor::Predictor(ZPAQL& zr): c8(1), hmap4(1), z(zr) { assert(sizeof(U8)==1); assert(sizeof(U16)==2); assert(sizeof(U32)==4); assert(sizeof(short)==2); assert(sizeof(int)==4); // Initialize tables for (int i=0; i<1024; ++i) dt[i]=(1<<17)/(i*2+3)*2; for (int i=0; i<32768; ++i) stretcht[i]=int(log((i+0.5)/(32767.5-i))*64+0.5+100000)-100000; for (int i=0; i<4096; ++i) squasht[i]=int(32768.0/(1+exp((i-2048)*(-1.0/64)))); // Verify floating point math for squash() and stretch() U32 sqsum=0, stsum=0; for (int i=32767; i>=0; --i) stsum=stsum*3+stretch(i); for (int i=4095; i>=0; --i) sqsum=sqsum*3+squash(i-2048); assert(stsum==3887533746u); assert(sqsum==2278286169u); // Initialize context hash function z.inith(); // Initialize predictions for (int i=0; i<256; ++i) p[i]=0; // Initialize components int n=z.header[6]; // hsize[0..1] hh hm ph pm n (comp)[n] END 0[128] (hcomp) END if (n<1 || n>255) error("n must be 1..255 components"); const U8* cp=&z.header[7]; // start of component list for (int i=0; i&z.header[0] && cp<&z.header[z.header.size()-8]); Component& cr=comp[i]; switch(cp[0]) { case CONST: // c p[i]=(cp[1]-128)*4; break; case CM: // sizebits limit cr.cm.resize(1, cp[1]); // packed CM (22 bits) + CMCOUNT (10 bits) cr.limit=cp[2]*4; for (int j=0; j=i) error("MIX2 k >= i"); if (cp[2]>=i) error("MIX2 j >= i"); cr.c=(1<=i) error("MIX j >= i"); if (cp[3]<1 || cp[3]>i-cp[2]) error("MIX m not in 1..i-j"); int m=cp[3]; // number of inputs assert(m>=1); cr.c=(1<=i) error("ISSE j >= i"); cr.ht.resize(64, cp[1]); cr.cm.resize(512); for (int j=0; j<256; ++j) { cr.cm[j*2]=1<<15; 
cr.cm[j*2+1]=clamp512k(stretch(st.cminit(j)>>8)<<10); } break; case SSE: // sizebits j start limit if (cp[2]>=i) error("SSE j >= i"); if (cp[3]>cp[4]*4) error("SSE start > limit*4"); cr.cm.resize(32, cp[1]); cr.limit=cp[4]*4; for (int j=0; j0); cp+=compsize[*cp]; assert(cp>=&z.header[7] && cp<&z.header[z.cend]); } } int Predictor::predict0() { assert(c8>=1 && c8<=255); #ifdef OPT error("no model"); return 16384; #else // Predict next bit int n=z.header[6]; assert(n>0 && n<=255); const U8* cp=&z.header[7]; assert(cp[-1]==n); for (int i=0; i&z.header[0] && cp<&z.header[z.header.size()-8]); Component& cr=comp[i]; switch(cp[0]) { case CONST: // c break; case CM: // sizebits limit cr.cxt=z.H(i)^hmap4; p[i]=stretch(cr.cm(cr.cxt)>>17); break; case ICM: // sizebits assert((hmap4&15)>0); if (c8==1 || (c8&0xf0)==16) cr.c=find(cr.ht, cp[1]+2, z.H(i)+16*c8); cr.cxt=cr.ht[cr.c+(hmap4&15)]; p[i]=stretch(cr.cm(cr.cxt)>>8); break; case MATCH: // sizebits bufbits: a=len, b=offset, c=bit, cxt=256/len, // ht=buf, limit=8*pos+bp assert(cr.a>=0 && cr.a<=255); if (cr.a==0) p[i]=0; else { cr.c=cr.ht((cr.limit>>3)-cr.b)>>(7-(cr.limit&7))&1; // predicted bit p[i]=stretch(cr.cxt*(cr.c*-2+1)&32767); } break; case AVG: // j k wt p[i]=(p[cp[1]]*cp[3]+p[cp[2]]*(256-cp[3]))>>8; break; case MIX2: { // sizebits j k rate mask // c=size cm=wt[size][m] cxt=input cr.cxt=((z.H(i)+(c8&cp[5]))&(cr.c-1)); assert(int(cr.cxt)>=0 && int(cr.cxt)=0 && w<65536); p[i]=(w*p[cp[2]]+(65536-w)*p[cp[3]])>>16; assert(p[i]>=-2048 && p[i]<2048); } break; case MIX: { // sizebits j m rate mask // c=size cm=wt[size][m] cxt=index of wt in cm int m=cp[3]; assert(m>=1 && m<=i); cr.cxt=z.H(i)+(c8&cp[5]); cr.cxt=(cr.cxt&(cr.c-1))*m; // pointer to row of weights assert(int(cr.cxt)>=0 && int(cr.cxt)<=cr.cm.size()-m); int* wt=(int*)&cr.cm[cr.cxt]; p[i]=0; for (int j=0; j>8)*p[cp[2]+j]; p[i]=clamp2k(p[i]>>8); } break; case ISSE: { // sizebits j -- c=hi, cxt=bh assert((hmap4&15)>0); if (c8==1 || (c8&0xf0)==16) cr.c=find(cr.ht, 
cp[1]+2, z.H(i)+16*c8); cr.cxt=cr.ht[cr.c+(hmap4&15)]; // bit history int *wt=(int*)&cr.cm[cr.cxt*2]; p[i]=clamp2k((wt[0]*p[cp[2]]+wt[1]*64)>>16); } break; case SSE: { // sizebits j start limit cr.cxt=(z.H(i)+c8)*32; int pq=p[cp[2]]+992; if (pq<0) pq=0; if (pq>1983) pq=1983; int wt=pq&63; pq>>=6; assert(pq>=0 && pq<=30); cr.cxt+=pq; p[i]=stretch(((cr.cm(cr.cxt)>>10)*(64-wt)+(cr.cm(cr.cxt+1)>>10)*wt)>>13); cr.cxt+=wt>>5; } break; default: error("component predict not implemented"); } cp+=compsize[cp[0]]; assert(cp<&z.header[z.cend]); assert(p[i]>=-2048 && p[i]<2048); } assert(cp[0]==NONE); return squash(p[n-1]); #endif } // Update model with decoded bit y (0...1) void Predictor::update0(int y) { #ifdef OPT error("no model"); #else assert(y==0 || y==1); assert(c8>=1 && c8<=255); assert(hmap4>=1 && hmap4<=511); // Update components const U8* cp=&z.header[7]; int n=z.header[6]; assert(n>=1 && n<=255); assert(cp[-1]==n); for (int i=0; i>8))>>2; } break; case MATCH: // sizebits bufbits: // a=len, b=offset, c=bit, cm=index, cxt=256/len // ht=buf, limit=8*pos+bp { assert(cr.a>=0 && cr.a<=255); assert(cr.c==0 || cr.c==1); if (cr.c!=y) cr.a=0; // mismatch? 
cr.ht(cr.limit>>3)+=cr.ht(cr.limit>>3)+y; if ((++cr.limit&7)==0) { int pos=cr.limit>>3; if (cr.a==0) { // look for a match cr.b=pos-cr.cm(z.H(i)); if (cr.b&(cr.ht.size()-1)) while (cr.a<255 && cr.ht(pos-cr.a-1)==cr.ht(pos-cr.a-cr.b-1)) ++cr.a; } else cr.a+=cr.a<255; cr.cm(z.H(i))=pos; if (cr.a>0) cr.cxt=2048/cr.a; } } break; case AVG: // j k wt break; case MIX2: { // sizebits j k rate mask // cm=input[2],wt[size][2], cxt=weight row assert(cr.a16.size()==cr.c); assert(int(cr.cxt)>=0 && int(cr.cxt)>5; int w=cr.a16[cr.cxt]; w+=(err*(p[cp[2]]-p[cp[3]])+(1<<12))>>13; if (w<0) w=0; if (w>65535) w=65535; cr.a16[cr.cxt]=w; } break; case MIX: { // sizebits j m rate mask // cm=wt[size][m], cxt=input int m=cp[3]; assert(m>0 && m<=i); assert(cr.cm.size()==m*cr.c); assert(int(cr.cxt)>=0 && int(cr.cxt)<=cr.cm.size()-m); int err=(y*32767-squash(p[i]))*cp[4]>>4; int* wt=(int*)&cr.cm[cr.cxt]; for (int j=0; j>13)); } break; case ISSE: { // sizebits j -- c=hi, cxt=bh assert(cr.cxt==cr.ht[cr.c+(hmap4&15)]); int err=y*32767-squash(p[i]); int *wt=(int*)&cr.cm[cr.cxt*2]; wt[0]=clamp512k(wt[0]+((err*p[cp[2]]+(1<<12))>>13)); wt[1]=clamp512k(wt[1]+((err+16)>>5)); cr.ht[cr.c+(hmap4&15)]=st.next(cr.cxt, y); } break; case SSE: // sizebits j start limit train(cr, y); break; default: assert(0); } cp+=compsize[cp[0]]; assert(cp>=&z.header[7] && cp<&z.header[z.cend] && cp<&z.header[z.header.size()-8]); } assert(cp[0]==NONE); // Save bit y in c8, hmap4 c8+=c8+y; if (c8>=256) { z.run(c8-256); hmap4=1; c8=1; } else if (c8>=16 && c8<32) hmap4=(hmap4&0xf)<<5|y<<4|1; else hmap4=(hmap4&0x1f0)|(((hmap4&0xf)*2+y)&0xf); #endif } // Find cxt row in hash table ht. ht has rows of 16 indexed by the // low sizebits of cxt with element 0 having the next higher 8 bits for // collision detection. If not found after 3 adjacent tries, replace the // row with lowest element 1 as priority. Return index of row. 
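// Illustrative sketch, not part of the zpaq sources: the lookup described
// above, rewritten so that it builds on its own. std::vector stands in for
// this file's Array type and the name find_row is made up for the example.
// The tie-breaking order among the three candidate rows is an assumption.
// Rows are 16 bytes; byte 0 is the check byte and byte 1 the priority.
/*
```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Return the index of the 16-byte row for cxt, claiming one if none matches.
// Assumes ht.size()==16<<sizebits and sizebits>=2 (at least 4 rows).
int find_row(std::vector<std::uint8_t>& ht, int sizebits, std::uint32_t cxt) {
  assert(sizebits >= 2 && ht.size() == (std::size_t(16) << sizebits));
  const int chk = (cxt >> sizebits) & 255;  // next higher 8 bits of cxt
  const int h0 = int((cxt * 16) & std::uint32_t(ht.size() - 16));  // low bits
  const int h1 = h0 ^ 16;  // two adjacent candidate rows
  const int h2 = h0 ^ 32;
  if (ht[h0] == chk) return h0;
  if (ht[h1] == chk) return h1;
  if (ht[h2] == chk) return h2;

  // No match: recycle the candidate whose priority byte is lowest.
  int victim;
  if (ht[h0 + 1] <= ht[h1 + 1] && ht[h0 + 1] <= ht[h2 + 1]) victim = h0;
  else if (ht[h1 + 1] < ht[h2 + 1]) victim = h1;
  else victim = h2;
  std::memset(&ht[victim], 0, 16);
  ht[victim] = std::uint8_t(chk);
  return victim;
}
```
*/
// With a zeroed 64-byte table (sizebits=2), find_row(ht, 2, 0x105) claims
// row 16 and stores check byte 65 there; repeating the call returns row 16.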
int Predictor::find(Array& ht, int sizebits, U32 cxt) { assert(ht.size()==16<>sizebits&255; int h0=(cxt*16)&(ht.size()-16); if (ht[h0]==chk) return h0; int h1=h0^16; if (ht[h1]==chk) return h1; int h2=h0^32; if (ht[h2]==chk) return h2; if (ht[h0+1]<=ht[h1+1] && ht[h0+1]<=ht[h2+1]) return memset(&ht[h0], 0, 16), ht[h0]=chk, h0; else if (ht[h1+1]&z.header[0] && cp<&z.header[z.header.size()-8]); switch(cp[0]) { case CONST: // c fprintf(out, "\n // %d CONST %d\n", i, cp[1]); break; case CM: // sizebits limit fprintf(out, "\n // %d CM %d %d\n", i, cp[1], cp[2]); fprintf(out, " comp[%d].cxt=z.H(%d)^hmap4;\n" " p[%d]=stretch(comp[%d].cm(comp[%d].cxt)>>17);\n", i, i, i, i, i); break; case ICM: // sizebits fprintf(out, "\n // %d ICM %d\n", i, cp[1]); fprintf(out, " if (c8==1 || (c8&0xf0)==16)\n" " comp[%d].c=find(comp[%d].ht, %d+2, z.H(%d)+16*c8);\n" " comp[%d].cxt=comp[%d].ht[comp[%d].c+(hmap4&15)];\n" " p[%d]=stretch(comp[%d].cm(comp[%d].cxt)>>8);\n", i, i, cp[1], i, i, i, i, i, i, i); break; case MATCH: // sizebits bufbits: a=len, b=offset, c=bit, cxt=256/len, // ht=buf, limit=8*pos+bp fprintf(out, "\n // %d MATCH %d %d\n", i, cp[1], cp[2]); fprintf(out, " if (comp[%d].a==0) p[%d]=0;\n" " else {\n" " comp[%d].c=comp[%d].ht((comp[%d].limit>>3)\n" " -comp[%d].b)>>(7-(comp[%d].limit&7))&1;\n" " p[%d]=stretch(comp[%d].cxt*(comp[%d].c*-2+1)&32767);\n" " }\n", i, i, i, i, i, i, i, i, i, i); break; case AVG: // j k wt fprintf(out, "\n // %d AVG %d %d %d\n", i, cp[1], cp[2], cp[3]); fprintf(out, " p[%d]=(p[%d]*%d+p[%d]*(256-%d))>>8;\n", i, cp[1], cp[3], cp[2], cp[3]); break; case MIX2: // sizebits j k rate mask // c=size cm=wt[size][m] cxt=input fprintf(out, "\n // %d MIX2 %d %d %d %d %d\n", i, cp[1], cp[2], cp[3], cp[4], cp[5]); fprintf(out, " {\n" " comp[%d].cxt=((z.H(%d)+(c8&%d))&(comp[%d].c-1));\n" " int w=comp[%d].a16[comp[%d].cxt];\n" " p[%d]=(w*p[%d]+(65536-w)*p[%d])>>16;\n" " }\n", i, i, cp[5], i, i, i, i, cp[2], cp[3]); break; case MIX: // sizebits j m rate mask // 
c=size cm=wt[size][m] cxt=index of wt in cm fprintf(out, "\n // %d MIX %d %d %d %d %d\n", i, cp[1], cp[2], cp[3], cp[4], cp[5]); fprintf(out, " {\n" " comp[%d].cxt=z.H(%d)+(c8&%d);\n" " comp[%d].cxt=(comp[%d].cxt&(comp[%d].c-1))*%d;\n" " int* wt=(int*)&comp[%d].cm[comp[%d].cxt];\n", i, i, cp[5], i, i, i, cp[3], i, i); for (int j=0; j>8)*p[%d];\n", i, j?"+":"", j, cp[2]+j); fprintf(out, " p[%d]=clamp2k(p[%d]>>8);\n" " }\n", i, i); break; case ISSE: // sizebits j -- c=hi, cxt=bh fprintf(out, "\n // %d ISSE %d %d\n", i, cp[1], cp[2]); fprintf(out, " {\n" " if (c8==1 || (c8&0xf0)==16)\n" " comp[%d].c=find(comp[%d].ht, %d, z.H(%d)+16*c8);\n" " comp[%d].cxt=comp[%d].ht[comp[%d].c+(hmap4&15)];\n" " int *wt=(int*)&comp[%d].cm[comp[%d].cxt*2];\n" " p[%d]=clamp2k((wt[0]*p[%d]+wt[1]*64)>>16);\n" " }\n", i, i, cp[1]+2, i, i, i, i, i, i, i, cp[2]); break; case SSE: // sizebits j start limit fprintf(out, "\n // %d SSE %d %d %d %d\n", i, cp[1], cp[2], cp[3], cp[4]); fprintf(out, " {\n" " comp[%d].cxt=(z.H(%d)+c8)*32;\n" " int pq=p[%d]+992;\n" " if (pq<0) pq=0;\n" " if (pq>1983) pq=1983;\n" " int wt=pq&63;\n" " pq>>=6;\n" " comp[%d].cxt+=pq;\n" " p[%d]=stretch(((comp[%d].cm(comp[%d].cxt)>>10)*(64-wt)\n" " +(comp[%d].cm(comp[%d].cxt+1)>>10)*wt)>>13);\n" " comp[%d].cxt+=wt>>5;\n" " }\n", i, i, cp[2], i, i, i, i, i, i, i); break; } cp+=compsize[cp[0]]; assert(cp<&z.header[z.cend]); } assert(cp[0]==NONE); fprintf(out, " return squash(p[%d]);\n", n-1); } void opt_update(FILE *out, ZPAQL& z) { int n=z.header[6]; fprintf(out, " // %d components\n", n); // PCOMP should not call update() if (n==0) { fprintf(out, " assert(0);\n"); return; } // Code each component const U8* cp=&z.header[7]; assert(cp[-1]==n); for (int i=0; i&z.header[0] && cp<&z.header[z.header.size()-8]); switch(cp[0]) { case CONST: // c fprintf(out, "\n // %d CONST %d\n", i, cp[1]); break; case CM: // sizebits limit fprintf(out, "\n // %d CM %d %d\n", i, cp[1], cp[2]); fprintf(out, " train(comp[%d], y);\n", i); break; case 
ICM: // sizebits: cxt=ht[b]=bh, ht[c][0..15]=bh row, cxt=bh fprintf(out, "\n // %d ICM %d\n", i, cp[1]); fprintf(out, " {\n" " comp[%d].ht[comp[%d].c+(hmap4&15)]=\n" " st.next(comp[%d].ht[comp[%d].c+(hmap4&15)], y);\n" " U32& pn=comp[%d].cm(comp[%d].cxt);\n" " pn+=int(y*32767-(pn>>8))>>2;\n" " }\n", i, i, i, i, i, i); break; case MATCH: // sizebits bufbits: // a=len, b=offset, c=bit, cm=index, cxt=256/len // ht=buf, limit=8*pos+bp fprintf(out, "\n // %d MATCH %d %d\n", i, cp[1], cp[2]); fprintf(out, " {\n" " if (comp[%d].c!=y) comp[%d].a=0;\n" " comp[%d].ht(comp[%d].limit>>3)+=comp[%d].ht(comp[%d].limit>>3)+y;\n" " if ((++comp[%d].limit&7)==0) {\n" " int pos=comp[%d].limit>>3;\n" " if (comp[%d].a==0) {\n" " comp[%d].b=pos-comp[%d].cm(z.H(%d));\n" " if (comp[%d].b&(comp[%d].ht.size()-1))\n" " while (comp[%d].a<255 && comp[%d].ht(pos-comp[%d].a-1)\n" " ==comp[%d].ht(pos-comp[%d].a-comp[%d].b-1))\n" " ++comp[%d].a;\n" " }\n" " else comp[%d].a+=comp[%d].a<255;\n" " comp[%d].cm(z.H(%d))=pos;\n" " if (comp[%d].a>0) comp[%d].cxt=2048/comp[%d].a;\n" " }\n" " }\n", i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i); break; case AVG: // j k wt fprintf(out, "\n // %d AVG %d %d %d\n", i, cp[1], cp[2], cp[3]); break; case MIX2: // sizebits j k rate mask // cm=input[2],wt[size][2], cxt=weight row fprintf(out, "\n // %d MIX2 %d %d %d %d %d\n", i, cp[1], cp[2], cp[3], cp[4], cp[5]); fprintf(out, " {\n" " int err=(y*32767-squash(p[%d]))*%d>>5;\n" " int w=comp[%d].a16[comp[%d].cxt];\n" " w+=(err*(p[%d]-p[%d])+(1<<12))>>13;\n" " if (w<0) w=0;\n" " if (w>65535) w=65535;\n" " comp[%d].a16[comp[%d].cxt]=w;\n" " }\n", i, cp[4], i, i, cp[2], cp[3], i, i); break; case MIX: // sizebits j m rate mask // cm=wt[size][m], cxt=input fprintf(out, "\n // %d MIX %d %d %d %d %d\n", i, cp[1], cp[2], cp[3], cp[4], cp[5]); fprintf(out, " {\n" " int err=(y*32767-squash(p[%d]))*%d>>4;\n" " int* wt=(int*)&comp[%d].cm[comp[%d].cxt];\n", i, cp[4], i, i); for (int j=0; 
j>13));\n", j, j, cp[2]+j); fprintf(out, " }\n"); break; case ISSE: // sizebits j -- c=hi, cxt=bh fprintf(out, "\n // %d ISSE %d %d\n", i, cp[1], cp[2]); fprintf(out, " {\n" " int err=y*32767-squash(p[%d]);\n" " int *wt=(int*)&comp[%d].cm[comp[%d].cxt*2];\n" " wt[0]=clamp512k(wt[0]+((err*p[%d]+(1<<12))>>13));\n" " wt[1]=clamp512k(wt[1]+((err+16)>>5));\n" " comp[%d].ht[comp[%d].c+(hmap4&15)]=st.next(comp[%d].cxt, y);\n" " }\n", i, i, i, cp[2], i, i, i); break; case SSE: // sizebits j start limit fprintf(out, "\n // %d SSE %d %d %d %d\n", i, cp[1], cp[2], cp[3], cp[4]); fprintf(out, " train(comp[%d], y);\n", i); break; } cp+=compsize[cp[0]]; assert(cp<&z.header[z.cend]); } assert(cp[0]==NONE); } // Generate optimization code for the HCOMP section of z void opt_hcomp(FILE *out, ZPAQL& z, int select) { /* Instruction translation table. It was generated from the body of ZPAQL::run0() with the following perl script, then hand editing JT, JF, JMP, and LJ. for ($i=0; $i<256; ++$i) { $a[$i]=" \"err();\","; } while (<>) { chomp; if (/case (\d+): (.*) break; *\/\/(.*)/) { $n=$1; $op=$2; $comment=$3; $op=~s/header\[pc\+\+\]/%d/; $op="\"".$op."\","; $a[$n]=sprintf(" %-26s // $n ".$comment, $op); } } for ($i=0; $i<256; ++$i) { print("$a[$i]\n"); } */ static const char* inst[256]={ "err();", // 0 ERROR "++a;", // 1 A++ "--a;", // 2 A-- "a = ~a;", // 3 A! "a = 0;", // 4 A=0 "err();", "err();", "a = r[%d];", // 7 A=R N "swap(b);", // 8 B<>A "++b;", // 9 B++ "--b;", // 10 B-- "b = ~b;", // 11 B! "b = 0;", // 12 B=0 "err();", "err();", "b = r[%d];", // 15 B=R N "swap(c);", // 16 C<>A "++c;", // 17 C++ "--c;", // 18 C-- "c = ~c;", // 19 C! "c = 0;", // 20 C=0 "err();", "err();", "c = r[%d];", // 23 C=R N "swap(d);", // 24 D<>A "++d;", // 25 D++ "--d;", // 26 D-- "d = ~d;", // 27 D! "d = 0;", // 28 D=0 "err();", "err();", "d = r[%d];", // 31 D=R N "swap(m(b));", // 32 *B<>A "++m(b);", // 33 *B++ "--m(b);", // 34 *B-- "m(b) = ~m(b);", // 35 *B! 
"m(b) = 0;", // 36 *B=0 "err();", "err();", "if (f) goto L%d;", // 39 JT N "swap(m(c));", // 40 *C<>A "++m(c);", // 41 *C++ "--m(c);", // 42 *C-- "m(c) = ~m(c);", // 43 *C! "m(c) = 0;", // 44 *C=0 "err();", "err();", "if (!f) goto L%d;", // 47 JF N "swap(h(d));", // 48 *D<>A "++h(d);", // 49 *D++ "--h(d);", // 50 *D-- "h(d) = ~h(d);", // 51 *D! "h(d) = 0;", // 52 *D=0 "err();", "err();", "r[%d] = a;", // 55 R=A N "return;", // 56 HALT "if (output) putc(a, output); if (sha1) sha1->put(a);", // 57 OUT "err();", "a = (a+m(b)+512)*773;", // 59 HASH "h(d) = (h(d)+a+512)*773;",// 60 HASHD "err();", "err();", "goto L%d;", // 63 JMP N "a = a;", // 64 A=A "a = b;", // 65 A=B "a = c;", // 66 A=C "a = d;", // 67 A=D "a = m(b);", // 68 A=*B "a = m(c);", // 69 A=*C "a = h(d);", // 70 A=*D "a = %d;", // 71 A= N "b = a;", // 72 B=A "b = b;", // 73 B=B "b = c;", // 74 B=C "b = d;", // 75 B=D "b = m(b);", // 76 B=*B "b = m(c);", // 77 B=*C "b = h(d);", // 78 B=*D "b = %d;", // 79 B= N "c = a;", // 80 C=A "c = b;", // 81 C=B "c = c;", // 82 C=C "c = d;", // 83 C=D "c = m(b);", // 84 C=*B "c = m(c);", // 85 C=*C "c = h(d);", // 86 C=*D "c = %d;", // 87 C= N "d = a;", // 88 D=A "d = b;", // 89 D=B "d = c;", // 90 D=C "d = d;", // 91 D=D "d = m(b);", // 92 D=*B "d = m(c);", // 93 D=*C "d = h(d);", // 94 D=*D "d = %d;", // 95 D= N "m(b) = a;", // 96 *B=A "m(b) = b;", // 97 *B=B "m(b) = c;", // 98 *B=C "m(b) = d;", // 99 *B=D "m(b) = m(b);", // 100 *B=*B "m(b) = m(c);", // 101 *B=*C "m(b) = h(d);", // 102 *B=*D "m(b) = %d;", // 103 *B= N "m(c) = a;", // 104 *C=A "m(c) = b;", // 105 *C=B "m(c) = c;", // 106 *C=C "m(c) = d;", // 107 *C=D "m(c) = m(b);", // 108 *C=*B "m(c) = m(c);", // 109 *C=*C "m(c) = h(d);", // 110 *C=*D "m(c) = %d;", // 111 *C= N "h(d) = a;", // 112 *D=A "h(d) = b;", // 113 *D=B "h(d) = c;", // 114 *D=C "h(d) = d;", // 115 *D=D "h(d) = m(b);", // 116 *D=*B "h(d) = m(c);", // 117 *D=*C "h(d) = h(d);", // 118 *D=*D "h(d) = %d;", // 119 *D= N "err();", "err();", "err();", 
"err();", "err();", "err();", "err();", "err();", "a += a;", // 128 A+=A "a += b;", // 129 A+=B "a += c;", // 130 A+=C "a += d;", // 131 A+=D "a += m(b);", // 132 A+=*B "a += m(c);", // 133 A+=*C "a += h(d);", // 134 A+=*D "a += %d;", // 135 A+= N "a -= a;", // 136 A-=A "a -= b;", // 137 A-=B "a -= c;", // 138 A-=C "a -= d;", // 139 A-=D "a -= m(b);", // 140 A-=*B "a -= m(c);", // 141 A-=*C "a -= h(d);", // 142 A-=*D "a -= %d;", // 143 A-= N "a *= a;", // 144 A*=A "a *= b;", // 145 A*=B "a *= c;", // 146 A*=C "a *= d;", // 147 A*=D "a *= m(b);", // 148 A*=*B "a *= m(c);", // 149 A*=*C "a *= h(d);", // 150 A*=*D "a *= %d;", // 151 A*= N "div(a);", // 152 A/=A "div(b);", // 153 A/=B "div(c);", // 154 A/=C "div(d);", // 155 A/=D "div(m(b));", // 156 A/=*B "div(m(c));", // 157 A/=*C "div(h(d));", // 158 A/=*D "div(%d);", // 159 A/= N "mod(a);", // 160 A=A "mod(b);", // 161 A=B "mod(c);", // 162 A=C "mod(d);", // 163 A=D "mod(m(b));", // 164 A=*B "mod(m(c));", // 165 A=*C "mod(h(d));", // 166 A=*D "mod(%d);", // 167 A= N "a &= a;", // 168 A&=A "a &= b;", // 169 A&=B "a &= c;", // 170 A&=C "a &= d;", // 171 A&=D "a &= m(b);", // 172 A&=*B "a &= m(c);", // 173 A&=*C "a &= h(d);", // 174 A&=*D "a &= %d;", // 175 A&= N "a &= ~ a;", // 176 A&~A "a &= ~ b;", // 177 A&~B "a &= ~ c;", // 178 A&~C "a &= ~ d;", // 179 A&~D "a &= ~ m(b);", // 180 A&~*B "a &= ~ m(c);", // 181 A&~*C "a &= ~ h(d);", // 182 A&~*D "a &= ~ %d;", // 183 A&~ N "a |= a;", // 184 A|=A "a |= b;", // 185 A|=B "a |= c;", // 186 A|=C "a |= d;", // 187 A|=D "a |= m(b);", // 188 A|=*B "a |= m(c);", // 189 A|=*C "a |= h(d);", // 190 A|=*D "a |= %d;", // 191 A|= N "a ^= a;", // 192 A^=A "a ^= b;", // 193 A^=B "a ^= c;", // 194 A^=C "a ^= d;", // 195 A^=D "a ^= m(b);", // 196 A^=*B "a ^= m(c);", // 197 A^=*C "a ^= h(d);", // 198 A^=*D "a ^= %d;", // 199 A^= N "a <<= (a&31);", // 200 A<<=A "a <<= (b&31);", // 201 A<<=B "a <<= (c&31);", // 202 A<<=C "a <<= (d&31);", // 203 A<<=D "a <<= (m(b)&31);", // 204 A<<=*B "a 
<<= (m(c)&31);", // 205 A<<=*C "a <<= (h(d)&31);", // 206 A<<=*D "a <<= (%d&31);", // 207 A<<= N "a >>= (a&31);", // 208 A>>=A "a >>= (b&31);", // 209 A>>=B "a >>= (c&31);", // 210 A>>=C "a >>= (d&31);", // 211 A>>=D "a >>= (m(b)&31);", // 212 A>>=*B "a >>= (m(c)&31);", // 213 A>>=*C "a >>= (h(d)&31);", // 214 A>>=*D "a >>= (%d&31);", // 215 A>>= N "f = (a == a);", // 216 A==A "f = (a == b);", // 217 A==B "f = (a == c);", // 218 A==C "f = (a == d);", // 219 A==D "f = (a == U32(m(b)));", // 220 A==*B "f = (a == U32(m(c)));", // 221 A==*C "f = (a == h(d));", // 222 A==*D "f = (a == U32(%d));", // 223 A== N "f = (a < a);", // 224 A a);", // 232 A>A "f = (a > b);", // 233 A>B "f = (a > c);", // 234 A>C "f = (a > d);", // 235 A>D "f = (a > U32(m(b)));", // 236 A>*B "f = (a > U32(m(c)));", // 237 A>*C "f = (a > h(d));", // 238 A>*D "f = (a > U32(%d));", // 239 A> N "err();", "err();", "err();", "err();", "err();", "err();", "err();", "err();", "err();", "err();", "err();", "err();", "err();", "err();", "err();", "goto L%d;"}; // 255 LJ NN // Generate a map of jump targets if (z.hend<=z.hbegin) return; Array targets(0x10000); for (int i=z.hbegin; i>24)-z.hbegin; if (addr>=0 && addr<0x10000) targets[addr]=1; else error("goto target out of range"); } if (op%8==7) ++i; // 2 byte instruction (LJ is 3) } // Generate instructions. The output code will not compile // if any ZPAQL instructions jump to the middle of a 2 or 3 // byte instruction (legal) or out of range (legal if not executed). 
fprintf(out, " a = input;\n"); for (int i=z.hbegin; i>24)-z.hbegin; if (op==LJ) operand=select*100000+z.header[i+1]+256*z.header[i+2], ++i; // label if (op%8==7) ++i; // 2 byte instruction fprintf(out, " "); fprintf(out, inst[op], operand); fprintf(out, "\n"); } } // Write z.header as a C++ array of bytes, var void dump(FILE *out, ZPAQL& z, const char *var) { int hsize=z.cend+z.hend-z.hbegin; assert(hsize==0 || hsize==z.header[0]+256*z.header[1]+2); if (hsize==0) { fprintf(out, "const U8 %s_array[2]={0,0};\n", var); } else { fprintf(out, "const U8 %s_array[%d]={ // COMP=%d HCOMP=%d\n ", var, hsize, z.cend, z.hend-z.hbegin); for (int i=0, j=0; i\n" "\n", filename); // Write pre_cmd fprintf(out, "const char *pre_cmd=\"%s\";\n", pcomp_cmd); // Write zlist, pzlist dump(out, z, "zlist"); dump(out, pz, "pzlist"); // Write Predictor::predict() fprintf(out, "int Predictor::predict() {\n" " switch(z.select) {\n" " case 1: {\n"); opt_predict(out, z); fprintf(out, " }\n" " default: return predict0();\n" " }\n" "}\n" "\n"); // Write Predictor::update() fprintf(out, "void Predictor::update(int y) {\n" " switch(z.select) {\n" " case 1: {\n"); opt_update(out, z); fprintf(out, " break;\n" " }\n" " default: return update0(y);\n" " }\n" " c8+=c8+y;\n" " if (c8>=256) {\n" " z.run(c8-256);\n" " hmap4=1;\n" " c8=1;\n" " }\n" " else if (c8>=16 && c8<32)\n" " hmap4=(hmap4&0xf)<<5|y<<4|1;\n" " else\n" " hmap4=(hmap4&0x1f0)|(((hmap4&0xf)*2+y)&0xf);\n" "}\n" "\n"); // Write ZPAQL::run() fprintf(out, "void ZPAQL::run(U32 input) {\n" " switch(select) {\n" " case 1: {\n"); opt_hcomp(out, z, 1); fprintf(out, " break;\n" " }\n" " case 2: {\n"); opt_hcomp(out, pz, 2); fprintf(out, " break;\n" " }\n" " default: run0(input);\n" " }\n" "}\n" "\n" "\n"); // Close file fclose(out); if (!quiet) printf("Created %s\n", filename); } #endif // not OPT ////////////////////////////// Decoder //////////////////////////// // Decoder decompresses using an arithmetic code class Decoder { FILE* in; // destination 
  U32 low, high;  // range
  U32 curr;       // last 4 bytes of archive
  Predictor pr;   // to get p
  int decode(int p);  // return decoded bit (0..1) with probability p (0..65535)
public:
  Decoder(FILE* f, ZPAQL& z);
  int decompress();  // return a byte or EOF
  int skip();  // skip to the end of the segment, return next byte
};

Decoder::Decoder(FILE* f, ZPAQL& z):
    in(f), low(1), high(0xFFFFFFFF), curr(0), pr(z) {}

inline int Decoder::decode(int p) {
  assert(p>=0 && p<65536);
  assert(high>low && low>0);
  if (curr<low || curr>high) error("archive corrupted");
  assert(curr>=low && curr<=high);
  U32 mid=low+((high-low)>>16)*p+((((high-low)&0xffff)*p)>>16);  // split range
  assert(high>mid && mid>=low);
  int y=curr<=mid;
  if (y) high=mid; else low=mid+1;  // pick half
  while ((high^low)<0x1000000) {  // shift out identical leading bytes
    high=high<<8|255;
    low=low<<8;
    low+=(low==0);
    int c=getc(in);
    if (c==EOF) error("unexpected end of file");
    curr=curr<<8|c;
  }
  return y;
}

int Decoder::decompress() {
  if (curr==0) {  // finish initialization
    for (int i=0; i<4; ++i)
      curr=curr<<8|getc(in);
  }
  if (decode(0)) {
    if (curr!=0) error("decoding end of stream");
    return EOF;
  }
  else {
    int c=1;
    while (c<256) {  // get 8 bits
      int p=pr.predict()*2+1;
      c+=c+decode(p);
      pr.update(c&1);
    }
    return c-256;
  }
}

// Find end of compressed data and return next byte
int Decoder::skip() {
  int c=0;
  while (curr==0)  // at start?
    curr=getc(in);
  while (curr && (c=getc(in))!=EOF)  // find 4 zeros
    curr=curr<<8|c;
  while ((c=getc(in))==0) ;  // might be more than 4
  return c;
}

/////////////////////////// PostProcessor ////////////////////

class PostProcessor {
  int state;  // input parse state
  int hsize;  // header size
  int ph, pm;  // sizes of H and M in z
public:
  ZPAQL z;  // holds PCOMP
  PostProcessor(ZPAQL& hz);
  void set(FILE* out, SHA1* p) {z.output=out; z.sha1=p;}  // Set output
  int write(int c);  // Input a byte, return state
};

// Copy ph, pm from block header.
sel selects ZPAQL::run() version PostProcessor::PostProcessor(ZPAQL& hz) { state=hsize=0; ph=hz.header[4]; pm=hz.header[5]; } // (PASS=0 | PROG=1 psize[0..1] pcomp[0..psize-1]) data... EOB=-1 // Return state: 1=PASS, 2..4=loading PROG, 5=PROG loaded int PostProcessor::write(int c) { assert(c>=-1 && c<=255); switch (state) { case 0: // initial state if (c<0) error("Unexpected EOS"); state=c+1; // 1=PASS, 2=PROG if (state>2) error("unknown post processing type"); break; case 1: // PASS if (z.output && c>=0) putc(c, z.output); // data if (z.sha1 && c>=0) z.sha1->put(c); break; case 2: // PROG if (c<0) error("Unexpected EOS"); hsize=c; // low byte of size state=3; break; case 3: // PROG psize[0] if (c<0) error("Unexpected EOS"); hsize+=c*256; // high byte of psize z.header.resize(hsize+300); z.cend=8; z.hbegin=z.hend=z.cend+128; z.header[4]=ph; z.header[5]=pm; state=4; break; case 4: // PROG psize[0..1] pcomp[0...] if (c<0) error("Unexpected EOS"); assert(z.hend>8; z.initp(); state=5; } break; case 5: // PROG ... 
      // data
      z.run(c);
      break;
  }
  return state;
}

/////////////////////////// rerun ////////////////////////////

// Return "/" in Linux or "\\" in Windows or error if unknown
const char* slash() {
  // Guess by counting / and \ in PATH (or TEMP) and pick the most common
  static char result[2]={0};
  if (!result[0]) {
    int forward=0;
    const char *path=getenv("PATH");
    if (!path) path=getenv("TEMP");
    if (path) {
      for (; *path; ++path) {
        if (*path=='/') ++forward;
        if (*path=='\\') --forward;
      }
    }
    if (forward>0) result[0]='/';
    if (forward<0) result[0]='\\';
  }
  if (!result[0]) error("unknown operating system");
  return result;
}

// Put the name of a temporary directory in filename ending with \ or /
void tempdir(Array<char>& filename) {
  const char *env=getenv("TEMP");
  if (env) append(filename, env);
  else append(filename, ".");
  int len=strlen(&filename[0]);
  if (len>0 && filename[len-1]!='/' && filename[len-1]!='\\')
    append(filename, slash());
}

#ifndef OPT
// Call the optimized ZPAQ with arguments argc, argv. The name of the
// program is TEMP/zpaq_SHA1(z.header, pz.header, pre_cmd).exe
// If it doesn't exist then create a .cpp file with the same name
// and call zpaqmake to compile it first. For compression,
// the optimize function needs to preprocess with pre_cmd.
// For decompression, append block to "x" if not 0 and
// ignore argv[3..skipped_files+2]
void rerun(int argc, char** argv, ZPAQL& z, ZPAQL& pz, const char* pre_cmd,
           int block=0, int skipped_files=0) {

  // Get filename from hash of z, pz, pre_cmd
  SHA1 sha1;
  for (int i=0; i filename;
  tempdir(filename);
  append(filename, "zpaq_");
  for (int i=0; i<20; ++i) {
    char s[10];
    sprintf(s, "%02x", sha1.result(i));
    append(filename, s);
  }
  append(filename, ".exe");

  // Test if file exists.
If not, create it FILE *in=fopen(&filename[0], "rb"); if (!in) { // Generate optimized C++ code int len=strlen(&filename[0]); assert(len>40); filename[len-4]=0; // chop .exe append(filename, ".cpp"); optimize(z, pz, &filename[0], pre_cmd); // compile it filename[len-4]=0; // chop .cpp Array cmd; append(cmd, "zpaqmake "); append(cmd, &filename[0]); if (!quiet) printf("%s\n", &cmd[0]); system(&cmd[0]); // Test if compile worked append(filename, ".exe"); in=fopen(&filename[0], "rb"); if (!in) error("optimize: compile failed"); } fclose(in); // Execute command filename.exe(argc, argv) Array cmd; append(cmd, &filename[0]); for (int i=1; i=skipped_files+3) { // skip files append(cmd, " "); append(cmd, argv[i]); } if (i==1 && block>0) { // append block to command if not 0 char s[20]; sprintf(s, "%d", block); append(cmd, s); } } if (!quiet) printf("%s\n", &cmd[0]); system(&cmd[0]); } #endif /////////////////////////// Decompress /////////////////////// void usage(); // print help message and exit. 
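// Illustrative sketch, not part of the zpaq sources: how rerun() above
// derives the name of the cached executable, i.e. the temporary directory
// followed by "zpaq_", the 40 hex digits of a 20-byte SHA-1 result, and
// ".exe". The name cached_exe_name is made up for the example, and
// std::string replaces the Array buffer used in this file.
/*
```cpp
#include <cstdio>
#include <string>

// Build "<tempdir>zpaq_<40 hex digits>.exe" from a 20-byte digest.
// tempdir is expected to end with a path separator already.
std::string cached_exe_name(const std::string& tempdir,
                            const unsigned char digest[20]) {
  std::string name = tempdir + "zpaq_";
  for (int i = 0; i < 20; ++i) {
    char s[3];  // two hex digits plus the terminating 0
    std::snprintf(s, sizeof s, "%02x", digest[i]);
    name += s;
  }
  return name + ".exe";
}
```
*/
// Identical model headers and preprocessor command hash to the same name,
// so the compiled program can be reused instead of rebuilt.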
// Reject archive filenames that might cause problems bool validate_filename(const char* filename) { int len=strlen(filename); if (len<1) return true; // No name is OK if (len>511) return false; // name too long if (strstr(filename, "../")) return false; // no backward paths if (strstr(filename, "..\\")) return false; if (filename[0]=='/' || filename[0]=='\\') return false; // no absolute path for (int i=0; iLEVEL || c<1 || getc(in)!=1) error("not ZPAQ"); // Skip block header int hsize=getc(in); hsize+=getc(in)*256; if (hsize<6 || hsize>65535) error("hsize missing"); while (hsize-->0) getc(in); // Skip segments while ((c=getc(in))==1) { ++segments; while (getc(in)>0) ; // skip filename while (getc(in)>0) ; // skip comment if (getc(in)!=0) error("reserved 0 missing"); // Skip to end of data U32 c4=0xFFFFFFFF; // last 4 bytes will be all 0 while ((c=getc(in))!=EOF && (c4=c4<<8|c)!=0) ; if (c==EOF) error("unexpected end of file"); while ((c=getc(in))==0) ; if (c==253) { // Skip SHA1 for (int i=0; i<20; ++i) getc(in); } else if (c!=254) error("missing end of segment marker"); } if (c!=255) error("missing end of block marker"); return segments; } // Remove path from filename const char* strip(const char* filename) { assert(filename); int len=strlen(filename); const char *result=filename; for (int i=0; i=3); // Get options bool ocmd=false, pcmd=false, ncmd=false, tcmd=false; int blocknum=0; const char* cmd=argv[1]; assert(cmd); while (*cmd) { if (*cmd=='o') ocmd=true; else if (*cmd=='p') pcmd=true; else if (*cmd=='n') ncmd=true; else if (*cmd=='t') tcmd=true; else if (*cmd=='q') quiet=true; else if (*cmd=='x') break; else usage(); ++cmd; } if (cmd[0]!='x') usage(); if (cmd[1]) blocknum=atoi(cmd+1); #ifdef OPT ocmd=false; if (blocknum<1) error("'x' command requires a block number"); #endif // Open archive FILE* in=fopen(argv[2], "rb"); if (!in) perror(argv[2]), exit(1); // Skip to specified block int block=1; while (blocknum>block) { skip_block(in); ++block; } // Read the 
  // archive
  int filecount=0;  // number of files extracted
  int c;
  while (find_start(in)) {
    if (getc(in)!=LEVEL || getc(in)!=1) error("Not ZPAQ");

    // Read block header
    ZPAQL z;
    z.read(Reader(in));

    // PostProcessor and Decoder are created and destroyed for each block
    PostProcessor pp(z);
    Decoder dec(in, z);
#ifdef OPT
    z.select=1;  // select optimized code
    z.verify();
    pp.z.select=2;
#else
    // clear output file for append
    if (ncmd && (block==1 || block==blocknum)) {
      if (argc!=4) error("'nx' requires one output filename");
      remove(argv[3]);
    }
#endif

    // Read segments
    bool first=true;  // first segment of block?
    while ((c=getc(in))==1) {

      // Read the filename
      char filename[512]={0};
      int i;
      for (i=0; (c=getc(in))>0; ++i)
        if (i<511) filename[i]=c;
      if (i>0 && i<512) filename[i]=0;
      if (!ocmd && !quiet) printf("%s ", filename);

#ifndef OPT
      // If the user named some but not all output files, then skip the rest
      if (!ncmd && argc>3 && filecount+3>=argc) {
        if (!quiet)
          printf("\nSkipping %s and remaining files\n", filename);
        goto end;
      }
#endif

      // Get comment
      char comment[20]={0};
      i=0;
      while ((c=getc(in))!=EOF && c!=0) {
        if (i<19) comment[i]=c;
        ++i;
      }
      if (!ocmd && !quiet) printf("%s -> ", comment);
      if (getc(in)) error("reserved");  // reserved 0

      // If not 'o', open output file
      FILE *out=0;
      if (!ocmd) {

        // If 'n', open as argv[3] for append.
        if (ncmd) {
          if (argc!=4) error("'nx' command requires one output filename");
          out=fopen(argv[3], "ab");
          if (!out) perror(argv[3]), exit(1);
          if (!quiet) printf("%s -> ", argv[3]);
        }

        // Else if the user gave an output file starting at argv[3], use it instead.
else if (argc>3) { if (filecount+3>=argc) goto end; out=fopen(argv[filecount+3], "wb"); if (!out) { perror(argv[filecount+3]); goto end; } else if (!quiet) printf("%s ", argv[filecount+3]); } // Otherwise, use the names in the archive, but don't clobber // or use suspicious filenames else { const char* newname=filename; if (!pcmd) newname=strip(filename); if (newname!=filename) printf("%s -> ", newname); if (!validate_filename(newname)) { printf("Error: bad filename\n"); goto end; } out=fopen(newname, "rb"); if (out) { fclose(out); printf("Error: won't overwrite\n"); goto end; } else { out=fopen(newname, "wb"); if (!out) { perror(newname); goto end; } } } } // Decompress SHA1 sha1; pp.set(out, &sha1); #ifndef OPT // optimize: Decode PCOMP in first segment and skip the rest of // the block. Call rerun to use external optimized program to // extract the current block. if (ocmd) { if (first) { first=false; while ((c=dec.decompress())!=EOF) { c=pp.write(c); if (c==1 || c==5) { // 1=no PCOMP, 5=PCOMP c=dec.skip(); rerun(argc, argv, z, pp.z, "", blocknum?0:block, ncmd?0:filecount); break; } } } else c=dec.skip(); } else #endif // Extract the current segment { time_t now=time(0); int len=0; while ((c=dec.decompress())!=EOF) { if (!ocmd && tcmd) { // don't preprocess if (out) putc(c, out); sha1.put(c); } else { if (pp.write(c)==5 && first) { pp.z.verify(); first=false; } } if (!ocmd && !quiet && !(len++&0xfff) && time(0)!=now) { for (int i=printf("%1.0f ", sha1.size()); i>0; --i) putchar('\b'); fflush(stdout); now=time(0); } } if (!tcmd) pp.write(-1); if (out) fclose(out); } ++filecount; // Check for end of segment and block markers int eos=c; if (!ocmd) eos=getc(in); // 253=SHA1 follows, 254=EOS if (eos==253) { U8 hash[20]; bool match=true; for (int i=0; i<20; ++i) { hash[i]=getc(in); if (hash[i]!=sha1.result(i)) match=false; } if (!ocmd) { if (match) { if (!quiet) printf("Checksum OK "); } else { fprintf(stderr, "CHECKSUM FAILED: FILE IS NOT IDENTICAL\n Archive SHA1: "); 
            for (int i=0; i<20; ++i) fprintf(stderr, "%02x", hash[i]);
            fprintf(stderr, "\n File SHA1: ");
            for (int i=0; i<20; ++i) fprintf(stderr, "%02x", sha1.result(i));
            fprintf(stderr, "\n");
          }
        }
      }
      else if (eos!=254)
        error("missing end of segment marker");
      else if (!quiet)
        printf("OK, no checksum ");
      if (!ocmd && !quiet) printf("\n");
    }
    if (c!=255) error("missing end of block marker");
    if (blocknum) goto end;
    ++block;
  }

  // Close the archive
end:
  if (!quiet) printf("%d file(s) extracted\n", filecount);
  fclose(in);
}

//////////////////////////// Compressor ////////////////////////////

//////////////////////////// Encoder ///////////////////////////////

// Encoder compresses using an arithmetic code
class Encoder {
  FILE* out;  // destination
  U32 low, high;  // range
  Predictor pr;  // to get p
  void encode(int y, int p);  // encode bit y (0..1) with probability p (0..65535)
  U32 in_low, in_high;  // number of input, output bytes (64 bits)
  U32 out_low, out_high;
public:
  Encoder(FILE* f, ZPAQL& z);
  void compress(int c);  // c is 0..255 or EOF
  void stat() {pr.stat();}  // print predictor statistics
  void setOutput(FILE* f) {out=f;}
  double in_size() const {return in_low+4294967296.0*in_high;}
  double out_size() const {return out_low+4294967296.0*out_high;}
  void reset() {in_low=in_high=out_low=out_high=0;}  // clear sizes
};

// Compress to file f using model z
Encoder::Encoder(FILE* f, ZPAQL& z):
    out(f), low(1), high(0xFFFFFFFF), pr(z) {
  reset();
}

// compress bit y having probability p/64K
inline void Encoder::encode(int y, int p) {
  assert(out);
  assert(p>=0 && p<65536);
  assert(y==0 || y==1);
  assert(high>low && low>0);
  U32 mid=low+((high-low)>>16)*p+((((high-low)&0xffff)*p)>>16);  // split range
  assert(high>mid && mid>=low);
  if (y) high=mid; else low=mid+1;  // pick half
  while ((high^low)<0x1000000) {  // write identical leading bytes
    putc(high>>24, out);  // same as low>>24
    high=high<<8|255;
    low=low<<8;
    low+=(low==0);  // so we don't code 4 0 bytes in a row
    out_high+=(++out_low==0);
  }
}

// compress
byte c (0..255 or -1=EOS) void Encoder::compress(int c) { assert(out); if (c==-1) encode(1, 0); else { assert(c>=0 && c<=255); in_high+=(++in_low==0); encode(0, 0); for (int i=7; i>=0; --i) { int p=pr.predict()*2+1; assert(p>0 && p<65536); int y=c>>i&1; encode(y, p); pr.update(y); } } } //////////////////////////// Compress //////////////////////////// #ifndef OPT // Parse up to 9 comma separated numeric arguments appended to // cmd and put in global args[0..8]. Replace commas with 0 in cmd. void get_args(char *cmd) { if (cmd && cmd[0]) { int i=0; char *s=cmd, *sn; while (i<9 && (sn=strchr(s, ','))!=0) { args[i++]=atoi(sn+1); *sn=0; s=sn+1; } } } #endif // Compress files: [pnsiqvo]c|a[F][,N...]] archive files... void compress(int argc, char** argv) { assert(argc>=3); // Get command options bool pcmd=false, ncmd=false, scmd=false, icmd=false, // options tcmd=false, ocmd=false, acmd=false, ccmd=false; char *cmd=argv[1]; while (cmd && cmd[0]) { if (cmd[0]=='p') pcmd=true, ncmd=false; else if (cmd[0]=='n') ncmd=true, pcmd=false; else if (cmd[0]=='s') scmd=true; else if (cmd[0]=='i') icmd=true; else if (cmd[0]=='q') quiet=true; else if (cmd[0]=='v') verbose=true; else if (cmd[0]=='t') tcmd=true; else if (cmd[0]=='o') ocmd=true; else if (cmd[0]=='a') {acmd=true; break;} else if (cmd[0]=='c') {ccmd=true; break;} else usage(); ++cmd; } ++cmd; if (acmd==ccmd) usage(); #ifndef OPT // Parse comma separated arguments after config file (now in cmd) get_args(cmd); #endif ZPAQL z, pz; // compression and postprocessing models Array pcomp_cmd(64); // name of external preprocessor // Initialize from optimization code #ifdef OPT z.read(Reader(zlist, 0x10002)); z.select=1; if (pzlist[0] || pzlist[1]) { // hsize>0 ? pz.read(Reader(pzlist, 0x10002)); pz.select=2; } append(pcomp_cmd, pre_cmd); // Initialize from config file or use default #else if (cmd[0]) { // config file name? 
    FILE* cfg=fopen(cmd, "rb");
    if (!cfg) perror(cmd), exit(1);
    compile(cfg, z, pz, pcomp_cmd, args);
    fclose(cfg);
    if (!quiet)
      printf("%1.3f MB memory required.\n", z.memory()/1000000);
  }
  else {
    static U8 header[71]={  // COMP 34 bytes from mid.cfg
      69,0,3,3,0,0,8,3,5,8,13,0,8,17,1,8,
      18,2,8,18,3,8,19,4,4,22,24,7,16,0,7,24,
      255,0,
      // HCOMP 37 bytes
      17,104,74,4,95,1,59,112,10,25,59,112,10,25,59,112,
      10,25,59,112,10,25,59,112,10,25,59,10,59,112,25,69,
      207,8,112,56,0};
    z.read(Reader(header, 71));
  }
  if (ocmd) {
    rerun(argc, argv, z, pz, &pcomp_cmd[0]);
    return;
  }
#endif
  if (pz.hend>pz.hbegin) pz.initp();

  // Construct temporary file names from archive name
  Array<char> prefile, tempfile;
  tempdir(prefile);
  tempdir(tempfile);
  append(prefile, argv[2]);
  append(prefile, ".zpaq.pre");
  append(tempfile, argv[2]);
  append(tempfile, ".zpaq.tmp");

  // Initialize preprocessor
  remove(&tempfile[0]);

  // Compress files in argv[3...argc-1]
  FILE *out=0;  // archive opened when ready to compress first file
  Encoder enc(out, z);  // compressor
  double outsum=0;  // total output size
  for (int i=3; ipz.hbegin) {  // PCOMP section?
      fclose(in);
      remove(&prefile[0]);

      // Run external preprocessor
      int len=strlen(&pcomp_cmd[0]);
      assert(len>=0 && len=0 && psize<0x10000);
      assert(pz.header.size()>=pz.hend);
      if (psize==0)
        enc.compress(0);  // PASS
      else {
        enc.compress(1);  // POST
        enc.compress(psize&255);  // size low byte
        enc.compress(psize>>8&255);  // size high byte
        for (int j=0; j %1.0f ", presize);
    }
    int len=0;
    time_t now=time(0);
    while ((c=getc(in))!=EOF) {
      enc.compress(c);
      if (!quiet && !(len++&0xfff) && now!=time(0)) {
        for (int j=printf("%1.0f -> %1.0f ",
             enc.in_size(), outsize+enc.out_size()); j>0; --j)
          putchar('\b');
        fflush(stdout);
        now=time(0);
      }
    }
    enc.compress(-1);

    // Write segment trailer
    if (scmd)  // no SHA1
      outsize+=fprintf(out, "%c%c%c%c%c", 0, 0, 0, 0, 254);
    else {
      outsize+=20+fprintf(out, "%c%c%c%c%c", 0, 0, 0, 0, 253);
      for (int j=0; j<20; ++j)
        putc(check1.result(j), out);
    }
    fclose(in);
    in=0;
    remove(&prefile[0]);
    if (!quiet)
      printf("-> %1.0f \n", outsize+enc.out_size());
    outsum+=outsize+enc.out_size();
  }

  // Code end of block and close archive
  if (out) {
    putc(255, out);  // block trailer
    if (!quiet) {
      printf("-> %1.0f\n", outsum);
      enc.stat();  // print statistics
    }
    fclose(out);

    // If no error then clean up temporary files
    remove(&tempfile[0]);
    remove(&prefile[0]);
  }
  else if (!quiet)
    printf("Archive %s not updated\n", argv[2]);
}

////////////////////////// list //////////////////////////

#ifndef OPT

// List archive contents: l archive
void list(int argc, char** argv) {
  assert(argc>2 && argv[2]);

  // Open archive
  FILE* in=fopen(argv[2], "rb");
  if (!in) perror(argv[2]), exit(1);

  // Read the file
  int c, blocks=0;
  while (find_start(in)) {

    // Read block header
    if (getc(in)!=LEVEL || getc(in)!=1) error("not ZPAQ");
    ZPAQL z;
    double size=6+z.read(in);  // compressed size
    printf("Block %d: requires %1.3f MB memory\n", ++blocks,
      z.memory()/1000000);

    // Read segments
    while ((c=getc(in))==1) {

      // Print filename and comments
      printf(" ");
      while ((c=getc(in))!=EOF && c) putchar(c), size+=1;
      printf(" ");
      while
        ((c=getc(in))!=EOF && c) putchar(c), size+=1;
      if (getc(in)!=0) error("reserved data");
      size+=6;

      // Skip to end of data
      U32 c4=0xFFFFFFFF;  // last 4 bytes will be all 0
      while ((c=getc(in))!=EOF && (c4=c4<<8|c)!=0) size+=1;
      if (c==EOF) error("unexpected end of file");
      while ((c=getc(in))==0) size+=1;
      if (c==253) {  // print first 4 bytes of SHA1
        printf(" SHA1=");
        size+=20;
        for (int i=0; i<20; ++i) {
          int c=getc(in);
          if (i<4) printf("%02x", c);
        }
        printf("...");
      }
      else if (c!=254)
        error("missing end of segment marker");
      printf(" -> %1.0f\n", size);
      size=0;
    }
    if (c!=255) error("missing end of block marker");
  }
}

#endif

//////////////////////////// run ///////////////////////////

#ifndef OPT

// Debug config file: [pvth]rF[,N...] [args...]
// p=run PCOMP, v=verbose, t=trace once per numeric arg,
// otherwise args are input, output (default stdin, stdout),
// h=trace in hexadecimal, o=generate zpaqopt.h.
void run(int argc, char** argv) {
  assert(argc>=2);

  // Get options
  bool pcmd=false, tcmd=false, hcmd=false;
  char *cmd=argv[1];
  assert(cmd);
  while (cmd[0]) {
    if (cmd[0]=='p') pcmd=true;
    else if (cmd[0]=='v') verbose=true;
    else if (cmd[0]=='t') tcmd=true;
    else if (cmd[0]=='h') hcmd=true;
    else if (cmd[0]=='r') break;
    else usage();
    ++cmd;
  }
  ++cmd;  // now points to config file name
  if (!cmd[0]) usage();

  // Parse comma separated arguments after config file (now in cmd)
  get_args(cmd);

  // Initialize virtual machine
  ZPAQL hz, pz;  // HCOMP, PCOMP virtual machines
  Array<char> pcomp_cmd;  // PCOMP command (not used)
  FILE* in=fopen(cmd, "r");
  if (!in) perror(cmd), exit(1);
  compile(in, hz, pz, pcomp_cmd, args);
  ZPAQL& z=pcmd?pz:hz;  // the machine to be run
  if (z.hend<=z.hbegin) error("no program to run");
  if (pcmd) z.initp(); else z.inith();

  // Run the program
  if (tcmd) {  // trace with numeric args
    for (int i=2; i2) {
      in=fopen(argv[2], "rb");
      if (!in) perror(argv[2]), exit(1);
    }
    if (argc>3) {
      z.output=fopen(argv[3], "wb");
      if (!z.output) perror(argv[3]), exit(1);
    }
    int c;
    while ((c=getc(in))!=EOF)
      z.run(c);
    z.run(U32(-1));
  }
}

#endif

///////////////////////////// Main ///////////////////////////

// Print help message and exit
void usage() {
  printf("ZPAQ v1.10 archiver, (C) 2009, Ocarina Networks Inc.\n"
    "Written by Matt Mahoney, " __DATE__ ".\n"
    "This is free software under GPL v3, http://www.gnu.org/copyleft/gpl.html\n"
    "\n"
    "To compress to new archive: zpaq [opnsitqv]c[F[,N...]] archive files...\n"
    "To append to archive: zpaq [opnsitqv]a[F[,N...]] archive files...\n"
    "Optional modifiers:\n"
#ifndef OPT
    " o = compress faster (requires C++ compiler)\n"
#endif
    " p = store filename paths in archive\n"
    " n = don't store filenames (names will be needed to decompress)\n"
    " s = don't store SHA1 checksums (saves 20 bytes)\n"
    " i = don't store file sizes as comments (saves a few bytes)\n"
    " t = append locator tag to non-ZPAQ data\n"
    " q = quiet\n"
#ifndef OPT
    " v = verbose (show F as it compiles)\n"
    " F = use options in configuration file F (min.cfg, max.cfg)\n"
    " ,N = pass numeric arguments to F\n"
    "To list contents: zpaq l archive\n"
#endif
    "To extract: zpaq [opntq]x[N] archive [files...]\n"
#ifndef OPT
    " o = extract faster (requires C++ compiler)\n"
#endif
    " p = extract to stored paths instead of current directory\n"
    " n = decompress all to one file\n"
    " t = don't post-process (for debugging)\n"
    " q = quiet\n"
    " N = extract only block N (1, 2, 3...)\n"
    " files... = rename extracted files (clobbers)\n"
    " otherwise use stored names (does not clobber)\n"
#ifndef OPT
    "To debug configuration file F: zpaq [pthv]rF[,N...] [args...]\n"
    " p = run PCOMP (default is to run HCOMP)\n"
    " t = trace (single step), args are numeric inputs\n"
    " otherwise args are input, output (default stdin, stdout)\n"
    " h = trace display in hexadecimal\n"
    " v = verbose compile\n"
    " ,N = pass numeric arguments to F\n"
#endif
  );
  exit(0);
}

// Command syntax as in usage()
int main(int argc, char** argv) {
  time_t start=time(0);

  // Check usage
  if (argc<2) usage();

  // Find the command c, a, x, l, r
  char cmd=0;
  for (int i=0; (cmd=argv[1][i])!=0; ++i)
    if (strchr("caxlr", cmd)) break;

  // Do the command
  if (argc>=3 && (cmd=='a' || cmd=='c')) compress(argc, argv);
  else if (argc>=3 && cmd=='x') decompress(argc, argv);
#ifndef OPT
  else if (argc>=3 && cmd=='l') list(argc, argv);
  else if (cmd=='r') run(argc, argv);
#endif
  else usage();

  // Print time used
  if (!quiet) {
    printf("Process time %1.2f sec. Wall time %1.0f sec.\n",
      double(clock())/CLOCKS_PER_SEC, difftime(time(0), start));
  }
  return 0;
}
zpaq-1.10.orig/max.cfg0000644000000000000500000000333211263657173013136 0ustar rootsrc
(zpaq 1.07+ config file tuned for high compression (slow)
Uses 245 x 2^$1 MB memory, where $1 is the first argument.

(C) 2009, Ocarina Networks Inc. Written by Matt Mahoney.
This software is free under GPL v3.
http://www.gnu.org/copyleft/gpl.html
)

comp 5 9 0 0 22 (hh hm ph pm n)
  0 const 160
  1 icm 5 (orders 0-6)
  2 isse 13 1 (sizebits j)
  3 isse $1+16 2
  4 isse $1+18 3
  5 isse $1+19 4
  6 isse $1+19 5
  7 isse $1+20 6
  8 match $1+22 $1+24
  9 icm $1+17 (order 0 word)
  10 isse $1+19 9 (order 1 word)
  11 icm 13 (sparse with gaps 1-3)
  12 icm 13
  13 icm 13
  14 icm 14 (pic)
  15 mix 16 0 15 24 255 (mix orders 1 and 0)
  16 mix 8 0 16 10 255 (including last mixer)
  17 mix2 0 15 16 24 0
  18 sse 8 17 32 255 (order 0)
  19 mix2 8 17 18 16 255
  20 sse 16 19 32 255 (order 1)
  21 mix2 0 19 20 16 0
hcomp
  c++ *c=a b=c a=0 (save in rotating buffer)
  d= 2 hash *d=a b-- (orders 1,2,3,4,5,7)
  d++ hash *d=a b--
  d++ hash *d=a b--
  d++ hash *d=a b--
  d++ hash *d=a b--
  d++ hash b-- hash *d=a b--
  d++ hash *d=a b-- (match, order 8)
  d++ a=*c a&~ 32 (lowercase words)
  a> 64 if
    a< 91 if (if a-z)
      d++ hashd d-- (update order 1 word hash)
      *d<>a a+=*d a*= 20 *d=a (order 0 word hash)
      jmp 9
    endif
  endif
  (else not a letter)
    a=*d a== 0 ifnot (move word order 0 to 1)
      d++ *d=a d--
    endif
    *d=0 (clear order 0 word hash)
  (end else)
  d++
  d++ b=c b-- a=0 hash *d=a (sparse 2)
  d++ b-- a=0 hash *d=a (sparse 3)
  d++ b-- a=0 hash *d=a (sparse 4)
  d++ a=b a-= 212 b=a a=0 hash
    *d=a b<>a a-= 216 b<>a a=*b a&= 60 hashd (pic)
  d++ a=*c a<<= 9 *d=a (mix)
  d++ d++ d++ d++
  d++ *d=a (sse)
  halt
post
  0
end