WWW-RobotRules-6.03000755001750001751 015204207641 13253 5ustar00olafolaf000000000000
Changes100644001750001751 273015204207641 14631 0ustar00olafolaf000000000000WWW-RobotRules-6.03
Revision history for Perl distribution WWW-RobotRules

6.03      2026-05-23 02:23:28Z
    - Doing a proper version bump.

6.02      2026-05-21 14:45:27Z
    - WWW::RobotRules::AnyDBM_File::agent() no longer truncates the on-disk
      cache through an untie/tie(O_TRUNC) sequence. Stale-data reset now goes
      through the tied-hash CLEAR, eliminating a symlink-follow race that a
      local attacker with write access to the cache directory could exploit
      to overwrite arbitrary files writable by the crawler user.
    - The on-disk cache file mode has been tightened from 0640 to 0600.
    - t/rules-dbm.t has been hardened against symlink attacks on its tempfile
      during package builds.
    - A new SECURITY CONSIDERATIONS POD section documents the residual
      caller-trust requirement: the constructor's tie still follows symlinks
      because AnyDBM_File cannot portably plumb O_NOFOLLOW, so the caller
      must store the cache file in a directory writable only by the user
      that runs the code.
    - References: CWE-377, CWE-378, CWE-379.

6.02      2012-02-18
    - Restore perl-5.8.1 compatibility.

6.01      2011-03-13
    - Added legal notice and updated the meta repository link.

6.00      2011-02-25
    - Initial release of WWW-RobotRules as a separate distribution. There are
      no code changes besides incrementing the version number since
      libwww-perl-5.837.

      The WWW::RobotRules module used to be bundled with the libwww-perl
      distribution.
LICENSE100644001750001751 4627015204207641 14372 0ustar00olafolaf000000000000WWW-RobotRules-6.03
This software is copyright (c) 1995 by Gisle Aas. This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
Terms of the Perl programming language system itself a) the GNU General Public License as published by the Free Software Foundation; either version 1, or (at your option) any later version, or b) the "Artistic License" --- The GNU General Public License, Version 1, February 1989 --- This software is Copyright (c) 1995 by Gisle Aas. This is free software, licensed under: The GNU General Public License, Version 1, February 1989 GNU GENERAL PUBLIC LICENSE Version 1, February 1989 Copyright (C) 1989 Free Software Foundation, Inc. Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Preamble The license agreements of most software companies try to keep users at the mercy of those companies. By contrast, our General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. The General Public License applies to the Free Software Foundation's software and to any other program whose authors commit to using it. You can use it for your programs, too. When we speak of free software, we are referring to freedom, not price. Specifically, the General Public License is designed to make sure that you have the freedom to give away or sell copies of free software, that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of a such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. 
And you must tell them their rights. We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. The precise terms and conditions for copying, distribution and modification follow. GNU GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License Agreement applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any work containing the Program or a portion of it, either verbatim or with modifications. Each licensee is addressed as "you". 1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this General Public License and to the absence of any warranty; and give any other recipients of the Program a copy of this General Public License along with the Program. You may charge a fee for the physical act of transferring a copy. 2. 
You may modify your copy or copies of the Program or any portion of it, and copy and distribute such modifications under the terms of Paragraph 1 above, provided that you also do the following: a) cause the modified files to carry prominent notices stating that you changed the files and the date of any change; and b) cause the whole of any work that you distribute or publish, that in whole or in part contains the Program or any part thereof, either with or without modifications, to be licensed at no charge to all third parties under the terms of this General Public License (except that you may choose to grant warranty protection to some or all third parties, at your option). c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the simplest and most usual way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this General Public License. d) You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. Mere aggregation of another independent work with the Program (or its derivative) on a volume of a storage or distribution medium does not bring the other work under the scope of these terms. 3. 
You may copy and distribute the Program (or a portion or derivative of it, under Paragraph 2) in object code or executable form under the terms of Paragraphs 1 and 2 above provided that you also do one of the following: a) accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Paragraphs 1 and 2 above; or, b) accompany it with a written offer, valid for at least three years, to give any third party free (except for a nominal charge for the cost of distribution) a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Paragraphs 1 and 2 above; or, c) accompany it with the information you received as to where the corresponding source code may be obtained. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form alone.) Source code for a work means the preferred form of the work for making modifications to it. For an executable file, complete source code means all the source code for all modules it contains; but, as a special exception, it need not include source code for modules which are standard libraries that accompany the operating system on which the executable file runs, or for standard header files or definitions files that accompany that operating system. 4. You may not copy, modify, sublicense, distribute or transfer the Program except as expressly provided under this General Public License. Any attempt otherwise to copy, modify, sublicense, distribute or transfer the Program is void, and will automatically terminate your rights to use the Program under this License. However, parties who have received copies, or rights to use copies, from you under this General Public License will not have their licenses terminated so long as such parties remain in full compliance. 5. 
By copying, distributing or modifying the Program (or any work based on the Program) you indicate your acceptance of this license to do so, and all its terms and conditions. 6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. 7. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of the license which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of the license, you may choose any version ever published by the Free Software Foundation. 8. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 9. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. 
EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 10. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS Appendix: How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to humanity, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. Copyright (C) 19yy This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 1, or (at your option) any later version. 
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, see . Also add information on how to contact you by electronic and paper mail. If the program is interactive, make it output a short notice like this when it starts in an interactive mode: Gnomovision version 69, Copyright (C) 19xx name of author Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items--whatever suits your program. You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the program `Gnomovision' (a program to direct compilers to make passes at assemblers) written by James Hacker. , 1 April 1989 Moe Ghoul, President of Vice That's all there is to it! --- The Perl Artistic License 1.0 --- This software is Copyright (c) 1995 by Gisle Aas. 
This is free software, licensed under: The Perl Artistic License 1.0 The "Artistic License" Preamble The intent of this document is to state the conditions under which a Package may be copied, such that the Copyright Holder maintains some semblance of artistic control over the development of the package, while giving the users of the package the right to use and distribute the Package in a more-or-less customary fashion, plus the right to make reasonable modifications. Definitions: "Package" refers to the collection of files distributed by the Copyright Holder, and derivatives of that collection of files created through textual modification. "Standard Version" refers to such a Package if it has not been modified, or has been modified in accordance with the wishes of the Copyright Holder as specified below. "Copyright Holder" is whoever is named in the copyright or copyrights for the package. "You" is you, if you're thinking about copying or distributing this Package. "Reasonable copying fee" is whatever you can justify on the basis of media cost, duplication charges, time of people involved, and so on. (You will not be required to justify it to the Copyright Holder, but only to the computing community at large as a market that must bear the fee.) "Freely Available" means that no fee is charged for the item itself, though there may be fees involved in handling the item. It also means that recipients of the item may redistribute it under the same conditions they received it. 1. You may make and give away verbatim copies of the source form of the Standard Version of this Package without restriction, provided that you duplicate all of the original copyright notices and associated disclaimers. 2. You may apply bug fixes, portability fixes and other modifications derived from the Public Domain or from the Copyright Holder. A Package modified in such a way shall still be considered the Standard Version. 3. 
You may otherwise modify your copy of this Package in any way, provided that you insert a prominent notice in each changed file stating how and when you changed that file, and provided that you do at least ONE of the following: a) place your modifications in the Public Domain or otherwise make them Freely Available, such as by posting said modifications to Usenet or an equivalent medium, or placing the modifications on a major archive site such as uunet.uu.net, or by allowing the Copyright Holder to include your modifications in the Standard Version of the Package. b) use the modified Package only within your corporation or organization. c) rename any non-standard executables so the names do not conflict with standard executables, which must also be provided, and provide a separate manual page for each non-standard executable that clearly documents how it differs from the Standard Version. d) make other distribution arrangements with the Copyright Holder. 4. You may distribute the programs of this Package in object code or executable form, provided that you do at least ONE of the following: a) distribute a Standard Version of the executables and library files, together with instructions (in the manual page or equivalent) on where to get the Standard Version. b) accompany the distribution with the machine-readable source of the Package with your modifications. c) give non-standard executables non-standard names, and clearly document the differences in manual pages (or equivalent), together with instructions on where to get the Standard Version. d) make other distribution arrangements with the Copyright Holder. 5. You may charge a reasonable copying fee for any distribution of this Package. You may charge any fee you choose for support of this Package. You may not charge a fee for this Package itself. 
However, you may distribute this Package in aggregate with other (possibly commercial) programs as part of a larger (possibly commercial) software distribution provided that you do not advertise this Package as a product of your own. You may embed this Package's interpreter within an executable of yours (by linking); this shall be construed as a mere form of aggregation, provided that the complete Standard Version of the interpreter is so embedded. 6. The scripts and library files supplied as input to or produced as output from the programs of this Package do not automatically fall under the copyright of this Package, but belong to whoever generated them, and may be sold commercially, and may be aggregated with this Package. If such scripts or library files are aggregated with this Package via the so-called "undump" or "unexec" methods of producing a binary executable image, then distribution of such an image shall neither be construed as a distribution of this Package nor shall it fall under the restrictions of Paragraphs 3 and 4, provided that you do not represent such an executable image as a Standard Version of this Package. 7. C subroutines (or comparably compiled subroutines in other languages) supplied by you and linked into this Package in order to emulate subroutines and variables of the language defined by this Package shall not be considered part of this Package, but are the equivalent of input as in Paragraph 6, provided these subroutines do not change the language in any way that would cause it to fail the regression tests for the language. 8. Aggregation of this Package with a commercial distribution is always permitted provided that the use of this Package is embedded; that is, when no overt attempt is made to make this Package's interfaces visible to the end user of the commercial distribution. Such use shall not be construed as a distribution of this Package. 9. 
The name of the Copyright Holder may not be used to endorse or promote products derived from this software without specific prior written permission.

10. THIS PACKAGE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

The End
INSTALL100644001750001751 457115204207641 14374 0ustar00olafolaf000000000000WWW-RobotRules-6.03
This is the Perl distribution WWW-RobotRules.

Installing WWW-RobotRules is straightforward.

## Installation with cpanm

If you have cpanm, you only need one line:

    % cpanm WWW::RobotRules

If it does not have permission to install modules to the current perl, cpanm
will automatically set up and install to a local::lib in your home directory.
See the local::lib documentation (https://metacpan.org/pod/local::lib) for
details on enabling it in your environment.

## Installing with the CPAN shell

Alternatively, if your CPAN shell is set up, you should just be able to do:

    % cpan WWW::RobotRules

## Manual installation

As a last resort, you can manually install it. If you have not already
downloaded the release tarball, you can find the download link on the module's
MetaCPAN page: https://metacpan.org/pod/WWW::RobotRules

Untar the tarball, install configure prerequisites (see below), then build it:

    % perl Makefile.PL
    % make && make test

Then install it:

    % make install

On Windows platforms, you should use `dmake` or `nmake`, instead of `make`.

If your perl is system-managed, you can create a local::lib in your home
directory to install modules to. For details, see the local::lib documentation:
https://metacpan.org/pod/local::lib

The prerequisites of this distribution will also have to be installed manually.
The prerequisites are listed in one of the files: `MYMETA.yml` or `MYMETA.json`
generated by running the manual build process described above.

## Configure Prerequisites

This distribution requires other modules to be installed before this
distribution's installer can be run. They can be found under the
"configure_requires" key of META.yml or the
"{prereqs}{configure}{requires}" key of META.json.

## Other Prerequisites

This distribution may require additional modules to be installed after running
Makefile.PL. Look for prerequisites in the following phases:

* to run make, PHASE = build
* to use the module code itself, PHASE = runtime
* to run tests, PHASE = test

They can all be found in the "PHASE_requires" key of MYMETA.yml or the
"{prereqs}{PHASE}{requires}" key of MYMETA.json.

## Documentation

WWW-RobotRules documentation is available as POD.
You can run `perldoc` from a shell to read the documentation:

    % perldoc WWW::RobotRules

For more information on installing Perl modules via CPAN, please see:
https://www.cpan.org/modules/INSTALL.html
cpanfile100644001750001751 160015204207641 15035 0ustar00olafolaf000000000000WWW-RobotRules-6.03
use strict;
use warnings;

on 'configure' => sub {
    requires 'ExtUtils::MakeMaker';
};

on 'runtime' => sub {
    requires 'perl' => '5.008001';
    requires 'strict';
    requires 'AnyDBM_File';
    requires 'Carp';
    requires 'Fcntl';
    requires 'URI' => '1.10';

    # DB_File is only needed for the optional WWW::RobotRules::DB_File
    # backend. It is a non-core XS module, so it is suggested, not required.
    suggests 'DB_File';
};

on 'test' => sub {
    requires 'Test::More' => '0.96';
    requires 'strict';
    requires 'warnings';
};

on 'develop' => sub {
    requires 'Pod::Coverage::TrustPod';
    requires 'Pod::Spell' => '1.25';
    requires 'Test::EOL' => '2.00';
    requires 'Test::MinimumVersion';
    requires 'Test::Mojibake';
    requires 'Test::Pod';
    requires 'Test::Pod::Coverage';
    requires 'Test::Portability::Files';
    requires 'Test::Version';
};
dist.ini100644001750001751 377615204207641 15015 0ustar00olafolaf000000000000WWW-RobotRules-6.03
name = WWW-RobotRules
author = Gisle Aas
license = Perl_5
copyright_holder = Gisle Aas
copyright_year = 1995

[Git::GatherDir]
exclude_filename = LICENSE
exclude_filename = META.json
exclude_filename = Makefile.PL
exclude_filename = perlimports.toml
exclude_filename = precious.toml
exclude_filename = .perltidyrc
exclude_filename = README.md

[MetaConfig]
[MetaProvides::Package]

[MetaNoIndex]
directory = t
directory = xt

[MetaYAML]
[MetaJSON]

[MetaResources]
x_MailingList = mailto:libwww@perl.org

[Git::Contributors]

[GithubMeta]
issues = 1
user = libwww-perl

[Manifest]
[License]

[InstallGuide]
:version = 1.200013

[Prereqs::FromCPANfile]
[MakeMaker]
[MojibakeTests]
[Test::Version]
[Test::ReportPrereqs]

[Test::Compile]
:version = 2.059
bail_out_on_fail = 1
xt_mode = 1
; WWW::RobotRules::DB_File needs the optional, non-core DB_File module
skip = WWW::RobotRules::DB_File

[Test::Portability]
[Test::EOL]
[Test::MinimumVersion]
[PodSyntaxTests]

[Test::Pod::Coverage::Configurable]
skip = WWW::RobotRules::AnyDBM_File
skip = WWW::RobotRules::DB_File
trustme = WWW::RobotRules => qr/^(?:Version|is_me|visit|no_visits|last_visit|fresh_until|push_rules|clear_rules|rules|dump)$/

[Test::PodSpelling]
wordlist = Pod::Wordlist
stopword = AnyDBM
stopword = Aas
stopword = Ardo
stopword = cybermapper
stopword = DBM
stopword = diskcaching
stopword = Gisle
stopword = Hakan
stopword = Koster
stopword = Martijn
stopword = RobotUA
stopword = txt

[Git::Check]
allow_dirty =

[CheckStrictVersion]
decimal_only = 1

[RunExtraTests]
[CheckChangeLog]
[CheckChangesHasContent]
[TestRelease]
[UploadToCPAN]

[ReadmeAnyFromPod / Markdown_Readme]
source_filename = lib/WWW/RobotRules.pm
type = markdown
filename = README.md
location = root
phase = release

[CopyFilesFromRelease]
filename = META.json
filename = LICENSE

[@Git::VersionManager]
commit_files_after_release = META.json
commit_files_after_release = LICENSE
commit_files_after_release = README.md

[Git::Push]
[ConfirmRelease]
META.yml100644001750001751 3601015204207641 14625 0ustar00olafolaf000000000000WWW-RobotRules-6.03
--- abstract: 'database of robots.txt-derived permissions' author: - 'Gisle Aas ' build_requires: ExtUtils::MakeMaker: '0' File::Spec: '0' Test::More: '0.96' strict: '0' warnings: '0' configure_requires: ExtUtils::MakeMaker: '0' dynamic_config: 0 generated_by: 'Dist::Zilla version 6.037, CPAN::Meta::Converter version 2.150013' license: perl meta-spec: url: http://module-build.sourceforge.net/META-spec-v1.4.html version: '1.4' name: WWW-RobotRules no_index: directory: - t - xt provides: WWW::RobotRules: file: lib/WWW/RobotRules.pm version: '6.03' WWW::RobotRules::AnyDBM_File: file: lib/WWW/RobotRules/AnyDBM_File.pm version: '6.03' WWW::RobotRules::DB_File: file: lib/WWW/RobotRules/DB_File.pm version: '6.03' WWW::RobotRules::InCore: file: lib/WWW/RobotRules.pm version: '6.03' requires: AnyDBM_File: '0' Carp: '0' Fcntl: '0' URI: '1.10' perl: '5.008001' strict: '0' resources: MailingList: mailto:libwww@perl.org bugtracker: https://github.com/libwww-perl/WWW-RobotRules/issues homepage: https://github.com/libwww-perl/WWW-RobotRules repository: https://github.com/libwww-perl/WWW-RobotRules.git version: '6.03' x_Dist_Zilla: perl: version: '5.042002' plugins: - class: Dist::Zilla::Plugin::Git::GatherDir config: Dist::Zilla::Plugin::GatherDir: exclude_filename: - .perltidyrc - LICENSE - META.json - Makefile.PL - README.md - perlimports.toml - precious.toml exclude_match: []
include_dotfiles: 0 prefix: '' prune_directory: [] root: . Dist::Zilla::Plugin::Git::GatherDir: include_untracked: 0 name: Git::GatherDir version: '2.052' - class: Dist::Zilla::Plugin::MetaConfig name: MetaConfig version: '6.037' - class: Dist::Zilla::Plugin::MetaProvides::Package config: Dist::Zilla::Plugin::MetaProvides::Package: finder_objects: - class: Dist::Zilla::Plugin::FinderCode name: MetaProvides::Package/AUTOVIV/:InstallModulesPM version: '6.037' include_underscores: 0 Dist::Zilla::Role::MetaProvider::Provider: $Dist::Zilla::Role::MetaProvider::Provider::VERSION: '2.002004' inherit_missing: 1 inherit_version: 1 meta_noindex: 1 Dist::Zilla::Role::ModuleMetadata: Module::Metadata: '1.000038' version: '0.006' name: MetaProvides::Package version: '2.004003' - class: Dist::Zilla::Plugin::MetaNoIndex name: MetaNoIndex version: '6.037' - class: Dist::Zilla::Plugin::MetaYAML name: MetaYAML version: '6.037' - class: Dist::Zilla::Plugin::MetaJSON name: MetaJSON version: '6.037' - class: Dist::Zilla::Plugin::MetaResources name: MetaResources version: '6.037' - class: Dist::Zilla::Plugin::Git::Contributors config: Dist::Zilla::Plugin::Git::Contributors: git_version: 2.43.0 include_authors: 0 include_releaser: 1 order_by: name paths: [] name: Git::Contributors version: '0.039' - class: Dist::Zilla::Plugin::GithubMeta name: GithubMeta version: '0.58' - class: Dist::Zilla::Plugin::Manifest name: Manifest version: '6.037' - class: Dist::Zilla::Plugin::License name: License version: '6.037' - class: Dist::Zilla::Plugin::InstallGuide config: Dist::Zilla::Role::ModuleMetadata: Module::Metadata: '1.000038' version: '0.006' name: InstallGuide version: '1.200014' - class: Dist::Zilla::Plugin::Prereqs::FromCPANfile name: Prereqs::FromCPANfile version: '0.08' - class: Dist::Zilla::Plugin::MakeMaker config: Dist::Zilla::Role::TestRunner: default_jobs: '8' name: MakeMaker version: '6.037' - class: Dist::Zilla::Plugin::MojibakeTests name: MojibakeTests version: '0.8' - class: 
Dist::Zilla::Plugin::Test::Version name: Test::Version version: '1.09' - class: Dist::Zilla::Plugin::Test::ReportPrereqs name: Test::ReportPrereqs version: '0.029' - class: Dist::Zilla::Plugin::Test::Compile config: Dist::Zilla::Plugin::Test::Compile: bail_out_on_fail: '1' fail_on_warning: author fake_home: 0 filename: xt/author/00-compile.t module_finder: - ':InstallModules' needs_display: 0 phase: develop script_finder: - ':PerlExecFiles' skips: - WWW::RobotRules::DB_File switch: [] name: Test::Compile version: '2.059' - class: Dist::Zilla::Plugin::Test::Portability config: Dist::Zilla::Plugin::Test::Portability: options: '' name: Test::Portability version: '2.001003' - class: Dist::Zilla::Plugin::Test::EOL config: Dist::Zilla::Plugin::Test::EOL: filename: xt/author/eol.t finder: - ':ExecFiles' - ':InstallModules' - ':TestFiles' trailing_whitespace: 1 name: Test::EOL version: '0.19' - class: Dist::Zilla::Plugin::Test::MinimumVersion config: Dist::Zilla::Plugin::Test::MinimumVersion: max_target_perl: ~ name: Test::MinimumVersion version: '2.000011' - class: Dist::Zilla::Plugin::PodSyntaxTests name: PodSyntaxTests version: '6.037' - class: Dist::Zilla::Plugin::Test::Pod::Coverage::Configurable name: Test::Pod::Coverage::Configurable version: '0.07' - class: Dist::Zilla::Plugin::Test::PodSpelling config: Dist::Zilla::Plugin::Test::PodSpelling: directories: - bin - lib spell_cmd: '' stopwords: - Aas - AnyDBM - Ardo - DBM - Gisle - Hakan - Koster - Martijn - RobotUA - cybermapper - diskcaching - txt wordlist: Pod::Wordlist name: Test::PodSpelling version: '2.007006' - class: Dist::Zilla::Plugin::Git::Check config: Dist::Zilla::Plugin::Git::Check: untracked_files: die Dist::Zilla::Role::Git::DirtyFiles: allow_dirty: [] allow_dirty_match: [] changelog: Changes Dist::Zilla::Role::Git::Repo: git_version: 2.43.0 repo_root: . 
name: Git::Check version: '2.052' - class: Dist::Zilla::Plugin::CheckStrictVersion name: CheckStrictVersion version: '0.001' - class: Dist::Zilla::Plugin::RunExtraTests config: Dist::Zilla::Role::TestRunner: default_jobs: '8' name: RunExtraTests version: '0.029' - class: Dist::Zilla::Plugin::CheckChangeLog name: CheckChangeLog version: '0.05' - class: Dist::Zilla::Plugin::CheckChangesHasContent name: CheckChangesHasContent version: '0.011' - class: Dist::Zilla::Plugin::TestRelease name: TestRelease version: '6.037' - class: Dist::Zilla::Plugin::UploadToCPAN name: UploadToCPAN version: '6.037' - class: Dist::Zilla::Plugin::ReadmeAnyFromPod config: Dist::Zilla::Role::FileWatcher: version: '0.006' name: Markdown_Readme version: '0.163250' - class: Dist::Zilla::Plugin::CopyFilesFromRelease config: Dist::Zilla::Plugin::CopyFilesFromRelease: filename: - LICENSE - META.json match: [] name: CopyFilesFromRelease version: '0.007' - class: Dist::Zilla::Plugin::Prereqs config: Dist::Zilla::Plugin::Prereqs: phase: develop type: recommends name: '@Git::VersionManager/pluginbundle version' version: '6.037' - class: Dist::Zilla::Plugin::RewriteVersion::Transitional config: Dist::Zilla::Plugin::RewriteVersion: add_tarball_name: 0 finders: - ':ExecFiles' - ':InstallModules' global: 0 skip_version_provider: 0 Dist::Zilla::Plugin::RewriteVersion::Transitional: {} name: '@Git::VersionManager/RewriteVersion::Transitional' version: '0.009' - class: Dist::Zilla::Plugin::MetaProvides::Update name: '@Git::VersionManager/MetaProvides::Update' version: '0.007' - class: Dist::Zilla::Plugin::CopyFilesFromRelease config: Dist::Zilla::Plugin::CopyFilesFromRelease: filename: - Changes match: [] name: '@Git::VersionManager/CopyFilesFromRelease' version: '0.007' - class: Dist::Zilla::Plugin::Git::Commit config: Dist::Zilla::Plugin::Git::Commit: add_files_in: [] commit_msg: v%V%n%n%c signoff: 0 Dist::Zilla::Role::Git::DirtyFiles: allow_dirty: - Changes - LICENSE - META.json - README.md 
allow_dirty_match: [] changelog: Changes Dist::Zilla::Role::Git::Repo: git_version: 2.43.0 repo_root: . Dist::Zilla::Role::Git::StringFormatter: time_zone: local name: '@Git::VersionManager/release snapshot' version: '2.052' - class: Dist::Zilla::Plugin::Git::Tag config: Dist::Zilla::Plugin::Git::Tag: branch: ~ changelog: Changes signed: 0 tag: v6.03 tag_format: v%V tag_message: v%V Dist::Zilla::Role::Git::Repo: git_version: 2.43.0 repo_root: . Dist::Zilla::Role::Git::StringFormatter: time_zone: local name: '@Git::VersionManager/Git::Tag' version: '2.052' - class: Dist::Zilla::Plugin::BumpVersionAfterRelease::Transitional config: Dist::Zilla::Plugin::BumpVersionAfterRelease: finders: - ':ExecFiles' - ':InstallModules' global: 0 munge_makefile_pl: 1 Dist::Zilla::Plugin::BumpVersionAfterRelease::Transitional: {} name: '@Git::VersionManager/BumpVersionAfterRelease::Transitional' version: '0.009' - class: Dist::Zilla::Plugin::NextRelease name: '@Git::VersionManager/NextRelease' version: '6.037' - class: Dist::Zilla::Plugin::Git::Commit config: Dist::Zilla::Plugin::Git::Commit: add_files_in: [] commit_msg: 'increment $VERSION after %v release' signoff: 0 Dist::Zilla::Role::Git::DirtyFiles: allow_dirty: - Build.PL - Changes - Makefile.PL allow_dirty_match: - (?^:^lib/.*\.pm$) changelog: Changes Dist::Zilla::Role::Git::Repo: git_version: 2.43.0 repo_root: . Dist::Zilla::Role::Git::StringFormatter: time_zone: local name: '@Git::VersionManager/post-release commit' version: '2.052' - class: Dist::Zilla::Plugin::Git::Push config: Dist::Zilla::Plugin::Git::Push: push_to: - origin remotes_must_exist: 1 Dist::Zilla::Role::Git::Repo: git_version: 2.43.0 repo_root: . 
name: Git::Push version: '2.052' - class: Dist::Zilla::Plugin::ConfirmRelease name: ConfirmRelease version: '6.037' - class: Dist::Zilla::Plugin::FinderCode name: ':InstallModules' version: '6.037' - class: Dist::Zilla::Plugin::FinderCode name: ':IncModules' version: '6.037' - class: Dist::Zilla::Plugin::FinderCode name: ':TestFiles' version: '6.037' - class: Dist::Zilla::Plugin::FinderCode name: ':ExtraTestFiles' version: '6.037' - class: Dist::Zilla::Plugin::FinderCode name: ':ExecFiles' version: '6.037' - class: Dist::Zilla::Plugin::FinderCode name: ':PerlExecFiles' version: '6.037' - class: Dist::Zilla::Plugin::FinderCode name: ':ShareFiles' version: '6.037' - class: Dist::Zilla::Plugin::FinderCode name: ':MainModule' version: '6.037' - class: Dist::Zilla::Plugin::FinderCode name: ':AllFiles' version: '6.037' - class: Dist::Zilla::Plugin::FinderCode name: ':NoFiles' version: '6.037' - class: Dist::Zilla::Plugin::FinderCode name: MetaProvides::Package/AUTOVIV/:InstallModulesPM version: '6.037' zilla: class: Dist::Zilla::Dist::Builder config: is_trial: 0 version: '6.037' x_contributors: - 'Adam Kennedy ' - 'Adam Sjogren ' - 'Alexey Tourbin ' - 'Alex Kapranoff ' - 'amire80 ' - 'Andreas J. Koenig ' - 'Anton Yuzhaninov ' - 'Bill Mann ' - 'Bron Gondwana ' - 'Daniel Hedlund ' - 'David E. Wheeler ' - 'DAVIDRW ' - 'David Steinbrunner ' - 'dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>' - 'Father Chrysostomos ' - 'FWILES ' - 'Gavin Peters ' - 'Graeme Thompson ' - 'Graham Knop ' - 'Hans-H. Froehlich ' - 'Ian Kilgore ' - 'Jacob J ' - 'jefflee ' - 'john9art ' - 'Mark Stosberg ' - 'Mike Schilli ' - 'mschilli ' - 'murphy ' - 'Olaf Alders ' - 'Ondrej Hanak ' - 'Peter Rabbitson ' - 'phrstbrn ' - 'Robert Stone ' - 'Rolf Grossmann ' - 'ruff ' - 'sasao ' - 'Sean M. 
Burke ' - 'Slaven Rezic ' - 'Spiros Denaxas ' - 'Steve Hay ' - 'Todd Lipcon ' - 'Tom Hukins ' - 'Tony Finch ' - 'Toru Yamaguchi ' - 'uid39246 ' - 'Ville Skyttä ' - 'Yuri Karaban ' - 'Zefram ' x_generated_by_perl: v5.42.2 x_serialization_backend: 'YAML::Tiny version 1.76' x_spdx_expression: 'Artistic-1.0-Perl OR GPL-1.0-or-later' MANIFEST100644001750001751 103415204207641 14463 0ustar00olafolaf000000000000WWW-RobotRules-6.03# This file was automatically generated by Dist::Zilla::Plugin::Manifest v6.037 Changes INSTALL LICENSE MANIFEST META.json META.yml Makefile.PL cpanfile dist.ini lib/WWW/RobotRules.pm lib/WWW/RobotRules/AnyDBM_File.pm lib/WWW/RobotRules/DB_File.pm t/00-report-prereqs.dd t/00-report-prereqs.t t/misc/dbmrobot t/rules-dbm.t t/rules.t xt/author/00-compile.t xt/author/eol.t xt/author/minimum-version.t xt/author/mojibake.t xt/author/pod-coverage.t xt/author/pod-spell.t xt/author/pod-syntax.t xt/author/portability.t xt/author/test-version.t t000755001750001751 015204207641 13437 5ustar00olafolaf000000000000WWW-RobotRules-6.03rules.t100644001750001751 1135015204207641 15136 0ustar00olafolaf000000000000WWW-RobotRules-6.03/tuse strict; use warnings; use Test::More; use WWW::RobotRules (); # We test a number of different /robots.txt files, # my $content1 = < 1 => 'http://foo/private' => 1, 2 => 'http://foo/also_private' => 1, ], [ $content1, 'Wubble' => 3 => 'http://foo/private' => 0, 4 => 'http://foo/also_private' => 0, 5 => 'http://foo/other' => 1, ], [ $content2, 'MOMspider' => 6 => 'http://foo/private' => 0, 7 => 'http://foo/other' => 1, ], [ $content2, 'Wubble' => 8 => 'http://foo/private' => 1, 9 => 'http://foo/also_private' => 1, 10 => 'http://foo/other' => 1, ], [ $content3, 'MOMspider' => 11 => 'http://foo/private' => 1, 12 => 'http://foo/other' => 1, ], [ $content3, 'Wubble' => 13 => 'http://foo/private' => 1, 14 => 'http://foo/other' => 1, ], [ $content4, 'MOMspider' => 15 => 'http://foo/private' => 1, 16 => 'http://foo/this' => 0, 17 => 
'http://foo/that' => 1, ], [ $content4, 'Another' => 18 => 'http://foo/private' => 1, 19 => 'http://foo/this' => 1, 20 => 'http://foo/that' => 0, ], [ $content4, 'Wubble' => 21 => 'http://foo/private' => 0, 22 => 'http://foo/this' => 1, 23 => 'http://foo/that' => 1, ], [ $content4, 'Another/1.0' => 24 => 'http://foo/private' => 1, 25 => 'http://foo/this' => 1, 26 => 'http://foo/that' => 0, ], [ $content4, "SvartEnke1" => 27 => "http://foo/" => 0, 28 => "http://foo/this" => 0, 29 => "http://bar/" => 1, ], [ $content4, "SvartEnke2" => 30 => "http://foo/" => 1, 31 => "http://foo/this" => 1, 32 => "http://bar/" => 1, ], [ $content4, "MomSpiderJr" => # should match "MomSpider" 33 => 'http://foo/private' => 1, 34 => 'http://foo/also_private' => 1, 35 => 'http://foo/this/' => 0, ], [ $content4, "SvartEnk" => # should match "*" 36 => "http://foo/" => 1, 37 => "http://foo/private/" => 0, 38 => "http://bar/" => 1, ], [ $content5, 'Villager/1.0' => 39 => 'http://foo/west-wing/' => 0, 40 => 'http://foo/' => 0, ], [ $content5, 'Belle/2.0' => 41 => 'http://foo/west-wing/' => 0, 42 => 'http://foo/' => 1, ], [ $content5, 'Beast/3.0' => 43 => 'http://foo/west-wing/' => 1, 44 => 'http://foo/' => 1, ], [ $content6, 'Villager/1.0' => 45 => 'http://foo/west-wing/' => 0, 46 => 'http://foo/' => 0, ], [ $content6, 'Belle/2.0' => 47 => 'http://foo/west-wing/' => 0, 48 => 'http://foo/' => 1, ], [ $content6, 'Beast/3.0' => 49 => 'http://foo/west-wing/' => 1, 50 => 'http://foo/' => 1, ], # when adding tests, remember to increase # the maximum at the top ); for my $t (@tests1) { my ($content, $ua) = splice(@$t, 0, 2); my $robotsrules = WWW::RobotRules->new($ua); $robotsrules->parse('http://foo/robots.txt', $content); my ($num, $path, $expected); while (($num, $path, $expected) = splice(@$t, 0, 3)) { my $allowed = $robotsrules->allowed($path); $allowed = 1 if $allowed; is $allowed, $expected, "$ua => $path" or $robotsrules->dump; } } done_testing; META.json100644001750001751 5637215204207641 
15012 0ustar00olafolaf000000000000WWW-RobotRules-6.03{ "abstract" : "database of robots.txt-derived permissions", "author" : [ "Gisle Aas " ], "dynamic_config" : 0, "generated_by" : "Dist::Zilla version 6.037, CPAN::Meta::Converter version 2.150013", "license" : [ "perl_5" ], "meta-spec" : { "url" : "http://search.cpan.org/perldoc?CPAN::Meta::Spec", "version" : 2 }, "name" : "WWW-RobotRules", "no_index" : { "directory" : [ "t", "xt" ] }, "prereqs" : { "configure" : { "requires" : { "ExtUtils::MakeMaker" : "0" }, "suggests" : { "JSON::PP" : "2.27300" } }, "develop" : { "recommends" : { "Dist::Zilla::PluginBundle::Git::VersionManager" : "0.007" }, "requires" : { "File::Spec" : "0", "IO::Handle" : "0", "IPC::Open3" : "0", "Pod::Coverage::TrustPod" : "0", "Pod::Spell" : "1.25", "Test::EOL" : "2.00", "Test::MinimumVersion" : "0", "Test::Mojibake" : "0", "Test::More" : "0.94", "Test::Pod" : "1.41", "Test::Pod::Coverage" : "1.08", "Test::Portability::Files" : "0", "Test::Spelling" : "0.17", "Test::Version" : "1" } }, "runtime" : { "requires" : { "AnyDBM_File" : "0", "Carp" : "0", "Fcntl" : "0", "URI" : "1.10", "perl" : "5.008001", "strict" : "0" }, "suggests" : { "DB_File" : "0" } }, "test" : { "recommends" : { "CPAN::Meta" : "2.120900" }, "requires" : { "ExtUtils::MakeMaker" : "0", "File::Spec" : "0", "Test::More" : "0.96", "strict" : "0", "warnings" : "0" } } }, "provides" : { "WWW::RobotRules" : { "file" : "lib/WWW/RobotRules.pm", "version" : "6.03" }, "WWW::RobotRules::AnyDBM_File" : { "file" : "lib/WWW/RobotRules/AnyDBM_File.pm", "version" : "6.03" }, "WWW::RobotRules::DB_File" : { "file" : "lib/WWW/RobotRules/DB_File.pm", "version" : "6.03" }, "WWW::RobotRules::InCore" : { "file" : "lib/WWW/RobotRules.pm", "version" : "6.03" } }, "release_status" : "stable", "resources" : { "bugtracker" : { "web" : "https://github.com/libwww-perl/WWW-RobotRules/issues" }, "homepage" : "https://github.com/libwww-perl/WWW-RobotRules", "repository" : { "type" : "git", "url" : 
"https://github.com/libwww-perl/WWW-RobotRules.git", "web" : "https://github.com/libwww-perl/WWW-RobotRules" }, "x_MailingList" : "mailto:libwww@perl.org" }, "version" : "6.03", "x_Dist_Zilla" : { "perl" : { "version" : "5.042002" }, "plugins" : [ { "class" : "Dist::Zilla::Plugin::Git::GatherDir", "config" : { "Dist::Zilla::Plugin::GatherDir" : { "exclude_filename" : [ ".perltidyrc", "LICENSE", "META.json", "Makefile.PL", "README.md", "perlimports.toml", "precious.toml" ], "exclude_match" : [], "include_dotfiles" : 0, "prefix" : "", "prune_directory" : [], "root" : "." }, "Dist::Zilla::Plugin::Git::GatherDir" : { "include_untracked" : 0 } }, "name" : "Git::GatherDir", "version" : "2.052" }, { "class" : "Dist::Zilla::Plugin::MetaConfig", "name" : "MetaConfig", "version" : "6.037" }, { "class" : "Dist::Zilla::Plugin::MetaProvides::Package", "config" : { "Dist::Zilla::Plugin::MetaProvides::Package" : { "finder_objects" : [ { "class" : "Dist::Zilla::Plugin::FinderCode", "name" : "MetaProvides::Package/AUTOVIV/:InstallModulesPM", "version" : "6.037" } ], "include_underscores" : 0 }, "Dist::Zilla::Role::MetaProvider::Provider" : { "$Dist::Zilla::Role::MetaProvider::Provider::VERSION" : "2.002004", "inherit_missing" : 1, "inherit_version" : 1, "meta_noindex" : 1 }, "Dist::Zilla::Role::ModuleMetadata" : { "Module::Metadata" : "1.000038", "version" : "0.006" } }, "name" : "MetaProvides::Package", "version" : "2.004003" }, { "class" : "Dist::Zilla::Plugin::MetaNoIndex", "name" : "MetaNoIndex", "version" : "6.037" }, { "class" : "Dist::Zilla::Plugin::MetaYAML", "name" : "MetaYAML", "version" : "6.037" }, { "class" : "Dist::Zilla::Plugin::MetaJSON", "name" : "MetaJSON", "version" : "6.037" }, { "class" : "Dist::Zilla::Plugin::MetaResources", "name" : "MetaResources", "version" : "6.037" }, { "class" : "Dist::Zilla::Plugin::Git::Contributors", "config" : { "Dist::Zilla::Plugin::Git::Contributors" : { "git_version" : "2.43.0", "include_authors" : 0, "include_releaser" : 1, 
"order_by" : "name", "paths" : [] } }, "name" : "Git::Contributors", "version" : "0.039" }, { "class" : "Dist::Zilla::Plugin::GithubMeta", "name" : "GithubMeta", "version" : "0.58" }, { "class" : "Dist::Zilla::Plugin::Manifest", "name" : "Manifest", "version" : "6.037" }, { "class" : "Dist::Zilla::Plugin::License", "name" : "License", "version" : "6.037" }, { "class" : "Dist::Zilla::Plugin::InstallGuide", "config" : { "Dist::Zilla::Role::ModuleMetadata" : { "Module::Metadata" : "1.000038", "version" : "0.006" } }, "name" : "InstallGuide", "version" : "1.200014" }, { "class" : "Dist::Zilla::Plugin::Prereqs::FromCPANfile", "name" : "Prereqs::FromCPANfile", "version" : "0.08" }, { "class" : "Dist::Zilla::Plugin::MakeMaker", "config" : { "Dist::Zilla::Role::TestRunner" : { "default_jobs" : "8" } }, "name" : "MakeMaker", "version" : "6.037" }, { "class" : "Dist::Zilla::Plugin::MojibakeTests", "name" : "MojibakeTests", "version" : "0.8" }, { "class" : "Dist::Zilla::Plugin::Test::Version", "name" : "Test::Version", "version" : "1.09" }, { "class" : "Dist::Zilla::Plugin::Test::ReportPrereqs", "name" : "Test::ReportPrereqs", "version" : "0.029" }, { "class" : "Dist::Zilla::Plugin::Test::Compile", "config" : { "Dist::Zilla::Plugin::Test::Compile" : { "bail_out_on_fail" : "1", "fail_on_warning" : "author", "fake_home" : 0, "filename" : "xt/author/00-compile.t", "module_finder" : [ ":InstallModules" ], "needs_display" : 0, "phase" : "develop", "script_finder" : [ ":PerlExecFiles" ], "skips" : [ "WWW::RobotRules::DB_File" ], "switch" : [] } }, "name" : "Test::Compile", "version" : "2.059" }, { "class" : "Dist::Zilla::Plugin::Test::Portability", "config" : { "Dist::Zilla::Plugin::Test::Portability" : { "options" : "" } }, "name" : "Test::Portability", "version" : "2.001003" }, { "class" : "Dist::Zilla::Plugin::Test::EOL", "config" : { "Dist::Zilla::Plugin::Test::EOL" : { "filename" : "xt/author/eol.t", "finder" : [ ":ExecFiles", ":InstallModules", ":TestFiles" ], 
"trailing_whitespace" : 1 } }, "name" : "Test::EOL", "version" : "0.19" }, { "class" : "Dist::Zilla::Plugin::Test::MinimumVersion", "config" : { "Dist::Zilla::Plugin::Test::MinimumVersion" : { "max_target_perl" : null } }, "name" : "Test::MinimumVersion", "version" : "2.000011" }, { "class" : "Dist::Zilla::Plugin::PodSyntaxTests", "name" : "PodSyntaxTests", "version" : "6.037" }, { "class" : "Dist::Zilla::Plugin::Test::Pod::Coverage::Configurable", "name" : "Test::Pod::Coverage::Configurable", "version" : "0.07" }, { "class" : "Dist::Zilla::Plugin::Test::PodSpelling", "config" : { "Dist::Zilla::Plugin::Test::PodSpelling" : { "directories" : [ "bin", "lib" ], "spell_cmd" : "", "stopwords" : [ "Aas", "AnyDBM", "Ardo", "DBM", "Gisle", "Hakan", "Koster", "Martijn", "RobotUA", "cybermapper", "diskcaching", "txt" ], "wordlist" : "Pod::Wordlist" } }, "name" : "Test::PodSpelling", "version" : "2.007006" }, { "class" : "Dist::Zilla::Plugin::Git::Check", "config" : { "Dist::Zilla::Plugin::Git::Check" : { "untracked_files" : "die" }, "Dist::Zilla::Role::Git::DirtyFiles" : { "allow_dirty" : [], "allow_dirty_match" : [], "changelog" : "Changes" }, "Dist::Zilla::Role::Git::Repo" : { "git_version" : "2.43.0", "repo_root" : "." 
} }, "name" : "Git::Check", "version" : "2.052" }, { "class" : "Dist::Zilla::Plugin::CheckStrictVersion", "name" : "CheckStrictVersion", "version" : "0.001" }, { "class" : "Dist::Zilla::Plugin::RunExtraTests", "config" : { "Dist::Zilla::Role::TestRunner" : { "default_jobs" : "8" } }, "name" : "RunExtraTests", "version" : "0.029" }, { "class" : "Dist::Zilla::Plugin::CheckChangeLog", "name" : "CheckChangeLog", "version" : "0.05" }, { "class" : "Dist::Zilla::Plugin::CheckChangesHasContent", "name" : "CheckChangesHasContent", "version" : "0.011" }, { "class" : "Dist::Zilla::Plugin::TestRelease", "name" : "TestRelease", "version" : "6.037" }, { "class" : "Dist::Zilla::Plugin::UploadToCPAN", "name" : "UploadToCPAN", "version" : "6.037" }, { "class" : "Dist::Zilla::Plugin::ReadmeAnyFromPod", "config" : { "Dist::Zilla::Role::FileWatcher" : { "version" : "0.006" } }, "name" : "Markdown_Readme", "version" : "0.163250" }, { "class" : "Dist::Zilla::Plugin::CopyFilesFromRelease", "config" : { "Dist::Zilla::Plugin::CopyFilesFromRelease" : { "filename" : [ "LICENSE", "META.json" ], "match" : [] } }, "name" : "CopyFilesFromRelease", "version" : "0.007" }, { "class" : "Dist::Zilla::Plugin::Prereqs", "config" : { "Dist::Zilla::Plugin::Prereqs" : { "phase" : "develop", "type" : "recommends" } }, "name" : "@Git::VersionManager/pluginbundle version", "version" : "6.037" }, { "class" : "Dist::Zilla::Plugin::RewriteVersion::Transitional", "config" : { "Dist::Zilla::Plugin::RewriteVersion" : { "add_tarball_name" : 0, "finders" : [ ":ExecFiles", ":InstallModules" ], "global" : 0, "skip_version_provider" : 0 }, "Dist::Zilla::Plugin::RewriteVersion::Transitional" : {} }, "name" : "@Git::VersionManager/RewriteVersion::Transitional", "version" : "0.009" }, { "class" : "Dist::Zilla::Plugin::MetaProvides::Update", "name" : "@Git::VersionManager/MetaProvides::Update", "version" : "0.007" }, { "class" : "Dist::Zilla::Plugin::CopyFilesFromRelease", "config" : { 
"Dist::Zilla::Plugin::CopyFilesFromRelease" : { "filename" : [ "Changes" ], "match" : [] } }, "name" : "@Git::VersionManager/CopyFilesFromRelease", "version" : "0.007" }, { "class" : "Dist::Zilla::Plugin::Git::Commit", "config" : { "Dist::Zilla::Plugin::Git::Commit" : { "add_files_in" : [], "commit_msg" : "v%V%n%n%c", "signoff" : 0 }, "Dist::Zilla::Role::Git::DirtyFiles" : { "allow_dirty" : [ "Changes", "LICENSE", "META.json", "README.md" ], "allow_dirty_match" : [], "changelog" : "Changes" }, "Dist::Zilla::Role::Git::Repo" : { "git_version" : "2.43.0", "repo_root" : "." }, "Dist::Zilla::Role::Git::StringFormatter" : { "time_zone" : "local" } }, "name" : "@Git::VersionManager/release snapshot", "version" : "2.052" }, { "class" : "Dist::Zilla::Plugin::Git::Tag", "config" : { "Dist::Zilla::Plugin::Git::Tag" : { "branch" : null, "changelog" : "Changes", "signed" : 0, "tag" : "v6.03", "tag_format" : "v%V", "tag_message" : "v%V" }, "Dist::Zilla::Role::Git::Repo" : { "git_version" : "2.43.0", "repo_root" : "." 
}, "Dist::Zilla::Role::Git::StringFormatter" : { "time_zone" : "local" } }, "name" : "@Git::VersionManager/Git::Tag", "version" : "2.052" }, { "class" : "Dist::Zilla::Plugin::BumpVersionAfterRelease::Transitional", "config" : { "Dist::Zilla::Plugin::BumpVersionAfterRelease" : { "finders" : [ ":ExecFiles", ":InstallModules" ], "global" : 0, "munge_makefile_pl" : 1 }, "Dist::Zilla::Plugin::BumpVersionAfterRelease::Transitional" : {} }, "name" : "@Git::VersionManager/BumpVersionAfterRelease::Transitional", "version" : "0.009" }, { "class" : "Dist::Zilla::Plugin::NextRelease", "name" : "@Git::VersionManager/NextRelease", "version" : "6.037" }, { "class" : "Dist::Zilla::Plugin::Git::Commit", "config" : { "Dist::Zilla::Plugin::Git::Commit" : { "add_files_in" : [], "commit_msg" : "increment $VERSION after %v release", "signoff" : 0 }, "Dist::Zilla::Role::Git::DirtyFiles" : { "allow_dirty" : [ "Build.PL", "Changes", "Makefile.PL" ], "allow_dirty_match" : [ "(?^:^lib/.*\\.pm$)" ], "changelog" : "Changes" }, "Dist::Zilla::Role::Git::Repo" : { "git_version" : "2.43.0", "repo_root" : "." }, "Dist::Zilla::Role::Git::StringFormatter" : { "time_zone" : "local" } }, "name" : "@Git::VersionManager/post-release commit", "version" : "2.052" }, { "class" : "Dist::Zilla::Plugin::Git::Push", "config" : { "Dist::Zilla::Plugin::Git::Push" : { "push_to" : [ "origin" ], "remotes_must_exist" : 1 }, "Dist::Zilla::Role::Git::Repo" : { "git_version" : "2.43.0", "repo_root" : "." 
} }, "name" : "Git::Push", "version" : "2.052" }, { "class" : "Dist::Zilla::Plugin::ConfirmRelease", "name" : "ConfirmRelease", "version" : "6.037" }, { "class" : "Dist::Zilla::Plugin::FinderCode", "name" : ":InstallModules", "version" : "6.037" }, { "class" : "Dist::Zilla::Plugin::FinderCode", "name" : ":IncModules", "version" : "6.037" }, { "class" : "Dist::Zilla::Plugin::FinderCode", "name" : ":TestFiles", "version" : "6.037" }, { "class" : "Dist::Zilla::Plugin::FinderCode", "name" : ":ExtraTestFiles", "version" : "6.037" }, { "class" : "Dist::Zilla::Plugin::FinderCode", "name" : ":ExecFiles", "version" : "6.037" }, { "class" : "Dist::Zilla::Plugin::FinderCode", "name" : ":PerlExecFiles", "version" : "6.037" }, { "class" : "Dist::Zilla::Plugin::FinderCode", "name" : ":ShareFiles", "version" : "6.037" }, { "class" : "Dist::Zilla::Plugin::FinderCode", "name" : ":MainModule", "version" : "6.037" }, { "class" : "Dist::Zilla::Plugin::FinderCode", "name" : ":AllFiles", "version" : "6.037" }, { "class" : "Dist::Zilla::Plugin::FinderCode", "name" : ":NoFiles", "version" : "6.037" }, { "class" : "Dist::Zilla::Plugin::FinderCode", "name" : "MetaProvides::Package/AUTOVIV/:InstallModulesPM", "version" : "6.037" } ], "zilla" : { "class" : "Dist::Zilla::Dist::Builder", "config" : { "is_trial" : 0 }, "version" : "6.037" } }, "x_contributors" : [ "Adam Kennedy ", "Adam Sjogren ", "Alexey Tourbin ", "Alex Kapranoff ", "amire80 ", "Andreas J. Koenig ", "Anton Yuzhaninov ", "Bill Mann ", "Bron Gondwana ", "Daniel Hedlund ", "David E. Wheeler ", "DAVIDRW ", "David Steinbrunner ", "dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>", "Father Chrysostomos ", "FWILES ", "Gavin Peters ", "Graeme Thompson ", "Graham Knop ", "Hans-H. 
Froehlich ", "Ian Kilgore ", "Jacob J ", "jefflee ", "john9art ", "Mark Stosberg ", "Mike Schilli ", "mschilli ", "murphy ", "Olaf Alders ", "Ondrej Hanak ", "Peter Rabbitson ", "phrstbrn ", "Robert Stone ", "Rolf Grossmann ", "ruff ", "sasao ", "Sean M. Burke ", "Slaven Rezic ", "Spiros Denaxas ", "Steve Hay ", "Todd Lipcon ", "Tom Hukins ", "Tony Finch ", "Toru Yamaguchi ", "uid39246 ", "Ville Skytt\u00e4 ", "Yuri Karaban ", "Zefram " ], "x_generated_by_perl" : "v5.42.2", "x_serialization_backend" : "Cpanel::JSON::XS version 4.40", "x_spdx_expression" : "Artistic-1.0-Perl OR GPL-1.0-or-later" } Makefile.PL100644001750001751 253415204207641 15312 0ustar00olafolaf000000000000WWW-RobotRules-6.03# This file was automatically generated by Dist::Zilla::Plugin::MakeMaker v6.037 use strict; use warnings; use 5.008001; use ExtUtils::MakeMaker; my %WriteMakefileArgs = ( "ABSTRACT" => "database of robots.txt-derived permissions", "AUTHOR" => "Gisle Aas ", "CONFIGURE_REQUIRES" => { "ExtUtils::MakeMaker" => 0 }, "DISTNAME" => "WWW-RobotRules", "LICENSE" => "perl", "MIN_PERL_VERSION" => "5.008001", "NAME" => "WWW::RobotRules", "PREREQ_PM" => { "AnyDBM_File" => 0, "Carp" => 0, "Fcntl" => 0, "URI" => "1.10", "strict" => 0 }, "TEST_REQUIRES" => { "ExtUtils::MakeMaker" => 0, "File::Spec" => 0, "Test::More" => "0.96", "strict" => 0, "warnings" => 0 }, "VERSION" => "6.03", "test" => { "TESTS" => "t/*.t" } ); my %FallbackPrereqs = ( "AnyDBM_File" => 0, "Carp" => 0, "ExtUtils::MakeMaker" => 0, "Fcntl" => 0, "File::Spec" => 0, "Test::More" => "0.96", "URI" => "1.10", "strict" => 0, "warnings" => 0 ); unless ( eval { ExtUtils::MakeMaker->VERSION(6.63_03) } ) { delete $WriteMakefileArgs{TEST_REQUIRES}; delete $WriteMakefileArgs{BUILD_REQUIRES}; $WriteMakefileArgs{PREREQ_PM} = \%FallbackPrereqs; } delete $WriteMakefileArgs{CONFIGURE_REQUIRES} unless eval { ExtUtils::MakeMaker->VERSION(6.52) }; WriteMakefile(%WriteMakefileArgs); rules-dbm.t100644001750001751 553615204207641 15667 
0ustar00olafolaf000000000000WWW-RobotRules-6.03/tuse strict; use warnings; use Test::More; use File::Temp qw( tempdir ); use WWW::RobotRules::AnyDBM_File (); my $dir = tempdir(CLEANUP => 1); my $file = "$dir/robotdb"; my $r = WWW::RobotRules::AnyDBM_File->new("myrobot/2.0", $file); # Cache backing file(s) must have no group/world permission bits. if ($^O ne 'MSWin32') { my @backing = glob "$file*"; ok scalar @backing, "DBM backing file(s) exist after construction"; for my $f (@backing) { my $mode = (stat $f)[2] & 07777; is($mode & 0077, 0, "$f mode " . sprintf("%04o", $mode) . " has no group/world bits"); } } $r->parse("http://www.aas.no/robots.txt", ""); $r->visit("www.aas.no:80"); is $r->no_visits("www.aas.no:80"), 1; $r->push_rules("www.sn.no:80", "/aas", "/per"); $r->push_rules("www.sn.no:80", "/god", "/old"); my @r = $r->rules("www.sn.no:80"); is "@r", "/aas /per /god /old"; $r->clear_rules("per"); $r->clear_rules("www.sn.no:80"); @r = $r->rules("www.sn.no:80"); is "@r", ""; $r->visit("www.aas.no:80", time + 10); $r->visit("www.sn.no:80"); note "No visits: " . $r->no_visits("www.aas.no:80"); note "Last visit: " . $r->last_visit("www.aas.no:80"); note "Fresh until: " . 
$r->fresh_until("www.aas.no:80"); is $r->no_visits("www.aas.no:80"), 2; cmp_ok abs($r->last_visit("www.sn.no:80") - time), '<=', 2; $r = undef; # Try to reopen the database without a name specified $r = WWW::RobotRules::AnyDBM_File->new(undef, $file); $r->visit("www.aas.no:80"); is $r->no_visits("www.aas.no:80"), 3; note "Agent-Name: ", $r->agent; is $r->agent, 'myrobot'; $r = undef; note "*** Dump of database ***"; tie(my %cat, 'AnyDBM_File', $file, 0, 0644) or die "Can't tie: $!"; while (my ($key, $val) = each(%cat)) { note "$key\t$val"; } note "******"; untie %cat; # Try to open database with a different agent name $r = WWW::RobotRules::AnyDBM_File->new("MOMSpider/2.0", $file); is $r->no_visits("www.sn.no:80"), 0; # Try parsing $r->parse("http://www.sn.no:8080/robots.txt", <rules("www.sn.no:8080"); is "@r", "/foo /bar"; cmp_ok $r->allowed("http://www.sn.no"), '<', 0; ok !$r->allowed("http://www.sn.no:8080/foo/gisle"); sleep(2); # wait until file has expired cmp_ok $r->allowed("http://www.sn.no:8080/foo/gisle"), '<', 0; $r = undef; note "*** Dump of database ***"; tie(%cat, 'AnyDBM_File', $file, 0, 0644) or die "Can't tie: $!"; while (my ($key, $val) = each(%cat)) { note "$key\t$val"; } note "******"; untie %cat; # Otherwise the next line fails on DOSish while (unlink("$file", "$file.pag", "$file.dir", "$file.db")) { } # Try to open an empty database without specifying a name eval { $r = WWW::RobotRules::AnyDBM_File->new(undef, $file); }; isnt $@, ""; unlink "$file", "$file.pag", "$file.dir", "$file.db"; done_testing; misc000755001750001751 015204207641 14372 5ustar00olafolaf000000000000WWW-RobotRules-6.03/tdbmrobot100755001750001751 114515204207641 16271 0ustar00olafolaf000000000000WWW-RobotRules-6.03/t/misc#!/local/perl/bin/perl -w use strict; use warnings; use URI::URL qw( url ); my $url = url(shift) || die "Usage: $0 \n"; use WWW::RobotRules::AnyDBM_File (); use LWP::RobotUA (); my $botname = "Spider/0.1"; my $rules = WWW::RobotRules::AnyDBM_File->new($botname, 
'robotdb'); my $ua = LWP::RobotUA->new($botname, 'gisle@aas.no', $rules); $ua->delay(0.1); my $req = HTTP::Request->new(GET => $url); my $res = $ua->request($req); print "Got ", $res->code, " ", $res->message, "(", $res->content_type, ")\n"; my $netloc = $url->netloc; print "This was visit no ", $ua->no_visits($netloc), " to $netloc\n"; author000755001750001751 015204207641 15131 5ustar00olafolaf000000000000WWW-RobotRules-6.03/xteol.t100644001750001751 70415204207641 16216 0ustar00olafolaf000000000000WWW-RobotRules-6.03/xt/authoruse strict; use warnings; # this test was generated with Dist::Zilla::Plugin::Test::EOL 0.19 use Test::More 0.88; use Test::EOL; my @files = ( 'lib/WWW/RobotRules.pm', 'lib/WWW/RobotRules/AnyDBM_File.pm', 'lib/WWW/RobotRules/DB_File.pm', 't/00-report-prereqs.dd', 't/00-report-prereqs.t', 't/misc/dbmrobot', 't/rules-dbm.t', 't/rules.t' ); eol_unix_ok($_, { trailing_whitespace => 1 }) foreach @files; done_testing; mojibake.t100644001750001751 15115204207641 17214 0ustar00olafolaf000000000000WWW-RobotRules-6.03/xt/author#!perl use strict; use warnings qw(all); use Test::More; use Test::Mojibake; all_files_encoding_ok(); WWW000755001750001751 015204207641 14426 5ustar00olafolaf000000000000WWW-RobotRules-6.03/libRobotRules.pm100644001750001751 2730615204207641 17254 0ustar00olafolaf000000000000WWW-RobotRules-6.03/lib/WWWpackage WWW::RobotRules; use strict; our $VERSION = '6.03'; sub Version { $VERSION; } use URI (); sub new { my ($class, $ua) = @_; # This ugly hack is needed to ensure backwards compatibility. # The "WWW::RobotRules" class is now really abstract. $class = "WWW::RobotRules::InCore" if $class eq "WWW::RobotRules"; my $self = bless {}, $class; $self->agent($ua); $self; } sub parse { my ($self, $robot_txt_uri, $txt, $fresh_until) = @_; $robot_txt_uri = URI->new("$robot_txt_uri"); my $netloc = $robot_txt_uri->host . ":" . 
$robot_txt_uri->port; $self->clear_rules($netloc); $self->fresh_until($netloc, $fresh_until || (time + 365 * 24 * 3600)); my $ua; my $is_me = 0; # 1 iff this record is for me my $is_anon = 0; # 1 iff this record is for * my $seen_disallow = 0; # watch for missing record separators my @me_disallowed = (); # rules disallowed for me my @anon_disallowed = (); # rules disallowed for * # blank lines are significant, so turn CRLF into LF to avoid generating # false ones $txt =~ s/\015\012/\012/g; # split at \012 (LF) or \015 (CR) (Mac text files have just CR for EOL) for (split(/[\012\015]/, $txt)) { # Lines containing only a comment are discarded completely, and # therefore do not indicate a record boundary. next if /^\s*\#/; s/\s*\#.*//; # remove comments at end-of-line if (/^\s*$/) { # blank line last if $is_me; # That was our record. No need to read the rest. $is_anon = 0; $seen_disallow = 0; } elsif (/^\s*User-Agent\s*:\s*(.*)/i) { $ua = $1; $ua =~ s/\s+$//; if ($seen_disallow) { # treat as start of a new record $seen_disallow = 0; last if $is_me; # That was our record. No need to read the rest. $is_anon = 0; } if ($is_me) { # This record already had a User-agent that # we matched, so just continue. 
} elsif ($ua eq '*') { $is_anon = 1; } elsif ($self->is_me($ua)) { $is_me = 1; } } elsif (/^\s*Disallow\s*:\s*(.*)/i) { unless (defined $ua) { warn "RobotRules <$robot_txt_uri>: Disallow without preceding User-agent\n" if $^W; $is_anon = 1; # assume that User-agent: * was intended } my $disallow = $1; $disallow =~ s/\s+$//; $seen_disallow = 1; if (length $disallow) { my $ignore; eval { my $u = URI->new_abs($disallow, $robot_txt_uri); $ignore++ if $u->scheme ne $robot_txt_uri->scheme; $ignore++ if lc($u->host) ne lc($robot_txt_uri->host); $ignore++ if $u->port ne $robot_txt_uri->port; $disallow = $u->path_query; $disallow = "/" unless length $disallow; }; next if $@; next if $ignore; } if ($is_me) { push(@me_disallowed, $disallow); } elsif ($is_anon) { push(@anon_disallowed, $disallow); } } elsif (/\S\s*:/) { # ignore } else { warn "RobotRules <$robot_txt_uri>: Malformed record: <$_>\n" if $^W; } } if ($is_me) { $self->push_rules($netloc, @me_disallowed); } else { $self->push_rules($netloc, @anon_disallowed); } } # # Returns TRUE if the given name matches the # name of this robot # sub is_me { my ($self, $ua_line) = @_; my $me = $self->agent; # See whether my short-name is a substring of the # "User-Agent: ..." line that we were passed: if (index(lc($me), lc($ua_line)) >= 0) { return 1; } else { return ''; } } sub allowed { my ($self, $uri) = @_; $uri = URI->new("$uri"); return 1 unless $uri->scheme eq 'http' or $uri->scheme eq 'https'; # Robots.txt applies to only those schemes. my $netloc = $uri->host . ":" . $uri->port; my $fresh_until = $self->fresh_until($netloc); return -1 if !defined($fresh_until) || $fresh_until < time; my $str = $uri->path_query; my $rule; for $rule ($self->rules($netloc)) { return 1 unless length $rule; return 0 if index($str, $rule) == 0; } return 1; } # The following methods must be provided by the subclass. 
sub agent; sub visit; sub no_visits; sub last_visit; sub fresh_until; sub push_rules; sub clear_rules; sub rules; sub dump; package WWW::RobotRules::InCore; our @ISA = qw(WWW::RobotRules); sub agent { my ($self, $name) = @_; my $old = $self->{'ua'}; if ($name) { # Strip it so that it's just the short name. # I.e., "FooBot" => "FooBot" # "FooBot/1.2" => "FooBot" # "FooBot/1.2 [http://foobot.int; foo@bot.int]" => "FooBot" $name = $1 if $name =~ m/(\S+)/; # get first word $name =~ s!/.*!!; # get rid of version unless ($old && $old eq $name) { delete $self->{'loc'}; # all old info is now stale $self->{'ua'} = $name; } } $old; } sub visit { my ($self, $netloc, $time) = @_; return unless $netloc; $time ||= time; $self->{'loc'}{$netloc}{'last'} = $time; my $count = \$self->{'loc'}{$netloc}{'count'}; if (!defined $$count) { $$count = 1; } else { $$count++; } } sub no_visits { my ($self, $netloc) = @_; $self->{'loc'}{$netloc}{'count'}; } sub last_visit { my ($self, $netloc) = @_; $self->{'loc'}{$netloc}{'last'}; } sub fresh_until { my ($self, $netloc, $fresh_until) = @_; my $old = $self->{'loc'}{$netloc}{'fresh'}; if (defined $fresh_until) { $self->{'loc'}{$netloc}{'fresh'} = $fresh_until; } $old; } sub push_rules { my ($self, $netloc, @rules) = @_; push(@{$self->{'loc'}{$netloc}{'rules'}}, @rules); } sub clear_rules { my ($self, $netloc) = @_; delete $self->{'loc'}{$netloc}{'rules'}; } sub rules { my ($self, $netloc) = @_; if (defined $self->{'loc'}{$netloc}{'rules'}) { return @{$self->{'loc'}{$netloc}{'rules'}}; } else { return (); } } sub dump { my $self = shift; for (keys %$self) { next if $_ eq 'loc'; print "$_ = $self->{$_}\n"; } for (keys %{$self->{'loc'}}) { my @rules = $self->rules($_); print "$_: ", join("; ", @rules), "\n"; } } 1; __END__ # Bender: "Well, I don't have anything else # planned for today. Let's get drunk!" 
=head1 NAME

WWW::RobotRules - database of robots.txt-derived permissions

=head1 SYNOPSIS

 use WWW::RobotRules;
 my $rules = WWW::RobotRules->new('MOMspider/1.0');

 use LWP::Simple qw(get);

 {
   my $url = "http://some.place/robots.txt";
   my $robots_txt = get $url;
   $rules->parse($url, $robots_txt) if defined $robots_txt;
 }

 {
   my $url = "http://some.other.place/robots.txt";
   my $robots_txt = get $url;
   $rules->parse($url, $robots_txt) if defined $robots_txt;
 }

 # Now we can check if a URL is valid for those servers
 # whose "robots.txt" files we've gotten and parsed:
 if($rules->allowed($url)) {
     $c = get $url;
     ...
 }

=head1 DESCRIPTION

This module parses F</robots.txt> files as specified in "A Standard for Robot Exclusion". Webmasters can use the F</robots.txt> file to forbid conforming robots from accessing parts of their web site.

The parsed files are kept in a C<WWW::RobotRules> object, and this object provides methods to check if access to a given URL is prohibited. The same C<WWW::RobotRules> object can be used for one or more parsed F</robots.txt> files on any number of hosts.

The following methods are provided:

=over 4

=item $rules = WWW::RobotRules->new($robot_name)

This is the constructor for WWW::RobotRules objects. The first argument given to new() is the name of the robot.

=item $rules->parse($robot_txt_url, $content, $fresh_until)

The parse() method takes as arguments the URL that was used to retrieve the F</robots.txt> file, and the contents of the file.

=item $rules->allowed($uri)

Returns TRUE if this robot is allowed to retrieve this URL.

=item $rules->agent([$name])

Get/set the agent name. NOTE: Changing the agent name will clear the F<robots.txt> rules and expire times out of the cache.

=back

=head1 ROBOTS.TXT

The format and semantics of the "/robots.txt" file are as follows:

The file consists of one or more records separated by one or more blank lines. Each record contains lines of the form

  <field-name>: <value>

The field name is case insensitive. Text after the '#' character on a line is ignored during parsing. This is used for comments.
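The line format described above can be illustrated with a minimal sketch. The helper C<parse_field_line> is hypothetical and is not part of WWW::RobotRules; it only shows comment stripping and case-insensitive field names, not record handling.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's parser): split one robots.txt line
# into a lowercased field name and its value.
sub parse_field_line {
    my ($line) = @_;
    $line =~ s/#.*//;                  # text after '#' is a comment
    $line =~ s/^\s+//;                 # trim leading whitespace
    $line =~ s/\s+$//;                 # trim trailing whitespace
    return unless length $line;        # blank or comment-only line
    return unless $line =~ /^([^:\s]+)\s*:\s*(.*)$/;
    return (lc $1, $2);                # field names are case insensitive
}

for my $line ('User-Agent: *', 'DISALLOW: /tmp/   # scratch space', '# just a comment') {
    my ($field, $value) = parse_field_line($line);
    print defined $field ? "$field => '$value'\n" : "(ignored)\n";
}
```

An empty value, as in a bare C<Disallow:> line, comes back as the empty string rather than being dropped, which matters because an empty Disallow means "nothing is disallowed".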
The following <field-names> can be used:

=over 3

=item User-Agent

The value of this field is the name of the robot the record is describing access policy for. If more than one I<User-Agent> field is present the record describes an identical access policy for more than one robot. At least one field needs to be present per record. If the value is '*', the record describes the default access policy for any robot that has not matched any of the other records.

The I<User-Agent> fields must occur before the I<Disallow> fields. If a record contains a I<User-Agent> field after a I<Disallow> field, that constitutes a malformed record. This parser will assume that a blank line should have been placed before that I<User-Agent> field, and will break the record into two. All the fields before the I<User-Agent> field will constitute a record, and the I<User-Agent> field will be the first field in a new record.

=item Disallow

The value of this field specifies a partial URL that is not to be visited. This can be a full path, or a partial path; any URL that starts with this value will not be retrieved.

=back

Unrecognized records are ignored.

=head1 ROBOTS.TXT EXAMPLES

The following example "/robots.txt" file specifies that no robots should visit any URL starting with "/cyberworld/map/" or "/tmp/":

  User-agent: *
  Disallow: /cyberworld/map/ # This is an infinite virtual URL space
  Disallow: /tmp/ # these will soon disappear

This example "/robots.txt" file specifies that no robots should visit any URL starting with "/cyberworld/map/", except the robot called "cybermapper":

  User-agent: *
  Disallow: /cyberworld/map/ # This is an infinite virtual URL space

  # Cybermapper knows where to go.
  User-agent: cybermapper
  Disallow:

This example indicates that no robots should visit this site further:

  # go away
  User-agent: *
  Disallow: /

This is an example of a malformed robots.txt file.

  # robots.txt for ancientcastle.example.com
  # I've locked myself away.
  User-agent: *
  Disallow: /
  # The castle is your home now, so you can go anywhere you like.
  User-agent: Belle
  Disallow: /west-wing/ # except the west wing!
  # It's good to be the Prince...
  User-agent: Beast
  Disallow:

This file is missing the required blank lines between records. However, the intention is clear.

=head1 SEE ALSO

L<LWP::RobotUA>, L<WWW::RobotRules::AnyDBM_File>

=head1 COPYRIGHT

  Copyright 1995-2009, Gisle Aas
  Copyright 1995, Martijn Koster

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

00-report-prereqs.t100644001750001751 1360115204207641 17214 0ustar00olafolaf000000000000WWW-RobotRules-6.03/t
#!perl

use strict;
use warnings;

# This test was generated by Dist::Zilla::Plugin::Test::ReportPrereqs 0.029

use Test::More tests => 1;

use ExtUtils::MakeMaker;
use File::Spec;

# from $version::LAX
my $lax_version_re =
    qr/(?: undef | (?: (?:[0-9]+) (?: \. | (?:\.[0-9]+) (?:_[0-9]+)? )? | (?:\.[0-9]+) (?:_[0-9]+)? ) | (?: v (?:[0-9]+) (?: (?:\.[0-9]+)+ (?:_[0-9]+)? )? | (?:[0-9]+)? (?:\.[0-9]+){2,} (?:_[0-9]+)? ) )/x;

# hide optional CPAN::Meta modules from prereq scanner
# and check if they are available
my $cpan_meta = "CPAN::Meta";
my $cpan_meta_pre = "CPAN::Meta::Prereqs";
my $HAS_CPAN_META = eval "require $cpan_meta; $cpan_meta->VERSION('2.120900')" && eval "require $cpan_meta_pre"; ## no critic

# Verify requirements?
my $DO_VERIFY_PREREQS = 1;

sub _max {
    my $max = shift;
    $max = ( $_ > $max ) ?
$_ : $max for @_; return $max; } sub _merge_prereqs { my ($collector, $prereqs) = @_; # CPAN::Meta::Prereqs object if (ref $collector eq $cpan_meta_pre) { return $collector->with_merged_prereqs( CPAN::Meta::Prereqs->new( $prereqs ) ); } # Raw hashrefs for my $phase ( keys %$prereqs ) { for my $type ( keys %{ $prereqs->{$phase} } ) { for my $module ( keys %{ $prereqs->{$phase}{$type} } ) { $collector->{$phase}{$type}{$module} = $prereqs->{$phase}{$type}{$module}; } } } return $collector; } my @include = qw( ); my @exclude = qw( ); # Add static prereqs to the included modules list my $static_prereqs = do './t/00-report-prereqs.dd'; # Merge all prereqs (either with ::Prereqs or a hashref) my $full_prereqs = _merge_prereqs( ( $HAS_CPAN_META ? $cpan_meta_pre->new : {} ), $static_prereqs ); # Add dynamic prereqs to the included modules list (if we can) my ($source) = grep { -f } 'MYMETA.json', 'MYMETA.yml'; my $cpan_meta_error; if ( $source && $HAS_CPAN_META && (my $meta = eval { CPAN::Meta->load_file($source) } ) ) { $full_prereqs = _merge_prereqs($full_prereqs, $meta->prereqs); } else { $cpan_meta_error = $@; # capture error from CPAN::Meta->load_file($source) $source = 'static metadata'; } my @full_reports; my @dep_errors; my $req_hash = $HAS_CPAN_META ? $full_prereqs->as_string_hash : $full_prereqs; # Add static includes into a fake section for my $mod (@include) { $req_hash->{other}{modules}{$mod} = 0; } for my $phase ( qw(configure build test runtime develop other) ) { next unless $req_hash->{$phase}; next if ($phase eq 'develop' and not $ENV{AUTHOR_TESTING}); for my $type ( qw(requires recommends suggests conflicts modules) ) { next unless $req_hash->{$phase}{$type}; my $title = ucfirst($phase).' 
'.ucfirst($type); my @reports = [qw/Module Want Have/]; for my $mod ( sort keys %{ $req_hash->{$phase}{$type} } ) { next if grep { $_ eq $mod } @exclude; my $want = $req_hash->{$phase}{$type}{$mod}; $want = "undef" unless defined $want; $want = "any" if !$want && $want == 0; if ($mod eq 'perl') { push @reports, ['perl', $want, $]]; next; } my $req_string = $want eq 'any' ? 'any version required' : "version '$want' required"; my $file = $mod; $file =~ s{::}{/}g; $file .= ".pm"; my ($prefix) = grep { -e File::Spec->catfile($_, $file) } @INC; if ($prefix) { my $have = MM->parse_version( File::Spec->catfile($prefix, $file) ); $have = "undef" unless defined $have; push @reports, [$mod, $want, $have]; if ( $DO_VERIFY_PREREQS && $HAS_CPAN_META && $type eq 'requires' ) { if ( $have !~ /\A$lax_version_re\z/ ) { push @dep_errors, "$mod version '$have' cannot be parsed ($req_string)"; } elsif ( ! $full_prereqs->requirements_for( $phase, $type )->accepts_module( $mod => $have ) ) { push @dep_errors, "$mod version '$have' is not in required range '$want'"; } } } else { push @reports, [$mod, $want, "missing"]; if ( $DO_VERIFY_PREREQS && $type eq 'requires' ) { push @dep_errors, "$mod is not installed ($req_string)"; } } } if ( @reports ) { push @full_reports, "=== $title ===\n\n"; my $ml = _max( map { length $_->[0] } @reports ); my $wl = _max( map { length $_->[1] } @reports ); my $hl = _max( map { length $_->[2] } @reports ); if ($type eq 'modules') { splice @reports, 1, 0, ["-" x $ml, "", "-" x $hl]; push @full_reports, map { sprintf(" %*s %*s\n", -$ml, $_->[0], $hl, $_->[2]) } @reports; } else { splice @reports, 1, 0, ["-" x $ml, "-" x $wl, "-" x $hl]; push @full_reports, map { sprintf(" %*s %*s %*s\n", -$ml, $_->[0], $wl, $_->[1], $hl, $_->[2]) } @reports; } push @full_reports, "\n"; } } } if ( @full_reports ) { diag "\nVersions for all modules listed in $source (including optional ones):\n\n", @full_reports; } if ( $cpan_meta_error || @dep_errors ) { diag "\n*** WARNING 
WARNING WARNING WARNING WARNING WARNING WARNING WARNING ***\n"; } if ( $cpan_meta_error ) { my ($orig_source) = grep { -f } 'MYMETA.json', 'MYMETA.yml'; diag "\nCPAN::Meta->load_file('$orig_source') failed with: $cpan_meta_error\n"; } if ( @dep_errors ) { diag join("\n", "\nThe following REQUIRED prerequisites were not satisfied:\n", @dep_errors, "\n" ); } pass('Reported prereqs'); # vim: ts=4 sts=4 sw=4 et: pod-spell.t100644001750001751 223115204207641 17353 0ustar00olafolaf000000000000WWW-RobotRules-6.03/xt/authoruse strict; use warnings; use Test::More; # generated by Dist::Zilla::Plugin::Test::PodSpelling 2.007006 use Test::Spelling 0.17; use Pod::Wordlist; add_stopwords(); all_pod_files_spelling_ok( qw( bin lib ) ); __DATA__ 49699333 Aas Adam Alders Alex Alexey Andreas Anton AnyDBM AnyDBM_File Ardo Bill Bron Burke Chrysostomos DAVIDRW DBM DB_File Daniel David Denaxas FWILES Father Finch Froehlich Gavin Gisle Gondwana Graeme Graham Grossmann Hakan Hanak Hans Hay Hedlund Hukins Ian Jacob Kapranoff Karaban Kennedy Kilgore Knop Koenig Koster Lipcon MARKSTOS Mann Mark Martijn Mike Olaf Ondrej Peter Peters Rabbitson Rezic Robert RobotRules RobotUA Rolf Schilli Sean Sjogren Skyttä Slaven Spiros Steinbrunner Steve SteveHay Stone Stosberg Thompson Todd Tom Tony Toru Tourbin Ville WWW Wheeler Yamaguchi Yuri Yuzhaninov Zefram adamk amir amire80 andreas asjo at brong citrin cybermapper david davidrw denaxas dependabot diskcaching dot dsteinbrunner gisle github gpeters haarg hfroehlich iank jefflee john9art ka lib mschilli murphy olaf ondrej phrstbrn rg ribasushi ruff sasao sburke shaohua sprout srezic talby tech todd tom txt uid39246 ville waif wfmann zefram zigorou 00-report-prereqs.dd100644001750001751 520715204207641 17323 0ustar00olafolaf000000000000WWW-RobotRules-6.03/tdo { my $x = { 'configure' => { 'requires' => { 'ExtUtils::MakeMaker' => '0' }, 'suggests' => { 'JSON::PP' => '2.27300' } }, 'develop' => { 'recommends' => { 
'Dist::Zilla::PluginBundle::Git::VersionManager' => '0.007' }, 'requires' => { 'File::Spec' => '0', 'IO::Handle' => '0', 'IPC::Open3' => '0', 'Pod::Coverage::TrustPod' => '0', 'Pod::Spell' => '1.25', 'Test::EOL' => '2.00', 'Test::MinimumVersion' => '0', 'Test::Mojibake' => '0', 'Test::More' => '0.94', 'Test::Pod' => '1.41', 'Test::Pod::Coverage' => '1.08', 'Test::Portability::Files' => '0', 'Test::Spelling' => '0.17', 'Test::Version' => '1' } }, 'runtime' => { 'requires' => { 'AnyDBM_File' => '0', 'Carp' => '0', 'Fcntl' => '0', 'URI' => '1.10', 'perl' => '5.008001', 'strict' => '0' }, 'suggests' => { 'DB_File' => '0' } }, 'test' => { 'recommends' => { 'CPAN::Meta' => '2.120900' }, 'requires' => { 'ExtUtils::MakeMaker' => '0', 'File::Spec' => '0', 'Test::More' => '0.96', 'strict' => '0', 'warnings' => '0' } } }; $x; }00-compile.t100644001750001751 257115204207641 17330 0ustar00olafolaf000000000000WWW-RobotRules-6.03/xt/authoruse strict; use warnings; # this test was generated with Dist::Zilla::Plugin::Test::Compile 2.059 use Test::More 0.94; plan tests => 3; my @module_files = ( 'WWW/RobotRules.pm', 'WWW/RobotRules/AnyDBM_File.pm' ); # no fake home requested my @switches = ( -d 'blib' ? 
'-Mblib' : '-Ilib', ); use File::Spec; use IPC::Open3; use IO::Handle; open my $stdin, '<', File::Spec->devnull or die "can't open devnull: $!"; my @warnings; for my $lib (@module_files) { # see L my $stderr = IO::Handle->new; diag('Running: ', join(', ', map { my $str = $_; $str =~ s/'/\\'/g; q{'}.$str.q{'} } $^X, @switches, '-e', "require q[$lib]")) if $ENV{PERL_COMPILE_TEST_DEBUG}; my $pid = open3($stdin, '>&STDERR', $stderr, $^X, @switches, '-e', "require q[$lib]"); binmode $stderr, ':crlf' if $^O eq 'MSWin32'; my @_warnings = <$stderr>; waitpid($pid, 0); is($?, 0, "$lib loaded ok"); shift @_warnings if @_warnings and $_warnings[0] =~ /^Using .*\bblib/ and not eval { +require blib; blib->VERSION('1.01') }; if (@_warnings) { warn @_warnings; push @warnings, @_warnings; } } is(scalar(@warnings), 0, 'no warnings found') or diag 'got warnings: ', explain(\@warnings); BAIL_OUT("Compilation problems") if !Test::More->builder->is_passing; pod-syntax.t100644001750001751 25115204207641 17542 0ustar00olafolaf000000000000WWW-RobotRules-6.03/xt/author#!perl # This file was automatically generated by Dist::Zilla::Plugin::PodSyntaxTests use strict; use warnings; use Test::More; use Test::Pod 1.41; all_pod_files_ok(); portability.t100644001750001751 13015204207641 17772 0ustar00olafolaf000000000000WWW-RobotRules-6.03/xt/authoruse strict; use warnings; use Test::More; use Test::Portability::Files; run_tests(); test-version.t100644001750001751 63715204207641 20106 0ustar00olafolaf000000000000WWW-RobotRules-6.03/xt/authoruse strict; use warnings; use Test::More; # generated by Dist::Zilla::Plugin::Test::Version 1.09 use Test::Version; my @imports = qw( version_all_ok ); my $params = { is_strict => 0, has_version => 1, multiple => 0, }; push @imports, $params if version->parse( $Test::Version::VERSION ) >= version->parse('1.002'); Test::Version->import(@imports); version_all_ok; done_testing; pod-coverage.t100644001750001751 233615204207641 20035 
0ustar00olafolaf000000000000WWW-RobotRules-6.03/xt/author#!perl # This file was automatically generated by Dist::Zilla::Plugin::Test::Pod::Coverage::Configurable 0.07. use Test::Pod::Coverage 1.08; use Test::More 0.88; BEGIN { if ( $] <= 5.008008 ) { plan skip_all => 'These tests require Pod::Coverage::TrustPod, which only works with Perl 5.8.9+'; } } use Pod::Coverage::TrustPod; my %skip = map { $_ => 1 } qw( WWW::RobotRules::AnyDBM_File WWW::RobotRules::DB_File ); my @modules; for my $module ( all_modules() ) { next if $skip{$module}; push @modules, $module; } plan skip_all => 'All the modules we found were excluded from POD coverage test.' unless @modules; plan tests => scalar @modules; my %trustme = ( 'WWW::RobotRules' => [ qr/^(?:Version|is_me|visit|no_visits|last_visit|fresh_until|push_rules|clear_rules|rules|dump)$/ ] ); my @also_private; for my $module ( sort @modules ) { pod_coverage_ok( $module, { coverage_class => 'Pod::Coverage::TrustPod', also_private => \@also_private, trustme => $trustme{$module} || [], }, "pod coverage for $module" ); } done_testing(); minimum-version.t100644001750001751 15415204207641 20574 0ustar00olafolaf000000000000WWW-RobotRules-6.03/xt/authoruse strict; use warnings; use Test::More; use Test::MinimumVersion; all_minimum_version_from_metayml_ok(); RobotRules000755001750001751 015204207641 16526 5ustar00olafolaf000000000000WWW-RobotRules-6.03/lib/WWWDB_File.pm100644001750001751 643215204207641 20455 0ustar00olafolaf000000000000WWW-RobotRules-6.03/lib/WWW/RobotRulespackage WWW::RobotRules::DB_File; use strict; use WWW::RobotRules (); our @ISA = qw(WWW::RobotRules); our $VERSION = '6.03'; use Carp (); use DB_File; use Fcntl qw( O_CREAT O_RDWR ); sub new { my ($class, $name, $file) = @_; Carp::croak('WWW::RobotRules::DB_File cache file required') unless $file; my $self = WWW::RobotRules->new($name); $self = bless $self, $class; tie %{$self->{'rules'}}, DB_File, $file, O_CREAT | O_RDWR, 0640, $DB_HASH; $self; } sub expires { my 
    ($self, $hostport, $expires) = @_;
    my $old = $self->{'rules'}{"$hostport##expires"};
    $old = 0 unless (defined $old);
    if (defined $expires) {
        $self->{'rules'}{"$hostport##expires"} = $expires;
    }
    $old;
}

sub roboturl {
    my ($self, $hostport, $url) = @_;
    $url = $url->as_string if ref($url);
    my $old = $self->{'rules'}{"$hostport##url"};
    if ($url) {
        $self->{'rules'}{"$hostport##url"} = $url;
    }
    $old;
}

sub host_count {
    my ($self, $hostport) = @_;
    $self->{'rules'}{"$hostport##count"};
}

sub last_visit {
    my ($self, $hostport) = @_;
    $self->{'rules'}{"$hostport##last"};
}

sub visit {
    my ($self, $hostport, $time) = @_;
    $time = time unless defined $time;
    $self->{'rules'}{"$hostport##last"} = $time;
    # Read the counter from the tied hash, under the same key that
    # host_count() uses; otherwise the count is reset to 1 on every visit.
    if (defined $self->{'rules'}{"$hostport##count"}) {
        $self->{'rules'}{"$hostport##count"}++;
    } else {
        $self->{'rules'}{"$hostport##count"} = 1;
    }
}

sub push_rule {
    my ($self, $hostport, $rule) = @_;
    my $cnt = 0;
    foreach (keys %{$self->{'rules'}}) {
        # \Q...\E keeps the dots in "host:port" from acting as wildcards.
        if (/^\Q$hostport\E\#\#rule\#\#\d+$/) {
            $cnt++;
        }
    }
    $self->{'rules'}{"$hostport##rule##$cnt"} = $rule;
}

sub clear_rules {
    my ($self, $hostport) = @_;
    foreach (keys %{$self->{'rules'}}) {
        if (/^(\Q$hostport\E\#\#rule\#\#\d+)$/) {
            delete $self->{'rules'}{$1};
        }
    }
}

sub rules {
    my ($self, $hostport) = @_;
    my @rules = ();    # start empty; "= []" would seed the list with an array ref
    foreach (keys %{$self->{'rules'}}) {
        if (/^(\Q$hostport\E\#\#rule\#\#\d+)$/) {
            push(@rules, $self->{'rules'}{$1});
        }
    }
    return \@rules;
}

sub hosts {
    my ($self) = @_;
    my @hosts;
    foreach (keys %{$self->{'rules'}}) {
        if (/^([^\#]+)\#\#count/) {
            push(@hosts, $1);
        }
    }
    return \@hosts;
}

1;

__END__

=head1 NAME

WWW::RobotRules::DB_File - Parse robots.txt files using a disk cache

=head1 SYNOPSIS

 require WWW::RobotRules::DB_File;
 require LWP::RobotUA;

 # Create a robot useragent that uses a disk caching RobotRules
 $ua = LWP::RobotUA->new( 'my-robot/1.0', 'me@foo.com',
     WWW::RobotRules::DB_File->new( 'my-robot/1.0', '/path/cachefile' ));

 # Then just use $ua as usual
 $res = $ua->request($req);

=head1 DESCRIPTION

This is a subclass of L<WWW::RobotRules> that uses the DB_File
package to implement disk caching of robots.txt.

=head1 METHODS

This is a subclass of L<WWW::RobotRules>, so it implements the same methods.

=over 4

=item $rules = WWW::RobotRules::DB_File->new('my-robot/1.0', '/path/cachefile')

This is the constructor. The only difference from the original constructor in L<WWW::RobotRules> is that you have to specify a cache file as well.

=back

=head1 SEE ALSO

L<WWW::RobotRules>

=head1 AUTHOR

Hakan Ardo

=cut

AnyDBM_File.pm100644001750001751 1045315204207641 21260 0ustar00olafolaf000000000000WWW-RobotRules-6.03/lib/WWW/RobotRules
package WWW::RobotRules::AnyDBM_File;

use strict;

use WWW::RobotRules ();
our @ISA = qw(WWW::RobotRules);
our $VERSION = '6.03';

use Carp ();
use AnyDBM_File;
use Fcntl qw( O_CREAT O_RDWR );

sub new {
    my ($class, $ua, $file) = @_;
    Carp::croak('WWW::RobotRules::AnyDBM_File filename required') unless $file;

    my $self = bless {}, $class;
    $self->{'filename'} = $file;
    tie %{$self->{'dbm'}}, 'AnyDBM_File', $file, O_CREAT | O_RDWR, 0600
        or Carp::croak("Can't open $file: $!");

    if ($ua) {
        $self->agent($ua);
    } else {
        # Try to obtain name from DBM file
        $ua = $self->{'dbm'}{"|ua-name|"};
        Carp::croak("No agent name specified") unless $ua;
    }

    $self;
}

sub agent {
    my ($self, $newname) = @_;
    my $old = $self->{'dbm'}{"|ua-name|"};
    if (defined $newname) {
        $newname =~ s!/?\s*\d+.\d+\s*$!!;    # lose version
        unless ($old && $old eq $newname) {
            # Old info is now stale. Clear all keys through the tied
            # interface rather than untie+tie(O_TRUNC), which is a
            # symlink-follow TOCTOU on the DBM-backing file(s).
            %{$self->{'dbm'}} = ();
            $self->{'dbm'}{"|ua-name|"} = $newname;
        }
    }
    $old;
}

sub no_visits {
    my ($self, $netloc) = @_;
    my $t = $self->{'dbm'}{"$netloc|vis"};
    return 0 unless $t;
    (split(/;\s*/, $t))[0];
}

sub last_visit {
    my ($self, $netloc) = @_;
    my $t = $self->{'dbm'}{"$netloc|vis"};
    return undef unless $t;
    (split(/;\s*/, $t))[1];
}

sub fresh_until {
    my ($self, $netloc, $fresh) = @_;
    my $old = $self->{'dbm'}{"$netloc|exp"};
    if ($old) {
        $old =~ s/;.*//;    # remove cleartext
    }
    if (defined $fresh) {
        $fresh .= "; " . localtime($fresh);
        $self->{'dbm'}{"$netloc|exp"} = $fresh;
    }
    $old;
}

sub visit {
    my ($self, $netloc, $time) = @_;
    $time ||= time;

    my $count = 0;
    my $old = $self->{'dbm'}{"$netloc|vis"};
    if ($old) {
        my $last;
        ($count, $last) = split(/;\s*/, $old);
        $time = $last if $last > $time;
    }
    $count++;
    $self->{'dbm'}{"$netloc|vis"} = "$count; $time; " . localtime($time);
}

sub push_rules {
    my ($self, $netloc, @rules) = @_;
    my $cnt = 1;
    $cnt++ while $self->{'dbm'}{"$netloc|r$cnt"};

    foreach (@rules) {
        $self->{'dbm'}{"$netloc|r$cnt"} = $_;
        $cnt++;
    }
}

sub clear_rules {
    my ($self, $netloc) = @_;
    my $cnt = 1;
    while ($self->{'dbm'}{"$netloc|r$cnt"}) {
        delete $self->{'dbm'}{"$netloc|r$cnt"};
        $cnt++;
    }
}

sub rules {
    my ($self, $netloc) = @_;
    my @rules = ();
    my $cnt = 1;
    while (1) {
        my $rule = $self->{'dbm'}{"$netloc|r$cnt"};
        last unless $rule;
        push(@rules, $rule);
        $cnt++;
    }
    @rules;
}

sub dump {
}

1;

__END__

=head1 NAME

WWW::RobotRules::AnyDBM_File - Persistent RobotRules

=head1 SYNOPSIS

 require WWW::RobotRules::AnyDBM_File;
 require LWP::RobotUA;

 # Create a robot useragent that uses a diskcaching RobotRules
 my $rules = WWW::RobotRules::AnyDBM_File->new( 'my-robot/1.0', 'cachefile' );
 my $ua = LWP::RobotUA->new( 'my-robot/1.0', 'me@foo.com', $rules );

 # Then just use $ua as usual
 $res = $ua->request($req);

=head1 DESCRIPTION

This is a subclass of I<WWW::RobotRules> that uses the AnyDBM_File package to implement persistent diskcaching of F<robots.txt> and host visit information.
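The host visit information is kept as one string per host under the key C<"$netloc|vis">. The following is a minimal sketch of that encoding, mirroring what visit(), no_visits() and last_visit() do in this module. The plain hash C<%dbm> and the helper name C<record_visit> are assumptions made so the sketch runs without a DBM backend.

```perl
use strict;
use warnings;

# Plain hash standing in for the tied DBM hash (an assumption for this sketch).
my %dbm;

# Hypothetical helper using the same "count; epoch; readable-time" layout
# that visit() writes.
sub record_visit {
    my ($netloc, $time) = @_;
    $time ||= time;
    my $count = 0;
    if (my $old = $dbm{"$netloc|vis"}) {
        my $last;
        ($count, $last) = split /;\s*/, $old;
        $time = $last if $last > $time;    # never move the timestamp backwards
    }
    $count++;
    $dbm{"$netloc|vis"} = "$count; $time; " . localtime($time);
}

record_visit('example.com:80', 1_000);
record_visit('example.com:80', 2_000);

# Reading back: field 0 is the visit count, field 1 the last-visit epoch.
my ($count, $last) = split /;\s*/, $dbm{'example.com:80|vis'};
print "visits=$count last=$last\n";    # prints "visits=2 last=2000"
```

The trailing human-readable timestamp is only there for people inspecting the DBM file; the readers above ignore it.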
The constructor (the new() method) takes an extra argument specifying the name of the DBM file to use. If the DBM file already exists, then you can specify undef as agent name as the name can be obtained from the DBM database.

=head1 SECURITY CONSIDERATIONS

The caller-supplied DBM filename must reside in a directory writable only by the same user that runs this code. The underlying C<AnyDBM_File> backends open the file via the C C<open(2)> syscall without C<O_NOFOLLOW>, so a symlink at the cache path (or at its C<.dir>/C<.pag>/C<.db> siblings) will be followed and the linked target may be overwritten with DBM page data.

The cache file is created with mode C<0600>; callers that need different permissions can C<chmod> after construction.

=head1 SEE ALSO

L<WWW::RobotRules>, L<LWP::RobotUA>

=head1 AUTHORS

Hakan Ardo, Gisle Aas

=cut
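The directory requirement from SECURITY CONSIDERATIONS can be checked before the cache is opened. This is a minimal sketch, assuming a POSIX-style permission model; the helper name C<dir_is_private> is hypothetical and not part of this distribution.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: true if $dir is a real directory (not a symlink),
# owned by the effective user, and not writable by group or others.
sub dir_is_private {
    my ($dir) = @_;
    my @st = lstat $dir;
    return 0 unless @st && -d _;     # missing, or a symlink / non-directory
    return 0 unless $st[4] == $>;    # owner must be the effective uid
    return 0 if $st[2] & 022;        # group- or world-writable
    return 1;
}

my $dir = tempdir(CLEANUP => 1);
chmod 0700, $dir or die "chmod $dir: $!";

if (dir_is_private($dir)) {
    # Safe place for the cache, e.g.:
    #   WWW::RobotRules::AnyDBM_File->new('my-robot/1.0', "$dir/robots-cache");
    print "ok to use $dir\n";
} else {
    print "refusing to use $dir\n";
}
```

The check narrows the exposure but does not remove it: it is still a check-then-use sequence, so it only helps when nobody else can modify the directory or its parents in between.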