WWW-RobotRules-6.03/Changes
Revision history for Perl distribution WWW-RobotRules
6.03 2026-05-23 02:23:28Z
- Doing a proper version bump.
6.02 2026-05-21 14:45:27Z
- WWW::RobotRules::AnyDBM_File::agent() no longer truncates the on-disk
cache through an untie/tie(O_TRUNC) sequence. Stale-data reset now goes
through the tied-hash CLEAR, eliminating a symlink-follow race that a
local attacker with write access to the cache directory could exploit
to overwrite arbitrary files writable by the crawler user.
- The on-disk cache file mode has been tightened from 0640 to 0600.
- t/rules-dbm.t has been hardened against symlink attacks on its tempfile
during package builds.
- A new SECURITY CONSIDERATIONS POD section documents the residual
caller-trust requirement: the constructor's tie still follows symlinks
because AnyDBM_File cannot portably plumb O_NOFOLLOW, so the caller
must store the cache file in a directory writable only by the user
that runs the code.
- References: CWE-377, CWE-378, CWE-379.
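The reset described in the first bullet of this release can be illustrated with a minimal sketch. It is not the module's actual code: the cache file name, the key layout and the reset_cache helper are assumptions made for illustration only. It shows a tied DBM hash being emptied by list assignment, which Perl dispatches to the tie class's CLEAR method, so the file is not reopened with O_TRUNC.

```perl
use strict;
use warnings;
use AnyDBM_File;
use Fcntl qw(O_CREAT O_RDWR);
use File::Temp qw(tempdir);

# Hypothetical helper: empty a (possibly tied) hash in place.
# For a tied hash, list assignment calls the tie class's CLEAR method.
sub reset_cache {
    my ($cache) = @_;
    %$cache = ();
    return;
}

# Private scratch directory; File::Temp creates it with mode 0700.
my $dir  = tempdir( CLEANUP => 1 );
my $path = "$dir/robots-cache";    # assumed file name, for illustration

# Open (or create) the DBM file once, readable and writable by the owner only.
tie my %cache, 'AnyDBM_File', $path, O_CREAT | O_RDWR, 0600
    or die "Cannot tie $path: $!";

$cache{'example.com:80|vis'} = 3;    # illustrative key layout only

# Drop stale entries without an untie/tie(O_TRUNC) round trip.
reset_cache( \%cache );

my $remaining = scalar keys %cache;
print "entries after reset: $remaining\n";

untie %cache;
```

Because the hash stays tied for the whole reset, there is no window between closing and reopening the file in which the path could be swapped for a symlink. The directory still has to be writable only by the user running the code, as the SECURITY CONSIDERATIONS note above says.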
6.02 2012-02-18
- Restore perl-5.8.1 compatibility.
6.01 2011-03-13
- Added legal notice and updated the meta repository link.
6.00 2011-02-25
- Initial release of WWW-RobotRules as a separate distribution. There are
no code changes besides incrementing the version number since
libwww-perl-5.837. The WWW::RobotRules module used to be bundled with
the libwww-perl distribution.
WWW-RobotRules-6.03/LICENSE
This software is copyright (c) 1995 by Gisle Aas.
This is free software; you can redistribute it and/or modify it under
the same terms as the Perl 5 programming language system itself.
Terms of the Perl programming language system itself
a) the GNU General Public License as published by the Free
Software Foundation; either version 1, or (at your option) any
later version, or
b) the "Artistic License"
--- The GNU General Public License, Version 1, February 1989 ---
This software is Copyright (c) 1995 by Gisle Aas.
This is free software, licensed under:
The GNU General Public License, Version 1, February 1989
GNU GENERAL PUBLIC LICENSE
Version 1, February 1989
Copyright (C) 1989 Free Software Foundation, Inc.
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Preamble
The license agreements of most software companies try to keep users
at the mercy of those companies. By contrast, our General Public
License is intended to guarantee your freedom to share and change free
software--to make sure the software is free for all its users. The
General Public License applies to the Free Software Foundation's
software and to any other program whose authors commit to using it.
You can use it for your programs, too.
When we speak of free software, we are referring to freedom, not
price. Specifically, the General Public License is designed to make
sure that you have the freedom to give away or sell copies of free
software, that you receive source code or can get it if you want it,
that you can change the software or use pieces of it in new free
programs; and that you know you can do these things.
To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you
distribute copies of the software, or if you modify it.
For example, if you distribute copies of a such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have. You must make sure that they, too, receive or can get the
source code. And you must tell them their rights.
We protect your rights with two steps: (1) copyright the software, and
(2) offer you this license which gives you legal permission to copy,
distribute and/or modify the software.
Also, for each author's protection and ours, we want to make certain
that everyone understands that there is no warranty for this free
software. If the software is modified by someone else and passed on, we
want its recipients to know that what they have is not the original, so
that any problems introduced by others will not reflect on the original
authors' reputations.
The precise terms and conditions for copying, distribution and
modification follow.
GNU GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. This License Agreement applies to any program or other work which
contains a notice placed by the copyright holder saying it may be
distributed under the terms of this General Public License. The
"Program", below, refers to any such program or work, and a "work based
on the Program" means either the Program or any work containing the
Program or a portion of it, either verbatim or with modifications. Each
licensee is addressed as "you".
1. You may copy and distribute verbatim copies of the Program's source
code as you receive it, in any medium, provided that you conspicuously and
appropriately publish on each copy an appropriate copyright notice and
disclaimer of warranty; keep intact all the notices that refer to this
General Public License and to the absence of any warranty; and give any
other recipients of the Program a copy of this General Public License
along with the Program. You may charge a fee for the physical act of
transferring a copy.
2. You may modify your copy or copies of the Program or any portion of
it, and copy and distribute such modifications under the terms of Paragraph
1 above, provided that you also do the following:
a) cause the modified files to carry prominent notices stating that
you changed the files and the date of any change; and
b) cause the whole of any work that you distribute or publish, that
in whole or in part contains the Program or any part thereof, either
with or without modifications, to be licensed at no charge to all
third parties under the terms of this General Public License (except
that you may choose to grant warranty protection to some or all
third parties, at your option).
c) If the modified program normally reads commands interactively when
run, you must cause it, when started running for such interactive use
in the simplest and most usual way, to print or display an
announcement including an appropriate copyright notice and a notice
that there is no warranty (or else, saying that you provide a
warranty) and that users may redistribute the program under these
conditions, and telling the user how to view a copy of this General
Public License.
d) You may charge a fee for the physical act of transferring a
copy, and you may at your option offer warranty protection in
exchange for a fee.
Mere aggregation of another independent work with the Program (or its
derivative) on a volume of a storage or distribution medium does not bring
the other work under the scope of these terms.
3. You may copy and distribute the Program (or a portion or derivative of
it, under Paragraph 2) in object code or executable form under the terms of
Paragraphs 1 and 2 above provided that you also do one of the following:
a) accompany it with the complete corresponding machine-readable
source code, which must be distributed under the terms of
Paragraphs 1 and 2 above; or,
b) accompany it with a written offer, valid for at least three
years, to give any third party free (except for a nominal charge
for the cost of distribution) a complete machine-readable copy of the
corresponding source code, to be distributed under the terms of
Paragraphs 1 and 2 above; or,
c) accompany it with the information you received as to where the
corresponding source code may be obtained. (This alternative is
allowed only for noncommercial distribution and only if you
received the program in object code or executable form alone.)
Source code for a work means the preferred form of the work for making
modifications to it. For an executable file, complete source code means
all the source code for all modules it contains; but, as a special
exception, it need not include source code for modules which are standard
libraries that accompany the operating system on which the executable
file runs, or for standard header files or definitions files that
accompany that operating system.
4. You may not copy, modify, sublicense, distribute or transfer the
Program except as expressly provided under this General Public License.
Any attempt otherwise to copy, modify, sublicense, distribute or transfer
the Program is void, and will automatically terminate your rights to use
the Program under this License. However, parties who have received
copies, or rights to use copies, from you under this General Public
License will not have their licenses terminated so long as such parties
remain in full compliance.
5. By copying, distributing or modifying the Program (or any work based
on the Program) you indicate your acceptance of this license to do so,
and all its terms and conditions.
6. Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the original
licensor to copy, distribute or modify the Program subject to these
terms and conditions. You may not impose any further restrictions on the
recipients' exercise of the rights granted herein.
7. The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.
Each version is given a distinguishing version number. If the Program
specifies a version number of the license which applies to it and "any
later version", you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation. If the Program does not specify a version number of
the license, you may choose any version ever published by the Free Software
Foundation.
8. If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the author
to ask for permission. For software which is copyrighted by the Free
Software Foundation, write to the Free Software Foundation; we sometimes
make exceptions for this. Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.
NO WARRANTY
9. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.
10. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.
END OF TERMS AND CONDITIONS
Appendix: How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest
possible use to humanity, the best way to achieve this is to make it
free software which everyone can redistribute and change under these
terms.
To do so, attach the following notices to the program. It is safest to
attach them to the start of each source file to most effectively convey
the exclusion of warranty; and each file should have at least the
"copyright" line and a pointer to where the full notice is found.
Copyright (C) 19yy
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 1, or (at your option)
any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, see .
Also add information on how to contact you by electronic and paper mail.
If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:
Gnomovision version 69, Copyright (C) 19xx name of author
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.
The hypothetical commands `show w' and `show c' should show the
appropriate parts of the General Public License. Of course, the
commands you use may be called something other than `show w' and `show
c'; they could even be mouse-clicks or menu items--whatever suits your
program.
You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the program, if
necessary. Here a sample; alter the names:
Yoyodyne, Inc., hereby disclaims all copyright interest in the
program `Gnomovision' (a program to direct compilers to make passes
at assemblers) written by James Hacker.
, 1 April 1989
Moe Ghoul, President of Vice
That's all there is to it!
--- The Perl Artistic License 1.0 ---
This software is Copyright (c) 1995 by Gisle Aas.
This is free software, licensed under:
The Perl Artistic License 1.0
The "Artistic License"
Preamble
The intent of this document is to state the conditions under which a
Package may be copied, such that the Copyright Holder maintains some
semblance of artistic control over the development of the package,
while giving the users of the package the right to use and distribute
the Package in a more-or-less customary fashion, plus the right to make
reasonable modifications.
Definitions:
"Package" refers to the collection of files distributed by the
Copyright Holder, and derivatives of that collection of files
created through textual modification.
"Standard Version" refers to such a Package if it has not been
modified, or has been modified in accordance with the wishes
of the Copyright Holder as specified below.
"Copyright Holder" is whoever is named in the copyright or
copyrights for the package.
"You" is you, if you're thinking about copying or distributing
this Package.
"Reasonable copying fee" is whatever you can justify on the
basis of media cost, duplication charges, time of people involved,
and so on. (You will not be required to justify it to the
Copyright Holder, but only to the computing community at large
as a market that must bear the fee.)
"Freely Available" means that no fee is charged for the item
itself, though there may be fees involved in handling the item.
It also means that recipients of the item may redistribute it
under the same conditions they received it.
1. You may make and give away verbatim copies of the source form of the
Standard Version of this Package without restriction, provided that you
duplicate all of the original copyright notices and associated disclaimers.
2. You may apply bug fixes, portability fixes and other modifications
derived from the Public Domain or from the Copyright Holder. A Package
modified in such a way shall still be considered the Standard Version.
3. You may otherwise modify your copy of this Package in any way, provided
that you insert a prominent notice in each changed file stating how and
when you changed that file, and provided that you do at least ONE of the
following:
a) place your modifications in the Public Domain or otherwise make them
Freely Available, such as by posting said modifications to Usenet or
an equivalent medium, or placing the modifications on a major archive
site such as uunet.uu.net, or by allowing the Copyright Holder to include
your modifications in the Standard Version of the Package.
b) use the modified Package only within your corporation or organization.
c) rename any non-standard executables so the names do not conflict
with standard executables, which must also be provided, and provide
a separate manual page for each non-standard executable that clearly
documents how it differs from the Standard Version.
d) make other distribution arrangements with the Copyright Holder.
4. You may distribute the programs of this Package in object code or
executable form, provided that you do at least ONE of the following:
a) distribute a Standard Version of the executables and library files,
together with instructions (in the manual page or equivalent) on where
to get the Standard Version.
b) accompany the distribution with the machine-readable source of
the Package with your modifications.
c) give non-standard executables non-standard names, and clearly
document the differences in manual pages (or equivalent), together
with instructions on where to get the Standard Version.
d) make other distribution arrangements with the Copyright Holder.
5. You may charge a reasonable copying fee for any distribution of this
Package. You may charge any fee you choose for support of this
Package. You may not charge a fee for this Package itself. However,
you may distribute this Package in aggregate with other (possibly
commercial) programs as part of a larger (possibly commercial) software
distribution provided that you do not advertise this Package as a
product of your own. You may embed this Package's interpreter within
an executable of yours (by linking); this shall be construed as a mere
form of aggregation, provided that the complete Standard Version of the
interpreter is so embedded.
6. The scripts and library files supplied as input to or produced as
output from the programs of this Package do not automatically fall
under the copyright of this Package, but belong to whoever generated
them, and may be sold commercially, and may be aggregated with this
Package. If such scripts or library files are aggregated with this
Package via the so-called "undump" or "unexec" methods of producing a
binary executable image, then distribution of such an image shall
neither be construed as a distribution of this Package nor shall it
fall under the restrictions of Paragraphs 3 and 4, provided that you do
not represent such an executable image as a Standard Version of this
Package.
7. C subroutines (or comparably compiled subroutines in other
languages) supplied by you and linked into this Package in order to
emulate subroutines and variables of the language defined by this
Package shall not be considered part of this Package, but are the
equivalent of input as in Paragraph 6, provided these subroutines do
not change the language in any way that would cause it to fail the
regression tests for the language.
8. Aggregation of this Package with a commercial distribution is always
permitted provided that the use of this Package is embedded; that is,
when no overt attempt is made to make this Package's interfaces visible
to the end user of the commercial distribution. Such use shall not be
construed as a distribution of this Package.
9. The name of the Copyright Holder may not be used to endorse or promote
products derived from this software without specific prior written permission.
10. THIS PACKAGE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
The End
WWW-RobotRules-6.03/INSTALL
This is the Perl distribution WWW-RobotRules.
Installing WWW-RobotRules is straightforward.
## Installation with cpanm
If you have cpanm, you only need one line:
    % cpanm WWW::RobotRules
If it does not have permission to install modules to the current perl, cpanm
will automatically set up and install to a local::lib in your home directory.
See the local::lib documentation (https://metacpan.org/pod/local::lib) for
details on enabling it in your environment.
## Installing with the CPAN shell
Alternatively, if your CPAN shell is set up, you should just be able to do:
    % cpan WWW::RobotRules
## Manual installation
As a last resort, you can manually install it. If you have not already
downloaded the release tarball, you can find the download link on the module's
MetaCPAN page: https://metacpan.org/pod/WWW::RobotRules
Untar the tarball, install configure prerequisites (see below), then build it:
    % perl Makefile.PL
    % make && make test
Then install it:
    % make install
On Windows platforms, you should use `dmake` or `nmake`, instead of `make`.
If your perl is system-managed, you can create a local::lib in your home
directory to install modules to. For details, see the local::lib documentation:
https://metacpan.org/pod/local::lib
The prerequisites of this distribution will also have to be installed manually. The
prerequisites are listed in one of the files: `MYMETA.yml` or `MYMETA.json` generated
by running the manual build process described above.
## Configure Prerequisites
This distribution requires other modules to be installed before this
distribution's installer can be run. They can be found under the
"configure_requires" key of META.yml or the
"{prereqs}{configure}{requires}" key of META.json.
## Other Prerequisites
This distribution may require additional modules to be installed after running
Makefile.PL.
Look for prerequisites in the following phases:
* to run make, PHASE = build
* to use the module code itself, PHASE = runtime
* to run tests, PHASE = test
They can all be found in the "PHASE_requires" key of MYMETA.yml or the
"{prereqs}{PHASE}{requires}" key of MYMETA.json.
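As a rough illustration of the lookup described above, the following sketch walks the "{prereqs}{PHASE}{requires}" structure of a decoded MYMETA.json. The helper name `requires_for_phase` is hypothetical, and the JSON is an inline sample (its values mirror this distribution's cpanfile) rather than a real generated file.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical helper: return the "requires" map for one phase of a
# decoded MYMETA.json structure, or an empty hash reference when the
# phase is not present.
sub requires_for_phase {
    my ( $meta, $phase ) = @_;
    my $prereqs  = $meta->{prereqs}   || {};
    my $by_phase = $prereqs->{$phase} || {};
    return $by_phase->{requires} || {};
}

# Sample input shaped like MYMETA.json; a real script would read the
# file generated by "perl Makefile.PL" instead.
my $json = <<'JSON';
{
  "prereqs": {
    "runtime": { "requires": { "URI": "1.10", "perl": "5.008001" } },
    "test":    { "requires": { "Test::More": "0.96" } }
  }
}
JSON

my $meta = decode_json($json);

# List the modules required in each phase named in this guide.
for my $phase (qw(build runtime test)) {
    my $req = requires_for_phase( $meta, $phase );
    for my $module ( sort keys %$req ) {
        print "$phase: $module $req->{$module}\n";
    }
}
```

A phase with no entry (here, build) simply yields nothing, so the same loop can be run against any MYMETA.json without checking which phases it defines.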
## Documentation
WWW-RobotRules documentation is available as POD.
You can run `perldoc` from a shell to read the documentation:
    % perldoc WWW::RobotRules
For more information on installing Perl modules via CPAN, please see:
https://www.cpan.org/modules/INSTALL.html
WWW-RobotRules-6.03/cpanfile
use strict;
use warnings;
on 'configure' => sub {
    requires 'ExtUtils::MakeMaker';
};

on 'runtime' => sub {
    requires 'perl' => '5.008001';
    requires 'strict';
    requires 'AnyDBM_File';
    requires 'Carp';
    requires 'Fcntl';
    requires 'URI' => '1.10';

    # DB_File is only needed for the optional WWW::RobotRules::DB_File
    # backend. It is a non-core XS module, so it is suggested, not required.
    suggests 'DB_File';
};

on 'test' => sub {
    requires 'Test::More' => '0.96';
    requires 'strict';
    requires 'warnings';
};

on 'develop' => sub {
    requires 'Pod::Coverage::TrustPod';
    requires 'Pod::Spell' => '1.25';
    requires 'Test::EOL'  => '2.00';
    requires 'Test::MinimumVersion';
    requires 'Test::Mojibake';
    requires 'Test::Pod';
    requires 'Test::Pod::Coverage';
    requires 'Test::Portability::Files';
    requires 'Test::Version';
};
WWW-RobotRules-6.03/dist.ini
name = WWW-RobotRules
author = Gisle Aas
license = Perl_5
copyright_holder = Gisle Aas
copyright_year = 1995
[Git::GatherDir]
exclude_filename = LICENSE
exclude_filename = META.json
exclude_filename = Makefile.PL
exclude_filename = perlimports.toml
exclude_filename = precious.toml
exclude_filename = .perltidyrc
exclude_filename = README.md
[MetaConfig]
[MetaProvides::Package]
[MetaNoIndex]
directory = t
directory = xt
[MetaYAML]
[MetaJSON]
[MetaResources]
x_MailingList = mailto:libwww@perl.org
[Git::Contributors]
[GithubMeta]
issues = 1
user = libwww-perl
[Manifest]
[License]
[InstallGuide]
:version = 1.200013
[Prereqs::FromCPANfile]
[MakeMaker]
[MojibakeTests]
[Test::Version]
[Test::ReportPrereqs]
[Test::Compile]
:version = 2.059
bail_out_on_fail = 1
xt_mode = 1
; WWW::RobotRules::DB_File needs the optional, non-core DB_File module
skip = WWW::RobotRules::DB_File
[Test::Portability]
[Test::EOL]
[Test::MinimumVersion]
[PodSyntaxTests]
[Test::Pod::Coverage::Configurable]
skip = WWW::RobotRules::AnyDBM_File
skip = WWW::RobotRules::DB_File
trustme = WWW::RobotRules => qr/^(?:Version|is_me|visit|no_visits|last_visit|fresh_until|push_rules|clear_rules|rules|dump)$/
[Test::PodSpelling]
wordlist = Pod::Wordlist
stopword = AnyDBM
stopword = Aas
stopword = Ardo
stopword = cybermapper
stopword = DBM
stopword = diskcaching
stopword = Gisle
stopword = Hakan
stopword = Koster
stopword = Martijn
stopword = RobotUA
stopword = txt
[Git::Check]
allow_dirty =
[CheckStrictVersion]
decimal_only = 1
[RunExtraTests]
[CheckChangeLog]
[CheckChangesHasContent]
[TestRelease]
[UploadToCPAN]
[ReadmeAnyFromPod / Markdown_Readme]
source_filename = lib/WWW/RobotRules.pm
type = markdown
filename = README.md
location = root
phase = release
[CopyFilesFromRelease]
filename = META.json
filename = LICENSE
[@Git::VersionManager]
commit_files_after_release = META.json
commit_files_after_release = LICENSE
commit_files_after_release = README.md
[Git::Push]
[ConfirmRelease]
WWW-RobotRules-6.03/META.yml
---
abstract: 'database of robots.txt-derived permissions'
author:
  - 'Gisle Aas '
build_requires:
  ExtUtils::MakeMaker: '0'
  File::Spec: '0'
  Test::More: '0.96'
  strict: '0'
  warnings: '0'
configure_requires:
  ExtUtils::MakeMaker: '0'
dynamic_config: 0
generated_by: 'Dist::Zilla version 6.037, CPAN::Meta::Converter version 2.150013'
license: perl
meta-spec:
  url: http://module-build.sourceforge.net/META-spec-v1.4.html
  version: '1.4'
name: WWW-RobotRules
no_index:
  directory:
    - t
    - xt
provides:
  WWW::RobotRules:
    file: lib/WWW/RobotRules.pm
    version: '6.03'
  WWW::RobotRules::AnyDBM_File:
    file: lib/WWW/RobotRules/AnyDBM_File.pm
    version: '6.03'
  WWW::RobotRules::DB_File:
    file: lib/WWW/RobotRules/DB_File.pm
    version: '6.03'
  WWW::RobotRules::InCore:
    file: lib/WWW/RobotRules.pm
    version: '6.03'
requires:
  AnyDBM_File: '0'
  Carp: '0'
  Fcntl: '0'
  URI: '1.10'
  perl: '5.008001'
  strict: '0'
resources:
  MailingList: mailto:libwww@perl.org
  bugtracker: https://github.com/libwww-perl/WWW-RobotRules/issues
  homepage: https://github.com/libwww-perl/WWW-RobotRules
  repository: https://github.com/libwww-perl/WWW-RobotRules.git
version: '6.03'
x_Dist_Zilla:
  perl:
    version: '5.042002'
  plugins:
    -
      class: Dist::Zilla::Plugin::Git::GatherDir
      config:
        Dist::Zilla::Plugin::GatherDir:
          exclude_filename:
            - .perltidyrc
            - LICENSE
            - META.json
            - Makefile.PL
            - README.md
            - perlimports.toml
            - precious.toml
          exclude_match: []
          include_dotfiles: 0
          prefix: ''
          prune_directory: []
          root: .
        Dist::Zilla::Plugin::Git::GatherDir:
          include_untracked: 0
      name: Git::GatherDir
      version: '2.052'
    -
      class: Dist::Zilla::Plugin::MetaConfig
      name: MetaConfig
      version: '6.037'
    -
      class: Dist::Zilla::Plugin::MetaProvides::Package
      config:
        Dist::Zilla::Plugin::MetaProvides::Package:
          finder_objects:
            -
              class: Dist::Zilla::Plugin::FinderCode
              name: MetaProvides::Package/AUTOVIV/:InstallModulesPM
              version: '6.037'
          include_underscores: 0
        Dist::Zilla::Role::MetaProvider::Provider:
          $Dist::Zilla::Role::MetaProvider::Provider::VERSION: '2.002004'
          inherit_missing: 1
          inherit_version: 1
          meta_noindex: 1
        Dist::Zilla::Role::ModuleMetadata:
          Module::Metadata: '1.000038'
          version: '0.006'
      name: MetaProvides::Package
      version: '2.004003'
    -
      class: Dist::Zilla::Plugin::MetaNoIndex
      name: MetaNoIndex
      version: '6.037'
    -
      class: Dist::Zilla::Plugin::MetaYAML
      name: MetaYAML
      version: '6.037'
    -
      class: Dist::Zilla::Plugin::MetaJSON
      name: MetaJSON
      version: '6.037'
    -
      class: Dist::Zilla::Plugin::MetaResources
      name: MetaResources
      version: '6.037'
    -
      class: Dist::Zilla::Plugin::Git::Contributors
      config:
        Dist::Zilla::Plugin::Git::Contributors:
          git_version: 2.43.0
          include_authors: 0
          include_releaser: 1
          order_by: name
          paths: []
      name: Git::Contributors
      version: '0.039'
    -
      class: Dist::Zilla::Plugin::GithubMeta
      name: GithubMeta
      version: '0.58'
    -
      class: Dist::Zilla::Plugin::Manifest
      name: Manifest
      version: '6.037'
    -
      class: Dist::Zilla::Plugin::License
      name: License
      version: '6.037'
    -
      class: Dist::Zilla::Plugin::InstallGuide
      config:
        Dist::Zilla::Role::ModuleMetadata:
          Module::Metadata: '1.000038'
          version: '0.006'
      name: InstallGuide
      version: '1.200014'
    -
      class: Dist::Zilla::Plugin::Prereqs::FromCPANfile
      name: Prereqs::FromCPANfile
      version: '0.08'
    -
      class: Dist::Zilla::Plugin::MakeMaker
      config:
        Dist::Zilla::Role::TestRunner:
          default_jobs: '8'
      name: MakeMaker
      version: '6.037'
    -
      class: Dist::Zilla::Plugin::MojibakeTests
      name: MojibakeTests
      version: '0.8'
    -
      class: Dist::Zilla::Plugin::Test::Version
      name: Test::Version
      version: '1.09'
    -
      class: Dist::Zilla::Plugin::Test::ReportPrereqs
      name: Test::ReportPrereqs
      version: '0.029'
    -
      class: Dist::Zilla::Plugin::Test::Compile
      config:
        Dist::Zilla::Plugin::Test::Compile:
          bail_out_on_fail: '1'
          fail_on_warning: author
          fake_home: 0
          filename: xt/author/00-compile.t
          module_finder:
            - ':InstallModules'
          needs_display: 0
          phase: develop
          script_finder:
            - ':PerlExecFiles'
          skips:
            - WWW::RobotRules::DB_File
          switch: []
      name: Test::Compile
      version: '2.059'
    -
      class: Dist::Zilla::Plugin::Test::Portability
      config:
        Dist::Zilla::Plugin::Test::Portability:
          options: ''
      name: Test::Portability
      version: '2.001003'
    -
      class: Dist::Zilla::Plugin::Test::EOL
      config:
        Dist::Zilla::Plugin::Test::EOL:
          filename: xt/author/eol.t
          finder:
            - ':ExecFiles'
            - ':InstallModules'
            - ':TestFiles'
          trailing_whitespace: 1
      name: Test::EOL
      version: '0.19'
    -
      class: Dist::Zilla::Plugin::Test::MinimumVersion
      config:
        Dist::Zilla::Plugin::Test::MinimumVersion:
          max_target_perl: ~
      name: Test::MinimumVersion
      version: '2.000011'
    -
      class: Dist::Zilla::Plugin::PodSyntaxTests
      name: PodSyntaxTests
      version: '6.037'
    -
      class: Dist::Zilla::Plugin::Test::Pod::Coverage::Configurable
      name: Test::Pod::Coverage::Configurable
      version: '0.07'
    -
      class: Dist::Zilla::Plugin::Test::PodSpelling
      config:
        Dist::Zilla::Plugin::Test::PodSpelling:
          directories:
            - bin
            - lib
          spell_cmd: ''
          stopwords:
            - Aas
            - AnyDBM
            - Ardo
            - DBM
            - Gisle
            - Hakan
            - Koster
            - Martijn
            - RobotUA
            - cybermapper
            - diskcaching
            - txt
          wordlist: Pod::Wordlist
      name: Test::PodSpelling
      version: '2.007006'
    -
      class: Dist::Zilla::Plugin::Git::Check
      config:
        Dist::Zilla::Plugin::Git::Check:
          untracked_files: die
        Dist::Zilla::Role::Git::DirtyFiles:
          allow_dirty: []
          allow_dirty_match: []
          changelog: Changes
        Dist::Zilla::Role::Git::Repo:
          git_version: 2.43.0
          repo_root: .
      name: Git::Check
      version: '2.052'
    -
      class: Dist::Zilla::Plugin::CheckStrictVersion
      name: CheckStrictVersion
      version: '0.001'
    -
      class: Dist::Zilla::Plugin::RunExtraTests
      config:
        Dist::Zilla::Role::TestRunner:
          default_jobs: '8'
      name: RunExtraTests
      version: '0.029'
    -
      class: Dist::Zilla::Plugin::CheckChangeLog
      name: CheckChangeLog
      version: '0.05'
    -
      class: Dist::Zilla::Plugin::CheckChangesHasContent
      name: CheckChangesHasContent
      version: '0.011'
    -
      class: Dist::Zilla::Plugin::TestRelease
      name: TestRelease
      version: '6.037'
    -
      class: Dist::Zilla::Plugin::UploadToCPAN
      name: UploadToCPAN
      version: '6.037'
    -
      class: Dist::Zilla::Plugin::ReadmeAnyFromPod
      config:
        Dist::Zilla::Role::FileWatcher:
          version: '0.006'
      name: Markdown_Readme
      version: '0.163250'
    -
      class: Dist::Zilla::Plugin::CopyFilesFromRelease
      config:
        Dist::Zilla::Plugin::CopyFilesFromRelease:
          filename:
            - LICENSE
            - META.json
          match: []
      name: CopyFilesFromRelease
      version: '0.007'
    -
      class: Dist::Zilla::Plugin::Prereqs
      config:
        Dist::Zilla::Plugin::Prereqs:
          phase: develop
          type: recommends
      name: '@Git::VersionManager/pluginbundle version'
      version: '6.037'
    -
      class: Dist::Zilla::Plugin::RewriteVersion::Transitional
      config:
        Dist::Zilla::Plugin::RewriteVersion:
          add_tarball_name: 0
          finders:
            - ':ExecFiles'
            - ':InstallModules'
          global: 0
          skip_version_provider: 0
        Dist::Zilla::Plugin::RewriteVersion::Transitional: {}
      name: '@Git::VersionManager/RewriteVersion::Transitional'
      version: '0.009'
    -
      class: Dist::Zilla::Plugin::MetaProvides::Update
      name: '@Git::VersionManager/MetaProvides::Update'
      version: '0.007'
    -
      class: Dist::Zilla::Plugin::CopyFilesFromRelease
      config:
        Dist::Zilla::Plugin::CopyFilesFromRelease:
          filename:
            - Changes
          match: []
      name: '@Git::VersionManager/CopyFilesFromRelease'
      version: '0.007'
    -
      class: Dist::Zilla::Plugin::Git::Commit
      config:
        Dist::Zilla::Plugin::Git::Commit:
          add_files_in: []
          commit_msg: v%V%n%n%c
          signoff: 0
        Dist::Zilla::Role::Git::DirtyFiles:
          allow_dirty:
            - Changes
            - LICENSE
            - META.json
            - README.md
          allow_dirty_match: []
          changelog: Changes
        Dist::Zilla::Role::Git::Repo:
          git_version: 2.43.0
          repo_root: .
        Dist::Zilla::Role::Git::StringFormatter:
          time_zone: local
      name: '@Git::VersionManager/release snapshot'
      version: '2.052'
    -
      class: Dist::Zilla::Plugin::Git::Tag
      config:
        Dist::Zilla::Plugin::Git::Tag:
          branch: ~
          changelog: Changes
          signed: 0
          tag: v6.03
          tag_format: v%V
          tag_message: v%V
        Dist::Zilla::Role::Git::Repo:
          git_version: 2.43.0
          repo_root: .
        Dist::Zilla::Role::Git::StringFormatter:
          time_zone: local
      name: '@Git::VersionManager/Git::Tag'
      version: '2.052'
    -
      class: Dist::Zilla::Plugin::BumpVersionAfterRelease::Transitional
      config:
        Dist::Zilla::Plugin::BumpVersionAfterRelease:
          finders:
            - ':ExecFiles'
            - ':InstallModules'
          global: 0
          munge_makefile_pl: 1
        Dist::Zilla::Plugin::BumpVersionAfterRelease::Transitional: {}
      name: '@Git::VersionManager/BumpVersionAfterRelease::Transitional'
      version: '0.009'
    -
      class: Dist::Zilla::Plugin::NextRelease
      name: '@Git::VersionManager/NextRelease'
version: '6.037'
-
class: Dist::Zilla::Plugin::Git::Commit
config:
Dist::Zilla::Plugin::Git::Commit:
add_files_in: []
commit_msg: 'increment $VERSION after %v release'
signoff: 0
Dist::Zilla::Role::Git::DirtyFiles:
allow_dirty:
- Build.PL
- Changes
- Makefile.PL
allow_dirty_match:
- (?^:^lib/.*\.pm$)
changelog: Changes
Dist::Zilla::Role::Git::Repo:
git_version: 2.43.0
repo_root: .
Dist::Zilla::Role::Git::StringFormatter:
time_zone: local
name: '@Git::VersionManager/post-release commit'
version: '2.052'
-
class: Dist::Zilla::Plugin::Git::Push
config:
Dist::Zilla::Plugin::Git::Push:
push_to:
- origin
remotes_must_exist: 1
Dist::Zilla::Role::Git::Repo:
git_version: 2.43.0
repo_root: .
name: Git::Push
version: '2.052'
-
class: Dist::Zilla::Plugin::ConfirmRelease
name: ConfirmRelease
version: '6.037'
-
class: Dist::Zilla::Plugin::FinderCode
name: ':InstallModules'
version: '6.037'
-
class: Dist::Zilla::Plugin::FinderCode
name: ':IncModules'
version: '6.037'
-
class: Dist::Zilla::Plugin::FinderCode
name: ':TestFiles'
version: '6.037'
-
class: Dist::Zilla::Plugin::FinderCode
name: ':ExtraTestFiles'
version: '6.037'
-
class: Dist::Zilla::Plugin::FinderCode
name: ':ExecFiles'
version: '6.037'
-
class: Dist::Zilla::Plugin::FinderCode
name: ':PerlExecFiles'
version: '6.037'
-
class: Dist::Zilla::Plugin::FinderCode
name: ':ShareFiles'
version: '6.037'
-
class: Dist::Zilla::Plugin::FinderCode
name: ':MainModule'
version: '6.037'
-
class: Dist::Zilla::Plugin::FinderCode
name: ':AllFiles'
version: '6.037'
-
class: Dist::Zilla::Plugin::FinderCode
name: ':NoFiles'
version: '6.037'
-
class: Dist::Zilla::Plugin::FinderCode
name: MetaProvides::Package/AUTOVIV/:InstallModulesPM
version: '6.037'
zilla:
class: Dist::Zilla::Dist::Builder
config:
is_trial: 0
version: '6.037'
x_contributors:
- 'Adam Kennedy '
- 'Adam Sjogren '
- 'Alexey Tourbin '
- 'Alex Kapranoff '
- 'amire80 '
- 'Andreas J. Koenig '
- 'Anton Yuzhaninov '
- 'Bill Mann '
- 'Bron Gondwana '
- 'Daniel Hedlund '
- 'David E. Wheeler '
- 'DAVIDRW '
- 'David Steinbrunner '
- 'dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>'
- 'Father Chrysostomos '
- 'FWILES '
- 'Gavin Peters '
- 'Graeme Thompson '
- 'Graham Knop '
- 'Hans-H. Froehlich '
- 'Ian Kilgore '
- 'Jacob J '
- 'jefflee '
- 'john9art '
- 'Mark Stosberg '
- 'Mike Schilli '
- 'mschilli '
- 'murphy '
- 'Olaf Alders '
- 'Ondrej Hanak '
- 'Peter Rabbitson '
- 'phrstbrn '
- 'Robert Stone '
- 'Rolf Grossmann '
- 'ruff '
- 'sasao '
- 'Sean M. Burke '
- 'Slaven Rezic '
- 'Spiros Denaxas '
- 'Steve Hay '
- 'Todd Lipcon '
- 'Tom Hukins '
- 'Tony Finch '
- 'Toru Yamaguchi '
- 'uid39246 '
- 'Ville Skyttä '
- 'Yuri Karaban '
- 'Zefram '
x_generated_by_perl: v5.42.2
x_serialization_backend: 'YAML::Tiny version 1.76'
x_spdx_expression: 'Artistic-1.0-Perl OR GPL-1.0-or-later'
MANIFEST 100644 001750 001751 1034 15204207641 14463 0 ustar 00olaf olaf 000000 000000 WWW-RobotRules-6.03
# This file was automatically generated by Dist::Zilla::Plugin::Manifest v6.037
Changes
INSTALL
LICENSE
MANIFEST
META.json
META.yml
Makefile.PL
cpanfile
dist.ini
lib/WWW/RobotRules.pm
lib/WWW/RobotRules/AnyDBM_File.pm
lib/WWW/RobotRules/DB_File.pm
t/00-report-prereqs.dd
t/00-report-prereqs.t
t/misc/dbmrobot
t/rules-dbm.t
t/rules.t
xt/author/00-compile.t
xt/author/eol.t
xt/author/minimum-version.t
xt/author/mojibake.t
xt/author/pod-coverage.t
xt/author/pod-spell.t
xt/author/pod-syntax.t
xt/author/portability.t
xt/author/test-version.t
t 000755 001750 001751 0 15204207641 13437 5 ustar 00olaf olaf 000000 000000 WWW-RobotRules-6.03
rules.t 100644 001750 001751 11350 15204207641 15136 0 ustar 00olaf olaf 000000 000000 WWW-RobotRules-6.03/t
use strict;
use warnings;
use Test::More;
use WWW::RobotRules ();
# We test a number of different /robots.txt files,
#
my $content1 = < 1 => 'http://foo/private' => 1,
2 => 'http://foo/also_private' => 1,
],
[
$content1, 'Wubble' => 3 => 'http://foo/private' => 0,
4 => 'http://foo/also_private' => 0,
5 => 'http://foo/other' => 1,
],
[
$content2, 'MOMspider' => 6 => 'http://foo/private' => 0,
7 => 'http://foo/other' => 1,
],
[
$content2, 'Wubble' => 8 => 'http://foo/private' => 1,
9 => 'http://foo/also_private' => 1,
10 => 'http://foo/other' => 1,
],
[
$content3, 'MOMspider' => 11 => 'http://foo/private' => 1,
12 => 'http://foo/other' => 1,
],
[
$content3, 'Wubble' => 13 => 'http://foo/private' => 1,
14 => 'http://foo/other' => 1,
],
[
$content4, 'MOMspider' => 15 => 'http://foo/private' => 1,
16 => 'http://foo/this' => 0,
17 => 'http://foo/that' => 1,
],
[
$content4, 'Another' => 18 => 'http://foo/private' => 1,
19 => 'http://foo/this' => 1,
20 => 'http://foo/that' => 0,
],
[
$content4, 'Wubble' => 21 => 'http://foo/private' => 0,
22 => 'http://foo/this' => 1,
23 => 'http://foo/that' => 1,
],
[
$content4, 'Another/1.0' => 24 => 'http://foo/private' => 1,
25 => 'http://foo/this' => 1,
26 => 'http://foo/that' => 0,
],
[
$content4, "SvartEnke1" => 27 => "http://foo/" => 0,
28 => "http://foo/this" => 0,
29 => "http://bar/" => 1,
],
[
$content4, "SvartEnke2" => 30 => "http://foo/" => 1,
31 => "http://foo/this" => 1,
32 => "http://bar/" => 1,
],
[
$content4, "MomSpiderJr" => # should match "MomSpider"
33 => 'http://foo/private' => 1,
34 => 'http://foo/also_private' => 1,
35 => 'http://foo/this/' => 0,
],
[
$content4, "SvartEnk" => # should match "*"
36 => "http://foo/" => 1,
37 => "http://foo/private/" => 0,
38 => "http://bar/" => 1,
],
[
$content5, 'Villager/1.0' => 39 => 'http://foo/west-wing/' => 0,
40 => 'http://foo/' => 0,
],
[
$content5, 'Belle/2.0' => 41 => 'http://foo/west-wing/' => 0,
42 => 'http://foo/' => 1,
],
[
$content5, 'Beast/3.0' => 43 => 'http://foo/west-wing/' => 1,
44 => 'http://foo/' => 1,
],
[
$content6, 'Villager/1.0' => 45 => 'http://foo/west-wing/' => 0,
46 => 'http://foo/' => 0,
],
[
$content6, 'Belle/2.0' => 47 => 'http://foo/west-wing/' => 0,
48 => 'http://foo/' => 1,
],
[
$content6, 'Beast/3.0' => 49 => 'http://foo/west-wing/' => 1,
50 => 'http://foo/' => 1,
],
# the test numbers above are labels only; done_testing sets the plan,
# so no count needs updating when tests are added
);
for my $t (@tests1) {
my ($content, $ua) = splice(@$t, 0, 2);
my $robotsrules = WWW::RobotRules->new($ua);
$robotsrules->parse('http://foo/robots.txt', $content);
my ($num, $path, $expected);
while (($num, $path, $expected) = splice(@$t, 0, 3)) {
my $allowed = $robotsrules->allowed($path);
$allowed = 1 if $allowed;
is $allowed, $expected, "$ua => $path" or $robotsrules->dump;
}
}
done_testing;
META.json 100644 001750 001751 56372 15204207641 15012 0 ustar 00olaf olaf 000000 000000 WWW-RobotRules-6.03
{
"abstract" : "database of robots.txt-derived permissions",
"author" : [
"Gisle Aas "
],
"dynamic_config" : 0,
"generated_by" : "Dist::Zilla version 6.037, CPAN::Meta::Converter version 2.150013",
"license" : [
"perl_5"
],
"meta-spec" : {
"url" : "http://search.cpan.org/perldoc?CPAN::Meta::Spec",
"version" : 2
},
"name" : "WWW-RobotRules",
"no_index" : {
"directory" : [
"t",
"xt"
]
},
"prereqs" : {
"configure" : {
"requires" : {
"ExtUtils::MakeMaker" : "0"
},
"suggests" : {
"JSON::PP" : "2.27300"
}
},
"develop" : {
"recommends" : {
"Dist::Zilla::PluginBundle::Git::VersionManager" : "0.007"
},
"requires" : {
"File::Spec" : "0",
"IO::Handle" : "0",
"IPC::Open3" : "0",
"Pod::Coverage::TrustPod" : "0",
"Pod::Spell" : "1.25",
"Test::EOL" : "2.00",
"Test::MinimumVersion" : "0",
"Test::Mojibake" : "0",
"Test::More" : "0.94",
"Test::Pod" : "1.41",
"Test::Pod::Coverage" : "1.08",
"Test::Portability::Files" : "0",
"Test::Spelling" : "0.17",
"Test::Version" : "1"
}
},
"runtime" : {
"requires" : {
"AnyDBM_File" : "0",
"Carp" : "0",
"Fcntl" : "0",
"URI" : "1.10",
"perl" : "5.008001",
"strict" : "0"
},
"suggests" : {
"DB_File" : "0"
}
},
"test" : {
"recommends" : {
"CPAN::Meta" : "2.120900"
},
"requires" : {
"ExtUtils::MakeMaker" : "0",
"File::Spec" : "0",
"Test::More" : "0.96",
"strict" : "0",
"warnings" : "0"
}
}
},
"provides" : {
"WWW::RobotRules" : {
"file" : "lib/WWW/RobotRules.pm",
"version" : "6.03"
},
"WWW::RobotRules::AnyDBM_File" : {
"file" : "lib/WWW/RobotRules/AnyDBM_File.pm",
"version" : "6.03"
},
"WWW::RobotRules::DB_File" : {
"file" : "lib/WWW/RobotRules/DB_File.pm",
"version" : "6.03"
},
"WWW::RobotRules::InCore" : {
"file" : "lib/WWW/RobotRules.pm",
"version" : "6.03"
}
},
"release_status" : "stable",
"resources" : {
"bugtracker" : {
"web" : "https://github.com/libwww-perl/WWW-RobotRules/issues"
},
"homepage" : "https://github.com/libwww-perl/WWW-RobotRules",
"repository" : {
"type" : "git",
"url" : "https://github.com/libwww-perl/WWW-RobotRules.git",
"web" : "https://github.com/libwww-perl/WWW-RobotRules"
},
"x_MailingList" : "mailto:libwww@perl.org"
},
"version" : "6.03",
"x_Dist_Zilla" : {
"perl" : {
"version" : "5.042002"
},
"plugins" : [
{
"class" : "Dist::Zilla::Plugin::Git::GatherDir",
"config" : {
"Dist::Zilla::Plugin::GatherDir" : {
"exclude_filename" : [
".perltidyrc",
"LICENSE",
"META.json",
"Makefile.PL",
"README.md",
"perlimports.toml",
"precious.toml"
],
"exclude_match" : [],
"include_dotfiles" : 0,
"prefix" : "",
"prune_directory" : [],
"root" : "."
},
"Dist::Zilla::Plugin::Git::GatherDir" : {
"include_untracked" : 0
}
},
"name" : "Git::GatherDir",
"version" : "2.052"
},
{
"class" : "Dist::Zilla::Plugin::MetaConfig",
"name" : "MetaConfig",
"version" : "6.037"
},
{
"class" : "Dist::Zilla::Plugin::MetaProvides::Package",
"config" : {
"Dist::Zilla::Plugin::MetaProvides::Package" : {
"finder_objects" : [
{
"class" : "Dist::Zilla::Plugin::FinderCode",
"name" : "MetaProvides::Package/AUTOVIV/:InstallModulesPM",
"version" : "6.037"
}
],
"include_underscores" : 0
},
"Dist::Zilla::Role::MetaProvider::Provider" : {
"$Dist::Zilla::Role::MetaProvider::Provider::VERSION" : "2.002004",
"inherit_missing" : 1,
"inherit_version" : 1,
"meta_noindex" : 1
},
"Dist::Zilla::Role::ModuleMetadata" : {
"Module::Metadata" : "1.000038",
"version" : "0.006"
}
},
"name" : "MetaProvides::Package",
"version" : "2.004003"
},
{
"class" : "Dist::Zilla::Plugin::MetaNoIndex",
"name" : "MetaNoIndex",
"version" : "6.037"
},
{
"class" : "Dist::Zilla::Plugin::MetaYAML",
"name" : "MetaYAML",
"version" : "6.037"
},
{
"class" : "Dist::Zilla::Plugin::MetaJSON",
"name" : "MetaJSON",
"version" : "6.037"
},
{
"class" : "Dist::Zilla::Plugin::MetaResources",
"name" : "MetaResources",
"version" : "6.037"
},
{
"class" : "Dist::Zilla::Plugin::Git::Contributors",
"config" : {
"Dist::Zilla::Plugin::Git::Contributors" : {
"git_version" : "2.43.0",
"include_authors" : 0,
"include_releaser" : 1,
"order_by" : "name",
"paths" : []
}
},
"name" : "Git::Contributors",
"version" : "0.039"
},
{
"class" : "Dist::Zilla::Plugin::GithubMeta",
"name" : "GithubMeta",
"version" : "0.58"
},
{
"class" : "Dist::Zilla::Plugin::Manifest",
"name" : "Manifest",
"version" : "6.037"
},
{
"class" : "Dist::Zilla::Plugin::License",
"name" : "License",
"version" : "6.037"
},
{
"class" : "Dist::Zilla::Plugin::InstallGuide",
"config" : {
"Dist::Zilla::Role::ModuleMetadata" : {
"Module::Metadata" : "1.000038",
"version" : "0.006"
}
},
"name" : "InstallGuide",
"version" : "1.200014"
},
{
"class" : "Dist::Zilla::Plugin::Prereqs::FromCPANfile",
"name" : "Prereqs::FromCPANfile",
"version" : "0.08"
},
{
"class" : "Dist::Zilla::Plugin::MakeMaker",
"config" : {
"Dist::Zilla::Role::TestRunner" : {
"default_jobs" : "8"
}
},
"name" : "MakeMaker",
"version" : "6.037"
},
{
"class" : "Dist::Zilla::Plugin::MojibakeTests",
"name" : "MojibakeTests",
"version" : "0.8"
},
{
"class" : "Dist::Zilla::Plugin::Test::Version",
"name" : "Test::Version",
"version" : "1.09"
},
{
"class" : "Dist::Zilla::Plugin::Test::ReportPrereqs",
"name" : "Test::ReportPrereqs",
"version" : "0.029"
},
{
"class" : "Dist::Zilla::Plugin::Test::Compile",
"config" : {
"Dist::Zilla::Plugin::Test::Compile" : {
"bail_out_on_fail" : "1",
"fail_on_warning" : "author",
"fake_home" : 0,
"filename" : "xt/author/00-compile.t",
"module_finder" : [
":InstallModules"
],
"needs_display" : 0,
"phase" : "develop",
"script_finder" : [
":PerlExecFiles"
],
"skips" : [
"WWW::RobotRules::DB_File"
],
"switch" : []
}
},
"name" : "Test::Compile",
"version" : "2.059"
},
{
"class" : "Dist::Zilla::Plugin::Test::Portability",
"config" : {
"Dist::Zilla::Plugin::Test::Portability" : {
"options" : ""
}
},
"name" : "Test::Portability",
"version" : "2.001003"
},
{
"class" : "Dist::Zilla::Plugin::Test::EOL",
"config" : {
"Dist::Zilla::Plugin::Test::EOL" : {
"filename" : "xt/author/eol.t",
"finder" : [
":ExecFiles",
":InstallModules",
":TestFiles"
],
"trailing_whitespace" : 1
}
},
"name" : "Test::EOL",
"version" : "0.19"
},
{
"class" : "Dist::Zilla::Plugin::Test::MinimumVersion",
"config" : {
"Dist::Zilla::Plugin::Test::MinimumVersion" : {
"max_target_perl" : null
}
},
"name" : "Test::MinimumVersion",
"version" : "2.000011"
},
{
"class" : "Dist::Zilla::Plugin::PodSyntaxTests",
"name" : "PodSyntaxTests",
"version" : "6.037"
},
{
"class" : "Dist::Zilla::Plugin::Test::Pod::Coverage::Configurable",
"name" : "Test::Pod::Coverage::Configurable",
"version" : "0.07"
},
{
"class" : "Dist::Zilla::Plugin::Test::PodSpelling",
"config" : {
"Dist::Zilla::Plugin::Test::PodSpelling" : {
"directories" : [
"bin",
"lib"
],
"spell_cmd" : "",
"stopwords" : [
"Aas",
"AnyDBM",
"Ardo",
"DBM",
"Gisle",
"Hakan",
"Koster",
"Martijn",
"RobotUA",
"cybermapper",
"diskcaching",
"txt"
],
"wordlist" : "Pod::Wordlist"
}
},
"name" : "Test::PodSpelling",
"version" : "2.007006"
},
{
"class" : "Dist::Zilla::Plugin::Git::Check",
"config" : {
"Dist::Zilla::Plugin::Git::Check" : {
"untracked_files" : "die"
},
"Dist::Zilla::Role::Git::DirtyFiles" : {
"allow_dirty" : [],
"allow_dirty_match" : [],
"changelog" : "Changes"
},
"Dist::Zilla::Role::Git::Repo" : {
"git_version" : "2.43.0",
"repo_root" : "."
}
},
"name" : "Git::Check",
"version" : "2.052"
},
{
"class" : "Dist::Zilla::Plugin::CheckStrictVersion",
"name" : "CheckStrictVersion",
"version" : "0.001"
},
{
"class" : "Dist::Zilla::Plugin::RunExtraTests",
"config" : {
"Dist::Zilla::Role::TestRunner" : {
"default_jobs" : "8"
}
},
"name" : "RunExtraTests",
"version" : "0.029"
},
{
"class" : "Dist::Zilla::Plugin::CheckChangeLog",
"name" : "CheckChangeLog",
"version" : "0.05"
},
{
"class" : "Dist::Zilla::Plugin::CheckChangesHasContent",
"name" : "CheckChangesHasContent",
"version" : "0.011"
},
{
"class" : "Dist::Zilla::Plugin::TestRelease",
"name" : "TestRelease",
"version" : "6.037"
},
{
"class" : "Dist::Zilla::Plugin::UploadToCPAN",
"name" : "UploadToCPAN",
"version" : "6.037"
},
{
"class" : "Dist::Zilla::Plugin::ReadmeAnyFromPod",
"config" : {
"Dist::Zilla::Role::FileWatcher" : {
"version" : "0.006"
}
},
"name" : "Markdown_Readme",
"version" : "0.163250"
},
{
"class" : "Dist::Zilla::Plugin::CopyFilesFromRelease",
"config" : {
"Dist::Zilla::Plugin::CopyFilesFromRelease" : {
"filename" : [
"LICENSE",
"META.json"
],
"match" : []
}
},
"name" : "CopyFilesFromRelease",
"version" : "0.007"
},
{
"class" : "Dist::Zilla::Plugin::Prereqs",
"config" : {
"Dist::Zilla::Plugin::Prereqs" : {
"phase" : "develop",
"type" : "recommends"
}
},
"name" : "@Git::VersionManager/pluginbundle version",
"version" : "6.037"
},
{
"class" : "Dist::Zilla::Plugin::RewriteVersion::Transitional",
"config" : {
"Dist::Zilla::Plugin::RewriteVersion" : {
"add_tarball_name" : 0,
"finders" : [
":ExecFiles",
":InstallModules"
],
"global" : 0,
"skip_version_provider" : 0
},
"Dist::Zilla::Plugin::RewriteVersion::Transitional" : {}
},
"name" : "@Git::VersionManager/RewriteVersion::Transitional",
"version" : "0.009"
},
{
"class" : "Dist::Zilla::Plugin::MetaProvides::Update",
"name" : "@Git::VersionManager/MetaProvides::Update",
"version" : "0.007"
},
{
"class" : "Dist::Zilla::Plugin::CopyFilesFromRelease",
"config" : {
"Dist::Zilla::Plugin::CopyFilesFromRelease" : {
"filename" : [
"Changes"
],
"match" : []
}
},
"name" : "@Git::VersionManager/CopyFilesFromRelease",
"version" : "0.007"
},
{
"class" : "Dist::Zilla::Plugin::Git::Commit",
"config" : {
"Dist::Zilla::Plugin::Git::Commit" : {
"add_files_in" : [],
"commit_msg" : "v%V%n%n%c",
"signoff" : 0
},
"Dist::Zilla::Role::Git::DirtyFiles" : {
"allow_dirty" : [
"Changes",
"LICENSE",
"META.json",
"README.md"
],
"allow_dirty_match" : [],
"changelog" : "Changes"
},
"Dist::Zilla::Role::Git::Repo" : {
"git_version" : "2.43.0",
"repo_root" : "."
},
"Dist::Zilla::Role::Git::StringFormatter" : {
"time_zone" : "local"
}
},
"name" : "@Git::VersionManager/release snapshot",
"version" : "2.052"
},
{
"class" : "Dist::Zilla::Plugin::Git::Tag",
"config" : {
"Dist::Zilla::Plugin::Git::Tag" : {
"branch" : null,
"changelog" : "Changes",
"signed" : 0,
"tag" : "v6.03",
"tag_format" : "v%V",
"tag_message" : "v%V"
},
"Dist::Zilla::Role::Git::Repo" : {
"git_version" : "2.43.0",
"repo_root" : "."
},
"Dist::Zilla::Role::Git::StringFormatter" : {
"time_zone" : "local"
}
},
"name" : "@Git::VersionManager/Git::Tag",
"version" : "2.052"
},
{
"class" : "Dist::Zilla::Plugin::BumpVersionAfterRelease::Transitional",
"config" : {
"Dist::Zilla::Plugin::BumpVersionAfterRelease" : {
"finders" : [
":ExecFiles",
":InstallModules"
],
"global" : 0,
"munge_makefile_pl" : 1
},
"Dist::Zilla::Plugin::BumpVersionAfterRelease::Transitional" : {}
},
"name" : "@Git::VersionManager/BumpVersionAfterRelease::Transitional",
"version" : "0.009"
},
{
"class" : "Dist::Zilla::Plugin::NextRelease",
"name" : "@Git::VersionManager/NextRelease",
"version" : "6.037"
},
{
"class" : "Dist::Zilla::Plugin::Git::Commit",
"config" : {
"Dist::Zilla::Plugin::Git::Commit" : {
"add_files_in" : [],
"commit_msg" : "increment $VERSION after %v release",
"signoff" : 0
},
"Dist::Zilla::Role::Git::DirtyFiles" : {
"allow_dirty" : [
"Build.PL",
"Changes",
"Makefile.PL"
],
"allow_dirty_match" : [
"(?^:^lib/.*\\.pm$)"
],
"changelog" : "Changes"
},
"Dist::Zilla::Role::Git::Repo" : {
"git_version" : "2.43.0",
"repo_root" : "."
},
"Dist::Zilla::Role::Git::StringFormatter" : {
"time_zone" : "local"
}
},
"name" : "@Git::VersionManager/post-release commit",
"version" : "2.052"
},
{
"class" : "Dist::Zilla::Plugin::Git::Push",
"config" : {
"Dist::Zilla::Plugin::Git::Push" : {
"push_to" : [
"origin"
],
"remotes_must_exist" : 1
},
"Dist::Zilla::Role::Git::Repo" : {
"git_version" : "2.43.0",
"repo_root" : "."
}
},
"name" : "Git::Push",
"version" : "2.052"
},
{
"class" : "Dist::Zilla::Plugin::ConfirmRelease",
"name" : "ConfirmRelease",
"version" : "6.037"
},
{
"class" : "Dist::Zilla::Plugin::FinderCode",
"name" : ":InstallModules",
"version" : "6.037"
},
{
"class" : "Dist::Zilla::Plugin::FinderCode",
"name" : ":IncModules",
"version" : "6.037"
},
{
"class" : "Dist::Zilla::Plugin::FinderCode",
"name" : ":TestFiles",
"version" : "6.037"
},
{
"class" : "Dist::Zilla::Plugin::FinderCode",
"name" : ":ExtraTestFiles",
"version" : "6.037"
},
{
"class" : "Dist::Zilla::Plugin::FinderCode",
"name" : ":ExecFiles",
"version" : "6.037"
},
{
"class" : "Dist::Zilla::Plugin::FinderCode",
"name" : ":PerlExecFiles",
"version" : "6.037"
},
{
"class" : "Dist::Zilla::Plugin::FinderCode",
"name" : ":ShareFiles",
"version" : "6.037"
},
{
"class" : "Dist::Zilla::Plugin::FinderCode",
"name" : ":MainModule",
"version" : "6.037"
},
{
"class" : "Dist::Zilla::Plugin::FinderCode",
"name" : ":AllFiles",
"version" : "6.037"
},
{
"class" : "Dist::Zilla::Plugin::FinderCode",
"name" : ":NoFiles",
"version" : "6.037"
},
{
"class" : "Dist::Zilla::Plugin::FinderCode",
"name" : "MetaProvides::Package/AUTOVIV/:InstallModulesPM",
"version" : "6.037"
}
],
"zilla" : {
"class" : "Dist::Zilla::Dist::Builder",
"config" : {
"is_trial" : 0
},
"version" : "6.037"
}
},
"x_contributors" : [
"Adam Kennedy ",
"Adam Sjogren ",
"Alexey Tourbin ",
"Alex Kapranoff ",
"amire80 ",
"Andreas J. Koenig ",
"Anton Yuzhaninov ",
"Bill Mann ",
"Bron Gondwana ",
"Daniel Hedlund ",
"David E. Wheeler ",
"DAVIDRW ",
"David Steinbrunner ",
"dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>",
"Father Chrysostomos ",
"FWILES ",
"Gavin Peters ",
"Graeme Thompson ",
"Graham Knop ",
"Hans-H. Froehlich ",
"Ian Kilgore ",
"Jacob J ",
"jefflee ",
"john9art ",
"Mark Stosberg ",
"Mike Schilli ",
"mschilli ",
"murphy ",
"Olaf Alders ",
"Ondrej Hanak ",
"Peter Rabbitson ",
"phrstbrn ",
"Robert Stone ",
"Rolf Grossmann ",
"ruff ",
"sasao ",
"Sean M. Burke ",
"Slaven Rezic ",
"Spiros Denaxas ",
"Steve Hay ",
"Todd Lipcon ",
"Tom Hukins ",
"Tony Finch ",
"Toru Yamaguchi ",
"uid39246 ",
"Ville Skytt\u00e4 ",
"Yuri Karaban ",
"Zefram "
],
"x_generated_by_perl" : "v5.42.2",
"x_serialization_backend" : "Cpanel::JSON::XS version 4.40",
"x_spdx_expression" : "Artistic-1.0-Perl OR GPL-1.0-or-later"
}
Makefile.PL 100644 001750 001751 2534 15204207641 15312 0 ustar 00olaf olaf 000000 000000 WWW-RobotRules-6.03
# This file was automatically generated by Dist::Zilla::Plugin::MakeMaker v6.037
use strict;
use warnings;
use 5.008001;
use ExtUtils::MakeMaker;
my %WriteMakefileArgs = (
"ABSTRACT" => "database of robots.txt-derived permissions",
"AUTHOR" => "Gisle Aas ",
"CONFIGURE_REQUIRES" => {
"ExtUtils::MakeMaker" => 0
},
"DISTNAME" => "WWW-RobotRules",
"LICENSE" => "perl",
"MIN_PERL_VERSION" => "5.008001",
"NAME" => "WWW::RobotRules",
"PREREQ_PM" => {
"AnyDBM_File" => 0,
"Carp" => 0,
"Fcntl" => 0,
"URI" => "1.10",
"strict" => 0
},
"TEST_REQUIRES" => {
"ExtUtils::MakeMaker" => 0,
"File::Spec" => 0,
"Test::More" => "0.96",
"strict" => 0,
"warnings" => 0
},
"VERSION" => "6.03",
"test" => {
"TESTS" => "t/*.t"
}
);
my %FallbackPrereqs = (
"AnyDBM_File" => 0,
"Carp" => 0,
"ExtUtils::MakeMaker" => 0,
"Fcntl" => 0,
"File::Spec" => 0,
"Test::More" => "0.96",
"URI" => "1.10",
"strict" => 0,
"warnings" => 0
);
unless ( eval { ExtUtils::MakeMaker->VERSION(6.63_03) } ) {
delete $WriteMakefileArgs{TEST_REQUIRES};
delete $WriteMakefileArgs{BUILD_REQUIRES};
$WriteMakefileArgs{PREREQ_PM} = \%FallbackPrereqs;
}
delete $WriteMakefileArgs{CONFIGURE_REQUIRES}
unless eval { ExtUtils::MakeMaker->VERSION(6.52) };
WriteMakefile(%WriteMakefileArgs);
rules-dbm.t 100644 001750 001751 5536 15204207641 15667 0 ustar 00olaf olaf 000000 000000 WWW-RobotRules-6.03/t
use strict;
use warnings;
use Test::More;
use File::Temp qw( tempdir );
use WWW::RobotRules::AnyDBM_File ();
my $dir = tempdir(CLEANUP => 1);
my $file = "$dir/robotdb";
my $r = WWW::RobotRules::AnyDBM_File->new("myrobot/2.0", $file);
# Cache backing file(s) must have no group/world permission bits.
if ($^O ne 'MSWin32') {
my @backing = glob "$file*";
ok scalar @backing, "DBM backing file(s) exist after construction";
for my $f (@backing) {
my $mode = (stat $f)[2] & 07777;
is($mode & 0077,
0,
"$f mode " . sprintf("%04o", $mode) . " has no group/world bits");
}
}
$r->parse("http://www.aas.no/robots.txt", "");
$r->visit("www.aas.no:80");
is $r->no_visits("www.aas.no:80"), 1;
$r->push_rules("www.sn.no:80", "/aas", "/per");
$r->push_rules("www.sn.no:80", "/god", "/old");
my @r = $r->rules("www.sn.no:80");
is "@r", "/aas /per /god /old";
$r->clear_rules("per");
$r->clear_rules("www.sn.no:80");
@r = $r->rules("www.sn.no:80");
is "@r", "";
$r->visit("www.aas.no:80", time + 10);
$r->visit("www.sn.no:80");
note "No visits: " . $r->no_visits("www.aas.no:80");
note "Last visit: " . $r->last_visit("www.aas.no:80");
note "Fresh until: " . $r->fresh_until("www.aas.no:80");
is $r->no_visits("www.aas.no:80"), 2;
cmp_ok abs($r->last_visit("www.sn.no:80") - time), '<=', 2;
$r = undef;
# Try to reopen the database without a name specified
$r = WWW::RobotRules::AnyDBM_File->new(undef, $file);
$r->visit("www.aas.no:80");
is $r->no_visits("www.aas.no:80"), 3;
note "Agent-Name: ", $r->agent;
is $r->agent, 'myrobot';
$r = undef;
note "*** Dump of database ***";
tie(my %cat, 'AnyDBM_File', $file, 0, 0644) or die "Can't tie: $!";
while (my ($key, $val) = each(%cat)) {
note "$key\t$val";
}
note "******";
untie %cat;
# Try to open database with a different agent name
$r = WWW::RobotRules::AnyDBM_File->new("MOMSpider/2.0", $file);
is $r->no_visits("www.sn.no:80"), 0;
# Try parsing
$r->parse("http://www.sn.no:8080/robots.txt", <rules("www.sn.no:8080");
is "@r", "/foo /bar";
cmp_ok $r->allowed("http://www.sn.no"), '<', 0;
ok !$r->allowed("http://www.sn.no:8080/foo/gisle");
sleep(2); # wait until file has expired
cmp_ok $r->allowed("http://www.sn.no:8080/foo/gisle"), '<', 0;
$r = undef;
note "*** Dump of database ***";
tie(%cat, 'AnyDBM_File', $file, 0, 0644) or die "Can't tie: $!";
while (my ($key, $val) = each(%cat)) {
note "$key\t$val";
}
note "******";
untie %cat; # Otherwise the next line fails on DOSish
while (unlink("$file", "$file.pag", "$file.dir", "$file.db")) { }
# Try to open an empty database without specifying a name
eval { $r = WWW::RobotRules::AnyDBM_File->new(undef, $file); };
isnt $@, "";
unlink "$file", "$file.pag", "$file.dir", "$file.db";
done_testing;
misc 000755 001750 001751 0 15204207641 14372 5 ustar 00olaf olaf 000000 000000 WWW-RobotRules-6.03/t
dbmrobot 100755 001750 001751 1145 15204207641 16271 0 ustar 00olaf olaf 000000 000000 WWW-RobotRules-6.03/t/misc
#!/local/perl/bin/perl -w
use strict;
use warnings;
use URI::URL qw( url );
my $url = url(shift) || die "Usage: $0 <url>\n";
use WWW::RobotRules::AnyDBM_File ();
use LWP::RobotUA ();
my $botname = "Spider/0.1";
my $rules = WWW::RobotRules::AnyDBM_File->new($botname, 'robotdb');
my $ua = LWP::RobotUA->new($botname, 'gisle@aas.no', $rules);
$ua->delay(0.1);
my $req = HTTP::Request->new(GET => $url);
my $res = $ua->request($req);
print "Got ", $res->code, " ", $res->message, "(", $res->content_type, ")\n";
my $netloc = $url->netloc;
print "This was visit no ", $ua->no_visits($netloc), " to $netloc\n";
author 000755 001750 001751 0 15204207641 15131 5 ustar 00olaf olaf 000000 000000 WWW-RobotRules-6.03/xt
eol.t 100644 001750 001751 704 15204207641 16216 0 ustar 00olaf olaf 000000 000000 WWW-RobotRules-6.03/xt/author
use strict;
use warnings;
# this test was generated with Dist::Zilla::Plugin::Test::EOL 0.19
use Test::More 0.88;
use Test::EOL;
my @files = (
'lib/WWW/RobotRules.pm',
'lib/WWW/RobotRules/AnyDBM_File.pm',
'lib/WWW/RobotRules/DB_File.pm',
't/00-report-prereqs.dd',
't/00-report-prereqs.t',
't/misc/dbmrobot',
't/rules-dbm.t',
't/rules.t'
);
eol_unix_ok($_, { trailing_whitespace => 1 }) foreach @files;
done_testing;
mojibake.t 100644 001750 001751 151 15204207641 17214 0 ustar 00olaf olaf 000000 000000 WWW-RobotRules-6.03/xt/author
#!perl
use strict;
use warnings qw(all);
use Test::More;
use Test::Mojibake;
all_files_encoding_ok();
WWW 000755 001750 001751 0 15204207641 14426 5 ustar 00olaf olaf 000000 000000 WWW-RobotRules-6.03/lib
RobotRules.pm 100644 001750 001751 27306 15204207641 17254 0 ustar 00olaf olaf 000000 000000 WWW-RobotRules-6.03/lib/WWW
package WWW::RobotRules;
use strict;
our $VERSION = '6.03';
sub Version { $VERSION; }
use URI ();
sub new {
my ($class, $ua) = @_;
# This ugly hack is needed to ensure backwards compatibility.
# The "WWW::RobotRules" class is now really abstract.
$class = "WWW::RobotRules::InCore" if $class eq "WWW::RobotRules";
my $self = bless {}, $class;
$self->agent($ua);
$self;
}
sub parse {
my ($self, $robot_txt_uri, $txt, $fresh_until) = @_;
$robot_txt_uri = URI->new("$robot_txt_uri");
my $netloc = $robot_txt_uri->host . ":" . $robot_txt_uri->port;
$self->clear_rules($netloc);
$self->fresh_until($netloc, $fresh_until || (time + 365 * 24 * 3600));
my $ua;
my $is_me = 0; # 1 iff this record is for me
my $is_anon = 0; # 1 iff this record is for *
my $seen_disallow = 0; # watch for missing record separators
my @me_disallowed = (); # rules disallowed for me
my @anon_disallowed = (); # rules disallowed for *
# blank lines are significant, so turn CRLF into LF to avoid generating
# false ones
$txt =~ s/\015\012/\012/g;
# split at \012 (LF) or \015 (CR) (Mac text files have just CR for EOL)
for (split(/[\012\015]/, $txt)) {
# Lines containing only a comment are discarded completely, and
# therefore do not indicate a record boundary.
next if /^\s*\#/;
s/\s*\#.*//; # remove comments at end-of-line
if (/^\s*$/) { # blank line
last if $is_me; # That was our record. No need to read the rest.
$is_anon = 0;
$seen_disallow = 0;
}
elsif (/^\s*User-Agent\s*:\s*(.*)/i) {
$ua = $1;
$ua =~ s/\s+$//;
if ($seen_disallow) {
# treat as start of a new record
$seen_disallow = 0;
last if $is_me; # That was our record. No need to read the rest.
$is_anon = 0;
}
if ($is_me) {
# This record already had a User-agent that
# we matched, so just continue.
}
elsif ($ua eq '*') {
$is_anon = 1;
}
elsif ($self->is_me($ua)) {
$is_me = 1;
}
}
elsif (/^\s*Disallow\s*:\s*(.*)/i) {
unless (defined $ua) {
warn
"RobotRules <$robot_txt_uri>: Disallow without preceding User-agent\n"
if $^W;
$is_anon = 1; # assume that User-agent: * was intended
}
my $disallow = $1;
$disallow =~ s/\s+$//;
$seen_disallow = 1;
if (length $disallow) {
my $ignore;
eval {
my $u = URI->new_abs($disallow, $robot_txt_uri);
$ignore++ if $u->scheme ne $robot_txt_uri->scheme;
$ignore++ if lc($u->host) ne lc($robot_txt_uri->host);
$ignore++ if $u->port ne $robot_txt_uri->port;
$disallow = $u->path_query;
$disallow = "/" unless length $disallow;
};
next if $@;
next if $ignore;
}
if ($is_me) {
push(@me_disallowed, $disallow);
}
elsif ($is_anon) {
push(@anon_disallowed, $disallow);
}
}
elsif (/\S\s*:/) {
# ignore
}
else {
warn "RobotRules <$robot_txt_uri>: Malformed record: <$_>\n" if $^W;
}
}
if ($is_me) {
$self->push_rules($netloc, @me_disallowed);
}
else {
$self->push_rules($netloc, @anon_disallowed);
}
}
#
# Returns TRUE if the given name matches the
# name of this robot
#
sub is_me {
my ($self, $ua_line) = @_;
my $me = $self->agent;
# See whether the name from the "User-Agent: ..." line that we were
# passed is a substring of my short-name (compared case-insensitively):
if (index(lc($me), lc($ua_line)) >= 0) {
return 1;
}
else {
return '';
}
}
sub allowed {
my ($self, $uri) = @_;
$uri = URI->new("$uri");
return 1 unless $uri->scheme eq 'http' or $uri->scheme eq 'https';
# Robots.txt applies to only those schemes.
my $netloc = $uri->host . ":" . $uri->port;
my $fresh_until = $self->fresh_until($netloc);
return -1 if !defined($fresh_until) || $fresh_until < time;
my $str = $uri->path_query;
my $rule;
for $rule ($self->rules($netloc)) {
return 1 unless length $rule;
return 0 if index($str, $rule) == 0;
}
return 1;
}
# The following methods must be provided by the subclass.
sub agent;
sub visit;
sub no_visits;
sub last_visit;
sub fresh_until;
sub push_rules;
sub clear_rules;
sub rules;
sub dump;
package WWW::RobotRules::InCore;
our @ISA = qw(WWW::RobotRules);
sub agent {
my ($self, $name) = @_;
my $old = $self->{'ua'};
if ($name) {
# Strip it so that it's just the short name.
# I.e., "FooBot" => "FooBot"
# "FooBot/1.2" => "FooBot"
# "FooBot/1.2 [http://foobot.int; foo@bot.int]" => "FooBot"
$name = $1 if $name =~ m/(\S+)/; # get first word
$name =~ s!/.*!!; # get rid of version
unless ($old && $old eq $name) {
delete $self->{'loc'}; # all old info is now stale
$self->{'ua'} = $name;
}
}
$old;
}
sub visit {
my ($self, $netloc, $time) = @_;
return unless $netloc;
$time ||= time;
$self->{'loc'}{$netloc}{'last'} = $time;
my $count = \$self->{'loc'}{$netloc}{'count'};
if (!defined $$count) {
$$count = 1;
}
else {
$$count++;
}
}
sub no_visits {
my ($self, $netloc) = @_;
$self->{'loc'}{$netloc}{'count'};
}
sub last_visit {
my ($self, $netloc) = @_;
$self->{'loc'}{$netloc}{'last'};
}
sub fresh_until {
my ($self, $netloc, $fresh_until) = @_;
my $old = $self->{'loc'}{$netloc}{'fresh'};
if (defined $fresh_until) {
$self->{'loc'}{$netloc}{'fresh'} = $fresh_until;
}
$old;
}
sub push_rules {
my ($self, $netloc, @rules) = @_;
push(@{$self->{'loc'}{$netloc}{'rules'}}, @rules);
}
sub clear_rules {
my ($self, $netloc) = @_;
delete $self->{'loc'}{$netloc}{'rules'};
}
sub rules {
my ($self, $netloc) = @_;
if (defined $self->{'loc'}{$netloc}{'rules'}) {
return @{$self->{'loc'}{$netloc}{'rules'}};
}
else {
return ();
}
}
sub dump {
my $self = shift;
for (keys %$self) {
next if $_ eq 'loc';
print "$_ = $self->{$_}\n";
}
for (keys %{$self->{'loc'}}) {
my @rules = $self->rules($_);
print "$_: ", join("; ", @rules), "\n";
}
}
1;
__END__
# Bender: "Well, I don't have anything else
# planned for today. Let's get drunk!"
=head1 NAME
WWW::RobotRules - database of robots.txt-derived permissions
=head1 SYNOPSIS
use WWW::RobotRules;
my $rules = WWW::RobotRules->new('MOMspider/1.0');
use LWP::Simple qw(get);
{
my $url = "http://some.place/robots.txt";
my $robots_txt = get $url;
$rules->parse($url, $robots_txt) if defined $robots_txt;
}
{
my $url = "http://some.other.place/robots.txt";
my $robots_txt = get $url;
$rules->parse($url, $robots_txt) if defined $robots_txt;
}
# Now we can check if a URL is valid for those servers
# whose "robots.txt" files we've gotten and parsed:
if($rules->allowed($url)) {
$c = get $url;
...
}
=head1 DESCRIPTION
This module parses F</robots.txt> files as specified in the Robots
Exclusion Standard.
Webmasters can use the F</robots.txt> file to forbid conforming
robots from accessing parts of their web site.
The parsed files are kept in a C<WWW::RobotRules> object, and this object
provides methods to check if access to a given URL is prohibited. The
same C<WWW::RobotRules> object can be used for one or more parsed
F</robots.txt> files on any number of hosts.
The following methods are provided:
=over 4
=item $rules = WWW::RobotRules->new($robot_name)
This is the constructor for WWW::RobotRules objects. The first
argument given to new() is the name of the robot.
=item $rules->parse($robot_txt_url, $content, $fresh_until)
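The parse() method takes as arguments the URL that was used to
retrieve the F</robots.txt> file, and the contents of the file. The
optional $fresh_until argument is the time (in seconds since the epoch)
until which the parsed rules are to be considered fresh.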
=item $rules->allowed($uri)
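Returns 1 if this robot is allowed to retrieve this URL and 0 if it is
disallowed. Returns -1 if no fresh rules are cached for the host, that
is, if its F</robots.txt> file has not been parsed yet or the parsed
rules have expired. Note that -1 is a true value in Perl, so test for it
explicitly. URLs with a scheme other than http or https are always
allowed.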
=item $rules->agent([$name])
Get/set the agent name. NOTE: Changing the agent name will clear the
F</robots.txt> rules and expire times out of the cache.
=back
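The name given to new() or agent() is reduced to a short name before it is compared with the User-Agent lines of a record. The following is a minimal sketch of that normalization, mirroring the in-core implementation in this file; C<short_agent_name> is a hypothetical helper used only for illustration and is not part of the module's API.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::RobotRules): mirrors the
# short-name normalization done by WWW::RobotRules::InCore::agent.
sub short_agent_name {
    my ($name) = @_;
    $name = $1 if $name =~ m/(\S+)/;    # keep only the first word
    $name =~ s!/.*!!;                   # drop "/version" and what follows
    return $name;
}

# The three inputs below are the ones listed in the source comments.
print short_agent_name($_), "\n"
    for 'FooBot', 'FooBot/1.2', 'FooBot/1.2 [http://foobot.int; foo@bot.int]';
```

All three inputs reduce to the same short name, which is why changing only the version part of the agent string does not invalidate the cached rules.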
=head1 ROBOTS.TXT
The format and semantics of the "/robots.txt" file are as follows:
The file consists of one or more records separated by one or more
blank lines. Each record contains lines of the form
<field>:<optionalspace><value><optionalspace>
The field name is case insensitive. Text after the '#' character on a
line is ignored during parsing. This is used for comments. The
following can be used:
=over 3
=item User-Agent
The value of this field is the name of the robot the record is
describing access policy for. If more than one I<User-Agent> field is
present the record describes an identical access policy for more than
one robot. At least one field needs to be present per record. If the
value is '*', the record describes the default access policy for any
robot that has not matched any of the other records.
The I<User-Agent> fields must occur before the I<Disallow> fields. If a
record contains a I<User-Agent> field after a I<Disallow> field, that
constitutes a malformed record. This parser will assume that a blank
line should have been placed before that I<User-Agent> field, and will
break the record into two. All the fields before the I<User-Agent> field
will constitute a record, and the I<User-Agent> field will be the first
field in a new record.
=item Disallow
The value of this field specifies a partial URL that is not to be
visited. This can be a full path, or a partial path; any URL that
starts with this value will not be retrieved.
=back
Unrecognized records are ignored.
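The prefix semantics of I<Disallow> can be illustrated with a small stand-alone sketch. It mirrors the rule loop in allowed(), but C<path_allowed> is a hypothetical helper written for this example and takes the rules as a plain list instead of looking them up per host.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the rule loop in allowed().
# Returns 1 if $path_query may be fetched, 0 if a rule forbids it.
sub path_allowed {
    my ($path_query, @rules) = @_;
    for my $rule (@rules) {
        return 1 unless length $rule;                 # empty Disallow: allow all
        return 0 if index($path_query, $rule) == 0;   # rule is a prefix of the path
    }
    return 1;                                         # no rule matched
}

my @rules = ('/cyberworld/map/', '/tmp/');
for my $path ('/tmp/scratch.html', '/index.html', '/template/') {
    printf "%-20s %s\n", $path,
        path_allowed($path, @rules) ? 'allowed' : 'disallowed';
}
```

The comparison is a plain string prefix test, so "/tmp/" blocks "/tmp/scratch.html" but not "/template/". No wildcard or pattern matching is involved.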
=head1 ROBOTS.TXT EXAMPLES
The following example "/robots.txt" file specifies that no robots
should visit any URL starting with "/cyberworld/map/" or "/tmp/":
User-agent: *
Disallow: /cyberworld/map/ # This is an infinite virtual URL space
Disallow: /tmp/ # these will soon disappear
This example "/robots.txt" file specifies that no robots should visit
any URL starting with "/cyberworld/map/", except the robot called
"cybermapper":
User-agent: *
Disallow: /cyberworld/map/ # This is an infinite virtual URL space
# Cybermapper knows where to go.
User-agent: cybermapper
Disallow:
This example indicates that no robots should visit this site further:
# go away
User-agent: *
Disallow: /
This is an example of a malformed robots.txt file.
# robots.txt for ancientcastle.example.com
# I've locked myself away.
User-agent: *
Disallow: /
# The castle is your home now, so you can go anywhere you like.
User-agent: Belle
Disallow: /west-wing/ # except the west wing!
# It's good to be the Prince...
User-agent: Beast
Disallow:
This file is missing the required blank lines between records.
However, the intention is clear.
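The recovery rule for such files can be sketched as follows. This is a simplified illustration, not the module's parser: C<split_records> is a hypothetical helper that only groups lines into records, and it assumes that comment-only lines are skipped so that they never act as record boundaries.

```perl
use strict;
use warnings;

# Hypothetical helper: split robots.txt text into records, starting a new
# record whenever a User-agent line follows a Disallow line.
sub split_records {
    my ($text) = @_;
    my @records;
    my $current;              # record being built, undef between records
    my $seen_disallow = 0;
    for my $line (split /\r\n|\r|\n/, $text) {
        next if $line =~ /^\s*\#/;      # comment-only line: not a boundary
        $line =~ s/\s*\#.*//;           # strip a trailing comment
        if ($line =~ /^\s*$/) {         # blank line closes the record
            undef $current;
            $seen_disallow = 0;
        }
        elsif ($line =~ /^\s*User-Agent\s*:\s*(.*?)\s*$/i) {
            if (!$current || $seen_disallow) {    # implicit record break
                $current = { agents => [], disallow => [] };
                push @records, $current;
                $seen_disallow = 0;
            }
            push @{ $current->{agents} }, $1;
        }
        elsif ($line =~ /^\s*Disallow\s*:\s*(.*?)\s*$/i) {
            unless ($current) {         # no User-agent yet: assume "*"
                $current = { agents => ['*'], disallow => [] };
                push @records, $current;
            }
            push @{ $current->{disallow} }, $1;
            $seen_disallow = 1;
        }
    }
    return @records;
}

my $robots_txt = <<'END';
User-agent: *
Disallow: /
# The castle is your home now, so you can go anywhere you like.
User-agent: Belle
Disallow: /west-wing/ # except the west wing!
User-agent: Beast
Disallow:
END

for my $rec (split_records($robots_txt)) {
    printf "%s => [%s]\n", join(',', @{ $rec->{agents} }),
        join(' ', @{ $rec->{disallow} });
}
```

With the castle example as input, the sketch yields three records (for "*", "Belle" and "Beast") even though no blank line separates them.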
=head1 SEE ALSO
L<LWP::RobotUA>, L<WWW::RobotRules::AnyDBM_File>
=head1 COPYRIGHT
Copyright 1995-2009, Gisle Aas
Copyright 1995, Martijn Koster
This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.
00-report-prereqs.t 100644 001750 001751 13601 15204207641 17214 0 ustar 00olaf olaf 000000 000000 WWW-RobotRules-6.03/t #!perl
use strict;
use warnings;
# This test was generated by Dist::Zilla::Plugin::Test::ReportPrereqs 0.029
use Test::More tests => 1;
use ExtUtils::MakeMaker;
use File::Spec;
# from $version::LAX
my $lax_version_re =
qr/(?: undef | (?: (?:[0-9]+) (?: \. | (?:\.[0-9]+) (?:_[0-9]+)? )?
|
(?:\.[0-9]+) (?:_[0-9]+)?
) | (?:
v (?:[0-9]+) (?: (?:\.[0-9]+)+ (?:_[0-9]+)? )?
|
(?:[0-9]+)? (?:\.[0-9]+){2,} (?:_[0-9]+)?
)
)/x;
# hide optional CPAN::Meta modules from prereq scanner
# and check if they are available
my $cpan_meta = "CPAN::Meta";
my $cpan_meta_pre = "CPAN::Meta::Prereqs";
my $HAS_CPAN_META = eval "require $cpan_meta; $cpan_meta->VERSION('2.120900')" && eval "require $cpan_meta_pre"; ## no critic
# Verify requirements?
my $DO_VERIFY_PREREQS = 1;
sub _max {
my $max = shift;
$max = ( $_ > $max ) ? $_ : $max for @_;
return $max;
}
sub _merge_prereqs {
my ($collector, $prereqs) = @_;
# CPAN::Meta::Prereqs object
if (ref $collector eq $cpan_meta_pre) {
return $collector->with_merged_prereqs(
CPAN::Meta::Prereqs->new( $prereqs )
);
}
# Raw hashrefs
for my $phase ( keys %$prereqs ) {
for my $type ( keys %{ $prereqs->{$phase} } ) {
for my $module ( keys %{ $prereqs->{$phase}{$type} } ) {
$collector->{$phase}{$type}{$module} = $prereqs->{$phase}{$type}{$module};
}
}
}
return $collector;
}
my @include = qw(
);
my @exclude = qw(
);
# Add static prereqs to the included modules list
my $static_prereqs = do './t/00-report-prereqs.dd';
# Merge all prereqs (either with ::Prereqs or a hashref)
my $full_prereqs = _merge_prereqs(
( $HAS_CPAN_META ? $cpan_meta_pre->new : {} ),
$static_prereqs
);
# Add dynamic prereqs to the included modules list (if we can)
my ($source) = grep { -f } 'MYMETA.json', 'MYMETA.yml';
my $cpan_meta_error;
if ( $source && $HAS_CPAN_META
&& (my $meta = eval { CPAN::Meta->load_file($source) } )
) {
$full_prereqs = _merge_prereqs($full_prereqs, $meta->prereqs);
}
else {
$cpan_meta_error = $@; # capture error from CPAN::Meta->load_file($source)
$source = 'static metadata';
}
my @full_reports;
my @dep_errors;
my $req_hash = $HAS_CPAN_META ? $full_prereqs->as_string_hash : $full_prereqs;
# Add static includes into a fake section
for my $mod (@include) {
$req_hash->{other}{modules}{$mod} = 0;
}
for my $phase ( qw(configure build test runtime develop other) ) {
next unless $req_hash->{$phase};
next if ($phase eq 'develop' and not $ENV{AUTHOR_TESTING});
for my $type ( qw(requires recommends suggests conflicts modules) ) {
next unless $req_hash->{$phase}{$type};
my $title = ucfirst($phase).' '.ucfirst($type);
my @reports = [qw/Module Want Have/];
for my $mod ( sort keys %{ $req_hash->{$phase}{$type} } ) {
next if grep { $_ eq $mod } @exclude;
my $want = $req_hash->{$phase}{$type}{$mod};
$want = "undef" unless defined $want;
$want = "any" if !$want && $want == 0;
if ($mod eq 'perl') {
push @reports, ['perl', $want, $]];
next;
}
my $req_string = $want eq 'any' ? 'any version required' : "version '$want' required";
my $file = $mod;
$file =~ s{::}{/}g;
$file .= ".pm";
my ($prefix) = grep { -e File::Spec->catfile($_, $file) } @INC;
if ($prefix) {
my $have = MM->parse_version( File::Spec->catfile($prefix, $file) );
$have = "undef" unless defined $have;
push @reports, [$mod, $want, $have];
if ( $DO_VERIFY_PREREQS && $HAS_CPAN_META && $type eq 'requires' ) {
if ( $have !~ /\A$lax_version_re\z/ ) {
push @dep_errors, "$mod version '$have' cannot be parsed ($req_string)";
}
elsif ( ! $full_prereqs->requirements_for( $phase, $type )->accepts_module( $mod => $have ) ) {
push @dep_errors, "$mod version '$have' is not in required range '$want'";
}
}
}
else {
push @reports, [$mod, $want, "missing"];
if ( $DO_VERIFY_PREREQS && $type eq 'requires' ) {
push @dep_errors, "$mod is not installed ($req_string)";
}
}
}
if ( @reports ) {
push @full_reports, "=== $title ===\n\n";
my $ml = _max( map { length $_->[0] } @reports );
my $wl = _max( map { length $_->[1] } @reports );
my $hl = _max( map { length $_->[2] } @reports );
if ($type eq 'modules') {
splice @reports, 1, 0, ["-" x $ml, "", "-" x $hl];
push @full_reports, map { sprintf(" %*s %*s\n", -$ml, $_->[0], $hl, $_->[2]) } @reports;
}
else {
splice @reports, 1, 0, ["-" x $ml, "-" x $wl, "-" x $hl];
push @full_reports, map { sprintf(" %*s %*s %*s\n", -$ml, $_->[0], $wl, $_->[1], $hl, $_->[2]) } @reports;
}
push @full_reports, "\n";
}
}
}
if ( @full_reports ) {
diag "\nVersions for all modules listed in $source (including optional ones):\n\n", @full_reports;
}
if ( $cpan_meta_error || @dep_errors ) {
diag "\n*** WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING ***\n";
}
if ( $cpan_meta_error ) {
my ($orig_source) = grep { -f } 'MYMETA.json', 'MYMETA.yml';
diag "\nCPAN::Meta->load_file('$orig_source') failed with: $cpan_meta_error\n";
}
if ( @dep_errors ) {
diag join("\n",
"\nThe following REQUIRED prerequisites were not satisfied:\n",
@dep_errors,
"\n"
);
}
pass('Reported prereqs');
# vim: ts=4 sts=4 sw=4 et:
pod-spell.t 100644 001750 001751 2231 15204207641 17353 0 ustar 00olaf olaf 000000 000000 WWW-RobotRules-6.03/xt/author use strict;
use warnings;
use Test::More;
# generated by Dist::Zilla::Plugin::Test::PodSpelling 2.007006
use Test::Spelling 0.17;
use Pod::Wordlist;
add_stopwords();
all_pod_files_spelling_ok( qw( bin lib ) );
__DATA__
49699333
Aas
Adam
Alders
Alex
Alexey
Andreas
Anton
AnyDBM
AnyDBM_File
Ardo
Bill
Bron
Burke
Chrysostomos
DAVIDRW
DBM
DB_File
Daniel
David
Denaxas
FWILES
Father
Finch
Froehlich
Gavin
Gisle
Gondwana
Graeme
Graham
Grossmann
Hakan
Hanak
Hans
Hay
Hedlund
Hukins
Ian
Jacob
Kapranoff
Karaban
Kennedy
Kilgore
Knop
Koenig
Koster
Lipcon
MARKSTOS
Mann
Mark
Martijn
Mike
Olaf
Ondrej
Peter
Peters
Rabbitson
Rezic
Robert
RobotRules
RobotUA
Rolf
Schilli
Sean
Sjogren
Skyttä
Slaven
Spiros
Steinbrunner
Steve
SteveHay
Stone
Stosberg
Thompson
Todd
Tom
Tony
Toru
Tourbin
Ville
WWW
Wheeler
Yamaguchi
Yuri
Yuzhaninov
Zefram
adamk
amir
amire80
andreas
asjo
at
brong
citrin
cybermapper
david
davidrw
denaxas
dependabot
diskcaching
dot
dsteinbrunner
gisle
github
gpeters
haarg
hfroehlich
iank
jefflee
john9art
ka
lib
mschilli
murphy
olaf
ondrej
phrstbrn
rg
ribasushi
ruff
sasao
sburke
shaohua
sprout
srezic
talby
tech
todd
tom
txt
uid39246
ville
waif
wfmann
zefram
zigorou
00-report-prereqs.dd 100644 001750 001751 5207 15204207641 17323 0 ustar 00olaf olaf 000000 000000 WWW-RobotRules-6.03/t do { my $x = {
'configure' => {
'requires' => {
'ExtUtils::MakeMaker' => '0'
},
'suggests' => {
'JSON::PP' => '2.27300'
}
},
'develop' => {
'recommends' => {
'Dist::Zilla::PluginBundle::Git::VersionManager' => '0.007'
},
'requires' => {
'File::Spec' => '0',
'IO::Handle' => '0',
'IPC::Open3' => '0',
'Pod::Coverage::TrustPod' => '0',
'Pod::Spell' => '1.25',
'Test::EOL' => '2.00',
'Test::MinimumVersion' => '0',
'Test::Mojibake' => '0',
'Test::More' => '0.94',
'Test::Pod' => '1.41',
'Test::Pod::Coverage' => '1.08',
'Test::Portability::Files' => '0',
'Test::Spelling' => '0.17',
'Test::Version' => '1'
}
},
'runtime' => {
'requires' => {
'AnyDBM_File' => '0',
'Carp' => '0',
'Fcntl' => '0',
'URI' => '1.10',
'perl' => '5.008001',
'strict' => '0'
},
'suggests' => {
'DB_File' => '0'
}
},
'test' => {
'recommends' => {
'CPAN::Meta' => '2.120900'
},
'requires' => {
'ExtUtils::MakeMaker' => '0',
'File::Spec' => '0',
'Test::More' => '0.96',
'strict' => '0',
'warnings' => '0'
}
}
};
$x;
} 00-compile.t 100644 001750 001751 2571 15204207641 17330 0 ustar 00olaf olaf 000000 000000 WWW-RobotRules-6.03/xt/author use strict;
use warnings;
# this test was generated with Dist::Zilla::Plugin::Test::Compile 2.059
use Test::More 0.94;
plan tests => 3;
my @module_files = (
'WWW/RobotRules.pm',
'WWW/RobotRules/AnyDBM_File.pm'
);
# no fake home requested
my @switches = (
-d 'blib' ? '-Mblib' : '-Ilib',
);
use File::Spec;
use IPC::Open3;
use IO::Handle;
open my $stdin, '<', File::Spec->devnull or die "can't open devnull: $!";
my @warnings;
for my $lib (@module_files)
{
# see L<perlfaq8/How can I capture STDERR from an external command?>
my $stderr = IO::Handle->new;
diag('Running: ', join(', ', map { my $str = $_; $str =~ s/'/\\'/g; q{'}.$str.q{'} }
$^X, @switches, '-e', "require q[$lib]"))
if $ENV{PERL_COMPILE_TEST_DEBUG};
my $pid = open3($stdin, '>&STDERR', $stderr, $^X, @switches, '-e', "require q[$lib]");
binmode $stderr, ':crlf' if $^O eq 'MSWin32';
my @_warnings = <$stderr>;
waitpid($pid, 0);
is($?, 0, "$lib loaded ok");
shift @_warnings if @_warnings and $_warnings[0] =~ /^Using .*\bblib/
and not eval { +require blib; blib->VERSION('1.01') };
if (@_warnings)
{
warn @_warnings;
push @warnings, @_warnings;
}
}
is(scalar(@warnings), 0, 'no warnings found') or diag 'got warnings: ', explain(\@warnings);
BAIL_OUT("Compilation problems") if !Test::More->builder->is_passing;
pod-syntax.t 100644 001750 001751 251 15204207641 17542 0 ustar 00olaf olaf 000000 000000 WWW-RobotRules-6.03/xt/author #!perl
# This file was automatically generated by Dist::Zilla::Plugin::PodSyntaxTests
use strict; use warnings;
use Test::More;
use Test::Pod 1.41;
all_pod_files_ok();
portability.t 100644 001750 001751 130 15204207641 17772 0 ustar 00olaf olaf 000000 000000 WWW-RobotRules-6.03/xt/author use strict;
use warnings;
use Test::More;
use Test::Portability::Files;
run_tests();
test-version.t 100644 001750 001751 637 15204207641 20106 0 ustar 00olaf olaf 000000 000000 WWW-RobotRules-6.03/xt/author use strict;
use warnings;
use Test::More;
# generated by Dist::Zilla::Plugin::Test::Version 1.09
use Test::Version;
my @imports = qw( version_all_ok );
my $params = {
is_strict => 0,
has_version => 1,
multiple => 0,
};
push @imports, $params
if version->parse( $Test::Version::VERSION ) >= version->parse('1.002');
Test::Version->import(@imports);
version_all_ok;
done_testing;
pod-coverage.t 100644 001750 001751 2336 15204207641 20035 0 ustar 00olaf olaf 000000 000000 WWW-RobotRules-6.03/xt/author #!perl
# This file was automatically generated by Dist::Zilla::Plugin::Test::Pod::Coverage::Configurable 0.07.
use Test::Pod::Coverage 1.08;
use Test::More 0.88;
BEGIN {
if ( $] <= 5.008008 ) {
plan skip_all => 'These tests require Pod::Coverage::TrustPod, which only works with Perl 5.8.9+';
}
}
use Pod::Coverage::TrustPod;
my %skip = map { $_ => 1 } qw( WWW::RobotRules::AnyDBM_File WWW::RobotRules::DB_File );
my @modules;
for my $module ( all_modules() ) {
next if $skip{$module};
push @modules, $module;
}
plan skip_all => 'All the modules we found were excluded from POD coverage test.'
unless @modules;
plan tests => scalar @modules;
my %trustme = (
'WWW::RobotRules' => [
qr/^(?:Version|is_me|visit|no_visits|last_visit|fresh_until|push_rules|clear_rules|rules|dump)$/
]
);
my @also_private;
for my $module ( sort @modules ) {
pod_coverage_ok(
$module,
{
coverage_class => 'Pod::Coverage::TrustPod',
also_private => \@also_private,
trustme => $trustme{$module} || [],
},
"pod coverage for $module"
);
}
done_testing();
minimum-version.t 100644 001750 001751 154 15204207641 20574 0 ustar 00olaf olaf 000000 000000 WWW-RobotRules-6.03/xt/author use strict;
use warnings;
use Test::More;
use Test::MinimumVersion;
all_minimum_version_from_metayml_ok();
RobotRules 000755 001750 001751 0 15204207641 16526 5 ustar 00olaf olaf 000000 000000 WWW-RobotRules-6.03/lib/WWW DB_File.pm 100644 001750 001751 6432 15204207641 20455 0 ustar 00olaf olaf 000000 000000 WWW-RobotRules-6.03/lib/WWW/RobotRules package WWW::RobotRules::DB_File;
use strict;
use WWW::RobotRules ();
our @ISA = qw(WWW::RobotRules);
our $VERSION = '6.03';
use Carp ();
use DB_File;
use Fcntl qw( O_CREAT O_RDWR );
sub new {
my ($class, $name, $file) = @_;
Carp::croak('WWW::RobotRules::DB_File cache file required') unless $file;
my $self = WWW::RobotRules->new($name);
$self = bless $self, $class;
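# Quote the class name: a bareword is rejected under "use strict".
tie %{$self->{'rules'}}, 'DB_File', $file, O_CREAT | O_RDWR, 0640, $DB_HASH
or Carp::croak("Can't open $file: $!");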
$self;
}
sub expires {
my ($self, $hostport, $expires) = @_;
my $old = $self->{'rules'}{"$hostport##expires"};
$old = 0 unless (defined $old);
if (defined $expires) {
$self->{'rules'}{"$hostport##expires"} = $expires;
}
$old;
}
sub roboturl {
my ($self, $hostport, $url) = @_;
$url = $url->as_string if ref($url);
my $old = $self->{'rules'}{"$hostport##url"};
if ($url) {
$self->{'rules'}{"$hostport##url"} = $url;
}
$old;
}
sub host_count {
my ($self, $hostport) = @_;
$self->{'rules'}{"$hostport##count"};
}
sub last_visit {
my ($self, $hostport) = @_;
$self->{'rules'}{"$hostport##last"};
}
sub visit {
my ($self, $hostport, $time) = @_;
$time = time unless defined $time;
$self->{'rules'}{"$hostport##last"} = $time;
if (defined $self->{'rules'}{"$hostport##count"}) {
$self->{'rules'}{"$hostport##count"}++;
}
else {
$self->{'rules'}{"$hostport##count"} = 1;
}
}
sub push_rule {
my ($self, $hostport, $rule) = @_;
my $cnt = 0;
foreach (keys %{$self->{'rules'}}) {
if (/^\Q$hostport\E\#\#rule\#\#\d+$/) { $cnt++; }
}
$self->{'rules'}{"$hostport##rule##$cnt"} = $rule;
}
sub clear_rules {
my ($self, $hostport) = @_;
foreach (keys %{$self->{'rules'}}) {
if (/^(\Q$hostport\E\#\#rule\#\#\d+)$/) {
delete $self->{'rules'}{$1};
}
}
}
sub rules {
my ($self, $hostport) = @_;
my @rules;
foreach (keys %{$self->{'rules'}}) {
if (/^(\Q$hostport\E\#\#rule\#\#\d+)$/) {
push(@rules, $self->{'rules'}{$1});
}
}
return \@rules;
}
sub hosts {
my ($self) = @_;
my @hosts;
foreach (keys %{$self->{'rules'}}) {
if (/^([^\#]+)\#\#count/) {
push(@hosts, $1);
}
}
return \@hosts;
}
1;
__END__
=head1 NAME
WWW::RobotRules::DB_File - Parse robots.txt files using a disk cache
=head1 SYNOPSIS
require WWW::RobotRules::DB_File;
require LWP::RobotUA;
# Create a robot useragent that uses a disk caching RobotRules
$ua = LWP::RobotUA->new( 'my-robot/1.0', 'me@foo.com',
WWW::RobotRules::DB_File->new( 'my-robot/1.0', '/path/cachefile' ));
# Then just use $ua as usual
$res = $ua->request($req);
=head1 DESCRIPTION
This is a subclass of L<WWW::RobotRules> that uses the DB_File package to
implement disk caching of robots.txt.
=head1 METHODS
This is a subclass of L<WWW::RobotRules>, so it implements the same methods.
=over 4
=item $rules = WWW::RobotRules::DB_File->new('my-robot/1.0', '/path/cachefile')
This is the constructor. The only difference from the original constructor
of L<WWW::RobotRules> is that you have to specify a cache file as well.
=back
=head1 SEE ALSO
L<WWW::RobotRules>
=head1 AUTHOR
Hakan Ardo
=cut
AnyDBM_File.pm 100644 001750 001751 10453 15204207641 21260 0 ustar 00olaf olaf 000000 000000 WWW-RobotRules-6.03/lib/WWW/RobotRules package WWW::RobotRules::AnyDBM_File;
use strict;
use WWW::RobotRules ();
our @ISA = qw(WWW::RobotRules);
our $VERSION = '6.03';
use Carp ();
use AnyDBM_File;
use Fcntl qw( O_CREAT O_RDWR );
sub new {
my ($class, $ua, $file) = @_;
Carp::croak('WWW::RobotRules::AnyDBM_File filename required') unless $file;
my $self = bless {}, $class;
$self->{'filename'} = $file;
tie %{$self->{'dbm'}}, 'AnyDBM_File', $file, O_CREAT | O_RDWR, 0600
or Carp::croak("Can't open $file: $!");
if ($ua) {
$self->agent($ua);
}
else {
# Try to obtain name from DBM file
$ua = $self->{'dbm'}{"|ua-name|"};
Carp::croak("No agent name specified") unless $ua;
}
$self;
}
sub agent {
my ($self, $newname) = @_;
my $old = $self->{'dbm'}{"|ua-name|"};
if (defined $newname) {
$newname =~ s!/?\s*\d+\.\d+\s*$!!; # lose version
unless ($old && $old eq $newname) {
# Old info is now stale. Clear all keys through the tied
# interface rather than untie+tie(O_TRUNC), which is a
# symlink-follow TOCTOU on the DBM-backing file(s).
%{$self->{'dbm'}} = ();
$self->{'dbm'}{"|ua-name|"} = $newname;
}
}
$old;
}
sub no_visits {
my ($self, $netloc) = @_;
my $t = $self->{'dbm'}{"$netloc|vis"};
return 0 unless $t;
(split(/;\s*/, $t))[0];
}
sub last_visit {
my ($self, $netloc) = @_;
my $t = $self->{'dbm'}{"$netloc|vis"};
return undef unless $t;
(split(/;\s*/, $t))[1];
}
sub fresh_until {
my ($self, $netloc, $fresh) = @_;
my $old = $self->{'dbm'}{"$netloc|exp"};
if ($old) {
$old =~ s/;.*//; # remove cleartext
}
if (defined $fresh) {
$fresh .= "; " . localtime($fresh);
$self->{'dbm'}{"$netloc|exp"} = $fresh;
}
$old;
}
sub visit {
my ($self, $netloc, $time) = @_;
$time ||= time;
my $count = 0;
my $old = $self->{'dbm'}{"$netloc|vis"};
if ($old) {
my $last;
($count, $last) = split(/;\s*/, $old);
$time = $last if $last > $time;
}
$count++;
$self->{'dbm'}{"$netloc|vis"} = "$count; $time; " . localtime($time);
}
sub push_rules {
my ($self, $netloc, @rules) = @_;
my $cnt = 1;
$cnt++ while $self->{'dbm'}{"$netloc|r$cnt"};
foreach (@rules) {
$self->{'dbm'}{"$netloc|r$cnt"} = $_;
$cnt++;
}
}
sub clear_rules {
my ($self, $netloc) = @_;
my $cnt = 1;
while ($self->{'dbm'}{"$netloc|r$cnt"}) {
delete $self->{'dbm'}{"$netloc|r$cnt"};
$cnt++;
}
}
sub rules {
my ($self, $netloc) = @_;
my @rules = ();
my $cnt = 1;
while (1) {
my $rule = $self->{'dbm'}{"$netloc|r$cnt"};
last unless $rule;
push(@rules, $rule);
$cnt++;
}
@rules;
}
sub dump { }
1;
__END__
=head1 NAME
WWW::RobotRules::AnyDBM_File - Persistent RobotRules
=head1 SYNOPSIS
require WWW::RobotRules::AnyDBM_File;
require LWP::RobotUA;
# Create a robot useragent that uses a diskcaching RobotRules
my $rules = WWW::RobotRules::AnyDBM_File->new( 'my-robot/1.0', 'cachefile' );
my $ua = LWP::RobotUA->new( 'my-robot/1.0', 'me@foo.com', $rules );
# Then just use $ua as usual
$res = $ua->request($req);
=head1 DESCRIPTION
This is a subclass of I<WWW::RobotRules> that uses the AnyDBM_File
package to implement persistent diskcaching of F<robots.txt> and host
visit information.
The constructor (the new() method) takes an extra argument specifying
the name of the DBM file to use. If the DBM file already exists, then
you can specify undef as agent name as the name can be obtained from
the DBM database.
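The host visit information is kept as a single string per host under the key C<"$netloc|vis">, holding the visit count, the last visit time and a human-readable timestamp. The following sketch shows that encoding with an ordinary hash standing in for the tied DBM hash. The helper names C<record_visit>, C<visit_count> and C<last_visit_time> are hypothetical; they mirror visit(), no_visits() and last_visit() in this class.

```perl
use strict;
use warnings;

# A plain hash stands in for the tied DBM hash in this sketch.
my %dbm;

# Hypothetical counterpart of visit(): bump the counter and keep the
# most recent timestamp, stored as "count; epoch; readable time".
sub record_visit {
    my ($netloc, $time) = @_;
    $time ||= time;
    my $count = 0;
    if (my $old = $dbm{"$netloc|vis"}) {
        my $last;
        ($count, $last) = split /;\s*/, $old;
        $time = $last if $last > $time;    # never move the timestamp back
    }
    $count++;
    $dbm{"$netloc|vis"} = "$count; $time; " . localtime($time);
    return;
}

# Hypothetical counterpart of no_visits(): first field of the record.
sub visit_count {
    my ($netloc) = @_;
    my $t = $dbm{"$netloc|vis"} or return 0;
    return (split /;\s*/, $t)[0];
}

# Hypothetical counterpart of last_visit(): second field of the record.
sub last_visit_time {
    my ($netloc) = @_;
    my $t = $dbm{"$netloc|vis"} or return undef;
    return (split /;\s*/, $t)[1];
}

record_visit('some.place:80', $_) for 1000, 2000, 1500;
print visit_count('some.place:80'), " visits, last at ",
    last_visit_time('some.place:80'), "\n";
```

Storing everything in one flat string keeps the record usable with any DBM backend, since DBM files hold only string keys and string values.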
=head1 SECURITY CONSIDERATIONS
The caller-supplied DBM filename must reside in a directory writable
only by the same user that runs this code. The underlying
C<AnyDBM_File> backends open the file via the C C<open(2)> syscall
without C<O_NOFOLLOW>, so a symlink at the cache path (or at its
C<.dir>/C<.pag>/C<.db> siblings) will be followed and the linked
target may be overwritten with DBM page data. The cache file is
created with mode C<0600>; callers that need different permissions
can C<chmod> after construction.
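One way for a caller to meet this requirement is to check the cache directory before passing a filename to the constructor. The following is a minimal sketch for Unix-like systems, not a complete defence: C<assert_private_dir> is a hypothetical helper, the file name F<robots-cache> is made up for the example, and the check only reduces the risk (it cannot close every race).

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: die unless $dir is a real directory owned by the
# current user and not writable by group or others.
sub assert_private_dir {
    my ($dir) = @_;
    my @st = lstat($dir) or die "Cannot stat $dir: $!\n";
    die "$dir is a symlink\n"                     if -l _;
    die "$dir is not a directory\n"               unless -d _;
    die "$dir is not owned by the current user\n" unless $st[4] == $>;
    die "$dir is writable by group or others\n"   if $st[2] & 022;
    return $dir;
}

# File::Temp creates the directory with mode 0700.
my $dir = tempdir(CLEANUP => 1);
assert_private_dir($dir);

# A path inside the checked directory can then be handed to
# WWW::RobotRules::AnyDBM_File->new($agent, $cache_file).
my $cache_file = File::Spec->catfile($dir, 'robots-cache');
print "cache file: $cache_file\n";
```

Using a dedicated directory instead of a shared location such as F</tmp> means that no other local user can plant a symlink at the cache path or at its backend-specific sibling files.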
=head1 SEE ALSO
L<WWW::RobotRules>, L<LWP::RobotUA>
=head1 AUTHORS
Hakan Ardo, Gisle Aas
=cut