URI-Find-Delimited-0.03/META.yml

--- #YAML:1.0
name:               URI-Find-Delimited
version:            0.03
abstract:           ~
author:  []
license:            unknown
distribution_type:  module
configure_requires:
    ExtUtils::MakeMaker:  0
build_requires:
    ExtUtils::MakeMaker:  0
requires:
    Test::More:  0
    URI::Find:   0
    URI::URL:    0
no_index:
    directory:
        - t
        - inc
generated_by:       ExtUtils::MakeMaker version 6.57_05
meta-spec:
    url:      http://module-build.sourceforge.net/META-spec-v1.4.html
    version:  1.4

URI-Find-Delimited-0.03/MANIFEST

Changes
MANIFEST
Makefile.PL
README
lib/URI/Find/Delimited.pm
t/delimited.t
META.yml                                 Module meta-data (added by MakeMaker)

URI-Find-Delimited-0.03/Makefile.PL

use ExtUtils::MakeMaker;
WriteMakefile( NAME         => "URI::Find::Delimited",
               VERSION_FROM => "lib/URI/Find/Delimited.pm",
               PREREQ_PM    => { 'Test::More' => 0,
                                 'URI::URL'   => 0,
                                 'URI::Find'  => 0
                               }
             );

URI-Find-Delimited-0.03/README

NAME
    URI::Find::Delimited - Find URIs which may be wrapped in enclosing
    delimiters.

DESCRIPTION
    Works like URI::Find, but is prepared for URIs in your text to be
    wrapped in a pair of delimiters and optionally have a title. This will
    be useful for processing text that already has some minimal markup in
    it, like bulletin board posts or wiki text.

SYNOPSIS
      my $finder = URI::Find::Delimited->new;
      my $text = "This is a [http://the.earth.li/ titled link].";
      $finder->find(\$text);
      print $text;

METHODS
    new
          my $finder = URI::Find::Delimited->new(
              callback      => \&callback,
              delimiter_re  => [ '\[', '\]' ],
              ignore_quoted => 1               # defaults to 0
          );

        All arguments are optional; defaults are provided (see below).

        Creates a new URI::Find::Delimited object.
        This object works similarly to a URI::Find object, but as well as
        just looking for URIs it is also aware of the concept of a wrapped,
        titled URI. These look something like

          [http://foo.com/ the foo website]

        where:

        * "[" is the opening delimiter

        * "]" is the closing delimiter

        * "http://foo.com/" is the URI

        * "the foo website" is the title

        * the URI and title are separated by spaces and/or tabs

        The URI::Find::Delimited object will extract each of these parts
        separately and pass them to your callback.

        callback
            "callback" is a function which is called on each URI found. It
            is passed five arguments: the opening delimiter (if found), the
            closing delimiter (if found), the URI, the title (if found), and
            any whitespace found between the URI and title.

            The return value of the callback will replace the original URI
            in the text.

            If you do not supply your own callback, the object will create a
            default one which will put your URIs in 'a href' tags using the
            URI for the target and the title for the link text. If no title
            is provided for a URI then the URI itself will be used as the
            title. If the delimiters aren't balanced (eg if the opening one
            is present but no closing one is found) then the URI is treated
            as not being wrapped.

            Note: the default callback will not remove the delimiters from
            the text. It should be simple enough to write your own callback
            to remove them, based on the one in the source, if that's what
            you want. In fact there's an example in this distribution, in
            "t/delimited.t".

        delimiter_re
            The "delimiter_re" parameter is optional. If you do supply it
            then it should be a ref to an array containing two regexes. It
            defaults to using single square brackets as the delimiters.

            Don't use capturing groupings "( )" in your delimiters or things
            will break. Use non-capturing "(?: )" instead.
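
            The warning about capturing groups follows from how find()
            reads its matches: the opening delimiter, URI, whitespace,
            title and closing delimiter are taken from $1 to $5 by
            position. The sketch below illustrates this with a simplified
            pattern. It is not the module's code: parse_delimited() and
            the stand-in $uri_re are hypothetical names made up for this
            example, and the real module uses URI::Find's uri_re.

```perl
use strict;
use warnings;

# Hypothetical stand-in for URI::Find's uri_re, for illustration only.
my $uri_re = qr|https?://[^\s\[\]{}]+|;

# Simplified version of the delimited branch of the pattern in find().
# Like the module, it reads the pieces from $1..$5 by position.
sub parse_delimited {
    my ($text, $open_re, $close_re) = @_;
    if ( $text =~ m/($open_re)($uri_re)([ \t]*)((?<=[ \t])[^\n]+?)?($close_re)/ ) {
        return +{ open => $1, uri => $2, title => $4, close => $5 };
    }
    return;
}

my $text = 'see {http://foo.com/ the foo website}';

# Non-capturing alternation: the positions stay as the pattern expects.
my $good = parse_delimited( $text, '(?:\[|\{)', '(?:\]|\})' );
print "uri: $good->{uri}, title: $good->{title}\n";

# Capturing alternation adds a group, so $2 is no longer the URI.
my $bad = parse_delimited( $text, '(\[|\{)', '(\]|\})' );
print "uri: $bad->{uri}\n";
```

            With the non-capturing delimiters the URI and title come out
            as expected; with the capturing ones every later group shifts
            by one, so the "URI" slot holds the opening delimiter instead.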
        ignore_quoted
            If the "ignore_quoted" parameter is supplied and set to a true
            value, then any URIs immediately preceded with a double-quote
            character will not be matched, ie your callback will not be
            executed for them and they'll be treated just as normal text.

            This is a bit of a hack but it's in here because I need to be
            able to ignore things like URIs that are already embedded in
            quoted HTML attributes. A better implementation may happen at
            some point.

SEE ALSO
    URI::Find.

AUTHOR
    Kake Pugh (kake@earth.li).

COPYRIGHT
         Copyright (C) 2003 Kake Pugh.  All Rights Reserved.

    This module is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself.

CREDITS
    Tim Bagot helped me stop faffing over the name, by pointing out that RFC
    2396 Appendix E uses "delimited". Dave Hinton helped me fix the regex to
    make it work for delimited URIs with no title. Nick Cleaton helped me
    make "ignore_quoted" work. Some of the code was taken from URI::Find.

URI-Find-Delimited-0.03/lib/URI/Find/Delimited.pm

package URI::Find::Delimited;

use strict;

use vars qw( $VERSION );
$VERSION = '0.03';

use base qw(URI::Find);

# For 5.005_03 compatibility (copied from URI::Find::Schemeless)
use URI::Find ();

use URI::URL;

=head1 NAME

URI::Find::Delimited - Find URIs which may be wrapped in enclosing delimiters.

=head1 DESCRIPTION

Works like L<URI::Find>, but is prepared for URIs in your text to be
wrapped in a pair of delimiters and optionally have a title. This will
be useful for processing text that already has some minimal markup in
it, like bulletin board posts or wiki text.

=head1 SYNOPSIS

  my $finder = URI::Find::Delimited->new;
  my $text = "This is a [http://the.earth.li/ titled link].";
  $finder->find(\$text);
  print $text;

=head1 METHODS

=over 4

=item B<new>

  my $finder = URI::Find::Delimited->new(
      callback      => \&callback,
      delimiter_re  => [ '\[', '\]' ],
      ignore_quoted => 1               # defaults to 0
  );

All arguments are optional; defaults are provided (see below).

Creates a new URI::Find::Delimited object. This object works
similarly to a L<URI::Find> object, but as well as just looking for
URIs it is also aware of the concept of a wrapped, titled URI.  These
look something like

  [http://foo.com/ the foo website]

where:

=over 4

=item * C<[> is the opening delimiter

=item * C<]> is the closing delimiter

=item * C<http://foo.com/> is the URI

=item * C<the foo website> is the title

=item * the URI and title are separated by spaces and/or tabs

=back

The URI::Find::Delimited object will extract each of these parts
separately and pass them to your callback.

=over 4

=item B<callback>

C<callback> is a function which is called on each URI found. It is
passed five arguments: the opening delimiter (if found), the closing
delimiter (if found), the URI, the title (if found), and any
whitespace found between the URI and title.

The return value of the callback will replace the original URI in the
text.

If you do not supply your own callback, the object will create a
default one which will put your URIs in 'a href' tags using the URI
for the target and the title for the link text. If no title is
provided for a URI then the URI itself will be used as the title. If
the delimiters aren't balanced (eg if the opening one is present but
no closing one is found) then the URI is treated as not being wrapped.

Note: the default callback will not remove the delimiters from the
text. It should be simple enough to write your own callback to remove
them, based on the one in the source, if that's what you want.  In
fact there's an example in this distribution, in C<t/delimited.t>.

=item B<delimiter_re>

The C<delimiter_re> parameter is optional.
If you do supply it then it should be a ref to an array containing two
regexes.  It defaults to using single square brackets as the delimiters.

Don't use capturing groupings C<( )> in your delimiters or things
will break. Use non-capturing C<(?: )> instead.

=item B<ignore_quoted>

If the C<ignore_quoted> parameter is supplied and set to a true value,
then any URIs immediately preceded with a double-quote character will
not be matched, ie your callback will not be executed for them and
they'll be treated just as normal text.

This is a bit of a hack but it's in here because I need to be able to
ignore things like URIs that are already embedded in quoted HTML
attributes. A better implementation may happen at some point.

=back

=back

=cut

sub new {
    my ($class, %args) = @_;

    my ( $callback, $delimiter_re, $ignore_quoted ) =
                        @args{ qw( callback delimiter_re ignore_quoted ) };

    # Default callback: wrap the URI in an 'a href' tag, keeping the
    # delimiters in the text.
    unless (defined $callback) {
        $callback = sub {
            my ($open, $close, $uri, $title, $whitespace) = @_;
            if ( $open && $close ) {
                $title ||= $uri;
                qq|$open<a href="$uri">$title</a>$close|;
            } else {
                qq|$open<a href="$uri">$uri</a>$whitespace$title$close|;
            }
        };
    }
    $delimiter_re ||= [ '\[', '\]' ];

    my $self = bless { callback      => $callback,
                       delimiter_re  => $delimiter_re,
                       ignore_quoted => $ignore_quoted
                     }, $class;
    return $self;
}

sub find {
    my($self, $r_text) = @_;

    my $urlsfound = 0;

    URI::URL::strict(1); # Don't assume any old thing followed by : is a scheme

    my $uri_re = $self->uri_re;
    # The letter class stops a match from starting partway through the
    # scheme of a quoted URI.
    my $prefix_re = $self->{ignore_quoted} ? '(?<!["a-zA-Z])' : '';
    my $open_re = $self->{delimiter_re}[0];
    my $close_re = $self->{delimiter_re}[1];

    # Note we only allow spaces and tabs, not all whitespace, between a URI
    # and its title.  Also we disallow newlines *in* the title.  These are
    # both to avoid the bug where $uri1\n$uri2 leads to $uri2 being considered
    # as part of the title, and thus not wrapped.
    $$r_text =~ s{$prefix_re          # maybe don't match things preceded by a "
                  (?:
                    ($open_re)        # opening delimiter
                    ($uri_re)         # the URI itself
                    ([ \t]*)          # optional whitespace between URI and title
                    ((?<=[ \t])[^\n$close_re]+)?
                            #title if there was whitespace
                    ($close_re)       # closing delimiter
                  |
                    ($uri_re)         # just the URI itself
                  )
    }{
        my ($open, $uri_match, $whitespace, $title, $close, $just_uri) =
           ($1, $2, $3, $4, $5, $6);
        $uri_match = $just_uri if $just_uri;
        foreach ( $open, $whitespace, $title, $close ) {
            $_ ||= "";
        }
        my $orig_text = qq|$open$uri_match$whitespace$title$close|;

        if( my $uri = $self->_is_uri( \$uri_match ) ) { # if not a false alarm
            $urlsfound++;
            $self->{callback}->($open,$close,$uri_match,$title,$whitespace);
        } else {
            $orig_text;
        }
    }egx;

    return $urlsfound;
}

=head1 SEE ALSO

L<URI::Find>.

=head1 AUTHOR

Kake Pugh (kake@earth.li).

=head1 COPYRIGHT

     Copyright (C) 2003 Kake Pugh.  All Rights Reserved.

This module is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

=head1 CREDITS

Tim Bagot helped me stop faffing over the name, by pointing out that
RFC 2396 Appendix E uses "delimited". Dave Hinton helped me fix the
regex to make it work for delimited URIs with no title. Nick Cleaton
helped me make C<ignore_quoted> work. Some of the code was taken from
L<URI::Find>.

=cut

1;

URI-Find-Delimited-0.03/Changes

0.03    21 September 2014
        Add URI::URL as a prereq and use it since URI::Find removed it as one.
          (CPAN RT #99003)

0.02    24 March 2003
        Bugfix (CPAN RT #2245) - turned on URI::URL::strict to stop it
          assuming that any old thing followed by a colon is a scheme.

0.01    18 February 2003
        Initial release.

URI-Find-Delimited-0.03/t/delimited.t

use strict;
local $^W = 1;

use Test::More tests => 18;

use_ok( "URI::Find::Delimited" );

my $finder = URI::Find::Delimited->new;
my $text = "This contains no URIs";
$finder->find(\$text);
is( $text, qq|This contains no URIs|, "left alone if no URIs" );

$text = "http://the.earth.li/ foo bar";
$finder->find(\$text);
like( $text, qr|http://the.earth.li/|,
      "URIs at very start of line are picked up" );
is( $text, qq|<a href="http://the.earth.li/">http://the.earth.li/</a> foo bar|,
    "...and don't pick up trailing stuff as a title" );

$text = "foo bar http://the.earth.li/";
$finder->find(\$text);
is( $text, qq|foo bar <a href="http://the.earth.li/">http://the.earth.li/</a>|,
    "URIs at very end of line are picked up" );

$text = "This is a sentence containing http://the.earth.li/";
$finder->find(\$text);
is( $text, qq|This is a sentence containing <a href="http://the.earth.li/">http://the.earth.li/</a>|,
    "URI used as title if no title or delimiters" );
#print "# $text\n";

$text = "[http://use.perl.org/]";
$finder->find(\$text);
is( $text, qq|[<a href="http://use.perl.org/">http://use.perl.org/</a>]|,
    "delimited URIs are found even if no title" );

$text = "This has a [http://the.earth.li/ usemod link]";
$finder->find(\$text);
is( $text, qq|This has a [<a href="http://the.earth.li/">usemod link</a>]|,
    "title found and used" );
#print "# $text\n";

$text = "This has a [http://the.earth.li/ broken usemod link";
$finder->find(\$text);
is( $text, qq|This has a [<a href="http://the.earth.li/">http://the.earth.li/</a> broken usemod link|,
    "title ignored when final square bracket missing" );
#print "# $text\n";

$text = "This has a http://the.earth.li/ broken usemod link]";
$finder->find(\$text);
is( $text, qq|This has a <a href="http://the.earth.li/">http://the.earth.li/</a> broken usemod link]|,
    "title ignored when first square bracket missing" );
#print "# $text\n";

$text = <find(\$text);
like( $text, qr|http://www.pubs.com/|,
      "untitled URI following another untitled URI gets picked up correctly" );

$text = <find(\$text);
like( $text, qr|foo|,
      "titled URI following untitled URI gets picked up correctly" );

# Test alternative callbacks.
# This callback drops the delimiters from the output.
$finder = URI::Find::Delimited->new(
    callback => sub {
        my ($open, $close, $uri, $title, $whitespace) = @_;
        if ( $open && $close ) {
            $title ||= $uri;
            qq|<a href="$uri">$title</a>|;
        } else {
            qq|<a href="$uri">$uri</a>$whitespace$title|;
        }
    }
);

$text = "This has a [http://the.earth.li/ usemod link]";
$finder->find(\$text);
is( $text, qq|This has a <a href="http://the.earth.li/">usemod link</a>|,
    "can override callback" );

# Test alternative delimiters.
$finder = URI::Find::Delimited->new( delimiter_re => [ '\{', '\}' ] );
$text = qq|A {http://the.earth.li/ titled link}|;
$finder->find(\$text);
is( $text, qq|A {<a href="http://the.earth.li/">titled link</a>}|,
    "can override the delimiters" );

# Test ignoring quoted URIs.
$finder = URI::Find::Delimited->new;
$text = qq|This has a <a href="http://the.earth.li/">link already embedded</a>|;
$finder->find(\$text);
is( $text, qq|This has a <a href="<a href="http://the.earth.li/">http://the.earth.li/</a>">link already embedded</a>|,
    "URIs in existing links picked up by default" );

$finder = URI::Find::Delimited->new( ignore_quoted => 0 );
$text = qq|This has a <a href="http://the.earth.li/">link already embedded</a>|;
$finder->find(\$text);
is( $text, qq|This has a <a href="<a href="http://the.earth.li/">http://the.earth.li/</a>">link already embedded</a>|,
    "...and when ignore_quoted is false" );

$finder = URI::Find::Delimited->new( ignore_quoted => 1 );
$text = qq|This has a <a href="http://the.earth.li/">link already embedded</a>|;
$finder->find(\$text);
is( $text, qq|This has a <a href="http://the.earth.li/">link already embedded</a>|,
    "...but not when ignore_quoted is true" );

# Bug CPAN RT #2245
$finder = URI::Find::Delimited->new;
$text = qq|style:font|;
$finder->find(\$text);
is( $text, "style:font",
    "random things with colons in not automatically assumed to be URIs" );