HTML-Truncate-0.20/ 0000755 0000765 0000024 00000000000 11226750572 014064 5 ustar jinx staff 0000000 0000000 HTML-Truncate-0.20/Changes 0000644 0000765 0000024 00000014716 11226750560 015365 0 ustar jinx staff 0000000 0000000 Revision history for HTML-Truncate 0.20 20090713 - Normalized test names to hyphen from underscore. - Patch from the omnipresent t0m to fix TT2 recipe. 0.19 20090512 - Fixed Pod typos. 0.18 20090509 - Fixed broken README. 0.17 20090509 - Really been a year...? - Typo in Pod. - Autogenerate README. - Bug in Pod showing utf8 => 1 setting. - Fixed bad optimization in filter recipe. - Fixed char counting problem I discovered while testing. - Fixed standalone tag -- e.g., br -- close bug with repair. - Might have made visual spacing counting more robust. 0.16 20080529 - Updated Pod a bit. 0.15 20080529 - Was considering trying to work around 5.6's utf8 issues but it's not in the cards right now and I'd rather have a module passing all its tests so, 5.8 or better required now. Leaving skips in truncate_html.t in case I ever do it. 0.14 20080529 - Put in skips for tests involving utf8_mode and perl less than 5.8. - Put Encode into test reqs and put File::Spec version. 0.13 20080529 - Fixed my build to remove META.yml, 0.12 had the file from 0.11. - Reorganized tests with /xt; pod-coverage, perlcritic, etc. - Cleaned up Pod. - Put license in Makefile explicitly. 0.12 20080528 - Discoered it didn't work nearly as well (regarding counting of "visible" output as I thought so I rewrote it. Three times actually but ended up with something similar to the original in the end, mostly because, bizarrely, HTML::TreeBuilder does not make text into nodes. This makes the module half useless. If it only did that, it would simple to have redone this module with it in a bombproof way. I considered redoing it with XML::LibXML to achieve the same thing and might add that in eventually with an "eval require" kind of check. - Added some accounting for
tags to be taken literally. - Lorenzo Iannuzzi suggested a way to clip to word-ish boundaries instead of truncating leaving hanging space. I already do something similar in TT2 so I kept his as on_space() and mine as cleanly(), which can accept a regex if you know how you want to handle it better than the default. - Took out style setting, it was pointless, goofy, and antithetical to the point of the module. - Was going to take out "beta" notice - but rewrote too much of it to do so. - Moved to Module::Install. - Noticed the utf8 ellipsis is tripping up perl 5.6. But I don't want to put a prereq in... maybe skip that test for that version. - Several new tests. - Pod clean up, mild reorg. 0.11 200607 - kevinr: Added the 'repair' option, allowing you to tell HTML::Truncate to close unmatched open tags and discard unmatched close tags, with Pod and tests. - Added HTML::Tagset to prereqs; it's not used directly but caused an error on my clean perl install. - Fixed a couple of Pod spelling errors and typos. 0.10 20060304 1750 - Updated "alpha" --> "beta" in README. - Removed .cvsignore from MANIFEST and put in MANIFEST.skip. - Added TT recipe to Pod. 0.09 20060304 1247 - Ryo Okamoto reported two bugs (object needs its renewed text reset per truncate call and pointed out that the chars vs percent wasn't clear enough or bomb-proof). Also suggested different behavior for dropping trailing tags if they've lost their content (i.e., dropping the \Z instead of doing ...). - So calls to chars() now clears the percent() and vice versa. - New test file "countdown" from Ryo Okamoto added. - Pod updates to reflect changes. - Took out defined check on chars. Zero should not be allowed. - Added an "if" clause to the length padding check. - Returning undef if percent() is called when it's unset and not being set. 0.08 20060228 1622 - Fixed year in Change entry below (from 2003). - Fixed percent to not allow a 0%. - Fixed percent bug -- if it was loaded up front instead of in the truncate() call, it was being silently dropped for the default 100 chars. - Made test path agnostic with File::Spec. Added to build prereqs. - Fixed bug with default percent being English instead of decimal; shouldn't be there at all. - Calling it beta now that the known bugs are out; couple other Pod tweaks. 0.07 20060228 1130 - Added embed to skip tags. 0.06 20060101 0152 - Moved the 'ellipsis' to the end of the close tags; it's a compromise but t'is for the best. - And moved it back out. Bad compromise. Needs to have a tree logic to decide if it should go inside the last block level tag or not at all. - Added to stand alone tags. - Put in a more helpful "We have to test something.
'; my $readmore = '... [readmore]'; my $html_truncate = HTML::Truncate->new(); $html_truncate->chars(20); $html_truncate->ellipsis($readmore); print $html_truncate->truncate($html); # or use Encode; my $ht = HTML::Truncate->new( utf8_mode => 1, chars => 1_000, ); print Encode::encode_utf8( $ht->truncate($html) ); =head1 XHTML This module is designed to work with XHTML-style nested tags. More below. =head1 WHITESPACE AND ENTITIES Repeated natural whitespace (i.e., "\s+" and not " ") in HTML -- with rare exception (pre tags or user defined styles) -- is not meaningful. Therefore it is normalized when truncating. Entities are also normalized. The following is only counted 14 chars long. \n\nthis is ‘text’\n\n
^^^^^^^12345----678--9------01234------^^^^^^^^ =head1 METHODS =over 4 =item BWe have to test something.
'; my $readmore = '... [readmore]'; my $html_truncate = HTML::Truncate->new(); $html_truncate->chars(20); $html_truncate->ellipsis($readmore); print $html_truncate->truncate($html); # or use Encode; my $ht = HTML::Truncate->new( utf8_mode => 1, chars => 1_000, ); print Encode::encode_utf8( $ht->truncate($html) ); =head1 XHTML This module is designed to work with XHTML-style nested tags. More below. =head1 WHITESPACE AND ENTITIES Repeated natural whitespace (i.e., "\s+" and not " ") in HTML -- with rare exception (pre tags or user defined styles) -- is not meaningful. Therefore it is normalized when truncating. Entities are also normalized. The following is only counted 14 chars long. \n\nthis is ‘text’\n\n
^^^^^^^12345----678--9------01234------^^^^^^^^ =head1 METHODS =over 4 =item BWe have to test something.
'; my $readmore = '... [readmore]'; my $html_truncate = HTML::Truncate->new(); $html_truncate->chars(20); $html_truncate->ellipsis($readmore); print $html_truncate->truncate($html); # or use Encode; my $ht = HTML::Truncate->new( utf8_mode => 1, chars => 1_000, ); print Encode::encode_utf8( $ht->truncate($html) ); XHTML This module is designed to work with XHTML-style nested tags. More below. WHITESPACE AND ENTITIES Repeated natural whitespace (i.e., "\s+" and not " ") in HTML -- with rare exception (pre tags or user defined styles) -- is not meaningful. Therefore it is normalized when truncating. Entities are also normalized. The following is only counted 14 chars long. \n\nthis is ‘text’\n\n
^^^^^^^12345----678--9------01234------^^^^^^^^ METHODS new Can take all the methods as hash style args. "percent" and "chars" are incompatible so don't use them both. Whichever is set most recently will erase the other. my $ht = HTML::Truncate->new(utf8_mode => 1, chars => 500, # default is 100 ); utf8_mode Set/get, true/false. If "utf8_mode" is set, utf8_mode(1) is also set in the underlying HTML::Parser, entities will be transformed with decode and the default ellipsis will be a literal ellipsis and not the default of "…". chars Set/get. The number of characters remaining after truncation, excluding the "ellipsis". Entities are counted as single characters. E.g., "©" is one character for truncation counts. Default is "100." Side-effect: clears any "percent" that has been set. percent Set/get. A percentage to keep while truncating the rest. For a document of 1,000 chars, percent('15%') and chars(150) would be equivalent. The actual amount of character that the percent represents cannot be known until the given HTML is parsed. Side-effect: clears any "chars" that has been set. ellipsis Set/get. Ellipsis in this case means -- The omission of a word or phrase necessary for a complete syntactical construction but not necessary for understanding. http://www.answers.com/topic/ellipsis What it will probably mean in most real applications is "read more." The default is "…" which if the utf8 flag is true will render as a literal ellipsis, "chr(8230)". The reason the default is "…" and not "..." is this is meant for use in HTML environments, not plain text, and "..." (dot-dot-dot) is not typographically correct or equivalent to a real horizontal ellipsis character. truncate It returns the truncated XHTML if asked for a return value. my $truncated = $ht->truncate($html); It will truncate the string in place if no return value is expected (wantarray is not defined). $ht->truncate($html); print $html; Also can be called with inline arguments- print $ht->truncate( $html, $chars_or_percent, $ellipsis ); No arguments are strictly required. Without HTML to operate upon it returns undef. The two optional arguments may be preset with the methods "chars" (or "percent") and "ellipsis". Valid nesting of tags is required (alla XHTML). Therefore some old HTML habits likewithout a
are not supported and may cause a fatal error. See "repair" for help with badly formed HTML. Certain tags are omitted by default from the truncated output. * Skipped tags These will not be included in truncated output by default. ...11/20/2003 Tags to test and check and such.
I think we can do this in a pretty straightforward fashion otherwise.
It’s a link along with this. I dislike formatting dummy HTML. 15th generation Americans are sometimes decent fellows though gentlemen may differ.
Now you have plenty to test.
—Moo-cow-moo
Something else.
11/20/2003 Tags to test and check and such.
I think we can do this in a pretty straightforward fashion otherwise.
It’s a link along with this. I dislike formatting dummy HTML. 15th generation Americans are sometimes decent fellows though gentlemen may differ.
Now you have plenty to test.
—Moo-cow-moo
Something else.
We have to test something.
'; my $test = [ 1 => 'W...
', 2 => 'We...
', 3 => 'We ...
', 4 => 'We h...
', 5 => 'We ha...
', 6 => 'We hav...
', 7 => 'We have...
', 8 => 'We have ...
', 9 => 'We have t...
', 10 => 'We have to...
', 11 => 'We have to ...
', 12 => 'We have to t...
', 13 => 'We have to te...
', 14 => 'We have to tes...
', 15 => 'We have to test...
', 16 => 'We have to test ...
', 17 => 'We have to test s...
', 18 => 'We have to test so...
', 19 => 'We have to test som...
', 20 => 'We have to test some...
', 21 => 'We have to test somet...
', 22 => 'We have to test someth...
', 23 => 'We have to test somethi...
', 24 => 'We have to test somethin...
', 25 => 'We have to test something...
', 26 => 'We have to test something....
', 27 => 'We have to test something.
', ]; while ( my( $key, $val ) = splice @{$test}, 0, 2 ){ $ht->chars( $key ); my $result; is( $result = $ht->truncate( $html ), $val, $result ); # diag( $key . ' ' . $ht->truncate( $html ) ) if $ENV{TEST_VERBOSE}; } HTML-Truncate-0.20/t/countdown.t 0000644 0000765 0000024 00000003537 11017363607 016541 0 ustar jinx staff 0000000 0000000 #!perl use strict; use Test::More tests => 27; use FindBin; use File::Spec; use lib File::Spec->catfile($FindBin::Bin, 'lib'); use HTML::Truncate; print $HTML::Truncate::VERSION, $/; my $ht = HTML::Truncate->new(); $ht->ellipsis('...'); my $html = 'We have to test something.
'; my $test = [ 1 => 'W...
', 2 => 'We...
', 3 => 'We...
', 4 => 'We h...
', 5 => 'We ha...
', 6 => 'We hav...
', 7 => 'We have...
', 8 => 'We have...
', 9 => 'We have t...
', 10 => 'We have to...
', 11 => 'We have to...
', 12 => 'We have to t...
', 13 => 'We have to te...
', 14 => 'We have to tes...
', 15 => 'We have to test...
', 16 => 'We have to test...
', 17 => 'We have to test s...
', 18 => 'We have to test so...
', 19 => 'We have to test som...
', 20 => 'We have to test some...
', 21 => 'We have to test somet...
', 22 => 'We have to test someth...
', 23 => 'We have to test somethi...
', 24 => 'We have to test somethin...
', 25 => 'We have to test something...
', 26 => 'We have to test something...
', 27 => 'We have to test something.
', ]; while ( my( $key, $val ) = splice @{$test}, 0, 2 ){ $ht->chars( $key ); my $result; is( $result = $ht->truncate( $html ), $val, $result ); # diag( $key . ' ' . $ht->truncate( $html ) ) if $ENV{TEST_VERBOSE}; } HTML-Truncate-0.20/t/linked-readmore.t 0000644 0000765 0000024 00000003236 11202703545 017552 0 ustar jinx staff 0000000 0000000 use warnings; use Encode; use Test::More tests => 1; use HTML::Truncate; my $html = <<"";We have to test something.
'; my $test = [ 1 => '...
', 2 => 'We...
', 3 => 'We...
', 4 => 'We...
', 5 => 'We...
', 6 => 'We...
', 7 => 'We have...
', 8 => 'We have...
', 9 => 'We have...
', 10 => 'We have to...
', 11 => 'We have to...
', 12 => 'We have to...
', 13 => 'We have to...
', 14 => 'We have to...
', 15 => 'We have to test...
', 16 => 'We have to test...
', 17 => 'We have to test ...
', 18 => 'We have to test ...
', 19 => 'We have to test ...
', 20 => 'We have to test ...
', 21 => 'We have to test ...
', 22 => 'We have to test ...
', 23 => 'We have to test ...
', 24 => 'We have to test ...
', 25 => 'We have to test something...
', 26 => 'We have to test something...
', 27 => 'We have to test something.
', ]; while ( my( $key, $val ) = splice @{$test}, 0, 2 ){ $ht->chars( $key ); my $result; is( $result = $ht->truncate( $html ), $val, $result ); # diag( $key . ' ' . $ht->truncate( $html ) ) if $ENV{TEST_VERBOSE}; } HTML-Truncate-0.20/t/pre.t 0000644 0000765 0000024 00000002127 11017366616 015304 0 ustar jinx staff 0000000 0000000 use strict; use Test::More tests => 5; use FindBin; use File::Spec; use lib File::Spec->catfile($FindBin::Bin, 'lib'); use HTML::Truncate; my $ht = HTML::Truncate->new(); my $html = join('', ); { my $char_count = 50; $ht->chars($char_count); ok( $ht->chars() == $char_count, "Chars is reset to $char_count" ); ok( my $trunc = $ht->truncate($html), "Truncating HTML" ); is( $trunc, _with_pre(), "Truncation ofversion matches expectations" ); } { # Swap tags, run otherwise same HTML through $html =~ s,(?)pre,${1}p,g; ok( my $trunc = $ht->truncate($html), "Truncating HTML" ); is( $trunc, _with_p(), "Truncation ofversion matches expectations" ); } sub _with_pre { return q{
}; } sub _with_p { return q{Some indentation with tags inside An…}; } __DATA__Some indentation with tags inside And another line…
HTML-Truncate-0.20/t/repair.t 0000644 0000765 0000024 00000002331 11201365403 015761 0 ustar jinx staff 0000000 0000000 use strict; use warnings; use Test::More tests => 14; use HTML::Truncate; my $cases = { 1 => [ 'foobar', 'foobar'], 2 => [ 'Some indentation with tags inside And another linefoobar
', 'foobar
'], 3 => [ 'foobar', 'foobar'], 4 => [ 'foobar', 'foobar'], 5 => [ 'foobar', 'foobar'], 6 => [ 'foobar quux', 'foobar quux'], 7 => [ 'foobar
quux', 'foobar
quux'], 8 => [ 'foobar
', '
quux
.
.foobar
' ], }; ok( my $ht = HTML::Truncate->new(), "HTML::Truncate->new()" ); isa_ok( $ht, 'HTML::Truncate' ); ok( !$ht->repair, '$ht->repair defaults properly' ); $ht->repair(1); ok( $ht->repair, '$ht->repair(1)' ); $ht->repair(); ok( $ht->repair, 'No change' ); $ht->repair(0); ok( !$ht->repair, '$ht->repair(0)' ); $ht->repair(1); for my $key (sort keys %{$cases}) { is( $ht->truncate($cases->{$key}->[0]), $cases->{$key}->[1], "Repaired case $key"); } 1; HTML-Truncate-0.20/t/synopsis.t 0000644 0000765 0000024 00000001307 11017065316 016375 0 ustar jinx staff 0000000 0000000 use strict; use FindBin; use File::Spec; use lib File::Spec->catfile($FindBin::Bin, 'lib'); use Test::More tests => 2; #------------------------------------------------------------------ open F, '<', "$FindBin::Bin/../lib/HTML/Truncate.pm" or die "Couldn't open self module to read!"; my $synopsis = ''; while (
quux
.
.) { if ( /=head1 SYNOPSIS/i .. /=head\d (?!S)/ and not /^=/ ) { $synopsis .= $_; } } close F; ok( $synopsis, "Got code out of the SYNOPSIS space to evaluate" ); diag( $synopsis ) if $ENV{TEST_VERBOSE}; my $ok = eval "$synopsis; print qq{\n}; 1;"; ok( $ok, "Synopsis eval'd" ); diag( $@ . "\n" . $synopsis ) if $@ and $ENV{TEST_VERBOSE}; HTML-Truncate-0.20/t/truncate-html.t 0000644 0000765 0000024 00000007643 11202700657 017306 0 ustar jinx staff 0000000 0000000 use strict; use Test::More tests => 23; use FindBin; use File::Spec; use lib File::Spec->catfile($FindBin::Bin, 'lib'); use HTML::Truncate; use Encode; ok( my $ht = HTML::Truncate->new(), "HTML::Truncate->new()" ); isa_ok( $ht, 'HTML::Truncate' ); ok( $ht->ellipsis() eq '…', "Ellipsis defaults properly"); diag ( 'Ellipsis: "' . $ht->ellipsis() . '"' ) if $ENV{TEST_VERBOSE}; SKIP: { skip "perl 5.8 or better for unicode features", 4 if $] < 5.008; ok( $ht->utf8_mode(1), "Set utf8_mode" ); ok( $ht->utf8_mode(), "Get utf8_mode" ); ok( $ht->ellipsis() eq chr(8230), "Ellipsis defaults properly" ); if ( $ENV{TEST_VERBOSE} ) { my $ellipsis = Encode::encode_utf8( $ht->ellipsis() ); diag( qq{Ellipsis: "$ellipsis"} ); } ok( $ht->utf8_mode(undef), "Unset utf8_mode" ); } ok( ! $ht->utf8_mode(), "Check utf8_mode is 'off'" ); my $html = join('', ); my $original_length = length($html); is( $original_length, 974, "Test HTML is expected length" ); diag("Length of original corpus is $original_length") if $ENV{TEST_VERBOSE}; ok( $ht->chars() == 100, "Chars is defaulting properly" ); { my $char_count = 10; ok( $ht->chars($char_count), "Setting chars to $char_count" ); ok( $ht->chars() == $char_count, "Chars is reset to $char_count" ); SKIP: { skip "perl 5.8 or better for unicode features", 4 if $] < 5.01; ok( $ht->utf8_mode(1), "Setting utf8_mode" ); ok( $ht->cleanly(undef), "Turning off cleanly"); ok( my $trunc = $ht->truncate($html), "Truncating HTML" ); my $strip = $trunc; _strip_html($strip); is( length($strip), $ht->chars + length($ht->ellipsis), "Length from character count matches expectation" ); diag("TRUNCATED:\n" . Encode::encode_utf8($trunc)) if $ENV{TEST_VERBOSE}; diag(" STRIPPED:\n" . Encode::encode_utf8($strip)) if $ENV{TEST_VERBOSE}; } } { my $char_count = 100; ok( $ht->chars($char_count), "Setting chars to $char_count" ); ok( $ht->chars() == $char_count, "Chars is reset to $char_count" ); ok( $ht->utf8_mode(1), "Setting utf8_mode" ); ok( my $trunc = $ht->truncate($html), "Truncating HTML" ); my $strip = $trunc; _strip_html($strip); is( length($strip), ( $ht->chars + length($ht->ellipsis) ), "Length from character count matches expectation" ); diag("TRUNCATED:\n" . Encode::encode_utf8($trunc)) if $ENV{TEST_VERBOSE}; diag(" STRIPPED:\n" . Encode::encode_utf8($strip)) if $ENV{TEST_VERBOSE}; } ok( $ht->percent('52%'), 'Setting percentage to 52%' ); ok( my $renewed = $ht->truncate($html), "Truncating" ); #is( length($renewed), 580, # "Length from percentage matches expectation" ); sub _strip_html { # Simple HTML stripper since we know the content is clean for it. $_[0] =~ s/\&(?=\S)[^;]+;/./g; $_[0] =~ s/
/./g; $_[0] =~ s/<[^>]+>//g; $_[0] =~ s/\s\s+/ /g; $_[0] =~ s/\A\s+//g; } __DATA__
HTML-Truncate-0.20/xt/ 0000755 0000765 0000024 00000000000 11226750572 014517 5 ustar jinx staff 0000000 0000000 HTML-Truncate-0.20/xt/perl-critic.t 0000644 0000765 0000024 00000001012 11033211704 017074 0 ustar jinx staff 0000000 0000000 use strict; use warnings; use File::Spec; use FindBin (); use Test::More; if ( !-e "$FindBin::Bin/../MANIFEST.SKIP" ) { plan skip_all => 'Critic test only for developers.'; } else { eval { require Test::Perl::Critic }; if ( $@ ) { plan tests => 1; fail( 'You must install Test::Perl::Critic to run perl-critic.t' ); exit; } } # my $rcfile = File::Spec->catfile( 't', '04critic.rc' ); # Test::Perl::Critic->import( -profile => $rcfile ); Test::Perl::Critic::all_critic_ok(); HTML-Truncate-0.20/xt/pod-coverage.t 0000644 0000765 0000024 00000000254 11033211704 017241 0 ustar jinx staff 0000000 0000000 #!perl -T use Test::More; eval "use Test::Pod::Coverage 1.04"; plan skip_all => "Test::Pod::Coverage 1.04 required for testing POD coverage" if $@; all_pod_coverage_ok(); HTML-Truncate-0.20/xt/pod.t 0000644 0000765 0000024 00000000214 11033211704 015444 0 ustar jinx staff 0000000 0000000 #!perl -T use Test::More; eval "use Test::Pod 1.14"; plan skip_all => "Test::Pod 1.14 required for testing POD" if $@; all_pod_files_ok();About the “author”
11/20/2003 Tags to test and check and such.
I think we can do this in a pretty straightforward fashion otherwise.
It’s a link along with this. I dislike formatting dummy HTML. 15th generation Americans are sometimes decent fellows though gentlemen may differ.
Now you have plenty to test.
—Moo-cow-moo
Something else.