structure intact for inclusion as "multipart/alternative" content. Further, it will
preserve the formatting specified by the <td> tag's "align" attribute, and will
also preserve multiline text inside of a <td> element provided it is broken using
<br /> tags.
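The alignment handling described above can be illustrated with a small, self-contained sketch. Note that pad_cell() is a hypothetical helper written for this example; it is not part of the module's API, and it simplifies the arithmetic used inside the module (every cell here ends up exactly column width plus twice the cell padding wide).

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of HTML::FormatText::WithLinks::AndTables):
# pad one line of cell text to a fixed column width, honouring an
# "align" value of 'left', 'right' or 'center'.
sub pad_cell {
    my ($line, $col_width, $cellpadding, $align) = @_;
    $align ||= 'left';

    # spare room inside the column, never negative
    my $spare = $col_width - length $line;
    $spare = 0 if $spare < 0;

    my $pad = ' ' x $cellpadding;
    if ($align eq 'center') {
        my $pre  = int($spare / 2);   # half of the spare room in front
        my $post = $spare - $pre;     # any odd remainder goes behind
        return $pad . (' ' x $pre) . $line . (' ' x $post) . $pad;
    }
    if ($align eq 'right') {
        return $pad . (' ' x $spare) . $line . $pad;
    }
    return $pad . $line . (' ' x $spare) . $pad;   # left is the default
}

# Show the three alignments for a 6-character column with padding of 1.
print '[', pad_cell('ab', 6, 1, $_), "]\n" for qw(left center right);
```

The brackets in the printed output only make the surrounding whitespace visible; as with the module itself, the result lines up only in a fixed-width font.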
INSTALLATION
To install this module, run the following commands:
perl Makefile.PL
make
make test
make install
SUPPORT AND DOCUMENTATION
After installing, you can find documentation for this module with the
perldoc command.
perldoc HTML::FormatText::WithLinks::AndTables
You can also look for information at:
RT, CPAN's request tracker
http://rt.cpan.org/NoAuth/Bugs.html?Dist=HTML-FormatText-WithLinks-AndTables
AnnoCPAN, Annotated CPAN documentation
http://annocpan.org/dist/HTML-FormatText-WithLinks-AndTables
CPAN Ratings
http://cpanratings.perl.org/d/HTML-FormatText-WithLinks-AndTables
Search CPAN
http://search.cpan.org/dist/HTML-FormatText-WithLinks-AndTables
COPYRIGHT AND LICENCE
Copyright (C) 2008 Shaun Fryer
This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.
HTML-FormatText-WithLinks-AndTables/t/00-load.t
#!perl -T
use Test::More tests => 1;
BEGIN {
use_ok( 'HTML::FormatText::WithLinks::AndTables' );
}
diag( "Testing HTML::FormatText::WithLinks::AndTables $HTML::FormatText::WithLinks::AndTables::VERSION, Perl $], $^X" );
HTML-FormatText-WithLinks-AndTables/t/html-formattext-withlinks-andtables.t
use strict;
use HTML::FormatText::WithLinks::AndTables;
my $html = q|
BIG
How's it going?
How's it going?
How's it going?
How's it going?
How's it going?
How's it going?
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Donec hendrerit venenatis dolor. Suspendisse in neque id odio auctor porttitor. Cras adipiscing, orci in venenatis semper, nibh tortor posuere magna, ac blandit sapien purus et ligula. Proin et libero. Duis pellentesque, tellus a viverra pretium, lacus urna fermentum elit, et tempor nibh urna ac erat. Sed suscipit, enim in vulputate aliquam, mi ligula viverra enim, vitae mollis tortor metus ac sapien. Maecenas risus ligula, viverra eget, sagittis at, ultrices in, ante. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Nunc dictum. Praesent gravida neque quis odio. Mauris lacus nulla, iaculis eu, commodo sit amet, molestie sed, diam. In vel ligula.
#1###
#2#########
#3###
#4###
#5###
|
%1%%%
%2%%%
%3%%%
%4%%%
%5%%%%%%%%%%%
|
| booo! |
#6###
#7#########
#8###
#9###
#10##
|
%6%%%
%7%%%
%8%%%
%9%%%
%10%%%%%%%%%%
|
http://xyz
|;
my $text = HTML::FormatText::WithLinks::AndTables->convert($html, {rm=>80,cellpadding=>2});
my $expected =
q| BIG
===
How's it going?
How's it going?
How's it going?
How's it going?
How's it going?
How's it going?
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Donec hendrerit
venenatis dolor. Suspendisse in neque id odio auctor porttitor. Cras
adipiscing, orci in venenatis semper, nibh tortor posuere magna, ac blandit
sapien purus et ligula. Proin et libero. Duis pellentesque, tellus a viverra
pretium, lacus urna fermentum elit, et tempor nibh urna ac erat. Sed suscipit,
enim in vulputate aliquam, mi ligula viverra enim, vitae mollis tortor metus
ac sapien. Maecenas risus ligula, viverra eget, sagittis at, ultrices in,
ante. Pellentesque habitant morbi tristique senectus et netus et malesuada
fames ac turpis egestas. Nunc dictum. Praesent gravida neque quis odio. Mauris
lacus nulla, iaculis eu, commodo sit amet, molestie sed, diam. In vel ligula.
#1### %1%%%
#2######### %2%%%
#3### %3%%%
#4### %4%%%
#5### %5%%%%%%%%%%%
booo!
#6### %6%%%
#7######### %7%%%
#8### %8%%%
#9### %9%%%
#10## %10%%%%%%%%%%
[1]http://xyz
1. http://xyz
|;
use Test::More tests=>1;
is $text, $expected,
'HTML::FormatText::WithLinks::AndTables->convert($html,{rm=>80,cellpadding=>2})';
HTML-FormatText-WithLinks-AndTables/t/boilerplate.t
#!perl -T
use strict;
use warnings;
use Test::More tests => 3;
sub not_in_file_ok {
my ($filename, %regex) = @_;
open( my $fh, '<', $filename )
or die "couldn't open $filename for reading: $!";
my %violated;
while (my $line = <$fh>) {
while (my ($desc, $regex) = each %regex) {
if ($line =~ $regex) {
push @{$violated{$desc}||=[]}, $.;
}
}
}
if (%violated) {
fail("$filename contains boilerplate text");
diag "$_ appears on lines @{$violated{$_}}" for keys %violated;
} else {
pass("$filename contains no boilerplate text");
}
}
sub module_boilerplate_ok {
my ($module) = @_;
not_in_file_ok($module =>
'the great new $MODULENAME' => qr/ - The great new /,
'boilerplate description' => qr/Quick summary of what the module/,
'stub function definition' => qr/function[12]/,
);
}
TODO: {
local $TODO = "Need to replace the boilerplate text";
not_in_file_ok(README =>
"The README is used..." => qr/The README is used/,
"'version information here'" => qr/to provide version information/,
);
not_in_file_ok(Changes =>
"placeholder date/time" => qr(Date/time)
);
module_boilerplate_ok('lib/HTML/FormatText/WithLinks/AndTables.pm');
}
HTML-FormatText-WithLinks-AndTables/t/pod.t
#!perl -T
use strict;
use warnings;
use Test::More;
# Ensure a recent version of Test::Pod
my $min_tp = 1.22;
eval "use Test::Pod $min_tp";
plan skip_all => "Test::Pod $min_tp required for testing POD" if $@;
all_pod_files_ok();
HTML-FormatText-WithLinks-AndTables/Changes
Revision history for HTML-FormatText-WithLinks-AndTables
0.02 Thu 7 Jun 2012
Modified so content text of "0" prints properly
0.01 Wed 10 Dec 2008
First version, released on an unsuspecting world.
HTML-FormatText-WithLinks-AndTables/Makefile.PL
use strict;
use warnings;
use ExtUtils::MakeMaker;
WriteMakefile(
NAME => 'HTML::FormatText::WithLinks::AndTables',
AUTHOR => 'Shaun Fryer',
VERSION_FROM => 'lib/HTML/FormatText/WithLinks/AndTables.pm',
ABSTRACT_FROM => 'lib/HTML/FormatText/WithLinks/AndTables.pm',
PL_FILES => {},
PREREQ_PM => {
'Test::More' => 0,
'HTML::TreeBuilder' => 0,
'HTML::FormatText::WithLinks' => 0,
},
dist => { COMPRESS => 'gzip -9f', SUFFIX => 'gz', },
clean => { FILES => 'HTML-FormatText-WithLinks-AndTables-*' },
);
HTML-FormatText-WithLinks-AndTables/lib/HTML/FormatText/WithLinks/AndTables.pm
package HTML::FormatText::WithLinks::AndTables;
use strict;
use warnings;
use base 'HTML::FormatText::WithLinks';
use HTML::TreeBuilder;
################################################################################
# configuration defaults
################################################################################
my $cellpadding = 1; # number of horizontal spaces to pad interior of cells
my $no_rowspacing = 0; # boolean, suppress space between table rows and rows with empty <td>s
################################################################################
=head1 NAME
HTML::FormatText::WithLinks::AndTables - Converts HTML to Text with tables intact
=head1 VERSION
Version 0.02
=cut
our $VERSION = '0.02';
=head1 SYNOPSIS
use HTML::FormatText::WithLinks::AndTables;
my $text = HTML::FormatText::WithLinks::AndTables->convert($html);
Or optionally...
my $conf = { # same as HTML::FormatText excepting below
cellpadding => 2, # defaults to 1
no_rowspacing => 1, # bool, suppress vertical space between table rows
};
my $text = HTML::FormatText::WithLinks::AndTables->convert($html, $conf);
=head1 DESCRIPTION
This module was inspired by HTML::FormatText::WithLinks, which has proven to be a
useful `lynx -dump` work-alike. However, one frustration was that no other HTML
converters I came across could deal effectively with HTML <table>s.
This module can do so in a rudimentary sense. The aim was to take
a simple HTML-based email template and convert it to text with the
structure intact, for inclusion as "multipart/alternative" content. Further, it will
preserve the formatting specified by the <td> tag's "align" attribute, and will
also preserve multiline text inside of a <td> element provided it is broken using
<br /> tags.
=head2 EXPORT
None by default.
=head1 METHODS
=head2 convert
=cut
my $parser_indent = 3; # HTML::FormatText::WithLinks adds this indent to data
my $conf_defaults = {};
# the one and only public interface
sub convert {
shift if $_[0] eq __PACKAGE__; # to make it function friendly
my ($html, $conf) = @_;
# over-ride our defaults
if ($conf and ref $conf eq 'HASH') {
$no_rowspacing = $$conf{no_rowspacing} if $$conf{no_rowspacing};
delete $$conf{no_rowspacing};
$cellpadding = $$conf{cellpadding} if $$conf{cellpadding};
delete $$conf{cellpadding};
%$conf_defaults = (%$conf_defaults, %$conf);
}
return __PACKAGE__->new->parse($html);
}
# sub-class configure
sub configure {
shift()->SUPER::configure($conf_defaults);
}
# sub-class parse
sub parse {
my $self = shift;
my $html = shift;
return undef unless defined $html;
return '' if $html eq '';
my $tree = HTML::TreeBuilder->new->parse( $html );
return $self->_format_tables( $tree ); # we work our magic...
}
# a private method
sub _format_tables {
my $self = shift;
my $tree = shift;
my $formatted_tables = []; # a nested stack for our formatted table text
# the result of an all night programming session...
#
# essentially we take two passes over each table
# and modify the structure of text and html by replacing content with tokens
# then replacing the tokens after _parse() has converted it to text
#
# for each <tr> in each <table>...
# we grab all its inner text (and/or parsed html), rearrange it into a
# single string of formatted text, and put a token into its first <td>
# once we have processed the html with _parse(), we replace the tokens with the
# corresponding formatted text
my @tables = $tree->look_down(_tag=>'table');
my $table_count = 0;
for my $table (@tables) {
$formatted_tables->[$table_count] = [];
my @trs = $table->look_down(_tag=>'tr');
my @max_col_width; # max column widths by index
my @max_col_heights; # max column heights (for multi-line text) by index
my @col_lines; # a stack for our redesigned rows of column (<td>) text
FIRST_PASS: {
my $row_count = 0; # obviously a counter...
for my $tr (@trs) { # *** 1st pass over rows
$max_col_heights[$row_count] = 0;
$col_lines[$row_count] = [];
my @cols = $tr->look_down(_tag=>'td'); # no support for <th>. sorry.
for (my $i = 0; $i < scalar @cols; $i++) {
my $td = $cols[$i]->clone;
my $new_tree = HTML::TreeBuilder->new;
$new_tree->{_content} = [ $td ];
# parse the contents of the td into text
# this doesn't work well with nested tables...
my $text = __PACKAGE__->new->_parse($new_tree);
# we don't want leading or trailing whitespace
$text =~ s/^\s+//s;
$text =~ s/\s+\z//s;
# now we figure out the maximum widths and heights needed for each column
my $max_line_width = 0;
my @lines = split "\n", $text; # take the parsed text and break it into virtual rows
$max_col_heights[$row_count] = scalar @lines if scalar @lines > $max_col_heights[$row_count];
for my $line (@lines) {
my $line_width = length $line;
$max_line_width = $line_width if $line_width > $max_line_width;
}
$cols[$i]->{_content} = [ $text ];
$max_col_width[$i] ||= 0;
$max_col_width[$i] = $max_line_width if $max_line_width > $max_col_width[$i];
# now put the accumulated lines onto our stack
$col_lines[$row_count]->[$i] = \@lines;
}
$tr->{_content} = \@cols;
$row_count++;
}
}
SECOND_PASS: {
my $row_count = 0; # obviously, another counter...
for my $tr (@trs) { # *** 2nd pass over rows
my @cols = $tr->look_down(_tag=>'td'); # no support for <th>. sorry.
my $row_text; # the final string representing each row of reformatted text
my @col_rows; # a stack for each virtual $new_line spliced together from a group of <td>'s
# iterate over each column of the maximum rows of parsed multiline text per <td>
# for each virtual row of each virtual column, concat the text with alignment spacings
# the final concatenated string value will be placed in column 0
for (my $j = 0; $j < $max_col_heights[$row_count]; $j++) {
my $new_line;
for (my $i = 0; $i < scalar @cols; $i++) { # here are the actual <td> elements we're iterating over...
my $width = $max_col_width[$i] + $cellpadding; # how wide is this column of text
my $line = $col_lines[$row_count]->[$i]->[$j]; # get the text to fit into it
$line = defined $line ? $line : '';
# strip the whitespace from beginning and end of each line
$line =~ s/^\s+//gs;
$line =~ s/\s+\z//gs;
my $n_space = $width - length $line; # the difference between the column and text widths
# we are creating virtual rows of text within a single <td>
# so we need to add an indent to all but the first row to
# match the indent added by _parse() for presenting table contents
$line = ((' ')x$parser_indent). $line if $j != 0 and $i == 0;
# here we adjust the text alignment by wrapping the text in occulted whitespace
my $justify = $cols[$i]->tag eq 'td' ? ( $cols[$i]->attr('align') || 'left' ) : 'center';
if ($justify eq 'center') {
my $pre = int( ($n_space + $cellpadding) / 2 ); # divide remaining space in half
my $post = $n_space - $pre; # assign any uneven remainder to the end
$new_line .= ((' ')x$pre). $line .((' ')x$post); # wrap the text in spaces
} elsif ($justify eq 'left') {
$new_line .= ((' ')x$cellpadding). $line .((' ')x$n_space);
} else {
$new_line .= ((' ')x$n_space). $line .((' ')x$cellpadding);
}
}
$new_line .= "\n" if $j != $max_col_heights[$row_count] - 1; # add a newline to all but the last text row
$col_rows[$j] = $new_line; # put the line into the stack for this row
}
$row_text .= $_ for @col_rows;
for (my $i = 1; $i < scalar @cols; $i++) {
$cols[$i]->delete; # get rid of unneeded <td>'s
}
# put the fully formatted text into our accumulator
$formatted_tables->[$table_count]->[$row_count] = $row_text;
$cols[0]->content->[0] = "__TOKEN__${table_count}__${row_count}__"; # place a token into the row at col 0
$row_count++;
}
}
$table_count++;
}
# now replace our tokens
my $text = $self->_parse( $tree );
for (my $i = 0; $i < scalar @$formatted_tables; $i++) {
for (my $j = 0; $j < scalar @{ $$formatted_tables[$i] }; $j++) {
my $token = "__TOKEN__${i}__${j}__";
$token .= "\n?" if $no_rowspacing;
my $new_text = $$formatted_tables[$i][$j];
$text =~ s/$token/$new_text/;
}
}
return $text;
}
1;
__END__
=head1 EXAMPLE
Given the HTML below ...
Name: |
Mr. Foo Bar |
Address: |
#1-276 Quux Lane,
Schenectady, NY, USA,
12345
|
Email: |
foo@bar.baz |
... the (default) return value of convert() will be as follows.
Name: Mr. Foo Bar
Address: #1-276 Quux Lane,
Schenectady, NY, USA,
12345
Email: [1]foo@bar.baz
1. mailto:foo@bar.baz
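How the multi-line "Address:" cell above ends up beside its label can be sketched in plain Perl. This is only an illustration of the idea of splicing the lines of each cell into virtual rows of text; splice_row() is a hypothetical helper, not a function of this module, it handles left alignment only, and its exact spacing is an assumption rather than the module's real output.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of this module): splice the lines of every
# cell in one table row into left-aligned "virtual rows" of text.
#   $cells  - arrayref of cells, each an arrayref of text lines
#   $widths - arrayref of column widths (longest line per column)
#   $pad    - number of spaces of cell padding
sub splice_row {
    my ($cells, $widths, $pad) = @_;

    # the tallest cell decides how many virtual rows are needed
    my $height = 0;
    for my $lines (@$cells) {
        $height = @$lines if @$lines > $height;
    }

    my @rows;
    for my $j (0 .. $height - 1) {
        my $row = '';
        for my $i (0 .. $#$cells) {
            my $line = $cells->[$i][$j];
            $line = '' unless defined $line;   # shorter cells contribute blanks
            $row .= (' ' x $pad) . sprintf('%-*s', $widths->[$i] + $pad, $line);
        }
        $row =~ s/\s+\z//;                     # drop trailing padding
        push @rows, $row;
    }
    return @rows;
}

my @rows = splice_row(
    [ ['Address:'], ['#1-276 Quux Lane,', 'Schenectady, NY, USA,', '12345'] ],
    [ 8, 21 ],
    1,
);
print "$_\n" for @rows;
```

The label appears only on the first virtual row; the remaining address lines are indented so that they start in the same column.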
=head1 SEE ALSO
HTML::FormatText::WithLinks
HTML::TreeBuilder
=head1 CAVEATS
* This does not handle <th> elements whatsoever!
* It assumes a fixed-width font for display of the resulting text.
* It doesn't work well on nested <table>s or other nested blocks within <table>s.
=head1 AUTHOR
Shaun Fryer
=head1 BUGS
Please report any bugs or feature requests through the RT web interface at
L<http://rt.cpan.org/NoAuth/Bugs.html?Dist=HTML-FormatText-WithLinks-AndTables>.
I will be notified, and then you'll
automatically be notified of progress on your bug as I make changes.
=head1 SUPPORT
You can find documentation for this module with the perldoc command.
perldoc HTML::FormatText::WithLinks::AndTables
You can also look for information at:
=over 4
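=item * RT: CPAN's request tracker
L<http://rt.cpan.org/NoAuth/Bugs.html?Dist=HTML-FormatText-WithLinks-AndTables>
=item * AnnoCPAN: Annotated CPAN documentation
L<http://annocpan.org/dist/HTML-FormatText-WithLinks-AndTables>
=item * CPAN Ratings
L<http://cpanratings.perl.org/d/HTML-FormatText-WithLinks-AndTables>
=item * Search CPAN
L<http://search.cpan.org/dist/HTML-FormatText-WithLinks-AndTables>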
=back
=head1 ACKNOWLEDGEMENTS
Everybody. :)
=head1 COPYRIGHT & LICENSE
Copyright 2008 Shaun Fryer, all rights reserved.
This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.
=cut
HTML-FormatText-WithLinks-AndTables/MANIFEST
Changes
MANIFEST
Makefile.PL
README
lib/HTML/FormatText/WithLinks/AndTables.pm
t/00-load.t
t/pod.t
t/html-formattext-withlinks-andtables.t