Regexp-Grammars-1.033/000755 000765 000765 00000000000 12210257320 015251 5ustar00damiandamian000000 000000 Regexp-Grammars-1.033/Build.PL000644 000765 000765 00000001031 12210257117 016544 0ustar00damiandamian000000 000000 use strict; use warnings; use Module::Build; my $builder = Module::Build->new( module_name => 'Regexp::Grammars', license => 'perl', dist_author => 'Damian Conway ', dist_version_from => 'lib/Regexp/Grammars.pm', requires => { 'Test::More' => 0, 'version' => 0, 'perl' => '5.10.0', ($] >= 5.018 ? ( 'Lexical::Var' => 0.007 ) : ()) }, add_to_cleanup => [ 'Regexp-Grammars-*' ], ); $builder->create_build_script(); Regexp-Grammars-1.033/Changes000644 000765 000765 00000017345 12210257301 016555 0ustar00damiandamian000000 000000 Revision history for Regexp-Grammars 1.001_003 Tue Apr 7 08:42:33 2009 Initial public release. 1.001_004 Sun Aug 2 23:08:52 2009 * Fixed mishandling of (??{....}) blocks * Attempted to patch around three-way bug with lexicals in regexes (further testing may be required to ensure patch is effective across various perl configs) 1.001_005 Sun Aug 2 23:08:52 2009 * Rerelease to sync numbering with CPAN 1.002 Tue Dec 8 21:30:11 2009 * Various doc tweaks * Removed intermittent "uninitialized" warnings * Added warning when no main regex specified * Refined error message status indicators (now only errors get an indicator for every separate message; info is consolidated) * Fixed exponentiation associativity for demo_calc (thanks Schwern!) * Fixed bug in charset parsing (thanks Dave!) * Removed false error messages regarding explicitly use of built-in and rules * Fixed bug with negative lookaheads that incorporate subrule calls * Localized $/ during debugger interactions * Added variation to allow distinct "internal" and "external" names for objrules and objtokens (thanks Casiano) * Fixed handling of (?#...) comments (thanks Casiano) * Added pure grammar definitions: * Added inheritance from grammar definitions: * Added fully qualified subrule calls to allow derived rules to call base rules: 1.005 Tue Jun 22 05:41:35 2010 * Tweaked internals to allow matches against stringifying objects, without nasty warnings * Extended demo/calc* to allow negatives outside parens (thanks Steven) * Pod tweaks (thanks Carl) * Added autoaction callbacks * Made @! contents unique (no more duplicates due to backtracking retries) * Made work in the top-level pattern of a grammar * Added set_context_width() to allow width of context string column to be adjusted either permanently or within a scope. (Thanks Daniel) * Added l10n feature for and directives (thanks Aki!) * Added directive * Fixed debugging directives in grammars * Added per-hash key patterns to <%HASH> (Thanks Aki) * Added <\IDENT> backrefs * Added inverserefs 1.008 Fri Sep 17 20:53:31 2010 * Pod nits denitted (thanks Christopher) * Added builtin and subrules * Added list)> and %ARG * Added <:argname> * Added <\:argname> and * Added Lucene example to demos (thanks Christian) * Added directive * Updated diagnostics list * Improved behaviour (and documentation) of non-bracketed separators in ** 1.009 Sun Sep 19 09:11:06 2010 * Rerelease to remove spurious dependencies on Data::Show (thanks Salvatore!) 1.010 Tue Sep 28 08:03:42 2010 * Added documentation warning about non-reentrancy of Perl 5 regex engine (thanks Andrew). * Fixed behaviour of ** repetitions wrt whitespace (thanks Andrew) * Documented more explicitly that start-pattern is supposed to act like a regular regex (or a token) wrt to whitespace 1.011 Sun Oct 10 18:57:10 2010 * Added as alias for: (?! <.RULENAME> ) * Added as alias for: (?= <.RULENAME> ) (and made it work around normal lookahead/capture problem) * Fixed major bugs in <:arg> handling 1.012 Wed Nov 3 20:24:36 2010 * Added RFC5322 example (thanks Tom and Abigail!) * Added <:nocontext> and <:context> directives to optimize away unwanted context substrings. * Solved transitive inheritance problem (grammars now fully polymorphic) * Added NEXT:: namespace for generic polymorphism 1.013 Wed Jun 29 14:39:40 2011 * Improved in-doc calculator example (thanks Jake!) * Improved RFC5322 example (thanks Tom and Abigail) * Added directive (thanks Dan) * Added directive * Added better compile-time debugging of standard Perl subpatterns * Added documentation of problems when using objrules whose ctors re-invoke the regex engine (thanks Nathan) * Added new tests for objrules whose classes are based on Moose or autoloading (thanks Nathan!) 1.014 Wed Nov 2 13:57:09 2011 * Improved description of directive to make it clearer that errors manifest in @! variable (thanks Leigh) * Added t/error_non_backtracking.t and demo/demo_error_nonbacktracking.pl to demonstrate use of (*COMMIT) to optimize error messages (thanks Nicolas) * Removed undocumented dependency of test suite on Class::Accessor (thanks Duff) * Tweaked caveats section to reflect improvements both in module (grammar inheritance now fully polymorphic) and in Perl 5.14 regexes (regexes now reentrant) * Fixed problem with \N{NAMED CHARS} under 5.12 and later (thanks Tom!) * Added *% +% and {n,m}% separated repetitions to track the Perl 6 feature 1.015 Wed Feb 29 12:37:25 2012 * Enabled limited support for tracking raw regex components when debugging a grammar (mainly literals and backslashed metacharacters) * Fixed bug that prevented named subpattern captures from including lookbehinds (e.g. didn't work) 1.016 Sat Mar 10 07:01:30 2012 * Fixed omissions in charset recognition within metagrammar (now handles \] and otehr escapes correctly) 1.020 Thu Aug 16 14:13:03 2012 * Fixed licence generation in Makefile.PL * Fixed issues with the stupid behaviour of Perl 5.17+ wrt (un)backslashed {'s * Fixed bad code in SYNOPSIS example (Thanks Paul!) * Fixed bad code in demo/demo_pos.pl (Thanks Peng) * Corrected docs for (Thanks Peng) * BACKWARDS INCOMPATIBLE CHANGE!!!! Due to limitations in Perl's qr overload The <\IDENT> backreference syntax has had to be changed to <\_IDENT> 1.021 Mon Aug 20 13:55:34 2012 * Made Latin-1 encoding of docs explicit 1.022 Tue Jan 22 18:39:16 2013 * Fixed minimization (by handling nocontext marker correctly) (thanks Thomas!) 1.025 Wed Jan 30 09:39:52 2013 * Reuploaded with no substantive modifications 1.026 Thu Jan 31 08:20:14 2013 * Re-fixed minimization (by handling nocontext marker correctly) (thanks again Thomas!) 1.027 Fri May 10 07:43:52 2013 * Fixed nasty bug where 0 used as an atom (many thanks Arseny!) 1.028 Sat May 11 06:00:00 2013 * Fixed very nasty caching bug within interpolation support (many more thanks Arseny!) 1.029 Tue Jun 25 15:44:35 2013 * Doc patch (thanks Steven) * Added workarounds for some unfortunate changes in 5.18 behaviour (thanks Steven) * CRITICAL: Added warning regarding fundamental and intractable incompatibilities with Perl 5.18, and announcing that Regexp::Grammars is not supported under that version of Perl. :-( 1.030 Wed Jun 26 07:17:41 2013 * Added essential-but-missing Skip_if_Perl_518 to the MANIFEST 1.031 Mon Aug 19 09:55:37 2013 * Updated warning re 5.18 incompatibilities. Some progress has been made, but a complete solution is still at least a month away, possibly longer. * Added dependency on Lexical::Var under Perl 5.18 to overcome problem with magic pseudo-variables 1.032 Thu Aug 29 10:44:21 2013 * Culled stray DB::single = 1 (thanks Robert!) 1.033 Sat Aug 31 13:10:57 2013 * Listed dependency on Lexical::Var under Perl 5.18 to overcome problem with magic pseudo-variables (thanks Andreas!) Regexp-Grammars-1.033/demo/000755 000765 000765 00000000000 12210257320 016175 5ustar00damiandamian000000 000000 Regexp-Grammars-1.033/lib/000755 000765 000765 00000000000 12210257320 016017 5ustar00damiandamian000000 000000 Regexp-Grammars-1.033/Makefile.PL000644 000765 000765 00000001176 12210257074 017236 0ustar00damiandamian000000 000000 use strict; use warnings; use ExtUtils::MakeMaker; WriteMakefile( NAME => 'Regexp::Grammars', AUTHOR => 'Damian Conway ', LICENSE => 'perl', VERSION_FROM => 'lib/Regexp/Grammars.pm', ABSTRACT_FROM => 'lib/Regexp/Grammars.pm', PL_FILES => {}, PREREQ_PM => { 'Test::More' => 0, 'version' => 0, ($] >= 5.018 ? ( 'Lexical::Var' => 0.007 ) : ()) }, dist => { COMPRESS => 'gzip -9f', SUFFIX => 'gz', }, clean => { FILES => 'Regexp-Grammars-*' }, ); Regexp-Grammars-1.033/MANIFEST000644 000765 000765 00000003774 12210257320 016415 0ustar00damiandamian000000 000000 Build.PL Changes MANIFEST Makefile.PL README lib/Regexp/Grammars.pm lib/Skip_if_Perl_5_18.pm t/00.load.t t/alias_literal.t t/aliased_subpatterns.t t/arg.t t/autoflatten.t t/backref.t t/backref_ARG.t t/charset.t t/class_accessor.t t/comment.t t/data_structure.t t/error.t t/error_non_backtracking.t t/error_translate.t t/fatal.t t/fwdref.t t/grammar.t t/grammar_polymorphism.t t/hash.t t/hash_redef.t t/hash_redef_local.t t/inline_computation.t t/inline_computation_handler.t t/inline_computation_obj_handler.t t/local_ws.t t/lookaheads.t t/matchline.t t/matchpos.t t/moose.t t/neg_lookahead.t t/new.t t/new_init.t t/new_init_autoload.t t/new_init_limited_autoload.t t/new_init_limited_autoload_warn.t t/no_context.t t/no_context_counterlocal.t t/no_context_local.t t/obj_rename.t t/pod.t t/repop_ws.t t/seplist.t t/timeout.t t/top_is_token.t demo/demo_IP4.pl demo/demo_LaTeXish.pl demo/demo_LaTeXish_dump.pl demo/demo_Lucene_query.pl demo/demo_calc.pl demo/demo_calc_class.pl demo/demo_calc_inline.pl demo/demo_calc_list.pl demo/demo_calc_list_autoactions.pl demo/demo_calc_list_inline.pl demo/demo_debug.pl demo/demo_error.pl demo/demo_error_non_backtracking.pl demo/demo_flattening.pl demo/demo_hash.pl demo/demo_hash_lexicon.pl demo/demo_hash_lookup_generated.pl demo/demo_hash_lookup_hardcoded.pl demo/demo_hash_lookup_hashrule.pl demo/demo_hash_symtab.pl demo/demo_hashlines.pl demo/demo_list.pl demo/demo_metagrammar.pl demo/demo_pos.pl demo/demo_raw_debug.pl demo/demo_require.pl demo/demo_rfc5322.pl demo/demo_whoson.pl t/charnames.t t/seplist_countedhash_0.t t/seplist_countedhash_0_.t t/seplist_countedhash_0_1.t t/seplist_countedhash_0_N.t t/seplist_countedhash_1.t t/seplist_countedhash_1_.t t/seplist_countedhash_1_N.t t/seplist_countedhash_M_.t t/seplist_countedhash_M_N.t t/seplist_countedhash_N.t t/seplist_greediness.t t/seplist_plushash.t t/seplist_questionmark.t t/seplist_rawhash.t t/seplist_starhash.t t/error_non_hash_based.t t/minimize_bug.t META.yml Module meta-data (added by MakeMaker) Regexp-Grammars-1.033/META.yml000644 000765 000765 00000001123 12210257320 016517 0ustar00damiandamian000000 000000 --- #YAML:1.0 name: Regexp-Grammars version: 1.033 abstract: Add grammatical parsing features to Perl 5.10 regexes author: - Damian Conway license: perl distribution_type: module configure_requires: ExtUtils::MakeMaker: 0 build_requires: ExtUtils::MakeMaker: 0 requires: Test::More: 0 version: 0 no_index: directory: - t - inc generated_by: ExtUtils::MakeMaker version 6.57_05 meta-spec: url: http://module-build.sourceforge.net/META-spec-v1.4.html version: 1.4 Regexp-Grammars-1.033/README000644 000765 000765 00000001171 12210257301 016130 0ustar00damiandamian000000 000000 Regexp::Grammars version 1.033 This module adds a small number of new regex constructs that can be used within Perl 5.10 patterns to implement complete recursive-descent parsing. INSTALLATION To install this module, run the following commands: perl Makefile.PL make make test make install Alternatively, to install with Module::Build, you can use the following commands: perl Build.PL ./Build ./Build test ./Build install DEPENDENCIES Perl 5.10. COPYRIGHT AND LICENCE Copyright (C) 2009, Damian Conway This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself. Regexp-Grammars-1.033/t/000755 000765 000765 00000000000 12210257320 015514 5ustar00damiandamian000000 000000 Regexp-Grammars-1.033/t/00.load.t000644 000765 000765 00000000210 12204055461 017034 0ustar00damiandamian000000 000000 use Test::More tests => 1; BEGIN { use_ok( 'Regexp::Grammars' ); } diag( "Testing Regexp::Grammars $Regexp::Grammars::VERSION" ); Regexp-Grammars-1.033/t/alias_literal.t000644 000765 000765 00000001066 12161417502 020516 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; use Regexp::Grammars; my $test_grammar = qr{ one | negtwo | str }xms; no Regexp::Grammars; ok "one" =~ $test_grammar => 'One matched'; is $/{alt}, 1 => 'Correct alternative'; ok "negtwo" =~ $test_grammar => 'NegTwo matched'; is $/{alt}, -20 => 'Correct alternative'; ok "str" =~ $test_grammar => 'Str matched'; is $/{alt}, "str'ing" => 'Correct alternative'; Regexp-Grammars-1.033/t/aliased_subpatterns.t000644 000765 000765 00000002051 12162225203 021734 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; my $parser = do{ use Regexp::Grammars; qr{ | <_pat='".*"'> | }xms }; my $WARNINGS; my $lookbehind = do { use Regexp::Grammars; BEGIN { close *Regexp::Grammars::LOGFILE; open *Regexp::Grammars::LOGFILE, '>', \$WARNINGS; } qr{ }xms; }; if ($] < 5.018) { ok !defined $WARNINGS => 'No warnings'; } ok +('"abc"' =~ $parser) => 'Matched '; is $/{str}, '"abc"' => 'Captured correctly'; ok +(42 =~ $parser) => 'Matched '; is $/{num}, 42 => 'Captured correctly'; ok +('true' =~ $parser) => 'Matched '; is $/{bool}, 'true or false' => 'Pseudo-captured correctly'; ok +('barfoo' !~ $lookbehind) => 'Neg lookbehind worked'; ok +('foo' !~ $lookbehind) => 'Pos lookbehind worked'; ok +('carfoo' =~ $lookbehind) => 'Both lookbehinds worked'; is $/{foo}, 'foo' => 'Pseudo-captured correctly'; Regexp-Grammars-1.033/t/arg.t000644 000765 000765 00000002660 12204256243 016464 0ustar00damiandamian000000 000000 use strict; use 5.010; use Test::More; use Skip_if_Perl_5_18; plan 'no_plan'; my $test_grammar = do { use Regexp::Grammars; qr{ 'fo+/')> | 'end')> | <[revkeyword=unkeyword(?{ keyword => scalar reverse $MATCH{keyword} })]> (??{ quotemeta( ($ARG{prefix}//q{}) . $ARG{keyword} ) }) (<:delim>) }xms; }; #ok 'fooxdaa' !~ $test_grammar => 'Fail'; ok 'fooxoof' =~ $test_grammar => 'Match reverse'; is $/{keyword}, 'foo' => 'Keyword as expected'; is $/{content}, 'x' => 'Content as expected'; is_deeply $/{revkeyword}, ['oof'] => 'Revkeyword as expected'; ok 'fooxendfoo' =~ $test_grammar => 'Match end'; is $/{keyword}, 'foo' => 'Keyword as expected'; is $/{content}, 'x' => 'Content as expected'; is $/{unkeyword}, 'endfoo' => 'Unkeyword as expected'; ok 'fooxfoo/' =~ $test_grammar => 'Match /'; is $/{keyword}, 'foo' => 'Keyword as expected'; is $/{content}, 'x' => 'Content as expected'; is_deeply $/{dekeyword}, { "" =>'foo/', 'terminator'=>'foo/' } => 'Dekeyword as expected'; Regexp-Grammars-1.033/t/autoflatten.t000644 000765 000765 00000002114 12161417561 020237 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; my $parser = do{ use Regexp::Grammars; qr{ | | | 1)> \d++ ' ' | <[Dash=(-)]> ** <[Dot=(\.)]> }xms }; ok +("'abc'" =~ $parser) => 'Matched str'; is_deeply $/{str}, { q{} => "'abc'", content => 'abc' } => 'Unflattened correctly'; ok +(42 =~ $parser) => 'Matched num'; is $/{num}, 42 => 'Flattened correctly'; ok +('true' =~ $parser) => 'Matched true'; is $/{bool}, 't' => 'Flattened correctly'; ok +('false' =~ $parser) => 'Matched false'; is $/{bool}, 'alse' => 'Flattened correctly'; ok +('-.-.-' =~ $parser) => 'Matched list'; is_deeply $/{list}, { q{}=>'-.-.-', Dash=>['-','-','-'], Dot=>['.','.'] } => 'Flattened correctly'; ok +('-' =~ $parser) => 'Matched minimized list'; is $/{list}, '-' => 'Flattened correctly'; Regexp-Grammars-1.033/t/backref.t000644 000765 000765 00000003346 12161417575 017323 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; use Regexp::Grammars; my $delimited = qr{ <\_delim> }xms; my $delimited_cap = qr{ }xms; my $delimited_listcap = qr{ <[rdel=\_delim]> }xms; no Regexp::Grammars; while (my $input = ) { chomp $input; my $input_copy = $input; my ($input, $expected_outcome) = split /\s*:\s*/, $input; if ($expected_outcome eq 'succeed') { ok +($input =~ $delimited) => "Match of $input ${expected_outcome}ed"; is $/{delim}, substr($input,0,1) => "Captured delimiter"; is $/{content}, substr($input,1,-1) => "Captured content"; ok +($input =~ $delimited_cap) => "Match and capture of $input ${expected_outcome}ed"; is $/{delim}, substr($input,0,1) => "Captured delimiter"; is $/{content}, substr($input,1,-1) => "Captured content"; is $/{rdel}, substr($input,0,1) => "Captured backreference"; ok +($input =~ $delimited_listcap) => "Match and list capture of $input ${expected_outcome}ed"; is $/{delim}, substr($input,0,1) => "Captured delimiter"; is $/{content}, substr($input,1,-1) => "Captured content"; is_deeply $/{rdel}, [substr($input,0,1)] => "Captured backreference"; } else { ok +($input !~ $delimited) => "Match of $input ${expected_outcome}ed"; ok +($input !~ $delimited_cap) => "Match and capture of $input ${expected_outcome}ed"; ok +($input !~ $delimited_listcap) => "Match and list of $input ${expected_outcome}ed"; } } __DATA__ 'a' :succeed "abc" :succeed `` :succeed 'abc" :fail `abc' :fail Regexp-Grammars-1.033/t/backref_ARG.t000644 000765 000765 00000001513 12161417617 020003 0ustar00damiandamian000000 000000 use strict; use 5.010; use Test::More 'no_plan'; my $test_grammar = do { use Regexp::Grammars; qr{ | end_ <\:keyword> }xms; }; ok 'fooxend_foo' =~ $test_grammar => 'Match end'; is $/{keyword}, 'foo' => 'Keyword as expected'; is $/{content}, 'x' => 'Content as expected'; is $/{end_keyword}, 'end_foo' => 'End_keyword as expected'; ok 'fooxoof' =~ $test_grammar => 'Match rev'; is $/{keyword}, 'foo' => 'Keyword as expected'; is $/{content}, 'x' => 'Content as expected'; is $/{rev_keyword}, 'oof' => 'End_keyword as expected'; Regexp-Grammars-1.033/t/charnames.t000644 000765 000765 00000000330 12161417637 017654 0ustar00damiandamian000000 000000 use Test::More 'no_plan'; use 5.010; use charnames ':full'; use Regexp::Grammars; my $grammar = qr{ \N{LESS-THAN SIGN} a \N{GREATER-THAN SIGN} }xms; ok '' =~ $grammar => '\N{NAMED} correctly interpolated' Regexp-Grammars-1.033/t/charset.t000644 000765 000765 00000002447 12204256362 017351 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More; use Skip_if_Perl_5_18; plan 'no_plan'; use Regexp::Grammars; # This checks for a bug where [^\]] was not interpreted as a charset. my $bracket_bug = qr{ \[ \] }xms; my $escaped_bs = qr{ \[ \] }xms; my $old_bracket = qr{ \[ \] }xms; no Regexp::Grammars; while (my $input = ) { chomp $input; my ( $text, $to_match ) = split /:/, $input; if ( $to_match =~ $bracket_bug ) { ok( 'matched bracketed text with [^\]]+' ); is( $/{Bracketed}{text}, $text ); } else { fail( 'did not match bracketed text with [^\]]+' ); } if ( $to_match =~ $escaped_bs ) { ok( 'matched bracketed text with [^\\]+' ); is( $/{Bracketed}{text}, $text ); } else { fail( 'did not match bracketed text with [^\\]+' ); } if ( $to_match =~ /$old_bracket/ ) { ok( 'matched bracketed text with [^]]+' ); is( $/{Bracketed}{text}, $text ); } else { fail( 'did not match bracketed text with [^]]+' ); } } __DATA__ some text:[some text] and more text :[ and more text ] Regexp-Grammars-1.033/t/class_accessor.t000644 000765 000765 00000003152 12161422160 020672 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More 'tests' => 2; # Use this class declaration to check that classes with ctors # actually call the ctor when objrules use them... { package Speaker; sub new { my ($class, $data_ref) = @_; my $newobj = bless $data_ref, $class; $newobj->{ctor} = 'was called'; return $newobj; } } my $parser = do{ use Regexp::Grammars; qr{ \\"\> (?:\(\‎\‎\)) \w+ (?:(?:<.ws>|\-|\') \w+) \w+ \d+ }xms }; my $target = { "" => "Nathan Gray (‎kolibrie‎)", "speaker" => bless({ "" => "Nathan Gray (‎kolibrie‎)", "alias" => "kolibrie", "id" => 1613, "name" => "Nathan Gray", "ctor" => "was called", }, "Speaker"), }; my $input = do{ local $/; }; chomp $input; my $original_input = $input; ok +($input =~ $parser) => 'Matched'; is_deeply \%/, $target => 'Returned correct data structure'; #is $/{""}, $original_input => 'Captured entire text'; __DATA__ Nathan Gray (‎kolibrie‎) - ‎Practical Extraction with Regexp::Grammars‎ (50 min) 9 Regexp-Grammars-1.033/t/comment.t000644 000765 000765 00000000437 12161422171 017352 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; my $parser = do{ use Regexp::Grammars; qr{ ^ $ (?# ) a test # some here }xms; }; my $target = q{a test}; ok +($target =~ $parser) => 'Matched'; Regexp-Grammars-1.033/t/data_structure.t000644 000765 000765 00000023764 12204256252 020754 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More; use Skip_if_Perl_5_18; plan 'no_plan'; # Use this class declaration to check that classes with ctors # actually call the ctor when objrules use them... { package file; sub new { my ($class, $data_ref) = @_; my $new_obj = bless {'check'=>'check',%{$data_ref}}, $class; return $new_obj; } } my $parser = do{ use Regexp::Grammars; qr{ <[element]>* | \\ ? ? \[ <[option]> ** (,) \] \{ <[element]>* \} [^][\$&%#_{}~^\s,]+ [^][\$&%#_{}~^\s]+ }xms }; my $target = { "" => "\\documentclass[a4paper,11pt]{article}\n\\usepackage{latexsym}\n\\author{D. Conway}\n\\title{Parsing \\LaTeX{}}\n\\begin{document}\n\\maketitle\n\\tableofcontents\n\\section{Description}\n...is easy \\footnote{But not \\emph{necessarily} simple}.\n\\end{document}", "file" => bless({ "" => "\\documentclass[a4paper,11pt]{article}\n\\usepackage{latexsym}\n\\author{D. Conway}\n\\title{Parsing \\LaTeX{}}\n\\begin{document}\n\\maketitle\n\\tableofcontents\n\\section{Description}\n...is easy \\footnote{But not \\emph{necessarily} simple}.\n\\end{document}", "check" => "check", "element" => [ bless({ "" => "\\documentclass[a4paper,11pt]{article}", "command" => bless({ "" => "\\documentclass[a4paper,11pt]{article}", "args" => bless({ "" => "{article}", "element" => [ bless({ "" => "article", "literal" => bless({ "" => "article", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "documentclass", }, "literal"), "options" => bless({ "" => "[a4paper,11pt]", "option" => [ bless({ "" => "a4paper", }, "option"), bless({ "" => "11pt", }, "option"), ], }, "options"), }, "command"), }, "element"), bless({ "" => "\n\\usepackage{latexsym}", "command" => bless({ "" => "\\usepackage{latexsym}", "args" => bless({ "" => "{latexsym}", "element" => [ bless({ "" => "latexsym", "literal" => bless({ "" => "latexsym", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "usepackage", }, "literal"), }, "command"), }, "element"), bless({ "" => "\n\\author{D. Conway}", "command" => bless({ "" => "\\author{D. Conway}", "args" => bless({ "" => "{D. Conway}", "element" => [ bless({ "" => "D.", "literal" => bless({ "" => "D.", }, "literal"), }, "element"), bless({ "" => " Conway", "literal" => bless({ "" => "Conway", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "author", }, "literal"), }, "command"), }, "element"), bless({ "" => "\n\\title{Parsing \\LaTeX{}}", "command" => bless({ "" => "\\title{Parsing \\LaTeX{}}", "args" => bless({ "" => "{Parsing \\LaTeX{}}", "element" => [ bless({ "" => "Parsing", "literal" => bless({ "" => "Parsing", }, "literal"), }, "element"), bless({ "" => " \\LaTeX{}", "command" => bless({ "" => "\\LaTeX{}", "args" => bless({ "" => "{}", }, "args"), "name" => bless({ "" => "LaTeX", }, "literal"), }, "command"), }, "element"), ], }, "args"), "name" => bless({ "" => "title", }, "literal"), }, "command"), }, "element"), bless({ "" => "\n\\begin{document}", "command" => bless({ "" => "\\begin{document}", "args" => bless({ "" => "{document}", "element" => [ bless({ "" => "document", "literal" => bless({ "" => "document", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "begin", }, "literal"), }, "command"), }, "element"), bless({ "" => "\n\\maketitle\n", "command" => bless({ "" => "\\maketitle\n", "name" => bless({ "" => "maketitle", }, "literal"), }, "command"), }, "element"), bless({ "" => "\\tableofcontents\n", "command" => bless({ "" => "\\tableofcontents\n", "name" => bless({ "" => "tableofcontents", }, "literal"), }, "command"), }, "element"), bless({ "" => "\\section{Description}", "command" => bless({ "" => "\\section{Description}", "args" => bless({ "" => "{Description}", "element" => [ bless({ "" => "Description", "literal" => bless({ "" => "Description", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "section", }, "literal"), }, "command"), }, "element"), bless({ "" => "\n...is", "literal" => bless({ "" => "...is", }, "literal"), }, "element"), bless({ "" => " easy", "literal" => bless({ "" => "easy", }, "literal"), }, "element"), bless({ "" => " \\footnote{But not \\emph{necessarily} simple}", "command" => bless({ "" => "\\footnote{But not \\emph{necessarily} simple}", "args" => bless({ "" => "{But not \\emph{necessarily} simple}", "element" => [ bless({ "" => "But", "literal" => bless({ "" => "But", }, "literal"), }, "element"), bless({ "" => " not", "literal" => bless({ "" => "not", }, "literal"), }, "element"), bless({ "" => " \\emph{necessarily}", "command" => bless({ "" => "\\emph{necessarily}", "args" => bless({ "" => "{necessarily}", "element" => [ bless({ "" => "necessarily", "literal" => bless({ "" => "necessarily", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "emph", }, "literal"), }, "command"), }, "element"), bless({ "" => " simple", "literal" => bless({ "" => "simple", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "footnote", }, "literal"), }, "command"), }, "element"), bless({ "" => ".", "literal" => bless({ "" => ".", }, "literal"), }, "element"), bless({ "" => "\n\\end{document}", "command" => bless({ "" => "\\end{document}", "args" => bless({ "" => "{document}", "element" => [ bless({ "" => "document", "literal" => bless({ "" => "document", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "end", }, "literal"), }, "command"), }, "element"), ], }, "file"), }; my $input = do{ local $/; }; chomp $input; my $original_input = $input; ok +($input =~ $parser) => 'Matched'; is_deeply \%/, $target => 'Returned correct data structure'; is $/{""}, $original_input => 'Captured entire text'; __DATA__ \documentclass[a4paper,11pt]{article} \usepackage{latexsym} \author{D. Conway} \title{Parsing \LaTeX{}} \begin{document} \maketitle \tableofcontents \section{Description} ...is easy \footnote{But not \emph{necessarily} simple}. \end{document} Regexp-Grammars-1.033/t/error.t000644 000765 000765 00000005026 12161422313 017036 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; use List::Util qw< reduce >; my $calculator = do{ use Regexp::Grammars; qr{ \A (?: \Z | ) <[_Operand=Mult]> ** <[_Op=(\+|\-)]> (?{ $MATCH = shift @{$MATCH{_Operand}}; for my $term (@{$MATCH{_Operand}}) { my $op = shift @{$MATCH{_Op}}; if ($op eq '+') { $MATCH += $term; } else { $MATCH -= $term; } } }) | (?: <[_Operand=Pow]> ** <[_Op=(\*|/|%)]> (?{ $MATCH = reduce { eval($a . shift(@{$MATCH{_Op}}) . $b) } @{$MATCH{_Operand}}; }) ) (?: <[_Operand=Term]> ** <_Op=(\^)> (?{ $MATCH = reduce { $b ** $a } reverse @{$MATCH{_Operand}}; }) ) (?: | \( \) ) | }xms }; local $/ = ""; while (my $input = ) { chomp $input; my ($text, $expected) = split /\s+/, $input, 2; if ($text =~ $calculator) { is $/{Answer}, $expected => "Input $.: $text"; } else { is_deeply \@!, eval($expected), => "Input $.: $text"; } } __DATA__ 2 2 2*3+4 10 2zoo [ "Extra junk after expression at index 1: 'zoo'", "Expected end of input, but found 'zoo' instead", "Expected valid input, but found 'zoo' instead", "Can't match subrule (not implemented)", ] 1+2zoo [ "Extra junk after expression at index 1: '+2zoo'", "Expected end of input, but found '+2zoo' instead", "Expected valid input, but found '+2zoo' instead", "Can't match subrule (not implemented)", ] zoo [ "Expected literal, but found 'zoo' instead", "Can't match subrule (not implemented)", ] Regexp-Grammars-1.033/t/error_non_backtracking.t000644 000765 000765 00000004614 12161422322 022415 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; use List::Util qw< reduce >; my $calculator = do{ use Regexp::Grammars; qr{ \A (*COMMIT) (?: \Z | ) <[_Operand=Mult]> ** <[_Op=(\+|\-)]> (?{ $MATCH = shift @{$MATCH{_Operand}}; for my $term (@{$MATCH{_Operand}}) { my $op = shift @{$MATCH{_Op}}; if ($op eq '+') { $MATCH += $term; } else { $MATCH -= $term; } } }) | (?: <[_Operand=Pow]> ** <[_Op=(\*|/|%)]> (?{ $MATCH = reduce { eval($a . shift(@{$MATCH{_Op}}) . $b) } @{$MATCH{_Operand}}; }) ) (?: <[_Operand=Term]> ** <_Op=(\^)> (?{ $MATCH = reduce { $b ** $a } reverse @{$MATCH{_Operand}}; }) ) (?: | \( \) ) | }xms }; local $/ = ""; while (my $input = ) { chomp $input; my ($text, $expected) = split /\s+/, $input, 2; if ($text =~ $calculator) { is $/{Answer}, $expected => "Input $.: $text"; } else { is_deeply \@!, eval($expected), => "Input $.: $text"; } } __DATA__ 2 2 2*3+4 10 2zoo [ "Extra junk after expression at index 1: 'zoo'", "Expected end of input, but found 'zoo' instead", "Expected valid input, but found 'zoo' instead", ] 1+2zoo [ "Extra junk after expression at index 3: 'zoo'", "Expected end of input, but found 'zoo' instead", "Expected valid input, but found 'zoo' instead", ] zoo [ "Expected literal, but found 'zoo' instead", "Can't match subrule (not implemented)", ] Regexp-Grammars-1.033/t/error_non_hash_based.t000644 000765 000765 00000001003 12161422332 022041 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; use List::Util qw< reduce >; my $goodobj = do{ use Regexp::Grammars; qr{ obj }xms }; my $badobj = do{ use Regexp::Grammars; qr{ obj }xms }; close *STDERR; ok 'obj' =~ $goodobj => 'GoodObj'; ok 'obj' !~ $badobj => 'BadObj'; package GoodObj; sub new { bless {}, shift; } package BadObj; sub new { bless [], shift; } Regexp-Grammars-1.033/t/error_translate.t000644 000765 000765 00000006231 12161422341 021113 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; use List::Util qw< reduce >; sub translator { my ($errormsg, $rulename, $context) = @_; if (substr($rulename,0,1) eq '-') { $rulename = substr($rulename,1); return "You forgot to define $rulename. :-("; } if ($errormsg eq q{}) { if ($rulename) { return "<$rulename> failed to match '$context'"; } else { return "Main pattern failed to match '$context'"; } } if (lc(substr($errormsg,0,6)) eq 'wanted') { return "$errormsg, but was given '$context'. What's up with that?"; } return $errormsg; } my $calculator = do{ use Regexp::Grammars; qr{ \A (?: \Z | ) <[_Operand=Mult]> ** <[_Op=(\+|\-)]> (?{ $MATCH = shift @{$MATCH{_Operand}}; for my $term (@{$MATCH{_Operand}}) { my $op = shift @{$MATCH{_Op}}; if ($op eq '+') { $MATCH += $term; } else { $MATCH -= $term; } } }) | (?: <[_Operand=Pow]> ** <[_Op=(\*|/|%)]> (?{ $MATCH = reduce { eval($a . shift(@{$MATCH{_Op}}) . $b) } @{$MATCH{_Operand}}; }) ) (?: <[_Operand=Term]> ** <_Op=(\^)> (?{ $MATCH = reduce { $b ** $a } reverse @{$MATCH{_Operand}}; }) ) (?: | \( \) ) <...> | }xms }; local $/ = ""; { my $temp = Regexp::Grammars::set_error_translator(\&translator); while (my $input = ) { chomp $input; my ($text, $expected) = split /\s+/, $input, 2; if ($text =~ $calculator) { is $/{Answer}, $expected => "Input $.: $text"; } else { is_deeply \@!, eval($expected), => "Input $.: $text"; } } } { use Regexp::Grammars; if ('foo' =~ m{ <...> }xms) { fail 'Restore default translator'; } else { is_deeply \@!, ["Can't match subrule (not implemented)"] => 'Restore default translator'; } } __DATA__ 2 2 2*3+4 10 2zoo [ "Extra junk after expression at index 1: 'zoo'", "Wanted end of input, but was given 'zoo'. What's up with that?", "Main pattern failed to match 'zoo'", "You forgot to define Trailing_stuff. :-(", ] zoo [ " failed to match 'zoo'", "You forgot to define Trailing_stuff. :-(", ] Regexp-Grammars-1.033/t/fatal.t000644 000765 000765 00000002501 12204256365 017001 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More; use Skip_if_Perl_5_18; plan 'no_plan'; my %AcceptableVersions = ( '0.95' => 1, '0.98' => 1, '1.01' => 1, ); my %UnacceptableVersions = ( '0.99' => 1, '1.00' => 1, ); my $version_checker = do{ use Regexp::Grammars; qr{ vers = <%AcceptableVersions> | vers = | vers = }xms; }; ok 'vers = 0.95' =~ $version_checker => 'Matched version 0.95'; ok @! == 0 => 'with no error messages'; ok 'vers = 0.99' !~ $version_checker => 'Correctly failed to match version 0.99'; ok @! == 1 => 'with correct number of error messages'; is $![0], 'Cannot parse language version 0.99' => 'with correct error message'; ok 'vers = 0.96' !~ $version_checker => 'Correctly failed to match version 0.96'; ok @! == 1 => 'with correct number of error messages'; is $![0], q{Expected valid language version, but found '0.96' instead} => 'with correct error message'; Regexp-Grammars-1.033/t/fwdref.t000644 000765 000765 00000005100 12161422473 017162 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; use Regexp::Grammars; my $delimited = qr{ «»'"`]+)> }xms; my $delimited_cap = qr{ «»'"`]+)> }xms; my $delimited_listcap = qr{ «»'"`]+)> <[rdel=/delim]> }xms; no Regexp::Grammars; while (my $spec = ) { next if $spec !~ /\S/; chomp $spec; my $spec_copy = $spec; my ($input, $expected_outcome) = split /\s*:\s*/, $spec; my ($ldelim, $content, $rdelim) = split /(xxx)/, $input; if ($expected_outcome eq 'succeed') { ok +($input =~ $delimited) => "Match of $input ${expected_outcome}ed"; is $/{delim}, $ldelim => "Captured delimiter"; is $/{content}, $content => "Captured content"; ok +($input =~ $delimited_cap) => "Match and capture of $input ${expected_outcome}ed"; is $/{delim}, $ldelim => "Captured delimiter"; is $/{content}, $content => "Captured content"; is $/{rdel}, $rdelim => "Captured closer"; ok +($input =~ $delimited_listcap) => "Match and list capture of $input ${expected_outcome}ed"; is $/{delim}, $ldelim => "Captured delimiter"; is $/{content}, $content => "Captured content"; is_deeply $/{rdel}, [$rdelim] => "Captured closer"; } else { ok +($input !~ $delimited) => "Match of $input ${expected_outcome}ed"; ok +($input !~ $delimited_cap) => "Match and capture of $input ${expected_outcome}ed"; ok +($input !~ $delimited_listcap) => "Match and list of $input ${expected_outcome}ed"; } } __DATA__ "xxx" :succeed `xxx' :succeed ``xxx'' :succeed 'xxx" :fail {xxx} :succeed [xxx] :succeed :succeed (xxx) :succeed «xxx» :succeed [[xxx]] :succeed {{{xxx}}} :succeed ((((xxx)))) :succeed <> :succeed ««xxx»» :succeed }xxx{ :succeed ]xxx[ :succeed )xxx( :succeed >xxx< :succeed »xxx« :succeed }}}xxx{{{ :succeed ]]xxx[[ :succeed ))))xxx(((( :succeed >>xxx<< :succeed »»xxx«« :succeed ({xxx}) :succeed (*xxx*) :succeed /*xxx*/ :succeed ifxxxfi :succeed `` :fail 'abc" :fail {xxx{ :fail [xxx[ :fail \( <[Elem]> ** (,) \) }xms; # Derived grammar... qr{ \[ <[Elem]> ** (,) \] [01]+ }xms; # Other grammar (for MI)... qr{ [.-]+ }xms; my $list_of_int = qr{ \d+ }xms; my $list_of_nonint = qr{ [^\d,]+ }xms; my $list_without_elem = qr{ }xms; my $list_of_binary = qr{ }xms; my $list_of_binary_or_nonint = qr{ [^\d,]+ | }xms; my $list_of_morse = qr{ # Elem redefinition from here # List redefinition from here # (requires C3 resolution to work) }xms; no Regexp::Grammars; { local $SIG{__WARN__} = sub { my ($errmsg) = @_; is $errmsg, "Can't match against \n" => "Can't match against pure grammars"; }; ok "" !~ $base_grammar => "Match against pure grammar failed"; } ok '(1,2,3)' !~ $list_without_elem => 'Unrepleaced Elem failed'; is $![0], 'Elem matcher not implemented' => 'Error message correct'; ok '(1,23,456)' =~ $list_of_int => 'Polymorphic Elem worked'; is_deeply $/{List}{Elem}, [1,23,456] => 'Extracted correct data'; ok '(a,bc,def)' =~ $list_of_nonint => 'Polymorphic Elem worked again'; is_deeply $/{List}{Elem}, ['a','bc','def'] => 'Extracted correct data'; ok '[0,10,010]' =~ $list_of_binary => '2nd order inheritance worked'; is_deeply $/{List}{Elem}, ['0','10','010'] => 'Extracted correct data'; ok '[0,bc,010]' =~ $list_of_binary_or_nonint => 'Explicit call to overridden worked'; is_deeply $/{List}{Elem}, ['0','bc','010'] => 'Extracted correct data'; ok '[.,-.,..-]' =~ $list_of_morse => 'Multiple inheritance worked'; is_deeply $/{List}{Elem}, ['.','-.','..-'] => 'Extracted correct data'; Regexp-Grammars-1.033/t/grammar_polymorphism.t000644 000765 000765 00000001021 12161422512 022145 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; use Regexp::Grammars; qr{ baserule baserule2 }x; qr{ }x; my $baserule = qr{ }x; my $baserule2 = qr{ }x; ok 'baserule' =~ $baserule => 'Specific polymorphism worked'; ok 'baserule2' =~ $baserule2 => 'Generic polymorphism worked'; Regexp-Grammars-1.033/t/hash.t000644 000765 000765 00000002202 12204256263 016630 0ustar00damiandamian000000 000000 use warnings; use Test::More; use Skip_if_Perl_5_18; plan 'no_plan'; my %hash = ( do => 'a deer', re => 'a drop of golden sun', dore => 'a portal', me => 'a name I call myself', fa => 'a long long way to run', ); my $listified = do { use Regexp::Grammars; qr{ <[WORD=%hash]>+ }xms; }; my $first_only = do { use Regexp::Grammars; qr{ }xms; }; my $no_cap = do { use Regexp::Grammars; qr{ <%hash>+ }xms; }; while (my $line = ) { my ($input, $expected) = split /\s+/, $line; if ($input =~ $listified) { is_deeply $/{WORD}, eval($expected), "list: $input"; } else { is 'FAIL', $expected, "list: $input"; } if ($input =~ $first_only) { is $/{WORD}, eval($expected)->[0], "scalar: $input"; } else { is 'FAIL', $expected, "scalar: $input"; } if ($input =~ $no_cap) { isnt 'FAIL', $expected, "no-cap: $input"; } else { is 'FAIL', $expected, "no-cap: $input"; } } __DATA__ dorefameredo ['dore','fa','me','re','do'] dorefamell ['dore','fa','me'] zzzzz FAIL zzzdoremezzz ['dore','me'] Regexp-Grammars-1.033/t/hash_redef.t000644 000765 000765 00000002153 12204256265 020004 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More; use Skip_if_Perl_5_18; plan 'no_plan'; my %hash = ( dore => 'a portal', me => 'a name I call myself', fa => 'a long long way to run', ); my $listified = do { use Regexp::Grammars; qr{ <[WORD=%hash]>+ .{2} }xms; }; my $first_only = do { use Regexp::Grammars; qr{ .{2} }xms; }; my $no_cap = do { use Regexp::Grammars; qr{ <%hash>+ .{2} }xms; }; while (my $line = ) { my ($input, $expected) = split /\s+/, $line; if ($input =~ $listified) { is_deeply $/{WORD}, eval($expected), "list: $input"; } else { is 'FAIL', $expected, "list: $input"; } if ($input =~ $first_only) { is $/{WORD}, eval($expected)->[0], "scalar: $input"; } else { is 'FAIL', $expected, "scalar: $input"; } if ($input =~ $no_cap) { isnt 'FAIL', $expected, "no-cap: $input"; } else { is 'FAIL', $expected, "no-cap: $input"; } } __DATA__ dorefameredo ['fa','me'] dorefamell ['fa','me'] zzzzz FAIL zzzdoremezzz ['me'] Regexp-Grammars-1.033/t/hash_redef_local.t000644 000765 000765 00000003536 12204256267 021166 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More; use Skip_if_Perl_5_18; plan 'no_plan'; my %hash = ( dore => 'a portal', me => 'a name I call myself', fa => 'a long long way to run', ); my $listified = do { use Regexp::Grammars; qr{ <[WORD=%hash{ .{2} }]>+ }xms; }; my $first_only = do { use Regexp::Grammars; qr{ }xms; }; my $first_only_override = do { use Regexp::Grammars; qr{ \w+ }xms; }; my $first_two_override = do { use Regexp::Grammars; qr{ \w+ }xms; }; my $no_cap = do { use Regexp::Grammars; qr{ <%hash{..}>+ }xms; }; while (my $line = ) { my ($input, $expected) = split /\s+/, $line; my $expected_data = eval $expected; if ($input =~ $listified) { is_deeply $/{WORD}, $expected_data, "list: $input"; } else { is 'FAIL', $expected, "list: $input"; } if ($input =~ $first_only) { is $/{WORD}, $expected_data->[0], "scalar: $input"; } else { is 'FAIL', $expected, "scalar: $input"; } if ($input =~ $first_only_override) { is $/{WORD}, $expected_data->[0], "scalar (override): $input"; } else { is 'FAIL', $expected, "scalar (override): $input"; } if (@{$expected_data} > 1) { if ($input =~ $first_two_override) { is $/{WORD1}, $expected_data->[0], "scalars[0] (override): $input"; is $/{WORD2}, $expected_data->[1], "scalars[1] (override): $input"; } else { is 'FAIL', $expected, "scalars (override): $input"; } } if ($input =~ $no_cap) { isnt 'FAIL', $expected, "no-cap: $input"; } else { is 'FAIL', $expected, "no-cap: $input"; } } __DATA__ dorefameredo ['fa','me'] dorefamell ['fa','me'] zzzzz FAIL zzzdoremezzz ['me'] Regexp-Grammars-1.033/t/inline_computation.t000644 000765 000765 00000002663 12161422560 021615 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; use List::Util qw< reduce >; my $calculator = do{ use Regexp::Grammars; qr{ <[Operand=Mult]> ** <[Op=(\+|\-)]> (?{ $MATCH = shift @{$MATCH{Operand}}; for my $term (@{$MATCH{Operand}}) { my $op = shift @{$MATCH{Op}}; if ($op eq '+') { $MATCH += $term; } else { $MATCH -= $term; } } }) <[Operand=Pow]> ** <[Op=(\*|/|%)]> (?{ $MATCH = reduce { eval($a . shift(@{$MATCH{Op}}) . $b) } @{$MATCH{Operand}}; }) <[Operand=Term]> ** (?{ $MATCH = reduce { $b ** $a } reverse @{$MATCH{Operand}}; }) | \( \) }xms }; while (my $input = ) { chomp $input; my ($expr, $result) = split /\s*=\s*/, $input; ok +($expr =~ $calculator) => "Matched expression: $expr"; cmp_ok $/{Answer}, '==', $result => "Got right answer ($result)"; } __DATA__ 2^3*4+5 = 37 2+3*4^5 = 3074 2+3*4+5 = 19 2*3+4*5 = 26 2*(3+4)*5 = 70 2+3+4-5 = 4 100/10/2 = 5 100/10*2 = 20 Regexp-Grammars-1.033/t/inline_computation_handler.t000644 000765 000765 00000003015 12161422570 023303 0ustar00damiandamian000000 000000 use 5.010; use warnings; package Calculator; use List::Util qw< reduce >; sub Answer { my ($class, $result_hash) = @_; my $sum = shift @{$result_hash->{Operand}}; for my $term (@{$result_hash->{Operand}}) { my $op = shift @{$result_hash->{Op}}; if ($op eq '+') { $sum += $term; } else { $sum -= $term; } } return $sum; } sub Mult { my ($class, $result_hash) = @_; return reduce { eval($a . shift(@{$result_hash->{Op}}) . $b) } @{$result_hash->{Operand}}; } sub Pow { my ($class, $result_hash) = @_; return reduce { $b ** $a } reverse @{$result_hash->{Operand}}; } use Test::More 'no_plan'; my $calculator = do{ use Regexp::Grammars; qr{ <[Operand=Mult]> ** <[Op=(\+|\-)]> <[Operand=Pow]> ** <[Op=(\*|/|%)]> <[Operand=Term]> ** | \( \) }xms }; while (my $input = ) { chomp $input; my ($expr, $result) = split /\s*=\s*/, $input; ok +($expr =~ $calculator->with_actions('Calculator')) => "Matched expression: $expr"; cmp_ok $/{Answer}, '==', $result => "Got right answer ($result)"; } __DATA__ 1+1+1 = 3 2^3*4+5 = 37 2+3*4^5 = 3074 2+3*4+5 = 19 2*3+4*5 = 26 2*(3+4)*5 = 70 2+3+4-5 = 4 100/10/2 = 5 100/10*2 = 20 Regexp-Grammars-1.033/t/inline_computation_obj_handler.t000644 000765 000765 00000003256 12161422577 024153 0ustar00damiandamian000000 000000 use 5.010; use warnings; package Calculator; use List::Util qw< reduce >; sub Answer { my ($class, $result_hash) = @_; my $sum = shift @{$result_hash->{Operand}}; for my $term (@{$result_hash->{Operand}}) { my $op = shift @{$result_hash->{Op}}; if ($op eq '+') { $sum += $term; } else { $sum -= $term; } } return $sum; } sub Mult { my ($class, $result_hash) = @_; return reduce { eval($a . shift(@{$result_hash->{Op}}) . $b) } @{$result_hash->{Operand}}; } sub Pow { my ($class, $result_hash) = @_; return reduce { $b ** $a } reverse @{$result_hash->{Operand}}; } use Test::More 'no_plan'; my $calculator = do{ use Regexp::Grammars; qr{ <[Operand=Mult]> ** <[Op=(\+|\-)]> <[Operand=Pow]> ** <[Op=(\*|/|%)]> <[Operand=Term]> ** | \( \) }xms }; while (my $input = ) { chomp $input; my ($expr, $result) = split /\s*=\s*/, $input; ok +($expr =~ $calculator->with_actions(bless {}, 'Calculator')) => "Matched expression: $expr"; cmp_ok $/{Answer}, '==', $result => "Got right answer ($result)"; } ok +('1+1' =~ $calculator) => "Matched final expression"; use Data::Dumper 'Dumper'; cmp_ok $/{Answer}{""}, 'eq', '1+1' => "Got right answer (hash)"; __DATA__ 1+1+1 = 3 2^3*4+5 = 37 2+3*4^5 = 3074 2+3*4+5 = 19 2*3+4*5 = 26 2*(3+4)*5 = 70 2+3+4-5 = 4 100/10/2 = 5 100/10*2 = 20 Regexp-Grammars-1.033/t/local_ws.t000644 000765 000765 00000003057 12161422606 017517 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; use Regexp::Grammars; my $parser = qr{ )*+ > a d d | s u b | m u l \d++ }xms; no Regexp::Grammars; if ('mooouoool 7' =~ $parser) { ok 1 => 'Whitespace overridden'; } else { ok 0 => 'Whitespace overridden'; } my $test_grammar = do { use Regexp::Grammars; qr{ # One type of comment between statements <[statement]> ** ( ; ) # Another type within statements <[arg]> ** ( , ) foo | bar baz }xms; }; my $text = q{ foo baz, baz, baz; # comment bar #{ comment }# baz }; my $expected_result = { "" => "\n foo baz, baz, baz;\n # comment\n bar #{ comment }# baz", "program" => { "" => "\n foo baz, baz, baz;\n # comment\n bar #{ comment }# baz", "statement" => [ { "" => "foo baz, baz, baz", "arg" => ["baz", "baz", "baz"], "cmd" => "foo", }, { "" => "bar #{ comment }# baz", "arg" => ["baz"], "cmd" => "bar" }, ], }, }; ok $text =~ $test_grammar => 'Multiple '; is_deeply \%/, $expected_result => 'Parse is correct'; Regexp-Grammars-1.033/t/lookaheads.t000644 000765 000765 00000000622 12161422614 020020 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; use Regexp::Grammars; my $neg_lookahead = qr{ START <[Item]> ** 'END')> END 'END')> \w+ <\:Token> }xms; ok "START do it END END" =~ $neg_lookahead => 'Match'; is_deeply $/{List}{Item}, ["do","it"] => 'Correct match'; Regexp-Grammars-1.033/t/matchline.t000644 000765 000765 00000002646 12161422625 017664 0ustar00damiandamian000000 000000 use strict; use 5.010; use Test::More 'no_plan'; my $test_grammar = do { use Regexp::Grammars; qr{ \{ (?: <[matchline]> <[num]> ? )+ | \} | \] \d++ }xms; }; #012345 ok " \n{\naa\n}" =~ $test_grammar => 'Matched test 1'; is $/{startmarker}{at}, 2 => "Aliased "; is $/{startmarker}{after}, 2 => "Post-aliased "; is $/{endmarker}{matchline}, 4 => "Unaliased "; ok ! exists $/{content}{matchline} => "No "; #012345 ok " \n{\naa\n]" =~ $test_grammar => 'Matched test 2'; is $/{startmarker}{at}, 2 => "Aliased "; ok ! exists $/{endmarker}{matchline} => "No unaliased "; ok ! exists $/{content}{matchline} => "No "; #0123456 ok "{11\n22\n\n\n33\n}" =~ $test_grammar => 'Matched test 3'; is $/{startmarker}{at}, 1 => "Aliased "; is $/{endmarker}{matchline}, 6 => "Unaliased "; is_deeply $/{content}{num}, [11,22,33] => "Repeated contents"; is_deeply $/{content}{matchline}, [1,2,5] => "Repeated <[matchline]>"; Regexp-Grammars-1.033/t/matchpos.t000644 000765 000765 00000002612 12161422637 017532 0ustar00damiandamian000000 000000 use strict; use 5.010; use Test::More 'no_plan'; my $test_grammar = do { use Regexp::Grammars; qr{ \{ (?: <[matchpos]> <[num]> ? )+ | \} | \] \d++ }xms; }; #012345 ok " {aa}" =~ $test_grammar => 'Matched test 1'; is $/{startmarker}{at}, 2 => "Aliased "; is $/{startmarker}{after}, 3 => "Post-aliased "; is $/{endmarker}{matchpos}, 5 => "Unaliased "; ok ! exists $/{content}{matchpos} => "No "; #012345 ok " {aa]" =~ $test_grammar => 'Matched test 2'; is $/{startmarker}{at}, 2 => "Aliased "; ok ! exists $/{endmarker}{matchpos} => "No unaliased "; ok ! exists $/{content}{matchpos} => "No "; #0123456 ok "{1 2 3}" =~ $test_grammar => 'Matched test 3'; is $/{startmarker}{at}, 0 => "Aliased "; is $/{endmarker}{matchpos}, 6 => "Unaliased "; is_deeply $/{content}{num}, [1,2,3] => "Repeated contents"; is_deeply $/{content}{matchpos}, [1,3,5] => "Repeated <[matchpos]>"; Regexp-Grammars-1.033/t/minimize_bug.t000644 000765 000765 00000001350 12161422667 020373 0ustar00damiandamian000000 000000 use 5.010; use Data::Dumper; use Regexp::Grammars; my $nocontext = qr{ <[expr]>+ % <[sep=([\w,;])]> <[item]>+ % <[op=([+-])]> (\d+) }xms; use Test::More; plan tests => 2; if ("1+2,3" =~ $nocontext) { is_deeply \%/, { 'list' => { 'sep' => [','], 'expr' => [{ 'item' => ['1','2'], 'op' => ['+'] }, '3'] } } => 'Should not minimize'; } else { fail 'Should not minimize (did not match)'; } if ("1" =~ $nocontext) { is_deeply \%/, { 'list' => 1 } => 'Should minimize'; } else { fail 'Should minimize (did not match)'; } Regexp-Grammars-1.033/t/moose.t000644 000765 000765 00000003445 12161422705 017037 0ustar00damiandamian000000 000000 use 5.010; use warnings; BEGIN{ use Test::More; plan skip_all => "Moose required for testing moosey objrules" if !eval{require Moose}; } use Test::More 'no_plan'; # Use this class declaration to check that classes with ctors # actually call the ctor when objrules use them... use Moose::Util::TypeConstraints; subtype 'Non::Regex::Int', as 'Num', where { int($_) == $_ }; no Moose::Util::TypeConstraints; { package Speaker; use Moose; has 'name' => (is => 'rw', isa => 'Str'); has 'alias' => (is => 'rw', isa => 'Str'); has 'id' => (is => 'rw', isa => 'Non::Regex::Int'); } my $parser = do{ use Regexp::Grammars; qr{ \\"\> (?:\(\‎\‎\)) \w+ (?:(?:<.ws>|\-|\') \w+) \w+ \d+ }xms }; my $target = { "" => "Nathan Gray (‎kolibrie‎)", "speaker" => bless({ "" => "Nathan Gray (‎kolibrie‎)", "alias" => "kolibrie", "id" => 1613, "name" => "Nathan Gray", }, "Speaker"), }; my $input = do{ local $/; }; chomp $input; my $original_input = $input; ok +($input =~ $parser) => 'Matched'; is_deeply \%/, $target => 'Returned correct data structure'; __DATA__ Nathan Gray (‎kolibrie‎) - ‎Practical Extraction with Regexp::Grammars‎ (50 min) 9 Regexp-Grammars-1.033/t/neg_lookahead.t000644 000765 000765 00000000555 12161422716 020476 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; use Regexp::Grammars; my $neg_lookahead = qr{ START <[Item]> ** (?!) \w+ END }xms; ok "START do it END END" =~ $neg_lookahead => 'Match'; is_deeply $/{List}{Item}, ["do","it"] => 'Correct match'; Regexp-Grammars-1.033/t/new.t000644 000765 000765 00000003107 12161422725 016503 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More 'tests' => 2; # Use this class declaration to check that classes with ctors # actually call the ctor when objrules use them... { package Speaker; sub new { my ($class, $data_ref) = @_; my $new_obj = bless {'check'=>'check',%{$data_ref}}, $class; return $new_obj; } } my $parser = do{ use Regexp::Grammars; qr{ \\"\> (?:\(\‎\‎\)) \w+ (?:(?:<.ws>|\-|\') \w+) \w+ \d+ }xms }; my $target = { "" => "Nathan Gray (‎kolibrie‎)", "speaker" => bless({ "" => "Nathan Gray (‎kolibrie‎)", "check" => "check", "alias" => "kolibrie", "id" => 1613, "name" => "Nathan Gray", }, "Speaker"), }; my $input = do{ local $/; }; chomp $input; my $original_input = $input; ok +($input =~ $parser) => 'Matched'; is_deeply \%/, $target => 'Returned correct data structure'; #is $/{""}, $original_input => 'Captured entire text'; __DATA__ Nathan Gray (‎kolibrie‎) - ‎Practical Extraction with Regexp::Grammars‎ (50 min) 9 Regexp-Grammars-1.033/t/new_init.t000644 000765 000765 00000004365 12161422733 017534 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More 'tests' => 2; # Use this class declaration to check that classes with ctors # actually call the ctor when objrules use them... { package Speaker; sub new { my $that = shift; #warn "running 'new'\n"; my $self = bless {}, (ref($that) || $that); $self->init(@_); } sub init { my $self = shift; #warn "running 'init'\n"; my %args = (scalar @_ == 1 and UNIVERSAL::isa($_[0], 'HASH')) ? %{ $_[0] } : @_; foreach my $accessor (keys %args) { next unless $self->can($accessor); $self->$accessor($args{$accessor}); } #warn "returning initialized object\n"; return $self; } sub name { my $self = shift; @_ ? ($self->{name} = shift) : $self->{name}; } sub alias { my $self = shift; @_ ? ($self->{alias} = shift) : $self->{alias}; } sub id { my $self = shift; @_ ? ($self->{id} = shift) : $self->{id}; } } my $parser = do{ use Regexp::Grammars; qr{ \\"\> (?:\(\‎\‎\)) \w+ (?:(?:<.ws>|\-|\') \w+) \w+ \d+ }xms }; my $target = { "" => "Nathan Gray (‎kolibrie‎)", "speaker" => bless({ "" => "Nathan Gray (‎kolibrie‎)", "alias" => "kolibrie", "id" => 1613, "name" => "Nathan Gray", }, "Speaker"), }; my $input = do{ local $/; }; chomp $input; my $original_input = $input; ok +($input =~ $parser) => 'Matched'; is_deeply \%/, $target => 'Returned correct data structure'; #is $/{""}, $original_input => 'Captured entire text'; __DATA__ Nathan Gray (‎kolibrie‎) - ‎Practical Extraction with Regexp::Grammars‎ (50 min) 9 Regexp-Grammars-1.033/t/new_init_autoload.t000644 000765 000765 00000004326 12161422742 021421 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More 'tests' => 2; # Use this class declaration to check that classes with ctors # actually call the ctor when objrules use them... { package Speaker; sub new { my $that = shift; #warn "running 'new'\n"; my $self = bless {}, (ref($that) || $that); $self->init(@_); } sub init { my $self = shift; #warn "running 'init'\n"; my %args = (scalar @_ == 1 and UNIVERSAL::isa($_[0], 'HASH')) ? %{ $_[0] } : @_; foreach my $accessor (keys %args) { $self->$accessor($args{$accessor}); } #warn "returning initialized object\n"; return $self; } sub AUTOLOAD { my $self = shift; our $AUTOLOAD; my $var = $AUTOLOAD; my $last_colon_pos = rindex($var, ':'); substr $var, 0, $last_colon_pos+1, q{}; #warn "running AUTOLOAD as '$var' with param '$_[0]'\n"; @_ ? ($self->{$var} = shift) : $self->{$var}; } } my $parser = do{ use Regexp::Grammars; qr{ \\"\> (?:\(\‎\‎\)) \w+ (?:(?:<.ws>|\-|\') \w+) \w+ \d+ }xms }; my $target = { "" => "Nathan Gray (‎kolibrie‎)", "speaker" => bless({ "" => "Nathan Gray (‎kolibrie‎)", "alias" => "kolibrie", "id" => 1613, "name" => "Nathan Gray", }, "Speaker"), }; my $input = do{ local $/; }; chomp $input; my $original_input = $input; ok +($input =~ $parser) => 'Matched'; is_deeply \%/, $target => 'Returned correct data structure'; #is $/{""}, $original_input => 'Captured entire text'; __DATA__ Nathan Gray (‎kolibrie‎) - ‎Practical Extraction with Regexp::Grammars‎ (50 min) 9 Regexp-Grammars-1.033/t/new_init_limited_autoload.t000644 000765 000765 00000004502 12161422751 023124 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More 'tests' => 2; # Use this class declaration to check that classes with ctors # actually call the ctor when objrules use them... { package Speaker; my %allowed = map { $_ => 1 } qw(name alias id); sub new { my $that = shift; #warn "running 'new'\n"; my $self = bless {}, (ref($that) || $that); $self->init(@_); } sub init { my $self = shift; #warn "running 'init'\n"; my %args = (scalar @_ == 1 and UNIVERSAL::isa($_[0], 'HASH')) ? %{ $_[0] } : @_; foreach my $accessor (keys %args) { next unless (exists $allowed{$accessor}); $self->$accessor($args{$accessor}); } #warn "returning initialized object\n"; return $self; } sub AUTOLOAD { my $self = shift; our $AUTOLOAD; my $var = $AUTOLOAD; my $last_colon_pos = rindex($var, ':'); substr $var, 0, $last_colon_pos+1, q{}; #warn "running AUTOLOAD as '$var' with param '$_[0]'\n"; @_ ? ($self->{$var} = shift) : $self->{$var}; } } my $parser = do{ use Regexp::Grammars; qr{ \\"\> (?:\(\‎\‎\)) \w+ (?:(?:<.ws>|\-|\') \w+) \w+ \d+ }xms }; my $target = { "" => "Nathan Gray (‎kolibrie‎)", "speaker" => bless({ "" => "Nathan Gray (‎kolibrie‎)", "alias" => "kolibrie", "id" => 1613, "name" => "Nathan Gray", }, "Speaker"), }; my $input = do{ local $/; }; chomp $input; my $original_input = $input; ok +($input =~ $parser) => 'Matched'; is_deeply \%/, $target => 'Returned correct data structure'; #is $/{""}, $original_input => 'Captured entire text'; __DATA__ Nathan Gray (‎kolibrie‎) - ‎Practical Extraction with Regexp::Grammars‎ (50 min) 9 Regexp-Grammars-1.033/t/new_init_limited_autoload_warn.t000644 000765 000765 00000004502 12161422760 024153 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More 'tests' => 2; # Use this class declaration to check that classes with ctors # actually call the ctor when objrules use them... { package Speaker; my %allowed = map { $_ => 1 } qw(name alias id); sub new { my $that = shift; #warn "running 'new'\n"; my $self = bless {}, (ref($that) || $that); $self->init(@_); } sub init { my $self = shift; #warn "running 'init'\n"; my %args = (scalar @_ == 1 and UNIVERSAL::isa($_[0], 'HASH')) ? %{ $_[0] } : @_; foreach my $accessor (keys %args) { next unless (exists $allowed{$accessor}); $self->$accessor($args{$accessor}); } #warn "returning initialized object\n"; return $self; } sub AUTOLOAD { my $self = shift; our $AUTOLOAD; my $var = $AUTOLOAD; my $last_colon_pos = rindex($var, ':'); substr $var, 0, $last_colon_pos+1, q{}; #warn "running AUTOLOAD as '$var' with param '$_[0]'\n"; @_ ? ($self->{$var} = shift) : $self->{$var}; } } my $parser = do{ use Regexp::Grammars; qr{ \\"\> (?:\(\‎\‎\)) \w+ (?:(?:<.ws>|\-|\') \w+) \w+ \d+ }xms }; my $target = { "" => "Nathan Gray (‎kolibrie‎)", "speaker" => bless({ "" => "Nathan Gray (‎kolibrie‎)", "alias" => "kolibrie", "id" => 1613, "name" => "Nathan Gray", }, "Speaker"), }; my $input = do{ local $/; }; chomp $input; my $original_input = $input; ok +($input =~ $parser) => 'Matched'; is_deeply \%/, $target => 'Returned correct data structure'; #is $/{""}, $original_input => 'Captured entire text'; __DATA__ Nathan Gray (‎kolibrie‎) - ‎Practical Extraction with Regexp::Grammars‎ (50 min) 9 Regexp-Grammars-1.033/t/no_context.t000644 000765 000765 00000016675 12204256301 020101 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More; use Skip_if_Perl_5_18; plan 'no_plan'; # Use this class declaration to check that classes with ctors # actually call the ctor when objrules use them... { package file; sub new { my ($class, $data_ref) = @_; my $new_obj = bless {'check'=>'check',%{$data_ref}}, $class; return $new_obj; } } my $parser = do{ use Regexp::Grammars; qr{ <[element]>* | \\ ? ? \[ <[option]> ** (,) \] \{ <[element]>* \} [^][\$&%#_{}~^\s,]+ [^][\$&%#_{}~^\s]+ }xms }; my $target = { "file" => bless({ "check" => "check", "element" => [ bless({ "command" => bless({ "args" => bless({ "element" => [ bless({ "literal" => bless({ "" => "article", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "documentclass", }, "literal"), "options" => bless({ "option" => [ bless({ "" => "a4paper", }, "option"), bless({ "" => "11pt", }, "option"), ], }, "options"), }, "command"), }, "element"), bless({ "command" => bless({ "args" => bless({ "element" => [ bless({ "literal" => bless({ "" => "latexsym", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "usepackage", }, "literal"), }, "command"), }, "element"), bless({ "command" => bless({ "args" => bless({ "element" => [ bless({ "literal" => bless({ "" => "D.", }, "literal"), }, "element"), bless({ "literal" => bless({ "" => "Conway", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "author", }, "literal"), }, "command"), }, "element"), bless({ "command" => bless({ "args" => bless({ "element" => [ bless({ "literal" => bless({ "" => "Parsing", }, "literal"), }, "element"), bless({ "command" => bless({ "args" => bless({ "" => "{}", }, "args"), "name" => bless({ "" => "LaTeX", }, "literal"), }, "command"), }, "element"), ], }, "args"), "name" => bless({ "" => "title", }, "literal"), }, "command"), }, "element"), bless({ "command" => bless({ "args" => bless({ "element" => [ bless({ "literal" => bless({ "" => "document", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "begin", }, "literal"), }, "command"), }, "element"), bless({ "command" => bless({ "name" => bless({ "" => "maketitle", }, "literal"), }, "command"), }, "element"), bless({ "command" => bless({ "name" => bless({ "" => "tableofcontents", }, "literal"), }, "command"), }, "element"), bless({ "command" => bless({ "args" => bless({ "element" => [ bless({ "literal" => bless({ "" => "Description", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "section", }, "literal"), }, "command"), }, "element"), bless({ "literal" => bless({ "" => "...is", }, "literal"), }, "element"), bless({ "literal" => bless({ "" => "easy", }, "literal"), }, "element"), bless({ "command" => bless({ "args" => bless({ "element" => [ bless({ "literal" => bless({ "" => "But", }, "literal"), }, "element"), bless({ "literal" => bless({ "" => "not", }, "literal"), }, "element"), bless({ "command" => bless({ "args" => bless({ "element" => [ bless({ "literal" => bless({ "" => "necessarily", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "emph", }, "literal"), }, "command"), }, "element"), bless({ "literal" => bless({ "" => "simple", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "footnote", }, "literal"), }, "command"), }, "element"), bless({ "literal" => bless({ "" => ".", }, "literal"), }, "element"), bless({ "command" => bless({ "args" => bless({ "element" => [ bless({ "literal" => bless({ "" => "document", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "end", }, "literal"), }, "command"), }, "element"), ], }, "file"), }; my $input = do{ local $/; }; chomp $input; my $original_input = $input; ok +($input =~ $parser) => 'Matched'; is_deeply \%/, $target => 'Returned correct data structure'; ok !exists $/{""} => 'Entire text not captured'; __DATA__ \documentclass[a4paper,11pt]{article} \usepackage{latexsym} \author{D. Conway} \title{Parsing \LaTeX{}} \begin{document} \maketitle \tableofcontents \section{Description} ...is easy \footnote{But not \emph{necessarily} simple}. \end{document} Regexp-Grammars-1.033/t/no_context_counterlocal.t000644 000765 000765 00000017706 12204256302 022650 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More; use Skip_if_Perl_5_18; plan 'no_plan'; # Use this class declaration to check that classes with ctors # actually call the ctor when objrules use them... { package file; sub new { my ($class, $data_ref) = @_; my $new_obj = bless {'check'=>'check',%{$data_ref}}, $class; return $new_obj; } } my $parser = do{ use Regexp::Grammars; qr{ <[element]>* | \\ ? ? \[ <[option]> ** (,) \] \{ <[element]>* \} [^][\$&%#_{}~^\s,]+ [^][\$&%#_{}~^\s]+ }xms }; my $target = { "file" => bless({ "" => "\\documentclass[a4paper,11pt]{article}\n\\usepackage{latexsym}\n\\author{D. Conway}\n\\title{Parsing \\LaTeX{}}\n\\begin{document}\n\\maketitle\n\\tableofcontents\n\\section{Description}\n...is easy \\footnote{But not \\emph{necessarily} simple}.\n\\end{document}", "check" => "check", "element" => [ bless({ "command" => bless({ "" => "\\documentclass[a4paper,11pt]{article}", "args" => bless({ "" => "{article}", "element" => [ bless({ "literal" => bless({ "" => "article", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "documentclass", }, "literal"), "options" => bless({ "option" => [ bless({ "" => "a4paper", }, "option"), bless({ "" => "11pt", }, "option"), ], }, "options"), }, "command"), }, "element"), bless({ "command" => bless({ "" => "\\usepackage{latexsym}", "args" => bless({ "" => "{latexsym}", "element" => [ bless({ "literal" => bless({ "" => "latexsym", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "usepackage", }, "literal"), }, "command"), }, "element"), bless({ "command" => bless({ "" => "\\author{D. Conway}", "args" => bless({ "" => "{D. Conway}", "element" => [ bless({ "literal" => bless({ "" => "D.", }, "literal"), }, "element"), bless({ "literal" => bless({ "" => "Conway", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "author", }, "literal"), }, "command"), }, "element"), bless({ "command" => bless({ "" => "\\title{Parsing \\LaTeX{}}", "args" => bless({ "" => "{Parsing \\LaTeX{}}", "element" => [ bless({ "literal" => bless({ "" => "Parsing", }, "literal"), }, "element"), bless({ "command" => bless({ "" => "\\LaTeX{}", "args" => bless({ "" => "{}", }, "args"), "name" => bless({ "" => "LaTeX", }, "literal"), }, "command"), }, "element"), ], }, "args"), "name" => bless({ "" => "title", }, "literal"), }, "command"), }, "element"), bless({ "command" => bless({ "" => "\\begin{document}", "args" => bless({ "" => "{document}", "element" => [ bless({ "literal" => bless({ "" => "document", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "begin", }, "literal"), }, "command"), }, "element"), bless({ "command" => bless({ "" => "\\maketitle\n", "name" => bless({ "" => "maketitle", }, "literal"), }, "command"), }, "element"), bless({ "command" => bless({ "" => "\\tableofcontents\n", "name" => bless({ "" => "tableofcontents", }, "literal"), }, "command"), }, "element"), bless({ "command" => bless({ "" => "\\section{Description}", "args" => bless({ "" => "{Description}", "element" => [ bless({ "literal" => bless({ "" => "Description", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "section", }, "literal"), }, "command"), }, "element"), bless({ "literal" => bless({ "" => "...is", }, "literal"), }, "element"), bless({ "literal" => bless({ "" => "easy", }, "literal"), }, "element"), bless({ "command" => bless({ "" => "\\footnote{But not \\emph{necessarily} simple}", "args" => bless({ "" => "{But not \\emph{necessarily} simple}", "element" => [ bless({ "literal" => bless({ "" => "But", }, "literal"), }, "element"), bless({ "literal" => bless({ "" => "not", }, "literal"), }, "element"), bless({ "command" => bless({ "" => "\\emph{necessarily}", "args" => bless({ "" => "{necessarily}", "element" => [ bless({ "literal" => bless({ "" => "necessarily", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "emph", }, "literal"), }, "command"), }, "element"), bless({ "literal" => bless({ "" => "simple", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "footnote", }, "literal"), }, "command"), }, "element"), bless({ "literal" => bless({ "" => ".", }, "literal"), }, "element"), bless({ "command" => bless({ "" => "\\end{document}", "args" => bless({ "" => "{document}", "element" => [ bless({ "literal" => bless({ "" => "document", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "end", }, "literal"), }, "command"), }, "element"), ], }, "file"), }; my $input = do{ local $/; }; chomp $input; my $original_input = $input; ok +($input =~ $parser) => 'Matched'; is_deeply \%/, $target => 'Returned correct data structure'; ok ! exists $/{""} => "Didn't capture entire text"; __DATA__ \documentclass[a4paper,11pt]{article} \usepackage{latexsym} \author{D. Conway} \title{Parsing \LaTeX{}} \begin{document} \maketitle \tableofcontents \section{Description} ...is easy \footnote{But not \emph{necessarily} simple}. \end{document} Regexp-Grammars-1.033/t/no_context_local.t000644 000765 000765 00000020271 12204256303 021240 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More; use Skip_if_Perl_5_18; plan 'no_plan'; # Use this class declaration to check that classes with ctors # actually call the ctor when objrules use them... { package file; sub new { my ($class, $data_ref) = @_; my $new_obj = bless {'check'=>'check',%{$data_ref}}, $class; return $new_obj; } } my $parser = do{ use Regexp::Grammars; qr{ <[element]>* | \\ ? ? \[ <[option]> ** (,) \] \{ <[element]>* \} [^][\$&%#_{}~^\s,]+ [^][\$&%#_{}~^\s]+ }xms }; my $target = { "" => "\\documentclass[a4paper,11pt]{article}\n\\usepackage{latexsym}\n\\author{D. Conway}\n\\title{Parsing \\LaTeX{}}\n\\begin{document}\n\\maketitle\n\\tableofcontents\n\\section{Description}\n...is easy \\footnote{But not \\emph{necessarily} simple}.\n\\end{document}", "file" => bless({ "" => "\\documentclass[a4paper,11pt]{article}\n\\usepackage{latexsym}\n\\author{D. Conway}\n\\title{Parsing \\LaTeX{}}\n\\begin{document}\n\\maketitle\n\\tableofcontents\n\\section{Description}\n...is easy \\footnote{But not \\emph{necessarily} simple}.\n\\end{document}", "check" => "check", "element" => [ bless({ "command" => bless({ "" => "\\documentclass[a4paper,11pt]{article}", "args" => bless({ "" => "{article}", "element" => [ bless({ "literal" => bless({ "" => "article", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "documentclass", }, "literal"), "options" => bless({ "option" => [ bless({ "" => "a4paper", }, "option"), bless({ "" => "11pt", }, "option"), ], }, "options"), }, "command"), }, "element"), bless({ "command" => bless({ "" => "\\usepackage{latexsym}", "args" => bless({ "" => "{latexsym}", "element" => [ bless({ "literal" => bless({ "" => "latexsym", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "usepackage", }, "literal"), }, "command"), }, "element"), bless({ "command" => bless({ "" => "\\author{D. Conway}", "args" => bless({ "" => "{D. Conway}", "element" => [ bless({ "literal" => bless({ "" => "D.", }, "literal"), }, "element"), bless({ "literal" => bless({ "" => "Conway", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "author", }, "literal"), }, "command"), }, "element"), bless({ "command" => bless({ "" => "\\title{Parsing \\LaTeX{}}", "args" => bless({ "" => "{Parsing \\LaTeX{}}", "element" => [ bless({ "literal" => bless({ "" => "Parsing", }, "literal"), }, "element"), bless({ "command" => bless({ "" => "\\LaTeX{}", "args" => bless({ "" => "{}", }, "args"), "name" => bless({ "" => "LaTeX", }, "literal"), }, "command"), }, "element"), ], }, "args"), "name" => bless({ "" => "title", }, "literal"), }, "command"), }, "element"), bless({ "command" => bless({ "" => "\\begin{document}", "args" => bless({ "" => "{document}", "element" => [ bless({ "literal" => bless({ "" => "document", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "begin", }, "literal"), }, "command"), }, "element"), bless({ "command" => bless({ "" => "\\maketitle\n", "name" => bless({ "" => "maketitle", }, "literal"), }, "command"), }, "element"), bless({ "command" => bless({ "" => "\\tableofcontents\n", "name" => bless({ "" => "tableofcontents", }, "literal"), }, "command"), }, "element"), bless({ "command" => bless({ "" => "\\section{Description}", "args" => bless({ "" => "{Description}", "element" => [ bless({ "literal" => bless({ "" => "Description", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "section", }, "literal"), }, "command"), }, "element"), bless({ "literal" => bless({ "" => "...is", }, "literal"), }, "element"), bless({ "literal" => bless({ "" => "easy", }, "literal"), }, "element"), bless({ "command" => bless({ "" => "\\footnote{But not \\emph{necessarily} simple}", "args" => bless({ "" => "{But not \\emph{necessarily} simple}", "element" => [ bless({ "literal" => bless({ "" => "But", }, "literal"), }, "element"), bless({ "literal" => bless({ "" => "not", }, "literal"), }, "element"), bless({ "command" => bless({ "" => "\\emph{necessarily}", "args" => bless({ "" => "{necessarily}", "element" => [ bless({ "literal" => bless({ "" => "necessarily", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "emph", }, "literal"), }, "command"), }, "element"), bless({ "literal" => bless({ "" => "simple", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "footnote", }, "literal"), }, "command"), }, "element"), bless({ "literal" => bless({ "" => ".", }, "literal"), }, "element"), bless({ "command" => bless({ "" => "\\end{document}", "args" => bless({ "" => "{document}", "element" => [ bless({ "literal" => bless({ "" => "document", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "end", }, "literal"), }, "command"), }, "element"), ], }, "file"), }; my $input = do{ local $/; }; chomp $input; my $original_input = $input; ok +($input =~ $parser) => 'Matched'; is_deeply \%/, $target => 'Returned correct data structure'; is $/{""}, $original_input => 'Captured entire text'; __DATA__ \documentclass[a4paper,11pt]{article} \usepackage{latexsym} \author{D. Conway} \title{Parsing \LaTeX{}} \begin{document} \maketitle \tableofcontents \section{Description} ...is easy \footnote{But not \emph{necessarily} simple}. \end{document} Regexp-Grammars-1.033/t/obj_rename.t000644 000765 000765 00000001246 12204256427 020017 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More; plan tests => 5; use Regexp::Grammars; my $list_nonempty = qr{ \( <[Value]> ** (,) \) \d+ }xms; #say $list_nonempty; no Regexp::Grammars; ok +('(1,2)' =~ $list_nonempty) => 'Matched non-empty list'; is ref($/{List}), 'SaveAs' => 'Class naming worked at top level'; ok ! exists $/{List}{q{}} => 'No top-level context'; is ref($/{List}{Value}[0]), 'Some::Other' => 'Class naming worked at 2nd level'; is ref($/{List}{Value}[1]), 'Some::Other' => 'Class naming again at 2nd level'; Regexp-Grammars-1.033/t/pod.t000644 000765 000765 00000000214 12161423032 016460 0ustar00damiandamian000000 000000 use 5.010; use Test::More; eval "use Test::Pod 1.14"; plan skip_all => "Test::Pod 1.14 required for testing POD" if $@; all_pod_files_ok(); Regexp-Grammars-1.033/t/repop_ws.t000644 000765 000765 00000002156 12161423043 017545 0ustar00damiandamian000000 000000 use 5.010; use strict; use warnings; use Regexp::Grammars; use Test::More tests => 4; # The text to match against my $text = 'a' . (' ' x 5) . 'z'; # This should match without backtracking my $repop_match = qr/ \A\Z <[val]> ** \s{5} \w+ /x; # This should NOT match my $repop_nomatch = qr/ \A\Z <[val]> ** \s{3} \w+ /x; # This demonstrates the expected behaviour of $repop_match my $standard_match = qr/ \A\Z <[val]> (?: <[val]> )* \s{5} \w+ /x; # This demonstrates the expected behaviour of $repop_nomatch my $standard_nomatch = qr/ \A\Z <[val]> (?: <[val]> )* \s{3} \w+ /x; ok $text =~ $repop_match => "Repetition operator correctly matches"; ok $text !~ $repop_nomatch => "Repetition operator correctly doesn't match"; ok $text =~ $standard_match => "Simulation correctly matches"; ok $text !~ $standard_nomatch => "Simulation correctly doesn't match"; Regexp-Grammars-1.033/t/seplist.t000644 000765 000765 00000002105 12161423054 017366 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; use Regexp::Grammars; my $list_nonempty = qr{ \( <[Value]> ** [,] \) \d+ }xms; my $list_empty = qr{ \( (?: <[Value]> ** , x? )? \) (?{ $MATCH{Value} //= [] }) \d+ }xms; no Regexp::Grammars; while (my $input = ) { chomp $input; my $input_copy = $input; my ($list, $data_structure) = split /\s*:\s*/, $input; if ($list !~ m{ \( \s* \) }xms) { ok +($input =~ $list_nonempty) => 'Matched non-empty list:' . $list; is_deeply $/{List}{Value}, eval($data_structure) => 'Build correct structure'; } ok +($input_copy =~ $list_empty) => 'Matched possibly-empty list:' . $list; is_deeply $/{List}{Value}, eval($data_structure) => 'Build correct structure'; } __DATA__ () : [] (1) : [1] (1,2) : [1,2] (1,2,3) : [1,2,3] ( ) : [] ( 1 ) : [1] (1, 2 ) : [1,2] (1, 2,3 ) : [1,2,3] Regexp-Grammars-1.033/t/seplist_countedhash_0.t000644 000765 000765 00000001464 12204256430 022201 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More; plan 'no_plan'; use Regexp::Grammars; my $list_parser = qr{ \( <[Value]>{0} % [,] \) \d+ }xms; no Regexp::Grammars; while (my $input = ) { chomp $input; my $input_copy = $input; my ($list, $data_structure) = split /\s*:\s*/, $input; ok +($input =~ $list_parser xor $data_structure =~ /FAIL/) => 'Correct for:' . $list; if ($data_structure !~ /FAIL/) { is_deeply $/{List}{Value}, eval($data_structure) => 'Build correct structure'; } } __DATA__ () : undef (1) : FAIL (1,2) : FAIL (1,2,3) : FAIL (1,2,3,4) : FAIL (1,2,3,4,5) : FAIL ( ) : undef ( 1 ) : FAIL (1, 2 ) : FAIL (1, 2,3 ) : FAIL (1, 2, 3, 4) : FAIL (1, 2, 3, 4,5) : FAIL Regexp-Grammars-1.033/t/seplist_countedhash_0_.t000644 000765 000765 00000001633 12204256431 022337 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More; plan 'no_plan'; use Regexp::Grammars; my $list_nonempty = qr{ \( <[Value]>{0,}% \) \d+ }xms; no Regexp::Grammars; while (my $input = ) { chomp $input; my $input_copy = $input; my ($list, $data_structure) = split /\s*:\s*/, $input; if ($list !~ m{ \( \s* \) }xms) { ok +($input =~ $list_nonempty xor $data_structure =~ /FAIL/) => 'Correct for:' . $list; if ($data_structure !~ /FAIL/) { is_deeply $/{List}{Value}, eval($data_structure) => 'Build correct structure'; } } } __DATA__ () : undef (1) : [1] (1,2) : [1,2] (1,2,3) : [1,2,3] (1,2,3,4) : [1,2,3,4] (1,2,3,4,5) : [1,2,3,4,5] ( ) : undef ( 1 ) : [1] (1, 2 ) : [1,2] (1, 2,3 ) : [1,2,3] (1, 2, 3, 4) : [1,2,3,4] (1, 2, 3, 4,5) : [1,2,3,4,5] Regexp-Grammars-1.033/t/seplist_countedhash_0_1.t000644 000765 000765 00000001566 12161423104 022420 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; use Regexp::Grammars; my $list_nonempty = qr{ \( <[Value]>{0,1}% \) \d+ }xms; no Regexp::Grammars; while (my $input = ) { chomp $input; my $input_copy = $input; my ($list, $data_structure) = split /\s*:\s*/, $input; if ($list !~ m{ \( \s* \) }xms) { ok +($input =~ $list_nonempty xor $data_structure =~ /FAIL/) => 'Correct for:' . $list; if ($data_structure !~ /FAIL/) { is_deeply $/{List}{Value}, eval($data_structure) => 'Build correct structure'; } } } __DATA__ () : undef (1) : [1] (1,2) : FAIL (1,2,3) : FAIL (1,2,3,4) : FAIL (1,2,3,4,5) : FAIL ( ) : undef ( 1 ) : [1] (1, 2 ) : FAIL (1, 2,3 ) : FAIL (1, 2, 3, 4) : FAIL (1, 2, 3, 4,5) : FAIL Regexp-Grammars-1.033/t/seplist_countedhash_0_N.t000644 000765 000765 00000001611 12204256433 022453 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More; plan 'no_plan'; use Regexp::Grammars; my $list_nonempty = qr{ \( <[Value]> {0,4}% [,] \) \d+ }xms; no Regexp::Grammars; while (my $input = ) { chomp $input; my $input_copy = $input; my ($list, $data_structure) = split /\s*:\s*/, $input; if ($list !~ m{ \( \s* \) }xms) { ok +($input =~ $list_nonempty xor $data_structure =~ /FAIL/) => 'Correct for:' . $list; if ($data_structure !~ /FAIL/) { is_deeply $/{List}{Value}, eval($data_structure) => 'Build correct structure'; } } } __DATA__ () : undef (1) : [1] (1,2) : [1,2] (1,2,3) : [1,2,3] (1,2,3,4) : [1,2,3,4] (1,2,3,4,5) : FAIL ( ) : undef ( 1 ) : [1] (1, 2 ) : [1,2] (1, 2,3 ) : [1,2,3] (1, 2, 3, 4) : [1,2,3,4] (1, 2, 3, 4,5) : FAIL Regexp-Grammars-1.033/t/seplist_countedhash_1.t000644 000765 000765 00000001607 12204256434 022205 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More; plan 'no_plan'; use Regexp::Grammars; my $list_nonempty = qr{ \( <[Value]> {1,4}% [,] \) \d+ }xms; no Regexp::Grammars; while (my $input = ) { chomp $input; my $input_copy = $input; my ($list, $data_structure) = split /\s*:\s*/, $input; if ($list !~ m{ \( \s* \) }xms) { ok +($input =~ $list_nonempty xor $data_structure =~ /FAIL/) => 'Correct for:' . $list; if ($data_structure !~ /FAIL/) { is_deeply $/{List}{Value}, eval($data_structure) => 'Build correct structure'; } } } __DATA__ () : FAIL (1) : [1] (1,2) : [1,2] (1,2,3) : [1,2,3] (1,2,3,4) : [1,2,3,4] (1,2,3,4,5) : FAIL ( ) : FAIL ( 1 ) : [1] (1, 2 ) : [1,2] (1, 2,3 ) : [1,2,3] (1, 2, 3, 4) : [1,2,3,4] (1, 2, 3, 4,5) : FAIL Regexp-Grammars-1.033/t/seplist_countedhash_1_.t000644 000765 000765 00000001616 12161423127 022341 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; use Regexp::Grammars; my $list_nonempty = qr{ \( <[Value]> {1,}% [,] \) \d+ }xms; no Regexp::Grammars; while (my $input = ) { chomp $input; my $input_copy = $input; my ($list, $data_structure) = split /\s*:\s*/, $input; if ($list !~ m{ \( \s* \) }xms) { ok +($input =~ $list_nonempty xor $data_structure =~ /FAIL/) => 'Correct for:' . $list; if ($data_structure !~ /FAIL/) { is_deeply $/{List}{Value}, eval($data_structure) => 'Build correct structure'; } } } __DATA__ () : FAIL (1) : [1] (1,2) : [1,2] (1,2,3) : [1,2,3] (1,2,3,4) : [1,2,3,4] (1,2,3,4,5) : [1,2,3,4,5] ( ) : FAIL ( 1 ) : [1] (1, 2 ) : [1,2] (1, 2,3 ) : [1,2,3] (1, 2, 3, 4) : [1,2,3,4] (1, 2, 3, 4,5) : [1,2,3,4,5] Regexp-Grammars-1.033/t/seplist_countedhash_1_N.t000644 000765 000765 00000001607 12204256435 022463 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More; plan 'no_plan'; use Regexp::Grammars; my $list_nonempty = qr{ \( <[Value]> {1,4}% [,] \) \d+ }xms; no Regexp::Grammars; while (my $input = ) { chomp $input; my $input_copy = $input; my ($list, $data_structure) = split /\s*:\s*/, $input; if ($list !~ m{ \( \s* \) }xms) { ok +($input =~ $list_nonempty xor $data_structure =~ /FAIL/) => 'Correct for:' . $list; if ($data_structure !~ /FAIL/) { is_deeply $/{List}{Value}, eval($data_structure) => 'Build correct structure'; } } } __DATA__ () : FAIL (1) : [1] (1,2) : [1,2] (1,2,3) : [1,2,3] (1,2,3,4) : [1,2,3,4] (1,2,3,4,5) : FAIL ( ) : FAIL ( 1 ) : [1] (1, 2 ) : [1,2] (1, 2,3 ) : [1,2,3] (1, 2, 3, 4) : [1,2,3,4] (1, 2, 3, 4,5) : FAIL Regexp-Grammars-1.033/t/seplist_countedhash_M_.t000644 000765 000765 00000001624 12204256436 022401 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More; plan 'no_plan'; use Regexp::Grammars; my $list_nonempty = qr{ \( <[Value]> {3,}% [,] \) \d+ }xms; no Regexp::Grammars; while (my $input = ) { chomp $input; my $input_copy = $input; my ($list, $data_structure) = split /\s*:\s*/, $input; if ($list !~ m{ \( \s* \) }xms) { ok +($input =~ $list_nonempty xor $data_structure =~ /FAIL/) => 'Correct for:' . $list; if ($data_structure !~ /FAIL/) { is_deeply $/{List}{Value}, eval($data_structure) => 'Build correct structure'; } } } __DATA__ () : FAIL (1) : FAIL (1,2) : FAIL (1,2,3) : [1,2,3] (1,2,3,4) : [1,2,3,4] (1,2,3,4,5) : [1,2,3,4,5] ( ) : FAIL ( 1 ) : FAIL (1, 2 ) : FAIL (1, 2,3 ) : [1,2,3] (1, 2, 3, 4) : [1,2,3,4] (1, 2, 3, 4,5) : [1,2,3,4,5] Regexp-Grammars-1.033/t/seplist_countedhash_M_N.t000644 000765 000765 00000001611 12204256437 022514 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More; plan 'no_plan'; use Regexp::Grammars; my $list_nonempty = qr{ \( <[Value]> {2,4}% [,] \) \d+ }xms; no Regexp::Grammars; while (my $input = ) { chomp $input; my $input_copy = $input; my ($list, $data_structure) = split /\s*:\s*/, $input; if ($list !~ m{ \( \s* \) }xms) { ok +($input =~ $list_nonempty xor $data_structure =~ /FAIL/) => 'Correct for:' . $list; if ($data_structure !~ /FAIL/) { is_deeply $/{List}{Value}, eval($data_structure) => 'Build correct structure'; } } } __DATA__ () : FAIL (1) : FAIL (1,2) : [1,2] (1,2,3) : [1,2,3] (1,2,3,4) : [1,2,3,4] (1,2,3,4,5) : FAIL ( ) : FAIL ( 1 ) : FAIL (1, 2 ) : [1,2] (1, 2,3 ) : [1,2,3] (1, 2, 3, 4) : [1,2,3,4] (1, 2, 3, 4,5) : FAIL Regexp-Grammars-1.033/t/seplist_countedhash_N.t000644 000765 000765 00000001573 12204256440 022241 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More; plan 'no_plan'; use Regexp::Grammars; my $list_nonempty = qr{ \( <[Value]> {3}% [,] \) \d+ }xms; no Regexp::Grammars; while (my $input = ) { chomp $input; my $input_copy = $input; my ($list, $data_structure) = split /\s*:\s*/, $input; if ($list !~ m{ \( \s* \) }xms) { ok +($input =~ $list_nonempty xor $data_structure =~ /FAIL/) => 'Correct for:' . $list; if ($data_structure !~ /FAIL/) { is_deeply $/{List}{Value}, eval($data_structure) => 'Build correct structure'; } } } __DATA__ () : FAIL (1) : FAIL (1,2) : FAIL (1,2,3) : [1,2,3] (1,2,3,4) : FAIL (1,2,3,4,5) : FAIL ( ) : FAIL ( 1 ) : FAIL (1, 2 ) : FAIL (1, 2,3 ) : [1,2,3] (1, 2, 3, 4) : FAIL (1, 2, 3, 4,5) : FAIL Regexp-Grammars-1.033/t/seplist_greediness.t000644 000765 000765 00000003166 12161423166 021612 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; use Regexp::Grammars; my $list_greedy = qr{ <[item=Value]>+ % [,] \d+ }xms; my $list_parsimonious = qr{ <[item=Value]>+? % [,] \d+ }xms; my $list_parsimonious_anchored = qr{ <[item=Value]>+? % [,] \d+ }xms; my $list_gluttonous = qr{ <[item=Value]>++ % [,] \d+ }xms; no Regexp::Grammars; my $data = '1,2,3,4,5'; my $data_etc = '1,2,3,4,5etc'; ok +($data =~ $list_greedy) => 'Matched greedy'; is_deeply $/{List}{item}, [1,2,3,4] => '...with correct items'; is $/{List}{after}, ',5' => '...with correct remainder'; ok +($data =~ $list_parsimonious) => 'Matched parsimonious'; is_deeply $/{List}{item}, [1] => '...with correct items'; is $/{List}{after}, ',2,3,4,5' => '...with correct remainder'; ok +($data_etc =~ $list_parsimonious_anchored) => 'Matched parsimonious anchored'; is_deeply $/{List}{item}, [1,2,3,4] => '...with correct items'; is $/{List}{after}, ',5etc' => '...with correct remainder'; ok !($data =~ $list_gluttonous) => 'Did not match gluttonous'; ok +($data_etc =~ $list_gluttonous) => 'Matched gluttonous'; #is_deeply $/{List}{item}, [1,2,3,4,5] => '...with correct items'; is $/{List}{after}, 'etc' => '...with correct remainder'; Regexp-Grammars-1.033/t/seplist_plushash.t000644 000765 000765 00000001346 12161423175 021307 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; use Regexp::Grammars; my $list_nonempty = qr{ \( <[Value]> +% [,] \) \d+ }xms; no Regexp::Grammars; while (my $input = ) { chomp $input; my $input_copy = $input; my ($list, $data_structure) = split /\s*:\s*/, $input; ok +($input =~ $list_nonempty xor $data_structure =~ /FAIL/) => 'Correct for:' . $list; if ($data_structure !~ /FAIL/) { is_deeply $/{List}{Value}, eval($data_structure) => 'Build correct structure'; } } __DATA__ () : FAIL (1) : [1] (1,2) : [1,2] (1,2,3) : [1,2,3] ( ) : FAIL ( 1 ) : [1] (1, 2 ) : [1,2] (1, 2,3 ) : [1,2,3] Regexp-Grammars-1.033/t/seplist_questionmark.t000644 000765 000765 00000001563 12161423207 022177 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; use Regexp::Grammars; my $list_nonempty = qr{ \( <[Value]>? % \) \d+ }xms; no Regexp::Grammars; while (my $input = ) { chomp $input; my $input_copy = $input; my ($list, $data_structure) = split /\s*:\s*/, $input; if ($list !~ m{ \( \s* \) }xms) { ok +($input =~ $list_nonempty xor $data_structure =~ /FAIL/) => 'Correct for:' . $list; if ($data_structure !~ /FAIL/) { is_deeply $/{List}{Value}, eval($data_structure) => 'Build correct structure'; } } } __DATA__ () : undef (1) : [1] (1,2) : FAIL (1,2,3) : FAIL (1,2,3,4) : FAIL (1,2,3,4,5) : FAIL ( ) : undef ( 1 ) : [1] (1, 2 ) : FAIL (1, 2,3 ) : FAIL (1, 2, 3, 4) : FAIL (1, 2, 3, 4,5) : FAIL Regexp-Grammars-1.033/t/seplist_rawhash.t000644 000765 000765 00000001530 12161423215 021103 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; use Regexp::Grammars; my $raw_hash = qr{ \A (?: | | | ) \Z \d{1,3} % % \w+ % \d{1,3} \% solution }xms; no Regexp::Grammars; ok +('7% solution' =~ $raw_hash) => 'Matched '; ok exists $/{solution} => '...and matched correct rule'; ok +('7%' =~ $raw_hash) => 'Matched '; ok exists $/{percentage} => '...and matched correct rule'; ok +('%foo' =~ $raw_hash) => 'Matched '; ok exists $/{perl_hash} => '...and matched correct rule'; ok +('%bar % 42%' =~ $raw_hash) => 'Matched '; ok exists $/{perl_mod} => '...and matched correct rule'; Regexp-Grammars-1.033/t/seplist_starhash.t000644 000765 000765 00000001300 12204256443 021263 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More; plan 'no_plan'; use Regexp::Grammars; my $list_parser = qr{ \( <[Value]> *% [,] \) \d+ }xms; no Regexp::Grammars; while (my $input = ) { chomp $input; my $input_copy = $input; my ($list, $data_structure) = split /\s*:\s*/, $input; ok +($input_copy =~ $list_parser) => 'Matched $input: ' . $list; is_deeply $/{List}{Value}, eval($data_structure) => 'Build correct structure'; } __DATA__ () : undef (1) : [1] (1,2) : [1,2] (1,2,3) : [1,2,3] ( ) : undef ( 1 ) : [1] (1, 2 ) : [1,2] (1, 2,3 ) : [1,2,3] Regexp-Grammars-1.033/t/timeout.t000644 000765 000765 00000004300 12204256320 017366 0ustar00damiandamian000000 000000 use 5.010; use warnings; use Test::More; use Skip_if_Perl_5_18; plan 'no_plan'; #=====[ Test zero timeout ]============================================= my %AcceptableVersions = ( '0.95' => 1, '0.98' => 1, '1.01' => 1, ); my $version_checker = do{ use Regexp::Grammars; qr{ vers = <%AcceptableVersions> | vers = }xms; }; ok 'vers = 0.95' =~ $version_checker => 'Matched version 0.95'; ok @! == 0 => 'with no error messages'; ok 'vers = 0.96' !~ $version_checker => 'Correctly failed to match version 0.96'; ok @! == 1 => 'with correct number of error messages'; is $![0], 'Cannot parse language version 0.96' => 'with correct error message'; #=====[ Test regular timeouts ]============================================= my $calculator = do{ use Regexp::Grammars; qr{ ( <.Mult> ** <.Op=([+-])> ) ( <.Pow> ** <.Op=([*/%])> ) \^ | (?{ sleep 1 }) | \( \) }xms }; ok '2*2*2' !~ $calculator => 'Correctly failed to match 2*2*2'; ok @! == 1 => 'with single error message'; is $![0], 'Internal error: Timed out after 5 seconds (as requested)' => 'and the correct error message'; ok '2*2' =~ $calculator => 'Matched 2*2'; is $/{Answer}, '4' => 'with correct result'; ok @! == 0 => 'and without error message'; ok '2' =~ $calculator => 'Matched 2'; is $/{Answer}, '2' => 'with correct result'; ok @! == 0 => 'and without error message'; Regexp-Grammars-1.033/t/top_is_token.t000644 000765 000765 00000001203 12161423262 020377 0ustar00damiandamian000000 000000 use 5.010; use Test::More 'no_plan'; use Regexp::Grammars; my $grammar_top = qr{ ^ a b $ }xms; my $grammar_rule = qr{ ^ a b $ }xms; my $grammar_token = qr{ ^ a b $ }xms; ok 'ab' =~ $grammar_top => 'Top correctly matches without space'; ok 'a b' !~ $grammar_top => 'Top correctly fails with space'; ok 'ab' =~ $grammar_token => 'Token correctly matches without space'; ok 'a b' !~ $grammar_token => 'Token correctly fails with space'; ok 'ab' =~ $grammar_rule => 'Rule correctly matches without space'; ok 'a b' =~ $grammar_rule => 'Rule correctly matches with space'; Regexp-Grammars-1.033/lib/Regexp/000755 000765 000765 00000000000 12210257320 017251 5ustar00damiandamian000000 000000 Regexp-Grammars-1.033/lib/Skip_if_Perl_5_18.pm000644 000765 000765 00000000352 12162223352 021463 0ustar00damiandamian000000 000000 package # Hide this from CPAN Skip_if_Perl_5_18; use Test::More; if ($] >= 5.018) { plan skip_all => 'This feature of Regexp::Grammars known to be incompatible with 5.18'; } 1; # Magic true value required at end of module Regexp-Grammars-1.033/lib/Regexp/Grammars.pm000644 000765 000765 00000674166 12210257301 021403 0ustar00damiandamian000000 000000 =encoding ISO8859-1 =cut package Regexp::Grammars; use re 'eval'; use warnings; use strict; use 5.010; use Scalar::Util qw< blessed reftype >; use Data::Dumper qw< Dumper >; our $VERSION = '1.033'; my $anon_scalar_ref = \do{my $var}; my %MAGIC_VARS = ( '$CAPTURE' => $anon_scalar_ref, '$CONTEXT' => $anon_scalar_ref, '$DEBUG' => $anon_scalar_ref, '$INDEX' => $anon_scalar_ref, '$MATCH' => $anon_scalar_ref, '%ARG' => {}, '%MATCH' => {}, ); my $PROBLEM_WITH_5_18 = <<'END_ERROR_MSG'; Warning: Regexp::Grammars is currently unsupported under Perl 5.18. Perl 5.18 changed how 'qr' constant overloadings are parsed and the scope in which they are subsequently compiled. This change currently make it impossible to reliably create 'qr' overloadings that inject code blocks into a regex, as it prevents the overloaded regexes from compiling properly in many cases, even with an explicit 'use re "eval"' in scope. These problems have been reported, and the brave volunteers of P5P are currently working on fixes. However, these will not be available until at least 5.18.2. Because Regexp::Grammars relies on 'qr' overloads to inject code blocks into regexes, the module is curently not compatible with Perl 5.18. It may continue to work in some limited cases, but is no longer reliable. At present, if you rely on Regexp::Grammars for your parsing needs, your alternatives are either not to upgrade to Perl 5.18, to livce with the problems until they are resolved (in Perl 5.18.2, we hope), or else to consider switching to another parsing system, such as Marpa. We deeply regret that Regexp::Grammars cannot currently be maintained completelt due to these backwards-incompatible changes and bugs in Perl 5.18. END_ERROR_MSG # Load the module... sub import { # Signal lexical scoping (active, unless something was exported)... $^H{'Regexp::Grammars::active'} = 1; # Process any regexes in module's active lexical scope... use overload; overload::constant( qr => sub { my ($raw, $cooked, $type) = @_; # In active scope and really a regex... if (_module_is_active() && $type =~ /qq?/) { return bless \$cooked, 'Regexp::Grammars::Precursor'; } # Ignore everything else... else { return $cooked; } } ); # Deal with 5.18 issues... if ($] >= 5.018) { # Issue warning... require Carp; Carp::carp($PROBLEM_WITH_5_18); # Deal with (some, but not all) cases where Perl 5.18 now # complains about the injection of (??{...}) and (?{...}) require re; re->import('eval'); # Sanctify the standard Regexp::Grammars pseudo-variables from # Perl 5.18's early enforcement of strictures... require Lexical::Var; for my $magic_var (keys %MAGIC_VARS) { Lexical::Var->import($magic_var, $MAGIC_VARS{$magic_var}); } } } # Deactivate module's regex effect when it is "anti-imported" with 'no'... sub unimport { # Signal lexical (non-)scoping... $^H{'Regexp::Grammars::active'} = 0; require re; re->unimport('eval'); } # Encapsulate the hoopy user-defined pragma interface... sub _module_is_active { return (caller 1)[10]->{'Regexp::Grammars::active'}; } my $RULE_HANDLER; sub clear_rule_handler { undef $RULE_HANDLER; } { package Regexp; sub with_actions { my ($self, $handler) = @_; $RULE_HANDLER = $handler; return $self; } } #=====[ COMPILE-TIME INTERIM REPRESENTATION OF GRAMMARS ]=================== { package Regexp::Grammars::Precursor; # Only translate precursors once... state %grammar_cache; use overload ( # Concatenation/interpolation just concatenates to the precursor... q{.} => sub { my ($x, $y, $reversed) = @_; if (ref $x) { $x = ${$x} } if (ref $y) { $y = ${$y} } if ($reversed) { ($y,$x) = ($x,$y); } $x .= $y//q{}; return bless \$x, 'Regexp::Grammars::Precursor'; }, # Using as a string (i.e. matching) preprocesses the precursor... q{""} => sub { my ($obj) = @_; return $grammar_cache{ overload::StrVal($$obj) } //= Regexp::Grammars::_build_grammar( ${$obj} ); }, # Everything else, as usual... fallback => 1, ); } #=====[ SUPPORT FOR THE INTEGRATED DEBUGGER ]========================= # All messages go to STDERR by default... *Regexp::Grammars::LOGFILE = *STDERR{IO}; # Debugging levels indicate where to stop... our %DEBUG_LEVEL = ( same => undef, # No change in debugging mode off => 0, # No more debugging run => 1, continue => 1, # Run to completion of regex match match => 2, on => 2, # Run to next successful submatch step => 3, try => 3, # Run to next reportable event ); # Debugging levels can be abbreviated to one character during interactions... @DEBUG_LEVEL{ map {substr($_,0,1)} keys %DEBUG_LEVEL } = values %DEBUG_LEVEL; $DEBUG_LEVEL{o} = $DEBUG_LEVEL{off}; # Not "on" $DEBUG_LEVEL{s} = $DEBUG_LEVEL{step}; # Not "same" # Width of leading context field in debugging messages is constrained... my $MAX_CONTEXT_WIDTH = 20; my $MIN_CONTEXT_WIDTH = 6; sub set_context_width { { package Regexp::Grammars::ContextRestorer; sub new { my ($class, $old_context_width) = @_; bless \$old_context_width, $class; } sub DESTROY { my ($old_context_width_ref) = @_; $MAX_CONTEXT_WIDTH = ${$old_context_width_ref}; } } my ($new_context_width) = @_; my $old_context_width = $MAX_CONTEXT_WIDTH; $MAX_CONTEXT_WIDTH = $new_context_width; if (defined wantarray) { return Regexp::Grammars::ContextRestorer->new($old_context_width); } } # Rewrite a string currently being matched, to make \n and \t visible sub _show_metas { my $context_str = shift // q{}; # Quote newlines (\n -> \\n, without using a regex)... my $index = index($context_str,"\n"); while ($index >= 0) { substr($context_str, $index, 1, '\\n'); $index = index($context_str,"\n",$index+2); } # Quote tabs (\t -> \\t, without using a regex)... $index = index($context_str,"\t"); while ($index >= 0) { substr($context_str, $index, 1, '\\t'); $index = index($context_str,"\t",$index+2); } return $context_str; } # Minimize whitespace in a string... sub _squeeze_ws { my ($str) = @_; $str =~ tr/\n\t/ /; my $index = index($str,q{ }); while ($index >= 0) { substr($str, $index, 2, q{ }); $index = index($str,q{ },$index); } return $str; } # Prepare for debugging... sub _init_try_stack { our (@try_stack, $last_try_pos, $last_context_str); # Start with a representation of the entire grammar match... @try_stack = ({ subrule => '', height => 0, errmsg => ' \\FAIL ', }); # Initialize tracking of location and context... $last_try_pos = -1; $last_context_str = q{}; # Report... say {*Regexp::Grammars::LOGFILE} _debug_context('=>') . 'Trying from position ' . pos(); } # Create a "context string" showing where the regex is currently matching... sub _debug_context { my ($fill_chars) = @_; # Determine minimal sufficient width for context field... my $field_width = length(_show_metas($_//q{})); if ($field_width > $MAX_CONTEXT_WIDTH) { $field_width = $MAX_CONTEXT_WIDTH; } elsif ($field_width < $MIN_CONTEXT_WIDTH) { $field_width = $MIN_CONTEXT_WIDTH; } # Get current matching position (and some additional trailing context)... my $context_str = substr(_show_metas(substr(($_//q{}).q{},pos()//0,$field_width)),0,$field_width); # Build the context string, handling special cases... our $last_context_str; if ($fill_chars) { # If caller supplied a 1- or 2-char fill sequence, use that instead... my $last_fill_char = length($fill_chars) > 1 ? substr($fill_chars,-1,1,q{}) : $fill_chars ; $context_str = $fill_chars x ($field_width-1) . $last_fill_char; } else { # Make end-of-string visible in empty context string... if ($context_str eq q{}) { $context_str = '[eos]'; } # Don't repeat consecutive identical context strings... if ($context_str eq $last_context_str) { $context_str = q{ } x $field_width; } else { # If not repeating, remember for next time... $last_context_str = $context_str; } } # Left justify and return context string... return sprintf("%-*s ",$field_width,$context_str); } # Show a debugging message (mainly used for compile-time errors and info)... sub _debug_notify { # Single arg is a line to be printed with a null severity... my ($severity, @lines) = @_==1 ? (q{},@_) : @_; chomp @lines; # Formatting string for all lines... my $format = qq{%*s | %s\n}; # Track previous severity and avoid repeating the same level... state $prev_severity = q{}; if ($severity !~ /\S/) { # Do nothing } elsif ($severity eq 'info' && $prev_severity eq 'info' ) { $severity = q{}; } else { $prev_severity = $severity; } # Display first line with severity indicator (unless same as previous)... printf {*Regexp::Grammars::LOGFILE} $format, $MIN_CONTEXT_WIDTH, $severity, shift @lines; # Display first line without severity indicator for my $next_line (@lines) { printf {*Regexp::Grammars::LOGFILE} $format, $MIN_CONTEXT_WIDTH, q{}, $next_line; } } # Handle user interactions during runtime debugging... sub _debug_interact { my ($stack_height, $leader, $curr_frame_ref, $min_debug_level) = @_; our $DEBUG; # ...stores current debug level within regex # Only interact with terminals, and if debug level is appropriate... if (-t *Regexp::Grammars::LOGFILE && defined $DEBUG && ($DEBUG_LEVEL{$DEBUG}//0) >= $DEBUG_LEVEL{$min_debug_level} ) { local $/ = "\n"; # ...in case some caller is being clever INPUT: while (1) { my $cmd = readline // q{}; chomp $cmd; # Input of 'd' means 'display current result frame'... if ($cmd eq 'd') { print {*Regexp::Grammars::LOGFILE} join "\n", map { $leader . ($stack_height?'| ':q{}) . ' : ' . $_ } split "\n", q{ }x8 . substr(Dumper($curr_frame_ref),8); print "\t"; } # Any other (valid) input changes debugging level and continues... else { if (defined $DEBUG_LEVEL{$cmd}) { $DEBUG = $cmd; } last INPUT; } } } # When interaction not indicated, just complete the debugging line... else { print {*Regexp::Grammars::LOGFILE} "\n"; } } # Handle reporting of unsuccessful match attempts... sub _debug_handle_failures { my ($stack_height, $subrule, $in_match) = @_; our @try_stack; # Unsuccessful match attempts leave "leftovers" on the attempt stack... CLEANUP: while (@try_stack && $try_stack[-1]{height} >= $stack_height) { # Grab record of (potentially) unsuccessful attempt... my $error_ref = pop @try_stack; # If attempt was the one whose match is being reported, go and report... last CLEANUP if $in_match && $error_ref->{height} == $stack_height && $error_ref->{subrule} eq $subrule; # Otherwise, report the match failure... say {*Regexp::Grammars::LOGFILE} _debug_context(q{ }) . $error_ref->{errmsg}; } } # Handle attempts to call non-existent subrules... sub _debug_fatal { my ($naughty_construct) = @_; print {*Regexp::Grammars::LOGFILE} "_________________________________________________________________\n", "Fatal error: Entire parse terminated prematurely while attempting\n", " to call non-existent rule: $naughty_construct\n", "_________________________________________________________________\n"; $@ = "Entire parse terminated prematurely while attempting to call non-existent rule: $naughty_construct"; } # Handle objrules that don't return hashes... sub _debug_non_hash { my ($obj, $name) = @_; # If the object is okay, no further action required... return q{} if reftype($obj) eq 'HASH'; # Generate error messages... print {*Regexp::Grammars::LOGFILE} "_________________________________________________________________\n", "Fatal error: returned a non-hash-based object\n", "_________________________________________________________________\n"; $@ = " returned a non-hash-based object"; return '(*COMMIT)(*FAIL)'; } # Print a message in context... sub _debug_logmsg { my ($stack_height, @msg) = @_; # Determine indent for messages... my $leader = _debug_context() . q{| } x ($stack_height-1) . '|'; # Report the attempt... print {*Regexp::Grammars::LOGFILE} map { "$leader$_\n" } @msg; } # Print a message indicating a (sub)match attempt... sub _debug_trying { my ($stack_height, $curr_frame_ref, $subrule) = @_; # Clean up after any preceding unsuccessful attempts... _debug_handle_failures($stack_height, $subrule); # Determine indent for messages... my $leader = _debug_context() . q{| } x ($stack_height-2); # Detect and report any backtracking prior to this attempt... our $last_try_pos //= 0; #...Stores the pos() of the most recent match attempt? my $backtrack_distance = $last_try_pos - pos(); if ($backtrack_distance > 0) { say {*Regexp::Grammars::LOGFILE} ' <' . q{~} x (length(_debug_context(q{ }))-3) . q{ } . q{| } x ($stack_height-2) . qq{|...Backtracking $backtrack_distance char} . ($backtrack_distance > 1 ? q{s} : q{}) . q{ and trying new match} ; } # Report the attempt... print {*Regexp::Grammars::LOGFILE} $leader, "|...Trying $subrule\t"; # Handle user interactions during debugging... _debug_interact($stack_height, $leader, $curr_frame_ref, 'step'); # Record the attempt, for later error handling in _debug_matched()... if ($subrule ne 'next alternative') { our @try_stack; push @try_stack, { height => $stack_height, subrule => $subrule, # errmsg should align under: |...Trying $subrule\t errmsg => q{| } x ($stack_height-2) . "| \\FAIL $subrule", }; } $last_try_pos = pos(); } # Print a message indicating a successful (sub)match... sub _debug_matched { my ($stack_height, $curr_frame_ref, $subrule, $matched_text) = @_; # Clean up any intervening unsuccessful attempts... _debug_handle_failures($stack_height, $subrule, 'in match'); # Build debugging message... my $debug_context = _debug_context(); my $leader = $debug_context . q{| } x ($stack_height-2); my $message = ($stack_height ? '| ' : q{}) . " \\_____$subrule matched "; my $filler = $stack_height ? '| ' . q{ } x (length($message)-4) : q{ } x length($message); our $last_try_pos //= 0; #...Stores the pos() of the most recent match attempt? # Report if match required backtracking... my $backtrack_distance = $last_try_pos - (pos()//0); if ($backtrack_distance > 0) { say {*Regexp::Grammars::LOGFILE} ' <' . q{~} x (length(_debug_context(q{ }))-3) . q{ } . q{| } x ($stack_height-2) . qq{|...Backtracking $backtrack_distance char} . ($backtrack_distance > 1 ? q{s} : q{}) . qq{ and rematching $subrule} ; } $last_try_pos = pos(); # Format match text (splitting multi-line texts and indent them correctly)... $matched_text = defined($matched_text) ? $matched_text = q{'} . join("\n$leader$filler", split "\n", $matched_text) . q{'} : q{}; # Print match message... print {*Regexp::Grammars::LOGFILE} $leader . $message . $matched_text . qq{\t}; # Check for user interaction... _debug_interact($stack_height, $leader, $curr_frame_ref, $stack_height ? 'match' : 'run'); } # Print a message indicating a successful (sub)match... sub _debug_require { my ($stack_height, $condition, $succeeded) = @_; # Build debugging message... my $debug_context = _debug_context(); my $leader = $debug_context . q{| } x ($stack_height-1); my $message1 = ($stack_height ? '|...' : q{}) . "Testing condition: $condition" ; my $message2 = ($stack_height ? '| ' : q{}) . " \\_____" . ($succeeded ? 'Satisified' : 'FAILED') ; # Report if match required backtracking... our $last_try_pos; my $backtrack_distance = $last_try_pos - pos(); if ($backtrack_distance > 0) { say {*Regexp::Grammars::LOGFILE} ' <' . q{~} x (length(_debug_context(q{ }))-3) . q{ } . q{| } x ($stack_height-1) . qq{|...Backtracking $backtrack_distance char} . ($backtrack_distance > 1 ? q{s} : q{}) . qq{ and rematching} ; } # Remember where the condition was tried... $last_try_pos = pos(); # Print match message... say {*Regexp::Grammars::LOGFILE} $leader . $message1; say {*Regexp::Grammars::LOGFILE} $leader . $message2; } # Print a message indicating a successful store-result-of-code-block... sub _debug_executed { my ($stack_height, $curr_frame_ref, $subrule, $value) = @_; # Build message... my $leader = _debug_context() . q{| } x ($stack_height-2); my $message = "|...Action $subrule\n"; my $message2 = "| saved value: '"; $message .= $leader . $message2; my $filler = q{ } x length($message2); # Split multiline results over multiple lines (properly indented)... $value = join "\n$leader$filler", split "\n", $value; # Report the action... print {*Regexp::Grammars::LOGFILE} $leader . $message . $value . qq{'\t}; # Check for user interaction... _debug_interact($stack_height, $leader, $curr_frame_ref, 'match'); } # Create the code to be inserted into the regex to facilitate debugging... sub _build_debugging_statements { my ($debugging_active, $subrule, $extra_pre_indent) = @_; return (q{}, q{}) if ! $debugging_active;; $extra_pre_indent //= 0; $subrule = "q{$subrule}"; return ( qq{ Regexp::Grammars::_debug_trying(\@Regexp::Grammars::RESULT_STACK+$extra_pre_indent, \$Regexp::Grammars::RESULT_STACK[-2+$extra_pre_indent], $subrule) if \$Regexp::Grammars::DEBUG_LEVEL{\$Regexp::Grammars::DEBUG}; }, qq{ Regexp::Grammars::_debug_matched(\@Regexp::Grammars::RESULT_STACK+1, \$Regexp::Grammars::RESULT_STACK[-1], $subrule, \$^N) if \$Regexp::Grammars::DEBUG_LEVEL{\$Regexp::Grammars::DEBUG}; }, ); } sub _build_raw_debugging_statements { my ($debugging_active, $subpattern, $extra_pre_indent) = @_; return (q{}, q{}) if ! $debugging_active; $extra_pre_indent //= 0; if ($subpattern eq '|') { return ( q{}, qq{ (?{;Regexp::Grammars::_debug_trying(\@Regexp::Grammars::RESULT_STACK+$extra_pre_indent, \$Regexp::Grammars::RESULT_STACK[-2+$extra_pre_indent], 'next alternative') if \$Regexp::Grammars::DEBUG_LEVEL{\$Regexp::Grammars::DEBUG};}) }, ); } else { return ( qq{ (?{;Regexp::Grammars::_debug_trying(\@Regexp::Grammars::RESULT_STACK+$extra_pre_indent, \$Regexp::Grammars::RESULT_STACK[-2+$extra_pre_indent], q{subpattern /$subpattern/}, \$^N) if \$Regexp::Grammars::DEBUG_LEVEL{\$Regexp::Grammars::DEBUG};}) }, qq{ (?{;Regexp::Grammars::_debug_matched(\@Regexp::Grammars::RESULT_STACK+1, \$Regexp::Grammars::RESULT_STACK[-1], q{subpattern /$subpattern/}, \$^N) if \$Regexp::Grammars::DEBUG_LEVEL{\$Regexp::Grammars::DEBUG};}) }, ); } } #=====[ SUPPORT FOR AUTOMATIC TIMEOUTS ]========================= sub _test_timeout { our ($DEBUG, $TIMEOUT); return q{} if time() < $TIMEOUT->{'limit'}; my $duration = "$TIMEOUT->{duration} second" . ( $TIMEOUT->{duration} == 1 ? q{} : q{s} ); if (defined($DEBUG) && $DEBUG ne 'off') { my $leader = _debug_context(q{ }); say {*LOGFILE} $leader . '|'; say {*LOGFILE} $leader . "|...Invoking {duration}>"; say {*LOGFILE} $leader . "| \\_____No match after $duration"; say {*LOGFILE} $leader . '|'; say {*LOGFILE} $leader . " \\FAIL "; } if (! @!) { @! = "Internal error: Timed out after $duration (as requested)"; } return q{(*COMMIT)(*FAIL)}; } #=====[ SUPPORT FOR UPDATING THE RESULT STACK ]========================= # Create a clone of the current result frame with an new key/value... sub _extend_current_result_frame_with_scalar { my ($stack_ref, $key, $value) = @_; # Autovivify null stacks (only occur when grammar invokes no subrules)... if (!@{$stack_ref}) { $stack_ref = [{}]; } # Copy existing frame, appending new value so it overwrites any old value... my $cloned_result_frame = { %{$stack_ref->[-1]}, $key => $value, }; # Make the copy into an object, if the original was one... if (my $class = blessed($stack_ref->[-1])) { bless $cloned_result_frame, $class; } return $cloned_result_frame; } # Create a clone of the current result frame with an additional key/value # (As above, but preserving the "listiness" of the key being added to)... sub _extend_current_result_frame_with_list { my ($stack_ref, $key, $value) = @_; # Copy existing frame, appending new value to appropriate element's list... my $cloned_result_frame = { %{$stack_ref->[-1]}, $key => [ @{$stack_ref->[-1]{$key}//[]}, $value, ], }; # Make the copy into an object, if the original was one... if (my $class = blessed($stack_ref->[-1])) { bless $cloned_result_frame, $class; } return $cloned_result_frame; } # Pop current result frame and add it to a clone of previous result frame # (flattening it if possible, and preserving any blessing)... sub _pop_current_result_frame { my ($stack_ref, $key, $original_name, $value) = @_; # Where are we in the stack? my $curr_frame = $stack_ref->[-1]; my $caller_frame = $stack_ref->[-2]; # Track which frames are objects... my $is_blessed_curr = blessed($curr_frame); my $is_blessed_caller = blessed($caller_frame); # Remove "private" captures (i.e. those starting with _)... delete @{$curr_frame}{grep {substr($_,0,1) eq '_'} keys %{$curr_frame} }; # Remove "nocontext" marker... my $nocontext = delete $curr_frame->{'~'}; # Build a clone of the current frame... my $cloned_result_frame = exists $curr_frame->{'='} ? $curr_frame->{'='} : $is_blessed_curr || length(join(q{}, keys %{$curr_frame})) ? { q{} => $value, %{$curr_frame} } : keys %{$curr_frame} ? $curr_frame->{q{}} : $value ; # Apply any appropriate handler... if ($RULE_HANDLER) { if ($RULE_HANDLER->can($original_name) || $RULE_HANDLER->can('AUTOLOAD')) { my $replacement_result_frame = $RULE_HANDLER->$original_name($cloned_result_frame); if (defined $replacement_result_frame) { $cloned_result_frame = $replacement_result_frame; } } } # Remove capture if not requested... if ($nocontext && ref $cloned_result_frame eq 'HASH' && keys %{$cloned_result_frame} > 1) { delete $cloned_result_frame->{q{}}; } # Nest a clone of current frame inside a clone of the caller frame... my $cloned_caller_frame = { %{$caller_frame//{}}, $key => $cloned_result_frame, }; # Make the copies into objects, if the originals were... if ($is_blessed_curr && !exists $curr_frame->{'='} ) { bless $cloned_caller_frame->{$key}, $is_blessed_curr; } if ($is_blessed_caller) { bless $cloned_caller_frame, $is_blessed_caller; } return $cloned_caller_frame; } # Pop current result frame and add it to a clone of previous result frame # (flattening it if possible, and preserving any blessing) # (As above, but preserving listiness of key being added to)... sub _pop_current_result_frame_with_list { my ($stack_ref, $key, $original_name, $value) = @_; # Where are we in the stack? my $curr_frame = $stack_ref->[-1]; my $caller_frame = $stack_ref->[-2]; # Track which frames are objects... my $is_blessed_curr = blessed($curr_frame); my $is_blessed_caller = blessed($caller_frame); # Remove "private" captures (i.e. those starting with _)... delete @{$curr_frame}{grep {substr($_,0,1) eq '_'} keys %{$curr_frame} }; # Remove "nocontext" marker... my $nocontext = delete $curr_frame->{'~'}; # Clone the current frame... my $cloned_result_frame = exists $curr_frame->{'='} ? $curr_frame->{'='} : $is_blessed_curr || length(join(q{}, keys %{$curr_frame})) ? { q{} => $value, %{$curr_frame} } : keys %{$curr_frame} ? $curr_frame->{q{}} : $value ; # Apply any appropriate handler... if ($RULE_HANDLER) { if ($RULE_HANDLER->can($original_name) || $RULE_HANDLER->can('AUTOLOAD')) { my $replacement_result_frame = $RULE_HANDLER->$original_name($cloned_result_frame); if (defined $replacement_result_frame) { $cloned_result_frame = $replacement_result_frame; } } } # Remove capture if not requested... if ($nocontext && ref $cloned_result_frame eq 'HASH' && keys %{$cloned_result_frame} > 1) { delete $cloned_result_frame->{q{}}; } # Append a clone of current frame inside a clone of the caller frame... my $cloned_caller_frame = { %{$caller_frame}, $key => [ @{$caller_frame->{$key}//[]}, $cloned_result_frame, ], }; # Make the copies into objects, if the originals were... if ($is_blessed_curr && !exists $curr_frame->{'='} ) { bless $cloned_caller_frame->{$key}[-1], $is_blessed_curr; } if ($is_blessed_caller) { bless $cloned_caller_frame, $is_blessed_caller; } return $cloned_caller_frame; } #=====[ MISCELLANEOUS CONSTANTS ]========================= # Namespace in which grammar inheritance occurs... my $CACHE = 'Regexp::Grammars::_CACHE_::'; my $CACHE_LEN = length $CACHE; my %CACHE; #...for subrule tracking # This code inserted at the start of every grammar regex # (initializes the result stack cleanly and backtrackably, via local)... my $PROLOGUE = q{((?{; @! = () if !pos; local @Regexp::Grammars::RESULT_STACK = (@Regexp::Grammars::RESULT_STACK, {}); local $Regexp::Grammars::TIMEOUT = { limit => -1>>1 }; local $Regexp::Grammars::DEBUG = 'off' }) }; # This code inserted at the end of every grammar regex # (puts final result in %/. Also defines default , , etc.)... my $EPILOGUE = q{)(?{; $Regexp::Grammars::RESULT_STACK[-1]{q{}} //= $^N;; local $Regexp::Grammars::match_frame = pop @Regexp::Grammars::RESULT_STACK; delete @{$Regexp::Grammars::match_frame}{ '~', grep {substr($_,0,1) eq '_'} keys %{$Regexp::Grammars::match_frame} }; if (exists $Regexp::Grammars::match_frame->{'='}) { if (ref($Regexp::Grammars::match_frame->{'='}) eq 'HASH') { $Regexp::Grammars::match_frame = $Regexp::Grammars::match_frame->{'='}; } } if (@Regexp::Grammars::RESULT_STACK) { $Regexp::Grammars::RESULT_STACK[-1]{'(?R)'} = $Regexp::Grammars::match_frame; } Regexp::Grammars::clear_rule_handler(); */ = $Regexp::Grammars::match_frame; })(?(DEFINE) (?(?:\\s*)) (?(?:\\S+)) (? (?{; $Regexp::Grammars::RESULT_STACK[-1]{"="} = pos; }) ) (? (?{; $Regexp::Grammars::RESULT_STACK[-1]{"="} = 1 + substr($_,0,pos) =~ tr/\n/\n/; }) ) ) }; my $EPILOGUE_NC = $EPILOGUE; $EPILOGUE_NC =~ s{ ; .* ;;}{;}xms; #=====[ MISCELLANEOUS PATTERNS THAT MATCH USEFUL THINGS ]======== # Match an identifier... my $IDENT = qr{ [^\W\d] \w*+ }xms; my $QUALIDENT = qr{ (?: $IDENT :: )*+ $IDENT }xms; # Match balanced parentheses, taking into account \-escapes and []-escapes... my $PARENS = qr{ (?&PARENS) (?(DEFINE) (? \( (?: \\. | (?&PARENS) | (?&CHARSET) | [^][()\\]++)*+ \) ) (? \[ \^?+ \]?+ (?: \[:\w+:\] | \\. | [^]])*+ \] ) ) }xms; #=====[ UTILITY SUBS FOR ERROR AND WARNING MESSAGES ]======== sub _uniq { my %seen; return grep { defined $_ && !$seen{$_}++ } @_; } # Default translator for error messages... my $ERRORMSG_TRANSLATOR = sub { my ($errormsg, $rulename, $context) = @_; $rulename = 'valid input' if $rulename eq q{}; $context //= ''; # Unimplemented subrule when rulename starts with '-'... if (substr($rulename,0,1) eq '-') { $rulename = substr($rulename,1); return "Can't match subrule <$rulename> (not implemented)"; } # Empty message converts to a "Expected...but found..." message... if ($errormsg eq q{}) { $rulename =~ tr/_/ /; $rulename = lc($rulename); return "Expected $rulename, but found '$context' instead"; } # "Expecting..." messages get "but found" added... if (lc(substr($errormsg,0,6)) eq 'expect') { return "$errormsg, but found '$context' instead"; } # Everything else stays "as is"... return $errormsg; }; # Allow user to set translation... sub set_error_translator { { package Regexp::Grammars::TranslatorRestorer; sub new { my ($class, $old_translator) = @_; bless \$old_translator, $class; } sub DESTROY { my ($old_translator_ref) = @_; $ERRORMSG_TRANSLATOR = ${$old_translator_ref}; } } my ($translator_ref) = @_; die "Usage: set_error_translator(\$subroutine_reference)\n" if ref($translator_ref) ne 'CODE'; my $old_translator_ref = $ERRORMSG_TRANSLATOR; $ERRORMSG_TRANSLATOR = $translator_ref; return defined wantarray ? Regexp::Grammars::TranslatorRestorer->new($old_translator_ref) : (); } # Dispatch to current translator for error messages... sub _translate_errormsg { goto &{$ERRORMSG_TRANSLATOR}; } #=====[ SUPPORT FOR TRANSLATING GRAMMAR-ENHANCED REGEX TO NATIVE REGEX ]==== # Store any specified grammars... my %user_defined_grammar; my %REPETITION_DESCRIPTION_FOR = ( '+' => 'once or more', '*' => 'any number of times', '?' => 'if possible', '+?' => 'as few times as possible', '*?' => 'as few times as possible', '??' => 'if necessary', '++' => 'as many times as possible', '*+' => 'as many times as possible', '?+' => 'if possible', ); sub _translate_raw_regex { my ($regex, $debug_build, $debug_runtime) = @_; my $is_comment = substr($regex, 0, 1) eq q{#} || substr($regex, 0, 3) eq q{(?#}; my $visible_regex = _squeeze_ws($regex); # Report how regex was interpreted, if requested to... if ($debug_build && $visible_regex ne q{} && $visible_regex ne q{ }) { _debug_notify( info => " |", " |...Treating '$visible_regex' as:", ($is_comment ? " | \\ a comment (which will be ignored)" : " | \\ normal Perl regex syntax" ), ); } return q{} if $is_comment; # Generate run-time debugging code (if any)... my ($debug_pre, $debug_post) = _build_raw_debugging_statements($debug_runtime,$visible_regex, +1); # Replace negative lookahead with one that works under R::G... $regex =~ s{\(\?!}{(?!(?!)|}gxms; # ToDo: Also replace positive lookahead with one that works under R::G... # This replacement should be of the form: # $regex =~ s{\(\?!}{(?!(?!)|(?!(?!)|}gxms; # but need to find a way to insert the extra ) at the other end return $debug_runtime && $regex eq '|' ? $regex . $debug_post : $debug_runtime && $regex =~ /\S/ ? "(?:$debug_pre($regex)$debug_post)" : $regex; } # Report and convert a debugging directive... sub _translate_debug_directive { my ($construct, $cmd, $debug_build) = @_; # Report how directive was interpreted, if requested to... if ($debug_build) { _debug_notify( info => " |", " |...Treating $construct as:", " | \\ Change run-time debugging mode to '$cmd'", ); } return qq{(?{; local \$Regexp::Grammars::DEBUG = q{$cmd}; }) }; } # Report and convert a timeout directive... sub _translate_timeout_directive { my ($construct, $timeout, $debug_build) = @_; # Report how directive was interpreted, if requested to... if ($debug_build) { _debug_notify( info => " |", " |...Treating $construct as:", ($timeout > 0 ? " | \\ Cause the entire parse to fail after $timeout second" . ($timeout==1 ? q{} : q{s}) : " | \\ Cause the entire parse to fail immediately" ), ); } return $timeout > 0 ? qq{(?{; local \$Regexp::Grammars::TIMEOUT = { duration => $timeout, limit => time() + $timeout }; }) } : qq{(*COMMIT)(*FAIL)}; } # Report and convert a directive... sub _translate_require_directive { my ($construct, $condition, $debug_build) = @_; $condition = substr($condition, 3, -2); # Report how directive was interpreted, if requested to... if ($debug_build) { _debug_notify( info => " |", " |...Treating $construct as:", " | \\ Require that {$condition} is true", ); } my $quoted_condition = $condition; $quoted_condition =~ s{\$}{}xms; return qq{(?(?{;$condition}) (?{;Regexp::Grammars::_debug_require( scalar \@Regexp::Grammars::RESULT_STACK, q{$quoted_condition}, 1) if \$Regexp::Grammars::DEBUG_LEVEL{\$Regexp::Grammars::DEBUG}}) | (?{;Regexp::Grammars::_debug_require( scalar \@Regexp::Grammars::RESULT_STACK, q{$quoted_condition}, 0) if \$Regexp::Grammars::DEBUG_LEVEL{\$Regexp::Grammars::DEBUG}})(?!)) }; } # Report and convert a directive... sub _translate_minimize_directive { my ($construct, $debug_build) = @_; # Report how directive was interpreted, if requested to... if ($debug_build) { _debug_notify( info => " |", " |...Treating $construct as:", " | \\ Minimize result value if possible", ); } return q{(?{; if (1 == grep { $_ ne '!' && $_ ne '@' && $_ ne '~' } keys %MATCH) { # ...single alnum key local %Regexp::Grammars::matches = %MATCH; delete @Regexp::Grammars::matches{'!', '@', '~'}; local ($Regexp::Grammars::only_key) = keys %Regexp::Grammars::matches; local $Regexp::Grammars::array_ref = $MATCH{$Regexp::Grammars::only_key}; if (ref($Regexp::Grammars::array_ref) eq 'ARRAY' && 1 == @{$Regexp::Grammars::array_ref}) { $MATCH = $Regexp::Grammars::array_ref->[0]; } } })}; } # Report and convert a debugging directive... sub _translate_error_directive { my ($construct, $type, $msg, $debug_build, $subrule_name) = @_; $subrule_name //= 'undef'; # Determine severity... my $severity = ($type eq 'error') ? 'fail' : 'non-fail'; # Determine fatality (and build code to invoke it)... my $fatality = ($type eq 'fatal') ? '(*COMMIT)(*FAIL)' : q{}; # Unpack message... if (substr($msg,0,3) eq '(?{') { $msg = 'do'. substr($msg,2,-1); } else { $msg = quotemeta $msg; $msg = qq{qq{$msg}}; } # Report how directive was interpreted, if requested to... if ($debug_build) { _debug_notify( info => " |", " |...Treating $construct as:", ( $type eq 'log' ? " | \\ Log a message to the logfile" : " | \\ Append a $severity error message to \@!" ), ); } # Generate the regex... return $type eq 'log' ? qq{(?{Regexp::Grammars::_debug_logmsg(scalar \@Regexp::Grammars::RESULT_STACK,$msg) if \$Regexp::Grammars::DEBUG_LEVEL{\$Regexp::Grammars::DEBUG} })} : qq{(?:(?{;local \$Regexp::Grammar::_memopos=pos();}) (?>\\s*+((?-s).{0,$MAX_CONTEXT_WIDTH}+)) (?{; pos() = \$Regexp::Grammar::_memopos; @! = Regexp::Grammars::_uniq( @!, Regexp::Grammars::_translate_errormsg($msg,q{$subrule_name},\$CONTEXT) ) }) (?!)|} . ($severity eq 'fail' ? q{(?!)} : $fatality) . q{)} ; } sub _translate_subpattern { my ($construct, $alias, $subpattern, $savemode, $postmodifier, $debug_build, $debug_runtime, $timeout, $backref) = @_; # Determine save behaviour... my $is_noncapturing = $savemode eq 'noncapturing'; my $is_listifying = $savemode eq 'list'; my $is_codeblock = substr($subpattern,0,3) eq '(?{'; my $value_saved = $is_codeblock ? '$^R' : '$^N'; my $do_something_with = $is_codeblock ? 'execute the code block' : 'match the pattern'; my $result = $is_codeblock ? 'result' : 'matched substring'; my $description = $is_codeblock ? substr($subpattern,2,-1) : defined $backref ? $backref : $subpattern; my $debug_construct = $is_codeblock ? '<' . substr($alias,1,-1) . '= (?{;' . substr($subpattern,3,-2) . '})>' : $construct ; # Report how construct was interpreted, if requested to... my $repeatedly = $REPETITION_DESCRIPTION_FOR{$postmodifier} // q{}; my $results = $is_listifying && $postmodifier ? "each $result" : substr($postmodifier,0,1) eq '?' ? "any $result" : $postmodifier && !$is_noncapturing ? "only the final $result" : "the $result" ; if ($debug_build) { _debug_notify( info => " |", " |...Treating $construct as:", " | | $do_something_with $description $repeatedly", ( $is_noncapturing ? " | \\ but don't save $results" : $is_listifying ? " | \\ appending $results to \@{\$MATCH{$alias}}" : " | \\ saving $results in \$MATCH{$alias}" ) ); } # Generate run-time debugging code (if any)... my ($debug_pre, $debug_post) = _build_debugging_statements($debug_runtime,$debug_construct, +1); # Generate post-match result-capturing code, if match captures... my $post_action = $is_noncapturing ? q{} : qq{local \@Regexp::Grammars::RESULT_STACK = ( \@Regexp::Grammars::RESULT_STACK[0..\@Regexp::Grammars::RESULT_STACK-2], Regexp::Grammars::_extend_current_result_frame_with_$savemode( \\\@Regexp::Grammars::RESULT_STACK, $alias, $value_saved ), );} ; # Generate timeout test... my $timeout_test = $timeout ? q{(??{;Regexp::Grammars::_test_timeout()})} : q{}; # Translate to standard regex code... return qq{$timeout_test(?{;local \@Regexp::Grammars::RESULT_STACK = \@Regexp::Grammars::RESULT_STACK;$debug_pre}) (?:($subpattern)(?{;$post_action$debug_post}))$postmodifier}; } sub _translate_hashmatch { my ($construct, $alias, $hashname, $keypat, $savemode, $postmodifier, $debug_build, $debug_runtime, $timeout) = @_; # Empty or missing keypattern defaults to <.hk>... if (!defined $keypat || $keypat !~ /\S/) { $keypat = '(?&hk)' } else { $keypat = substr($keypat, 1, -1); } # Determine save behaviour... my $is_noncapturing = $savemode eq 'noncapturing'; my $is_listifying = $savemode eq 'list'; # Convert hash to hash lookup... my $hash_lookup = '$' . substr($hashname, 1). '{$^N}'; # Report how construct was interpreted, if requested to... my $repeatedly = $REPETITION_DESCRIPTION_FOR{$postmodifier} // q{}; my $results = $is_listifying && $postmodifier ? 'each matched key' : substr($postmodifier,0,1) eq '?' ? 'any matched key' : $postmodifier && !$is_noncapturing ? 'only the final matched key' : 'the matched key' ; if ($debug_build) { _debug_notify( info => " |", " |...Treating $construct as:", " | | match a key from the hash $hashname $repeatedly", ( $is_noncapturing ? " | \\ but don't save $results" : $is_listifying ? " | \\ appending $results to \$MATCH{$alias}" : " | \\ saving $results in \$MATCH{$alias}" ) ); } # Generate run-time debugging code (if any)... my ($debug_pre, $debug_post) = _build_debugging_statements($debug_runtime,$construct, +1); # Generate post-match result-capturing code, if match captures... my $post_action = $is_noncapturing ? q{} : qq{local \@Regexp::Grammars::RESULT_STACK = ( \@Regexp::Grammars::RESULT_STACK[0..\@Regexp::Grammars::RESULT_STACK-2], Regexp::Grammars::_extend_current_result_frame_with_$savemode( \\\@Regexp::Grammars::RESULT_STACK, $alias, \$^N ), );} ; # Generate timeout test... my $timeout_test = $timeout ? q{(??{;Regexp::Grammars::_test_timeout()})} : q{}; # Translate to standard regex code... return qq{$timeout_test(?:(?{;local \@Regexp::Grammars::RESULT_STACK = \@Regexp::Grammars::RESULT_STACK;$debug_pre}) (?:($keypat)(??{exists $hash_lookup ? q{} : q{(?!)}})(?{;$post_action$debug_post})))$postmodifier}; } # Convert a " % " construct to pure Perl 5.10... sub _translate_separated_list { my ($term, $op, $separator, $term_trans, $sep_trans, $ws, $debug_build, $debug_runtime, $timeout) = @_; # This insertion ensures backtracking upwinds the stack correctly... state $CHECKPOINT = q{(?{;@Regexp::Grammars::RESULT_STACK = @Regexp::Grammars::RESULT_STACK;})}; # Translate meaningful whitespace... $ws = length($ws) ? q{(?&ws)} : q{}; # Generate timeout test... my $timeout_test = $timeout ? q{(??{;Regexp::Grammars::_test_timeout()})} : q{}; # Report how construct was interpreted, if requested to... if ($debug_build) { _debug_notify( info => " |", " |...Treating $term $op $separator as:", " | | repeatedly match the subrule $term", " | \\ as long as the matches are separated by matches of $separator", ); } # One-or-more... return qq{$timeout_test(?:$ws$CHECKPOINT$sep_trans$ws$term_trans)*$+} if $op =~ m{ [*][*]() | [+]([+?]?) \s* % | \{ 1, \}([+?]?) \s* % }xms; # Zero-or-more... return qq{{0}$timeout_test$ws(?:$term_trans(?:$ws$CHECKPOINT$sep_trans$ws$term_trans)*$+)?$+} if $op =~ m{ [*]([+?]?) \s* % | \{ 0, \}([+?]?) \s* % }xms; # One-or-zero... return qq{?$+} if $op =~ m{ [?]([+?]?) \s* % | \{ 0,1 \}([+?]?) \s* % }xms; # Zero exactly... return qq{{0}$ws} if $op =~ m{ \{ 0 \}[+?]? \s* % }xms; # N exactly... if ($op =~ m{ \{ (\d+) \}([+?]?) \s* % }xms ) { my $min = $1-1; return qq{{0}$timeout_test$ws(?:$term_trans(?:$ws$CHECKPOINT$sep_trans$ws$term_trans){$min}$+)} } # Zero-to-N... if ($op =~ m{ \{ 0,(\d+) \}([+?]?) \s* % }xms ) { my $max = $1-1; return qq{{0}$timeout_test$ws(?:$term_trans(?:$ws$CHECKPOINT$sep_trans$ws$term_trans){0,$max}$+)?$+} } # M-to-N and M-to-whatever... if ($op =~ m{ \{ (\d+),(\d*) \} ([+?]?) \s* % }xms ) { my $min = $1-1; my $max = $2 ? $2-1 : q{}; return qq{{0}$timeout_test$ws(?:$term_trans(?:$ws$CHECKPOINT$sep_trans$ws$term_trans){$min,$max}$+)} } # Somehow we missed a case (this should never happen)... die "Internal error: missing case in separated list handler"; } sub _translate_subrule_call { my ( $grammar_name, $construct, $alias, $subrule, $args, $savemode, $postmodifier, $debug_build, $debug_runtime, $timeout, $valid_subrule_names_ref, $nocontext) = @_; # Translate arg list, if provided... my $arg_desc; if ($args eq q{}) { $args = q{()}; } elsif (substr($args,0,3) eq '(?{') { # Turn parencode into do block... $arg_desc = substr($args,3,-2); substr($args,1,1) = 'do'; } else { # Turn abbreviated format into a key=>value list... $args =~ s{ [(,] \s* \K : (\w+) (?= \s* [,)] ) }{$1 => \$MATCH{'$1'}}gxms; $arg_desc = substr($args,1,-1); } # Transform qualified subrule names... my $simple_subrule = $subrule; my $start_grammar = (($simple_subrule =~ s{(.*)::}{}xms) ? $1 : ""); if ($start_grammar !~ /^NEXT$|::/) { $start_grammar = caller(3).'::'.$start_grammar; } my @candidates = $start_grammar eq 'NEXT' ? _ancestry_of($grammar_name) : _ancestry_of($start_grammar); # Rename fully-qualified rule call, if to ancestor grammar... RESOLVING: for my $parent_class (@candidates) { my $inherited_subrule = $parent_class.'::'.$simple_subrule; if ($CACHE{$inherited_subrule}) { $subrule = $inherited_subrule; last RESOLVING; } } # Replace package separators, which regex engine can't handle... my $internal_subrule = $subrule; $internal_subrule =~ s{::}{_88_}gxms; # Shortcircuit if unknown subrule invoked... if (!$valid_subrule_names_ref->{$subrule}) { _debug_notify( error => qq{Found call to $construct, but no or}, qq{ was defined in the grammar}, qq{(Did you misspell the rule name or forget to define the rule?)}, q{}, ); return "(?{Regexp::Grammars::_debug_fatal('$construct')})(*COMMIT)(*FAIL)"; } # Determine save behaviour... my $is_noncapturing = $savemode =~ /noncapturing|lookahead/; my $is_listifying = $savemode eq 'list'; my $save_code = $is_noncapturing? q{ @Regexp::Grammars::RESULT_STACK[0..@Regexp::Grammars::RESULT_STACK-2] } : $is_listifying? qq{ \@Regexp::Grammars::RESULT_STACK[0..\@Regexp::Grammars::RESULT_STACK-3], Regexp::Grammars::_pop_current_result_frame_with_list( \\\@Regexp::Grammars::RESULT_STACK, $alias, '$simple_subrule', \$^N ), } : qq{ \@Regexp::Grammars::RESULT_STACK[0..\@Regexp::Grammars::RESULT_STACK-3], Regexp::Grammars::_pop_current_result_frame( \\\@Regexp::Grammars::RESULT_STACK, $alias, '$simple_subrule', \$^N ), } ; # Report how construct was interpreted, if requested to... my $repeatedly = $REPETITION_DESCRIPTION_FOR{$postmodifier} // q{}; my $results = $is_listifying && $postmodifier ? 'each match' : substr($postmodifier,0,1) eq '?' ? 'any match' : 'the match' ; my $do_something_with = $savemode eq 'neglookahead' ? 'lookahead for anything except' : $savemode eq 'poslookahead' ? 'lookahead for' : 'match' ; if ($debug_build) { _debug_notify( info => " |", " |...Treating $construct as:", " | | $do_something_with the subrule <$subrule> $repeatedly", (defined $arg_desc ? " | | passing the args: ($arg_desc)" : () ), ( $is_noncapturing ? " | \\ but don't save anything" : $is_listifying ? " | \\ appending $results to \$MATCH{$alias}" : " | \\ saving $results in \$MATCH{$alias}" ), ); } # Generate post-match result-capturing code, if match captures... my ($debug_pre, $debug_post) = _build_debugging_statements($debug_runtime, $construct); # Generate timeout test... my $timeout_test = $timeout ? q{(??{;Regexp::Grammars::_test_timeout()})} : q{}; # Translate to standard regex code... return qq{(?:$timeout_test(?{; local \@Regexp::Grammars::RESULT_STACK = (\@Regexp::Grammars::RESULT_STACK, {'\@'=>{$args}}); \$Regexp::Grammars::RESULT_STACK[-2]{'~'} = $nocontext if \@Regexp::Grammars::RESULT_STACK >= 2; $debug_pre})((?&$internal_subrule))(?{; local \@Regexp::Grammars::RESULT_STACK = ( $save_code );$debug_post }))$postmodifier}; } sub _translate_rule_def { my ($type, $qualifier, $name, $callname, $qualname, $body, $objectify, $local_ws) = @_; $qualname =~ s{::}{_88_}gxms; # Return object if requested... my $objectification = $objectify ? qq{(??{; local \@Regexp::Grammars::RESULT_STACK = \@Regexp::Grammars::RESULT_STACK; \$Regexp::Grammars::RESULT_STACK[-1] = '$qualifier$name'->can('new') ? '$qualifier$name'->new(\$Regexp::Grammars::RESULT_STACK[-1]) : bless \$Regexp::Grammars::RESULT_STACK[-1], '$qualifier$name'; Regexp::Grammars::_debug_non_hash(\$Regexp::Grammars::RESULT_STACK[-1],'$name'); })} : q{}; # Each rule or token becomes a DEFINE'd Perl 5.10 named capture... return qq{ (?(DEFINE) $local_ws (?<$qualname> (?<$callname> (?{\$Regexp::Grammars::RESULT_STACK[-1]{'!'}=\$#{!};}) (?:$body) $objectification (?{;\$#{!}=delete(\$Regexp::Grammars::RESULT_STACK[-1]{'!'})//0; delete(\$Regexp::Grammars::RESULT_STACK[-1]{'\@'}); }) )) ) }; } # Locate any valid <...> sequences and replace with native regex code... sub _translate_subrule_calls { my ($grammar_name, $grammar_spec, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, $pre_match_debug, $post_match_debug, $rule_name, $subrule_names_ref, $magic_ws, $nocontext, ) = @_; # Remember the preceding construct, so as to implement the +% etc. operators... my $prev_construct = q{}; my $prev_translation = q{}; # Translate all other calls (MAIN GRAMMAR FOR MODULE)... $grammar_spec =~ s{ (? (? \s*+) (? (?&SEPLIST_OP) ) (? \s*+) )? (? < (?: (? \. \s* (?(?&QUALIDENT)) \s* (?(?&ARGLIST)) \s* ) | (? (? \? | \! ) \s* (?(?&QUALIDENT)) \s* (?(?&ARGLIST)) \s* ) | (? \s* (?(?&QUALIDENT)) \s* (?(?&ARGLIST)) \s* ) | (? \[ \s* (?(?&QUALIDENT)) \s* (?(?&ARGLIST)) \s* \] ) | (? (?(?&IDENT)) \s* = \s* (?(?&QUALIDENT)) \s* (?(?&ARGLIST)) \s* ) | (? \[ (?(?&IDENT)) \s* = \s* (?(?&QUALIDENT)) \s* (?(?&ARGLIST)) \s* \] ) | (? \s* : (?(?&QUALIDENT)) \s* ) | (? (?(?&IDENT)) \s* = \s* : (?(?&QUALIDENT)) \s* ) | (? \[ (?(?&IDENT)) \s* = \s* : (?(?&QUALIDENT)) \s* \] ) | (? \. (?(?&IDENT)) \s* = \s* (?(?&PARENCODE)|(?&PARENS)|(?&LITERAL)) \s* ) | (? (?(?&IDENT)) \s* = \s* (?(?&PARENCODE)|(?&PARENS)|(?&LITERAL)) \s* ) | (? \[ (?(?&IDENT)) \s* = \s* (?(?&PARENCODE)|(?&PARENS)|(?&LITERAL)) \s* \] ) | (? (?(?&HASH)) \s* (?(?&BRACES))? \s* ) | (? (?(?&IDENT)) \s* = \s* (?(?&HASH)) \s* (?(?&BRACES))? \s* ) | (? \[ (?(?&IDENT)) \s* = \s* (?(?&HASH)) \s* (?(?&BRACES))? \s* \] ) | (? \s* (? \\ | /) (? [:] (?&QUALIDENT)) \s* | \s* (? \\_ | /) (? (?&QUALIDENT)) \s* ) | (? (?(?&IDENT)) \s* = \s* (? \\ | /) (? [:] (?&QUALIDENT)) \s* | (?(?&IDENT)) \s* = \s* (? \\_ | /) (? (?&QUALIDENT)) \s* ) | (? \[ (?(?&IDENT)) \s* = \s* (? \\ | /) (? [:] (?&QUALIDENT)) \s* \] | \[ (?(?&IDENT)) \s* = \s* (? \\_ | /) (? (?&QUALIDENT)) \s* \] ) | (? minimize \s* : \s* ) | (? require \s* : \s* (? (?&PARENCODE) ) \s* ) | (? debug \s* : \s* (? run | match | step | try | off | on) \s* ) | (? timeout \s* : \s* (? \d+) \s* ) | (? context \s* : \s* ) | (? nocontext \s* : \s* ) | (? [.][.][.] | [!][!][!] | [?][?][?] ) | (? (? error | fatal ) \s*+ : \s*+ ) | (? (? log | error | warning | fatal ) \s*+ : \s*+ (? (?&PARENCODE) | .+? ) \s*+ ) ) > (? \s* (?! (?&SEPLIST_OP) ) [?+*][?+]? | ) | (? (?: \\[^shv] | (?! (?&PARENCODE) ) (?&PARENS) | (?&CHARSET) | \w++ | \| ) (?&QUANTIFIER)? ) | (? \s++ | \\. (?&QUANTIFIER)? | \(\?! | \(\?\# [^)]* \) # (?# -> old style inline comment) | (?&PARENCODE) | \# [^\n]*+ | [^][\s()<>#\\]++ ) ) (?(DEFINE) (? \*\* | [*+?][+?]?\s*% | \{ \d+(,\d*)? \} [+?]?\s*% ) (? \( (?:[?]<[=!])? (?: \\. | (?&PARENCODE) | (?&PARENS) | (?&CHARSET) | [^][()\\<>]++ )*+ \) ) (? \{ (?: \\. | (?&BRACES) | [^{}\\]++ )*+ \} ) (? \(\?[{] (?: \\. | (?&BRACES) | [^{}\\]++ )*+ [}]\) ) (? \% (?&IDENT) (?: :: (?&IDENT) )* ) (? \[ \^?+ \]?+ (?: \[:\w+:\] | \\. | [^]] )*+ \] ) (? [^\W\d]\w*+ ) (? (?: [^\W\d]\w*+ :: )* [^\W\d]\w*+ ) (? (?&NUMBER) | (?&STRING) | (?&VAR) ) (? [+-]? \d++ (?:\. \d++)? (?:[eE] [+-]? \d++)? ) (? ' [^\\']++ (?: \\. [^\\']++ )* ' ) (? (?&PARENCODE) | \( \s* (?&ARGS)? \s* \) | (?# NOTHING ) ) (? (?&ARG) \s* (?: , \s* (?&ARG) \s* )* ,? ) (? (?&VAR) | (?&KEY) \s* => \s* (?&LITERAL) ) (? : (?&IDENT) ) (? (?&IDENT) | (?&LITERAL) ) (? [*+?][+?]? | \{ \d+,?\d* \} [+?]?i ) ) }{ my $curr_construct = $+{construct}; my $alias = ($+{alias}//'MATCH') eq 'MATCH' ? q{'='} : qq{'$+{alias}'}; # Determine and remember the necessary translation... my $curr_translation = do{ # Translate subrule calls of the form: ... if (defined $+{alias_parens_scalar}) { my $pattern = substr($+{pattern},0,1) eq '(' ? $+{pattern} : "(?{$+{pattern}})"; _translate_subpattern( $curr_construct, $alias, $pattern, 'scalar', $+{modifier}, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, ); } elsif (defined $+{alias_parens_scalar_nocap}) { my $pattern = substr($+{pattern},0,1) eq '(' ? $+{pattern} : "(?{$+{pattern}})"; _translate_subpattern( $curr_construct, $alias, $pattern, 'noncapturing', $+{modifier}, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, ); } elsif (defined $+{alias_parens_list}) { my $pattern = substr($+{pattern},0,1) eq '(' ? $+{pattern} : "(?{$+{pattern}})"; _translate_subpattern( $curr_construct, $alias, $pattern, 'list', $+{modifier}, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, ); } # Translate subrule calls of the form: ... elsif (defined $+{alias_hash_scalar}) { _translate_hashmatch( $curr_construct, $alias, $+{varname}, $+{keypat}, 'scalar', $+{modifier}, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, ); } elsif (defined $+{alias_hash_scalar_nocap}) { _translate_hashmatch( $curr_construct, $alias, $+{varname}, $+{keypat}, 'noncapturing', $+{modifier}, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, ); } elsif (defined $+{alias_hash_list}) { _translate_hashmatch( $curr_construct, $alias, $+{varname}, $+{keypat}, 'list', $+{modifier}, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, ); } # Translate subrule calls of the form: ... elsif (defined $+{alias_subrule_scalar}) { _translate_subrule_call( $grammar_name, $curr_construct, $alias, $+{subrule}, $+{args}, 'scalar', $+{modifier}, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, $subrule_names_ref, $nocontext, ); } elsif (defined $+{alias_subrule_list}) { _translate_subrule_call( $grammar_name, $curr_construct, $alias, $+{subrule}, $+{args}, 'list', $+{modifier}, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, $subrule_names_ref, $nocontext, ); } # Translate subrule calls of the form: and ... elsif (defined $+{self_subrule_lookahead}) { # Determine type of lookahead, and work around capture problem... my ($type, $pre, $post) = ( 'neglookahead', '(?!(?!)|', ')' ); if (defined $+{sign} eq '?') { $type = 'poslookahead'; $pre x= 2; $post x= 2; } $pre . _translate_subrule_call( $grammar_name, $curr_construct, qq{'$+{subrule}'}, $+{subrule}, $+{args}, $type, q{}, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, $subrule_names_ref, $nocontext, ) . $post; } elsif (defined $+{self_subrule_scalar_nocap}) { _translate_subrule_call( $grammar_name, $curr_construct, qq{'$+{subrule}'}, $+{subrule}, $+{args}, 'noncapturing', $+{modifier}, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, $subrule_names_ref, $nocontext, ); } elsif (defined $+{self_subrule_scalar}) { _translate_subrule_call( $grammar_name, $curr_construct, qq{'$+{subrule}'}, $+{subrule}, $+{args}, 'scalar', $+{modifier}, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, $subrule_names_ref, $nocontext, ); } elsif (defined $+{self_subrule_list}) { _translate_subrule_call( $grammar_name, $curr_construct, qq{'$+{subrule}'}, $+{subrule}, $+{args}, 'list', $+{modifier}, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, $subrule_names_ref, $nocontext, ); } # Translate subrule calls of the form: ... elsif (defined $+{alias_argrule_scalar}) { my $pattern = qq{(??{;\$Regexp::Grammars::RESULT_STACK[-1]{'\@'}{'$+{subrule}'} // '(?!)'})}; _translate_subpattern( $curr_construct, $alias, $pattern, 'scalar', $+{modifier}, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, "in \$ARG{'$+{subrule}'}" ); } elsif (defined $+{alias_argrule_list}) { my $pattern = qq{(??{;\$Regexp::Grammars::RESULT_STACK[-1]{'\@'}{'$+{subrule}'} // '(?!)'})}; _translate_subpattern( $curr_construct, $alias, $pattern, 'list', $+{modifier}, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, "in \$ARG{'$+{subrule}'}" ); } # Translate subrule calls of the form: <:ARGNAME>... elsif (defined $+{self_argrule_scalar}) { my $pattern = qq{(??{;\$Regexp::Grammars::RESULT_STACK[-1]{'\@'}{'$+{subrule}'} // '(?!)'})}; _translate_subpattern( $curr_construct, qq{'$+{subrule}'}, $pattern, 'noncapturing', $+{modifier}, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, "in \$ARG{'$+{subrule}'}" ); } # Translate subrule calls of the form: <\IDENT> or ... elsif (defined $+{backref} || $+{alias_backref} || $+{alias_backref_list}) { # Use "%ARGS" if subrule names starts with a colon... my $subrule = $+{subrule}; if (substr($subrule,0,1) eq ':') { substr($subrule,0,1,"\@'}{'"); } my $backref = qq{\$Regexp::Grammars::RESULT_STACK[-1]{'$subrule'}}; my $quoter = $+{slash} eq '\\' || $+{slash} eq '\\_' ? "quotemeta($backref)" : "Regexp::Grammars::_invert_delim($backref)" ; my $pattern = qq{ (??{ defined $backref ? $quoter : q{(?!)}})}; my $type = $+{backref} ? 'noncapturing' : $+{alias_backref} ? 'scalar' : 'list' ; _translate_subpattern( $curr_construct, $alias, $pattern, $type, $+{modifier}, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, "in \$MATCH{'$subrule'}" ); } # Translate reportable raw regexes (add debugging support)... elsif (defined $+{reportable_raw_regex}) { _translate_raw_regex( $+{reportable_raw_regex}, $compiletime_debugging_requested, $runtime_debugging_requested ); } # Translate non-reportable raw regexes (leave as is)... elsif (defined $+{raw_regex}) { _translate_raw_regex( $+{raw_regex}, $compiletime_debugging_requested ); } # Translate directives... elsif (defined $+{require_directive}) { _translate_require_directive( $curr_construct, $+{condition}, $compiletime_debugging_requested ); } elsif (defined $+{minimize_directive}) { _translate_minimize_directive( $curr_construct, $+{condition}, $compiletime_debugging_requested ); } elsif (defined $+{debug_directive}) { _translate_debug_directive( $curr_construct, $+{cmd}, $compiletime_debugging_requested ); } elsif (defined $+{timeout_directive}) { _translate_timeout_directive( $curr_construct, $+{timeout}, $compiletime_debugging_requested ); } elsif (defined $+{error_directive}) { _translate_error_directive( $curr_construct, $+{error_type}, $+{msg}, $compiletime_debugging_requested, $rule_name ); } elsif (defined $+{autoerror_directive}) { _translate_error_directive( $curr_construct, $+{error_type}, q{}, $compiletime_debugging_requested, $rule_name ); } elsif (defined $+{yadaerror_directive}) { _translate_error_directive( $curr_construct, ($+{yadaerror_directive} eq '???' ? 'warning' : 'error'), q{}, $compiletime_debugging_requested, -$rule_name ); } elsif (defined $+{context_directive}) { $nocontext = 0; if ($compiletime_debugging_requested) { _debug_notify( info => " |", " |...Treating $curr_construct as:", " | \\ Turn on context-saving for the current rule" ); } q{}; # Remove the directive } elsif (defined $+{nocontext_directive}) { $nocontext = 1; if ($compiletime_debugging_requested) { _debug_notify( info => " |", " |...Treating $curr_construct as:", " | \\ Turn off context-saving for the current rule" ); } q{}; # Remove the directive } # There shouldn't be any other possibility... else { die qq{Internal error: this shouldn't happen!\nNear '$curr_construct': }; } }; # Handle the **/*%/+%/{n,m}%/etc operators... if (defined $+{list_marker}) { my $ws = $magic_ws ? $+{ws1} . $+{ws2} : q{}; my $op = $+{op}; $curr_translation = _translate_separated_list( $prev_construct, $op, $curr_construct, $prev_translation, $curr_translation, $ws, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, ); $curr_construct = qq{$prev_construct $op $curr_construct}; } # Finally, remember this latest translation, and return it... $prev_construct = $curr_construct; $prev_translation = $curr_translation;; }exmsg; # Translate magic hash accesses... $grammar_spec =~ s{\$MATCH (?= \s*\{) } {\$Regexp::Grammars::RESULT_STACK[-1]}xmsg; $grammar_spec =~ s{\$ARG (?= \s*\{) } {\$Regexp::Grammars::RESULT_STACK[-1]{'\@'}}xmsg; # Translate magic scalars and hashes... state $translate_scalar = { q{%$MATCH} => q{%{$Regexp::Grammars::RESULT_STACK[-1]{q{=}}}}, q{@$MATCH} => q{@{$Regexp::Grammars::RESULT_STACK[-1]{q{=}}}}, q{$MATCH} => q{$Regexp::Grammars::RESULT_STACK[-1]{q{=}}}, q{%MATCH} => q{%{$Regexp::Grammars::RESULT_STACK[-1]}}, q{$CAPTURE} => q{$^N}, q{$CONTEXT} => q{$^N}, q{$DEBUG} => q{$Regexp::Grammars::DEBUG}, q{$INDEX} => q{${\\pos()}}, q{%ARG} => q{%{$Regexp::Grammars::RESULT_STACK[-1]{'@'}}}, }; state $translatable_scalar = join '|', map {quotemeta $_} sort {length $b <=> length $a} keys %{$translate_scalar}; $grammar_spec =~ s{ ($translatable_scalar) (?! \s* (?: \[ | \{) ) } {$translate_scalar->{$1}}oxmsg; return $grammar_spec; } # Generate a "decimal timestamp" and insert in a template... sub _timestamp { my ($template) = @_; # Generate and insert any timestamp... if ($template =~ /%t/) { my ($sec, $min, $hour, $day, $mon, $year) = localtime; $mon++; $year+=1900; my $timestamp = sprintf("%04d%02d%02d.%02d%02d%02d", $year, $mon, $day, $hour, $min, $sec); $template =~ s{%t}{$timestamp}xms;; } return $template; } # Open (or re-open) the requested log file... sub _autoflush { my ($fh) = @_; my $originally_selected = select $fh; $|=1; select $originally_selected; } sub _open_log { my ($mode, $filename) = @_; # Special case: '-' --> STDERR if ($filename eq q{-}) { return *STDERR{IO}; } # Otherwise, just open the named file... elsif (open my $fh, $mode, $filename) { _autoflush($fh); return $fh; } # Otherwise, generate a warning and default to STDERR... else { local *Regexp::Grammars::LOGFILE = *STDERR{IO}; _debug_notify( warn => qq{Unable to open log file '$filename'}, qq{($!)}, qq{Defaulting to STDERR instead.}, ); _debug_notify( q{} => q{} ); return *STDERR{IO}; } } sub _invert_delim { my ($delim) = @_; $delim = reverse $delim; $delim =~ tr/<>[]{}()«»`'/><][}{)(»«'`/; return quotemeta $delim; } # Regex to detect if other regexes contain a grammar specification... my $GRAMMAR_DIRECTIVE = qr{ < grammar: \s* (? $QUALIDENT ) \s* > }xms; # Regex to detect if other regexes contain a grammar inheritance... my $EXTENDS_DIRECTIVE = qr{ < extends: \s* (? $QUALIDENT ) \s* > }xms; # Cache of rule/token names within defined grammars... my %subrule_names_for; # Build list of ancestors for a given grammar... sub _ancestry_of { my ($grammar_name) = @_; return () if !$grammar_name; use mro; return map { substr($_, $CACHE_LEN) } @{mro::get_linear_isa($CACHE.$grammar_name, 'c3')}; } # Detect and translate any requested grammar inheritances... sub _extract_inheritances { my ($regex, $compiletime_debugging_requested, $derived_grammar_name) = @_; # Detect and remove inheritance requests... while ($regex =~ s{$EXTENDS_DIRECTIVE}{}xms) { # Normalize grammar name and report... my $orig_grammar_name = $+{base_grammar_name}; my $grammar_name = $orig_grammar_name; if ($grammar_name !~ /::/) { $grammar_name = caller(2).'::'.$grammar_name; } if (exists $user_defined_grammar{$grammar_name}) { if ($compiletime_debugging_requested) { _debug_notify( info => "Processing inheritance request for $grammar_name...", q{}, ); } # Specify new relationship... no strict 'refs'; push @{$CACHE.$derived_grammar_name.'::ISA'}, $CACHE.$grammar_name; } else { my (undef, $file, $line) = caller(2); _debug_notify( fatal => "Inheritance from unknown grammar requested", "in ", "at $file line $line", ); } } # Retrieve ancestors (but not self) in C3 dispatch order... my (undef, @ancestors) = _ancestry_of($derived_grammar_name); # Extract subrule names and implementations for ancestors... my %subrule_names = map { %{$subrule_names_for{$_}} } @ancestors; $_ = -1 for values %subrule_names; my $implementation = join "\n", map { $user_defined_grammar{$_} } @ancestors; return $implementation, \%subrule_names; } # Pattern for directive within rules... my $WS_PATTERN = qr{]++ | $PARENS )*+) >}xms; # Transform grammar-augmented regex into pure Perl 5.10 regex... sub _build_grammar { my ($grammar_spec) = @_; $grammar_spec .= q{}; # Check for dubious repeated constructs that throw away captures... my @dubious = $grammar_spec =~ m{ < (?! \[ ) # not <[SUBRULE]> ( $IDENT (?: = [^>]*)? ) # but or > \s* ( # followed by a quantifier... [+*][?+]? # either symbolic | \{\d+(?:,\d*)?\}[?+]? # or numeric ) }gxms; # Report dubiousities... while (@dubious) { my ($rule, $qual) = splice @dubious, 0, 2; _debug_notify( warn => qq{Repeated subrule <$rule>$qual will only capture its final match}, qq{(Did you mean <[$rule]>$qual instead?)}, q{}, ) } # Check for dubious non-backtracking constructs... @dubious = $grammar_spec =~ m{ < ( [^>]+ ) > \s* ([?+*][+]|\{.*\}[+]) }gxms; # Report dubiousities... while (@dubious) { my ($rule, $qual) = splice @dubious, 0, 2; my $safe_qual = substr($qual,0,-1); _debug_notify( warn => qq{Non-backtracking subrule <$rule>$qual not fully supported yet}, qq{(If grammar does not work try <$rule>$safe_qual instead)}, q{}, ) } # Check whether a log file was specified... my $compiletime_debugging_requested; local *Regexp::Grammars::LOGFILE = *Regexp::Grammars::LOGFILE; my $logfile = q{-}; $grammar_spec =~ s{ ^ [^#]* < logfile: \s* ([^>]+?) \s* > }{ $logfile = _timestamp($1); # Presence of implies compile-time logging... $compiletime_debugging_requested = 1; *Regexp::Grammars::LOGFILE = _open_log('>',$logfile); # Delete directive... q{}; }gexms; # Look ahead for any run-time debugging or timeout requests... my $runtime_debugging_requested = $grammar_spec =~ m{ ^ [^#]* < debug: \s* (run | match | step | try | on | off | same ) \s* > | \$DEBUG (?! \s* (?: \[ | \{) ) }xms; my $timeout_requested = $grammar_spec =~ m{ ^ [^#]* < timeout: \s* \d+ \s* > }xms; # Standard actions set up and clean up any regex debugging... # Before entire match, set up a stack of attempt records and report... my $pre_match_debug = $runtime_debugging_requested ? qq{(?{; *Regexp::Grammars::LOGFILE = Regexp::Grammars::_open_log('>>','$logfile'); Regexp::Grammars::_init_try_stack(); })} : qq{(?{; *Regexp::Grammars::LOGFILE = Regexp::Grammars::_open_log('>>','$logfile'); })} ; # After entire match, report whether successful or not... my $post_match_debug = $runtime_debugging_requested ? qq{(?{;Regexp::Grammars::_debug_matched(0,\\%/,'',\$^N)}) |(?>(?{;Regexp::Grammars::_debug_handle_failures(0,''); }) (?!)) } : q{} ; # Subdivide into rule and token definitions, preparing to process each... # REWRITE THIS, USING (PROBABLY NEED TO REFACTOR ALL GRAMMARS TO REUSe # THESE COMPONENTS: # (? \( \s* (?&PARAMS)? \s* \) | (?# NOTHING ) ) # (? (?&PARAM) \s* (?: , \s* (?&PARAM) \s* )* ,? ) # (? (?&VAR) (?: \s* = \s* (?: (?&LITERAL) | (?&PARENCODE) ) )? ) # (? (?&NUMBER) | (?&STRING) | (?&VAR) ) # (? : (?&IDENT) ) my @defns = split m{ ^ [^#\n]*? \K < (obj|)(rule|token) \s*+ : \s*+ ((?:${IDENT}::)*+) (?: ($IDENT) \s*+ = \s*+ )?+ ($IDENT) \s* > }xms, $grammar_spec; # Extract up list of names of defined rules/tokens... # (Name is every 4th item out of every five, skipping the first item) my @subrule_names = @defns[ map { $_ * 6 + 5 } 0 .. ((@defns-1)/6-1) ]; my %subrule_names; # Build a look-up table of subrule names, checking for duplicates... for my $subrule_name (@subrule_names) { if (++$subrule_names{$subrule_name} == 2) { _debug_notify( warn => "Multiple definitions for <$subrule_name>", "(only the first definition will be used)", ); } } # Add the built-ins... @subrule_names{'ws', 'hk', 'matchpos', 'matchline'} = (1) x 4; # An empty main rule will never match anything... my $main_regex = shift @defns; if ($main_regex =~ m{\A (?: \s++ | \(\?\# [^)]* \) | \# [^\n]++ )* \z}xms) { _debug_notify( error => "No main regex specified before rule definitions.", "Grammar will never match anything.", "(Did you forget a specification?)", q{}, ); } # Compile the regex or grammar... my $regex = q{}; my $grammar_name; my $is_grammar; # Is this a grammar specification? if ($main_regex =~ $GRAMMAR_DIRECTIVE) { # Normalize grammar name and report... $grammar_name = $+{grammar_name}; if ($grammar_name !~ /::/) { $grammar_name = caller(1) . "::$grammar_name"; } $is_grammar = 1; # Add subrule definitions to namespace... for my $subrule_name (@subrule_names) { $CACHE{$grammar_name.'::'.$subrule_name} = 1; } } else { state $dummy_grammar_index = 0; $grammar_name = '______' . $dummy_grammar_index++; } # Extract any inheritance information... my ($inherited_rules, $inherited_subrule_names) = _extract_inheritances( $main_regex, $compiletime_debugging_requested, $grammar_name ); # Remove requests... $main_regex =~ s{ $EXTENDS_DIRECTIVE }{}gxms; # Add inherited subrule names to allowed subrule names; @subrule_names{ keys %{$inherited_subrule_names} } = values %{$inherited_subrule_names}; # Remove comments from top-level grammar... $main_regex =~ s{ \(\?\# [^)]* \) | (? }{}gxms) ? 1 : ($main_regex =~ s{ < context \s* : \s* > }{}gxms) ? 0 : 0; # If so, set up to save the grammar... if ($is_grammar) { # Normalize grammar name and report... if ($grammar_name !~ /::/) { $grammar_name = caller(1) . "::$grammar_name"; } if ($compiletime_debugging_requested) { _debug_notify( info => "Processing definition of grammar $grammar_name...", q{}, ); } # Remove the grammar directive... $main_regex =~ s{ $GRAMMAR_DIRECTIVE | < debug: \s* (run | match | step | try | on | off | same ) \s* > }{}gxms; # Check for anything else in the main regex... if ($main_regex =~ /\S/) { _debug_notify( warn => "Unexpected item before first subrule specification", "in definition of :", map({ " $_"} grep /\S/, split "\n", $main_regex), "(this will be ignored when defining the grammar)", ); } # Remember set of valid subrule names... $subrule_names_for{$grammar_name} = { map({ ($_ => 1) } keys %subrule_names), map({ ($grammar_name.'::'.$_ => 1) } grep { !/::/ } keys %subrule_names), }; } else { #...not a grammar specification # Report how main regex was interpreted, if requested to... if ($compiletime_debugging_requested) { _debug_notify( info => "Processing the main regex before any rule definitions", ); } # Any actual regex is processed first... $regex = _translate_subrule_calls( $grammar_name, $main_regex, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, $pre_match_debug, $post_match_debug, q{}, # Expected...what? \%subrule_names, 0, # Whitespace isn't magical $nocontext, ); # Wrap the main regex (to ensure |'s don't segment pre and # post commands)... $regex = "(?:$regex)"; # Report how construct was interpreted, if requested to... if ($compiletime_debugging_requested) { _debug_notify( q{} => q{ |}, q{ \\___End of main regex}, q{}, ); } } # Then iterate any following rule definitions... while (@defns) { # Grab details of each rule defn (as extracted by previous split)... my ($objectify, $type, $qualifier, $name, $callname, $body) = splice(@defns, 0, 6); $name //= $callname; my $qualified_name = $grammar_name.'::'.$callname; # Report how construct was interpreted, if requested to... if ($compiletime_debugging_requested) { _debug_notify( info => "Defining a $type: <$callname>", " |...Returns: " . ($objectify ? "an object of class '$qualifier$name'" : "a hash"), ); } # Translate any nested <...> constructs... $body = _translate_subrule_calls( $grammar_name, $body, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, $pre_match_debug, $post_match_debug, $callname, # Expected...what? \%subrule_names, $type eq 'rule', # Is whitespace magical? $nocontext, # Start with the global nocontextuality ); # Report how construct was interpreted, if requested to... if ($compiletime_debugging_requested) { _debug_notify( q{} => q{ |}, q{ \\___End of rule definition}, q{}, ); } # Make allowance for possible local whitespace definitions... my $local_ws_defn = q{}; my $local_ws_call = q{(?&ws)}; # Rules make non-code literal whitespace match textual whitespace... if ($type eq 'rule') { # Implement any local whitespace definition... if ($body =~ s{$WS_PATTERN}{}oxms) { my $defn = $1; if ($defn !~ m{\S}xms) { _debug_notify( warn => qq{Ignoring useless empty directive.}, qq{(Did you mean to use a token instead?)}, ); } state $ws_counter = 0; $ws_counter++; $local_ws_defn = qq{(?<__RG_ws_$ws_counter> $defn)}; $local_ws_call = qq{(?&__RG_ws_$ws_counter)}; } # Implement auto-whitespace... state $CODE_OR_SPACE = qr{ \( \?\?? (?&BRACED) \) | (? \{ (?: \\. | (?&BRACED) | [^{}] )* \} ) ) }xms; $body =~ s{($CODE_OR_SPACE)} [ substr($1,0,3) eq '(?{' || substr($1,0,4) eq '(??{' ? $1 : $local_ws_call ]exmsg; #} } elsif ($body =~ s{$WS_PATTERN}{}oxms) { _debug_notify( warn => qq{Ignoring useless directive in a token definition.}, qq{(Did you mean to use a rule instead?)}, ); } $regex .= _translate_rule_def( $type, $qualifier, $name, $callname, $qualified_name, $body, $objectify, $local_ws_defn ); } # Insert checkpoints into any user-defined code block... $regex =~ s{ \( \?\?? \{ \K (?!;) }{ local \@Regexp::Grammars::RESULT_STACK = \@Regexp::Grammars::RESULT_STACK; }xmsg; # Check for any suspicious left-overs from the start of the regex... pos $regex = 0; # Report anything that starts like a subrule, but isn't... my %seen = ( '' => 1, '' => 1, '' => 1, '' => 1); # autogenerated while ($regex =~ m/( (?] )/gxms) { my $construct = $1; my $something = $2 ? 'directive' : 'subrule call'; # Only report potential problems once... next if $seen{$construct}++; # Also explain how to indicate the construct is intentional... _debug_notify( warn => qq{Possible invalid $something:}, qq{ $construct}, qq{(To silence this warning, use: \\$construct}, ); _debug_notify( q{} => q{} ); } # If a grammar definition, save grammar and return a placeholder... if ($is_grammar) { $user_defined_grammar{$grammar_name} = $regex; return qq{(?{ warn "Can't match against \n"; })(*COMMIT)(?!)}; } # Otherwise, aggregrate the final grammar... else { return _complete_regex($regex.$inherited_rules, $pre_match_debug, $post_match_debug, $nocontext); } } sub _complete_regex { my ($regex, $pre_match_debug, $post_match_debug, $nocontext) = @_; return $nocontext ? qq{$pre_match_debug$PROLOGUE$regex$EPILOGUE_NC$post_match_debug} : qq{$pre_match_debug$PROLOGUE$regex$EPILOGUE$post_match_debug}; } 1; # Magic true value required at end of module __END__ =head1 NAME Regexp::Grammars - Add grammatical parsing features to Perl 5.10 regexes =head1 VERSION This document describes Regexp::Grammars version 1.033 =head1 SYNOPSIS use Regexp::Grammars; my $parser = qr{ (?: # Parse and save a Verb in a scalar <.ws> # Parse but don't save whitespace # Parse and save a Noun in a scalar 0.5 ? 'VN' : 'VerbNoun' })> # Save result of expression in a scalar | (?: <[Noun]> # Parse a Noun and save result in a list (saved under the key 'Noun') <[PostNoun=ws]> # Parse whitespace, save it in a list # (saved under the key 'PostNoun') )+ # Parse a Verb and save result in a scalar (saved under the key 'Verb') # Save a literal in a scalar | # Turn on the integrated debugger here <.Cmd= (?: mv? )> # Parse but don't capture a subpattern (name it 'Cmd' for debugging purposes) <[File]>+ # Parse 1+ Files and save them in a list (saved under the key 'File') # Turn off the integrated debugger here # Parse a File and save it in a scalar (saved under the key 'Dest') ) ################################################################ # Define a subrule named File <.ws> # - Parse but don't capture whitespace # - Parse the subpattern and capture # matched text as the result of the # subrule # Define a subrule named Noun cat | dog | fish # - Match an alternative (as usual) # Define a whitespace-sensitive subrule eats # - Match a literal (after any space) ? # - Parse optional subrule Noun and # save result under the key 'Object' | # Or else... # - Parse subrule AUX and save result # - Match a literal, save under 'part' # Define a whitespace-insensitive subrule (has | is) # - Match an alternative and capture (?{ $MATCH = uc $^N }) # - Use captured text as subrule result }x; # Match the grammar against some text... if ($text =~ $parser) { # If successful, the hash %/ will have the hierarchy of results... process_data_in( %/ ); } =head1 QUICKSTART CHEATSHEET =head2 In your program... use Regexp::Grammars; Allow enhanced regexes in lexical scope %/ Result-hash for successful grammar match =head2 Defining and using named grammars... Define a named grammar that can be inherited Current grammar inherits named grammar's rules =head2 Defining rules in your grammar... Define rule with magic whitespace Define rule without magic whitespace Define rule that blesses return-hash into class Define token that blesses return-hash into class Shortcut for above (rule name derived from class) Shortcut for above (token name derived from class) =head2 Matching rules in your grammar... Call named subrule (may be fully qualified) save result to $MATCH{RULENAME} Call named subrule, passing args to it Call subrule and fail if it matches (shorthand for (?!<.RULENAME>) ) <:IDENT> Match contents of $ARG{IDENT} as a pattern <\:IDENT> Match contents of $ARG{IDENT} as a literal Match closing delimiter for $ARG{IDENT} <%HASH> Match longest possible key of hash <%HASH {PAT}> Match any key of hash that also matches PAT Match closing delimiter for $MATCH{IDENT} <\_IDENT> Match the literal contents of $MATCH{IDENT} Call subrule, save result in $MATCH{ALIAS} Match a hash key, save key in $MATCH{ALIAS} Match pattern, save match in $MATCH{ALIAS} Execute code, save value in $MATCH{ALIAS} Save specified string in $MATCH{ALIAS} Save specified number in $MATCH{ALIAS} Match closing delim, save as $MATCH{ALIAS} Match '$MATCH{IDENT}', save as $MATCH{ALIAS} <.SUBRULE> Call subrule (one of the above forms), but don't save the result in %MATCH <[SUBRULE]> Call subrule (one of the above forms), but append result instead of overwriting it + % Match one or more repetitions of SUBRULE1 as long as they're separated by SUBRULE2 ** Same (only for backwards compatibility) * % Match zero or more repetitions of SUBRULE1 as long as they're separated by SUBRULE2 =head2 In your grammar's code blocks... $CAPTURE Alias for $^N (the most recent paren capture) $CONTEXT Another alias for $^N $INDEX Current index of next matching position in string %MATCH Current rule's result-hash $MATCH Magic override value (returned instead of result-hash) %ARG Current rule's argument hash $DEBUG Current match-time debugging mode =head2 Directives... Fail if code evaluates false Fail after specified number of seconds Change match-time debugging mode Change debugging log file (default: STDERR) Queue error message and fail parse Queue error message and backtrack Queue warning message and continue Explicitly add a message to debugging log Override automatic whitespace matching Simplify the result of a subrule match Switch on context substring retention Switch off context substring retention =head1 DESCRIPTION This module adds a small number of new regex constructs that can be used within Perl 5.10 patterns to implement complete recursive-descent parsing. Perl 5.10 already supports recursive=descent I, via the new C<< (?...) >> and C<< (?&name) >> constructs. For example, here is a simple matcher for a subset of the LaTeX markup language: $matcher = qr{ (?&File) (?(DEFINE) (? (?&Element)* ) (? \s* (?&Command) | \s* (?&Literal) ) (? \\ \s* (?&Literal) \s* (?&Options)? \s* (?&Args)? ) (? \[ \s* (?:(?&Option) (?:\s*,\s* (?&Option) )*)? \s* \]) (? \{ \s* (?&Element)* \s* \} ) (?