Regexp-Grammars-1.045/000755 000765 000765 00000000000 12645103505 014111 5ustar00damian000000 000000 Regexp-Grammars-1.045/Build.PL000644 000765 000765 00000001053 12576203510 015404 0ustar00damian000000 000000 use strict; use warnings; use Module::Build; my $builder = Module::Build->new( module_name => 'Regexp::Grammars', license => 'perl', dist_author => 'Damian Conway ', dist_version_from => 'lib/Regexp/Grammars.pm', requires => { 'Test::More' => 0, 'version' => 0, 'perl' => '5.10.0', ($] >= 5.018 ? ( 'B::Hooks::Parser' => 0.16, ) : ()) }, add_to_cleanup => [ 'Regexp-Grammars-*' ], ); $builder->create_build_script(); Regexp-Grammars-1.045/Changes000644 000765 000765 00000025633 12645103471 015417 0ustar00damian000000 000000 Revision history for Regexp-Grammars 1.001_003 Tue Apr 7 08:42:33 2009 Initial public release. 1.001_004 Sun Aug 2 23:08:52 2009 * Fixed mishandling of (??{....}) blocks * Attempted to patch around three-way bug with lexicals in regexes (further testing may be required to ensure patch is effective across various perl configs) 1.001_005 Sun Aug 2 23:08:52 2009 * Rerelease to sync numbering with CPAN 1.002 Tue Dec 8 21:30:11 2009 * Various doc tweaks * Removed intermittent "uninitialized" warnings * Added warning when no main regex specified * Refined error message status indicators (now only errors get an indicator for every separate message; info is consolidated) * Fixed exponentiation associativity for demo_calc (thanks Schwern!) * Fixed bug in charset parsing (thanks Dave!) * Removed false error messages regarding explicitly use of built-in and rules * Fixed bug with negative lookaheads that incorporate subrule calls * Localized $/ during debugger interactions * Added variation to allow distinct "internal" and "external" names for objrules and objtokens (thanks Casiano) * Fixed handling of (?#...) comments (thanks Casiano) * Added pure grammar definitions: * Added inheritance from grammar definitions: * Added fully qualified subrule calls to allow derived rules to call base rules: 1.005 Tue Jun 22 05:41:35 2010 * Tweaked internals to allow matches against stringifying objects, without nasty warnings * Extended demo/calc* to allow negatives outside parens (thanks Steven) * Pod tweaks (thanks Carl) * Added autoaction callbacks * Made @! contents unique (no more duplicates due to backtracking retries) * Made work in the top-level pattern of a grammar * Added set_context_width() to allow width of context string column to be adjusted either permanently or within a scope. (Thanks Daniel) * Added l10n feature for and directives (thanks Aki!) * Added directive * Fixed debugging directives in grammars * Added per-hash key patterns to <%HASH> (Thanks Aki) * Added <\IDENT> backrefs * Added inverserefs 1.008 Fri Sep 17 20:53:31 2010 * Pod nits denitted (thanks Christopher) * Added builtin and subrules * Added list)> and %ARG * Added <:argname> * Added <\:argname> and * Added Lucene example to demos (thanks Christian) * Added directive * Updated diagnostics list * Improved behaviour (and documentation) of non-bracketed separators in ** 1.009 Sun Sep 19 09:11:06 2010 * Rerelease to remove spurious dependencies on Data::Show (thanks Salvatore!) 1.010 Tue Sep 28 08:03:42 2010 * Added documentation warning about non-reentrancy of Perl 5 regex engine (thanks Andrew). * Fixed behaviour of ** repetitions wrt whitespace (thanks Andrew) * Documented more explicitly that start-pattern is supposed to act like a regular regex (or a token) wrt to whitespace 1.011 Sun Oct 10 18:57:10 2010 * Added as alias for: (?! <.RULENAME> ) * Added as alias for: (?= <.RULENAME> ) (and made it work around normal lookahead/capture problem) * Fixed major bugs in <:arg> handling 1.012 Wed Nov 3 20:24:36 2010 * Added RFC5322 example (thanks Tom and Abigail!) * Added <:nocontext> and <:context> directives to optimize away unwanted context substrings. * Solved transitive inheritance problem (grammars now fully polymorphic) * Added NEXT:: namespace for generic polymorphism 1.013 Wed Jun 29 14:39:40 2011 * Improved in-doc calculator example (thanks Jake!) * Improved RFC5322 example (thanks Tom and Abigail) * Added directive (thanks Dan) * Added directive * Added better compile-time debugging of standard Perl subpatterns * Added documentation of problems when using objrules whose ctors re-invoke the regex engine (thanks Nathan) * Added new tests for objrules whose classes are based on Moose or autoloading (thanks Nathan!) 1.014 Wed Nov 2 13:57:09 2011 * Improved description of directive to make it clearer that errors manifest in @! variable (thanks Leigh) * Added t/error_non_backtracking.t and demo/demo_error_nonbacktracking.pl to demonstrate use of (*COMMIT) to optimize error messages (thanks Nicolas) * Removed undocumented dependency of test suite on Class::Accessor (thanks Duff) * Tweaked caveats section to reflect improvements both in module (grammar inheritance now fully polymorphic) and in Perl 5.14 regexes (regexes now reentrant) * Fixed problem with \N{NAMED CHARS} under 5.12 and later (thanks Tom!) * Added *% +% and {n,m}% separated repetitions to track the Perl 6 feature 1.015 Wed Feb 29 12:37:25 2012 * Enabled limited support for tracking raw regex components when debugging a grammar (mainly literals and backslashed metacharacters) * Fixed bug that prevented named subpattern captures from including lookbehinds (e.g. didn't work) 1.016 Sat Mar 10 07:01:30 2012 * Fixed omissions in charset recognition within metagrammar (now handles \] and otehr escapes correctly) 1.020 Thu Aug 16 14:13:03 2012 * Fixed licence generation in Makefile.PL * Fixed issues with the stupid behaviour of Perl 5.17+ wrt (un)backslashed {'s * Fixed bad code in SYNOPSIS example (Thanks Paul!) * Fixed bad code in demo/demo_pos.pl (Thanks Peng) * Corrected docs for (Thanks Peng) * BACKWARDS INCOMPATIBLE CHANGE!!!! Due to limitations in Perl's qr overload The <\IDENT> backreference syntax has had to be changed to <\_IDENT> 1.021 Mon Aug 20 13:55:34 2012 * Made Latin-1 encoding of docs explicit 1.022 Tue Jan 22 18:39:16 2013 * Fixed minimization (by handling nocontext marker correctly) (thanks Thomas!) 1.025 Wed Jan 30 09:39:52 2013 * Reuploaded with no substantive modifications 1.026 Thu Jan 31 08:20:14 2013 * Re-fixed minimization (by handling nocontext marker correctly) (thanks again Thomas!) 1.027 Fri May 10 07:43:52 2013 * Fixed nasty bug where 0 used as an atom (many thanks Arseny!) 1.028 Sat May 11 06:00:00 2013 * Fixed very nasty caching bug within interpolation support (many more thanks Arseny!) 1.029 Tue Jun 25 15:44:35 2013 * Doc patch (thanks Steven) * Added workarounds for some unfortunate changes in 5.18 behaviour (thanks Steven) * CRITICAL: Added warning regarding fundamental and intractable incompatibilities with Perl 5.18, and announcing that Regexp::Grammars is not supported under that version of Perl. :-( 1.030 Wed Jun 26 07:17:41 2013 * Added essential-but-missing Skip_if_Perl_518 to the MANIFEST 1.031 Mon Aug 19 09:55:37 2013 * Updated warning re 5.18 incompatibilities. Some progress has been made, but a complete solution is still at least a month away, possibly longer. * Added dependency on Lexical::Var under Perl 5.18 to overcome problem with magic pseudo-variables 1.032 Thu Aug 29 10:44:21 2013 * Culled stray DB::single = 1 (thanks Robert!) 1.033 Sat Aug 31 13:10:57 2013 * Listed dependency on Lexical::Var under Perl 5.18 to overcome problem with magic pseudo-variables (thanks Andreas!) 1.034 Wed Jun 11 06:50:04 2014 * Reverted actual encoding to match nominated encoding (i.e. Latin-1) (thanks Olivier) * Fixed last bug preventing module from passing its own test suite under 5.20. * Gave up (for the present) on Perl 5.18 compatibility. * Noted limitations of passing %MATCH values as subrule args under 5.18+ 1.035 Sat Jun 28 19:03:54 2014 (All of the following with deepest thanks to Hugo...) * Removed no-longer-necessar Skip_if_Perl_518.pm from MANIFEST * Allowed rule declarations to be made anywhere (not just at the start of a line) * Fixed buggy edge-case for in-rule whitespace auto-matching at start of rule body (now works even if there is only a single whitespace between and first element of rule body) * Made R::G auto-/x any regex used under its suasion (solves formerly intractable problem of detecting a missing /x) * R::G now short-circuits any regex in its scope that does not contain any R::G constructs (and does not auto-/x them either) * Added line numbers to all warnings generated by the module * Added detection of "stray" quantifiers: unquoted quantifiers that don't actually quantify anything. * Added line-number annotations to the transformed regex as a last-gasp kind of assistance when confronted with the dreaded post-transformation "error marked with <-- HERE" message 1.036 Mon Sep 15 12:58:47 2014 * Minor doc improvements * Fixed odd behaviour of (Thanks, Chris!) 1.038 Thu Dec 11 14:52:05 2014 * Fixed bug where actions persisted after a failed ->with_actions() match (Thanks Hao Wu!) 1.039 Sat Feb 7 08:39:58 2015 * Module works correctly under 5.18.4. Updated warnings to reflect this. (Thanks p5p!) 1.040 Thu Mar 26 07:44:41 2015 * Tweaked Makefile.PL and BUILD.PL to (maybe) work more happily with the CPANTesters toolchain 1.041 Sun May 3 12:57:37 2015 * Documented edge case where new 5.18 regex compilation semantics breaks the <%hash> construct, listing two work-arounds (thanks David and Gianni!) * Fixed subtle problem with implicit whitespace-matching invalidating subrule argument lists (thanks Alex!) 1.042 Wed Sep 16 16:32:23 2015 * Prelimary attempt to support raw named captures as well as R::G syntax (may induce bugs, though none found in the test suite) * Fixed bug deep inside demo/demo_rfc5322.pl (Thanks, Dale!) * Replaced Lexical::Vars with B::Hooks::Parser for 5.22+ compatibility (Thanks Alex!) 1.043 Mon Dec 14 13:14:25 2015 * Eliminated redundant calls to setup() and teardown() for B::Hooks::Parser * Removed timeout test, as it cannot be reliably run across platforms much faster or slower than the author's development platform. (Thanls Slavin) 1.044 Wed Dec 16 08:22:01 2015 * Yet another attempt to code around the changes in vars-inside-regexes to preserve post-5.18 compatibility (Thanks, Kent!) 1.045 Tue Jan 12 15:55:21 2016 * Fixed bug causing premature clearing of action handlers (thanks Keith!) Regexp-Grammars-1.045/demo/000755 000765 000765 00000000000 12645103505 015035 5ustar00damian000000 000000 Regexp-Grammars-1.045/lib/000755 000765 000765 00000000000 12645103505 014657 5ustar00damian000000 000000 Regexp-Grammars-1.045/Makefile.PL000644 000765 000765 00000001236 12576205717 016100 0ustar00damian000000 000000 use strict; use warnings; use ExtUtils::MakeMaker; WriteMakefile( NAME => 'Regexp::Grammars', AUTHOR => 'Damian Conway ', LICENSE => 'perl', VERSION_FROM => 'lib/Regexp/Grammars.pm', ABSTRACT_FROM => 'lib/Regexp/Grammars.pm', PL_FILES => {}, PREREQ_PM => { 'Test::More' => 0, 'version' => 0, ($] >= 5.018 ? ( 'B::Hooks::Parser' => 0.16, ) : ()) }, dist => { COMPRESS => 'gzip -9f', SUFFIX => 'gz', }, clean => { FILES => 'Regexp-Grammars-*' }, ); Regexp-Grammars-1.045/MANIFEST000644 000765 000765 00000004203 12645103506 015242 0ustar00damian000000 000000 Build.PL Changes MANIFEST Makefile.PL README lib/Regexp/Grammars.pm t/00.load.t t/alias_literal.t t/aliased_subpatterns.t t/arg.t t/autoflatten.t t/backref.t t/backref_ARG.t t/charset.t t/class_accessor.t t/comment.t t/data_structure.t t/error.t t/error_non_backtracking.t t/error_translate.t t/fatal.t t/fwdref.t t/grammar.t t/grammar_polymorphism.t t/hash.t t/hash_redef.t t/hash_redef_local.t t/inline_computation.t t/inline_computation_handler.t t/inline_computation_obj_handler.t t/local_ws.t t/lookaheads.t t/matchline.t t/matchpos.t t/moose.t t/neg_lookahead.t t/new.t t/new_init.t t/new_init_autoload.t t/new_init_limited_autoload.t t/new_init_limited_autoload_warn.t t/no_context.t t/no_context_counterlocal.t t/no_context_local.t t/obj_rename.t t/pod.t t/repop_ws.t t/seplist.t t/top_is_token.t demo/demo_IP4.pl demo/demo_LaTeXish.pl demo/demo_LaTeXish_dump.pl demo/demo_Lucene_query.pl demo/demo_calc.pl demo/demo_calc_class.pl demo/demo_calc_inline.pl demo/demo_calc_list.pl demo/demo_calc_list_autoactions.pl demo/demo_calc_list_inline.pl demo/demo_debug.pl demo/demo_error.pl demo/demo_error_non_backtracking.pl demo/demo_flattening.pl demo/demo_hash.pl demo/demo_hash_lexicon.pl demo/demo_hash_lookup_generated.pl demo/demo_hash_lookup_hardcoded.pl demo/demo_hash_lookup_hashrule.pl demo/demo_hash_symtab.pl demo/demo_hashlines.pl demo/demo_list.pl demo/demo_metagrammar.pl demo/demo_pos.pl demo/demo_raw_debug.pl demo/demo_require.pl demo/demo_rfc5322.pl demo/demo_whoson.pl t/charnames.t t/seplist_countedhash_0.t t/seplist_countedhash_0_.t t/seplist_countedhash_0_1.t t/seplist_countedhash_0_N.t t/seplist_countedhash_1.t t/seplist_countedhash_1_.t t/seplist_countedhash_1_N.t t/seplist_countedhash_M_.t t/seplist_countedhash_M_N.t t/seplist_countedhash_N.t t/seplist_greediness.t t/seplist_plushash.t t/seplist_questionmark.t t/seplist_rawhash.t t/seplist_starhash.t t/error_non_hash_based.t t/minimize_bug.t t/missing_slash_x.t t/one_liner.t t/clear_actions.t t/ws_redefine.t t/pseudovars.t META.yml Module YAML meta-data (added by MakeMaker) META.json Module JSON meta-data (added by MakeMaker) Regexp-Grammars-1.045/META.json000644 000765 000765 00000001646 12645103505 015541 0ustar00damian000000 000000 { "abstract" : "Add grammatical parsing features to Perl 5.10 regexes", "author" : [ "Damian Conway " ], "dynamic_config" : 1, "generated_by" : "ExtUtils::MakeMaker version 7.04, CPAN::Meta::Converter version 2.142690", "license" : [ "perl_5" ], "meta-spec" : { "url" : "http://search.cpan.org/perldoc?CPAN::Meta::Spec", "version" : "2" }, "name" : "Regexp-Grammars", "no_index" : { "directory" : [ "t", "inc" ] }, "prereqs" : { "build" : { "requires" : { "ExtUtils::MakeMaker" : "0" } }, "configure" : { "requires" : { "ExtUtils::MakeMaker" : "0" } }, "runtime" : { "requires" : { "Test::More" : "0", "version" : "0" } } }, "release_status" : "stable", "version" : "1.045" } Regexp-Grammars-1.045/META.yml000644 000765 000765 00000001036 12645103505 015362 0ustar00damian000000 000000 --- abstract: 'Add grammatical parsing features to Perl 5.10 regexes' author: - 'Damian Conway ' build_requires: ExtUtils::MakeMaker: '0' configure_requires: ExtUtils::MakeMaker: '0' dynamic_config: 1 generated_by: 'ExtUtils::MakeMaker version 7.04, CPAN::Meta::Converter version 2.142690' license: perl meta-spec: url: http://module-build.sourceforge.net/META-spec-v1.4.html version: '1.4' name: Regexp-Grammars no_index: directory: - t - inc requires: Test::More: '0' version: '0' version: '1.045' Regexp-Grammars-1.045/README000644 000765 000765 00000001171 12645103471 014773 0ustar00damian000000 000000 Regexp::Grammars version 1.045 This module adds a small number of new regex constructs that can be used within Perl 5.10 patterns to implement complete recursive-descent parsing. INSTALLATION To install this module, run the following commands: perl Makefile.PL make make test make install Alternatively, to install with Module::Build, you can use the following commands: perl Build.PL ./Build ./Build test ./Build install DEPENDENCIES Perl 5.10. COPYRIGHT AND LICENCE Copyright (C) 2009, Damian Conway This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself. Regexp-Grammars-1.045/t/000755 000765 000765 00000000000 12645103505 014354 5ustar00damian000000 000000 Regexp-Grammars-1.045/t/00.load.t000644 000765 000765 00000000170 12576202040 015671 0ustar00damian000000 000000 use Test::More tests => 1; use Regexp::Grammars; ok 1; diag( "Testing Regexp::Grammars $Regexp::Grammars::VERSION" ); Regexp-Grammars-1.045/t/alias_literal.t000644 000765 000765 00000001066 12161417502 017347 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; use Regexp::Grammars; my $test_grammar = qr{ one | negtwo | str }xms; no Regexp::Grammars; ok "one" =~ $test_grammar => 'One matched'; is $/{alt}, 1 => 'Correct alternative'; ok "negtwo" =~ $test_grammar => 'NegTwo matched'; is $/{alt}, -20 => 'Correct alternative'; ok "str" =~ $test_grammar => 'Str matched'; is $/{alt}, "str'ing" => 'Correct alternative'; Regexp-Grammars-1.045/t/aliased_subpatterns.t000644 000765 000765 00000002112 12625775454 020611 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; my $parser = do{ use Regexp::Grammars; qr{ | <_pat='".*"'> | }xms }; my $WARNINGS; my $lookbehind = do { use Regexp::Grammars; BEGIN { close *Regexp::Grammars::LOGFILE; open *Regexp::Grammars::LOGFILE, '>', \$WARNINGS; } qr{ }xms; }; if ($] < 5.018 || $] >= 5.020) { ok !defined $WARNINGS => "No warnings found '$WARNINGS'"; } ok +('"abc"' =~ $parser) => 'Matched '; is $/{str}, '"abc"' => 'Captured correctly'; ok +(42 =~ $parser) => 'Matched '; is $/{num}, 42 => 'Captured correctly'; ok +('true' =~ $parser) => 'Matched '; is $/{bool}, 'true or false' => 'Pseudo-captured correctly'; ok +('barfoo' !~ $lookbehind) => 'Neg lookbehind worked'; ok +('foo' !~ $lookbehind) => 'Pos lookbehind worked'; ok +('carfoo' =~ $lookbehind) => 'Both lookbehinds worked'; is $/{foo}, 'foo' => 'Pseudo-captured correctly'; Regexp-Grammars-1.045/t/arg.t000644 000765 000765 00000002545 12625776173 015337 0ustar00damian000000 000000 use 5.010; use strict; use Test::More; plan 'no_plan'; my $test_grammar = do { use Regexp::Grammars; qr{ 'fo+/')> | 'end')> | scalar(reverse($::MATCH{keyword})) })> (??{ quotemeta( ($::ARG{prefix}//q{}) . $::ARG{keyword} ) }) (<:delim>) }xms; }; ok 'fooxoof' =~ $test_grammar => 'Match reverse'; is $/{keyword}, 'foo' => 'Keyword as expected'; is $/{content}, 'x' => 'Content as expected'; is $/{revkeyword}, 'oof' => 'Revkeyword as expected'; ok 'fooxendfoo' =~ $test_grammar => 'Match end'; is $/{keyword}, 'foo' => 'Keyword as expected'; is $/{content}, 'x' => 'Content as expected'; is $/{unkeyword}, 'endfoo' => 'Unkeyword as expected'; ok 'fooxfoo/' =~ $test_grammar => 'Match /'; is $/{keyword}, 'foo' => 'Keyword as expected'; is $/{content}, 'x' => 'Content as expected'; is_deeply $/{dekeyword}, { "" =>'foo/', 'terminator'=>'foo/' } => 'Dekeyword as expected'; Regexp-Grammars-1.045/t/autoflatten.t000644 000765 000765 00000002114 12161417561 017070 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; my $parser = do{ use Regexp::Grammars; qr{ | | | 1)> \d++ ' ' | <[Dash=(-)]> ** <[Dot=(\.)]> }xms }; ok +("'abc'" =~ $parser) => 'Matched str'; is_deeply $/{str}, { q{} => "'abc'", content => 'abc' } => 'Unflattened correctly'; ok +(42 =~ $parser) => 'Matched num'; is $/{num}, 42 => 'Flattened correctly'; ok +('true' =~ $parser) => 'Matched true'; is $/{bool}, 't' => 'Flattened correctly'; ok +('false' =~ $parser) => 'Matched false'; is $/{bool}, 'alse' => 'Flattened correctly'; ok +('-.-.-' =~ $parser) => 'Matched list'; is_deeply $/{list}, { q{}=>'-.-.-', Dash=>['-','-','-'], Dot=>['.','.'] } => 'Flattened correctly'; ok +('-' =~ $parser) => 'Matched minimized list'; is $/{list}, '-' => 'Flattened correctly'; Regexp-Grammars-1.045/t/backref.t000644 000765 000765 00000003346 12161417575 016154 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; use Regexp::Grammars; my $delimited = qr{ <\_delim> }xms; my $delimited_cap = qr{ }xms; my $delimited_listcap = qr{ <[rdel=\_delim]> }xms; no Regexp::Grammars; while (my $input = ) { chomp $input; my $input_copy = $input; my ($input, $expected_outcome) = split /\s*:\s*/, $input; if ($expected_outcome eq 'succeed') { ok +($input =~ $delimited) => "Match of $input ${expected_outcome}ed"; is $/{delim}, substr($input,0,1) => "Captured delimiter"; is $/{content}, substr($input,1,-1) => "Captured content"; ok +($input =~ $delimited_cap) => "Match and capture of $input ${expected_outcome}ed"; is $/{delim}, substr($input,0,1) => "Captured delimiter"; is $/{content}, substr($input,1,-1) => "Captured content"; is $/{rdel}, substr($input,0,1) => "Captured backreference"; ok +($input =~ $delimited_listcap) => "Match and list capture of $input ${expected_outcome}ed"; is $/{delim}, substr($input,0,1) => "Captured delimiter"; is $/{content}, substr($input,1,-1) => "Captured content"; is_deeply $/{rdel}, [substr($input,0,1)] => "Captured backreference"; } else { ok +($input !~ $delimited) => "Match of $input ${expected_outcome}ed"; ok +($input !~ $delimited_cap) => "Match and capture of $input ${expected_outcome}ed"; ok +($input !~ $delimited_listcap) => "Match and list of $input ${expected_outcome}ed"; } } __DATA__ 'a' :succeed "abc" :succeed `` :succeed 'abc" :fail `abc' :fail Regexp-Grammars-1.045/t/backref_ARG.t000644 000765 000765 00000001513 12161417617 016634 0ustar00damian000000 000000 use strict; use 5.010; use Test::More 'no_plan'; my $test_grammar = do { use Regexp::Grammars; qr{ | end_ <\:keyword> }xms; }; ok 'fooxend_foo' =~ $test_grammar => 'Match end'; is $/{keyword}, 'foo' => 'Keyword as expected'; is $/{content}, 'x' => 'Content as expected'; is $/{end_keyword}, 'end_foo' => 'End_keyword as expected'; ok 'fooxoof' =~ $test_grammar => 'Match rev'; is $/{keyword}, 'foo' => 'Keyword as expected'; is $/{content}, 'x' => 'Content as expected'; is $/{rev_keyword}, 'oof' => 'End_keyword as expected'; Regexp-Grammars-1.045/t/charnames.t000644 000765 000765 00000000330 12161417637 016505 0ustar00damian000000 000000 use Test::More 'no_plan'; use 5.010; use charnames ':full'; use Regexp::Grammars; my $grammar = qr{ \N{LESS-THAN SIGN} a \N{GREATER-THAN SIGN} }xms; ok '' =~ $grammar => '\N{NAMED} correctly interpolated' Regexp-Grammars-1.045/t/charset.t000644 000765 000765 00000002415 12345665544 016211 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More; plan 'no_plan'; use Regexp::Grammars; # This checks for a bug where [^\]] was not interpreted as a charset. my $bracket_bug = qr{ \[ \] }xms; my $escaped_bs = qr{ \[ \] }xms; my $old_bracket = qr{ \[ \] }xms; no Regexp::Grammars; while (my $input = ) { chomp $input; my ( $text, $to_match ) = split /:/, $input; if ( $to_match =~ $bracket_bug ) { ok( 'matched bracketed text with [^\]]+' ); is( $/{Bracketed}{text}, $text ); } else { fail( 'did not match bracketed text with [^\]]+' ); } if ( $to_match =~ $escaped_bs ) { ok( 'matched bracketed text with [^\\]+' ); is( $/{Bracketed}{text}, $text ); } else { fail( 'did not match bracketed text with [^\\]+' ); } if ( $to_match =~ /$old_bracket/ ) { ok( 'matched bracketed text with [^]]+' ); is( $/{Bracketed}{text}, $text ); } else { fail( 'did not match bracketed text with [^]]+' ); } } __DATA__ some text:[some text] and more text :[ and more text ] Regexp-Grammars-1.045/t/class_accessor.t000644 000765 000765 00000003152 12161422160 017523 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More 'tests' => 2; # Use this class declaration to check that classes with ctors # actually call the ctor when objrules use them... { package Speaker; sub new { my ($class, $data_ref) = @_; my $newobj = bless $data_ref, $class; $newobj->{ctor} = 'was called'; return $newobj; } } my $parser = do{ use Regexp::Grammars; qr{ \\"\> (?:\(\‎\‎\)) \w+ (?:(?:<.ws>|\-|\') \w+) \w+ \d+ }xms }; my $target = { "" => "Nathan Gray (‎kolibrie‎)", "speaker" => bless({ "" => "Nathan Gray (‎kolibrie‎)", "alias" => "kolibrie", "id" => 1613, "name" => "Nathan Gray", "ctor" => "was called", }, "Speaker"), }; my $input = do{ local $/; }; chomp $input; my $original_input = $input; ok +($input =~ $parser) => 'Matched'; is_deeply \%/, $target => 'Returned correct data structure'; #is $/{""}, $original_input => 'Captured entire text'; __DATA__ Nathan Gray (‎kolibrie‎) - ‎Practical Extraction with Regexp::Grammars‎ (50 min) 9 Regexp-Grammars-1.045/t/clear_actions.t000644 000765 000765 00000001461 12645103347 017355 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More tests => 2; my $actions_should_be_active = 0; { package MyAction; sub new { return bless {}, shift; } sub text { my ($self, $result) = @_; ::ok $actions_should_be_active, 'Text action executed'; return $result; } } my $test_grammar = do { use Regexp::Grammars; qr{ \w+ }x; }; $actions_should_be_active = 1; "abc_test" =~ $test_grammar->with_actions(MyAction->new); $actions_should_be_active = 0; "abc_test" =~ $test_grammar; $actions_should_be_active = 1; '$$$$$$$$' =~ $test_grammar->with_actions(MyAction->new); '$$$$$$$$abc_test' =~ $test_grammar->with_actions(MyAction->new); $actions_should_be_active = 0; "abc_test" =~ $test_grammar; done_testing(); Regexp-Grammars-1.045/t/comment.t000644 000765 000765 00000000437 12521301303 016174 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; my $parser = do{ use Regexp::Grammars; qr{ ^ $ (?# ) a test # some here }xms; }; my $target = q{a test}; ok +($target =~ $parser) => 'Matched'; Regexp-Grammars-1.045/t/data_structure.t000644 000765 000765 00000023735 12345665543 017620 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More; plan 'no_plan'; # Use this class declaration to check that classes with ctors # actually call the ctor when objrules use them... { package file; sub new { my ($class, $data_ref) = @_; my $new_obj = bless {'check'=>'check',%{$data_ref}}, $class; return $new_obj; } } my $parser = do{ use Regexp::Grammars; qr{ <[element]>* | \\ ? ? \[ <[option]> ** (,) \] \{ <[element]>* \} [^][\$&%#_{}~^\s,]+ [^][\$&%#_{}~^\s]+ }xms }; my $target = { "" => "\\documentclass[a4paper,11pt]{article}\n\\usepackage{latexsym}\n\\author{D. Conway}\n\\title{Parsing \\LaTeX{}}\n\\begin{document}\n\\maketitle\n\\tableofcontents\n\\section{Description}\n...is easy \\footnote{But not \\emph{necessarily} simple}.\n\\end{document}", "file" => bless({ "" => "\\documentclass[a4paper,11pt]{article}\n\\usepackage{latexsym}\n\\author{D. Conway}\n\\title{Parsing \\LaTeX{}}\n\\begin{document}\n\\maketitle\n\\tableofcontents\n\\section{Description}\n...is easy \\footnote{But not \\emph{necessarily} simple}.\n\\end{document}", "check" => "check", "element" => [ bless({ "" => "\\documentclass[a4paper,11pt]{article}", "command" => bless({ "" => "\\documentclass[a4paper,11pt]{article}", "args" => bless({ "" => "{article}", "element" => [ bless({ "" => "article", "literal" => bless({ "" => "article", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "documentclass", }, "literal"), "options" => bless({ "" => "[a4paper,11pt]", "option" => [ bless({ "" => "a4paper", }, "option"), bless({ "" => "11pt", }, "option"), ], }, "options"), }, "command"), }, "element"), bless({ "" => "\n\\usepackage{latexsym}", "command" => bless({ "" => "\\usepackage{latexsym}", "args" => bless({ "" => "{latexsym}", "element" => [ bless({ "" => "latexsym", "literal" => bless({ "" => "latexsym", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "usepackage", }, "literal"), }, "command"), }, "element"), bless({ "" => "\n\\author{D. Conway}", "command" => bless({ "" => "\\author{D. Conway}", "args" => bless({ "" => "{D. Conway}", "element" => [ bless({ "" => "D.", "literal" => bless({ "" => "D.", }, "literal"), }, "element"), bless({ "" => " Conway", "literal" => bless({ "" => "Conway", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "author", }, "literal"), }, "command"), }, "element"), bless({ "" => "\n\\title{Parsing \\LaTeX{}}", "command" => bless({ "" => "\\title{Parsing \\LaTeX{}}", "args" => bless({ "" => "{Parsing \\LaTeX{}}", "element" => [ bless({ "" => "Parsing", "literal" => bless({ "" => "Parsing", }, "literal"), }, "element"), bless({ "" => " \\LaTeX{}", "command" => bless({ "" => "\\LaTeX{}", "args" => bless({ "" => "{}", }, "args"), "name" => bless({ "" => "LaTeX", }, "literal"), }, "command"), }, "element"), ], }, "args"), "name" => bless({ "" => "title", }, "literal"), }, "command"), }, "element"), bless({ "" => "\n\\begin{document}", "command" => bless({ "" => "\\begin{document}", "args" => bless({ "" => "{document}", "element" => [ bless({ "" => "document", "literal" => bless({ "" => "document", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "begin", }, "literal"), }, "command"), }, "element"), bless({ "" => "\n\\maketitle\n", "command" => bless({ "" => "\\maketitle\n", "name" => bless({ "" => "maketitle", }, "literal"), }, "command"), }, "element"), bless({ "" => "\\tableofcontents\n", "command" => bless({ "" => "\\tableofcontents\n", "name" => bless({ "" => "tableofcontents", }, "literal"), }, "command"), }, "element"), bless({ "" => "\\section{Description}", "command" => bless({ "" => "\\section{Description}", "args" => bless({ "" => "{Description}", "element" => [ bless({ "" => "Description", "literal" => bless({ "" => "Description", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "section", }, "literal"), }, "command"), }, "element"), bless({ "" => "\n...is", "literal" => bless({ "" => "...is", }, "literal"), }, "element"), bless({ "" => " easy", "literal" => bless({ "" => "easy", }, "literal"), }, "element"), bless({ "" => " \\footnote{But not \\emph{necessarily} simple}", "command" => bless({ "" => "\\footnote{But not \\emph{necessarily} simple}", "args" => bless({ "" => "{But not \\emph{necessarily} simple}", "element" => [ bless({ "" => "But", "literal" => bless({ "" => "But", }, "literal"), }, "element"), bless({ "" => " not", "literal" => bless({ "" => "not", }, "literal"), }, "element"), bless({ "" => " \\emph{necessarily}", "command" => bless({ "" => "\\emph{necessarily}", "args" => bless({ "" => "{necessarily}", "element" => [ bless({ "" => "necessarily", "literal" => bless({ "" => "necessarily", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "emph", }, "literal"), }, "command"), }, "element"), bless({ "" => " simple", "literal" => bless({ "" => "simple", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "footnote", }, "literal"), }, "command"), }, "element"), bless({ "" => ".", "literal" => bless({ "" => ".", }, "literal"), }, "element"), bless({ "" => "\n\\end{document}", "command" => bless({ "" => "\\end{document}", "args" => bless({ "" => "{document}", "element" => [ bless({ "" => "document", "literal" => bless({ "" => "document", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "end", }, "literal"), }, "command"), }, "element"), ], }, "file"), }; my $input = do{ local $/; }; chomp $input; my $original_input = $input; ok +($input =~ $parser) => 'Matched'; is_deeply \%/, $target => 'Returned correct data structure'; is $/{""}, $original_input => 'Captured entire text'; __DATA__ \documentclass[a4paper,11pt]{article} \usepackage{latexsym} \author{D. Conway} \title{Parsing \LaTeX{}} \begin{document} \maketitle \tableofcontents \section{Description} ...is easy \footnote{But not \emph{necessarily} simple}. \end{document} Regexp-Grammars-1.045/t/error.t000644 000765 000765 00000005026 12161422313 015667 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; use List::Util qw< reduce >; my $calculator = do{ use Regexp::Grammars; qr{ \A (?: \Z | ) <[_Operand=Mult]> ** <[_Op=(\+|\-)]> (?{ $MATCH = shift @{$MATCH{_Operand}}; for my $term (@{$MATCH{_Operand}}) { my $op = shift @{$MATCH{_Op}}; if ($op eq '+') { $MATCH += $term; } else { $MATCH -= $term; } } }) | (?: <[_Operand=Pow]> ** <[_Op=(\*|/|%)]> (?{ $MATCH = reduce { eval($a . shift(@{$MATCH{_Op}}) . $b) } @{$MATCH{_Operand}}; }) ) (?: <[_Operand=Term]> ** <_Op=(\^)> (?{ $MATCH = reduce { $b ** $a } reverse @{$MATCH{_Operand}}; }) ) (?: | \( \) ) | }xms }; local $/ = ""; while (my $input = ) { chomp $input; my ($text, $expected) = split /\s+/, $input, 2; if ($text =~ $calculator) { is $/{Answer}, $expected => "Input $.: $text"; } else { is_deeply \@!, eval($expected), => "Input $.: $text"; } } __DATA__ 2 2 2*3+4 10 2zoo [ "Extra junk after expression at index 1: 'zoo'", "Expected end of input, but found 'zoo' instead", "Expected valid input, but found 'zoo' instead", "Can't match subrule (not implemented)", ] 1+2zoo [ "Extra junk after expression at index 1: '+2zoo'", "Expected end of input, but found '+2zoo' instead", "Expected valid input, but found '+2zoo' instead", "Can't match subrule (not implemented)", ] zoo [ "Expected literal, but found 'zoo' instead", "Can't match subrule (not implemented)", ] Regexp-Grammars-1.045/t/error_non_backtracking.t000644 000765 000765 00000004614 12161422322 021246 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; use List::Util qw< reduce >; my $calculator = do{ use Regexp::Grammars; qr{ \A (*COMMIT) (?: \Z | ) <[_Operand=Mult]> ** <[_Op=(\+|\-)]> (?{ $MATCH = shift @{$MATCH{_Operand}}; for my $term (@{$MATCH{_Operand}}) { my $op = shift @{$MATCH{_Op}}; if ($op eq '+') { $MATCH += $term; } else { $MATCH -= $term; } } }) | (?: <[_Operand=Pow]> ** <[_Op=(\*|/|%)]> (?{ $MATCH = reduce { eval($a . shift(@{$MATCH{_Op}}) . $b) } @{$MATCH{_Operand}}; }) ) (?: <[_Operand=Term]> ** <_Op=(\^)> (?{ $MATCH = reduce { $b ** $a } reverse @{$MATCH{_Operand}}; }) ) (?: | \( \) ) | }xms }; local $/ = ""; while (my $input = ) { chomp $input; my ($text, $expected) = split /\s+/, $input, 2; if ($text =~ $calculator) { is $/{Answer}, $expected => "Input $.: $text"; } else { is_deeply \@!, eval($expected), => "Input $.: $text"; } } __DATA__ 2 2 2*3+4 10 2zoo [ "Extra junk after expression at index 1: 'zoo'", "Expected end of input, but found 'zoo' instead", "Expected valid input, but found 'zoo' instead", ] 1+2zoo [ "Extra junk after expression at index 3: 'zoo'", "Expected end of input, but found 'zoo' instead", "Expected valid input, but found 'zoo' instead", ] zoo [ "Expected literal, but found 'zoo' instead", "Can't match subrule (not implemented)", ] Regexp-Grammars-1.045/t/error_non_hash_based.t000644 000765 000765 00000001003 12161422332 020672 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; use List::Util qw< reduce >; my $goodobj = do{ use Regexp::Grammars; qr{ obj }xms }; my $badobj = do{ use Regexp::Grammars; qr{ obj }xms }; close *STDERR; ok 'obj' =~ $goodobj => 'GoodObj'; ok 'obj' !~ $badobj => 'BadObj'; package GoodObj; sub new { bless {}, shift; } package BadObj; sub new { bless [], shift; } Regexp-Grammars-1.045/t/error_translate.t000644 000765 000765 00000006231 12161422341 017744 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; use List::Util qw< reduce >; sub translator { my ($errormsg, $rulename, $context) = @_; if (substr($rulename,0,1) eq '-') { $rulename = substr($rulename,1); return "You forgot to define $rulename. :-("; } if ($errormsg eq q{}) { if ($rulename) { return "<$rulename> failed to match '$context'"; } else { return "Main pattern failed to match '$context'"; } } if (lc(substr($errormsg,0,6)) eq 'wanted') { return "$errormsg, but was given '$context'. What's up with that?"; } return $errormsg; } my $calculator = do{ use Regexp::Grammars; qr{ \A (?: \Z | ) <[_Operand=Mult]> ** <[_Op=(\+|\-)]> (?{ $MATCH = shift @{$MATCH{_Operand}}; for my $term (@{$MATCH{_Operand}}) { my $op = shift @{$MATCH{_Op}}; if ($op eq '+') { $MATCH += $term; } else { $MATCH -= $term; } } }) | (?: <[_Operand=Pow]> ** <[_Op=(\*|/|%)]> (?{ $MATCH = reduce { eval($a . shift(@{$MATCH{_Op}}) . $b) } @{$MATCH{_Operand}}; }) ) (?: <[_Operand=Term]> ** <_Op=(\^)> (?{ $MATCH = reduce { $b ** $a } reverse @{$MATCH{_Operand}}; }) ) (?: | \( \) ) <...> | }xms }; local $/ = ""; { my $temp = Regexp::Grammars::set_error_translator(\&translator); while (my $input = ) { chomp $input; my ($text, $expected) = split /\s+/, $input, 2; if ($text =~ $calculator) { is $/{Answer}, $expected => "Input $.: $text"; } else { is_deeply \@!, eval($expected), => "Input $.: $text"; } } } { use Regexp::Grammars; if ('foo' =~ m{ <...> }xms) { fail 'Restore default translator'; } else { is_deeply \@!, ["Can't match subrule (not implemented)"] => 'Restore default translator'; } } __DATA__ 2 2 2*3+4 10 2zoo [ "Extra junk after expression at index 1: 'zoo'", "Wanted end of input, but was given 'zoo'. What's up with that?", "Main pattern failed to match 'zoo'", "You forgot to define Trailing_stuff. :-(", ] zoo [ " failed to match 'zoo'", "You forgot to define Trailing_stuff. :-(", ] Regexp-Grammars-1.045/t/fatal.t000644 000765 000765 00000002452 12345665541 015645 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More; plan 'no_plan'; my %AcceptableVersions = ( '0.95' => 1, '0.98' => 1, '1.01' => 1, ); my %UnacceptableVersions = ( '0.99' => 1, '1.00' => 1, ); my $version_checker = do{ use Regexp::Grammars; qr{ vers = <%AcceptableVersions> | vers = | vers = }xms; }; ok 'vers = 0.95' =~ $version_checker => 'Matched version 0.95'; ok @! == 0 => 'with no error messages'; ok 'vers = 0.99' !~ $version_checker => 'Correctly failed to match version 0.99'; ok @! == 1 => 'with correct number of error messages'; is $![0], 'Cannot parse language version 0.99' => 'with correct error message'; ok 'vers = 0.96' !~ $version_checker => 'Correctly failed to match version 0.96'; ok @! == 1 => 'with correct number of error messages'; is $![0], q{Expected valid language version, but found '0.96' instead} => 'with correct error message'; Regexp-Grammars-1.045/t/fwdref.t000644 000765 000765 00000005100 12161422473 016013 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; use Regexp::Grammars; my $delimited = qr{ «»'"`]+)> }xms; my $delimited_cap = qr{ «»'"`]+)> }xms; my $delimited_listcap = qr{ «»'"`]+)> <[rdel=/delim]> }xms; no Regexp::Grammars; while (my $spec = ) { next if $spec !~ /\S/; chomp $spec; my $spec_copy = $spec; my ($input, $expected_outcome) = split /\s*:\s*/, $spec; my ($ldelim, $content, $rdelim) = split /(xxx)/, $input; if ($expected_outcome eq 'succeed') { ok +($input =~ $delimited) => "Match of $input ${expected_outcome}ed"; is $/{delim}, $ldelim => "Captured delimiter"; is $/{content}, $content => "Captured content"; ok +($input =~ $delimited_cap) => "Match and capture of $input ${expected_outcome}ed"; is $/{delim}, $ldelim => "Captured delimiter"; is $/{content}, $content => "Captured content"; is $/{rdel}, $rdelim => "Captured closer"; ok +($input =~ $delimited_listcap) => "Match and list capture of $input ${expected_outcome}ed"; is $/{delim}, $ldelim => "Captured delimiter"; is $/{content}, $content => "Captured content"; is_deeply $/{rdel}, [$rdelim] => "Captured closer"; } else { ok +($input !~ $delimited) => "Match of $input ${expected_outcome}ed"; ok +($input !~ $delimited_cap) => "Match and capture of $input ${expected_outcome}ed"; ok +($input !~ $delimited_listcap) => "Match and list of $input ${expected_outcome}ed"; } } __DATA__ "xxx" :succeed `xxx' :succeed ``xxx'' :succeed 'xxx" :fail {xxx} :succeed [xxx] :succeed :succeed (xxx) :succeed «xxx» :succeed [[xxx]] :succeed {{{xxx}}} :succeed ((((xxx)))) :succeed <> :succeed ««xxx»» :succeed }xxx{ :succeed ]xxx[ :succeed )xxx( :succeed >xxx< :succeed »xxx« :succeed }}}xxx{{{ :succeed ]]xxx[[ :succeed ))))xxx(((( :succeed >>xxx<< :succeed »»xxx«« :succeed ({xxx}) :succeed (*xxx*) :succeed /*xxx*/ :succeed ifxxxfi :succeed `` :fail 'abc" :fail {xxx{ :fail [xxx[ :fail \( <[Elem]> ** (,) \) }xms; # Derived grammar... qr{ \[ <[Elem]> ** (,) \] [01]+ }xms; # Other grammar (for MI)... qr{ [.-]+ }xms; my $list_of_int = qr{ \d+ }xms; my $list_of_nonint = qr{ [^\d,]+ }xms; my $list_without_elem = qr{ }xms; my $list_of_binary = qr{ }xms; my $list_of_binary_or_nonint = qr{ [^\d,]+ | }xms; my $list_of_morse = qr{ # Elem redefinition from here # List redefinition from here # (requires C3 resolution to work) }xms; no Regexp::Grammars; { local $SIG{__WARN__} = sub { my ($errmsg) = @_; is $errmsg, "Can't match directly against a pure grammar: \n" => "Can't match against pure grammars"; }; ok "" !~ $base_grammar => "Match against pure grammar failed"; } ok '(1,2,3)' !~ $list_without_elem => 'Unrepleaced Elem failed'; is $![0], 'Elem matcher not implemented' => 'Error message correct'; ok '(1,23,456)' =~ $list_of_int => 'Polymorphic Elem worked'; is_deeply $/{List}{Elem}, [1,23,456] => 'Extracted correct data'; ok '(a,bc,def)' =~ $list_of_nonint => 'Polymorphic Elem worked again'; is_deeply $/{List}{Elem}, ['a','bc','def'] => 'Extracted correct data'; ok '[0,10,010]' =~ $list_of_binary => '2nd order inheritance worked'; is_deeply $/{List}{Elem}, ['0','10','010'] => 'Extracted correct data'; ok '[0,bc,010]' =~ $list_of_binary_or_nonint => 'Explicit call to overridden worked'; is_deeply $/{List}{Elem}, ['0','bc','010'] => 'Extracted correct data'; ok '[.,-.,..-]' =~ $list_of_morse => 'Multiple inheritance worked'; is_deeply $/{List}{Elem}, ['.','-.','..-'] => 'Extracted correct data'; Regexp-Grammars-1.045/t/grammar_polymorphism.t000644 000765 000765 00000001021 12161422512 020776 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; use Regexp::Grammars; qr{ baserule baserule2 }x; qr{ }x; my $baserule = qr{ }x; my $baserule2 = qr{ }x; ok 'baserule' =~ $baserule => 'Specific polymorphism worked'; ok 'baserule2' =~ $baserule2 => 'Generic polymorphism worked'; Regexp-Grammars-1.045/t/hash.t000644 000765 000765 00000002153 12345665540 015476 0ustar00damian000000 000000 use warnings; use Test::More; plan 'no_plan'; my %hash = ( do => 'a deer', re => 'a drop of golden sun', dore => 'a portal', me => 'a name I call myself', fa => 'a long long way to run', ); my $listified = do { use Regexp::Grammars; qr{ <[WORD=%hash]>+ }xms; }; my $first_only = do { use Regexp::Grammars; qr{ }xms; }; my $no_cap = do { use Regexp::Grammars; qr{ <%hash>+ }xms; }; while (my $line = ) { my ($input, $expected) = split /\s+/, $line; if ($input =~ $listified) { is_deeply $/{WORD}, eval($expected), "list: $input"; } else { is 'FAIL', $expected, "list: $input"; } if ($input =~ $first_only) { is $/{WORD}, eval($expected)->[0], "scalar: $input"; } else { is 'FAIL', $expected, "scalar: $input"; } if ($input =~ $no_cap) { isnt 'FAIL', $expected, "no-cap: $input"; } else { is 'FAIL', $expected, "no-cap: $input"; } } __DATA__ dorefameredo ['dore','fa','me','re','do'] dorefamell ['dore','fa','me'] zzzzz FAIL zzzdoremezzz ['dore','me'] Regexp-Grammars-1.045/t/hash_redef.t000644 000765 000765 00000002124 12345665536 016646 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More; plan 'no_plan'; my %hash = ( dore => 'a portal', me => 'a name I call myself', fa => 'a long long way to run', ); my $listified = do { use Regexp::Grammars; qr{ <[WORD=%hash]>+ .{2} }xms; }; my $first_only = do { use Regexp::Grammars; qr{ .{2} }xms; }; my $no_cap = do { use Regexp::Grammars; qr{ <%hash>+ .{2} }xms; }; while (my $line = ) { my ($input, $expected) = split /\s+/, $line; if ($input =~ $listified) { is_deeply $/{WORD}, eval($expected), "list: $input"; } else { is 'FAIL', $expected, "list: $input"; } if ($input =~ $first_only) { is $/{WORD}, eval($expected)->[0], "scalar: $input"; } else { is 'FAIL', $expected, "scalar: $input"; } if ($input =~ $no_cap) { isnt 'FAIL', $expected, "no-cap: $input"; } else { is 'FAIL', $expected, "no-cap: $input"; } } __DATA__ dorefameredo ['fa','me'] dorefamell ['fa','me'] zzzzz FAIL zzzdoremezzz ['me'] Regexp-Grammars-1.045/t/hash_redef_local.t000644 000765 000765 00000003507 12345665535 020025 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More; plan 'no_plan'; my %hash = ( dore => 'a portal', me => 'a name I call myself', fa => 'a long long way to run', ); my $listified = do { use Regexp::Grammars; qr{ <[WORD=%hash{ .{2} }]>+ }xms; }; my $first_only = do { use Regexp::Grammars; qr{ }xms; }; my $first_only_override = do { use Regexp::Grammars; qr{ \w+ }xms; }; my $first_two_override = do { use Regexp::Grammars; qr{ \w+ }xms; }; my $no_cap = do { use Regexp::Grammars; qr{ <%hash{..}>+ }xms; }; while (my $line = ) { my ($input, $expected) = split /\s+/, $line; my $expected_data = eval $expected; if ($input =~ $listified) { is_deeply $/{WORD}, $expected_data, "list: $input"; } else { is 'FAIL', $expected, "list: $input"; } if ($input =~ $first_only) { is $/{WORD}, $expected_data->[0], "scalar: $input"; } else { is 'FAIL', $expected, "scalar: $input"; } if ($input =~ $first_only_override) { is $/{WORD}, $expected_data->[0], "scalar (override): $input"; } else { is 'FAIL', $expected, "scalar (override): $input"; } if (@{$expected_data} > 1) { if ($input =~ $first_two_override) { is $/{WORD1}, $expected_data->[0], "scalars[0] (override): $input"; is $/{WORD2}, $expected_data->[1], "scalars[1] (override): $input"; } else { is 'FAIL', $expected, "scalars (override): $input"; } } if ($input =~ $no_cap) { isnt 'FAIL', $expected, "no-cap: $input"; } else { is 'FAIL', $expected, "no-cap: $input"; } } __DATA__ dorefameredo ['fa','me'] dorefamell ['fa','me'] zzzzz FAIL zzzdoremezzz ['me'] Regexp-Grammars-1.045/t/inline_computation.t000644 000765 000765 00000002663 12161422560 020446 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; use List::Util qw< reduce >; my $calculator = do{ use Regexp::Grammars; qr{ <[Operand=Mult]> ** <[Op=(\+|\-)]> (?{ $MATCH = shift @{$MATCH{Operand}}; for my $term (@{$MATCH{Operand}}) { my $op = shift @{$MATCH{Op}}; if ($op eq '+') { $MATCH += $term; } else { $MATCH -= $term; } } }) <[Operand=Pow]> ** <[Op=(\*|/|%)]> (?{ $MATCH = reduce { eval($a . shift(@{$MATCH{Op}}) . $b) } @{$MATCH{Operand}}; }) <[Operand=Term]> ** (?{ $MATCH = reduce { $b ** $a } reverse @{$MATCH{Operand}}; }) | \( \) }xms }; while (my $input = ) { chomp $input; my ($expr, $result) = split /\s*=\s*/, $input; ok +($expr =~ $calculator) => "Matched expression: $expr"; cmp_ok $/{Answer}, '==', $result => "Got right answer ($result)"; } __DATA__ 2^3*4+5 = 37 2+3*4^5 = 3074 2+3*4+5 = 19 2*3+4*5 = 26 2*(3+4)*5 = 70 2+3+4-5 = 4 100/10/2 = 5 100/10*2 = 20 Regexp-Grammars-1.045/t/inline_computation_handler.t000644 000765 000765 00000003015 12161422570 022134 0ustar00damian000000 000000 use 5.010; use warnings; package Calculator; use List::Util qw< reduce >; sub Answer { my ($class, $result_hash) = @_; my $sum = shift @{$result_hash->{Operand}}; for my $term (@{$result_hash->{Operand}}) { my $op = shift @{$result_hash->{Op}}; if ($op eq '+') { $sum += $term; } else { $sum -= $term; } } return $sum; } sub Mult { my ($class, $result_hash) = @_; return reduce { eval($a . shift(@{$result_hash->{Op}}) . $b) } @{$result_hash->{Operand}}; } sub Pow { my ($class, $result_hash) = @_; return reduce { $b ** $a } reverse @{$result_hash->{Operand}}; } use Test::More 'no_plan'; my $calculator = do{ use Regexp::Grammars; qr{ <[Operand=Mult]> ** <[Op=(\+|\-)]> <[Operand=Pow]> ** <[Op=(\*|/|%)]> <[Operand=Term]> ** | \( \) }xms }; while (my $input = ) { chomp $input; my ($expr, $result) = split /\s*=\s*/, $input; ok +($expr =~ $calculator->with_actions('Calculator')) => "Matched expression: $expr"; cmp_ok $/{Answer}, '==', $result => "Got right answer ($result)"; } __DATA__ 1+1+1 = 3 2^3*4+5 = 37 2+3*4^5 = 3074 2+3*4+5 = 19 2*3+4*5 = 26 2*(3+4)*5 = 70 2+3+4-5 = 4 100/10/2 = 5 100/10*2 = 20 Regexp-Grammars-1.045/t/inline_computation_obj_handler.t000644 000765 000765 00000003256 12161422577 023004 0ustar00damian000000 000000 use 5.010; use warnings; package Calculator; use List::Util qw< reduce >; sub Answer { my ($class, $result_hash) = @_; my $sum = shift @{$result_hash->{Operand}}; for my $term (@{$result_hash->{Operand}}) { my $op = shift @{$result_hash->{Op}}; if ($op eq '+') { $sum += $term; } else { $sum -= $term; } } return $sum; } sub Mult { my ($class, $result_hash) = @_; return reduce { eval($a . shift(@{$result_hash->{Op}}) . $b) } @{$result_hash->{Operand}}; } sub Pow { my ($class, $result_hash) = @_; return reduce { $b ** $a } reverse @{$result_hash->{Operand}}; } use Test::More 'no_plan'; my $calculator = do{ use Regexp::Grammars; qr{ <[Operand=Mult]> ** <[Op=(\+|\-)]> <[Operand=Pow]> ** <[Op=(\*|/|%)]> <[Operand=Term]> ** | \( \) }xms }; while (my $input = ) { chomp $input; my ($expr, $result) = split /\s*=\s*/, $input; ok +($expr =~ $calculator->with_actions(bless {}, 'Calculator')) => "Matched expression: $expr"; cmp_ok $/{Answer}, '==', $result => "Got right answer ($result)"; } ok +('1+1' =~ $calculator) => "Matched final expression"; use Data::Dumper 'Dumper'; cmp_ok $/{Answer}{""}, 'eq', '1+1' => "Got right answer (hash)"; __DATA__ 1+1+1 = 3 2^3*4+5 = 37 2+3*4^5 = 3074 2+3*4+5 = 19 2*3+4*5 = 26 2*(3+4)*5 = 70 2+3+4-5 = 4 100/10/2 = 5 100/10*2 = 20 Regexp-Grammars-1.045/t/local_ws.t000644 000765 000765 00000003057 12353646360 016360 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; use Regexp::Grammars; my $parser = qr{ )*+ > a d d | s u b | m u l \d++ }xms; no Regexp::Grammars; if ('mooouoool 7' =~ $parser) { ok 1 => 'Whitespace overridden'; } else { ok 0 => 'Whitespace overridden'; } my $test_grammar = do { use Regexp::Grammars; qr{ # One type of comment between statements <[statement]> ** ( ; ) # Another type within statements <[arg]> ** ( , ) foo | bar baz }xms; }; my $text = q{ foo baz, baz, baz; # comment bar #{ comment }# baz }; my $expected_result = { "" => "\n foo baz, baz, baz;\n # comment\n bar #{ comment }# baz", "program" => { "" => "\n foo baz, baz, baz;\n # comment\n bar #{ comment }# baz", "statement" => [ { "" => "foo baz, baz, baz", "arg" => ["baz", "baz", "baz"], "cmd" => "foo", }, { "" => "bar #{ comment }# baz", "arg" => ["baz"], "cmd" => "bar" }, ], }, }; ok $text =~ $test_grammar => 'Multiple '; is_deeply \%/, $expected_result => 'Parse is correct'; Regexp-Grammars-1.045/t/lookaheads.t000644 000765 000765 00000000622 12161422614 016651 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; use Regexp::Grammars; my $neg_lookahead = qr{ START <[Item]> ** 'END')> END 'END')> \w+ <\:Token> }xms; ok "START do it END END" =~ $neg_lookahead => 'Match'; is_deeply $/{List}{Item}, ["do","it"] => 'Correct match'; Regexp-Grammars-1.045/t/matchline.t000644 000765 000765 00000002646 12161422625 016515 0ustar00damian000000 000000 use strict; use 5.010; use Test::More 'no_plan'; my $test_grammar = do { use Regexp::Grammars; qr{ \{ (?: <[matchline]> <[num]> ? )+ | \} | \] \d++ }xms; }; #012345 ok " \n{\naa\n}" =~ $test_grammar => 'Matched test 1'; is $/{startmarker}{at}, 2 => "Aliased "; is $/{startmarker}{after}, 2 => "Post-aliased "; is $/{endmarker}{matchline}, 4 => "Unaliased "; ok ! exists $/{content}{matchline} => "No "; #012345 ok " \n{\naa\n]" =~ $test_grammar => 'Matched test 2'; is $/{startmarker}{at}, 2 => "Aliased "; ok ! exists $/{endmarker}{matchline} => "No unaliased "; ok ! exists $/{content}{matchline} => "No "; #0123456 ok "{11\n22\n\n\n33\n}" =~ $test_grammar => 'Matched test 3'; is $/{startmarker}{at}, 1 => "Aliased "; is $/{endmarker}{matchline}, 6 => "Unaliased "; is_deeply $/{content}{num}, [11,22,33] => "Repeated contents"; is_deeply $/{content}{matchline}, [1,2,5] => "Repeated <[matchline]>"; Regexp-Grammars-1.045/t/matchpos.t000644 000765 000765 00000002612 12161422637 016363 0ustar00damian000000 000000 use strict; use 5.010; use Test::More 'no_plan'; my $test_grammar = do { use Regexp::Grammars; qr{ \{ (?: <[matchpos]> <[num]> ? )+ | \} | \] \d++ }xms; }; #012345 ok " {aa}" =~ $test_grammar => 'Matched test 1'; is $/{startmarker}{at}, 2 => "Aliased "; is $/{startmarker}{after}, 3 => "Post-aliased "; is $/{endmarker}{matchpos}, 5 => "Unaliased "; ok ! exists $/{content}{matchpos} => "No "; #012345 ok " {aa]" =~ $test_grammar => 'Matched test 2'; is $/{startmarker}{at}, 2 => "Aliased "; ok ! exists $/{endmarker}{matchpos} => "No unaliased "; ok ! exists $/{content}{matchpos} => "No "; #0123456 ok "{1 2 3}" =~ $test_grammar => 'Matched test 3'; is $/{startmarker}{at}, 0 => "Aliased "; is $/{endmarker}{matchpos}, 6 => "Unaliased "; is_deeply $/{content}{num}, [1,2,3] => "Repeated contents"; is_deeply $/{content}{matchpos}, [1,3,5] => "Repeated <[matchpos]>"; Regexp-Grammars-1.045/t/minimize_bug.t000644 000765 000765 00000001350 12161422667 017224 0ustar00damian000000 000000 use 5.010; use Data::Dumper; use Regexp::Grammars; my $nocontext = qr{ <[expr]>+ % <[sep=([\w,;])]> <[item]>+ % <[op=([+-])]> (\d+) }xms; use Test::More; plan tests => 2; if ("1+2,3" =~ $nocontext) { is_deeply \%/, { 'list' => { 'sep' => [','], 'expr' => [{ 'item' => ['1','2'], 'op' => ['+'] }, '3'] } } => 'Should not minimize'; } else { fail 'Should not minimize (did not match)'; } if ("1" =~ $nocontext) { is_deeply \%/, { 'list' => 1 } => 'Should minimize'; } else { fail 'Should minimize (did not match)'; } Regexp-Grammars-1.045/t/missing_slash_x.t000644 000765 000765 00000001466 12353700724 017744 0ustar00damian000000 000000 use 5.010; use Test::More 'no_plan'; use Regexp::Grammars; my $grammar_noRG = qr{^a b$}; my $grammar_top = qr{ ^ a b $ }; my $grammar_rule = qr{ ^ a b $ }; my $grammar_token = qr{ ^ a b $ }; ok 'ab' !~ $grammar_noRG => 'No RG correctly fails without space'; ok 'a b' =~ $grammar_noRG => 'No RG correctly matches with space'; ok 'ab' =~ $grammar_top => 'Top correctly matches without space'; ok 'a b' !~ $grammar_top => 'Top correctly fails with space'; ok 'ab' =~ $grammar_token => 'Token correctly matches without space'; ok 'a b' !~ $grammar_token => 'Token correctly fails with space'; ok 'ab' =~ $grammar_rule => 'Rule correctly matches without space'; ok 'a b' =~ $grammar_rule => 'Rule correctly matches with space'; Regexp-Grammars-1.045/t/moose.t000644 000765 000765 00000003445 12161422705 015670 0ustar00damian000000 000000 use 5.010; use warnings; BEGIN{ use Test::More; plan skip_all => "Moose required for testing moosey objrules" if !eval{require Moose}; } use Test::More 'no_plan'; # Use this class declaration to check that classes with ctors # actually call the ctor when objrules use them... use Moose::Util::TypeConstraints; subtype 'Non::Regex::Int', as 'Num', where { int($_) == $_ }; no Moose::Util::TypeConstraints; { package Speaker; use Moose; has 'name' => (is => 'rw', isa => 'Str'); has 'alias' => (is => 'rw', isa => 'Str'); has 'id' => (is => 'rw', isa => 'Non::Regex::Int'); } my $parser = do{ use Regexp::Grammars; qr{ \\"\> (?:\(\‎\‎\)) \w+ (?:(?:<.ws>|\-|\') \w+) \w+ \d+ }xms }; my $target = { "" => "Nathan Gray (‎kolibrie‎)", "speaker" => bless({ "" => "Nathan Gray (‎kolibrie‎)", "alias" => "kolibrie", "id" => 1613, "name" => "Nathan Gray", }, "Speaker"), }; my $input = do{ local $/; }; chomp $input; my $original_input = $input; ok +($input =~ $parser) => 'Matched'; is_deeply \%/, $target => 'Returned correct data structure'; __DATA__ Nathan Gray (‎kolibrie‎) - ‎Practical Extraction with Regexp::Grammars‎ (50 min) 9 Regexp-Grammars-1.045/t/neg_lookahead.t000644 000765 000765 00000000555 12161422716 017327 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; use Regexp::Grammars; my $neg_lookahead = qr{ START <[Item]> ** (?!) \w+ END }xms; ok "START do it END END" =~ $neg_lookahead => 'Match'; is_deeply $/{List}{Item}, ["do","it"] => 'Correct match'; Regexp-Grammars-1.045/t/new.t000644 000765 000765 00000003107 12161422725 015334 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More 'tests' => 2; # Use this class declaration to check that classes with ctors # actually call the ctor when objrules use them... { package Speaker; sub new { my ($class, $data_ref) = @_; my $new_obj = bless {'check'=>'check',%{$data_ref}}, $class; return $new_obj; } } my $parser = do{ use Regexp::Grammars; qr{ \\"\> (?:\(\‎\‎\)) \w+ (?:(?:<.ws>|\-|\') \w+) \w+ \d+ }xms }; my $target = { "" => "Nathan Gray (‎kolibrie‎)", "speaker" => bless({ "" => "Nathan Gray (‎kolibrie‎)", "check" => "check", "alias" => "kolibrie", "id" => 1613, "name" => "Nathan Gray", }, "Speaker"), }; my $input = do{ local $/; }; chomp $input; my $original_input = $input; ok +($input =~ $parser) => 'Matched'; is_deeply \%/, $target => 'Returned correct data structure'; #is $/{""}, $original_input => 'Captured entire text'; __DATA__ Nathan Gray (‎kolibrie‎) - ‎Practical Extraction with Regexp::Grammars‎ (50 min) 9 Regexp-Grammars-1.045/t/new_init.t000644 000765 000765 00000004365 12161422733 016365 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More 'tests' => 2; # Use this class declaration to check that classes with ctors # actually call the ctor when objrules use them... { package Speaker; sub new { my $that = shift; #warn "running 'new'\n"; my $self = bless {}, (ref($that) || $that); $self->init(@_); } sub init { my $self = shift; #warn "running 'init'\n"; my %args = (scalar @_ == 1 and UNIVERSAL::isa($_[0], 'HASH')) ? %{ $_[0] } : @_; foreach my $accessor (keys %args) { next unless $self->can($accessor); $self->$accessor($args{$accessor}); } #warn "returning initialized object\n"; return $self; } sub name { my $self = shift; @_ ? ($self->{name} = shift) : $self->{name}; } sub alias { my $self = shift; @_ ? ($self->{alias} = shift) : $self->{alias}; } sub id { my $self = shift; @_ ? ($self->{id} = shift) : $self->{id}; } } my $parser = do{ use Regexp::Grammars; qr{ \\"\> (?:\(\‎\‎\)) \w+ (?:(?:<.ws>|\-|\') \w+) \w+ \d+ }xms }; my $target = { "" => "Nathan Gray (‎kolibrie‎)", "speaker" => bless({ "" => "Nathan Gray (‎kolibrie‎)", "alias" => "kolibrie", "id" => 1613, "name" => "Nathan Gray", }, "Speaker"), }; my $input = do{ local $/; }; chomp $input; my $original_input = $input; ok +($input =~ $parser) => 'Matched'; is_deeply \%/, $target => 'Returned correct data structure'; #is $/{""}, $original_input => 'Captured entire text'; __DATA__ Nathan Gray (‎kolibrie‎) - ‎Practical Extraction with Regexp::Grammars‎ (50 min) 9 Regexp-Grammars-1.045/t/new_init_autoload.t000644 000765 000765 00000004326 12161422742 020252 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More 'tests' => 2; # Use this class declaration to check that classes with ctors # actually call the ctor when objrules use them... { package Speaker; sub new { my $that = shift; #warn "running 'new'\n"; my $self = bless {}, (ref($that) || $that); $self->init(@_); } sub init { my $self = shift; #warn "running 'init'\n"; my %args = (scalar @_ == 1 and UNIVERSAL::isa($_[0], 'HASH')) ? %{ $_[0] } : @_; foreach my $accessor (keys %args) { $self->$accessor($args{$accessor}); } #warn "returning initialized object\n"; return $self; } sub AUTOLOAD { my $self = shift; our $AUTOLOAD; my $var = $AUTOLOAD; my $last_colon_pos = rindex($var, ':'); substr $var, 0, $last_colon_pos+1, q{}; #warn "running AUTOLOAD as '$var' with param '$_[0]'\n"; @_ ? ($self->{$var} = shift) : $self->{$var}; } } my $parser = do{ use Regexp::Grammars; qr{ \\"\> (?:\(\‎\‎\)) \w+ (?:(?:<.ws>|\-|\') \w+) \w+ \d+ }xms }; my $target = { "" => "Nathan Gray (‎kolibrie‎)", "speaker" => bless({ "" => "Nathan Gray (‎kolibrie‎)", "alias" => "kolibrie", "id" => 1613, "name" => "Nathan Gray", }, "Speaker"), }; my $input = do{ local $/; }; chomp $input; my $original_input = $input; ok +($input =~ $parser) => 'Matched'; is_deeply \%/, $target => 'Returned correct data structure'; #is $/{""}, $original_input => 'Captured entire text'; __DATA__ Nathan Gray (‎kolibrie‎) - ‎Practical Extraction with Regexp::Grammars‎ (50 min) 9 Regexp-Grammars-1.045/t/new_init_limited_autoload.t000644 000765 000765 00000004502 12161422751 021755 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More 'tests' => 2; # Use this class declaration to check that classes with ctors # actually call the ctor when objrules use them... { package Speaker; my %allowed = map { $_ => 1 } qw(name alias id); sub new { my $that = shift; #warn "running 'new'\n"; my $self = bless {}, (ref($that) || $that); $self->init(@_); } sub init { my $self = shift; #warn "running 'init'\n"; my %args = (scalar @_ == 1 and UNIVERSAL::isa($_[0], 'HASH')) ? %{ $_[0] } : @_; foreach my $accessor (keys %args) { next unless (exists $allowed{$accessor}); $self->$accessor($args{$accessor}); } #warn "returning initialized object\n"; return $self; } sub AUTOLOAD { my $self = shift; our $AUTOLOAD; my $var = $AUTOLOAD; my $last_colon_pos = rindex($var, ':'); substr $var, 0, $last_colon_pos+1, q{}; #warn "running AUTOLOAD as '$var' with param '$_[0]'\n"; @_ ? ($self->{$var} = shift) : $self->{$var}; } } my $parser = do{ use Regexp::Grammars; qr{ \\"\> (?:\(\‎\‎\)) \w+ (?:(?:<.ws>|\-|\') \w+) \w+ \d+ }xms }; my $target = { "" => "Nathan Gray (‎kolibrie‎)", "speaker" => bless({ "" => "Nathan Gray (‎kolibrie‎)", "alias" => "kolibrie", "id" => 1613, "name" => "Nathan Gray", }, "Speaker"), }; my $input = do{ local $/; }; chomp $input; my $original_input = $input; ok +($input =~ $parser) => 'Matched'; is_deeply \%/, $target => 'Returned correct data structure'; #is $/{""}, $original_input => 'Captured entire text'; __DATA__ Nathan Gray (‎kolibrie‎) - ‎Practical Extraction with Regexp::Grammars‎ (50 min) 9 Regexp-Grammars-1.045/t/new_init_limited_autoload_warn.t000644 000765 000765 00000004502 12161422760 023004 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More 'tests' => 2; # Use this class declaration to check that classes with ctors # actually call the ctor when objrules use them... { package Speaker; my %allowed = map { $_ => 1 } qw(name alias id); sub new { my $that = shift; #warn "running 'new'\n"; my $self = bless {}, (ref($that) || $that); $self->init(@_); } sub init { my $self = shift; #warn "running 'init'\n"; my %args = (scalar @_ == 1 and UNIVERSAL::isa($_[0], 'HASH')) ? %{ $_[0] } : @_; foreach my $accessor (keys %args) { next unless (exists $allowed{$accessor}); $self->$accessor($args{$accessor}); } #warn "returning initialized object\n"; return $self; } sub AUTOLOAD { my $self = shift; our $AUTOLOAD; my $var = $AUTOLOAD; my $last_colon_pos = rindex($var, ':'); substr $var, 0, $last_colon_pos+1, q{}; #warn "running AUTOLOAD as '$var' with param '$_[0]'\n"; @_ ? ($self->{$var} = shift) : $self->{$var}; } } my $parser = do{ use Regexp::Grammars; qr{ \\"\> (?:\(\‎\‎\)) \w+ (?:(?:<.ws>|\-|\') \w+) \w+ \d+ }xms }; my $target = { "" => "Nathan Gray (‎kolibrie‎)", "speaker" => bless({ "" => "Nathan Gray (‎kolibrie‎)", "alias" => "kolibrie", "id" => 1613, "name" => "Nathan Gray", }, "Speaker"), }; my $input = do{ local $/; }; chomp $input; my $original_input = $input; ok +($input =~ $parser) => 'Matched'; is_deeply \%/, $target => 'Returned correct data structure'; #is $/{""}, $original_input => 'Captured entire text'; __DATA__ Nathan Gray (‎kolibrie‎) - ‎Practical Extraction with Regexp::Grammars‎ (50 min) 9 Regexp-Grammars-1.045/t/no_context.t000644 000765 000765 00000016646 12345665533 016751 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More; plan 'no_plan'; # Use this class declaration to check that classes with ctors # actually call the ctor when objrules use them... { package file; sub new { my ($class, $data_ref) = @_; my $new_obj = bless {'check'=>'check',%{$data_ref}}, $class; return $new_obj; } } my $parser = do{ use Regexp::Grammars; qr{ <[element]>* | \\ ? ? \[ <[option]> ** (,) \] \{ <[element]>* \} [^][\$&%#_{}~^\s,]+ [^][\$&%#_{}~^\s]+ }xms }; my $target = { "file" => bless({ "check" => "check", "element" => [ bless({ "command" => bless({ "args" => bless({ "element" => [ bless({ "literal" => bless({ "" => "article", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "documentclass", }, "literal"), "options" => bless({ "option" => [ bless({ "" => "a4paper", }, "option"), bless({ "" => "11pt", }, "option"), ], }, "options"), }, "command"), }, "element"), bless({ "command" => bless({ "args" => bless({ "element" => [ bless({ "literal" => bless({ "" => "latexsym", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "usepackage", }, "literal"), }, "command"), }, "element"), bless({ "command" => bless({ "args" => bless({ "element" => [ bless({ "literal" => bless({ "" => "D.", }, "literal"), }, "element"), bless({ "literal" => bless({ "" => "Conway", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "author", }, "literal"), }, "command"), }, "element"), bless({ "command" => bless({ "args" => bless({ "element" => [ bless({ "literal" => bless({ "" => "Parsing", }, "literal"), }, "element"), bless({ "command" => bless({ "args" => bless({ "" => "{}", }, "args"), "name" => bless({ "" => "LaTeX", }, "literal"), }, "command"), }, "element"), ], }, "args"), "name" => bless({ "" => "title", }, "literal"), }, "command"), }, "element"), bless({ "command" => bless({ "args" => bless({ "element" => [ bless({ "literal" => bless({ "" => "document", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "begin", }, "literal"), }, "command"), }, "element"), bless({ "command" => bless({ "name" => bless({ "" => "maketitle", }, "literal"), }, "command"), }, "element"), bless({ "command" => bless({ "name" => bless({ "" => "tableofcontents", }, "literal"), }, "command"), }, "element"), bless({ "command" => bless({ "args" => bless({ "element" => [ bless({ "literal" => bless({ "" => "Description", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "section", }, "literal"), }, "command"), }, "element"), bless({ "literal" => bless({ "" => "...is", }, "literal"), }, "element"), bless({ "literal" => bless({ "" => "easy", }, "literal"), }, "element"), bless({ "command" => bless({ "args" => bless({ "element" => [ bless({ "literal" => bless({ "" => "But", }, "literal"), }, "element"), bless({ "literal" => bless({ "" => "not", }, "literal"), }, "element"), bless({ "command" => bless({ "args" => bless({ "element" => [ bless({ "literal" => bless({ "" => "necessarily", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "emph", }, "literal"), }, "command"), }, "element"), bless({ "literal" => bless({ "" => "simple", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "footnote", }, "literal"), }, "command"), }, "element"), bless({ "literal" => bless({ "" => ".", }, "literal"), }, "element"), bless({ "command" => bless({ "args" => bless({ "element" => [ bless({ "literal" => bless({ "" => "document", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "end", }, "literal"), }, "command"), }, "element"), ], }, "file"), }; my $input = do{ local $/; }; chomp $input; my $original_input = $input; ok +($input =~ $parser) => 'Matched'; is_deeply \%/, $target => 'Returned correct data structure'; ok !exists $/{""} => 'Entire text not captured'; __DATA__ \documentclass[a4paper,11pt]{article} \usepackage{latexsym} \author{D. Conway} \title{Parsing \LaTeX{}} \begin{document} \maketitle \tableofcontents \section{Description} ...is easy \footnote{But not \emph{necessarily} simple}. \end{document} Regexp-Grammars-1.045/t/no_context_counterlocal.t000644 000765 000765 00000017657 12345665532 021525 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More; plan 'no_plan'; # Use this class declaration to check that classes with ctors # actually call the ctor when objrules use them... { package file; sub new { my ($class, $data_ref) = @_; my $new_obj = bless {'check'=>'check',%{$data_ref}}, $class; return $new_obj; } } my $parser = do{ use Regexp::Grammars; qr{ <[element]>* | \\ ? ? \[ <[option]> ** (,) \] \{ <[element]>* \} [^][\$&%#_{}~^\s,]+ [^][\$&%#_{}~^\s]+ }xms }; my $target = { "file" => bless({ "" => "\\documentclass[a4paper,11pt]{article}\n\\usepackage{latexsym}\n\\author{D. Conway}\n\\title{Parsing \\LaTeX{}}\n\\begin{document}\n\\maketitle\n\\tableofcontents\n\\section{Description}\n...is easy \\footnote{But not \\emph{necessarily} simple}.\n\\end{document}", "check" => "check", "element" => [ bless({ "command" => bless({ "" => "\\documentclass[a4paper,11pt]{article}", "args" => bless({ "" => "{article}", "element" => [ bless({ "literal" => bless({ "" => "article", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "documentclass", }, "literal"), "options" => bless({ "option" => [ bless({ "" => "a4paper", }, "option"), bless({ "" => "11pt", }, "option"), ], }, "options"), }, "command"), }, "element"), bless({ "command" => bless({ "" => "\\usepackage{latexsym}", "args" => bless({ "" => "{latexsym}", "element" => [ bless({ "literal" => bless({ "" => "latexsym", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "usepackage", }, "literal"), }, "command"), }, "element"), bless({ "command" => bless({ "" => "\\author{D. Conway}", "args" => bless({ "" => "{D. Conway}", "element" => [ bless({ "literal" => bless({ "" => "D.", }, "literal"), }, "element"), bless({ "literal" => bless({ "" => "Conway", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "author", }, "literal"), }, "command"), }, "element"), bless({ "command" => bless({ "" => "\\title{Parsing \\LaTeX{}}", "args" => bless({ "" => "{Parsing \\LaTeX{}}", "element" => [ bless({ "literal" => bless({ "" => "Parsing", }, "literal"), }, "element"), bless({ "command" => bless({ "" => "\\LaTeX{}", "args" => bless({ "" => "{}", }, "args"), "name" => bless({ "" => "LaTeX", }, "literal"), }, "command"), }, "element"), ], }, "args"), "name" => bless({ "" => "title", }, "literal"), }, "command"), }, "element"), bless({ "command" => bless({ "" => "\\begin{document}", "args" => bless({ "" => "{document}", "element" => [ bless({ "literal" => bless({ "" => "document", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "begin", }, "literal"), }, "command"), }, "element"), bless({ "command" => bless({ "" => "\\maketitle\n", "name" => bless({ "" => "maketitle", }, "literal"), }, "command"), }, "element"), bless({ "command" => bless({ "" => "\\tableofcontents\n", "name" => bless({ "" => "tableofcontents", }, "literal"), }, "command"), }, "element"), bless({ "command" => bless({ "" => "\\section{Description}", "args" => bless({ "" => "{Description}", "element" => [ bless({ "literal" => bless({ "" => "Description", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "section", }, "literal"), }, "command"), }, "element"), bless({ "literal" => bless({ "" => "...is", }, "literal"), }, "element"), bless({ "literal" => bless({ "" => "easy", }, "literal"), }, "element"), bless({ "command" => bless({ "" => "\\footnote{But not \\emph{necessarily} simple}", "args" => bless({ "" => "{But not \\emph{necessarily} simple}", "element" => [ bless({ "literal" => bless({ "" => "But", }, "literal"), }, "element"), bless({ "literal" => bless({ "" => "not", }, "literal"), }, "element"), bless({ "command" => bless({ "" => "\\emph{necessarily}", "args" => bless({ "" => "{necessarily}", "element" => [ bless({ "literal" => bless({ "" => "necessarily", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "emph", }, "literal"), }, "command"), }, "element"), bless({ "literal" => bless({ "" => "simple", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "footnote", }, "literal"), }, "command"), }, "element"), bless({ "literal" => bless({ "" => ".", }, "literal"), }, "element"), bless({ "command" => bless({ "" => "\\end{document}", "args" => bless({ "" => "{document}", "element" => [ bless({ "literal" => bless({ "" => "document", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "end", }, "literal"), }, "command"), }, "element"), ], }, "file"), }; my $input = do{ local $/; }; chomp $input; my $original_input = $input; ok +($input =~ $parser) => 'Matched'; is_deeply \%/, $target => 'Returned correct data structure'; ok ! exists $/{""} => "Didn't capture entire text"; __DATA__ \documentclass[a4paper,11pt]{article} \usepackage{latexsym} \author{D. Conway} \title{Parsing \LaTeX{}} \begin{document} \maketitle \tableofcontents \section{Description} ...is easy \footnote{But not \emph{necessarily} simple}. \end{document} Regexp-Grammars-1.045/t/no_context_local.t000644 000765 000765 00000020242 12345665531 020104 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More; plan 'no_plan'; # Use this class declaration to check that classes with ctors # actually call the ctor when objrules use them... { package file; sub new { my ($class, $data_ref) = @_; my $new_obj = bless {'check'=>'check',%{$data_ref}}, $class; return $new_obj; } } my $parser = do{ use Regexp::Grammars; qr{ <[element]>* | \\ ? ? \[ <[option]> ** (,) \] \{ <[element]>* \} [^][\$&%#_{}~^\s,]+ [^][\$&%#_{}~^\s]+ }xms }; my $target = { "" => "\\documentclass[a4paper,11pt]{article}\n\\usepackage{latexsym}\n\\author{D. Conway}\n\\title{Parsing \\LaTeX{}}\n\\begin{document}\n\\maketitle\n\\tableofcontents\n\\section{Description}\n...is easy \\footnote{But not \\emph{necessarily} simple}.\n\\end{document}", "file" => bless({ "" => "\\documentclass[a4paper,11pt]{article}\n\\usepackage{latexsym}\n\\author{D. Conway}\n\\title{Parsing \\LaTeX{}}\n\\begin{document}\n\\maketitle\n\\tableofcontents\n\\section{Description}\n...is easy \\footnote{But not \\emph{necessarily} simple}.\n\\end{document}", "check" => "check", "element" => [ bless({ "command" => bless({ "" => "\\documentclass[a4paper,11pt]{article}", "args" => bless({ "" => "{article}", "element" => [ bless({ "literal" => bless({ "" => "article", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "documentclass", }, "literal"), "options" => bless({ "option" => [ bless({ "" => "a4paper", }, "option"), bless({ "" => "11pt", }, "option"), ], }, "options"), }, "command"), }, "element"), bless({ "command" => bless({ "" => "\\usepackage{latexsym}", "args" => bless({ "" => "{latexsym}", "element" => [ bless({ "literal" => bless({ "" => "latexsym", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "usepackage", }, "literal"), }, "command"), }, "element"), bless({ "command" => bless({ "" => "\\author{D. Conway}", "args" => bless({ "" => "{D. Conway}", "element" => [ bless({ "literal" => bless({ "" => "D.", }, "literal"), }, "element"), bless({ "literal" => bless({ "" => "Conway", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "author", }, "literal"), }, "command"), }, "element"), bless({ "command" => bless({ "" => "\\title{Parsing \\LaTeX{}}", "args" => bless({ "" => "{Parsing \\LaTeX{}}", "element" => [ bless({ "literal" => bless({ "" => "Parsing", }, "literal"), }, "element"), bless({ "command" => bless({ "" => "\\LaTeX{}", "args" => bless({ "" => "{}", }, "args"), "name" => bless({ "" => "LaTeX", }, "literal"), }, "command"), }, "element"), ], }, "args"), "name" => bless({ "" => "title", }, "literal"), }, "command"), }, "element"), bless({ "command" => bless({ "" => "\\begin{document}", "args" => bless({ "" => "{document}", "element" => [ bless({ "literal" => bless({ "" => "document", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "begin", }, "literal"), }, "command"), }, "element"), bless({ "command" => bless({ "" => "\\maketitle\n", "name" => bless({ "" => "maketitle", }, "literal"), }, "command"), }, "element"), bless({ "command" => bless({ "" => "\\tableofcontents\n", "name" => bless({ "" => "tableofcontents", }, "literal"), }, "command"), }, "element"), bless({ "command" => bless({ "" => "\\section{Description}", "args" => bless({ "" => "{Description}", "element" => [ bless({ "literal" => bless({ "" => "Description", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "section", }, "literal"), }, "command"), }, "element"), bless({ "literal" => bless({ "" => "...is", }, "literal"), }, "element"), bless({ "literal" => bless({ "" => "easy", }, "literal"), }, "element"), bless({ "command" => bless({ "" => "\\footnote{But not \\emph{necessarily} simple}", "args" => bless({ "" => "{But not \\emph{necessarily} simple}", "element" => [ bless({ "literal" => bless({ "" => "But", }, "literal"), }, "element"), bless({ "literal" => bless({ "" => "not", }, "literal"), }, "element"), bless({ "command" => bless({ "" => "\\emph{necessarily}", "args" => bless({ "" => "{necessarily}", "element" => [ bless({ "literal" => bless({ "" => "necessarily", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "emph", }, "literal"), }, "command"), }, "element"), bless({ "literal" => bless({ "" => "simple", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "footnote", }, "literal"), }, "command"), }, "element"), bless({ "literal" => bless({ "" => ".", }, "literal"), }, "element"), bless({ "command" => bless({ "" => "\\end{document}", "args" => bless({ "" => "{document}", "element" => [ bless({ "literal" => bless({ "" => "document", }, "literal"), }, "element"), ], }, "args"), "name" => bless({ "" => "end", }, "literal"), }, "command"), }, "element"), ], }, "file"), }; my $input = do{ local $/; }; chomp $input; my $original_input = $input; ok +($input =~ $parser) => 'Matched'; is_deeply \%/, $target => 'Returned correct data structure'; is $/{""}, $original_input => 'Captured entire text'; __DATA__ \documentclass[a4paper,11pt]{article} \usepackage{latexsym} \author{D. Conway} \title{Parsing \LaTeX{}} \begin{document} \maketitle \tableofcontents \section{Description} ...is easy \footnote{But not \emph{necessarily} simple}. \end{document} Regexp-Grammars-1.045/t/obj_rename.t000644 000765 000765 00000001246 12204256427 016650 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More; plan tests => 5; use Regexp::Grammars; my $list_nonempty = qr{ \( <[Value]> ** (,) \) \d+ }xms; #say $list_nonempty; no Regexp::Grammars; ok +('(1,2)' =~ $list_nonempty) => 'Matched non-empty list'; is ref($/{List}), 'SaveAs' => 'Class naming worked at top level'; ok ! exists $/{List}{q{}} => 'No top-level context'; is ref($/{List}{Value}[0]), 'Some::Other' => 'Class naming worked at 2nd level'; is ref($/{List}{Value}[1]), 'Some::Other' => 'Class naming again at 2nd level'; Regexp-Grammars-1.045/t/one_liner.t000644 000765 000765 00000001513 12353610211 016504 0ustar00damian000000 000000 use 5.010; use Test::More 'no_plan'; use Regexp::Grammars; my $grammar_noRG = qr{^a b$}; my $grammar_top = qr{ ^ a b $ }; my $grammar_rule = qr{ ^ $ \w}; my $grammar_token = qr{ ^ $ \w}; ok 'ab' !~ $grammar_noRG => 'No RG correctly fails without space'; ok 'a b' =~ $grammar_noRG => 'No RG correctly matches with space'; ok 'ab' =~ $grammar_top => 'Top correctly matches without space'; ok 'a b' !~ $grammar_top => 'Top correctly fails with space'; ok 'ab' =~ $grammar_token => 'Token correctly matches without space'; ok 'a b' !~ $grammar_token => 'Token correctly fails with space'; ok 'ab' =~ $grammar_rule => 'Rule correctly matches without space'; ok 'a b' =~ $grammar_rule => 'Rule correctly matches with space'; Regexp-Grammars-1.045/t/pod.t000644 000765 000765 00000000214 12161423032 015311 0ustar00damian000000 000000 use 5.010; use Test::More; eval "use Test::Pod 1.14"; plan skip_all => "Test::Pod 1.14 required for testing POD" if $@; all_pod_files_ok(); Regexp-Grammars-1.045/t/pseudovars.t000644 000765 000765 00000000666 12634101454 016743 0ustar00damian000000 000000 use strict; use warnings; # ABSTRACT: Test Regexp::Grammars exports package symbols. use Regexp::Grammars; use Test::More; sub has_var { my ($varname) = @_; local $@; eval <<"EOF"; my \$grammar = qr{} EOF note $@ if $@; return !$@; } for my $varname (qw( $CAPTURE $CONTEXT $DEBUG $INDEX $MATCH %ARG %MATCH )) { ok( has_var($varname), "Has $varname" ); } done_testing; Regexp-Grammars-1.045/t/repop_ws.t000644 000765 000765 00000002156 12161423043 016376 0ustar00damian000000 000000 use 5.010; use strict; use warnings; use Regexp::Grammars; use Test::More tests => 4; # The text to match against my $text = 'a' . (' ' x 5) . 'z'; # This should match without backtracking my $repop_match = qr/ \A\Z <[val]> ** \s{5} \w+ /x; # This should NOT match my $repop_nomatch = qr/ \A\Z <[val]> ** \s{3} \w+ /x; # This demonstrates the expected behaviour of $repop_match my $standard_match = qr/ \A\Z <[val]> (?: <[val]> )* \s{5} \w+ /x; # This demonstrates the expected behaviour of $repop_nomatch my $standard_nomatch = qr/ \A\Z <[val]> (?: <[val]> )* \s{3} \w+ /x; ok $text =~ $repop_match => "Repetition operator correctly matches"; ok $text !~ $repop_nomatch => "Repetition operator correctly doesn't match"; ok $text =~ $standard_match => "Simulation correctly matches"; ok $text !~ $standard_nomatch => "Simulation correctly doesn't match"; Regexp-Grammars-1.045/t/seplist.t000644 000765 000765 00000002105 12161423054 016217 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; use Regexp::Grammars; my $list_nonempty = qr{ \( <[Value]> ** [,] \) \d+ }xms; my $list_empty = qr{ \( (?: <[Value]> ** , x? )? \) (?{ $MATCH{Value} //= [] }) \d+ }xms; no Regexp::Grammars; while (my $input = ) { chomp $input; my $input_copy = $input; my ($list, $data_structure) = split /\s*:\s*/, $input; if ($list !~ m{ \( \s* \) }xms) { ok +($input =~ $list_nonempty) => 'Matched non-empty list:' . $list; is_deeply $/{List}{Value}, eval($data_structure) => 'Build correct structure'; } ok +($input_copy =~ $list_empty) => 'Matched possibly-empty list:' . $list; is_deeply $/{List}{Value}, eval($data_structure) => 'Build correct structure'; } __DATA__ () : [] (1) : [1] (1,2) : [1,2] (1,2,3) : [1,2,3] ( ) : [] ( 1 ) : [1] (1, 2 ) : [1,2] (1, 2,3 ) : [1,2,3] Regexp-Grammars-1.045/t/seplist_countedhash_0.t000644 000765 000765 00000001464 12204256430 021032 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More; plan 'no_plan'; use Regexp::Grammars; my $list_parser = qr{ \( <[Value]>{0} % [,] \) \d+ }xms; no Regexp::Grammars; while (my $input = ) { chomp $input; my $input_copy = $input; my ($list, $data_structure) = split /\s*:\s*/, $input; ok +($input =~ $list_parser xor $data_structure =~ /FAIL/) => 'Correct for:' . $list; if ($data_structure !~ /FAIL/) { is_deeply $/{List}{Value}, eval($data_structure) => 'Build correct structure'; } } __DATA__ () : undef (1) : FAIL (1,2) : FAIL (1,2,3) : FAIL (1,2,3,4) : FAIL (1,2,3,4,5) : FAIL ( ) : undef ( 1 ) : FAIL (1, 2 ) : FAIL (1, 2,3 ) : FAIL (1, 2, 3, 4) : FAIL (1, 2, 3, 4,5) : FAIL Regexp-Grammars-1.045/t/seplist_countedhash_0_.t000644 000765 000765 00000001633 12204256431 021170 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More; plan 'no_plan'; use Regexp::Grammars; my $list_nonempty = qr{ \( <[Value]>{0,}% \) \d+ }xms; no Regexp::Grammars; while (my $input = ) { chomp $input; my $input_copy = $input; my ($list, $data_structure) = split /\s*:\s*/, $input; if ($list !~ m{ \( \s* \) }xms) { ok +($input =~ $list_nonempty xor $data_structure =~ /FAIL/) => 'Correct for:' . $list; if ($data_structure !~ /FAIL/) { is_deeply $/{List}{Value}, eval($data_structure) => 'Build correct structure'; } } } __DATA__ () : undef (1) : [1] (1,2) : [1,2] (1,2,3) : [1,2,3] (1,2,3,4) : [1,2,3,4] (1,2,3,4,5) : [1,2,3,4,5] ( ) : undef ( 1 ) : [1] (1, 2 ) : [1,2] (1, 2,3 ) : [1,2,3] (1, 2, 3, 4) : [1,2,3,4] (1, 2, 3, 4,5) : [1,2,3,4,5] Regexp-Grammars-1.045/t/seplist_countedhash_0_1.t000644 000765 000765 00000001566 12161423104 021251 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; use Regexp::Grammars; my $list_nonempty = qr{ \( <[Value]>{0,1}% \) \d+ }xms; no Regexp::Grammars; while (my $input = ) { chomp $input; my $input_copy = $input; my ($list, $data_structure) = split /\s*:\s*/, $input; if ($list !~ m{ \( \s* \) }xms) { ok +($input =~ $list_nonempty xor $data_structure =~ /FAIL/) => 'Correct for:' . $list; if ($data_structure !~ /FAIL/) { is_deeply $/{List}{Value}, eval($data_structure) => 'Build correct structure'; } } } __DATA__ () : undef (1) : [1] (1,2) : FAIL (1,2,3) : FAIL (1,2,3,4) : FAIL (1,2,3,4,5) : FAIL ( ) : undef ( 1 ) : [1] (1, 2 ) : FAIL (1, 2,3 ) : FAIL (1, 2, 3, 4) : FAIL (1, 2, 3, 4,5) : FAIL Regexp-Grammars-1.045/t/seplist_countedhash_0_N.t000644 000765 000765 00000001611 12204256433 021304 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More; plan 'no_plan'; use Regexp::Grammars; my $list_nonempty = qr{ \( <[Value]> {0,4}% [,] \) \d+ }xms; no Regexp::Grammars; while (my $input = ) { chomp $input; my $input_copy = $input; my ($list, $data_structure) = split /\s*:\s*/, $input; if ($list !~ m{ \( \s* \) }xms) { ok +($input =~ $list_nonempty xor $data_structure =~ /FAIL/) => 'Correct for:' . $list; if ($data_structure !~ /FAIL/) { is_deeply $/{List}{Value}, eval($data_structure) => 'Build correct structure'; } } } __DATA__ () : undef (1) : [1] (1,2) : [1,2] (1,2,3) : [1,2,3] (1,2,3,4) : [1,2,3,4] (1,2,3,4,5) : FAIL ( ) : undef ( 1 ) : [1] (1, 2 ) : [1,2] (1, 2,3 ) : [1,2,3] (1, 2, 3, 4) : [1,2,3,4] (1, 2, 3, 4,5) : FAIL Regexp-Grammars-1.045/t/seplist_countedhash_1.t000644 000765 000765 00000001607 12204256434 021036 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More; plan 'no_plan'; use Regexp::Grammars; my $list_nonempty = qr{ \( <[Value]> {1,4}% [,] \) \d+ }xms; no Regexp::Grammars; while (my $input = ) { chomp $input; my $input_copy = $input; my ($list, $data_structure) = split /\s*:\s*/, $input; if ($list !~ m{ \( \s* \) }xms) { ok +($input =~ $list_nonempty xor $data_structure =~ /FAIL/) => 'Correct for:' . $list; if ($data_structure !~ /FAIL/) { is_deeply $/{List}{Value}, eval($data_structure) => 'Build correct structure'; } } } __DATA__ () : FAIL (1) : [1] (1,2) : [1,2] (1,2,3) : [1,2,3] (1,2,3,4) : [1,2,3,4] (1,2,3,4,5) : FAIL ( ) : FAIL ( 1 ) : [1] (1, 2 ) : [1,2] (1, 2,3 ) : [1,2,3] (1, 2, 3, 4) : [1,2,3,4] (1, 2, 3, 4,5) : FAIL Regexp-Grammars-1.045/t/seplist_countedhash_1_.t000644 000765 000765 00000001616 12161423127 021172 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; use Regexp::Grammars; my $list_nonempty = qr{ \( <[Value]> {1,}% [,] \) \d+ }xms; no Regexp::Grammars; while (my $input = ) { chomp $input; my $input_copy = $input; my ($list, $data_structure) = split /\s*:\s*/, $input; if ($list !~ m{ \( \s* \) }xms) { ok +($input =~ $list_nonempty xor $data_structure =~ /FAIL/) => 'Correct for:' . $list; if ($data_structure !~ /FAIL/) { is_deeply $/{List}{Value}, eval($data_structure) => 'Build correct structure'; } } } __DATA__ () : FAIL (1) : [1] (1,2) : [1,2] (1,2,3) : [1,2,3] (1,2,3,4) : [1,2,3,4] (1,2,3,4,5) : [1,2,3,4,5] ( ) : FAIL ( 1 ) : [1] (1, 2 ) : [1,2] (1, 2,3 ) : [1,2,3] (1, 2, 3, 4) : [1,2,3,4] (1, 2, 3, 4,5) : [1,2,3,4,5] Regexp-Grammars-1.045/t/seplist_countedhash_1_N.t000644 000765 000765 00000001607 12204256435 021314 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More; plan 'no_plan'; use Regexp::Grammars; my $list_nonempty = qr{ \( <[Value]> {1,4}% [,] \) \d+ }xms; no Regexp::Grammars; while (my $input = ) { chomp $input; my $input_copy = $input; my ($list, $data_structure) = split /\s*:\s*/, $input; if ($list !~ m{ \( \s* \) }xms) { ok +($input =~ $list_nonempty xor $data_structure =~ /FAIL/) => 'Correct for:' . $list; if ($data_structure !~ /FAIL/) { is_deeply $/{List}{Value}, eval($data_structure) => 'Build correct structure'; } } } __DATA__ () : FAIL (1) : [1] (1,2) : [1,2] (1,2,3) : [1,2,3] (1,2,3,4) : [1,2,3,4] (1,2,3,4,5) : FAIL ( ) : FAIL ( 1 ) : [1] (1, 2 ) : [1,2] (1, 2,3 ) : [1,2,3] (1, 2, 3, 4) : [1,2,3,4] (1, 2, 3, 4,5) : FAIL Regexp-Grammars-1.045/t/seplist_countedhash_M_.t000644 000765 000765 00000001624 12204256436 021232 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More; plan 'no_plan'; use Regexp::Grammars; my $list_nonempty = qr{ \( <[Value]> {3,}% [,] \) \d+ }xms; no Regexp::Grammars; while (my $input = ) { chomp $input; my $input_copy = $input; my ($list, $data_structure) = split /\s*:\s*/, $input; if ($list !~ m{ \( \s* \) }xms) { ok +($input =~ $list_nonempty xor $data_structure =~ /FAIL/) => 'Correct for:' . $list; if ($data_structure !~ /FAIL/) { is_deeply $/{List}{Value}, eval($data_structure) => 'Build correct structure'; } } } __DATA__ () : FAIL (1) : FAIL (1,2) : FAIL (1,2,3) : [1,2,3] (1,2,3,4) : [1,2,3,4] (1,2,3,4,5) : [1,2,3,4,5] ( ) : FAIL ( 1 ) : FAIL (1, 2 ) : FAIL (1, 2,3 ) : [1,2,3] (1, 2, 3, 4) : [1,2,3,4] (1, 2, 3, 4,5) : [1,2,3,4,5] Regexp-Grammars-1.045/t/seplist_countedhash_M_N.t000644 000765 000765 00000001611 12204256437 021345 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More; plan 'no_plan'; use Regexp::Grammars; my $list_nonempty = qr{ \( <[Value]> {2,4}% [,] \) \d+ }xms; no Regexp::Grammars; while (my $input = ) { chomp $input; my $input_copy = $input; my ($list, $data_structure) = split /\s*:\s*/, $input; if ($list !~ m{ \( \s* \) }xms) { ok +($input =~ $list_nonempty xor $data_structure =~ /FAIL/) => 'Correct for:' . $list; if ($data_structure !~ /FAIL/) { is_deeply $/{List}{Value}, eval($data_structure) => 'Build correct structure'; } } } __DATA__ () : FAIL (1) : FAIL (1,2) : [1,2] (1,2,3) : [1,2,3] (1,2,3,4) : [1,2,3,4] (1,2,3,4,5) : FAIL ( ) : FAIL ( 1 ) : FAIL (1, 2 ) : [1,2] (1, 2,3 ) : [1,2,3] (1, 2, 3, 4) : [1,2,3,4] (1, 2, 3, 4,5) : FAIL Regexp-Grammars-1.045/t/seplist_countedhash_N.t000644 000765 000765 00000001573 12204256440 021072 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More; plan 'no_plan'; use Regexp::Grammars; my $list_nonempty = qr{ \( <[Value]> {3}% [,] \) \d+ }xms; no Regexp::Grammars; while (my $input = ) { chomp $input; my $input_copy = $input; my ($list, $data_structure) = split /\s*:\s*/, $input; if ($list !~ m{ \( \s* \) }xms) { ok +($input =~ $list_nonempty xor $data_structure =~ /FAIL/) => 'Correct for:' . $list; if ($data_structure !~ /FAIL/) { is_deeply $/{List}{Value}, eval($data_structure) => 'Build correct structure'; } } } __DATA__ () : FAIL (1) : FAIL (1,2) : FAIL (1,2,3) : [1,2,3] (1,2,3,4) : FAIL (1,2,3,4,5) : FAIL ( ) : FAIL ( 1 ) : FAIL (1, 2 ) : FAIL (1, 2,3 ) : [1,2,3] (1, 2, 3, 4) : FAIL (1, 2, 3, 4,5) : FAIL Regexp-Grammars-1.045/t/seplist_greediness.t000644 000765 000765 00000003166 12161423166 020443 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; use Regexp::Grammars; my $list_greedy = qr{ <[item=Value]>+ % [,] \d+ }xms; my $list_parsimonious = qr{ <[item=Value]>+? % [,] \d+ }xms; my $list_parsimonious_anchored = qr{ <[item=Value]>+? % [,] \d+ }xms; my $list_gluttonous = qr{ <[item=Value]>++ % [,] \d+ }xms; no Regexp::Grammars; my $data = '1,2,3,4,5'; my $data_etc = '1,2,3,4,5etc'; ok +($data =~ $list_greedy) => 'Matched greedy'; is_deeply $/{List}{item}, [1,2,3,4] => '...with correct items'; is $/{List}{after}, ',5' => '...with correct remainder'; ok +($data =~ $list_parsimonious) => 'Matched parsimonious'; is_deeply $/{List}{item}, [1] => '...with correct items'; is $/{List}{after}, ',2,3,4,5' => '...with correct remainder'; ok +($data_etc =~ $list_parsimonious_anchored) => 'Matched parsimonious anchored'; is_deeply $/{List}{item}, [1,2,3,4] => '...with correct items'; is $/{List}{after}, ',5etc' => '...with correct remainder'; ok !($data =~ $list_gluttonous) => 'Did not match gluttonous'; ok +($data_etc =~ $list_gluttonous) => 'Matched gluttonous'; #is_deeply $/{List}{item}, [1,2,3,4,5] => '...with correct items'; is $/{List}{after}, 'etc' => '...with correct remainder'; Regexp-Grammars-1.045/t/seplist_plushash.t000644 000765 000765 00000001346 12161423175 020140 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; use Regexp::Grammars; my $list_nonempty = qr{ \( <[Value]> +% [,] \) \d+ }xms; no Regexp::Grammars; while (my $input = ) { chomp $input; my $input_copy = $input; my ($list, $data_structure) = split /\s*:\s*/, $input; ok +($input =~ $list_nonempty xor $data_structure =~ /FAIL/) => 'Correct for:' . $list; if ($data_structure !~ /FAIL/) { is_deeply $/{List}{Value}, eval($data_structure) => 'Build correct structure'; } } __DATA__ () : FAIL (1) : [1] (1,2) : [1,2] (1,2,3) : [1,2,3] ( ) : FAIL ( 1 ) : [1] (1, 2 ) : [1,2] (1, 2,3 ) : [1,2,3] Regexp-Grammars-1.045/t/seplist_questionmark.t000644 000765 000765 00000001563 12161423207 021030 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; use Regexp::Grammars; my $list_nonempty = qr{ \( <[Value]>? % \) \d+ }xms; no Regexp::Grammars; while (my $input = ) { chomp $input; my $input_copy = $input; my ($list, $data_structure) = split /\s*:\s*/, $input; if ($list !~ m{ \( \s* \) }xms) { ok +($input =~ $list_nonempty xor $data_structure =~ /FAIL/) => 'Correct for:' . $list; if ($data_structure !~ /FAIL/) { is_deeply $/{List}{Value}, eval($data_structure) => 'Build correct structure'; } } } __DATA__ () : undef (1) : [1] (1,2) : FAIL (1,2,3) : FAIL (1,2,3,4) : FAIL (1,2,3,4,5) : FAIL ( ) : undef ( 1 ) : [1] (1, 2 ) : FAIL (1, 2,3 ) : FAIL (1, 2, 3, 4) : FAIL (1, 2, 3, 4,5) : FAIL Regexp-Grammars-1.045/t/seplist_rawhash.t000644 000765 000765 00000001530 12161423215 017734 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More 'no_plan'; use Regexp::Grammars; my $raw_hash = qr{ \A (?: | | | ) \Z \d{1,3} % % \w+ % \d{1,3} \% solution }xms; no Regexp::Grammars; ok +('7% solution' =~ $raw_hash) => 'Matched '; ok exists $/{solution} => '...and matched correct rule'; ok +('7%' =~ $raw_hash) => 'Matched '; ok exists $/{percentage} => '...and matched correct rule'; ok +('%foo' =~ $raw_hash) => 'Matched '; ok exists $/{perl_hash} => '...and matched correct rule'; ok +('%bar % 42%' =~ $raw_hash) => 'Matched '; ok exists $/{perl_mod} => '...and matched correct rule'; Regexp-Grammars-1.045/t/seplist_starhash.t000644 000765 000765 00000001300 12204256443 020114 0ustar00damian000000 000000 use 5.010; use warnings; use Test::More; plan 'no_plan'; use Regexp::Grammars; my $list_parser = qr{ \( <[Value]> *% [,] \) \d+ }xms; no Regexp::Grammars; while (my $input = ) { chomp $input; my $input_copy = $input; my ($list, $data_structure) = split /\s*:\s*/, $input; ok +($input_copy =~ $list_parser) => 'Matched $input: ' . $list; is_deeply $/{List}{Value}, eval($data_structure) => 'Build correct structure'; } __DATA__ () : undef (1) : [1] (1,2) : [1,2] (1,2,3) : [1,2,3] ( ) : undef ( 1 ) : [1] (1, 2 ) : [1,2] (1, 2,3 ) : [1,2,3] Regexp-Grammars-1.045/t/top_is_token.t000644 000765 000765 00000001203 12161423262 017230 0ustar00damian000000 000000 use 5.010; use Test::More 'no_plan'; use Regexp::Grammars; my $grammar_top = qr{ ^ a b $ }xms; my $grammar_rule = qr{ ^ a b $ }xms; my $grammar_token = qr{ ^ a b $ }xms; ok 'ab' =~ $grammar_top => 'Top correctly matches without space'; ok 'a b' !~ $grammar_top => 'Top correctly fails with space'; ok 'ab' =~ $grammar_token => 'Token correctly matches without space'; ok 'a b' !~ $grammar_token => 'Token correctly fails with space'; ok 'ab' =~ $grammar_rule => 'Rule correctly matches without space'; ok 'a b' =~ $grammar_rule => 'Rule correctly matches with space'; Regexp-Grammars-1.045/t/ws_redefine.t000644 000765 000765 00000001115 12521273642 017034 0ustar00damian000000 000000 use Test::More; plan tests => 4; use Regexp::Grammars; my $grammar_implicit_ws = qr{ 42)> foo bar }; my $grammar_explicit_ws = qr{ 42)> foo bar \s* }; ok 'foo bar' =~ $grammar_implicit_ws => 'Implicit grammar matched'; is $/{foo}{param}, 42 => 'Implicit grammar remembered param'; ok 'foo bar' =~ $grammar_explicit_ws => 'Explicit grammar matched'; is $/{foo}{param}, 42 => 'Explicit grammar remembered param'; Regexp-Grammars-1.045/lib/Regexp/000755 000765 000765 00000000000 12645103505 016111 5ustar00damian000000 000000 Regexp-Grammars-1.045/lib/Regexp/Grammars.pm000644 000765 000765 00000722615 12645103471 020237 0ustar00damian000000 000000 =encoding ISO8859-1 =cut package Regexp::Grammars; use re 'eval'; use warnings; use strict; use 5.010; use vars (); use Scalar::Util qw< blessed reftype >; use Data::Dumper qw< Dumper >; our $VERSION = '1.045'; my $anon_scalar_ref = \do{my $var}; my $MAGIC_VARS = q{my ($CAPTURE, $CONTEXT, $DEBUG, $INDEX, $MATCH, %ARG, %MATCH);}; my $PROBLEM_WITH_5_18 = <<'END_ERROR_MSG'; Warning: Regexp::Grammars is unsupported under Perl 5.18.0 through 5.18.3 due to a bug in regex parsing under those versions. Please upgrade to Perl 5.18.4 or later, or revert to Perl 5.16 or earlier. END_ERROR_MSG # Load the module... sub import { # Signal lexical scoping (active, unless something was exported)... $^H{'Regexp::Grammars::active'} = 1; # Process any regexes in module's active lexical scope... use overload; overload::constant( qr => sub { my ($raw, $cooked, $type) = @_; # In active scope and really a regex... if (_module_is_active() && $type =~ /qq?/) { return bless \$cooked, 'Regexp::Grammars::Precursor'; } # Ignore everything else... else { return $cooked; } } ); # Deal with 5.18 issues... if ($] >= 5.018) { # Issue warning... if ($] < 5.018004) { require Carp; Carp::croak($PROBLEM_WITH_5_18); } # Deal with cases where Perl 5.18+ complains about # the injection of (??{...}) and (?{...}) require re; re->import('eval'); # Sanctify the standard Regexp::Grammars pseudo-variables from # Perl 5.18's early enforcement of strictures... my $caller = caller; warnings->unimport('once'); @_ = ( 'vars', '$CAPTURE', '$CONTEXT', '$DEBUG', '$INDEX', '$MATCH', '%ARG', '%MATCH' ); goto &vars::import; } } # Deactivate module's regex effect when it is "anti-imported" with 'no'... sub unimport { # Signal lexical (non-)scoping... $^H{'Regexp::Grammars::active'} = 0; require re; re->unimport('eval'); } # Encapsulate the hoopy user-defined pragma interface... sub _module_is_active { return (caller 1)[10]->{'Regexp::Grammars::active'}; } my $RULE_HANDLER; sub clear_rule_handler { undef $RULE_HANDLER; } { package Regexp; sub with_actions { my ($self, $handler) = @_; $RULE_HANDLER = $handler; return $self; } } #=====[ COMPILE-TIME INTERIM REPRESENTATION OF GRAMMARS ]=================== { package Regexp::Grammars::Precursor; # Only translate precursors once... state %grammar_cache; use overload ( # Concatenation/interpolation just concatenates to the precursor... q{.} => sub { my ($x, $y, $reversed) = @_; if (ref $x) { $x = ${$x} } if (ref $y) { $y = ${$y} } if ($reversed) { ($y,$x) = ($x,$y); } $x .= $y//q{}; return bless \$x, 'Regexp::Grammars::Precursor'; }, # Using as a string (i.e. matching) preprocesses the precursor... q{""} => sub { my ($obj) = @_; return $grammar_cache{ overload::StrVal($$obj) } //= Regexp::Grammars::_build_grammar( ${$obj} ); }, # Everything else, as usual... fallback => 1, ); } #=====[ SUPPORT FOR THE INTEGRATED DEBUGGER ]========================= # All messages go to STDERR by default... *Regexp::Grammars::LOGFILE = *STDERR{IO}; # Debugging levels indicate where to stop... our %DEBUG_LEVEL = ( same => undef, # No change in debugging mode off => 0, # No more debugging run => 1, continue => 1, # Run to completion of regex match match => 2, on => 2, # Run to next successful submatch step => 3, try => 3, # Run to next reportable event ); # Debugging levels can be abbreviated to one character during interactions... @DEBUG_LEVEL{ map {substr($_,0,1)} keys %DEBUG_LEVEL } = values %DEBUG_LEVEL; $DEBUG_LEVEL{o} = $DEBUG_LEVEL{off}; # Not "on" $DEBUG_LEVEL{s} = $DEBUG_LEVEL{step}; # Not "same" # Width of leading context field in debugging messages is constrained... my $MAX_CONTEXT_WIDTH = 20; my $MIN_CONTEXT_WIDTH = 6; sub set_context_width { { package Regexp::Grammars::ContextRestorer; sub new { my ($class, $old_context_width) = @_; bless \$old_context_width, $class; } sub DESTROY { my ($old_context_width_ref) = @_; $MAX_CONTEXT_WIDTH = ${$old_context_width_ref}; } } my ($new_context_width) = @_; my $old_context_width = $MAX_CONTEXT_WIDTH; $MAX_CONTEXT_WIDTH = $new_context_width; if (defined wantarray) { return Regexp::Grammars::ContextRestorer->new($old_context_width); } } # Rewrite a string currently being matched, to make \n and \t visible sub _show_metas { my $context_str = shift // q{}; # Quote newlines (\n -> \\n, without using a regex)... my $index = index($context_str,"\n"); while ($index >= 0) { substr($context_str, $index, 1, '\\n'); $index = index($context_str,"\n",$index+2); } # Quote tabs (\t -> \\t, without using a regex)... $index = index($context_str,"\t"); while ($index >= 0) { substr($context_str, $index, 1, '\\t'); $index = index($context_str,"\t",$index+2); } return $context_str; } # Minimize whitespace in a string... sub _squeeze_ws { my ($str) = @_; $str =~ tr/\n\t/ /; my $index = index($str,q{ }); while ($index >= 0) { substr($str, $index, 2, q{ }); $index = index($str,q{ },$index); } return $str; } # Prepare for debugging... sub _init_try_stack { our (@try_stack, $last_try_pos, $last_context_str); # Start with a representation of the entire grammar match... @try_stack = ({ subrule => '', height => 0, errmsg => ' \\FAIL ', }); # Initialize tracking of location and context... $last_try_pos = -1; $last_context_str = q{}; # Report... say {*Regexp::Grammars::LOGFILE} _debug_context('=>') . 'Trying from position ' . pos(); } # Create a "context string" showing where the regex is currently matching... sub _debug_context { my ($fill_chars) = @_; # Determine minimal sufficient width for context field... my $field_width = length(_show_metas($_//q{})); if ($field_width > $MAX_CONTEXT_WIDTH) { $field_width = $MAX_CONTEXT_WIDTH; } elsif ($field_width < $MIN_CONTEXT_WIDTH) { $field_width = $MIN_CONTEXT_WIDTH; } # Get current matching position (and some additional trailing context)... my $context_str = substr(_show_metas(substr(($_//q{}).q{},pos()//0,$field_width)),0,$field_width); # Build the context string, handling special cases... our $last_context_str; if ($fill_chars) { # If caller supplied a 1- or 2-char fill sequence, use that instead... my $last_fill_char = length($fill_chars) > 1 ? substr($fill_chars,-1,1,q{}) : $fill_chars ; $context_str = $fill_chars x ($field_width-1) . $last_fill_char; } else { # Make end-of-string visible in empty context string... if ($context_str eq q{}) { $context_str = '[eos]'; } # Don't repeat consecutive identical context strings... if ($context_str eq $last_context_str) { $context_str = q{ } x $field_width; } else { # If not repeating, remember for next time... $last_context_str = $context_str; } } # Left justify and return context string... return sprintf("%-*s ",$field_width,$context_str); } # Show a debugging message (mainly used for compile-time errors and info)... sub _debug_notify { # Single arg is a line to be printed with a null severity... my ($severity, @lines) = @_==1 ? (q{},@_) : @_; chomp @lines; # Formatting string for all lines... my $format = qq{%*s | %s\n}; # Track previous severity and avoid repeating the same level... state $prev_severity = q{}; if ($severity !~ /\S/) { # Do nothing } elsif ($severity eq 'info' && $prev_severity eq 'info' ) { $severity = q{}; } else { $prev_severity = $severity; } # Display first line with severity indicator (unless same as previous)... printf {*Regexp::Grammars::LOGFILE} $format, $MIN_CONTEXT_WIDTH, $severity, shift @lines; # Display first line without severity indicator for my $next_line (@lines) { printf {*Regexp::Grammars::LOGFILE} $format, $MIN_CONTEXT_WIDTH, q{}, $next_line; } } # Handle user interactions during runtime debugging... sub _debug_interact { my ($stack_height, $leader, $curr_frame_ref, $min_debug_level) = @_; our $DEBUG; # ...stores current debug level within regex # Only interact with terminals, and if debug level is appropriate... if (-t *Regexp::Grammars::LOGFILE && defined $DEBUG && ($DEBUG_LEVEL{$DEBUG}//0) >= $DEBUG_LEVEL{$min_debug_level} ) { local $/ = "\n"; # ...in case some caller is being clever INPUT: while (1) { my $cmd = readline // q{}; chomp $cmd; # Input of 'd' means 'display current result frame'... if ($cmd eq 'd') { print {*Regexp::Grammars::LOGFILE} join "\n", map { $leader . ($stack_height?'| ':q{}) . ' : ' . $_ } split "\n", q{ }x8 . substr(Dumper($curr_frame_ref),8); print "\t"; } # Any other (valid) input changes debugging level and continues... else { if (defined $DEBUG_LEVEL{$cmd}) { $DEBUG = $cmd; } last INPUT; } } } # When interaction not indicated, just complete the debugging line... else { print {*Regexp::Grammars::LOGFILE} "\n"; } } # Handle reporting of unsuccessful match attempts... sub _debug_handle_failures { my ($stack_height, $subrule, $in_match) = @_; our @try_stack; # Unsuccessful match attempts leave "leftovers" on the attempt stack... CLEANUP: while (@try_stack && $try_stack[-1]{height} >= $stack_height) { # Grab record of (potentially) unsuccessful attempt... my $error_ref = pop @try_stack; # If attempt was the one whose match is being reported, go and report... last CLEANUP if $in_match && $error_ref->{height} == $stack_height && $error_ref->{subrule} eq $subrule; # Otherwise, report the match failure... say {*Regexp::Grammars::LOGFILE} _debug_context(q{ }) . $error_ref->{errmsg}; } } # Handle attempts to call non-existent subrules... sub _debug_fatal { my ($naughty_construct) = @_; print {*Regexp::Grammars::LOGFILE} "_________________________________________________________________\n", "Fatal error: Entire parse terminated prematurely while attempting\n", " to call non-existent rule: $naughty_construct\n", "_________________________________________________________________\n"; $@ = "Entire parse terminated prematurely while attempting to call non-existent rule: $naughty_construct"; } # Handle objrules that don't return hashes... sub _debug_non_hash { my ($obj, $name) = @_; # If the object is okay, no further action required... return q{} if reftype($obj) eq 'HASH'; # Generate error messages... print {*Regexp::Grammars::LOGFILE} "_________________________________________________________________\n", "Fatal error: returned a non-hash-based object\n", "_________________________________________________________________\n"; $@ = " returned a non-hash-based object"; return '(*COMMIT)(*FAIL)'; } # Print a message in context... sub _debug_logmsg { my ($stack_height, @msg) = @_; # Determine indent for messages... my $leader = _debug_context() . q{| } x ($stack_height-1) . '|'; # Report the attempt... print {*Regexp::Grammars::LOGFILE} map { "$leader$_\n" } @msg; } # Print a message indicating a (sub)match attempt... sub _debug_trying { my ($stack_height, $curr_frame_ref, $subrule) = @_; # Clean up after any preceding unsuccessful attempts... _debug_handle_failures($stack_height, $subrule); # Determine indent for messages... my $leader = _debug_context() . q{| } x ($stack_height-2); # Detect and report any backtracking prior to this attempt... our $last_try_pos //= 0; #...Stores the pos() of the most recent match attempt? my $backtrack_distance = $last_try_pos - pos(); if ($backtrack_distance > 0) { say {*Regexp::Grammars::LOGFILE} ' <' . q{~} x (length(_debug_context(q{ }))-3) . q{ } . q{| } x ($stack_height-2) . qq{|...Backtracking $backtrack_distance char} . ($backtrack_distance > 1 ? q{s} : q{}) . q{ and trying new match} ; } # Report the attempt... print {*Regexp::Grammars::LOGFILE} $leader, "|...Trying $subrule\t"; # Handle user interactions during debugging... _debug_interact($stack_height, $leader, $curr_frame_ref, 'step'); # Record the attempt, for later error handling in _debug_matched()... if ($subrule ne 'next alternative') { our @try_stack; push @try_stack, { height => $stack_height, subrule => $subrule, # errmsg should align under: |...Trying $subrule\t errmsg => q{| } x ($stack_height-2) . "| \\FAIL $subrule", }; } $last_try_pos = pos(); } # Print a message indicating a successful (sub)match... sub _debug_matched { my ($stack_height, $curr_frame_ref, $subrule, $matched_text) = @_; # Clean up any intervening unsuccessful attempts... _debug_handle_failures($stack_height, $subrule, 'in match'); # Build debugging message... my $debug_context = _debug_context(); my $leader = $debug_context . q{| } x ($stack_height-2); my $message = ($stack_height ? '| ' : q{}) . " \\_____$subrule matched "; my $filler = $stack_height ? '| ' . q{ } x (length($message)-4) : q{ } x length($message); our $last_try_pos //= 0; #...Stores the pos() of the most recent match attempt? # Report if match required backtracking... my $backtrack_distance = $last_try_pos - (pos()//0); if ($backtrack_distance > 0) { say {*Regexp::Grammars::LOGFILE} ' <' . q{~} x (length(_debug_context(q{ }))-3) . q{ } . q{| } x ($stack_height-2) . qq{|...Backtracking $backtrack_distance char} . ($backtrack_distance > 1 ? q{s} : q{}) . qq{ and rematching $subrule} ; } $last_try_pos = pos(); # Format match text (splitting multi-line texts and indent them correctly)... $matched_text = defined($matched_text) ? $matched_text = q{'} . join("\n$leader$filler", split "\n", $matched_text) . q{'} : q{}; # Print match message... print {*Regexp::Grammars::LOGFILE} $leader . $message . $matched_text . qq{\t}; # Check for user interaction... _debug_interact($stack_height, $leader, $curr_frame_ref, $stack_height ? 'match' : 'run'); } # Print a message indicating a successful (sub)match... sub _debug_require { my ($stack_height, $condition, $succeeded) = @_; # Build debugging message... my $debug_context = _debug_context(); my $leader = $debug_context . q{| } x ($stack_height-1); my $message1 = ($stack_height ? '|...' : q{}) . "Testing condition: $condition" ; my $message2 = ($stack_height ? '| ' : q{}) . " \\_____" . ($succeeded ? 'Satisified' : 'FAILED') ; # Report if match required backtracking... our $last_try_pos; my $backtrack_distance = $last_try_pos - pos(); if ($backtrack_distance > 0) { say {*Regexp::Grammars::LOGFILE} ' <' . q{~} x (length(_debug_context(q{ }))-3) . q{ } . q{| } x ($stack_height-1) . qq{|...Backtracking $backtrack_distance char} . ($backtrack_distance > 1 ? q{s} : q{}) . qq{ and rematching} ; } # Remember where the condition was tried... $last_try_pos = pos(); # Print match message... say {*Regexp::Grammars::LOGFILE} $leader . $message1; say {*Regexp::Grammars::LOGFILE} $leader . $message2; } # Print a message indicating a successful store-result-of-code-block... sub _debug_executed { my ($stack_height, $curr_frame_ref, $subrule, $value) = @_; # Build message... my $leader = _debug_context() . q{| } x ($stack_height-2); my $message = "|...Action $subrule\n"; my $message2 = "| saved value: '"; $message .= $leader . $message2; my $filler = q{ } x length($message2); # Split multiline results over multiple lines (properly indented)... $value = join "\n$leader$filler", split "\n", $value; # Report the action... print {*Regexp::Grammars::LOGFILE} $leader . $message . $value . qq{'\t}; # Check for user interaction... _debug_interact($stack_height, $leader, $curr_frame_ref, 'match'); } # Create the code to be inserted into the regex to facilitate debugging... sub _build_debugging_statements { my ($debugging_active, $subrule, $extra_pre_indent) = @_; return (q{}, q{}) if ! $debugging_active;; $extra_pre_indent //= 0; $subrule = "q{$subrule}"; return ( qq{ Regexp::Grammars::_debug_trying(\@Regexp::Grammars::RESULT_STACK+$extra_pre_indent, \$Regexp::Grammars::RESULT_STACK[-2+$extra_pre_indent], $subrule) if \$Regexp::Grammars::DEBUG_LEVEL{\$Regexp::Grammars::DEBUG}; }, qq{ Regexp::Grammars::_debug_matched(\@Regexp::Grammars::RESULT_STACK+1, \$Regexp::Grammars::RESULT_STACK[-1], $subrule, \$^N) if \$Regexp::Grammars::DEBUG_LEVEL{\$Regexp::Grammars::DEBUG}; }, ); } sub _build_raw_debugging_statements { my ($debugging_active, $subpattern, $extra_pre_indent) = @_; return (q{}, q{}) if ! $debugging_active; $extra_pre_indent //= 0; if ($subpattern eq '|') { return ( q{}, qq{ (?{;Regexp::Grammars::_debug_trying(\@Regexp::Grammars::RESULT_STACK+$extra_pre_indent, \$Regexp::Grammars::RESULT_STACK[-2+$extra_pre_indent], 'next alternative') if \$Regexp::Grammars::DEBUG_LEVEL{\$Regexp::Grammars::DEBUG};}) }, ); } else { return ( qq{ (?{;Regexp::Grammars::_debug_trying(\@Regexp::Grammars::RESULT_STACK+$extra_pre_indent, \$Regexp::Grammars::RESULT_STACK[-2+$extra_pre_indent], q{subpattern /$subpattern/}, \$^N) if \$Regexp::Grammars::DEBUG_LEVEL{\$Regexp::Grammars::DEBUG};}) }, qq{ (?{;Regexp::Grammars::_debug_matched(\@Regexp::Grammars::RESULT_STACK+1, \$Regexp::Grammars::RESULT_STACK[-1], q{subpattern /$subpattern/}, \$^N) if \$Regexp::Grammars::DEBUG_LEVEL{\$Regexp::Grammars::DEBUG};}) }, ); } } #=====[ SUPPORT FOR AUTOMATIC TIMEOUTS ]========================= sub _test_timeout { our ($DEBUG, $TIMEOUT); return q{} if time() < $TIMEOUT->{'limit'}; my $duration = "$TIMEOUT->{duration} second" . ( $TIMEOUT->{duration} == 1 ? q{} : q{s} ); if (defined($DEBUG) && $DEBUG ne 'off') { my $leader = _debug_context(q{ }); say {*LOGFILE} $leader . '|'; say {*LOGFILE} $leader . "|...Invoking {duration}>"; say {*LOGFILE} $leader . "| \\_____No match after $duration"; say {*LOGFILE} $leader . '|'; say {*LOGFILE} $leader . " \\FAIL "; } if (! @!) { @! = "Internal error: Timed out after $duration (as requested)"; } return q{(*COMMIT)(*FAIL)}; } #=====[ SUPPORT FOR UPDATING THE RESULT STACK ]========================= # Create a clone of the current result frame with an new key/value... sub _extend_current_result_frame_with_scalar { my ($stack_ref, $key, $value) = @_; # Autovivify null stacks (only occur when grammar invokes no subrules)... if (!@{$stack_ref}) { $stack_ref = [{}]; } # Copy existing frame, appending new value so it overwrites any old value... my $cloned_result_frame = { %{$stack_ref->[-1]}, $key => $value, }; # Make the copy into an object, if the original was one... if (my $class = blessed($stack_ref->[-1])) { bless $cloned_result_frame, $class; } return $cloned_result_frame; } # Create a clone of the current result frame with an additional key/value # (As above, but preserving the "listiness" of the key being added to)... sub _extend_current_result_frame_with_list { my ($stack_ref, $key, $value) = @_; # Copy existing frame, appending new value to appropriate element's list... my $cloned_result_frame = { %{$stack_ref->[-1]}, $key => [ @{$stack_ref->[-1]{$key}//[]}, $value, ], }; # Make the copy into an object, if the original was one... if (my $class = blessed($stack_ref->[-1])) { bless $cloned_result_frame, $class; } return $cloned_result_frame; } # Pop current result frame and add it to a clone of previous result frame # (flattening it if possible, and preserving any blessing)... sub _pop_current_result_frame { my ($stack_ref, $key, $original_name, $value) = @_; # Where are we in the stack? my $curr_frame = $stack_ref->[-1]; my $caller_frame = $stack_ref->[-2]; # Track which frames are objects... my $is_blessed_curr = blessed($curr_frame); my $is_blessed_caller = blessed($caller_frame); # Remove "private" captures (i.e. those starting with _)... delete @{$curr_frame}{grep {substr($_,0,1) eq '_'} keys %{$curr_frame} }; # Remove "nocontext" marker... my $nocontext = delete $curr_frame->{'~'}; # Build a clone of the current frame... my $cloned_result_frame = exists $curr_frame->{'='} ? $curr_frame->{'='} : $is_blessed_curr || length(join(q{}, keys %{$curr_frame})) ? { q{} => $value, %{$curr_frame} } : keys %{$curr_frame} ? $curr_frame->{q{}} : $value ; # Apply any appropriate handler... if ($RULE_HANDLER) { if ($RULE_HANDLER->can($original_name) || $RULE_HANDLER->can('AUTOLOAD')) { my $replacement_result_frame = $RULE_HANDLER->$original_name($cloned_result_frame); if (defined $replacement_result_frame) { $cloned_result_frame = $replacement_result_frame; } } } # Remove capture if not requested... if ($nocontext && ref $cloned_result_frame eq 'HASH' && keys %{$cloned_result_frame} > 1) { delete $cloned_result_frame->{q{}}; } # Nest a clone of current frame inside a clone of the caller frame... my $cloned_caller_frame = { %{$caller_frame//{}}, $key => $cloned_result_frame, }; # Make the copies into objects, if the originals were... if ($is_blessed_curr && !exists $curr_frame->{'='} ) { bless $cloned_caller_frame->{$key}, $is_blessed_curr; } if ($is_blessed_caller) { bless $cloned_caller_frame, $is_blessed_caller; } return $cloned_caller_frame; } # Pop current result frame and add it to a clone of previous result frame # (flattening it if possible, and preserving any blessing) # (As above, but preserving listiness of key being added to)... sub _pop_current_result_frame_with_list { my ($stack_ref, $key, $original_name, $value) = @_; # Where are we in the stack? my $curr_frame = $stack_ref->[-1]; my $caller_frame = $stack_ref->[-2]; # Track which frames are objects... my $is_blessed_curr = blessed($curr_frame); my $is_blessed_caller = blessed($caller_frame); # Remove "private" captures (i.e. those starting with _)... delete @{$curr_frame}{grep {substr($_,0,1) eq '_'} keys %{$curr_frame} }; # Remove "nocontext" marker... my $nocontext = delete $curr_frame->{'~'}; # Clone the current frame... my $cloned_result_frame = exists $curr_frame->{'='} ? $curr_frame->{'='} : $is_blessed_curr || length(join(q{}, keys %{$curr_frame})) ? { q{} => $value, %{$curr_frame} } : keys %{$curr_frame} ? $curr_frame->{q{}} : $value ; # Apply any appropriate handler... if ($RULE_HANDLER) { if ($RULE_HANDLER->can($original_name) || $RULE_HANDLER->can('AUTOLOAD')) { my $replacement_result_frame = $RULE_HANDLER->$original_name($cloned_result_frame); if (defined $replacement_result_frame) { $cloned_result_frame = $replacement_result_frame; } } } # Remove capture if not requested... if ($nocontext && ref $cloned_result_frame eq 'HASH' && keys %{$cloned_result_frame} > 1) { delete $cloned_result_frame->{q{}}; } # Append a clone of current frame inside a clone of the caller frame... my $cloned_caller_frame = { %{$caller_frame}, $key => [ @{$caller_frame->{$key}//[]}, $cloned_result_frame, ], }; # Make the copies into objects, if the originals were... if ($is_blessed_curr && !exists $curr_frame->{'='} ) { bless $cloned_caller_frame->{$key}[-1], $is_blessed_curr; } if ($is_blessed_caller) { bless $cloned_caller_frame, $is_blessed_caller; } return $cloned_caller_frame; } #=====[ MISCELLANEOUS CONSTANTS ]========================= # Namespace in which grammar inheritance occurs... my $CACHE = 'Regexp::Grammars::_CACHE_::'; my $CACHE_LEN = length $CACHE; my %CACHE; #...for subrule tracking # This code inserted at the start of every grammar regex # (initializes the result stack cleanly and backtrackably, via local)... my $PROLOGUE = q{((?{; @! = () if !pos; local @Regexp::Grammars::RESULT_STACK = (@Regexp::Grammars::RESULT_STACK, {}); local $Regexp::Grammars::TIMEOUT = { limit => -1>>1 }; local $Regexp::Grammars::DEBUG = 'off' }) }; # This code inserted at the end of every grammar regex # (puts final result in %/. Also defines default , , etc.)... my $EPILOGUE = q{)(?{; $Regexp::Grammars::RESULT_STACK[-1]{q{}} //= $^N;; local $Regexp::Grammars::match_frame = pop @Regexp::Grammars::RESULT_STACK; delete @{$Regexp::Grammars::match_frame}{ '~', grep {substr($_,0,1) eq '_'} keys %{$Regexp::Grammars::match_frame} }; if (exists $Regexp::Grammars::match_frame->{'='}) { if (ref($Regexp::Grammars::match_frame->{'='}) eq 'HASH') { $Regexp::Grammars::match_frame = $Regexp::Grammars::match_frame->{'='}; } } if (@Regexp::Grammars::RESULT_STACK) { $Regexp::Grammars::RESULT_STACK[-1]{'(?R)'} = $Regexp::Grammars::match_frame; } Regexp::Grammars::clear_rule_handler(); */ = $Regexp::Grammars::match_frame; })|\Z(?{Regexp::Grammars::clear_rule_handler();})(?!)(?(DEFINE) (? \\s* ) (? (?{$Regexp::Grammars::RESULT_STACK[-1]{'!'}=$#{!};}) \\s* (?{;$#{!}=delete($Regexp::Grammars::RESULT_STACK[-1]{'!'})//0; delete($Regexp::Grammars::RESULT_STACK[-1]{'@'}); }) ) (? \\S+ ) (? (?{$Regexp::Grammars::RESULT_STACK[-1]{'!'}=$#{!};}) \\S+ (?{;$#{!}=delete($Regexp::Grammars::RESULT_STACK[-1]{'!'})//0; delete($Regexp::Grammars::RESULT_STACK[-1]{'@'}); }) ) (? (?{; $Regexp::Grammars::RESULT_STACK[-1]{"="} = pos; }) ) (? (?{; $Regexp::Grammars::RESULT_STACK[-1]{"="} = 1 + substr($_,0,pos) =~ tr/\n/\n/; }) ) ) }; my $EPILOGUE_NC = $EPILOGUE; $EPILOGUE_NC =~ s{ ; .* ;;}{;}xms; #=====[ MISCELLANEOUS PATTERNS THAT MATCH USEFUL THINGS ]======== # Match an identifier... my $IDENT = qr{ [^\W\d] \w*+ }xms; my $QUALIDENT = qr{ (?: $IDENT :: )*+ $IDENT }xms; # Match balanced parentheses, taking into account \-escapes and []-escapes... my $PARENS = qr{ (?&VAR_PARENS) (?(DEFINE) (? \( (?: \\. | (?&VAR_PARENS) | (?&CHARSET) | [^][()\\]++)*+ \) ) (? \[ \^?+ \]?+ (?: \[:\w+:\] | \\. | [^]])*+ \] ) ) }xms; # Match a directive within rules... my $WS_PATTERN = qr{]++ | $PARENS )*+) >}xms; #=====[ UTILITY SUBS FOR ERROR AND WARNING MESSAGES ]======== sub _uniq { my %seen; return grep { defined $_ && !$seen{$_}++ } @_; } # Default translator for error messages... my $ERRORMSG_TRANSLATOR = sub { my ($errormsg, $rulename, $context) = @_; $rulename = 'valid input' if $rulename eq q{}; $context //= ''; # Unimplemented subrule when rulename starts with '-'... if (substr($rulename,0,1) eq '-') { $rulename = substr($rulename,1); return "Can't match subrule <$rulename> (not implemented)"; } # Empty message converts to a "Expected...but found..." message... if ($errormsg eq q{}) { $rulename =~ tr/_/ /; $rulename = lc($rulename); return "Expected $rulename, but found '$context' instead"; } # "Expecting..." messages get "but found" added... if (lc(substr($errormsg,0,6)) eq 'expect') { return "$errormsg, but found '$context' instead"; } # Everything else stays "as is"... return $errormsg; }; # Allow user to set translation... sub set_error_translator { { package Regexp::Grammars::TranslatorRestorer; sub new { my ($class, $old_translator) = @_; bless \$old_translator, $class; } sub DESTROY { my ($old_translator_ref) = @_; $ERRORMSG_TRANSLATOR = ${$old_translator_ref}; } } my ($translator_ref) = @_; die "Usage: set_error_translator(\$subroutine_reference)\n" if ref($translator_ref) ne 'CODE'; my $old_translator_ref = $ERRORMSG_TRANSLATOR; $ERRORMSG_TRANSLATOR = $translator_ref; return defined wantarray ? Regexp::Grammars::TranslatorRestorer->new($old_translator_ref) : (); } # Dispatch to current translator for error messages... sub _translate_errormsg { goto &{$ERRORMSG_TRANSLATOR}; } #=====[ SUPPORT FOR TRANSLATING GRAMMAR-ENHANCED REGEX TO NATIVE REGEX ]==== # Store any specified grammars... my %user_defined_grammar; my %REPETITION_DESCRIPTION_FOR = ( '+' => 'once or more', '*' => 'any number of times', '?' => 'if possible', '+?' => 'as few times as possible', '*?' => 'as few times as possible', '??' => 'if necessary', '++' => 'as many times as possible', '*+' => 'as many times as possible', '?+' => 'if possible', ); sub _translate_raw_regex { my ($regex, $debug_build, $debug_runtime) = @_; my $is_comment = substr($regex, 0, 1) eq q{#} || substr($regex, 0, 3) eq q{(?#}; my $visible_regex = _squeeze_ws($regex); # Report how regex was interpreted, if requested to... if ($debug_build && $visible_regex ne q{} && $visible_regex ne q{ }) { _debug_notify( info => " |", " |...Treating '$visible_regex' as:", ($is_comment ? " | \\ a comment (which will be ignored)" : " | \\ normal Perl regex syntax" ), ); } return q{} if $is_comment; # Generate run-time debugging code (if any)... my ($debug_pre, $debug_post) = _build_raw_debugging_statements($debug_runtime,$visible_regex, +1); # Replace negative lookahead with one that works under R::G... $regex =~ s{\(\?!}{(?!(?!)|}gxms; # ToDo: Also replace positive lookahead with one that works under R::G... # This replacement should be of the form: # $regex =~ s{\(\?!}{(?!(?!)|(?!(?!)|}gxms; # but need to find a way to insert the extra ) at the other end return $debug_runtime && $regex eq '|' ? $regex . $debug_post : $debug_runtime && $regex =~ /\S/ ? "(?:$debug_pre($regex)$debug_post)" : $regex; } # Report and convert a debugging directive... sub _translate_debug_directive { my ($construct, $cmd, $debug_build) = @_; # Report how directive was interpreted, if requested to... if ($debug_build) { _debug_notify( info => " |", " |...Treating $construct as:", " | \\ Change run-time debugging mode to '$cmd'", ); } return qq{(?{; local \$Regexp::Grammars::DEBUG = q{$cmd}; }) }; } # Report and convert a timeout directive... sub _translate_timeout_directive { my ($construct, $timeout, $debug_build) = @_; # Report how directive was interpreted, if requested to... if ($debug_build) { _debug_notify( info => " |", " |...Treating $construct as:", ($timeout > 0 ? " | \\ Cause the entire parse to fail after $timeout second" . ($timeout==1 ? q{} : q{s}) : " | \\ Cause the entire parse to fail immediately" ), ); } return $timeout > 0 ? qq{(?{; local \$Regexp::Grammars::TIMEOUT = { duration => $timeout, limit => time() + $timeout }; }) } : qq{(*COMMIT)(*FAIL)}; } # Report and convert a directive... sub _translate_require_directive { my ($construct, $condition, $debug_build) = @_; $condition = substr($condition, 3, -2); # Report how directive was interpreted, if requested to... if ($debug_build) { _debug_notify( info => " |", " |...Treating $construct as:", " | \\ Require that {$condition} is true", ); } my $quoted_condition = $condition; $quoted_condition =~ s{\$}{}xms; return qq{(?(?{;$condition}) (?{;Regexp::Grammars::_debug_require( scalar \@Regexp::Grammars::RESULT_STACK, q{$quoted_condition}, 1) if \$Regexp::Grammars::DEBUG_LEVEL{\$Regexp::Grammars::DEBUG}}) | (?{;Regexp::Grammars::_debug_require( scalar \@Regexp::Grammars::RESULT_STACK, q{$quoted_condition}, 0) if \$Regexp::Grammars::DEBUG_LEVEL{\$Regexp::Grammars::DEBUG}})(?!)) }; } # Report and convert a directive... sub _translate_minimize_directive { my ($construct, $debug_build) = @_; # Report how directive was interpreted, if requested to... if ($debug_build) { _debug_notify( info => " |", " |...Treating $construct as:", " | \\ Minimize result value if possible", ); } return q{(?{; if (1 == grep { $_ ne '!' && $_ ne '@' && $_ ne '~' } keys %MATCH) { # ...single alnum key local %Regexp::Grammars::matches = %MATCH; delete @Regexp::Grammars::matches{'!', '@', '~'}; local ($Regexp::Grammars::only_key) = keys %Regexp::Grammars::matches; local $Regexp::Grammars::array_ref = $MATCH{$Regexp::Grammars::only_key}; if (ref($Regexp::Grammars::array_ref) eq 'ARRAY' && 1 == @{$Regexp::Grammars::array_ref}) { $MATCH = $Regexp::Grammars::array_ref->[0]; } } })}; } # Report and convert a debugging directive... sub _translate_error_directive { my ($construct, $type, $msg, $debug_build, $subrule_name) = @_; $subrule_name //= 'undef'; # Determine severity... my $severity = ($type eq 'error') ? 'fail' : 'non-fail'; # Determine fatality (and build code to invoke it)... my $fatality = ($type eq 'fatal') ? '(*COMMIT)(*FAIL)' : q{}; # Unpack message... if (substr($msg,0,3) eq '(?{') { $msg = 'do'. substr($msg,2,-1); } else { $msg = quotemeta $msg; $msg = qq{qq{$msg}}; } # Report how directive was interpreted, if requested to... if ($debug_build) { _debug_notify( info => " |", " |...Treating $construct as:", ( $type eq 'log' ? " | \\ Log a message to the logfile" : " | \\ Append a $severity error message to \@!" ), ); } # Generate the regex... return $type eq 'log' ? qq{(?{Regexp::Grammars::_debug_logmsg(scalar \@Regexp::Grammars::RESULT_STACK,$msg) if \$Regexp::Grammars::DEBUG_LEVEL{\$Regexp::Grammars::DEBUG} })} : qq{(?:(?{;local \$Regexp::Grammar::_memopos=pos();}) (?>\\s*+((?-s).{0,$MAX_CONTEXT_WIDTH}+)) (?{; pos() = \$Regexp::Grammar::_memopos; @! = Regexp::Grammars::_uniq( @!, Regexp::Grammars::_translate_errormsg($msg,q{$subrule_name},\$CONTEXT) ) }) (?!)|} . ($severity eq 'fail' ? q{(?!)} : $fatality) . q{)} ; } sub _translate_subpattern { my ($construct, $alias, $subpattern, $savemode, $postmodifier, $debug_build, $debug_runtime, $timeout, $backref) = @_; # Determine save behaviour... my $is_noncapturing = $savemode eq 'noncapturing'; my $is_listifying = $savemode eq 'list'; my $is_codeblock = substr($subpattern,0,3) eq '(?{'; my $value_saved = $is_codeblock ? '$^R' : '$^N'; my $do_something_with = $is_codeblock ? 'execute the code block' : 'match the pattern'; my $result = $is_codeblock ? 'result' : 'matched substring'; my $description = $is_codeblock ? substr($subpattern,2,-1) : defined $backref ? $backref : $subpattern; my $debug_construct = $is_codeblock ? '<' . substr($alias,1,-1) . '= (?{;' . substr($subpattern,3,-2) . '})>' : $construct ; # Report how construct was interpreted, if requested to... my $repeatedly = $REPETITION_DESCRIPTION_FOR{$postmodifier} // q{}; my $results = $is_listifying && $postmodifier ? "each $result" : substr($postmodifier,0,1) eq '?' ? "any $result" : $postmodifier && !$is_noncapturing ? "only the final $result" : "the $result" ; if ($debug_build) { _debug_notify( info => " |", " |...Treating $construct as:", " | | $do_something_with $description $repeatedly", ( $is_noncapturing ? " | \\ but don't save $results" : $is_listifying ? " | \\ appending $results to \@{\$MATCH{$alias}}" : " | \\ saving $results in \$MATCH{$alias}" ) ); } # Generate run-time debugging code (if any)... my ($debug_pre, $debug_post) = _build_debugging_statements($debug_runtime,$debug_construct, +1); # Generate post-match result-capturing code, if match captures... my $post_action = $is_noncapturing ? q{} : qq{local \@Regexp::Grammars::RESULT_STACK = ( \@Regexp::Grammars::RESULT_STACK[0..\@Regexp::Grammars::RESULT_STACK-2], Regexp::Grammars::_extend_current_result_frame_with_$savemode( \\\@Regexp::Grammars::RESULT_STACK, $alias, $value_saved ), );} ; # Generate timeout test... my $timeout_test = $timeout ? q{(??{;Regexp::Grammars::_test_timeout()})} : q{}; # Translate to standard regex code... return qq{$timeout_test(?{;local \@Regexp::Grammars::RESULT_STACK = \@Regexp::Grammars::RESULT_STACK;$debug_pre}) (?:($subpattern)(?{;$post_action$debug_post}))$postmodifier}; } sub _translate_hashmatch { my ($construct, $alias, $hashname, $keypat, $savemode, $postmodifier, $debug_build, $debug_runtime, $timeout) = @_; # Empty or missing keypattern defaults to <.hk>... if (!defined $keypat || $keypat !~ /\S/) { $keypat = '(?&hk__implicit__)' } else { $keypat = substr($keypat, 1, -1); } # Determine save behaviour... my $is_noncapturing = $savemode eq 'noncapturing'; my $is_listifying = $savemode eq 'list'; # Convert hash to hash lookup... my $hash_lookup = '$' . substr($hashname, 1). '{$^N}'; # Report how construct was interpreted, if requested to... my $repeatedly = $REPETITION_DESCRIPTION_FOR{$postmodifier} // q{}; my $results = $is_listifying && $postmodifier ? 'each matched key' : substr($postmodifier,0,1) eq '?' ? 'any matched key' : $postmodifier && !$is_noncapturing ? 'only the final matched key' : 'the matched key' ; if ($debug_build) { _debug_notify( info => " |", " |...Treating $construct as:", " | | match a key from the hash $hashname $repeatedly", ( $is_noncapturing ? " | \\ but don't save $results" : $is_listifying ? " | \\ appending $results to \$MATCH{$alias}" : " | \\ saving $results in \$MATCH{$alias}" ) ); } # Generate run-time debugging code (if any)... my ($debug_pre, $debug_post) = _build_debugging_statements($debug_runtime,$construct, +1); # Generate post-match result-capturing code, if match captures... my $post_action = $is_noncapturing ? q{} : qq{local \@Regexp::Grammars::RESULT_STACK = ( \@Regexp::Grammars::RESULT_STACK[0..\@Regexp::Grammars::RESULT_STACK-2], Regexp::Grammars::_extend_current_result_frame_with_$savemode( \\\@Regexp::Grammars::RESULT_STACK, $alias, \$^N ), );} ; # Generate timeout test... my $timeout_test = $timeout ? q{(??{;Regexp::Grammars::_test_timeout()})} : q{}; # Translate to standard regex code... return qq{$timeout_test(?:(?{;local \@Regexp::Grammars::RESULT_STACK = \@Regexp::Grammars::RESULT_STACK;$debug_pre}) (?:($keypat)(??{exists $hash_lookup ? q{} : q{(?!)}})(?{;$post_action$debug_post})))$postmodifier}; } # Convert a " % " construct to pure Perl 5.10... sub _translate_separated_list { my ($term, $op, $separator, $term_trans, $sep_trans, $ws, $debug_build, $debug_runtime, $timeout) = @_; # This insertion ensures backtracking upwinds the stack correctly... state $CHECKPOINT = q{(?{;@Regexp::Grammars::RESULT_STACK = @Regexp::Grammars::RESULT_STACK;})}; # Translate meaningful whitespace... $ws = length($ws) ? q{(?&ws__implicit__)} : q{}; # Generate timeout test... my $timeout_test = $timeout ? q{(??{;Regexp::Grammars::_test_timeout()})} : q{}; # Report how construct was interpreted, if requested to... if ($debug_build) { _debug_notify( info => " |", " |...Treating $term $op $separator as:", " | | repeatedly match the subrule $term", " | \\ as long as the matches are separated by matches of $separator", ); } # One-or-more... return qq{$timeout_test(?:$ws$CHECKPOINT$sep_trans$ws$term_trans)*$+} if $op =~ m{ [*][*]() | [+]([+?]?) \s* % | \{ 1, \}([+?]?) \s* % }xms; # Zero-or-more... return qq{{0}$timeout_test$ws(?:$term_trans(?:$ws$CHECKPOINT$sep_trans$ws$term_trans)*$+)?$+} if $op =~ m{ [*]([+?]?) \s* % | \{ 0, \}([+?]?) \s* % }xms; # One-or-zero... return qq{?$+} if $op =~ m{ [?]([+?]?) \s* % | \{ 0,1 \}([+?]?) \s* % }xms; # Zero exactly... return qq{{0}$ws} if $op =~ m{ \{ 0 \}[+?]? \s* % }xms; # N exactly... if ($op =~ m{ \{ (\d+) \}([+?]?) \s* % }xms ) { my $min = $1-1; return qq{{0}$timeout_test$ws(?:$term_trans(?:$ws$CHECKPOINT$sep_trans$ws$term_trans){$min}$+)} } # Zero-to-N... if ($op =~ m{ \{ 0,(\d+) \}([+?]?) \s* % }xms ) { my $max = $1-1; return qq{{0}$timeout_test$ws(?:$term_trans(?:$ws$CHECKPOINT$sep_trans$ws$term_trans){0,$max}$+)?$+} } # M-to-N and M-to-whatever... if ($op =~ m{ \{ (\d+),(\d*) \} ([+?]?) \s* % }xms ) { my $min = $1-1; my $max = $2 ? $2-1 : q{}; return qq{{0}$timeout_test$ws(?:$term_trans(?:$ws$CHECKPOINT$sep_trans$ws$term_trans){$min,$max}$+)} } # Somehow we missed a case (this should never happen)... die "Internal error: missing case in separated list handler"; } sub _translate_subrule_call { my ($source_line, $source_file, $rulename, $grammar_name, $construct, $alias, $subrule, $args, $savemode, $postmodifier, $debug_build, $debug_runtime, $timeout, $valid_subrule_names_ref, $nocontext) = @_; # Translate arg list, if provided... my $arg_desc; if ($args eq q{}) { $args = q{()}; } elsif (substr($args,0,3) eq '(?{') { # Turn parencode into do block... $arg_desc = substr($args,3,-2); substr($args,1,1) = 'do'; } else { # Turn abbreviated format into a key=>value list... $args =~ s{ [(,] \s* \K : (\w+) (?= \s* [,)] ) }{$1 => \$MATCH{'$1'}}gxms; $arg_desc = substr($args,1,-1); } # Transform qualified subrule names... my $simple_subrule = $subrule; my $start_grammar = (($simple_subrule =~ s{(.*)::}{}xms) ? $1 : ""); if ($start_grammar !~ /^NEXT$|::/) { $start_grammar = caller(3).'::'.$start_grammar; } my @candidates = $start_grammar eq 'NEXT' ? _ancestry_of($grammar_name) : _ancestry_of($start_grammar); # Rename fully-qualified rule call, if to ancestor grammar... RESOLVING: for my $parent_class (@candidates) { my $inherited_subrule = $parent_class.'::'.$simple_subrule; if ($CACHE{$inherited_subrule}) { $subrule = $inherited_subrule; last RESOLVING; } } # Replace package separators, which regex engine can't handle... my $internal_subrule = $subrule; $internal_subrule =~ s{::}{_88_}gxms; # Shortcircuit if unknown subrule invoked... if (!$valid_subrule_names_ref->{$subrule}) { _debug_notify( error => qq{Found call to $construct inside definition of $rulename}, qq{near $source_file line $source_line.}, qq{But no or was defined in the grammar}, qq{(Did you misspell $construct? Or forget to define the rule?)}, q{}, ); return "(?{Regexp::Grammars::_debug_fatal('$construct')})(*COMMIT)(*FAIL)"; } # Determine save behaviour... my $is_noncapturing = $savemode =~ /noncapturing|lookahead/; my $is_listifying = $savemode eq 'list'; my $save_code = $is_noncapturing? q{ @Regexp::Grammars::RESULT_STACK[0..@Regexp::Grammars::RESULT_STACK-2] } : $is_listifying? qq{ \@Regexp::Grammars::RESULT_STACK[0..\@Regexp::Grammars::RESULT_STACK-3], Regexp::Grammars::_pop_current_result_frame_with_list( \\\@Regexp::Grammars::RESULT_STACK, $alias, '$simple_subrule', \$^N ), } : qq{ \@Regexp::Grammars::RESULT_STACK[0..\@Regexp::Grammars::RESULT_STACK-3], Regexp::Grammars::_pop_current_result_frame( \\\@Regexp::Grammars::RESULT_STACK, $alias, '$simple_subrule', \$^N ), } ; # Report how construct was interpreted, if requested to... my $repeatedly = $REPETITION_DESCRIPTION_FOR{$postmodifier} // q{}; my $results = $is_listifying && $postmodifier ? 'each match' : substr($postmodifier,0,1) eq '?' ? 'any match' : 'the match' ; my $do_something_with = $savemode eq 'neglookahead' ? 'lookahead for anything except' : $savemode eq 'poslookahead' ? 'lookahead for' : 'match' ; if ($debug_build) { _debug_notify( info => " |", " |...Treating $construct as:", " | | $do_something_with the subrule <$subrule> $repeatedly", (defined $arg_desc ? " | | passing the args: ($arg_desc)" : () ), ( $is_noncapturing ? " | \\ but don't save anything" : $is_listifying ? " | \\ appending $results to \$MATCH{$alias}" : " | \\ saving $results in \$MATCH{$alias}" ), ); } # Generate post-match result-capturing code, if match captures... my ($debug_pre, $debug_post) = _build_debugging_statements($debug_runtime, $construct); # Generate timeout test... my $timeout_test = $timeout ? q{(??{;Regexp::Grammars::_test_timeout()})} : q{}; # Translate to standard regex code... return qq{(?:$timeout_test(?{; local \@Regexp::Grammars::RESULT_STACK = (\@Regexp::Grammars::RESULT_STACK, {'\@'=>{$args}}); \$Regexp::Grammars::RESULT_STACK[-2]{'~'} = $nocontext if \@Regexp::Grammars::RESULT_STACK >= 2; $debug_pre})((?&$internal_subrule))(?{; local \@Regexp::Grammars::RESULT_STACK = ( $save_code );$debug_post }))$postmodifier}; } sub _translate_rule_def { my ($type, $qualifier, $name, $callname, $qualname, $body, $objectify, $local_ws) = @_; $qualname =~ s{::}{_88_}gxms; # Return object if requested... my $objectification = $objectify ? qq{(??{; local \@Regexp::Grammars::RESULT_STACK = \@Regexp::Grammars::RESULT_STACK; \$Regexp::Grammars::RESULT_STACK[-1] = '$qualifier$name'->can('new') ? '$qualifier$name'->new(\$Regexp::Grammars::RESULT_STACK[-1]) : bless \$Regexp::Grammars::RESULT_STACK[-1], '$qualifier$name'; Regexp::Grammars::_debug_non_hash(\$Regexp::Grammars::RESULT_STACK[-1],'$name'); })} : q{}; # Each rule or token becomes a DEFINE'd Perl 5.10 named capture... my $implicit_version = ($callname eq 'ws' || $callname eq 'hk') ? qq{(?<${callname}__implicit__> $body) } : qq{}; return qq{ (?(DEFINE) $local_ws (?<$qualname> (?<$callname> (?{\$Regexp::Grammars::RESULT_STACK[-1]{'!'}=\$#{!};}) (?:$body) $objectification (?{;\$#{!}=delete(\$Regexp::Grammars::RESULT_STACK[-1]{'!'})//0; delete(\$Regexp::Grammars::RESULT_STACK[-1]{'\@'}); }) )) $implicit_version ) }; } # Locate any valid <...> sequences and replace with native regex code... sub _translate_subrule_calls { my ($source_file, $source_line, $grammar_name, $grammar_spec, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, $pre_match_debug, $post_match_debug, $rule_name, $subrule_names_ref, $magic_ws, $nocontext, ) = @_; my $pretty_rule_name = $rule_name ? ($magic_ws ? '" : 'main regex (before first rule)'; # Remember the preceding construct, so as to implement the +% etc. operators... my $prev_construct = q{}; my $prev_translation = q{}; my $curr_line_num = 1; # Translate all other calls (MAIN GRAMMAR FOR MODULE)... $grammar_spec =~ s{ (? (? \s*+) (? (?&SEPLIST_OP) ) (? \s*+) )? (? (? \. \s* (?(?&QUALIDENT)) \s* (?(?&ARGLIST)) \s* ) | (? (? \? | \! ) \s* (?(?&QUALIDENT)) \s* (?(?&ARGLIST)) \s* ) | (? \s* (?(?&QUALIDENT)) \s* (?(?&ARGLIST)) \s* ) | (? \[ \s* (?(?&QUALIDENT)) \s* (?(?&ARGLIST)) \s* \] ) | (? (?(?&IDENT)) \s* = \s* (?(?&QUALIDENT)) \s* (?(?&ARGLIST)) \s* ) | (? \[ (?(?&IDENT)) \s* = \s* (?(?&QUALIDENT)) \s* (?(?&ARGLIST)) \s* \] ) | (? \s* : (?(?&QUALIDENT)) \s* ) | (? (?(?&IDENT)) \s* = \s* : (?(?&QUALIDENT)) \s* ) | (? \[ (?(?&IDENT)) \s* = \s* : (?(?&QUALIDENT)) \s* \] ) | (? \. (?(?&IDENT)) \s* = \s* (?(?&PARENCODE)|(?&PARENS)|(?&LITERAL)) \s* ) | (? (?(?&IDENT)) \s* = \s* (?(?&PARENCODE)|(?&PARENS)|(?&LITERAL)) \s* ) | (? \[ (?(?&IDENT)) \s* = \s* (?(?&PARENCODE)|(?&PARENS)|(?&LITERAL)) \s* \] ) | (? (?(?&HASH)) \s* (?(?&BRACES))? \s* ) | (? (?(?&IDENT)) \s* = \s* (?(?&HASH)) \s* (?(?&BRACES))? \s* ) | (? \[ (?(?&IDENT)) \s* = \s* (?(?&HASH)) \s* (?(?&BRACES))? \s* \] ) | (? \s* (? \\ | /) (? [:] (?&QUALIDENT)) \s* | \s* (? \\_ | /) (? (?&QUALIDENT)) \s* ) | (? (?(?&IDENT)) \s* = \s* (? \\ | /) (? [:] (?&QUALIDENT)) \s* | (?(?&IDENT)) \s* = \s* (? \\_ | /) (? (?&QUALIDENT)) \s* ) | (? \[ (?(?&IDENT)) \s* = \s* (? \\ | /) (? [:] (?&QUALIDENT)) \s* \] | \[ (?(?&IDENT)) \s* = \s* (? \\_ | /) (? (?&QUALIDENT)) \s* \] ) | (? minimize \s* : \s* ) | (? require \s* : \s* (? (?&PARENCODE) ) \s* ) | (? debug \s* : \s* (? run | match | step | try | off | on) \s* ) | (? timeout \s* : \s* (? \d+) \s* ) | (? context \s* : \s* ) | (? nocontext \s* : \s* ) | (? [.][.][.] | [!][!][!] | [?][?][?] ) | (? (? error | fatal ) \s*+ : \s*+ ) | (? (? log | error | warning | fatal ) \s*+ : \s*+ (? (?&PARENCODE) | .+? ) \s*+ ) ) > (? \s* (?! (?&SEPLIST_OP) ) [?+*][?+]? | ) | (? $WS_PATTERN ) | (? \(\?\<\w+\> ) | (? < [^>\n]* [>\n] ) | (? (? (?: \\[^shv] | (?! (?&PARENCODE) ) (?&PARENS) | (?&CHARSET) | \w++ | \| ) (?&QUANTIFIER)? ) | (? \s++ | \\. (?&QUANTIFIER)? | \(\?! | \(\?\# [^)]* \) # (?# -> old style inline comment) | (?&PARENCODE) | \# [^\n]*+ | [^][\s()<>#\\]++ ) ) (?(DEFINE) (? \*\* | [*+?][+?]?\s*% | \{ \d+(,\d*)? \} [+?]?\s*% ) (? \( (?:[?]<[=!])? (?: \\. | (?&PARENCODE) | (?&PARENS) | (?&CHARSET) | [^][()\\<>]++ )*+ \) ) (? \{ (?: \\. | (?&BRACES) | [^{}\\]++ )*+ \} ) (? \(\?[{] (?: \\. | (?&BRACES) | [^{}\\]++ )*+ [}]\) ) (? \% (?&IDENT) (?: :: (?&IDENT) )* ) (? \[ \^?+ \]?+ (?: \[:\w+:\] | \\. | [^]] )*+ \] ) (? [^\W\d]\w*+ ) (? (?: [^\W\d]\w*+ :: )* [^\W\d]\w*+ ) (? (?&NUMBER) | (?&STRING) | (?&VAR) ) (? [+-]? \d++ (?:\. \d++)? (?:[eE] [+-]? \d++)? ) (? ' [^\\']++ (?: \\. [^\\']++ )* ' ) (? (?&PARENCODE) | \( \s* (?&ARGS)? \s* \) | (?# NOTHING ) ) (? (?&ARG) \s* (?: , \s* (?&ARG) \s* )* ,? ) (? (?&VAR) | (?&KEY) \s* => \s* (?&LITERAL) ) (? : (?&IDENT) ) (? (?&IDENT) | (?&LITERAL) ) (? [*+?][+?]? | \{ \d+,?\d* \} [+?]? ) ) }{ my $curr_construct = $+{construct}; my $list_marker = $+{list_marker} // q{}; my $alias = ($+{alias}//'MATCH') eq 'MATCH' ? q{'='} : qq{'$+{alias}'}; # Determine and remember the necessary translation... my $curr_translation = do{ # Translate subrule calls of the form: ... if (defined $+{alias_parens_scalar}) { my $pattern = substr($+{pattern},0,1) eq '(' ? $+{pattern} : "(?{$+{pattern}})"; _translate_subpattern( $curr_construct, $alias, $pattern, 'scalar', $+{modifier}, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, ); } elsif (defined $+{alias_parens_scalar_nocap}) { my $pattern = substr($+{pattern},0,1) eq '(' ? $+{pattern} : "(?{$+{pattern}})"; _translate_subpattern( $curr_construct, $alias, $pattern, 'noncapturing', $+{modifier}, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, ); } elsif (defined $+{alias_parens_list}) { my $pattern = substr($+{pattern},0,1) eq '(' ? $+{pattern} : "(?{$+{pattern}})"; _translate_subpattern( $curr_construct, $alias, $pattern, 'list', $+{modifier}, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, ); } # Translate subrule calls of the form: ... elsif (defined $+{alias_hash_scalar}) { _translate_hashmatch( $curr_construct, $alias, $+{varname}, $+{keypat}, 'scalar', $+{modifier}, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, ); } elsif (defined $+{alias_hash_scalar_nocap}) { _translate_hashmatch( $curr_construct, $alias, $+{varname}, $+{keypat}, 'noncapturing', $+{modifier}, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, ); } elsif (defined $+{alias_hash_list}) { _translate_hashmatch( $curr_construct, $alias, $+{varname}, $+{keypat}, 'list', $+{modifier}, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, ); } # Translate subrule calls of the form: ... elsif (defined $+{alias_subrule_scalar}) { _translate_subrule_call( $source_line, $source_file, $pretty_rule_name, $grammar_name, $curr_construct, $alias, $+{subrule}, $+{args}, 'scalar', $+{modifier}, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, $subrule_names_ref, $nocontext, ); } elsif (defined $+{alias_subrule_list}) { _translate_subrule_call( $source_line, $source_file, $pretty_rule_name, $grammar_name, $curr_construct, $alias, $+{subrule}, $+{args}, 'list', $+{modifier}, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, $subrule_names_ref, $nocontext, ); } # Translate subrule calls of the form: and ... elsif (defined $+{self_subrule_lookahead}) { # Determine type of lookahead, and work around capture problem... my ($type, $pre, $post) = ( 'neglookahead', '(?!(?!)|', ')' ); if (defined $+{sign} eq '?') { $type = 'poslookahead'; $pre x= 2; $post x= 2; } $pre . _translate_subrule_call( $source_line, $source_file, $pretty_rule_name, $grammar_name, $curr_construct, qq{'$+{subrule}'}, $+{subrule}, $+{args}, $type, q{}, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, $subrule_names_ref, $nocontext, ) . $post; } elsif (defined $+{self_subrule_scalar_nocap}) { _translate_subrule_call( $source_line, $source_file, $pretty_rule_name, $grammar_name, $curr_construct, qq{'$+{subrule}'}, $+{subrule}, $+{args}, 'noncapturing', $+{modifier}, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, $subrule_names_ref, $nocontext, ); } elsif (defined $+{self_subrule_scalar}) { _translate_subrule_call( $source_line, $source_file, $pretty_rule_name, $grammar_name, $curr_construct, qq{'$+{subrule}'}, $+{subrule}, $+{args}, 'scalar', $+{modifier}, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, $subrule_names_ref, $nocontext, ); } elsif (defined $+{self_subrule_list}) { _translate_subrule_call( $source_line, $source_file, $pretty_rule_name, $grammar_name, $curr_construct, qq{'$+{subrule}'}, $+{subrule}, $+{args}, 'list', $+{modifier}, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, $subrule_names_ref, $nocontext, ); } # Translate subrule calls of the form: ... elsif (defined $+{alias_argrule_scalar}) { my $pattern = qq{(??{;\$Regexp::Grammars::RESULT_STACK[-1]{'\@'}{'$+{subrule}'} // '(?!)'})}; _translate_subpattern( $curr_construct, $alias, $pattern, 'scalar', $+{modifier}, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, "in \$ARG{'$+{subrule}'}" ); } elsif (defined $+{alias_argrule_list}) { my $pattern = qq{(??{;\$Regexp::Grammars::RESULT_STACK[-1]{'\@'}{'$+{subrule}'} // '(?!)'})}; _translate_subpattern( $curr_construct, $alias, $pattern, 'list', $+{modifier}, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, "in \$ARG{'$+{subrule}'}" ); } # Translate subrule calls of the form: <:ARGNAME>... elsif (defined $+{self_argrule_scalar}) { my $pattern = qq{(??{;\$Regexp::Grammars::RESULT_STACK[-1]{'\@'}{'$+{subrule}'} // '(?!)'})}; _translate_subpattern( $curr_construct, qq{'$+{subrule}'}, $pattern, 'noncapturing', $+{modifier}, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, "in \$ARG{'$+{subrule}'}" ); } # Translate subrule calls of the form: <\IDENT> or ... elsif (defined $+{backref} || $+{alias_backref} || $+{alias_backref_list}) { # Use "%ARGS" if subrule names starts with a colon... my $subrule = $+{subrule}; if (substr($subrule,0,1) eq ':') { substr($subrule,0,1,"\@'}{'"); } my $backref = qq{\$Regexp::Grammars::RESULT_STACK[-1]{'$subrule'}}; my $quoter = $+{slash} eq '\\' || $+{slash} eq '\\_' ? "quotemeta($backref)" : "Regexp::Grammars::_invert_delim($backref)" ; my $pattern = qq{ (??{ defined $backref ? $quoter : q{(?!)}})}; my $type = $+{backref} ? 'noncapturing' : $+{alias_backref} ? 'scalar' : 'list' ; _translate_subpattern( $curr_construct, $alias, $pattern, $type, $+{modifier}, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, "in \$MATCH{'$subrule'}" ); } # Translate reportable raw regexes (add debugging support)... elsif (defined $+{reportable_raw_regex}) { _translate_raw_regex( $+{reportable_raw_regex}, $compiletime_debugging_requested, $runtime_debugging_requested ); } # Translate non-reportable raw regexes (leave as is)... elsif (defined $+{raw_regex}) { _translate_raw_regex( $+{raw_regex}, $compiletime_debugging_requested ); } # Translate directives... elsif (defined $+{require_directive}) { _translate_require_directive( $curr_construct, $+{condition}, $compiletime_debugging_requested ); } elsif (defined $+{minimize_directive}) { _translate_minimize_directive( $curr_construct, $+{condition}, $compiletime_debugging_requested ); } elsif (defined $+{debug_directive}) { _translate_debug_directive( $curr_construct, $+{cmd}, $compiletime_debugging_requested ); } elsif (defined $+{timeout_directive}) { _translate_timeout_directive( $curr_construct, $+{timeout}, $compiletime_debugging_requested ); } elsif (defined $+{error_directive}) { _translate_error_directive( $curr_construct, $+{error_type}, $+{msg}, $compiletime_debugging_requested, $rule_name ); } elsif (defined $+{autoerror_directive}) { _translate_error_directive( $curr_construct, $+{error_type}, q{}, $compiletime_debugging_requested, $rule_name ); } elsif (defined $+{yadaerror_directive}) { _translate_error_directive( $curr_construct, ($+{yadaerror_directive} eq '???' ? 'warning' : 'error'), q{}, $compiletime_debugging_requested, -$rule_name ); } elsif (defined $+{context_directive}) { $nocontext = 0; if ($compiletime_debugging_requested) { _debug_notify( info => " |", " |...Treating $curr_construct as:", " | \\ Turn on context-saving for the current rule" ); } q{}; # Remove the directive } elsif (defined $+{nocontext_directive}) { $nocontext = 1; if ($compiletime_debugging_requested) { _debug_notify( info => " |", " |...Treating $curr_construct as:", " | \\ Turn off context-saving for the current rule" ); } q{}; # Remove the directive } elsif (defined $+{ws_directive}) { if ($compiletime_debugging_requested) { _debug_notify( info => " |", " |...Treating $curr_construct as:", " | \\ Change whitespace matching for the current rule" ); } $curr_construct; } # Something that looks like a rule call or directive, but isn't... elsif (defined $+{incomplete_request}) { my $request = $+{incomplete_request}; my $inferred_type = $request =~ /:/ ? 'directive' : 'subrule call'; _debug_notify( warn => qq{Possible failed attempt to specify a $inferred_type:}, qq{ $request}, qq{near $source_file line $source_line}, qq{(If you meant to match literally, use: \\$request)}, q{}, ); $request; } # A quantifier that isn't quantifying anything... elsif (defined $+{loose_quantifier}) { my $quant = $+{loose_quantifier}; $quant =~ s{^\s+}{}; my $literal = quotemeta($quant); _debug_notify( fatal => qq{Quantifier that doesn't quantify anything: $quant}, qq{in declaration of $pretty_rule_name}, qq{near $source_file line $source_line}, qq{(Did you mean to match literally? If so, try: $literal)}, q{}, ); exit(1); } # There shouldn't be any other possibility... else { die qq{Internal error: this shouldn't happen!\n}, qq{Near '$curr_construct' in $pretty_rule_name\n}; } }; # Handle the **/*%/+%/{n,m}%/etc operators... if ($list_marker) { my $ws = $magic_ws ? $+{ws1} . $+{ws2} : q{}; my $op = $+{op}; $curr_translation = _translate_separated_list( $prev_construct, $op, $curr_construct, $prev_translation, $curr_translation, $ws, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, ); $curr_construct = qq{$prev_construct $op $curr_construct}; } # Finally, remember this latest translation, and return it... $prev_construct = $curr_construct; $prev_translation = $curr_translation;; }exmsg; # Translate magic hash accesses... $grammar_spec =~ s{\$(?:\:\:)?MATCH (?= \s*\{) } {\$Regexp::Grammars::RESULT_STACK[-1]}xmsg; $grammar_spec =~ s{\$(?:\:\:)?ARG (?= \s*\{) } {\$Regexp::Grammars::RESULT_STACK[-1]{'\@'}}xmsg; # Translate magic scalars and hashes... state $translate_scalar = { q{%$MATCH} => q{%{$Regexp::Grammars::RESULT_STACK[-1]{q{=}}}}, q{@$MATCH} => q{@{$Regexp::Grammars::RESULT_STACK[-1]{q{=}}}}, q{$MATCH} => q{$Regexp::Grammars::RESULT_STACK[-1]{q{=}}}, q{%MATCH} => q{%{$Regexp::Grammars::RESULT_STACK[-1]}}, q{$CAPTURE} => q{$^N}, q{$CONTEXT} => q{$^N}, q{$DEBUG} => q{$Regexp::Grammars::DEBUG}, q{$INDEX} => q{${\\pos()}}, q{%ARG} => q{%{$Regexp::Grammars::RESULT_STACK[-1]{'@'}}}, q{%$::MATCH} => q{%{$Regexp::Grammars::RESULT_STACK[-1]{q{=}}}}, q{@$::MATCH} => q{@{$Regexp::Grammars::RESULT_STACK[-1]{q{=}}}}, q{$::MATCH} => q{$Regexp::Grammars::RESULT_STACK[-1]{q{=}}}, q{%::MATCH} => q{%{$Regexp::Grammars::RESULT_STACK[-1]}}, q{$::CAPTURE} => q{$^N}, q{$::CONTEXT} => q{$^N}, q{$::DEBUG} => q{$Regexp::Grammars::DEBUG}, q{$::INDEX} => q{${\\pos()}}, q{%::ARG} => q{%{$Regexp::Grammars::RESULT_STACK[-1]{'@'}}}, }; state $translatable_scalar = join '|', map {quotemeta $_} sort {length $b <=> length $a} keys %{$translate_scalar}; $grammar_spec =~ s{ ($translatable_scalar) (?! \s* (?: \[ | \{) ) } {$translate_scalar->{$1}}oxmsg; return $grammar_spec; } # Generate a "decimal timestamp" and insert in a template... sub _timestamp { my ($template) = @_; # Generate and insert any timestamp... if ($template =~ /%t/) { my ($sec, $min, $hour, $day, $mon, $year) = localtime; $mon++; $year+=1900; my $timestamp = sprintf("%04d%02d%02d.%02d%02d%02d", $year, $mon, $day, $hour, $min, $sec); $template =~ s{%t}{$timestamp}xms;; } return $template; } # Open (or re-open) the requested log file... sub _autoflush { my ($fh) = @_; my $originally_selected = select $fh; $|=1; select $originally_selected; } sub _open_log { my ($mode, $filename, $from_where) = @_; $from_where //= q{}; # Special case: '-' --> STDERR if ($filename eq q{-}) { return *STDERR{IO}; } # Otherwise, just open the named file... elsif (open my $fh, $mode, $filename) { _autoflush($fh); return $fh; } # Otherwise, generate a warning and default to STDERR... else { local *Regexp::Grammars::LOGFILE = *STDERR{IO}; _debug_notify( warn => qq{Unable to open log file '$filename'}, ($from_where ? $from_where : ()), qq{($!)}, qq{Defaulting to STDERR instead.}, q{}, ); return *STDERR{IO}; } } sub _invert_delim { my ($delim) = @_; $delim = reverse $delim; $delim =~ tr/<>[]{}()«»`'/><][}{)(»«'`/; return quotemeta $delim; } # Regex to detect if other regexes contain a grammar specification... my $GRAMMAR_DIRECTIVE = qr{ < grammar: \s* (? $QUALIDENT ) \s* > }xms; # Regex to detect if other regexes contain a grammar inheritance... my $EXTENDS_DIRECTIVE = qr{ < extends: \s* (? $QUALIDENT ) \s* > }xms; # Cache of rule/token names within defined grammars... my %subrule_names_for; # Build list of ancestors for a given grammar... sub _ancestry_of { my ($grammar_name) = @_; return () if !$grammar_name; use mro; return map { substr($_, $CACHE_LEN) } @{mro::get_linear_isa($CACHE.$grammar_name, 'c3')}; } # Detect and translate any requested grammar inheritances... sub _extract_inheritances { my ($source_line, $source_file, $regex, $compiletime_debugging_requested, $derived_grammar_name) = @_; # Detect and remove inheritance requests... while ($regex =~ s{$EXTENDS_DIRECTIVE}{}xms) { # Normalize grammar name and report... my $orig_grammar_name = $+{base_grammar_name}; my $grammar_name = $orig_grammar_name; if ($grammar_name !~ /::/) { $grammar_name = caller(2).'::'.$grammar_name; } if (exists $user_defined_grammar{$grammar_name}) { if ($compiletime_debugging_requested) { _debug_notify( info => "Processing inheritance request for $grammar_name...", q{}, ); } # Specify new relationship... no strict 'refs'; push @{$CACHE.$derived_grammar_name.'::ISA'}, $CACHE.$grammar_name; } else { _debug_notify( fatal => "Inheritance from unknown grammar requested", "by directive", "in regex grammar declared at $source_file line $source_line", q{}, ); exit(1); } } # Retrieve ancestors (but not self) in C3 dispatch order... my (undef, @ancestors) = _ancestry_of($derived_grammar_name); # Extract subrule names and implementations for ancestors... my %subrule_names = map { %{$subrule_names_for{$_}} } @ancestors; $_ = -1 for values %subrule_names; my $implementation = join "\n", map { $user_defined_grammar{$_} } @ancestors; return $implementation, \%subrule_names; } # Transform grammar-augmented regex into pure Perl 5.10 regex... sub _build_grammar { my ($grammar_spec) = @_; $grammar_spec .= q{}; # Check for lack of Regexp::Grammar-y constructs and short-circuit... if ($grammar_spec !~ m{ < (?: [.?![:%\\/]? [^\W\d]\w* [^>]* | [.?!]{3} ) > }xms) { return $grammar_spec; } # Remember where we parked... my ($source_file, $source_line) = (caller 1)[1,2]; $source_line -= $grammar_spec =~ tr/\n//; # Check for dubious repeated constructs that throw away captures... my $dubious_line = $source_line; while ($grammar_spec =~ m{ (.*?) ( < (?! \[ ) # not <[SUBRULE]> ( $IDENT (?: = [^>]*)? ) # but or > \s* ( # followed by a quantifier... [+*][?+]? # either symbolic | \{\d+(?:,\d*)?\}[?+]? # or numeric ) ) }gxms) { my ($prefix, $match, $rule, $qual) = ($1, $2, $3, $4); $dubious_line += $prefix =~ tr/\n//; _debug_notify( warn => qq{Repeated subrule <$rule>$qual}, qq{at $source_file line $dubious_line}, qq{will only capture its final match}, qq{(Did you mean <[$rule]>$qual instead?)}, q{}, ); $dubious_line += $match =~ tr/\n//; } # Check for dubious non-backtracking constructs... $dubious_line = $source_line; while ( $grammar_spec =~ m{ (.*?) ( < (?! (?:obj)? (?:rule: | token ) ) ( [^>]+ ) > \s* ( [?+*][+] | \{.*\}[+] ) ) }gxms) { my ($prefix, $match, $rule, $qual) = ($1, $2, $3, $4); $dubious_line += $prefix =~ tr/\n//; my $safe_qual = substr($qual,0,-1); _debug_notify( warn => qq{Non-backtracking subrule call <$rule>$qual}, qq{at $source_file line $dubious_line}, qq{may not revert correctly during backtracking.}, qq{(If grammar does not work, try <$rule>$safe_qual instead)}, q{}, ); $dubious_line += $match =~ tr/\n//; } # Check whether a log file was specified... my $compiletime_debugging_requested; local *Regexp::Grammars::LOGFILE = *Regexp::Grammars::LOGFILE; my $logfile = q{-}; my $log_where = "for regex grammar defined at $source_file line $source_line"; $grammar_spec =~ s{ ^ [^#]* < logfile: \s* ([^>]+?) \s* > }{ $logfile = _timestamp($1); # Presence of implies compile-time logging... $compiletime_debugging_requested = 1; *Regexp::Grammars::LOGFILE = _open_log('>',$logfile, $log_where ); # Delete directive... q{}; }gexms; # Look ahead for any run-time debugging or timeout requests... my $runtime_debugging_requested = $grammar_spec =~ m{ ^ [^#]* < debug: \s* (run | match | step | try | on | same ) \s* > | \$DEBUG (?! \s* (?: \[ | \{) ) }xms; my $timeout_requested = $grammar_spec =~ m{ ^ [^#]* < timeout: \s* \d+ \s* > }xms; # Standard actions set up and clean up any regex debugging... # Before entire match, set up a stack of attempt records and report... my $pre_match_debug = $runtime_debugging_requested ? qq{(?{; *Regexp::Grammars::LOGFILE = Regexp::Grammars::_open_log('>>','$logfile', '$log_where'); Regexp::Grammars::_init_try_stack(); })} : qq{(?{; *Regexp::Grammars::LOGFILE = Regexp::Grammars::_open_log('>>','$logfile', '$log_where'); })} ; # After entire match, report whether successful or not... my $post_match_debug = $runtime_debugging_requested ? qq{(?{;Regexp::Grammars::_debug_matched(0,\\%/,'',\$^N)}) |(?>(?{;Regexp::Grammars::_debug_handle_failures(0,''); }) (?!)) } : q{} ; # Remove comment lines... $grammar_spec =~ s{^ ([^#\n]*) \s \# [^\n]* }{$1}gxms; # Subdivide into rule and token definitions, preparing to process each... # REWRITE THIS, USING (PROBABLY NEED TO REFACTOR ALL GRAMMARS TO REUSe # THESE COMPONENTS: # (? \( \s* (?&PARAMS)? \s* \) | (?# NOTHING ) ) # (? (?&PARAM) \s* (?: , \s* (?&PARAM) \s* )* ,? ) # (? (?&VAR) (?: \s* = \s* (?: (?&LITERAL) | (?&PARENCODE) ) )? ) # (? (?&NUMBER) | (?&STRING) | (?&VAR) ) # (? : (?&IDENT) ) my @defns = split m{ (< (obj|)(rule|token) \s*+ : \s*+ ((?:${IDENT}::)*+) (?: ($IDENT) \s*+ = \s*+ )?+ ($IDENT) \s* >) }xms, $grammar_spec; # Extract up list of names of defined rules/tokens... # (Name is every 6th item out of every seven, skipping the first item) my @subrule_names = @defns[ map { $_ * 7 + 6 } 0 .. ((@defns-1)/7-1) ]; my @defns_copy = @defns[1..$#defns]; my %subrule_names; # Build a look-up table of subrule names, checking for duplicates... my $defn_line = $source_line + $defns[0] =~ tr/\n//; my %first_decl_explanation; for my $subrule_name (@subrule_names) { my ($full_decl, $objectify, $type, $qualifier, $name, $callname, $body) = splice(@defns_copy, 0, 7); if (++$subrule_names{$subrule_name} > 1) { _debug_notify( warn => "Redeclaration of <$objectify$type: $subrule_name>", "at $source_file line $defn_line", "will be ignored.", @{ $first_decl_explanation{$subrule_name} }, q{}, ); } else { $first_decl_explanation{$subrule_name} = [ "(Hidden by the earlier declaration of <$objectify$type: $subrule_name>", " at $source_file line $defn_line)" ]; } $defn_line += ($full_decl.$body) =~ tr/\n//; } # Add the built-ins... @subrule_names{'ws', 'hk', 'matchpos', 'matchline'} = (1) x 4; # An empty main rule will never match anything... my $main_regex = shift @defns; if ($main_regex =~ m{\A (?: \s++ | \(\?\# [^)]* \) | \# [^\n]++ )* \z}xms) { _debug_notify( error => "No main regex specified before rule definitions", "in regex grammar declared at $source_file line $source_line", "Grammar will never match anything.", "(Or did you forget a specification?)", q{}, ); } # Compile the regex or grammar... my $regex = q{}; my $grammar_name; my $is_grammar; # Is this a grammar specification? if ($main_regex =~ $GRAMMAR_DIRECTIVE) { # Normalize grammar name and report... $grammar_name = $+{grammar_name}; if ($grammar_name !~ /::/) { $grammar_name = caller(1) . "::$grammar_name"; } $is_grammar = 1; # Add subrule definitions to namespace... for my $subrule_name (@subrule_names) { $CACHE{$grammar_name.'::'.$subrule_name} = 1; } } else { state $dummy_grammar_index = 0; $grammar_name = '______' . $dummy_grammar_index++; } # Extract any inheritance information... my ($inherited_rules, $inherited_subrule_names) = _extract_inheritances( $source_line, $source_file, $main_regex, $compiletime_debugging_requested, $grammar_name ); # Remove requests... $main_regex =~ s{ $EXTENDS_DIRECTIVE }{}gxms; # Add inherited subrule names to allowed subrule names; @subrule_names{ keys %{$inherited_subrule_names} } = values %{$inherited_subrule_names}; # Remove comments from top-level grammar... $main_regex =~ s{ \(\?\# [^)]* \) | (? }{}gxms) ? 1 : ($main_regex =~ s{ < context \s* : \s* > }{}gxms) ? 0 : 0; # If so, set up to save the grammar... if ($is_grammar) { # Normalize grammar name and report... if ($grammar_name !~ /::/) { $grammar_name = caller(1) . "::$grammar_name"; } if ($compiletime_debugging_requested) { _debug_notify( info => "Processing definition of grammar $grammar_name...", q{}, ); } # Remove the grammar directive... $main_regex =~ s{ ( $GRAMMAR_DIRECTIVE | < debug: \s* (run | match | step | try | on | off | same ) \s* > ) }{$source_line += $1 =~ tr/\n//; q{}}gexms; # Check for anything else in the main regex... if ($main_regex =~ /\A(\s*)\S/) { $source_line += $1 =~ tr/\n//; _debug_notify( warn => "Unexpected item before first subrule specification", "in definition of ", "at $source_file line $source_line:", map({ " $_"} grep /\S/, split "\n", $main_regex), "(this will be ignored when defining the grammar)", q{}, ); } # Remember set of valid subrule names... $subrule_names_for{$grammar_name} = { map({ ($_ => 1) } keys %subrule_names), map({ ($grammar_name.'::'.$_ => 1) } grep { !/::/ } keys %subrule_names), }; } else { #...not a grammar specification # Report how main regex was interpreted, if requested to... if ($compiletime_debugging_requested) { _debug_notify( info => "Processing the main regex before any rule definitions", ); } # Any actual regex is processed first... $regex = _translate_subrule_calls( $source_file, $source_line, $grammar_name, $main_regex, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, $pre_match_debug, $post_match_debug, q{}, # Expected...what? \%subrule_names, 0, # Whitespace isn't magical $nocontext, ); # Wrap the main regex (to ensure |'s don't segment pre and # post commands)... $regex = "(?:$regex)"; # Report how construct was interpreted, if requested to... if ($compiletime_debugging_requested) { _debug_notify( q{} => q{ |}, q{ \\___End of main regex}, q{}, ); } } # Update line number... $source_line += $main_regex =~ tr/\n//; # Then iterate any following rule definitions... while (@defns) { # Grab details of each rule defn (as extracted by previous split)... my ($full_decl, $objectify, $type, $qualifier, $name, $callname, $body) = splice(@defns, 0, 7); $name //= $callname; my $qualified_name = $grammar_name.'::'.$callname; # Report how construct was interpreted, if requested to... if ($compiletime_debugging_requested) { _debug_notify( info => "Defining a $type: <$callname>", " |...Returns: " . ($objectify ? "an object of class '$qualifier$name'" : "a hash"), ); } # Translate any nested <...> constructs... my $trans_body = _translate_subrule_calls( $source_file, $source_line, $grammar_name, $body, $compiletime_debugging_requested, $runtime_debugging_requested, $timeout_requested, $pre_match_debug, $post_match_debug, $callname, # Expected...what? \%subrule_names, $type eq 'rule', # Is whitespace magical? $nocontext, # Start with the global nocontextuality ); # Report how construct was interpreted, if requested to... if ($compiletime_debugging_requested) { _debug_notify( q{} => q{ |}, q{ \\___End of rule definition}, q{}, ); } # Make allowance for possible local whitespace definitions... my $local_ws_defn = q{}; my $local_ws_call = q{(?&ws__implicit__)}; # Rules make non-code literal whitespace match textual whitespace... if ($type eq 'rule') { # Implement any local whitespace definition... my $first_ws = 1; WS_DIRECTIVE: while ($trans_body =~ s{$WS_PATTERN}{}oxms) { my $defn = $1; if ($defn !~ m{\S}xms) { _debug_notify( warn => qq{Ignoring useless empty directive}, qq{in definition of }, qq{near $source_file line $source_line}, qq{(Did you mean instead?)}, q{}, ); next WS_DIRECTIVE; } elsif (!$first_ws) { _debug_notify( warn => qq{Ignoring useless extra directive}, qq{in definition of }, qq{at $source_file line $source_line}, qq{(No more than one is permitted per rule!)}, q{}, ); next WS_DIRECTIVE; } else { $first_ws = 0; } state $ws_counter = 0; $ws_counter++; $local_ws_defn = qq{(?<__RG_ws_$ws_counter> $defn)}; $local_ws_call = qq{(?&__RG_ws_$ws_counter)}; } # Implement auto-whitespace... state $CODE_OR_SPACE = qr{ (? # These are not magic... \( \?\?? (?&BRACED) \) # Embedded code blocks | \s++ # Whitespace not followed by... (?= \| # ...an OR | (?: \) \s* )? \z # ...the end of the rule | \(\(?\?\&ws\) # ...an explicit ws match | \(\?\??\{ # ...an embedded code block | \\s # ...an explicit space match ) ) | (? \s++ ) # All other whitespace is magic (?(DEFINE) (? \{ (?: \\. | (?&BRACED) | [^{}] )* \} ) ) }xms; $trans_body =~ s{($CODE_OR_SPACE)}{ $+{ignorable_space} // $local_ws_call }exmsg; } else { while ($trans_body =~ s{$WS_PATTERN}{}oxms) { _debug_notify( warn => qq{Ignoring useless directive}, qq{in definition of }, qq{at $source_file line $source_line}, qq{(Did you need to define instead of ?)}, q{}, ); } } $regex .= "\n###############[ $source_file line $source_line ]###############\n" . _translate_rule_def( $type, $qualifier, $name, $callname, $qualified_name, $trans_body, $objectify, $local_ws_defn ); # Update line number... $source_line += ($full_decl.$body) =~ tr/\n//; } # Insert checkpoints into any user-defined code block... $regex =~ s{ \( \?\?? \{ \K (?!;) }{ local \@Regexp::Grammars::RESULT_STACK = \@Regexp::Grammars::RESULT_STACK; }xmsg; # Check for any suspicious left-overs from the start of the regex... pos $regex = 0; # If a grammar definition, save grammar and return a placeholder... if ($is_grammar) { $user_defined_grammar{$grammar_name} = $regex; return qq{(?{ warn "Can't match directly against a pure grammar: \n"; })(*COMMIT)(?!)}; } # Otherwise, aggregrate the final grammar... else { return _complete_regex($regex.$inherited_rules, $pre_match_debug, $post_match_debug, $nocontext); } } sub _complete_regex { my ($regex, $pre_match_debug, $post_match_debug, $nocontext) = @_; return $nocontext ? qq{(?x)$pre_match_debug$PROLOGUE$regex$EPILOGUE_NC$post_match_debug} : qq{(?x)$pre_match_debug$PROLOGUE$regex$EPILOGUE$post_match_debug}; } 1; # Magic true value required at end of module __END__ =head1 NAME Regexp::Grammars - Add grammatical parsing features to Perl 5.10 regexes =head1 VERSION This document describes Regexp::Grammars version 1.045 =head1 SYNOPSIS use Regexp::Grammars; my $parser = qr{ (?: # Parse and save a Verb in a scalar <.ws> # Parse but don't save whitespace # Parse and save a Noun in a scalar 0.5 ? 'VN' : 'VerbNoun' })> # Save result of expression in a scalar | (?: <[Noun]> # Parse a Noun and save result in a list (saved under the key 'Noun') <[PostNoun=ws]> # Parse whitespace, save it in a list # (saved under the key 'PostNoun') )+ # Parse a Verb and save result in a scalar (saved under the key 'Verb') # Save a literal in a scalar | # Turn on the integrated debugger here <.Cmd= (?: mv? )> # Parse but don't capture a subpattern (name it 'Cmd' for debugging purposes) <[File]>+ # Parse 1+ Files and save them in a list (saved under the key 'File') # Turn off the integrated debugger here # Parse a File and save it in a scalar (saved under the key 'Dest') ) ################################################################ # Define a subrule named File <.ws> # - Parse but don't capture whitespace # - Parse the subpattern and capture # matched text as the result of the # subrule # Define a subrule named Noun cat | dog | fish # - Match an alternative (as usual) # Define a whitespace-sensitive subrule eats # - Match a literal (after any space) ? # - Parse optional subrule Noun and # save result under the key 'Object' | # Or else... # - Parse subrule AUX and save result # - Match a literal, save under 'part' # Define a whitespace-insensitive subrule (has | is) # - Match an alternative and capture (?{ $MATCH = uc $^N }) # - Use captured text as subrule result }x; # Match the grammar against some text... if ($text =~ $parser) { # If successful, the hash %/ will have the hierarchy of results... process_data_in( %/ ); } =head1 QUICKSTART CHEATSHEET =head2 In your program... use Regexp::Grammars; Allow enhanced regexes in lexical scope %/ Result-hash for successful grammar match =head2 Defining and using named grammars... Define a named grammar that can be inherited Current grammar inherits named grammar's rules =head2 Defining rules in your grammar... Define rule with magic whitespace Define rule without magic whitespace Define rule that blesses return-hash into class Define token that blesses return-hash into class Shortcut for above (rule name derived from class) Shortcut for above (token name derived from class) =head2 Matching rules in your grammar... Call named subrule (may be fully qualified) save result to $MATCH{RULENAME} Call named subrule, passing args to it Call subrule and fail if it matches (shorthand for (?!<.RULENAME>) ) <:IDENT> Match contents of $ARG{IDENT} as a pattern <\:IDENT> Match contents of $ARG{IDENT} as a literal Match closing delimiter for $ARG{IDENT} <%HASH> Match longest possible key of hash <%HASH {PAT}> Match any key of hash that also matches PAT Match closing delimiter for $MATCH{IDENT} <\_IDENT> Match the literal contents of $MATCH{IDENT} Call subrule, save result in $MATCH{ALIAS} Match a hash key, save key in $MATCH{ALIAS} Match pattern, save match in $MATCH{ALIAS} Execute code, save value in $MATCH{ALIAS} Save specified string in $MATCH{ALIAS} Save specified number in $MATCH{ALIAS} Match closing delim, save as $MATCH{ALIAS} Match '$MATCH{IDENT}', save as $MATCH{ALIAS} <.SUBRULE> Call subrule (one of the above forms), but don't save the result in %MATCH <[SUBRULE]> Call subrule (one of the above forms), but append result instead of overwriting it + % Match one or more repetitions of SUBRULE1 as long as they're separated by SUBRULE2 ** Same (only for backwards compatibility) * % Match zero or more repetitions of SUBRULE1 as long as they're separated by SUBRULE2 =head2 In your grammar's code blocks... $CAPTURE Alias for $^N (the most recent paren capture) $CONTEXT Another alias for $^N $INDEX Current index of next matching position in string %MATCH Current rule's result-hash $MATCH Magic override value (returned instead of result-hash) %ARG Current rule's argument hash $DEBUG Current match-time debugging mode =head2 Directives... Fail if code evaluates false Fail after specified number of seconds Change match-time debugging mode Change debugging log file (default: STDERR) Queue error message and fail parse Queue error message and backtrack Queue warning message and continue Explicitly add a message to debugging log Override automatic whitespace matching Simplify the result of a subrule match Switch on context substring retention Switch off context substring retention =head1 DESCRIPTION This module adds a small number of new regex constructs that can be used within Perl 5.10 patterns to implement complete recursive-descent parsing. Perl 5.10 already supports recursive=descent I, via the new C<< (?...) >> and C<< (?&name) >> constructs. For example, here is a simple matcher for a subset of the LaTeX markup language: $matcher = qr{ (?&File) (?(DEFINE) (? (?&Element)* ) (? \s* (?&Command) | \s* (?&Literal) ) (? \\ \s* (?&Literal) \s* (?&Options)? \s* (?&Args)? ) (? \[ \s* (?:(?&Option) (?:\s*,\s* (?&Option) )*)? \s* \]) (? \{ \s* (?&Element)* \s* \} ) (?